Utilities for Unicode characters and code points.
#include <Character.hpp>
Static Public Member Functions
static void advanceUtf8 (const std::string_view &text, int &offset)
    Advance the supplied offset from one code point boundary to the next one.
static void advanceUtf8Safe (const std::string_view &text, int &offset)
    Advance the supplied offset from one code point boundary to the next one (validating version).
static char32_t getNextUtf8 (const std::string_view &text, int &offset)
    Get the next code point from the UTF-8 string view.
static char32_t getNextUtf8Safe (const std::string_view &text, int &offset)
    Get the next code point from the UTF-8 string view (validating version).
static char32_t getPreviousUtf8 (const std::string_view &text, int &offset)
    Get the previous code point from the UTF-8 string view.
static char32_t getPreviousUtf8Safe (const std::string_view &text, int &offset)
    Get the previous code point from the UTF-8 string view (validating version).
static bool isAlpha (char32_t c)
    Does the specified code point have the general category "L" (letters)?
static bool isAlphaOrDecimal (char32_t c)
    Does the specified code point have the general category "L" (letters) or "Nd" (decimal digit numbers)?
static bool isBinaryDigit (char32_t c)
    Is the specified code point one of the ASCII characters '0' or '1'?
static bool isBlank (char32_t c)
    Is the specified code point a character that visibly separates words on a line?
static bool isBreakableCharacter (char32_t c)
    Is the specified code point a breakable character for line endings?
static bool isControlCharacter (char32_t c)
    Is the specified code point a control character?
static bool isDigit (char32_t c)
    Does the specified code point have the general category "Nd" (decimal digit numbers)?
static bool isHexDigit (char32_t c)
    Does the specified code point have the general category "Nd" (decimal digit numbers), or is it one of the ASCII Latin letters a-f or A-F?
static bool isIdPart (char32_t c)
    Is the specified code point valid as part of an identifier?
static bool isIdStart (char32_t c)
    Does the specified code point have the general category "L" (letters) or "Nl" (letter numbers)?
static bool isInclusiveBreakableCharacter (char32_t c)
    Is the specified code point a breakable character for line endings that should also be printed?
static bool isLower (char32_t c)
    Does the specified code point have the general category "Ll" (lowercase letter)?
static bool isOctalDigit (char32_t c)
    Is the specified code point one of the ASCII characters '0' to '7'?
static bool isPrintable (char32_t c)
    Is the specified code point a printable character?
static bool isPunctuation (char32_t c)
    Does the specified code point have the general category "P" (punctuation)?
static bool isSpace (char32_t c)
    Is the specified code point a space character (excluding CR and LF)?
static bool isUpper (char32_t c)
    Does the specified code point have the general category "Lu" (uppercase letter)?
static bool isWhitespace (char32_t c)
    Is the specified code point a whitespace character?
static void retreatUtf8 (const std::string_view &text, int &offset)
    Retreat the supplied offset from one code point boundary to the previous one.
static void retreatUtf8Safe (const std::string_view &text, int &offset)
    Retreat the supplied offset from one code point boundary to the previous one (validating version).
static void setUtf8AndAdvanceOffset (std::string &destination, int &offset, char32_t c)
    Write a code point into the supplied UTF-8 string.
static char32_t toLower (char32_t c)
    Convert the supplied code point to lowercase.
static char32_t toUpper (char32_t c)
    Convert the supplied code point to uppercase.
static size_t utf8ByteCount (char32_t c)
    Return the number of bytes that the code point occupies when UTF-8 encoded.
Member Function Documentation

static void advanceUtf8 (const std::string_view &text, int &offset) [inline, static]
Advance the supplied offset from one code point boundary to the next one.
Parameters
    text    the immutable input string view
    offset  the offset into the string (must be less than or equal to the string's length)
Todo: decide on the most appropriate error-handling strategy.
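Advancing "from one code point boundary to the next" only requires the lead byte, because the high bits of a UTF-8 lead byte encode the length of its sequence. The following is a minimal sketch, assuming well-formed UTF-8; advanceUtf8Sketch is a hypothetical stand-alone function, not the library's implementation, and the behaviour at the end of the text is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical sketch of advanceUtf8; it does not call the real Character class.
// Assumes well-formed UTF-8: only the lead byte is inspected, so invalid
// input (for example a stray continuation byte) is not detected.
void advanceUtf8Sketch(const std::string_view &text, int &offset) {
    assert(offset >= 0 && static_cast<std::size_t>(offset) <= text.size());
    if (static_cast<std::size_t>(offset) == text.size()) {
        return;  // assumption: at the end of the text the offset is left unchanged
    }
    const auto lead = static_cast<unsigned char>(text[offset]);
    if (lead < 0x80) {
        offset += 1;  // 0xxxxxxx: ASCII
    } else if (lead < 0xE0) {
        offset += 2;  // 110xxxxx: two-byte sequence (stray 10xxxxxx bytes also land here)
    } else if (lead < 0xF0) {
        offset += 3;  // 1110xxxx: three-byte sequence
    } else {
        offset += 4;  // 11110xxx: four-byte sequence
    }
}
```

For a string holding "a", U+00E9, U+20AC and U+1F600 (1 + 2 + 3 + 4 bytes), repeated calls move the offset through 1, 3, 6 and 10.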
static void advanceUtf8Safe (const std::string_view &text, int &offset) [inline, static]
Advance the supplied offset from one code point boundary to the next one (validating version).
If the string contains invalid UTF-8, the resulting char is < 0.
Parameters
    text    the immutable input string view
    offset  the offset into the string (must be less than or equal to the string's length)
Todo: decide on the most appropriate error-handling strategy.
static char32_t getNextUtf8 (const std::string_view &text, int &offset) [inline, static]
Get the next code point from the UTF-8 string view.
The code point is read at the specified offset, which must lie on a code point boundary, and the offset is then advanced to the next code point boundary.
Parameters
    text    the immutable input string view
    offset  the offset into the string (must be at least zero and less than the string's length)
Todo: decide on the most appropriate error-handling strategy.
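Decoding combines the payload bits of the lead byte with six bits from each continuation byte. A minimal sketch of that read-then-advance behaviour, assuming well-formed UTF-8 and an offset on a code point boundary; getNextUtf8Sketch is hypothetical and is not the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical sketch of getNextUtf8 (not the library's code). Assumes the
// input is well-formed UTF-8 and that offset sits on a code point boundary.
char32_t getNextUtf8Sketch(const std::string_view &text, int &offset) {
    assert(offset >= 0 && static_cast<std::size_t>(offset) < text.size());
    const auto byteAt = [&](int i) {
        return static_cast<char32_t>(static_cast<unsigned char>(text[i]));
    };
    const char32_t lead = byteAt(offset);
    int length = 1;
    char32_t cp = lead;  // 0xxxxxxx: the byte is the code point (stray bytes too)
    if (lead >= 0xF0) {
        length = 4; cp = lead & 0x07;  // 11110xxx
    } else if (lead >= 0xE0) {
        length = 3; cp = lead & 0x0F;  // 1110xxxx
    } else if (lead >= 0xC0) {
        length = 2; cp = lead & 0x1F;  // 110xxxxx
    }
    // Each continuation byte (10xxxxxx) contributes six more payload bits.
    for (int i = 1; i < length; ++i) {
        cp = (cp << 6) | (byteAt(offset + i) & 0x3F);
    }
    offset += length;
    return cp;
}
```

Reading "\xE2\x82\xAC" from offset 0 yields U+20AC (EURO SIGN) and leaves the offset at 3.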
static char32_t getNextUtf8Safe (const std::string_view &text, int &offset) [inline, static]
Get the next code point from the UTF-8 string view (validating version).
If the string contains invalid UTF-8, the resulting char is < 0.
The code point is read at the specified offset, which must lie on a code point boundary, and the offset is then advanced to the next code point boundary.
Parameters
    text    the immutable input string view
    offset  the offset into the string (must be at least zero and less than the string's length)
Todo: decide on the most appropriate error-handling strategy.
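A validating decoder must additionally reject stray continuation bytes, truncated sequences, overlong encodings, UTF-16 surrogates and values above U+10FFFF. Because char32_t is an unsigned type it cannot literally be negative, so this sketch assumes the documented "< 0" means a value that reads as negative once converted to std::int32_t, and uses a hypothetical kInvalidCodePoint of 0xFFFFFFFF. Advancing by one byte on error is also an assumption; getNextUtf8SafeSketch is not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical error sentinel: 0xFFFFFFFF, which reads as -1 when converted
// to std::int32_t. An assumption about what the documented "< 0" means.
constexpr char32_t kInvalidCodePoint = static_cast<char32_t>(-1);

// Hypothetical sketch of getNextUtf8Safe, not the library's implementation.
// On malformed input it returns kInvalidCodePoint and (by assumption)
// advances the offset by a single byte so that decoding can resume.
char32_t getNextUtf8SafeSketch(const std::string_view &text, int &offset) {
    assert(offset >= 0 && static_cast<std::size_t>(offset) < text.size());
    const auto start = static_cast<std::size_t>(offset);
    const auto lead = static_cast<unsigned char>(text[start]);
    std::size_t length = 0;
    char32_t cp = 0;
    char32_t minimum = 0;  // smallest code point that legitimately needs `length` bytes
    if (lead < 0x80) {
        offset += 1;
        return lead;
    } else if (lead >= 0xC2 && lead <= 0xDF) {
        length = 2; cp = lead & 0x1F; minimum = 0x80;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
        length = 3; cp = lead & 0x0F; minimum = 0x800;
    } else if (lead >= 0xF0 && lead <= 0xF4) {
        length = 4; cp = lead & 0x07; minimum = 0x10000;
    } else {
        offset += 1;  // stray continuation byte or invalid lead byte
        return kInvalidCodePoint;
    }
    if (start + length > text.size()) {
        offset += 1;  // sequence truncated by the end of the input
        return kInvalidCodePoint;
    }
    for (std::size_t i = 1; i < length; ++i) {
        const auto next = static_cast<unsigned char>(text[start + i]);
        if ((next & 0xC0) != 0x80) {
            offset += 1;  // expected a 10xxxxxx continuation byte
            return kInvalidCodePoint;
        }
        cp = (cp << 6) | (next & 0x3F);
    }
    // Reject overlong encodings, UTF-16 surrogates and values above U+10FFFF.
    if (cp < minimum || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
        offset += 1;
        return kInvalidCodePoint;
    }
    offset += static_cast<int>(length);
    return cp;
}
```

With this sketch, the overlong two-byte encoding "\xC0\xAF" of '/' is rejected, while "\xF0\x9F\x98\x80" decodes to U+1F600.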
static char32_t getPreviousUtf8 (const std::string_view &text, int &offset) [inline, static]
Get the previous code point from the UTF-8 string view.
The offset is first moved back to the previous code point boundary, and the code point that starts at the new offset is returned.
Parameters
    text    the immutable input string view
    offset  the offset into the string (must be at least zero and less than or equal to the string's length)
Todo: decide on the most appropriate error-handling strategy.
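Moving backwards works because UTF-8 continuation bytes are always of the form 10xxxxxx: stepping back over them lands on the lead byte of the previous code point, which can then be decoded forwards. A minimal sketch, assuming well-formed UTF-8; getPreviousUtf8Sketch is hypothetical, and unlike the documented precondition it requires offset > 0, since there is no previous code point at offset 0.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical sketch of getPreviousUtf8 (not the library's code). Assumes
// well-formed UTF-8, and that offset > 0 sits on a code point boundary.
char32_t getPreviousUtf8Sketch(const std::string_view &text, int &offset) {
    assert(offset > 0 && static_cast<std::size_t>(offset) <= text.size());
    const auto byteAt = [&](int i) {
        return static_cast<unsigned char>(text[i]);
    };
    // Step back over continuation bytes (10xxxxxx) to reach the lead byte.
    do {
        --offset;
    } while (offset > 0 && (byteAt(offset) & 0xC0) == 0x80);
    // Decode forward from the lead byte; offset stays at that boundary.
    const unsigned char lead = byteAt(offset);
    int length = 1;
    char32_t cp = lead;
    if (lead >= 0xF0) {
        length = 4; cp = lead & 0x07;
    } else if (lead >= 0xE0) {
        length = 3; cp = lead & 0x0F;
    } else if (lead >= 0xC0) {
        length = 2; cp = lead & 0x1F;
    }
    for (int i = 1; i < length; ++i) {
        cp = (cp << 6) | (byteAt(offset + i) & 0x3F);
    }
    return cp;
}
```

Starting at the end of "a\xC3\xA9", the first call returns U+00E9 and leaves the offset at 1; the next returns 'a' and leaves it at 0.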
static char32_t getPreviousUtf8Safe (const std::string_view &text, int &offset) [inline, static]
Get the previous code point from the UTF-8 string view (validating version).
If the string contains invalid UTF-8, the resulting char is < 0.
The offset is first moved back to the previous code point boundary, and the code point that starts at the new offset is returned.
Parameters
    text    the immutable input string view
    offset  the offset into the string (must be at least zero and less than or equal to the string's length)
Todo: decide on the most appropriate error-handling strategy.
static bool isAlpha (char32_t c) [inline, static]
Does the specified code point have the general category "L" (letters)?
static bool isAlphaOrDecimal (char32_t c) [inline, static]
Does the specified code point have the general category "L" (letters) or "Nd" (decimal digit numbers)?
static bool isBinaryDigit (char32_t c) [inline, static]
Is the specified code point one of the ASCII characters '0' or '1'?
static bool isBlank (char32_t c) [inline, static]
Is the specified code point a character that visibly separates words on a line?
static bool isBreakableCharacter (char32_t c) [inline, static]
Is the specified code point a breakable character for line endings?
static bool isControlCharacter (char32_t c) [inline, static]
Is the specified code point a control character?
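The documentation does not say which code points count as control characters. If it follows the Unicode general category "Cc" (an assumption, in line with the category-based wording of the neighbouring predicates), the check reduces to two ranges, U+0000..U+001F and U+007F..U+009F, as in this hypothetical isControlCharacterSketch.

```cpp
// Hypothetical sketch, assuming "control character" means Unicode general
// category Cc: U+0000..U+001F (C0 controls) and U+007F..U+009F (DEL and C1).
constexpr bool isControlCharacterSketch(char32_t c) {
    return c <= 0x1F || (c >= 0x7F && c <= 0x9F);
}

static_assert(isControlCharacterSketch(U'\t'));
static_assert(isControlCharacterSketch(char32_t{0x85}));  // NEXT LINE (NEL), a C1 control
static_assert(!isControlCharacterSketch(U' '));
```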
static bool isDigit (char32_t c) [inline, static]
Does the specified code point have the general category "Nd" (decimal digit numbers)?
static bool isHexDigit (char32_t c) [inline, static]
Does the specified code point have the general category "Nd" (decimal digit numbers), or is it one of the ASCII Latin letters a-f or A-F?
static bool isIdPart (char32_t c) [inline, static]
Is the specified code point valid as part of an identifier?
static bool isIdStart (char32_t c) [inline, static]
Does the specified code point have the general category "L" (letters) or "Nl" (letter numbers)?
static bool isInclusiveBreakableCharacter (char32_t c) [inline, static]
Is the specified code point a breakable character for line endings that should also be printed?
static bool isLower (char32_t c) [inline, static]
Does the specified code point have the general category "Ll" (lowercase letter)?
static bool isOctalDigit (char32_t c) [inline, static]
Is the specified code point one of the ASCII characters '0' to '7'?
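Unlike isDigit, which tests the general category "Nd", the binary and octal predicates are documented as plain ASCII range checks. The hypothetical isBinaryDigitSketch and isOctalDigitSketch below show that reading; other Unicode decimal digits, such as U+0661 ARABIC-INDIC DIGIT ONE, do not match.

```cpp
// Minimal sketches of the ASCII-only range checks described for
// isBinaryDigit ('0'-'1') and isOctalDigit ('0'-'7'). Hypothetical names.
constexpr bool isBinaryDigitSketch(char32_t c) { return c == U'0' || c == U'1'; }
constexpr bool isOctalDigitSketch(char32_t c) { return c >= U'0' && c <= U'7'; }

static_assert(isOctalDigitSketch(U'7') && !isOctalDigitSketch(U'8'));
static_assert(!isBinaryDigitSketch(char32_t{0x0661}));  // ARABIC-INDIC DIGIT ONE
```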
static bool isPrintable (char32_t c) [inline, static]
Is the specified code point a printable character?
static bool isPunctuation (char32_t c) [inline, static]
Does the specified code point have the general category "P" (punctuation)?
static bool isSpace (char32_t c) [inline, static]
Is the specified code point a space character (excluding CR and LF)?
static bool isUpper (char32_t c) [inline, static]
Does the specified code point have the general category "Lu" (uppercase letter)?
static bool isWhitespace (char32_t c) [inline, static]
Is the specified code point a whitespace character?
static void retreatUtf8 (const std::string_view &text, int &offset) [inline, static]
Retreat the supplied offset from one code point boundary to the previous one.
Parameters
    text    the immutable input string view
    offset  the offset into the string (must be less than or equal to the string's length)
Todo: decide on the most appropriate error-handling strategy.
static void retreatUtf8Safe (const std::string_view &text, int &offset) [inline, static]
Retreat the supplied offset from one code point boundary to the previous one (validating version).
If the string contains invalid UTF-8, the resulting char is < 0.
Parameters
    text    the immutable input string view
    offset  the offset into the string (must be less than or equal to the string's length)
Todo: decide on the most appropriate error-handling strategy.
static void setUtf8AndAdvanceOffset (std::string &destination, int &offset, char32_t c) [inline, static]
Write a code point into the supplied UTF-8 string.
Set the bytes of the destination string, starting at the specified offset, to the UTF-8 encoding of the supplied code point, then advance the offset to the position immediately after the written bytes.
The caller is responsible for ensuring that the destination string has enough bytes available at the specified offset to hold the encoded code point.
Parameters
    destination  the UTF-8 string into which the bytes will be written
    offset       the offset into the destination string (this will be advanced)
    c            the code point
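Because the function overwrites existing bytes rather than appending, the caller pre-sizes the string and the offset acts as a write cursor. A minimal sketch of that contract, using the standard UTF-8 bit layout; setUtf8AndAdvanceOffsetSketch is hypothetical, and its rejection of surrogates and values above U+10FFFF via assert is an assumption, not documented behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch of setUtf8AndAdvanceOffset (not the library's code).
// The caller must have sized `destination` so the encoded bytes fit.
void setUtf8AndAdvanceOffsetSketch(std::string &destination, int &offset, char32_t c) {
    assert(c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF));  // assumed precondition
    // Overwrite one byte at the cursor and move the cursor forward.
    const auto put = [&](unsigned value) {
        assert(offset >= 0 && static_cast<std::size_t>(offset) < destination.size());
        destination[offset++] = static_cast<char>(value);
    };
    if (c < 0x80) {
        put(c);                          // 0xxxxxxx
    } else if (c < 0x800) {
        put(0xC0 | (c >> 6));            // 110xxxxx
        put(0x80 | (c & 0x3F));          // 10xxxxxx
    } else if (c < 0x10000) {
        put(0xE0 | (c >> 12));           // 1110xxxx
        put(0x80 | ((c >> 6) & 0x3F));
        put(0x80 | (c & 0x3F));
    } else {
        put(0xF0 | (c >> 18));           // 11110xxx
        put(0x80 | ((c >> 12) & 0x3F));
        put(0x80 | ((c >> 6) & 0x3F));
        put(0x80 | (c & 0x3F));
    }
}
```

Writing U+00E9 into a four-byte buffer at offset 0 stores "\xC3\xA9" and leaves the offset at 2, ready for the next code point.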
static char32_t toLower (char32_t c) [inline, static]
Convert the supplied code point to lowercase.
static char32_t toUpper (char32_t c) [inline, static]
Convert the supplied code point to uppercase.
static size_t utf8ByteCount (char32_t c) [inline, static]
Return the number of bytes that the code point occupies when UTF-8 encoded.
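The encoded length follows directly from the code point's range: U+0000..U+007F take 1 byte, up to U+07FF take 2, up to U+FFFF take 3, and up to U+10FFFF take 4. A minimal sketch, with the hypothetical name utf8ByteCountSketch; the documentation does not say what happens for values beyond U+10FFFF, and this sketch simply does not special-case them.

```cpp
#include <cstddef>

// Hypothetical sketch of utf8ByteCount based on the UTF-8 range boundaries.
// Values above U+10FFFF are not special-cased (an assumption).
constexpr std::size_t utf8ByteCountSketch(char32_t c) {
    if (c < 0x80) return 1;     // U+0000..U+007F
    if (c < 0x800) return 2;    // U+0080..U+07FF
    if (c < 0x10000) return 3;  // U+0800..U+FFFF
    return 4;                   // U+10000..U+10FFFF
}

static_assert(utf8ByteCountSketch(U'A') == 1);
static_assert(utf8ByteCountSketch(U'\u20AC') == 3);
```

This pairs naturally with setUtf8AndAdvanceOffset: summing the counts for all code points gives the size the destination string must have before writing.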