Character Struct Reference

Utilities for Unicode characters and code points.

#include <Character.hpp>

Static Public Member Functions

static void advanceUtf8 (const std::string_view &text, int &offset)
 Advance the supplied offset from one code point boundary to the next one.

static void advanceUtf8Safe (const std::string_view &text, int &offset)
 Advance the supplied offset from one code point boundary to the next one (validating version).

static char32_t getNextUtf8 (const std::string_view &text, int &offset)
 Get the next code point from the UTF-8 string view.

static char32_t getNextUtf8Safe (const std::string_view &text, int &offset)
 Get the next code point from the UTF-8 string view (validating version).

static char32_t getPreviousUtf8 (const std::string_view &text, int &offset)
 Get the previous code point from the UTF-8 string view.

static char32_t getPreviousUtf8Safe (const std::string_view &text, int &offset)
 Get the previous code point from the UTF-8 string view (validating version).
 
static bool isAlpha (char32_t c)
 Does the specified code point have the general category "L" (letters).

static bool isAlphaOrDecimal (char32_t c)
 Does the specified code point have the general category "L" (letters) or "Nd" (decimal digit numbers).

static bool isBinaryDigit (char32_t c)
 Is the specified code point one of the ASCII characters 0-1.

static bool isBlank (char32_t c)
 Is the specified code point a character that visibly separates words on a line.

static bool isBreakableCharacter (char32_t c)
 Is the specified code point a breakable character for line endings.

static bool isControlCharacter (char32_t c)
 Is the specified code point a control character.

static bool isDigit (char32_t c)
 Does the specified code point have the general category "Nd" (decimal digit numbers).

static bool isHexDigit (char32_t c)
 Does the specified code point have the general category "Nd" (decimal digit numbers) or is one of the ASCII Latin letters a-f or A-F.

static bool isIdPart (char32_t c)
 Is the specified code point valid as part of an Id.

static bool isIdStart (char32_t c)
 Does the specified code point have the general category "L" (letters) or "Nl" (letter numbers).

static bool isInclusiveBreakableCharacter (char32_t c)
 Is the specified code point a breakable character for line endings that should be printed.

static bool isLower (char32_t c)
 Does the specified code point have the general category "Ll" (lowercase letter).

static bool isOctalDigit (char32_t c)
 Is the specified code point one of the ASCII characters 0-7.

static bool isPrintable (char32_t c)
 Is the specified code point a printable character.

static bool isPunctuation (char32_t c)
 Does the specified code point have the general category "P" (punctuation).

static bool isSpace (char32_t c)
 Is the specified code point a space character (excluding CR / LF).

static bool isUpper (char32_t c)
 Does the specified code point have the general category "Lu" (uppercase letter).

static bool isWhitespace (char32_t c)
 Is the specified code point a whitespace character.
 
static void retreatUtf8 (const std::string_view &text, int &offset)
 Retreat the supplied offset from one code point boundary to the previous one.

static void retreatUtf8Safe (const std::string_view &text, int &offset)
 Retreat the supplied offset from one code point boundary to the previous one (validating version).

static void setUtf8AndAdvanceOffset (std::string &destination, int &offset, char32_t c)
 Write a code point into the supplied UTF-8 string.

static char32_t toLower (char32_t c)
 Convert the supplied code point to lowercase.

static char32_t toUpper (char32_t c)
 Convert the supplied code point to uppercase.

static size_t utf8ByteCount (char32_t c)
 Returns the number of bytes that the character occupies when UTF-8 encoded.
 

Detailed Description

Utilities for Unicode characters and code points.

Member Function Documentation

◆ advanceUtf8()

static void advanceUtf8 (const std::string_view &text, int &offset)
inline static

Advance the supplied offset from one code point boundary to the next one.

Parameters
text    the immutable input string view
offset  the offset into the string (must be less than or equal to the string's length)

TODO: decide on the most appropriate error handling strategy.
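As an illustration of what moving from one code point boundary to the next involves, here is a minimal sketch. It is not the Character.hpp implementation: `sketchSequenceLength` and `sketchAdvanceUtf8` are hypothetical names, the input is assumed to be valid UTF-8, and treating an offset equal to the length as a no-op is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of Character.hpp): byte length of the UTF-8
// sequence introduced by the given lead byte. Assumes the input is valid.
inline int sketchSequenceLength(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    return 4;                             // 11110xxx
}

// Hypothetical non-validating advance: moves offset to the next boundary.
inline void sketchAdvanceUtf8(std::string_view text, int &offset) {
    assert(offset >= 0 && static_cast<std::size_t>(offset) <= text.size());
    // Sketch assumption: nothing to advance over at the end of the text.
    if (static_cast<std::size_t>(offset) == text.size()) return;
    const auto index = static_cast<std::size_t>(offset);
    offset += sketchSequenceLength(static_cast<unsigned char>(text[index]));
}
```

Called repeatedly on a string holding a one-byte, a two-byte and a three-byte character, the offset would step through 1, 3 and 6.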

◆ advanceUtf8Safe()

static void advanceUtf8Safe (const std::string_view &text, int &offset)
inline static

Advance the supplied offset from one code point boundary to the next one (validating version).

If the string has invalid UTF-8 text, then the resulting char is <0.

Parameters
text    the immutable input string view
offset  the offset into the string (must be less than or equal to the string's length)

TODO: decide on the most appropriate error handling strategy.

◆ getNextUtf8()

static char32_t getNextUtf8 (const std::string_view &text, int &offset)
inline static

Get the next code point from the UTF-8 string view.

The code point is obtained at the specified code point boundary offset and the offset is advanced to the next code point boundary.

Parameters
text    the immutable input string view
offset  the offset into the string (must be at least zero and less than the string's length)

TODO: decide on the most appropriate error handling strategy.
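The read-then-advance behaviour described above can be sketched as follows. This is a standalone illustration rather than the library's code: `sketchGetNextUtf8` is a hypothetical name, and the sketch assumes well-formed UTF-8 and performs no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical non-validating decoder, not the Character.hpp implementation.
// Reads the code point that starts at offset and moves offset past it.
inline char32_t sketchGetNextUtf8(std::string_view text, int &offset) {
    assert(offset >= 0 && static_cast<std::size_t>(offset) < text.size());
    const auto byteAt = [&](int i) {
        return static_cast<unsigned char>(text[static_cast<std::size_t>(i)]);
    };
    const unsigned char lead = byteAt(offset);
    int length = 1;
    char32_t c = lead;  // single-byte (ASCII) case
    if ((lead & 0xE0) == 0xC0) { length = 2; c = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { length = 3; c = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { length = 4; c = lead & 0x07; }
    // Each continuation byte contributes its low six bits.
    for (int i = 1; i < length; ++i) {
        c = (c << 6) | (byteAt(offset + i) & 0x3F);
    }
    offset += length;
    return c;
}
```

Looping while the offset is less than the text length would visit every code point in order.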

◆ getNextUtf8Safe()

static char32_t getNextUtf8Safe (const std::string_view &text, int &offset)
inline static

Get the next code point from the UTF-8 string view (validating version).

If the string has invalid UTF-8 text, then the resulting char is <0.

The code point is obtained at the specified code point boundary offset and the offset is advanced to the next code point boundary.

Parameters
text    the immutable input string view
offset  the offset into the string (must be at least zero and less than the string's length)

TODO: decide on the most appropriate error handling strategy.
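To show the kind of checks a validating decoder typically performs, here is a hedged sketch. It is not the Character.hpp implementation: `sketchGetNextUtf8Checked` is a hypothetical name, and the signed return type with -1 for invalid input, as well as leaving the offset unchanged on failure, are assumptions made for this sketch only.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical validating decoder, not the Character.hpp implementation.
// Returns the code point, or -1 when the bytes at offset are not valid UTF-8.
// On failure the offset is left unchanged (an assumption of this sketch).
inline std::int32_t sketchGetNextUtf8Checked(std::string_view text, int &offset) {
    if (offset < 0 || static_cast<std::size_t>(offset) >= text.size()) return -1;
    const std::size_t start = static_cast<std::size_t>(offset);
    const unsigned char lead = static_cast<unsigned char>(text[start]);
    int length = 0;
    std::int32_t c = 0;
    std::int32_t minimum = 0;  // smallest value allowed for this sequence length
    if (lead < 0x80) { length = 1; c = lead; }
    else if ((lead & 0xE0) == 0xC0) { length = 2; c = lead & 0x1F; minimum = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { length = 3; c = lead & 0x0F; minimum = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { length = 4; c = lead & 0x07; minimum = 0x10000; }
    else return -1;  // stray continuation byte or invalid lead byte
    if (start + static_cast<std::size_t>(length) > text.size()) return -1;  // truncated
    for (int i = 1; i < length; ++i) {
        const unsigned char b = static_cast<unsigned char>(text[start + static_cast<std::size_t>(i)]);
        if ((b & 0xC0) != 0x80) return -1;  // not a continuation byte
        c = (c << 6) | (b & 0x3F);
    }
    if (c < minimum) return -1;                 // overlong encoding
    if (c >= 0xD800 && c <= 0xDFFF) return -1;  // UTF-16 surrogate
    if (c > 0x10FFFF) return -1;                // beyond the Unicode range
    offset += length;
    return c;
}
```

The sketch rejects truncated sequences, stray continuation bytes, overlong encodings, surrogates and values above U+10FFFF. Whether the library applies the same set of checks is not stated in this reference.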

◆ getPreviousUtf8()

static char32_t getPreviousUtf8 (const std::string_view &text, int &offset)
inline static

Get the previous code point from the UTF-8 string view.

The offset is retreated to the previous code point boundary and the code point is obtained at the resulting code point boundary offset.

Parameters
text    the immutable input string view
offset  the offset into the string (must be at least zero and less than or equal to the string's length)

TODO: decide on the most appropriate error handling strategy.

◆ getPreviousUtf8Safe()

static char32_t getPreviousUtf8Safe (const std::string_view &text, int &offset)
inline static

Get the previous code point from the UTF-8 string view (validating version).

If the string has invalid UTF-8 text, then the resulting char is <0.

The offset is retreated to the previous code point boundary and the code point is obtained at the resulting code point boundary offset.

Parameters
text    the immutable input string view
offset  the offset into the string (must be at least zero and less than or equal to the string's length)

TODO: decide on the most appropriate error handling strategy.

◆ isAlpha()

static bool isAlpha (char32_t c)
inline static

Does the specified code point have the general category "L" (letters).

◆ isAlphaOrDecimal()

static bool isAlphaOrDecimal (char32_t c)
inline static

Does the specified code point have the general category "L" (letters) or "Nd" (decimal digit numbers).

◆ isBinaryDigit()

static bool isBinaryDigit (char32_t c)
inline static

Is the specified code point one of the ASCII characters 0-1.

◆ isBlank()

static bool isBlank (char32_t c)
inline static

Is the specified code point a character that visibly separates words on a line.

◆ isBreakableCharacter()

static bool isBreakableCharacter (char32_t c)
inline static

Is the specified code point a breakable character for line endings.

◆ isControlCharacter()

static bool isControlCharacter (char32_t c)
inline static

Is the specified code point a control character.

◆ isDigit()

static bool isDigit (char32_t c)
inline static

Does the specified code point have the general category "Nd" (decimal digit numbers).

◆ isHexDigit()

static bool isHexDigit (char32_t c)
inline static

Does the specified code point have the general category "Nd" (decimal digit numbers) or is one of the ASCII Latin letters a-f or A-F.

◆ isIdPart()

static bool isIdPart (char32_t c)
inline static

Is the specified code point valid as part of an Id.

◆ isIdStart()

static bool isIdStart (char32_t c)
inline static

Does the specified code point have the general category "L" (letters) or "Nl" (letter numbers).

◆ isInclusiveBreakableCharacter()

static bool isInclusiveBreakableCharacter (char32_t c)
inline static

Is the specified code point a breakable character for line endings that should be printed.

◆ isLower()

static bool isLower (char32_t c)
inline static

Does the specified code point have the general category "Ll" (lowercase letter).

◆ isOctalDigit()

static bool isOctalDigit (char32_t c)
inline static

Is the specified code point one of the ASCII characters 0-7.
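The ASCII-only digit predicates (binary and octal) amount to simple range checks, unlike the predicates based on Unicode general categories, which need character data. A minimal sketch with hypothetical names, not taken from Character.hpp:

```cpp
// Hypothetical ASCII range checks, not the Character.hpp implementation.
constexpr bool sketchIsBinaryDigit(char32_t c) {
    return c == U'0' || c == U'1';
}

constexpr bool sketchIsOctalDigit(char32_t c) {
    return c >= U'0' && c <= U'7';
}
```

Under these definitions a non-ASCII digit such as U+FF11 (fullwidth digit one) is rejected, which matches the "ASCII characters" wording of the descriptions.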

◆ isPrintable()

static bool isPrintable (char32_t c)
inline static

Is the specified code point a printable character.

◆ isPunctuation()

static bool isPunctuation (char32_t c)
inline static

Does the specified code point have the general category "P" (punctuation).

◆ isSpace()

static bool isSpace (char32_t c)
inline static

Is the specified code point a space character (excluding CR / LF).

◆ isUpper()

static bool isUpper (char32_t c)
inline static

Does the specified code point have the general category "Lu" (uppercase letter).

◆ isWhitespace()

static bool isWhitespace (char32_t c)
inline static

Is the specified code point a whitespace character.

◆ retreatUtf8()

static void retreatUtf8 (const std::string_view &text, int &offset)
inline static

Retreat the supplied offset from one code point boundary to the previous one.

Parameters
text    the immutable input string view
offset  the offset into the string (must be less than or equal to the string's length)

TODO: decide on the most appropriate error handling strategy.
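Retreating to the previous boundary can be done by stepping back over continuation bytes (those of the form 10xxxxxx) until a lead byte is reached. The sketch below illustrates this under the assumption of valid UTF-8. `sketchRetreatUtf8` is a hypothetical name, and treating offset zero as a no-op is an assumption of the sketch rather than documented library behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical non-validating retreat, not the Character.hpp implementation.
inline void sketchRetreatUtf8(std::string_view text, int &offset) {
    assert(offset >= 0 && static_cast<std::size_t>(offset) <= text.size());
    if (offset == 0) return;  // sketch assumption: nothing before the start
    // Step back one byte at a time while we are on a continuation byte.
    do {
        --offset;
    } while (offset > 0 &&
             (static_cast<unsigned char>(text[static_cast<std::size_t>(offset)]) & 0xC0) == 0x80);
}
```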

◆ retreatUtf8Safe()

static void retreatUtf8Safe (const std::string_view &text, int &offset)
inline static

Retreat the supplied offset from one code point boundary to the previous one (validating version).

If the string has invalid UTF-8 text, then the resulting char is <0.

Parameters
text    the immutable input string view
offset  the offset into the string (must be less than or equal to the string's length)

TODO: decide on the most appropriate error handling strategy.

◆ setUtf8AndAdvanceOffset()

static void setUtf8AndAdvanceOffset (std::string &destination, int &offset, char32_t c)
inline static

Write a code point into the supplied UTF-8 string.

Set bytes in the destination string at the specified offset to the UTF-8 bytes resulting from the supplied code point. Advance the offset to immediately after the written bytes.

It is the responsibility of the caller to ensure that the destination string has enough bytes available for the code point at the specified offset.

Parameters
destination  the UTF-8 string into which the bytes will be written
offset       the offset into the destination string (this will be advanced)
c            the code point
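The write-and-advance contract, including the caller's duty to pre-size the destination, might look like the following sketch. It is an illustration only: `sketchSetUtf8` is a hypothetical name, the code point is assumed to be a valid Unicode scalar value, and the assertion stands in for whatever checking the library does or does not perform.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical encoder, not the Character.hpp implementation.
// Overwrites bytes of destination starting at offset, then advances offset.
inline void sketchSetUtf8(std::string &destination, int &offset, char32_t c) {
    const int length = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
    // The caller must have sized the string already; the sketch only asserts it.
    assert(offset >= 0 &&
           static_cast<std::size_t>(offset) + static_cast<std::size_t>(length) <= destination.size());
    char *out = &destination[static_cast<std::size_t>(offset)];
    switch (length) {
        case 1:
            out[0] = static_cast<char>(c);
            break;
        case 2:
            out[0] = static_cast<char>(0xC0 | (c >> 6));
            out[1] = static_cast<char>(0x80 | (c & 0x3F));
            break;
        case 3:
            out[0] = static_cast<char>(0xE0 | (c >> 12));
            out[1] = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out[2] = static_cast<char>(0x80 | (c & 0x3F));
            break;
        default:
            out[0] = static_cast<char>(0xF0 | (c >> 18));
            out[1] = static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out[2] = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out[3] = static_cast<char>(0x80 | (c & 0x3F));
            break;
    }
    offset += length;
}
```

A caller would typically sum the encoded sizes of the code points first, construct the string with that many bytes, and then write each code point in turn.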

◆ toLower()

static char32_t toLower (char32_t c)
inline static

Convert the supplied code point to lowercase.

◆ toUpper()

static char32_t toUpper (char32_t c)
inline static

Convert the supplied code point to uppercase.

◆ utf8ByteCount()

static size_t utf8ByteCount (char32_t c)
inline static

Returns the number of bytes that the character occupies when UTF-8 encoded.
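For reference, the UTF-8 encoded size of a code point follows directly from its value. A minimal sketch with the hypothetical name `sketchUtf8ByteCount`, assuming the argument is a valid code point (no range or surrogate checks):

```cpp
#include <cstddef>

// Hypothetical byte-count helper, not the Character.hpp implementation.
constexpr std::size_t sketchUtf8ByteCount(char32_t c) {
    if (c < 0x80) return 1;     // U+0000..U+007F
    if (c < 0x800) return 2;    // U+0080..U+07FF
    if (c < 0x10000) return 3;  // U+0800..U+FFFF
    return 4;                   // U+10000..U+10FFFF
}
```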

