Utilities for Unicode characters and code points.

#include <Character.hpp>

Static Public Member Functions
static void	advanceUtf8 (const std::string_view &text, int &offset)
	Advance the supplied offset from one code point boundary to the next one.

static void	advanceUtf8Safe (const std::string_view &text, int &offset)
	Advance the supplied offset from one code point boundary to the next one (validating version).

static char32_t	getNextUtf8 (const std::string_view &text, int &offset)
	Get the next code point from the UTF-8 string view.

static char32_t	getNextUtf8Safe (const std::string_view &text, int &offset)
	Get the next code point from the UTF-8 string view (validating version).

static char32_t	getPreviousUtf8 (const std::string_view &text, int &offset)
	Get the previous code point from the UTF-8 string view.

static char32_t	getPreviousUtf8Safe (const std::string_view &text, int &offset)
	Get the previous code point from the UTF-8 string view (validating version).

static bool	isAlpha (char32_t c)
	Does the specified code point have the general category "L" (letters).

static bool	isAlphaOrDecimal (char32_t c)
	Does the specified code point have the general category "L" (letters) or "Nd" (decimal digit numbers).

static bool	isBinaryDigit (char32_t c)
	Is the specified code point one of the ASCII characters 0-1.

static bool	isBlank (char32_t c)
	Is the specified code point a character that visibly separates words on a line.

static bool	isBreakableCharacter (char32_t c)
	Is the specified code point a breakable character for line endings.

static bool	isControlCharacter (char32_t c)
	Is the specified code point a control character.

static bool	isDigit (char32_t c)
	Does the specified code point have the general category "Nd" (decimal digit numbers).

static bool	isHexDigit (char32_t c)
	Does the specified code point have the general category "Nd" (decimal digit numbers) or is one of the ASCII Latin letters a-f or A-F.

static bool	isIdPart (char32_t c)
	Is the specified code point valid as part of an Id.

static bool	isIdStart (char32_t c)
	Does the specified code point have the general category "L" (letters) or "Nl" (letter numbers).

static bool	isInclusiveBreakableCharacter (char32_t c)
	Is the specified code point a breakable character for line endings that should be printed.

static bool	isLower (char32_t c)
	Does the specified code point have the general category "Ll" (lowercase letter).

static bool	isOctalDigit (char32_t c)
	Is the specified code point one of the ASCII characters 0-7.

static bool	isPrintable (char32_t c)
	Is the specified code point a printable character.

static bool	isPunctuation (char32_t c)
	Does the specified code point have the general category "P" (punctuation).

static bool	isSpace (char32_t c)
	Is the specified code point a space character (excluding CR / LF).

static bool	isUpper (char32_t c)
	Does the specified code point have the general category "Lu" (uppercase letter).

static bool	isWhitespace (char32_t c)
	Is the specified code point a whitespace character.

static void	retreatUtf8 (const std::string_view &text, int &offset)
	Retreat the supplied offset from one code point boundary to the previous one.

static void	retreatUtf8Safe (const std::string_view &text, int &offset)
	Retreat the supplied offset from one code point boundary to the previous one (validating version).

static void	setUtf8AndAdvanceOffset (std::string &destination, int &offset, char32_t c)
	Write a code point into the supplied UTF-8 string.

static char32_t	toLower (char32_t c)
	Convert the supplied code point to lowercase.

static char32_t	toUpper (char32_t c)
	Convert the supplied code point to uppercase.

static size_t	utf8ByteCount (char32_t c)
	Returns the number of bytes that the character occupies when UTF-8 encoded.

Detailed Description

Utilities for Unicode characters and code points.

Member Function Documentation

◆ advanceUtf8()

static void advanceUtf8	(	const std::string_view &	text,
		int &	offset
	)

inline static

Advance the supplied offset from one code point boundary to the next one.

Parameters

text	the immutable input string view
offset	the offset into the string (must be less than or equal to the string's length)

TODO: decide on the most appropriate error handling strategy.
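To illustrate what moving from one code point boundary to the next involves, here is a minimal sketch of a non-validating advance. The name `sketchAdvanceUtf8` is hypothetical and this is not the library's implementation; it assumes the text is well-formed UTF-8 and that `offset` is on a boundary and less than the string's length.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for illustration; not the library's implementation.
// Assumes well-formed UTF-8 and that `offset` sits on a code point boundary.
inline void sketchAdvanceUtf8(std::string_view text, int &offset) {
    const unsigned char lead =
        static_cast<unsigned char>(text[static_cast<std::size_t>(offset)]);
    // The lead byte alone determines the length of the sequence.
    if (lead < 0x80) {
        offset += 1;  // 0xxxxxxx: ASCII
    } else if (lead < 0xE0) {
        offset += 2;  // 110xxxxx: two-byte sequence
    } else if (lead < 0xF0) {
        offset += 3;  // 1110xxxx: three-byte sequence
    } else {
        offset += 4;  // 11110xxx: four-byte sequence
    }
}
```

Because only the lead byte is inspected, malformed input can move the offset past the end of the string, which is the kind of case a validating variant has to guard against.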

◆ advanceUtf8Safe()

static void advanceUtf8Safe	(	const std::string_view &	text,
		int &	offset
	)

inline static

Advance the supplied offset from one code point boundary to the next one (validating version).

If the string contains invalid UTF-8 text, then the resulting character is negative (<0).

Parameters

text	the immutable input string view
offset	the offset into the string (must be less than or equal to the string's length)

TODO: decide on the most appropriate error handling strategy.

◆ getNextUtf8()

static char32_t getNextUtf8	(	const std::string_view &	text,
		int &	offset
	)

inline static

Get the next code point from the UTF-8 string view.

The code point is obtained at the specified code point boundary offset and the offset is advanced to the next code point boundary.

Parameters

text	the immutable input string view
offset	the offset into the string (must be at least zero and less than the string's length)

TODO: decide on the most appropriate error handling strategy.
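The read-then-advance behaviour described above can be sketched as follows. The name `sketchGetNextUtf8` is hypothetical, and the code is an illustrative decoder under the assumption of well-formed UTF-8, not the library's implementation.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for illustration; not the library's implementation.
// Assumes well-formed UTF-8 and 0 <= offset < text.size().
inline char32_t sketchGetNextUtf8(std::string_view text, int &offset) {
    auto byteAt = [&](int i) {
        return static_cast<unsigned char>(text[static_cast<std::size_t>(i)]);
    };
    const unsigned char lead = byteAt(offset++);
    if (lead < 0x80) {
        return lead;  // single-byte (ASCII) code point
    }
    int trailing = 0;
    char32_t c = 0;
    if (lead < 0xE0) {
        trailing = 1;
        c = lead & 0x1F;  // 110xxxxx
    } else if (lead < 0xF0) {
        trailing = 2;
        c = lead & 0x0F;  // 1110xxxx
    } else {
        trailing = 3;
        c = lead & 0x07;  // 11110xxx
    }
    // Each continuation byte contributes its low six bits.
    while (trailing-- > 0) {
        c = (c << 6) | (byteAt(offset++) & 0x3F);
    }
    return c;
}
```

A caller would typically loop with `while (offset < static_cast<int>(text.size()))`, reading one code point per iteration.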

◆ getNextUtf8Safe()

static char32_t getNextUtf8Safe	(	const std::string_view &	text,
		int &	offset
	)

inline static

Get the next code point from the UTF-8 string view (validating version).

If the string contains invalid UTF-8 text, then the resulting character is negative (<0).

The code point is obtained at the specified code point boundary offset and the offset is advanced to the next code point boundary.

Parameters

text	the immutable input string view
offset	the offset into the string (must be at least zero and less than the string's length)

TODO: decide on the most appropriate error handling strategy.
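As a rough illustration of what a validating read has to check, the sketch below returns a negative value for malformed input. Everything here is an assumption made for the example: the name `sketchGetNextUtf8Safe` is hypothetical, the signed `std::int32_t` return type is chosen only so that a negative result is representable (the documented function returns `char32_t`), and advancing by a single byte on error is one possible policy, not necessarily the library's.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical stand-in for illustration; not the library's implementation.
// Returns the code point, or -1 if the bytes at `offset` are not valid UTF-8.
// Assumes 0 <= offset < text.size(). On error, `offset` moves one byte forward.
inline std::int32_t sketchGetNextUtf8Safe(std::string_view text, int &offset) {
    const int length = static_cast<int>(text.size());
    auto byteAt = [&](int i) {
        return static_cast<unsigned char>(text[static_cast<std::size_t>(i)]);
    };
    const int start = offset;
    const unsigned char lead = byteAt(offset++);
    if (lead < 0x80) {
        return lead;
    }
    int trailing = 0;
    std::int32_t c = 0;
    std::int32_t minimum = 0;  // smallest value allowed for this length
    if (lead >= 0xC2 && lead <= 0xDF) {
        trailing = 1; c = lead & 0x1F; minimum = 0x80;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
        trailing = 2; c = lead & 0x0F; minimum = 0x800;
    } else if (lead >= 0xF0 && lead <= 0xF4) {
        trailing = 3; c = lead & 0x07; minimum = 0x10000;
    } else {
        return -1;  // stray continuation byte or invalid lead byte
    }
    if (start + trailing >= length) {
        return -1;  // sequence is truncated by the end of the string
    }
    for (int i = 1; i <= trailing; ++i) {
        const unsigned char b = byteAt(start + i);
        if ((b & 0xC0) != 0x80) {
            return -1;  // expected a continuation byte (10xxxxxx)
        }
        c = (c << 6) | (b & 0x3F);
    }
    // Reject overlong encodings, surrogates and values beyond U+10FFFF.
    if (c < minimum || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        return -1;
    }
    offset = start + trailing + 1;
    return c;
}
```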

◆ getPreviousUtf8()

static char32_t getPreviousUtf8	(	const std::string_view &	text,
		int &	offset
	)

inline static

Get the previous code point from the UTF-8 string view.

The offset is retreated to the previous code point boundary and the code point is obtained at the resulting code point boundary offset.

Parameters

text	the immutable input string view
offset	the offset into the string (must be at least zero and less than or equal to the string's length)

TODO: decide on the most appropriate error handling strategy.
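The retreat-then-read order described above can be sketched like this. The name `sketchGetPreviousUtf8` is hypothetical; the code assumes well-formed UTF-8 and an offset greater than zero, and is not the library's implementation.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for illustration; not the library's implementation.
// Assumes well-formed UTF-8 and 0 < offset <= text.size().
inline char32_t sketchGetPreviousUtf8(std::string_view text, int &offset) {
    auto byteAt = [&](int i) {
        return static_cast<unsigned char>(text[static_cast<std::size_t>(i)]);
    };
    // Step back over continuation bytes (10xxxxxx) until a lead byte is found.
    do {
        --offset;
    } while (offset > 0 && (byteAt(offset) & 0xC0) == 0x80);

    // Decode forward from the new boundary without moving `offset` again.
    int i = offset;
    const unsigned char lead = byteAt(i++);
    if (lead < 0x80) {
        return lead;
    }
    int trailing = (lead < 0xE0) ? 1 : (lead < 0xF0) ? 2 : 3;
    // Payload mask of the lead byte: 0x1F, 0x0F or 0x07.
    char32_t c = lead & (0x3F >> trailing);
    while (trailing-- > 0) {
        c = (c << 6) | (byteAt(i++) & 0x3F);
    }
    return c;
}
```

Starting with `offset` equal to the string's length and calling this repeatedly until `offset` reaches zero walks the text backwards, one code point at a time.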

◆ getPreviousUtf8Safe()

static char32_t getPreviousUtf8Safe	(	const std::string_view &	text,
		int &	offset
	)

inline static

Get the previous code point from the UTF-8 string view (validating version).

If the string contains invalid UTF-8 text, then the resulting character is negative (<0).

The offset is retreated to the previous code point boundary and the code point is obtained at the resulting code point boundary offset.

Parameters

text	the immutable input string view
offset	the offset into the string (must be at least zero and less than or equal to the string's length)

TODO: decide on the most appropriate error handling strategy.

◆ isAlpha()

static bool isAlpha ( char32_t c )

inline static

Does the specified code point have the general category "L" (letters).

◆ isAlphaOrDecimal()

static bool isAlphaOrDecimal ( char32_t c )

inline static

Does the specified code point have the general category "L" (letters) or "Nd" (decimal digit numbers).

◆ isBinaryDigit()

static bool isBinaryDigit ( char32_t c )

inline static

Is the specified code point one of the ASCII characters 0-1.

◆ isBlank()

static bool isBlank ( char32_t c )

inline static

Is the specified code point a character that visibly separates words on a line.

◆ isBreakableCharacter()

static bool isBreakableCharacter ( char32_t c )

inline static

Is the specified code point a breakable character for line endings.

◆ isControlCharacter()

static bool isControlCharacter ( char32_t c )

inline static

Is the specified code point a control character.

◆ isDigit()

static bool isDigit ( char32_t c )

inline static

Does the specified code point have the general category "Nd" (decimal digit numbers).

◆ isHexDigit()

static bool isHexDigit ( char32_t c )

inline static

Does the specified code point have the general category "Nd" (decimal digit numbers) or is one of the ASCII Latin letters a-f or A-F.

◆ isIdPart()

static bool isIdPart ( char32_t c )

inline static

Is the specified code point valid as part of an Id.

◆ isIdStart()

static bool isIdStart ( char32_t c )

inline static

Does the specified code point have the general category "L" (letters) or "Nl" (letter numbers).

◆ isInclusiveBreakableCharacter()

static bool isInclusiveBreakableCharacter ( char32_t c )

inline static

Is the specified code point a breakable character for line endings that should be printed.

◆ isLower()

static bool isLower ( char32_t c )

inline static

Does the specified code point have the general category "Ll" (lowercase letter).

◆ isOctalDigit()

static bool isOctalDigit ( char32_t c )

inline static

Is the specified code point one of the ASCII characters 0-7.
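Unlike isDigit(), which is described in terms of the general category "Nd", this check is limited to ASCII. A minimal sketch of such a range test, using the hypothetical name `sketchIsOctalDigit` (not the library's implementation):

```cpp
// Hypothetical stand-in for illustration; not the library's implementation.
// True only for the ASCII characters '0' through '7'.
constexpr bool sketchIsOctalDigit(char32_t c) {
    return c >= U'0' && c <= U'7';
}
```

Under this reading, a non-ASCII decimal digit such as U+0667 (ARABIC-INDIC DIGIT SEVEN) is not an octal digit, even though it belongs to category "Nd".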

◆ isPrintable()

static bool isPrintable ( char32_t c )

inline static

Is the specified code point a printable character.

◆ isPunctuation()

static bool isPunctuation ( char32_t c )

inline static

Does the specified code point have the general category "P" (punctuation).

◆ isSpace()

static bool isSpace ( char32_t c )

inline static

Is the specified code point a space character (excluding CR / LF).

◆ isUpper()

static bool isUpper ( char32_t c )

inline static

Does the specified code point have the general category "Lu" (uppercase letter).

◆ isWhitespace()

static bool isWhitespace ( char32_t c )

inline static

Is the specified code point a whitespace character.

◆ retreatUtf8()

static void retreatUtf8	(	const std::string_view &	text,
		int &	offset
	)

inline static

Retreat the supplied offset from one code point boundary to the previous one.

Parameters

text	the immutable input string view
offset	the offset into the string (must be less than or equal to the string's length)

TODO: decide on the most appropriate error handling strategy.
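Moving backwards relies on the fact that UTF-8 continuation bytes all have the bit pattern 10xxxxxx. A minimal sketch, assuming well-formed text and an offset greater than zero; the name `sketchRetreatUtf8` is hypothetical and this is not the library's implementation:

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for illustration; not the library's implementation.
// Assumes well-formed UTF-8 and 0 < offset <= text.size().
inline void sketchRetreatUtf8(std::string_view text, int &offset) {
    auto isContinuation = [&](int i) {
        const unsigned char b =
            static_cast<unsigned char>(text[static_cast<std::size_t>(i)]);
        return (b & 0xC0) == 0x80;  // 10xxxxxx
    };
    // Step back at least one byte, then keep going while still inside
    // a multi-byte sequence.
    do {
        --offset;
    } while (offset > 0 && isContinuation(offset));
}
```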

◆ retreatUtf8Safe()

static void retreatUtf8Safe	(	const std::string_view &	text,
		int &	offset
	)

inline static

Retreat the supplied offset from one code point boundary to the previous one (validating version).

If the string contains invalid UTF-8 text, then the resulting character is negative (<0).

Parameters

text	the immutable input string view
offset	the offset into the string (must be less than or equal to the string's length)

TODO: decide on the most appropriate error handling strategy.

◆ setUtf8AndAdvanceOffset()

static void setUtf8AndAdvanceOffset	(	std::string &	destination,
		int &	offset,
		char32_t	c
	)

inline static

Write a code point into the supplied UTF-8 string.

Set bytes in the destination string at the specified offset to the UTF-8 bytes resulting from the supplied code point. Advance the offset to immediately after the written bytes.

It is the responsibility of the caller to ensure that the destination string has enough bytes available for the code point at the specified offset.

Parameters

destination	the UTF-8 string into which the bytes will be written
offset	the offset into the destination string (this will be advanced)
c	the code point
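The write-in-place contract can be illustrated with the sketch below. The name `sketchSetUtf8AndAdvanceOffset` is hypothetical and the code is not the library's implementation; it assumes `c` is a valid code point and, as the description requires, that the caller has already sized the destination.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for illustration; not the library's implementation.
// Overwrites bytes in `destination` starting at `offset`; it never resizes.
inline void sketchSetUtf8AndAdvanceOffset(std::string &destination, int &offset,
                                          char32_t c) {
    auto put = [&](char32_t byte) {
        destination[static_cast<std::size_t>(offset++)] =
            static_cast<char>(static_cast<unsigned char>(byte));
    };
    if (c < 0x80) {
        put(c);                          // 0xxxxxxx
    } else if (c < 0x800) {
        put(0xC0 | (c >> 6));            // 110xxxxx
        put(0x80 | (c & 0x3F));          // 10xxxxxx
    } else if (c < 0x10000) {
        put(0xE0 | (c >> 12));           // 1110xxxx
        put(0x80 | ((c >> 6) & 0x3F));
        put(0x80 | (c & 0x3F));
    } else {
        put(0xF0 | (c >> 18));           // 11110xxx
        put(0x80 | ((c >> 12) & 0x3F));
        put(0x80 | ((c >> 6) & 0x3F));
        put(0x80 | (c & 0x3F));
    }
}
```

One way for a caller to meet the size requirement is to sum the encoded lengths of all code points first, construct the destination with that many bytes, and then write each code point in turn.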

◆ toLower()

static char32_t toLower ( char32_t c )

inline static

Convert the supplied code point to lowercase.

◆ toUpper()

static char32_t toUpper ( char32_t c )

inline static

Convert the supplied code point to uppercase.

◆ utf8ByteCount()

static size_t utf8ByteCount ( char32_t c )

inline static

Returns the number of bytes that the character occupies when UTF-8 encoded.
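The encoded length follows directly from the UTF-8 value ranges. A minimal sketch with the hypothetical name `sketchUtf8ByteCount` (not the library's implementation); how the real function treats surrogates or values above U+10FFFF is not stated here, and the sketch does not validate them:

```cpp
#include <cstddef>

// Hypothetical stand-in for illustration; not the library's implementation.
// No validation: surrogates and out-of-range values are not rejected.
constexpr std::size_t sketchUtf8ByteCount(char32_t c) {
    if (c < 0x80) {
        return 1;  // U+0000..U+007F
    }
    if (c < 0x800) {
        return 2;  // U+0080..U+07FF
    }
    if (c < 0x10000) {
        return 3;  // U+0800..U+FFFF
    }
    return 4;      // U+10000 and above
}
```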

The documentation for this struct was generated from the following file:

Character.hpp (2)
