Hierarchical properties

Overview

This chapter describes a hierarchical property file format and associated C++ parser. The format has been conceived principally for describing application, environment, and logging configuration. The format is based upon a hierarchical extension to the Java .properties file format. Composite properties are defined via "{" and "}" delimited blocks.

In addition to providing composite properties, the hierarchical property format provides an include feature. This allows a set of properties to be split across multiple files and combined via inclusion. Includes are specified via the "@" directive.

The hierarchical properties parser and scanner are written in pure C++, using the Balau parser utility classes.

The hierarchical properties parser can also parse non-hierarchical Java properties files. The exception is a non-hierarchical properties file in which one or more names or values contain unescaped special hierarchical characters. As the "{", "}" and "@" characters indicate hierarchical blocks and include directives, parsing will fail if names or values contain these characters unescaped. Property names and values that contain these characters can nevertheless be defined by escaping each special character with a "\".
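As a minimal sketch of the escaping rule above, the following helper prefixes each unescaped "{", "}" and "@" with a backslash so that an existing non-hierarchical name or value can be carried over. The function name escapeHierarchicalSpecials is hypothetical and is not part of the Balau library.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not part of Balau: prefixes every unescaped
// '{', '}' and '@' with a backslash. Existing escape pairs (a backslash
// plus the character after it) are copied through unchanged.
std::string escapeHierarchicalSpecials(std::string_view input) {
	std::string output;
	output.reserve(input.size());

	for (std::size_t i = 0; i < input.size(); ++i) {
		const char c = input[i];

		if (c == '\\' && i + 1 < input.size()) {
			// Already an escape sequence: copy both characters.
			output += c;
			output += input[++i];
		} else if (c == '{' || c == '}' || c == '@') {
			output += '\\';
			output += c;
		} else {
			output += c;
		}
	}

	return output;
}
```

For example, passing the value user@example.com to this helper yields user\@example.com, which the hierarchical parser would no longer treat as an include directive.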

No file extension has been explicitly designated for hierarchical properties files. Given that the hierarchical property file format is effectively just a text based serialisation format, files themselves do not have any intrinsic semantics. It is thus proposed that users define their own file extensions, which attach semantic value to file contents. The Balau library itself uses the .properties extension for hierarchical properties files without specific semantics, the .thconf extension for environment configuration type specification files, and the .hconf extension for environment configuration value files. The environment configuration chapter discusses this in more detail.

Quick start

Format

The following is a simple example of a hierarchical property file. In the example file, the "=" separator is used for simple (non-hierarchical) properties and the " " separator is used for composite (hierarchical) properties. As with the non-hierarchical properties format, any of the "=", ":", or " " separators can be used for both simple and composite properties.

		http.server.worker.count = 8

		file.serve {
			location      = /
			document.root = file:src/doc
			cache.ttl     = 3600
		}
	

The hierarchical property format includes the same set of features as the non-hierarchical property format, including comments, escape codes and line continuation.

		# A hierarchical property file that has comments,
		# escaped characters, and line continuation.

		\#a\ complexly\ named\ property\# = \{ a value with curly brackets \}

		prop = a value with ## hash !! and excl

		group.config {
			# Use of line continuation.
			files = file1.txt \
			      , file2.txt \
			      , file3.txt
		}
	

Included files can be specified via the "@" directive. This directive takes an absolute URI, an absolute path, or a relative path.

		# An HTTPS include directive.
		@https://borasoftware.com/doc/examples/hprops.properties

		# An absolute path include directive.
		@/etc/balau/default-sites/default.site

		# A relative path include directive.
		@extra-sites/special.site
	

When an absolute or relative path is specified as in the second and third examples above, the URI type resolved should be the same as the URI of the property file that contains the include directive (this is performed by the implementation consuming the parsed file contents). For example, if the above example property file was supplied as a file URI, the absolute and relative path include directives would resolve to file URIs.
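For the file URI case, the resolution described above can be sketched with std::filesystem (C++17). The helper name resolveInclude is hypothetical and is not part of Balau. The sketch assumes POSIX-style paths and does not cover other URI types such as HTTPS or zip archives.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper, not part of Balau: resolves the target of an include
// directive against the file that contains it, for the file URI case only.
std::filesystem::path resolveInclude(
	const std::filesystem::path & containingFile,
	const std::string & includeTarget
) {
	const std::filesystem::path target(includeTarget);

	// An absolute path include is used as is.
	if (target.is_absolute()) {
		return target.lexically_normal();
	}

	// A relative path include is resolved against the containing file's folder.
	return (containingFile.parent_path() / target).lexically_normal();
}
```

With /etc/balau/main.properties as the containing file, the relative include extra-sites/special.site resolves to /etc/balau/extra-sites/special.site, and an absolute include is returned unchanged.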

Include directives may also contain glob patterns.

		# A globbed, relative path include directive.
		@sites-enabled/*.site
	

Glob patterns are only supported by certain URI types (e.g. files and zip archives). It is the responsibility of the property file writer/consumer to ensure that globbed includes are only used for URI types that support them.
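A consumer could guard against unsupported globbed includes with a check along the following lines. This is a sketch under stated assumptions: both function names are hypothetical, the scheme names "file" and "zip" are illustrative stand-ins for glob capable URI types, and the metacharacter set "*", "?" and "[" is assumed rather than taken from Balau.

```cpp
#include <string_view>

// Assumption: '*', '?' and '[' are treated as glob metacharacters here.
// The set that a real consumer recognises may differ.
bool containsGlob(std::string_view includeTarget) {
	return includeTarget.find_first_of("*?[") != std::string_view::npos;
}

// Hypothetical consumer-side check. The scheme names "file" and "zip" are
// illustrative stand-ins for URI types that support globbing.
bool isIncludeUsable(std::string_view scheme, std::string_view includeTarget) {
	const bool globCapable = scheme == "file" || scheme == "zip";
	return globCapable || !containsGlob(includeTarget);
}
```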

Parsing

#include <Balau/Lang/Property/PropertyParsingService.hpp>

Creating a hierarchical properties parser and parsing some input text involves a single function call.

		// The input URI that represents the source properties text (normally sourced elsewhere).
		Resource::File input("somePropertyFile.properties");

		// Call the parsing service.
		Properties properties = PropertyParsingService::parse(input);
	

Printing the parsed properties AST back into text can be achieved via the PropertyAstToString visitor class. This is normally performed via the PropertyNode toString function.

		// Pretty print the hierarchical properties AST back out to a string.
		std::string propertiesText = toString(properties);
	

Visiting

#include <Balau/Lang/Property/Util/PropertyVisitor.hpp>

Once the input properties text has been parsed into an AST, it can be visited by implementing the PropertyVisitor interface.

As an example, an extract from the PropertyAstToString class provided in the Balau library is given below.

		class PropertyAstToString : public PropertyVisitor {
			public: void visit(Payload & payload, const Properties & object) override {
				for (auto & node : object.getNodes()) {
					node->visit(payload, *this);
				}
			}

			public: void visit(Payload & payload, const ValueProperty & object) override {
				auto & pl = static_cast<PropertyAstToStringPayload &>(payload);
				pl.writeIndent();
				pl.write(object.getName());
				pl.write(" = ");
				pl.write(object.getValue());
				pl.write("\n");
			}

			public: void visit(Payload & payload, const CompositeProperty & object) override {
				auto & pl = static_cast<PropertyAstToStringPayload &>(payload);
				pl.writeIndent();
				pl.write(object.getName());
				pl.write(" {\n");
				pl.incrementIndent();

				for (const auto & node : object.getNodes()) {
					node->visit(payload, *this);
				}

				pl.decrementIndent();
				pl.writeIndent();
				pl.write("}\n");
			}

			// ... more visitor methods ...
		};
	

A quick way to create a custom properties AST visitor is to copy the PropertyAstToString class and modify it to meet the requirements of the new visitor.
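To illustrate the shape of a custom visitor without depending on the library headers, the sketch below uses simplified stand-in types (Payload, Visitor, Node, Value, Composite) that mirror the extract above. These stand-ins are assumptions for illustration and are not the Balau classes. The visitor collects each value property under a path built from its enclosing composite names. Joining names with "." is a display choice made for this example only.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for illustration only. They mirror the shape of the
// visitor extract above but are NOT the Balau classes.
struct Payload { virtual ~Payload() = default; };
struct Value;
struct Composite;

struct Visitor {
	virtual ~Visitor() = default;
	virtual void visit(Payload & payload, const Value & object) = 0;
	virtual void visit(Payload & payload, const Composite & object) = 0;
};

struct Node {
	virtual ~Node() = default;
	virtual void visit(Payload & payload, Visitor & visitor) const = 0;
};

struct Value : Node {
	std::string name;
	std::string value;
	Value(std::string n, std::string v) : name(std::move(n)), value(std::move(v)) {}
	void visit(Payload & payload, Visitor & visitor) const override {
		visitor.visit(payload, *this);
	}
};

struct Composite : Node {
	std::string name;
	std::vector<std::unique_ptr<Node>> nodes;
	explicit Composite(std::string n) : name(std::move(n)) {}
	void visit(Payload & payload, Visitor & visitor) const override {
		visitor.visit(payload, *this);
	}
};

// Payload that carries the current name prefix and the collected lines.
struct FlattenPayload : Payload {
	std::string prefix;
	std::vector<std::string> lines;
};

// Visitor that records each value property under its full path.
struct FlattenVisitor : Visitor {
	void visit(Payload & payload, const Value & object) override {
		auto & pl = static_cast<FlattenPayload &>(payload);
		pl.lines.push_back(pl.prefix + object.name + " = " + object.value);
	}

	void visit(Payload & payload, const Composite & object) override {
		auto & pl = static_cast<FlattenPayload &>(payload);
		const std::string saved = pl.prefix;
		pl.prefix += object.name + ".";

		for (const auto & node : object.nodes) {
			node->visit(payload, *this);
		}

		// Restore the prefix once the composite's children are done.
		pl.prefix = saved;
	}
};
```

As in the PropertyAstToString extract, per-traversal state lives in the payload rather than in the visitor, so one visitor instance can be reused across traversals.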

Hierarchical format

The basic format of the hierarchical property format is the same as that of Java .properties files.

The following additional rules add the hierarchical extension to the Java .properties file format.

Classes

All the classes are found in the Balau::Lang::Property namespace.

Class/enum               Description
PropertyToken            Language terminals enum
PropertyNode             Abstract base class of the language non-terminal AST nodes
PropertyScanner          The property scanner implementation
PropertyParser           The property parser implementation
PropertyParsingService   Convenience class providing single function parsing
PropertyVisitor          The tightly coupled AST visitor interface
PropertyAstToString      Property AST pretty printer

Data structures

The data structures used to hold the data generated from the Balau hierarchical properties parser are as follows.

Node classes can only exist within the context of an owning Properties instance that owns the parsed string.

		///
		/// Partial base class of nodes.
		///
		struct PropertyNode {
		};

		///
		/// The outer structure. A single instance of this
		/// struct represents the entire parsed properties text.
		///
		struct Properties : public PropertyNode {
			std::string text;
			std::vector<std::unique_ptr<PropertyNode>> nodes;
		};

		///
		/// Partial implementation of a key-value node.
		///
		struct ValueProperty : public PropertyNode {
			std::string_view key;
			std::string_view value;
		};

		///
		/// Partial implementation of a hierarchical node.
		///
		struct CompositeProperty : public PropertyNode {
			std::string_view key;
			std::vector<std::unique_ptr<PropertyNode>> nodes;
		};

		///
		/// Partial implementation of an include node.
		///
		struct IncludePropertyNode : public PropertyNode {
			std::string_view text;
		};

		///
		/// Partial implementation of a comment line node.
		///
		struct CommentPropertyNode : public PropertyNode {
			std::string_view text;
		};
	

As the AST classes are views onto the original input text, the names and values of properties are string views onto the original text, including any line continuation / leading blank combinations. In addition, escaped characters are in their escaped form.

In order to obtain final name and value text, the ValueProperty AST class has getName and getValue methods, and the CompositeProperty AST class has a getName method. These methods will process name and value text into the final form.
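The kind of processing involved can be sketched as follows. This is a simplified illustration and not the Balau implementation: the function name normalise is hypothetical, every escaped character is reduced to the character itself, and Java-style escape codes such as \t or \uXXXX are not decoded.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch, not the Balau implementation: turns the raw text of a
// name or value into its final form by removing line continuations (with the
// blanks that start the continued line) and by dropping escape backslashes.
std::string normalise(std::string_view raw) {
	std::string result;
	result.reserve(raw.size());
	std::size_t i = 0;

	while (i < raw.size()) {
		const char c = raw[i];

		// Ordinary character, or a lone backslash at the very end.
		if (c != '\\' || i + 1 == raw.size()) {
			result += c;
			++i;
			continue;
		}

		const char next = raw[i + 1];

		if (next == '\r' || next == '\n') {
			// Line continuation: drop the backslash and the line break.
			i += 2;

			if (i < raw.size() && (raw[i] == '\r' || raw[i] == '\n') && raw[i] != next) {
				++i; // Second half of a \r\n or \n\r pair.
			}

			// Drop the unescaped blanks that start the continued line.
			while (i < raw.size() && (raw[i] == ' ' || raw[i] == '\t')) {
				++i;
			}
		} else {
			// Escaped character: keep the character, drop the backslash.
			result += next;
			i += 2;
		}
	}

	return result;
}
```

Applied to the raw value \{ a value with curly brackets \}, this sketch yields { a value with curly brackets }.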

Grammar

Notation

The following notation is used in the grammar.

Symbol        Meaning
=             definition of rule
()            grouping for precedence creation
*             zero or more repetitions
+             one or more repetitions
?             optional
|             choice separator
(^ .. | ..)   any except the content choice
"text"        literal string in terminal
// ..         comment

The choice separator has lowest notation precedence. All other notational entities have equal precedence.

Whitespace

A "\" character placed at the end of a line indicates line continuation. All non-escaped blanks (space/tab) occurring at the start of a line are semantically removed from property names/values that are broken up by line continuation. This is not represented in the grammar and thus occurs after parsing.

Explicit non-terminals

// Explicit non-terminals are the produced AST nodes.

S               = Properties

Properties      = Property*

Property        = Blank* (ValueProperty | ComplexProperty | Include | Comment)
                  (LineBreak | LineBreak? EndOfFile)

ValueProperty   = Key (Assignment Value?)?

ComplexProperty = Key Assignment OpenCurly LineBreak Property* CloseCurly

Include         = Arobase
                  ( OpenCurly | CloseCurly | Arobase     | Colon | Equals
                  | Blank     | Hash       | Exclamation | Text  | BackSlash )+

Comment         = (Hash | Exclamation)
                  ( OpenCurly | CloseCurly | Arobase     | Colon | Equals
                  | Blank     | Hash       | Exclamation | Text  | BackSlash )*

Implicit non-terminals

// Implicit non-terminals are assimilated into produced AST nodes.

Assignment      = ((Blank? (Equals | Colon) Blank?) | Blank)

Key             = KeyStart KeyCont

KeyStart        = Text           | EscapedOpenCurly | EscapedCloseCurly
                | EscapedArobase | EscapedColon     | EscapedEquals
                | EscapedHash    | EscapedExcl      | EscapedBackSlash
                | EscapedBlank   | EscapedChar      | (LineCont KeyCont)

KeyCont         = ( Text           | EscapedOpenCurly | EscapedCloseCurly
                  | EscapedArobase | EscapedColon     | EscapedEquals
                  | EscapedHash    | EscapedExcl      | EscapedBackSlash
                  | EscapedBlank   | EscapedChar      | Hash
                  | Exclamation    | (LineCont KeyCont)
                  )*

Value           = ValueStart ValueCont

ValueStart      = Text           | EscapedOpenCurly | EscapedCloseCurly
                | EscapedArobase | EscapedColon     | EscapedEquals
                | EscapedHash    | EscapedExcl      | EscapedBackSlash
                | EscapedBlank   | EscapedChar      | Hash
                | Exclamation    | CloseCurly       | Colon
                | Equals         | Blank            | (LineCont ValueCont)

ValueCont       = ( Text           | EscapedOpenCurly | EscapedCloseCurly
                  | EscapedArobase | EscapedColon     | EscapedEquals
                  | EscapedHash    | EscapedExcl      | EscapedBackSlash
                  | EscapedBlank   | EscapedChar      | Hash
                  | Exclamation    | CloseCurly       | Colon
                  | Equals         | Blank            | OpenCurly
                  | (LineCont ValueCont)
                  )*

LineCont        = EscapedLineBreak Blank*

Terminals

// The terminal strings in the definitions use \t, \r, \n, and \\
// placeholders in regular expressions to denote
// tab, carriage return, line feed, and BackSlash characters.

OpenCurly         = {
CloseCurly        = }
Arobase           = @
Colon             = :
Equals            = =
Blank             = space | \t
LineBreak         = \r\n|\n\r|\n|\r
Hash              = #
Exclamation       = !
EndOfFile         = no further input available
Text              = [^{}@:= \t\r\n#!\\]+
BackSlash         = \

EscapedOpenCurly  = \{
EscapedCloseCurly = \}
EscapedArobase    = \@
EscapedColon      = \:
EscapedEquals     = \=
EscapedHash       = \#
EscapedExcl       = \!
EscapedBackSlash  = \\
EscapedBlank      = \  | \\\t
EscapedChar       = \\[^{}:=#!\\ \t\r\n]
EscapedLineBreak  = \\(\r\n|\n\r|\r|\n)
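As a rough illustration of how the terminals above partition the input, the following sketch splits text into tokens. It is not the Balau PropertyScanner. The Token enum and the scan function are hypothetical, and a single Escaped token stands in for the individual Escaped* terminals of the grammar.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Simplified token set: a single Escaped token stands in for the
// individual Escaped* terminals of the grammar.
enum class Token {
	OpenCurly, CloseCurly, Arobase, Colon, Equals, Blank, LineBreak,
	Hash, Exclamation, Text, BackSlash, Escaped, EscapedLineBreak, EndOfFile
};

// Hypothetical scanner sketch, not the Balau PropertyScanner.
std::vector<Token> scan(std::string_view input) {
	constexpr std::string_view specials = "{}@:= \t\r\n#!\\";
	std::vector<Token> tokens;
	std::size_t i = 0;

	const auto isLineBreak = [](char c) { return c == '\r' || c == '\n'; };

	// Consumes one line break at position i; \r\n and \n\r count as one.
	const auto consumeLineBreak = [&]() {
		const char first = input[i++];
		if (i < input.size() && isLineBreak(input[i]) && input[i] != first) {
			++i;
		}
	};

	while (i < input.size()) {
		const char c = input[i];

		if (c == '\\') {
			if (i + 1 == input.size()) {
				tokens.push_back(Token::BackSlash);
				++i;
			} else if (isLineBreak(input[i + 1])) {
				++i;
				consumeLineBreak();
				tokens.push_back(Token::EscapedLineBreak);
			} else {
				tokens.push_back(Token::Escaped);
				i += 2;
			}
		} else if (isLineBreak(c)) {
			consumeLineBreak();
			tokens.push_back(Token::LineBreak);
		} else if (specials.find(c) != std::string_view::npos) {
			switch (c) {
				case '{': tokens.push_back(Token::OpenCurly); break;
				case '}': tokens.push_back(Token::CloseCurly); break;
				case '@': tokens.push_back(Token::Arobase); break;
				case ':': tokens.push_back(Token::Colon); break;
				case '=': tokens.push_back(Token::Equals); break;
				case '#': tokens.push_back(Token::Hash); break;
				case '!': tokens.push_back(Token::Exclamation); break;
				default:  tokens.push_back(Token::Blank); break; // space or tab
			}
			++i;
		} else {
			// Text = [^{}@:= \t\r\n#!\\]+
			const std::size_t end = input.find_first_of(specials, i);
			i = (end == std::string_view::npos) ? input.size() : end;
			tokens.push_back(Token::Text);
		}
	}

	tokens.push_back(Token::EndOfFile);
	return tokens;
}
```

Scanning the line "a = {" followed by a line feed with this sketch produces Text, Blank, Equals, Blank, OpenCurly, LineBreak and EndOfFile, which matches the opening of a ComplexProperty in the grammar.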