A standalone C++17 parser for a NEML2-flavored dialect of HIT (Hierarchical Input Text), the hierarchical input format used by MOOSE. This library provides a self-contained, opinionated implementation tailored for use in NEML2 and related projects. It differs from the upstream MOOSE HIT parser in syntax restrictions and API design choices; it is not a general-purpose drop-in replacement. The lexer and parser are generated with Flex and Bison; pre-generated sources are committed, so only Debug builds need those tools installed.
The public C++ namespace is nmhit ("NEML2 HIT").
HIT is a simple, human-readable format for hierarchical configuration. A file is a flat sequence of items — sections, key-value fields, comments, blank lines, and file includes — which together form a tree.
A # character begins a comment that extends to the end of the line. Comments are preserved in
the AST and are reproduced by render().
# This is a comment
key = value # a comment may also follow a value on the same line
Note: # is a reserved character in all value positions. It cannot appear inside an unquoted string value or an array element.
One or more consecutive blank lines are preserved as a single Blank node and are reproduced by
render().
A section groups related fields and nested sub-sections.
[section_name]
key = value
nested_key = 42
[]
Every section must be closed with []. There is no [../] or [./name] syntax.
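The closing rule amounts to a depth counter over header and closer lines. The sketch below is illustrative only and is not nmhit's parser: check_sections is a hypothetical helper that assumes each header or closer sits on its own line and ignores comments and multi-line quoted values.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of nmhit): returns true when every "[name]"
// header has a matching "[]" closer. Assumes each header or closer sits on its
// own line; comments and multi-line quoted values are not handled.
bool check_sections(const std::string & text)
{
  std::istringstream in(text);
  std::string line;
  int depth = 0;
  while (std::getline(in, line))
  {
    const auto first = line.find_first_not_of(" \t\r");
    if (first == std::string::npos)
      continue; // blank line
    const auto last = line.find_last_not_of(" \t\r");
    const std::string t = line.substr(first, last - first + 1);

    if (t == "[../]" || t.rfind("[./", 0) == 0)
      return false; // legacy MOOSE syntax is not part of this dialect
    if (t == "[]")
    {
      if (--depth < 0)
        return false; // closer without an open section
    }
    else if (t.size() > 2 && t.front() == '[' && t.back() == ']')
      ++depth; // "[name]" or "[a/b]" opens one section block
  }
  return depth == 0;
}
```

In the real grammar this is enforced structurally (section = '[' path ']' item* '[]'), so the counter is only a mental model.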
Path splitting. A slash in the section header creates the corresponding nesting in the AST:
[mesh/generator]
type = CartesianMesh
[]
is equivalent to:
[mesh]
[generator]
type = CartesianMesh
[]
[]
Sections may appear at the top level or nested inside other sections. Fields and nested sections can appear in any order within a section body.
A field assigns a value to a name.
key = value
Identifier characters. A field name may contain letters, digits, and any of
. / < > + - * ! _ ~. Slashes in a field name trigger path splitting (see below).
The ~ is allowed primarily for NEML2's var~1 history-variable convention.
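A character-level check of the identifier rule, as a minimal sketch; is_ident is a hypothetical name, and the accepted set is copied from the list above.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical check: a field name is one or more letters, digits, or any of
// . / < > + - * ! _ ~
bool is_ident(std::string_view s)
{
  if (s.empty())
    return false;
  constexpr std::string_view extra = "./<>+-*!_~";
  for (char c : s)
  {
    const bool alnum = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
    if (!alnum && extra.find(c) == std::string_view::npos)
      return false;
  }
  return true;
}
```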
Path splitting. A slash in the field name creates intermediate Section nodes in the AST:
[solver]
linear/max_iter = 100
[]
is equivalent to:
[solver]
[linear]
max_iter = 100
[]
[]
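Both forms of path splitting reduce to splitting on '/'. The sketch below rewrites a slashed field name into the equivalent nested text. split_path and nest_field are hypothetical helpers, not nmhit API; the two-space indentation and the skipping of empty segments (as in a//b) are choices of the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split "a/b/c" into {"a", "b", "c"}; empty segments are skipped.
std::vector<std::string> split_path(const std::string & path)
{
  std::vector<std::string> parts;
  std::string::size_type start = 0;
  while (true)
  {
    const auto slash = path.find('/', start);
    const auto end = (slash == std::string::npos) ? path.size() : slash;
    if (end > start)
      parts.push_back(path.substr(start, end - start));
    if (slash == std::string::npos)
      return parts;
    start = slash + 1;
  }
}

// Rewrite "a/b/key = value" as nested sections: every segment except the
// last becomes a [section], and the last becomes the field name.
std::string nest_field(const std::string & name, const std::string & value)
{
  const auto parts = split_path(name);
  assert(!parts.empty() && "field name must contain at least one segment");
  std::string out, indent;
  for (std::size_t i = 0; i + 1 < parts.size(); ++i)
  {
    out += indent + "[" + parts[i] + "]\n";
    indent += "  ";
  }
  out += indent + parts.back() + " = " + value + "\n";
  for (std::size_t i = 0; i + 1 < parts.size(); ++i)
  {
    indent.resize(indent.size() - 2);
    out += indent + "[]\n";
  }
  return out;
}
```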
Override assignment. The operators := and :override= are both accepted. The library
implements last-override-wins semantics directly: the earlier occurrence of the field is
removed from the tree, leaving only the overriding value.
max_iter := 200
max_iter :override= 200 # identical meaning
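Last-override-wins can be modeled on a flat, ordered list of one section's fields. Assignment and apply_assignments are hypothetical names. How nmhit treats a plain = that repeats an existing key is not described here, so the sketch's choice to reject it is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One parsed "key = value" line; is_override is true for ":=" and ":override=".
struct Assignment
{
  std::string key;
  bool is_override;
  std::string value;
};

using Fields = std::vector<std::pair<std::string, std::string>>;

// Apply assignments in document order. An override erases the earlier field
// with the same key and appends the new one, so only the last value survives.
// Assumption of this sketch: a repeated key with plain "=" is an error.
Fields apply_assignments(const std::vector<Assignment> & assignments)
{
  Fields fields;
  for (const auto & a : assignments)
  {
    const auto it = std::find_if(fields.begin(), fields.end(),
                                 [&](const auto & f) { return f.first == a.key; });
    if (it != fields.end())
    {
      if (!a.is_override)
        throw std::runtime_error("duplicate field without override: " + a.key);
      fields.erase(it);
    }
    fields.emplace_back(a.key, a.value);
  }
  return fields;
}
```

An override of a key that does not exist yet simply adds it, which is what a command-line override for an absent field would need.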
Every value is one of the following kinds.
Integer. An optional sign followed by one or more decimal digits.
n = 42
n = -7
n = +0
Float. Standard decimal notation with an optional sign and an optional exponent.
x = 3.14
x = -1.0e-3
x = .5
x = 2.
x = 1e10
A float must contain a decimal point with at least one digit on at least one side of it, an exponent, or both. A token with neither is an integer.
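The integer and float rules translate directly into regular expressions. A sketch using std::regex with the patterns from the grammar below; classify and Scalar are hypothetical names. Integer is tested first, so 42 is never classified as a float.

```cpp
#include <cassert>
#include <regex>
#include <string>

enum class Scalar { Integer, Float, Other };

// Classify a token using the integer and float rules of the grammar.
Scalar classify(const std::string & tok)
{
  static const std::regex integer(R"([+-]?[0-9]+)");
  static const std::regex floating(
      R"([+-]?(([0-9]*\.[0-9]+|[0-9]+\.[0-9]*)([eE][+-]?[0-9]+)?|[0-9]+[eE][+-]?[0-9]+))");
  if (std::regex_match(tok, integer))
    return Scalar::Integer;
  if (std::regex_match(tok, floating))
    return Scalar::Float;
  return Scalar::Other;
}
```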
The value is stored verbatim as a string. At interpretation time:
param<double>() parses it as a 64-bit IEEE 754 double-precision value. param<float>() parses it as double first, then narrows to 32-bit single precision. Values outside the float range become ±inf; values that are representable in double but not exactly in float are rounded to the nearest float.
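A minimal sketch of the two interpretations, assuming std::stod as the double parser (nmhit's actual conversion routine is not shown here; to_double and to_float are hypothetical names). The explicit range check matters: in C++, static_cast<float> of a finite double outside the float range is undefined behavior, so a ±inf result has to be produced deliberately.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>

// Parse the whole token as a 64-bit IEEE 754 double.
double to_double(const std::string & s)
{
  std::size_t used = 0;
  const double d = std::stod(s, &used); // throws on no conversion or overflow
  if (used != s.size())
    throw std::invalid_argument("trailing characters in '" + s + "'");
  return d;
}

// Parse as double, then narrow to 32-bit float.
float to_float(const std::string & s)
{
  const double d = to_double(s);
  // A plain cast of an out-of-range finite double is undefined behavior,
  // so map those values to +/-inf explicitly.
  if (std::isfinite(d) && std::fabs(d) > static_cast<double>(FLT_MAX))
    return d > 0 ? std::numeric_limits<float>::infinity()
                 : -std::numeric_limits<float>::infinity();
  // In range: the cast rounds to an adjacent float (nearest under IEEE 754).
  return static_cast<float>(d);
}
```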
Boolean. Exactly the two lowercase literals true and false. No other strings
(including yes, no, on, off, or any capitalised variant) are accepted.
flag = true
flag = false
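Strict boolean parsing is a two-way comparison with everything else rejected. A sketch only; parse_bool_strict is a hypothetical name, distinct from nmhit::parse_bool (which also strips surrounding quotes).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Accept exactly "true" or "false"; yes/no/on/off/True/1 and so on are errors.
bool parse_bool_strict(const std::string & s)
{
  if (s == "true")
    return true;
  if (s == "false")
    return false;
  throw std::invalid_argument("not a boolean: '" + s + "'");
}
```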
Unquoted string. Any sequence of non-whitespace characters that is not a number or boolean, does not begin a quoted string,
array, or brace expression, and contains none of [ # $ ' " \.
type = GeneratedMesh
label = some_label
path = /usr/local/share
Unquoted strings are single-line only — they cannot contain whitespace or newlines.
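The unquoted_str character class from the grammar, as a standalone check (is_unquoted_str is a hypothetical name). It only tests characters; deciding that 42 or true is a number or boolean rather than a string is a separate step.

```cpp
#include <cassert>
#include <string_view>

// One or more characters, none of which is whitespace or one of [ # $ ' " \.
bool is_unquoted_str(std::string_view s)
{
  constexpr std::string_view forbidden = " \t\n\r[#$'\"\\";
  return !s.empty() && s.find_first_of(forbidden) == std::string_view::npos;
}
```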
A triple-quoted string stores its content verbatim — all whitespace, newlines, and any mix of quote characters are preserved exactly as written. Two delimiter styles are supported:
# Triple single-quote delimiter
code = '''
import torch
result = torch.tensor([1.0, 2.0, 3.0])
'''
# Triple double-quote delimiter
label = """it's a "verbatim" value"""
The content between the opening and closing ''' (or """) delimiters is returned by
param_str() with whitespace and quote characters preserved exactly — no whitespace stripping,
no quote unescaping. ${...} brace expressions ARE expanded, the same way they are for
single-quoted strings, so triple-quoted bodies can interpolate values from elsewhere in the document:
n = 5
[block]
code = '''
for i in range(${n}):
    print(i)
'''
[]
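Extracting a verbatim body amounts to removing the three-character delimiters and nothing else. A minimal sketch (verbatim_body is hypothetical); brace expansion, which still applies to these bodies, would be a separate pass over the returned text.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Return the text between matching ''' or """ delimiters exactly as written
// (no trimming, no unescaping), or std::nullopt if raw is not triple-quoted.
std::optional<std::string> verbatim_body(const std::string & raw)
{
  for (const char * delim : {"'''", "\"\"\""})
  {
    if (raw.size() >= 6 && raw.compare(0, 3, delim) == 0 &&
        raw.compare(raw.size() - 3, 3, delim) == 0)
      return raw.substr(3, raw.size() - 6);
  }
  return std::nullopt;
}
```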
Verbatim fields are string-only. Calling param_int(), param_float(), param_bool(),
param_list_*(), or any other non-string accessor on a verbatim field raises nmhit::Error.
Only param_str() (and its param_optional_str variant) is allowed.
Tip: Triple-quoted strings solve the classic HIT quoting problem: a '...' string cannot contain ', and a "..." string cannot contain ". Triple-quoted strings can contain any combination of single and double quotes as long as they do not form the closing triple delimiter.
1-D array. A whitespace-delimited sequence of elements enclosed in single or double quotes; the two
delimiters are completely equivalent. Elements may be integers, floating-point numbers, or unquoted tokens
(none of which may contain ;, #, $, ', ", or \).
vals = '1 2 3'
floats = '1.0 2.5 3.14'
tags = 'alpha beta gamma'
The two quote styles are interchangeable:
vals = '1 2 3'
vals = "1 2 3" # identical meaning
An empty array is written as '' or "".
Array contents may span multiple lines — newlines inside the quotes are treated as whitespace:
vals = '
1 2 3
4 5 6
'
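Splitting a 1-D array takes two steps: strip the surrounding quotes, then split on any whitespace, which also covers the multi-line form. A sketch with hypothetical helper names; converting each element (to int, double, and so on) is left to the caller.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Strip one pair of matching surrounding quotes, if present.
std::string strip_quotes(const std::string & s)
{
  if (s.size() >= 2 && (s.front() == '\'' || s.front() == '"') && s.back() == s.front())
    return s.substr(1, s.size() - 2);
  return s;
}

// Split a quoted 1-D array into elements; spaces, tabs and newlines all separate.
std::vector<std::string> split_array(const std::string & raw)
{
  std::istringstream in(strip_quotes(raw));
  std::vector<std::string> elems;
  for (std::string tok; in >> tok;)
    elems.push_back(tok);
  return elems;
}
```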
2-D array. Rows are separated by ;. Each row is a whitespace-delimited sequence of elements, following the
same rules as 1-D array elements.
matrix = '1 2 3; 4 5 6; 7 8 9'
The semicolons and surrounding whitespace (including newlines) are flexible:
matrix = '
1 2 3;
4 5 6;
7 8 9
'
Every row must contain at least one element. Trailing semicolons (an empty last row) are a parse error.
Accessing a 2-D array value as a 1-D type (e.g. param<std::vector<int>>) will fail because the
semicolons are stored as part of the raw value. Accessing a 1-D array as a 2-D type returns a
single-row result.
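The row rules can be sketched as follows, assuming the surrounding quotes were already removed (split_rows is hypothetical). An empty row is rejected, so a trailing ; fails, and a 1-D body comes back as a single row. Treating an all-whitespace body as an empty array is an assumption of the sketch.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Row = std::vector<std::string>;

// Split one row on whitespace (spaces, tabs and newlines).
Row split_ws(const std::string & text)
{
  std::istringstream in(text);
  Row row;
  for (std::string tok; in >> tok;)
    row.push_back(tok);
  return row;
}

// Split a 2-D array body (quotes already removed) into rows separated by ';'.
std::vector<Row> split_rows(const std::string & body)
{
  std::vector<Row> rows;
  if (body.find_first_not_of(" \t\r\n") == std::string::npos)
    return rows; // all whitespace: empty array (sketch assumption)
  std::string::size_type start = 0;
  while (true)
  {
    const auto semi = body.find(';', start);
    const auto len = (semi == std::string::npos) ? std::string::npos : semi - start;
    Row row = split_ws(body.substr(start, len));
    if (row.empty())
      throw std::invalid_argument("2-D array row has no elements");
    rows.push_back(std::move(row));
    if (semi == std::string::npos)
      return rows;
    start = semi + 1;
  }
}
```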
Brace expressions. A ${...} expression is expanded at value-extraction time (i.e. when param<T>() is called).
The raw token is stored in the AST as-is.
The following built-in commands are supported:
| Expression | Effect |
|---|---|
| `${varname}` | Look up the field at path varname from the document root and return its string value. |
| `${replace varname}` | Identical to `${varname}`. |
| `${env VARNAME}` | Substitute the environment variable VARNAME. Returns an empty string when unset. |
| `${raw a b c}` | Concatenate all arguments literally: abc. |
Brace expressions may be nested:
prefix = /opt
lib = ${raw ${prefix} /lib} # → /opt/lib
A brace expression may appear as the sole value of a field:
dim = ${mesh/dim}
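A compact model of the expansion rules above: find each ${, track brace depth to the matching }, expand the inner text first, then dispatch on the first word. The std::map stands in for the document tree, and expand and evaluate are hypothetical names, not nmhit's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

using Vars = std::map<std::string, std::string>; // field path -> raw value

// Evaluate the content of one ${...} after its nested expressions are expanded.
std::string evaluate(const std::string & content, const Vars & vars)
{
  std::istringstream in(content);
  std::vector<std::string> words;
  for (std::string w; in >> w;)
    words.push_back(w);
  if (words.empty())
    throw std::invalid_argument("empty brace expression");

  if (words[0] == "raw") // ${raw a b c} -> "abc"
  {
    std::string out;
    for (std::size_t i = 1; i < words.size(); ++i)
      out += words[i];
    return out;
  }
  if (words[0] == "env" && words.size() == 2) // unset variable -> ""
  {
    const char * v = std::getenv(words[1].c_str());
    return v ? std::string(v) : std::string();
  }

  std::string key; // ${replace path} and ${path} are the same lookup
  if (words[0] == "replace" && words.size() == 2)
    key = words[1];
  else if (words.size() == 1)
    key = words[0];
  else
    throw std::invalid_argument("unsupported brace expression: " + content);

  const auto it = vars.find(key);
  if (it == vars.end())
    throw std::invalid_argument("unknown path: " + key);
  return it->second;
}

// Expand every ${...} in s; brace depth decides where each expression ends.
std::string expand(const std::string & s, const Vars & vars)
{
  std::string out;
  std::size_t i = 0;
  while (i < s.size())
  {
    if (s.compare(i, 2, "${") != 0)
    {
      out += s[i++];
      continue;
    }
    std::size_t j = i + 2;
    int depth = 1;
    for (; j < s.size() && depth > 0; ++j)
    {
      if (s[j] == '{')
        ++depth;
      else if (s[j] == '}')
        --depth;
    }
    if (depth != 0)
      throw std::invalid_argument("unterminated brace expression");
    // Content lies between "${" (at i) and the matching '}' (at j - 1).
    const std::string inner = s.substr(i + 2, j - i - 3);
    out += evaluate(expand(inner, vars), vars);
    i = j;
  }
  return out;
}
```

Expanding the inner text before evaluating the outer expression is what turns ${raw ${prefix} /lib} into /opt/lib.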
File includes. Another HIT file can be spliced into the document with the !include directive:
!include relative/or/absolute/path.i
The referenced file is parsed recursively and its top-level items are spliced into the AST at the
point of the !include directive. Relative paths are resolved against the directory of the
including file.
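Path resolution for !include can be expressed with std::filesystem. This sketch (resolve_include is hypothetical) follows the rules stated in this README: relative paths are joined to the including file's directory, and text parsed with parse_text has no including file, so the current working directory is the base. The lexically_normal() call is a choice of the sketch, and the test paths assume POSIX.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Absolute targets are used as-is; relative targets are resolved against the
// including file's directory, or the current directory when there is none.
fs::path resolve_include(const fs::path & including_file, const fs::path & target)
{
  if (target.is_absolute())
    return target;
  const fs::path base = including_file.empty() ? fs::current_path() : including_file.parent_path();
  return (base / target).lexically_normal();
}
```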
The complete syntax, as an informal EBNF grammar:
file = item* ;
item = section | field | comment | blank | include ;
section = '[' path ']' item* '[]' ;
field = ident ('=' | ':=' | ':override=') value ;
quote = "'" | '"' ;
value = integer | float | bool | unquoted_str
| brace_expr
| quote array_row (';' array_row)* quote
| quote quote
| "'''" <verbatim content> "'''"
| '"""' <verbatim content> '"""' ;
array_row = array_elem+ ;
array_elem = integer | float | unquoted_elem ;
include = '!include' path ;
comment = '#' <to end of line> ;
blank = <two or more consecutive newlines> ;
path = segment ('/' segment)* ;
segment = <one or more non-whitespace, non-bracket characters> ;
ident = [A-Za-z0-9_./<>+\-*!~]+ ;
integer = [+\-]? [0-9]+ ;
float = [+\-]? ( [0-9]* '.' [0-9]+ | [0-9]+ '.' [0-9]* ) ([eE] [+\-]? [0-9]+)?
| [+\-]? [0-9]+ [eE] [+\-]? [0-9]+ ;
(* stored verbatim; interpreted as double-precision (64-bit IEEE 754) by default,
narrowed to single-precision (32-bit) when read as float *)
bool = 'true' | 'false' ;
unquoted_str= [^ \t\n\r\[#$'"\\]+ ;
unquoted_elem=[^ \t\n\r;#$'"\\]+ ;
brace_expr = '${' <content, brace-depth-tracked> '}' ;
Two entry points, parse_file and parse_text, are provided so that a file path and a string of HIT text cannot be confused when either is passed as a string literal:
#include "nmhit/nmhit.h"
// Read and parse a file from disk.
// Throws nmhit::Error if the file cannot be opened or on syntax errors.
std::unique_ptr<nmhit::Node> root = nmhit::parse_file("my_file.i");
// Parse an in-memory string.
// !include paths are resolved relative to the current working directory.
std::unique_ptr<nmhit::Node> root = nmhit::parse_text("dim = 3\n");
Both functions accept optional pre/post string vectors for injecting HIT snippets
(e.g. command-line overrides). All content is concatenated and parsed as a single
document, so := override semantics apply globally across all sources:
std::vector<std::string> cli_overrides = { "solver/max_iter := 200" };
auto root = nmhit::parse_file("input.i", /*pre=*/{}, cli_overrides);
auto root = nmhit::parse_text(input_text, /*pre=*/{}, cli_overrides);
// Resolve a slash-separated path and return a typed value.
// Throws nmhit::Error if the path does not exist or the value cannot be converted.
int n = root->param<int>("mesh/dim");
double x = root->param<double>("solver/tol");
bool on = root->param<bool>("output/enabled");
// Return a default when the path is absent (does not throw).
int n = root->param_optional<int>("mesh/dim", 3);
Built-in scalar types: bool, int, unsigned int, int64_t, float, double,
std::string.
1-D arrays: std::vector<T> for any built-in or registered scalar T.
2-D arrays: std::vector<std::vector<T>> for any built-in or registered scalar T.
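One way a single param<T>() entry point can cover all three arities is a partial-specialization trait on std::vector. The arity and scalar_of traits and shape_name below are hypothetical, a sketch of the technique rather than nmhit's internals.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// arity<T>: 0 for a scalar, 1 for std::vector<T>, 2 for std::vector<std::vector<T>>.
template <typename T>
struct arity : std::integral_constant<int, 0>
{
};

template <typename T>
struct arity<std::vector<T>> : std::integral_constant<int, 1>
{
};

// More specialized than the 1-D case, so it wins for nested vectors.
template <typename T>
struct arity<std::vector<std::vector<T>>> : std::integral_constant<int, 2>
{
};

// scalar_of<T>::type strips the vector layers, e.g. vector<vector<int>> -> int.
template <typename T>
struct scalar_of
{
  using type = T;
};

template <typename T>
struct scalar_of<std::vector<T>>
{
  using type = typename scalar_of<T>::type;
};

// Compile-time dispatch on shape, as a param<T>() front end might do before
// calling the scalar parser for scalar_of<T>::type.
template <typename T>
std::string shape_name()
{
  if constexpr (arity<T>::value == 2)
    return "2-D";
  else if constexpr (arity<T>::value == 1)
    return "1-D";
  else
    return "scalar";
}
```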
// Walk direct children, optionally filtered by node type.
for (nmhit::Node * child : root->children()) { ... }
for (nmhit::Node * child : root->children(nmhit::NodeType::Field)) { ... }
// Find a node by relative path (returns nullptr when absent).
nmhit::Node * n = root->find("mesh/dim");
// Walk upward.
nmhit::Node * parent = n->parent();
nmhit::Node * docroot = n->root();
// Full slash-joined path from the root.
std::string fp = n->fullpath(); // e.g. "mesh/dim"
// Source location.
int line = n->line();
int col = n->column();
std::string file = n->filename();
auto * f = dynamic_cast<nmhit::Field *>(root->find("mesh/dim"));
if (f) {
std::string raw = f->raw_val(); // stored string, e.g. "3" or "'1 2 3'"
f->set_val("4"); // replace the stored value
}
// Render the tree back to HIT text (preserves comments and blank lines).
std::string text = root->render();
// Custom indentation.
std::string text = root->render(0, "    "); // 4-space indent
The same conversions used internally by param<T>() are available as free functions for use on raw strings (e.g. from Field::raw_val()). Surrounding single or double quotes are stripped before conversion. All functions throw nmhit::Error on failure; the optional ctx node is used only to attach file/line/column information to the error.
bool nmhit::parse_bool (const std::string & s, const nmhit::Node * ctx = nullptr);
int64_t nmhit::parse_int (const std::string & s, const nmhit::Node * ctx = nullptr);
double nmhit::parse_double(const std::string & s, const nmhit::Node * ctx = nullptr);
float nmhit::parse_float (const std::string & s, const nmhit::Node * ctx = nullptr);
Custom types. Register a scalar parser once before any param<T>() call:
// Registration (e.g. in main() or a static initializer)
nmhit::TypeRegistry::register_parser<MyEnum>(
[](const std::string & s) -> MyEnum {
if (s == "linear") return MyEnum::Linear;
if (s == "quadratic") return MyEnum::Quadratic;
throw std::invalid_argument("unknown MyEnum value: " + s);
}
);
// Usage — all three arities work automatically once T is registered.
MyEnum e = root->param<MyEnum>("order");
std::vector<MyEnum> v = root->param<std::vector<MyEnum>>("orders");
std::vector<std::vector<MyEnum>> m = root->param<std::vector<std::vector<MyEnum>>>("order_matrix");
The parser receives the unquoted, brace-expanded token string. Calling param<T>() for an
unregistered type throws nmhit::Error.
Thread safety: register_parser is not thread-safe relative to concurrent param calls. Register all custom types before spawning threads that call param.
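The registration rule follows from how such a registry is typically built: a static map from std::type_index to a type-erased parser. In the hypothetical Registry below, nothing locks the map, so writes during registration would race with concurrent reads. This is a sketch only; nmhit::TypeRegistry's internals are not documented here.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

class Registry
{
public:
  // Store a parser for T, type-erased in std::any.
  template <typename T>
  static void register_parser(std::function<T(const std::string &)> fn)
  {
    parsers()[std::type_index(typeid(T))] = std::move(fn);
  }

  // Look up and run the parser for T; unregistered types are an error.
  template <typename T>
  static T parse(const std::string & token)
  {
    const auto it = parsers().find(std::type_index(typeid(T)));
    if (it == parsers().end())
      throw std::runtime_error(std::string("no parser registered for ") + typeid(T).name());
    const auto & fn = std::any_cast<const std::function<T(const std::string &)> &>(it->second);
    return fn(token);
  }

private:
  // Function-local static: initialized once, but never locked afterwards.
  static std::map<std::type_index, std::any> & parsers()
  {
    static std::map<std::type_index, std::any> table;
    return table;
  }
};
```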
All errors throw nmhit::Error, which is a std::exception carrying a vector of
nmhit::ErrorMessage (filename, line, column, message).
try {
auto root = nmhit::parse_file("input.i");
} catch (const nmhit::Error & e) {
for (auto & msg : e.messages)
std::cerr << msg.str() << '\n'; // "file.i:10:5: unexpected '}'"
}
The Python bindings are installed with pip:
pip install nmhit
Wheels are published to PyPI for Linux (x86_64, aarch64) and macOS (x86_64, arm64), covering Python 3.9 and later. No Flex or Bison is required.
import nmhit
# Parse a file or an in-memory string
root = nmhit.parse_file("input.i")
root = nmhit.parse_text("[mesh]\n dim = 3\n[]")
# Read typed values via slash-separated paths
dim = root.param_int("mesh/dim") # int
tol = root.param_float("solver/tol") # float
on = root.param_bool("output/enabled") # bool
tag = root.param_str("type") # str
# Optional — returns a default when the path is absent
n = root.param_optional_int("mesh/dim", 3)
# 1-D and 2-D arrays
vals = root.param_list_int("vals") # list[int]
matrix = root.param_list_list_float("matrix") # list[list[float]]
parse_text and parse_file accept optional pre and post keyword arguments
(lists of HIT strings) for injecting snippets or command-line overrides:
root = nmhit.parse_file("input.i", post=["solver/max_iter := 200"])
nmhit.param() infers the type from the raw value (bool → int → float → str)
and returns a native Python object. Pass an explicit type as the third argument
to override inference:
nmhit.param(root, "mesh/dim") # → 3 (int)
nmhit.param(root, "mesh/dim", float) # → 3.0
nmhit.param(root, "mesh/dim", str) # → "3"
root = nmhit.parse_text("[mesh]\n dim = 3\n[]")
node = root.find("mesh/dim") # returns Field, or None if absent
sec = root.find("mesh") # returns Section
node.type() # nmhit.NodeType.Field / .Section / .Root / ...
node.path() # "dim"
node.fullpath() # "mesh/dim"
node.line() # source line number
# Direct children, optionally filtered by type
root.children() # list[Node]
root.children(nmhit.NodeType.Section) # list[Section]
# Walk upward
node.parent() # parent Node, or None at root
node.root_node() # the Root node
# Change a field value in-place
root.find("mesh/dim").set_val("2")
# Add / insert / remove children (cloned into the tree)
root.add_child(nmhit.Field("k", "42"))
root.insert_child(0, nmhit.Field("first", "1"))
removed = root.remove_child("mesh") # returns the detached node
# Deep copy
root2 = root.clone()
text = root.render() # default 2-space indent
text = root.render(indent_text="    ") # 4-space indent
All errors raise nmhit.Error (a subclass of RuntimeError).
The exception carries a .messages attribute — a list of ErrorMessage objects
with line, column, message, and filename fields:
try:
nmhit.parse_text("[mesh]\n dim = 3") # missing []
except nmhit.Error as e:
for m in e.messages:
print(m) # e.g. "<string>:2:9: expected '[]'"
| Tool | Minimum version | When required |
|---|---|---|
| CMake | 3.20 | Always |
| C++ compiler | C++17 | Always |
| Flex | 2.6 | Debug builds only |
| Bison | 3.7 | Debug builds only |
Pre-generated parser/lexer sources are committed to generated/ and used
automatically for non-Debug build types, so end users and CI release builds do
not need flex or bison installed.
Release build (no flex or bison required — uses committed generated sources):
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)
Debug build (requires flex ≥ 2.6 and bison ≥ 3.7; regenerates the parser/lexer from source):
cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build -j$(nproc)
After modifying src/Lexer.l or src/Parser.y, run the helper target to refresh
the committed sources in generated/ and then commit them:
cmake --build build --target update_generated
git add generated/ && git commit
Tests are built by default; pass -DNMHIT_BUILD_TESTS=OFF to skip building the test executable. Run them with:
ctest --test-dir build --output-on-failure
To install the library:
cmake -S . -B build -DCMAKE_INSTALL_PREFIX=/your/prefix
cmake --build build -j$(nproc)
cmake --install build
This installs:
| Path | Contents |
|---|---|
| `<prefix>/lib/libnmhit.a` | Static library |
| `<prefix>/include/nmhit/` | Public headers |
| `<prefix>/lib/cmake/nmhit/` | CMake config files |
| `<prefix>/lib/pkgconfig/nmhit.pc` | pkg-config file |
CMake find_package:
find_package(nmhit REQUIRED)
target_link_libraries(myapp PRIVATE nmhit::nmhit)
If the library was installed to a non-standard prefix, point CMake at it:
cmake -S . -B build -Dnmhit_DIR=/your/prefix/lib/cmake/nmhit
pkg-config:
pkg-config --cflags --libs nmhit
If the library was installed to a non-standard prefix:
PKG_CONFIG_PATH=/your/prefix/lib/pkgconfig pkg-config --cflags --libs nmhit
Add the repository as a subdirectory of your project:
add_subdirectory(neml2-hit)
target_link_libraries(myapp PRIVATE nmhit)
The nmhit target exports include/ as a public include directory, so
#include "nmhit/nmhit.h" works without any additional configuration.
This project is a sub-component of NEML2 and is distributed under the same license.