This document provides a detailed look at Hydra's implementation, from type modules to coders to primitives to DSLs. It complements the Concepts documentation by focusing on the concrete architecture and code organization rather than abstract foundations.
Before reading this guide, you should:
- Understand Hydra's core concepts (Concepts)
- Be familiar with at least one of: Haskell, Java, or Python
- Have Hydra cloned and built locally (see main README)
This guide is for:
- Contributors who want to extend Hydra's kernel
- Developers implementing new language coders
- Anyone curious about Hydra's internal architecture
If you just want to use Hydra, start with Concepts and the main README instead.
- Architecture overview
- Type modules
- DSL system
- Primitive functions
- Variable resolution and graphs
- Cross-language compilation (coders)
- The bootstrap process
- Extending Hydra
- Appendix: Build scripts and executables
Hydra is a strongly-typed functional programming language that executes in multiple language environments. By design, developers can write Hydra source code in any of the supported host languages (Haskell, Java, Python, Scala, Lisp) and cross-compile it to any other supported language. Hydra-Haskell serves as the source of truth for the Hydra kernel (the core type system and transformation infrastructure), but Hydra programs themselves can be written and executed in Java, Python, Scala, Lisp, or any other supported implementation.
The implementation follows a layered architecture:
┌──────────────────────────────────────────────────────────────┐
│ Hydra Kernel (Source of Truth) │
│ Type system: Term, Type, Module, Graph, primitives, etc. │
│ Location: packages/hydra-kernel/src/main/haskell/Hydra/Sources/│
│ Written using: Haskell DSLs │
└────────────────────────┬─────────────────────────────────────┘
│ Defines
▼
┌──────────────────────────────────────────────────────────────┐
│ Language Implementations (Peers) │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Haskell │ │ Java │ │ Python │ ... │
│ │ (bootstrap)│ │ │ │ │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ Each implementation provides: │
│ • Hydra type system runtime │
│ • Primitive function implementations │
│ • Ability to execute Hydra programs │
│ • APIs for writing Hydra code in host language │
└────────────────────────┬─────────────────────────────────────┘
│ Cross-compile via
▼
┌──────────────────────────────────────────────────────────────┐
│ Coders (Cross-Language Transformations) │
│ Transform Hydra modules between language implementations │
│ DSL sources: packages/hydra-<lang>/src/main/haskell/Hydra/Sources/<Lang>/ │
│ Runtime driver: heads/haskell/src/main/haskell/Hydra/ExtGeneration.hs │
│ Enable: Write in Java, compile to Python (or vice versa) │
└──────────────────────────────────────────────────────────────┘
- Multi-language by design: Hydra programs can be written in any supported host language and cross-compiled to others
- Unified type system: All implementations share the same Hydra kernel (types, primitives, semantics)
- Self-hosting: The Hydra kernel is defined in Hydra itself (using Haskell as the bootstrap language)
- Type safety: Multiple layers of static type checking (host language + Hydra type system)
- Modularity: Clean separation between kernel definition, language implementations, and cross-compilation
- Metadata over file-system discovery: The build pipeline operates on declared metadata (hydra.json, per-package package.json, in-DSL module manifests) and reads or writes files at known paths derived from that metadata. It does not scan the file system to discover what to do. Tools that walk a directory looking for "whatever's there" invert the source-of-truth relationship (the layout follows the tree instead of the tree following declarations) and silently drift when files are added, renamed, or hand-edited. When a build script needs to know which files to copy or process, the answer must come from a declaration, not a find walk.
- Per-package host code lives in bindings/: Handwritten host-language code tied to a specific Hydra package belongs under bindings/<host>/<artifact>/, not in heads/<host>/. Two flavors: (a) third-party adapters that wrap external libraries (e.g., hydra-rdf4j connects hydra.rdf.syntax.* to Eclipse rdf4j; hydra-neo4j provides ANTLR-based Cypher/GQL parsers); and (b) per-package host DSL helpers with no third-party deps (e.g., hydra-pg-dsl provides Java fluent builders for hydra.pg.{model,query}). Each binding is independently versioned and publishable; it depends on exactly one Hydra package (e.g., hydra-rdf4j depends on hydra-rdf). The binding tree is not part of the DSL pipeline: bindings don't appear in hydra.json's package list, aren't synced through bin/sync.sh, and aren't consumed by the bootstrap demo. They sit at the leaves of the dependency graph. This rule keeps heads/<host>/ runtimes minimal: language-independent Hydra runtime + stdlib + build tooling.
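The metadata-first rule can be sketched in a few lines of Python. This is an illustration under stated assumptions: the manifest dictionary and both helper functions are hypothetical, and the real hydra.json schema is not shown in this guide. Only the dist/json/<package>/src/main/json/ layout is taken from the guide itself.

```python
from pathlib import PurePosixPath

# Hypothetical manifest: package name -> declared module names.
# The real hydra.json schema may look quite different.
MANIFEST = {
    "hydra-kernel": ["hydra.lib.logic", "hydra.lib.math"],
}


def json_artifact_path(package: str, module_name: str) -> PurePosixPath:
    """Derive an output path from declared metadata alone."""
    relative = module_name.replace(".", "/") + ".json"
    return PurePosixPath("dist/json") / package / "src/main/json" / relative


def planned_outputs(manifest: dict) -> list:
    """List every file the build should write, without touching the disk."""
    return [
        json_artifact_path(package, module)
        for package, modules in sorted(manifest.items())
        for module in modules
    ]


for path in planned_outputs(MANIFEST):
    print(path)
```

Because the list of outputs is computed from the declaration, a stray or renamed file on disk cannot change what the build does; it can only show up as a mismatch.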
Type modules define Hydra's core type system. They are located in:
packages/hydra-kernel/src/main/haskell/Hydra/Sources/Kernel/Types/
Hydra's kernel consists of ~20 type modules organized into logical categories.
The canonical list is Hydra.Sources.Kernel.Types.All.kernelTypesModules;
the descriptions below cover the main ones:
Core.hs - hydra.core module name (largest type module)
- Central hub defining fundamental types: Term, Type, Literal, Function, Application, Lambda, Let, Record, Union, etc.
- All other modules depend on Core directly or transitively
- Special property: imports itself as a dependency
Variants.hs - hydra.variants module name
- Supplements Core with metadata types NOT referenced by Core
- Defines variant enums: TermVariant, TypeVariant, LiteralVariant, etc.
- Provides introspection capabilities: Precision, Comparison
Packaging.hs - hydra.packaging module
- Defines the packaging model: Package, Module, Definition, ModuleName, ModuleDependency, PackageDependency, VersionSpecifier, and the metadata types EntityMetadata, LifecycleInfo, EntityReference, DefinitionReference, Version.
- A Module carries a name :: ModuleName, an optional metadata :: Maybe EntityMetadata, a list of dependencies :: [ModuleDependency], and a list of definitions. A ModuleDependency is the depended-on module :: ModuleName plus an optional package :: Maybe PackageName.
- For the conceptual model (entity metadata, lifecycle/versioning, cross-references), see the Packaging wiki page.
Coders.hs - hydra.coders module name
- Defines Coder, Adapter, Bicoder, Language, LanguageConstraints, AdapterContext, TraversalOrder
- The framework is Either-based; the former Flow monad was removed in #245
Graph.hs - hydra.graph module name
- Extends core with graph operations
- Defines: Graph, Primitive, TermCoder
Query.hs - hydra.query module name
- Language-agnostic graph pattern queries
- Triple patterns and path expressions
Typing.hs - hydra.typing module name
- Type inference and reconstruction
- Type constraints and substitutions
- TypeClass record (used by hydra.classes term bindings)
hydra.classes - term module (not a type module)
- equality and ordering bindings of type TypeClass
- See the Concepts wiki § Type classes
Errors.hs - hydra.errors module name and the Error/ subdirectory
- Structured error types used by inference, checking, and coders
Parsing.hs - hydra.parsing module name
Paths.hs - hydra.paths module name
Ast.hs - hydra.ast — common syntax tree for serializers
Tabular.hs - hydra.tabular — CSV/TSV data model (generic)
Testing.hs - hydra.testing — unit testing framework
Typed.hs - hydra.typed — typed (phantom) wrappers for DSL use
Relational.hs - hydra.relational — Codd's Relational Model
Topology.hs - hydra.topology — graph algorithms (Tarjan SCC)
Util.hs - hydra.util — misc utilities
All type modules follow a consistent structure:
module Hydra.Sources.Kernel.Types.ModuleName where

import Hydra.Kernel
import Hydra.Dsl.Bootstrap
import Hydra.Dsl.Types as Types
import qualified Hydra.Sources.Kernel.Types.Core as Core

module_ :: Module
module_ = Module {
    moduleName = ns,
    moduleDefinitions = DefinitionType <$> definitions,
    moduleDependencies = unqualifiedDep <$> [moduleName Core.module_],
    moduleDescription = Just description}
  where
    ns = ModuleName "hydra.namespace"
    core = typeref $ moduleName Core.module_
    def = datatype ns

    definitions = [
      def "TypeName1" $ doc "Description" $ definition1,
      def "TypeName2" $ doc "Description" $ definition2,
      -- ...
      ]

def "Term" $
  doc "A data term" $
  union [
    "annotated">: core "AnnotatedTerm",
    "application">: core "Application",
    "either">: Types.either_ (core "Term") (core "Term"),
    "function">: core "Function",
    "let">: core "Let",
    "list">: list $ core "Term",
    "literal">: core "Literal",
    "map">: Types.map (core "Term") (core "Term"),
    "optional">: optional $ core "Term",
    "pair">: Types.pair (core "Term") (core "Term"),
    "record">: core "Record",
    "set">: set $ core "Term",
    "typeApplication">: core "TypeApplicationTerm",
    "typeLambda">: core "TypeLambda",
    "union">: core "Injection",
    "unit">: T.unit,
    "variable">: core "Name",
    "wrap">: core "WrappedTerm"
  ]

def "Module" $
  doc "A logical collection of definitions sharing a module name" $
  record [
    "description">: optional string,
    "name">: packaging "ModuleName",
    "dependencies">: list $ packaging "ModuleDependency",
    "definitions">: list $ packaging "Definition"
  ]

def "Table" $
  doc "A simple table with header and data rows" $
  forAll "v" $ record [
    "header">: optional $ tabular "HeaderRow",
    "data">: list (tabular "DataRow" @@ "v")
  ]

def "TermVariant" $
  doc "The identifier of a term constructor" $
  enum [
    "annotated", "application", "either", "function",
    "let", "list", "literal", "map", "optional",
    "pair", "record", "set", "typeApplication",
    "typeLambda", "union", "unit", "variable", "wrap"
  ]

Core (hydra.core) - Foundation
├─ Variants - Supplements with variants and introspection types
├─ Classes - Typeclass metadata (Ord, Eq)
├─ Typing - Type system support (inference results, schemes)
├─ Phantoms - DSL phantom types
├─ Tabular - Tabular data
├─ Query - Graph queries
├─ Testing - Test framework
└─ Topology - Graph algorithms
Core + supporting types
├─ Graph - Extends core with graph operations
├─ Coders - Language-transformation framework (Coder, Adapter, Language, ...)
└─ Packaging - Module, Definition, ModuleName, ModuleDependency, Package
Error model
├─ Errors - Structured error types
└─ Error/* - Per-subsystem error types
Key properties:
- No circular dependencies at type level
- Clear separation: foundation (Core/Variants) vs. extensions
- Layered architecture: Atomic → Composite → Integrative → Specialized
Hydra uses embedded domain-specific languages (eDSLs) in Haskell to define its entire kernel. The DSL system provides multiple levels of abstraction for different use cases.
heads/haskell/src/main/haskell/Hydra/Dsl/ # Hand-written base DSLs
heads/haskell/src/main/haskell/Hydra/Dsl/Meta/ # Hand-written meta DSL wrappers
heads/haskell/src/main/haskell/Hydra/Dsl/Meta/Lib/ # Library DSLs (13 files)
dist/haskell/hydra-kernel/src/main/haskell/Hydra/Dsl/ # Generated DSLs (from hydra.dsls)
heads/haskell/src/main/haskell/Hydra/ # Generation drivers and sources
dist/haskell/hydra-<pkg>/src/main/haskell/Hydra/ # Generated per-package coder modules
# (hydra-haskell, hydra-java, hydra-python,
# hydra-scala, hydra-lisp, hydra-typescript,
# hydra-go (head bud),
# hydra-pg, hydra-rdf, hydra-ext for the long-tail,
# hydra-coq, ...)
See also: DSL guide - Comprehensive guide with examples and operator reference
Direct term/type construction without compile-time safety:
-- Terms.hs - construct Term values
term1 = var "x"
term2 = apply (var "f") (int32 42)
term3 = lambda "x" (var "x")
-- Types.hs - construct Type values
type1 = string
type2 = int32 --> string
type3 = list (optional boolean)
Use Case: Low-level term construction, minimal overhead, runtime errors possible
Compile-time type safety via phantom types:
-- Phantoms.hs - TypedTerm a where 'a' is a phantom type
goodFunc :: TypedTerm (Int -> String)
goodFunc = lambda "x" (Strings.toUpper (var "x"))
-- Type error at Haskell compile time!
badFunc :: TypedTerm (Int -> String)
badFunc = lambda "x" (int32 42) -- Expected String, got Int
Use Case: Write Hydra code with Haskell's type checking as a safety net
Write programs that build programs (meta-programming):
-- Meta/Terms.hs - terms that construct terms
buildAddFunction :: TypedTerm (Int -> Int -> Int)
buildAddFunction =
  lambda "x" $ lambda "y" $
    primitive DefMath.add @@ var "x" @@ var "y"
-- Can inspect and transform this representation
Use Case: Code generators, meta-programs, self-modifying code
The hydra.dsls module (Sources/Kernel/Terms/Dsls.hs) automatically generates
phantom-typed DSL functions from any Hydra type module. For each type definition, it
produces:
- Record constructors: one function taking all fields as TypedTerm arguments
- Field accessors: one function per field, returning the field value
- Field updaters: withXxx functions that return a modified copy of the record
- Union injectors: one function per variant (unit variants produce nullary values)
- Wrap/unwrap: for newtype wrappers
Generated modules live in dist/haskell/hydra-kernel/src/main/haskell/Hydra/Dsl/ (e.g., Hydra.Dsl.Core,
Hydra.Dsl.Coders, Hydra.Dsl.Ast). They are also generated into Java and Python
as part of the sync pipeline.
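To make the five kinds of generated function concrete, here is a hand-written Python analogue for one record type, one union type, and one wrapper. It is an illustration only: the Person, Shape, and UserId types and every function name are hypothetical, and the real generated modules produce phantom-typed term builders rather than plain Python values.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Person:              # hypothetical record type
    name: str
    age: int


# Record constructor: one function taking all fields.
def person(name: str, age: int) -> Person:
    return Person(name, age)


# Field accessor: one function per field.
def person_name(p: Person) -> str:
    return p.name


# Field updater: returns a modified copy and leaves the original untouched.
def person_with_age(p: Person, age: int) -> Person:
    return replace(p, age=age)


@dataclass(frozen=True)
class Shape:               # hypothetical union type, stored as (variant, payload)
    variant: str
    payload: object = None


# Union injectors: one function per variant; a unit variant is a nullary value.
def shape_circle(radius: float) -> Shape:
    return Shape("circle", radius)


shape_empty = Shape("empty")


@dataclass(frozen=True)
class UserId:              # hypothetical newtype wrapper
    value: str


def user_id(value: str) -> UserId:        # wrap
    return UserId(value)


def un_user_id(wrapped: UserId) -> str:   # unwrap
    return wrapped.value
```

The point of generating these per type is that call sites never spell field or variant names as strings, so a renamed field surfaces as a missing function rather than a silent mismatch.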
- Terms.hs - Plain DSL for terms (apply, lambda, record, inject)
- Types.hs - Plain DSL for types (operators -->, @@)
- ShorthandTypes.hs - Convenient aliases (tInt32, tString, tList)
- Bootstrap.hs - Bootstrapping utilities
- Annotations.hs - Annotation handling
- Grammars.hs - Grammar and syntax definitions
- Literals.hs, LiteralTypes.hs - Literal handling
These modules re-export the corresponding generated DSL module and add non-standard
helpers such as AsTerm-flexible overrides, expression conversion pipelines, and
compatibility shims.
- Meta/Core.hs - Wraps Hydra.Dsl.Core; adds AsTerm overrides for binding, injection, typeVariable; helpers like equalName_, false
- Meta/Graph.hs - Wraps Hydra.Dsl.Graph; adds graph construction helpers
- Meta/Phantoms.hs - Phantom-typed term construction (TypedTerm a), operators (@@, ~>, <~)
- Meta/Terms.hs - Phantom-typed term-encoded terms
- Meta/Types.hs - Phantom-typed term-encoded types
- Meta/Variants.hs - Wraps Hydra.Dsl.Variants; metadata variants and introspection
- Meta/Testing.hs - Wraps Hydra.Dsl.Testing; test convenience helpers
Phantom-typed wrappers for standard library functions:
Hydra/Dsl/Meta/Lib/
├── Lists.hs # map, filter, fold, concat, etc.
├── Maps.hs # lookup, insert, keys, values, etc.
├── Sets.hs # union, intersection, member, etc.
├── Strings.hs # concat, split, toUpper, toLower, etc.
├── Chars.hs # isAlpha, isDigit, toUpper, toLower
├── Math.hs # add, sub, mul, div, sin, cos, sqrt, etc.
├── Logic.hs # and, or, not, ifElse
├── Optionals.hs # fromOptional, cases, isGiven, etc.
├── Eithers.hs # either, isLeft, rights, etc.
├── Equality.hs # equal, compare, gt, lt, etc.
├── Pairs.hs # fst, snd, curry, uncurry
├── Regex.hs # matches, find, replace, split
└── Literals.hs # Type conversions and parsing
The DSL provides convenient operators for readable code:
-- Type construction
(-->) :: Type -> Type -> Type -- Function type
(@@) :: Type -> Type -> Type -- Type application
-- Term construction
(<.>) :: Term -> Term -> Term -- Function composition
(@@) :: Term -> Term -> Term -- Function application
(>:) :: String -> a -> Field -- Field definition
-- Phantom-typed construction
(~>) :: String -> TypedTerm a -> TypedTerm (x -> a) -- Lambda
(<~) :: String -> TypedTerm a -> TypedTerm b -> TypedTerm b -- Let binding
(<<~) :: String -> TypedTerm (Either e a) -> TypedTerm (Either e b) -> TypedTerm (Either e b) -- Either bind
-- Examples
intToString = int32 --> string -- Type
addOne = lambda "x" (primitive DefMath.add @@ var "x" @@ int32 1) -- Term
person = record "Person" [
  "name" >: string,
  "age" >: int32
  ]
Here's a complete example showing DSL usage in type inference:
-- From Hydra.Sources.Kernel.Terms.Inference
inferTypeOfEitherDef :: TypedTermDefinition (InferenceContext -> Graph -> Either Term Term -> Either Error InferenceResult)
inferTypeOfEitherDef = define "inferTypeOfEither" $
  doc "Infer the type of an Either term" $
  "cx" ~> "e" ~>
  -- Pattern match on left or right
  Eithers.either_
    -- Left case
    ("left" ~>
      "leftResult" <<~ ref inferTypeDef @@ var "cx" @@ var "left" $
      "type_" <~ InferenceResult.type_ (var "leftResult") $
      "cx2" <~ InferenceResult.context (var "leftResult") $
      produce $ InferenceResult.inferenceResult (var "cx2")
        (Types.either_ (var "type_") (var "any")))
    -- Right case
    ("right" ~>
      "rightResult" <<~ ref inferTypeDef @@ var "cx" @@ var "right" $
      "type_" <~ InferenceResult.type_ (var "rightResult") $
      "cx2" <~ InferenceResult.context (var "rightResult") $
      produce $ InferenceResult.inferenceResult (var "cx2")
        (Types.either_ (var "any") (var "type_")))
    (var "e")
Features Demonstrated:
- define - Define a named function
- ~> - Function abstraction
- <~ - Let binding
- <<~ - Either-bind (bind into the error-handling monad-like combinator)
- @@ - Function application
- ref - Reference to another definition
- Type-safe operations on InferenceResult and Either
User Code (Python/Java/Haskell)
↓ (serialized as Core.Term)
Hydra Core Language (Type, Term, Function, Lambda, etc.)
↓ (defined via DSLs)
Hydra DSLs in Haskell (Terms.hs, Types.hs, Phantoms.hs, etc.)
↓ (generates code for)
Generated Source Code (Haskell, Python, Java)
Self-Hosting Loop:
- Write inference logic in Phantom DSL → Sources/Kernel/Terms/Inference.hs
- DSL produces Term/Type values representing functions
- Code generator converts to executable Haskell → dist/haskell/hydra-kernel/src/main/haskell/Hydra/Inference.hs
- Generated code can now infer types for new Hydra code (including DSL code itself!)
Primitive functions are the standard library of Hydra, providing built-in operations for common data manipulations.
Primitives are organized into 13 library modules by category. Each module
lives in packages/hydra-kernel/src/main/haskell/Hydra/Sources/Kernel/Lib/<Sub>.hs
and is the canonical registry for its module name:
| Library | Count | Examples |
|---|---|---|
| hydra.lib.chars | 6 | isAlphaNum, isLower, toUpper |
| hydra.lib.eithers | 15 | either, isLeft, rights, bimap, bind |
| hydra.lib.equality | 9 | equal, compare, gt, lt, max |
| hydra.lib.lists | 37 | map, filter, foldl, concat, sort |
| hydra.lib.literals | 55 | Type conversions, parsing, showing |
| hydra.lib.logic | 4 | and, or, not, ifElse |
| hydra.lib.maps | 20 | lookup, insert, keys, toList |
| hydra.lib.math | 46 | add, mul, sin, sqrt, abs |
| hydra.lib.optionals | 12 | fromOptional, cases, isGiven |
| hydra.lib.pairs | 3 | first, second, bimap |
| hydra.lib.regex | 6 | matches, find, findAll, replace, replaceAll, split |
| hydra.lib.sets | 14 | union, intersection, member |
| hydra.lib.strings | 13 | cat, splitOn, length, lines |
Total: 240 primitive functions (post-#156).
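As a quick arithmetic check, the per-library counts in the table can be summed directly. The snippet below uses only the numbers listed above; the result agrees with the 240 primitives that the 13 Lib modules are said to declare.

```python
# Counts copied from the table above.
counts = {
    "hydra.lib.chars": 6,
    "hydra.lib.eithers": 15,
    "hydra.lib.equality": 9,
    "hydra.lib.lists": 37,
    "hydra.lib.literals": 55,
    "hydra.lib.logic": 4,
    "hydra.lib.maps": 20,
    "hydra.lib.math": 46,
    "hydra.lib.optionals": 12,
    "hydra.lib.pairs": 3,
    "hydra.lib.regex": 6,
    "hydra.lib.sets": 14,
    "hydra.lib.strings": 13,
}

total = sum(counts.values())
print(f"{len(counts)} libraries, {total} primitives")
```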
Each primitive is defined at three levels, with a clear separation of concerns between universal metadata and per-host implementation (introduced in #156):
PrimitiveDefinition (in hydra.packaging) carries the universal metadata that
is the same in every host language:
def "PrimitiveDefinition" $
  record [
    "name">: doc "Fully-qualified name" $ core "Name",
    "description">: doc "Human-readable description" $ core "String",
    "signature">: doc "Full type signature with parameter names" $
      typing "TermSignature",
    "isPure">: doc "Purity flag (defaults to True)" $ core "Boolean",
    "isTotal">: doc "Totality flag (defaults to True)" $ core "Boolean",
    "defaultImplementation">: doc "Optional reference implementation in Hydra terms" $
      T.optional (core "Term")
  ]
Primitive (in hydra.graph) pairs the universal metadata with a host-specific
implementation. This is what lives in a Graph as the per-host primitive
registry:
def "Primitive" $
  record [
    "definition">: doc "Host-independent metadata (name, signature, purity, totality)" $
      packaging "PrimitiveDefinition",
    "implementation">: doc "Concrete, host-specific implementation" $
      graph ~> list (core "Term") ~> Types.either_ (errors "Error") (core "Term")
  ]
The implementation maps the (already-reduced, annotation-stripped) argument terms to a result term, or an error, given the current graph. The interpreter strips annotations and reduces each argument before invoking the primitive, so the implementation can pattern-match the argument terms directly.
The two faces have deliberately different shapes:
- PrimitiveDefinition.defaultImplementation is an optional pure Hydra term whose type is exactly the primitive's public signature (int32 -> int32 -> int32 for math.add, (a -> Bool) -> [a] -> [a] for lists.filter). It never mentions a graph. It is the portable reference implementation (what the primitive computes) and is used for type-checking and cross-host documentation, not as a runtime substitute (interpreting it would be far slower than a native impl).
- Primitive.implementation is the host-native runtime carrier, Graph -> [Term] -> Either Error Term. It is how a host evaluates the primitive, natively and quickly.
So the graph appears in the runtime carrier but never in a primitive's signature or in its
defaultImplementation. The graph is a property of the implementation's calling convention, not
of the primitive's type.
Why does the carrier carry a graph at all? Most primitives ignore it — math.add just adds its two
arguments. The graph matters only for higher-order primitives that must evaluate a function
argument mid-computation. Take lists.filter applied to the predicate \x -> equality.gt x 2: the
native impl is (Term -> Bool) -> [Term] -> [Term], so it must turn that predicate term into a
native Term -> Bool, which means reducing gt x 2 per element. But gt arrives as an unresolved
name (hydra.lib.equality.gt) — it sits under a lambda binder and cannot be evaluated until filter
supplies a concrete x — and resolving that name requires the graph's primitive table. The graph
passed in is the interpreter's live graph at the call site (which may hold primitives or bindings
beyond the kernel's), so a captured or global graph would be wrong; it must be threaded from the
reducer. The complementary case — a higher-order primitive whose result shape is fixed by its data
argument, e.g. lists.map or eithers.bimap — can instead return an unreduced applicative term
([f x1, f x2, ...]) and let the outer reducer fold it, needing no graph. The graph is retained for
the minority of primitives that branch on a reduced function result (filter, find, foldl over Either, …).
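The filter case can be sketched with a deliberately tiny model. Everything here is an assumption made for illustration: the tuple-based term encoding, the reduce_term helper, and the function names are hypothetical and much simpler than Hydra's real Term, Graph, and reducer. What the sketch preserves is the carrier shape (graph, argument terms) -> Left error | Right term, and the reason filter needs the graph it is handed.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mini term model (not Hydra's real Term type):
#   ("int32", n), ("bool", b), ("list", [terms]), ("var", name),
#   ("lambda", param, body), ("prim", qualified_name, [argument terms])
Term = Tuple[Any, ...]
Result = Tuple[str, Any]      # ("left", error message) or ("right", term)
Graph = Dict[str, Any]        # here: just the primitive table, name -> implementation


def reduce_term(graph: Graph, term: Term, env: Dict[str, Term]) -> Result:
    """Tiny reducer: resolves primitive names through the graph it is given."""
    tag = term[0]
    if tag == "var":
        if term[1] not in env:
            return ("left", f"unbound variable: {term[1]}")
        return ("right", env[term[1]])
    if tag == "prim":
        args = []
        for arg in term[2]:
            reduced = reduce_term(graph, arg, env)
            if reduced[0] == "left":
                return reduced
            args.append(reduced[1])
        implementation = graph.get(term[1])
        if implementation is None:
            return ("left", f"unknown primitive: {term[1]}")
        return implementation(graph, args)
    return ("right", term)    # literals and lists are already values


def equality_gt(graph: Graph, args: List[Term]) -> Result:
    (_, x), (_, y) = args     # first-order primitive: the graph is ignored
    return ("right", ("bool", x > y))


def lists_filter(graph: Graph, args: List[Term]) -> Result:
    (_, param, body), (_, items) = args
    kept = []
    for item in items:
        # The predicate body mentions a primitive by name, so evaluating it
        # needs the graph that the reducer passed in.
        verdict = reduce_term(graph, body, {param: item})
        if verdict[0] == "left":
            return verdict
        if verdict[1] == ("bool", True):
            kept.append(item)
    return ("right", ("list", kept))


graph = {"hydra.lib.equality.gt": equality_gt, "hydra.lib.lists.filter": lists_filter}
predicate = ("lambda", "x", ("prim", "hydra.lib.equality.gt", [("var", "x"), ("int32", 2)]))
numbers = ("list", [("int32", 1), ("int32", 3), ("int32", 5)])
print(lists_filter(graph, [predicate, numbers]))
```

Calling lists_filter with an empty graph returns a Left, because the name hydra.lib.equality.gt can no longer be resolved. That is the failure mode a stale captured graph would produce, which is why the live graph is threaded from the reducer.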
(The Either-based implementation replaces the former Flow monad, removed in #245.
The host-independent PrimitiveDefinition was split out from the implementation in #156.
The vestigial InferenceContext parameter — which no primitive ever consulted — was dropped from the
carrier in #446, leaving the graph; this was sequenced with the defaultImplementation integration in #437.)
The kernel modules Hydra/Sources/Kernel/Lib/<Sub>.hs declare every primitive
as a PrimitiveDefinition (an arm of Definition alongside term and type),
collectively forming the primitive registry. The 13 modules — Chars,
Eithers, Equality, Lists, Literals, Logic, Maps, Math, Optionals,
Pairs, Regex, Sets, Strings — declare 240 primitives total.
Example (Hydra/Sources/Kernel/Lib/Logic.hs):
ns :: ModuleName
ns = ModuleName "hydra.lib.logic"

module_ :: Module
module_ = Module {
    moduleName = ns,
    moduleDefinitions = definitions,
    moduleDependencies = Bootstrap.unqualifiedDep <$> kernelTypesModuleNames,
    moduleDescription = Just "Primitives in the hydra.lib.logic namespace."}
  where
    definitions = [
      toPrimitive "Compute the logical AND of two boolean values." andSig and_,
      primNoDef "ifElse" "Compute a conditional expression." ifElseSig,
      toPrimitive "Compute the logical NOT of a boolean value." notSig not_,
      toPrimitive "Compute the logical OR of two boolean values." orSig or_]

andSig :: TermSignature
andSig = sig $ TypeScheme [] (Types.boolean Types.~> Types.boolean Types.~> Types.boolean) Nothing

and_ :: TypedTermDefinition (Bool -> Bool -> Bool)
and_ = define "and" $
  doc "Logical AND, defined in terms of ifElse." $
  "a" ~> "b" ~> Logic.ifElse (var "a") (var "b" :: TypedTerm Bool) false

The metadata flows through to JSON in dist/json/hydra-kernel/src/main/json/hydra/lib/<sub>.json,
where it becomes the cross-host source of truth for the primitive's name,
signature, description, and default implementation.
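The and_ default in the Logic.hs example reads as "ifElse a b false". A small Python transliteration makes it easy to check that this really is logical AND. The if_else function stands in for the native ifElse primitive; the not_ and or_ bodies are assumptions written by analogy, since the guide does not show them.

```python
def if_else(condition: bool, then_value, else_value):
    """Stand-in for the native logic.ifElse primitive (declared with primNoDef)."""
    return then_value if condition else else_value


def and_(a: bool, b: bool) -> bool:
    """Mirrors the and_ default shown above: ifElse a b false."""
    return if_else(a, b, False)


# Assumed analogues; the guide does not show the bodies of not_ and or_.
def not_(a: bool) -> bool:
    return if_else(a, False, True)


def or_(a: bool, b: bool) -> bool:
    return if_else(a, True, b)


# Exhaustive truth-table check against Python's own operators.
for a in (False, True):
    assert not_(a) == (not a)
    for b in (False, True):
        assert and_(a, b) == (a and b)
        assert or_(a, b) == (a or b)
```

This is the sense in which only ifElse has to be native here: the other three are expressible in terms of it.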
Per host, two things are needed beyond the kernel metadata:
- Native implementations: for the big three, in the overlay/<lang>/hydra-kernel/ tree (#418); e.g. Haskell at overlay/haskell/hydra-kernel/src/main/haskell/Hydra/Haskell/Lib/Math.hs:

      add :: Int -> Int -> Int
      add x y = x + y

- Host-side primitive registry: binds names to native impls. The Haskell registry lives in overlay/haskell/hydra-kernel/src/main/haskell/Hydra/Dsl/Libraries.hs (#473):

      hydraLibMath :: Library
      hydraLibMath = standardLibrary [
        prim2 DefMath.add Math.add [] int32 int32 int32,
        ...]

  The prim1/prim2/prim3 helpers build a Primitive by pairing the host's native implementation with a signature and the generated PrimitiveDefinition (DefMath.add, where DefMath is the generated Hydra.Lib.Math def-module), which is the single source of truth for the name (#473), taken via the ToPrimName class. standardLibrary derives the library's module name from its first primitive, so it needs no ModuleName argument. The argument-type info passed to the helper is a host-side repetition the registry needs in native type-coder form, not the source of truth for the name.

  Every other host has an analogous registry that likewise derives names from the generated hydra.lib.* def-modules: overlay/{java,python}/.../lib/Libraries.{java,py} and heads/<lang>/.../lib/Libraries.<ext> for Scala and the Lisp dialects.
The type information passed to prim1/prim2/prim3 at host registration is
a sanity-check repetition of the canonical signature in the kernel-side
registry (Hydra/Sources/Kernel/Lib/<Sub>.hs) — it's expected to match, and
divergence is a bug. On the Haskell side this keeps the bootstrap graph aligned
with the JSON kernel; other hosts consume the JSON kernel directly and so
inherit the canonical metadata. Future work (see follow-ups) may have host
registries derive their signatures directly from the kernel metadata to
eliminate this duplication.
PrimitiveDefinition.defaultImplementation : optional<Term> carries an optional
declarative reference implementation in pure Hydra terms. Two uses:
- Fallback for minimal interpreters. A host that doesn't ship a native impl for a primitive can fall back to evaluating the default Hydra term.
- Proof-friendly reference. Targets that can prove or simulate the default body (e.g. Coq) get a verified reference implementation for free.
Default implementations are pure expressions — they take only the primitive's
declared arguments (no Context, no Graph) and reduce using only other
primitives. Not every primitive has one: fundamental operations like
logic.ifElse, pairs.first, character predicates, and arithmetic cannot
be expressed in terms of other primitives and use primNoDef.
The defaultImplementation field replaces the pre-#156
Hydra.Sources.Kernel.Lib.Defaults.* modules, which encoded the same
notion as interpreter-friendly Term-AST constructions. Those modules
were merged into the canonical Lib/<Sub>.hs registries' inline
toPrimitive ... name_ entries and then removed (#437).
The Hydra.Dsl.Prims module provides type coding:
-- Literal types
int32, int64 :: TermCoder Int
float32, float64 :: TermCoder Double
bigint :: TermCoder Integer
string :: TermCoder String
boolean :: TermCoder Bool
binary :: TermCoder ByteString
-- Container types
list :: TermCoder a -> TermCoder [a]
set :: TermCoder a -> TermCoder (Set a)
optional :: TermCoder a -> TermCoder (Maybe a)
map :: TermCoder k -> TermCoder v -> TermCoder (Map k v)
tuple2 :: TermCoder a -> TermCoder b -> TermCoder (a, b)
-- Function types
function :: TermCoder a -> TermCoder b -> TermCoder (a -> b)
-- Sum types
either_ :: TermCoder a -> TermCoder b -> TermCoder (Either a b)
Each TermCoder contains:
- Type representation
- Encoder: Haskell value → Hydra Term
- Decoder: Hydra Term → Haskell value
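The three parts of a TermCoder, and the way container coders are built from element coders, can be sketched as follows. This is a simplified Python analogue, not the Hydra.Dsl.Prims API: the tuple-based term encoding and all names are hypothetical, and the decoder raises ValueError where the real framework would return an Either.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class TermCoder:
    type_: Any                      # type representation
    encode: Callable[[Any], Any]    # host value -> term
    decode: Callable[[Any], Any]    # term -> host value


def _decode_tagged(tag: str) -> Callable[[Any], Any]:
    """Build a decoder that accepts only terms of the form (tag, payload)."""
    def decode(term):
        if not (isinstance(term, tuple) and len(term) == 2 and term[0] == tag):
            raise ValueError(f"expected a {tag} term, got {term!r}")
        return term[1]
    return decode


int32 = TermCoder("int32", lambda v: ("int32", v), _decode_tagged("int32"))
string = TermCoder("string", lambda v: ("string", v), _decode_tagged("string"))


def list_(element: TermCoder) -> TermCoder:
    """Container coder built from an element coder (compare list above)."""
    decode_items = _decode_tagged("list")
    return TermCoder(
        ("list", element.type_),
        lambda values: ("list", tuple(element.encode(v) for v in values)),
        lambda term: [element.decode(t) for t in decode_items(term)],
    )


coder = list_(int32)
term = coder.encode([1, 2, 3])
print(term)
print(coder.decode(term))
```

Because each coder carries its own type representation, a registration helper in the style of prim2 can assemble a primitive's type from the same coders it uses to convert arguments and results.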
Primitive names and signatures are defined once in Haskell, in each kernel
packages/hydra-kernel/src/main/haskell/Hydra/Sources/Kernel/Lib/<Sub>.hs module
(as PrimitiveDefinitions), and become part of the generated kernel — the hydra.lib.*
def-modules — in every target language. The implementations
shown below are hand-written per host language. For the big three (Haskell, Java,
Python) they live in the top-level overlay/<lang>/hydra-kernel/ tree (#418) and
are overlaid into the published dist/<host>/hydra-kernel/ artifact during sync
(for Haskell by sync-haskell.sh, for Java/Python by
heads/<host>/bin/copy-kernel-runtime.sh); other hosts keep their implementations
under heads/<host>/. See
build-system.md §Hand-written runtime in hydra-kernel
for the full mechanism and the catalog of which subtrees are overlaid per language.
Location: overlay/java/hydra-kernel/src/main/java/hydra/lib/ (#418; was heads/java/src/main/java/hydra/lib/)
Each primitive becomes a class extending PrimitiveFunction:
// hydra/lib/math/Add.java
public class Add extends PrimitiveFunction {
    public Name name() {
        return new Name("hydra.lib.math.add");
    }

    public TypeScheme type() {
        return scheme(function(int32(), int32(), int32()));
    }

    protected Function<List<Term>, Either<Error, Term>> implementation() {
        return args -> map2(
            Expect.int32(args.get(0)),
            Expect.int32(args.get(1)),
            (arg0, arg1) -> Terms.int32(apply(arg0, arg1))
        );
    }

    public static Integer apply(Integer augend, Integer addend) {
        return augend + addend;
    }
}
Location: overlay/python/hydra-kernel/src/main/python/hydra/lib/ (#418; was heads/python/src/main/python/hydra/lib/)
Pure Python implementations:
# hydra/lib/math.py
import math

def add(x: int, y: int) -> int:
    """Add two integers."""
    return x + y

def sqrt(x: float) -> float:
    """Square root of a float."""
    return math.sqrt(x)

# hydra/lib/lists.py
# (Callable is from collections.abc; frozenlist and the type variables A and B
# are assumed to come from Hydra's Python support code, imports not shown.)
def map_(f: Callable[[A], B], xs: frozenlist[A]) -> frozenlist[B]:
    """Map a function over a list."""
    return tuple(f(x) for x in xs)

prim2 _equality_equal Equality.equal ["x"] x x boolean
  where x = variable "x"
The same primitive works with any type supporting equality.
A PrimitiveDefinition carries an optional defaultImplementation : optional<Term>
— a declarative reference implementation in pure Hydra terms. The kernel
registry declares each primitive with one of two helpers:
- primDef: supplies a default Hydra-term implementation; usable as a fallback by minimal interpreters that lack a native implementation, and as a proof-friendly reference.
- primNoDef: no default; used for primitives that are fundamental (e.g. logic.ifElse, pairs.first) or whose meaning is host-native (e.g. arithmetic, char predicates, regex matching).
On the Haskell host, the prim* family in Hydra.Dsl.Prims pairs each name
with its native implementation regardless of which kernel helper declared the
primitive.
All primitives operate within Either Error a, where Error is the structured union
type from hydra.errors:
type Result a = Either Error a
The InferenceContext is no longer part of the primitive carrier (it was dropped in #446, as noted above). Kernel functions that need inference state, such as the inference functions, take an InferenceContext value alongside the Graph as an explicit parameter. It carries the fresh-variable counter and the current subterm-path trace, accumulated backward. This provides:
- Explicit error handling with short-circuit semantics
- Subterm-path tracing via the threaded InferenceContext parameter
- No hidden state: all inference state is passed explicitly
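Short-circuit semantics means that once a step yields a Left, later steps are skipped and the error is passed through unchanged. A minimal Python sketch follows, with a hypothetical tuple encoding of Either and a hypothetical safe_div step; the bind function plays the role that <<~ plays in the DSL.

```python
from typing import Any, Callable, Tuple

Result = Tuple[str, Any]   # ("left", error) or ("right", value)


def left(error: Any) -> Result:
    return ("left", error)


def right(value: Any) -> Result:
    return ("right", value)


def bind(result: Result, next_step: Callable[[Any], Result]) -> Result:
    """Run next_step only if result is a Right; otherwise pass the Left through."""
    tag, payload = result
    return next_step(payload) if tag == "right" else result


def safe_div(x: int, y: int) -> Result:
    """Hypothetical fallible step used only for this example."""
    return left("division by zero") if y == 0 else right(x // y)


ok = bind(safe_div(100, 5), lambda q: safe_div(q, 2))
failed = bind(safe_div(1, 0), lambda q: safe_div(q, 2))
print(ok, failed)
```

Nothing is hidden in this style: the error travels in the return value and any extra state has to be passed as an ordinary argument.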
All named references in Hydra use TermVariable.
At runtime, the reduction engine resolves each variable name through the Graph,
which holds three separate namespaces.
A Graph contains (among other fields):
| Field | Type | Contents |
|---|---|---|
| graphBoundTerms | Map Name Term | Module-level definitions (element bindings, let-bound variables) |
| graphPrimitives | Map Name Primitive | Built-in primitive functions and constants |
| graphBoundTypes | Map Name TypeScheme | Type schemes for bound terms |
Lambda-bound variables are not stored in the graph; they are resolved structurally during beta-reduction.
When reduceTerm encounters a TermVariable, it resolves the name in this order:
1. graphBoundTerms: module definitions, let-bound variables. If found, the binding's term is recursively reduced.
2. graphPrimitives: built-in functions and constants. If found, the primitive is applied with arity-based argument collection.
3. Lambda-bound: the variable was introduced by a lambda parameter. It remains as-is (a free variable in the current scope).
This means module bindings shadow primitives, and primitives shadow lambda-bound variables.
In practice, names don't collide: module definitions use qualified names like hydra.core.Term,
while primitives use the hydra.lib.* module name.
As a safety mechanism, buildGraph filters graphBoundTerms and graphBoundTypes
against graphPrimitives at construction time.
Any binding whose name matches a primitive is removed from the graph.
This ensures primitives always take priority by construction,
not just by resolution order.
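
The lookup order and the construction-time filter can be pictured with a small Python model. This is only a sketch under stated assumptions: `build_graph` and `resolve` are hypothetical names, terms are modeled as strings and primitives as callables, and none of this mirrors the real kernel code.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-ins for illustration: a Term is a string and a
# Primitive is any Python callable. Neither mirrors the real Hydra types.
Term = str
Primitive = Callable[..., object]


def build_graph(
    bound_terms: Dict[str, Term], primitives: Dict[str, Primitive]
) -> Tuple[Dict[str, Term], Dict[str, Primitive]]:
    """Drop every bound term whose name collides with a primitive."""
    kept = {n: t for n, t in bound_terms.items() if n not in primitives}
    return kept, primitives


def resolve(
    name: str, bound_terms: Dict[str, Term], primitives: Dict[str, Primitive]
) -> Tuple[str, Optional[object]]:
    """Look a name up in the documented order: bound terms, then primitives."""
    if name in bound_terms:
        return ("bound", bound_terms[name])
    if name in primitives:
        return ("primitive", primitives[name])
    # Not in the graph: treated as lambda-bound and left as-is.
    return ("lambda-bound", None)
```

Because colliding bindings are removed before any lookup happens, the first branch of `resolve` can never hide a primitive in this model.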
The hydra.lexical.graphWithPrimitives function creates a graph
with primitives assembled from two lists:
graphWithPrimitives :: [Primitive] -> [Primitive] -> Graph
graphWithPrimitives builtIn userProvided = ...
User-provided primitives shadow built-in ones (left-biased map union). This enables:
- Language implementers to override kernel primitives with optimized host-language versions.
- Users to provide domain-specific primitive functions alongside the standard library.
The bootstrap graph (Hydra.Dsl.Bootstrap.bootstrapGraph in Haskell) uses the standard
libraries directly.
Test runners and custom applications can use graphWithPrimitives to inject additional primitives.
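
As a rough illustration of the shadowing rule, the following Python sketch merges two primitive lists so that user-provided entries win. The function name `primitives_map` and the (name, implementation) pair representation are assumptions made for this example, not the kernel's API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a primitive is a (qualified name, implementation) pair.
NamedPrimitive = Tuple[str, Callable[..., object]]


def primitives_map(
    built_in: List[NamedPrimitive], user_provided: List[NamedPrimitive]
) -> Dict[str, Callable[..., object]]:
    """Merge both lists; a user-provided entry replaces a built-in one of the same name."""
    merged = {name: impl for name, impl in built_in}
    # Applying the user entries last gives them priority, which matches a
    # left-biased union with the user map on the left.
    merged.update({name: impl for name, impl in user_provided})
    return merged
```

A host could use the same pattern to swap in an optimized implementation for a single name while leaving the rest of the standard library untouched.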
Built-in primitives (graphPrimitives) are implemented natively in the host language.
Each Primitive carries a definition : PrimitiveDefinition (universal metadata: name,
description, signature, isPure/isTotal flags, optional reference implementation) and an
implementation function that maps a list of Term arguments to a result Term.
See Primitive functions above.
User-defined functions (graphBoundTerms) are Hydra terms — typically lambdas or
compositions of other terms.
They are defined in modules and resolved by name just like primitives,
but they are reduced by the Hydra reduction engine rather than calling native code.
Both are referenced the same way in Hydra source code: as TermVariable with a qualified name.
The distinction is invisible to Hydra programs.
Coders enable cross-compilation of Hydra programs between different language implementations. They transform Hydra modules (types and terms) from one language's representation to another, allowing developers to write Hydra code in their preferred language and compile it to any other supported language.
See also:
- Property Graphs - Mapping Hydra schemas to property graphs with annotations
- Testing - How the common test suite validates coder parity
In 0.15, generated Haskell coder output is split across per-package
directories under dist/haskell/. Each package corresponds to a coder
family or domain.
dist/haskell/hydra-haskell/src/main/haskell/Hydra/Haskell/ # Haskell coder
dist/haskell/hydra-java/src/main/haskell/Hydra/Java/ # Java coder
dist/haskell/hydra-python/src/main/haskell/Hydra/Python/ # Python coder
dist/haskell/hydra-scala/src/main/haskell/Hydra/Scala/ # Scala 3 coder
dist/haskell/hydra-lisp/src/main/haskell/Hydra/Lisp/ # Lisp coder (4 dialects)
dist/haskell/hydra-pg/src/main/haskell/Hydra/ # Property graphs
│ ├── Pg/ # PG model + GraphSON
│ ├── Cypher/ # Cypher
│ ├── Graphviz/ # Visualization
│ └── Tinkerpop/ # Gremlin / TinkerPop
dist/haskell/hydra-rdf/src/main/haskell/Hydra/ # RDF family
│ ├── Rdf/ # RDF model + N-Triples
│ ├── Shacl/ # SHACL
│ ├── Owl/ # OWL
│ ├── Shex/ # ShEx
│ └── Xml/ # XML schema
dist/haskell/hydra-ext/src/main/haskell/Hydra/ # Long-tail coders
│ ├── Avro/ # Avro
│ ├── Protobuf/ # Protocol Buffers
│ ├── Graphql/ # GraphQL
│ ├── Pegasus/ # LinkedIn PDL
│ ├── Json/Schema/ # JSON Schema
│ ├── Cpp/, Csharp/, Go/, Rust/, Yaml/, ... # Other languages
│ └── Atlas/, Azure/, Datalog/, Delta/, Geojson/, Iana/, Kusto/, Osv/, Parquet/, Sql/, Stac/, Workflow/
dist/haskell/hydra-coq/src/main/haskell/Hydra/Coq/ # Coq coder
dist/haskell/hydra-typescript/src/main/haskell/Hydra/TypeScript/ # TypeScript (in progress)
dist/haskell/hydra-wasm/src/main/haskell/Hydra/Wasm/ # WebAssembly (in progress)
The hydra-ext package collects long-tail coders that don't yet have
their own dedicated package. Domain-specific groups (hydra-pg,
hydra-rdf) and language targets that bootstrap (hydra-java,
hydra-python, hydra-scala, hydra-lisp) are split out so each
maps cleanly to its own published artifact.
The hydra-bench package is an opt-in sibling: it holds synthetic inference
benchmark workloads (hydra.bench.*) which are deliberately stress-shaped and
not regenerated by the default sync. See packages/hydra-bench/README.md;
run bin/sync-bench.sh to refresh, then bin/run-inference-bench.sh to measure.
Each language directory typically contains:
Language/
├── Coder.hs # Main transformation logic
├── Serde.hs # AST to text serialization
├── Language.hs # Language definition and constraints
├── Names.hs # Name conversion and case conventions
├── Utils.hs # Language-specific utilities
└── Settings.hs # Configuration (optional)
All coders follow the same shape: a Module plus context goes in, a map of generated
file paths to contents comes out, and errors are reported via Either Error.
Examples:
moduleToJava :: InferenceContext -> Graph -> Module -> Either Error (M.Map FilePath String)
moduleToPython :: InferenceContext -> Graph -> Module -> Either Error (M.Map FilePath String)
moduleToCpp :: InferenceContext -> Graph -> Module -> Either Error (M.Map FilePath String)
The Coder and Adapter types are defined in packages/hydra-kernel/src/main/haskell/Hydra/Sources/Kernel/Types/Coders.hs
(generated output: dist/haskell/hydra-kernel/src/main/haskell/Hydra/Coders.hs):
-- Bidirectional transformation
data Coder v1 v2 = Coder {
  coderEncode :: InferenceContext -> v1 -> Either Error v2,
  coderDecode :: InferenceContext -> v2 -> Either Error v1
}
-- Adapter for language-specific transformations
data Adapter t1 t2 v1 v2 = Adapter {
  adapterIsLossy :: Bool,       -- Track lossy conversions
  adapterSource :: t1,          -- Source type schema
  adapterTarget :: t2,          -- Target type schema
  adapterCoder :: Coder v1 v2   -- Value-level transformation
}
Terms are recursively converted to target language expressions:
-- Java example
encodeTerm :: InferenceContext -> Graph -> Aliases -> Term -> Either Error Java.Expression
-- Handles:
-- - Literals (int, string, boolean, etc.)
-- - Applications (function calls)
-- - Functions (lambdas or method references)
-- - Records (class constructors)
-- - Unions (abstract class with visitors)
-- - Variables (local variables or fields)
-- - Let bindings (variable declarations)
-- - Case statements (visitor pattern)
Hydra types map to language types:
-- Java example
encodeType :: InferenceContext -> Graph -> Aliases -> Type -> Either Error Java.Type
-- Maps:
-- TypeRecord → Java Class
-- TypeUnion → Abstract class with subclasses
-- TypeLambda → Generic type parameter
-- TypeForall → Java generics with bounds
-- TypeFunction → Java functional interfaces
-- TypeList → List<T>
-- TypeMap → Map<K, V>
-- TypeOptional → Optional<T>
Complete module transformation:
-- Java example from hydra-java's Coder.hs
moduleToJava :: InferenceContext -> Graph -> Module -> Either Error (M.Map FilePath String)
moduleToJava cx g mod = do
  -- Extract types from module
  types <- getTypes cx g mod
  -- Generate class for each type
  classes <- traverse (typeToJavaClass cx g mod) types
  -- Generate package structure
  let packagePath = moduleNameToPath (moduleName mod)
  -- Map file paths to source code
  pure $ M.fromList $ map (\cls ->
    (packagePath </> className cls <.> "java",
     renderJavaClass cls)) classes
Adapters handle type compatibility between languages:
-- Core adapter functions
languageAdapter :: InferenceContext -> AdapterContext -> Language -> Type
  -> Either Error (Adapter Type Type Term Term)
adaptTypeForLanguage :: InferenceContext -> AdapterContext -> Language -> Type
  -> Either Error Type
termAdapter :: InferenceContext -> AdapterContext -> Type
  -> Either Error (Adapter FieldType FieldType Field Field)
Adapter composition:
composeCoders :: Coder v1 v2 -> Coder v2 v3 -> Coder v1 v3
constructCoder :: InferenceContext -> AdapterContext -> Language -> Type
  -> Either Error (Coder Term Term)
Module transformation pipeline:
- Extract all elements as TypeApplicationTerms
- Gather unique types
- Construct coders for each type (via adapters)
- Pass coders to the module constructor
- Generate output files
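
To make the idea of composing bidirectional coders concrete, here is a minimal Python sketch. It is an illustration only: the `Left`/`Right` classes stand in for Either Error, the InferenceContext parameter is omitted for brevity, and `compose_coders` is a hypothetical analogue of composeCoders rather than its actual definition.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass(frozen=True)
class Left:
    """Failure case, standing in for `Left Error`."""
    error: str


@dataclass(frozen=True)
class Right:
    """Success case, standing in for `Right value`."""
    value: Any


Either = Union[Left, Right]


@dataclass(frozen=True)
class Coder:
    """A bidirectional transformation; each direction may fail."""
    encode: Callable[[Any], Either]
    decode: Callable[[Any], Either]


def bind(result: Either, f: Callable[[Any], Either]) -> Either:
    """Apply f to a successful result; pass a failure through unchanged."""
    return f(result.value) if isinstance(result, Right) else result


def compose_coders(first: Coder, second: Coder) -> Coder:
    """Chain two coders: encode runs first then second, decode runs in reverse."""
    return Coder(
        encode=lambda v: bind(first.encode(v), second.encode),
        decode=lambda v: bind(second.decode(v), first.decode),
    )
```

The decode direction runs the coders in the opposite order, and a failure in either stage short-circuits the rest of the chain.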
Each language defines its capabilities:
data Language = Language {
  languageName :: LanguageName,
  languageConstraints :: LanguageConstraints
}
data LanguageConstraints = LanguageConstraints {
  languageConstraintsEliminationVariants :: S.Set EliminationVariant,
  languageConstraintsLiteralVariants :: S.Set LiteralVariant,
  languageConstraintsFloatTypes :: S.Set FloatType,
  languageConstraintsFunctionVariants :: S.Set FunctionVariant,
  languageConstraintsIntegerTypes :: S.Set IntegerType,
  languageConstraintsTermVariants :: S.Set TermVariant,
  languageConstraintsTypeVariants :: S.Set TypeVariant,
  languageConstraintsTypes :: Type -> Bool -- Custom type predicate
}
Key features of the Java coder:
- Generic type parameter handling
- Visitor pattern for union elimination
- Serialization support (JSON/Avro)
- Let-binding flattening with recursive variable detection
- Symbol classification (constant, nullary, unary, local variable)
-- Java/Coder.hs (line 715-723)
TermUnion (Injection name (Field (Name fname) v)) -> do
  let (Java.Identifier typeId) = nameToJavaName aliases name
  let consId = Java.Identifier $ typeId ++ "." ++ sanitizeJavaName (capitalize fname)
  args <- if EncodeCore.isUnitTerm v
    then return []
    else do
      ex <- encode v
      return [ex]
  return $ javaConstructorCall (javaConstructorName consId Nothing) args Nothing
Key features of the Python coder:
- Metadata gathering for imports
- Type variable tracking
- Case statement deduplication
- Walrus operator for let-bindings (Python 3.8+)
- Inline type parameters (Python 3.12+)
- Automatic casting for polymorphic values
Recent fix for Issue #206:
-- Python/Coder.hs (Term inject case, in DSL form)
_Term_inject>>: "inj" ~>
  "tname" <~ Core.injectionTypeName (var "inj") $
  "field" <~ Core.injectionField (var "inj") $
  "rt" <<~ (Resolution.requireUnionType @@ var "cx" @@ (pythonEnvironmentGetGraph @@ var "env") @@ var "tname") $
  Logic.ifElse (Predicates.isEnumRowType @@ var "rt")
    (projectFromExpression (pyNameToPyExpression (encodeNameQualified @@ var "env" @@ var "tname"))
      (encodeEnumValue @@ var "env" @@ Core.fieldName (var "field")))
    -- Omit argument for unit-valued variants (resolves #206)
    ...
The Serde.hs files bridge language AST to formatted source code:
Java Serde (~600+ lines)
- Java AST → formatted Java source
- Comment preservation
- Import organization
Python Serde (~400+ lines)
- Python AST → formatted Python source
- Indentation and block structure
- Quote styles and escaping
Conventions shared across all Serde modules:
- Naming. Every per-syntax-element writer is named *ToExpr (e.g. expressionToExpr, statementToExpr, typeToExpr), uniformly across Java, Python, Haskell, Scala, the four Lisp dialects, Cpp, TypeScript, Rust, Go, and the Pegasus / GraphQL / Protobuf / JsonSchema / RDF / Graphviz extension coders.
- Layout. Writers compose output through shared helpers in hydra.serialization: chooseLayout selects between vertical and horizontal forms by measured width, parenListAdaptive / commaSepAdaptive / spaceSepAdaptive lay out punctuated lists, and the canonical line-length budget is maxLineWidth = 120. Per-language Serde files call into these helpers and contribute only the language-specific token emission.
- The Yaml writer is the sole exception: it returns String directly rather than going through the Expr layer. (Yaml's whitespace-sensitive layout doesn't fit the adaptive framework cleanly.)
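
The width-based choice between horizontal and vertical forms might look roughly like the Python sketch below. It works on plain strings rather than an Expr layer, and `comma_sep_adaptive` with its `indent` parameter is a hypothetical simplification, not the signature of commaSepAdaptive; only the 120-column budget comes from the text above.

```python
from typing import List

MAX_LINE_WIDTH = 120  # the line-length budget named in the text


def comma_sep_adaptive(
    items: List[str], indent: int = 0, max_width: int = MAX_LINE_WIDTH
) -> str:
    """Lay items out on one line if they fit, otherwise one item per line."""
    horizontal = ", ".join(items)
    if indent + len(horizontal) <= max_width:
        return horizontal
    # Too wide: fall back to the vertical form, indenting continuation lines.
    pad = " " * indent
    return (",\n" + pad).join(items)
```

Keeping the decision in one shared helper means each language's writer only supplies the items, which is the division of labor the conventions above describe.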
Hydra is self-hosting: it defines its own type system and can regenerate itself.
Hydra's source modules are divided into type modules and term modules. Type modules define data models — the types that make up Hydra's internal representation. Term modules provide the logic and procedural aspect — the functions that operate on those types. This distinction applies throughout, not just to the kernel.
The modules compiled in the Haskell head are aggregated in Hydra.Sources.All
(kernel + Haskell coder + JSON) and Hydra.Sources.Ext (all extension coders):
- Kernel type modules (kernelTypesModules). Hydra's internal data model: the core type system (hydra.core), graph and package structures (hydra.graph, hydra.packaging), and supporting types like hydra.typing, hydra.coders, hydra.query, hydra.tabular, etc. Hand-written DSL definitions in Hydra.Sources.Kernel.Types.*.
- Kernel term modules (kernelTermsModules). The logic of Hydra: type inference, type checking, term reduction, rewriting, code generation, etc. Hand-written DSL definitions in Hydra.Sources.Kernel.Terms.*. Also includes the encoder/decoder source modules (see below). For the high-level framing of how inference and checking cooperate, see the Inference wiki page; this section covers only the build-system mechanics.
- Haskell modules (haskellModules). Both type modules (the Haskell AST model) and term modules (the Haskell coder, serializer, and utilities). These are specific to hydra-haskell and enable Haskell code generation.
- JSON modules (jsonModules). The JSON data model (type module) along with the JSON coder, parser, and writer (term modules).
- Other modules (otherModules). Currently the YAML model and coder utilities.
- Test modules (testModules). The common test suite, compiled into each target language as part of the sync process. Defined separately from mainModules.
Encoder/decoder source modules are a special category of term modules that are
generated from the type modules rather than hand-written. For each kernel type module
(e.g., hydra.core), a pair of modules is generated that can encode objects of that type
as Hydra Terms and decode them from Terms. These live in Hydra.Sources.{Encode,Decode}.*
and are included in kernelTermsModules alongside the hand-written term modules.
The full set is composed as:
mainModules = kernelModules ++ haskellModules ++ jsonModules ++ otherModules
kernelModules = kernelTypesModules ++ kernelTermsModules ++ jsonModules
kernelTermsModules = kernelPrimaryTermsModules -- hand-written logic modules
The encode/decode modules (hydra.encode.*, hydra.decode.*) are synthesized
in-memory at runtime by generateEncoderModules/generateDecoderModules (#448)
and injected into the driver's universe before inference runs.
They are no longer shipped as dist/haskell/.../Sources/{Encode,Decode}/*.hs files.
All modules in mainModules — regardless of category — go through the same code generation
pipeline: writeHaskell (or writeJava, writePython) compiles them from Hydra module
definitions into executable code in the target language.
The encoder/decoder source modules require a special staging step because they are derived
from the type modules rather than hand-written. The sync script (sync-haskell.sh) handles
this with an initial generation pass, followed by a source module generation step, followed
by a second generation pass.
Because these derived modules are produced mechanically from a known type, the synthesizer
is the authority on their types. Each derived Source module contains a single module_
TermDefinition whose term has type hydra.packaging.Module. The synthesizer
(moduleToSourceModule in Hydra.Sources.Kernel.Terms.Generation) must set
termDefinitionTypeScheme = Just (TypeScheme [] (TypeVariable "hydra.packaging.Module") Nothing)
on that binding, so downstream consumers can skip type inference rather than re-derive it
from the term's large encoded structure. Leaving the field as Nothing forces a full
inferModulesIO pass per derived module — manageable locally but memory-prohibitive on
typical CI runners. See #367 for
the case where this invariant was violated.
Phases:
| Phase | What it does |
|---|---|
| 1 | Compile mainModules into executable Haskell (initial pass) |
| 2–3 | Generate universal test cases and eval lib |
| 4 | Generate encoder/decoder source modules from kernelTypesModules |
| 5 | Recompile mainModules into executable Haskell (picking up the new source modules) |
| 6 | Export and verify JSON kernel |
| 7 | Run tests |
Phase 5 is necessary because the encoder/decoder source modules generated in phase 4 are
part of kernelTermsModules and therefore mainModules. They need to be compiled into
executable code just like every other module. A stack build between phases 4 and 5
ensures the Haskell compiler picks up the newly generated source files.
- writeHaskell / writeJava / writePython: Compile Hydra modules into executable code in the target language. Signature: FilePath -> [Module] -> [Module] -> IO () (output directory, universe modules for resolution, modules to generate).
- writeDecoderSourceHaskell / writeEncoderSourceHaskell: Generate encoder/decoder source modules (Hydra module definitions) from type modules. Used in phase 4.
- writeDecoderHaskell / writeEncoderHaskell: Convenience functions that generate encoder/decoder modules and immediately compile them to executable Haskell in one step.
For detailed context on encoder/decoder modules, see Issue #47: Per-Type Term Coders.
This section covers how Hydra's build system runs inference at scale — caching,
incremental skipping, per-package iteration. For what inference itself does
(HM with elaboration to typed System F, the two cooperating modules
hydra.inference and hydra.checking, the Graph as inference context), see
the Inference wiki page.
inferModulesGiven (in Hydra.Codegen) takes a universe and a target set
and re-infers only the relevant subset. Bindings in the target modules or
in the transitive term-dependency closure that lack a pre-attached
TypeScheme are fed to inferGraphTypes; clean non-target bindings are
left untouched, and their cached schemes are consulted during inference
via graphBoundTypes. Equivalent to inferModules when nothing in the
universe carries a scheme, which is today's default path.
A content-hash cache (Hydra.Digest) sits on top: writeModulesJson
computes SHA-256 hashes of kernel DSL source files and short-circuits
inference and writes entirely if every hash matches the stored digest and
every target JSON file exists. Digest files live under the gitignored
build-cache subtree at dist/json/<pkg>/build/<main|test>/digest.json
and dist/json/build/digest.json (see #247
and #379 for the
build/ layout rationale).
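
A content-hash short-circuit of this kind can be sketched in a few lines of Python. This is an assumption-laden illustration: the helper names `compute_digest`, `is_fresh`, and `record_digest` are hypothetical, and the digest file format (a flat JSON map from path to hash) is invented for the example rather than taken from Hydra.Digest.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def compute_digest(sources: List[Path]) -> Dict[str, str]:
    """Map each source path to the SHA-256 hash of its bytes."""
    return {str(p): hashlib.sha256(p.read_bytes()).hexdigest() for p in sorted(sources)}


def is_fresh(sources: List[Path], targets: List[Path], digest_file: Path) -> bool:
    """True only if every target exists and every source hash matches the stored digest."""
    if not digest_file.exists():
        return False
    if not all(t.exists() for t in targets):
        return False
    stored = json.loads(digest_file.read_text())
    return stored == compute_digest(sources)


def record_digest(sources: List[Path], digest_file: Path) -> None:
    """Store the current hashes; intended to be called after a successful run."""
    digest_file.parent.mkdir(parents=True, exist_ok=True)
    digest_file.write_text(json.dumps(compute_digest(sources), indent=2))
```

A caller would check `is_fresh` first and skip both inference and writes when it returns true, recording a new digest only once the regenerated outputs are on disk.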
When the cache misses or the dirty set is too large to fit in one
inferModulesGiven call, Phase 1 falls back to a per-package iterative
driver (Hydra.Generation.inferAndWriteByPackage and its seeded variant
inferAndWriteByPackageSeeded). The driver topologically sorts the
package dep graph from each packages/<pkg>/package.json's
dependencies field, then for each package in order runs a
Generation-side wrapper inferModulesGivenSchemes over only that
package's modules — with the typed-so-far universe merged into the
inference graph as (Name, TypeScheme) maps rather than full Module
values. Each iteration writes the focus package's JSON to disk
immediately, which forces the inferred TypeSchemes through serialization
and breaks any lazy thunk chain across iterations.
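
The shape of the per-package loop can be sketched in Python using the standard library's topological sorter. Everything here is a simplified assumption: `infer_by_package`, the callback parameters, and the use of strings for type schemes are hypothetical, and the real driver works on Module values and JSON files.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Hypothetical stand-in: a binding name mapped to a printed type scheme.
Schemes = Dict[str, str]


def infer_by_package(
    deps: Dict[str, List[str]],
    infer_package: Callable[[str, Schemes], Schemes],
    write_package: Callable[[str, Schemes], None],
) -> Schemes:
    """Visit packages dependencies-first, carrying only type schemes forward."""
    schemes: Schemes = {}
    # deps maps each package to the packages it depends on, so static_order()
    # yields every dependency before its dependents.
    for pkg in TopologicalSorter(deps).static_order():
        inferred = infer_package(pkg, schemes)
        # Write immediately so nothing from this iteration is held lazily.
        write_package(pkg, inferred)
        schemes.update(inferred)
    return schemes
```

The accumulator holds only name-to-scheme entries, which is the property that keeps per-iteration memory independent of earlier packages' term bodies.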
Two entry points use the same driver:
- The cold-cache fallback in writeModulesJsonPackageSplit calls inferAndWriteByPackage with empty seed maps and an empty schema context. Every module flows through the per-package loop.
- The warm-cache incremental path in tryIncrementalInference calls inferAndWriteByPackageSeeded with the JSON-loaded clean modules' (Name, TypeScheme) pairs as the seed maps (one for term bindings, one for type-def schemas) and the clean modules themselves as a schema-context-only set used to build the JSON-write schemaMap once up front. After that one-shot build, the clean modules are unreferenced and can be GC'd; the iteration carries only the seed maps + dirty modules forward.
Peak memory per iteration is bounded by type-schemes of transitive
deps + bindings of the focus package, not by every prior module's
full payload (which is what the original [Module] accumulator
retained). A TypeScheme is typically 1-3 orders of magnitude smaller
than the term body it types, so this is what keeps Phase 1 within the
-M6G CI heap cap on a wholly dirty universe (e.g. after a kernel-wide
rename invalidates every module name's digest). See
#381 and
Phase 1's memory envelope
in the build-system doc for the wall-time trade-off and the dead-end
per-SCC attempt that preceded it.
One subtlety: DSL-authored TypeDefinitions encode polymorphism as
nested TypeForall wrappers inside typeSchemeBody (with empty
typeSchemeVariables). The kernel's schemaGraphToTypingEnvironment
unwraps these at schema-graph lookup time; when we bypass the schema
graph and inject TypeSchemes directly into graphSchemaTypes, we have
to apply the same normalization (normalizeTypeScheme) ourselves —
otherwise downstream consumers that pattern-match on the body shape
(e.g. expecting record{...}) hit UnexpectedShape errors against
the raw ∀.∀.…record{...} form.
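
One plausible reading of this normalization is that leading forall binders are hoisted out of the body into the scheme's variable list. The Python sketch below models that reading only; the `Forall`, `Record`, and `TypeScheme` classes are hypothetical stand-ins, and the actual behavior of normalizeTypeScheme may differ in detail.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Forall:
    """A universally quantified type: forall variable. body."""
    variable: str
    body: Any


@dataclass(frozen=True)
class Record:
    """A record type, reduced here to its field names."""
    fields: Tuple[str, ...]


@dataclass(frozen=True)
class TypeScheme:
    variables: Tuple[str, ...]
    body: Any


def normalize_type_scheme(ts: TypeScheme) -> TypeScheme:
    """Move leading Forall wrappers from the body into the variable list."""
    variables = list(ts.variables)
    body = ts.body
    while isinstance(body, Forall):
        variables.append(body.variable)
        body = body.body
    return TypeScheme(tuple(variables), body)
```

After this step a consumer that pattern-matches on the body sees the record directly instead of a chain of quantifiers.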
The Java and Python native DSL → JSON pipelines (heads/java/.../Generation.java
and heads/python/.../generation.py) mirror the same driver shape in
their host language so the native generators (UpdateJavaJson,
update-python-json.py) hit the same per-package memory envelope —
relevant when those pipelines grow to cover more than their own one
package.
DSL defines Hydra → Generates code for Hydra
↓ ↓
But generator needs Code generation
to understand DSL requires understanding
the new DSL constructs!
CIRCULAR DEPENDENCY!
When adding new features (like Either type):
Add to core types in Core.hs:
def "Term" $
  union [
    -- ... existing variants
    "either">: Types.either_ (core "Term") (core "Term"),
    -- ...
  ]
Add DSL operations in Phantoms.hs:
either_ :: TypedTerm (a -> c) -> TypedTerm (b -> c) -> TypedTerm (Either a b) -> TypedTerm c
stack build
# Error: Generator doesn't understand 'either' yet
Hand-translate DSL definitions to Haskell in generated files:
-- Manually edit: dist/haskell/hydra-kernel/src/main/haskell/Hydra/Inference.hs
inferTypeOfEither :: InferenceContext -> Graph -> Either Term Term -> Either Error InferenceResult
inferTypeOfEither cx graph (Left left) = do
  leftResult <- inferType cx graph left
  let leftType = inferenceResultType leftResult
  let cx2 = inferenceResultContext leftResult
  return $ InferenceResult cx2 (TypeUnion [leftType, typeAny])
inferTypeOfEither cx graph (Right right) = do
  rightResult <- inferType cx graph right
  let rightType = inferenceResultType rightResult
  let cx2 = inferenceResultContext rightResult
  return $ InferenceResult cx2 (TypeUnion [typeAny, rightType])
stack build
# Success! Generator now understands Either
bin/sync-haskell.sh
# Regenerates DSL → JSON → Haskell (Phase 1 of the sync pipeline).
# Replaces the retired hydra-ext-debug exec.
stack build
# Self-hosting loop complete!
heads/haskell/src/main/haskell/Hydra/
├── Dsl/ # DSL definitions (manual)
├── Lib/ # Native implementations (manual)
├── Generation.hs # Code-gen driver (manual)
├── ExtGeneration.hs # Driver for ext-language coders (manual)
└── Haskell/Generation.hs # Haskell-specific coder driver (manual)
packages/hydra-kernel/src/main/haskell/Hydra/
└── Sources/ # Kernel DSL-based specifications (manual)
├── Kernel/Types/ # Type modules (data shapes)
├── Kernel/Terms/ # Term modules (kernel functions)
├── Kernel/Lib/ # Primitive registry: PrimitiveDefinition per hydra.lib.<sub> module name
└── Test/ # Common test suite
packages/hydra-<lang>/src/main/haskell/Hydra/
└── Sources/<Lang>/ # Per-language coder DSL sources (manual)
dist/haskell/hydra-kernel/src/main/haskell/ # Generated kernel code
├── Hydra/
│ ├── Core.hs # Generated Core types
│ ├── Variants.hs # Generated Variants types
│ ├── Inference.hs # Generated type inference
│ ├── Checking.hs # Generated type checking
│ └── ... # All kernel modules
dist/haskell/hydra-haskell/src/main/haskell/Hydra/Haskell/Coder.hs # Haskell coder
dist/haskell/hydra-java/src/main/haskell/Hydra/Java/Coder.hs # Java coder
dist/haskell/hydra-python/src/main/haskell/Hydra/Python/Coder.hs # Python coder
dist/haskell/hydra-scala/src/main/haskell/Hydra/Scala/Coder.hs # Scala coder
dist/haskell/hydra-lisp/src/main/haskell/Hydra/Lisp/Coder.hs # Lisp coder (4 dialects)
dist/haskell/hydra-pg/src/main/haskell/Hydra/Pg/ # Property graphs
dist/haskell/hydra-rdf/src/main/haskell/Hydra/Rdf/ # RDF / SHACL
dist/haskell/hydra-ext/src/main/haskell/Hydra/ # Long-tail (Avro, Protobuf, GraphQL, ...)
Hydra's modular architecture provides clear extension points for adding new functionality. For detailed step-by-step guides, see the Developer Recipes.
Primitive functions: Add new standard library functions by declaring a PrimitiveDefinition
in the appropriate Hydra/Sources/Kernel/Lib/<Sub>.hs module, adding native implementations in
each host, and regenerating code for all target languages.
See the
Adding primitives recipe.
Core types: Extend the kernel type system by adding new type definitions to Core.hs, updating DSL constructors,
and following the bootstrap process to regenerate the system.
See the
Extending Hydra Core recipe.
Target languages: Add support for new programming languages by implementing a coder (term/type encoding),
serializer (AST to text), and language constraint definitions in the appropriate package under packages/.
Standard libraries: Create new library modules by defining types in Sources/Kernel/Types/,
implementing native functions in Lib/, registering primitives, and creating DSL wrappers.
packages/hydra-kernel/src/main/haskell/Hydra/Sources/Kernel/Types/
├── Core.hs # hydra.core - foundation
├── Variants.hs # hydra.variants - metadata
├── Coders.hs # hydra.coders - Coder, Adapter, Language
├── Graph.hs # hydra.graph - primitives
├── Packaging.hs # hydra.packaging - modules, namespaces, packages
├── Typing.hs # hydra.typing - inference results
└── ... # see Hydra.Sources.Kernel.Types.All for the full list
heads/haskell/src/main/haskell/Hydra/Dsl/
├── Terms.hs # Untyped term DSL
├── Types.hs # Untyped type DSL
├── Phantoms.hs # Phantom-typed DSL
├── Meta/Terms.hs # Term-encoded terms
├── Core.hs # High-level constructors
├── Bootstrap.hs # Bootstrapping utilities
└── Lib/ # Library DSLs
├── Lists.hs
├── Eithers.hs
└── ...
(#418: three of these DSL-support modules — Dsl/Terms.hs, Dsl/Literals.hs, and
Dsl/Meta/Common.hs — are part of the hydra-kernel distribution runtime and have
moved to overlay/haskell/hydra-kernel/src/main/haskell/Hydra/Dsl/. The rest of
Hydra/Dsl/ remains head-only.)
packages/hydra-kernel/src/main/haskell/Hydra/Sources/Kernel/Lib/ — Canonical primitive registry (one PrimitiveDefinition-emitting module per hydra.lib.<sub> module name)
overlay/haskell/hydra-kernel/src/main/haskell/Hydra/Haskell/Lib/ — Native Haskell implementations (relocated here from the head by #418)
overlay/haskell/hydra-kernel/src/main/haskell/Hydra/Dsl/Libraries.hs — Host-side bindings (pairs each native impl with a name derived from its PrimitiveDefinition via prim1/prim2/prim3; relocated here + name-derivation by #473)
Sources/Kernel/Lib/
├── Math.hs
├── Lists.hs
└── ...
overlay/haskell/hydra-kernel/.../Hydra/Haskell/Lib/
├── Math.hs
├── Lists.hs
└── ...
Per-language DSL sources live under packages/hydra-<lang>/src/main/. Most are authored in Haskell
(.../haskell/Hydra/Sources/<Lang>/); hydra-java and hydra-python are authored host-natively in
Java and Python (.../{java,python}/hydra/sources/):
- packages/hydra-haskell/src/main/haskell/Hydra/Sources/Haskell/
- packages/hydra-java/src/main/java/hydra/sources/java/: Java coder, authored in Java (host-native sole source of truth; the Haskell DSL copy was deleted in #346)
- packages/hydra-python/src/main/python/hydra/sources/python/: Python coder, authored in Python (host-native sole source of truth; the Haskell DSL copy was deleted in #346)
- packages/hydra-scala/src/main/haskell/Hydra/Sources/Scala/
- packages/hydra-lisp/src/main/haskell/Hydra/Sources/Lisp/
- packages/hydra-pg/src/main/haskell/Hydra/Sources/: property graphs (Pg, Cypher, Tinkerpop, Graphviz)
- packages/hydra-rdf/src/main/haskell/Hydra/Sources/: RDF, SHACL, OWL, ShEx, XML schema
- packages/hydra-ext/src/main/haskell/Hydra/Sources/: long-tail: Avro, Cpp, Csharp, Datalog, Geojson, Go, GraphQL, JsonSchema, Pegasus, Protobuf, Rust, Yaml, ...
- packages/hydra-bench/src/main/haskell/Hydra/Sources/Bench/: synthetic inference benchmark workloads (opt-in via bin/sync-bench.sh)
- packages/hydra-coq/src/main/haskell/Hydra/Sources/: Coq
- packages/hydra-typescript/src/main/haskell/Hydra/Sources/TypeScript/: TypeScript
Generated coder output lands under dist/haskell/<pkg>/ for each source package.
The long-tail dist/haskell/hydra-ext/
tree is frozen (targetLanguages: [] in
packages/hydra-ext/package.json)
and shipped as-is rather than regenerated by the sync matrix.
dist/haskell/hydra-kernel/src/main/haskell/ — Generated Haskell
dist/java/hydra-kernel/src/main/java/ — Generated Java
dist/python/hydra-kernel/src/main/python/ — Generated Python
Hydra's implementation demonstrates a sophisticated multi-layer architecture:
- Type modules define the core type system in a modular, dependency-aware manner
- DSLs provide multiple levels of abstraction for writing Hydra code with compile-time safety
- Primitives offer a comprehensive standard library with multi-language generation
- Coders transform Hydra definitions into multiple target languages systematically
- Bootstrap process enables self-hosting and gradual extension of the language
This architecture enables:
- Type-safe code generation across languages
- Self-modifying compiler capabilities
- Systematic addition of new features
- Clear separation of concerns
- Maintainable and extensible codebase
The combination of Haskell's type system, phantom types, and careful layering creates a robust foundation for a multi-language transformation framework.
Hydra uses a combination of shell script wrappers (in bin/ directories) and Stack executables
for code generation and synchronization. The main sync scripts orchestrate the individual executables
in the correct order; the individual scripts and executables are useful during development when you
need to rerun a single phase.
For how these fit into the release workflow, see docs/release-workflow.md (procedure) and the release policy (wiki).
The sync system is organized in three layers under per-package dist trees
(dist/<lang>/<pkg>/src/main/<lang>/...):
- Layer 1 (transforms): per-language transform-json-to-<lang>.sh scripts in heads/haskell/bin/ convert the JSON universe into source files for one target.
- Layer 2 (assemblers): per-language heads/<lang>/bin/assemble-all.sh scripts produce complete per-package distributions for one target in batch mode (one Haskell universe load per target).
- Layer 2.5 (testers): per-language heads/<lang>/bin/test-*.sh scripts compile and test the assembled distributions.
- Layer 3 (orchestrators): top-level bin/sync*.sh scripts compose the above across hosts and targets.
Each layer caches its work via per-package digest files; warm-cache runs short-circuit in seconds. See Code generation for the full workflow.
Warm bin/sync.sh runs complete in a few seconds when no inputs have changed.
The short-circuits, from coarsest to finest:
| Layer | Gate | Cache location | Skips when … |
|---|---|---|---|
| Top-level Phase 1 skip | bin/lib/check-phase1-fresh.py | heads/haskell/.stack-work/phase1-input-cache.txt | DSL sources + heads/haskell/src/** + heads/haskell/package.yaml + heads/haskell/stack.yaml + sync-haskell.sh content-hash unchanged since last green sync |
| Step 3 (verify) | sync-haskell.sh coarse skip | heads/haskell/.stack-work/verify-json-kernel-cache.txt | dist/json/hydra-kernel/**.json + verify-json-kernel source content-hash unchanged |
| Step 3 (verify, per-module) | verify-json-kernel exec | heads/haskell/.stack-work/verify-json-kernel-per-module-cache.json | Each module's JSON file content-hash matches its last green-verify record |
| Step 4 (generate Haskell) | sync-haskell.sh coarse skip | heads/haskell/.stack-work/bootstrap-from-json-cache.txt | dist/json/**.json + bootstrap-from-json source content-hash unchanged |
| Step 6 (stack test) | sync-haskell.sh coarse skip | heads/haskell/.stack-work/haskell-test-cache.txt | Generated kernel + heads/haskell/src/{main,test}/**.hs + package.yaml + stack.yaml content-hash unchanged |
| Layer 2/2.5 per-package | digest-check | dist/<lang>/<pkg>/build/<set>/digest.json | Per-package input digest matches recorded digest |
| Layer 2.5 per-target tests | bin/lib/test-cache.sh | dist/<lang>/test-cache.json | Every source + test helper + runner content-hash unchanged since last green run |
All caches are content-hash based (not mtime). Touching or re-saving a file without changing its bytes does not invalidate any cache; any byte-level change, however small, does. Caches are stamped only after a fully green run, so a failed run does not poison the cache.
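The gating behavior described above can be illustrated with a minimal Python sketch. It is not Hydra's actual implementation: the function names `content_digest` and `run_cached`, the use of SHA-256, and the JSON cache layout are all assumptions made for illustration. It shows only the two properties stated in the text: the gate compares content hashes rather than mtimes, and the cache is stamped only after the step succeeds.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Iterable


def content_digest(paths: Iterable[Path]) -> str:
    """Hash file contents (never mtimes) in a stable, sorted order."""
    h = hashlib.sha256()
    for p in sorted(paths):
        h.update(str(p).encode("utf-8"))  # include the path so renames count
        h.update(b"\0")
        h.update(p.read_bytes())
        h.update(b"\0")
    return h.hexdigest()


def run_cached(inputs: Iterable[Path], cache_file: Path,
               step: Callable[[], bool]) -> str:
    """Skip `step` when the digest matches; stamp only after success."""
    digest = content_digest(list(inputs))
    if cache_file.exists():
        recorded = json.loads(cache_file.read_text()).get("digest")
        if recorded == digest:
            return "skipped"
    if not step():
        return "failed"  # cache left untouched, so the next run retries
    # Hypothetical cache layout: a single JSON object holding the digest.
    cache_file.write_text(json.dumps({"digest": digest}))
    return "ran"
```

Because a failed step returns before the stamp is written, the next invocation sees a stale or missing digest and reruns the step.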
The hydra-ext tree has targetLanguages: ["python"], so the haskell
sync matrix does not include it in batch mode (assemble-all.sh omits
--include-ext). To regenerate dist/haskell/hydra-ext/ after a source
DSL change in packages/hydra-ext/Sources/, run
heads/haskell/bin/assemble-distribution.sh hydra-ext directly.
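As a rough illustration of why hydra-ext drops out of the Haskell batch run, the following Python sketch filters packages by their declared target languages. The function `packages_for_target` is hypothetical, and so is the rule that a package with no targetLanguages declaration is built for every target; only the hydra-ext entry comes from the text above.

```python
from typing import Dict, List, Optional


def packages_for_target(
    target: str,
    target_languages: Dict[str, Optional[List[str]]],
) -> List[str]:
    """Return the packages a batch run for `target` would include.

    Assumption: None means "no restriction declared", i.e. every target.
    """
    return sorted(
        pkg for pkg, langs in target_languages.items()
        if langs is None or target in langs
    )


packages = {
    "hydra-kernel": None,     # hypothetical: no restriction declared
    "hydra-ext": ["python"],  # as stated in the text above
}
print(packages_for_target("haskell", packages))  # ['hydra-kernel']
print(packages_for_target("python", packages))   # ['hydra-ext', 'hydra-kernel']
```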
| Script | Purpose |
|---|---|
| sync-all.sh | Full sync. Run the complete matrix (Phase 1 JSON build + Phase 2 per-target assemble + Phase 3 test). Supports --no-tests. |
| sync.sh | Scoped sync. Run a chosen host/target subset via --hosts <list> --targets <list>. |
| sync-default.sh | Shortcut for the haskell/java/python bootstrapping triad. |
| sync-packages.sh | Per-package sync. Bring one or more packages/<pkg>/ trees into sync with their dist/ outputs across all targets. Symmetric to sync.sh but scoped by package rather than (host, target). |
| sync-java.sh, sync-python.sh, sync-scala.sh | Per-language wrappers (host == target). |
| sync-clojure.sh, sync-common-lisp.sh, sync-emacs-lisp.sh, sync-scheme.sh | Per-Lisp-dialect wrappers. |
| regenerate-lexicon.sh | Regenerate docs/hydra-lexicon.txt from the Haskell kernel. On-demand / pre-release (not part of regular sync). |
| prepare-release.sh | Cross-implementation pre-release preparation: verification + upload-ready sdist/docs. |
Shell script wrappers live in heads/haskell/bin/. Executables without shell wrappers are run via stack exec <name>.
| Script / Executable | Purpose |
|---|---|
| sync-haskell.sh | Phase 1 sync. Regenerate DSL → JSON, the Haskell kernel, and run stack test. The lexicon is no longer refreshed here; use bin/regenerate-lexicon.sh on demand. |
| assemble-all.sh | Layer 2 batch assembler. Produce Haskell distributions for every package in one bootstrap-from-json invocation. |
| assemble-distribution.sh <pkg> | Layer 2 single-package assembler for one Haskell target package. |
| transform-haskell-dsl-to-json.sh | Transform Haskell DSL sources into the JSON universe under dist/json/. |
| transform-json-to-haskell.sh | Transform JSON into Haskell source files. |
| transform-json-to-{java,python,scala,lisp,target}.sh | Layer 1 per-target transforms. |
| test-distribution.sh | Layer 2.5 tester for Haskell distributions. |
| update-json-{kernel,main,test,manifest}.sh | Export kernel / non-kernel / test / manifest modules to JSON. |
| verify-json-kernel.sh | Verify the JSON kernel round-trips correctly. |
| bootstrap-from-json | Hydrate target-language distributions from the per-package JSON exports (executable; supports --scoped, --all-packages, and flat modes). |
| digest-check | Inspect and refresh per-package digest files used for warm-cache short-circuits (executable). |
Each non-Haskell host mirrors the same shape:
| Script | Purpose |
|---|---|
| heads/<lang>/bin/assemble-all.sh | Batch Layer 2 assembler for this target. |
| heads/<lang>/bin/test-*.sh | Layer 2.5 tester: compile the assembled per-package distributions and run target-language tests. |
<lang> ∈ {java, python, scala}. Lisp follows a similar shape under
heads/lisp/bin/ (shared) and heads/lisp/<dialect>/bin/ for each of
clojure, common-lisp, emacs-lisp, and scheme.
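The directory convention above can be summarized in a small Python sketch. The helper `bin_dirs` is hypothetical and only restates the layout described in this appendix; it does not check that the directories exist.

```python
from typing import List

# The four Lisp dialects named in the text above.
LISP_DIALECTS = ("clojure", "common-lisp", "emacs-lisp", "scheme")


def bin_dirs(target: str) -> List[str]:
    """Return the script directories relevant to one target."""
    if target in LISP_DIALECTS:
        # Shared Lisp scripts first, then the dialect-specific ones.
        return ["heads/lisp/bin", f"heads/lisp/{target}/bin"]
    return [f"heads/{target}/bin"]


print(bin_dirs("java"))    # ['heads/java/bin']
print(bin_dirs("scheme"))  # ['heads/lisp/bin', 'heads/lisp/scheme/bin']
```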