
Hydra Implementation Overview

This document provides a detailed look at Hydra's implementation, from type modules to coders to primitives to DSLs. It complements the Concepts documentation by focusing on the concrete architecture and code organization rather than abstract foundations.

Prerequisites

Before reading this guide, you should:

  • Understand Hydra's core concepts (Concepts)
  • Be familiar with at least one of: Haskell, Java, or Python
  • Have Hydra cloned and built locally (see main README)

This guide is for:

  • Contributors who want to extend Hydra's kernel
  • Developers implementing new language coders
  • Anyone curious about Hydra's internal architecture

If you just want to use Hydra, start with Concepts and the main README instead.

Table of contents

  1. Architecture overview
  2. Type modules
  3. DSL system
  4. Primitive functions
  5. Variable resolution and graphs
  6. Cross-language compilation (coders)
  7. The bootstrap process
  8. Extending Hydra
  9. Appendix: Build scripts and executables

Architecture overview

Hydra is a strongly-typed functional programming language that executes in multiple language environments. By design, developers can write Hydra source code in any of the supported host languages (Haskell, Java, Python, Scala, Lisp) and cross-compile it to any other supported language. Hydra-Haskell serves as the source of truth for the Hydra kernel (the core type system and transformation infrastructure), but Hydra programs themselves can be written and executed in Java, Python, Scala, Lisp, or any other supported implementation.

The implementation follows a layered architecture:

┌──────────────────────────────────────────────────────────────┐
│                   Hydra Kernel (Source of Truth)             │
│  Type system: Term, Type, Module, Graph, primitives, etc.    │
│  Location:                                                   │
│    packages/hydra-kernel/src/main/haskell/Hydra/Sources/     │
│  Written using: Haskell DSLs                                 │
└────────────────────────┬─────────────────────────────────────┘
                         │ Defines
                         ▼
┌──────────────────────────────────────────────────────────────┐
│              Language Implementations (Peers)                │
│                                                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐           │
│  │   Haskell   │  │    Java     │  │   Python    │  ...      │
│  │  (bootstrap)│  │             │  │             │           │
│  └─────────────┘  └─────────────┘  └─────────────┘           │
│                                                              │
│  Each implementation provides:                               │
│  • Hydra type system runtime                                 │
│  • Primitive function implementations                        │
│  • Ability to execute Hydra programs                         │
│  • APIs for writing Hydra code in host language              │
└────────────────────────┬─────────────────────────────────────┘
                         │ Cross-compile via
                         ▼
┌──────────────────────────────────────────────────────────────┐
│         Coders (Cross-Language Transformations)              │
│  Transform Hydra modules between language implementations    │
│  DSL sources:                                                │
│  packages/hydra-<lang>/src/main/haskell/Hydra/Sources/<Lang>/│
│  Runtime driver:                                             │
│    heads/haskell/src/main/haskell/Hydra/ExtGeneration.hs     │
│  Enable: Write in Java, compile to Python (or vice versa)    │
└──────────────────────────────────────────────────────────────┘

Key design principles

  1. Multi-language by design: Hydra programs can be written in any supported host language and cross-compiled to others
  2. Unified type system: All implementations share the same Hydra kernel (types, primitives, semantics)
  3. Self-hosting: The Hydra kernel is defined in Hydra itself (using Haskell as the bootstrap language)
  4. Type safety: Multiple layers of static type checking (host language + Hydra type system)
  5. Modularity: Clean separation between kernel definition, language implementations, and cross-compilation
  6. Metadata over file-system discovery: The build pipeline operates on declared metadata (hydra.json, per-package package.json, in-DSL module manifests) and reads or writes files at known paths derived from that metadata. It does not scan the file system to discover what to do. Tools that walk a directory looking for "whatever's there" invert the source-of-truth relationship — the layout follows the tree instead of the tree following declarations — and silently drift when files are added, renamed, or hand-edited. When a build script needs to know which files to copy or process, the answer must come from a declaration, not a find walk.
  7. Per-package host code lives in bindings/: Handwritten host-language code tied to a specific Hydra package belongs under bindings/<host>/<artifact>/, not in heads/<host>/. Two flavors: (a) third-party adapters that wrap external libraries (e.g., hydra-rdf4j connects hydra.rdf.syntax.* to Eclipse rdf4j; hydra-neo4j provides ANTLR-based Cypher/GQL parsers); and (b) per-package host DSL helpers with no third-party deps (e.g., hydra-pg-dsl provides Java fluent builders for hydra.pg.{model,query}). Each binding is independently versioned and publishable; it depends on exactly one Hydra package (e.g., hydra-rdf4j depends on hydra-rdf). The binding tree is not part of the DSL pipeline — bindings don't appear in hydra.json's package list, aren't synced through bin/sync.sh, and aren't consumed by the bootstrap demo. They sit at the leaves of the dependency graph. This rule keeps heads/<host>/ runtimes minimal: language-independent Hydra runtime + stdlib + build tooling.
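As a rough illustration of principle 6, here is a minimal Python sketch. The manifest shape and the names `MANIFEST`, `module_path`, and `planned_outputs` are hypothetical, not Hydra's actual build scripts or hydra.json schema. The only point is that every output path is computed from a declared module name, so a stray file on disk can never enter the build.

```python
from pathlib import PurePosixPath

# Hypothetical manifest shape: a package name plus its declared module names.
# The real hydra.json / package.json schemas are not reproduced here.
MANIFEST = {
    "package": "hydra-kernel",
    "modules": ["hydra.core", "hydra.lib.logic", "hydra.lib.math"],
}


def module_path(root: str, module_name: str, ext: str) -> PurePosixPath:
    """Map a dotted module name to a path under root, e.g. hydra.lib.logic -> hydra/lib/logic.json."""
    parts = module_name.split(".")
    return PurePosixPath(root, *parts[:-1], parts[-1] + "." + ext)


def planned_outputs(manifest: dict, root: str, ext: str) -> list[PurePosixPath]:
    """Every output path comes from a declaration; nothing is discovered by walking root."""
    return [module_path(root, m, ext) for m in manifest["modules"]]
```

Because `planned_outputs` never lists a directory, renaming or hand-adding a file changes nothing until the declaration changes.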

Type modules

Type modules define Hydra's core type system. They are located in:

packages/hydra-kernel/src/main/haskell/Hydra/Sources/Kernel/Types/

Module organization

Hydra's kernel consists of ~20 type modules organized into logical categories. The canonical list is Hydra.Sources.Kernel.Types.All.kernelTypesModules; the descriptions below cover the main ones:

Core foundation

Core.hs - hydra.core module name (largest type module)

  • Central hub defining fundamental types: Term, Type, Literal, Function, Application, Lambda, Let, Record, Union, etc.
  • All other modules depend on Core directly or transitively
  • Special property: imports itself as a dependency

Variants.hs - hydra.variants module name

  • Supplements Core with metadata types NOT referenced by Core
  • Defines variant enums: TermVariant, TypeVariant, LiteralVariant, etc.
  • Provides introspection capabilities: Precision, Comparison

Packaging.hs - hydra.packaging module name

  • Defines the packaging model: Package, Module, Definition, ModuleName, ModuleDependency, PackageDependency, VersionSpecifier, and the metadata types EntityMetadata, LifecycleInfo, EntityReference, DefinitionReference, Version.
  • A Module carries a name :: ModuleName, an optional metadata :: Maybe EntityMetadata, a list of dependencies :: [ModuleDependency], and a list of definitions. A ModuleDependency is the depended-on module :: ModuleName plus an optional package :: Maybe PackageName.
  • For the conceptual model (entity metadata, lifecycle/versioning, cross-references), see the Packaging wiki page.
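The Module shape described in these bullets can be modeled, as a sketch only, with Python dataclasses. The class layout is an assumption that mirrors the Haskell record fields (Hydra's generated Python classes may differ). `unqualified_dep` is a hypothetical stand-in for the `unqualifiedDep` helper used in the DSL sources, assumed here to mean a dependency with no package qualifier.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ModuleName:
    value: str


@dataclass(frozen=True)
class ModuleDependency:
    module: ModuleName
    package: Optional[str] = None  # stands in for Maybe PackageName


@dataclass
class Module:
    name: ModuleName
    metadata: Optional[dict] = None  # stands in for Maybe EntityMetadata
    dependencies: list[ModuleDependency] = field(default_factory=list)
    definitions: list[str] = field(default_factory=list)  # definition names only, for brevity


def unqualified_dep(name: ModuleName) -> ModuleDependency:
    """Hypothetical analogue of unqualifiedDep: depend on a module without naming its package."""
    return ModuleDependency(module=name)
```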

Transformation framework

Coders.hs - hydra.coders module name

  • Defines Coder, Adapter, Bicoder, Language, LanguageConstraints, AdapterContext, TraversalOrder
  • The framework is Either-based; the former Flow monad was removed in #245

Graph and query

Graph.hs - hydra.graph module name

  • Extends core with graph operations
  • Defines: Graph, Primitive, TermCoder

Query.hs - hydra.query module name

  • Language-agnostic graph pattern queries
  • Triple patterns and path expressions

Type system support

Typing.hs - hydra.typing module name

  • Type inference and reconstruction
  • Type constraints and substitutions
  • TypeClass record (used by hydra.classes term bindings)

Note: hydra.classes is a term module, not a type module.

Error model

Errors.hs - hydra.errors module name and the Error/ subdirectory

  • Structured error types used by inference, checking, and coders

Parsing and path resolution

Parsing.hs - hydra.parsing module name

Paths.hs - hydra.paths module name

Data model helpers

Ast.hs - hydra.ast: common syntax tree for serializers

Tabular.hs - hydra.tabular: CSV/TSV data model (generic)

Utility and specialized

Testing.hs - hydra.testing: unit testing framework

Typed.hs - hydra.typed: typed (phantom) wrappers for DSL use

Relational.hs - hydra.relational: Codd's Relational Model

Topology.hs - hydra.topology: graph algorithms (Tarjan SCC)

Util.hs - hydra.util: misc utilities

Type definition patterns

All type modules follow a consistent structure:

module Hydra.Sources.Kernel.Types.ModuleName where

import Hydra.Kernel
import Hydra.Dsl.Bootstrap
import Hydra.Dsl.Types as Types
import qualified Hydra.Sources.Kernel.Types.Core as Core

module_ :: Module
module_ = Module {
    moduleName = ns,
    moduleDefinitions = DefinitionType <$> definitions,
    moduleDependencies = unqualifiedDep <$> [moduleName Core.module_],
    moduleDescription = Just description}
  where
    ns = ModuleName "hydra.namespace"
    description = "A one-line description of the module"
    core = typeref $ moduleName Core.module_
    def = datatype ns

    definitions = [
      def "TypeName1" $ doc "Description" $ definition1,
      def "TypeName2" $ doc "Description" $ definition2
      -- , further definitions
      ]

Example: Union Type (from Core.hs)

def "Term" $
  doc "A data term" $
  union [
    "annotated">: core "AnnotatedTerm",
    "application">: core "Application",
    "either">: Types.either_ (core "Term") (core "Term"),
    "function">: core "Function",
    "let">: core "Let",
    "list">: list $ core "Term",
    "literal">: core "Literal",
    "map">: Types.map (core "Term") (core "Term"),
    "optional">: optional $ core "Term",
    "pair">: Types.pair (core "Term") (core "Term"),
    "record">: core "Record",
    "set">: set $ core "Term",
    "typeApplication">: core "TypeApplicationTerm",
    "typeLambda">: core "TypeLambda",
    "union">: core "Injection",
    "unit">: T.unit,
    "variable">: core "Name",
    "wrap">: core "WrappedTerm"
  ]

Example: Record Type (from Packaging.hs)

def "Module" $
  doc "A logical collection of definitions sharing a module name" $
  record [
    "description">: optional string,
    "name">: packaging "ModuleName",
    "dependencies">: list $ packaging "ModuleDependency",
    "definitions">: list $ packaging "Definition"
  ]

Example: Generic Type (from Tabular.hs)

def "Table" $
  doc "A simple table with header and data rows" $
  forAll "v" $ record [
    "header">: optional $ tabular "HeaderRow",
    "data">: list (tabular "DataRow" @@ "v")
  ]

Example: Enum Type (from Variants.hs)

def "TermVariant" $
  doc "The identifier of a term constructor" $
  enum [
    "annotated", "application", "either", "function",
    "let", "list", "literal", "map", "optional",
    "pair", "record", "set", "typeApplication",
    "typeLambda", "union", "unit", "variable", "wrap"
  ]
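Because TermVariant has exactly one member per field of the Term union, the correspondence can be checked mechanically. A minimal Python sketch, assuming a hypothetical encoding of terms as (field_name, payload) pairs (this is not how Hydra-Python represents terms):

```python
from enum import Enum

# Field names of the Term union, in the order listed in the Core.hs example.
TERM_FIELDS = [
    "annotated", "application", "either", "function", "let", "list",
    "literal", "map", "optional", "pair", "record", "set",
    "typeApplication", "typeLambda", "union", "unit", "variable", "wrap",
]

# One enum member per union field; each member's value is the field name.
TermVariant = Enum("TermVariant", [(f, f) for f in TERM_FIELDS])


def term_variant(term: tuple) -> TermVariant:
    """Return the variant of a term encoded (hypothetically) as a (field_name, payload) pair."""
    tag, _payload = term
    return TermVariant(tag)
```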

Dependency graph

Core (hydra.core) - Foundation
  ├─ Variants   - Supplements with variants and introspection types
  ├─ Classes    - Typeclass metadata (Ord, Eq)
  ├─ Typing     - Type system support (inference results, schemes)
  ├─ Phantoms   - DSL phantom types
  ├─ Tabular    - Tabular data
  ├─ Query      - Graph queries
  ├─ Testing    - Test framework
  └─ Topology   - Graph algorithms

Core + supporting types
  ├─ Graph      - Extends core with graph operations
  ├─ Coders     - Language-transformation framework (Coder, Adapter, Language, ...)
  └─ Packaging  - Module, Definition, ModuleName, ModuleDependency, Package

Error model
  ├─ Errors     - Structured error types
  └─ Error/*    - Per-subsystem error types

Key properties:

  • No circular dependencies at type level
  • Clear separation: foundation (Core/Variants) vs. extensions
  • Layered architecture: Atomic → Composite → Integrative → Specialized
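The absence of cycles is what makes a dependency-ordered load possible. The sketch below uses Python's standard `graphlib.TopologicalSorter` on a hypothetical, simplified slice of the module graph (not Hydra's actual loader). Self-edges are dropped first because Core lists itself as a dependency; `graphlib` would otherwise treat that self-edge as a cycle.

```python
from graphlib import CycleError, TopologicalSorter

# Simplified, hypothetical slice of the type-module dependency graph.
DEPS = {
    "hydra.core": {"hydra.core"},  # Core lists itself as a dependency
    "hydra.variants": {"hydra.core"},
    "hydra.graph": {"hydra.core"},
    "hydra.coders": {"hydra.core", "hydra.graph"},
    "hydra.packaging": {"hydra.core"},
}


def load_order(deps: dict[str, set[str]]) -> list[str]:
    """Return modules in dependency order, ignoring self-edges; raise CycleError on a real cycle."""
    sorter = TopologicalSorter({m: ds - {m} for m, ds in deps.items()})
    return list(sorter.static_order())
```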

DSL system

Hydra uses embedded domain-specific languages (eDSLs) in Haskell to define its entire kernel. The DSL system provides multiple levels of abstraction for different use cases.

DSL module locations

heads/haskell/src/main/haskell/Hydra/Dsl/                # Hand-written base DSLs
heads/haskell/src/main/haskell/Hydra/Dsl/Meta/           # Hand-written meta DSL wrappers
heads/haskell/src/main/haskell/Hydra/Dsl/Meta/Lib/       # Library DSLs (13 files)
dist/haskell/hydra-kernel/src/main/haskell/Hydra/Dsl/    # Generated DSLs (from hydra.dsls)
heads/haskell/src/main/haskell/Hydra/                    # Generation drivers and sources
dist/haskell/hydra-<pkg>/src/main/haskell/Hydra/         # Generated per-package coder modules
                                                          #   (hydra-haskell, hydra-java, hydra-python,
                                                          #    hydra-scala, hydra-lisp, hydra-typescript,
                                                          #    hydra-go (head bud),
                                                          #    hydra-pg, hydra-rdf, hydra-ext for the long-tail,
                                                          #    hydra-coq, ...)

See also: DSL guide - Comprehensive guide with examples and operator reference

Three levels of DSLs

Level 1: Untyped DSLs (Terms.hs, Types.hs)

Direct term/type construction without compile-time safety:

-- Terms.hs - construct Term values
term1 = var "x"
term2 = apply (var "f") (int32 42)
term3 = lambda "x" (var "x")

-- Types.hs - construct Type values
type1 = string
type2 = int32 --> string
type3 = list (optional boolean)

Use Case: Low-level term construction, minimal overhead, runtime errors possible
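To make the Level 1 trade-off concrete, here is a minimal Python analogue of untyped term and type construction. The AST classes and the `fun` and `apply` helpers are hypothetical; they only mirror the usual associativity of a function-type arrow (to the right) and of application (to the left).

```python
from dataclasses import dataclass
from functools import reduce as fold


# Hypothetical, minimal Type and Term ASTs (not Hydra's generated Python classes).
@dataclass(frozen=True)
class TLit:
    name: str  # e.g. "int32", "string"


@dataclass(frozen=True)
class TFun:
    domain: object
    codomain: object


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Apply:
    function: object
    argument: object


def fun(*types):
    """Right-associative function type: fun(a, b, c) == a --> (b --> c)."""
    *domains, result = types
    return fold(lambda acc, d: TFun(d, acc), reversed(domains), result)


def apply(f, *args):
    """Left-associative application: apply(f, x, y) == (f x) y."""
    return fold(Apply, args, f)
```

Nothing stops `apply(TLit("int32"), Var("x"))` from being built; as with the untyped Haskell DSL, such mistakes only surface later.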

Level 2: Phantom-Typed DSLs (Phantoms.hs, Library DSLs)

Compile-time type safety via phantom types:

-- Phantoms.hs - TypedTerm a where 'a' is a phantom type
goodFunc :: TypedTerm (String -> String)
goodFunc = lambda "x" (Strings.toUpper (var "x"))

-- Type error at Haskell compile time: toUpper expects a TypedTerm String
badTerm :: TypedTerm String
badTerm = Strings.toUpper (int32 42)  -- Expected String, got Int

Use Case: Write Hydra code with Haskell's type checking as a safety net

Level 3: Term-Encoded DSLs (Meta/Terms.hs, Meta/Types.hs)

Write programs that build programs (meta-programming):

-- Meta/Terms.hs - terms that construct terms
buildAddFunction :: TypedTerm (Int -> Int -> Int)
buildAddFunction =
  lambda "x" $ lambda "y" $
    primitive DefMath.add @@ var "x" @@ var "y"

-- Can inspect and transform this representation

Use Case: Code generators, meta-programs, self-modifying code

Generated DSL modules

The hydra.dsls module (Sources/Kernel/Terms/Dsls.hs) automatically generates phantom-typed DSL functions from any Hydra type module. For each type definition, it produces:

  • Record constructors: one function taking all fields as TypedTerm arguments
  • Field accessors: one function per field, returning the field value
  • Field updaters: withXxx functions that return a modified copy of the record
  • Union injectors: one function per variant (unit variants produce nullary values)
  • Wrap/unwrap: for newtype wrappers
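A rough Python sketch of what the generator produces for a single record type. The naming scheme (`person`, `personName`, `withAge`) and the use of `dataclasses.make_dataclass` are assumptions for illustration only: the real generator emits source code rather than building functions at runtime, and union injectors and wrappers are omitted here.

```python
from dataclasses import make_dataclass, replace


def record_dsl(type_name: str, field_names: list[str]) -> dict:
    """Hypothetical generator: constructor, accessors, and withXxx updaters for one record type."""
    cls = make_dataclass(type_name, field_names, frozen=True)
    prefix = type_name[0].lower() + type_name[1:]
    dsl = {prefix: cls}  # constructor, e.g. person("Ada", 36)
    for f in field_names:
        cap = f[0].upper() + f[1:]
        # Accessor; f is bound as a default argument so each lambda keeps its own field name.
        dsl[prefix + cap] = lambda r, f=f: getattr(r, f)
        # Updater: returns a modified copy and leaves the original record unchanged.
        dsl["with" + cap] = lambda r, v, f=f: replace(r, **{f: v})
    return dsl
```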

Generated modules live in dist/haskell/hydra-kernel/src/main/haskell/Hydra/Dsl/ (e.g., Hydra.Dsl.Core, Hydra.Dsl.Coders, Hydra.Dsl.Ast). They are also generated into Java and Python as part of the sync pipeline.

Hand-written DSL modules

Base infrastructure (in Hydra/Dsl/)

  • Terms.hs - Plain DSL for terms (apply, lambda, record, inject)
  • Types.hs - Plain DSL for types (operators -->, @@)
  • ShorthandTypes.hs - Convenient aliases (tInt32, tString, tList)
  • Bootstrap.hs - Bootstrapping utilities
  • Annotations.hs - Annotation handling
  • Grammars.hs - Grammar and syntax definitions
  • Literals.hs, LiteralTypes.hs - Literal handling

Meta DSL wrappers (in Hydra/Dsl/Meta/)

These modules re-export the corresponding generated DSL module and add non-standard helpers such as AsTerm-flexible overrides, expression conversion pipelines, and compatibility shims.

  • Meta/Core.hs - Wraps Hydra.Dsl.Core; adds AsTerm overrides for binding, injection, typeVariable; helpers like equalName_, false
  • Meta/Graph.hs - Wraps Hydra.Dsl.Graph; adds graph construction helpers
  • Meta/Phantoms.hs - Phantom-typed term construction (TypedTerm a), operators (@@, ~>, <~)
  • Meta/Terms.hs - Phantom-typed term-encoded terms
  • Meta/Types.hs - Phantom-typed term-encoded types
  • Meta/Variants.hs - Wraps Hydra.Dsl.Variants; metadata variants and introspection
  • Meta/Testing.hs - Wraps Hydra.Dsl.Testing; test convenience helpers

Library DSLs

Phantom-typed wrappers for standard library functions:

Hydra/Dsl/Meta/Lib/
├── Lists.hs       # map, filter, fold, concat, etc.
├── Maps.hs        # lookup, insert, keys, values, etc.
├── Sets.hs        # union, intersection, member, etc.
├── Strings.hs     # concat, split, toUpper, toLower, etc.
├── Chars.hs       # isAlpha, isDigit, toUpper, toLower
├── Math.hs        # add, sub, mul, div, sin, cos, sqrt, etc.
├── Logic.hs       # and, or, not, ifElse
├── Optionals.hs   # fromOptional, cases, isGiven, etc.
├── Eithers.hs     # either, isLeft, rights, etc.
├── Equality.hs    # equal, compare, gt, lt, etc.
├── Pairs.hs       # fst, snd, curry, uncurry
├── Regex.hs       # matches, find, replace, split
└── Literals.hs    # Type conversions and parsing

DSL operators

The DSL provides convenient operators for readable code:

-- Type construction
(-->) :: Type -> Type -> Type          -- Function type
(@@) :: Type -> Type -> Type           -- Type application

-- Term construction
(<.>) :: Term -> Term -> Term          -- Function composition
(@@) :: Term -> Term -> Term           -- Function application
(>:) :: String -> a -> Field           -- Field definition

-- Phantom-typed construction
(~>) :: String -> TypedTerm a -> TypedTerm (x -> b)     -- Lambda
(<~) :: String -> TypedTerm a -> TypedTerm b -> TypedTerm b  -- Let binding
(<<~) :: String -> TypedTerm (Either e a) -> TypedTerm (Either e b) -> TypedTerm (Either e b)  -- Either bind

-- Examples
intToString = int32 --> string                -- Type
addOne = lambda "x" (primitive DefMath.add @@ var "x" @@ int32 1)  -- Term
personType = record [                          -- Record type
  "name" >: string,
  "age" >: int32
]

DSL usage example

Here's a complete example showing DSL usage in type inference:

-- From Hydra.Sources.Kernel.Terms.Inference
inferTypeOfEitherDef :: TypedTermDefinition (InferenceContext -> Graph -> Either Term Term -> Either Error InferenceResult)
inferTypeOfEitherDef = define "inferTypeOfEither" $
  doc "Infer the type of an Either term" $
  "cx" ~> "e" ~>

  -- Pattern match on left or right
  Eithers.either_
    -- Left case
    ("left" ~>
      "leftResult" <<~ ref inferTypeDef @@ var "cx" @@ var "left" $
      "type_" <~ InferenceResult.type_ (var "leftResult") $
      "cx2" <~ InferenceResult.context (var "leftResult") $
      produce $ InferenceResult.inferenceResult (var "cx2")
        (Types.either_ (var "type_") (var "any")))

    -- Right case
    ("right" ~>
      "rightResult" <<~ ref inferTypeDef @@ var "cx" @@ var "right" $
      "type_" <~ InferenceResult.type_ (var "rightResult") $
      "cx2" <~ InferenceResult.context (var "rightResult") $
      produce $ InferenceResult.inferenceResult (var "cx2")
        (Types.either_ (var "any") (var "type_")))

    (var "e")

Features Demonstrated:

  • define - Define a named function
  • ~> - Function abstraction
  • <~ - Let binding
  • <<~ - Either bind: binds the Right value of an Either-valued term to a name for the rest of the expression, short-circuiting on Left
  • @@ - Function application
  • ref - Reference to another definition
  • Type-safe operations on InferenceResult and Either
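The `<<~` combinator behaves like a monadic bind on Either. A minimal Python sketch, with hypothetical `Left`/`Right` classes and a toy `infer_literal` step standing in for `inferTypeDef` (none of these are Hydra's Python API):

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

E = TypeVar("E")
A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class Left(Generic[E]):
    value: E


@dataclass(frozen=True)
class Right(Generic[A]):
    value: A


Either = Union[Left[E], Right[A]]


def bind(ma: "Either[E, A]", k: Callable[[A], "Either[E, B]"]) -> "Either[E, B]":
    """Run k on a Right value; pass a Left through unchanged (the short-circuit)."""
    return k(ma.value) if isinstance(ma, Right) else ma


def infer_literal(term) -> "Either[str, str]":
    """Hypothetical inference step: only ints are typeable in this toy."""
    return Right("int32") if isinstance(term, int) else Left(f"cannot infer type of {term!r}")


def infer_pair(left, right) -> "Either[str, str]":
    # Each step runs only if the previous one produced a Right, like a chain of <<~ bindings.
    return bind(infer_literal(left), lambda lt:
           bind(infer_literal(right), lambda rt:
           Right(f"pair<{lt}, {rt}>")))
```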

Relationship to core language

User Code (Python/Java/Haskell)
         ↓ (serialized as Core.Term)
Hydra Core Language (Type, Term, Function, Lambda, etc.)
         ↓ (defined via DSLs)
Hydra DSLs in Haskell (Terms.hs, Types.hs, Phantoms.hs, etc.)
         ↓ (generates code for)
Generated Source Code (Haskell, Python, Java)

Self-Hosting Loop:

  1. Write inference logic in Phantom DSL → Sources/Kernel/Terms/Inference.hs
  2. DSL produces Term/Type values representing functions
  3. Code generator converts to executable Haskell → dist/haskell/hydra-kernel/src/main/haskell/Hydra/Inference.hs
  4. Generated code can now infer types for new Hydra code (including DSL code itself!)

Primitive functions

Primitive functions are the standard library of Hydra, providing built-in operations for common data manipulations.

Organization

Primitives are organized into 13 library modules by category. Each module lives in packages/hydra-kernel/src/main/haskell/Hydra/Sources/Kernel/Lib/<Sub>.hs and is the canonical registry for its module name:

| Library | Count | Examples |
|---|---|---|
| hydra.lib.chars | 6 | isAlphaNum, isLower, toUpper |
| hydra.lib.eithers | 15 | either, isLeft, rights, bimap, bind |
| hydra.lib.equality | 9 | equal, compare, gt, lt, max |
| hydra.lib.lists | 37 | map, filter, foldl, concat, sort |
| hydra.lib.literals | 55 | Type conversions, parsing, showing |
| hydra.lib.logic | 4 | and, or, not, ifElse |
| hydra.lib.maps | 20 | lookup, insert, keys, toList |
| hydra.lib.math | 46 | add, mul, sin, sqrt, abs |
| hydra.lib.optionals | 12 | fromOptional, cases, isGiven |
| hydra.lib.pairs | 3 | first, second, bimap |
| hydra.lib.regex | 6 | matches, find, findAll, replace, replaceAll, split |
| hydra.lib.sets | 14 | union, intersection, member |
| hydra.lib.strings | 13 | cat, splitOn, length, lines |

Total: 240 primitive functions (post-#156).
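Summing the per-library counts from the table is a quick consistency check against the stated total and against the 240 primitives declared by the kernel registry modules:

```python
# Per-library primitive counts, copied from the table of libraries.
COUNTS = {
    "chars": 6, "eithers": 15, "equality": 9, "lists": 37, "literals": 55,
    "logic": 4, "maps": 20, "math": 46, "optionals": 12, "pairs": 3,
    "regex": 6, "sets": 14, "strings": 13,
}

total = sum(COUNTS.values())
print(len(COUNTS), total)  # 13 libraries, 240 primitives
```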

Three-level definition structure

Each primitive is defined at three levels, with a clear separation of concerns between universal metadata and per-host implementation (introduced in #156):

Level 1: PrimitiveDefinition + Primitive (kernel types)

PrimitiveDefinition (in hydra.packaging) carries the universal metadata that is the same in every host language:

def "PrimitiveDefinition" $
  record [
    "name">:                  doc "Fully-qualified name" $ core "Name",
    "description">:           doc "Human-readable description" $ Types.string,
    "signature">:             doc "Full type signature with parameter names" $
                                typing "TermSignature",
    "isPure">:                doc "Purity flag (defaults to True)" $ Types.boolean,
    "isTotal">:               doc "Totality flag (defaults to True)" $ Types.boolean,
    "defaultImplementation">: doc "Optional reference implementation in Hydra terms" $
                                T.optional (core "Term")
  ]

Primitive (in hydra.graph) pairs the universal metadata with a host-specific implementation. This is what lives in a Graph as the per-host primitive registry:

def "Primitive" $
  record [
    "definition">: doc "Host-independent metadata (name, signature, purity, totality)" $
      packaging "PrimitiveDefinition",
    "implementation">: doc "Concrete, host-specific implementation" $
      graph "Graph" ~> list (core "Term") ~> Types.either_ (errors "Error") (core "Term")
  ]

The implementation maps the (already-reduced, annotation-stripped) argument terms to a result term, or an error, given the current graph. The interpreter strips annotations and reduces each argument before invoking the primitive, so the implementation can pattern-match the argument terms directly.

The two faces have deliberately different shapes:

  • PrimitiveDefinition.defaultImplementation is an optional pure Hydra term whose type is exactly the primitive's public signature (int32 -> int32 -> int32 for math.add, (a -> Bool) -> [a] -> [a] for lists.filter). It never mentions a graph. It is the portable reference implementation — what the primitive computes — and is used for type-checking and cross-host documentation, not as a runtime substitute (interpreting it would be far slower than a native impl).
  • Primitive.implementation is the host-native runtime carrier, Graph -> [Term] -> Either Error Term. It is how a host evaluates the primitive, natively and quickly.

So the graph appears in the runtime carrier but never in a primitive's signature or in its defaultImplementation. The graph is a property of the implementation's calling convention, not of the primitive's type.

Why does the carrier carry a graph at all? Most primitives ignore it — math.add just adds its two arguments. The graph matters only for higher-order primitives that must evaluate a function argument mid-computation. Take lists.filter applied to the predicate \x -> equality.gt x 2: the native impl is (Term -> Bool) -> [Term] -> [Term], so it must turn that predicate term into a native Term -> Bool, which means reducing gt x 2 per element. But gt arrives as an unresolved name (hydra.lib.equality.gt) — it sits under a lambda binder and cannot be evaluated until filter supplies a concrete x — and resolving that name requires the graph's primitive table. The graph passed in is the interpreter's live graph at the call site (which may hold primitives or bindings beyond the kernel's), so a captured or global graph would be wrong; it must be threaded from the reducer. The complementary case — a higher-order primitive whose result shape is fixed by its data argument, e.g. lists.map or eithers.bimap — can instead return an unreduced applicative term ([f x1, f x2, ...]) and let the outer reducer fold it, needing no graph. The graph is retained for the minority of primitives that branch on a reduced function result (filter, find, foldl over Either, …).
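The graph-threading argument can be made concrete with a toy reducer in Python. Everything here is a hypothetical model (the `Lit`/`Var`/`Lam`/`App`/`Lst` terms, `Graph`, `Primitive`, and the `("left", ...)`/`("right", ...)` encoding of Either), not Hydra's Python API. `filter_` must call back into the reducer with the live graph to evaluate its predicate, because `gt` inside the lambda is only a name until the graph resolves it; `map_` returns unreduced applications and never touches the graph.

```python
from dataclasses import field, make_dataclass

# Toy term model (hypothetical, not Hydra's Python classes).
Lit = make_dataclass("Lit", ["value"], frozen=True)
Var = make_dataclass("Var", ["name"], frozen=True)
Lam = make_dataclass("Lam", ["param", "body"], frozen=True)
App = make_dataclass("App", ["fun", "arg"], frozen=True)
Lst = make_dataclass("Lst", ["items"], frozen=True)
# Runtime values; a primitive may return terms that contain them.
Closure = make_dataclass("Closure", ["param", "body", "env"], frozen=True)
PrimRef = make_dataclass("PrimRef", ["name", ("args", tuple, field(default=()))], frozen=True)
# implementation: (graph, [term]) -> ("left", error) or ("right", term)
Primitive = make_dataclass("Primitive", ["arity", "implementation"])
Graph = make_dataclass("Graph", ["primitives"])


def reduce(g, t, env):
    if isinstance(t, (Lit, Closure, PrimRef)):
        return t
    if isinstance(t, Lst):
        return Lst(tuple(reduce(g, x, env) for x in t.items))
    if isinstance(t, Var):
        if t.name in env:
            return env[t.name]
        if t.name in g.primitives:  # resolving a primitive name needs the graph
            return PrimRef(t.name)
        raise NameError(t.name)
    if isinstance(t, Lam):
        return Closure(t.param, t.body, env)
    return apply(g, reduce(g, t.fun, env), reduce(g, t.arg, env))  # App


def apply(g, f, x):
    if isinstance(f, Closure):
        return reduce(g, f.body, {**f.env, f.param: x})
    args = f.args + (x,)
    prim = g.primitives[f.name]
    if len(args) < prim.arity:
        return PrimRef(f.name, args)  # partial application
    tag, result = prim.implementation(g, list(args))
    if tag == "left":
        raise ValueError(result)
    return reduce(g, result, {})  # fold any unreduced applications in the result


def gt(g, args):
    return ("right", Lit(args[0].value > args[1].value))


def add(g, args):
    return ("right", Lit(args[0].value + args[1].value))


def filter_(g, args):
    pred, xs = args
    kept = []
    for x in xs.items:
        r = apply(g, pred, x)  # evaluating the predicate mid-computation needs g
        if not (isinstance(r, Lit) and isinstance(r.value, bool)):
            return ("left", "predicate did not return a boolean")
        if r.value:
            kept.append(x)
    return ("right", Lst(tuple(kept)))


def map_(g, args):
    f, xs = args
    # No evaluation here: return [f x1, f x2, ...] and let the outer reducer fold it.
    return ("right", Lst(tuple(App(f, x) for x in xs.items)))


GRAPH = Graph({
    "hydra.lib.equality.gt": Primitive(2, gt),
    "hydra.lib.math.add": Primitive(2, add),
    "hydra.lib.lists.filter": Primitive(2, filter_),
    "hydra.lib.lists.map": Primitive(2, map_),
})
```

For example, reducing `lists.filter (\x -> equality.gt x 2) [1, 2, 3, 4]` in this model yields `[3, 4]`; the `gt` lookup happens inside `filter_`, through the graph it was handed.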

(The Either-based implementation replaces the former Flow monad, removed in #245. The host-independent PrimitiveDefinition was split out from the implementation in #156. The vestigial InferenceContext parameter — which no primitive ever consulted — was dropped from the carrier in #446, leaving the graph; this was sequenced with the defaultImplementation integration in #437.)

Level 2: PrimitiveDefinition declaration (the canonical registry)

The kernel modules Hydra/Sources/Kernel/Lib/<Sub>.hs declare every primitive as a PrimitiveDefinition (an arm of Definition alongside term and type), collectively forming the primitive registry. The 13 modules — Chars, Eithers, Equality, Lists, Literals, Logic, Maps, Math, Optionals, Pairs, Regex, Sets, Strings — declare 240 primitives total.

Example (Hydra/Sources/Kernel/Lib/Logic.hs):

ns :: ModuleName
ns = ModuleName "hydra.lib.logic"

module_ :: Module
module_ = Module {
            moduleName = ns,
            moduleDefinitions = definitions,
            moduleDependencies = Bootstrap.unqualifiedDep <$> kernelTypesModuleNames,
            moduleDescription = Just "Primitives in the hydra.lib.logic namespace."}
  where
    definitions = [
      toPrimitive "Compute the logical AND of two boolean values." andSig and_,
      primNoDef "ifElse" "Compute a conditional expression." ifElseSig,
      toPrimitive "Compute the logical NOT of a boolean value." notSig not_,
      toPrimitive "Compute the logical OR of two boolean values." orSig or_]

andSig :: TermSignature
andSig = sig $ TypeScheme [] (Types.boolean Types.~> Types.boolean Types.~> Types.boolean) Nothing

and_ :: TypedTermDefinition (Bool -> Bool -> Bool)
and_ = define "and" $
  doc "Logical AND, defined in terms of ifElse." $
  "a" ~> "b" ~> Logic.ifElse (var "a") (var "b" :: TypedTerm Bool) false

The metadata flows through to JSON in dist/json/hydra-kernel/src/main/json/hydra/lib/<sub>.json, where it becomes the cross-host source of truth for the primitive's name, signature, description, and default implementation.

Level 3: Native implementations and host registries

Per host, two things are needed beyond the kernel metadata:

  1. Native implementations — for the big three, in the overlay/<lang>/hydra-kernel/ tree (#418); e.g. Haskell at overlay/haskell/hydra-kernel/src/main/haskell/Hydra/Haskell/Lib/Math.hs:

    add :: Int -> Int -> Int
    add x y = x + y
  2. Host-side primitive registry — binds names to native impls. The Haskell registry lives in overlay/haskell/hydra-kernel/src/main/haskell/Hydra/Dsl/Libraries.hs (#473):

    hydraLibMath :: Library
    hydraLibMath = standardLibrary [
      prim2 DefMath.add Math.add [] int32 int32 int32,
      ...]

    The prim1/prim2/prim3 helpers build a Primitive by pairing the host's native implementation with a signature and the generated PrimitiveDefinition (DefMath.add, where DefMath is the generated Hydra.Lib.Math def-module) — the single source of truth for the name (#473), taken via the ToPrimName class. standardLibrary derives the library's module name from its first primitive, so it needs no ModuleName argument. The argument-type info passed to the helper is a host-side repetition the registry needs in native type-coder form, not the source of truth for the name.

    Every other host has an analogous registry that likewise derives names from the generated hydra.lib.* def-modules: overlay/{java,python}/.../lib/Libraries.{java,py} and heads/<lang>/.../lib/Libraries.<ext> for Scala and the Lisp dialects.

The type information passed to prim1/prim2/prim3 at host registration is a sanity-check repetition of the canonical signature in the kernel-side registry (Hydra/Sources/Kernel/Lib/<Sub>.hs) — it's expected to match, and divergence is a bug. On the Haskell side this keeps the bootstrap graph aligned with the JSON kernel; other hosts consume the JSON kernel directly and so inherit the canonical metadata. Future work (see follow-ups) may have host registries derive their signatures directly from the kernel metadata to eliminate this duplication.
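A hedged Python sketch of the registration step: `prim2` pairs a native binary function with its name and host-side signature, and `standard_library` derives the module name from the first primitive. The `CANONICAL` table stands in for the kernel-side signatures, and raising on divergence is one possible way to catch that bug early, not documented Hydra behavior. None of these names are Hydra's Python API, and the real Haskell `prim2` takes the generated definition rather than a string name.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical canonical signatures, standing in for the kernel-side PrimitiveDefinitions.
CANONICAL = {
    "hydra.lib.math.add": ("int32", "int32", "int32"),
    "hydra.lib.math.sub": ("int32", "int32", "int32"),
}


@dataclass
class Primitive:
    name: str
    signature: tuple
    implementation: Callable


def prim2(name: str, native: Callable, a: str, b: str, r: str) -> Primitive:
    """Pair a native binary function with its name and host-side signature."""
    if CANONICAL[name] != (a, b, r):
        raise ValueError(f"{name}: host signature {(a, b, r)} diverges from {CANONICAL[name]}")
    return Primitive(name, (a, b, r), lambda args: native(*args))


def standard_library(prims: list[Primitive]) -> tuple[str, dict]:
    """Derive the library's module name from its first primitive's name."""
    module_name = prims[0].name.rsplit(".", 1)[0]
    return module_name, {p.name: p for p in prims}
```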

Default implementations

PrimitiveDefinition.defaultImplementation : optional<Term> carries an optional declarative reference implementation in pure Hydra terms. Two uses:

  • Fallback for minimal interpreters. A host that doesn't ship a native impl for a primitive can fall back to evaluating the default Hydra term.
  • Proof-friendly reference. Targets that can prove or simulate the default body (e.g. Coq) get a verified reference implementation for free.

Default implementations are pure expressions — they take only the primitive's declared arguments (no Context, no Graph) and reduce using only other primitives. Not every primitive has one: fundamental operations like logic.ifElse, pairs.first, character predicates, and arithmetic cannot be expressed in terms of other primitives and use primNoDef.

The defaultImplementation field replaces the pre-#156 Hydra.Sources.Kernel.Lib.Defaults.* modules, which encoded the same notion as interpreter-friendly Term-AST constructions. Those modules were merged into the canonical Lib/<Sub>.hs registries' inline toPrimitive ... name_ entries and then removed (#437).
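The fallback for minimal interpreters amounts to a lookup order: native implementation first, default implementation second. A minimal Python sketch with a hypothetical registry. Here `and` follows the `and_` definition in Logic.hs, while the `or` default is an assumed definition for illustration.

```python
from typing import Callable

# Natively implemented primitives (hypothetical registry for a minimal host).
NATIVE: dict[str, Callable] = {
    "hydra.lib.logic.ifElse": lambda c, t, e: t if c else e,
    "hydra.lib.logic.not": lambda a: not a,
}

# Default implementations, written only in terms of other primitives (no graph, no context).
DEFAULTS: dict[str, Callable] = {
    # and a b = ifElse a b false, matching and_ in Logic.hs
    "hydra.lib.logic.and": lambda a, b: call("hydra.lib.logic.ifElse", a, b, False),
    # or a b = ifElse a true b (assumed definition)
    "hydra.lib.logic.or": lambda a, b: call("hydra.lib.logic.ifElse", a, True, b),
}


def call(name: str, *args):
    """Prefer a native implementation; fall back to the default; fail if neither exists."""
    impl = NATIVE.get(name) or DEFAULTS.get(name)
    if impl is None:
        raise LookupError(f"no implementation for {name}")
    return impl(*args)
```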

TermCoder system

The Hydra.Dsl.Prims module provides type coding:

-- Literal types
int32 :: TermCoder Int
int64 :: TermCoder Int64
float32 :: TermCoder Float
float64 :: TermCoder Double
bigint :: TermCoder Integer
string :: TermCoder String
boolean :: TermCoder Bool
binary :: TermCoder ByteString

-- Container types
list :: TermCoder a -> TermCoder [a]
set :: TermCoder a -> TermCoder (Set a)
optional :: TermCoder a -> TermCoder (Maybe a)
map :: TermCoder k -> TermCoder v -> TermCoder (Map k v)
tuple2 :: TermCoder a -> TermCoder b -> TermCoder (a, b)

-- Function types
function :: TermCoder a -> TermCoder b -> TermCoder (a -> b)

-- Sum types
either_ :: TermCoder a -> TermCoder b -> TermCoder (Either a b)

Each TermCoder contains:

  1. Type representation
  2. Encoder: Haskell value → Hydra Term
  3. Decoder: Hydra Term → Haskell value
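A minimal Python sketch of the three parts of a TermCoder and of how a container combinator lifts an element coder. The (tag, payload) term encoding and the names `int32` and `list_` are hypothetical, not the Hydra.Dsl.Prims API.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")


@dataclass(frozen=True)
class TermCoder(Generic[A]):
    type_: object                  # type representation (a plain tuple here)
    encode: Callable[[A], object]  # host value -> term
    decode: Callable[[object], A]  # term -> host value; raises on a mismatched term


def _expect(tag: str, term):
    """Return the payload of a (tag, payload) term, or raise if the tag is wrong."""
    if term[0] != tag:
        raise TypeError(f"expected a {tag} term, got {term[0]}")
    return term[1]


int32 = TermCoder(("literal", "int32"), lambda v: ("int32", v), lambda t: _expect("int32", t))


def list_(elem: TermCoder) -> TermCoder:
    """Lift an element coder to a list coder."""
    return TermCoder(
        ("list", elem.type_),
        lambda vs: ("list", tuple(elem.encode(v) for v in vs)),
        lambda t: [elem.decode(x) for x in _expect("list", t)],
    )
```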

Multi-language generation

Primitive names and signatures are defined once in Haskell, in each kernel packages/hydra-kernel/src/main/haskell/Hydra/Sources/Kernel/Lib/<Sub>.hs module (as PrimitiveDefinitions), and become part of the generated kernel — the hydra.lib.* def-modules — in every target language. The implementations shown below are hand-written per host language. For the big three (Haskell, Java, Python) they live in the top-level overlay/<lang>/hydra-kernel/ tree (#418) and are overlaid into the published dist/<host>/hydra-kernel/ artifact during sync (for Haskell by sync-haskell.sh, for Java/Python by heads/<host>/bin/copy-kernel-runtime.sh); other hosts keep their implementations under heads/<host>/. See build-system.md §Hand-written runtime in hydra-kernel for the full mechanism and the catalog of which subtrees are overlaid per language.

Java Generation

Location: overlay/java/hydra-kernel/src/main/java/hydra/lib/ (#418; was heads/java/src/main/java/hydra/lib/)

Each primitive becomes a class extending PrimitiveFunction:

// hydra/lib/math/Add.java
public class Add extends PrimitiveFunction {
    public Name name() {
        return new Name("hydra.lib.math.add");
    }

    public TypeScheme type() {
        return scheme(function(int32(), int32(), int32()));
    }

    protected Function<List<Term>, Either<Error, Term>> implementation() {
        return args -> map2(
            Expect.int32(args.get(0)),
            Expect.int32(args.get(1)),
            (arg0, arg1) -> Terms.int32(apply(arg0, arg1))
        );
    }

    public static Integer apply(Integer augend, Integer addend) {
        return augend + addend;
    }
}

Python Generation

Location: overlay/python/hydra-kernel/src/main/python/hydra/lib/ (#418; was heads/python/src/main/python/hydra/lib/)

Pure Python implementations:

# hydra/lib/math.py
import math

def add(x: int, y: int) -> int:
    """Add two integers."""
    return x + y

def sqrt(x: float) -> float:
    """Square root of a float."""
    return math.sqrt(x)

# hydra/lib/lists.py
from collections.abc import Callable
from typing import TypeVar

from hydra.dsl.python import frozenlist

A = TypeVar("A")
B = TypeVar("B")

def map_(f: Callable[[A], B], xs: frozenlist[A]) -> frozenlist[B]:
    """Map a function over a list."""
    return tuple(f(x) for x in xs)

Key design patterns

Pattern 1: Type Polymorphism

prim2 _equality_equal Equality.equal ["x"] x x boolean
  where x = variable "x"

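Because "x" is declared as a type variable (the ["x"] argument), the primitive is polymorphic. The same equal primitive works at any type that supports equality.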

Pattern 2: Default implementations

A PrimitiveDefinition carries an optional defaultImplementation : optional<Term> — a declarative reference implementation in pure Hydra terms. The kernel registry declares each primitive with one of two helpers:

  • primDef — supplies a default Hydra-term implementation; usable as a fallback by minimal interpreters that lack a native implementation, and as a proof-friendly reference.
  • primNoDef — no default; used for primitives that are fundamental (e.g. logic.ifElse, pairs.first) or whose meaning is host-native (e.g. arithmetic, char predicates, regex matching).

On the Haskell host, the prim* family in Hydra.Dsl.Prims pairs each name with its native implementation regardless of which kernel helper declared the primitive.
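Here is a hypothetical Python sketch of how a minimal interpreter might choose an implementation. It prefers a native function registered by the host and otherwise evaluates the primitive's default term. The PrimitiveDefinition shape, the natives table and the evaluate callback are simplified assumptions, not Hydra's actual classes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class PrimitiveDefinition:
    name: str
    default_implementation: Optional[Any] = None  # a term, or None (primNoDef)


def resolve_implementation(
    defn: PrimitiveDefinition,
    natives: dict[str, Callable[..., Any]],
    evaluate: Callable[[Any], Callable[..., Any]],
) -> Callable[..., Any]:
    """Prefer a native implementation; fall back to the default term."""
    native = natives.get(defn.name)
    if native is not None:
        return native
    if defn.default_implementation is not None:
        # The default is a pure term built from other primitives;
        # the interpreter turns it into something callable.
        return evaluate(defn.default_implementation)
    raise LookupError(f"no implementation for primitive {defn.name}")
```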

Pattern 3: Either for Error Handling

All primitives operate within Either Error a, where Error is the structured union type from hydra.errors:

type Result a = Either Error a

An InferenceContext value is threaded alongside the Graph as an explicit parameter, carrying inference state (the fresh-variable counter and the current subterm-path trace, accumulated backward). This provides:

  • Explicit error handling with short-circuit semantics
  • Subterm-path tracing via the threaded InferenceContext parameter
  • No hidden state — all inference state is passed explicitly
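A small Python sketch can illustrate both properties. It assumes toy Left/Right classes and a context that carries only a fresh-variable counter. Hydra's actual InferenceContext also carries the subterm-path trace, and the helper names here are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable


@dataclass(frozen=True)
class Left:
    error: str


@dataclass(frozen=True)
class Right:
    value: Any


def bind(result, f: Callable[[Any], Any]):
    """Short-circuit: a Left is returned unchanged; a Right feeds f."""
    return result if isinstance(result, Left) else f(result.value)


@dataclass(frozen=True)
class Ctx:
    counter: int = 0


def fresh_var(cx: Ctx):
    """Return a fresh variable name plus the updated context (no hidden state)."""
    return f"t{cx.counter}", replace(cx, counter=cx.counter + 1)
```

The caller must pass the returned context to the next step. That explicit handoff replaces hidden mutable state.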

Variable resolution and graphs

All named references in Hydra use TermVariable. At runtime, the reduction engine resolves each variable name through the Graph. The Graph keeps definitions, primitives and type schemes in separate maps.

The Graph structure

A Graph contains (among other fields):

Field             Type                  Contents
graphBoundTerms   Map Name Term         Module-level definitions (element bindings, let-bound variables)
graphPrimitives   Map Name Primitive    Built-in primitive functions and constants
graphBoundTypes   Map Name TypeScheme   Type schemes for bound terms

Lambda-bound variables are not stored in the graph; they are resolved structurally during beta-reduction.

Resolution order

When reduceTerm encounters a TermVariable, it resolves the name in this order:

  1. graphBoundTerms — module definitions, let-bound variables. If found, the binding's term is recursively reduced.
  2. graphPrimitives — built-in functions and constants. If found, the primitive is applied with arity-based argument collection.
  3. Lambda-bound — the variable was introduced by a lambda parameter. It remains as-is (a free variable in the current scope).

By resolution order alone, module bindings would shadow primitives, and primitives shadow lambda-bound variables. In practice, names don't collide. Module definitions use qualified names like hydra.core.Term, while primitives are named under the hydra.lib.* namespace (e.g. hydra.lib.math.add).
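The lookup order can be sketched in Python. This is a simplified model that assumes plain dicts for the graph's maps and an (arity, function) pair per primitive. The real reduceTerm also reduces the binding it finds and collects arguments up to the primitive's arity.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Graph:
    bound_terms: dict[str, Any] = field(default_factory=dict)
    primitives: dict[str, tuple[int, Callable[..., Any]]] = field(default_factory=dict)


def resolve(graph: Graph, name: str):
    """Classify a variable reference using the documented lookup order."""
    if name in graph.bound_terms:         # 1. module / let-bound definitions
        return ("bound", graph.bound_terms[name])
    if name in graph.primitives:          # 2. built-in primitives
        arity, _ = graph.primitives[name]
        return ("primitive", arity)
    return ("lambda-bound", name)         # 3. left as a free variable
```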

Construction-time shadowing

As a safety mechanism, buildGraph filters graphBoundTerms and graphBoundTypes against graphPrimitives at construction time. It removes any binding whose name matches a primitive. graphBoundTerms is consulted first during resolution, so this filter is what guarantees that primitives win any name collision. They take priority by construction, not by resolution order.
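The filter amounts to dropping every bound name that also names a primitive. Here is a Python sketch that assumes dict-based maps; filter_shadowed is a hypothetical name for this step of buildGraph.

```python
def filter_shadowed(bound_terms: dict, bound_types: dict, primitives: dict):
    """Remove bindings and type schemes whose names collide with primitives."""
    terms = {n: t for n, t in bound_terms.items() if n not in primitives}
    types = {n: s for n, s in bound_types.items() if n not in primitives}
    return terms, types
```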

Assembling primitives: graphWithPrimitives

The hydra.lexical.graphWithPrimitives function creates a graph with primitives assembled from two lists:

graphWithPrimitives :: [Primitive] -> [Primitive] -> Graph
graphWithPrimitives builtIn userProvided = ...

User-provided primitives shadow built-in ones (left-biased map union). This enables:

  • Language implementers to override kernel primitives with optimized host-language versions.
  • Users to provide domain-specific primitive functions alongside the standard library.

The bootstrap graph (Hydra.Dsl.Bootstrap.bootstrapGraph in Haskell) uses the standard libraries directly. Test runners and custom applications can use graphWithPrimitives to inject additional primitives.
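With primitives keyed by name, the left-biased union means user-provided entries win on a clash. Here is a Python sketch that assumes dict-shaped primitives and a hypothetical helper name.

```python
def primitives_map(built_in: list, user_provided: list) -> dict:
    """Merge primitives by name; user-provided entries shadow built-ins."""
    merged = {p["name"]: p for p in built_in}
    merged.update({p["name"]: p for p in user_provided})  # later entries win
    return merged
```

In Haskell, the same effect falls out of Data.Map.union, which prefers its left argument on duplicate keys. The user-provided map would therefore be passed first.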

Built-in primitives vs. user-defined functions

Built-in primitives (graphPrimitives) are implemented natively in the host language. Each Primitive carries a definition : PrimitiveDefinition (universal metadata: name, description, signature, isPure/isTotal flags, optional reference implementation) and an implementation function that maps a list of Term arguments to a result Term. See Primitive functions above.

User-defined functions (graphBoundTerms) are Hydra terms — typically lambdas or compositions of other terms. They are defined in modules and resolved by name just like primitives, but they are reduced by the Hydra reduction engine rather than calling native code.

Both are referenced the same way in Hydra source code: as TermVariable with a qualified name. The distinction is invisible to Hydra programs.


Cross-language compilation (coders)

Coders enable cross-compilation of Hydra programs between different language implementations. They transform Hydra modules (types and terms) from one language's representation to another, allowing developers to write Hydra code in their preferred language and compile it to any other supported language.

See also:

  • Property Graphs - Mapping Hydra schemas to property graphs with annotations
  • Testing - How the common test suite validates coder parity

Coder locations

In 0.15, generated Haskell coder output is split across per-package directories under dist/haskell/. Each package corresponds to a coder family or domain.

dist/haskell/hydra-haskell/src/main/haskell/Hydra/Haskell/    # Haskell coder
dist/haskell/hydra-java/src/main/haskell/Hydra/Java/          # Java coder
dist/haskell/hydra-python/src/main/haskell/Hydra/Python/      # Python coder
dist/haskell/hydra-scala/src/main/haskell/Hydra/Scala/        # Scala 3 coder
dist/haskell/hydra-lisp/src/main/haskell/Hydra/Lisp/          # Lisp coder (4 dialects)
dist/haskell/hydra-pg/src/main/haskell/Hydra/                 # Property graphs
│   ├── Pg/                                                   # PG model + GraphSON
│   ├── Cypher/                                               # Cypher
│   ├── Graphviz/                                             # Visualization
│   └── Tinkerpop/                                            # Gremlin / TinkerPop
dist/haskell/hydra-rdf/src/main/haskell/Hydra/                # RDF family
│   ├── Rdf/                                                  # RDF model + N-Triples
│   ├── Shacl/                                                # SHACL
│   ├── Owl/                                                  # OWL
│   ├── Shex/                                                 # ShEx
│   └── Xml/                                                  # XML schema
dist/haskell/hydra-ext/src/main/haskell/Hydra/                # Long-tail coders
│   ├── Avro/                                                 # Avro
│   ├── Protobuf/                                             # Protocol Buffers
│   ├── Graphql/                                              # GraphQL
│   ├── Pegasus/                                              # LinkedIn PDL
│   ├── Json/Schema/                                          # JSON Schema
│   ├── Cpp/, Csharp/, Go/, Rust/, Yaml/, ...                # Other languages
│   └── Atlas/, Azure/, Datalog/, Delta/, Geojson/, Iana/, Kusto/, Osv/, Parquet/, Sql/, Stac/, Workflow/
dist/haskell/hydra-coq/src/main/haskell/Hydra/Coq/            # Coq coder
dist/haskell/hydra-typescript/src/main/haskell/Hydra/TypeScript/  # TypeScript (in progress)
dist/haskell/hydra-wasm/src/main/haskell/Hydra/Wasm/          # WebAssembly (in progress)

The hydra-ext package collects long-tail coders that don't yet have their own dedicated package. Domain-specific groups (hydra-pg, hydra-rdf) and language targets that bootstrap (hydra-java, hydra-python, hydra-scala, hydra-lisp) are split out so each maps cleanly to its own published artifact.

The hydra-bench package is an opt-in sibling: it holds synthetic inference benchmark workloads (hydra.bench.*) which are deliberately stress-shaped and not regenerated by the default sync. See packages/hydra-bench/README.md; run bin/sync-bench.sh to refresh, then bin/run-inference-bench.sh to measure.

Common coder structure

Each language directory typically contains:

Language/
├── Coder.hs        # Main transformation logic
├── Serde.hs        # AST to text serialization
├── Language.hs     # Language definition and constraints
├── Names.hs        # Name conversion and case conventions
├── Utils.hs        # Language-specific utilities
└── Settings.hs     # Configuration (optional)

Entry point pattern

All coders follow the same shape: a Module plus context goes in, a map of generated file paths to contents comes out, and errors are reported via Either Error.

Examples:

moduleToJava   :: InferenceContext -> Graph -> Module -> Either Error (M.Map FilePath String)
moduleToPython :: InferenceContext -> Graph -> Module -> Either Error (M.Map FilePath String)
moduleToCpp    :: InferenceContext -> Graph -> Module -> Either Error (M.Map FilePath String)

Coder framework

Located in packages/hydra-kernel/src/main/haskell/Hydra/Sources/Kernel/Types/Coders.hs (generated output: dist/haskell/hydra-kernel/src/main/haskell/Hydra/Coders.hs):

-- Bidirectional transformation
data Coder v1 v2 = Coder {
  coderEncode :: InferenceContext -> v1 -> Either Error v2,
  coderDecode :: InferenceContext -> v2 -> Either Error v1
}

-- Adapter for language-specific transformations
data Adapter t1 t2 v1 v2 = Adapter {
  adapterIsLossy :: Bool,              -- Track lossy conversions
  adapterSource :: t1,                 -- Source type schema
  adapterTarget :: t2,                 -- Target type schema
  adapterCoder  :: Coder v1 v2         -- Value-level transformation
}

Encoding process

Step 1: Term to Language

Terms are recursively converted to target language expressions:

-- Java example
encodeTerm :: InferenceContext -> Graph -> Aliases -> Term -> Either Error Java.Expression

-- Handles:
-- - Literals (int, string, boolean, etc.)
-- - Applications (function calls)
-- - Functions (lambdas or method references)
-- - Records (class constructors)
-- - Unions (abstract class with visitors)
-- - Variables (local variables or fields)
-- - Let bindings (variable declarations)
-- - Case statements (visitor pattern)

Step 2: Type Encoding

Hydra types map to language types:

-- Java example
encodeType :: InferenceContext -> Graph -> Aliases -> Type -> Either Error Java.Type

-- Maps:
-- TypeRecord → Java Class
-- TypeUnion → Abstract class with subclasses
-- TypeLambda → Generic type parameter
-- TypeForall → Java generics with bounds
-- TypeFunction → Java functional interfaces
-- TypeList → List<T>
-- TypeMap → Map<K, V>
-- TypeOptional → Optional<T>

Step 3: Module Generation

Complete module transformation:

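-- Java example (simplified outline of moduleToJava in hydra-java's Coder.hs)
import qualified Data.Map as M
import System.FilePath ((</>), (<.>))
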
moduleToJava :: InferenceContext -> Graph -> Module -> Either Error (M.Map FilePath String)
moduleToJava cx g mod = do
  -- Extract types from module
  types <- getTypes cx g mod

  -- Generate class for each type
  classes <- traverse (typeToJavaClass cx g mod) types

  -- Generate package structure
  let packagePath = moduleNameToPath (moduleName mod)

  -- Map file paths to source code
  pure $ M.fromList $ map (\cls ->
    (packagePath </> className cls <.> "java",
     renderJavaClass cls)) classes

The adapter framework

Adapters handle type compatibility between languages:

-- Core adapter functions
languageAdapter    :: InferenceContext -> AdapterContext -> Language -> Type
                   -> Either Error (Adapter Type Type Term Term)

adaptTypeForLanguage :: InferenceContext -> AdapterContext -> Language -> Type
                    -> Either Error Type

termAdapter        :: InferenceContext -> AdapterContext -> Type
                   -> Either Error (Adapter Type Type Term Term)

Adapter composition:

composeCoders  :: Coder v1 v2 -> Coder v2 v3 -> Coder v1 v3

constructCoder :: InferenceContext -> AdapterContext -> Language -> Type
               -> Either Error (Coder Term Term)
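composeCoders runs the encoders forward and the decoders in reverse. Here is a Python sketch of that behavior. It drops the InferenceContext argument and models Either Error with exceptions; both simplifications belong to the sketch, not to Hydra.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Coder:
    encode: Callable[[Any], Any]
    decode: Callable[[Any], Any]


def compose_coders(c1: Coder, c2: Coder) -> Coder:
    """v1 -> v2 -> v3 on the way out; v3 -> v2 -> v1 on the way back."""
    return Coder(
        encode=lambda v1: c2.encode(c1.encode(v1)),
        decode=lambda v3: c1.decode(c2.decode(v3)),
    )
```

In the Either-based Haskell signatures, each step would be sequenced with >>= so that the first Left short-circuits the rest.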

Module transformation pipeline:

  1. Extract all elements as TypeApplicationTerms
  2. Gather unique types
  3. Construct coders for each type (via adapters)
  4. Pass coders to the module constructor
  5. Generate output files

Language constraints

Each language defines its capabilities:

data Language = Language {
  languageName :: LanguageName,
  languageConstraints :: LanguageConstraints
}

data LanguageConstraints = LanguageConstraints {
  languageConstraintsEliminationVariants :: S.Set EliminationVariant,
  languageConstraintsLiteralVariants :: S.Set LiteralVariant,
  languageConstraintsFloatTypes :: S.Set FloatType,
  languageConstraintsFunctionVariants :: S.Set FunctionVariant,
  languageConstraintsIntegerTypes :: S.Set IntegerType,
  languageConstraintsTermVariants :: S.Set TermVariant,
  languageConstraintsTypeVariants :: S.Set TypeVariant,
  languageConstraintsTypes :: Type -> Bool  -- Custom type predicate
}
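Conceptually, a coder can compare the variants a module actually uses against these sets to see what an adapter must rewrite. Here is a hypothetical Python sketch that uses strings for variant names and only two of the fields above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageConstraints:
    type_variants: frozenset[str]
    integer_types: frozenset[str]


def unsupported(constraints: LanguageConstraints, type_variants_used, integer_types_used) -> list[str]:
    """Features a module uses that the target language cannot express directly."""
    missing = set(type_variants_used) - constraints.type_variants
    missing |= set(integer_types_used) - constraints.integer_types
    return sorted(missing)
```

Anything reported here is what an adapter would have to translate into a supported form, possibly lossily (cf. adapterIsLossy).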

Language-specific patterns

Java Coder

Key features:

  • Generic type parameter handling
  • Visitor pattern for union elimination
  • Serialization support (JSON/Avro)
  • Let-binding flattening with recursive variable detection
  • Symbol classification (constant, nullary, unary, local variable)
-- Java/Coder.hs (line 715-723)
TermUnion (Injection name (Field (Name fname) v)) -> do
  let (Java.Identifier typeId) = nameToJavaName aliases name
  let consId = Java.Identifier $ typeId ++ "." ++ sanitizeJavaName (capitalize fname)
  args <- if EncodeCore.isUnitTerm v
    then return []
    else do
      ex <- encode v
      return [ex]
  return $ javaConstructorCall (javaConstructorName consId Nothing) args Nothing

Python Coder

Key features:

  • Metadata gathering for imports
  • Type variable tracking
  • Case statement deduplication
  • Walrus operator for let-bindings (Python 3.8+)
  • Inline type parameters (Python 3.12+)
  • Automatic casting for polymorphic values
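The walrus operator matters because a Hydra let-expression is an expression, while Python's assignment is a statement. The snippet below shows one way to express a let inside a lambda body. It illustrates the idea and is not necessarily the exact code the Python coder emits.

```python
# Hydra: \y -> let x = y + 1 in x * x
# A lambda body must be a single expression, so the let-binding becomes an
# assignment expression inside a tuple; [-1] selects the body's value.
square_of_succ = lambda y: ((x := y + 1), x * x)[-1]
```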

Recent fix for Issue #206:

-- Python/Coder.hs (Term inject case, in DSL form)
_Term_inject>>: "inj" ~>
  "tname" <~ Core.injectionTypeName (var "inj") $
  "field" <~ Core.injectionField (var "inj") $
  "rt" <<~ (Resolution.requireUnionType @@ var "cx" @@ (pythonEnvironmentGetGraph @@ var "env") @@ var "tname") $
  Logic.ifElse (Predicates.isEnumRowType @@ var "rt")
    (projectFromExpression (pyNameToPyExpression (encodeNameQualified @@ var "env" @@ var "tname"))
      (encodeEnumValue @@ var "env" @@ Core.fieldName (var "field")))
    -- Omit argument for unit-valued variants (resolves #206)
    ...

Serialization (Serde) layer

The Serde.hs files bridge language AST to formatted source code:

Java Serde (~600+ lines)

  • Java AST → formatted Java source
  • Comment preservation
  • Import organization

Python Serde (~400+ lines)

  • Python AST → formatted Python source
  • Indentation and block structure
  • Quote styles and escaping

Conventions shared across all Serde modules:

  • Naming. Every per-syntax-element writer is named *ToExpr — e.g. expressionToExpr, statementToExpr, typeToExpr — uniformly across Java, Python, Haskell, Scala, the four Lisp dialects, Cpp, TypeScript, Rust, Go, and the Pegasus / GraphQL / Protobuf / JsonSchema / RDF / Graphviz extension coders.
  • Layout. Writers compose output through shared helpers in hydra.serialization: chooseLayout selects between vertical and horizontal forms by measured width, parenListAdaptive / commaSepAdaptive / spaceSepAdaptive lay out punctuated lists, and the canonical line-length budget is maxLineWidth = 120. Per-language Serde files call into these helpers and contribute only the language-specific token-emission.
  • The Yaml writer is the sole exception: it returns String directly rather than going through the Expr layer. (Yaml's whitespace-sensitive layout doesn't fit the adaptive framework cleanly.)
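The width-driven choice can be sketched in Python. The helpers below are hypothetical and work on plain strings, whereas the real hydra.serialization helpers operate on the Expr layer. Only the rule is taken from the text above: use the horizontal form if it fits within maxLineWidth, otherwise use the vertical one.

```python
MAX_LINE_WIDTH = 120  # mirrors maxLineWidth


def choose_layout(horizontal: str, vertical: str, width: int = MAX_LINE_WIDTH) -> str:
    """Prefer the horizontal form when every one of its lines fits in `width`."""
    fits = all(len(line) <= width for line in horizontal.splitlines())
    return horizontal if fits else vertical


def comma_sep_adaptive(items: list[str], indent: str = "    ", width: int = MAX_LINE_WIDTH) -> str:
    """Comma-separated items on one line if they fit, else one per line."""
    horizontal = ", ".join(items)
    vertical = ",\n".join(indent + item for item in items)
    return choose_layout(horizontal, vertical, width)
```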

The bootstrap process

Hydra is self-hosting: it defines its own type system and can regenerate itself.

Module structure

Hydra's source modules are divided into type modules and term modules. Type modules define data models — the types that make up Hydra's internal representation. Term modules provide the logic and procedural aspect — the functions that operate on those types. This distinction applies throughout, not just to the kernel.

The modules compiled in the Haskell head are aggregated in Hydra.Sources.All (kernel + Haskell coder + JSON) and Hydra.Sources.Ext (all extension coders):

  • Kernel type modules (kernelTypesModules) — Hydra's internal data model: the core type system (hydra.core), graph and package structures (hydra.graph, hydra.packaging), and supporting types like hydra.typing, hydra.coders, hydra.query, hydra.tabular, etc. Hand-written DSL definitions in Hydra.Sources.Kernel.Types.*.

  • Kernel term modules (kernelTermsModules) — The logic of Hydra: type inference, type checking, term reduction, rewriting, code generation, etc. Hand-written DSL definitions in Hydra.Sources.Kernel.Terms.*. Also includes the encoder/decoder source modules (see below). For the high-level framing of how inference and checking cooperate, see the Inference wiki page; this section covers only the build-system mechanics.

  • Haskell modules (haskellModules) — Both type modules (the Haskell AST model) and term modules (the Haskell coder, serializer, and utilities). These are specific to hydra-haskell and enable Haskell code generation.

  • JSON modules (jsonModules) — The JSON data model (type module) along with the JSON coder, parser, and writer (term modules).

  • Other modules (otherModules) — Currently the YAML model and coder utilities.

  • Test modules (testModules) — The common test suite, compiled into each target language as part of the sync process. Defined separately from mainModules.

Encoder/decoder source modules are a special category of term modules that are generated from the type modules rather than hand-written. For each kernel type module (e.g., hydra.core), a pair of modules is generated that can encode objects of that type as Hydra Terms and decode them from Terms. These live in Hydra.Sources.{Encode,Decode}.* and are included in kernelTermsModules alongside the hand-written term modules.

The full set is composed as:

mainModules   = kernelModules ++ haskellModules ++ jsonModules ++ otherModules
kernelModules = kernelTypesModules ++ kernelTermsModules ++ jsonModules

kernelTermsModules = kernelPrimaryTermsModules   -- hand-written logic modules

The encode/decode modules (hydra.encode.*, hydra.decode.*) are synthesized in-memory at runtime by generateEncoderModules/generateDecoderModules (#448) and injected into the driver's universe before inference runs. They are no longer shipped as dist/haskell/.../Sources/{Encode,Decode}/*.hs files.

The sync pipeline

All modules in mainModules — regardless of category — go through the same code generation pipeline: writeHaskell (or writeJava, writePython) compiles them from Hydra module definitions into executable code in the target language.

The encoder/decoder source modules require a special staging step because they are derived from the type modules rather than hand-written. The sync script (sync-haskell.sh) handles this with an initial generation pass, followed by a source module generation step, followed by a second generation pass.

Because these derived modules are produced mechanically from a known type, the synthesizer is the authority on their types. Each derived Source module contains a single module_ TermDefinition whose term has type hydra.packaging.Module. The synthesizer (moduleToSourceModule in Hydra.Sources.Kernel.Terms.Generation) must set termDefinitionTypeScheme = Just (TypeScheme [] (TypeVariable "hydra.packaging.Module") Nothing) on that binding, so downstream consumers can skip type inference rather than re-derive it from the term's large encoded structure. Leaving the field as Nothing forces a full inferModulesIO pass per derived module — manageable locally but memory-prohibitive on typical CI runners. See #367 for the case where this invariant was violated.

Phases:

Phase   What it does
1       Compile mainModules into executable Haskell (initial pass)
2–3     Generate universal test cases and eval lib
4       Generate encoder/decoder source modules from kernelTypesModules
5       Recompile mainModules into executable Haskell (picking up the new source modules)
6       Export and verify JSON kernel
7       Run tests

Phase 5 is necessary because the encoder/decoder source modules generated in phase 4 are part of kernelTermsModules and therefore mainModules. They need to be compiled into executable code just like every other module. A stack build between phases 4 and 5 ensures the Haskell compiler picks up the newly generated source files.

Key generation functions (from Hydra.Generation)

  • writeHaskell / writeJava / writePython — Compile Hydra modules into executable code in the target language. Signature: FilePath -> [Module] -> [Module] -> IO () (output directory, universe modules for resolution, modules to generate).
  • writeDecoderSourceHaskell / writeEncoderSourceHaskell — Generate encoder/decoder source modules (Hydra module definitions) from type modules. Used in phase 4.
  • writeDecoderHaskell / writeEncoderHaskell — Convenience functions that generate encoder/decoder modules and immediately compile them to executable Haskell in one step.

For detailed context on encoder/decoder modules, see Issue #47: Per-Type Term Coders.

Incremental inference

This section covers how Hydra's build system runs inference at scale — caching, incremental skipping, per-package iteration. For what inference itself does (HM with elaboration to typed System F, the two cooperating modules hydra.inference and hydra.checking, the Graph as inference context), see the Inference wiki page.

inferModulesGiven (in Hydra.Codegen) takes a universe and a target set and re-infers only the relevant subset. Bindings in the target modules or in the transitive term-dependency closure that lack a pre-attached TypeScheme are fed to inferGraphTypes; clean non-target bindings are left untouched, and their cached schemes are consulted during inference via graphBoundTypes. Equivalent to inferModules when nothing in the universe carries a scheme, which is today's default path.

A content-hash cache (Hydra.Digest) sits on top: writeModulesJson computes SHA-256 hashes of kernel DSL source files and short-circuits inference and writes entirely if every hash matches the stored digest and every target JSON file exists. Digest files live under the gitignored build-cache subtree at dist/json/<pkg>/build/<main|test>/digest.json and dist/json/build/digest.json (see #247 and #379 for the build/ layout rationale).
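The short-circuit check can be sketched in Python with hashlib. The digest format (a flat JSON object mapping each source path to its hex SHA-256) and the function names are assumptions for illustration. Only the rule itself comes from the text above.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def can_skip(sources: list[Path], targets: list[Path], digest_file: Path) -> bool:
    """True when every source hash matches the stored digest and all targets exist."""
    if not digest_file.exists():
        return False
    stored = json.loads(digest_file.read_text())
    hashes_match = all(stored.get(str(p)) == sha256_of(p) for p in sources)
    return hashes_match and all(t.exists() for t in targets)
```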

Per-package inference

When the cache misses or the dirty set is too large to fit in one inferModulesGiven call, Phase 1 falls back to a per-package iterative driver (Hydra.Generation.inferAndWriteByPackage and its seeded variant inferAndWriteByPackageSeeded). The driver topologically sorts the package dep graph from each packages/<pkg>/package.json's dependencies field, then for each package in order runs a Generation-side wrapper inferModulesGivenSchemes over only that package's modules — with the typed-so-far universe merged into the inference graph as (Name, TypeScheme) maps rather than full Module values. Each iteration writes the focus package's JSON to disk immediately, which forces the inferred TypeSchemes through serialization and breaks any lazy thunk chain across iterations.

Two entry points use the same driver:

  • The cold-cache fallback in writeModulesJsonPackageSplit calls inferAndWriteByPackage with empty seed maps and an empty schema context. Every module flows through the per-package loop.
  • The warm-cache incremental path in tryIncrementalInference calls inferAndWriteByPackageSeeded with the JSON-loaded clean modules' (Name, TypeScheme) pairs as the seed maps (one for term bindings, one for type-def schemas) and the clean modules themselves as a schema-context-only set used to build the JSON-write schemaMap once up front. After that one-shot build, the clean modules are unreferenced and can be GC'd; the iteration carries only the seed maps + dirty modules forward.
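The driver's loop can be sketched in Python, using the standard library's graphlib for the topological sort. infer_package and write_json are hypothetical callbacks standing in for inferModulesGivenSchemes and the JSON writer, and schemes are plain strings. The point is that only name-to-scheme maps, never whole modules, are carried from one package to the next.

```python
from graphlib import TopologicalSorter
from typing import Callable, Optional


def infer_by_package(
    package_deps: dict[str, set[str]],
    modules_of: dict[str, list[str]],
    infer_package: Callable[[list[str], dict[str, str]], dict[str, str]],
    write_json: Callable[[str, dict[str, str]], None],
    seed_schemes: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Infer packages in dependency order, carrying only name -> scheme maps."""
    schemes = dict(seed_schemes or {})
    # static_order() yields every package after all of its dependencies.
    for pkg in TopologicalSorter(package_deps).static_order():
        new_schemes = infer_package(modules_of.get(pkg, []), schemes)
        write_json(pkg, new_schemes)  # flush this package's output right away
        schemes.update(new_schemes)   # only the schemes move on to the next package
    return schemes
```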

Peak memory per iteration is bounded by type-schemes of transitive deps + bindings of the focus package, not by every prior module's full payload (which is what the original [Module] accumulator retained). A TypeScheme is typically 1-3 orders of magnitude smaller than the term body it types, so this is what keeps Phase 1 within the -M6G CI heap cap on a wholly dirty universe (e.g. after a kernel-wide rename invalidates every module name's digest). See #381 and Phase 1's memory envelope in the build-system doc for the wall-time trade-off and the dead-end per-SCC attempt that preceded it.

One subtlety: DSL-authored TypeDefinitions encode polymorphism as nested TypeForall wrappers inside typeSchemeBody (with empty typeSchemeVariables). The kernel's schemaGraphToTypingEnvironment unwraps these at schema-graph lookup time; when we bypass the schema graph and inject TypeSchemes directly into graphSchemaTypes, we have to apply the same normalization (normalizeTypeScheme) ourselves — otherwise downstream consumers that pattern-match on the body shape (e.g. expecting record{...}) hit UnexpectedShape errors against the raw ∀.∀.…record{...} form.
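The normalization amounts to peeling leading forall wrappers off the body and moving their variables into the scheme's variable list. Here is a Python sketch that assumes toy tuple-encoded types such as ("forall", var, body). normalize_type_scheme is a hypothetical stand-in for normalizeTypeScheme and covers only the leading-wrapper case described above.

```python
def normalize_type_scheme(variables: list[str], body):
    """Move leading ("forall", v, inner) wrappers into the variable list."""
    vs = list(variables)
    while isinstance(body, tuple) and body and body[0] == "forall":
        _, var, body = body
        vs.append(var)
    return vs, body
```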

The Java and Python native DSL → JSON pipelines (heads/java/.../Generation.java and heads/python/.../generation.py) mirror the same driver shape in their host language so the native generators (UpdateJavaJson, update-python-json.py) hit the same per-package memory envelope — relevant when those pipelines grow to cover more than their own one package.

The bootstrap challenge

DSL defines Hydra      → Generates code for Hydra
        ↓                         ↓
But generator needs            Code generation
to understand DSL             requires understanding
                              the new DSL constructs!
                              CIRCULAR DEPENDENCY!

Bootstrap solution: Gradual extension

When adding new features (like Either type):

Step 1: Define in DSL

Add to core types in Core.hs:

def "Term" $
  union [
    -- ... existing variants
    "either">: Types.either_ (core "Term") (core "Term"),
    -- ...
  ]

Add DSL operations in Phantoms.hs:

either_ :: TypedTerm (a -> c) -> TypedTerm (b -> c) -> TypedTerm (Either a b) -> TypedTerm c

Step 2: Build (Will Fail)

stack build
# Error: Generator doesn't understand 'either' yet

Step 3: Manual Patch

Hand-translate DSL definitions to Haskell in generated files:

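-- Manually edit: dist/haskell/hydra-kernel/src/main/haskell/Hydra/Inference.hs
-- (Illustrative sketch only: a real rule would infer an Either type whose
-- unknown side is a fresh type variable, rather than a union with typeAny.)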
inferTypeOfEither :: InferenceContext -> Graph -> Either Term Term -> Either Error InferenceResult
inferTypeOfEither cx graph (Left left) = do
  leftResult <- inferType cx graph left
  let leftType = inferenceResultType leftResult
  let cx2 = inferenceResultContext leftResult
  return $ InferenceResult cx2 (TypeUnion [leftType, typeAny])
inferTypeOfEither cx graph (Right right) = do
  rightResult <- inferType cx graph right
  let rightType = inferenceResultType rightResult
  let cx2 = inferenceResultContext rightResult
  return $ InferenceResult cx2 (TypeUnion [typeAny, rightType])

Step 4: Rebuild

stack build
# Success! Generator now understands Either

Step 5: Regenerate

bin/sync-haskell.sh
# Regenerates DSL → JSON → Haskell (Phase 1 of the sync pipeline).
# Replaces the retired hydra-ext-debug exec.

Step 6: Final Build

stack build
# Self-hosting loop complete!

Generated code structure

heads/haskell/src/main/haskell/Hydra/
├── Dsl/                    # DSL definitions (manual)
├── Lib/                    # Native implementations (manual)
├── Generation.hs           # Code-gen driver (manual)
├── ExtGeneration.hs        # Driver for ext-language coders (manual)
└── Haskell/Generation.hs   # Haskell-specific coder driver (manual)

packages/hydra-kernel/src/main/haskell/Hydra/
└── Sources/                # Kernel DSL-based specifications (manual)
    ├── Kernel/Types/       # Type modules (data shapes)
    ├── Kernel/Terms/       # Term modules (kernel functions)
    ├── Kernel/Lib/         # Primitive registry: PrimitiveDefinition per hydra.lib.<sub> module name
    └── Test/               # Common test suite

packages/hydra-<lang>/src/main/haskell/Hydra/
└── Sources/<Lang>/         # Per-language coder DSL sources (manual)

dist/haskell/hydra-kernel/src/main/haskell/   # Generated kernel code
├── Hydra/
│   ├── Core.hs             # Generated Core types
│   ├── Variants.hs         # Generated Variants types
│   ├── Inference.hs        # Generated type inference
│   ├── Checking.hs         # Generated type checking
│   └── ...                 # All kernel modules

dist/haskell/hydra-haskell/src/main/haskell/Hydra/Haskell/Coder.hs   # Haskell coder
dist/haskell/hydra-java/src/main/haskell/Hydra/Java/Coder.hs         # Java coder
dist/haskell/hydra-python/src/main/haskell/Hydra/Python/Coder.hs     # Python coder
dist/haskell/hydra-scala/src/main/haskell/Hydra/Scala/Coder.hs       # Scala coder
dist/haskell/hydra-lisp/src/main/haskell/Hydra/Lisp/Coder.hs         # Lisp coder (4 dialects)
dist/haskell/hydra-pg/src/main/haskell/Hydra/Pg/                     # Property graphs
dist/haskell/hydra-rdf/src/main/haskell/Hydra/Rdf/                   # RDF / SHACL
dist/haskell/hydra-ext/src/main/haskell/Hydra/                       # Long-tail (Avro, Protobuf, GraphQL, ...)

Extending Hydra

Hydra's modular architecture provides clear extension points for adding new functionality. For detailed step-by-step guides, see the Developer Recipes.

Key extension points

Primitive functions: Add new standard library functions by declaring a PrimitiveDefinition in the appropriate Hydra/Sources/Kernel/Lib/<Sub>.hs module, adding native implementations in each host, and regenerating code for all target languages. See the Adding primitives recipe.

Core types: Extend the kernel type system by adding new type definitions to Core.hs, updating DSL constructors, and following the bootstrap process to regenerate the system. See the Extending Hydra Core recipe.

Target languages: Add support for new programming languages by implementing a coder (term/type encoding), serializer (AST to text), and language constraint definitions in the appropriate package under packages/.

Standard libraries: Create new library modules by defining types in Sources/Kernel/Types/, implementing native functions in Lib/, registering primitives, and creating DSL wrappers.


Appendix: Key file locations

Type modules

packages/hydra-kernel/src/main/haskell/Hydra/Sources/Kernel/Types/

├── Core.hs              # hydra.core - foundation
├── Variants.hs          # hydra.variants - metadata
├── Coders.hs            # hydra.coders - Coder, Adapter, Language
├── Graph.hs             # hydra.graph - primitives
├── Packaging.hs         # hydra.packaging - modules, namespaces, packages
├── Typing.hs            # hydra.typing - inference results
└── ...                  # see Hydra.Sources.Kernel.Types.All for the full list

DSL system

heads/haskell/src/main/haskell/Hydra/Dsl/

├── Terms.hs             # Untyped term DSL
├── Types.hs             # Untyped type DSL
├── Phantoms.hs          # Phantom-typed DSL
├── Meta/Terms.hs        # Term-encoded terms
├── Core.hs              # High-level constructors
├── Bootstrap.hs         # Bootstrapping utilities
└── Lib/                 # Library DSLs
    ├── Lists.hs
    ├── Eithers.hs
    └── ...

(#418: three DSL-support modules that are part of the hydra-kernel distribution runtime, namely Dsl/Terms.hs, Dsl/Literals.hs, and Dsl/Meta/Common.hs, have moved to overlay/haskell/hydra-kernel/src/main/haskell/Hydra/Dsl/. The rest of Hydra/Dsl/ remains head-only.)

Primitive functions

packages/hydra-kernel/src/main/haskell/Hydra/Sources/Kernel/Lib/: canonical primitive registry, with one PrimitiveDefinition-emitting module per hydra.lib.<sub> module name

overlay/haskell/hydra-kernel/src/main/haskell/Hydra/Haskell/Lib/: native Haskell implementations (moved here from the head by #418)

overlay/haskell/hydra-kernel/src/main/haskell/Hydra/Dsl/Libraries.hs: host-side bindings, which pair each native implementation with a name derived from its PrimitiveDefinition via prim1/prim2/prim3 (the relocation and the name derivation were both introduced by #473)

Sources/Kernel/Lib/
├── Math.hs
├── Lists.hs
└── ...

overlay/haskell/hydra-kernel/.../Hydra/Haskell/Lib/
├── Math.hs
├── Lists.hs
└── ...
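The split above separates two things: a canonical PrimitiveDefinition, and a host binding whose name is derived from that definition instead of being repeated as a string. It can be illustrated with a small Python sketch. PrimitiveDefinition, Primitive, prim1, prim2, and REGISTRY here are illustrative stand-ins (the real definitions are Haskell), and the two primitive names are assumed examples that follow the hydra.lib.<sub> convention. The arity check is an assumption about what such a derivation might guard against.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PrimitiveDefinition:
    """Canonical declaration: the single place the primitive's name is written."""
    name: str
    arity: int


@dataclass(frozen=True)
class Primitive:
    """Host binding: a native implementation plus the name taken from its definition."""
    name: str
    arity: int
    implementation: Callable


# Registry declarations (playing the role of Sources/Kernel/Lib/<Sub>.hs).
# These names are assumed examples, not a statement of Hydra's actual primitive list.
MATH_ADD = PrimitiveDefinition("hydra.lib.math.add", 2)
LISTS_LENGTH = PrimitiveDefinition("hydra.lib.lists.length", 1)


def _bind(definition: PrimitiveDefinition, arity: int, impl: Callable) -> Primitive:
    # Catch registry/host drift: the helper's arity must match the declaration.
    if definition.arity != arity:
        raise ValueError(
            f"{definition.name} declares arity {definition.arity}, bound with {arity}")
    return Primitive(definition.name, arity, impl)


def prim1(definition: PrimitiveDefinition, impl: Callable) -> Primitive:
    return _bind(definition, 1, impl)


def prim2(definition: PrimitiveDefinition, impl: Callable) -> Primitive:
    return _bind(definition, 2, impl)


# Host-side bindings (playing the role of Dsl/Libraries.hs): no name strings repeated here.
PRIMITIVES: List[Primitive] = [
    prim2(MATH_ADD, lambda a, b: a + b),
    prim1(LISTS_LENGTH, len),
]
REGISTRY: Dict[str, Primitive] = {p.name: p for p in PRIMITIVES}
```

Because each binding reads its name from the definition, renaming a primitive in the registry cannot leave a stale string behind in the host bindings.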

Code generators (DSL sources)

Per-language DSL sources live under packages/hydra-<lang>/src/main/. Most are authored in Haskell (.../haskell/Hydra/Sources/<Lang>/); hydra-java and hydra-python are authored host-natively in Java and Python (.../{java,python}/hydra/sources/).

Generated coder output lands under dist/haskell/<pkg>/ for each source package. The long-tail dist/haskell/hydra-ext/ tree is frozen (targetLanguages: [] in packages/hydra-ext/package.json) and shipped as-is rather than regenerated by the sync matrix.

Generated code

dist/haskell/hydra-kernel/src/main/haskell/: generated Haskell

dist/java/hydra-kernel/src/main/java/: generated Java

dist/python/hydra-kernel/src/main/python/: generated Python


Summary

Hydra's implementation is organized as a set of cooperating layers:

  1. Type modules define the core type system in a modular, dependency-aware manner
  2. DSLs provide multiple levels of abstraction for writing Hydra code with compile-time safety
  3. Primitives offer a comprehensive standard library with multi-language generation
  4. Coders transform Hydra definitions into multiple target languages systematically
  5. Bootstrap process enables self-hosting and gradual extension of the language

This architecture enables:

  • Type-safe code generation across languages
  • Self-hosting: the kernel is regenerated from its own Hydra sources
  • Systematic addition of new features
  • Clear separation of concerns
  • Maintainable and extensible codebase

The combination of Haskell's type system, phantom types, and careful layering creates a robust foundation for a multi-language transformation framework.


Appendix: Build scripts and executables

Hydra uses a combination of shell script wrappers (in bin/ directories) and Stack executables for code generation and synchronization. The main sync scripts orchestrate the individual executables in the correct order; the individual scripts and executables are useful during development when you need to rerun a single phase.

For how these fit into the release workflow, see docs/release-workflow.md (procedure) and the release policy (wiki).

The sync system is organized in three layers plus an intermediate testing layer (2.5), all operating on per-package dist trees (dist/<lang>/<pkg>/src/main/<lang>/...):

  • Layer 1 (transforms): per-language transform-json-to-<lang>.sh scripts in heads/haskell/bin/ convert the JSON universe into source files for one target.
  • Layer 2 (assemblers): per-language heads/<lang>/bin/assemble-all.sh scripts produce complete per-package distributions for one target in batch mode (one Haskell universe load per target).
  • Layer 2.5 (testers): per-language heads/<lang>/bin/test-*.sh scripts compile and test the assembled distributions.
  • Layer 3 (orchestrators): top-level bin/sync*.sh scripts compose the above across hosts and targets.

Each layer caches its work with content-hash stamps (for example, the per-package digest files used in Layers 2 and 2.5), so warm-cache runs short-circuit in seconds. See Code generation for the full workflow.

Cache layers (warm-cache short-circuits)

Warm bin/sync.sh runs complete in a few seconds when no inputs have changed. The short-circuits, from coarsest to finest:

| Layer | Gate | Cache location | Skips when … |
|---|---|---|---|
| Top-level Phase 1 skip | bin/lib/check-phase1-fresh.py | heads/haskell/.stack-work/phase1-input-cache.txt | DSL sources + heads/haskell/src/** + heads/haskell/package.yaml + heads/haskell/stack.yaml + sync-haskell.sh content-hash unchanged since last green sync |
| Step 3 (verify) | sync-haskell.sh coarse skip | heads/haskell/.stack-work/verify-json-kernel-cache.txt | dist/json/hydra-kernel/**.json + verify-json-kernel source content-hash unchanged |
| Step 3 (verify, per-module) | verify-json-kernel exec | heads/haskell/.stack-work/verify-json-kernel-per-module-cache.json | Each module's JSON file content-hash matches its last green-verify record |
| Step 4 (generate Haskell) | sync-haskell.sh coarse skip | heads/haskell/.stack-work/bootstrap-from-json-cache.txt | dist/json/**.json + bootstrap-from-json source content-hash unchanged |
| Step 6 (stack test) | sync-haskell.sh coarse skip | heads/haskell/.stack-work/haskell-test-cache.txt | Generated kernel + heads/haskell/src/{main,test}/**.hs + package.yaml + stack.yaml content-hash unchanged |
| Layer 2/2.5 per-package | digest-check | dist/<lang>/<pkg>/build/<set>/digest.json | Per-package input digest matches recorded digest |
| Layer 2.5 per-target tests | bin/lib/test-cache.sh | dist/<lang>/test-cache.json | Every source + test helper + runner content-hash unchanged since last green run |

All caches are content-hash based (not mtime). Touching a file without changing its bytes invalidates nothing; any byte-level change invalidates every cache that covers that file. Caches are stamped only after a fully green run, so a failed run does not poison the cache.
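The content-hash gate described above can be sketched in Python. The file layout, the glob pattern, and the functions combined_digest, run_cached, and demo are assumptions for illustration, not the actual logic of check-phase1-fresh.py or digest-check. The sketch demonstrates only the two stated properties: mtime-only changes do not invalidate the cache, and a failed run leaves no stamp.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def combined_digest(root: Path, patterns: Iterable[str]) -> str:
    """SHA-256 over every matched file's relative path and bytes; mtimes are ignored."""
    files = sorted({p for pattern in patterns for p in root.glob(pattern) if p.is_file()})
    h = hashlib.sha256()
    for path in files:
        h.update(path.relative_to(root).as_posix().encode("utf-8"))
        h.update(b"\0")
        # A fixed-length per-file hash keeps the boundaries between files unambiguous.
        h.update(hashlib.sha256(path.read_bytes()).digest())
    return h.hexdigest()


def run_cached(root: Path, patterns: Iterable[str], stamp: Path,
               step: Callable[[], bool]) -> str:
    """Skip the step if the inputs match the stamp; stamp only after a green run."""
    digest = combined_digest(root, patterns)
    if stamp.exists() and stamp.read_text().strip() == digest:
        return "skipped"
    if step():
        stamp.write_text(digest + "\n")
        return "ran"
    return "failed"  # no stamp is written, so the next run retries the step


def demo() -> List[str]:
    """Exercise the gate: failure, success, warm skip, touch-only, real edit."""
    results = []
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "src").mkdir()
        source = root / "src" / "Module.hs"
        source.write_text("x = 1\n")
        stamp = root / "cache.txt"  # lives outside the src/ inputs

        def gate(ok: bool) -> str:
            return run_cached(root, ["src/**/*.hs"], stamp, lambda: ok)

        results.append(gate(False))  # failed: nothing stamped
        results.append(gate(True))   # ran: stamp written
        results.append(gate(True))   # skipped: bytes unchanged
        os.utime(source, (0, 0))     # change the mtime only
        results.append(gate(True))   # still skipped
        source.write_text("x = 2\n")
        results.append(gate(True))   # ran: content changed
    return results
```

Calling demo() returns ["failed", "ran", "skipped", "skipped", "ran"]. Hashing file contents rather than comparing mtimes is what lets a touched-but-unchanged file keep the warm cache intact.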

The hydra-ext tree has targetLanguages: ["python"], so the Haskell sync matrix does not include it in batch mode (assemble-all.sh omits --include-ext). To regenerate dist/haskell/hydra-ext/ after a source DSL change in packages/hydra-ext/Sources/, run heads/haskell/bin/assemble-distribution.sh hydra-ext directly.

Top-level orchestrators (bin/)

| Script | Purpose |
|---|---|
| sync-all.sh | Full sync. Run the complete matrix (Phase 1 JSON build + Phase 2 per-target assemble + Phase 3 test). Supports --no-tests. |
| sync.sh | Scoped sync. Run a chosen host/target subset via --hosts <list> --targets <list>. |
| sync-default.sh | Shortcut for the haskell/java/python bootstrapping triad. |
| sync-packages.sh | Per-package sync. Bring one or more packages/<pkg>/ trees into sync with their dist/ outputs across all targets. Symmetric to sync.sh but scoped by package rather than (host, target). |
| sync-java.sh, sync-python.sh, sync-scala.sh | Per-language wrappers (host == target). |
| sync-clojure.sh, sync-common-lisp.sh, sync-emacs-lisp.sh, sync-scheme.sh | Per-Lisp-dialect wrappers. |
| regenerate-lexicon.sh | Regenerate docs/hydra-lexicon.txt from the Haskell kernel. On-demand / pre-release (not part of regular sync). |
| prepare-release.sh | Cross-implementation pre-release preparation: verification + upload-ready sdist/docs. |
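The phase ordering described for sync-all.sh can be summarized in a short Python sketch: build the JSON universe once, assemble each target, then test each target unless --no-tests is given. plan_sync and its step strings are hypothetical. The real scripts also handle hosts and the warm-cache short-circuits, and they may interleave work differently.

```python
from typing import List, Sequence


def plan_sync(targets: Sequence[str], no_tests: bool = False) -> List[str]:
    """Return the ordered steps of a hypothetical full sync over the given targets."""
    # Phase 1: build the JSON universe once; every target reads from it.
    steps = ["phase1: DSL sources -> dist/json"]
    # Phase 2: one batch assembly per target (one universe load per target).
    steps += [f"phase2: heads/{t}/bin/assemble-all.sh" for t in targets]
    # Phase 3: compile and test the assembled distributions, unless disabled.
    if not no_tests:
        steps += [f"phase3: heads/{t}/bin/test-*.sh" for t in targets]
    return steps
```

For example, plan_sync(["java", "python"]) yields five steps: one Phase 1 build followed by two assemblies and two test runs. Passing no_tests=True drops the Phase 3 entries.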

Haskell (heads/haskell/bin/)

Shell script wrappers live in heads/haskell/bin/. Executables without shell wrappers are run via stack exec <name>.

| Script / Executable | Purpose |
|---|---|
| sync-haskell.sh | Phase 1 sync. Regenerate the JSON universe from the DSL sources, regenerate the Haskell kernel, and run stack test. The lexicon is no longer refreshed here; use bin/regenerate-lexicon.sh on demand. |
| assemble-all.sh | Layer 2 batch assembler. Produce Haskell distributions for every package in one bootstrap-from-json invocation. |
| assemble-distribution.sh <pkg> | Layer 2 single-package assembler for one Haskell target package. |
| transform-haskell-dsl-to-json.sh | Transform Haskell DSL sources into the JSON universe under dist/json/. |
| transform-json-to-haskell.sh | Transform JSON into Haskell source files. |
| transform-json-to-{java,python,scala,lisp,target}.sh | Layer 1 per-target transforms. |
| test-distribution.sh | Layer 2.5 tester for Haskell distributions. |
| update-json-{kernel,main,test,manifest}.sh | Export kernel / non-kernel / test / manifest modules to JSON. |
| verify-json-kernel.sh | Verify the JSON kernel round-trips correctly. |
| bootstrap-from-json | Generate target-language distributions from the per-package JSON exports (executable; supports --scoped, --all-packages, and flat modes). |
| digest-check | Inspect and refresh per-package digest files used for warm-cache short-circuits (executable). |

Target-language assemblers and testers (heads/<lang>/bin/)

Each non-Haskell host mirrors the same shape:

| Script | Purpose |
|---|---|
| heads/<lang>/bin/assemble-all.sh | Batch Layer 2 assembler for this target. |
| heads/<lang>/bin/test-*.sh | Layer 2.5 tester: compile the assembled per-package distributions and run target-language tests. |

<lang> ∈ {java, python, scala}. Lisp follows a similar shape under heads/lisp/bin/ (shared) and heads/lisp/<dialect>/bin/ for each of clojure, common-lisp, emacs-lisp, and scheme.