jsonschema-infer

A Go library for inferring JSON Schema from JSON samples. This library analyzes multiple JSON documents and automatically generates a JSON Schema that describes their structure, types, and patterns.

Features

Type inference

  • Infer basic types: string, boolean, number, integer, null
  • Nested objects: full support for deeply nested object structures
  • Arrays: treats all array items as the same type and infers their schema
  • Arrays of objects: infers schemas for complex array items with optional fields
  • Flexible root types: supports objects, arrays, and primitives at root level
  • Multiple types: union types when a field has varying types across samples
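As a rough illustration of how type inference over decoded samples can work, the sketch below maps values produced by encoding/json to JSON Schema type names and collects the set of types seen for one field. It is not the library's implementation: `jsonType` and `unionTypes` are hypothetical helpers, and treating a number with no fractional part as "integer" is an assumption of this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"sort"
)

// jsonType maps a value decoded by encoding/json to a JSON Schema type name.
// encoding/json decodes every JSON number into float64, so this sketch
// assumes a number is an "integer" when it has no fractional part.
func jsonType(v any) string {
	switch x := v.(type) {
	case nil:
		return "null"
	case bool:
		return "boolean"
	case string:
		return "string"
	case float64:
		if x == math.Trunc(x) {
			return "integer"
		}
		return "number"
	case []any:
		return "array"
	case map[string]any:
		return "object"
	}
	return "unknown"
}

// unionTypes returns the sorted set of types seen for one top-level field
// across all samples (hypothetical helper, not part of the library).
func unionTypes(samples []string, field string) ([]string, error) {
	seen := map[string]bool{}
	for _, s := range samples {
		var doc map[string]any
		if err := json.Unmarshal([]byte(s), &doc); err != nil {
			return nil, err
		}
		if v, ok := doc[field]; ok {
			seen[jsonType(v)] = true
		}
	}
	types := make([]string, 0, len(seen))
	for t := range seen {
		types = append(types, t)
	}
	sort.Strings(types)
	return types, nil
}

func main() {
	samples := []string{`{"id": 1}`, `{"id": "a-17"}`, `{"id": 2.5}`}
	types, err := unionTypes(samples, "id")
	if err != nil {
		panic(err)
	}
	fmt.Println(types) // prints "[integer number string]"
}
```

A field whose set contains more than one entry is what the feature list calls a union type.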

Field presence

  • Optional fields: a field that appears in every sample is required; a field that appears in only some samples is optional
  • Null → optional: a field whose value is null in any sample is treated as optional, without polluting the inferred type
  • Const detection: if a primitive field always has the same value, the schema includes "const" for that value
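The presence rules above can be approximated with simple counting. The following is a minimal sketch, assuming top-level fields only; the `presence` type and its methods are hypothetical names for illustration and are not taken from the library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// presence counts, per top-level field, how many samples contained the field
// with a non-null value. Hypothetical type, for illustration only.
type presence struct {
	samples int
	nonNull map[string]int
}

// add decodes one JSON object and updates the counters.
func (p *presence) add(raw string) error {
	var doc map[string]any
	if err := json.Unmarshal([]byte(raw), &doc); err != nil {
		return err
	}
	p.samples++
	for k, v := range doc {
		if v != nil { // a null value does not count toward "required"
			p.nonNull[k]++
		}
	}
	return nil
}

// required lists the fields that were present and non-null in every sample.
func (p *presence) required() []string {
	var req []string
	for k, n := range p.nonNull {
		if n == p.samples {
			req = append(req, k)
		}
	}
	sort.Strings(req)
	return req
}

func main() {
	p := &presence{nonNull: map[string]int{}}
	for _, s := range []string{
		`{"name": "John", "age": 30, "active": true}`,
		`{"name": "Jane", "age": null}`,
	} {
		if err := p.add(s); err != nil {
			panic(err)
		}
	}
	fmt.Println(p.required()) // prints "[name]"
}
```

Here age drops out of the required list because one sample carries null, and active drops out because one sample omits it.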

Format detection

  • Unified format detection: all formats detected using the same FormatDetector mechanism
  • Built-in formats: datetime (ISO 8601), email, UUID, IPv4, IPv6, URL (HTTP/HTTPS/FTP/FTPS)
  • Custom format detectors: register user-defined format detection functions
  • Disable built-in formats: opt out for full control over format detection
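One way to picture format detection is as candidate elimination: start with every known format and drop a candidate as soon as one observed value fails to match it. The sketch below uses only the Go standard library and is an assumption about the general technique, not the library's FormatDetector API; `detector`, `candidates`, `observe`, and `detectFormat` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/mail"
	"net/netip"
	"time"
)

// detector pairs a format name with a predicate (hypothetical type).
type detector struct {
	name  string
	match func(string) bool
}

// candidates returns a fresh list of detectors for three sample formats.
func candidates() []detector {
	return []detector{
		{"date-time", func(s string) bool {
			_, err := time.Parse(time.RFC3339, s)
			return err == nil
		}},
		{"email", func(s string) bool {
			a, err := mail.ParseAddress(s)
			return err == nil && a.Address == s
		}},
		{"ipv4", func(s string) bool {
			ip, err := netip.ParseAddr(s)
			return err == nil && ip.Is4()
		}},
	}
}

// observe keeps only the candidates that still match s. The value itself is
// not stored, so memory stays bounded by the number of candidates.
func observe(cands []detector, s string) []detector {
	kept := cands[:0]
	for _, d := range cands {
		if d.match(s) {
			kept = append(kept, d)
		}
	}
	return kept
}

// detectFormat returns the first surviving format, or "" if none survives.
func detectFormat(values []string) string {
	if len(values) == 0 {
		return ""
	}
	cands := candidates()
	for _, v := range values {
		cands = observe(cands, v)
		if len(cands) == 0 {
			return ""
		}
	}
	return cands[0].name
}

func main() {
	fmt.Println(detectFormat([]string{
		"2023-01-15T10:30:00Z",
		"2023-02-20T14:45:00Z",
	})) // prints "date-time"
}
```

In this picture a custom detector would be one more entry in the candidate list, and disabling the built-ins would mean starting from an empty list.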

Configuration

  • Predefined types: override inference for specific fields (e.g., created_at as DateTime)
  • Schema versions: Draft 06 and Draft 07 (default)
  • Examples: optional first-value capturing per field (disabled by default)
  • Max samples limit: cap the number of samples processed
  • Indented output: configurable JSON indentation via WithIndent

Performance & API

  • Lazy schema building: the schema is built on demand and cached until the next sample invalidates it, so adding a sample does no schema-building work
  • O(1) memory per field: format candidates eliminated eagerly; no string buffering
  • AddParsedSample: skip JSON parsing when you've already decoded the document
  • GenerateTo(io.Writer): write schema directly to any writer without an intermediate string
  • Thread-safe: all methods safe for concurrent use — call AddParsedSample from multiple goroutines
  • Load/Resume: load a previously generated schema and continue adding samples

Requirements

  • Go 1.25 or higher

Installation

go get github.com/JLugagne/jsonschema-infer

Documentation

Quick Start

Basic Usage

package main

import (
    "fmt"
    "github.com/JLugagne/jsonschema-infer"
)

func main() {
    // Create a new generator
    generator := jsonschema.New()

    // Add JSON samples
    generator.AddSample(`{"name": "John", "age": 30, "active": true}`)
    generator.AddSample(`{"name": "Jane", "age": 25, "active": false}`)
    generator.AddSample(`{"name": "Bob", "age": 35}`)

    // Generate the schema
    schema, err := generator.Generate()
    if err != nil {
        panic(err)
    }

    fmt.Println(schema)
}

Output:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "name": {
      "type": "string"
    },
    "age": {
      "type": "integer"
    },
    "active": {
      "type": "boolean"
    }
  },
  "required": ["name", "age"]
}

Note: active is not in required because it doesn't appear in all samples.

Predefined Types

Configure specific fields to have predefined types:

generator := jsonschema.New(
    jsonschema.WithPredefined("created_at", jsonschema.DateTime),
    jsonschema.WithPredefined("updated_at", jsonschema.DateTime),
)

generator.AddSample(`{"id": 1, "created_at": "2023-01-15T10:30:00Z"}`)
generator.AddSample(`{"id": 2, "created_at": "2023-02-20T14:45:00Z"}`)

schema, err := generator.Generate()
if err != nil {
    panic(err)
}
fmt.Println(schema)

Available predefined types:

  • DateTime - string with date-time format
  • String - string type
  • Boolean - boolean type
  • Number - number type
  • Integer - integer type
  • Array - array type
  • Object - object type

Examples

The library can capture the first observed value as an example for each field:

// Enable example capturing
generator := jsonschema.New(jsonschema.WithExamples())

By default, example capturing is disabled to save memory and keep schemas compact.

Arrays of Objects

The library handles arrays of objects and detects optional fields within array items:

generator := jsonschema.New()

generator.AddSample(`{
    "users": [
        {"id": 1, "name": "John", "email": "john@example.com"},
        {"id": 2, "name": "Jane"}
    ]
}`)

generator.AddSample(`{
    "users": [
        {"id": 3, "name": "Bob", "email": "bob@example.com"}
    ]
}`)

schema, err := generator.Generate()
if err != nil {
    panic(err)
}
fmt.Println(schema)

The resulting schema will show that email is optional in the array items since it doesn't appear in all objects.
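To see why email comes out optional, it helps to count over array items rather than over samples. The sketch below merges the two users arrays from the example into one set of counters; it is an illustration under that assumption, with hypothetical names (`itemStats`, `addArray`, `split`), and does not reproduce the library's internals.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// itemStats accumulates field counts over every object found in an array,
// across all samples (hypothetical type, for illustration only).
type itemStats struct {
	items  int
	fields map[string]int
}

// addArray decodes one JSON array of objects and counts each item's fields.
func (s *itemStats) addArray(raw string) error {
	var items []map[string]any
	if err := json.Unmarshal([]byte(raw), &items); err != nil {
		return err
	}
	for _, it := range items {
		s.items++
		for k := range it {
			s.fields[k]++
		}
	}
	return nil
}

// split separates fields seen in every item from fields seen in only some.
func (s *itemStats) split() (required, optional []string) {
	for k, n := range s.fields {
		if n == s.items {
			required = append(required, k)
		} else {
			optional = append(optional, k)
		}
	}
	sort.Strings(required)
	sort.Strings(optional)
	return required, optional
}

func main() {
	s := &itemStats{fields: map[string]int{}}
	arrays := []string{
		`[{"id": 1, "name": "John", "email": "john@example.com"}, {"id": 2, "name": "Jane"}]`,
		`[{"id": 3, "name": "Bob", "email": "bob@example.com"}]`,
	}
	for _, a := range arrays {
		if err := s.addArray(a); err != nil {
			panic(err)
		}
	}
	required, optional := s.split()
	fmt.Println(required, optional) // prints "[id name] [email]"
}
```

The denominator is the three item objects seen in total, not the two samples, which is why a single item without email is enough to make the field optional.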

Load and Resume

Load a previously generated schema and continue adding samples:

// Generate initial schema
generator1 := jsonschema.New()
generator1.AddSample(`{"name": "John", "age": 30}`)
schemaJSON, err := generator1.Generate()
if err != nil {
    panic(err)
}

// Later, load the schema and add more samples
generator2 := jsonschema.New()
if err := generator2.Load(schemaJSON); err != nil {
    panic(err)
}

// Add new samples with additional fields
generator2.AddSample(`{"name": "Jane", "age": 25, "email": "jane@example.com"}`)

// Generate updated schema
updatedSchema, err := generator2.Generate()
if err != nil {
    panic(err)
}
fmt.Println(updatedSchema)

Get Current Schema

Retrieve the current schema as a Schema object after any sample:

generator := jsonschema.New()
generator.AddSample(`{"name": "John"}`)

// Get the current schema as an object (not JSON string)
schema := generator.GetCurrentSchema()

// Access properties
fmt.Println(schema.Type) // "object"
fmt.Println(schema.Properties["name"].Type) // "string"

Building and Testing

Build

go build

Or use the Makefile:

make build

Test

go test -v

Or use the Makefile:

make test

Test with Coverage

make test-coverage

This generates coverage.html, which you can open in a browser.

Architecture

The library uses a tree-based recursive architecture:

  • SchemaNode: Each node represents a part of the JSON structure

    • Handles only primitives (string, number, boolean, null)
    • Delegates to child nodes for complex types (arrays, objects)
    • Accumulates observations across all samples
  • Lazy Schema Building: Schema is built on demand when Generate() or GetCurrentSchema() is called

    • No redundant work while adding samples
    • Result is cached and reused until the next sample invalidates it
    • Can still inspect schema evolution via GetCurrentSchema() at any time
  • Optional Field Detection: Tracks how many times each field appears

    • Fields appearing in all samples → required
    • Fields appearing in some samples → optional
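The lazy-building idea described above can be sketched with a cached result and a validity flag. This is a simplified illustration, not the library's code: `lazySchema`, `addSample`, and `generate` are hypothetical names, and the "schema" is just a placeholder string standing in for a real build.

```go
package main

import (
	"fmt"
	"sync"
)

// lazySchema rebuilds its output only when a sample has arrived since the
// last build (hypothetical type, for illustration only).
type lazySchema struct {
	mu      sync.Mutex
	samples []string
	cached  string
	valid   bool
	builds  int // number of real builds, kept only to make the effect visible
}

// addSample records the sample and invalidates the cache; it does no
// schema-building work itself.
func (l *lazySchema) addSample(s string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.samples = append(l.samples, s)
	l.valid = false
}

// generate returns the cached result when it is still valid and rebuilds
// otherwise.
func (l *lazySchema) generate() string {
	l.mu.Lock()
	defer l.mu.Unlock()
	if !l.valid {
		// Placeholder for the expensive schema build.
		l.cached = fmt.Sprintf("schema built from %d samples", len(l.samples))
		l.builds++
		l.valid = true
	}
	return l.cached
}

func main() {
	l := &lazySchema{}
	l.addSample(`{"name": "John"}`)
	l.addSample(`{"name": "Jane"}`)
	l.generate()
	l.generate() // served from the cache, no rebuild
	l.addSample(`{"name": "Bob"}`)
	l.generate()          // cache was invalidated, rebuilds once
	fmt.Println(l.builds) // prints "2"
}
```

The mutex is there because the feature list describes all methods as safe for concurrent use; a single lock is the simplest way to get that property in a sketch.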

More Examples

See the examples/ directory for runnable examples:

  • basic - Basic type inference and optional fields
  • arrays - Arrays of objects with optional fields
  • datetime - Automatic datetime detection
  • predefined - Configuring predefined types
  • load_resume - Loading and resuming schemas
  • nested - Deeply nested structures
  • incremental - Watching schema evolution

Run all examples:

cd examples
./run-examples.sh

Comparison with Other Libraries

This library is unique in the Go ecosystem for sample-based JSON schema inference. Similar functionality exists in other languages.

Key advantages of jsonschema-infer:

  • ✅ Pure Go implementation
  • ✅ Incremental schema updates
  • ✅ Load/resume capability
  • ✅ Tree-based recursive architecture
  • ✅ Optional field frequency tracking

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

License

MIT License

Notes

  • The library uses Go's standard encoding/json package for JSON parsing
  • All array items are treated as having the same schema (merged together)
  • Multiple type detection is supported (e.g., a field that's sometimes string, sometimes number)
