A Go library for inferring JSON Schema from JSON samples. This library analyzes multiple JSON documents and automatically generates a JSON Schema that describes their structure, types, and patterns.
- ✅ Infer basic types: string, boolean, number, integer, null
- ✅ Nested objects: full support for deeply nested object structures
- ✅ Arrays: treats all array items as the same type and infers their schema
- ✅ Arrays of objects: infers schemas for complex array items with optional fields
- ✅ Flexible root types: supports objects, arrays, and primitives at root level
- ✅ Multiple types: union types when a field has varying types across samples
- ✅ Optional fields: fields appearing in all samples → required; some samples → optional
- ✅ Null → optional: a field whose value is `null` in any sample is treated as optional, without polluting the inferred type
- ✅ Const detection: if a primitive field always has the same value, the schema includes `"const"` for that value
- ✅ Unified format detection: all formats detected using the same `FormatDetector` mechanism
- ✅ Built-in formats: datetime (ISO 8601), email, UUID, IPv4, IPv6, URL (HTTP/HTTPS/FTP/FTPS)
- ✅ Custom format detectors: register user-defined format detection functions
- ✅ Disable built-in formats: opt out for full control over format detection
- ✅ Predefined types: override inference for specific fields (e.g., `created_at` as DateTime)
- ✅ Schema versions: Draft 06 and Draft 07 (default)
- ✅ Examples: optional first-value capturing per field (disabled by default)
- ✅ Max samples limit: cap the number of samples processed
- ✅ Indented output: configurable JSON indentation via `WithIndent`
- ✅ Lazy schema building: schema built on demand, cached between samples — no per-sample overhead
- ✅ O(1) memory per field: format candidates eliminated eagerly; no string buffering
- ✅ `AddParsedSample`: skip JSON parsing when you've already decoded the document
- ✅ `GenerateTo(io.Writer)`: write schema directly to any writer without an intermediate string
- ✅ Thread-safe: all methods safe for concurrent use; call `AddParsedSample` from multiple goroutines
- ✅ Load/Resume: load a previously generated schema and continue adding samples
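The constant-memory claim for format detection can be pictured as candidate elimination: every format starts as a candidate, and a format is dropped the first time a value fails its check, so no strings need to be retained. Below is a minimal sketch of that idea using only the standard library. `newCandidates`, `observe`, and `remaining` are hypothetical helpers, the checks are rough approximations, and none of this is the library's `FormatDetector` API.

```go
package main

import (
	"fmt"
	"net/netip"
	"regexp"
	"sort"
	"time"
)

// uuidRe is a loose UUID shape check used only for this illustration.
var uuidRe = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// newCandidates returns the formats still possible for a field before any
// value has been seen. Names and predicates are assumptions for this sketch.
func newCandidates() map[string]func(string) bool {
	return map[string]func(string) bool{
		"date-time": func(s string) bool {
			_, err := time.Parse(time.RFC3339, s)
			return err == nil
		},
		"uuid": uuidRe.MatchString,
		"ipv4": func(s string) bool {
			addr, err := netip.ParseAddr(s)
			return err == nil && addr.Is4()
		},
	}
}

// observe drops every candidate that the value does not match.
// The value itself is never stored.
func observe(cands map[string]func(string) bool, value string) {
	for name, match := range cands {
		if !match(value) {
			delete(cands, name)
		}
	}
}

// remaining lists the surviving format names in sorted order.
func remaining(cands map[string]func(string) bool) []string {
	names := make([]string, 0, len(cands))
	for name := range cands {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	cands := newCandidates()
	observe(cands, "2023-01-15T10:30:00Z")
	observe(cands, "2023-02-20T14:45:00Z")
	fmt.Println(remaining(cands)) // [date-time]

	observe(cands, "not a timestamp")
	fmt.Println(remaining(cands)) // []
}
```

Memory in this sketch is bounded by the number of known formats, not by the number of values observed.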
- Go 1.25 or higher
go get github.com/JLugagne/jsonschema-infer
- Usage Guide - Detailed examples and best practices
- API Documentation - Complete API reference
- Architecture - Internal design and algorithms
- Examples - Runnable example programs
package main

import (
	"fmt"

	"github.com/JLugagne/jsonschema-infer"
)

func main() {
	// Create a new generator
	generator := jsonschema.New()

	// Add JSON samples
	generator.AddSample(`{"name": "John", "age": 30, "active": true}`)
	generator.AddSample(`{"name": "Jane", "age": 25, "active": false}`)
	generator.AddSample(`{"name": "Bob", "age": 35}`)

	// Generate the schema
	schema, err := generator.Generate()
	if err != nil {
		panic(err)
	}
	fmt.Println(schema)
}

Output:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "name": {
      "type": "string"
    },
    "age": {
      "type": "integer"
    },
    "active": {
      "type": "boolean"
    }
  },
  "required": ["name", "age"]
}

Note: `active` is not in `required` because it doesn't appear in all samples.
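The rule behind that `required` list can be sketched as presence counting: a key is required only if it was seen in every sample. The snippet below is an illustrative, standard-library-only sketch. `fieldCounter` and its methods are hypothetical names rather than part of the library, and it looks only at top-level keys and ignores the null-handling rule.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// fieldCounter is a hypothetical helper, not part of the library's API.
type fieldCounter struct {
	samples int            // number of samples observed
	seen    map[string]int // how many samples contained each key
}

func newFieldCounter() *fieldCounter {
	return &fieldCounter{seen: make(map[string]int)}
}

// add decodes one JSON object and counts its top-level keys.
func (c *fieldCounter) add(sample string) error {
	var obj map[string]any
	if err := json.Unmarshal([]byte(sample), &obj); err != nil {
		return err
	}
	c.samples++
	for key := range obj {
		c.seen[key]++
	}
	return nil
}

// required returns, in sorted order, the keys present in every sample.
func (c *fieldCounter) required() []string {
	out := []string{}
	for key, n := range c.seen {
		if n == c.samples {
			out = append(out, key)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	c := newFieldCounter()
	samples := []string{
		`{"name": "John", "age": 30, "active": true}`,
		`{"name": "Jane", "age": 25, "active": false}`,
		`{"name": "Bob", "age": 35}`,
	}
	for _, s := range samples {
		if err := c.add(s); err != nil {
			panic(err)
		}
	}
	fmt.Println(c.required()) // [age name]
}
```

The sketch sorts keys alphabetically for a deterministic result; the ordering the library itself emits may differ.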
Configure specific fields to have predefined types:
generator := jsonschema.New(
	jsonschema.WithPredefined("created_at", jsonschema.DateTime),
	jsonschema.WithPredefined("updated_at", jsonschema.DateTime),
)
generator.AddSample(`{"id": 1, "created_at": "2023-01-15T10:30:00Z"}`)
generator.AddSample(`{"id": 2, "created_at": "2023-02-20T14:45:00Z"}`)
schema, _ := generator.Generate()

Available predefined types:
- `DateTime` - string with date-time format
- `String` - string type
- `Boolean` - boolean type
- `Number` - number type
- `Integer` - integer type
- `Array` - array type
- `Object` - object type
The library can capture the first observed value as an example for each field:
// Enable example capturing
generator := jsonschema.New(jsonschema.WithExamples())

By default, example capturing is disabled to save memory and keep schemas compact.
The library handles arrays of objects and detects optional fields within array items:
generator := jsonschema.New()
generator.AddSample(`{
	"users": [
		{"id": 1, "name": "John", "email": "john@example.com"},
		{"id": 2, "name": "Jane"}
	]
}`)
generator.AddSample(`{
	"users": [
		{"id": 3, "name": "Bob", "email": "bob@example.com"}
	]
}`)
schema, _ := generator.Generate()

The resulting schema will show that `email` is optional in the array items, since it doesn't appear in all objects.
Load a previously generated schema and continue adding samples:
// Generate initial schema
generator1 := jsonschema.New()
generator1.AddSample(`{"name": "John", "age": 30}`)
schemaJSON, _ := generator1.Generate()

// Later, load the schema and add more samples
generator2 := jsonschema.New()
err := generator2.Load(schemaJSON)
if err != nil {
	panic(err)
}

// Add new samples with additional fields
generator2.AddSample(`{"name": "Jane", "age": 25, "email": "jane@example.com"}`)

// Generate updated schema
updatedSchema, _ := generator2.Generate()

Retrieve the current schema as a `Schema` object after any sample:
generator := jsonschema.New()
generator.AddSample(`{"name": "John"}`)

// Get the current schema as an object (not JSON string)
schema := generator.GetCurrentSchema()

// Access properties
fmt.Println(schema.Type)                    // "object"
fmt.Println(schema.Properties["name"].Type) // "string"

go build

Or use the Makefile:

make build

go test -v

Or use the Makefile:

make test

make test-coverage

This generates `coverage.html`, which you can open in a browser.
The library uses a tree-based recursive architecture:
- `SchemaNode`: Each node represents a part of the JSON structure
  - Handles only primitives (string, number, boolean, null)
  - Delegates to child nodes for complex types (arrays, objects)
  - Accumulates observations across all samples
- Lazy Schema Building: Schema is built on demand when `Generate()` or `GetCurrentSchema()` is called
  - No redundant work while adding samples
  - Result is cached and reused until the next sample invalidates it
  - Can still inspect schema evolution via `GetCurrentSchema()` at any time
- Optional Field Detection: Tracks how many times each field appears
  - Fields appearing in all samples → required
  - Fields appearing in some samples → optional
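The lazy-building behavior described above (build on demand, cache the result, invalidate when a new sample arrives) follows a common pattern that can be sketched in a few lines. This is an illustration under assumptions, not the library's internals: `lazyGenerator`, its fields, and the placeholder "schema" string are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// lazyGenerator is a hypothetical stand-in that shows build-on-demand caching.
type lazyGenerator struct {
	mu      sync.Mutex
	samples int    // observations accumulated so far
	cached  string // last built result
	valid   bool   // false once a new sample has arrived
	builds  int    // how many times the result was actually rebuilt
}

// Add records one sample and invalidates the cache; it does no building.
func (g *lazyGenerator) Add() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.samples++
	g.valid = false
}

// Generate rebuilds only when the cache is stale, then returns the cached value.
func (g *lazyGenerator) Generate() string {
	g.mu.Lock()
	defer g.mu.Unlock()
	if !g.valid {
		g.builds++
		g.cached = fmt.Sprintf("schema from %d sample(s)", g.samples)
		g.valid = true
	}
	return g.cached
}

func main() {
	g := &lazyGenerator{}
	g.Add()
	g.Add()
	g.Generate() // first build
	g.Generate() // served from cache
	g.Add()      // invalidates the cache
	fmt.Println(g.Generate()) // schema from 3 sample(s)
	fmt.Println(g.builds)     // 2
}
```

Four `Generate` calls trigger only two builds here, which is the saving the lazy approach aims for when samples are added in bulk.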
See the examples/ directory for runnable examples:
- basic - Basic type inference and optional fields
- arrays - Arrays of objects with optional fields
- datetime - Automatic datetime detection
- predefined - Configuring predefined types
- load_resume - Loading and resuming schemas
- nested - Deeply nested structures
- incremental - Watching schema evolution
Run all examples:
cd examples
./run-examples.sh

This library is unique in the Go ecosystem for sample-based JSON schema inference. Similar functionality exists in other languages:
- Python: genson - similar approach
- JavaScript: @jsonhero/schema-infer
- Online: jsonschema.net - web-based tool
Key advantages of jsonschema-infer:
- ✅ Pure Go implementation
- ✅ Incremental schema updates
- ✅ Load/resume capability
- ✅ Tree-based recursive architecture
- ✅ Optional field frequency tracking
Contributions are welcome! Please feel free to submit issues and pull requests.
- The library uses Go's standard `encoding/json` package for JSON parsing
- All array items are treated as having the same schema (merged together)
- Multiple type detection is supported (e.g., a field that's sometimes string, sometimes number)