
Support transforming primitive types using a codec #73

Merged
zxch3n merged 10 commits into loro-dev:main from bentefay:add-transform on Mar 24, 2026


@bentefay
Contributor

@bentefay bentefay commented Feb 14, 2026

Transform Feature Proposal for Loro Mirror

Hi there!

This PR proposes a new Transform feature for loro-mirror. It enables bidirectional conversion between CRDT leaf primitives (String, Number and Boolean) and rich domain types (like Temporal types, BigInt or custom objects). This allows users to model their application state with rich domain types, while Loro continues to store JSON primitives. I used Zod codecs for inspiration, noting you appear to have done the same for catchall.

I imagine this could be a first step in a broader effort to support a rich schema definition in Loro Mirror similar to frameworks like Zod, but with all the incredible benefits of Loro Mirror's CRDT core. For example, you could imagine Loro Mirror supporting features like z.tuple, z.union and z.discriminatedUnion for rich domain modelling. I don't think this would require any changes to Loro's core - just incremental improvements to Loro Mirror.

This is unsolicited work, so I completely understand if it doesn't align with your vision. I'm happy to discuss, iterate, or accept a "no thank you."

API at a Glance

Three additions to the public API:

schema.String().transform({ decode, encode });
schema.Number().transform({ decode, encode });
schema.Boolean().transform({ decode, encode });

The transform argument has this shape:

/**
 * Transform definition for bidirectional conversion between CRDT primitives and domain types.
 * It is strongly recommended that DomainType is immutable or
 * [supported by Immer](https://immerjs.github.io/immer/complex-objects/); otherwise, changes to transformed
 * values may not be detected and converted to CRDT operations. Never mutate DomainType instances
 * outside of Loro Mirror's setState function. CRDTType and DomainType can be null or undefined, but decode/encode functions
 * will never receive null/undefined; those values pass through as-is. Validation is performed on the domain value,
 * that is, after decoding and before encoding.
 */
interface TransformDefinition<CRDTType, DomainType> {
    /** Convert CRDT primitive to domain type. Never called with null/undefined. */
    decode: (value: CRDTType & {}) => DomainType & {};

    /** Convert domain type to CRDT primitive. Never called with null/undefined. */
    encode: (value: DomainType & {}) => CRDTType & {};

    /** Validate the domain value. Called during schema validation. */
    validate?: (value: DomainType & {}) => boolean | string;

    /**
     * How to compare domain values for equality during setState diffing.
     * @default "reference-equality"
     */
    isEqual?: EqualityStrategy<DomainType>;

    /**
     * Whether to validate that encode() returns the correct CRDT type during schema validation.
     * When true, encode() is called on every validation to check the return type.
     * When false, encode type checking is skipped for better performance.
     *
     * @default false
     */
    validateEncodedType?: boolean;
}

type EqualityStrategy<D> =
    | "reference-equality"
    | "encoded-value-equality"
    | "deep-equality"
    | ((a: D, b: D) => boolean);
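As a hedged illustration of the shape above, here is a self-contained sketch of a transform that stores a BigInt as a decimal string. The interface and type alias are copied from the definition above so the snippet compiles on its own; `bigIntTransform` is a hypothetical name used only for this example, not something loro-mirror ships.

```typescript
// Copied from the proposal above so this snippet compiles on its own.
type EqualityStrategy<D> =
    | "reference-equality"
    | "encoded-value-equality"
    | "deep-equality"
    | ((a: D, b: D) => boolean);

interface TransformDefinition<CRDTType, DomainType> {
    decode: (value: CRDTType & {}) => DomainType & {};
    encode: (value: DomainType & {}) => CRDTType & {};
    validate?: (value: DomainType & {}) => boolean | string;
    isEqual?: EqualityStrategy<DomainType>;
    validateEncodedType?: boolean;
}

// Hypothetical example: a BigInt stored in Loro as a decimal string, so values
// beyond Number.MAX_SAFE_INTEGER survive the round trip.
const bigIntTransform: TransformDefinition<string, bigint> = {
    decode: (s) => BigInt(s),
    encode: (n) => n.toString(),
    // true means valid; a returned string is used as the error message.
    validate: (n) => n >= BigInt(0) || "expected a non-negative integer",
    // bigint is a primitive, so === already compares by value.
    isEqual: "reference-equality",
    validateEncodedType: true,
};
```

Because the object literal is annotated with `TransformDefinition<string, bigint>`, `isEqual` keeps the literal type `"reference-equality"` instead of widening to `string`.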

Before / After

// BEFORE: Manual conversion everywhere
const mySchema = schema({
    task: schema.LoroMap({
        title: schema.String(),
        dueDate: schema.String(), // You work with strings
    }),
});

mirror.setState({
    task: { title: "Review PR", dueDate: new Date().toISOString() }, // Manual encode
});
const date = new Date(mirror.getState().task.dueDate); // Manual decode

// AFTER: Domain types throughout
const dateTransform = {
    decode: (s: string) => new Date(s),
    encode: (d: Date) => d.toISOString(),
};

const mySchema = schema({
    task: schema.LoroMap({
        title: schema.String(),
        dueDate: schema.String().transform(dateTransform), // You work with Dates
    }),
});

mirror.setState({
    task: { title: "Review PR", dueDate: new Date() }, // Just pass a Date
});
mirror.getState().task.dueDate.getTime(); // Already a Date

Type inference works automatically — InferType returns Date, not string.
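The type-level idea behind this can be sketched in isolation. The `Codec`, `PrimitiveSchema`, `primitive`, and `InferDomain` names below are hypothetical and much simpler than loro-mirror's real schema types; the sketch only shows that if `.transform()` returns a schema parameterised by the new domain type, inference follows it automatically.

```typescript
// Hypothetical minimal codec: CRDT primitive C <-> domain type D.
interface Codec<C, D> {
    decode: (value: C) => D;
    encode: (value: D) => C;
}

// A schema node that remembers both its CRDT type and its domain type.
interface PrimitiveSchema<C, D> {
    readonly codec: Codec<C, D>;
    // Returning PrimitiveSchema<C, D2> is what lets inference follow the new domain type.
    transform<D2>(codec: Codec<C, D2>): PrimitiveSchema<C, D2>;
}

function primitive<C, D>(codec: Codec<C, D>): PrimitiveSchema<C, D> {
    return {
        codec,
        transform: <D2>(next: Codec<C, D2>) => primitive(next),
    };
}

const identity = <T>(): Codec<T, T> => ({ decode: (v) => v, encode: (v) => v });

// Loosely analogous to schema.String() before any transform is applied.
const stringSchema = (): PrimitiveSchema<string, string> => primitive(identity<string>());

// Extract the domain type from a schema node.
type InferDomain<S> = S extends PrimitiveSchema<any, infer D> ? D : never;

const dueDateSchema = stringSchema().transform({
    decode: (s) => new Date(s),
    encode: (d: Date) => d.toISOString(),
});

type DueDate = InferDomain<typeof dueDateSchema>; // Date, not string
const parsed: DueDate = dueDateSchema.codec.decode("2025-01-19T00:00:00.000Z");
```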

Why This Matters

The core benefit: it allows users of Loro to model their application state with rich domain types rather than JSON primitives.

| Aspect | Primitives | Domain Types |
| --- | --- | --- |
| Business logic | Must parse/validate repeatedly | Methods available (date.getDay()) |
| Type safety | string could be anything | TypeScript and LLMs know it's a Date |
| Refactoring | Did you update all conversion sites? | One transform definition |
| setState | Manual encode required | Domain types accepted |
| getState | Manual decode required | Domain types returned |

Why Not Just Use Selectors?

You might ask: "Why not just use React/Jotai selectors to transform on read rather than complicating Loro Mirror?"

Selectors only work one direction:

// Selector pattern
const dateSelector = (state) => new Date(state.createdAt); // Read works
setState({ createdAt: new Date().toISOString() }); // Write still needs manual conversion

// Transform pattern
mirror.getState().createdAt; // Returns Date
mirror.setState({ createdAt: new Date() }); // Accepts Date

Selectors require caching at every level to avoid recomputation, and the cache-invalidation burden grows with the depth of your state tree.

Transforms decode once during event application. getState() returns cached domain objects instantly — no memoization needed.
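A rough sketch of the "decode once, read many" idea, assuming a hypothetical `DecodedField` holder (this is not loro-mirror's internal structure): the decode cost is paid when a change is applied, and reads simply return the cached domain object, so repeated reads yield the same reference.

```typescript
interface Codec<C, D> {
    decode: (value: C) => D;
    encode: (value: D) => C;
}

// Hypothetical holder for a single transformed field.
class DecodedField<C, D> {
    private readonly codec: Codec<C, D>;
    private current: D | undefined = undefined;
    decodeCalls = 0;

    constructor(codec: Codec<C, D>) {
        this.codec = codec;
    }

    // Called when a Loro event delivers a new primitive for this field.
    applyEvent(primitive: C): void {
        this.current = this.codec.decode(primitive);
        this.decodeCalls += 1;
    }

    // Reads never decode; they return the cached domain object.
    get(): D | undefined {
        return this.current;
    }
}

const dueDate = new DecodedField<string, Date>({
    decode: (s) => new Date(s),
    encode: (d) => d.toISOString(),
});

dueDate.applyEvent("2025-01-19T00:00:00.000Z");
const first = dueDate.get();
const second = dueDate.get();
console.log(first === second, dueDate.decodeCalls); // true 1
```

Returning a stable reference is also what makes the default reference-equality strategy cheap for unchanged values.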

Quick Examples

Date Transform

const dateTransform = {
    decode: (s: string) => new Date(s),
    encode: (d: Date) => d.toISOString(),
};

schema.String().transform(dateTransform);

Optional Fields

Optional fields work directly with transforms — no wrapper needed:

schema.String({ required: false }).transform(dateTransform);
// Type infers as Date | undefined

When a field is undefined, the transform is bypassed entirely. The undefined passes through as-is.
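The bypass rule can be pictured as a small wrapper. `withNullishBypass` is a hypothetical helper written only for illustration; it shows why `decode` and `encode` can be typed as non-nullish (the `& {}` constraint) while the field itself remains optional.

```typescript
// Hypothetical helper: wrap a converter that only handles non-nullish values so that
// null and undefined are returned unchanged and the converter is never called with them.
function withNullishBypass<A extends {}, B extends {}>(
    convert: (value: A) => B,
): (value: A | null | undefined) => B | null | undefined {
    return (value) => {
        if (value === null || value === undefined) return value;
        return convert(value);
    };
}

const decodeDate = withNullishBypass((s: string) => new Date(s));
const encodeDate = withNullishBypass((d: Date) => d.toISOString());

decodeDate(undefined); // undefined, decode is not called
encodeDate(null); // null, encode is not called
decodeDate("2025-01-19T00:00:00.000Z"); // a Date
```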

List with Transforms

schema.LoroList(
    schema.String().transform(dateTransform),
    (e) => e.getTime().toString()
);

Equality Strategies

The transform config has an optional isEqual property that controls how Loro Mirror detects whether a transformed value has changed during setState(). This determines when CRDT operations are emitted.

"reference-equality" (default)

Different reference = different value. Fast O(1) check.

const transform = {
    decode: (s: string) => new Date(s),
    encode: (d: Date) => d.toISOString(),
    isEqual: "reference-equality" as const, // "as const" stops the literal widening to string, which EqualityStrategy would reject
};

const date = new Date("2025-01-19");
mirror.setState({ task: { dueDate: date } });
mirror.setState({ task: { dueDate: date } }); // Same ref → no CRDT op
mirror.setState({ task: { dueDate: new Date("2025-01-19") } }); // New ref → CRDT op (even though same value)

Best for: Most cases. As long as a domain type is immutable or ImmerJS compatible, this will just work. However, it will emit unnecessary CRDT ops if a reference changes but the encoded value is the same (for example, two separate Date objects at the same instant).

"encoded-value-equality"

Encodes both values and compares the primitives. Slower but avoids spurious updates.

const transform = {
    decode: (s: string) => new Date(s),
    encode: (d: Date) => d.toISOString(),
    isEqual: "encoded-value-equality" as const,
};

mirror.setState({ task: { dueDate: new Date("2025-01-19") } });
mirror.setState({ task: { dueDate: new Date("2025-01-19") } }); // Same encoded value → no CRDT op

Best for: Cases where the reference might change regularly but the value remains the same. Recommended if the encode function is fast, to minimise CRDT ops emitted.

"deep-equality"

Performs deep recursive comparison of domain values.

const transform = {
    decode: (s: string) => new Date(s),
    encode: (d: Date) => d.toISOString(),
    isEqual: "deep-equality" as const,
};

Best for: Complex objects where you want structural equality without encoding overhead. This is rarely needed, because Immer detects changes deep within an object hierarchy and gives the root a new reference.

Custom function

Full control over comparison logic.

const transform = {
    decode: (s: string) => new Date(s),
    encode: (d: Date) => d.toISOString(),
    isEqual: (a: Date, b: Date) => a.getTime() === b.getTime(),
};

Best for: When you can compare faster than encoding (e.g., comparing timestamps is faster than toISOString()), or when you need domain-specific equality (e.g., comparing only an id field on a complex object).
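To make the four strategies concrete, here is a hedged sketch of how a comparison helper might dispatch on `isEqual`. The PR names a `valuesEqual()` helper in `utils.ts`, but this is not its code; the `deepEqual` below is a deliberately small stand-in covering plain objects, arrays and `Date` only.

```typescript
type EqualityStrategy<D> =
    | "reference-equality"
    | "encoded-value-equality"
    | "deep-equality"
    | ((a: D, b: D) => boolean);

// Small stand-in for deep equality: Dates by timestamp, arrays and plain objects recursively.
// A walk over own enumerable keys alone would treat any two Dates as equal, hence the special case.
function deepEqual(a: unknown, b: unknown): boolean {
    if (Object.is(a, b)) return true;
    if (a instanceof Date || b instanceof Date) {
        return a instanceof Date && b instanceof Date && a.getTime() === b.getTime();
    }
    if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) return false;
    if (Array.isArray(a) !== Array.isArray(b)) return false;
    const aRec = a as Record<string, unknown>;
    const bRec = b as Record<string, unknown>;
    const aKeys = Object.keys(aRec);
    if (aKeys.length !== Object.keys(bRec).length) return false;
    return aKeys.every((k) => Object.prototype.hasOwnProperty.call(bRec, k) && deepEqual(aRec[k], bRec[k]));
}

// Hypothetical dispatcher; `encode` is only required for "encoded-value-equality".
function valuesEqual<C, D>(
    a: D,
    b: D,
    strategy: EqualityStrategy<D> = "reference-equality",
    encode?: (value: D) => C,
): boolean {
    // Nullish values bypass transforms, so compare them directly.
    if (a === null || a === undefined || b === null || b === undefined) return a === b;
    if (typeof strategy === "function") return strategy(a, b);
    if (strategy === "reference-equality") return Object.is(a, b);
    if (strategy === "encoded-value-equality") {
        if (!encode) throw new Error("encoded-value-equality requires an encode function");
        return Object.is(encode(a), encode(b));
    }
    return deepEqual(a, b); // "deep-equality"
}

const encode = (d: Date) => d.toISOString();
const x = new Date("2025-01-19");
const y = new Date("2025-01-19");

valuesEqual(x, y); // false: two different Date objects
valuesEqual(x, y, "encoded-value-equality", encode); // true
valuesEqual(x, y, "deep-equality"); // true
valuesEqual(x, y, (a, b) => a.getTime() === b.getTime()); // true
```

The trade-off mirrors the prose: reference equality is a single comparison, encoded-value equality pays for two `encode` calls, and a custom function can be cheaper than either when a fast key (such as a timestamp) exists.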

Validation

The behaviour of validation is otherwise unchanged: validation runs on the mirror state rather than on the CRDT document. However, because the mirror state may now contain domain types, it is no longer necessarily a JSON object. The validation functions validateUpdates and validateSchema therefore validate domain values rather than the JSON that will be written to the CRDT. By default, domain values are not encoded to check that they produce the correct JSON type; each transform can opt into this check by setting validateEncodedType to true.

Additionally, transforms can define their own validate function that receives the domain value and returns true for valid or an error message string for invalid.
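A hedged sketch of how a validator could combine a transform's own `validate` hook with the opt-in `validateEncodedType` check. `TransformLike`, `validateTransformed` and the result shape are hypothetical; the real logic lives in `validators.ts` and may differ, for example in the order of the two checks.

```typescript
// Hypothetical subset of a transform definition that matters for validation.
interface TransformLike<D> {
    encode: (value: D) => unknown;
    validate?: (value: D) => boolean | string;
    validateEncodedType?: boolean;
}

type ValidationResult = { valid: true } | { valid: false; error: string };

function validateTransformed<D>(
    value: D,
    expectedType: "string" | "number" | "boolean",
    t: TransformLike<D>,
): ValidationResult {
    // 1. The transform's own check on the domain value: true, false, or an error message.
    if (t.validate) {
        const verdict = t.validate(value);
        if (typeof verdict === "string") return { valid: false, error: verdict };
        if (!verdict) return { valid: false, error: "transform validation failed" };
    }
    // 2. Opt-in: encode and confirm the primitive type Loro would store.
    if (t.validateEncodedType) {
        const encoded = t.encode(value);
        if (typeof encoded !== expectedType) {
            return { valid: false, error: `encode() returned ${typeof encoded}, expected ${expectedType}` };
        }
    }
    return { valid: true };
}

const dateTransform: TransformLike<Date> = {
    encode: (d) => d.toISOString(),
    // Reject invalid dates before encode runs (toISOString throws a RangeError on them).
    validate: (d) => !Number.isNaN(d.getTime()) || "invalid date",
    validateEncodedType: true,
};

validateTransformed(new Date("2025-01-19"), "string", dateTransform); // { valid: true }
validateTransformed(new Date("not a date"), "string", dateTransform); // { valid: false, error: "invalid date" }
```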

Implementation Details

How it works:

  • setState() → encodes domain values that have changed to primitives and writes to CRDT
  • Loro op application → decodes primitives to domain values inside Immer produce() (atomic)
  • getState() → returns cached state with domain objects (no decoding at read time)
  • null/undefined → bypass transforms entirely, pass through as-is
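The data flow in these bullets can be sketched as two small functions over a flat record of fields. Everything here (`FieldSpec`, `encodeChanged`, `decodeIncoming`) is hypothetical and ignores containers, Immer and the Loro API entirely; it only mirrors the described behaviour: encode changed, non-nullish values on the way into the CRDT, decode primitives on the way out, and let null/undefined pass through untouched.

```typescript
interface FieldSpec<C, D> {
    decode: (value: C) => D;
    encode: (value: D) => C;
    isEqual?: (a: D, b: D) => boolean; // defaults to reference equality in this sketch
}

type Specs = Record<string, FieldSpec<any, any>>;
type Primitive = string | number | boolean | null | undefined;

// setState direction: compare old and new domain values and encode only what changed.
function encodeChanged(
    specs: Specs,
    prev: Record<string, unknown>,
    next: Record<string, unknown>,
): Record<string, Primitive> {
    const ops: Record<string, Primitive> = {};
    for (const [key, spec] of Object.entries(specs)) {
        const a = prev[key];
        const b = next[key];
        const same = a == null || b == null ? a === b : (spec.isEqual ?? Object.is)(a, b);
        if (same) continue;
        ops[key] = b == null ? (b as null | undefined) : spec.encode(b);
    }
    return ops;
}

// Event direction: decode primitives arriving from the CRDT into domain values.
function decodeIncoming(specs: Specs, primitives: Record<string, Primitive>): Record<string, unknown> {
    const out: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(primitives)) {
        const spec = specs[key];
        out[key] = value == null || !spec ? value : spec.decode(value);
    }
    return out;
}

const specs: Specs = {
    dueDate: { decode: (s: string) => new Date(s), encode: (d: Date) => d.toISOString() },
};
const d = new Date("2025-01-19T00:00:00.000Z");

encodeChanged(specs, { dueDate: d }, { dueDate: d }); // {} (same reference, nothing to write)
encodeChanged(specs, { dueDate: d }, { dueDate: new Date(d.getTime()) }); // { dueDate: "2025-01-19T00:00:00.000Z" }
```

Only the primitive output of `encodeChanged` would ever be written to the document, which is why peers can sync without knowing each other's transforms.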

Peer sync: Only CRDT primitives travel over the network. Each peer decodes based on its schema.

One Area of Uncertainty

The one area of the code base I didn't fully understand was the tree handling in the containerToMirrorState function in mirror.ts. It seems that converting trees to mirror state tries to avoid multiple WASM round trips by lazily converting tree nodes to JSON, normalizing their shape, and converting ids to cids on the tree nodes only. It appears not to transform the node.data map containers or any nested children they may have. I'm assuming this is a performance optimization, and that setState is intended to restore the cids of nested data later if they change. I needed to update this logic to recursively decode any primitive values nested inside the tree (implemented as decodeNestedJsonValues, which is called from containerToMirrorState).
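The recursive decoding described above might look roughly like the following. `SchemaNode` is a hypothetical, heavily simplified stand-in for loro-mirror's schema types, and `decodeNested` is not the PR's `decodeNestedJsonValues`; it only illustrates walking a plain JSON value alongside its schema and decoding transformed leaves at any depth.

```typescript
type Json = string | number | boolean | null | undefined | Json[] | { [k: string]: Json };

// Hypothetical simplified schema: leaves may carry a decode function.
type SchemaNode =
    | { kind: "leaf"; decode?: (value: string | number | boolean) => unknown }
    | { kind: "map"; fields: Record<string, SchemaNode> }
    | { kind: "list"; item: SchemaNode };

function decodeNested(value: Json, node: SchemaNode): unknown {
    if (value === null || value === undefined) return value; // nullish bypasses transforms
    switch (node.kind) {
        case "leaf":
            // Only primitives are decoded; anything else is returned unchanged.
            return node.decode && typeof value !== "object" ? node.decode(value) : value;
        case "map": {
            if (typeof value !== "object" || Array.isArray(value)) return value;
            const out: Record<string, unknown> = {};
            for (const [k, v] of Object.entries(value)) {
                const child = node.fields[k];
                out[k] = child ? decodeNested(v, child) : v; // unknown keys pass through
            }
            return out;
        }
        case "list":
            return Array.isArray(value) ? value.map((v) => decodeNested(v, node.item)) : value;
    }
}

// Hypothetical node.data schema for a tree node.
const nodeData: SchemaNode = {
    kind: "map",
    fields: {
        title: { kind: "leaf" },
        dueDate: { kind: "leaf", decode: (s) => new Date(String(s)) },
        reminders: { kind: "list", item: { kind: "leaf", decode: (n) => new Date(Number(n)) } },
    },
};

const decoded = decodeNested(
    { title: "Review PR", dueDate: "2025-01-19T00:00:00.000Z", reminders: [0] },
    nodeData,
) as { title: string; dueDate: Date; reminders: Date[] };
```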

Note also that I've renamed containerToJson and the associated types and functions to containerToMirrorState as they are no longer strictly JSON (since they can be rich domain types like dates or BigInts).

I don't understand why the following is duplicated in loroEventApply.ts and mirror.ts, but I updated both to use the term MirrorState rather than JSON:

type JSONPrimitive = string | number | boolean | null | undefined;
type JSONValue = JSONPrimitive | JSONObject | JSONValue[];
interface JSONObject {
    [k: string]: JSONValue;
}

Test Coverage

~3,560 lines across 9 test files covering:

  • Composition (nested maps, lists with idSelector, movable lists)
  • Edge cases (null/undefined bypass, empty collections, BigInt, class instances)
  • All four equality strategies
  • Error handling and propagation
  • Round-trip sync between peers
  • All diff functions
  • Compile-time type inference

Breaking Changes

None. Purely additive. Existing schemas without transforms work unchanged.

Known Limitations

  • Transforms only on primitive schemas (String, Number, Boolean), not containers
  • Transforms cannot return null or undefined (enforced by & {} type constraint)
  • Decode exceptions fail the entire state update immediately. You could imagine building up errors and emitting them as a group.
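The alternative mentioned in the last bullet, collecting decode failures instead of failing on the first, could look something like this hedged sketch. `DecodeErrors` and `decodeAll` are hypothetical names; a custom error class is used instead of `AggregateError` only to avoid depending on an ES2021 lib setting.

```typescript
interface DecodeFailure {
    path: string;
    cause: unknown;
}

// Hypothetical error that carries every failure from one state update.
class DecodeErrors extends Error {
    readonly failures: DecodeFailure[];
    constructor(failures: DecodeFailure[]) {
        super(`${failures.length} value(s) failed to decode: ${failures.map((f) => f.path).join(", ")}`);
        this.name = "DecodeErrors";
        this.failures = failures;
    }
}

// Decode every entry, remember each failure, and throw once at the end.
function decodeAll<C, D>(entries: Record<string, C>, decode: (value: C) => D): Record<string, D> {
    const out: Record<string, D> = {};
    const failures: DecodeFailure[] = [];
    for (const [path, value] of Object.entries(entries)) {
        try {
            out[path] = decode(value);
        } catch (cause) {
            failures.push({ path, cause });
        }
    }
    if (failures.length > 0) throw new DecodeErrors(failures);
    return out;
}

// A decode that throws on malformed input, as a strict date transform might.
const strictDate = (s: string): Date => {
    const d = new Date(s);
    if (Number.isNaN(d.getTime())) throw new RangeError(`invalid date: ${s}`);
    return d;
};

try {
    decodeAll({ a: "2025-01-19", b: "nope", c: "also nope" }, strictDate);
} catch (e) {
    // Reports both "b" and "c" rather than stopping at "b".
    if (e instanceof DecodeErrors) console.log(e.failures.map((f) => f.path));
}
```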

Feedback Welcome

This is complete and tested, but I'm open to:

  • Design changes based on your feedback
  • Breaking this into smaller PRs
  • A "not interested" if it doesn't align with your vision

Thanks for considering this. I've found it valuable in my own work and thought it might benefit other loro-mirror users.

Further Documentation

For more detail:

  • packages/core/src/schema/index.ts - .transform() builder
  • packages/core/src/schema/types.ts - TransformDefinition, EqualityStrategy
  • packages/core/src/core/utils.ts - applyEncode(), applyDecode(), valuesEqual()
  • packages/core/src/core/mirror.ts - Integration in setState/getState
  • packages/core/src/core/loroEventApply.ts - Decode during event application
  • packages/core/src/core/diff.ts - valuesEqual() in all 5 diff functions

Copilot AI review requested due to automatic review settings February 14, 2026 06:57

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 27b00652df


Comment thread packages/core/src/core/utils.ts Outdated
Comment thread packages/core/src/core/mirror.ts

Copilot AI left a comment


Pull request overview

This PR adds a comprehensive Transform feature to loro-mirror that enables bidirectional conversion between CRDT leaf primitives (string, number, boolean) and rich domain types (Date, BigInt, custom objects, etc.). This allows developers to work with domain types throughout their application while Loro stores JSON-serializable primitives.

Changes:

  • Added .transform() method to schema.String(), schema.Number(), and schema.Boolean() primitives
  • Implemented encoding (domain → CRDT) during setState() and diff operations
  • Implemented decoding (CRDT → domain) during Loro event application and snapshot initialization
  • Added configurable equality strategies (reference-equality, encoded-value-equality, deep-equality, custom function) for change detection
  • Added comprehensive test coverage (~3,560 lines across 9 test files) covering composition, edge cases, equality strategies, error handling, and peer sync

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| packages/core/src/schema/index.ts | Adds .transform() builder methods to String, Number, and Boolean schema types |
| packages/core/src/schema/types.ts | Defines TransformDefinition and EqualityStrategy types; updates type inference to return domain types |
| packages/core/src/schema/validators.ts | Updates validation to work with domain types; adds transform-specific validation |
| packages/core/src/core/utils.ts | Implements applyEncode(), applyDecode(), valuesEqual(), and decodeNestedJsonValues() helper functions |
| packages/core/src/core/mirror.ts | Integrates encoding in setState() and decoding in snapshot building; renames JSON types to MirrorState |
| packages/core/src/core/loroEventApply.ts | Applies decoding during Loro event processing to transform primitives to domain types |
| packages/core/src/core/diff.ts | Uses valuesEqual() for change detection and applyEncode() for primitive diffs |
| packages/core/tests/*.test.ts | Adds 9 comprehensive test files covering roundtrip, validation, types, equality, optional fields, error handling, edge cases, and diff operations |
| README.md | Documents the transform feature with examples and API reference |


Comment thread packages/core/src/core/utils.ts Outdated
Comment thread README.md
@bentefay bentefay force-pushed the add-transform branch 2 times, most recently from 074c207 to 84b0a66 on February 15, 2026 09:45
Enable conversion between CRDT primitives (strings, numbers, booleans)
and application domain types (Date, Temporal, BigInt, custom objects).

- Add TransformDefinition interface with decode/encode functions
- Add EqualityStrategy for configurable diff behavior
- Integrate transforms into diff, event application, and Mirror
- Add comprehensive test suite (9 test files, ~2,651 lines)
@zxch3n zxch3n merged commit 2045d5f into loro-dev:main Mar 24, 2026
1 check passed
@github-actions github-actions Bot mentioned this pull request Mar 24, 2026

3 participants