AmendDiff

Semantic amendment diff for credit agreements. When Amendment No. N to a credit agreement lands, AmendDiff answers: which defined terms changed, what do they cascade into, and which sections of the base agreement are affected — as a clause-anchored Amendment Impact Memo, with an explicit list of everything the extraction was unsure about.

This is the wedge product for the universal-semantic-container (.usc) thesis: every deal accumulates in one SQLite file holding the original documents verbatim, attributed assertions about them, and an append-only event log. The format is exhaust; the memo is the product.

Quickstart

# 1. Pull real agreement+amendment pairs from SEC EDGAR (no deps, uses curl)
python3 scripts/edgar_pull.py --max-pairs 1

# 2. Ingest the downloaded pair into a deal file (which company you get
#    depends on EDGAR search order — check the manifest for roles).
#    Selecting via manifest.json skips any dir from an interrupted run.
pair=$(dirname "$(ls data/*/manifest.json | head -1)")
mkdir -p deals
python3 -m amenddiff ingest deals/demo.usc "$pair"/base--* "$pair"/amendment--*

# 3. Generate the impact memo ('base--'/'amendment--' are filename substrings)
#    --html adds a self-contained report where every claim links to the
#    highlighted evidence span inside the embedded source documents
python3 -m amenddiff memo deals/demo.usc --base base-- --amendment amendment-- \
    -o deals/demo.memo.md --html deals/demo.html

# 4. Integrity-check the deal file (hashes, anchors, event log)
python3 -m amenddiff audit deals/demo.usc

# 5. Inspect the deal file (assets, assertion counts, event log)
python3 -m amenddiff show deals/demo.usc

Or install it: pip install -e . provides the amenddiff command; use pip install -e ".[llm]" to also install what the Claude pass needs. Run the tests with python3 -m unittest discover -s tests.

Try it in the browser

pip install -e ".[web]"
amenddiff-web          # → http://localhost:8000

One page: three bundled sample deals (real SEC EDGAR filings — one click, no documents needed) plus a drop-your-own-pair form. Uploads are processed in memory per request and never stored; 8 MB/file cap; PDFs politely rejected (EDGAR .htm exhibits and plain text work as-is).

Publish it anywhere that runs a container:

docker build -t amenddiff . && docker run -p 8000:8000 amenddiff

Fly.io: fly launch (Dockerfile detected automatically)
Render / Railway: new web service from this repo, Docker runtime
Any VPS: docker run -d -p 80:8000 amenddiff behind your proxy

The core has zero dependencies beyond Python 3.11+. The optional LLM pass (memo --llm) needs the anthropic and pydantic packages (pip install anthropic pydantic) and the ANTHROPIC_API_KEY environment variable; it layers Claude Opus 4.8 structured extraction on top of the regex core, recorded as separately attributed assertions.

What the memo contains

Changed definitions — added / restated / amended / deleted, each with a quoted snippet and confidence.
Sections amended directly — "Section 6.8(b) is hereby amended…" detections.
Cascade — BFS over the defined-term dependency graph: change "Consolidated Operational Restructuring Costs" and the memo shows "Adjusted EBITDA" (depth 1) and "Interest Coverage Ratio" (depth 2) shifting with it.
Affected sections — every section of the base agreement that uses an affected term.
Uncertainty — what the extractor could NOT classify or only saw as a cross-reference. This is deliberate: the tool's job is to make counsel review faster and targeted, not to pretend certainty.
Warnings — e.g. when the amendment amends a different agreement than the ingested base (a real EDGAR pairing failure mode), the memo says so instead of analyzing silently against the wrong document.
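The cascade step described above is a breadth-first search, which can be sketched in a few lines. This is a minimal illustration, not the code in analysis.py: the function name `cascade`, the shape of the `dependents` mapping, and the toy edges are assumptions chosen to mirror the example terms in the list.

```python
from collections import deque


def cascade(dependents: dict[str, set[str]], changed: set[str]) -> dict[str, int]:
    """Return {term: depth} for every term reachable from the changed terms.

    `dependents` maps a defined term to the terms whose definitions use it.
    Depth 0 is a directly changed term, depth 1 uses a changed term, and so on.
    """
    depth = {term: 0 for term in changed}
    queue = deque(sorted(changed))
    while queue:
        term = queue.popleft()
        for user in sorted(dependents.get(term, ())):
            if user not in depth:  # in BFS the first visit is the shortest path
                depth[user] = depth[term] + 1
                queue.append(user)
    return depth


# Toy graph (illustrative edges only, not taken from a real agreement).
dependents = {
    "Consolidated Operational Restructuring Costs": {"Adjusted EBITDA"},
    "Adjusted EBITDA": {"Interest Coverage Ratio"},
}
result = cascade(dependents, {"Consolidated Operational Restructuring Costs"})
for term, d in sorted(result.items(), key=lambda item: item[1]):
    print(d, term)
```

The `user not in depth` check also makes the walk safe on circular definitions, since each term is enqueued at most once.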

The .usc deal store

One SQLite file per deal, three tables, three rules:

| Table | Rule |
|---|---|
| `assets` | Original bytes verbatim, content-hashed. The only source of truth. Derived text and memos are assets too, linked via `parent_id`. |
| `assertions` | Every extracted fact carries its source (`regex-v0.1`, `llm:claude-opus-4-8`, `human:<name>`), a confidence, and a character anchor into the derived text. Re-extraction adds assertions; it never overwrites. |
| `events` | Append-only log of every ingest, extraction, and memo generation. |

Design stance (from the red-team review of the .usc concept): the graph is a cache of attributed opinions about the originals, never the truth itself. That is what makes the file auditable.
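To make the three rules concrete, here is a rough sketch of such a store using Python's built-in sqlite3. The column names, the helper functions, and the sample text are all hypothetical (the real schema lives in store.py and is not shown in this README); only the table names, `parent_id`, and the source labels come from the description above. The audit here checks content hashes only.

```python
import hashlib
import sqlite3

# Hypothetical schema: three tables, matching the rules described above.
SCHEMA = """
CREATE TABLE assets (
    id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES assets(id),
    name TEXT NOT NULL, sha256 TEXT NOT NULL, content BLOB NOT NULL);
CREATE TABLE assertions (
    id INTEGER PRIMARY KEY, asset_id INTEGER NOT NULL REFERENCES assets(id),
    source TEXT NOT NULL, confidence REAL NOT NULL,
    anchor_start INTEGER NOT NULL, anchor_end INTEGER NOT NULL, body TEXT NOT NULL);
CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, detail TEXT NOT NULL);
"""


def add_asset(db, name, content, parent_id=None):
    """Store bytes verbatim with their SHA-256 and log the ingest."""
    digest = hashlib.sha256(content).hexdigest()
    cur = db.execute(
        "INSERT INTO assets (parent_id, name, sha256, content) VALUES (?, ?, ?, ?)",
        (parent_id, name, digest, content))
    db.execute("INSERT INTO events (kind, detail) VALUES ('ingest', ?)", (name,))
    return cur.lastrowid


def add_assertion(db, asset_id, source, confidence, start, end, body):
    """Always INSERT, never UPDATE: re-extraction adds rows alongside old ones."""
    db.execute(
        "INSERT INTO assertions (asset_id, source, confidence, anchor_start,"
        " anchor_end, body) VALUES (?, ?, ?, ?, ?, ?)",
        (asset_id, source, confidence, start, end, body))
    db.execute("INSERT INTO events (kind, detail) VALUES ('extract', ?)", (source,))


def audit(db):
    """Return names of assets whose stored bytes no longer match their hash."""
    rows = db.execute("SELECT name, sha256, content FROM assets")
    return [n for n, sha, content in rows if hashlib.sha256(content).hexdigest() != sha]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
derived = b'"Adjusted EBITDA" means net income plus interest.'  # made-up text
raw = add_asset(db, "base.htm", b"<p>" + derived + b"</p>")
text = add_asset(db, "base.txt", derived, parent_id=raw)
# Two opinions about the same span, kept side by side with their sources.
add_assertion(db, text, "regex-v0.1", 0.9, 1, 16, "defined_term")
add_assertion(db, text, "human:reviewer", 1.0, 1, 16, "defined_term")
print(audit(db))
```

Because assertions only ever accumulate, disagreement between the regex pass, the LLM pass, and a human reviewer stays visible in the file instead of being resolved silently.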

Layout

scripts/edgar_pull.py    EDGAR full-text search + exhibit downloader
amenddiff/
  htmltext.py            HTML/plain-text exhibit -> derived text
  extract.py             defined terms, dependency graph, sections (regex core)
  amendment.py           amendment operation detection
  analysis.py            the full base-vs-amendment analysis (one computation)
  store.py               the .usc SQLite deal store + integrity audit
  memo.py                impact memo, markdown rendering
  report.py              impact memo, self-contained HTML with anchored evidence
  llm.py                 optional Claude Opus 4.8 structured-extraction pass
  cli.py                 ingest / memo / audit / show
tests/                   unit + end-to-end suite (synthetic fixtures)
docs/discovery/          call script, outreach templates, target-list playbook
data/                    downloaded EDGAR exhibits (gitignored)
deals/                   .usc deal files + memos (gitignored)

Known limits (v0.1, by design)

Regex extraction misses definitions that don't follow the "Term" means / "Term" … refers to idioms; those surface in the Uncertainty section rather than silently disappearing.
Section detection requires Section N.NN <Capitalized Title> headings at line start (TOC entries and cross-references are filtered out). Agreements with bare 2.01 Heading numbering get zero sections (honest) rather than fabricated ones, and unusual heading formats may drop individual sections — the "Affected sections" table errs toward omission, never invention.
Amendment pairing from EDGAR is heuristic — confirm via manifest.json. An amendment may amend a different agreement than the one paired as base (e.g. it amends the Credit Agreement while the base on file is the Guarantee and Collateral Agreement); cascade analysis only covers the ingested base.
The cascade is definition-graph only; it does not yet parse covenant tables (pricing grids, ratio schedules).
Amendments that modify a clause inside a definition without the phrase definition of "X" can be missed — the Uncertainty section is there because the memo assists counsel review; it does not replace it.
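The first two limits follow directly from how pattern-based detection works. The sketch below uses simplified, hypothetical patterns (the real ones in extract.py are not shown in this README) to show why a bare 2.01 heading or an inline cross-reference yields no section, and which definition idioms get picked up. It does not attempt the TOC filtering mentioned above.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# Definition idioms: "Term" means ...  /  "Term" ... refers to ...
DEFINITION = re.compile(
    r'[“"](?P<term>[^“”"\n]+)[”"](?:\s+means\b|[^.\n]{0,60}?\brefers\s+to\b)'
)
# Headings: "Section N.NN <Capitalized Title>" at the start of a line.
SECTION = re.compile(
    r"^Section\s+(?P<num>\d+\.\d+)\s+(?P<title>[A-Z][^\n]*)$", re.MULTILINE
)

sample = '''Section 1.01 Defined Terms
"Adjusted EBITDA" means, for any period, Consolidated Net Income plus interest.
"Agent" as used herein refers to the Administrative Agent.
as set forth in Section 6.08 of the Credit Agreement
2.01 Commitments
Section 6.08 Financial Covenants
'''

terms = [m.group("term") for m in DEFINITION.finditer(sample)]
sections = [(m.group("num"), m.group("title")) for m in SECTION.finditer(sample)]
print(terms)
print(sections)
```

On this sample the cross-reference line and the bare 2.01 Commitments heading are skipped rather than guessed at, which is the omission-over-invention trade-off described in the list above.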
