Semantic amendment diff for credit agreements. When Amendment No. N to a credit agreement lands, AmendDiff answers: which defined terms changed, what do they cascade into, and which sections of the base agreement are affected — as a clause-anchored Amendment Impact Memo, with an explicit list of everything the extraction was unsure about.
This is the wedge product for the universal-semantic-container (.usc) thesis: every deal accumulates in one SQLite file holding the original documents verbatim, attributed assertions about them, and an append-only event log. The format is exhaust; the memo is the product.
# 1. Pull real agreement+amendment pairs from SEC EDGAR (no deps, uses curl)
python3 scripts/edgar_pull.py --max-pairs 1
# 2. Ingest the downloaded pair into a deal file (which company you get
# depends on EDGAR search order — check the manifest for roles).
# Selecting via manifest.json skips any dir from an interrupted run.
pair=$(dirname "$(ls data/*/manifest.json | head -1)")
mkdir -p deals
python3 -m amenddiff ingest deals/demo.usc "$pair"/base--* "$pair"/amendment--*
# 3. Generate the impact memo ('base--'/'amendment--' are filename substrings)
# --html adds a self-contained report where every claim links to the
# highlighted evidence span inside the embedded source documents
python3 -m amenddiff memo deals/demo.usc --base base-- --amendment amendment-- \
-o deals/demo.memo.md --html deals/demo.html
# 4. Integrity-check the deal file (hashes, anchors, event log)
python3 -m amenddiff audit deals/demo.usc
# 5. Inspect the deal file (assets, assertion counts, event log)
python3 -m amenddiff show deals/demo.usc
Or install it: `pip install -e .` gives you the `amenddiff` command
(`pip install -e ".[llm]"` to include the Claude pass). Run the tests with
`python3 -m unittest discover -s tests`.
pip install -e ".[web]"
amenddiff-web   # → http://localhost:8000
One page: three bundled sample deals (real SEC EDGAR filings, one click,
no documents needed) plus a drop-your-own-pair form. Uploads are processed
in memory per request and never stored; each file is capped at 8 MB; PDFs are
politely rejected (EDGAR .htm exhibits and plain text work as-is).
Publish it anywhere that runs a container:
docker build -t amenddiff . && docker run -p 8000:8000 amenddiff
- Fly.io: `fly launch` (Dockerfile detected automatically)
- Render / Railway: new web service from this repo, Docker runtime
- Any VPS: `docker run -d -p 80:8000 amenddiff` behind your proxy
The core has no dependencies beyond the Python 3.11+ standard library. The optional LLM pass (`memo --llm`) needs `pip install anthropic pydantic` and an `ANTHROPIC_API_KEY` environment variable; it layers Claude Opus 4.8 structured extraction on top of the regex core, stored as separately attributed assertions.
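Keeping the two passes as separate assertions means neither overwrites the other, and disagreement between them becomes visible data. A minimal sketch of that idea, assuming a hypothetical flat `Assertion` record and `disagreements` helper; how amenddiff actually reconciles the passes is not described here:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    kind: str          # e.g. "definition_change" (illustrative)
    subject: str       # e.g. the defined term
    value: str         # e.g. "amended", "restated", "deleted"
    source: str        # "regex-v0.1", "llm:claude-opus-4-8", "human:<name>"
    confidence: float


def by_subject(assertions: list[Assertion]) -> dict[tuple[str, str], list[Assertion]]:
    """Group assertions about the same thing; nothing is merged away or overwritten."""
    grouped: dict[tuple[str, str], list[Assertion]] = defaultdict(list)
    for a in assertions:
        grouped[(a.kind, a.subject)].append(a)
    return grouped


def disagreements(assertions: list[Assertion]) -> list[tuple[str, str]]:
    """Subjects on which sources disagree: natural candidates for human review."""
    return sorted(
        key for key, group in by_subject(assertions).items()
        if len({a.value for a in group}) > 1
    )
```

Because grouping never discards a record, adding a `human:<name>` assertion later simply becomes one more voice in the group.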
- Changed definitions — added / restated / amended / deleted, each with a quoted snippet and confidence.
- Sections amended directly — "Section 6.8(b) is hereby amended…" detections.
- Cascade — BFS over the defined-term dependency graph: change "Consolidated Operational Restructuring Costs" and the memo shows "Adjusted EBITDA" (depth 1) and "Interest Coverage Ratio" (depth 2) shifting with it.
- Affected sections — every section of the base agreement that uses an affected term.
- Uncertainty — what the extractor could NOT classify or only saw as a cross-reference. This is deliberate: the tool's job is to make counsel review faster and targeted, not to pretend certainty.
- Warnings — e.g. when the amendment amends a different agreement than the ingested base (a real EDGAR pairing failure mode), the memo says so instead of analyzing silently against the wrong document.
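The cascade and the affected-sections step can be sketched as a breadth-first search plus a whole-word scan. Assumptions: the dependency graph arrives as a hypothetical `uses` mapping from each term to the terms whose definitions mention it, section text arrives as a dict of section id to body, and matching is a case-sensitive whole-word search. None of these names come from amenddiff itself.

```python
import re
from collections import deque


def cascade(changed: set[str], uses: dict[str, set[str]]) -> dict[str, int]:
    """Breadth-first search from the changed terms.

    uses[t] holds the terms whose definitions mention t, so edges point in the
    direction a change propagates. Returns each reached term with its depth
    (0 = changed directly).
    """
    depth = {term: 0 for term in changed}
    queue = deque(changed)
    while queue:
        term = queue.popleft()
        for dependent in sorted(uses.get(term, ())):  # sorted for deterministic output
            if dependent not in depth:  # first visit in BFS is the shortest path
                depth[dependent] = depth[term] + 1
                queue.append(dependent)
    return depth


def affected_sections(terms, sections: dict[str, str]) -> list[str]:
    """Section ids whose text uses any affected term as a whole word."""
    patterns = [re.compile(rf"\b{re.escape(t)}\b") for t in terms]
    return [sid for sid, body in sections.items() if any(p.search(body) for p in patterns)]


# Toy inputs mirroring the cascade example above (not real agreement text).
uses = {
    "Consolidated Operational Restructuring Costs": {"Adjusted EBITDA"},
    "Adjusted EBITDA": {"Interest Coverage Ratio"},
}
hit = cascade({"Consolidated Operational Restructuring Costs"}, uses)
print(hit)
# {'Consolidated Operational Restructuring Costs': 0, 'Adjusted EBITDA': 1, 'Interest Coverage Ratio': 2}
sections = {
    "7.11": "The Interest Coverage Ratio shall not be less than the required minimum.",
    "5.01": "The Borrower shall furnish annual financial statements.",
}
print(affected_sections(hit, sections))  # ['7.11']
```

The visited check also makes the search terminate on circular definitions, which do occur in real agreements.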
One SQLite file per deal, three tables, three rules:
| table | rule |
|---|---|
| assets | Original bytes verbatim, content-hashed. The only source of truth. Derived text and memos are assets too, linked via `parent_id`. |
| assertions | Every extracted fact carries its source (`regex-v0.1`, `llm:claude-opus-4-8`, `human:<name>`), a confidence, and a character anchor into the derived text. Re-extraction adds assertions; it never overwrites. |
| events | Append-only log of every ingest, extraction, and memo generation. |
Design stance (from the red-team review of the .usc concept): the graph is a cache of attributed opinions about the originals, never the truth itself. That is what makes the file auditable.
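Because every assertion points into content-hashed bytes, an integrity audit reduces to re-hashing assets and re-reading anchors. A sketch over in-memory records, assuming hypothetical `Asset`/`Assertion` shapes in which each assertion also stores the snippet it quoted (an assumption); event-log checks are omitted:

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Asset:
    id: int
    sha256: str      # hash recorded at ingest time
    content: bytes


@dataclass
class Assertion:
    asset_id: int    # the derived-text asset the anchor points into
    start_char: int
    end_char: int
    quote: str       # assumed: the snippet captured at extraction time


def audit(assets: dict[int, Asset], assertions: list[Assertion]) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    for asset in assets.values():
        if hashlib.sha256(asset.content).hexdigest() != asset.sha256:
            problems.append(f"asset {asset.id}: content hash mismatch")
    for i, a in enumerate(assertions):
        asset = assets.get(a.asset_id)
        if asset is None:
            problems.append(f"assertion {i}: unknown asset {a.asset_id}")
            continue
        text = asset.content.decode("utf-8", errors="replace")
        if not 0 <= a.start_char < a.end_char <= len(text):
            problems.append(f"assertion {i}: anchor out of range")
        elif text[a.start_char:a.end_char] != a.quote:
            problems.append(f"assertion {i}: anchor no longer matches its quote")
    return problems
```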
scripts/edgar_pull.py EDGAR full-text search + exhibit downloader
amenddiff/
htmltext.py HTML/plain-text exhibit -> derived text
extract.py defined terms, dependency graph, sections (regex core)
amendment.py amendment operation detection
analysis.py the full base-vs-amendment analysis (one computation)
store.py the .usc SQLite deal store + integrity audit
memo.py impact memo, markdown rendering
report.py impact memo, self-contained HTML with anchored evidence
llm.py optional Claude Opus 4.8 structured-extraction pass
cli.py ingest / memo / audit / show
tests/ unit + end-to-end suite (synthetic fixtures)
docs/discovery/ call script, outreach templates, target-list playbook
data/ downloaded EDGAR exhibits (gitignored)
deals/ .usc deal files + memos (gitignored)
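For orientation, the regex core's three jobs (defined terms, the dependency graph, and section headings) can be sketched in a few dozen lines. This is a hypothetical illustration, not the contents of `extract.py`: the patterns, the function names, and the rule that a definition's body runs until the next definition starts are all assumptions.

```python
import re

# "Term" means ... / "Term" refers to ... with straight or curly quotes.
# Simplified: the verb must follow the closing quote directly.
DEFINITION_RE = re.compile(r'["“](?P<term>[A-Z][^"”\n]{0,80})["”]\s+(?:means|refers\s+to)\b')

# "Section N.NN Capitalized Title" alone on a line. Lines that end in a page
# number (TOC entries) or continue in lowercase (cross-references) don't match.
_TITLE = r"[A-Z][\w'&-]*(?:[ \t]+(?:of|and|the|to|for|on|in|[A-Z][\w'&-]*))*"
SECTION_RE = re.compile(
    rf"^Section[ \t]+(?P<num>\d+\.\d+)[ \t]+(?P<title>{_TITLE})\.?[ \t]*$", re.MULTILINE
)


def extract_definitions(text: str) -> dict[str, str]:
    """Map each defined term to its body, taken as the text up to the next definition."""
    matches = list(DEFINITION_RE.finditer(text))
    ends = [m.start() for m in matches[1:]] + [len(text)]  # last body runs to end of text
    return {m["term"]: text[m.end():end].strip() for m, end in zip(matches, ends)}


def dependency_edges(defs: dict[str, str]) -> dict[str, set[str]]:
    """uses[t] = the terms whose definitions mention t as a whole word."""
    uses: dict[str, set[str]] = {}
    for term, body in defs.items():
        for other in defs:
            if other != term and re.search(rf"\b{re.escape(other)}\b", body):
                uses.setdefault(other, set()).add(term)
    return uses


def find_sections(text: str) -> list[tuple[str, str]]:
    """(number, title) for each heading line that matches SECTION_RE."""
    return [(m["num"], m["title"]) for m in SECTION_RE.finditer(text)]
```

Edges point from a term to the definitions that use it, which is the direction a change travels. The heading pattern rejects `Section 1.01 Defined Terms 1` (a TOC line) and `Section 6.08 hereof applies.` (a cross-reference).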
- Regex extraction misses definitions that don't follow the `"Term" means` / `"Term" … refers to` idioms; those surface in the Uncertainty section rather than silently disappearing.
- Section detection requires `Section N.NN <Capitalized Title>` headings at line start (TOC entries and cross-references are filtered out). Agreements with bare `2.01 Heading` numbering get zero sections (honest) rather than fabricated ones, and unusual heading formats may drop individual sections: the "Affected sections" table errs toward omission, never invention.
- Amendment pairing from EDGAR is heuristic; confirm via `manifest.json`. An amendment may amend a different agreement than the one paired as base (e.g. it amends the Credit Agreement while the base on file is the Guarantee and Collateral Agreement); cascade analysis only covers the ingested base.
- The cascade is definition-graph only; it does not yet parse covenant tables (pricing grids, ratio schedules).
- Amendments that modify a clause inside a definition without the phrase `definition of "X"` can be missed. The Uncertainty section exists because the memo assists counsel review; it does not replace it.
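Why the `definition of "X"` phrase matters becomes clear from a minimal sketch of amendment operation detection: find `definition of "X"` and `Section N.N(x) … is hereby amended`, then classify each definition change by keywords in its sentence. The patterns, the hypothetical `detect_operations` name, the keyword classification, and the naive sentence split are all assumptions, not amenddiff's `amendment.py`. A definition changed without that phrase produces nothing here, and new definitions inserted by "adding the following" are not covered at all.

```python
import re

# Illustrative patterns, not amenddiff's actual amendment.py.
DEF_OP_RE = re.compile(r'definition\s+of\s+["“](?P<term>[^"”]+)["”]')
SECTION_OP_RE = re.compile(
    r"Section\s+(?P<section>\d+\.\d+(?:\([a-z0-9]+\))*)[^.]{0,80}?\bis\s+hereby\s+amended"
)


def classify_op(sentence: str) -> str:
    """Keyword classification of the operation a sentence performs on a definition."""
    s = sentence.lower()
    if "restat" in s:   # "amended and restated", "restating"
        return "restated"
    if "delet" in s:
        return "deleted"
    if "amend" in s:
        return "amended"
    return "unclassified"  # a candidate for the Uncertainty section


def detect_operations(text: str) -> list[tuple[str, str, str]]:
    """(kind, target, operation) tuples, sentence by sentence."""
    ops = []
    # Naive split on '.' or ';' followed by whitespace, so "6.8(b)" stays intact.
    for sentence in re.split(r"(?<=[.;])\s+", text):
        for m in DEF_OP_RE.finditer(sentence):
            ops.append(("definition", m["term"], classify_op(sentence)))
        for m in SECTION_OP_RE.finditer(sentence):
            ops.append(("section", m["section"], "amended"))
    return ops
```

On a toy sentence such as `Section 6.8(b) of the Credit Agreement is hereby amended by replacing "2.50" with "3.00".` this yields `("section", "6.8(b)", "amended")`; a sentence that names a definition but uses none of the keywords comes back as `"unclassified"` rather than being guessed.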
- Run the discovery playbook in docs/discovery — the kill criterion is 3 pain-confirmed calls out of 15.
- Grade the memo against counsel review on one live deal under NDA.
- Instrument the one metric that matters: changes counsel found that the memo missed.