Add DocLang writer, consolidate shared utilities #11705

Draft
bitkojine wants to merge 2 commits into jgm:main from bitkojine:v4.0-wip

Conversation

bitkojine commented Jun 13, 2026

Contributor

Summary

Adds the DocLang writer (-t doclang), a new output format for converting any pandoc-supported document to DocLang, an AI-native document format. DocLang is an open standard under the LF AI & Data Foundation, backed by IBM, NVIDIA, Red Hat, and ABBYY.

Also adds an isSimpleContent utility to Text.Pandoc.Writers.Shared. It detects single-paragraph inline content so that writers can apply container optimizations (virtual text, simple list items, etc.), and it is available to all writers.
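
To illustrate the idea behind isSimpleContent, here is a minimal sketch of what such a check might look like. It is not the code from this PR: the definition is an assumption based only on the description above ("single-paragraph inline content"). The Block and Inline types are simplified stand-ins for the real ones in Text.Pandoc.Definition, and describeItem is a hypothetical caller added for illustration.

```haskell
module Main where

-- Stand-in AST: a tiny subset shaped like pandoc's Block/Inline types.
-- The real types live in Text.Pandoc.Definition; these are simplified
-- so the sketch compiles with base alone.
data Inline = Str String | Space | Emph [Inline]
  deriving (Show, Eq)

data Block
  = Plain [Inline]
  | Para [Inline]
  | BulletList [[Block]]
  deriving (Show, Eq)

-- Assumed behaviour, not the PR's actual code: content counts as "simple"
-- when it is exactly one Plain or Para block, i.e. only inline content.
isSimpleContent :: [Block] -> Bool
isSimpleContent [Plain _] = True
isSimpleContent [Para _]  = True
isSimpleContent _         = False

-- Hypothetical caller: choose a compact rendering for simple list items.
describeItem :: [Block] -> String
describeItem blocks
  | isSimpleContent blocks = "inline item"
  | otherwise              = "block item"

main :: IO ()
main = do
  -- A single paragraph is simple content.
  putStrLn (describeItem [Para [Str "hello", Space, Str "world"]])
  -- A paragraph followed by a nested list is not.
  putStrLn (describeItem [Para [Str "a"], BulletList [[Plain [Str "b"]]]])
```

A writer could use a check like this to emit a compact form for list items or containers that hold only one paragraph, and fall back to full block rendering otherwise.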

Added

  • DocLang writer (src/Text/Pandoc/Writers/DocLang.hs)
  • DocLang template (data/templates/default.doclang)
  • Test suite (test/writer.doclang, test/tables.doclang)
  • isSimpleContent utility in Text.Pandoc.Writers.Shared
  • Registered in pandoc.cabal, Writers.hs, MANUAL.txt

Verifying

stack build
stack test pandoc:test-pandoc
echo '# Hello' | stack exec pandoc -f markdown -t doclang -s

bitkojine marked this pull request as draft on June 13, 2026, 00:45
…ties

Breaking: the XML format (1:1 equivalent of native/json) has been
removed. Use -t native or -t json instead.

Added:
- DocLang writer (-t doclang) - AI-native document format, 205 lines
- isSimpleContent utility in Writers.Shared
- HTML reader CI fix (unused variable)

Removed:
- XML reader/writer/XMLFormat/tests/doc/schemas (-3,466 lines)

Net: -2,340 lines. All 3,943 tests pass.

jgm commented Jun 13, 2026

Owner

No, we won't be removing the xml format...

Reverts the XML format removal from the previous v4.0 commit while
keeping all DocLang additions intact:

Restored deleted files:
- doc/xml.md, Readers/XML.hs, Writers/XML.hs, XMLFormat.hs
- Tests/XML.hs, tools/pandoc-xml.{dtd,rnc,rng,xsd}

Reverted XML-removal modifications in:
- Readers.hs, Writers.hs, test-pandoc.hs (XML refs restored)
- pandoc.cabal (XML modules + DocLang module)
- MANUAL.txt (XML output format + doclang format)

DocLang additions preserved:
- Writers/DocLang.hs, template, test files
- isSimpleContent in Writers/Shared.hs
- Extensions.hs entry, Tests/Old.hs entry
bitkojine changed the title from "v4.0: Add DocLang writer, remove XML format, consolidate shared utilities" to "Add DocLang writer, consolidate shared utilities" on Jun 13, 2026

jgm commented Jun 13, 2026

Owner

DocLang seems very new (just a few weeks old). If it gets some traction then it could be a worthy addition to pandoc, but I think it's premature to add it now.

@bitkojine

Contributor Author

Yes, I agree it’s probably a bit early to merge. I mainly wanted to get the PR ready and explore what Pandoc can do in DocLang pipelines. For now, I’m happy to keep it on my Pandoc fork and revisit it if DocLang gains traction.
