A cross-platform CLI tool that converts .docx and .pdf files to Markdown using Pandoc, with automatic image extraction and path normalisation.
- Go 1.22+
- Pandoc in `PATH`
make build
# or
go build -o convert ./cmd/convert

Cross-platform binaries:

make cross

convert input.docx output.md images/
convert report.pdf report.md images/

Extracts images to `images/<input_basename>/image1.ext`, `image2.ext`, … and rewrites the image paths in the Markdown.
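The naming and rewriting step can be sketched as follows. This is a minimal illustration, not the tool's actual code: `canonicalImagePath` and `rewriteImageRefs` are hypothetical names, and the `media/imageN.ext` source paths assume the layout Pandoc typically produces with `--extract-media`.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
	"sort"
	"strings"
)

// canonicalImagePath builds images/<input_basename>/image<N><ext>, always with
// forward slashes so the result is usable inside Markdown on any platform.
// (Hypothetical helper, not taken from the project.)
func canonicalImagePath(imagesDir, inputFile string, n int, ext string) string {
	base := strings.TrimSuffix(filepath.Base(inputFile), filepath.Ext(inputFile))
	return path.Join(filepath.ToSlash(imagesDir), base, fmt.Sprintf("image%d%s", n, ext))
}

// rewriteImageRefs replaces every old image path found in the Markdown text
// with its new path. (Hypothetical helper.)
func rewriteImageRefs(markdown string, renames map[string]string) string {
	olds := make([]string, 0, len(renames))
	for old := range renames {
		olds = append(olds, old)
	}
	// Longest first, so a path that is a prefix of another cannot shadow it.
	sort.Slice(olds, func(i, j int) bool {
		if len(olds[i]) != len(olds[j]) {
			return len(olds[i]) > len(olds[j])
		}
		return olds[i] < olds[j]
	})
	pairs := make([]string, 0, 2*len(olds))
	for _, old := range olds {
		pairs = append(pairs, old, renames[old])
	}
	return strings.NewReplacer(pairs...).Replace(markdown)
}

func main() {
	// Assumed source paths; the real extraction layout may differ.
	renames := map[string]string{
		"media/image1.png":  canonicalImagePath("images/", "docs/report.docx", 1, ".png"),
		"media/image10.jpg": canonicalImagePath("images/", "docs/report.docx", 2, ".jpg"),
	}
	md := "![chart](media/image1.png)\n![photo](media/image10.jpg)\n"
	fmt.Print(rewriteImageRefs(md, renames))
}
```

A plain string replacement keeps the sketch short; a real implementation might instead parse image links so that unrelated text containing the same path is left alone.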
convert dir input_dir/ output_dir/ images/

Recursively converts all .docx and .pdf files. Preserves subdirectory structure in the output.
convert dir input_dir/ output.zip images/ --zip

Same as directory mode but packages the result into a zip archive. No temp files are left behind.
| Flag | Default | Description |
|---|---|---|
| `-format` | `markdown` | Pandoc output format (markdown or gfm) |
| `-overwrite` | false | Overwrite existing output files |
| `-recursive` | true | Recurse into subdirectories |
| `-flatten` | false | Collapse directory structure in output |
| `-log <file>` | | Write log output to a file |
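The flags in the table could be declared with Go's standard `flag` package roughly as shown below. This is a sketch under assumptions: the `Options` struct and `parseFlags` function are hypothetical (the real `ConvertOptions` fields are not shown here), and the standard `flag` package stops parsing at the first positional argument, so the actual tool may parse its command line differently.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// Options is a hypothetical stand-in for the project's ConvertOptions.
type Options struct {
	Format    string
	Overwrite bool
	Recursive bool
	Flatten   bool
	LogFile   string
}

// parseFlags parses the documented flags and returns the remaining
// positional arguments (input, output, images directory).
func parseFlags(args []string) (Options, []string, error) {
	var o Options
	fs := flag.NewFlagSet("convert", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // let the caller decide how to report errors
	fs.StringVar(&o.Format, "format", "markdown", "Pandoc output format (markdown or gfm)")
	fs.BoolVar(&o.Overwrite, "overwrite", false, "Overwrite existing output files")
	fs.BoolVar(&o.Recursive, "recursive", true, "Recurse into subdirectories")
	fs.BoolVar(&o.Flatten, "flatten", false, "Collapse directory structure in output")
	fs.StringVar(&o.LogFile, "log", "", "Write log output to a file")
	if err := fs.Parse(args); err != nil {
		return o, nil, err
	}
	if o.Format != "markdown" && o.Format != "gfm" {
		return o, nil, fmt.Errorf("unsupported -format %q (want markdown or gfm)", o.Format)
	}
	return o, fs.Args(), nil
}

func main() {
	opts, rest, err := parseFlags([]string{"-format", "gfm", "-overwrite", "input.docx", "output.md", "images/"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v %v\n", opts, rest)
}
```

Because `-recursive` defaults to true, it would be turned off with `-recursive=false` under this scheme.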
cmd/convert/      main entry point
internal/
  cli/            argument parsing, command dispatch
  converter/      single-file and directory conversion pipelines
  pandoc/         pandoc invocation via os/exec
  images/         image normalisation and Markdown path rewriting
  filesystem/     path helpers, directory walking, safe temp dirs
  zip/            zip archive creation
  logger/         structured levelled logging
pkg/types/        shared types (ConvertOptions, ConvertResult)
The Go application is a thin orchestrator. All document parsing is delegated to Pandoc via os/exec. The tool's responsibilities are:
- CLI interface and flag parsing
- Input validation and directory traversal
- Invoking Pandoc with the correct arguments
- Moving extracted images to canonical paths
- Rewriting image references in the generated Markdown
- ZIP packaging and temp directory cleanup
make test
# or
go test ./... -race