This Python package converts IDML (InDesign Markup Language) files to DocBook 5.2.
More importantly, because DocBook is supported by Pandoc, this tool effectively enables IDML to be converted into dozens of other formats (Markdown, DOCX, EPUB, ODT, AsciiDoc, etc.). In practice, idml2docbook acts as a custom reader of IDML files for Pandoc. It is a bridge between InDesign and the Pandoc ecosystem, and as such, it is the cornerstone of OutDesign.
flowchart LR
subgraph S1["idml2docbook"]
IDML[IDML] --> DOCBOOK[DocBook]
end
subgraph S2["Pandoc"]
DOCBOOK --> MD[Markdown]
DOCBOOK --> DOCX[DOCX]
DOCBOOK --> ADOC[AsciiDoc]
DOCBOOK --> ODT[ODT]
DOCBOOK --> EPUB[EPUB]
DOCBOOK --> ETC[etc.]
end
First, create a virtual environment.
Then, download and install this package using pip:
pip install idml2docbook
The package is now installed, but the environment still needs to be configured. This converter requires external dependencies because it is basically a wrapper around idml2xml-frontend that takes its Hub XML output and converts it to DocBook. To make it all work, the following is required:
- Python >= 3.x
- Java >= 1.7
- git (needed to install idml2xml-frontend)
- idml2xml-frontend
The following command checks whether these dependencies are installed. It also installs idml2xml-frontend and generates a sample .env file if none is found in your folder:
idml2docbook-install-dependencies
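If you would rather check by hand before running the command above, a minimal sketch along these lines can tell you whether Java and git are on your PATH. It is not the implementation of idml2docbook-install-dependencies; the function name missing_tools and the tool list are assumptions made for illustration, and it only checks for presence, not for versions.

```python
import shutil

# Executables to look for; this list is an assumption based on the
# prerequisites above and does not check version numbers.
REQUIRED_TOOLS = ("java", "git")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("java and git were found on the PATH.")
```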
If you already have a .env file in your project, you will need to manually add the path to idml2xml-frontend to it:
IDML2HUBXML_SCRIPT_FOLDER="/path/to/idml2xml-frontend"
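To confirm that the entry is picked up as expected, the sketch below reads a .env-style text and reports whether the configured folder exists. It is a standalone illustration, not how idml2docbook itself loads its configuration; parse_env and frontend_folder are hypothetical helper names, and the parser only handles simple KEY="value" lines.

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=value lines, ignoring blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding double or single quotes, if any.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def frontend_folder(env_text):
    """Return the configured idml2xml-frontend folder, or None if unset."""
    folder = parse_env(env_text).get("IDML2HUBXML_SCRIPT_FOLDER")
    return Path(folder) if folder else None


sample = 'IDML2HUBXML_SCRIPT_FOLDER="/path/to/idml2xml-frontend"\n'
folder = frontend_folder(sample)
print(folder, "exists" if folder.is_dir() else "does not exist")
```

In a real project you would pass the contents of your own .env file, for example Path(".env").read_text().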
These dependencies are for macOS and Linux. For Windows, a BAT script was written, but it was never tested (if anybody wants to help there, they would be very welcome).
Convert an IDML file:
idml2docbook file.idml
Options are also available. They are also documented in the command-line tool (see the help with -h/--help).
- -x, --idml2hubxml-file
  Treats the input file as a Hub XML file. Useful for saving processing time if idml2xml-frontend has already been run on the source IDML file.
- -o, --output <file>
  Name to assign to the output file. By default, output is sent to standard output (stdout).
- -t, --typography
  Applies French typographic refinements (thin spaces, non-breaking spaces, etc.).
- -l, --thin-spaces
  Use only thin spaces for typography refinement. Should be used together with --typography.
- -b, --linebreaks
  Do not replace <br> tags with spaces.
- -f, --media <path>
  Path to the folder containing media files. Default: Links.
- -r, --raster <extension>
  Extension to use when replacing that of raster images. Example: jpg.
- -v, --vector <extension>
  Extension to use when replacing that of vector images. Example: svg.
- -i, --idml2hubxml-output <path>
  Path to the output from Transpect’s idml2hubxml converter. Default: idml2hubxml.
- -s, --idml2hubxml-script <path>
  Path to the script of Transpect’s idml2xml-frontend converter.
- --version
  Displays the version of idml2docbook and exits the program.
In addition to idml2docbook, a second command is available through the CLI: idml2docbook-utils. It takes a Hub XML file as input, along with options to extract style data in various formats:
- --to-css
  Generates a CSS file that contains the paragraph and character styles attribute/value pairs extracted from the original IDML input file.
- --to-ods
  Generates an ODS file based on the paragraph and character styles of the original IDML input file.
Finally, a wrapper around idml2docbook was written to facilitate the extraction of CSS content. If you are more interested in form than in content, have a look at idml2css.
Simple command to use this package with Pandoc:
pandoc -f docbook -t markdown <(idml2docbook input.idml)
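The command above relies on shell process substitution, which is not available in every shell. As a rough alternative, the same pipeline can be driven from Python with subprocess, feeding the DocBook output to Pandoc on standard input. This is a sketch under assumptions: the helper names idml_to_docbook_cmd, pandoc_cmd and convert are hypothetical, only a few of the CLI options are wired in, and both idml2docbook and pandoc must be on your PATH for convert to work.

```python
import subprocess


def idml_to_docbook_cmd(idml_path, typography=False, media=None):
    """Build the idml2docbook command line as a list of arguments."""
    cmd = ["idml2docbook", idml_path]
    if typography:
        cmd.append("--typography")
    if media is not None:
        cmd += ["--media", media]
    return cmd


def pandoc_cmd(target="markdown"):
    """Build a pandoc command that reads DocBook from standard input."""
    return ["pandoc", "-f", "docbook", "-t", target]


def convert(idml_path, target="markdown", **options):
    """Run idml2docbook, pipe its output into pandoc, return the text."""
    docbook = subprocess.run(
        idml_to_docbook_cmd(idml_path, **options),
        check=True, capture_output=True,
    ).stdout
    result = subprocess.run(
        pandoc_cmd(target), input=docbook,
        check=True, capture_output=True,
    )
    return result.stdout.decode("utf-8")
```

For example, print(convert("input.idml", typography=True)) would print the Markdown rendering, provided both tools are installed. Text-based targets work this way; binary formats such as DOCX need pandoc's -o option instead.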
Though, it is possible to do crazy stuff as well 🤪
idml2docbook input.idml \
--typography \
--thin-spaces \
--raster jpg \
--vector svg \
--media images | \
pandoc -f docbook \
-t markdown_phpextra \
--wrap=none \
-o output/output.md
InDesign paragraph and character styles are converted into DocBook as role attributes.
Pandoc supports role attributes in the DocBook reader as of version 3.9 (February 2026).
In order to convert role attributes into Pandoc classes, the roles-to-classes.lua filter can be used:
pandoc -f docbook -t markdown --lua-filter=roles-to-classes.lua <(idml2docbook input.idml)
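Before writing a filter or a stylesheet, it can help to see which roles a converted document actually contains. The sketch below counts role attributes in a DocBook string using only the standard library. It is an illustration, not part of idml2docbook: count_roles is a hypothetical helper, and the sample is a small DocBook document of the shape the converter produces.

```python
import xml.etree.ElementTree as ET
from collections import Counter

DOCBOOK_NS = "http://docbook.org/ns/docbook"


def count_roles(docbook_xml):
    """Count (element name, role) pairs in a DocBook document."""
    root = ET.fromstring(docbook_xml)
    counts = Counter()
    for element in root.iter():
        role = element.get("role")
        if role:
            # Strip the namespace prefix that ElementTree adds to tags.
            tag = element.tag.replace("{%s}" % DOCBOOK_NS, "")
            counts[(tag, role)] += 1
    return counts


# Bytes are used so the XML declaration's encoding is honoured.
sample = b"""<?xml version="1.0" encoding="utf-8"?>
<article version="5.0" xml:lang="fr-FR" xmlns="http://docbook.org/ns/docbook">
<para role="NormalParagraphStyle"><phrase role="character-override-1">Hello world!</phrase></para>
</article>"""

for (tag, role), n in sorted(count_roles(sample).items()):
    print(f"{tag}\t{role}\t{n}")
```

Each reported role corresponds to an InDesign paragraph or character style, so the list doubles as an inventory of the classes you may need to handle downstream.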
Sample script to use the API:
from idml2docbook.core import idml2docbook
file = "input.idml"
# Options are optional!
options = {
    'typography': True,
    'thin_spaces': True,
    'linebreaks': True,
    'ignore_overrides': True,
    'raster': "jpg",
    'vector': "svg",
    'media': "images"
}
output = idml2docbook(file, **options)
print(output)
It is also possible to output the paragraph and character styles as CSS by extracting them from the resulting Hub XML file:
from idml2docbook.idml2hubxml import idml2hubxml
from idml2docbook.map import generate_css
file = "input.xml"
hubxml = idml2hubxml(file, read_output_file=True)
output = generate_css(hubxml)
print(output)
Start by creating a virtual environment. The Python dependencies specified in pyproject.toml should be downloaded automatically.
You also need to meet the other prerequisites specified in the Installation section. To do so, execute the following command:
python idml2docbook/install_dependencies.py
Once this is done, you can test your configuration on the hello_world.idml file:
python -m idml2docbook tests/hello_world/hello_world.idml
This should output the following:
<?xml version="1.0" encoding="utf-8"?>
<article version="5.0" xml:lang="fr-FR" xmlns="http://docbook.org/ns/docbook">
<para role="NormalParagraphStyle"><phrase role="character-override-1">Hello world!</phrase></para>
</article>
You can also run the tests from the tests folder:
pip install pytest # Install pytest in your venv if you haven't already
python -m pytest