Kbiw converts the image files in digitisation batches to JP2 (JPEG 2000 Part 1) format using the Grok codec. The directory structure of the input batch is replicated in the output directory. The JP2 images are then subjected to the following quality checks:
- Pixel comparison between each pair of source and destination images (using libvips and pyvips).
- Check of extracted technical properties (using jpylyzer) against a user-defined Schematron profile.
The software also generates checksums of all converted images.
- Python (tested with versions 3.12.3 and 3.14.5)
- Grok JPEG 2000 codec (tested with version ??)
- Libvips
- ExifTool
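Before continuing, it can help to check which of the command-line dependencies are already reachable. The sketch below is a minimal check, not part of kbiw. It assumes the Grok encoder executable is called `grk_compress`. libvips is not checked because kbiw uses it as a library (through pyvips), not as a command-line tool. kbiw itself locates Grok and ExifTool through its configuration file, so a tool that is missing from PATH is not necessarily a problem.

```python
import shutil

# Assumed executable names: "grk_compress" for the Grok encoder and
# "exiftool" for ExifTool. On Windows, shutil.which also tries the
# extensions in PATHEXT, so "exiftool" matches "exiftool.exe".
REQUIRED_TOOLS = ("grk_compress", "exiftool")


def find_missing_tools(tools=REQUIRED_TOOLS, path=None):
    """Return the tools that cannot be found on PATH (or on `path`, if given)."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found on PATH")
```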
- Download the latest binaries of the Grok image compression software for your platform from:
- Extract the ZIP file to your local file system, and make a note of the installation location. You'll need to enter this later in the kbiw configuration file.
On Linux (Debian/Ubuntu), install libvips using:

```bash
sudo apt install libvips-dev --no-install-recommends
```

On macOS, install libvips using:

```bash
brew install vips
```
On Windows:

- Download the latest release from the build-win64-mxe repository. For a 64-bit Windows system you need the ZIP file that follows the "vips-dev-x64-all-x.y.z.zip" naming pattern (e.g. vips-dev-x64-all-8.18.2.zip).
- Extract the ZIP file to your local file system, and make a note of the installation location (e.g. "C:\vips-dev"). You'll need to enter this later in the kbiw configuration file.
On Linux (Debian/Ubuntu), install ExifTool using:

```bash
sudo apt install libimage-exiftool-perl
```

On macOS, install ExifTool using:

```bash
brew install exiftool
```
On Windows:

- Download the 64-bit Windows executable from the ExifTool website.
- Extract the ZIP file to your local file system.
- In the extracted folder, rename the ExifTool executable ("exiftool(-k).exe") to "exiftool.exe".
- Make a note of the full path to the executable (e.g. "C:\exiftool\exiftool.exe"). You'll need to enter this later in the kbiw configuration file.
As of 2026, uv appears to be the most straightforward tool for installing Python applications on a variety of platforms (Linux, macOS, Windows). However, the default KB Windows security policy blocks any applications that are installed this way. On these machines, kbiw must be installed with pip instead, after which it can be run as a Python module. For completeness, both installation methods (uv and pip) are described below.
First, check whether uv is installed on your system by typing the uv command in a terminal:

```bash
uv
```

If this results in a help message, uv is installed, and you can skip directly to the kbiw installation with uv (`uv tool install kbiw`) below. If not, you first need to install uv.
On Linux and macOS you can install uv with the following command:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Alternatively, you can use wget if your system doesn't have curl installed:

```bash
wget -qO- https://astral.sh/uv/install.sh | sh
```

To install uv on Windows, open a PowerShell terminal, and run the following command:

```powershell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```
Regardless of the operating system, in some cases the installation script will update your system's configuration to make the location of the uv executable globally accessible. If this happens, just close your current terminal, and open a new one for these changes to take effect. Pay attention to the screen output of the installation script for any details on this.
Use the following command to install kbiw (all platforms):

```bash
uv tool install kbiw
```
As an alternative to the uv installation, you can also install kbiw with pip. This is especially useful in case of Windows policies that block running installed Python applications.
Install kbiw with:

```
python -m pip install kbiw
```
After the installation, run kbiw once:

```
kbiw
```

On a Windows machine with a software restriction policy that blocks running Python applications, use this command instead:

```
python -m kbiw
```

This creates the kbiw configuration folder, whose location depends on your operating system (see next section).
Before you can use kbiw, you need to edit the configuration file, which is located in the configuration folder. The configuration folder is named "kbiw", and its location depends on your operating system:

- For Linux and macOS, the configuration folder is a subdirectory of the location defined by the environment variable $XDG_CONFIG_HOME. If this variable is not set, it is a subdirectory of the .config directory in the user's home folder (e.g. /home/johan/.config/kbiw). Note that the .config directory is hidden by default.
- For Windows, the configuration folder is a subdirectory of the AppData\Local folder (e.g. C:\Users\johan\AppData\Local\kbiw).
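The lookup rules above can be summarised in a few lines of Python. This is a minimal sketch of the documented behaviour, not kbiw's own code. Taking the AppData\Local location from the LOCALAPPDATA environment variable is an assumption, and the function name `kbiw_config_dir` is hypothetical.

```python
import os
import platform
from pathlib import Path


def kbiw_config_dir(system=None, env=None, home=None):
    """Sketch of the configuration-folder lookup described above."""
    system = system or platform.system()
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    if system == "Windows":
        # Assumption: AppData\Local is read from %LOCALAPPDATA%,
        # falling back to <home>\AppData\Local
        local = env.get("LOCALAPPDATA") or str(home / "AppData" / "Local")
        return Path(local) / "kbiw"

    # Linux and macOS: $XDG_CONFIG_HOME if set and non-empty, else ~/.config
    base = env.get("XDG_CONFIG_HOME") or str(home / ".config")
    return Path(base) / "kbiw"
```

For example, with $XDG_CONFIG_HOME unset and home folder /home/johan, this returns /home/johan/.config/kbiw.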
Open the configuration file ("config.json") in a text editor, and edit the following values:
| Variable | Meaning | Examples |
|---|---|---|
| grokDir | Grok installation directory | C:/Grok (Windows); ~/grok (Linux) |
| exifToolExecutable | ExifTool executable | C:/exiftool/exiftool.exe (Windows); /bin/exiftool (Linux) |
| vipsBinDir | Libvips binary dir (only needed on Windows, ignored on Linux/macOS) | C:/vips-dev/bin (Windows) |
Here's an example for a Windows system:

```json
{
    "grokDir": "C:/Grok",
    "exifToolExecutable": "C:/exiftool/exiftool.exe",
    "vipsBinDir": "C:/vips-dev/bin",
    ...
}
```

The remaining part of the configuration file contains a set of compression profiles, which define the JPEG 2000 compression options used by Grok. Do not change these unless you know what you're doing, as this may result in unexpected behaviour.
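After editing, a quick way to catch typos in the path settings is to load config.json and check that each configured location exists. The sketch below does this for the keys in the table above and ignores the compression profiles. The function name `check_config` is hypothetical, and expanding "~" (as in the "~/grok" example) is an assumption about how kbiw interprets these paths.

```python
import json
from pathlib import Path

# Keys from the configuration table; vipsBinDir is only used on Windows
REQUIRED_KEYS = ["grokDir", "exifToolExecutable"]
WINDOWS_ONLY_KEYS = ["vipsBinDir"]


def check_config(config_file, on_windows=False):
    """Return a list of problems found in a kbiw-style config.json (sketch)."""
    with open(config_file, encoding="utf-8") as f:
        config = json.load(f)

    keys = REQUIRED_KEYS + (WINDOWS_ONLY_KEYS if on_windows else [])
    problems = []
    for key in keys:
        value = config.get(key)
        if not value:
            problems.append(f"{key}: missing or empty")
        # Assumption: "~" is expanded to the user's home folder
        elif not Path(value).expanduser().exists():
            problems.append(f"{key}: path does not exist ({value})")
    return problems
```

An empty list means that all checked paths exist. It does not guarantee that they point to the right programs.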
The general syntax of kbiw is:

```
kbiw [-h] [--version] dirIn dirOut workflow
```
The command-line arguments are:
| Argument | Description |
|---|---|
| dirIn | input batch directory |
| dirOut | output batch directory |
| workflow | workflow (tifftojp2-generic, tifftojp2-mh, tifftojp2-ie) |
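The command-line interface above maps naturally onto Python's argparse module. The parser below is a sketch that mirrors the documented syntax; kbiw's actual parser may be structured differently, and the version string is a hypothetical placeholder. Restricting `workflow` to the three names in the table means any other value is rejected with a usage error.

```python
import argparse

# Workflow names from the table above
WORKFLOWS = ["tifftojp2-generic", "tifftojp2-mh", "tifftojp2-ie"]


def build_parser():
    """Sketch of a parser for: kbiw [-h] [--version] dirIn dirOut workflow."""
    parser = argparse.ArgumentParser(prog="kbiw")
    # Hypothetical version string; the real value comes from the kbiw package
    parser.add_argument("--version", action="version", version="%(prog)s 0.0.0")
    parser.add_argument("dirIn", help="input batch directory")
    parser.add_argument("dirOut", help="output batch directory")
    parser.add_argument("workflow", choices=WORKFLOWS, help="workflow")
    return parser
```

For example, `build_parser().parse_args(["./batch-tiff", "./batch-jp2", "tifftojp2-generic"])` yields a namespace with `dirIn`, `dirOut` and `workflow` attributes.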
As an example, the following command converts input batch "batch-tiff" to output batch "batch-jp2", using the "tifftojp2-generic" workflow:

```
kbiw ./batch-tiff ./batch-jp2 tifftojp2-generic
```
This converts all TIFF images (identified by a ".tif" or ".tiff" file extension) in the input batch directory to corresponding JP2 images in the output batch directory. The directory structure of the input batch is replicated in the output batch. Any files in the input batch that are not TIFF images are ignored. The same is true for directories that do not contain any TIFF images. As a result, these files and directories are not included in the output batch. For each input TIFF, the workflow involves the following steps:
- Convert the TIFF to JP2 with the Grok JPEG 2000 encoder, using the compression profile "KB_MASTER_LOSSLESS_10/06/2026".
- Read the metadata from the input TIFF, and write these as an XMP block to the JP2 with ExifTool.
- Analyze the JP2 with Jpylyzer, and evaluate its output against the Schematron schema that defines the required technical properties and metadata.
- Check if the pixel values of the JP2 are identical to those of the input TIFF, using the libvips image processing library.
- Calculate the JP2's SHA-512 checksum, and add this value to the checksum file of the batch.
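The way input TIFFs map to output JP2s, with non-TIFF files and TIFF-less directories left out, can be sketched as follows. This is an illustration of the described behaviour, not kbiw's implementation. The function names are hypothetical, and matching the ".tif"/".tiff" extensions case-insensitively is an assumption.

```python
from pathlib import Path

TIFF_EXTENSIONS = {".tif", ".tiff"}


def plan_conversions(dir_in, dir_out):
    """Map every TIFF under dir_in to a JP2 path under dir_out (sketch).

    Non-TIFF files are skipped, so directories without TIFFs never appear
    in the output. Case-insensitive extension matching is an assumption.
    """
    dir_in, dir_out = Path(dir_in), Path(dir_out)
    pairs = []
    for src in sorted(dir_in.rglob("*")):
        if src.is_file() and src.suffix.lower() in TIFF_EXTENSIONS:
            rel = src.relative_to(dir_in)
            pairs.append((src, (dir_out / rel).with_suffix(".jp2")))
    return pairs


def create_output_dirs(pairs):
    """Create only the output directories that will actually receive a JP2."""
    for _, dst in pairs:
        dst.parent.mkdir(parents=True, exist_ok=True)
```

Each (source, destination) pair would then go through the conversion and check steps listed above.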
In addition to this, the workflow writes the following files to the root of the output batch:
Summary file: a text file with a (very) brief summary of the results of the workflow. Here's an example:
```
Grok version: 20.3.3
Errors: 0
Warnings: 0
See batch manifest and log file for details on errors and warnings
```
Batch manifest: a semicolon-delimited file with information about each converted image. It has the following columns:
| Column | Meaning |
|---|---|
| image | relative path + name of the image |
| successGrok | True/False flag that indicates if Grok's TIFF to JP2 conversion was successful |
| successExifTool | True/False flag that indicates if ExifTool's metadata copying was successful |
| palettedImage | True/False flag that is True if the output JP2 has a color palette, and False otherwise (some JP2 decoders cannot decode paletted images, so in general you may want to avoid them) |
| successPixelCheck | True/False flag that indicates if the check on the pixel values was successful |
| successJpylyzerCheck | True/False flag that indicates if the check on the image properties (Jpylyzer + Schematron) was successful |
| failedJpylyzerChecks | List of failed Jpylyzer + Schematron checks (separated by "|" characters) |
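To find problem images in a large batch, the manifest can be filtered with Python's csv module. The sketch below assumes that the manifest starts with a header row containing the column names listed above, and that the flags are written as the strings "True" and "False". `palettedImage` is deliberately left out because it is informational rather than a pass/fail flag. The function name `failed_images` is hypothetical.

```python
import csv

FLAG_COLUMNS = ["successGrok", "successExifTool", "successPixelCheck",
                "successJpylyzerCheck"]


def failed_images(manifest_path):
    """Return (image, failed flags, failed Jpylyzer checks) for each failing row.

    Assumes a header row with the documented column names and
    "True"/"False" flag values.
    """
    failures = []
    with open(manifest_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter=";"):
            failed = [c for c in FLAG_COLUMNS if row.get(c) != "True"]
            if failed:
                # failedJpylyzerChecks holds "|"-separated check descriptions
                raw = row.get("failedJpylyzerChecks") or ""
                checks = [c for c in raw.split("|") if c]
                failures.append((row["image"], failed, checks))
    return failures
```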
Log file: if the summary file indicates any errors or warnings, the log file provides detailed information on them (look for the ERROR and WARNING messages).

Checksum file: a file with the SHA-512 checksums of all converted images (its format is compatible with the sha512sum tool).
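A sha512sum-compatible checksum file consists of lines of the form `<hex digest>  <file path>` (two spaces for text mode). The sketch below writes such a file with hashlib. Writing paths relative to the batch root with forward slashes is an assumption about kbiw's output, and `write_checksum_file` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_checksum_file(batch_dir, images, checksum_file):
    """Write '<digest>  <relative path>' lines, as sha512sum does (sketch).

    Assumption: paths are relative to batch_dir and use forward slashes.
    """
    batch_dir = Path(batch_dir)
    with open(checksum_file, "w", encoding="utf-8", newline="\n") as f:
        for image in images:
            rel = Path(image).relative_to(batch_dir).as_posix()
            f.write(f"{sha512_of(image)}  {rel}\n")
```

With relative paths like these, running `sha512sum -c` on the checksum file from the root of the output batch verifies every listed image.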
This workflow is identical to the tifftojp2-generic workflow, but with the following addition:
- Convert any palette-color TIFFs to a regular (RGB or grayscale) JP2. For this, the workflow uses ExifTool to check the value of the "PhotometricInterpretation" TIFF tag. If this is "3" ("RGB Palette"), the workflow uses libvips to convert the input TIFF to a (temporary) unpaletted TIFF file. This temporary TIFF is then converted to JP2. Converting to unpaletted JP2 can be useful, because paletted JP2s are quite uncommon, and not all decoders support them.
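With ExifTool, the numeric value of this tag can be printed with `exiftool -n -s3 -PhotometricInterpretation image.tif`. As a dependency-free alternative to that approach, the sketch below reads tag 262 (PhotometricInterpretation) directly from the first image file directory of a classic TIFF using only the struct module. It is an illustration of the palette check, not how kbiw does it. BigTIFF files and multi-page TIFFs are out of scope.

```python
import struct

PHOTOMETRIC_TAG = 262    # TIFF tag PhotometricInterpretation
PALETTE = 3              # value 3 = "RGB Palette"


def photometric_interpretation(path):
    """Read PhotometricInterpretation from the first IFD of a classic TIFF."""
    with open(path, "rb") as f:
        header = f.read(8)
        order = {b"II": "<", b"MM": ">"}.get(header[:2])
        if len(header) < 8 or order is None or \
                struct.unpack(order + "H", header[2:4])[0] != 42:
            raise ValueError("not a classic TIFF file")
        (ifd_offset,) = struct.unpack(order + "I", header[4:8])
        f.seek(ifd_offset)
        (n_entries,) = struct.unpack(order + "H", f.read(2))
        for _ in range(n_entries):
            # Each 12-byte IFD entry: tag, field type, count, value/offset
            tag, _type, _count, value = struct.unpack(order + "HHI4s", f.read(12))
            if tag == PHOTOMETRIC_TAG:
                # A single SHORT is stored left-justified in the value field
                return struct.unpack(order + "H", value[:2])[0]
    return None


def is_paletted(path):
    return photometric_interpretation(path) == PALETTE
```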
Workflow for batches from the "Middeleeuwse Handschriften" (Medieval Manuscripts) project. This workflow is largely identical to the tifftojp2-generic workflow, but with the following additions:
- Read the concordance tables from the input batch, update all references to TIFF images so that they point to the corresponding JP2s, and write the resulting concordance tables to the output batch. Any references to non-TIFF images (e.g. access JPEGs) are copied verbatim.
- Perform a two-way check on the output concordance tables: first check that all JP2 images defined in the concordance tables exist in the output batch, and then also check that all JP2 images in the output batch are defined in the concordance tables.
- Create verbatim copies of the directories "Pakbon" and "Access_Rename".
Note that at present, the entries in the "Middeleeuwse Handschriften" concordance tables don't include direct file path references. For the master images, the path follows from the name of the concordance table file, and for the targets it follows from the name of the target image.
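The reference update and the two-way check can be sketched with plain string and set operations. Because the concordance tables do not contain direct file paths, the sketch assumes that references have already been resolved to batch-relative paths (from the table name or the target image name, as explained above). Both function names are hypothetical, and the concordance table file format is not modelled.

```python
from pathlib import PurePosixPath

TIFF_EXTENSIONS = {".tif", ".tiff"}


def to_jp2_reference(ref):
    """Rewrite a TIFF reference to its JP2 name; leave anything else verbatim."""
    p = PurePosixPath(ref)
    if p.suffix.lower() in TIFF_EXTENSIONS:
        return str(p.with_suffix(".jp2"))
    return ref


def two_way_check(referenced_jp2s, jp2s_on_disk):
    """Return (referenced but missing, present but unreferenced), sorted."""
    referenced, on_disk = set(referenced_jp2s), set(jp2s_on_disk)
    return sorted(referenced - on_disk), sorted(on_disk - referenced)
```

A batch passes the two-way check only when both returned lists are empty.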
Workflow for batches from the "Indisch Erfgoed" program. This workflow is largely identical to the tifftojp2-generic workflow, but with the following addition:
- Create verbatim copies of the directories "Afgeleiden", "Rapportages_meetresultaten", "Rapportages_onregelmatigheden" and "rapporten HeronQAE TC 5".
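Copying the workflow-specific directories verbatim amounts to a recursive copy per directory name. The sketch below uses shutil.copytree. It assumes that these directories sit directly under the batch root and that a missing directory should be reported rather than treated as fatal; kbiw may handle this case differently. The function name `copy_verbatim` is hypothetical.

```python
import shutil
from pathlib import Path

# Directory names from the Indisch Erfgoed workflow description
IE_DIRS = ["Afgeleiden", "Rapportages_meetresultaten",
           "Rapportages_onregelmatigheden", "rapporten HeronQAE TC 5"]


def copy_verbatim(dir_in, dir_out, names):
    """Copy each named top-level directory unchanged; return the absent names.

    Assumption: the directories sit directly under the batch root.
    """
    missing = []
    for name in names:
        src = Path(dir_in) / name
        if src.is_dir():
            shutil.copytree(src, Path(dir_out) / name, dirs_exist_ok=True)
        else:
            missing.append(name)
    return missing
```

The same function would cover the "Pakbon" and "Access_Rename" directories of the Middeleeuwse Handschriften workflow.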
Schemas contain the Schematron rules on which the Jpylyzer check is based. Some background information about this type of rule-based validation can be found in this blog post. Currently the following schemas are included:
This is a schema for digitised medieval manuscripts.
Kbiw is released under the Apache License, Version 2.0. Parts of the code were inspired by the Bodleian's Image Processing library.