VoxelCubes · zaniar-z · Jun 28, 2026
diff --git a/README.fa.md b/README.fa.md
diff --git a/README.md b/README.md
@@ -35,6 +35,7 @@ The two bottom pages are what the program can output: either just the transparen
 > [Usage](#usage) \
 > [Profiles](#profiles) \
 > [OCR](#ocr) \
+> [Smart OCR (LLM Post-Correction)](#smart-ocr-llm-post-correction) \
 > [Examples](#examples-of-tricky-bubbles) \
 > [Acknowledgements](#acknowledgements) \
 > [License](#license) \
@@ -72,6 +73,8 @@ The two bottom pages are what the program can output: either just the transparen
 
 - Can also run OCR on the pages and output the text to a file.
 
+- Optionally runs the OCR output through a large language model (via [airllm](https://github.com/lyogavin/Anima/tree/main/air_llm)) to fix garbled text — "smart" OCR post-correction.
+
 - Review cleaning and OCR output, including editing the OCR output interactively before saving it.
 
 - Interface available in: English, German, Bulgarian, Spanish
@@ -308,6 +311,53 @@ For detailed installation instructions and additional information, please refer
 
 > Note: While Tesseract supports additional languages, Panel Cleaner will only utilize Tesseract for English and Japanese text recognition. English is installed by default. Follow the instructions here [Installing additional language packs](https://ocrmypdf.readthedocs.io/en/latest/languages.html) to install the Japanese language pack.  
 
+## Smart OCR (LLM Post-Correction)
+
+Panel Cleaner can optionally run the OCR output through a large language model (LLM) to fix garbled text — correcting misread characters, broken words, and stray punctuation — while preserving the original language and meaning. It does **not** translate. This is useful when the raw manga-ocr or Tesseract output is noisy and you want cleaner text for translation or archival.
+
+This feature is powered by [airllm](https://github.com/lyogavin/Anima/tree/main/air_llm), which performs layer-wise disk offloading so that very large models (e.g. Llama-3-70B) can run on a GPU with only a few gigabytes of memory, at the cost of slow token generation.
+
+### Enabling LLM Correction
+
+The feature is **off by default**. First, install the optional `[llm]` extra:
+
+```bash
+pip install "pcleaner[llm]"
+```
+
+Then enable it for a single OCR run with the `--use-llm` flag:
+
+```bash
+pcleaner ocr myfolder --output-path=output.txt --use-llm
+```
+
+Or enable it for every run by setting `llm_enabled = True` in the `[LLM]` section of your [profile](#profiles). Leave `llm_compression` empty or set it to `4bit` or `8bit`, and set `llm_hf_token` only for gated models such as `meta-llama/*`. Keep comments on their own lines, because any text after a value (including a `#` comment) is read as part of that value:
+
+```ini
+[LLM]
+llm_enabled = True
+llm_model = meta-llama/Meta-Llama-3-8B-Instruct
+llm_max_bubbles_per_prompt = 40
+llm_max_new_tokens = 1024
+llm_compression =
+llm_hf_token =
+```
+
+### How It Works
+
+1. Panel Cleaner runs OCR as usual, collecting the text from every detected bubble.
+2. The OCR text from many bubbles is batched into a single LLM prompt (since airllm generation is slow, batching is much faster than one prompt per bubble).
+3. The LLM returns a JSON array of corrected strings, which replace the raw OCR output.
+4. If a batch fails or the model returns the wrong number of items, the original OCR text is kept for that batch — so a single bad batch never aborts the whole run.
+
+### Notes
+
+- **Slow by design:** airllm loads the model from disk one layer at a time, so generation is far slower than regular GPU inference with the whole model in memory. This is the trade-off for running large models on minimal GPU memory.
+- **Instruct-tuned models** (e.g. `Meta-Llama-3-8B-Instruct`) are strongly recommended.
+- **Gated models** (such as the `meta-llama/*` repos) require a [Hugging Face token](https://huggingface.co/settings/tokens). Set it via `llm_hf_token` in the profile or the `HF_TOKEN` environment variable.
+- **Compression:** setting `llm_compression` to `4bit` or `8bit` enables block-wise quantization for up to ~3x faster inference at a small accuracy cost (requires the `bitsandbytes` package).
+- **GPU recommended:** airllm targets CUDA. On CPU-only machines the model may fail to load; in that case the run automatically falls back to the raw OCR output.
+
 ## Examples of Tricky Bubbles
 
 | Original | Cleaned |
@@ -332,6 +382,8 @@ For detailed installation instructions and additional information, please refer
 - [Simple Lama Inpainting](https://github.com/enesmsahin/simple-lama-inpainting) for inpainting bubbles that can't be masked out.
   Using the fine-tuned [Model by dreMaz](https://huggingface.co/dreMaz/AnimeMangaInpainting).
 
+- [airllm](https://github.com/lyogavin/Anima/tree/main/air_llm) for running large language models with minimal GPU memory via layer-wise disk offloading, used by the optional smart OCR post-correction feature.
+
 
 ## License
 

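Steps 2 to 4 of the README's "How It Works" list (batch the bubbles, ask for a JSON array, keep the raw OCR text when a batch fails) can be sketched in plain Python. This is a hypothetical illustration, not the code from this commit: `build_prompt`, `correct_batch`, `correct_all`, `fake_generate`, and the prompt wording are invented here, and the airllm model is abstracted as a `generate(prompt) -> str` callable instead of calling airllm's API directly. The default `batch_size` of 40 mirrors the `llm_max_bubbles_per_prompt` default.

```python
import json
from typing import Callable

# Hypothetical stand-in for the airllm call: takes a prompt, returns the model's raw text.
Generate = Callable[[str], str]


def build_prompt(texts: list[str]) -> str:
    """Ask for a JSON array of corrected strings, one per input bubble, in order."""
    return (
        "Fix OCR errors in each string below without translating. "
        f"Reply with a JSON array of exactly {len(texts)} strings.\n"
        + json.dumps(texts, ensure_ascii=False)
    )


def correct_batch(texts: list[str], generate: Generate) -> list[str]:
    """Correct one batch, returning the raw OCR text unchanged if anything goes wrong."""
    try:
        reply = generate(build_prompt(texts))
        # Models often wrap the JSON in extra prose, so cut out the outermost array.
        start, end = reply.index("["), reply.rindex("]") + 1
        corrected = json.loads(reply[start:end])
    except Exception:
        return texts
    # Reject replies with the wrong shape so corrections never land on the wrong bubble.
    if not isinstance(corrected, list) or len(corrected) != len(texts):
        return texts
    if not all(isinstance(item, str) for item in corrected):
        return texts
    return corrected


def correct_all(texts: list[str], generate: Generate, batch_size: int = 40) -> list[str]:
    """Split the bubbles into batches of at most batch_size and correct each batch."""
    result: list[str] = []
    for i in range(0, len(texts), batch_size):
        result.extend(correct_batch(texts[i : i + batch_size], generate))
    return result


def fake_generate(prompt: str) -> str:
    """Toy 'model' for the demo: upper-cases the bubbles and wraps the JSON in chatter."""
    texts = json.loads(prompt[prompt.index("[") :])
    return "Sure! " + json.dumps([t.upper() for t in texts])


print(correct_all(["helo", "wrld", "ok"], fake_generate, batch_size=2))  # ['HELO', 'WRLD', 'OK']
```

Checking both the length and the element types before accepting a reply is what keeps a single malformed batch from shifting corrections onto the wrong bubbles; the worst case is that one batch keeps its raw OCR text.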
diff --git a/pcleaner/config.py b/pcleaner/config.py
@@ -932,6 +932,109 @@ def fix(self) -> None:
         self.max_inpainting_radius = max(self.min_inpainting_radius, self.max_inpainting_radius)
 
 
+@define
+class LLMConfig:
+    # EXPERIMENTAL: Optional "smart" OCR post-correction via a large language model.
+    # Requires the optional [llm] extra: pip install "pcleaner[llm]"
+    # airllm performs layer-wise disk offloading so very large models run on minimal GPU memory.
+    llm_enabled: bool = False
+    # Hugging Face repo id or local path of the model to load with airllm.
+    llm_model: str = "meta-llama/Meta-Llama-3-8B-Instruct"
+    # How many OCR bubbles to pack into a single LLM prompt. Higher values mean fewer of the
+    # very slow generation passes; too high may exceed the model's context window.
+    llm_max_bubbles_per_prompt: int = 40
+    # Maximum new tokens to generate per prompt. Must be large enough for the JSON array
+    # of corrections for one batch.
+    llm_max_new_tokens: int = 1024
+    # Optional block-wise quantization for ~3x speedup at a small accuracy cost.
+    # One of: "", "4bit", "8bit". Empty/none disables it.
+    llm_compression: str = ""
+    # Optional Hugging Face token, needed for gated models such as meta-llama/Llama-*.
+    llm_hf_token: str = ""
+
+    def export_to_conf(
+        self, config_updater: cu.ConfigUpdater, add_after_section: str, gui_mode: bool = False
+    ) -> None:
+        """
+        Write the config to the config updater object.
+
+        :param config_updater: An existing config updater object.
+        :param add_after_section: The section to add the new section after.
+        :param gui_mode: Whether to format the config for the GUI.
+        """
+        config_str = f"""\
+        [LLM]
+
+        # EXPERIMENTAL FEATURE: Optional "smart" OCR post-correction using a large language model.
+        # When enabled, the OCR text from the `pcleaner ocr` command is passed through an LLM that
+        # fixes garbled manga_ocr / tesseract output (misread characters, broken words, stray
+        # punctuation) while preserving the original language and meaning. It does not translate.
+        # [CLI: Use the --use-llm flag on the `pcleaner ocr` command to enable this for a single run.]
+        # This requires the optional [llm] extra to be installed: pip install "pcleaner[llm]"
+        # airllm performs layer-wise disk offloading, so large models (e.g. Llama-3-70B) can run on
+        # a GPU with a few GB of memory, at the cost of slow token generation.
+        llm_enabled = {self.llm_enabled}
+
+        # The Hugging Face repository id (e.g. "meta-llama/Meta-Llama-3-8B-Instruct") or a local path
+        # of the model to load with airllm. Instruct-tuned models are strongly recommended.
+        # Note: gated models (like the meta-llama ones) require a Hugging Face token, see below.
+        llm_model = {self.llm_model}
+
+        # How many OCR bubbles to pack into a single LLM prompt. Since airllm generation is slow,
+        # batching many bubbles into one prompt is much faster than one prompt per bubble.
+        # Lower this if you exceed the model's context window or run out of memory.
+        llm_max_bubbles_per_prompt = {self.llm_max_bubbles_per_prompt}
+
+        # The maximum number of new tokens to generate per prompt. This needs to be large enough to
+        # hold the JSON array of corrections for one batch.
+        llm_max_new_tokens = {self.llm_max_new_tokens}
+
+        # Optional block-wise quantization for up to ~3x faster inference at a small accuracy cost.
+        # Leave empty for no compression, or set to "4bit" or "8bit". Requires the bitsandbytes package.
+        llm_compression = {self.llm_compression}
+
+        # Optional Hugging Face token, required to download gated models such as the meta-llama ones.
+        # Leave empty if your model is not gated. You can also set the HF_TOKEN environment variable.
+        llm_hf_token = {self.llm_hf_token}
+
+        """
+        llm_conf = cu.ConfigUpdater()
+        llm_conf.read_string(multi_left_strip(format_for_version(config_str, gui_mode)))
+        llm_section = llm_conf["LLM"]
+        config_updater[add_after_section].add_after.space(2).section(llm_section.detach())
+
+    def import_from_conf(self, config_updater: cu.ConfigUpdater) -> None:
+        """
+        Read the config from the config updater object.
+
+        :param config_updater: An existing config updater object.
+        """
+        section = "LLM"
+        if not config_updater.has_section(section):
+            logger.debug(f"No {section} section found in the profile, using defaults.")
+            return
+
+        try_to_load(self, config_updater, section, bool, "llm_enabled")
+        try_to_load(self, config_updater, section, str, "llm_model")
+        try_to_load(self, config_updater, section, int, "llm_max_bubbles_per_prompt")
+        try_to_load(self, config_updater, section, int, "llm_max_new_tokens")
+        try_to_load(self, config_updater, section, str, "llm_compression")
+        try_to_load(self, config_updater, section, str, "llm_hf_token")
+
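+    def fix(self) -> None:
+        # Clamp the numeric settings to their minimum sensible values.
+        self.llm_max_bubbles_per_prompt = max(1, self.llm_max_bubbles_per_prompt)
+        self.llm_max_new_tokens = max(1, self.llm_max_new_tokens)
+        # Normalize the compression setting, tolerating quotes, case, and "none".
+        self.llm_compression = self.llm_compression.strip().strip("\"'").lower()
+        if self.llm_compression == "none":
+            self.llm_compression = ""
+        if self.llm_compression not in ("", "4bit", "8bit"):
+            logger.warning(
+                f"Invalid llm_compression '{self.llm_compression}', disabling compression."
+            )
+            self.llm_compression = ""
+        # Fall back to the default model if none was configured.
+        self.llm_model = self.llm_model.strip()
+        if not self.llm_model:
+            self.llm_model = "meta-llama/Meta-Llama-3-8B-Instruct"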
+
+
 @define
 class Profile:
     """
@@ -944,6 +1047,7 @@ class Profile:
     masker: MaskerConfig = field(factory=MaskerConfig)
     denoiser: DenoiserConfig = field(factory=DenoiserConfig)
     inpainter: InpainterConfig = field(factory=InpainterConfig)
+    llm: LLMConfig = field(factory=LLMConfig)
 
     def bundle_config(self, gui_mode: bool = False) -> cu.ConfigUpdater:
         """
@@ -959,6 +1063,7 @@ def bundle_config(self, gui_mode: bool = False) -> cu.ConfigUpdater:
         self.masker.export_to_conf(config_updater, "Preprocessor", gui_mode=gui_mode)
         self.denoiser.export_to_conf(config_updater, "Masker", gui_mode=gui_mode)
         self.inpainter.export_to_conf(config_updater, "Denoiser", gui_mode=gui_mode)
+        self.llm.export_to_conf(config_updater, "Inpainter", gui_mode=gui_mode)
         return config_updater
 
     def hash_current_values(self) -> int:
@@ -1027,6 +1132,7 @@ def load(cls, path: Path) -> "Profile":
             profile.masker.import_from_conf(config)
             profile.denoiser.import_from_conf(config)
             profile.inpainter.import_from_conf(config)
+            profile.llm.import_from_conf(config)
             profile.fix()
         except Exception:
             logger.exception(f"Failed to load profile from {path}")
@@ -1061,6 +1167,7 @@ def fix(self) -> None:
         self.masker.fix()
         self.denoiser.fix()
         self.inpainter.fix()
+        self.llm.fix()
 
 
 @define
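The `[LLM]` section is ordinary INI, so how `import_from_conf` will see a hand-edited value can be checked with Python's standard `configparser`. This is a stand-in, not pcleaner code: it assumes ConfigUpdater, which the profile is read through, shares configparser's default of not recognizing inline comments, and `read_llm_section`, `INLINE`, and `OWN_LINE` are hypothetical names. It shows that a trailing `# ...` after `llm_compression =` becomes the value itself, which `LLMConfig.fix()` would then reject with a warning (and which, for `llm_hf_token`, would be passed along as the token).

```python
import configparser

# Inline comment after the value, as someone might write in a hand-edited profile.
INLINE = """
[LLM]
llm_enabled = True
llm_max_bubbles_per_prompt = 40
llm_compression =        # leave empty, or use 4bit / 8bit
"""

# The same setting with the comment moved onto its own line.
OWN_LINE = """
[LLM]
llm_enabled = True
llm_max_bubbles_per_prompt = 40
# leave empty, or use 4bit / 8bit
llm_compression =
"""


def read_llm_section(text: str) -> configparser.SectionProxy:
    # Default settings: no inline comment prefixes (assumed to match ConfigUpdater's default).
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser["LLM"]


inline = read_llm_section(INLINE)
own_line = read_llm_section(OWN_LINE)

print(inline.getboolean("llm_enabled"))             # True
print(inline.getint("llm_max_bubbles_per_prompt"))  # 40
print(repr(inline["llm_compression"]))    # '# leave empty, or use 4bit / 8bit'
print(repr(own_line["llm_compression"]))  # ''
```

Writing comments on their own lines, as `export_to_conf` already does when it generates the section, sidesteps the problem entirely.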

diff --git a/pcleaner/gui/profile_parser.py b/pcleaner/gui/profile_parser.py
@@ -288,23 +288,33 @@ def _get_text() -> str | None:
                     name_mapper = {
                         LayeredExport.NONE: self.tr("None", "Layered export option"),
                         LayeredExport.PSD_BULK: self.tr("PSD Bulk", "Layered export option"),
-                        LayeredExport.PSD_PER_IMAGE: self.tr("PSD Per Image", "Layered export option"),
+                        LayeredExport.PSD_PER_IMAGE: self.tr(
+                            "PSD Per Image", "Layered export option"
+                        ),
                     }
                 case EntryTypes.LanguageCode:
                     enum_class = LanguageCode
                     name_mapper = {
                         LanguageCode.detect_box: self.tr("Detect per box", "Language code option"),
-                        LanguageCode.detect_page: self.tr("Detect per page", "Language code option"),
+                        LanguageCode.detect_page: self.tr(
+                            "Detect per page", "Language code option"
+                        ),
                         LanguageCode.jpn: self.tr("Japanese", "Language code option"),
                         LanguageCode.eng: self.tr("English", "Language code option"),
                         LanguageCode.kor: self.tr("Korean", "Language code option"),
                         LanguageCode.kor_vert: self.tr("Korean (vertical)", "Language code option"),
-                        LanguageCode.chi_sim: self.tr("Chinese - Simplified", "Language code option"),
-                        LanguageCode.chi_tra: self.tr("Chinese - Traditional", "Language code option"),
+                        LanguageCode.chi_sim: self.tr(
+                            "Chinese - Simplified", "Language code option"
+                        ),
+                        LanguageCode.chi_tra: self.tr(
+                            "Chinese - Traditional", "Language code option"
+                        ),
                         LanguageCode.sqi: self.tr("Albanian", "Language code option"),
                         LanguageCode.ara: self.tr("Arabic", "Language code option"),
                         LanguageCode.aze: self.tr("Azerbaijani", "Language code option"),
-                        LanguageCode.aze_cyrl: self.tr("Azerbaijani - Cyrilic", "Language code option"),
+                        LanguageCode.aze_cyrl: self.tr(
+                            "Azerbaijani - Cyrilic", "Language code option"
+                        ),
                         LanguageCode.ben: self.tr("Bengali", "Language code option"),
                         LanguageCode.bul: self.tr("Bulgarian", "Language code option"),
                         LanguageCode.mya: self.tr("Burmese", "Language code option"),
@@ -769,4 +779,5 @@ def to_display_name(name: str) -> str:
         " ".join(word.capitalize() for word in s2.split(" "))
         .replace("Ai ", "AI ")
         .replace("Ocr", "OCR")
+        .replace("Llm", "LLM")
     )