413 changes: 413 additions & 0 deletions README.fa.md

Large diffs are not rendered by default.

52 changes: 52 additions & 0 deletions README.md
@@ -35,6 +35,7 @@ The two bottom pages are what the program can output: either just the transparen
> [Usage](#usage) \
> [Profiles](#profiles) \
> [OCR](#ocr) \
> [Smart OCR (LLM Post-Correction)](#smart-ocr-llm-post-correction) \
> [Examples](#examples-of-tricky-bubbles) \
> [Acknowledgements](#acknowledgements) \
> [License](#license) \
@@ -72,6 +73,8 @@ The two bottom pages are what the program can output: either just the transparen

- Can also run OCR on the pages and output the text to a file.

- Optionally runs the OCR output through a large language model (via [airllm](https://github.com/lyogavin/Anima/tree/main/air_llm)) to fix garbled text — "smart" OCR post-correction.

- Review cleaning and OCR output, including editing the OCR output interactively before saving it.

- Interface available in: English, German, Bulgarian, Spanish
@@ -308,6 +311,53 @@ For detailed installation instructions and additional information, please refer

> Note: While Tesseract supports additional languages, Panel Cleaner will only utilize Tesseract for English and Japanese text recognition. English is installed by default. Follow the instructions here [Installing additional language packs](https://ocrmypdf.readthedocs.io/en/latest/languages.html) to install the Japanese language pack.

## Smart OCR (LLM Post-Correction)

Panel Cleaner can optionally run the OCR output through a large language model (LLM) to fix garbled text — correcting misread characters, broken words, and stray punctuation — while preserving the original language and meaning. It does **not** translate. This is useful when the raw manga-ocr or Tesseract output is noisy and you want cleaner text for translation or archival.

This feature is powered by [airllm](https://github.com/lyogavin/Anima/tree/main/air_llm), which performs layer-wise disk offloading so that very large models (e.g. Llama-3-70B) can run on just a few gigabytes of RAM, at the cost of slow token generation.

### Enabling LLM Correction

The feature is **off by default**. First, install the optional `[llm]` extra:

```bash
pip install "pcleaner[llm]"
```

Then enable it for a single OCR run with the `--use-llm` flag:

```bash
pcleaner ocr myfolder --output-path=output.txt --use-llm
```

Or enable it permanently by setting `llm_enabled = True` in the `[LLM]` section of your [profile](#profiles):

```ini
[LLM]
llm_enabled = True
llm_model = meta-llama/Meta-Llama-3-8B-Instruct
llm_max_bubbles_per_prompt = 40
llm_max_new_tokens = 1024
# Leave empty, or use 4bit / 8bit.
llm_compression =
# Required for gated models like meta-llama/*.
llm_hf_token =
```
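
As a rough illustration of how these keys could be read back, the sketch below parses an `[LLM]` section with Python's standard `configparser`. Panel Cleaner itself uses ConfigUpdater and its own loader, so the helper name `read_llm_section` is hypothetical and the fallback values simply mirror the defaults listed above.

```python
import configparser

# Sample profile text. Values are left empty rather than followed by a
# trailing "# ..." note: with configparser's default settings an inline
# comment would become part of the value.
PROFILE_TEXT = """
[LLM]
llm_enabled = True
llm_model = meta-llama/Meta-Llama-3-8B-Instruct
llm_max_bubbles_per_prompt = 40
llm_max_new_tokens = 1024
llm_compression =
llm_hf_token =
"""


def read_llm_section(text: str) -> dict:
    """Hypothetical helper: return the [LLM] settings as typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("LLM"):
        # No section in the profile: the caller keeps its defaults.
        return {}
    section = parser["LLM"]
    return {
        "llm_enabled": section.getboolean("llm_enabled", fallback=False),
        "llm_model": section.get(
            "llm_model", fallback="meta-llama/Meta-Llama-3-8B-Instruct"
        ),
        "llm_max_bubbles_per_prompt": section.getint(
            "llm_max_bubbles_per_prompt", fallback=40
        ),
        "llm_max_new_tokens": section.getint("llm_max_new_tokens", fallback=1024),
        "llm_compression": section.get("llm_compression", fallback=""),
        "llm_hf_token": section.get("llm_hf_token", fallback=""),
    }


settings = read_llm_section(PROFILE_TEXT)
```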

### How It Works

1. Panel Cleaner runs OCR as usual, collecting the text from every detected bubble.
2. The OCR text from many bubbles is batched into a single LLM prompt (since airllm generation is slow, batching is much faster than one prompt per bubble).
3. The LLM returns a JSON array of corrected strings, which replace the raw OCR output.
4. If a batch fails or the model returns the wrong number of items, the original OCR text is kept for that batch — so a single bad batch never aborts the whole run.
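
The batching and fallback behaviour in the steps above can be sketched as follows. This is a minimal illustration and not the code from this PR: `correct_in_batches` and the `generate` callable (standing in for the airllm generation call) are hypothetical names, and the prompt wording is invented.

```python
import json
from typing import Callable


def correct_in_batches(
    texts: list[str],
    generate: Callable[[str], str],
    batch_size: int = 40,
) -> list[str]:
    """Send OCR strings to an LLM in batches, keeping the raw text on failure.

    `generate` is an assumed stand-in for the model call: it takes a prompt
    and returns the model's reply as a string.
    """
    corrected: list[str] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start : start + batch_size]
        prompt = (
            "Fix OCR errors in each string without translating. "
            "Reply with a JSON array of the same length.\n"
            + json.dumps(batch, ensure_ascii=False)
        )
        try:
            result = json.loads(generate(prompt))
            shape_ok = (
                isinstance(result, list)
                and len(result) == len(batch)
                and all(isinstance(item, str) for item in result)
            )
            if not shape_ok:
                raise ValueError("reply does not match the batch")
        except Exception:
            # Any failure only affects this batch: fall back to raw OCR text.
            result = batch
        corrected.extend(result)
    return corrected
```

With a batch size of 2 and three input strings, a model that always replies with two items would correct the first batch and leave the third string untouched, because the second batch's reply has the wrong length.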

### Notes

- **Slow by design:** airllm offloads model layers to disk, so generation is far slower than ordinary GPU inference. This is the trade-off for running large models on minimal RAM.
- **Instruct-tuned models** (e.g. `Meta-Llama-3-8B-Instruct`) are strongly recommended.
- **Gated models** (such as the `meta-llama/*` repos) require a [Hugging Face token](https://huggingface.co/settings/tokens). Set it via `llm_hf_token` in the profile or the `HF_TOKEN` environment variable.
- **Compression:** setting `llm_compression` to `4bit` or `8bit` enables block-wise quantization for up to ~3x faster inference at a small accuracy cost (requires the `bitsandbytes` package).
- **GPU recommended:** airllm targets CUDA. On CPU-only machines the model may fail to load — in that case the run falls back to raw OCR output automatically.

## Examples of Tricky Bubbles

| Original | Cleaned |
@@ -332,6 +382,8 @@ For detailed installation instructions and additional information, please refer
- [Simple Lama Inpainting](https://github.com/enesmsahin/simple-lama-inpainting) for inpainting bubbles that can't be masked out.
Using the fine-tuned [Model by dreMaz](https://huggingface.co/dreMaz/AnimeMangaInpainting).

- [airllm](https://github.com/lyogavin/Anima/tree/main/air_llm) for running large language models on minimal RAM via layer-wise disk offloading, used by the optional smart OCR post-correction feature.


## License

107 changes: 107 additions & 0 deletions pcleaner/config.py
@@ -932,6 +932,109 @@ def fix(self) -> None:
        self.max_inpainting_radius = max(self.min_inpainting_radius, self.max_inpainting_radius)


@define
class LLMConfig:
    # EXPERIMENTAL: Optional "smart" OCR post-correction via a large language model.
    # Requires the optional [llm] extra: pip install pcleaner[llm]
    # airllm performs layer-wise disk offloading so very large models run on minimal RAM.
    llm_enabled: bool = False
    # Hugging Face repo id or local path of the model to load with airllm.
    llm_model: str = "meta-llama/Meta-Llama-3-8B-Instruct"
    # How many OCR bubbles to pack into a single LLM prompt. Higher saves on the very
    # slow generation passes; too high may exceed the model's context window.
    llm_max_bubbles_per_prompt: int = 40
    # Maximum new tokens to generate per prompt. Must be large enough for the JSON array
    # of corrections for one batch.
    llm_max_new_tokens: int = 1024
    # Optional block-wise quantization for ~3x speedup at a small accuracy cost.
    # One of: "", "4bit", "8bit". Empty/none disables it.
    llm_compression: str = ""
    # Optional Hugging Face token, needed for gated models such as meta-llama/Llama-*.
    llm_hf_token: str = ""

    def export_to_conf(
        self, config_updater: cu.ConfigUpdater, add_after_section: str, gui_mode: bool = False
    ) -> None:
        """
        Write the config to the config updater object.

        :param config_updater: An existing config updater object.
        :param add_after_section: The section to add the new section after.
        :param gui_mode: Whether to format the config for the GUI.
        """
        config_str = f"""\
[LLM]

# EXPERIMENTAL FEATURE: Optional "smart" OCR post-correction using a large language model.
# When enabled, the OCR text from the `pcleaner ocr` command is passed through an LLM that
# fixes garbled manga_ocr / tesseract output (misread characters, broken words, stray
# punctuation) while preserving the original language and meaning. It does not translate.
# [CLI: Use the --use-llm flag on the `pcleaner ocr` command to enable this for a single run.]
# This requires the optional [llm] extra to be installed: pip install pcleaner[llm]
# airllm performs layer-wise disk offloading, so large models (e.g. Llama-3-70B) can run on
# a few GB of RAM, at the cost of slow token generation.
llm_enabled = {self.llm_enabled}

# The Hugging Face repository id (e.g. "meta-llama/Meta-Llama-3-8B-Instruct") or a local path
# of the model to load with airllm. Instruct-tuned models are strongly recommended.
# Note: gated models (like the meta-llama ones) require a Hugging Face token, see below.
llm_model = {self.llm_model}

# How many OCR bubbles to pack into a single LLM prompt. Since airllm generation is slow,
# batching many bubbles into one prompt is much faster than one prompt per bubble.
# Lower this if you exceed the model's context window or run out of memory.
llm_max_bubbles_per_prompt = {self.llm_max_bubbles_per_prompt}

# The maximum number of new tokens to generate per prompt. This needs to be large enough to
# hold the JSON array of corrections for one batch.
llm_max_new_tokens = {self.llm_max_new_tokens}

# Optional block-wise quantization for up to ~3x faster inference at a small accuracy cost.
# Leave empty for no compression, or set to "4bit" or "8bit". Requires the bitsandbytes package.
llm_compression = {self.llm_compression}

# Optional Hugging Face token, required to download gated models such as the meta-llama ones.
# Leave empty if your model is not gated. You can also set the HF_TOKEN environment variable.
llm_hf_token = {self.llm_hf_token}

"""
        llm_conf = cu.ConfigUpdater()
        llm_conf.read_string(multi_left_strip(format_for_version(config_str, gui_mode)))
        llm_section = llm_conf["LLM"]
        config_updater[add_after_section].add_after.space(2).section(llm_section.detach())

    def import_from_conf(self, config_updater: cu.ConfigUpdater) -> None:
        """
        Read the config from the config updater object.

        :param config_updater: An existing config updater object.
        """
        section = "LLM"
        if not config_updater.has_section(section):
            logger.debug(f"No {section} section found in the profile, using defaults.")
            return

        try_to_load(self, config_updater, section, bool, "llm_enabled")
        try_to_load(self, config_updater, section, str, "llm_model")
        try_to_load(self, config_updater, section, int, "llm_max_bubbles_per_prompt")
        try_to_load(self, config_updater, section, int, "llm_max_new_tokens")
        try_to_load(self, config_updater, section, str, "llm_compression")
        try_to_load(self, config_updater, section, str, "llm_hf_token")

    def fix(self) -> None:
        if self.llm_max_bubbles_per_prompt < 1:
            self.llm_max_bubbles_per_prompt = 1
        if self.llm_max_new_tokens < 1:
            self.llm_max_new_tokens = 1
        if self.llm_compression not in ("", "4bit", "8bit"):
            logger.warning(
                f"Invalid llm_compression '{self.llm_compression}', disabling compression."
            )
            self.llm_compression = ""
        if not self.llm_model.strip():
            self.llm_model = "meta-llama/Meta-Llama-3-8B-Instruct"


@define
class Profile:
"""
@@ -944,6 +1047,7 @@ class Profile:
    masker: MaskerConfig = field(factory=MaskerConfig)
    denoiser: DenoiserConfig = field(factory=DenoiserConfig)
    inpainter: InpainterConfig = field(factory=InpainterConfig)
    llm: LLMConfig = field(factory=LLMConfig)

    def bundle_config(self, gui_mode: bool = False) -> cu.ConfigUpdater:
        """
@@ -959,6 +1063,7 @@ def bundle_config(self, gui_mode: bool = False) -> cu.ConfigUpdater:
        self.masker.export_to_conf(config_updater, "Preprocessor", gui_mode=gui_mode)
        self.denoiser.export_to_conf(config_updater, "Masker", gui_mode=gui_mode)
        self.inpainter.export_to_conf(config_updater, "Denoiser", gui_mode=gui_mode)
        self.llm.export_to_conf(config_updater, "Inpainter", gui_mode=gui_mode)
        return config_updater

    def hash_current_values(self) -> int:
@@ -1027,6 +1132,7 @@ def load(cls, path: Path) -> "Profile":
            profile.masker.import_from_conf(config)
            profile.denoiser.import_from_conf(config)
            profile.inpainter.import_from_conf(config)
            profile.llm.import_from_conf(config)
            profile.fix()
        except Exception:
            logger.exception(f"Failed to load profile from {path}")
@@ -1061,6 +1167,7 @@ def fix(self) -> None:
        self.masker.fix()
        self.denoiser.fix()
        self.inpainter.fix()
        self.llm.fix()


@define
21 changes: 16 additions & 5 deletions pcleaner/gui/profile_parser.py
@@ -288,23 +288,33 @@ def _get_text() -> str | None:
name_mapper = {
LayeredExport.NONE: self.tr("None", "Layered export option"),
LayeredExport.PSD_BULK: self.tr("PSD Bulk", "Layered export option"),
LayeredExport.PSD_PER_IMAGE: self.tr("PSD Per Image", "Layered export option"),
LayeredExport.PSD_PER_IMAGE: self.tr(
"PSD Per Image", "Layered export option"
),
}
case EntryTypes.LanguageCode:
enum_class = LanguageCode
name_mapper = {
LanguageCode.detect_box: self.tr("Detect per box", "Language code option"),
LanguageCode.detect_page: self.tr("Detect per page", "Language code option"),
LanguageCode.detect_page: self.tr(
"Detect per page", "Language code option"
),
LanguageCode.jpn: self.tr("Japanese", "Language code option"),
LanguageCode.eng: self.tr("English", "Language code option"),
LanguageCode.kor: self.tr("Korean", "Language code option"),
LanguageCode.kor_vert: self.tr("Korean (vertical)", "Language code option"),
LanguageCode.chi_sim: self.tr("Chinese - Simplified", "Language code option"),
LanguageCode.chi_tra: self.tr("Chinese - Traditional", "Language code option"),
LanguageCode.chi_sim: self.tr(
"Chinese - Simplified", "Language code option"
),
LanguageCode.chi_tra: self.tr(
"Chinese - Traditional", "Language code option"
),
LanguageCode.sqi: self.tr("Albanian", "Language code option"),
LanguageCode.ara: self.tr("Arabic", "Language code option"),
LanguageCode.aze: self.tr("Azerbaijani", "Language code option"),
LanguageCode.aze_cyrl: self.tr("Azerbaijani - Cyrilic", "Language code option"),
LanguageCode.aze_cyrl: self.tr(
"Azerbaijani - Cyrilic", "Language code option"
),
LanguageCode.ben: self.tr("Bengali", "Language code option"),
LanguageCode.bul: self.tr("Bulgarian", "Language code option"),
LanguageCode.mya: self.tr("Burmese", "Language code option"),
@@ -769,4 +779,5 @@ def to_display_name(name: str) -> str:
" ".join(word.capitalize() for word in s2.split(" "))
.replace("Ai ", "AI ")
.replace("Ocr", "OCR")
.replace("Llm", "LLM")
)
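
The chained `.replace` calls in this hunk restore acronym casing after each word has been capitalized. A self-contained sketch of that idea follows. How `s2` is derived is not visible in the hunk, so the underscore-to-space step and the name `to_display_name_sketch` are assumptions made for illustration.

```python
def to_display_name_sketch(name: str) -> str:
    """Illustrative stand-in for to_display_name, not the project's code."""
    # Assumption: the real function builds `s2` in some other way; here we
    # only swap underscores for spaces.
    s2 = name.replace("_", " ")
    return (
        " ".join(word.capitalize() for word in s2.split(" "))
        # str.capitalize() lowercases everything after the first letter,
        # so acronyms need to be restored afterwards.
        .replace("Ai ", "AI ")
        .replace("Ocr", "OCR")
        .replace("Llm", "LLM")
    )


label = to_display_name_sketch("llm_max_new_tokens")
```

Without the added `"Llm"` replacement, the new profile keys would appear in the GUI as "Llm Enabled", "Llm Model", and so on.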