Skip to content

Kaito-ishiguro/capynode

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

cat-translate-0.8b-travel-gguf

Offline EN↔JA travel translator, Q4_K_M GGUF, runs fully on-device via llama.cpp / PocketPal.

Fine-tuned from cyberagent/CAT-Translate-0.8b (itself built on sbintuitions/sarashina2.2-0.5b) via LoRA domain adaptation on a curated travel corpus, then converted to GGUF via a patched convert_hf_to_gguf.py and quantized with importance-matrix (imatrix) calibration on travel data.


Files

File Size Notes
cat-translate-0.8b-travel-Q4_K_M.gguf 503.7 MB RECOMMENDED — iPhone target
cat-translate-0.8b-travel-Q5_K_M.gguf 569.2 MB Alternate
cat-translate-0.8b-travel-Q8_0.gguf 806.2 MB Near-lossless reference

All three files are imatrix-calibrated PTQ (importance-matrix post-training quantization), calibrated on a 534-example travel corpus weighted toward emergency (134 examples, all available) and food (200 examples) domains. Quantized with llama-quantize --imatrix.


Intended Use & Scope

Target use case: consumer travel translation — ordering food, navigating transportation, accommodation, shopping, and emergency communication in Japan.

Food and emergency domains are explicitly weighted in both fine-tuning (food is 42.5% of training pairs) and in the importance matrix calibration (emergency is the heaviest-weighted domain by proportion). The motivating scenario is a foreigner with no internet access facing a Japanese-only emergency line — "Please call an ambulance" and "I am having an allergic reaction" must be correct.

This model is a credible demonstrator with rigorous safety-aware evaluation. It is not a deployable medical, legal, or defense product. The consumer travel app is the near-term target. See the safety architecture note below for how life-critical phrases are handled in the intended deployment.

Not in scope: medical advice, legal documents, regulated translation workflows, or any context where a confident-wrong output carries high-stakes consequences.


Safety Architecture

The intended deployment is a hybrid phrasebook + LLM system:

  • Deterministic, human-verified phrasebook handles life-critical phrases: calling an ambulance, reporting an allergic reaction, "I can't breathe", "I'm lost". These are fixed strings — no probability, no hallucination risk.
  • The LLM handles the open-ended long tail: menus, negotiations, descriptions — anything that formulaic phrases don't cover.

A 4-bit quantized LLM alone must not be relied on for life-critical phrases. Confident-wrong is the worst failure mode in an emergency; the phrasebook is the safety net.


Evaluation

Measured on a 156-item held-out gold test set (78 EN→JA + 78 JA→EN), food- and emergency-weighted, with all 44 emergency references human-verified. Evaluated at the desktop inference stage using llama-server with plain-text prompts on stock llama.cpp b9716.

Label Size (MB) COMET (all) Δ vs bf16 food COMET emerg COMET BLEU en→ja BLEU ja→en
bf16 1,515.2 0.9001 ±0.0000 0.8960 0.8913 48.87 39.99
Q8_0 806.2 0.9022 +0.0021 0.8971 0.8925 49.79 40.35
Q5_K_M 569.2 0.9000 −0.0001 0.8933 0.8925 48.92 41.32
Q4_K_M 503.7 0.8988 −0.0013 0.8892 0.8986 47.96 38.97

COMET model: Unbabel/wmt22-comet-da.
BLEU tokenizers: char for EN→JA (sacrebleu), 13a for JA→EN (sacrebleu).

Why Q4_K_M: COMET delta −0.0013 vs bf16 is inside the 0.005 negligible threshold, at 67% smaller. Emergency gate clears: Q4 emerg COMET 0.8986 vs bf16 0.8913 — zero emergency degradation at 4-bit (the small apparent gain is within n=44 noise; the honest claim is no degradation). The entire quant ladder sits within ~0.003 COMET; quant choice collapses to smallest-that-fits-the-phone.

Read these caveats before drawing conclusions

  • COMET ≈ parity is the reliable signal. Do not read small BLEU swings between quant levels as quality conclusions — the variation is within small-sample noise.
  • These 156-item numbers are NOT comparable to Phase 3 (fine-tune) numbers. The earlier 40-item eval set contained shorter, more formulaic sentences. The 156-item set includes longer, more varied sentences; BLEU naturally drops on harder sets. The difference is not a regression.
  • EN→JA BLEU uses char tokenizer; JA→EN uses 13a. Mixing tokenizers makes numbers incomparable. If you re-evaluate, pin these tokenizers explicitly.

Differentiation

The value proposition is pragmatics, not BLEU. Incumbent offline NMT (Google Translate offline ~47 MB, Apple ~100 MB) wins literal character-BLEU. This model's edge:

  • Pro-drop resolution — correctly inferring dropped subjects/objects from context in Japanese
  • Keigo — appropriate politeness register for service interactions (restaurant, hotel, shop)
  • Cultural and food context — menu descriptions, allergen terms, regional food vocabulary

Do not frame this as a BLEU race against Google or Apple. Frame it on pragmatic correctness in travel scenarios that require cultural and register awareness.


Critical Tokenizer Note

This GGUF was built with a patched convert_hf_to_gguf.py and tokenizes correctly only in llama.cpp builds that use the resulting tokenizer.ggml.model = "t5" path.

Background: cyberagent/CAT-Translate-0.8b uses a SentencePiece Unigram tokenizer (inherited from sbintuitions/sarashina2.2-0.5b). The stock llama.cpp conversion script hardcodes tokenizer.ggml.model = "llama" for all SPM models, routing the runtime into the greedy-merge SPM path. SPM greedy-merge ≠ Unigram Viterbi — it produces silently wrong token splits. All known community GGUFs of this model family (mradermacher, mmnga-o) have this bug.

The fix (applied at conversion time): llama.cpp/conversion/base.py::_set_vocab_sentencepiece() was modified to (1) detect tokenizer.json → model.type == "Unigram" and emit tokenizer.ggml.model = "t5", routing llama.cpp into its LLAMA_VOCAB_TYPE_UGM Viterbi path; and (2) selectively promote BYTE tokens to NORMAL type only for byte characters that have no existing NORMAL token — so control characters like \n tokenize correctly without triggering the duplicate-text assertion crash (GGML_ASSERT(id_to_token.size() == token_to_id.size())).

Validation on stock llama.cpp b9716 (2026-06-19): /tokenize on the template string: 25/25 tokens match HF output exactly, including \n\n = [25, 25]. Broad parity test over 30 diverse strings: all match. COMET ≈ HF parity in both directions on the 40-sentence eval.

Verify your build: run /tokenize on a Japanese string and compare against HF tokenizer output. If your llama.cpp or PocketPal build uses an unpatched conversion (model string "llama" instead of "t5"), tokenization will be silently wrong and translation quality will degrade. PocketPal validation on a real device remains a required gate before any app deployment — b9716 desktop testing is a strong proxy, not a guarantee.

The patch is published at patches/llama_cpp_ugm_tokenizer.patch (against llama.cpp b9716, commit db52540).


Usage

llama-cli

llama-cli \
  -m cat-translate-0.8b-travel-Q4_K_M.gguf \
  --no-warmup \
  -p "Translate the following English text into Japanese.\n\nI'd like a table for two."

llama-server (OpenAI-compatible)

# Start server
llama-server -m cat-translate-0.8b-travel-Q4_K_M.gguf --port 8080

# Query
curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Translate the following English text into Japanese.\n\nI have a peanut allergy.", "n_predict": 128}'

Use plain-text prompts only — no chat template, no system message. Template: "Translate the following {src_lang} text into {tgt_lang}.\n\n{source}"

For JA→EN: "Translate the following Japanese text into English.\n\n救急車を呼んでください。"

PocketPal (iOS / Android)

Open PocketPal → Models → Add Model → enter the Hugging Face repo ID kaiish/capynode → select cat-translate-0.8b-travel-Q4_K_M.gguf. Enable airplane mode after download to confirm fully offline operation.


Running in a chat app (PocketPal, etc.)

This model is single-turn and uses no system message. Chat UIs that wrap input in a default system prompt, or that accumulate conversation history across turns, will break it — the model drops out of translation mode and responds conversationally instead. This is expected behaviour: CAT-Translate is invoked purely by the instruction in a single user turn, not by a persistent persona or context.

Recipe validated working fully offline on iPhone 11 in PocketPal:

  1. System prompt: empty — clear any default system message in the model settings.
  2. One translation per conversation — start a fresh chat for each translation; do not carry history between requests.
  3. Temperature 0 (greedy) for clean, deterministic output.
  4. Jinja chat-template ON — use the embedded template (enabled by default in PocketPal when the GGUF contains a tokenizer.chat_template key, which this one does).

For app integration, the cleaner path is to call the model with the instruction as a single user turn (or as a raw text completion) and no history — which is natural for a translation app since each translation is independent anyway.

Validated offline on iPhone 11 (4 GB) via PocketPal: correct EN→JA and JA→EN including emergency and food/allergen phrases; UGM tokenizer confirmed intact on the device build.


Training Data Provenance

Source Pairs License
Gemini distillation (gemini-3.1-pro-preview / gemini-2.5-flash) ~2,550 Synthetic, Google AI Studio free tier
Tatoeba ~1,230 CC BY 2.0
BSD dropped CC BY-NC-SA 4.0 — excluded (non-commercial clause)

Final training corpus: 6,458 train / 718 validation examples (bidirectional, sarashina2.2 instruct template). Domain distribution: food 42.5%, shopping 17.5%, transportation 15%, accommodation 15%, emergency 5%, general 5%.


License & Attribution

License: MIT. See release/LICENSE for this repository's terms.

This GGUF is built on two upstream MIT-licensed components. Their copyright and permission notices are preserved verbatim in the bundled license files below — these are the authoritative texts that must be carried forward in any distribution or deployment (e.g., in an app's About or credits screen):

Model Lineage

sbintuitions/sarashina2.2-0.5b   (MIT — SB Intuitions, 2025)
        │
        └── cyberagent/CAT-Translate-0.8b   (MIT — CyberAgent AI Lab, 2026)
                │
                └── LoRA fine-tune (travel domain adaptation)
                        │
                        └── imatrix Q4_K_M GGUF   (this repo — MIT, Kaito Ishiguro, 2026)

Cleared for: quantizing, re-hosting on HuggingFace, and shipping in a closed-source paid, subscription, or ad-supported app, provided both bundled license files above are preserved.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors