Unicodia is a simple Unicode encyclopedia and the most comprehensive character map ever. Currently Windows only.
Lifecycle phase: 5/7 (production/stable, inactive). Minor troubles with sustainability, but generally survived five releases of Unicode, 14.0 to 17.0. Probably won’t survive 18.0, just because…
I’m a Ukrainian officer now. Anything may happen: maybe I’ll start the work again. Maybe I’ll just get killed.
It has been moved to a separate repo. Visit https://github.com/Mercury13/unicodia-sesh
I was asked several times, but by that time it was already portable.
Open Unicodia.xml, it’s documented.
Unicodia does not collect data at all, but uses the GitHub API for updating.
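For readers wondering what such an update check can look like, here is a minimal sketch, not Unicodia’s actual code. It assumes that the main repository is Mercury13/unicodia (only the -sesh repo is named in this text), that the check reads the `tag_name` field of GitHub’s “latest release” endpoint, and that tags look like `v2.1.0`. The helpers `fetch_latest_release`, `parse_version` and `newer_release` are hypothetical names.

```python
import json
import urllib.request

# Assumption: owner/repo of the main project; not confirmed by this README.
LATEST_RELEASE_URL = "https://api.github.com/repos/Mercury13/unicodia/releases/latest"


def fetch_latest_release(url=LATEST_RELEASE_URL):
    """Download the public release metadata as a JSON string."""
    req = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")


def parse_version(tag):
    """Turn a tag such as 'v2.11.3' into (2, 11, 3); non-numeric parts are skipped."""
    parts = tag.lstrip("vV").split(".")
    return tuple(int(p) for p in parts if p.isdigit())


def newer_release(response_body, current):
    """Return the release tag if it is newer than `current`, otherwise None."""
    tag = json.loads(response_body)["tag_name"]
    return tag if parse_version(tag) > parse_version(current) else None
```

Typical use would be `newer_release(fetch_latest_release(), "2.0.5")`. The sketch only downloads public release metadata and sends nothing about the user.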
- Ask the programmer to add localized buttons if needed. One button is international for now, A-Z, and it already has Cyrillic, Katakana and Chinese versions. The rest are unchangeable for now… until needed.
- Download Lang-src/en.uorig from this repo.
- If you are able to use Git, better use it. We’ll be able to work together on one translation.
- Put Unicodia in a writable location.
- Create a language directory and edit locale.xml for that language.
- Download UTranslator. New → Translation of *.uorig.
- If you don’t know English, use another *.utran file as a reference translation.
- After saving, UTranslator creates lang.xml. Put it in the language directory. Or use a symlinking tool to tie these files together permanently and avoid manual work.
- Press F12 in Unicodia to reload translation without reopening the entire program.
- Warning, it reloads strings only; all locales are loaded on startup.
- nspk template parameters: 1=language name (or script name, non-localizable), 2=pre-comment (e.g. synonym, localizable).
- If there’s no {{nspk}} in languages and there’s language data, default {{nspk}} is added automatically. So: {{nspk}} at the end → delete, it’ll be added! Need e.g. synonym → add {{nspk||=Klingon}}. Synonym is the SECOND parameter. See Script.Mroo in English/Russian.
- To test alphabetic sorting, especially in troublesome languages like Japanese: press Ctrl+Shift+W and look into the Blocks drop-down list (does not work in Sort by tech name). There’s only one telltale: [1] appears when the 1st character does not belong to the sorting alphabet. These [1]’s are often mistakes and always deserve attention.
- When a new original has arrived: open the translation, press File → Update data (Ctrl+F5). The interface will guide you. The command Go → Find warnings → All (F6) can also help.
- Do not forget to reset the red eye/warning icon when the translation finally reflects the new original! Either double click, or Ctrl+Enter.
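The rule about the automatic {{nspk}} can be illustrated with a short sketch. This is not Unicodia’s code: `ensure_nspk` is a hypothetical helper, and appending at the end with a space separator is an assumption made for the example.

```python
import re

# Matches {{nspk}} with or without parameters, e.g. {{nspk||=Klingon}}.
NSPK = re.compile(r"\{\{nspk(\|[^{}]*)?\}\}")


def ensure_nspk(languages_text, has_language_data):
    """Append a default {{nspk}} when none is present and language data exists."""
    if not has_language_data or NSPK.search(languages_text):
        return languages_text          # nothing to add
    if not languages_text:
        return "{{nspk}}"
    return languages_text + " {{nspk}}"  # separator is an assumption
```

So a translator who leaves the text without the template still gets the default one, while an explicit `{{nspk||=Klingon}}` is kept untouched.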
Common. No war jargon. Describe the 2022 war as neutrally as possible. Every lingua franca (English, Russian, French) is used in its international form. Make examples as patriotic as possible for the language we’re writing in: the same letter is Russian and Ukrainian in the respective L10n’s, and English if the same phenomenon exists in the English language. Apostrophe is U+2019.
Is Old in the front or in the back? It depends. 1) In Scripts — as convenient. In Blocks… 2) Old is the main word (Ancient symbols) → better front. 3) Auxiliary block (Old Sogdian, Ancient Greek) → no matter, we’ll find it anyway by looking around Greek. 4) Old is an adjective to something more important (Italic old, Mongolian old, Permic old) → better back. It’s just ease of finding a block in the long list of 300 blocks.
AI as a translator. Allowed, but at least check it somehow.
English. The dialect called “International English” or “English as a lingua franca”: use the best word for a non-native audience. Examples: truck > lorry, petrol > gas, -ize > -ise. Prefer the British form if both are good. Punctuation around quotes is British/international: it’s inside quotes if it’s part of the “phrase being quoted”.
The grammar, however, must stay close to the British/American original, including articles and tenses. Unless you are native/proficient, every new sufficiently large text must be grammar-checked with Grammarly or a similar AI tool.
Russian. Ё is mandatory. No grammatical concessions to Ukrainian.
(May apply to new languages as well.) Adjectives like Georgian may agree with the script (письменность, feminine in Russian), or with the language (язык, masculine). The rules are…
- BLOCKS: strongly connected to a language → agree with the language (грузинский=Georgian [language]). Otherwise agree with the script (батакская=Batak [script]).
- SCRIPTS: of course agree with the script (грузинская=Georgian [script]).
Ukrainian. See Lang-src/Ukrainian.md.
New languages.
- As English uses lots of capital letters, translations to other languages may use lowercase where English uses capitals. Refer to Russian/Ukrainian for letter case.
- See the Russian script/language rule.
About war jargon. Open-source software with a neutral license and without a special purpose (e.g. censorship circumvention) should be neutral. Period.
- Some C++20 and std::filesystem are used here, so you need either MSYS or a recent Qt with MinGW 11.
- Also need cURL (present in W10 18H2+), 7-zip, UTransCon, SvgCleaner.
- Run the !rel.bat file.
- If there are troubles with paths, make a local configuration by creating ~setup_local.bat. Write only the keys that are bad in default xsetup.bat.
See develop.md.
Win7/10/11 x64 only. Rationale:
- WXP, WVista and W8 are completely abandoned by all imaginable software. Though I did some improvements specially for W8.
- No obstacles for x86, just untested because no one compiled Qt for x86.
- Though W11 is not the main OS, I did many W11-specific improvements.
- W10/11 should support everything possible, W7 just runs somehow. At the time of testing still no BMP tofu, per old policy.
- Previously W7 supported the entire base plane and three important plane 1 scripts. I dropped that guarantee, though I did nothing against it, just did not test.
- Small misrenderings in descriptions are tolerable; I’ll fix them only if samples are bad or if the font has other problems.
Wartime: as soon as the base arrives and the release date is frozen, even at the alpha review stage.
Peacetime (probably): when the stable release and some big font covering a major set arrive. Han too if the coverage is really high.
Emergency releases of a few characters (e.g. currency, Japanese era): instantly, even if they are tofu.
Fonts are always updated to release versions. A font is updated to an alpha/beta version if it fixes a major misrender and/or professionally implements a new character.
Naming: Noto if tables and existing glyphs are surely untouched; Uto otherwise.
These fonts are taken to Unicodia without the author’s consent:
- Craggy font with missing/trivial tables. Examples: Garay, Tolong Siki
- Font without license belonging to the author of Unicode request and released by him/her. Examples: Makasar (now replaced), Tangsa
- The author is surely SIL, even if found elsewhere. Example: Toto
- Incomplete fonts that have only a few fixup characters, if found in requests in TTF form. Example: a few rare CJK chars.
- ASCII mapping is NOT a reason. Example: Ol Onal (wrote own)
I never rip fonts from Unicode charts; I always use the TTF form. But the authors of fixup fonts may do so for completely unrelated characters. Example: PlanGothic P2 (now probably OK).
The only person I could ask about ideographs has died. Let these rules be for history, maybe I’ll coin others.
- Serif style > correctness
- One country is enough
- Preference of countries
  - confirmed Chinese (G)
  - = modernized confirmed Chinese
  - > confirmed other (J, K, H, M, KP, V)
  - > hypothetical Chinese, country-independent (JV)
- It’s perfectly OK to take hypothetical Chinese if it’s wrong in…
- SimSun makes the same decision
- stroke types and stroke joins only (what is invisible or barely seen in sans style), even pointy vs dot
- whether the strokes leave a small gap or written together, even two crosses (T) vs horizontal dagger (G)
- whether a stroke is convex up or down
- minor difference of stroke length e.g. in “three”
- style of roof stroke e.g. in 2F34: straight S (31D1, older) vs backslash dot D (31D4, current)
- number of strokes if off by one e.g. in 2E3D9: one dot in Chinese, two in SAT
Anyway, Unicodia will never be a good ideograph guide; everything I write about ideographs I draw from other sources.
Data is as neutral as possible. Examples.
- Number of people speaking Russian. Its current status is Lingua franca, so the # of L2 speakers is always shaky, especially under the current world war. So just the # of L1 speakers.
- Number of people speaking Ukrainian. Under this war people tend to conceal native Russian, and Ukrainian’s status is Alive, so the # of L1 speakers is ⪢ L2 → so the total number.
- Disputed territory. State that it is disputed, who currently controls it, and maybe who is disputing it.
- Finish GlyphWiki loader.
- Better CJK reference.
- Plane map.