A module implementing a character-pair merging algorithm tailored to the orthographic conventions of Classical Tibetan. Tibetan is written without spaces between words. Instead, it is segmented into syllables using punctuation particular to the script: the tsheg (་) separates syllables, and the shad (།) closes clauses and sentences. Bodyig-CPM therefore makes it easier for those working with the language to perform character-pair merging that remains sensitive to sentence, word, and syllable boundaries.
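To illustrate the boundaries involved, here is a minimal sketch of splitting Tibetan text into sentences and syllables. The function `segment` is hypothetical and not part of Bodyig-CPM's interface. It assumes that line breaks and the shad-type marks U+0F0D through U+0F12 end a sentence, and that the tsheg (U+0F0B) or non-breaking tsheg (U+0F0C) separates syllables.

```python
import re

# Hypothetical helper, not Bodyig-CPM's API. Assumed boundary marks:
TSHEG = "[\u0F0B\u0F0C]"            # tsheg and non-breaking tsheg: syllable boundary
SENTENCE_END = "[\n\u0F0D-\u0F12]"  # newline or shad-type marks: sentence boundary


def segment(text: str) -> list[list[str]]:
    """Split text into sentences, each a list of syllables."""
    sentences = []
    for chunk in re.split(SENTENCE_END + "+", text):
        # Drop empty pieces left by trailing marks or surrounding spaces.
        syllables = [s.strip() for s in re.split(TSHEG, chunk) if s.strip()]
        if syllables:
            sentences.append(syllables)
    return sentences
```

Treating a newline as a sentence boundary mirrors the option of supplying the corpus with one sentence per line; shad-delimited text on a single line is split the same way.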
Given a plaintext corpus (optionally with one sentence per line) and the desired number of merge iterations, Bodyig-CPM returns the learned set of merge rules together with the corpus after those rules have been applied.
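The merge step can be sketched as a loop in the style of byte-pair encoding. This is a minimal illustration under stated assumptions, not Bodyig-CPM's actual implementation. `learn_merges` and `apply_merge` are hypothetical names. Each Unicode code point is treated as one starting character. The input is assumed to be already segmented into sentences and syllables, and merges are assumed to be confined to a single syllable, so no merged unit can straddle a syllable or sentence boundary. The module itself may handle word boundaries differently.

```python
from collections import Counter

Pair = tuple[str, str]


def apply_merge(symbols: list[str], pair: Pair) -> list[str]:
    """Replace every non-overlapping occurrence of `pair` with one joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_merges(sentences: list[list[str]], num_merges: int):
    """Hypothetical sketch: learn up to `num_merges` pair-merge rules.

    `sentences` is a list of sentences, each a list of syllable strings.
    Returns (rules, corpus): the rules in the order learned, and the corpus
    with each syllable replaced by its list of merged symbols.
    """
    # Start from single code points; each syllable keeps its own symbol list.
    corpus = [[list(syl) for syl in sent] for sent in sentences]
    rules: list[Pair] = []
    for _ in range(num_merges):
        counts: Counter[Pair] = Counter()
        for sent in corpus:
            for syl in sent:
                # Pairs are counted only inside one syllable, so no merge
                # can cross a syllable or sentence boundary (an assumption).
                counts.update(zip(syl, syl[1:]))
        if not counts:
            break  # every syllable is already a single symbol
        # Most frequent pair wins; ties go to the pair counted first.
        best = max(counts, key=counts.__getitem__)
        rules.append(best)
        corpus = [[apply_merge(syl, best) for syl in sent] for sent in corpus]
    return rules, corpus
```

Because counting is restricted to pairs inside a syllable, the loop stops early once every syllable has collapsed into a single symbol, even if fewer than `num_merges` rules have been learned.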
(TBD: Insert example sentences.)