merzsielen/Bodyig-CPM

Bodyig-CPM

A module containing a character-pair merging algorithm tailored to the orthographic conventions of Classical Tibetan. Tibetan is written without spaces between words. Instead, script-specific punctuation segments it into syllables: the tsheg ་ falls between syllables and the shad ། ends clauses and sentences. Bodyig-CPM therefore lets those working with the language perform character-pair merging that respects sentence, word, and syllable boundaries.
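As a rough illustration of the boundary structure described above, the Python sketch below splits plaintext into clause- or sentence-level chunks on the shad and then into syllables on the tsheg. It is not Bodyig-CPM's code. The function name `segment` and the delimiter sets are assumptions chosen for the example: U+0F0B/U+0F0C for syllables and U+0F0D/U+0F0E for clauses. A fuller segmenter would also need to handle the other Tibetan punctuation marks.

```python
import re

# Assumed delimiter sets (hypothetical, not taken from Bodyig-CPM):
SYLLABLE_DELIMS = re.compile(r"[\u0F0B\u0F0C]+")  # tsheg, non-breaking tsheg
CLAUSE_DELIMS = re.compile(r"[\u0F0D\u0F0E]+")    # shad, nyis shad


def segment(text):
    """Split plaintext into clauses/sentences, each a list of syllables."""
    sentences = []
    # Newlines are treated as sentence breaks, matching the optional
    # one-sentence-per-line input format.
    for line in text.splitlines():
        for chunk in CLAUSE_DELIMS.split(line):
            # Strip stray whitespace and drop the empty strings left behind
            # by a trailing tsheg or shad.
            syllables = [s.strip() for s in SYLLABLE_DELIMS.split(chunk)]
            syllables = [s for s in syllables if s]
            if syllables:
                sentences.append(syllables)
    return sentences
```

For example, `segment("བཀྲ་ཤིས་བདེ་ལེགས།")` returns `[["བཀྲ", "ཤིས", "བདེ", "ལེགས"]]`.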

Given a plaintext corpus (optionally with one sentence per line) and the desired number of merge iterations, Bodyig-CPM returns a set of merge rules and the corpus with those rules applied.
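The input/output contract above can be sketched as a loop in the style of byte-pair encoding, applied to characters. This is a hypothetical illustration, not Bodyig-CPM's actual API. The name `learn_merges`, its return shape, and the choice to confine every merge to a single syllable are all assumptions. Each line is treated as one sentence, and syllables are split on tsheg/shad. On each iteration, the most frequent adjacent symbol pair inside any syllable is merged.

```python
import re
from collections import Counter

# Assumed boundaries: U+0F0B-U+0F0E (tsheg, non-breaking tsheg, shad, nyis shad) plus whitespace.
_BOUNDARY = re.compile(r"[\u0F0B-\u0F0E\s]+")


def _merge_pair(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` in `symbols` with one joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_merges(corpus, num_merges):
    """Hypothetical sketch: learn up to `num_merges` character-pair merges.

    Returns (rules, merged): `rules` lists the merged pairs in the order learned;
    `merged` has one list per non-empty input line, holding each syllable as a
    tuple of (possibly merged) symbols.
    """
    # Each syllable starts out as a tuple of single characters.
    merged = [
        [tuple(syl) for syl in _BOUNDARY.split(line) if syl]
        for line in corpus.splitlines()
        if line.strip()
    ]
    rules = []
    for _ in range(num_merges):
        # Count adjacent pairs inside syllables only, so a merge can never
        # span a syllable, word, or sentence boundary.
        counts = Counter()
        for sentence in merged:
            for syl in sentence:
                counts.update(zip(syl, syl[1:]))
        if not counts:
            break  # every syllable is already a single symbol
        # most_common orders ties by first occurrence, keeping results deterministic.
        best = counts.most_common(1)[0][0]
        rules.append(best)
        merged = [[_merge_pair(syl, best) for syl in sentence] for sentence in merged]
    return rules, merged
```

For example, `rules, merged = learn_merges(corpus_text, 1000)` (where `corpus_text` is a hypothetical string) would return up to 1,000 rules, or fewer if every syllable collapses to a single symbol first. Confining merges to syllables is the strictest reading of "boundary-sensitive". Allowing merges across syllables within a sentence would instead require keeping the tsheg as a symbol rather than discarding it.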

(TBD: Insert example sentences.)

About

A character-pair merging module for Classical Tibetan.
