Bodyig-CPM

A module containing a character-pair merging algorithm tailored to the orthographic nuances of Classical Tibetan. Because Tibetan is written without spaces between words and is instead segmented into syllables (using punctuation particular to the script), Bodyig-CPM makes it easier to perform character-pair merging that remains sensitive to sentence, word, and syllable boundaries.

Given a plaintext corpus (optionally with each sentence on a new line) and the desired number of merge iterations, Bodyig-CPM returns a set of merge rules and the corpus with these rules applied.
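
To illustrate the general idea, here is a minimal sketch of boundary-aware pair merging. It is not the code in `bodyig_cpm.py`: the function names (`split_syllables`, `merge_pair`, `learn_merges`) are hypothetical, and it makes some simplifying assumptions. It only respects syllable boundaries (by splitting on the tsheg `་`, the shad `།`, and whitespace), it treats each Unicode code point as one character, and it breaks frequency ties by first occurrence.

```python
import re
from collections import Counter

TSHEG = "\u0f0b"  # ་ (intersyllabic mark)
SHAD = "\u0f0d"   # ། (clause/sentence mark)


def split_syllables(text):
    """Split text on tsheg, shad, and whitespace, dropping empty pieces."""
    return [s for s in re.split(f"[{TSHEG}{SHAD}\\s]+", text) if s]


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_merges(corpus, num_merges):
    """Hypothetical helper: learn up to `num_merges` rules within syllables only.

    Returns (rules, syllables), where each syllable is a list of merged symbols.
    """
    syllables = [list(s) for s in split_syllables(corpus)]
    rules = []
    for _ in range(num_merges):
        # Count adjacent pairs inside each syllable, never across a boundary.
        counts = Counter()
        for syl in syllables:
            counts.update(zip(syl, syl[1:]))
        if not counts:
            break  # every syllable is already a single symbol
        best = max(counts, key=counts.get)
        rules.append(best)
        syllables = [merge_pair(syl, best) for syl in syllables]
    return rules, syllables
```

For example, `learn_merges("བོད་ཡིག་", 2)` would count pairs only within `བོད` and within `ཡིག`, so no rule can ever span the tsheg between them.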

(TBD: Insert example sentences.)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Bodyig-CPM

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Bodyig-CPM

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages