IlyaGusev/rnnmorph

rnnmorph

Important: please see https://github.com/natasha/slovnet#morphology-1

Morphological analyzer (POS tagger) for Russian and English languages based on neural networks and dictionary-lookup systems (pymorphy2, nltk).

Russian language, MorphoRuEval-2017 test dataset, accuracy

| Domain | Full tag | PoS tag | Full tag + lemma | Sentence, full tag | Sentence, full tag + lemma |
|---|---|---|---|---|---|
| Lenta (news) | 96.31% | 98.01% | 92.96% | 77.93% | 52.79% |
| VK (social) | 95.20% | 98.04% | 92.06% | 74.30% | 60.56% |
| JZ (lit.) | 95.87% | 98.71% | 90.45% | 73.10% | 43.15% |
| All | 95.81% | 98.26% | N/A | 74.92% | N/A |

English language, UD EWT test, accuracy

| Dataset | Full tag | PoS tag | Full tag + lemma | Sentence, full tag | Sentence, full tag + lemma |
|---|---|---|---|---|---|
| UD EWT test | 91.57% | 94.10% | 87.02% | 63.17% | 50.99% |
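
The accuracy columns above mix token-level and sentence-level scores. The sketch below shows one plausible way such numbers could be computed. It assumes that a "full tag" match requires both the PoS and the grammatical features to agree, and that a sentence counts as correct only when every token in it is correct. The `score` function and the `(pos, tag, lemma)` token layout are hypothetical and are not part of rnnmorph or of the official evaluation scripts.

```python
from typing import Dict, List, Tuple

# Assumed layout: each token is (pos, feature_tag, lemma).
Token = Tuple[str, str, str]
Sentence = List[Token]


def score(gold: List[Sentence], pred: List[Sentence]) -> Dict[str, float]:
    """Return token-level and sentence-level accuracies as fractions."""
    tokens = pos_ok = tag_ok = tag_lemma_ok = 0
    sent_tag_ok = sent_tag_lemma_ok = 0
    for gold_sent, pred_sent in zip(gold, pred):
        assert len(gold_sent) == len(pred_sent), "sentences must be aligned"
        all_tags = all_tags_lemmas = True
        for (g_pos, g_tag, g_lemma), (p_pos, p_tag, p_lemma) in zip(gold_sent, pred_sent):
            tokens += 1
            pos_match = g_pos == p_pos
            # Assumption: "full tag" means PoS plus all grammatical features.
            tag_match = pos_match and g_tag == p_tag
            tag_lemma_match = tag_match and g_lemma == p_lemma
            pos_ok += pos_match
            tag_ok += tag_match
            tag_lemma_ok += tag_lemma_match
            all_tags = all_tags and tag_match
            all_tags_lemmas = all_tags_lemmas and tag_lemma_match
        sent_tag_ok += all_tags
        sent_tag_lemma_ok += all_tags_lemmas
    n_tok = max(tokens, 1)      # avoid division by zero on empty input
    n_sent = max(len(gold), 1)
    return {
        "pos": pos_ok / n_tok,
        "full_tag": tag_ok / n_tok,
        "full_tag_lemma": tag_lemma_ok / n_tok,
        "sentence_full_tag": sent_tag_ok / n_sent,
        "sentence_full_tag_lemma": sent_tag_lemma_ok / n_sent,
    }


# Toy data: one wrong lemma in the first sentence.
gold = [
    [("NOUN", "Number=Plur", "cat"), ("VERB", "Tense=Past", "sleep")],
    [("NOUN", "Number=Sing", "dog")],
]
pred = [
    [("NOUN", "Number=Plur", "cat"), ("VERB", "Tense=Past", "slept")],
    [("NOUN", "Number=Sing", "dog")],
]
result = score(gold, pred)
print(result)
```

Under these assumptions a single lemma error lowers the token-level lemma score slightly but fails the whole sentence, which is why the sentence-level columns are much lower than the token-level ones.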

Speed and memory consumption

Speed: 200 to 600 words per second on CPU.

Memory consumption: about 500-600 MB for single-sentence predictions.
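
To check the speed figure on your own hardware, a rough throughput measurement could look like the sketch below. `words_per_second` and `dummy_predict` are hypothetical helpers written for this illustration. The stand-in tagger is there only so the snippet runs without the model installed, and the number it produces says nothing about rnnmorph itself.

```python
import time
from typing import Callable, List, Sequence


def words_per_second(
    predict: Callable[[List[str]], Sequence],
    sentences: List[List[str]],
) -> float:
    """Time one pass over the sentences and return words processed per second."""
    total_words = sum(len(sentence) for sentence in sentences)
    start = time.perf_counter()
    for sentence in sentences:
        predict(sentence)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return total_words / elapsed if elapsed > 0 else float("inf")


def dummy_predict(words: List[str]) -> List[str]:
    """Stand-in for a real tagger; it only lowercases the input words."""
    return [word.lower() for word in words]


sentences = [["The", "cat", "sat", "on", "the", "mat"]] * 1000
rate = words_per_second(dummy_predict, sentences)
print(f"{rate:.0f} words/second")
```

With rnnmorph installed, passing `predictor.predict` from the Usage section in place of `dummy_predict` would time the actual model. Model loading happens in the constructor, so it is not included in the measurement.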

Install

pip install rnnmorph

Usage

Example:

from rnnmorph.predictor import RNNMorphPredictor

# Load the Russian model.
predictor = RNNMorphPredictor(language="ru")

# Tag a pre-tokenized sentence; the result is indexed per input word.
forms = predictor.predict(["мама", "мыла", "раму"])

# Inspect the analysis of the first word; expected output is shown in comments.
print(forms[0].pos)          # NOUN
print(forms[0].tag)          # Case=Nom|Gender=Fem|Number=Sing
print(forms[0].normal_form)  # мама
print(forms[0].vector)
# [0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 1]
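
The `tag` attribute in the example is a single string of `Key=Value` pairs joined by `|`. If you need individual features, a small helper like the one below can split it into a dictionary. `parse_tag` is a hypothetical function, not part of rnnmorph. It assumes the layout shown in the example output, and treating `_` as "no features" follows the Universal Dependencies convention, which is an assumption about what the library may return.

```python
from typing import Dict


def parse_tag(tag: str) -> Dict[str, str]:
    """Split a 'Key=Value|Key=Value' tag string into a feature dictionary."""
    # Assumption: an empty string or "_" means the word has no features.
    if not tag or tag == "_":
        return {}
    features = {}
    for pair in tag.split("|"):
        key, _, value = pair.partition("=")
        features[key] = value
    return features


features = parse_tag("Case=Nom|Gender=Fem|Number=Sing")
print(features["Case"])    # Nom
print(features["Gender"])  # Fem
```

With the predictor from the example, `parse_tag(forms[0].tag)` would make it possible to filter words by case, gender, or number without string matching.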

Training

A simple model training example is available as a Colab notebook.
