ahmedcali84/Custom_NLP

Custom NLP Library

The NLP class handles file-based text processing: it reads a file, tokenizes the text, and builds a sorted bag of words. Its cleaning steps normalize contractions, hyphens, and punctuation so that variants of the same word are counted together.
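
The cleaning and bag-of-words steps described above could look roughly like the following. This is a minimal sketch, not the library's actual code: the function names (read_text, clean, tokenize, bag_of_words) and the small contraction table are assumptions made for illustration.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical contraction table; the library's real rules are not shown here.
# Order matters: specific forms are replaced before the generic "n't" suffix.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "it's": "it is",
    "n't": " not",
}


def read_text(path):
    """Read a UTF-8 text file into a single string."""
    return Path(path).read_text(encoding="utf-8")


def clean(text):
    """Lowercase, expand contractions, split hyphens, strip punctuation."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    text = text.replace("-", " ")          # "well-known" -> "well known"
    return re.sub(r"[^\w\s]", " ", text)   # remaining punctuation -> space


def tokenize(text):
    """Clean the text and split it on whitespace."""
    return clean(text).split()


def bag_of_words(tokens):
    """Return (token, count) pairs sorted alphabetically by token."""
    return sorted(Counter(tokens).items())


tokens = tokenize("It's a well-known fact: cats don't bark.")
bag = bag_of_words(tokens)
```

Here tokens holds the cleaned words in their original order, and bag holds one alphabetically sorted (token, count) pair per distinct word.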

The Lemmatize class uses a finite state transducer (FST) model to reduce words to their lemmas. It stores the transitions between characters in a trie and follows them one character at a time to determine whether a word matches a stored lemma. To improve accuracy, proper nouns are filtered out through NLP.tokenizer.
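
To illustrate the idea of following character transitions through a trie, here is a simplified sketch. It is not the Lemmatize class itself: the names TrieNode, LemmaTrie, and lemmatize, the sample word list, and the explicit proper_nouns set (standing in for the filtering done through NLP.tokenizer) are all assumptions. A full FST would emit output along its transitions; this sketch only stores the lemma at the final node of each word form.

```python
class TrieNode:
    """One state: outgoing character transitions plus an optional lemma."""

    def __init__(self):
        self.children = {}
        self.lemma = None  # set only where a known word form ends


class LemmaTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, form, lemma):
        """Add a word form, creating one transition per character."""
        node = self.root
        for ch in form:
            node = node.children.setdefault(ch, TrieNode())
        node.lemma = lemma

    def lookup(self, word):
        """Follow transitions; fall back to the word itself if no match."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return word
        return node.lemma if node.lemma is not None else word


# Hypothetical sample data, for illustration only.
trie = LemmaTrie()
for form, lemma in [("running", "run"), ("ran", "run"),
                    ("runs", "run"), ("mice", "mouse")]:
    trie.insert(form, lemma)


def lemmatize(tokens, proper_nouns=frozenset()):
    """Lemmatize tokens, leaving any listed proper nouns untouched."""
    return [t if t in proper_nouns else trie.lookup(t.lower())
            for t in tokens]
```

With this sample data, lemmatize(["Mice", "ran"]) reduces both tokens to their lemmas, while passing proper_nouns={"Mice"} leaves that token unchanged.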

Dependencies

  • numpy

Install Dependencies

pip install numpy

Run Tests

Clone the repository and run:

python run_tests.py
