The most comprehensive, schema-driven NLP pre-processing framework for Amharic.
Website · Issues · Contributing
Fidel Tools is a modular, schema-driven pre-processing and Natural Language Processing (NLP) toolkit designed specifically for Amharic and other Ethiopic script text. It provides high-performance components out of the box, including character normalization, sentence boundary tokenization, prefix-aware stopword removal, light stemming, and bidirectional transliteration.
Natural Language Processing in the Ethiopic ecosystem is a half-solved problem. Most implementations rely on hardcoded, unconfigurable logic and suffer from low accuracy. We believe developers deserve a production-grade, highly customizable solution; hence, Fidel Tools.
Fidel Tools is managed as a monorepo workspace. Check the individual package directories and their respective changelogs:
| Package | Description | Version | Changelog |
|---|---|---|---|
| @fidel-tools/core | Core processing pipeline and NLP engine | 0.1.6 | Changelog |
| @fidel-tools/lang-am | Amharic language pack & schema configurations | 0.1.6 | Changelog |
| @fidel-tools/validate-pack | CLI tool to validate & fix language packs | 0.1.6 | Changelog |
```sh
pnpm add @fidel-tools/core @fidel-tools/lang-am
```

```ts
import { Pipeline } from '@fidel-tools/core'
import amPack from '@fidel-tools/lang-am'

const nlp = new Pipeline(amPack)

// Normalize homophones, labialization, and gemination
const text = nlp.normalize("ሐኪም ኀይሉ በልቷልልል!")
console.log(text) // "ሃኪም ሃይሉ በልቱዋልል!"

// Remove stopwords using boundary rules
const cleaned = nlp.removeStopwords("ያወጣውን የተጨማሪ እሴት")
console.log(cleaned) // "ያወጣውን የ እሴት"

// Stem Amharic words
const stem = nlp.stem("ልጆቻቸውን")
console.log(stem) // "ልጅ"
```

The built-in rate limiter (apps/api/src/middleware/rateLimiter.ts) stores request counters in an in-memory JavaScript store. This is sufficient for single-instance deployments and local testing, but the state is volatile and resets whenever the server restarts. If you deploy the API across multiple instances, refactor the memory store in rateLimiter.ts to use a shared cache such as Redis so that all instances see the same counters.
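One way to prepare for that refactor is to hide the counters behind a small store interface, so the in-memory version can later be swapped for a shared one. The sketch below is illustrative only: `CounterStore`, `MemoryCounterStore`, and `isAllowed` are hypothetical names, not the contents of rateLimiter.ts, and the fixed-window strategy is an assumption about how requests are counted.

```typescript
// Hypothetical sketch: none of these names come from rateLimiter.ts.

/** Minimal contract a rate limiter needs from its backing store. */
interface CounterStore {
  /** Increment the counter for `key` and return the count in the current window. */
  increment(key: string, windowMs: number): number | Promise<number>;
}

/** Fixed-window counters held in process memory (lost on restart, not shared). */
class MemoryCounterStore implements CounterStore {
  private counters = new Map<string, { count: number; resetAt: number }>();
  private now: () => number;

  // The clock is injectable so the window logic can be exercised deterministically.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  increment(key: string, windowMs: number): number {
    const t = this.now();
    const entry = this.counters.get(key);
    // Start a new window on the first hit or once the previous window has expired.
    if (entry === undefined || t >= entry.resetAt) {
      this.counters.set(key, { count: 1, resetAt: t + windowMs });
      return 1;
    }
    entry.count += 1;
    return entry.count;
  }
}

/** Resolves to true while the client is within `limit` requests per window. */
async function isAllowed(
  store: CounterStore,
  clientId: string,
  limit: number,
  windowMs: number
): Promise<boolean> {
  // `await` accepts both the synchronous memory store and an async shared store.
  const count = await store.increment(`ratelimit:${clientId}`, windowMs);
  return count <= limit;
}
```

A Redis-backed store could implement the same interface, for example by issuing `INCR` on the key and setting a `PEXPIRE` when the returned count is 1. Because `isAllowed` depends only on the interface, the middleware would not need to change when the store does.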
Fidel Tools is free and open-source software licensed under the MIT License. You can support the project by:
- Contributing features, fixes, or new language packs. Read our Contributing Guide.
- Opening issues or submitting feature requests.
The processing logic draws on academic foundations in Ethiopic NLP:
- Girma Neshir Alemneh. “Amharic Light Stemmer”. ResearchGate. Sep 2020.
- Genet Mezemir Fikremariam. “Automatic Stemming for Amharic text: An experiment using successor variety approach”. AAU. Jan 2009.
- Tessema Mindaye Mengistu. “Design and Implementation of Amharic Search Engine”. ResearchGate. Aug 2007.
- Yitna Firdyiwek and Daniel Yaqob. “The System for Ethiopic Representation in ASCII”. ResearchGate. Jan 1997.
