High-performance TextRank keyword extraction in Rust with Python bindings.
Extract keywords and key phrases from text 10-100x faster than pure-Python implementations. rapid_textrank ships 7 core algorithm variants, an AutoRank ensemble, and stopword lists for 18 languages. Computation runs in Rust, and the Python GIL is released during extraction.
```bash
pip install rapid_textrank
```

Optional extras: `pip install "rapid_textrank[spacy]"` for spaCy tokenization and `pip install "rapid_textrank[topic]"` for gensim LDA utilities.
```python
from rapid_textrank import extract_keywords

text = """
Machine learning is a subset of artificial intelligence that enables
systems to learn and improve from experience. Deep learning, a type of
machine learning, uses neural networks with many layers.
"""

keywords = extract_keywords(text, top_n=5, language="en")
for phrase in keywords:
    print(f"{phrase.text}: {phrase.score:.4f}")
```

Output:

```text
machine learning: 0.2341
deep learning: 0.1872
artificial intelligence: 0.1654
neural networks: 0.1432
systems: 0.0891
```
If you do not want to choose a variant manually, use AutoRank. It runs the full eligible keyword ensemble for the document and returns consensus metadata alongside the ranked phrases.
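This README doesn't describe how AutoRank fuses its variants. To make "consensus metadata" concrete, here is a hypothetical sketch that fuses per-variant rankings with reciprocal rank fusion (RRF). For each phrase, it reports a fused score, a confidence (the fraction of variants that ranked it), and the supporting variants. The `consensus` function, the choice of RRF, `k=60`, and the input rankings are all illustrative assumptions, not rapid_textrank's API or output.

```python
from collections import defaultdict


def consensus(rankings: dict[str, list[str]], top_n: int = 5, k: int = 60):
    """Hypothetical ensemble fusion via reciprocal rank fusion (RRF).

    `rankings` maps a variant name to its phrases, best first. Returns
    (phrase, fused_score, confidence, supporting_variants) tuples, where
    confidence is the fraction of variants that ranked the phrase at all.
    """
    fused: dict[str, float] = defaultdict(float)
    support: dict[str, list[str]] = defaultdict(list)
    for variant, phrases in rankings.items():
        for rank, phrase in enumerate(phrases, start=1):
            fused[phrase] += 1.0 / (k + rank)  # RRF: earlier ranks contribute more
            support[phrase].append(variant)
    # Highest fused score first; ties broken alphabetically for determinism.
    ordered = sorted(fused, key=lambda p: (-fused[p], p))
    return [
        (p, fused[p], len(support[p]) / len(rankings), sorted(support[p]))
        for p in ordered[:top_n]
    ]


# Made-up per-variant rankings, used only to exercise the sketch.
rankings = {
    "BaseTextRank": ["machine learning", "deep learning", "neural networks"],
    "PositionRank": ["machine learning", "artificial intelligence", "deep learning"],
    "SingleRank": ["deep learning", "machine learning", "systems"],
}
for phrase, score, confidence, variants in consensus(rankings, top_n=3):
    print(f"{phrase}: {score:.4f} (confidence={confidence:.2f}, variants={variants})")
```

AutoRank's actual fusion rule may differ. In real code, read the returned `result.consensus` rather than recomputing it this way.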
```python
from rapid_textrank import AutoRank

extractor = AutoRank(top_n=5, language="en")
result = extractor.extract_keywords(text)

for phrase, support in zip(result.phrases, result.consensus.phrase_support):
    print(
        f"{phrase.text}: {phrase.score:.4f} "
        f"(confidence={support.confidence:.2f}, variants={support.supporting_variants})"
    )
```

Extraction latency (single document, end-to-end including tokenization):
| Document size | rapid_textrank | pytextrank + spaCy | Speedup |
|---|---|---|---|
| ~20 words | ~0.1 ms | ~5 ms | ~50x |
| ~100 words | ~0.3 ms | ~15 ms | ~50x |
| ~1,000 words | ~2 ms | ~80 ms | ~40x |
See benchmarks for methodology and full results.
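Latency figures like those above are likely to vary with hardware and document mix, so you may want to measure on your own corpus. The sketch below is a minimal, standard-library-only harness that reports the median end-to-end latency after a few untimed warm-up calls. `median_latency_ms` is a hypothetical helper, not part of rapid_textrank, and the warm-up and repeat counts are arbitrary defaults.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], warmup: int = 3, repeats: int = 50) -> float:
    """Median wall-clock latency of fn() in milliseconds (hypothetical helper)."""
    for _ in range(warmup):
        fn()  # untimed runs absorb one-off costs such as lazy initialization
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)  # the median is less sensitive to scheduler noise


# Usage with the library (requires rapid_textrank; "my_document.txt" is a placeholder):
# from rapid_textrank import extract_keywords
# doc = open("my_document.txt").read()
# print(median_latency_ms(lambda: extract_keywords(doc, top_n=10, language="en")))
```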
Use focus terms to pull results toward a specific domain. Here we steer toward security/privacy phrases in a mixed document:
```python
from rapid_textrank import BiasedTextRank

text = """
We encrypt data at rest using AES-256 and enforce TLS 1.2+ for data in transit.
Access to production is gated by MFA and short-lived credentials. Audit logs are
retained for 180 days and monitored for anomalous access patterns. Personal data
processing is limited to the declared purpose, and retention follows a documented
schedule. We support DSAR workflows and apply data minimization by default.
"""

extractor = BiasedTextRank(
    focus_terms=["privacy", "encrypt", "tls", "mfa", "audit", "retention"],
    bias_weight=8.0,
    top_n=10,
    language="en",
)
result = extractor.extract_keywords(text)

for phrase in result.phrases[:5]:
    print(f"{phrase.text}: {phrase.score:.4f}")
```

You can override focus terms per call without creating a new extractor:

```python
result = extractor.extract_keywords(text, focus_terms=["privacy", "retention", "dsar"])
```

Use AutoRank when you want the library to pick and fuse the right keyword variants for you. Use the table below when you want to select a specific algorithm yourself.
| Algorithm | Best for | Key idea |
|---|---|---|
| BaseTextRank | General-purpose keyword extraction | Standard co-occurrence graph + PageRank |
| PositionRank | Short/structured text (abstracts, news) | Biases toward early-occurring terms |
| BiasedTextRank | Domain-focused extraction | Steers PageRank toward user-supplied focus terms |
| SingleRank | Long technical documents | Weighted edges + cross-sentence co-occurrence window |
| TopicRank | Multi-topic documents needing diversity | Clusters candidates into topics, ranks topics |
| TopicalPageRank | LDA-guided extraction | Personalization vector from per-word topic weights |
| MultipartiteRank | Fine-grained topic diversity | Multipartite graph separates candidates by topic cluster |
Start with BaseTextRank if unsure. See the algorithm guide for a decision flowchart.
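BaseTextRank, PositionRank, and BiasedTextRank can all be viewed as the same weighted PageRank over a word co-occurrence graph, differing only in the teleport (personalization) vector. The sketch below illustrates that idea in pure Python. It is not rapid_textrank's Rust implementation and rests on several assumptions. The tokens are a hand-picked, pre-filtered subset of the quickstart text, the window is 2, and sentence boundaries and phrase assembly (turning word scores into phrases like "machine learning") are ignored. The helper names (`cooccurrence_graph`, `pagerank`, `position_weights`, `focus_weights`) are hypothetical. The PositionRank bias sums 1/position over each word's occurrences, following the original PositionRank formulation. The focus-term weighting (focus terms get `bias_weight`, all other words get 1.0) is an assumption about what `bias_weight` does.

```python
from collections import defaultdict


def cooccurrence_graph(tokens: list[str], window: int = 2) -> dict[str, dict[str, float]]:
    """Undirected graph: words appearing within `window` tokens of each other share an edge."""
    graph: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for i, a in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            b = tokens[j]
            if a != b:  # no self-loops
                graph[a][b] += 1.0  # repeated co-occurrence raises the edge weight
                graph[b][a] += 1.0
    return graph


def pagerank(graph, personalization=None, damping=0.85, iters=100, tol=1e-10):
    """Weighted PageRank by power iteration; `personalization` sets the teleport vector."""
    nodes = sorted(graph)
    if not nodes:
        return {}
    if personalization is None:
        teleport = {v: 1.0 / len(nodes) for v in nodes}  # uniform: plain TextRank
    else:
        total = sum(personalization.get(v, 0.0) for v in nodes)
        teleport = {v: personalization.get(v, 0.0) / total for v in nodes}
    out_weight = {v: sum(graph[v].values()) for v in nodes}
    scores = dict(teleport)
    for _ in range(iters):
        new = {
            v: (1 - damping) * teleport[v]
            + damping * sum(scores[u] * w / out_weight[u] for u, w in graph[v].items())
            for v in nodes
        }
        converged = sum(abs(new[v] - scores[v]) for v in nodes) < tol
        scores = new
        if converged:
            break
    return scores


def position_weights(tokens: list[str]) -> dict[str, float]:
    """PositionRank-style bias: sum of 1/position over a word's occurrences."""
    weights: dict[str, float] = defaultdict(float)
    for pos, tok in enumerate(tokens, start=1):
        weights[tok] += 1.0 / pos
    return dict(weights)


def focus_weights(tokens: list[str], focus_terms: list[str], bias_weight: float) -> dict[str, float]:
    """Assumed BiasedTextRank-style bias: focus terms get `bias_weight`, others 1.0."""
    focus = {t.lower() for t in focus_terms}
    return {tok: (bias_weight if tok in focus else 1.0) for tok in tokens}


# Hand-picked content words from the quickstart text (lowercased, stopwords removed).
tokens = ("machine learning subset artificial intelligence systems learn experience "
          "deep learning type machine learning neural networks layers").split()
graph = cooccurrence_graph(tokens, window=2)
base = pagerank(graph)
positional = pagerank(graph, personalization=position_weights(tokens))
biased = pagerank(graph, personalization=focus_weights(tokens, ["neural", "networks"], 8.0))
for name, scores in [("BaseTextRank", base), ("PositionRank", positional), ("BiasedTextRank", biased)]:
    top = sorted(scores, key=scores.get, reverse=True)[:3]
    print(name, [(w, round(scores[w], 3)) for w in top])
```

Swapping only the teleport vector lets one PageRank routine serve all three variants. TopicalPageRank likewise derives its teleport vector (from LDA topic weights), while SingleRank, TopicRank, and MultipartiteRank change how the graph itself is built. Those variants are not shown here.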
| Topic | Contents |
|---|---|
| Getting Started | Installation, quickstart, recipes |
| Algorithms | How TextRank works + all 7 variants |
| API Reference | extract_keywords(), extractor classes, JSON interface |
| Performance | Benchmarks and comparison with alternatives |
| Development | Contributing guide |
- pytextrank — Python TextRank implementation built on spaCy
- KeyBERT — keyword extraction using BERT embeddings
License: MIT