
xang1234/rapid-textrank


rapid_textrank

License: MIT · Python 3.9+ · Rust

High-performance TextRank keyword extraction in Rust with Python bindings.

Extract keywords and key phrases from text 10-100x faster than pure Python implementations, with 7 core algorithm variants, an AutoRank ensemble, and stopword support for 18 languages. Computation runs in Rust, and the Python GIL is released during extraction.

Install

pip install rapid_textrank

Optional extras: pip install rapid_textrank[spacy] for spaCy tokenization, pip install rapid_textrank[topic] for gensim LDA utilities.

Quick Start

from rapid_textrank import extract_keywords

text = """
Machine learning is a subset of artificial intelligence that enables
systems to learn and improve from experience. Deep learning, a type of
machine learning, uses neural networks with many layers.
"""

keywords = extract_keywords(text, top_n=5, language="en")
for phrase in keywords:
    print(f"{phrase.text}: {phrase.score:.4f}")

Output:

machine learning: 0.2341
deep learning: 0.1872
artificial intelligence: 0.1654
neural networks: 0.1432
systems: 0.0891
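The overview notes that computation runs in Rust with the GIL released during extraction, which suggests that a plain thread pool can process many documents concurrently. The sketch below is a generic, order-preserving batch helper built on that assumption. extract_many is a hypothetical name, not part of rapid_textrank, and any real speedup depends on how much of each call actually runs outside the GIL.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def extract_many(
    texts: Sequence[str],
    extract_fn: Callable[[str], T],
    max_workers: int = 4,
) -> List[T]:
    """Apply extract_fn to every text on a thread pool, keeping input order.

    Threads only overlap real work if extract_fn releases the GIL while it
    runs; a pure-Python extract_fn would effectively run one call at a time.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as `texts`.
        return list(pool.map(extract_fn, texts))
```

For example, extract_many(docs, lambda t: extract_keywords(t, top_n=5, language="en")) would return one keyword list per document, in the same order as docs.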

Recommended Default: AutoRank

If you do not want to choose a variant manually, use AutoRank. It runs the full ensemble of eligible keyword variants on the document and returns consensus metadata (per-phrase confidence and supporting variants) alongside the ranked phrases.

from rapid_textrank import AutoRank

extractor = AutoRank(top_n=5, language="en")
result = extractor.extract_keywords(text)

for phrase, support in zip(result.phrases, result.consensus.phrase_support):
    print(
        f"{phrase.text}: {phrase.score:.4f} "
        f"(confidence={support.confidence:.2f}, variants={support.supporting_variants})"
    )
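How AutoRank actually fuses its variants is not described here. As a rough mental model only, the sketch below fuses per-variant rankings with reciprocal rank fusion (a swapped-in, well-known technique) and defines confidence as the fraction of variants that returned each phrase. fuse_rankings and FusedPhrase are hypothetical names and do not mirror the library's result.consensus objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FusedPhrase:
    text: str
    score: float = 0.0
    supporting_variants: List[str] = field(default_factory=list)
    confidence: float = 0.0


def fuse_rankings(
    rankings: Dict[str, List[str]], top_n: int = 5, k: int = 60
) -> List[FusedPhrase]:
    """Fuse per-variant phrase rankings (best first) with reciprocal rank fusion.

    Hypothetical helper: confidence is the fraction of variants whose
    ranking contains the phrase, not AutoRank's documented definition.
    """
    fused: Dict[str, FusedPhrase] = {}
    for variant, phrases in rankings.items():
        # dict.fromkeys drops duplicate phrases while keeping first-seen order.
        for rank, text in enumerate(dict.fromkeys(phrases), start=1):
            entry = fused.setdefault(text, FusedPhrase(text))
            entry.score += 1.0 / (k + rank)
            entry.supporting_variants.append(variant)
    n_variants = max(len(rankings), 1)
    for entry in fused.values():
        entry.confidence = len(entry.supporting_variants) / n_variants
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(fused.values(), key=lambda p: (-p.score, p.text))[:top_n]
```

The constant k dampens the gap between rank 1 and rank 2, so a phrase that several variants rank moderately well can outrank one that a single variant ranks first.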

Performance

Extraction latency (single document, end-to-end including tokenization):

Document size   rapid_textrank   pytextrank + spaCy   Speedup
~20 words       ~0.1 ms          ~5 ms                ~50x
~100 words      ~0.3 ms          ~15 ms               ~50x
~1,000 words    ~2 ms            ~80 ms               ~40x

See benchmarks for methodology and full results.
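The figures above are end-to-end, single-document latencies. To get comparable numbers on your own data, the sketch below times a zero-argument callable and reports the median in milliseconds, plus the ratio used in the Speedup column (for example, 80 ms / 2 ms = 40x). Both functions are hypothetical helpers, not the project's benchmark harness.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], repeats: int = 50, warmup: int = 5) -> float:
    """Median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # keep one-time setup costs out of the measured samples
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms
```

For example, median_latency_ms(lambda: extract_keywords(text, top_n=5, language="en")) measures the Quick Start call. The median is less sensitive than the mean to occasional slow runs.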

Steering Extraction with BiasedTextRank

Use focus terms to pull results toward a specific domain. Here we steer toward security/privacy phrases in a mixed document:

from rapid_textrank import BiasedTextRank

text = """
We encrypt data at rest using AES-256 and enforce TLS 1.2+ for data in transit.
Access to production is gated by MFA and short-lived credentials. Audit logs are
retained for 180 days and monitored for anomalous access patterns. Personal data
processing is limited to the declared purpose, and retention follows a documented
schedule. We support DSAR workflows and apply data minimization by default.
"""

extractor = BiasedTextRank(
    focus_terms=["privacy", "encrypt", "tls", "mfa", "audit", "retention"],
    bias_weight=8.0,
    top_n=10,
    language="en",
)

result = extractor.extract_keywords(text)
for phrase in result.phrases[:5]:
    print(f"{phrase.text}: {phrase.score:.4f}")

You can override focus terms per call without creating a new extractor:

result = extractor.extract_keywords(text, focus_terms=["privacy", "retention", "dsar"])
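A common way to "steer PageRank toward focus terms" is personalized PageRank, where the random walk teleports preferentially to focus nodes. The sketch below assumes that formulation, with each focus node receiving bias_weight times the teleport mass of an ordinary node; rapid_textrank's exact use of bias_weight may differ. biased_pagerank and its (word, word, weight) edge-list input are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def biased_pagerank(
    edges: Iterable[Tuple[str, str, float]],
    focus_terms: Iterable[str],
    bias_weight: float = 8.0,
    damping: float = 0.85,
    iterations: int = 100,
) -> Dict[str, float]:
    """Personalized PageRank whose teleport vector favors focus terms.

    Assumption: a focus node gets `bias_weight` times the teleport mass of
    an ordinary node; this is not the library's documented weighting.
    """
    # Build an undirected weighted co-occurrence graph (self-loops skipped).
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for u, v, w in edges:
        if u == v:
            continue
        graph[u][v] = graph[u].get(v, 0.0) + w
        graph[v][u] = graph[v].get(u, 0.0) + w
    nodes = sorted(graph)
    if not nodes:
        return {}

    # Teleport (personalization) vector, normalized to sum to 1.
    focus = set(focus_terms)
    raw = {n: (bias_weight if n in focus else 1.0) for n in nodes}
    total = sum(raw.values())
    teleport = {n: raw[n] / total for n in nodes}

    out_weight = {n: sum(graph[n].values()) for n in nodes}
    scores = dict(teleport)
    for _ in range(iterations):
        # Each node keeps (1 - d) of its teleport mass plus d times the
        # weight-proportional share it receives from each neighbor.
        scores = {
            n: (1 - damping) * teleport[n]
            + damping * sum(scores[m] * w / out_weight[m] for m, w in graph[n].items())
            for n in nodes
        }
    return scores
```

With an empty focus list the teleport vector is uniform and this reduces to ordinary weighted PageRank, which is why raising bias_weight shifts score toward the focus terms and their neighbors.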

Choosing an Algorithm

Use AutoRank when you want the library to pick and fuse the right keyword variants for you. Use the table below when you want to select a specific algorithm yourself.

Algorithm          Best for                                  Key idea
BaseTextRank       General-purpose keyword extraction        Standard co-occurrence graph + PageRank
PositionRank       Short/structured text (abstracts, news)   Biases toward early-occurring terms
BiasedTextRank     Domain-focused extraction                 Steers PageRank toward user-supplied focus terms
SingleRank         Long technical documents                  Weighted edges + cross-sentence co-occurrence window
TopicRank          Multi-topic documents needing diversity   Clusters candidates into topics, ranks topics
TopicalPageRank    LDA-guided extraction                     Personalization vector from per-word topic weights
MultipartiteRank   Fine-grained topic diversity              Multipartite graph separates candidates by topic cluster

Start with BaseTextRank if unsure. See the algorithm guide for a decision flowchart.
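To make "co-occurrence graph + PageRank" concrete, here is a minimal pure-Python sketch of the core TextRank steps, with a PositionRank-style bias (sum of 1/position over a word's occurrences) fed in as a personalization vector. It is an illustration, not rapid_textrank's Rust implementation. It assumes the input is already tokenized, lowercased, and stopword-filtered, and it stops at word scores instead of merging adjacent high-scoring words into phrases. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def cooccurrence_graph(tokens: List[str], window: int = 2) -> Dict[str, Dict[str, float]]:
    """Link tokens that occur within `window` positions of each other.

    Edge weights count co-occurrences (SingleRank-style); classic TextRank
    uses unweighted edges.
    """
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left != right:
                graph[left][right] = graph[left].get(right, 0.0) + 1.0
                graph[right][left] = graph[right].get(left, 0.0) + 1.0
    return dict(graph)


def pagerank(
    graph: Dict[str, Dict[str, float]],
    personalization: Optional[Dict[str, float]] = None,
    damping: float = 0.85,
    iterations: int = 100,
) -> Dict[str, float]:
    """Weighted PageRank by power iteration; uniform teleport unless personalized."""
    nodes = sorted(graph)
    if not nodes:
        return {}
    if personalization is None:
        teleport = {n: 1.0 / len(nodes) for n in nodes}
    else:
        total = sum(personalization.get(n, 0.0) for n in nodes)
        teleport = {n: personalization.get(n, 0.0) / total for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    scores = dict(teleport)
    for _ in range(iterations):
        scores = {
            n: (1 - damping) * teleport[n]
            + damping * sum(scores[m] * w / out_weight[m] for m, w in graph[n].items())
            for n in nodes
        }
    return scores


def position_weights(tokens: List[str]) -> Dict[str, float]:
    """PositionRank-style bias: sum of 1/position over a token's occurrences."""
    weights: Dict[str, float] = defaultdict(float)
    for position, token in enumerate(tokens, start=1):
        weights[token] += 1.0 / position
    return dict(weights)
```

Calling pagerank(graph) gives BaseTextRank-style word scores; calling pagerank(graph, personalization=position_weights(tokens)) instead favors words that appear early, which is the intuition behind recommending PositionRank for abstracts and news.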

Learn More

Topic             Covers
Getting Started   Installation, quickstart, recipes
Algorithms        How TextRank works + all 7 variants
API Reference     extract_keywords(), extractor classes, JSON interface
Performance       Benchmarks and comparison with alternatives
Development       Contributing guide

See Also

  • pytextrank — Python TextRank implementation built on spaCy
  • KeyBERT — keyword extraction using BERT embeddings

License

MIT

About

⚡ High-performance TextRank in Rust with Python bindings. Extract keywords 10-100x faster than pure Python. Supports TextRank, PositionRank & BiasedTextRank with 18 languages.
