pyAutoSummarizer — An Extractive and Abstractive Summarization Library Powered with Artificial Intelligence.
PEREIRA, V., DE LIMA PORTO, R.C., FIGUEIRA, L.A.A., FERREIRA, R.A.C.A. (2026). Unveiling pyAutoSummarizer: An Extractive and Abstractive Summarization Library Powered with Artificial Intelligence. In: DA HORA, H., PORTER, A.L., CHIAVETTA, D., ZHANG, Y. (eds) Technology Mining. Springer, Cham. https://doi.org/10.1007/978-3-032-10849-4_2
pyAutoSummarizer is a Python library for text summarization, covering both extractive and abstractive approaches, and providing a comprehensive suite of evaluation metrics — from classic n-gram overlap to modern semantic and faithfulness measures.
Extractive: identifies and returns the most important sentences from the original text:
| Method | Description |
|---|---|
| TextRank | Graph-based ranking using sentence embeddings and cosine similarity |
| LexRank | Graph-based ranking using TF-IDF cosine similarity |
| LSA | Latent Semantic Analysis via SVD on embeddings or TF-IDF matrix |
| KL-Sum | Selects sentences that minimise KL-divergence from the full document distribution |
Abstractive: generates new text that captures the meaning of the source:
| Method | Description |
|---|---|
| BART | facebook/bart-large-cnn model (deep learning) |
| T5 | t5-base model (deep learning) |
| PEGASUS | google/pegasus-xsum model fine-tuned for abstractive summarization |
| chatGPT | OpenAI gpt-4o-mini (or any chat model) via the OpenAI API |
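The graph-based extractive methods (TextRank, LexRank) share one idea: sentences are nodes, pairwise similarity gives edge weights, and a PageRank-style iteration scores each sentence by how strongly the rest of the text "votes" for it. The block below is a minimal standard-library sketch of the LexRank-style variant with TF-IDF cosine similarity. The helper names (`tokenize`, `tfidf_vectors`, `rank_sentences`), the smoothed IDF formula, and the damping value are illustrative assumptions, not pyAutoSummarizer's implementation.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_vectors(sentences):
    """Build one sparse TF-IDF vector (a dict) per sentence."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    # Document frequency: iterating a Counter yields each term once per sentence.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF (an assumption) so terms present everywhere keep some weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_sentences(sentences, damping=0.85, iterations=100):
    """Return one PageRank-style score per sentence (scores sum to 1)."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = tfidf_vectors(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    # Row-normalise; a sentence with no neighbours spreads its vote uniformly.
    rows = []
    for row in sim:
        total = sum(row)
        rows.append([v / total for v in row] if total else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - damping) / n
                  + damping * sum(scores[j] * rows[j][i] for j in range(n))
                  for i in range(n)]
    return scores
```

Sorting sentence indices by score in descending order and keeping the first few gives an extractive summary. Swapping the TF-IDF vectors for sentence embeddings yields the TextRank-style variant described in the table.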
The library provides a flexible pre-processing pipeline:
- Lowercasing, accent removal, special character removal, number removal
- Custom word removal
- Stopword removal across 27 languages: Arabic, Bengali, Bulgarian, Chinese, Czech, English, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Italian, Japanese, Korean, Marathi, Persian, Polish, Portuguese-br, Romanian, Russian, Slovak, Spanish, Swedish, Thai, and Ukrainian
- Sentence segmentation by punctuation, word count, or character count
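To make the pipeline steps above concrete, here is a small standard-library sketch of punctuation-based sentence splitting followed by lowercasing, accent removal, number and special-character removal, and stopword or custom-word filtering. The function names, the tiny English stopword set, and the order of the steps are assumptions made for illustration; the library's own pipeline may differ.

```python
import re
import unicodedata

# Tiny illustrative stopword set (an assumption, not the library's list).
STOPWORDS_EN = {"the", "a", "an", "is", "of", "and", "in", "to"}


def strip_accents(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def clean_sentence(sentence, stopwords=STOPWORDS_EN, custom_words=()):
    """Lowercase, strip accents, drop digits and symbols, filter words."""
    s = strip_accents(sentence.lower())
    s = re.sub(r"\d+", " ", s)        # number removal
    s = re.sub(r"[^a-z\s]", " ", s)   # special character removal
    blocked = set(stopwords) | {w.lower() for w in custom_words}
    return " ".join(w for w in s.split() if w not in blocked)
```

Splitting happens before cleaning in this sketch because removing punctuation first would erase the sentence boundaries.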
| Metric | Method | Returns |
|---|---|---|
| ROUGE-N | rouge_N(generated, reference, n=1) | F1, Precision, Recall |
| ROUGE-L | rouge_L(generated, reference) | F1, Precision, Recall |
| ROUGE-S | rouge_S(generated, reference, skip_distance=4) | F1, Precision, Recall |
| BLEU | bleu(generated, reference, n=4) | Score |
| METEOR | meteor(generated, reference) | Score |
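As a rough illustration of what the n-gram overlap metrics measure, the sketch below computes ROUGE-N from clipped n-gram counts and ROUGE-L from the longest common subsequence, returning values in the same F1, Precision, Recall order as the table. The names `rouge_n_sketch` and `rouge_l_sketch` and the whitespace tokenisation are assumptions for this example; the library's `rouge_N` and `rouge_L` may tokenise and pre-process differently.

```python
from collections import Counter


def _f1_precision_recall(overlap, n_generated, n_reference):
    """Turn an overlap count into (F1, precision, recall)."""
    p = overlap / n_generated if n_generated else 0.0
    r = overlap / n_reference if n_reference else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return f1, p, r


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_sketch(generated, reference, n=1):
    gen = ngrams(generated.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection clips each n-gram at its smaller count.
    overlap = sum((gen & ref).values())
    return _f1_precision_recall(overlap, sum(gen.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_sketch(generated, reference):
    gen, ref = generated.lower().split(), reference.lower().split()
    return _f1_precision_recall(lcs_length(gen, ref), len(gen), len(ref))
```

For "the cat sat on the mat" against "the cat lay on the mat", five of six unigrams overlap, so unigram precision and recall are both 5/6, while only three of five bigrams match.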
| Metric | Method | Returns | Notes |
|---|---|---|---|
| BERTScore | bert_score(generated, reference, model_type='roberta-large') | F1, Precision, Recall | Requires pip install bert-score. Captures paraphrasing that ROUGE misses by comparing contextualised token embeddings. |
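The embedding comparison behind BERTScore can be pictured as greedy matching: each generated token is paired with its most similar reference token (precision), each reference token with its most similar generated token (recall), and the two averages are combined into F1. The sketch below shows only that matching step on hand-made toy vectors. A real run takes its vectors from a contextual model such as roberta-large through the bert-score package, and the function name here is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def greedy_match_scores(gen_vecs, ref_vecs):
    """Return (F1, precision, recall) from best-match cosine similarities."""
    # Precision: how well each generated token is covered by the reference.
    precision = sum(max(cosine(g, r) for r in ref_vecs)
                    for g in gen_vecs) / len(gen_vecs)
    # Recall: how well each reference token is covered by the generation.
    recall = sum(max(cosine(r, g) for g in gen_vecs)
                 for r in ref_vecs) / len(ref_vecs)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return f1, precision, recall


# Toy 2-d "token embeddings" (assumed values, for illustration only).
generated_vecs = [[1.0, 0.0], [0.0, 1.0]]
reference_vecs = [[1.0, 0.0], [1.0, 1.0]]
f1, precision, recall = greedy_match_scores(generated_vecs, reference_vecs)
```

Because similarity is computed between vectors rather than surface strings, a paraphrase whose tokens land near the reference tokens in embedding space still scores well, which is the behaviour the Notes column describes.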
These metrics check whether the summary is factually consistent with the source document, detecting hallucinations that lexical metrics cannot see.
| Metric | Method | Returns | Notes |
|---|---|---|---|
| SummaC | summa_c(generated, nli_model='cross-encoder/nli-deberta-v3-small') | Score ∈ [0, 1] | Self-contained NLI-based faithfulness scorer using HuggingFace transformers. No extra install needed. |
| AlignScore | align_score(generated, model='AlignScore-base') | Score ∈ [0, 1] | Requires pip install pyAutoSummarizer[faithfulness] and python -m spacy download en_core_web_sm. Based on Zha et al., ACL 2023. |
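One common way to turn sentence-level NLI judgements into a single faithfulness score is to take, for each summary sentence, the best support found among the source sentences and then average over the summary. The sketch below shows that aggregation only. The `toy_entailment` function is a hypothetical token-overlap stand-in for a real NLI model, and the library's `summa_c` may aggregate differently.

```python
import re


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?'."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def toy_entailment(premise, hypothesis):
    """Hypothetical stand-in for an NLI model.

    Returns the fraction of hypothesis tokens that also occur in the
    premise; a real scorer would return an entailment probability.
    """
    premise_tokens = set(re.findall(r"\w+", premise.lower()))
    hyp_tokens = re.findall(r"\w+", hypothesis.lower())
    if not hyp_tokens:
        return 0.0
    return sum(t in premise_tokens for t in hyp_tokens) / len(hyp_tokens)


def faithfulness_sketch(source, summary, entail=toy_entailment):
    """Average, over summary sentences, of the best source-sentence support."""
    doc_sents = split_sentences(source)
    sum_sents = split_sentences(summary)
    if not doc_sents or not sum_sents:
        return 0.0
    per_sentence = [max(entail(d, s) for d in doc_sents) for s in sum_sents]
    return sum(per_sentence) / len(per_sentence)
```

A summary sentence that no source sentence supports pulls the average down. No reference summary is involved: the source text is the only ground truth.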
| Metric | Method | Returns | Notes |
|---|---|---|---|
| G-Eval | g_eval(generated, api_key, model='gpt-4o-mini', dimensions=['coherence','consistency','fluency','relevance']) | dict {dimension: int 1–5} | Uses an OpenAI chat model to score the summary across four quality dimensions. Based on Liu et al., 2023. Requires an OpenAI API key. |
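An LLM-as-judge scorer of this kind needs two pieces around the model call: a prompt that asks for one rating per dimension, and a parser that turns the reply into the `{dimension: int}` dictionary shown in the table. The sketch below covers only those two pieces and makes no API call. The prompt wording, the JSON reply format, and the function names are assumptions for illustration and do not reproduce the library's `g_eval` or the original G-Eval procedure.

```python
import json

DIMENSIONS = ["coherence", "consistency", "fluency", "relevance"]


def build_prompt(source, summary, dimensions=DIMENSIONS):
    """Assemble a rating prompt (hypothetical wording)."""
    wanted = ", ".join(dimensions)
    return (
        "Rate the summary of the source text on these dimensions: "
        f"{wanted}.\n"
        "Use an integer from 1 (worst) to 5 (best) for each one and "
        "reply with a single JSON object keyed by dimension name.\n\n"
        f"Source:\n{source}\n\nSummary:\n{summary}\n"
    )


def parse_scores(reply, dimensions=DIMENSIONS):
    """Parse a JSON reply into {dimension: int}, clamped to 1..5.

    Raises KeyError if the model omitted a requested dimension.
    """
    data = json.loads(reply)
    return {dim: min(5, max(1, int(data[dim]))) for dim in dimensions}
```

Clamping to the 1 to 5 range is a defensive choice: chat models occasionally return values outside the requested scale.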
pip install pyAutoSummarizer
pip install "pyAutoSummarizer[faithfulness]"
python -m spacy download en_core_web_sm
Requirements: Python ≥ 3.9
from pyAutoSummarizer.base import psr
text = """
Your long text goes here. It can be multiple paragraphs.
The library will pre-process it, split it into sentences,
and summarize it using any of the available methods.
"""
# Initialise — pre-processes the text
s = psr.summarization(text, stop_words=['en'], lowercase=True,
                      rmv_accents=True, rmv_special_chars=True, rmv_numbers=True)
# --- Extractive summarization ---
rank = s.summ_text_rank() # TextRank
summary = s.show_summary(rank, n=3) # top-3 sentences
print(summary)
# --- Abstractive summarization ---
summary = s.summ_abst_chatgpt(api_key='YOUR_KEY', model='gpt-4o-mini')
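# --- Evaluation (classic) ---
# `reference` is a human-written summary of the same text to compare against
reference = "Your reference summary goes here."
f1, p, r = s.rouge_N(summary, reference, n=1)
bleu_s = s.bleu(summary, reference)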
# --- Evaluation (semantic) ---
f1, p, r = s.bert_score(summary, reference)
# --- Evaluation (faithfulness — no reference needed) ---
faith_sc = s.summa_c(summary) # SummaC (built-in NLI)
align_sc = s.align_score(summary) # AlignScore (requires [faithfulness] extra)
# --- Evaluation (LLM-as-judge) ---
scores = s.g_eval(summary, api_key='YOUR_KEY')
# {'coherence': 4, 'consistency': 5, 'fluency': 5, 'relevance': 4}
Extractive Summarization
Abstractive Summarization
- chatGPT — requires an OpenAI API key
- PEGASUS
Related library:
- pyBibX: A Bibliometric and Scientometric Python Library Powered with Artificial Intelligence Tools