pyAutoSummarizer

pyAutoSummarizer — An Extractive and Abstractive Summarization Library Powered with Artificial Intelligence.

Citation

PEREIRA, V., DE LIMA PORTO, R.C., FIGUEIRA, L.A.A., FERREIRA, R.A.C.A. (2026). Unveiling pyAutoSummarizer: An Extractive and Abstractive Summarization Library Powered with Artificial Intelligence. In: DA HORA, H., PORTER, A.L., CHIAVETTA, D., ZHANG, Y. (eds) Technology Mining. Springer, Cham. https://doi.org/10.1007/978-3-032-10849-4_2

Introduction

pyAutoSummarizer is a Python library for text summarization, covering both extractive and abstractive approaches, and providing a comprehensive suite of evaluation metrics — from classic n-gram overlap to modern semantic and faithfulness measures.

Summarization Methods

Extractive methods identify and return the most important sentences from the original text:

Method	Description
TextRank	Graph-based ranking using sentence embeddings and cosine similarity
LexRank	Graph-based ranking using TF-IDF cosine similarity
LSA	Latent Semantic Analysis via SVD on embeddings or TF-IDF matrix
KL-Sum	Selects sentences that minimise KL-divergence from the full document distribution

Abstractive methods generate new text that captures the meaning of the source:

Method	Description
BART	`facebook/bart-large-cnn` abstractive model (deep learning)
T5	`t5-base` abstractive model (deep learning)
PEGASUS	`google/pegasus-xsum` model fine-tuned for abstractive summarization
chatGPT	OpenAI `gpt-4o-mini` (or any chat model) via the OpenAI API
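
To make the graph-based extractive methods listed above more concrete, here is a minimal standard-library sketch of LexRank-style ranking: sentences become TF-IDF vectors, cosine similarity defines the graph edges, and a PageRank-style power iteration scores each sentence. It is an illustration only, not the library's implementation; the function names (`tokenize`, `tfidf_vectors`, `rank_sentences`), the smoothed IDF formula, and the damping value are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_vectors(sentences):
    """Build one sparse TF-IDF vector (a dict) per sentence."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences by power iteration over the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = tfidf_vectors(sentences)
    sim = [
        [cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for j in range(n):
            # Each neighbour i passes on its score in proportion to edge weight.
            incoming = sum(
                scores[i] * sim[i][j] / row_sums[i]
                for i in range(n)
                if row_sums[i] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores
```

Sorting the sentence indices by descending score and keeping the first few gives an extractive summary. A sentence that shares no vocabulary with the rest of the text receives only the baseline `(1 - damping) / n` score, which is why off-topic sentences tend to fall to the bottom of the ranking.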

Text Pre-processing

The library provides a flexible pre-processing pipeline:

Lowercasing, accent removal, special character removal, number removal
Custom word removal
Stopword removal across 27 languages: Arabic, Bengali, Bulgarian, Chinese, Czech, English, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Italian, Japanese, Korean, Marathi, Persian, Polish, Portuguese-br, Romanian, Russian, Slovak, Spanish, Swedish, Thai, and Ukrainian
Sentence segmentation by punctuation, word count, or character count
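
The steps listed above can be sketched with the standard library alone. This is a rough illustration of what such a pipeline does, not the library's code: the helper names, the tiny English stopword set, and the punctuation-based splitting rule are all assumptions made for the example.

```python
import re
import unicodedata

# Tiny illustrative stopword set; a real pipeline would use a full per-language list.
STOPWORDS_EN = {"the", "a", "an", "is", "of", "and", "in", "to"}


def strip_accents(text):
    """Decompose characters and drop combining marks (for example 'é' becomes 'e')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def split_sentences(text):
    """Segment by punctuation: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def clean_sentence(sentence, stopwords=STOPWORDS_EN):
    """Lowercase, remove accents, numbers, special characters and stopwords."""
    sentence = strip_accents(sentence.lower())
    sentence = re.sub(r"\d+", " ", sentence)       # number removal
    sentence = re.sub(r"[^a-z\s]", " ", sentence)  # special characters (ASCII-only rule)
    tokens = [t for t in sentence.split() if t not in stopwords]
    return " ".join(tokens)
```

Note that the `[^a-z\s]` rule only suits Latin-script text after accent removal; languages such as Arabic, Chinese, or Thai would need a different character filter and tokenizer.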

Evaluation Metrics

Classic Metrics (reference-based, lexical)

Metric	Method	Returns
ROUGE-N	`rouge_N(generated, reference, n=1)`	F1, Precision, Recall
ROUGE-L	`rouge_L(generated, reference)`	F1, Precision, Recall
ROUGE-S	`rouge_S(generated, reference, skip_distance=4)`	F1, Precision, Recall
BLEU	`bleu(generated, reference, n=4)`	Score
METEOR	`meteor(generated, reference)`	Score
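
As a reminder of what the n-gram overlap metrics in the table measure, here is a small self-contained sketch of ROUGE-N with clipped n-gram counts. It returns values in the same F1, Precision, Recall order as the table, but it is a simplified illustration (whitespace tokenization, no stemming), and the lowercase function name `rouge_n` is chosen to avoid suggesting it is the library's `rouge_N`.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(generated, reference, n=1):
    """Return (F1, precision, recall) based on clipped n-gram overlap."""
    gen = ngrams(generated.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((gen & ref).values())
    precision = overlap / sum(gen.values()) if gen else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return f1, precision, recall
```

For "the cat sat on the mat" against "the cat lay on the mat", five of six unigrams overlap, so precision and recall are both 5/6; with `n=2` only three of five bigrams match.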

Semantic Metric (reference-based)

Metric	Method	Returns	Notes
BERTScore	`bert_score(generated, reference, model_type='roberta-large')`	F1, Precision, Recall	Requires `pip install bert-score`. Captures paraphrasing that ROUGE misses by comparing contextualised token embeddings.
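
The idea of comparing token embeddings can be illustrated without downloading a model. The sketch below applies the greedy matching step behind BERTScore to hand-made toy vectors: each generated token is matched to its most similar reference token (precision) and vice versa (recall). The toy vectors and the function name `greedy_match_score` are assumptions; real BERTScore uses contextual embeddings from a transformer and optionally IDF weighting and baseline rescaling, which are omitted here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_match_score(gen_vecs, ref_vecs):
    """Return (F1, precision, recall) from greedy cosine matching of token vectors."""
    if not gen_vecs or not ref_vecs:
        return 0.0, 0.0, 0.0
    # Precision: how well each generated token is covered by the reference.
    precision = sum(max(cosine(g, r) for r in ref_vecs) for g in gen_vecs) / len(gen_vecs)
    # Recall: how well each reference token is covered by the generated text.
    recall = sum(max(cosine(r, g) for g in gen_vecs) for r in ref_vecs) / len(ref_vecs)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return f1, precision, recall
```

Because matching happens in embedding space, two different words with nearby vectors still contribute a high similarity, which is how this family of metrics rewards paraphrases that exact n-gram overlap would score as zero.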

Faithfulness / Factual Consistency Metrics (source-based, no reference needed)

These metrics check whether the summary is factually consistent with the source document, detecting hallucinations that lexical metrics cannot see.

Metric	Method	Returns	Notes
SummaC	`summa_c(generated, nli_model='cross-encoder/nli-deberta-v3-small')`	Score ∈ [0, 1]	Self-contained NLI-based faithfulness scorer using HuggingFace transformers. No extra install needed.
AlignScore	`align_score(generated, model='AlignScore-base')`	Score ∈ [0, 1]	Requires `pip install pyAutoSummarizer[faithfulness]` and `python -m spacy download en_core_web_sm`. Based on Zha et al., ACL 2023.
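
To show how a source-based score can be assembled from sentence-level judgements, here is a sketch of one simple aggregation in the spirit of zero-shot NLI scorers: every summary sentence takes its best entailment score against any source sentence, and the results are averaged. The `toy_entailment` function is a deliberately crude word-overlap stand-in for a real NLI model, and both function names are hypothetical; this is not the aggregation the library is confirmed to use.

```python
def toy_entailment(premise, hypothesis):
    """Stand-in for an NLI model: fraction of hypothesis words present in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().split()
    if not hypothesis_words:
        return 0.0
    return sum(w in premise_words for w in hypothesis_words) / len(hypothesis_words)


def faithfulness_score(source_sentences, summary_sentences, entail=toy_entailment):
    """Average, over summary sentences, of the best support found in the source."""
    if not source_sentences or not summary_sentences:
        return 0.0
    per_sentence = [
        max(entail(src, hyp) for src in source_sentences)
        for hyp in summary_sentences
    ]
    return sum(per_sentence) / len(per_sentence)
```

With an `entail` function that returns probabilities, the result stays in [0, 1]. A summary sentence that no source sentence supports pulls the average down, which is the behaviour one wants from a hallucination check.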

LLM-as-Judge Metric

Metric	Method	Returns	Notes
G-Eval	`g_eval(generated, api_key, model='gpt-4o-mini', dimensions=['coherence','consistency','fluency','relevance'])`	`dict {dimension: int 1–5}`	Uses an OpenAI chat model to score the summary across four quality dimensions. Based on Liu et al., 2023. Requires an OpenAI API key.
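
An LLM-as-judge metric needs two pieces around the model call: a prompt that asks for one score per dimension, and a parser that turns the reply into a `{dimension: int}` dictionary. The sketch below shows both without calling any API. The prompt wording, the `dimension: score` reply format, and the function names are assumptions for illustration; the library's actual prompt and parsing may differ.

```python
import re

DIMENSIONS = ["coherence", "consistency", "fluency", "relevance"]


def build_prompt(source, summary, dimensions=DIMENSIONS):
    """Assemble a plain-text judging prompt (hypothetical wording)."""
    lines = [
        "Rate the summary of the source text on each dimension from 1 to 5.",
        "Answer with one line per dimension in the form 'dimension: score'.",
        "",
        "Source:",
        source,
        "",
        "Summary:",
        summary,
        "",
        "Dimensions: " + ", ".join(dimensions),
    ]
    return "\n".join(lines)


def parse_scores(reply, dimensions=DIMENSIONS):
    """Extract integer scores 1 to 5 per dimension; skip dimensions that are missing."""
    scores = {}
    for dim in dimensions:
        pattern = rf"{re.escape(dim)}\s*[:=]\s*([1-5])\b"
        match = re.search(pattern, reply, flags=re.IGNORECASE)
        if match:
            scores[dim] = int(match.group(1))
    return scores
```

Dimensions the model omits, or scores outside the 1 to 5 range, are simply left out of the result, so a caller can detect an incomplete reply by comparing the returned keys with the requested dimensions.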

Installation

Core install (extractive/abstractive methods + lexical/BERTScore metrics)

pip install pyAutoSummarizer

With faithfulness metrics (AlignScore)

pip install "pyAutoSummarizer[faithfulness]"
python -m spacy download en_core_web_sm

Requirements: Python ≥ 3.9
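
After installing, it can be handy to check which optional dependencies are importable before calling the metrics that need them. The following is a small sketch using only the standard library; the mapping from feature to module name is an assumption based on the packages named in this README (`bert-score` imports as `bert_score`, and the spaCy model download implies `spacy`).

```python
import importlib.util

# Feature label -> importable module name (assumed from the install notes).
OPTIONAL_MODULES = {
    "BERTScore": "bert_score",
    "AlignScore (spaCy)": "spacy",
}


def optional_features():
    """Report which optional modules can be found, without importing them."""
    return {
        feature: importlib.util.find_spec(module) is not None
        for feature, module in OPTIONAL_MODULES.items()
    }


if __name__ == "__main__":
    for feature, available in optional_features().items():
        print(f"{feature}: {'available' if available else 'missing'}")
```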

Quick Start

from pyAutoSummarizer.base import psr

text = """
Your long text goes here. It can be multiple paragraphs.
The library will pre-process it, split it into sentences,
and summarize it using any of the available methods.
"""

# Initialise — pre-processes the text
s = psr.summarization(text, stop_words=['en'], lowercase=True,
                      rmv_accents=True, rmv_special_chars=True, rmv_numbers=True)

# --- Extractive summarization ---
rank    = s.summ_text_rank()          # TextRank
summary = s.show_summary(rank, n=3)   # top-3 sentences
print(summary)

# --- Abstractive summarization ---
summary = s.summ_abst_chatgpt(api_key='YOUR_KEY', model='gpt-4o-mini')

# --- Evaluation (classic) ---
reference = "A human-written reference summary goes here."  # placeholder gold-standard summary
f1, p, r = s.rouge_N(summary, reference, n=1)
bleu_s   = s.bleu(summary, reference)

# --- Evaluation (semantic) ---
f1, p, r = s.bert_score(summary, reference)

# --- Evaluation (faithfulness — no reference needed) ---
faith_sc = s.summa_c(summary)    # SummaC (built-in NLI)
align_sc = s.align_score(summary) # AlignScore (requires [faithfulness] extra)

# --- Evaluation (LLM-as-judge) ---
scores   = s.g_eval(summary, api_key='YOUR_KEY')
# {'coherence': 4, 'consistency': 5, 'fluency': 5, 'relevance': 4}

Colab Demos

Extractive Summarization

Abstractive Summarization

chatGPT — requires an OpenAI API key
PEGASUS

Related Projects

pyBibX — A Bibliometric and Scientometric Python Library Powered with Artificial Intelligence Tools
