Entropy-enhanced ranking pipeline for mobile app store reviews

Research implementation for prioritizing mobile app store reviews using a weighted ranking function, Shannon Entropy, NDCG evaluation, and algorithmic bias analysis.

This repository contains the experimental pipeline developed for my thesis, "Optimizando parametros en procesamiento de comentarios de usuarios de aplicaciones moviles", and the related paper "Shannon Entropy is better Feature than Category and Sentiment in User Feedback Processing".

Overview

Mobile app stores contain large volumes of user reviews that can help developers identify bugs, feature requests, and relevant user concerns. However, these reviews are usually noisy, unstructured, and hard to prioritize manually.

This pipeline ranks app reviews according to their relevance for developers. It compares a standard weighted-function ranking based on traditional features with an entropy-enhanced ranking where Shannon Entropy replaces review length as a ranking feature.
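As a rough illustration of the entropy feature, the sketch below computes the Shannon Entropy of a review over its whitespace-separated tokens. The function name `shannon_entropy`, the lowercasing, and the token-level granularity are assumptions made for this example; the feature extraction scripts in this repository may tokenize or normalize differently.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy, in bits per token, of the tokens in `text`.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    # H = sum over distinct tokens of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    print(shannon_entropy("bad bad bad"))
    print(shannon_entropy("app crashes when I open the camera"))
```

A review that repeats a single word has zero entropy, while a review made of distinct words has the maximum entropy for its length, which is the intuition behind using entropy as a proxy for information density.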

What this pipeline does

Prepares app review datasets for ranking experiments
Adds Shannon Entropy as a feature extracted from review text
Generates weighted ranking functions using exhaustive search
Evaluates ranking quality with NDCG
Compares standard features against entropy-enhanced features
Detects country-based algorithmic bias using AIF360
Applies bias mitigation with Reweighing
Generates experiment outputs and statistics

Research Context

The pipeline evaluates whether Shannon Entropy can improve user feedback prioritization in requirements engineering.

The experiments compare two feature sets:

Standard ranking:
Category + Sentiment + Score + Review Length

Entropy-enhanced ranking:
Category + Sentiment + Score + Shannon Entropy

The best entropy-enhanced configuration reported in the paper achieved a higher NDCG than the standard ranking, suggesting that entropy can capture useful information density in reviews while reducing dependency on heavier feature extraction steps.
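A weighted ranking function over the entropy-enhanced feature set could look like the sketch below. It assumes each feature has already been encoded as a number on a comparable scale; the names `FEATURES`, `weighted_score`, and `rank_reviews`, and the dictionary layout of a review, are hypothetical and are not taken from the pipeline code.

```python
from typing import Dict, List

# Feature names follow the entropy-enhanced set listed above.
FEATURES = ("category", "sentiment", "score", "entropy")


def weighted_score(review: Dict[str, float], weights: Dict[str, float]) -> float:
    """Linear combination of the review's feature values."""
    return sum(weights[name] * review[name] for name in FEATURES)


def rank_reviews(
    reviews: List[Dict[str, float]], weights: Dict[str, float]
) -> List[Dict[str, float]]:
    """Return the reviews sorted from highest to lowest weighted score."""
    return sorted(reviews, key=lambda r: weighted_score(r, weights), reverse=True)


if __name__ == "__main__":
    weights = {"category": 0.1, "sentiment": 0.2, "score": 0.2, "entropy": 0.5}
    reviews = [
        {"category": 1.0, "sentiment": 0.0, "score": 0.2, "entropy": 0.1},
        {"category": 0.0, "sentiment": 0.5, "score": 0.4, "entropy": 0.9},
    ]
    for review in rank_reviews(reviews, weights):
        print(review)
```

For the standard ranking, the same sketch applies with a review-length value in place of the entropy value.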

Pipeline Stages

1. Preprocessing
2. Feature Extraction
3. Ranking
4. Quality Testing
5. Bias Testing
6. Statistics

Experiments

The pipeline can run four experiment modes:

1 - Weighted-function ranking with standard features
2 - Weighted-function ranking replacing Review Length with Entropy
3 - Entropy-enhanced ranking with bias evaluation
4 - Entropy-enhanced ranking with bias mitigation
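The mitigation in experiment 4 relies on Reweighing, which assigns each instance a weight so that the protected attribute and the outcome label become statistically independent in the weighted data. The sketch below computes those weights from scratch with the standard formula w(g, y) = P(g) P(y) / P(g, y); it does not call AIF360, and the function name and the choice of a binary "relevant" label are assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def reweighing_weights(
    groups: Sequence[Hashable], labels: Sequence[Hashable]
) -> List[float]:
    """Per-instance weights w(g, y) = N_g * N_y / (N * N_gy).

    `groups` holds the protected attribute (here, the review's country)
    and `labels` holds the outcome for each instance.
    """
    n = len(groups)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


if __name__ == "__main__":
    countries = ["US", "US", "US", "IN"]
    relevant = [1, 1, 0, 0]  # hypothetical binary outcome
    print(reweighing_weights(countries, relevant))
```

Group and label combinations that are under-represented relative to independence receive weights above 1, and over-represented combinations receive weights below 1.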

Supported decimal precision values:

1.0, 0.1, 0.01, 0.001

Note: a finer precision (a smaller value, such as 0.001) significantly increases the number of weight combinations that the exhaustive search has to evaluate.
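To give a sense of how quickly the search space grows, the sketch below counts and enumerates weight combinations under one simple assumption: each of the four weights independently takes every value from 0 to 1 in steps of the chosen precision. The actual weight generation script may constrain the weights differently (for example, requiring them to sum to 1), so these counts are an upper-bound illustration, and `grid_size` and `weight_grid` are hypothetical names.

```python
from itertools import product
from typing import Iterator, Tuple


def grid_size(precision: float, n_features: int = 4) -> int:
    """Number of weight vectors when each weight is in {0, p, 2p, ..., 1}."""
    levels = round(1 / precision) + 1
    return levels ** n_features


def weight_grid(precision: float, n_features: int = 4) -> Iterator[Tuple[float, ...]]:
    """Lazily yield every weight vector on the grid.

    Integer steps are used internally to avoid accumulating float error.
    """
    steps = round(1 / precision)
    for combo in product(range(steps + 1), repeat=n_features):
        yield tuple(k / steps for k in combo)


if __name__ == "__main__":
    for precision in (1.0, 0.1, 0.01, 0.001):
        print(precision, grid_size(precision))
```

Under this assumption the grid grows with the fourth power of the number of levels, so each extra decimal place multiplies the work by roughly 10,000.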

Dataset

The experiments use Apple App Store reviews from eight countries:

Australia
Canada
Hong Kong
India
Singapore
South Africa
United Kingdom
United States

The annotated subset contains manually ranked reviews used as ground truth for NDCG evaluation.
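For readers unfamiliar with the metric, the sketch below shows one common formulation of NDCG: the relevance values of the reviews, listed in the order produced by a ranking, are discounted by position and normalized by the score of the ideal ordering. The linear gain, the absence of a cutoff k, and the function names are assumptions; the quality testing stage may use a different variant.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain with linear gain and a log2 discount."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(ranked_relevances: Sequence[float]) -> float:
    """DCG of the given order divided by the DCG of the ideal order."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Ground-truth relevance of each review, in the order a ranking returned them.
    print(ndcg([3, 2, 1]))
    print(ndcg([1, 2, 3]))
```

A ranking that reproduces the ground-truth order scores 1.0, and any ordering that places less relevant reviews ahead of more relevant ones scores lower.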

Requirements

This implementation was tested with:

Debian 11
Python 3.9.7
R
RStudio

Python dependencies are listed in:

requirements.txt

Install them with:

pip install -r requirements.txt

The statistics stage uses R scripts, so R/RStudio must be available in the environment.

To avoid indentation errors when editing scripts, configure your text editor with:

1 tab = 4 spaces

Running the Pipeline

From the repository root, change into the pipeline directory and start the command-line script:

cd pipeline
bash cli.sh

The script asks for:

experiment number
decimal precision

Experiment outputs are saved under:

pipeline/0-Data/3_experimentes_results/

Repository Structure

pipeline/
  0-Data/              datasets, intermediate data, experiment results
  1-Preprocessing/     data preparation scripts
  2-FeatureExtraction/ entropy extraction
  4-Ranking/           weighted ranking function and weight generation
  5-QualityTesting/    NDCG evaluation
  6-BiasTesting/       bias detection and mitigation
  7-Statistics/        R scripts and plots

Paper

Andres Rojas Paredes, Brenda Mareco. "Shannon Entropy is better Feature than Category and Sentiment in User Feedback Processing".
arXiv:2409.12012

Read the paper on arXiv
