Research implementation for prioritizing mobile app store reviews using a weighted ranking function, Shannon Entropy, NDCG evaluation, and algorithmic bias analysis.
This repository contains the experimental pipeline developed for my thesis, "Optimizando parametros en procesamiento de comentarios de usuarios de aplicaciones moviles", and the related paper "Shannon Entropy is better Feature than Category and Sentiment in User Feedback Processing".
Mobile app stores contain large volumes of user reviews that can help developers identify bugs, feature requests, and relevant user concerns. However, these reviews are usually noisy, unstructured, and hard to prioritize manually.
This pipeline ranks app reviews according to their relevance for developers. It compares a standard weighted-function ranking based on traditional features with an entropy-enhanced ranking where Shannon Entropy replaces review length as a ranking feature.
- Prepares app review datasets for ranking experiments
- Adds Shannon Entropy as a feature extracted from review text
- Generates weighted ranking functions using exhaustive search
- Evaluates ranking quality with NDCG
- Compares standard features against entropy-enhanced features
- Detects country-based algorithmic bias using AIF360
- Applies bias mitigation with Reweighing
- Generates experiment outputs and statistics
The pipeline evaluates whether Shannon Entropy can improve user feedback prioritization in requirements engineering.
The experiments compare two feature sets:
Standard ranking:
Category + Sentiment + Score + Review Length
Entropy-enhanced ranking:
Category + Sentiment + Score + Shannon Entropy
The best entropy-enhanced configuration reported in the paper achieved a higher NDCG than the standard ranking, suggesting that entropy can capture useful information density in reviews while reducing dependency on heavier feature extraction steps.
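As an illustration of the entropy feature, here is a minimal sketch of Shannon Entropy computed over the characters of a review. The function name `shannon_entropy` and the choice of characters as symbols are assumptions made for this example; the pipeline's feature extraction stage may tokenize differently (for instance, by words).

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per symbol.

    Assumption: each character is one symbol. A word-level variant
    would pass a list of tokens to Counter instead.
    """
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # H = sum over symbols of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# A repetitive review carries less information per symbol than a varied one.
low = shannon_entropy("good good good good")
high = shannon_entropy("App crashes when I upload a photo on iOS 17")
```

Unlike raw review length, this value stays low for long but repetitive text, which is the intuition behind swapping one feature for the other.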
1. Preprocessing
2. Feature Extraction
3. Ranking
4. Quality Testing
5. Bias Testing
6. Statistics
The pipeline can run four experiment modes:
1 - Weighted-function ranking with standard features
2 - Weighted-function ranking replacing Review Length with Entropy
3 - Entropy-enhanced ranking with bias evaluation
4 - Entropy-enhanced ranking with bias mitigation
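Mode 4 relies on Reweighing for bias mitigation. The sketch below shows the idea behind that technique in plain Python rather than through the AIF360 API: each (group, label) pair gets the weight P(group) * P(label) / P(group, label), so that group and label look statistically independent after weighting. The function name, the group labels, and the meaning of the binary label (1 = review ranked as relevant) are assumptions made for this example, not the pipeline's actual code.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return a dict mapping (group, label) to its instance weight.

    weight(g, y) = P(g) * P(y) / P(g, y), estimated from counts.
    Only (group, label) pairs that occur in the data receive a weight.
    """
    n = len(groups)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    pair_counts = Counter(zip(groups, labels))
    return {
        (g, y): (group_counts[g] * label_counts[y]) / (n * count)
        for (g, y), count in pair_counts.items()
    }


# Hypothetical data: group "A" receives the favorable label more often than "B".
groups = ["A", "A", "A", "B", "B", "B"]
labels = [1, 1, 0, 1, 0, 0]
weights = reweighing_weights(groups, labels)
```

In this toy data, over-represented pairs such as ("A", 1) are weighted down and under-represented pairs such as ("B", 1) are weighted up.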
Supported decimal precision values:
1.0, 0.1, 0.01, 0.001
Note: a finer precision (a smaller step value) sharply increases the number of weight combinations to evaluate.
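To see why the precision matters, consider a minimal sketch of exhaustive weight generation. It assumes four features whose weights each range independently over [0, 1] in steps equal to the precision; the actual pipeline may constrain the weights differently (for example, requiring them to sum to 1). All function names here are hypothetical.

```python
from itertools import product


def grid_size(step: float, n_features: int = 4) -> int:
    """Number of weight combinations for a given step (assumed grid on [0, 1])."""
    return (round(1 / step) + 1) ** n_features


def weight_grid(step: float, n_features: int = 4):
    """Lazily yield every weight tuple on the grid."""
    levels = round(1 / step)
    # Build values from integers to avoid accumulating float error.
    values = [i / levels for i in range(levels + 1)]
    return product(values, repeat=n_features)


def score(features, weights):
    """Weighted ranking function: linear combination of feature values."""
    return sum(w * f for w, f in zip(weights, features))


def rank_reviews(reviews, weights):
    """Sort (review_id, features) pairs by descending score."""
    return sorted(reviews, key=lambda r: score(r[1], weights), reverse=True)


sizes = {step: grid_size(step) for step in (1.0, 0.1, 0.01, 0.001)}
```

Under these assumptions the grid grows from 16 combinations at precision 1.0 to 14,641 at 0.1, about 104 million at 0.01, and about 10^12 at 0.001, so the finest settings are far more expensive to search.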
The experiments use Apple App Store reviews from eight countries:
Australia
Canada
Hong Kong
India
Singapore
South Africa
United Kingdom
United States
The annotated subset contains manually ranked reviews used as ground truth for NDCG evaluation.
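For reference, NDCG compares the order produced by a ranking function against the ideal order implied by the ground-truth relevance. The sketch below uses the linear-gain form of DCG; whether the pipeline uses this form or the exponential-gain variant is not stated here, so treat it as an assumption. The function names are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and log2 position discount."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(relevances):
    """Normalize DCG by the DCG of the ideal (descending) ordering.

    `relevances` lists the ground-truth relevance of each review,
    in the order the ranking function returned them.
    """
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


perfect = ndcg([3, 2, 1])   # ranking matches the ground truth
reversed_ = ndcg([1, 2, 3])  # most relevant review ranked last
```

A ranking that matches the annotated order scores exactly 1.0, and any ranking that pushes relevant reviews down the list scores lower.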
This implementation was tested with:
Debian 11
Python 3.9.7
R
RStudio
Python dependencies are listed in:
requirements.txt
Install them with:
pip install -r requirements.txt
The statistics stage uses R scripts, so R/RStudio must be available in the environment.
To avoid indentation errors when editing scripts, configure your text editor with:
1 tab = 4 spaces
From the pipeline directory:
cd pipeline
bash cli.sh
The script asks for:
experiment number
decimal precision
Experiment outputs are saved under:
pipeline/0-Data/3_experimentes_results/
pipeline/
0-Data/ datasets, intermediate data, experiment results
1-Preprocessing/ data preparation scripts
2-FeatureExtraction/ entropy extraction
4-Ranking/ weighted ranking function and weight generation
5-QualityTesting/ NDCG evaluation
6-BiasTesting/ bias detection and mitigation
7-Statistics/ R scripts and plots
Andres Rojas Paredes, Brenda Mareco
Shannon Entropy is better Feature than Category and Sentiment in User Feedback Processing
arXiv:2409.12012