A professional data cleaning and preprocessing application built with Streamlit, Pandas, and NumPy.
The tool helps users quickly analyze datasets, identify common data quality issues, apply automated cleaning techniques, and download the cleaned data for further analysis.
Features:

- Upload CSV files
- Dataset preview
- Dataset profiling dashboard
- Missing value detection
- Duplicate row detection
- Outlier detection using the Z-score method (NumPy)
- Data quality scoring
- Smart rule-based cleaning suggestions
- Automatic data cleaning
- Cleaned dataset preview
- Download cleaned dataset as CSV
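The profiling, missing-value, duplicate, and quality-scoring features listed above could be approximated with pandas along these lines. This is a minimal sketch, not the application's own code: the function name `profile_dataframe` is hypothetical, and so is the scoring formula, since this README does not say how the quality score is computed.

```python
import numpy as np
import pandas as pd


def profile_dataframe(df: pd.DataFrame) -> dict:
    """Summarize missing values and duplicate rows in a DataFrame."""
    total_cells = df.size
    missing_per_column = df.isna().sum()          # NaN/None count per column
    missing_cells = int(missing_per_column.sum())
    duplicate_rows = int(df.duplicated().sum())   # repeats after the first occurrence

    # Hypothetical quality score (illustrative only): percentage of
    # non-missing cells minus the percentage of duplicate rows, floored at 0.
    completeness = 100.0 * (1 - missing_cells / total_cells) if total_cells else 100.0
    duplicate_pct = 100.0 * duplicate_rows / len(df) if len(df) else 0.0
    score = max(0.0, completeness - duplicate_pct)

    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing_per_column": {col: int(n) for col, n in missing_per_column.items()},
        "missing_cells": missing_cells,
        "duplicate_rows": duplicate_rows,
        "quality_score": round(score, 1),
    }


sample = pd.DataFrame({
    "age": [25, np.nan, 25, 40],
    "city": ["Paris", "Lyon", "Paris", None],
})
print(profile_dataframe(sample))
```

Both `None` and `NaN` count as missing under `isna()`, and `duplicated()` marks only the second and later copies of a row, so a row that appears twice contributes one duplicate.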
Tech stack:

- Python
- NumPy
- Pandas
- Streamlit
The Auto Clean feature performs:
- Removal of duplicate rows
- Filling numeric missing values using median
- Filling categorical missing values using mode
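A minimal pandas sketch of these three steps follows. It assumes that every non-numeric column is treated as categorical and that ties in the mode are broken by taking the first value returned by `Series.mode()`, which returns its results in sorted order. The function name `auto_clean` is hypothetical, not taken from the app.

```python
import numpy as np
import pandas as pd


def auto_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, then impute numeric NaNs with the median
    and all other columns with the mode."""
    cleaned = df.drop_duplicates().reset_index(drop=True)

    numeric_cols = cleaned.select_dtypes(include="number").columns
    for col in numeric_cols:
        # median() skips NaN; an all-NaN column stays unchanged
        cleaned[col] = cleaned[col].fillna(cleaned[col].median())

    # Assumption: every non-numeric column is treated as categorical.
    for col in cleaned.columns.difference(numeric_cols):
        modes = cleaned[col].mode()      # sorted; empty if the column is all missing
        if not modes.empty:
            cleaned[col] = cleaned[col].fillna(modes.iloc[0])

    return cleaned


raw = pd.DataFrame({
    "age": [25, np.nan, 25, 40, 31],
    "city": ["Paris", "Lyon", "Paris", None, "Paris"],
})
cleaned = auto_clean(raw)
csv_text = cleaned.to_csv(index=False)  # CSV text suitable for a download
print(cleaned)
```

Removing duplicates first means the median and mode are computed on unique rows only, so repeated records do not bias the fill values. The input frame is left unmodified because `drop_duplicates()` returns a new DataFrame.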
Outliers are detected using the Z-score method, implemented entirely in NumPy with no SciPy dependency.
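A Z-score measures how many standard deviations a value lies from the mean: z = (x - mean) / std. The following NumPy-only sketch flags values whose absolute Z-score exceeds a threshold. The threshold of 3.0, the use of the population standard deviation (`ddof=0`), and the choice to ignore NaNs are assumptions; this README does not state which settings the app uses. The function name `zscore_outliers` is hypothetical.

```python
import numpy as np


def zscore_outliers(values, threshold: float = 3.0) -> np.ndarray:
    """Return a boolean mask that is True where |z| > threshold.

    z = (x - mean) / std, using the population standard deviation
    (ddof=0). NaNs are ignored when computing the statistics and are
    never flagged.
    """
    x = np.asarray(values, dtype=float)
    mask = np.zeros(x.shape, dtype=bool)
    valid = ~np.isnan(x)
    if valid.sum() < 2:
        return mask                      # not enough data to measure spread
    mean = x[valid].mean()
    std = x[valid].std()                 # ddof=0 by default
    if std == 0:
        return mask                      # constant column: nothing to flag
    mask[valid] = np.abs((x[valid] - mean) / std) > threshold
    return mask


readings = [10, 11, 12, 11, 10, 12, 11, 10, 11, 12,
            10, 11, 12, 11, 10, 12, 11, 10, 11, 100]
print(np.flatnonzero(zscore_outliers(readings)))
```

One caveat with this design: with `ddof=0`, the largest |z| any single point can reach in a sample of n values is (n - 1) / sqrt(n). A threshold of 3 can therefore never flag anything when n ≤ 10, so very small columns may need a lower threshold or a different method.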
Install dependencies:

    pip install -r requirements.txt

Start the application:

    streamlit run app.py