Before using this tool, ensure you have:
- An AWS account with permissions to use S3, SNS, Lambda, and CloudWatch Logs
- An S3 bucket set up
- An SNS topic configured for notifications
- A Python 3.8+ environment
Create the following folder structure in your S3 bucket: s3://///
your-bucket/
├── config/
│   └── config.json
├── data/
│   ├── reference/
│   │   └── reference_dataset.csv
│   ├── predictions/
│   └── current_dataset.csv
└── results/
    └── feature_analysis.csv
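
The layout above can be written down as a handful of object keys, which makes it easier to check that every file is where the tool expects it. This is a minimal sketch: the key paths follow the tree (assuming `current_dataset.csv` sits directly under `data/`), while the constant and function names are illustrative and not part of the tool.

```python
# Object keys taken from the folder structure above. The constant names are
# hypothetical; only the paths come from the documented layout.
CONFIG_KEY = "config/config.json"
REFERENCE_KEY = "data/reference/reference_dataset.csv"
CURRENT_KEY = "data/current_dataset.csv"
PREDICTIONS_PREFIX = "data/predictions/"
RESULTS_KEY = "results/feature_analysis.csv"


def s3_uri(bucket: str, key: str) -> str:
    """Build an s3:// URI from a bucket name and an object key."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def expected_inputs(bucket: str) -> dict:
    """Map a short label to the URI where each input file is expected."""
    return {
        "config": s3_uri(bucket, CONFIG_KEY),
        "reference": s3_uri(bucket, REFERENCE_KEY),
        "current": s3_uri(bucket, CURRENT_KEY),
    }
```

Calling `expected_inputs("your-bucket")` gives the three URIs to upload before the first run.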
The tool automatically:
- Loads reference and current datasets
- Performs drift detection tests
- Calculates feature importance
- Generates detailed reports
- Sends email notifications
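
As an illustration of the first step, loading both datasets from S3 might look like the sketch below. It is not the tool's actual loader: the function names are hypothetical, the keys assume the folder structure described earlier, and `s3_client` is assumed to behave like `boto3.client("s3")`.

```python
import csv
import io


def load_csv_from_s3(s3_client, bucket: str, key: str) -> list:
    """Fetch a CSV object and return its rows as a list of dicts.

    `s3_client` is expected to expose get_object(Bucket=..., Key=...),
    as the boto3 S3 client does.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    text = response["Body"].read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def load_datasets(s3_client, bucket: str) -> tuple:
    """Load the reference and current datasets from their assumed keys."""
    reference = load_csv_from_s3(
        s3_client, bucket, "data/reference/reference_dataset.csv"
    )
    current = load_csv_from_s3(s3_client, bucket, "data/current_dataset.csv")
    return reference, current
```

With boto3 installed, `load_datasets(boto3.client("s3"), "your-bucket")` would return the two row lists. All values arrive as strings, so numeric columns still need converting before any drift test runs.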
In config.json, you can assign any of the following drift tests to individual columns:
- 'ks' (Kolmogorov-Smirnov test)
- 'chisquare' (Chi-square test)
- 'wasserstein' (Wasserstein distance)
- 'jensen_shannon' (Jensen-Shannon divergence)
- 'psi' (Population Stability Index)
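
To give a feel for what two of these tests measure, here is a dependency-free sketch of the Population Stability Index and the two-sample Kolmogorov-Smirnov statistic. It is an illustration only, not the tool's implementation: the equal-width binning, the `eps` floor, and the `DRIFT_TESTS` mapping are assumptions, and the KS function returns just the statistic without a p-value.

```python
import math
from bisect import bisect_right


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index with equal-width bins over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the range
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    ref_sorted, cur_sorted = sorted(reference), sorted(current)
    n, m = len(ref_sorted), len(cur_sorted)
    gap = 0.0
    for x in set(ref_sorted) | set(cur_sorted):
        cdf_ref = bisect_right(ref_sorted, x) / n
        cdf_cur = bisect_right(cur_sorted, x) / m
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


# Hypothetical lookup from config test names to functions.
DRIFT_TESTS = {"psi": psi, "ks": ks_statistic}
```

Both functions return 0 for identical samples and grow as the distributions move apart. The KS statistic is bounded by 1, while PSI has no fixed upper bound.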
The tool generates two types of output:
A feature_analysis.csv file containing:
- Feature name
- Timestamp
- Feature type
- Month
- Feature importance method
- Feature importance score
- Drift test used
- Drift test score
- Drift test threshold
- Drift test p-value
- Drift detected flag
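
The column list above maps naturally onto one CSV row per feature. The sketch below shows one way such a row could be assembled and serialized. The header spellings, the `YYYY-MM` month format, and the rule that drift is flagged when the score exceeds the threshold are all assumptions, not the tool's confirmed behavior.

```python
import csv
import io
from datetime import datetime, timezone

# Header names mirror the documented columns; the exact spelling is assumed.
FIELDNAMES = [
    "feature_name", "timestamp", "feature_type", "month",
    "importance_method", "importance_score",
    "drift_test", "drift_score", "drift_threshold", "drift_p_value",
    "drift_detected",
]


def build_row(feature, feature_type, importance_method, importance_score,
              drift_test, drift_score, threshold, p_value=None, now=None):
    """Assemble one output row for a single feature."""
    now = now or datetime.now(timezone.utc)
    return {
        "feature_name": feature,
        "timestamp": now.isoformat(),
        "feature_type": feature_type,
        "month": now.strftime("%Y-%m"),
        "importance_method": importance_method,
        "importance_score": importance_score,
        "drift_test": drift_test,
        "drift_score": drift_score,
        "drift_threshold": threshold,
        "drift_p_value": p_value,  # written as an empty cell when None
        # Assumed rule: a score above the threshold counts as drift.
        "drift_detected": drift_score > threshold,
    }


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For tests that report a p-value instead of a distance, the comparison would typically be inverted (drift when the p-value falls below the threshold), so the flag logic would need to depend on the test type.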
An email notification with formatted ASCII tables showing:
- Overall summary statistics
- List of drifted columns with scores
- List of non-drifted columns
- Feature importance rankings
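
For readers curious how such a report could be produced, the following sketch renders a bordered plain-text table and hands it to SNS. The table style and both function names are hypothetical. `sns_client` is assumed to behave like `boto3.client("sns")`, whose `publish` call accepts `TopicArn`, `Subject`, and `Message`.

```python
def ascii_table(headers, rows):
    """Render rows as a plain-text table with +---+ borders."""
    cells = [[str(h) for h in headers]]
    cells += [[str(c) for c in row] for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(r[i]) for r in cells) for i in range(len(headers))]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def fmt(row):
        padded = (f" {c.ljust(w)} " for c, w in zip(row, widths))
        return "|" + "|".join(padded) + "|"

    lines = [border, fmt(cells[0]), border]
    lines += [fmt(r) for r in cells[1:]]
    lines.append(border)
    return "\n".join(lines)


def publish_report(sns_client, topic_arn, subject, body):
    """Send the report text through SNS.

    SNS requires subjects shorter than 100 characters, so the subject is
    truncated defensively.
    """
    return sns_client.publish(
        TopicArn=topic_arn, Subject=subject[:99], Message=body
    )
```

A call such as `ascii_table(["Column", "Score"], [["age", 0.31]])` yields a three-border table that stays aligned in any fixed-width email client.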
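Keep the following limitations in mind:
- Processing must finish within the configured Lambda timeout, which AWS caps at 15 minutes
- Dataset size is bounded by the memory configured for the Lambda function
- S3 bucket permissions must be properly configured
- SNS topic must have appropriate access policies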
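
One common way to live with the Lambda timeout is to check the remaining time before each unit of work and stop cleanly rather than be killed mid-run. The sketch below illustrates that pattern and is not taken from the tool. `process_columns`, `check_column`, and the 10-second safety margin are hypothetical, while `get_remaining_time_in_millis()` is the method the Lambda Python runtime provides on the context object.

```python
def process_columns(columns, check_column, context, safety_margin_ms=10_000):
    """Run check_column on each column, stopping early when time runs short.

    Returns the results gathered so far and the list of columns that were
    skipped, so the caller can report a partial run instead of timing out.
    """
    results, skipped = {}, []
    for i, column in enumerate(columns):
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            skipped = list(columns[i:])  # everything not yet processed
            break
        results[column] = check_column(column)
    return results, skipped
```

Inside a handler this would be called as `process_columns(column_names, run_drift_test, context)`, where `run_drift_test` is whatever per-column function the pipeline uses.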
If you encounter issues:
- Check CloudWatch logs for detailed error messages
- Verify S3 bucket permissions
- Ensure SNS topic ARN is correct
- Validate config.json format
- Check that the data types in the CSV files match the configuration
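
Validating config.json locally before uploading it can rule out one of the failure modes above. The sketch below assumes a hypothetical layout of the form `{"columns": {"<name>": {"test": "<test>"}}}`, since the real schema is not documented here. Only the set of supported test names comes from this document.

```python
import json

# Test names listed in this document.
SUPPORTED_TESTS = {"ks", "chisquare", "wasserstein", "jensen_shannon", "psi"}


def validate_config(text):
    """Return a list of problems found in a config.json string (empty if none).

    Assumes the hypothetical layout {"columns": {name: {"test": test_name}}}.
    """
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"]

    columns = config.get("columns") if isinstance(config, dict) else None
    if not isinstance(columns, dict):
        return ['missing or malformed "columns" object']

    problems = []
    for name, settings in columns.items():
        test = settings.get("test") if isinstance(settings, dict) else None
        if test not in SUPPORTED_TESTS:
            problems.append(f"column {name!r}: unsupported test {test!r}")
    return problems
```

An empty list means the file parsed and every column names a supported test. Anything else is a message worth fixing before looking further into CloudWatch logs.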