Analysis of banking transactions using PySpark to detect suspicious high-value activity.
- Python 3.x
- PySpark (Apache Spark)
- Pandas
- Jupyter Notebook
- 1,000 synthetic banking transactions
- 50 unique accounts
- Date Range: January 2024 – December 2024
- Load Transaction Dataset
- Detect High-Value Transactions (Threshold: 40,000)
- Group by Account
- Calculate Total Balance
- Suspicious Transactions: 194
- Completed Suspicious Transactions: 67
- Suspicious Accounts: 27
- Highest Balance Account: ACC0017 (878,654.32)
- suspicious_transactions.csv
- account_summary.csv
- Install dependencies: pip install -r requirements.txt
- Launch Jupyter Notebook: jupyter notebook
- Open banking_transaction_analysis.ipynb
- Run all cells