This project analyzes how short-term events (concerts, sports games, etc.) affect crime patterns in San Francisco.
It combines event data with police incident reports and applies geospatial and temporal filtering to identify patterns around events.
Key idea: compare observed crime near events against realistic baselines (same time windows without events).
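The baseline comparison can be sketched in pandas. This is a minimal illustration, not the notebook's implementation. It assumes the incidents have already been restricted to the event's radius. It takes the baseline as the mean count in the same clock window on the same weekday in the preceding weeks, and it skips any week whose window overlaps another event. The function name `observed_vs_baseline` and its parameters are hypothetical.

```python
import pandas as pd


def observed_vs_baseline(incident_times, event_start, hours=3.0, n_weeks=4,
                         other_events=()):
    """Compare incidents in an event's +/- `hours` window with a same-window baseline.

    Hypothetical helper: `incident_times` are assumed to be timestamps of incidents
    already restricted to the event's radius. The baseline is the mean count in the
    same clock window on the same weekday in each of the previous `n_weeks` weeks,
    skipping weeks whose window overlaps an entry in `other_events`
    (start times of other events near the same location).
    """
    times = pd.to_datetime(pd.Series(incident_times))
    window = pd.Timedelta(hours=hours)
    event_start = pd.Timestamp(event_start)
    others = [pd.Timestamp(t) for t in other_events]

    def count_in(center):
        # Count incidents within [center - window, center + window].
        mask = (times >= center - window) & (times <= center + window)
        return int(mask.sum())

    observed = count_in(event_start)

    baseline_counts = []
    for k in range(1, n_weeks + 1):
        center = event_start - pd.Timedelta(weeks=k)
        # Two windows of half-width `window` overlap when their centers are <= 2 * window apart.
        if any(abs(center - t) <= 2 * window for t in others):
            continue
        baseline_counts.append(count_in(center))

    baseline = (sum(baseline_counts) / len(baseline_counts)
                if baseline_counts else float("nan"))
    # NaN comparisons are False, so an empty or zero baseline yields a NaN ratio.
    ratio = observed / baseline if baseline > 0 else float("nan")
    return {"observed": observed, "baseline": baseline, "ratio": ratio}
```

Because the analysis is observational, a ratio above 1 only says the event window saw more incidents than the baseline windows, not that the event caused them.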
The main output is an interactive dashboard for exploring event-driven crime patterns. It allows users to:
- Select a sample of events using filters
- Define spatial (radius) and temporal (± hours) windows
- Compare observed crime levels against a baseline
- Visualize trends across crime categories and time
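The spatial (radius) and temporal (± hours) windows can be combined into a single boolean mask. The sketch below uses the haversine great-circle distance. The column names `lat`, `lon`, and `datetime`, the function names, and the default radius and window are hypothetical; the raw SF incident file calls these columns `Latitude`, `Longitude`, and `Incident Datetime`, so they would need renaming first.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters; accepts scalars or NumPy arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def incidents_near_event(incidents, event_lat, event_lon, event_time,
                         radius_m=500.0, hours=3.0):
    """Return incidents within `radius_m` meters and +/- `hours` of the event (hypothetical helper)."""
    t = pd.Timestamp(event_time)
    window = pd.Timedelta(hours=hours)
    times = pd.to_datetime(incidents["datetime"])
    dist = haversine_m(incidents["lat"].to_numpy(), incidents["lon"].to_numpy(),
                       event_lat, event_lon)
    # Keep rows that satisfy both the spatial and the temporal condition.
    mask = ((dist <= radius_m)
            & (times >= t - window).to_numpy()
            & (times <= t + window).to_numpy())
    return incidents.loc[mask].assign(dist_m=dist[mask])
```

A geopandas buffer in a projected CRS would work equally well; haversine keeps the sketch dependent on NumPy and pandas only.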
Run:
02_MAIN_CrimeEDA.ipynb
That’s it.
The notebook expects a Data/ folder in the same directory as the notebook, containing:
- sf_events_2024_filtered.csv
- Police_Department_Incident_Reports__2018_to_Present_20250928_reduced.csv
These reduced datasets are already included and are sufficient for all analysis.
- Uses preprocessed datasets
- Fast and memory-efficient
- Fully functional interactive analysis
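A quick check that the expected files are in place before the analysis runs might look like the following. The file names come from this README; the helper names `missing_data_files` and `load_data` are hypothetical, and the notebook may load the files differently.

```python
from pathlib import Path

import pandas as pd

# File names as listed in this README; the helper functions themselves are hypothetical.
REQUIRED_FILES = (
    "sf_events_2024_filtered.csv",
    "Police_Department_Incident_Reports__2018_to_Present_20250928_reduced.csv",
)


def missing_data_files(data_dir="Data"):
    """Return the required file names that are not present in `data_dir`."""
    base = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


def load_data(data_dir="Data"):
    """Load (events, incidents) DataFrames, failing early with a clear message."""
    missing = missing_data_files(data_dir)
    if missing:
        raise FileNotFoundError(
            f"Missing from {Path(data_dir).resolve()}: {', '.join(missing)}"
        )
    base = Path(data_dir)
    events = pd.read_csv(base / REQUIRED_FILES[0])
    incidents = pd.read_csv(base / REQUIRED_FILES[1])
    return events, incidents
```

Calling `events, incidents = load_data()` from the notebook's directory either returns both tables or names exactly which file is missing.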
01_OPTIONAL_EventScraper.ipynb # Optional: scrape and clean event data
02_MAIN_CrimeEDA.ipynb # Main analysis and interactive tool
Data/
├── sf_events_2024_filtered.csv
├── Police_Department_Incident_Reports__2018_to_Present_20250928_reduced.csv
Screenshots/
├── A1.png
├── A2.png
├── B1.png
README.md
- Install required libraries:
pip install pandas numpy matplotlib seaborn geopandas plotly shapely scipy ipywidgets
- Make sure the notebooks and the Data/ folder are in the same directory
- Run:
02_MAIN_CrimeEDA.ipynb
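To confirm the libraries above are importable from the notebook's kernel, a small check built on the standard library's `importlib.util.find_spec` could be run in a first cell. The helper name is hypothetical; the check relies on each pip package name matching its import name, which holds for every package in the install command above.

```python
import importlib.util

# Packages from the pip command above; for each of them the import name matches the pip name.
REQUIRED_PACKAGES = ("pandas", "numpy", "matplotlib", "seaborn", "geopandas",
                     "plotly", "shapely", "scipy", "ipywidgets")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the top-level packages that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, `print(missing_packages() or "All required packages found")` lists anything that still needs `pip install`.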
⸻
Run:
01_OPTIONAL_EventScraper.ipynb
Only if you want to:
• Scrape events again
• Reprocess datasets
• Work from raw data
Otherwise, skip this.
⸻
The full crime dataset is not included due to size.
You can download it from the official SF Open Data portal: https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-2018-to-Present/wg3w-h783/about_data
After downloading, rename the file to: Police_Department_Incident_Reports__2018_to_Present_20250928.csv
Place it inside the Data/ folder (in the same directory as the notebooks).
This is only needed if you want to regenerate the reduced dataset.
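Regenerating a reduced file from the full download can be done in chunks so the whole CSV never sits in memory at once. The sketch assumes the reduction keeps a column subset and drops rows without coordinates. `KEEP_COLUMNS` and `reduce_crime_csv` are hypothetical and not necessarily what produced the provided reduced file, so check the column list against the downloaded file's header.

```python
import pandas as pd

# Hypothetical column subset; compare with the header of the downloaded file.
# It must include Latitude and Longitude because rows without coordinates are dropped.
KEEP_COLUMNS = ["Incident Datetime", "Incident Category", "Latitude", "Longitude"]


def reduce_crime_csv(src, dst, columns=KEEP_COLUMNS, chunksize=200_000):
    """Stream `src` in chunks, keep `columns`, drop rows lacking coordinates, write `dst`.

    Returns the number of rows written.
    """
    rows_written = 0
    first = True
    for chunk in pd.read_csv(src, usecols=columns, chunksize=chunksize):
        # usecols does not reorder columns, so select explicitly to fix the output order.
        chunk = chunk[list(columns)].dropna(subset=["Latitude", "Longitude"])
        # Overwrite on the first chunk (writing the header), append afterwards.
        chunk.to_csv(dst, mode="w" if first else "a", header=first, index=False)
        first = False
        rows_written += len(chunk)
    return rows_written
```

Writing to a new file name first, rather than over the included reduced CSV, makes it easy to compare the two before switching.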
⸻
• The project uses a reduced crime dataset for performance
• The reduced dataset contains all fields required for analysis
• Using the full dataset does not add meaningful analytical value
• Event dataset reduction is minimal and mainly for cleanup
⸻
• Only events within San Francisco are included
• Levi’s Stadium events were excluded (located in Santa Clara)
• Event dataset may contain repeated headliners across different dates
• This is observational analysis, not causal inference
• 311 data was explored but not used due to limited relevance
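Restricting events to San Francisco can be approximated with a latitude/longitude bounding box, as sketched below. The box coordinates, column names, and function name are assumptions for illustration. A box is coarse near the city's southern edge, so a stricter filter would test points against an official city boundary polygon (for example with geopandas).

```python
import pandas as pd

# Rough bounding box around mainland San Francisco (an assumption, not the project's exact rule).
SF_BBOX = {"lat_min": 37.70, "lat_max": 37.84, "lon_min": -122.52, "lon_max": -122.35}


def filter_sf_events(events, lat_col="lat", lon_col="lon", bbox=SF_BBOX):
    """Keep events whose coordinates fall inside `bbox` (inclusive bounds)."""
    inside = (
        events[lat_col].between(bbox["lat_min"], bbox["lat_max"])
        & events[lon_col].between(bbox["lon_min"], bbox["lon_max"])
    )
    return events.loc[inside]
```

Levi's Stadium (roughly 37.40° N, 121.97° W) falls well outside this box, so its events would be dropped.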
⸻
This project is for academic use only as part of Georgia Tech CSE 6242.


