We used a binary outcome based on the prevalence (%) of poor mental health in each city.
The binary outcome was calculated by median split:
Across the 500 cities, the median prevalence of poor mental health was 13.89%, so:
If a city has < 13.89% poor mental health → “Good Mental Health”
If a city has >= 13.89% poor mental health → “Bad Mental Health”
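The median split above can be sketched in a few lines. This is a minimal illustration, not the project's own code: the function name `median_split` and the four example cities are hypothetical, and the values are made up for the example.

```python
from statistics import median

def median_split(prevalence_by_city):
    """Label each city by comparing its prevalence to the median across all cities."""
    cutoff = median(prevalence_by_city.values())
    labels = {
        # Cities at or above the median fall in the "Bad" group, as in the rule above.
        city: "Bad Mental Health" if pct >= cutoff else "Good Mental Health"
        for city, pct in prevalence_by_city.items()
    }
    return cutoff, labels

# Hypothetical values for illustration only, not the project's data.
example = {"City A": 10.2, "City B": 13.5, "City C": 14.1, "City D": 17.8}
cutoff, labels = median_split(example)
```

Because the cutoff is the median, the two classes come out roughly balanced, which makes plain accuracy a reasonable metric to compare models on.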
Features were log-transformed to reduce skew and bring their distributions closer to normal, then scaled so that all features were on a comparable range.
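As a rough sketch of this preprocessing step: the source does not say which log variant or scaler was used, so the `log(1 + x)` transform and z-score standardization below are assumptions, and `log_standardize` is a hypothetical helper name.

```python
import math
from statistics import mean, pstdev

def log_standardize(values):
    """Apply log(1 + x), then rescale to zero mean and unit standard deviation."""
    logged = [math.log1p(v) for v in values]  # assumes non-negative inputs
    mu = mean(logged)
    sigma = pstdev(logged)  # population standard deviation; must be non-zero
    return [(x - mu) / sigma for x in logged]

# Hypothetical right-skewed feature values, for illustration only.
incomes = [28_000, 35_000, 41_000, 52_000, 160_000]
scaled = log_standardize(incomes)
```

The log step compresses the long right tail, and the standardization step puts every feature on the same scale, which matters for models such as support vector machines and logistic regression.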
We tried logistic regression, support vector machines (1-3 kernels), a decision tree, gradient tree boosting (learning rates from 0.05 to 1), a random forest, and 1-2 layer neural networks.
We used 10-fold cross-validation: the data were randomly split into 10 folds, and each model was trained 10 times, each time on 90% of the data and tested on the remaining 10%. We report performance averaged across the 10 runs.
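The splitting and averaging can be sketched as follows. This is an illustrative outline only: `k_fold_indices`, `cross_validate`, and the `fit_and_score` callback are hypothetical names, and the project itself may have relied on a library routine instead.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the row indices and yield (train, test) index lists, one pair per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def cross_validate(n_samples, fit_and_score, k=10, seed=0):
    """Average the score returned by fit_and_score(train_idx, test_idx) over k folds."""
    scores = [fit_and_score(tr, te) for tr, te in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)

# With 500 cities, each fold holds out 50 cities and trains on the other 450.
splits = list(k_fold_indices(500, k=10))
```

Every city appears in exactly one test fold, so the averaged score reflects performance on data the model did not see during training.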
Summary of Findings
Most of the machine learning models we tried reached about 80% accuracy in predicting a city's mental health category.
In the Random Forest model, the strongest feature for predicting poor mental health was Standard Deviation of Income.
This suggests that income inequality within a city was the strongest predictor of the prevalence of poor mental health.
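For readers unfamiliar with how a random forest ranks features, here is a small sketch using scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute. The data are synthetic and the feature names are hypothetical; the label is constructed to depend on the first column only, so this shows the mechanism rather than reproducing the project's result.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_cities = 500
feature_names = ["income_sd", "noise_a", "noise_b"]  # hypothetical names

# Synthetic features; only the first column carries signal about the label.
X = rng.normal(size=(n_cities, 3))
y = (X[:, 0] > np.median(X[:, 0])).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# Impurity-based importances sum to 1; higher means the feature was used more
# effectively in the trees' splits.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
```

These importances describe how useful a feature was for prediction within the fitted model, so a high rank indicates predictive strength rather than a causal effect.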
Limitations
Mental health data
Small sample
Limited availability
Subjective self-rating
Differences in time frame of the datasets
Recommendations for Further Analysis
Exploring data related to:
Impact of COVID-19
Climate
Affordable healthcare
Availability of mental healthcare providers
Data from wearable devices can be used to identify physiological markers associated with mental illness
Social media and internet activity can provide insight into a behavioral side of mental health
Presentation Slides
The project presentation is available on Google Slides:
Here
Dashboards
Tableau Dashboard
Data Exploration Visuals and Machine Learning Summary