Global Health Trends Extraction (COVID-19 Twitter Analysis)

This project performs end-to-end health trend analysis using classical NLP methods (no transformers). It processes COVID-19-related tweets to extract sentiment patterns, keyword trends, locations, entities, and time-series insights.

All code and visuals come from the notebook global_health_trends_extraction.ipynb (Colab).

Features

1. Data Loading

Loads the Corona NLP dataset (CSV) via pandas.
The dataset includes the columns Username, Location, Date, Tweet Text, and Sentiment.
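
The loading step can be sketched as follows. This is a minimal illustration, not the notebook's code: the column names come from the list above, while the `load_tweets` helper, the `latin-1` encoding, and the in-memory sample rows are assumptions made so the snippet runs without the real file.

```python
import io

import pandas as pd

# Columns the README lists for the dataset.
EXPECTED_COLUMNS = ["Username", "Location", "Date", "Tweet Text", "Sentiment"]


def load_tweets(source, encoding="latin-1"):
    """Read the CSV and fail early if an expected column is missing.

    `source` may be a file path or a binary file-like object.
    The default encoding is an assumption, not taken from the notebook.
    """
    df = pd.read_csv(source, encoding=encoding)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


# Tiny made-up sample so the sketch runs without the real dataset.
sample = io.BytesIO(
    b"Username,Location,Date,Tweet Text,Sentiment\n"
    b"1,London,16-03-2020,Stay safe everyone,Positive\n"
    b"2,,17-03-2020,Shelves are empty again,Negative\n"
)
df = load_tweets(sample)
print(df.shape)
```

With the real data, the same helper would be called with the path to the CSV instead of the in-memory sample.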

2. Data Cleaning

A custom clean_text() function removes:

URLs
Mentions (@user)
Hashtags
HTML codes
Extra spaces
(pages 1–2)
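
A function with this behavior could look like the sketch below. The regular expressions are assumptions, not the notebook's actual patterns: hashtags are dropped entirely (not just the `#`), and "HTML codes" is read as HTML entities such as `&amp;`.

```python
import re

# Assumed patterns; the notebook's own expressions may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_ENTITY_RE = re.compile(r"&[a-zA-Z]+;|&#\d+;")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
SPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Strip URLs, HTML entities, mentions, hashtags, and extra spaces."""
    text = URL_RE.sub(" ", text)
    # Entities go before hashtags so "&#39;" is not split by the "#" pattern.
    text = HTML_ENTITY_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    # Collapse runs of whitespace left behind by the removals.
    return SPACE_RE.sub(" ", text).strip()


cleaned = clean_text("@user Stay home &amp; stay safe! #COVID19 https://t.co/abc   ")
print(cleaned)
```

The order of the substitutions matters: removing entities before hashtags avoids leaving fragments such as `& ;` behind.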

3. Sentiment Analysis (VADER)

Uses NLTK VADER to compute compound sentiment scores.
Maps each compound score to a Positive, Negative, or Neutral label.
Generates sentiment-trend line graphs.
(pages 2–3 & 7–8)
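
The scoring and labeling steps can be sketched as below. The ±0.05 cutoffs are the commonly used VADER convention and are an assumption here; the README does not state which thresholds the notebook uses. The helper names `label_from_compound` and `vader_compound` are hypothetical.

```python
def label_from_compound(score, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    The 0.05 threshold is an assumed convention, not the notebook's value.
    """
    if score >= threshold:
        return "Positive"
    if score <= -threshold:
        return "Negative"
    return "Neutral"


def vader_compound(texts):
    """Return one compound score per text.

    Requires nltk; downloads the vader_lexicon resource on first use.
    """
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    sia = SentimentIntensityAnalyzer()
    return [sia.polarity_scores(t)["compound"] for t in texts]


# Labeling works on any list of compound scores.
labels = [label_from_compound(s) for s in (0.6, -0.4, 0.0)]
print(labels)
```

In a full pipeline, the scores returned by `vader_compound` for the cleaned tweets would be passed through `label_from_compound` and stored next to each tweet.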

4. Keyword Extraction (YAKE)

Extracts top keywords for each tweet.
Aggregates keywords across all tweets and plots the top 20.
(page 4 chart)
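
A sketch of per-tweet extraction followed by aggregation is shown below. The YAKE settings (`n=1`, `top=5`) and the helper names are assumptions for illustration; the README does not list the notebook's parameters.

```python
from collections import Counter


def extract_keywords(text, top=5):
    """Return the top keywords for one tweet using YAKE.

    Requires the yake package. The n-gram size and `top` are assumed values.
    """
    import yake

    extractor = yake.KeywordExtractor(lan="en", n=1, top=top)
    # extract_keywords yields (keyword, score) pairs; lower scores rank higher.
    return [kw for kw, _score in extractor.extract_keywords(text)]


def top_keywords(keyword_lists, k=20):
    """Count keywords across all tweets and return the k most frequent."""
    counts = Counter(kw.lower() for kws in keyword_lists for kw in kws)
    return counts.most_common(k)


# Hand-written keyword lists stand in for real YAKE output.
ranked = top_keywords([["covid", "lockdown"], ["Covid", "vaccine"], ["covid"]], k=2)
print(ranked)
```

The `(keyword, count)` pairs returned by `top_keywords` can be passed directly to a matplotlib bar chart.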

5. Named Entity Recognition (spaCy)

Uses en_core_web_sm.
Extracts all entities per tweet.
Keeps only location entities (labels GPE and LOC).
Plots the top user locations and the top mentioned locations.
(pages 5–7 visuals)
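
The entity extraction and location filter can be sketched as follows. The helper names are hypothetical, and the sample pairs are hand-written so the filter can be tried without loading the spaCy model.

```python
# Entity labels treated as locations, as listed in the README.
LOCATION_LABELS = {"GPE", "LOC"}


def extract_entities(texts, model="en_core_web_sm"):
    """Return a list of (text, label) pairs for each input text.

    Requires spaCy and the named model to be installed.
    """
    import spacy

    nlp = spacy.load(model)
    return [[(ent.text, ent.label_) for ent in doc.ents] for doc in nlp.pipe(texts)]


def filter_locations(entities):
    """Keep only the entity texts whose label is GPE or LOC."""
    return [text for text, label in entities if label in LOCATION_LABELS]


# Hand-written pairs standing in for spaCy output on one tweet.
sample = [("India", "GPE"), ("WHO", "ORG"), ("Europe", "LOC"), ("Monday", "DATE")]
locations = filter_locations(sample)
print(locations)
```

Counting the filtered names across all tweets (for example with `collections.Counter`) gives the data for the "top mentioned locations" plot.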

6. Time-Based Trend Analysis

Tweet volume per day
Sentiment trend by date
(pages 7–8)
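
Both daily series can be derived with one pandas group-by, as in the sketch below. The `compound` column name, the `%d-%m-%Y` date format, and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Made-up rows; "compound" is a hypothetical column holding VADER scores.
df = pd.DataFrame(
    {
        "Date": ["16-03-2020", "16-03-2020", "17-03-2020"],
        "compound": [0.6, -0.2, 0.1],
    }
)

# Parse dates (assumed day-month-year format) and drop any time-of-day part.
df["day"] = pd.to_datetime(df["Date"], format="%d-%m-%Y").dt.normalize()

# One row per day: number of tweets and mean compound score.
daily = df.groupby("day").agg(
    tweet_count=("compound", "size"),
    mean_compound=("compound", "mean"),
)
print(daily)
```

Calling `daily["tweet_count"].plot()` and `daily["mean_compound"].plot()` would then draw the two line graphs with matplotlib.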

Technologies Used

| Purpose | Technology | Reference |
| --- | --- | --- |
| Data Loading | `pandas` | page 1 |
| NLP Dataset | HuggingFace CSV | page 1 |
| Cleaning | `re` (regex) | pages 1–2 |
| Sentiment Analysis | `nltk` (VADER) | pages 2–3 |
| Keyword Extraction | `yake` | pages 3–4 |
| NER | `spaCy` | page 5 |
| Visualization | `matplotlib` | all graphs |
| Date Parsing | `pandas` datetime | page 2 |

(All validated from notebook.)
