A multi-country social media analytics system that scrapes TikTok comments, performs sentiment analysis, predicts trending topics, and visualises insights through an interactive dashboard — covering Malaysia, Japan, Philippines, Singapore, and South Korea.
This project analyses TikTok comment data to understand public sentiment and predict emerging social media trends across Southeast and East Asia. It combines NLP-based sentiment analysis, machine learning-based trend prediction (One-Class SVM), geographic information system (GIS) visualisation, and an AI-powered chatbot, all surfaced through a web dashboard.
- 💬 Comments Scraper — Scrapes and collects TikTok comments for analysis.
- 🌏 Multi-Country Sentiment Analysis — Analyses sentiment (positive/negative/neutral) of TikTok comments for MY, JP, PH, SG, and SK using transformer-based models and TextBlob.
- 📊 Trend Prediction — Uses trained One-Class SVM models to predict whether content is trending, per country.
- 🗺️ GIS Visualisation — Maps sentiment and trend data geographically across the covered regions.
- 🤖 AI Chatbot — A conversational chatbot, served through a Flask backend, for querying and exploring the analysed data.
- 🖥️ Interactive Dashboard — A unified HTML dashboard to view insights from all modules.
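The sentiment module assigns each comment one of three labels. As a minimal sketch of that final mapping, assuming a polarity score in [-1, 1] such as the one TextBlob reports, the function below turns a score into a label. The function name and the 0.05 neutral band are illustrative assumptions, not values taken from the notebooks.

```python
def label_from_polarity(polarity: float, neutral_band: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to positive/negative/neutral.

    neutral_band is a hypothetical threshold; tune it against annotated data.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    # Scores inside the band (inclusive) are treated as neutral.
    return "neutral"
```

Calling `label_from_polarity(0.6)` gives `"positive"`, and a score of exactly `0.0` falls in the neutral band.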
| Country | Code | Notebooks |
|---|---|---|
| Malaysia | MY | MY Sentiment, Malaysia Sentiment Analysis, Malaysia Prediction, One-Class SVM MY |
| Japan | JP | JP Sentiment, Japan Sentiment Analysis, JP Prediction, One-Class SVM Japan |
| Philippines | PH | PH Sentiment, PH Sentiment Analysis, PH Prediction, One-Class SVM PH |
| Singapore | SG | SG Sentiment, Singapore Sentiment Analysis, SG Prediction, One-Class SVM SG |
| South Korea | SK | SK Sentiment, SK Sentiment Analysis, SK Prediction, One-Class SVM SK |
- Data Processing — pandas, numpy, tqdm
- NLP & Sentiment — Hugging Face Transformers, TextBlob, NLTK, langdetect, deep-translator, emoji, fugashi (Japanese tokenizer), sentencepiece
- Machine Learning — scikit-learn (One-Class SVM), joblib
- Deep Learning — PyTorch, torchvision, torchaudio, accelerate, timm
- Backend / API — Flask, flask-cors
- Frontend — HTML/CSS/JS (`dashboard.html`)
- Notebooks — Jupyter Notebook
SocialMediaTrendPredictor/
├── Datasets/ # Raw and processed datasets
│
├── Comments Scraper.ipynb # TikTok comment scraping
├── GIS.ipynb # Geographic visualisation
├── Chatbot.ipynb # AI chatbot backend
│
├── Malaysia Sentiment Analysis.ipynb # Full sentiment pipeline (MY)
├── Japan Sentiment Analysis.ipynb # Full sentiment pipeline (JP)
├── PH Sentiment Analysis.ipynb # Full sentiment pipeline (PH)
├── Singapore Sentiment Analysis.ipynb # Full sentiment pipeline (SG)
├── SK Sentiment Analysis.ipynb # Full sentiment pipeline (SK)
│
├── MY Sentiment.ipynb # Sentiment model (MY)
├── JP Sentiment.ipynb # Sentiment model (JP)
├── PH Sentiment.ipynb # Sentiment model (PH)
├── SG Sentiment.ipynb # Sentiment model (SG)
├── SK Sentiment.ipynb # Sentiment model (SK)
│
├── Malaysia Prediction.ipynb # Trend prediction (MY)
├── JP Prediction.ipynb # Trend prediction (JP)
├── PH Prediction.ipynb # Trend prediction (PH)
├── SG Prediction.ipynb # Trend prediction (SG)
├── SK Prediction.ipynb # Trend prediction (SK)
│
├── One-Class SVM MY.ipynb # SVM model training (MY)
├── One-Class SVM Japan.ipynb # SVM model training (JP)
├── One-Class SVM PH.ipynb # SVM model training (PH)
├── One-Class SVM SG.ipynb # SVM model training (SG)
├── One-Class SVM SK.ipynb # SVM model training (SK)
│
├── *_cleaned_annotated_tiktok_comments.csv # Cleaned datasets per country
├── *_one_class_svm_tiktok.pkl # Trained SVM models per country
├── *_scaler_tiktok.pkl # Fitted scalers per country
│
├── dashboard.html # Web dashboard
└── requirements.txt # Python dependencies
- Python 3.9+
- Jupyter Notebook
- A modern web browser
- Visual Studio Code (for running the dashboard)
Note: All required libraries are already installed inline within each `.ipynb` file. The `requirements.txt` is provided as a reference only.
git clone https://github.com/KX-ai/SocialMediaTrendPredictor.git
cd SocialMediaTrendPredictor
pip install -r requirements.txt

The system requires two applications running simultaneously: Jupyter Notebook and VS Code.
Open and run the notebooks in the following order:
- `GIS.ipynb` — Starts the GIS/mapping backend
- `Chatbot.ipynb` — Starts the chatbot Flask server
- Trend Prediction notebooks (run all five):
  - `Malaysia Prediction.ipynb`
  - `JP Prediction.ipynb`
  - `PH Prediction.ipynb`
  - `SG Prediction.ipynb`
  - `SK Prediction.ipynb`
- Sentiment Analysis notebooks (run all five):
  - `MY Sentiment.ipynb`
  - `JP Sentiment.ipynb`
  - `PH Sentiment.ipynb`
  - `SG Sentiment.ipynb`
  - `SK Sentiment.ipynb`

With the notebooks still running, start the dashboard:
- Open `dashboard.html` in VS Code
- Launch it with the Live Server extension (or open it directly in a browser)
You should now be able to use the full system through the dashboard! 🎉
Pre-cleaned and annotated TikTok comment datasets are included for all five countries:
- `my_cleaned_annotated_tiktok_comments.csv`
- `jp_cleaned_annotated_tiktok_comments.csv`
- `ph_cleaned_annotated_tiktok_comments.csv`
- `sg_cleaned_annotated_tiktok_comments.csv`
- `sk_cleaned_annotated_tiktok_comments.csv`
Additional raw datasets can be found in the Datasets/ folder.
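Before running the notebooks, it can help to confirm that all five cleaned CSVs are where they are expected. The following is a small sketch using only the standard library; the helper names are hypothetical, and it assumes the files sit in the repository root, as the project tree suggests.

```python
from pathlib import Path

# Lowercase country prefixes used in the dataset file names.
COUNTRY_CODES = ("my", "jp", "ph", "sg", "sk")


def dataset_path(code: str, root: Path = Path(".")) -> Path:
    """Return the expected cleaned-CSV path for one country code."""
    if code not in COUNTRY_CODES:
        raise ValueError(f"unknown country code: {code!r}")
    return root / f"{code}_cleaned_annotated_tiktok_comments.csv"


def missing_datasets(root: Path = Path(".")) -> list[str]:
    """List the country codes whose cleaned CSV is not found under root."""
    return [c for c in COUNTRY_CODES if not dataset_path(c, root).is_file()]
```

An empty list from `missing_datasets()` means every expected file was found in the current directory.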
Trained One-Class SVM models and their corresponding scalers are included for each country, ready for inference:
| Country | Model | Scaler |
|---|---|---|
| Malaysia | `my_one_class_svm_tiktok.pkl` | `my_scaler_tiktok.pkl` |
| Japan | `jp_one_class_svm_tiktok.pkl` | `jp_scaler_tiktok.pkl` |
| Philippines | `ph_one_class_svm_tiktok.pkl` | `ph_scaler_tiktok.pkl` |
| Singapore | `sg_one_class_svm_tiktok.pkl` | `sg_scaler_tiktok.pkl` |
| South Korea | `sk_one_class_svm_tiktok.pkl` | `sk_scaler_tiktok.pkl` |
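As a rough sketch of how a model and scaler pair might be used for inference, the helper below loads both files for one country and runs a prediction. It assumes the `.pkl` files were written with joblib and that the scaler exposes scikit-learn's `transform` method; the function names are hypothetical, and the feature columns the models expect are not documented here, so check the training notebooks before relying on it.

```python
from pathlib import Path


def artifact_paths(code: str, root: Path = Path(".")) -> tuple[Path, Path]:
    """Return (model_path, scaler_path) for a lowercase country code."""
    return (
        root / f"{code}_one_class_svm_tiktok.pkl",
        root / f"{code}_scaler_tiktok.pkl",
    )


def predict_inliers(code: str, features, root: Path = Path(".")):
    """Scale features and return the One-Class SVM output per row.

    scikit-learn's OneClassSVM.predict returns +1 for inliers and -1 for
    outliers. `features` must be a 2-D array with the same columns, in the
    same order, that the scaler was fitted on (an assumption to verify).
    """
    import joblib  # imported lazily so artifact_paths works without it

    model_path, scaler_path = artifact_paths(code, root)
    scaler = joblib.load(scaler_path)
    model = joblib.load(model_path)
    return model.predict(scaler.transform(features))
```

How the +1/-1 output maps onto "trending" versus "not trending" depends on how each model was trained, so that interpretation is left to the prediction notebooks.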
This project is open source. Feel free to fork and build on it.