Amazon Reviews Sentiment Analysis Notebook

Overview

This Jupyter notebook (sentiment_analysis.ipynb) performs sentiment analysis on a sample of Amazon product reviews from the dataset Reviews.csv. It compares two popular approaches:

VADER (Valence Aware Dictionary and sEntiment Reasoner): A rule-based lexicon and grammar tool for sentiment scoring.
RoBERTa (Robustly optimized BERT approach): A transformer-based pre-trained model fine-tuned for sentiment analysis on tweets (twitter-roberta-base-sentiment).

The goal is to analyze review texts, compute sentiment scores (positive, neutral, negative, compound), visualize distributions by star ratings, and compare model performances.

Dataset

Source: Amazon Reviews CSV (/Users/kasmyabhatia/Downloads/Reviews.csv)
Used: First 500 rows for efficiency.
Columns used:
- Id: Unique review ID
- Score: Star rating (1-5)
- Text: Review text

Key Steps & Code Sections

1. Imports & Setup

import pandas as pd, numpy as np, matplotlib.pyplot as plt, seaborn as sns
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
from tqdm.notebook import tqdm
from scipy.special import softmax

Loads data analysis, NLP, and ML libraries.
Applies 'ggplot' style to plots.

2. Data Loading & EDA

Loads CSV and limits to 500 rows.
Bar plot: Review count by stars (shows class imbalance).

3. NLTK Basics (Exploratory)

Tokenization, POS tagging, Named Entity Recognition (NER) on an example review.
Demonstrates basic NLP preprocessing (not used in core analysis).

4. VADER Sentiment Analysis

Initializes SentimentIntensityAnalyzer.
Computes scores: pos, neu, neg, compound for each review.
Merges results with original DF.
Visualizations:
- Bar: Compound score by stars.
- Subplots: Pos/Neu/Neg distributions by stars.

5. RoBERTa Pre-trained Model

Loads tokenizer and model: cardiffnlp/twitter-roberta-base-sentiment.
Defines polarity_scores_roberta(): Tokenizes text, runs inference, applies softmax.
Computes RoBERTa scores (roberta_neg, roberta_neu, roberta_pos) for all reviews.
Handles runtime errors gracefully.

6. Comparison & Insights

Merges VADER + RoBERTa results.
Pairplot: Scatter matrix of all sentiment scores, colored by Score.
Examples: Worst positive misclassifications (reviews with low stars but high model pos score).

7. Transformers Pipeline (Bonus)

Quick demo of Hugging Face pipeline(\"sentiment-analysis\").
Install note: !pip install transformers.

Dependencies

Run in Jupyter/Colab:

pip install pandas numpy matplotlib seaborn nltk transformers torch tqdm scipy

NLTK data: nltk.download('maxent_ne_chunker_tab'), nltk.download('words')
Note: RoBERTa requires PyTorch (auto-installed via transformers).

Visualizations Generated

Review counts by stars.
VADER compound by stars.
VADER pos/neu/neg subplots.
Pairplot of VADER + RoBERTa scores by true stars.

Key Findings (from plots/code)

Higher star ratings correlate with more positive sentiment (as expected).
RoBERTa may handle nuances better than VADER (inspect pairplot).
Misclassifications: Some 1-star reviews scored positive by models (sarcasm?).

How to Run

Notebook

Place Reviews.csv in /Users/kasmyabhatia/Downloads/ or update path.
Open sentiment_analysis.ipynb in Jupyter/VSCode.
Run all cells sequentially.
View plots inline.

Streamlit App

pip install -r requirements_app.txt
streamlit run sentiment_app.py
Open browser: Analyze single reviews or upload CSV interactively!

Potential Improvements

Full dataset processing (remove .head(500)).
Cross-validation with more metrics (accuracy, F1).
Train custom model.
Handle long texts (truncate for RoBERTa).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Amazon Reviews Sentiment Analysis Notebook

Overview

Dataset

Key Steps & Code Sections

1. Imports & Setup

2. Data Loading & EDA

3. NLTK Basics (Exploratory)

4. VADER Sentiment Analysis

5. RoBERTa Pre-trained Model

6. Comparison & Insights

7. Transformers Pipeline (Bonus)

Dependencies

Visualizations Generated

Key Findings (from plots/code)

How to Run

Notebook

Streamlit App

Potential Improvements

References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
README.md		README.md
requirements_app.txt		requirements_app.txt
sentiment_analysis.ipynb		sentiment_analysis.ipynb
sentiment_app.py		sentiment_app.py

Folders and files

Latest commit

History

Repository files navigation

Amazon Reviews Sentiment Analysis Notebook

Overview

Dataset

Key Steps & Code Sections

1. Imports & Setup

2. Data Loading & EDA

3. NLTK Basics (Exploratory)

4. VADER Sentiment Analysis

5. RoBERTa Pre-trained Model

6. Comparison & Insights

7. Transformers Pipeline (Bonus)

Dependencies

Visualizations Generated

Key Findings (from plots/code)

How to Run

Notebook

Streamlit App

Potential Improvements

References

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages