Look on https://hf.co/datasets for more useful datasets

Unlabeled:
- [x] https://huggingface.co/datasets/mjw/stock_market_tweets -> 1.7 million stock market tweets about Apple, Amazon, Google, Microsoft and Tesla stocks
- [x] https://www.kaggle.com/datasets/leoth9/crypto-tweets -> 10k crypto tweets
- [ ] https://aws.amazon.com/marketplace/pp/prodview-hncbgbk6sb2qs -> 100k crypto tweets
- [x] https://data.mendeley.com/datasets/8fbdhh72gs/5 -> 60k crypto tweets
- [x] https://data.mendeley.com/datasets/x7yvshrnxy/1 -> 250k bitcoin tweets
- [x] https://www.kaggle.com/datasets/kaushiksuresh147/bitcoin-tweets -> 650k bitcoin tweets
- [x] https://github.com/am15h/CrypTop12 -> 48k crypto tweets of 12 most popular coins
- [x] https://www.kaggle.com/datasets/tleonel/crypto-tweets-80k-in-eng-aug-2022 -> 80k crypto tweets with "crypto" in the tweet
- [x] https://www.kaggle.com/datasets/rezasemyari/crypto-sentiment-tweets -> 321MB of crypto tweets
- [x] <del>https://www.kaggle.com/datasets/johnyleebrown/twitter-parsed-cryptocurrencies-data/data -> crypto tweets (not used: truncated tweets + inconsistent format)</del>
- [x] https://huggingface.co/datasets/StephanAkkerman/financial-tweets -> 47k crypto, 22k stocks, 193k other
- [ ] https://github.com/yumoxu/stocknet-dataset/tree/master

Twitter sentiment datasets (similar to tweet-eval):
- [ ] https://www.kaggle.com/datasets/saurabhshahane/twitter-sentiment-dataset -> 160k rows (seems to be India-related tweets only)
- [ ] https://huggingface.co/datasets/carblacac/twitter-sentiment-analysis -> 200k rows (no neutral)
- [ ] https://huggingface.co/datasets/sentiment140 -> 1.6M rows (no neutral)
- [ ] https://drive.google.com/file/d/1eB1gnQlWNnlTtFDPUXQP89axziHX-a5w/view -> 8k rows (no neutral)
- [ ] https://www.kaggle.com/datasets/tariqsays/sentiment-dataset-with-1-million-tweets -> 900k rows (4 categories)
- [ ] https://www.kaggle.com/datasets/prkhrawsthi/twitter-sentiment-dataset-3-million-labelled-rows -> 3M rows (no neutral)
- [ ] https://www.kaggle.com/datasets/imrandude/twitter-sentiment-analysis -> 1M rows (no neutral)

Labeled:
- [x] https://www.kaggle.com/datasets/sbhatti/financial-sentiment-analysis -> 5,322 tweets
- [x] https://github.com/moritzwilksch/MasterThesis/blob/main/data/labeled/labeled_tweets.parquet -> 10,000 labeled tweets
- [x] https://huggingface.co/datasets/ChanceFocus/fiqa-sentiment-classification -> 1,173 news headlines
- [ ] https://data.world/mercal/btc-tweets-sentiment (human labeled?)
- [ ] https://www.kaggle.com/datasets/danielfme/twitter-financial-news-sentiment/data (same as https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment)