Crawl and Visualize NeurIPS 2021 Datasets and Benchmarks Track OpenReview Data

Based on evanzd/ICLR2021-OpenReviewData.

Descriptions

This Jupyter Notebook contains data crawled from the NeurIPS 2021 Datasets and Benchmarks Track OpenReview pages, along with visualizations of that data. The list of submissions, sorted by average rating, can be found here.

Prerequisites

Python 3.7
selenium
pandas
seaborn
imageio
wordcloud
tqdm
Edge WebDriver
- NOTE: You can also use chromedriver by setting driver = webdriver.Chrome('chromedriver.exe').

Crawl Data

Run crawl_paperlist.py to crawl the list of papers (~0.5h).
Run crawl_reviews.py to crawl the reviews (~1.5h).
- NOTE: Currently, only review ratings are crawled.
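Since only ratings are crawled, each review boils down to one integer. The sketch below shows one way to extract it, assuming the scraped rating field looks like "7: Good paper, accept" (an integer, a colon, then a label). That format and the helper names `parse_rating` and `format_ratings` are illustrative assumptions, not taken from crawl_reviews.py.

```python
import re
from typing import Iterable, Optional

# Assumption: a scraped rating field starts with an integer score followed by
# a colon, e.g. "7: Good paper, accept". Adjust the pattern if the page differs.
RATING_PATTERN = re.compile(r"^\s*(\d+)\s*:")


def parse_rating(rating_text: str) -> Optional[int]:
    """Return the leading integer score, or None if the text does not match."""
    match = RATING_PATTERN.match(rating_text)
    return int(match.group(1)) if match else None


def format_ratings(rating_texts: Iterable[str]) -> str:
    """Join parsed scores into a comma-separated string like "9, 7"."""
    scores = [parse_rating(text) for text in rating_texts]
    return ", ".join(str(score) for score in scores if score is not None)


print(format_ratings(["9: (illustrative label)", "7: Good paper, accept"]))
```

The comma-separated output mirrors the format of the Ratings column in the submission list.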

Visualization

Keywords Frequency

The 50 most common keywords (case-insensitive) and their frequencies:
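A case-insensitive keyword count like this can be sketched with `collections.Counter`. The input format here, one comma-separated keyword string per paper, is a hypothetical stand-in for however the crawled paper list actually stores keywords.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_keywords(keyword_fields: Iterable[str], n: int = 50) -> List[Tuple[str, int]]:
    """Count keywords case-insensitively and return the n most common.

    keyword_fields: one string per paper with keywords separated by commas
    (hypothetical format; adapt to the actual crawled data).
    """
    counter = Counter()
    for field in keyword_fields:
        for keyword in field.split(","):
            keyword = keyword.strip().lower()  # "uncased": fold case before counting
            if keyword:
                counter[keyword] += 1
    # Ties keep the order in which keywords were first encountered.
    return counter.most_common(n)


papers = [
    "Benchmark, Reinforcement Learning",
    "reinforcement learning, Dataset",
    "dataset, benchmark, fairness",
]
print(top_keywords(papers, n=3))
```

Lowercasing before counting is what merges "Benchmark" and "benchmark" into a single entry.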

Keywords Cloud

The word cloud built from submission keywords highlights popular topics such as deep learning, reinforcement learning, representation learning, and graph neural networks.

Ratings Distribution

The distribution of reviewer ratings centers around 5 (mean: 5.367).
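One way to get a rating distribution and its mean is to flatten every paper's rating string (as in the Ratings column, e.g. "9, 9, 7") into individual scores. This is a sketch, not necessarily how the notebook computes its figure, and the sample ratings below are made up.

```python
from collections import Counter
from statistics import mean
from typing import Iterable, Tuple


def rating_distribution(ratings_column: Iterable[str]) -> Tuple[float, Counter]:
    """Flatten per-paper rating strings such as "9, 9, 7" into individual scores.

    Returns (overall mean, Counter mapping score -> number of reviews).
    """
    # int() tolerates the surrounding spaces left by split(",").
    scores = [int(s) for cell in ratings_column for s in cell.split(",")]
    return mean(scores), Counter(scores)


avg, counts = rating_distribution(["9, 9, 9, 7", "8, 8, 8", "4, 3, 5"])
print(avg, sorted(counts.items()))
```

Pooling individual reviews gives papers with more reviews more weight. Averaging the per-paper means instead would generally give a slightly different number.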

Keywords vs Ratings

Comparing average reviewer ratings with keyword frequency suggests that submissions using keywords such as deep generative models or normalizing flows tended to receive higher ratings.
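A keyword-vs-rating comparison can be sketched by grouping each paper's average rating under every keyword it uses. The `(keywords, ratings)` input pairs and the `min_count` filter are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def keyword_avg_ratings(
    papers: Iterable[Tuple[List[str], List[int]]], min_count: int = 1
) -> Dict[str, Tuple[int, float]]:
    """Map each lowercased keyword to (paper count, mean of per-paper average ratings).

    papers: (keywords, ratings) pairs (hypothetical in-memory format).
    min_count: drop keywords used by fewer papers, since their averages are noisy.
    """
    per_keyword = defaultdict(list)
    for keywords, ratings in papers:
        paper_avg = mean(ratings)
        # A set so a keyword repeated within one paper is counted once.
        for keyword in {k.strip().lower() for k in keywords}:
            per_keyword[keyword].append(paper_avg)
    return {
        keyword: (len(avgs), mean(avgs))
        for keyword, avgs in per_keyword.items()
        if len(avgs) >= min_count
    }


papers = [
    (["Benchmark", "Graph"], [8, 8, 7]),
    (["benchmark"], [5, 6, 4]),
    (["Graph"], [6, 6, 6]),
]
stats = keyword_avg_ratings(papers)
print(sorted(stats.items(), key=lambda kv: kv[1][1], reverse=True))
```

A high average for a rarely used keyword can come from one or two papers. Raising `min_count` keeps such keywords out of the comparison, and any link found this way is a correlation rather than a cause.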

All NeurIPS 2021 Datasets and Benchmarks Track Submissions

(Collected on 07/08/2020.)

Rank	AvgRating	STDRating	Title	Ratings	Decision
1	8.5	1	RadGraph: Extracting Clinical Entities and Relations from Radiology Reports	9, 9, 9, 7	Unknown
2	8.33	1.15	ATOM3D: Tasks on Molecules in Three Dimensions	9, 9, 7	Unknown
3	8.33	1.53	Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks	7, 10, 8	Unknown
4	8	0	Programming Puzzles	8, 8, 8	Unknown
5	8	1	A Large-Scale Database for Graph Representation Learning	7, 8, 9	Unknown
6	8	1	CCNLab: A Benchmarking Framework for Computational Cognitive Neuroscience	8, 9, 7	Unknown
7	8	1.73	CommonsenseQA 2.0: Exposing the Limits of AI through Gamification	7, 7, 10	Unknown
8	7.67	0.58	Q-Pain: A Question Answering Dataset to Measure Social Bias in Pain Management	8, 7, 8	Unknown
9	7.67	0.58	Which priors matter? Benchmarking models for learning latent dynamics	8, 8, 7	Unknown
10	7.67	0.58	NaturalProofs: Mathematical Theorem Proving in Natural Language	8, 7, 8	Unknown
11	7.67	1.53	It's COMPASlicated: The Messy Relationship between RAI Datasets and Algorithmic Fairness Benchmarks	8, 9, 6	Unknown
12	7.67	2.08	EEGEyeNet: a Simultaneous Electroencephalography and Eye-tracking Dataset and Benchmark for Eye Movement Prediction	6, 10, 7	Unknown
13	7.33	0.58	LiRo: Benchmark and leaderboard for Romanian language tasks	7, 7, 8	Unknown
14	7.33	0.58	TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers	7, 7, 8	Unknown
15	7.33	0.58	A Unified Few-Shot Classification Benchmark to Compare Transfer and Meta Learning Approaches	8, 7, 7	Unknown
16	7.33	1.15	Variance-Aware Machine Translation Test Sets	6, 8, 8	Unknown
17	7.33	1.53	Teach Me to Explain: A Review of Datasets for Explainable Natural Language Processing	6, 7, 9	Unknown
18	7.33	1.53	Automatic Construction of Evaluation Suites for Natural Language Generation Datasets	7, 6, 9	Unknown
19	7	0	EventNarrative: A large-scale Event-centric Dataset for Knowledge Graph-to-Text Generation	7, 7, 7	Unknown
20	7	0	A Spoken Language Dataset of Descriptions for Speech-Based Grounded Language Learning	7, 7, 7	Unknown
21	7	1	FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information	6, 8, 7	Unknown
22	7	1	CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer	7, 6, 8	Unknown
23	7	1	B-Pref: Benchmarking Preference-Based Reinforcement Learning	6, 7, 8	Unknown
24	7	1	The Multi-Agent Behavior Dataset: Mouse Dyadic Social Interactions	6, 7, 8	Unknown
25	7	1	CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms	6, 8, 7	Unknown
26	7	1	A Procedural World Generation Framework for Systematic Evaluation of Continual Learning	6, 8, 7	Unknown
27	7	1.73	ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation	6, 9, 6	Unknown
28	7	1.73	Personalized Benchmarking with the Ludwig Benchmarking Toolkit	8, 5, 8	Unknown
29	7	2.65	PASS: An ImageNet replacement for self-supervised pretraining without humans	5, 6, 10	Unknown
30	7	2.65	Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development	9, 8, 4	Unknown
31	6.75	0.96	CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review	8, 7, 6, 6	Unknown
32	6.75	1.5	ReaSCAN: Compositional Reasoning in Language Grounding	8, 8, 5, 6	Unknown
33	6.67	0.58	The PAIR-R24M Dataset for Multi-animal 3D Pose Estimation	7, 7, 6	Unknown
34	6.67	0.58	Generating Datasets of 3D Garments with Sewing Patterns	7, 6, 7	Unknown
35	6.67	1.15	HiRID-ICU-Benchmark --- A Comprehensive Machine Learning Benchmark on High-resolution ICU Data.	6, 8, 6	Unknown
36	6.67	1.15	Contemporary Symbolic Regression Methods and their Relative Performance	6, 6, 8	Unknown
37	6.67	1.53	The People’s Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage	8, 5, 7	Unknown
38	6.4	1.14	Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks	7, 8, 6, 6, 5	Unknown
39	6.33	0.58	Multimodal AutoML on Tables with Text Fields	6, 7, 6	Unknown
40	6.33	0.58	MultiBench: Multiscale Benchmarks for Multimodal Representation Learning	7, 6, 6	Unknown
41	6.33	0.58	Benchmarking Bias Mitigation Algorithms in Representation Learning through Fairness Metrics	6, 6, 7	Unknown
42	6.33	0.58	FedScale: Benchmarking Model and System Performance of Federated Learning	6, 6, 7	Unknown
43	6.33	0.58	OmniPrint: A Configurable Printed Character Synthesizer	7, 6, 6	Unknown
44	6.33	0.58	The Neural MMO Platform for Massively Multiagent Research	6, 7, 6	Unknown
45	6.33	0.58	Modeling Worlds in Text	6, 7, 6	Unknown
46	6.33	1.15	An Extensible Benchmark Suite for Learning to Simulate Physical Systems	7, 7, 5	Unknown
47	6.33	1.53	Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation	6, 5, 8	Unknown
48	6.33	1.53	VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation	8, 5, 6	Unknown
49	6.33	1.53	Protein-Ligand Docking Surrogate Models: A SARS-CoV-2 Benchmark for Deep Learning Accelerated Virtual Screening	6, 8, 5	Unknown
50	6.33	2.08	RedCaps: Web-curated image-text data created by the people, for the people	8, 7, 4	Unknown
51	6.33	2.52	Physion: Evaluating Physical Prediction from Vision in Humans and Machines	9, 4, 6	Unknown
52	6.25	1.26	AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry	6, 5, 6, 8	Unknown
53	6.25	1.71	Reinforcement Learning Benchmarks for Traffic Signal Control	4, 6, 7, 8	Unknown
54	6	0	Pl@ntNet-300K: a new plant image dataset for the evaluation of set-valued classifiers	6, 6, 6	Unknown
55	6	0	Benchmark for Compositional Text-to-Image Synthesis	6, 6, 6	Unknown
56	6	0	A Benchmark of Medical Out of Distribution Detection	6, 6, 6	Unknown
57	6	0.82	DABS: a Domain-Agnostic Benchmark for Self-Supervised Learning	6, 7, 6, 5	Unknown
58	6	1	MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research	6, 5, 7	Unknown
59	6	1	One Million Scenes for Autonomous Driving: ONCE Dataset	6, 5, 7	Unknown
60	6	1	Towards a robust experimental framework and benchmark for lifelong language learning	7, 6, 5	Unknown
61	6	1	The Caltech Off-Policy Policy Evaluation Benchmarking Suite	7, 5, 6	Unknown
62	6	1	Revisiting Time Series Outlier Detection: Definitions and Benchmarks	5, 6, 7	Unknown
63	6	1	Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning	7, 5, 6	Unknown
64	6	1	MLPerf Tiny Benchmark	7, 6, 5	Unknown
65	6	1.73	Measuring the State of Document Understanding	5, 5, 8	Unknown
66	6	1.73	Vox Populi, Vox DIY: Benchmark Dataset for Crowdsourced Audio Transcription	4, 7, 7	Unknown
67	6	1.73	The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation	7, 7, 4	Unknown
68	6	1.73	CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions	5, 8, 5	Unknown
69	6	1.73	LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptation Semantic Segmentation	7, 4, 7	Unknown
70	6	1.73	CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation	4, 7, 7	Unknown
71	6	2	QAConv: Question Answering on Informative Conversations	8, 6, 4	Unknown
72	6	2	MDP Playground: A Design and Debug Testbed for Reinforcement Learning	4, 6, 8	Unknown
73	6	2	RobustBench: a standardized adversarial robustness benchmark	6, 8, 4	Unknown
74	6	2.16	HumBugDB: a large-scale acoustic mosquito dataset	4, 6, 5, 9	Unknown
75	6	2.16	The Single Hyperparameter Benchmark for Reinforcement Learning	6, 7, 3, 8	Unknown
76	5.75	0.5	TWEET-FD: A Dataset for Multiple Foodborne Illness Incident Detection Tasks	6, 6, 5, 6	Unknown
77	5.75	0.96	The Benchmark Lottery	6, 7, 5, 5	Unknown
78	5.75	0.96	Challenging America: Digitized Newspapers as a Source of Machine Learning Challenges	7, 5, 5, 6	Unknown
79	5.75	1.26	Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers	6, 7, 6, 4	Unknown
80	5.67	0.58	An Empirical Study of Graph Contrastive Learning	6, 6, 5	Unknown
81	5.67	0.58	CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark	6, 6, 5	Unknown
82	5.67	0.58	FHIST: A Benchmark for Few-shot Classification of Histological Images	6, 6, 5	Unknown
83	5.67	0.58	HPO-B: a Large-Scale Reproducible Benchmark for Black-Box HPO based on OpenML	6, 6, 5	Unknown
84	5.67	0.58	All-In-One Drive: A Comprehensive Perception Dataset with High-Density Long-Range Point Clouds	6, 5, 6	Unknown
85	5.67	0.58	Hangul Fonts Dataset: a Hierarchical and Compositional Dataset for Investigating Learned Representations	5, 6, 6	Unknown
86	5.67	0.58	JECC: Commonsense Reasoning Tasks Derived from Interactive Fictions	5, 6, 6	Unknown
87	5.67	0.58	RealCause: Realistic Causal Inference Benchmarking	5, 6, 6	Unknown
88	5.67	0.58	CityNet: A Multi-city Multi-modal Dataset for Smart City Applications	5, 6, 6	Unknown
89	5.67	0.58	BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling	5, 6, 6	Unknown
90	5.67	1.15	GrowSpace: Learning How to Shape Plants	5, 7, 5	Unknown
91	5.67	1.15	WaveFake: A Data Set to Facilitate Audio DeepFake Detection	5, 7, 5	Unknown
92	5.67	1.15	StudentSADD: Mobile Depression and Suicidal Ideation Screening of College Students during the Coronavirus Pandemic	5, 7, 5	Unknown
93	5.67	1.53	GitTables: A Large-Scale Corpus of Relational Tables	4, 6, 7	Unknown
94	5.67	1.53	A ground-truth dataset of real security patches	7, 6, 4	Unknown
95	5.67	1.53	ImageNet-21K Pretraining for the Masses	4, 7, 6	Unknown
96	5.67	2.08	Cluster3D: A Dataset and Benchmark for Clustering Non-Categorical 3D CAD Models	8, 5, 4	Unknown
97	5.5	1.73	A Realistic Simulation Framework for Learning with Label Noise	7, 4, 4, 7	Unknown
98	5.33	0.58	Benchmarking the Robustness of CNN-based Spatial-Temporal Models	6, 5, 5	Unknown
99	5.33	0.58	Rethinking the Role of Hyperparameter Tuning in Optimizer Benchmarking	5, 5, 6	Unknown
100	5.33	0.58	Natural Adversarial Objects	5, 5, 6	Unknown
101	5.33	0.58	ExpMRC: Explainability Evaluation for Machine Reading Comprehension	5, 6, 5	Unknown
102	5.33	0.58	ZeroWaste Dataset: Towards Automated Waste Recycling	5, 5, 6	Unknown
103	5.33	1.15	Brax - A Differentiable Physics Engine for Large Scale Rigid Body Simulation	6, 6, 4	Unknown
104	5.33	1.15	The Argoverse Trajectory Retrieval Benchmark	6, 6, 4	Unknown
105	5.33	1.15	Addressing "Documentation Debt" in Machine Learning: A Retrospective Datasheet for BookCorpus	4, 6, 6	Unknown
106	5.33	1.53	Towards a universal dataset and metrics for training and evaluating table extraction models	5, 4, 7	Unknown
107	5.33	1.53	MMVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting	7, 4, 5	Unknown
108	5.33	1.53	PROCAT: Product Catalogue Dataset for Implicit Clustering, Permutation Learning and Structure Prediction	4, 7, 5	Unknown
109	5.33	1.53	Monash Time Series Forecasting Archive	7, 4, 5	Unknown
110	5.33	2.08	SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving	7, 3, 6	Unknown
111	5.33	2.31	Chest ImaGenome Dataset for Clinical Reasoning	4, 8, 4	Unknown
112	5.33	2.31	Synthetic Benchmarks for Scientific Research in Explainable Machine Learning	8, 4, 4	Unknown
113	5.25	0.96	Irregular Shape Instance Segmentation	6, 4, 5, 6	Unknown
114	5	0	On Predicting and Generating a Good Break Shot in Billiards Sports	5, 5, 5	Unknown
115	5	0	Robustness Disparities in Commercial Face Detection	5, 5, 5	Unknown
116	5	0.82	NATURE: Natural Auxiliary Text Utterances for Realistic Spoken Language Evaluation	4, 5, 6, 5	Unknown
117	5	1	VISIOCITY: A New Benchmarking Dataset and Evaluation Framework Towards Realistic Video Summarization	5, 6, 4	Unknown
118	5	1	A Dataset of Discovering Drug-Target Interaction from Biomedical Literature	4, 5, 6	Unknown
119	5	1	MQBench: Towards Reproducible and Deployable Model Quantization Benchmark	4, 6, 5	Unknown
120	5	1	Particulate Matter Dataset Collected with Vehicle Mounted IoT Devices in Delhi-NCR	6, 4, 5	Unknown
121	5	1	Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions	4, 6, 5	Unknown
122	5	1	VoxelScape: Large Scale Simulated 3D Point Cloud Dataset of Urban Traffic Environments	4, 6, 5	Unknown
123	5	1	SeasonDepth: Cross-Season Monocular Depth Prediction Dataset and Benchmark under Multiple Environments	6, 5, 4, 4, 6	Unknown
124	5	1.15	ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data	4, 6, 4, 6	Unknown
125	5	1.73	ComSum: Commit Messages Summarization and Meaning Preservation	6, 6, 3	Unknown
126	5	1.73	RealCity3D: A Large-scale Georeferenced 3D Shape Dataset of Real-world Cities	7, 4, 4	Unknown
127	5	2	Graph Robustness Benchmark: Rethinking and Benchmarking Adversarial Robustness of Graph Neural Networks	7, 5, 3	Unknown
128	5	2	Turath-150K: Image Database of Arab Heritage	3, 7, 5	Unknown
129	5	2.45	Scenic4RL: Programmatic Modeling and Generation of Reinforcement Learning Environments	5, 5, 8, 2	Unknown
130	4.75	0.96	Hi-Phy: A Benchmark for Hierarchical Physical Reasoning	5, 4, 6, 4	Unknown
131	4.75	1.71	SMART-Rain: A Degradation Evaluation Dataset for Autonomous Driving in Rain	3, 4, 7, 5	Unknown
132	4.67	0.58	A2X: An Agent and Environment Interaction Benchmark for Multimodal Human Trajectory Prediction	5, 4, 5	Unknown
133	4.67	1.15	The Tarteel Dataset: Crowd-Sourced and Labeled Quranic Recitation	4, 4, 6	Unknown
134	4.67	1.15	Arena: A Scalable and Configurable Benchmark for Policy Learning	4, 6, 4	Unknown
135	4.67	2.08	DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction	7, 4, 3	Unknown
136	4.67	2.52	Age dataset: age of death and other basic information for over 1.31 million historical figures (data, methods and applications)	2, 7, 5	Unknown
137	4.33	0.58	NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning	4, 4, 5	Unknown
138	4.33	0.58	Geon3D: Benchmarking 3D Shape Bias towards Building Robust Machine Vision	5, 4, 4	Unknown
139	4.33	0.58	A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels	4, 5, 4	Unknown
140	4.33	0.58	Karenina: Modeling the Complexity of Negative Emotions to Better Serve Industry Goals	5, 4, 4	Unknown
141	4.33	0.58	DermX: a Dermatological Diagnosis Explainability Dataset	5, 4, 4	Unknown
142	4.33	0.58	CatLC: Catalonia Multiresolution Land Cover Dataset	5, 4, 4	Unknown
143	4.33	0.58	Modeling and Optimizing Laser-Induced Graphene	4, 5, 4	Unknown
144	4.33	1.15	Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization	3, 5, 5	Unknown
145	4.33	1.53	MAIN: A Multi-agent Indoor Navigation Benchmark for Cooperative Learning	6, 4, 3	Unknown
146	4.33	1.53	High-Dimensional Gene Expression and Morphology Profiles of Cells across 28,000 Genetic and Chemical Perturbations	6, 4, 3	Unknown
147	4	0	Three million images and morphological profiles of cells treated with matched chemical and genetic perturbations	4, 4, 4	Unknown
148	4	1	Reconstruction benchmark for obfuscated representations	4, 3, 5	Unknown
149	4	1	Multi-Domain Conditional Image Translation: Translating Driving Datasets from Clear-Weather to Adverse Conditions	5, 3, 4	Unknown
150	4	1	A Dataset for Label Aggregation from Crowdsourced Triplet Similarity Comparisons	4, 3, 5	Unknown
151	4	1	NAS-Bench-360: Benchmarking Diverse Tasks for Neural Architecture Search	3, 4, 5	Unknown
152	4	2	HypeML: A Health AI Promotional Language Dataset	6, 4, 2	Unknown
153	3.67	0.58	Deep Credal Neural Network: Characterization of Imprecision Between Categories	3, 4, 4	Unknown
154	3.67	1.15	Using Dynamic Neural Networks to Model the Speed-Accuracy Trade-Off in People	5, 3, 3	Unknown
155	3.33	0.58	Rail-5k: a Real-World Dataset for Rail Surface Defects Detection	4, 3, 3	Unknown
156	3.33	0.58	New Zealand Open Environmental Science Data sets	3, 4, 3	Unknown
157	3.33	0.58	Multi-Dataset Benchmarks for Masked Identification using Contrastive Representation Learning	4, 3, 3	Unknown
158	3	1	“There will be consequences!” Datasets and Benchmarks towards Causal Relation Extraction and Societal Event Forecasting	3, 2, 4	Unknown
159	2.67	0.58	EEG Thinking1 Datasets: Think-Count-Recall (TCR) and Read-Write-Type (RWT)	3, 3, 2	Unknown
160	2.67	1.53	Indian-CT, Dataset and Analysis of Chest CT Scans of COVID-19 Patients Using Light-weight CNN	4, 3, 1	Unknown
161	2	0	SkillBERT: "Skilling" the BERT to classify skills using Electronic Recruitment Records	2, 2, 2	Unknown
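The AvgRating and STDRating columns can be reproduced from the Ratings column. The listed values match the sample standard deviation (n − 1 denominator): for "9, 9, 9, 7" the mean is 8.5 and the sample standard deviation is 1. Ties in AvgRating appear to be broken by the lower STDRating first. The sketch below is built on those observations, and `rank_papers` is a hypothetical helper, not code from this repository.

```python
from statistics import mean, stdev
from typing import List, Tuple


def rank_papers(papers: List[Tuple[str, List[int]]]):
    """Return (rank, title, avg, std, ratings) rows sorted like the table.

    STDRating uses the sample standard deviation; a single rating gets 0.0
    because stdev() needs at least two data points.
    """
    rows = []
    for title, ratings in papers:
        avg = mean(ratings)
        std = stdev(ratings) if len(ratings) > 1 else 0.0
        rows.append((title, round(avg, 2), round(std, 2), ratings))
    # Higher average first; among equal averages, lower spread first.
    rows.sort(key=lambda row: (-row[1], row[2]))
    return [(rank, *row) for rank, row in enumerate(rows, start=1)]


for row in rank_papers([
    ("ATOM3D", [9, 9, 7]),
    ("RadGraph", [9, 9, 9, 7]),
    ("Pervasive Label Errors", [7, 10, 8]),
]):
    print(row)
```

With these three rows, the ordering and the rounded values come out the same as ranks 1 to 3 above.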
