MoMA Statistics Exploration

A comprehensive statistical analysis and interactive data visualization project exploring gender-based patterns in artwork acquisitions at the Museum of Modern Art (MoMA). This project combines statistical hypothesis testing with interactive web visualizations to investigate whether gender disparities in museum acquisitions are statistically significant or due to random chance.

Live Site: https://a-partha.github.io/MoMA-Stats-Exploration/home.html

📊 Project Overview

This project performs a rigorous statistical investigation into gender-based patterns in MoMA's artwork collection. Through interactive visualizations and hypothesis testing, it addresses critical questions:

Are male artists consistently overrepresented in MoMA's acquisitions across decades?
Is this pattern statistically significant, or could it be attributed to random chance?
Does gender influence which art department an artist's work is categorized into?

The analysis combines descriptive statistics with inferential statistical tests (t-tests and chi-square tests) to provide evidence-based answers to these questions.

🛠️ Technologies Used

Frontend Technologies

HTML5 - Structure and semantic markup
CSS3 - Responsive styling with modern layout techniques (Flexbox, media queries)
JavaScript (ES6+) - Interactive functionality and data manipulation

Data Visualization Libraries

D3.js (v7) - Core visualization library for creating interactive bar charts
- d3.scaleBand() - Categorical scaling for decades
- d3.scaleLog() - Logarithmic scaling for count data
- d3.rollup() - Data aggregation by decade
- d3.select() and d3.selectAll() - DOM manipulation and data binding
- d3.transition() - Smooth animations for bar chart updates

Data Processing

CSV Data - Cleaned MoMA artwork dataset (Artworks_clean.csv)
Data Cleaning - Performed in Google Colab (Python/Pandas) before analysis

Design & Styling

Google Fonts - Custom typography (Hind Madurai, Roboto Mono, Cantarell, Dela Gothic One, Ysabeau Office)
Responsive Design - Mobile-first approach with CSS media queries
Interactive Elements - Hover effects, tooltips, and dynamic dropdown menus

📈 Statistical Methods

1. Descriptive Statistics

Data Aggregation

Decade Grouping: Artworks grouped by decade of acquisition (1920s-2020s)
Count Aggregation: Using d3.rollup() to count artworks per decade
Gender Categorization: Binary classification (Male vs. Non-male, which includes female and non-binary artists)
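
The aggregation steps above can be sketched in plain JavaScript. This is a minimal illustration and not the code in page1.js: the field names `year` and `gender` and the sample rows are assumptions, and a `Map` stands in for `d3.rollup()`, which returns the same kind of key-to-count mapping.

```javascript
// Hypothetical rows; the real column names in Artworks_clean.csv may differ.
const rows = [
  { year: 1954, gender: "Male" },
  { year: 1958, gender: "Female" },
  { year: 1961, gender: "Male" },
  { year: 1967, gender: "Non-Binary" },
  { year: 1969, gender: "Male" },
];

// Map an acquisition year to its decade label, e.g. 1954 -> "1950s".
function toDecade(year) {
  return `${Math.floor(year / 10) * 10}s`;
}

// Collapse gender into the binary grouping used in the analysis.
function toGroup(gender) {
  return gender === "Male" ? "Male" : "Non-male";
}

// Count artworks per decade and group (a plain-Map stand-in for d3.rollup).
function countByDecade(data) {
  const counts = new Map();
  for (const row of data) {
    const decade = toDecade(row.year);
    if (!counts.has(decade)) counts.set(decade, { Male: 0, "Non-male": 0 });
    counts.get(decade)[toGroup(row.gender)] += 1;
  }
  return counts;
}

const decadeCounts = countByDecade(rows);
console.log(decadeCounts.get("1950s")); // { Male: 1, 'Non-male': 1 }
console.log(decadeCounts.get("1960s")); // { Male: 2, 'Non-male': 1 }
```

With D3 loaded, a per-decade total could be obtained in one call along the lines of `d3.rollup(rows, v => v.length, d => toDecade(d.year))`.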

Data Filtering

Only one artwork per artist considered
Collaborations excluded
Missing acquisition dates or gender data excluded
Art departments with fewer than 100 artworks excluded (Film, Fluxus Collection)
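
The exclusion rules above can be expressed as a single filter pass. The sketch below is illustrative only: the actual cleaning was done in Python/Pandas, and the field names (`artist`, `gender`, `dateAcquired`, `department`, `isCollaboration`) are hypothetical. It also assumes the first artwork encountered for each artist is the one kept, which the source does not specify, and it excludes the two small departments by name rather than by counting.

```javascript
// Hypothetical records; field names and the collaboration flag are assumptions.
const records = [
  { artist: "A", gender: "Male", dateAcquired: "1964-03-01", department: "Photography", isCollaboration: false },
  { artist: "A", gender: "Male", dateAcquired: "1971-05-12", department: "Photography", isCollaboration: false },
  { artist: "B", gender: "", dateAcquired: "1980-01-01", department: "Photography", isCollaboration: false },
  { artist: "C", gender: "Female", dateAcquired: "", department: "Drawings & Prints", isCollaboration: false },
  { artist: "D", gender: "Female", dateAcquired: "1999-09-09", department: "Film", isCollaboration: false },
  { artist: "E", gender: "Male", dateAcquired: "2005-02-02", department: "Drawings & Prints", isCollaboration: true },
  { artist: "F", gender: "Female", dateAcquired: "2011-07-07", department: "Drawings & Prints", isCollaboration: false },
];

// Departments dropped for having fewer than 100 artworks.
const EXCLUDED_DEPARTMENTS = new Set(["Film", "Fluxus Collection"]);

function applyFilters(data) {
  const seenArtists = new Set();
  return data.filter((r) => {
    if (r.isCollaboration) return false;               // collaborations excluded
    if (!r.gender || !r.dateAcquired) return false;    // missing gender or date
    if (EXCLUDED_DEPARTMENTS.has(r.department)) return false;
    if (seenArtists.has(r.artist)) return false;       // one artwork per artist
    seenArtists.add(r.artist);
    return true;
  });
}

const kept = applyFilters(records);
console.log(kept.map((r) => r.artist)); // [ 'A', 'F' ]
```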

2. Independent Samples t-Tests

Purpose: Test whether the mean number of acquisitions differs significantly between male and non-male artists within each tested decade.

Methodology:

F-Test for Variance Equality (Pre-test)
- Null Hypothesis (H₀): Variances of male and non-male artwork counts are equal
- Significance Level: α = 0.05
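- Decision Rule (the F comparison and the p-value comparison are two ways of stating the same criterion):
  - If F > F-critical (one-tail), equivalently p-value < 0.05 → Reject H₀ → Use unequal variance t-test (Welch's t-test)
  - If F ≤ F-critical, equivalently p-value ≥ 0.05 → Fail to reject H₀ → Use equal variance t-test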
Independent Samples t-Test
- Null Hypothesis (H₀): There is no difference between the mean number of acquisitions from male and non-male artists
- Alternative Hypothesis (H₁): There is a significant difference
- Tested Decades: 1950s, 1960s, 1970s, 1990s, 2000s, 2010s
- Significance Level: α = 0.05
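
The two-step procedure above (F pre-test, then the matching t-test) can be sketched as follows. The sample arrays are invented for illustration and are not MoMA data, and the critical value 6.39 is the tabulated one-tail 5% F value for (4, 4) degrees of freedom, passed in by hand. The sketch computes only the test statistics and degrees of freedom; turning t into a p-value requires a t-distribution CDF, which is omitted here.

```javascript
// Sample mean and unbiased sample variance (n - 1 denominator).
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}
function variance(xs) {
  const m = mean(xs);
  return xs.reduce((a, x) => a + (x - m) ** 2, 0) / (xs.length - 1);
}

// F ratio with the larger variance in the numerator.
function fRatio(a, b) {
  const va = variance(a);
  const vb = variance(b);
  return Math.max(va, vb) / Math.min(va, vb);
}

// Equal-variance (pooled) t statistic.
function pooledT(a, b) {
  const df = a.length + b.length - 2;
  const sp2 = ((a.length - 1) * variance(a) + (b.length - 1) * variance(b)) / df;
  const t = (mean(a) - mean(b)) / Math.sqrt(sp2 * (1 / a.length + 1 / b.length));
  return { t, df };
}

// Welch's t statistic with Welch-Satterthwaite degrees of freedom.
function welchT(a, b) {
  const sa = variance(a) / a.length;
  const sb = variance(b) / b.length;
  const t = (mean(a) - mean(b)) / Math.sqrt(sa + sb);
  const df = (sa + sb) ** 2 / (sa ** 2 / (a.length - 1) + sb ** 2 / (b.length - 1));
  return { t, df };
}

// Pick the t-test according to the F pre-test decision rule.
function tTest(a, b, fCritical) {
  return fRatio(a, b) > fCritical
    ? { test: "welch", ...welchT(a, b) }
    : { test: "pooled", ...pooledT(a, b) };
}

// Invented counts for two groups (hypothetical, for illustration only).
const maleCounts = [10, 12, 14, 16, 18];
const nonMaleCounts = [1, 2, 3, 4, 5];

const result = tTest(maleCounts, nonMaleCounts, 6.39);
console.log(result.test, result.t.toFixed(3), result.df); // pooled 6.957 8
```

Here the F ratio is 4, below the critical value, so the pooled test is selected.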

Results Summary:

| Decade | t-value | p-value (one-tail) | p-value (two-tail) | Conclusion |
| --- | --- | --- | --- | --- |
| 1950s | 5.05 | 0.00 | 0.00 | Reject H₀ |
| 1960s | 7.11 | 0.00 | 0.00 | Reject H₀ |
| 1970s | 9.69 | 0.00 | 0.00 | Reject H₀ |
| 1990s | 7.19 | 0.00 | 0.00 | Reject H₀ |
| 2000s | 2.29 | 0.02 | 0.04 | Reject H₀ |
| 2010s | 4.62 | 0.00 | 0.00 | Reject H₀ |

Key Finding: All six tested decades showed statistically significant differences (p < 0.05), indicating that the gender disparity is unlikely to be due to random chance alone.

3. Chi-Square Test of Independence

Purpose: Test whether artist gender and art department classification are statistically independent.

Methodology:

Contingency Table Analysis
- Observed Frequencies: Cross-tabulation of gender (Male, Non-male) × department (5 categories)
- Expected Frequencies: Calculated under the assumption of independence
- Standardized Residuals: Computed to identify which cells contribute most to the dependency
Chi-Square Test
- Null Hypothesis (H₀): Artist gender and art department are statistically independent
- Alternative Hypothesis (H₁): The two variables are dependent
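- Test Statistic: χ² = 189.42
- Degrees of Freedom: (2 − 1) × (5 − 1) = 4
- p-value: ≈ 0 (highly significant; the statistic far exceeds the 5% critical value of 9.49 for 4 degrees of freedom)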
- Significance Level: α = 0.05
Residual Analysis
- Standardized Residuals calculated for each cell
- Threshold: Absolute values > 2 indicate significant deviation from expected
- Strong Threshold: Absolute values > 3 indicate very strong deviation

Results Summary:

Observed Frequencies:

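| Gender | Architecture & Design | Drawings & Prints | Media & Performance | Painting & Sculpture | Photography | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Male | 1,709 | 3,698 | 272 | 784 | 1,536 | 7,999 |
| Non-male | 231 | 960 | 174 | 161 | 405 | 1,931 |
| Total | 1,940 | 4,658 | 446 | 945 | 1,941 | 9,930 |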

Key Findings from Residual Analysis:

Males: Significantly overrepresented in Architecture & Design (standardized residual = 9.35), underrepresented in Drawings & Prints (-2.75) and Media & Performance (-10.68)
Non-males: Significantly overrepresented in Drawings & Prints (2.75) and Media & Performance (10.68), underrepresented in Architecture & Design (-9.35)

Conclusion: The chi-square test strongly rejects the null hypothesis (χ² = 189.42, p ≈ 0), providing evidence that gender and art department are statistically dependent.
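
The reported statistic can be reproduced from the observed frequencies table. The sketch below is not the code in chisqpage.js; it recomputes expected counts, the χ² statistic, and residuals directly from the observed counts. The reported residuals appear to be adjusted standardized residuals (each cell's deviation divided by its estimated standard error under independence), since that formula reproduces the published values and the mirrored signs between the two rows.

```javascript
// Observed counts from the table above, in column order:
// Architecture & Design, Drawings & Prints, Media & Performance,
// Painting & Sculpture, Photography.
const observed = {
  Male: [1709, 3698, 272, 784, 1536],
  "Non-male": [231, 960, 174, 161, 405],
};

function chiSquare(table) {
  const rowNames = Object.keys(table);
  const nCols = table[rowNames[0]].length;
  const rowTotals = rowNames.map((r) => table[r].reduce((a, b) => a + b, 0));
  const colTotals = Array.from({ length: nCols }, (_, j) =>
    rowNames.reduce((a, r) => a + table[r][j], 0)
  );
  const n = rowTotals.reduce((a, b) => a + b, 0);

  let statistic = 0;
  const residuals = {};
  rowNames.forEach((r, i) => {
    residuals[r] = table[r].map((o, j) => {
      // Expected count under independence: row total * column total / n.
      const e = (rowTotals[i] * colTotals[j]) / n;
      statistic += (o - e) ** 2 / e;
      // Adjusted standardized residual.
      return (o - e) / Math.sqrt(e * (1 - rowTotals[i] / n) * (1 - colTotals[j] / n));
    });
  });

  const df = (rowNames.length - 1) * (nCols - 1);
  return { statistic, df, residuals };
}

const chi = chiSquare(observed);
console.log(chi.statistic.toFixed(2), chi.df); // 189.42 4
console.log(chi.residuals.Male.map((r) => r.toFixed(2)));
// [ '9.35', '-2.75', '-10.68', '1.97', '-1.76' ]
```

The first three male residuals match the values quoted in the residual analysis; the non-male row carries the same magnitudes with opposite signs.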

📁 Project Structure

MoMA-Stats-Exploration/
│
├── home.html              # Landing page
├── page1.html             # Interactive bar chart visualization
├── ttestpage.html         # T-test results and analysis
├── chisqpage.html         # Chi-square test results and analysis
├── ack.html               # References and acknowledgments
│
├── home.css               # Landing page styles
├── page1.css              # Chart page styles
├── page1.js               # D3.js visualization logic
├── ttestpage.css          # T-test page styles
├── ttestpage.js           # T-test table generation
├── chisqpage.css          # Chi-square page styles
├── chisqpage.js           # Chi-square table generation
├── ack.css                # References page styles
│
├── Artworks_clean.csv     # Cleaned MoMA dataset
├── d3.js                  # D3.js library (v7)
├── d3.min.js              # Minified D3.js
├── building.jpg           # Background image
└── README.md              # This file

✨ Features

Interactive Visualizations

Dynamic Bar Chart (page1.html)
- Logarithmic y-axis scaling for better visualization of count data across decades
- Interactive tooltips showing exact counts on hover
- Dual dropdown menus for filtering:
  - Variable selector: Gender or Art Department
  - Type selector: Specific category (all, male, female, etc.)
- Smooth animated transitions when filtering
- Responsive design that adapts to screen size
Interactive Tables
- T-test Results Table: Displays t-values and p-values for each decade
- Chi-square Tables: Three tables showing:
  - Observed frequencies
  - Expected frequencies
  - Standardized residuals
- Hover effects highlighting table rows and cells
- Responsive horizontal scrolling on mobile devices

Statistical Analysis Features

Hypothesis Testing Documentation: Clear explanation of null hypotheses, test procedures, and interpretations
Residual Analysis: Detailed breakdown of which cells contribute most to statistical dependency
Multiple Test Scenarios: Analysis across different decades and categories

User Experience

Responsive Design: Works seamlessly on desktop, tablet, and mobile devices
Intuitive Navigation: Clear page flow with "Next" buttons guiding users through the analysis
Aesthetic Design: Custom color schemes and typography for each page
Accessibility: Semantic HTML and proper viewport meta tags

🔍 Key Insights

1. Gender Disparity Across Decades

Finding: Male artists consistently have more artworks acquired by MoMA than non-male artists in all six decades tested (1950s-1970s and 1990s-2010s).

Statistical Evidence:

All t-tests across six decades showed statistically significant differences (p < 0.05)
The pattern persisted in both "normal" decades and decades with extreme disparities (1960s, 2000s)
This suggests the disparity is systematic rather than random

2. Departmental Gender Segregation

Finding: Artist gender is associated with the art department into which an artist's work is categorized.

Statistical Evidence:

Chi-square test: χ² = 189.42, p ≈ 0 (highly significant)
Strong standardized residuals (|r| > 3) indicate:
- Males: Overrepresented in Architecture & Design (r = 9.35)
- Males: Underrepresented in Media & Performance (r = -10.68)
- Non-males: Overrepresented in Media & Performance (r = 10.68)
- Non-males: Underrepresented in Architecture & Design (r = -9.35)

Implications: The relationship between gender and department classification is not coincidental but reflects systematic patterns in how artworks are categorized or acquired.

3. Persistence Across Time

Finding: The gender disparity persists across multiple decades, suggesting it's not a temporary phenomenon tied to specific historical periods.

Evidence: Consistent significant results in all six tested decades between the 1950s and the 2010s, a span of over 60 years of acquisitions.

🚀 How to Run

Local Development

Clone or Download the repository

git clone https://github.com/a-partha/MoMA-Stats-Exploration.git
cd MoMA-Stats-Exploration

Start a Local Server

Since the project uses D3.js to load CSV data, you need a local server (browsers block local file access for security):

Option A: Python
```
# Python 3
python -m http.server 8000

# Python 2
python -m SimpleHTTPServer 8000
```
Option B: Node.js (http-server)
```
npm install -g http-server
http-server -p 8000
```
Option C: VS Code Live Server
- Install the "Live Server" extension
- Right-click on home.html → "Open with Live Server"
Access the Site
- Open your browser and navigate to: http://localhost:8000/home.html
- Or use the port specified by your server

File Requirements

Artworks_clean.csv must be in the same directory as the HTML files
d3.js or d3.min.js must be present (included in the repository)

📚 Data Sources & References

Data

MoMA Collection Dataset: Museum of Modern Art artwork acquisition data
Data Cleaning: Performed in Google Colab using Python/Pandas
Data Cleaning Reference: Interactive Data Visualization Fall 2023 - MoMA Data Cleanup

Course Materials

DATA 73200 - Interactive Data Visualization (53908): Taught by Prof. Ellie Frymire at The Graduate Center of CUNY, Fall 2023
DATA 71000 - Data Analysis Methods (53244): Taught by Prof. Howard Everson at The Graduate Center of CUNY, Fall 2023

Tools & Resources

D3.js: https://d3js.org/
Google Fonts: https://fonts.google.com/
Cover Image: Museum of Modern Art, NYC

🎓 Statistical Methodology Details

Why Logarithmic Scale?

The y-axis uses a logarithmic scale because:

Artwork counts span a wide range (from single digits to thousands)
Logarithmic scaling prevents smaller values from being visually compressed
Makes it easier to compare trends across different magnitude ranges
Maintains readability while showing the full data range
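
The effect can be seen by comparing bar heights under a linear and a logarithmic mapping. The sketch below reimplements both mappings in plain JavaScript so it runs without D3; the chart height of 400 px and the domain of 1 to 10,000 are assumptions for illustration, not values taken from page1.js.

```javascript
// Minimal linear and log scales: map a domain [d0, d1] onto a range [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return (v) => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
}
function logScale([d0, d1], [r0, r1]) {
  const span = Math.log(d1) - Math.log(d0);
  return (v) => r0 + ((Math.log(v) - Math.log(d0)) / span) * (r1 - r0);
}

const chartHeight = 400;       // hypothetical chart height in pixels
const domain = [1, 10000];     // hypothetical count range

// SVG y grows downward, so the range runs from chartHeight to 0.
const yLinear = linearScale(domain, [chartHeight, 0]);
const yLog = logScale(domain, [chartHeight, 0]);

// Bar height in pixels for a given count.
const barHeights = [5, 50, 5000].map((count) => ({
  count,
  linear: Math.round(chartHeight - yLinear(count)),
  log: Math.round(chartHeight - yLog(count)),
}));
console.log(barHeights);
// [ { count: 5, linear: 0, log: 70 },
//   { count: 50, linear: 2, log: 170 },
//   { count: 5000, linear: 200, log: 370 } ]
```

On the linear axis the bars for 5 and 50 are nearly invisible, while on the log axis each tenfold increase adds the same 100 px. One trade-off is that a log scale cannot represent a count of zero, so the domain has to start at a positive value.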

Why Multiple Decades for t-Tests?

Testing multiple decades allows us to:

Determine if disparities are consistent across time
Identify whether patterns are specific to certain periods
Provide stronger evidence through replication
Rule out decade-specific historical factors

Why Residual Analysis?

After finding dependency (chi-square test), residual analysis helps:

Identify which specific cells contribute most to the dependency
Understand the direction of the relationship (over/under-representation)
Provide actionable insights beyond simple "dependent/independent" conclusions

📝 Notes & Limitations

Data Limitations

Only one artwork per artist was considered (prevents overrepresentation of prolific artists)
Non-binary artists: Only 3 in the entire dataset (too small for statistical analysis)
Missing data: Collaborations and entries with missing dates/gender were excluded
Small departments: Film and Fluxus Collection excluded (< 100 artworks each)

Statistical Limitations

Binary Gender Classification: Non-male category combines female and non-binary (due to small sample size)
Decade Aggregation: Loses granularity of year-by-year trends
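Multiple Comparisons: Running six t-tests at α = 0.05 each increases the family-wise risk of a Type I error. Five of the six p-values are reported as 0.00, but the 2000s result (two-tail p = 0.04) is only marginally significant and no correction was applied.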
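
One simple way to gauge the multiple-comparisons concern is a Bonferroni correction, which was not part of the original analysis. In the sketch below, the two-tail p-values reported as 0.00 are entered as 0.005, the largest value that rounds to 0.00; that is an assumption about how the results were rounded, and the exact p-values are not available here.

```javascript
const alpha = 0.05;

// Two-tail p-values from the t-test results table.
// 0.005 is a stand-in upper bound for values reported as 0.00 (assumption).
const twoTailP = {
  "1950s": 0.005,
  "1960s": 0.005,
  "1970s": 0.005,
  "1990s": 0.005,
  "2000s": 0.04,
  "2010s": 0.005,
};

// Bonferroni: compare each p-value against alpha divided by the number of tests.
const numTests = Object.keys(twoTailP).length;
const threshold = alpha / numTests;

const survives = Object.fromEntries(
  Object.entries(twoTailP).map(([decade, p]) => [decade, p < threshold])
);

console.log(threshold.toFixed(4)); // 0.0083
console.log(survives["2000s"]);    // false
console.log(survives["1970s"]);    // true
```

Under this correction the 2000s result would no longer count as significant, while the other five decades would remain significant if their true p-values are at or below the assumed bound.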

Future Enhancements

Year-by-year analysis instead of decade aggregation
Separate analysis for female vs. non-binary artists (when sample size allows)
Multivariate analysis controlling for other factors (artist nationality, time period, etc.)
Interactive exploration of other demographic variables

👤 Author

Aniruddha Partha

This project was completed as part of the Interactive Data Visualization course (DATA 73200) at The Graduate Center of CUNY, Fall 2023.

📄 License

This project is for educational and research purposes. The MoMA dataset usage is subject to MoMA's terms of use.

🔗 Live Site

Access the interactive visualization: https://a-partha.github.io/MoMA-Stats-Exploration/home.html

Last Updated: 2024
