MoMA Statistics Exploration

A comprehensive statistical analysis and interactive data visualization project exploring gender-based patterns in artwork acquisitions at the Museum of Modern Art (MoMA). This project combines statistical hypothesis testing with interactive web visualizations to investigate whether gender disparities in museum acquisitions are statistically significant or due to random chance.

Live Site: https://a-partha.github.io/MoMA-Stats-Exploration/home.html


📊 Project Overview

This project performs a rigorous statistical investigation into gender-based patterns in MoMA's artwork collection. Through interactive visualizations and hypothesis testing, it addresses critical questions:

  • Are male artists consistently overrepresented in MoMA's acquisitions across decades?
  • Is this pattern statistically significant, or could it be attributed to random chance?
  • Does gender influence which art department an artist's work is categorized into?

The analysis combines descriptive statistics with inferential statistical tests (t-tests and chi-square tests) to provide evidence-based answers to these questions.


🛠️ Technologies Used

Frontend Technologies

  • HTML5 - Structure and semantic markup
  • CSS3 - Responsive styling with modern layout techniques (Flexbox, media queries)
  • JavaScript (ES6+) - Interactive functionality and data manipulation

Data Visualization Libraries

  • D3.js (v7) - Core visualization library for creating interactive bar charts
    • d3.scaleBand() - Categorical scaling for decades
    • d3.scaleLog() - Logarithmic scaling for count data
    • d3.rollup() - Data aggregation by decade
    • d3.select() and d3.selectAll() - DOM manipulation and data binding
    • d3.transition() - Smooth animations for bar chart updates

Data Processing

  • CSV Data - Cleaned MoMA artwork dataset (Artworks_clean.csv)
  • Data Cleaning - Performed in Google Colab (Python/Pandas) before analysis

Design & Styling

  • Google Fonts - Custom typography (Hind Madurai, Roboto Mono, Cantarell, Dela Gothic One, Ysabeau Office)
  • Responsive Design - Mobile-first approach with CSS media queries
  • Interactive Elements - Hover effects, tooltips, and dynamic dropdown menus

📈 Statistical Methods

1. Descriptive Statistics

Data Aggregation

  • Decade Grouping: Artworks grouped by decade of acquisition (1920s-2020s)
  • Count Aggregation: Using d3.rollup() to count artworks per decade
  • Gender Categorization: Binary classification (Male vs. Non-male, which includes female and non-binary artists)
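The aggregation steps above can be sketched in plain JavaScript. This is a minimal illustration, not the project's code: the field names `year` and `gender` and the sample rows are hypothetical, since the actual columns of Artworks_clean.csv are not listed here. With D3 loaded, the same per-decade count could be written as `d3.rollup(rows, v => v.length, d => toDecade(d.year))`.

```javascript
// Hypothetical rows; the real column names in Artworks_clean.csv may differ.
const rows = [
  { year: 1953, gender: "Male" },
  { year: 1958, gender: "Female" },
  { year: 1961, gender: "Male" },
  { year: 1967, gender: "Male" },
  { year: 2014, gender: "Non-binary" },
];

// Map an acquisition year to its decade label, e.g. 1958 -> "1950s".
const toDecade = (year) => `${Math.floor(year / 10) * 10}s`;

// Collapse gender into the two categories used in the analysis.
const toGenderGroup = (gender) => (gender === "Male" ? "Male" : "Non-male");

// Count items per key and return a Map, the same shape d3.rollup produces
// when the reducer is v => v.length.
function countBy(items, keyFn) {
  const counts = new Map();
  for (const item of items) {
    const key = keyFn(item);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

const perDecade = countBy(rows, (d) => toDecade(d.year));
const perGroup = countBy(rows, (d) => toGenderGroup(d.gender));
```

A Map keeps insertion order, so decades appear in the order they are first seen; sort the rows by year first if the chart needs chronological order.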

Data Filtering

  • Only one artwork per artist considered
  • Collaborations excluded
  • Missing acquisition dates or gender data excluded
  • Art departments with fewer than 100 artworks excluded (Film, Fluxus Collection)
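The project did its cleaning in Python/Pandas, so the following JavaScript is only a sketch of the same filtering rules, written in the language of the rest of the repository. All field names (`artist`, `isCollaboration`, `year`, `gender`, `department`) are hypothetical, and keeping the first artwork encountered for each artist is an assumption; the README does not say which artwork was retained.

```javascript
// Departments listed above as excluded for having fewer than 100 artworks.
const EXCLUDED_DEPARTMENTS = new Set(["Film", "Fluxus Collection"]);

// Apply the four filtering rules in one pass.
function cleanRows(rows) {
  const seenArtists = new Set();
  return rows.filter((d) => {
    if (d.isCollaboration) return false;                      // collaborations excluded
    if (d.year == null || !d.gender) return false;            // missing date or gender
    if (EXCLUDED_DEPARTMENTS.has(d.department)) return false; // small departments
    if (seenArtists.has(d.artist)) return false;              // one artwork per artist (first seen, an assumption)
    seenArtists.add(d.artist);
    return true;
  });
}

// Made-up sample rows, one per rule.
const sample = [
  { artist: "A", isCollaboration: false, year: 1955, gender: "Male", department: "Photography" },
  { artist: "A", isCollaboration: false, year: 1960, gender: "Male", department: "Photography" },
  { artist: "B", isCollaboration: true, year: 1970, gender: "Female", department: "Drawings & Prints" },
  { artist: "C", isCollaboration: false, year: null, gender: "Female", department: "Painting & Sculpture" },
  { artist: "D", isCollaboration: false, year: 1999, gender: "Female", department: "Film" },
  { artist: "E", isCollaboration: false, year: 2005, gender: "Female", department: "Media & Performance" },
];

const cleaned = cleanRows(sample);
```

Only the first row for artist A and the row for artist E pass all four rules.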

2. Independent Samples t-Tests

Purpose: Test whether the mean number of acquisitions differs significantly between male and non-male artists across different decades.

Methodology:

  1. F-Test for Variance Equality (Pre-test)

    • Null Hypothesis (H₀): Variances of male and non-male artwork counts are equal
    • Significance Level: α = 0.05
    • Decision Rule:
      • If F > F-critical (one-tail) AND p-value < 0.05 → Reject H₀ → Use unequal variance t-test (Welch's t-test)
      • If F ≤ F-critical AND p-value ≥ 0.05 → Fail to reject H₀ → Use equal variance t-test
  2. Independent Samples t-Test

    • Null Hypothesis (H₀): There is no difference between the mean number of acquisitions from male and non-male artists
    • Alternative Hypothesis (H₁): The mean number of acquisitions differs between male and non-male artists
    • Tested Decades: 1950s, 1960s, 1970s, 1990s, 2000s, 2010s
    • Significance Level: α = 0.05
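The two-step procedure above relies on three statistics: the F ratio of the sample variances, the pooled (equal-variance) t statistic, and Welch's t statistic. The sketch below computes them in plain JavaScript. The README does not say what the individual observations within a decade are, so the two arrays are invented counts used purely for illustration, and putting the larger variance in the numerator of F is one common convention rather than a confirmed detail of this analysis. Critical values and p-values need the F and t distributions and are not computed here.

```javascript
const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

// Unbiased sample variance (divides by n - 1).
function variance(xs) {
  const m = mean(xs);
  return xs.reduce((acc, x) => acc + (x - m) ** 2, 0) / (xs.length - 1);
}

// F ratio for the variance pre-test: larger variance over smaller (assumed convention).
function fRatio(a, b) {
  const va = variance(a);
  const vb = variance(b);
  return Math.max(va, vb) / Math.min(va, vb);
}

// Equal-variance t statistic with df = n1 + n2 - 2.
function pooledT(a, b) {
  const df = a.length + b.length - 2;
  const pooledVar = ((a.length - 1) * variance(a) + (b.length - 1) * variance(b)) / df;
  const t = (mean(a) - mean(b)) / Math.sqrt(pooledVar * (1 / a.length + 1 / b.length));
  return { t, df };
}

// Welch's t statistic with Welch-Satterthwaite degrees of freedom.
function welchT(a, b) {
  const sa = variance(a) / a.length;
  const sb = variance(b) / b.length;
  const t = (mean(a) - mean(b)) / Math.sqrt(sa + sb);
  const df = (sa + sb) ** 2 / (sa ** 2 / (a.length - 1) + sb ** 2 / (b.length - 1));
  return { t, df };
}

// Invented data, NOT taken from the MoMA dataset.
const male = [10, 12, 14, 16, 18];
const nonMale = [2, 3, 4, 5, 6];

const f = fRatio(male, nonMale);
const pooled = pooledT(male, nonMale);
const welch = welchT(male, nonMale);
```

With equal group sizes the pooled and Welch t statistics coincide; only the degrees of freedom differ, which is what changes the p-value between the two tests.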

Results Summary:

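| Decade | t-value | p-value (one-tail) | p-value (two-tail) | Conclusion |
|--------|---------|--------------------|--------------------|------------|
| 1950s | 5.05 | 0.00 | 0.00 | Reject H₀ |
| 1960s | 7.11 | 0.00 | 0.00 | Reject H₀ |
| 1970s | 9.69 | 0.00 | 0.00 | Reject H₀ |
| 1990s | 7.19 | 0.00 | 0.00 | Reject H₀ |
| 2000s | 2.29 | 0.02 | 0.04 | Reject H₀ |
| 2010s | 4.62 | 0.00 | 0.00 | Reject H₀ |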

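Key Finding: All six tested decades showed statistically significant differences (p < 0.05), so the gender disparity in each is unlikely to be due to random chance alone. The 2000s result is the weakest (two-tail p = 0.04); the other p-values round to 0.00.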

3. Chi-Square Test of Independence

Purpose: Test whether artist gender and art department classification are statistically independent.

Methodology:

  1. Contingency Table Analysis

    • Observed Frequencies: Cross-tabulation of gender (Male, Non-male) × department (5 categories)
    • Expected Frequencies: Calculated under the assumption of independence
    • Standardized Residuals: Computed to identify which cells contribute most to the dependency
  2. Chi-Square Test

    • Null Hypothesis (Hβ‚€): Artist gender and art department are statistically independent
    • Alternative Hypothesis (H₁): The two variables are dependent
    • Test Statistic: χ² = 189.42, with (2 − 1) × (5 − 1) = 4 degrees of freedom
    • p-value: ≈ 0 (highly significant)
    • Significance Level: α = 0.05
  3. Residual Analysis

    • Standardized Residuals calculated for each cell
    • Threshold: Absolute values > 2 indicate significant deviation from expected
    • Strong Threshold: Absolute values > 3 indicate very strong deviation
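As a check on the methodology above, the test statistic can be recomputed from the observed counts given in the Observed Frequencies table of this README. The sketch below is plain JavaScript and is not taken from chisqpage.js; the closed-form upper-tail probability it uses is valid only for an even number of degrees of freedom, which holds here (df = 4).

```javascript
// Observed counts from the Observed Frequencies table.
// Columns: Architecture & Design, Drawings & Prints, Media & Performance,
// Painting & Sculpture, Photography.
const observed = [
  [1709, 3698, 272, 784, 1536], // Male
  [231, 960, 174, 161, 405],    // Non-male
];

const rowTotals = observed.map((row) => row.reduce((a, b) => a + b, 0));
const colTotals = observed[0].map((_, j) => observed.reduce((acc, row) => acc + row[j], 0));
const grandTotal = rowTotals.reduce((a, b) => a + b, 0);

// Expected count under independence: rowTotal * colTotal / grandTotal.
const expected = rowTotals.map((r) => colTotals.map((c) => (r * c) / grandTotal));

// Pearson chi-square: sum of (O - E)^2 / E over all cells.
let chiSquare = 0;
observed.forEach((row, i) =>
  row.forEach((o, j) => {
    chiSquare += (o - expected[i][j]) ** 2 / expected[i][j];
  })
);

// Degrees of freedom: (rows - 1) * (columns - 1) = 4.
const df = (observed.length - 1) * (observed[0].length - 1);

// Upper-tail probability of a chi-square variable with EVEN df:
// exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
function chiSquareUpperTail(x, evenDf) {
  let term = 1;
  let sum = 1;
  for (let k = 1; k < evenDf / 2; k++) {
    term *= x / 2 / k;
    sum += term;
  }
  return Math.exp(-x / 2) * sum;
}

const pValue = chiSquareUpperTail(chiSquare, df);
```

The recomputed statistic agrees with the reported χ² = 189.42, and the resulting p-value is many orders of magnitude below α = 0.05, which is why it is reported as ≈ 0.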

Results Summary:

Observed Frequencies:

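| Gender | Architecture & Design | Drawings & Prints | Media & Performance | Painting & Sculpture | Photography | Total |
|--------|-----------------------|-------------------|---------------------|----------------------|-------------|-------|
| Male | 1,709 | 3,698 | 272 | 784 | 1,536 | 7,999 |
| Non-male | 231 | 960 | 174 | 161 | 405 | 1,931 |
| Total | 1,940 | 4,658 | 446 | 945 | 1,941 | 9,930 |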

Key Findings from Residual Analysis:

  • Males: Significantly overrepresented in Architecture & Design (standardized residual = 9.35), underrepresented in Drawings & Prints (-2.75) and Media & Performance (-10.68)
  • Non-males: Significantly overrepresented in Drawings & Prints (2.75) and Media & Performance (10.68), underrepresented in Architecture & Design (-9.35)
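The residuals quoted above are reproduced by the adjusted standardized residual formula, (O − E) / sqrt(E × (1 − row share) × (1 − column share)), applied to the observed table. Whether the project computed them exactly this way is an assumption; the sketch below only shows that this formula yields the same values.

```javascript
// Observed counts (Male row, then Non-male row), same column order as the table.
const counts = [
  [1709, 3698, 272, 784, 1536],
  [231, 960, 174, 161, 405],
];

// Adjusted standardized residual for every cell of a contingency table.
function adjustedResiduals(table) {
  const rowSums = table.map((row) => row.reduce((a, b) => a + b, 0));
  const colSums = table[0].map((_, j) => table.reduce((acc, row) => acc + row[j], 0));
  const n = rowSums.reduce((a, b) => a + b, 0);
  return table.map((row, i) =>
    row.map((o, j) => {
      const e = (rowSums[i] * colSums[j]) / n; // expected count under independence
      const se = Math.sqrt(e * (1 - rowSums[i] / n) * (1 - colSums[j] / n));
      return (o - e) / se;
    })
  );
}

const residuals = adjustedResiduals(counts);

// Flag cells beyond the |r| > 2 threshold used in the analysis.
const flagged = residuals.map((row) => row.map((r) => Math.abs(r) > 2));
```

In a table with only two rows, the residuals in each column are equal in magnitude and opposite in sign, which is why the Male and Non-male findings mirror each other.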

Conclusion: The chi-square test strongly rejects the null hypothesis (χ² = 189.42, p ≈ 0), providing evidence that gender and art department are statistically dependent.


📁 Project Structure

MoMA-Stats-Exploration/
│
├── home.html              # Landing page
├── page1.html             # Interactive bar chart visualization
├── ttestpage.html         # T-test results and analysis
├── chisqpage.html         # Chi-square test results and analysis
├── ack.html               # References and acknowledgments
│
├── home.css               # Landing page styles
├── page1.css              # Chart page styles
├── page1.js               # D3.js visualization logic
├── ttestpage.css          # T-test page styles
├── ttestpage.js           # T-test table generation
├── chisqpage.css          # Chi-square page styles
├── chisqpage.js           # Chi-square table generation
├── ack.css                # References page styles
│
├── Artworks_clean.csv     # Cleaned MoMA dataset
├── d3.js                  # D3.js library (v7)
├── d3.min.js              # Minified D3.js
├── building.jpg           # Background image
└── README.md              # This file

✨ Features

Interactive Visualizations

  1. Dynamic Bar Chart (page1.html)

    • Logarithmic y-axis scaling for better visualization of count data across decades
    • Interactive tooltips showing exact counts on hover
    • Dual dropdown menus for filtering:
      • Variable selector: Gender or Art Department
      • Type selector: Specific category (all, male, female, etc.)
    • Smooth animated transitions when filtering
    • Responsive design that adapts to screen size
  2. Interactive Tables

    • T-test Results Table: Displays t-values and p-values for each decade
    • Chi-square Tables: Three tables showing:
      • Observed frequencies
      • Expected frequencies
      • Standardized residuals
    • Hover effects highlighting table rows and cells
    • Responsive horizontal scrolling on mobile devices

Statistical Analysis Features

  • Hypothesis Testing Documentation: Clear explanation of null hypotheses, test procedures, and interpretations
  • Residual Analysis: Detailed breakdown of which cells contribute most to statistical dependency
  • Multiple Test Scenarios: Analysis across different decades and categories

User Experience

  • Responsive Design: Works seamlessly on desktop, tablet, and mobile devices
  • Intuitive Navigation: Clear page flow with "Next" buttons guiding users through the analysis
  • Aesthetic Design: Custom color schemes and typography for each page
  • Accessibility: Semantic HTML and proper viewport meta tags

🔍 Key Insights

1. Gender Disparity Across Decades

Finding: Male artists have more artworks acquired by MoMA than non-male artists in each of the six decades tested (1950s, 1960s, 1970s, 1990s, 2000s, 2010s).

Statistical Evidence:

  • All t-tests across six decades showed statistically significant differences (p < 0.05)
  • The pattern persisted in both "normal" decades and decades with extreme disparities (1960s, 2000s)
  • This suggests the disparity is systematic rather than random

2. Departmental Gender Segregation

Finding: Artist gender is associated with the art department an artist's work is categorized into. The chi-square test establishes this dependence but not its cause.

Statistical Evidence:

  • Chi-square test: χ² = 189.42, p ≈ 0 (highly significant)
  • Strong standardized residuals (|r| > 3) indicate:
    • Males: Overrepresented in Architecture & Design (r = 9.35)
    • Males: Underrepresented in Media & Performance (r = -10.68)
    • Non-males: Overrepresented in Media & Performance (r = 10.68)
    • Non-males: Underrepresented in Architecture & Design (r = -9.35)

Implications: The relationship between gender and department classification is not coincidental but reflects systematic patterns in how artworks are categorized or acquired.

3. Persistence Across Time

Finding: The gender disparity persists across multiple decades, suggesting it's not a temporary phenomenon tied to specific historical periods.

Evidence: Significant results in all six tested decades between the 1950s and the 2010s (the 1980s were not tested), spanning over 60 years of acquisitions.


🚀 How to Run

Local Development

  1. Clone or Download the repository

    git clone https://github.com/a-partha/MoMA-Stats-Exploration.git
    cd MoMA-Stats-Exploration
  2. Start a Local Server

    Since the project uses D3.js to load CSV data, you need a local server (browsers block local file access for security):

    Option A: Python

    # Python 3
    python -m http.server 8000
    
    # Python 2
    python -m SimpleHTTPServer 8000

    Option B: Node.js (http-server)

    npm install -g http-server
    http-server -p 8000

    Option C: VS Code Live Server

    • Install the "Live Server" extension
    • Right-click on home.html → "Open with Live Server"
  3. Access the Site

    • Open your browser and navigate to: http://localhost:8000/home.html
    • Or use the port specified by your server

File Requirements

  • Artworks_clean.csv must be in the same directory as the HTML files
  • d3.js or d3.min.js must be present (included in the repository)

📚 Data Sources & References

Data

Course Materials

  • DATA 73200 - Interactive Data Visualization (53908): Taught by Prof. Ellie Frymire at The Graduate Center of CUNY, Fall 2023
  • DATA 71000 - Data Analysis Methods (53244): Taught by Prof. Howard Everson at The Graduate Center of CUNY, Fall 2023

Tools & Resources


🎓 Statistical Methodology Details

Why Logarithmic Scale?

The y-axis uses a logarithmic scale because:

  • Artwork counts span a wide range (from single digits to thousands)
  • Logarithmic scaling prevents smaller values from being visually compressed
  • Makes it easier to compare trends across different magnitude ranges
  • Maintains readability while showing the full data range
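The effect described above can be seen by comparing a logarithmic and a linear mapping side by side. This is a plain JavaScript sketch of the arithmetic behind a call such as `d3.scaleLog().domain([1, 10000]).range([400, 0])`; the 400-pixel chart height and the [1, 10000] domain are hypothetical values, not the ones used in page1.js.

```javascript
// Log scale: position is proportional to log(value), so equal ratios get equal distances.
// The base of the logarithm cancels out of the ratio.
function makeLogScale([domainMin, domainMax], [rangeStart, rangeEnd]) {
  const logMin = Math.log10(domainMin);
  const logSpan = Math.log10(domainMax) - logMin;
  return (value) =>
    rangeStart + ((Math.log10(value) - logMin) / logSpan) * (rangeEnd - rangeStart);
}

// Linear scale for comparison: position is proportional to the value itself.
function makeLinearScale([domainMin, domainMax], [rangeStart, rangeEnd]) {
  return (value) =>
    rangeStart + ((value - domainMin) / (domainMax - domainMin)) * (rangeEnd - rangeStart);
}

// Hypothetical chart: counts from 1 to 10,000 on a 400px-tall y-axis (0 is the top).
const yLog = makeLogScale([1, 10000], [400, 0]);
const yLinear = makeLinearScale([1, 10000], [400, 0]);

// Bar height in pixels for a count of 100 under each scale.
const logBarHeight = 400 - yLog(100);       // half of the axis
const linearBarHeight = 400 - yLinear(100); // only a few pixels
```

A count of 100 fills half the axis on the log scale but under 4 pixels on the linear one. A log scale cannot place a count of zero, because log(0) is undefined, so the domain has to start at a positive value.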

Why Multiple Decades for t-Tests?

Testing multiple decades allows us to:

  • Determine if disparities are consistent across time
  • Identify whether patterns are specific to certain periods
  • Provide stronger evidence through replication
  • Rule out decade-specific historical factors

Why Residual Analysis?

After finding dependency (chi-square test), residual analysis helps:

  • Identify which specific cells contribute most to the dependency
  • Understand the direction of the relationship (over/under-representation)
  • Provide actionable insights beyond simple "dependent/independent" conclusions

📝 Notes & Limitations

Data Limitations

  • Only one artwork per artist was considered (prevents overrepresentation of prolific artists)
  • Non-binary artists: Only 3 in the entire dataset (too small for statistical analysis)
  • Missing data: Collaborations and entries with missing dates/gender were excluded
  • Small departments: Film and Fluxus Collection excluded (< 100 artworks each)

Statistical Limitations

  • Binary Gender Classification: Non-male category combines female and non-binary (due to small sample size)
  • Decade Aggregation: Loses granularity of year-by-year trends
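  • Multiple Comparisons: Running six t-tests across decades increases the risk of a Type I error. Five of the six two-tail p-values round to 0.00, but the 2000s result (p = 0.04) would not pass a Bonferroni-adjusted threshold of 0.05 / 6 ≈ 0.0083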
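One common way to account for the multiple-comparisons issue is a Bonferroni correction, which divides α by the number of tests. The project does not report applying one; the sketch below simply applies it to the two-tail p-values as printed in the t-test results table, which are rounded to two decimals.

```javascript
// Two-tail p-values as reported (rounded) in the t-test results table.
const reported = [
  { decade: "1950s", p: 0.0 },
  { decade: "1960s", p: 0.0 },
  { decade: "1970s", p: 0.0 },
  { decade: "1990s", p: 0.0 },
  { decade: "2000s", p: 0.04 },
  { decade: "2010s", p: 0.0 },
];

const alpha = 0.05;

// Bonferroni: each test must clear alpha divided by the number of tests.
const threshold = alpha / reported.length; // 0.05 / 6, about 0.0083

// A value printed as 0.00 is below 0.005, which is already under the threshold,
// so the rounding does not change the outcome for those decades.
const survives = reported.filter((d) => d.p < threshold).map((d) => d.decade);
const fails = reported.filter((d) => d.p >= threshold).map((d) => d.decade);
```

Under this correction five decades remain significant and only the 2000s result does not, so the overall pattern holds but is weakest in that decade.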

Future Enhancements

  • Year-by-year analysis instead of decade aggregation
  • Separate analysis for female vs. non-binary artists (when sample size allows)
  • Multivariate analysis controlling for other factors (artist nationality, time period, etc.)
  • Interactive exploration of other demographic variables

👤 Author

Aniruddha Partha

This project was completed as part of the Interactive Data Visualization course (DATA 73200) at The Graduate Center of CUNY, Fall 2023.


📄 License

This project is for educational and research purposes. The MoMA dataset usage is subject to MoMA's terms of use.


🔗 Live Site

Access the interactive visualization: https://a-partha.github.io/MoMA-Stats-Exploration/home.html


Last Updated: 2024
