A comprehensive statistical analysis and interactive data visualization project exploring gender-based patterns in artwork acquisitions at the Museum of Modern Art (MoMA). This project combines statistical hypothesis testing with interactive web visualizations to investigate whether gender disparities in museum acquisitions are statistically significant or due to random chance.
Live Site: https://a-partha.github.io/MoMA-Stats-Exploration/home.html
This project performs a rigorous statistical investigation into gender-based patterns in MoMA's artwork collection. Through interactive visualizations and hypothesis testing, it addresses critical questions:
- Are male artists consistently overrepresented in MoMA's acquisitions across decades?
- Is this pattern statistically significant, or could it be attributed to random chance?
- Does gender influence which art department an artist's work is categorized into?
The analysis combines descriptive statistics with inferential statistical tests (t-tests and chi-square tests) to provide evidence-based answers to these questions.
- HTML5 - Structure and semantic markup
- CSS3 - Responsive styling with modern layout techniques (Flexbox, media queries)
- JavaScript (ES6+) - Interactive functionality and data manipulation
- D3.js (v7) - Core visualization library for creating interactive bar charts
  - `d3.scaleBand()` - Categorical scaling for decades
  - `d3.scaleLog()` - Logarithmic scaling for count data
  - `d3.rollup()` - Data aggregation by decade
  - `d3.select()` and `d3.selectAll()` - DOM manipulation and data binding
  - `d3.transition()` - Smooth animations for bar chart updates
- CSV Data - Cleaned MoMA artwork dataset (`Artworks_clean.csv`)
- Data Cleaning - Performed in Google Colab (Python/Pandas) before analysis
- Google Fonts - Custom typography (Hind Madurai, Roboto Mono, Cantarell, Dela Gothic One, Ysabeau Office)
- Responsive Design - Mobile-first approach with CSS media queries
- Interactive Elements - Hover effects, tooltips, and dynamic dropdown menus
- Decade Grouping: Artworks grouped by decade of acquisition (1920s-2020s)
- Count Aggregation: Using `d3.rollup()` to count artworks per decade
- Gender Categorization: Binary classification (Male vs. Non-male, which includes female and non-binary artists)
- Only one artwork per artist considered
- Collaborations excluded
- Missing acquisition dates or gender data excluded
- Art departments with fewer than 100 artworks excluded (Film, Fluxus Collection)
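The grouping and exclusion steps above can be sketched in plain JavaScript. This is a minimal illustration, not the project's `page1.js`: the field names (`artist`, `gender`, `yearAcquired`) and the sample rows are assumptions, and the nested `Map` is built by hand so the snippet runs without D3.

```javascript
// Hypothetical rows; the field names are assumptions and may not match
// the columns in Artworks_clean.csv.
const rows = [
  { artist: "A", gender: "Male", yearAcquired: 1954 },
  { artist: "B", gender: "Female", yearAcquired: 1958 },
  { artist: "C", gender: "Male", yearAcquired: 1963 },
  { artist: "D", gender: "Non-Binary", yearAcquired: 1967 },
  { artist: "E", gender: "", yearAcquired: 1971 },     // missing gender: excluded
  { artist: "F", gender: "Male", yearAcquired: null }, // missing date: excluded
];

// Map a year to its decade label, e.g. 1954 -> "1950s".
function decadeOf(year) {
  return `${Math.floor(year / 10) * 10}s`;
}

// Collapse gender into the two categories used in the analysis.
function genderGroup(gender) {
  return gender === "Male" ? "Male" : "Non-male";
}

// Count artworks per decade and gender group, skipping incomplete rows.
// The result is a nested Map (decade -> group -> count), the same shape
// d3.rollup(rows, v => v.length, decadeKey, genderKey) would produce.
function countByDecadeAndGender(data) {
  const counts = new Map();
  for (const row of data) {
    if (!row.gender || !Number.isFinite(row.yearAcquired)) continue;
    const decade = decadeOf(row.yearAcquired);
    const group = genderGroup(row.gender);
    if (!counts.has(decade)) counts.set(decade, new Map());
    const inner = counts.get(decade);
    inner.set(group, (inner.get(group) || 0) + 1);
  }
  return counts;
}

const counts = countByDecadeAndGender(rows);
console.log(counts);
```

With the real dataset, the one-artwork-per-artist and collaboration filters would be applied before counting; they are omitted here to keep the sketch short.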
Purpose: Test whether the mean number of acquisitions differs significantly between male and non-male artists across different decades.
- F-Test for Variance Equality (Pre-test)
  - Null Hypothesis (H₀): Variances of male and non-male artwork counts are equal
  - Significance Level: α = 0.05
  - Decision Rule:
    - If F > F-critical (one-tail) AND p-value < 0.05 → Reject H₀ → Use unequal variance t-test (Welch's t-test)
    - If F ≤ F-critical AND p-value ≥ 0.05 → Fail to reject H₀ → Use equal variance t-test
- Independent Samples t-Test
  - Null Hypothesis (H₀): There is no difference between the mean number of acquisitions from male and non-male artists
  - Alternative Hypothesis (H₁): There is a significant difference
  - Tested Decades: 1950s, 1960s, 1970s, 1990s, 2000s, 2010s
  - Significance Level: α = 0.05
| Decade | t-value | p-value (one-tail) | p-value (two-tail) | Conclusion |
|---|---|---|---|---|
| 1950s | 5.05 | 0.00 | 0.00 | Reject H₀ |
| 1960s | 7.11 | 0.00 | 0.00 | Reject H₀ |
| 1970s | 7.69 | 0.00 | 0.00 | Reject H₀ |
| 1990s | 7.19 | 0.00 | 0.00 | Reject H₀ |
| 2000s | 2.29 | 0.02 | 0.04 | Reject H₀ |
| 2010s | 4.62 | 0.00 | 0.00 | Reject H₀ |
Key Finding: All decades showed statistically significant differences (p < 0.05), indicating that the gender disparity is not due to random chance alone.
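For readers who want to see the arithmetic behind the pre-test and the two t-test variants, here is a minimal sketch in plain JavaScript. The two sample arrays are hypothetical and are not taken from the MoMA dataset, and the functions return only the test statistics and degrees of freedom; looking up p-values or critical values requires F and t distribution tables or a statistics library. Computing F as the first variance over the second is a choice made for this sketch.

```javascript
// Arithmetic mean of a numeric array.
function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Unbiased sample variance (divides by n - 1).
function sampleVariance(xs) {
  const m = mean(xs);
  return xs.reduce((sum, x) => sum + (x - m) ** 2, 0) / (xs.length - 1);
}

// F statistic for the variance pre-test: ratio of the two sample variances.
function fStatistic(a, b) {
  return {
    f: sampleVariance(a) / sampleVariance(b),
    df1: a.length - 1,
    df2: b.length - 1,
  };
}

// Equal-variance (pooled) t statistic.
function pooledT(a, b) {
  const n1 = a.length;
  const n2 = b.length;
  const pooledVar =
    ((n1 - 1) * sampleVariance(a) + (n2 - 1) * sampleVariance(b)) / (n1 + n2 - 2);
  const t = (mean(a) - mean(b)) / Math.sqrt(pooledVar * (1 / n1 + 1 / n2));
  return { t, df: n1 + n2 - 2 };
}

// Unequal-variance (Welch) t statistic with Welch-Satterthwaite
// degrees of freedom.
function welchT(a, b) {
  const v1 = sampleVariance(a) / a.length;
  const v2 = sampleVariance(b) / b.length;
  const t = (mean(a) - mean(b)) / Math.sqrt(v1 + v2);
  const df = (v1 + v2) ** 2 / (v1 ** 2 / (a.length - 1) + v2 ** 2 / (b.length - 1));
  return { t, df };
}

// Hypothetical samples, NOT taken from the MoMA dataset.
const male = [12, 15, 9, 20, 14];
const nonMale = [3, 5, 2, 6, 4];

console.log(fStatistic(male, nonMale));
console.log(pooledT(male, nonMale));
console.log(welchT(male, nonMale));
```

When the two samples have the same size, the pooled and Welch statistics coincide and only the degrees of freedom differ, which is why the F pre-test matters most for unbalanced samples.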
Purpose: Test whether artist gender and art department classification are statistically independent.
- Contingency Table Analysis
  - Observed Frequencies: Cross-tabulation of gender (Male, Non-male) × department (5 categories)
  - Expected Frequencies: Calculated under the assumption of independence
  - Standardized Residuals: Computed to identify which cells contribute most to the dependency
- Chi-Square Test
  - Null Hypothesis (H₀): Artist gender and art department are statistically independent
  - Alternative Hypothesis (H₁): The two variables are dependent
  - Test Statistic: χ² = 189.42
  - p-value: ≈ 0 (highly significant)
  - Significance Level: α = 0.05
- Residual Analysis
  - Standardized Residuals calculated for each cell
  - Threshold: Absolute values > 2 indicate significant deviation from expected
  - Strong Threshold: Absolute values > 3 indicate very strong deviation
Observed Frequencies:
| Gender | Architecture & Design | Drawings & Prints | Media & Performance | Painting & Sculpture | Photography | Total |
|---|---|---|---|---|---|---|
| Male | 1,709 | 3,698 | 272 | 784 | 1,536 | 7,999 |
| Non-male | 231 | 960 | 174 | 161 | 405 | 1,931 |
| Total | 1,940 | 4,658 | 446 | 945 | 1,941 | 9,930 |
Key Findings from Residual Analysis:
- Males: Significantly overrepresented in Architecture & Design (standardized residual = 9.35), underrepresented in Drawings & Prints (-2.75) and Media & Performance (-10.68)
- Non-males: Significantly overrepresented in Drawings & Prints (2.75) and Media & Performance (10.68), underrepresented in Architecture & Design (-9.35)
Conclusion: The chi-square test strongly rejects the null hypothesis (χ² = 189.42, p ≈ 0), providing evidence that gender and art department are statistically dependent.
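The reported statistic and residuals can be reproduced from the observed frequency table with a few lines of plain JavaScript. This is a sketch rather than the project's `chisqpage.js`; the function and variable names are illustrative. The residual formula used here is the adjusted standardized residual, (O − E) / √(E · (1 − row share) · (1 − column share)), which is the form that matches the reported values of 9.35, 2.75, and 10.68.

```javascript
// Observed counts from the table above, in department order:
// Architecture & Design, Drawings & Prints, Media & Performance,
// Painting & Sculpture, Photography.
const observed = {
  "Male": [1709, 3698, 272, 784, 1536],
  "Non-male": [231, 960, 174, 161, 405],
};

// Compute expected counts, the chi-square statistic, and adjusted
// standardized residuals for a table given as { rowLabel: [counts] }.
function chiSquare(table) {
  const rows = Object.values(table);
  const rowTotals = rows.map(r => r.reduce((a, b) => a + b, 0));
  const colTotals = rows[0].map((_, j) => rows.reduce((acc, r) => acc + r[j], 0));
  const n = rowTotals.reduce((a, b) => a + b, 0);

  let statistic = 0;
  const expected = [];
  const residuals = [];
  rows.forEach((row, i) => {
    expected.push([]);
    residuals.push([]);
    row.forEach((obs, j) => {
      // Expected count under independence: row total * column total / N.
      const exp = (rowTotals[i] * colTotals[j]) / n;
      statistic += (obs - exp) ** 2 / exp;
      // Adjusted standardized residual for this cell.
      const adjusted =
        (obs - exp) /
        Math.sqrt(exp * (1 - rowTotals[i] / n) * (1 - colTotals[j] / n));
      expected[i].push(exp);
      residuals[i].push(adjusted);
    });
  });

  const df = (rows.length - 1) * (colTotals.length - 1);
  return { statistic, df, expected, residuals };
}

const result = chiSquare(observed);
console.log(result.statistic.toFixed(2), result.df);
console.log(result.residuals.map(r => r.map(x => x.toFixed(2))));
```

The degrees of freedom are (2 − 1) × (5 − 1) = 4 for this table; converting the statistic to a p-value requires a chi-square distribution function, which is left out of the sketch.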
    MoMA-Stats-Exploration/
    │
    ├── home.html             # Landing page
    ├── page1.html            # Interactive bar chart visualization
    ├── ttestpage.html        # T-test results and analysis
    ├── chisqpage.html        # Chi-square test results and analysis
    ├── ack.html              # References and acknowledgments
    │
    ├── home.css              # Landing page styles
    ├── page1.css             # Chart page styles
    ├── page1.js              # D3.js visualization logic
    ├── ttestpage.css         # T-test page styles
    ├── ttestpage.js          # T-test table generation
    ├── chisqpage.css         # Chi-square page styles
    ├── chisqpage.js          # Chi-square table generation
    ├── ack.css               # References page styles
    │
    ├── Artworks_clean.csv    # Cleaned MoMA dataset
    ├── d3.js                 # D3.js library (v7)
    ├── d3.min.js             # Minified D3.js
    ├── building.jpg          # Background image
    └── README.md             # This file
- Dynamic Bar Chart (`page1.html`)
  - Logarithmic y-axis scaling for better visualization of count data across decades
  - Interactive tooltips showing exact counts on hover
  - Dual dropdown menus for filtering:
    - Variable selector: Gender or Art Department
    - Type selector: Specific category (all, male, female, etc.)
  - Smooth animated transitions when filtering
  - Responsive design that adapts to screen size
- Interactive Tables
  - T-test Results Table: Displays t-values and p-values for each decade
  - Chi-square Tables: Three tables showing:
    - Observed frequencies
    - Expected frequencies
    - Standardized residuals
  - Hover effects highlighting table rows and cells
  - Responsive horizontal scrolling on mobile devices
- Hypothesis Testing Documentation: Clear explanation of null hypotheses, test procedures, and interpretations
- Residual Analysis: Detailed breakdown of which cells contribute most to statistical dependency
- Multiple Test Scenarios: Analysis across different decades and categories
- Responsive Design: Works seamlessly on desktop, tablet, and mobile devices
- Intuitive Navigation: Clear page flow with "Next" buttons guiding users through the analysis
- Aesthetic Design: Custom color schemes and typography for each page
- Accessibility: Semantic HTML and proper viewport meta tags
Finding: Male artists consistently have more artworks acquired by MoMA than non-male artists across all decades analyzed (1950s-2010s).
Statistical Evidence:
- All t-tests across six decades showed statistically significant differences (p < 0.05)
- The pattern persisted in both "normal" decades and decades with extreme disparities (1960s, 2000s)
- This suggests the disparity is systematic rather than random
Finding: Gender influences which art department an artist's work is categorized into.
Statistical Evidence:
- Chi-square test: χ² = 189.42, p ≈ 0 (highly significant)
- Strong standardized residuals (|r| > 3) indicate:
- Males: Overrepresented in Architecture & Design (r = 9.35)
- Males: Underrepresented in Media & Performance (r = -10.68)
- Non-males: Overrepresented in Media & Performance (r = 10.68)
- Non-males: Underrepresented in Architecture & Design (r = -9.35)
Implications: The relationship between gender and department classification is not coincidental but reflects systematic patterns in how artworks are categorized or acquired.
Finding: The gender disparity persists across multiple decades, suggesting it's not a temporary phenomenon tied to specific historical periods.
Evidence: Significant results in each of the six tested decades (1950s, 1960s, 1970s, 1990s, 2000s, 2010s), which together span over 60 years of acquisitions.
- Clone or Download the repository

      git clone https://github.com/a-partha/MoMA-Stats-Exploration.git
      cd MoMA-Stats-Exploration

- Start a Local Server

  Since the project uses D3.js to load CSV data, you need a local server (browsers block local file access for security):

  Option A: Python

      # Python 3
      python -m http.server 8000
      # Python 2
      python -m SimpleHTTPServer 8000

  Option B: Node.js (http-server)

      npm install -g http-server
      http-server -p 8000

  Option C: VS Code Live Server
  - Install the "Live Server" extension
  - Right-click on `home.html` → "Open with Live Server"

- Access the Site
  - Open your browser and navigate to: `http://localhost:8000/home.html`
  - Or use the port specified by your server
- `Artworks_clean.csv` must be in the same directory as the HTML files
- `d3.js` or `d3.min.js` must be present (included in the repository)
- MoMA Collection Dataset: Museum of Modern Art artwork acquisition data
- Data Cleaning: Performed in Google Colab using Python/Pandas
- Data Cleaning Reference: Interactive Data Visualization Fall 2023 - MoMA Data Cleanup
- DATA 73200 - Interactive Data Visualization (53908): Taught by Prof. Ellie Frymire at The Graduate Center of CUNY, Fall 2023
- DATA 71000 - Data Analysis Methods (53244): Taught by Prof. Howard Everson at The Graduate Center of CUNY, Fall 2023
- D3.js: https://d3js.org/
- Google Fonts: https://fonts.google.com/
- Cover Image: Museum of Modern Art, NYC
The y-axis uses a logarithmic scale because:
- Artwork counts span a wide range (from single digits to thousands)
- Logarithmic scaling prevents smaller values from being visually compressed
- Makes it easier to compare trends across different magnitude ranges
- Maintains readability while showing the full data range
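To make the compression argument concrete, the sketch below compares bar heights under a linear and a logarithmic scale. It is written in plain JavaScript so it runs without D3; the chart height of 400 px and the domain of 1 to 10,000 are assumptions chosen for illustration, not values read from `page1.js`.

```javascript
// Build a log scale: maps a positive value in [d0, d1] onto [r0, r1].
// Roughly what d3.scaleLog().domain([d0, d1]).range([r0, r1]) computes.
// The domain must not include 0, because log(0) is undefined.
function logScale([d0, d1], [r0, r1]) {
  const lo = Math.log(d0);
  const hi = Math.log(d1);
  return value => r0 + ((Math.log(value) - lo) / (hi - lo)) * (r1 - r0);
}

// Linear scale with the same interface, for comparison.
function linearScale([d0, d1], [r0, r1]) {
  return value => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

const height = 400;          // hypothetical chart height in pixels
const domain = [1, 10000];   // hypothetical count range
const yLog = logScale(domain, [height, 0]);
const yLin = linearScale(domain, [height, 0]);

// Bar height is the distance from the bar's top (y) to the chart bottom.
for (const count of [10, 100, 1000, 10000]) {
  const logBar = height - yLog(count);
  const linBar = height - yLin(count);
  console.log(count, logBar.toFixed(1), linBar.toFixed(1));
}
```

Under these assumptions a count of 10 gets a bar about 100 px tall on the log scale but less than 1 px on the linear scale, so small decades would be effectively invisible without the log axis.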
Testing multiple decades allows us to:
- Determine if disparities are consistent across time
- Identify whether patterns are specific to certain periods
- Provide stronger evidence through replication
- Rule out decade-specific historical factors
After finding dependency (chi-square test), residual analysis helps:
- Identify which specific cells contribute most to the dependency
- Understand the direction of the relationship (over/under-representation)
- Provide actionable insights beyond simple "dependent/independent" conclusions
- Only one artwork per artist was considered (prevents overrepresentation of prolific artists)
- Non-binary artists: Only 3 in the entire dataset (too small for statistical analysis)
- Missing data: Collaborations and entries with missing dates/gender were excluded
- Small departments: Film and Fluxus Collection excluded (< 100 artworks each)
- Binary Gender Classification: Non-male category combines female and non-binary (due to small sample size)
- Decade Aggregation: Loses granularity of year-by-year trends
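- Multiple Comparisons: Running six t-tests across decades increases the risk of a Type I error. Five of the six results have p-values that round to 0.00, but the 2000s result (two-tail p = 0.04) sits close to the 0.05 threshold.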
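One common way to account for the multiple-comparisons limitation is a Bonferroni correction, which divides α by the number of tests. The project does not apply this correction; the sketch below is only an illustration of how the reported two-tail p-values would fare under it. The p-values are copied from the t-test table as published (rounded to two decimals), so a reported 0.00 means a value below 0.005.

```javascript
// Two-tail p-values as reported (rounded) in the t-test table.
const reported = [
  { decade: "1950s", p: 0.00 },
  { decade: "1960s", p: 0.00 },
  { decade: "1970s", p: 0.00 },
  { decade: "1990s", p: 0.00 },
  { decade: "2000s", p: 0.04 },
  { decade: "2010s", p: 0.00 },
];

const alpha = 0.05;

// Bonferroni-adjusted threshold: alpha divided by the number of tests.
const threshold = alpha / reported.length;

// Flag which decades remain significant under the stricter threshold.
// A reported 0.00 is below 0.005, which is under the threshold either way.
const corrected = reported.map(r => ({
  ...r,
  significantAfterCorrection: r.p < threshold,
}));

console.log(threshold.toFixed(4));
console.log(corrected);
```

Under this stricter threshold of about 0.0083, the 2000s result would no longer count as significant, while the other five decades would. Bonferroni is conservative, so this is a lower bound on what survives correction rather than a reversal of the overall finding.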
- Year-by-year analysis instead of decade aggregation
- Separate analysis for female vs. non-binary artists (when sample size allows)
- Multivariate analysis controlling for other factors (artist nationality, time period, etc.)
- Interactive exploration of other demographic variables
Aniruddha Partha
This project was completed as part of the Interactive Data Visualization course (DATA 73200) at The Graduate Center of CUNY, Fall 2023.
This project is for educational and research purposes. The MoMA dataset usage is subject to MoMA's terms of use.
Access the interactive visualization: https://a-partha.github.io/MoMA-Stats-Exploration/home.html
Last Updated: 2024