The Weird AI Test Museum

A curated, visual collection of memorable AI tests: spaghetti physics, hand rendering, language tricks, long-horizon games, citation checks, audio robustness, and research benchmarks.

This project is intentionally part museum, part field guide, and part internet culture archive. It is made for fun and curiosity while AI develops unusually fast. The exhibits are snapshots, not permanent verdicts on any model or company.

Explore

Open the museum

Repository: github.com/eudk/weird-ai-test-museum

Inside the museum

Headliners: memorable tests that escaped research circles and became internet culture.
Image stress tests: hands, text rendering, spatial relationships, and compositional reasoning.
Language tricks: counting, constraint following, ambiguity, and deceptively simple prompts.
Game tests: long-horizon planning, memory, exploration, and the effect of agent harnesses.
Video and audio: physics, identity preservation, synchronization, noisy-scene reasoning, and voice cloning.
Real-world stress tests: cases such as the widely reported 18,000-water-cups drive-through request.
Formal benchmarks: ARC-AGI, Humanity's Last Exam, SWE-Bench Pro, BrowseComp, OSWorld, Terminal-Bench, APEX-Agents, MCP-Atlas, MMMU-Pro, EVMbench, and others.
Dated model snapshot: a fold-out comparison of selected results published in June 2026, with tool and harness caveats.

What this project is

The museum explains what each test is trying to reveal, why people found it memorable, and where to read more. Sources are linked directly from the exhibits.

It does not:

declare one model universally best;
treat a viral demo as a controlled scientific benchmark;
assume an old score still describes the current frontier;
present capability labels as permanent limits.

AI evaluation depends on the model version, date, tools, prompts, scaffolding, retries, scoring method, and dataset version. Numbers without that context age badly.

Inspiration

This project was inspired in part by the excellent BenchLM.ai AI Benchmarks Directory. Its broad catalog helped shape the museum's formal benchmark shelf and encouraged the mix of benchmark categories represented here.

The individual exhibits also draw from primary papers, official benchmark sites, reputable reporting, public leaderboards, and the wonderfully strange informal tests that spread through AI culture.

Memorable informal tests and formal benchmarks serve different purposes:

Type	Useful for
Informal regression tests	Making visible inconsistencies easy to recognize and discuss
Real-world incidents	Revealing deployment, guardrail, and human-handoff problems
Formal benchmarks	Producing controlled, repeatable comparisons under stated conditions

Run locally

No build step or dependency installation is required. In PowerShell, from the project folder:

Start-Process .\index.html

You can also open index.html directly in any modern browser.
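
Start-Process is a PowerShell cmdlet, so the command above is Windows-oriented. As a cross-platform alternative, here is a minimal sketch using Python's standard webbrowser module. It assumes Python 3 is installed and that the script is run from the project folder. The helper names museum_url and open_museum are illustrative and not part of the project.

```python
import webbrowser
from pathlib import Path


def museum_url(root: str = ".") -> str:
    """Return a file:// URL for index.html inside the given folder."""
    # resolve() makes the path absolute, which as_uri() requires.
    return (Path(root) / "index.html").resolve().as_uri()


def open_museum(root: str = ".") -> bool:
    """Ask the default browser to open the museum page."""
    return webbrowser.open(museum_url(root))
```

Calling open_museum() from the project folder should open the page in the default browser.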

Project structure

AITestMuseum/
|-- assets/
|   |-- examples.png
|   `-- name.png
|-- index.html
|-- LICENSE
`-- README.md

The site is a single responsive HTML document with embedded CSS and JavaScript. Google Fonts are loaded from the web; the rest of the project is static.

Updating exhibits

When adding or revising an exhibit:

Prefer a primary paper, official benchmark page, or reputable report.
Date volatile scores and identify the exact evaluation setting.
State when a test is informal, anecdotal, or dependent on a custom harness.
Avoid unsupported precision and universal claims.
Keep the tone curious, readable, and honest about uncertainty.
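
One way to apply the checklist above while drafting an exhibit is to keep each score together with the context needed to read it later. The following is only a sketch: the ScoreNote class, its field names, and the example values are hypothetical placeholders, and the museum itself does not necessarily store exhibits this way.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScoreNote:
    """One reported score plus the context needed to interpret it later."""
    benchmark: str
    model_version: str
    score: str              # kept as text so no precision is added or lost
    reported_on: date
    source_url: str
    setting: str = ""       # tools, prompts, scaffolding, retries, scoring, dataset version
    informal: bool = False  # True for anecdotal tests or custom harnesses

    def missing_context(self) -> list[str]:
        """Names of required text fields that are still empty."""
        required = ("benchmark", "model_version", "score", "source_url", "setting")
        return [name for name in required if not getattr(self, name).strip()]

    def caption(self) -> str:
        """A dated, one-line label for an exhibit."""
        prefix = "[informal] " if self.informal else ""
        return (f"{prefix}{self.benchmark}: {self.score} for {self.model_version}, "
                f"reported {self.reported_on.isoformat()}; "
                f"setting: {self.setting or 'not stated'}")


# Placeholder values only; this is not a real benchmark or result.
note = ScoreNote(
    benchmark="ExampleBench",
    model_version="example-model-v1",
    score="42%",
    reported_on=date(2026, 6, 1),
    source_url="https://example.com/report",
)
print(note.missing_context())  # ['setting']
```

Storing the score as text is a deliberate choice here: it copies the source's figure as published instead of implying extra decimal places.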

Disclaimer

Independent educational project. Not affiliated with or endorsed by any person, company, model provider, or benchmark creator mentioned.

Linked material and third-party names belong to their respective owners.

License

The original code and project text are available under the MIT License. Third-party names, trademarks, linked material, and referenced media remain the property of their respective owners.

Made by eudk.
