MathSight

MathSight: A Benchmark Exploring Have Vision-Language Models Really Seen in University-Level Mathematical Reasoning?

🧠 What is MathSight?

MathSight is a university-level multimodal mathematical reasoning benchmark designed to answer a simple question:

Have vision-language models (VLMs) really seen in mathematical reasoning?

Most existing multimodal math benchmarks report high overall accuracy, but they rarely isolate how much the visual modality actually contributes. MathSight systematically controls visual variants and text-only settings, so we can disentangle:

- What can be solved from text alone?
- When do models truly rely on images?
- How robust are models to style changes in diagrams?

✨ Key Features

Multi-view visual variants

Each multimodal problem may contain:
- Original clean diagram
- Hand-drawn style
- Photo-captured version
- Text-only condition (no image)

This enables a controlled comparison between vision + language reasoning and language-only reasoning.

Challenging university-level coverage

- 2,048 total problems
- Covers 6 core areas of university mathematics
- The multimodal split is mainly graduate-level, closer to real exam-style questions.

Vision vs. text analysis

MathSight is built to compare:
- Closed-source and open-source VLMs
- VLMs with vs. without image input
- Answer consistency across different image styles and resolutions
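
The comparisons listed above can be sketched as two small metrics: accuracy per input condition, and the share of problems whose predicted answer is identical across all image styles. This is only an illustrative sketch, not the benchmark's official evaluation code. The result record layout (`id`, `condition`, `prediction`, `correct`) and the condition names (`original`, `hand`, `photo`, `text`) are assumptions made for this example.

```python
from collections import defaultdict


def accuracy_by_condition(results):
    """Return {condition: accuracy} for a list of result records.

    Each record is assumed (hypothetically) to look like:
    {"id": "p1", "condition": "photo", "prediction": "3", "correct": False}
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        totals[r["condition"]] += 1
        hits[r["condition"]] += int(r["correct"])
    return {c: hits[c] / totals[c] for c in totals}


def answer_consistency(results, conditions=("original", "hand", "photo")):
    """Fraction of problems with the same prediction under every image style.

    Problems that are missing any of the listed conditions are skipped.
    """
    by_problem = defaultdict(dict)
    for r in results:
        if r["condition"] in conditions:
            by_problem[r["id"]][r["condition"]] = r["prediction"]
    complete = [p for p in by_problem.values() if len(p) == len(conditions)]
    if not complete:
        return 0.0
    same = sum(1 for p in complete if len(set(p.values())) == 1)
    return same / len(complete)
```

Comparing the accuracy of the text-only condition with the image conditions gives a rough indication of how much the visual input contributes. The consistency score reflects robustness to diagram style, independent of whether the answers are correct.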

📊 Dataset Statistics

Total problems: 2,048
Multimodal items: 661
Images: 632 × 3 variants (original / hand / photo)
Subjects: 6 core areas of mathematics
Question types: 1,713 non-proof vs. 335 proof problems
Difficulty: 1,668 graduate-level vs. 380 undergraduate-level

📦 Dataset

We will gradually release the MathSight benchmark in JSONL format.

File structure

All dataset files will be stored under the Dataset/ directory:

- Dataset/final-language-1387-with-subject-withfixed-new-proof.jsonl
  Text-only version of the problems (no image paths), for pure LLM evaluation.
- Dataset/661/final-vision-661-new.jsonl
  Multimodal version, including image_path fields and visual variants.

Each JSONL line corresponds to one problem and follows the schema:

{
  "question": "...",
  "answer/solution": "...",
  "ground_truth": "...",
  "image_path": "",
  "subject": "Calculus",
  "difficulty": "Graduate-level",
  "is_proof": false
}
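
As a minimal sketch of how files in this format might be read, the snippet below loads a JSONL file and tallies problems using the schema fields shown above. The helper names `load_jsonl` and `summarize` are hypothetical and not part of the repository. Treating an empty `image_path` as "no image" is an assumption based on the example record.

```python
import json
from collections import Counter


def load_jsonl(path):
    """Read one JSON object per non-empty line of a JSONL file."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def summarize(records):
    """Count problems by subject, difficulty, proof flag, and image presence."""
    return {
        "total": len(records),
        "by_subject": Counter(r.get("subject") for r in records),
        "by_difficulty": Counter(r.get("difficulty") for r in records),
        "proof": sum(1 for r in records if r.get("is_proof")),
        # Assumption: an empty or missing image_path means a text-only item.
        "with_image": sum(1 for r in records if r.get("image_path")),
    }
```

For example, `summarize(load_jsonl("Dataset/661/final-vision-661-new.jsonl"))` would summarize the multimodal file once it is available locally.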
