🤗 Hugging Face | 📑 Paper | ⚙️ Github | 🖥️ Home Page
Peiwen Sun
The official repo for SpaceVista: All-Scale Visual Spatial Reasoning from
[2026.5.28] 📦 Our SpaceVista-1M is released at .
[2026.5.28] 🎯 Our SpaceVista-Bench is released at .
[2026.5.10] 🏆 The Guinness World Records data used in our paper is released at .
[2026.5.2] 🎉 Our paper is accepted by ICML 2026. See you in Seoul.
[2025.10.10] Our preview SFT code base is released for preview. .
[2025.10.10] Our preview 100K subset of SpaceVista-1M is now available at .
[2025.10.10] Our initial paper is now accessible at .
- Training Dataset: SpaceVista-1M
.
- Evaluation Dataset: SpaceVista-Bench
.
- SFT training: SFT code for SpaceVista
.
- Evaluating: Evaluation code for SpaceVista
.
Spatial reasoning is the ability to perceive, interpret, and act across spatial scales, from millimeter-sized components to distant aerial scenes. All-scale spatial reasoning is fundamental to next-generation intelligent systems and supports diverse applications: mm sensing for advanced manufacturing, cm and m perception for embodied agents, 10m operation for autonomous driving, and 100m for drone-based sensing. Despite progress, existing work shows clear limitations in both model design and dataset coverage. Current scene perception research mostly targets indoor scenes, narrow object classes, and limited spatial ranges, and lacks training paradigms engineered for end to end, cross scale reasoning. SpaceVista addresses this gap by presenting the first systematic optimization across both data and model dimensions to enable robust, full-scene spatial reasoning.
# Download SpaceVista-1M
huggingface-cli download SpaceVista/SpaceVista-Full --repo-type dataset --local-dir ./SpaceVista-1MSpaceVista-Bench: Available at .
# Download SpaceVista-Bench
huggingface-cli download SpaceVista/SpaceVista-Bench --repo-type dataset --local-dir ./SpaceVista-BenchWe provide API-based evaluation scripts that work with any OpenAI-compatible API (OpenRouter, OpenAI, POE, etc.). No GPU required.
cd eval
pip install openai pillow numpy tqdm pandas
export API_KEY="your-api-key-here"For APIs that support frame (image) input:
python evaluate_api.py --model qwen/qwen2.5-vl-72b-instructFor APIs that support video input (requires ffmpeg):
python evaluate_api_video.py --model qwen/qwen2.5-vl-72b-instructTo use a different API provider, override --base_url:
# OpenAI
python evaluate_api.py --model gpt-4o --base_url https://api.openai.com/v1
# POE
python evaluate_api.py --model gpt-4o --base_url https://api.poe.com/v1See the eval/README.md for full argument reference, resume support, and output format.
In case you want to train the Qwen2.5-VL-7B model with SpaceVista, please refer to the sft folder for detailed instructions.
If you find this repo useful, please cite our papers:
@article{sun2025spacevista,
title={SpaceVista: All-Scale Visual Spatial Reasoning from mm to km},
author={Sun, Peiwen and Lang, Shiqiang and Wu, Dongming and Ding, Yi and Feng, Kaituo and Liu, Huadai and Ye, Zhen and Liu, Rui and Liu, Yun-Hui and Wang, Jianan and Yue, Xiangyu},
journal={arXiv preprint arXiv:2510.09606},
year={2025}
}

