Skip to content

ShiqiangLang/SpaceVista

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SpaceVista: All-Scale Visual Spatial Reasoning from $mm$ to $km$

  🤗 Hugging Face   |    📑 Paper    |    ⚙️ Github    | 🖥️ Home Page  

Peiwen Sun $^{*}$, Shiqiang Lang $^{*}$, Dongming Wu, Yi Ding, Kaituo Feng, Huadai Liu, Zhen Ye, Rui Liu, Yun-Hui Liu, Jianan Wang, Xiangyu Yue

Keywords: Multi-Modal All-Scale Spatial Reasoning

The official repo for SpaceVista: All-Scale Visual Spatial Reasoning from $mm$ to $km$.

Outlines

💥 News 💥

[2026.5.28] 📦 Our SpaceVista-1M is released at .

[2026.5.28] 🎯 Our SpaceVista-Bench is released at .

[2026.5.10] 🏆 The Guinness World Records data used in our paper is released at .

[2026.5.2] 🎉 Our paper is accepted by ICML 2026. See you in Seoul.

[2025.10.10] Our preview SFT code base is released for preview. .

[2025.10.10] Our preview 100K subset of SpaceVista-1M is now available at .

[2025.10.10] Our initial paper is now accessible at .

Overall Structure

SpaceVista


Spatial reasoning is the ability to perceive, interpret, and act across spatial scales, from millimeter-sized components to distant aerial scenes. All-scale spatial reasoning is fundamental to next-generation intelligent systems and supports diverse applications: mm sensing for advanced manufacturing, cm and m perception for embodied agents, 10m operation for autonomous driving, and 100m for drone-based sensing. Despite progress, existing work shows clear limitations in both model design and dataset coverage. Current scene perception research mostly targets indoor scenes, narrow object classes, and limited spatial ranges, and lacks training paradigms engineered for end to end, cross scale reasoning. SpaceVista addresses this gap by presenting the first systematic optimization across both data and model dimensions to enable robust, full-scene spatial reasoning.

Training Data

SpaceVista-1M: Available at .

# Download SpaceVista-1M
huggingface-cli download SpaceVista/SpaceVista-Full --repo-type dataset --local-dir ./SpaceVista-1M

Evaluation Data

SpaceVista-Bench: Available at .

# Download SpaceVista-Bench
huggingface-cli download SpaceVista/SpaceVista-Bench --repo-type dataset --local-dir ./SpaceVista-Bench

Evaluation

We provide API-based evaluation scripts that work with any OpenAI-compatible API (OpenRouter, OpenAI, POE, etc.). No GPU required.

cd eval
pip install openai pillow numpy tqdm pandas
export API_KEY="your-api-key-here"

For APIs that support frame (image) input:

python evaluate_api.py --model qwen/qwen2.5-vl-72b-instruct

For APIs that support video input (requires ffmpeg):

python evaluate_api_video.py --model qwen/qwen2.5-vl-72b-instruct

To use a different API provider, override --base_url:

# OpenAI
python evaluate_api.py --model gpt-4o --base_url https://api.openai.com/v1

# POE
python evaluate_api.py --model gpt-4o --base_url https://api.poe.com/v1

See the eval/README.md for full argument reference, resume support, and output format.

Usage

In case you want to train the Qwen2.5-VL-7B model with SpaceVista, please refer to the sft folder for detailed instructions.

Reference

If you find this repo useful, please cite our papers:

@article{sun2025spacevista,
  title={SpaceVista: All-Scale Visual Spatial Reasoning from mm to km}, 
  author={Sun, Peiwen and Lang, Shiqiang and Wu, Dongming and Ding, Yi and Feng, Kaituo and Liu, Huadai and Ye, Zhen and Liu, Rui and Liu, Yun-Hui and Wang, Jianan and Yue, Xiangyu},
  journal={arXiv preprint arXiv:2510.09606},
  year={2025}
}

About

SpaceVista: All-Scale Visual Spatial Reasoning from mm to km.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 99.6%
  • Other 0.4%