This is a community fork of TencentARC/Pixal3D with several fixes and improvements for self-hosted use on high-VRAM workstations. See CHANGELOG.md for the full technical diff. All original credits and license terms are preserved below.
What this fork fixes / adds:
- Preview frames and 3D viewer were black after generation due to a Gradio `FileData` serialisation bug; fixed
- `remesh_project=0` was overriding the library default, producing blocky voxel-grid topology; fixed to `0.9`
- `CASCADE_MAX_NUM_TOKENS` raised from 49152 → 131072 so generation always runs at the model's maximum 1536 voxel resolution instead of silently stepping down to 1280 for complex objects
- Full Fidelity mesh extraction: the default extraction was discarding the model's actual decoded mesh (FlexiDualGrid with learned sub-voxel vertex offsets) and replacing it with a coarser Dual Contouring reconstruction on a 96-cell grid, producing large facets in smooth areas regardless of the polygon limit set. Default changed to `remesh=False`. DC Remesh still selectable for clean-topology workflows. See CHANGELOG.md for a full explanation.
- OBJ export: mesh extracted as OBJ + MTL + textures ZIP alongside the GLB
- All quality controls (Shape Steps, Texture Steps, Texture Size, Max Faces up to 5M, Mesh Mode) exposed and wired end-to-end — nothing hardcoded
- Every slider paired with an editable number field — type exact polygon budgets directly
- Every control has a description and tooltip explaining what it does and its meaningful range
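
As an illustration of the OBJ export item above, here is a minimal sketch of bundling an OBJ, its MTL, and its textures into one ZIP using only the Python standard library. It is not the fork's actual exporter: the function name `bundle_obj_export` and the file names in the usage note are hypothetical, and the sketch assumes the OBJ and MTL reference their companion files by bare file name.

```python
import zipfile
from pathlib import Path


def bundle_obj_export(obj_path, mtl_path, texture_paths, zip_path):
    """Pack an OBJ, its MTL, and its textures into a flat ZIP archive.

    Hypothetical helper for illustration; not part of the Pixal3D codebase.
    """
    members = [Path(obj_path), Path(mtl_path), *map(Path, texture_paths)]
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for member in members:
            # Store each file under its bare name so relative references
            # inside the OBJ (mtllib) and MTL (texture maps) still resolve
            # when the archive is unpacked into a single folder.
            zf.write(member, arcname=member.name)
    return Path(zip_path)
```

Calling `bundle_obj_export("out/mesh.obj", "out/mesh.mtl", ["out/albedo.png"], "out/mesh_obj.zip")` would produce an archive that unpacks to a single folder most DCC tools can import directly.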
Dong-Yang Li¹ · Wang Zhao²* · Yuxin Chen² · Wenbo Hu² · Meng-Hao Guo¹ · Fang-Lue Zhang³ · Ying Shan² · Shi-Min Hu¹✉
¹Tsinghua University (BNRist) ²Tencent ARC Lab ³Victoria University of Wellington
*Project lead ✉Corresponding author
Pixal3D generates high-fidelity 3D assets from a single image. Unlike previous methods that loosely inject image features via attention, Pixal3D explicitly lifts pixel features into 3D through back-projection, establishing direct pixel-to-3D correspondences. This enables near-reconstruction-level fidelity with detailed geometry and PBR textures.
- May 2026: Release the improved version based on Trellis.2 backbone. 💪
- May 2026: Release inference code and online demo. 🤗
- Apr 2026: Our paper is accepted to SIGGRAPH 2026! 🎉
| Branch | Description |
|---|---|
| `main` | Latest version: improved implementation based on Trellis.2 backbone with better performance. |
| `paper` | Paper version: original implementation based on Direct3D-S2, corresponding to results reported in our SIGGRAPH 2026 paper. |
If you want to reproduce the results in our paper, please switch to the `paper` branch.
You can try Pixal3D directly in your browser, without any installation, via our Hugging Face Gradio demo.
Please first follow the installation guide of TRELLIS.2 to set up the base environment.

    pip install -r requirements.txt
    pip install https://github.com/LDYang694/Storages/releases/download/20260430/utils3d-0.0.2-py3-none-any.whl

Note: `requirements-hfdemo.txt` is for the Hugging Face Spaces demo (H-series GPU architecture) and may not be compatible with other architectures.
Generate a GLB mesh from a single image:

    python inference.py --image assets/test_image/0.png --output ./output.glb

We provide a Gradio web demo for Pixal3D, which allows you to generate 3D meshes from images interactively.

    python app.py

This project is heavily built upon Trellis.2 and Direct3D-S2. We sincerely thank the authors for their outstanding work on scalable 3D generation, which serves as the foundation of our codebase and model architecture.
We also thank the following repos for their great contributions:
If you find this work useful, please consider citing:
@article{li2026pixal3d,
title={Pixal3D: Pixel-Aligned 3D Generation from Images},
author={Li, Dong-Yang and Zhao, Wang and Chen, Yuxin and Hu, Wenbo and Guo, Meng-Hao and Zhang, Fang-Lue and Shan, Ying and Hu, Shi-Min},
journal={arXiv preprint arXiv:2605.10922},
year={2026}
}
