Positronic supports four models with different capabilities and resource requirements. This guide helps you choose the right model for your task.
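Start with LeRobot ACT if you want something quick and low-cost. Progress to OpenPI (π₀.₅) or GR00T when you need a more capable model. Because all models share one dataset format and one inference API, switching between them requires no changes to your recorded data or hardware code.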
| Aspect | OpenPI (π₀.₅) | GR00T | SmolVLA | LeRobot ACT |
|---|---|---|---|---|
| Capability | Most capable, generalist | Generalist | Vision-language-action | Single-task specialist |
| Training Hardware | Capable cloud GPU (~78GB, LoRA) | Capable cloud GPU (~50GB) | Consumer GPU (RTX 3090, 4090) | Consumer GPU (RTX 3090, 4090) |
| Training Time | Multiple days | 0.5-2 days | Several hours | Several hours |
| Inference Hardware | GPU (~62GB, likely cloud) | GPU (~7.5GB, can run on robot) | Consumer GPU (4GB+) | Consumer GPU (4GB+) |
| Inference Speed | Moderate | Moderate | Moderate | Fast |
| Best For | Complex multi-task manipulation, generalization | General robotics tasks | Language-conditioned manipulation | Specific manipulation tasks, fast iteration |
| When to Use | Need generalization, multi-task scenarios, leveraging foundation models | Prefer NVIDIA stack | Language instructions, VLM backbone | Single task, resource constraints, rapid experimentation |
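The hardware rows of the table can be turned into a quick filter. The following is a minimal sketch, not part of Positronic: the `ModelFootprint` class and `models_that_fit` function are hypothetical names, the memory figures are the approximate values from the table, and the 24 GB training figure for SmolVLA and LeRobot ACT is an assumption based on the memory size of the RTX 3090 and 4090.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelFootprint:
    """Approximate GPU memory needs for one model (hypothetical helper)."""
    name: str
    train_gb: float  # approximate GPU memory for training
    infer_gb: float  # approximate GPU memory for inference


# Values copied from the comparison table. The 24 GB training entries are an
# assumption: the table only says "Consumer GPU (RTX 3090, 4090)".
MODELS = [
    ModelFootprint("OpenPI", 78.0, 62.0),
    ModelFootprint("GR00T", 50.0, 7.5),
    ModelFootprint("SmolVLA", 24.0, 4.0),
    ModelFootprint("LeRobot ACT", 24.0, 4.0),
]


def models_that_fit(train_gpu_gb: float, infer_gpu_gb: float) -> list[str]:
    """Return the models whose training and inference both fit the given GPUs."""
    return [
        m.name
        for m in MODELS
        if m.train_gb <= train_gpu_gb and m.infer_gb <= infer_gpu_gb
    ]


# A 24 GB workstation for training and an 8 GB GPU on the robot.
print(models_that_fit(24, 8))
# An 80 GB cloud GPU for training and the same 8 GB GPU on the robot.
print(models_that_fit(80, 8))
```

Memory is only one axis. Training time, inference speed, and the capability you need still have to be weighed using the remaining rows of the table.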
What it is: OpenPI (π₀.₅) is a foundation model for robotics, trained by Physical Intelligence on diverse manipulation tasks.
Strengths:
- Most capable generalization across manipulation scenarios
- Can leverage pretrained knowledge from large-scale training
- State-of-the-art performance on complex tasks
Limitations:
- Requires ~78GB of GPU memory for training (with LoRA) and ~62GB for inference, so deployment is likely cloud-based
- Training takes multiple days
What it is: GR00T is NVIDIA's foundation model for generalist robot control.
Strengths:
- Generalist capabilities
- Runs inference on a smaller GPU (~7.5GB), so it can be deployed on or near the robot
- Requires ~50GB for training (less than OpenPI)
- Training takes 0.5-2 days (faster than OpenPI)
Limitations:
- Requires a capable cloud GPU (~50GB) for training
- Slower inference than single-task models such as LeRobot ACT
What it is: SmolVLA is a compact vision-language-action model from Hugging Face LeRobot (0.4.x). It combines a VLM backbone with action prediction for language-conditioned manipulation.
Strengths:
- Language-conditioned: specify tasks in natural language
- VLM backbone enables visual understanding
- Consumer GPU training and inference
- Part of the LeRobot ecosystem
Limitations:
- Larger than ACT because of the VLM backbone
- Requires 512x512 images
What it is: LeRobot ACT is the Action Chunking Transformer from Hugging Face LeRobot, designed for efficient single-task imitation learning. It can handle multiple tasks given sufficient data and task conditioning.
Strengths:
- Fast training (several hours on consumer GPUs)
- Efficient inference on consumer hardware
- Easy and quick to get started
- Rapid iteration cycles
- Part of the LeRobot ecosystem, with cheaper models and a larger community
Limitations:
- Single-task focus (multi-task possible but not primary strength)
- Less generalization than foundation models
Positronic relies on our forks of openpi and gr00t. We do our best to keep them up to date with upstream repositories.
Positronic's goal is to democratize ML/AI in robotics. You shouldn't be locked to a single vendor or architecture.
- Same dataset format: Record once, train on any model
- Same inference API: Swap models without changing hardware code
- Easy experimentation: Try all models with your data, pick what works best
- Future-proof: We'll keep adding foundation models as they emerge
- Start with your data: Collect demonstrations using Positronic
- Experiment freely: Try LeRobot ACT for a quick baseline, then GR00T or OpenPI
- Compare results: Use the same dataset and inference code across models
- Deploy the winner: Pick the model that balances performance and resources for your use case
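The "same inference API" idea above can be illustrated with a small interface sketch. This is not Positronic's actual API: `Policy`, `select_action`, `ConstantPolicy`, and `control_step` are hypothetical names chosen for illustration. The point is only that hardware-side code depends on one interface, so any model implementing it can be swapped in.

```python
from typing import Protocol, Sequence


class Policy(Protocol):
    """Hypothetical model-agnostic interface (not Positronic's real API)."""

    def select_action(self, observation: dict) -> Sequence[float]: ...


class ConstantPolicy:
    """Stand-in for a trained model: always returns the same action."""

    def __init__(self, action: Sequence[float]) -> None:
        self._action = list(action)

    def select_action(self, observation: dict) -> Sequence[float]:
        # A real model would run inference on the observation here.
        return list(self._action)


def control_step(policy: Policy, observation: dict) -> list[float]:
    """Hardware-side code: it only knows about the Policy interface."""
    return list(policy.select_action(observation))


# Two stand-ins playing the role of two different trained models.
act_like = ConstantPolicy([0.0, 0.1])
groot_like = ConstantPolicy([0.2, 0.3])

obs = {"image": None, "joint_positions": [0.0, 0.0]}
for policy in (act_like, groot_like):
    # The same control code runs unchanged for either model.
    print(control_step(policy, obs))
```

Using a structural `Protocol` rather than a base class means a model wrapper only has to provide the method, without inheriting from anything.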
You do not need to re-record data to try a different model. Positronic's dataset library stores data in a format-agnostic way: record once, then use codecs to project your data into each model's format. You can:
- Train LeRobot ACT for a fast baseline
- Train GR00T for comparison
- Train OpenPI for best performance
All from the same raw dataset using different codecs.
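To make the codec idea concrete, here is a minimal sketch of projecting one raw dataset into two different model-specific layouts. Everything in it is an assumption for illustration: the field names, the `state_action_codec` and `language_codec` functions, and the `project` helper are hypothetical and do not reflect Positronic's actual codec classes or any model's real input schema.

```python
from typing import Callable

# One raw, model-agnostic step. Field names are hypothetical.
RawStep = dict


def state_action_codec(step: RawStep) -> dict:
    """Project a raw step to a plain state/action pair (illustrative layout)."""
    return {
        "state": step["joint_positions"],
        "action": step["target_joint_positions"],
    }


def language_codec(step: RawStep) -> dict:
    """Project the same raw step, keeping the task text for a language-conditioned model."""
    return {
        "state": step["joint_positions"],
        "prompt": step["task"],
        "actions": step["target_joint_positions"],
    }


CODECS: dict[str, Callable[[RawStep], dict]] = {
    "state_action": state_action_codec,
    "language": language_codec,
}


def project(dataset: list[RawStep], codec_name: str) -> list[dict]:
    """Apply one codec to every step; the raw dataset itself is left untouched."""
    codec = CODECS[codec_name]
    return [codec(step) for step in dataset]


raw_dataset = [
    {
        "joint_positions": [0.0, 0.1],
        "target_joint_positions": [0.0, 0.2],
        "task": "pick up the cube",
    },
]

print(project(raw_dataset, "state_action"))
print(project(raw_dataset, "language"))
```

Because each codec is a pure function of the raw step, supporting another model in this sketch means adding one more function rather than recording new data.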
Start with LeRobot ACT:
- Fastest to train and iterate
- Lowest resource requirements
- Validates your dataset and task setup
- Simplest model to get working end-to-end
Then progress to GR00T or OpenPI if you need:
- Multi-task generalization
- Better performance on complex scenarios
- More capable models
Positronic's architecture is extensible. We'll continue adding foundation models as they emerge. You can also implement custom model integrations following our vendor patterns.
- Choose your model using the comparison table above
- Review the model-specific README for detailed workflow
- Check the Codecs Guide to understand observation/action encoding
- Follow the Training Workflow for end-to-end steps
- OpenPI Documentation
- GR00T Documentation
- SmolVLA Documentation
- LeRobot ACT Documentation
- Codecs Guide — Understanding observation encoding and action decoding
- Training Workflow — Unified training steps across all models