Hi @cheng-chi,
I’ve been adapting UMI to a low-cost DIY setup (AR4 arm, custom 3D-printed servo gripper, consumer fisheye camera) to explore how the approach behaves outside the original hardware assumptions. Overall it works surprisingly well, but I noticed two execution-time behaviors I’m trying to understand.
Short demo video: https://youtu.be/DaHaX00mwM8
Code fork: https://github.com/robotsir/umi_ar4_retrofit
- Camera intrinsics / image composition
My camera has slightly different intrinsics and image composition (e.g., border placement, focal length). The images look similar to the training images, but I occasionally see grasp undershoot (e.g., the gripper tip missing the cup around 0:46).
Question:
How sensitive is UMI to intrinsics or framing differences at execution time? Would you expect retraining or fine-tuning to be necessary once camera geometry deviates beyond a narrow range?
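To make "deviates" more concrete, here is a rough sketch of how the mismatch could be expressed in pixels. It assumes an ideal equidistant fisheye model (r = f * theta), which may not match either real lens, and every number in it (focal lengths, principal points, the cup position) is a made-up placeholder rather than a measured value. The function names are hypothetical and not part of UMI.

```python
import math

def project_equidistant(point_xyz, f_px, cx, cy):
    """Project a camera-frame 3D point with the ideal equidistant model r = f * theta."""
    x, y, z = point_xyz
    theta = math.atan2(math.hypot(x, y), z)  # angle between the ray and the optical axis
    phi = math.atan2(y, x)                   # direction of the ray in the image plane
    r = f_px * theta                         # radial distance from the principal point
    return (cx + r * math.cos(phi), cy + r * math.sin(phi))

def pixel_shift(point_xyz, train_intr, deploy_intr):
    """Pixel distance between where the two cameras would image the same 3D point."""
    u0, v0 = project_equidistant(point_xyz, *train_intr)
    u1, v1 = project_equidistant(point_xyz, *deploy_intr)
    return math.hypot(u1 - u0, v1 - v0)

# Placeholder intrinsics as (f_px, cx, cy); NOT measured from any real camera.
TRAIN_INTR = (420.0, 640.0, 480.0)
DEPLOY_INTR = (400.0, 648.0, 476.0)

# Placeholder cup-rim position in the camera frame, in metres.
cup_rim = (0.05, 0.02, 0.30)

print(f"apparent shift: {pixel_shift(cup_rim, TRAIN_INTR, DEPLOY_INTR):.1f} px")
```

Evaluating this over a grid of points in the workspace would show whether the shift is roughly uniform (a framing offset) or grows toward the image edge (a focal-length or distortion mismatch), which might help separate the two effects.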
- Control latency & queued commands
After placing the cup, the robot sometimes delays the gripper release and exhibits small twitching motions. The arm is not near a singularity. This platform has higher latency and supports only single-step commands, so I suspect that queued commands are executing late because of missed control cycles.
Question:
How tolerant is the UMI control loop to higher or variable robot latency? Are there recommended safeguards (e.g., command dropping, timing assumptions) for this case?
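To illustrate what I mean by "command dropping", here is a generic sketch, not taken from the UMI codebase. It assumes each predicted action carries an absolute target time and that the robot's command-to-motion latency is known at least approximately. `Waypoint`, `drop_stale`, and `next_command` are hypothetical names, and the latency value in the usage lines is a placeholder.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Waypoint:
    target_time: float  # absolute time (s) at which the pose should be reached
    pose: tuple         # placeholder for the commanded pose / gripper width

def drop_stale(waypoints, now, robot_latency, min_lead=0.0):
    """Keep only waypoints the robot could still reach on time.

    A waypoint is stale if its target time is earlier than the soonest moment
    a command issued right now could take effect (now + latency + margin).
    """
    earliest = now + robot_latency + min_lead
    return [w for w in waypoints if w.target_time >= earliest]

def next_command(waypoints, now, robot_latency, min_lead=0.0):
    """For a single-step interface: return the earliest non-stale waypoint, or None."""
    fresh = drop_stale(waypoints, now, robot_latency, min_lead)
    return min(fresh, key=lambda w: w.target_time) if fresh else None

# Usage with placeholder numbers: a 10 Hz plan and an assumed 100 ms latency.
plan = [Waypoint(t / 10.0, ("pose", t)) for t in range(4)]  # t = 0.0, 0.1, 0.2, 0.3
cmd = next_command(plan, now=0.12, robot_latency=0.1)
print(cmd)
```

The idea is that a late control cycle skips ahead in the plan instead of replaying actions whose time has already passed, which is the behavior I suspect is missing on my platform.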
I also designed a simple servo gripper and modeled a cup/saucer set for 3D printing based on the commonly used training objects. Happy to share files if useful.
Thanks for sharing the work. Adapting it has been a fun way to explore robustness in constrained systems.