
yorklyb/SI-Diff


SI-Diff: A Framework for Learning Search and High-Precision Insertion with a Force-Domain Diffusion Policy

RA-L'26 & ICRA'27

[ArXiv], [Paper], [Project Page]

From the Author

Due to IP policies, we do not release a click-and-run version of SI-Diff. We will provide more supplementary details to the paper to help you reproduce the work.

We first provide a straightforward introduction to the fundamentals of robot control to help readers avoid confusion. The second-order dynamic model of an n-Degree-of-Freedom torque-controlled robot is as follows:
[Image: dynamic model equation]
Of these terms, we control the robot through $\boldsymbol{\tau}_m$, the joint torque. If the following control law is used, the robot is controlled by an impedance controller.
[Image: impedance control law]
Based on this, if a feedforward force term is added, the controller becomes a feedforward force-based impedance controller, which is the controller used in this work.
[Image: impedance control law with the feedforward force term]
Our force diffusion policy learns how to predict the feedforward force.

Note that we rely on the error term e to drive the end effector (EE) to the desired position. In other words, we need to first define a desired position or trajectory. Although the feedforward force can also influence the motion of the EE, we only rely on it to handle misalignment or sticking situations.
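For readers who cannot view the equation images, the three expressions referred to above typically take the following textbook form. This is a sketch under assumed notation, not a transcription of the paper: $\boldsymbol{M}$, $\boldsymbol{C}$, $\boldsymbol{g}$ (inertia, Coriolis, gravity), $\boldsymbol{\tau}_{ext}$ (external torque), $\boldsymbol{J}$ (EE Jacobian), $\boldsymbol{K}$, $\boldsymbol{D}$ (stiffness, damping), $\boldsymbol{\tau}_{comp}$ (model-based compensation), and $\boldsymbol{f}_{ff}$ (feedforward force) are labels chosen for this sketch and may differ from those in the paper.

```latex
\begin{align}
% Second-order dynamics of an n-DoF torque-controlled robot
\boldsymbol{M}(\boldsymbol{q})\,\ddot{\boldsymbol{q}}
  + \boldsymbol{C}(\boldsymbol{q},\dot{\boldsymbol{q}})\,\dot{\boldsymbol{q}}
  + \boldsymbol{g}(\boldsymbol{q})
  &= \boldsymbol{\tau}_m + \boldsymbol{\tau}_{ext} \\
% Impedance controller: spring-damper on the EE error e
\boldsymbol{\tau}_m
  &= \boldsymbol{J}^{\top}(\boldsymbol{q})
     \left( \boldsymbol{K}\boldsymbol{e} + \boldsymbol{D}\dot{\boldsymbol{e}} \right)
     + \boldsymbol{\tau}_{comp} \\
% Feedforward force-based impedance controller
\boldsymbol{\tau}_m
  &= \boldsymbol{J}^{\top}(\boldsymbol{q})
     \left( \boldsymbol{f}_{ff} + \boldsymbol{K}\boldsymbol{e} + \boldsymbol{D}\dot{\boldsymbol{e}} \right)
     + \boldsymbol{\tau}_{comp}
\end{align}
```

In this form, the error term $\boldsymbol{e}$ pulls the EE toward the desired pose, while $\boldsymbol{f}_{ff}$ enters as an additional Cartesian wrench, which is the quantity the force diffusion policy predicts.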

Step 1: Impedance Controller

First, you need to build an impedance controller for your robot. If you are using a Franka Robotics robot, you can follow this demo. Once this step is completed, your robot should behave like the one shown in the following video.

595577416-4ef82801-d471-4a69-8b65-04aa87ca3d07.mp4
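As a rough illustration of what this step computes, the sketch below maps a Cartesian pose error to joint torques with a spring-damper law. It is a minimal sketch, not the linked demo's implementation: the function name `impedance_torque`, the gain values, the random Jacobian, and the error convention (desired minus measured) are all assumptions, and gravity/Coriolis compensation is omitted.

```python
import numpy as np

def impedance_torque(jacobian, pose_error, twist_error, stiffness, damping):
    """Joint torques from a Cartesian spring-damper law: tau = J^T (K e + D e_dot).

    jacobian:    (6, n) end-effector Jacobian
    pose_error:  (6,) desired minus measured pose (position + small-angle orientation)
    twist_error: (6,) desired minus measured end-effector twist
    """
    wrench = stiffness @ pose_error + damping @ twist_error
    return jacobian.T @ wrench

# Hypothetical gains and a random Jacobian for a 7-joint arm (illustrative values only).
rng = np.random.default_rng(0)
J = rng.standard_normal((6, 7))
K = np.diag([600.0, 600.0, 600.0, 30.0, 30.0, 30.0])
D = 2.0 * np.sqrt(K)  # damping tied to stiffness; a common heuristic, assumed here
e = np.array([0.01, 0.0, -0.02, 0.0, 0.0, 0.0])
tau = impedance_torque(J, e, np.zeros(6), K, D)
print(tau.shape)
```

With zero pose and twist error the commanded torque is zero, so the arm rests at the desired pose and pushes back like a spring when displaced.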

Step 2: Feedforward-based Impedance Controller

On top of the impedance controller, you need to further add a feedforward force term to the controller. You can start by designing the feedforward force using a simple pattern. For example, you can set fz as a sinusoidal signal and set fx, fy, mx, my, and mz to zero. Then, your robot should behave as shown in the following video.

595577540-e5ae456f-881e-4a4d-90d8-ec9e37ff4f6c.mov
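The simple pattern described in this step can be sketched as follows. The amplitude, frequency, and function names are assumptions made for illustration; only the structure (sinusoidal fz, all other components zero, added on top of the impedance wrench) comes from the text above.

```python
import math

def sinusoidal_feedforward(t, amplitude=5.0, frequency_hz=1.0):
    """Return [fx, fy, fz, mx, my, mz] with a sinusoidal fz and all other components zero."""
    fz = amplitude * math.sin(2.0 * math.pi * frequency_hz * t)
    return [0.0, 0.0, fz, 0.0, 0.0, 0.0]

def total_wrench(impedance_wrench, feedforward_wrench):
    """Add the feedforward wrench to the impedance wrench (K e + D e_dot).

    The sum is what would then be mapped to joint torques through J^T.
    """
    return [a + b for a, b in zip(impedance_wrench, feedforward_wrench)]

f_ff = sinusoidal_feedforward(t=0.25)  # quarter period at 1 Hz, so fz is at its peak
print(f_ff)
```

Starting with a small amplitude and checking the response along z is a cautious way to confirm the term is wired in with the intended sign.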

Step 3: Teacher Policy

Follow Algorithm 1 in our paper to design the teacher policy and collect training data. We provide one demonstration (robot_action.pkl & robot_state.pkl) in this repository to show what the training data look like.

Our diffusion policy learns to predict the robot action (output) from the robot state (input). The action is the 6-DoF feedforward force (fx, fy, fz, mx, my, and mz). The robot state is 37-dimensional: the first value is the mode prompt, and the remaining 36 values are identical to the observations in TacDiffusion. You can refer to the discussion here for details on these 36 values.
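To inspect the provided demonstration, something like the following sketch may help. It assumes that each pickle file stores an array-like of shape [T, 37] (states) or [T, 6] (actions); that layout is an assumption, not something confirmed here, so adjust the loading code if the files are organized differently. The function name `load_demo` is hypothetical.

```python
import pickle
import numpy as np

STATE_DIM = 37   # 1 mode prompt + 36 observation values
ACTION_DIM = 6   # fx, fy, fz, mx, my, mz

def load_demo(state_path="robot_state.pkl", action_path="robot_action.pkl"):
    """Load one demonstration and split it into mode prompts, observations, and actions.

    Assumes each pickle holds an array-like of shape [T, 37] / [T, 6].
    """
    with open(state_path, "rb") as f:
        states = np.asarray(pickle.load(f), dtype=np.float32)
    with open(action_path, "rb") as f:
        actions = np.asarray(pickle.load(f), dtype=np.float32)
    assert states.shape[-1] == STATE_DIM, states.shape
    assert actions.shape[-1] == ACTION_DIM, actions.shape
    modes = states[:, 0].astype(np.int64)   # first value: mode prompt
    observations = states[:, 1:]            # remaining 36 values
    return modes, observations, actions
```

Calling `load_demo()` from the repository root and printing the three shapes is a quick check that the assumed layout matches the files.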

Once the teacher policy is ready, the robot can start searching. In the early stages, we manually created misalignments to collect data for the teacher policy. Later, we developed an automated data collection pipeline. It mirrors the evaluation process of the teacher policy, but only records successful demonstrations that meet our efficiency criterion (completed within 2 seconds). We kept running this pipeline until a sufficient number of expert demonstrations were collected.

auto_data.mp4
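The recording rule of the automated pipeline can be sketched as a simple filter. The `Episode` structure, the function names, and the example rollouts are hypothetical; only the rule itself (keep successful episodes completed within 2 seconds, stop once enough are collected) is taken from the description above.

```python
from dataclasses import dataclass

MAX_DURATION_S = 2.0  # efficiency criterion stated above

@dataclass
class Episode:
    success: bool
    duration_s: float

def keep_episode(episode, max_duration_s=MAX_DURATION_S):
    """Record an episode only if the insertion succeeded within the time limit."""
    return episode.success and episode.duration_s <= max_duration_s

def collect(episodes, target_count):
    """Consume episodes until `target_count` expert demonstrations have been kept."""
    kept = []
    for episode in episodes:
        if keep_episode(episode):
            kept.append(episode)
        if len(kept) >= target_count:
            break
    return kept

rollouts = [Episode(True, 1.4), Episode(False, 0.9), Episode(True, 2.6), Episode(True, 1.9)]
experts = collect(rollouts, target_count=2)
print(len(experts))
```

Here the failed rollout and the slow (2.6 s) rollout are discarded, so only the fast, successful ones end up in the training set.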

Step 4: Diffusion Policy

Our diffusion policy is built upon Imitating-Human-Behaviour-w-Diffusion and TacDiffusion. We recommend first becoming familiar with these two works, then following the instructions in our paper to add the mode embedding layers.
[Image: network architecture]

Step 5: Model Training

Since the model needs to learn two modes simultaneously, and the data distribution between the two modes is imbalanced, we recommend using the BBS technique. The following code sketches the training loop, in which every batch is built from equal shares of the two modes.

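import numpy as np
import torch
from tqdm import tqdm

# model (DDP-wrapped), optim, lrate, n_epoch, device, rank, writer, and the two
# DistributedSampler-backed dataloaders (one per mode) are assumed to be set up beforehand.
global_step = 0  # step counter for the TensorBoard writer

for ep in range(n_epoch):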
    dataload_train_0.sampler.set_epoch(ep)
    dataload_train_1.sampler.set_epoch(ep)

    model.train()
    optim.param_groups[0]["lr"] = lrate * ((np.cos((ep / n_epoch) * np.pi) + 1) / 2)

    pbar = zip(dataload_train_0, dataload_train_1)
    if rank == 0:
        pbar = tqdm(pbar, total=min(len(dataload_train_0), len(dataload_train_1)), desc=f"Epoch {ep}")

    for (x0, y0), (x1, y1) in pbar:
        # 1. Move tensors to the configured device asynchronously
        x0 = x0.to(device, non_blocking=True).float()
        y0 = y0.to(device, non_blocking=True).float()
        x1 = x1.to(device, non_blocking=True).float()
        y1 = y1.to(device, non_blocking=True).float()

        # 2. Extract the mode prompt from the first dimension (index 0)
        # Input shape: [B, 37] -> mode shape: [B], feature shape: [B, 36]
        mode0 = x0[:, 0].long()       # Cast to long for the embedding layer
        x0_feature = x0[:, 1:]        # Slice the remaining 36 dimensions for observations

        mode1 = x1[:, 0].long()
        x1_feature = x1[:, 1:]

        # 3. Concatenate the dual-source data into a single balanced batch
        x_batch = torch.cat([x0_feature, x1_feature], dim=0)  # Pure 36-dim observations
        y_batch = torch.cat([y0, y1], dim=0)
        mode_batch = torch.cat([mode0, mode1], dim=0)          # Combined mode prompts

        # 4. Forward pass and loss computation
        loss = model.module.loss_on_batch(x_batch, y_batch, mode_batch)
        
        # 5. Backward pass and optimization step
        optim.zero_grad()
        loss.backward()
        optim.step()

        if rank == 0:
            pbar.set_description(f"train loss: {loss.item():.4f}")
            writer.add_scalar('training_loss', loss.item(), global_step)
            global_step += 1
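The learning-rate line in the loop above is a cosine decay from `lrate` toward zero over `n_epoch` epochs. The standalone sketch below reproduces that expression so the schedule can be inspected; the function name `cosine_lr` and the numeric values are assumptions chosen for illustration.

```python
import math

def cosine_lr(epoch, n_epoch, base_lr):
    """Same expression as in the loop: base_lr * (cos(pi * epoch / n_epoch) + 1) / 2."""
    return base_lr * (math.cos((epoch / n_epoch) * math.pi) + 1.0) / 2.0

base_lr, n_epoch = 1e-4, 100   # hypothetical values for illustration
schedule = [cosine_lr(ep, n_epoch, base_lr) for ep in range(n_epoch)]
print(schedule[0], schedule[n_epoch // 2], schedule[-1])
```

The rate starts at `base_lr`, reaches half of it at the midpoint, and is close to zero in the final epoch. Note that the loop only updates `optim.param_groups[0]`, so any additional parameter groups would keep their original rate.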

Modifications to FORGE

FORGE is an RL-based state-of-the-art (SOTA) competitor in our paper. However, the released FORGE code cannot be directly used to learn peg-in-hole tasks with our objects. In particular, although the authors state in the paper that a noisy estimate of the fixed part’s 6-DoF pose (which lies in SE(3)) is used as input to the model, we found that in their code implementation the model only uses the 3-DoF translation component.

FORGE paper

image

Released FORGE code

image
Here, fingertip refers to the end-effector. Note that pos is a 3D position vector rather than a 6D pose, while quat denotes the quaternion. In other words, the released code does not fully reproduce the algorithm described in the paper. This discrepancy is also reflected in how the “key points” are defined in the code. In their implementation, the key points defined on each object all lie on a single line, and the training objective is to align the key points on the peg (blue) with those on the hole (green). Under this definition, orientation becomes meaningless, since two parallel lines (both perpendicular to the ground) differ only by a translational error. We found experimentally that while this setup works for the cylinder-like pegs used in the paper, it fails for the cuboid peg, which requires additional constraints to guide alignment during insertion.
[Image: key-point definitions]


Therefore, we made the following two modifications to the FORGE code implementation.
First, we revised the definition of key points, from collinear points to the four corners of a square, along with their normal direction. Please refer to the figure above. Comparing the two setups shows that orientation information becomes meaningful: a specific orientation is required to align the key points, unlike in the previous setup.
Second, following the algorithm described in the FORGE paper, we added the orientation-related term fingertip_euler_rel_fixed, which represents the noisy estimate of the fixed part’s rotation, and modified the corresponding training and inference procedures accordingly.
image
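The effect of the key-point definition can be checked numerically. The sketch below is not FORGE code: the helper names, dimensions, and the mean-distance error are assumptions for illustration, and the normal direction mentioned above is left out. It places a peg at exactly the right position but with a yaw offset, and compares the key-point error for collinear points with that for square corners.

```python
import numpy as np

def rotate_z(points, yaw_rad):
    """Rotate an (N, 3) array of points about the z-axis."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rot.T

def keypoint_error(peg_points, hole_points):
    """Mean Euclidean distance between corresponding key points."""
    return float(np.mean(np.linalg.norm(peg_points - hole_points, axis=1)))

# Collinear key points along the insertion (z) axis vs. four corners of a square.
collinear = np.array([[0.0, 0.0, z] for z in (0.0, 0.01, 0.02, 0.03)])
half = 0.01
square = np.array([[half, half, 0.0], [-half, half, 0.0],
                   [-half, -half, 0.0], [half, -half, 0.0]])

yaw = np.deg2rad(30.0)  # peg is perfectly positioned but yawed by 30 degrees
err_collinear = keypoint_error(rotate_z(collinear, yaw), collinear)
err_square = keypoint_error(rotate_z(square, yaw), square)
print(err_collinear, err_square)
```

The collinear error stays at zero regardless of the yaw, so the objective gives no signal for rotating a cuboid peg into place, whereas the square-corner error is nonzero until the yaw is corrected.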

Acknowledgments

Parts of this project page were adopted from the Nerfies page. We would like to thank the authors of Imitating-Human-Behaviour-w-Diffusion and TacDiffusion for their open-source contributions.
