evenpang666/defense_ei_agents

defense_ei_agents Real UR7e Workflow

This workflow runs four DefenseAgent ReAct agents around a real UR7e:

  1. planner decomposes the natural-language task into primitive-level atomic tasks.
  2. supervisor turns each atomic task into execution-relevant information and visual done criteria.
  3. coder generates restricted Python for exactly one phase of one atomic task.
  4. judger evaluates each post-phase global/wrist RGB image pair and returns SUCCESS or FAIL.

The orchestrator is evaluate_defense_agent_real.py. It executes real robot code only; there is no dry-run or generate-only mode.
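The per-phase control flow described above can be sketched as follows. This is a minimal illustration, not the code in evaluate_defense_agent_real.py: `run_phases` and its `generate`, `execute`, and `judge` callables are hypothetical stand-ins for the coder, the robot runtime, and the judger.

```python
from typing import Callable, List


def run_phases(
    phases: List[str],
    generate: Callable[[str, int], str],
    execute: Callable[[str], None],
    judge: Callable[[str], str],
    max_attempts: int = 3,
) -> bool:
    """Run phases in order; retry a phase until it is judged SUCCESS."""
    for phase in phases:
        for attempt in range(1, max_attempts + 1):
            code = generate(phase, attempt)  # stand-in for coder: one phase only
            execute(code)                    # stand-in for real robot execution
            if judge(phase) == "SUCCESS":    # stand-in for the judger verdict
                break
        else:
            # Attempts exhausted: later phases are never generated.
            return False
    return True
```

The `for ... else` form makes the "next phase only after SUCCESS" rule explicit: the `else` branch runs only when no attempt broke out of the retry loop.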

Safety Notes

  • Use this only with the robot in a supervised lab setting.
  • Put the UR e-Series robot in Remote Control mode before running.
  • Keep E-stop reachable.
  • Test camera capture and a tiny free-space motion before object interaction.
  • Generated code is statically checked before execution. It may call only: move_x, move_y, move_z, rotate_x, rotate_y, rotate_z, look_at_operated_object, sleep, and gripper_control.
  • Primitive names such as pick_place, push, pull, press, open, close, and pour are labels only. The coder must expand them into axis-wise motion and gripper_control phases.
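One way such a static check could work is to walk the generated code's syntax tree and reject any call outside the allowlist. The sketch below uses Python's standard ast module (Python 3.9 or later); `find_disallowed_calls` is a hypothetical helper, and the orchestrator's actual check may be stricter or structured differently.

```python
import ast

# Allowlist copied from the safety note above.
ALLOWED_CALLS = {
    "move_x", "move_y", "move_z",
    "rotate_x", "rotate_y", "rotate_z",
    "look_at_operated_object", "sleep", "gripper_control",
}


def find_disallowed_calls(source: str) -> list[str]:
    """Return the sorted names of called functions not on the allowlist."""
    tree = ast.parse(source)  # raises SyntaxError on invalid code
    bad = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            # Bare names such as move_x(...) are looked up directly; anything
            # else (e.g. robot.movel(...)) is reported by its source text.
            name = func.id if isinstance(func, ast.Name) else ast.unparse(func)
            if name not in ALLOWED_CALLS:
                bad.add(name)
    return sorted(bad)
```

For example, code that calls `pick_place(...)` directly would be reported, because primitive names are labels rather than runtime functions.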

Required Environment Variables

DefenseAgent resolves LLM settings from profile values first, then environment variables. The checked-in profiles intentionally leave provider, model, base_url, and api_key blank.

Recommended OpenAI-compatible setup:

export AGENT_LAB_LLM_PROVIDER="openai"
export LLM_MODEL_ID="gpt-5.4"
export LLM_BASE_URL="https://api.qnaigc.com"
export LLM_API_KEY="..."

Provider-specific variables also work when LLM_* is unset:

export AGENT_LAB_LLM_PROVIDER="openai"
export OPENAI_MODEL="gpt-5.4"
export OPENAI_BASE_URL="https://api.qnaigc.com"
export OPENAI_API_KEY="..."
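The lookup order described in this section (profile value, then LLM_*, then the provider-specific variable) can be pictured with the sketch below. `resolve_setting` is a hypothetical helper written for illustration; it is not DefenseAgent's API, and treating blank strings as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    profile_value: Optional[str],
    generic_var: str,
    provider_var: str,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the first non-empty value: profile, LLM_* variable, provider variable."""
    for candidate in (profile_value, env.get(generic_var), env.get(provider_var)):
        if candidate:  # assumption: blank strings count as unset
            return candidate
    return None


# With blank checked-in profiles, the environment decides the model.
model = resolve_setting("", "LLM_MODEL_ID", "OPENAI_MODEL")
```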

Optional embedding variables enable DefenseAgent memory/RAG; if no embedding settings are provided, the real workflow runs with memory disabled. For example:

export EMBEDDING_MODEL="qwen/qwen3-embedding-8b"
export EMBEDDING_API_KEY="..."
export EMBEDDING_BASE_URL="https://openrouter.ai/api/v1"
export EMBEDDING_DIMS="1536"

The same settings can be passed on the command line:

--embedding-model text-embedding-3-small \
--embedding-api-key "$LLM_API_KEY" \
--embedding-base-url "$LLM_BASE_URL" \
--embedding-dims 1536

Dependencies

Install the project dependencies, DefenseAgent, RealSense bindings, and UR RTDE support in the active Python environment:

pip install imageio pyyaml numpy openai ur-rtde pyrealsense2 ultralytics
pip install defense-agent

If you install DefenseAgent from a source checkout instead, use pip install -e ... so that the source tree's package dependencies, such as ms-agent and omegaconf, are installed as well.

If DefenseAgent lives outside this copied folder, either install it into the active environment or set:

export DEFENSE_AGENT_ROOT="/path/to/DefenseAgent"

For the original sibling checkout layout, the script also auto-detects ../defense_EI/third_party/DefenseAgent.

If you use the repository-wide environment from the top-level README, make sure ur-rtde, pyrealsense2, imageio, and pyyaml are present.

Hardware Inputs

The runtime uses:

  • URScript primary socket: port 30003 (the UR real-time client interface)
  • URScript secondary socket: port 30002 (the UR secondary client interface)
  • RTDE receive: port 30004
  • Robotiq socket: port 63352
  • Two Intel RealSense D435i RGB-D streams, global and wrist

If only one RealSense stream is found, it is reused as both global and wrist input. For reliable judging, use two cameras and pass serials explicitly:

--camera-serials GLOBAL_SERIAL,WRIST_SERIAL

GLOBAL_SERIAL and WRIST_SERIAL are placeholders. To list the serials detected by pyrealsense2, run:

python evaluate_defense_agent_real.py --list-cameras

If listing cameras reports access denied on Windows, close Intel RealSense Viewer or other camera apps and check Windows camera privacy permissions.

If live capture fails with Frame didn't arrive within 5000, close RealSense Viewer or other camera apps, check both cameras are on USB3 ports, and try a more forgiving capture setup:

--camera-frame-timeout-ms 20000 \
--camera-capture-retries 5 \
--camera-width 640 \
--camera-height 480 \
--camera-fps 15
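The effect of a retry count can be sketched with a small wrapper around any frame-grabbing callable. `capture_with_retries` and `grab` are hypothetical names; the sketch assumes that frame timeouts surface as RuntimeError, which is how pyrealsense2 typically reports them, and it does not reproduce the orchestrator's actual capture code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def capture_with_retries(grab: Callable[[], T], retries: int = 5, delay_s: float = 0.0) -> T:
    """Call grab() up to `retries` times and return its first successful result."""
    last_error = None
    for _ in range(retries):
        try:
            return grab()
        except RuntimeError as exc:  # assumption: timeouts raise RuntimeError
            last_error = exc
            time.sleep(delay_s)      # optional pause before the next attempt
    raise RuntimeError(f"capture failed after {retries} attempts") from last_error
```

With a real camera, `grab` could be something like `lambda: pipeline.wait_for_frames(20000)` on a started pyrealsense2 pipeline.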

RGB-D Scene State

The workflow saves depth sidecars and a structured scene_state.json for the initial observation and every post-phase capture. The scene state is passed to both coder and judger as the perception handoff. Object estimation uses the global RealSense RGB image for YOLO boxes/masks and the aligned global depth map to reconstruct local 3D point clouds. By default, all YOLO-detected scene objects are included in scene_state.json.

No global-camera calibration is required. Object coordinates are reported relative to the global camera, for example:

  • objects[].center_global_camera_m
  • objects[].grasp_region_center_global_camera_m
  • objects[].point_cloud.bounds_camera_m

The current TCP pose is read from the robot and stored in the same JSON under tcp.pose_base. Optional base/gripper object fields are only populated when the corresponding transforms are available.

If you also want optional base-frame object coordinates, provide the fixed global-camera extrinsic:

python evaluate_defense_agent_real.py \
  --task "pick up the red block and place it into the bowl" \
  --robot-ip 169.254.26.10 \
  --camera-serials GLOBAL_SERIAL,WRIST_SERIAL \
  --global-camera-base-transform calibration/t_base_global_camera.json \
  --yolo-seg-model models/best.pt

The transform file should contain either a top-level 4x4 JSON array or {"matrix": [[...], [...], [...], [...]]} for T_base_global_camera.
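A minimal sketch of reading either accepted layout and applying the transform to a camera-frame point is shown below. `parse_transform` and `camera_point_to_base` are hypothetical helpers, and the sketch assumes the usual convention that T_base_global_camera maps homogeneous camera-frame points into the base frame.

```python
import json

import numpy as np


def parse_transform(text: str) -> np.ndarray:
    """Accept a top-level 4x4 array or {"matrix": [[...], ...]}."""
    data = json.loads(text)
    if isinstance(data, dict):
        data = data["matrix"]
    matrix = np.asarray(data, dtype=float)
    if matrix.shape != (4, 4):
        raise ValueError(f"expected a 4x4 matrix, got shape {matrix.shape}")
    return matrix


def camera_point_to_base(t_base_camera: np.ndarray, point_camera_m) -> np.ndarray:
    """Map a 3D point (meters) from the camera frame to the base frame."""
    homogeneous = np.append(np.asarray(point_camera_m, dtype=float), 1.0)
    return (t_base_camera @ homogeneous)[:3]


# Illustrative values only: a pure translation of 0.5 m in X and 0.2 m in Z.
t = parse_transform('{"matrix": [[1,0,0,0.5],[0,1,0,0],[0,0,1,0.2],[0,0,0,1]]}')
point_base = camera_point_to_base(t, [0.1, 0.0, 1.0])
```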

The wrist camera defaults to an approximate gripper transform with offsets of -0.04 m along gripper Y and -0.09 m along gripper Z. Override it with:

--wrist-camera-gripper-transform calibration/t_gripper_wrist_camera.json

By default, --perception-backend yolo stores and uses checkpoints/yolo26n-seg.pt when --yolo-seg-model is omitted. If you pass a bare Ultralytics model name such as --yolo-seg-model yolo26n-seg.pt, it is also copied or downloaded under checkpoints/, which is the default directory for all YOLO checkpoints used by this workflow.

Segmentation models such as *-seg.pt are preferred because their masks produce cleaner local object point clouds, but detection-only models are also supported by estimating depth from the detected box region. The selected pixels are converted into a local RGB-D point cloud. The workflow reports the object center in objects[].center_global_camera_m and a point-cloud grasp reference in objects[].grasp_region_center_global_camera_m.
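The conversion from selected pixels to a local point cloud can be illustrated with standard pinhole back-projection. This is a sketch under stated assumptions, not the workflow's implementation: `mask_to_points` and `object_center` are hypothetical names, the intrinsics (fx, fy, cx, cy) are assumed to come from the aligned depth stream, and the median is just one simple choice of center estimate.

```python
import numpy as np


def mask_to_points(depth_m, mask, fx, fy, cx, cy):
    """Back-project masked pixels with valid depth into camera-frame XYZ (meters)."""
    depth_m = np.asarray(depth_m, dtype=float)
    valid = np.asarray(mask, dtype=bool) & (depth_m > 0)  # zero depth = no reading
    v, u = np.nonzero(valid)  # row (v) and column (u) pixel indices
    z = depth_m[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.column_stack([x, y, z])


def object_center(points):
    """Per-axis median, which tolerates a few depth outliers."""
    return np.median(points, axis=0)
```

For a detection-only model, the same function applies with `mask` set to the pixels inside the detected box.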

To limit scene_state to specific YOLO classes, pass comma-separated labels:

--yolo-target-labels block,bowl

If labels are omitted, all YOLO detections are included. You can tune thresholds with:

--yolo-conf 0.35 --yolo-iou 0.5

YOLO inference defaults to CPU because some Windows CUDA environments install a torchvision build without CUDA NMS support. If you see an error mentioning torchvision::nms and the CUDA backend, keep the default or pass:

--yolo-device cpu

After installing matching CUDA builds of torch and torchvision, you can opt back into GPU inference with --yolo-device 0 or --yolo-device cuda:0.

If you need optional base-frame object fields, create calibration/t_base_global_camera.json with calibrate_global_camera.py. See docs/global_camera_calibration.md for the full step-by-step procedure.

Capture Perception Snapshot

Use this to verify camera wiring and write a one-shot RGB-D perception snapshot without running planner/coder/judger or executing robot motion. The command connects to the robot by default to read the current TCP pose, so the compact object-position JSON contains both tcp.pose_base and all detected object positions in the global camera frame.

python evaluate_defense_agent_real.py \
  --task "capture test" \
  --capture-only \
  --camera-serials GLOBAL_SERIAL,WRIST_SERIAL \
  --yolo-seg-model checkpoints/robot_yolo26n_seg.pt

The command writes these files under logs/defense_agent_real/<timestamp>/:

  • current_global_rgb.png, current_wrist_rgb.png: raw RGB images.
  • current_global_yolo.png, current_wrist_yolo.png: YOLO detection overlays when --perception-backend yolo is active.
  • current_scene_state.json: full object position/grasp-point scene state.
  • current_object_positions.json: compact per-object position JSON.
  • *_depth_m.npy, *_depth_mm.png, *_depth_vis.png: depth sidecars; _depth_vis.png is a colorized depth preview.
  • *_point_cloud.ply: sampled RGB-D point cloud in camera-frame meters.
  • *_point_cloud_front.png, *_point_cloud_top.png: point-cloud visualizations.
  • summary.json: paths to the generated artifacts and the TCP pose used for gripper-relative coordinates.

Record Keypoints

Use record_keypoints.py to continuously save named TCP poses with the current wrist RGB image. Each entered name appends one record to keypoint_database.json and saves the image under record_image/:

python record_keypoints.py \
  --robot-ip 169.254.26.10 \
  --camera-serials GLOBAL_SERIAL,WRIST_SERIAL

For a one-shot record:

python record_keypoints.py \
  --robot-ip 169.254.26.10 \
  --camera-serials GLOBAL_SERIAL,WRIST_SERIAL \
  --name tube_1_grasp_point

During the full workflow, evaluate_defense_agent_real.py loads keypoint_database.json by default and passes the matching records to coder before each phase. The database is reference context only: coder should use the matched TCP pose and wrist image to understand where the grasp/place point is, then approach through move_x/move_y/move_z and rotate_* commands with distances appropriate to the remaining offset. Generated code must not call move_to_keypoint or any direct absolute-pose movement API.
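The idea of approaching a recorded keypoint through bounded axis-wise moves, rather than an absolute-pose command, can be sketched as below. `remaining_offset` and the 0.05 m step limit are hypothetical, and the sketch assumes that the axes of the move primitives align with the frame in which the TCP poses are stored; check the coder contracts for the actual conventions.

```python
def remaining_offset(current_tcp_xyz, keypoint_xyz, max_step_m=0.05):
    """Per-axis displacement toward the keypoint, clamped to a maximum step."""
    steps = {}
    for axis, current, target in zip("xyz", current_tcp_xyz, keypoint_xyz):
        delta = target - current
        # Clamp so that a single phase never commands a large jump.
        steps[axis] = max(-max_step_m, min(max_step_m, delta))
    return steps


# Illustrative positions in meters, not values from a real keypoint database.
steps = remaining_offset([0.10, 0.20, 0.30], [0.12, 0.50, 0.25])
```

Each entry in `steps` would then correspond to one axis-wise motion call, with the remaining distance re-evaluated after the next capture.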

Use a custom database path with:

--keypoint-database path/to/keypoint_database.json

Run The Full Workflow

Normal run with live initial capture:

python evaluate_defense_agent_real.py \
  --task "pick up the red block and place it into the bowl" \
  --robot-ip 169.254.26.10 \
  --camera-serials GLOBAL_SERIAL,WRIST_SERIAL \
  --max-attempts-per-atomic 3

Run from existing initial images while still using RealSense for post-execution feedback:

python evaluate_defense_agent_real.py \
  --task "pick up the red block and place it into the bowl" \
  --current-global-image logs/example/current_global_rgb.png \
  --current-wrist-image logs/example/current_wrist_rgb.png \
  --robot-ip 169.254.26.10 \
  --camera-serials GLOBAL_SERIAL,WRIST_SERIAL

When existing initial images are supplied, both paths must be provided. The orchestrator still opens the RealSense cameras so the judger receives fresh after-action images. global_image is the whole-scene view; wrist_image is the gripper view and may show part of the gripper along the bottom.

Robotiq Fallback

Unless --strict-gripper-connection is set, arm control can continue when the Robotiq socket is unavailable. If your setup requires URScript fallback definitions, pass:

--robotiq-urscript-defs-path /path/to/robotiq_defs.script

Use strict mode when gripper motion is required for the task:

--strict-gripper-connection

Outputs

Each run creates a timestamped directory under logs/defense_agent_real/ containing:

  • real_*_profile.yaml: resolved profile copies for the run
  • defense_ei_plan.json: planner output
  • defense_ei_atomic_task_info.json: supervisor output
  • atomic_XX/phase_NN_<slug>/attempt_YY/real_phase_actions.py: generated code for one phase only
  • atomic_XX/phase_NN_<slug>/attempt_YY/syntax_check.json: syntax and runtime API validation for that phase
  • atomic_XX/phase_NN_<slug>/attempt_YY/real_execution.json: robot execution report for that phase
  • atomic_XX/phase_NN_<slug>/attempt_YY/phase_start_robot_state.json: joint, TCP, and gripper snapshot saved immediately before executing that phase
  • atomic_XX/phase_NN_<slug>/attempt_YY/restore_disabled_after_judge_fail.json: record written when the judger marks the phase attempt as FAIL; the robot is left at its post-attempt pose and is not automatically restored
  • atomic_XX/phase_NN_<slug>/attempt_YY/phase_NN_<slug>_gaze_refine_XX_*: closed-loop wrist-gaze captures created by look_at_operated_object() during mandatory pick/place approach phases
  • atomic_XX/phase_NN_<slug>/attempt_YY/phase_NN_<slug>_global_rgb.png and phase_NN_<slug>_wrist_rgb.png: post-phase validation images, resized proportionally to 128 px wide
  • atomic_XX/phase_NN_<slug>/attempt_YY/phase_NN_<slug>_scene_state.json: post-phase RGB-D scene-state estimates for coder/judger context
  • atomic_XX/phase_NN_<slug>/attempt_YY/real_atomic_judge_phase_NN.json: per-phase judger result; the next phase is generated only after this phase is judged SUCCESS
  • atomic_XX/phase_NN_<slug>/attempt_YY/real_atomic_judge.json: final result for the current phase attempt
  • summary.json: final status summary
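Because a restore_disabled_after_judge_fail.json file is written for every attempt that the judger marks as FAIL, a run directory can be scanned for failed attempts after the fact. The helper below is a hypothetical post-processing sketch that relies only on the directory layout listed above.

```python
from pathlib import Path


def failed_attempts(run_dir):
    """Return relative attempt directories that contain a judge-fail record."""
    pattern = "atomic_*/phase_*/attempt_*/restore_disabled_after_judge_fail.json"
    return sorted(
        path.parent.relative_to(run_dir).as_posix()
        for path in Path(run_dir).glob(pattern)
    )
```

Calling `failed_attempts("logs/defense_agent_real/<timestamp>")` with a real timestamp directory would list entries of the form atomic_XX/phase_NN_<slug>/attempt_YY.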

Exit code is 0 when all atomic tasks complete and 2 when the workflow stops after failed attempts. Despite its name, --max-attempts-per-atomic limits retries for each phase within an atomic task.

Profiles And Contracts

Profiles live in:

  • planner/profile.yaml
  • supervisor/profile.yaml
  • coder/profile.yaml
  • judger/profile.yaml

The coder contracts are the most safety-critical:

  • coder/prompts/system.md
  • coder/skills/atomic-code-contract/SKILL.md
  • coder/skills/real-robot-code-contract/SKILL.md
  • coder/skills/primitive-skill-contract/SKILL.md
  • coder/python_tools/atomic_runtime_contract.py

If you add new runtime APIs, update all of these plus make_real_runtime_api() and the orchestrator's static runtime-call allowlist together.
