This project demonstrates how Large Language Models (LLMs) can serve as the "Intelligence Provider" for robotic systems, with that intelligence encapsulated as a Model Context Protocol (MCP) server.
The system is split into two independent MCP servers and a frontend:
- gil_controls/ (Python):
  - Role: The Body.
  - Manages robot state, inverse kinematics, and sensor data.
  - Runs a WebSocket server to communicate with the simulation frontend.
  - Exposes tools: move_arm, control_gripper, get_robot_state, get_latest_image.
- gil_models/ (Python):
  - Role: The Brain.
  - Manages heavy AI models (Cosmos, Gemini, LidarGen).
  - Processes visual data into semantic understanding.
  - Exposes tools: analyze_scene, generate_point_cloud.
- gil_frontend/ (React):
  - Role: The World.
  - Visualizes the 3D simulation (Three.js).
  - Connects to gil_controls via WebSocket to receive commands and send sensor data.
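One way to picture the Body/Brain split is as two registries of named tools that a client can list and call. The sketch below is a plain-Python stand-in and does not use the MCP SDK: the ToolServer class, its tool decorator, and every stub return value are hypothetical, and the tool signatures are assumptions. Only the server and tool names come from the list above.

```python
from typing import Any, Callable, Dict, List


class ToolServer:
    """Hypothetical stand-in for an MCP server: a named registry of callable tools."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._tools: Dict[str, Callable[..., Any]] = {}

    def tool(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        """Register fn under its own function name and return it unchanged."""
        self._tools[fn.__name__] = fn
        return fn

    def list_tools(self) -> List[str]:
        """Return the registered tool names in sorted order."""
        return sorted(self._tools)

    def call(self, tool_name: str, **kwargs: Any) -> Any:
        """Look up a tool by name and invoke it with keyword arguments."""
        if tool_name not in self._tools:
            raise KeyError(f"{self.name} has no tool named {tool_name!r}")
        return self._tools[tool_name](**kwargs)


controls = ToolServer("gil_controls")  # the Body
models = ToolServer("gil_models")      # the Brain


@controls.tool
def move_arm(x: float, y: float, z: float) -> dict:
    # Stub: a real implementation would solve inverse kinematics and
    # forward the command to the frontend over WebSocket.
    return {"status": "ok", "target": [x, y, z]}


@controls.tool
def get_robot_state() -> dict:
    # Stub state; the field names here are assumptions.
    return {"gripper": "open", "position": [0.0, 0.0, 0.0]}


@models.tool
def analyze_scene(image: bytes) -> dict:
    # Stub: a real implementation would run a vision model on the image.
    return {"objects": [], "image_bytes": len(image)}


result = controls.call("move_arm", x=0.1, y=0.2, z=0.3)
```

Keeping the two registries separate mirrors the design above: the control server never needs to load a model, and the model server never needs to know about the WebSocket link to the frontend.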
An Autonomous Agent (e.g., running in Claude Desktop or a script) orchestrates the system:
- Agent calls gil_controls.get_latest_image().
- Agent passes image to gil_models.analyze_scene(image).
- Agent decides on action based on analysis.
- Agent calls gil_controls.move_arm(x, y, z).
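The loop above can be sketched as a single perceive, analyze, decide, act step. This is a minimal illustration under stated assumptions, not the project's agent: the Controls and Models protocols, the choose_target policy, the shape of the analysis dictionary, and the fake clients are all hypothetical. In the real system the two clients would be MCP sessions, and the decision step would typically be made by the LLM itself.

```python
from typing import Optional, Protocol, Tuple


class Controls(Protocol):
    """Assumed client interface for the gil_controls server."""

    def get_latest_image(self) -> bytes: ...
    def move_arm(self, x: float, y: float, z: float) -> dict: ...


class Models(Protocol):
    """Assumed client interface for the gil_models server."""

    def analyze_scene(self, image: bytes) -> dict: ...


def choose_target(analysis: dict) -> Optional[Tuple[float, float, float]]:
    """Hypothetical policy: head for the first detected object, if any."""
    objects = analysis.get("objects", [])
    if not objects:
        return None
    x, y, z = objects[0]["position"]
    return (x, y, z)


def run_step(controls: Controls, models: Models) -> Optional[dict]:
    """Run one perceive, analyze, decide, act cycle."""
    image = controls.get_latest_image()      # 1. perceive
    analysis = models.analyze_scene(image)   # 2. analyze
    target = choose_target(analysis)         # 3. decide
    if target is None:
        return None                          # nothing to act on
    return controls.move_arm(*target)        # 4. act


class FakeControls:
    """In-memory stand-in that records arm commands instead of sending them."""

    def __init__(self) -> None:
        self.moves = []

    def get_latest_image(self) -> bytes:
        return b"fake-image"

    def move_arm(self, x: float, y: float, z: float) -> dict:
        self.moves.append((x, y, z))
        return {"status": "ok"}


class FakeModels:
    """In-memory stand-in that always reports one object at a fixed position."""

    def analyze_scene(self, image: bytes) -> dict:
        return {"objects": [{"label": "cube", "position": [0.2, 0.0, 0.1]}]}
```

Calling run_step(FakeControls(), FakeModels()) exercises the whole cycle without a simulator, which makes the orchestration logic easy to test separately from either server.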
cd gil_controls
pip install -r requirements.txt
python src/main.py

cd gil_models
pip install -r requirements.txt
export GOOGLE_API_KEY=your_key
python src/main.py

cd gil_frontend
npm install
npm run dev