Real-time face-tracking robot built on ESP32-CAM, ESP32-S3 and OpenCV.
IA-Cam is a distributed embedded system that performs real-time face detection and pan/tilt tracking using off-the-shelf ESP32 hardware. An ESP32-CAM module streams MJPEG video over Wi-Fi to a Python service running OpenCV Haar cascades; detected face coordinates are converted into directional commands and dispatched over HTTP to a second ESP32-S3 board, which actuates two servomotors via the ESP32Servo library. The system is deliberately decoupled into three independent nodes (capture, inference, actuation) to keep the inference loop off the camera MCU and allow each component to be iterated independently.
- MJPEG streaming from the ESP32-CAM over HTTP on port 81.
- Server-side face detection using OpenCV's `haarcascade_frontalface_default` classifier.
- Center-of-frame tracking with configurable dead zone and inter-command rate limiting.
- Pan/tilt actuation through a dedicated ESP32-S3 motor controller with bounded servo angles (0–180°).
- Automatic 180° frame rotation to compensate for inverted camera mounting.
- Static IP configuration on the motor node to bypass DHCP lookup at boot.
- Web preview UI exposed by the inference server on port 5000 for live monitoring.
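
The dead-zone behaviour listed above can be sketched as a pure function that maps a detected face box to direction names. This is a minimal illustration, not the code in `camera serv.py`: the direction strings (`left`, `right`, `up`, `down`) are hypothetical placeholders, the dead zone is applied per axis rather than as a true radius, and an inverted or mirrored mount may require swapping signs.

```python
from typing import List, Tuple

DEAD_ZONE = 60  # pixels; default quoted in this README


def directions_for_face(
    face: Tuple[int, int, int, int],
    frame_size: Tuple[int, int],
    dead_zone: int = DEAD_ZONE,
) -> List[str]:
    """Map a face box (x, y, w, h) to zero, one or two direction names.

    The direction strings are hypothetical placeholders, not necessarily
    the values accepted by the motor firmware.
    """
    x, y, w, h = face
    frame_w, frame_h = frame_size
    # Offset of the face center from the frame center, in pixels.
    dx = (x + w // 2) - frame_w // 2
    dy = (y + h // 2) - frame_h // 2

    commands = []
    if dx > dead_zone:
        commands.append("right")
    elif dx < -dead_zone:
        commands.append("left")
    if dy > dead_zone:
        commands.append("down")  # image y grows downward
    elif dy < -dead_zone:
        commands.append("up")
    return commands
```

A face centered in a 640×480 frame yields an empty list, so no command is sent; a face near the top-left corner yields both a horizontal and a vertical command.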
| Layer | Technology |
|---|---|
| Camera firmware | Arduino-ESP32 + esp_camera |
| Motor firmware | Arduino-ESP32 + ESP32Servo |
| HTTP (embedded) | esp_http_server / WebServer |
| Vision runtime | Python 3.x + OpenCV (opencv-python) |
| Inference server | Flask |
| HTTP client | Requests |
| Numerical backend | NumPy |
| Target boards | AI-Thinker ESP32-CAM, ESP32-S3 N16R8 |
| Build toolchain | Arduino IDE / arduino-cli |
cam-IA/
├── CameraWebServer/ # ESP32-CAM firmware (capture + MJPEG)
│ ├── CameraWebServer.ino # Setup, Wi-Fi bring-up, camera init
│ ├── app_httpd.cpp # HTTP server, /stream and /move handlers
│ ├── board_config.h # Board-specific defines
│ ├── camera_pins.h # Pin map for AI-Thinker model
│ ├── camera_index.h # Embedded UI assets
│ ├── partitions.csv # Custom flash partition table
│ └── ci.yml # Arduino CI matrix (esp32 / s2 / s3)
├── esp-mot/ # ESP32-S3 firmware (servo controller)
│ └── esp-mot.ino # Static IP, /move?dir=... endpoint
├── camera serv.py # Python inference + tracking server
└── .gitignore
Request flow:
┌────────────────┐ MJPEG (HTTP :81/stream) ┌──────────────────────┐
│ ESP32-CAM │ ──────────────────────────▶ │ Python / OpenCV │
│ (AI-Thinker) │ │ Flask :5000 │
└────────────────┘ │ Haar face detection │
└──────────┬───────────┘
│ HTTP GET
│ /move?dir=...
▼
┌──────────────────────┐
│ ESP32-S3 :80 │
│ Pan/Tilt servos │
└──────────────────────┘
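
For the first hop in the diagram, one common way to recover individual frames from an MJPEG byte stream is to scan for the JPEG start (`FF D8`) and end (`FF D9`) markers. The sketch below assumes that approach; this README does not state whether `camera serv.py` parses the stream this way or delegates to OpenCV, and frames carrying an embedded thumbnail could confuse a marker scan.

```python
from typing import List, Tuple

JPEG_START = b"\xff\xd8"  # SOI marker
JPEG_END = b"\xff\xd9"    # EOI marker


def extract_jpeg_frames(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split complete JPEG frames out of a byte buffer.

    Returns (frames, remainder). The remainder holds any incomplete
    trailing frame and should be prepended to the next chunk read.
    """
    frames = []
    while True:
        start = buffer.find(JPEG_START)
        if start == -1:
            # No frame start yet; keep the last byte in case a marker
            # is split across two chunks.
            return frames, buffer[-1:]
        end = buffer.find(JPEG_END, start + 2)
        if end == -1:
            # Frame started but not finished; wait for more data.
            return frames, buffer[start:]
        frames.append(buffer[start:end + 2])
        buffer = buffer[end + 2:]
```

In a real loop the chunks would come from the camera's `:81/stream` endpoint, for example via `requests.get(url, stream=True).iter_content(...)`, and each returned frame would be decoded before detection.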
| Requirement | Minimum version |
|---|---|
| Arduino IDE | 2.0 (with esp32 board package ≥ 3.0) |
| Python | 3.9 |
| pip | 22.0 |
| Hardware | 1× ESP32-CAM (AI-Thinker), 1× ESP32-S3, 2× SG90/MG90 servos, stable 5 V supply |
git clone https://github.com/Mastr00/IA-Cam.git
cd IA-Cam
pip install flask opencv-python numpy requests

Flash each firmware from the Arduino IDE:
- `CameraWebServer/CameraWebServer.ino` → board: AI Thinker ESP32-CAM.
- `esp-mot/esp-mot.ino` → board: ESP32-S3 Dev Module.
The project does not yet ship a .env.example; the following values are inline constants in the source files and must be edited before flashing or running.
| Variable | Scope | Description |
|---|---|---|
| `ssid` | `CameraWebServer.ino`, `esp-mot.ino` | Wi-Fi SSID used by both ESP32 boards. |
| `password` | `CameraWebServer.ino`, `esp-mot.ino` | Wi-Fi password. |
| `local_IP` | `esp-mot.ino` | Static IPv4 assigned to the motor node (default `192.168.1.200`). |
| `gateway` | `esp-mot.ino` | LAN gateway used for static IP config. |
| `IP_CAMERA` | `camera serv.py` | Base URL of the ESP32-CAM (e.g. `http://192.168.1.201`). |
| `IP_MOTEURS` | `camera serv.py` | Base URL of the motor controller (e.g. `http://192.168.1.200`). |
| `MOVE_DELAY` | `camera serv.py` | Minimum delay in seconds between successive motor commands (default `0.1`). |
| `DEAD_ZONE` | `camera serv.py` | Pixel radius around the frame center where no movement is issued (default `60`). |
| `SERVO_PAN_PIN` / `SERVO_TILT_PIN` | `app_httpd.cpp` | GPIO pins for the servo lines on the camera board (default 12, 13). |
| `PIN_PAN` / `PIN_TILT` | `esp-mot.ino` | GPIO pins for the servo lines on the motor board (default 5, 4). |
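
Because the Python-side values are inline constants, one possible way to make them overridable without editing the script is to read them from the environment with fallbacks. The sketch below is an assumption, not current behaviour: `camera serv.py` does not read environment variables today, the variable names simply reuse the constant names, and the camera URL fallback is the example address from the table rather than a documented default.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TrackerConfig:
    ip_camera: str
    ip_moteurs: str
    move_delay: float
    dead_zone: int


def load_config(env: Mapping[str, str] = os.environ) -> TrackerConfig:
    """Build tracker settings from environment variables (hypothetical names).

    Fallback values mirror the defaults and examples quoted in the table.
    """
    return TrackerConfig(
        ip_camera=env.get("IP_CAMERA", "http://192.168.1.201"),
        ip_moteurs=env.get("IP_MOTEURS", "http://192.168.1.200"),
        move_delay=float(env.get("MOVE_DELAY", "0.1")),
        dead_zone=int(env.get("DEAD_ZONE", "60")),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise without touching the real environment.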
This repository has no package.json. The relevant commands are:
| Command | Purpose |
|---|---|
| `pip install flask opencv-python numpy requests` | Install Python runtime dependencies. |
| `python "camera serv.py"` | Start the Flask inference and tracking server on `0.0.0.0:5000`. |
| `arduino-cli compile --fqbn esp32:esp32:esp32 CameraWebServer` | Build the camera firmware from the CLI. |
| `arduino-cli compile --fqbn esp32:esp32:esp32s3 esp-mot` | Build the motor controller firmware from the CLI. |
| `arduino-cli upload -p <PORT> --fqbn <FQBN> <sketch>` | Flash a sketch to a connected board. |
There is no automated test suite. Functional validation is performed end-to-end:
python "camera serv.py"
# then open http://localhost:5000 and confirm faces are boxed in green
# and that the servos move only once a face leaves the configured dead zone.

The camera firmware is built against the upstream Arduino-ESP32 matrix declared in `CameraWebServer/ci.yml` (esp32, esp32s2, esp32s3, each with PSRAM enabled and disabled).
The system is deployed on local hardware over a private LAN. There is no cloud target.
- Power both ESP32 boards from a clean 5 V supply (the AI-Thinker module is sensitive to brown-outs; a shared USB hub is not recommended).
- Flash `CameraWebServer.ino` and capture the IP printed on the serial monitor at 115200 baud.
- Flash `esp-mot.ino`. The board self-assigns `192.168.1.200` by default; adjust `local_IP` / `gateway` to match your subnet.
- Update `IP_CAMERA` and `IP_MOTEURS` in `camera serv.py` with the two addresses.
- Run `python "camera serv.py"` on a host on the same LAN.
- Open `http://<host>:5000` to view the annotated stream and validate tracking.
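
Before starting the server, it can help to confirm that both boards accept TCP connections on the ports named in this README (81 for the camera stream, 80 for the motor node). The helper below is a hypothetical pre-flight check, not part of the repository; it only opens and closes a socket, so it never triggers a servo move.

```python
import socket
from typing import Callable
from urllib.parse import urlsplit


def host_of(base_url: str) -> str:
    """Extract the host from a base URL such as 'http://192.168.1.200'."""
    host = urlsplit(base_url).hostname
    if host is None:
        raise ValueError(f"no host in {base_url!r}")
    return host


def is_reachable(host: str, port: int, timeout: float = 2.0,
                 connect: Callable = socket.create_connection) -> bool:
    """Return True if a TCP connection to host:port succeeds in time.

    `connect` is injectable so the check can be exercised without hardware.
    """
    try:
        conn = connect((host, port), timeout)
    except OSError:
        return False
    conn.close()
    return True
```

For example, `is_reachable(host_of(IP_CAMERA), 81)` and `is_reachable(host_of(IP_MOTEURS), 80)` should both return `True` once the two boards have joined the LAN.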
- Wi-Fi credentials and device IPs are currently hard-coded in the firmware sources; rotate them before publishing builds and consider provisioning via NVS or a secrets header excluded from VCS (the existing `.gitignore` already excludes `*.env`, `.env.local`, and `config.local.*`).
- All HTTP endpoints (`/stream`, `/move`, `/video_feed`) are unauthenticated and bind to `0.0.0.0`. Keep the boards on an isolated LAN or VLAN; do not expose them to the public internet without an upstream reverse proxy and authentication layer.
- The motor `/move` handler validates and clamps incoming `pan`/`tilt` query parameters to `[0, 180]` via `constrain()`, preventing out-of-range writes from malformed requests.
- The Python tracking loop applies a `MOVE_DELAY` rate limit, which acts as a basic guard against runaway command bursts toward the motor node.
- Flask is started with `debug=False`; do not enable it in shared environments, because the Werkzeug debugger allows arbitrary code execution.
- Build artifacts (`*.elf`, `*.bin`) and IDE state are excluded from VCS via `.gitignore`.
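
The two guards mentioned above, angle clamping and command rate limiting, can be illustrated in Python. This is a sketch under stated assumptions: `clamp_angle` mirrors what Arduino's `constrain()` does on the firmware side, and `CommandRateLimiter` is a hypothetical class showing one way a `MOVE_DELAY` check could work, not the code in `camera serv.py`.

```python
import time
from typing import Callable, Optional

MOVE_DELAY = 0.1  # seconds; default quoted in this README


def clamp_angle(angle: int, low: int = 0, high: int = 180) -> int:
    """Python equivalent of Arduino's constrain(angle, low, high)."""
    return max(low, min(high, angle))


class CommandRateLimiter:
    """Allow at most one command per `min_interval` seconds."""

    def __init__(self, min_interval: float = MOVE_DELAY,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._last_sent: Optional[float] = None  # no command sent yet

    def allow(self) -> bool:
        """Return True and record the time if a command may be sent now."""
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._min_interval:
            return False
        self._last_sent = now
        return True
```

Using `time.monotonic` rather than wall-clock time keeps the interval check unaffected by system clock adjustments.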
Released under the MIT License.
Maintainer: Mehdi Mamdouh (@Mastr00)

Repository: https://github.com/Mastr00/IA-Cam