A hands-on Jupyter notebook course taking you from basic Python to applying 6D pose estimation in real robotics applications.
By the end of this course you will be able to:
- Calibrate a real camera and understand its intrinsic and extrinsic parameters
- Estimate the 6D pose (3D position + 3D orientation) of objects using classical and deep learning methods
- Use ArUco markers to guide a mobile robot to a docking station
- Detect pallets from a CAD model and compute alignment offsets for a forklift
- Run pose estimation in real time from a webcam feed
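All of these outcomes rest on the same piece of geometry: a 6D pose is a rotation R plus a translation t, and projecting a 3D point into the image follows the pinhole model x = K[R|t]X covered in Part 3. A minimal NumPy sketch (the intrinsics below are made-up values for a 640x480 camera, not from any notebook):

```python
import numpy as np

# Hypothetical intrinsics: fx = fy = 500 px, principal point at image center.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

R = np.eye(3)                      # object orientation: no rotation
t = np.array([0.0, 0.0, 2.0])      # object position: 2 m in front of the camera

X = np.array([0.1, 0.0, 0.0])      # a point 10 cm right of the object origin

x_cam = R @ X + t                  # transform into the camera frame
u, v, w = K @ x_cam                # homogeneous pixel coordinates
print(u / w, v / w)                # -> 345.0 240.0
```

Everything from solvePnP to the deep-learning methods in Part 7 is ultimately estimating the R and t in this equation.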
| Topic | Required level |
|---|---|
| Python | Intermediate — OOP basics, dictionaries, loops, list comprehensions |
| Linear algebra | Basic — matrix multiplication, determinants, eigenvalues |
| NumPy | Not required — covered in Part 1 |
| OpenCV | Not required — covered in Parts 2–5 |
| Deep learning | Not required — covered in Part 7 |
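If you have never touched NumPy, don't worry: Part 1 builds it up from scratch. As a taste of the broadcasting covered there (the gain values here are arbitrary, just for illustration):

```python
import numpy as np

# Broadcasting in one line: a per-channel gain of shape (3,) multiplies a
# whole H x W x 3 image without an explicit loop.
img = np.ones((4, 4, 3), dtype=np.float32)   # tiny dummy RGB image
gain = np.array([1.0, 0.5, 2.0])             # per-channel gain
scaled = img * gain                          # (4, 4, 3) * (3,) broadcasts
print(scaled.shape, scaled[0, 0])
```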
Part 0 — Getting Started (notebooks 00–01)
Part 1 — Tools: Jupyter + NumPy (notebooks 02–03)
Part 2 — OpenCV Foundations (notebooks 04–05)
Part 3 — Camera Model (notebooks 06–08)
Part 4 — Classical Pose (solvePnP) (notebooks 09–10)
Part 5 — ArUco Markers (notebooks 11–15)
Part 6 — Stereo Vision (notebooks 16–17)
Part 7 — Deep Learning 6D Pose (notebooks 18–22)
Part 8 — Robotics Projects (notebooks 23–25)
| # | File | Topic | Time |
|---|---|---|---|
| 00 | part_0_getting_started/00_welcome_and_roadmap.ipynb | What is 6D pose, course map, real-world problems | 20 min |
| 01 | part_0_getting_started/01_environment_setup.ipynb | venv, conda, pip, CUDA, VSCode, Colab, Docker intro | 30 min |
| 02 | part_1_tools/02_jupyter_notebooks_101.ipynb | Cells, kernel, shortcuts, magic commands | 25 min |
| 03 | part_1_tools/03_numpy_for_cv.ipynb | Arrays, shapes, dtypes, broadcasting, matplotlib | 45 min |
| 04 | part_2_opencv_foundations/04_intro_to_opencv.ipynb | Install, BGR/RGB, imread, VideoCapture, drawing | 40 min |
| 05 | part_2_opencv_foundations/05_image_operations.ipynb | Resize, color spaces, filters, edge detection | 50 min |
| 06 | part_3_camera_model/06_camera_model_theory.ipynb | Coordinate frames, pinhole, intrinsics K, x=KPX | 60 min |
| 07 | part_3_camera_model/07_camera_calibration.ipynb | Chessboard, findChessboardCorners, calibrateCamera | 50 min |
| 08 | part_3_camera_model/08_distortion_undistortion.ipynb | Radial/tangential distortion, undistort, remap | 40 min |
| 09 | part_4_classical_pose/09_solvePnP_explained.ipynb | 2D-3D correspondences, Levenberg-Marquardt | 55 min |
| 10 | part_4_classical_pose/10_pose_with_chessboard.ipynb | Full demo: calibration → solvePnP → 3D cube | 60 min |
| 11 | part_5_aruco/11_aruco_theory.ipynb | Binary grids, Hamming distance, dictionaries | 40 min |
| 12 | part_5_aruco/12_generate_aruco.ipynb | drawMarker, save PNG, printing tips | 20 min |
| 13 | part_5_aruco/13_detect_aruco.ipynb | detectMarkers, corners, IDs, webcam | 35 min |
| 14 | part_5_aruco/14_aruco_pose_estimation.ipynb | estimatePoseSingleMarkers, rvec/tvec, drawAxes | 45 min |
| 15 | part_5_aruco/15_aruco_robotics_app.ipynb | Full: ArUco at station → alignment offset | 60 min |
| 16 | part_6_stereo_vision/16_stereo_theory.ipynb | Epipolar geometry, disparity, depth, Q matrix | 55 min |
| 17 | part_6_stereo_vision/17_stereo_calibration.ipynb | stereoCalibrate, rectify, remap | 50 min |
| 18 | part_7_deep_learning_6d_pose/18_intro_dl_for_cv.ipynb | Neural networks, inference, pretrained models | 40 min |
| 19 | part_7_deep_learning_6d_pose/19_mediapipe_objectron.ipynb | Objectron, 3D bounding boxes, webcam — `mediapipe==0.9.3` | 45 min |
| 20 | part_7_deep_learning_6d_pose/20_efficientpose.ipynb | EfficientNet backbone, rotation/translation heads | 60 min |
| 21 | part_7_deep_learning_6d_pose/21_foundationpose_freeze.ipynb | Foundation models, RGB-D, zero-shot | 55 min |
| 22 | part_7_deep_learning_6d_pose/22_megapose_visp.ipynb | CAD prep, ViSP integration | 60 min |
| 23 | part_8_robotics_projects/23_station_alignment_aruco.ipynb | Full project: ArUco → docking offset | 75 min |
| 24 | part_8_robotics_projects/24_pallet_detection_pose.ipynb | CAD + pose → fork/clamp alignment | 75 min |
| 25 | part_8_robotics_projects/25_capstone_template.ipynb | Student's own robotics application | — |
- INDEX.ipynb — clickable table of contents for all 25 notebooks (start here if you're browsing in JupyterLab)
- VIDEO_COMPANION.md — which YouTube videos to watch before each notebook, with direct links
- Open any notebook on GitHub and click "Open in Colab"
- Each notebook detects Colab automatically and installs its own dependencies
- For GPU-heavy notebooks (Part 7+): Runtime → Change runtime type → T4 GPU
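The Colab auto-detection mentioned above boils down to a short setup cell at the top of each notebook. A minimal sketch of the pattern (the exact package list varies per notebook; `opencv-contrib-python` here is just an example):

```python
import sys

# Colab injects the google.colab package into the running interpreter,
# so its presence in sys.modules tells us where we are.
IN_COLAB = "google.colab" in sys.modules

if IN_COLAB:
    # Inside Colab, a setup cell would install dependencies, e.g.:
    # !pip install -q opencv-contrib-python
    pass

print("Running in Colab:", IN_COLAB)
```

On a local machine this prints `Running in Colab: False` and skips the installs, so the same notebook runs unchanged in both environments.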
```bash
# Clone or download this repository
cd course

# Create and activate virtual environment
python -m venv venv
# Windows:
venv\Scripts\activate
# Linux/macOS:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Launch Jupyter
jupyter lab
```

Most of the course runs fine with Options A or B. The one exception is Notebook 20 (EfficientPose), which requires TensorFlow 1.14 — an old version that conflicts with modern Python environments. Docker is the cleanest solution.
New to Docker? NB 01 introduces it briefly, and NB 20 has a full plain-English explainer with step-by-step install instructions. You don't need to set it up until you reach NB 20.
```bash
# Install Docker Desktop first: https://www.docker.com/products/docker-desktop
# Then pull the TF1 GPU image and run EfficientPose inside it:
docker pull tensorflow/tensorflow:1.14.0-gpu-py3
docker run --gpus all -it \
    -v $(pwd):/workspace \
    tensorflow/tensorflow:1.14.0-gpu-py3 \
    /bin/bash
```

For FoundationPose (NB 21), Docker is optional — a conda environment works too. See NB 21 for details.
Each Part 7 notebook uses a pretrained model. Here's exactly what you need to do (or not do) for each:
| Notebook | Model | Action required |
|---|---|---|
| NB 19 — MediaPipe Objectron | MediaPipe models | Nothing — downloaded automatically on first run after `pip install mediapipe` |
| NB 20 — EfficientPose | LineMOD weights (.h5) | Manual download from EfficientPose GitHub Releases — see NB 20 Section 3 for step-by-step instructions |
| NB 21 — FoundationPose | FoundationPose weights | Nothing — included inside the Docker image (`docker pull nvcr.io/nvidia/foundationpose:latest`) |
| NB 21 — FreeZe | DINO foundation weights | Nothing — auto-downloaded on first run |
| NB 22 — MegaPose + ViSP | MegaPose weights | One command — `python -m happypose.toolbox.utils.download --megapose-models` (run after `pip install`) |
Summary for beginners: You only need to manually download something for NB 20 (EfficientPose). Everything else is handled automatically. NB 20 Section 3 walks you through it step by step.
```
assets/
  images/         — sample images used across notebooks
  calibration/    — saved calibration data (.npz, .json)
  models/         — CAD models (.obj/.mtl)
  aruco_markers/  — pre-generated marker PNGs (4x4, 5x5, 6x6)
```
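The saved calibration data is plain NumPy archives, so it can be written and read back with `np.savez`/`np.load`. A hedged sketch (the key names `K` and `dist` are illustrative, not necessarily the exact keys the notebooks use):

```python
import os
import tempfile

import numpy as np

# Made-up calibration results for a 640x480 camera.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])   # camera intrinsics
dist = np.zeros(5)                       # k1, k2, p1, p2, k3 distortion coefficients

path = os.path.join(tempfile.gettempdir(), "calib_demo.npz")
np.savez(path, K=K, dist=dist)           # save both arrays into one .npz

data = np.load(path)                     # load them back by key
print(data["K"].shape, data["dist"].shape)   # (3, 3) (5,)
```

Calibrate once (NB 07), save the result, and every later notebook or script can reload it instead of recalibrating.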
The scripts/ folder contains ready-to-run Python scripts extracted from the notebooks. Use these with a real camera — no Jupyter needed.
```
scripts/
  calibration/  capture_calibration_images.py  — webcam capture for calibration images (NB07)
                stereo_camera_calibration.py   — full stereo calibration pipeline (NB17)
  aruco/        generate_aruco_markers.py      — batch-generate printable ArUco PNGs (NB12)
                detect_aruco_live.py           — real-time ArUco detection (NB13)
                aruco_pose_estimation_live.py  — real-time ArUco 6D pose + axes overlay (NB14)
  pose/         undistort_live_video.py        — real-time lens undistortion via remap (NB08)
                chessboard_pose_estimation.py  — chessboard → solvePnP → 3D cube overlay (NB10)
  stereo/       stereo_depth_live.py           — real-time stereo depth from two cameras (NB17)
  robotics/     robot_station_docking.py       — ArUco docking state machine + P-controller (NB15)
                forklift_pallet_alignment.py   — multi-marker pallet pose + 4-axis fork control (NB24)
```
Each script has a full usage guide in its docstring — run `python scripts/<subfolder>/<name>.py --help`.
For step-by-step instructions on running each script (environment setup, commands, controls, workflow) see SCRIPTS_COMPANION.md.
This course is built on 36 curated video notes covering real implementations. Each notebook has recommended videos to watch first — see VIDEO_COMPANION.md for the full list with links.
Topics covered across the videos:
- ArUco marker pose estimation workflows
- Camera calibration (chessboard, <5 min method)
- OpenCV GPU installation with CUDA
- solvePnP and 6D pose pipelines
- MediaPipe Objectron, EfficientPose, FoundationPose, FreeZe, MegaPose
- Stereo vision calibration
- Full robotics station docking demos
Questions? Issues? Open a GitHub issue or reach out to your instructor.