6D Pose Vision Workshop — Complete Course

A hands-on Jupyter notebook course taking you from basic Python to applying 6D pose estimation in real robotics applications.

What You Will Build

By the end of this course you will be able to:

Calibrate a real camera and understand its intrinsic and extrinsic parameters
Estimate the 6D pose (3D position + 3D orientation) of objects using classical and deep learning methods
Use ArUco markers to guide a mobile robot to a docking station
Detect pallets from a CAD model and compute alignment offsets for a forklift
Run pose estimation in real time from a webcam feed
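
To make "6D pose" concrete: the three orientation values and three position values are commonly packed into a single 4x4 homogeneous transform. The sketch below assumes the axis-angle (`rvec`) plus translation (`tvec`) convention that OpenCV uses; the helper names `rodrigues` and `pose_to_matrix` are illustrative and are not taken from the course notebooks.

```python
import numpy as np


def rodrigues(rvec):
    """Convert an axis-angle vector (3,) to a 3x3 rotation matrix."""
    rvec = np.asarray(rvec, dtype=float)
    theta = np.linalg.norm(rvec)  # rotation angle in radians
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta  # unit rotation axis
    # Skew-symmetric cross-product matrix of the axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def pose_to_matrix(rvec, tvec):
    """Pack rotation (3 DoF) and translation (3 DoF) into a 4x4 transform."""
    T = np.eye(4)
    T[:3, :3] = rodrigues(rvec)
    T[:3, 3] = np.asarray(tvec, dtype=float)
    return T


# Object rotated 90 degrees about the camera Z axis, 0.5 m in front of the camera
T_cam_obj = pose_to_matrix([0.0, 0.0, np.pi / 2], [0.0, 0.0, 0.5])

# A point 10 cm along the object's X axis, expressed in homogeneous coordinates
point_obj = np.array([0.1, 0.0, 0.0, 1.0])
point_cam = T_cam_obj @ point_obj  # the same point in the camera frame
```

Multiplying by `T_cam_obj` moves a point from the object frame into the camera frame, which is the operation the robotics projects rely on when they turn a detected pose into an alignment offset.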

Prerequisites

Topic	Required level
Python	Intermediate — OOP basics, dictionaries, loops, list comprehensions
Linear algebra	Basic — matrix multiplication, determinants, eigenvalues
NumPy	Not required — covered in Part 1
OpenCV	Not required — covered in Parts 2–5
Deep learning	Not required — covered in Part 7

Course Structure

Part 0 — Getting Started          (notebooks 00–01)
Part 1 — Tools: Jupyter + NumPy   (notebooks 02–03)
Part 2 — OpenCV Foundations       (notebooks 04–05)
Part 3 — Camera Model             (notebooks 06–08)
Part 4 — Classical Pose (solvePnP)(notebooks 09–10)
Part 5 — ArUco Markers            (notebooks 11–15)
Part 6 — Stereo Vision            (notebooks 16–17)
Part 7 — Deep Learning 6D Pose    (notebooks 18–22)
Part 8 — Robotics Projects        (notebooks 23–25)

Notebook Index

#	File	Topic	Time
00	`part_0_getting_started/00_welcome_and_roadmap.ipynb`	What is 6D pose, course map, real-world problems	20 min
01	`part_0_getting_started/01_environment_setup.ipynb`	venv, conda, pip, CUDA, VSCode, Colab, Docker intro	30 min
02	`part_1_tools/02_jupyter_notebooks_101.ipynb`	Cells, kernel, shortcuts, magic commands	25 min
03	`part_1_tools/03_numpy_for_cv.ipynb`	Arrays, shapes, dtypes, broadcasting, matplotlib	45 min
04	`part_2_opencv_foundations/04_intro_to_opencv.ipynb`	Install, BGR/RGB, imread, VideoCapture, drawing	40 min
05	`part_2_opencv_foundations/05_image_operations.ipynb`	Resize, color spaces, filters, edge detection	50 min
06	`part_3_camera_model/06_camera_model_theory.ipynb`	Coordinate frames, pinhole, intrinsics K, x=KPX	60 min
07	`part_3_camera_model/07_camera_calibration.ipynb`	Chessboard, findChessboardCorners, calibrateCamera	50 min
08	`part_3_camera_model/08_distortion_undistortion.ipynb`	Radial/tangential distortion, undistort, remap	40 min
09	`part_4_classical_pose/09_solvePnP_explained.ipynb`	2D-3D correspondences, Levenberg-Marquardt	55 min
10	`part_4_classical_pose/10_pose_with_chessboard.ipynb`	Full demo: calibration → solvePnP → 3D cube	60 min
11	`part_5_aruco/11_aruco_theory.ipynb`	Binary grids, Hamming distance, dictionaries	40 min
12	`part_5_aruco/12_generate_aruco.ipynb`	drawMarker, save PNG, printing tips	20 min
13	`part_5_aruco/13_detect_aruco.ipynb`	detectMarkers, corners, IDs, webcam	35 min
14	`part_5_aruco/14_aruco_pose_estimation.ipynb`	estimatePoseSingleMarkers, rvec/tvec, drawAxes	45 min
15	`part_5_aruco/15_aruco_robotics_app.ipynb`	Full: ArUco at station → alignment offset	60 min
16	`part_6_stereo_vision/16_stereo_theory.ipynb`	Epipolar geometry, disparity, depth, Q matrix	55 min
17	`part_6_stereo_vision/17_stereo_calibration.ipynb`	stereoCalibrate, rectify, remap	50 min
18	`part_7_deep_learning_6d_pose/18_intro_dl_for_cv.ipynb`	Neural networks, inference, pretrained models	40 min
19	`part_7_deep_learning_6d_pose/19_mediapipe_objectron.ipynb`	Objectron, 3D bounding boxes, webcam — ⚠️ deprecated API, requires `mediapipe==0.9.3`	45 min
20	`part_7_deep_learning_6d_pose/20_efficientpose.ipynb`	EfficientNet backbone, rotation/translation heads	60 min
21	`part_7_deep_learning_6d_pose/21_foundationpose_freeze.ipynb`	Foundation models, RGB-D, zero-shot	55 min
22	`part_7_deep_learning_6d_pose/22_megapose_visp.ipynb`	CAD prep, ViSP integration	60 min
23	`part_8_robotics_projects/23_station_alignment_aruco.ipynb`	Full project: ArUco → docking offset	75 min
24	`part_8_robotics_projects/24_pallet_detection_pose.ipynb`	CAD + pose → fork/clamp alignment	75 min
25	`part_8_robotics_projects/25_capstone_template.ipynb`	Student's own robotics application	—
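
As a preview of the projection `x=KPX` listed for notebook 06, here is a minimal NumPy sketch of the pinhole model without lens distortion. It assumes `P` denotes the extrinsic matrix `[R|t]`; the intrinsic values and the function name `project_points` are made up for illustration.

```python
import numpy as np


def project_points(points_world, K, R, t):
    """Project Nx3 world points to Nx2 pixels with x ~ K [R|t] X (no distortion)."""
    points_world = np.asarray(points_world, dtype=float)
    points_cam = points_world @ R.T + t   # world frame -> camera frame
    uvw = points_cam @ K.T                # camera frame -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]       # perspective divide by depth


# Hypothetical intrinsics: 800 px focal length, principal point at (320, 240)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                  # camera axes aligned with the world axes
t = np.array([0.0, 0.0, 2.0])  # world origin sits 2 m in front of the camera

pixels = project_points([[0.0, 0.0, 0.0], [0.1, -0.05, 0.0]], K, R, t)
```

The world origin lands on the principal point, and the second point shifts by focal length times offset divided by depth, which is the relationship that calibration (notebook 07) and solvePnP (notebook 09) later invert.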

Navigation

INDEX.ipynb — clickable table of contents for all 26 notebooks, 00–25 (start here if you're browsing in JupyterLab)
VIDEO_COMPANION.md — which YouTube videos to watch before each notebook, with direct links

How to Run

Option A — Google Colab (recommended for beginners)

Open any notebook on GitHub and click "Open in Colab"
Each notebook detects Colab automatically and installs its own dependencies
For GPU-heavy notebooks (Part 7+): Runtime → Change runtime type → T4 GPU
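
One common way for a notebook to detect Colab is to check whether the `google.colab` module is already loaded. The sketch below shows that idiom; it is an assumption about how such a check can be written, not a copy of the course's own setup cell, and the function names are illustrative.

```python
import subprocess
import sys


def in_colab(modules=None):
    """Return True when the `google.colab` module is loaded in this interpreter."""
    modules = sys.modules if modules is None else modules
    return "google.colab" in modules


def install_if_colab(packages):
    """Install packages with pip only when running inside Colab."""
    if not in_colab():
        return False  # local environments are expected to use requirements.txt
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "--quiet", *packages]
    )
    return True
```

A setup cell could then call `install_if_colab([...])` with the packages that notebook needs, and skip installation entirely on a local machine.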

Option B — Local with venv

# Clone or download this repository
cd course

# Create and activate virtual environment
python -m venv venv
# Windows:
venv\Scripts\activate
# Linux/macOS:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Launch Jupyter
jupyter lab
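
After the install step, a short check like the one below can confirm that the key libraries are importable before you open a notebook. The module list is an assumption, since the contents of requirements.txt are not shown here; adjust it to match your copy.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Hypothetical list; edit it to mirror requirements.txt
required = ["numpy", "cv2", "matplotlib"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```

Note that the import name can differ from the pip package name, for example `cv2` is provided by the OpenCV wheels.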

Option C — Docker (required for NB 20 · EfficientPose only)

Most of the course runs fine with Options A or B. The one exception is Notebook 20 (EfficientPose), which requires TensorFlow 1.14, an old release whose pip wheels only target Python 3.7 and earlier, so it cannot be installed into a current Python environment. Docker is the cleanest solution.

New to Docker? NB 01 introduces it briefly, and NB 20 has a full plain-English explainer with step-by-step install instructions. You don't need to set it up until you reach NB 20.

# Install Docker Desktop first: https://www.docker.com/products/docker-desktop
# Then pull the TF1 GPU image and run EfficientPose inside it:

docker pull tensorflow/tensorflow:1.14.0-gpu-py3

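# --gpus all requires an NVIDIA GPU with container support (on Linux, the NVIDIA Container Toolkit)
docker run --gpus all -it \
    -v "$(pwd)":/workspace \
    tensorflow/tensorflow:1.14.0-gpu-py3 \
    /bin/bash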

For FoundationPose (NB 21), Docker is optional — a conda environment works too. See NB 21 for details.

Pretrained Models

Notebooks 19–22 in Part 7 each rely on pretrained models. The table below lists what you need to do, if anything, for each:

Notebook	Model	Action required
NB 19 — MediaPipe Objectron	MediaPipe models	Nothing — downloaded automatically on first run after `pip install mediapipe`
NB 20 — EfficientPose	LineMOD weights (`.h5`)	Manual download from EfficientPose GitHub Releases — see NB 20 Section 3 for step-by-step instructions
NB 21 — FoundationPose	FoundationPose weights	Nothing when using the Docker image, which includes them (`docker pull nvcr.io/nvidia/foundationpose:latest`); for the conda route, see NB 21
NB 21 — FreeZe	DINO foundation weights	Nothing — auto-downloaded on first run
NB 22 — MegaPose + ViSP	MegaPose weights	One command — `python -m happypose.toolbox.utils.download --megapose-models` (run after pip install)

Summary for beginners: NB 20 (EfficientPose) is the only notebook that needs a manual download, and NB 20 Section 3 walks you through it step by step. NB 22 needs the single command listed above; everything else is handled automatically.

Assets

assets/
    images/        — sample images used across notebooks
    calibration/   — saved calibration data (.npz, .json)
    models/        — CAD models (.obj/.mtl)
    aruco_markers/ — pre-generated marker PNGs (4x4, 5x5, 6x6)
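
Calibration results stored as `.npz` can be read back with NumPy. The round trip below is only a sketch: the key names `camera_matrix` and `dist_coeffs`, the file name, and the numeric values are assumptions, so check `data.files` to see what the files in assets/calibration/ actually contain.

```python
import os
import tempfile

import numpy as np

# Hypothetical calibration values and key names, for illustration only
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)  # k1, k2, p1, p2, k3 in OpenCV's default ordering

# Write to a temporary folder so the example does not touch the repository
path = os.path.join(tempfile.mkdtemp(), "camera_calibration.npz")
np.savez(path, camera_matrix=camera_matrix, dist_coeffs=dist_coeffs)

# Reload and list the stored arrays before using them
with np.load(path) as data:
    print("Stored arrays:", data.files)
    K_loaded = data["camera_matrix"]
    dist_loaded = data["dist_coeffs"]
```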

Standalone Scripts

The scripts/ folder contains ready-to-run Python scripts extracted from the notebooks. Use these with a real camera — no Jupyter needed.

scripts/
    calibration/   capture_calibration_images.py   — webcam capture for calibration images (NB07)
                   stereo_camera_calibration.py     — full stereo calibration pipeline (NB17)
    aruco/         generate_aruco_markers.py        — batch-generate printable ArUco PNGs (NB12)
                   detect_aruco_live.py             — real-time ArUco detection (NB13)
                   aruco_pose_estimation_live.py    — real-time ArUco 6D pose + axes overlay (NB14)
    pose/          undistort_live_video.py          — real-time lens undistortion via remap (NB08)
                   chessboard_pose_estimation.py    — chessboard → solvePnP → 3D cube overlay (NB10)
    stereo/        stereo_depth_live.py             — real-time stereo depth from two cameras (NB17)
    robotics/      robot_station_docking.py         — ArUco docking state machine + P-controller (NB15)
                   forklift_pallet_alignment.py     — multi-marker pallet pose + 4-axis fork control (NB24)

Each script has a full usage guide in its docstring. Run `python scripts/<subfolder>/<name>.py --help` to see it.

For step-by-step instructions on running each script (environment setup, commands, controls, workflow) see SCRIPTS_COMPANION.md.
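
To illustrate the P-controller idea mentioned for the docking script, here is a minimal proportional-control sketch. It is not the code from robot_station_docking.py: the gains, speed limits, stop distance, and sign conventions (camera X to the right, positive angular velocity meaning a left turn) are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class VelocityCommand:
    linear: float   # forward speed in m/s
    angular: float  # turn rate in rad/s


def clamp(value, limit):
    """Limit value to the range [-limit, limit]."""
    return max(-limit, min(limit, value))


def docking_command(lateral_offset, distance, stop_distance=0.30,
                    k_linear=0.5, k_angular=1.5,
                    max_linear=0.2, max_angular=0.6):
    """Proportional control: each command scales with its error, then is clamped.

    lateral_offset: marker x position in the camera frame (m, positive = right)
    distance:       marker z position in the camera frame (m)
    """
    range_error = distance - stop_distance
    if range_error <= 0.0:
        return VelocityCommand(0.0, 0.0)  # close enough to the station: stop
    linear = clamp(k_linear * range_error, max_linear)
    # Marker to the right means turn right, which is a negative angular rate here
    angular = clamp(-k_angular * lateral_offset, max_angular)
    return VelocityCommand(linear, angular)


# Marker seen 1 m ahead and 10 cm to the right of the optical axis
cmd = docking_command(lateral_offset=0.10, distance=1.00)
```

In this sketch `lateral_offset` and `distance` would come from the X and Z components of the marker's `tvec`; a real controller would also need to handle lost detections and marker yaw.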

Grounded in Real Video Notes

This course is built on 36 curated video notes covering real implementations. Each notebook has recommended videos to watch first — see VIDEO_COMPANION.md for the full list with links.

Topics covered across the videos:

ArUco marker pose estimation workflows
Camera calibration (chessboard, <5 min method)
OpenCV GPU installation with CUDA
solvePnP and 6D pose pipelines
MediaPipe Objectron, EfficientPose, FoundationPose, FreeZe, MegaPose
Stereo vision calibration
Full robotics station docking demos

Questions? Issues? Open a GitHub issue or reach out to your instructor.
