This repository provides an up-to-date collection of papers focused on 3D Human Pose and Shape Estimation from LiDAR Point Clouds. The organization of the repository follows the taxonomy introduced in the paper below. Please cite our paper if you benefit from this repository:
S. Galaaoui, E. Valle, D. Picard, N. Samet, "3D Human Pose and Shape Estimation from LiDAR Point Clouds: A Review", arXiv, 2025. [preprint]
BibTeX entry:
@article{galaaoui20253dhumanposeshape,
title={{3D Human Pose and Shape Estimation from LiDAR Point Clouds: A Review}},
author={Salma Galaaoui and Eduardo Valle and David Picard and Nermin Samet},
year={2025},
journal={arXiv preprint arXiv:2509.12197}
}
If you know of a paper on 3D Human Pose Estimation or Human Mesh Reconstruction from LiDAR Point Clouds, you are welcome to contribute by submitting a pull request. In your submission, please indicate the section where your paper fits best within the repository’s taxonomy.
For questions, please contact Salma Galaaoui (salma.galaaoui@valeo.com).
1. Overview of 3D Human Pose Estimation or Human Mesh Reconstruction Methods from LiDAR Point Clouds
   - 1.1 Comparative Summary of Human Pose Estimation or Human Mesh Reconstruction Methods
   - 1.2 Comparing Network Architectures of Human Pose Estimation or Human Mesh Reconstruction Methods
2. 3D Human Pose Estimation From LiDAR Point Clouds
   - 2.1 Supervised Human Pose Estimation
   - 2.2 Weakly-supervised Human Pose Estimation
   - 2.3 Unsupervised Human Pose Estimation
3. 3D Human Mesh Reconstruction From LiDAR Point Clouds
   - 3.1 LiDAR-Only Human Mesh Reconstruction
   - 3.2 Fusing LiDAR and Other Modalities for Human Mesh Reconstruction
4. Datasets
   - 4.1 Waymo Open Dataset
   - 4.2 SLOPER4D
   - 4.3 Human-M3
- DAPT, AAAI 2025, [paper]
- HUM3DIL, CoRL 2022, [paper]
- LiDAR-HMP, ACM Multimedia 2024, [paper]
- LidPose, Sensors 2024, [paper]
- LPFormer, ICRA 2024, [paper]
- MMVP, arXiv 2023, [paper]
- UniPVU-Human, CVPR 2024, [paper]
- VoxelKP, ICCV 2025, [paper]
- FusionPose, AAAI 2023, [paper]
- HPERL, ICPR 2020, [paper]
- LiCamPose, WACV 2025, [paper]
- SA-VR system, arXiv 2024, [paper]
- WS-HPE, CVPRW 2020, [paper]
- WS-Fusion, IEEE-IV 2023, [paper]
- GC-KPL, CVPR 2023, [paper]
- LiDARCap, CVPR 2022, [paper]
- LiDARCapV2, Pattern Recognition 2024, [paper]
- LiDAR-HMR, IEEE-TMM 2025, [paper]
- LiveHPS, CVPR 2024, [paper]
- LiveHPS++, ECCV 2024, [paper]
- NE-3D-HPE, AAAI 2024, [paper]
- CIMI4D, CVPR 2023, [paper]
- FreeCap, AAAI 2025, [paper]
- HSC4D*, CVPR 2022, [paper]
- Human-M3*, arXiv 2023, [paper]
- LiDAR-aid Inertial Poser (LIP), IEEE-TVCG 2023, [paper]
- PEAR-Proj, ACM Multimedia 2024, [paper]
- SLOPER4D*, CVPR 2023, [paper]
- SMPLify-3D, Master Thesis 2024, [paper]
* The annotation pipeline employs mesh reconstruction for 3D data labeling.
- Dataset Name and Version
  - Waymo Open Dataset - Perception
  - Latest versions: v1.4.3 (including maps as polylines and polygons) or v2.0.1 (same data but without maps)
- Overview
  - Waymo Open Dataset is a large corpus of autonomous-driving data and scenarios. It comprises two large datasets, Motion and Perception; this survey focuses on the latter. The latest version of the Motion dataset contains LiDAR captures of 103,354 scenarios, each with 20 seconds of tracked vehicles, objects, and humans, in addition to camera embeddings. The Perception dataset provides richly annotated sensor data captured by LiDAR sensors and cameras mounted on Waymo vehicles.
- Data Collection
  - Methodology: The data is collected using LiDAR sensors and high-resolution cameras mounted on Waymo vehicles. Each vehicle carries 5 in-house LiDARs: one mid-range sensor on the top and four short-range sensors on the front, side left, side right, and rear. The LiDAR range is truncated to 75 meters for the top sensor and 20 meters for the others, and only the first two returns of each sensor are kept. In addition, RGB data is collected from 5 cameras facing the following directions: front, front left, front right, side left, and side right. The images are saved in JPEG format. Additional information is provided to customize the LiDAR-to-camera projection.
  - Time period of data collection: Initial release in 2019. According to Waymo's website, post-processing lasted until April 2024.
  - Geographic coverage: The majority of scenes were recorded in San Francisco, Phoenix, and Mountain View, with data collected at various times of the day and night.
- Data Format and Structure
  - Data is stored in folders according to annotation type (e.g., human keypoints, bounding boxes, camera labels), and each label is stored in Apache Parquet format.
  - The dataset is organized into sequences of 20 seconds each, with multiple sensor inputs per sequence sampled at 10 Hz.
- Dataset Name and Version
  - SLOPER4D - Scene Aware Dataset for Global 4D Human Pose Estimation in Urban Environments
  - Latest version: v1.0 (partial release, only 6 out of 15 scenes)
- Overview
  - SLOPER4D is the first large-scale urban 3D Human Pose Estimation (HPE) dataset containing multi-modal capture data, including calibrated and synchronized IMU measurements, LiDAR point clouds, and images for each subject. It includes rich 3D annotations such as SMPL models and global locations in the world coordinate system. The dataset also provides the complete 3D scene mesh.
- Data Collection
  - Methodology: Data is collected by a person wearing a head-mounted LiDAR and camera who follows a subject wearing IMUs and performing everyday actions. The mounted LiDAR is an Ouster OS-1 mid-range LiDAR, oriented at 45° to better capture the performer.
  - Time period of data collection: Initial release in 2023.
- Data Format and Structure
  - Data is separated into folders containing per-sequence annotations. Each folder has subfolders containing: LiDAR point cloud frames in PCD format, trajectories and tracking trajectories, MoCap data in BVH format, and a video of the sequence. Each sequence folder also includes a JSON file containing metadata.
- Dataset Name and Version
  - Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes
- Overview
  - Human-M3 is an outdoor, multi-modal, multi-person, and multi-view dataset designed as a benchmark for 3D HPE and HMR. It captures multi-person interaction scenes using four diagonally opposed camera-LiDAR pairs, providing fused and post-processed scene point clouds (from all four LiDARs) alongside their corresponding camera views.
- Data Collection
  - Methodology: Data is captured from four diagonally opposed angles using Livox MID-100 sensors operating with a Non-Repetitive Scanning (NRS) pattern and cameras stationed at the same locations. All sensor streams are synchronized and sampled at 10 Hz, and the test set is manually annotated to ensure reliability.
  - Time period of data collection: Initial release in 2023.
- Data Format and Structure
  - Data is separated into folders containing per-sequence annotations. Each folder contains: LiDAR point cloud frames in PCD format, camera images, estimated SMPL parameters in JSON format, 3D keypoints in JSON format, and camera calibration intrinsics and extrinsics in JSON format.
Waymo Open Dataset: We extract statistics for Waymo mostly from their v2.0.1 Apache Parquet files. Following their labelling protocol file, we keep only the objects labelled as pedestrian (label 2) and cyclist (label 4) in their 'camera_box' dataframes, then compute the average bounding box size. For the average distance from the sensor and the number of 3D keypoint frames, we use the 'lidar_hkp' dataframe and compute the average distance of the 14 keypoints, which are already in the LiDAR coordinate frame. We also report the horizontal resolution of the TOP LiDAR, and thus its point cloud resolution, from the 'lidar' dataframe. Finally, for 3D pose diversity, we use post-processed files from the authors of LiDAR-HMR that provide ground-truth meshes generated using SMPLify; we remove the rotation and translation and compute the average MSE between the posed ground-truth mesh and a T-posed mesh with the same betas, as shown below.
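The filtering and averaging steps described for Waymo can be sketched in plain Python. This is a minimal illustration, not the script used for the survey: the dictionary keys (`label`, `width`, `height`) and the toy rows are hypothetical stand-ins for the 'camera_box' and 'lidar_hkp' dataframes, and only the label ids 2 and 4 come from the text.

```python
from math import sqrt
from statistics import mean

# Label ids quoted in the text: 2 = pedestrian, 4 = cyclist.
HUMAN_LABELS = {2, 4}


def average_box_area(boxes):
    """Mean 2D box area in pixels over rows carrying a human label."""
    areas = [b["width"] * b["height"] for b in boxes if b["label"] in HUMAN_LABELS]
    return mean(areas)


def average_keypoint_distance(instances):
    """Mean distance (m) from the sensor origin to every keypoint.

    Each instance is a list of (x, y, z) keypoints already expressed in the
    LiDAR coordinate frame, so the sensor is assumed to sit at the origin.
    """
    dists = [sqrt(x * x + y * y + z * z) for kps in instances for (x, y, z) in kps]
    return mean(dists)


# Toy rows (hypothetical field names, not the real Parquet schema).
boxes = [
    {"label": 1, "width": 200.0, "height": 100.0},  # not a human class, ignored
    {"label": 2, "width": 40.0, "height": 100.0},   # pedestrian
    {"label": 4, "width": 60.0, "height": 100.0},   # cyclist
]
keypoints = [[(3.0, 4.0, 0.0), (0.0, 6.0, 8.0)]]

print(average_box_area(boxes))
print(average_keypoint_distance(keypoints))
```

With the real data, the same two reductions would run over the rows loaded from the Parquet files rather than over hand-written lists.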
SLOPER4D: We extract our statistics entirely from the SLOPER4D v1.0 dataset files. Most information is available through their data loader; we use their segmented human points, which are in the LiDAR coordinate frame, to compute the average number of points per instance as well as the average distance to the sensor. For 3D pose diversity, we use the same method as for Waymo Open Dataset. We additionally compute the average sequence length from the average number of RGB frames and the framerate. Note that our statistics are based on the 6 released sequences out of 15, yielding 33k LiDAR frames instead of the 100k announced in their paper.
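As a rough sketch of the per-instance statistics above, the snippet below averages the point count and the sensor distance over segmented human point sets, and derives a sequence length from a frame count and a framerate. Measuring the distance at the centroid of each instance is an assumption made here for illustration, and the function names and toy values are hypothetical.

```python
from math import sqrt


def centroid_distance(points):
    """Distance (m) from the LiDAR origin to the centroid of one instance."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    return sqrt(cx * cx + cy * cy + cz * cz)


def instance_stats(instances):
    """Return (average points per instance, average distance to sensor)."""
    avg_points = sum(len(pts) for pts in instances) / len(instances)
    avg_dist = sum(centroid_distance(pts) for pts in instances) / len(instances)
    return avg_points, avg_dist


def sequence_length_seconds(num_frames, fps):
    """Sequence duration implied by a frame count and a framerate."""
    return num_frames / fps


# Two toy instances in the LiDAR coordinate frame.
instances = [
    [(2.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    [(0.0, 4.0, 0.0), (0.0, 6.0, 0.0), (0.0, 5.0, 0.0), (0.0, 5.0, 0.0)],
]

print(instance_stats(instances))
print(sequence_length_seconds(600, 20))
```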
Human-M3: We extract our statistics entirely from the Human-M3 dataset files. We compute the area coverage by summing the covered area of each captured scene: we load the first .PCD point cloud of each scene, project the points onto the ground plane (dropping the Z coordinate), and compute the convex hull of the ground points using Trimesh, which gives a rough estimate of the coverage. We deduce the sequence lengths from the number of LiDAR frames and the framerate. We compute the number of subjects from the unique keys in the SMPL dictionaries of each sequence, and the number of human instances from the total number of SMPL annotations present. For the average number of points per human instance, we instantiate the SMPL mesh in the LiDAR point cloud scene using Trimesh and count the points contained inside the minimal bounding shape of the SMPL mesh.
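The two geometric steps above can be illustrated without Trimesh. The sketch below swaps in a standard-library monotone-chain convex hull with the shoelace formula for the ground-plane area, and assumes the "minimal bounding shape" is an axis-aligned bounding box; both choices, and all function names, are assumptions for illustration rather than the pipeline actually used.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def ground_coverage(points_xyz):
    """Area of the convex hull of the points projected onto the XY plane."""
    ground = [(x, y) for x, y, _ in points_xyz]
    return polygon_area(convex_hull(ground))


def count_points_in_box(points_xyz, mesh_vertices):
    """Count scene points inside the axis-aligned box around the mesh."""
    lo = [min(v[i] for v in mesh_vertices) for i in range(3)]
    hi = [max(v[i] for v in mesh_vertices) for i in range(3)]
    return sum(all(lo[i] <= p[i] <= hi[i] for i in range(3)) for p in points_xyz)


# Toy scene: four corner points of a 10 m x 5 m area plus one interior point.
cloud = [(0.0, 0.0, 0.1), (10.0, 0.0, 0.3), (10.0, 5.0, 1.2),
         (0.0, 5.0, 0.0), (5.0, 2.0, 1.7)]
# Toy "mesh" given only by two opposite corner vertices.
mesh = [(4.0, 1.0, 0.0), (6.0, 3.0, 2.0)]

print(ground_coverage(cloud))
print(count_points_in_box(cloud, mesh))
```

A convex hull overestimates coverage for concave scenes, which is consistent with the text calling the result a rough estimate.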
Extrinsic characteristics of LiDAR point cloud datasets.
| Datasets | WOD | SLOPER4D | Human-M3 |
|---|---|---|---|
| Area coverage (m²) | 76M | 2-13k | 111.5k |
| Sequence length (s) | 20 | 102-441 | 12-45 |
| # scenes | 998 | 6 | 4 |
| # subjects | 23.6k | 12 | 237 |
| # 3D human instances | 9.9k | 33k | 89k |
| # LiDAR frames | 230k | 42.3k | 12.2k |
Acquisition-related characteristics of LiDAR point cloud datasets.
| Datasets | WOD | SLOPER4D | Human-M3 |
|---|---|---|---|
| # beams | 64 | 128 | 3 |
| PC resolution | 169600 | 131072 | 80928 |
| Range (m) | 20/75 | 90 | 90 |
| Framerate (Hz) | 10 | 20 | 10 |
| FOV (hfov | 25.2 | 360 | 98.4 |
Intrinsic characteristics and diversity of the LiDAR point cloud datasets.
| Datasets | WOD | SLOPER4D | Human-M3 |
|---|---|---|---|
| Avg. points per instance | 384.1 | 967.8 | 369.1 |
| Avg. bounding box size (px) | 10340.7 | 37471.1 | N/A |
| # Human keypoints | 14 | 21 | 15 |
| Avg. human-sensor dist (m) | 14.51 | 2.81 | N/A |
| 3D pose diversity (cm) | 22.0 | 22.9 | 22.3 |
We compute 3D pose variability by aligning the posed SMPL model (purple) with the template model in T-pose (pink) and computing the MSE between their joints (orange and green, respectively).
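A minimal sketch of this measure, under stated assumptions: the global rotation is taken to be already removed (for instance by zeroing the SMPL global orientation), translation is removed by centring both skeletons on a root joint, and the joint lists are plain (x, y, z) tuples. The function names and the choice of joint 0 as root are hypothetical. Since the table reports the diversity in cm while an MSE is a squared quantity, the sketch returns both the mean squared error and the mean per-joint distance.

```python
from math import sqrt


def root_center(joints, root=0):
    """Subtract the root joint so that global translation is removed."""
    rx, ry, rz = joints[root]
    return [(x - rx, y - ry, z - rz) for x, y, z in joints]


def pose_deviation(posed, t_pose):
    """Return (mean squared error, mean per-joint distance) after centring."""
    a, b = root_center(posed), root_center(t_pose)
    # Squared Euclidean distance between corresponding joints.
    sq = [sum((p - q) ** 2 for p, q in zip(ja, jb)) for ja, jb in zip(a, b)]
    mse = sum(sq) / len(sq)
    mean_dist = sum(sqrt(s) for s in sq) / len(sq)
    return mse, mean_dist


# Toy three-joint skeleton: the posed copy is translated and one joint moved.
t_pose = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (-0.3, 0.0, 0.0)]
posed = [(1.0, 2.0, 3.0), (1.0, 2.3, 3.0), (0.7, 2.0, 3.0)]

mse, mean_dist = pose_deviation(posed, t_pose)
print(mse, mean_dist)
```

Averaging this value over all annotated frames of a dataset would give a single diversity figure per dataset.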



