valeoai/3D-Human-Pose-Shape-Estimation-from-LiDAR

A Repository of Papers on 3D Human Pose and Shape Estimation from LiDAR Point Clouds

This repository provides an up-to-date collection of papers focused on 3D Human Pose and Shape Estimation from LiDAR Point Clouds. The organization of the repository follows the taxonomy introduced in the paper below. Please cite our paper if you benefit from this repository:

S. Galaaoui, E. Valle, D. Picard, N. Samet, "3D Human Pose and Shape Estimation from LiDAR Point Clouds: A Review", arXiv, 2025. [preprint]

BibTeX entry:

@article{galaaoui20253dhumanposeshape,
      title={{3D Human Pose and Shape Estimation from LiDAR Point Clouds: A Review}}, 
      author={Salma Galaaoui and Eduardo Valle and David Picard and Nermin Samet},
      year={2025},
      journal={arXiv preprint arXiv:2509.12197}
}

How to request addition of a paper

If you know of a paper on 3D Human Pose Estimation or Human Mesh Reconstruction from LiDAR Point Clouds, you are welcome to contribute by submitting a pull request. In your submission, please indicate the section where your paper fits best within the repository’s taxonomy.

Contact

For questions, please contact Salma Galaaoui (salma.galaaoui@valeo.com).

Table of Contents

  1. Overview of 3D Human Pose Estimation or Human Mesh Reconstruction Methods from LiDAR Point Clouds
    1.1 Comparative Summary of Human Pose Estimation or Human Mesh Reconstruction Methods
    1.2 Comparing Network Architectures of Human Pose Estimation or Human Mesh Reconstruction Methods
  2. 3D Human Pose Estimation From LiDAR Point Clouds
    2.1 Supervised Human Pose Estimation
    2.2 Weakly-supervised Human Pose Estimation
    2.3 Unsupervised Human Pose Estimation
  3. 3D Human Mesh Reconstruction From LiDAR Point Clouds
    3.1 LiDAR-Only Human Mesh Reconstruction
    3.2 Fusing LiDAR and Other Modalities for Human Mesh Reconstruction
  4. Datasets
    4.1 Waymo Open Dataset
    4.2 SLOPER4D
    4.3 Human-M3
  5. Technical Details for Extracting Dataset Statistics

1. Overview of 3D Human Pose Estimation or Human Mesh Reconstruction Methods from LiDAR Point Clouds

1.1. Comparative Summary of Human Pose Estimation or Human Mesh Reconstruction Methods

1.2. Comparing Network Architectures of Human Pose Estimation or Human Mesh Reconstruction Methods

2. 3D Human Pose Estimation From LiDAR Point Clouds

2.1. Supervised Human Pose Estimation

2.2. Weakly-supervised Human Pose Estimation

2.3. Unsupervised Human Pose Estimation

3. 3D Human Mesh Reconstruction From LiDAR Point Clouds

3.1. LiDAR-Only Human Mesh Reconstruction

3.2. Fusing LiDAR and Other Modalities for Human Mesh Reconstruction

  • CIMI4D, CVPR 2023, [paper]
  • FreeCap, AAAI 2025, [paper]
  • HSC4D*, CVPR 2022, [paper]
  • Human-M3*, arXiv 2023, [paper]
  • LiDAR-aid Inertial Poser (LIP), IEEE-TVCG 2023, [paper]
  • PEAR-Proj, ACM Multimedia 2024, [paper]
  • SLOPER4D*, CVPR 2023, [paper]
  • SMPLify-3D, Master's Thesis 2024, [paper]

* The annotation pipeline employs mesh reconstruction for 3D data labeling.

4. Datasets

4.1. Waymo Open Dataset

  • Dataset Name and Version

    • Waymo Open Dataset - Perception
    • Latest versions: v1.4.3 (including maps as polylines and polygons) or v2.0.1 (same data but without maps)
  • Overview

    • Waymo Open Dataset is a large corpus of autonomous-driving data and scenarios. It comprises two large datasets, Motion and Perception; the latter is the focus of this survey. The latest version of the Motion dataset contains LiDAR captures of 103,354 scenarios, each containing 20 seconds of tracked vehicles, objects, and humans, in addition to camera embeddings. The Perception dataset, as its name indicates, focuses on perception and provides richly annotated sensor data captured by LiDAR sensors and cameras mounted on Waymo vehicles.
  • Data Collection

    • Methodology: The data is collected using LiDAR sensors and high-resolution cameras mounted on Waymo vehicles. Each vehicle carries 5 in-house LiDARs: one mid-range sensor on the top and four short-range sensors on the front, side left, side right, and rear. The LiDAR beams are truncated to 75 meters for the top sensor and 20 meters for the rest. Only the first two returns of each sensor are kept. In addition, RGB data is collected from 5 cameras facing the following directions: front, front left, front right, side left, and side right. The images are saved in JPEG format. Additional information is provided to customize the LiDAR-to-camera projection.
    • Time period of data collection: Initial release in 2019. Post-processing lasted until April 2024 as per Waymo's website.
    • Geographic coverage: The majority of scenes were recorded in San Francisco, Phoenix, and Mountain View with data collected at various times of the day and night.
  • Data Format and Structure

    • Data is stored in folders according to annotation type (e.g., human keypoints, bounding box, camera labels, etc.) and each label is stored using Apache Parquet format.
    • The dataset is organized into sequences of 20 seconds each, with multiple sensor inputs per sequence sampled at 10 Hz.

4.2. SLOPER4D

  • Dataset Name and Version

    • SLOPER4D - Scene Aware Dataset for Global 4D Human Pose Estimation in Urban Environments
    • Latest version: v1.0 (partial release of scenes, only 6 out of 15)
  • Overview

    • SLOPER4D is the first large-scale urban 3D Human Pose Estimation (HPE) dataset containing multi-modal capture data, including calibrated and synchronized IMU measurements, LiDAR point clouds, and images for each subject. It includes rich 3D annotations such as SMPL models and global locations in the world coordinate system. The dataset also provides the complete 3D scene mesh.
  • Data Collection

    • Methodology: Data is collected by a person wearing a head-mounted LiDAR and camera who follows a subject wearing IMUs and performing mundane actions. The mounted LiDAR is an Ouster OS-1 mid-range LiDAR, oriented at 45° to better capture the performer.
    • Time period of data collection: Initial release in 2023.
  • Data Format and Structure

    • Data is separated into folders containing per-sequence annotations. Each folder has subfolders containing: LiDAR point cloud frames in PCD format, trajectories and tracking trajectories, MoCap data in BVH format, and a video of the sequence. Each sequence folder also includes a JSON file containing metadata.

4.3. Human-M3

  • Dataset Name and Version

    • Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes.
  • Overview

    • Human-M3 is an outdoor, multi-modal, multi-person, and multi-view dataset designed as a benchmark for 3D HPE and HMR. It captures multi-person interaction scenes using four diagonally opposed camera–LiDAR pairs, providing fused and post-processed scene point clouds (from all four LiDARs) alongside their corresponding camera views.
  • Data Collection

    • Methodology: Data is captured from four diagonally opposed angles using Livox MID-100 sensors operating with a Non-Repetitive Scanning (NRS) pattern and cameras stationed at the same locations. All sensor streams are synchronized and sampled at 10 Hz, and the test set is manually annotated to ensure reliability.
    • Time period of data collection: Initial release in 2023.
  • Data Format and Structure

    • Data is separated into folders containing per-sequence annotations. Each folder contains: LiDAR point cloud frames in PCD format, camera images, estimated SMPL parameters in JSON format, 3D keypoints in JSON format, and camera calibration intrinsics and extrinsics in JSON format.

5. Technical Details for Extracting Dataset Statistics

Waymo Open Dataset. We extract statistics for Waymo mostly from the v2.0.1 Apache Parquet files. From the 'camera_box' dataframes, we keep only objects labelled as pedestrian (label 2) or cyclist (label 4), following the dataset's labelling protocol file, and compute the average bounding box size. For the average distance from the sensor and the number of 3D keypoint frames, we use the 'lidar_hkp' dataframe and compute the average distance of the 14 keypoints, which are already in the LiDAR coordinate frame. We also report the horizontal resolution of the TOP LiDAR, and thus its point cloud resolution, from the 'lidar' dataframe. Finally, for the 3D pose diversity, we use post-processed files from the authors of LiDAR-HMR that provide ground truth meshes generated using SMPLify. We remove the rotation and translation and compute the average MSE between the posed ground truth mesh and a T-posed mesh with the same betas, as shown in the figure below.
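The label filtering and the two averages described for Waymo can be sketched as follows. This is a minimal illustration on plain Python records, not the script used for the survey: the dictionary keys `type`, `width`, and `height` are hypothetical stand-ins for the actual Parquet column names, and it assumes that "bounding box size" means the 2D box area in pixels.

```python
import math

# Label ids taken from the text above: pedestrian = 2, cyclist = 4.
HUMAN_LABELS = {2, 4}


def average_box_area(boxes):
    """Mean 2D box area over boxes whose label is a human class.

    `boxes` is an iterable of dicts with hypothetical keys
    'type', 'width', and 'height' (width and height in pixels).
    """
    areas = [b["width"] * b["height"] for b in boxes if b["type"] in HUMAN_LABELS]
    return sum(areas) / len(areas) if areas else float("nan")


def average_keypoint_distance(instances):
    """Mean Euclidean distance from the sensor to every keypoint.

    `instances` is a list of keypoint lists; each keypoint is (x, y, z)
    in the LiDAR frame, so the sensor is assumed to sit at the origin.
    """
    dists = [
        math.sqrt(x * x + y * y + z * z)
        for keypoints in instances
        for (x, y, z) in keypoints
    ]
    return sum(dists) / len(dists) if dists else float("nan")
```

In practice the records would come from the 'camera_box' and 'lidar_hkp' dataframes; only the selection and averaging logic is shown here.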

SLOPER4D. We extract our statistics entirely from the SLOPER4D v1.0 dataset files. Most information is available through the dataset's data loader. We use the segmented human points, which are in the LiDAR coordinate frame, to compute the average points per instance as well as the average distance to the sensor. For 3D pose diversity, we use the same method as for Waymo Open Dataset. We additionally compute the average sequence length from the average number of RGB frames and the framerate. Note that our statistics are based on the 6 released sequences out of 15, yielding 33k LiDAR frames instead of the 100k announced in the paper.
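As a rough sketch of the two SLOPER4D quantities that reduce to simple arithmetic, the function below computes the average number of segmented human points per instance and the sequence duration. The function name and its inputs are hypothetical and do not mirror the SLOPER4D data loader; skipping frames with no human points is an assumption.

```python
def sequence_stats(human_points_per_frame, num_rgb_frames, rgb_fps):
    """Return (average points per instance, sequence length in seconds).

    `human_points_per_frame` holds one list of (x, y, z) points per LiDAR
    frame. Frames without any segmented human point are skipped
    (an assumption of this sketch).
    """
    counts = [len(points) for points in human_points_per_frame if len(points) > 0]
    avg_points = sum(counts) / len(counts) if counts else float("nan")
    duration_s = num_rgb_frames / rgb_fps
    return avg_points, duration_s
```

For example, a sequence with 600 RGB frames recorded at 20 frames per second lasts 30 seconds.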

Human-M3. We extract our statistics entirely from the Human-M3 dataset files. We compute the area coverage by summing the covered area of each captured scene: we load the first .PCD point cloud of each scene, project the points onto the ground plane (dropping the Z coordinate), and compute the convex hull of the ground points using Trimesh, which gives a rough estimate of the coverage. We deduce the sequence lengths from the number of LiDAR frames and the framerate. We compute the number of subjects from the unique keys in the SMPL dictionaries of each sequence, and the number of human instances from the total number of SMPL annotations present. For the average points per human instance, we instantiate the SMPL mesh in the LiDAR point cloud scene using Trimesh and count the points contained inside the minimal bounding shape of the SMPL mesh.
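The ground-plane coverage estimate and the point count per human instance can be illustrated without any dependency. The sketch below swaps Trimesh for a pure-Python convex hull (Andrew's monotone chain) plus the shoelace formula, and it assumes that the "minimal bounding shape" is an axis-aligned bounding box; both choices are illustrative and may differ from the actual implementation.

```python
def convex_hull(points_xy):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points_xy))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def ground_coverage_area(points_xyz):
    """Drop Z, then take the area of the convex hull of the XY projection."""
    return polygon_area(convex_hull([(x, y) for x, y, _z in points_xyz]))


def count_points_in_aabb(points_xyz, mesh_vertices):
    """Count scene points inside the axis-aligned bounding box of a mesh."""
    mins = [min(v[i] for v in mesh_vertices) for i in range(3)]
    maxs = [max(v[i] for v in mesh_vertices) for i in range(3)]
    return sum(
        all(mins[i] <= p[i] <= maxs[i] for i in range(3)) for p in points_xyz
    )
```

Summing `ground_coverage_area` over the first frame of each scene would give the total coverage, and averaging `count_points_in_aabb` over all SMPL annotations would give the points per instance.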

Extrinsic characteristics of LiDAR point cloud datasets.

| Datasets | WOD | SLOPER4D | Human-M3 |
| --- | --- | --- | --- |
| Area coverage ($m^2$) | 76M | 2-13k | 111.5k |
| Sequence length (s) | 20 | 102-441 | 12-45 |
| # scenes | 998 | 6 | 4 |
| # subjects | 23.6k | 12 | 237 |
| # 3D human instances | 9.9k | 33k | 89k |
| # LiDAR frames | 230k | 42.3k | 12.2k |

Acquisition-related characteristics of LiDAR point cloud datasets.

| Datasets | WOD | SLOPER4D | Human-M3 |
| --- | --- | --- | --- |
| # beams | 64 | 128 | 3 $\dagger$ |
| PC resolution | 169600 | 131072 | 80928 |
| Range (m) | 20/75 | 90 | 90 |
| Framerate (Hz) | 10 | 20 | 10 |
| FOV (hfov × vfov) | 25.2° × 120/20° | 360° × 42.2° | 98.4° × 38.4° |

$\dagger$ Sensor uses an NRS pattern with three laser beams, scanning at 300,000 points/s.
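For the two spinning sensors, the PC resolution values in the table are consistent with the number of beams multiplied by a horizontal resolution. The horizontal values used below (2650 for WOD, 1024 for SLOPER4D) are inferred by dividing the table entries by the beam counts; they are assumptions of this sketch, not figures stated in the text. The relation does not apply to the NRS sensor of Human-M3.

```python
def point_cloud_resolution(num_beams, horizontal_resolution):
    """Points per sweep of a spinning LiDAR: one return per beam and azimuth step."""
    return num_beams * horizontal_resolution


# Horizontal resolutions inferred from the table (assumed, not from the source).
wod_top_lidar = point_cloud_resolution(64, 2650)   # 169600
sloper4d_lidar = point_cloud_resolution(128, 1024)  # 131072
```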

Intrinsic characteristics and diversity of the LiDAR point cloud datasets.

| Datasets | WOD | SLOPER4D | Human-M3 |
| --- | --- | --- | --- |
| Avg. points per instance | 384.1 | 967.8 | 369.1 |
| Avg. bounding box size (px) | 10340.7 | 37471.1 | N/A |
| # Human keypoints | 14 | 21 | 15 |
| Avg. human-sensor dist (m) | 14.51 | 2.81 | N/A |
| 3D pose diversity (cm) | 22.0 | 22.9 | 22.3 |

We compute 3D pose variability by aligning the posed SMPL model (purple) with the template model in T-pose (pink) and computing the MSE between their joints (orange and green, respectively).
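A minimal sketch of this pose-variability measure is given below. It assumes that the global rotation has already been removed (for example by zeroing the SMPL global orientation), that translation is removed by subtracting the root joint at index 0, and that the MSE is the mean over joints of the squared Euclidean distance. The function names are hypothetical.

```python
def root_align(joints):
    """Subtract the root joint (index 0) so the skeleton starts at the origin."""
    rx, ry, rz = joints[0]
    return [(x - rx, y - ry, z - rz) for x, y, z in joints]


def pose_mse(posed_joints, tpose_joints):
    """Mean squared joint distance between a posed skeleton and its T-pose.

    Both inputs are equally long lists of (x, y, z) joints generated with
    the same shape parameters (betas); global rotation is assumed removed.
    """
    if len(posed_joints) != len(tpose_joints):
        raise ValueError("joint lists must have the same length")
    posed = root_align(posed_joints)
    tpose = root_align(tpose_joints)
    squared = [
        (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
        for p, q in zip(posed, tpose)
    ]
    return sum(squared) / len(squared)
```

Averaging `pose_mse` over all annotated instances of a dataset would give one diversity value per dataset; a pose identical to the T-pose yields 0.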
