Official PyTorch implementation of CMLG-Net (Cross-modality Local-Global Network), a two-stream CNN framework for robust and accurate dynamic hand gesture authentication under challenging real-world conditions.
Yufeng Zhang, Wenxiong Kang, and Wenwei Song
IEEE Transactions on Information Forensics and Security (TIFS), 2024
Robust fine-grained behavioral features are critical for dynamic hand gesture authentication. However, behavioral characteristics are abstract and complex, making them harder to capture than physiological traits. Moreover, varying illumination and backgrounds in practical applications pose additional challenges to conventional RGB-based methods.
CMLG-Net addresses these issues with two complementary modules:
- Temporal Scale Pyramid (TSP): Captures fine-grained local motion cues at multiple temporal scales using parallel convolutions with different kernel sizes.
- Cross-Modality Temporal Non-Local (CMTNL): Aggregates global temporal features and cross-modality (RGB-D) information via an attention mechanism.
Together, these modules produce a comprehensive and robust behavioral representation that combines multi-scale (short- and long-term) and multimodal (RGB-D) information.
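The TSP idea (parallel temporal convolutions with different kernel sizes, added back through a residual connection) can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the class name `TemporalScalePyramidSketch`, the kernel sizes `(1, 3, 5)`, the depthwise convolutions, and the `(batch, frames, channels, height, width)` layout are all assumptions.

```python
import torch
import torch.nn as nn


class TemporalScalePyramidSketch(nn.Module):
    """Illustrative only: parallel temporal convolutions plus a residual sum."""

    def __init__(self, channels: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        # One depthwise temporal convolution per scale (assumed design).
        # Odd kernels with k // 2 padding keep the number of frames unchanged.
        self.branches = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=k, padding=k // 2,
                      groups=channels, bias=False)
            for k in kernel_sizes
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width)
        b, t, c, h, w = x.shape
        # Treat every spatial position as an independent temporal sequence.
        seq = x.permute(0, 3, 4, 2, 1).reshape(b * h * w, c, t)
        multi_scale = sum(branch(seq) for branch in self.branches)
        out = multi_scale.reshape(b, h, w, c, t).permute(0, 4, 3, 1, 2)
        # Residual connection preserves the original per-frame features.
        return x + out
```

Small kernels respond to short motions and larger kernels to longer ones; summing the branches lets the following layers see all scales at once.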
Overall architecture of CMLG-Net. It contains independent RGB and depth branches with a shared design. Each branch uses a pretrained ResNet18 backbone, inserts a TSP module at Conv1 with a residual connection, and concludes with the CMTNL module to summarize global temporal and multimodal features. Finally, enhanced features from both branches are concatenated for the final identity representation.
- ✅ State-of-the-art accuracy – achieves 0.497% EER on SCUT-DHGA and 4.848% on SCUT-DHGA-br under challenging protocols
- ✅ Robust to illumination & background changes – explicitly designed for real-world scenes
- ✅ Multi-scale temporal modeling – TSP module captures local motion at various scales, and CMTNL module captures long-term motion patterns
- ✅ Multimodal learning – leverages both RGB and depth modalities with cross-modality attention
- Multiple parallel convolution branches with different temporal kernel sizes
- Captures fine-grained local motion cues across short, medium, and long temporal scales
- Inserted at Conv1 block with a residual connection
- Aggregates global temporal features and cross-modality (RGB-D) information
- Composed of two sub-modules: UME (Unimodal Enhancement) and CMI (Cross-Modality Integration)
- Fuses the RGB and depth modalities using dynamic weights based on the quality of each modality
- Three independent loss functions supervise unimodal (RGB, depth) and multimodal identity features
- Gradients from CMTNL are blocked during back-propagation, so each branch can focus on modality-specific features without interference
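The training strategy described above (three losses, with the fusion stage's gradient kept out of the branches) could look roughly like the sketch below. It is a simplified stand-in rather than the authors' CMTNL: the class `FusionHeadSketch`, the linear quality scorer, the use of cross-entropy, and the unweighted sum of losses are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionHeadSketch(nn.Module):
    """Hypothetical stand-in for the fusion stage; not the authors' CMTNL."""

    def __init__(self, dim: int, num_ids: int):
        super().__init__()
        self.quality = nn.Linear(2 * dim, 2)  # one fusion weight per modality
        self.cls_rgb = nn.Linear(dim, num_ids)
        self.cls_depth = nn.Linear(dim, num_ids)
        self.cls_fused = nn.Linear(2 * dim, num_ids)

    def forward(self, f_rgb, f_depth, labels):
        # Unimodal losses see the branch features directly.
        loss_rgb = F.cross_entropy(self.cls_rgb(f_rgb), labels)
        loss_depth = F.cross_entropy(self.cls_depth(f_depth), labels)
        # detach() stops the fusion loss from propagating into either branch.
        r, d = f_rgb.detach(), f_depth.detach()
        w = torch.softmax(self.quality(torch.cat([r, d], dim=1)), dim=1)
        fused = torch.cat([w[:, :1] * r, w[:, 1:] * d], dim=1)
        loss_fused = F.cross_entropy(self.cls_fused(fused), labels)
        total = loss_rgb + loss_depth + loss_fused
        return total, (loss_rgb, loss_depth, loss_fused)
```

Because the fused path only receives detached features, back-propagating the multimodal loss updates the fusion head alone, while each branch is shaped by its own unimodal loss.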
SCUT-DHGA-br is a challenging derived dataset designed to evaluate robustness under practical conditions. It is built by applying background replacement and lighting adjustment to the testing set of the original SCUT-DHGA dataset. This dataset can be downloaded from SCUT-DHGA-br.
| Feature | Description |
|---|---|
| Total backgrounds | 3,627 (airports, classrooms, malls, museums, subways, etc.) |
| Brightness factor | Random between 0.5 and 1 |
| Purpose | Simulate real-world illumination and background variations |
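As a rough illustration of how such a derived sample might be produced, the sketch below composites a hand frame onto a new background and then darkens it. It assumes a hand segmentation mask is already available and that the brightness factor is applied to the whole composited frame; neither detail is stated here, and the function name `replace_background_and_dim` is hypothetical.

```python
import numpy as np


def replace_background_and_dim(frame, hand_mask, background, rng):
    """Composite the hand onto a new background, then scale brightness.

    frame, background: uint8 arrays of shape (H, W, 3)
    hand_mask: boolean array of shape (H, W), True on hand pixels (assumed given)
    rng: a numpy.random.Generator
    """
    # Keep hand pixels from the original frame, take the rest from the background.
    composite = np.where(hand_mask[..., None], frame, background)
    # Brightness factor drawn uniformly from the range listed in the table.
    factor = rng.uniform(0.5, 1.0)
    dimmed = np.clip(composite.astype(np.float32) * factor, 0, 255)
    return dimmed.astype(np.uint8), factor
```

A call such as `replace_background_and_dim(frame, mask, bg, np.random.default_rng(0))` returns the augmented frame together with the factor that was drawn, which makes the augmentation easy to log or reproduce.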
CMLG-Net is compared with 27 state-of-the-art methods, including 2D CNNs, 3D CNNs, symbiotic CNNs, and two-stream CNNs. It achieves the best EER under both UMG and RMG protocols.
CMLG-Net significantly outperforms other approaches and meets real-time requirements on GPU devices.
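For readers unfamiliar with the metric, the equal error rate (EER) is the operating point where the false acceptance rate equals the false rejection rate. The helper below is a generic approximation over similarity scores, not the evaluation code of this repository; the function name `equal_error_rate` and the "higher score means genuine" convention are assumptions.

```python
import numpy as np


def equal_error_rate(genuine, impostor):
    """Approximate EER from similarity scores (higher = more likely genuine)."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    # Every observed score is tried as a decision threshold.
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    # FAR: impostors accepted at each threshold; FRR: genuine attempts rejected.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # Pick the threshold where the two error rates are closest.
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2
```

With few trials the two curves rarely cross exactly, so the sketch averages FAR and FRR at the closest threshold; finer interpolation is possible but not shown.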
- Python 3.8+
- PyTorch ≥ 2.4.0
git clone https://github.com/SCUT-BIP-Lab/CMLG-Net.git
cd CMLG-Net
pip install -r requirements.txt
- Download the SCUT-DHGA dataset (original) and the SCUT-DHGA-br dataset (derived).
- Organize the data as follows:
data/
├── SCUT-DHGA/
│   ├── color_hand/    # RGB images
│   └── depth_hand/    # Depth images
└── SCUT-DHGA-br/
    ├── color_hand/
    └── depth_hand/
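Before training, it can help to confirm that the folders match the layout above. The helper below is a small optional check, not part of the repository; the function name `missing_data_dirs` and the default `data` root are assumptions.

```python
from pathlib import Path


def missing_data_dirs(root="data"):
    """Return the expected dataset folders that do not exist under root."""
    expected = [
        Path(root) / dataset / modality
        for dataset in ("SCUT-DHGA", "SCUT-DHGA-br")
        for modality in ("color_hand", "depth_hand")
    ]
    return [str(p) for p in expected if not p.is_dir()]
```

An empty list means all four folders are in place; otherwise the returned paths show what still needs to be downloaded or moved.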
# Train CMLG-Net on SCUT-DHGA under UMG protocol
python ./train.py --conf_file ./conf/CMLGNet/UMG/UMG1_SD_CMLGNet.conf --mode train
# Evaluate CMLG-Net on SCUT-DHGA under UMG protocol
python ./train.py --conf_file ./conf/SSAF/UMG/UMG1_SD_CMLGNet.conf --mode eval

If you find this work useful, please cite:
@ARTICLE{zhang2024cmlg,
author={Zhang, Yufeng and Kang, Wenxiong and Song, Wenwei},
journal={IEEE Transactions on Information Forensics and Security},
title={Robust and Accurate Hand Gesture Authentication With Cross-Modality Local-Global Behavior Analysis},
year={2024},
volume={19},
number={},
pages={8630-8643},
keywords={Authentication;Videos;Feature extraction;Physiology;Robustness;Lighting;Spatiotemporal phenomena;Biometrics;hand gesture authentication;multimodal fusion;spatiotemporal analysis;behavioral characteristic representation},
doi={10.1109/TIFS.2024.3451367}
}

Biometrics and Intelligence Perception Lab
College of Automation Science and Engineering
South China University of Technology, Guangzhou, China
- Yufeng Zhang: auyfzhang@mail.scut.edu.cn
- Wenxiong Kang: auwxkang@scut.edu.cn
MIT License. See LICENSE for details.



