Karl1109/LIDAR-Mamba

LIDAR

[ACM MM 2025] LIDAR: Lightweight Adaptive Cue-Aware Fusion Vision Mamba for Multimodal Segmentation of Structural Cracks

🖐😭🤚 🌟If this work is useful to you, please give this repository a Star!🌟 🖐😭🤚

📬 News

  • 2025-07-31: The preprint of LIDAR has been posted on 📤️arXiv!
  • 2025-07-30: The code for LIDAR is publicly available in this repository! 📦
  • 2025-07-06: 🎉🎉🎉We are delighted to announce that LIDAR has been accepted to ACM MM 2025! 🖐😭🤚

⚒ Method Overview

Achieving pixel-level segmentation with low computational cost using multimodal data remains a key challenge in crack segmentation tasks. Existing methods lack the capability for adaptive perception and efficient interactive fusion of cross-modal features. To address these challenges, we propose a Lightweight Adaptive Cue-Aware vision Mamba network (LIDAR), which efficiently perceives and integrates morphological and textural cues from different modalities under multimodal crack scenarios, generating clear pixel-level crack segmentation maps. Specifically, LIDAR is composed of a Lightweight Adaptive Cue-Aware Visual State Space module (LacaVSS) and a Lightweight Dual Domain Dynamic Collaborative Fusion module (LD3CF). LacaVSS adaptively models crack cues through the proposed mask-guided Efficient Dynamic Guided Scanning Strategy (EDG-SS), while LD3CF leverages an Adaptive Frequency Domain Perceptron (AFDP) and a dual-pooling fusion strategy to effectively capture spatial and frequency-domain cues across modalities. Moreover, we design a Lightweight Dynamically Modulated Multi-Kernel convolution (LDMK) to perceive complex morphological structures with minimal computational overhead, replacing most convolutional operations in LIDAR. Experiments on three datasets demonstrate that our method outperforms other state-of-the-art (SOTA) methods. On the light-field depth dataset, our method achieves 0.8204 in F1 and 0.8465 in mIoU with only 5.35M parameters.

🎮 Getting Started

🗂 Download Datasets

The CrackPolar, CrackDepth, and IRTCrack datasets used in this work can be downloaded from Multimodal_Crack_Dataset.

⚙️ Environment Setup

Create a conda environment for LIDAR with the following commands:

conda create -n LIDAR python=3.9 -y
conda activate LIDAR
pip install torch==2.1.2+cu121 torchvision==0.16.2+cu121 torchaudio==2.1.2+cu121 --index-url https://download.pytorch.org/whl/cu121
pip install -U openmim
mim install mmcv-full
pip install mamba-ssm==1.2.0
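
After installation, a quick check that the packages are importable can save a failed training run. The following is a minimal sketch, not part of the repository: the helper name check_environment is hypothetical, and the mapping assumes that mmcv-full is imported as mmcv and mamba-ssm as mamba_ssm.

```python
import importlib.util
from importlib import metadata

# Import name -> pip distribution name, taken from the install commands above.
# Assumption: mmcv-full is imported as "mmcv" and mamba-ssm as "mamba_ssm".
PACKAGES = {
    "torch": "torch",
    "torchvision": "torchvision",
    "torchaudio": "torchaudio",
    "mmcv": "mmcv-full",
    "mamba_ssm": "mamba-ssm",
}


def check_environment(packages=PACKAGES):
    """Return {import_name: version string, "unknown", or None if not importable}."""
    report = {}
    for module_name, dist_name in packages.items():
        # find_spec locates the module without importing it.
        if importlib.util.find_spec(module_name) is None:
            report[module_name] = None
            continue
        try:
            report[module_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # Importable, but installed under a different distribution name.
            report[module_name] = "unknown"
    return report


if __name__ == "__main__":
    for name, version in check_environment().items():
        print(f"{name:12s} {version or 'MISSING'}")
```

Any line reported as MISSING points to an install step that did not complete in the active environment.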

🖱️ Train

Formal training must be preceded by 10 rounds of pre-training; the resulting weights file is then used to generate the masks. First, set the scan_list_json_path parameter in main.py to pretrain and run:

python main.py

After pre-training is complete, modify the path to the weight file in inference_mask.py to the location of the pre-training weight file and run the following command to generate the mask:

python inference_mask.py

Once the masks have been generated, set the dataset path in ./pre_scan/scan.py to the location of the dataset to be pre-scanned, then run the following command to generate the JSON file that holds the pre-scan path:

python scan.py

Run the following command to check if the scan sequence was generated correctly:

python test_scan_json.py
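
Independently of test_scan_json.py, it can be useful to confirm that the generated file exists and parses before pointing main.py at it. The sketch below is an assumption-laden illustration: the helper name summarize_scan_json is hypothetical, and because the internal layout of the scan JSON is not documented here, it only checks that the top level is a non-empty list or object.

```python
import json
from pathlib import Path


def summarize_scan_json(path):
    """Load a JSON file and return (top-level type name, number of entries).

    Only generic properties are checked; the actual schema written by
    scan.py is not assumed.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"scan JSON not found: {path}")
    with path.open("r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, (dict, list)) or len(data) == 0:
        raise ValueError(f"{path} parsed, but holds no entries")
    return type(data).__name__, len(data)
```

Calling summarize_scan_json on the file produced by scan.py should return without raising; an exception suggests re-running the pre-scan step.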

Next, change the value of the scan_list_json_path parameter in main.py to the location of the pre-scanned JSON file, and run the following command for formal training:

python main.py

✍️✍️✍️Note:

  • For pre-training, set scan_list_json_path in main.py to pretrain and set inference_mask to True.
  • For formal training, set scan_list_json_path to the path of the pre-scan JSON file and set inference_mask to False.
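
The two-stage configuration and the command order above can be summarized as a small lookup. This is only a sketch for reference: stage_settings and WORKFLOW are hypothetical names, main.py is still edited by hand, and only the parameter names scan_list_json_path and inference_mask come from the notes above.

```python
PRETRAIN_VALUE = "pretrain"  # value of scan_list_json_path during pre-training


def stage_settings(stage, scan_json_path=None):
    """Return the two main.py settings for the given training stage."""
    if stage == "pretrain":
        return {"scan_list_json_path": PRETRAIN_VALUE, "inference_mask": True}
    if stage == "formal":
        if not scan_json_path:
            raise ValueError("formal training needs the pre-scan JSON path")
        return {"scan_list_json_path": scan_json_path, "inference_mask": False}
    raise ValueError(f"unknown stage: {stage!r}")


# Order of the commands described in the Train section.
WORKFLOW = [
    ("pre-training", "python main.py"),
    ("mask generation", "python inference_mask.py"),
    ("pre-scan", "python scan.py"),
    ("scan check", "python test_scan_json.py"),
    ("formal training", "python main.py"),
]

if __name__ == "__main__":
    for step, command in WORKFLOW:
        print(f"{step:16s} {command}")
```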

⌨ Test

After training, the weights file can be used for inference:

python test.py

〽️ Evaluate

Run the following commands to calculate the ODS, OIS, F1, and mIoU metrics:

cd eval
python evaluate.py
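
For readers unfamiliar with the metrics, the sketch below shows the textbook definitions of F1 and mIoU for a single binary crack mask. It is an illustration under stated assumptions, not the code in eval/evaluate.py: the function names are hypothetical, masks are flat sequences of 0/1 values, mIoU is averaged over the crack and background classes, and a class absent from both masks is skipped. ODS and OIS additionally sweep the binarization threshold and are not shown.

```python
def confusion_counts(pred, gt):
    """Count TP, FP, FN, TN for two equally long binary (0/1) sequences."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    tp = fp = fn = tn = 0
    for p, g in zip(pred, gt):
        if p and g:
            tp += 1
        elif p and not g:
            fp += 1
        elif g:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(pred, gt):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 when the denominator is zero."""
    tp, fp, fn, _ = confusion_counts(pred, gt)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_iou(pred, gt):
    """Mean of crack IoU and background IoU, skipping classes with no pixels."""
    tp, fp, fn, tn = confusion_counts(pred, gt)
    ious = []
    if tp + fp + fn:
        ious.append(tp / (tp + fp + fn))  # crack class
    if tn + fp + fn:
        ious.append(tn / (tn + fp + fn))  # background class
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with pred = [1, 1, 0, 0] and gt = [1, 0, 0, 0], there is one true positive, one false positive, and two true negatives, so F1 is 2/3 and mIoU is the mean of 1/2 and 2/3.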

Run the following commands to calculate the Params and FLOPs metrics:

cd ..
python eval_compute.py
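
As background on what these two numbers measure, the sketch below counts parameters and multiply-accumulate operations (MACs) for a single 2D convolution by hand. It is illustrative only: the function names are hypothetical, square kernels are assumed, and eval_compute.py may rely on a profiling tool with a different convention (some tools report one MAC as two FLOPs).

```python
def conv2d_params(in_ch, out_ch, kernel, groups=1, bias=True):
    """Learnable parameters of a 2D convolution with a square kernel."""
    weights = (in_ch // groups) * out_ch * kernel * kernel
    return weights + (out_ch if bias else 0)


def conv2d_macs(in_ch, out_ch, kernel, out_h, out_w, groups=1):
    """Multiply-accumulates for one forward pass (bias additions ignored)."""
    per_output_pixel = (in_ch // groups) * out_ch * kernel * kernel
    return per_output_pixel * out_h * out_w


if __name__ == "__main__":
    # A 3x3 convolution from 3 to 16 channels on a 32x32 output map.
    print(conv2d_params(3, 16, 3))
    print(conv2d_macs(3, 16, 3, 32, 32))
```

Setting groups equal to the channel count gives the depthwise case, which illustrates why grouped and depthwise convolutions are common in lightweight designs.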

🔭 Visualization

[Figure: visual comparison under dual-modal input]

[Figure: visual comparison under multimodal input]

[Figure: visual comparison under RGB single-modal input]

🏷️ License

This project is released under the Apache 2.0 license.

🫡 Acknowledgment

This work stands on the shoulders of the following open-source projects:

📟 Contact

If you have any other questions, feel free to contact me at liuhui1109@stud.tjut.edu.cn or liuhui@ieee.org.
