HKUST-LongGroup/MoKus

MoKus

Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization

Chenyang Zhu, Hongxiang Li, Xiu Li, Long Chen

arXiv 2026

[Figure: MoKus teaser]

Abstract

Concept customization typically binds rare tokens to a target concept. Unfortunately, these approaches often perform unstably because the pretraining data seldom contains these rare tokens, and the tokens themselves fail to convey the inherent knowledge of the target concept. We therefore introduce Knowledge-aware Concept Customization, a novel task that aims to bind diverse textual knowledge to target visual concepts. The task requires the model to identify the knowledge within the text prompt in order to perform high-fidelity customized generation, and to bind all of the textual knowledge to the target concept efficiently. To this end, we propose MoKus, a novel framework for knowledge-aware concept customization. Our framework relies on a key observation, cross-modal knowledge transfer: modifying knowledge within the text modality naturally transfers to the visual modality during generation. Inspired by this observation, MoKus contains two stages: (1) in visual concept learning, we first learn an anchor representation that stores the visual information of the target concept; (2) in textual knowledge updating, we update the answer to each knowledge query to the anchor representation, enabling high-fidelity customized generation. To evaluate MoKus comprehensively on the new task, we introduce KnowCusBench, the first benchmark for knowledge-aware concept customization. Extensive evaluations demonstrate that MoKus outperforms state-of-the-art methods. Moreover, cross-modal knowledge transfer allows MoKus to be easily extended to other knowledge-aware applications such as virtual concept creation and concept erasure. We also demonstrate that our method achieves improvements on world knowledge benchmarks.

Highlights

  • Introduces Knowledge-Aware Concept Customization, a new task for binding rich textual knowledge to customized visual concepts.
  • Observes cross-modal knowledge transfer: updating knowledge in the text modality transfers to the visual modality.
  • Proposes MoKus, a two-stage framework for knowledge-aware concept customization inspired by this observation.
  • Presents KnowCusBench, the first benchmark designed for knowledge-aware concept customization.

News

  • [2026.03.13]: Released the paper on arXiv (2603.12743), project page, and codebase.

Getting Started

1. Download KnowCusBench

Download the benchmark assets from Hugging Face. The release includes:

  1. Concept images in KnowCusBench/concept_image
  2. Textual knowledge in KnowCusBench/knowledge_data
  3. Generation prompts in KnowCusBench/concept_image/dataset.json
  4. Visual Concept Learning checkpoints for each target concept in KnowCusBench/visual_ckpt
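
As a quick sanity check after downloading, the layout listed above can be verified with a short script. This is a minimal sketch, not part of the repository: the helper name `missing_assets` is hypothetical, and it assumes the benchmark was unpacked into a local folder named KnowCusBench with exactly the relative paths listed above.

```python
from pathlib import Path

# Relative paths taken from the list of released assets above.
EXPECTED_PATHS = [
    "concept_image",
    "knowledge_data",
    "concept_image/dataset.json",
    "visual_ckpt",
]


def missing_assets(root):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumption: the benchmark was downloaded into ./KnowCusBench.
    missing = missing_assets("KnowCusBench")
    if missing:
        print("Missing from KnowCusBench:", ", ".join(missing))
    else:
        print("All listed KnowCusBench assets are present.")
```

An empty result means every listed directory and file was found; anything else names the pieces that still need to be downloaded.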

2. Visual Concept Learning

You can directly use the Visual Concept Learning checkpoints provided in KnowCusBench, so retraining is optional.

If you prefer to train the visual concept model yourself, please first prepare the additional environment required by the official Diffusers DreamBooth implementation for Qwen-Image:

DreamBooth for Qwen-Image

We used the following training command:

export concept_name="your-concept-name"
export MODEL_NAME="./path/to/Qwen-Image"
export INSTANCE_DIR="./path/to/concept_image"
export OUTPUT_DIR="./path/to/output_dir"

accelerate launch train_dreambooth_lora_qwen_image.py \
  --pretrained_model_name_or_path="$MODEL_NAME" \
  --instance_data_dir="$INSTANCE_DIR" \
  --output_dir="$OUTPUT_DIR" \
  --mixed_precision="bf16" \
  --instance_prompt="sks $concept_name" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --use_8bit_adam \
  --learning_rate=2e-4 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --checkpointing_steps=100 \
  --cache_latents \
  --seed="42"

3. Textual Knowledge Updating

Environment Setup

You can run setup_env.sh, or create the conda environment manually:

conda create -n MoKus python=3.9.7 -y
conda activate MoKus
pip install -r requirements.txt

Download Pretrained Models

Download the weights for Qwen-Image and Qwen2.5-VL-7B-Instruct from Hugging Face, then place them under ./pretrained_models.
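
One possible way to fetch the weights is `snapshot_download` from the `huggingface_hub` package. The sketch below is an assumption rather than the repository's own setup: the Hub repository IDs (`Qwen/Qwen-Image`, `Qwen/Qwen2.5-VL-7B-Instruct`) and the subfolder names under ./pretrained_models are guesses, so check them against the paths expected by the hparams file before use.

```python
# Assumed Hugging Face Hub repository IDs; verify them before downloading.
MODEL_REPOS = ["Qwen/Qwen-Image", "Qwen/Qwen2.5-VL-7B-Instruct"]


def planned_downloads(root="./pretrained_models"):
    """Pair each repo ID with a target folder named after the model (assumed layout)."""
    return [(repo_id, f"{root}/{repo_id.split('/')[-1]}") for repo_id in MODEL_REPOS]


def download_all(root="./pretrained_models"):
    """Download every model in the plan; requires `pip install huggingface_hub`."""
    # Imported lazily so the plan can be inspected without the package installed.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in planned_downloads(root):
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `planned_downloads()` shows where each model would be placed; `download_all()` performs the actual download, which is large and needs network access.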

Run Knowledge Updating

After downloading KnowCusBench, run the following command, or use run_text_knowledge_updating.sh:

export concept_name="your-concept-name"

python text_knowledge_updating.py \
  --editing_method=UltraEdit \
  --hparams_dir="./hparams/qwenvl2.5-7b.yaml" \
  --data_dir="./knowledge_data/${concept_name}.json" \
  --data_type=unike_data \
  --output_dir="./updated_models" \
  --task_name="${concept_name}" \
  --sequential_edit

The updated Qwen-Image model will be saved under ./updated_models/${concept_name}.
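
To update several concepts in a row, the command above can be wrapped in a small driver. This is a sketch under stated assumptions: the helper names `build_update_command` and `run_updates` are hypothetical, the flags and paths are copied verbatim from the command above, and the script is expected to be launched from the repository root.

```python
import shlex
import subprocess


def build_update_command(concept_name):
    """Build the knowledge-updating command for one concept (flags as in the README)."""
    return [
        "python", "text_knowledge_updating.py",
        "--editing_method=UltraEdit",
        "--hparams_dir=./hparams/qwenvl2.5-7b.yaml",
        f"--data_dir=./knowledge_data/{concept_name}.json",
        "--data_type=unike_data",
        "--output_dir=./updated_models",
        f"--task_name={concept_name}",
        "--sequential_edit",
    ]


def run_updates(concept_names, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for name in concept_names:
        cmd = build_update_command(name)
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True stops the loop at the first failing concept.
            subprocess.run(cmd, check=True)
```

Running `run_updates(["concept-a", "concept-b"])` with placeholder names only prints the commands, which makes it easy to inspect them before passing `dry_run=False`.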

4. Inference

Use the following command, or run run_inference.sh:

CONCEPT_MODEL_PATH="path/to/your/concept/model"
LORA_MODEL_PATH="path/to/your/lora/model"
PROMPT="Your inference prompt goes here"

python inference.py \
  --concept-model-path "$CONCEPT_MODEL_PATH" \
  --lora-model-path "$LORA_MODEL_PATH" \
  --prompt "$PROMPT" \
  --output-path "image.png"
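
For more than one prompt, the same call can be generated per prompt with a distinct output file. The following is a minimal sketch: `build_inference_command` and `build_batch` are hypothetical helpers, the numbered output naming is an assumption, and only the four flags shown in the command above are used.

```python
import shlex


def build_inference_command(concept_model_path, lora_model_path, prompt, output_path):
    """Build one inference.py call using the flags shown in the README."""
    return [
        "python", "inference.py",
        "--concept-model-path", concept_model_path,
        "--lora-model-path", lora_model_path,
        "--prompt", prompt,
        "--output-path", output_path,
    ]


def build_batch(concept_model_path, lora_model_path, prompts, prefix="image"):
    """One command per prompt, writing to numbered files such as image_000.png."""
    return [
        build_inference_command(
            concept_model_path, lora_model_path, prompt, f"{prefix}_{index:03d}.png"
        )
        for index, prompt in enumerate(prompts)
    ]


def print_batch(commands):
    """Print shell-quoted commands so they can be reviewed or pasted into a terminal."""
    for cmd in commands:
        print(shlex.join(cmd))
```

Each returned list can be passed directly to `subprocess.run(cmd, check=True)`; passing the arguments as a list avoids shell-quoting problems with prompts that contain spaces or quotes.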

Results

[Figure: qualitative results]

Citation

@article{zhu2026mokus,
  title={MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization},
  author={Zhu, Chenyang and Li, Hongxiang and Li, Xiu and Chen, Long},
  journal={arXiv preprint arXiv:2603.12743},
  year={2026}
}

Acknowledgements

This repository builds heavily on EasyEdit and Diffusers. We thank the authors for making their code and models publicly available.

Contact

This repository accompanies our research project, and we will continue refining the codebase and documentation. If you have questions or would like to discuss ideas, please contact Chenyang Zhu.

About

[arXiv 2026] MoKus: This repo is the official implementation of "MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization"
