Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization
Chenyang Zhu, Hongxiang Li, Xiu Li, Long Chen
arXiv 2026
Concept customization typically binds rare tokens to a target concept. Unfortunately, these approaches often suffer from unstable performance because the pretraining data seldom contains these rare tokens. Moreover, rare tokens fail to convey the inherent knowledge of the target concept. We therefore introduce Knowledge-aware Concept Customization, a novel task that aims to bind diverse textual knowledge to target visual concepts. This task requires the model to identify the knowledge within the text prompt in order to perform high-fidelity customized generation, and to efficiently bind all of the textual knowledge to the target concept. To this end, we propose MoKus, a novel framework for knowledge-aware concept customization. Our framework relies on a key observation, cross-modal knowledge transfer: modifying knowledge within the text modality naturally transfers to the visual modality during generation. Inspired by this observation, MoKus contains two stages: (1) In visual concept learning, we first learn the anchor representation to store the visual information of the target concept. (2) In textual knowledge updating, we update the answer for the knowledge queries to the anchor representation, enabling high-fidelity customized generation. To comprehensively evaluate MoKus on the new task, we introduce KnowCusBench, the first benchmark for knowledge-aware concept customization. Extensive evaluations demonstrate that MoKus outperforms state-of-the-art methods. Moreover, cross-modal knowledge transfer allows MoKus to be easily extended to other knowledge-aware applications such as virtual concept creation and concept erasure. We also demonstrate that our method achieves improvements on world knowledge benchmarks.
- Introduces Knowledge-Aware Concept Customization, a new task for binding rich textual knowledge to customized visual concepts.
- Observes cross-modal knowledge transfer, where knowledge updates in the text modality transfer to the visual modality.
- Proposes MoKus, a two-stage framework for knowledge-aware concept customization inspired by this observation.
- Presents KnowCusBench, the first benchmark designed for knowledge-aware concept customization.
- [2026.03.13]: Released the paper on arXiv (2603.12743), project page, and codebase.
Download the benchmark assets from Hugging Face. The release includes:
- Concept images in KnowCusBench/concept_image
- Textual knowledge in KnowCusBench/knowledge_data
- Generation prompts in KnowCusBench/concept_image/dataset.json
- Visual Concept Learning checkpoints for each target concept in KnowCusBench/visual_ckpt
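After downloading, it can help to confirm that the release layout listed above is present before running later steps. The following is a minimal sketch using only the Python standard library; the helper name `missing_entries` and the `./KnowCusBench` download location are assumptions for illustration, not part of the repository.

```python
from pathlib import Path

# Entries taken from the release list above, relative to the KnowCusBench root.
EXPECTED_ENTRIES = [
    "concept_image",
    "knowledge_data",
    "concept_image/dataset.json",
    "visual_ckpt",
]


def missing_entries(root):
    """Return the expected entries that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


# "./KnowCusBench" is an assumed download location; adjust it to your setup.
missing = missing_entries("./KnowCusBench")
if missing:
    print("Missing from KnowCusBench:", ", ".join(missing))
else:
    print("KnowCusBench layout looks complete.")
```

The check only tests for existence; it does not validate the contents of dataset.json or of the checkpoints.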
You can directly use the Visual Concept Learning checkpoints provided in KnowCusBench, so retraining is optional.
If you prefer to train the visual concept model yourself, please first prepare the additional environment required by the official Diffusers DreamBooth implementation for Qwen-Image:
We used the following training command:
export concept_name="your-concept-name"
export MODEL_NAME="./path/to/Qwen-Image"
export INSTANCE_DIR="./path/to/concept_image"
export OUTPUT_DIR="./path/to/output_dir"
accelerate launch train_dreambooth_lora_qwen_image.py \
--pretrained_model_name_or_path="$MODEL_NAME" \
--instance_data_dir="$INSTANCE_DIR" \
--output_dir="$OUTPUT_DIR" \
--mixed_precision="bf16" \
--instance_prompt="sks $concept_name" \
--resolution=1024 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--use_8bit_adam \
--learning_rate=2e-4 \
--report_to="wandb" \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=500 \
--checkpointing_steps=100 \
--cache_latents \
--seed="42"
You can run setup_env.sh, or create the conda environment manually:
conda create -n MoKus python=3.9.7 -y
conda activate MoKus
pip install -r requirements.txt
Download the weights for Qwen-Image and Qwen2.5-VL-7B-Instruct from Hugging Face, then place them under ./pretrained_models.
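One way to fetch the weights is with `snapshot_download` from the `huggingface_hub` package. The sketch below is not part of the repository: the repository IDs `Qwen/Qwen-Image` and `Qwen/Qwen2.5-VL-7B-Instruct` and the per-model folder names under ./pretrained_models are assumptions, so check them against the paths expected by the hparams file and scripts before use. Running the block only prints the plan; the download happens when `download_all()` is called explicitly.

```python
from pathlib import Path

# Assumed Hugging Face repository IDs for the two models named above.
MODEL_REPOS = ["Qwen/Qwen-Image", "Qwen/Qwen2.5-VL-7B-Instruct"]


def download_plan(target_root="./pretrained_models"):
    """Map each repository ID to an assumed local folder under `target_root`."""
    root = Path(target_root)
    return {repo_id: root / repo_id.split("/")[-1] for repo_id in MODEL_REPOS}


def download_all(target_root="./pretrained_models"):
    """Download every model in the plan (requires `pip install huggingface_hub`)."""
    # Imported lazily so the plan can be inspected without the package installed.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in download_plan(target_root).items():
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir))


# Print the plan only; call download_all() yourself to start the large download.
for repo_id, local_dir in download_plan().items():
    print(f"{repo_id} -> {local_dir}")
```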
After downloading KnowCusBench, run the following command, or use run_text_knowledge_updating.sh:
export concept_name="your-concept-name"
python text_knowledge_updating.py \
--editing_method=UltraEdit \
--hparams_dir="./hparams/qwenvl2.5-7b.yaml" \
--data_dir="./knowledge_data/${concept_name}.json" \
--data_type=unike_data \
--output_dir="./updated_models" \
--task_name="${concept_name}" \
--sequential_edit
The updated Qwen-Image model will be saved under ./updated_models/${concept_name}.
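To update knowledge for several concepts in a row, the command above can be generated per concept. The sketch below mirrors the flags shown above exactly; the helper name `build_update_command` and the concept names are hypothetical, and it only prints the commands rather than executing them.

```python
import shlex


def build_update_command(concept_name):
    """Return the argv list for one text_knowledge_updating.py run (hypothetical helper)."""
    return [
        "python", "text_knowledge_updating.py",
        "--editing_method=UltraEdit",
        "--hparams_dir=./hparams/qwenvl2.5-7b.yaml",
        f"--data_dir=./knowledge_data/{concept_name}.json",
        "--data_type=unike_data",
        "--output_dir=./updated_models",
        f"--task_name={concept_name}",
        "--sequential_edit",
    ]


# Placeholder concept names; replace them with the concepts you downloaded.
for name in ["concept_a", "concept_b"]:
    print(shlex.join(build_update_command(name)))
```

To execute a command instead of printing it, pass the list to `subprocess.run(cmd, check=True)` from the repository root.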
Use the following command, or run run_inference.sh:
CONCEPT_MODEL_PATH="path/to/your/concept/model"
LORA_MODEL_PATH="path/to/your/lora/model"
PROMPT="Your inference prompt goes here"
python inference.py \
--concept-model-path "$CONCEPT_MODEL_PATH" \
--lora-model-path "$LORA_MODEL_PATH" \
--prompt "$PROMPT" \
--output-path "image.png"
@article{zhu2026mokus,
title={MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization},
author={Zhu, Chenyang and Li, Hongxiang and Li, Xiu and Chen, Long},
journal={arXiv preprint arXiv:2603.12743},
year={2026}
}
This repository builds heavily on EasyEdit and Diffusers. We thank the authors for making their code and models publicly available.
This repository accompanies our research project, and we will continue refining the codebase and documentation. If you have questions or would like to discuss ideas, please contact Chenyang Zhu.

