This repository is based on the official implementation of “SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder” by Kamenetsky et al. (2025).
While the original SAEdit method performs token-level editing using PCA/SVD-based aggregation of SAE directions, this project introduces a Top-K Hybrid aggregation strategy for computing edit directions during generation. The goal is to improve robustness and controllability by focusing on the most semantically active SAE dimensions instead of relying solely on global principal directions.
Large-scale text-to-image diffusion models provide powerful editing capabilities but lack fine-grained, disentangled control. SAEdit addresses this by manipulating token-level text embeddings using directions extracted from a Sparse AutoEncoder (SAE) trained on text embeddings.
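As a rough illustration of token-level editing, the sketch below shifts only the embeddings of selected tokens along an edit direction, scaled by a factor. It is a minimal pure-Python sketch and not the repository's code: the function name `edit_tokens`, the additive update, and the toy 2-dimensional embeddings are all assumptions made for illustration.

```python
def edit_tokens(embeddings, tokens, source_tokens, direction, factor):
    """Shift only the embeddings of the chosen tokens along a direction.

    Assumption: the edit is a simple additive update, emb + factor * direction.
    The actual SAEdit update rule may differ.
    """
    edited = []
    for tok, emb in zip(tokens, embeddings):
        if tok in source_tokens:
            # Move this token's embedding along the edit direction.
            edited.append([e + factor * d for e, d in zip(emb, direction)])
        else:
            # Leave all other tokens untouched.
            edited.append(list(emb))
    return edited


# Toy example: three tokens with 2-dimensional embeddings.
tokens = ["a", "man", "riding"]
embeddings = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
edited = edit_tokens(embeddings, tokens, {"man"}, direction=[1.0, 0.0], factor=0.5)
print(edited)  # [[0.0, 0.0], [1.5, 1.0], [2.0, 2.0]]
```

Only the embedding for "man" changes, which is the sense in which the edit is token-level.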
This project:
- Keeps the core SAEdit framework unchanged
- Uses the same pretrained SAE and diffusion backbone
- Modifies how SAE directions are aggregated during inference
- Provides side-by-side vanilla vs. modified generation scripts
| Method | Aggregation |
| --- | --- |
| Vanilla SAEdit | PCA / SVD-based aggregation (aggregate="pca") |
| Ours (Top-K Hybrid) | Top-K sparse latent selection + hybrid aggregation (aggregate="top_k_hybrid") |
Instead of averaging or projecting across all active SAE dimensions, the Top-K Hybrid approach:
- Selects the K most activated SAE latent dimensions
- Aggregates directions only from these dominant components
- Reduces noise from weak or irrelevant activations
- Improves semantic locality of edits
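The selection and aggregation steps above can be sketched as follows. This is a hedged, pure-Python illustration rather than the repository's implementation: the exact "hybrid" rule is not specified here, so the sketch assumes an activation-weighted sum of the decoder directions of the top-K latents, followed by L2 normalization. The names `top_k_direction`, `activations`, and `decoder_rows` are hypothetical.

```python
import math


def top_k_direction(activations, decoder_rows, k):
    """Aggregate an edit direction from the k most activated SAE latents.

    activations: one non-negative activation per SAE latent.
    decoder_rows: one decoder direction (list of floats) per latent.
    Assumption: directions are combined by an activation-weighted sum and
    then L2-normalized; the repository's actual rule may differ.
    """
    if not 0 < k <= len(activations):
        raise ValueError("k must be between 1 and the number of latents")
    # Indices of the k latents with the largest activation.
    top = sorted(range(len(activations)),
                 key=lambda i: activations[i], reverse=True)[:k]
    dim = len(decoder_rows[0])
    direction = [0.0] * dim
    for i in top:
        for d in range(dim):
            direction[d] += activations[i] * decoder_rows[i][d]
    norm = math.sqrt(sum(x * x for x in direction))
    if norm == 0.0:
        return direction, sorted(top)
    return [x / norm for x in direction], sorted(top)


# Toy example: 4 latents in a 3-dimensional embedding space.
activations = [0.05, 2.0, 0.1, 1.0]
decoder = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
direction, selected = top_k_direction(activations, decoder, k=2)
print(selected)   # [1, 3]
print(direction)  # [0.0, 1.0, 0.0]
```

In this toy case the weakly activated latents 0 and 2 contribute nothing to the result, which is the noise-reduction effect the approach is aiming for.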
Two separate scripts are provided for clarity and comparison.
Vanilla SAEdit (PCA-based)
python flux_dev_example.py \
--variation_path configs/variations/smiling_man.yaml \
--prompt "a portrait of a man riding a donkey in the snow" \
--source_tokens man \
--factor 0.6 \
--output vanilla_output.png
Ours: Top-K Hybrid Aggregation
python flux_dev_example_ours.py \
--variation_path configs/variations/smiling_man.yaml \
--prompt "a portrait of a man riding a donkey in the snow" \
--source_tokens man \
--factor 0.6 \
--output ours_output.png
This code depends on the diffusers library. To set up the environment, run:
conda env create -f environment.yaml
conda activate saeedit_env
Alternatively, install the requirements with pip:
pip install -r requirements.txt
This code builds on the code from the SAEdit library.
archivePrefix={arXiv},
primaryClass={cs.GR},
url={https://arxiv.org/abs/2510.05081},
}