SAEdit-TopK: Token-Level Continuous Image Editing with Hybrid Top-K Aggregation

This repository is based on the official implementation of “SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder” by Kamenetsky et al. (2025).

While the original SAEdit method performs token-level editing using PCA/SVD-based aggregation of SAE directions, this project introduces a Top-K Hybrid aggregation strategy for computing edit directions during generation. The goal is to improve robustness and controllability by focusing on the most semantically active SAE dimensions instead of relying solely on global principal directions.

Overview

Large-scale text-to-image diffusion models provide powerful editing capabilities but lack fine-grained, disentangled control. SAEdit addresses this by manipulating token-level text embeddings using directions extracted from a Sparse AutoEncoder (SAE) trained on text embeddings.
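To make the idea of token-level editing concrete, here is a minimal sketch of shifting only selected token embeddings along an edit direction. The function name `edit_tokens`, its arguments, and the additive update are illustrative assumptions rather than the repository's API, and plain Python lists stand in for the tensors a real pipeline would use.

```python
def edit_tokens(embeddings, token_indices, direction, factor):
    """Return a copy of `embeddings` with the selected rows shifted.

    embeddings: list of per-token vectors (one row per prompt token).
    token_indices: positions of the tokens to edit (hypothetical input).
    direction: edit direction with the same length as each row.
    factor: scalar strength of the edit (assumed to scale the shift linearly).
    """
    targets = set(token_indices)
    edited = []
    for i, row in enumerate(embeddings):
        if i in targets:
            # Only the chosen tokens move along the edit direction.
            edited.append([x + factor * d for x, d in zip(row, direction)])
        else:
            # All other tokens are copied unchanged.
            edited.append(list(row))
    return edited
```

Under this assumption, varying `factor` continuously changes how far the chosen token moves while every other token embedding stays fixed.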

This project:

Keeps the core SAEdit framework unchanged
Uses the same pretrained SAE and diffusion backbone
Modifies how SAE directions are aggregated during inference
Provides side-by-side vanilla vs. modified generation scripts

Key Differences from Original SAEdit

1. Aggregation Strategy

| Method | Aggregation |
| --- | --- |
| Vanilla SAEdit | PCA / SVD-based aggregation (aggregate="pca") |
| Ours (Top-K Hybrid) | Top-K sparse latent selection + hybrid aggregation (aggregate="top_k_hybrid") |
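For intuition about the PCA / SVD-style baseline, the sketch below finds the leading principal direction of a set of vectors. It uses power iteration in plain Python as a dependency-free stand-in for an SVD call. The function name and the choice not to center the rows are assumptions made for illustration, and this is not the original SAEdit code.

```python
import math


def first_principal_direction(rows, iters=100):
    """Approximate the dominant right singular vector of `rows`.

    Uses power iteration on rows^T rows; rows are used uncentered (an assumption).
    """
    dim = len(rows[0])
    # Start from a uniform unit vector.
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        # proj[i] = <rows[i], v>
        proj = [sum(r[j] * v[j] for j in range(dim)) for r in rows]
        # w = rows^T proj
        w = [sum(proj[i] * rows[i][j] for i in range(len(rows))) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v
```

A direction found this way summarizes all input dimensions at once, which is the global behavior that a Top-K selection is meant to contrast with.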

Instead of averaging or projecting across all active SAE dimensions, the Top-K Hybrid approach:

Selects the K most activated SAE latent dimensions
Aggregates directions only from these dominant components
Reduces noise from weak or irrelevant activations
Improves semantic locality of edits
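The selection step listed above can be sketched as follows. This is a minimal illustration under stated assumptions: the name `top_k_direction`, ranking latents by absolute mean activation change, and weighting the kept decoder rows by that mean are all hypothetical choices. The README does not spell out how the "hybrid" part combines the selected components, so the sketch covers only Top-K selection followed by a weighted sum.

```python
import math


def top_k_direction(latent_deltas, decoder, k):
    """Aggregate SAE latent changes into one embedding-space direction.

    latent_deltas: list of rows (one per prompt pair), each of length d_sae.
    decoder: list of d_sae rows, each of length d_model (one row per latent).
    k: number of latents to keep.
    Returns (unit-norm direction, sorted indices of the kept latents).
    """
    d_sae = len(decoder)
    if not 1 <= k <= d_sae:
        raise ValueError("k must be between 1 and the number of SAE latents")
    n = len(latent_deltas)
    # Mean activation change of each latent across all pairs.
    mean_delta = [sum(row[j] for row in latent_deltas) / n for j in range(d_sae)]
    # Keep the k latents with the largest absolute mean change.
    ranked = sorted(range(d_sae), key=lambda j: abs(mean_delta[j]), reverse=True)
    top = sorted(ranked[:k])
    # Decode only the kept latents; weak activations contribute nothing.
    d_model = len(decoder[0])
    direction = [sum(mean_delta[j] * decoder[j][c] for j in top) for c in range(d_model)]
    norm = math.sqrt(sum(x * x for x in direction))
    if norm > 0.0:
        direction = [x / norm for x in direction]
    return direction, top
```

With an identity decoder and `k=2`, for example, only the two latents with the largest mean change contribute to the resulting direction, and every other coordinate is exactly zero.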

2. Example Scripts

Two separate scripts are provided for clarity and comparison.

Vanilla SAEdit (PCA-based)

python flux_dev_example.py \
  --variation_path configs/variations/smiling_man.yaml \
  --prompt "a portrait of a man riding a donkey in the snow" \
  --source_tokens man \
  --factor 0.6 \
  --output vanilla_output.png

Ours: Top-K Hybrid Aggregation

python flux_dev_example_ours.py \
  --variation_path configs/variations/smiling_man.yaml \
  --prompt "a portrait of a man riding a donkey in the snow" \
  --source_tokens man \
  --factor 0.6 \
  --output ours_output.png

Environment Setup

Our code depends on the diffusers library. To set up the environment, run:

conda env create -f environment.yml
conda activate saeedit_env

or install requirements:

pip install -r requirements.txt

Acknowledgements

This code builds on the code from the SAEdit library.

SAEdit paper: Kamenetsky et al. (2025), arXiv (cs.GR), https://arxiv.org/abs/2510.05081
