Rotem Ezra¹ Hedi Zisling¹ Nimrod Berman¹ Ilan Naiman¹ Alexey Gorkor² Liran Nochumsohn¹ Eliya Nachmani³ Omri Azencot¹

¹Faculty of Computer and Information Science, Ben-Gurion University of the Negev
²Lightricks
³Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev

Abstract

Diffusion models have become state-of-the-art generative models for images, audio, and video, yet enabling fine-grained controllable generation, i.e., continuously steering specific concepts without disturbing unrelated content, remains challenging. Concept Sliders (CS) offer a promising direction by discovering semantic directions through textual contrasts, but they require per-concept training and architecture-specific fine-tuning (e.g., LoRA), limiting scalability to new modalities. In this work, we introduce a simple yet effective approach that is fully training-free and modality-agnostic, achieved by partially estimating the CS formula during inference. To support modality-agnostic evaluation, we extend the CS benchmark to include both video and audio, establishing the first benchmark suite for fine-grained concept control across multiple modalities. We further propose three evaluation properties along with new metrics to improve evaluation quality. Finally, we identify an open problem of scale selection and non-linear traversals, and introduce a two-stage procedure that automatically detects saturation points and reparameterizes the traversal to yield perceptually uniform, semantically meaningful edits. Extensive experiments demonstrate that our method enables plug-and-play, training-free concept control across modalities, improves over existing baselines, and establishes new tools for principled controllable generation.