Free Sliders: Training-Free Modality-Agnostic Concept Sliders: Fine-Grained Control via Diffusion Models of Images, Audio, and Video

Rotem Ezra¹, Hedi Zisling¹, Nimrod Berman¹, Ilan Naiman¹, Alexey Gorkor², Liran Nochumsohn¹, Eliya Nachmani³, Omri Azencot¹
¹ Faculty of Computer and Information Science, Ben-Gurion University of the Negev
² Lightricks
³ Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev

Abstract

Diffusion models have become state-of-the-art generative models for images, audio, and video, yet enabling fine-grained controllable generation, i.e., continuously steering specific concepts without disturbing unrelated content, remains challenging. Concept Sliders (CS) offer a promising direction by discovering semantic directions through textual contrasts, but they require per-concept training and architecture-specific fine-tuning (e.g., LoRA), limiting scalability to new modalities. In this work, we introduce a simple yet effective approach that is fully training-free and modality-agnostic, achieved by partially estimating the CS formula during inference. To support modality-agnostic evaluation, we extend the CS benchmark to include both video and audio, establishing the first suite for fine-grained concept generation control with multiple modalities. We further propose three evaluation properties along with new metrics to improve evaluation quality. Finally, we identify an open problem of scale selection and non-linear traversals and introduce a two-stage procedure that automatically detects saturation points and reparameterizes traversal for perceptually uniform, semantically meaningful edits. Extensive experiments demonstrate that our method enables plug-and-play, training-free concept control across modalities, improves over existing baselines, and establishes new tools for principled controllable generation.
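As a rough illustration of what evaluating a Concept Sliders style update at inference time could look like, the sketch below combines three noise predictions from a frozen denoiser: one for the target prompt and one each for a positive and a negative contrast prompt. The function `toy_denoiser`, the condition vectors, and the scale values are hypothetical stand-ins introduced only for this example; the page does not spell out the exact estimator, so this should be read as an assumption rather than as our implementation.

```python
import numpy as np

def toy_denoiser(x, cond, t):
    """Hypothetical stand-in for a frozen noise predictor eps(x, cond, t)."""
    # Toy rule: the prediction is linear in the input and the condition vector.
    return 0.1 * x + cond * (1.0 - t)

def slider_noise(denoiser, x, t, c_target, c_pos, c_neg, scale):
    """Target prediction plus a scaled contrast between two prompts.

    Assumed form: eps(x, c_target) + scale * (eps(x, c_pos) - eps(x, c_neg)).
    No weights are trained or updated; everything is computed per step.
    """
    eps_target = denoiser(x, c_target, t)
    direction = denoiser(x, c_pos, t) - denoiser(x, c_neg, t)
    return eps_target + scale * direction

# Hypothetical condition embeddings for "person", "smiling person", "frowning person".
x = np.zeros(4)
c_target = np.array([1.0, 0.0, 0.0, 0.0])
c_pos = np.array([1.0, 1.0, 0.0, 0.0])
c_neg = np.array([1.0, -1.0, 0.0, 0.0])

for scale in (-1.0, 0.0, 1.0):
    print(scale, slider_noise(toy_denoiser, x, 0.5, c_target, c_pos, c_neg, scale))
```

With a scale of zero the function returns the unmodified target prediction, and the sign of the scale selects which end of the contrast the sample is pushed toward.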

[Figure: method overview teaser]

This website presents visual examples of our results across different modalities.


Interactive Image Sliders

Explore different image slider examples. Move the sliders below to see how various attributes change.

Expression

Frowning
Person
Smiling

Age

Young
Person
Old

Car

New
Car
Damaged

Age Slider

Young
Person
Old

Makeup Slider

No Makeup
Person
Makeup

Dog Fur Slider

Darker Fur
Dog
Lighter Fur

Interactive Video Slider

Explore different video slider examples. Move the sliders below to see how various attributes change.

Calm ocean
Sailboat in the ocean
Wavy ocean

Calm river
River streaming through valley
Wavy river

Brown, bare tree
Tree rustling in wind
Green, leafy tree

Mountain Hikers Slider

Fewer hikers
Mountain hiking trail
More hikers

Lighthouse Water Slider

Calm ocean
Lighthouse in the ocean
Wavy ocean

Cat Fur Color Slider

Dark Fur
Cat on windowsill
Orange Fur

Slider examples with LTX-Video

Examples of video sliders using LTX-Video generation with different concepts.

Age Slider

Old
Person
Young

Sailboat Slider

Slow boat
Sailboat
Fast boat

Car Color Slider

Old gray car
Car
Bright red car

Slider Compositions

Examples from our paper showing how multiple concept sliders can be composed together to achieve complex transformations.
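One plausible way to picture composition is to add several scaled concept directions to the same target prediction. The sketch below assumes the directions combine linearly and uses made-up three-dimensional vectors; the helper `compose_sliders` and all values are hypothetical and are not taken from the paper.

```python
import numpy as np

def compose_sliders(eps_target, directions, scales):
    """Add several scaled concept directions to one target noise prediction.

    Assumption for illustration: directions combine by simple summation.
    """
    if len(directions) != len(scales):
        raise ValueError("need exactly one scale per direction")
    out = np.array(eps_target, dtype=float)
    for direction, scale in zip(directions, scales):
        out = out + scale * np.asarray(direction, dtype=float)
    return out

# Hypothetical directions for two concepts (e.g. age and smiling).
eps_target = np.zeros(3)
age_direction = np.array([1.0, 0.0, 0.0])
smile_direction = np.array([0.0, 1.0, 0.0])

# Push strongly toward "old" and mildly toward "frowning".
edited = compose_sliders(eps_target, [age_direction, smile_direction], [2.0, -1.0])
print(edited)
```

Each concept keeps its own scale, so one slider can be moved while the others stay fixed.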

Age + Smiling Composition

Young & Frowning
Person
Old & Smiling

Smiling + Lipstick + Glasses Composition

Frowning & no glasses & no lipstick
Person
Smiling & glasses & lipstick

Method Comparisons

Comparisons between different methods, and the effectiveness of our ASTD add-on, across various concepts.
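Assuming the ASTD add-on corresponds to the two-stage procedure described in the abstract (detect where the edit saturates, then reparameterize the traversal), the idea can be sketched on a measured response curve. Everything here is hypothetical: the `tanh` response stands in for some concept score measured at each slider scale, the tolerance `tol` is an arbitrary choice, and equal response increments are used as a simple proxy for perceptual uniformity.

```python
import numpy as np

def find_saturation(scales, response, tol=0.02):
    """Stage 1: smallest scale beyond which the response gains < tol of its range."""
    scales = np.asarray(scales, dtype=float)
    response = np.asarray(response, dtype=float)
    total = response[-1] - response[0]
    if total <= 0:
        raise ValueError("response must increase over the scanned scales")
    remaining = response[-1] - response
    idx = int(np.argmax(remaining <= tol * total))  # first index that qualifies
    return float(scales[idx])

def uniform_traversal(scales, response, n_steps, tol=0.02):
    """Stage 2: choose scales whose responses are evenly spaced up to saturation."""
    scales = np.asarray(scales, dtype=float)
    response = np.asarray(response, dtype=float)
    s_max = find_saturation(scales, response, tol)
    keep = scales <= s_max
    s, r = scales[keep], response[keep]
    targets = np.linspace(r[0], r[-1], n_steps)
    # Invert the monotone response curve by linear interpolation.
    return np.interp(targets, r, s)

# Hypothetical measured curve: the edit strength flattens out at large scales.
scales = np.linspace(0.0, 10.0, 101)
response = np.tanh(scales)

s_max = find_saturation(scales, response)
new_scales = uniform_traversal(scales, response, n_steps=5)
print(s_max)
print(new_scales)
```

In this toy curve the reparameterized scales are spaced more widely near the saturation point, where a larger change in scale is needed for the same change in response.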

ASTD Add-on Effectiveness - Age Slider

Young
Person
Old
Ours w/ ASTD
Ours w/o ASTD

CS w/ ASTD
CS w/o ASTD

Glasses Slider - Method Comparison

No Glasses
Person
Glasses
Ours
CS
Text-Embedding Variant

Smiling Slider - Method Comparison

Frowning
Person
Smiling
Ours
CS
Text-Embedding Variant