ByteDance's next-generation AI video model, built around the @-reference system. Combine text, images, video clips, and audio in a single prompt. Native audio-video synchronization, V2V editing, and up to 2K resolution at 30fps, all in one unified generation.
Seedance 2.0 is ByteDance's most advanced AI video generation model, unveiled in February 2026. It uses a unified multimodal architecture that generates audio and video jointly and accepts four input modalities at once: text, up to 9 images, up to 3 video clips, and up to 3 audio tracks. The groundbreaking @-reference system lets you tag specific elements in your prompt and bind them to uploaded references, giving granular control over camera movement, character appearance, audio rhythm, and visual style. Outputs reach up to 2K resolution with natively synchronized audio, including multilingual lip-sync, sound effects, and background music.
Advanced reference tagging using @Image, @Video, and @Audio labels in your prompt. Bind specific elements to uploaded files for precise control over camera movement, character actions, audio rhythm, and visual style.
Combine text, up to 9 images, up to 3 video clips, and up to 3 audio tracks in a single generation request. Seedance 2.0 is the first model to process all four input types simultaneously.
Joint audio-video synthesis produces lip-sync dialogue, sound effects, and background music synchronized with the visual output. Supports multilingual lip-sync with phoneme-level precision.
Edit existing videos through reference-to-video mode. Transfer motion patterns, camera paths, and pacing from uploaded clips. Change outfits, modify actions, or replace elements while preserving the original structure.
Native 2K (2048x1080) output at 30fps, with 480p, 720p, and 1080p quality levels also available. Each generation runs from 4 to 15 seconds.
Upload multiple reference images of the same character from different angles. Seedance 2.0 maintains consistent faces, clothing, body proportions, and accessories across multiple generated clips.
Explore Seedance 2.0's capabilities in multimodal reference control, native audio generation, and video editing

“@Image1 walks through @Image2 with camera movement from @Video1 and background music from @Audio1”
Multi-reference prompt combining all modalities

“@Image1 character dances with rhythm from @Audio1 in @Image3 environment”
Character motion guided by audio beat reference

“A person giving a presentation with synchronized English speech and slide transitions”
Lip-sync dialogue with visual content

“Cooking tutorial with step-by-step narration and ambient kitchen sounds”
Narration synchronized with cooking actions
Seedance 2.0 FAQ
The @-reference system lets you tag elements in your prompt with labels such as @Image1, @Video1, and @Audio1 and bind them to uploaded reference files. Seedance 2.0 extracts camera movements from video references, beat rhythms from audio, and composition styles from images. This gives you fine-grained control over the motion, rhythm, and look of the generated video.
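Before submitting a prompt, it can help to check that every @-tag actually points at an uploaded file. The sketch below is a hypothetical client-side helper, not part of any published Seedance API: it assumes tags always take the form @Image<n>, @Video<n>, or @Audio<n>, with n counting uploads of that type from 1. The names `find_references` and `unbound_references` are illustrative only.

```python
import re
from dataclasses import dataclass

# Assumed tag syntax, based on the labels shown above: @Image1, @Video2, @Audio3, ...
TAG_PATTERN = re.compile(r"@(Image|Video|Audio)(\d+)\b")


@dataclass(frozen=True)
class Reference:
    kind: str   # "Image", "Video", or "Audio"
    index: int  # 1-based position among uploaded files of that kind (assumption)


def find_references(prompt: str) -> list[Reference]:
    """Return each distinct @-tag in the order it first appears in the prompt."""
    seen: list[Reference] = []
    for kind, number in TAG_PATTERN.findall(prompt):
        ref = Reference(kind, int(number))
        if ref not in seen:
            seen.append(ref)
    return seen


def unbound_references(prompt: str, uploads: dict[str, int]) -> list[Reference]:
    """Return tags whose index has no matching uploaded file of that kind."""
    return [
        ref
        for ref in find_references(prompt)
        if ref.index < 1 or ref.index > uploads.get(ref.kind, 0)
    ]


prompt = ("@Image1 walks through @Image2 with camera movement from @Video1 "
          "and background music from @Audio1")
# Two images and one video uploaded, but no audio file yet:
print(unbound_references(prompt, {"Image": 2, "Video": 1}))
```

With two images and one video uploaded, the helper flags only the @Audio1 tag, since no audio file is there for it to bind to.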
Seedance 2.0 supports 4 input modalities simultaneously: text prompts (unlimited length), up to 9 reference images (≤30MB each), up to 3 video clips (2-15s total duration, ≤50MB each), and up to 3 audio tracks (≤15s total, ≤15MB each). Total file limit: 12 files per request.
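Because 9 images, 3 videos, and 3 audio tracks add up to 15 files, the 12-file cap can bind before any per-type limit does. The following is a minimal pre-flight check that encodes the limits listed above. It is a hypothetical sketch, not an official validator: the `Upload` type and `check_request` function are illustrative, and treating "MB" as 1024 × 1024 bytes is an assumption the page does not confirm.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: "MB" means 1024 * 1024 bytes
MAX_FILES = 12    # total file limit per request, as listed above

# Per-type limits as listed above: (max file count, max size per file in MB)
LIMITS = {"image": (9, 30), "video": (3, 50), "audio": (3, 15)}


@dataclass
class Upload:
    kind: str                # "image", "video", or "audio"
    size_bytes: int
    duration_s: float = 0.0  # ignored for images


def check_request(uploads: list[Upload]) -> list[str]:
    """Return a list of limit violations; an empty list means the request passes."""
    errors: list[str] = []
    if len(uploads) > MAX_FILES:
        errors.append(f"{len(uploads)} files exceeds the {MAX_FILES}-file limit")
    for upload in uploads:
        if upload.kind not in LIMITS:
            errors.append(f"unknown file kind: {upload.kind}")
    for kind, (max_count, max_mb) in LIMITS.items():
        files = [u for u in uploads if u.kind == kind]
        if len(files) > max_count:
            errors.append(f"{len(files)} {kind} files exceeds the limit of {max_count}")
        for u in files:
            if u.size_bytes > max_mb * MB:
                errors.append(f"{kind} file of {u.size_bytes} bytes exceeds {max_mb} MB")
    videos = [u for u in uploads if u.kind == "video"]
    video_total = sum(u.duration_s for u in videos)
    if videos and not 2 <= video_total <= 15:
        errors.append(f"total video duration {video_total}s is outside 2-15s")
    audio_total = sum(u.duration_s for u in uploads if u.kind == "audio")
    if audio_total > 15:
        errors.append(f"total audio duration {audio_total}s exceeds 15s")
    return errors


# 9 images + 3 videos is exactly 12 files; adding one audio track breaks the cap.
request = [Upload("image", 1 * MB)] * 9 + [Upload("video", 10 * MB, 5.0)] * 3
print(check_request(request))
print(check_request(request + [Upload("audio", 1 * MB, 3.0)]))
```

Collecting every violation, instead of stopping at the first one, lets a client report all problems with a request in one pass.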
Seedance 2.0 outputs up to native 2K (2048x1080) resolution at 30fps, with 480p, 720p, and 1080p quality levels also available. Each generation runs from 4 to 15 seconds. Supported aspect ratios include landscape, portrait, and 21:9 ultra-wide.
Seedance 2.0 uses a dual-branch architecture that processes video and audio latents in parallel. Audio is generated simultaneously with visuals, ensuring millisecond-level synchronization. It supports multilingual lip-sync dialogue, action-matched sound effects, and mood-appropriate background music. You can also upload audio references as input.
V2V editing allows you to upload existing video clips as reference and generate new videos that inherit their motion patterns, camera paths, and pacing. You can change specific elements like outfits, actions, or scene details while preserving the original motion structure.
Compared with Seedance 1.5 Pro, Seedance 2.0 adds video and audio reference inputs, raises the number of image references from 1 to 9, introduces the @-reference system for multimodal control, adds V2V video editing, raises the maximum resolution from 1080p to 2K, extends the maximum duration from 12s to 15s, and generates approximately 30% faster.
Seedance 2.0 uses per-second dynamic pricing based on resolution: 480p (14-28 credits/second), 720p (28.5-57 credits/second), and 1080p (640-3,810 credits/second). There are two speed variants: Standard and Fast, with Fast being roughly 30% faster.
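Since pricing is per second, the cost of a clip is its duration multiplied by the per-second rate for its resolution. The sketch below turns the ranges listed above into a low/high credit estimate. It is illustrative only: the page does not say what moves a job within each range or whether the Fast variant is priced differently, so the function returns only the bounds, and `credit_range` is a hypothetical name.

```python
# Per-second credit ranges as listed above: (low, high).
# What determines the exact rate within a range is not stated, so only bounds are computed.
CREDIT_RATES = {"480p": (14, 28), "720p": (28.5, 57), "1080p": (640, 3810)}

MIN_SECONDS, MAX_SECONDS = 4, 15  # duration range per generation, as stated above


def credit_range(resolution: str, seconds: float) -> tuple[float, float]:
    """Return (lowest, highest) total credits for one clip of the given length."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(f"duration must be between {MIN_SECONDS} and {MAX_SECONDS} seconds")
    low, high = CREDIT_RATES[resolution]  # raises KeyError for unlisted resolutions
    return (low * seconds, high * seconds)


print(credit_range("720p", 10))  # a 10-second 720p clip
```

For a 10-second 720p clip, this gives a range of 285 to 570 credits.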
Seedance 2.0 is ideal for video directors needing precise motion control, content creators wanting native audio sync without post-production, advertisers producing branded video content, educators creating narrated tutorials, and anyone who needs professional-quality AI video with synchronized sound.
"The @-reference system is genuinely innovative. I can extract camera movements from a reference clip and apply them instantly — it's a completely new creative workflow."
Video Director
"The 4-modality input is a game-changer. I can bring a character design, a camera movement reference, and background music all into one prompt and get exactly what I envisioned."
Motion Designer
Experience Seedance 2.0, the most advanced video generator from ByteDance, free online.