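Alibaba's next-generation multimodal video model, with native support for joint audio and video generation. One unified model, four production scenarios: text, image, multi-image reference, and in-place video editing. Try it free on Nano Banana 2 Pro.
HappyHorse is Alibaba's next-generation AI video model, built on a natively multimodal architecture. A single unified model covers four production scenarios: text-to-video, image-to-video, multi-image reference-to-video, and in-place video editing. It supports native joint audio and video generation and 720p/1080p output, and is tuned for content production in advertising and marketing, e-commerce showcases, short dramas, and social media creative work.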

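With joint audio and video generation supported at the foundation, HappyHorse outputs synchronized picture and sound in a single generation pass, with no post-production needed.
Text-to-video, image-to-video, multi-image reference-to-video, and in-place video editing are all handled by the same unified model, with a consistent prompting style.
Bind up to 5 reference images to guide characters, scenes, and props. Freely combine multiple references to compose multi-element shots while keeping strong consistency.
Replace the subject, the clothing, or even the overall visual style while fully preserving the original camera motion, lighting, and composition. Suited to localization and creative remixing.
720p for fast iteration, 1080p for final delivery. Clear frames and clean compression meet publication-grade quality for short dramas and ads.
HappyHorse is deeply tuned for advertising, e-commerce, short dramas, and social media creative work, balancing visual quality with production efficiency.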
See HappyHorse in action across all four scenes: text, image, multi-image reference, and video editing.
Generate video from pure text prompts with native audio
“A Pixar-style short about a nervous little traffic cone who dreams of being a finish line pylon at a major race. Other cones mock its ambitions. A construction worker accidentally places it at a marathon finish line. The cone's painted face shifts from terror to joy as runners pass. Confetti falls on its cone head. Other cones watch on TV, inspired. Audio: Traffic sounds becoming crowd cheers, inspirational swelling music.”
Duration: 5s
“8mm vintage film style, grainy texture, slight light leaks. A group of friends laughing and running on a beach in the 1970s. Sun-drenched colors, nostalgic atmosphere, handheld camera shaking slightly. Authentic retro look.”
Duration: 5s
“First-person POV (GoPro style), a high-speed mountain bike descent through a narrow, rocky forest trail. The camera vibrates with the bumps, trees rushing past in a blur. Intense sunlight filtering through the canopy. Adrenaline-pumping action, immersive sound of tires on gravel.”
Duration: 5s
Animate still images into motion with synchronized sound
“Tracking shot as the girl walks gracefully through the meadow. Her dress and hair flutter in the wind, and clouds drift slowly. Cinematic audio of soft footsteps on grass, rustling summer wind, and melodic bird calls.”
Duration: 5s
“First-person POV. The camera glides smoothly and continuously forward deep into the sci-fi corridor. Glowing neon lights pass by rapidly on both sides. Tiny glowing dust particles float in the illuminated air. Steady tracking shot, immersive atmosphere.”
Duration: 5s
“Time-lapse effect. The thick morning mist rolls and flows fluidly through the pine trees like a slow-moving river. The bright volumetric light rays shift their angle dynamically as the sun rises. Cinematic slow zoom in.”
Duration: 5s
Combine up to 5 reference images into a coherent scene
“The girl from Image 1 is jogging lightly through a sunlit forest. The glowing forest spirit from Image 2 playfully flies closely behind her like a small comet, leaving a faint luminous trail in the air. Golden light filters through the dense trees. Cinematic audio of soft, quick footsteps on grass, a gentle magical whoosh, and distant bird calls.”
Duration: 5s
“Place the cotton doll from Image 1 into the vintage room from Image 2. The doll sits on the wooden workbench, gently swinging its legs, looking around curiously. Keep the lighting of Image 2 and the plush texture of Image 1 strictly consistent.”
Duration: 5s
“The idol from Image 1 stands on the water stage from Image 2, directly in front of the giant glowing moon. The idol steps forward slowly, creating gentle ripples in the water, and raises the microphone to sing. The soft blue light from the moon reflects perfectly on the idol's outfit.”
Duration: 5s
Replace subjects, styles, or elements while keeping camera motion
“Replace the teenage boy in the video with SpongeBob SquarePants. He should retain his classic iconic look: a yellow rectangular sea sponge with large blue eyes, wearing a white collared shirt, red tie, and brown square pants. SpongeBob should be riding the skateboard naturally and performing the kickflip. Render him in a high-quality 3D realistic style to match the lighting and shadows of the real-world park background. Keep the original camera tracking and motion exactly the same.”
“Replace the grey hoodie and pants with the floral silk skirt from the reference image. The skirt should flow and sway naturally with the woman's walking and spinning motion. Keep her face, hair, and the living room background exactly the same.”
“Transform the entire video into a vibrant Lego world. The person, the desk, and every object in the room should be constructed from high-quality plastic Lego bricks. Keep the original waving motion and spatial layout perfectly. The lighting should be bright and clean, like a professional Lego toy commercial.”
HappyHorse FAQ
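HappyHorse is Alibaba's next-generation multimodal video model. It natively supports joint audio and video generation and offers four production-ready scenarios in a single unified model: text-to-video, image-to-video, multi-image reference, and in-place video editing. It is tuned for advertising, e-commerce, short dramas, and social media creative work.
HappyHorse supports 720p and 1080p output. Common durations are 5, 8, and 10 seconds; the video editing scenario uses the duration of the source video.
The reference-to-video and video editing scenarios can use up to 5 reference images. Use labels such as Image 1 and Image 2 in the prompt to bind each element precisely.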
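As a rough illustration of the labeling rule above, the following sketch checks a prompt before submission. It is not part of HappyHorse or the platform: `check_prompt`, `referenced_labels`, and the exact "Image N" matching pattern are hypothetical names and assumptions made for this example. Only the limit of 5 reference images and the Image 1 / Image 2 label style come from the text.

```python
import re

MAX_REFERENCE_IMAGES = 5  # limit stated in the FAQ answer above


def referenced_labels(prompt: str) -> list[int]:
    """Return the sorted, de-duplicated image numbers written as 'Image N'."""
    return sorted({int(n) for n in re.findall(r"\bImage (\d+)\b", prompt)})


def check_prompt(prompt: str, image_count: int) -> list[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    if not 1 <= image_count <= MAX_REFERENCE_IMAGES:
        problems.append(
            f"expected 1 to {MAX_REFERENCE_IMAGES} reference images, got {image_count}"
        )
    labels = referenced_labels(prompt)
    # Every label in the prompt should point at an image that was supplied.
    for n in labels:
        if not 1 <= n <= image_count:
            problems.append(
                f"prompt mentions Image {n}, but only {image_count} image(s) were supplied"
            )
    # Every supplied image should be bound to something in the prompt.
    for n in range(1, image_count + 1):
        if n not in labels:
            problems.append(f"Image {n} was supplied but is never mentioned in the prompt")
    return problems


if __name__ == "__main__":
    prompt = "Place the cotton doll from Image 1 into the vintage room from Image 2."
    print(check_prompt(prompt, image_count=2))
    print(check_prompt(prompt, image_count=3))
```

With two images supplied, the sample prompt passes with no problems; with three, the check reports that Image 3 is never mentioned.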
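Upload a source video and describe what to change. HappyHorse replaces the subject, the clothing, or the overall style while fully preserving the original camera path, pacing, and composition. This suits localization, creative remixing, and quick validation of a visual direction.
A daily free generation quota is provided. Pricing is based on duration and resolution: 31 credits per second at 720p and 51 credits per second at 1080p.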
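To make the pricing concrete, here is a small sketch that estimates the credits for a clip. It assumes the cost is simply duration multiplied by the per-second rate, with no rounding, minimum charge, or free-quota offset, none of which the page specifies. `estimate_credits` is a hypothetical helper, not a platform API.

```python
# Per-second rates quoted in the pricing answer above.
CREDITS_PER_SECOND = {"720p": 31, "1080p": 51}


def estimate_credits(duration_seconds: int, resolution: str) -> int:
    """Hypothetical helper: credits = duration in seconds x per-second rate."""
    if resolution not in CREDITS_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return duration_seconds * CREDITS_PER_SECOND[resolution]


if __name__ == "__main__":
    # The common durations listed in the FAQ: 5, 8, and 10 seconds.
    for seconds in (5, 8, 10):
        for resolution in ("720p", "1080p"):
            credits = estimate_credits(seconds, resolution)
            print(f"{seconds}s at {resolution}: {credits} credits")
```

Under these assumptions, a 5-second clip would cost 155 credits at 720p and 255 credits at 1080p, and a 10-second clip would cost 310 and 510 credits.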
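You can try it without registering. After registering, you can save your history, unlock longer durations, and track your credit balance.
Yes. The Nano Banana 2 Pro platform is open to users worldwide, and users in Hong Kong can register and use all features normally. A Traditional Chinese interface and Traditional Chinese prompt input are supported.
Traditional Chinese prompts are fully supported. The model understands Chinese semantics well, so you can describe the image you want to generate directly in Traditional Chinese. Simplified Chinese and English prompts are supported as well.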
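Yes. All images generated through Nano Banana 2 Pro may be used commercially, including for social media, ad design, and e-commerce, with no additional license required.
"HappyHorse lets us produce product videos in four styles from the same brief. Multi-image reference is a huge efficiency boost."
E-commerce creative director
"An integrated text/image/reference/edit workflow keeps our team's process very compact. HappyHorse has become a permanent model in our pipeline."
Advertising agency director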
Try HappyHorse, Alibaba's multimodal video model, online for free