The Problem
To generate visuals, I generally use Google's models for their multimodality. Google AI Studio, however, does not expose some features that the API for these models offers. Here, I leverage the API's "extend video" option to produce clips long enough for ads. VEO 3.1 caps generation at 8 seconds per clip, so producing a real ad (15-60s+) requires chaining extensions sequentially, maintaining continuity between segments, and concatenating the output.
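The 8-second cap turns ad length into simple arithmetic over VEO calls. Below is a minimal sketch. It assumes the base clip and each extension contribute fixed lengths, which are passed in as parameters. `planSegments` and `SegmentPlan` are hypothetical names, and the real per-extension length should be checked against the current VEO 3.1 documentation rather than taken from here.

```typescript
// Hypothetical helper: how many VEO calls a target ad length needs.
// Assumption: the base clip lasts `baseSec` and every extension appends
// `extensionSec` new seconds. Both values are parameters, not documented constants.
interface SegmentPlan {
  segments: number;   // base clip + extensions
  extensions: number; // number of "extend video" calls
  totalSec: number;   // resulting length, >= the requested duration
}

function planSegments(targetSec: number, baseSec: number, extensionSec: number): SegmentPlan {
  if (targetSec <= 0 || baseSec <= 0 || extensionSec <= 0) {
    throw new RangeError("durations must be positive");
  }
  // Everything beyond the base clip must come from extensions, rounded up
  const extensions = Math.max(0, Math.ceil((targetSec - baseSec) / extensionSec));
  return {
    segments: 1 + extensions,
    extensions,
    totalSec: baseSec + extensions * extensionSec,
  };
}

// A 30 s ad, assuming an 8 s base clip and 8 s per extension
console.log(planSegments(30, 8, 8)); // { segments: 4, extensions: 3, totalSec: 32 }
```

Rounding up means the output can overshoot the requested duration. The extra seconds can be trimmed during editing.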
What I Built
I generally avoid Google AI Studio: in my experience, its UI can be buggy and generation takes a long time. So I built my own tool:
- A one-input interface: describe your ad in plain English, pick a duration, done.
- A Gemini-powered prompt optimizer that expands a vague idea into structured per-segment VEO prompts (cinematography, subject, action, audio cues, continuity links). It is driven by a custom prompt, written from my own input and refined with Gemini 3.1 Pro Preview. That prompt follows the JSON input format for VEO 3.1, which I found gives better results.
- Automatic extension chaining: the server generates segment 1, extends it once per remaining segment (N-1 times), downloads each new tail, and concatenates everything into one file with ffmpeg.
- One-click background removal (per-frame alpha extraction -> transparent WebM) so the subject can be composited onto any backdrop.
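The concatenation and transparent-WebM steps map onto two standard ffmpeg invocations: the concat demuxer, and a VP9 encode with an alpha pixel format. Below is a minimal sketch. It assumes the segment tails are already downloaded to disk and that the alpha-extracted frames are saved as a numbered PNG sequence. The helper names and file names are hypothetical, not the project's code.

```typescript
// Hypothetical helpers that only build ffmpeg inputs. Running them
// (for example with child_process.spawn("ffmpeg", args)) is left to the caller.

// Concat demuxer list: one `file '<path>'` line per segment, in playback order.
// A single quote inside a path is written as '\'' (close, escaped quote, reopen).
function buildConcatList(segmentPaths: string[]): string {
  return segmentPaths.map((p) => `file '${p.replace(/'/g, "'\\''")}'`).join("\n") + "\n";
}

// Stream-copy concat (no re-encode). This assumes every segment shares codec,
// resolution and frame rate. -safe 0 allows absolute paths in the list file.
function concatArgs(listPath: string, outPath: string): string[] {
  return ["-y", "-f", "concat", "-safe", "0", "-i", listPath, "-c", "copy", outPath];
}

// Encode RGBA PNG frames (e.g. frame_0001.png) into a VP9 WebM that keeps transparency.
function alphaWebmArgs(framePattern: string, fps: number, outPath: string): string[] {
  return [
    "-y",
    "-framerate", String(fps),
    "-i", framePattern,
    "-c:v", "libvpx-vp9",
    "-pix_fmt", "yuva420p", // the "a" plane carries the alpha channel
    outPath,
  ];
}

const exampleList = buildConcatList(["seg_0.mp4", "seg_1.mp4"]);
// exampleList === "file 'seg_0.mp4'\nfile 'seg_1.mp4'\n"
```

Writing the list to a file and passing `concatArgs(...)` to ffmpeg completes the merge. If the segments ever differ in codec parameters, `-c copy` is no longer safe and the segments need to be re-encoded instead.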

Tech Stack
Node/Express. Gemini 3.1 Pro (prompt optimization). VEO 3.1 API (video generation and extension). @imgly/background-removal-node (background removal). ffmpeg (concatenation and encoding).
How It Works
Brief -> Gemini optimizes into N segment prompts -> VEO generates base clip -> VEO extends N-1 times -> ffmpeg concat -> optional BG removal -> final MP4/WebM
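The flow above can be sketched as a small orchestrator. This is a minimal sketch, not the project's code. `SegmentPrompt`, `Step`, `PipelineDeps`, `parseSegmentPrompts` and `runPipeline` are hypothetical names. The JSON fields simply mirror the ones listed for the prompt optimizer. The Gemini, VEO and ffmpeg calls are injected as functions, because their exact SDK signatures are not shown here. It also assumes each VEO step yields two things: a handle to extend next, and a downloaded file holding only the new tail. Background removal is omitted.

```typescript
// Hypothetical shape of one Gemini-produced segment prompt.
interface SegmentPrompt {
  cinematography: string;
  subject: string;
  action: string;
  audio: string;
  continuity: string; // how this segment links to the previous one
}

// Hypothetical result of one VEO call.
interface Step {
  video: string; // handle passed to the next extension call
  tail: string;  // local file holding only the newly generated seconds
}

// Hypothetical service boundary; real implementations would call Gemini, VEO and ffmpeg.
interface PipelineDeps {
  optimize(brief: string, segments: number): Promise<string>; // raw JSON text
  generateBase(prompt: SegmentPrompt): Promise<Step>;
  extend(video: string, prompt: SegmentPrompt): Promise<Step>;
  concat(tails: string[]): Promise<string>; // returns the output path
}

// Reject LLM output that is not exactly `expected` well-formed segment prompts.
function parseSegmentPrompts(json: string, expected: number): SegmentPrompt[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data) || data.length !== expected) {
    throw new Error(`expected ${expected} segment prompts`);
  }
  const keys: (keyof SegmentPrompt)[] = ["cinematography", "subject", "action", "audio", "continuity"];
  return data.map((item, i) => {
    for (const k of keys) {
      if (typeof item?.[k] !== "string") throw new Error(`segment ${i}: missing "${k}"`);
    }
    return item as SegmentPrompt;
  });
}

async function runPipeline(brief: string, segments: number, deps: PipelineDeps): Promise<string> {
  if (segments < 1) throw new RangeError("need at least one segment");
  const prompts = parseSegmentPrompts(await deps.optimize(brief, segments), segments);
  let step = await deps.generateBase(prompts[0]);
  const tails = [step.tail];
  // Sequential on purpose: each extension continues the previous result
  for (const prompt of prompts.slice(1)) {
    step = await deps.extend(step.video, prompt);
    tails.push(step.tail);
  }
  return deps.concat(tails);
}
```

Injecting the service calls keeps the sequencing logic testable with stubs. Validating the optimizer's JSON up front also fails fast, before any paid video generation starts.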

Final Result
From there, I use CapCut to quickly add my app's background and overlay the AI-generated clip explaining what SignatureMaker can solve.

Timeline
2026
Responsibilities
- Prompt engineering for VEO continuity
- Extension chaining pipeline
- Background removal integration
- ffmpeg concatenation automation