🎯 Why Edit & Retry is the real engine

The first generation is a starting point, not the finish line. The magic of Sleepy Motion is that you can keep the parts that are already strong and refine only the parts that are off.

Think of it like direction, not random re-rolls: you steer the script, pacing, and voice choices until it matches exactly what you had in mind.

The best users do not generate once. They run small, intentional retries.

🧷 What gets preserved on retry

Retry is designed to preserve your chosen direction. In normal usage, these are kept unless you explicitly change them:

Business context and creative angle
Brand colors, fonts, and logo choices
Selected style and mode
Voice/vibe preferences (unless edited)
Your manual script edits when provided

This is why retries are high leverage: you improve fidelity without resetting everything.

📌 What to edit first (highest impact order)

When a result is close but not perfect, edit in this order:

Script clarity: tighten weak lines and remove generic phrases.
Hook strength: make the first line hit harder and faster.
Voice + vibe alignment: match delivery to audience and brand tone.
Duration fit: move to 25-30s only if the message feels cramped.
Visual identity: colors/fonts/logo consistency pass.

Most quality gains happen in script + hook, not from random style switching.

✍️ How to give surgical script edits

Use direct line edits instead of vague instructions.

Low signal

"Make it better"

High signal

"Keep line 1. Replace line 2 with a stronger benefit. End with a direct CTA to book now."

The more specific your edit intent, the fewer retries you need.

🎙️ Voice, vibe, and timing adjustments

If the message is right but the energy feels wrong, adjust voice and vibe before rewriting the full script.

Use a more assertive voice for hard-sell offers.
Use calm/professional tone for trust-heavy services.
Shorten sentence structure for 10-15s formats.
Move to 25-30s when proof points do not fit naturally.
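To sanity-check duration fit before you retry, a rough word count goes a long way. The sketch below is only an illustration: the function names are made up, and the pace of 2.5 words per second (about 150 words per minute) is an assumed speaking rate, not a number taken from the engine.

```python
def estimated_seconds(script: str, words_per_second: float = 2.5) -> float:
    """Rough narration length. The default pace is an assumption, not an engine value."""
    return len(script.split()) / words_per_second


def fits(script: str, target_seconds: float, words_per_second: float = 2.5) -> bool:
    """True when the script should fit the target duration at the assumed pace."""
    return estimated_seconds(script, words_per_second) <= target_seconds


# Hypothetical 14-word script for a short ad.
script = "Sore back from your desk chair? Our ergonomic seat ships free this week. Order now."
print(round(estimated_seconds(script), 1))  # estimated length in seconds
print(fits(script, 15))                     # does it fit a 15s format?
```

If the estimate lands well past your target, trim the script first; move to a longer format only when the proof points genuinely need the room.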

🎙️ Retrying a Custom VO video

Custom VO videos work differently on retry because your audio is already saved, so the engine does not ask you to re-upload it. What you are editing is how that audio is presented on screen.

After the first generation, each screen in the script editor shows the phrase that was assigned to that timing window. That is what you can fix: the text labels, not the audio itself. The underlying beats (when each screen cuts, how long each phrase holds) came from the word-timing analysis the engine ran on your audio the first time.

The two retry modes

Re-align timing OFF (default)

The beat structure from the first run is kept exactly as-is. You only edit what text appears on each screen. If the cuts and timing felt right but the words were wrong or slightly off, this is your option. Fast, predictable, no surprises.

Re-align timing ON

The engine runs a fresh word-to-timing analysis on your saved audio from scratch. The screen cuts and phrase groupings may change completely. Use this only when the first timing felt genuinely broken: screens cutting too early, words grouped across the wrong beats, long awkward pauses on the wrong screens.

Important: this works well for clean speech recordings. If you uploaded a song, a heavily produced track, or audio with significant background noise, re-aligning is unlikely to improve things, because the engine is not built to extract word beats from music.
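One way to picture the difference between the two modes is to treat each screen as a timing window with a text label. The sketch below is a mental model only: the `Screen` shape and the `edit_text_only` helper are hypothetical, not the engine's real data format. With re-align off, only the label changes and the window stays put.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Screen:
    start_ms: int  # when the screen cuts in (hypothetical field)
    end_ms: int    # when it cuts out (hypothetical field)
    text: str      # the phrase shown on screen


def edit_text_only(screens, new_texts):
    """Mimics re-align OFF: keep every timing window, swap only the labels."""
    if len(screens) != len(new_texts):
        raise ValueError("expected exactly one phrase per existing screen")
    return [replace(s, text=t) for s, t in zip(screens, new_texts)]


first_run = [
    Screen(0, 1200, "Meat us downtown"),          # mis-heard word from the first pass
    Screen(1200, 2600, "open late every night"),
]
fixed = edit_text_only(first_run, ["Meet us downtown", "open late every night"])
```

Re-align on would instead rebuild the whole list from the audio, so both the windows and the phrase groupings could come back different.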

Decide based on what went wrong:

→ Wrong words on the right screens: leave re-align off. Just correct the phrase text for each screen.

→ Right words, wrong screen order: leave re-align off. Drag or rewrite phrases to match the order you want.

→ Timing feels completely broken (screens cutting mid-word, wrong groupings throughout): try re-align on, but only if your audio is clear speech.

→ Audio is a song or music: re-align will not reliably fix it. Edit the phrases manually to match what you want on each screen instead.
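The same decision can be written down as a tiny rule. This is just a sketch of the guidance above; `Problem` and `should_realign` are illustrative names, not part of Sleepy Motion.

```python
from enum import Enum


class Problem(Enum):
    WRONG_WORDS = "wrong words on the right screens"
    WRONG_ORDER = "right words, wrong screen order"
    BROKEN_TIMING = "timing feels completely broken"


def should_realign(problem: Problem, audio_is_clear_speech: bool) -> bool:
    """Return True only for the one case where turning re-align ON is worth trying."""
    # Songs, produced tracks, noisy audio: re-align is unlikely to help,
    # so the phrases get edited manually instead.
    if not audio_is_clear_speech:
        return False
    # Wrong words or wrong order are text problems; the timing is kept as-is.
    return problem is Problem.BROKEN_TIMING


print(should_realign(Problem.WRONG_WORDS, audio_is_clear_speech=True))
print(should_realign(Problem.BROKEN_TIMING, audio_is_clear_speech=True))
print(should_realign(Problem.BROKEN_TIMING, audio_is_clear_speech=False))
```

Only one of the three calls comes back true: broken timing on clear speech. Everything else is a text edit with re-align left off.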

🎬 The After Effects-grade workflow (for music & messy audio)

Read this first

For clean TTS and normal voiceovers, the engine already nails timing on the first generation, so you almost never need this workflow. Where it shines is the hard cases: uploading a song to make a music video, or a messy / heavily-produced TTS where the first cut has obvious wording or beat issues. In the absolute worst case, three jobs total will get you a polished result.

This is the path I've found fastest when the audio is genuinely difficult, for example dropping in a song and wanting a publish-ready music video in minutes. Three passes, in this exact order:

Pass 1 - Generate normally

Submit the job the way you always would: pick your style, fonts, brand colors, vibe, and let the engine produce a first cut. Don't try to fix anything yet. You need a baseline.

Pass 2 - Retry with script edits only

Open the editor and fix only the wording on each screen. Do not open the detailed editor yet. Hit Retry. The engine re-aligns your corrected text to the existing audio, so every word on every screen is now exactly what was sung or said.

For songs especially, this pass is where you correct the lyrics that the first-pass transcription mis-heard.

Pass 3 - Detailed editor for timing

Now open the Detailed Editor. The wording is already correct, so all you do here is nudge a few screen and word boundaries on the timeline. Drag an edge a few hundred milliseconds, splice in an extra screen at a junction with the + button, or delete a stray screen. Hit Save, then Retry.

The detailed editor lets you preview your timing live with a karaoke-style highlight, so you can hear and see exactly when each word lands before committing.

Why three passes beat one big edit

Each pass isolates one variable. Pass 2 fixes what is said; Pass 3 fixes when it appears. If you try to fix both at once in the detailed editor, you end up retyping phrases and dragging boundaries in the same session, and even small wording changes can shift word counts, which throws off the timings you just adjusted. Doing them in sequence keeps one fix from undoing the other.

Heads up: once you save edits in the Detailed Editor, the Full Narration field locks (it's blurred and read-only) so your per-word timings can't be invalidated by a stray text change.
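To see why a stray text change is risky once timings are set, imagine each word carrying its own start time. The shape below is hypothetical (the engine's real storage format is not documented here), but it shows how a small rewording leaves the timings with nothing to pair up against.

```python
def timings_still_match(word_timings, edited_text):
    """word_timings: list of (word, start_ms) pairs, an assumed shape for illustration."""
    return len(word_timings) == len(edited_text.split())


# Three words were timed by hand in the detailed editor.
timings = [("never", 0), ("gonna", 400), ("stop", 800)]

same_count = timings_still_match(timings, "never gonna stop")
after_rewording = timings_still_match(timings, "never going to stop")
print(same_count, after_rewording)
```

Swapping "gonna" for "going to" turns three words into four, so the hand-tuned timings no longer line up. That is the kind of drift the lock is there to prevent.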

Don't pre-strip the instruments. It's tempting to run a song through a vocal-isolation tool first to give the engine “cleaner” audio. Don't. Most of those tools do a poor job and leave smeared, phasey vocals behind. More importantly, Sleepy Motion's engine reacts to beats and instrumentation for visual hits and motion timing, so handing it a vocals-only stem throws away the fine-tuned effects that make a music video feel alive. Upload the original mix.

🔁 A 3-pass workflow to reach final quality

Pass 1 - Direction: confirm concept, audience fit, and hook angle.
Pass 2 - Language: refine script lines, remove fluff, sharpen CTA.
Pass 3 - Polish: lock voice/vibe and verify brand consistency.

Most teams can get from rough to publish-ready in 2-4 retries with this structure.

🚫 Common mistakes that waste retries

Changing too many variables at once, then not knowing what improved the result.
Using vague feedback instead of explicit line-level edits.
Switching styles before fixing script quality.
Ignoring brand colors/fonts and trying to solve everything through prompts.
Expecting first-pass perfection instead of iterating intentionally.
