What is it?
A prompt refers to the specific input or instruction given to an AI model to generate a desired response. A prompt can be a question, a statement, a few keywords, or a detailed instruction. The quality, clarity, and specificity of the prompt directly influence the quality of the AI's response.
The better and clearer the prompt, the more relevant and accurate the AI’s output will be. In essence, a prompt serves as the guiding instruction that steers the AI towards generating the desired output.
In this course we'll use text-based AI models for some tasks (like scripts, research, and more), and AI image and video generators like Runway, Veo, and Midjourney for images and video.
Check out the official prompting instruction pages for specific tools by clicking the icon buttons below, and scroll down for a condensed overview of each tool.
Below, you'll find tailored prompting guides organized by AI tool categories:
Each section includes specific tools and best practices for crafting effective prompts tailored to that tool's unique capabilities. These guides are designed to help you get the most out of each AI platform—whether you're generating a screenplay, creating visuals, producing video content, or composing audio.
Important Note: AI prompts will always generate a result—but not always the same one. Even identical prompts can yield different outcomes. That’s part of the creative process.
We encourage you to use these guides as starting points. Think of them as templates rather than strict formulas. Not every project requires every element of a detailed prompt. The best way to learn is through experimentation—try different approaches, refine your inputs, and see what works best for your unique goals.
Explore. Experiment. Create.
Click the drop-down arrows to reveal more.
ChatGPT excels at generating creative text, dialogue, storylines, and detailed narrative scripts for video, audio, or written formats. It works especially well when you provide clear goals and context.
When prompting ChatGPT:
You can also collaborate interactively — ChatGPT allows back-and-forth refinement, helping you revise and expand ideas quickly.
Include:
Avoid:
ChatGPT is highly effective at:
“Write a 2-minute video script in the style of a motivational short film. A single mom working two jobs finally gets her nursing license. Use emotional dialogue, a cinematic arc, and a powerful voiceover.”
“Outline a 4-part YouTube series called ‘Modern Cowboys’, covering how traditional ranchers are using AI, drones, and solar energy in 2025. Each part should include a topic title, scene structure, and key interview ideas.”
“Suggest 5 video ideas for a Gen Z-focused channel that mixes AI tools with creative hobbies (e.g., music, writing, art). The ideas should be fun, engaging, and easy to film.”
“Create a prompt for Google Veo to generate a cinematic scene of a dusty Southern town at golden hour. Include a teenage girl in cowboy boots walking along train tracks, with voiceover narration about chasing freedom. Style: modern western, soft lens flares, melancholy tone.”
“Write a 1-minute voiceover script for a dramatic AI-generated short about loneliness in the digital age. The visuals should include a man in a dark apartment, neon lights from the city outside, and slow zooms. Add a poetic narration with a hopeful ending.”
Overloading your prompt with too many requests in one sentence
Instead, break it into parts or use bullet points:
“Give me:
A logline,
Scene outline,
Sample dialogue.”
Assuming ChatGPT will guess your format
Be explicit:
“Write it as a 3-act script with headings: ACT I, ACT II, ACT III.”
https://chat.openai.com
Or use the ChatGPT iOS/Android app for on-the-go idea generation
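If you'd rather script these prompts than type them into the chat interface, the sketch below shows one way to do it with OpenAI's Python SDK. It's a minimal sketch, assuming the `openai` package is installed and an `OPENAI_API_KEY` environment variable is set; the model name is a placeholder, so substitute whichever model you have access to.

```python
# A minimal sketch of sending a scriptwriting prompt to OpenAI's API.
# Assumes the `openai` Python package and an OPENAI_API_KEY env variable;
# the model name is a placeholder, not a recommendation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Write a 2-minute video script in the style of a motivational short film. "
    "A single mom working two jobs finally gets her nursing license. "
    "Use emotional dialogue, a cinematic arc, and a powerful voiceover."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are an experienced screenwriter."},
        {"role": "user", "content": prompt},
    ],
)

print(response.choices[0].message.content)
```

The same back-and-forth refinement described above works here too: append each reply and your follow-up to the `messages` list and call the API again.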
Gemini specializes in generating multimodal content, combining visuals with textual explanations.
Many creators benefit from using both in tandem — Gemini to plan visual layout and structure, ChatGPT to refine the tone and storytelling.
Gemini is Google’s multimodal AI, designed to seamlessly generate content that blends visuals, text, voice cues, layout ideas, and media structure. It’s especially powerful when working on:
“Write a 2-minute educational video script explaining the causes of wildfires. Include voiceover narration, on-screen bullet points for key facts, visual cues for drone footage of forests, animated fire maps, and calming background music.”
“Outline a 4-part product demo video for a smart home device. Each part should include:
Intro problem (30 sec),
Feature demo (1 min),
Testimonials (45 sec),
Call to action (15 sec).
Include visual transitions, overlay text, and suggested B-roll footage.”
“Generate 5 visual-first video ideas for a women’s beginner fitness channel. Each idea should include a title, visual hook, camera angle suggestion (e.g., top-down, side profile), and the type of background music or tone to use.”
“Create a 60-second Instagram Reel script about plastic waste in oceans. Use punchy stats in large on-screen text, B-roll of marine life and ocean pollution, and end with a bold CTA. Keep the tone urgent but hopeful.”
“Design the script and visual plan for a short video based on this Google Sheet data (insert link). Include a title card, transitions, animated bar chart overlays, and a summary screen with key takeaways. Intended for LinkedIn.”
“Write a dramatic 1-minute script for a short film where a country girl confronts her brother about leaving the family farm. Set the scene at dusk with warm lighting. Include facial reaction cues and soft ambient music.”
When using Gemini inside Docs, Slides, or YouTube Shorts Editor, you can:
Gemini thrives when you treat it like a creative director and editor in one — give it structure, visuals, and tone, and it can build out highly integrated video content.
Claude (by Anthropic) is known for its ability to create thoughtful, emotionally intelligent, and ethically grounded content. It thrives in projects that require:
Claude is particularly effective for:
Claude stands out for its emotionally rich, ethically grounded storytelling — perfect for drama, documentaries, or social issue content where nuance and reflection matter. In contrast, ChatGPT is more dynamic and collaborative, excelling at fast-paced scriptwriting, genre work, and dialogue-heavy content. Gemini is the most visual of the three, ideal for creators working on educational, product-based, or multimedia content that requires clear visual structure, transitions, and integration with Google tools.
Claude (Anthropic):
“Write a 5-minute short film script about a refugee father trying to reconnect with his daughter over video calls while working abroad. Show emotional distance, cultural tension, and a hopeful resolution.”
“Outline a 3-part video documentary on the emotional toll of student debt in the U.S.
Structure:
Introduction (facts & stats)
Personal stories (3 individuals)
Expert insight & conclusion.
Include tone, visual approach, and narration style.”
“Suggest four compelling video ideas exploring ethical challenges in biotechnology, aimed at Gen Z science enthusiasts. Include potential titles, central questions, and a real-world case to explore in each.”
“Create a script for a YouTube explainer that introduces the debate around AI-generated art and its impact on human creativity. Include perspectives from both sides, a neutral narrator, and a reflection at the end.”
Use Claude when your creative work involves:
Claude is less flashy, more literary. Think of it like your sensitive screenwriter friend who asks the deeper questions.
GitHub Copilot, powered by OpenAI, is specifically designed for structured, logic-driven content, especially in coding, technical tutorials, developer documentation, and how-to walkthroughs.
This may not be the most popular tool for those of you developing AI-driven video content, but it may be, depending on your topic.
In 2025, Copilot continues to excel in:
“Write a detailed 10-minute YouTube script for a beginner tutorial on building a password generator in Python. Include step-by-step code walkthroughs, beginner-friendly explanations, and occasional voiceover suggestions.”
“Outline a beginner video tutorial for photography basics. Divide it into 5 parts: camera types, framing techniques, lighting fundamentals, common beginner mistakes, and pro tips. Add short intros and transitions between segments.”
“Give me 3 engaging YouTube video ideas for teaching Python basics to high school students. Include a working title, project overview, and key learning outcomes.”
“Create an outline for a 3-part mini-course video series teaching HTML and CSS. Each video should build on the last, include coding examples, and end with a small challenge project.”
“Write a script that explains how a ‘for loop’ works in Python to a middle schooler. Include visual metaphors, real-life comparisons, and simple language for voiceover.”
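For context, here's roughly the kind of beginner-friendly code the password-generator tutorial prompt above is asking for. This is a minimal sketch of the target material, not Copilot's actual output:

```python
# A beginner-level password generator of the kind the tutorial prompt targets.
import secrets
import string

def generate_password(length: int = 12) -> str:
    """Return a random password drawn from letters, digits, and punctuation."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    # secrets.choice is cryptographically secure, unlike random.choice
    return "".join(secrets.choice(alphabet) for _ in range(length))

if __name__ == "__main__":
    print(generate_password(16))
```

A tutorial script built around this would walk through each line, which is exactly the step-by-step structure Copilot handles well.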
Copilot is purpose-built for generating structured, step-by-step content, making it ideal for technical tutorials, code explanations, and instructional videos. It shines when you need clear logic, consistent formatting, and beginner-friendly walkthroughs. In contrast, ChatGPT is the most conversational and flexible, excelling in storytelling, dialogue, and iterative creative development — perfect for scripting narratives, YouTube videos, and character-driven content. Claude specializes in deep, ethically aware storytelling, and is best suited for projects involving emotional arcs, social issues, or philosophical themes. Finally, Gemini stands out for its tight integration of visuals and text, making it the go-to choice for creating visually structured content like explainer videos, product demos, or content destined for Google Slides, Docs, or YouTube Shorts.
Use Copilot when your video content is:
Avoid Copilot for emotional storytelling, drama, or conceptual ideation. It’s a technical co-pilot, not a screenwriter.
Perplexity is an AI platform known for its fact-first, research-focused capabilities. In 2025, it's a go-to tool for creators who need:
Think of Perplexity as your AI research assistant that can also draft compelling analytical narratives. It’s ideal for:
“Generate a 5-minute video script explaining the key milestones in the development of quantum computing. Include three historical events, how they impacted the field, and references to key researchers.”
“Outline a 6-part educational YouTube video on the global rise of electric vehicles.
Include:
History of EVs
Modern tech breakthroughs
Market growth stats
Environmental impact
Government policies
Future projections.
Reference real-world data and reports.”
“Suggest three video ideas for a data journalism YouTube channel focused on global economic shifts. Each idea should include a working title, key stats to include, and suggested sources or regions to focus on.”
“Write a short explainer video about CRISPR gene editing. Make it clear, factual, and suitable for college students. Include on-screen text suggestions for scientific terms, and mention 2 recent studies.”
Perplexity is the best tool when accuracy, research, and real-time data are essential. It’s ideal for fact-based content like explainers, trend analysis, and educational videos. In contrast, ChatGPT is more narrative-driven, great for scripting stories or engaging dialogue. Gemini shines when visuals are core to your project — from slide-based videos to product explainers — while Claude offers emotional and ethical depth for reflective or issue-based storytelling. Copilot rounds out the group by excelling in instructional, code-focused content where step-by-step clarity is key.
Squibler is built for professional scriptwriters and long-form storytellers, offering specialized tools for outlining, formatting, and developing screenplays, episodes, and structured narrative arcs.
In 2025, Squibler is perfect for:
It’s like a digital screenwriting room with AI support baked in — focused more on structure and formatting than raw idea generation.
“Write a 15-minute script for a dystopian web series pilot where a rogue AI controls a city’s water supply. Introduce the protagonist, build tension with the system, and end with a cliffhanger where they discover a hidden truth.”
“Outline a 3-act structure for a 7-minute short film where a heist goes wrong.
Act 1: The setup + character motivations
Act 2: The job in progress, unexpected obstacle
Act 3: Betrayal, escape, ambiguous ending.”
“Generate three episodic video story ideas set in a cyberpunk city. Each should follow a young tech detective unraveling a deeper conspiracy. Include episode titles, main conflict, and teaser line for each.”
“Create a high-level 5-episode arc for a drama series about a rural nurse fighting a corrupt health system. Include evolving character dynamics, main moral conflict, and how the protagonist changes over time.”
Squibler stands apart by focusing entirely on structured narrative development, making it ideal for screenwriters, filmmakers, and episodic content creators. It’s less about ideation and more about refining story arcs, organizing scenes, and formatting dialogue professionally. In contrast, ChatGPT is better for fast-paced creative brainstorming and tone control, while Claude focuses on emotional and ethical depth. Gemini is your best choice for creating videos with a strong visual-text connection, and Copilot leads for tutorials and code-based content. Perplexity, meanwhile, is unmatched in data-backed, research-driven scripting.
ChatSonic by Writesonic is built for real-time, trend-aware content creation. It specializes in generating casual, conversational material around current events, social media trends, tech updates, and pop culture. Unlike platforms focused on storytelling or deep analysis, ChatSonic is optimized for speed, freshness, and audience relatability, making it ideal for influencers, marketers, creators, and news-style content producers.
In 2025, it remains one of the best options for producing:
ChatSonic now integrates enhanced live web search, giving it stronger access to up-to-the-minute information across tech, entertainment, finance, and social trends. Its tone customization has also improved, letting you toggle more easily between influencer-style delivery, news brief summaries, or casual podcast-style narration. In addition, its SEO suggestion engine can now be layered directly into content generation — helpful for creators trying to stay discoverable in fast-moving niches.
To get the best results, clearly define the tone (e.g. casual, expert, energetic), the trending context, and your audience's familiarity level. ChatSonic is especially strong when you're aiming for natural dialogue, commentary, or topical engagement.
Include:
Avoid:
Conversational Script Prompt Example:
“Write a 5-minute YouTube script in a casual, vlog-style tone discussing the top 3 AI tools creators are using in 2025. Include real examples, humor, and mention of popular creator reactions.”
Video Structure Prompt Example:
“Outline a 7-minute video for a digital marketing channel. Topic: ‘Top Instagram Growth Tactics That Still Work in 2025.’ Include intro hook, audience pain points, strategy breakdowns, and call-to-action.”
Video Ideas Prompt Example:
“Give me three video ideas for a Gen Z influencer covering trending TikTok challenges and viral topics from July 2025. Include catchy titles and what makes each trend unique.”
Explainer Prompt Example:
“Create a script in the style of a fast-paced social media explainer breaking down the latest Threads vs. X platform battle. Mention user reactions, growth stats, and creator shifts.”
ChatSonic is distinct from platforms like ChatGPT, Claude, or Perplexity in that it's laser-focused on real-time, trending, and conversational content. While ChatGPT is highly creative and versatile across narrative forms, and Claude excels at emotional and ethical depth, ChatSonic works best for topical scripts, live commentary, or social-first ideas. It's also more agile than Gemini when it comes to spontaneous content not tied to visuals. Unlike Copilot, which is tailored for technical clarity, or Perplexity, which prioritizes research accuracy, ChatSonic aims to help creators move quickly and stay relevant. Its sweet spot is immediacy — making content that feels like it belongs right now in a feed.
Try ChatSonic:
https://writesonic.com/chat
TextCortex is purpose-built for concise, attention-grabbing writing. In 2025, it’s a standout tool for generating short-form video scripts, social media captions, ad copy, and compact marketing messages. While other AI platforms may focus on story depth or visual integration, TextCortex thrives where brevity meets impact.
It’s ideal for:
Think of TextCortex as the tool you reach for when you need to say a lot in a little time—with rhythm, clarity, and energy.
TextCortex works best when you’re specific with format and outcome. It’s strong at trimming fat from ideas while keeping energy and clarity intact.
Include:
Avoid:
Social Script Prompt Example:
“Write a 30-second Instagram Reel script promoting a refillable water bottle. Use upbeat, eco-conscious language and end with a CTA to ‘shop the link in bio.’”
Video Structure Prompt Example:
“Outline a 3-part short-form video to promote a new digital course for freelancers.
Part 1: Grab attention with a pain point
Part 2: Quick course benefits
Part 3: Strong call to action with urgency.”
Video Ideas Prompt Example:
“Suggest three punchy Instagram video ideas for a skincare brand launching a plant-based moisturizer. Each idea should include a hook, visual concept, and caption-style CTA.”
Short Explainer Prompt Example:
“Create a script for a 20-second TikTok explaining why compostable packaging is better for the planet. Make it fast, fun, and easy to follow.”
TextCortex is uniquely positioned for high-impact, short-form content, particularly when speed, clarity, and platform awareness matter. It doesn’t attempt to write novels or screenplays—instead, it helps brands and creators communicate quickly and persuasively in spaces like social media, email, and product videos. Compared to ChatGPT, which can generate rich, evolving narratives, TextCortex trims the storytelling down to essentials. Gemini’s strength lies in visual-text synthesis, while Claude favors deep emotional and ethical structure. Perplexity focuses on data-driven analysis, and Copilot dominates in instructional or technical guidance. TextCortex, by contrast, is the go-to for fast-paced, sales-ready language—ideal for modern content marketing.
Try TextCortex:
https://textcortex.com
Click the drop-down arrows to reveal more.
Midjourney has become the go-to platform for stylized, cinematic, and emotionally rich visuals. With the release of Midjourney V7, the model has evolved into a powerhouse for generating photorealistic, painterly, surreal, and concept-driven artwork. Whether you’re building a fantasy world, a product concept, or a moody cinematic shot, Midjourney brings ambiance, texture, and imagination to life with stunning visual cohesion.
It remains ideal for:
With V7, Midjourney has become:
New V7 features also include:
To get powerful results with Midjourney, combine rich scene description with clear visual language. Think of it like writing for a production designer or concept artist.
Prompting Formula:
[Subject] + [Scene/setting] + [Mood] + [Style/art medium] + [Lighting] + [Color palette] + [Camera or composition notes] (optional)
Prompt Example 1:
A lone cowboy walking through a foggy desert valley at dawn, cinematic mood, soft ambient light, warm earth tones, inspired by Roger Deakins, wide shot, 2.35:1 aspect ratio, hyper-detailed grainy film still
Prompt Example 2:
A whimsical forest tea party with foxes and raccoons wearing tiny hats, surreal and painterly style, inspired by Studio Ghibli and Beatrix Potter, golden hour light, soft pastels and mossy greens, magical ambiance
Prompt Example 3:
A futuristic cyberpunk alley at night, glowing neon signage, rain-slick streets, deep contrast, ultra-wide shot, 80s retrofuturism style, vivid purples and electric blues, detailed reflections, gritty urban textures
Prompt Example 4:
A fashion editorial shoot in a post-apocalyptic greenhouse, model in layered fabrics, overgrown vines, soft backlighting, moss-covered stone floor, high-fashion meets ruinpunk aesthetic, moody and textured, Vogue-style composition
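One way to keep your prompts consistent with the formula above is to assemble them programmatically. The helper below is hypothetical, just a convenience for stitching the formula's fields together in order; the field names are our own, not part of Midjourney:

```python
# Hypothetical helper that assembles a Midjourney-style prompt from the
# formula fields above: subject, scene, mood, style, lighting, palette,
# and optional camera/composition notes.
def build_prompt(subject, scene, mood, style, lighting, palette, camera=None):
    parts = [subject, scene, mood, style, lighting, palette]
    if camera:
        parts.append(camera)
    # Midjourney reads comma-separated descriptors left to right
    return ", ".join(parts)

print(build_prompt(
    subject="A lone cowboy walking",
    scene="through a foggy desert valley at dawn",
    mood="cinematic mood",
    style="hyper-detailed grainy film still",
    lighting="soft ambient light",
    palette="warm earth tones",
    camera="wide shot, 2.35:1 aspect ratio",
))
```

The same pattern works for the other image tools in this section; only the vocabulary of the fields changes.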
Midjourney remains the most visually poetic AI compared to more function-focused platforms. While Gemini blends visuals with structured logic and ChatGPT focuses on text-based prompting, Midjourney interprets mood and style more intuitively—often feeling like you're art-directing a dream. It doesn't require code or datasets, and it doesn't default to realism the way tools such as DALL·E do for prototyping and design. Instead, it leans into atmosphere, surrealism, and stylistic richness, making it a top pick for concept art, storytelling visuals, or expressive branding.
DALL·E is OpenAI’s image generation model designed to turn natural language prompts into detailed, coherent, and often photorealistic or illustrative visuals. With its 2025 improvements, DALL·E now delivers stronger image consistency, more accurate compositions, and style-guided generation using image references or inpainting.
It excels in:
It strikes a balance between creative surrealism and design-ready realism, making it especially powerful for marketing, storytelling, teaching, and prototyping.
DALL·E responds best to prompts that are:
Prompting Formula:
[Primary subject] + [Scene or setting] + [Style or medium] + [Lighting or mood] + [Color palette or composition details]
Prompt Example 1:
A cozy Scandinavian-style kitchen interior with natural wood cabinets, potted herbs on the windowsill, and sunlight pouring through large glass windows. Clean and minimalistic, warm earth tones, highly realistic style.
Prompt Example 2:
A futuristic cityscape at dusk, neon lights reflecting off wet streets, flying cars above, and citizens in reflective clothing walking through the plaza. Cyberpunk style, dramatic lighting, wide cinematic frame.
Prompt Example 3:
A hand-drawn children’s book illustration of a fox and a squirrel having a picnic in a flower-filled meadow, pastel color palette, friendly expressions, illustrated in watercolor style.
Prompt Example 4:
A sleek, modern electric car parked on a mountain overlook road, mist surrounding the valley below. Rendered in high-resolution, realistic lighting with a soft-focus background and glossy car surface.
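Because DALL·E is part of the OpenAI platform, you can also send these prompts through the images endpoint rather than the ChatGPT interface. Here's a minimal sketch, assuming the `openai` package and an `OPENAI_API_KEY` environment variable; the model name and size are placeholders drawn from OpenAI's documented options:

```python
# A minimal sketch of calling DALL-E via OpenAI's images endpoint.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="dall-e-3",  # placeholder model name
    prompt=(
        "A cozy Scandinavian-style kitchen interior with natural wood cabinets, "
        "potted herbs on the windowsill, and sunlight pouring through large "
        "glass windows. Clean and minimalistic, warm earth tones, highly realistic."
    ),
    size="1024x1024",
    n=1,
)

print(result.data[0].url)  # link to the generated image
```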
DALL·E differs from other image platforms like Midjourney or Stable Diffusion in its focus on prompt accuracy, realism, and structured composition. While Midjourney excels at artistic surrealism and painterly ambiance, DALL·E is more flexible across styles and often better at following logical or branded layouts — making it excellent for marketing visuals, book illustrations, and product design. It’s also deeply integrated into the ChatGPT environment, allowing users to generate, refine, and edit images directly within conversations. Compared to Gemini, which fuses text with visual elements more educationally, DALL·E remains a pure visual-generation engine tuned for practical creativity and design precision.
Gemini (formerly Bard), now part of Google’s broader multimodal ecosystem, integrates tightly with Google's Imagen model for image generation. In 2025, it excels at producing narrative-aligned, context-aware, and highly coherent visuals—especially when paired with a story, mood, or functional intent.
Gemini stands out for its ability to:
This makes it a strong choice for:
As of 2025, Gemini uses Google’s Imagen 2 for visual generation within the Gemini interface. Imagen is known for generating realistic and richly detailed images, particularly when paired with structured or narrative input.
Imagen is optimized for:
Gemini responds best when your prompt includes:
Prompting Formula:
[Narrative or theme] + [Visual subject or environment] + [Tone or purpose] + [Style or medium]
Prompt Example 1:
Create an image of a bustling 1920s jazz club. Show a band onstage with brass instruments, couples dancing in formal wear, and patrons at candlelit tables. Use sepia tones and art deco style to reflect the era’s mood.
Prompt Example 2:
Generate a visual of an underwater research base. Include glass observation domes, colorful coral reefs, bioluminescent sea creatures, and ambient blue-green lighting. The scene should evoke scientific curiosity and exploration.
Prompt Example 3:
Illustrate a three-step process for composting at home, in a clean flat-vector style suitable for a classroom poster. Include labeled bins, food scraps, and worms, with each step numbered.
Prompt Example 4:
Design a futuristic classroom setting for a tech-focused learning environment. Include digital blackboards, students using AR glasses, and a clean, minimalist architectural style. Bright natural lighting and diverse student characters.
Gemini (with Imagen) differs from models like Midjourney or DALL·E by prioritizing narrative alignment, educational clarity, and contextual realism. While Midjourney leans into moody and artistic abstraction, and DALL·E offers flexibility between surreal and practical design, Gemini excels when your image needs to support a structured idea, learning objective, or clean brand tone. Its integration with Google’s tools also makes it particularly strong for professional or academic content. Compared to Meta AI, which focuses more on emotional and social visual storytelling, Gemini’s strength lies in intentional, clear, and logically grounded visual outputs.
Adobe Firefly, part of Adobe’s advanced generative suite, is designed for creators and video producers looking to easily integrate AI-generated visuals into their projects. In 2025, it excels at producing high-quality, context-driven images that can be seamlessly used in video creation, offering a new level of creative flexibility for video editors, animators, and content creators.
Firefly shines for its ability to:
This makes it an excellent tool for:
As of 2025, Adobe Firefly is fully integrated with Adobe’s suite of creative tools, offering an AI-powered design system that prioritizes artistic control and customization. With its ability to generate professional-grade visuals, Firefly is uniquely positioned to create assets for videos, including backgrounds, animated assets, product visuals, and scene-building elements.
Firefly is optimized for:
Video Asset Creation: Firefly now includes video-oriented tools that allow users to generate images specifically for video purposes. These include background environments, motion design assets, and character elements that fit video dimensions and formats.
Dynamic Scene Adaptation: Firefly can generate visuals based on the movement or progression of a video scene. This is ideal for creating backgrounds or environments that evolve with the flow of a video.
Real-Time Customization: A new feature in Firefly allows for the instant adjustment of visual elements, enabling video editors to tweak generated images without leaving the Adobe platform. This streamlines workflows, especially for video editors working on tight deadlines.
Asset-to-Scene Matching: For users building out a full scene for video production, Firefly can now match visual elements across various assets. This ensures that all generated imagery—whether it’s a character, prop, or background—matches the same style and tone throughout the video.
When using Firefly for video projects, it’s essential to provide clear prompts that align with the overall tone, theme, and progression of your video. The more specific your prompt, the more tailored your assets will be, helping you seamlessly integrate them into your final product.
Firefly responds best when your prompt includes:
Prompting Formula:
[Use case] + [Visual subject or asset type] + [Style or tone] + [Color, lighting, or thematic elements]
Include:
Avoid:
Prompt Example 1:
Generate a dystopian cityscape for an animated intro video. The scene should feature towering, sleek skyscrapers with glowing neon signs and empty streets. Use deep blues and purples with cold lighting to create a futuristic yet lonely atmosphere. The image will fade into a bright, bustling city center as the scene progresses.
Prompt Example 2:
Create a serene, forest-based background for an educational video on wildlife. Include tall, green trees, lush undergrowth, and soft sunlight filtering through the branches. The scene should feel calming and natural, with warm golden tones and soft green hues, evoking a sense of peace and harmony with nature.
Prompt Example 3:
Generate an abstract representation of the internet for a digital marketing video. The image should feature glowing data streams, interconnected nodes, and a glowing, circuit-like background. The style should be clean and modern, with cool blue tones and sleek, high-tech lines to convey digital connectivity.
Prompt Example 4:
Create a dynamic background for a fitness tutorial video. The scene should include a spacious gym with modern workout equipment, mirrored walls, and motivational banners. Bright, energetic lighting and vivid colors (blues, oranges, yellows) should create an atmosphere of motivation and energy.
Adobe Firefly distinguishes itself from other generative AI tools like DALL·E and Midjourney by focusing on user control, seamless integration with Adobe’s creative suite, and video-specific features. While DALL·E offers flexibility for a wide variety of visual styles, and Midjourney excels at abstract and artistic concepts, Firefly’s strength lies in creating assets that can be directly applied to professional video projects. It is especially powerful for designers and video editors looking for custom assets that align perfectly with their branding, storytelling, and video production needs.
Additionally, Firefly’s integration with Adobe’s other tools means that video creators can easily refine, edit, and animate AI-generated visuals within familiar programs like Premiere Pro or After Effects, offering an intuitive, streamlined workflow.
Runway ML, now with its advanced Gen 4 model released in 2025, is designed to allow creators to seamlessly transition from image generation to video production—all within one platform. It excels at generating stunning, high-quality visuals that can then be transformed into dynamic videos, making it an essential tool for video creators, digital artists, and designers.
Runway ML is particularly strong in:
This makes it a top choice for:
Runway ML Gen 4 takes image generation to the next level by not only creating high-quality visuals but also enabling creators to smoothly turn those images into video clips, animations, or even complex scenes. With its powerful AI model, Runway ML offers advanced capabilities for video creators who need precise images for use in their video projects, all while keeping the flow consistent across both formats.
Runway ML is optimized for:
Image-First to Video Workflow: Gen 4 introduces an enhanced pipeline where creators can generate images first and easily transform them into dynamic video content without leaving the platform. This is ideal for filmmakers, animators, and content creators looking to maintain consistent visual quality across both image and video formats.
Reference Image Upload: You can now upload up to 3 reference images to guide the AI in generating visuals that are closely aligned with real-world images or pre-existing concepts. For example, if you're creating a character, you can upload photos of a person or a specific scene to ensure the AI’s output closely matches the desired look.
Enhanced Styling Options: With Gen 4, you can now provide even more detailed stylistic guidance for your images, helping to generate visuals that fit perfectly within the overall aesthetic or tone of your video project. Whether you're working on a cinematic video or an abstract animation, Runway ML now offers enhanced flexibility for controlling the look of your visuals.
When generating images for video production in Runway ML, it's crucial to be specific with your prompts to ensure that the AI produces visuals that will fit well within the larger context of your video. If you plan to use reference images, make sure to include them in your prompt to guide the AI toward the desired outcome.
Runway ML responds best when your prompt includes:
Prompting Formula:
[Visual subject or asset] + [Detailed environment or setting] + [Style, lighting, and tone] + [Reference images] + [Context for video transition]
Include:
Avoid:
Prompt Example 1:
Generate a futuristic character for a sci-fi video. The character should have metallic armor with glowing blue accents and a cyberpunk-style helmet. Upload reference images of a person in a sleek suit and futuristic cityscapes to guide the design. The character will be used in multiple video scenes, so make sure the design is clean and visually striking, with a strong emphasis on metallic textures and vibrant lighting.
Prompt Example 2:
Create a serene mountain landscape at sunrise for a travel video. The scene should feature snow-capped peaks, pine trees, and a soft pink glow in the sky. Use reference images of a real mountain range to guide the natural feel of the landscape.
Prompt Example 3:
Generate an abstract neon-lit cityscape for a high-energy music video. The scene should include glowing signs, dark alleys, and illuminated streets. Use reference images of urban street scenes with neon lights to guide the visual design.
Prompt Example 4:
Create a vintage 1920s jazz club scene for a historical documentary. The background should feature velvet curtains, art deco furniture, and a live jazz band performing on stage. Upload reference images of a 1920s jazz club for stylistic accuracy. The scene should feel nostalgic and sophisticated, suitable for a calm, story-driven video.
Runway ML Gen 4 differs from other AI tools like Midjourney or DALL·E by focusing specifically on the needs of video creators who require images that can be directly transitioned into motion graphics or full video sequences. While DALL·E and Midjourney offer powerful image generation, Runway ML is unique in its ability to handle both the image-first creation and seamless video integration all in one platform. Its reference image feature also sets it apart, allowing for a more guided and accurate generation of visuals—particularly when you need precise control over likenesses or stylistic alignment.
Compared to other platforms that focus more on static imagery, Runway ML is designed for video-first creators, making it ideal for video content generation where you need consistency across still images and motion elements.
Meta AI (formerly known as part of the Meta GenAI suite) specializes in generating emotionally resonant, socially expressive, and lifestyle-driven visuals. Designed to enhance creative output across social platforms like Instagram, Facebook, Horizon, and Reels, it emphasizes people, emotion, and community energy in its generated outputs.
In 2025, Meta AI has advanced in:
Ideal for:
Meta AI performs best when prompts emphasize connection, atmosphere, and emotion. Think in terms of moments, not just objects or scenery.
Prompting Formula:
[Scenario or group dynamic] + [Location or environment] + [Emotional tone or action] + [Lighting or stylistic detail]
Prompt Example 1:
A group of friends gathered around a beach bonfire at sunset, laughing and roasting marshmallows, warm lighting from the flames casting soft glows on their faces. Emphasize connection and relaxed joy.
Prompt Example 2:
A lively block party on a summer evening, with neighbors dancing, kids running with sparklers, and string lights overhead. The mood is festive, inclusive, and full of motion and color.
Prompt Example 3:
Two young women hugging tightly at an airport arrival gate, tears in their eyes, surrounded by other travelers and luggage. Emphasize emotion, reunion, and natural lighting.
Prompt Example 4:
A multigenerational family cooking together in a cozy kitchen during the holidays. A child helps decorate cookies while grandparents laugh in the background. The image should feel heartwarming, candid, and detailed.
Meta AI stands apart from platforms like Midjourney and DALL·E through its focus on emotional realism and human connection. While Midjourney thrives in stylized dreamscapes and DALL·E excels at product realism or abstract design, Meta AI prioritizes natural social moments, multi-person dynamics, and lifestyle imagery that feels “captured,” not staged. It’s especially effective for creators and brands looking to tell relatable stories visually, and for building assets meant to live on social platforms. Compared to Gemini or Perplexity, Meta AI leans less into analytical precision and more into relatable, visual storytelling grounded in human emotion.
Grok, X's generative AI tool, is tailored for creating high-quality images that can seamlessly transform into video assets within the same platform. It excels at producing visually rich, context-aware images that serve as both standalone pieces and building blocks for video content. With its deep integration into X's social media ecosystem, Grok is a powerful tool for creators who want to produce both static and dynamic visuals with minimal friction.
Grok stands out for its ability to:
This makes it ideal for:
Grok leverages X's powerful AI and its vast database of social media and cultural insights, creating images that resonate with your audience and align with current trends. As of 2025, Grok is optimized not only for image creation but for quick conversion into video elements, which is a huge advantage for content creators working in fast-paced environments.
Grok is optimized for:
Image + Video Workflow: In 2025, Grok’s integration into X’s ecosystem now allows creators to generate images for specific video use cases. Once the image is created, you can easily transition it into video, maintaining a consistent look and feel for both static and motion content.
Reference Image Upload: A powerful feature of Grok is the ability to upload reference images. This helps the AI understand and replicate specific visual details, such as the look of a person, product, or environment. For video creators, this is especially useful when designing characters or scenes that need to match real-world references or previous design work.
Brand and Style Matching: Grok now allows users to specify not just general visual tone, but also exact brand guidelines (e.g., color palette, font types, mood). This feature is particularly useful for video creators looking to maintain a consistent brand image across both still and motion content.
Fast Content Generation: Grok is optimized to produce visuals in seconds, allowing content creators to keep up with the fast pace of social media, especially when creating real-time video content based on evolving trends.
When using Grok for image creation, especially for video production, it’s important to craft clear and contextually relevant prompts. The more specific you are about how the image will be used in video, the better the result will be.
Grok responds best when your prompt includes:
Prompting Formula:
[Purpose for video use] + [Visual subject or scene] + [Details for composition and style] + [Reference images] + [Video transition or animation cues]
Include:
Avoid:
Prompt Example 1:
Create an image of a futuristic city skyline for a sci-fi video. The skyline should feature towering skyscrapers, neon lights, and flying cars. Upload reference images of modern cityscapes and futuristic architecture. This image will serve as the background in a video sequence, so make sure it's dynamic and full of energy, ready for animation.
Prompt Example 2:
Generate an image of a young woman sitting in a coffee shop, with a laptop open in front of her. Upload reference images of a woman in casual attire and a cozy coffee shop. This image will be used as a character shot in a vlog-style video about remote work. The lighting should be warm and inviting, evoking a sense of comfort and productivity.
Prompt Example 3:
Create a dynamic action shot of a superhero flying over a cityscape. The superhero should be in a bright, colorful costume, and the city should have tall buildings with a sunset sky in the background. Upload reference images of superhero poses and city skylines. This image will serve as the key visual for an animated video intro.
Prompt Example 4:
Generate an image of a professional office space with modern furniture and a clean aesthetic. Upload reference images of minimalist office designs. This image will be used in a corporate promotional video, so make sure it conveys a sense of professionalism and efficiency.
Grok differs from models like DALL·E, Midjourney, or Adobe Firefly by focusing on real-time content creation and image-to-video workflows. While platforms like Midjourney excel at artistic and abstract visuals, Grok emphasizes practical image generation with an eye toward real-world applications like social media, marketing, and video production. Unlike DALL·E, which offers flexibility in terms of surreal or playful designs, Grok is tailored to creators who need timely, on-brand imagery for both static and dynamic use cases. It also has the advantage of being deeply integrated with X’s social media ecosystem, allowing for trend-sensitive, contextually relevant images.
Compared to other video-first tools, Grok offers an all-in-one solution for both image and video asset creation, making it an ideal choice for creators looking to produce visuals quickly, consistently, and on-brand.
You'll see me use Whisk during the course for image generation. Not only do I use Whisk to generate images, but I also use it to reverse-engineer a prompt by extracting Whisk's analysis of an image; you'll see this in Section 10 of the VEO lectures.
Whisk by Google Labs is an experimental creative platform that blends visual generation, storyboarding, and scene planning into one streamlined tool. Originally intended to assist writers and storytellers, Whisk has evolved into a visual ideation engine with strong narrative support.
Whisk is designed to:
It’s especially powerful for:
Whisk responds best when your prompts include a narrative or cinematic context — imagine giving direction to a storyboard artist or visual concept designer.
Prompting Formula:
[Scene context or action] + [Characters and visual actions] + [Camera angle or style] + [Mood, lighting, and tone]
You can also add notes like:
Prompt Example 1:
Scene 2: The Escape – A teenage girl runs through a cornfield at dusk, her face filled with fear. A shadowy figure is seen in the distance. Use a wide shot with golden backlight, grainy 35mm film style. Mood: suspenseful, urgent.
Prompt Example 2:
Scene 5 – Two astronauts float outside a damaged space station. One reaches for a broken panel, the Earth below in frame. Use a close-up on their helmet reflections, soft lighting, cinematic realism.
Prompt Example 3:
Panel 1 of 4: A wizard stands at the edge of a cliff, casting lightning toward a dragon overhead. Storm clouds churn behind him. Dramatic angle from below, painterly fantasy style, cool blue and violet tones.
Prompt Example 4:
Marketing Visual: A cozy kitchen scene with a mother and child baking cookies. Use soft warm lighting, vintage textures, and a mid-shot that shows their joyful expressions and flour-covered hands. Include kitchen props and ingredients in soft focus.
Whisk differs from tools like DALL·E and Midjourney by focusing on scene logic, narrative consistency, and visual continuity. It’s not just about creating a beautiful image — it’s about generating a moment that feels like part of a larger story. Compared to Gemini (which is more structured and info-aligned) and Midjourney (more abstract and artistic), Whisk is built for storytellers, giving you tools to explore, develop, and visualize entire narratives — from script to screen. It also stands apart in its support for camera direction, character continuity, and story-driven visual design.
Click the drop-down arrows to reveal more.
What is VEO 3?
VEO 3 is an advanced text-to-video AI tool that revolutionizes the way content creators produce video. As of 2025, it’s one of the leading platforms for transforming detailed text prompts into fully-fledged videos, including visuals, characters, dialogue, sound effects, and lip-syncing speech. Unlike traditional video production methods, VEO 3 combines the power of AI to generate entire videos from descriptive text inputs, allowing for an incredibly high degree of customization in terms of visual style, audio, and dialogue delivery.
VEO 3 excels at:
To get the most out of VEO 3’s text-to-video capabilities, it's essential to provide a comprehensive, clear, and detailed prompt that covers all the aspects of the video you want to create. This includes everything from visual style to character actions and dialogue. Here’s how to craft the perfect prompt.
Start by clearly defining the purpose or storyline of the video. Is it for an advertisement, educational content, a short film, or a social media post? This gives the AI a foundation on which to base all other elements.
Example:
"A tutorial explaining how to grow tomatoes in a backyard garden."
VEO 3 excels at creating various visual styles, from realistic landscapes to abstract illustrations. Specify the type of environment, the color palette, and the visual tone of the video. If you're aiming for a certain aesthetic (like cartoonish, noir, or hyper-realistic), be explicit.
Details to Include:
Example:
"The scene is a bright backyard garden, with a raised wooden bed filled with tomato plants. There is soft sunlight filtering through the trees, casting a warm golden glow. The camera zooms in to focus on the tomatoes as they ripen."
Define who’s in the scene, their appearance, clothing, and actions. If characters are involved, mention their age, gender, ethnicity, and expressions. If you want multiple characters, detail how they should interact.
Details to Include:
Example:
"A middle-aged Caucasian woman in gardening clothes is crouching beside the raised bed, inspecting the tomatoes. She’s smiling as she gently picks a ripe tomato."
VEO 3 allows you to generate dialogue that will be synced with the characters' lip movements. Specify the exact lines or speech, the tone of voice, and the mood.
Details to Include:
Example:
"The woman says, 'These tomatoes are the best I’ve ever grown. If you want to learn how to do it, keep watching!' She smiles as she looks directly into the camera."
VEO 3 can generate background sounds or music to match the mood of your video. Specify any particular sound effects you need (e.g., footsteps, bird chirping) or the type of music (e.g., upbeat, relaxing, suspenseful).
Details to Include:
Example:
"Soft birds chirping in the background, with gentle acoustic guitar music playing to set a calming, pleasant tone."
Make sure to define how the audio (dialogue, sounds, and music) should sync with the visual elements. Timing plays a crucial role in how natural and engaging the video feels.
Details to Include:
Example:
"As the woman picks the tomato, she starts speaking, and the background music gradually fades in. The music then fades out slightly when she finishes speaking, allowing the sound of the garden to take over."
If you’re going for something more dynamic or unique, mention any special effects (e.g., zoom-ins, transitions, explosions, glowing objects, or any fantastical elements).
Example:
"As the woman holds the ripe tomato, the scene should transition with a gentle glow, highlighting the fruit in her hand."
Here’s a complete example of a well-rounded prompt for VEO 3:
Prompt Example:
"A close-up shot of a woman in her late 30s, wearing a light blue gardening outfit, standing in a backyard garden with raised beds full of tomatoes. The sunlight is soft and golden as she picks a ripe tomato. As she smiles and looks directly into the camera, she says, 'These tomatoes are the best I’ve ever grown. If you want to learn how to do it, keep watching!' The background is peaceful, with birds chirping, and gentle acoustic guitar music is playing softly. The camera zooms in slightly as she picks the tomato, and the sound of the tomato being picked is audible with a slight crunch."
By following this structure, you'll be able to craft detailed prompts that give VEO 3 everything it needs to create high-quality, engaging videos directly from text!
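If you generate VEO 3 prompts often, it can help to template the structure above. The sketch below is purely illustrative: it composes the sections described in this guide into one prompt string. The field names are our own convention, not a VEO API.

```python
# Illustrative template that composes the VEO 3 prompt sections described
# above (purpose, visuals, characters, dialogue, sound, timing, effects)
# into a single text prompt. Field names are our own, not a VEO API.
SECTIONS = ["purpose", "visuals", "characters", "dialogue", "sound", "timing", "effects"]

def compose_veo_prompt(**fields) -> str:
    # Keep the sections in a consistent order and skip any left empty
    return " ".join(fields[s] for s in SECTIONS if fields.get(s))

prompt = compose_veo_prompt(
    purpose="A tutorial explaining how to grow tomatoes in a backyard garden.",
    visuals="A bright backyard garden with a raised wooden bed of tomato "
            "plants, soft golden sunlight filtering through the trees.",
    characters="A middle-aged woman in gardening clothes crouches beside the "
               "bed, smiling as she picks a ripe tomato.",
    dialogue="She says, 'These tomatoes are the best I've ever grown. "
             "If you want to learn how to do it, keep watching!'",
    sound="Soft birds chirping, with gentle acoustic guitar music.",
    timing="Music fades in as she speaks and fades out when she finishes.",
    effects="A gentle glow highlights the tomato in her hand.",
)
print(prompt)
```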
Try VEO 3
SORA offers powerful tools for generating both text-to-video and image-to-video outputs. While using an image as a starting point ensures visual fidelity, crafting detailed text prompts remains crucial. The best practices for each method share similarities, with specific nuances for text-to-video: you will need to describe the characters, clothing, and scene in more detail, as we did with AI image generation.
1. Start with Detailed Descriptions
For text-to-video, remember to include essential visual and descriptive elements, such as:
Example:
A cinematic shot of a 35-year-old knight in full metal armor scaling a snowy peak at dawn. Golden sunlight reflects off the snow, emphasizing the epic fantasy tone in a wide-angle, sweeping shot.
2. Strike a Balance Between Brevity and Detail
3. Stay Grounded in Realistic Concepts
Avoid overly abstract or impossible visuals that may confuse the AI. Use scenarios that can be rendered believably.
Avoid: "A dragon exploding into fireworks made of water."
Better: "A dragon flying over a moonlit forest, its scales glistening in silver light."
4. Utilize Emotional and Lighting Cues
Integrate emotional undertones and lighting descriptions to add depth and context. Lighting conveys mood and enhances storytelling.
Example:
A tranquil lake surrounded by glowing autumn foliage at sunrise, casting warm, golden light on the scene, evoking serenity and reflection.
5. Use Cinematic Language
Incorporate filmmaking terminology to refine the visual style and composition.
Example:
A protagonist silhouetted against a fiery sunset, captured with a telephoto lens, as the camera slowly zooms in for dramatic tension.
6. Experiment with Artistic Styles
Define specific art or film styles to achieve distinctive aesthetics.
Example:
A bustling 1920s jazz club filled with dancers, shot in black and white with a grainy film effect reminiscent of silent cinema.
For more insights, explore the styles guide at AI Video School Styles.
Runway focuses on real-time video editing and generation, offering tools that allow users to create videos with dynamic visuals and seamless transitions.
Haiper is designed for generating short, engaging videos with a focus on storytelling and dynamic visual elements.
Pika focuses on generating engaging, visually appealing videos with a simple and intuitive interface, making it ideal for marketing and social media videos.
Click the drop-down arrows to reveal more.
ElevenLabs is an excellent AI tool for audio: narration and sound effects. Use it for text-to-speech and voice-changer tools; note that prompting may not be needed for these. However, you may want to prompt for sound effects.
ElevenLabs specializes in generating realistic voiceovers, custom sound effects, and synthetic speech with natural intonations. It excels in creating soundscapes and voice-based audio elements, making it ideal for dialogue and sound effect design.
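ElevenLabs also exposes a REST API for generating voiceovers programmatically. The sketch below is hedged: the endpoint shape reflects the v1 API, but the voice ID is a placeholder and you should verify the details against the current ElevenLabs documentation before relying on it.

```python
# A sketch of calling the ElevenLabs text-to-speech REST API with `requests`.
# The endpoint shape and voice ID are assumptions based on the v1 API;
# verify against the current ElevenLabs documentation.
import os
import requests

VOICE_ID = "your-voice-id"  # placeholder: pick a voice in the ElevenLabs UI
url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"

response = requests.post(
    url,
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={"text": "These tomatoes are the best I've ever grown."},
    timeout=60,
)
response.raise_for_status()

with open("voiceover.mp3", "wb") as f:
    f.write(response.content)  # the API returns audio bytes
```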
SUNO is designed for creating full audio tracks, such as music compositions, backing tracks, and complete soundscapes for video content. It’s versatile for generating anything from background scores to full-length music tracks.
Example: Generating a country song with a specific theme:
To create a country song using SUNO, you’ll need to provide clear details about the theme, mood, and specific elements you want to include. Country songs are often narrative-driven, focusing on relatable stories, emotions, and imagery.
Example Prompt:
“Generate a country song with a nostalgic and reflective mood, focusing on the theme of returning home after years of being away. The song should have a slow to mid-tempo pace with acoustic guitar, soft fiddle, and light harmonica. Include lyrics that evoke imagery of dirt roads, old memories, and reconnecting with family and old friends. The tone should be warm and heartfelt, creating a sense of longing and comfort.”
These elements give SUNO clear guidelines on the theme and musical style, ensuring that the generated track captures the essence of a classic country song. The mention of instruments and specific imagery helps refine the sound and narrative focus, making it more relatable and emotionally resonant.