
Zero to One: Building Your First AI Video Automation Workflow with Coze and Jianying (剪映)

2025-07-29

A new model of video production is emerging. A process that once required a professional team is now compressed into a few AI-assisted steps and assembled into an automated production line. From drafting copy and synthesizing voice to editing and export, the whole process can be completed in as little as half an hour, while generating a single video costs only a few dollars and takes a few seconds.

At the core of this automation are two tools working together: ByteDance's Coze (扣子) and Jianying (剪映). Together they form a workflow-building platform on which video genres already proven on social media, such as popular science, historical stories, and psychological healing, can be packaged into replicable templates. The producer only needs to enter a topic or keyword; the AI automates every subsequent step and sends a draft of the finished video directly to Jianying for manual fine-tuning or immediate release.

In the Coze agent ecosystem, these packaged workflow templates are becoming ubiquitous. Video styles such as pixel art, healing grandmothers, and ancient health, which previously drew a lot of attention on platforms such as Douyin and Xiaohongshu, are now digital commodities that can be generated with a single click. Users are often directed to communities by scanning QR codes or visiting specific links to obtain these templates, which has given rise to a new business model: selling workflows.

The developers of these workflows, a group aptly nicknamed "shovel sellers," earn an official revenue share for the agents and plug-ins they publish on the platform. They also monetize through private deals, offering courses, paid community memberships, and custom workflows.

Recently, Coze officially announced that its development platform, Coze Studio, and its operations platform, Coze Loop, are now open source under the Apache 2.0 license. This means any developer or company can use, modify, and redistribute the source code free of charge, including for commercial purposes. The move gives a major boost to the spread of AI workflows and provides small and medium-sized developers with a powerful underlying tool.

GitHub project address:

https://github.com/coze-dev/coze-studio

So how are these automated video workflows built? Can they really replace manual work altogether? We built one ourselves and broke down the entire process.

Hands-on Test: Building an "Ancient Health" Video Workflow

Before building any video workflow, you first need to work out its generation logic. Take the "ancient health" video as an example. Its core elements are text, images that match the text, and background narration.

The core logic of the workflow can therefore be broken down as: Input topic -> AI generates copy -> AI generates storyboard script -> AI generates images -> AI synthesizes speech -> Combine and export.
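The chain above can be read as a simple function pipeline. The following is only a minimal sketch under stated assumptions: the stage names (`write_copy`, `split_scenes`, `render_image`, `synthesize_speech`), the `Scene` type, and the stub implementations are all hypothetical stand-ins for Coze nodes, not Coze APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Scene:
    subtitle: str      # text shown on screen for this shot
    image_prompt: str  # prompt handed to the image generator


def run_pipeline(
    topic: str,
    write_copy: Callable[[str], str],
    split_scenes: Callable[[str], List[Scene]],
    render_image: Callable[[str], str],
    synthesize_speech: Callable[[str], str],
) -> Dict[str, object]:
    """Run the stages in the order the workflow links its nodes."""
    copy = write_copy(topic)                                  # topic -> copy
    scenes = split_scenes(copy)                               # copy -> storyboard
    images = [render_image(s.image_prompt) for s in scenes]   # one image per shot
    audio = synthesize_speech(copy)                           # copy -> narration
    return {"copy": copy, "scenes": scenes, "images": images, "audio": audio}


# Hypothetical stubs so the sketch runs without any external service.
def fake_copy(topic: str) -> str:
    return f"Copy about {topic}. Second line."


def fake_split(copy: str) -> List[Scene]:
    parts = [p.strip() for p in copy.split(".") if p.strip()]
    return [Scene(subtitle=p, image_prompt=f"ink-wash: {p}") for p in parts]


def fake_image(prompt: str) -> str:
    return f"image({prompt})"


def fake_speech(text: str) -> str:
    return f"audio({len(text)} chars)"


draft = run_pipeline("sleep", fake_copy, fake_split, fake_image, fake_speech)
```

Passing the stages in as callables mirrors how nodes can be swapped in the workflow editor without changing the overall order.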

The following steps show how these logical nodes are linked together in Coze to build a complete workflow.

First, open the Coze website and create a new workflow in the workspace. By default, the canvas contains two base nodes: "Start" and "End".

Step 1: Generate video theme and copy

The workflow starts by responding to user input, so a large language model node is needed. We set its prompt so that the AI generates copy in a specific style based on the keywords the user enters.

The model's role setting (System Prompt) is critical because it determines the style of the AI's output.

The full system prompt is below:

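# Role
You are a down-to-earth, clear-headed writer who specializes in health and wellness knowledge. Based on the user's input, you distill deeper insights and, in sincere and plain language, combine concise short sentences with contrast and highlights into a text of no more than 100 characters that prompts reflection and teaches the way of wellness.
# Skills
- Write concise, engaging, easy-to-understand wellness sayings.
- When the user provides content and asks for concise, down-to-earth wellness content, analyze the input in depth and draw out the common wellness knowledge, everyday common sense, and wellness principles that resonate with ordinary people.
- Distill the core content and, in plain language close to everyday life, turn it into a series of short sentences or phrases that strike home and are easy to understand.
- Combine these points skillfully into a text of no more than 100 characters overall. The combination should flow naturally, or use everyday contrasts and gentle turns to create memorable highlights and resonance.
- Make sure each generated text stays closely tied to the user's input while distilling wellness knowledge and advice that ordinary people can understand.
- For different inputs, vary the style, emotional emphasis, or mode of expression of the output to avoid formulaic results.
- Output only the content itself; do not output lead-ins or unrelated material.
# Constraints
- Answers must center on the user's input and cover only wellness-related content, written as plain, down-to-earth sentence combinations; refuse unrelated topics.
- All output must be original and must not plagiarize existing content.
- The total character count of the generated text (excluding punctuation) must not exceed 100.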
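The prompt's last constraint (at most 100 characters, not counting punctuation) is easy to verify after the model responds, since language models do not always respect length limits. A minimal sketch of such a check follows; the helper names are hypothetical, and treating every Unicode "P*" category as punctuation (and ignoring whitespace) is an assumption about how the limit should be interpreted.

```python
import unicodedata


def count_non_punctuation(text: str) -> int:
    """Count characters, skipping whitespace and Unicode punctuation (P* categories)."""
    return sum(
        1
        for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def within_limit(text: str, limit: int = 100) -> bool:
    """Return True if the text respects the prompt's character limit."""
    return count_non_punctuation(text) <= limit


sample = "早睡早起,身体好。"
sample_count = count_non_punctuation(sample)
```

A check like this could sit between the copy step and the storyboard step, triggering a retry when the copy runs long.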

Step 2: Generate the storyboard script

Once the copy has been generated, it needs to be turned into a storyboard script for the video. This again requires a large language model node: its input is the copy generated in the previous step, and its output is a structured set of shot descriptions, subtitles, and image-generation prompts.

The corresponding system prompt:

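# Role
You are a professional, seasoned creator of Chinese-style ink-wash "old grandpa" videos. You hold a doctorate in wellness studies and have extensive experience creating Chinese-style wellness grandpa videos. You understand user needs in depth and excel at crafting, from given keywords, high-quality storyboard scripts, matching subtitles, and image prompts for Chinese-style ink-wash grandpa videos.
# Skills
- Generate content for ancient Chinese-style ink-wash grandpa videos.
- Carefully split the user-provided {{doc}} into reasonable subtitle segments.
- Output a corresponding English translation of the split subtitles.
- For each split sentence, generate a detailed description prompt for a Chinese-style ink-wash grandpa image that fits the wellness theme.
- Check whether the action descriptions in the generated storyboard script are clear and specific; refine them further if not.
- Set storyboard durations sensibly to keep the overall video pacing smooth.
# Constraints
- Reply only with content related to generating Chinese-style ink-wash grandpa videos; firmly refuse unrelated topics.
- The storyboard script, subtitles, and image prompts must strictly meet the corresponding requirements; nothing may deviate from the framework.
- Action descriptions must be precise and clear; durations must be sensible and consistent with real production logic.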
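Because later nodes consume the storyboard shot by shot, it helps to validate the model's structured output before passing it on. The sketch below assumes the model is asked to return a JSON array; the field names (`subtitle`, `subtitle_en`, `image_prompt`, `duration_s`) are hypothetical and are not defined by the prompt or by Coze.

```python
import json
from dataclasses import dataclass
from typing import List

# Assumed schema for one shot; adjust to whatever the prompt actually requests.
REQUIRED_FIELDS = {"subtitle", "subtitle_en", "image_prompt", "duration_s"}


@dataclass
class Shot:
    subtitle: str       # subtitle segment split from the copy
    subtitle_en: str    # English translation of the subtitle
    image_prompt: str   # description used to generate the image
    duration_s: float   # how long the shot stays on screen, in seconds


def parse_storyboard(raw: str) -> List[Shot]:
    """Parse a JSON array of shots and fail early on missing fields."""
    shots = []
    for index, item in enumerate(json.loads(raw)):
        missing = REQUIRED_FIELDS - item.keys()
        if missing:
            raise ValueError(f"shot {index} is missing fields: {sorted(missing)}")
        shots.append(
            Shot(
                subtitle=item["subtitle"],
                subtitle_en=item["subtitle_en"],
                image_prompt=item["image_prompt"],
                duration_s=float(item["duration_s"]),
            )
        )
    return shots


example_output = json.dumps([
    {
        "subtitle": "早睡早起",
        "subtitle_en": "Early to bed, early to rise",
        "image_prompt": "ink-wash painting of an old man greeting the sunrise",
        "duration_s": 3,
    }
])
storyboard = parse_storyboard(example_output)
```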

Step 3: Batch-generate storyboard images

With the image prompts for each shot in hand, the next step is to generate the images. Since each video contains multiple images, a "Batch" node is needed here so that the AI loops over the image-generation task.

Inside the batch node, several tools can be linked in series. The first is the Image Generation node.

To keep the video style consistent, you can use the "Cutout" tool to remove the background, or superimpose the subject onto a uniform background.

Finally, use the "Canvas" tool to preset the layout of the final frame, such as the position of the subtitles and the aspect ratio (horizontal or vertical).
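Conceptually, the batch node applies the same chain of tools to every image prompt in turn. The sketch below illustrates only that looping pattern; the three step functions are hypothetical string-returning stubs standing in for the image generation, cutout, and layout tools.

```python
from typing import Callable, Iterable, List

Step = Callable[[str], str]


def run_batch(prompts: Iterable[str], steps: List[Step]) -> List[str]:
    """Apply every step, in order, to each prompt (one pass per shot)."""
    results = []
    for prompt in prompts:
        item = prompt
        for step in steps:
            item = step(item)  # output of one tool feeds the next
        results.append(item)
    return results


# Hypothetical stand-ins for the tools chained inside the batch node.
def generate_image(prompt: str) -> str:
    return f"image({prompt})"


def remove_background(image: str) -> str:
    return f"cutout({image})"


def apply_layout(image: str) -> str:
    return f"layout({image})"


frames = run_batch(
    ["old man drinking tea", "old man walking at dawn"],
    [generate_image, remove_background, apply_layout],
)
```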

Step 4: Generate the voice-over

For audio, Coze has an official "Speech Synthesis" plug-in built in. Call the plug-in directly, pass in the text generated in the first step, and select a suitable voice and speaking speed to generate the voice-over.

Step 5: Export to Jianying

At present, connecting a Coze workflow to Jianying generally relies on a third-party plug-in called "Jianying Assistant". By calling the plug-in's create_craft function, you can pack all the material generated in the previous steps (images, audio, subtitles) into a Jianying draft.

After setting up the draft parameters and material parameters, connect all nodes to the "End" module and a complete workflow is built.
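Setting the material parameters largely comes down to placing each image and subtitle on a timeline. The following sketch shows that bookkeeping only; the dictionary layout, the `build_timeline` helper, and the use of milliseconds are assumptions for illustration and do not reflect the plug-in's real parameter schema.

```python
from typing import Dict, List, Tuple

# (image reference, subtitle text, duration in milliseconds)
ShotInput = Tuple[str, str, int]


def build_timeline(shots: List[ShotInput]) -> List[Dict[str, object]]:
    """Lay shots end to end, giving each one a start and end time."""
    timeline = []
    cursor_ms = 0
    for image, subtitle, duration_ms in shots:
        timeline.append(
            {
                "image": image,
                "subtitle": subtitle,
                "start_ms": cursor_ms,
                "end_ms": cursor_ms + duration_ms,
            }
        )
        cursor_ms += duration_ms  # the next shot starts where this one ends
    return timeline


timeline = build_timeline(
    [
        ("shot_1.png", "Early to bed, early to rise", 2500),
        ("shot_2.png", "Warm water in the morning", 3000),
    ]
)
total_ms = timeline[-1]["end_ms"]
```

Integer milliseconds are used here to avoid floating-point drift when many shots are accumulated.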

Running the workflow generates a draft link. Open the link on a computer where "Jianying Assistant" is installed, and the material is automatically synchronized to the Jianying software.

This workflow is just a basic template. By modifying the system prompts and the tool nodes that are called, you can derive videos in many other styles, such as stick-figure animations, study vlogs, and clear-headed granny quotes.

More complex videos, such as the previously popular pixel-style short films, are realized by means of nested workflows (one workflow calling another).

Note that running a workflow consumes Coze platform resource points. One "health video" consumes approximately 2,000 points. The platform currently offers a small number of free points per day; additional resources must be purchased.
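The per-video figure makes budgeting a matter of simple arithmetic. A small sketch, using the roughly 2,000 points per video quoted above; the balances in the example are hypothetical, since the article does not state the size of the free daily allowance or the price of extra points.

```python
POINTS_PER_VIDEO = 2000  # approximate figure from the article; varies by workflow


def videos_affordable(points_balance: int, points_per_video: int = POINTS_PER_VIDEO) -> int:
    """Number of complete videos a given point balance can cover."""
    return points_balance // points_per_video


def points_required(video_count: int, points_per_video: int = POINTS_PER_VIDEO) -> int:
    """Points needed to produce a given number of videos."""
    return video_count * points_per_video


# Hypothetical example: a 10,000-point balance, and a plan of one video per day for 30 days.
affordable = videos_affordable(10_000)
monthly_need = points_required(30)
```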

The Content Ecosystem on the Wave of Automation

Although building a Coze workflow involves a learning curve, its core appeal lies in full automation: build once, produce continuously. It consolidates the tedious traditional AI video production process (copy generation -> script conversion -> video generation -> post-production compositing) into a single platform.

This learning curve and the resulting information gap have directly given rise to the business of selling ready-made workflows. The downside is equally obvious: AI has greatly lowered the barrier to creation, leading to severe content homogenization and an unprecedented acceleration in how quickly trends turn over.

In this model, the social media account holder acts less like a traditional "creator" and more like an "operator" who distributes and monitors content, and it is hard to build a core creative moat in an automated process. By contrast, the "shovel sellers" who develop and iterate on workflows are closer to the role of creators. They need to stay on top of trends and keep designing and optimizing new templates and plug-ins to remain profitable in this gold rush.

Strictly speaking, today's mass-produced AI video is essentially image-and-text content turned into video: still images, synthesized audio, and subtitles stitched together dynamically. The value of a workflow is that it automates this stitching. This heralds a profound shift in content production, one that will reshape the definition of a creator, the skills required, and the business model.
