Code2Video is a pioneering video generation framework from the NUS Show Lab, built around a "code-centric" approach to producing high-quality educational videos. Unlike traditional AI video models that generate pixels directly (e.g., Sora), Code2Video does not "draw" the video; it "renders" it by writing executable Python code (based on the Manim engine). This approach addresses the shortcomings of traditional video generation models in logical rigor, geometric accuracy, and text clarity, and is particularly suited to STEM instructional videos for math, physics, computer science, and other topics that demand precise representation. The framework consists of three collaborating AI agents: a Planner that designs the storyboards, a Coder that writes and debugs the code, and a Critic that reviews and optimizes the visuals. Through this collaboration, Code2Video can turn a short textual knowledge point into a professional instructional video comparable to hand-made productions in the 3Blue1Brown style.
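
To make the "render from code" idea concrete, here is a minimal Manim scene sketch (written by hand for Manim Community Edition, not output produced by Code2Video itself; the class name is just an example):

from manim import Scene, MathTex, Write

class PythagoreanTheorem(Scene):
    def construct(self):
        # Formulas are typeset as vector graphics, so they stay sharp at any resolution
        formula = MathTex("a^2 + b^2 = c^2")
        self.play(Write(formula))  # animate the formula being "written" on screen
        self.wait(2)

Rendering a file like this with Manim (for example, manim -ql pythagorean.py PythagoreanTheorem, where the file name is only illustrative) produces a video clip; Code2Video's contribution is writing and debugging this kind of code automatically.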

Function List

  • Intelligent storyboard planning (Planner Agent): Automatically turns an input knowledge point or short text into a detailed video script and visual storyboard that plans the video's pacing.
  • Automated code generation (Coder Agent): Transforms natural language scripts into executable Python (Manim) code, supporting complex mathematical formulas, geometry, and animation logic.
  • Self-healing and debugging: A built-in error detection mechanism; when the generated code fails, the system automatically analyzes the error log and corrects the code until it runs successfully (see the sketch after this list).
  • Visual quality review (Critic Agent): A vision-language model (VLM) acts as an "aesthetic guide", checking the layout, overlap, and clarity of the generated frames and feeding suggested changes back to the Coder.
  • High-precision vector rendering: Because the output is rendered by the Manim engine, formulas and text stay sharp at any resolution, with no blurring or artifacts.
  • Multi-model API support: Works with Claude, Gemini, GPT-4, and other mainstream large language models as the back-end reasoning driver.
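
The collaboration between the three agents can be pictured as the loop below. This is only a conceptual, self-contained sketch with stub functions; every name is a hypothetical stand-in for illustration and does not reflect the actual Code2Video implementation, where each step is an LLM or VLM call and rendering runs Manim.

def plan(topic):
    # Planner: split the topic into scene descriptions (stubbed here)
    return [f"Scene 1: introduce {topic}", f"Scene 2: animate {topic}"]

def write_code(scene):
    # Coder: produce Manim code for one scene (stubbed here)
    return f"# manim code for: {scene}"

def render(code):
    # Run Manim and capture (success, error_log); stubbed to always succeed
    return True, ""

def fix_code(code, error_log):
    # Self-healing step: repair the code based on the error log (stubbed here)
    return code

def review(code):
    # Critic: a VLM would check layout, overlap, and clarity of rendered frames
    return "layout OK"

def generate(topic, max_retries=3):
    clips = []
    for scene in plan(topic):
        code = write_code(scene)
        for _ in range(max_retries):
            ok, log = render(code)
            if ok:
                break
            code = fix_code(code, log)   # retry with corrected code
        print(review(code))              # feedback would flow back to the Coder
        clips.append(code)
    return clips

if __name__ == "__main__":
    generate("the Pythagorean theorem")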

Using Help

Code2Video is an open source command line tool that requires a locally configured Python environment to run. Here is a detailed installation and usage process to help you generate your first instructional video from scratch.

1. Environment preparation and installation

First, make sure Anaconda (or Miniconda) and Git are installed on your computer.

Step 1: Clone the project code
Open a terminal or command prompt and execute the following command to download the project:

git clone https://github.com/showlab/Code2Video.git
cd Code2Video

Step 2: Create a virtual environment
To avoid dependency conflicts, create a separate Python environment (Python 3.9+ recommended):

conda create -n code2video python=3.9 -y
conda activate code2video

Step 3: Install system dependencies (using Linux as an example)
The Manim engine requires some system-level libraries (e.g. ffmpeg, cairo).

sudo apt-get update
sudo apt-get install libcairo2-dev libpango1.0-dev ffmpeg

Note: Windows users should refer to the official Manim documentation to install ffmpeg and LaTeX.

Step 4: Install Python dependencies
The project's dependencies have recently been streamlined, so installation is much faster:

pip install -r requirements.txt

2. Configure the API key

Code2Video relies on a large language model to generate code, so you need to configure an API key for your LLM.
The configuration can be found in the project root directory (usually in the config folder or via environment variables). The recommended approach is to export the environment variables directly in the terminal:

# Example: using Claude
export ANTHROPIC_API_KEY="sk-ant-..."
# Or using OpenAI
export OPENAI_API_KEY="sk-..."
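
As an optional sanity check (not part of Code2Video itself), you can confirm from Python that the key is visible in the current shell before launching a run:

import os

# Optional sanity check (not part of Code2Video): confirm a provider key is set
key = os.environ.get("ANTHROPIC_API_KEY") or os.environ.get("OPENAI_API_KEY")
print("API key found" if key else "No API key set - export one before running")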

Tip: Make sure your account has sufficient token quota.

3. Video generation (core operations)

Code2Video provides a convenient startup script, run_agent_single.sh, for generating a video on a single knowledge point.

Basic command format:

bash run_agent_single.sh [model API] [output folder prefix] "[knowledge point description]"

Example run:
Suppose you want to generate a video on the Pythagorean theorem using the Claude-3.5-Sonnet model:

  1. Edit the startup script (optional):
    You can either run the command directly or open run_agent_single.sh to review the default parameters.
  2. Execute the generate command:
    bash run_agent_single.sh claude-3-5-sonnet test_output "The Pythagorean theorem explains the relationship between the three sides of a right-angled triangle"
    

Parameter Explanation:

  • claude-3-5-sonnet: Specifies the inference model to use; a model with strong coding ability is recommended.
  • test_output: The generated video and intermediate files will be saved in the experiments/test_output directory.
  • "...": The most important input: a clear, one-sentence description of the knowledge point you want to teach.

4. Viewing the results

While the program is running, the terminal displays a log of the collaboration between the three agents:

  1. Planner outputs the designed descriptions of each sub-scene.
  2. Coder shows the Python code being generated and automatically retries if an error is reported.
  3. Critic gives an evaluation score for the current frame.

After the run is complete, go to the experiments/test_output folder, where you will see:

  • .mp4 file: the final rendered HD video.
  • .py file: the generated Manim source code (you can manually edit this code to fine-tune the video; see the example after this list).
  • log.txt: a complete log of the generation process.
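
Because the video is rendered from that .py file, small tweaks are easiest to make by editing the generated code and re-rendering it with Manim. The excerpt below is hypothetical (the file, class, and object names are assumptions) and assumes Manim Community Edition:

# Hypothetical excerpt from a generated scene file, after manual edits.
# Re-render with Manim CE, e.g.: manim -qh generated_scene.py LessonScene
from manim import Scene, MathTex, Write, BLUE

class LessonScene(Scene):
    def construct(self):
        formula = MathTex("a^2 + b^2 = c^2", color=BLUE)  # changed the color
        self.play(Write(formula), run_time=3)             # slowed down the write animation
        self.wait(1)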

5. Advanced techniques

  • Custom assets: If the video needs a specific icon, you can put an SVG file into the assets folder and mention it in the prompt (see the sketch after this list).
  • Adjusting length: In the input prompt you can specify, for example, "Generate a video of about 30 seconds", and the Planner will adjust the number of shots accordingly.
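
For reference, this is roughly how a custom SVG asset can be used from Manim code; the file path and scene name here are assumptions, and the code Code2Video actually generates may differ:

from manim import Scene, SVGMobject, FadeIn

class CustomIconScene(Scene):
    def construct(self):
        # Load the custom icon from the assets folder (path is a hypothetical example)
        icon = SVGMobject("assets/my_icon.svg").scale(2)
        self.play(FadeIn(icon))
        self.wait(1)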

Application Scenarios

  1. Teaching Mathematics and Physics
    A teacher inputs "Explain the basic principle of the Fourier transform", and the system automatically generates a demonstration video with animated, superimposed waveforms to visualize the abstract concept.
  2. Algorithm Visualization
    A computer science student types in "Demonstrate the binary search algorithm" to generate an animation showing how the array indices move during the search, for use in technical blogs or homework presentations.
  3. Automated Online Course Production
    Educational institutions can feed their textbook catalogs into the system in batches to quickly produce a series of short videos explaining basic concepts and build a standardized lesson library.
  4. Research Paper Presentation
    Researchers can input the core formulas or model logic from a paper to generate highly accurate schematic animations for academic conference presentations or video abstracts.

QA

  1. What is the difference between Code2Video and Sora/Runway?
    Code2Video doesn't generate pixels directly; it generates code. This means its videos have rigorous logic (everything is driven by actual mathematical formulas) and perfectly sharp text and lines, which makes it ideal for education and science popularization. Models like Sora are better suited to photorealistic or artistic creative videos, but are weaker in textual and logical accuracy.
  2. Can I use it if I don't know how to program?
    Yes. All you need to do is enter a text description (prompt), and the system completes the code automatically. If you know some Python/Manim, though, you can edit the generated code directly for a higher level of control.
  3. What if the generated video is very short?
    The current version mainly generates short videos (usually 10-60 seconds) covering a single knowledge point. If you need a longer video, split the big topic into several smaller knowledge points, generate them separately, and then merge them in editing software.
  4. What if ffmpeg is missing during installation?
    Manim depends heavily on ffmpeg for video compositing. Make sure ffmpeg -version runs successfully in your terminal. Windows users need to download ffmpeg manually and add its bin directory to the system Path environment variable.
  5. Does it support Chinese input?
    Yes. Although the underlying code is in English, you can describe the knowledge point in Chinese. For better results, add "Please use Chinese for the text in the video" to the prompt, or replace the text with Chinese directly in the generated code.