MoviiGen 1.1 is an open-source AI tool from ZuluVision that generates high-quality videos from text. It supports 720P and 1080P resolutions and is especially suited to professional video production that calls for cinematic visuals. From a simple text description, users can generate videos with natural motion and a consistent aesthetic.
MoviiGen 1.1 provides model weights and inference code based on the PyTorch framework, making it easy to deploy and use. Its openness and strong performance make it a good fit for video creators and developers working in film and television production, advertising, and creative content generation.
Function List
- Supports 720P and 1080P high-resolution video generation; 1080P with a 21:9 aspect ratio (1920×832) is recommended for cinematic quality.
- Provides text-to-video generation: describe the scene, subject, and actions to produce a high-quality video.
- Includes a prompt-extension model, fine-tuned from Qwen2.5-7B-Instruct, that enriches the detail of text descriptions.
- Open source model weights and inference code to support local deployment and customized development.
- Supports professional-grade video generation for film and television production, advertising and creative content creation.
- Supports the FastVideo plug-in to accelerate video generation.
- Compatible with PyTorch 2.4.0 and above, easy to integrate into existing development environments.
Usage Guide
Installation process
To use MoviiGen 1.1, users need to complete the environment configuration and model installation first. The following are the detailed steps:
- Clone the repository
Run the following commands in a terminal to get the MoviiGen 1.1 source code: `git clone https://github.com/ZulutionAI/MoviiGen1.1.git`, then `cd MoviiGen1.1`.
- Install dependencies
Make sure Python 3.10 or later and PyTorch 2.4.0 or later are installed on your system, then install the remaining dependencies with `pip install -r requirements.txt`.
In addition, the FastVideo plugin needs to be installed according to the official instructions, which can be found on FastVideo's GitHub page.
- Download the model
The MoviiGen 1.1 model is hosted on Hugging Face. Install the CLI with `pip install "huggingface_hub[cli]"`, then download the model files with `huggingface-cli download ZuluVision/MoviiGen1.1 --local-dir ./MoviiGen1.1`.
The model uses the T2V-14B architecture and is stored in the `./MoviiGen1.1` directory.
- Verify the environment
Make sure supported GPU hardware is available (20 GB or more of video memory is recommended for 1080P video). Check that PyTorch detects the GPU correctly with `python -c "import torch; print(torch.cuda.is_available())"`. If this prints `True`, the environment is configured correctly.
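Beyond this one-liner, a short script like the sketch below (illustrative only, not part of the MoviiGen 1.1 repository) can also report the PyTorch version and available video memory before attempting 1080P generation:

```python
# env_check.py -- minimal environment sanity check (illustrative, not part of MoviiGen 1.1)
import torch

print(f"PyTorch version: {torch.__version__}")          # 2.4.0 or later is required
print(f"CUDA available:  {torch.cuda.is_available()}")  # should print True

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, {vram_gb:.1f} GB VRAM")
    # 1080P generation is recommended only with roughly 20 GB of VRAM or more
    if vram_gb < 20:
        print("Warning: less than 20 GB of VRAM; consider generating 720P instead.")
```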
Usage
The core function of MoviiGen 1.1 is generating video from text prompts. The steps are as follows:
1. Basic video generation
Videos are generated by running the inference script. Example command:
`PYTHONPATH=. python scripts/inference/generate.py --ckpt_dir ./MoviiGen1.1 --prompt "A woman in a red dress strolls down the street; the background is a busy city street, sunlight falls on the ground, and the camera moves slowly, showing bright colors."`
- `--ckpt_dir`: specifies the path to the model files.
- `--prompt`: the text description of the video; 100-200 words are suggested, covering the scene, subject, action, aesthetic style, and camera movement.
2. Prompt extension
MoviiGen 1.1 provides a prompt-extension model, fine-tuned from Qwen2.5-7B-Instruct, that enriches the detail of text descriptions. Enable it as follows:
`PYTHONPATH=. python scripts/inference/generate.py --ckpt_dir ./MoviiGen1.1 --prompt "A woman in a red dress strolls down the street." --use_prompt_extend --prompt_extend_model ZuluVision/MoviiGen1.1_Prompt_Rewriter`
Prompt extension automatically enriches the description, adding scene details, lighting effects, and so on, which improves the quality of the generated video.
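The inference script handles prompt extension internally, but the rewriter can also be tried on its own. The following is a minimal sketch, assuming the checkpoint loads as a standard Qwen2.5-style chat model through the transformers library; the system instruction shown is a placeholder, not necessarily the one the model was trained with:

```python
# Standalone prompt-rewriting sketch (assumption: the checkpoint loads as a
# standard Qwen2.5-style causal LM via transformers).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ZuluVision/MoviiGen1.1_Prompt_Rewriter"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

short_prompt = "A woman in a red dress strolls down a busy city street."
messages = [
    # Placeholder instruction; the official inference script handles this internally.
    {"role": "system", "content": "Rewrite the user's video prompt with richer scene, lighting and camera detail."},
    {"role": "user", "content": short_prompt},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```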
3. Recommended prompt format
For best results, the prompt should contain the following elements:
- Scene description: for example, "A smoke-filled detective's office with blinds casting sharp shadows".
- Subject: for example, "The tired detective sits behind his desk".
- Action: for example, "lights a cigarette and exhales a puff of smoke".
- Aesthetic style: for example, "Black-and-white, high contrast, 1940s film noir style".
- Camera movement: for example, "static medium shot, focused on the detective".
Example prompt:
In a smoke-filled detective's office, with blinds casting sharp shadows, the tired detective sits behind his desk, lights a cigarette, and exhales a plume of smoke. The image is black-and-white with high contrast, in the 1940s film noir style. The camera holds a static medium shot, focusing on the detective to create an oppressive atmosphere.
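To keep prompts consistent across many runs, the five elements can be assembled programmatically. The helper below is a hypothetical convenience function, not part of MoviiGen 1.1:

```python
# Hypothetical helper for assembling structured prompts; not part of MoviiGen 1.1.
def build_prompt(scene: str, subject: str, action: str, style: str, camera: str) -> str:
    """Join the five recommended prompt elements into one description."""
    return " ".join([scene, subject, action, style, camera])

prompt = build_prompt(
    scene="In a smoke-filled detective's office, blinds cast sharp shadows.",
    subject="A tired detective sits behind his desk.",
    action="He lights a cigarette and exhales a plume of smoke.",
    style="Black-and-white, high contrast, 1940s film noir style.",
    camera="Static medium shot focused on the detective, creating an oppressive atmosphere.",
)
print(prompt)
```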
4. Output settings
- Resolution: 720P and 1080P are supported; 1080P (1920×832) is recommended for cinematic results.
- Generation time: generating 1080P video takes a long time, so a high-performance GPU with at least 20 GB of video memory (e.g. an RTX 4090) is recommended.
- Output path: generated videos are saved to the `./MoviiGen1.1/output` directory by default; other paths can be specified in the inference script.
5. Optimizing performance
- FastVideo plugin: installing it accelerates video generation; refer to the FastVideo documentation for configuration.
- Video memory: if memory is insufficient, try generating 720P video to reduce requirements.
- Batch generation: multiple prompts can be processed in one run by modifying or wrapping the inference script to loop over a list of prompts, as sketched below.
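A minimal way to do this is a small driver script that calls the inference script once per prompt. The sketch below uses only the flags shown earlier in this guide; the prompt list is illustrative:

```python
# Hypothetical batch driver: calls the MoviiGen 1.1 inference script once per prompt.
import os
import subprocess

prompts = [
    "A woman in a red dress strolls down a busy city street in warm sunlight.",
    "A tired detective lights a cigarette in a smoke-filled 1940s office.",
]

env = dict(os.environ, PYTHONPATH=".")  # matches the PYTHONPATH=. prefix used above
for prompt in prompts:
    subprocess.run(
        ["python", "scripts/inference/generate.py",
         "--ckpt_dir", "./MoviiGen1.1",
         "--prompt", prompt],
        env=env,
        check=True,  # stop if a generation run fails
    )
```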
Notes
- Ensure a stable internet connection to download models and dependencies.
- High-resolution video generation has high hardware requirements and a high-performance GPU is recommended.
- The quality of the prompt directly affects the result; it is worth experimenting with different description styles.
- Check the GitHub repository regularly for updates to the latest models and code.
Application Scenarios
- Film and television production
MoviiGen 1.1 is suitable for generating movie trailers, short films, or scene clips. With detailed text descriptions, users can quickly generate videos with professional lighting effects, reducing traditional filming costs.
- Creative advertising
Advertising teams can use MoviiGen 1.1 to generate product promotional videos. For example, describing a branded product's scenes and actions quickly produces high-quality advertising footage and saves production time.
- Game development
Game developers can use MoviiGen 1.1 to generate animated transitions or environmental background videos, with high resolution and customizable styles to suit the game's narrative.
- Education and training
Educators can generate instructional videos that show moving images of historical scenes or science experiments, making the content more engaging and visual.
QA
- Is MoviiGen 1.1 free?
Yes. MoviiGen 1.1 is an open-source tool; the model weights and inference code are freely available to download and use from GitHub and Hugging Face.
- What hardware is required to generate 1080P video?
A GPU with at least 20 GB of video memory (e.g. an NVIDIA RTX 4090) is recommended. Lower-end configurations can generate 720P video, but still require a GPU.
- How can I improve the quality of the generated video?
Use detailed prompts covering scene, subject, action, and style. Enable the prompt-extension model to further enhance detail. Make sure your hardware supports 1080P generation for best results.
- Are Windows and Linux supported?
Yes. MoviiGen 1.1 supports Windows and Linux, and runs with Python 3.10+ and PyTorch 2.4.0+ installed.
- How long does it take to generate a video?
Depending on the hardware and resolution, a 1080P video can take minutes to hours; 720P is faster. Using a high-performance GPU and the FastVideo plug-in reduces the time.