HiDream-I1 is an open-source image generation foundation model with 17 billion parameters that quickly generates high-quality images. Users only need to enter a text description, and the model can generate images in a variety of styles, including realistic, cartoon, and artistic. Developed by the HiDream.ai team, the project is hosted on GitHub under the MIT license and supports personal, scientific, and commercial use. HiDream-I1 has excelled in a number of benchmarks, such as HPS v2.1, GenEval, and DPG, reaching industry-leading levels in both image quality and prompt adherence. Users can try the model on the Hugging Face platform or download the model weights to run it locally. The project also provides a Gradio demo interface for interactive image generation.
Feature List
- Text to image: Generate high-quality images from text descriptions entered by users.
- Multi-style support: Generate images in realistic, cartoon, artistic, and other styles.
- Fast generation: Images can be generated in seconds thanks to optimized inference steps.
- Model variants: Full (HiDream-I1-Full), Development (HiDream-I1-Dev), and Fast (HiDream-I1-Fast) versions are available.
- Image editing support: Based on the HiDream-E1-Full model, images can be modified through text commands.
- Open source and commercial use: The MIT license allows free use of the generated images.
- Gradio interactive interface: Provides an online demo so users can experience image generation directly.
Usage Guide
Installation Process
To use HiDream-I1, you need to set up the model's runtime environment locally. The detailed installation steps are as follows:
- Prepare the environment
Python 3.12 is recommended; create a new virtual environment to avoid dependency conflicts. Run the following commands:
```bash
conda create -n hdi1 python=3.12
conda activate hdi1
```
Or use a virtual environment:
```bash
python3 -m venv venv
source venv/bin/activate  # Linux
.\venv\Scripts\activate   # Windows
```
- Install dependencies
Install the necessary libraries, especially the Hugging Face Diffusers library. Installing from source is recommended to ensure compatibility:
```bash
pip install git+https://github.com/huggingface/diffusers.git
```
In addition, install Flash Attention to optimize performance; CUDA 12.4 is recommended:
```bash
pip install flash-attn
```
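Before downloading any weights, it is worth confirming that the GPU and Flash Attention are actually visible from Python. A minimal check, assuming torch and flash-attn were installed as above:

```python
import torch

# HiDream-I1 targets NVIDIA GPUs of the Ampere generation (compute
# capability 8.0) or newer; verify one is visible before proceeding.
assert torch.cuda.is_available(), "No CUDA GPU detected"
major, minor = torch.cuda.get_device_capability()
print(f"GPU: {torch.cuda.get_device_name(0)} (compute capability {major}.{minor})")

try:
    import flash_attn  # noqa: F401 -- fails if the wheel did not build against your CUDA
    print("Flash Attention is available")
except ImportError:
    print("Flash Attention missing; attention falls back to the slower default")
```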
- Download the model
HiDream-I1 model weights are available from Hugging Face in three variants:
  - HiDream-ai/HiDream-I1-Full: the complete model, suitable for high-quality generation.
  - HiDream-ai/HiDream-I1-Dev: the development version, with fewer inference steps and faster generation.
  - HiDream-ai/HiDream-I1-Fast: the fast version, suitable for rapid generation.
Running the inference script will automatically download the meta-llama/Meta-Llama-3.1-8B-Instruct model. If your network is unstable, download it from Hugging Face in advance (see the sketch below) and place it in the cache directory.
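If automatic downloading is unreliable, the weights can be fetched ahead of time with the huggingface_hub client. A minimal sketch; the repository IDs are the ones listed above:

```python
from huggingface_hub import snapshot_download

# Pre-populate the local Hugging Face cache so inference can start
# without touching the network again.
for repo_id in (
    "HiDream-ai/HiDream-I1-Full",
    "meta-llama/Meta-Llama-3.1-8B-Instruct",  # gated: accept its license first
):
    snapshot_download(repo_id=repo_id)
```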
- Run inference
Run image generation with the following Python code:
```python
import torch
from transformers import PreTrainedTokenizerFast, LlamaForCausalLM
from diffusers import HiDreamImagePipeline

# Load the Llama-3.1 tokenizer and text encoder used as the pipeline's
# fourth text encoder.
tokenizer_4 = PreTrainedTokenizerFast.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct"
)
text_encoder_4 = LlamaForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    output_hidden_states=True,
    output_attentions=True,
    torch_dtype=torch.bfloat16,
)

pipe = HiDreamImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-I1-Full",
    tokenizer_4=tokenizer_4,
    text_encoder_4=text_encoder_4,
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")

# Generate one 1024x1024 image with a fixed seed for reproducibility.
image = pipe(
    'A cat holding a sign that says "HiDream.ai"',
    height=1024,
    width=1024,
    guidance_scale=5.0,
    num_inference_steps=50,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("output.png")
```
Parameter description:
- `height` and `width`: set the resolution of the generated image; 1024 × 1024 is recommended.
- `guidance_scale`: controls how closely the output follows the prompt; 5.0 is recommended.
- `num_inference_steps`: the number of inference steps; 50 for the Full version, 28 for the Dev version, and 16 for the Fast version (a helper that keeps these settings together is sketched below).
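Since the variant, step count, and guidance scale must change together, it can help to keep them in one table. A sketch along those lines; the step counts are the ones listed above, while the lower guidance values for the distilled Dev/Fast variants are an assumption to verify against each model card:

```python
import torch
from transformers import PreTrainedTokenizerFast, LlamaForCausalLM
from diffusers import HiDreamImagePipeline

LLAMA = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# Sampler settings per variant. Step counts follow the list above; the
# guidance_scale of 0.0 for the distilled Dev/Fast variants is an assumption.
VARIANT_SETTINGS = {
    "HiDream-ai/HiDream-I1-Full": {"num_inference_steps": 50, "guidance_scale": 5.0},
    "HiDream-ai/HiDream-I1-Dev": {"num_inference_steps": 28, "guidance_scale": 0.0},
    "HiDream-ai/HiDream-I1-Fast": {"num_inference_steps": 16, "guidance_scale": 0.0},
}

def generate(repo_id: str, prompt: str, seed: int = 0):
    """Load one HiDream-I1 variant with its matching settings and render an image."""
    tokenizer_4 = PreTrainedTokenizerFast.from_pretrained(LLAMA)
    text_encoder_4 = LlamaForCausalLM.from_pretrained(
        LLAMA,
        output_hidden_states=True,
        output_attentions=True,
        torch_dtype=torch.bfloat16,
    )
    pipe = HiDreamImagePipeline.from_pretrained(
        repo_id,
        tokenizer_4=tokenizer_4,
        text_encoder_4=text_encoder_4,
        torch_dtype=torch.bfloat16,
    ).to("cuda")
    return pipe(
        prompt,
        height=1024,
        width=1024,
        generator=torch.Generator("cuda").manual_seed(seed),
        **VARIANT_SETTINGS[repo_id],
    ).images[0]

generate("HiDream-ai/HiDream-I1-Fast", "future city night scene").save("fast.png")
```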
- Run the Gradio demo
The project provides a Gradio interface for interactive image generation. Start it with:
```bash
python gradio_demo.py
```
Once launched, open the local web interface and enter a text description to generate an image.
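If the bundled gradio_demo.py is not at hand, a minimal interface in the same spirit can be assembled by hand. A sketch that assumes the `pipe` object built in the inference snippet above and a `pip install gradio`:

```python
import gradio as gr

# Minimal stand-in for the bundled demo: one prompt box in, one image out.
# `pipe` is the HiDreamImagePipeline constructed in the inference snippet above.
def txt2img(prompt: str):
    return pipe(prompt, height=1024, width=1024).images[0]

demo = gr.Interface(
    fn=txt2img,
    inputs=gr.Textbox(label="Prompt"),
    outputs=gr.Image(label="Generated image"),
    title="HiDream-I1 demo",
)
demo.launch()  # serves a local web UI, by default at http://127.0.0.1:7860
```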
Feature Walkthrough
- Text to image: Enter descriptive text in the Gradio interface, e.g. "A cat holding a sign that says 'HiDream.ai'". Select a model variant, adjust the resolution, and click Generate to get the image.
- Image editing: In the HiDream-E1-Full Hugging Face Space (https://huggingface.co/spaces/HiDream-ai/HiDream-E1-Full), upload an image and enter an editing command, such as "Change background to forest". The model adjusts the image according to the command while keeping the subject consistent.
- Model selection: The Full version is suited to high-quality generation, the Dev version to development and testing, and the Fast version to rapid prototyping.
Notes
- Hardware requirements: An NVIDIA GPU (e.g. A100, RTX 3090) with the Ampere architecture or newer is required. The 4-bit quantized version (hykilpikonna/HiDream-I1-nf4) can run with 16 GB of VRAM.
- License: You must accept the meta-llama/Meta-Llama-3.1-8B-Instruct community license and log in on Hugging Face:
```bash
huggingface-cli login
```
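The same login can be performed from Python, which is convenient in notebooks. A sketch; the token placeholder must be replaced with your own access token from the Hugging Face settings page:

```python
from huggingface_hub import login

# Equivalent to `huggingface-cli login`. The account must already have
# accepted the Meta-Llama-3.1-8B-Instruct community license.
login(token="hf_...")  # placeholder -- substitute your own token
```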
Application Scenarios
- Content creation
Creators can use HiDream-I1 to generate illustrations, advertising graphics, or concept art. For example, entering "future city night scene" produces a sci-fi style image for a novel cover or game design.
- Education and research
Researchers can use the model to run image generation experiments, test the effects of different prompts, or develop new applications under the MIT license.
- Commercial use
Businesses can generate product promotional images or marketing materials. The MIT license allows free use of the generated images without additional licensing.
FAQ
- What hardware does HiDream-I1 require?
An NVIDIA GPU (e.g. RTX 3090, A100) with the Ampere architecture or newer. The 4-bit quantized version runs with 16 GB of VRAM.
- How do I choose a model variant?
The Full version is best for high-quality generation, the Dev version for fast development, and the Fast version for quick generation at slightly lower quality.
- Can the generated images be used commercially?
Yes. The MIT license allows the generated images to be used for personal, scientific, and commercial purposes.
- How do I fix a model download failure?
Download the meta-llama/Meta-Llama-3.1-8B-Instruct model from Hugging Face in advance and place it in the cache directory, as shown in the sketch below.
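Concretely, once the weights are cached (for example with the snapshot_download sketch in the installation section), loading can be forced to use the cache only. HF_HUB_OFFLINE is a standard Hugging Face Hub environment variable; it must be set before the libraries are imported:

```python
import os

# Must be set before transformers/diffusers are imported: forces all
# loaders to read from the local cache instead of the network, so a
# flaky connection can no longer interrupt model loading.
os.environ["HF_HUB_OFFLINE"] = "1"

from diffusers import HiDreamImagePipeline  # noqa: E402
# ...then build the pipeline exactly as in the inference snippet above.
```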