
xAI Grok Imagine API:生产环境开箱即用的多模态音视频生成服务
xAI 于 2026 年 1 月正式推出了 Grok Imagine API,这是一项面向开发者和企业的生产级多模态视频生成服务。该服务基于 xAI 内部研发的 “Aurora” 模型构建,核心能力在于能够根据文本提...

OmniInsert: A tool for inserting any reference image into video without masking
OmniInsert is a research project developed by ByteDance Intelligent Creation Lab. It is a tool that seamlessly inserts any reference object into a video without the use of a mask. In the traditional video editing process, if you want to add a new object to the video, you usually need to manually create a precise “mask” to frame out the...

Qwen-Image-Edit: an AI model for editing images based on textual commands
Qwen-Image-Edit is an image editing AI model developed by Alibaba Tongyi Qianqian team. It is trained based on the Qwen-Image model with 20 billion parameters, and its core function is to allow users to modify images through simple Chinese or English text commands. This model utilizes both visual semantic understanding and...

Qwen-Image: an AI tool for generating high-fidelity images with accurate text rendering
Qwen-Image is a 20B parametric multimodal diffusion model (MMDiT) developed by the Qwen team, specializing in high-fidelity image generation and accurate text rendering. It excels in complex text processing (especially Chinese and English) and image editing. The model supports a variety of art styles, such as realistic, anime, and high-definition posters,...

SkyworkUniPic: An Open Source Model for Unified Processing Image Understanding and Generation
SkyworkUniPic is an open-source multimodal model developed by SkyworkAI that focuses on image understanding, text-generated images, and image editing. It integrates three visual language tasks using a single 150 million parameter architecture. Users can run 102 on consumer GPUs such as RTX 4090...

FLUX.1 Krea: Free Open Source Tool for Generating Highly Realistic Images
FLUX.1 Krea [dev] is an open source image generation tool developed by Black Forest Labs in collaboration with Krea AI and hosted on the Hugging Face platform. It is based on a 12 billion parameter rectified flow transfo...

Diffuman4D: Generating High-Fidelity 4D Human Body Views from Sparse Video
Diffuman4D is a project developed by the ZJU3DV research team at Zhejiang University, focusing on generating high-fidelity 4D human body views from sparse view videos. The project combines spatio-temporal diffusion modeling and 4DGS (4D Gaussian Splatting) technology, which solves the difficulty of traditional methods in generating sparse input...

Launch of FLUX.1 Kontext and BFL Playground
Today, we are proud to release FLUX.1 Kontext -- a set of generative flow matching models to support image generation and editing. Unlike existing text-based image generation models, the FLUX.1 Kontext family supports context-sensitive...

PartCrafter: Generating Editable 3D Part Models from a Single Image
PartCrafter is an innovative open source project focused on generating editable 3D part models from a single RGB image. It uses advanced structured 3D generation techniques to simultaneously generate multiple semantically meaningful 3D parts from a single image, suitable for game development, product design and other fields. The project is based on a pre-trained 3D mesh diffusion transformer...

HiDream-I1
HiDream-I1 is an open source image generation base model with 17 billion parameters to quickly generate high quality images. Users only need to enter a text description, the model can generate a variety of styles including realistic, cartoon, art and other images. The project is developed by the HiDream.ai team, hosted on GitHub under the MIT license...

Imagen 4
Google DeepMind's recently launched Imagen 4 model, the latest iteration of its image generation technology, is quickly becoming an industry sensation. The model has made significant progress in improving image richness, detail accuracy, and generation speed, working to bring users' imaginations to life in ways never before possible. Currently, the use of ...

StarVector: Basic model for generating SVG vector graphics from images and text
StarVector is an open source project created by developers such as Juan A. Rodriguez to convert images and text into Scalable Vector Graphics (SVG). This tool uses a visual language model that understands image content and textual instructions to generate high-quality SVG code. Its core features are...

AnyText
AnyText is a revolutionary multilingual visual text generation and editing tool developed based on the diffusion model. It generates natural, high-quality multilingual text in images and supports flexible text editing capabilities. It was developed by a team of researchers and received Spotlight honors at the ICLR 2024 conference.AnyText's...

OmniGen
OmniGen is a “universal” image generation model developed by VectorSpaceLab that allows users to create diverse and contextually rich visuals with simple text prompts or multimodal inputs. It is particularly suited to scenes that require character recognition and consistent character rendering. Users can upload up to three images...

CogView3: Wisdom Spectrum Light Word open source cascade diffusion text to generate image models
Comprehensive Introduction CogView3 is an advanced text generation image system developed by Tsinghua University and Think Tank Team (Smart Spectrum Qingyan). It is based on the cascading diffusion model and generates high-resolution images through multiple stages.The main features of CogView3 include multi-stage generation, innovative architecture and efficient performance, which is suitable for art creation, advertisement design, game development, and many other...
Top