ShareGPT-4o-Image is an open-source, large-scale multimodal image generation dataset released by the FreedomIntelligence team to help open-source multimodal models align with GPT-4o-level image generation. The dataset contains 91K high-quality samples divided into two categories:
- 45K text-to-image samples: images generated from text prompts alone.
- 46K text-and-image-to-image samples: image edits based on an input image and a text prompt.
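The main structural difference between the two categories is whether an input image accompanies the text prompt. The Python sketch below shows one way a training pipeline might route samples by task type. The `Sample` record and its field names (`prompt`, `input_image`, `output_image`) are hypothetical illustrations, not the dataset's actual Parquet schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


# Hypothetical record layout: these field names are assumptions for
# illustration only, not the published ShareGPT-4o-Image schema.
@dataclass
class Sample:
    prompt: str                         # text prompt or editing instruction
    output_image: str                   # path/ID of the target image
    input_image: Optional[str] = None   # set only for editing samples


def task_type(sample: Sample) -> str:
    """Label a sample as text-to-image or text-and-image-to-image."""
    if sample.input_image is None:
        return "text-to-image"
    return "text+image-to-image"


def count_tasks(samples: Iterable[Sample]) -> Counter:
    """Count how many samples fall into each task category."""
    return Counter(task_type(s) for s in samples)


if __name__ == "__main__":
    demo = [
        Sample(prompt="A red fox in a snowy forest", output_image="gen_0001.png"),
        Sample(prompt="Turn the sky into a sunset", output_image="edit_0001.png",
               input_image="src_0001.png"),
        Sample(prompt="A watercolor lighthouse", output_image="gen_0002.png"),
    ]
    print(count_tasks(demo))
```

In a real pipeline the task label would decide whether the model is conditioned on an input image; check the dataset card for the actual column names before adapting this sketch.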
The dataset is stored in Parquet format, is about 20.7 MB in size, contains 92,256 rows, and is freely available on Hugging Face and GitHub. Its core features are:
- Supports multimodal model training to improve image generation and editing.
- Provides the community with a high-quality resource for developing open-source multimodal AI.
- Was used to train the companion Janus-4o model, which outperforms its predecessor, Janus-Pro.
This answer comes from the article "ShareGPT-4o-Image: an open source multimodal image generation dataset".