How to achieve multimodal content generation for text and images?

2025-08-22

703

Multimodal Support Program

geminicli2api supports simultaneous processing of text and image inputs, providing solutions for content creation, education, and more:

API Call Methods::
- OpenAI-compatible interface: viafilesField to submit image path (supports local files/URLs)
- Native Gemini interface: inpartsThe array containsfileDataboyfriend
file formatSupport JPEG/PNG/GIF and other common formats, single file is recommended to be less than 4MB.
mixing instruction: Include both text instructions and image references in the message (e.g., "Describe the main object in this picture").