
How to implement multimodal (text+image) content generation with geminicli2api?

2025-08-22


Multimodal (text + image) generation can be implemented in either of two ways:

1. OpenAI-compatible interface:
Add a files parameter to the chat.completions.create request:
{ "model": "gemini-2.5-pro", "messages": [{"role": "user", "content": "Describe the contents of the image"}], "files": ["image.jpg"] }

2. Native Gemini API:
Build a multi-part request for the generateContent endpoint:
"parts": [ {"text": "Describe this image"}, {"file_data": {"mime_type": "image/jpeg", "file_uri": "image.jpg"}} ]

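The two request shapes above can be assembled programmatically. What follows is a minimal Python sketch, not the project's own client code. The helper names build_openai_request and build_gemini_request are hypothetical, and the top-level "contents" wrapper follows the public Gemini REST format, not anything stated in this answer. Note that files is not part of the standard OpenAI chat-completions schema; that the proxy accepts it as shown is taken from the answer above and is not verified here. The sketch only builds the JSON bodies. Sending them is left out because the proxy's URL, port, and authentication are not given.

```python
import json


def build_openai_request(prompt, image_paths, model="gemini-2.5-pro"):
    """Body for an OpenAI-compatible chat.completions request.

    The top-level "files" key mirrors the example in this answer; it is
    not a standard OpenAI parameter (assumption: the proxy reads it).
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "files": list(image_paths),
    }


def build_gemini_request(prompt, file_uri, mime_type="image/jpeg"):
    """Body for a native generateContent request with text + image parts."""
    parts = [
        {"text": prompt},
        {"file_data": {"mime_type": mime_type, "file_uri": file_uri}},
    ]
    # Assumption: the parts are wrapped in a single user turn under
    # "contents", as in the public Gemini REST format.
    return {"contents": [{"role": "user", "parts": parts}]}


if __name__ == "__main__":
    openai_body = build_openai_request(
        "Describe the contents of the image", ["image.jpg"]
    )
    gemini_body = build_gemini_request("Describe this image", "image.jpg")
    print(json.dumps(openai_body, indent=2))
    print(json.dumps(gemini_body, indent=2))
```

If the official openai Python client is used, non-standard fields such as files would normally be passed through the extra_body argument of chat.completions.create, since the client does not accept unknown keyword arguments directly.
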
Technical Details:
- Supports JPEG, PNG, and other common image formats
- A single request can upload up to 10 MB of content
- Images are base64-encoded before being transferred
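
These constraints can be checked on the client side before a request is built. Below is a minimal Python sketch under stated assumptions: encode_image is a hypothetical helper, "10 MB" is interpreted as 10 * 1024 * 1024 bytes measured on the raw file (the answer does not say whether the limit applies before or after encoding), and the MIME type is guessed from the file extension, not from the file contents.

```python
import base64
import mimetypes
from pathlib import Path

# Assumption: the 10 MB limit is 10 MiB of raw (pre-encoding) data.
MAX_BYTES = 10 * 1024 * 1024


def encode_image(path):
    """Return (mime_type, base64_string) for an image file.

    Raises ValueError if the extension does not map to an image MIME
    type or if the file is larger than MAX_BYTES.
    """
    path = Path(path)
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"unsupported or unknown image type: {path.name}")
    raw = path.read_bytes()
    if len(raw) > MAX_BYTES:
        raise ValueError(
            f"{path.name} is {len(raw)} bytes, over the {MAX_BYTES}-byte limit"
        )
    return mime_type, base64.b64encode(raw).decode("ascii")
```

Base64 output is roughly one third larger than its input, so a file close to the limit may exceed it once encoded if the limit turns out to apply to the encoded payload.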

This answer comes from the article "geminicli2api: Proxy tool to convert Gemini CLI to OpenAI-compatible APIs".
