There are two main ways to get an AI response using Gemini-CLI-2-API:
Basic Chat Request
Send a POST request to the /v1/chat/completions endpoint, for example:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-key" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [
      {"role": "system", "content": "You are a translation assistant"},
      {"role": "user", "content": "Translate this Chinese sentence into English"}
    ]
  }'
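Since the service mirrors the OpenAI chat completions format, a successful non-streaming response should arrive as a single JSON object roughly in the standard OpenAI shape sketched below (all field values here are illustrative, not actual output from the service):

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "gemini-2.5-pro",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "..."},
      "finish_reason": "stop"
    }
  ]
}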
Streaming Response Requests
To get the response in real time, set "stream": true:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-key" \
  -d '{
    "model": "gemini-2.5-pro",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a poem about spring"}
    ]
  }'
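If the service follows the OpenAI streaming convention, the response is delivered as Server-Sent Events: each data: line carries a JSON chunk with an incremental delta, and the stream ends with data: [DONE]. The chunk below is an illustrative sketch of that format, not captured output. Adding curl's -N (--no-buffer) flag prints chunks as they arrive:

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Spring"},"finish_reason":null}]}

data: [DONE]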
In addition, the list of supported models can be queried via the /v1/models endpoint, as shown below. Note that the request format follows the OpenAI API specification exactly, so it integrates easily with existing tools.
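For example, assuming the same Bearer-key authentication as the chat endpoint:

curl http://localhost:8000/v1/models \
  -H "Authorization: Bearer sk-your-key"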
This answer comes from the article "Gemini-CLI-2-API: Converting the Gemini CLI to an OpenAI-compatible Native API Service".