The main steps for deploying the Step3 API service via vLLM are as follows:
- Start the API server: execute the command
python -m vllm.entrypoints.openai.api_server --model stepfun-ai/step3 --port 8000
The service will listen on local port 8000. (This is vLLM's OpenAI-compatible server, which serves the completions endpoint used below.)
- Send an API request: issue an HTTP POST to
http://localhost:8000/v1/completions
with a JSON body containing parameters such as model, prompt, and max_tokens.
- Process the response: the API returns the generated result in JSON format, which can be parsed and used directly.
Requests can also carry multimodal content, such as an image URL submitted together with a text prompt. vLLM's efficient inference makes it well suited to real-time production scenarios, where it can handle highly concurrent requests effectively.
This answer comes from the article "Step3: Efficient generation of open-source large models for multimodal content".