
How do you handle multimodal inputs (e.g. image + text) and function calls in AIRouter?

2025-08-21

AIRouter supports multimodal inputs and function calls by extending the API as follows:

  • Multimodal inputs:
    1. Convert the image to a Base64 string, for example:
    import base64
    with open("image.jpg", "rb") as f: img_base64 = base64.b64encode(f.read()).decode()
    2. Call the generate_mm method, specifying a model that supports multimodal input (e.g., GPT-4o):
    LLM_Wrapper.generate_mm(model_name="gpt4o_mini", prompt="Describe the image", img_base64=img_base64)
  • Function calls:
    1. Define a list of tools (e.g., a weather query function), each with a name, a description, and parameters.
    2. Trigger the call with the function_calling method, for example:
    LLM_Wrapper.function_calling(model_name="gpt4o_mini", prompt="Beijing weather", tools=tools)
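The two inputs that these calls expect, a Base64 image string and a tools list, can be prepared roughly as follows. This is a minimal sketch, not AIRouter's documented format: the helper name encode_image_base64, the get_weather tool, and its city parameter are hypothetical, and the OpenAI-style function schema is an assumption, since the answer does not say which schema AIRouter accepts.

```python
import base64


def encode_image_base64(path: str) -> str:
    """Read a file and return its bytes as a Base64-encoded ASCII string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


# Hypothetical weather tool, written in the OpenAI-style function schema.
# Assumption: AIRouter forwards a list in this shape to the underlying model.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. Beijing",
                    },
                },
                "required": ["city"],
            },
        },
    }
]

# The results would then be passed to the calls named in the text, e.g.:
#   img_base64 = encode_image_base64("image.jpg")
#   LLM_Wrapper.generate_mm(model_name="gpt4o_mini", prompt="...", img_base64=img_base64)
#   LLM_Wrapper.function_calling(model_name="gpt4o_mini", prompt="...", tools=tools)
```

Wrapping the encoding step in a function keeps the file handle scoped to the read, and declaring city as required lets the model know it cannot call the tool without that argument.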

Note: Make sure the selected model supports the feature you are using (e.g., GPT-4o supports multimodal input); otherwise the call returns an error.
