
12-Factor Agents 7. Using Tool Calls to Communicate with People

2025-07-22

By default, Large Language Model (LLM) APIs rely on a fundamental, high-stakes token choice: do we return plain-text content, or do we return structured data?


You are putting a lot of weight on the choice of that first token. In the "the weather in tokyo" case, it is

"the"

but in the fetch_weather case, it is a special token that marks the beginning of a JSON object.

|JSON>

You may get better results by having the LLM always output JSON and then declare its intent with some natural-language tokens (e.g. request_human_input or done_for_now), as opposed to calling a "standard" tool like check_weather_in_city.

Again, this may not yield any performance gain, but you should experiment and make sure you are free to try unconventional approaches to get the best results.
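As a minimal sketch of the "always output JSON" idea, assume the model has been instructed to reply with a single JSON object carrying an intent field. The function below parses that reply and routes on the intent. The name route_step, the HUMAN_INTENTS set, and the returned labels are hypothetical and chosen only for illustration; they are not part of any LLM library.

```python
import json

# Assumption: these intents hand control back to a human instead of calling an API.
HUMAN_INTENTS = {"request_human_input", "done_for_now"}


def route_step(raw_output: str) -> str:
    """Parse an always-JSON model reply and classify it by its 'intent' field."""
    step = json.loads(raw_output)  # raises json.JSONDecodeError on plain-text replies
    if not isinstance(step, dict) or "intent" not in step:
        raise ValueError("model output must be a JSON object with an 'intent' field")
    intent = step["intent"]
    if intent in HUMAN_INTENTS:
        return f"contact_human:{intent}"
    return f"run_tool:{intent}"
```

Because every reply goes through the same JSON path, contacting a human becomes just another intent rather than a separate plain-text branch.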

from typing import List, Literal

class Options:
    urgency: Literal["low", "medium", "high"]
    format: Literal["free_text", "yes_no", "multiple_choice"]
    choices: List[str]

# Tool definition for human interaction
class RequestHumanInput:
    intent: Literal["request_human_input"]
    question: str
    context: str
    options: Options

# Example usage in the agent loop (a fragment from inside an async function)
if nextStep.intent == "request_human_input":
    thread.events.append({
        "type": "human_input_requested",
        "data": nextStep,
    })
    thread_id = await save_state(thread)
    await notify_human(nextStep, thread_id)
    return  # break the loop and wait for a response to come back with the thread ID
else:
    ...  # other cases

Later, you might receive a webhook from a system that handles Slack, email, SMS, or other events.

@app.post('/webhook')
async def webhook(req: Request):  # async, because the body awaits
    # Assumes a FastAPI/Starlette-style Request, where the JSON body is awaited
    body = await req.json()
    thread_id = body["threadId"]
    thread = await load_state(thread_id)
    thread.events.append({
        "type": "response_from_human",
        "data": body,
    })
    # ... simplified for brevity; you likely don't want to block the web worker here
    next_step = await determine_next_step(thread_to_prompt(thread))
    thread.events.append(next_step)
    result = await handle_next_step(thread, next_step)
    # todo - loop, break, or whatever you want
    return {"status": "ok"}

The above includes patterns from Factor 5 - Unify Execution State and Business State, Factor 8 - Take Control of Your Control Flow, Factor 3 - Take Control of Your Context Window, Factor 4 - Tools as Structured Outputs, and several others.

If we use the XML-like format from Factor 3 - Take Control of Your Context Window, our context window after a few turns might look like this:

(snipped for brevity)
<slack_message>
From: @alex
Channel: #deployments
Text: Can you deploy backend v1.2.3 to production?
Thread: []
</slack_message>
<request_human_input>
intent: "request_human_input"
question: "Would you like to proceed with deploying v1.2.3 to production?"
context: "This is a production deployment that will affect live users."
options: {
urgency: "high"
format: "yes_no"
}
</request_human_input>
<human_response>
response: "yes please proceed"
approved: true
timestamp: "2024-03-15T10:30:00Z"
user: "alex@company.com"
</human_response>
<deploy_backend>
intent: "deploy_backend"
tag: "v1.2.3"
environment: "production"
</deploy_backend>
<deploy_backend_result>
status: "success"
message: "Deployment v1.2.3 to production completed successfully."
timestamp: "2024-03-15T10:30:00Z"
</deploy_backend_result>
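One way such a context window could be produced from the stored event list is sketched below. It assumes each event is a dict with a type and a flat data mapping, as in the earlier snippets. The helper names event_to_prompt and events_to_prompt are hypothetical, and the exact rendering rules (JSON-encoding each value) are an assumption rather than a documented format.

```python
import json
from typing import Any


def event_to_prompt(event: dict[str, Any]) -> str:
    """Render one event as an XML-like block named after its type."""
    tag = event["type"]
    lines = [f"<{tag}>"]
    for key, value in event["data"].items():
        # json.dumps quotes strings and renders True as true; nested dicts stay on one line
        lines.append(f"{key}: {json.dumps(value)}")
    lines.append(f"</{tag}>")
    return "\n".join(lines)


def events_to_prompt(events: list[dict[str, Any]]) -> str:
    """Join all rendered events, in order, into a single prompt string."""
    return "\n".join(event_to_prompt(event) for event in events)
```

Since human requests and human responses are ordinary events in the list, they are serialized the same way as tool calls and tool results.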

Benefits:

Clear instructions: Using different tools for different types of human contact lets the LLM be more specific about what it needs.
Inner vs. outer loop: Enables agent workflows outside the traditional ChatGPT-style interface, where the control flow and context initialization may be Agent->Human rather than Human->Agent (e.g., agents triggered by cron or by events).
Multi-user access: Input from different users can be easily tracked and coordinated through structured events.
Multi-agent: The simple abstraction can be easily extended to support Agent->Agent requests and responses.
Durability: Combined with Factor 6 - Start/Suspend/Resume via a Simple API, this makes for durable, reliable, and introspectable multi-person workflows.