12-Factor Agents, Factor 7: Contact Humans with Tool Calls

2025-07-22

By default, LLM APIs rely on a fundamental, high-stakes token choice: do we return plain-text content, or do we return structured data?

You are putting a lot of weight on the choice of that first token, which in the the weather in tokyo case is

"the"

but in the fetch_weather case it is a special token that marks the start of a JSON object.

|JSON>

You might get better results by having the LLM always output JSON and then declare its intent with natural-language tokens such as request_human_input or done_for_now (as opposed to a "proper" tool like check_weather_in_city).

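Again, this might not give you any performance boost, but you should experiment, and make sure you are free to try unconventional approaches to get the best results.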
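A minimal sketch of the always-JSON idea, assuming the model's reply arrives as a raw string. The parse_next_step and route helpers and the HUMAN_INTENTS set are hypothetical names introduced for illustration; only the intent names come from the text above.

```python
import json

# Intents that mean "hand control to a human" rather than "run a tool".
# The intent names come from the text; grouping them this way is an assumption.
HUMAN_INTENTS = {"request_human_input", "done_for_now"}


def parse_next_step(raw: str) -> dict:
    """Parse the model's reply, which is expected to always be a JSON object."""
    step = json.loads(raw)  # raises a ValueError subclass on plain text
    if not isinstance(step, dict) or "intent" not in step:
        raise ValueError("expected a JSON object with an 'intent' field")
    return step


def route(step: dict) -> str:
    """Decide who handles the step based on its declared intent."""
    return "human" if step["intent"] in HUMAN_INTENTS else "tool"


ask = parse_next_step('{"intent": "request_human_input", "question": "Deploy now?"}')
call = parse_next_step('{"intent": "check_weather_in_city", "city": "tokyo"}')
print(route(ask), route(call))
```

Because every reply is parsed the same way, a plain-text answer such as the weather in tokyo fails loudly at the parsing step instead of being silently treated as a final message.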

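from typing import List, Literal

class Options:
    urgency: Literal["low", "medium", "high"]
    format: Literal["free_text", "yes_no", "multiple_choice"]
    choices: List[str]

# Tool definition for human interaction
class RequestHumanInput:
    intent: Literal["request_human_input"]
    question: str
    context: str
    options: Options

# Example usage in the agent loop. This fragment lives inside an async
# function; save_state and notify_human are defined elsewhere.
if next_step.intent == "request_human_input":
    thread.events.append({
        "type": "human_input_requested",
        "data": next_step,
    })
    thread_id = await save_state(thread)
    await notify_human(next_step, thread_id)
    return  # Break the loop and wait for a response to come back with the thread ID
else:
    ...  # other cases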
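To see the break-and-wait behavior end to end, here is a self-contained sketch of the loop with in-memory stand-ins. The Thread class, STORE, NOTIFIED, and the stubbed determine_next_step are all hypothetical; save_state and notify_human keep the names used above, but their bodies are assumptions, not a real persistence or messaging layer.

```python
import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass
class Thread:
    events: list = field(default_factory=list)


STORE: dict = {}     # hypothetical in-memory state store
NOTIFIED: list = []  # stands in for Slack, email, or SMS delivery


async def save_state(thread: Thread) -> str:
    thread_id = str(uuid.uuid4())
    STORE[thread_id] = thread
    return thread_id


async def notify_human(step: dict, thread_id: str) -> None:
    NOTIFIED.append((thread_id, step["question"]))


async def determine_next_step(thread: Thread) -> dict:
    # Stand-in for the LLM call: this stub always asks a human.
    return {"intent": "request_human_input", "question": "Proceed with deploy?"}


async def agent_loop(thread: Thread):
    while True:
        next_step = await determine_next_step(thread)
        if next_step["intent"] == "request_human_input":
            thread.events.append({"type": "human_input_requested", "data": next_step})
            thread_id = await save_state(thread)
            await notify_human(next_step, thread_id)
            return thread_id  # break the loop; a later webhook resumes the thread
        elif next_step["intent"] == "done_for_now":
            return None
        else:
            raise ValueError(f"unhandled intent: {next_step['intent']}")


thread_id = asyncio.run(agent_loop(Thread()))
print(thread_id in STORE, len(NOTIFIED))
```

The important design point is that the loop returns instead of waiting: nothing stays in memory except what save_state persisted, so the human can answer minutes or days later.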

Later, you might receive a webhook from a system that handles Slack, email, SMS, or other events.

# Assumes a FastAPI/Starlette-style app and Request; adapt to your framework.
@app.post("/webhook")
async def webhook(req: Request):
    body = await req.json()
    thread_id = body["threadId"]
    thread = await load_state(thread_id)
    thread.events.append({
        "type": "response_from_human",
        "data": body,
    })
    # ... simplified for brevity; you likely don't want to block the web worker here
    next_step = await determine_next_step(thread_to_prompt(thread))
    thread.events.append(next_step)
    result = await handle_next_step(thread, next_step)
    # TODO: loop, break, or whatever you want
    return {"status": "ok"}
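One way to avoid blocking the web worker is to acknowledge the webhook immediately and let the agent continue in a background task. The sketch below uses only asyncio from the standard library; THREADS, BACKGROUND, resume_agent, and on_webhook are hypothetical names, and a production system would more likely hand the work to a durable queue.

```python
import asyncio

THREADS = {"t-1": {"events": []}}  # hypothetical in-memory state store
BACKGROUND = set()  # strong references so pending tasks are not garbage-collected


async def resume_agent(thread_id: str, payload: dict) -> None:
    """Record the human's reply; a real agent would re-enter its loop here."""
    thread = THREADS[thread_id]
    thread["events"].append({"type": "response_from_human", "data": payload})
    await asyncio.sleep(0)  # placeholder for the slow LLM call and tool execution


async def on_webhook(payload: dict) -> dict:
    """Acknowledge right away and continue the agent in the background."""
    task = asyncio.create_task(resume_agent(payload["threadId"], payload))
    BACKGROUND.add(task)
    task.add_done_callback(BACKGROUND.discard)
    return {"status": "ok"}


async def main() -> dict:
    ack = await on_webhook({"threadId": "t-1", "response": "yes please proceed"})
    # Only so this demo finishes the background work before the event loop closes.
    await asyncio.gather(*BACKGROUND)
    return ack


ack = asyncio.run(main())
print(ack, THREADS["t-1"]["events"][-1]["type"])
```

An in-process task is lost if the worker restarts, which is why this is only a sketch of the shape of the solution rather than a recommendation.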

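The above includes patterns from Factor 5 – Unify execution state and business state, Factor 8 – Own your control flow, Factor 3 – Own your context window, and Factor 4 – Tools are just structured outputs, along with several others.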

If we were using the XML-ish format from Factor 3 – Own your context window, our context window after a few turns might look like this:

(snipped for brevity)
<slack_message>
    From: @alex
    Channel: #deployments
    Text: Can you deploy backend v1.2.3 to production?
    Thread: []
</slack_message>
<request_human_input>
    intent: "request_human_input"
    question: "Would you like to proceed with deploying v1.2.3 to production?"
    context: "This is a production deployment that will affect live users."
    options: {
        urgency: "high"
        format: "yes_no"
    }
</request_human_input>
<human_response>
    response: "yes please proceed"
    approved: true
    timestamp: "2024-03-15T10:30:00Z"
    user: "alex@company.com"
</human_response>
<deploy_backend>
    intent: "deploy_backend"
    tag: "v1.2.3"
    environment: "production"
</deploy_backend>
<deploy_backend_result>
    status: "success"
    message: "Deployment v1.2.3 to production completed successfully."
    timestamp: "2024-03-15T10:30:00Z"
</deploy_backend_result>
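A context window like this can be produced by a small serializer. The following is a sketch only, assuming each event is a dict with a type and a data mapping; event_to_prompt is a hypothetical helper, and this thread_to_prompt takes a plain list of events, which may differ from the function of the same name used in the webhook handler.

```python
import json


def event_to_prompt(event: dict) -> str:
    """Render one event as a tag named after its type, one `key: value` line per field."""
    lines = [f"<{event['type']}>"]
    for key, value in event["data"].items():
        # json.dumps quotes strings and renders True as true, matching the
        # example above; nested objects end up on one line, a simplification.
        lines.append(f"    {key}: {json.dumps(value)}")
    lines.append(f"</{event['type']}>")
    return "\n".join(lines)


def thread_to_prompt(events: list) -> str:
    """Concatenate all events, in order, into one prompt string."""
    return "\n".join(event_to_prompt(event) for event in events)


events = [
    {"type": "human_response",
     "data": {"response": "yes please proceed", "approved": True}},
    {"type": "deploy_backend",
     "data": {"intent": "deploy_backend", "tag": "v1.2.3", "environment": "production"}},
]
prompt = thread_to_prompt(events)
print(prompt)
```

Keeping the serializer this small makes it easy to experiment with the format, since the human request and the human response are just two more event types.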

Benefits:

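Clear instructions: Separate tools for different types of human contact allow for more specificity from the LLM
Inner vs. outer loop: Enables agent workflows outside the traditional ChatGPT-style interface, where the control flow and context initialization may be Agent->Human rather than Human->Agent (for example, agents kicked off by a cron job or an event)
Multiple human access: Input from different humans can be easily tracked and coordinated through structured events
Multi-agent: The simple abstraction can be easily extended to support Agent->Agent requests and responses
Durable: Combined with Factor 6 – Launch/pause/resume with simple APIs, this makes for durable, reliable, and introspectable multiplayer workflows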
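As one illustration of the multiple-human point, structured events make it straightforward to see who has answered and what each person last said. This sketch assumes events shaped like the response_from_human entries appended by the webhook handler, each carrying a user field as in the human_response example; latest_response_by_user is a hypothetical helper.

```python
def latest_response_by_user(events: list) -> dict:
    """Map each user to the most recent response they sent on this thread."""
    latest = {}
    for event in events:
        if event["type"] == "response_from_human":
            latest[event["data"]["user"]] = event["data"]
    return latest


events = [
    {"type": "human_input_requested", "data": {"question": "Deploy v1.2.3?"}},
    {"type": "response_from_human",
     "data": {"user": "alex@company.com", "approved": False}},
    {"type": "response_from_human",
     "data": {"user": "sam@company.com", "approved": True}},
    {"type": "response_from_human",
     "data": {"user": "alex@company.com", "approved": True}},
]
responses = latest_response_by_user(events)
all_approved = all(r["approved"] for r in responses.values())
print(sorted(responses), all_approved)
```

The sam@company.com address and the approval rule are made up for the example; the point is only that the event log already contains everything needed to coordinate several people.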