
What is the TSQ score in Agent Leaderboard, and how does it help developers choose a model?

2025-08-30


TSQ score explained

TSQ (Tool Selection Quality) is the core evaluation metric of Agent Leaderboard. It measures how accurately an AI agent uses tools.

Assessment dimensions

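Tool selection accuracy: whether the model correctly identifies and uses the required tools.
Multi-tool coordination: how well the model coordinates multiple tools in complex tasks.
Scenario adaptability: how stable the model's performance is across domains (such as mathematics, retail, and airlines).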
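
As a rough illustration of the first dimension only, tool selection accuracy can be thought of as the fraction of test cases in which the model picked exactly the expected set of tools. This is a simplified sketch under that assumption, not the leaderboard's actual scoring method; the function name, tool names, and test cases are all hypothetical.

```python
def tool_selection_accuracy(expected, predicted):
    """Fraction of cases where the predicted tool set equals the expected one.

    Illustrative only: exact set match per case, call order ignored.
    """
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        return 0.0
    hits = sum(1 for e, p in zip(expected, predicted) if set(e) == set(p))
    return hits / len(expected)


# Hypothetical test cases: each entry lists the tools one task requires.
expected = [["search_flights"], ["get_price", "apply_discount"], ["calculator"]]
predicted = [["search_flights"], ["get_price"], ["calculator"]]

# The second case misses a required tool, so two of three cases match.
accuracy = tool_selection_accuracy(expected, predicted)
```

Exact set matching is a deliberately strict choice here; a real evaluation would also need to look at tool arguments and at coordination across multiple calls.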

Practical advice

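Based on the TSQ score:

High-scoring models (0.85+): suited to complex workflow scenarios (for example, GPT-4o performs well on multi-tool tasks).
Low- and mid-scoring models: worth considering for simple API interactions or budget-first projects (for example, Gemini-2.0 Flash costs only $0.15 per million tokens).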
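
The rule of thumb above can be sketched as a simple threshold check. The 0.85 cutoff comes from the text; the function name, the model names, and the scores below are placeholders for illustration, not published leaderboard results.

```python
HIGH_TSQ_THRESHOLD = 0.85  # cutoff quoted in the text above


def recommend_use(tsq_score, threshold=HIGH_TSQ_THRESHOLD):
    """Map a TSQ score in [0, 1] to a suggested kind of workload."""
    if not 0.0 <= tsq_score <= 1.0:
        raise ValueError("TSQ score must be between 0 and 1")
    if tsq_score >= threshold:
        return "complex workflows"
    return "simple API interactions or budget-first projects"


# Placeholder scores, NOT real leaderboard data.
candidates = {"model-a": 0.90, "model-b": 0.70}

recommendations = {name: recommend_use(score) for name, score in candidates.items()}
```

In practice the score would be only one input; cost per token and the specific domains a project covers would likely weigh in as well.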

This answer comes from the article "Agent Leaderboard: AI Agent Performance Evaluation Rankings".


