CoAgents achieves competent tool use by breaking a tool-use task down across three specialized agents:
- Basic Agent: translates user intent into executable instructions, solving the "what to do" problem.
- Executing Agent: specializes in the actual invocation of tools/APIs, solving the "how to do it" problem.
- Observation Agent: extracts structured information from the raw returned data, solving the "what is the result" problem.
The three agents form a closed-loop workflow that can be iteratively refined through environmental feedback when execution errors occur. For example, in the TMDB case, the user inputs "find sci-fi movie", the Basic Agent generates an API query command, the Executing Agent calls the /search interface, and the Observation Agent extracts key fields such as title and rating from the JSON result. This division of labor lets an LLM use complex tools more accurately.
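To make the division of labor concrete, here is a minimal Python sketch of that closed loop applied to the TMDB example. The function names (`basic_agent`, `executing_agent`, `observation_agent`, `run_closed_loop`), the fixed instruction returned by `basic_agent`, and the retry logic are illustrative assumptions, not the paper's actual implementation; only the general three-stage flow and the TMDB search example come from the text.

```python
# Minimal sketch of the three-agent closed loop (illustrative, not the paper's API).
import json
import requests  # assumed available for the HTTP call


def basic_agent(user_intent: str) -> dict:
    """'What to do': translate user intent into an executable instruction."""
    # In CoAgents this step is performed by an LLM; a fixed mapping stands in here.
    return {"endpoint": "/search/movie", "params": {"query": "science fiction"}}


def executing_agent(instruction: dict, api_base: str, api_key: str) -> dict:
    """'How to do it': actually invoke the tool/API."""
    resp = requests.get(
        api_base + instruction["endpoint"],
        params={**instruction["params"], "api_key": api_key},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()


def observation_agent(raw_result: dict) -> list[dict]:
    """'What is the result': extract structured fields (title/rating) from raw JSON."""
    return [
        {"title": m.get("title"), "rating": m.get("vote_average")}
        for m in raw_result.get("results", [])
    ]


def run_closed_loop(user_intent: str, api_base: str, api_key: str, max_retries: int = 2):
    """Run the three agents in sequence; retry on execution errors (environmental feedback)."""
    instruction = basic_agent(user_intent)
    for attempt in range(max_retries + 1):
        try:
            raw = executing_agent(instruction, api_base, api_key)
            return observation_agent(raw)
        except requests.RequestException as err:
            # In a full system the error message would be fed back to the Basic Agent
            # so it can revise the instruction; here we simply report and retry.
            print(f"attempt {attempt} failed: {err}")
    return []


if __name__ == "__main__":
    movies = run_closed_loop(
        "find sci-fi movie",
        api_base="https://api.themoviedb.org/3",
        api_key="YOUR_TMDB_KEY",  # placeholder credential
    )
    print(json.dumps(movies, indent=2))
```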
This answer is based on the article "CoAgents: A Framework for Learning to Use Tools through Multi-Agent Collaboration".