Tool Use & Function Calling
Overview
Tool use is one of the foundational mechanisms that turns a language model into an LLM agent. By exposing external functions — such as web search, code execution, or API calls — to the model, the agent can go beyond the static knowledge baked into its weights and interact with the world in real time. Research — LLM Agents.md
This capability is closely integrated with patterns like ReAct, where the model interleaves reasoning with tool invocations, observing the results before reasoning further. Without reliable tool use, the grounding that makes ReAct valuable breaks down.
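The reason-act-observe cycle described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `scripted_model` is a hypothetical stand-in for a real LLM call, and the `calculator` tool and the `Thought:`/`Action:`/`Observation:` line format are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry; the tool name and behavior are illustrative only.
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(sum(int(x) for x in expr.split("+"))),
}


def scripted_model(history: List[str]) -> Tuple[str, str, str]:
    """Stand-in for an LLM call: returns (thought, action, action_input)."""
    observations = [line for line in history if line.startswith("Observation:")]
    if not observations:
        return ("I need to add the numbers.", "calculator", "2+3")
    # An observation is available, so finish with its content.
    return ("I have the result.", "finish", observations[-1].split(": ", 1)[1])


def react_loop(question: str, max_steps: int = 5) -> str:
    """Alternate reasoning and tool calls until the model finishes."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, action_input = scripted_model(history)
        history.append(f"Thought: {thought}")
        if action == "finish":
            return action_input
        tool = TOOLS.get(action)
        observation = tool(action_input) if tool else f"unknown tool: {action}"
        history.append(f"Action: {action}[{action_input}]")
        # The observation is fed back so the next reasoning step is grounded in it.
        history.append(f"Observation: {observation}")
    return "stopped: step budget exhausted"


print(react_loop("What is 2 + 3?"))
```

The `max_steps` budget is a design choice for the sketch: it keeps the loop finite even if the model never emits `finish`.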
What Tool Use Enables
- Search: querying live or curated knowledge bases, addressing the training-data cutoff problem
- Code execution: running computations, parsing data, or automating tasks
- External APIs: interacting with services (calendars, databases, third-party platforms)
Together these extend the agent far beyond what a single prompt or chain-of-thought reasoning step could achieve. Research — LLM Agents.md
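One way to picture the three categories above is as entries in a single tool registry. The sketch below uses in-memory stubs only: the names `Tool`, `REGISTRY`, `search`, `run_code`, and `calendar_lookup` are hypothetical, and a real agent would back them with a search index, a sandboxed interpreter, and an HTTP API respectively.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Tool:
    name: str
    description: str  # text the model would see when choosing a tool
    run: Callable[[str], str]


# Stub data standing in for external systems (assumptions for this example).
_DOCS = {"react": "ReAct interleaves reasoning with tool calls."}
_CALENDAR = {"2024-01-01": "New Year's Day"}


def search(query: str) -> str:
    return _DOCS.get(query.lower(), "no results")


def run_code(expression: str) -> str:
    # Only handles "a*b" integer products; stands in for a sandboxed interpreter.
    left, right = expression.split("*")
    return str(int(left) * int(right))


def calendar_lookup(date: str) -> str:
    return _CALENDAR.get(date, "no events")


REGISTRY: Dict[str, Tool] = {
    tool.name: tool
    for tool in [
        Tool("search", "Look up a topic in a knowledge base.", search),
        Tool("run_code", "Evaluate a small computation.", run_code),
        Tool("calendar_lookup", "Fetch events for an ISO date.", calendar_lookup),
    ]
}


def render_tool_menu() -> str:
    """Text listing of the tools, as it might be placed in the prompt."""
    return "\n".join(f"- {t.name}: {t.description}" for t in REGISTRY.values())


def call_tool(name: str, argument: str) -> str:
    return REGISTRY[name].run(argument)


print(render_tool_menu())
print(call_tool("run_code", "6*7"))
```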
Key Challenges
1. Reliability of Tool Selection
The model must pick the *right* tool from those available. Research notes flag that an overly large tool set degrades selection accuracy: the more tools on offer, the more likely the model is to choose an inappropriate one. Research — LLM Agents.md
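A possible way to act on this is to shortlist a few candidate tools before the model chooses. The sketch below ranks tools by naive keyword overlap between the task and each description. The tool names, descriptions, and the overlap heuristic are all assumptions for illustration; the source notes do not prescribe a particular filtering method.

```python
from typing import Dict, List

# Hypothetical tool descriptions; a real set would be much larger.
TOOL_DESCRIPTIONS: Dict[str, str] = {
    "web_search": "search the web about current news and facts",
    "python_exec": "run python code to do math and parse data",
    "calendar_api": "create or list calendar events and meetings",
    "sql_query": "query a database table with sql",
}


def shortlist_tools(task: str, descriptions: Dict[str, str], k: int = 2) -> List[str]:
    """Return up to k tool names whose descriptions share words with the task."""
    task_words = set(task.lower().split())
    scored = []
    for name, description in descriptions.items():
        overlap = len(task_words & set(description.lower().split()))
        scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for overlap, name in scored[:k] if overlap > 0]


print(shortlist_tools("list my calendar meetings today", TOOL_DESCRIPTIONS))
```

Only the shortlisted tools would then be shown to the model, which trades some capability for a smaller selection problem.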
2. Argument Formation
Even when the correct tool is identified, the model may malform the arguments passed to it, causing calls to fail or return unexpected results. Research — LLM Agents.md
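To make the failure mode concrete, the sketch below shows two ways arguments can be malformed (invalid JSON and a wrong parameter name) and turns each into an error the agent can read instead of a crash. `get_weather` and `safe_call` are hypothetical names, and passing arguments as a JSON string is an assumption of this example.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str, unit: str = "celsius") -> str:
    # Hypothetical tool; returns a canned string instead of calling a service.
    return f"weather for {city} in {unit}"


def safe_call(tool: Callable[..., str], raw_arguments: str) -> Dict[str, Any]:
    """Parse model-produced arguments and report failures as data."""
    try:
        arguments = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"arguments are not valid JSON: {exc.msg}"}
    if not isinstance(arguments, dict):
        return {"ok": False, "error": "arguments must be a JSON object"}
    try:
        return {"ok": True, "result": tool(**arguments)}
    except TypeError as exc:
        # Raised for missing or unexpected keyword arguments. Note that this
        # also catches a TypeError raised inside the tool itself.
        return {"ok": False, "error": f"bad arguments: {exc}"}


print(safe_call(get_weather, '{"city": "Oslo"}'))
print(safe_call(get_weather, '{"city": "Oslo"'))   # truncated JSON
print(safe_call(get_weather, '{"town": "Oslo"}'))  # wrong parameter name
```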
3. Structured / Forced Schemas as a Mitigation
One perspective in the source notes argues that structured or forced tool schemas sharply cut argument errors by constraining the output format the model must produce. Research — LLM Agents.md
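As an illustration of what a structured schema constrains, the sketch below checks a proposed call against a small JSON-Schema-style description. It is a hand-rolled validator covering only `required`, `type`, and `enum`, written for this example; `WEATHER_SCHEMA` and `validate_arguments` are hypothetical names. It checks arguments after the model has produced them, which is an assumption of the sketch and differs from setups that force the output format during generation.

```python
from typing import Any, Dict, List

# Hypothetical schema for a weather tool.
WEATHER_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

_PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_arguments(arguments: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in arguments:
            errors.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = properties.get(name)
        if spec is None:
            errors.append(f"unexpected argument: {name}")
            continue
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        type_ok = isinstance(value, _PY_TYPES[spec["type"]]) and not (
            isinstance(value, bool) and spec["type"] != "boolean"
        )
        if not type_ok:
            errors.append(f"{name} should be of type {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name} must be one of {spec['enum']}")
    return errors


print(validate_arguments({"city": "Oslo", "unit": "celsius"}, WEATHER_SCHEMA))
print(validate_arguments({"unit": "kelvin"}, WEATHER_SCHEMA))
```

The error strings can be returned to the model as an observation, giving it a chance to correct the call on the next step.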
Trade-off: Schema Strictness vs. Flexibility
| Approach | Benefit | Risk |
|---|---|---|
| Structured / forced schema | Fewer malformed arguments | May be too rigid for open-ended tasks |
| Open / natural-language tool call | More flexible | Higher error rate on argument formatting |
| Large tool set | More capabilities | Degrades selection accuracy |
| Small, focused tool set | Better selection accuracy | Limits what the agent can do |
Research — LLM Agents.md
Relationship to Other Concepts
- ReAct: tool calls are the "act" half of the reason-act loop; a failed tool call can leave the agent stuck in a loop, repeating the same failing action.
- Agent Memory: tool results become part of the agent's short-term context and may need to be summarized or stored for longer-term retrieval via RAG.
- Planner vs. Reactive: explicit planners typically pre-select tools for each step; reactive loops choose tools dynamically, which amplifies selection-accuracy concerns.
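The ReAct item above notes that a failed tool call can lead the agent to repeat a failing action. One simple guard, sketched below under stated assumptions, is to count failures per (tool, arguments) pair and refuse further identical calls once a limit is reached. `RepeatedFailureGuard`, `flaky_tool`, and `run_with_guard` are hypothetical names, and the limit of two failures is arbitrary.

```python
from collections import Counter
from typing import Callable, Tuple


class RepeatedFailureGuard:
    """Blocks a (tool, arguments) pair after it has failed too many times."""

    def __init__(self, max_failures: int = 2) -> None:
        self.max_failures = max_failures
        self._failures: Counter = Counter()

    def allow(self, tool_name: str, arguments: str) -> bool:
        return self._failures[(tool_name, arguments)] < self.max_failures

    def record_failure(self, tool_name: str, arguments: str) -> None:
        self._failures[(tool_name, arguments)] += 1


def flaky_tool(arguments: str) -> str:
    # Stand-in for a tool that is currently failing on every call.
    raise RuntimeError("service unavailable")


def run_with_guard(
    guard: RepeatedFailureGuard,
    tool_name: str,
    tool: Callable[[str], str],
    arguments: str,
    attempts: int = 5,
) -> Tuple[int, str]:
    """Return (number of real calls made, final result or give-up message)."""
    calls_made = 0
    for _ in range(attempts):
        if not guard.allow(tool_name, arguments):
            return calls_made, "gave up: repeated failure"
        calls_made += 1
        try:
            return calls_made, tool(arguments)
        except RuntimeError:
            guard.record_failure(tool_name, arguments)
    return calls_made, "gave up: attempts exhausted"


print(run_with_guard(RepeatedFailureGuard(), "flaky", flaky_tool, "{}"))
```

Keying the counter on the exact arguments means a retry with corrected arguments is still allowed; only identical repeats are cut off.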
Open Questions
See Open Questions for the broader list of unresolved issues. Questions specific to tool use:
- At what tool-set size does selection accuracy begin to meaningfully degrade?
- Do structured schemas help equally across model sizes and families, or mainly for smaller/weaker models?
- How should agents handle irrecoverable tool failures without looping?
Research — LLM Agents.md