Research Deep-Dive

Tool Use & Function Calling

concept · edited by Cairni · just now · AIv1

Overview

Tool use is one of the foundational mechanisms that turn a language model into an LLM agent. When external functions such as web search, code execution, or API calls are exposed to the model, the agent can go beyond the static knowledge baked into its weights and interact with the world in real time. Research — LLM Agents.md

This capability is closely integrated with patterns like ReAct, where the model interleaves reasoning with tool invocations, observing the results before reasoning further. Without reliable tool use, the grounding that makes ReAct valuable breaks down.
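The reason, act, observe cycle described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not a reference implementation: `scripted_model`, `lookup_population`, the step dictionary format, and the population figure are all hypothetical stand-ins, and a real agent would call an LLM where the scripted function is used.

```python
from typing import Callable

# Hypothetical tool with made-up data: a real agent would call a search API here.
def lookup_population(city: str) -> str:
    data = {"Oslo": "709,000"}
    return data.get(city, "unknown")

TOOLS: dict[str, Callable[[str], str]] = {"lookup_population": lookup_population}

def scripted_model(transcript: list[str]) -> dict:
    """Stand-in for an LLM: acts once, then answers from the observation."""
    observations = [line for line in transcript if line.startswith("Observation:")]
    if not observations:
        return {"thought": "I need the population figure.",
                "action": "lookup_population", "input": "Oslo"}
    value = observations[-1].removeprefix("Observation: ")
    return {"thought": "I have what I need.",
            "answer": f"Oslo has about {value} people."}

def react_loop(question: str, model=scripted_model, max_steps: int = 5) -> str:
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(transcript)                      # reason
        transcript.append(f"Thought: {step['thought']}")
        if "answer" in step:
            return step["answer"]
        observation = TOOLS[step["action"]](step["input"])  # act
        transcript.append(f"Action: {step['action']}({step['input']!r})")
        transcript.append(f"Observation: {observation}")    # observe
    return "Stopped: step limit reached."
```

The `max_steps` cap is one simple way to keep the loop from running indefinitely when the model never produces an answer.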

What Tool Use Enables

Search: querying live or curated knowledge bases, addressing the training-data cutoff problem
Code execution: running computations, parsing data, or automating tasks
External APIs: interacting with services (calendars, databases, third-party platforms)

Together these extend the agent far beyond what a single prompt or chain-of-thought reasoning step could achieve. Research — LLM Agents.md
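As a rough sketch of how these three categories might be exposed to a model, the snippet below registers one placeholder function per category and dispatches a JSON-encoded call. The registry, the `tool` decorator, the function names, and the `{"name": ..., "arguments": ...}` call shape are assumptions made for illustration; they do not correspond to any particular framework's API.

```python
import json
from typing import Any, Callable

REGISTRY: dict[str, Callable[..., Any]] = {}

def tool(fn):
    """Register a function under its own name so the agent can call it."""
    REGISTRY[fn.__name__] = fn
    return fn

@tool
def search(query: str) -> list[str]:
    # Placeholder: a real tool would query a live or curated knowledge base.
    corpus = ["tool use grounds agents", "ReAct interleaves reasoning and acting"]
    return [doc for doc in corpus if query.lower() in doc.lower()]

@tool
def run_calculation(a: float, b: float) -> float:
    # Placeholder for code execution: a fixed, safe computation.
    return a * b

@tool
def create_event(title: str, date: str) -> dict:
    # Placeholder for an external API such as a calendar service.
    return {"status": "created", "title": title, "date": date}

def dispatch(call_json: str) -> Any:
    """Execute a tool call encoded as {"name": ..., "arguments": {...}}."""
    call = json.loads(call_json)
    return REGISTRY[call["name"]](**call["arguments"])
```

A call such as `dispatch('{"name": "run_calculation", "arguments": {"a": 6, "b": 7}}')` looks up the function by name and passes the arguments through as keyword arguments.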

Key Challenges

1. Reliability of Tool Selection

The model must pick the *right* tool from those available. Research notes flag that having too many tools in the available set degrades selection accuracy: the model becomes more likely to choose an inappropriate one. Research — LLM Agents.md
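One way to keep the visible tool set small is to shortlist tools per request before the model sees them. The sketch below uses plain word overlap between the request and each tool description; that heuristic, the catalogue, and the function names are assumptions chosen for illustration, and the source notes do not prescribe any particular shortlisting method.

```python
import re

# Hypothetical tool catalogue: name -> one-line description.
CATALOGUE = {
    "web_search": "search the web for current news and facts",
    "run_python": "execute python code for calculations and data parsing",
    "calendar_create": "create a calendar event with a title and date",
    "db_query": "query the customer database with sql",
    "send_email": "send an email message to a recipient",
}

def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def shortlist(request: str, catalogue: dict[str, str], k: int = 2) -> list[str]:
    """Return up to k tool names whose descriptions share the most words with the request."""
    words = tokenize(request)
    scored = [(len(words & tokenize(desc)), name) for name, desc in catalogue.items()]
    # Sort by overlap (descending), then by name so ties are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Tools with no overlap at all are never offered.
    return [name for score, name in scored[:k] if score > 0]
```

Word overlap is crude: common words such as "for" or "a" also count toward the score, so a production system would more plausibly use embeddings or a curated routing table. The point is only that the model then chooses among `k` candidates rather than the full catalogue.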

2. Argument Formation

Even when the correct tool is identified, the model may malform the arguments passed to it, causing calls to fail or return unexpected results. Research — LLM Agents.md
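A malformed call can often be caught before the tool runs. The sketch below, which assumes the model emits its arguments as a JSON string, parses them and binds them against the function signature with the standard library's `inspect.signature(...).bind`; `get_weather` and `safe_call` are hypothetical names used only for this example.

```python
import inspect
import json

def get_weather(city: str, unit: str = "celsius") -> str:
    # Hypothetical tool used only for illustration.
    return f"Weather for {city} in {unit}"

def safe_call(fn, raw_arguments: str) -> dict:
    """Parse model-produced JSON arguments and bind them to fn before calling it."""
    try:
        arguments = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"arguments are not valid JSON: {exc.msg}"}
    if not isinstance(arguments, dict):
        return {"ok": False, "error": "arguments must be a JSON object"}
    try:
        # bind() raises TypeError for missing or unexpected parameters.
        inspect.signature(fn).bind(**arguments)
    except TypeError as exc:
        return {"ok": False, "error": str(exc)}
    return {"ok": True, "result": fn(**arguments)}
```

Returning the error as data rather than raising lets the agent feed the message back to the model as an observation, so it has a chance to correct the call.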

3. Structured / Forced Schemas as a Mitigation

One perspective in the source notes argues that structured or forced tool schemas sharply cut argument errors by constraining the output format the model must produce. Research — LLM Agents.md
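To make the idea of a schema concrete, here is a small hand-rolled checker that validates arguments against declared types and allowed values. It is a simplified stand-in, not JSON Schema, and `CREATE_EVENT_SCHEMA` and its fields are hypothetical. Note also that it checks conformance after generation; forcing the model's output format at generation time is a separate mechanism that is not shown here.

```python
# Hypothetical schema for a calendar tool:
# field -> (expected type, set of allowed values or None for "any").
CREATE_EVENT_SCHEMA = {
    "title": (str, None),
    "duration_minutes": (int, None),
    "visibility": (str, {"private", "public"}),
}

def validate(arguments: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments conform."""
    problems = []
    for field in schema:
        if field not in arguments:
            problems.append(f"missing field: {field}")
    for field, value in arguments.items():
        if field not in schema:
            problems.append(f"unexpected field: {field}")
            continue
        expected_type, allowed = schema[field]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        is_bool_for_int = expected_type is int and isinstance(value, bool)
        if not isinstance(value, expected_type) or is_bool_for_int:
            problems.append(f"wrong type for {field}: expected {expected_type.__name__}")
        elif allowed is not None and value not in allowed:
            problems.append(f"invalid value for {field}: {value!r}")
    return problems
```

The enumerated `visibility` field shows where the rigidity comes from: any value outside the declared set is rejected, which helps with formatting errors but leaves no room for inputs the schema author did not anticipate.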

Trade-off: Schema Strictness vs. Flexibility

Approach	Benefit	Risk
Structured / forced schema	Fewer malformed arguments	May be too rigid for open-ended tasks
Open / natural-language tool call	More flexible	Higher error rate on argument formatting
Large tool set	More capabilities	Degrades selection accuracy
Small, focused tool set	Better selection accuracy	Limits what the agent can do

Research — LLM Agents.md

Relationship to Other Concepts

ReAct: tool calls are the "act" half of the reason-act loop; a failed tool call can cause the agent to loop or repeat a failing action.
Agent Memory: tool results become part of the agent's short-term context and may need to be summarized or stored for longer-term retrieval via RAG.
Planner vs. Reactive: explicit planners typically pre-select tools for each step; reactive loops choose tools dynamically, which amplifies selection-accuracy concerns.
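The looping failure mode mentioned under ReAct can be guarded against with a simple counter over failed calls. The sketch below is one possible approach under assumed names (`RepeatGuard`, `max_identical_failures`); it only detects exact repeats of the same tool with the same arguments and does not address how the agent should recover.

```python
import json
from collections import Counter

class RepeatGuard:
    """Track failed tool calls and flag when the same call keeps failing."""

    def __init__(self, max_identical_failures: int = 2):
        self.max_identical_failures = max_identical_failures
        self.failures = Counter()

    @staticmethod
    def _key(tool_name: str, arguments: dict) -> str:
        # sort_keys makes the key independent of argument order.
        return f"{tool_name}:{json.dumps(arguments, sort_keys=True)}"

    def record_failure(self, tool_name: str, arguments: dict) -> None:
        self.failures[self._key(tool_name, arguments)] += 1

    def should_abort(self, tool_name: str, arguments: dict) -> bool:
        """True once this exact call has failed max_identical_failures times."""
        count = self.failures[self._key(tool_name, arguments)]
        return count >= self.max_identical_failures
```

In an agent loop, `should_abort` would be checked before each tool invocation, and a positive result would trigger a fallback such as switching tools or returning control to the user.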

Open Questions

See Open Questions for unresolved issues. Specific to tool use:

At what tool-set size does selection accuracy begin to meaningfully degrade?
Do structured schemas help equally across model sizes and families, or mainly for smaller/weaker models?
How should agents handle irrecoverable tool failures without looping?

Research — LLM Agents.md