Mastering Function Calling & Tools
Dec 29, 2025 • 30 min read
Function Calling (or "Tools") is the bridge between the LLM's brain and the outside world. It turns a passive text generator into an Agent that can query databases, send emails, or control software.
1. The Core Mental Model: The API Glue
Think of the LLM as a brilliant assistant locked in a windowless room. It cannot see the weather outside. It cannot check the stock market. But it has a phone, and you give it a list of phone numbers (Functions):
"If someone asks about rain, call 555-WEATHER."
Crucially, the LLM does not execute the code. It generates the JSON arguments for you to execute the code.
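Concretely, that handoff is a loop: the model emits a tool call, your code executes it, and you send the result back as a `tool` message. A minimal sketch, where the `get_weather` handler and the `ToolCall` shape are illustrative assumptions rather than a real SDK type:

```typescript
// A minimal sketch of the execute-and-return loop. The `get_weather`
// handler and the `ToolCall` shape are assumptions for illustration.
type ToolCall = { id: string; name: string; arguments: string };

const registry: Record<string, (args: { location: string }) => unknown> = {
  // Stub handler: a real one would hit a weather API.
  get_weather: ({ location }) => ({ location, temp_c: 21 }),
};

function handleToolCall(toolCall: ToolCall) {
  const args = JSON.parse(toolCall.arguments); // arguments arrive as a JSON string
  const result = registry[toolCall.name](args);
  // Append this message to the conversation, then call the model again:
  return {
    role: "tool",
    tool_call_id: toolCall.id,
    content: JSON.stringify(result),
  };
}
```

The `tool_call_id` passthrough is what lets the model match each result to the call that produced it.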
2. Defining Tools with Zod (The Modern Way)
Writing raw JSON schemas is painful and error-prone. Modern AI engineering uses Zod to define schemas in TypeScript.
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";
// 1. Define the Schema with Zod
const WeatherSchema = z.object({
location: z.string().describe("City and state, e.g. San Francisco, CA"),
unit: z.enum(["celsius", "fahrenheit"]).optional().default("celsius")
});
// 2. Convert to OpenAI format
const tools = [{
type: "function",
function: {
name: "get_weather",
description: "Get current temperature", // CRITICAL: The model reads this!
parameters: zodToJsonSchema(WeatherSchema)
}
}];
Why "Description" is the Most Important Code
The description field is not a comment. It is part of the prompt. If your tool is meant to "Search the user's email" but you describe it as "Search", the model might confuse it with "Search Google". Be verbose.
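To make the contrast concrete, here are two versions of the same tool definition. Both names and descriptions are made up for this illustration:

```typescript
// Illustration only: the same tool described vaguely vs. verbosely.
const vagueTool = {
  name: "search",
  description: "Search", // the model cannot tell this apart from a web search
};

const explicitTool = {
  name: "search_internal_knowledge_base",
  description:
    "Search the company's internal support articles and runbooks. " +
    "Use this for questions about our own products and policies. " +
    "Do NOT use this for general web queries.",
};
```

The second version costs a few dozen extra tokens and saves you countless mis-routed calls.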
3. The "Human-in-the-Loop" Pattern
What if the tool is dangerous? (e.g., transfer_money). You cannot let the AI run this autonomously. You need a confirmation step.
The Workflow
- User: "Send $500 to Bob."
- Model: Calls transfer_money({amount: 500, to: "Bob"}).
- Server: Pauses execution. Sends a "Confirmation Card" to the UI.
- User: Clicks "Approve".
- Server: Executes the function and sends the result back to the model.
// Server-Side Logic
if (toolCall.name === "transfer_money") {
// DO NOT EXECUTE.
// Return a special UI state to the client.
return {
ui: <ConfirmationCard amount={500} to="Bob" />,
status: "waiting_for_approval"
};
}
4. Parallel Function Calling
A common mistake is assuming the model calls one tool at a time. It can call multiple.
If a user asks: "What's the weather in Tokyo, Paris, and London?", the model will return an array of 3 tool calls in a single response.
// The Model Response (simplified)
{
tool_calls: [
{ id: "call_1", name: "get_weather", args: '{ "location": "Tokyo" }' },
{ id: "call_2", name: "get_weather", args: '{ "location": "Paris" }' },
{ id: "call_3", name: "get_weather", args: '{ "location": "London" }' }
]
}
// Your Job: Execute ALL of them in parallel
const promises = tool_calls.map(call => executeTool(call));
const results = await Promise.all(promises);
5. Structured Outputs (Strict Mode)
OpenAI recently introduced strict: true. This guarantees that the output matches your schema 100% of the time, solving the "missing bracket" or "wrong type" issues.
This is essential for Data Extraction pipelines.
const response = await openai.chat.completions.create({
model: "gpt-4o-2024-08-06",
messages: [...],
response_format: {
type: "json_schema",
json_schema: {
name: "extraction",
schema: zodToJsonSchema(MyZodSchema), // convert Zod to JSON Schema first (see section 2)
strict: true // <--- The Life Saver
}
}
});
6. Real World Use Case: The Refund Agent
Let's build a Customer Support Agent that can issue refunds.
The Toolset
- look_up_order(order_id): Read-only. Safe.
- issue_refund(order_id): Definite Side Effect. Dangerous.
The Prompt Strategy
"You are a helpful support agent. First, ALWAYS look up the order to verify eligibility. Only if the order is eligible (less than 30 days old), call the refund tool."
This is "Chain of Thought" via Tooling. The model effectively "checks its work" before taking action.
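Putting the toolset and prompt strategy together, the definitions might look like this. The tool names come from above; the parameter schemas are illustrative assumptions, written as raw JSON Schema for brevity (the Zod pattern from section 2 works equally well):

```typescript
// A sketch of the refund agent's toolset. The descriptions encode the
// prompt strategy: look up first, refund only if eligible.
const refundAgentTools = [
  {
    type: "function",
    function: {
      name: "look_up_order",
      description:
        "Read-only and safe. Fetch an order's status, date, and amount. " +
        "ALWAYS call this first to verify refund eligibility.",
      parameters: {
        type: "object",
        properties: {
          order_id: { type: "string", description: "The order ID, e.g. ORD-1234" },
        },
        required: ["order_id"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "issue_refund",
      description:
        "DANGEROUS: issues a real refund (definite side effect). Only call " +
        "after look_up_order confirms the order is less than 30 days old.",
      parameters: {
        type: "object",
        properties: {
          order_id: {
            type: "string",
            description: "An order ID already verified via look_up_order",
          },
        },
        required: ["order_id"],
      },
    },
  },
];
```

Note how the eligibility rule lives in both the system prompt and the issue_refund description: redundancy here is cheap insurance.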
7. FAQ: Common Pitfalls
My model keeps calling the wrong tool.
Your descriptions are likely ambiguous. Rename search() to search_internal_knowledge_base(). Be explicit.
Can I paste a whole API spec?
Technically yes, but it wastes tokens. Curate a specific "AI Plugin" set of endpoints that are relevant.
8. Advanced Tool Patterns
Tool Result Error Handling
Tools fail. APIs go down, permissions get denied, and data doesn't match expected formats. Your tool handler must always return a result back to the model, even if it's an error — otherwise the model hangs waiting for a response that never comes:
async function executeTool(toolCall) {
try {
const result = await runTool(toolCall.name, toolCall.args);
return { tool_call_id: toolCall.id, role: "tool", content: JSON.stringify(result) };
} catch (error) {
// ALWAYS return an error result, never throw
return {
tool_call_id: toolCall.id,
role: "tool",
content: JSON.stringify({ error: error.message, code: error.code }),
};
}
}
When the model receives an error result, it can reason about what went wrong and either retry with different arguments, ask the user for clarification, or gracefully inform the user that the action couldn't be completed.
Tool Use Security Guard-Rails
Function calling creates a new attack surface. When your agent can write files, query databases, or make HTTP requests, a malicious user could craft prompts that abuse these capabilities. Essential guard-rails:
- Validate all arguments server-side: Never trust the model's arguments directly. Re-validate them with your Zod schema before execution.
WeatherSchema.parse(toolCall.args) throws if invalid.
- Scope permissions tightly: Create read-only database users for search tools. Use API keys with minimal scopes. A search tool should never have write permissions as a side effect.
- Rate limit tool calls: Set a maximum of 10-20 tool calls per conversation turn. Infinite tool call loops are a real failure mode.
- Audit log everything: Log every tool call with timestamp, user ID, arguments, and result. This is essential for both debugging and compliance.
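Two of these guard-rails (server-side validation and a per-turn call budget) can be sketched as follows. The validator is hand-rolled here for illustration; in practice you would reuse the same Zod schema the tool was defined with:

```typescript
// Sketch: argument validation + a per-turn tool-call budget + audit logging.
const MAX_CALLS_PER_TURN = 10;

function validateWeatherArgs(raw: string): { location: string } {
  const args = JSON.parse(raw); // model arguments arrive as a JSON string
  if (typeof args.location !== "string" || args.location.length === 0) {
    throw new Error("invalid arguments: location must be a non-empty string");
  }
  return args;
}

function guardedExecute(toolCalls: { name: string; arguments: string }[]) {
  if (toolCalls.length > MAX_CALLS_PER_TURN) {
    throw new Error(`tool call budget exceeded: ${toolCalls.length}`);
  }
  return toolCalls.map((call) => {
    const args = validateWeatherArgs(call.arguments); // re-validate before executing
    console.log(`[audit] ${new Date().toISOString()} ${call.name}`, args); // audit log
    return { ...args, temp_c: 20 }; // stub execution
  });
}
```

The budget check runs before any execution, so a runaway batch of calls is rejected as a whole rather than partially applied.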
Dynamic Tool Registration
Instead of giving every agent every tool, dynamically select which tools are available based on context. A customer support agent in the "billing" section should only see billing tools. This has two benefits: it reduces token usage (fewer tool definitions = smaller prompt) and makes the model less likely to call the wrong tool:
// Select tools based on user context
function getToolsForContext(userContext) {
const tools = [lookupOrderTool]; // Always available
if (userContext.role === 'support') {
tools.push(issueRefundTool, escalateTicketTool);
}
if (userContext.subscription === 'enterprise') {
tools.push(generateReportTool, exportDataTool);
}
return tools; // Only send relevant tools to the model
}
9. Frequently Asked Questions (Extended)
What's the difference between function calling and prompt engineering?
Prompt engineering instructs the model to produce specific text output. Function calling gives the model structured access to external systems. Use prompting to guide reasoning and tone; use function calling to take actions with real-world effects (APIs, databases, file systems). The two work together: your system prompt describes when to use each tool, and the tool definitions specify the interface.
Can I use function calling with open-source models?
Yes. Models like Llama 3.1, Mistral, and Qwen 2.5 support function calling. The interface is nearly identical to OpenAI's API. Ollama and vLLM both expose OpenAI-compatible endpoints with tool calling support. Quality varies — GPT-4o and Claude 3.5 Sonnet are most reliable for complex multi-step tool use chains.
How many tools can I give the model at once?
Technically up to 128 (OpenAI limit), but practically keep it under 20. Each tool definition adds tokens to the context (roughly 50-200 tokens per tool). With 50+ tools, quality degrades — the model gets confused about which tool to use. If you need 50+ tools, use dynamic tool selection to surface only the 5-10 most relevant ones per request.
How do I prevent the model from calling tools when it shouldn't?
Use tool_choice: "auto" (default) to let the model decide, "none" to force a text response, or "required" to force tool use. For conversational queries ("How are you?"), the model correctly avoids tool calls with "auto". If it's over-calling, make your descriptions more specific about when to use each tool.
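For reference, the modes look like this as request options. This is a hedged sketch of the OpenAI Chat Completions shape (plain option objects, not a live API call):

```typescript
// The three tool_choice modes, plus pinning one specific tool.
const letModelDecide = { tool_choice: "auto" } as const;     // default: model decides
const forceTextReply = { tool_choice: "none" } as const;     // never call a tool
const forceToolUse = { tool_choice: "required" } as const;   // must call some tool
// You can also pin one specific tool by name:
const forceWeather = {
  tool_choice: { type: "function", function: { name: "get_weather" } },
} as const;
```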
10. Conclusion
Tools are what separate "Chatbots" from "Agents". By mastering Zod schemas, parallel execution, error handling, and approval workflows, you can build systems that do real work in the real world — not just generate text about it. The teams building the most reliable AI products aren't those with the best prompts; they're those with the most robust tool scaffolding.
Vivek, AI Engineer
Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.