Gemini API Function Calling: Practical Patterns That Work

If you are building anything beyond a chatbot, you need your model to take action. Gemini function calling is how you bridge that gap, letting the model decide when to call your Python functions, what arguments to pass, and how to chain results into a final answer. This tutorial walks through the patterns that actually survive production: clean tool definitions, parallel and compositional calls, validation, and a customer-support routing scenario. By the end you will have working code for a real Gemini function calling workflow rather than a toy weather demo.

What Is Gemini Function Calling?

Gemini function calling is a structured-output feature where the model, instead of replying with prose, returns a JSON object describing which of your declared tools to invoke and with what arguments. Your code runs the tool, sends the result back, and Gemini uses it to produce the next response. This turns the model from a text generator into a planner that can hit your APIs, query your database, or trigger a workflow.

The mental model is a loop. First, you describe your tools as function declarations with names, descriptions, and JSON-schema parameters. Then you send a user prompt plus those declarations to Gemini. Gemini either replies with text or with one or more functionCall parts. Next, you execute the calls in your own code, attach the results as functionResponse parts, and send everything back. Finally, Gemini either calls another tool or returns a natural-language answer.

In contrast to one-shot prompt engineering, function calling gives you a stable contract. The model cannot hallucinate a field name into the wrong slot because your schema rejects it. As a result, you can confidently route the output to a downstream system without a regex parse.

Importantly, function calling does not give the model arbitrary code execution. The model only proposes calls; your code decides whether to run them and how. Therefore, treat each function declaration as a public API and validate inputs as you would for any external caller.

Setup: Get an API Key and Install the SDK

First, generate an API key from Google AI Studio and export it as an environment variable. The new SDK (google-genai) is the recommended path for any project starting today. It replaces the older google-generativeai package and aligns with the same surface used by Vertex AI.

pip install google-genai pydantic
export GEMINI_API_KEY="your-key-here"

Next, confirm the SDK is reachable with a minimal call. The example below uses gemini-2.5-flash because it is fast, cheap, and supports function calling at the same fidelity as the Pro tier for routing-style tasks.

import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Reply with the single word: ready",
)
print(response.text)

If you see ready printed, your environment is wired up. For production code, prefer dependency injection of the client rather than module-level globals, since the client holds connection state and you will want to mock it in tests.
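
One minimal way to structure that injection, assuming nothing beyond the SDK setup above (the AnswerService name is our own, not the SDK's):

class AnswerService:
    def __init__(self, client: genai.Client):
        # The client arrives from outside, so tests can pass a mock.
        self.client = client

    def ask(self, prompt: str) -> str:
        response = self.client.models.generate_content(
            model="gemini-2.5-flash",
            contents=prompt,
        )
        return response.text

service = AnswerService(client)
print(service.ask("Reply with the single word: ready"))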

Your First Function Call: A Weather Lookup

The simplest useful pattern is a single tool. Start with one Python function and a matching declaration. The declaration is what Gemini sees; the function is what your code runs when the model picks it.

import os

from google import genai
from google.genai import types

def get_current_weather(location: str, unit: str = "celsius") -> dict:
    """Pretend implementation. In production, call your weather provider here."""
    return {
        "location": location,
        "temperature": 18 if unit == "celsius" else 64,
        "unit": unit,
        "conditions": "partly cloudy",
    }

weather_tool = types.Tool(
    function_declarations=[
        types.FunctionDeclaration(
            name="get_current_weather",
            description="Get current weather for a city. Use for any 'what's the weather' question.",
            parameters=types.Schema(
                type="OBJECT",
                properties={
                    "location": types.Schema(
                        type="STRING",
                        description="City name, e.g. 'Berlin' or 'San Francisco'",
                    ),
                    "unit": types.Schema(
                        type="STRING",
                        enum=["celsius", "fahrenheit"],
                    ),
                },
                required=["location"],
            ),
        )
    ]
)

Notice the description fields. The model reads those, not the Python docstring. Therefore, vague descriptions lead to wrong tool selection. Write them like you would write API docs for a junior engineer: state when to call, what each argument means, and what the function returns.

Now wire up the loop. The first call sends the user prompt and the tool. If Gemini decides to call the function, you execute it and send the result back in a second turn.

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
config = types.GenerateContentConfig(tools=[weather_tool])

user_prompt = "What's the weather in Berlin right now?"
history = [types.Content(role="user", parts=[types.Part(text=user_prompt)])]

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=history,
    config=config,
)

call = response.candidates[0].content.parts[0].function_call
if call:
    # Run the tool the model asked for, with the arguments it chose.
    result = get_current_weather(**call.args)
    # Keep the model's functionCall turn in history, then attach the
    # result as a functionResponse part in the next user turn.
    history.append(response.candidates[0].content)
    history.append(types.Content(
        role="user",
        parts=[types.Part.from_function_response(name=call.name, response=result)],
    ))
    final = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=history,
        config=config,
    )
    print(final.text)
else:
    # The model answered directly without calling the tool.
    print(response.text)

That structure is the entire pattern. Everything else in this post is variations on it: more tools, parallel calls, validation, and recovery.

Defining Multiple Tools and Letting Gemini Choose

In real systems you have a toolbelt, not a single function. Add three more tools and let Gemini pick the right one based on the user message. Group related functions in the same Tool object so the model sees them together.

def get_forecast(location: str, days: int) -> dict:
    return {"location": location, "days": days, "forecast": ["sunny"] * days}

def get_air_quality(location: str) -> dict:
    return {"location": location, "aqi": 42, "category": "good"}

travel_tool = types.Tool(
    function_declarations=[
        types.FunctionDeclaration(
            name="get_current_weather",
            description="Get current weather. Use for 'now' or 'today' weather questions.",
            parameters=types.Schema(
                type="OBJECT",
                properties={"location": types.Schema(type="STRING")},
                required=["location"],
            ),
        ),
        types.FunctionDeclaration(
            name="get_forecast",
            description="Get a multi-day weather forecast. Use when the user asks about future days.",
            parameters=types.Schema(
                type="OBJECT",
                properties={
                    "location": types.Schema(type="STRING"),
                    "days": types.Schema(type="INTEGER", description="Number of days, 1-7"),
                },
                required=["location", "days"],
            ),
        ),
        types.FunctionDeclaration(
            name="get_air_quality",
            description="Get current air quality index for a city.",
            parameters=types.Schema(
                type="OBJECT",
                properties={"location": types.Schema(type="STRING")},
                required=["location"],
            ),
        ),
    ]
)

You can also constrain how Gemini selects tools with tool_config. Setting mode to AUTO is the default and lets the model decide between calling a tool and answering directly. Switching to ANY forces the model to respond with a function call from the declared (or explicitly allowed) tools, which is useful when you have already classified intent upstream and just need argument extraction. Meanwhile, NONE disables tools entirely for that turn.

config = types.GenerateContentConfig(
    tools=[travel_tool],
    tool_config=types.ToolConfig(
        function_calling_config=types.FunctionCallingConfig(
            mode="ANY",
            allowed_function_names=["get_current_weather", "get_forecast"],
        )
    ),
)

In practice, leave mode at AUTO for chat-style assistants and reach for ANY when you are building a workflow step that must produce structured arguments. Furthermore, the allowed_function_names list is the cleanest way to scope tools per turn without rebuilding the whole tool array.

Parallel Function Calling

Gemini can request several function calls in a single turn when the user prompt naturally fans out. For example, “What’s the weather in Berlin and Tokyo, and how’s the air quality in Tokyo?” should produce three calls at once, not three round-trips. The SDK exposes parallel calls as multiple function_call parts on the same candidate.

def dispatch(call):
    handlers = {
        "get_current_weather": get_current_weather,
        "get_forecast": get_forecast,
        "get_air_quality": get_air_quality,
    }
    return handlers[call.name](**call.args)

user_prompt = "What's the weather in Berlin and Tokyo, and the air quality in Tokyo?"
history = [types.Content(role="user", parts=[types.Part(text=user_prompt)])]

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=history,
    config=types.GenerateContentConfig(tools=[travel_tool]),
)

parts = response.candidates[0].content.parts
calls = [p.function_call for p in parts if p.function_call]

history.append(response.candidates[0].content)
response_parts = [
    types.Part.from_function_response(name=c.name, response=dispatch(c))
    for c in calls
]
history.append(types.Content(role="user", parts=response_parts))

final = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=history,
    config=types.GenerateContentConfig(tools=[travel_tool]),
)
print(final.text)

Crucially, you must return every function_response together in one user turn, in the same order Gemini emitted the calls. Sending them as separate turns confuses the model and often causes it to re-call the same tools. For higher throughput, dispatch the handlers concurrently with asyncio.gather when the functions are I/O-bound, which is the common case for database or HTTP-based tools.
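
Here is a minimal sketch of that concurrent dispatch, reusing the synchronous dispatch and calls from above; asyncio.to_thread runs each blocking handler on a worker thread, and asyncio.gather returns results in input order, which is exactly the order the responses must go back in:

import asyncio

async def dispatch_all(calls) -> list:
    # Overlap I/O waits by pushing each blocking handler to a thread.
    return await asyncio.gather(
        *(asyncio.to_thread(dispatch, call) for call in calls)
    )

results = asyncio.run(dispatch_all(calls))
response_parts = [
    types.Part.from_function_response(name=c.name, response=r)
    for c, r in zip(calls, results)  # zip preserves call order
]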

Compositional Function Calling: Tools That Chain

Sometimes the answer requires the output of one tool to feed the next. The user asks for a restaurant nearby; your code geocodes the city, searches restaurants, then checks opening hours. Gemini handles this naturally if you keep the loop running until no function_call parts are returned.

def run_tool_loop(prompt: str, tools: list, max_turns: int = 6) -> str:
    history = [types.Content(role="user", parts=[types.Part(text=prompt)])]
    config = types.GenerateContentConfig(tools=tools)

    for _ in range(max_turns):
        response = client.models.generate_content(
            model="gemini-2.5-flash",
            contents=history,
            config=config,
        )
        parts = response.candidates[0].content.parts
        calls = [p.function_call for p in parts if p.function_call]

        if not calls:
            return response.text

        history.append(response.candidates[0].content)
        response_parts = [
            types.Part.from_function_response(name=c.name, response=dispatch(c))
            for c in calls
        ]
        history.append(types.Content(role="user", parts=response_parts))

    raise RuntimeError("Tool loop exceeded max turns")

The max_turns guard is non-negotiable. Without it, a confused model can ping-pong tools forever and run up your token bill. A budget of four to six turns covers nearly every legitimate chain; anything longer is usually a planning failure that you want surfaced as an error rather than masked.

For deeper chains, consider streaming. Gemini supports generate_content_stream, which delivers partial responses and function calls as they arrive. That lets you show "calling tool X" status to your users instead of a spinner, which dramatically improves perceived latency. For more on streaming patterns, see our guide on streaming AI chatbot responses.
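
A rough sketch of that pattern, reusing travel_tool from earlier; the exact part layout can vary across SDK versions, so treat the traversal as illustrative:

history = [types.Content(role="user", parts=[types.Part(text="Weather in Berlin and Tokyo?")])]

for chunk in client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents=history,
    config=types.GenerateContentConfig(tools=[travel_tool]),
):
    for candidate in chunk.candidates or []:
        if candidate.content is None:
            continue
        for part in candidate.content.parts or []:
            if part.function_call:
                # Surface tool-call status the moment it arrives.
                print(f"calling tool {part.function_call.name}...")
            elif part.text:
                print(part.text, end="", flush=True)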

Error Handling and Validation

Treat the model’s tool calls as untrusted input. The schema constrains shape, but it does not constrain semantics. Gemini might pass days=14 when your forecast API only supports seven, or send a city name with a typo. Validate before dispatching.

from pydantic import BaseModel, Field, ValidationError

class ForecastArgs(BaseModel):
    location: str = Field(min_length=1, max_length=80)
    days: int = Field(ge=1, le=7)

def safe_get_forecast(raw_args: dict) -> dict:
    try:
        args = ForecastArgs(**raw_args)
    except ValidationError as e:
        return {"error": "invalid_arguments", "details": e.errors()}
    return get_forecast(args.location, args.days)

When you return an error object, Gemini sees it as the function result and can decide to apologize, ask for clarification, or retry with corrected arguments. That feedback loop is far more useful than raising an exception and crashing the request.

For unexpected exceptions inside the tool itself, catch and serialize them rather than letting them bubble:

def dispatch(call):
    handlers = {"get_forecast": safe_get_forecast}
    try:
        return handlers[call.name](call.args)
    except Exception as e:
        return {"error": "tool_failure", "message": str(e)}

For production patterns around validating LLM outputs more broadly, our walkthrough on OpenAI structured outputs covers complementary techniques that apply here too.

When to Use Gemini Function Calling

  • You need the model to take actions in real systems (booking, search, ticket creation) rather than just generate text
  • You want structured arguments extracted from messy natural-language input
  • You are routing user intent across a small set of well-defined backend operations
  • Your downstream code needs a stable JSON contract you can validate
  • Latency tolerates one or two extra round-trips per request (typical for chat UX)

When NOT to Use Gemini Function Calling

  • The output is purely text with no actions to take; plain generate_content is cheaper and faster
  • You need sub-100ms total latency; the loop adds round-trips that are hard to compress
  • The “tools” are trivial transformations the model can do inline (uppercase, summarize, translate)
  • You want guarantees against any tool call being made; use a different model surface or strict response schemas
  • Your tool set is huge (50+ functions); break it up by intent classifier first, or you waste tokens describing tools the model will never pick

Common Mistakes with Gemini Function Calling

The first trap is vague descriptions. If two tools have overlapping descriptions, Gemini picks inconsistently. Therefore, write descriptions that include trigger phrases (“Use when the user asks about future days”) and explicit boundaries (“Do not use for historical weather”). Treat description authoring like prompt engineering, because that is exactly what it is.
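
For example, a sharpened version of the earlier forecast declaration, with trigger phrases and an explicit boundary baked in:

forecast_decl = types.FunctionDeclaration(
    name="get_forecast",
    description=(
        "Get a multi-day weather forecast. Use when the user asks about "
        "future days ('tomorrow', 'this weekend', 'next week'). "
        "Do not use for current conditions or historical weather."
    ),
    parameters=types.Schema(
        type="OBJECT",
        properties={
            "location": types.Schema(type="STRING"),
            "days": types.Schema(type="INTEGER", description="Number of days, 1-7"),
        },
        required=["location", "days"],
    ),
)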

A second mistake is ignoring the order of function responses in parallel calls. The SDK expects responses in the same sequence as the calls. Reordering them does not crash, but it confuses the model and degrades answer quality silently.

Another frequent issue is missing the max-turns guard in compositional loops. Without it, a poorly described tool can cause the model to retry the same call indefinitely. Cap turns, log the trajectory, and surface an error.

Furthermore, developers often return Python objects directly from tool handlers. Gemini needs JSON-serializable values. Therefore, convert Pydantic models with .model_dump(), datetimes with .isoformat(), and decimals to floats before returning.
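
A small normalizer along these lines keeps handlers honest; to_jsonable is a name we made up, and the cases should match your own types:

import datetime
import decimal

from pydantic import BaseModel

def to_jsonable(value):
    # Recursively coerce common Python types into JSON-safe values.
    if isinstance(value, BaseModel):
        return value.model_dump(mode="json")
    if isinstance(value, datetime.datetime):
        return value.isoformat()
    if isinstance(value, decimal.Decimal):
        return float(value)
    if isinstance(value, dict):
        return {k: to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    return value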

Finally, forgetting to append the model's tool-call message to history is the single most common bug. The conversation only makes sense if Gemini's functionCall part and your functionResponse part live in adjacent turns. Skip the model's turn and Gemini will either re-call the tool or produce nonsense.

Real-World Scenario: A Customer Support Routing Bot

Consider a mid-sized SaaS product with a support inbox that receives several thousand messages per week. A small support team wants to use Gemini function calling to triage incoming messages into three actions: create a billing ticket, escalate an outage, or reply with a help-article link. The tools are intentionally narrow so the model's role stays bounded.

def create_billing_ticket(customer_id: str, summary: str) -> dict:
    return {"ticket_id": "BILL-1042", "status": "open"}

def escalate_outage(component: str, severity: str) -> dict:
    return {"incident_id": "INC-77", "paged": ["oncall-sre"]}

def fetch_help_article(topic: str) -> dict:
    return {"url": f"https://help.example.com/{topic}", "title": f"Help: {topic}"}

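The declarations mirror the travel_tool shape from earlier. A sketch follows; the severity enum values are our assumption, not part of any real paging API:

support_tool = types.Tool(
    function_declarations=[
        types.FunctionDeclaration(
            name="create_billing_ticket",
            description="Open a billing ticket. Use for invoice, refund, or charge disputes.",
            parameters=types.Schema(
                type="OBJECT",
                properties={
                    "customer_id": types.Schema(type="STRING"),
                    "summary": types.Schema(type="STRING", description="One-line issue summary"),
                },
                required=["customer_id", "summary"],
            ),
        ),
        types.FunctionDeclaration(
            name="escalate_outage",
            description="Page the on-call engineer. Use only when the user reports the product is down or erroring.",
            parameters=types.Schema(
                type="OBJECT",
                properties={
                    "component": types.Schema(type="STRING"),
                    "severity": types.Schema(type="STRING", enum=["sev1", "sev2", "sev3"]),
                },
                required=["component", "severity"],
            ),
        ),
        types.FunctionDeclaration(
            name="fetch_help_article",
            description="Look up a help article. Use for how-to and configuration questions.",
            parameters=types.Schema(
                type="OBJECT",
                properties={"topic": types.Schema(type="STRING")},
                required=["topic"],
            ),
        ),
    ]
)

Register the three handlers in your dispatch table, and the run_tool_loop helper from earlier drives the whole triage turn.
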
In production, this pattern typically catches roughly 60 to 80 percent of routine triage without a human, with the remaining cases falling through to the team. The trade-off is that you must invest real effort in the tool descriptions and feed misroutes back into them, since drift creeps in as your product surface changes.

Common pitfalls in this scenario include letting the support agent call multiple tools when a single action is needed and accepting tickets with empty summaries. Both are solved with tool_config.mode="ANY" when intent is unambiguous and with Pydantic validation on arguments. For deeper patterns on agent tool use, see our guide on building AI agents with tools and planning.

Finally, log every turn. Store the user prompt, the chosen tool, the arguments, the result, and the final response. Subsequently, when an answer is wrong you can replay the trajectory and decide whether the fix belongs in the prompt, the schema, or the underlying tool. This is the same observability discipline that makes Claude tool use workflows debuggable, and it transfers cleanly across providers.
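
A minimal JSONL trajectory log is enough to start with; log_turn and the file path are our own naming:

import json
import time

def log_turn(prompt: str, call, result: dict, final_text: str,
             path: str = "tool_turns.jsonl") -> None:
    # One record per tool turn, replayable later when triage goes wrong.
    record = {
        "ts": time.time(),
        "prompt": prompt,
        "tool": call.name if call else None,
        "args": dict(call.args) if call and call.args else None,
        "result": result,
        "final": final_text,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")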

Wiring It Into a Larger Stack

If you already have a Gemini integration handling images and video with Gemini API multimodal vision and video, function calling slots in next to it without architectural changes. The same generate_content call accepts a tools= config alongside multimodal Part inputs. Therefore, you can build a single endpoint that takes an image, a question, and a toolbelt, then lets the model decide whether to answer from the image directly or call a tool.
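
A rough sketch of that combined request, assuming image_bytes holds JPEG data you loaded yourself:

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        types.Part(text="Will it rain where this photo was taken this week?"),
    ],
    config=types.GenerateContentConfig(tools=[travel_tool]),
)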

For multi-provider deployments where you may route between Gemini and other models, keep your tool handlers framework-agnostic. The provider differences live at the declaration layer (schema dialect, request shape) and the response-parsing layer; the actual Python functions should not care which model invoked them. That separation is what makes it cheap to swap providers later, and it pairs well with the patterns from getting started with the Claude API.
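
Concretely, keep a plain registry that knows nothing about any SDK; the provider adapter's only job is to reduce a model response to a name and an args dict:

# Provider-agnostic core: plain functions keyed by name.
HANDLERS = {
    "get_current_weather": get_current_weather,
    "get_forecast": get_forecast,
    "get_air_quality": get_air_quality,
}

def run_handler(name: str, args: dict) -> dict:
    # Every provider's tool call reduces to this pair by the time
    # it reaches your business logic.
    return HANDLERS[name](**args)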

Final Thoughts on Gemini Function Calling

Gemini function calling is the most practical way to turn the model into a planner that drives your existing systems. Start with one or two well-described tools, validate every argument, cap your tool loop, and add parallel calls only once the simple case is solid. As a result, you get a deterministic JSON contract on top of a flexible natural-language interface, which is exactly the combination production systems need.

Next, pick one slow internal workflow at your company, list the three or four functions it really needs, and prototype it with the loop in this post. Then compare the failure modes against the common mistakes section before shipping. For wider patterns on prompts that drive these tools well, our notes on prompt engineering best practices pair naturally with the function-calling work here.
