
Building Generative AI Applications with Vertex AI

There's a meaningful difference between calling the Gemini API directly and deploying through Vertex AI. The direct API works great for prototyping. Vertex AI is what you need for production: data governance, audit logging, VPC security, model versioning, and team cost attribution.


Pattern 1: Basic Gemini via Vertex SDK

import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig

vertexai.init(project="my-gcp-project", location="us-central1")

model = GenerativeModel(
    model_name="gemini-2.0-pro-exp-02-05",
    system_instruction="You are a senior financial analyst. Always cite sources.",
)

generation_config = GenerationConfig(
    temperature=0.2,
    max_output_tokens=2048,
    response_mime_type="application/json",
    response_schema={
        "type": "object",
        "properties": {
            "summary": {"type": "string"},
            "risk_level": {"type": "string", "enum": ["LOW", "MEDIUM", "HIGH"]},
        }
    }
)

response = model.generate_content(
    "Analyze Q4 2025 AAPL earnings: Revenue $94.9B (+4%), Services +14%",
    generation_config=generation_config,
)

import json
print(json.loads(response.text))

JSON Mode

Vertex AI Gemini 2.0 supports constrained JSON output with schema enforcement: decoding is constrained so the model emits valid JSON conforming to your response_schema. This removes most of the brittle post-processing that structured extraction tasks otherwise require.
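Even with schema enforcement, it is cheap to sanity-check the parsed payload before acting on it. A minimal sketch; the `raw` string below is an illustrative stand-in for `response.text` from the call above:

```python
import json

# Stand-in for response.text from the schema-constrained call above
raw = '{"summary": "Services growth offsets flat hardware.", "risk_level": "MEDIUM"}'

ALLOWED_RISK = {"LOW", "MEDIUM", "HIGH"}

def parse_analysis(text: str) -> dict:
    """Parse the model's JSON output and verify the enum field is in range."""
    data = json.loads(text)
    if data.get("risk_level") not in ALLOWED_RISK:
        raise ValueError(f"unexpected risk_level: {data.get('risk_level')!r}")
    return data

result = parse_analysis(raw)
print(result["risk_level"])  # MEDIUM
```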


Pattern 2: Grounding with Google Search

One of Vertex AI's most powerful features is grounding Gemini responses with real-time Google Search results, which sidesteps the model's knowledge cutoff for questions about recent events.

from vertexai.generative_models import GenerativeModel, Tool, grounding

model = GenerativeModel("gemini-2.0-pro-exp-02-05")

google_search_tool = Tool.from_google_search_retrieval(
    grounding.GoogleSearchRetrieval(disable_attribution=False)
)

response = model.generate_content(
    "What are the latest Claude 4 capabilities announced by Anthropic?",
    tools=[google_search_tool],
)

print(response.text)
for chunk in response.candidates[0].grounding_metadata.grounding_chunks:
    print(f"Source: {chunk.web.title} — {chunk.web.uri}")
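For display, the grounding chunks fold neatly into a numbered citation list. A small sketch that duck-types the chunk objects; the stub data here only stands in for `response.candidates[0].grounding_metadata.grounding_chunks`:

```python
from types import SimpleNamespace

def format_citations(chunks) -> str:
    """Render grounding chunks as a numbered source list."""
    return "\n".join(
        f"[{i}] {c.web.title} ({c.web.uri})"
        for i, c in enumerate(chunks, start=1)
    )

# Stub chunks standing in for real grounding metadata
stub_chunks = [
    SimpleNamespace(web=SimpleNamespace(
        title="Anthropic News", uri="https://www.anthropic.com/news"))
]
print(format_citations(stub_chunks))
```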

Pattern 3: RAG with Vertex AI Search

For enterprise RAG over your own documents, Vertex AI Search (formerly Enterprise Search) handles chunking, embedding, and indexing automatically. You query it as a grounding source:

from vertexai.preview.generative_models import GenerativeModel, Tool, grounding

DATASTORE_ID = "projects/my-project/locations/us/collections/default_collection/dataStores/my-docs"

vertex_search_tool = Tool.from_retrieval(
    grounding.Retrieval(
        source=grounding.VertexAISearch(datastore=DATASTORE_ID),
    )
)

model = GenerativeModel("gemini-2.0-pro-exp-02-05")
response = model.generate_content(
    "What is our refund policy for enterprise SaaS contracts?",
    tools=[vertex_search_tool],
)
print(response.text)  # Grounded in your internal documents

Pattern 4: Function Calling (Tool Use)

from vertexai.generative_models import FunctionDeclaration, GenerativeModel, Part, Tool

get_crm_record = FunctionDeclaration(
    name="get_crm_record",
    description="Retrieves a customer record from CRM by ID",
    parameters={
        "type": "object",
        "properties": {
            "customer_id": {"type": "string"},
        },
        "required": ["customer_id"]
    }
)

model = GenerativeModel(
    "gemini-2.0-pro-exp-02-05",
    tools=[Tool(function_declarations=[get_crm_record])]
)
chat = model.start_chat()
response = chat.send_message("What is the MRR for customer CUST-8845?")

if response.candidates[0].function_calls:
    fc = response.candidates[0].function_calls[0]
    result = your_crm_sdk.get_customer(fc.args["customer_id"])  # your own CRM client
    final = chat.send_message(
        Part.from_function_response(name=fc.name, response={"result": result})
    )
    print(final.text)
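Once you declare more than one function, a dispatch table keeps the handling branch flat. A minimal sketch; `lookup_crm` and `HANDLERS` are hypothetical stand-ins for your real clients:

```python
from types import SimpleNamespace

# Hypothetical local handler standing in for a real CRM client call
def lookup_crm(customer_id: str) -> dict:
    return {"customer_id": customer_id, "mrr_usd": 4200}

# Map declared function names to local handlers
HANDLERS = {
    "get_crm_record": lambda args: lookup_crm(args["customer_id"]),
}

def execute_function_call(fc) -> dict:
    """Route a model-issued function call to the matching local handler."""
    handler = HANDLERS.get(fc.name)
    if handler is None:
        return {"error": f"unknown function: {fc.name}"}
    return handler(dict(fc.args))

# Simulated call, shaped like response.candidates[0].function_calls[0]
fc = SimpleNamespace(name="get_crm_record", args={"customer_id": "CUST-8845"})
result = execute_function_call(fc)
```

Each result still goes back to the model via Part.from_function_response, exactly as in the snippet above.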

Pattern 5: Multi-Turn Chat

model = GenerativeModel(
    "gemini-2.0-flash-exp",
    system_instruction="You are a Python coding assistant."
)

chat = model.start_chat()
chat.send_message("Write a function to parse ISO 8601 timestamps")
chat.send_message("Add timezone handling to that function")  # Remembers context
resp = chat.send_message("Write unit tests for the final version")
print(resp.text)  # Tests for the timezone-aware version
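The accumulated turns live in chat.history, which you can flatten to plain dicts for persistence between sessions. A sketch that duck-types the entries; the stubs below stand in for real Content objects, and you would rebuild Content objects before passing history back into start_chat(history=...):

```python
import json
from types import SimpleNamespace

def history_to_dicts(history) -> list:
    """Flatten chat history entries (role + text parts) into JSON-safe dicts."""
    return [
        {"role": turn.role, "text": "".join(p.text for p in turn.parts)}
        for turn in history
    ]

# Stub turns standing in for chat.history entries
stub_history = [
    SimpleNamespace(role="user", parts=[SimpleNamespace(text="Write a parser")]),
    SimpleNamespace(role="model", parts=[SimpleNamespace(text="def parse(ts): ...")]),
]
saved = json.dumps(history_to_dicts(stub_history))
```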

Enterprise Security: VPC-Peered Invocation

For financial and healthcare institutions, sending data to a public inference endpoint is a non-starter. Vertex AI addresses this with Private Service Connect: requests to Gemini travel over your VPC and Google's private network, never the public internet.

# Example: Invoking Vertex AI Gemini within a secure Google Cloud VPC
from google.cloud import aiplatform

# Initialize with strict regionality (e.g., EU data residency)
aiplatform.init(
    project="your-enterprise-project",
    location="europe-west4",  # Netherlands: keeps processing in-region
    # Private Service Connect / VPC peering keeps request traffic on
    # Google's private network rather than the public internet
    network="projects/123456/global/networks/vpc-secure-env"
)

from vertexai.generative_models import GenerativeModel
model = GenerativeModel("gemini-1.5-pro")

response = model.generate_content("Analyze this highly confidential PII data...")
print(response.text)

Conclusion

Vertex AI is the most complete path to production generative AI on Google Cloud. The combination of Gemini's multimodal capabilities, Google Search grounding, Vertex AI Search for private RAG, schema-enforced JSON mode, and function calling gives you everything needed to ship reliable, enterprise-grade AI applications.

Written by Vivek, AI Engineer

Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.
