QuickStart

Use the following steps to begin building Enterprise AI Agents with the OCI Responses API.

Tip

For an overview of the steps and features offered in this QuickStart, keep this page collapsed and review the titles. Then, expand each title for details. The first six steps set up your environment to get a model response using the Responses API. The remaining sections show other features for building agents.

Get a Model Response with Responses API

Prerequisites

Before you create a project or call the OCI Responses API, ensure that the required IAM permissions are in place.

Grant access to OCI Generative AI resources

Ask an administrator to grant the user group that you belong to permission to manage OCI Generative AI resources in the compartment used for this QuickStart, <QuickStart-compartment-name>.

allow group <your-group-name> 
to manage generative-ai-family
in compartment <QuickStart-compartment-name>

With this permission, you can create all Generative AI service resources including projects, API keys, vector stores, and semantic stores in the <QuickStart-compartment-name>.

Important

This QuickStart uses a sandbox-style setup for its permissions. We recommend that you grant broad access only to administrators or users working in sandbox environments. For production use, apply more restrictive policies.
1. Create a Project

A project is the foundational resource for organizing AI agents and related assets in OCI Generative AI. You can create a project by using the Console.

After you create a project, you can manage it through the Console. For example, you can update its details, move it to another compartment, manage tags, or delete it. These actions are available from the Actions menu (three dots) on the project list page.

To begin, navigate to the project list page and select Create project.

Basic information

Start by defining the core attributes of the project.

  • Name (optional):

    Provide a name that begins with a letter or underscore, followed by letters, numbers, hyphens, or underscores. The length can be 1 to 255 characters. If you don't specify a name, one is generated automatically by using the format generativeaiproject<timestamp>, for example generativeaiproject20260316042443. You can update this later.

  • Description (optional):

    Add a brief description to help identify the purpose of the project.

  • Compartment:

    Select the <QuickStart-compartment-name> compartment.

Data retention

Configure how long the generated data is stored.

  • Response retention:

    Defines how long individual model responses are stored after generation.

  • Conversation retention:

    Defines how long an entire conversation is retained after its most recent update.

You can set both values in hours, up to a maximum of 720 hours (30 days).

Short-term memory compaction config

Short-term memory compaction summarizes recent conversation history into a compact representation. It helps maintain context while reducing token usage and latency.

Important

  • You can select the compaction model only at creation time and can't change this option later.
  • If enabled, this feature can't be disabled without deleting the project.

Long-term memory config

By enabling long-term memory, the service can extract and persist important information from conversations for future use. This data is stored as embeddings, making it searchable and reusable across interactions.

  • Enable (optional):

    Turn on long-term memory to retain key insights from conversations.

  • Model selection:

    Select the following models:

    • Extraction model: Identifies and captures important information.
    • Embedding model: Converts stored data into vector representations for retrieval.
Important

  • You must select these models during project creation.
  • After it's enabled, you can't change the models and long-term memory can't be disabled unless the project is deleted.
Tip

For best results, set both response retention and conversation retention to the maximum duration of 720 hours when you use long-term memory.

Tags

Tags help you organize and manage resources.

  • Optional: Select Add tag to assign metadata to the project.

    For more information, see Resource Tags.

When you are finished, select Create.

2. Set Up Authentication

The OCI Responses API supports two authentication methods. You can use either option.

  • OCI Generative AI API keys

    Authenticate and reach the models and endpoints with OpenAI-compatible Generative AI service-specific API keys. Use this option for testing and early development.

  • OCI IAM-based authentication

    Authenticate and reach the models and endpoints by using signed API requests. We recommend this option for production workloads and OCI-managed environments. Consider using IAM authentication when you:

    • Run applications in OCI services such as Functions or Oracle Kubernetes Engine (OKE)
    • Want to avoid long-lived API keys
    • Need centralized access control through OCI IAM
2.1 (a) Create an API Key

This step is required only if you use OCI Generative AI API key authentication.

Create an API key to authenticate requests to OCI Generative AI. You can name the key and optionally configure up to two key names with expiration dates and times.

You can create and manage API keys by using the Console, CLI, or API.

  • On the API keys list page, select Create API key.

    Basic Information

    1. Enter a Name for the API key. This field is required. Start the name with a letter or underscore, followed by letters, numbers, hyphens, or underscores. The length can be 1 to 255 characters.
    2. (Optional) Enter a Description.
    3. Save the API key in the <QuickStart-compartment-name> compartment.
      You must have permission to work in a compartment to see the resources in it. If you're not sure which compartment to use, contact an administrator. For more information, see Understanding Compartments.
    4. (Optional) Assign Tags to this API key. See Resource Tags.

    Key Names and Expiration Times

    1. Enter a name for the first key in Key one name. Start the name with a letter or underscore, followed by letters, numbers, hyphens, or underscores. The length can be 1 to 255 characters.
    2. (Optional) Set Key one expiration date and Key one expiration time (UTC).
      The default value is three months from the creation date.
    3. Enter a name for the second key in Key two name.
    4. (Optional) Set the Key two expiration date and Key two expiration time (UTC).
      The default value is three months from the creation date.
    5. Select Create.
  • Use the api-key create command and required parameters to create an API key.

    oci generative-ai api-key create [OPTIONS]

    For a complete list of parameters and values for CLI commands, see the CLI Command Reference.

  • Run the CreateApiKey operation to create an API key. Provide the display name, optional description, and any key names with their expiration timestamps in UTC.

2.1 (b) Add Permission to the API Key

This step is required only if you use OCI Generative AI API key authentication.

Find the API Key OCID

To scope permissions to a specific API key, first get its OCID.

In the Console:

  1. Open the API Keys list page.
  2. Select the API key you created.
  3. Copy the OCID. It typically starts with ocid1.generativeaiapikey....

Grant Permission to the API Key

To allow a specific group of users to invoke the Responses API with this API key, add the following IAM policy:

allow group <agent-builders-with-Responses-API-group>
to manage generative-ai-response 
in compartment <QuickStart-compartment-name> 
where ALL {request.principal.type='generativeaiapikey',
request.principal.id='<api-key-OCID>'}

This policy allows requests authenticated with the specified API key to access the Responses API while keeping access scoped and controlled.

For broader sandbox access only, you can use the following policy:

allow any-user to use generative-ai-family 
in compartment <QuickStart-compartment-name> 
where ALL {request.principal.type='generativeaiapikey', 
request.principal.id='<your-api-key-OCID>'}
2.2 (a) Install the OCI Generative AI IAM Auth Library

This step is required only if you use OCI IAM-based authentication. If you're using OCI Generative AI API keys, skip this step.

Install the OCI Generative AI authentication helper package:

pip install oci-openai

The oci-openai package provides authentication helpers for integrating OCI IAM authentication with the OpenAI SDK, including:

  • OciSessionAuth for local development
  • OciUserPrincipalAuth for users using OCI IAM API signing keys
  • OciInstancePrincipalAuth for workloads running on OCI compute instances
  • OciResourcePrincipalAuth for OCI-managed environments

When you use OCI IAM authentication with the OpenAI client, set api_key="not-used" and provide an authenticated http_client.

For OciUserPrincipalAuth, set up an OCI config file for the identity that signs requests. For details, see Required Keys and OCIDs.
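For reference, a minimal OCI config file has the following shape; every value here is a placeholder that you replace with your own tenancy details:

```
[DEFAULT]
user=ocid1.user.oc1..<unique-id>
fingerprint=<api-signing-key-fingerprint>
key_file=~/.oci/oci_api_key.pem
tenancy=ocid1.tenancy.oc1..<unique-id>
region=us-chicago-1
```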

3. Install openai

Install the Official OpenAI SDK.

Python

pip install openai
Note

To invoke the Responses API, use the OpenAI SDK, not the OCI SDK. Also ensure that you use the latest version of the OpenAI SDK.

For other languages, see the OpenAI libraries page.

4. Find the Endpoint for OCI Responses API

The OCI Responses API is the primary API for building Enterprise AI agentic applications in OCI Generative AI. It lets you send model requests, use tools, and manage conversation context through a single API.

Use the Responses API to:

  • Run simple, single-step inference or multi-step agent workflows
  • Enable or disable reasoning, depending on the use case
  • Integrate platform-managed or client-side tools
  • Manage conversation state either in the service or in the client

The OCI Responses API uses an OpenAI-compatible interface. Requests use the same OpenAI-style syntax, but they're sent to OCI Generative AI and authenticated with OCI credentials, such as OCI Generative AI API keys or OCI IAM-based authentication.

The recommended client is the OpenAI SDK. It supports languages such as Python, Java, TypeScript, Go, and .NET. You can also use it with frameworks such as LangChain, LlamaIndex, and OpenAI Agents SDK.

Use the following base URL:

Base URL: https://inference.generativeai.${region}.oci.oraclecloud.com/openai/v1

Replace ${region} with your OCI region, for example, us-chicago-1.

For the Responses API, use the following endpoint path:

/responses
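For example, you can assemble the base URL in Python before constructing the client:

```python
# Build the Responses API base URL for your region.
region = "us-chicago-1"  # replace with your OCI region
base_url = f"https://inference.generativeai.{region}.oci.oraclecloud.com/openai/v1"

print(base_url)
```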
5. Find Supported Model IDs and Regions

You can use the OCI Responses API to call different types of models available in OCI Generative AI supported regions. For a list of supported models and regions, see Agent Models and Regions.

On-Demand Mode

On-demand models are hosted and managed by OCI and are available without requiring dedicated AI clusters. Examples:

response = client.responses.create(
    model="xai.grok-4-1-fast-reasoning",
    input="Write a one-sentence explanation of what a database is."
)
response = client.responses.create(
    model="google.gemini-2.5-pro",
    input="Write a one-sentence explanation of what a database is."
)

Dedicated Mode

For production workloads requiring isolation or predictable performance, you can host models on a dedicated AI cluster. In this case, use the cluster endpoint OCID as the model identifier:

response = client.responses.create(
    model="<dedicated-ai-cluster-endpoint-ocid>",
    input="Write a one-sentence explanation of what a database is."
)

When you send the request, ensure that you select the same region as the cluster.

Select a model and mode in an available region that best fits your requirements for performance, cost, and control.

6. Create Your First Response

The following examples show how to call the Responses API by using Python. If the request returns an explanation, the OCI Responses API is working correctly.

API Key Example
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/openai/v1",  # change the region if needed
    api_key="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",  # replace with your Generative AI API key created in Step 2
    project="ocid1.generativeaiproject.oc1.us-chicago-1.xxxxxxxx"  # replace with your Generative AI project OCID created in Step 1
)

response = client.responses.create(
    model="xai.grok-4-1-fast-reasoning",
    input="Write a one-sentence explanation of what a database is."
)

print(response.output_text)
OciUserPrincipalAuth Example

Use this approach when using OCI IAM API signing keys (not OCI Generative AI API keys). See Required Keys and OCIDs.

from openai import OpenAI
from oci_openai import OciUserPrincipalAuth
import httpx

client = OpenAI(
    base_url="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/openai/v1",
    api_key="not-used",
    project="ocid1.generativeaiproject.oc1.us-chicago-1.xxxxxxxx",
    http_client=httpx.Client(
        auth=OciUserPrincipalAuth(
            config_file="~/.oci/config",
            profile_name="DEFAULT",
        )
    ),
)

response = client.responses.create(
    model="xai.grok-4-1-fast-reasoning",
    input="Write a one-sentence explanation of what a database is."
)

print(response.output_text)
OciSessionAuth Example

Use this approach when running code locally:

from openai import OpenAI
from oci_openai import OciSessionAuth
import httpx

client = OpenAI(
    base_url="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/openai/v1",  # update region if needed
    api_key="not-used",
    project="ocid1.generativeaiproject.oc1.us-chicago-1.xxxxxxxx",  # project OCID created earlier
    http_client=httpx.Client(auth=OciSessionAuth(profile_name="DEFAULT"))  # update profile if needed
)

response = client.responses.create(
    model="xai.grok-4-1-fast-reasoning",
    input="Write a one-sentence explanation of what a database is."
)

print(response.output_text)
OciResourcePrincipalAuth Example

Use this approach when running in managed environments such as OCI Functions or OCI Container Engine for Kubernetes (OKE):

from openai import OpenAI
from oci_openai import OciResourcePrincipalAuth
import httpx

client = OpenAI(
    base_url="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/openai/v1",  # update region if needed
    api_key="not-used",
    project="ocid1.generativeaiproject.oc1.us-chicago-1.xxxxxxxx",  # project OCID created earlier
    http_client=httpx.Client(auth=OciResourcePrincipalAuth()),
)

response = client.responses.create(
    model="xai.grok-4-1-fast-reasoning",
    input="Write a one-sentence explanation of what a database is."
)

print(response.output_text)

More Features

Enable Debug Logging

If you encounter issues when calling the API, enabling debug logging can help with troubleshooting. Debug logs display the raw HTTP requests and responses, including the opc-request-id, which is useful when working with Oracle support.

You can reference this request ID when reporting issues to help identify and diagnose problems more quickly.

from openai import OpenAI
import logging

logger = logging.getLogger("openai")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

# Create and use the OpenAI client as usual
client = OpenAI(
    ...
)
Stream Responses

The OCI Responses API supports streaming, where you can receive model outputs incrementally as the tokens are generated.

Stream All Events

response_stream = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Explain the difference between structured and unstructured data.",
    stream=True
)

for event in response_stream:
    print(event)

Stream Only Text Output (Delta Tokens)

response_stream = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Explain the difference between structured and unstructured data.",
    stream=True
)

for event in response_stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

Streaming is especially useful for interactive applications where users can read the responses as they're generated.

Add Reasoning

Reasoning controls let you tune how much effort the model uses before producing a response. This is useful when you want to prioritize speed, depth, or a balance of both.

Tip

Review the key features in the model's detail page to ensure the model that you're calling has reasoning. See Available Chat Models.

Reasoning Effort

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Solve 18 * (4 + 2).",
    reasoning={"effort": "medium"},
    store=False,
)

print(response.output_text)

Reasoning Summary Output

If you're building a chatbot, enabling reasoning summaries can help users better understand how the model arrived at a result. During streaming, users can also see reasoning tokens while the model is thinking.

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Solve 18 * (4 + 2).",
    reasoning={"summary": "auto"},
    store=False,
)

print(response.output_text)
Get Structured Outputs

Use structured outputs when you want the model to return data in a predictable format instead of free-form text. This is useful for extracting fields from unstructured input, passing results to other systems, or showing specific values in a UI. In the OCI Responses API, you define a schema and parse the response against it, which makes the output more consistent and easier for applications to use. Structured outputs match a supplied schema, while JSON mode only guarantees valid JSON.

A strongly typed object is an object whose fields and data types are defined in advance. For example, if a schema says customer_name must be a string and priority must be one of "low", "medium", or "high", the parsed result follows that structure. This makes it easier for code to work with the response safely and predictably. The OCI Responses API supports this by allowing you to define a schema and parse the model output into strongly typed objects.

This approach is useful when integrating with downstream systems, enforcing consistency, or extracting specific fields from natural language input.

from pydantic import BaseModel
from typing import Literal

class SupportRequest(BaseModel):
    customer_name: str
    product: str
    issue_summary: str
    priority: Literal["low", "medium", "high"]
    requested_action: str

response = client.responses.parse(
    model="<supported-model-id>",
    input=[
        {"role": "system", "content": "Extract the support request into the schema."},
        {
            "role": "user",
            "content": (
                "Sarah Johnson from Example Company says the mobile inventory app "
                "crashes whenever she scans more than 20 items in one session. "
                "This is delaying warehouse processing, and she wants a fix as soon as possible."
            ),
        },
    ],
    text_format=SupportRequest,
)

support_request = response.output_parsed
print(support_request)

Example output:

SupportRequest(
customer_name='Sarah Johnson', 
product='mobile inventory app', 
issue_summary='App crashes whenever she scans more than 20 items in one session.', 
priority='high', 
requested_action='Provide a fix as soon as possible'
)
Send Multimodal Inputs

The OCI Responses API supports models that accept multimodal inputs. You can combine text with images, files, and reasoning controls to support richer workflows such as document analysis, image understanding, and more deliberate model responses.

Tip

Review the key features in the model's detail page to ensure the model that you're calling accepts multimodal inputs. See Available Chat Models.

Image Input as Base64-Encoded Data URL

import base64

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image("/path/to/image.png")

response = client.responses.create(
    model="google.gemini-2.5-pro",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Describe the main objects in this image."},
                {
                    "type": "input_image",
                    "image_url": f"data:image/jpeg;base64,{base64_image}",
                    "detail": "high",
                },
            ],
        }
    ],
)

print(response.output_text)

Image Input as an Internet URL

response = client.responses.create(
    model="google.gemini-2.5-pro",
    store=False,
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Describe the scene shown in this image."},
                {
                    "type": "input_image",
                    "image_url": "https://example.photos/id/123",
                },
            ],
        }
    ],
)

print(response.output_text)

Replace image_url with a valid image URL.

File Input as File ID

Important

The file ID input feature is supported only with Google Gemini models. For each request, the combined size of all uploaded PDF files must be under 50 MB, and you can provide a maximum of 10 file IDs in the request. See supported Gemini models.
file = client.files.create(
    file=open("<path-to-file>", "rb"),
    purpose="user_data"
)

response = client.responses.create(
    model="google.gemini-2.5-pro",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_file",
                    "file_id": file.id,
                },
                {
                    "type": "input_text",
                    "text": "Summarize this document.",
                },
            ]
        }
    ]
)

print(response.output_text)

File Input as Internet-Accessible URL

response = client.responses.create(
    model="google.gemini-2.5-flash",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Summarize this file."},
                {
                    "type": "input_file",
                    "file_url": "https://www.example.com/letters/example-letter.pdf",
                },
            ],
        }
    ],
)

print(response.output_text)

Add Tools to Responses API

Code Interpreter

Use Code Interpreter to let the model write and run Python code in a secure, isolated container. This tool is useful for calculations, data analysis, file processing, and other computation-heavy tasks.

Note

The OCI code interpreter tool uses the same format as the OpenAI code interpreter tool for the Responses API when called through the OCI OpenAI-compatible endpoint. For syntax and request details, see the Code Interpreter topic in the OpenAI documentation.
Tip

In prompts, reference the Code Interpreter as the python tool. For example: Use the python tool to solve the problem.

Because code runs in an isolated environment with no external network access, Code Interpreter is a good option when the workflow needs computation or file processing in a controlled setting.

Using the Code Interpreter

If you're using this for the first time, think of Code Interpreter as a temporary Python workspace for the model.

You can use it for tasks such as:

  • Solving math problems
  • Analyzing uploaded files
  • Cleaning or transforming data
  • Creating charts or tables
  • Generating output files such as logs or processed datasets

Execution Environment

The Python environment includes more than 420 preinstalled libraries, including Pandas, Matplotlib, and SciPy, so many common tasks work without extra setup.

The code runs inside a sandbox container. This container is the working environment where Python runs and where uploaded files, generated files, and temporary working data are stored during the session.

Container Memory Limits

Code Interpreter containers use a shared memory pool of 64 GB per tenancy.

Supported container sizes are:

  • 1 GB
  • 4 GB
  • 16 GB
  • 64 GB

This shared limit can be divided across multiple containers. For example, it can support:

  • Sixty-four 1 GB containers
  • Sixteen 4 GB containers
  • Four 16 GB containers
  • One 64 GB container

If you need more capacity, you can submit a service request.

Container Expiration

A container expires after 20 minutes of inactivity.

This is important to know when building multi-step flows:

  • An expired container can't be reused
  • You must create a new container
  • Files must be uploaded again if needed
  • In-memory state, such as Python variables and Python objects, is lost

Because of this, it's best to treat containers as temporary working environments.

Example

To use Code Interpreter, add a tool definition in the tools property with "type": "code_interpreter".

response = client.responses.create(
    model="xai.grok-4-1-fast-reasoning",
    tools=[
        {
            "type": "code_interpreter",
            "container": {"type": "auto"}
        }
    ],
    instructions="Use the python tool to solve the problem and explain the result.",
    input="Find the value of (18 / 3) + 7 * 2."
)

print(response.output_text)

In this example, the python tool is Code Interpreter.

Containers for Code Interpreter

The Code Interpreter tool requires a container object. The container is the isolated environment where the model runs Python code.

A container can hold:

  • Uploaded files
  • Files created by the model
  • Temporary working data during execution

When you use Code Interpreter, you can use one of two container modes:

  • auto: OCI Generative AI provisions or reuses a container in the current context.
  • container OCID: You create the container yourself, define the memory size, and provide the OCID.

For both options, the containers are created and managed in OCI Generative AI. The code that runs in those containers also runs in the OCI Generative AI tenancy.

Note

The OCI code interpreter tool uses the same format as the OpenAI code interpreter tool for the Responses API when called through the OCI OpenAI-compatible endpoint. For syntax and request details, see Containers for Code Interpreters and the OpenAI Containers API documentation.

Auto Mode

In auto mode, the service provisions or reuses the container for you. This is the easiest option and a good starting point for most users.

Use auto mode when:

  • You want OCI Generative AI to manage the container
  • You don't need direct control over the environment
  • You want a simpler setup

You can optionally specify memory_limit. If you don't specify it, the default is 1 GB.

response = client.responses.create(
    model="xai.grok-code-fast-1",
    tools=[{
        "type": "code_interpreter",
        "container": {
            "type": "auto",
            "memory_limit": "1g"
        }
    }],
    input="Use the python tool to calculate the average of 12, 18, 24, and 30."
)

Explicit Mode

In explicit mode, you create the container first and set the size. Then you pass the container ID in the request.

Use explicit mode when you want more control over container settings, such as memory size, or when you want to maintain a dedicated session.

container = client.containers.create(name="test-container", memory_limit="4g")

response = client.responses.create(
    model="xai.grok-code-fast-1",
    tools=[{
        "type": "code_interpreter",
        "container": container.id
    }],
    tool_choice="required",
    input="Use the python tool to calculate the average of 12, 18, 24, and 30."
)

print(response.output_text)

Files in Code Interpreter

Code Interpreter supports dynamic file interaction during the life of the container. The model can read files you provide and can also create new files.

This is useful for workflows such as:

  • Reading a CSV or PDF
  • Generating a chart
  • Saving processed output
  • Creating logs or reports

API Reference: OpenAI Container Files API documentation and Container Files API.

File Persistence

Files created or updated by the python tool persist across code execution turns in the same container, as long as the container hasn't expired.

This means the model can build on earlier work in the same session. For example, it can:

  • Read a file
  • Analyze it
  • Save a chart
  • Use that chart later in the same container

When the container expires, that state is no longer available.

Uploading and Managing Files

You can manage container files through the Container Files APIs.

Common operations include:

  • Create Container File: Add a file to the container, either by multipart upload or by referencing an existing /v1/files ID
  • List Container Files: View files in the container
  • Delete Container File: Remove a file
  • Retrieve Container File Content: Download a file from the container

This lets you use the container as a temporary workspace for model-driven code execution.
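The operations above can be sketched as a single round trip. This is a hedged sketch: the method names (client.containers.files.create, list, delete, and content.retrieve) follow the OpenAI SDK's Containers interface, so verify them against the SDK version you have installed; the container name and file path are illustrative.

```python
def container_file_roundtrip(client, path: str):
    """Upload, list, download, and delete a file in a Code Interpreter container."""
    # Create a dedicated container to act as the workspace.
    container = client.containers.create(name="file-workspace", memory_limit="1g")

    # Create Container File: upload a local file into the container.
    with open(path, "rb") as fh:
        uploaded = client.containers.files.create(container_id=container.id, file=fh)

    # List Container Files: view files currently in the container.
    file_ids = [f.id for f in client.containers.files.list(container_id=container.id)]

    # Retrieve Container File Content: download the file's bytes.
    content = client.containers.files.content.retrieve(
        uploaded.id, container_id=container.id
    )

    # Delete Container File: remove the file when done.
    client.containers.files.delete(uploaded.id, container_id=container.id)
    return file_ids, content
```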

Output Files and Citations

When the model creates files, those files are stored in the container and can be returned as annotations in the response.

These annotations can include:

  • container_id
  • file_id
  • filename

You can use these values to retrieve the generated file content.

The response can include a container_file_citation object that identifies the generated file. Use the Retrieve Container File Content operation to download the file.
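For example, a small helper can walk the response output and collect these annotations. The dictionary shape below reflects the fields listed above (container_id, file_id, filename); adjust the access pattern if your SDK returns typed objects instead of dicts:

```python
def find_file_citations(output_items):
    """Collect (container_id, file_id, filename) tuples from
    container_file_citation annotations in a response's output items."""
    citations = []
    for item in output_items:
        for part in item.get("content", []):
            for ann in part.get("annotations", []):
                if ann.get("type") == "container_file_citation":
                    citations.append(
                        (ann["container_id"], ann["file_id"], ann["filename"])
                    )
    return citations
```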

The OCI Responses API supports OpenAI-compatible endpoints for features such as Responses, Files, Containers, and Container Files. You can use the related OpenAI documentation as a reference for request structure, response formats, and general workflows. When using these APIs with OCI, send requests to the OCI Generative AI inference endpoints, use OCI authentication, and note that the resources and execution remain in OCI Generative AI, not in an OpenAI tenancy.

Note

The OCI Files API uses the same format as the OpenAI Files API with the OCI OpenAI-compatible endpoint. For syntax and request details, see the OpenAI Files API documentation.
Note

The OCI code interpreter that's used as a tool for the Responses API uses the same format as the OpenAI code interpreter with the OCI OpenAI-compatible endpoint. For syntax and request details, see the following references:

Function Calling

Use Function Calling to let the model request data or actions from your application during a Responses API workflow. This is useful when the model needs information or operations that aren't available in the prompt itself, such as calendar data, internal application state, or the result of a custom operation.

With this pattern, the model doesn't execute the function directly. Instead, it returns the function name and arguments, your application runs the function, and then your application sends the function output back so the model can continue and produce the user-facing answer.

What This Enables

Function Calling is useful when your application needs to stay in control of execution while still allowing the model to decide when outside information is needed.

Example use cases:

  • Looking up calendar events
  • Retrieving application data
  • Calling internal or external APIs
  • Running business logic or calculations

Within your function, you can call an external service, a database, one of your own library APIs, a CLI, or a local MCP server.

This approach gives you flexibility while keeping the execution path inside the application.

Execution Flow

A typical function calling interaction works as follows:

  1. The client sends a request that includes one or more function tool definitions.
  2. The model decides whether one of those tools is needed.
  3. If a tool is needed, the model returns the function name and arguments.
  4. The application runs the function and prepares the result.
  5. The application sends that result back in a follow-up request.
  6. The model uses that result to complete the response.

State Handling Options

There are two common ways to manage state across these requests:

  • Service-managed state

    Recommended for most use cases. The follow-up request includes previous_response_id, and the service tracks the earlier exchange.

  • Client-managed state

    The application keeps the full interaction history and sends the accumulated context with each request.

Tip

Keep tool definitions precise. Clear names, well-written descriptions, and well-defined parameters help the model select the right tool and generate usable arguments.

Define a Function Tool

To define a function tool, add an entry in the tools property with "type": "function".

The following example defines a tool that retrieves calendar events for a specified date:

tools = [
    {
        "type": "function",
        "name": "get_calendar_events",
        "description": "Return calendar events scheduled for a specific date.",
        "parameters": {
            "type": "object",
            "properties": {
                "date": {
                    "type": "string",
                    "description": "Date to query, for example 2026-04-02"
                }
            },
            "required": ["date"],
        },
    },
]

Include this tools array in the client.responses.create() request.

Example: Service-Managed State

In this pattern, the first request lets the model decide whether the tool is needed. The second request sends the tool result back and references the earlier response.

import json

tools = [
    {
        "type": "function",
        "name": "get_calendar_events",
        "description": "Return calendar events scheduled for a specific date.",
        "parameters": {
            "type": "object",
            "properties": {
                "date": {
                    "type": "string",
                    "description": "Date to query, for example 2026-04-02"
                }
            },
            "required": ["date"],
        },
    },
]

def get_calendar_events(date):
    # Replace this with actual calendar logic or an API call
    return [
        {"time": "09:00", "title": "Team standup"},
        {"time": "13:00", "title": "Design review"},
        {"time": "16:00", "title": "Project check-in"},
    ]

# Initial request
response = client.responses.create(
    model="openai.gpt-oss-120b",
    tools=tools,
    input="Show the calendar events for 2026-04-02.",
)

# Execute the requested function
tool_outputs = []
for item in response.output:
    if item.type == "function_call" and item.name == "get_calendar_events":
        args = json.loads(item.arguments)
        events = get_calendar_events(**args)
        tool_outputs.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": json.dumps({"events": events}),
        })

# Follow-up request
final = client.responses.create(
    model="openai.gpt-oss-120b",
    instructions="Summarize the schedule clearly for the user.",
    tools=tools,
    input=tool_outputs,
    previous_response_id=response.id,
)

print(final.output_text)

Example: Client-Managed State

In this pattern, the application keeps the full exchange and resubmits it with the follow-up request.

import json

tools = [
    {
        "type": "function",
        "name": "get_calendar_events",
        "description": "Return calendar events scheduled for a specific date.",
        "parameters": {
            "type": "object",
            "properties": {
                "date": {
                    "type": "string",
                    "description": "Date to query, for example 2026-04-02"
                }
            },
            "required": ["date"],
        },
    },
]

def get_calendar_events(date):
    # Replace this with actual calendar logic or an API call
    return [
        {"time": "09:00", "title": "Team standup"},
        {"time": "13:00", "title": "Design review"},
        {"time": "16:00", "title": "Project check-in"},
    ]

conversation = [
    {"role": "user", "content": "Show the calendar events for 2026-04-02."}
]

response = client.responses.create(
    model="openai.gpt-oss-120b",
    tools=tools,
    input=conversation,
)

conversation += response.output

for item in response.output:
    if item.type == "function_call" and item.name == "get_calendar_events":
        args = json.loads(item.arguments)
        events = get_calendar_events(**args)
        conversation.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": json.dumps({"events": events}),
        })

final = client.responses.create(
    model="openai.gpt-oss-120b",
    instructions="Summarize the schedule clearly for the user.",
    tools=tools,
    input=conversation,
)

print(final.output_text)

Function calling is a strong option when the application must remain responsible for execution, access control, and integration logic, while still allowing the model to request the information it needs.
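The per-item checks in the preceding examples can be generalized into a small client-side dispatch registry. This is a sketch, not part of the API; the registry, decorator, and the calendar stub it registers are all illustrative:

```python
import json

# Illustrative client-side registry mapping tool names to local functions.
FUNCTION_REGISTRY = {}

def register(name):
    def wrap(fn):
        FUNCTION_REGISTRY[name] = fn
        return fn
    return wrap

@register("get_calendar_events")
def get_calendar_events(date):
    # Stub data, matching the earlier examples
    return [{"time": "09:00", "title": "Team standup"}]

def run_function_calls(output_items):
    """Execute every function_call item and build function_call_output entries."""
    results = []
    for item in output_items:
        if getattr(item, "type", None) == "function_call" and item.name in FUNCTION_REGISTRY:
            args = json.loads(item.arguments)
            value = FUNCTION_REGISTRY[item.name](**args)
            results.append({
                "type": "function_call_output",
                "call_id": item.call_id,
                "output": json.dumps(value),
            })
    return results
```

Passing `response.output` to `run_function_calls` yields the list to send back as `input` in the follow-up request, in either the service-managed or client-managed pattern.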

MCP Calling

MCP (Model Context Protocol) tools are executable functions or capabilities that AI models can use to interact with external systems. Use MCP Calling in OCI Generative AI to let a model access executable tools exposed by a remote MCP server during a Responses API request. These tools can provide access to external systems such as APIs, databases, file systems, or application endpoints. OCI Generative AI communicates directly with remote MCP servers as part of the request workflow.

When To Use MCP Calling

Use MCP Calling when the model needs access to tools hosted on a remote MCP server. This approach is useful when you want:

  • The Enterprise AI Agent platform to communicate with the MCP server directly
  • Fewer client-side orchestration steps
  • Lower latency than a client-executed tool pattern
  • Access to tools exposed through a remote MCP server

Key Features

MCP Calling provides the following benefits:

  • Direct platform-to-server communication: Unlike standard function calling, which returns control to the client application, MCP Calling lets OCI Generative AI communicate directly with the remote MCP server.
  • Lower latency: Because the request doesn't require an extra client round trip, MCP Calling can reduce orchestration overhead.

  • Transport support: Streamable HTTP is supported; SSE is deprecated and not supported.

Defining an MCP Tool

To define an MCP tool, add an entry in the tools property with "type": "mcp".

response_stream = client.responses.create(
    model="openai.gpt-5.4",
    tools=[
        {
            "type": "mcp",
            "server_label": "dmcp",
            "server_description": "A Dungeons and Dragons MCP server to assist with dice rolling.",
            "server_url": "https://mcp.deepwiki.com/mcp",
            "require_approval": "never",
        },
    ],
    input="Roll 2d4+1",
    stream=True,
)

for event in response_stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

This example streams the response and prints text as it's generated.

Restrict the Tools Exposed by an MCP Server

A remote MCP server can expose many tools, which can increase cost and latency. If the application needs only a subset of those tools, use allowed_tools to limit the set.

response_stream = client.responses.create(
    model="openai.gpt-oss-120b",
    tools=[
        {
            "type": "mcp",
            "server_label": "dmcp",
            "server_description": "A Dungeons and Dragons MCP server to assist with dice rolling.",
            "server_url": "https://mcp.deepwiki.com/mcp",
            "require_approval": "never",
            "allowed_tools": ["roll"],
        },
    ],
    input="Roll 2d4+1",
    stream=True,
    store=False,
)

Provide Authentication to the MCP Server

If the remote MCP server requires authentication, pass the access token in the authorization field.

import os

response_stream = client.responses.create(
    model="xai.grok-4-1-fast-reasoning",
    tools=[
        {
            "type": "mcp",
            "server_label": "calendar",
            "server_url": "https://calendar.example.com/mcp",
            # Pass the raw OAuth access token, without the Bearer prefix
            "authorization": os.environ["CALENDAR_OAUTH_ACCESS_TOKEN"],
        },
    ],
    input="List my meetings for 2026-02-02.",
    stream=True,
)

Pass only the raw token value. Don't include the Bearer prefix.

OCI sends the token in the API request body over TLS. OCI doesn't decode, inspect, store, or log the token. We recommend that you use TLS-encrypted MCP server endpoints.

NL2SQL Tool

About SQL Search with NL2SQL

Use SQL Search (NL2SQL) to convert natural-language requests into validated SQL for enterprise data in OCI Generative AI.

NL2SQL helps Enterprise AI Agents work with federated enterprise data without moving or duplicating the underlying data. It uses a semantic enrichment layer to map business terms to database tables, columns, and joins, and then generates SQL from natural-language input.

NL2SQL generates SQL only. It doesn't run the query itself. Query execution is handled separately through the DBTools MCP Server, which authorizes and runs the query against the source database by using the end user’s identity and the appropriate guardrails.

Prerequisites

Before you use NL2SQL, you must have:

  • A source database, such as Oracle Autonomous Database
  • Two DBTools connections:
    • Enrichment Connection
    • Query Connection
  • IAM permissions for Semantic Stores, NL2SQL, Database Tools, and secrets
  • OCI IAM authentication configured for the OCI service APIs used in this QuickStart
1. Create a Database and DBTools Connections

Before you create a Semantic Store, create the source database and the required DBTools connections.

You need the following connections:

Enrichment Connection

The Enrichment Connection is a higher-privileged database connection used during enrichment. It needs privileges to:

  • Execute queries
  • Perform DDL operations
  • Access example values from the database

OCI Generative AI uses this connection to read schema details and build the metadata needed for SQL generation.

Query Connection

The Query Connection is a lower-privileged database connection used to run queries on behalf of the querying user.

This separation helps keep generation and execution responsibilities distinct and supports safer access control.

Related Topics

2. Set Up IAM Permissions for Semantic Stores and NL2SQL

Before you create a Semantic Store, set up the required IAM policies.

For Semantic Store Administrators

If you completed the QuickStart step Grant access to OCI Generative AI resources, you can skip this step for administrators. They already have permission to manage semantic stores.

Semantic store administrators are admins who create, update, delete, and manage the OCI Generative AI semantic store resource and its NL2SQL-related operations.

Ask an administrator to create an IAM group for the admins. In this topic, the admin group is represented by:

  • <semantic-store-admin>
allow group <semantic-store-admin> 
to manage generative-ai-semantic-store 
in compartment <QuickStart-compartment-name>
allow group <semantic-store-admin> 
to manage generative-ai-nl2sql 
in compartment <QuickStart-compartment-name>
Admin Tasks Available with the Preceding Two Policies

A <semantic-store-admin> can:

  • create the semantic store
  • view and update it
  • delete or move it
  • trigger enrichment
  • inspect enrichment results
  • generate SQL from natural language for validation/testing
  • manage NL2SQL operations tied to the store

For OCI Generative AI Semantic Stores

  • Create a dynamic group for semantic stores that are created in the tenancy or a specified compartment.
  • Grant the dynamic group permission to:
    • Access Database Tools connections
    • Read database metadata
    • Read Autonomous Database metadata
    • Access Generative AI inference
    • Read secrets used by Database Tools connections
  1. Create a dynamic group for semantic stores in the tenancy with the following matching rule:
    all {resource.type='generativeaisemanticstore'}
  2. To restrict the semantic stores to a specific compartment, update the previous condition to:
    all {resource.type='generativeaisemanticstore',
     resource.compartment.id='<QuickStart-compartment-OCID>'}
  3. Create a policy to grant the dynamic group permission to access Database Tools connections in a specified compartment.
    allow dynamic-group <dynamic-group-name> 
    to use database-tools-family in compartment <QuickStart-compartment-name>
  4. Add a policy to grant the dynamic group permission to read secrets used by Database Tools connections.
    allow dynamic-group <dynamic-group-name> 
    to read secret-family in compartment <QuickStart-compartment-name>
  5. Add a policy to grant the dynamic group permission to read Oracle Database metadata for Database Tools connections.
    allow dynamic-group <dynamic-group-name> 
    to read database-family in compartment <QuickStart-compartment-name>
  6. Add a policy to grant the dynamic group permission to read Autonomous Database metadata for Database Tools connections and enrichment jobs.
    allow dynamic-group <dynamic-group-name> 
    to read autonomous-database-family in compartment <QuickStart-compartment-name>
  7. Add a policy to grant the dynamic group permission to access the OCI Generative AI resources for inference.
    allow dynamic-group <dynamic-group-name> 
    to use generative-ai-family in compartment <QuickStart-compartment-name>
What the Preceding Policies Provide

The generativeaisemanticstore resource can:

  • invoke LLM inference through Generative AI
  • use Database Tools connections for enrichment and querying
  • read secrets required by Database Tools-backed connections
  • read Oracle Database and Autonomous Database metadata

For Semantic Store Users

Semantic store users are end users who are allowed to access an existing semantic store and use NL2SQL capabilities, but don't need to administer the resource.

Ask an administrator to create an IAM group for the users. In this topic, the user group is represented by:

  • <semantic-store-users>
allow group <semantic-store-users> 
to read generative-ai-semantic-store 
in compartment <QuickStart-compartment-name>
allow group <semantic-store-users> 
to manage generative-ai-nl2sql 
in compartment <QuickStart-compartment-name>
User Tasks Available with the Preceding Two Policies

The <semantic-store-users> can:

  • view the semantic store
  • use NL2SQL-related capabilities associated with it
  • inspect and query outputs
  • access enrichment information

For User Access to Database Tools Connections

Grant the group access to the required Database Tools resources:

allow group <semantic-store-users>
to use database-tools-family in compartment <QuickStart-compartment-name>
where all {request.principal.type='generativeaisemanticstore'}
allow group <semantic-store-users>
to read database-family in compartment <QuickStart-compartment-name>
where all {request.principal.type='generativeaisemanticstore'}
allow group <semantic-store-users>
to read autonomous-database-family in compartment <QuickStart-compartment-name>
where all {request.principal.type='generativeaisemanticstore'}
3. Set Up OCI IAM Authentication

This QuickStart uses OCI IAM authentication for the OCI service APIs that create Semantic Stores, run enrichment jobs, and generate SQL.

Set up OCI IAM authentication for the identity that signs requests. The examples in this section use the OCI Python SDK and BaseClient.

You can authenticate by using:

  • An OCI config file and API signing keys
  • A security token signer

The following examples use SecurityTokenSigner, but you can also use a standard OCI config signer if that fits your environment better.

Related Topics

4. Create a Semantic Store

To use NL2SQL, create an OCI Semantic Store resource.

A Semantic Store is backed by a vector store with structured data and includes the following DBTools connections:

  • Enrichment Connection
  • Query Connection

Create a Semantic Store in the Console

  1. Open the vector stores list page.
  2. Enter a name and description.
  3. Select the <QuickStart-compartment-name> compartment.
  4. Under Data Source Type, select Structured data.
  5. Under Configure sync connector, select OCI Database tool as the connection type.
  6. Enter the Enrichment connection ID, then select Test enrichment connection.
  7. Enter the Querying connection ID, then select Test query connection.
  8. In Schemas, specify the database schema names to ingest.
  9. For Automation, select when enrichment runs:
    • None
    • On create
  10. Select Create.

Create a Semantic Store by Using Python

The following example creates and manages a Semantic Store by using the OCI service API:

import json
import oci
from oci.base_client import BaseClient

API_VERSION = "20231130"
HOST = "https://generativeai.us-ashburn-1.oci.oraclecloud.com"
BASE_PATH = f"/{API_VERSION}"

def get_signer_auth_security_token(profile="DEFAULT"):
    config = oci.config.from_file("~/.oci/config", profile)
    # SecurityTokenSigner takes the session token and the private key,
    # not the raw config dictionary
    with open(config["security_token_file"]) as f:
        token = f.read()
    private_key = oci.signer.load_private_key_from_file(config["key_file"])
    signer = oci.auth.signers.SecurityTokenSigner(token, private_key)
    return config, signer

def make_base_client(signer):
    return BaseClient(
        service_endpoint=HOST,
        signer=signer,
        retry_strategy=None,
    )

def create_semantic_store(client, body: dict):
    return client.call_api(
        resource_path=f"{BASE_PATH}/semanticStores",
        method="POST",
        header_params={"content-type": "application/json"},
        body=body,
    )

if __name__ == "__main__":
    config, signer = get_signer_auth_security_token(profile="DEFAULT")
    client = make_base_client(signer)

    create_body = {
        "displayName": "TestSemanticStore",
        "description": "Semantic store for the ADMIN schema",
        "freeformTags": {},
        "definedTags": {},
        "dataSource": {
            "queryingConnectionId": "ocid1.databasetoolsconnection.oc1.xxx",
            "enrichmentConnectionId": "ocid1.databasetoolsconnection.oc1.xxx",
            "connectionType": "DATABASE_TOOLS_CONNECTION",
        },
        "refreshSchedule": {"type": "ON_CREATE"},
        "compartmentId": "xxx",
        "schemas": {
            "connectionType": "DATABASE_TOOLS_CONNECTION",
            "schemas": [{"name": "ADMIN"}],
        },
    }

    create_resp = create_semantic_store(client, create_body)
    print("CREATE status:", create_resp.status)

    create_payload = create_resp.data
    if isinstance(create_payload, (bytes, str)):
        create_payload = json.loads(create_payload)

    print("CREATE response:", json.dumps(create_payload, indent=2))
5. Run Enrichment Manually

If you selected None for Automation instead of On create, run enrichment manually after the semantic store is created.

The enrichment process reads schema metadata, such as tables and columns, from the connected database. OCI Generative AI uses this metadata to help generate the SQL.

To trigger enrichment manually, call the GenerateEnrichmentJob API.

You can also manage enrichment jobs by using the following APIs:

  • ListEnrichmentJobs
  • GetEnrichmentJob
  • CancelEnrichmentJob
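As a sketch under stated assumptions, a manual trigger with the BaseClient pattern from the earlier semantic store example might look like the following. The `enrichmentJobs` resource path and the request body fields are guesses based on the endpoint patterns in this QuickStart; confirm them against the GenerateEnrichmentJob API reference:

```python
# Assumption-laden sketch: trigger an enrichment job through the
# inference endpoint. `client` is assumed to be an oci BaseClient
# configured as shown earlier in this QuickStart.
API_VERSION = "20260325"

def generate_enrichment_job(client, semantic_store_id: str, display_name: str):
    # The exact resource path is an assumption; verify it against the
    # GenerateEnrichmentJob API reference.
    resource_path = f"/{API_VERSION}/semanticStores/{semantic_store_id}/enrichmentJobs"
    return client.call_api(
        resource_path=resource_path,
        method="POST",
        header_params={"content-type": "application/json"},
        body={"displayName": display_name},
    )
```

After triggering a job this way, you can poll its status with GetEnrichmentJob.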
6. Find the OCI Endpoints for Semantic Store and GenerateSqlFromNl

NL2SQL uses OCI service APIs rather than the OCI OpenAI-compatible /openai/v1 paths.

Semantic Store CRUD

Use the following base URL for semantic store CRUD operations:

Base URL: https://generativeai.${region}.oci.oraclecloud.com

Use the following endpoint path:

/20231130/semanticStores

Authentication:

  • IAM session only

Enrichment Job API

Use the following base URL for enrichment jobs of a semantic store:

Base URL: https://inference.generativeai.${region}.oci.oraclecloud.com

Use the following endpoint path:

/20260325/semanticStores/{semanticStoreId}/

Authentication:

  • IAM session only

Generate SQL from Natural Language

Use the following base URL for SQL generation:

Base URL: https://inference.generativeai.${region}.oci.oraclecloud.com

Use the following endpoint pattern:

/20260325/semanticStores/{semanticStoreId}/actions/generateSqlFromNl

Authentication:

  • IAM session only

Available Semantic Store and NL2SQL APIs

Semantic Stores
  • CreateSemanticStore
  • ListSemanticStores
  • GetSemanticStore
  • UpdateSemanticStore
  • ChangeSemanticStoreCompartment
  • DeleteSemanticStore
Enrichment Jobs
  • ListEnrichmentJobs
  • GetEnrichmentJob
  • GenerateEnrichmentJob
  • CancelEnrichmentJob
Generate SQL
  • GenerateSqlFromNl
Note

The base URL differs between the semantic store APIs and the enrichment job APIs.
7. Generate Your First SQL Statement

After the Semantic Store is ready and enrichment has completed, call the NL2SQL API to generate SQL from natural language.

import json
import oci
from oci.base_client import BaseClient

INFERENCE_BASE_URL = "https://inference.generativeai.<region>.oci.oraclecloud.com"
API_VERSION = "20260325"
SEMANTIC_STORE_ID = "ocid1.generativeaisemanticstore.oc1.xxx"

config = oci.config.from_file("~/.oci/config", "oc1")
# SecurityTokenSigner takes the session token and the private key,
# not the raw config dictionary
with open(config["security_token_file"]) as f:
    token = f.read()
private_key = oci.signer.load_private_key_from_file(config["key_file"])
signer = oci.auth.signers.SecurityTokenSigner(token, private_key)

client = BaseClient(
    service_endpoint=INFERENCE_BASE_URL,
    signer=signer,
    retry_strategy=None,
)

resource_path = (
    f"/{API_VERSION}/semanticStores/{SEMANTIC_STORE_ID}/actions/generateSqlFromNl"
)

body = {
    "displayName": "Generate SQL example",
    "description": "Generate SQL from natural language",
    "inputNaturalLanguageQuery": "Give me last week's order details."
}

resp = client.call_api(
    resource_path=resource_path,
    method="POST",
    header_params={"content-type": "application/json"},
    body=body,
)

print("HTTP status:", resp.status)
print("opc-request-id:", resp.headers.get("opc-request-id"))

data = resp.data
if isinstance(data, (bytes, str)):
    data = json.loads(data)

print(json.dumps(data, indent=2))

Observe

About Tracing Responses API Calls

Use the built-in response trace data in the OCI Responses API to understand how a request was processed. For deeper observability, you can also integrate the OCI Responses API with external observability platforms such as Langfuse.

This QuickStart shows how to inspect execution details returned by the Responses API and how to trace requests by using Langfuse.

Prerequisites

Before you begin, you must have:

  • An OCI Generative AI project
  • Authentication configured for the OCI Responses API
  • The OpenAI SDK installed
  • A working OCI Responses API client

(Optional) To integrate with Langfuse, you need a Langfuse account and credentials.

1. Inspect the Responses API Output

When you call the OCI Responses API, the response includes an output field. This field is an array of items that describe what occurred during the request.

Each item represents a step in the execution and can include different types, such as:

  • message
  • file_search_call
  • mcp_call

These output items provide visibility into how the request was processed. You can use them to:

  • Debug and understand model behavior
  • Display execution steps in a user interface
  • Build custom observability or logging workflows

For example, after you send a request, you can inspect the output field:

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Explain the difference between structured and unstructured data."
)

for item in response.output:
    print(item.type)
2. Install the Langfuse SDK

For deeper insight, such as latency, cost, and execution traces, you can integrate the OCI Responses API with an observability platform.

One option is Langfuse, an open-source LLM engineering platform that helps developers debug, monitor, and improve LLM applications. Langfuse provides end-to-end observability for tracing agent actions, supports prompt versioning, and helps evaluate model outputs. It integrates with popular frameworks such as OpenAI, LangChain, and LlamaIndex.

Install the Langfuse SDK:

pip install langfuse
3. Configure Environment Variables

Set the required Langfuse and OCI environment variables:

LANGFUSE_SECRET_KEY="sk-lf-xxxxxxxxx"
LANGFUSE_PUBLIC_KEY="pk-lf-xxxxxxxxx"
LANGFUSE_BASE_URL="https://us.cloud.langfuse.com"

# OCI Generative AI credentials
OCI_GENAI_API_KEY="sk-xxxxxxxxx"
OCI_GENAI_PROJECT_ID="ocid1.generativeaiproject.oc1.xxx"
4. Instrument the OpenAI Client

Import the OpenAI client from the Langfuse SDK. The existing request code stays the same, but requests are automatically traced.

import os
from langfuse.openai import OpenAI  # Import from Langfuse

client = OpenAI(
    base_url="https://inference.generativeai.<region>.oci.oraclecloud.com/openai/v1",
    api_key=os.getenv("OCI_GENAI_API_KEY"),
    project=os.getenv("OCI_GENAI_PROJECT_ID"),
)
5. Send a Traced Responses API Request

After you instrument the client, send a Responses API request as usual. Langfuse automatically traces the request.

Example:

response = client.responses.create(
    model="openai.gpt-oss-120b",
    tools=[
        {
            "type": "mcp",
            "server_label": "dmcp",
            "server_url": "https://mcp.example.com/mcp",
            "require_approval": "never",
        },
    ],
    input="Explain why tracing and observability are important in distributed systems."
)

print(response.output_text)