OCI OpenAI-Compatible Endpoints

Use OCI Generative AI OpenAI-compatible endpoints to call Enterprise AI models and build Enterprise AI agents through a familiar OpenAI-style API. These endpoints support common OpenAI request patterns while keeping authentication, execution, and resource management within OCI.

Enterprise AI Models
Call supported hosted models or imported models with the Responses or Chat Completions API.

Enterprise AI Agents
Use the Responses API as the primary OpenAI-compatible API for agentic workloads. You can use it together with supported agent tools, agent memory capabilities, and low-level foundational agent building blocks such as Files, Vector Stores, and Containers.

In addition to OpenAI-compatible endpoints, OCI Generative AI also provides an OCI-native inference API through a separate endpoint for chat, embedding, and rerank tasks.

Understanding OCI OpenAI-Compatible Endpoints

The OpenAI-compatible base endpoint is:

https://inference.generativeai.${region}.oci.oraclecloud.com/openai/v1

You can use the base endpoint with OCI-supported OpenAI-style paths.

Example paths:

  • /responses
  • /conversations
  • /containers
  • /files
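
As a sketch, the full request URL is the base endpoint joined with one of these paths. The region value below is an illustrative assumption; substitute a supported OCI region:

```python
# Build the full URL for an OpenAI-compatible call from the OCI base
# endpoint and a supported OpenAI-style path. The region is a placeholder
# for illustration only.
region = "us-chicago-1"
base = f"https://inference.generativeai.{region}.oci.oraclecloud.com/openai/v1"

def endpoint(path: str) -> str:
    """Join the base endpoint with an OpenAI-style path."""
    return base + path

print(endpoint("/responses"))
# https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/openai/v1/responses
```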

Key Benefits

Although the API format is OpenAI-compatible, the implementation is fully integrated with OCI:

  • Authentication uses OCI Generative AI API keys or OCI IAM-based authentication, not OpenAI credentials.
  • Requests are routed to OCI Generative AI inference endpoints in a supported OCI region.
  • Resources such as files and containers are created and managed in OCI.
  • Data processing remains within OCI infrastructure.
  • Existing applications built for the OpenAI API can often be adapted with minimal code changes, typically by updating the base URL, authentication method, and model name.

For example, a request to /openai/v1/containers creates and manages a container resource in OCI Generative AI.

Supported Endpoints

Important

Use OCI OpenAI-compatible endpoints only with supported models in supported regions.

For Model Inference and Agentic Workflows

To access supported hosted and imported models through the OCI OpenAI-compatible API for model inference and agentic workflows, use the following endpoints.

Base URL: https://inference.generativeai.${region}.oci.oraclecloud.com/openai/v1

API | Endpoint Path | Suggested Usage
Responses API | /responses | Use this primary interface to call models and generate responses. Optionally, include supported tools and conversation IDs for context.
Conversations API | /conversations | Use this persistent, stateful interface to manage multi-turn conversation history. Pass the conversation ID in Responses API requests; the Responses API remains the primary endpoint for generating model responses.
Chat Completions API | /chat/completions | Use this stateless, chat-style interface (the predecessor to the stateful Conversations API) if you already have application code built around the Chat Completions API or need a simpler chat-only interface. Otherwise, use the Conversations API together with the Responses API.
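
As a rough sketch of the difference, a stateless Chat Completions request carries the full message history in every call, while a Responses request can instead reference server-side conversation state by ID. Field names follow the OpenAI API; the model name comes from the examples on this page, and the conversation ID is a placeholder:

```python
# Stateless Chat Completions: the caller resends all prior turns each time.
chat_request = {
    "model": "openai.gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "Recommend a restaurant."},
        {"role": "assistant", "content": "What cuisine do you like?"},
        {"role": "user", "content": "Thai."},
    ],
}

# Responses + Conversations: state lives server-side and is referenced by
# ID, so each request carries only the new turn. The ID is a placeholder.
responses_request = {
    "model": "openai.gpt-oss-120b",
    "conversation": "conv_example123",
    "input": [{"role": "user", "content": "Thai."}],
}
```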

Agent-Building Components

For agentic workloads, the OCI OpenAI-compatible APIs include the following building blocks:

API | Endpoint Path | Suggested Usage
Files API | /files | For uploading and managing files.
Vector Store Files API | /vector_stores/{id}/files | For managing files attached to a vector store.
Vector Store File Batches API | /vector_stores/{id}/file_batches | For adding and managing a batch of vector store files at the same time.
Vector Store Search API | /vector_stores/{id}/search | For running direct searches against a vector store.
Containers API | /containers | For creating and managing sandbox containers to use in agent workflows.
Container Files API | /containers/{id}/files | For managing files in a sandbox container.
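
For example, a direct search against a vector store is a request to the Vector Store Search path with a query payload. A sketch of the request shape, where the vector store ID is a placeholder and the `query` field follows the OpenAI vector store search API:

```python
# Sketch of a Vector Store Search request. The vector store ID is a
# placeholder for illustration.
vector_store_id = "vs_example123"
path = f"/vector_stores/{vector_store_id}/search"
payload = {"query": "refund policy for enterprise plans"}

print(path)  # /vector_stores/vs_example123/search
```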

Recommendation

For most new agentic workloads, use the Responses API as the primary entry point.

In many cases, you can select a supported model, optionally include conversation context, declare supported tools in the request, and send the request through the Responses API. OCI Generative AI then handles model execution and tool use as part of that workflow.

If needed, you can also combine the Responses API with lower-level foundational APIs such as Files, Vector Stores, and Containers.

This approach is useful when you want to:

  • Use supported models through a single API.
  • Declare tools directly in the request.
  • Build agentic workflows with OCI-managed execution.
  • Add conversation context through the Conversations API.
  • Combine model requests with files, vector stores, or containers when needed.

Example: Using Tools

For example, to use an MCP tool, specify a model and declare the MCP tool in the Responses API request. You don't need a separate MCP-specific API.

# client is an OpenAI SDK client configured with the OCI base URL and
# OCI credentials, as described earlier on this page.
response = client.responses.create(
    model="openai.gpt-oss-120b",
    tools=[
        {
            "type": "mcp",
            "server_url": "https://example.com/mcp",
        }
    ],
    input="What events are scheduled for 2026-04-02?",
)

Example: Using Conversation History

For conversation context, first create a conversation.

conversation = client.conversations.create()

Then pass the conversation ID in the Responses API request for a multi-turn conversation.

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input=[
        {
            "role": "user",
            "content": "Recommend a restaurant based on the food that I like."
        }
    ],
    conversation=conversation.id,
)