Hosted Applications
An application is an OCI resource that defines how a hosted Generative AI workload runs and how it's accessed. Applications centralize operational settings such as autoscaling, managed storage, networking, and authentication. Hosted deployments created within an application inherit these settings.
Application settings can include:
- Scaling settings for hosted deployments (minimum and maximum replicas, and an autoscaling metric)
- Managed storage options and runtime environment variables
- Networking settings for outbound traffic (egress) and endpoint access
- Authentication settings using an identity domain
Scaling
Scaling settings define how hosted deployments associated with an application add or remove replicas to handle load. You set a minimum and maximum number of replicas and select an autoscaling metric, such as concurrent requests, requests per second (RPS), CPU utilization, or memory utilization.
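The replica count calculation can be illustrated with the standard proportional autoscaling formula. This is an illustrative sketch only; the platform's actual scaling algorithm isn't specified here, and the function name and parameters are assumptions.

```python
import math

def desired_replicas(current_replicas: int, metric_value: float,
                     target_value: float, min_replicas: int,
                     max_replicas: int) -> int:
    """Proportional autoscaling sketch: scale the replica count by the
    ratio of the observed per-replica metric (for example, concurrent
    requests or RPS) to its target, then clamp to the configured bounds."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    desired = math.ceil(current_replicas * metric_value / target_value)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas each observing 15 RPS against a 10 RPS target scale to ceil(4 × 15 / 10) = 6 replicas, provided 6 is within the configured minimum and maximum.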
Managed storage and runtime variables
Managed storage provides service-managed stateful storage options that can be used by hosted deployments associated with an application. When enabled, connection details are provided to the container through environment variables.
You can also define other runtime variables that are injected into the container at runtime.
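A container can read the injected connection details at startup. The sketch below assumes hypothetical environment variable names; check your application's runtime environment for the actual names the service injects.

```python
import os

def read_storage_config(prefix: str = "MANAGED_PG") -> dict:
    """Read service-injected connection details from environment
    variables. The MANAGED_PG_* variable names are hypothetical
    placeholders, not documented names."""
    required = ["HOST", "PORT", "USER", "PASSWORD", "DBNAME"]
    config = {}
    for key in required:
        value = os.environ.get(f"{prefix}_{key}")
        if value is None:
            raise RuntimeError(f"missing environment variable {prefix}_{key}")
        config[key.lower()] = value
    config["port"] = int(config["port"])
    return config
```

Failing fast on a missing variable at startup surfaces misconfiguration immediately instead of at the first database call.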
Authentication
Authentication settings control how requests are authenticated before they're routed to the hosted deployment. Applications support OAuth 2.0 authentication using an identity domain. See Setting up Authentication for Agentic Support.
Networking
Networking settings control how hosted deployments associated with an application route outbound traffic (egress) and how the active deployment is accessed through a public or private endpoint.
By default, each deployed application is configured with outbound access to the public internet. This allows agent workloads to reach external resources such as public MCP servers, third-party APIs, foundation model endpoints, and other internet-hosted tools required for typical AI workflows.
An application can also operate in Customer Networking Mode. For this option, you specify a target subnet within a VCN in the tenancy. The platform then establishes a secure private network connection between the agent workload and the selected subnet using a Private Endpoint / Reverse Connection Endpoint (PE/RCE) mechanism.
When enabled, all outbound (egress) traffic from the agent is routed through the customer-specified subnet. As a result:
- The agent can securely access private resources within your network (for example, databases, Compute instances, internal services).
- Traffic remains within private network boundaries.
- Network security controls such as network security groups (NSGs), route tables, and firewalls in your VCN govern outbound connectivity.
- Public internet egress can be restricted or disabled according to enterprise security requirements.
This model provides flexibility to support both internet-facing AI workloads and fully private, enterprise-integrated deployments, while maintaining clear network isolation boundaries between the platform and customer environments.
Endpoints
By default, each deployed application is provisioned with a public endpoint that allows your clients to invoke the agent over the internet, subject to configured authentication and authorization controls.
For use cases requiring private network access, you can create a Private Endpoint (PE) within the GenAI platform. The Private Endpoint enables invocation through a private IP address and internal DNS resolution. Clients within the connected private network (for example, VCN, on-premises through FastConnect/VPN, or peered networks) can then invoke the agent using the Private Endpoint’s fully qualified domain name (FQDN).
This setup enables:
- Elimination of public internet exposure
- Traffic confinement within private network boundaries
- Alignment with enterprise network security and compliance requirements
Supported Transport Protocols
After an agent is deployed, clients invoke it through the provisioned endpoint. The transport protocol depends on the agent server implementation and the interaction model required (request/response, streaming, or bidirectional sessions).
The supported protocols include:
HTTP
HTTP is the most widely supported invocation model.
- Interaction model: Stateless request/response
- Transport: HTTP/1.1 or HTTP/2 over TLS
- Use case: Synchronous API calls and short-lived inference requests
In this mode, the client sends an HTTP request (typically POST with a JSON payload). The server returns a single response after processing completes.
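A minimal client sketch using only the Python standard library is shown below. The endpoint URL and payload shape are illustrative assumptions; use your agent's actual endpoint and request contract.

```python
import json
import urllib.request

def build_invoke_request(endpoint_url: str, payload: dict,
                         access_token: str) -> urllib.request.Request:
    """Build a synchronous invocation: an HTTP POST with a JSON body and
    an OAuth 2.0 bearer token. The payload shape is a placeholder."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
    )

# Sending the request with urllib.request.urlopen(req) blocks until the
# server returns its single response after processing completes.
```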
SSE (Server-Sent Events)
Server-Sent Events (SSE) is a unidirectional streaming protocol built on top of HTTP.
- Interaction model: Client to server (single request), server to client (streamed response)
- Transport: HTTP with Content-Type: text/event-stream
- Use case: Streaming responses (for example, token-by-token output)
In this mode, the client sends a request and the server keeps the connection open while streaming incremental results as events.
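The streamed body is a sequence of events separated by blank lines, where each event carries optional event: and data: fields. A simplified parser sketch (it handles only those two fields, not the full event-stream format):

```python
def parse_sse_stream(raw: str) -> list:
    """Parse a text/event-stream body into a list of event dicts.
    Events are terminated by a blank line; multiple data: lines in one
    event are joined with newlines. Comment lines and other fields
    (id:, retry:) are ignored in this sketch."""
    events = []
    event, data_lines = {}, []
    for line in raw.splitlines():
        if line == "":  # blank line terminates the current event
            if data_lines or event:
                event["data"] = "\n".join(data_lines)
                events.append(event)
            event, data_lines = {}, []
        elif line.startswith("data:"):
            data_lines.append(line[5:].lstrip(" "))
        elif line.startswith("event:"):
            event["event"] = line[6:].lstrip(" ")
    return events
```

In practice a client parses the stream incrementally as chunks arrive, appending each event's data (for example, output tokens) as it is received rather than waiting for the connection to close.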
WebSocket (Full Duplex Streaming)
WebSocket provides persistent, bidirectional communication between client and server.
- Interaction model: Full duplex (client and server can send messages at any time)
- Transport: WebSocket protocol (wss://)
- Use case: Interactive agents, real-time tool execution, and multi-turn sessions
After the initial HTTP upgrade handshake, the connection remains open, enabling bidirectional message exchange over a persistent channel.
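The upgrade handshake itself is standardized in RFC 6455: the server proves it understood the upgrade by concatenating the client's Sec-WebSocket-Key with a fixed GUID, hashing with SHA-1, and returning the Base64-encoded digest as Sec-WebSocket-Accept. A minimal sketch of that computation:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the WebSocket handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept_key(client_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header value the server returns
    during the HTTP upgrade handshake (RFC 6455, section 4.2.2)."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

For the example key in RFC 6455, `dGhlIHNhbXBsZSBub25jZQ==`, this yields `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`. Once the handshake completes, both sides exchange framed messages over the same TCP connection for the life of the session.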
Managed Storage
AI agents require stateful services to support short-term memory, checkpoints, caching, and context storage. To simplify operations and reduce management overhead, the platform provides fully managed storage for hosted applications.
When deploying an agent, you can select one or more of the following managed storage options:
- PostgreSQL
- OCI Cache
- Oracle Autonomous Database
These storage services are automatically provisioned and configured for your application.
How Managed Storage Works
Managed storage differs from storage that you provision directly in your own tenancy.
- Service-managed deployment: Managed storage is deployed in the service tenancy, not in your tenancy. It is accessible only by the associated hosted application and is not exposed for direct external access (for example, through local database clients or public endpoints).
- Application-scoped access: Only the specific deployed application can access its managed storage instance. Access is controlled internally by the platform, and you do not need to configure networking, authentication, or credentials manually.
- Lifecycle integration: Managed storage is tightly coupled with the lifecycle of your agent:
- When you deploy an agent, the storage is automatically created.
- When you scale the agent, storage scales accordingly (where supported).
- When you delete the agent, the associated storage is also deleted.
- No DBA-level administration: Because the storage is fully managed by the platform:
- You do not have DBA-level permissions.
- You cannot access the underlying infrastructure.
Once the agent is deleted, the managed storage is permanently removed and cannot be recovered.
When to Use Customer-Managed Storage
In some scenarios, you may need:
- Storage whose lifecycle is independent from the agent
- Full administrative control over database configuration
- Direct access from other systems or tools
- Custom extensions, tuning, or cross-application sharing
In these cases, you can provision storage resources in your own VCN and tenancy. Then, configure the agent to connect to those resources using Customer Networking Mode (described in the previous section).
This option provides maximum flexibility while allowing you to retain full control over your infrastructure.
Limits
For limits such as the number of allowed applications or artifacts per tenancy, see Application Limits.
Manage
You can perform the following tasks to create and list applications: