Using Short-Term Memory Compaction for Conversations

In the Conversations API, when short-term memory compaction is enabled, OCI Generative AI automatically compacts earlier conversation history into a smaller representation as the conversation grows. This helps preserve important context while reducing token usage and latency.

You don't need to manage compaction yourself. Keep sending requests with the same conversation ID, and the service compacts the history automatically.

Example:

# This example assumes `client` is an OpenAI-compatible Python client
# configured for your OCI Generative AI endpoint and credentials, and
# that a conversation has been created to hold the turns, for example:
conversation1 = client.conversations.create()

# first turn
response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="I'm planning a team offsite. We prefer outdoor activities, a moderate budget, and vegetarian-friendly food options.",
    conversation=conversation1.id
)

# second turn
response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="We also need the location to be within a two-hour drive from San Francisco.",
    conversation=conversation1.id
)

# third turn
response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Please avoid destinations that are usually crowded on weekends.",
    conversation=conversation1.id
)

# fourth turn
response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Now recommend three offsite options based on those preferences.",
    conversation=conversation1.id
)

As the conversation grows, OCI Generative AI can compact earlier turns automatically while preserving the important details needed for later responses.
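To build intuition for what compaction does, here is a minimal client-side sketch. This is not OCI's actual algorithm (the service performs compaction server-side, and you never call anything like this yourself); the function name, the turn budget, and the `summarize` callback are illustrative assumptions. The idea it demonstrates: once the history exceeds a budget, older turns are folded into a single compact summary entry while recent turns are kept verbatim.

```python
# Conceptual sketch only -- NOT the service's algorithm. OCI Generative AI
# compacts conversation history server-side; this just illustrates the idea.

def compact_history(turns, max_turns=3,
                    summarize=lambda ts: "summary of: " + "; ".join(ts)):
    """Keep the most recent `max_turns` turns verbatim; replace all
    older turns with one summary entry produced by `summarize`."""
    if len(turns) <= max_turns:
        return list(turns)  # under budget: nothing to compact
    older, recent = turns[:-max_turns], turns[-max_turns:]
    return [summarize(older)] + recent

turns = [
    "I'm planning a team offsite.",
    "Within a two-hour drive from San Francisco.",
    "Avoid destinations that are crowded on weekends.",
    "Recommend three offsite options.",
]

# With a budget of 2, the two oldest turns collapse into one summary
# entry, so the compacted history holds 3 entries instead of 4.
compacted = compact_history(turns, max_turns=2)
```

A real compaction step would use a model-generated summary rather than string concatenation, but the shape is the same: the preferences stated in early turns survive in condensed form, so later responses can still honor them at a fraction of the token cost.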