OpenAI’s most advanced model response generation interface with multimodal input, tools, and stateful conversations
The /v1/responses endpoint is OpenAI’s most advanced model response generation interface, supporting richer interactive capabilities and tool integration. It follows the OpenAI Responses API format and provides enhanced features beyond the standard chat completions endpoint.
What is the difference between /v1/chat/completions and /v1/responses? The /v1/responses endpoint is OpenAI’s more advanced interface, offering stateful multi-turn conversations, built-in and custom tools, streaming, and background processing.
Use /v1/chat/completions for standard chat interactions with most models. Use /v1/responses when you need advanced features or are using Pro series models.
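As a starting point, a minimal request body for this endpoint needs only a model and an input. The sketch below builds such a body; the model name is illustrative, and the cURL command in the comment assumes the standard OpenAI base URL.

```python
import json

# Minimal /v1/responses request body: a model plus plain-text input.
# The model name is illustrative; substitute one your account can access.
payload = {
    "model": "gpt-4.1",
    "input": "Write a one-sentence summary of the Responses API.",
}

# Sending it is a plain HTTPS POST with Bearer authentication, e.g.:
#   curl https://api.openai.com/v1/responses \
#     -H "Authorization: Bearer $OPENAI_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d '{"model": "gpt-4.1", "input": "..."}'
body = json.dumps(payload)
print(body)
```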
Built-in tools are enabled by listing them in the tools array:
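A sketch of a request enabling one built-in tool, using the tool type names listed in the field descriptions below (web_search_preview, file_search, code_interpreter); availability of each tool may vary by model.

```python
import json

# Request body with a built-in tool in the tools array.
payload = {
    "model": "gpt-4.1",  # illustrative model name
    "input": "What changed in the latest Python release?",
    "tools": [{"type": "web_search_preview"}],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
print(json.dumps(payload))
```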
Pass previous_response_id to create multi-turn conversations:
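A sketch of chaining two turns via previous_response_id. The response ID here is a placeholder; a real one is returned in the id field of the previous response.

```python
# First turn: an ordinary request with no conversation state.
first_turn = {
    "model": "gpt-4.1",  # illustrative model name
    "input": "Pick a random city.",
}

# Second turn: reference the id returned by the first response so the
# model sees the earlier exchange as context.
second_turn = {
    "model": "gpt-4.1",
    "input": "What country is it in?",
    "previous_response_id": "resp_abc123",  # placeholder ID
}
```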
Alternatively, use the conversation parameter to manage conversation state automatically.
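The conversation-based alternative can be sketched as below. The conversation ID is a placeholder, and per the field descriptions that follow, conversation and previous_response_id are mutually exclusive, so only one appears in the body.

```python
# Attach the request to a stored conversation; the server prepends that
# conversation's items to the input automatically.
payload = {
    "model": "gpt-4.1",  # illustrative model name
    "input": "Continue where we left off.",
    "conversation": "conv_abc123",  # placeholder conversation ID
}

# conversation cannot be combined with previous_response_id.
assert "previous_response_id" not in payload
```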
Custom function calls are declared in the same tools array:
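A sketch of a custom function declaration, assuming the Responses API function-tool shape with name and parameters at the top level of the tool object; the function name and schema are invented for illustration.

```python
import json

# A hypothetical function tool declared alongside any built-in tools.
payload = {
    "model": "gpt-4.1",  # illustrative model name
    "input": "What's the weather in Paris?",
    "tools": [
        {
            "type": "function",
            "name": "get_weather",  # hypothetical function
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
}
print(json.dumps(payload))
```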
Set stream: true to enable Server-Sent Events streaming:
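With streaming enabled, the endpoint answers with Server-Sent Events instead of a single JSON body. The parser below is a generic SSE sketch, not tied to a specific client library, and the sample event line is hand-written rather than captured from a live call.

```python
# Streaming request body.
payload = {
    "model": "gpt-4.1",  # illustrative model name
    "input": "Stream a haiku.",
    "stream": True,
}

def parse_sse_line(line: str):
    """Return the data portion of one SSE line, or None for other lines."""
    if line.startswith("data: "):
        return line[len("data: "):]
    return None

# Hand-written example of what one event line might look like on the wire.
sample = 'data: {"type": "response.output_text.delta", "delta": "Hi"}'
print(parse_sse_line(sample))
```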
Bearer token authentication. Include your API key in the Authorization header as 'Bearer YOUR_API_KEY'
Request object for the Responses API. Supports multimodal input, tools, function calling, and stateful conversations.
ID of the model used to generate the response. For OpenAI Pro series models, use this endpoint instead of /v1/chat/completions.
"gpt-4.1"
Input parameters containing roles and message content. Can be a simple string or an array of input items for multimodal inputs (text, images, files).
A system (or developer) message inserted into the model's context. When used along with previous_response_id, the instructions from a previous response will not be carried over to the next response.
An array of tools the model may call while generating a response. Supports built-in tools (web_search_preview, file_search, code_interpreter), MCP tools, and custom function calls.
How the model should select which tool (or tools) to use when generating a response. auto lets the model decide, none disables tools, required forces tool use.
auto, none, required
Whether to enable streaming responses. When set to true, the response will be returned as Server-Sent Events (SSE).
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
0 <= x <= 2
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
0 <= x <= 1
An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.
x >= 1
The unique ID of the previous response to the model. Use this to create multi-turn conversations. Cannot be used in conjunction with conversation.
The conversation that this response belongs to. Items from this conversation are prepended to input_items for this response request.
Configuration options for reasoning models (gpt-5 and o-series models only).
Whether to run the model response in the background. When set to true, the API will return immediately and process the response asynchronously.
Specify additional output data to include in the model response. Currently supported values include web_search_call.action.sources, code_interpreter_call.outputs, computer_call_output.output.image_url, file_search_call.results, message.input_image.image_url, message.output_text.logprobs, and reasoning.encrypted_content.
web_search_call.action.sources, code_interpreter_call.outputs, computer_call_output.output.image_url, file_search_call.results, message.input_image.image_url, message.output_text.logprobs, reasoning.encrypted_content
The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts by the model to call a tool will be ignored.
x >= 1
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
Output types that you would like the model to generate for this request. text is the default.
text, audio
Whether to allow the model to run tool calls in parallel. When enabled, the model can make multiple tool calls simultaneously.
Reference to a prompt template and its variables. Learn more about reusable prompts.
Used by OpenAI to cache responses for similar requests to optimize your cache hit rates. Replaces the user field. Learn more about prompt caching.
A stable identifier used to help detect users of your application that may be violating OpenAI's usage policies. The IDs should be a string that uniquely identifies each user. We recommend hashing their username or email address, in order to avoid sending us any identifying information.
Whether or not to store the output of this response request for use in our model distillation or evals products. Learn more about conversation state.
Configuration options for a text response from the model. Can be plain text or structured JSON data.
The truncation strategy to use for the model response. auto: If the input exceeds the model's context window size, the model will truncate the response by dropping items from the beginning of the conversation. disabled (default): If the input size will exceed the context window size for a model, the request will fail with a 400 error.
auto, disabled
Successful response generation
Response object from the Responses API. Contains the generated response, which may include text, tool calls, and other structured content.
Model ID used to generate the response, like gpt-4o or o3. OpenAI offers a wide range of models with different capabilities, performance characteristics, and price points.
The generated response. Can be a simple string or a structured object with items containing messages, tool calls, reasoning steps, and other content.
Text, image, or file inputs to the model, used to generate a response. This reflects the input that was sent in the request.
Token usage statistics for the request
Unique identifier for this response. Can be used with previous_response_id for multi-turn conversations.
Unix timestamp (in seconds) when the response was created
Metadata associated with the response, if provided in the request
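The response fields above can be consumed as plain JSON. The sketch below reads them from a hand-written sample object shaped to match this field list (the exact field names and nesting in a live response may differ; the id, timestamp, and usage values are placeholders).

```python
# Hand-written sample matching the documented response fields.
response = {
    "id": "resp_abc123",    # usable as previous_response_id in a later turn
    "model": "gpt-4.1",
    "created": 1700000000,  # Unix timestamp in seconds
    "output": "Hello!",
    "usage": {"input_tokens": 5, "output_tokens": 2, "total_tokens": 7},
    "metadata": {},
}

# Pull out the fields a client typically needs.
next_turn_anchor = response["id"]
total_tokens = response["usage"]["total_tokens"]
```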