OpenAI’s most advanced model response generation interface with multimodal input, tools, and stateful conversations
The /v1/responses endpoint is OpenAI’s most advanced model response generation interface, supporting richer interactive capabilities and tool integration. It follows the OpenAI Responses API format and provides enhanced features beyond the standard chat completions endpoint.
What is the difference between /v1/chat/completions and /v1/responses? The /v1/responses endpoint is OpenAI’s more advanced interface, offering stateful multi-turn conversations, built-in and custom tools, streaming, and background processing.
Use /v1/chat/completions for standard chat interactions with most models. Use /v1/responses when you need advanced features or are using Pro series models.
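As a starting point, a minimal request body for this endpoint needs only a model and an input. The sketch below builds such a body; the model name is illustrative, and the cURL command in the comment assumes the standard OpenAI base URL.

```python
import json

# Minimal /v1/responses request body: a model plus plain-text input.
# The model name is illustrative; substitute one your account can access.
payload = {
    "model": "gpt-4.1",
    "input": "Write a one-sentence summary of the Responses API.",
}

# Sending it is a plain HTTPS POST with Bearer authentication, e.g.:
#   curl https://api.openai.com/v1/responses \
#     -H "Authorization: Bearer $OPENAI_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d '{"model": "gpt-4.1", "input": "..."}'
body = json.dumps(payload)
print(body)
```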
Built-in tools are enabled by listing them in the tools array:
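A sketch of a request enabling one built-in tool, using the tool type names listed in the field descriptions below (web_search_preview, file_search, code_interpreter); availability of each tool may vary by model.

```python
import json

# Request body with a built-in tool in the tools array.
payload = {
    "model": "gpt-4.1",  # illustrative model name
    "input": "What changed in the latest Python release?",
    "tools": [{"type": "web_search_preview"}],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
print(json.dumps(payload))
```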
Pass previous_response_id to create multi-turn conversations:
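A sketch of chaining two turns via previous_response_id. The response ID here is a placeholder; a real one is returned in the id field of the previous response.

```python
# First turn: an ordinary request with no conversation state.
first_turn = {
    "model": "gpt-4.1",  # illustrative model name
    "input": "Pick a random city.",
}

# Second turn: reference the id returned by the first response so the
# model sees the earlier exchange as context.
second_turn = {
    "model": "gpt-4.1",
    "input": "What country is it in?",
    "previous_response_id": "resp_abc123",  # placeholder ID
}
```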
Alternatively, use the conversation parameter to manage conversation state automatically.
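The conversation-based alternative can be sketched as below. The conversation ID is a placeholder, and per the field descriptions that follow, conversation and previous_response_id are mutually exclusive, so only one appears in the body.

```python
# Attach the request to a stored conversation; the server prepends that
# conversation's items to the input automatically.
payload = {
    "model": "gpt-4.1",  # illustrative model name
    "input": "Continue where we left off.",
    "conversation": "conv_abc123",  # placeholder conversation ID
}

# conversation cannot be combined with previous_response_id.
assert "previous_response_id" not in payload
```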
Custom function calls are declared in the same tools array:
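A sketch of a custom function declaration, assuming the Responses API function-tool shape with name and parameters at the top level of the tool object; the function name and schema are invented for illustration.

```python
import json

# A hypothetical function tool declared alongside any built-in tools.
payload = {
    "model": "gpt-4.1",  # illustrative model name
    "input": "What's the weather in Paris?",
    "tools": [
        {
            "type": "function",
            "name": "get_weather",  # hypothetical function
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
}
print(json.dumps(payload))
```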
Set stream: true to enable Server-Sent Events streaming:
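With streaming enabled, the endpoint answers with Server-Sent Events instead of a single JSON body. The parser below is a generic SSE sketch, not tied to a specific client library, and the sample event line is hand-written rather than captured from a live call.

```python
# Streaming request body.
payload = {
    "model": "gpt-4.1",  # illustrative model name
    "input": "Stream a haiku.",
    "stream": True,
}

def parse_sse_line(line: str):
    """Return the data portion of one SSE line, or None for other lines."""
    if line.startswith("data: "):
        return line[len("data: "):]
    return None

# Hand-written example of what one event line might look like on the wire.
sample = 'data: {"type": "response.output_text.delta", "delta": "Hi"}'
print(parse_sse_line(sample))
```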
Bearer token authentication. Include your API key in the Authorization header as 'Bearer YOUR_API_KEY'
Request object for the Responses API. Supports multimodal input, tools, function calling, and stateful conversations.
ID of the model used to generate the response. For OpenAI Pro series models, use this endpoint instead of /v1/chat/completions.
"gpt-4.1"
Input parameters containing roles and message content. Can be a simple string or an array of input items for multimodal inputs (text, images, files).
A system (or developer) message inserted into the model's context. When used along with previous_response_id, the instructions from a previous response will not be carried over to the next response.
An array of tools the model may call while generating a response. Supports built-in tools (web_search_preview, file_search, code_interpreter), MCP tools, and custom function calls.
How the model should select which tool (or tools) to use when generating a response. auto lets the model decide, none disables tools, required forces tool use.
auto, none, required
Whether to enable streaming responses. When set to true, the response will be returned as Server-Sent Events (SSE).
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
0 <= x <= 2
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
0 <= x <= 1
An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.
x >= 1
The unique ID of the previous response to the model. Use this to create multi-turn conversations. Cannot be used in conjunction with conversation.
The conversation that this response belongs to. Items from this conversation are prepended to input_items for this response request.
Configuration options for reasoning models (gpt-5 and o-series models only).
Whether to run the model response in the background. When set to true, the API will return immediately and process the response asynchronously.
Specify additional output data to include in the model response. Currently supported values include web_search_call.action.sources, code_interpreter_call.outputs, computer_call_output.output.image_url, file_search_call.results, message.input_image.image_url, message.output_text.logprobs, and reasoning.encrypted_content.
web_search_call.action.sources, code_interpreter_call.outputs, computer_call_output.output.image_url, file_search_call.results, message.input_image.image_url, message.output_text.logprobs, reasoning.encrypted_content
The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts by the model to call a tool will be ignored.
x >= 1
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
Output types that you would like the model to generate for this request. text is the default.
text, audio
Whether to allow the model to run tool calls in parallel. When enabled, the model can make multiple tool calls simultaneously.
Reference to a prompt template and its variables. Learn more about reusable prompts.
Used by OpenAI to cache responses for similar requests to optimize your cache hit rates. Replaces the user field. Learn more about prompt caching.
A stable identifier used to help detect users of your application that may be violating OpenAI's usage policies. The IDs should be a string that uniquely identifies each user. We recommend hashing their username or email address, in order to avoid sending us any identifying information.
Whether or not to store the output of this response request for use in our model distillation or evals products. Learn more about conversation state.
Configuration options for a text response from the model. Can be plain text or structured JSON data.
The truncation strategy to use for the model response. auto: If the input exceeds the model's context window size, the model will truncate the response by dropping items from the beginning of the conversation. disabled (default): If the input size will exceed the context window size for a model, the request will fail with a 400 error.
auto, disabled
Successful response generation
Response object from the Responses API. Contains the generated response, which may include text, tool calls, and other structured content.
Model ID used to generate the response, like gpt-4o or o3. OpenAI offers a wide range of models with different capabilities, performance characteristics, and price points.
The generated response. Can be a simple string or a structured object with items containing messages, tool calls, reasoning steps, and other content.
Text, image, or file inputs to the model, used to generate a response. This reflects the input that was sent in the request.
Token usage statistics for the request
Unique identifier for this response. Can be used with previous_response_id for multi-turn conversations.
Unix timestamp (in seconds) when the response was created
Metadata associated with the response, if provided in the request
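The response fields above can be consumed as plain JSON. The sketch below reads them from a hand-written sample object shaped to match this field list (the exact field names and nesting in a live response may differ; the id, timestamp, and usage values are placeholders).

```python
# Hand-written sample matching the documented response fields.
response = {
    "id": "resp_abc123",    # usable as previous_response_id in a later turn
    "model": "gpt-4.1",
    "created": 1700000000,  # Unix timestamp in seconds
    "output": "Hello!",
    "usage": {"input_tokens": 5, "output_tokens": 2, "total_tokens": 7},
    "metadata": {},
}

# Pull out the fields a client typically needs.
next_turn_anchor = response["id"]
total_tokens = response["usage"]["total_tokens"]
```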