POST /v1/responses
curl --request POST \
  --url https://wisdom-gate.juheapi.com/v1/responses \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gpt-4.1",
  "input": "Tell me a three sentence bedtime story about a unicorn."
}
'
{
  "model": "gpt-4.1",
  "response": "Once upon a time, in a magical forest far away, there lived a beautiful unicorn with a shimmering silver horn. Every evening, as the stars began to twinkle, the unicorn would trot through the enchanted meadows, leaving a trail of sparkling stardust behind. When the moon rose high in the sky, the gentle unicorn would curl up beneath the ancient oak tree, its dreams filled with rainbows and the sweet songs of forest creatures."
}

Overview

responses is OpenAI's most advanced interface for generating model responses, supporting richer interaction capabilities and tool integrations. This endpoint follows the OpenAI Responses API format and provides enhanced capabilities beyond the standard chat completions endpoint.

Core features

  • Multimodal input: accepts text, image, and file inputs
  • Text output: generates high-quality text responses
  • Stateful interactions: use the output of a previous response as input to the next, keeping the conversation coherent
  • Built-in tools: integrated file search, web search, code interpreter, and more
  • Function calling: lets the model reach external systems and data sources
  • Streaming support: real-time streaming via Server-Sent Events (SSE)
  • Reasoning models: reasoning configuration for gpt-5 and o-series models

Important notes

Model differences: Different model providers may support different request parameters and return different response fields. We strongly recommend consulting the model catalog for each model's full parameter list and usage notes.
Response pass-through: Apart from format conversion, Wisdom Gate generally does not modify model responses, so you receive content consistent with the original API provider.
When to use the Responses API: Use the /v1/responses endpoint with OpenAI Pro series models (such as o3-pro and o3-mini), and whenever you need advanced features such as built-in tools, multimodal input, or stateful conversations. For standard chat completions, use /v1/chat/completions.
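The curl request at the top of this page can be reproduced from Python with only the standard library; a minimal sketch, assuming your key is available in a WISDOM_GATE_API_KEY environment variable:

```python
import json
import os
import urllib.request

RESPONSES_URL = "https://wisdom-gate.juheapi.com/v1/responses"

def build_request(model: str, input_text: str) -> urllib.request.Request:
    """Build a POST request for the /v1/responses endpoint."""
    payload = json.dumps({"model": model, "input": input_text}).encode("utf-8")
    return urllib.request.Request(
        RESPONSES_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {os.environ.get('WISDOM_GATE_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

req = build_request("gpt-4.1", "Tell me a three sentence bedtime story about a unicorn.")
# with urllib.request.urlopen(req) as resp:   # uncomment to actually send the request
#     print(json.load(resp)["response"])
```

The OpenAI SDKs work as well, since this endpoint follows the Responses API format; only the base URL and API key need to change.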

Reference documentation

For more details on the responses interface, see the official OpenAI documentation and related guides.
Auto-generated documentation: The request parameters and response formats below are generated automatically from the OpenAPI specification. All parameters, together with their types, descriptions, defaults, and examples, are extracted directly from openapi.json. Scroll down for the interactive API reference.

Frequently asked questions

What is the difference between /v1/chat/completions and /v1/responses?

The /v1/responses endpoint is OpenAI's more advanced interface, providing:
  • Built-in tools: web search, file search, code interpreter
  • Multimodal input: images and files in addition to text
  • Stateful conversations: better conversation state management
  • Required for Pro models: OpenAI Pro series models (o3-pro, o3-mini) must use this endpoint
For standard chat interactions with most models, use /v1/chat/completions. Use /v1/responses when you need the advanced features or the Pro series models.

How do I use multimodal input (text + images)?

You can combine text and images in a single request (the examples below share this client, pointed at the Wisdom Gate endpoint):
from openai import OpenAI

client = OpenAI(
    base_url="https://wisdom-gate.juheapi.com/v1",
    api_key="YOUR_API_KEY",
)

response = client.responses.create(
    model="gpt-4.1",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": "What is in this image?"
                },
                {
                    "type": "input_image",
                    "image_url": "https://example.com/image.jpg"
                }
            ]
        }
    ]
)

How do I use built-in tools such as web search?

Enable a built-in tool by including it in the tools array:
response = client.responses.create(
    model="gpt-4.1",
    input="What's some positive news today?",
    tools=[
        {"type": "web_search_preview"}
    ]
)

How do I maintain conversation state?

Use previous_response_id to create multi-turn conversations:
# First message
response1 = client.responses.create(
    model="gpt-4.1",
    input="Hi, my name is Alice."
)

# Follow-up message
response2 = client.responses.create(
    model="gpt-4.1",
    input="What is my name?",
    previous_response_id=response1.id
)
Alternatively, use the conversation parameter to manage conversation state automatically.
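Since previous_response_id and conversation are mutually exclusive, a small helper can guard against mixing them when building requests by hand; a sketch (the conversation id below is hypothetical):

```python
import json

def responses_payload(model, input_text, conversation=None, previous_response_id=None):
    """Build a /v1/responses payload; conversation and previous_response_id
    are mutually exclusive ways to carry conversation state."""
    if conversation is not None and previous_response_id is not None:
        raise ValueError("conversation cannot be combined with previous_response_id")
    payload = {"model": model, "input": input_text}
    if conversation is not None:
        payload["conversation"] = conversation
    if previous_response_id is not None:
        payload["previous_response_id"] = previous_response_id
    return json.dumps(payload)

# Turn 2 of a conversation whose state is managed server-side:
body = responses_payload("gpt-4.1", "What is my name?", conversation="conv_abc123")
```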

How do I use function calling?

Define custom functions and include them in the tools array:
response = client.responses.create(
    model="gpt-4.1",
    input="What's the weather in Boston today?",
    tools=[
        {
            "type": "function",
            "name": "get_current_weather",
            "description": "Get the current weather for a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location", "unit"]
            }
        }
    ],
    tool_choice="auto"
)
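When the model responds with a function_call item, you run the function yourself and return its result in a follow-up request. A sketch of that follow-up payload, based on the Responses function-calling flow documented by OpenAI (the ids and result values below are hypothetical):

```python
import json

def function_call_followup(model, previous_response_id, call_id, result):
    """Build the follow-up request that returns a tool result to the model.
    `call_id` comes from the function_call item in the previous response."""
    return {
        "model": model,
        "previous_response_id": previous_response_id,
        "input": [
            {
                "type": "function_call_output",
                "call_id": call_id,
                "output": json.dumps(result),
            }
        ],
    }

followup = function_call_followup(
    "gpt-4.1", "resp_123", "call_456", {"temperature": 18, "unit": "celsius"}
)
```

Sending this payload to /v1/responses lets the model compose its final answer from the tool result.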

How do I use reasoning models (o3, gpt-5)?

For reasoning models, you can configure the reasoning effort:
response = client.responses.create(
    model="o3-mini",
    input="How much wood could a woodchuck chuck?",
    reasoning={
        "effort": "high"  # options: minimal, low, medium, high
    }
)
Higher effort values produce deeper reasoning, but may take longer and use more tokens.

How do I enable streaming?

Set stream: true to receive the response as Server-Sent Events:
stream = client.responses.create(
    model="gpt-4.1",
    input="Tell me a story",
    stream=True
)

for event in stream:
    # Print only text deltas; other event types carry lifecycle metadata
    if event.type == "response.output_text.delta":
        print(event.delta, end="")

Authorization

Authorization
string
header
Required

Bearer token authentication. Include your API key in the Authorization header as 'Bearer YOUR_API_KEY'

Request body

application/json

Request object for the Responses API. Supports multimodal input, tools, function calling, and stateful conversations.

model
string
Default: o3-pro
Required

ID of the model to use. Specifies the model used to generate the completion message. For OpenAI Pro series models, use this endpoint instead of chat/completions.

Example:

"gpt-4.1"

input
Required

Input parameters containing roles and message content. Can be a simple string or an array of input items for multimodal inputs (text, images, files).

instructions
string

A system (or developer) message inserted into the model's context. When used along with previous_response_id, the instructions from a previous response will not be carried over to the next response.

tools
(object | string)[]

An array of tools the model may call while generating a response. Supports built-in tools (web_search_preview, file_search, code_interpreter), MCP tools, and custom function calls.

tool_choice
enum<string>

How the model should select which tool (or tools) to use when generating a response. auto lets the model decide, none disables tools, required forces tool use.

Available options:
auto,
none,
required
stream
boolean
Default: false

Whether to enable streaming response. When set to true, the response will be returned as Server-Sent Events (SSE).

temperature
number
Default: 1

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

Required range: 0 <= x <= 2
top_p
number
Default: 1

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.

Required range: 0 <= x <= 1
max_output_tokens
integer

An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.

Required range: x >= 1
previous_response_id
string

The unique ID of the previous response to the model. Use this to create multi-turn conversations. Cannot be used in conjunction with conversation.

conversation
string

The conversation that this response belongs to. Items from this conversation are prepended to input_items for this response request.

reasoning
object

Configuration options for reasoning models (gpt-5 and o-series models only).

background
boolean
Default: false

Whether to run the model response in the background. When set to true, the API will return immediately and process the response asynchronously.

include
enum<string>[]

Specify additional output data to include in the model response. Currently supported values include web_search_call.action.sources, code_interpreter_call.outputs, computer_call_output.output.image_url, file_search_call.results, message.input_image.image_url, message.output_text.logprobs, and reasoning.encrypted_content.

Available options:
web_search_call.action.sources,
code_interpreter_call.outputs,
computer_call_output.output.image_url,
file_search_call.results,
message.input_image.image_url,
message.output_text.logprobs,
reasoning.encrypted_content
max_tool_calls
integer

The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.

Required range: x >= 1
metadata
object

Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
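The metadata limits described above can be checked client-side before a request is sent; a minimal sketch:

```python
def validate_metadata(metadata: dict) -> dict:
    """Enforce the documented metadata limits: at most 16 key-value pairs,
    keys up to 64 characters, string values up to 512 characters."""
    if len(metadata) > 16:
        raise ValueError("metadata supports at most 16 key-value pairs")
    for key, value in metadata.items():
        if len(key) > 64:
            raise ValueError(f"metadata key too long: {key!r}")
        if len(str(value)) > 512:
            raise ValueError(f"metadata value too long for key {key!r}")
    return metadata

ok = validate_metadata({"ticket": "T-1042", "team": "support"})
```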

modalities
enum<string>[]

Output types that you would like the model to generate for this request. text is the default.

Available options:
text,
audio
parallel_tool_calls
boolean
Default: false

Whether to allow the model to run tool calls in parallel. When enabled, the model can make multiple tool calls simultaneously.

prompt
object

Reference to a prompt template and its variables. Learn more about reusable prompts.

prompt_cache_key
string

Used by OpenAI to cache responses for similar requests to optimize your cache hit rates. Replaces the user field. Learn more about prompt caching.

safety_identifier
string

A stable identifier used to help detect users of your application that may be violating OpenAI's usage policies. The IDs should be a string that uniquely identifies each user. We recommend hashing their username or email address, in order to avoid sending us any identifying information.

store
boolean
Default: true

Whether or not to store the output of this response request for use in our model distillation or evals products. Learn more about conversation state.

text
object

Configuration options for a text response from the model. Can be plain text or structured JSON data.

truncation
enum<string>
Default: disabled

The truncation strategy to use for the model response. auto: If the input exceeds the model's context window size, the model will truncate the response by dropping items from the beginning of the conversation. disabled (default): If the input size will exceed the context window size for a model, the request will fail with a 400 error.

Available options:
auto,
disabled

Response

Successful response generation

Response object from the Responses API. Contains the generated response, which may include text, tool calls, and other structured content.

model
string
Required

Model ID used to generate the response, like gpt-4o or o3. OpenAI offers a wide range of models with different capabilities, performance characteristics, and price points.

response
Required

The generated response. Can be a simple string or a structured object with items containing messages, tool calls, reasoning steps, and other content.

input
string

Text, image, or file inputs to the model, used to generate a response. This reflects the input that was sent in the request.

usage
object

Token usage statistics for the request

id
string

Unique identifier for this response. Can be used with previous_response_id for multi-turn conversations.

created
integer

Unix timestamp (in seconds) when the response was created

metadata
object

Metadata associated with the response, if provided in the request