Create chat completions using an OpenAI-compatible API format, with support for multiple AI models.
chat/completions is the most commonly used API endpoint for large language models. It takes a list of conversation messages as input and returns the model's response. The endpoint follows the OpenAI Chat Completions API format, so it integrates easily with existing OpenAI-compatible code.
For further details on the chat/completions endpoint and related guides, we recommend the official OpenAI documentation.
If you receive a `429 Too Many Requests` response, we recommend retrying with exponential backoff:
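A minimal sketch of such a retry loop, using Python's `requests` library; the endpoint URL and API key below are placeholders, not values from this documentation:

```python
import time
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}       # placeholder key

def create_completion_with_retry(payload, max_retries=5):
    """POST to chat/completions, retrying 429 responses with exponential backoff."""
    for attempt in range(max_retries):
        response = requests.post(API_URL, headers=HEADERS, json=payload, timeout=30)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # Wait 1s, 2s, 4s, 8s, ... before retrying a rate-limited request
        time.sleep(2 ** attempt)
    raise RuntimeError(f"Still rate limited after {max_retries} retries")
```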
Include the full conversation history in the `messages` array:
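For example (the roles and message structure follow the standard Chat Completions format; the model name is illustrative):

```python
# Each request carries the full dialogue so the model has context;
# the server does not remember previous requests.
payload = {
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
        {"role": "user", "content": "What is its population?"},  # follow-up relies on history
    ],
}
```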
The `finish_reason` field in each choice indicates why generation stopped:

| Value | Meaning |
|---|---|
| `stop` | Natural completion |
| `length` | Reached the `max_tokens` limit |
| `content_filter` | A content filter was triggered |
| `function_call` | The model called a function |
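As a short sketch, you can check this field to detect truncated output (assuming `data` holds a parsed non-streaming response, as described in the response fields below):

```python
choice = data["choices"][0]
if choice["finish_reason"] == "length":
    # Output was cut off by max_tokens; consider raising the limit
    # or asking the model to continue.
    print("Warning: response was truncated")
print(choice["message"]["content"])
```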
Use `max_tokens` to limit output length, and check the `usage` field for token consumption. Set `stream: true` to enable streaming:
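A sketch of consuming the SSE stream with `requests`; the chunk structure assumed here is the standard `chat.completion.chunk` format, and the URL and key are placeholders as before:

```python
import json
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}       # placeholder key

payload = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
}

with requests.post(API_URL, headers=HEADERS, json=payload, stream=True) as resp:
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue  # skip keep-alives and non-data lines
        data = line[len(b"data: "):]
        if data == b"[DONE]":  # the server signals end-of-stream
            break
        chunk = json.loads(data)
        # Streaming chunks carry an incremental "delta" instead of a full message
        print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
```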
The endpoint accepts the following request fields:

- `Authorization` (header): Bearer token authentication. Include your API key in the Authorization header as `Bearer YOUR_API_KEY`.
- `model`: ID of the model to use for generating responses. See the model catalog for available models and which ones work with the Chat API. Example: `"gpt-4"`.
- `messages`: A list of messages comprising the conversation so far. Each message includes a role (`system`, `user`, or `assistant`) and content (the message text).
- `temperature` (number, 0 to 2, default 1): Controls the randomness of responses. Lower values (e.g., 0.2) make the output more deterministic and focused, while higher values (e.g., 1.8) make it more random and creative. Adjusting both `temperature` and `top_p` simultaneously is not recommended.
- `top_p` (number, 0 to 1): An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens comprising the top `top_p` probability mass; 0.1 means only the tokens in the top 10% probability mass are considered. Adjusting both `temperature` and `top_p` simultaneously is not recommended.
- `n` (integer, 1 to 128): How many chat completion choices to generate for each input message.
- `stream` (boolean): Whether to enable streaming. When set to `true`, the response is returned in chunks as Server-Sent Events (SSE): tokens are sent as data-only server-sent events as they become available, and the stream is terminated by a `data: [DONE]` message. This is useful for real-time chat applications.
- `stop`: Up to 4 sequences where the API will stop generating further tokens.
- `max_tokens` (integer, >= 1): Limits the maximum number of tokens to generate. The total length of input tokens and generated tokens is limited by the model's context length.
- `presence_penalty` (number, -2.0 to 2.0): Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
- `frequency_penalty` (number, -2.0 to 2.0): Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
- `logit_bias`: Modifies the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to a bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling.
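Putting the header and body fields together, an end-to-end request might look like the sketch below; the endpoint URL, API key, and model name are placeholders:

```python
import requests

response = requests.post(
    "https://api.example.com/v1/chat/completions",  # placeholder endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Say hello."}],
        "temperature": 0.7,   # slightly more focused than the default of 1
        "max_tokens": 256,    # cap the generated length
        "n": 1,
    },
    timeout=30,
)
response.raise_for_status()
data = response.json()
print(data["choices"][0]["message"]["content"])
```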
A successful response returns a chat completion object containing the generated response, metadata, and token usage information:

- `id`: A unique identifier for the chat completion.
- `object`: The object type; always `chat.completion` for non-streaming responses, or `chat.completion.chunk` for streaming responses.
- `created`: The Unix timestamp (in seconds) of when the chat completion was created.
- `model`: The model used for the chat completion.
- `choices`: A list of chat completion choices; can contain more than one entry if `n` is greater than 1. Each choice contains the generated message and finish reason.
- `usage`: Token usage statistics for the request.
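A short sketch of reading these fields from a parsed response, assuming the standard breakdown of `usage` into `prompt_tokens`, `completion_tokens`, and `total_tokens` (not spelled out above):

```python
# 'data' is the parsed JSON body of a successful, non-streaming response
print(data["id"])              # unique completion identifier
print(data["object"])          # "chat.completion" for non-streaming responses
print(data["created"])         # Unix timestamp in seconds
print(data["model"])           # model that produced the reply
for choice in data["choices"]: # one entry per requested choice (n)
    print(choice["finish_reason"])
    print(choice["message"]["content"])
usage = data["usage"]
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])
```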