POST /v1/chat/completions
curl --request POST \
  --url https://wisdom-gate.juheapi.com/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "Hello! How can you help me?"
    }
  ]
}
'
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I can help you with a wide variety of tasks. I can answer questions, provide explanations, help with coding, writing, analysis, and much more. What would you like to know or work on?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  }
}

Overview

chat/completions is the most commonly used API endpoint for large language models. It takes a conversation, expressed as a list of messages, as input and returns the model's response. The endpoint follows the OpenAI Chat Completions API format, so it integrates easily with existing OpenAI-compatible code.

Important Notes

Model differences: Different model providers may support different request parameters and return different response fields. We strongly recommend consulting the model catalog for each model's full parameter list and usage notes.
Response pass-through: Wisdom Gate generally does not modify model responses beyond format adaptation, so the content you receive is consistent with the original API provider's.
Streaming support: Wisdom Gate supports Server-Sent Events (SSE) for streaming responses. Set "stream": true in your request to enable real-time streaming, which is useful for chat applications.

Reference Documentation

For more details on the chat/completions endpoint, we recommend the official OpenAI documentation and related OpenAI guides.
Auto-generated documentation: The request parameters and response formats below are generated automatically from the OpenAPI specification. All parameters, with their types, descriptions, defaults, and examples, are extracted directly from openapi.json. Scroll down for the interactive API reference.

FAQ

How do I handle rate limits?

When you receive 429 Too Many Requests, we recommend retrying with exponential backoff:
import time
import random

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://wisdom-gate.juheapi.com/v1",
    api_key="YOUR_API_KEY",
)

def chat_with_retry(messages, max_retries=3):
    for i in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4",
                messages=messages
            )
            return response
        except RateLimitError:
            if i < max_retries - 1:
                # Exponential backoff with random jitter
                wait_time = (2 ** i) + random.random()
                time.sleep(wait_time)
            else:
                raise

How do I maintain conversation context?

Include the full conversation history in the messages array:
conversation_history = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a programming language..."},
    {"role": "user", "content": "What are its advantages?"}
]

response = client.chat.completions.create(
    model="gpt-4",
    messages=conversation_history
)
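After each reply, append the assistant's message (and the user's next turn) back onto the history before the following request, or the model will not remember earlier turns. A minimal sketch of that bookkeeping; the reply string below is a stand-in for response.choices[0].message.content:

```python
def add_turn(history, assistant_reply, next_user_message):
    """Append the assistant's reply and the user's next message
    to the conversation history, returning the updated list."""
    history.append({"role": "assistant", "content": assistant_reply})
    history.append({"role": "user", "content": next_user_message})
    return history

history = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is Python?"},
]
# In a real application the reply comes from response.choices[0].message.content.
history = add_turn(history, "Python is a programming language...",
                   "What are its advantages?")
```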

What does finish_reason mean?

Value            Meaning
stop             The model finished naturally
length           The max_tokens limit was reached
content_filter   A content filter was triggered
function_call    The model called a function
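It is worth branching on finish_reason before using a reply, since length usually means the output was cut off. A sketch against the response shape shown earlier; the dict literal stands in for a parsed API response:

```python
def check_finish(response: dict) -> str:
    """Return the completion text, flagging truncated or filtered output."""
    choice = response["choices"][0]
    reason = choice["finish_reason"]
    if reason == "length":
        return choice["message"]["content"] + " [truncated: raise max_tokens]"
    if reason == "content_filter":
        return "[response removed by content filter]"
    return choice["message"]["content"]

sample = {
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "Hello!"},
         "finish_reason": "stop"}
    ]
}
print(check_finish(sample))  # → Hello!
```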

How do I control costs?

  1. Use max_tokens to cap output length
  2. Pick an appropriate model (e.g., GPT-3.5 Turbo is cheaper than GPT-4)
  3. Keep prompts concise and avoid redundant context
  4. Monitor token consumption via the usage field in responses
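The usage field makes the last step easy to automate. A sketch that estimates spend from usage; the per-1K-token prices here are placeholders for illustration, not Wisdom Gate's actual rates:

```python
# Hypothetical per-1K-token prices in USD; check your provider's pricing page.
PRICES = {"gpt-4": {"prompt": 0.03, "completion": 0.06}}

def estimate_cost(model: str, usage: dict) -> float:
    """Estimate request cost from the response's usage field."""
    p = PRICES[model]
    return (usage["prompt_tokens"] / 1000 * p["prompt"]
            + usage["completion_tokens"] / 1000 * p["completion"])

# The usage values match the example response shown at the top of this page.
usage = {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30}
print(f"${estimate_cost('gpt-4', usage):.6f}")  # → $0.001500
```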

How do I use streaming?

Enable streaming by setting stream: true:
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Authorization

Authorization
string
header
Required

Bearer token authentication. Include your API key in the Authorization header as 'Bearer YOUR_API_KEY'

Request Body

application/json
model
string
Required

ID of the model to use for generating responses. See the model catalog for available models and which models work with the Chat API.

Example:

"gpt-4"

messages
object[]
Required

A list of messages comprising the conversation so far. Each message should include a role (system, user, or assistant) and content (the message text).

Minimum array length: 1
temperature
number
Default: 1

Controls the randomness of responses, range 0-2. Lower values (e.g., 0.2) make the output more deterministic and focused, while higher values (e.g., 1.8) make it more random and creative. It's not recommended to adjust both temperature and top_p simultaneously.

Valid range: 0 <= x <= 2
top_p
number
Default: 1

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. It's not recommended to adjust both temperature and top_p simultaneously.

Valid range: 0 <= x <= 1
n
integer
Default: 1

How many chat completion choices to generate for each input message. Range: 1-128.

Valid range: 1 <= x <= 128
stream
boolean
Default: false

Whether to enable a streaming response. When set to true, the response is returned in chunks as Server-Sent Events (SSE): tokens are sent as data-only server-sent events as they become available, and the stream is terminated by a data: [DONE] message. This is useful for real-time chat applications.
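The data-only SSE framing described above can also be consumed without an SDK. A minimal sketch of the line protocol, run here against canned lines rather than a live HTTP response (the chunk shape assumes the standard Chat Completions streaming format):

```python
import json

def iter_sse_content(lines):
    """Yield content deltas from data-only SSE lines, stopping at [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments, blank keep-alive lines, etc.
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # the stream's terminating sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            yield delta["content"]

canned = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_content(canned)))  # → Hello
```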

stop

Up to 4 sequences where the API will stop generating further tokens.
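For instance, to cut generation at a blank line or a custom delimiter, the request body (sketched here as a Python dict; the sequences are arbitrary examples) would carry:

```python
# Generation halts before emitting either sequence; up to 4 are allowed.
payload = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "List three fruits, one per line."}],
    "stop": ["\n\n", "END"],
}
```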

max_tokens
integer

Limits the maximum number of tokens to generate. The total length of input tokens and generated tokens is limited by the model's context length.

Valid range: x >= 1
presence_penalty
number
Default: 0

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

Valid range: -2 <= x <= 2
frequency_penalty
number
Default: 0

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

Valid range: -2 <= x <= 2
logit_bias
object

Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling.
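For example, to discourage a specific token, map its tokenizer ID to a negative bias. The token IDs below are made up for illustration; look up real IDs with your model's tokenizer:

```python
# Hypothetical token IDs. A bias of -100 effectively bans a token,
# +100 effectively forces it, and small values nudge the sampling.
payload = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Name a color."}],
    "logit_bias": {
        "1234": -100,  # hypothetical ID: ban this token
        "5678": 5,     # hypothetical ID: mildly encourage this token
    },
}
assert all(-100 <= v <= 100 for v in payload["logit_bias"].values())
```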

Response

Successful chat completion response

Response object for chat completion. Contains the generated response, metadata, and token usage information.

id
string
Required

A unique identifier for the chat completion

object
enum<string>
Required

The object type, which is always 'chat.completion' for non-streaming responses, or 'chat.completion.chunk' for streaming responses

Available options:
chat.completion,
chat.completion.chunk
created
integer
Required

The Unix timestamp (in seconds) of when the chat completion was created

model
string
Required

The model used for the chat completion

choices
object[]
Required

A list of chat completion choices. Can be more than one if n is greater than 1. Each choice contains the generated message and finish reason.

usage
object

Token usage statistics for the request