Create chat completions using an OpenAI-compatible API format, with support for multiple AI models.
chat/completions is the most commonly used API endpoint for large language models. It takes a list of conversation messages as input and returns the model's response. The endpoint follows the OpenAI Chat Completions API format, so it integrates easily with existing OpenAI-compatible code.
For further details on the chat/completions endpoint and related guides, we recommend the official OpenAI documentation.
If you receive a `429 Too Many Requests` response, we recommend retrying with exponential backoff:
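A minimal sketch of such a retry loop, using Python's `requests` library; the endpoint URL and API key below are placeholders, not values from this documentation:

```python
import time
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}       # placeholder key

def create_completion_with_retry(payload, max_retries=5):
    """POST to chat/completions, retrying 429 responses with exponential backoff."""
    for attempt in range(max_retries):
        response = requests.post(API_URL, headers=HEADERS, json=payload, timeout=30)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # Wait 1s, 2s, 4s, 8s, ... before retrying a rate-limited request
        time.sleep(2 ** attempt)
    raise RuntimeError(f"Still rate limited after {max_retries} retries")
```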
Include the full conversation history in the `messages` array:
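For example (the roles and message structure follow the standard Chat Completions format; the model name is illustrative):

```python
# Each request carries the full dialogue so the model has context;
# the server does not remember previous requests.
payload = {
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
        {"role": "user", "content": "What is its population?"},  # follow-up relies on history
    ],
}
```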
The `finish_reason` field in each choice indicates why generation stopped:

| Value | Meaning |
|---|---|
| `stop` | Natural completion |
| `length` | Reached the `max_tokens` limit |
| `content_filter` | A content filter was triggered |
| `function_call` | The model called a function |
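As a short sketch, you can check this field to detect truncated output (assuming `data` holds a parsed non-streaming response, as described in the response fields below):

```python
choice = data["choices"][0]
if choice["finish_reason"] == "length":
    # Output was cut off by max_tokens; consider raising the limit
    # or asking the model to continue.
    print("Warning: response was truncated")
print(choice["message"]["content"])
```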
Use `max_tokens` to limit output length, and check the `usage` field for token consumption. Set `stream: true` to enable streaming:
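A sketch of consuming the SSE stream with `requests`; the chunk structure assumed here is the standard `chat.completion.chunk` format, and the URL and key are placeholders as before:

```python
import json
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}       # placeholder key

payload = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
}

with requests.post(API_URL, headers=HEADERS, json=payload, stream=True) as resp:
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue  # skip keep-alives and non-data lines
        data = line[len(b"data: "):]
        if data == b"[DONE]":  # the server signals end-of-stream
            break
        chunk = json.loads(data)
        # Streaming chunks carry an incremental "delta" instead of a full message
        print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
```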
The endpoint accepts the following request fields:

- `Authorization` (header): Bearer token authentication. Include your API key in the Authorization header as `Bearer YOUR_API_KEY`.
- `model`: ID of the model to use for generating responses. See the model catalog for available models and which ones work with the Chat API. Example: `"gpt-4"`.
- `messages`: A list of messages comprising the conversation so far. Each message includes a role (`system`, `user`, or `assistant`) and content (the message text).
- `temperature` (number, 0 to 2, default 1): Controls the randomness of responses. Lower values (e.g., 0.2) make the output more deterministic and focused, while higher values (e.g., 1.8) make it more random and creative. Adjusting both `temperature` and `top_p` simultaneously is not recommended.
- `top_p` (number, 0 to 1): An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens comprising the top `top_p` probability mass; 0.1 means only the tokens in the top 10% probability mass are considered. Adjusting both `temperature` and `top_p` simultaneously is not recommended.
- `n` (integer, 1 to 128): How many chat completion choices to generate for each input message.
- `stream` (boolean): Whether to enable streaming. When set to `true`, the response is returned in chunks as Server-Sent Events (SSE): tokens are sent as data-only server-sent events as they become available, and the stream is terminated by a `data: [DONE]` message. This is useful for real-time chat applications.
- `stop`: Up to 4 sequences where the API will stop generating further tokens.
- `max_tokens` (integer, >= 1): Limits the maximum number of tokens to generate. The total length of input tokens and generated tokens is limited by the model's context length.
- `presence_penalty` (number, -2.0 to 2.0): Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
- `frequency_penalty` (number, -2.0 to 2.0): Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
- `logit_bias`: Modifies the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to a bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling.
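Putting the header and body fields together, an end-to-end request might look like the sketch below; the endpoint URL, API key, and model name are placeholders:

```python
import requests

response = requests.post(
    "https://api.example.com/v1/chat/completions",  # placeholder endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Say hello."}],
        "temperature": 0.7,   # slightly more focused than the default of 1
        "max_tokens": 256,    # cap the generated length
        "n": 1,
    },
    timeout=30,
)
response.raise_for_status()
data = response.json()
print(data["choices"][0]["message"]["content"])
```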
A successful response returns a chat completion object containing the generated response, metadata, and token usage information:

- `id`: A unique identifier for the chat completion.
- `object`: The object type; always `chat.completion` for non-streaming responses, or `chat.completion.chunk` for streaming responses.
- `created`: The Unix timestamp (in seconds) of when the chat completion was created.
- `model`: The model used for the chat completion.
- `choices`: A list of chat completion choices; can contain more than one entry if `n` is greater than 1. Each choice contains the generated message and finish reason.
- `usage`: Token usage statistics for the request.
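A short sketch of reading these fields from a parsed response, assuming the standard breakdown of `usage` into `prompt_tokens`, `completion_tokens`, and `total_tokens` (not spelled out above):

```python
# 'data' is the parsed JSON body of a successful, non-streaming response
print(data["id"])              # unique completion identifier
print(data["object"])          # "chat.completion" for non-streaming responses
print(data["created"])         # Unix timestamp in seconds
print(data["model"])           # model that produced the reply
for choice in data["choices"]: # one entry per requested choice (n)
    print(choice["finish_reason"])
    print(choice["message"]["content"])
usage = data["usage"]
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])
```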