POST /v1/responses
curl --request POST \
  --url https://wisdom-gate.juheapi.com/v1/responses \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gpt-4.1",
  "input": "Tell me a three sentence bedtime story about a unicorn."
}
'
{
  "model": "gpt-4.1",
  "response": "Once upon a time, in a magical forest far away, there lived a beautiful unicorn with a shimmering silver horn. Every evening, as the stars began to twinkle, the unicorn would trot through the enchanted meadows, leaving a trail of sparkling stardust behind. When the moon rose high in the sky, the gentle unicorn would curl up beneath the ancient oak tree, its dreams filled with rainbows and the sweet songs of forest creatures."
}

Overview

The /v1/responses endpoint implements OpenAI’s most advanced interface for generating model responses, supporting richer interaction and tool integration. It follows the OpenAI Responses API format and provides features beyond the standard chat completions endpoint.

Core Features

  • Multimodal Input: Supports text, image, and file inputs
  • Text Output: Generates high-quality text responses
  • Stateful Interaction: Uses outputs from previous responses as subsequent inputs, maintaining conversation coherence
  • Built-in Tools: Integrates file search, web search, code interpreter, and other functions
  • Function Calling: Allows models to access external systems and data sources
  • Streaming Support: Real-time streaming responses via Server-Sent Events (SSE)
  • Reasoning Models: Supports reasoning configuration for gpt-5 and o-series models

Important Notes

Model Variations
Different model providers may support different request parameters and return different response fields. We strongly recommend consulting the model catalog for the complete parameter list and usage instructions for each model.

Response Pass-through Principle
Wisdom Gate typically passes model responses through unmodified (apart from reverse-engineered formats), so the content you receive is consistent with the original API provider.

When to Use the Responses API
Use the /v1/responses endpoint for OpenAI Pro series models (like o3-pro and o3-mini) and when you need advanced features such as built-in tools, multimodal inputs, or stateful conversations. For standard chat completions, use /v1/chat/completions.

Auto-Generated Documentation
The request parameters and response format are automatically generated from the OpenAPI specification. All parameters, their types, descriptions, defaults, and examples are pulled directly from openapi.json. Scroll down to see the interactive API reference.

FAQ

What’s the difference between /v1/chat/completions and /v1/responses?

The /v1/responses endpoint is OpenAI’s more advanced interface that offers:
  • Built-in tools: Web search, file search, code interpreter
  • Multimodal inputs: Support for images and files in addition to text
  • Stateful conversations: Better conversation state management
  • Required for Pro models: OpenAI Pro series models (o3-pro, o3-mini) must use this endpoint
Use /v1/chat/completions for standard chat interactions with most models. Use /v1/responses when you need advanced features or are using Pro series models.
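If you are not using an SDK, the request is a plain JSON POST, as in the curl example at the top of this page. A minimal sketch of constructing (but not sending) that request with the Python standard library:

```python
import json
import urllib.request

# Build the same POST request shown in the curl example above.
payload = {
    "model": "gpt-4.1",
    "input": "Tell me a three sentence bedtime story about a unicorn.",
}
req = urllib.request.Request(
    "https://wisdom-gate.juheapi.com/v1/responses",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",  # replace with your real key
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; the response JSON includes
# a "response" field with the generated text.
```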

How do I use multimodal inputs (text + images)?

You can combine text and images in a single request:
from openai import OpenAI

# Point the official OpenAI SDK at the Wisdom Gate base URL
client = OpenAI(
    base_url="https://wisdom-gate.juheapi.com/v1",
    api_key="YOUR_API_KEY",
)

response = client.responses.create(
    model="gpt-4.1",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": "What is in this image?"
                },
                {
                    "type": "input_image",
                    "image_url": "https://example.com/image.jpg"
                }
            ]
        }
    ]
)
Enable built-in tools by including them in the tools array:
response = client.responses.create(
    model="gpt-4.1",
    input="What was a positive news story from today?",
    tools=[
        {"type": "web_search_preview"}
    ]
)

How do I maintain conversation state?

Use previous_response_id to create multi-turn conversations:
# First message
response1 = client.responses.create(
    model="gpt-4.1",
    input="Hello, my name is Alice."
)

# Follow-up message
response2 = client.responses.create(
    model="gpt-4.1",
    input="What's my name?",
    previous_response_id=response1.id
)
Alternatively, use the conversation parameter to manage conversation state automatically.
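As a sketch, a request body using the conversation parameter might look like the following (the conversation ID shown is illustrative; see the conversation field in the Body reference for details):

```python
import json

# Note: conversation cannot be combined with previous_response_id.
payload = {
    "model": "gpt-4.1",
    "input": "What's my name?",
    "conversation": "conv_abc123",  # illustrative conversation ID
}
body = json.dumps(payload)  # JSON body to POST to /v1/responses
```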

How do I use function calling?

Define custom functions and include them in the tools array:
response = client.responses.create(
    model="gpt-4.1",
    input="What is the weather like in Boston today?",
    tools=[
        {
            "type": "function",
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location", "unit"]
            }
        }
    ],
    tool_choice="auto"
)
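When the model decides to call your function, the response output contains a function_call item whose arguments arrive as a JSON-encoded string. A minimal handling sketch, assuming the item shape follows the OpenAI Responses format (the values shown are illustrative):

```python
import json

# Example function_call item as it might appear in the response output.
item = {
    "type": "function_call",
    "name": "get_current_weather",
    "call_id": "call_123",  # illustrative ID
    "arguments": '{"location": "Boston, MA", "unit": "fahrenheit"}',
}

if item["type"] == "function_call" and item["name"] == "get_current_weather":
    args = json.loads(item["arguments"])  # arguments is a JSON string
    # Call your real implementation here, e.g. get_current_weather(**args);
    # this stub stands in for it.
    result = {"location": args["location"], "temperature": 55, "unit": args["unit"]}
```

You would then send `result` back to the model in a follow-up request so it can produce the final answer.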

How do I use reasoning models (o3, gpt-5)?

For reasoning models, you can configure reasoning effort:
response = client.responses.create(
    model="o3-mini",
    input="How much wood would a woodchuck chuck?",
    reasoning={
        "effort": "high"  # Options: minimal, low, medium, high
    }
)
Higher effort values result in more thorough reasoning but may take longer and use more tokens.

How do I enable streaming?

Set stream: true to enable Server-Sent Events streaming:
stream = client.responses.create(
    model="gpt-4.1",
    input="Tell me a story",
    stream=True
)

for event in stream:
    # Text deltas arrive as response.output_text.delta events
    if event.type == "response.output_text.delta":
        print(event.delta, end="")

Authorizations

Authorization
string
header
required

Bearer token authentication. Include your API key in the Authorization header as 'Bearer YOUR_API_KEY'

Body

application/json

Request object for the Responses API. Supports multimodal input, tools, function calling, and stateful conversations.

model
string
default:o3-pro
required

ID of the model to use. Specifies the model used to generate the completion message. For OpenAI Pro series models, use this endpoint instead of chat/completions.

Example:

"gpt-4.1"

input
required

Input parameters containing roles and message content. Can be a simple string or an array of input items for multimodal inputs (text, images, files).

instructions
string

A system (or developer) message inserted into the model's context. When used along with previous_response_id, the instructions from a previous response will not be carried over to the next response.

tools
(object | string)[]

An array of tools the model may call while generating a response. Supports built-in tools (web_search_preview, file_search, code_interpreter), MCP tools, and custom function calls.

tool_choice
enum<string>

How the model should select which tool (or tools) to use when generating a response. auto lets the model decide, none disables tools, required forces tool use.

Available options:
auto,
none,
required
stream
boolean
default:false

Whether to enable streaming response. When set to true, the response will be returned as Server-Sent Events (SSE).

temperature
number
default:1

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

Required range: 0 <= x <= 2
top_p
number
default:1

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.

Required range: 0 <= x <= 1
max_output_tokens
integer

An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.

Required range: x >= 1
previous_response_id
string

The unique ID of the previous response to the model. Use this to create multi-turn conversations. Cannot be used in conjunction with conversation.

conversation
string

The conversation that this response belongs to. Items from this conversation are prepended to input_items for this response request.

reasoning
object

Configuration options for reasoning models (gpt-5 and o-series models only).

background
boolean
default:false

Whether to run the model response in the background. When set to true, the API will return immediately and process the response asynchronously.

include
enum<string>[]

Specify additional output data to include in the model response. Currently supported values include web_search_call.action.sources, code_interpreter_call.outputs, computer_call_output.output.image_url, file_search_call.results, message.input_image.image_url, message.output_text.logprobs, and reasoning.encrypted_content.

Available options:
web_search_call.action.sources,
code_interpreter_call.outputs,
computer_call_output.output.image_url,
file_search_call.results,
message.input_image.image_url,
message.output_text.logprobs,
reasoning.encrypted_content
max_tool_calls
integer

The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.

Required range: x >= 1
metadata
object

Set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.

modalities
enum<string>[]

Output types that you would like the model to generate for this request. text is the default.

Available options:
text,
audio
parallel_tool_calls
boolean
default:false

Whether to allow the model to run tool calls in parallel. When enabled, the model can make multiple tool calls simultaneously.

prompt
object

Reference to a prompt template and its variables. Learn more about reusable prompts.

prompt_cache_key
string

Used by OpenAI to cache responses for similar requests to optimize your cache hit rates. Replaces the user field. Learn more about prompt caching.

safety_identifier
string

A stable identifier used to help detect users of your application that may be violating OpenAI's usage policies. The IDs should be a string that uniquely identifies each user. We recommend hashing their username or email address, in order to avoid sending us any identifying information.

store
boolean
default:true

Whether or not to store the output of this response request for use in our model distillation or evals products. Learn more about conversation state.

text
object

Configuration options for a text response from the model. Can be plain text or structured JSON data.

truncation
enum<string>
default:disabled

The truncation strategy to use for the model response. auto: If the input exceeds the model's context window size, the model will truncate the response by dropping items from the beginning of the conversation. disabled (default): If the input size will exceed the context window size for a model, the request will fail with a 400 error.

Available options:
auto,
disabled

Response

Successful response generation

Response object from the Responses API. Contains the generated response, which may include text, tool calls, and other structured content.

model
string
required

Model ID used to generate the response, like gpt-4o or o3. OpenAI offers a wide range of models with different capabilities, performance characteristics, and price points.

response
required

The generated response. Can be a simple string or a structured object with items containing messages, tool calls, reasoning steps, and other content.

input
string

Text, image, or file inputs to the model, used to generate a response. This reflects the input that was sent in the request.

usage
object

Token usage statistics for the request

id
string

Unique identifier for this response. Can be used with previous_response_id for multi-turn conversations.

created
integer

Unix timestamp (in seconds) when the response was created

metadata
object

Metadata associated with the response, if provided in the request