POST /v1beta/models/{model}:{operator}
curl --request POST \
  --url https://wisdom-gate.juheapi.com/v1beta/models/{model}:{operator} \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "contents": [
    {
      "parts": [
        {
          "text": "How does AI work?"
        }
      ]
    }
  ]
}
'
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "AI, or artificial intelligence, works by using algorithms and data to enable machines to learn from experience, adapt to new inputs, and perform tasks that typically require human intelligence."
          }
        ]
      },
      "finishReason": "STOP",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 5,
    "candidatesTokenCount": 25,
    "totalTokenCount": 30
  }
}

Overview

The Gemini API endpoint allows you to generate content using Google’s Gemini models in their native format. This endpoint follows the official Gemini API specification, making it easy to integrate with existing Gemini-compatible code.
Latest News: gemini-3-pro-preview is now supported!

Quick Start

To use this endpoint, simply replace the base URL and API key in the official SDK or in your HTTP requests:
  • Base URL: https://wisdom-gate.juheapi.com (replace generativelanguage.googleapis.com)
  • API Key: Replace $GEMINI_API_KEY with your $WISDOM_GATE_KEY

Basic Example: Text Generation

curl "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-preview:generateContent" \
  -H "x-goog-api-key: $WISDOM_GATE_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "contents": [
      {
        "parts": [
          {
            "text": "How does AI work?"
          }
        ]
      }
    ]
  }'
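The same request can be issued from Python with the requests library. This is a minimal sketch: the `build_payload` and `generate_content` helper names are our own, and `WISDOM_GATE_KEY` is assumed to be set in the environment.

```python
import os
import requests

BASE_URL = "https://wisdom-gate.juheapi.com/v1beta/models"

def build_payload(prompt: str) -> dict:
    """Minimal generateContent request body for a single text prompt."""
    return {"contents": [{"parts": [{"text": prompt}]}]}

def generate_content(model: str, prompt: str) -> dict:
    """POST to the generateContent operator and return the parsed JSON body."""
    resp = requests.post(
        f"{BASE_URL}/{model}:generateContent",
        headers={
            "x-goog-api-key": os.environ["WISDOM_GATE_KEY"],
            "Content-Type": "application/json",
        },
        json=build_payload(prompt),
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```

For example, `generate_content("gemini-3-pro-preview", "How does AI work?")` mirrors the curl call above.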

Important Notes

Model Differences

Different Gemini model versions may support different request parameters and return different response fields. We strongly recommend consulting the model catalog for complete parameter lists and usage instructions for each model.

Response Pass-through Principle

Wisdom Gate does not modify model response content apart from the format conversion required by the proxy, ensuring you receive responses consistent with the original Gemini API provider.

Streaming Support

Wisdom Gate supports Server-Sent Events (SSE) for streaming responses. Use the streamGenerateContent operator with the ?alt=sse parameter to enable real-time streaming, which is useful for chat applications.

Auto-Generated Documentation

The request parameters and response format are automatically generated from the OpenAPI specification. All parameters, their types, descriptions, defaults, and examples are pulled directly from openapi.json. Scroll down to see the interactive API reference.

FAQ

How to control Thinking?

Gemini models support a “thinking” process to improve reasoning capabilities. The control method depends on the model version. For details, please refer to the official documentation: Gemini Thinking Guide

Gemini 3 Series (e.g., gemini-3-pro-preview)

Use the thinkingLevel parameter to control thinking intensity ("LOW" or "HIGH").
curl "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-preview:generateContent" \
  -H "x-goog-api-key: $WISDOM_GATE_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "contents": [{ "parts": [{ "text": "Explain quantum physics simply." }] }],
    "generationConfig": {
      "thinkingConfig": {
        "thinkingLevel": "LOW"
      }
    }
  }'

Gemini 2.5 Series (e.g., gemini-2.5-pro)

Use the thinkingBudget parameter to control the token budget for thinking.
  • 0: Disable thinking.
  • -1: Dynamic thinking (the model decides automatically; default).
  • > 0: Set a specific token limit (e.g., 1024).
curl "https://wisdom-gate.juheapi.com/v1beta/models/gemini-2.5-pro:generateContent" \
  -H "x-goog-api-key: $WISDOM_GATE_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "contents": [{ "parts": [{ "text": "Solve this logic puzzle." }] }],
    "generationConfig": {
      "thinkingConfig": {
        "thinkingBudget": 1024
      }
    }
  }'
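Since the two model families take different thinking parameters, a small dispatch helper can build the right generationConfig fragment. This is an illustrative sketch (the `thinking_config` helper is our own, not part of any SDK):

```python
from typing import Optional

def thinking_config(model: str, level: Optional[str] = None,
                    budget: Optional[int] = None) -> dict:
    """Return the thinking control for the given model family.

    Gemini 3 models use thinkingLevel ("LOW" or "HIGH"); Gemini 2.5 models
    use thinkingBudget (0 disables, -1 lets the model decide, >0 caps tokens).
    """
    if model.startswith("gemini-3"):
        return {"thinkingConfig": {"thinkingLevel": level or "HIGH"}}
    return {"thinkingConfig": {"thinkingBudget": -1 if budget is None else budget}}
```

The returned dict is merged into the request's generationConfig.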

How to use Streaming Responses?

Streaming responses allow you to receive results incrementally as the model generates content, reducing perceived latency. For details, please refer to the official documentation: Gemini Text Generation - Streaming Responses.

Note: The URL must point to the streamGenerateContent operator, and it is recommended to add ?alt=sse to use the Server-Sent Events format.
curl "https://wisdom-gate.juheapi.com/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse" \
  -H "x-goog-api-key: $WISDOM_GATE_KEY" \
  -H 'Content-Type: application/json' \
  --no-buffer \
  -d '{
    "contents": [
      {
        "parts": [
          {
            "text": "Explain how AI works"
          }
        ]
      }
    ]
  }'
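In Python, the SSE stream can be consumed with requests by reading the response line by line; each event arrives as a "data: {...}" line carrying a partial candidates payload. A minimal sketch (the helper names are our own):

```python
import json
import requests

def parse_sse_line(line: str):
    """Yield text fragments from one 'data: {...}' SSE line; skip other lines."""
    if not line or not line.startswith("data: "):
        return
    chunk = json.loads(line[len("data: "):])
    for candidate in chunk.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            if "text" in part:
                yield part["text"]

def stream_text(url: str, api_key: str, payload: dict):
    """Stream a streamGenerateContent?alt=sse response, yielding text deltas."""
    with requests.post(
        url,
        headers={"x-goog-api-key": api_key, "Content-Type": "application/json"},
        json=payload,
        stream=True,
        timeout=300,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines(decode_unicode=True):
            yield from parse_sse_line(line)
```

Iterating `stream_text(...)` then prints fragments as they arrive instead of waiting for the full response.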

How to maintain conversation context?

Include the complete conversation history in the contents array:
import os
import requests

# Full history: "user" and "model" turns alternate in the contents array.
conversation = [
    {
        "role": "user",
        "parts": [{"text": "What is Python?"}]
    },
    {
        "role": "model",
        "parts": [{"text": "Python is a programming language..."}]
    },
    {
        "role": "user",
        "parts": [{"text": "What are its advantages?"}]
    }
]

response = requests.post(
    "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-preview:generateContent",
    headers={
        "x-goog-api-key": os.environ["WISDOM_GATE_KEY"],
        "Content-Type": "application/json"
    },
    json={"contents": conversation}
)
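For multi-turn chat, you then append the model's reply and the next user message to the history before the following request. Two small helpers (names are illustrative) make this explicit:

```python
def extract_text(response: dict) -> str:
    """First candidate's text from a generateContent response body."""
    return response["candidates"][0]["content"]["parts"][0]["text"]

def add_turn(conversation: list, role: str, text: str) -> list:
    """Append a 'user' or 'model' turn to the history and return it."""
    conversation.append({"role": role, "parts": [{"text": text}]})
    return conversation
```

For example, call `add_turn(conversation, "model", extract_text(response.json()))` before appending the next user turn.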

What does finishReason mean?

The finishReason field in the response indicates why the model stopped generating:
  Value        Meaning
  STOP         Natural completion
  MAX_TOKENS   Reached the maxOutputTokens limit
  SAFETY       Triggered a safety filter
  RECITATION   Detected recitation of training data
  OTHER        Other reason
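A response handler can branch on this field. The sketch below is our own convention (not prescribed by the API): it raises on safety blocks and flags truncation.

```python
def check_finish(response: dict) -> str:
    """Return the first candidate's finishReason, raising on safety blocks."""
    reason = response["candidates"][0].get("finishReason", "OTHER")
    if reason == "SAFETY":
        raise RuntimeError("response blocked by safety filter")
    if reason == "MAX_TOKENS":
        print("warning: output truncated at maxOutputTokens")
    return reason
```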

How to control costs?

  1. Use maxOutputTokens in generationConfig to limit output length
  2. Choose appropriate models (e.g., gemini-2.5-flash is more economical than gemini-3-pro-preview)
  3. Streamline prompts, avoid redundant context
  4. Monitor token consumption in the usageMetadata field of responses
  5. Use thinking budgets wisely for reasoning models to control reasoning token usage
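Points 1 and 4 above can be combined in a few lines: cap the output in generationConfig and read actual consumption back from usageMetadata. The helper names are illustrative:

```python
def capped_payload(prompt: str, max_tokens: int = 256) -> dict:
    """Request body with an output-length cap to bound per-call cost."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"maxOutputTokens": max_tokens},
    }

def total_tokens(response: dict) -> int:
    """Total billed tokens for a response, per its usageMetadata."""
    return response.get("usageMetadata", {}).get("totalTokenCount", 0)
```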

How to use multimodal input (text and images)?

Gemini supports multimodal input through the parts array. You can include both text and images in a single request:
data = {
    "contents": [
        {
            "parts": [
                {"text": "What is in this image?"},
                {
                    "inlineData": {
                        "mimeType": "image/jpeg",
                        "data": "base64_encoded_image_data_here"
                    }
                }
            ]
        }
    ]
}
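The data field must contain the base64-encoded image bytes. A small helper (illustrative, not part of any SDK) turns raw bytes into an inlineData part:

```python
import base64

def image_part(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes as a base64-encoded inlineData part."""
    return {
        "inlineData": {
            "mimeType": mime_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        }
    }
```

For example, `image_part(open("photo.jpg", "rb").read())` produces a part you can place alongside a text part in the same parts array.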

Authorizations

Authorization (string, header, required)
Bearer token authentication. Include your API key in the Authorization header as 'Bearer YOUR_API_KEY'.

Path Parameters

model (string, required)
The model identifier (e.g., 'gemini-pro', 'gemini-pro-vision').

operator (string, required)
The operation to perform. Use 'generateContent' for standard requests, or 'streamGenerateContent?alt=sse' for streaming responses in Server-Sent Events format.

Body (application/json)

contents (object[], required)
Array of content parts that make up the conversation.

systemInstruction (object)
System instruction to guide the model's behavior.

generationConfig (object)
Configuration for content generation.

safetySettings (object[])
Safety settings for content filtering.

Response

Successful content generation response.

candidates (object[], required)
Array of generated content candidates.

usageMetadata (object)
Token usage statistics for the request.

promptFeedback (object)
Feedback about the prompt, including safety ratings.