Overview
The Gemini API endpoint allows you to generate content using Google’s Gemini models in their native format. This endpoint follows the official Gemini API specification, making it easy to integrate with existing Gemini-compatible code.
Latest News: gemini-3-pro-preview is now supported!
Quick Start
To get started, simply replace the Base URL and API Key in the official SDK or in your HTTP requests:
- Base URL: `https://wisdom-gate.juheapi.com` (replaces `generativelanguage.googleapis.com`)
- API Key: replace `$GEMINI_API_KEY` with your `$WISDOM_GATE_KEY`
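For example, with the official google-genai Python SDK the override can be applied through HTTP options (a minimal sketch, assuming a recent SDK version where HttpOptions accepts base_url, and that WISDOM_GATE_KEY is set in your environment):

```python
import os

from google import genai
from google.genai import types

# Point the official SDK at Wisdom Gate instead of generativelanguage.googleapis.com
client = genai.Client(
    api_key=os.environ["WISDOM_GATE_KEY"],
    http_options=types.HttpOptions(base_url="https://wisdom-gate.juheapi.com"),
)

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="How does AI work?",
)
print(response.text)
```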
Basic Example: Text Generation
curl "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-preview:generateContent" \
-H "x-goog-api-key: $WISDOM_GATE_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"contents": [
{
"parts": [
{
"text": "How does AI work?"
}
]
}
]
}'
Important Notes
Model Differences
Different Gemini model versions may support different request parameters and return different response fields. We strongly recommend consulting the model catalog for the complete parameter list and usage instructions for each model.

Response Pass-through Principle
Wisdom Gate typically does not modify model responses beyond any format conversion required by the proxy, so you receive response content consistent with the original Gemini API provider.

Streaming Support
Wisdom Gate supports Server-Sent Events (SSE) for streaming responses. Use the streamGenerateContent operation with the ?alt=sse parameter to enable real-time streaming, which is useful for chat applications.

Auto-Generated Documentation
The request parameters and response format are generated automatically from the OpenAPI specification. All parameters, along with their types, descriptions, defaults, and examples, are pulled directly from openapi.json. Scroll down to see the interactive API reference.
FAQ
How to control Thinking?
Gemini models support a “thinking” process to improve reasoning capabilities. The control method depends on the model version.
For details, please refer to the official documentation: Gemini Thinking Guide
Gemini 3 Series (e.g., gemini-3-pro-preview)
Use the thinkingLevel parameter to control thinking intensity ("LOW" or "HIGH").
curl "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-preview:generateContent" \
-H "x-goog-api-key: $WISDOM_GATE_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"contents": [{ "parts": [{ "text": "Explain quantum physics simply." }] }],
"generationConfig": {
"thinkingConfig": {
"thinkingLevel": "LOW"
}
}
}'
Gemini 2.5 Series (e.g., gemini-2.5-pro)
Use the thinkingBudget parameter to control the token budget for thinking:
- `0`: disable thinking.
- `-1`: dynamic thinking (the model decides automatically; this is the default).
- `> 0`: a specific token limit (e.g., 1024).
curl "https://wisdom-gate.juheapi.com/v1beta/models/gemini-2.5-pro:generateContent" \
-H "x-goog-api-key: $WISDOM_GATE_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"contents": [{ "parts": [{ "text": "Solve this logic puzzle." }] }],
"generationConfig": {
"thinkingConfig": {
"thinkingBudget": 1024
}
}
}'
How to use Streaming Responses?
Streaming responses allow you to receive results incrementally as the model generates content, reducing perceived latency.
For details, please refer to the official documentation: Gemini Text Generation - Streaming Responses
Note: the URL must point to the streamGenerateContent operation, and adding the ?alt=sse parameter is recommended so responses use the Server-Sent Events format.
curl "https://wisdom-gate.juheapi.com/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse" \
-H "x-goog-api-key: $WISDOM_GATE_KEY" \
-H 'Content-Type: application/json' \
--no-buffer \
-d '{
"contents": [
{
"parts": [
{
"text": "Explain how AI works"
}
]
}
]
}'
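Equivalently in Python, the SSE stream can be consumed with the requests library (a sketch, assuming each `data:` line carries a JSON chunk shaped like the non-streaming response, and that WISDOM_GATE_KEY is set in your environment):

```python
import json
import os

import requests

url = ("https://wisdom-gate.juheapi.com/v1beta/models/"
       "gemini-2.5-flash:streamGenerateContent?alt=sse")
headers = {
    "x-goog-api-key": os.environ["WISDOM_GATE_KEY"],
    "Content-Type": "application/json",
}
body = {"contents": [{"parts": [{"text": "Explain how AI works"}]}]}

# stream=True keeps the connection open so chunks arrive as they are generated
with requests.post(url, headers=headers, json=body, stream=True) as resp:
    for line in resp.iter_lines():
        # SSE events arrive as lines prefixed with "data: "; skip blank keep-alives
        if not line.startswith(b"data: "):
            continue
        chunk = json.loads(line[len(b"data: "):])
        for part in chunk["candidates"][0]["content"]["parts"]:
            print(part.get("text", ""), end="", flush=True)
print()
```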
How to maintain conversation context?
Include the complete conversation history in the contents array:
```python
import os

import requests

# Each turn is a content object with a role ("user" or "model") and its parts
conversation = [
    {
        "role": "user",
        "parts": [{"text": "What is Python?"}]
    },
    {
        "role": "model",
        "parts": [{"text": "Python is a programming language..."}]
    },
    {
        "role": "user",
        "parts": [{"text": "What are its advantages?"}]
    }
]

response = requests.post(
    "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-preview:generateContent",
    headers={
        "x-goog-api-key": os.environ["WISDOM_GATE_KEY"],
        "Content-Type": "application/json"
    },
    json={"contents": conversation}
)
```
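To continue the dialogue, append the model's reply and the next user turn to the same array before the next call (a sketch, assuming the standard response shape):

```python
# Extract the model's reply and extend the history for the following request
reply = response.json()["candidates"][0]["content"]["parts"][0]["text"]
conversation.append({"role": "model", "parts": [{"text": reply}]})
conversation.append({"role": "user", "parts": [{"text": "Can you show an example?"}]})
# ...then POST again with json={"contents": conversation}
```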
What does finishReason mean?
The finishReason field in the response indicates why the model stopped generating:
| Value | Meaning |
|---|---|
| STOP | Natural completion |
| MAX_TOKENS | Reached the maxOutputTokens limit |
| SAFETY | Triggered a safety filter |
| RECITATION | Detected recitation of training data |
| OTHER | Other reason |
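For example, a truncated response can be detected and handled in code (a sketch, reusing the response object from the conversation example above):

```python
candidate = response.json()["candidates"][0]
if candidate.get("finishReason") == "MAX_TOKENS":
    # Output was cut off; consider raising maxOutputTokens or shortening the prompt
    print("Warning: output truncated at the token limit")
```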
How to control costs?
- Use `maxOutputTokens` in `generationConfig` to limit output length
- Choose appropriate models (e.g., `gemini-2.5-flash` is more economical than `gemini-3-pro-preview`)
- Streamline prompts and avoid redundant context
- Monitor token consumption via the `usageMetadata` field of responses (see the sketch below)
- For reasoning models, use thinking budgets wisely to control reasoning token usage
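For instance, usage can be logged after each request; promptTokenCount, candidatesTokenCount, and totalTokenCount are the usual usageMetadata fields:

```python
# Inspect token consumption reported by the API for this request
usage = response.json().get("usageMetadata", {})
print("prompt tokens:", usage.get("promptTokenCount"))
print("output tokens:", usage.get("candidatesTokenCount"))
print("total tokens:", usage.get("totalTokenCount"))
```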
How to use multimodal input (text and images)?
Gemini supports multimodal input through the parts array. You can include both text and images in a single request:
```python
data = {
    "contents": [
        {
            "parts": [
                {"text": "What is in this image?"},
                {
                    "inlineData": {
                        "mimeType": "image/jpeg",
                        "data": "base64_encoded_image_data_here"
                    }
                }
            ]
        }
    ]
}
```
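A complete request might encode a local file and send it along (a sketch; photo.jpg is a hypothetical file name, and WISDOM_GATE_KEY is assumed to be set in your environment):

```python
import base64
import os

import requests

# Base64-encode a local image for the inlineData part
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

data = {
    "contents": [
        {
            "parts": [
                {"text": "What is in this image?"},
                {"inlineData": {"mimeType": "image/jpeg", "data": image_b64}}
            ]
        }
    ]
}

response = requests.post(
    "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-preview:generateContent",
    headers={"x-goog-api-key": os.environ["WISDOM_GATE_KEY"]},
    json=data,
)
print(response.json()["candidates"][0]["content"]["parts"][0]["text"])
```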
API Reference
Authentication: Bearer token authentication; include your API key in the Authorization header as `Bearer YOUR_API_KEY`.

Path parameters:
- model: the model identifier (e.g., `gemini-pro`, `gemini-pro-vision`)
- operation: the operation to perform; use `generateContent` for standard requests, or `streamGenerateContent?alt=sse` for streaming responses in the Server-Sent Events format

Request body:
- contents: array of content parts that make up the conversation
- systemInstruction: system instruction to guide the model's behavior
- generationConfig: configuration for content generation
- safetySettings: safety settings for content filtering

Response (successful content generation):
- candidates: array of generated content candidates
- usageMetadata: token usage statistics for the request
- promptFeedback: feedback about the prompt, including safety ratings