POST /v1beta/models/{model}:{operator}
curl --request POST \
  --url https://wisdom-gate.juheapi.com/v1beta/models/{model}:{operator} \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "contents": [
    {
      "parts": [
        {
          "text": "How does AI work?"
        }
      ]
    }
  ]
}
'
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "AI, or artificial intelligence, works by using algorithms and data to enable machines to learn from experience, adapt to new inputs, and perform tasks that typically require human intelligence."
          }
        ]
      },
      "finishReason": "STOP",
      "index": 0
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 5,
    "candidatesTokenCount": 25,
    "totalTokenCount": 30
  }
}

Overview

The Gemini Image Generation API allows you to generate images from text descriptions using Google’s Gemini models. This endpoint supports flexible aspect ratios, multiple resolutions (up to 4K), and multi-turn image editing capabilities.
Latest News: gemini-3-pro-image-preview is now supported! Generate images up to 4K resolution.

Quick Start

To use this endpoint, replace the base URL and API key in the official SDK or in your own HTTP requests:
  • Base URL: https://wisdom-gate.juheapi.com (replace generativelanguage.googleapis.com)
  • API Key: Replace $GEMINI_API_KEY with your $WISDOM_GATE_KEY
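As a minimal sketch of the substitution described above, the following Python builds the request URL and headers against the Wisdom Gate base URL. The helper names `build_url` and `build_headers` are illustrative and not part of any SDK. The `x-goog-api-key` header follows the curl examples on this page, and the Bearer form follows the Authorizations section.

```python
BASE_URL = "https://wisdom-gate.juheapi.com"  # replaces generativelanguage.googleapis.com


def build_url(model: str, operator: str = "generateContent") -> str:
    """Return the endpoint URL in the form /v1beta/models/{model}:{operator}."""
    return f"{BASE_URL}/v1beta/models/{model}:{operator}"


def build_headers(api_key: str, use_bearer: bool = False) -> dict:
    """Return JSON request headers using x-goog-api-key or a Bearer token."""
    headers = {"Content-Type": "application/json"}
    if use_bearer:
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        headers["x-goog-api-key"] = api_key
    return headers
```

In practice the key would come from the environment, for example `build_headers(os.environ["WISDOM_GATE_KEY"])`.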

Basic Example: Generate Image

curl -s -X POST \
  "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $WISDOM_GATE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{
        "text": "Da Vinci style anatomical sketch of a dissected Monarch butterfly. Detailed drawings of the head, wings, and legs on textured parchment with notes in English."
      }]
    }],
    "tools": [{"google_search": {}}],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"],
      "imageConfig": {
        "aspectRatio": "1:1",
        "imageSize": "1K"
      }
    }
  }' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | head -1 | base64 --decode > butterfly.png

Image-to-Image Generation

You can send an input image along with a text prompt to generate a modified version of that image.
curl -s -X POST \
  "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $WISDOM_GATE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "role": "user",
      "parts": [
        { "text": "cat" },
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "BASE64_DATA_HERE"
          }
        }
      ]
    }],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"]
    }
  }'
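The `BASE64_DATA_HERE` placeholder above has to be filled with the base64 encoding of the input image. A minimal sketch of building the same request body in Python follows; `build_image_edit_payload` is a hypothetical helper name, and the default MIME type is an assumption that should match your actual file.

```python
import base64


def build_image_edit_payload(prompt: str, image_bytes: bytes,
                             mime_type: str = "image/jpeg") -> dict:
    """Build a generateContent body with a text prompt and one inline image."""
    # The API expects the image as a base64 string, not raw bytes.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [{
            "role": "user",
            "parts": [
                {"text": prompt},
                {"inline_data": {"mime_type": mime_type, "data": encoded}},
            ],
        }],
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }
```

To use it, read the image in binary mode (`open(path, "rb").read()`) and pass the resulting dict as the JSON body of the POST request.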

Multi-Image & Reference Generation

gemini-3-pro-image-preview accepts multiple images as input. You can mix up to 14 reference images in a single request, including:
  • Up to 6 images of objects, to include with high fidelity
  • Up to 5 images of humans, to maintain character consistency
curl -s -X POST \
  "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $WISDOM_GATE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "role": "user",
      "parts": [
        { "text": "An office group photo of these people, they are making funny faces." },
        { "inline_data": { "mime_type": "image/jpeg", "data": "BASE64_IMG_1" } },
        { "inline_data": { "mime_type": "image/jpeg", "data": "BASE64_IMG_2" } },
        { "inline_data": { "mime_type": "image/jpeg", "data": "BASE64_IMG_3" } },
        { "inline_data": { "mime_type": "image/jpeg", "data": "BASE64_IMG_4" } },
        { "inline_data": { "mime_type": "image/jpeg", "data": "BASE64_IMG_5" } }
      ]
    }],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"],
      "imageConfig": {
        "aspectRatio": "5:4",
        "imageSize": "1K"
      }
    }
  }'
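When the reference images come from files, the `parts` array can be assembled programmatically. The sketch below is illustrative only: `build_reference_parts` is a hypothetical helper, and it enforces only the overall 14-image limit stated above, not the per-category limits for objects and humans.

```python
import base64

MAX_REFERENCE_IMAGES = 14  # overall limit stated for gemini-3-pro-image-preview


def build_reference_parts(prompt: str, images: list,
                          mime_type: str = "image/jpeg") -> list:
    """Return a parts list: the text prompt followed by one inline image per entry."""
    if len(images) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"at most {MAX_REFERENCE_IMAGES} reference images are allowed, got {len(images)}"
        )
    parts = [{"text": prompt}]
    for image_bytes in images:
        encoded = base64.b64encode(image_bytes).decode("ascii")
        parts.append({"inline_data": {"mime_type": mime_type, "data": encoded}})
    return parts
```

The returned list goes into `contents[0].parts` of the request body shown above.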

Important Notes

Model Differences: Different Gemini image models may support different resolutions and features. gemini-3-pro-image-preview supports up to 4K resolution, while gemini-2.5-flash-image supports 1K and 2K resolutions. We strongly recommend consulting the model catalog for complete parameter lists and usage instructions for each model.
Response Pass-through Principle: Wisdom Gate typically does not modify model responses apart from format conversion, so the response content you receive is consistent with the original Gemini API provider.
Force Image Output: To ensure the model returns an image rather than a text-only response, set "responseModalities": ["IMAGE"] (without TEXT) in your request.

Auto-Generated Documentation: The request parameters and response format are automatically generated from the OpenAPI specification. All parameters, their types, descriptions, defaults, and examples are pulled directly from openapi.json. Scroll down to see the interactive API reference.

FAQ

What models support image generation?

Currently supported models:
  • gemini-3-pro-image-preview: Supports up to 4K resolution, multiple aspect ratios
  • gemini-2.5-flash-image: Supports 1K and 2K resolutions, flexible aspect ratios

How to configure aspect ratios?

Gemini 2.5 Flash Image supports multiple aspect ratios, which makes it easy to create content for different devices. All aspect ratios consume 1,290 tokens by default. Supported aspect ratios:
  • 1:1 - Square
  • 3:2 - Landscape
  • 2:3 - Portrait
  • 3:4 - Portrait
  • 4:3 - Landscape
  • 4:5 - Portrait
  • 5:4 - Landscape
  • 9:16 - Vertical (mobile)
  • 16:9 - Horizontal (widescreen)
  • 21:9 - Ultra-wide
curl -s -X POST \
  "https://wisdom-gate.juheapi.com/v1beta/models/gemini-2.5-flash-image:generateContent" \
  -H "x-goog-api-key: $WISDOM_GATE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{
        "text": "A beautiful sunset over mountains"
      }]
    }],
    "generationConfig": {
      "responseModalities": ["IMAGE"],
      "imageConfig": {
        "aspectRatio": "16:9"
      }
    }
  }'
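Because an unsupported ratio would only be detected by the API after the request is sent, it can help to validate the value client-side first. This is a sketch under the assumption that the list above is complete for your model; `build_generation_config` is a hypothetical helper name.

```python
# Aspect ratios listed above for Gemini 2.5 Flash Image
SUPPORTED_ASPECT_RATIOS = {
    "1:1", "3:2", "2:3", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
}


def build_generation_config(aspect_ratio: str) -> dict:
    """Return an image-only generationConfig, rejecting ratios not in the list."""
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio!r}")
    return {
        "responseModalities": ["IMAGE"],
        "imageConfig": {"aspectRatio": aspect_ratio},
    }
```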

How to do multi-turn image editing?

You can maintain conversation context across multiple turns to iteratively refine images:
# First turn: Generate initial image
curl -s -X POST \
  "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $WISDOM_GATE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "role": "user",
      "parts": [{
        "text": "Create a vibrant infographic that explains photosynthesis as if it were a recipe for a plants favorite food. Show the \"ingredients\" (sunlight, water, CO2) and the \"finished dish\" (sugar/energy). The style should be like a page from a colorful kids cookbook, suitable for a 4th grader."
      }]
    }],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"]
    }
  }' > turn1_response.json

# Extract image from first response
jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' turn1_response.json | head -1 | base64 --decode > photosynthesis.png

# Second turn: Refine based on previous response
# (Include the previous conversation in contents array)
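The second turn is only outlined in the comments above. One way to assemble it, sketched here in Python, is to replay the original user prompt, append the model's returned content unchanged, and then add the follow-up instruction. `build_second_turn` is a hypothetical helper, and the sketch assumes the endpoint accepts the response's `parts` (including `inlineData`) as input without modification.

```python
def build_second_turn(first_prompt: str, first_response: dict, follow_up: str) -> dict:
    """Build a follow-up request body that carries the first turn as context."""
    # Reuse the model's parts as returned, so the generated image and any
    # extra fields attached by the model stay in the conversation history.
    model_parts = first_response["candidates"][0]["content"]["parts"]
    return {
        "contents": [
            {"role": "user", "parts": [{"text": first_prompt}]},
            {"role": "model", "parts": model_parts},
            {"role": "user", "parts": [{"text": follow_up}]},
        ],
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }
```

Here `first_response` would be the parsed contents of `turn1_response.json`, for example via `json.load`.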

How to force image-only output?

To prevent text-only responses, set "responseModalities": ["IMAGE"] (without TEXT):
data = {
    "contents": [{
        "parts": [{
            "text": "A serene mountain landscape at dawn"
        }]
    }],
    "generationConfig": {
        "responseModalities": ["IMAGE"],  # Only IMAGE, no TEXT
        "imageConfig": {
            "aspectRatio": "16:9"
        }
    }
}

How to extract images from the response?

Images are returned as base64-encoded data in the inlineData field:
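import base64
import os

import requests

# Endpoint and headers as used in the curl examples above
url = "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-image-preview:generateContent"
headers = {
    "x-goog-api-key": os.environ["WISDOM_GATE_KEY"],
    "Content-Type": "application/json",
}
data = {
    "contents": [{"parts": [{"text": "A serene mountain landscape at dawn"}]}],
    "generationConfig": {"responseModalities": ["IMAGE"]},
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # Fail early on HTTP errors
result = response.json()

image_count = 0
for candidate in result.get("candidates", []):
    for part in candidate.get("content", {}).get("parts", []):
        if "inlineData" in part:
            image_data = part["inlineData"]["data"]
            mime_type = part["inlineData"]["mimeType"]

            # Decode and save; number the files so multiple images are not overwritten
            image_bytes = base64.b64decode(image_data)
            extension = mime_type.split("/")[1]  # e.g., "png" from "image/png"
            image_count += 1
            filename = f"generated_image_{image_count}.{extension}"

            with open(filename, "wb") as f:
                f.write(image_bytes)
            print(f"Image saved as {filename}")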

How to control costs?

  1. Choose appropriate models: gemini-2.5-flash-image is more economical than gemini-3-pro-image-preview
  2. Use lower resolutions: 1K and 2K consume fewer tokens than 4K
  3. Monitor token consumption: Check the usageMetadata field in responses
  4. All aspect ratios consume 1,290 tokens by default for Gemini 2.5 Flash Image
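To monitor token consumption as suggested in point 3, the `usageMetadata` field can be read from each parsed response. The following is a small sketch; `summarize_usage` is a hypothetical helper, and missing counters are treated as zero by assumption.

```python
def summarize_usage(response: dict) -> dict:
    """Extract token counts from a parsed generateContent response."""
    usage = response.get("usageMetadata", {})
    return {
        "prompt": usage.get("promptTokenCount", 0),
        "candidates": usage.get("candidatesTokenCount", 0),
        "total": usage.get("totalTokenCount", 0),
    }
```

Applied to the example response at the top of this page, this returns prompt 5, candidates 25, and total 30.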

Authorizations

Authorization (string, header, required)
Bearer token authentication. Include your API key in the Authorization header as 'Bearer YOUR_API_KEY'.

Path Parameters

model (string, required)
The model identifier (e.g., 'gemini-pro', 'gemini-pro-vision')

operator (string, required)
The operation to perform. Use 'generateContent' for standard requests, or 'streamGenerateContent?alt=sse' for streaming responses with Server-Sent Events format.
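With the 'streamGenerateContent?alt=sse' operator, the response arrives as Server-Sent Events rather than a single JSON document. The sketch below parses such a stream line by line. It assumes each event carries one complete JSON object on a single `data:` line, which is not confirmed by this page; `parse_sse_lines` is a hypothetical helper.

```python
import json


def parse_sse_lines(lines):
    """Yield one decoded JSON object per 'data:' line of an SSE stream."""
    for line in lines:
        line = line.strip()
        # Skip blank separators, comments, and non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload:
            yield json.loads(payload)
```

The `lines` argument can be any iterable of decoded text lines, for example the lines produced by a streaming HTTP client.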

Body

Content type: application/json

contents (object[], required)
Array of content entries (each with parts and an optional role) that make up the conversation

systemInstruction (object)
System instruction to guide the model's behavior

generationConfig (object)
Configuration for content generation

safetySettings (object[])
Safety settings for content filtering

Response

Successful content generation response

candidates (object[], required)
Array of generated content candidates

usageMetadata (object)
Token usage statistics for the request

promptFeedback (object)
Feedback about the prompt, including safety ratings