Gemini Image Generation

Overview

The Gemini Image Generation API allows you to generate images from text descriptions using Google’s Gemini models. This endpoint supports flexible aspect ratios, multiple resolutions (up to 4K), and multi-turn image editing capabilities.

Latest News: gemini-3-pro-image-preview is now supported! Generate images up to 4K resolution.

Quick Start

To use the API, replace the base URL and API key in the official SDK or in your raw HTTP requests:

Base URL: https://wisdom-gate.juheapi.com (replace generativelanguage.googleapis.com)
API Key: Replace $GEMINI_API_KEY with your $WISDOM_GATE_KEY
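
As a minimal sketch of that swap in Python, the helper below assembles the URL, headers, and JSON body for a generateContent call. The function name build_generate_request is hypothetical; the base URL, path, and x-goog-api-key header mirror the curl examples on this page.

```python
import json

# Replaces generativelanguage.googleapis.com
BASE_URL = "https://wisdom-gate.juheapi.com"


def build_generate_request(prompt, api_key, model="gemini-3-pro-image-preview"):
    """Return (url, headers, body) for a generateContent call (hypothetical helper)."""
    url = f"{BASE_URL}/v1beta/models/{model}:generateContent"
    headers = {
        "x-goog-api-key": api_key,  # your WISDOM_GATE_KEY, not a Google key
        "Content-Type": "application/json",
    }
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }
    return url, headers, json.dumps(body)


if __name__ == "__main__":
    url, headers, body = build_generate_request("A red bicycle", "YOUR_WISDOM_GATE_KEY")
    print(url)
```

The three returned values can be handed to any HTTP client, for example requests.post(url, headers=headers, data=body).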

Basic Example: Generate Image

curl -s -X POST \
  "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $WISDOM_GATE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{
        "text": "Da Vinci style anatomical sketch of a dissected Monarch butterfly. Detailed drawings of the head, wings, and legs on textured parchment with notes in English."
      }]
    }],
    "tools": [{"google_search": {}}],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"],
      "imageConfig": {
        "aspectRatio": "1:1",
        "imageSize": "1K"
      }
    }
  }' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | head -1 | base64 --decode > butterfly.png

Image-to-Image Generation

You can send an input image together with a text prompt to generate a new, modified image.

curl -s -X POST \
  "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $WISDOM_GATE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "role": "user",
      "parts": [
        { "text": "cat" },
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "BASE64_DATA_HERE"
          }
        }
      ]
    }],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"]
    }
  }'
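
The BASE64_DATA_HERE placeholder stands for the base64-encoded bytes of the input image. A minimal Python sketch of producing that part is shown here; the helper name inline_image_part is hypothetical, and the inline_data, mime_type, and data keys are copied from the request above.

```python
import base64


def inline_image_part(image_bytes, mime_type="image/jpeg"):
    """Wrap raw image bytes in an inline_data part (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"inline_data": {"mime_type": mime_type, "data": encoded}}


if __name__ == "__main__":
    # Tiny stand-in payload; in practice read the bytes from an image file.
    part = inline_image_part(b"\xff\xd8\xff", "image/jpeg")
    print(part["inline_data"]["data"])
```

To use it with a file, read the file in binary mode (open(path, "rb").read()) and place the returned dict in the parts array next to the text part.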

Multi-Image & Reference Generation

gemini-3-pro-image-preview supports using multiple images as inputs. You can mix up to 14 reference images in a single request:

Up to 6 images of objects, reproduced with high fidelity
Up to 5 images of humans to maintain character consistency

curl -s -X POST \
  "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $WISDOM_GATE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "role": "user",
      "parts": [
        { "text": "An office group photo of these people, they are making funny faces." },
        { "inline_data": { "mime_type": "image/jpeg", "data": "BASE64_IMG_1" } },
        { "inline_data": { "mime_type": "image/jpeg", "data": "BASE64_IMG_2" } },
        { "inline_data": { "mime_type": "image/jpeg", "data": "BASE64_IMG_3" } },
        { "inline_data": { "mime_type": "image/jpeg", "data": "BASE64_IMG_4" } },
        { "inline_data": { "mime_type": "image/jpeg", "data": "BASE64_IMG_5" } }
      ]
    }],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"],
      "imageConfig": {
        "aspectRatio": "5:4",
        "imageSize": "1K"
      }
    }
  }'
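
The parts array in the request above can also be assembled programmatically. The sketch below is one way to do it in Python, assuming the images are already base64-encoded strings; build_reference_parts is a hypothetical name, and the check only enforces the overall 14-image limit quoted above, not the per-category limits.

```python
MAX_REFERENCE_IMAGES = 14  # overall limit stated in this section


def build_reference_parts(prompt, encoded_images, mime_type="image/jpeg"):
    """Return a parts list: the text prompt first, then one inline_data part per image."""
    if len(encoded_images) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"At most {MAX_REFERENCE_IMAGES} reference images per request, "
            f"got {len(encoded_images)}"
        )
    parts = [{"text": prompt}]
    for data in encoded_images:
        parts.append({"inline_data": {"mime_type": mime_type, "data": data}})
    return parts


if __name__ == "__main__":
    parts = build_reference_parts("An office group photo", ["BASE64_IMG_1", "BASE64_IMG_2"])
    print(len(parts))
```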

Important Notes

Model Differences: Different Gemini image models may support different resolutions and features. gemini-3-pro-image-preview supports up to 4K resolution, while gemini-2.5-flash-image supports 1K and 2K resolutions. We strongly recommend consulting the model catalog for complete parameter lists and usage instructions for each model.

Response Pass-through Principle: Wisdom Gate typically does not modify model responses outside of reverse format, so you receive response content consistent with the original Gemini API provider.

Force Image Output: To ensure image generation without text-only responses, set "responseModalities": ["IMAGE"] (without TEXT) in your request. This forces the model to generate an image.

Auto-Generated Documentation: The request parameters and response format are automatically generated from the OpenAPI specification. All parameters, their types, descriptions, defaults, and examples are pulled directly from openapi.json. Scroll down to see the interactive API reference.

FAQ

What models support image generation?

Currently supported models:

gemini-3-pro-image-preview: Supports up to 4K resolution, multiple aspect ratios
gemini-2.5-flash-image: Supports 1K and 2K resolutions, flexible aspect ratios
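
A client can check the requested resolution against these limits before sending a request. The Python sketch below illustrates the idea; the helper name image_config is hypothetical, and listing 1K and 2K for gemini-3-pro-image-preview is an assumption read from "up to 4K" rather than something this page states explicitly.

```python
# Sizes per model as described above. The 1K and 2K entries for the pro
# preview model are an assumption inferred from "up to 4K".
SUPPORTED_IMAGE_SIZES = {
    "gemini-3-pro-image-preview": ("1K", "2K", "4K"),
    "gemini-2.5-flash-image": ("1K", "2K"),
}


def image_config(model, image_size, aspect_ratio="1:1"):
    """Return an imageConfig dict, or raise if the model does not list that size."""
    sizes = SUPPORTED_IMAGE_SIZES.get(model)
    if sizes is None:
        raise ValueError(f"Unknown image model: {model}")
    if image_size not in sizes:
        raise ValueError(f"{model} supports {', '.join(sizes)}, not {image_size}")
    return {"aspectRatio": aspect_ratio, "imageSize": image_size}


if __name__ == "__main__":
    print(image_config("gemini-3-pro-image-preview", "4K", "16:9"))
```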

How to configure aspect ratios?

Gemini 2.5 Flash Image supports multiple aspect ratios, which makes it easy to create content for different devices. Every aspect ratio consumes 1,290 tokens by default. Supported aspect ratios:

1:1 - Square
3:2 - Landscape
2:3 - Portrait
3:4 - Portrait
4:3 - Landscape
4:5 - Portrait
5:4 - Landscape
9:16 - Vertical (mobile)
16:9 - Horizontal (widescreen)
21:9 - Ultra-wide

curl -s -X POST \
  "https://wisdom-gate.juheapi.com/v1beta/models/gemini-2.5-flash-image:generateContent" \
  -H "x-goog-api-key: $WISDOM_GATE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{
        "text": "A beautiful sunset over mountains"
      }]
    }],
    "generationConfig": {
      "responseModalities": ["IMAGE"],
      "imageConfig": {
        "aspectRatio": "16:9"
      }
    }
  }'
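
When the target canvas is given in pixels rather than as a ratio, one option is to pick the nearest supported value from the list above. The following Python sketch does that by comparing ratios on a log scale; closest_aspect_ratio is a hypothetical helper and not part of the API.

```python
import math

# Aspect ratios listed in this section
SUPPORTED_ASPECT_RATIOS = (
    "1:1", "3:2", "2:3", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
)


def closest_aspect_ratio(width, height):
    """Return the supported aspect ratio string nearest to width/height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(ratio):
        w, h = ratio.split(":")
        return abs(math.log(int(w) / int(h)) - target)

    return min(SUPPORTED_ASPECT_RATIOS, key=distance)


if __name__ == "__main__":
    print(closest_aspect_ratio(1920, 1080))
```

The returned string can be used directly as the aspectRatio value in imageConfig.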

How to do multi-turn image editing?

You can maintain conversation context across multiple turns to iteratively refine images:

# First turn: Generate initial image
curl -s -X POST \
  "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $WISDOM_GATE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "role": "user",
      "parts": [{
        "text": "Create a vibrant infographic that explains photosynthesis as if it were a recipe for a plants favorite food. Show the \"ingredients\" (sunlight, water, CO2) and the \"finished dish\" (sugar/energy). The style should be like a page from a colorful kids cookbook, suitable for a 4th grader."
      }]
    }],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"]
    }
  }' > turn1_response.json

# Extract image from first response
jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' turn1_response.json | head -1 | base64 --decode > photosynthesis.png

# Second turn: Refine based on previous response
# (Include the previous conversation in contents array)
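
The second turn is not spelled out above. A minimal Python sketch of building its contents array is given here, assuming the API accepts the model's earlier reply passed back unchanged under the role "model" (the convention of the upstream Gemini API). The helper name build_second_turn and the follow-up prompt are hypothetical.

```python
import json


def build_second_turn(first_prompt, first_response, follow_up):
    """Return contents holding the first user turn, the model reply, and the follow-up."""
    model_content = first_response["candidates"][0]["content"]
    return [
        {"role": "user", "parts": [{"text": first_prompt}]},
        # Pass the model's parts back unchanged, including the inlineData image.
        {"role": "model", "parts": model_content["parts"]},
        {"role": "user", "parts": [{"text": follow_up}]},
    ]


if __name__ == "__main__":
    # Stand-in for the parsed contents of turn1_response.json
    first_response = {
        "candidates": [
            {"content": {"parts": [{"inlineData": {"mimeType": "image/png", "data": "AAAA"}}]}}
        ]
    }
    contents = build_second_turn(
        "Create a vibrant infographic that explains photosynthesis.",
        first_response,
        "Translate the labels in the infographic to Spanish.",
    )
    print(json.dumps(contents)[:60])
```

The returned list replaces the contents value in the request body; generationConfig stays the same as in the first turn.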

How to force image-only output?

To prevent text-only responses, set "responseModalities": ["IMAGE"] (without TEXT):

data = {
    "contents": [{
        "parts": [{
            "text": "A serene mountain landscape at dawn"
        }]
    }],
    "generationConfig": {
        "responseModalities": ["IMAGE"],  # Only IMAGE, no TEXT
        "imageConfig": {
            "aspectRatio": "16:9"
        }
    }
}

How to extract images from the response?

Images are returned as base64-encoded data in the inlineData field:

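import base64
import os

import requests

url = "https://wisdom-gate.juheapi.com/v1beta/models/gemini-3-pro-image-preview:generateContent"
headers = {
    "x-goog-api-key": os.environ["WISDOM_GATE_KEY"],
    "Content-Type": "application/json",
}
# `data` is the request body, e.g. the dict from the previous example

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # fail early on HTTP errors
result = response.json()

for candidate in result.get("candidates", []):
    for part in candidate.get("content", {}).get("parts", []):
        if "inlineData" in part:
            image_data = part["inlineData"]["data"]
            mime_type = part["inlineData"]["mimeType"]

            # Decode and save
            image_bytes = base64.b64decode(image_data)
            extension = mime_type.split("/")[1]  # e.g., "png" from "image/png"
            filename = f"generated_image.{extension}"

            with open(filename, "wb") as f:
                f.write(image_bytes)
            print(f"Image saved as {filename}")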

How to control costs?

Choose appropriate models: gemini-2.5-flash-image is more economical than gemini-3-pro-image-preview
Use lower resolutions: 1K and 2K consume fewer tokens than 4K
Monitor token consumption: Check the usageMetadata field in responses
All aspect ratios consume 1,290 tokens by default for Gemini 2.5 Flash Image
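
To monitor consumption, the token counts can be read from usageMetadata after each call. The sketch below assumes the field names promptTokenCount, candidatesTokenCount, and totalTokenCount, which come from the upstream Gemini API and are not listed on this page; summarize_usage is a hypothetical helper.

```python
def summarize_usage(result):
    """Pull token counts out of a parsed response; missing fields default to 0."""
    usage = result.get("usageMetadata", {})
    return {
        "prompt": usage.get("promptTokenCount", 0),      # assumed field name
        "output": usage.get("candidatesTokenCount", 0),  # assumed field name
        "total": usage.get("totalTokenCount", 0),        # assumed field name
    }


if __name__ == "__main__":
    sample = {"usageMetadata": {"promptTokenCount": 12, "candidatesTokenCount": 1290, "totalTokenCount": 1302}}
    print(summarize_usage(sample))
```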

Authorizations

Authorization (string, header, required): Bearer token authentication. Include your API key in the Authorization header as 'Bearer YOUR_API_KEY'.

Path Parameters

model (string, required): The model identifier (e.g., 'gemini-pro', 'gemini-pro-vision').
operator (string, required): The operation to perform. Use 'generateContent' for standard requests, or 'streamGenerateContent?alt=sse' for streaming responses with Server-Sent Events format.
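
The two path parameters combine into the endpoint path as {model}:{operator}, as seen in the curl examples on this page. A small Python sketch of that composition follows; endpoint_url is a hypothetical helper, and the /v1beta/models/ prefix is taken from those examples.

```python
BASE_URL = "https://wisdom-gate.juheapi.com"


def endpoint_url(model, stream=False):
    """Join the model and operator path parameters into a full endpoint URL."""
    operator = "streamGenerateContent?alt=sse" if stream else "generateContent"
    return f"{BASE_URL}/v1beta/models/{model}:{operator}"


if __name__ == "__main__":
    print(endpoint_url("gemini-2.5-flash-image"))
    print(endpoint_url("gemini-2.5-flash-image", stream=True))
```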

Body

application/json

contents (object[], required): Array of content parts that make up the conversation
systemInstruction (object): System instruction to guide the model's behavior
generationConfig (object): Configuration for content generation
safetySettings (object[]): Safety settings for content filtering

Response

Successful content generation response

candidates (object[], required): Array of generated content candidates
usageMetadata (object): Token usage statistics for the request
promptFeedback (object): Feedback about the prompt, including safety ratings
