Image API Rate Limits: Caching & Performance Guide
Your image generation integration works perfectly at 10 requests an hour. You ship it, users love it, traffic grows. Then one morning you hit 1,000 requests in an hour and everything falls apart: API calls start failing with 429 errors, your app slows to a crawl, and users see broken images.
I've been there. The fix isn't complicated, but it does require thinking about your architecture differently. This guide covers the exact patterns I use to handle high-volume image generation without hitting rate limits or burning through API credits.
If you're new to image generation APIs, start with how to generate images with an API.
Understanding Rate Limits
Rate limits control how many API requests you can make in a given time window. Every image generation API has them, including Imejis.io.
They exist for good reasons: they prevent any single user from overwhelming the service, they keep infrastructure costs predictable, and they ensure fair access for everyone.
Typical rate limits across image APIs:
| API | Requests/Second | Monthly Limit |
|---|---|---|
| Most image generation APIs | 10-60 | Plan-dependent |
| Free tiers | 1-5 | 100-500/month |
| Enterprise plans | 50-100+ | Unlimited or high cap |
When you exceed a rate limit, you'll get a 429 Too Many Requests response. Some APIs also return a Retry-After header telling you how long to wait.
The real problem isn't the rate limit itself. It's that most developers don't plan for it. They fire off API calls in a loop, hit the limit, and every request after that fails. Let's fix that.
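To make the failure mode concrete, here's a minimal in-process concurrency limiter — a sketch with illustrative names and numbers, not a library recommendation. Instead of firing every request at once, it caps how many run at the same time:

```javascript
// Minimal sketch: run async tasks with at most `limit` in flight at once.
// `runLimited` is a hypothetical helper, not part of any API.
async function runLimited(tasks, limit) {
  const results = []
  let next = 0
  async function workerLoop() {
    while (next < tasks.length) {
      const i = next++ // claim the next task index
      results[i] = await tasks[i]()
    }
  }
  // Start `limit` worker loops that drain the task list together
  await Promise.all(Array.from({ length: limit }, workerLoop))
  return results
}

// Usage: wrap each API call in a thunk, then run at most 5 at a time
// const results = await runLimited(
//   jobs.map((job) => () => callImageAPI(job.templateId, job.inputData)),
//   5
// )
```

The queue patterns later in this guide build on the same idea, adding retries, persistence, and rate limiting on top.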
The Caching Strategy That Solves 90% of Problems
Here's the key insight: most image generation requests are duplicates. The same product card, the same social media template, the same Open Graph image, generated over and over with the same input data.
If you generate an image once and cache the result, you'll cut your API usage by 80-95%. That's not a guess. I've seen it consistently across production apps.
Cache Key Design
Your cache key needs to be deterministic. Same inputs should always produce the same key. Here's the formula:
cache_key = hash(template_id + sorted(input_data))
Sort your input data before hashing. Objects like {name: "Alice", role: "Engineer"} and {role: "Engineer", name: "Alice"} should produce the same cache key, since they'll generate the same image.
const crypto = require("crypto")

function generateCacheKey(templateId, inputData) {
  // Sort keys for consistent hashing. Passing the sorted key array as the
  // JSON.stringify replacer also fixes the output order. Note: this handles
  // flat objects; deep-sort nested data if your inputs contain nested objects.
  const sortedData = JSON.stringify(inputData, Object.keys(inputData).sort())
  const raw = `${templateId}:${sortedData}`
  return crypto.createHash("sha256").update(raw).digest("hex")
}

// Both produce the same cache key
generateCacheKey("social-card", { name: "Alice", role: "Engineer" })
generateCacheKey("social-card", { role: "Engineer", name: "Alice" })

Redis Cache Implementation
Redis is the go-to choice for caching generated image URLs. It's fast, supports TTL (time-to-live) expiration, and handles concurrent access well.
const Redis = require("ioredis")
const redis = new Redis(process.env.REDIS_URL)
const CACHE_TTL = 60 * 60 * 24 // 24 hours
async function getOrGenerateImage(templateId, inputData) {
  const cacheKey = `img:${generateCacheKey(templateId, inputData)}`

  // Check cache first
  const cached = await redis.get(cacheKey)
  if (cached) {
    console.log("Cache hit:", cacheKey.slice(0, 12))
    return JSON.parse(cached)
  }

  // Cache miss — call the API
  console.log("Cache miss:", cacheKey.slice(0, 12))
  const result = await callImageAPI(templateId, inputData)

  // Store in cache
  await redis.setex(cacheKey, CACHE_TTL, JSON.stringify(result))
  return result
}
async function callImageAPI(templateId, inputData) {
  const response = await fetch("https://api.imejis.io/v1/images", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.IMEJIS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      template: templateId,
      data: inputData,
    }),
  })
  if (!response.ok) {
    // Attach status and headers so retry logic can inspect them later
    const err = new Error(`API error: ${response.status}`)
    err.status = response.status
    err.headers = response.headers
    throw err
  }
  return response.json()
}

A 24-hour TTL works well for most use cases. If your templates change frequently, shorten it. If your data is static (like product images), extend it to 7 days or longer.
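If you want different lifetimes per template, a small lookup table is enough. The template names and durations below are assumptions — tune them to your own content:

```javascript
// Hypothetical per-template TTLs, in seconds — adjust to your templates
const TTL_BY_TEMPLATE = {
  "product-image": 60 * 60 * 24 * 7, // static product data: 7 days
  "social-card": 60 * 60 * 24, // typical content: 24 hours
  "live-dashboard": 60 * 5, // fast-changing data: 5 minutes
}

function ttlFor(templateId) {
  return TTL_BY_TEMPLATE[templateId] ?? 60 * 60 * 24 // default: 24 hours
}

// Then use it in place of the fixed CACHE_TTL:
// await redis.setex(cacheKey, ttlFor(templateId), JSON.stringify(result))
```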
Filesystem Cache
For smaller apps or local development, a filesystem cache keeps things simple. No Redis server needed.
const fs = require("fs").promises
const path = require("path")
const CACHE_DIR = path.join(__dirname, ".image-cache")
async function getOrGenerateImageFS(templateId, inputData) {
  const cacheKey = generateCacheKey(templateId, inputData)
  const cachePath = path.join(CACHE_DIR, `${cacheKey}.json`)

  try {
    const cached = await fs.readFile(cachePath, "utf-8")
    return JSON.parse(cached)
  } catch (err) {
    // Cache miss — fall through and generate
  }

  const result = await callImageAPI(templateId, inputData)
  await fs.mkdir(CACHE_DIR, { recursive: true })
  await fs.writeFile(cachePath, JSON.stringify(result))
  return result
}

This won't scale to millions of images, but it works fine for apps generating a few thousand images per day. For high-volume workloads, stick with Redis or check out batch image generation from CSV for bulk workflows.

Queue-Based Generation
When you need to generate 10,000 images at once (say, for a product catalog or a marketing campaign), don't blast the API with all 10,000 requests simultaneously. You'll hit rate limits instantly.
Instead, use a job queue to control how many requests happen at the same time.
Bull/BullMQ Queue
BullMQ is the standard job queue for Node.js. It uses Redis as a backend and gives you concurrency control, retries, and job prioritization out of the box.
const { Queue, Worker } = require("bullmq")
const Redis = require("ioredis")
const connection = new Redis(process.env.REDIS_URL)
// Create the queue
const imageQueue = new Queue("image-generation", { connection })
// Add jobs to the queue
async function queueImageGeneration(jobs) {
  const bulkJobs = jobs.map((job) => ({
    name: "generate",
    data: {
      templateId: job.templateId,
      inputData: job.inputData,
    },
  }))
  await imageQueue.addBulk(bulkJobs)
  console.log(`Queued ${bulkJobs.length} image generation jobs`)
}

Controlled Concurrency
The worker processes jobs from the queue with a controlled concurrency level. This is where the magic happens: you decide how many API calls happen at once.
const worker = new Worker(
  "image-generation",
  async (job) => {
    const { templateId, inputData } = job.data
    // Check cache first (same pattern as before)
    const result = await getOrGenerateImage(templateId, inputData)
    return result
  },
  {
    connection,
    concurrency: 5, // Process 5 jobs at a time
    limiter: {
      max: 10, // Max 10 jobs
      duration: 1000, // per second
    },
  }
)

worker.on("completed", (job) => {
  console.log(`Job ${job.id} completed`)
})

worker.on("failed", (job, err) => {
  console.error(`Job ${job.id} failed:`, err.message)
})

Setting concurrency: 5 with a rate limiter of 10 per second keeps you well within typical API rate limits. Adjust these numbers based on your plan's limits.

Retry Logic with Exponential Backoff
Even with caching and queues, you'll occasionally get 429 or 5xx errors. Don't treat these as fatal. Just retry with increasing delays.
async function callWithRetry(fn, maxRetries = 4) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn()
    } catch (err) {
      if (attempt === maxRetries) throw err

      const isRetryable =
        err.status === 429 || (err.status >= 500 && err.status < 600)
      if (!isRetryable) throw err

      // Check for Retry-After header
      const retryAfter = err.headers?.get?.("Retry-After")
      const delay = retryAfter
        ? parseInt(retryAfter, 10) * 1000
        : Math.min(1000 * Math.pow(2, attempt), 30000) // 1s, 2s, 4s, 8s

      console.log(`Retry attempt ${attempt + 1} after ${delay}ms`)
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
}

// Usage
const result = await callWithRetry(() => callImageAPI(templateId, inputData))

The key points: respect the Retry-After header when it's present, cap your maximum delay (30 seconds is reasonable), and set a retry limit so you don't loop forever.

CDN Setup for Generated Images
Once you've generated an image, serve it through a CDN. This removes load from your origin server and delivers images faster to users worldwide.
The setup depends on where you store generated images. Here's a common pattern with Cloudflare:
- Store generated images in S3 (or any object storage)
- Put Cloudflare in front of S3 with a cache rule
- Set cache headers on your S3 objects
const { S3Client, PutObjectCommand } = require("@aws-sdk/client-s3")

const s3 = new S3Client({ region: "us-east-1" })

async function storeAndServeImage(cacheKey, imageBuffer) {
  const key = `generated/${cacheKey}.png`
  await s3.send(
    new PutObjectCommand({
      Bucket: process.env.S3_BUCKET,
      Key: key,
      Body: imageBuffer,
      ContentType: "image/png",
      CacheControl: "public, max-age=86400", // CDN caches for 24 hours
    })
  )
  return `https://cdn.yourdomain.com/${key}`
}

With a CDN in place, the same image gets served from edge nodes close to your users. The first request hits your origin; every request after that is served from cache. For more on optimizing costs at scale, see cost optimization at scale.

Pre-Generation vs On-Demand
There are two approaches to image generation, and most production apps use both.
Pre-generation works when you know what images you'll need ahead of time. Product catalogs, scheduled social media posts, email campaigns. All of these have predictable data. Generate the images in a batch job during off-peak hours, store the results, and serve from cache.
# Pre-generate images for a product catalog
import hashlib
import json
import time

import requests

API_KEY = "your-api-key"  # better: read this from an environment variable

def pre_generate_catalog(products, template_id):
    for product in products:
        cache_key = hashlib.sha256(
            f"{template_id}:{json.dumps(product, sort_keys=True)}".encode()
        ).hexdigest()
        response = requests.post(
            "https://api.imejis.io/v1/images",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"template": template_id, "data": product},
        )
        if response.ok:
            store_in_cache(cache_key, response.json())  # your cache layer
            print(f"Generated image for {product.get('name', 'unknown')}")
        else:
            print(f"Failed: {response.status_code}")
        time.sleep(0.1)  # light throttle so the batch stays under rate limits

On-demand generation is for dynamic content: user-generated data, real-time dashboards, personalized images. Use the caching strategy from earlier so repeat requests don't hit the API.
The hybrid approach is what I'd recommend for most teams: pre-generate what you can predict, cache on-demand results for everything else.
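As a sketch, the hybrid split can live in a single routing function. The template list and return values here are illustrative assumptions, not part of any API:

```javascript
// Hypothetical routing: templates with predictable data are handled by the
// nightly pre-generation pipeline; everything else goes through the
// on-demand path with a cache in front.
const PRE_GENERATED_TEMPLATES = new Set(["product-card", "og-image"])

function routeGeneration(templateId) {
  return PRE_GENERATED_TEMPLATES.has(templateId) ? "pre-generate" : "on-demand"
}

// On-demand requests would call getOrGenerateImage(templateId, inputData),
// while pre-generated templates are served straight from S3/CDN.
```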
Monitoring and Alerting
You can't fix what you can't see. Track these four metrics:
- Cache hit ratio: aim for 80%+. If it's lower, your cache TTL might be too short or your cache keys aren't consistent.
- API response time: set an alert if p95 exceeds 5 seconds.
- Error rate: track 429s and 5xx errors separately. A spike in 429s means you need to throttle more aggressively.
- Queue depth: if your job queue keeps growing, you're generating faster than you can process.
// Simple monitoring with counters
const metrics = {
  cacheHits: 0,
  cacheMisses: 0,
  apiErrors: 0,
  totalGenerated: 0,
}

function logMetrics() {
  const hitRatio =
    metrics.cacheHits / (metrics.cacheHits + metrics.cacheMisses) || 0
  console.log({
    ...metrics,
    cacheHitRatio: `${(hitRatio * 100).toFixed(1)}%`,
  })
}

// Log every 60 seconds
setInterval(logMetrics, 60000)

In production, send these metrics to your observability platform (Datadog, Grafana, or even a simple dashboard). The cache hit ratio alone will tell you if your caching strategy is working.

Architecture Patterns
Here are three patterns I've seen work well. Pick the one that matches your workload.
Pattern 1: On-Demand with Cache
Best for: apps with unpredictable, user-driven image generation.
Request → Check Cache → [Hit] → Return cached URL
→ [Miss] → Call API → Store in Cache → Return URL
Simple, effective, and handles most use cases. Start here.
Pattern 2: Pre-Generation Pipeline
Best for: catalogs, scheduled content, bulk campaigns.
Cron Job → Load Data → Queue Jobs → Workers → Generate → Store in S3/CDN
Runs on a schedule, generates everything ahead of time. Zero latency for end users since images already exist when they're needed.
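One way to wire that schedule is a nightly job that turns catalog rows into queue jobs. `loadProductCatalog` and the template name below are hypothetical, and the queueing step reuses the `queueImageGeneration` helper from the BullMQ section:

```javascript
// Pure helper: map catalog rows to queue jobs (easy to test in isolation)
function buildCatalogJobs(products, templateId) {
  return products.map((product) => ({ templateId, inputData: product }))
}

// In a nightly cron job (system cron, node-cron, or your scheduler of choice):
// const products = await loadProductCatalog() // hypothetical data loader
// await queueImageGeneration(buildCatalogJobs(products, "product-card"))
```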
Pattern 3: Hybrid
Best for: apps that have both predictable and unpredictable image needs.
Predictable content → Pre-generation pipeline → S3/CDN
Dynamic content → On-demand with cache → Redis → CDN
This is what most production apps end up with. Pre-generate what you can, cache the rest. Check out image API speed benchmarks to understand how response times factor into your architecture choice.
Get Started
The patterns in this guide work with any image generation API, but they're especially effective with Imejis.io because of its fast response times and predictable rate limits.
Start with caching. It's the single highest-impact change you can make. Add queue-based generation when you're doing bulk work. Layer in CDN and monitoring as you scale.
If you're processing large datasets, batch image generation from CSV covers the data pipeline side of things.
FAQ

What are typical rate limits for image APIs?
Most image APIs allow 10-60 requests per second. Imejis.io handles concurrent requests well, but check your plan's monthly credit limit. Rate limits protect the service from abuse.
Should I cache generated images?
Yes, always. Generate once, serve many times. Store the image URL or file in a cache (Redis, CDN, or filesystem). Never regenerate the same image for the same input data.
What's the best caching strategy?
Hash your template ID + input data to create a cache key. Check the cache before calling the API. If cache miss, generate and store. Most teams use Redis with a 24-hour TTL.
How do I handle rate limit errors?
Catch 429 (Too Many Requests) responses. Implement exponential backoff: wait 1s, then 2s, then 4s. Most APIs include a Retry-After header telling you exactly how long to wait.
Can I pre-generate images to avoid rate limits?
Yes. For predictable content (product catalogs, scheduled posts), generate images in batch during off-peak hours. Store results and serve from cache when needed.