Platformatic Blog

Stop Request Stampedes at the Gateway with Platformatic Deduplication

Paolo Insogna — Tue, 30 Jun 2026 14:38:38 GMT

Picture an online store launching a new product and sending out a mailing list campaign. Thousands of users click the same link at once. The product page, built with a Node.js app like Next.js, needs to fetch the same product details, inventory, recommendations, and pricing for each request.

If the page is cached, everything works smoothly. Problems begin when the cache is empty, expired, or being refreshed. Then, the first group of users all miss the cache together, and every request asks the app to generate the same response. (This same thing would also happen with a trending news story, a search crawler, frontend prefetching, or after a cache reset.)

This situation is known as the “thundering herd” problem. Every user request is valid, but the work gets repeated. Your app uses CPU, database, and network resources to calculate the same result over and over, just when response times are already strained.

Rather than sending all traffic straight to your app, wouldn’t it be great if you could place a gateway in front that spots duplicate in-flight reads and combines them before they hit Node.js?

Platformatic Gateway now does exactly this. With request deduplication, it merges concurrent requests for the same data. Only one goes upstream, while the others wait and then get the same response.

This is not a replacement for caching. Instead, it acts as a short-term coordination layer for requests in progress. The cache handles future requests, while deduplication shields your app while the first response is still being generated.

This is especially important for self-hosted Next.js apps. A popular route can cause heavy server rendering, React Server Component processing, image metadata checks, or backend API calls. When many users hit that route at once, or during cache refreshes, deduplication stops the gateway from sending the same work to Next.js over and over.

A Local Benchmark

To see how much this helps, we ran a simple local test with a purposely slow route behind a proxy. The upstream route waited 100 ms before responding, simulating a page or API call that needs backend work. The test used the same leader/waiter pattern as gateway deduplication and sent 100 requests at once to the same URL.

These are the median numbers from three runs:

Scenario	Client requests	Upstream requests	Average latency	p99 latency	Errors
Without deduplication	1,000	1,000	111.31 ms	134 ms	0
With deduplication	1,000	10	104.88 ms	127 ms	0
Without deduplication	10,000	10,000	104.80 ms	122 ms	0
With deduplication	10,000	100	102.91 ms	106 ms	0

The key result isn’t the slight change in average latency, but the number of upstream requests. With deduplication, the proxy still answers every client, but the upstream app only processes one response per wave of requests. In a test with 10,000 requests, that meant just 100 upstream responses instead of 10,000. In real situations, this can mean the difference between a burst that overwhelms your app and one that the gateway handles smoothly.
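The upstream counts in the table follow from simple arithmetic. Here is a minimal sketch, assuming clients arrive in waves of 100 identical concurrent requests (as in the test) and that each wave produces exactly one leader. The function names are hypothetical and exist only for this illustration.

```javascript
// Upstream requests expected for a given number of client requests.
// ASSUMPTION: with deduplication, each wave of identical concurrent requests
// results in exactly one upstream request.
function upstreamRequests (clientRequests, waveSize, deduplicated) {
  return deduplicated ? Math.ceil(clientRequests / waveSize) : clientRequests
}

// Percentage of upstream requests avoided thanks to deduplication.
function reductionPercent (clientRequests, waveSize) {
  const withDeduplication = upstreamRequests(clientRequests, waveSize, true)
  return 100 - (withDeduplication * 100) / clientRequests
}

console.log(upstreamRequests(1000, 100, true)) // 10
console.log(upstreamRequests(10000, 100, true)) // 100
console.log(reductionPercent(10000, 100)) // 99
```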

How It Works

Gateway deduplication uses a leader/waiter model that is easy to reason about in production.

The first matching request becomes the leader. It acquires a lock, goes to the upstream application, buffers the response, stores it for a short time, and notifies any waiters. Concurrent requests with the same key become waiters. They do not call the upstream service immediately; instead, they wait for the leader response and replay it when it becomes available.

In a production gateway, deduplication often sits next to caching. You can use separate Valkey instances or separate key prefixes in the same Valkey deployment, but the two stores serve different purposes: cache storage keeps reusable responses, while deduplication storage keeps short-lived locks and response buffers for in-flight requests.

To prevent deadlocks, every coordination point has an expiration. The leader lock uses a lockTtl, waiters have a timeout, and retries are limited. If the leader fails, the lock expires, a waiter times out, or retries run out, the request switches back to normal proxying. This fallback is intentional—deduplication is meant to lower load, not cause requests to get stuck during traffic spikes.

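For a single request, the decision path is as follows. If the method or route is not eligible, the request is proxied normally. Otherwise, the gateway tries to acquire the lock for the request key. The request that gets the lock becomes the leader and calls the upstream. Every other request waits for the leader response, retries within its limits, and falls back to normal proxying if no response arrives in time.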
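To make the leader/waiter idea concrete, here is a minimal single-process sketch. It is not the gateway’s implementation: the `deduplicate` helper, the `inFlight` map, and the `TIMED_OUT` marker are hypothetical names, and the sketch leaves out lock TTLs, retries, response storage, and cross-instance coordination.

```javascript
// key -> promise of the leader's response (in-memory, single process only)
const inFlight = new Map()
const TIMED_OUT = Symbol('timed-out')

async function deduplicate (key, callUpstream, { timeout = 1000 } = {}) {
  if (!inFlight.has(key)) {
    // Leader: first request for this key, so it calls the upstream.
    const response = callUpstream()
    inFlight.set(key, response)
    try {
      return await response
    } finally {
      inFlight.delete(key)
    }
  }

  // Waiter: reuse the leader's response, but never wait longer than `timeout`.
  let timer
  const deadline = new Promise(resolve => {
    timer = setTimeout(resolve, timeout, TIMED_OUT)
  })
  try {
    const result = await Promise.race([inFlight.get(key), deadline])
    if (result !== TIMED_OUT) return result
  } catch {
    // The leader failed: fall through to normal proxying.
  } finally {
    clearTimeout(timer)
  }
  return callUpstream()
}

// Usage: 100 concurrent requests for the same key reach the upstream once.
let upstreamCalls = 0
const slowUpstream = async () => {
  upstreamCalls++
  await new Promise(resolve => setTimeout(resolve, 100))
  return 'rendered page'
}
Promise.all(Array.from({ length: 100 }, () => deduplicate('GET:/products/42', slowUpstream)))
  .then(responses => console.log(responses.length, upstreamCalls)) // 100 1
```

The fallback at the end mirrors the behaviour described above: a waiter that cannot get the leader response simply goes upstream itself.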

The Simplest Configuration

Enable deduplication globally under gateway.deduplication:

{
  "gateway": {
    "deduplication": {
      "enabled": true
    },
    "applications": [
      {
        "id": "frontend",
        "proxy": {
          "prefix": "/"
        }
      }
    ]
  }
}

By default, deduplication works for GET and HEAD requests and uses memory for storage.

This default is intentionally cautious. GET and HEAD are the safest types of requests to deduplicate. Write requests often need custom rules before they can be safely coordinated.

Per-Application Overrides

You can also configure deduplication per proxied application with gateway.applications[].proxy.deduplication. Application-level options override the global options.

{
 "gateway": {
   "deduplication": {
     "enabled": true,
     "methods": ["GET"]
   },
   "applications": [
     {
       "id": "frontend",
       "proxy": {
         "prefix": "/",
         "deduplication": {
           "enabled": true,
           "routes": [{ "method": "GET", "path": "/blog/*" }]
         }
       }
     }
   ]
 }
}

This approach lets you begin where the benefits are clear. You can deduplicate public catalogue pages, blog posts, product details, or framework prefetch routes, while keeping endpoints with strict per-user behaviour unchanged.

Choosing The Deduplication Key

The default key is computed from:

the configured application origin
the HTTP method
the rewritten proxy URL, including the query string
selected request headers

The default headers are:

["authorization", "cookie", "accept", "accept-language"]

Including headers matters because many read responses are not only a function of the URL. A localized page can depend on accept-language. A user-specific page can depend on cookie or authorization. If those headers were ignored, unrelated callers could incorrectly share a response.

You can adjust the headers included in the key for a deduplication configuration:

{
 "gateway": {
   "deduplication": {
     "enabled": true,
     "headers": ["authorization", "cookie", "x-tenant-id"]
   }
 }
}

Currently, you can’t set headers per route. If you need different header behaviour for different routes, use separate deduplication settings for each application or create a custom key function.

For full control, provide a synchronous key function:

{
 "gateway": {
   "deduplication": {
     "enabled": true,
     "key": "./deduplication-key.js"
   }
 }
}

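// deduplication-key.js
// Note: this key ignores request headers, so only use it for responses
// that are identical for every caller.
export function computeDeduplicationKey(request, context) {
 return `${context.origin}:${context.method}:${context.url}`
}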

The function receives the request and a context object containing the origin, method, rewritten URL, parsed query, selected headers, and application configuration. It must return the key synchronously.
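As an illustration of what a custom key can do, here is a hedged sketch that ignores marketing query parameters and separates tenants by header. Only `context.origin`, `context.method`, and `context.url` come from the example above. The `x-tenant-id` header, the ignored parameter list, and reading the header from `request.headers` are assumptions made for this sketch. In a real key file the function would be exported, as in the example above.

```javascript
// Hypothetical list of query parameters that do not change the response.
const IGNORED_PARAMS = new Set(['utm_source', 'utm_medium', 'utm_campaign'])

function computeDeduplicationKey (request, context) {
  // The base URL is a placeholder so that a path-only value can be parsed.
  const url = new URL(context.url, 'http://placeholder.invalid')

  // Copy the names first, because deleting while iterating would skip entries.
  for (const name of [...url.searchParams.keys()]) {
    if (IGNORED_PARAMS.has(name)) url.searchParams.delete(name)
  }
  url.searchParams.sort() // make the key independent of parameter order

  const tenant = request.headers['x-tenant-id'] ?? ''
  return `${context.origin}:${context.method}:${url.pathname}${url.search}:${tenant}`
}

const context = { origin: 'http://frontend', method: 'GET', url: '/products/42?utm_source=mail&color=red' }
console.log(computeDeduplicationKey({ headers: { 'x-tenant-id': 'acme' } }, context))
// http://frontend:GET:/products/42?color=red:acme
```

With a key like this, two visitors arriving from different campaigns share one upstream request, while requests from different tenants never share a response.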

Route Whitelisting

For tighter control, configure a route whitelist. Routes use find-my-way syntax.

{
 "gateway": {
   "deduplication": {
     "enabled": true,
     "routes": [
       { "method": "GET", "path": "/blog/*" },
       { "methods": ["GET", "HEAD"], "path": "/products/:id" }
     ]
   }
 }
}

When routes are configured, route matching decides whether deduplication applies. When routes are not configured, the methods list decides.
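The rule above can be sketched as a small decision function. This is a simplified illustration, not the gateway’s code: the `toRegExp` helper only understands `*` and `:param` segments and is a stand-in for find-my-way, and `shouldDeduplicate` is a hypothetical name. The path is assumed to arrive without its query string.

```javascript
// Simplified stand-in for find-my-way: supports only '*' and ':param' segments.
function toRegExp (path) {
  const pattern = path
    .split('/')
    .map(segment => {
      if (segment === '*') return '.*'
      if (segment.startsWith(':')) return '[^/]+'
      return segment.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
    })
    .join('/')
  return new RegExp(`^${pattern}$`)
}

function shouldDeduplicate (config, method, path) {
  if (Array.isArray(config.routes) && config.routes.length > 0) {
    // Routes are configured: only route matching decides.
    return config.routes.some(route => {
      const methods = route.methods ?? [route.method]
      return methods.includes(method) && toRegExp(route.path).test(path)
    })
  }
  // No routes: the methods list decides (GET and HEAD by default).
  return (config.methods ?? ['GET', 'HEAD']).includes(method)
}

const config = {
  routes: [
    { method: 'GET', path: '/blog/*' },
    { methods: ['GET', 'HEAD'], path: '/products/:id' }
  ]
}
console.log(shouldDeduplicate(config, 'GET', '/blog/hello')) // true
console.log(shouldDeduplicate(config, 'HEAD', '/blog/hello')) // false
console.log(shouldDeduplicate(config, 'GET', '/cart')) // false
console.log(shouldDeduplicate({}, 'POST', '/cart')) // false
```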

Storage: Memory Or Valkey

By default, memory is used for storage. It handles duplicate requests within a single gateway instance and doesn’t need any external service. This setup is great for local development, single-instance deployments, and easy rollouts.

{
 "gateway": {
   "deduplication": {
     "enabled": true,
     "storage": {
       "adapter": "memory"
     }
   }
 }
}

For deployments that scale horizontally, use the valkey adapter. It stores locks, response pointers, and buffered responses in a Redis-compatible Valkey server, allowing multiple gateway workers, instances, or pods to coordinate. This way, you get the same deduplication benefits even when traffic is spread across several replicas.

{
 "gateway": {
   "deduplication": {
     "enabled": true,
     "storage": {
       "adapter": "valkey",
       "url": "redis://127.0.0.1:6379",
       "prefix": "my-application"
     }
   }
 }
}

Use a prefix if several applications share the same Valkey instance and need separate key spaces.

Operational Behavior

Deduplication is a best-effort feature, not a guarantee of exactly-once processing.

Duplicate upstream requests can still occur if the in-flight lock expires before the upstream response is ready, if a gateway instance fails while handling the leader request, if a waiter times out, or if retries run out. In these cases, the gateway just switches back to normal proxying.

The main timing options are:

timeout: how long a duplicate request waits for the leader response before retrying lock acquisition
retries: how many additional deduplication attempts are made before falling back to normal proxying
ttl: how long stored responses remain available for waiting requests
lockTtl: how long an in-flight lock can live before it expires

Defaults are:

{
 "timeout": 1000,
 "retries": 3,
 "ttl": 10000,
 "lockTtl": 500
}
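These values interact, so it can help to reason about them together. The sketch below is a rough estimate only. It assumes the values are milliseconds and that every attempt (the first one plus each retry) waits for the full timeout. Both function names are hypothetical.

```javascript
const defaults = { timeout: 1000, retries: 3, ttl: 10000, lockTtl: 500 }

// Rough upper bound on how long a waiter can wait before falling back.
// ASSUMPTION: each of the (1 + retries) attempts waits for the full timeout.
function maxWaiterDelay ({ timeout, retries }) {
  return timeout * (1 + retries)
}

// True when the lock is expected to expire before the upstream answers,
// which is one of the cases where a duplicate upstream request can occur.
function lockMayExpireEarly ({ lockTtl }, expectedUpstreamMs) {
  return expectedUpstreamMs > lockTtl
}

console.log(maxWaiterDelay(defaults)) // 4000
console.log(lockMayExpireEarly(defaults, 100)) // false
console.log(lockMayExpireEarly(defaults, 800)) // true
```

If a route regularly takes longer than the lock lifetime, it is worth raising lockTtl for that configuration before judging the results.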

Responses are fully buffered before being replayed. This works well for short bursts of duplicate reads, but large responses can use more gateway memory and, with Valkey, add some storage overhead. It’s best to start with routes where responses are small and where repeated upstream work is already costing you in money, speed, or capacity.

Custom Gateway Handlers

Deduplication composes with custom gateway handlers. When both a custom handler and deduplication are configured, deduplication runs first, and the leader request is delegated to the handler.

Handlers that use reply.from() do not need special handling. Platformatic Gateway uses reply.from() from @fastify/reply-from to proxy upstream requests.

export function handler(request, reply, dest, options) {
 return reply.from(dest, options)
}

If a handler overrides onResponse or onError, it can call the helper functions provided in options, so waiting requests still receive the correct signal:

export function handler(request, reply, dest, options) {
 return reply.from(dest, {
   ...options,
   async onResponse(request, reply, res) {
     reply.header('x-custom-handler', 'true')
     return options.deduplicateResponse(request, reply, res)
   },
   async onError(reply, error) {
     return options.deduplicateError(reply, error)
   }
 })
}

Handlers that send responses directly without reply.from() cannot be replayed by gateway deduplication.

Metrics

The feature also adds Gateway metrics so you can prove whether deduplication is helping in production:

gateway_deduplication_leader_count
gateway_deduplication_waiter_count
gateway_deduplication_replay_count
gateway_deduplication_fallback_count
gateway_deduplication_error_count

These counters help answer real-world questions: how many requests became leaders, how many waited, how many were replayed, and how often the gateway had to switch back to normal proxying.
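One way to turn these counters into a quick health check is to derive a couple of ratios. This is a sketch under assumptions: it supposes that every eligible request is counted once as either a leader or a waiter, and the `summarize` function and its field names are hypothetical. Check how the counters behave in your own deployment before relying on the numbers.

```javascript
// Derive simple ratios from counter values scraped from the metrics endpoint.
// ASSUMPTION: each eligible request is counted once, as a leader or as a waiter.
function summarize ({ leader, waiter, replay, fallback }) {
  const total = leader + waiter
  return {
    // Share of eligible requests answered from a leader's response.
    replayShare: total === 0 ? 0 : replay / total,
    // Share of waiters that had to go upstream anyway.
    fallbackShare: waiter === 0 ? 0 : fallback / waiter
  }
}

// Numbers shaped like the 10,000-request benchmark run above.
console.log(summarize({ leader: 100, waiter: 9900, replay: 9900, fallback: 0 }))
// { replayShare: 0.99, fallbackShare: 0 }
```

A high replay share with a low fallback share suggests the feature is absorbing bursts. A rising fallback share is a hint to revisit the timing options.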

Conclusion

Gateway deduplication works best when lots of clients request the same resource at once, and the upstream response can be safely reused for matching keys. This is just like the product launch or hot-news example from earlier: many users show up at once, the cache is cold or refreshing, and your app is about to generate the same page over and over.

The best places to start are public, read-heavy routes, framework prefetch endpoints, cache refresh paths, and expensive upstream reads with limited response sizes. Turn it on for a small set of routes first. Keep an eye on the leader, waiter, replay, and fallback metrics. Then, expand to other routes where you see duplicate work causing the most trouble.

You’ll see results right away: the gateway handles traffic spikes, your services do less repeated work, and users get more consistent response times during busy periods.

This kind of optimization adds up over time. You protect your upstream resources without changing your app code. You cut down on unnecessary backend load before it hits your databases, APIs, or rendering services. You also get metrics to see if the feature is delivering value. And if deduplication can’t help in a certain case, the request just goes through as usual.

For teams using Platformatic Gateway in front of modern web apps, especially self-hosted Next.js apps, request deduplication is a practical way to make read-heavy traffic more manageable. It gives you a safety net for traffic bursts, an easier way to scale with Valkey, and a rollout approach that doesn’t require changing your backend services.

Run Medusa on Kubernetes with Watt as a Monorepo

Paolo Insogna — Tue, 28 Apr 2026 14:30:00 GMT

Medusa stands out as a flexible open source commerce platform for Node.js. It offers teams a customizable backend, admin tools, and a modern storefront, all without locking you into a strict SaaS model. This makes it ideal for teams who want to move quickly and keep control over their architecture.

Running Medusa in production is more than just starting a single process. The real challenge is keeping the entire commerce stack fast, organized, and easy to update, especially when you have a backend, storefront, admin UI, image optimization, internal networking, and Kubernetes involved.

This is where using a Watt monorepo really helps.

Watt is Platformatic’s tool for combining multiple Node.js apps into one deployable unit by running them as worker threads under a single process.

Medusa can be deployed on Kubernetes. To manage, monitor, and optimize it there, you can use the Intelligent Command Center (ICC), a control plane for cloud-native applications running on Kubernetes. ICC covers application lifecycle management, autoscaling, compliance monitoring, and observability.

For basic deployment, simply running Watt on Kubernetes is sufficient.

Rather than spreading complexity across multiple repos, custom Dockerfiles, and manual service connections, you can keep everything in one workspace and let Watt manage it as a single platform. This gives you one dependency graph, one build process, one deployment artifact, and a single place to manage the rules that keep your system running smoothly.

In this post, we will look at a working Medusa setup deployed on ICC with:

web/backend: Medusa backend via @platformatic/node
web/frontend: Medusa Next.js starter via @platformatic/next
web/gateway: public routing via @platformatic/gateway
image-server: a dedicated @platformatic/next image optimizer application that reuses the same codebase as web/frontend

This setup can be both far easier to manage and more performant. Let’s explore.

Why a monorepo is a good fit for Medusa

Medusa already pushes you toward a multi-application architecture. Even in a relatively standard deployment, you are dealing with:

a backend API
an admin UI
a storefront
image optimization
environment variables shared across services
public and internal URLs that must stay aligned

You can spread these parts across different repositories and deployment pipelines, but as soon as you do, even simple changes become complicated.

For example, changing a base path means updating several repos. Keeping React versions consistent gets harder. Coordinating Docker changes turns into a big release task. Even figuring out if the storefront is calling the right backend can take more effort than it should.

With Watt, the monorepo becomes the control plane for the whole stack.

Each application stays isolated as a worker thread with Watt.
The whole platform is configured in one place.
Internal service discovery comes for free.
Deployment stays a single build and a single runtime entry point.

This approach gives you the best of both worlds: separation where it matters, and simplicity where you want it.

The workspace layout

The sample project is structured like this:

.
|-- package.json
|-- pnpm-workspace.yaml
|-- watt.json
`-- web
    |-- backend
    |   |-- medusa-config.ts
    |   |-- package.json
    |   |-- url-handler.js
    |   `-- watt.json
    |-- frontend
    |   |-- next.config.js
    |   |-- package.json
    |   |-- watt.image-optimizer.json
    |   |-- watt.json
    |   `-- src
    `-- gateway
        |-- package.json
        `-- watt.json

At the root, watt.json autoloads the web/* applications, sets gateway as the public entrypoint, and adds an extra application called image-server that reuses the frontend codebase with a different config.

This is where the monorepo model really shines. You can easily reuse the same codebase for different runtime roles. There’s no need to create a second Next.js project just to separate /_next/image. Instead, you keep one frontend codebase and let Watt run it in two different ways.

pnpm workspace setup: one dependency graph, fewer surprises

If you use pnpm, make the workspace explicit with pnpm-workspace.yaml:

packages:
 - web/*

Then pin the React family at the root in package.json:

{
 "pnpm": {
   "overrides": {
     "react": "19.0.4",
     "react-dom": "19.0.4",
     "@types/react": "19.0.4",
     "@types/react-dom": "19.0.4"
   }
 }
}

This is a clear reason why using a monorepo matters. The Medusa storefront, Next.js, and related tools all rely on React. In a multi-repo setup, versions can easily get out of sync. With a Watt monorepo, you set the version once at the root, and every app benefits right away.

This makes building more predictable and keeps maintenance costs much lower.

One .env, clear public and internal boundaries

The root .env needs a few shared values:

REDIS_HOST
MEDUSA_PUBLIC_BACKEND_URL
MEDUSA_BACKEND_URL

The key distinction is this:

MEDUSA_PUBLIC_BACKEND_URL is for the externally visible backend URL
MEDUSA_BACKEND_URL is for server-side calls from the frontend

On ICC, this is the ideal setup:

MEDUSA_PUBLIC_BACKEND_URL=https://medusa.plt/backend
MEDUSA_BACKEND_URL=http://backend.plt.local

Why it matters:

Browsers and the admin UI use the public backend URL.
The frontend server uses http://backend.plt.local and stays on the Platformatic mesh.

It’s worth emphasizing that second point, since it provides both great DevEx and a substantial performance boost. Thanks to Watt and inter-thread communication, server-side requests skip the public gateway and stay within the process’s internal network.

Once again, the monorepo helps here. The internal service name and public URL strategy are side-by-side in the same workspace, making them much harder to misconfigure.
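The split between the two URLs can be pictured as a tiny helper. This is only a sketch: `resolveBackendUrl` is a hypothetical function, not part of the Medusa starter, and how the public value reaches browser code (Next.js only exposes variables prefixed with NEXT_PUBLIC_ to the browser) is outside its scope.

```javascript
// Pick the backend URL for the current execution context.
// `env` is passed in explicitly so the function is easy to test.
function resolveBackendUrl (env, isServer) {
  const url = isServer ? env.MEDUSA_BACKEND_URL : env.MEDUSA_PUBLIC_BACKEND_URL
  if (!url) {
    throw new Error(`Missing ${isServer ? 'MEDUSA_BACKEND_URL' : 'MEDUSA_PUBLIC_BACKEND_URL'}`)
  }
  return url.replace(/\/$/, '') // drop a trailing slash, if any
}

const env = {
  MEDUSA_PUBLIC_BACKEND_URL: 'https://medusa.plt/backend',
  MEDUSA_BACKEND_URL: 'http://backend.plt.local'
}
console.log(resolveBackendUrl(env, true)) // http://backend.plt.local
console.log(resolveBackendUrl(env, false)) // https://medusa.plt/backend
```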

Backend: run Medusa as a Watt application

In web/backend/package.json, add @platformatic/node:

{
 "dependencies": {
   "@platformatic/node": "^3.44.0"
 }
}

Then configure web/backend/watt.json:

{
 "$schema": "https://schemas.platformatic.dev/@platformatic/node/3.44.0.json",
 "application": {
   "basePath": "/backend",
   "commands": {
     "development": "npm run dev",
     "build": "npm run build",
     "production": "npm run start"
   },
   "changeDirectoryBeforeExecution": false,
   "entrypointPort": 3000
 },
 "node": {
   "disableBuildInDevelopment": true,
   "dispatchViaHttp": true,
   "absoluteUrl": true
 },
 "watch": false
}

This setup gives Medusa a clear application boundary within the workspace, while still allowing the gateway to publish it under /backend.

The companion change in web/backend/medusa-config.ts is just as important:

import { defineConfig, loadEnv } from '@medusajs/framework/utils'

loadEnv(process.env.NODE_ENV || 'development', process.cwd())

module.exports = defineConfig({
 projectConfig: {
   databaseUrl: process.env.DATABASE_URL,
   http: {
     storeCors: process.env.STORE_CORS!,
     adminCors: process.env.ADMIN_CORS!,
     authCors: process.env.AUTH_CORS!,
     jwtSecret: process.env.JWT_SECRET || 'supersecret',
     cookieSecret: process.env.COOKIE_SECRET || 'supersecret'
   },
   cookieOptions: {
     sameSite: 'lax',
     secure: false
   }
 },
 admin: {
   path: (new URL(process.env.MEDUSA_PUBLIC_BACKEND_URL!).pathname + '/app') as `/${string}`,
   backendUrl: process.env.MEDUSA_PUBLIC_BACKEND_URL,
   vite: config => {
     config.server.allowedHosts ??= []
     config.server.allowedHosts.push('.plt.local')
   }
 }
})

The admin path comes from the public backend URL. So, if ICC publishes the backend at /backend, the admin will automatically be available at /backend/app.

You should also keep web/backend/url-handler.js in place. Medusa’s API and admin UI do not behave identically when you put them behind a prefixed public path, so Watt’s gateway uses this file to rewrite requests correctly.

The implementation used in the sample project looks like this:

const basePath = process.env.PLT_BASE_PATH ?? ''
const adminPath = new URL(process.env.MEDUSA_PUBLIC_BACKEND_URL).pathname.replace(/\/$/, '')
const adminUiPath = adminPath + '/app'
const adminMatcher = new RegExp(`^${adminPath}`)

export default {
 preRewrite(url) {
   if (basePath && !url.startsWith(basePath)) {
     url = `${basePath}${url}`
   }

   url = url.startsWith(adminUiPath) ? url : url.replace(adminMatcher, '')
   return url
 }
}

This file may be small, but it does important work. It keeps the admin UI path intact while removing the backend prefix for API routes that Medusa expects to serve from the root.
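To see the rewrite in action, here is the same logic wrapped in a factory so it can be exercised with explicit values instead of environment variables. The `createPreRewrite` wrapper is hypothetical and exists only for this illustration. The example assumes PLT_BASE_PATH is unset and the public backend URL is https://medusa.plt/backend.

```javascript
// Same rewrite rules as url-handler.js, with the environment passed in explicitly.
function createPreRewrite (basePath, publicBackendUrl) {
  const adminPath = new URL(publicBackendUrl).pathname.replace(/\/$/, '')
  const adminUiPath = adminPath + '/app'
  const adminMatcher = new RegExp(`^${adminPath}`)

  return function preRewrite (url) {
    if (basePath && !url.startsWith(basePath)) {
      url = `${basePath}${url}`
    }
    // Admin UI URLs are kept as-is; every other URL loses the backend prefix.
    return url.startsWith(adminUiPath) ? url : url.replace(adminMatcher, '')
  }
}

const preRewrite = createPreRewrite('', 'https://medusa.plt/backend')
console.log(preRewrite('/backend/store/products')) // /store/products
console.log(preRewrite('/backend/admin/orders')) // /admin/orders
console.log(preRewrite('/backend/app/orders')) // /backend/app/orders
```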

Frontend: one codebase, two runtime roles

In web/frontend/package.json, add @platformatic/next:

{
 "dependencies": {
   "@platformatic/next": "^3.44.0"
 }
}

The standard frontend config in web/frontend/watt.json is simple:

{
 "$schema": "https://schemas.platformatic.dev/@platformatic/next/3.44.0.json",
 "application": {
   "basePath": "{PLT_BASE_PATH}",
   "changeDirectoryBeforeExecution": true
 },
 "next": {
   "trailingSlash": true
 }
}

And in web/frontend/next.config.js, set:

const nextConfig = {
 reactStrictMode: true,
 logging: {
   fetches: {
     fullUrl: true
   }
 },
 eslint: {
   ignoreDuringBuilds: true
 },
 typescript: {
   ignoreBuildErrors: true
 }
}

Here’s where it gets interesting: the monorepo lets you reuse the same frontend codebase as a dedicated image optimization service, with almost no extra work.

Split image optimization without splitting the repo

We recently covered why this architecture matters in our post on scaling Next.js image optimization with a dedicated Platformatic application: image optimization is CPU-heavy and can become a noisy neighbour for SSR traffic.

That is exactly why this Medusa setup runs /_next/image separately.

Create web/frontend/watt.image-optimizer.json:

{
 "$schema": "https://schemas.platformatic.dev/@platformatic/next/3.44.0.json",
 "logger": {
   "level": "trace"
 },
 "application": {
   "basePath": "/",
   "changeDirectoryBeforeExecution": true
 },
 "next": {
   "trailingSlash": true,
   "imageOptimizer": {
     "enabled": true,
     "fallback": "frontend",
     "timeout": 30000,
     "ttl": 3600000,
     "maxAttempts": 3,
     "storage": {
       "type": "valkey",
       "url": "{REDIS_HOST}"
     }
   }
 }
}

This is a great example of why Watt monorepos work so well.

You reuse the same frontend app.
You keep one source tree.
You give it a second runtime role.
You isolate a CPU-heavy path without creating a second frontend project.

This setup improves both maintainability and performance, which is exactly what you want from your platform architecture.

The fallback: "frontend" setting is especially nice here: relative image URLs are resolved through the main storefront service over the runtime network, so the optimizer stays tightly integrated without being coupled to the frontend worker pool.

Next.js build-time pragmatism: force dynamic where it helps

Because the Medusa backend is not available during the wattpm build, the storefront cannot pre-generate some pages safely.

For these files:

web/frontend/src/app/[countryCode]/(main)/products/[handle]/page.tsx
web/frontend/src/app/[countryCode]/(main)/categories/[...category]/page.tsx
web/frontend/src/app/[countryCode]/(main)/collections/[handle]/page.tsx

comment out generateStaticParams and add:

export const dynamic = 'force-dynamic'

This uses Next.js Route Segment Config to force runtime rendering instead of static generation.

In a typical Next.js app, this might seem like a compromise. But in this setup, it’s the right choice. The storefront relies on live Medusa data, and Watt provides that backend at runtime.

This is another area where the monorepo helps. The build behaviour is clear because the backend and frontend are in the same workspace, and their dependencies are easy to see.

Gateway: one public surface for the whole stack

Add @platformatic/gateway in web/gateway/package.json:

{
 "dependencies": {
   "@platformatic/gateway": "^3.44.0"
 }
}

Then define web/gateway/watt.json like this:

{
 "$schema": "https://schemas.platformatic.dev/@platformatic/gateway/3.44.0.json",
 "gateway": {
   "applications": [
     {
       "id": "backend",
       "proxy": {
         "prefix": "/backend",
         "custom": {
           "path": "../backend/url-handler.js"
         }
       }
     },
     {
       "id": "frontend",
       "proxy": {
         "prefix": "/"
       }
     },
     {
       "id": "image-server",
       "proxy": {
         "prefix": "/",
         "routes": ["/_next/image", "/_next/image/*"],
         "methods": ["GET"]
       }
     }
   ]
 }
}

This is where the monorepo approach really starts to feel smooth and efficient.

/backend goes to Medusa
/ goes to the storefront
GET /_next/image goes to the image optimizer

Thanks to @platformatic/gateway, you get one public entry point, but the traffic still lands on the right internal application.

This setup is easier to understand, change, and scale than trying to connect separate services outside the repo.

A small middleware detail that improves the experience

There is another subtle optimization in the storefront middleware (web/frontend/src/middleware.ts).

When the request already contains a country code in the URL but does not yet have the _medusa_cache_id cookie, the middleware sets that cookie and returns NextResponse.next() instead of forcing another redirect.

It’s a small detail, but it’s the kind of optimization that’s easier to maintain in a monorepo, where storefront routing, Medusa region lookups, and platform-level caching (through Watt’s HTTP caching) are all managed together.

In practice, this helps the storefront set up its region-aware state smoothly, without extra steps.

The change is small enough to think of as a focused patch:

 if (urlHasCountryCode && !cacheIdCookie) {
+    const response = NextResponse.next()

   response.cookies.set('_medusa_cache_id', cacheId, {
     maxAge: 60 * 60 * 24
   })

   return response
 }

This is the kind of practical improvement that’s easier to maintain when routing logic, storefront behaviour, and platform deployment are all in the same repo.

ICC environment values

In .env.icc, the main settings to align are:

MEDUSA_PUBLIC_BACKEND_URL=https://medusa.plt/backend
STORE_CORS=https://docs.medusajs.com,https://medusa.plt
ADMIN_CORS=https://docs.medusajs.com,https://medusa.plt
AUTH_CORS=https://docs.medusajs.com,https://medusa.plt
NEXT_PUBLIC_BASE_URL=https://medusa.plt

They all reflect the same core rule: the whole application is published under /medusa, so both Medusa and Next.js need to agree on that public shape.

Since these settings are in one workspace and one deployment artifact, keeping them in sync is much easier than with a split-repo setup.

The Docker build is simple because the repo is simple

The container image is straightforward:

FROM node:22-alpine

# Environment setup
ENV APP_HOME=/home/app/node/
ENV PLT_BASE_PATH="/medusa"
ENV PLT_ICC_URL="http://icc.platformatic.svc.cluster.local"
WORKDIR $APP_HOME

# Install dependencies
RUN npm install -g pnpm wattpm-utils "@platformatic/watt-extra@latest"
COPY package.json pnpm-lock.yaml pnpm-workspace.yaml $APP_HOME
RUN pnpm install --frozen-lockfile --node-linker=hoisted

# Copy application
COPY web $APP_HOME/web
COPY .env.icc watt.json $APP_HOME
RUN mv .env.icc .env
RUN pnpm run build

# Final setup
EXPOSE 3042
EXPOSE 9090
CMD ["watt-extra", "start"]

There are two details worth mentioning.

First, using --node-linker=hoisted with pnpm installs dependencies in a flatter layout, instead of the usual symlink-heavy structure. In a workspace with Medusa, Next.js, shared React versions, and several Watt apps, this makes module resolution more predictable and helps avoid compatibility issues during container builds.

Second, @platformatic/watt-extra is a helper CLI that starts Watt smoothly in container environments like ICC. It adds the operational support you need at runtime, so your container entrypoint remains simple.

This is another area where the monorepo pays off right away: you have one install step, one build step, and one runtime command.

Why this feels better to maintain

The main advantage of this Medusa setup isn’t any single config file. It’s the overall structure:

One repo for backend, frontend, gateway, and optimizer
One dependency strategy
One place to define public and internal URLs
One deployment artifact for Kubernetes and ICC
One runtime that still preserves application boundaries

Since Watt sees the platform as a group of coordinated apps, you can make performance improvements without making the system harder to manage.

You can send image optimization to a dedicated service, keep frontend-to-backend calls on the mesh network, mount everything under a base path, and update all these rules in one place.

That’s the real value of running Medusa in a Watt monorepo on ICC: convenience and performance work together, instead of getting in each other’s way. Because ICC is built on Kubernetes, the monorepo and its services also benefit from Kubernetes scalability, resilience, and orchestration.

If you’re building commerce systems with lots of moving parts, this is the kind of platform setup you want.

Introducing Regina: Stateful AI Agent Orchestration for Platformatic Watt

Paolo Insogna — Tue, 14 Apr 2026 14:30:00 GMT

We’re excited to share Regina, a production-ready agent orchestration layer built on Platformatic Watt.

Regina lets you go from single-agent demos to real systems you can run and scale confidently. You define agents in Markdown, start instances over HTTP, and get built-in lifecycle management, persistence, and recovery, all by running and managing agents in Watt as isolated worker threads.

Why Regina, why now

Most AI projects hit the same wall after the first demo:

Prompts are not versioned in a clean, operational way.
Sessions disappear on restart.
Scaling introduces routing and state headaches.
Tool-heavy workflows are hard to observe and control.

Regina solves these problems directly, so your team can focus on building your product and not re-inventing the wheel when it comes to complex orchestration and state management.

What you get on day one

Regina comes as three packages:

@platformatic/regina: per-pod agent manager
@platformatic/regina-agent: per-agent runtime
@platformatic/regina-storage: pluggable backup adapters (fs, s3, redis)

With this stack, you’ll have:

stateful agent instances with per-instance SQLite VFS (“Virtual File System”)
suspend/resume lifecycle management with idle timeout control
NDJSON streaming events for full run visibility
steerable agentic loops via POST /instances/:id/steer
storage-backed restore for resilient multi-pod operation

Markdown-native agent definitions

Regina uses Markdown with YAML frontmatter as the main source for each agent.

---
name: support-agent
description: Customer support assistant
model: anthropic/claude-sonnet-4-5
provider: vercel-gateway
tools:
 - ./tools/search-docs.ts
temperature: 0.3
maxSteps: 10
---
You are a helpful support agent.

This setup keeps prompt and runtime configuration together, so it’s easy to review in pull requests and update across teams.
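
To make the file layout concrete, here is a minimal sketch of how a definition file splits into its two parts. `splitAgentDefinition` is a hypothetical helper, not part of Regina: it only separates the frontmatter text from the prompt body and leaves the actual YAML parsing to a real parser.

```javascript
// Hypothetical helper (not part of Regina): separates the YAML frontmatter
// block from the Markdown prompt that follows it.
function splitAgentDefinition (source) {
  const lines = source.split(/\r?\n/)
  if (lines[0].trim() !== '---') {
    throw new Error('Definition must start with a --- frontmatter delimiter')
  }

  // The first --- after line 0 closes the frontmatter block.
  const end = lines.findIndex((line, index) => index > 0 && line.trim() === '---')
  if (end === -1) {
    throw new Error('Missing closing --- frontmatter delimiter')
  }

  return {
    frontmatter: lines.slice(1, end).join('\n'),
    prompt: lines.slice(end + 1).join('\n').trim()
  }
}

const definition = [
  '---',
  'name: support-agent',
  'model: anthropic/claude-sonnet-4-5',
  'maxSteps: 10',
  '---',
  'You are a helpful support agent.'
].join('\n')

const { frontmatter, prompt } = splitAgentDefinition(definition)
console.log(frontmatter)
console.log(prompt)
```

Because both parts live in one file, a pull request that changes the prompt and the model settings shows up as a single diff.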

Built for real runtime behaviour

Regina keeps management and execution separate:

@platformatic/regina discovers definitions, spawns instances, and proxies instance APIs
@platformatic/regina-agent runs each instance in isolation
message history is persisted at /.session/messages.jsonl in each instance VFS

This design gives you reliable performance, even under heavy load:

Idle suspension to free resources automatically
Auto-resume on next request
State continuity across restarts
Rich streaming (text-delta, tool-call, tool-result, step-finish)

In practice, running your agents with Regina and Watt gives you agents that act like durable workflows backed by persistent state instead of ephemeral, one-off chat sessions.
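
For clients that consume the stream, the NDJSON format means one JSON object per line. The sketch below is a generic incremental NDJSON parser, not Regina client code. Only the event type names come from the list above; the other fields (`text`, `name`) are made up for illustration.

```javascript
// Incremental NDJSON parser: buffers partial lines across chunks and
// returns one parsed object per complete line.
function createNdjsonParser () {
  let buffer = ''

  return {
    push (chunk) {
      buffer += chunk
      const lines = buffer.split('\n')
      buffer = lines.pop() // the last element is an incomplete line (or '')
      return lines
        .filter(line => line.trim() !== '')
        .map(line => JSON.parse(line))
    },
    flush () {
      const rest = buffer.trim()
      buffer = ''
      return rest === '' ? [] : [JSON.parse(rest)]
    }
  }
}

// Hypothetical payloads, deliberately split in the middle of a line.
const chunks = [
  '{"type":"text-delta","text":"Hel',
  'lo"}\n{"type":"tool-call","name":"search-docs"}\n',
  '{"type":"step-finish"}'
]

const parser = createNdjsonParser()
const events = []
for (const chunk of chunks) {
  events.push(...parser.push(chunk))
}
events.push(...parser.flush())

console.log(events.map(event => event.type))
```

Buffering the trailing partial line matters because network chunks rarely align with line boundaries.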

Start simple and scale smoothly

Regina works great in single-pod mode with no Redis, no external storage, and minimal setup.

As your traffic grows, you can add Redis or Valkey for member and instance mapping, and add shared storage for state restore if needed. The API stays the same, so clients don’t need to change as your setup evolves.

Getting started

The Regina demo app is small on purpose, but it shows the full production pattern in a single repo:

watt.json at the root defines a single entrypoint service for Regina
services/regina/watt.json enables @platformatic/regina and points to the shared agents/ directory
Each file in agents/ is a full agent definition (prompt + model + provider + tools)
Custom tools sit alongside agents in agents/tools/*

Here’s how the demo app is set up.

Root watt.json:

{
 "$schema": "https://schemas.platformatic.dev/wattpm/3.50.0.json",
 "server": {
   "port": 3042
 },
 "management": true,
 "entrypoint": "regina",
 "services": [
   {
     "id": "regina",
     "path": "./services/regina",
     "management": {
       "operations": ["addApplications", "removeApplications", "getApplications", "getApplicationDetails", "inject"]
     }
   }
 ]
}

Service services/regina/watt.json:

{
 "$schema": "https://schemas.platformatic.dev/@platformatic/regina/0.1.0.json",
 "regina": {
   "agentsDir": "../../agents"
 }
}

Example agent definition (agents/assistant.md):

---
name: assistant
description: A general-purpose assistant with file and shell access
model: anthropic/claude-sonnet-4-5
provider: vercel-gateway
greeting: "Hi! I'm a general-purpose assistant. I can read and write files, run commands, and help with any task."
temperature: 0.3
maxSteps: 15
---
You are a helpful assistant. You can read, write, and edit files, run bash commands, and help with any task.

Here’s a typical flow in the demo:

Start Watt (wattpm start).
Create an instance from an agent definition (POST /agents/:defId/instances).
Chat with that instance (POST /instances/:instanceId/chat or /chat/stream).
Resume the same instance later with history already available.

This is important because it shows Regina’s core value from start to finish: agents are defined as code, run as managed instances, and keep their state across requests without extra orchestration work.
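
As a rough sketch, the create-then-chat part of that flow could be driven from a script like the one below. The endpoint paths come from the steps above. The request and response body shapes (`{ message }`, `{ id }`) are assumptions made for illustration, so check the Regina documentation for the real ones.

```javascript
// Sketch of the demo flow. Paths come from the steps above; the body shapes
// ({ id } in the create response, { message } in the chat request) are assumptions.
async function runDemoFlow ({ baseUrl, definitionId, message, fetchImpl = fetch }) {
  const post = async (path, body) => {
    const response = await fetchImpl(`${baseUrl}${path}`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(body ?? {})
    })
    if (!response.ok) {
      throw new Error(`POST ${path} failed with status ${response.status}`)
    }
    return response.json()
  }

  // Step 2: create an instance from an agent definition.
  const instance = await post(`/agents/${definitionId}/instances`)

  // Step 3: chat with that instance.
  const reply = await post(`/instances/${instance.id}/chat`, { message })

  return { instanceId: instance.id, reply }
}
```

With the demo running, a call such as `runDemoFlow({ baseUrl: 'http://localhost:3042', definitionId: 'assistant', message: 'Hello' })` would exercise both endpoints, assuming the definition ID matches the agent name. The `fetchImpl` parameter exists only so the function can be tested without a running server.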

Storage options for state backup

For multi-pod setups, configure regina.storage so you can restore on another pod.

Filesystem (fs)

{
 "module": "@platformatic/regina",
 "regina": {
   "storage": {
     "type": "fs",
     "basePath": "/mnt/shared/regina-state"
   }
 }
}

Object storage (s3)

{
 "module": "@platformatic/regina",
 "regina": {
   "storage": {
     "type": "s3",
     "bucket": "regina-state",
     "prefix": "backups/",
     "endpoint": "https://s3.amazonaws.com"
   }
 }
}

Redis (redis)

{
 "module": "@platformatic/regina",
 "regina": {
   "redis": "redis://valkey:6379",
   "storage": {
     "type": "redis"
   }
 }
}

All adapters use the same interface (put, get, delete, list, close), so you can switch backends without changing how your clients work.
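
To illustrate what a shared interface buys you, here is a hypothetical in-memory adapter with the same five method names. The method names come from the sentence above. Their arguments and return values are assumptions, and this class is not part of @platformatic/regina-storage.

```javascript
// Hypothetical in-memory adapter. Only the five method names come from the
// post; the signatures and return values are assumptions.
class MemoryStorageAdapter {
  #entries = new Map()

  async put (key, value) {
    this.#entries.set(key, value)
  }

  async get (key) {
    return this.#entries.get(key) ?? null
  }

  async delete (key) {
    return this.#entries.delete(key)
  }

  async list (prefix = '') {
    return [...this.#entries.keys()].filter(key => key.startsWith(prefix)).sort()
  }

  async close () {
    this.#entries.clear()
  }
}
```

Code written against these five methods does not care whether the state ends up on a shared filesystem, in an S3 bucket, or in Redis.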

Get started

Regina is built for teams shipping serious AI systems on Node.js. If you need agents that are reliable, observable, and stateful in production, Regina is ready for you.

@platformatic/kafka Now Supports Confluent Schema Registry

Paolo Insogna — Tue, 07 Apr 2026 14:30:00 GMT

If you run Kafka in production, you can’t skip schema evolution. Teams need clear data types, compatibility checks, and a safe way to update contracts without breaking consumers or downstream services.

Before now, using @platformatic/kafka with Confluent Schema Registry meant writing extra code to connect the pieces. With @platformatic/kafka v1.27.0, that’s no longer needed.

@platformatic/kafka now has built-in support for Confluent Schema Registry, including:

AVRO
Protocol Buffers
JSON Schema
Basic and Bearer authentication
Automatic schema fetch and caching
Integrated Producer and Consumer hooks

You get schema-aware messaging, and the project still focuses on being fast and predictable for Node.js Kafka clients.

Why This Matters

Most schema registry integrations add complexity where you don’t want it: in the message serialization and deserialization paths. Fetching remote schemas is asynchronous, but encoding and decoding should stay synchronous for speed and consistency.

Put simply, network I/O and cache coordination should happen before the main data processing, not during it. Keeping these steps separate helps maintain stable throughput and latency as traffic increases.

This release introduces a two-layer architecture to keep that separation clear:

Low-level hooks for async pre-processing:
- beforeSerialization
- beforeDeserialization
High-level registry API via ConfluentSchemaRegistry

In practice, this means schemas are fetched and cached before encode/decode happens, so your serializers and deserializers stay synchronous when messages are processed.

This gives application teams a simpler way to think about things: do the asynchronous prep first, then keep codec behavior predictable during main processing.

At a high level, the flow is:

Extract schema ID from message metadata (producer) or wire payload (consumer).
Resolve schema from local cache when available.
On cache miss, fetch asynchronously via beforeSerialization/beforeDeserialization hooks and cache the schema.
Run synchronous serialization/deserialization with the resolved schema.

In multi-instance deployments, that cache layer can be backed by Redis or Valkey, so workers share schema state across nodes while keeping encode/decode synchronous in the hot path.
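
On the consumer side, the first step of that flow relies on the Confluent wire format: a zero magic byte, a 4-byte big-endian schema ID, then the encoded payload. The sketch below shows the split between an asynchronous "ensure" step and a synchronous lookup. `createSchemaCache` and its `fetchSchema` argument are hypothetical names for illustration, not the @platformatic/kafka internals.

```javascript
const MAGIC_BYTE = 0

// Reads the schema ID from a Confluent wire-format payload:
// byte 0 is the magic byte, bytes 1-4 are the big-endian schema ID.
function extractSchemaId (payload) {
  if (payload.length < 5 || payload[0] !== MAGIC_BYTE) {
    throw new Error('Payload is not in Confluent wire format')
  }
  return payload.readInt32BE(1)
}

// Hypothetical cache wrapper: fetchSchema stands in for the remote registry call.
function createSchemaCache (fetchSchema) {
  const schemas = new Map()

  return {
    // Async step, run before decoding (roughly the job the before* hooks do).
    async ensure (id) {
      if (!schemas.has(id)) {
        schemas.set(id, await fetchSchema(id))
      }
    },
    // Sync step for the hot path; only valid after ensure() has resolved.
    get (id) {
      const schema = schemas.get(id)
      if (schema === undefined) {
        throw new Error(`Schema ${id} was not prefetched`)
      }
      return schema
    }
  }
}

const payload = Buffer.concat([
  Buffer.from([MAGIC_BYTE, 0, 0, 0, 11]),
  Buffer.from('{"total":129.99}')
])
const schemaId = extractSchemaId(payload)

const schemaCache = createSchemaCache(async id => ({ id, type: 'AVRO' }))
const ready = schemaCache.ensure(schemaId).then(() => schemaCache.get(schemaId))
ready.then(schema => console.log(schemaId, schema.type))
```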

What You Can Do Now

You can connect a registry directly to both the Producer and Consumer, letting @platformatic/kafka handle schema-aware serialization from start to finish.

This is especially helpful when several services publish and consume the same topics on different deployment cycles, since consistent schema handling is a must.

import { Consumer, Producer } from '@platformatic/kafka'
import { ConfluentSchemaRegistry } from '@platformatic/kafka/registries'

const registry = new ConfluentSchemaRegistry({
  url: 'http://localhost:8081'
})

const producer = new Producer({
  clientId: 'orders-producer',
  bootstrapBrokers: ['localhost:9092'],
  registry
})

const consumer = new Consumer({
  groupId: 'orders-consumers',
  clientId: 'orders-consumer',
  bootstrapBrokers: ['localhost:9092'],
  registry
})

When producing, pass schema IDs in message metadata:

await producer.send({
  messages: [
    {
      topic: 'orders',
      key: { orderId: 101 },
      value: { customerId: 'cust-44', total: 129.99 },
      metadata: {
        schemas: {
          key: 10,
          value: 11
        }
      }
    }
  ]
})

When consuming, payloads are automatically decoded with the cached schema. If a schema isn’t found, the registry fetches it before deserialization continues.

This makes it easy to move from custom codec code to a single registry integration in your client setup.

Authentication and Enterprise Scenarios

Schema Registry deployments are often protected. The new integration includes:

Basic auth (username + password)
Bearer token auth (token)
Dynamic credentials via providers

This makes it easier to connect to managed or secured registry instances without writing custom transport code. It also makes credential rotation simpler when you use providers.

If your setup uses short-lived credentials, provider functions let you refresh tokens and secrets without having to rebuild your producer or consumer logic.

Performance and Reliability Considerations

One main design goal was to avoid adding unnecessary overhead to message processing.

The implementation focuses on cache locality and step-by-step pre-processing:

Schema IDs are extracted from the wire format (or message metadata).
Unknown schemas are fetched once and cached.
Repeated schema IDs in a batch are resolved from the cache.
Encode/decode continues in synchronous paths.

This setup cuts down on unnecessary async work while still supporting remote schema registries safely. It also helps keep throughput and performance steady, as you’d expect from a Node.js client.
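
A simplified way to picture that batching behaviour: collect the distinct schema IDs in a batch, fetch only the ones the cache doesn't have, then decode everything synchronously. The function below is a sketch under those assumptions. `decodeBatch`, `fetchSchema`, and `decode` are illustrative names, not library APIs.

```javascript
// Two-phase batch processing: resolve unknown schema IDs asynchronously,
// once per distinct ID, then decode every message synchronously.
async function decodeBatch (messages, cache, fetchSchema, decode) {
  const uniqueIds = [...new Set(messages.map(message => message.schemaId))]
  const missingIds = uniqueIds.filter(id => !cache.has(id))

  // Phase 1 (async): one fetch per unknown schema ID.
  await Promise.all(missingIds.map(async id => {
    cache.set(id, await fetchSchema(id))
  }))

  // Phase 2 (sync): every lookup is now a cache hit.
  return messages.map(message => decode(cache.get(message.schemaId), message.payload))
}

const cache = new Map()
let fetchCount = 0
const fetchSchema = async id => {
  fetchCount++
  return { id, name: `schema-${id}` }
}
const decode = (schema, payload) => ({ schema: schema.name, value: JSON.parse(payload) })

const batch = [
  { schemaId: 11, payload: '{"total":129.99}' },
  { schemaId: 11, payload: '{"total":10}' },
  { schemaId: 10, payload: '{"orderId":101}' }
]

const decoded = decodeBatch(batch, cache, fetchSchema, decode)
decoded.then(results => console.log(results.length, fetchCount))
```

Here three messages trigger two fetches, and a second batch with the same IDs would trigger none.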

Operationally, this also makes failures easier to understand. Schema resolution errors happen during fetch or preparation, while codec errors are still linked to payload and schema compatibility.

Also Included in This Release

The v1.27.0 release also shipped quality improvements around consumer behaviour and protocol handling, with broad test coverage and new playground clients for:

AVRO
Protobuf
JSON Schema
Authenticated Schema Registry setups

The end result is a production-ready integration you can try out quickly, starting in local development and moving to secure production registries.

Experimental API Notice

ConfluentSchemaRegistry and its related hooks are currently experimental. They may change in minor or patch releases as we keep improving them based on real-world use and feedback.

If you plan to use this in production, make sure to pin your versions and check the release notes. We’ll keep refining the API based on feedback from real deployments.

If your team is rolling this out, here’s a practical way to start:

Start with one topic and one schema format (typically AVRO or JSON Schema).
Validate serialization/deserialization behaviour in staging with real payloads.
Expand topic coverage and introduce auth/credential providers as needed.

Getting Started

Install the package:

npm install @platformatic/kafka

For Protobuf support, also install:

npm install protobufjs

Next, follow the full integration guide in the documentation.

If you give it a try, we’d love to hear your feedback at hello@platformatic.dev. Real-world schema workflows will help shape the next version of this API and guide our priorities for future improvements.

Thanks for building with us! 🚀

React SSR Framework Showdown: TanStack Start, React Router, and Next.js Under Load

Matteo Collina — Tue, 17 Mar 2026 14:30:00 GMT

Performance benchmarks capture a moment, not a final judgment. Results depend on a specific workload, scale, and constraints; they do not rank frameworks by value. Next.js stands out for its widespread adoption, strong compatibility, and vast ecosystem trusted by millions. TanStack, as a newcomer, made bold architectural choices. React Router is positioned differently along the maturity curve. Each framework wins in its own context.

The numbers matter less than the response: every team addressed our shared data and delivered fixes. This collaboration with open data, shared flamegraphs, and upstream fixes makes Node.js a safe, long-term choice for enterprise teams.

We updated our benchmarks! View the new numbers here.

TL;DR

With help from Claude Code, we built the same eCommerce app in three SSR frameworks and tested them at 1,000 requests per second on AWS EKS. We ran each framework both on Watt and directly on Kubernetes.

The results revealed big performance differences and highlighted a few key themes:

Running Node services on Watt improves average latency.
The TanStack team is doing excellent work. Their framework outperformed the others we tested by a wide margin.
The Next.js team has made impressive performance improvements. Upgrading from v15 to v16 canary more than doubled throughput and reduced latency by six times. Their collaboration also led to a 75% speedup in React’s RSC deserialization, which benefits everyone using React.

Both the TanStack and Next.js teams used platformatic/flame to find and resolve critical performance bottlenecks that the benchmark uncovered (more on that below).

TanStack Start outperformed React Router by 25% in throughput and had 35% lower latency. Both frameworks achieved a 100% success rate, meaning every request got an HTTP 200 response within our 10-second timeout. This strict definition makes the comparison fair and matches real-world SLA expectations. Next.js struggled under our benchmark load, but upgrading from v15.5.5 to v16.2.0-canary.66 more than doubled its throughput (from 322 to 701 requests per second) and reduced average latency by six times.

To mirror common enterprise e-commerce scenarios, no caching was used in this test, since caching is often avoided there due to aggressive personalization and A/B testing. In many large-scale e-commerce deployments, personalization strategies ensure that individual user views have minimal overlap, often less than 5%, which means that cache hits provide minimal benefit compared to the invalidation overhead. This explicit trade-off reflects real-world scenarios, where companies choose to prioritize dynamic user experiences over the potential gains from caching.

Collaboration note: We shared benchmark data and flamegraphs (via platformatic/flame) with both the TanStack and Next.js teams. The TanStack team fixed a critical bottleneck, delivering a 252x improvement in response times. The Next.js team’s Tim Neutkens used our flamegraphs to identify a JSON.parse reviver overhead in React Server Components, resulting in a 75% speedup in RSC deserialization merged into React itself.

While we ran these benchmarks on a canary release of Next.js, all the improvements are part of Next.js 16.2.0, which is coming out very soon.

The Challenge: Apples-to-Apples Framework Comparison

Comparing SSR performance (or performance generally) across frameworks is notoriously tricky because teams tend to only write and deploy their apps to a single framework, so it’s rare to get a reasonable “like-for-like” comparison.

Luckily for us, we live in an era where writing code costs only as many tokens as your favorite LLM needs to generate it. So we made three (more-or-less) identical eCommerce sample applications with the help of our dear friend Claudio (feel free to check out the code for yourself here).

The Application: CardMarket

For these benchmarks, we built a trading card marketplace app, similar to a simpler version of TCGPlayer or CardMarket. The data model includes 5 games (Pokémon, Magic: The Gathering, Yu-Gi-Oh!, Digimon, and One Piece), 50 card sets (10 per game), 10,000 cards (200 per set), 100 sellers with ratings and locations, and 50,000 listings with prices, conditions, and quantities.

The app includes several types of pages and routes to create a realistic e-commerce experience, all generated by Claude Code:

The homepage shows featured games, trending cards, and new releases.
There’s a search page with full-text search, filtering, and pagination.
Game detail pages show info about each game and its sets, while set detail pages list cards with pagination.
Card detail pages display card info and seller listings.
The sellers’ list page shows all sellers with their ratings, and each seller has a profile and listings page.
There’s also a cart page with a static shopping cart.

We made several design choices to keep the implementations consistent:

All data comes from JSON files, and every framework uses the same data.
We added a random 1-5ms delay to simulate real database latency.
Every route uses full SSR with no client-side data fetching.
All versions share the same UI components, layouts, and Tailwind CSS styling.

The Frameworks

We implemented this application in three frameworks:

TanStack Start (v1.157.16) - The newest entrant, built on TanStack Router with Vite for SSR
React Router (v7) - The classic routing library, now with first-class SSR support
Next.js (v15, updated to v16 canary) - The established leader in React SSR

Each implementation uses the framework’s idiomatic patterns:

TanStack Start: createFileRoute with loader functions
React Router: Route modules with loader exports
Next.js: App Router with Server Components

The Runtimes

For each framework, we tested two runtime configurations:

Node.js - Single-threaded, 6 pods with 1 CPU allocated for each
Watt - Multi-worker with SO_REUSEPORT, 3 pods with 2 CPUs allocated, with 2 workers per pod to use those 6 CPUs to the fullest

All configurations received identical total CPU allocation (6 cores) for fair comparison.

Test Methodology

Infrastructure

EKS Cluster: 4 nodes running m5.2xlarge instances (8 vCPUs, 32GB RAM each)
Load Testing Instance: c7gn.2xlarge (8 vCPUs, 16GB RAM, network-optimized)
Region: us-west-2
Load Testing Tool: Grafana k6

Software Versions

All versions are locked in package.json for reproducible benchmarks:

Load Test Configuration

Each test followed this protocol:

NLB Warm-up: 60 seconds ramping from 10 to 500 req/s
Pre-test Warm-up: 20 seconds at moderate load
Cool-down: 60 seconds before the main test
Main Test: 60 seconds ramp-up to 1,000 req/s, then 120 seconds sustained
Between Tests: 480 seconds cooldown
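
For readers who want to reproduce the main test phase, it maps naturally onto k6's `ramping-arrival-rate` executor. The scenario object below is a sketch, not the benchmark's actual script: only the 60-second ramp to 1,000 req/s and the 120-second hold come from the protocol above, while `startRate` and the VU limits are assumptions.

```javascript
// Main-test phase expressed as a k6 "ramping-arrival-rate" scenario object.
// startRate, preAllocatedVUs, and maxVUs are assumptions, not benchmark values.
const mainTestScenario = {
  executor: 'ramping-arrival-rate',
  startRate: 0,
  timeUnit: '1s',
  preAllocatedVUs: 500,
  maxVUs: 2000,
  stages: [
    { target: 1000, duration: '60s' }, // ramp up to 1,000 req/s
    { target: 1000, duration: '120s' } // hold at 1,000 req/s
  ]
}

// Sums stage durations; assumes every duration is written in whole seconds.
function totalSeconds (stages) {
  return stages.reduce((sum, stage) => sum + parseInt(stage.duration, 10), 0)
}

console.log(totalSeconds(mainTestScenario.stages)) // 180
```

In a k6 script, an object like this would sit under `options.scenarios`. An arrival-rate executor keeps the request rate fixed even when responses slow down, which is what makes timeouts visible instead of hiding them behind a lower request rate.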

Realistic Traffic Distribution

The load test simulated realistic e-commerce traffic patterns:

Results

TanStack Start: The Performance Leader

After Update (v1.157.16)

TanStack Start delivered exceptional performance, with the highest throughput and lowest latency of all frameworks tested. With Watt, average response times stayed under 13ms even at 1,000 requests per second.

React Router: Solid and Reliable

React Router managed the load well and had zero failures. Using Watt made response times 38% faster compared to standalone Node.js.

Next.js: Struggling Under Load, but Making Progress

Initial Benchmark (Next.js 15.5.5, Watt 3.32.0)

Next.js couldn’t handle 1,000 requests per second. Response times averaged 8 to 11 seconds, and about 40% of requests failed. Even with Watt’s optimizations, Next.js lagged behind the lighter frameworks.

Updated Benchmark (Next.js 16.2.0-canary.66, Watt 3.39.0)

We re-ran the benchmarks after upgrading to the latest Next.js canary and Watt 3.39.0 to see if the situation had improved:

Next.js Version Improvement (Watt runtime)

Upgrading from Next.js 15.5.5 to 16.2.0-canary.66, along with Watt 3.39.0, brought a big improvement:

Throughput more than doubled
Average response times dropped by over six times
We saw an 83% reduction in latency.

The success rate only improved a little (about 36% of requests still failed), but the successful requests were served much faster, with the median response time dropping from seconds to 431ms.
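
As a quick sanity check, the headline figures are consistent with each other. The snippet below uses the throughput numbers quoted earlier in this post (322 and 701 requests per second) and treats "six times" as an exact factor of six, which is a simplification.

```javascript
// Throughput figures quoted earlier in the post.
const throughputBefore = 322 // req/s on Next.js 15.5.5
const throughputAfter = 701 // req/s on Next.js 16.2.0-canary.66

const throughputGain = throughputAfter / throughputBefore

// A sixfold drop in latency leaves 1/6 of the original, a reduction of 1 - 1/6.
const latencyReduction = 1 - 1 / 6

console.log(throughputGain.toFixed(2)) // "2.18"
console.log((latencyReduction * 100).toFixed(1)) // "83.3"
```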

This is real progress. Next.js is still the slowest of the three frameworks at this load, but the gap is closing, and more improvements are on the way.

Framework Collaborations: Benchmarks as a Catalyst

One of the best parts of this project was working directly with the framework teams. Sharing real-world benchmark data, especially flamegraphs that show where time is spent, helped turn abstract performance talks into real fixes. (If you are on a web performance team, we’d love to talk.)

The Next.js Collaboration: Fixing RSC Deserialization

After our initial Next.js benchmarks showed multi-second response times, we shared flamegraphs from our load tests with Tim Neutkens from the Next.js team. The flamegraphs revealed a clear hotspot: initializeModelChunk. This function calls JSON.parse with a reviver callback in React Server Components (RSC) chunk deserialization.

The root cause was a well-known V8 performance characteristic: JSON.parse is implemented in C++, and passing a reviver callback forces a C++ → JavaScript boundary crossing for every key-value pair in the parsed JSON. Even a trivial no-op reviver (k, v) => v makes JSON.parse roughly 4x slower than bare JSON.parse without one. Since initializeModelChunk is called for every RSC chunk during SSR, this overhead compounds rapidly on pages with many server components.

Tim identified the fix and submitted it directly to React: facebook/react#35776 (merged Feb 19, 2026). The change replaces the reviver callback with a two-step approach (plain JSON.parse() followed by a recursive tree walk in pure JavaScript), yielding a ~75% speedup in RSC chunk deserialization.

This fix helps every React framework that uses Server Components, not just Next.js. It shows how profiling with real workloads can reveal optimization opportunities that microbenchmarks might miss.
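
To show the shape of that change without reproducing React's code, here is a small sketch of both approaches applied to the same input. The `$date:` marker and the `transform` function are invented for this example and have nothing to do with the real RSC wire format. The point is only that a plain parse followed by a JavaScript walk can produce the same result as a reviver.

```javascript
// Hypothetical transform: turns "$date:<ISO string>" markers into Date objects.
const transform = value =>
  typeof value === 'string' && value.startsWith('$date:')
    ? new Date(value.slice('$date:'.length))
    : value

// Approach 1: reviver callback, invoked by the engine for every key-value pair.
function parseWithReviver (text) {
  return JSON.parse(text, (_key, value) => transform(value))
}

// Approach 2: plain JSON.parse, then a recursive walk in JavaScript.
function walk (value) {
  if (Array.isArray(value)) {
    for (let i = 0; i < value.length; i++) value[i] = walk(value[i])
    return value
  }
  if (value !== null && typeof value === 'object') {
    for (const key of Object.keys(value)) value[key] = walk(value[key])
    return value
  }
  return transform(value)
}

function parseThenWalk (text) {
  return walk(JSON.parse(text))
}

const sample = '{"id":1,"createdAt":"$date:2026-02-19T00:00:00.000Z","tags":["a","b"]}'
const viaReviver = parseWithReviver(sample)
const viaWalk = parseThenWalk(sample)

console.log(viaWalk.createdAt instanceof Date) // true
console.log(JSON.stringify(viaReviver) === JSON.stringify(viaWalk)) // true
```

Both functions return the same structure. The difference is where the per-value callback runs: inside the native parser in the first case, and in ordinary JavaScript after parsing in the second.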

The improvement is already reflected in our updated Next.js benchmarks (v16.2.0-canary.66), and we expect further gains as this optimization and others land in stable releases.

The TanStack Turnaround: A Case Study in Rapid Optimization

Interestingly enough, we had a similar journey with the TanStack team. Our initial benchmarks used TanStack Start v1.150.0, and the results were concerning: requests timing out, 75% success rates, and average response times exceeding 3 seconds. We shared these findings with the TanStack team, who quickly identified the critical bottlenecks (also via @platformatic/flame) in their SSR request handling pipeline.

Within 7 minor versions, they shipped a fix. We re-ran the benchmarks on v1.157.16, and the transformation was extraordinary:

The v1.150 numbers tell the story of a framework under distress. The p(95) latency hitting exactly 10,001ms wasn’t a coincidence, as the requests were slamming into our 10-second timeout limit. One in four requests failed entirely.

At 1,000 req/s, the framework was drowning.

After the fix, TanStack Start became the fastest framework in our benchmark. Response times dropped from seconds to milliseconds, the timeout cliff vanished, and every single request succeeded.

What makes this improvement even more notable is that it was runtime-agnostic. Both Watt and Node.js saw virtually identical gains: Watt improved from 3,228ms to 12.79ms average response time, while Node.js improved from 3,171ms to 13.73ms. This confirms that the bottleneck was purely in the framework’s code and that the fix benefited all users equally, regardless of their deployment strategy.

Runtime Comparison: Watt vs Node.js

Watt’s SO_REUSEPORT Advantage

Watt uses the Linux kernel’s SO_REUSEPORT option to let workers accept connections directly:

The kernel distributes the connection to the worker.
The worker processes the request.

No master coordination, no IPC overhead. The kernel handles load distribution efficiently.

When Does Watt Help Most?

Framework Rankings

With Watt Runtime

With Node.js Runtime

Reproducing These Benchmarks

The complete benchmark infrastructure is available at:

https://github.com/platformatic/k8s-watt-performance-demo/tree/ecommerce

To run the benchmarks:

# Benchmark TanStack Start
AWS_PROFILE= FRAMEWORK=tanstack ./benchmark.sh

# Benchmark React Router
AWS_PROFILE= FRAMEWORK=react-router ./benchmark.sh

# Benchmark Next.js
AWS_PROFILE= FRAMEWORK=next ./benchmark.sh

# Benchmark all frameworks
AWS_PROFILE= ./benchmark-all.sh

The script creates an ephemeral EKS cluster, deploys all three runtime configurations (Node, PM2, Watt), executes the load tests, and tears down the infrastructure automatically. The results for PM2 were omitted from the blog post because they align with previously reported findings (read 93% Faster Next.js in (your) Kubernetes).

Key Takeaways

Watt Provides Consistent Improvements
Watt improved performance for all frameworks compared to standalone Node.js. The gains ranged from 7% for TanStack to 38% for React Router. It’s a low-risk optimization that helps in every case.
TanStack Start is Production-Ready
Despite being the newest framework, TanStack Start delivered the best performance. The team’s rapid response to performance issues (a 252x improvement across 7 versions) demonstrates an active focus on development and optimization.
Keep Dependencies Updated
The results from TanStack and Next.js both show how important it is to keep your dependencies up to date. TanStack improved from 75% to 100% success in 7 versions. Next.js doubled its throughput between v15 and v16 canary. You only get these performance improvements if you update.
Framework Choice Matters More Than Runtime
The difference between TanStack Start and Next.js (3x throughput, 690x latency difference) far exceeds the difference between Watt and Node.js on the same framework. Choose your framework wisely.
Next.js Needs Caching
At 1,000 req/s, Next.js struggled. For high-volume SSR workloads, users should consider adopting aggressive cache strategies (ISR, edge caching, component caching). Next.js has great primitives for these, and you can use them in Watt. We did not implement any caching solution for Next.js because, in most e-commerce (or enterprise) scenarios, caching is a no-go: companies want to implement aggressive personalization strategies and A/B testing, running thousands of experiments in parallel. That said, the jump from v15 to v16 Canary shows meaningful improvement, and if this trajectory continues, the gap will keep closing.

If you want performance to be a key part of your technology choices, try setting clear latency budgets for each route before you start building or picking a framework. Setting concrete performance goals early helps guide decisions about architecture and tools, and makes sure your stack meets real-world needs. Planning for latency by route can also show when caching, framework choice, or runtime tweaks will have the biggest impact on user experience.
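
One lightweight way to act on this is to keep the budgets in code and check measured latencies against them. The sketch below uses a nearest-rank p95. The routes, budgets, and samples are all hypothetical values chosen for illustration.

```javascript
// Nearest-rank percentile over a list of latency samples (in ms).
function percentile (samples, p) {
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length))
  return sorted[rank - 1]
}

// Compares each route's measured p95 with its budget.
function checkBudgets (budgets, measured) {
  return Object.entries(budgets).map(([route, budgetMs]) => {
    const p95 = percentile(measured[route], 95)
    return { route, budgetMs, p95, withinBudget: p95 <= budgetMs }
  })
}

// Hypothetical budgets and samples.
const budgets = { '/': 200, '/search': 400 }
const measured = {
  '/': [12, 15, 18, 22, 250],
  '/search': [90, 110, 130, 150, 170]
}

const report = checkBudgets(budgets, measured)
console.log(report)
```

In this made-up data, the homepage misses its budget because of a single slow sample, which is exactly the kind of tail behaviour an average would hide.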

Conclusion

These benchmarks show there are big performance differences between SSR frameworks when running the same app under load:

TanStack Start emerged as the performance leader, handling 1,000 req/s with 13ms average latency.
React Router delivered reliable performance with zero failures.
Next.js struggled at this load, but improved a lot after upgrading to v16 canary. Throughput doubled and latency dropped by six times.

Beyond the numbers, this project showed that you can’t fix what you can’t see. We use platformatic/flame for our own internal performance testing, and sharing benchmark data with framework teams led to real improvements. The TanStack team’s 252x improvement in 7 versions, and the Next.js team’s work that led to a 75% speedup in React’s RSC deserialization, both show that open performance data helps the whole ecosystem, not just one framework or project.

For teams choosing an SSR framework, these results suggest:

High-throughput requirements: Consider TanStack Start or React Router
If you have an existing Next.js project, upgrade to the latest version for major performance gains. Use Watt to get the best throughput.
Runtime optimization: Watt provides consistent improvements across all frameworks

We’re actively looking to speak with web performance teams at the moment. If that’s you, please send me a DM on LinkedIn or Twitter, or email hello@platformatic.dev.

Scale Next.js Image Optimization with a Dedicated Platformatic Application

Paolo Insogna — Tue, 10 Mar 2026 14:46:21 GMT

Image optimization is a popular Next.js feature, but one that can quietly cause instability (in the form of latency spikes) for your frontend. Image resizing and encoding are very CPU- and memory-intensive, and that cost lands hardest when traffic is highest and users expect fast pages. During real launches, 95th percentile render times often rise from about 600ms to over 2 seconds when there are many image requests, even if the app code stays the same. If image processing shares workers with Server-Side Rendering (SSR), React Server Components (RSC), and API routes, a spike in image requests can slow down everything else, and all of a sudden, you’ve got a cascading failure on your hands.

That’s why teams often notice the same pattern during launches and campaigns: /_next/image traffic increases, CPU usage maxes out, render times get longer, and the whole frontend slows down even though the app logic hasn’t changed. In short, image optimization starts to interfere with your most important user flows.

Watt is our open-source Node.js application server that orchestrates frontend frameworks (Next.js, Astro, Remix) and backend services (Node.js, Fastify, Express, Hono, etc) into a single system, with built-in logging, tracing, and multithreading. It leverages the Linux kernel's SO_REUSEPORT to distribute connections across workers with zero coordination overhead. In our production benchmarks on AWS EKS, Watt delivered 93.6% faster median latency and a 99.8% success rate under a sustained load of 1,000 requests per second. After investigating component rendering, it was only a question of time before we looked into images.

By moving image optimization into its own Watt Application, you create a clear microservice boundary. The optimizer becomes a focused service in your setup, with an API that only exposes what’s needed for safe and efficient image delivery. This keeps media processing separate from your main frontend. You can then scale image capacity on its own, let rendering workers focus on rendering, and adjust retries, timeouts, and storage for media processing without having to over-provision your whole frontend.

@platformatic/next is the official Platformatic package for running Next.js inside a Watt Application. It’s fully maintained and supported by the Platformatic team, so you get long-term compatibility with Next.js updates, regular security patches, and best-practice defaults for production. Teams can count on ongoing updates and quick fixes, which lowers maintenance risk and avoids the downsides of custom or community-maintained solutions. The package now includes an Image Optimizer mode, letting you run /_next/image as a dedicated Watt Application, scale it separately, and keep your frontend fast even when image traffic increases.

This capability was introduced in PR #4605, and it builds on top of @platformatic/image-optimizer, our dedicated optimization engine. Our image optimizer is built on top of sharp, leveraging @platformatic/job-queue, which adds flexible storage, job deduplication with caching, and producer/consumer decoupling.
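
The job deduplication idea can be pictured with a small in-flight map: concurrent requests for the same key share one promise, so the expensive work runs once. This is a generic sketch of the technique, not the @platformatic/job-queue implementation, and `createInflightDeduper` is a hypothetical name.

```javascript
// Generic in-flight deduplication: concurrent calls with the same key share
// one promise. Nothing is cached; once the promise settles, the key is removed.
function createInflightDeduper (work) {
  const inflight = new Map()

  return function run (key) {
    let pending = inflight.get(key)
    if (pending === undefined) {
      pending = Promise.resolve()
        .then(() => work(key))
        .finally(() => inflight.delete(key))
      inflight.set(key, pending)
    }
    return pending
  }
}

let optimizations = 0
const optimizeImage = createInflightDeduper(async url => {
  optimizations++
  await new Promise(resolve => setTimeout(resolve, 10)) // stand-in for resizing
  return `optimized:${url}`
})

const concurrent = Promise.all([
  optimizeImage('/hero.png?w=640'),
  optimizeImage('/hero.png?w=640'),
  optimizeImage('/hero.png?w=640')
])

concurrent.then(results => console.log(results.length, optimizations))
```

Three concurrent requests produce three results from one optimization run. A later request for the same image starts a new run unless a cache sits in front of it.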

If you are self-hosting Next.js and want the same kind of operational separation that mature platforms use internally, this is the missing building block.

In short, you can keep using Next.js as you always have, but with a cleaner architecture that handles high traffic more efficiently.

Why split image optimization from your frontend?

If your frontend handles page rendering, API routes, and image resizing as a single service, any slowdown in one will cascade to the others. That means performance suffers most exactly when traffic is highest: during product launches, campaigns, or social media spikes.

And it goes without saying (although it’s a blog, so yes, we will say it anyway…) that page performance isn’t just a technical issue - even a 100 ms delay can lower conversion rates by up to 7%, making slowdowns expensive during launches and campaigns.

The reason comes down to architecture: resizing and re-encoding images is bursty, CPU-heavy, and often I/O bound, while SSR and API routes usually need lower latency and more consistent resources. Running both in one service means you have to use the same autoscaling and resource pool for two very different types of work.

Splitting these responsibilities and running them as worker threads using Watt eliminates this ‘noisy neighbour’ effect and lets you apply the right scaling strategy to each path: scale optimizer replicas (or threads) when media demand rises, and keep frontend replicas sized for rendering throughput and tail latency.

Platformatic’s dedicated image optimizer Watt Application gives you:

Independent scaling: add replicas for image workloads without scaling the whole frontend stack.
Operational isolation: image spikes do not starve SSR/RSC rendering.
Centralized controls: enforce width/quality validation, timeout, retry behaviour, and storage in one place.
Flexible queue storage: choose memory, filesystem, or Redis/Valkey depending on your topology.

This setup is especially useful for platform engineering and SRE teams who need predictable performance without over-provisioning the whole frontend. Clear ownership lets these teams align this approach with their KPIs for reliability, scalability, and cost efficiency.

What shipped in Platformatic Next

The new next.imageOptimizer configuration lets you turn on optimizer-only mode in @platformatic/next, so you can run a Watt Application focused just on image processing. In other words: flip one flag and route only /_next/image, making adoption fast and low-friction.

When enabled, the service:

Exposes only the Next.js image endpoint (/_next/image, respecting base path).
Validates image parameters using Next.js rules.
Resolves relative URLs through a fallback target (URL or runtime service name).
Fetches and optimizes images through a queue-backed pipeline; if the same image is requested by multiple users at the same time, it is processed only once.
Returns optimized image bytes and cache headers.
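
To make the "processed only once" behaviour concrete, here is a minimal TypeScript sketch of in-flight coalescing. It is not the @platformatic/job-queue API; the names optimizeOnce and variantKey, and the choice of URL, width, and quality as the key, are assumptions made for illustration.

```typescript
// Hypothetical helper: coalesce concurrent calls that share a key.
// This only illustrates the idea, it is not the real queue implementation.
const inFlight = new Map<string, Promise<Uint8Array>>();

function optimizeOnce(
  key: string,
  work: () => Promise<Uint8Array>
): Promise<Uint8Array> {
  const pending = inFlight.get(key);
  if (pending) return pending; // join the job that is already running

  const job = work().finally(() => {
    inFlight.delete(key); // later requests start a fresh job (or hit a cache)
  });
  inFlight.set(key, job);
  return job;
}

// Assumed key: the parameters that identify one optimized variant.
function variantKey(url: string, width: number, quality: number): string {
  return `${url}|w=${width}|q=${quality}`;
}
```

Every caller that arrives while the first job is still running receives the same promise, so the expensive resize runs a single time per variant.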

Under the hood, this relies on @platformatic/image-optimizer, which provides a robust processing pipeline with:

image type detection from magic bytes
optimization for jpeg, png, webp, and avif
animation-aware safeguards
URL fetch + optimize helpers
queue APIs powered by @platformatic/job-queue

The queue can keep its state in Redis/Valkey, so retries, workload distribution, and resilience remain consistent across multiple optimizer replicas.

The main idea is to keep frontend rendering and image optimization separate, while still using the usual Next.js image features.

What this means for teams

Frontend teams keep using next/image as usual, without rewriting application code.
Platform teams get explicit controls for retries, timeout budgets, and queue storage.
Ops teams can scale optimizer replicas independently from the frontend tier.
Product teams get a smoother user experience during peak traffic windows.

The result is a platform that feels (and… is) faster to end users and more controllable to engineering teams. In recent internal benchmarks, shifting image optimization to a dedicated Watt Application reduced 95th-percentile response times during peak traffic by up to 40%, turning previously unpredictable slowdowns into consistently fast delivery even under heavy load.

Choose the right runtime blueprint

The easiest setup is a three-application Watt setup:

gateway: Watt’s gateway service, which receives and routes incoming traffic.
frontend: your standard Next.js application.
optimizer: @platformatic/next running in Image Optimizer mode.

Watt’s Gateway sends only GET /_next/image requests to the optimizer, while everything else goes to the frontend. This gives you a clear separation without needing a complicated network setup.
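
The routing rule is small enough to express as a function. The sketch below is only a model of the decision, not Gateway's implementation; pickTarget and the basePath parameter are hypothetical names.

```typescript
type Target = "optimizer" | "frontend";

// basePath mirrors the Next.js basePath option; "" means no base path.
function pickTarget(method: string, rawUrl: string, basePath = ""): Target {
  // Parse against a dummy origin so the query string does not affect the match.
  const { pathname } = new URL(rawUrl, "http://gateway.invalid");
  const imagePath = `${basePath}/_next/image`;
  return method.toUpperCase() === "GET" && pathname === imagePath
    ? "optimizer"
    : "frontend";
}
```

Only the exact image endpoint with the GET method reaches the optimizer; any other method or path stays on the frontend.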

For relative image URLs (for example /hero.jpg), the optimizer fetches originals from frontend via runtime service discovery (http://frontend.plt.local). For absolute URLs, it fetches upstream directly.
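
As a rough illustration of that resolution step, the following sketch maps an incoming url parameter to the address the optimizer would fetch. The function names are hypothetical, and rejecting protocol-relative URLs is a precaution of this sketch rather than documented behaviour.

```typescript
function fallbackOrigin(fallback: string): string {
  // A fallback may be a full URL or a runtime service name such as "frontend".
  return /^https?:\/\//.test(fallback)
    ? fallback
    : `http://${fallback}.plt.local`;
}

function resolveImageSource(url: string, fallback: string): string {
  if (url.startsWith("//")) {
    // Assumption of this sketch: protocol-relative URLs are refused.
    throw new Error("protocol-relative URLs are not handled in this sketch");
  }
  if (/^https?:\/\//.test(url)) {
    return new URL(url).toString(); // absolute: fetched upstream directly
  }
  return new URL(url, fallbackOrigin(fallback)).toString(); // relative: via fallback
}
```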

If you are deploying on Kubernetes, your best bet is to configure your K8s ingress controller to route GET /_next/image to separate pods running the image optimizer. This configuration is supported and documented at https://docs.platformatic.dev/docs/guides/next-image-optimizer#10-kubernetes-ingress-example-nginx-ingress-controller.

How to set this up

Start by creating a Watt workspace with three applications: Gateway, frontend, and optimizer. The frontend remains your existing Next.js app; the optimizer is another @platformatic/next app with next.imageOptimizer.enabled: true; Gateway routes image traffic to the optimizer and everything else to the frontend.

Use this structure as a baseline:

my-runtime/
 watt.json
 web/
   gateway/
     platformatic.json
   frontend/
     platformatic.json
     package.json
     next.config.js
     app/
   optimizer/
     next.config.js
     platformatic.json
     package.json

Then configure it in this order:

Enable image optimizer mode in the optimizer Watt Application.
Set optimizer.next.imageOptimizer.fallback to frontend so relative image URLs are fetched from http://frontend.plt.local.
In Gateway, route only GET /_next/image to optimizer and keep all other routes on frontend.
Pick queue storage for your topology:
- memory for local/dev
- filesystem for single-node persistent disk
- Redis/Valkey for distributed replicas
Tune timeout and maxAttempts using your target SLO and expected image profile.

With this setup, app teams can keep using next/image as usual, while platform teams get independent scaling and more control over operations.

Configuration example

In your optimizer application config:

{
 "$schema": "https://schemas.platformatic.dev/@platformatic/next/3.38.1.json",
 "next": {
   "imageOptimizer": {
     "enabled": true,
     "fallback": "frontend",
     "timeout": 30000,
     "maxAttempts": 3,
     "storage": {
       "type": "valkey",
       "url": "redis://localhost:6379",
       "prefix": "next-image:"
     }
   }
 }
}

And in your Gateway config, route only the image endpoint:

{
 "$schema": "https://schemas.platformatic.dev/@platformatic/gateway/3.0.0.json",
 "gateway": {
   "applications": [
     {
       "id": "frontend",
       "proxy": {
         "prefix": "/",
         "routes": ["/*"]
       }
     },
     {
       "id": "optimizer",
       "proxy": {
         "prefix": "/",
         "routes": ["/_next/image"],
         "methods": ["GET"]
       }
     }
   ]
 }
}

Storage choices: what to use and when

memory: local development or simple single-instance setups.
filesystem: single-node deployment with persistent disk.
redis/valkey: distributed production environments with shared queue state.

If you do not specify storage, memory is used by default.

For production multi-instance deployments, Redis/Valkey is usually the best default because it gives shared queue state and predictable behaviour across replicas.

Failure handling and reliability

Optimization runs through a queue with explicit timeout and retry controls:

timeout sets the fetch/optimization budget per job.
maxAttempts controls the automatic retry count.

When retries are exhausted, the service returns a 502 Bad Gateway response, keeping failure behaviour explicit, observable, and easier to alert on.
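
A minimal sketch of how a timeout budget and an attempt limit can combine into that 502 response is shown here. It assumes maxAttempts counts total attempts and that the timeout applies to each attempt; the real optimizer may count differently, and runJob and withTimeout are hypothetical names.

```typescript
interface JobOptions { timeout: number; maxAttempts: number }
interface JobResult { status: number; body?: Uint8Array; attempts: number }

// Reject if the work does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${ms} ms`)),
      ms
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); }
    );
  });
}

async function runJob(
  optimize: () => Promise<Uint8Array>,
  { timeout, maxAttempts }: JobOptions
): Promise<JobResult> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const body = await withTimeout(optimize(), timeout);
      return { status: 200, body, attempts: attempt };
    } catch {
      // Failed or timed out: retry until the attempt budget is used up.
    }
  }
  return { status: 502, attempts: maxAttempts }; // retries exhausted
}
```

Because the failure path always ends in one explicit status, an alert on 502 responses from the optimizer maps directly to exhausted retries.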

Try it today

If you are self-hosting Next.js and want predictable image performance under load, this capability gives you a practical path that does not require re-architecting your app:

keep your frontend app unchanged,
stand up a dedicated optimizer Watt Application,
route only /_next/image through Watt’s Gateway service,
pick the storage backend that matches your deployment model.

This is a small architectural change with a big benefit: better frontend stability, simpler operations, and image performance that scales when you need it.

If you want to deliver faster and more reliable user experiences as your traffic grows, dedicated image optimization is one of the best upgrades you can make with minimal disruption.

We brought Skew Protection to your Kubernetes

Marco Piraccini — Thu, 05 Mar 2026 15:00:00 GMT

We're excited to share a new experimental feature for Platformatic: Skew Protection in the Intelligent Command Center (ICC). This brings Vercel-style deployment safety to Kubernetes, letting you deploy without downtime and avoid version-mismatch problems.

You can think of this as akin to Vercel’s Skew Protection functionality, but running right in your existing Kubernetes setup: no migration or changes to your CI/CD pipeline or security policies needed, just out-of-the-box version pinning for your frontend applications.

The Problem: Version Skew in Kubernetes

When you update a web application, users who loaded the old frontend might send requests to the new backend. This is called “version skew,” and it can cause problems if APIs, assets, or data schemas have changed. For example, if you rename a form field, old clients might still send data using the old field name.

This problem matters even more for modern frontend apps, where the same codebase runs on both the client and server. Frameworks like Next.js, Remix, and monorepos often share TypeScript types, API definitions, or business logic between frontend and backend. If these shared parts change between versions, it can cause serious issues:

Hydration errors and broken UI: React Server Components tightly couple client and server in a single deployment; when a new version goes live, the server produces updated RSC payloads that older client bundles still in users' browsers cannot reconcile, causing hydration errors and broken UI.
API contract violations: OpenAPI or protobuf definitions change between versions, leading to serialization/deserialization failures.
Type discrepancies: Shared TypeScript interfaces or zod schemas break when frontend and backend versions diverge, causing runtime errors.
Codependent features: Frontend components that rely on backend-specific functionality fail when that functionality changes or is removed.

The implications for your users are fairly straightforward: some might see API errors, missing fields, or broken features if their client and server versions don’t match; others might see data loss or corruption when schemas change across app versions. All this ultimately puts a load on support teams, who often need to coordinate across multiple feature teams to effectively untangle and ultimately resolve these issues.

Outside of the obvious impact on users (and revenue), k8s version skew is another example of how distributed systems, if not operated with the proper guardrails, actually impede developer velocity. In a world that is increasingly reliant on using AI to write code, the bottleneck is no longer the ability to write lines of code (if it ever was), but what happens between when your code is written and when it actually gets to production.

Version Skew in Kubernetes is a perfect example of such a problem - you have teams that are capable of shipping much faster, but without the right guardrails, the entire system actually moves slower and fails more often: fear of committing breaking changes leads to larger, less-frequent deployments that carry more risk and slow down your time-to-market.

The Solution: ICC Skew Protection

Platformatic’s new skew protection feature, built into the Intelligent Command Center, makes sure users stay on the version they started their session with, even when new versions are deployed. If a user starts a session on version N, all their requests during that session go to version N.

How It Works

Skew protection uses the Kubernetes Gateway API for version-aware routing, with ICC acting as the control plane. Each application version runs as a separate, immutable Kubernetes Deployment that users create themselves using standard Kubernetes workflows.

When applications run, ICC automatically detects new versions via label-based discovery and manages routing rules. ICC creates and maintains HTTPRoute resources that route requests based on session cookies, using a __plt_dpl cookie to pin users to their deployment version.

When a new version is deployed, the previous version transitions to “draining” mode: existing sessions continue to work, while new sessions go to the active version. ICC monitors traffic activity and automatically cleans up old versions after configured grace periods.

Key Platformatic Components

Platformatic Watt is the Node.js application server that runs your application as a worker thread inside of Kubernetes. This allows for improved performance, resiliency, and compute efficiency, as well as providing out-of-the-box features such as hot reloading, health checks, and metrics collection.

watt-extra is an extension layer that sits on top of Platformatic Watt and serves as the bridge between your application and ICC. On startup, watt-extra connects to ICC and registers the application with its metadata (pod ID, app name, version). This registration enables ICC to:

Discover the application’s Kubernetes labels (app.kubernetes.io/name, plt.dev/version)
Manage autoscaling using real-time, Node.js-specific metrics
Implement version-aware routing for skew protection
Monitor health and performance

System Architecture

The skew protection system consists of four layers. Each application version is a completely separate K8s Deployment, and the Kubernetes Gateway API handles routing at the ingress level based on HTTPRoute rules managed by ICC.

Component Breakdown

Client Layer

Browser Session A (cookie: __plt_dpl=dep-v42): A user who started their session on version 42. The __plt_dpl cookie pins their requests to that version, making sure the requests are routed to the correct backend even after newer versions are deployed.
Browser Session B (cookie: __plt_dpl=dep-v43): A user who started their session on version 43. Their requests are routed to the active version based on their cookie.
New Visitor (no deployment cookie): A first-time user or someone without a version cookie. Their first request is routed to the current active version, and they receive a cookie that pins them to that version.

Gateway API Layer

GatewayClass: Defines a template or class of gateways (e.g., Envoy Gateway, Contour, or Cilium) that can process Gateway API resources. Each cluster operator configures this with their preferred controller.
Gateway Resource: The actual gateway instance that listens on HTTP/HTTPS ports and processes incoming traffic. It contains listener configurations for TLS termination and routing.
HTTPRoute: Managed by ICC, this is the key routing rule that implements version-aware routing. It contains multiple rules: cookie-based matches for draining versions and a default rule that sets a cookie for new visitors and routes to the active version.
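
The matching logic that the HTTPRoute encodes can be modelled in a few lines of TypeScript. This is a conceptual sketch of the rule order described above, not the resource ICC generates; the RouteTable shape and the route function are hypothetical, while the cookie name and the dep-v42 style values come from the examples in this post.

```typescript
interface RouteRule { cookieValue: string; service: string }
interface RouteTable {
  rules: RouteRule[];  // cookie matches for draining versions
  active: RouteRule;   // default rule: the active version
}
interface Decision { service: string; setCookie?: string }

const COOKIE_NAME = "__plt_dpl";

function readCookie(header: string | undefined, name: string): string | undefined {
  if (!header) return undefined;
  for (const part of header.split(";")) {
    const [key, ...rest] = part.trim().split("=");
    if (key === name) return rest.join("=");
  }
  return undefined;
}

function route(cookieHeader: string | undefined, table: RouteTable): Decision {
  const pinned = readCookie(cookieHeader, COOKIE_NAME);
  const match = table.rules.find((rule) => rule.cookieValue === pinned);
  if (match) return { service: match.service }; // keep the session on its version

  // Default rule: no cookie, or no draining rule for it. Pin to the active version.
  return {
    service: table.active.service,
    setCookie: `${COOKIE_NAME}=${table.active.cookieValue}`,
  };
}
```

With a table holding a draining rule for dep-v42 and an active rule for dep-v43, a request carrying __plt_dpl=dep-v42 stays on myapp-v42, while a new visitor lands on myapp-v43 and receives the pinning cookie.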

ICC (Intelligent Command Center) - Namespace: platformatic

Control Plane Service: The core component responsible for version detection, HTTPRoute management, and lifecycle decisions. When watt-extra registers a new pod, the control plane discovers the application name and version. It holds the version registry and creates/updates/deletes HTTPRoute resources as needed.
PostgreSQL: Stores the persistent state for skew protection, including the version registry with full metadata about each deployment (version string, timestamps, K8s resources), deployment history for audit trails, and per-application skew protection policies.

App Versions - Namespace: myapp

Deployment: myapp-v42 (draining): A Kubernetes Deployment for the previous version (42) that is being phased out. It has its own Service and pods running Watt with watt-extra. Traffic only routes here for users whose cookies match this version.
Deployment: myapp-v43 (active): The current active version deployment. It has multiple replicas for high availability. New visitors and users without matching cookies are routed here. ICC’s autoscaler works across all deployed versions, provisioning the correct amount of resources for each version based on actual traffic.
Service: Each version has its own Kubernetes Service that selects pods with the corresponding plt.dev/version label. These Services are referenced by the HTTPRoute’s backendRefs.
Pods (Watt + watt-extra): Each pod runs the application container (Platformatic Watt runtime) plus watt-extra. watt-extra is the ICC agent that connects to ICC on startup and registers the pod. It sends the pod ID, and ICC discovers the version and deployment metadata through Kubernetes APIs. watt-extra also reports metrics to ICC for autoscaling and health monitoring.

Observability Layer

Prometheus: Collects metrics from all pods and services. ICC queries Prometheus to monitor traffic patterns for each version, track request rates for draining versions, and use that data to determine when versions should be transitioned to Expired status (meaning services that received no traffic for the pre-configured grace period).

How It All Works Together

When a new application version is deployed:

You deploy a new version of your app with the same app.kubernetes.io/name label and a new plt.dev/version label.
watt-extra registers the new pods with ICC, which detects the new version from the labels.
ICC makes the new version Active and moves the previous one to Draining. It updates the Gateway routing rules so that new sessions go to the active version, while existing sessions with a version cookie keep going to the draining version.
ICC monitors traffic on draining versions. Once there is no traffic, or the grace period elapses, ICC expires the old version — removing its routing rules and scaling it to zero, and optionally deleting the old Deployment and Service.

The Deployment Lifecycle in Detail

When managing multiple versions, skew protection uses a well-defined state machine to guarantee flawless transitions:

Active → The current version serving new sessions. Exactly one version per application is Active at a time. The HTTPRoute’s default rule points to the Active version’s Service, and new visitors receive a cookie pinning them to this version.
Draining → When a newer version is detected and becomes Active, the previous version transitions to Draining. No new sessions are assigned to it, but existing sessions with version-pinning cookies continue to be served. ICC monitors traffic activity for draining versions to determine when they can be safely retired.
Expired → A version transitions to Expired when it has zero traffic over the traffic window (default: 30 minutes) or when the grace period elapses (default: 24 hours), whichever comes first. ICC then removes the version’s matching rules from the HTTPRoute, scales the Deployment to zero replicas via the autoscaler, and optionally deletes the Deployment and Service (if auto-cleanup is enabled).

The ICC uses Version Labels to determine state. Version labels are opaque strings and can be numbers, semver, git SHAs, or any identifier that fits your workflow. ICC does not parse or compare them; it just treats the most recently detected version as Active.
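
The lifecycle can be sketched as a small state machine. The code below is an illustrative model using the default windows quoted above; the Version shape, registerVersion, and tick are hypothetical names, and ICC's real bookkeeping (stored in PostgreSQL and driven by Prometheus data) is richer than this.

```typescript
type State = "Active" | "Draining" | "Expired";

interface Version {
  label: string;          // opaque: "43", "1.4.0", a git SHA...
  state: State;
  drainingSince?: number; // epoch ms
  lastRequestAt?: number; // epoch ms
}

const TRAFFIC_WINDOW_MS = 30 * 60 * 1000;    // default: 30 minutes
const GRACE_PERIOD_MS = 24 * 60 * 60 * 1000; // default: 24 hours

// A newly detected label becomes Active; the previous Active one starts draining.
// Labels are never compared, only checked for "already known".
function registerVersion(versions: Version[], label: string, now: number): Version[] {
  if (versions.some((v) => v.label === label)) return versions;
  const updated = versions.map((v) =>
    v.state === "Active"
      ? { ...v, state: "Draining" as State, drainingSince: now }
      : v
  );
  return [...updated, { label, state: "Active" }];
}

// Draining versions expire on idle traffic or on the grace period, whichever is first.
function tick(versions: Version[], now: number): Version[] {
  return versions.map((v) => {
    if (v.state !== "Draining" || v.drainingSince === undefined) return v;
    const lastSeen = v.lastRequestAt ?? v.drainingSince;
    const idle = now - lastSeen >= TRAFFIC_WINDOW_MS;
    const graceOver = now - v.drainingSince >= GRACE_PERIOD_MS;
    return idle || graceOver ? { ...v, state: "Expired" as State } : v;
  });
}
```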

How users deploy a new version:

Build a new container image with the updated application code (e.g., myapp:v43)
Create a new K8s Deployment and Service with:
- Same app.kubernetes.io/name label (e.g., myapp) — this tells ICC it’s the same application
- New plt.dev/version label (e.g., 43) — this tells ICC it’s a new version
- New Deployment name (e.g., myapp-v43) and matching Service name
Apply the manifest: kubectl apply -f myapp-v43.yaml
ICC automatically detects the new version when pods start and watt-extra registers with ICC. The new version becomes Active, and the previous version begins draining.

Getting Started with ICC

Platformatic’s skew protection is built into the Intelligent Command Center (ICC), a complete control plane for managing Node.js applications or agents running in Kubernetes, with autoscaling, monitoring, and version-aware routing.

To get started with ICC:

Install ICC on your Kubernetes cluster. Follow our Installation Guide for step-by-step instructions, covering infrastructure requirements (Kubernetes, PostgreSQL, Valkey, Prometheus) and installation options.
Deploy your first application using the standard ICC workflow:
- Add @platformatic/watt-extra to your app
- Set PLT_ICC_URL so your app can register with ICC
- Deploy with kubectl apply or your existing CI/CD pipeline
Enable Skew Protection:
- Enable PLT_FEATURE_SKEW_PROTECTION
- Ensure Gateway API CRDs are installed (Kubernetes 1.27+)
- Deploy a Gateway API-compatible controller (Envoy Gateway, Contour, Cilium, Traefik, NGINX Gateway Fabric, or Kong). See the Compatible Gateways section in the ICC documentation.
- Configure deployment labels:

labels:
  app.kubernetes.io/name: myapp
  plt.dev/version: "43"
  # Optional: custom path prefix (default: /myapp)
  # plt.dev/path: "/api/leads"
  # Optional: hostname for HTTPRoute
  # plt.dev/hostname: "myapp.example.com"

Bring Vercel-Grade Deployment Safety to Your Kubernetes Environment

Platformatic’s skew protection is now available in ICC. It provides zero-downtime deployments and version-aware routing that keep each user session consistent.

If your team wants to try it in a real enterprise setup, send a message to Luca Maraschi or Matteo Collina via DMs on LinkedIn, or contact info@platformatic.dev.

Building an Auditable AI Gateway with Platformatic Watt

Paolo Insogna — Wed, 04 Mar 2026 15:00:00 GMT

Every engineering team that adopts AI quickly hits the same wall: a simple provider integration that worked for a demo turns into an operational bottleneck at scale. Tracking usage, containing costs, and keeping an audit trail across growing models and teams can slip out of reach fast. AI features are moving fast, but production teams still need the same thing they have always needed: not just control, but auditability.

That is exactly what ai-gateway-auditable delivers: an OpenAI-compatible gateway built with Platformatic Watt that combines provider routing, fallback resiliency, and durable audit logging to S3.

For production teams, this translates directly into risk reduction and regulatory readiness: your audit trail is always preserved, and resilient routing keeps incidents contained. In real terms, this leads to fewer lost logs or broken provider integrations (and fewer 3 a.m. pages as a result), and reliable evidence when you need to answer compliance or security reviews.

This architecture is not only production-ready, but already operating at scale for one of our early adopters. One application (proxy) serves traffic, while another (audit worker) persists audits, and a durable queue between them keeps latency low while preserving records, using the filesystem to provide durability. This same early adopter halved its application latency using this pattern with Watt. With clear audit trails and resilient traffic handling, they were able to trace errors quickly and keep their on-call load under control, while giving their LLM-enabled end-users performance that approached parity with direct API calls, which was critical for serving their real-time use cases.

Source code: github.com/platformatic/ai-gateway-auditable

Why this matters now

The direct integration pattern is usually the first stop for teams, but it often leads to audit-trace gaps. Finance needs clean attribution by key or team, security needs auditable traces of model interactions, and product needs stronger uptime when upstream providers degrade.

As a real-world example, our same early adopter saw this with their initial production rollout, which missed up to 15% of request logs during peak volume and saw request latency spike by more than 2x when provider response times flared. At the same time, you want a single, stable integration surface instead of scattering provider-specific logic across multiple services. An AI gateway is where all your needs converge into a single, manageable control point.

With ai-gateway-auditable, every request has a clear path, every response is traceable, and fallback behavior is visible instead of opaque.

Why Watt

Platformatic Watt is well-suited to this pattern because it lets us run the API-facing proxy and the audit worker as separate applications with a shared operational model, using them as worker threads. That separation is the foundation of reliability here: the proxy can stay focused on low-latency responses, while the worker can focus on durable queue consumption, batching, and S3 shipping.

Most importantly, this design is tolerant of worker crashes. Watt supervises applications (worker threads), so if an audit worker crashes, it is automatically restarted, and unhealthy workers are automatically replaced. During that window, the proxy can keep accepting requests and persisting audit jobs in FileStorage. When the replacement worker is up, it resumes consuming from the same queue path and drains pending jobs.

The result is graceful degradation rather than data loss: temporary worker failures increase audit lag but do not break the request path or discard audit events. This distinction is critical from a business perspective. Losing audit data can put regulatory compliance at risk and expose the company to possible fines or a loss of trust, while a short delay in audit processing only postpones analysis or reporting. In other words, our design trades brief insight delays for the certainty that no evidence is lost.

Why filesystem-based storage

We use filesystem-backed queue storage on purpose. Writing audit jobs to local disk is crash-tolerant because queued data survives process failures and restarts, unlike in-memory buffers.

It also keeps resource usage and request-path performance under control. We do not need to retain full audit payloads in memory while waiting for remote writes, and we do not put every request on the critical path of an external storage service. That removes network latency and remote availability as immediate blockers to request handling, while still providing durable buffering before batches are shipped to S3.

Architecture at a glance

The system runs as two applications (threads) inside of Platformatic Watt, the Node.js application server.

The proxy is optimized for low-latency request/response flow, while the audit-worker is optimized for durability, retries, and batch shipping. Keeping these concerns separate avoids a common failure mode: heavy audit I/O slowing down user-facing traffic.

How do the two applications communicate? Through the same FileStorage queue path on disk. proxy writes audit jobs to ./data/queue at the speed of local disk operations, and audit-worker consumes those jobs independently in the background. This gives you explicit producer/consumer decoupling: the request path does not wait for S3 uploads, retries, or batch rotation. If the worker restarts, queued jobs remain on disk and are resumed when it comes back. If S3 is slow or temporarily unavailable, jobs continue to accumulate durably in the queue instead of being lost or pushing latency back to callers.

In other words, even when storage is under pressure or S3 is temporarily unavailable, the gateway can keep serving requests while the audit pipeline catches up safely in the background.
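
To show why a directory on disk is enough to decouple the two applications, here is a deliberately simplified sketch of a file-per-job queue. It is not how FileStorage in @platformatic/job-queue is implemented; enqueue, drain, and the one-JSON-file-per-job layout are assumptions made for illustration.

```typescript
import { mkdtempSync, readdirSync, readFileSync, unlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Producer side (proxy): persist the job and return immediately.
function enqueue(queueDir: string, id: string, payload: unknown): boolean {
  try {
    // "wx" fails if the file exists, so a repeated job ID is not written twice.
    writeFileSync(join(queueDir, `${id}.json`), JSON.stringify(payload), { flag: "wx" });
    return true;
  } catch (error) {
    if ((error as { code?: string }).code === "EEXIST") return false;
    throw error;
  }
}

// Consumer side (audit-worker): process whatever is on disk, in file-name order.
function drain(queueDir: string, handle: (payload: unknown) => void): number {
  let processed = 0;
  for (const name of readdirSync(queueDir).sort()) {
    const file = join(queueDir, name);
    handle(JSON.parse(readFileSync(file, "utf8")));
    unlinkSync(file); // remove only after the handler succeeded
    processed++;
  }
  return processed;
}

// Usage with a throwaway directory standing in for ./data/queue.
const queueDir = mkdtempSync(join(tmpdir(), "audit-queue-"));
enqueue(queueDir, "req-1", { model: "gpt-4o" });
enqueue(queueDir, "req-1", { model: "gpt-4o" }); // duplicate ID: ignored
const drained = drain(queueDir, (payload) => console.log("shipping", payload));
```

If the consumer crashes between reading a file and deleting it, the job is still on disk for the next run, which is the property the audit pipeline relies on.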

What the gateway gives you

At a product level, this gateway provides four strong guarantees:

OpenAI Chat Completions-compatible endpoint (/v1/chat/completions) for clients and SDKs.
Model-based routing with fallback across providers.
Complete request/response audit records for every successful exchange.
Durable archival to S3 with batched JSONL files partitioned by time (JSON Lines is a text file format where each line is a valid, independent JSON object, separated by newline characters).

This means reduced provider lock-in, minimized operational risks, and heightened observability.

Service responsibilities

The key behavior is role decoupling: proxy only produces queue jobs, while audit-worker handles all downstream storage and shipping work.

proxy (external entrypoint)

proxy exposes:

GET /health
POST /v1/chat/completions

For each request, it:

Selects a provider chain based on model routing rules.
Executes upstream calls with fallback on retryable failures.
Returns the upstream response to the client.
Enqueues an audit payload into the shared durable queue.

audit-worker (internal service)

audit-worker is an internal Node application with no HTTP API (hasServer = false).

It owns the full audit persistence path:

queue consumption with @platformatic/job-queue
durable local buffering with FileStorage
batched JSONL writing
S3 uploads signed with AWS SigV4.

Queue settings used in the current implementation:

concurrency: 1
maxRetries: 3
resultTTL: 60_000
visibilityTimeout: 30_000

This is optimized for predictable sequential writes and safe retry semantics. Filesystem queue storage is chosen because it needs no external setup (no Redis/Valkey), making local development and single-node production rollouts much simpler. At the same time, it still provides crash resilience: queue state is persisted to disk, so in-flight and pending audit jobs survive process restarts.

That combination is the key trade-off here: you gain operational simplicity and zero external dependencies, while queued audit jobs still survive process crashes and restarts. Note that local filesystem storage does not remove every risk of data loss: if the disk or the node itself is lost before a batch reaches S3, the jobs queued there are lost with it. The alternative, moving the audit write back into the main response cycle, would introduce latency and cause a hard failure whenever the audit cannot be completed. The trade-off, as always, is in the hands of engineers: availability or consistency?

Routing and fallback configuration

Routing lives in providers.json and uses two lists:

providers: upstream connection and adapter definitions
routing: per-model routing rules with ordered provider chains

{
 "providers": [
   {
     "id": "openai",
     "type": "openai",
     "baseUrl": "https://api.openai.com",
     "apiKey": "{OPENAI_API_KEY}"
   },
   {
     "id": "anthropic",
     "type": "anthropic",
     "baseUrl": "https://api.anthropic.com",
     "apiKey": "{ANTHROPIC_API_KEY}"
   }
 ],
 "routing": [
   {
     "id": "gpt-4o",
     "providers": ["openai"],
     "strategy": "fallback"
   },
   {
     "id": "claude-sonnet-4-6",
     "providers": ["anthropic"],
     "strategy": "fallback"
   },
   {
     "id": "*",
     "providers": ["openai"],
     "strategy": "fallback"
   }
 ]
}

Environment variables like {OPENAI_API_KEY} are resolved from process env at startup.
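
A short sketch of how such a file could be interpreted follows. It assumes an exact model match takes priority over the "*" rule and that a missing variable is a startup error; both are assumptions, as are the names resolveEnv and providerChain.

```typescript
interface Provider { id: string; type: string; baseUrl: string; apiKey: string }
interface Route { id: string; providers: string[]; strategy: string }
interface GatewayConfig { providers: Provider[]; routing: Route[] }
type Env = Record<string, string | undefined>;

// Replace a "{NAME}" placeholder with the matching environment value.
function resolveEnv(value: string, env: Env): string {
  return value.replace(/\{([A-Z0-9_]+)\}/g, (_match, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) throw new Error(`missing environment variable ${name}`);
    return resolved;
  });
}

// Assumed precedence: exact model match first, then the "*" catch-all rule.
function providerChain(config: GatewayConfig, model: string, env: Env): Provider[] {
  const rule =
    config.routing.find((r) => r.id === model) ??
    config.routing.find((r) => r.id === "*");
  if (!rule) throw new Error(`no route for model ${model}`);
  return rule.providers.map((id) => {
    const provider = config.providers.find((p) => p.id === id);
    if (!provider) throw new Error(`unknown provider ${id}`);
    return { ...provider, apiKey: resolveEnv(provider.apiKey, env) };
  });
}
```

In a Node.js process the env argument would typically be process.env; it is passed in explicitly here so the sketch stays easy to test.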

Fallback behavior is explicit and policy-driven: by exposing a clearly configurable list of retryable statuses, teams can align gateway failover with internal governance or incident playbooks. For example, you can tune which upstream failures (such as 429, 500, 502, 503, 504) trigger fallback based on your own risk, compliance, or incident response thresholds. This mapping between config and governance means compliance and security teams can review and pre-approve response handling in line with internal standards—a step that accelerates approval and audit-readiness.

Retryable statuses: 429, 500, 502, 503, 504.
Connection failures are retryable.
Non-retryable responses (400, 401, 403) are returned immediately.
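
These rules translate into a short loop over the provider chain. The sketch below is an approximation of the policy, not the gateway's source: the call parameter stands in for the real upstream request, and returning the last upstream status (or 502 when only connection failures occurred) once every provider has failed is an assumption.

```typescript
const RETRYABLE = new Set([429, 500, 502, 503, 504]);

interface Attempt { id: string; status: number | null; duration_ms: number }
interface Outcome { status: number; used_provider: string | null; attempts: Attempt[] }

// `call` resolves with an HTTP status, or rejects on a connection failure.
async function callWithFallback(
  providerIds: string[],
  call: (providerId: string) => Promise<number>
): Promise<Outcome> {
  const attempts: Attempt[] = [];
  for (const id of providerIds) {
    const started = Date.now();
    let status: number | null = null;
    try {
      status = await call(id);
    } catch {
      status = null; // connection failure: treated as retryable
    }
    attempts.push({ id, status, duration_ms: Date.now() - started });
    if (status !== null && !RETRYABLE.has(status)) {
      // Success, or a non-retryable error such as 400/401/403: return as-is.
      return { status, used_provider: id, attempts };
    }
  }
  // Every provider failed with a retryable error.
  const last = attempts[attempts.length - 1];
  return { status: last?.status ?? 502, used_provider: null, attempts };
}
```

Keeping the per-provider attempts makes the fallback path visible afterwards, which is the kind of information the routing section of an audit record can carry.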

If you want delegated provider orchestration, you can configure OpenRouter as an openai-type provider and route * traffic to it.

Adapter model: one external contract, many upstreams

The gateway keeps a single OpenAI-compatible API surface, while adapters normalize provider differences behind the scenes.

The openai adapter supports OpenAI-compatible endpoints, including Azure/OpenRouter-compatible APIs.
The anthropic adapter translates OpenAI chat requests and responses to Anthropic Messages API semantics.

This removes provider-specific branching logic from your application layer.

Streaming support with full audit fidelity

Streaming UX matters, so the proxy preserves token-by-token delivery.

For stream: true requests, the proxy:

Pipes SSE chunks to the client in real time.
Buffers chunks internally.
Reconstructs a complete Chat Completions response.
Emits a single audit record with streamed set to true.

Users get low-latency streaming, and operators still get complete records for replay and analysis.
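The buffering and reconstruction steps can be sketched as a small recorder that sees every SSE line on its way to the client. StreamRecorder is a hypothetical name, and the sketch only tracks the first choice of each chunk; it is not the proxy's actual implementation.

```typescript
interface ChunkChoice { delta: { role?: string, content?: string }, finish_reason?: string | null }
interface Chunk { id: string, model: string, choices: ChunkChoice[] }

class StreamRecorder {
  private id = ''
  private model = ''
  private content = ''
  private finishReason: string | null = null

  // Call with each SSE line while the same line is forwarded to the client unchanged
  push (line: string): void {
    if (!line.startsWith('data: ')) return
    const data = line.slice('data: '.length).trim()
    if (data === '[DONE]') return

    const chunk = JSON.parse(data) as Chunk
    this.id = chunk.id
    this.model = chunk.model
    const choice = chunk.choices[0]
    if (!choice) return
    this.content += choice.delta.content ?? ''
    if (choice.finish_reason) this.finishReason = choice.finish_reason
  }

  // Rebuild a non-streaming style response for the audit record
  toCompletion () {
    return {
      id: this.id,
      object: 'chat.completion',
      model: this.model,
      choices: [{
        index: 0,
        message: { role: 'assistant', content: this.content },
        finish_reason: this.finishReason
      }]
    }
  }
}
```

Once the upstream stream ends, the output of toCompletion() would become the response field of a single audit record with streamed set to true.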

Audit record shape

Each JSONL line is a complete record with request, response, latency, caller hash, status, and routing metadata:

{
 "id": "a8f3b2c1-...",
 "timestamp": "2026-03-03T11:44:00.000Z",
 "duration_ms": 1243,
 "request": {
   "model": "gpt-4o",
   "messages": [{ "role": "user", "content": "Hello" }]
 },
 "response": {
   "id": "chatcmpl-...",
   "choices": [{ "message": { "role": "assistant", "content": "Hi!" } }]
 },
 "upstream_status": 200,
 "caller": "7a3f2b1c",
 "streamed": false,
 "routing": {
   "model": "gpt-4o",
   "planned_providers": [{ "id": "openai", "status": 200, "duration_ms": 1200 }],
   "used_provider": "openai"
 }
}

The caller is an 8-character SHA-256 prefix of the bearer token value, so attribution is possible without storing raw API keys.
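A minimal sketch of that derivation with Node's built-in crypto module follows. The function name callerId and the handling of the Bearer prefix are assumptions for illustration.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical helper: derives a short, non-reversible caller identifier
function callerId (authorizationHeader: string): string {
  // Hash only the token value, not the "Bearer " scheme prefix
  const token = authorizationHeader.replace(/^Bearer\s+/i, '')
  return createHash('sha256').update(token).digest('hex').slice(0, 8)
}
```

The same token always maps to the same 8 hex characters, so audit records can be grouped by caller, while the raw key never reaches the log.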

Durable audit pipeline in detail

Inside the request path, proxy enqueues each payload using the request ID as the job ID, which naturally supports deduplication when IDs repeat.

audit-worker consumes those jobs and writes them into local JSONL batches before upload.

The writer then:

Appends each record as one JSON line to a local batch file using flush semantics.
Rotates to a new batch when the size or time threshold is reached.
Uploads the batch file to S3 using undici and SigV4 headers.
Deletes local batch files only after successful upload.

Current thresholds:

BATCH_SIZE = 100
FLUSH_INTERVAL_MS = 5000
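The rotation logic can be sketched in memory as follows. This assumes BATCH_SIZE counts records rather than bytes, and it replaces the local file and the S3 upload with a hypothetical onRotate callback, so it illustrates the thresholds rather than the actual writer.

```typescript
const BATCH_SIZE = 100
const FLUSH_INTERVAL_MS = 5000

class BatchBuffer {
  private lines: string[] = []
  private openedAt: number | null = null
  private readonly onRotate: (lines: string[]) => void

  constructor (onRotate: (lines: string[]) => void) {
    this.onRotate = onRotate
  }

  // Append one record as one JSON line; rotate when the size threshold is reached
  append (record: unknown, now: number): void {
    if (this.openedAt === null) this.openedAt = now
    this.lines.push(JSON.stringify(record))
    if (this.lines.length >= BATCH_SIZE) this.rotate()
  }

  // Meant to be called periodically; rotates when the open batch is old enough
  tick (now: number): void {
    if (this.openedAt !== null && now - this.openedAt >= FLUSH_INTERVAL_MS) this.rotate()
  }

  private rotate (): void {
    const batch = this.lines
    this.lines = []
    this.openedAt = null
    if (batch.length > 0) this.onRotate(batch)
  }
}
```

In the real pipeline the rotate step would hand the finished batch to the uploader, and the local file would be deleted only after the upload succeeds.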

S3 object keys are hour-partitioned for downstream querying:

audits/2026/03/03/11/batch-1741003090000-3bb7....jsonl

This structure works well with tools like Athena and other data lake pipelines.
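For illustration, a key in this layout could be built like this. The function name auditObjectKey is hypothetical, and the sketch assumes the partitions use UTC and that the number in the file name is a millisecond timestamp.

```typescript
// Hypothetical helper: builds an hour-partitioned object key for one batch
function auditObjectKey (date: Date, batchId: string): string {
  const pad = (n: number): string => String(n).padStart(2, '0')
  const yyyy = date.getUTCFullYear()
  const mm = pad(date.getUTCMonth() + 1) // getUTCMonth() is zero-based
  const dd = pad(date.getUTCDate())
  const hh = pad(date.getUTCHours())
  return `audits/${yyyy}/${mm}/${dd}/${hh}/batch-${date.getTime()}-${batchId}.jsonl`
}
```

Because the year, month, day, and hour are separate path segments, a query engine can prune whole prefixes when a query filters on a time range.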

Operating under failure

The gateway is intentionally designed to degrade gracefully.

The main components are the file-backed queue directory (such as ./data/queue), which serves as the communication bridge between the proxy and the audit-worker; single-node deployment via Platformatic Watt's supervised applications; and a default S3 bucket for audit archives. Core configuration files like providers.json define routing logic and provider chains, while runtime environment variables control credentials and logging. Because the proxy hands audit records to the worker through this durable queue, user-facing availability stays high while audit records remain eventually consistent.

Run it locally

git clone https://github.com/platformatic/ai-gateway-auditable.git
cd ai-gateway-auditable
npx wattpm-utils install
docker compose up

Then call the gateway with any OpenAI-compatible client or a simple curl:

curl http://localhost:3042/v1/chat/completions \
 -H 'Content-Type: application/json' \
 -H 'Authorization: Bearer sk-your-key' \
 -d '{
   "model": "gpt-4o",
   "messages": [{"role": "user", "content": "Hello"}]
 }'

Final take

ai-gateway-auditable is a practical pattern for teams that need to move fast with AI and still satisfy the operational norms of production software. It gives you:

one consistent API surface with clear fallback behavior,
complete and queryable audit trails, and
a clean separation between serving traffic and persisting evidence.

If your roadmap includes multi-provider AI, compliance requirements, or strict SRE expectations, this architecture is ready to adopt and extend.

The easiest way to get started is to fork the repo, run the quick-start commands, and see the gateway in action with your own test requests. Try spinning up the service locally and sending a sample call: this practical step will show you right away how auditable AI operations can be within your own workflow.

Happy building!

Introducing @platformatic/job-queue

Matteo Collina — Tue, 03 Mar 2026 15:00:00 GMT

Every backend developer knows the frustration: a key job disappears during a server restart, or duplicate tasks pile up when a client retries a request. Lost work, repeated emails, missing reports: these breakdowns always seem to happen when reliability matters most.

@platformatic/job-queue is a new queue library from Platformatic focused on reliability and operational simplicity. This library is built on a workflow that lets you enqueue jobs and wait for results when needed, making background processing feel just as smooth as calling a function. Alongside this, it provides Node.js teams with a modern API that includes built-in caching, deduplication, retries, and pluggable storage.

In practice, this means you can start with a tiny local setup and then move to a distributed, production-grade deployment without rewriting your application code.

What makes it different

Most queue setups force you to stitch together multiple patterns and handle edge cases yourself. @platformatic/job-queue includes those patterns out of the box:

Deduplication by job id so repeated enqueue attempts do not create duplicate work.
Request/response support with enqueueAndWait() when you need async processing but still want a result.
Reliable retries with configurable attempts and backoff behavior.
Stalled job recovery via a Reaper that requeues jobs from crashed workers.
Graceful shutdown ensures in-flight jobs complete before the service stops, reducing lost work during deploys and restarts.
TypeScript-native API with typed payloads and results, so you catch errors at compile time.

This makes it appropriate for both classic fire-and-forget workloads and RPC-style workloads that require a response. You do not have to pick one model globally: many teams use both in the same system, depending on endpoint and latency requirements. For example, in use cases such as sending emails and notifications, fire-and-forget jobs make sense because results are often not needed immediately and occasional retries can be handled gracefully. On the other hand, workflows such as generating invoices or processing payments may require the caller to wait for a result, making the request/response pattern with enqueueAndWait() a better fit.

A quick look at the API

You can use the queue as a producer and consumer in the same process, or split them across services. The API is intentionally small, so the same primitives are easy to apply in monoliths, microservices, and worker pools.

import { Queue, MemoryStorage } from '@platformatic/job-queue'

const storage = new MemoryStorage()
const queue = new Queue<{ email: string }, { sent: boolean }>({
 storage,
 concurrency: 5
})

queue.execute(async job => {
 // your business logic
 return { sent: true }
})

await queue.start()

// fire-and-forget
await queue.enqueue('email-1', { email: 'user@example.com' })

// request/response
const result = await queue.enqueueAndWait('email-2', { email: 'another@example.com' }, { timeout: 30_000 })
console.log(result)

await queue.stop()

Architecture description

When you call enqueue(), the producer checks whether the job already exists in the storage. If the job is new, it is added to the queue with the state “queued,” and the method returns immediately. If the job is a duplicate, the storage returns a duplicate status without creating a new entry.
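A toy in-memory model makes the duplicate check easier to picture. TinyProducer and its return values are hypothetical and only mirror the behavior described above; they are not the library's internals.

```typescript
type EnqueueStatus = 'queued' | 'duplicate'
type JobState = 'queued' | 'processing' | 'completed' | 'failed'

class TinyProducer<P> {
  private jobs = new Map<string, { payload: P, state: JobState }>()
  private pending: string[] = []

  enqueue (id: string, payload: P): EnqueueStatus {
    // The job id is the deduplication key: a known id never creates a second entry
    if (this.jobs.has(id)) return 'duplicate'
    this.jobs.set(id, { payload, state: 'queued' })
    this.pending.push(id)
    return 'queued'
  }

  pendingCount (): number {
    return this.pending.length
  }
}
```

With this model, enqueueing 'email-1' twice leaves exactly one pending job, which is why a client retry does not send a second email.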

When you call enqueueAndWait(), the producer first subscribes to a notification for that job, then enqueues it. If the job was already processed, it returns the cached result immediately. Otherwise, it waits for a notification from the worker when the job completes (or fails), then fetches the result and returns it.
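The ordering matters here: subscribing before enqueueing means a fast worker cannot finish the job in the gap between the two steps. The following sketch models that flow with an in-process EventEmitter; TinyWaiter, waitFor, and complete are hypothetical names, and the real library coordinates through its storage backend instead.

```typescript
import { EventEmitter } from 'node:events'

class TinyWaiter<R> {
  private results = new Map<string, R>()
  private events = new EventEmitter()

  // Worker side: store the result first, then notify any waiting producer
  complete (id: string, result: R): void {
    this.results.set(id, result)
    this.events.emit(`done:${id}`)
  }

  // Producer side: subscribe, enqueue, then wait for completion or a timeout
  waitFor (id: string, enqueue: () => void, timeoutMs: number): Promise<R> {
    return new Promise<R>((resolve, reject) => {
      const event = `done:${id}`
      const finish = (): void => {
        clearTimeout(timer)
        this.events.off(event, finish)
        resolve(this.results.get(id) as R)
      }
      const timer = setTimeout(() => {
        this.events.off(event, finish)
        reject(new Error(`Timed out waiting for job ${id}`))
      }, timeoutMs)

      this.events.once(event, finish) // 1. subscribe first
      enqueue() // 2. then enqueue (a duplicate is a no-op in the real queue)
      if (this.results.has(id)) finish() // 3. already processed: return the cached result
    })
  }
}
```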

The consumer continuously dequeues jobs from the storage using a blocking move operation. When it receives a job, it marks it as “processing” and executes the handler. On success, it stores the result with a TTL and marks the job as completed. On failure, it either retries (if attempts remain) or marks the job as failed.
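Those transitions can be sketched as a single function that runs one job. The Job shape and runJob are hypothetical, and the sketch leaves out the blocking dequeue and the result TTL to focus on the state changes.

```typescript
type State = 'queued' | 'processing' | 'completed' | 'failed'

interface Job<P, R> {
  id: string
  payload: P
  state: State
  attempts: number
  result?: R
  error?: string
}

// Runs one job through the handler and applies the transitions described above
async function runJob<P, R> (
  job: Job<P, R>,
  handler: (payload: P) => Promise<R>,
  maxAttempts: number
): Promise<State> {
  job.state = 'processing'
  job.attempts += 1
  try {
    job.result = await handler(job.payload)
    job.state = 'completed'
  } catch (err) {
    job.error = err instanceof Error ? err.message : String(err)
    // Retry while attempts remain; otherwise the failure becomes a terminal state
    job.state = job.attempts < maxAttempts ? 'queued' : 'failed'
  }
  return job.state
}
```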

The producer API supports per-job options such as maxAttempts and resultTTL, which are useful when not all jobs have the same retention or retry requirements. For example, you might keep invoice-generation results longer than low-value notification results, even if they run on the same queue.

Storage backends for different environments

@platformatic/job-queue ships with three storage adapters:

MemoryStorage

MemoryStorage keeps all queue state in process memory. This makes it ideal for local development, testing, and simple single-instance services where data can be ephemeral.

import { Queue, MemoryStorage } from '@platformatic/job-queue'
const storage = new MemoryStorage()
const queue = new Queue({ storage })

Jobs are stored in JavaScript Maps and Sets within the same process. This gives you the lowest latency possible, but means jobs are lost if the process restarts. For development workflows where you restart frequently, this is usually not a concern.

FileStorage

FileStorage persists the queue state to the filesystem in JSON format. It works well for simple deployments on a single node where you need persistence but do not want external dependencies like Redis.

import { Queue, FileStorage } from '@platformatic/job-queue'

const storage = new FileStorage('./queue-data')
const queue = new Queue({ storage })

The storage writes atomically to prevent corruption, and it maintains separate files for jobs, metadata, and locks. Since it relies on file system locks, it is not suitable for multi-node deployments.

RedisStorage

RedisStorage uses Redis (7+) or Valkey (8+) for distributed queue operations. This is the recommended choice for production workloads that require horizontal scaling, leader election, or cross-instance coordination.

import { Queue, RedisStorage } from '@platformatic/job-queue'
const storage = new RedisStorage({ connectionString: 'redis://localhost:6379' })
const queue = new Queue({ storage })

RedisStorage leverages Redis data structures for atomic operations:

Lists for job queues
Sorted sets for delayed job scheduling
Pub/sub for notifications across instances
Lua scripts for atomic state changes

For high availability, RedisStorage also supports Sentinel and Cluster modes for failover and sharding.

Choosing the right backend

Start with MemoryStorage for development, use FileStorage for simple single-node deployments, and choose RedisStorage for production systems that need horizontal scaling.

Reliability features that matter in production

The library is designed around the real failure modes of job processing systems.

Picture this: you deploy a routine patch, and one of your job workers crashes unnoticed. By the next day, 5,000 critical jobs have piled up and could have vanished forever. Thanks to built-in recovery, every one of them is rescued automatically. Situations like this are exactly where the safeguards in a background processing system prove their worth.

Recovering stalled jobs

If a worker crashes while processing a job, the Reaper can detect the stalled work and requeue it after visibilityTimeout.

import { Reaper } from '@platformatic/job-queue'
const reaper = new Reaper({
 storage,
 visibilityTimeout: 30_000
})
await reaper.start()

For high availability, the Reaper also supports leader election (with Redis storage), so multiple instances can run safely while only one acts as leader at a time. If the leader goes away, another instance takes over, which helps avoid manual intervention during incidents.
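Conceptually, stalled-job detection compares how long a job has been in the processing state against visibilityTimeout. The sketch below shows that check on plain in-memory structures; reapStalled and ProcessingJob are hypothetical names, and it does not cover leader election.

```typescript
interface ProcessingJob { id: string, startedAt: number }

// Moves jobs that have been processing for longer than visibilityTimeout
// back onto the queue and returns their ids.
function reapStalled (
  processing: Map<string, ProcessingJob>,
  queue: string[],
  now: number,
  visibilityTimeout: number
): string[] {
  const requeued: string[] = []
  for (const job of Array.from(processing.values())) {
    if (now - job.startedAt > visibilityTimeout) {
      processing.delete(job.id)
      queue.push(job.id)
      requeued.push(job.id)
    }
  }
  return requeued
}
```

A visibilityTimeout that is shorter than your slowest legitimate job would requeue work that is still running, so it is worth sizing it above the expected worst-case duration.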

Controlled retries and terminal states

Failed jobs can retry automatically up to maxRetries. When retries are exhausted, errors are persisted as a terminal state so producers can inspect or react programmatically.

This gives you reliable behavior for flaky dependencies, such as third-party APIs: transient failures recover automatically, while permanent failures remain visible and actionable.

Graceful shutdown

When stopping a worker, queue.stop() waits for in-flight jobs to finish. This reduces dropped work during deploys and restarts and helps keep queue state consistent across gradual updates. In practice, this means you can safely perform blue/green or canary deployments without worrying about losing in-progress work. Teams can ship changes faster, with the confidence that jobs will complete and customer data will not go missing, even as new versions are rolled out.
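The general pattern behind a graceful stop is to track in-flight work, refuse new work once shutdown begins, and wait for everything already running to settle. The following is a minimal sketch of that pattern with a hypothetical InFlightTracker class; it is not the library's implementation of queue.stop().

```typescript
class InFlightTracker {
  private inFlight = new Set<Promise<unknown>>()
  private stopping = false

  // Start a task unless shutdown has begun, and track it until it settles
  run<T> (task: () => Promise<T>): Promise<T> {
    if (this.stopping) return Promise.reject(new Error('Worker is stopping'))
    const promise = task()
    this.inFlight.add(promise)
    const cleanup = (): void => { this.inFlight.delete(promise) }
    promise.then(cleanup, cleanup)
    return promise
  }

  // Refuse new work, then wait for every in-flight task to succeed or fail
  async stop (): Promise<void> {
    this.stopping = true
    await Promise.all(Array.from(this.inFlight).map(p => p.catch(() => undefined)))
  }
}
```

In a deployment, stop() would typically be called from a SIGTERM handler so the process exits only after running jobs have finished.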

Request/response without building custom plumbing

One particularly useful capability is enqueueAndWait(). Teams often build this pattern manually on top of queues, but it is already integrated here, including timeout handling and typed errors.

try {
 const result = await queue.enqueueAndWait('invoice-123', payload, { timeout: 10_000 })
 return result
} catch (error) {
 // handle TimeoutError / JobFailedError, etc.
}

This is a good fit when work should run in a worker context, but the caller still needs a bounded response path, such as document generation, webhook fan-out, or expensive validation that should not run inside an HTTP request handler.

You also get explicit queue errors (TimeoutError, JobFailedError, and others), so your application can distinguish among transport problems, worker failures, and business-level errors.
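One way to use that distinction is to classify errors at the call site with instanceof checks. To stay self-contained, the sketch below defines local stand-ins for TimeoutError and JobFailedError; in an application you would import the error classes from the package, and the classifyQueueError helper is hypothetical.

```typescript
// Local stand-ins so the sketch runs on its own; not the library's classes
class TimeoutError extends Error {}
class JobFailedError extends Error {}

type Failure = 'timeout' | 'job-failed' | 'unexpected'

function classifyQueueError (error: unknown): Failure {
  // The caller gave up waiting; the job may still complete later
  if (error instanceof TimeoutError) return 'timeout'
  // The worker ran the job and it ended in a failed state
  if (error instanceof JobFailedError) return 'job-failed'
  // Anything else, for example a storage or connection problem
  return 'unexpected'
}
```

A request handler could then map each category to its own response, for example asking the client to poll later on a timeout while reporting a failed job as an error.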

Getting started

Install the package:

npm install @platformatic/job-queue

Then choose a backend based on your environment:

Start with MemoryStorage for local development.
Move to RedisStorage (Redis 7+ or Valkey 8+) for production.
Add Reaper when running multiple workers or when stalled-job recovery is required.

If you already have queue infrastructure in place, one good migration approach is to move one bounded workflow first (for example, email delivery or report generation), validate behavior and observability, and then expand usage across other jobs.

We recommend separating responsibilities into dedicated processes:

Producer services enqueue jobs from HTTP handlers or internal events.
Worker services execute jobs with tuned concurrency.
A Reaper instance handles stalled-job recovery (or multiple instances with leader election).

This setup lets you scale producers and workers independently. If incoming traffic spikes, add producers; if processing backlog grows, add workers.

Final thoughts

@platformatic/job-queue is a practical option for Node.js teams that want reliable background processing without having to assemble every reliability feature from scratch. The combination of deduplication, request/response semantics, retries, and pluggable storage makes it flexible enough for both simple jobs and more demanding production workloads. Most importantly, it lets you focus on what matters most: building features and generating value, knowing your background tasks are handled with care. Imagine deployments where you can sleep soundly, confident that every job is accounted for and that no critical work is lost, even during outages. With the right foundation, you are set up not just for peace of mind, but for lasting success as your systems and team continue to grow.

If you are evaluating queue systems for your next service, this is a good time to try it and share feedback with the team (us). Real-world feedback is especially valuable while the project is still young and evolving quickly. If you run into an unexpected edge case or a strange retry failure, please open an issue describing your scenario: we love to fix hard problems. Concrete examples help us improve reliability for everyone!