Why Your Payment Backend Needs a Dedicated Real-Time Layer (And How We Built One)
Backend · System Design · WebSockets · gRPC · Redis · Real-Time · Payments

The thing that processes payments should not be the same thing that delivers notifications. Here's why splitting these into two systems — a payment backend and a dedicated push layer — makes everything better.

February 28, 2026 · 7 min read

You know that moment when you order food on Zomato, you pay, and the delivery partner's phone instantly lights up with "Payment Received"? It feels seamless. Almost magical. But behind that two-second moment, there's a surprisingly complex system making it happen.

I recently built a payment processing prototype and ran into a problem that every team building real-time payment experiences eventually hits: the thing that processes payments should not be the same thing that delivers notifications. Let me walk you through why, and how splitting these into two separate systems made everything better.

The Problem With Doing Everything in One Place

Let's start with the obvious approach. You build a payment backend. It talks to Razorpay, handles webhooks, updates the database, and when a payment goes through, it pushes a message to the delivery partner over a WebSocket.

This is what I built first. And honestly, it works. For a prototype. For ten users. Maybe even a hundred.

Here's where it starts to fall apart.

The payment API is an HTTP server — it handles requests, talks to the database, and responds. WebSocket connections, on the other hand, are long-lived. They sit there, open, waiting. Every delivery partner with the app open is holding a persistent connection to your server. Now your payment API is doing two fundamentally different jobs: processing short-lived HTTP requests and babysitting thousands of long-lived socket connections.

The notification part was also naive. When a payment was confirmed, the server would broadcast the event to every connected WebSocket client. Every single one. It didn't care who the payment was for. The client-side app had to figure out "is this event meant for me?" That's wasteful, insecure, and just bad design.

So I decided to separate things.

Enter Propeller: A Dedicated Push Layer

The second system I built is inspired by something CRED (the fintech company) built internally called Propeller. The idea is simple but powerful: build a standalone service whose only job is managing persistent client connections and routing events to the right user, on the right device.

It doesn't know what a payment is. It doesn't care about orders or Razorpay or UPI. All it knows is: "I have a bunch of connected clients, and when someone tells me to send a message to client X, I deliver it."

Think of it like a postal service. The postal service doesn't write letters. It doesn't know what's inside the envelope. Its entire job is making sure the letter reaches the right address.

How the Two Systems Work Together

Here's the full flow of what happens when you tap "Pay" on a food delivery app, with both systems doing their part.

  • Step 1 — The customer pays. The customer's app sends a request to the payment backend. The payment API creates an order in the database and talks to a payment gateway (Razorpay, Cashfree, or Juspay) to initiate the transaction. A smart router picks the best gateway based on availability and success rates.
  • Step 2 — The gateway confirms. After the customer completes the UPI payment, the gateway sends a webhook back to the payment backend. Instead of processing it immediately, the backend drops it into a Redis Stream — a durable queue. Webhooks can arrive in bursts and you can't afford to lose any.
  • Step 3 — The worker picks it up. A background worker reads from the queue. For each webhook, it first checks if the event was already processed (idempotency). Payment gateways sometimes send the same webhook multiple times — processing a payment twice would be catastrophic. If new, the worker updates the database: payment marked SUCCESS, order status flips to PAID.
  • Step 4 — The worker tells Propeller. Instead of pushing a notification over a basic WebSocket, it makes a single gRPC call to Propeller: "Send this event to driver_456." One clean API call. That's it.
  • Step 5 — Propeller delivers. Propeller already has the delivery partner's phone connected via a persistent gRPC stream or WebSocket. It looks up the channel for that client (tenant:zomato:client:driver_456), publishes the event through Redis Pub/Sub, and the message flows to the exact device. Nobody else sees it. The screen shows "Payment Received" within milliseconds.
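The worker in Steps 3 and 4 can be sketched in a few lines. This is a minimal, in-memory illustration of the idempotency check, not the production implementation: the real queue is a Redis Stream and the real dedupe set would live in Redis, and the event field names here are illustrative rather than any gateway's actual webhook schema.

```go
package main

import "fmt"

// Event is a simplified webhook payload. Field names are illustrative,
// not Razorpay's (or any gateway's) real schema.
type Event struct {
	ID        string // gateway's unique event ID, used for idempotency
	PaymentID string
	Status    string
}

// Worker applies each webhook exactly once. In the real system the
// dedupe set and payment rows live in Redis and the database; in-memory
// maps stand in here so the sketch is self-contained.
type Worker struct {
	seen     map[string]bool
	payments map[string]string // paymentID -> status
}

func NewWorker() *Worker {
	return &Worker{seen: map[string]bool{}, payments: map[string]string{}}
}

// Process applies the event if it hasn't been seen before.
// It returns true if applied, false if the event was a duplicate.
func (w *Worker) Process(e Event) bool {
	if w.seen[e.ID] { // idempotency check: drop retried webhooks
		return false
	}
	w.seen[e.ID] = true
	w.payments[e.PaymentID] = e.Status // mark SUCCESS, flip order to PAID
	// ...here the worker would make the single gRPC call to Propeller...
	return true
}

func main() {
	w := NewWorker()
	e := Event{ID: "evt_1", PaymentID: "pay_42", Status: "SUCCESS"}
	fmt.Println(w.Process(e)) // true: first delivery is applied
	fmt.Println(w.Process(e)) // false: the retried webhook is ignored
}
```

The important property is that a retried webhook is a no-op, so "processing a payment twice" simply cannot happen downstream of this check.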

Why This Split Matters

Let me explain why having Propeller as a separate service is a fundamentally better design.

  • Scale independently. On a Friday night, you might have 50,000 delivery partners connected (50,000 persistent connections) but only 5,000 payments per minute. With a monolithic approach, you'd scale the entire payment backend just to handle sockets. With the split, you scale Propeller horizontally without touching payment infrastructure.
  • Targeted delivery. Instead of broadcasting to everyone, Propeller routes events to specific users on specific devices. Each client subscribes to their own channel. Only they receive their events. A driver should never see payment details meant for another driver.
  • Multi-device awareness. A delivery partner might have two phones. Propeller tracks active devices per user via Redis Hashes, and can target a specific device or deliver to all active devices. The payment backend never has to think about this.
  • Resilience and fault isolation. If Propeller crashes, payments keep processing — the worker retries the gRPC call. If the payment backend goes down, connected clients stay connected to Propeller and don't notice. The two systems fail independently.
  • Multi-tenant by design. Channels are namespaced as tenant:zomato:client:driver_456 and tenant:blinkit:client:rider_789. The same Propeller instance serves multiple brands with zero cross-contamination.
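The channel naming and multi-device routing above fit in a small sketch. The `tenant:…:client:…` format mirrors the examples in this post; treat the helper and the in-memory device map as assumptions about how such a registry could look — in Propeller the per-user device set lives in a Redis Hash.

```go
package main

import "fmt"

// ChannelFor builds the namespaced channel key described above, e.g.
// tenant:zomato:client:driver_456. The format follows this post's
// examples; it is not a documented public API.
func ChannelFor(tenant, clientID string) string {
	return fmt.Sprintf("tenant:%s:client:%s", tenant, clientID)
}

// Devices maps a channel to the set of active device IDs for that user.
// A plain map stands in for the Redis Hash used in the real service.
type Devices map[string]map[string]bool

func (d Devices) Register(channel, deviceID string) {
	if d[channel] == nil {
		d[channel] = map[string]bool{}
	}
	d[channel][deviceID] = true
}

// Targets returns the devices an event should reach: one specific
// device if deviceID is given, otherwise every active device.
func (d Devices) Targets(channel, deviceID string) []string {
	if deviceID != "" {
		return []string{deviceID}
	}
	var all []string
	for id := range d[channel] {
		all = append(all, id)
	}
	return all
}

func main() {
	ch := ChannelFor("zomato", "driver_456")
	fmt.Println(ch) // tenant:zomato:client:driver_456

	d := Devices{}
	d.Register(ch, "phone_a")
	d.Register(ch, "phone_b")
	fmt.Println(len(d.Targets(ch, ""))) // 2: deliver to all active devices
}
```

Because the tenant is baked into the key, a publish for `tenant:zomato:…` can never land on a `tenant:blinkit:…` subscriber, which is where the zero-cross-contamination guarantee comes from.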

The Connection Lifecycle

One detail that's easy to overlook but critical to get right: keeping connections alive.

Mobile networks are unreliable. Phones go in and out of coverage. Apps get backgrounded. Without actively monitoring connection health, you end up with ghost connections — sockets that look open on the server but have no living client on the other end. Events sent to ghost connections just vanish.

Propeller handles this with a keepalive mechanism. For WebSocket clients, the server periodically sends a Ping frame. If the client doesn't respond with a Pong within a timeout window, the connection is considered dead and cleaned up. For gRPC streaming clients, it monitors idle streams and disconnects them after a configurable timeout.

This ensures the system always has an accurate picture of who is actually online and reachable.

Authentication and Security

Every connection to Propeller requires authentication via JWT tokens. The token contains the client ID and tenant ID — Propeller knows who this user is and which org they belong to. There's no way for one user to subscribe to another user's channel.

For backend services publishing events (like the payment worker), Propeller requires an additional API key. This creates a clear separation:

  • Clients can only listen
  • Backends can only publish

A compromised client token can never be used to inject fake payment confirmations.
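That listen/publish split boils down to two checks, sketched below. Signature verification is omitted — assume a JWT library has already validated the token before these claims are trusted — and the claim names and API-key scheme are assumptions for illustration, not Propeller's documented interface.

```go
package main

import "fmt"

// Claims is the subset of the (already-verified) JWT payload that
// Propeller cares about for authorization.
type Claims struct {
	ClientID string
	TenantID string
}

// CanSubscribe enforces that a client may only listen on its own
// channel: the channel must be exactly the one derived from its claims.
func CanSubscribe(c Claims, channel string) bool {
	return channel == fmt.Sprintf("tenant:%s:client:%s", c.TenantID, c.ClientID)
}

// CanPublish enforces that only backends holding a known API key may
// publish. Client JWTs never pass this check, so a compromised client
// token cannot inject events.
func CanPublish(apiKeys map[string]bool, key string) bool {
	return key != "" && apiKeys[key]
}

func main() {
	c := Claims{ClientID: "driver_456", TenantID: "zomato"}
	fmt.Println(CanSubscribe(c, "tenant:zomato:client:driver_456")) // true
	fmt.Println(CanSubscribe(c, "tenant:zomato:client:driver_999")) // false

	keys := map[string]bool{"srv-payment-worker": true}
	fmt.Println(CanPublish(keys, "srv-payment-worker")) // true
	fmt.Println(CanPublish(keys, ""))                   // false
}
```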

The Reconciliation Safety Net

There's one more piece that ties everything together. The payment worker runs a reconciliation process every five minutes. It scans the database for payments still in PENDING state beyond a reasonable time window and resolves them. Maybe a webhook was lost. Maybe the gateway had a hiccup. Whatever the reason, reconciliation catches these edge cases and ensures no payment gets stuck in limbo forever.

In the real world, distributed systems don't behave perfectly. Webhooks get lost. Network calls fail. Redis might have a brief outage. The reconciliation worker acts as a safety net, ensuring eventual consistency even when individual components hiccup.

What This Looks Like from the User's Perspective

  • Customer: Taps pay, completes UPI, sees confirmation. Simple.
  • Delivery partner: App is connected to Propeller in the background. The moment the payment clears, their screen updates. No polling. No refresh button. No delay.
  • Engineering team: Two independent services, each with a clear responsibility. The payment backend deals with money, gateways, and the database. Propeller deals with connections, routing, and delivery. They communicate through a single, clean gRPC interface.

Wrapping Up

The key takeaway: real-time payment notifications are not a feature you bolt onto your payment backend. They're an infrastructure problem that deserves its own dedicated service.

By separating the "processing" from the "pushing," you get a system where each part can scale, fail, and evolve independently. The payment backend stays lean and focused on handling money safely. The push layer stays lean and focused on delivering messages fast and to the right person.

This is the same pattern that companies like CRED, Zomato, and others use at scale. And once you have a Propeller-like service running, it's not just for payments. Order updates, chat messages, live tracking, surge pricing alerts — any event that needs to reach a user in real time can flow through the same infrastructure.

Build the plumbing right, and everything else becomes easy.

Bhupesh Kumar

Backend engineer building scalable APIs and distributed systems with Node.js, TypeScript, and Go.