
How Large Indian Tech Companies Think About Caching Infrastructure

A system design breakdown of how Myntra, Zerodha, Flipkart, and JioHotstar use different caching architectures based on correctness, latency, cost, and scale.

May 10, 2026 · 14 min read

Most people learn caching as a simple idea:

Put Redis in front of the database so reads become faster.

That is correct at a beginner level, but it is not how large systems actually evolve.

At real scale, caching is not just about speed. It becomes a question of correctness, failure handling, cost, bandwidth, tail latency, and operational simplicity.

This is why different companies solve the same problem in very different ways.

Myntra moved inventory reads closer to the application using near-cache style architecture.

Zerodha kept the hot read path extremely simple with Redis and HTTP ETags.

Flipkart uses Aerospike as a high-throughput real-time data layer for large-scale sale traffic.

JioHotstar optimizes throughput by separating cacheable and non-cacheable APIs at the CDN/API layer.

All four are caching stories, but the design decisions are completely different.

The important lesson is this:

There is no universal caching architecture. The correct design depends on what breaks first in your system.

1. Myntra: Inventory Caching Where Consistency Matters More Than Simplicity

Myntra is one of the best examples because the problem is very easy to understand.

Imagine a flash sale.

A product has only 5 units left.

Thousands of users are opening the product page, adding it to cart, and trying to buy it at the same time.

For a normal product page, showing slightly stale data may be acceptable.

But for inventory, stale data is dangerous.

If the system says an item is available when it is already sold out, customers can place orders that cannot be fulfilled. That becomes an overselling problem.

So inventory caching is not just a read-scaling problem. It is a correctness problem.

The Original Myntra-Style Problem

The common architecture looks something like this:

Inventory DB -> Central Redis Cache -> Inventory API -> User

The application reads inventory data from a centralized cache service.

At first, this looks reasonable.

Redis is fast. Redis can handle large read volume. Application servers do not need to hit the database directly.

But the issue appears when updates happen.

Inventory changes very frequently:

Order placed
Order cancelled
Stock reserved
Stock released
Warehouse stock updated
Return received

If the database updates first and the cache updates asynchronously after that, there is always a small time window where the cache can be stale.

For normal metadata, this may not be a big deal.

For inventory, that small window can become overselling during peak traffic.

That is the core problem Myntra was solving.

Why Centralized Redis Became a Bottleneck

A centralized Redis-backed cache creates a few problems at very high scale.

First, every inventory read goes over the network.

Even if Redis is fast, network calls are not free. When you are doing millions of lookups, network latency and network bandwidth become serious concerns.

Second, Redis becomes a central dependency.

If every product page, cart page, and checkout flow depends on one central cache layer, then that cache becomes part of the critical path.

Third, cache misses become dangerous if the cache is treated almost like a persistent store.

In a clean cache design, a cache miss should fall back to the source of truth.

But if the cache is designed in such a way that the application expects the data to always be present, then it is no longer just a cache. It behaves like a pseudo-source-of-truth.

That makes failure modes harder.

Myntra's Better Direction: Move Reads Closer To The Application

Myntra's solution was not "remove caching."

The better way to say it is:

Myntra removed the centralized Redis-backed cache from the hot inventory read path and moved to a near-cache architecture.

A simplified version looks like this:

Inventory API
    -> L1 Near Cache inside/near the application
    -> L2 Distributed Cache
    -> MySQL Read Replica

The important idea is that the fastest read should happen closest to the service.

So instead of every request going to a centralized Redis service over the network, the application can serve many reads from an in-process or near-process cache.

This reduces network calls, reduces load on the central cache, and improves read latency.

How The Read Path Works

The inventory read path can be walked through step by step:

User requests product inventory
Inventory API receives request
Check L1 Near Cache
If cache hit: return response immediately
If cache miss: check L2 distributed cache
If needed: read from MySQL read replica
Fill cache again
Return response

This is a multi-tier cache.

L1 cache is fastest because it is closest to the application.

L2 cache is shared and distributed, so it helps when local cache does not have the data.

MySQL read replica is the fallback read source.

MySQL master remains the source of truth for writes.
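The three-tier read path above can be sketched in a few lines of Python. This is a toy model, not Myntra's implementation: plain dictionaries stand in for the L2 cache and the read replica, and only the L1 near cache carries a short TTL so stale entries age out on their own.

```python
import time

class NearCache:
    """L1: in-process cache with a short TTL so stale entries age out."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self.store[key]   # entry aged out
            return None
        return value

    def put(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

def read_inventory(sku, l1, l2, db):
    """Read-through: L1 near cache -> L2 distributed cache -> read replica."""
    value = l1.get(sku)
    if value is not None:
        return value             # hottest path: no network call at all
    value = l2.get(sku)          # L2 modeled as a plain dict here
    if value is None:
        value = db[sku]          # fall back to the source-of-truth replica
        l2[sku] = value          # refill L2
    l1.put(sku, value)           # refill L1
    return value
```

The L1 TTL is what bounds staleness between invalidation events: even if an invalidation message is missed, a local entry can only survive for a few seconds.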

Why This Helps During Flash Sales

During a sale, the same product inventory may be read again and again.

If every request goes to Redis or the database, the backend has to handle massive repeated reads.

But if each application instance has a near cache, many repeated reads can be served locally.

That means:

  • Less network traffic
  • Lower cache cluster pressure
  • Lower database pressure
  • Lower latency
  • Better resource usage

The architecture becomes more efficient because the system stops doing unnecessary remote calls for data that is already hot.

The Hard Part: Invalidation

Near cache is powerful, but it creates one hard problem:

How do you make sure local caches do not keep stale inventory forever?

That is where invalidation comes in.

A common design is:

MySQL Master
    -> Binlog CDC
    -> Inventory Events
    -> Cache Invalidator
    -> Invalidate L1 and L2 cache

When inventory changes in the master database, the change is captured from the database log.

That change becomes an inventory event.

A cache invalidator consumes the event and clears or updates the affected cache keys.

This keeps the read path fast while still giving the system a way to remove stale data.
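A minimal sketch of the invalidator's job, assuming the CDC pipeline has already turned binlog changes into events shaped like `{"sku": ..., "op": ...}` (a made-up format), with dictionaries standing in for the cache tiers:

```python
def invalidate(event, l1_caches, l2):
    """Apply one CDC-derived inventory event to every cache tier."""
    key = event["sku"]
    l2.pop(key, None)            # clear the shared L2 entry
    for local in l1_caches:      # every app instance's near cache
        local.pop(key, None)
```

In a real deployment each application instance would subscribe to the event stream itself rather than being reached directly, but the cache-side effect is the same: the affected key disappears from every tier.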

Myntra's Core Lesson

Myntra's story is not simply "Redis is bad."

Redis is excellent for many use cases.

The real lesson is:

If your cache becomes a correctness risk, your architecture has to change.

For inventory, consistency matters more than blindly centralizing every read through Redis.

Myntra optimized the read path by moving hot reads closer to the application and relying on a stronger invalidation pipeline.

That is why this architecture is so interesting.

Sometimes scaling is not about adding another bigger cache.

Sometimes scaling means removing the wrong cache from the wrong place.

2. Zerodha: Keep The Hot Path Extremely Simple

Zerodha's caching philosophy is almost the opposite of Myntra's.

Zerodha runs Kite, a trading platform where users constantly check orders, positions, holdings, margins, and portfolio data.

The key challenge is not the same as e-commerce inventory.

For Zerodha, the hot path is mostly about serving user-specific read responses very quickly and reliably.

Their public engineering writing explains a very simple idea:

Every bit of data shown on Kite comes from a hot Redis cache.

This includes things like orders, positions, and portfolio data.

The key idea is simplicity.

The Zerodha-Style Read Path

A simplified version looks like this:

User opens Kite
API receives request
API does O(1) lookup in Redis
Raw JSON bytes are returned

The important part is what does not happen in the hot path.

The system does not recompute everything again.

It does not do heavy joins.

It does not do expensive serialization repeatedly.

It does not hit the database for every screen refresh.

It reads already prepared data from Redis and returns it.

That is why the design is powerful.
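A toy version of that hot path, with a dict standing in for Redis. The point the sketch makes is where the work happens: serialization runs once, on the write side, and the read side is a single O(1) lookup that returns bytes as-is.

```python
import json

# stand-in for the hot Redis cache: values are already-serialized
# JSON bytes, written whenever the underlying data changes
hot_cache = {}

def update_positions(user_id, positions):
    """Write path: serialize once, when the data changes."""
    hot_cache[f"positions:{user_id}"] = json.dumps(positions).encode()

def get_positions(user_id):
    """Hot read path: one lookup, no recomputation, no re-serialization."""
    return hot_cache.get(f"positions:{user_id}")
```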

Why This Works For Zerodha

Trading apps have a very specific traffic pattern.

Many users repeatedly open the same screens:

Positions
Holdings
Orders
Funds
Portfolio

If each request recomputes the response, the backend wastes CPU.

If each request hits the database, the database becomes overloaded.

If each request serializes fresh JSON, the API layer does unnecessary work.

So Zerodha keeps the hot response ready in Redis.

Then the API layer becomes almost a thin delivery mechanism.

That is the beauty of the design.

ETags And HTTP 304 Caching

Zerodha also uses HTTP caching ideas, especially ETags and 304 Not Modified responses.

The concept is simple.

The server gives a version tag for a response.

The client stores that response locally.

Next time, the client asks:

Has this data changed since this ETag?

If the data has not changed, the server does not send the full response again.

It can return:

304 Not Modified

This saves bandwidth.

That matters a lot on mobile networks.

It also reduces backend load because the system avoids repeatedly sending large unchanged payloads.
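The ETag handshake can be sketched like this. Using a content hash as the tag is an assumption for illustration; any stable version identifier works.

```python
import hashlib

def handle_request(body, client_etag):
    """Return (status, payload, etag); 304 with an empty payload on a match."""
    etag = hashlib.sha256(body).hexdigest()[:16]  # content hash as the tag
    if client_etag == etag:
        return 304, b"", etag      # nothing changed: skip the payload
    return 200, body, etag         # changed (or first request): full payload
```

The saving is on the second request and every one after it: the server still does the lookup, but the large unchanged payload never crosses the network.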

Zerodha's Core Lesson

Zerodha's lesson is:

The fastest request is the one where the server does almost no work.

Their design is not about having the most complex distributed system.

It is about reducing work in the hot path.

Keep prepared data in Redis.

Use HTTP caching where possible.

Avoid unnecessary serialization.

Avoid unnecessary database calls.

Avoid unnecessary network transfer.

This is engineering maturity.

Not every system needs a complex event-driven architecture.

Sometimes the best architecture is the boring one that does the least amount of work.

3. Flipkart: Aerospike For Massive Real-Time Sale Traffic

Flipkart has a different scale problem.

During Big Billion Days, traffic spikes are massive.

Users are searching, scrolling, viewing products, checking prices, seeing recommendations, clicking ads, adding to cart, and buying products.

This creates huge read and write pressure across many systems.

Flipkart's public Aerospike case studies talk about very high query volume across multiple data centers and many use cases.

The interesting part is that Aerospike is not being used for one small cache.

It powers many low-latency, high-throughput use cases.

Examples include:

  • Search
  • Recommendations
  • Ads
  • Pricing
  • Inventory
  • Offers
  • Feature stores
  • Real-time user experiences

Why Redis Alone May Not Be Enough Here

Redis is great when the dataset fits comfortably in memory and the access pattern is simple.

But large e-commerce systems often have massive datasets and extreme traffic spikes.

Keeping everything purely in RAM can become expensive.

Operationally, managing many use cases separately can also become difficult.

Aerospike is designed for high-throughput, low-latency workloads and can use SSDs efficiently while still giving predictable latency.

That makes it attractive for very large real-time systems.

Flipkart-Style Data Platform Thinking

Instead of every team running its own random cache/database cluster, a platform approach is better.

A simplified version looks like this:

Application Teams
    -> Shared Aerospike Platform
        -> Low-latency real-time reads/writes
        -> Search, pricing, inventory, ads, recommendations

The goal is not only speed.

The goal is operational control.

When a company has hundreds of backend services, it cannot let every team invent a new caching layer independently.

A shared platform gives:

  • Standardized operations
  • Predictable latency
  • Better capacity planning
  • Centralized expertise
  • Lower operational chaos

That matters during sale events.

Why This Matters During Big Billion Days

During normal traffic, many systems look fine.

During sale traffic, weak systems break.

A sale event creates synchronized user behavior.

Millions of people come at the same time.

They search at the same time.

They check prices at the same time.

They open product pages at the same time.

They refresh at the same time.

That means tail latency becomes extremely important.

Average latency is not enough.

If the 99th percentile latency becomes bad, users will feel the app is slow even if the average looks fine.

So Flipkart's design needs a data layer that can handle massive concurrency with predictable performance.

That is where Aerospike fits into the story.

Flipkart's Core Lesson

Flipkart's lesson is:

At very large scale, caching becomes a platform problem, not just an application-level trick.

A small startup can add Redis and move on.

A large e-commerce company needs a managed, standardized, highly available, low-latency data platform.

The problem is not just "how do I make one API faster?"

The real problem is:

How do hundreds of teams serve real-time data under sale traffic without each team reinventing infrastructure?

That is a very different level of engineering.

4. JioHotstar: Split Cacheable And Non-Cacheable APIs

JioHotstar has another kind of scale problem.

Streaming platforms deal with huge concurrency.

During a cricket match, millions of users may join at the same time.

The backend has to support login, playback, recommendations, metadata, live match state, ads, subscriptions, and many other services.

But not all APIs are equal.

Some APIs are cacheable.

Some APIs are not.

That distinction is extremely important.

Cacheable Vs Non-Cacheable APIs

A cacheable API returns data that can be reused across many users or for a short time window.

Examples:

  • Match metadata
  • Static content metadata
  • Home page modules
  • Popular content lists
  • Configuration
  • Some recommendation blocks

A non-cacheable API is user-specific or highly sensitive.

Examples:

  • User subscription status
  • Payment status
  • Personalized entitlements
  • User session data
  • Watch history writes

If both types of APIs go through the exact same gateway path with the exact same rules, the system wastes capacity.

Cacheable traffic should be optimized for throughput.

Non-cacheable traffic should be optimized for correctness and security.

JioHotstar-Style CDN/API Split

A simplified version looks like this:

Client
    -> Cacheable API domain
        -> CDN optimized path
        -> lighter gateway path

Client
    -> Non-cacheable API domain
        -> stricter gateway path
        -> origin services

The powerful idea is separation.

Instead of treating every API the same, the platform splits traffic based on cacheability.

Cacheable APIs can be aggressively cached and served through CDN-friendly paths.

Non-cacheable APIs can keep stronger authentication, authorization, and origin checks.

This improves throughput because the system does not force every request through the heaviest possible path.

Why This Works For Streaming Scale

In a live event, many users request the same or similar metadata.

If every user hits origin services for the same data, the backend wastes compute.

But if the CDN can serve cacheable responses, origin load drops dramatically.

This is especially important because video streaming already puts huge pressure on CDN and network infrastructure.

The API layer must not become the bottleneck.

Separating cacheable and non-cacheable APIs helps the system scale more cleanly.
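One way to express the split is a per-route cache policy at the edge. The routes and TTLs below are invented for illustration: cacheable endpoints get a shared-cache TTL the CDN can honor, and everything else is marked so the CDN never stores it.

```python
# hypothetical route table: cacheable endpoints and their CDN TTLs (seconds)
CACHEABLE_ROUTES = {
    "/match/metadata": 30,
    "/home/modules": 60,
}

def cache_headers(path):
    ttl = CACHEABLE_ROUTES.get(path)
    if ttl is not None:
        # shared caches (CDN) may store and reuse this response
        return {"Cache-Control": f"public, s-maxage={ttl}"}
    # user-specific: never stored at the edge, always hits origin
    return {"Cache-Control": "private, no-store"}
```

Even a 30-second TTL is enough during a live match: when millions of clients request the same metadata inside that window, all but the first request stop at the CDN.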

JioHotstar's Core Lesson

JioHotstar's lesson is:

Not every request deserves the same infrastructure path.

Some traffic should go through a fast cached route.

Some traffic should go through a stricter dynamic route.

The mistake many teams make is putting all APIs behind one uniform gateway policy.

Uniform design feels clean, but at scale it can become expensive.

Smart separation improves throughput.

5. Comparing The Four Architectures

These companies are not using the same caching strategy because they do not have the same problem.

That is the main point.

Company    | Main Problem                               | Caching Strategy                                                  | Core Lesson
Myntra     | Inventory correctness during flash sales   | L1 near cache + L2 distributed cache + DB fallback + invalidation | Move hot reads closer, but handle invalidation carefully
Zerodha    | Fast user-specific read responses          | Hot Redis cache + HTTP ETag/304                                   | Make the hot path do almost no work
Flipkart   | Huge sale-scale real-time traffic          | Aerospike as shared low-latency data platform                     | Caching becomes a platform problem
JioHotstar | Massive concurrency during live streaming  | Split cacheable and non-cacheable API paths                       | Route traffic based on cacheability

6. The Pattern Behind All Of Them

Even though the implementations differ, the thought process is similar.

A senior engineer does not start with:

Which cache should we use?

A senior engineer starts with:

What is the failure mode?

For Myntra, the failure mode is overselling.

For Zerodha, the failure mode is doing too much work per request.

For Flipkart, the failure mode is losing predictable latency during sale spikes.

For JioHotstar, the failure mode is pushing all traffic through the same expensive gateway path.

Once you understand the failure mode, the architecture becomes much easier to reason about.

7. How To Think About This Architecture As A Backend Engineer

When designing a cache architecture, ask these questions first:

1. Can The Data Be Stale?

If yes, caching is easy.

If no, caching becomes a consistency problem.

Inventory cannot be stale for too long.

User-specific financial/trading data needs careful freshness.

Static metadata can be cached aggressively.

2. Is The Data Shared Or User-Specific?

Shared data is CDN/cache friendly.

User-specific data needs more careful cache keys and invalidation.

For example:

Product image -> highly cacheable
Product metadata -> cacheable
Inventory count -> carefully cacheable
User portfolio -> user-specific cache
Payment status -> usually dynamic

3. Is The Bottleneck Database, CPU, Network, Or Bandwidth?

Caching solves different problems depending on where the bottleneck is.

Database bottleneck -> add read cache or replicas
CPU bottleneck -> cache serialized responses
Network bottleneck -> move cache closer to app or client
Bandwidth bottleneck -> use ETag / CDN / compression
Consistency bottleneck -> improve invalidation and write path

4. What Happens On Cache Miss?

This is very important.

A healthy cache design has a clear fallback.

Cache miss -> read from source of truth or replica -> refill cache

A dangerous cache design has no clean fallback.

Then the cache starts behaving like a database.

That increases risk.

5. How Will The Cache Be Invalidated?

Caching is easy.

Invalidation is hard.

There are many ways to invalidate:

  • TTL based invalidation
  • Explicit delete on write
  • Event-driven invalidation
  • CDC-based invalidation
  • Versioned keys
  • ETag-based client validation

The right option depends on the use case.

For inventory, event-driven or CDC-based invalidation can be useful.

For HTTP responses, ETags are powerful.

For static content, CDN TTL may be enough.
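Versioned keys deserve a quick sketch because they sidestep racy deletes: invalidation just bumps a counter, so old entries are simply never read again. In a real store they would age out via TTL; the dict here skips that part.

```python
cache = {}      # stand-in for the cache store
versions = {}   # entity -> current version number

def cache_key(entity):
    return f"{entity}:v{versions.get(entity, 0)}"

def put(entity, value):
    cache[cache_key(entity)] = value

def get(entity):
    return cache.get(cache_key(entity))  # always reads the current version

def invalidate(entity):
    versions[entity] = versions.get(entity, 0) + 1  # one write, no deletes
```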

8. Final Mental Model

Caching is not one technology.

Caching is a design decision at many layers.

You can cache at the browser.

You can cache at the CDN.

You can cache at the API gateway.

You can cache inside the application process.

You can cache in Redis.

You can use Aerospike as a low-latency data platform.

You can cache serialized responses.

You can cache database rows.

You can cache computed views.

The question is not:

Should we use Redis?

The better question is:

Where should this data live so that the system stays fast, correct, and simple under peak load?

That is how companies like Myntra, Zerodha, Flipkart, and JioHotstar think about infrastructure.

They are not blindly adding caches.

They are designing the hot path around their real failure modes.

That is the difference between beginner caching and production infrastructure engineering.

9. A Simple Architecture Inspired By Myntra's Inventory Read Path

A clean version of the inventory architecture looks like this:

User
    -> CDN
    -> API Gateway
    -> Load Balancer
    -> Inventory API
    -> L1 Near Cache
    -> Cache Hit?
        ├── Yes -> Inventory Response
        └── No  -> L2 Distributed Cache
                     -> MySQL Read Replica
                     -> Fill L2 Cache
                     -> Fill L1 Cache
                     -> Inventory Response

The write and invalidation side looks like this:

Checkout API
    -> Order Service
    -> Inventory Writer
    -> MySQL Master
    -> Binlog CDC
    -> Inventory Events
    -> Cache Invalidator
        ├── Invalidate L1 Near Cache
        └── Invalidate L2 Distributed Cache

This architecture is useful because it separates two concerns.

The read path is optimized for speed.

The write path is optimized for correctness.

The invalidation path connects both sides.

That is the real system design pattern.

Bhupesh Kumar

Backend engineer building scalable APIs and distributed systems with Node.js, TypeScript, and Go.