Most people learn caching as a simple idea:
Put Redis in front of the database so reads become faster.
That is correct at a beginner level, but it is not how large systems actually evolve.
At real scale, caching is not just about speed. It becomes a question of correctness, failure handling, cost, bandwidth, tail latency, and operational simplicity.
This is why different companies solve the same problem in very different ways.
Myntra moved inventory reads closer to the application with a near-cache architecture.
Zerodha kept the hot read path extremely simple with Redis and HTTP ETags.
Flipkart uses Aerospike as a high-throughput real-time data layer for large-scale sale traffic.
JioHotstar optimizes throughput by separating cacheable and non-cacheable APIs at the CDN/API layer.
All four are caching stories, but the design decisions are completely different.
The important lesson is this:
There is no universal caching architecture. The correct design depends on what breaks first in your system.
1. Myntra: Inventory Caching Where Consistency Matters More Than Simplicity
Myntra is one of the best examples because the problem is very easy to understand.
Imagine a flash sale.
A product has only 5 units left.
Thousands of users are opening the product page, adding it to cart, and trying to buy it at the same time.
For a normal product page, showing slightly stale data may be acceptable.
But for inventory, stale data is dangerous.
If the system says an item is available when it is already sold out, customers can place orders that cannot be fulfilled. That becomes an overselling problem.
So inventory caching is not just a read-scaling problem. It is a correctness problem.
The Original Myntra-Style Problem
The common architecture looks something like this:
```
Inventory DB -> Central Redis Cache -> Inventory API -> User
```
The application reads inventory data from a centralized cache service.
At first, this looks reasonable.
Redis is fast. Redis can handle large read volume. Application servers do not need to hit the database directly.
But the issue appears when updates happen.
Inventory changes very frequently:
```
Order placed
Order cancelled
Stock reserved
Stock released
Warehouse stock updated
Return received
```
If the database updates first and the cache updates asynchronously after that, there is always a small time window where the cache can be stale.
For normal metadata, this may not be a big deal.
For inventory, that small window can become overselling during peak traffic.
That is the core problem Myntra was solving.
Why Centralized Redis Became a Bottleneck
A centralized Redis-backed cache creates a few problems at very high scale.
First, every inventory read goes over the network.
Even if Redis is fast, network calls are not free. When you are doing millions of lookups, network latency and network bandwidth become serious concerns.
Second, Redis becomes a central dependency.
If every product page, cart page, and checkout flow depends on one central cache layer, then that cache becomes part of the critical path.
Third, cache misses become dangerous if the cache is treated almost like a persistent store.
In a clean cache design, a cache miss should fall back to the source of truth.
But if the cache is designed in such a way that the application expects the data to always be present, then it is no longer just a cache. It behaves like a pseudo-source-of-truth.
That makes failure modes harder.
Myntra's Better Direction: Move Reads Closer To The Application
Myntra's solution was not "remove caching."
The better way to say it is:
Myntra removed the centralized Redis-backed cache from the hot inventory read path and moved to a near-cache architecture.
A simplified version looks like this:
```
Inventory API
  -> L1 Near Cache (inside/near the application)
  -> L2 Distributed Cache
  -> MySQL Read Replica
```
The important idea is that the fastest read should happen closest to the service.
So instead of every request going to a centralized Redis service over the network, the application can serve many reads from an in-process or near-process cache.
This reduces network calls, reduces load on the central cache, and improves read latency.
How The Read Path Works
The inventory read path can be walked through step by step:
```
User requests product inventory
  ↓
Inventory API receives request
  ↓
Check L1 Near Cache
  ↓
If cache hit: return response immediately
  ↓
If cache miss: check L2 distributed cache
  ↓
If needed: read from MySQL read replica
  ↓
Fill cache again
  ↓
Return response
```
This is a multi-tier cache.
L1 cache is fastest because it is closest to the application.
L2 cache is shared and distributed, so it helps when local cache does not have the data.
MySQL read replica is the fallback read source.
MySQL master remains the source of truth for writes.
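To make the tiering concrete, here is a minimal sketch of that read path in Python. The class shape, the dictionary-based L1, and the Redis/MySQL client calls are illustrative assumptions, not Myntra's actual code.

```python
import time

class TieredInventoryReader:
    """Illustrative L1 (in-process) + L2 (Redis) + replica read path."""

    def __init__(self, redis_client, replica_conn, l1_ttl_seconds=2):
        self.redis = redis_client      # L2: shared distributed cache
        self.replica = replica_conn    # MySQL read replica (fallback)
        self.l1 = {}                   # L1: tiny in-process cache
        self.l1_ttl = l1_ttl_seconds

    def get_stock(self, sku: str) -> int:
        # 1. L1 near cache: no network call at all.
        entry = self.l1.get(sku)
        if entry is not None and time.monotonic() - entry[1] < self.l1_ttl:
            return entry[0]

        # 2. L2 distributed cache: one fast network hop.
        raw = self.redis.get(f"inv:{sku}")
        if raw is not None:
            stock = int(raw)
        else:
            # 3. Cache miss: fall back to the read replica, refill L2.
            cur = self.replica.cursor()
            cur.execute("SELECT stock FROM inventory WHERE sku = %s", (sku,))
            row = cur.fetchone()
            stock = row[0] if row else 0
            self.redis.set(f"inv:{sku}", stock, ex=30)

        # 4. Refill L1 so the next hot read is served locally.
        self.l1[sku] = (stock, time.monotonic())
        return stock
```

Note the short L1 TTL: even before any explicit invalidation arrives, a local entry can only stay stale for a couple of seconds.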
Why This Helps During Flash Sales
During a sale, the same product inventory may be read again and again.
If every request goes to Redis or the database, the backend has to handle massive repeated reads.
But if each application instance has a near cache, many repeated reads can be served locally.
That means:
- Less network traffic
- Lower cache cluster pressure
- Lower database pressure
- Lower latency
- Better resource usage
The architecture becomes more efficient because the system stops doing unnecessary remote calls for data that is already hot.
The Hard Part: Invalidation
Near cache is powerful, but it creates one hard problem:
How do you make sure local caches do not keep stale inventory forever?
That is where invalidation comes in.
A common design is:
```
MySQL Master
  -> Binlog CDC
  -> Inventory Events
  -> Cache Invalidator
  -> Invalidate L1 and L2 cache
```
When inventory changes in the master database, the change is captured from the database log.
That change becomes an inventory event.
A cache invalidator consumes the event and clears or updates the affected cache keys.
This keeps the read path fast while still giving the system a way to remove stale data.
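A minimal sketch of the invalidator side, assuming the change events land on a Kafka topic named `inventory-events` and carry a `sku` field; the topic name, payload shape, and client libraries are assumptions for illustration:

```python
import json

import redis
from kafka import KafkaConsumer  # assumed consumer; any event bus works

r = redis.Redis(host="cache", port=6379)
consumer = KafkaConsumer(
    "inventory-events",  # assumed topic fed by binlog CDC
    value_deserializer=lambda b: json.loads(b),
)

for event in consumer:
    sku = event.value["sku"]  # assumed payload shape
    # Evict from L2 so the next read refills from the replica.
    r.delete(f"inv:{sku}")
    # L1 lives inside each application process and cannot be deleted
    # remotely, so broadcast the key; every instance subscribes to this
    # channel and evicts the entry from its own local cache.
    r.publish("l1-invalidate", sku)
```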
Myntra's Core Lesson
Myntra's story is not simply "Redis is bad."
Redis is excellent for many use cases.
The real lesson is:
If your cache becomes a correctness risk, your architecture has to change.
For inventory, consistency matters more than blindly centralizing every read through Redis.
Myntra optimized the read path by moving hot reads closer to the application and relying on a stronger invalidation pipeline.
That is why this architecture is so interesting.
Sometimes scaling is not about adding another bigger cache.
Sometimes scaling means removing the wrong cache from the wrong place.
2. Zerodha: Keep The Hot Path Extremely Simple
Zerodha's caching philosophy is almost the opposite of Myntra's.
Zerodha runs Kite, a trading platform where users constantly check orders, positions, holdings, margins, and portfolio data.
The key challenge is not the same as e-commerce inventory.
For Zerodha, the hot path is mostly about serving user-specific read responses very quickly and reliably.
Their public engineering writing explains a very simple idea:
Every bit of data shown on Kite comes from a hot Redis cache.
This includes things like orders, positions, and portfolio data.
The key idea is simplicity.
The Zerodha-Style Read Path
A simplified version looks like this:
```
User opens Kite
  ↓
API receives request
  ↓
API does O(1) lookup in Redis
  ↓
Raw JSON bytes are returned
```
The important part is what does not happen in the hot path.
The system does not recompute everything again.
It does not do heavy joins.
It does not do expensive serialization repeatedly.
It does not hit the database for every screen refresh.
It reads already prepared data from Redis and returns it.
That is why the design is powerful.
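As a concrete sketch of that hot path, here is what an endpoint returning pre-serialized bytes from Redis can look like; Flask, the key layout, and the background writer are illustrative assumptions, not Zerodha's actual code:

```python
import redis
from flask import Flask, Response, abort

app = Flask(__name__)
r = redis.Redis(host="cache", port=6379)

@app.route("/positions/<user_id>")
def positions(user_id: str):
    # O(1) lookup; the value is already-serialized JSON bytes,
    # written by a separate process whenever positions change.
    raw = r.get(f"positions:{user_id}")
    if raw is None:
        abort(404)
    # No recomputation, no joins, no fresh serialization:
    # hand back the stored bytes exactly as they are.
    return Response(raw, mimetype="application/json")
```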
Why This Works For Zerodha
Trading apps have a very specific traffic pattern.
Many users repeatedly open the same screens:
```
Positions
Holdings
Orders
Funds
Portfolio
```
If each request recomputes the response, the backend wastes CPU.
If each request hits the database, the database becomes overloaded.
If each request serializes fresh JSON, the API layer does unnecessary work.
So Zerodha keeps the hot response ready in Redis.
Then the API layer becomes almost a thin delivery mechanism.
That is the beauty of the design.
ETags And HTTP 304 Caching
Zerodha also uses HTTP caching ideas, especially ETags and 304 responses.
The concept is simple.
The server gives a version tag for a response.
The client stores that response locally.
Next time, the client asks:
```
Has this data changed since this ETag?
```
If the data has not changed, the server does not send the full response again.
It can return:
```
304 Not Modified
```
This saves bandwidth.
That matters a lot on mobile networks.
It also reduces backend load because the system avoids repeatedly sending large unchanged payloads.
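A minimal sketch of that handshake, again assuming Flask; hashing the body into the ETag and checking `If-None-Match` is standard HTTP, but the surrounding code is illustrative:

```python
import hashlib

from flask import Flask, Response, request

app = Flask(__name__)

def get_portfolio_json(user_id: str) -> bytes:
    # Stand-in for the hot Redis lookup shown earlier.
    return b'{"holdings": []}'

@app.route("/portfolio/<user_id>")
def portfolio(user_id: str):
    body = get_portfolio_json(user_id)
    etag = hashlib.md5(body).hexdigest()

    # Client already holds this exact version: send an empty 304
    # instead of shipping the full payload again.
    if request.headers.get("If-None-Match") == etag:
        return Response(status=304)

    resp = Response(body, mimetype="application/json")
    resp.headers["ETag"] = etag
    return resp
```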
Zerodha's Core Lesson
Zerodha's lesson is:
The fastest request is the one where the server does almost no work.
Their design is not about having the most complex distributed system.
It is about reducing work in the hot path.
Keep prepared data in Redis.
Use HTTP caching where possible.
Avoid unnecessary serialization.
Avoid unnecessary database calls.
Avoid unnecessary network transfer.
This is engineering maturity.
Not every system needs a complex event-driven architecture.
Sometimes the best architecture is the boring one that does the least amount of work.
3. Flipkart: Aerospike For Massive Real-Time Sale Traffic
Flipkart has a different scale problem.
During Big Billion Days, traffic spikes are massive.
Users are searching, scrolling, viewing products, checking prices, seeing recommendations, clicking ads, adding to cart, and buying products.
This creates huge read and write pressure across many systems.
Flipkart's public Aerospike case studies talk about very high query volume across multiple data centers and many use cases.
The interesting part is that Aerospike is not being used for one small cache.
It powers many low-latency, high-throughput use cases.
Examples include:
- Search
- Recommendations
- Ads
- Pricing
- Inventory
- Offers
- Feature stores
- Real-time user experiences
Why Redis Alone May Not Be Enough Here
Redis is great when the dataset fits comfortably in memory and the access pattern is simple.
But large e-commerce systems often have massive datasets and extreme traffic spikes.
Keeping everything purely in RAM can become expensive.
Operationally, managing many use cases separately can also become difficult.
Aerospike is designed for high-throughput, low-latency workloads and can use SSDs efficiently while still giving predictable latency.
That makes it attractive for very large real-time systems.
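To ground what "a shared low-latency data layer" means for an application team, here is a minimal read/write sketch using the official `aerospike` Python client; the hosts, namespace, set, and bin names are made up for illustration:

```python
import aerospike

# Connect to the shared cluster (host list is illustrative).
config = {"hosts": [("aerospike-node-1", 3000)]}
client = aerospike.client(config).connect()

# Aerospike keys are (namespace, set, user_key) tuples.
key = ("pricing", "sku_prices", "SKU-12345")

# Low-latency write: the record can live on SSD while its
# index stays in RAM, which keeps latency predictable.
client.put(key, {"price": 49900, "currency": "INR"})

# Low-latency read: a single network hop to the owning node.
_, _meta, bins = client.get(key)
print(bins["price"])

client.close()
```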
Flipkart-Style Data Platform Thinking
Instead of every team running its own ad-hoc cache or database cluster, a platform approach works better.
A simplified version looks like this:
```
Application Teams
  ↓
Shared Aerospike Platform
  ↓
Low-latency real-time reads/writes
  ↓
Search, pricing, inventory, ads, recommendations
```
The goal is not only speed.
The goal is operational control.
When a company has hundreds of backend services, it cannot let every team invent a new caching layer independently.
A shared platform gives:
- Standardized operations
- Predictable latency
- Better capacity planning
- Centralized expertise
- Lower operational chaos
That matters during sale events.
Why This Matters During Big Billion Days
During normal traffic, many systems look fine.
During sale traffic, weak systems break.
A sale event creates synchronized user behavior.
Millions of people come at the same time.
They search at the same time.
They check prices at the same time.
They open product pages at the same time.
They refresh at the same time.
That means tail latency becomes extremely important.
Average latency is not enough.
If the 99th percentile latency becomes bad, users will feel the app is slow even if the average looks fine.
So Flipkart's design needs a data layer that can handle massive concurrency with predictable performance.
That is where Aerospike fits into the story.
Flipkart's Core Lesson
Flipkart's lesson is:
At very large scale, caching becomes a platform problem, not just an application-level trick.
A small startup can add Redis and move on.
A large e-commerce company needs a managed, standardized, highly available, low-latency data platform.
The problem is not just "how do I make one API faster?"
The real problem is:
```
How do hundreds of teams serve real-time data under sale traffic
without each team reinventing infrastructure?
```
That is a very different level of engineering.
4. JioHotstar: Split Cacheable And Non-Cacheable APIs
JioHotstar has another kind of scale problem.
Streaming platforms deal with huge concurrency.
During a cricket match, millions of users may join at the same time.
The backend has to support login, playback, recommendations, metadata, live match state, ads, subscriptions, and many other services.
But not all APIs are equal.
Some APIs are cacheable.
Some APIs are not.
That distinction is extremely important.
Cacheable Vs Non-Cacheable APIs
A cacheable API returns data that can be reused across many users or for a short time window.
Examples:
- Match metadata
- Static content metadata
- Home page modules
- Popular content lists
- Configuration
- Some recommendation blocks
A non-cacheable API is user-specific or highly sensitive.
Examples:
- User subscription status
- Payment status
- Personalized entitlements
- User session data
- Watch history writes
If both types of APIs go through the exact same gateway path with the exact same rules, the system wastes capacity.
Cacheable traffic should be optimized for throughput.
Non-cacheable traffic should be optimized for correctness and security.
JioHotstar-Style CDN/API Split
A simplified version looks like this:
```
Client
  -> Cacheable API domain
  -> CDN-optimized path
  -> lighter gateway path

Client
  -> Non-cacheable API domain
  -> stricter gateway path
  -> origin services
```
The powerful idea is separation.
Instead of treating every API the same, the platform splits traffic based on cacheability.
Cacheable APIs can be aggressively cached and served through CDN-friendly paths.
Non-cacheable APIs can keep stronger authentication, authorization, and origin checks.
This improves throughput because the system does not force every request through the heaviest possible path.
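One concrete way to express the split is through `Cache-Control` headers, so the CDN knows exactly which responses it may store; this Flask sketch is illustrative, not JioHotstar's actual setup:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/match/<match_id>/metadata")
def match_metadata(match_id: str):
    # Identical for every viewer: let the CDN serve it for a
    # few seconds before revalidating with the origin.
    resp = jsonify({"match": match_id, "status": "live"})
    resp.headers["Cache-Control"] = "public, s-maxage=5"
    return resp

@app.route("/api/me/subscription")
def subscription():
    # User-specific and sensitive: must never be stored at the edge.
    resp = jsonify({"plan": "premium", "active": True})
    resp.headers["Cache-Control"] = "private, no-store"
    return resp
```

In practice the separation usually goes further than headers, with distinct hostnames so cacheable traffic can be routed down the CDN-heavy path shown above.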
Why This Works For Streaming Scale
In a live event, many users request the same or similar metadata.
If every user hits origin services for the same data, the backend wastes compute.
But if the CDN can serve cacheable responses, origin load drops dramatically.
This is especially important because video streaming already puts huge pressure on CDN and network infrastructure.
The API layer must not become the bottleneck.
Separating cacheable and non-cacheable APIs helps the system scale more cleanly.
JioHotstar's Core Lesson
JioHotstar's lesson is:
Not every request deserves the same infrastructure path.
Some traffic should go through a fast cached route.
Some traffic should go through a stricter dynamic route.
The mistake many teams make is putting all APIs behind one uniform gateway policy.
Uniform design feels clean, but at scale it can become expensive.
Smart separation improves throughput.
5. Comparing The Four Architectures
These companies are not using the same caching strategy because they do not have the same problem.
That is the main point.
| Company | Main Problem | Caching Strategy | Core Lesson |
|---|---|---|---|
| Myntra | Inventory correctness during flash sales | L1 near cache + L2 distributed cache + DB fallback + invalidation | Move hot reads closer, but handle invalidation carefully |
| Zerodha | Fast user-specific read responses | Hot Redis cache + HTTP ETag/304 | Make the hot path do almost no work |
| Flipkart | Huge sale-scale real-time traffic | Aerospike as shared low-latency data platform | Caching becomes a platform problem |
| JioHotstar | Massive concurrency during live streaming | Split cacheable and non-cacheable API paths | Route traffic based on cacheability |
6. The Pattern Behind All Of Them
Even though the implementations differ, the thought process is similar.
A senior engineer does not start with:
```
Which cache should we use?
```
A senior engineer starts with:
```
What is the failure mode?
```
For Myntra, the failure mode is overselling.
For Zerodha, the failure mode is doing too much work per request.
For Flipkart, the failure mode is losing predictable latency during sale spikes.
For JioHotstar, the failure mode is pushing all traffic through the same expensive gateway path.
Once you understand the failure mode, the architecture becomes much easier to reason about.
7. How To Think About This Architecture As A Backend Engineer
When designing a cache architecture, ask these questions first:
1. Can The Data Be Stale?
If yes, caching is easy.
If no, caching becomes a consistency problem.
Inventory cannot be stale for too long.
User-specific financial/trading data needs careful freshness.
Static metadata can be cached aggressively.
2. Is The Bottleneck Database, CPU, Network, Or Bandwidth?
Caching solves different problems depending on where the bottleneck is.
```
Database bottleneck    -> add read cache or replicas
CPU bottleneck         -> cache serialized responses
Network bottleneck     -> move cache closer to app or client
Bandwidth bottleneck   -> use ETag / CDN / compression
Consistency bottleneck -> improve invalidation and write path
```
3. What Happens On Cache Miss?
This is very important.
A healthy cache design has a clear fallback.
```
Cache miss -> read from source of truth or replica -> refill cache
```
A dangerous cache design has no clean fallback.
Then the cache starts behaving like a database.
That increases risk.
4. How Will The Cache Be Invalidated?
Caching is easy.
Invalidation is hard.
There are many ways to invalidate:
- TTL-based invalidation
- Explicit delete on write
- Event-driven invalidation
- CDC-based invalidation
- Versioned keys
- ETag-based client validation
The right option depends on the use case.
For inventory, event-driven or CDC-based invalidation can be useful.
For HTTP responses, ETags are powerful.
For static content, CDN TTL may be enough.
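As one example from the list above, versioned keys avoid explicit deletes entirely: the writer bumps a version pointer, and stale entries simply stop being read. A minimal sketch with redis-py (key names are illustrative):

```python
import json

import redis

r = redis.Redis(host="cache", port=6379)

def write_product(product_id: str, data: dict) -> None:
    # Store the new value under the next version first...
    new_version = r.incr(f"product:{product_id}:next")
    r.set(f"product:{product_id}:v{new_version}", json.dumps(data), ex=3600)
    # ...then flip the read pointer, so readers only ever see
    # fully-written versions.
    r.set(f"product:{product_id}:version", new_version)

def read_product(product_id: str) -> dict | None:
    # Resolve the current version, then fetch that exact key.
    version = r.get(f"product:{product_id}:version")
    if version is None:
        return None
    raw = r.get(f"product:{product_id}:v{int(version)}")
    return json.loads(raw) if raw else None
```

Old versions are never deleted explicitly; they expire via TTL once the pointer has moved on.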
8. Final Mental Model
Caching is not one technology.
Caching is a design decision at many layers.
You can cache at the browser.
You can cache at the CDN.
You can cache at the API gateway.
You can cache inside the application process.
You can cache in Redis.
You can use Aerospike as a low-latency data platform.
You can cache serialized responses.
You can cache database rows.
You can cache computed views.
The question is not:
```
Should we use Redis?
```
The better question is:
```
Where should this data live so that the system stays fast,
correct, and simple under peak load?
```
That is how companies like Myntra, Zerodha, Flipkart, and JioHotstar think about infrastructure.
They are not blindly adding caches.
They are designing the hot path around their real failure modes.
That is the difference between beginner caching and production infrastructure engineering.
9. A Simple Architecture Inspired By Myntra's Inventory Read Path
A clean version of the inventory architecture looks like this:
```
User
  ↓
CDN
  ↓
API Gateway
  ↓
Load Balancer
  ↓
Inventory API
  ↓
L1 Near Cache
  ↓
Cache Hit?
  ├── Yes -> Inventory Response
  └── No  -> L2 Distributed Cache
                ↓
              MySQL Read Replica
                ↓
              Fill L2 Cache
                ↓
              Fill L1 Cache
                ↓
              Inventory Response
```
The write and invalidation side looks like this:
```
Checkout API
  ↓
Order Service
  ↓
Inventory Writer
  ↓
MySQL Master
  ↓
Binlog CDC
  ↓
Inventory Events
  ↓
Cache Invalidator
  ├── Invalidate L1 Near Cache
  └── Invalidate L2 Distributed Cache
```
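The correctness-critical step in that write path is the decrement on the MySQL master. A common guard is a conditional UPDATE, so stock can never go negative no matter how many checkouts race; this is a sketch with an assumed table shape and a generic DB-API connection, not Myntra's confirmed implementation:

```python
def reserve_stock(conn, sku: str, qty: int) -> bool:
    """Atomically reserve stock; returns False if not enough is left."""
    cur = conn.cursor()
    # The WHERE clause is the guard against overselling: the decrement
    # only applies if enough stock remains at the moment of the update.
    cur.execute(
        "UPDATE inventory SET stock = stock - %s "
        "WHERE sku = %s AND stock >= %s",
        (qty, sku, qty),
    )
    conn.commit()
    return cur.rowcount == 1
```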
This architecture is useful because it separates two concerns.
The read path is optimized for speed.
The write path is optimized for correctness.
The invalidation path connects both sides.
That is the real system design pattern.
