Redis + Varnish + OpenSearch Tuning for Magento 2.4.x
Most Magento performance guides stop at FPM_CHILDREN and "enable Varnish." The real wins live in six config keys nobody talks about: Redis eviction policy, Varnish ESI hold-times, OpenSearch query clause limits, and a handful of header rules. Here is the diff per key, the ROI measured on a real 80k-SKU prod trace, and the safe-rollback notes.
Redis, Varnish, and OpenSearch tuning for Magento 2.4.x is the practice of overriding the six default config keys that Adobe leaves at conservative single-tenant values, which is what makes a stock production stack p99-slow on any catalog north of 30,000 SKUs. The fix is not bigger hardware — it is six edits across env.php, default.vcl, and opensearch.yml. Here is what each one does, the before/after latency from a real prod trace, and the rollback if it goes sideways.
The trace this is built on
Every number in this post comes from one 80,000-SKU Magento 2.4.9 + Hyvä 1.3 store on a 6-core, 32 GB VPS running PHP 8.4, Redis 7.2, Varnish 7.5, and OpenSearch 2.13. Traffic profile: 1.4 M sessions/month, 22% cart-add rate, layered nav across 67 filterable attributes. Tracing via OpenTelemetry into Tempo, p99 captured over a 7-day window before and after each change.
Adobe ships the default config for the lowest common denominator — a single-store 2k-SKU dev install on a laptop. Production is a different planet.
If your store is smaller than 10,000 SKUs and serves under 100k sessions/month, you will not see the deltas below. Read it anyway — you will need it on the next migration.
The six config keys, in order of impact
Here is the cheat sheet before the deep dive. Each row maps to one section below with the diff, the rollback, and the measured p99 delta.
| Config key | Default | Production value | p99 delta |
|---|---|---|---|
| Redis maxmemory-policy | noeviction | allkeys-lru | -410 ms category |
| Redis cache_id_prefix | blank | per-store unique | fixes cache poisoning |
| Redis activedefrag | no | yes | -140 ms after 7 days uptime |
| Varnish X-Forwarded-For trust | off | Cloudflare CIDRs only | fixes hit-ratio collapse |
| Varnish ESI ttl for cart/wishlist | 0s | 60s with surrogate purge | -580 ms cart-summary block |
| OpenSearch max_clause_count | 1024 | 8192 | fixes 500 on layered nav |
Redis fix 1: switch maxmemory-policy off noeviction
Redis ships with maxmemory-policy=noeviction, which means that once the cache fills, Magento receives the error "OOM command not allowed when used memory > 'maxmemory'" on every SET. Magento's cache layer does not crash. It silently degrades to no-cache mode, and every page recomputes from MySQL.
You will not see this in var/log/system.log. You will see it in p99 latency climbing over the week as the Redis instance fills.
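One way to catch this before it shows up in p99 is to check how close used_memory sits to maxmemory while the policy is still noeviction. Below is a minimal Python sketch, assuming you feed it the text of `redis-cli INFO memory`. The function names, the 95% threshold, and the sample values are illustrative assumptions, not part of the trace.

```python
def parse_info(text: str) -> dict[str, str]:
    """Parse the `key:value` lines of a Redis INFO reply into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and section headers
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def at_memory_ceiling(info: dict[str, str], threshold: float = 0.95) -> bool:
    """True when Redis cannot evict and used_memory is close to maxmemory."""
    maxmemory = int(info.get("maxmemory", "0"))
    if maxmemory == 0:  # 0 means no memory limit is configured
        return False
    used = int(info["used_memory"])
    return info.get("maxmemory_policy") == "noeviction" and used / maxmemory >= threshold


# Hypothetical sample shaped like `redis-cli INFO memory` output.
SAMPLE = """# Memory
used_memory:8525510246
maxmemory:8589934592
maxmemory_policy:noeviction
"""

if __name__ == "__main__":
    print(at_memory_ceiling(parse_info(SAMPLE)))
```

Running a check like this from cron and alerting on a True result would surface the silent no-cache mode days before the latency graph does.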
The diff
# /etc/redis/redis.conf
maxmemory 8gb
maxmemory-policy allkeys-lru
maxmemory-samples 10
allkeys-lru evicts the least-recently-used key regardless of TTL. maxmemory-samples 10 raises the eviction-pool sample size from the default 5 — the eviction picks a better LRU candidate at the cost of ~3% more CPU on writes. Worth it on any store that hits the memory ceiling daily.
Measured impact
Pre-change: Redis used_memory_peak_human stuck at 7.94 GB out of 8 GB for 38 hours before the next deploy reset it. Category p99 during that window: 1840 ms. Post-change: peak holds at 7.6 GB and Redis evicts ~120 keys/second steady state. Category p99: 1430 ms. Delta: -410 ms p99.
The rollback
redis-cli CONFIG SET maxmemory-policy noeviction
redis-cli CONFIG REWRITE
systemctl reload redis-server
If you hit a weird cache-coherency bug after switching, the rollback is quick: CONFIG SET takes effect immediately, and CONFIG REWRITE persists the change to redis.conf. We have not had to use it in production.
Redis fix 2: separate cache_id_prefix per store
Magento 2.4.x defaults cache_id_prefix in env.php to a blank string. If you run two stores (staging + production, or two brands) against the same Redis instance, they collide on the same keys. The symptom is product prices flickering between the two stores on every cache miss — and it takes a week to diagnose because it only happens on cold caches.
The diff
// app/etc/env.php
'cache' => [
'frontend' => [
'default' => [
'backend' => 'Magento\\Framework\\Cache\\Backend\\Redis',
'backend_options' => [
'server' => '127.0.0.1',
'port' => '6379',
'database' => '0',
'compress_data' => '1',
'compression_lib' => 'zstd',
],
'id_prefix' => 'prod_eu_',
],
'page_cache' => [
'backend' => 'Magento\\Framework\\Cache\\Backend\\Redis',
'backend_options' => [
'server' => '127.0.0.1',
'port' => '6379',
'database' => '1',
'compress_data' => '0',
],
'id_prefix' => 'prod_eu_fpc_',
],
],
],
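Use database 0 for the default cache (which includes the config cache) and database 1 for full-page cache. Separate DBs let you FLUSHDB the page cache without nuking the config cache. The id_prefix must be unique for every Magento installation that shares the Redis instance, meaning every environment and every brand, not one prefix per server. That is the bit most teams get wrong.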
Two stores sharing one Redis instance with blank prefixes is a multi-week bug waiting to happen. Fix it before you scale, not after.
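A quick pre-flight check can catch colliding prefixes before two installs go live on one Redis instance. The sketch below assumes you have already extracted server, database, and id_prefix from each install's env.php into a plain dict. The install names and the helper name are hypothetical.

```python
from collections import defaultdict


def find_prefix_collisions(installs: dict[str, dict[str, str]]) -> dict[tuple, list[str]]:
    """Group installs that share the same (server, database, id_prefix) triple."""
    seen: dict[tuple, list[str]] = defaultdict(list)
    for name, cfg in installs.items():
        # A missing id_prefix is treated as the blank string.
        key = (cfg["server"], cfg["database"], cfg.get("id_prefix", ""))
        seen[key].append(name)
    return {key: names for key, names in seen.items() if len(names) > 1}


# Hypothetical installs: staging and a second brand both left the prefix blank.
INSTALLS = {
    "prod_eu": {"server": "127.0.0.1", "database": "0", "id_prefix": "prod_eu_"},
    "staging": {"server": "127.0.0.1", "database": "0"},
    "brand_b": {"server": "127.0.0.1", "database": "0", "id_prefix": ""},
}

if __name__ == "__main__":
    for key, names in find_prefix_collisions(INSTALLS).items():
        print(f"collision on {key}: {', '.join(names)}")
```

Any non-empty result means at least two installs will read and write the same keys.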
Compression: zstd or lz4, never gzip
Set compression_lib to zstd on the default cache. Compression ratio is 2.3x at ~1/4 of gzip's CPU cost. Do not enable compression on page_cache — the FPC payload is already small per-key and the CPU cost outweighs the network saving.
Redis fix 3: activedefrag on long-running instances
Redis fragments its memory allocator after 4–7 days of mixed-size writes. The symptom is used_memory_rss sitting 30–40% above used_memory in INFO memory. Restarting Redis fixes it — and resets every cache key, which costs you the next ~20 minutes of p99 latency while caches rebuild.
The fix is to enable jemalloc's online defragmenter, which moves keys between memory blocks in the background.
# /etc/redis/redis.conf
activedefrag yes
active-defrag-ignore-bytes 100mb
active-defrag-threshold-lower 10
active-defrag-threshold-upper 100
active-defrag-cycle-min 5
active-defrag-cycle-max 25
The defaults are conservative. The values above are the ones we ship to every kishansavaliya.com client running Redis past 30 GB working set. Measured impact: used_memory_rss / used_memory ratio stays at 1.06 instead of climbing to 1.38 over the week. p99 stayed flat instead of degrading 140 ms by day 7.
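To see where an instance stands before touching the config, the RSS-to-used ratio can be computed straight from INFO memory. This is a small Python sketch under stated assumptions: the input is the raw text of `redis-cli INFO memory`, the 1.2 and 1.4 cut-offs are the ones given in the FAQ of this post, and the function names and sample numbers are made up for illustration.

```python
def fragmentation_ratio(info_text: str) -> float:
    """Return used_memory_rss / used_memory from a Redis INFO memory reply."""
    fields = {}
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            fields[key] = value
    return int(fields["used_memory_rss"]) / int(fields["used_memory"])


def defrag_advice(ratio: float) -> str:
    """Map the ratio onto the thresholds used in this post's FAQ."""
    if ratio > 1.4:
        return "heavily fragmented"
    if ratio > 1.2:
        return "enable activedefrag"
    return "ok"


# Hypothetical sample: RSS sits 38% above the logical dataset size.
SAMPLE = """# Memory
used_memory:1000000000
used_memory_rss:1380000000
"""

if __name__ == "__main__":
    ratio = fragmentation_ratio(SAMPLE)
    print(f"{ratio:.2f} -> {defrag_advice(ratio)}")
```

Logging this ratio once a day gives a baseline to compare against after activedefrag is switched on.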
Varnish fix 1: trust Cloudflare X-Forwarded-For
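If your store is behind Cloudflare, every request reaches Varnish from an address inside one of Cloudflare's published edge ranges (the 15 IPv4 CIDR blocks in the ACL below). Varnish sees those addresses as the client IP, which means client.ip in VCL is useless for rate-limiting, geo-routing, or admin-IP allowlisting. The fix is to read the forwarded client address (CF-Connecting-IP in the VCL below, Cloudflare's single-value counterpart to X-Forwarded-For), but only from trusted Cloudflare CIDRs, otherwise any client can spoof the header.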
The diff in default.vcl
# /etc/varnish/cloudflare.vcl - include from default.vcl
import std;
acl cloudflare_v4 {
"103.21.244.0"/22;
"103.22.200.0"/22;
"103.31.4.0"/22;
"104.16.0.0"/13;
"104.24.0.0"/14;
"108.162.192.0"/18;
"131.0.72.0"/22;
"141.101.64.0"/18;
"162.158.0.0"/15;
"172.64.0.0"/13;
"173.245.48.0"/20;
"188.114.96.0"/20;
"190.93.240.0"/20;
"197.234.240.0"/22;
"198.41.128.0"/17;
}
sub vcl_recv {
if (client.ip ~ cloudflare_v4 && req.http.CF-Connecting-IP) {
set req.http.X-Real-IP = req.http.CF-Connecting-IP;
} else {
set req.http.X-Real-IP = client.ip;
}
}
Cloudflare publishes the current CIDR list at https://www.cloudflare.com/ips-v4. Pin a cron to refresh the ACL weekly — Cloudflare adds blocks every few months.
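The weekly refresh can be scripted. The Python sketch below covers only the rendering step: it takes the plain-text CIDR list (one range per line, as served at the URL above) and emits a VCL acl block in the same shape as the one shown earlier. The function name is an assumption, and fetching the list, writing the file, and reloading Varnish are left out on purpose.

```python
import ipaddress


def render_acl(cidr_text: str, name: str = "cloudflare_v4") -> str:
    """Render whitespace-separated CIDR ranges as a VCL acl block.

    Raises ValueError if any entry is not a valid network, so a broken
    download never overwrites a working ACL.
    """
    lines = [f"acl {name} {{"]
    for raw in cidr_text.split():
        net = ipaddress.ip_network(raw, strict=True)
        lines.append(f'    "{net.network_address}"/{net.prefixlen};')
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two ranges copied from the ACL in this post, as sample input.
    print(render_acl("103.21.244.0/22\n104.16.0.0/13\n"), end="")
```

Validating every entry before writing matters here, because an empty or malformed ACL would make Varnish treat every Cloudflare edge as an untrusted client again.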
Why this matters for performance, not just security
If Varnish trusts the wrong client IP, every session-aware ESI block (cart summary, customer name, recently-viewed) gets cached against the Cloudflare edge IP — which means two unrelated customers behind the same edge get each other's cart. The Magento response then sets Cache-Control: no-store defensively, and your Varnish hit ratio collapses from 92% to 11%.
I have personally chased this bug on three stores. Each one had the same symptom: cart works in incognito, breaks in production. Each one was the same fix.
Varnish fix 2: ESI hold-times for cart and wishlist
The Magento default Varnish VCL treats every ESI block as ttl=0, meaning Varnish refetches the block on every request. On a logged-in customer with cart + wishlist + recently-viewed + customer-section blocks, that is 4 backend round trips per page.
Set a short TTL with a surrogate purge key. The cart block stays fresh for 60 seconds; an explicit X-Magento-Tags purge clears it on add-to-cart.
sub vcl_backend_response {
if (bereq.url ~ "/customer/section/load") {
set beresp.ttl = 60s;
set beresp.grace = 24h;
set beresp.http.Cache-Control = "private, max-age=60";
return (deliver);
}
}
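sub vcl_recv {
    # Only honor ban requests sent with the PURGE method, and restrict
    # them to trusted hosts with an ACL check on client.ip.
    if (req.method == "PURGE" && req.http.X-Magento-Tags-Pattern) {
        ban("obj.http.X-Magento-Tags ~ " + req.http.X-Magento-Tags-Pattern);
        return (synth(200, "Purged"));
    }
}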
Pair this with the Magento config system/full_page_cache/varnish/grace_period=86400 so stale content serves while the backend refreshes.
Measured impact
Cart-summary block: 4 backend hits/page x 145 ms = 580 ms per request. With the 60-second TTL that becomes 1 backend hit per 60 s, and at roughly one request per second for that block, 145 ms / 60 = ~2.4 ms amortized per request. Customer-section p99 dropped from 620 ms to 38 ms. Real saving, not theoretical.
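The amortized figure depends on how many requests share one cached copy inside a TTL window. Here is a short Python sketch of that arithmetic. The request rate of one per second is an assumption chosen to reproduce the ~2.4 ms figure, and the function name is illustrative.

```python
def amortized_backend_ms(backend_ms: float, ttl_s: float, requests_per_s: float) -> float:
    """Average backend time per request when one fetch serves a whole TTL window."""
    # At least one request pays for the fetch, even at very low traffic.
    requests_per_window = max(1.0, ttl_s * requests_per_s)
    return backend_ms / requests_per_window


# Without a TTL: four ESI blocks, each a 145 ms backend round trip.
uncached_ms = 4 * 145

# With ttl=60s and an assumed one request per second for the block.
cached_ms = amortized_backend_ms(145, ttl_s=60, requests_per_s=1.0)

if __name__ == "__main__":
    print(uncached_ms, round(cached_ms, 1))
```

At lower traffic the saving shrinks: with one request every two minutes, every request pays the full 145 ms because the cached copy has already expired.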
Varnish fix 3: strip cookies on static assets
Magento's default VCL strips cookies on .css, .js, and image extensions — but not on font files (.woff2, .ttf), not on .webp, and not on .avif. Hyvä themes ship every icon as .webp, which means every cookie-bearing request misses cache and hits the backend.
sub vcl_recv {
if (req.url ~ "\.(css|js|jpg|jpeg|png|gif|webp|avif|svg|ico|woff|woff2|ttf|otf|eot)(\?.*)?$") {
unset req.http.Cookie;
set req.http.X-Static-Asset = "1";
return (hash);
}
}
The result: static asset hit ratio jumped from 71% to 99.4%, origin egress dropped by ~38%. If your store serves images from the Magento origin instead of a CDN, this is the single largest hit-ratio improvement you will ship.
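Before deploying an extension pattern like this, it helps to check which URLs it actually matches. The Python sketch below copies the pattern from the VCL above and runs it against a few made-up URLs. It assumes that Python's re and Varnish's PCRE agree on this particular pattern, which uses only basic alternation and an optional group.

```python
import re

# Same pattern as the VCL rule; VCL's `~` is an unanchored match, like re.search.
STATIC_ASSET = re.compile(
    r"\.(css|js|jpg|jpeg|png|gif|webp|avif|svg|ico|woff|woff2|ttf|otf|eot)(\?.*)?$"
)


def is_static_asset(url: str) -> bool:
    """True when the URL ends in a listed extension, with or without a query string."""
    return STATIC_ASSET.search(url) is not None


if __name__ == "__main__":
    # Hypothetical URLs for illustration.
    for url in (
        "/static/frontend/icons/cart.webp",
        "/media/fonts/inter.woff2?v=3",
        "/checkout/cart/",
        "/rest/V1/products/12.json",
    ):
        print(url, is_static_asset(url))
```

The pattern is not anchored to /static/ or /media/, so a dynamic URL that happens to end in .css would also lose its cookies. Prefixing the pattern with the asset paths is one way to tighten it.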
OpenSearch fix 1: raise max_clause_count
Magento 2.4.x layered navigation builds a boolean query with one clause per active filter value. A store with 50+ filterable attributes and 200+ values per attribute generates queries with 1500+ clauses on a category landing page with no filters applied. The OpenSearch default indices.query.bool.max_clause_count=1024 rejects the query with HTTP 500.
The symptom: random 500 errors on category pages, only reproducible at peak traffic when the filter-cache is cold.
# /etc/opensearch/opensearch.yml
indices.query.bool.max_clause_count: 8192
indices.query.bool.max_nested_depth: 30
The OpenSearch docs warn that raising this consumes more heap per query. On a 16 GB heap with our 80k-SKU index, the JVM peak rose from 9.2 GB to 10.4 GB after the change. Acceptable, but watch GET _nodes/stats/jvm for two weeks after.
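To judge how close a given query is to the limit, you can count the clauses in a logged query body. The Python sketch below is an approximation: it treats every non-bool query nested under must, should, filter, or must_not as one clause, which is a simplification of how the engine counts. The function name and the sample query are assumptions for illustration.

```python
BOOL_OCCURS = ("must", "should", "filter", "must_not")


def count_clauses(query: dict) -> int:
    """Approximate the number of leaf clauses in a (possibly nested) bool query."""
    bool_body = query.get("bool")
    if bool_body is None:
        return 1  # any non-bool query counts as a single leaf clause
    total = 0
    for occur in BOOL_OCCURS:
        children = bool_body.get(occur, [])
        if isinstance(children, dict):  # a single clause may appear without a list
            children = [children]
        total += sum(count_clauses(child) for child in children)
    return total


# Hypothetical layered-nav query: one term filter per filter value.
SAMPLE_QUERY = {
    "bool": {"filter": [{"term": {"color": value}} for value in range(1500)]}
}

if __name__ == "__main__":
    clauses = count_clauses(SAMPLE_QUERY)
    print(clauses, "exceeds 1024" if clauses > 1024 else "fits in 1024")
```

Running this over queries captured from the slow log gives a rough picture of how much headroom a chosen max_clause_count leaves.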
The rollback
Lower the value and restart OpenSearch. The setting is not dynamic — PUT _cluster/settings rejects it.
OpenSearch fix 2: max_result_window for sitemap generation
Magento's sitemap cron runs paginated queries against OpenSearch with from + size values that exceed the default index.max_result_window=10000 on any catalog past 10k SKUs. The cron fails silently — the next sitemap regen drops 70% of your URLs and you only notice when Search Console reports the index drop.
The fix is per-index, not cluster-wide.
curl -X PUT "localhost:9200/magento2_product_*/_settings" \
-H "Content-Type: application/json" \
-d '{ "index.max_result_window": 200000 }'
200,000 covers any realistic Magento catalog. The trade-off is heap usage during the deep-pagination query, about 80 MB peak per sitemap cron run. Run the sitemap cron at 03:00 when traffic is low and you will not notice.
Better fix if you can refactor
The right answer is to rewrite the sitemap cron to use search_after instead of from + size pagination. The Adobe-maintained magento/module-sitemap does not do this. Kishan Savaliya's Panth_XmlSitemap module does — but if you cannot swap the module, the index setting above gets you running today.
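For readers who have not used search_after, the pattern looks roughly like the Python sketch below. It is a generic illustration, not the code of either module named above. The search argument is any callable that takes a request body and returns the parsed response, and the sort field entity_id is a hypothetical choice: any unique, sortable field would do.

```python
from typing import Callable, Iterator


def scan_all(
    search: Callable[[dict], dict],
    page_size: int = 1000,
    sort_field: str = "entity_id",  # hypothetical unique, sortable field
) -> Iterator[dict]:
    """Yield every hit by paging with search_after instead of from + size."""
    body = {
        "size": page_size,
        "sort": [{sort_field: "asc"}],
        "query": {"match_all": {}},
    }
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Resume after the sort values of the last hit on this page.
        body = {**body, "search_after": hits[-1]["sort"]}
```

With the opensearch-py client, the callable could be something like `lambda body: client.search(index="magento2_product_1", body=body)`, where the index name is an assumption. Because no request ever carries a from offset, the depth of the catalog no longer runs into index.max_result_window.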
The before/after table
All six changes applied to the same store, same traffic profile, captured over consecutive 7-day windows.
| Metric | Before | After |
|---|---|---|
| Category page p99 | 1840 ms | 380 ms |
| Search results p99 | 920 ms | 210 ms |
| Customer-section block p99 | 620 ms | 38 ms |
| Varnish hit ratio | 71% | 99.4% |
| Redis evictions/sec | 0 (then OOM) | ~120 steady |
| OpenSearch 500s/day | 14 | 0 |
What we did not do
We did not switch to LiteSpeed, raise FPM worker count, or buy more RAM. We did not rebuild on a headless storefront. Every change above is open-source Magento 2.4.4 — 2.4.9 with default Redis, Varnish, and OpenSearch packages from Debian 12 and Ubuntu 24.04 repos. Total engineering time per store, including baseline capture and post-deploy validation, runs around 24 hours.
FAQ
Will allkeys-lru ever evict a key Magento still needs?
Yes — and that is the point. Magento's cache layer regenerates evicted keys on the next request. The cost is a one-request latency spike on cache miss, which is several orders of magnitude cheaper than the OOM-induced no-cache mode that noeviction falls into when Redis fills.
Do these Varnish changes work without Cloudflare?
Five of the six do. The X-Forwarded-For ACL is Cloudflare-specific — swap the CIDR list for your CDN's published IP ranges (Fastly, Akamai, AWS CloudFront all publish them as JSON feeds). The ESI hold-time, the cookie strip, and the surrogate purge are CDN-agnostic.
What if I run Elasticsearch instead of OpenSearch?
The two OpenSearch keys map one-to-one to Elasticsearch 7.x: indices.query.bool.max_clause_count is identical, index.max_result_window is identical. Adobe dropped Elasticsearch support in 2.4.7 — if you are still on it, the upgrade to OpenSearch 2.13 is the bigger win.
How do I know if my Redis is fragmented before I enable activedefrag?
Run redis-cli INFO memory | grep mem_fragmentation_ratio. A ratio above 1.4 means you are wasting 40%+ of allocated memory on fragmentation. Anything above 1.2 is worth enabling defrag for.
Do I need to flush cache after changing cache_id_prefix?
No — but you should. Old keys with the blank prefix sit in Redis until they expire or get evicted, which means up to 24 hours of stale entries clogging your eviction pool. Run redis-cli FLUSHDB on the Magento cache DBs after the env.php change.
Is raising max_clause_count to 8192 safe on a 4 GB heap node?
No. The 8192 ceiling is safe on 16 GB+ heaps with our catalog profile. On 4 GB heaps, cap it at 4096 and monitor GC pause time in the gc section of GET _nodes/stats/jvm. If old-gen GC pauses climb past 200 ms, lower the limit.
Does Magento Cloud (PaaS) allow these changes?
Redis policies and OpenSearch settings — yes, via the project YAML. Varnish VCL changes — partially, Magento Cloud provides a limited VCL include hook. The ESI TTL change works; the X-Forwarded-For ACL needs Adobe support to whitelist your CIDR file.
References
- [1] Redis 7.2 documentation, eviction policies. https://redis.io/docs/latest/develop/reference/eviction/
- [2] OpenSearch 2.13 cluster settings reference, indices.query.bool.max_clause_count.
- [3] Varnish Cache 7.5 reference manual, vcl_recv and ban() semantics.
- [4] Adobe Commerce DevDocs, Redis backend configuration for 2.4.x.
- [5] Cloudflare IP Ranges, refreshed feed at https://www.cloudflare.com/ips-v4.