Redis, Varnish, OpenSearch Tuning Magento 2.4

Redis, Varnish, and OpenSearch tuning for Magento 2.4.x is the practice of overriding the six default config keys that Adobe leaves at conservative single-tenant values. Those defaults are why a stock production stack shows slow p99 latency on any catalog north of 30,000 SKUs. The fix is not bigger hardware: it is six edits across env.php, redis.conf, default.vcl, and opensearch.yml. Here is what each one does, the before/after latency from a real prod trace, and the rollback if it goes sideways.

The trace this is built on

Every number in this post comes from one 80,000-SKU Magento 2.4.9 + Hyvä 1.3 store on a 6-core, 32 GB VPS running PHP 8.4, Redis 7.2, Varnish 7.5, and OpenSearch 2.13. Traffic profile: 1.4 M sessions/month, 22% cart-add rate, layered nav across 67 filterable attributes. Tracing via OpenTelemetry into Tempo, p99 captured over a 7-day window before and after each change.

Adobe ships the default config for the lowest common denominator: a single-store 2k-SKU dev install on a laptop. Production is a different planet.

If your store is smaller than 10,000 SKUs and serves under 100k sessions/month, you will not see the deltas below. Read it anyway: you will need it on the next migration.

The six config keys, in order of impact

Here is the cheat sheet before the deep dive. Each row maps to one section below with the diff, the rollback, and the measured p99 delta.

Config key	Default	Production value	p99 delta
Redis `maxmemory-policy`	`noeviction`	`allkeys-lru`	-410 ms category
Redis `cache_id_prefix`	blank	per-store unique	fixes cache poisoning
Redis `activedefrag`	`no`	`yes`	-140 ms after 7 days uptime
Varnish `X-Forwarded-For` trust	off	Cloudflare CIDRs only	fixes hit-ratio collapse
Varnish ESI `ttl` for cart/wishlist	0s	60s with surrogate purge	-580 ms cart-summary block
OpenSearch `max_clause_count`	1024	8192	fixes 500 on layered nav

Redis fix 1: switch maxmemory-policy off noeviction

Redis ships with maxmemory-policy=noeviction, which means that once the cache fills, Magento receives the error "OOM command not allowed when used memory > 'maxmemory'" on every SET. Magento's cache layer does not crash: it silently degrades to no-cache mode, and every page recomputes from MySQL.

You will not see this in var/log/system.log. You will see it in p99 latency climbing over the week as the Redis instance fills.
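Since the failure does not surface in Magento's logs, one way to catch it early is to watch the Redis side. The sketch below is a minimal illustration, not part of the original setup: it parses the text that `redis-cli INFO memory` prints and flags an instance that is close to its ceiling while still on noeviction. The function names, the 95% threshold, and the sample numbers are assumptions chosen for the example.

```python
def parse_info(text: str) -> dict[str, str]:
    """Parse `redis-cli INFO` output ("key:value" lines) into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def at_risk_of_oom(info: dict[str, str], threshold: float = 0.95) -> bool:
    """True when the policy is noeviction and usage is near maxmemory."""
    maxmemory = int(info.get("maxmemory", "0"))
    if maxmemory == 0:  # 0 means no memory limit is configured
        return False
    used = int(info["used_memory"])
    near_ceiling = used / maxmemory >= threshold
    return info.get("maxmemory_policy") == "noeviction" and near_ceiling


# Illustrative sample, shaped like real INFO output (values are made up).
SAMPLE = """\
# Memory
used_memory:8500000000
maxmemory:8589934592
maxmemory_policy:noeviction
"""

print(at_risk_of_oom(parse_info(SAMPLE)))
```

In practice the input would come from `redis-cli INFO memory` on a schedule, with the result wired into whatever alerting you already run.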

The diff

# /etc/redis/redis.conf
maxmemory 8gb
maxmemory-policy allkeys-lru
maxmemory-samples 10

allkeys-lru evicts the least-recently-used key regardless of TTL. maxmemory-samples 10 raises the eviction-pool sample size from the default 5: the eviction picks a better LRU candidate at the cost of ~3% more CPU on writes. Worth it on any store that hits the memory ceiling daily.

Measured impact

Pre-change: Redis used_memory_peak_human stuck at 7.94 GB out of 8 GB for 38 hours before the next deploy reset it. Category p99 during that window: 1840 ms. Post-change: peak holds at 7.6 GB and Redis evicts ~120 keys/second steady state. Category p99: 1430 ms. Delta: -410 ms p99.

The rollback

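# CONFIG SET applies immediately; CONFIG REWRITE persists it to redis.conf
redis-cli CONFIG SET maxmemory-policy noeviction
redis-cli CONFIG REWRITE

If you hit a weird cache-coherency bug after switching, the rollback is a single CONFIG SET and takes effect immediately, with no restart or reload needed. CONFIG REWRITE only makes it survive the next restart. We have not had to use it in production.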

Redis fix 2: separate cache_id_prefix per store

Magento 2.4.x defaults cache_id_prefix in env.php to a blank string. If you run two stores (staging + production, or two brands) against the same Redis instance, they collide on the same keys. The symptom is product prices flickering between the two stores on every cache miss, and it takes a week to diagnose because it only happens on cold caches.

The diff

// app/etc/env.php
'cache' => [
    'frontend' => [
        'default' => [
            'backend' => 'Magento\\Framework\\Cache\\Backend\\Redis',
            'backend_options' => [
                'server' => '127.0.0.1',
                'port' => '6379',
                'database' => '0',
                'compress_data' => '1',
                'compression_lib' => 'zstd',
            ],
            'id_prefix' => 'prod_eu_',
        ],
        'page_cache' => [
            'backend' => 'Magento\\Framework\\Cache\\Backend\\Redis',
            'backend_options' => [
                'server' => '127.0.0.1',
                'port' => '6379',
                'database' => '1',
                'compress_data' => '0',
            ],
            'id_prefix' => 'prod_eu_fpc_',
        ],
    ],
],

Use database 0 for the default cache (config, layout, blocks) and database 1 for full-page cache. Separate DBs let you FLUSHDB the page cache without nuking the default cache. The id_prefix must be unique per store-view, not per environment: that is the bit most teams get wrong.

Two stores sharing one Redis instance with blank prefixes is a multi-week bug waiting to happen. Fix it before you scale, not after.
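To make the collision concrete, here is a toy model of two stores writing to one shared database. It is a sketch under simplifying assumptions: a plain dict stands in for Redis, and the key format (prefix plus cache id) is reduced to its essentials. The cache id `product_price_42` and the prefixes `brand_a_` and `brand_b_` are hypothetical.

```python
def cache_key(id_prefix: str, cache_id: str) -> str:
    """Build the key a store writes: its id_prefix followed by the cache id."""
    return f"{id_prefix}{cache_id}"


def write_both_stores(prefix_a: str, prefix_b: str) -> dict[str, str]:
    """Simulate two stores caching the same logical entry in one shared DB."""
    shared_db: dict[str, str] = {}  # stand-in for a single Redis database
    shared_db[cache_key(prefix_a, "product_price_42")] = "store A price"
    shared_db[cache_key(prefix_b, "product_price_42")] = "store B price"
    return shared_db


blank = write_both_stores("", "")
unique = write_both_stores("brand_a_", "brand_b_")

# With blank prefixes the second write overwrites the first.
print(len(blank), len(unique))  # prints "1 2"
```

With blank prefixes only one entry survives, and whichever store wrote last decides what the other one reads on its next cache hit.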

Compression: zstd or lz4, never gzip

Set compression_lib to zstd on the default cache. Compression ratio is 2.3x at ~1/4 of gzip's CPU cost. Do not enable compression on page_cache: the FPC payload is already small per key, and the CPU cost outweighs the network saving.

Redis fix 3: activedefrag on long-running instances

Redis fragments its memory allocator after 4-7 days of mixed-size writes. The symptom is used_memory_rss sitting 30-40% above used_memory in INFO memory. Restarting Redis fixes it, and resets every cache key, which costs you the next ~20 minutes of p99 latency while caches rebuild.

The fix is to enable jemalloc's online defragmenter, which moves keys between memory blocks in the background.

# /etc/redis/redis.conf
activedefrag yes
active-defrag-ignore-bytes 100mb
active-defrag-threshold-lower 10
active-defrag-threshold-upper 100
active-defrag-cycle-min 5
active-defrag-cycle-max 25

The defaults are conservative. The values above are the ones we ship to every kishansavaliya.com client running Redis past 30 GB working set. Measured impact: used_memory_rss / used_memory ratio stays at 1.06 instead of climbing to 1.38 over the week. p99 stayed flat instead of degrading 140 ms by day 7.

Varnish fix 1: trust Cloudflare X-Forwarded-For

If your store is behind Cloudflare, every request hits Varnish from one of Cloudflare's published edge ranges (the ACL below lists 15 IPv4 CIDR blocks). Varnish sees those addresses as the client IP, which means client.ip in VCL is useless for rate-limiting, geo-routing, or admin-IP allowlisting. The fix is to read the forwarded client address (X-Forwarded-For, or Cloudflare's CF-Connecting-IP as in the snippet below), but only when the request comes from a trusted Cloudflare CIDR. Otherwise any client can spoof the header.

The diff in default.vcl

# /etc/varnish/cloudflare.vcl - include from default.vcl
import std;

acl cloudflare_v4 {
    "103.21.244.0"/22;
    "103.22.200.0"/22;
    "103.31.4.0"/22;
    "104.16.0.0"/13;
    "104.24.0.0"/14;
    "108.162.192.0"/18;
    "131.0.72.0"/22;
    "141.101.64.0"/18;
    "162.158.0.0"/15;
    "172.64.0.0"/13;
    "173.245.48.0"/20;
    "188.114.96.0"/20;
    "190.93.240.0"/20;
    "197.234.240.0"/22;
    "198.41.128.0"/17;
}

sub vcl_recv {
    if (client.ip ~ cloudflare_v4 && req.http.CF-Connecting-IP) {
        set req.http.X-Real-IP = req.http.CF-Connecting-IP;
    } else {
        set req.http.X-Real-IP = client.ip;
    }
}

Cloudflare publishes the current CIDR list at https://www.cloudflare.com/ips-v4. Schedule a weekly cron job to refresh the ACL: Cloudflare adds blocks every few months.
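The refresh job mostly comes down to turning the published list (one CIDR per line) into a VCL acl block. A minimal sketch of that conversion step follows. It assumes the list has already been downloaded as text; the function name `build_acl` is hypothetical, and fetching the URL, writing the file, and reloading Varnish are left out on purpose.

```python
import ipaddress


def build_acl(cidr_text: str, name: str = "cloudflare_v4") -> str:
    """Render whitespace-separated CIDRs as a VCL acl block."""
    lines = [f"acl {name} {{"]
    for raw in cidr_text.split():
        # ip_network raises ValueError on malformed input, so a broken
        # download fails loudly instead of producing a bad ACL.
        net = ipaddress.ip_network(raw)
        lines.append(f'    "{net.network_address}"/{net.prefixlen};')
    lines.append("}")
    return "\n".join(lines)


print(build_acl("103.21.244.0/22\n104.16.0.0/13\n"))
```

Validating every entry before writing the file matters here: a half-downloaded list that silently drops ranges would send legitimate Cloudflare traffic down the untrusted branch.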

Why this matters for performance, not just security

If Varnish trusts the wrong client IP, every session-aware ESI block (cart summary, customer name, recently-viewed) gets cached against the Cloudflare edge IP, which means two unrelated customers behind the same edge get each other's cart. The Magento response then sets Cache-Control: no-store defensively, and your Varnish hit ratio collapses from 92% to 11%.

I have personally chased this bug on three stores. Each one had the same symptom: cart works in incognito, breaks in production. Each one was the same fix.

Varnish fix 2: ESI hold-times for cart and wishlist

The Magento default Varnish VCL treats every ESI block as ttl=0, meaning Varnish refetches the block on every request. On a logged-in customer with cart + wishlist + recently-viewed + customer-section blocks, that is 4 backend round trips per page.

Set a short TTL with a surrogate purge key. The cart block stays fresh for 60 seconds; an explicit X-Magento-Tags purge clears it on add-to-cart.

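sub vcl_backend_response {
    # Caution: this caches a per-customer response. It assumes vcl_hash
    # already varies per session; otherwise customers share each other's data.
    if (bereq.url ~ "/customer/section/load") {
        set beresp.ttl = 60s;
        set beresp.grace = 24h;
        set beresp.http.Cache-Control = "private, max-age=60";
        return (deliver);
    }
}

sub vcl_recv {
    # Only honor tag bans sent as PURGE requests; restrict the senders
    # with an ACL in production.
    if (req.method == "PURGE" && req.http.X-Magento-Tags-Pattern) {
        ban("obj.http.X-Magento-Tags ~ " + req.http.X-Magento-Tags-Pattern);
        return (synth(200, "Purged"));
    }
}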

Pair this with the Magento config system/full_page_cache/varnish/grace_period=86400 so stale content serves while the backend refreshes.

Measured impact

Cart-summary block: 4 backend hits/page x 145 ms = 580 ms per request. With the 60-second TTL that becomes 1 backend hit per 60 s x 145 ms, or ~2.4 ms amortized per request at one request per second against the cached block. Customer-section p99 dropped from 620 ms to 38 ms. Real saving, not theoretical.
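The amortization above can be checked with a few lines of arithmetic. This is a sketch of the calculation only, using the figures quoted in this section (4 blocks, 145 ms per backend hit, 60 s TTL). The request rate of one request per second is an assumption, and the function names are hypothetical.

```python
def uncached_cost_ms(blocks: int, backend_ms: float) -> float:
    """Backend time per page when every block is refetched on every request."""
    return blocks * backend_ms


def amortized_cost_ms(backend_ms: float, ttl_s: float, requests_per_s: float) -> float:
    """Backend time per request when one fetch is shared across a TTL window."""
    # At least one request pays for the fetch, even at very low traffic.
    requests_in_window = max(1.0, ttl_s * requests_per_s)
    return backend_ms / requests_in_window


before = uncached_cost_ms(blocks=4, backend_ms=145)
after = amortized_cost_ms(backend_ms=145, ttl_s=60, requests_per_s=1.0)
print(before, round(after, 1))  # prints "580 2.4"
```

The amortized figure scales with traffic: at lower request rates fewer requests share each backend fetch, so the per-request saving shrinks.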

Varnish fix 3: strip cookies on static assets

Magento's default VCL strips cookies on .css, .js, and image extensions, but not on font files (.woff2, .ttf), not on .webp, and not on .avif. Hyvä themes ship every icon as .webp, which means every cookie-bearing request misses cache and hits the backend.

sub vcl_recv {
    if (req.url ~ "\.(css|js|jpg|jpeg|png|gif|webp|avif|svg|ico|woff|woff2|ttf|otf|eot)(\?.*)?$") {
        unset req.http.Cookie;
        set req.http.X-Static-Asset = "1";
        return (hash);
    }
}

The result: static asset hit ratio jumped from 71% to 99.4%, origin egress dropped by ~38%. If your store serves images from the Magento origin instead of a CDN, this is the single largest hit-ratio improvement you will ship.
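Before deploying a regex like the one above, it is worth checking it against a handful of real URLs. The sketch below reuses the same pattern in Python's `re` module, which handles this particular expression the same way VCL's PCRE does. The sample URLs are made up for illustration.

```python
import re

# Same pattern as the vcl_recv rule: a known static extension,
# optionally followed by a query string, at the end of the URL.
STATIC_ASSET = re.compile(
    r"\.(css|js|jpg|jpeg|png|gif|webp|avif|svg|ico|woff|woff2|ttf|otf|eot)(\?.*)?$"
)


def is_static_asset(url: str) -> bool:
    """True when the cookie-stripping rule would apply to this URL."""
    return STATIC_ASSET.search(url) is not None


samples = [
    "/static/frontend/theme/fonts/inter.woff2",
    "/media/catalog/product/a/b/shoe.webp?width=300",
    "/customer/section/load/?sections=cart",
    "/rest/V1/products.json",
]
for url in samples:
    print(url, is_static_asset(url))
```

The last sample is the useful edge case: `.json` must not match the `js` alternative, and it does not, because the pattern requires the extension to be followed by a query string or the end of the URL.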

OpenSearch fix 1: raise max_clause_count

Magento 2.4.x layered navigation builds a boolean query with one clause per active filter value. A store with 50+ filterable attributes and 200+ values per attribute generates queries with 1500+ clauses on a category landing page with no filters applied. The OpenSearch default indices.query.bool.max_clause_count=1024 rejects the query with HTTP 500.

The symptom: random 500 errors on category pages, only reproducible at peak traffic when the filter-cache is cold.

# /etc/opensearch/opensearch.yml
indices.query.bool.max_clause_count: 8192
indices.query.bool.max_nested_depth: 30

The OpenSearch docs warn that raising this consumes more heap per query. On a 16 GB heap with our 80k-SKU index, the JVM peak rose from 9.2 GB to 10.4 GB after the change. Acceptable, but watch GET _nodes/stats/jvm for two weeks after.

The rollback

Lower the value and restart OpenSearch. The setting is not dynamic: PUT _cluster/settings rejects it.

OpenSearch fix 2: max_result_window for sitemap generation

Magento's sitemap cron runs paginated queries against OpenSearch with from + size values that exceed the default index.max_result_window=10000 on any catalog past 10k SKUs. The cron fails silently: the next sitemap regen drops 70% of your URLs and you only notice when Search Console reports the index drop.
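The limit applies to the sum of from and size, so the failure only starts once pagination gets deep enough. A small sketch of that check follows. The page size of 500 is an assumption for illustration, not a value taken from Magento, and the function name is hypothetical.

```python
from typing import Optional


def first_rejected_offset(
    total_docs: int, page_size: int, max_result_window: int = 10_000
) -> Optional[int]:
    """Return the first `from` offset whose page exceeds the window, if any."""
    for offset in range(0, total_docs, page_size):
        # The search is rejected when from + size goes past the window.
        if offset + page_size > max_result_window:
            return offset
    return None


print(first_rejected_offset(total_docs=80_000, page_size=500))  # prints "10000"
print(first_rejected_offset(total_docs=8_000, page_size=500))   # prints "None"
```

Every document past that first rejected offset is unreachable through from + size pagination under the default window.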

The fix is per-index, not cluster-wide.

curl -X PUT "localhost:9200/magento2_product_*/_settings" \
  -H "Content-Type: application/json" \
  -d '{ "index.max_result_window": 200000 }'

200,000 covers any realistic Magento catalog. The trade-off is heap usage during the deep-pagination query: about 80 MB peak per sitemap cron run. Run the sitemap cron at 03:00 when traffic is low and you will not notice.

Better fix if you can refactor

The right answer is to rewrite the sitemap cron to use search_after instead of from + size pagination. The Adobe-maintained magento/module-sitemap does not do this. Kishan Savaliya's Panth_XmlSitemap module does, but if you cannot swap the module, the index setting above gets you running today.
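For readers unfamiliar with search_after, the idea is to sort on a unique field and pass the last hit's sort values into the next request, so no page ever needs a deep offset. The sketch below shows the loop in isolation. It is not the code of either module mentioned above: the `search` callable, the `entity_id` sort field, and the in-memory `fake_search` are all assumptions used to keep the example self-contained.

```python
from typing import Callable, Iterator


def scan_with_search_after(
    search: Callable[[dict], dict], sort_field: str, page_size: int = 1000
) -> Iterator[dict]:
    """Yield every hit by chaining search_after requests."""
    # The sort field must be unique per document, or hits can be skipped.
    body: dict = {"size": page_size, "sort": [{sort_field: "asc"}]}
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Resume after the sort values of the last hit on this page.
        body = {**body, "search_after": hits[-1]["sort"]}


def fake_search(body: dict) -> dict:
    """In-memory stand-in for a search request against five documents."""
    docs = [{"_id": str(i), "sort": [i]} for i in range(1, 6)]
    after = body.get("search_after", [0])[0]
    page = [d for d in docs if d["sort"][0] > after][: body["size"]]
    return {"hits": {"hits": page}}


ids = [hit["_id"] for hit in scan_with_search_after(fake_search, "entity_id", page_size=2)]
print(ids)  # prints "['1', '2', '3', '4', '5']"
```

In a real integration `search` would wrap the HTTP call to the `_search` endpoint. Because each request only asks for one page past a known position, index.max_result_window never comes into play.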

The before/after table

All six changes applied to the same store, same traffic profile, captured over consecutive 7-day windows.

Metric	Before	After
Category page p99	1840 ms	380 ms
Search results p99	920 ms	210 ms
Customer-section block p99	620 ms	38 ms
Varnish static-asset hit ratio	71%	99.4%
Redis evictions/min	0 (then OOM)	~120 steady
OpenSearch 500s/day	14	0

What we did not do

We did not switch to LiteSpeed, raise FPM worker count, or buy more RAM. We did not rebuild on a headless storefront. Every change above is open-source Magento 2.4.4-2.4.9 with default Redis, Varnish, and OpenSearch packages from Debian 12 and Ubuntu 24.04 repos. Total engineering time per store, including baseline capture and post-deploy validation, runs around 24 hours.

FAQ

Will allkeys-lru ever evict a key Magento still needs?

Yes, and that is the point. Magento's cache layer regenerates evicted keys on the next request. The cost is a one-request latency spike on cache miss, which is several orders of magnitude cheaper than the OOM-induced no-cache mode that noeviction falls into when Redis fills.

Do these Varnish changes work without Cloudflare?

Five of the six do. The X-Forwarded-For ACL is Cloudflare-specific: swap the CIDR list for your CDN's published IP ranges (Fastly, Akamai, AWS CloudFront all publish them as JSON feeds). The ESI hold-time, the cookie strip, and the surrogate purge are CDN-agnostic.

What if I run Elasticsearch instead of OpenSearch?

The two OpenSearch keys map one-to-one to Elasticsearch 7.x: indices.query.bool.max_clause_count is identical, index.max_result_window is identical. Adobe dropped Elasticsearch support in 2.4.7. If you are still on it, the upgrade to OpenSearch 2.13 is the bigger win.

How do I know if my Redis is fragmented before I enable activedefrag?

Run redis-cli INFO memory | grep mem_fragmentation_ratio. A ratio above 1.4 means you are wasting 40%+ of allocated memory on fragmentation. Anything above 1.2 is worth enabling defrag for.
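If you want to check this across several instances, the thresholds above are easy to script. The following is a minimal sketch that reads the text printed by `redis-cli INFO memory` and applies the 1.2 and 1.4 cut-offs from this answer. The function name and the verdict labels are hypothetical.

```python
def fragmentation_verdict(info_memory: str) -> str:
    """Classify an instance from the mem_fragmentation_ratio in INFO memory."""
    ratio = None
    for line in info_memory.splitlines():
        if line.startswith("mem_fragmentation_ratio:"):
            ratio = float(line.split(":", 1)[1])
    if ratio is None:
        raise ValueError("mem_fragmentation_ratio not found in input")
    if ratio > 1.4:
        return "severe"          # 40%+ of allocated memory lost to fragmentation
    if ratio > 1.2:
        return "enable-defrag"   # worth turning activedefrag on
    return "ok"


print(fragmentation_verdict("used_memory:100\nmem_fragmentation_ratio:1.38\n"))
```

A ratio of 1.38, the end-of-week figure quoted earlier in this post, lands in the "enable-defrag" band.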

Do I need to flush cache after changing cache_id_prefix?

No, but you should. Old keys with the blank prefix sit in Redis until they expire or get evicted, which means up to 24 hours of stale entries clogging your eviction pool. After the env.php change, flush each Magento cache DB explicitly: redis-cli -n 0 FLUSHDB for the default cache and redis-cli -n 1 FLUSHDB for the page cache.

Is raising max_clause_count to 8192 safe on a 4 GB heap node?

No. The 8192 ceiling is safe on 16 GB+ heaps with our catalog profile. On 4 GB heaps, cap it at 4096 and monitor GC pause time in the gc section of GET _nodes/stats/jvm. If old-gen GC pauses climb past 200 ms, lower the limit.

Does Magento Cloud (PaaS) allow these changes?

Redis policies and OpenSearch settings: yes, via the project YAML. Varnish VCL changes: partially. Magento Cloud provides a limited VCL include hook. The ESI TTL change works; the X-Forwarded-For ACL needs Adobe support to whitelist your CIDR file.
