What is Magento llms.txt?
Magento llms.txt wires the llms.txt v0.1 proposal (Jeremy Howard / Answer.AI, 2024) into your storefront via an extension such as mage2kishan/module-llms-txt. The module emits a curated markdown file at /llms.txt: an H1 site title, a blockquote tagline, one H2 section per content type, and a bulleted markdown link per page. LLMs (Perplexity, Claude, ChatGPT browsing) read it as a structured summary instead of crawling and chunking the whole site. It is analogous to robots.txt for search crawlers and sitemap.xml for indexers.
Five steps from composer-install to a live /llms.txt file
llms.txt is a plain markdown file at the site root — nothing exotic. The work is in deciding which pages to include, keeping the file under the LLM context ceiling, and wiring auto-regeneration on content change. Here is the end-to-end flow.
01. Composer-install the llms.txt module
Run composer require mage2kishan/module-llms-txt (or any compatible Magento llms.txt extension) to pull in the module. Then run bin/magento module:enable Panth_LlmsTxt, followed by bin/magento setup:upgrade and bin/magento setup:di:compile. The install adds an admin config section under Stores → Configuration → Panth → LLMS.txt, a panth:seo:llms-txt:generate CLI command, and a frontend route that responds at /llms.txt. The module supports Magento 2.4.4+ and is Hyvä-compatible out of the box because the output is a plain markdown file with no theme-layer dependencies.
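If you script deployments, the same sequence can be wrapped so that a failed step halts the run instead of carrying on. This is a minimal Python sketch, not part of the module: INSTALL_STEPS and run_install are hypothetical names, and it assumes composer and PHP are on the PATH when run from the Magento root. It defaults to a dry run that only returns the parsed commands.

```python
import shlex
import subprocess

# Ordered install sequence from step 01; each command must succeed
# before the next one runs.
INSTALL_STEPS = [
    "composer require mage2kishan/module-llms-txt",
    "bin/magento module:enable Panth_LlmsTxt",
    "bin/magento setup:upgrade",
    "bin/magento setup:di:compile",
]


def run_install(magento_root, dry_run=True):
    """Run (or, by default, only list) the install commands in order."""
    commands = [shlex.split(step) for step in INSTALL_STEPS]
    if not dry_run:
        for argv in commands:
            # check=True raises CalledProcessError on a non-zero exit,
            # so a failed composer require never reaches module:enable.
            subprocess.run(argv, cwd=magento_root, check=True)
    return commands
```

Calling run_install("/var/www/magento", dry_run=False) would execute the steps for real; the path is a placeholder.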
02. Configure scope and per-type limits in admin
Open Stores → Configuration → Panth → LLMS.txt and tune max_cms (default 100), max_categories, and max_products to decide how many of each type get included in the output. Use the manual-additions field to force-include hero URLs (your homepage, top services, key landing pages) that might otherwise rank lower, and the exclusions field to suppress noise (paginated category URLs, faceted-filter combinations, internal-search results). Limits matter: the rendered file should stay under 50KB so LLMs do not truncate the tail.
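The interplay of per-type caps, manual additions, and exclusions can be sketched in Python. This is an illustration under stated assumptions, not the Panth_LlmsTxt implementation: Entry, select_entries, and the numeric priority field are hypothetical; exclusions are treated as regular expressions (the module's own pattern syntax may differ); exclusions win over manual additions; and force-included URLs count toward their type's cap but are never dropped by it.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    kind: str          # "cms", "category" or "product"
    url: str
    title: str
    description: str
    priority: int = 0  # hypothetical ranking signal; higher sorts first


def select_entries(candidates, limits, manual_urls=(), exclude_patterns=()):
    """Pick entries per content type, honouring exclusions and manual additions.

    limits maps a kind to its cap, e.g. {"cms": 100, "product": 40};
    kinds missing from limits get a cap of 0.
    exclude_patterns are regular expressions searched against each URL.
    """
    compiled = [re.compile(p) for p in exclude_patterns]
    # Exclusions are applied first, so they also override manual additions.
    kept = [e for e in candidates if not any(rx.search(e.url) for rx in compiled)]

    forced = [e for e in kept if e.url in manual_urls]
    ranked = sorted((e for e in kept if e.url not in manual_urls),
                    key=lambda e: e.priority, reverse=True)

    counts = {}
    for e in forced:  # forced entries always survive but use up cap slots
        counts[e.kind] = counts.get(e.kind, 0) + 1

    selected = list(forced)
    for e in ranked:
        if counts.get(e.kind, 0) < limits.get(e.kind, 0):
            selected.append(e)
            counts[e.kind] = counts.get(e.kind, 0) + 1
    return selected
```

Counting forced entries against the cap keeps the total predictable, which matters more than strict ranking when the goal is staying under the size ceiling.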
03. Generate the markdown file
Run bin/magento panth:seo:llms-txt:generate (or the equivalent CLI in your chosen module). The generator collects CMS pages, categories, and products, picks the top-N of each by hand-curated priority, formats each entry as a markdown bullet (- [Page title](https://example.com/page): description), groups them into H2 sections per content type (Services, Glossary, Products, Categories, About), and writes the result to pub/media/llms.txt. The structure starts with an H1 site title and a one-sentence blockquote tagline, followed by the H2 section blocks, which is the layout the llms.txt v0.1 proposal specifies.
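The output shape is simple enough to render by hand. A minimal Python sketch of the formatting step only, assuming the entries have already been selected; render_llms_txt is a hypothetical name, not the module's API (the Panth generator itself is PHP).

```python
def render_llms_txt(site_title, tagline, sections):
    """Render an llms.txt body: H1 title, blockquote tagline, one H2 per section.

    sections maps a section name (e.g. "Services") to a list of
    (title, url, description) tuples, already in display order.
    """
    lines = [f"# {site_title}", "", f"> {tagline}", ""]
    for name, entries in sections.items():
        if not entries:
            continue  # skip empty sections instead of emitting a bare heading
        lines += [f"## {name}", ""]
        for title, url, description in entries:
            # Escape brackets so a title cannot break the [text](url) syntax.
            safe_title = title.replace("[", r"\[").replace("]", r"\]")
            bullet = f"- [{safe_title}]({url})"
            if description:
                bullet += f": {description}"
            lines.append(bullet)
        lines.append("")
    return "\n".join(lines).rstrip("\n") + "\n"
```

Writing the returned string to pub/media/llms.txt as UTF-8 matches where the step above says the module stores the file.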
04. Serve the file at /llms.txt on the root domain
The module ships a frontend controller that responds at /llms.txt with Content-Type: text/markdown; charset=utf-8. Alternatively, add an nginx location block (location = /llms.txt { try_files /media/llms.txt =404; }) for cache-friendly static-file serving from pub/media/llms.txt; the /media/ path assumes the server root points at Magento’s pub/ directory, as it does in the stock nginx.conf.sample. Either way, the file must be reachable over HTTPS, must not require authentication, must not be blocked by robots.txt, and must not sit behind a Cloudflare Bot Fight challenge: LLM crawlers silently skip pages that return 401, 403, or 503.
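The reachability rules above can be checked with the Python standard library. A minimal sketch: blocked_agents, response_ok, and fetch_status are hypothetical helper names, example.com stands in for your domain, and a plain urllib fetch is only a rough proxy for how a given LLM crawler is treated (Cloudflare, for example, may handle verified bots differently from anonymous clients).

```python
import urllib.error
import urllib.request
from urllib.robotparser import RobotFileParser

# Crawler tokens to test. PerplexityBot and ClaudeBot are named in this
# article; GPTBot is OpenAI's crawler token. Extend the tuple as needed.
AGENTS = ("PerplexityBot", "ClaudeBot", "GPTBot")


def blocked_agents(robots_txt, url="https://example.com/llms.txt", agents=AGENTS):
    """Return the agents that this robots.txt body forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


def response_ok(status, content_type):
    """True if a fetch looks servable: HTTP 200 and a markdown/plain-text type."""
    media_type = content_type.split(";")[0].strip().lower()
    # text/plain is accepted too (an assumption); the module sends text/markdown.
    return status == 200 and media_type in ("text/markdown", "text/plain")


def fetch_status(url, timeout=10):
    """Fetch url without credentials and return (status, Content-Type)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Content-Type", "")
    except urllib.error.HTTPError as err:  # 401/403/503 land here
        return err.code, err.headers.get("Content-Type", "")
```

A redirect to a login or challenge page usually ends in an HTML 200, which response_ok rejects because of the media type.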
05. Refresh on content change
Observer hooks on cms_page_save_after, catalog_category_save_after, and catalog_product_save_after trigger an asynchronous regeneration so the file stays current with editorial changes. A nightly cron entry rebuilds the file from scratch as a safety net in case an observer was suppressed during a bulk import. After every change, sanity-check with curl https://yoursite.com/llms.txt | head -n 25: the H1, blockquote tagline, and first H2 section should all appear within those first 25 lines.
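The curl | head check can be automated for CI or a post-deploy hook. A minimal Python sketch, assuming the file body has already been fetched (for example with curl -s into a file); check_head is a hypothetical name, and the 25-line window is the threshold from the step above.

```python
import re


def check_head(text, max_lines=25):
    """Return a list of problems with the top of an llms.txt body (empty = OK).

    Mirrors the manual check: H1, blockquote tagline and first H2
    should all appear within the first max_lines lines.
    """
    head = text.splitlines()[:max_lines]
    problems = []
    first = next((line for line in head if line.strip()), "")
    # "# " followed by text; "## Foo" does not match, so an H2 is not taken for the H1.
    if not re.match(r"# \S", first):
        problems.append("first non-blank line is not an H1")
    if not any(line.startswith("> ") for line in head):
        problems.append("no blockquote tagline")
    if not any(line.startswith("## ") for line in head):
        problems.append("no H2 section")
    return problems
```

Failing the deploy when the returned list is non-empty turns the 30-second manual check into a gate.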
Four scenarios where shipping an llms.txt is the obvious next move
llms.txt is cheap to ship — one afternoon of work after the module install. These four scenarios are where the AI-citation upside is meaningful enough that it should be done this quarter, not next.
Magento stores with rich editorial / glossary / blog content
If your Magento site carries a meaningful glossary, blog, or knowledge-base section (the kind of editorial pages LLMs cite as authoritative when answering a user’s question), a curated llms.txt tells the LLM exactly which pages matter. Without the file, LLMs fall back to web-scale crawling and chunking, which surfaces noisier URLs (paginated archives, faceted-filter pages) ahead of the editorial gems. With the file, the editorial pages get pole position in the LLM’s context window, which is the layer that gets cited.
Sites optimising for AI Overviews, Perplexity, ChatGPT, Claude
The llms.txt file is one of the few direct signals LLMs read at the site root, analogous to robots.txt for traditional search crawlers. Perplexity and ChatGPT browsing modes both fetch it as of late 2024-2025, and Claude’s browsing tool checks for it when answering URL-grounded questions. If your AEO / GEO strategy depends on being cited by AI search experiences, shipping a tidy llms.txt is the lowest-effort, highest-leverage move on the board: a single afternoon of work for an outsized return.
Service / agency / SaaS Magento sites with focused catalogues
Magento isn’t only for big-catalogue retail: plenty of agency, SaaS, and service-business sites run on Magento with a tightly curated product list where each item is a discrete answer to a buyer query. For those sites, llms.txt is the perfect shape: 20-40 high-signal URLs, each with a one-line description that pre-summarises the value proposition for the LLM. Compare that with a 5000-URL sitemap.xml, where the LLM has no way to know which 30 URLs actually matter without the curation layer that llms.txt provides.
Sites running a brand-mention strategy for AI training pipelines
Brand-mention frequency in LLM training corpora correlates strongly with how often the LLM cites the brand in user answers. The fastest way to “introduce” a brand to AI training pipelines without waiting for web-scale crawl coverage is a clean llms.txt file at the root domain; LLM crawlers prefer it because it lowers their parsing cost. The file is a self-contained, structured pitch for what the brand is and which pages summarise it best. Pair it with a focused content-marketing push and the citation rate climbs measurably within weeks.
Three llms.txt mistakes that quietly kill the AI-citation lift
Most llms.txt failures aren’t catastrophic — they’re slow leaks. Audit your config and regeneration pipeline against these three before assuming the file is doing its job.
Letting the file balloon past 50KB
LLM context windows are finite, and most ingestion pipelines truncate long markdown contexts at a soft ceiling around 50KB. An llms.txt that crosses that line starts losing its tail-end entries, which is usually where the long-tail glossary pages and recent blog posts live. Tune max_cms / max_categories / max_products to keep the rendered file comfortably under that ceiling, prioritising the highest-intent landing pages. If you genuinely need the verbose, exhaustive variant, publish it as the companion /llms-full.txt file and keep /llms.txt tight.
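When the per-type caps are hard to tune by hand, the ceiling can also be enforced in code. A minimal Python sketch, assuming the entry lines are already rendered and sorted best-first in one flat list; fit_to_budget is a hypothetical name, 50KB is read as 50,000 bytes, and a real generator would also have to count the H2 headings of whichever sections survive.

```python
def fit_to_budget(header, entry_lines, budget_bytes=50_000):
    """Keep the highest-priority entry lines that fit under budget_bytes.

    header is the already-rendered H1 + blockquote block. Sizes are measured
    in UTF-8 bytes (what goes over the wire), plus one newline per line.
    Returns (kept_lines, dropped_count).
    """
    used = len(header.encode("utf-8"))
    kept = []
    for line in entry_lines:
        size = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        if used + size > budget_bytes:
            # Stop at the first entry that does not fit, so a shorter but
            # lower-priority entry never displaces a higher-priority one.
            break
        kept.append(line)
        used += size
    return kept, len(entry_lines) - len(kept)
```

Logging dropped_count on each regeneration makes it obvious when the caps need revisiting or when content belongs in /llms-full.txt instead.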
Including session-specific or paginated URLs
Magento natively generates pagination, session, and faceted-filter URLs (?p=2, ?___SID=U, filter combinations like ?color=red&size=xl) that bloat sitemaps without adding signal. If those leak into llms.txt, the LLM’s context window fills with low-information URLs that all point to near-duplicate pages. Filter aggressively to canonical URLs only: the URL that rel=canonical points to, not the raw request URI. The Panth_LlmsTxt module pulls the canonical-tag URL by default; custom implementations need to do the same explicitly.
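Canonical filtering can be sketched in Python with urllib.parse. This is an illustration, not the module's code: NOISE_PARAMS, is_noise, and canonical_entries are hypothetical names, color and size stand in for whatever attributes your layered navigation exposes, and canonical_of is assumed to be a mapping built from each page's rel=canonical tag.

```python
from urllib.parse import parse_qsl, urlsplit

# Query parameters that mark a non-canonical Magento URL. "p" (pagination)
# and "___SID" (session ID) come from the text; the facet names are examples.
NOISE_PARAMS = {"p", "___SID", "color", "size"}


def is_noise(url, noise_params=NOISE_PARAMS):
    """True if the URL carries any pagination, session or facet parameter."""
    query = urlsplit(url).query
    return any(key in noise_params
               for key, _ in parse_qsl(query, keep_blank_values=True))


def canonical_entries(urls, canonical_of):
    """Map each URL to its canonical target, drop noise, de-duplicate in order.

    URLs without an entry in canonical_of are treated as self-canonical.
    """
    seen, result = set(), []
    for url in urls:
        target = canonical_of.get(url, url)
        if is_noise(target):
            continue  # the canonical itself is still a noisy URL
        if target not in seen:
            seen.add(target)
            result.append(target)
    return result
```

Mapping to the canonical target before filtering means a faceted URL that canonicalises to its clean category page collapses into that page rather than being lost.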
Forgetting to regenerate after content updates
A stale llms.txt is worse than no llms.txt: it points LLMs at pages that no longer exist or have moved, which signals dead links and erodes the trust the file is meant to build. Always tie regeneration to the standard content-save observers (cms_page_save_after, catalog_category_save_after, catalog_product_save_after) or a daily cron entry, ideally both, with the observer doing fast async regeneration and cron acting as a nightly safety net. After every Magento content release, curl the file and eyeball it; it takes 30 seconds and catches most drift bugs.
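The eyeball check can be backed by a mechanical one: extract every linked URL from the live file and compare it with the set of URLs the store currently serves. A minimal Python sketch; stale_links and LINK_RE are hypothetical names, and how you build live_urls (a catalogue export, sitemap.xml, or similar) is left open.

```python
import re

# Matches the "- [Title](URL)" bullet shape used in llms.txt and captures the URL.
LINK_RE = re.compile(r"^[ \t]*[-*][ \t]+\[[^\]]*\]\(([^)\s]+)\)", re.MULTILINE)


def stale_links(llms_txt, live_urls):
    """Return URLs listed in llms.txt that are not in the live URL set."""
    listed = LINK_RE.findall(llms_txt)
    return [url for url in listed if url not in live_urls]
```

A non-empty result after a content release is a signal that the observer or cron regeneration did not run.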
Magento llms.txt — frequently asked questions
Is llms.txt the same as robots.txt?
No. The two files serve opposite purposes. robots.txt is an access-control file: it tells crawlers which paths are off-limits, and well-behaved crawlers respect those Disallow rules. llms.txt is a content-discovery file: it tells LLMs which pages are most worth reading and pre-summarises them in markdown so the LLM does not have to crawl, parse, and chunk the entire site to figure out what matters. Both live at the root domain, both are plain text, and both are public, but robots.txt is gatekeeping (what NOT to fetch) while llms.txt is curation (what TO fetch, ranked).
Do LLMs actually read llms.txt yet, or is it speculative?
Adoption is real and growing. Perplexity, Claude (when browsing is enabled), and ChatGPT’s browsing mode all check for /llms.txt when answering URL-grounded questions as of late 2024 / 2025. You can verify this for your own domain by checking your access logs for User-Agent strings such as PerplexityBot or ClaudeBot fetching /llms.txt; those fetches happen server-side, so they do not show up in your browser’s network panel. Not every LLM does it yet, and not every browsing model does it on every query. The cost of shipping the file is near-zero (a single CLI command after installing the module), and the upside is meaningful and growing, which makes this the textbook case for a low-risk experiment.
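A quick way to run that log check is to count /llms.txt requests per crawler and status code. A minimal Python sketch, assuming nginx or Apache writes the standard combined log format; llms_txt_fetches and BOTS are hypothetical names, and GPTBot and ChatGPT-User are OpenAI's crawler and browsing user-agent tokens, added alongside the two named above.

```python
import re
from collections import Counter

# Combined Log Format: ip - - [time] "METHOD PATH PROTO" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)
BOTS = ("PerplexityBot", "ClaudeBot", "GPTBot", "ChatGPT-User")


def llms_txt_fetches(log_lines, bots=BOTS):
    """Count (bot, status) pairs for requests to /llms.txt."""
    counts = Counter()
    for line in log_lines:
        m = LOG_RE.search(line)
        # Ignore unparseable lines and any path other than /llms.txt (query string stripped).
        if not m or m.group("path").split("?")[0] != "/llms.txt":
            continue
        ua = m.group("ua").lower()
        for bot in bots:
            if bot.lower() in ua:
                counts[(bot, int(m.group("status")))] += 1
    return counts
```

Non-200 statuses in the result (403 from a bot-protection rule, for example) point straight back to the serving requirements in step 04.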
Does Magento support llms.txt out of the box?
No. Adobe has not shipped a built-in llms.txt module as of mid-2026, and there is no native llms.txt admin section in Commerce or Open Source. The two practical paths are: (1) install the mage2kishan/module-llms-txt extension used on this site, which provides an admin config section, a CLI generator, and a frontend route at /llms.txt; or (2) write a custom controller that builds the markdown from a hard-coded list of URLs. Option 1 is what 95% of sites should do: the extension handles the canonical URL filtering, observer-triggered regeneration, and per-type limits that a hand-rolled controller will get wrong on the first revision.
Should I list every product detail page in llms.txt?
No, curate. List hero / flagship / lead products (the 20-40 SKUs that represent the brand and pull the most search demand), not the full catalogue. The whole point of llms.txt is to be the curated, signal-dense answer to “what does this site sell?” If you dump every SKU into the file, you defeat the purpose: you bloat the file past the 50KB ceiling, dilute the signal, and force the LLM to either truncate the tail or skip the file entirely. For the verbose, exhaustive variant, use the companion /llms-full.txt file. The Panth_LlmsTxt extension exposes max_products in admin precisely so you can pick the cap (somewhere between 20 and 80, depending on catalogue shape).
How is llms.txt different from a sitemap.xml?
sitemap.xml is an exhaustive list of URLs intended for indexing search crawlers: it includes every canonical URL the site wants Google / Bing to know about, in machine-readable XML, with lastmod / changefreq / priority elements that hint at crawl scheduling. llms.txt is a markdown-curated map of “what matters” with a human-written one-line description per entry. sitemap.xml is for indexing crawlers that will visit each URL; llms.txt is for LLMs that may only read the file itself, never visiting the individual URLs, so the description on each line is what gets folded into the LLM’s answer. Different audience, different format, different curation logic: both are useful, and neither replaces the other.
Can I block specific pages from appearing in llms.txt?
Yes. The mage2kishan/module-llms-txt extension has an exclusion admin config under Stores → Configuration → Panth → LLMS.txt → Exclusions where you can list URL patterns to suppress, which is useful for landing pages that are ad-campaign-specific, gated content, or deprecated URLs you have not yet 301’d. For hand-rolled implementations, simply do not include the URLs when generating the markdown. Note that excluding a page from llms.txt does not exclude it from search crawlers. To keep it out of search results as well, add a noindex meta tag and leave the page crawlable so search engines can see that tag; a robots.txt Disallow only stops crawling, and a disallowed page can still be indexed from external links. Each mechanism targets a different audience.
Want an llms.txt + AEO/GEO audit on your Magento store?
Send your storefront URL and I will check whether /llms.txt is served, validate the format against the v0.1 spec, audit the curation logic for noise, verify the regeneration hooks fire on content save, and confirm the file is reachable to Perplexity / Claude / ChatGPT crawlers. You get a written tuning plan, a fixed-price quote, and the earliest start date within 24 business hours.