
llms.txt: The Complete Guide (What It Is, How to Write One, and Why It Matters)

Everything you need to know about llms.txt — the file standard that helps AI systems understand your website. Includes the exact format, a real annotated example, common mistakes, and how to check compliance.


If you have heard of robots.txt, you understand the concept behind llms.txt: a standardised file that tells AI systems how to interact with your website. But where robots.txt is primarily about crawl permissions, llms.txt is about comprehension — giving large language models a concise, structured summary of what your site contains and which content deserves attention.

The standard was proposed by Jeremy Howard, co-founder of fast.ai, in September 2024. It has since been adopted by hundreds of websites and is now checked by several major AI systems during content retrieval.

Why llms.txt Exists

Language models have a context window problem. When an AI system retrieves a web page, it often has to process an entire HTML document — navigation, footers, cookie banners, repetitive template content — to extract the meaningful information. For complex sites with many pages, this is inefficient and often imprecise.

llms.txt solves this by providing a curated index of your most important content, written in clean Markdown, specifically optimised for LLM consumption. Instead of an agent or RAG system having to crawl and parse your entire site, it can read your llms.txt first and make informed decisions about which pages are worth retrieving in full.

Think of it as a human-readable sitemap for AI — but one that includes context and descriptions, not just URLs.

The File Format

llms.txt lives at yourdomain.com/llms.txt and is written in Markdown. The structure follows a specific convention:

# Company or Site Name

> One or two sentence description of what your site or company does. This is the summary an LLM will use to understand you at a glance.

Optional: additional context paragraphs here. These can explain your positioning, who you serve, or what makes your offering distinct.

## Section Name

- [Page Title](https://yourdomain.com/page-url): Brief description of what this page contains and why it is useful.
- [Another Page](https://yourdomain.com/another-page): Description.

## Another Section

- [Documentation](https://yourdomain.com/docs): Full product documentation.
- [Pricing](https://yourdomain.com/pricing): Current pricing plans and feature comparison.

The key structural elements are:

  • H1 heading: Your site or company name
  • Blockquote (>): A concise summary — this is the most important element and will often be used as a standalone description by LLMs
  • Body paragraphs: Optional extended context
  • H2 sections: Categorical groupings of pages
  • List items with links: Individual pages with descriptions
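Because the convention is this regular, the file is easy to parse mechanically. Here is a minimal sketch in Python (the function name and the shape of the returned dictionary are illustrative choices, not part of the standard):

```python
import re

def parse_llms_txt(text):
    """Parse an llms.txt body into its structural elements."""
    title = re.search(r"^# (.+)$", text, re.MULTILINE)
    summary = re.search(r"^> (.+)$", text, re.MULTILINE)
    sections = {}
    current = None
    for line in text.splitlines():
        heading = re.match(r"^## (.+)$", line)
        if heading:
            current = heading.group(1)
            sections[current] = []
            continue
        # List items look like: - [Title](url): description
        item = re.match(r"^- \[(.+?)\]\((\S+?)\)(?::\s*(.*))?$", line)
        if item and current:
            sections[current].append({
                "title": item.group(1),
                "url": item.group(2),
                "description": item.group(3) or "",
            })
    return {
        "title": title.group(1) if title else None,
        "summary": summary.group(1) if summary else None,
        "sections": sections,
    }
```

The point of the sketch is that an agent needs nothing heavier than a few regular expressions to turn your file into structured data — which is exactly why keeping to the convention matters.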

A Real Annotated Example

Here is an example for a hypothetical B2B SaaS company:

# Acme Analytics

> Acme Analytics is a B2B data pipeline tool that helps engineering teams move data from SaaS applications into data warehouses without writing ETL code. Used by 2,000+ companies including Stripe, Notion, and Linear.

Acme connects to 200+ data sources (Salesforce, HubSpot, Shopify, and more) and syncs data to Snowflake, BigQuery, Redshift, and Databricks. Setup takes under 30 minutes with no SQL required.

## Product

- [How Acme Works](https://acmeanalytics.io/how-it-works): Technical overview of the pipeline architecture, sync frequency options, and transformation capabilities.
- [Integrations](https://acmeanalytics.io/integrations): Full list of 200+ supported data sources and destinations with setup documentation.
- [Pricing](https://acmeanalytics.io/pricing): Pricing tiers from $0/month (Starter) to enterprise. Billed by data volume.

## Documentation

- [Quick Start Guide](https://acmeanalytics.io/docs/quickstart): Get your first pipeline running in under 30 minutes.
- [API Reference](https://acmeanalytics.io/docs/api): Full REST API documentation for programmatic pipeline management.
- [Data Transformation](https://acmeanalytics.io/docs/transforms): How to use dbt-style transforms within Acme pipelines.

## Resources

- [Blog](https://acmeanalytics.io/blog): Technical articles on data engineering, pipeline best practices, and product updates.
- [Case Studies](https://acmeanalytics.io/customers): How customers use Acme to solve specific data challenges.

## Optional

- [Status Page](https://status.acmeanalytics.io): Real-time system status and incident history.
- [Changelog](https://acmeanalytics.io/changelog): Product updates and new feature releases.

A few things worth noting in this example:

  • The blockquote leads with what the product does, then who uses it. Both elements are important — the "what" for categorisation, the "who" for social proof that LLMs can surface.
  • Section names are semantic, not decorative. "Product", "Documentation", "Resources" are standard and easy for AI systems to parse.
  • Descriptions are functional, not marketing-speak. "Full list of 200+ supported data sources" is more useful to an LLM than "Explore our powerful integration ecosystem."
  • The "Optional" section is part of the llms.txt convention — pages that may be useful in some contexts but are not core discovery content.

llms-full.txt: The Companion File

The standard also defines a companion file: llms-full.txt at yourdomain.com/llms-full.txt. Where llms.txt provides a curated index of key pages, llms-full.txt contains the full text content of those pages in a single file — pre-processed and cleaned for LLM consumption.

The use case for llms-full.txt is batch ingestion. If an AI system wants to deeply understand your entire site without making multiple HTTP requests, it can fetch llms-full.txt and process the content all at once. This is particularly valuable for documentation sites, knowledge bases, and any site where an LLM might need comprehensive coverage rather than selective retrieval.

Most sites start with llms.txt and add llms-full.txt later if there is demand from AI system integrations.
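As an illustration of the batch-ingestion pattern, a RAG pipeline might fetch llms-full.txt once and split it at H2 boundaries before indexing. The heading-based chunking rule below is an assumption for the sketch — production pipelines typically split by token count — and yourdomain.com is a placeholder:

```python
from urllib.request import urlopen

def chunk_by_heading(text, max_chars=2000):
    """Split Markdown into chunks, starting a new chunk at each H2 heading."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("## ") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    # Further split any chunk that exceeds the size budget
    return [c[i:i + max_chars] for c in chunks for i in range(0, len(c), max_chars)]

# One request instead of crawling every page:
# text = urlopen("https://yourdomain.com/llms-full.txt").read().decode("utf-8")
# index = chunk_by_heading(text)
```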

Which AI Systems Read llms.txt?

Adoption is growing. As of early 2026, the following systems reference llms.txt when available:

  • Perplexity: Uses llms.txt during web retrieval to prioritise which pages to fetch
  • Claude (with web access): Checks for llms.txt when Anthropic's retrieval system is active
  • Various AI agents and frameworks: LangChain, AutoGPT, and numerous custom agent frameworks check for llms.txt as part of site orientation
  • Developer tooling: Several IDE-integrated AI tools (including Cursor) check llms.txt before referencing external documentation

ChatGPT's browsing implementation does not currently give special treatment to llms.txt, but this may change — and having the file in place costs nothing while providing value to the systems that do read it.

Common Mistakes to Avoid

Too much marketing language. Descriptions like "industry-leading platform" or "revolutionary solution" are useless to an LLM. Write descriptions that communicate what the page contains factually.

Missing the blockquote summary. This is the most frequently skipped element and the most important one. The > summary is what LLMs use when they want a one-sentence description of your company. Without it, they fall back to scraping your homepage.

Stale links. An llms.txt that points to deleted pages or outdated URLs actively harms AI comprehension of your site. Treat it like a sitemap — it requires maintenance.

No descriptions on list items. A list of URLs with no context is no more useful than a raw sitemap. The descriptions are what differentiate llms.txt from existing formats and provide the semantic context that LLMs need.

Listing every page. llms.txt is a curated index, not a comprehensive directory. Including 200 pages with superficial descriptions is worse than including 20 pages with useful ones. Be selective.

How Surfaceable Checks llms.txt Compliance

Surfaceable includes llms.txt validation as part of its AI visibility audit. The audit checks:

  • Whether llms.txt and llms-full.txt exist at the correct paths
  • Whether the H1, blockquote, and section structure follow the standard format
  • Whether linked pages return valid HTTP responses (no 404s or redirects)
  • Whether the file is accessible to AI crawlers (not blocked by robots.txt or Cloudflare rules)
  • Content quality flags: descriptions that are missing, too short, or appear to be auto-generated boilerplate

If you have not yet published an llms.txt, the audit will flag it as a missing baseline requirement. If you have one but it has structural issues, the audit will identify them specifically.
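Surfaceable's implementation is not public, but the structural checks are straightforward to reason about. As an illustration only (the rule names here are invented), a linter for the H1/blockquote/section requirements and the description-quality flags might look like this:

```python
import re

def lint_llms_txt(text):
    """Return a list of structural problems found in an llms.txt body."""
    problems = []
    if not re.search(r"^# .+$", text, re.MULTILINE):
        problems.append("missing-h1")
    if not re.search(r"^> .+$", text, re.MULTILINE):
        problems.append("missing-blockquote-summary")
    if not re.search(r"^## .+$", text, re.MULTILINE):
        problems.append("no-sections")
    # Flag list items whose description is absent or suspiciously short
    for title, url, desc in re.findall(r"^- \[(.+?)\]\((\S+?)\):?\s*(.*)$",
                                       text, re.MULTILINE):
        if not desc.strip():
            problems.append(f"missing-description: {url}")
        elif len(desc.strip()) < 20:
            problems.append(f"short-description: {url}")
    return problems
```

Checking that each linked URL returns a 200 (rather than a 404 or a redirect) would then be a separate pass of HTTP requests over the extracted URLs.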

Getting Started

Publishing your first llms.txt takes under an hour for most sites. Start with:

  1. Write a one-to-two sentence blockquote description of your site — what it does and who it serves
  2. Identify your five to ten most important pages: typically home, product, pricing, docs, and blog
  3. Write one-sentence functional descriptions for each page
  4. Group them into two to four sections with clear names
  5. Deploy the file at /llms.txt and verify it returns a 200 with Content-Type: text/plain or text/markdown

That is the entire implementation. There is no registration required, no schema to validate against — you publish the file and AI systems will find it.
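Step 5 above can be verified with `curl -I https://yourdomain.com/llms.txt`, or with a few lines of Python. A small sketch (yourdomain.com is a placeholder):

```python
from urllib.request import urlopen

ACCEPTABLE = ("text/plain", "text/markdown")

def content_type_ok(header_value):
    """Check a Content-Type header value, ignoring parameters like charset."""
    media_type = header_value.split(";")[0].strip().lower()
    return media_type in ACCEPTABLE

# resp = urlopen("https://yourdomain.com/llms.txt")
# assert resp.status == 200
# assert content_type_ok(resp.headers.get("Content-Type", ""))
```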

