llms.txt and AI SEO: What Actually Matters in 2026

10 min read

Oskars Tuns, Developers Alliance

There's a new file showing up at the root of some websites. It's called llms.txt, it's written in markdown, and it was built to help AI systems understand what a site is actually about — who runs it, what topics it covers, which pages matter.

Whether it gets read is a different question. But more on that later.

The bigger shift here isn't one file. It's that the way people find information has fundamentally changed, and most websites haven't caught up with what that means for them.

What people mean by "AI SEO"

For the past two decades, search optimization meant getting Google to rank your page above your competitors. Page one, position one, click-through rate. The whole discipline built itself around those signals.

That model still exists. But it's no longer the only one that matters.

ChatGPT, Perplexity, Claude, and Google's AI Overviews now answer questions directly. Someone asks "who builds Magento stores in Eastern Europe?" and gets a paragraph response — not ten blue links. If your agency isn't in that paragraph, you don't exist in that answer.

This has given rise to what some people are calling GEO — Generative Engine Optimization. The goal isn't just to rank; it's to be the source an AI cites when someone asks a relevant question. That requires thinking about discoverability in a different way.

What llms.txt actually is

llms.txt was proposed by Jeremy Howard — the founder of fast.ai — in September 2024. The concept is deliberately simple: a plain-text file, placed at the root of your domain, that summarizes what your site is and what's in it.

A basic example looks like this:

# Developers Alliance
> Professional software development agency specializing in
> Magento, Adobe Commerce, and AI integration.

## Main Pages
- [Homepage](https://developers-alliance.com): Services overview
- [Services](https://developers-alliance.com/services.html): What we do
- [AI for eCommerce](https://developers-alliance.com/ai.html): AI business systems
- [Case Studies](https://developers-alliance.com/case-studies.html): Client work
- [Blog](https://developers-alliance.com/blog.html): Technical articles
- [Contact](https://developers-alliance.com/contact.html): Get in touch

## Optional
- [Privacy Policy](https://developers-alliance.com/privacy-policy.html)

That's it. No complex syntax. No proprietary format. Just structured text from which a language model can immediately grasp the shape of your site, without crawling every page to figure it out.
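
Because the format is plain markdown, a few lines of Python are enough to sanity-check a file's structure before publishing it. This is a rough sketch, not part of the spec — the function name and the specific checks are my own; the spec at llmstxt.org only requires an H1 title, with the blockquote summary and H2 link sections optional.

```python
import re

def check_llms_txt(text: str) -> dict:
    """Rough structural check of an llms.txt file.

    Per llmstxt.org, an H1 title is required; the blockquote
    summary and H2 link-list sections are optional extras.
    """
    lines = text.splitlines()
    # An H1 is exactly one '#' followed by whitespace and text.
    has_title = any(re.match(r"#\s+\S", line) for line in lines)
    has_summary = any(line.startswith(">") for line in lines)
    sections = [line[3:].strip() for line in lines if line.startswith("## ")]
    # Markdown links: [label](http...)
    links = re.findall(r"\[([^\]]+)\]\((https?://[^)]+)\)", text)
    return {
        "has_title": has_title,
        "has_summary": has_summary,
        "sections": sections,
        "links": links,
    }
```

Run it against your file in CI and you'll catch a missing title or a malformed link before a crawler ever sees it.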

There's also an extended version, llms-full.txt, which includes the actual full text of your most important pages. The idea is to give AI systems even more material to work with when forming responses about your company or content.

As of April 2026: No major LLM — not ChatGPT, not Claude, not Gemini, not Perplexity — has officially announced that they read or act on llms.txt files. The spec lives at llmstxt.org and has real community support. But right now, it's a proposal without confirmed adoption from any of the major model providers.

That doesn't make it pointless. It's cheap to implement, serves as useful documentation for your own team, and if the standard takes hold, you'll already be ahead. But it shouldn't sit at the top of your AI SEO priority list while more impactful things go undone.

How llms.txt differs from robots.txt

robots.txt has been around since 1994. It's a simple instruction file: here's what you can crawl, here's what you can't. Block a directory, and well-behaved bots won't enter it. The system works because crawlers actively check the file before indexing.

llms.txt does something different at a different layer. It doesn't control access — it provides context. You're not blocking or permitting anything. You're saying: if you've already gotten here, here's the map.

  • robots.txt: access control — what bots can and can't visit
  • llms.txt: context provision — a structured summary of what's here

A site might have a robots.txt that opens all content to AI crawlers while llms.txt points them to what's most relevant. Or a site might restrict crawlers entirely through robots.txt — in which case llms.txt probably won't matter much either way. They operate independently.

Allowing access vs. allowing training — a distinction that matters

This is where a lot of confusion lives, and it's worth being clear about it.

There are two separate things a website owner might want to control when it comes to AI:

1. Allowing AI to read your content (for answering questions)

Systems like GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, and Google-Extended crawl the web to build up knowledge and retrieve context when generating answers. You control whether these bots can visit your site through robots.txt:

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

Want to block them? Same syntax, different directive:

User-agent: GPTBot
Disallow: /

This affects whether AI systems can crawl your site to include it in their knowledge base or retrieve it for search-style responses. Blocking these bots is your right, but it typically means less visibility in AI-generated answers.
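
If you want to confirm those directives do what you intend, Python's standard `urllib.robotparser` evaluates a robots.txt the same way well-behaved crawlers do. The combined file below is a hypothetical example that welcomes GPTBot and blocks PerplexityBot:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot is welcome, PerplexityBot is not.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Disallow: /
"""

def bot_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate a robots.txt string the way a compliant bot would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Checking `bot_may_fetch(ROBOTS_TXT, "GPTBot", "https://example.com/blog.html")` returns True, while the same call with "PerplexityBot" returns False. This only tells you what a rule-following crawler should do; it doesn't guarantee every bot complies.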

2. Allowing AI to train on your content

This is a separate question entirely — and a more contested one. Common Crawl, which many AI models have trained on, has its own opt-out mechanism. The New York Times, Reddit, and others have pursued legal action over training data use.

Some meta tags have emerged as a way to signal training preferences, though no single standard has been formally adopted across the industry. What's important to know: llms.txt doesn't handle this either. It's not an access control file and it's not a training opt-out. It's a helpful summary document and nothing more.

If you want to signal your position on AI training, that conversation happens through robots.txt, legal agreements with data providers, and any formal opt-out mechanisms specific to the crawlers that matter to you.

What actually moves the needle for AI SEO right now

While llms.txt waits for wider adoption, other things are happening now that demonstrably affect whether AI systems cite your content and represent your company accurately.

Accessibility and semantic HTML

AI models read the web the same way screen readers do: through the structure of the markup. Content buried in JavaScript-rendered components, wrapped in div soup with no heading hierarchy, or loaded asynchronously after the initial render is harder to parse reliably.

  • Use a clear H1–H6 hierarchy on every page
  • Write meaningful alt text for images — descriptions, not just filenames
  • Use ARIA roles and landmarks where appropriate (role="main", role="navigation", etc.)
  • Write link text that makes sense out of context ("read our Magento security audit guide", not "click here")
  • Add a skip to main content link — it costs nothing and signals a well-structured page

Accessibility and AI parsability overlap almost completely. Building a site that works for assistive technology is essentially the same work as building one that language models can read confidently.
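
Put together, a page that satisfies that checklist has a skeleton roughly like the following. The page names, file paths, and class names are illustrative, not prescriptive:

```html
<body>
  <a href="#main">Skip to main content</a>
  <header role="banner">…</header>
  <nav role="navigation" aria-label="Primary">…</nav>
  <main id="main" role="main">
    <article>
      <h1>Magento Security Audit Guide</h1>
      <h2>Why audits matter</h2>
      <img src="audit-checklist.png"
           alt="Checklist of ten Magento security audit steps">
      <p><a href="/magento-security-audit.html">Read our Magento
        security audit guide</a></p>
    </article>
  </main>
  <footer role="contentinfo">…</footer>
</body>
```

Nothing exotic: landmarks, one H1, headings in order, descriptive alt text and link text. A screen reader and a crawler both get the same clean outline.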

Structured data (Schema.org)

Search engines have used Schema.org markup for years to understand page content, and the same signals inform AI understanding. Marking up your organization, your articles, your FAQ sections, and your service offerings gives language models a cleaner picture of who you are and what you offer.

A blog post with a BlogPosting schema, author information, publication date, and speakable specification is meaningfully more legible to an AI system than one that's just raw HTML. It's not magic, but it helps.
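
For a post like this one, the JSON-LD might look like the sketch below. The publication date and the CSS selectors in the speakable block are placeholders — adjust them to match your own templates:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "llms.txt and AI SEO: What Actually Matters in 2026",
  "datePublished": "2026-04-15",
  "author": { "@type": "Person", "name": "Oskars Tuns" },
  "publisher": {
    "@type": "Organization",
    "name": "Developers Alliance",
    "url": "https://developers-alliance.com"
  },
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": ["h1", ".article-summary"]
  }
}
</script>
</pre>
```

Drop it in the page head, validate it with a structured-data testing tool, and every crawler that understands Schema.org gets the same unambiguous answer to "what is this page and who wrote it."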

AI-specific meta tags

Most AI crawlers respect standard robots directives, but a few additional tags are worth using:

<meta name="robots" content="index, follow, max-snippet:-1, max-image-preview:large">
<meta name="googlebot" content="max-snippet:-1, max-image-preview:large">
<meta name="bingbot" content="index, follow">

The max-snippet:-1 directive tells search systems they can use as much text as they need from your page for summaries or answer generation. For AI systems that build responses from page content, removing that cap helps your content surface more fully.

Some sites also add declarative tags to signal content authorship:

<meta name="ai-content-declaration" content="human-written, AI-assisted">

There's no formal standard for these yet, but signaling quality and authorship is becoming more relevant as AI systems start trying to distinguish high-quality, expert content from generated filler.

Building authority in the places AI actually learns from

Traditional SEO built authority through backlinks: get other sites to point at yours, especially high-authority ones. That still matters for search rankings. But AI systems absorb authority differently — through the content they were trained on and the sources they trust when retrieving information.

If your company or expertise is mentioned in:

  • Industry publications and news — trade press, tech journalism, analyst reports
  • Forums and communities — Reddit, Hacker News, Stack Overflow, LinkedIn discussions
  • Podcast transcripts — interview content indexes well and gets scraped widely
  • Wikipedia and Wikidata — the most trusted reference source for training data
  • Professional directories — Clutch, G2, Capterra, UpCity, and platform partner pages
  • Case studies and technical documentation — real-world specifics cited by others

...then language models trained on that material are more likely to know who you are and speak accurately about what you do. This is why a genuine PR strategy, real participation in the communities where your peers write, and a steady presence in industry conversations have compounding value in an AI-first discovery environment.

A brand mentioned three times in a front-page Hacker News discussion probably has more AI visibility than a brand with 50 low-quality backlinks and no mention anywhere a human would actually read.

What our llms.txt looks like

We've had llms.txt live at developers-alliance.com for a while. Here's the structure:

# Developers Alliance - AI Context File
# https://developers-alliance.com

## About
Developers Alliance is a professional software development agency
specializing in e-commerce solutions, AI integration, and Adobe
Commerce/Magento development. We provide expert development services
to businesses worldwide.

## Core Services
- E-Commerce Development (Adobe Commerce, Magento, Shopify)
- AI Integration & Automation
- Custom Software Development
- White Label Development Partnerships

## Main Pages
- [Homepage](https://developers-alliance.com): Company overview
- [Services](https://developers-alliance.com/services.html): Full service list
- [AI for eCommerce](https://developers-alliance.com/ai.html): AI systems
- [Case Studies](https://developers-alliance.com/case-studies.html): Client work
- [Blog](https://developers-alliance.com/blog.html): Technical articles
- [Contact](https://developers-alliance.com/contact.html): Enquiries

Simple enough that it takes ten minutes to write and maintain. Specific enough that an LLM reading it would know exactly what kind of agency we are, what platforms we work on, and where to send someone who wants to learn more.

Is it being read by ChatGPT or Gemini right now? Probably not. But the file is there, it's accurate, and when the standard does gain traction — and it probably will, in some form — we won't be scrambling to catch up.

The honest summary

AI SEO is real, but it's messier than the content mill articles make it sound. llms.txt is worth having and not worth obsessing over. The fundamentals — semantic HTML, structured data, genuine authority built in real places — matter more right now than any single file.

The rules are still being written. What counts as best practice is shifting every few months as the major AI systems update how they retrieve and surface content. That's frustrating, but it also means that doing the foundational work now — accessible pages, honest structured data, real presence in your industry — puts you ahead of competitors waiting for the standards to settle.

Tags: AI SEO llms.txt GEO Structured Data Accessibility Content Strategy

AI SEO is moving fast. We can help you navigate it.

From llms.txt and structured data to building the kind of authority AI systems actually trust — we can map out what's worth doing for your site, specifically.

Let's Talk