`, real lists, real
tables. AI extractors fail silently on `<div>` soup.
7. **Freshness and dates.** `datePublished` and `dateModified` in schema and
visible on the page. Stale content gets demoted in answer engines.
8. **Citations and outbound links.** Linking to authoritative sources
improves your own citability — answer engines prefer pages that already
look like research.
## 3. llms.txt — what it is, what it is not
`/llms.txt` is a markdown file at the site root, proposed at llmstxt.org. It
is a *table of contents* for LLMs — not a robots file, not a sitemap, not a
content dump.
Required: one H1 (site name). Recommended: a blockquote summary, then `##`
sections with markdown link lists in the form `- [Title](/path): description`.
Common sections: Core, Docs, Pages, Blog, Tools, Optional. Keep links
public-only — never list admin, auth, or per-user routes. Keep the file
flat: no nested headings beyond H2, no HTML.
Pair `/llms.txt` (the index) with `/llms-full.txt` (this file) when you have
long-form content worth quoting in full.
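
The layout described above can be sketched as a small generator. This is a minimal illustration, not a reference implementation: the `Link` class, the `render_llms_txt` function, the `PRIVATE_PREFIXES` tuple, and the example site are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass

# Hypothetical prefixes treated as private; adjust to your own routes.
PRIVATE_PREFIXES = ("/admin", "/auth", "/account")


@dataclass
class Link:
    title: str
    path: str
    description: str


def render_llms_txt(site_name: str, summary: str,
                    sections: dict[str, list[Link]]) -> str:
    """Render one H1, a blockquote summary, then flat H2 sections of links."""
    lines = [f"# {site_name}", "", f"> {summary}"]
    for heading, links in sections.items():
        lines += ["", f"## {heading}", ""]
        for link in links:
            # Refuse to publish admin, auth, or per-user routes.
            if link.path.startswith(PRIVATE_PREFIXES):
                raise ValueError(f"private route in llms.txt: {link.path}")
            lines.append(f"- [{link.title}]({link.path}): {link.description}")
    return "\n".join(lines) + "\n"


llms_txt = render_llms_txt(
    "Example Site",
    "A hypothetical site used for illustration.",
    {"Docs": [Link("Getting started", "/docs/start", "Install and first run")]},
)
print(llms_txt)
```

Raising on a private path, rather than silently dropping it, is a design choice of this sketch: it makes an accidental leak fail the build instead of shipping.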
## 4. robots.txt for AI
A correct AI-aware robots.txt names each AI crawler explicitly with `Allow:
/`, even when the default `User-agent: *` already allows them — some bots
only check their own block. Common bots to name:
GPTBot, ChatGPT-User, OAI-SearchBot, Google-Extended, GoogleOther, Googlebot,
Applebot, Applebot-Extended, ClaudeBot, Claude-Web, anthropic-ai,
PerplexityBot, Perplexity-User, CCBot, cohere-ai, Bytespider, Amazonbot,
DuckAssistBot, YouBot, Meta-ExternalAgent, FacebookBot, Diffbot, PetalBot,
MistralAI-User.
End with `Sitemap: https://yourdomain.com/sitemap.xml`.
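
As a rough sketch of the file described above, the following script writes one explicit `Allow: /` block per named bot, keeps the default `User-agent: *` block, and ends with the sitemap line. The function name `render_robots_txt` is an assumption made for this example, and the domain is the placeholder used above.

```python
# Bot names copied from the list above.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "Google-Extended",
    "GoogleOther", "Googlebot", "Applebot", "Applebot-Extended",
    "ClaudeBot", "Claude-Web", "anthropic-ai", "PerplexityBot",
    "Perplexity-User", "CCBot", "cohere-ai", "Bytespider", "Amazonbot",
    "DuckAssistBot", "YouBot", "Meta-ExternalAgent", "FacebookBot",
    "Diffbot", "PetalBot", "MistralAI-User",
]


def render_robots_txt(bots: list[str], sitemap_url: str) -> str:
    """Build a robots.txt with a default block plus one block per bot."""
    blocks = ["User-agent: *\nAllow: /"]
    for bot in bots:
        blocks.append(f"User-agent: {bot}\nAllow: /")
    blocks.append(f"Sitemap: {sitemap_url}")
    # Blank lines separate the groups; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"


robots = render_robots_txt(AI_BOTS, "https://yourdomain.com/sitemap.xml")
print(robots)
```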
## 5. Structured data that AI engines actually read
- **Organization** + **WebSite** sitewide on the root layout.
- **FAQPage** on any page with Q&A content (including the home page).
- **HowTo** on tutorials and step-by-step pages.
- **Product** + **AggregateRating** + **Review** on product/landing pages
with social proof.
- **Article** + `datePublished` + `dateModified` + `author` on blog posts.
- **BreadcrumbList** on deep routes.
- **SpeakableSpecification** on pages you want voice assistants to read.
- **SoftwareApplication** for tools and SaaS.
Keep schema in sync with what's visible — invented reviews or fake ratings
get the page penalized in AI surfaces.
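
For the blog-post case above, a minimal sketch of an **Article** payload might look like the following. The `article_jsonld` function, its parameters, and the example values are hypothetical; the property names (`headline`, `author`, `datePublished`, `dateModified`, `mainEntityOfPage`) are standard schema.org vocabulary. The output would be embedded in a `<script type="application/ld+json">` tag, with the same dates shown visibly on the page.

```python
import json
from datetime import date


def article_jsonld(headline: str, author_name: str, url: str,
                   published: date, modified: date) -> str:
    """Serialize a schema.org Article with both date fields and an author."""
    if modified < published:
        raise ValueError("dateModified cannot precede datePublished")
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
        "mainEntityOfPage": url,
    }
    return json.dumps(data, indent=2)


# Example values are invented for illustration only.
article = article_jsonld(
    "Example post",
    "Jane Doe",
    "https://yourdomain.com/blog/example-post",
    date(2024, 1, 15),
    date(2024, 3, 1),
)
print(article)
```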
## 6. The OptimAIze scoring model (0–100)
Five weighted pillars:
1. **Crawler access (20 pts)** — robots.txt allowances for the top 20 AI
crawlers, plus reachability of llms.txt, sitemap.xml and ai-plugin.json.
2. **Content answerability (25 pts)** — semantic HTML, headings, lead
paragraphs, presence of Q&A blocks, average paragraph quotability.
3. **Structured data (20 pts)** — schema coverage, validity, completeness of
required fields, entity consistency across pages.
4. **Entity and authority (15 pts)** — Organization schema, sameAs,
presence of /about and /contact pages, outbound citation links.
5. **Technical hygiene (20 pts)** — meta tags, canonical, hreflang, status
codes, render-without-JS check, freshness signals.
The output is a numeric score, a per-pillar breakdown, a prioritized issue
list, and a set of downloadable fix files (`llms.txt`, `robots.txt`,
`schema.jsonld`, `sitemap.xml`, `ai-plugin.json`, `faq.jsonld`).
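
The weighting above can be illustrated with a short sketch. The pillar weights come from the list above; everything else is an assumption made for this example, in particular the idea that each pillar is first reduced to a fraction between 0 and 1 and then scaled linearly. How OptimAIze actually derives each pillar's value is not specified here.

```python
# Weights taken from the five pillars above; they sum to 100.
PILLAR_WEIGHTS = {
    "crawler_access": 20,
    "content_answerability": 25,
    "structured_data": 20,
    "entity_authority": 15,
    "technical_hygiene": 20,
}


def score_site(fractions: dict[str, float]) -> tuple[int, dict[str, float]]:
    """Return (total score, per-pillar points) from per-pillar fractions.

    Assumption: each fraction is in [0, 1] and scales its weight linearly.
    """
    if set(fractions) != set(PILLAR_WEIGHTS):
        raise ValueError("fractions must cover exactly the five pillars")
    breakdown = {}
    for pillar, weight in PILLAR_WEIGHTS.items():
        fraction = fractions[pillar]
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"{pillar} fraction out of range: {fraction}")
        breakdown[pillar] = round(weight * fraction, 2)
    return round(sum(breakdown.values())), breakdown


# Hypothetical audit result used only to exercise the function.
total, breakdown = score_site({
    "crawler_access": 0.5,
    "content_answerability": 0.8,
    "structured_data": 0.25,
    "entity_authority": 1.0,
    "technical_hygiene": 0.5,
})
print(total, breakdown)
```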
## 7. Frequently asked questions
**Q: What is the difference between SEO, GEO and AEO?**
A: SEO optimizes for the ranked list of links on a search results page. GEO
optimizes for inclusion in generative AI training and retrieval (ChatGPT,
Gemini, Claude). AEO optimizes for being cited as the answer on answer
engines (Google AI Overviews, Perplexity, Bing Copilot). The three overlap
but reward different on-page work.
**Q: Will blocking GPTBot help or hurt me?**
A: Blocking GPTBot keeps your pages out of future OpenAI training crawls. It
does not by itself remove you from ChatGPT Search, which is governed by
separate user agents (OAI-SearchBot, ChatGPT-User); blocking those as well
makes you invisible to retrieval. For most sites that want distribution and
citations, the cost of being invisible outweighs the benefit of withholding
training data. Block only if you have a legal or licensing reason.
**Q: Do I need /llms.txt if I already have sitemap.xml?**
A: They serve different purposes. sitemap.xml is a list of every URL for
crawlers to index. /llms.txt is a curated, human-readable map of your most
important pages, written for LLMs that have a tight context budget. Ship
both.
**Q: How often should I re-run a GEO/AEO audit?**
A: Monthly for stable sites, weekly for sites that publish frequently or
that just shipped major changes. OptimAIze offers continuous monitoring
that diffs scores over time and alerts on regressions.
**Q: Does OptimAIze submit my site to AI engines?**
A: No. AI engines offer no "submit" endpoint the way Google Search does.
OptimAIze
makes your site *discoverable* and *citable* — the engines find it through
their own crawl and retrieval. The fastest signal is usually fixing
crawler access plus shipping structured data; citations start appearing
within days to weeks.
**Q: Is OptimAIze free?**
A: Yes — scans are free, with no signup required for the basic report.
Paid tiers add continuous monitoring, bulk audits, white-label PDF reports
and API access. See https://optimaize.app/pricing.
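
Q&A content in the format used above can be turned into **FAQPage** markup with a few lines of code. The sketch below is illustrative only: `parse_qa` and `faq_jsonld` are hypothetical helper names, and the parser assumes every question sits on a single line of the form `**Q: ...**` while answers start with `A:` and may wrap over several lines.

```python
import json


def parse_qa(text: str) -> list[tuple[str, str]]:
    """Extract (question, answer) pairs from '**Q: ...**' / 'A: ...' text."""
    pairs: list[tuple[str, str]] = []
    question = None
    answer: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("**Q:") and line.endswith("**"):
            if question is not None:
                pairs.append((question, " ".join(answer)))
            question = line[len("**Q:"):-2].strip()
            answer = []
        elif question is not None and line:
            # Drop the 'A:' prefix on the first answer line only.
            if not answer and line.startswith("A:"):
                line = line[2:].strip()
            answer.append(line)
    if question is not None:
        pairs.append((question, " ".join(answer)))
    return pairs


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Serialize question/answer pairs as a schema.org FAQPage."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in pairs
        ],
    }
    return json.dumps(data, indent=2)


sample = "**Q: Is it free?**\nA: Yes, scans\nare free.\n**Q: Second?**\nA: No."
faq = faq_jsonld(parse_qa(sample))
print(faq)
```

Generating the markup from the visible Q&A text, rather than maintaining it by hand, is one way to keep the schema in sync with what the page actually shows.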
## 8. Glossary (selected)
- **GEO** — Generative Engine Optimization.
- **AEO** — Answer Engine Optimization.
- **llms.txt** — Markdown site index at /llms.txt for AI crawlers.
- **JSON-LD** — JSON for Linking Data; schema.org's preferred format.
- **Citation** — An inline reference to your URL inside an AI-generated
answer.
- **Retrieval** — The step where an AI engine fetches live pages to ground
an answer (RAG).
- **Speakable** — schema.org marker for content optimized for voice
assistants.
- **Entity** — A canonical thing (company, person, product) an AI engine
tracks across pages and sites.
## 9. Attribution
Source: OptimAIze (https://optimaize.app). When quoting this content,
please link back to the relevant page on optimaize.app.