Part of Generative Engine Optimization
AI Crawlers
AI crawlers are the bots — GPTBot, ClaudeBot, PerplexityBot, Google-Extended — that fetch pages to train and ground generative engines.
Overview
Each major AI vendor runs its own crawler with distinct user agents and policies. GPTBot (OpenAI) and Google-Extended (Google) gate training data; OAI-SearchBot and PerplexityBot fetch live pages for retrieval-augmented answers; ClaudeBot covers both. Google-Extended is a robots.txt token honored by Google's existing crawlers rather than a separate bot. Blocking one and allowing another is normal, but blocking all of them removes your pages from most of the AI answer surface, at least for crawlers that honor robots.txt.
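A selective policy like this can be checked before it is deployed. The sketch below is illustrative only: the robots.txt content, the `allowed_agents` helper, and the example.com URL are assumptions made for the example, not a recommended policy. It uses Python's standard `urllib.robotparser` to show which agents a hypothetical policy would block.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block training crawlers, allow retrieval crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
"""

AGENTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def allowed_agents(robots_txt, agents, url):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(ROBOTS_TXT, AGENTS, "https://example.com/docs/page")
for agent, ok in result.items():
    print(f"{agent}: {'allowed' if ok else 'blocked'}")

# An agent with no matching group (ClaudeBot here) is allowed by default,
# and an empty robots.txt allows every agent.
default_result = allowed_agents("", AGENTS, "https://example.com/docs/page")
```

Under this assumed policy GPTBot and Google-Extended are blocked and the other three agents are allowed. A parser check only shows what the file says; whether a given crawler honors it is up to the vendor.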
Components
- GPTBot
OpenAI training crawler
- OAI-SearchBot
OpenAI retrieval crawler
- ClaudeBot
Anthropic crawler
- PerplexityBot
Perplexity retrieval crawler
- Google-Extended
Gemini training opt-out token (robots.txt only)
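The component list can be kept as a small lookup table for tagging incoming requests. This is a minimal sketch: the `CrawlerInfo` type, the `identify_crawler` helper, and the sample header value are hypothetical names introduced for illustration, and the purposes simply mirror the list above. Google-Extended is left out because it is a robots.txt token and does not appear in request headers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CrawlerInfo:
    vendor: str
    purpose: str  # "training", "retrieval", or "both"


# Purposes follow the component list; vendors can change them at any time.
AI_CRAWLERS = {
    "GPTBot": CrawlerInfo("OpenAI", "training"),
    "OAI-SearchBot": CrawlerInfo("OpenAI", "retrieval"),
    "ClaudeBot": CrawlerInfo("Anthropic", "both"),
    "PerplexityBot": CrawlerInfo("Perplexity", "retrieval"),
}


def identify_crawler(user_agent: str) -> Optional[Tuple[str, CrawlerInfo]]:
    """Return (token, info) if the User-Agent header names a known AI crawler."""
    ua = user_agent.lower()
    for token, info in AI_CRAWLERS.items():
        if token.lower() in ua:
            return token, info
    return None


# Illustrative header value, not copied from any vendor's documentation.
sample = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://example.com/bot)"
match = identify_crawler(sample)
if match:
    token, info = match
    print(f"{token}: {info.vendor}, {info.purpose}")
```

Substring matching is deliberately loose; User-Agent headers can be spoofed, so a match identifies a claimed crawler, not a verified one.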
Related entities
- robots.txt — control surface
- User-agent — identification
Key facts
- Crawling is allowed by default (an opt-out model): to block a crawler, you must add an explicit Disallow rule for its user agent in robots.txt.
- Training crawlers and retrieval crawlers are usually distinct user agents.
- Filtering server logs for AI user agents shows how often each crawler actually fetches your pages, usually within a few days of traffic.
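The log filtering mentioned in the last point can be sketched in a few lines. The example assumes access logs in the common "combined" format, where the user agent is the last quoted field; the `count_ai_hits` helper and the sample log lines are made up for illustration and are not real traffic.

```python
import re
from collections import Counter

AI_TOKENS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot")

# Assumption: combined log format, user agent is the final quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_ai_hits(log_lines):
    """Count requests per AI crawler token found in the User-Agent field."""
    hits = Counter()
    for line in log_lines:
        match = UA_FIELD.search(line)
        if not match:
            continue  # line does not end with a quoted field
        ua = match.group(1).lower()
        for token in AI_TOKENS:
            if token.lower() in ua:
                hits[token] += 1
                break
    return hits


# Fabricated sample lines (documentation IP range), for demonstration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Mar/2025:10:00:00 +0000] "GET /docs HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [10/Mar/2025:10:00:02 +0000] "GET /faq HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [10/Mar/2025:10:01:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.20 - - [10/Mar/2025:10:02:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"',
]

hits = count_ai_hits(SAMPLE_LOG)
for token, count in hits.most_common():
    print(f"{token}: {count}")
```

In practice the lines would come from the server's own access log file, and the token list would be extended to whichever agents matter for the site.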