Part of Generative Engine Optimization

AI Crawlers

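AI crawlers are the bots, such as GPTBot, ClaudeBot, and PerplexityBot, that fetch pages to train and ground generative engines. Google-Extended is usually listed alongside them, although it is a robots.txt control token rather than a separate crawler.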

Overview

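Each major AI vendor runs its own crawler, with its own user agents and policies. GPTBot (OpenAI) collects pages for model training, and Google-Extended (Google) is a robots.txt token that controls whether content fetched by Google's existing crawlers may be used for its generative models. OAI-SearchBot and PerplexityBot fetch pages for search and retrieval-augmented answers. Anthropic's documentation describes ClaudeBot as a training crawler and lists separate agents for search and user-initiated fetches. Blocking one crawler and allowing another is normal, but blocking all of them largely removes your site from the AI answer surface.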
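A selective policy of this kind can be checked before it is deployed. The sketch below is illustrative only: the robots.txt content, the helper name check_policy, and the example.com URL are assumptions, not a recommended configuration. It uses Python's standard urllib.robotparser to confirm which user agents a policy admits.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block the training crawler, allow two retrieval crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
"""


def check_policy(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each user agent, whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot"]
    for agent, ok in check_policy(ROBOTS_TXT, agents, "https://example.com/article").items():
        print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

ClaudeBot has no rule in this sample policy, so the parser reports it as allowed, which matches the opt-out default. Note that robots.txt is advisory: this check shows what a compliant crawler should do, not what any given bot actually does.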

Key facts

  • Crawling is opt-out by default: a crawler is allowed unless robots.txt explicitly Disallows its user agent.
  • Training crawlers and retrieval crawlers are usually distinct user agents.
  • Server logs filtered for AI user agents reveal real ingestion volume within days.
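The log filtering mentioned in the last point can be sketched as follows. This is a minimal example that assumes access logs in the common "combined" format (user agent as the final quoted field). The agent list, the function name count_ai_hits, and the sample log lines are all hypothetical illustrations, not real traffic. Google-Extended is left out because it is a robots.txt token and does not appear as a user-agent string in requests.

```python
import re
from collections import Counter

# Assumed list of AI user-agent substrings to look for; extend as needed.
AI_AGENTS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot")

# Combined log format: [day:time zone] "request" status bytes "referer" "user agent"
LINE_RE = re.compile(
    r'\[(?P<day>[^:\]]+):[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def count_ai_hits(lines):
    """Count requests per (day, AI agent) from combined-format log lines."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        user_agent = match.group("ua").lower()
        for agent in AI_AGENTS:
            if agent.lower() in user_agent:
                hits[(match.group("day"), agent)] += 1
                break
    return hits


# Hypothetical sample lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Mar/2025:08:15:02 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '203.0.113.5 - - [10/Mar/2025:08:15:09 +0000] "GET /docs HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '198.51.100.7 - - [11/Mar/2025:12:00:41 +0000] "GET /blog HTTP/1.1" 200 7300 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '192.0.2.44 - - [11/Mar/2025:12:03:10 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/124.0"',
]

if __name__ == "__main__":
    for (day, agent), count in sorted(count_ai_hits(SAMPLE_LOG).items()):
        print(day, agent, count)
```

User-agent strings can be spoofed, so counts like these are an estimate. Vendors that publish crawler IP ranges allow a stricter check if exact figures matter.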