🤖 Robots.txt Analyzer

Analyze your robots.txt file, check which bots are blocked, and get code to block AI crawlers from training on your content.

🛡️ Block AI Crawlers from Your Site

AI companies such as OpenAI, Anthropic, and Google crawl websites to collect training data for their models. If you don't want your content used to train AI, you can add rules to your robots.txt file to block these crawlers.

Note: robots.txt is a voluntary standard. While reputable AI companies generally respect it, there's no technical enforcement. For stronger protection, consider additional measures like rate limiting or authentication.
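
One way to go beyond the honor system is to filter requests by User-Agent at the server or application level. Below is a minimal, illustrative sketch written as a generic Python WSGI middleware; the class name, the bot list, and the 403 response are assumptions, the same pattern can be expressed as a reverse-proxy or CDN rule, and a determined scraper can still spoof its User-Agent header.

# Minimal sketch: deny requests whose User-Agent matches a known AI crawler.
# The agent list and the 403 response are illustrative; adapt the same idea
# to your web server, reverse proxy, or CDN instead of the application layer.
import re

BLOCKED_AGENTS = re.compile(
    r"GPTBot|ClaudeBot|Claude-Web|CCBot|PerplexityBot|Bytespider",
    re.IGNORECASE,
)

class BlockAICrawlers:
    """WSGI middleware that returns 403 Forbidden for matching user agents."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_AGENTS.search(user_agent):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return self.app(environ, start_response)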

Copy-Paste Code to Block AI Bots

Add this to your robots.txt file to block known AI training crawlers:

# Block AI Training Crawlers
# Add this to your robots.txt to prevent AI companies from using your content

# GPTBot (OpenAI)
User-agent: GPTBot
Disallow: /

# ChatGPT-User (OpenAI)
User-agent: ChatGPT-User
Disallow: /

# OAI-SearchBot (OpenAI)
User-agent: OAI-SearchBot
Disallow: /

# ClaudeBot (Anthropic)
User-agent: ClaudeBot
Disallow: /

# Claude-Web (Anthropic)
User-agent: Claude-Web
Disallow: /

# Anthropic-AI (Anthropic)
User-agent: anthropic-ai
Disallow: /

# Google-Extended (Google)
User-agent: Google-Extended
Disallow: /

# CCBot (Common Crawl)
User-agent: CCBot
Disallow: /

# PerplexityBot (Perplexity)
User-agent: PerplexityBot
Disallow: /

# Bytespider (ByteDance)
User-agent: Bytespider
Disallow: /

# Diffbot (Diffbot)
User-agent: Diffbot
Disallow: /

# FacebookBot (Meta)
User-agent: FacebookBot
Disallow: /

# Meta-ExternalAgent (Meta)
User-agent: meta-externalagent
Disallow: /

# Cohere-AI (Cohere)
User-agent: cohere-ai
Disallow: /

# Omgilibot (Webz.io)
User-agent: Omgilibot
Disallow: /

# YouBot (You.com)
User-agent: YouBot
Disallow: /

# Applebot-Extended (Apple)
User-agent: Applebot-Extended
Disallow: /

# ImagesiftBot (Hive)
User-agent: ImagesiftBot
Disallow: /

# img2dataset (LAION)
User-agent: img2dataset
Disallow: /

What Each Bot Does

Bot              Company       Purpose
GPTBot           OpenAI        Collects data for training GPT models
ChatGPT-User     OpenAI        ChatGPT's web browsing feature
ClaudeBot        Anthropic     Collects data for training Claude
Google-Extended  Google        Gemini/Bard training data (separate from search crawling)
CCBot            Common Crawl  Open dataset used by many AI companies
PerplexityBot    Perplexity    Perplexity AI search engine
Bytespider       ByteDance     AI training crawler from ByteDance (TikTok's parent company)
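
To see which of these bots your current robots.txt already blocks, you can test each user agent against your live file. Here is a minimal sketch using Python's standard-library robots.txt parser; the site URL and the bot list are placeholders.

# Minimal sketch: report whether a site's robots.txt blocks common AI
# crawlers from the site root. SITE and AI_BOTS are placeholders.
from urllib import robotparser

SITE = "https://example.com"  # replace with your site
AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "Google-Extended",
           "CCBot", "PerplexityBot", "Bytespider"]

parser = robotparser.RobotFileParser()
parser.set_url(f"{SITE}/robots.txt")
parser.read()  # fetch and parse the live robots.txt

for bot in AI_BOTS:
    status = "allowed" if parser.can_fetch(bot, f"{SITE}/") else "blocked"
    print(f"{bot}: {status}")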

Understanding robots.txt

The robots.txt file is a plain text file placed in your website's root directory that tells web crawlers which pages they can and cannot access. It uses a simple syntax with directives like:

  • User-agent: Specifies which bot the rules apply to (use * for all bots)
  • Disallow: Paths that should not be crawled
  • Allow: Exceptions to disallow rules
  • Sitemap: Location of your XML sitemap
  • Crawl-delay: Seconds between requests (not universally supported)

Example robots.txt

# Apply to all bots: block admin areas, allow everything else
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /

# Point to sitemap
Sitemap: https://example.com/sitemap.xml
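
You can check how a standards-following crawler would evaluate those rules with Python's built-in parser. A small sketch, feeding in only the Disallow group (allowing everything else is the default anyway) and testing a few illustrative paths:

# Minimal sketch: evaluate the example Disallow rules against sample paths.
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("AnyBot", "https://example.com/blog/post"))    # True
print(parser.can_fetch("AnyBot", "https://example.com/admin/users"))  # False
print(parser.can_fetch("AnyBot", "https://example.com/private/doc"))  # False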