🤖 Robots.txt Analyzer
Analyze your robots.txt file, check which bots are blocked, and get code to block AI crawlers from training on your content.
🛡️ Block AI Crawlers from Your Site
AI companies like OpenAI, Anthropic, Google, and others crawl websites to collect training data for their models. If you don't want your content used to train AI, you can add rules to your robots.txt file to block these crawlers.
Note: robots.txt is a voluntary standard. While reputable AI companies generally respect it, there's no technical enforcement, and User-Agent strings can be spoofed. For stronger protection, consider server-side measures such as User-Agent filtering (sketched below), rate limiting, or authentication.
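Because compliance is voluntary, real enforcement has to happen on your server. As one illustration (not the only approach), here is a minimal sketch that rejects requests whose User-Agent matches a blocklist. The choice of Flask and the short bot list are assumptions for the example; the same idea works in nginx, Apache, or any middleware layer:

```python
# Minimal sketch: reject known AI crawlers by User-Agent.
# Caveat: User-Agent strings are self-reported and trivially spoofed,
# so treat this as a speed bump, not real security.
from flask import Flask, abort, request

app = Flask(__name__)

# Illustrative blocklist; extend it with the bots listed below.
BLOCKED_AGENTS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider", "PerplexityBot")

@app.before_request
def block_ai_crawlers():
    user_agent = request.headers.get("User-Agent", "")
    if any(bot.lower() in user_agent.lower() for bot in BLOCKED_AGENTS):
        abort(403)  # Forbidden

@app.route("/")
def index():
    return "Hello, humans!"
```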
Copy-Paste Code to Block AI Bots
Add this to your robots.txt file to block known AI training crawlers:
```
# Block AI Training Crawlers
# Add this to your robots.txt to prevent AI companies from using your content

# GPTBot (OpenAI)
User-agent: GPTBot
Disallow: /

# ChatGPT-User (OpenAI)
User-agent: ChatGPT-User
Disallow: /

# OAI-SearchBot (OpenAI)
User-agent: OAI-SearchBot
Disallow: /

# ClaudeBot (Anthropic)
User-agent: ClaudeBot
Disallow: /

# Claude-Web (Anthropic)
User-agent: Claude-Web
Disallow: /

# Anthropic-AI (Anthropic)
User-agent: anthropic-ai
Disallow: /

# Google-Extended (Google)
User-agent: Google-Extended
Disallow: /

# CCBot (Common Crawl)
User-agent: CCBot
Disallow: /

# PerplexityBot (Perplexity)
User-agent: PerplexityBot
Disallow: /

# Bytespider (ByteDance)
User-agent: Bytespider
Disallow: /

# Diffbot (Diffbot)
User-agent: Diffbot
Disallow: /

# FacebookBot (Meta)
User-agent: FacebookBot
Disallow: /

# Meta-ExternalAgent (Meta)
User-agent: meta-externalagent
Disallow: /

# Cohere-AI (Cohere)
User-agent: cohere-ai
Disallow: /

# Omgilibot (Webz.io)
User-agent: Omgilibot
Disallow: /

# YouBot (You.com)
User-agent: YouBot
Disallow: /

# Applebot-Extended (Apple)
User-agent: Applebot-Extended
Disallow: /

# ImagesiftBot (Hive)
User-agent: ImagesiftBot
Disallow: /

# img2dataset (LAION)
User-agent: img2dataset
Disallow: /
```
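Once deployed, it's worth verifying that the file parses the way you expect. Here is a quick sketch using Python's standard-library urllib.robotparser; the site URL is a placeholder and the bot list is just a sample of the agents above:

```python
# Quick check: does your deployed robots.txt actually disallow each AI bot?
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "Google-Extended",
           "CCBot", "PerplexityBot", "Bytespider"]

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder: use your own site
rp.read()  # fetches and parses the live file

for bot in AI_BOTS:
    allowed = rp.can_fetch(bot, "https://example.com/")
    print(f"{bot}: {'ALLOWED - check your rules' if allowed else 'blocked'}")
```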
What Each Bot Does
| Bot | Company | Purpose |
|---|---|---|
| GPTBot | OpenAI | Collects data for training GPT models |
| ChatGPT-User | OpenAI | Fetches pages for ChatGPT's web browsing feature |
| ClaudeBot | Anthropic | Collects data for training Claude |
| Google-Extended | Google | Controls use of content as Gemini (formerly Bard) training data, separate from Search crawling |
| CCBot | Common Crawl | Builds the open Common Crawl dataset used by many AI companies |
| PerplexityBot | Perplexity | Crawls for the Perplexity AI search engine |
| Bytespider | ByteDance | AI training crawler from TikTok's parent company |
Understanding robots.txt
The robots.txt file is a plain text file placed in your website's root directory that tells web crawlers which pages they can and cannot access. It uses a simple syntax with directives like:
- User-agent: Specifies which bot the rules apply to (use * for all bots)
- Disallow: Paths that should not be crawled
- Allow: Exceptions to disallow rules
- Sitemap: Location of your XML sitemap
- Crawl-delay: Seconds between requests (not universally supported)
Example robots.txt
```
# Allow all bots to crawl everything except admin areas
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /

# Point to sitemap
Sitemap: https://example.com/sitemap.xml
```
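To see how a compliant crawler interprets these directives, you can feed the example above to Python's standard-library urllib.robotparser and query it. The snippet below is only an illustration; the bot name "SomeBot" is a placeholder:

```python
# Parse the example robots.txt and query it the way a well-behaved
# crawler would.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Allow all bots to crawl everything except admin areas
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /

# Point to sitemap
Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("SomeBot", "https://example.com/blog/post"))   # True
print(rp.can_fetch("SomeBot", "https://example.com/admin/login"))  # False
print(rp.site_maps())  # ['https://example.com/sitemap.xml'] (Python 3.8+)
```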