🤖 Robots.txt Analyzer
Analyze your robots.txt file, check which bots are blocked, and get code to block AI crawlers from training on your content.
🛡️ Block AI Crawlers from Your Site
AI companies like OpenAI, Anthropic, Google, and others crawl websites to collect training data for their models. If you don't want your content used to train AI, you can add rules to your robots.txt file to block these crawlers.
Note: robots.txt is a voluntary standard. While reputable AI companies generally respect it, there's no technical enforcement, and User-Agent strings can be spoofed. For stronger protection, consider server-side measures such as User-Agent filtering (sketched below), rate limiting, or authentication.
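Because compliance is voluntary, real enforcement has to happen on your server. As one illustration (not the only approach), here is a minimal sketch that rejects requests whose User-Agent matches a blocklist. The choice of Flask and the short bot list are assumptions for the example; the same idea works in nginx, Apache, or any middleware layer:

```python
# Minimal sketch: reject known AI crawlers by User-Agent.
# Caveat: User-Agent strings are self-reported and trivially spoofed,
# so treat this as a speed bump, not real security.
from flask import Flask, abort, request

app = Flask(__name__)

# Illustrative blocklist; extend it with the bots listed below.
BLOCKED_AGENTS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider", "PerplexityBot")

@app.before_request
def block_ai_crawlers():
    user_agent = request.headers.get("User-Agent", "")
    if any(bot.lower() in user_agent.lower() for bot in BLOCKED_AGENTS):
        abort(403)  # Forbidden

@app.route("/")
def index():
    return "Hello, humans!"
```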
Copy-Paste Code to Block AI Bots
Add this to your robots.txt file to block known AI training crawlers:
```
# Block AI Training Crawlers
# Add this to your robots.txt to prevent AI companies from using your content

# GPTBot (OpenAI)
User-agent: GPTBot
Disallow: /

# ChatGPT-User (OpenAI)
User-agent: ChatGPT-User
Disallow: /

# OAI-SearchBot (OpenAI)
User-agent: OAI-SearchBot
Disallow: /

# ClaudeBot (Anthropic)
User-agent: ClaudeBot
Disallow: /

# Claude-Web (Anthropic)
User-agent: Claude-Web
Disallow: /

# Anthropic-AI (Anthropic)
User-agent: anthropic-ai
Disallow: /

# Google-Extended (Google)
User-agent: Google-Extended
Disallow: /

# CCBot (Common Crawl)
User-agent: CCBot
Disallow: /

# PerplexityBot (Perplexity)
User-agent: PerplexityBot
Disallow: /

# Bytespider (ByteDance)
User-agent: Bytespider
Disallow: /

# Diffbot (Diffbot)
User-agent: Diffbot
Disallow: /

# FacebookBot (Meta)
User-agent: FacebookBot
Disallow: /

# Meta-ExternalAgent (Meta)
User-agent: meta-externalagent
Disallow: /

# Cohere-AI (Cohere)
User-agent: cohere-ai
Disallow: /

# Omgilibot (Webz.io)
User-agent: Omgilibot
Disallow: /

# YouBot (You.com)
User-agent: YouBot
Disallow: /

# Applebot-Extended (Apple)
User-agent: Applebot-Extended
Disallow: /

# ImagesiftBot (Hive)
User-agent: ImagesiftBot
Disallow: /

# img2dataset (LAION)
User-agent: img2dataset
Disallow: /
```
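Once deployed, it's worth verifying that the file parses the way you expect. Here is a quick sketch using Python's standard-library urllib.robotparser; the site URL is a placeholder and the bot list is just a sample of the agents above:

```python
# Quick check: does your deployed robots.txt actually disallow each AI bot?
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "Google-Extended",
           "CCBot", "PerplexityBot", "Bytespider"]

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder: use your own site
rp.read()  # fetches and parses the live file

for bot in AI_BOTS:
    allowed = rp.can_fetch(bot, "https://example.com/")
    print(f"{bot}: {'ALLOWED - check your rules' if allowed else 'blocked'}")
```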
What Each Bot Does
| Bot | Company | Purpose |
|---|---|---|
| GPTBot | OpenAI | Collects data for training GPT models |
| ChatGPT-User | OpenAI | Fetches pages for ChatGPT's web browsing feature |
| ClaudeBot | Anthropic | Collects data for training Claude |
| Google-Extended | Google | Controls use of content as Gemini (formerly Bard) training data, separate from Search crawling |
| CCBot | Common Crawl | Builds the open Common Crawl dataset used by many AI companies |
| PerplexityBot | Perplexity | Crawls for the Perplexity AI search engine |
| Bytespider | ByteDance | AI training crawler from TikTok's parent company |
Understanding robots.txt
The robots.txt file is a plain text file placed in your website's root directory that tells web crawlers which pages they can and cannot access. It uses a simple syntax with directives like:
- User-agent: Specifies which bot the rules apply to (use * for all bots)
- Disallow: Paths that should not be crawled
- Allow: Exceptions to disallow rules
- Sitemap: Location of your XML sitemap
- Crawl-delay: Seconds between requests (not universally supported)
Example robots.txt
```
# Allow all bots to crawl everything except admin areas
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /

# Point to sitemap
Sitemap: https://example.com/sitemap.xml
```
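To see how a compliant crawler interprets these directives, you can feed the example above to Python's standard-library urllib.robotparser and query it. The snippet below is only an illustration; the bot name "SomeBot" is a placeholder:

```python
# Parse the example robots.txt and query it the way a well-behaved
# crawler would.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Allow all bots to crawl everything except admin areas
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /

# Point to sitemap
Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("SomeBot", "https://example.com/blog/post"))   # True
print(rp.can_fetch("SomeBot", "https://example.com/admin/login"))  # False
print(rp.site_maps())  # ['https://example.com/sitemap.xml'] (Python 3.8+)
```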