
Robots.txt Guide — Control How Search Engines Crawl Your Site

Learn how robots.txt works, how to write rules for Googlebot, how to block pages from crawling, and common robots.txt mistakes that hurt your SEO.

The robots.txt file is a plain text file at the root of your website that tells search engine crawlers which pages or sections they're allowed (or not allowed) to crawl. It's one of the first files Googlebot checks when visiting your site — and mistakes here can make your entire site invisible.

How Robots.txt Works

When a search engine bot visits https://yoursite.com/robots.txt, it reads the rules before crawling any other pages. The file uses a simple syntax with User-agent (which bot the rule applies to), Disallow (paths to block), and Allow (paths to explicitly permit) directives.
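This check can be sketched with Python's built-in urllib.robotparser, using a hypothetical rule set (yoursite.com and the paths are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration -- the same syntax a real
# robots.txt file at https://yoursite.com/robots.txt would use.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/

Sitemap: https://yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A well-behaved crawler checks each URL against the rules before fetching it.
print(parser.can_fetch("Googlebot", "https://yoursite.com/blog/post"))   # True
print(parser.can_fetch("Googlebot", "https://yoursite.com/admin/login")) # False
```

Any path not matched by a Disallow rule is crawlable by default, which is why the blog URL passes without an explicit Allow.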

Basic Robots.txt Syntax

# Allow all bots to crawl everything
User-agent: *
Allow: /

# Block a specific directory
User-agent: *
Disallow: /admin/
Disallow: /private/

# Block a specific bot
User-agent: BadBot
Disallow: /

# Point to your sitemap
Sitemap: https://yoursite.com/sitemap.xml

Common Robots.txt Mistakes That Kill SEO

Blocking CSS/JS Files

Blocking your CSS or JavaScript files prevents Google from rendering your pages properly. Google needs to render pages to evaluate content and UX — blocked resources cause indexing issues.
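One common fix is to explicitly allow rendering resources. This is a sketch: the * and $ wildcards are supported by Googlebot and most major crawlers, though they are not part of the original robots.txt standard.

```text
# Keep rendering resources crawlable, even under blocked directories
User-agent: Googlebot
Allow: /*.css$
Allow: /*.js$
```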

Accidentally Blocking the Entire Site

A Disallow: / rule under User-agent: * blocks ALL crawlers from ALL pages. This mistake is surprisingly common after staging-to-production migrations: the staging site's blanket block ships to production unchanged. One misplaced slash can deindex your entire site.
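A cheap safeguard is a deploy-time sanity check. The sketch below (function name and placeholder URL are illustrative) fails fast when a robots.txt would block the homepage for every crawler:

```python
from urllib.robotparser import RobotFileParser

def site_fully_blocked(robots_txt: str, homepage: str = "https://yoursite.com/") -> bool:
    """Return True if these rules block the homepage for all crawlers."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("*", homepage)

# A staging file that should never reach production:
staging = "User-agent: *\nDisallow: /\n"
print(site_fully_blocked(staging))                      # True -- abort the deploy
print(site_fully_blocked("User-agent: *\nAllow: /\n"))  # False -- safe to ship
```

Wiring a check like this into a CI pipeline catches the misplaced slash before Google does.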

Using Robots.txt for Privacy

Robots.txt is a suggestion, not a security measure. Malicious bots ignore it. For truly private content, use authentication or noindex meta tags. Disallowed URLs can still appear in search results if other sites link to them.

Blocking Crawling Doesn't Block Indexing

If you disallow a URL in robots.txt but other sites link to it, Google may still INDEX the URL (showing it in search results with "No information is available for this page"). To prevent indexing, use a noindex meta tag instead, and leave the page crawlable in robots.txt: Google can only see the tag on pages it is allowed to fetch.
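The noindex directive lives in the page itself, not in robots.txt. For HTML pages it is a meta tag in the head:

```html
<!-- The page may still be crawled, but it won't be indexed -->
<meta name="robots" content="noindex">
```

For non-HTML files such as PDFs, the equivalent is the X-Robots-Tag: noindex HTTP response header.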

Robots.txt for AI Crawlers

In 2026, AI companies send their own crawlers to train models. You can control access for specific AI bots:

# Allow Google's AI crawler (for AI Overviews)
User-agent: Google-Extended
Allow: /

# Block OpenAI's crawler (optional)
User-agent: GPTBot
Disallow: /

# Block Common Crawl (used for AI training)
User-agent: CCBot
Disallow: /

Important consideration: Blocking AI crawlers means your content won't be included in AI training data or cited by AI chatbots. For businesses wanting AEO visibility, you should allow AI crawlers while using your content structure to control how AI models represent you.
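The AI-crawler rules above can be verified the same way a compliant bot would read them, again using Python's stdlib parser as a sketch:

```python
from urllib.robotparser import RobotFileParser

# The AI-crawler rules from the example above.
rules = """\
User-agent: Google-Extended
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Google-Extended", "https://yoursite.com/article"))  # True
print(parser.can_fetch("GPTBot", "https://yoursite.com/article"))           # False
print(parser.can_fetch("CCBot", "https://yoursite.com/article"))            # False
```

Note that compliance is voluntary: these rules only restrict bots that choose to honor them.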

Best Practices for Robots.txt

Always include a Sitemap: directive pointing to your XML sitemap
Test changes in Google Search Console before deploying
Block admin panels, internal search results, and duplicate paths
Keep the file under 500 KiB (Google ignores anything past that size limit)
Use Allow: to override broader Disallow rules
Audit quarterly — especially after migrations or redesigns
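The Allow-overrides-Disallow pattern from the list above can also be sanity-checked in Python. One caveat, noted in the comments: Google resolves conflicts by picking the most specific (longest) matching rule, while Python's parser uses the first matching rule, so the Allow line goes first in this sketch:

```python
from urllib.robotparser import RobotFileParser

# Allow a public subfolder inside an otherwise blocked directory.
# Google picks the most specific (longest) matching rule; Python's
# parser uses the first matching rule, so Allow is listed first here.
rules = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("*", "https://yoursite.com/admin/public/docs"))  # True
print(parser.can_fetch("*", "https://yoursite.com/admin/settings"))     # False
```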

SAGIScan Tip: Our SEO Scanner checks your robots.txt for common mistakes, and our AI Content Analyzer evaluates whether your AI crawler directives align with your AEO strategy.

Frequently Asked Questions

Where should my robots.txt file be located?

Always at the root of your domain: https://yoursite.com/robots.txt. Subdirectory robots.txt files (like /blog/robots.txt) are ignored by search engines. Each subdomain needs its own robots.txt.

Does robots.txt affect my SEO rankings?

Indirectly, yes. A properly configured robots.txt ensures Google crawls your important pages efficiently and doesn't waste crawl budget on low-value URLs. Misconfigured robots.txt can accidentally block important content, destroying your rankings overnight.

Should I block AI crawlers like GPTBot?

It depends on your goals. If you want AI chatbots (ChatGPT, Perplexity) to cite your content (AEO), allow GPTBot and other AI crawlers. If you're concerned about AI training on your content without compensation, block them. There's a real trade-off between AI visibility and content protection.
