Robots.txt Checker

Enter any URL to fetch and analyze its robots.txt file. Check for blocked pages, missing sitemaps, and common configuration mistakes.

What is robots.txt?

Robots.txt is a plain text file that lives at the root of your website (e.g., example.com/robots.txt). It tells search engine crawlers which pages or sections of your site they are allowed or not allowed to access. Every major search engine, including Google, Bing, and others, checks this file before crawling your site. While robots.txt is not a security mechanism (it does not prevent access, only requests that crawlers respect the rules), it plays a critical role in managing how search engines discover and index your content.
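To make this concrete, here is a minimal robots.txt sketch; the domain and paths are placeholders, not recommendations for any particular site:

```txt
# Applies to all crawlers
User-agent: *
# Ask crawlers to skip the admin area
Disallow: /admin/
# Everything not disallowed is crawlable by default

# Declare the sitemap so crawlers can find it
Sitemap: https://example.com/sitemap.xml
```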

How robots.txt affects SEO

Your robots.txt file directly controls which parts of your site search engines can crawl. If important pages are accidentally blocked, they will not appear in search results at all. Conversely, allowing crawlers access to low-value pages (like admin panels or duplicate content) wastes your crawl budget and can dilute your site's overall quality signals. A well-configured robots.txt ensures search engines spend their time on the pages that matter most, helping your best content get indexed faster and rank higher. It is also the standard place to declare your sitemap URL, which helps search engines discover all your pages efficiently. Also verify that your canonical tags are set correctly; they work together with robots.txt to control how Google indexes your site.
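Because the Sitemap: directive lives in robots.txt, it can be read programmatically. The sketch below uses Python's standard urllib.robotparser; the robots.txt content is a hypothetical example, not fetched from a real site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
robots_txt = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() (Python 3.8+) returns the declared sitemap URLs, or None.
sitemaps = parser.site_maps()
print(sitemaps)  # ['https://example.com/sitemap.xml']
```

In practice you would fetch the live file (e.g. with parser.set_url(...) and parser.read()) instead of parsing a string, but the string form keeps the example self-contained.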

Common robots.txt mistakes

  • Blocking your entire site. A single Disallow: / line under User-agent: * prevents all search engines from crawling any page. This is sometimes left over from staging environments and can deindex your whole site.
  • No sitemap reference. Failing to include a Sitemap: directive means search engines must discover your sitemap through other means, which can delay indexing of new or updated pages.
  • Blocking CSS and JavaScript. Google needs to render your pages to understand them. Blocking CSS or JS files in robots.txt prevents Googlebot from seeing your pages the way users do, which can hurt rankings.
  • Using robots.txt instead of noindex. Robots.txt prevents crawling, but it does not remove pages from search results. If a blocked page has external links pointing to it, Google may still index the URL (just without crawling its content). Use a noindex meta tag to truly remove pages from results.
  • Syntax errors. Robots.txt is case-sensitive for paths and requires precise formatting. A typo in a Disallow rule can leave pages exposed or accidentally block important content.
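Several of the mistakes above can be caught by testing your rules before shipping them. The sketch below uses Python's standard urllib.robotparser against a hypothetical rule set, and also illustrates the case-sensitivity pitfall from the last bullet:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; note the capitalized /Admin/ path.
rules = """\
User-agent: *
Disallow: /Admin/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Paths are case-sensitive: /Admin/ is blocked, /admin/ is not.
blocked = parser.can_fetch("*", "https://example.com/Admin/users")
allowed = parser.can_fetch("*", "https://example.com/admin/users")
print(blocked)  # False (crawling disallowed)
print(allowed)  # True  (the lowercase path matches no rule)
```

A quick check like this, run against the pages you care about most, is an easy way to catch an accidental Disallow: / or a typo before it deindexes content.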

Want to fix these automatically?

GSCPilot connects your Google Search Console and GitHub to find SEO issues, generate AI-powered fixes, and ship them via pull request. No manual work needed.