What is robots.txt used for?

robots.txt is a text file at the root of your domain that tells search engine crawlers which parts of your site they may or may not crawl. It's used to manage crawl load by keeping bots out of admin areas, internal search, infinite parameter URLs, and other low-value paths, and to declare the location of your XML sitemap. It controls crawling, not indexing, and most robots.txt mistakes come from confusing the two.

Does robots.txt stop a page from being indexed?

No. Disallowing a URL in robots.txt stops crawlers from fetching it, but it does not remove the URL from the index — Google can still index a disallowed URL it knows about from links, showing it without a description. Worse, because the page can't be crawled, Google can't see a noindex tag on it. To keep a page out of search, allow it to be crawled and add a noindex tag; use robots.txt only to manage crawling.

What is the most common robots.txt mistake?

The most damaging mistake is an over-broad Disallow rule that blocks more than intended — for example Disallow: / left over from a staging environment, which blocks the entire site, or a broad path rule that unintentionally covers important pages. Other common errors are blocking CSS or JavaScript that Google needs to render the page, and forgetting to declare the sitemap. Always test rules against real URLs before deploying.


robots.txt for SEO

robots.txt controls what search engines crawl — and a single over-broad rule can hide your whole site. Here's how it actually behaves, why blocking isn't the same as deindexing, and how not to block your own pages.


What robots.txt is for

It's a crawl-management tool, not an indexing tool. That one distinction explains almost every robots.txt mistake.

robots.txt is a plain text file at yourdomain.com/robots.txt that crawlers read before crawling. With User-agent and Disallow / Allow rules, you tell bots which paths to skip — typically admin areas, internal search, login pages, and infinite parameter URLs that would waste crawl budget. You also use it to declare your sitemap. What robots.txt does not do is remove pages from the index — and conflating the two is where sites get into trouble.
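To make the rule types above concrete, here is a minimal sketch that parses a small robots.txt with Python's standard-library urllib.robotparser and asks which paths a crawler may fetch. The file contents and the paths are illustrative assumptions, not a recommended configuration. Note that the standard-library parser does simple prefix matching in file order and does not understand wildcards, so it only approximates how Google evaluates rules. site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
Disallow: /login

Sitemap: https://yourdomain.com/sitemap.xml
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Ask the parser the same question a crawler asks before fetching a URL.
for path in ("/", "/blog/robots-guide", "/admin/users", "/search"):
    allowed = rules.can_fetch("Googlebot", "https://yourdomain.com" + path)
    print(path, "crawlable" if allowed else "blocked")

# Sitemap lines are collected separately from the crawl rules.
print(rules.site_maps())
```

The first two paths come back crawlable and the last two blocked. Nothing here says anything about indexing: the parser only answers whether a fetch is permitted.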

Blocking crawling vs blocking indexing

The most important thing to understand about robots.txt.

Disallow stops crawling, not indexing

If you Disallow a URL, crawlers won't fetch its content — but Google can still index the URL if it's linked from elsewhere, showing it in results without a description ("No information is available for this page"). So disallowing a page you wanted hidden can leave a bare, description-less listing in search.

The noindex trap

Here's the trap that catches people: if you want a page out of search, you might add a noindex tag and also Disallow it in robots.txt to be thorough. But because robots.txt blocks crawling, Google never fetches the page, so it never sees the noindex, and the page can stay indexed. To deindex a page, you must allow crawling so Google can read the noindex. Disallow and noindex work against each other.
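The conflict described above can be checked mechanically. The sketch below is one possible approach, assuming you already have the robots.txt text and the page's HTML in hand; the names RobotsMetaFinder and noindex_is_hidden are hypothetical, and only the meta robots tag is inspected (not the X-Robots-Tag header).

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def noindex_is_hidden(robots_txt, url, html, agent="Googlebot"):
    """Return True when the page carries noindex but robots.txt blocks the crawl,
    so the crawler can never read the directive."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives and not rules.can_fetch(agent, url)


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
blocked = "User-agent: *\nDisallow: /old-offer\n"
open_to_crawl = "User-agent: *\nDisallow:\n"

print(noindex_is_hidden(blocked, "https://yourdomain.com/old-offer", page))
print(noindex_is_hidden(open_to_crawl, "https://yourdomain.com/old-offer", page))
```

The first call reports the trap (noindex present, crawl blocked); the second does not, because the crawler is allowed to fetch the page and read the tag.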

The mistakes that block your own site

Real robots.txt checks the audit runs, and how each goes wrong.

Disallow: / from staging. Staging sites often block all crawlers. If that file ships to production, your entire site becomes uncrawlable. This is the robots.txt equivalent of the stray noindex.

Blocking indexable pages. An over-broad path rule can cover pages you want ranked. The audit flags robots.txt blocking indexable pages.

Blocking the sitemap. A rule that disallows your sitemap path stops crawlers reading it.

No sitemap declared. robots.txt should include a Sitemap: line pointing to your XML sitemap.

Blocking CSS/JS. Google renders pages; blocking the assets it needs to render can hurt how it sees the page.

Missing robots.txt entirely isn't fatal, but a present, correct file is best practice, and the audit flags a missing robots.txt too.
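The checks listed above can be approximated with a small pre-deploy lint. This is a rough sketch, not how the audit itself is implemented: lint_robots is a hypothetical helper, the must_crawl list (pages you want ranked plus the CSS/JS they need) is something you supply, and the standard-library parser uses plain prefix matching without wildcard support, so results may differ from Google's evaluation for complex files.

```python
from urllib.robotparser import RobotFileParser


def lint_robots(robots_txt, must_crawl, agent="Googlebot"):
    """Return a list of human-readable problems found in robots_txt.

    must_crawl: URLs or paths that should stay crawlable, such as
    key landing pages and the CSS/JS assets needed to render them.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    problems = []

    # A blocked root usually means a leftover "Disallow: /".
    if not rules.can_fetch(agent, "/"):
        problems.append("site root is blocked (possible leftover 'Disallow: /')")

    # site_maps() returns None when no Sitemap: line exists (Python 3.8+).
    sitemaps = rules.site_maps() or []
    if not sitemaps:
        problems.append("no Sitemap: line declared")
    for sitemap in sitemaps:
        if not rules.can_fetch(agent, sitemap):
            problems.append(f"sitemap is blocked: {sitemap}")

    # Over-broad rules show up as blocked pages or assets you care about.
    for url in must_crawl:
        if not rules.can_fetch(agent, url):
            problems.append(f"blocked but should be crawlable: {url}")

    return problems


staging_file = "User-agent: *\nDisallow: /\n"
for problem in lint_robots(staging_file, ["/pricing", "/assets/app.css"]):
    print(problem)
```

Run against the staging file, this reports the blocked root, the missing sitemap, and both blocked URLs. Because rules match by prefix, a short rule such as Disallow: /p would also flag /pricing, which is the over-broad case described above.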


robots.txt FAQ

Where does robots.txt go?

At the root of the domain: https://yourdomain.com/robots.txt. It only applies to the host it's served from, so subdomains need their own. It must return a 200 status and be plain text.
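Because the file is per host, the robots.txt that governs a page can be derived from the page's own URL. A minimal sketch, with robots_url as a hypothetical helper name and shop.yourdomain.com as a made-up subdomain:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL that applies to page_url (same scheme and host)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://yourdomain.com/blog/post?ref=home"))
# → https://yourdomain.com/robots.txt
print(robots_url("https://shop.yourdomain.com/cart"))
# → https://shop.yourdomain.com/robots.txt
```

The subdomain resolves to its own file, so rules at yourdomain.com/robots.txt have no effect on shop.yourdomain.com.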

How do I block a page from Google properly?

Allow it to be crawled and add a noindex meta tag or X-Robots-Tag header. For sensitive content, use authentication, not robots.txt — disallowed URLs are public and can still be discovered. robots.txt is for crawl management, not security or guaranteed removal.
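For pages where editing the HTML is inconvenient, the header route can be sketched as a small WSGI middleware that adds X-Robots-Tag: noindex to responses for chosen paths. This assumes a Python WSGI application; with_noindex, demo_app, and the /thank-you path are hypothetical names for illustration, and most web servers and frameworks offer their own way to set the same header.

```python
def with_noindex(app, noindex_prefixes):
    """Wrap a WSGI app so responses under the given path prefixes
    carry an X-Robots-Tag: noindex header."""
    prefixes = tuple(noindex_prefixes)

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def tagged_start(status, headers, exc_info=None):
            # Append the header only for matching paths; leave others untouched.
            if path.startswith(prefixes):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, tagged_start)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in application that always returns a tiny HTML page."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<p>Thanks for signing up.</p>"]


# The path must remain crawlable in robots.txt, or the header is never seen.
application = with_noindex(demo_app, ["/thank-you"])
```

The wrapped application can be served by any WSGI server. The important part is the pairing: the path stays crawlable, and the response itself tells search engines not to index it.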

Should I declare my sitemap in robots.txt?

Yes. Add a line like Sitemap: https://yourdomain.com/sitemap.xml. It's the simplest way to make sure every crawler can find your sitemap, and the audit flags a robots.txt that doesn't declare one.
