Crawl budget is how much of your site search engines will crawl in a given window. For small sites it barely matters; for large ones, wasted crawl budget means important pages get crawled and refreshed less often. Here's what it is and how to protect it.
Two forces decide it: how much your server can handle, and how much Google wants to crawl you. Where those meet is your budget.
Search engines don't crawl every URL on the web constantly — they ration. Crawl capacity is how many requests your server can take without slowing down; a fast, reliable server earns more crawling, a slow or error-prone one earns less. Crawl demand is how much Google wants to crawl you, driven by your site's size, popularity, and how often your content actually changes. The practical budget is where capacity and demand meet — and the question that matters is whether that budget covers all the URLs worth crawling.
Be honest about which situation you're in before spending time on crawl budget.
If your site has up to a few thousand URLs, Google can crawl all of it easily and frequently. Worrying about crawl budget here is a distraction — your time is far better spent on content quality and internal links. Don't optimise a problem you don't have.
Crawl budget becomes real when you have tens of thousands of URLs, or when your site generates many low-value URLs automatically: faceted navigation, filter and sort parameters, session IDs, infinite calendars, or near-duplicate pages. On these sites, Google can spend its budget on the noise and crawl your important pages rarely — so new content is indexed slowly and updates take a long time to register.
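One way to see how much parameter noise a site generates is to collapse URL variants into a single form and count how many crawlable URLs map to each page. The sketch below is a minimal illustration, not a definitive rule set: the `LOW_VALUE_PARAMS` names and the example URLs are hypothetical, and which parameters are safe to ignore depends on your own site.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that only re-sort, filter or track a page.
# Replace with the parameters your own site actually emits.
LOW_VALUE_PARAMS = {"sort", "order", "sessionid", "color"}


def canonical_form(url: str) -> str:
    """Drop low-value parameters and sort the rest so variants compare equal."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in LOW_VALUE_PARAMS
    )
    # The fragment is dropped because crawlers do not request it.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def group_variants(urls):
    """Map each collapsed URL to the list of crawlable variants behind it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_form(url)].append(url)
    return dict(groups)


if __name__ == "__main__":
    crawled = [
        "https://example.com/shoes",
        "https://example.com/shoes?sort=price&color=red",
        "https://example.com/shoes?sessionid=abc123",
        "https://example.com/shoes?page=2",
    ]
    for page, variants in group_variants(crawled).items():
        print(page, len(variants))
```

A page with dozens of variants behind it is a likely candidate for a canonical tag or a parameter rule.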
Most crawl-budget problems are the same technical issues this guide already covers — they just compound at scale.
Broken links and 404s send crawlers to dead ends they then re-check — see why broken links hurt SEO. Redirect chains make Google follow multiple hops for one page — see redirect chains. Parameter and faceted URLs multiply near-duplicate pages; consolidate with canonicals or block low-value parameters in robots.txt. Duplicate content means crawling the same thing many times. A bloated sitemap full of non-indexable URLs points crawlers at the wrong pages. A slow server directly lowers crawl capacity. Fix these and the available budget flows to your real, indexable content — which is the entire goal.
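Redirect chains are easy to find once you have a list of redirects, for example from a crawler export. The sketch below assumes a hypothetical `redirects` mapping of source path to target path and makes no network requests. It reports every starting URL that needs more than one hop, and it stops safely on loops.

```python
def resolve_chain(start, redirects, max_hops=10):
    """Follow redirects from `start` and return every URL visited, in order."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        chain.append(target)
        if target in seen:
            # Redirect loop: record the repeated URL once, then stop.
            break
        seen.add(target)
    return chain


def find_chains(redirects):
    """Return {start: chain} for every start that takes more than one hop."""
    chains = {}
    for source in redirects:
        chain = resolve_chain(source, redirects)
        if len(chain) > 2:  # start + final URL is one hop; anything longer is a chain
            chains[source] = chain
    return chains


if __name__ == "__main__":
    # Hypothetical redirect map: source path -> target path.
    redirects = {"/a": "/b", "/b": "/c", "/old": "/new"}
    for source, chain in find_chains(redirects).items():
        print(" -> ".join(chain))
```

The usual fix is to point each flagged source straight at the last URL in its chain, so crawlers reach the page in a single hop.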
To tell whether crawl budget is a problem for you, check Search Console's Crawl Stats and the Pages report. Signs of a problem: many "Discovered – currently not indexed" or "Crawled – currently not indexed" URLs, important pages crawled infrequently, or a large gap between how many URLs you have and how many are indexed. On small sites with full indexing, crawl budget is not your bottleneck.
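If you also have server access logs, they can complement Search Console by showing which important pages Googlebot rarely requests. The sketch below is a rough illustration under several assumptions: the log lines follow the common "combined" format, the crawler is identified only by the `Googlebot` string in the user agent (which can be spoofed and is not verified here), and the sample lines and paths are made up.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user agent of a combined-format line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count requests per path whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits


def rarely_crawled(important_paths, hits, min_hits=1):
    """Return the important paths crawled fewer than `min_hits` times."""
    return [path for path in important_paths if hits[path] < min_hits]


if __name__ == "__main__":
    bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    sample = [
        f'66.249.66.1 - - [10/May/2024:10:00:00 +0000] "GET /shoes?sort=price HTTP/1.1" 200 512 "-" "{bot}"',
        f'66.249.66.1 - - [10/May/2024:10:00:05 +0000] "GET /shoes?sort=name HTTP/1.1" 200 512 "-" "{bot}"',
        '203.0.113.7 - - [10/May/2024:10:01:00 +0000] "GET /guides/new HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
    ]
    hits = googlebot_hits(sample)
    print(rarely_crawled(["/guides/new", "/shoes?sort=price"], hits))
```

Run over a window of several weeks, a list like this shows whether crawling is going to parameter noise while key pages wait.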
Crawl budget does not affect rankings directly; it affects discovery and freshness. If a page isn't crawled, it can't be indexed or updated in the index, which indirectly costs rankings. Optimising crawl budget is about making sure your good pages get crawled, not about a direct ranking boost.
Blocking genuinely low-value paths (internal search, infinite parameters) in robots.txt can help redirect crawling toward valuable pages. But be careful not to block pages you want indexed, and remember that robots.txt doesn't deindex: a blocked URL that is already indexed can stay in the index. See the robots.txt guide.
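Before deploying new robots.txt rules, it can help to test them against a list of URLs you want blocked and a list you want crawled. The sketch below uses Python's standard `urllib.robotparser`. The rules and URLs are hypothetical. The standard-library parser uses simple prefix matching, so it may not mirror how Google handles wildcard rules; treat it as a sanity check rather than a faithful simulation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules blocking internal search and an infinite calendar.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /calendar/
"""


def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given rules would stop `agent` from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    must_stay_crawlable = [
        "https://example.com/products/widget",
        "https://example.com/guides/crawl-budget",
    ]
    # Any output here means the rules block a page you want indexed.
    for url in blocked_urls(ROBOTS_TXT, must_stay_crawlable):
        print("Blocked by mistake:", url)
```

The same function works in the other direction: pass the low-value URLs and confirm that every one of them comes back as blocked.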