
XML sitemap best practices

Your XML sitemap is a list of the URLs you want search engines to index. Keep it to clean, canonical, 200-status pages and it helps crawlers find them efficiently. Fill it with redirects and errors and it becomes noise. Here's how to do it right.

The one rule that drives every best practice

Your sitemap should list exactly the canonical, indexable, 200-status URLs you want in search — no more, no less. Everything below follows from that.

A sitemap doesn't directly boost rankings. It's a discovery aid that helps search engines find your pages and understand which URLs you consider canonical, which makes it most valuable for large sites, new sites with few backlinks, and pages buried deep in the site structure. Its value depends on how clean it is: a sitemap full of non-indexable URLs asks crawlers to index pages that don't resolve (redirects, errors) or that you've told them elsewhere not to index (noindex, or a canonical pointing to another URL). Every best practice below serves one goal: a sitemap that matches your true indexable URL set.

The best practices

Each maps to a real check the audit runs against your sitemap.

Include only clean, indexable URLs

Every URL in the sitemap should be canonical, indexable, and return 200. Exclude redirects, 404s, noindex pages, and non-canonical URLs. The audit flags non-canonical URLs in the sitemap and 3XX (redirect) URLs in the sitemap — both contradict the sitemap's purpose.
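The eligibility rule above is simple enough to express as a filter. This is a minimal sketch, assuming you already have crawl data for each URL; the `CrawledPage` record and its field names are hypothetical, and the canonical check is an exact string comparison, so a real pipeline may need URL normalisation (trailing slashes, case, tracking parameters) first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawledPage:
    # Hypothetical crawl record; field names are illustrative, not a real crawler's schema.
    url: str
    status: int               # HTTP status of this exact URL, without following redirects
    canonical: Optional[str]  # rel=canonical target, or None if the page declares none
    noindex: bool             # True if meta robots or X-Robots-Tag says noindex


def is_sitemap_eligible(page: CrawledPage) -> bool:
    """Keep a URL only if it returns 200, is indexable, and is its own canonical."""
    if page.status != 200:  # drops 3XX redirects, 404s and 5XX errors
        return False
    if page.noindex:  # drops pages you've asked search engines not to index
        return False
    if page.canonical is not None and page.canonical != page.url:
        return False  # drops non-canonical duplicates (exact string match only)
    return True


def sitemap_urls(pages):
    """Return the URLs that belong in the sitemap, preserving input order."""
    return [p.url for p in pages if is_sitemap_eligible(p)]
```

A page with no canonical tag is treated as self-canonical here; that is a simplifying assumption, and you may prefer to flag such pages for review instead.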

Respect the size limits

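A single sitemap file can hold at most 50,000 URLs and must be no larger than 50MB (52,428,800 bytes) uncompressed. Both limits apply, so split as soon as you reach either one. For larger sites, use a sitemap index that points to multiple child sitemaps, often split logically (by section or by content type). The audit flags a sitemap that's too large.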
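Splitting a large URL list into child sitemaps plus an index might look like the sketch below. It chunks sequentially by count and then checks the byte size, since very long URLs can breach 50MB before 50,000 entries. The base URL `https://example.com/sitemaps/` and the `sitemap-N.xml` naming are hypothetical; a logical split by section would group URLs first and then apply the same chunking per group.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000
MAX_BYTES_PER_SITEMAP = 50 * 1024 * 1024  # 50MB uncompressed = 52,428,800 bytes
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(urls):
    """Render one child sitemap; escape() turns & < > into XML entities."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f'<urlset xmlns="{SITEMAP_NS}">']
    lines += [f"  <url><loc>{escape(u)}</loc></url>" for u in urls]
    lines.append("</urlset>")
    return "\n".join(lines)


def build_sitemap_index(child_urls):
    """Render a sitemap index that points at each child sitemap."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f'<sitemapindex xmlns="{SITEMAP_NS}">']
    lines += [f"  <sitemap><loc>{escape(u)}</loc></sitemap>" for u in child_urls]
    lines.append("</sitemapindex>")
    return "\n".join(lines)


def generate(urls, base="https://example.com/sitemaps/", max_urls=MAX_URLS_PER_SITEMAP):
    """Split urls into chunks of at most max_urls; return (index_xml, {child_url: child_xml})."""
    children = {}
    for start in range(0, len(urls), max_urls):
        n = start // max_urls + 1
        xml = build_urlset(urls[start:start + max_urls])
        # Chunking by count alone can still exceed the byte limit if URLs are very long.
        if len(xml.encode("utf-8")) > MAX_BYTES_PER_SITEMAP:
            raise ValueError(f"child sitemap {n} exceeds 50MB uncompressed; use smaller chunks")
        children[f"{base}sitemap-{n}.xml"] = xml
    return build_sitemap_index(list(children)), children
```

You would then publish each child at its key and reference only the index URL in robots.txt and Search Console.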

Keep lastmod accurate

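Provide a truthful lastmod for each URL, in W3C Datetime format (for example 2024-05-01 or 2024-05-01T09:30:00+00:00), so crawlers can prioritise genuinely updated pages. Don't stamp every URL with today's date, and never use a future date. The audit flags missing lastmod and future-dated lastmod: a missing value gives crawlers nothing to prioritise with, and false values train them to ignore the field.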
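A rough lastmod check can be written with the standard library. This sketch assumes entries arrive as `(url, lastmod_or_None)` pairs; it handles the common date-only and full-timestamp forms (not year-only or year-month values), treats values without a timezone as UTC, and adds a hypothetical heuristic for blanket-stamped dates.

```python
from datetime import datetime, timezone


def parse_lastmod(value):
    """Parse a date-only or full-timestamp lastmod into an aware UTC datetime."""
    v = value.strip()
    if v.endswith("Z"):  # datetime.fromisoformat only accepts a trailing 'Z' from Python 3.11
        v = v[:-1] + "+00:00"
    dt = datetime.fromisoformat(v)  # raises ValueError on unparseable input
    if dt.tzinfo is None:  # date-only or naive values: assume UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def lastmod_issues(entries, now=None):
    """entries: iterable of (url, lastmod or None). Returns {url: issue} for problem URLs."""
    now = now or datetime.now(timezone.utc)
    issues = {}
    for url, lastmod in entries:
        if not lastmod:
            issues[url] = "missing lastmod"
            continue
        try:
            dt = parse_lastmod(lastmod)
        except ValueError:
            issues[url] = "unparseable lastmod"
            continue
        if dt > now:
            issues[url] = "future-dated lastmod"
    return issues


def all_same_lastmod(entries):
    """Heuristic: many URLs sharing one identical lastmod often means it's auto-stamped."""
    values = [lm for _, lm in entries if lm]
    return len(values) > 1 and len(set(values)) == 1
```

Passing `now` explicitly keeps the check deterministic in tests; in production the default of the current UTC time is what you want.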

Declare it and keep it valid

Reference the sitemap in robots.txt with a Sitemap: line giving its full absolute URL, and submit it in Search Console. Make sure it's valid XML and loads quickly; the audit flags sitemap syntax errors and sitemap timeouts. And don't list the same URL in multiple sitemaps; the audit flags pages appearing in multiple sitemaps.
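The declaration, validity, and duplicate checks can be approximated as follows. This is a sketch, assuming you have already fetched robots.txt and each sitemap's XML as strings (fetching and timeouts are left out). It reads `Sitemap:` lines case-insensitively, lets `ElementTree` raise `ParseError` on malformed XML, and reports page URLs listed in more than one sitemap.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SM_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def declared_sitemaps(robots_txt):
    """Return the URLs from 'Sitemap:' lines in robots.txt (field name is case-insensitive)."""
    found = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key.strip().lower() == "sitemap":
            found.append(value.strip())
    return found


def sitemap_locs(xml_text):
    """Extract <loc> values from a urlset; raises ET.ParseError if the XML is malformed."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iterfind("sm:url/sm:loc", SM_NS) if el.text]


def urls_in_multiple_sitemaps(sitemaps):
    """sitemaps: {sitemap_url: xml_text}. Returns {page_url: [sitemap_urls]} for cross-listed URLs."""
    seen = defaultdict(list)
    for sm_url, xml_text in sitemaps.items():
        for loc in set(sitemap_locs(xml_text)):  # ignore repeats within a single sitemap
            seen[loc].append(sm_url)
    return {url: sms for url, sms in seen.items() if len(sms) > 1}
```

Deduplicating within each sitemap first means only genuine cross-sitemap listings are reported.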

Keep the sitemap and your indexable set in sync

The subtle failure isn't a bad URL in the sitemap — it's a good URL missing from it.

Two sets should match: the URLs in your sitemap, and the canonical indexable URLs on your site. When a non-indexable URL sneaks into the sitemap, you send a contradictory signal. When an indexable page is missing from the sitemap, it may be crawled less and discovered later — the audit flags indexable pages not in the sitemap for exactly this reason. It also flags pages dropped from the sitemap since the last crawl, which often signals an accidental removal. Treat the sitemap as a generated reflection of your indexable pages, not a hand-maintained list that drifts out of date.
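Comparing the two sets is a few lines of set arithmetic. In this sketch, `indexable_urls` is assumed to be the canonical, indexable URLs found by crawling your site, and `previous_sitemap_urls` the sitemap contents from the last crawl; all three inputs and the result keys are hypothetical names.

```python
def sitemap_drift(sitemap_urls, indexable_urls, previous_sitemap_urls=()):
    """Compare the sitemap against the indexable set and against its previous version."""
    sitemap = set(sitemap_urls)
    indexable = set(indexable_urls)
    previous = set(previous_sitemap_urls)
    return {
        # In the sitemap but not indexable: a contradictory signal.
        "non_indexable_in_sitemap": sorted(sitemap - indexable),
        # Indexable but absent from the sitemap: may be crawled less or found later.
        "indexable_missing_from_sitemap": sorted(indexable - sitemap),
        # Listed last time but gone now: often an accidental removal.
        "dropped_since_last_crawl": sorted(previous - sitemap),
    }
```

Sorting the results keeps reports stable between runs, which makes drift easier to spot in diffs.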

XML sitemap FAQ

Do I need a sitemap if my site is small?

It's still recommended — it's a low-effort way to ensure discovery and signal your canonical URLs — but a small, well-linked site will mostly be crawled fine without one. The value of a sitemap grows with site size and structural depth.

Should I include images and videos?

You can add image and video extensions to a sitemap to aid their discovery, which helps if media is important to your site. Keep the core principle: only include media tied to indexable, canonical pages.
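For images, Google's image sitemap extension nests `image:image`/`image:loc` elements inside the `<url>` entry of the page the image appears on. The sketch below assumes that extension's namespace and expects a hypothetical `{page_url: [image_urls]}` mapping that you have already limited to indexable, canonical pages. Video uses a separate extension with its own required tags, which is not shown here.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def urlset_with_images(pages):
    """pages: {page_url: [image_urls]}. Each image is nested under the page it appears on."""
    out = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<urlset xmlns="{SITEMAP_NS}" xmlns:image="{IMAGE_NS}">',
    ]
    for page_url, images in pages.items():
        out.append("  <url>")
        out.append(f"    <loc>{escape(page_url)}</loc>")
        for img in images:
            out.append(f"    <image:image><image:loc>{escape(img)}</image:loc></image:image>")
        out.append("  </url>")
    out.append("</urlset>")
    return "\n".join(out)
```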

How often should the sitemap update?

It should regenerate whenever pages are added, removed, or change canonical/index status, so it always reflects the current indexable set. Most platforms generate it dynamically; the key is that it stays accurate rather than being a stale snapshot.

Audit your sitemap against your real pages

Free to start. Find redirects, errors, non-canonical URLs and missing pages in your sitemap.
