An XML sitemap is more than a list of your URLs: it is a communication channel between your website and search engine crawlers, telling them which pages exist and when they were last updated. For large websites with thousands of pages, a well-structured sitemap helps ensure that Googlebot discovers and indexes your content efficiently. For smaller sites, a properly maintained sitemap speeds up discovery of new content and helps you identify indexing problems early. This guide covers XML sitemap creation, structuring, submission, and ongoing maintenance for websites of all sizes.
What XML Sitemaps Do and Do Not Do
An XML sitemap informs search engines about the URLs on your site, but does not guarantee that those URLs will be crawled or indexed. It is a hint, not a directive. Googlebot will use the sitemap as one input alongside its own crawling behaviour — it may crawl URLs not in your sitemap (discovered through internal links) and may ignore URLs in your sitemap if it considers them low quality or already well-known. Where sitemaps provide the most value is for new content (getting newly published pages indexed faster), for large sites where internal linking may not reach all pages efficiently, and for identifying indexing problems when you compare sitemap URLs against Google's indexed URL count. Common misconceptions: including a URL in your sitemap does not prevent it from being deindexed if it has quality problems; sitemaps do not replace good internal linking architecture; and having a sitemap does not improve rankings directly — it only improves crawl efficiency.
- Sitemaps accelerate discovery of new content — critical for sites that publish frequently
- Sitemaps help large sites ensure all pages enter Googlebot's crawl queue
- Sitemaps do not guarantee indexing — low-quality pages will still be excluded or deindexed
- Sitemaps complement (not replace) good internal linking architecture
- Compare sitemap URL count vs. indexed URL count in Search Console to diagnose indexing gaps
- Image and video sitemaps extend coverage to media content that may not be discovered otherwise
XML Sitemap Structure and Format
A valid XML sitemap must conform to the sitemap protocol (sitemaps.org). The basic structure is an XML file with a urlset root element containing url child elements, each with a loc (required), lastmod (strongly recommended), changefreq (optional, ignored by Google), and priority (optional, ignored by Google). The most important field is loc: the canonical URL of the page, exactly matching the canonical href declared on that page. The lastmod field tells Google when the content was last substantially updated. Use it accurately: Google has said it relies on lastmod only when the value is consistently and verifiably accurate, so stamping today's date on old, unchanged pages can lead Google to disregard the field altogether. A single XML sitemap can contain up to 50,000 URLs and must not exceed 50MB (52,428,800 bytes) uncompressed. Sites with more than 50,000 URLs need to split them across multiple sitemap files, usually tied together with a sitemap index file.
- loc: the canonical URL exactly matching the rel='canonical' on the page — required
- lastmod: ISO 8601 date format (YYYY-MM-DD) of the last substantial content update — strongly recommended
- changefreq and priority: part of the sitemap protocol but ignored by Google, so they are safe to omit
- Maximum 50,000 URLs per sitemap file and 50MB uncompressed file size
- Compress sitemaps with Gzip (.xml.gz) for files over 1MB to reduce server bandwidth
- URLs must use the same protocol (https) and domain (www or non-www) as your canonical tags
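The structure described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration rather than a production generator: the build_urlset function name and the example.com URLs are assumptions made for the example, and only loc and lastmod are emitted.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_urlset(pages):
    """Build a <urlset> document from (canonical_url, last_modified) pairs."""
    # Register the sitemap namespace as the default so tags carry no prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        # date.isoformat() yields YYYY-MM-DD, the format lastmod expects.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod.isoformat()
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)

# Hypothetical pages; real values would come from your CMS or database.
xml_bytes = build_urlset([
    ("https://www.example.com/", date(2024, 1, 15)),
    ("https://www.example.com/services/", date(2024, 3, 2)),
])
```

ElementTree escapes characters such as & in URLs automatically, which is one reason to prefer it over string concatenation.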
What to Include and Exclude from Your Sitemap
The most critical rule for XML sitemaps is: only include URLs you want Google to index. Including non-canonical, noindexed, or low-quality URLs in your sitemap creates inconsistency signals (you are telling Google to index a URL while simultaneously telling it not to index it), wastes crawl budget, and reduces the overall trust value of your sitemap. Exclude: pages with noindex tags, paginated pages (page 2, 3 etc. unless they have significant independent search value), filtered or sorted e-commerce URLs (faceted navigation), admin and account pages, thank-you and confirmation pages, login pages, and any URL that 301-redirects to another URL. Include: all canonical, indexable pages you want ranked — service pages, blog posts, product pages, category pages, location pages, and landing pages. For e-commerce sites with faceted navigation, only include the canonical category pages — not the filter combinations.
- EXCLUDE: pages with noindex tags — a noindexed page in your sitemap creates a contradiction
- EXCLUDE: pages that 301-redirect to another URL — include only the final destination URL
- EXCLUDE: admin pages, login pages, cart pages, and account management pages
- EXCLUDE: faceted navigation filter combinations for e-commerce — only include canonical category pages
- EXCLUDE: paginated pages past page 1 unless they have independent search value
- INCLUDE: all canonical, indexable pages — service pages, blog posts, product pages, location pages
- INCLUDE: recently published content pages immediately after publishing for faster indexing
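One way to apply these rules consistently is to filter pages in code before they reach the sitemap. The sketch below assumes a hypothetical Page record with fields your own crawler or CMS would need to supply; it covers the noindex, redirect, and canonical rules, while judgement calls such as pagination or content quality are left out.

```python
from dataclasses import dataclass

@dataclass
class Page:
    # Hypothetical record; field names are assumptions for this example.
    url: str
    canonical_url: str
    status: int
    noindex: bool

def sitemap_eligible(page):
    """True only for canonical, indexable pages that return 200."""
    return (
        page.status == 200                  # excludes redirects and errors
        and not page.noindex                # excludes noindexed pages
        and page.url == page.canonical_url  # excludes non-canonical variants
    )

pages = [
    Page("https://www.example.com/shoes/", "https://www.example.com/shoes/", 200, False),
    Page("https://www.example.com/shoes/?sort=price", "https://www.example.com/shoes/", 200, False),
    Page("https://www.example.com/old-page/", "https://www.example.com/old-page/", 301, False),
    Page("https://www.example.com/thank-you/", "https://www.example.com/thank-you/", 200, True),
]
sitemap_urls = [p.url for p in pages if sitemap_eligible(p)]
```

Here only the canonical category page survives the filter; the sorted variant, the redirect, and the noindexed thank-you page are dropped.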
Sitemap Index Files for Large Websites
When your site exceeds 50,000 URLs or when you want to organise sitemaps by content type, use a sitemap index file. A sitemap index is an XML file that lists multiple child sitemap files, each of which can contain up to 50,000 URLs. The sitemap index itself uses a sitemapindex root element with sitemap child elements, each containing a loc (URL of the child sitemap) and optionally a lastmod. This structure allows you to submit a single URL (the sitemap index) to Google Search Console while Google separately crawls each child sitemap. For large sites, organise child sitemaps by content type: a blog sitemap for all blog posts, a services sitemap for service pages, a products sitemap for product pages, and an images sitemap for image content. This organisation makes it easier to monitor indexing performance by content type in Search Console.
- Use a sitemap index file when you have more than 50,000 URLs or multiple content sections
- Sitemap index: sitemapindex root element with sitemap children each having loc and optional lastmod
- Organise child sitemaps by content type (blog, services, products) for easier monitoring
- Submit only the sitemap index URL to Google Search Console — not each individual child sitemap
- Each child sitemap listed in the index must be accessible (return 200 status) when Googlebot requests it
- Update lastmod in the sitemap index when a child sitemap is updated to signal freshness
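As a rough sketch of how the 50,000-URL limit and the index structure fit together, the Python below splits a URL list into child-sized chunks and builds a sitemapindex document. The function names, the file naming scheme, and the example.com URLs are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000

def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into lists that each fit in one child sitemap."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def build_sitemap_index(children):
    """Build a <sitemapindex> from (child_sitemap_url, lastmod) string pairs."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in children:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)

# Hypothetical product catalogue of 120,001 URLs, which needs three child files.
product_urls = [f"https://www.example.com/products/{n}" for n in range(120_001)]
chunks = chunk_urls(product_urls)
index_xml = build_sitemap_index(
    [(f"https://www.example.com/sitemap-products-{i + 1}.xml", "2024-03-02")
     for i in range(len(chunks))]
)
```

Each chunk would then be written out as its own urlset file at the URL listed in the index.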
Dynamic Sitemaps: Keeping Sitemaps Current
For sites that publish content frequently (blogs, e-commerce platforms, news sites), dynamic XML sitemaps — generated on-the-fly from your database — are more reliable than manually maintained static files. Most CMS platforms include dynamic sitemap generation: WordPress sites using Yoast SEO or RankMath automatically generate and update XML sitemaps as content is published, updated, or deleted. For custom-built sites, sitemap generation should be integrated into your publishing pipeline — when a new page is created, its URL should automatically be added to the appropriate sitemap, and the lastmod date on that sitemap entry should update when the page content is modified. For large e-commerce platforms (10,000+ products), dynamic sitemaps must also handle products going out of stock (remove from sitemap or keep with noindex depending on your strategy) and new product additions in real-time.
- Use CMS-generated dynamic sitemaps for sites that publish content regularly
- Configure Yoast SEO or RankMath for automatic WordPress sitemap generation and updates
- Build sitemap generation into your publishing pipeline for custom CMS implementations
- Automate lastmod date updates whenever substantial page content changes
- Set up sitemap regeneration triggers for product availability changes on e-commerce sites
- Validate dynamically generated sitemaps periodically using the Sitemaps report in Search Console
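Alongside Search Console, a dynamically generated sitemap can be sanity-checked locally before it is published. The following is a minimal sketch, not a full protocol validator: validate_sitemap is a hypothetical helper, the expected host is an assumption you would set to your canonical domain, and the lastmod check only verifies the leading YYYY-MM-DD portion.

```python
import xml.etree.ElementTree as ET
from datetime import date
from urllib.parse import urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB uncompressed

def validate_sitemap(xml_bytes, expected_host):
    """Return a list of human-readable problems found in one sitemap file."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file exceeds 50MB uncompressed")
    entries = ET.fromstring(xml_bytes).findall("sm:url", NS)
    if len(entries) > MAX_URLS:
        problems.append("more than 50,000 URLs")
    for entry in entries:
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        parts = urlsplit(loc)
        # Every URL should use https and the canonical host (www or non-www).
        if parts.scheme != "https" or parts.netloc != expected_host:
            problems.append(f"unexpected protocol or host: {loc}")
        lastmod = entry.findtext("sm:lastmod", namespaces=NS)
        if lastmod:
            try:
                date.fromisoformat(lastmod.strip()[:10])
            except ValueError:
                problems.append(f"invalid lastmod for {loc}: {lastmod}")
    return problems
```

Running a check like this in the publishing pipeline catches protocol mismatches and malformed dates before Googlebot ever fetches the file.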
Submitting Sitemaps to Google and Bing
Submit your XML sitemap to both Google Search Console and Bing Webmaster Tools for maximum indexing coverage. In Google Search Console: navigate to Sitemaps in the left sidebar, enter the URL of your sitemap file (e.g., https://yourdomain.com/sitemap.xml or https://yourdomain.com/sitemap-index.xml), and click Submit. Google will then fetch and process the sitemap, and the Sitemaps report will show its processing status and the number of URLs discovered in it; indexing status for those URLs is available from the linked Page indexing report. Also add your sitemap URL to your robots.txt file using the Sitemap: directive (Sitemap: https://yourdomain.com/sitemap.xml). This allows any search engine crawler, including those for Bing, Yandex, and AI crawlers, to discover your sitemap without needing explicit submission. For Bing (which feeds ChatGPT Search), submit separately through Bing Webmaster Tools at bing.com/webmasters.
1. Go to Google Search Console > Sitemaps and submit your sitemap URL
2. Add Sitemap: https://yourdomain.com/sitemap.xml to your robots.txt file
3. Submit your sitemap to Bing Webmaster Tools at bing.com/webmasters
4. Review the Sitemaps report in Search Console after 24-48 hours for submitted vs. indexed URL counts
5. Investigate any URLs that are in the sitemap but not indexed, reviewing their quality and canonical tags
6. Set a calendar reminder to review sitemap health monthly using the Search Console Sitemaps report
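To confirm that the Sitemap: directive in robots.txt is readable by crawlers, Python's standard urllib.robotparser module can parse the file and list the sitemaps it declares (site_maps() is available from Python 3.8). The robots.txt content below is an illustrative assumption, and sitemaps_from_robots is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: lines are present.
    return parser.site_maps() or []

# Illustrative robots.txt; yours would be fetched from https://yourdomain.com/robots.txt
robots_txt = """\
User-agent: *
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
"""
declared = sitemaps_from_robots(robots_txt)
```

An empty result means the directive is missing or malformed, so crawlers relying on robots.txt would not find the sitemap.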
Diagnosing Sitemap Issues in Google Search Console
Google Search Console gives you three critical data points for each sitemap: the number of URLs submitted (all URLs in your sitemap), the number discovered (the URLs Google found when it read the file), and the number indexed (URLs that made it into Google's index, visible by opening the Page indexing report filtered to that sitemap). A large gap between submitted and indexed typically has one of four causes: the gap pages have noindex tags (if intentional, exclude them from the sitemap), the gap pages have canonicalization issues (the canonical points to a different URL), the gap pages have content quality issues (thin or duplicate content), or the gap pages are newly published and not yet crawled. For each Sitemaps error type (Sitemap could not be read, Sitemap has issues, URLs could not be processed), Search Console provides specific guidance. Common processing errors include URLs that redirect (list the final URL instead), URLs whose canonical tag points elsewhere, and URLs returning non-200 status codes.
- Compare submitted URL count to indexed count — large gaps indicate quality or canonicalization issues
- Check all sitemap URLs are returning 200 status — not redirecting or erroring
- Investigate Sitemap errors in Search Console for specific processing issues
- Review why specific pages are excluded from the index using URL Inspection tool
- Re-submit your sitemap after major updates to trigger fresh Googlebot processing
- Monitor the Sitemaps report weekly for new errors or drops in indexed URL count
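The submitted-versus-indexed comparison can also be done at the URL level once you have both lists. The sketch below assumes you have already obtained the set of indexed URLs somehow, for example from a Search Console export; that source, the indexing_gap function name, and the sample URLs are all assumptions for illustration.

```python
def indexing_gap(sitemap_urls, indexed_urls):
    """Compare sitemap URLs with indexed URLs and report both kinds of gap."""
    submitted = set(sitemap_urls)
    indexed = set(indexed_urls)
    return {
        # In the sitemap but not indexed: check noindex, canonicals, quality.
        "not_indexed": sorted(submitted - indexed),
        # Indexed but missing from the sitemap: candidates to add or review.
        "indexed_not_in_sitemap": sorted(indexed - submitted),
    }

# Hypothetical data for illustration.
gap = indexing_gap(
    sitemap_urls=[
        "https://www.example.com/",
        "https://www.example.com/blog/new-post/",
        "https://www.example.com/services/",
    ],
    indexed_urls=[
        "https://www.example.com/",
        "https://www.example.com/services/",
        "https://www.example.com/legacy-page/",
    ],
)
```

Each URL in not_indexed is a candidate for the URL Inspection tool, which reports Google's specific reason for exclusion.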
Image and Video Sitemaps for Rich Media Content
Standard XML sitemaps only handle page URLs. For sites with significant image or video content, image sitemaps and video sitemaps extend coverage to media that may not be discovered through standard crawling. Image sitemaps use the image:image extension within standard page URL entries, with an image:loc property giving the URL of each image on a page. Google deprecated the older image:title and image:caption properties in 2022, so they can be left out. Video sitemaps use the video:video extension with required properties including video:thumbnail_loc, video:title, video:description, and video:content_loc or video:player_loc. Image sitemaps are particularly valuable for e-commerce sites (product images in Google Images) and news sites (editorial photography). Video sitemaps are not strictly required for Google's video features, but they help Google find videos and their metadata, alongside VideoObject structured data. Both types use the same sitemap index structure and can be submitted through Search Console.
- Add image:image extensions to page URL entries for important product or editorial images
- Include image:loc (the image URL) in each image entry; image:title and image:caption are deprecated by Google and can be omitted
- Create a video sitemap for sites with embedded video content using video:video extensions
- Video sitemaps require video:thumbnail_loc, video:title, video:description, and video content location
- Image sitemaps improve discovery of your images for Google Images search
- Submit image and video sitemaps separately to Search Console for independent performance monitoring
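An image sitemap entry can be sketched with the same standard-library approach, adding Google's image namespace next to the core sitemap namespace. This example emits only image:loc, which is the property Google's current documentation still uses; the build_image_sitemap function name and the example.com URLs are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

def build_image_sitemap(pages):
    """pages maps each page URL to the list of image URLs shown on that page."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            # One <image:image> block per image, nested inside the page's <url>.
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)

# Hypothetical product page with two images.
image_xml = build_image_sitemap({
    "https://www.example.com/products/red-shoe/": [
        "https://www.example.com/img/red-shoe-front.jpg",
        "https://www.example.com/img/red-shoe-side.jpg",
    ],
})
```

A video sitemap follows the same pattern with the video namespace and its required properties.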
A well-maintained XML sitemap is a low-effort, high-impact technical foundation that pays dividends across indexing speed, crawl efficiency, and diagnostics capability. The most critical discipline is keeping your sitemap accurate — only canonical, indexable 200-status URLs should be included. An accurate sitemap makes Search Console's coverage reporting reliable, helps Google allocate crawl budget efficiently, and accelerates indexing of your most important new content. Set up dynamic sitemap generation if your site publishes content regularly, submit to both Google and Bing, and review the Sitemaps report monthly to catch issues early.
Frequently Asked Questions
Does having a sitemap improve Google rankings?
Sitemaps do not directly improve rankings — they improve crawl efficiency and indexing speed. A page that is not indexed cannot rank for anything, so ensuring your important pages are indexed (which sitemaps help with) is a prerequisite for ranking. For established sites with good internal linking, sitemaps primarily help with new content discovery speed. For large sites with indexing gaps, fixing sitemap issues can have significant ranking impact.
How many URLs can I include in one XML sitemap?
A single XML sitemap file can contain a maximum of 50,000 URLs and must not exceed 50MB in uncompressed size. Sites with more than 50,000 URLs must use a sitemap index file that references multiple child sitemap files, each within the 50,000 URL limit. Most medium-sized business websites are well within the single-file limit.
Should I include paginated pages in my sitemap?
Generally no — paginated pages (page 2, 3, etc.) should be excluded from sitemaps unless they have significant independent search value. Pagination pages for blog archives, category pages, and search results are typically low-value for indexing purposes. If you want pagination pages indexed, ensure they have unique meta descriptions and meaningful standalone content. For most sites, only page 1 of any paginated series should be in the sitemap.
How often should I update my XML sitemap?
Dynamic CMS sitemaps should update automatically whenever content is published, updated, or deleted. Manually maintained sitemaps should be updated whenever new pages are published or existing pages have their canonical structure changed. At minimum, review your sitemap for accuracy quarterly — checking that it only contains canonical, indexable 200-status URLs. Set up Search Console monitoring to alert you to sitemap processing errors.
What is a sitemap index file and when do I need one?
A sitemap index file is an XML file that lists multiple child sitemap files, used when your site exceeds 50,000 URLs or when you want to organise sitemaps by content type. The sitemap index itself is submitted to Google Search Console, which then fetches and processes each child sitemap. Most business websites under 10,000 pages do not need a sitemap index — a single sitemap file is sufficient and simpler to manage.
Why are pages in my sitemap not being indexed by Google?
Common reasons include: the page has a noindex tag (contradicting its sitemap inclusion), the page has a canonical tag pointing to a different URL, the page has thin or duplicate content that Google does not consider worth indexing, the page is newly published and not yet crawled, or the page has technical issues preventing Googlebot from rendering it correctly. Use Google Search Console's URL Inspection tool on specific non-indexed URLs to get Google's specific reason for exclusion.