Technical SEO·Baz Furby·11 min read

XML Sitemaps: How to Create, Structure, and Submit Them for Maximum Indexing

Everything you need to know about XML sitemaps in 2026 — structure, best practices, image and video sitemaps, submission, and common mistakes to avoid.


Technical SEO has always separated the practitioners who actually move rankings from those who cargo-cult tactics from five years ago. XML sitemaps are a good example of a tool that many sites implement incorrectly, then wonder why Google is not indexing their content efficiently. The sitemap is one of the most direct lines of communication you have with Googlebot — and it is worth getting right.

This guide covers how XML sitemaps work, how to structure them correctly, which attributes Google actually uses (versus ignores), the different sitemap types available, and the mistakes that cause sitemaps to actively harm rather than help your indexing.


What an XML Sitemap Actually Does

There is a common misconception worth clearing up immediately: an XML sitemap does not tell search engines what to index. It tells them what exists.

Submitting a URL in your sitemap does not guarantee Google will index it. Google still applies its own quality judgements — a URL in your sitemap can be crawled and then excluded from the index if Google determines the content is thin, duplicate, or low quality. What the sitemap does is ensure Google knows those URLs exist and has a pathway to discover them without relying solely on internal links.

This distinction matters because it explains why Search Console's indexing reports can show sitemap-submitted URLs with statuses like "Crawled – currently not indexed." The sitemap did its job. The content did not pass Google's quality bar.

The sitemap is most valuable for:

  • Large sites where Googlebot might not discover deep pages through crawling alone
  • New sites with few external links where organic discovery is slow
  • Sites with content that changes frequently (news, e-commerce inventory, job listings)
  • Rich media content (images, videos) that you want indexed separately

For small sites with good internal linking, the sitemap is still worth having, but it is less operationally critical.


XML Sitemap Structure

A standard XML sitemap follows the Sitemaps Protocol. Here is the basic structure:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/page/</loc>
    <lastmod>2026-03-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

The <loc> Element

Required. This is the canonical URL of the page. It must:

  • Be an absolute URL including protocol (https://)
  • Match the canonical URL on that page — if your canonical says https://www.example.com/page/ but your sitemap says https://example.com/page/ (no www), you are creating a conflict
  • Be entity-escaped for XML (special characters like & must be written as &amp;) and percent-encoded where the URL itself requires it
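When generating <loc> values programmatically, the standard library handles the entity escaping for you. A minimal sketch in Python (the URL is illustrative):

```python
from xml.sax.saxutils import escape

def sitemap_loc(url: str) -> str:
    """Entity-escape a URL for use inside a <loc> element."""
    return escape(url)

# Ampersands in query strings become &amp; in the XML
sitemap_loc("https://www.example.com/search?q=shoes&size=10")
# → 'https://www.example.com/search?q=shoes&amp;size=10'
```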

The <lastmod> Element

Optional, but important when used correctly. lastmod tells search engines when the page was last substantively modified. Google uses this to prioritise recrawling pages that have changed.

The key word is substantively. Updating lastmod without actually changing the content — a practice some sites use to try to appear fresher — is something Google actively detects and responds to by ignoring your lastmod entirely. If Google determines your lastmod dates are inaccurate, it stops trusting them across your whole sitemap. Use lastmod only when content has genuinely changed.

Format: W3C Datetime format — YYYY-MM-DD is sufficient for most purposes, though full datetime format (2026-03-15T14:30:00+00:00) can be used for frequently updated content.
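Python's datetime.isoformat() emits both forms directly, which is handy when generating sitemaps from a script:

```python
from datetime import date, datetime, timezone

# Date-only form, sufficient for most pages
date(2026, 3, 15).isoformat()
# → '2026-03-15'

# Full W3C datetime with a UTC offset, for frequently updated content
datetime(2026, 3, 15, 14, 30, 0, tzinfo=timezone.utc).isoformat()
# → '2026-03-15T14:30:00+00:00'
```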

The <changefreq> Element

Optional, and largely ignored by Google. changefreq was intended to tell crawlers how often a page's content changes (always, hourly, daily, weekly, monthly, yearly, never). In practice, Google has confirmed it treats this as an unreliable hint at best. You can include it for completeness with other crawlers, but do not expend effort on it.

The <priority> Element

Optional, and similarly ignored by Google in most cases. priority is a relative importance score from 0.0 to 1.0. The default is 0.5. Google has stated publicly that it does not use priority to determine crawl frequency or ranking. You can include it if you wish, but it will not change your results.


Sitemap Best Practices

Exclude noindexed Pages

Never include URLs in your sitemap that carry a noindex directive. This sends a contradictory signal: your sitemap says "please crawl this" and your page says "please do not index this." Google will typically respect the noindex tag, but you are wasting crawl budget and creating confusion. Audit your sitemap regularly to ensure every URL is indexable.

Exclude Pages Blocked by robots.txt

Similarly, do not include URLs that you have blocked in robots.txt. If Googlebot cannot crawl the page, submitting it in a sitemap is pointless and wastes resources.

Keep Sitemaps Under 50,000 URLs

The sitemap protocol limits individual sitemaps to 50,000 URLs and 50MB uncompressed. If your site exceeds this, use a sitemap index file.
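Splitting a large URL inventory into protocol-sized files is straightforward. A sketch of the chunking step, assuming the URLs come from your own content database:

```python
from itertools import islice
from typing import Iterable, Iterator

MAX_URLS_PER_SITEMAP = 50_000  # protocol limit per sitemap file

def chunk_urls(urls: Iterable[str], size: int = MAX_URLS_PER_SITEMAP) -> Iterator[list[str]]:
    """Yield successive sitemap-sized batches of URLs."""
    it = iter(urls)
    while batch := list(islice(it, size)):
        yield batch

# 120,000 URLs -> three sitemap files of 50k, 50k, and 20k URLs
urls = (f"https://www.example.com/p/{i}/" for i in range(120_000))
sizes = [len(batch) for batch in chunk_urls(urls)]
# sizes == [50000, 50000, 20000]
```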

Use a Sitemap Index for Large Sites

A sitemap index is a file that points to multiple individual sitemaps:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-pages.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-blog.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
  </sitemap>
</sitemapindex>

Segmenting by content type (pages, posts, products) also makes it easier to diagnose indexing issues per section.

Reference Your Sitemap from robots.txt

Add this line to your robots.txt:

Sitemap: https://www.example.com/sitemap.xml

This ensures all crawlers — not just Google — can discover your sitemap without relying on Search Console submission.
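Python's urllib.robotparser can read those Sitemap: lines back out, which is useful in audit scripts. The robots.txt content below is illustrative:

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
parser.site_maps()
# → ['https://www.example.com/sitemap.xml']
```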


Types of Sitemaps

Standard XML Sitemap

The default sitemap containing your standard page URLs. This is the one to start with.

Image Sitemap

Google can discover images through standard crawling, but an image sitemap ensures comprehensive discovery — particularly for images that are loaded via JavaScript or are not referenced by any crawlable link.

Image sitemaps use an extension to the standard sitemap namespace:

<url>
  <loc>https://www.example.com/product/red-trainers/</loc>
  <image:image>
    <image:loc>https://www.example.com/images/red-trainers-hero.jpg</image:loc>
    <image:title>Red Running Trainers</image:title>
    <image:caption>Our bestselling trainer in red, available in sizes 5–12</image:caption>
  </image:image>
</url>

Image sitemaps are most valuable for e-commerce and photography sites where image search traffic is meaningful.
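Generating these entries with a proper XML library avoids namespace mistakes. A sketch using Python's ElementTree, reusing the URLs from the example above:

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMG = "http://www.google.com/schemas/sitemap-image/1.1"
ET.register_namespace("", SM)
ET.register_namespace("image", IMG)

# Build <urlset><url><loc>…</loc><image:image>…</image:image></url></urlset>
urlset = ET.Element(f"{{{SM}}}urlset")
url = ET.SubElement(urlset, f"{{{SM}}}url")
ET.SubElement(url, f"{{{SM}}}loc").text = "https://www.example.com/product/red-trainers/"

image = ET.SubElement(url, f"{{{IMG}}}image")
ET.SubElement(image, f"{{{IMG}}}loc").text = "https://www.example.com/images/red-trainers-hero.jpg"
ET.SubElement(image, f"{{{IMG}}}title").text = "Red Running Trainers"

xml_out = ET.tostring(urlset, encoding="unicode")
```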

Video Sitemap

Video sitemaps tell Google about video content embedded on your pages, including title, description, thumbnail URL, and duration. They support eligibility for video rich results in search. If you host video content directly (not purely via YouTube embeds), a video sitemap is worth implementing.

News Sitemap

News sitemaps are for Google News-approved publishers. They work differently from standard sitemaps — they should only contain articles published within the last 48 hours and must include a <news:news> element with publication name, publication date, and article title. Google recrawls news sitemaps far more frequently than standard ones.


How to Submit Your Sitemap in Google Search Console

  1. Open Google Search Console
  2. Select your property
  3. Navigate to Indexing > Sitemaps
  4. Enter your sitemap URL (e.g., https://www.example.com/sitemap.xml)
  5. Click Submit

After submission, Search Console will show:

  • Status — whether the sitemap was successfully read
  • Discovered URLs — how many URLs Google found in the sitemap
  • Last read — when Google last fetched the sitemap

The key metric to watch is the ratio of Discovered URLs to Indexed URLs. A large gap between the two does not necessarily mean something is wrong — it may mean your content has not yet passed Google's quality threshold — but a sudden drop in indexed URLs after a sitemap change is a signal to investigate.


Dynamic vs Static Sitemaps

Dynamic Sitemaps

Most CMS platforms (WordPress, Shopify, Webflow) generate sitemaps dynamically — the sitemap is generated on the fly from your current content. Dynamic sitemaps automatically include new pages and remove deleted ones, which makes them easier to maintain.

The risk with dynamic sitemaps is that they can accidentally include the wrong pages (draft posts, password-protected pages, parameter duplicates) if not configured correctly. Review the output of your dynamic sitemap regularly, not just on initial setup.

Static Sitemaps

Static sitemaps are hand-crafted or generated by a build process (common in static site generators like Astro, Eleventy, or Hugo). They require updating whenever content is added or removed, but give you precise control over what is included.

For static sites where you have full control of the build pipeline, static sitemaps generated at build time are often the cleanest option.
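A build-time generator can be a short script. The sketch below walks a hypothetical output directory of rendered HTML and uses file modification times as a lastmod proxy — note that if your build rewrites every file on each run, mtimes will not reflect genuine content changes, and you should derive lastmod from your content source (e.g. git history) instead:

```python
from datetime import datetime, timezone
from pathlib import Path
from xml.sax.saxutils import escape

def build_sitemap(out_dir: Path, base_url: str) -> str:
    """Emit a sitemap covering every .html file under a built-site directory.
    The directory layout and base_url are assumptions about your build."""
    entries = []
    for page in sorted(out_dir.rglob("*.html")):
        rel = page.relative_to(out_dir).as_posix()
        # Map rendered files back to clean trailing-slash URLs
        if rel == "index.html":
            path = "/"
        elif rel.endswith("/index.html"):
            path = "/" + rel[: -len("index.html")]
        else:
            path = "/" + rel.removesuffix(".html") + "/"
        mtime = datetime.fromtimestamp(page.stat().st_mtime, tz=timezone.utc)
        entries.append(
            f"  <url><loc>{escape(base_url + path)}</loc>"
            f"<lastmod>{mtime.date().isoformat()}</lastmod></url>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + "\n".join(entries)
        + "\n</urlset>\n"
    )
```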


Common Sitemap Mistakes

Including Login Pages and Admin URLs

Any page that requires authentication should not appear in your sitemap. These pages return a redirect or error to Googlebot, and their inclusion signals poor sitemap hygiene. Audit your sitemap for any /account/, /login/, /dashboard/, or /admin/ URLs.

Including Query Parameter Duplicates

If your site generates parameterised URLs for filtering, sorting, or tracking, these variants can end up in auto-generated sitemaps. Include only canonical, parameter-free URLs in your sitemap — or the parameter-containing URLs that represent genuinely distinct, indexable content.

Using Inaccurate lastmod Dates

As noted earlier, Google actively detects when lastmod dates do not correspond to genuine content changes. Setting all pages to today's date as a "freshness hack" will cause Google to discard your lastmod data entirely. Use accurate timestamps or omit lastmod on pages where you cannot track it reliably.

Forgetting to Update the Sitemap After URL Changes

When you restructure your URLs or redirect old content, your sitemap must be updated. A sitemap containing old URLs that now 301 redirect to new ones causes Google to follow the redirect and may slow down its recognition of the new canonical URLs.

Not Validating the Sitemap

Malformed XML will cause the whole sitemap to fail. Validate your sitemap before submission — any standard XML validator will catch malformed markup, and Search Console will report parse errors after you submit.
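A pre-submission check is a few lines of Python: parse the XML (malformed markup raises immediately) and sanity-check the <loc> values. A minimal sketch:

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def check_sitemap(xml_text: str) -> list[str]:
    """Return the <loc> URLs in a sitemap. Raises ET.ParseError on
    malformed XML and ValueError on non-absolute URLs."""
    root = ET.fromstring(xml_text)  # ParseError here means broken XML
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", NS) if el.text]
    bad = [u for u in locs if not u.startswith(("http://", "https://"))]
    if bad:
        raise ValueError(f"non-absolute URLs in sitemap: {bad}")
    return locs
```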


Sitemap Strategy for Large Sites

For sites with hundreds of thousands of pages — large e-commerce stores, news publishers, job boards — sitemap management becomes a more active discipline.

Segment by Content Type

Use a sitemap index to separate your sitemap into logical groups: pages, products, blog posts, categories. Segmentation makes indexing analysis more granular. If 90% of your product pages are not being indexed but your blog posts are, you need that visibility to diagnose the problem.

Prioritise High-Value Content

Although the <priority> tag is ignored by Google, you can influence crawl prioritisation indirectly by including only high-value pages and ensuring low-quality pages are excluded. Google allocates crawl budget based on its assessment of page quality and site authority — a sitemap full of thin or near-duplicate pages will receive less thorough crawling than a sitemap of high-quality, unique content.

Monitor Indexed vs Submitted Ratio Over Time

A healthy large site will see the indexed count trend upward as new content is published and existing content is re-evaluated. Sudden drops in the indexed count — or a widening gap between submitted and indexed — are early warning signals of a quality issue, a technical problem (server errors, noindex directives deployed accidentally), or a Google algorithm adjustment affecting your site's perceived quality.

Track this ratio monthly. Set a threshold (e.g., alert if indexed count drops more than 10% month-on-month) so you catch problems before they compound into significant traffic losses.
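Once you are logging monthly indexed counts, the alert itself is trivial to compute; the 10% threshold below matches the suggestion above:

```python
def indexing_drop_alert(prev_indexed: int, curr_indexed: int, threshold: float = 0.10) -> bool:
    """True if the indexed count fell by more than `threshold` month-on-month."""
    if prev_indexed <= 0:
        return False
    return (prev_indexed - curr_indexed) / prev_indexed > threshold

indexing_drop_alert(10_000, 8_800)  # 12% drop → True
indexing_drop_alert(10_000, 9_500)  # 5% drop → False
```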

Handle Deleted Content

When pages are removed, your sitemap should be updated promptly to remove them. Pages that return 404 or 410 errors and remain in your sitemap create ongoing crawl waste. A 410 (Gone) status is preferable to 404 for permanently deleted content — it signals to Google that the deletion is intentional rather than a temporary error.


Checking Your Sitemap for Errors

Run through this checklist when auditing your sitemap:

  • Fetch the sitemap directly in a browser and confirm it loads without errors
  • Validate the XML structure
  • Check a random sample of URLs to confirm they return 200 status
  • Verify none of the included URLs have noindex directives
  • Check Search Console for sitemap-level errors and "Discovered – currently not indexed" or "Crawled – currently not indexed" statuses
  • Compare the submitted URL count against your actual indexable page count
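The noindex check in the list above can be scripted against fetched page HTML. A sketch using only the standard library — note that a full audit should also inspect the X-Robots-Tag HTTP header, which this parser cannot see:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the content of every <meta name="robots"> tag."""
    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag: str, attrs) -> None:
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            self.directives.append((a.get("content") or "").lower())

def has_noindex(html: str) -> bool:
    """True if the page HTML carries a robots noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in d for d in parser.directives)

has_noindex('<head><meta name="robots" content="noindex, follow"></head>')  # → True
has_noindex('<head><meta name="robots" content="index, follow"></head>')    # → False
```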

Surfaceable validates sitemap structure and checks for common issues — including noindexed URLs in sitemaps and mismatched canonical references — as part of its technical SEO audit. Running this check after any CMS update or site migration prevents issues from going undetected until they show up as indexing drops.

A well-maintained XML sitemap is low-effort infrastructure that pays ongoing dividends. The goal is not to game crawl frequency — it is to give Google a clean, accurate, trustworthy map of your site so that when you publish something worth ranking, Googlebot finds it promptly.

