XML Sitemap Problems | Blue Frog Docs

XML Sitemap Problems

Fix sitemap errors to help search engines discover and index all your important pages

What This Means

An XML sitemap is a file that lists the important pages on your website so search engines can discover, crawl, and index them more efficiently. When a sitemap is missing, incorrect, or poorly structured, search engines may miss important pages, waste crawl budget on unimportant URLs, index the wrong versions of pages, or struggle to understand your site's structure. The result is reduced search visibility and organic traffic.

How XML Sitemaps Work

Basic Sitemap Structure:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>https://example.com/</loc>
        <lastmod>2025-01-15</lastmod>
        <changefreq>daily</changefreq>
        <priority>1.0</priority>
    </url>
    <url>
        <loc>https://example.com/products/widget</loc>
        <lastmod>2025-01-10</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.8</priority>
    </url>
</urlset>

What Each Element Means:

  • <loc> - Full, absolute URL of the page (required)
  • <lastmod> - Last modification date in W3C Datetime format, e.g. 2025-01-15 (optional but recommended)
  • <changefreq> - How often the page changes (optional, advisory only; Google ignores it)
  • <priority> - Relative importance from 0.0 to 1.0 (optional, advisory only; Google ignores it)
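
As a rough illustration of how these elements fit together, the following Python sketch builds a two-URL sitemap with the standard library. The `build_sitemap` helper and the page data are assumptions made for this example, not part of any tool mentioned in this guide.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML (bytes) from (loc, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # required, must be an absolute URL
        if lastmod is not None:
            # date.isoformat() yields YYYY-MM-DD, a valid W3C Datetime value
            ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


sitemap_xml = build_sitemap([
    ("https://example.com/", date(2025, 1, 15)),
    ("https://example.com/products/widget", date(2025, 1, 10)),
])
print(sitemap_xml.decode("utf-8"))
```

ElementTree escapes characters such as `&` in URLs automatically, which hand-built string concatenation tends to miss.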

Impact on Your Business

SEO Consequences:

  • Pages not indexed - Search engines may never find important content
  • Delayed indexing - New pages can take weeks instead of days to appear
  • Wasted crawl budget - Bots crawl wrong pages
  • Lost organic traffic - Products/articles don't appear in search
  • Poor site architecture signals - Google sees disorganized site

Common Sitemap Problems:

  1. No sitemap exists - Search engines must discover pages organically
  2. Outdated sitemap - Lists deleted pages, missing new content
  3. Too large - Over 50,000 URLs or 50 MB uncompressed (sitemap protocol limit, enforced by Google)
  4. Wrong URLs - 404s, redirects, non-canonical URLs
  5. Blocked by robots.txt - Sitemap location blocked from crawling
  6. Not submitted to Search Console - Google doesn't know it exists

Real-World Impact:

  • Sites with sitemaps get indexed 50% faster
  • E-commerce sites without sitemaps miss up to 30% of product indexing
  • News sites benefit from a news sitemap, which helps Google News discover new articles quickly
  • Large sites (10,000+ pages) need sitemaps for efficient crawling

How to Diagnose

Method 1: Check Sitemap Exists

  1. Visit https://yoursite.com/sitemap.xml
  2. If nothing loads there, try https://yoursite.com/sitemap_index.xml
  3. Check that the file loads and displays XML

What to Look For:

Good sitemap:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>https://example.com/page1</loc>
        <lastmod>2025-01-15</lastmod>
    </url>
</urlset>

Problems:

404 Not Found - No sitemap exists

or

<urlset>
    <url>
        <loc>example.com/page</loc> <!-- Missing protocol https:// -->
    </url>
</urlset>
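
The same checks can be scripted. Below is a minimal Python sketch, assuming you have already downloaded the sitemap body as text; `check_sitemap_document` is a hypothetical helper written for this example. It flags malformed XML, a missing or wrong namespace, and `<loc>` values that are not absolute URLs.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VALID_ROOTS = (f"{{{SITEMAP_NS}}}urlset", f"{{{SITEMAP_NS}}}sitemapindex")


def check_sitemap_document(xml_text):
    """Return a list of human-readable problems found in a sitemap document."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag not in VALID_ROOTS:
        problems.append(f"unexpected root element or missing namespace: {root.tag}")

    for element in root.iter():
        # Compare local names so files without the namespace are still inspected.
        if element.tag.rsplit("}", 1)[-1] == "loc":
            value = (element.text or "").strip()
            if not value.startswith(("http://", "https://")):
                problems.append(f"<loc> is not an absolute URL: {value!r}")
    return problems


bad = "<urlset><url><loc>example.com/page</loc></url></urlset>"
for problem in check_sitemap_document(bad):
    print(problem)
```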

Method 2: Google Search Console Sitemaps Report

  1. Open Google Search Console
  2. Go to Sitemaps section (left menu)
  3. Check submitted sitemaps

Check For:

Status: Success ✅
URLs discovered: 1,523
URLs indexed: 1,487

or

Status: Couldn't fetch ❌
Error: 404 Not Found

or

Status: Has errors ⚠️
4 URLs couldn't be indexed

Common Errors:

  • "Couldn't fetch" - Sitemap URL doesn't exist or is blocked
  • "Sitemap is HTML page" - Not an XML file
  • "Parsing error" - Invalid XML syntax
  • "Sitemap contains URLs blocked by robots.txt"
  • "Sitemap is too large"

Method 3: XML Sitemap Validator

  1. Visit XML Sitemap Validator
  2. Enter your sitemap URL
  3. Click Validate

Check Results:

  • Valid XML syntax
  • All URLs accessible
  • No 404 errors
  • Proper protocol (https://)
  • Correct domain

Method 4: Manual Sitemap Review

Common issues to check:

<!-- ❌ HTTP instead of HTTPS -->
<loc>http://example.com/page</loc>

<!-- ❌ Relative URLs -->
<loc>/page</loc>

<!-- ❌ 404 pages in sitemap -->
<loc>https://example.com/deleted-page</loc>

<!-- ❌ Redirecting URLs -->
<loc>https://example.com/old-url</loc> <!-- Redirects to new-url -->

<!-- ❌ Non-canonical URLs -->
<loc>https://example.com/page?tracking=123</loc>
<!-- Canonical is: https://example.com/page -->

<!-- ❌ Blocked URLs -->
<loc>https://example.com/admin/</loc>
<!-- Blocked in robots.txt -->

<!-- ❌ Duplicate URLs -->
<loc>https://example.com/page</loc>
<loc>https://example.com/page</loc> <!-- Listed twice -->

<!-- ❌ Wrong domain -->
<loc>https://staging.example.com/page</loc>
<!-- On production sitemap -->
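
Several of these issues can be detected without making any network requests. The sketch below is one possible approach, assuming you have already extracted the `<loc>` values into a list; `audit_locs` and the issue labels are made up for this example. Detecting 404s, redirects, and robots.txt blocks would additionally require fetching each URL, which is left out here.

```python
from collections import Counter
from urllib.parse import urlsplit


def audit_locs(locs, expected_host):
    """Return (url, issue) pairs for problems visible in the URL text alone."""
    issues = []
    counts = Counter(locs)
    for loc in dict.fromkeys(locs):  # visit each distinct URL once, in order
        parts = urlsplit(loc)
        if not parts.scheme or not parts.netloc:
            issues.append((loc, "relative URL"))
            continue
        if parts.scheme != "https":
            issues.append((loc, "not HTTPS"))
        if parts.netloc.lower() != expected_host:
            issues.append((loc, "wrong domain"))
        if parts.query:
            issues.append((loc, "has query string, may be non-canonical"))
        if counts[loc] > 1:
            issues.append((loc, "duplicate entry"))
    return issues


locs = [
    "http://example.com/page",
    "/page",
    "https://example.com/page?tracking=123",
    "https://example.com/page",
    "https://example.com/page",
    "https://staging.example.com/page",
]
for url, issue in audit_locs(locs, "example.com"):
    print(f"{issue}: {url}")
```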

Method 5: Check Sitemap Size

  1. Download sitemap XML file
  2. Check file size
  3. Count URLs

Limits:

  • Maximum 50,000 URLs per sitemap
  • Maximum 50MB uncompressed
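  • If either limit is exceeded, split the sitemap and use a sitemap index

# Count URLs in sitemap (grep -o counts every match, even when the file is a single line)
curl -s https://example.com/sitemap.xml | grep -o "<loc>" | wc -l

# Check uncompressed file size in bytes (Content-Length may be missing or reflect a compressed body)
curl -s https://example.com/sitemap.xml | wc -c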
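
For a downloaded file, the limits can also be checked in a few lines of Python. This is a sketch under the assumption that the sitemap bytes are already in memory; `check_limits` is a name invented for this example. Gzipped sitemaps are decompressed first because the 50 MB limit applies to the uncompressed size.

```python
import gzip
import re

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB (52,428,800 bytes), measured uncompressed


def check_limits(raw):
    """Report URL count and uncompressed size for sitemap bytes (plain or gzip)."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    # Match <url> or <url ...> but not <urlset>
    url_count = len(re.findall(rb"<url[\s>]", raw))
    return {
        "urls": url_count,
        "bytes": len(raw),
        "needs_split": url_count > MAX_URLS or len(raw) > MAX_BYTES,
    }


sample = (
    b"<urlset><url><loc>https://example.com/</loc></url>"
    b"<url><loc>https://example.com/a</loc></url></urlset>"
)
print(check_limits(sample))
```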

General Fixes

Fix 1: Create Basic XML Sitemap

Simple sitemap structure:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <!-- Homepage -->
    <url>
        <loc>https://example.com/</loc>
        <lastmod>2025-01-15</lastmod>
        <changefreq>daily</changefreq>
        <priority>1.0</priority>
    </url>

    <!-- Important pages -->
    <url>
        <loc>https://example.com/about</loc>
        <lastmod>2025-01-10</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.8</priority>
    </url>

    <url>
        <loc>https://example.com/products</loc>
        <lastmod>2025-01-14</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.9</priority>
    </url>

    <!-- Product pages -->
    <url>
        <loc>https://example.com/products/widget</loc>
        <lastmod>2025-01-12</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.7</priority>
    </url>

    <!-- Blog posts -->
    <url>
        <loc>https://example.com/blog/post-title</loc>
        <lastmod>2025-01-08</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.6</priority>
    </url>
</urlset>

Fix 2: Use Sitemap Index for Large Sites

When you have more than 50,000 URLs, split them across several sitemaps and list those files in a sitemap index:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>https://example.com/sitemap-products.xml</loc>
        <lastmod>2025-01-15</lastmod>
    </sitemap>
    <sitemap>
        <loc>https://example.com/sitemap-blog.xml</loc>
        <lastmod>2025-01-14</lastmod>
    </sitemap>
    <sitemap>
        <loc>https://example.com/sitemap-pages.xml</loc>
        <lastmod>2025-01-10</lastmod>
    </sitemap>
    <sitemap>
        <loc>https://example.com/sitemap-images.xml</loc>
        <lastmod>2025-01-12</lastmod>
    </sitemap>
</sitemapindex>
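
Splitting a long URL list and generating the matching index might look like the Python sketch below. The `sitemap-N.xml` file naming, the `chunk` and `build_index` helpers, and the sample URLs are assumptions for illustration; adapt them to however your site stores its sitemaps.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000


def chunk(urls, size=MAX_URLS):
    """Split a URL list into consecutive groups of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(base_url, chunk_count, lastmod):
    """Build a sitemap index pointing at sitemap-1.xml .. sitemap-N.xml."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for number in range(1, chunk_count + 1):
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url}/sitemap-{number}.xml"
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(index, encoding="unicode")


urls = [f"https://example.com/products/{i}" for i in range(120_000)]
chunks = chunk(urls)
index_xml = build_index("https://example.com", len(chunks), "2025-01-15")
print(len(chunks), "child sitemaps")
```

Each group in `chunks` would then be written to its own file with the same `<urlset>` structure shown earlier.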

Fix 3: Generate Sitemap Dynamically

Node.js/Express example:

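const express = require('express');
const { SitemapStream, streamToPromise } = require('sitemap');

const app = express();

// getProductsFromDB() and getPostsFromDB() are placeholders for your own data-access functions
app.get('/sitemap.xml', async (req, res) => {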
    try {
        const sitemap = new SitemapStream({ hostname: 'https://example.com' });

        // Add homepage
        sitemap.write({ url: '/', changefreq: 'daily', priority: 1.0 });

        // Add products from database
        const products = await getProductsFromDB();
        products.forEach(product => {
            sitemap.write({
                url: `/products/${product.slug}`,
                lastmod: product.updatedAt,
                changefreq: 'weekly',
                priority: 0.8
            });
        });

        // Add blog posts
        const posts = await getPostsFromDB();
        posts.forEach(post => {
            sitemap.write({
                url: `/blog/${post.slug}`,
                lastmod: post.updatedAt,
                changefreq: 'monthly',
                priority: 0.6
            });
        });

        sitemap.end();

        const xml = await streamToPromise(sitemap);
        res.header('Content-Type', 'application/xml');
        res.send(xml.toString());
    } catch (error) {
        res.status(500).send('Error generating sitemap');
    }
});

Fix 4: WordPress Sitemap

Use Yoast SEO or RankMath:

// Yoast generates sitemap automatically at:
// https://example.com/sitemap_index.xml

// RankMath generates at:
// https://example.com/sitemap_index.xml

// Or use WordPress core (5.5+):
// Automatically creates at:
// https://example.com/wp-sitemap.xml

Custom WordPress sitemap:

// functions.php
function generate_custom_sitemap() {
    $posts = get_posts(['numberposts' => -1]);
    $pages = get_pages();

    header('Content-Type: application/xml');
    echo '<?xml version="1.0" encoding="UTF-8"?>';
    echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

    // Homepage
    echo '<url>';
    echo '<loc>' . esc_url(home_url('/')) . '</loc>';
    echo '<lastmod>' . date('Y-m-d') . '</lastmod>';
    echo '<priority>1.0</priority>';
    echo '</url>';

    // Posts (esc_url keeps characters such as & from breaking the XML)
    foreach ($posts as $post) {
        echo '<url>';
        echo '<loc>' . esc_url(get_permalink($post)) . '</loc>';
        echo '<lastmod>' . get_post_modified_time('Y-m-d', false, $post) . '</lastmod>';
        echo '<priority>0.6</priority>';
        echo '</url>';
    }

    // Pages
    foreach ($pages as $page) {
        echo '<url>';
        echo '<loc>' . esc_url(get_permalink($page)) . '</loc>';
        echo '<lastmod>' . get_post_modified_time('Y-m-d', false, $page) . '</lastmod>';
        echo '<priority>0.8</priority>';
        echo '</url>';
    }

    echo '</urlset>';
    exit;
}

// Create virtual sitemap
// After adding this rule, flush rewrite rules once (e.g. re-save Settings > Permalinks)
add_action('init', function() {
    add_rewrite_rule('^sitemap\.xml$', 'index.php?custom_sitemap=1', 'top');
});

add_filter('query_vars', function($vars) {
    $vars[] = 'custom_sitemap';
    return $vars;
});

add_action('template_redirect', function() {
    if (get_query_var('custom_sitemap')) {
        generate_custom_sitemap();
    }
});

Fix 5: Shopify Sitemap

Shopify automatic sitemaps:

# Main sitemap index
https://yourstore.myshopify.com/sitemap.xml

# Component sitemaps (auto-generated)
https://yourstore.myshopify.com/sitemap_products_1.xml
https://yourstore.myshopify.com/sitemap_collections_1.xml
https://yourstore.myshopify.com/sitemap_pages_1.xml
https://yourstore.myshopify.com/sitemap_blogs_1.xml

Fix 6: Clean Up Sitemap

Remove problematic URLs:

<!-- BEFORE: Messy sitemap -->
<urlset>
    <url>
        <loc>http://example.com/page</loc> <!-- HTTP -->
    </url>
    <url>
        <loc>https://example.com/admin/</loc> <!-- Blocked -->
    </url>
    <url>
        <loc>https://example.com/page?sort=price</loc> <!-- Parameter -->
    </url>
    <url>
        <loc>https://example.com/deleted</loc> <!-- 404 -->
    </url>
    <url>
        <loc>https://example.com/page</loc> <!-- Canonical -->
    </url>
</urlset>

<!-- AFTER: Clean sitemap -->
<urlset>
    <url>
        <loc>https://example.com/page</loc> <!-- Only canonical URL -->
        <lastmod>2025-01-15</lastmod>
    </url>
</urlset>
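
If you already have crawl data for each URL, the cleanup can be automated. The following Python sketch assumes a hypothetical `CrawledUrl` record with the status code, declared canonical, and robots/noindex flags filled in by your own crawler; it keeps only URLs that are HTTPS, return 200, are self-canonical, and are allowed to be indexed.

```python
from dataclasses import dataclass


@dataclass
class CrawledUrl:
    url: str
    status: int                      # final HTTP status without following redirects
    canonical: str                   # value of the page's rel="canonical"
    blocked_by_robots: bool = False
    noindex: bool = False


def clean_sitemap_urls(records):
    """Return the URLs that belong in the sitemap, de-duplicated, in input order."""
    kept, seen = [], set()
    for record in records:
        if not record.url.startswith("https://"):
            continue  # HTTP or relative URL
        if record.status != 200:
            continue  # 404s, redirects, server errors
        if record.canonical != record.url:
            continue  # parameter or duplicate variant
        if record.blocked_by_robots or record.noindex:
            continue  # not indexable
        if record.url not in seen:
            seen.add(record.url)
            kept.append(record.url)
    return kept


page = "https://example.com/page"
records = [
    CrawledUrl("http://example.com/page", 301, page),
    CrawledUrl("https://example.com/admin/", 200, "https://example.com/admin/", blocked_by_robots=True),
    CrawledUrl("https://example.com/page?sort=price", 200, page),
    CrawledUrl("https://example.com/deleted", 404, "https://example.com/deleted"),
    CrawledUrl(page, 200, page),
]
print(clean_sitemap_urls(records))
```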

Fix 7: Submit to Search Engines

Google Search Console:

  1. Go to Sitemaps section
  2. Enter sitemap URL: sitemap.xml or sitemap_index.xml
  3. Click Submit
  4. Monitor for errors

Bing Webmaster Tools:

  1. Go to Sitemaps section
  2. Click Submit sitemap
  3. Enter full URL: https://example.com/sitemap.xml
  4. Click Submit

robots.txt reference:

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
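
To confirm that robots.txt both declares the sitemap and does not block it, Python's standard `urllib.robotparser` can be used. This is a small sketch that works on robots.txt text you have already fetched; `inspect_robots` is a helper name chosen for this example, and `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def inspect_robots(robots_txt, sitemap_url, user_agent="Googlebot"):
    """Check whether robots.txt declares the sitemap and allows it to be crawled."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    declared = parser.site_maps() or []  # site_maps() returns None when no Sitemap line exists
    return {
        "declared": sitemap_url in declared,
        "crawlable": parser.can_fetch(user_agent, sitemap_url),
    }


robots_txt = """User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""
print(inspect_robots(robots_txt, "https://example.com/sitemap.xml"))
```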

Fix 8: Add Images to Sitemap

Image sitemap extension (Google now reads only <image:loc>; <image:caption> and <image:title> are deprecated and ignored):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
    <url>
        <loc>https://example.com/products/widget</loc>
        <image:image>
            <image:loc>https://example.com/images/widget-main.jpg</image:loc>
            <image:caption>Blue Widget Product Photo</image:caption>
            <image:title>Blue Widget</image:title>
        </image:image>
        <image:image>
            <image:loc>https://example.com/images/widget-side.jpg</image:loc>
            <image:caption>Blue Widget Side View</image:caption>
        </image:image>
    </url>
</urlset>
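
Generating the image extension programmatically mostly comes down to handling the second namespace correctly. The Python sketch below is one way to do it with the standard library; `build_image_sitemap` and its input mapping of page URL to image URLs are assumptions for this example, and only `<image:loc>` is emitted.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Register prefixes so output uses xmlns="..." and xmlns:image="..." instead of ns0/ns1.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages):
    """Build an image sitemap from a mapping of page URL -> list of image URLs."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


image_xml = build_image_sitemap({
    "https://example.com/products/widget": [
        "https://example.com/images/widget-main.jpg",
        "https://example.com/images/widget-side.jpg",
    ],
})
print(image_xml)
```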

Platform-Specific Guides

Detailed implementation instructions for your specific platform:

Platform      Troubleshooting Guide
Shopify       Shopify Sitemap Guide
WordPress     WordPress Sitemap Guide
Wix           Wix Sitemap Guide
Squarespace   Squarespace Sitemap Guide
Webflow       Webflow Sitemap Guide

Verification

After creating/updating sitemap:

Test 1: Validate XML

  1. Visit sitemap URL directly
  2. Browser should display XML
  3. No syntax errors
  4. All URLs use https://

Test 2: XML Validator

  1. Use XML Sitemap Validator
  2. Enter sitemap URL
  3. Check for errors
  4. Verify all URLs return 200

Test 3: Google Search Console

  1. Submit sitemap
  2. Wait 24-48 hours
  3. Check Sitemaps report
  4. Verify "Success" status
  5. Check indexed vs discovered ratio

Test 4: URL Inspection

  1. Open the URL Inspection tool in Google Search Console
  2. Enter a URL from the sitemap
  3. Confirm Google can discover the URL
  4. Confirm the URL is indexable

Common Mistakes

  1. No sitemap - Create one!
  2. Not submitting to Search Console - Google may not find it
  3. Including 404s - Remove deleted pages
  4. Including redirects - Use final destination URL
  5. Including noindex pages - Don't list pages with noindex tag
  6. Using HTTP - All URLs should be HTTPS
  7. Too many URLs - Split into multiple sitemaps
  8. Not updating - Regenerate when adding/removing pages
  9. Including blocked URLs - Don't list robots.txt blocked pages
  10. No lastmod dates - Help crawlers prioritize
