Common Mistakes the Analyzer Surfaces
These are the patterns curators see most often when running this tool on real public sites.
robots.txt
- A `Disallow: /admin` rule that inadvertently documents the admin path to everyone.
- A sitemap declared in `robots.txt` that 404s or points to a stale domain.
- Contradictory `User-agent: *` and bot-specific rules.
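The first two checks above can be sketched in a few lines. This is a minimal illustration, assuming the `robots.txt` text has already been fetched; the function names and the list of "sensitive" path hints are assumptions for the example, not part of any real analyzer.

```python
# Illustrative robots.txt checks; SENSITIVE_HINTS is an arbitrary example list.
SENSITIVE_HINTS = ("admin", "login", "backup", "private")

def find_sitemaps(robots_txt: str) -> list[str]:
    """Collect every URL declared on a Sitemap: line (to be fetched and
    status-checked separately)."""
    return [
        line.split(":", 1)[1].strip()
        for line in robots_txt.splitlines()
        if line.lower().startswith("sitemap:")
    ]

def flag_sensitive_disallows(robots_txt: str) -> list[str]:
    """Return Disallow paths that look like they document private areas."""
    flagged = []
    for line in robots_txt.splitlines():
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if any(hint in path.lower() for hint in SENSITIVE_HINTS):
                flagged.append(path)
    return flagged
```

Fetching each URL returned by `find_sitemaps` and checking for a 200 status would cover the stale-sitemap case as well.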
Sitemaps
- Stale `lastmod` values that haven't changed in years.
- Inclusion of `noindex` pages: the site is asking Google to crawl pages it has also asked Google not to index.
- Single sitemap files over 50,000 URLs (the protocol limit) or 50 MB uncompressed.
Meta tags
- A canonical pointing at the wrong scheme (`http://` vs. `https://`).
- A `<meta name="robots" content="noindex">` left in place after launch.
- Open Graph images that are missing or smaller than the recommended 1200×630 size.