A complete SEO audit checklist for 2026. Covers technical SEO, on-page, content, schema, Core Web Vitals, and AI search readiness — with 50 actionable checks.
Running an SEO audit without a structured checklist is how teams spend three days on the wrong problems. You fix image alt text while the site has 200 broken internal links and a robots.txt that's blocking Googlebot from half the domain.
This checklist covers 50 specific checks across every layer of SEO — technical foundations, on-page signals, content quality, Core Web Vitals, schema markup, and AI search readiness. Work through it in order: technical issues at the foundation will invalidate everything built on top of them.
Surfaceable's free audit runs 16 of the most critical checks automatically in under two minutes — useful as a starting point before you work through the full list below.
Getting crawled and indexed correctly is the non-negotiable first layer. None of the other checks matter if search engines can't access your content reliably.
1. Robots.txt is live and correctly configured
Fetch yourdomain.com/robots.txt directly. Confirm it exists, that it's not blocking critical paths (CSS, JS, images, or key page directories), and that it references your XML sitemap.
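As a reference point, here is a minimal, healthy robots.txt. The paths are placeholders (the /wp-admin/ pattern assumes a WordPress-style site); adapt them to your own directory structure:

```txt
# Allow all crawlers by default
User-agent: *
# Block only genuinely private paths; never block CSS, JS, or image directories
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# Reference the XML sitemap (must be an absolute URL)
Sitemap: https://yourdomain.com/sitemap.xml
```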
2. XML sitemap exists and is submitted to Search Console
Your sitemap should be at /sitemap.xml or listed in robots.txt. Submit it in Google Search Console and check for errors. Sitemaps should only include canonical, indexable URLs.
3. No important pages are blocked by robots.txt
Use Google Search Console's URL Inspection tool or a crawler to identify any unintentionally blocked pages. Blocking CSS and JS files prevents Google from rendering pages correctly.
4. Crawl budget isn't being wasted on low-value URLs
Faceted navigation, infinite scroll parameters, session IDs, and tracking parameters can multiply crawlable URLs. Use noindex or parameter handling in Search Console to keep crawl budget on pages that matter.
5. No orphaned pages exist
Every page worth indexing should be reachable via internal links. Orphaned pages — those with no internal links pointing to them — are effectively invisible to crawlers even if they're not blocked.
6. Redirect chains are short (maximum two hops)
Each redirect in a chain passes slightly less authority and adds latency. Map your redirects and eliminate chains. A → B → C should become A → C directly.
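Checks 6 and 7 can be run offline against a crawl export. A small sketch (not part of any particular SEO tool's API) that takes a source-to-target redirect map, follows each chain to its final destination, and flags loops:

```python
def resolve_redirects(redirect_map):
    """Follow each redirect chain to its final destination.

    redirect_map: dict of {source_url: target_url} from a crawl export.
    Returns {source_url: (final_url, hops)}; loops come back as (None, None).
    Any entry with hops > 1 is a chain worth collapsing.
    """
    results = {}
    for start in redirect_map:
        seen = {start}
        current, hops = start, 0
        while current in redirect_map:
            current = redirect_map[current]
            hops += 1
            if current in seen:          # redirect loop (check 7)
                results[start] = (None, None)
                break
            seen.add(current)
        else:
            results[start] = (current, hops)
    return results

chain = {"/old": "/interim", "/interim": "/new", "/a": "/b", "/b": "/a"}
resolved = resolve_redirects(chain)
# "/old" reaches "/new" in 2 hops: fix by redirecting "/old" to "/new" directly.
# "/a" and "/b" form a loop and come back as (None, None).
```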
7. No redirect loops
A redirects to B, which redirects back to A. These break pages entirely. Any crawler will surface them quickly.
8. All HTTP pages redirect to HTTPS
Every HTTP URL should return a 301 to its HTTPS equivalent. Check that the redirect goes directly to the canonical HTTPS version, not via intermediate hops.
9. HTTPS certificate is valid and not expiring within 30 days
An expired certificate blocks access and tanks trust signals immediately. Set a calendar reminder or use certificate monitoring.
10. Canonical tags are correctly implemented
Every page should either have a self-referencing canonical or point to the authoritative version. Cross-domain canonicals, paginated pages, and print versions all need explicit handling.
11. Pagination is handled correctly
Paginated sequences should use either rel="next" / rel="prev" (still supported by some engines) or a robust linking structure. The canonical on page 2 of a series should not point to page 1 unless you want page 2 deindexed.
12. Hreflang is implemented correctly for multilingual sites
Missing return tags, incorrect locale codes, and hreflang pointing to non-canonical URLs are the three most common errors. Each language variant must confirm the relationship back to every other variant.
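For illustration, a reciprocal pair might look like this (URLs are hypothetical; locale codes are an ISO 639-1 language code plus an optional ISO 3166-1 region code, so en-GB, never en-UK):

```html
<!-- On https://example.com/en/pricing: list every variant, including itself -->
<link rel="alternate" hreflang="en" href="https://example.com/en/pricing" />
<link rel="alternate" hreflang="de" href="https://example.com/de/preise" />
<link rel="alternate" hreflang="x-default" href="https://example.com/en/pricing" />

<!-- The German page must carry the same full set back (the "return tag") -->
```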
13. 404 pages return a proper 404 status code
Soft 404s — pages that return 200 but display "not found" content — confuse crawlers and waste crawl budget. Your 404 page should return HTTP status 404.
14. Server response times are under 200ms
Slow server response increases time-to-first-byte (TTFB) and can directly affect crawl rate and user experience. Check TTFB with PageSpeed Insights or WebPageTest.
Once you've confirmed the site can be crawled and indexed, on-page signals determine what each page is about and how it should rank.
15. Every page has a unique title tag
Duplicate title tags tell Google that two pages cover the same topic. Every page needs a distinct title reflecting its specific content.
16. Title tags are 50–60 characters
This keeps them within the typical display width in Google's search results. Longer titles get truncated in ways you can't control.
17. Primary keyword appears in the title tag
Front-load the primary keyword where natural. "SEO Audit Checklist 2026" outperforms "A Complete Guide to SEO Auditing in 2026."
18. Meta descriptions are unique and 145–155 characters
Google rewrites them frequently, but a well-written meta description still influences click-through rate — particularly when Google does use it verbatim.
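Checks 16 and 18 are easy to automate against a crawl export. A hedged sketch (the length bands mirror the guidance above, not any Google-published rule):

```python
def check_lengths(pages, title_range=(50, 60), desc_range=(145, 155)):
    """Flag pages whose title or meta description falls outside the target band.

    pages: list of dicts with "url", "title", and "description" keys,
    as you might export from a site crawler.
    Returns a list of (url, field, length) tuples for out-of-range values.
    """
    issues = []
    for page in pages:
        for field, (lo, hi) in (("title", title_range), ("description", desc_range)):
            length = len(page.get(field, ""))
            if not lo <= length <= hi:
                issues.append((page["url"], field, length))
    return issues

pages = [
    {"url": "/a", "title": "SEO Audit Checklist 2026: 50 Checks That Matter Most",
     "description": "x" * 150},
    {"url": "/b", "title": "Home", "description": "x" * 150},
]
check_lengths(pages)  # flags /b: a 4-character title is far too short
```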
19. Each page has a single H1
The H1 signals the primary topic of the page. Multiple H1s create ambiguity. Missing H1s leave a signal gap.
20. H1 includes the target keyword
Not stuffed — naturally incorporated. The H1 and title tag can differ, but they should align on the core topic.
21. Heading hierarchy is logical (H1 → H2 → H3)
Heading structure helps both users and crawlers understand content organisation. Skipping from H1 to H4 or using heading tags for visual styling breaks this.
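Checks 19 and 21 can be verified mechanically. A small sketch that validates a page's heading outline — exactly one H1, and no level skipped on the way down:

```python
def audit_headings(levels):
    """Validate a page's heading outline.

    levels: heading levels in document order, e.g. [1, 2, 3, 2]
    for an H1 followed by H2, H3, H2.
    Returns a list of human-readable problems; empty means the outline is clean.
    """
    problems = []
    h1_count = levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one H1, found {h1_count}")
    for prev, curr in zip(levels, levels[1:]):
        if curr > prev + 1:  # e.g. an H2 followed directly by an H4
            problems.append(f"skipped level: H{prev} -> H{curr}")
    return problems

audit_headings([1, 2, 3, 3, 2])  # clean outline: returns []
audit_headings([1, 2, 4, 1])     # flags a skipped level and a duplicate H1
```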
22. URLs are short, descriptive, and lowercase
/seo-audit-checklist beats /page?id=4821&cat=12. Avoid dynamic parameters, unnecessary stop words, and mixed case in URLs.
23. Images have descriptive alt text
Alt text serves accessibility and provides a text signal for image content. Every meaningful image should have alt text; decorative images should have alt="".
24. Internal links use descriptive anchor text
"Click here" and "read more" tell crawlers nothing. Use anchor text that describes the destination page's topic.
Technical and on-page checks get you into the game. Content quality determines whether you stay there.
25. No significant thin content pages exist
Pages with under 300 words of original content that don't serve a clear purpose (landing pages and contact pages excluded) dilute your site's overall quality signal. Consolidate or expand them.
26. No duplicate content issues
Use a crawler to identify pages with identical or near-identical body content. Resolve with canonicals, 301 redirects, or by making the content genuinely distinct.
27. Content demonstrates first-hand expertise (E-E-A-T)
Google's quality rater guidelines weight Experience, Expertise, Authoritativeness, and Trustworthiness. For YMYL topics (health, finance, legal) this is especially critical. Author bios, credentials, and cited sources all contribute.
28. Author information is present and credible
Named authors with linked profiles, bios showing relevant credentials, and consistent publishing histories all support E-E-A-T signals.
29. Content is regularly updated
Stale content — particularly on topics where freshness matters — signals neglect. Add last-updated dates where accurate, and audit high-traffic pages annually at minimum.
30. No keyword cannibalisation across multiple pages
Two pages targeting the same primary keyword split authority and create ranking instability. Consolidate or differentiate them clearly.
Core Web Vitals are confirmed ranking signals. More importantly, they're direct measures of user experience. Poor scores cost you both.
31. Largest Contentful Paint (LCP) is under 2.5 seconds
LCP measures how long the largest visible element (usually a hero image or heading) takes to render. Causes of poor LCP: slow server response, render-blocking resources, unoptimised images.
32. Interaction to Next Paint (INP) is under 200 milliseconds
INP replaced First Input Delay in 2024 as Google's interactivity metric. It measures the full duration of input interactions throughout a page visit, not just the first one.
33. Cumulative Layout Shift (CLS) is under 0.1
CLS measures visual instability — elements jumping around as the page loads. Common causes: images without declared dimensions, dynamically injected content above existing content, web fonts causing FOIT/FOUT.
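The most common CLS fix is declaring image dimensions so the browser can reserve space before the file arrives. A minimal illustration (file names are placeholders):

```html
<!-- Bad: no dimensions, so content below jumps when the image loads -->
<img src="/hero.webp" alt="Dashboard screenshot" />

<!-- Good: width/height establish the aspect ratio up front;
     CSS can still scale the image responsively -->
<img src="/hero.webp" alt="Dashboard screenshot" width="1200" height="630" />
```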
34. Field data (CrUX) matches lab data
PageSpeed Insights shows both lab data (simulated) and field data (real user measurements via the Chrome User Experience Report). Discrepancies between the two indicate real-world performance problems that lab tests aren't capturing.
35. Mobile and desktop Core Web Vitals are both measured
Google uses mobile-first indexing. Your mobile CWV scores are what matter for rankings, but desktop scores affect users on those devices. Check both in Search Console's Core Web Vitals report.
Structured data helps search engines understand content and unlocks rich results. It's also increasingly important for AI search engines parsing your content.
36. Organisation schema is implemented on the homepage
Declares your brand name, logo, contact information, and social profiles to search engines in machine-readable format.
37. BreadcrumbList schema is on all inner pages
Enables breadcrumb display in SERPs and clarifies site hierarchy to crawlers.
38. Article or BlogPosting schema on editorial content
Includes author, datePublished, dateModified, and headline. Supports rich results and helps AI search engines attribute content correctly.
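A minimal Article example with the fields named above, as JSON-LD (all values are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "SEO Audit Checklist for 2026",
  "datePublished": "2026-01-15",
  "dateModified": "2026-02-01",
  "author": {
    "@type": "Person",
    "name": "Jane Author",
    "url": "https://example.com/authors/jane"
  }
}
```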
39. Product schema is complete for e-commerce pages
Price, availability, SKU, and review aggregation. Missing fields mean missed opportunities for product rich results.
40. FAQ schema is used where genuinely appropriate
FAQ schema can earn additional SERP real estate. Use it only where the content is genuinely structured as questions and answers — not as a tactic to stuff schema onto pages that aren't actually FAQs.
41. Schema is validated with Google's Rich Results Test
Implementation errors are common. Validate every schema type with Google's tool before considering it done.
42. Images are in next-gen formats (WebP or AVIF)
JPEG and PNG remain widely supported, but file sizes are significantly larger than WebP or AVIF equivalents at comparable quality. Serve next-gen formats with appropriate fallbacks.
43. Images are correctly sized for their display dimensions
Serving a 3000px image in a 600px container wastes bandwidth and slows LCP. Resize images to their maximum display size before serving.
44. Images are lazy-loaded below the fold
The loading="lazy" attribute on images that aren't in the initial viewport reduces page weight for the initial load. Do not lazy-load the LCP image — this delays it.
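In markup, the distinction looks like this (file names are placeholders):

```html
<!-- LCP hero image: load eagerly, and consider fetchpriority="high" -->
<img src="/hero.webp" alt="Hero" width="1200" height="630" fetchpriority="high" />

<!-- Below-the-fold images: defer until they approach the viewport -->
<img src="/chart.webp" alt="Traffic chart" width="800" height="450" loading="lazy" />
```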
45. Image file names are descriptive
seo-audit-checklist-2026.webp provides more signal than IMG_4821.jpg. Rename images at upload.
This section is new for 2026 and reflects the reality that a significant share of informational queries are now answered by AI search engines — ChatGPT, Perplexity, Claude, Gemini — rather than traditional web search. Being indexable and well-structured for these systems is a distinct requirement.
46. llms.txt file is present and correctly formatted
The emerging /llms.txt standard provides AI systems with a structured overview of your site — your key pages, content categories, and any usage guidance. Think of it as robots.txt for LLMs. See Surfaceable's llms.txt guide for implementation details.
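Since the standard is still emerging, treat this as a sketch of the commonly proposed format rather than a fixed specification: a Markdown file with an H1 site name, a blockquote summary, then sections of annotated links. All URLs below are placeholders:

```md
# Example Company

> Example Company builds [one-line description of what you do and for whom].

## Key pages

- [Pricing](https://example.com/pricing): Plans and feature comparison
- [Docs](https://example.com/docs): Product documentation and setup guides

## Blog

- [SEO Audit Checklist](https://example.com/blog/seo-audit-checklist): 50-point audit guide
```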
47. AI crawlers are not blocked in robots.txt
Check that GPTBot, ClaudeBot, PerplexityBot, and Google-Extended are not disallowed. Blocking them means your content cannot be used as source material for AI answers.
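To verify, look for (and remove) blocks like the following in robots.txt — unless exclusion is a deliberate policy decision. Note that Google-Extended controls use of your content for AI features, not regular search indexing:

```txt
# These directives keep your content out of AI answers entirely:
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```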
48. Open Graph and meta tags provide clean, complete metadata
AI scrapers — including those used by Perplexity and ChatGPT — pull og:title, og:description, and og:image as part of their content understanding. Clean, accurate OG tags improve how your content is represented in AI-generated summaries.
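A complete minimal set looks like this (all values are placeholders):

```html
<meta property="og:title" content="SEO Audit Checklist for 2026" />
<meta property="og:description" content="50 actionable checks across technical, on-page, content, and AI search readiness." />
<meta property="og:image" content="https://example.com/images/seo-audit-og.webp" />
<meta property="og:url" content="https://example.com/blog/seo-audit-checklist" />
<meta property="og:type" content="article" />
```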
49. Content includes direct, question-answering passages
AI search engines extract specific passages to answer queries. Content structured around direct answers — "What is X?", followed by a clear, self-contained response — is far more likely to be cited than content that buries the answer in narrative prose.
50. AI visibility is being actively monitored
Ranking in Google and appearing in AI-generated answers are now separate metrics. Surfaceable tracks your brand's presence across ChatGPT, Claude, Gemini, and Perplexity — showing which prompts surface your content, and where you're invisible. The free tier covers 5 AI prompts per month, which is enough to establish a baseline.
Not everything on this list carries equal weight. Prioritise in the order this checklist is organised: crawlability and indexation first, then on-page signals, then content quality, Core Web Vitals, schema, and finally AI search readiness. Foundational issues invalidate everything layered on top of them.
For a fast starting point, run Surfaceable's free 16-check audit — it covers the most common critical issues across technical, on-page, and performance checks and gives you a baseline score to work from. Then use this full checklist to work through the rest systematically.
The teams that treat an SEO audit as a one-time exercise fall behind. The ones who run through a structured checklist quarterly — and monitor AI visibility alongside traditional rankings — are the ones who maintain and compound their organic visibility over time.
Try Surfaceable
See how often ChatGPT, Claude, Gemini, and Perplexity mention your brand — and get a full technical SEO audit. Free to start.
Get started free →