Technical SEO·Baz Furby·10 min read

Technical SEO Checklist: 40 Checks for a Crawlable, Indexable Site

A practical technical SEO checklist covering crawlability, indexing, site structure, security, redirects, canonical tags, Core Web Vitals, and schema markup.


Technical SEO has always separated the practitioners who actually move rankings from those who cargo-cult tactics from five years ago. A well-structured content strategy means nothing if Googlebot can't crawl your pages, your canonical tags are conflicting, or your Core Web Vitals are dragging down your rankings before a user has even read a sentence.

This checklist covers 40 checks across the technical layer only. It does not cover content quality, backlink building, or on-page optimisation — that is a separate conversation. What follows is everything you need to ensure search engines can find, crawl, understand, and index your site correctly.

Run through this list against any site you're working on. Surfaceable automates 16 of these checks as part of its free SEO audit, flagging issues with priority severity so you know what to fix first.


1. Crawl and Indexation

The most fundamental question in technical SEO: can search engines reach your pages, and are you telling them what to index?

robots.txt

Check 1 — robots.txt exists and is accessible. Your robots.txt file should return a 200 status at yourdomain.com/robots.txt. A 404 here is not catastrophic (Googlebot assumes nothing is blocked), but it is sloppy, and other crawlers may treat a missing file differently. A 5xx response is worse: Googlebot may pause crawling the site entirely until the file is reachable again.

Check 2 — No critical pages are blocked. Review your Disallow directives carefully. It is surprisingly common to find / disallowed in production — usually the result of a staging config that was accidentally deployed. Also check that CSS and JavaScript files are not blocked; Google needs them to render pages correctly.

Check 3 — robots.txt references your XML sitemap. Adding Sitemap: https://yourdomain.com/sitemap.xml in your robots.txt file helps crawlers discover your sitemap without relying solely on Search Console submission.
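Putting checks 1–3 together, a minimal robots.txt might look like this (the /search path is purely illustrative — block whatever is genuinely low-value on your site):

```text
# Allow all crawlers; block only internal search results (illustrative path)
User-agent: *
Disallow: /search

Sitemap: https://yourdomain.com/sitemap.xml
```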

XML Sitemap

Check 4 — A valid XML sitemap exists. Your sitemap should be accessible, return a 200 status, and conform to the protocol defined at sitemaps.org. Submit it to Google Search Console, which reports parsing errors on submission.

Check 5 — Sitemap only includes indexable URLs. Every URL in your sitemap should return a 200 status and have no noindex directive. Including redirects, noindexed pages, or 404s signals sloppy configuration and wastes crawl budget.
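As a sketch of how an audit script might start, Python's standard library can pull every URL out of a sitemap; in practice you would then fetch each one and verify its status code and robots directives:

```python
import xml.etree.ElementTree as ET

# Sitemaps declare this namespace; ElementTree needs it spelled out.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text: str) -> list[str]:
    """Extract every <loc> URL from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]

sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>"""

print(sitemap_urls(sample))
# ['https://example.com/', 'https://example.com/pricing']
```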

Check 6 — Sitemap is submitted to Google Search Console. Submission does not guarantee indexing, but it gives you a clear view of how many URLs Google has discovered versus indexed, and surfaces errors in the Page indexing report (formerly called the Coverage report).

Crawl Errors

Check 7 — No critical crawl errors in Google Search Console. Check the Page indexing report for pages with 404s, 5xx server errors, or other crawl anomalies. Address any that involve pages you want indexed.

Check 8 — Crawl budget is not being wasted. If your site has tens of thousands of pages, audit what Googlebot is actually spending time on. Faceted navigation, session ID parameters, and infinite scroll can silently consume crawl budget on duplicate or low-value URLs.

noindex Directives

Check 9 — Intended pages are not noindexed. Use a tool to crawl your site and flag any pages with a noindex meta tag or X-Robots-Tag header. Staging environments and CMS default settings frequently leave noindex on categories, tags, author pages, or pagination — content that may be worth indexing.
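A simplified noindex detector illustrates what such a crawler checks — this is a regex sketch for illustration; a production tool should parse the DOM rather than pattern-match HTML:

```python
import re

def is_noindexed(html: str, headers: dict[str, str]) -> bool:
    """True if a page is noindexed via meta robots tag or X-Robots-Tag header."""
    # Header check: X-Robots-Tag may carry several comma-separated directives.
    if "noindex" in headers.get("X-Robots-Tag", "").lower():
        return True
    # Meta tag check: matches e.g. <meta name="robots" content="noindex, follow">.
    # Simplified: assumes name= precedes content= within the tag.
    pattern = r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex'
    return re.search(pattern, html, re.IGNORECASE) is not None

print(is_noindexed('<meta name="robots" content="noindex, follow">', {}))  # True
print(is_noindexed('<meta name="robots" content="index, follow">', {}))    # False
```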

Check 10 — noindex is being used intentionally on the right pages. Tag pages, internal search results, thin content pages, and user account pages typically should be noindexed. Confirm your noindex decisions are deliberate and documented.

Blocked Resources

Check 11 — JavaScript and CSS are crawlable. Blocking render-critical resources in robots.txt prevents Google from fully rendering your pages. Use Google Search Console's URL Inspection tool to check the rendered HTML of key pages and confirm content is visible post-render.


2. Site Architecture

How your site is structured determines how effectively PageRank flows through it and how many clicks it takes Googlebot to reach any given page.

URL Structure

Check 12 — URLs are clean, lowercase, and human-readable. Avoid dynamic query string parameters in canonical URLs wherever possible. Use hyphens rather than underscores. Keep URLs short but descriptive.

Check 13 — URLs are consistent — no trailing slash variation. Pick a convention (trailing slash or no trailing slash) and apply it consistently. Mixing the two creates duplicate content and dilutes link equity.

Check 14 — No unnecessary URL parameters are creating duplicates. Google retired Search Console's URL Parameters tool in 2022, so tracking parameters (?utm_source=, ?ref=, ?fbclid=) should be handled with canonical tags pointing to the clean URL.
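A small normalisation helper illustrates the idea — the parameter list here is an illustrative subset, not an exhaustive one:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Illustrative subset of common tracking parameters.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "ref", "fbclid", "gclid"}

def clean_url(url: str) -> str:
    """Lowercase the host and strip known tracking parameters."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    # Rebuild without the fragment; kept params preserve their order.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path,
                       urlencode(kept), ""))

print(clean_url("https://Example.com/pricing?utm_source=x&plan=pro"))
# https://example.com/pricing?plan=pro
```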

Internal Linking Depth

Check 15 — Key pages are within three clicks of the homepage. Pages buried deeper than three clicks receive less crawl attention and less internal PageRank. Review your crawl data to identify orphaned or deeply buried content.

Check 16 — No orphaned pages exist. Orphaned pages — those with zero internal links pointing to them — are invisible to crawlers unless they appear in your sitemap. Audit for them and add contextual internal links.

Pagination

Check 17 — Paginated content is handled correctly. Google no longer supports rel=prev/next pagination signals, but paginated pages should still be individually indexable (if they contain unique content) or noindexed if they are thin. Avoid canonicalising all paginated pages to page 1, which hides content from Google.

Breadcrumbs

Check 18 — Breadcrumbs are implemented and marked up with BreadcrumbList schema. Breadcrumbs clarify site hierarchy to both users and search engines, and BreadcrumbList structured data enables breadcrumb display in search results.
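For reference, a BreadcrumbList for a blog post two levels deep might look like the following — all names and URLs are placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {"@type": "ListItem", "position": 1, "name": "Home",
     "item": "https://yourdomain.com/"},
    {"@type": "ListItem", "position": 2, "name": "Blog",
     "item": "https://yourdomain.com/blog/"},
    {"@type": "ListItem", "position": 3, "name": "Technical SEO Checklist"}
  ]
}
</script>
```

The final item may omit "item", since it represents the current page.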


3. Redirects

Redirect handling is where many sites haemorrhage link equity silently.

Check 19 — All redirects are 301, not 302. Use 301 (permanent) redirects for any URL change you intend to be permanent. 302s are temporary by definition and Google may not pass full PageRank through them in all cases.

Check 20 — No redirect chains exist. A redirect chain occurs when URL A redirects to URL B which redirects to URL C. Each hop dilutes link equity and slows crawling. Compress chains to a single hop.

Check 21 — No redirect loops exist. A redirects to B, B redirects back to A. These cause crawl errors and browser failures.
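Checks 20 and 21 can be simulated offline once you export your redirect rules into a simple mapping — a sketch against such a map, not a live crawler:

```python
def resolve_redirects(url: str, redirect_map: dict[str, str],
                      max_hops: int = 10) -> tuple[str, int]:
    """Follow a (simulated) redirect map, returning (final_url, hop_count).

    Raises ValueError if a loop is detected or max_hops is exceeded.
    """
    seen = {url}
    hops = 0
    while url in redirect_map:
        url = redirect_map[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError("redirect loop detected")
        seen.add(url)
    return url, hops

chain = {"/old": "/interim", "/interim": "/new"}
print(resolve_redirects("/old", chain))  # ('/new', 2) — a chain to compress
```

Any result with more than one hop is a chain worth compressing; a ValueError flags a loop.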

Check 22 — All internal links point to the final destination URL. Internal links should point directly to live pages, not to URLs that redirect. Update your CMS templates and navigation to link to the final canonical URLs.

Check 23 — No broken internal links (404s). Crawl your site and fix any internal links pointing to 404 pages. These waste crawl budget and create a poor user experience.


4. Duplicate Content and Canonicals

Check 24 — Every page has a self-referencing canonical tag. Even if no duplicate content risk exists, a self-referencing canonical confirms to Google which URL is the preferred version of each page.
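The tag itself is one line in the <head>; the href should be the absolute, final URL of the page it sits on (the path here is a placeholder):

```html
<link rel="canonical" href="https://yourdomain.com/pricing" />
```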

Check 25 — Canonical tags point to the correct, live URL. A canonical pointing to a noindexed page, a redirecting URL, or a 404 is worse than having no canonical at all. Audit your canonicals for broken or conflicting references.

Check 26 — www and non-www are handled correctly. One version should return a 200, the other should 301 redirect to the canonical version. Both cannot be live simultaneously without canonicalisation.

Check 27 — HTTP redirects to HTTPS. All HTTP URLs should permanently redirect to their HTTPS equivalents. Run a crawl starting from http:// to confirm this is in place.
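In nginx, checks 26 and 27 can both be handled with dedicated server blocks — a sketch assuming yourdomain.com (non-www) is the canonical host:

```nginx
# Redirect all HTTP traffic straight to the canonical HTTPS host
server {
    listen 80;
    server_name yourdomain.com www.yourdomain.com;
    return 301 https://yourdomain.com$request_uri;
}

# Redirect HTTPS www to the canonical non-www host
server {
    listen 443 ssl;
    server_name www.yourdomain.com;
    # ssl_certificate / ssl_certificate_key directives omitted for brevity
    return 301 https://yourdomain.com$request_uri;
}
```

Each request reaches the canonical URL in a single hop, avoiding the chains flagged in check 20.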

Check 28 — URL parameters are not creating indexed duplicate content. Use canonical tags to consolidate parameterised variants onto the clean URL. Be cautious with robots.txt Disallow here: it blocks crawling, not indexing, and a blocked URL can still be indexed without its content — canonicals are usually the safer tool.


5. Security

Check 29 — The site uses HTTPS with a valid SSL certificate. An expired or invalid SSL certificate triggers browser warnings and can cause Googlebot to fail page crawls. Check your certificate expiry date.

Check 30 — No mixed content warnings exist. Mixed content occurs when an HTTPS page loads resources (images, scripts, stylesheets) over HTTP. Check for mixed content using browser developer tools or a crawling tool.

Check 31 — Security headers are present. At minimum, your site should return X-Content-Type-Options, X-Frame-Options, and a Content-Security-Policy header. These are not direct ranking factors, but they affect trust signals and reduce attack surface.
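A quick way to audit this is to diff a response's headers against a required list — this sketch also checks Strict-Transport-Security, which is worth adding alongside the three above:

```python
# Required list includes Strict-Transport-Security as a recommended extra.
REQUIRED_HEADERS = (
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Content-Security-Policy",
    "Strict-Transport-Security",
)

def missing_security_headers(headers: dict[str, str]) -> list[str]:
    """Return required security headers absent from a response (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [h for h in REQUIRED_HEADERS if h.lower() not in present]

print(missing_security_headers({"X-Frame-Options": "DENY",
                                "content-security-policy": "default-src 'self'"}))
# ['X-Content-Type-Options', 'Strict-Transport-Security']
```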


6. Core Web Vitals

Core Web Vitals are Google's user experience metrics and a confirmed ranking signal. The three metrics to focus on are LCP, INP, and CLS.

LCP (Largest Contentful Paint)

Check 32 — LCP is under 2.5 seconds. LCP measures how long it takes for the largest visible element (typically a hero image or heading) to render. Common causes of poor LCP: unoptimised hero images, render-blocking JavaScript, slow server response times (TTFB).

Check 33 — LCP resource is preloaded. Add <link rel="preload"> for your LCP image to ensure the browser fetches it as early as possible in the loading sequence.
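A preload hint for a hero image might look like this — the paths and widths are placeholders, and fetchpriority="high" is a complementary hint supported in modern browsers:

```html
<!-- Preload the hero image so the browser fetches it as early as possible -->
<link rel="preload" as="image" href="/images/hero.webp"
      imagesrcset="/images/hero-800.webp 800w, /images/hero-1600.webp 1600w"
      fetchpriority="high">
```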

INP (Interaction to Next Paint)

Check 34 — INP is under 200 milliseconds. INP replaced First Input Delay as of March 2024. It measures the latency of all user interactions throughout the page lifecycle, not just the first. Long JavaScript tasks, excessive event listeners, and heavy third-party scripts are the primary culprits.

CLS (Cumulative Layout Shift)

Check 35 — CLS is under 0.1. CLS measures unexpected layout shifts during page load. Common causes: images without defined width and height attributes, ads injected above existing content, web fonts causing text reflow (FOUT).
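The most common CLS fix is simply declaring intrinsic dimensions, so the browser reserves space before the image arrives (values here are placeholders):

```html
<!-- Explicit width/height let the browser reserve space during layout -->
<img src="/images/chart.png" alt="Organic traffic chart" width="800" height="450">
```

The image can still be styled responsively with CSS; the attributes only fix its aspect ratio for layout.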


7. International SEO (hreflang)

Skip this section if your site serves a single language and region only.

Check 36 — hreflang tags are implemented correctly. hreflang signals which language and regional version of a page to serve to which users. Each hreflang tag must include a reciprocal tag on the corresponding page — without it, Google ignores the tag entirely.

Check 37 — hreflang uses correct language and region codes. Use ISO 639-1 language codes (en, fr, de), optionally followed by an ISO 3166-1 alpha-2 region code (en-GB, en-US). Errors here — such as the non-existent en-UK — are extremely common and render hreflang ineffective.
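A complete reciprocal set for a three-locale site might look like this, repeated on every page in the cluster, with an x-default for unmatched visitors (URLs are placeholders):

```html
<link rel="alternate" hreflang="en-GB" href="https://yourdomain.com/uk/" />
<link rel="alternate" hreflang="en-US" href="https://yourdomain.com/us/" />
<link rel="alternate" hreflang="fr" href="https://yourdomain.com/fr/" />
<link rel="alternate" hreflang="x-default" href="https://yourdomain.com/" />
```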


8. Schema Markup

Structured data helps search engines understand your content and unlocks rich result features in SERPs.

Check 38 — Organisation schema is present on the homepage. Organisation schema should include your business name, logo, URL, and social profiles. This contributes to Knowledge Panel eligibility and improves entity disambiguation.
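A minimal Organisation block might look like this — all values are placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Company",
  "url": "https://yourdomain.com",
  "logo": "https://yourdomain.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/yourcompany",
    "https://x.com/yourcompany"
  ]
}
</script>
```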

Check 39 — WebSite schema is present on the homepage. WebSite schema helps Google establish your preferred site name in search results. Note that Google retired the SearchAction-powered sitelinks search box in late 2024, so a SearchAction is no longer a reason to add this markup on its own — but the WebSite type itself remains worthwhile.

Check 40 — BreadcrumbList schema matches your breadcrumb navigation. If you are implementing breadcrumbs (check 18), ensure the BreadcrumbList structured data matches the visible breadcrumb text and URLs exactly. Google validates against the visible page content.


Running These Checks Efficiently

Working through 40 checks manually is time-consuming. Surfaceable runs 16 core technical SEO checks automatically as part of its free site audit, covering canonicals, indexation signals, sitemap validity, redirect handling, and HTTPS status. For Core Web Vitals, use Google's PageSpeed Insights or CrUX data in Search Console directly.

Prioritise fixes in this order: anything that prevents indexation comes first (noindex errors, robots.txt blocks), then redirect chains and broken links, then performance. Structural and schema improvements have compounding value but rarely require the same urgency as crawlability issues.

The goal of this checklist is not to achieve perfection across all 40 points simultaneously — it is to identify which issues are actively costing you rankings and fix those first.


Try Surfaceable

Track your brand's AI visibility

See how often ChatGPT, Claude, Gemini, and Perplexity mention your brand — and get a full technical SEO audit. Free to start.

Get started free →