Key takeaways:

  1. Download the template (Google Sheet format) in a language of your choosing
  2. Tools you will need
  3. Things you should know before you begin
  4. Jump to the instructions

The goal of technical SEO is to remove all possible barriers that make it difficult for search engines to discover the pages, images, and other files that you want shown on search engine results pages.

But this is not always realistic. For example:

  • clients have budget constraints
  • correcting legacy issues is cost-prohibitive and requires buy-in from stakeholders who are not involved in the current remit
  • inadequate development or engineering resources.

Therefore, your job is to (i) identify all possible issues, then (ii) highlight the mission-critical technical SEO issues – and this checklist will allow you to do so.

Technical SEO checklist and its criteria

  • You do not need to check every criterion
  • Depending on the purpose of your audit, what you hope to achieve, how much time you have, and what the final deliverable looks like, you may focus on a selection of criteria related to these variables
  • While I created this checklist for auditing Wordpress builds, you can use it for any CMS or platform
  • Anyone can follow the instructions and I encourage beginners and SEO juniors/executives to give this a go
  • I share a lot of technical SEO tips on my website
  • Alternatively, learningseo.io and ContentKing Academy are recommended sources of SEO truth

Instructions

  1. Choose your preferred language – the technical SEO checklist is available in Arabic, Spanish, Hindi, English, and Indonesian.
  2. You will be prompted to make a copy of the Google Sheet.
  3. Depending on why you’re carrying out this technical SEO audit, navigate to the appropriate sections:
  4. Click on the corresponding link to access step-by-step instructions

    Note: doing so will take you to a section on this webpage
  5. Click on the toggle to select an appropriate response based on the instructions provided

    Note: doing so will trigger an IFS formula in the spreadsheet. This will give you a high-level recommendation on whether you should dive deeper or move on to another issue.

>> Crawling-related

Crawling refers to the discovery process that a search engine must go through in order to find all the URLs on a website. A search engine such as Google must be able to crawl a website before it can index it.

Therefore, even if you suspect that there are indexing issues, start by removing possible crawling barriers first.

Is there a robots.txt file?

How to check if a website has a robots.txt file:

  1. Append “robots.txt” to your root domain (e.g., danielkcheung.com/robots.txt)
  2. If the page loads in your browser, a robots.txt file exists
    • In this instance, mark ‘yes’ in the checklist
  3. If you get a 404 error, a robots.txt file does not exist
    • In this instance, mark ‘no’ in the checklist
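If you prefer to script this check (for example, across a batch of client sites), below is a minimal Python sketch using only the standard library. The example.com domain is a placeholder.

  import urllib.error
  import urllib.request

  def has_robots_txt(root_domain: str) -> bool:
      """Return True if /robots.txt responds with HTTP 200."""
      url = root_domain.rstrip("/") + "/robots.txt"
      try:
          with urllib.request.urlopen(url, timeout=10) as response:
              return response.status == 200
      except urllib.error.URLError:
          return False  # 404, connection error, etc. - treat as 'no robots.txt'

  print(has_robots_txt("https://example.com"))  # True -> mark 'yes', False -> mark 'no'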

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, review if any crawlers and search engines have been blocked.

>> If ‘no’, consider creating and uploading a robots.txt file especially if you suspect Google is having difficulty discovering your URLs.

Recommended reading:

Are crawlers or search engines being blocked in the robots.txt file?

How to check if robots.txt is blocking Google using Search Console:

Note: the following does not work for Domain-verified GSC properties – if you don’t have URL-prefix properties, use the manual eyeball test

  1. Go to Google’s Robots Testing Tool
  2. Scroll through the robots.txt code to locate any highlighted syntax warnings and logic errors
  3. Type in the URL of a page on your site in the text box at the bottom of the page
  4. Select the user-agent you want to simulate in the dropdown list to the right of the text box
  5. Click the TEST button to test access
  6. Check to see if TEST button now reads ACCEPTED or BLOCKED to find out if the URL you entered is blocked from Google web crawlers
  7. Edit the file on the page and retest as necessary.
  8. Copy your changes to your robots.txt file on your site.

How to check if robots.txt is blocking any search engine manually:

  1. Open the robots.txt file in your browser
  2. See if any of the following instances exist in the file:
    • User-agent: Googlebot
      Disallow: /
    • User-agent: Bingbot
      Disallow: /
    • User-agent: *
      Disallow: /
  3. If you see any of the above instances, mark ‘yes’ in the checklist
  4. If you do not see any user-agents being blocked, mark ‘no’ in the checklist.
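As an alternative to eyeballing the file, Python’s built-in robots.txt parser can run the same check programmatically. This is a minimal sketch; the domain is a placeholder and the user-agent list only covers the two crawlers named above plus the wildcard.

  from urllib.robotparser import RobotFileParser

  parser = RobotFileParser("https://example.com/robots.txt")
  parser.read()  # downloads and parses the live robots.txt file

  for user_agent in ("Googlebot", "Bingbot", "*"):
      allowed = parser.can_fetch(user_agent, "https://example.com/")
      print(f"{user_agent}: {'allowed' if allowed else 'BLOCKED'}")
  # Any BLOCKED result corresponds to marking 'yes' in the checklist.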

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, one or more crawlers or search engines are being blocked. This is not necessarily wrong and your next step is to understand what is being blocked and why.

>> If ‘no’, the robots.txt is not causing any crawling issues.

Recommended reading:

Are paginated URLs being blocked in robots.txt?

Note: Some SEO plugins offer the option to block pagination series from being crawled by search engines.

How to check if paginated series are being blocked in robots.txt:

  1. Open the robots.txt file in your web browser
  2. See if any of the following instances exist in the file:
    • Disallow: /blog-page/page
    • Disallow: /?page=
    • Disallow: /&page=
  3. If you see any of these in the robots.txt file, check ‘yes’ in the spreadsheet; otherwise, check ‘no’.

What this means:

> If ‘yes’, verify with the site owner why this has happened. Paginated series should not usually be blocked in robots.txt although there are specific scenarios where this may be done.

>> If ‘no’, the robots.txt is not causing any crawling or indexing issues in relation to deeper URLs found on paginated series.

Recommended reading:

Are JavaScript files being blocked in robots.txt?

How to find out if JS is being blocked in robots.txt using Screaming Frog:

  1. Go to CONFIGURATION > SPIDER > RENDERING and change rendering from “text-only” to “JavaScript”
  2. Change AJAX TIMEOUT from 5-seconds to 8-seconds then click the OK button
  3. Crawl the entire site, a section within the website, or a sample of the website
  4. Select any HTML file that has been crawled and navigate to the RENDERED PAGE tab
  5. See if the content of the page has loaded
    • If you do not see the page loaded correctly in the RENDERED PAGE tab, proceed to the next step
    • If you see the page load correctly, still proceed to the next step to verify whether JS files are blocked in the robots.txt file
  6. Go to BULK EXPORT > RESPONSE CODES > BLOCKED BY ROBOTS.TXT INLINKS, open the .csv file and scan for JS files that have been blocked in robots.txt

How to find out if JS is being blocked in robots.txt manually:

Note: For most Wordpress websites, JavaScript comes from the theme and is typically housed in ../wp-content/themes/yourTheme. Therefore, load the robots.txt file in a web browser and see if /wp-content/ appears in a Disallow rule.
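If you want to verify this beyond eyeballing the file, the same standard-library parser can test a specific JS asset. This is a hedged sketch: the theme path below is a made-up placeholder, so swap in a real script URL taken from the page source of the site you are auditing.

  from urllib.robotparser import RobotFileParser

  parser = RobotFileParser("https://example.com/robots.txt")
  parser.read()

  # Placeholder path - use an actual theme/plugin JS file from the site's source code
  js_asset = "https://example.com/wp-content/themes/yourTheme/assets/main.js"

  if parser.can_fetch("Googlebot", js_asset):
      print("JS asset is crawlable - check 'no' in the spreadsheet")
  else:
      print("JS asset is blocked by robots.txt - check 'yes' in the spreadsheet")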

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, JS is being blocked and this may be making it harder for Google to render the content on the pages you want to rank.

>> If ‘no’, JS files are not being blocked from crawlers in the robots.txt file. If the website is experiencing indexing issues, this is not the cause.

Recommended reading:

Does the robots.txt file disallow parameterised URLs?

How to check if parameterised URLs are blocked in robots.txt:

  1. Open the robots.txt file in your web browser
  2. See if any of the following instances exist in the file:
    • User-agent: *
      Disallow: /products/t-shirts?
    • User-agent: *
      Disallow: /products/jackets?
  3. If you see any of these, check ‘yes’ in the spreadsheet; otherwise, check ‘no’.
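Because parameter patterns vary from site to site, a quick way to surface them is to scan every Disallow rule for "?", "=" or "&" characters. A minimal Python sketch (the domain is a placeholder):

  import urllib.request

  with urllib.request.urlopen("https://example.com/robots.txt", timeout=10) as response:
      lines = response.read().decode("utf-8", errors="replace").splitlines()

  for line in lines:
      rule = line.split("#", 1)[0].strip()  # drop trailing comments
      if rule.lower().startswith("disallow:") and any(c in rule for c in "?=&"):
          print("Possible parameter block:", rule)
  # Any output here suggests parameterised URLs are being disallowed - mark 'yes' and investigate.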

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, one or more parameterised URLs are blocked in robots.txt. In most use cases, parameterised URLs should not be disallowed in the robots.txt file because Googlebot cannot read the rel-canonical of a URL that it has been blocked from crawling.

>> If ‘no’, there is another reason why the website is experiencing indexing issues.

Recommended reading:

Are there any host status problems reported in Search Console crawl stats?

How to check if there are crawl stat problems in GSC:

  1. Log into Google Search Console
  2. On the left hand side navigation, click on SETTINGS
  3. Then click on OPEN REPORT in the Crawl stats section
  4. In the Hosts dashboard, check the Status column:
    • If you see a green tick icon, mark ‘no’ in the checklist
    • If you see a red exclamation icon, mark ‘yes’ in the checklist.

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, Google encountered a problem with your server or web host in the last 7 days. If the crawler consistently encounters 4XX and 5XX response codes for a long period of time, your URLs can drop out of Google’s index.

>> If ‘no’, Google didn’t encounter any significant crawl availability issues on your site in the past 90 days.

Recommended reading:

Are there crawl requests being made as reported in Search Console crawl stats?

How to find out if Google has made crawl requests in the last 90 days in GSC:

  1. Log into Google Search Console
  2. On the left hand side navigation, click on SETTINGS
  3. Click on OPEN REPORT in the Crawl stats section
  4. Look at TOTAL CRAWL REQUESTS
    • If the number > 0, check ‘yes’ in the spreadsheet
    • If the number = 0, check ‘no’ in the spreadsheet

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, Google crawled the website in the last 90 days. If you think there are crawling issues, this data point indicates that the issue (or issues) is on your end and not due to Google’s inability to crawl the website.

>> If ‘no’, Google did not crawl any URLs in the last 90 days. This is a symptom that something crawling-related requires further investigation. I recommend checking if crawlers or assets are being blocked in robots.txt and if a sitemap has been submitted to Search Console.

Recommended reading:

Are there crawl requests being made for “refresh” purposes in Search Console crawl stats?

How to find out if Google has made crawl requests for refresh reasons in GSC:

  1. Log into Google Search Console
  2. On the left hand side navigation, click on SETTINGS
  3. Click on OPEN REPORT in the Crawl stats section
  4. Scroll down until you see BY PURPOSE
    • If you see REFRESH followed by a % greater than zero, check ‘yes’ in the spreadsheet
    • If you see a 0% next to REFRESH, check ‘no’ in the spreadsheet

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, Google crawled the website in the last 90 days and recrawled known URLs. If you think there are crawling issues, this data point indicates that the issue (or issues) is on your end and not due to Google’s inability to crawl the website.

>> If ‘no’, Google did not recrawl any URLs in the last 90 days. This is a symptom that something crawling-related requires further investigation. I recommend checking if crawlers or assets are being blocked in robots.txt and if a sitemap has been submitted to Search Console.

Recommended reading:

Are there crawl requests being made for “discovery” purposes in Search Console crawl stats?

How to find out if Google has tried to discover new URLs via GSC:

  1. Log into Google Search Console
  2. On the left hand side navigation, click on SETTINGS
  3. Click on OPEN REPORT in the Crawl stats section
  4. Scroll down until you see BY PURPOSE
    • If you see DISCOVERY followed by a % greater than zero, check ‘yes’ in the spreadsheet
    • If you see a 0% next to DISCOVERY, check ‘no’ in the spreadsheet.

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, Google has crawled new URLs in the last 90 days.

>> If ‘no’, Google did not discover any new URLs in the last 90 days. This is a symptom that something crawling-related requires further investigation. I recommend checking if crawlers or assets are being blocked in robots.txt and if a sitemap has been submitted to Search Console.

Recommended reading:

According to Search Console crawl stats, have HTML files been crawled in the last 72-hours?

How to find out if Google has crawled HTML files in GSC:

  1. Log into Google Search Console
  2. On the left hand side navigation, click on SETTINGS
  3. Click on OPEN REPORT in the Crawl stats section
  4. Scroll down until you see BY FILE TYPE
    • If you see HTML followed by a % greater than zero, check ‘yes’ in the spreadsheet
    • If you see a 0% next to HTML, check ‘no’ in the spreadsheet.

Possible answers:

  • yes
  • no

What this means:

> If ‘yes’, Google has crawled HTML files on the website in the last 72 hours. This indicates that Google has been able to crawl some sections of the website.

>> If ‘no’, Google has not crawled any HTML files on the website in the last 72 hours. This is a significant red flag and warrants further investigation as to why it was unable to do so. For example, review robots.txt, site-wide meta tags, and server status.

Does the website have a sitemap?

How to find if a website has a sitemap:

  • Append “sitemap.xml” or “sitemap_index.xml” to the root domain (e.g., danielkcheung.com/sitemap.xml)
  • Check if robots.txt references the sitemap
  • Look in the primary and footer navigation menus for a HTML sitemap
  • Check in Google Search Console.
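The sketch below automates the first two of these checks: it reads any Sitemap: lines declared in robots.txt and then probes the two common Wordpress sitemap paths. The domain is a placeholder and the list of paths is not exhaustive.

  import urllib.error
  import urllib.request

  DOMAIN = "https://example.com"

  def url_exists(url: str) -> bool:
      try:
          with urllib.request.urlopen(url, timeout=10) as response:
              return response.status == 200
      except urllib.error.URLError:
          return False

  # 1. Sitemap references declared in robots.txt
  try:
      with urllib.request.urlopen(DOMAIN + "/robots.txt", timeout=10) as response:
          for line in response.read().decode("utf-8", errors="replace").splitlines():
              if line.lower().startswith("sitemap:"):
                  print("Declared in robots.txt:", line.split(":", 1)[1].strip())
  except urllib.error.URLError:
      pass  # no robots.txt - fall through to probing common paths

  # 2. Common sitemap locations
  for path in ("/sitemap.xml", "/sitemap_index.xml"):
      if url_exists(DOMAIN + path):
          print("Found:", DOMAIN + path)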

Possible answers:

  • yes
  • no

What this means:

> If ‘yes’, a sitemap exists. This is a good start but warrants further investigation.

>> If ‘no’, consider creating one or more sitemaps and submitting them to Search Console and Bing Webmaster.

Recommended reading:

Is the sitemap XML or HTML?

How to work out if a website’s sitemap is HTML or XML:

  1. Does the sitemap end in “.XML”?
    • If it does, check ‘yes’ in the spreadsheet
    • If it does not end in “.XML” and follows a subfolder path such as “domain/sitemap”, check ‘no’ in the spreadsheet.
  2. Does the sitemap end in “.GZ” extension?
    • If it does, check ‘yes’ in the spreadsheet as this is a compressed XML sitemap
    • If not, it is not a compressed XML sitemap.

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, an XML sitemap exists.

>> If ‘no’, the website uses a HTML sitemap. This will make it more difficult for you to analyse as crawlers such as Screaming Frog and Sitebulb cannot crawl HTML sitemaps.

>>> If ‘not applicable’, no HTML or XML sitemap exists. If the website has more than 500 indexable URLs, consider creating a sitemap to improve Google’s ability to crawl all internal URLs.

Is the sitemap index URL(s) referenced in the robots.txt file?

How to check if robots.txt mentions the sitemap address:

  1. Open the robots.txt file in your browser
  2. Search for “sitemap” and see if a full URL has been provided
    • If you see this, check ‘yes’ in the spreadsheet
    • If you do not find “sitemap” in the robots.txt, check ‘no’ in the spreadsheet
    • If there is no robots.txt file, check ‘not applicable’ in the spreadsheet

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, the sitemap is referenced in robots.txt and crawlers such as Googlebot can easily find and crawl it.

>> If ‘no’, the sitemap is not referenced in robots.txt. As long as the sitemap has been submitted to Google Search Console, this is not a red flag. If no sitemap has been submitted to GSC, this should be something you address ASAP if crawling and indexing is an issue.

How many URLs are in the sitemap?

How to find out how many URLs are in a sitemap using Screaming Frog:

  1. Change mode from Spider to List
  2. Click on UPLOAD button and select DOWNLOAD FROM URL
  3. Copy and paste the full URL of the XML sitemap
  4. Click OK
  5. Screaming Frog will analyse the sitemap and report the number of URLs found in the XML file.
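If you do not have a Screaming Frog licence handy, a short Python script can produce the same count by parsing the sitemap XML directly. This is a minimal sketch: it follows a sitemap index one level deep, does not handle compressed .gz sitemaps, and the sitemap URL is a placeholder.

  import urllib.request
  import xml.etree.ElementTree as ET

  NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

  def fetch_xml(url: str) -> ET.Element:
      with urllib.request.urlopen(url, timeout=10) as response:
          return ET.fromstring(response.read())

  def count_sitemap_urls(sitemap_url: str) -> int:
      root = fetch_xml(sitemap_url)
      if root.tag == NS + "sitemapindex":  # sitemap index: sum the child sitemaps
          return sum(count_sitemap_urls(loc.text.strip()) for loc in root.iter(NS + "loc"))
      return len(root.findall(NS + "url"))  # regular sitemap: count <url> entries

  print(count_sitemap_urls("https://example.com/sitemap_index.xml"))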

Are there more than 1000 URLs in a sitemap?

Refer to the previous criterion to find out how many URLs are contained in a sitemap.

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, consider splitting large sitemaps into smaller ones if you are experiencing crawling and indexing issues.

>> If ‘no’, the website is relatively small and probably does not warrant more than one sitemap.

Recommended reading:

Are non-indexable URLs in the sitemap?

How to identify if non-indexable URLs are in a sitemap using Screaming Frog:

  1. Copy the full address of the sitemap file (e.g., domain/sitemap.xml)
  2. In Screaming Frog, go to MODE > LIST
  3. Click UPLOAD button and select DOWNLOAD XML SITEMAP
  4. Paste the full address of the sitemap from step #1 then click OK button

    Note: it will take Screaming Frog a few seconds to extract all URLs contained in one or more sitemap files
  5. Click OK button and this will initiate Screaming Frog to crawl all the URLs discovered from the sitemap
  6. Upon crawl completion, go to SITEMAP tab

    Note: if you do not see SITEMAP tab, click on the toggle to reveal a list of tabs
  7. In OVERVIEW tab, you will see the total number of non-indexable URLs that have been included in the sitemap:
    • If the number of non-indexable URLs in the sitemap > 0, check ‘yes’ in the spreadsheet
    • If the number of non-indexable URLs in the sitemap = 0, check ‘no’ in the spreadsheet
  8. You can also extract a full list of all the non-indexable URLs that are currently in the sitemap.
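For a quick spot check without Screaming Frog, you can fetch a handful of sitemap URLs and flag anything that redirects, errors, or carries a noindex directive. A minimal sketch, with placeholder URLs in place of the list you would pull from the sitemap:

  import re
  import urllib.error
  import urllib.request

  def check_indexability(url: str) -> str:
      request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
      try:
          with urllib.request.urlopen(request, timeout=10) as response:
              if response.url.rstrip("/") != url.rstrip("/"):
                  return f"non-indexable (redirects to {response.url})"
              if "noindex" in response.headers.get("X-Robots-Tag", "").lower():
                  return "non-indexable (X-Robots-Tag: noindex)"
              html = response.read(200_000).decode("utf-8", errors="replace").lower()
              meta = re.search(r'<meta[^>]+name=["\']robots["\'][^>]*>', html)
              if meta and "noindex" in meta.group(0):
                  return "non-indexable (meta robots noindex)"
              return "indexable (200)"
      except urllib.error.HTTPError as error:
          return f"non-indexable ({error.code})"

  for url in ("https://example.com/", "https://example.com/old-page/"):
      print(url, "->", check_indexability(url))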

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, this means that there are non-indexable URLs in the sitemap(s). If you see a high number of 301 and 404 URLs in the sitemap, investigate how the sitemap is generated. For most Wordpress sites with Yoast, sitemaps are generated and updated automatically.

>> If ‘no’, only indexable URLs are found in the sitemap. This is usually a good thing and indicates that any crawling or indexing issues the site is experiencing are not due to sitemap inefficiencies.

Recommended reading:

Are there URLs that are blocked in robots.txt that are found in the sitemap?

How to check if URLs blocked in robots.txt are included in a sitemap using Screaming Frog:

  1. Follow steps #1-8 in ‘how to identify if non-indexable URLs are in a sitemap using Screaming Frog’
  2. Navigate to the SITEMAPS tab and look at the INDEXABILITY STATUS:
    • If you see “blocked by robots.txt”, check ‘yes’ in the spreadsheet
    • If you see “canonicalised”, “noindex” or if there are no URLs shown in the tab, check ‘no’ in the spreadsheet

Possible answers:

  • yes
  • no
  • not applicable

What this means:

Note: Google can index URLs that have been blocked in robots.txt – you will see these in Search Console’s COVERAGE tab as “Valid with warning”.

> If ‘yes’, there are blocked URLs contained in the sitemap. Investigate whether these URLs should be blocked in robots.txt or whether or not these URLs should be excluded from the sitemap.

>> If ‘no’, there are no URLs blocked in robots.txt found in the sitemap.

Have sitemaps been submitted to Google Search Console?

How to check if sitemaps have been submitted to GSC:

  1. Log into Google Search Console
  2. Go to INDEX > SITEMAPS

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, sitemaps have been submitted to Search Console. Now check if they have been processed.

>> If ‘no’ and if a sitemap does exist, manually submit the sitemap URL to GSC.

Recommended reading:

Has Google Search Console processed the submitted sitemap(s)?

How to confirm if submitted sitemaps have been processed by GSC:

  1. Log into Google Search Console
  2. Go to INDEX > SITEMAPS
  3. Look at the STATUS column
    • If you see “Success” and DISCOVERED URLS > 0, check ‘yes’ in the spreadsheet
    • If you see “error”, check ‘no’ in the spreadsheet
    • If you see “processing”, check ‘no’ in the spreadsheet

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, the URLs in the sitemap have been processed by GSC.

>> If ‘no’, GSC has encountered an error with crawling the URLs in the submitted sitemap file or it has not yet processed the file.

Recommended reading:

Can internal search URLs be crawled?

How to check if internal search URLs can be crawled using SEO Pro Extension:

  1. In a web browser, append “/?s={query}” to the domain name (e.g., danielkcheung.com.au/?s=seo)

    Note: this will mimic any internal search for a Wordpress site
  2. Run SEO Pro Extension once the page loads
  3. The internal search page should return a noindex result
    • If you see noindex, check ‘no’ in the spreadsheet
    • If you see index, check ‘yes’ in the spreadsheet
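If you do not use the SEO Pro Extension, the same noindex check can be scripted. A minimal sketch; the domain and search query are placeholders, and the meta tag matching is deliberately simple:

  import re
  import urllib.request

  search_url = "https://example.com/?s=seo"  # mimics a Wordpress internal search URL
  request = urllib.request.Request(search_url, headers={"User-Agent": "Mozilla/5.0"})
  with urllib.request.urlopen(request, timeout=10) as response:
      x_robots = response.headers.get("X-Robots-Tag", "").lower()
      html = response.read(200_000).decode("utf-8", errors="replace").lower()

  meta = re.search(r'<meta[^>]+name=["\']robots["\'][^>]*>', html)
  if "noindex" in x_robots or (meta and "noindex" in meta.group(0)):
      print("noindex found - check 'no' in the spreadsheet")
  else:
      print("no noindex directive found - check 'yes' in the spreadsheet")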

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, internal search result pages are indexable. For most Wordpress sites, this is not recommended as internal result pages don’t offer users a good user experience. This is because there are an infinite number of URLs that can be generated by this. If you encounter this situation, consult with your client to understand why internal search pages are indexable as this will have an impact on crawl budget.

>> If ‘no’, internal search result pages are non-indexable. In most use cases, this configuration is recommended. FYI, plugins such as Yoast usually handle this. Alternatively, robots.txt may be blocking this query string.

Recommended reading:

Are any redirects done via JavaScript?

How to check if redirects are done via JS using Screaming Frog:

  1. Go to CONFIGURATION > SPIDER > RENDERING and change rendering from “text-only” to “JavaScript”
  2. Change AJAX TIMEOUT from 5-seconds to 8-seconds then click the OK button
  3. Crawl the entire site, a section within the website, or a sample of the website
  4. Once done, go to OVERVIEW tab, scroll down to RESPONSE CODE section and look at the “Redirection (JavaScript)” row:
    • If you see a number > 0, check ‘yes’ in the spreadsheet
    • If you see 0, check ‘no’ in the spreadsheet

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, one or more redirects are done via JavaScript. This means that the redirect happens client-side and, in most use cases, a server-side 301/302 redirect is preferred. Clarify with the client why this is the case because there are easier and better ways to implement redirects on a Wordpress build.

>> If ‘no’, no redirects are done via JavaScript – which is to be expected for 99% of Wordpress websites.

Recommended reading:

Are redirects being done client-side via meta refresh?

How to check if meta refresh redirects exist via Screaming Frog:

  1. Go to CONFIGURATION > SPIDER > RENDERING and change rendering from “text-only” to “JavaScript”
  2. Change AJAX TIMEOUT from 5-seconds to 8-seconds then click the OK button
  3. Crawl the entire site, a section within the website, or a sample of the website
  4. Once done, go to OVERVIEW tab, scroll down to RESPONSE CODE section and look at the “Redirection (Meta Refresh)” row:
    • If you see a number > 0, check ‘yes’ in the spreadsheet
    • If you see 0, check ‘no’ in the spreadsheet

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, one or more redirects are done via meta refresh.

>> If ‘no’, no redirects are done via meta refresh – which is to be expected for 99% of Wordpress websites.

Recommended reading:

Does the website have pagination?

How to check if pagination exists:

  1. In a web browser, navigate to a blog or product category page
  2. Scroll down to the bottom of the page and look for “show more” or “read more” or a series of numbered page links:
    • If you see this, check ‘yes’ in the spreadsheet
    • If you do not see this, check ‘no’ in the spreadsheet

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, there are paginated URLs and this warrants further investigation into how canonicals and indexation are managed.

>> If ‘no’, the website has no pagination.

Have paginated URLs been included in the sitemap(s)?

How to check if paginated URLs are in a sitemap using Screaming Frog:

  1. Copy the full address of the sitemap file (e.g., domain/sitemap.xml)
  2. In Screaming Frog, go to MODE > LIST
  3. Click UPLOAD button and select DOWNLOAD XML SITEMAP
  4. Paste the full address of the sitemap from step #1 then click OK button

    Note: it will take Screaming Frog a few seconds to extract all URLs contained in one or more sitemap files
  5. Click OK button and this will initiate Screaming Frog to crawl all the URLs discovered from the sitemap
  6. Upon crawl completion, go to SITEMAP tab

    Note: if you do not see SITEMAP tab, click on the toggle to reveal a list of tabs
  7. In OVERVIEW tab, click on ALL to reveal all URLs included in the sitemap(s)
  8. Scan for pagination URLs in the ADDRESS column:
    • If you find paginated URLs, check ‘yes’ in the spreadsheet
    • If there are no paginated URLs, check ‘no’ in the spreadsheet
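Scanning a large sitemap by eye is error-prone, so here is a minimal Python sketch that flags common Wordpress pagination patterns in the sitemap’s <loc> values. The sitemap URL and the regular expression are placeholders you may need to adjust for the site being audited.

  import re
  import urllib.request
  import xml.etree.ElementTree as ET

  NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
  PAGINATION = re.compile(r"/page/\d+|[?&]page=\d+", re.IGNORECASE)

  with urllib.request.urlopen("https://example.com/sitemap.xml", timeout=10) as response:
      root = ET.fromstring(response.read())

  paginated = [loc.text.strip() for loc in root.iter(NS + "loc") if PAGINATION.search(loc.text)]
  print(f"{len(paginated)} paginated URL(s) found in the sitemap")  # > 0 means check 'yes'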

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, there are paginated URLs in the sitemap – this is not ideal. Consider removing them because the sitemap should only contain pages that you want to rank – and paginated series do not usually fit this criterion. That is, the deeper URLs found in paginated series tend to be more important than the paginated series themselves. The one exception to this rule is including the “View All” URL in the sitemap.

>> If ‘no’, paginated URLs are not in sitemap.

Recommended reading:

Are there any internal links not wrapped by an <a> tag?

Note: essentially you’re looking for JavaScript links

How to find internal links not wrapped by an <a> tag using Screaming Frog:

  1. Go to CONFIGURATION > SPIDER > RENDERING and change rendering from “text-only” to “JavaScript”
  2. Change AJAX TIMEOUT from 5-seconds to 8-seconds then click the OK button
  3. Crawl the entire site, a section within the website, or a sample of the website
  4. Once crawl is done, go to JAVASCRIPT tab
  5. Select CONTAINS JAVASCRIPT LINKS from the dropdown toggle

    Note: this will reveal internal hyperlinks that are not in the raw HTML and are only discoverable in the rendered HTML.
  6. Look at the UNIQUE JS INLINK column:
    • If you see a number > 0, check ‘yes’ in the spreadsheet
    • If there are no URLs shown, check ‘no’ in the spreadsheet

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, there are hyperlinks that are only discoverable in the rendered HTML after JavaScript execution. While Google is able to render pages and see these JS links, it does not happen as quickly and as efficiently as hyperlinks served server-side in the raw HTML. If the website is experiencing non-optimal crawling and indexing, it could be due to render blockers in place. I recommend that you check to see if important content can be seen with JS disabled.

>> If ‘no’, there are no links discoverable client-side only. All internal links are available to be crawled in the raw HTML. Therefore, this is not the reason why Google is having difficulty crawling the pages on the website.

Recommended reading:

>> Rendering-related

Rendering is the process where a search engine visits a URL and sees what information is found on the page. To do this, it will run the code on the page and spend a few seconds to see what is displayed.

JavaScript is the biggest cause of rendering issues and this usually is a result of the Wordpress theme or pagebuilder. JavaScript requires client-side rendering which means that any content that requires JS will only be visible and discoverable in the rendered HTML. This means that in its initial crawl of the page, Google will not see this content in the raw HTML.
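One way to see how much of a page depends on client-side rendering is to compare the raw HTML with the rendered HTML. The sketch below is one possible approach rather than part of the checklist itself: it assumes Playwright is installed (pip install playwright, then playwright install chromium) and uses a placeholder URL. A large gap between the two lengths suggests the content relies heavily on JavaScript.

  import urllib.request
  from playwright.sync_api import sync_playwright

  URL = "https://example.com/sample-page/"

  # Raw HTML - what a crawler sees before any JavaScript runs
  raw_html = urllib.request.urlopen(URL, timeout=10).read().decode("utf-8", errors="replace")

  # Rendered HTML - what the page looks like after JavaScript execution
  with sync_playwright() as playwright:
      browser = playwright.chromium.launch()
      page = browser.new_page()
      page.goto(URL, wait_until="networkidle")
      rendered_html = page.content()
      browser.close()

  print("Raw HTML length:     ", len(raw_html))
  print("Rendered HTML length:", len(rendered_html))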

Is there a trend of increasing “Submitted URL seems to be a soft 404” in Search Console index coverage errors?

What theme is being used?

Are there known issues with this theme?

Does the website use a pagebuilder?

Does the pagebuilder have known issues?

Is lazy loading applied to above-the-fold images?

Does the primary navigation menu load when JavaScript is disabled?

Is all the body content visible when JavaScript is disabled?

>> Indexing-related

Indexing is the process of how and where a search engine places a URL on its results pages. This is often referred to as ‘ranking’.

For the purpose of this technical SEO checklist, indexing and ranking will be treated differently. Instead, the indexing-related criteria will help expose symptoms of potential indexing issues.

In this section, you will look for signs that contribute to common indexing issues such as:

  • Crawled – not indexed
  • Discovered – not indexed
  • Canonical URLs being excluded despite having the correct canonical tag set.

Does the website rank in the top 2 positions when searching for the brand name in Google?

How to check if a website ranks for its own brand name:

  1. Open a private tab or window in your web browser to perform an incognito search
  2. Carry out a search using the brand name
  3. Manually verify if the website ranks in the top 2 positions of the results page
    • If the website is ranking on the first 2 positions for its brand name, check ‘yes’ in the spreadsheet
    • If the website is ranking in positions 3-10 for its brand name, check ‘no’ in the spreadsheet
    • If the website does not rank in the top 2 positions for its brand name, check ‘no’ in the spreadsheet.

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, this indicates that Google has associated the brand name (i.e., entity) with the website. This is a good thing and 9.99 out of 10 cases should result in a positive finding.

>> If ‘no’, there may be one or more serious technical SEO issues at play. Alternatively, the website may be new and/or has no backlinks to it, or, the brand name is a common word used in everyday conversation (e.g., chicken pie).

Recommended reading:

Does http redirect to https?

How to check if http redirects to https in one hop:

  1. In a web browser, load the homepage of the website
  2. In the web browser address bar, remove the “s” in https://domain and reload the page
  3. Once the page loads, use SEO PRO EXTENSION
  4. Click on the STATUS tab
    • If you see 1x 301, check ‘yes’ in the spreadsheet
    • If you see more than one 301, check ‘no’ in the spreadsheet.
    • If a new URL loads and SEO PRO EXTENSION does not show a 301 at all and only a single 200 (OK), check ‘no’ in the spreadsheet
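If you want to verify the redirect chain outside the browser, the sketch below follows each hop manually and prints the status codes. It uses only the Python standard library; the domain is a placeholder and the query string is ignored for simplicity.

  import http.client
  from urllib.parse import urljoin, urlsplit

  url, hops = "http://example.com/", 0

  while hops < 10:  # safety cap in case of a redirect loop
      parts = urlsplit(url)
      conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
      conn = conn_cls(parts.netloc, timeout=10)
      conn.request("GET", parts.path or "/", headers={"User-Agent": "Mozilla/5.0"})
      response = conn.getresponse()
      if response.status in (301, 302, 307, 308):
          hops += 1
          url = urljoin(url, response.getheader("Location"))
          print(f"Hop {hops}: {response.status} -> {url}")
          conn.close()
      else:
          print(f"Final: {response.status} at {url} after {hops} hop(s)")
          conn.close()
          break
  # A single 301 straight to the https:// version is the outcome you want - mark 'yes'.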

What this means:

> If ‘yes’, http to https redirection has been configured correctly and is not creating a duplicate, indexable version of the same URL.

>> If ‘no’, reduce the number of 301 hops to one or enable the correct 301 redirect from http to https via .htaccess.

Recommended reading:

Is the site www or non-www?

How to check if the website has www subdomain or not:

  1. In a web browser, open any page on the website – it can be the homepage
  2. Look at the browser’s address bar – sometimes you will need to double click on it to reveal the full address including subdomains
  3. See if the domain name has www or not.

Possible answers:

  • www
  • non-www

What this means:

From an indexing perspective, it is best practice to tell a search engine what the primary version of a URL is. Having both www and non-www versions as indexable URLs will confuse the search engine and may result in the wrong version being chosen as the canonical.

> If ‘www’, the website uses the www-subdomain. All internal links and backlinks should point to the www-version and non-www versions should 301 redirect to the www-version.

> If ‘non-www’, the website does not use the www-subdomain. All internal links and backlinks should point to the non-www version and www versions should 301 redirect to the non-www version.

If www by default, does non-www redirect to www?

If non-www by default, does www redirect to non-www?

Do URLs end with a trailing slash or non-trailing slash?

How to check if URLs end with a trailing slash or non-trailing slash:

  1. In a web browser, open any page on the website
  2. Click on a link from the navigation menu or internal link
  3. Look at the browser’s address bar and see if the URL ends with a trailing slash (e.g, domain/about/) or without a trailing slash (e.g., domain/about).

Possible answers:

  • trailing slash
  • non-trailing slash

What this means:

From an indexing perspective, it is best practice to tell a search engine what the primary version of a URL is. Having both trailing slash and non-trailing slash versions as indexable URLs will confuse the search engine and may result in the wrong version being chosen as the canonical.

> If ‘trailing slash’, URLs ending with “/” are the preferred version. Therefore, all internal links and backlinks should point to URLs ending with a trailing slash. Similarly, non-trailing slash URLs should 301 redirect to the trailing slash version.

> If ‘non-trailing slash’, URLs ending without “/” are the preferred version. Therefore, all internal links and backlinks should point to URLs ending without a trailing slash. Similarly, trailing slash URLs should 301 redirect to the non-trailing slash version.

If URLs have a trailing slash by default, do URLs without trailing slash redirect to trailing slash?

If URLs do not have a trailing slash by default, do URLs with a trailing slash redirect to non-trailing slash?

Are there orphaned URLs in the sitemap?

Note: the website you’re auditing will need to have a sitemap for you to check this

How to find out if there are orphan URLs in a sitemap using Screaming Frog:

  1. Copy the full address of the sitemap file (e.g., domain/sitemap.xml)
  2. In Screaming Frog, go to MODE > LIST
  3. Click UPLOAD button and select DOWNLOAD XML SITEMAP
  4. Paste the full address of the sitemap from step #1 then click OK button

    Note: it will take Screaming Frog a few seconds to extract all URLs contained in one or more sitemap files
  5. Click OK button and this will initiate Screaming Frog to crawl all the URLs discovered from the sitemap
  6. Upon crawl completion, go to CRAWL ANALYSIS > START
  7. Navigate to the SITEMAPS tab and on the right hand side panel you will see a count of ORPHAN URLS in the sitemap:
    • If you see a number greater than 0, check ‘yes’ in the spreadsheet
    • If you see 0, check ‘no’ in the spreadsheet

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, there are URLs that have no internal links pointing to them. This means that it is very difficult for search engines to discover these pages on the website. Confirm if these pages are important and add internal links to these orphan URLs from the navigation menu, homepage, or other pages that get organic traffic.

>> If ‘no’, all URLs have at least one other URL linking to them. However, if you are experiencing indexing issues, find more opportunities to link to these URLs internally.

Are there 301 URLs in the sitemap?

Note: the website you’re auditing will need to have a sitemap for you to check this

How to find 301 URLs in a sitemap using Screaming Frog:

  1. Copy the full address of the sitemap file (e.g., domain/sitemap.xml)
  2. In Screaming Frog, go to MODE > LIST
  3. Click UPLOAD button and select DOWNLOAD XML SITEMAP
  4. Paste the full address of the sitemap from step #1 then click OK button

    Note: it will take Screaming Frog a few seconds to extract all URLs contained in one or more sitemap files
  5. Click OK button and this will initiate Screaming Frog to crawl all the URLs discovered from the sitemap
  6. Upon crawl completion, go to CRAWL ANALYSIS > START
  7. When the analysis is done, go to SITEMAP tab

    Note: if you do not see SITEMAP tab, click on the toggle to reveal a list of tabs
  8. In OVERVIEW tab, you will see the total number of non-indexable URLs that have been included in the sitemap
  9. Click on STATUS CODE to sort the column in ascending or descending order
  10. Scan for any 301 results:
    • If you see one or more 301 results in the STATUS CODE column, check ‘yes’ in the spreadsheet
    • If you see no 301 results in the STATUS CODE column, check ‘no’ in the spreadsheet

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, there are 301 URLs in the sitemap. Generally speaking, only indexable URLs should be in a sitemap (i.e., status code 200 only). However, in some instances, it is good practice to have a separate sitemap of 301 redirects so that they can be crawled, mapped and monitored easily. Having 301 URLs in a sitemap is not usually a cause for concern but can point to a history of technical issues that have not been resolved.

>> If ‘no’, there are no 301 URLs in the sitemap.

Recommended reading:

Are there 404 URLs in the sitemap?

Note: the website you’re auditing will need to have a sitemap for you to check this

How to find 404 URLs in a sitemap using Screaming Frog:

  1. Copy the full address of the sitemap file (e.g., domain/sitemap.xml)
  2. In Screaming Frog, go to MODE > LIST
  3. Click UPLOAD button and select DOWNLOAD XML SITEMAP
  4. Paste the full address of the sitemap from step #1 then click OK button

    Note: it will take Screaming Frog a few seconds to extract all URLs contained in one or more sitemap files
  5. Click OK button and this will initiate Screaming Frog to crawl all the URLs discovered from the sitemap
  6. Upon crawl completion, go to CRAWL ANALYSIS > START
  7. When the analysis is done, go to SITEMAP tab

    Note: if you do not see SITEMAP tab, click on the toggle to reveal a list of tabs
  8. In OVERVIEW tab, you will see the total number of non-indexable URLs that have been included in the sitemap
  9. Click on STATUS CODE to sort the column in ascending or descending order
  10. Scan for any 404 results:
    • If you see one or more 404 results in the STATUS CODE column, check ‘yes’ in the spreadsheet
    • If you see no 404 results in the STATUS CODE column, check ‘no’ in the spreadsheet
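As a lightweight alternative to the Screaming Frog workflow above, the sketch below requests each sitemap URL and prints any non-200 status code, which surfaces both 301 and 404 entries in one pass. The URL list is a placeholder: in practice you would feed it the <loc> values extracted from the sitemap (for example with the counting sketch shown earlier).

  import http.client
  from urllib.parse import urlsplit

  sitemap_urls = ["https://example.com/", "https://example.com/old-page/"]  # placeholder list

  for url in sitemap_urls:
      parts = urlsplit(url)
      conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
      conn = conn_cls(parts.netloc, timeout=10)
      conn.request("HEAD", parts.path or "/", headers={"User-Agent": "Mozilla/5.0"})
      status = conn.getresponse().status
      conn.close()
      if status != 200:
          print(f"{status}  {url}")  # any 301 or 404 here means check 'yes' in the spreadsheet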

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, there are 404 URLs in the sitemap. Generally speaking, only indexable URLs should be in a sitemap (i.e., status code 200 only). However, in some instances, it is good practice to have a separate sitemap of deleted URLs so that they can be crawled, mapped and monitored easily. Having 404 URLs in a sitemap is not usually a cause for concern and does not directly contribute towards indexing issues. But seeing a large number of 404 status code URLs in a sitemap can point to a history of technical issues that have not been resolved.

>> If ‘no’, there are no 404 URLs in the sitemap.

Recommended reading:

How are sitemaps generated on the website?

How to find out how sitemaps are generated:

  1. Log into the Wordpress dashboard
  2. Look for an SEO plugin such as Yoast, AIOSEO, Rankmath etc
  3. Check if the plugin has been tasked with creating sitemaps

Do any of the sitemaps in Google Search Console show errors?

How to check if there are sitemap errors in GSC:

  1. Log into Google Search Console
  2. Go to INDEX > SITEMAPS
  3. Check the STATUS column for errors
    • If an error is reported, check ‘yes’ in the spreadsheet
    • If no errors are reported, check ‘no’ in the spreadsheet.

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, check if the submitted sitemap is relevant and current. If it is, investigate why GSC is reporting an error as this may give you a clue as to what is blocking efficient crawling and indexing for the website.

>> If ‘no’, there are no sitemap errors reported by Search Console.

Are there errors reported in Search Console index coverage?

How to check if there are index coverage errors in GSC:

  1. Log into Google Search Console
  2. Go to INDEX > COVERAGE
  3. Check if any errors are reported:
    • If an error is reported, check ‘yes’ in the spreadsheet
    • If no errors are reported, check ‘no’ in the spreadsheet.

Possible answers:

  • yes
  • no
  • not applicable

What this means:

> If ‘yes’, one or more URLs could not be indexed by Google and cannot be found via Google search. There are 8 types of possible errors that are documented in ‘index coverage report’ by Search Console Help.

>> If ‘no’, there are no URLs that could not be indexed by Google. However, this does not mean that there are no indexing issues.

Recommended reading:

Are these errors expected?

Are there URLs marked as “Submitted URL blocked by robots.txt” in Search Console index coverage?

Are there valid URLs reported in Search Console index coverage?

Are there an increasing number of URLs reported as “Indexed not submitted in sitemap” in Search Console index coverage?

Are there excluded URLs reported in Search Console index coverage?

Is there a trend of increasing URLs reported as “Crawled – currently not indexed” in Search Console index coverage?

Is there a trend of increasing URLs reported as “Discovered – currently not indexed” in Search Console index coverage?

Is there a trend of increasing URLs reported as “Excluded by noindex tag” in Search Console index coverage?

Is there a trend of increasing URLs reported as “Duplicate without user-selected canonical” in Search Console index coverage?

Is there a trend of increasing URLs reported as “Duplicate, Google chose different canonical than user” in Search Console index coverage?

In Search Console Search Performance, are clicks trending up, down or staying steady in the last 16-months?

In Search Console Search Performance, are impressions trending up, down or staying steady in the last 16-months?

Are there non-indexable URLs?

Should these URLs be set as noindex?

Are there 301/302 redirections in place?

Are these redirections correct?

Are there internal links pointing to 404 or 301 URLs?

Are there redirect chains?

Do URLs have canonicals set?

Are the canonical tags correct?

Are internal search URLs indexable?

Does the website have internal links pointing towards URLs generated by internal search?

How many indexable URLs does the website have?

How to find out the exact number of indexable URLs a website has with Screaming Frog:

  1. Run a crawl of the website
  2. When the crawl is complete, go to the OVERVIEW tab and look for TOTAL INTERNAL INDEXABLE URLS
  3. Input the number into the cell in the spreadsheet.

What this means:

This number represents all the URLs that you believe are important to the website. Assuming that 301 redirects, 404/410 HTTP statuses, and canonicalised URLs have been implemented correctly, this is the number of URLs you want indexed in an ideal world.

You can use this number to benchmark the indexing run-rate once you have calculated the number of URLs indexed by Google and also the number of URLs not indexed.

How many URLs are indexed on Google?

How to find out how many URLs have been indexed by Google using COVERAGE information in GSC:

  1. Log into Google Search Console
  2. Go to INDEX > COVERAGE
  3. Look at the VALID number – this is the number of indexed URLs according to Search Console

    Note: this method does not tell you which URLs have been indexed for websites with more than 1,000 URLs, which is why the following method using Screaming Frog and the GSC API is recommended

How to find out how many URLs have been indexed by Google using Screaming Frog and GSC API:

  1. In Screaming Frog, go to API tab and click on the settings (cogwheel) button
  2. Click CONNECT TO NEW ACCOUNT
  3. This will trigger a new tab/window to open in your web browser
  4. Log into a Google account that has GSC access to the website and click ALLOW
  5. Return to Screaming Frog and select the GSC property that you’re auditing
  6. Click OK button but do not close the dialogue window
  7. Go to URL INSPECTION tab and check ENABLE URL INSPECTION and click OK button
  8. Run a crawl of the website
  9. Wait for CRAWL and API to reach 100%
  10. Once crawl and API retrieval has completed, go to SEARCH CONSOLE tab and OVERVIEW tab
  11. Subtract URL IS NOT ON GOOGLE from ALL – this will give you the total number of URLs that have been indexed by Google

    e.g., if ALL is 41 and URL IS NOT ON GOOGLE is 12, the number of indexed URLs is 29 (41 - 12 = 29).

What this means:

The number of indexed URLs represents all the pages on the website that can be found on Google Search.

With this number, you can calculate the percentage of indexed URLs vs all indexable URLs.

However,

  • It is uncommon for websites to have all URLs indexed
  • It can take weeks for new content to be indexed
  • There is also no guarantee that Google will index your content or all the content on the web

If the website is experiencing indexing issues with new URLs, I recommend that you identify which URLs have not been indexed and look for any patterns as to why Google has decided not to index these URLs.

How many URLs are not indexed on Google?

How to find out how many URLs have not been indexed by Google using Screaming Frog and GSC API:

  1. In Screaming Frog, go to API tab and click on the settings (cogwheel) button
  2. Click CONNECT TO NEW ACCOUNT
  3. This will trigger a new tab/window to open in your web browser
  4. Log into a Google account that has GSC access to the website and click ALLOW
  5. Return to Screaming Frog and select the GSC property that you’re auditing
  6. Click OK button but do not close the dialogue window
  7. Go to URL INSPECTION tab and check ENABLE URL INSPECTION and click OK button
  8. Run a crawl of the website
  9. Wait for CRAWL and API to reach 100%
  10. Once crawl and API retrieval has completed, go to SEARCH CONSOLE tab and OVERVIEW tab
  11. Choose INDEXABLE URL NOT INDEXED in the dropdown toggle and this will give you all the URLs that have not been indexed by Google

    Note: the OVERVIEW tab will give you the exact number

What this means:

The URLs shown cannot be found on Google search because they’re not in Google’s index.

The COVERAGE column will give you a clue as to why these indexable URLs are not indexed.

Typically, the reasons provided are:

  • Page with redirect
  • URL is unknown to Google
  • Duplicate, submitted URL not selected as canonical
  • Duplicate, Google chose different canonical than user
  • Crawled – currently not indexed

How many low content pages exist?

Does a site: operator search show paginated URLs in the search results?

Are all paginated series indexable?

Are paginated URLs being canonicalized to a “view all” page?

Are paginated URLs being canonicalized to a root page?

Do internal links to paginated pages use “rel=nofollow” attribute?

Do pages that don’t exist return a 404 or 200 HTTP status code response?

Have URLs been submitted to Search Console removal tool?

Does faceted navigation create indexable URLs?

Does a site: operator search show parameterised URLs in the search results?

Do all pages have a h1?

Are there empty h1 tags?

Is there more than one h1 per URL?

Are title links the same as h1?

Can title links have different text from the h1?

Are Wordpress tags being used?

Are Wordpress tags indexable?

Do Wordpress tags have canonical URLs?

Are Wordpress categories being used?

Are Wordpress categories pages indexable?

Do Wordpress categories have canonical URLs?

Are URLs with parameters indexable?

Does the business have permanently discontinued products?

Can these URLs be reused or repurposed?

Do permanently discontinued product URLs return a 301, 404 or 410 HTTP status code?

Have permanently discontinued product URLs been marked as noindex?

If a 301 redirect exists, do they point to a relevant or irrelevant URL?

Does the website have product detail pages (PDPs) that are temporarily unavailable?

Do temporarily unavailable PDPs return a HTTP 200 OK status code?

Do temporarily unavailable PDPs inform users when the product will be available again?

Do temporarily unavailable PDPs have Offer Product Schema markup?

Is AMP being used?

Was AMP used in the past?

Do non-AMP URLs reference the AMP version using a “rel=amphtml” link element?

Do AMP URLs canonicalise to the non-AMP version?

>> International SEO

Is the website multilingual, multi-regional, or both?

Does the website use a geo-redirect?

Is hreflang being used?

How is hreflang being used on the website?

Are all hreflang tags self-referencing?

Are all hreflang tags bidirectional?

Are the correct ISO 639-1 and ISO 3166-1 Alpha 2 formats being used?

Has an x-default hreflang attribute been set?

Are hreflang tags using absolute URLs or relative URLs?

Is there a separate canonical tag and hreflang tag or have they been combined?

Does the website’s language encoding match hreflang designation?

Is there more than one URL provided for a hreflang value?

>> Ranking-related

Do important product category pages have a crawl depth greater than 2 clicks from the homepage?

Do primary keyword URLs have a crawl depth greater than 4 clicks from the homepage?

Do important service URLs have descriptive anchor text internal links pointing to them from multiple URLs?

Do important product category pages have descriptive anchor text internal links pointing to them from multiple URLs?

Do primary keyword URLs have descriptive anchor text internal links pointing to them from multiple URLs?

Do important service URLs have a crawl depth greater than 2 clicks from the homepage?

How many pages are not mobile-friendly?

Do any 404 URLs have referring domains linking to them?

Have these 404 URLs been 301 redirected?

Do the pages with the most traffic link to other important URLs?

Do internal links have descriptive anchor text that conveys what the target URL is about?

Do internal links use a variety of keyword-rich anchor text?

Do any internal links have rel=nofollow attribute?

Are years, months and dates used in subfolders?

Are years, months and dates used in URL slugs?

Do ranking URLs have rich results?

Do FAQPage rich results have internal links?

Can FAQPage or HowTo rich results be added to existing ranking URLs?

>> Other technical SEO

Does the site load via https protocol?

How to check if a website loads via HTTPS protocol:

  1. In a web browser, load any URL on the website (e.g., homepage, about, contact, product URL)
  2. Look to see if a padlock icon shows in the browser address bar
    • If you see a padlock icon, check ‘yes’ in the spreadsheet
    • If you do not see a padlock icon, follow the next step to verify
  3. Double-click on the browser address bar to reveal the full URL
    • If you see “https://”, check ‘yes’ in the spreadsheet
    • If you see “http://”, check ‘no’ in the spreadsheet

Possible answers:

  • yes
  • no

What this means:

> If ‘yes’, a valid SSL certificate is installed and this means the website loads over HTTPS protocol.

>> If ‘no’, consider recommending the addition of an SSL certificate because HTTPS has been confirmed as a minor ranking signal by Google. However, be aware that the appropriate 301 redirects from HTTP to HTTPS must be implemented.

Recommended reading:

Is HSTS enabled?

Does the website have a staging environment?

Has the staging environment been indexed by Google?

Is the staging environment indexable?

Are there internal links pointing to the staging environment?

Does the staging environment have password protection?

Are there plugins that have not been updated?

Is the active theme up-to-date?

Is the version of Wordpress up-to-date?

Does running a crawl on the site cause browsing on the website to become significantly slower?

Does running a crawl on the site cause 5XX status codes?

Is there an SEO plugin installed?

What SEO plugin is being used?

How many plugins are installed?

Are there images over 750KB in filesize?

Are there images exceeding 1MB in filesize?

Do internal links use absolute URLs or relative URLs?