
The Beginner’s Guide to Technical SEO

Patrick Stox
Patrick Stox is a Product Advisor, Technical SEO, & Brand Ambassador at Ahrefs. He was the lead author for the SEO chapter of the 2021 Web Almanac and a reviewer for the 2022 SEO chapter. He also co-wrote the SEO Book For Beginners by Ahrefs and was the Technical Review Editor for The Art of SEO 4th Edition. He's an organizer of the Triangle SEO Meetup and the Tech SEO Connect conference, runs a Technical SEO Slack group, and moderates /r/TechSEO on Reddit.

Technical SEO is the most important part of SEO until it isn’t. Pages need to be crawlable and indexable to even have a chance at ranking, but many other activities will have minimal impact compared to content and links.

We wrote this beginner’s guide to help you understand some of the basics and where your time is best spent to maximize impact.


What is technical SEO?

Technical SEO is the practice of optimizing your website to help search engines find, crawl, understand, and index your pages. It helps increase visibility and rankings in search engines. It matters for AI search, too.

How complicated is technical SEO?

It depends. The fundamentals aren't difficult to master, but technical SEO goes deep and can get complex and hard to understand. I'll keep things as simple as I can in this guide.

Does technical SEO matter for AI search?

Yes. AI search still depends on crawlable, well-structured, trustworthy web pages. Technical SEO ensures your site is fast, accessible, and indexable, all of which improve the chances your content is used in AI-driven answers as well as traditional search.


In this chapter, we’ll cover how to make sure search engines can efficiently crawl your content.

How crawling works

Crawling is where search engines grab content from pages and follow the links on those pages to find even more pages. There are a few ways you can control what gets crawled on your website.

Robots.txt

A robots.txt file tells search engines and AI platforms where they can and can’t go on your site.

Most search engines and AI crawlers respect your robots.txt settings. If you explicitly disallow them, they’ll obey and won’t crawl your content or include it in training data. But if you block search engines and LLMs from using your website as training material, you also limit your chances of becoming visible in their responses.
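For example, a minimal robots.txt might allow most crawlers while keeping them out of internal search results and blocking a specific AI training crawler entirely. The path here is a placeholder, and user-agent names vary by platform:

User-agent: *
Disallow: /search/

User-agent: GPTBot
Disallow: /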

DID YOU KNOW?
Google and some LLMs may index pages they can't crawl if links point to those pages. This can be confusing, but if you want to keep pages out of the index, check out this guide and flowchart, which walks you through the process.

LLMs.txt

LLMs.txt is a voluntary standard for telling large language models (LLMs) how they can use your content, but as we explain in our guide to LLMs.txt, it’s not especially effective and likely not worth the effort.

There’s no evidence yet that LLMs.txt improves AI retrieval, boosts traffic, or enhances model accuracy.
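If you still want to experiment with it, the proposed format is a plain Markdown file served at /llms.txt with a title, a short summary, and lists of links to your key pages. The site name and URL below are hypothetical:

# Example Site

> A short description of what the site covers.

## Guides

- [Technical SEO guide](https://example.com/technical-seo): Covers crawling, indexing, and quick wins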

Crawl rate

There’s a crawl-delay directive you can use in robots.txt that many crawlers support. It lets you set how often they can crawl pages. Unfortunately, Google doesn’t respect this.[1] For Google, you’ll need to change the crawl rate in Google Search Console.[2]
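For crawlers that do support it, the directive looks like this in robots.txt (the 10-second value is just an example):

User-agent: *
Crawl-delay: 10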

Access restrictions

If you want the page to be accessible to some users but not search engines, then what you probably want is one of these three options:

  • Some kind of login system
  • HTTP authentication (where a password is required for access)
  • IP whitelisting (which only allows specific IP addresses to access the pages)

This type of setup is best for things like internal networks, member-only content, or staging, test, or development sites. It allows a group of users to access the page, but search engines won't be able to reach it and won't index it.
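As a rough sketch, on an Apache server you could combine HTTP authentication with IP whitelisting in an .htaccess file like this (the password file path and IP address are placeholders):

AuthType Basic
AuthName "Staging site"
AuthUserFile /var/www/.htpasswd
<RequireAny>
    Require valid-user
    Require ip 203.0.113.10
</RequireAny>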

How to see crawl activity

For Google specifically, the easiest way to see what it’s crawling is with the “Crawl stats” report in Google Search Console, which gives you more information about how it’s crawling your website.

If you want to see all crawl activity on your website, including from AI crawlers, you'll need to access your server logs and possibly use a tool to analyze the data. This can get fairly advanced. But if your hosting has a control panel like cPanel, you should have access to raw logs and log aggregators like AWStats and Webalizer.
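As a starting point, a small script can tally how often different crawlers show up in a raw access log. This is just a sketch: it assumes a log file named access.log where the user agent appears in each line, and the list of bot names is illustrative, not exhaustive.

from collections import Counter

# Hypothetical log file name; adjust to wherever your host stores raw access logs
LOG_FILE = "access.log"

# Illustrative crawler names to look for in each log line's user-agent string
CRAWLERS = ["Googlebot", "Bingbot", "GPTBot", "ClaudeBot", "PerplexityBot"]

counts = Counter()
with open(LOG_FILE, encoding="utf-8", errors="ignore") as f:
    for line in f:
        lowered = line.lower()
        for bot in CRAWLERS:
            if bot.lower() in lowered:
                counts[bot] += 1
                break  # count each request once

for bot, hits in counts.most_common():
    print(f"{bot}: {hits} requests")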

Crawl adjustments

Each website is going to have a different crawl budget, which is a combination of how often Google wants to crawl a site and how much crawling your site allows. More popular pages and pages that change often will be crawled more often, and pages that don’t seem to be popular or well linked will be crawled less often.

If crawlers see signs of stress while crawling your website, they’ll typically slow down or even stop crawling until conditions improve.

After pages are crawled, they’re rendered and sent to the index. The index is the master list of pages that can be returned for search queries. Let’s talk about the index.


In this chapter, we’ll talk about how to make sure your pages are indexed and check how they’re indexed.

Robots directives

A robots meta tag is an HTML snippet that tells search engines how to crawl or index a certain page. It’s placed into the <head> section of a webpage and looks like this:

<meta name="robots" content="noindex" />
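You can combine directives in the same tag (for example, content="noindex, nofollow"). For non-HTML files like PDFs, the equivalent instruction can be sent as an X-Robots-Tag HTTP response header instead:

X-Robots-Tag: noindex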

Canonicalization

When duplicate content creates multiple versions of the same page, Google selects one to store in its index. This process is called canonicalization, and the URL selected as the canonical is the one Google shows in search results. It uses many different signals to select the canonical URL, including canonical tags, redirects, internal links, and the URLs listed in your sitemap.
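The clearest signal you can send is a rel="canonical" link tag in the <head> of each duplicate, pointing at the version you want indexed (the URL below is a placeholder):

<link rel="canonical" href="https://example.com/preferred-page/" />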

The easiest way to see how Google has indexed a page is to use the URL Inspection tool in Google Search Console. It will show you the Google-selected canonical URL.

Checking how a URL is indexed in Google Search Console

One of the hardest things for SEOs is prioritization. There are a lot of best practices, but some changes will have more of an impact on your rankings and traffic than others. Here are some of the projects I’d recommend prioritizing.

Check indexing

Make sure pages you want people to find can be indexed in Google. The two previous chapters were all about crawling and indexing, and that was no accident. 

You can check the Indexability report in Site Audit to find pages that can’t be indexed and the reasons why. It’s free in Ahrefs Webmaster Tools.

Finding noindexed pages in Ahrefs' Site Audit


Reclaim lost links

Websites tend to change their URLs over the years. In many cases, these old URLs have links from other websites. If they’re not redirected to the current pages, then those links are lost and no longer count for your pages. It’s not too late to do these redirects, and you can quickly reclaim any lost value. Think of this as the fastest link building you will ever do. 

You can find opportunities to reclaim lost links using Ahrefs’ Site Explorer. Enter your domain, go to the Best by Links report, and add a “404 not found” HTTP response filter. I usually sort this by “Referring Domains”.

This is what it looks like for 1800flowers.com:

Finding 404 pages in Ahrefs' Site Explorer

Looking at the first URL in archive.org, I see that this was previously the Mother’s Day page. By redirecting that one page to the current version, you’ll reclaim 225 links from 59 different websites—and there are plenty more opportunities.

I even created a script to help you match redirects. Don’t be scared away; you just have to download a couple of files and upload them. The Colab notebook walks you through it and takes care of the heavy lifting for you.

You’ll want to 301 redirect any old URLs to their current locations to reclaim this lost value.
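How you set these up depends on your server or CMS. On Apache, for instance, a single rule in an .htaccess file handles it (the paths here are hypothetical):

Redirect 301 /old-mothers-day-page /mothers-day-flowers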

DID YOU KNOW?

A 301 redirect is a permanent redirect. Any links pointing to the redirected URL will count toward the new URL in Google’s eyes.[3]

Add internal links

Internal links are links from one page on your site to another page on your site. They help your pages be found and also help the pages rank better. We have a tool within Site Audit called Internal Link Opportunities that helps you quickly locate these opportunities. 

This tool works by looking for mentions of keywords that you already rank for on your site. Then it suggests them as contextual internal link opportunities.

For example, the tool shows a mention of “faceted navigation” in our guide to duplicate content. As Site Audit knows we have a page about faceted navigation, it suggests we add an internal link to that page.

Finding internal linking opportunities in Ahrefs' Site Audit

Add schema markup

Schema markup is code that helps search engines understand your content better and powers many features that can help your website stand out from the rest in search results. It may also help LLMs correctly interpret your page content. Google has a search gallery that shows the various search features and the schema needed for your site to be eligible.
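Schema markup is usually added as a JSON-LD script in the page's <head>. As a minimal sketch, an article page might declare its headline and author like this (the values are placeholders, and the properties you need depend on the search feature you're targeting):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "The Beginner's Guide to Technical SEO",
  "author": {
    "@type": "Person",
    "name": "Patrick Stox"
  }
}
</script>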


AI has changed how content is found and displayed to searchers, but it still depends on a foundation of crawlable, well-structured, and trustworthy web pages. It’s also changing how we create and optimize content.

Paying attention to a few AI-specific technical factors can help you stay visible wherever people search for information.

Make your site accessible to LLMs

Like search engines, LLMs need to be able to crawl your website and access its content. However, they work a little differently from search engine crawlers.

For instance, most LLMs don’t render JavaScript, a common coding language used to build websites. If key content or navigation only appears after JavaScript loads, there’s a risk some AI crawlers won’t see it. So, it’s best to avoid using it for any mission-critical content that you want visible in AI search.

It’s also worth checking whether third-party tools are blocking AI crawlers from accessing your website.

For instance, Cloudflare introduced new features allowing website owners to control whether AI platforms can scrape content for their training datasets.

The default setting blocks AI crawlers from accessing content, so you'll need to switch it off if you want to maximize your visibility in AI search results.

Redirect hallucinated URLs

AI search systems may cite URLs on your domain that don’t exist. You can discover these in Ahrefs’ Web Analytics by looking at pages that receive AI search traffic:

If any of these pages resolve to a 404 error, the AI system may have hallucinated the URL. To avoid losing traffic, you can redirect that URL to a relevant live page.

Regular monitoring prevents user frustration and protects brand authority.

AI content detection

Although it is fine to use AI to create content for your website, too much AI content can be seen as a spam signal that limits your content’s visibility in traditional and AI search systems. 

You can use Ahrefs’ AI detector within Site Explorer > Page Inspect to see how machines may interpret the level of AI used in your content.

You can also check this in bulk in the Top Pages report to spot existing pages that may need to be rewritten:


Code injected by AI tools

If you've used AI tools to help build your website or add new features to it, they may add extra HTML code that reveals AI was used.

In one case, a Yoast SEO bug inserted hidden AI-related classes into pages, making it obvious to search engines that AI was involved.

If you use AI tools to make on-page changes, check your site’s source code to ensure nothing unexpected is being added. Hidden “fingerprints” like this can be avoided with regular code reviews and testing before publishing updates.


The projects we’ll talk about in this chapter are all good things to focus on, but they may require more work and have less benefit than the “quick win” projects from the previous part. That doesn’t mean you shouldn’t do them. This is just to help you get an idea of how to prioritize various projects.

Page experience signals

These are lesser ranking factors, but still things you want to look at for the sake of your users. They cover aspects of the website that impact user experience (UX).

Google's search signals for page experience

Core Web Vitals

Core Web Vitals are the speed metrics that are part of Google's Page Experience signals used to measure user experience. They measure visual load with Largest Contentful Paint (LCP), visual stability with Cumulative Layout Shift (CLS), and interactivity with Interaction to Next Paint (INP), which replaced First Input Delay (FID).

HTTPS

HTTPS protects the communication between your browser and server from being intercepted and tampered with by attackers. This provides confidentiality, integrity, and authentication to the vast majority of today’s WWW traffic. You want your pages loaded over HTTPS and not HTTP.

Any website that shows a “lock” icon in the address bar is using HTTPS.

Example of a website protected by HTTPS

Mobile-friendliness

Simply put, this checks if webpages display properly and are easily used by people on mobile devices.

How do you know how mobile-friendly your site is? Check the “Mobile Usability” report in Google Search Console.

The Mobile Usability report in Google Search Console

This report tells you if any of your pages have mobile-friendliness issues.

Interstitials

Interstitials block content from being seen. These are popups that cover the main content and that users may have to interact with before they go away.

Hreflang — For multiple languages

Hreflang is an HTML attribute used to specify the language and geographical targeting of a webpage. If you have multiple versions of the same page in different languages, you can use the hreflang tag to tell search engines like Google about these variations. This helps them to serve the correct version to their users.
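Each language version should list every variant, including itself, plus an x-default fallback. A minimal example with English and Spanish versions might look like this (the URLs are placeholders):

<link rel="alternate" hreflang="en" href="https://example.com/page/" />
<link rel="alternate" hreflang="es" href="https://example.com/es/page/" />
<link rel="alternate" hreflang="x-default" href="https://example.com/page/" />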

Ahrefs now helps make hreflang implementation easier with a visual hreflang link graph in Site Audit.

This hreflang graph shows all language variants of a page and highlights any configuration issues. It flags errors like invalid language codes, missing self-links, and missing reciprocal tags, and gives clear guidance on how to fix them.

General maintenance/website health

These tasks aren’t likely to have much impact on your rankings but are generally good things to fix for user experience.

Broken links

Broken links are links on your site that point to non-existent resources. These can be either internal (i.e., to other pages on your domain) or external (i.e., to pages on other domains).

You can find broken links on your website quickly with Site Audit in the Links report. It’s free in Ahrefs Webmaster Tools.

Finding broken internal and external links in Ahrefs' Site Audit

Redirect chains

Redirect chains are a series of redirects that happen between the initial URL and the destination URL. 
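For example, if /old-page redirects to /newer-page, which then redirects to /final-page, visitors and crawlers make an extra hop on every request. The fix is to point every URL in the chain straight at the final destination; on Apache that could look like this (the paths are hypothetical):

Redirect 301 /old-page /final-page
Redirect 301 /newer-page /final-page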

You can find redirect chains on your website quickly with Site Audit in the Redirects report. It’s free in Ahrefs Webmaster Tools.

Finding redirect chains in Ahrefs' Site Audit

These tools help you improve the technical aspects of your website.

Google Search Console

Google Search Console

Google Search Console (previously Google Webmaster Tools) is a free service from Google that helps you monitor and troubleshoot your website’s appearance in its search results.

Use it to find and fix technical errors, submit sitemaps, see structured data issues, and more.

Bing and Yandex have their own versions, and so does Ahrefs. Ahrefs Webmaster Tools is a free tool that’ll help you improve your website’s SEO performance. It allows you to:

  • Monitor your website’s SEO health.
  • Check for 100+ SEO issues.
  • View all your backlinks.
  • See all the keywords you rank for.
  • Find out how much traffic your pages are receiving.
  • Find internal linking opportunities.

It’s our answer to the limitations of Google Search Console.

Google’s Mobile-Friendly Test

Google's Mobile-Friendly Test tool

Google’s Mobile-Friendly Test checks how easily a visitor can use your page on a mobile device. It also identifies specific mobile-usability issues like text that’s too small to read, the use of incompatible plugins, and so on.

The Mobile-Friendly Test shows what Google sees when it crawls the page. You can also use the Rich Results Test to see the content Google sees for desktop or mobile devices.

Chrome DevTools

Google Chrome's DevTools

Chrome DevTools is Chrome’s built-in webpage debugging tool. Use it to debug page speed issues, improve webpage rendering performance, and more.

From a technical SEO standpoint, it has endless uses.

Ahrefs’ SEO Toolbar

Ahrefs SEO Toolbar

Ahrefs’ SEO Toolbar is a free extension for Chrome and Firefox that provides useful SEO data about the pages and websites you visit.

Its free features are:

  • On-page SEO report
  • Redirect tracer with HTTP headers
  • Broken link checker
  • Link highlighter
  • SERP positions

In addition, as an Ahrefs user, you get:

  • SEO metrics for every site and page you visit and for Google search results
  • Keyword metrics, such as search volume and Keyword Difficulty, directly in the SERP
  • SERP results export

PageSpeed Insights

Google Pagespeed Insights

PageSpeed Insights analyzes the loading speed of your webpages. Alongside the performance score, it also shows actionable recommendations to make pages load faster. 

Key takeaways

  • If your content isn’t indexed, then it won’t be found in search engines.
  • When something is broken that impacts search traffic, it can be a priority to fix. But for most sites, you’re probably better off spending time on your content and links.
  • Many of the technical projects that have the most impact are around indexing or links.
  • Technical SEO still matters for AI search. Well-structured, crawlable pages help AI systems find, understand, and surface your content.

References

  1. “Is a crawl-delay rule ignored by Googlebot?”. Google Search Central. 21st December 2017
  2. “Change Googlebot crawl rate”. Google. Retrieved 9th September 2022
  3. “30x redirects don’t lose PageRank anymore”. Gary Illyes. 26th July 2016