Crawlability

What is Crawlability?

Crawlability is the ability of a search engine crawler, such as Googlebot, to access a website's pages and resources. Crawlability issues can negatively affect a site's organic search rankings.

You should distinguish crawlability from indexability. The latter refers to the ability of a search engine to analyze a page and add it to its index.

Only crawlable and indexable pages can be discovered and indexed by Google, meaning they can appear in search engine results.

Why is crawlability important?

Crawlability is vital for any website that aims to receive organic search traffic. It allows search engines to reach pages, read and analyze their content, and add that content to the search index.

A page cannot be properly indexed without crawling. We’re adding “properly” here because, in rare cases, Google can index a URL without crawling it, based on the URL text and the anchor text of its backlinks, but in that case the page’s title and description won’t show up on the SERP.

Crawlability is not only important for Google. Other crawlers also need to access website pages for various reasons. The AhrefsSiteAudit bot, for example, crawls website pages to check their SEO health and report any SEO issues.

What affects a website’s crawlability?

1. Page discoverability

Before crawling a web page, a crawler must first discover that web page. Web pages that are neither in the sitemap nor linked to internally (known as orphan pages) can’t be found by the crawler and, therefore, can’t be crawled or indexed.

If you want a page to be indexed, it must be included in the sitemap or have internal links pointing to it (ideally both).
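For illustration, a minimal XML sitemap entry could look like the sketch below (the domain, URL, and date are hypothetical placeholders):

  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
      <loc>https://www.example.com/blog/crawlability-guide/</loc>
      <lastmod>2024-01-15</lastmod>
    </url>
  </urlset>

Submitting a sitemap like this (for example, in Google Search Console) gives crawlers a direct list of URLs to discover, even for pages that don’t yet have internal links pointing to them.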

2. Nofollow links

Googlebot does not follow links with the “rel=nofollow” attribute.

So if, for example, the only link pointing to a page is a nofollow link, it’s effectively the same as having no links at all in terms of crawling.
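As a quick sketch, a nofollowed link in HTML looks like this (the URL is a placeholder):

  <a href="https://www.example.com/some-page/" rel="nofollow">Some page</a>

If that anchor is the only link pointing to the page anywhere, Googlebot has no followed path through which to reach it.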

3. Robots.txt file

A robots.txt file tells web crawlers which parts of your site they can and cannot access.

If you want a page to be crawlable, it must not be disallowed in the robots.txt file.
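For example, a robots.txt file along these lines (the paths are hypothetical) blocks all crawlers from an /admin/ directory while leaving the rest of the site crawlable:

  # Block every crawler from the /admin/ section, allow everything else
  User-agent: *
  Disallow: /admin/

  Sitemap: https://www.example.com/sitemap.xml

A URL that matches a Disallow rule for a given user agent won’t be crawled by that crawler, although (as noted in the FAQ below) the URL itself can still end up indexed if other sites link to it.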

4. Access restrictions

Web pages can have specific restrictions that keep crawlers from reaching them.

These can include:

  • A login or other authentication system
  • User-agent blacklisting (see the server config sketch after this list)
  • IP address blacklisting
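User-agent blacklisting, for instance, is often handled at the web server level. A minimal sketch for nginx might look like this (the bot name is a made-up example, not a recommendation):

  # Inside a server { } block: return 403 Forbidden to any client
  # whose User-Agent header contains "BadBot"
  if ($http_user_agent ~* "BadBot") {
      return 403;
  }

A crawler blocked this way receives an error response instead of the page, so its content can never make it into the index. The same applies to IP blacklisting and to pages hidden behind a login.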

How to find crawlability issues on your website

The easiest way to detect crawlability issues on a website is to use a dedicated SEO tool such as Ahrefs Site Audit or our free Ahrefs Webmaster Tools.

Ahrefs Webmaster Tools can crawl the whole website and keep tabs on new or recurring issues over time. In addition, it breaks issues down into categories, helping you better understand your site’s overall SEO performance and why certain pages can’t be crawled.

FAQs

What’s the difference between crawlability and indexability?

Crawlability is the ability of a search engine to access a web page and crawl its content. Indexability is the ability of a search engine to analyze the content it crawls and add it to its index.

A page can be crawlable but not indexable.
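A common example is a page that crawlers can reach but that carries a noindex directive in its HTML head:

  <meta name="robots" content="noindex">

Googlebot can crawl such a page and read its content, but the directive tells it not to add the page to the index. The reverse can also happen: a page blocked in robots.txt is not crawlable, even though nothing about the page itself forbids indexing.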

Can a webpage be indexed in Google without crawling?

Surprisingly, Google can index a URL without crawling, allowing it to appear in search results. However, it’s a rare occurrence.

When this happens, Google uses anchor text and URL text to determine the purpose and content focus of the page. Note that Google won’t show the page’s title in this case.

This occurrence is briefly explained in Google’s Introduction to robots.txt:

While Google won’t crawl or index the content blocked by a robots.txt file, we might still find and index a disallowed URL if it is linked from other places on the web. As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the page can still appear in Google search results.