Given that you’re here, I’m guessing this isn’t news to you. So let’s get straight down to business.
This article teaches you how to fix any of these three problems:
- Your entire website isn’t indexed.
- Some of your pages are indexed, but others aren’t.
- Your newly published web pages aren’t getting indexed fast enough.
But first, let’s make sure we’re on the same page and fully understand this indexing malarkey.
Google discovers new web pages by crawling the web, and then they add those pages to their index. They do this using a web spider called Googlebot.
Confused? Let’s define a few key terms.
- Crawling: The process of following hyperlinks on the web to discover new content.
- Indexing: The process of storing every web page in a vast database.
- Web spider: A piece of software designed to carry out the crawling process at scale.
- Googlebot: Google’s web spider.
Here’s a video from Google that explains the process in more detail:
When you Google something, you’re asking Google to return all relevant pages from their index. Because there are often millions of pages that fit the bill, Google’s ranking algorithm does its best to sort the pages so that you see the best and most relevant results first.
The critical point I’m making here is that indexing and ranking are two different things.
Indexing is showing up for the race; ranking is winning.
You can’t win without showing up for the race in the first place.
Go to Google, then search for
This number shows roughly how many of your pages Google has indexed.
If you want to check the index status of a specific URL, use the same
No results will show up if the page isn’t indexed.
Now, it’s worth noting that if you’re a Google Search Console user, you can use the Coverage report to get a more accurate insight into the index status of your website. Just go to:
Google Search Console > Index > Coverage
Look at the number of valid pages (with and without warnings).
If these two numbers total anything but zero, then Google has at least some of the pages on your website indexed. If not, then you have a severe problem because none of your web pages are indexed.
You can also use Search Console to check whether a specific page is indexed. To do that, paste the URL into the URL Inspection tool.
If that page is indexed, it’ll say “URL is on Google.”
If the page isn’t indexed, you’ll see the words “URL is not on Google.”
Found that your website or web page isn’t indexed in Google? Try this:
- Go to Google Search Console
- Navigate to the URL inspection tool
- Paste the URL you’d like Google to index into the search bar.
- Wait for Google to check the URL
- Click the “Request indexing” button
This process is good practice when you publish a new post or page. You’re effectively telling Google that you’ve added something new to your site and that they should take a look at it.
However, requesting indexing is unlikely to solve underlying problems preventing Google from indexing old pages. If that’s the case, follow the checklist below to diagnose and fix the problem.
Here are some quick links to each tactic—in case you’ve already tried some:
- Remove crawl blocks in your robots.txt file
- Remove rogue noindex tags
- Include the page in your sitemap
- Remove rogue canonical tags
- Check that the page isn’t orphaned
- Fix nofollow internal links
- Add “powerful” internal links
- Make sure the page is valuable and unique
- Remove low-quality pages (to optimize “crawl budget”)
- Build high-quality backlinks
1) Remove crawl blocks in your robots.txt file
Is Google not indexing your entire website? It could be due to a crawl block in something called a robots.txt file.
To check for this issue, go to yourdomain.com/robots.txt.
Look for either of these two snippets of code:
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /
Both of these tell Googlebot that it isn’t allowed to crawl any pages on your site: the first names Googlebot directly, while the second applies to every crawler. To fix the issue, remove them. It’s that simple.
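If you’d rather check this programmatically, here’s a minimal sketch using Python’s built-in urllib.robotparser module. The helper name googlebot_can_crawl and the yourdomain.com URLs are hypothetical placeholders. Note that robotparser does simple prefix matching, so it doesn’t fully replicate how Google handles wildcards or conflicting rules; treat it as a quick sanity check rather than a definitive verdict.

```python
from urllib.robotparser import RobotFileParser

# The two site-wide blocks described above.
GOOGLEBOT_BLOCK = "User-agent: Googlebot\nDisallow: /\n"
ALL_BOTS_BLOCK = "User-agent: *\nDisallow: /\n"


def googlebot_can_crawl(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets Googlebot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    home = "https://yourdomain.com/"
    print(googlebot_can_crawl(GOOGLEBOT_BLOCK, home))  # False
    print(googlebot_can_crawl(ALL_BOTS_BLOCK, home))  # False
    # An empty Disallow value means "nothing is blocked".
    print(googlebot_can_crawl("User-agent: *\nDisallow:\n", home))  # True
```

To test your live file, paste in the text you see at yourdomain.com/robots.txt instead of the sample strings.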
A crawl block in robots.txt could also be the culprit if Google isn’t indexing a single web page. To check if this is the case, paste the URL into the URL inspection tool in Google Search Console. Click on the Coverage block to reveal more details, then look for the “Crawl allowed? No: blocked by robots.txt” error.
This indicates that the page is blocked in robots.txt.
If that’s the case, recheck your robots.txt file for any “disallow” rules relating to the page or related subsection.
Remove where necessary.
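To see which rules might be catching a particular page, a rough sketch like the one below can help. It’s a hypothetical helper, not an official tool: it picks the Googlebot group if one exists (otherwise the * group) and lists every Disallow path that is a plain prefix of the URL’s path. It ignores Allow rules and the * and $ wildcards Google supports, so double-check anything it flags in the URL inspection tool.

```python
from urllib.parse import urlparse


def parse_groups(robots_txt: str) -> list[tuple[list[str], list[str]]]:
    """Split robots.txt text into (user_agents, disallow_paths) groups."""
    groups: list[tuple[list[str], list[str]]] = []
    agents: list[str] = []
    disallows: list[str] = []
    seen_rule = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                groups.append((agents, disallows))
                agents, disallows, seen_rule = [], [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            seen_rule = True
            if field == "disallow" and value:  # empty Disallow blocks nothing
                disallows.append(value)
    if agents:
        groups.append((agents, disallows))
    return groups


def matching_disallow_rules(robots_txt: str, url: str) -> list[str]:
    """List Disallow paths (plain prefix match only) that apply to url for Googlebot."""
    path = urlparse(url).path or "/"
    groups = parse_groups(robots_txt)
    # Googlebot follows its own group if there is one, otherwise the * group.
    chosen = [group for group in groups if "googlebot" in group[0]]
    if not chosen:
        chosen = [group for group in groups if "*" in group[0]]
    return [rule for _, rules in chosen for rule in rules if path.startswith(rule)]
```

For example, with Disallow: /blog/drafts/ in the * group, a URL like https://yourdomain.com/blog/drafts/new-post/ comes back with ["/blog/drafts/"].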
2) Remove rogue noindex tags
Google won’t index pages if you tell them not to. This is useful for keeping some web pages private. There are two ways to do it:
Method 1: meta tag
Pages with either of these meta tags in their <head> section won’t be indexed by Google:
<meta name="robots" content="noindex">
<meta name="googlebot" content="noindex">
This is a meta robots tag, and it tells search engines whether they can or can’t index the page.
To find all pages with a noindex meta tag on your site, run a crawl with Ahrefs’ Site Audit. Go to the Indexability report. Look for “Noindex page” warnings.
Click through to see all affected pages. Remove the noindex meta tag from any pages where it doesn’t belong.
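Want to check a single page’s HTML yourself? Here’s a minimal sketch using Python’s standard html.parser module. The function name has_noindex_meta is hypothetical. It also treats the none directive as a noindex, since Google documents none as shorthand for noindex, nofollow.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # html.parser already lowercases tag and attribute names.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").strip().lower() in ("robots", "googlebot"):
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex_meta(html: str) -> bool:
    """Return True if a robots or googlebot meta tag blocks indexing."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    # "none" is equivalent to "noindex, nofollow".
    return bool(finder.directives & {"noindex", "none"})
```

You could feed it the HTML you get from your browser’s “view source,” or from a script that downloads the page.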
Method 2: X-Robots-Tag
Crawlers also respect the X-Robots-Tag HTTP response header. You can implement this using a server-side scripting language like PHP, or in your .htaccess file, or by changing your server configuration.
The URL inspection tool in Search Console tells you whether Google is blocked from indexing a page because of this header. Just enter your URL, then look for the “Indexing allowed? No: ‘noindex’ detected in ‘X-Robots-Tag’ http header” message.
If you want to check for this issue across your site, run a crawl in Ahrefs’ Site Audit tool, then use the “Robots information in HTTP header” filter in the Page Explorer:
Tell your developer to exclude pages you want indexing from returning this header.
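For a quick check outside Search Console, the sketch below reads X-Robots-Tag values and decides whether any of them tell Googlebot not to index the page. It’s a simplified, hypothetical helper: it handles plain values like noindex and crawler-scoped values like googlebot: noindex, but it isn’t a complete parser of every directive. The fetch_x_robots_tag function sends a HEAD request, which assumes your server returns the same headers for HEAD as for GET.

```python
from urllib.request import Request, urlopen

# Directives with their own "name: value" syntax, so a colon after them
# does not mean the value is scoped to a particular crawler.
DIRECTIVES_WITH_VALUES = {
    "unavailable_after",
    "max-snippet",
    "max-image-preview",
    "max-video-preview",
}


def header_blocks_googlebot(values: list[str]) -> bool:
    """Return True if any X-Robots-Tag value tells Googlebot not to index."""
    for value in values:
        scope, sep, rest = value.partition(":")
        scope = scope.strip().lower()
        if sep and "," not in scope and scope not in DIRECTIVES_WITH_VALUES:
            # A value like "otherbot: noindex" only applies to the named crawler.
            if scope != "googlebot":
                continue
            value = rest
        directives = {d.strip().lower() for d in value.split(",")}
        if directives & {"noindex", "none"}:
            return True
    return False


def fetch_x_robots_tag(url: str) -> list[str]:
    """Return every X-Robots-Tag header the server sends for url."""
    with urlopen(Request(url, method="HEAD")) as response:
        return response.headers.get_all("X-Robots-Tag") or []
```

Usage would look like header_blocks_googlebot(fetch_x_robots_tag("https://yourdomain.com/page/")), where the URL is a placeholder.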
Recommended reading: Robots meta tag and X-Robots-Tag HTTP header specifications
3) Include the page in your sitemap
A sitemap tells Google which pages on your site are important, and which aren’t. It may also give some guidance on how often they should be re-crawled.
Google should be able to find pages on your website regardless of whether they’re in your sitemap, but it’s still good practice to include them. After all, there’s no point making Google’s life difficult.
To check if a page is in your sitemap, use the URL inspection tool in Search Console. If you see the “URL is not on Google” error and “Sitemap: N/A,” then it isn’t in your sitemap or indexed.
Not using Search Console? Head to your sitemap URL—usually, yourdomain.com/sitemap.xml—and search for the page.
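If you’d rather script that check, here’s a minimal sketch using Python’s xml.etree.ElementTree. It assumes a standard <urlset> sitemap in the sitemaps.org namespace; a sitemap index file (one that lists other sitemaps) would need an extra step to follow each child sitemap. It also compares URLs exactly, so http vs. https or a missing trailing slash counts as a mismatch.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Return every <loc> value listed in a <urlset> sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def in_sitemap(sitemap_xml: str, url: str) -> bool:
    """Return True if url appears verbatim in the sitemap."""
    return url in sitemap_urls(sitemap_xml)
```

To check your live sitemap, you could download it with urllib.request.urlopen("https://yourdomain.com/sitemap.xml").read().decode("utf-8") and pass the result in.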
Or, if you want to find all the crawlable and indexable pages that aren’t in your sitemap, run a crawl in Ahrefs’ Site Audit. Go to Page Explorer and apply these filters:
These pages should be in your sitemap, so add them. Once done, let Google know that you’ve updated your sitemap by pinging this URL:
Replace that last part with your sitemap URL. You should then see something like this:
That should speed up Google’s indexing of the page.
4) Remove rogue canonical tags
A canonical tag tells Google which is the preferred version of a page. It looks something like this:
<link rel="canonical" href="/page.html/">
Most pages either have no canonical tag, or what’s called a self-referencing canonical tag. That tells Google the page itself is the preferred and probably the only version. In other words, you want this page to be indexed.
But if your page has a rogue canonical tag, then it could be telling Google about a preferred version of this page that doesn’t exist. In that case, your page won’t get indexed.
To check for a canonical, use Google’s URL inspection tool. You’ll see an “Alternate page with canonical tag” warning if the canonical points to another page.
If this shouldn’t be there, and you want to index the page, remove the canonical tag.
Canonical tags aren’t always bad. Most pages with these tags will have them for a reason. If you see that your page has a canonical set, then check the canonical page. If this is indeed the preferred version of the page, and there’s no need to index the page in question as well, then the canonical tag should stay.
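To check a page’s canonical outside Search Console, here’s a minimal sketch using Python’s standard library. The canonical_status helper is hypothetical. It resolves a relative href against the page URL with urljoin, then compares the result to the page URL exactly, without any further normalization. If a page has more than one canonical tag, it only looks at the first.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Collects href values from <link rel="canonical"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def canonical_status(page_url: str, html: str) -> str:
    """Describe whether the page's canonical points to itself or elsewhere."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.canonicals:
        return "no canonical"
    target = urljoin(page_url, finder.canonicals[0])  # resolve relative hrefs
    if target == page_url:
        return "self-referencing"
    return f"points elsewhere: {target}"
```

Anything that comes back as “points elsewhere” is worth a manual look: either the target really is the preferred version, or the tag is rogue.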
If you want a quick way to find rogue canonical tags across your entire site, run a crawl in Ahrefs’ Site Audit tool. Go to the Page Explorer. Use these settings:
This looks for pages in your sitemap with non-self-referencing canonical tags. Because you almost certainly want to index the pages in your sitemap, you should investigate further if this filter returns any results.
It’s highly likely that these pages either have a rogue canonical or shouldn’t be in your sitemap in the first place.
5) Check that the page isn’t orphaned
Orphan pages are those without internal links pointing to them.
Because Google discovers new content by crawling the web, they’re unable to discover orphan pages through that process. Website visitors won’t be able to find them either.
To check for orphan pages, crawl your site with Ahrefs’ Site Audit. Next, check the Links report for “Orphan page (has no incoming internal links)” errors:
This shows all pages that are both indexable and present in your sitemap, yet have no internal links pointing to them.
This process only works when two things are true:
- All the pages you want indexing are in your sitemaps
- You checked the box to use the pages in your sitemaps as starting points for the crawl when setting up the project in Ahrefs’ Site Audit.
Not confident that all the pages you want to be indexed are in your sitemap? Try this:
- Download a full list of pages on your site (via your CMS)
- Crawl your website (using a tool like Ahrefs’ Site Audit)
- Cross-reference the two lists of URLs
Any URLs not found during the crawl are orphan pages.
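The cross-referencing step is just a set difference. Here’s a minimal sketch in Python, assuming you’ve exported each list as a plain text file with one URL per line. The helper names are hypothetical, and find_orphans treats URLs that differ only by a trailing slash as the same page, which may not hold on every site.

```python
from pathlib import Path


def read_urls(path: str) -> list[str]:
    """Read one URL per line from a text file, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def normalize(url: str) -> str:
    """Treat 'https://site.com/page' and 'https://site.com/page/' as the same URL."""
    return url.strip().rstrip("/")


def find_orphans(cms_urls: list[str], crawled_urls: list[str]) -> list[str]:
    """Return URLs your CMS knows about that the crawl never reached."""
    crawled = {normalize(url) for url in crawled_urls}
    return sorted(url for url in cms_urls if normalize(url) not in crawled)
```

With hypothetical exports named cms.txt and crawl.txt, you’d call find_orphans(read_urls("cms.txt"), read_urls("crawl.txt")).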
You can fix orphan pages in one of two ways:
- If the page is unimportant, delete it and remove it from your sitemap.
- If the page is important, incorporate it into the internal link structure of your website.
6) Fix nofollow internal links
Nofollow links are links with a rel="nofollow" attribute. They prevent the transfer of PageRank to the destination URL, and Google may not crawl them either.
Here’s what Google says about the matter:
Essentially, using nofollow causes us to drop the target links from our overall graph of the web. However, the target pages may still appear in our index if other sites link to them without using nofollow, or if the URLs are submitted to Google in a Sitemap.
In short, you should make sure that all internal links to indexable pages are followed.
To do this, use Ahrefs’ Site Audit tool to crawl your site. Check the Links report for indexable pages with “Page has nofollow incoming internal links only” errors:
Remove the nofollow tag from these internal links, assuming that you want Google to index the page. If not, either delete the page or noindex it.
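If you want to spot-check a single page’s HTML for nofollowed internal links, here’s a minimal sketch using Python’s html.parser. The helper names are hypothetical, and it counts a link as internal only when its hostname exactly matches the page’s, so links to other subdomains are treated as external.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class NofollowLinkFinder(HTMLParser):
    """Collects internal <a> links that carry rel="nofollow"."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.site = urlparse(page_url).netloc
        self.nofollow_internal: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        rel = (attributes.get("rel") or "").lower().split()
        if not href or "nofollow" not in rel:
            return
        target = urljoin(self.page_url, href)  # resolve relative links
        if urlparse(target).netloc == self.site:
            self.nofollow_internal.append(target)


def nofollow_internal_links(page_url: str, html: str) -> list[str]:
    """Return absolute URLs of internal links on the page marked nofollow."""
    finder = NofollowLinkFinder(page_url)
    finder.feed(html)
    return finder.nofollow_internal
```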
Recommended reading: What Is a Nofollow Link? Everything You Need to Know (No Jargon!)
7) Add “powerful” internal links
Google discovers new content by crawling your website. If you neglect to internally link to the page in question, then they may not be able to find it.
One easy solution to this problem is to add some internal links to the page. You can do that from any other web page that Google can crawl and index. However, if you want Google to index the page as fast as possible, it makes sense to do so from one of your more “powerful” pages.
Why? Because Google is likely to recrawl such pages faster than less important pages.
To do this, head over to Ahrefs’ Site Explorer, enter your domain, then visit the Best by links report.
This shows all the pages on your website sorted by URL Rating (UR). In other words, it shows the most authoritative pages first.
Skim this list and look for relevant pages from which to add internal links to the page in question.
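If you export that report, a few lines of Python can shortlist candidates for you. This is a hypothetical sketch: it assumes a CSV export with columns named URL and UR (check the headers in your actual export and adjust), and it uses a simple keyword match on the URL as a stand-in for “relevant,” which is no substitute for actually reading the pages.

```python
import csv
import io


def strongest_relevant_pages(csv_text: str, topic: str, limit: int = 10) -> list[tuple[str, float]]:
    """Return up to `limit` (URL, UR) pairs whose URL mentions `topic`, highest UR first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Column names "URL" and "UR" are assumptions about the export format.
    relevant = [row for row in rows if topic.lower() in row["URL"].lower()]
    relevant.sort(key=lambda row: float(row["UR"]), reverse=True)
    return [(row["URL"], float(row["UR"])) for row in relevant[:limit]]
```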
For example, if we were looking to add an internal link to our guest posting guide, our link building guide would probably offer a relevant place from which to do so. And that page just so happens to be the 11th most authoritative page on our blog:
Google will then see and follow that link next time they recrawl the page.
Paste the URL of the page from which you added the internal link into Google’s URL inspection tool. Hit the “Request indexing” button to let Google know that something on the page has changed and that they should recrawl it as soon as possible. This may speed up the process of them discovering the internal link and, consequently, the page you want indexing.
8) Make sure the page is valuable and unique
Google is unlikely to index low-quality pages because they hold no value for its users. Here’s what Google’s John Mueller said about indexing in 2018:
We never index all known URLs, that’s pretty normal. I’d focus on making the site awesome and inspiring, then things usually work out better.
— 🍌 John 🍌 (@JohnMu) January 3, 2018
He implies that if you want Google to index your website or web page, it needs to be “awesome and inspiring.”
If you’ve ruled out technical issues for the lack of indexing, then a lack of value could be the culprit. For that reason, it’s worth reviewing the page with fresh eyes and asking yourself: Is this page genuinely valuable? Would a user find value in this page if they clicked on it from the search results?
If the answer is no to either of those questions, then you need to improve your content.
This will return “thin” pages that are indexable and currently get no organic traffic. In other words, there’s a decent chance they aren’t indexed.
Export the report, then paste all the URLs into URL Profiler and run a Google Indexation check.
It’s recommended to use proxies if you’re doing this for lots of pages (i.e., over 100). Otherwise, you run the risk of your IP getting banned by Google. If you can’t do that, then another alternative is to search Google for a “free bulk Google indexation checker.” There are a few of these tools around, but most of them are limited to fewer than 25 pages at a time.
Check any non-indexed pages for quality issues. Improve where necessary, then request reindexing in Google Search Console.
You should also aim to fix issues with duplicate content. Google is unlikely to index duplicate or near-duplicate pages. Use the Duplicate content report in Site Audit to check for these issues.
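This isn’t how Ahrefs’ Site Audit works under the hood, but if you want to sanity-check a handful of pages yourself, a common and simple technique is to break each page’s text into overlapping five-word “shingles” and compare pages by Jaccard similarity (shared shingles divided by all distinct shingles). The 0.8 threshold and the helper names below are arbitrary assumptions; tune them against pages you already know are duplicates.

```python
import re
from itertools import combinations


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared items divided by all distinct items."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(pages: dict[str, str], threshold: float = 0.8, size: int = 5):
    """List (url_a, url_b, score) pairs whose shingle sets overlap heavily."""
    sets = {url: shingles(text, size) for url, text in pages.items()}
    pairs = []
    for (url_a, set_a), (url_b, set_b) in combinations(sets.items(), 2):
        score = jaccard(set_a, set_b)
        if score >= threshold:
            pairs.append((url_a, url_b, round(score, 2)))
    return pairs
```

One caveat: two twelve-word product blurbs that differ by a single word score only 0.6 with five-word shingles, so short pages may need a lower threshold or smaller shingles.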
9) Remove low-quality pages (to optimize “crawl budget”)
Having too many low-quality pages on your website serves only to waste crawl budget.
Here’s what Google says on the matter:
Wasting server resources on [low-value-add pages] will drain crawl activity from pages that do actually have value, which may cause a significant delay in discovering great content on a site.
Think of it like a teacher grading essays, one of which is yours. If they have ten essays to grade, they’re going to get to yours quite quickly. If they have a hundred, it’ll take them a bit longer. If they have thousands, their workload is too high, and they may never get around to grading your essay.
Google does state that “crawl budget […] is not something most publishers have to worry about,” and that “if a site has fewer than a few thousand URLs, most of the time it will be crawled efficiently.”
Still, removing low-quality pages from your website is never a bad thing. It can only have a positive effect on crawl budget.
You can use our content audit template to find potentially low-quality and irrelevant pages that can be deleted.
10) Build high-quality backlinks
Backlinks tell Google that a web page is important. After all, if someone is linking to it, then it must hold some value. These are pages that Google wants to index.
For full transparency, Google doesn’t only index web pages with backlinks. There are plenty (billions) of indexed pages with no backlinks. However, because Google sees pages with high-quality links as more important, they’re likely to crawl—and re-crawl—such pages faster than those without. That leads to faster indexing.
We have plenty of resources on building high-quality backlinks on the blog.
Take a look at a few of the guides below.
Indexing ≠ ranking
Having your website or web page indexed in Google doesn’t equate to rankings or traffic.
They’re two different things.
Indexing means that Google is aware of your website. It doesn’t mean they’re going to rank it for any relevant and worthwhile queries.
That’s where SEO comes in—the art of optimizing your web pages to rank for specific queries.
In short, SEO involves:
- Finding what your customers are searching for;
- Creating content around those topics;
- Optimizing those pages for your target keywords;
- Building backlinks;
- Regularly republishing content to keep it “evergreen.”
Here’s a video to get you started with SEO:
… and some articles:
There are only two possible reasons why Google isn’t indexing your website or web page:
- Technical issues are hindering them from doing so
- They see your site or page as low-quality and worthless to their users.
It’s entirely possible that both of those issues exist. However, I would say that technical issues are far more common. Technical issues can also lead to the auto-generation of indexable low-quality content (e.g., problems with faceted navigation). That isn’t good.
Still, running through the checklist above should solve the indexation issue nine times out of ten.
Just remember that indexing ≠ ranking. SEO is still vital if you want to rank for any worthwhile search queries and attract a constant stream of organic traffic.