
Meet the New Web Crawlers: AI Bots Are Closing in on Search Engine Bots

Patrick Stox
Patrick Stox is a Product Advisor, Technical SEO, & Brand Ambassador at Ahrefs. He was the lead author for the SEO chapter of the 2021 Web Almanac and a reviewer for the 2022 SEO chapter. He also co-wrote the SEO Book For Beginners by Ahrefs and was the Technical Review Editor for The Art of SEO 4th Edition. He organizes several groups, including the Raleigh SEO Meetup (the most successful SEO Meetup in the US), the Beer and SEO Meetup, the Raleigh SEO Conference, and Tech SEO Connect. He also runs a Technical SEO Slack group and moderates /r/TechSEO on Reddit.
The web is being crawled more than ever, but it’s no longer just search engines doing the crawling. AI bots are soaking up all the data they can to train and power their AI assistants.

The problem is that AI bots are new to web crawling. They’re still making some silly crawl decisions, bumping around, and wasting a lot of resources. We’ve already seen more of the most-trafficked websites start to block AI bots.

I have no doubt the AI bots will figure things out, but it will likely be a painful process. I don’t think any of them have even started rendering content yet.

Bing got a lot of blowback in the past for its aggressive crawling. They fixed it, started IndexNow, and now they may be the most efficient web crawler.

Check out IndexNow

If you haven’t set up IndexNow for your website, you should.

If you’re a CDN, a CMS, an AI bot, or Google, you should also be getting in on this. Save resources, save money, and save the planet. It’s good for you, your users, website owners, website hosts, the web, and the planet.

It’s estimated that 53% of crawler traffic is wasted effort, and IndexNow can help cut that number down to almost nothing.
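For readers who want to see what an IndexNow submission looks like, here is a minimal sketch in Python. It assumes the shared `api.indexnow.org` endpoint and a key file hosted at the site root as `<key>.txt`. The host, key, and URLs are placeholders, and the helper names (`build_indexnow_payload`, `submit_to_indexnow`) are made up for this illustration. Check the IndexNow documentation before relying on it.

```python
import json
import urllib.request

# Shared IndexNow endpoint; participating search engines also expose their own.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_payload(host, key, urls):
    """Build the JSON body for a bulk IndexNow submission."""
    return {
        "host": host,
        "key": key,
        # Assumption: the key file lives at the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }


def submit_to_indexnow(payload, endpoint=INDEXNOW_ENDPOINT):
    """POST the payload and return the HTTP status code."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder values; replace with your own host, key, and changed URLs.
    payload = build_indexnow_payload(
        "www.example.com",
        "your-indexnow-key",
        ["https://www.example.com/new-page", "https://www.example.com/updated-page"],
    )
    print(json.dumps(payload, indent=2))
    # Uncomment once the key file is live on your site:
    # print(submit_to_indexnow(payload))
```

The idea is to call this only when a URL is added, updated, or deleted, so crawlers learn about changes without re-crawling pages that haven’t changed.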

By looking at data from Cloudflare Radar, which reflects a massive slice of internet traffic, we can see how AI bots compare to search engine bots and SEO bots. Cloudflare handles ~20% of all internet traffic, so this is a large and representative dataset for overall web crawling.

Let’s dive in.

Search engine bots crawl the most. AI bots are relatively new to the web, but they are firmly in second place despite there being fewer of them. If more AI assistants are launched and their bots start crawling the web, they’ll probably pass search engines in crawling. At the rate they’re growing, this could happen in the next couple of years.

Share of bot requests: search engine bots 34.6%, AI bots 18.9%, SEO bots 7.9%

If we break out some of the individual bots, you can see that Googlebot is eating the internet much more than other bots.

Notice that Google’s AI bot also has a fast crawl rate, and so does Google Images. There are many more Google crawlers, each performing a different task.

Amazonbot and GPTBot are both on par with crawling from Bing.

AhrefsBot is the most active bot by far in the SEO industry. Some other platforms like to claim they crawl faster, but all the evidence says that’s not true. Ahrefs makes nearly as many requests as all 20 other SEO bots combined. We’d beat a combination of 19 of them.

Despite our crawl rate, I’d guarantee that we’re also the most efficient crawler of any SEO bot. We’ve solved a lot of issues that I still see present in these other crawlers. We’re also getting the data from IndexNow via our search engine, Yep, to help us be even more efficient. At this point, maybe we should be classified as a search engine bot instead.

Share of bot requests for top bots: Google and Bing for search engines, Amazon and GPT for AI bots, Ahrefs for SEO bots
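If you want a rough version of this breakdown for your own site, you can group requests in your server logs by user-agent token. The sketch below is an illustration only: the token-to-category map covers just the five bots named above, the function names are hypothetical, and it is not how Cloudflare Radar classifies traffic. User-agent strings can also be spoofed, so treat the result as an estimate.

```python
from collections import Counter

# Assumption: a small, hand-made map of user-agent tokens to categories,
# limited to the bots named in this article.
BOT_CATEGORIES = {
    "googlebot": "search",
    "bingbot": "search",
    "gptbot": "ai",
    "amazonbot": "ai",
    "ahrefsbot": "seo",
}


def classify_user_agent(user_agent):
    """Return the category for a user-agent string, or 'other' if unknown."""
    ua = user_agent.lower()
    for token, category in BOT_CATEGORIES.items():
        if token in ua:
            return category
    return "other"


def share_by_category(user_agents):
    """Return each category's share of requests as a percentage."""
    counts = Counter(classify_user_agent(ua) for ua in user_agents)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}


if __name__ == "__main__":
    # Illustrative user-agent strings, not real log data.
    sample = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
        "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
        "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/125.0 Safari/537.36",
    ]
    print(share_by_category(sample))
```

Feed it the user-agent column from your access logs to see how search, AI, and SEO bots compare on your own site.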

AI bots have ramped up crawling this year. They were nearly 25% of all bot requests in May, just 10 percentage points below search bots. That’s amazing to me considering that Googlebot is so dominant overall and there are about half as many AI bots as there are search engine bots.

Share of bot requests over time for search engine, AI, and SEO bots

Googlebot is by far the most dominant bot on the web. GPTBot is charging ahead this year. I’m curious whether it will close the gap further in the future.

Share of bot requests for individual bots. Top bots include Googlebot, GPTBot, Bingbot, and GoogleOther

Final thoughts

There’s a cost to bots crawling your websites. There’s also a social contract between search engines and website owners, where search engines add value by sending referral traffic to websites. Google sends traffic (for now), so they don’t get blocked.

This contract extends to AI assistants. Send traffic to websites, give website owners data on searches and the visibility of their website in your system for reporting, or you will be blocked at the rate you’re crawling. We’re already seeing it happen.

Explicit blocks of AI bots on the top 1 million websites by traffic

It’s not surprising given that AI search traffic is only 0.1% of the overall traffic to websites. Send more traffic to websites and provide more value to website owners.
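Explicit blocks like these usually live in a site’s robots.txt file. As a rough sketch, Python’s standard-library `urllib.robotparser` can tell you which bots a given robots.txt blocks. The sample rules below are hypothetical and not taken from any real site, and `blocked_agents` is a helper name invented for this example.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt, agents, path="/"):
    """Return the user agents that robots_txt disallows from fetching path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


# Hypothetical robots.txt that blocks GPTBot and allows everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    bots = ["GPTBot", "Amazonbot", "Googlebot"]
    print(blocked_agents(SAMPLE_ROBOTS_TXT, bots))
```

Keep in mind that robots.txt is a request, not an enforcement mechanism. A bot that ignores it has to be blocked at the server or CDN level.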

If you have any questions, ask me on LinkedIn or X.