“Fresh index, good coverage, not get blocked. Choose two.”
– Dmytro Gerasymenko, Founder & CEO of Ahrefs
When building a big index, there’s always a trade-off between freshness and good coverage.
Freshness means running regular crawls to keep information up to date. Good coverage means crawling as many pages as possible. But you can’t run both at full capacity; otherwise, you’d get blocked by webmasters and hosting companies.
The answer is to implement a crawl budget: the number of URLs a crawler can and wants to crawl.
Crawl budget is composed of two parts: crawl rate and crawl demand.
Crawl rate refers to the number of requests a crawler can make to a site when crawling it.
Crawling a website too fast can overload its server. Since this can lead to poor user experience or get our crawler blocked, our crawl rate takes into account:
Page speed – Faster-loading pages are preferred to slower-loading ones.
Website size – Small websites with high-quality links will most likely be crawled in full, whereas larger websites with low-quality links might be only partially crawled.
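To make the idea of a load-aware crawl rate concrete, here is a minimal Python sketch of one common approach: waiting longer between requests to hosts that respond slowly. It is not our actual implementation. The function name `crawl_delay` and the parameters `min_delay`, `max_delay`, and `load_factor` are assumptions chosen for illustration.

```python
def crawl_delay(response_seconds, min_delay=1.0, max_delay=60.0, load_factor=10.0):
    """Return how many seconds to wait before the next request to the same host.

    Hypothetical rule: wait `load_factor` times as long as the last response
    took, clamped to the range [min_delay, max_delay]. Slow servers therefore
    get fewer requests per minute than fast ones.
    """
    if response_seconds < 0:
        raise ValueError("response_seconds must be non-negative")
    delay = response_seconds * load_factor
    return max(min_delay, min(max_delay, delay))


# A fast host is crawled at the minimum delay, a slow one is throttled,
# and a very slow one hits the upper bound.
fast = crawl_delay(0.05)
slow = crawl_delay(2.0)
very_slow = crawl_delay(10.0)
```

Clamping keeps the crawler polite toward fast servers (it never goes below `min_delay`) while still allowing an occasional retry against very slow ones.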
Crawl demand, or crawl priority, represents the level of importance attached to crawling and recrawling pages on a website.
Our scheduler handles this prioritization. It determines crawl demand based on:
URL popularity (URL Rating) – The higher the quality of the backlinks pointing to a page, the higher the priority.
Website popularity (Domain Rating) – The higher the strength of a website’s backlink profile, the higher the priority.
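The two signals above could feed a priority queue along the following lines. This is an illustrative Python sketch only, not a description of our scheduler. The class `CrawlScheduler`, the weighted-sum formula, and the weights `ur_weight` and `dr_weight` are all assumptions.

```python
import heapq
import itertools


class CrawlScheduler:
    """Toy scheduler that pops the highest-priority URL first."""

    def __init__(self, ur_weight=0.7, dr_weight=0.3):
        # The weights are hypothetical; a real scheduler would tune them.
        self.ur_weight = ur_weight
        self.dr_weight = dr_weight
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: insertion order

    def priority(self, url_rating, domain_rating):
        # Assumed formula: a weighted sum of URL Rating and Domain Rating.
        return self.ur_weight * url_rating + self.dr_weight * domain_rating

    def add(self, url, url_rating, domain_rating):
        score = self.priority(url_rating, domain_rating)
        # heapq is a min-heap, so the score is negated to pop the largest first.
        heapq.heappush(self._heap, (-score, next(self._counter), url))

    def pop(self):
        _, _, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)


scheduler = CrawlScheduler()
scheduler.add("https://example.com/a", url_rating=10, domain_rating=80)
scheduler.add("https://example.org/b", url_rating=60, domain_rating=40)
scheduler.add("https://example.net/c", url_rating=30, domain_rating=30)

crawl_order = [scheduler.pop() for _ in range(len(scheduler))]
```

With these example weights, the page with the strongest own backlinks (`/b`) is crawled first, even though `/a` sits on a stronger domain.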