And now it does!
Crawling the static HTML is a pretty straightforward task. In simplest terms, the web crawler just reads the source code of a page. All the text content and all the links are there.
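To make "reads the source code" concrete, here is a minimal sketch of static link extraction using only Python's standard library. It is not our crawler's actual code; the `LinkExtractor` class, the `extract_links` helper and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags found in raw HTML source."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_source):
    """Return every <a href> value present in the raw source, in order."""
    parser = LinkExtractor()
    parser.feed(html_source)
    parser.close()
    return parser.links


# Hypothetical page source, for illustration only.
SAMPLE_SOURCE = """
<html><body>
  <p>Some text with a <a href="https://example.com/guide">link</a>.</p>
  <a href="/about">About</a>
</body></html>
"""

print(extract_links(SAMPLE_SOURCE))
```

A parser like this only ever sees what is literally in the source. No scripts are run, so anything a script would add to the page later stays invisible to it.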
I will use this page as an example: The Complete List of Link Building Tactics by Jon Cooper, PointBlankSEO.
Ahrefs Site Explorer says that it has a link to ahrefs.com.
And if you open that page you will see it’s there.
But if you try to find a mention of “ahrefs” in the source code of a page…
Not a single mention.
Our Site Explorer was smart enough to report this link because not so long ago it started rendering JS (in much the same way Google does).
Now our Site Audit tool can do that too!
So today, a website built with JS is not a problem for our Site Audit tool. It will detect 100+ pre-configured SEO issues on such websites, just as it does for static HTML sites.
How does it work?
Generally, our crawler sees a JS-powered website the same way it is rendered in a visitor’s browser.
Here’s the oversimplified sequence:
- The Document Object Model (DOM) of the page is loaded. The DOM is the basis of the dynamically generated page.
- The scripts and resources required to render a page are loaded.
- The Site Audit crawler waits 3 seconds and then takes a snapshot of the generated HTML code.
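The sequence above can be sketched as a toy simulation. This is not a real browser and not our crawler's implementation: `FakePage` and `crawl` are hypothetical names, "scripts" are just timers that append HTML to a list, and the wait is scaled down from 3 seconds so the example runs quickly.

```python
import asyncio


class FakePage:
    """Toy stand-in for a browser page: each 'script' appends HTML after a delay."""

    def __init__(self, initial_html, scripts):
        self.dom = [initial_html]
        self.scripts = scripts  # list of (delay_seconds, html_fragment)

    async def _run_script(self, delay, fragment):
        await asyncio.sleep(delay)
        self.dom.append(fragment)

    def start_scripts(self):
        # Must be called from inside a running event loop.
        return [
            asyncio.create_task(self._run_script(delay, fragment))
            for delay, fragment in self.scripts
        ]

    def snapshot(self):
        return "".join(self.dom)


async def crawl(page, wait_seconds=3.0):
    """Load the page, start its scripts, wait a fixed time, then snapshot the HTML."""
    tasks = page.start_scripts()
    await asyncio.sleep(wait_seconds)
    html = page.snapshot()
    # Anything still pending after the snapshot is simply dropped.
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return html


# Scaled-down demo: 0.1 s stands in for the 3-second wait.
demo_page = FakePage(
    "<div id='app'></div>",
    [(0.01, "<a href='/early'>early</a>"), (0.5, "<a href='/late'>late</a>")],
)
demo_html = asyncio.run(crawl(demo_page, wait_seconds=0.1))
print("/early" in demo_html, "/late" in demo_html)
```

In the demo, the fragment added before the wait ends makes it into the snapshot, and the one scheduled for later does not.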
That’s why crawling dynamic HTML requires far more resources than crawling static pages.
But we have these resources here at Ahrefs!
Rendering doesn’t capture everything, though. In some cases, our crawler will not fetch all the possible data from a dynamically generated page.
- Some scripts execute well after the page loads. If they generate code after our crawler takes its snapshot of the HTML, that code won’t be crawled.
- There are scroll-triggered and click-triggered scripts. The Site Audit crawler won’t simulate the scroll-down or click actions that a page may use to trigger scripts. Facebook’s infinite scroll, where more and more content appears as you scroll down, is a good example.
So if the link only shows up in the cases described above, it won’t be found and followed by our crawler.
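Those two limitations boil down to a simple rule of thumb, sketched below. The `InjectedLink` type, the trigger names and the `is_discoverable` helper are hypothetical and only model the behavior described above. Treating a link that appears exactly at the 3-second mark as missed is an assumption.

```python
from dataclasses import dataclass

SNAPSHOT_DELAY = 3.0  # seconds the crawler waits before taking the snapshot


@dataclass
class InjectedLink:
    href: str
    trigger: str  # "load", "scroll" or "click" (illustrative labels)
    delay: float  # seconds after page load at which the link appears


def is_discoverable(link, snapshot_delay=SNAPSHOT_DELAY):
    """True if the link would be in the HTML snapshot under this simplified model."""
    # Scroll and click actions are never simulated, so those scripts never run.
    if link.trigger != "load":
        return False
    # The link must already be in the DOM when the snapshot is taken.
    return link.delay < snapshot_delay


links = [
    InjectedLink("/menu", "load", 0.4),
    InjectedLink("/lazy-widget", "load", 5.0),
    InjectedLink("/page-2", "scroll", 0.0),
    InjectedLink("/modal", "click", 0.0),
]
found = [link.href for link in links if is_discoverable(link)]
print(found)
```

If an important link on your site falls into one of the missed categories, placing it in the initial markup or rendering it on load is the straightforward way to make sure it gets crawled.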
Does it crawl websites built with modern JS frameworks?
Yes. The same way our Site Explorer crawler does.
Will it trigger trackers and ads?
No. Our crawler will not execute tracking code such as Google Analytics or Matomo (formerly Piwik).
Ads will not be triggered either.
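This post doesn't go into how that is achieved. One common approach in rendering crawlers is to skip requests for known tracker scripts before they load, and the sketch below assumes that approach. The blocklist entries and the `should_skip` helper are illustrative, not the rules Site Audit actually uses.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist for illustration; a real one would be far longer.
BLOCKED_HOSTS = {"www.google-analytics.com", "www.googletagmanager.com"}
BLOCKED_FILENAMES = {"matomo.js", "piwik.js"}


def should_skip(resource_url):
    """Return True if a resource request looks like a known tracker script."""
    parts = urlsplit(resource_url)
    host = (parts.hostname or "").lower()
    # Last path segment, e.g. "matomo.js" for a self-hosted Matomo install.
    filename = parts.path.rsplit("/", 1)[-1].lower()
    return host in BLOCKED_HOSTS or filename in BLOCKED_FILENAMES


for url in [
    "https://www.google-analytics.com/analytics.js",
    "https://stats.example.com/matomo.js",
    "https://example.com/static/app.js",
]:
    print(url, should_skip(url))
```

Matching on the file name as well as the host is a deliberate choice here, since Matomo is often self-hosted and its domain can't be known in advance.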
What else should you know to crawl JS-powered websites with the Site Audit tool?
Such crawls might put a higher load on your web server, because the crawler requests more resources, such as JS files.
We promise you that this is not the last feature added to our Site Audit tool. It has a very solid roadmap. And if you’d like to contribute, you can suggest more features here.