How to Search Through the Source Code of the Entire Website

Nick Churick
Nick is one of our Product Marketers and coincidentally he's also a pretty skilled writer. So there you have it - he's now a regular contributor to our blog.
Article Performance
  • Linking websites
    16

The number of websites linking to this post.

This post's estimated monthly organic search traffic.

    Ahrefs Site Audit, also available as part of the free Ahrefs Webmaster Tools, allows you to search through the raw HTML code or the JS-rendered code across all crawled pages of the website.

    This feature is particularly useful when you need to verify analytics tags, identify pages that call certain scripts or stylesheets, detect unwanted injections into the page code, or research competitors’ technologies.

    It is important to understand that in the era of JavaScript-powered websites, the page code can exist in two forms:

    Raw (Source): the HTML code before any JavaScript on the page has been executed. This is what you see using the “View Page Source” feature in the browser.

    Rendered: the final HTML code after being altered/generated by JavaScript. It is visible in the “Inspect” mode in the browser.

    The source and rendered versions can be significantly different, so it’s important to ensure you’re searching through the correct version of the page code.
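To make the difference concrete, here is a minimal Python sketch (not part of Ahrefs; the function name `locate_snippet` and the sample page are hypothetical). It assumes you already have both versions of a page as strings and simply reports which version contains a given piece of code.

```python
def locate_snippet(raw_html: str, rendered_html: str, snippet: str) -> str:
    """Report which version(s) of a page contain the snippet."""
    in_raw = snippet in raw_html
    in_rendered = snippet in rendered_html
    if in_raw and in_rendered:
        return "both"
    if in_rendered:
        return "rendered only"
    if in_raw:
        return "raw only"
    return "neither"


# Hypothetical page: the raw HTML ships an empty container plus a script,
# and that script injects a table when it runs in the browser.
raw = '<div id="app"></div><script src="/app.js"></script>'
rendered = (
    '<div id="app"><table><tr><td>1</td></tr></table></div>'
    '<script src="/app.js"></script>'
)

print(locate_snippet(raw, rendered, "<table"))   # rendered only
print(locate_snippet(raw, rendered, "/app.js"))  # both
```

In this made-up example, a search for the table in the raw HTML would return nothing, even though the table is visible to visitors.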

    How to search through the rendered code of the pages

    If you need to search through the JS-rendered HTML code of all the pages on the website, run a crawl in Site Audit or Ahrefs Webmaster Tools. Ensure that the “Execute JavaScript” option is activated in the crawl settings.

    Execute JavaScript setting

    Once the crawl is complete, go to the Page Explorer and access the Advanced filter. Select ‘Page source’ followed by ‘Contains’ from the dropdown menu. Then, enter the specific piece of code you are searching for.

    Advanced filter

    The example above finds all pages on our blog that contain an embedded table.
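Conceptually, a "contains" filter is a substring match applied to every crawled page. The sketch below is only an illustration of that idea, not how Site Audit is implemented: `pages_containing`, the `case_sensitive` flag, and the example.com URLs are all assumptions, and the case-insensitive default is a choice made for this sketch.

```python
def pages_containing(
    pages: dict[str, str], snippet: str, case_sensitive: bool = False
) -> list[str]:
    """Return the URLs whose HTML contains the snippet, sorted alphabetically."""
    needle = snippet if case_sensitive else snippet.lower()
    matches = []
    for url, html in pages.items():
        haystack = html if case_sensitive else html.lower()
        if needle in haystack:
            matches.append(url)
    return sorted(matches)


# Hypothetical crawl result: URL -> page code.
crawled = {
    "https://example.com/blog/post-1": "<article><table><tr><td>A</td></tr></table></article>",
    "https://example.com/blog/post-2": "<article><p>No tables here.</p></article>",
    "https://example.com/blog/post-3": "<article><TABLE class='data'></TABLE></article>",
}

for url in pages_containing(crawled, "<table"):
    print(url)
```

Searching for `<table` rather than the word `table` keeps the second post out of the results, which mentions tables only in its text.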

    How to search through the raw HTML of the pages

    Searching through the raw HTML (also called source HTML) requires a few extra actions:

    1. Disable JavaScript rendering in the crawl settings.

    Execute JavaScript setting - off

    2. Ensure discoverability of all pages by the crawler.

    This is crucial for websites where page content (including the internal links) is generated via JavaScript, because the AhrefsSiteAudit bot may not discover all pages from the raw HTML code alone.

    That’s why you need to supply the Site Audit tool with a list of input URLs that we call “Seeds.”

    The easiest way to do that is to make sure that Sitemaps are selected in the “URL Sources.” If that’s not feasible, use the Custom URL list instead.
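If you need to build a custom list of seed URLs yourself, one option is to extract them from an existing XML sitemap. The following is a minimal sketch using only the Python standard library; the function name `seed_urls_from_sitemap` and the sample sitemap are hypothetical, and it assumes a plain `urlset` sitemap (not a sitemap index) and that the list is wanted as one URL per line.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def seed_urls_from_sitemap(sitemap_xml: str) -> list[str]:
    """Extract every <loc> value from a <urlset> sitemap, in document order."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


# Hypothetical sitemap content.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
  <url><loc> https://example.com/blog/ </loc></url>
</urlset>"""

# One URL per line, ready to paste into a URL list.
print("\n".join(seed_urls_from_sitemap(sample_sitemap)))
```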

    URL Sources

    When the crawl is finished, use the advanced filter to search through the source code of all crawled pages.
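A common use of this search is checking that an analytics tag is present in the raw HTML of every page. As a rough offline illustration of that check (not an Ahrefs feature or API), the sketch below parses each page with Python's built-in `html.parser` and lists the pages where no `<script src>` matches a given fragment. The names `ScriptSrcCollector` and `pages_missing_script`, the URLs, and the `G-EXAMPLE` ID are placeholders.

```python
from html.parser import HTMLParser


class ScriptSrcCollector(HTMLParser):
    """Collect the src attribute of every <script> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def pages_missing_script(pages: dict[str, str], script_fragment: str) -> list[str]:
    """Return URLs whose HTML has no <script src> containing script_fragment."""
    missing = []
    for url, html in pages.items():
        collector = ScriptSrcCollector()
        collector.feed(html)
        if not any(script_fragment in src for src in collector.sources):
            missing.append(url)
    return sorted(missing)


# Hypothetical raw HTML for two pages; the second lacks the analytics tag.
raw_pages = {
    "https://example.com/a": (
        '<head><script async '
        'src="https://www.googletagmanager.com/gtag/js?id=G-EXAMPLE"></script></head>'
    ),
    "https://example.com/b": '<head><script src="/static/app.js"></script></head>',
}

print(pages_missing_script(raw_pages, "googletagmanager.com/gtag/js"))
```

Parsing the tags instead of doing a plain text match avoids false positives when the script URL is only mentioned in a comment or in page text.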
