Your Polluted URLs Stink: A Guide to Cleaning up the Mess

Tad Chef

Tad Chef writes for SEO blogs from all over the world including his own one called SEO 2.0. He helps people with blogs, social media and search, both in German and English. You can follow Tad on Twitter @onreact_com to get his latest insights daily.



    Marketers love to add all kinds of crap parameters to your Internet addresses (URLs). Especially

    third-party marketing tools like Buffer pollute your URLs

    • until they look really ugly
    • damage the user experience
    • and literally stink.

    Why and how to clean up the mess others have left in the browser address bar.

     

    O as in Optimization vs M as in Marketing — What’s the difference?

    Over the years the difference between marketers and optimizers has been growing. While social media marketers (SMM) prefer to buy data on people from the likes of Facebook and then bombard them with personalized ads based on their private preferences, social media optimizers (SMO) have increasingly focused on supplying social media users with amazing content that enriches their lives and makes them happy.

    Search Engine Optimizers craft and optimize pillar content so that it spreads far and wide and consequently earns a lot of incoming links through that engagement.

    In contrast Search Engine Marketers (SEM) or PPC (Pay Per Click) marketers focus on buying Google Ads and counting the ROI (Return On Investment) of it by any means possible so they can prove their worth.

    In many cases both roles have been performed by the same people, many of whom are my friends. Recently, though, there has been a growing conflict of interest between the two approaches. One of the best ways to show that is how marketers damage the user experience and thwart the clean-URL best practices of both social media and search engine optimization.

     

    Web dev best practices overthrown by marketers

    For years it has been widely known among professionals ranging from web development to usability to information architecture that short, clean and self-explanatory file names and URLs are the way to go on the Web.

    URL parameters have been a bane of the past stemming from the early days of clumsy content management systems that wouldn’t work without adding lots of cryptic digits and characters to clean Web addresses.

    For

    • SEO (Search Engine Optimization)
    • UX (User Experience)
    • IA (Information Architecture)

    etc.

    a URL like this is the best:

    https://ahrefs.com/blog/url-clean-up/

    It’s short, clean and readable, contains keywords and, most importantly, tells you where you are (on the blog of Ahrefs.com, reading the clean URLs article).

    Sadly, automated social media marketing tools like Buffer will turn that URL into something like this without asking the blogger or website owner for permission:

    https://ahrefs.com/blog/url-clean-up/?utm_content=buffer7d676&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

    Yes, they will just add more characters to your Internet address than it originally had. Also please note how they insert two ads into your URL, advertising their own service (Buffer) through it.

    Adding insult to injury, they insert a whole website address into it, here Twitter.com, so that some people, especially those who only see part of the by now very long URL, might mistake their whereabouts. Why does the address bar say I’m on Twitter.com? I wanted to go to the Ahrefs blog!
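
    To see exactly what has been appended, the polluted address can be taken apart with the standard URL API available in browsers and Node.js. This is only a minimal sketch; the helper name listParameters is made up for illustration.

```javascript
// Hypothetical helper: list every query parameter found in an address.
function listParameters(address) {
  const url = new URL(address);
  // searchParams yields [name, value] pairs in order of appearance
  return Array.from(url.searchParams, ([name, value]) => ({ name, value }));
}

const polluted =
  "https://ahrefs.com/blog/url-clean-up/?utm_content=buffer7d676" +
  "&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer";

const parsed = new URL(polluted);
const cleanLength = (parsed.origin + parsed.pathname).length;
const addedLength = parsed.search.length;

// Print each appended parameter, then compare the two lengths
for (const { name, value } of listParameters(polluted)) {
  console.log(name + " = " + value);
}
console.log("clean URL:", cleanLength, "characters; added:", addedLength);
```

    Running it lists four utm_ parameters and shows that the appended part is longer than the clean address itself.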

     

    Is adding something to unsuspecting webmasters’ URLs really that harmful?

    Of course I do exaggerate a bit. One or two of these things by themselves wouldn’t do a lot of harm. Personally I love to tell people that I have sent them visitors. So I sometimes add parameters like ?via=seo2.us/ebook to my outgoing links.

    Note how I do something of value for those webmasters first (linking to them and sending them visitors) and only tell them about it by using one single parameter and a short URL in order not to mess up their original one too much.

    You might argue that all of this is irrelevant as long as the Website is still found using such a “polluted” URL but even that isn’t as safe as you might think. Recently I witnessed the following scenario on a popular marketing blog:

    visiting a blog post by using a Buffer polluted URL (with all those redundant parameters) I couldn’t see any content.

    The page started with the comment section. So I tested the URL without the parameters and guess what happened? The blog article was visible. Could that be a mistake of my own?

    I just assumed that it was only a temporary glitch and looked up the polluted URL version again, yet here there was still no content. I reported it to the author of the post (who only read it a few hours later and got angry at me because he couldn’t replicate it).

    It might have been a glitch lasting an hour or two but you see what happens. Even an old school Internet user like me starts wondering and looking at the strange address. Average users are even more likely to be confused.

     

    Google doesn’t care for the people

    Google’s growing disregard for SEO practitioners has also increasingly contributed to the URL mess. In their market leading Google Analytics software they make webmasters add random parameters without considering the side-effects of polluted URLs mentioned above.

    With more than half of websites globally using GA this issue is only getting worse especially given the enormous pressure to prove the ROI of any marketing discipline and beyond. You can even measure the ROI of your mom by now. Just kidding, but it’s close.

    Like Buffer, Google wants to prove the value of its ads by showing its customers that people have clicked them, and how many, while in reality half of the ads remain unseen, not to mention widespread click fraud. Thus adding parameters which are useless from the user’s point of view but lucrative from the marketing point of view will become more common in the future.

    Unless you don’t care that you leave users confused (just think of such a URL in a message sent to somebody in plain text), you will want to protect yourself against this type of URL hijacking.

    For my own purposes I created a very simple JavaScript years ago. I even mentioned it here on Ahrefs a while ago in another context. I had to take it down because it was vulnerable to spam bot attacks on WordPress.

     

    How to clean up parameter polluted URLs?

    So after the recent case of a hidden content glitch by way of URL pollution I decided to search for a more sophisticated solution. I’ve found one, albeit one that is not very popular yet. The post detailing it has barely been noticed on the Web, it seems. Yet the issue affects millions of websites and should be addressed (no pun intended here).

    In many cases website owners do not want to remove or clean up all parameters from their URLs.

    Some of them might be useful or needed for their blog or site to work properly. On my personal site over at onreact.com I simply remove all parameters no matter what, as I know that my site (which I coded myself) works perfectly without them. I just cut them off and redirect to the clean URL again:

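    // Preventing URL spoofing and polluting
    var url = location.href;
    // Position of the first "?", or -1 if the URL has no query string
    var p = url.indexOf("?");
    if (p >= 1)
    {
      // Keep only the part before the "?" (this also drops any #fragment after it)
      url = url.slice(0,p);
      // replace() redirects without adding a new entry to the browser history
      self.location.replace(url);
    }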
    

    As mentioned above, this will suffice for static sites that don’t use parameters but might backfire for WordPress users, since WordPress itself relies on query parameters such as ?p=123 for posts or ?s= for searches.
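
    For sites that do need some of their parameters, a more selective variant could drop only the utm_ tracking parameters and leave everything else alone. The following is a rough sketch under that assumption, not the script from my site; the function name stripTrackingParameters is invented for this example.

```javascript
// Hypothetical helper: remove only utm_* parameters, keep all others.
function stripTrackingParameters(address) {
  const url = new URL(address);
  // Copy the names first so deleting does not disturb the iteration
  for (const name of Array.from(url.searchParams.keys())) {
    if (name.startsWith("utm_")) {
      url.searchParams.delete(name);
    }
  }
  return url.toString();
}

// In a browser: redirect only if something was actually removed.
// The guard keeps the sketch runnable outside a browser as well.
if (typeof location !== "undefined") {
  const cleaned = stripTrackingParameters(location.href);
  if (cleaned !== location.href) {
    location.replace(cleaned);
  }
}
```

    A WordPress address like https://example.com/?p=123&utm_source=twitter.com would then end up as https://example.com/?p=123, so the post is still found. Keep in mind that redirecting this early happens before any analytics script has had a chance to count the campaign.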

     

    Tracking campaigns and fixing URLs after the fact

    Assuming that you’re both a marketer and an optimizer, as is most often the case in reality (I’m a notable exception as I don’t sell ads), you do not want to cripple the Google Analytics setup you probably use on your site as well.

    You can also first track your campaign and then still fix your URLs.

    This is what the following script by Jason Weathered and Henrik Nyh does. It adds a simple JavaScript function to your Google Analytics code to remove the unwanted parameters once they have been counted by GA:

    var _gaq = _gaq || [];
    _gaq.push(['_setAccount', 'UA-XXXXX-X']);
    _gaq.push(['_trackPageview']);
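    _gaq.push(function() {
      // Queued after _trackPageview, so it runs once the pageview has been counted.
      // Strip every utm_* parameter and turn a leftover leading "&" back into "?".
      var newPath = location.pathname + location.search.replace(/[?&]utm_[^?&]+/g, "").replace(/^&/, "?") + location.hash;
      // Rewrite the address bar without reloading the page (where supported)
      if (history.replaceState) history.replaceState(null, '', newPath);
    });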
    
    
    (function() {
      var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
      var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
    })();
    

     

    Put it between script tags. This piece of code has been around in the wild since 2011 (the updated version since 2013), so you can rest assured that it doesn’t do anything malicious.

    You have to replace your original Google Analytics tracking code on your site/blog with the one above, of course changing the UA-XXXXX-X part to your actual account number over at Google Analytics. I use the script on my SEO 2.0 blog and it removes the parameters flawlessly.
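
    The replacement logic of that script is easier to follow in isolation. Here the two replace() calls from the snippet are wrapped in a function (the name cleanSearch is made up) so they can be tried on a few sample query strings outside the browser.

```javascript
// Same two replace() calls as in the tracking snippet, applied to a
// query string passed in as an argument instead of location.search.
function cleanSearch(search) {
  return search
    .replace(/[?&]utm_[^?&]+/g, "") // remove each "?utm_..." or "&utm_..." pair
    .replace(/^&/, "?");            // a remaining parameter must start with "?"
}

console.log(cleanSearch("?utm_source=twitter.com&utm_medium=social")); // prints an empty line
console.log(cleanSearch("?p=123&utm_campaign=buffer"));                // ?p=123
console.log(cleanSearch("?utm_content=buffer7d676&p=123"));            // ?p=123
```

    The second replace() matters in the last case: once the leading utm_ parameter is gone, the string would otherwise start with "&" instead of "?".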

     

    * The “polluted URLs stink” illustration has been made by my friends from Freepik.com
