Usually I don’t accept the made-up Google term “unnatural links”. After all, which links are natural, those growing on trees from organic farming? It’s a nonsense term Google made up to discredit the SEO trade.
On the other hand, I recently discovered that my WordPress blog had accidentally created around 50,000 spam pages, or rather URLs, and the same number of unnatural links pointing to my blog commenters. Even I had to admit that these links really were unnatural by any standard.
How could that happen?
After all, I’m a seasoned blogger and SEO. I have been using WordPress for a decade now and have practiced SEO for almost as long.
Let this be a cautionary tale for everybody out there. You never know what quirks evolving technologies will develop and how they might affect you negatively in an ever-changing Web environment. Thank God there was Ahrefs.com, or I wouldn’t even have been able to confirm the issue.
When an online colleague of mine, Martin Harris, alerted me to the issue, I didn’t even understand the problem: What? I am spamming you with tens of thousands of links after you commented on my blog? The idea was so bizarre that I needed a visualization of it:
I checked other people who had commented on my blog and found even more links, sometimes up to 50,000.
What are clean URLs?
Let me explain the basics first so that everybody understands the problem. For Google the Web doesn’t consist of pages but of URLs, the Internet addresses of pages. What’s the difference? Well, a page can have more than one address. That’s a problem for search engines like Google, which is why Google coined another term, the so-called “duplicate content”: the same text appearing on two URLs. To you it might be one and the same page, but to Google it’s not.
I will use a prominent example. A typical page with more than one address looks like this:
http://www.nytimes.com/2013/09/01/magazine/googling-yourself-takes-on-a-whole-new-meaning.html?_r=1&pagewanted=all&
http://www.nytimes.com/2013/09/01/magazine/googling-yourself-takes-on-a-whole-new-meaning.html?_r=1
http://www.nytimes.com/2013/09/01/magazine/googling-yourself-takes-on-a-whole-new-meaning.html
See the difference? It’s the same New York Times article each time, but with so-called parameters added to the Internet address. The first parameter starts with a “?”; any additional ones are usually joined with an “&”.
What are parameters for? They usually help the software a website runs on function properly, especially old, clunky and outdated software. When I started out on the Web in the nineties and the first content management systems came up, it was commonplace to pollute your URLs with all kinds of parameters, sometimes dozens of them.
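A browser’s own URL parser makes that split visible. Here is a tiny, purely illustrative JavaScript snippet, not part of any site mentioned here, that you can paste into the browser console:

// Illustration only: the parser separates the page itself from its parameters.
var article = new URL("http://www.nytimes.com/2013/09/01/magazine/googling-yourself-takes-on-a-whole-new-meaning.html?_r=1&pagewanted=all&");
console.log(article.pathname); // the page: "/2013/09/01/magazine/googling-yourself-takes-on-a-whole-new-meaning.html"
console.log(article.search);   // the parameters: "?_r=1&pagewanted=all&"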
WordPress and parameters
Even WordPress started out using parameters. In WordPress 1, post URLs looked like this:
example.com/blog/?p=115
You can still see that kind of URL when you preview your posts. Within a few years, web developers, usability experts and SEO practitioners cleaned most of the Web of parameters. Why?
“Parameters are responsible for all kinds of issues.”
Not only did Google struggle with them; users were often annoyed by polluted URLs as well. Even on social media these cluttered URLs were unusable: you could submit the same article to a social site several times, and each time it would be treated as separate content. So an article couldn’t really get popular, because its popularity was spread across several addresses.
WordPress was pretty quick to fix the parameter issue. Starting with version 1.5, I think, URL rewriting became a default WordPress feature. From then on you could display proper, readable addresses like
example.com/blog/a-clean-url-of-a-post
Other content management systems were much slower to adapt, and that’s one of the reasons why 20% of the Web runs on WordPress these days. By now most CMS vendors offer clean URLs out of the box; the few that don’t are negligible, or they have made their parameters as short and clean as possible. Google can read parameters by now too, so you could assume there is no problem anymore.
Old wine in new bottles
“A few years after CMS makers cleaned up their URLs, third-party tools started to pollute them again.”
Most notably, Google itself and some of its tools like Feedburner or Google Analytics messed up Internet addresses by adding tracking parameters so that marketers can see whether their ads work and where their users originally came from. Google Analytics, for example, will add a ?utm_source=x parameter. Many people in the SEO industry embraced this too, because they could garner more data on their visitors. I didn’t. I hated that pollution all along.
Thus I created a simple JavaScript to clean up the URLs whenever some tool or prankster decided to add something to them. Yes, anybody can add parameters to your URL and make it look as if it was you and it was done on purpose. Try adding “?spam=yes” to your URL and click on it. Does it work? Does the added parameter still show up?
The script was just a few lines long and simply redirected anyone who clicked a link like
example.com/blog/a-clean-url-of-a-post?spam=yes
back to the original at
example.com/blog/a-clean-url-of-a-post
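I won’t reproduce the original script here, but a minimal sketch of the idea, assuming a purely client-side redirect, could look roughly like this (hypothetical code, not the exact script I used):

// Minimal sketch, not the original script: if the current address carries a
// query string, send the browser to the bare address without it.
(function () {
  if (window.location.search) {
    window.location.replace(window.location.protocol + "//" +
      window.location.host + window.location.pathname);
  }
})();

Note that such a blanket rule fires on every query string, not just on tracking junk. Keep that in mind for what follows.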
How I spammed my commenters
Nothing fancy here. Of course I used the script on my own blog too. It wasn’t just for my readers; I needed the solution right away for my own blog. I used it successfully for years, until the day Martin Harris alerted me that he was getting tens of thousands of spammy links from my domain after commenting on my blog. I got scared. I thought maybe I had been hacked or something, but then Ahrefs showed me the issue. I clicked the onreact.com links to find out where they came from and saw this:
So I was offending Martin with links from my site? Luckily I remembered that “?you-suck” was the example parameter I had given in my post on cleaning up your URLs with my script. I looked up the URLs and they looked like this:
http://seo2.0.onreact.com/?you-suck/page/2/page/2/page/2/page/2/page/2/page/2/page/2/page/2/page/2/page/2/page/23
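To see what the server actually receives there, it helps to run a shortened version of that address through a URL parser. Again, this is only an illustration:

// Illustration only: a shortened version of the address above.
// The path is just the blog front page; everything after the "?",
// including all the /page/2/ repetitions, is one single query string.
var spam = new URL("http://seo2.0.onreact.com/?you-suck/page/2/page/2/page/23");
console.log(spam.pathname); // "/"
console.log(spam.search);   // "?you-suck/page/2/page/2/page/23"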
So what happened? I am not sure, but my best guess is that my script somehow interfered with the built-in URL redirection of WordPress. WordPress internally redirects from ?p=115 to a clean URL like /a-clean-url-of-a-post.
I assumed that my poor scripting skills had somehow collided with the new WordPress 3.6, as the issue seemed to be new. At least nobody had noticed it before.
“Apparently WordPress is trying frantically to paginate my non-existent ?you-suck category.”
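With hindsight, a more defensive cleaner would only strip parameters it positively recognises as tracking noise and leave every other query string, including WordPress’s own, untouched. Again a purely hypothetical sketch, not something I ever ran:

// Hypothetical, more defensive variant: remove only known tracking parameters
// (the utm_* family) and leave all other query strings alone.
(function () {
  var params = new URLSearchParams(window.location.search);
  var removed = false;
  ["utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"].forEach(function (name) {
    if (params.has(name)) {
      params.delete(name);
      removed = true;
    }
  });
  if (removed) {
    var rest = params.toString();
    window.location.replace(window.location.pathname + (rest ? "?" + rest : ""));
  }
})();

Whether that would have avoided the collision I can’t say, but it would touch far fewer addresses.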
Of course my reaction was to remove the script from my blog, update my post about it and inform my readers about this issue. Happy ending?
Here the story could have ended. Sadly it didn’t; it seems to keep unfolding. Despite the script having been removed, the number of fake URLs is still growing. When I checked on September 19th, I saw this:
It gets worse: WordPress as a negative SEO weapon?
“So the number of unnatural links my WordPress is generating to my commenters’ sites seems to be rising exponentially.”
In the best case it’s just leftover garbage from before I deleted the script. In the worst case I have unearthed a rare WordPress vulnerability that enables hackers to use almost any WordPress blog for so-called negative SEO. Negative SEO is about hurting your competition in Google search.
Meanwhile, my hosting provider has notified me that someone is trying to access these non-existent URLs over and over, so often that server stability has been compromised.
“Can your WordPress blog be used as a negative SEO weapon just by accessing non-existent URLs?”
I will have to find out. Now that I have written this post, I will at least be able to explain the issue. Maybe it’s not an accident but an attack. I hope somebody can help.


