Referer Spam
Published by rich on Tuesday, May 27, 2008 - 16:06:19
Like many webmasters who depend on their sites for a living, I’m pretty obsessive about analyzing my server’s logs.
Specifically, I try to figure out where traffic is coming from (or sometimes where it has stopped coming from).
More traffic equals more downloads of my anti-spam email software’s trial. More downloads equal more money. If I see a change in traffic patterns, I try to figure out how I can use it to my benefit.
Most web browsers send “referer” information (apparently it is “referer,” not “referrer,” but only on the internet) telling websites which page or search engine the user came from. The server then logs this data.
It turns out it’s also really easy to spoof referer data when making an HTTP request. But why would you want to do that?
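To show just how little effort spoofing takes, here is a minimal Python sketch using the standard library. The function name and the example URLs are made up for illustration; the point is only that the Referer header is whatever string the client chooses to send.

```python
import urllib.request


def build_request_with_referer(url, referer):
    """Build an HTTP request whose Referer header is an arbitrary string.

    Nothing checks that the client ever visited `referer`; the server
    simply logs whatever value arrives in the header.
    """
    return urllib.request.Request(url, headers={"Referer": referer})


# Hypothetical target and fake referer, for illustration only.
request = build_request_with_referer(
    "http://www.example.com/",
    "http://spam.example/cheapdrugz4u.html",
)

# Passing `request` to urllib.request.urlopen() would send it; the target
# server's log would then record the fake referer as if it were real.
print(request.get_header("Referer"))
```

No request is actually sent here. The sketch only builds the request object so the header can be inspected.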
The other day I came across several referers in SpamButcher’s logs looking like this:
http://fofksaaa.cssddssa121a.com/cheapdrugz4u.html
http://fofkfasaa.cssdqq22dssaa.com/cheapdrugz4u.html
http://afsafofksaa.cerwssddssaa.com/cheapdrugz4u.html
Like a dummy, I went to one of the sites to check it out. It was of course an “online pharmacy.”
The short of it is that spammers are intentionally hitting websites with fake referer data in an effort to get their webmasters to visit spam sites.
Maybe only 0.1% of webmasters will end up finding the referer entries and then clicking through, but that may not be much different from the click-through rate for junk email.
One possible way to block this kind of spam would be to have the web server query the referring page to see if the presumed link to the site actually exists. However, this would generate a large amount of CPU and network overhead to solve a relatively minor problem.
Another approach would be to do the same query when analyzing the log files. The log analysis software could include an anti-spam tool to either “flag” or discard referers without confirmed incoming links. This approach would provide the same result, but wouldn’t create an unwanted delay when serving webpages.
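As a rough sketch of that log-analysis idea, the Python below pulls referers out of log lines, fetches each referring page once, and checks whether it really contains a link to the site. It assumes the server writes the common "combined" log format, and every name in it (`classify_referers`, `fetch_page`, the `www.example.com` host in the usage note) is invented for illustration rather than taken from any existing log analyzer.

```python
import re
import urllib.request
from urllib.parse import urlparse

# Assumes combined log format: "request" status size "referer" "user-agent"
LOG_RE = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"')
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def host_of(url):
    """Return the lowercase hostname of a URL, or None if it has none."""
    try:
        return urlparse(url).hostname
    except ValueError:
        return None


def extract_referer(line):
    """Return the referer field of a log line, or None if absent or '-'."""
    match = LOG_RE.search(line)
    if not match:
        return None
    referer = match.group("referer")
    return None if referer in ("", "-") else referer


def fetch_page(url, timeout=10):
    """Download at most the first 200 KB of a page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read(200_000).decode("utf-8", errors="replace")


def links_to_site(html, my_host):
    """Crude check: does any href in the page point at my_host?"""
    return any(host_of(href) == my_host for href in HREF_RE.findall(html))


def classify_referers(lines, my_host, fetch=fetch_page):
    """Map each external referer to 'confirmed', 'suspect', or 'unverified'."""
    results = {}
    for line in lines:
        referer = extract_referer(line)
        if referer is None or referer in results:
            continue  # nothing to check, or already checked
        if host_of(referer) == my_host:
            continue  # internal navigation, not an incoming link
        try:
            html = fetch(referer)
        except Exception:
            results[referer] = "unverified"  # page unreachable or bad URL
            continue
        results[referer] = "confirmed" if links_to_site(html, my_host) else "suspect"
    return results
```

Calling `classify_referers(open_log_file, "www.example.com")` on an open log file would return a dictionary that a report could use to flag or drop the "suspect" entries. Each distinct referer is fetched only once, and search-engine referers would probably need a whitelist, since a results page may no longer show the link by the time the log is analyzed.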

