How bad is referer spam?

While exploring the depth of my ridiculous referer spam issue, I ran the following simple
query:

mysql> select count(*) as cnt , baseDomain from referer_visitlog where to_days(now()) - 
to_days(visitTime) = 0 group by baseDomain order by cnt desc limit 10 ;
+-----+--------------------------+
| cnt | baseDomain               |
+-----+--------------------------+
| 682 | chikaliresortmalawi.com  |
| 682 | champvilleclub.com       |
| 682 | ceyloncurry.com          |
| 682 | cbmwyo.org               |
| 289 | brittandersondesigns.com |
| 243 | clevelandfyi.com         |
|  50 | google.com               |
|  16 | google.co.uk             |
|   8 | xopy.com                 |
|   6 | search.yahoo.com         |
+-----+--------------------------+
10 rows in set (1.16 sec)

It’s kind of depressing to find just how many people work that hard to try to spam my silly little site.

Addendum: It seems that virtually all my referer spam comes from four distinct IP addresses. They are now in my blacklist. We shall see how long this holds.
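
For the curious, a per-address tally along these lines is roughly how a handful of offending IPs would show up. This is only a sketch: it reuses the referer_visitlog table and visitTime column from the query above, but remoteAddr is a made-up column name standing in for wherever the client address is actually logged.

```sql
-- remoteAddr is a hypothetical column name; substitute whatever column
-- the log table uses for the client IP address.
SELECT remoteAddr, COUNT(*) AS cnt
FROM referer_visitlog
WHERE visitTime >= CURDATE()   -- today's hits only
GROUP BY remoteAddr
ORDER BY cnt DESC
LIMIT 10;
```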

Fighting Referer Spam

In the last couple of days, I’ve been targeted by referer spam bots. These dorks request pages on a weblog over and over, each time sending a bogus referer, in an attempt to get their URL shown in the referer list on its home page. I’ve been trying to figure out how to combat this behavior, and I can see two ways of dealing with it:

  • Ping the referer back, and make sure it really does link to my site. Probably slow and not scalable, particularly given my asymmetric bandwidth.
  • Blacklist sites which generate bursts of referer traffic. If lots of hits arrive with the same referer in a short period of time, put that site in a database of blacklisted sites and keep it from ever appearing in the referer list.

The second seems easy in principle, but I must admit that the query to find such bursts seems difficult to write. I’ll continue to think it over, but does anyone have any suggestions?
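
As a starting point, here is a rough, untested sketch of what such a burst query might look like. It assumes the referer_visitlog table with the baseDomain and visitTime columns shown in the query above. The referer_blacklist table is hypothetical, and both the one-hour bucket and the threshold of 100 hits are arbitrary numbers that would need tuning against real traffic.

```sql
-- Hypothetical blacklist table; the name and layout are assumptions.
CREATE TABLE IF NOT EXISTS referer_blacklist (
    baseDomain VARCHAR(255) NOT NULL PRIMARY KEY
);

-- Bucket hits by domain and clock hour, then blacklist any domain that
-- produced 100 or more hits inside a single bucket. INSERT IGNORE skips
-- domains that are already in the blacklist.
INSERT IGNORE INTO referer_blacklist (baseDomain)
SELECT DISTINCT baseDomain
FROM referer_visitlog
GROUP BY baseDomain, DATE_FORMAT(visitTime, '%Y-%m-%d %H')
HAVING COUNT(*) >= 100;
```

The referer list could then leave out anything in referer_blacklist, for example with a NOT IN subquery. One obvious weakness is that a legitimate site sending a sudden rush of real visitors would trip the same threshold, so a small whitelist might be needed alongside it.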