Diary of My Fight Against Guestbook Spamming

Summary

Spam in Guestbooks thanks to Google

Aren't guest books great? As opposed to email, guestbook messages are available for everyone to see, massaging the vanity of many a site operator.

Enter Google's "link popularity", which determines how web pages rank. Since Google deems one-way links more valuable to a site than reciprocal links, guest books are a pretty effective way of getting one-way links. Some people assert that Google ignores guestbook links, but I believe the contrary: guestbook links do count for Google.

My Guestbook and the First Spam Entry

On December 8, 2005, I opened a PHP/MySQL guestbook that could only be reached from an obscure free-hosted website, http://medlem.spray.se/coolgroove. The anchor text was "Please sign my guestbook". As I expected, no one bothered to sign. For quite a while.

On February 2, 2006, the first entry appeared with the text "So interesting site, thanks!". I wondered which site was being referred to, since my site certainly wasn't interesting at all! When I saw the URL www. haywired. com / vicodin, I knew: Spam!

The flow of spammed guestbook entries had begun.

My First Step: Finding Out Where the Spam "Came From"

Since I had already logged the IP of each guestbook entry, I added reverse-IP and country information to each guestbook entry. I was getting spam from all over the world! How colourful! I was impressed. (You can browse through my guestbook and see for yourself).

Upon closer inspection, however, I was getting similar spam from widely different countries. That pattern suggested the spammers were using proxy servers to hide their identity.

Trying to Look Behind the Proxy

A snippet of PHP code to detect a proxy is shown below. If no proxy is used, or one can't be detected (high anonymity), then just the IP-address is displayed. Otherwise the IP of the proxy and the real IP are displayed.

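$ip = $_SERVER['REMOTE_ADDR']; // IP of the connecting client (or of the proxy)
$output = $ip;

// check for proxy headers (these are sent by the client, so they can be forged)
$http_via = getenv('HTTP_VIA');
$http_forwarded = getenv('HTTP_X_FORWARDED_FOR');
$remoteport = getenv('REMOTE_PORT');

if (($http_forwarded != NULL) && (strcmp($http_forwarded, $ip) != 0))
{
  // proxy used: show the forwarded (originator) address as well
  $cc = ip_to_country_both($http_forwarded); // country lookup, helper defined elsewhere
  // escape the header values before putting them into the HTML output
  $output = $output.'<br>Originator IP: '.htmlspecialchars((string)$http_forwarded).' [via: '.htmlspecialchars((string)$http_via).', remote port: '.$remoteport.']';
}

return $output;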

You can see this in action here.

Note that if the client has Java enabled, it is possible to obtain the real IP even through a high anonymity proxy, according to this site.
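
One caveat about the X-Forwarded-For value used in the proxy check above: a chain of proxies can produce a comma-separated list of addresses, and since the header is sent by the client it may contain anything at all. Below is a minimal sketch of pulling out the first well-formed address. The function name first_forwarded_ip is hypothetical and not part of my guestbook code, and the result should be treated as a hint only, never as proof of the real origin.

```php
<?php
// Hypothetical helper, not part of the original guestbook code.
// Returns the first syntactically valid IP address found in an
// X-Forwarded-For header value, or null if there is none.
function first_forwarded_ip(?string $header): ?string
{
    if ($header === null || $header === '') {
        return null;
    }
    // The header may hold several addresses separated by commas.
    foreach (explode(',', $header) as $candidate) {
        $candidate = trim($candidate);
        // filter_var() returns false for anything that is not an IP address.
        if (filter_var($candidate, FILTER_VALIDATE_IP) !== false) {
            return $candidate;
        }
    }
    return null;
}
```

The return value could be passed to the country lookup in place of the raw header, so that garbage in the header never reaches the output.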

Brainstorming: Getting to a Solution for Spam Reduction

Solution 1: Add a NOINDEX-META

Add a

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

to the guestbook page and announce this prominently in the title --> This had no effect!!!
Rename the guestbook "Spammer's Honey Pot" --> No effect! Amazing. This made me believe that the spammer is either a robot, doesn't understand English, or just "follows orders".

Solution 2: Analyzing the spam

There were similar words in all the messages: Viagra, Cialis, Replica, Rolex, Pills, Poker, Blackjack, and quite a few more. A keyword-based filter might work? The amount of work involved put me off. I am inherently l-a-z-y.
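
For the record, a keyword filter need not be much work. Here is a minimal sketch, assuming a plain case-insensitive substring match is good enough; the function name looks_like_spam is made up for illustration, and the keyword list is just the words mentioned above.

```php
<?php
// Hypothetical helper: returns true if $text contains any blocked keyword.
function looks_like_spam(string $text, array $keywords): bool
{
    foreach ($keywords as $keyword) {
        // stripos() performs a case-insensitive substring search.
        if (stripos($text, $keyword) !== false) {
            return true;
        }
    }
    return false;
}

// Keywords taken from the spam messages described above.
$keywords = ['viagra', 'cialis', 'replica', 'rolex', 'pills', 'poker', 'blackjack'];
```

Two weaknesses are worth keeping in mind. A substring match also flags innocent words (for example "spills" contains "pills"), and a message such as "So interesting site, thanks!" contains no keyword at all, so the URL field would have to be checked as well.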

Solution 3: Automated Email confirmation

Since I was getting the same spam with different email addresses specified, I assumed the email addresses to be mostly, if not entirely, fake. So a simple automated email confirmation might get rid of most of the spammed entries.

Solution 4: Moderated Guestbook

A moderated guestbook is another solution, but it requires manual screening. As I've already mentioned, I'm plain lazy, so this is out of the question.

Solution 5: More?

Can you think of any others? Tell me.
 

My Second Step: Anti-Spam Measures

Anti-Spam Measure 1: April 8, 2006, Solution 3 is implemented. It works as follows:

1) Add two additional columns to the MySQL guestbook table:
    - hash varchar(32) containing the validation key
    - validated char(1) specifying whether email has been validated

2) After the guestbook has been signed, generate a hash code.
    As seed, use current time, a secret string and the email address.

// create the MD5 hash
$secret_code = 'anything';
$formatted_email = preg_replace("/(-|\@|\.)/", "", $email);
$hash = md5(date('h:i:s').$secret_code.$formatted_email);

3) Along with each guestbook entry store the hash and validated='0'

4) Send an email to user with a link containing the hash string

$mail_body = "To validate your guestbook entry click the following link:\nhttp://www.romanvirdi.com/guestbook/index.php?m=$hash\n\nDO NOT CLICK the link if you do not know why you have received this email! Delete this mail instead.";
mail($email, "Validation Email", $mail_body, "From: guestbook@www.romanvirdi.com\n");

5) When the user clicks on the link, it activates the PHP script again.
    Check for the 'm' parameter, and look up the guestbook entry with that hash.
    If a row is found, update that row with validated='1'.
    If not, issue an error.

6) When displaying guestbook entries, only display validated records.

select * from guestbook where validated='1';
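
Steps 3, 5 and 6 above could be sketched roughly as follows. This is not my actual index.php: to keep the sketch runnable without a MySQL server it uses an in-memory SQLite database through PDO as a stand-in, the function names are hypothetical, and the message and email columns are assumptions about the table layout. Only the hash and validated columns come from the steps above.

```php
<?php
// Sketch only: SQLite in memory stands in for the MySQL guestbook table.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE guestbook (
    id INTEGER PRIMARY KEY,
    message TEXT,
    email TEXT,
    hash VARCHAR(32),
    validated CHAR(1) DEFAULT '0'
)");

// Step 3: store the entry together with its hash, not yet validated.
function store_entry(PDO $db, string $message, string $email, string $hash): void
{
    $stmt = $db->prepare(
        "INSERT INTO guestbook (message, email, hash, validated) VALUES (?, ?, ?, '0')"
    );
    $stmt->execute([$message, $email, $hash]);
}

// Step 5: mark the row matching the hash as validated.
// Returns true if a row was updated, false if the hash is unknown
// or the entry had already been validated.
function validate_entry(PDO $db, string $hash): bool
{
    $stmt = $db->prepare(
        "UPDATE guestbook SET validated = '1' WHERE hash = ? AND validated = '0'"
    );
    $stmt->execute([$hash]);
    return $stmt->rowCount() > 0;
}

// Step 6: only validated entries are displayed.
function visible_entries(PDO $db): array
{
    return $db->query("SELECT message FROM guestbook WHERE validated = '1'")
              ->fetchAll(PDO::FETCH_COLUMN);
}

// Example entry with a placeholder hash.
$hash = md5('example');
store_entry($db, 'Nice site', 'someone@example.com', $hash);
```

In the real script the hash would come from the 'm' parameter of the link, and a false return from validate_entry would trigger the error mentioned in step 5. The prepared statements keep the user-supplied hash out of the SQL text.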

RESULT after one day:

NO MORE VISIBLE SPAM! Unfortunately the guestbook database continues getting filled by unverified entries.

This is how the guestbook looks when filtered
(select * from guestbook where validated='1')

This is how the guestbook looks when unfiltered
(select * from guestbook)
 

Anti-Spam Measure 2: Introduce logic to prevent spam bots from filling up the database

Using some code from http://php.webmaster-kit.com and adapting it accordingly, I introduced a three-digit number displayed as a graphic on the guestbook entry form. The user must enter this number correctly for the entry to be accepted.

There are three snippets of code:

1) The entry html needs to be extended with the graphic and entry fields:

<p>Please enter this security code </p>
<img width=72 height=30 src="button.php" border="0" align="top"><br />
<input MAXLENGTH=3 SIZE=3 name="userdigits" type="text" value=""><br>

2) button.php is a new file which outputs the graphic:

<?php

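$image = imagecreate(72, 30);
// The first colour allocated becomes the background of a palette image,
// so allocate a light background before the text colour
$white = imagecolorallocate($image, 0xFF, 0xFF, 0xFF);
$darkgray = imagecolorallocate($image, 0x50, 0x50, 0x50);

// Pick three random digits
$cnum = array();
for ($i = 0; $i < 3; $i++) {$cnum[$i] = rand(0,9);}

// Draw each digit with a random built-in font at a random offset
$x = 0;
for ($i = 0; $i < 3; $i++) {
  $fnt = rand(3,5);
  $x = $x + rand(12 , 20);
  $y = rand(7 , 12);
  imagestring($image, $fnt, $x, $y, (string)$cnum[$i], $darkgray);
}

// Remember the digits so index.php can compare them with the user's input
$digit = "$cnum[0]$cnum[1]$cnum[2]";
session_start();
$_SESSION['digit'] = $digit;
header('Content-type: image/png');
imagepng($image);
imagedestroy($image);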

?>

3) And finally, the validation in index.php itself:

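session_start();
$digit = isset($_SESSION['digit']) ? $_SESSION['digit'] : '';
$userdigits = isset($_POST['userdigits']) ? $_POST['userdigits'] : '';
// Debug output only, keep it disabled on the live site
// echo 'dig='.$digit.' userdig='.$userdigits;
session_destroy(); // each code can be used only once

// Compare as strings so that e.g. "012" does not match "12"
if (($digit !== '') && ($digit === $userdigits))
{
   // OK: store the guestbook entry
} else {
   // NOK: reject the entry
}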

This should get rid of any further unvalidated records.

Hehe, I am rubbing my hands in glee! Guestbook spam has been eliminated 100%!! Wonder how long it will last :-)