For the past two years I have been developing a traffic trading script that I bought many years ago. At the time I needed something quick to set up for a few sites where I would be trading traffic with other sites. The script needed a lot of new features, so over time I have integrated various changes to help with performance and usability. The one problem I always had was that the script constantly logged traffic that was not human.
One of the features I added was tracking hits that had no referrer. The problem was that this also let bot traffic get logged. Bots usually don't send a referring site, so the script was picking them up as valid hits and throwing off my no-referrer stats.
The simplest solution was to implement a way to tell real users from bots. My first thought was to just look at the user agent, but some bots fake this and report themselves as popular browsers like IE, Mozilla, etc. The good bots report themselves with a particular name. For example, Google usually includes a name like googlebot in the user agent string to identify itself. If all bots did this, the easiest thing would be to get a list of all the bots, compare the visitor's user agent against it, and filter accordingly.
So after some testing I added the following little PHP function to my logging script:
function isSpider($agent){
    // popular bots list
    $bots = array(
        'ia_archiver',
        'spider',
        'Scooter',
        'Ask Jeeves',
        'Baiduspider',
        'Bing',
        'Bot',
        'Gigabot',
        'Mediapartners-Google',
        'Google Desktop',
        'Feedfetcher-Google',
        'Googlebot',
        'Yahoo-MMCrawler',
        'Yahoo! DE Slurp',
        'Yahoo! Slurp',
        'YahooSeeker',
    );
    // check the user agent against each name in the bots list
    foreach ($bots as $bot){
        // case-insensitive search; stripos() returns false only when the name is absent
        if (stripos($agent, $bot) !== false){
            return true;
        }
    }
    return false;
}
This function lets me pass in the visitor's user agent and detect whether it contains one of the listed bot names. Of course you are welcome to add more if you know of more, but this list handles most of the top bots that crawl my sites. There are also other ways to detect bots even if they are using valid browser user agent strings, but for my purposes this works well, as most of my bot hits come from the list above.
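To show how a check like this might plug into the logging step, here is a rough, self-contained sketch. The names looksLikeBot() and countsAsNoReferrerHit() are made up for illustration, the bot list is a shortened stand-in, and treating an empty user agent as non-human is my own assumption; none of this is the trading script's actual code.

```php
<?php
// Hypothetical stand-in for a user agent check, included only so this
// sketch runs on its own. The list is shortened and assumed.
function looksLikeBot(string $agent): bool
{
    $bots = ['bot', 'spider', 'slurp', 'ia_archiver', 'Mediapartners-Google'];
    foreach ($bots as $bot) {
        // Case-insensitive substring match against the user agent.
        if (stripos($agent, $bot) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical decision function: should this request count toward the
// no-referrer stats? Takes an array shaped like $_SERVER.
function countsAsNoReferrerHit(array $server): bool
{
    $agent = $server['HTTP_USER_AGENT'] ?? '';
    $referrer = $server['HTTP_REFERER'] ?? '';

    // Assumption: a missing user agent is treated as non-human traffic.
    if ($agent === '' || looksLikeBot($agent)) {
        return false;
    }
    // Only hits without a referring site belong in this stat.
    return $referrer === '';
}
```

In a logging script this could be called as countsAsNoReferrerHit($_SERVER) before the hit is written. Passing the array in, rather than reading $_SERVER inside the function, just makes it easier to try out with made-up requests.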