I <3 Bots!
Subscribe to the RSS feed

Keyword - fuckthespam

Entries feed - Comments feed

Wednesday, August 13 2008

And so you wanted to protect your email address on your website...

People start thinking of how to prevent spam when they're building website, that's a fact and that's very good indeed. The only problem is when they don't actually know how a bot would handle the HTML page...

For instance, I was surfing on qik.com and saw this little piece of JavaScript in order to protect the exposure of the email address:

<script type="text/javascript">
//<![CDATA[
  document.write('<a href="mailto:XXXX@qik.com"\
    title="Send us an email!">XXXX@qik.com<\/a>');
//]]>
</script>

As the readers of this blog may know, the bot process is really easy.... download the HTML page (crawling) and then trying to extract the email address (parsing). This is just obvious that a bot wouldn't bother with the CDATA tag or because this is embedded in a JavaScript code, if I would have to do a bot, nonetheless I would have a very lossy parsing in order to gather as much information as possible, but I wouldn't care about "in which context am I?". Also, according to some testing I'm doing, I can tell you have if this was a URL, the Google bots would get them...

So please, obfuscate just a bit this... some example can be found on fuckthespam.com

Tuesday, January 29 2008

Search engine keywords extraction

For fuckthespam!, I wanted to add a nice feature due to the content of this website: a listing of keywords that people used to come on this website.

Well, the code is pretty simple bust just wanted to share it; it's working for google, msn and yahoo (the 3 most important search engine), I don't really care about having everything and just wanted to share this PHP snippet.

$referer = $_SERVER["HTTP_REFERER"];
if (strpos($referer,"search") > 0) {
	// look for google, yahoo and MSN
	$key = 0;
	if (strpos($referer,"google.") > 0 || strpos($referer,"msn.") > 0)
		$key = "q";
	else if (strpos($referer,"yahoo.") > 0)
		$key = "p";

	if ($key) {
		$parse_url = parse_url (urldecode($referer));
		if (array_key_exists("query",$parse_url)) {
			$query = $parse_url['query'];
			// extract (.+)$key=(.*)&
			$t = explode("&", $query);
			foreach($t as $k=>$e) {
				if ($e[0] == $key && $e[1] == '=') {
					$k = "$key=";
					$keyword = str_replace($k,'',$e);
					if (strlen($keyword) > 2) {
						// $keyword is actually the whole content of the search
					}
					break;
				}
			}
		}
	}
}
I <3 Bots!