Have you ever been in a conversation with someone and noticed something absolutely embarrassing about them? Something like a piece of food stuck between their teeth, a piece of toilet paper stuck to their shoe, or, worse, a guy with his fly down. It’s distracting, to say the least, and makes it really hard to concentrate on what the person is saying. This is how I feel about comment spam on blogs, and so will most of your blog’s audience. Don’t fret; there are steps you can take to stop it. In this article I’ll talk about a few of the steps we here at Digital Positions are taking, including monitoring our posts, moderating suspected comments, and setting up blacklists for keywords, domains, and the known email addresses of comment spammers.
Before we can take steps to block comment spam, we need a common understanding of what exactly it is. Comment spam is when someone or something (an automated script or malicious software) posts comments to your blog that have no relevance to the blog post’s content. Some can be innocent, like “hey, like your site, check out mine,” while others can be downright rude: “dude! I found the cheapest place to buy <insert drug of the week> and you can get it too. Just click here!” My personal favorites, though, are the idiots (yes, they are idiots in my book) who post nothing but 50 URLs, or who actually try to make you think they know you. Their motives are always the same: they want to pull visitors from our site and bring them to theirs, or they want the comments on my blog to act as links to their site, thereby increasing their ranking with the search engines.
So when we decided to start a blog here at Digital Positions, my first concern was keeping this spam away from our valued readers. I searched the internet and found lots of great resources: articles on how to stop it, methods other people are using, and actual code people have used to implement these checks.
My first approach was to build a list of keywords to search each comment for; if one of these keywords was found, an alert was sent so we could remove the comment. This worked great initially because we had few visitors, so we would get only one spam comment every month or so. But as legitimate comments became more frequent, so did spam. I had to come up with a better plan.
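For readers who want to try that first approach, here is a minimal sketch in Python. The keyword list and the `send_alert` callback are hypothetical stand-ins (the article doesn't publish our actual list or alerting code); the idea is simply to match whole words or phrases case-insensitively and raise an alert on any hit.

```python
import re

# Hypothetical keyword list; not the actual list used on our blog.
SPAM_KEYWORDS = ["cheapest", "casino", "click here", "free pills"]


def find_spam_keywords(comment_text, keywords=SPAM_KEYWORDS):
    """Return the keywords that appear in the comment as whole words/phrases (case-insensitive)."""
    hits = []
    for kw in keywords:
        # \b on both sides so "casino" doesn't match inside a longer word.
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, comment_text, flags=re.IGNORECASE):
            hits.append(kw)
    return hits


def check_comment(comment_text, send_alert):
    """Call send_alert (a hypothetical, caller-supplied function) if any keyword matches."""
    hits = find_spam_keywords(comment_text)
    if hits:
        send_alert(f"Possible spam comment; matched keywords: {', '.join(hits)}")
    return bool(hits)
```

In practice `send_alert` might send an email; passing `list.append` is a handy way to test it. As the article notes, this only alerts; someone still has to remove the comment by hand, which stops scaling once spam picks up.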
Thanks to my earlier research, I was actually well prepared for this task. I quickly figured out that there were four major areas I needed to scrub for spam.
So with these four business rules in mind, I set out to build a spam checker for our blog comments. My first step was to rank the rules in the order in which I figured a comment was most likely to hit them. The order you see above is the order in which we check each comment. If a rule scores a hit, the process stops and the comment is marked as spam.
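The "check in order, stop at the first hit" logic can be sketched like this. This is an illustration, not our production code: the introduction names keyword, domain, and email blacklists, so the sketch uses those three as example rules; the blacklist contents, the rule order, and all function names are assumptions.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlparse


@dataclass
class Comment:
    author_email: str
    body: str


# Hypothetical blacklists standing in for the real ones.
KEYWORD_BLACKLIST = {"cheapest", "casino"}
DOMAIN_BLACKLIST = {"spam-pharmacy.example"}
EMAIL_BLACKLIST = {"bot@spammer.example"}

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def keyword_rule(c: Comment) -> bool:
    words = set(re.findall(r"[a-z0-9']+", c.body.lower()))
    return bool(words & KEYWORD_BLACKLIST)


def domain_rule(c: Comment) -> bool:
    for url in URL_RE.findall(c.body):
        host = (urlparse(url).hostname or "").lower()
        # Match the blacklisted domain itself or any subdomain of it.
        if any(host == d or host.endswith("." + d) for d in DOMAIN_BLACKLIST):
            return True
    return False


def email_rule(c: Comment) -> bool:
    return c.author_email.strip().lower() in EMAIL_BLACKLIST


# Rules in the order they are checked (this particular order is an assumption).
RULES: List[Tuple[str, Callable[[Comment], bool]]] = [
    ("keyword", keyword_rule),
    ("domain", domain_rule),
    ("email", email_rule),
]


def first_rule_hit(c: Comment) -> Optional[str]:
    """Return the name of the first rule the comment hits, or None if it passes every rule."""
    for name, rule in RULES:
        if rule(c):
            return name  # stop at the first hit; later rules are never run
    return None
```

Putting the rule most likely to fire first means most spam is caught by the cheapest possible check, and the returned rule name is exactly what you'd want to include in a notification email.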
Our team gets an email whenever a new comment is posted (good or bad) telling us whether the comment is suspected of being spam and, if so, which rule it hit. A comment marked as spam doesn’t get deleted immediately, at least not at this time. Instead, it is set to inactive, and a member of our team can review it and either delete it or reactivate it. This helps us immensely because this junk is never seen by our readers and we can deal with it as we have time. With a popular blog, you could spend a lot of your time just cleaning up spam or moderating comments. The key here is that we don’t immediately delete the spam. We simply hide the comment from our readers and strip out any offending HTML until a member of our team can either approve it or delete it from the system.
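A rough sketch of that hide-don't-delete workflow follows. The `StoredComment` record, the `moderate` function, and the message format are hypothetical; the sketch only builds the notification (it doesn't send it) and uses Python's standard `html.parser` to reduce suspected spam to plain text, so links and markup are dropped while the text stays for a moderator to review.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from html.parser import HTMLParser
from typing import Optional


class _TextExtractor(HTMLParser):
    """Collects only the text content of an HTML fragment; every tag is discarded."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment: str) -> str:
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


@dataclass
class StoredComment:
    comment_id: int
    body: str
    active: bool = True


def moderate(comment: StoredComment, rule_hit: Optional[str]) -> EmailMessage:
    """Hide suspected spam (never delete it) and build a notification for the team."""
    if rule_hit is not None:
        comment.active = False                   # hidden from readers until reviewed
        comment.body = strip_html(comment.body)  # drop links and other markup
    msg = EmailMessage()
    suffix = " (suspected spam)" if rule_hit else ""
    msg["Subject"] = f"New comment #{comment.comment_id}{suffix}"
    status = f"Suspected spam; rule hit: {rule_hit}" if rule_hit else "No rule hit"
    msg.set_content(f"{status}\n\n{comment.body}")
    return msg
```

Sending the returned message (for example with `smtplib`) is left out; the point is that a flagged comment stays in the system, inactive and defanged, until a person approves or deletes it.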
In addition to flagging and inactivating potential spam, we have a few more tricks up our sleeves to help ward off comment spam. Two other measures that have helped us keep spam low are limiting how many comments someone can post in a set period of time and assigning each comment submission a key. Right now you can post a comment only once every 60 seconds, and because each submission carries a key, you can’t just keep hitting refresh to post multiple comments. This prevents someone from setting up a script to constantly submit comment spam and requires a bit more human interaction in the submission process.
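One way to implement a per-submission key is a single-use token: issue a random key when the comment form is rendered, and accept it only once when the form comes back. This is a minimal in-memory sketch; the class name is hypothetical, and a real site would store keys in its database or session and expire old ones.

```python
import secrets


class FormKeyStore:
    """Issues single-use keys for the comment form (in-memory only, for illustration)."""

    def __init__(self):
        self._unused = set()

    def issue(self) -> str:
        key = secrets.token_urlsafe(16)  # unguessable random key
        self._unused.add(key)
        return key  # embed this in a hidden field when rendering the form

    def consume(self, key: str) -> bool:
        """Accept a key once; resubmitting the same form (e.g. via refresh) reuses a spent key and fails."""
        if key in self._unused:
            self._unused.remove(key)
            return True
        return False
```

Because a refresh resubmits the same hidden key, the second attempt is rejected, and a script has to fetch a fresh form for every comment it tries to post.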
This is unobtrusive to end users because most people will read a post, comment, then read another post and possibly comment again. Because they are taking their time, they will typically be well outside the 60-second window between comments.
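The 60-second window can be sketched as a small throttle keyed by some client identifier. Using the visitor's IP address as that identifier is an assumption here, as is the class name; the injectable clock just makes the behavior easy to test.

```python
import time
from typing import Callable, Dict


class CommentThrottle:
    """Allows at most one comment per client every `interval` seconds."""

    def __init__(self, interval: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last_post: Dict[str, float] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        last = self._last_post.get(client_id)
        if last is not None and now - last < self.interval:
            return False  # too soon; reject without resetting the timer
        self._last_post[client_id] = now
        return True
```

A reader who spends a minute or more on the next post never notices the limit, while a script firing comments back to back is rejected on every attempt inside the window.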
One final step you could take, which we have chosen not to implement at this time, is to install a CAPTCHA script. A CAPTCHA script is simply a bit of code that displays a random image with text that can’t be easily deciphered by a machine, or an image with a related question (e.g., a picture of a dog wearing a bandana, with the question “What color is the bandana?”). These are great tools and can greatly reduce the amount of spam a blog receives, but they can also have an adverse effect on your visitors. For example, if the letters or words in your CAPTCHA image are hard to read, a user might not be able to pass even though they are a legitimate poster. Also, someone who browses with images turned off, or who has a disability that requires them to use a screen reader, will have difficulty with CAPTCHAs. You can learn more about CAPTCHAs at http://www.captcha.net/.
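We haven't implemented a CAPTCHA, but the server side of the image-plus-question variant is simple to sketch. Everything below is hypothetical: the challenge data, the image filename, and the class name. Generating or serving the image itself is out of scope; the sketch only issues a challenge and checks the answer once.

```python
import secrets
from typing import Dict, List, Tuple

# Hypothetical challenges: (image file, question, expected answer).
CHALLENGES: List[Tuple[str, str, str]] = [
    ("dog_bandana.jpg", "What color is the bandana?", "red"),
]


class QuestionChallenge:
    def __init__(self):
        self._pending: Dict[str, str] = {}  # token -> expected answer

    def issue(self) -> Tuple[str, str, str]:
        image, question, answer = secrets.choice(CHALLENGES)
        token = secrets.token_urlsafe(16)
        self._pending[token] = answer
        return token, image, question  # render the image and question with the form

    def verify(self, token: str, response: str) -> bool:
        # pop() makes each challenge single-use, so a correct answer can't be replayed.
        answer = self._pending.pop(token, None)
        return answer is not None and response.strip().lower() == answer
```

Note that this check inherits the accessibility problem described above: a visitor who can't see the image can't answer the question.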
I’m sure some of you are wondering why we are posting this and, in essence, hanging our laundry out for everyone to see. I personally feel that by sharing this information I’ll be able to gain insight from others, and those who wish to circumvent us will just help me build a better mousetrap. Either way, it’s a win-win situation. If you are interested in learning more about the code or our spam lists, drop me a line at erj @ d - p . com (you’ll have to remove all the spaces).