Comment Spam

This morning I woke up to find 95 inactive comments awaiting my approval; all of them were comment spam.

A couple of years ago, Tobyjoe moved my site from an engine he wrote specifically for me to Typo. Almost immediately, I began receiving a lot of comment spam. Frustrated and sick of hearing me whine about it, he came up with a hack. At that time, he created something called honeypot.html. Users without JavaScript enabled were unable to leave comments. Why did this work? (I had to ask.) Spammers don’t use browsers when leaving comments (unless they’re paying actual people to leave the spam, which is rare). They use scripts to do so. Since we told my system that everyone needed JavaScript in order to leave a comment, we immediately solved the problem. Those who weren’t using a browser were sent directly to honeypot.html. Those who were actual humans and didn’t have JavaScript enabled saw a message stating as much. (“Sorry, you need JavaScript in order to leave a comment.”) They could still read the site but were unable to comment. It worked for the most part, although a user like my Luddite friend, Gerry, wasn’t able to leave comments because homeboy hasn’t upgraded his browser since, like, 1998.
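I never saw the code behind that hack, so what follows is only a guess at the general shape of it, not Tobyjoe’s actual implementation. It assumes the comment form has a hidden field (called `js_token` here, a made-up name) that only a browser running JavaScript fills in. The secret, the function names, and the token scheme are all assumptions for illustration.

```python
import hashlib
import hmac

# Hypothetical server-side secret; not taken from the original site.
SECRET = b"change-me"


def make_token(post_id: str) -> str:
    """Value a small inline script would copy into the hidden form field."""
    return hmac.new(SECRET, post_id.encode(), hashlib.sha256).hexdigest()


def route_comment(form: dict, post_id: str) -> str:
    """Decide where a comment submission goes.

    A script that POSTs the form without running JavaScript never fills
    in 'js_token', so its submission is sent to the honeypot page.
    """
    supplied = form.get("js_token", "")
    expected = make_token(post_id)
    # Compare as bytes so odd characters in the input cannot raise an error.
    if hmac.compare_digest(supplied.encode(), expected.encode()):
        return "accept"
    return "honeypot.html"
```

A browser with JavaScript on would submit the token and get `"accept"`; a bare script would land on `honeypot.html`.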

When Tobyjoe and I moved over from Typo to Mephisto (January of this year), we signed up for Akismet, which is an amazing system. Here’s how it works. All comments (meaning from every Web site running Akismet) are submitted to Akismet. They scan each and every one of them. They flag words, phrases, IPs, etc. If there are any similarities between what the spammer leaves on your Web site and what they have in their database, the comment is temporarily marked as spam and is not pushed live. It is then up to you to sift through the flagged comments and/or delete them in your admin tool. And just so you know how right on they are, I have received thousands of spam comments since January and only once did Akismet get it wrong.
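To make that description concrete, here is a toy model of the idea. It is emphatically not Akismet’s real algorithm or API, which I know nothing about beyond what is described above. The class name, the phrase list, and the example IP addresses are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SharedSpamDatabase:
    """Toy stand-in for a central service that many sites report to."""

    flagged_phrases: set = field(default_factory=set)
    flagged_ips: set = field(default_factory=set)

    def looks_like_spam(self, text: str, ip: str) -> bool:
        """True if the sender IP or any phrase in the text is already flagged."""
        lowered = text.lower()
        return ip in self.flagged_ips or any(
            phrase in lowered for phrase in self.flagged_phrases
        )


def moderate(db: SharedSpamDatabase, text: str, ip: str) -> str:
    """Hold suspicious comments for the site owner; publish the rest."""
    return "held_for_review" if db.looks_like_spam(text, ip) else "published"
```

The point of the sketch is only the flow: a match against the shared database keeps a comment off the live site until the owner reviews it.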

For those bloggers overrun by spam, I highly recommend checking out Akismet. Every day, as I delete this crap, I get a thrill, like I am playing a video game, zapping the bad guys. This way, you don’t have to activate every single comment that comes in. The ones that are legit show up. The others are set aside for your approval.

I still receive a lot of retro comments/spam from actual people. And sometimes the comments irritate me because they are left after someone searches for something disgusting, cruel, or just plain stupid. I haven’t decided what to do about this yet, but I might have the system disable comments on some posts after 48 hours. I might just continue to delete them by hand. However, if in the next couple of weeks you see this taking place here and there, it’s because I got fed up with the juveniles.
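The 48-hour cutoff would be a small check. A minimal sketch, assuming each post records when it was published (the names `COMMENT_WINDOW` and `comments_open` are made up, and this is not how Mephisto does it as far as I know):

```python
from datetime import datetime, timedelta

# Assumed policy from the paragraph above: comments close 48 hours after posting.
COMMENT_WINDOW = timedelta(hours=48)


def comments_open(published_at: datetime, now: datetime) -> bool:
    """Return True while the post is still inside its comment window."""
    return now - published_at <= COMMENT_WINDOW
```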


  1. I use Spam Karma (SK2) on my WordPress blog. Lately, I’ve been noticing more and more spam is coming through. Like you, I end up having a handful in moderation. I’m thinking about switching to Akismet since it seems to be where all the development is lately.

    I’m always scared when marking spam as spam though. I heard the “spammers” want you to mark the mumbo jumbo stuff (words that make no sense) as spam because some can actually start allowing more spam through by finding loopholes. Does that make sense? Tobyjoe? Have you heard anything like that?


  2. i don’t ‘blog’, actually i don’t even know what that means. but it’s interesting what you guys know about this crap nonetheless. and what’s a Luddite … that a brand of flashlight? : )


  3. Well, the term Luddite here was used to describe someone who is opposed to technology, like our friend, The Wizard, who won’t even write emails. Although, I think the term originated because weavers were against machines taking over. They wanted to do things by hand. (Something like that. I’m regurgitating information.) Now, people (probably incorrectly so, as I have done) use it to describe anyone who fears computers and/or changes with one like Gerry and a lot of my old bosses.

    Sarah S. Interesting. I will have to press TJ on that one.


  4. More on Luddite.

    The English historical movement has to be seen in its context of the harsh economic climate due to the Napoleonic Wars; but since then, the term Luddite has been used to describe anyone opposed to technological progress and technological change. For the modern movement of opposition to technology, see neo-luddism.


  5. Sarah – There isn’t a general incentive to have messages marked, but what you’re describing is someone using filtration as a sort of attack vector. I deal with it in profanity filters on big brand client sites all the time. If you let someone know that their submission was caught in a filter, they will continue trying deviations until they find something that can get through. If you don’t let them know, they have no feedback, and thus no way of learning or planning an attack. So, unless they’re seeing lots of non-spam comments sneak through, they aren’t learning. If everything is flagged, they can’t find a loophole. So what you REALLY have to worry about is the bad stuff that gets through.
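A minimal sketch of the “no feedback” idea described in the comment above: the submitter gets the same response whether or not the filter caught the comment. The word list and function name are hypothetical, and a real profanity or spam filter would be far more involved.

```python
import re

# Hypothetical banned-word list, for illustration only.
BANNED = {"poker", "pills"}


def handle_submission(text: str, published: list, quarantined: list) -> str:
    """File the comment, then answer identically in both cases.

    Because the reply never reveals whether the filter fired, a spammer
    gets no signal to tune the next attempt against.
    """
    words = set(re.findall(r"[a-z']+", text.lower()))
    if words & BANNED:
        quarantined.append(text)
    else:
        published.append(text)
    return "Thanks, your comment has been submitted."
```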


  6. I intensely dislike Akismet. I think their licensing policy and fee system are really unbalanced.

    I also don’t see it as anything more than a version of cloudmark/razor for blogs.

    Cloudmark’s policy is way better though – they open sourced razor, made it truly collaborative, and allow free access to everyone. If you don’t like that, you can specify your own servers.

    I almost dropped using wordpress because there almost weren’t any viable options other than Akismet.

    have you considered using false positives? Like letting the post appear as if it worked to the requesting client, but actually /dev/nulling the submission?


  7. But, Jonathan, it works. It, like, really, really works. Why hate on something that actually works? This has been really helpful.

    You’re a h8tr. ;]


  8. Akismet support is built into Mephisto, and though I am working on some patches for Mephisto, Akismet works well enough that I don’t have much of a priority on adding support for others. Most of my patches have to do with the admin tool, Mint support, and asset management.


  9. Thanks Tobyjoe!

    I found the source of my information in an e-mail a co-worker sent. It is about e-mails filled with mumbo jumbo:

    Why am I getting all this gibberish-filled email?

    Recently I’ve been getting a lot of weird email messages. They aren’t exactly spam – at least they aren’t trying to sell anything that I can see and they don’t contain any links for me to click. They’re just full of gibberish, what appears to be random words and phrases. Who in the world is sending these things and why?—Genie L.

    You’re right: they aren’t spam, but they are sent by spammers. The point isn’t to get you to buy anything, but to get you to mark the unwanted messages as spam in order to confuse your Bayesian spam filters. Most anti-spam programs now use some form of Bayesian filtering – this is a way of using statistical methods to classify messages as spam (or not). The software “learns” to recognize what you consider to be spam based on the messages that you mark as spam. It’s a great idea and works well – except when the filters are “poisoned” by lots of messages that contain large amounts of random words and phrases that are likely to appear in legitimate messages.

    Bayesian poisoning messages sometimes consist of random words and sometimes a block of text from a literary work or the like. In either case, the goal is to confuse the filters and render them useless.

    It’s also possible that even though they don’t contain links, some of the gibberish messages may contain web beacons. These are tiny, transparent (and thus invisible) graphics files placed in HTML email messages. When you open the message, your email client downloads the graphic from the sender’s server. This lets the spammer know that the email address is a “live” one.
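The poisoning effect described in the quoted e-mail can be seen with a very small word-count classifier. This is a simplified naive Bayes sketch with add-one smoothing, not the filter any real mail program uses. The class name and training messages are invented, and it assumes at least one message of each kind has been trained before scoring.

```python
import math
from collections import Counter


class TinyBayes:
    """Minimal naive Bayes spam scorer, for illustration only."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}  # messages seen per class

    def train(self, text: str, label: str) -> None:
        self.counts[label].update(text.lower().split())
        self.totals[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        spam_tokens = sum(self.counts["spam"].values())
        ham_tokens = sum(self.counts["ham"].values())
        # Start from the prior odds, then add each word's evidence.
        log_odds = math.log(self.totals["spam"] / self.totals["ham"])
        for word in text.lower().split():
            p_spam = (self.counts["spam"][word] + 1) / (spam_tokens + vocab)
            p_ham = (self.counts["ham"][word] + 1) / (ham_tokens + vocab)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


model = TinyBayes()
model.train("buy cheap pills now", "spam")
model.train("lunch meeting on friday", "ham")
before = model.spam_probability("lunch meeting friday")

# Marking gibberish built from everyday words as spam "poisons" the model.
for _ in range(3):
    model.train("lunch meeting friday garden window", "spam")
after = model.spam_probability("lunch meeting friday")
```

After the gibberish is marked as spam, the same ordinary message scores as more spam-like than it did before, which is the confusion the e-mail describes.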


  10. I used to get those as well and then Tobyjoe set up some sort of SpamAssassin thing. I am sure he’ll know more about that.

    That’s different from comment spam, tho, right?

    Oh, you said as much. It’s funny that it’s turned into a battle between good and evil. Soon, more and more people will be hired for some small amount to physically go in and leave comments, which are actually spam. Correct me if I’m wrong, but isn’t spam about getting bigger Google ratings more so than getting folks to click on it? I guess that’s why it’s good to get them and delete them right away. Right?


  11. I dislike Akismet because it’s a pay service that does several things:

    a- they’re charging for essentially the same services as free systems that exist.
    b- they’ve convinced or pushed development on all the free blog systems to use them as a default system, which has pretty much left other anti-spam measures abandoned. it’s basically the same thing as the software bundling that microsoft was forced to undo. on most systems now, you simply don’t have an option.
    c- their prices are pretty high. their pricing structure is insane: you pay based on what you’re classified as, not on usage. there’s also no demo.
    d- i’m turned off by the idea of high prices for collaborative systems. considering that their algorithms and systems improve with the number of users and comments, their systems should charge less.

    call me a hater, but using Akismet hurts open source, and it hurts the concept of blogging in general.

    re: spam

    a_ gibberish emails
    Most of them aren’t used to poison bayes – they’re used for address testing and throttle poisoning. Most modern systems run spam analysis as the message is accepted – it was developed as a way to combat spammers. If a message looks to be spam, it will be rejected outright. If too many spam messages or too many false addresses come from a host, their ability to send will be throttled. when spammers send the gibberish messages, they bypass the bayes filter and get instant info on whether or not the message was accepted. if it was, there’s a good chance it’s a real address, and then they’ll figure out how to get actual spam to that person.

    b_ point of spam links
    it used to be for google ranking. but then google and all the blogging systems added the rel=nofollow standard. most blogs add that marking to comment links, and google + many search engines automatically ignore those links. now the spammers are straight up trying to convince people to click on a link for poker/porn/pills.
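For anyone curious what that link marking looks like in practice, here is a rough sketch of a blog engine adding the `rel="nofollow"` attribute to links in a comment before displaying it. The function name is made up, and a regular expression is a shortcut that only copes with simple markup; a real engine would more likely use a proper HTML parser.

```python
import re


def add_nofollow(comment_html: str) -> str:
    """Add rel="nofollow" to every <a> tag that has no rel attribute yet."""

    def fix(match: re.Match) -> str:
        tag = match.group(0)
        if re.search(r"\brel\s*=", tag, re.IGNORECASE):
            return tag  # leave tags that already declare rel untouched
        # Drop the closing '>' and append the attribute before re-closing.
        return tag[:-1] + ' rel="nofollow">'

    return re.sub(r"<a\b[^>]*>", fix, comment_html, flags=re.IGNORECASE)
```

Search engines that honor the attribute give the linked site no ranking credit, which removes the original incentive for comment spam.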


  12. I didn’t look too hard, but it’s free for personal use, and they ask for donations. Am I missing something that you’re seeing while I am not?


  13. It’s free for personal use.
    If you make more than $500 a month off your blog, or are a company of any sort, it’s pricey.

    If they were a standalone app , I honestly wouldn’t care.

    But all the blog systems have pretty much abandoned open source and truly free efforts, and are pushing people towards Akismet by offering it as a plugin by default while they stop pursuing the issue on their own.

    I don’t just think that’s bad, I think that’s terrible.


  14. I don’t presume to know much about this stuff but with Akismet, don’t they kind of have to do that to make it work well? Meaning the more folks who are using the system, the more they will be able to conquer spam?

    As far as the blog systems offering it as a plugin, isn’t it those particular blog systems you should be annoyed with and not Akismet? Or did one of these blog systems come up with it?

    A question: if I were running my blog, using Tobyjoe’s custom software (as we once had), could I use Akismet? Or do they lock you out if you’re not using a certain engine.

    Totally related yet not related: The one thing that really, really bugs me, however, is that I can’t use my WordPress account, which I had to sign up for to get Akismet, in order to leave comments on blogs that force you to sign in to leave them. And when I went to sign up, they told me the email I used was already in use. Like, duh, I have an account with you already.

    NOW THAT’S something that pisses me off. I am not sure why that is. Can’t they all get on the same page? Annoying.


  15. you can use Akismet with anything you want, you just have to write a plugin. wordpress owns Akismet, they basically stopped doing open source anti-spam to push their pay service, and then started pushing it into the other engines.

    all collaborative filtering systems need more users. i prefer razor which is entirely free and open source.


  16. I still wonder about what I asked above. If you use a system that doesn’t have as many users, will it be as good? Because, to be perfectly honest, I’ll support anyone who can get rid of comment spam.


  17. i’m kind of a free market fan and don’t see pay services as bad. if they charge too much, the market will respond. if folks stop using freebies and use Akismet, blame the consumers, I say.

    Their providing a nice API and a system that just works is what led to our using them – not that Mephisto had the option built-in. That just introduced their name to me.

    It does work, too. Quite well. If michele monetizes this site somehow, we’ll evaluate the purchase model and decide at that point whether the product is worth it.

    Sarah – I didn’t know you were describing Bayes poisoning. Like Jon said, it isn’t the purpose of most of that stuff. A multi-tiered approach is ideal to fight spam, though.


  18. That reminds me, I have been meaning to tell you about viagra, poker, and some excellent porn I have found. Email me for more information.

