Weeds in the Garden

Under the Iron has an old interview with Scott Johnson that is a good read. Now scroll down to the comments. Dozens and dozens of spam comments. I see this over and over again on MT and s9y sites. What’s terrible is these pages are just as dangerous as dedicated spam blogs. Think about it: I shouldn’t even be linking to it now.

Alex told me the other day about a new type of comment spam he’s been seeing: comments that look normal and link to ordinary blog entries on well-known blogs like Mozillazine, entries whose own comment threads are already full of spam. As advanced as tools like MT Blacklist have become, they’re pretty useless in cases like this. Are you going to blacklist Dave Sifry? Molly used to have spam comments on molly.com all the time. Even though she spent a lot of time and effort dealing with them (a daily chore), they only need to be there long enough for Googlebot to index them for the harm to be done. I’m not dogging on MT here; it’s just that there are tens of thousands of MT blogs out there that don’t have any protection, and the spammers are targeting them mercilessly. Domain blacklists don’t scale (spammers can easily have thousands of domains and hijack innocent ones), and centralized registration hasn’t been shown to be effective except against people who don’t like centralized registration, a group that doesn’t include spammers.

People used to say that WordPress doesn’t get spam comments because it’s not popular enough. I don’t think this argument holds water anymore. It’s true that MT has three to four times as many blogs as WordPress, but Serendipity has an order of magnitude fewer blogs than WP and is heavily targeted by spammers. I think WordPress has, through design and luck, done a lot of things right with regard to comment management in general. First, we respond to the problem in the core code quickly. Moderation and blacklisting have been in the core for half a year now. All of the WordPress developers are bloggers as well, so we’re pretty sensitive to new techniques in use by the spammers. When early versions of WordPress 1.0 advertised that moderation was on, spammers instantly adapted and started searching for blogs that didn’t have the phrases we used, so in the next nightly build for testers I changed how that worked so it couldn’t be targeted anymore. Then in 1.2 we expanded the already successful moderation to allow powerful regular expressions and to target not just the content but things like the number of links in a post. Let’s say that somehow two hundred spam comments did get on your blog, which would never happen in the first place because we’ve had throttling for over a year now: you can easily delete hundreds of spam comments at once in under five clicks. We’re not sitting still either. Version 1.3 will have emergent registration based on code originally written by Kitten, so there is a type of automatic whitelisting going on that spammers can’t duplicate, because it uses email addresses like a secret key and WordPress never reveals your email address. (So Dave and Mark, stop leaving fake ones!) The code will be flexible enough to adapt for GPG signing for the ultra-geeky in the audience.
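
To make the layering concrete, here is a minimal sketch in Python of how these checks might combine. It is not WordPress’s actual code (which is PHP): the names `ModerationPolicy`, `trust`, `decide`, and `email_key`, the link limit of 2, and the use of a SHA-256 hash as the stored key are all assumptions made for illustration.

```python
import hashlib
import re
from dataclasses import dataclass, field

# Counts anything that looks like the start of a URL (assumed heuristic).
LINK_RE = re.compile(r"https?://", re.IGNORECASE)


def email_key(email):
    """Hypothetical helper: store a hash so the address itself is never kept in the clear."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


@dataclass
class ModerationPolicy:
    max_links: int = 2                                   # assumed default
    blocked_patterns: list = field(default_factory=list)  # regular expressions
    trusted_keys: set = field(default_factory=set)        # previously approved commenters

    def trust(self, email):
        """Remember a commenter whose earlier comment was approved."""
        self.trusted_keys.add(email_key(email))

    def decide(self, email, body):
        """Return 'approve' or 'hold' (hold = send to the moderation queue)."""
        # Whitelisting: the email address acts like a secret key.
        if email_key(email) in self.trusted_keys:
            return "approve"
        # Too many links is suspicious regardless of the wording.
        if len(LINK_RE.findall(body)) > self.max_links:
            return "hold"
        # Regular-expression moderation on the comment content.
        if any(re.search(p, body, re.IGNORECASE) for p in self.blocked_patterns):
            return "hold"
        return "approve"


policy = ModerationPolicy(blocked_patterns=[r"\bonline\s+poker\b"])
print(policy.decide("someone@example.com", "Great post, thanks!"))
print(policy.decide("someone@example.com", "Play online poker today"))
```

No single check here is hard to beat on its own; the point of the sketch is that a spammer has to get past all of them at once, and the thresholds and patterns can change between releases.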

Any of these things by itself wouldn’t be very effective, and each method I’ve listed has its flaws and weaknesses, and I know them. Which brings us to what I think is the real reason WordPress, despite its explosion of popularity, still doesn’t get the level of spam other tools do: it’s more trouble than it’s worth. WordPress, to spammers, is an unpredictable and moving target. We’re not resting on our laurels; we have another exciting feature-filled release coming just a few months after the landmark version 1.2. The WordPress moderation system can be toggled to manual mode, which is 100% effective at catching spam, or triggered only when something is suspicious. We’re committed to keeping the cost high and the reward uncertain for spammers, which means you don’t have to wake up every morning to filth on your weblog as well as in your inbox. You can focus on what draws us all to this medium: writing and genuine interaction. Here’s a quote from Molly from a comment she left on Keith’s site:

I wanted open comments. In my situation, MT, despite the wonderful Jay Allen personally helping me on an almost daily basis to deal with comment spam, I was a major target. My ISP refused to continue dealing with me because the server molly.com resided on was brought to its knees twice due to spam floods. I was spending up to two hours PER DAY to undo the spam much less post.

Since switching to WP, I’ve had exactly five emails sent to me automagically for moderation. 3 of them were spam, 2 were just enthusiastic posts with multiple links from a reader.

Either way, I had instantaneous access to accept or delete those posts.

That’s the sort of thing that is incredibly rewarding about working on WordPress. Knowing that your work makes it easy for someone else to do what they love is one of the greatest feelings in the world. No amount of money or recognition can ever match that.

22 thoughts on “Weeds in the Garden”

  1. “So Dave and Mark, stop leaving fake ones!”

    Not likely.

    It’s great that WordPress never reveals ’em; it’s one of the few, and you should be commended. But there are enough out there that display an email address unencoded that I’ve learned never to trust another site to do the right thing. Memorizing which sites do and which sites don’t is much more mental parsing than I care to dedicate to the task of keeping my inbox spam free, especially when I have an easy out by leaving a pseudo-address instead.

    Besides, I’m sure whitelisting will work just fine since I leave the same pseudo-address every time.

  2. You don’t have to memorize which sites post email addresses and which don’t. It says that email address isn’t shown right above the comment form.

  3. Whitelisting should work better-than-fine, since I know what pseudo-address Mark uses, and I can find out what Dave uses easily enough 🙂

    Maybe what we need is an Interview The Spammed project, to find out just what it is that’s getting people so spammed. For instance, when you create a new weblog in MT, “Email New Comments” defaults to unchecked. Are they spammed like that because they haven’t the slightest idea, because they don’t even know they can get comments emailed to them?

  4. “You don’t have to memorize which sites post email addresses and which don’t. It says that email address isn’t shown right above the comment form.”

    Right, so expand the list of what I don’t want to parse to include instructions on comment forms. This is also why I rarely use HTML in my comments, because I’m not going to bother cross-referencing a list of which tags are allowed and which aren’t because everyone has a different list.

    Comment forms on web logs are horribly unusable because every one seems to be different. To cope, rather than taking the time to learn everybody’s system I cater to the lowest common denominator. I like what I hear about Matt’s plans to improve WordPress, but that still isn’t enough to change my behaviour. The sites I comment on run a wide range of other software, and they’d all have to standardize on the same thing to shake up the way I use comment forms.

  5. Dave, actually your fake address is probably more secret than your real one. 🙂 I can see why you’d be cautious though. I always thought it was silly that MT obscures people’s URLs but puts their email addresses out there in plain text!

    I gave up that fight long ago. The more spam I get the better trained my bayesian filters get. “Bring it on.”

  6. I don’t understand. You say there is “a new type of comment spam…completely normal, topical comments linking to a blog entry on a real blog…” In other words, it is a unique comment that points to another real blog, so what is the spam accusation about? If it’s a legitimate comment, linking to a relevant blog entry, then it’s not pointing to a spam domain or pitching any products, it’s a real comment and not spam.
    Somehow I think you missed something in your description.. like why this sort of message is spam.

  7. My fault, I should probably edit this entry a bit. The links go to a blog post on a real blog that has been infested with spam comments already. So the comment itself doesn’t seem like spam, but it takes you to a page which is full of spam, maybe even on a domain you recognize.

  8. Pingback: Chris' Corner
  9. Pingback: monkinetic
  10. WordPress, to spammers, is an unpredictable and moving target.

    This doesn’t quite make sense.

    WordPress is a moving target only as long as its users keep updating. It’s currently up to date in most places because (I’m guessing) it’s quite an early adopter product at the moment. When WP moves more mainstream, as I imagine you want it to, this will change. In the same way as you say that the tens of thousands of blogs are running unprotected versions of MT, in time people will discover flaws in WP v1.2 & 1.3 and tens of thousands of users will be still using these, blissfully unaware.

    On a separate note, the WP comment system mails out comment notifications with the “From:” header populated with the email of the commenter. Many mail clients gather emails in From: headers into the address book. The system gets hit with a virus and, bang, the email address that WordPress carefully didn’t reveal is suddenly global property.

    Oh dear, I sound pretty depressed don’t I? Don’t take it that the job WP does isn’t good… it is. It’s very difficult to control any system which is supposed to only facilitate communication of the right sort.

  11. Simon, not at all. This is all my opinion, so you’re more than welcome to add yours to the mix.

    You bring up a good point with upgrading. We’ll have to bug Carthik to get some version data from the WP blog crawler so we have hard numbers on WordPress in this regard. My impression is everyone seems to upgrade, but the blogs I see are a very small and self-selected group. I think that in general anything we can do to make it as easy as possible to upgrade would contribute to making WordPress a tricky target for spammers. Here are my thoughts on what works; I may turn this into another post:

    * Add features people want (a compelling reason to upgrade)
    * Make instructions simple (overwrite wp-files, run upgrade.php)
    * Don’t surprise people too much (a little is okay)
    * Don’t break customizations (plugins address this a lot, backwards compatibility is king)
    * Publicize the new version
    * Don’t give people any good reason not to upgrade (license change, money)
    * Have a sane and predictable release schedule
    * Tease people (make it sexy)
    * Maintain constant communication with developers

  12. Matt, I’m thinking of what happened to Alex as “second-order comment spamming”. Unfortunately, where you can control first-order spamming—the methods are many, and we know what they are—it’s hard to control what the next guy does.

    I think you have to control your space and go on with life, honestly.

    As far as linking to people who have unclean comment areas: I understand the desire not to link, but avoiding links is akin to “letting the terrorists win”. [I’ll flagellate myself on that one, thanks.] I think you have to link and let things fall where they may, because there’s no telling when the spamming will happen. What about hyperlinks I’ve made in the past and haven’t checked—what if those have become spam targets without my knowing?

    As to Simon’s issue: I would strongly urge, on a non-post.php page, a brief hyperlink to a page with the latest news on WP. CPanel/WHM does this—a nice, one-line statement that says, “You need to upgrade your installation.” Assuming that the version is still outputted in /wp-admin/ pages, it should be trivial to write some JavaScript to check the header against the latest stable version.

  13. I’m curious to know more about the thinking behind the “Domain blacklists don’t scale” idea. I’ve had an excellent experience using the surbl.org lists on the email front. Think about how many more millions of transactions a for-email-use blacklist is going to see than blog software checking out comments.

    I think the harder part to scale is not so much the zone and dns services, but rather the trust mechanisms and moderation of the list.

    As an aside, I’ve always wished I could establish an overlap between domains in comment spam, and the sc.surbl.org blacklist. Alas, I never have made a meaningful correlation.

  14. Damn. Ya know, where are all the Cialis and Online Poker ads when you *really* need them on this site? Just when I was starting to get an itch to throw down $35 on the next big win of Texas hold ’em while also curing erectile dysfunction (or whatever Cialis is supposed to do) I come across this post… Oh well…

  15. Regarding the problem of getting people to update… wouldn’t it be easy to just have the WP admin check what the latest version of WP is (perhaps by reading an XML feed or similar from the main WP site)?

    If it’s newer than the current version then it could post a polite informative alert in the admin header (with, like, a little red star or something to show it’s important). It could even provide a link to the newest version.

    That way there would be a fool-proof way to make sure users have the most opportunity to upgrade. Not everyone is going to, but 99% probably will eventually. It’s the same way MSN or filesharing programs say “hey, there’s a new version out! do you want to update?” but less invasive and annoying.

    I think it would make a good feature for 1.3, or 1.4 if it’s too late for that at the moment, but it’s probably only 5 minutes of coding, if that.

  16. First of all, how embarrassing.

    UTI has been long off my radar due to the end of my school days, the start of my work days, and vacation. There is some new stuff coming up later, but for now I’ll just shut it down. I’m now deleting violating comments and I just thanked Matt via IM for bringing this to my attention.

    And yes, I’m considering switching to WordPress. But keep your hats on, fanboys, I’m also considering switching to MT3, Textpattern, a home-grown system, etc. 😉

  17. Good article Matt. I’ll be linking to it in my article on comment spam at my Vaspers the Grate site. Should be posted tonight or tomorrow.

    A famous art magazine online has a discussion forum. Loaded with miserable pervert, race hate, etc. comment spam.

    To bypass filters, there are both copy & paste and random text generation techniques being used.

    Or song lyrics, a new tactic I never saw before.

    Copy & paste, the spambots or human spammers (you tell me) obtain a few paragraphs of whatever, from literary reviews to municipal law texts, plug it into a post, attach URL beneath the text.

    I’ve emailed the editors twice about this prob, but no response.

    Also, why are blog and forum operators so lazy to not just manually delete comment spam, when it’s not hundreds but only a few on a thread? Ridiculous. I always post a comment about how the comment above mine is spam and needs to be removed.

    That’s why they call me Vaspers the GRATE: abrasive avenging angel of the internet!

  18. I’m getting 5 to 10 spam comments a day on my WP1.2 blog…

    …and my blog is almost completely unread! (apart from by spambots – obviously)…

    …the sooner the update rolls out the better 🙂
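
Several comments above (12 and 15) suggest having the admin screens compare the installed version against the latest release and show a polite notice. A minimal sketch of just the comparison step, in Python rather than WordPress’s PHP, might look like the following. The function names `parse_version` and `upgrade_notice` are hypothetical, and how the latest version string is fetched (an XML feed, a header, or something else) is deliberately left out because the thread does not settle it.

```python
import re


def parse_version(text):
    """Turn a dotted version such as '1.2.1' into a tuple of integers."""
    text = text.strip()
    if not re.fullmatch(r"\d+(\.\d+)*", text):
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in text.split("."))


def upgrade_notice(installed, latest):
    """Return a one-line notice if `latest` is newer than `installed`, else None."""
    a = parse_version(installed)
    b = parse_version(latest)
    # Pad with zeros so that '1.2' and '1.2.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    if b > a:
        return f"WordPress {latest} is available; you are running {installed}."
    return None


# `latest` would come from wherever the project publishes it (assumed, not specified here).
print(upgrade_notice("1.2", "1.3"))
```

Comparing integer tuples instead of raw strings matters once version numbers reach two digits: as strings, "1.10" sorts before "1.9", which would hide a real upgrade.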