Wikipedia Spam

Sometimes I’m amazed at how much manual labor Wikipedia relies on. For example, how long can this type of spam protection go on before it becomes overwhelming?

11 thoughts on “Wikipedia Spam”

  1. Well, it’s hard to defend a wiki, I would guess, since the underlying principle of a wiki is to open it up to everyone for editing. I don’t know how well Akismet or Bad Behavior would fare against this use, since it really seems to me to be just a list of URLs. Bad Behavior could defend against robots, but, like I said, it’s hard to combat spam or racketeering on a wiki…

  2. Sounds like a job for Akismet.

    All glibness aside, I wonder (a) what it would take to successfully integrate the Akismet API into a program like MediaWiki and (b) what kind of resources would need to be poured into Automattic to make it possible with regard to server load and resource usage.

  3. Wikipedia (and the software it runs, MediaWiki) is many people’s first experience with wikis, and it can be a frustrating one, where too much time is spent removing spam.

    MediaWiki claims other spam protection features, but for me it was not a good “out of the box” experience (a year ago). Hopefully, that has changed.

  4. Akismet is being used now for wikis and bugs on Trac, and I hear the results have been good. For wikis you just submit the diff and it works really well.

    I don’t know how many edits a day Wikipedia gets, but unless it’s 5 million+ it’s unlikely it’d require any additional server resources on the Akismet side.

    It’d certainly work for the automated spam, which seems to be mostly what that page deals with, but Wikipedia still has to deal with much more subtle forms of unwanted content, though I wouldn’t classify most as spam.

  5. Wikipedia has never met a blacklist it didn’t like. Yes, I suppose fascism can make the encyclopedia run on time, but I don’t want to live under it. Secret, unaccountable blacklists are the height of fashion — witness the TSA no-fly list. I’m still waiting for this rerun of the McCarthy era to end.

  6. Blacklisting URLs is like blacklisting email addresses: useless functionality, in my view, even with regexps. The spammer can just rewrite some words cryptically or use HTML entity notation ( &+#+xx+; ) and the filter is bypassed. In my view captchas work better, but not graphical ones (which would shut out people with disabilities); I mean “Do you pass math” or “Elvis’ surname?” captchas. 🙂

    Or in action-attribute of the form like my plugin does. 😉

  7. What might help a lot would be Wikipedia utilizing the OpenID plugin… this is what I do on my blog, and I get zero spam. (And on other blogs as well.)

    There’s one available for Mediawiki, but for WordPress:

    Just say users can’t register themselves, you must be logged in to comment, then activate this plugin and it’ll handle the rest. (That’s to turn off non-OpenID commenting, of course. You can just follow the instructions to use it normally.)

  8. According to my brother (who works as an admin for Wikipedia), they typically get 300,000-400,000 edits per day across the various Wikipedia sites (en, de, etc.).

  9. Brendan, that’s probably because not enough people use OpenID or the plugin to make it worth spamming. There are a thousand things you can do to your blog to make it just different enough to discourage spammers; this is generally known as a club solution.
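Comment 4 mentions that for wikis "you just submit the diff" to Akismet. A minimal sketch of what that could look like is below, assuming the wiki can supply the old and new revision text. The function names, the example URL, and the "wiki-edit" type label are hypothetical. The field names follow Akismet's comment-check parameters as commonly documented and should be checked against the current API docs. No request is sent here; the sketch only builds the form fields.

```python
import difflib


def added_text(old: str, new: str) -> str:
    """Return only the lines that an edit added, joined by newlines."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    # ndiff marks added lines with "+ "; unchanged ("  "), removed ("- ")
    # and hint ("? ") lines are skipped.
    return "\n".join(line[2:] for line in diff if line.startswith("+ "))


def build_check_payload(old: str, new: str, site_url: str,
                        user_ip: str, user_agent: str) -> dict:
    """Build form fields for a spam check on one wiki edit (nothing is sent)."""
    return {
        "blog": site_url,              # the site the edit was made on
        "user_ip": user_ip,            # IP address of the editor
        "user_agent": user_agent,      # editor's browser user-agent string
        "comment_type": "wiki-edit",   # hypothetical label, an assumption
        "comment_content": added_text(old, new),
    }


if __name__ == "__main__":
    before = "Paris is the capital of France.\nSee also: Lyon."
    after = before + "\nBuy cheap pills at http://spam.example"
    print(build_check_payload(before, after, "https://wiki.example",
                              "203.0.113.7", "ExampleBrowser/1.0"))
```

Sending only the added lines keeps the existing article text, which is presumably legitimate, from diluting the signal in a spammy edit.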
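Comment 6's point about HTML entity notation can be illustrated with a small sketch. The blacklist pattern and the domain below are made up for the example. Decoding entities before matching closes this one bypass, but it does nothing against other obfuscations (redirects, look-alike characters, and so on), so the commenter's broader concern still stands.

```python
import html
import re

# Hypothetical blacklist with a single made-up domain pattern.
BLACKLIST = [re.compile(r"spam-pills\.example", re.IGNORECASE)]


def is_blacklisted(text: str, patterns=BLACKLIST, decode: bool = True) -> bool:
    """Check text against the blacklist, optionally decoding HTML entities first."""
    if decode:
        # html.unescape turns "&#112;" into "p", "&amp;" into "&", etc.
        text = html.unescape(text)
    return any(p.search(text) for p in patterns)


if __name__ == "__main__":
    obfuscated = "http://spam-&#112;ills.example/buy"
    print(is_blacklisted(obfuscated, decode=False))  # naive match misses it
    print(is_blacklisted(obfuscated))                # decoded match catches it
```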