Wikipedia Spam

Sometimes I’m amazed at how much manual labor Wikipedia uses. For example, how long can this type of spam protection go on before it becomes overwhelming?

11 replies on “Wikipedia Spam”

Well, it’s hard to defend a wiki, I would guess, since the underlying principle of a wiki is to open it up to everyone for editing. I don’t know how well Akismet or Bad Behavior would fare against this use, since it really seems to me to be just a list of URLs. Bad Behavior could defend against robots, but, like I said, it’s hard to combat spam or racketeering on a wiki…

Sounds like a job for Akismet.

All glibness aside, I wonder (a) what it would take to successfully integrate the Akismet API into a program like MediaWiki and (b) what kind of resources would need to be poured into Automattic to make it possible with regard to server load and resource usage.

Wikipedia (and the software it runs on, MediaWiki) is many people’s first experience with wikis, and it can be a frustrating one, where too much time is spent removing spam.

MediaWiki claims other spam protection features, but for me it was not a good “out of the box” experience (a year ago). Hopefully, that has changed.

Akismet is being used now for wikis and bugs on Trac, and I hear the results have been good. For wikis you just submit the diff and it works really well.
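To make "submit the diff" concrete, here is a minimal sketch, not how Trac or any wiki plugin actually does it. It assumes Akismet's comment-check REST endpoint and its form fields (blog, user_ip, user_agent, comment_type, comment_content). The helper names, the wiki URL, and the "wiki-edit" label are hypothetical.

```python
import difflib
import urllib.parse
import urllib.request


def added_text(old: str, new: str) -> str:
    """Return only the lines an edit added, joined with newlines."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    # ndiff prefixes added lines with "+ "; everything else is ignored here.
    return "\n".join(line[2:] for line in diff if line.startswith("+ "))


def build_payload(old: str, new: str, user_ip: str, user_agent: str) -> dict:
    """Build the form fields for a comment-check call from a wiki edit."""
    return {
        "blog": "https://wiki.example.org/",  # hypothetical wiki front page
        "user_ip": user_ip,
        "user_agent": user_agent,
        "comment_type": "wiki-edit",  # assumption: an illustrative label
        "comment_content": added_text(old, new),  # the diff, not the page
    }


def looks_like_spam(api_key: str, payload: dict) -> bool:
    """POST the payload; assumes the service answers "true" for spam."""
    url = f"https://{api_key}.rest.akismet.com/1.1/comment-check"
    data = urllib.parse.urlencode(payload).encode("utf-8")
    with urllib.request.urlopen(url, data=data, timeout=10) as resp:
        return resp.read().decode("utf-8").strip() == "true"
```

Sending only the added lines means the filter judges what the editor contributed rather than the whole article, which is presumably why the diff works better than the full page text.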

I don’t know how many edits a day Wikipedia gets, but unless it’s 5 million+ it’s unlikely it’d require any additional server resources on the Akismet side.

It’d certainly work for the automated spam, which seems to be mostly what that page deals with, but Wikipedia still has to deal with much more subtle forms of unwanted content, though I wouldn’t classify most as spam.

Wikipedia has never met a blacklist it didn’t like. Yes, I suppose fascism can make the encyclopedia run on time, but I don’t want to live under it. Secret, unaccountable blacklists are the height of fashion — witness the TSA no-fly list. I’m still waiting for this rerun of the McCarthy era to end.

Blacklisting URLs is like blacklisting email addresses: useless functionality, in my view, even with regexps. The spammer can just rewrite some words cryptically or use the HTML notation ( &+#+xx+; ) and the filter is bypassed. In my view the answer is captchas, but not graphical ones (which would work against people with disabilities); I mean “Do you pass math” or “Elvis’ surname?” captchas. 🙂
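The HTML-notation trick mentioned above is easy to demonstrate. The following is a minimal sketch with a made-up blacklist entry and made-up function names: a naive regexp check misses a URL in which one letter is written as a numeric character reference, while decoding the text first catches it. Decoding only closes this one hole; it does nothing against reworded or otherwise disguised links.

```python
import html
import re

# Hypothetical blacklist with a single made-up spam domain.
BLACKLIST = [re.compile(r"cheap-pills\.example", re.IGNORECASE)]


def is_blacklisted(text: str) -> bool:
    """Naive check: match the raw submitted text against the blacklist."""
    return any(pattern.search(text) for pattern in BLACKLIST)


def is_blacklisted_normalized(text: str) -> bool:
    """Decode HTML character references before matching."""
    return is_blacklisted(html.unescape(text))


plain = "http://cheap-pills.example/buy"
obfuscated = "http://&#99;heap-pills.example/buy"  # "&#99;" decodes to "c"

print(is_blacklisted(plain))                  # True
print(is_blacklisted(obfuscated))             # False: the filter is bypassed
print(is_blacklisted_normalized(obfuscated))  # True once decoded
```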

Or in the action attribute of the form, like my plugin does. 😉

What might help a lot would be Wikipedia utilizing the OpenID plugin… this is what I do on my blog, and I get zero spam. (And on other blogs as well.)

There’s one available for MediaWiki, but for WordPress: http://the-notebook.org/12/01/2006/openid-comments-for-wordpress/

Just say users can’t register themselves, you must be logged in to comment, then activate this plugin and it’ll handle the rest. (That’s to turn off non-OpenID commenting, of course. You can just follow the instructions to use it normally.)

Brendan, that’s probably because not enough people use OpenID or the plugin to make it worth spamming. There are a thousand things you can do to your blog to make it just different enough to discourage spammers; this is generally known as a club solution.

[…] Akismet. A must for WP blogs… I’ve complained in the past how their service is somewhat of a blackhole and I can’t seem to rescue a few of the people who leave comments on RCG from the Akismet spam filter. Nonetheless, the service catches hundreds of spam messages every moment. If I didn’t have a life, I could just keep hitting refresh on my spam filter and there would always be another spam message to delete. Matt, if you’re listening… Here are the two improvements I’d like to see. (1) A way to not have spam limited to showing only the most recent 150 spam messages. Recently, I’ve had two different people leave comments who say they were picked up by the spam filter, but because my queue had already built up to more than 150 messages, I had no way to rescue them. When I hit the “delete all” button, I only (wrongly) reinforced that these people were spam. (2) A way to rescue people who are labeled spam from deep within the blackhole of Akismet’s database. Galen, one of RCG’s contributors, has to go “save himself” every time he posts. This sucks! (but is better than me dealing with 450 spam messages a day!) […]
