OpenID and Spam

Magnolia is going to restrict its signups to OpenID users only:

Why? Because 75% of new accounts being created there lately have been created by spammers using automated tools. Spammers took over Ma.gnolia. Now, the company is using OpenID as a system of 3rd party verified identity and using the superior spam blocking skills of services like Yahoo! and AIM to clean up the Ma.gnolia ranks. Spamfighting could be the incentive that puts many other vendors over the edge to leverage OpenID.

At best this is a Club solution, meaning it’ll be effective only as long as Magnolia is not a worthwhile enough target or not enough people use the technique.

Anyone advocating that a Yahoo, Google, or AOL account is going to stop spam signups, sploggers, or anything of the sort is out of touch with the dark side of the internet. The going rate for a valid Google account is about a penny each: for $100 you can get a text file with 10,000 valid logins and passwords, and go to town. We used to require email verification to sign up for WordPress.com, and the vast majority of splogs were coming from Gmail or Yahoo email addresses, hundreds of thousands of them. Myspace and ICQ are both good examples of completely closed identity systems with registration barriers that are still overrun with spam.

Each of the big guys probably has an anti-abuse team larger than all of Magnolia fighting these spam signups, but it obviously hasn’t been effective. In theory you could blacklist OpenID providers, but who’s going to block Google and Yahoo? Even if they did, they’d just be pushing the problem outward, to the point where spammers eventually run their own identity providers. And if you think those won’t come from millions of unique registered domains, look at your comment spam queue.

OpenID has a ton of promise for the web. Let’s not hurt it by setting people up for disappointment, telling them it’s a spam blocker when it’s not. Regardless of registration, identity verification, or CAPTCHA, you still need something working at the content level to block spam.

50 thoughts on “OpenID and Spam”

  1. I don’t think anything is a spam-fighting panacea, and you’re right that the OpenID approach is primarily a “club” approach. But given that “all of Ma.gnolia” is essentially Todd and Larry, it’s nice to be the ones starting up the club so that, until it gets worse, they can get back to focusing on their core offering instead of fighting off the dark side of the web.

    It’s also true that Google and Yahoo and Microsoft have a shit-ton of spammers abusing their email systems for accounts. That probably won’t change so long as they remain open systems. And it’s also true that independent OpenID providers are no less susceptible to the same kinds of abuse.

    Nevertheless, OpenID seems to be somewhat more useful for fighting spam, especially if used in tandem with other approaches.

    How are you guys going to use OpenID with WordPress? We’re still waiting to see OpenID consumption on wordpress.com. If it’s good enough for Blogger…? 😉

  2. Chris, there seem to be some good plugins for wp.org, but haven’t seen anything we could use on wp.com yet. It’d be interesting to hear any stats the Blogger guys could share about how many OpenID-authed comments they get, and what that is as a percentage of their total. Commenting, though, seems a fairly lame place to enable consumption since 99% of our blogs don’t require registration.

  3. The only thing that is going to block spammers is making spamming impractical, so that the ROI is less than or equal to zero. OpenID, from a spammer’s point of view, is practically the same as e-mail. The only thing we can do to prevent it is to start blacklisting… Again… :). And, of course, use Akismet ;-).

    PS: I do consider OpenID to be a step forward! It just won’t prevent spamming on a large scale.

  4. OpenID by itself obviously isn’t the solution. OpenID + XFN + something to glue them together into an open authenticated social network and then you’d have something that could at least help identify non-spam users, but it doesn’t help identify spam users.

  5. What stops someone from creating an OpenID server and using tons of domains to create many, many bogus OpenID users?

    This seems as easy as anything else. They can even “taste” the domain for 30 days… plenty of time for spamming.

  6. I’d prefer to see full OpenID integration with WordPress. Now that 2.5 is out, the WordPress OpenID plugins that did exist are all broken.

    I want my OpenID to work with my site :O)

  7. @Chris – Matt meant “club” as in “the silly piece of metal you wedge onto your steering wheel so a thief will go for an easier target” – not “the loud place where cool people get drunk together”

  8. Argh… I *told* people this sort of thing would happen, but no, they never listen…

    Well, with any luck, it will be completely useless and ineffective and they’ll get rid of the idea, thanks to those anon OpenID sites like jkg.in’s.

  9. Yeah, it is lame we haven’t seen wider consumption. But, somehow the powers that be were able to get credit cards accepted just about anywhere (granted, it’s more of an oligarchy) and it seems to have made things better…? I dunno.

    I would like to see stats too — I’ll make an inquiry.

    @Viper007Bond: that logic fails. That’s like saying people use http to serve splogs, so therefore http has failed. Spam isn’t a problem of technology. It’s facilitated by technology, just like the ability of people to blog in non-spammy ways. Email has had this problem before, OpenID will have it next, and we’ll cope.

    @west: Actually, that’s also not an accurate statement. OpenID is simply for remote identification. Ma.gnolia is using OAuth for authorization. Once you’ve identified yourself using remote credentials, Ma.gnolia will determine what you’re *authorized* to do, using secondary (yet to be developed) heuristics.

  10. Matt, in your opinion, what would it take for one of the existing OpenID plugins to be a viable option for wp.com? I know one problem is current incompatibility with the OpenID server plugin wp.com is using. Are there other blocking issues?

  11. Matt, you have way more experience with spam than we do at Ma.gnolia, so it’s great to hear you chime in on this.

    However, our decision to close off native account registration is really part of an incremental reshaping of identity in Ma.gnolia, to continue building it as a component of the unofficial, loosely joined distributed social network floating out there on the web. Interestingly, no one blogging about our move has picked up on this or bothered to contact me to talk about it.

    The fact that this change has actually led to a large decrease in the creation of spam accounts is a happy byproduct of removing ourselves from the economy of scale of social bookmark spam. This economy relies on the large number of social bookmarking sites that all work exactly the same way, supporting the same API, allowing for the automated generation and population of accounts. Ma.gnolia is now the exception and not as desirable a target. I’m sure at some point in the future this will change, but I’m not going to complain about the current effect.

    Viper007Bond: Your argument is that by supporting OpenID at Ma.gnolia, we’re really harming it. Really? Really? Wow.

  12. Matt, I don’t think anyone disagrees with you that Ma.gnolia accepting only OpenIDs will not fully eliminate the spam problem. In fact, this was a topic of quite a lot of public debate following their announcement. That said, I think you’re partially missing the point.

    When you have an authenticated OpenID user, it is more likely that you can fit them into a larger picture of who they are around the web and answer whether they are “good” or “bad”. Compare this to the state of the art today, a verified email address: it is rare that a service can find out whether a particular email is “good” or “bad”. This is because email addresses are almost always hidden due to concerns around spam and account linking (as seen on Flickr yesterday). Commenting on your blog right now, you accept that the URL I provided (http://www.davidrecordon.com/) means that I’m actually David, but with OpenID I could prove it. Take this a step further with Google’s Social Graph API and you could see that while you don’t follow me on Twitter, I follow you, and we have many friends in common; maybe my comment then bypasses any form of your moderation.

    In Ma.gnolia’s case, there is nothing preventing them from also displaying a CAPTCHA or requiring a verified email address. It’s just that they’re now able to fit their new users into a much larger picture and make policy decisions accordingly. If an OpenID is a one-off (just like there is BugMeNot for email), it could be made to pass a higher bar. At the other extreme, if the OpenID belongs to a friend of Larry’s on another service, the account is created right away. No, they might not be doing this today, but they are setting themselves up to be in a truly innovative position tomorrow.
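    A minimal sketch of the tiered signup policy described above, in Python. It assumes the site can count how many of a new user’s contacts elsewhere (for example via a social graph lookup) are already trusted members, and that it keeps its own provider blocklist; the names, fields, and threshold are hypothetical, not Ma.gnolia’s actual implementation.

```python
from dataclasses import dataclass
from enum import Enum


class SignupAction(Enum):
    ALLOW = "allow"    # create the account right away
    VERIFY = "verify"  # require a CAPTCHA or verified email first
    REJECT = "reject"  # refuse the signup outright


@dataclass
class OpenIDSignup:
    identifier: str                  # the OpenID URL the user authenticated with
    trusted_contacts: int            # hypothetical: known-good members linked to this identity
    blocked_provider: bool = False   # hypothetical: provider is on the site's own blocklist


def signup_policy(signup: OpenIDSignup, friend_threshold: int = 1) -> SignupAction:
    """Decide how much friction a new OpenID signup gets (illustrative threshold)."""
    if signup.blocked_provider:
        return SignupAction.REJECT
    if signup.trusted_contacts >= friend_threshold:
        # e.g. "a friend of Larry's on another service": no extra hurdles
        return SignupAction.ALLOW
    # a one-off identity with no visible history must clear a higher bar
    return SignupAction.VERIFY
```

    The OpenID itself only establishes which identifier is signing up; the decision still rests on whatever extra context the site can gather about that identifier.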

  13. Matt,

    While your blog might not require registration, it would be good if we could comment on it without dumping our email addresses into it; that alone can cause spam issues. And requiring commenters to be OpenID-enabled ended the spam issue on my blog. Maybe not forever, but it’s been a long time.

    As to how to OpenID-enable WP.com: it’s surprisingly easy to use the OpenID libraries, even without a plugin written directly for you. I’m sure there are many people in the OpenID community who would jump at the opportunity to bring WP.com on board.

  14. I completely agree that promoting OpenID as the solution to spam registrations could do more harm than good once it becomes clear how easy it is for spammers to create OpenIDs (even on supposedly trusted services such as Yahoo!). However, I believe that OpenID whitelisting could do a great deal to help reduce the moderation burden by enabling known good OpenIDs to bypass the moderation queue. I wrote about this last year: http://simonwillison.net/2007/Jan/22/whitelisting/

  15. @Matt: Re: “That seems like a lot of work for such a small problem.”

    Actually it wasn’t too bad, since we already supported OpenIDs as a login/signup method. It was mostly just removing the username/password form and adding our new drop-down for selecting your preferred auth type, with a bit of refactoring on the backend.

  16. Spam is something that we’ve learned to live with in this day and age unfortunately. Any wall that can be put up will be torn down or walked around as long as there is monetary incentive. OpenID is fine, but it’s not really a move to stop spam.

  17. I would think an authentication or signup system that requires users to receive a confirmation code via SMS would be the best route right now, something like what Google was doing with Gmail registrations for a while. I can understand the privacy implications and the fact that not everyone has access to a mobile phone, but a sacrifice needs to be made somewhere.

    We need a way to confirm the unique identity of an individual and right now a mobile number is probably the best option (due to the percentage of the population who have mobile phones).

    If there was a system like OpenID that forced all registrants to periodically “re-authenticate” (perhaps every three months) using the same SMS method, then phone numbers wouldn’t need to be stored and the system would stay relatively spam-free. I say relatively because anything can be bought for a price. 🙂

    Like Xavez said, if it costs the spammers more money than the spamming generates, it won’t be worthwhile. Spammers will never stop, but just as you don’t see the sky full of advertising blimps because it’s so expensive, we need to make it unreasonably expensive for the spammers. The only way I see that happening anytime soon is by linking the virtual world with the real one.

  18. I just want a system that puts an end to the endless signing up to websites I end up doing. And OpenID is that system. Seems to me it should have been rolled into WP a long time ago.

  19. Simon et al:

    When you refer to whitelisting, do you mean individual identity whitelisting or whitelisting of providers? I mean, whitelisting google isn’t going to help, because spammer accounts are a dime a dozen.

    Ciao!

  20. You’re correct in your assessment that using OpenID is not a solution for spam. It just pushes the problem someplace else. That said, I think that over time OpenID will become a critical piece in the larger puzzle of controlling spam more effectively. OpenID will allow bloggers to tie into larger sites with huge user bases and social networks (such as Yahoo!). Then, as those sites open up their social networks, we will have a chance to apply lighter moderation to first-, second-, and third-degree contacts and utilize their blacklists. Consider a spam filter based on the six degrees of yourself. The possibilities are numerous. I do think there is a place for anonymous comments, although moderation and spam filtering will work differently in the future. This more ideal world is still a few years out; however, I’m certain that’s where we’re headed, and OpenID will play a big role.

  21. Simon’s definitely right that OpenID alone isn’t a solution to spam, and although there will be a lag before spammers start setting up OpenID accounts via their own servers, it will undoubtedly happen.

    I’m obviously pleased to see Larry’s decision to switch to all-OpenID login. However, the really big blocker to using OpenID accounts for spamming will come when there is a true cost to losing an OpenID account.

    There is a cost point, and it’s not necessarily high, beyond which it doesn’t make sense to spam something. If the marginal cost of the spam is higher than the marginal benefit, the spam literally isn’t worth the effort.

  22. @ Otto

    They [the spammers] wouldn’t actually need any preexisting ‘anonymous’ OpenID provider, as it is quick and relatively straightforward to set one up. For example, phpMyID allows me to be my own OpenID provider.

    The benefits of OpenID are mainly ‘user-facing’ in terms of the ease of login. From the point of view of OpenID adopting sites, they should see themselves as providing some added value for their visitors but can’t really expect any benefit beyond the normal method of login.

    From the WordPress/blog platform spam-control point of view, accepting OpenID for comments is roughly the same as allowing comments without login.

    Akismet/YourOwnFilterOfChoice will continue to be your friend 😉

  23. Christian Höltje – when I talk about social whitelisting I mean whitelisting of individual identities, not providers as a whole.

    That said, whitelisting providers is also interesting. If I were building an alumni site for my old university I might decide to allow whitelisted access to any OpenID provided by that institution, on the basis that they would only be available to staff and students and hence someone else would already have confirmed that those people weren’t spammers.
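    A rough Python sketch of the two kinds of whitelisting, applied to a comment moderation queue. It assumes the identifier has already been verified through an OpenID login, and it treats identifiers hosted on the institution’s own domain as issued by that provider; the whitelist contents, the hostname, and the helper names are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical whitelists; a real site would keep these in its database.
TRUSTED_IDENTITIES = {"http://simonwillison.net/"}
TRUSTED_PROVIDER_HOSTS = {"openid.example-university.edu"}


def normalize(identifier: str) -> str:
    """Very rough identifier normalization: add a scheme, lowercase the host, default the path."""
    if "://" not in identifier:
        identifier = "http://" + identifier
    parsed = urlparse(identifier)
    path = parsed.path or "/"
    return f"{parsed.scheme}://{parsed.netloc.lower()}{path}"


def needs_moderation(identifier: str) -> bool:
    """Return False if a comment from this (already verified) OpenID can skip moderation."""
    ident = normalize(identifier)
    if ident in TRUSTED_IDENTITIES:
        return False  # individually whitelisted identity
    host = urlparse(ident).hostname or ""
    if host in TRUSTED_PROVIDER_HOSTS:
        return False  # any identity issued by a whitelisted provider
    return True       # everyone else still goes through the usual filters
```

    Whitelisting a large provider such as Google this way would not help, since spammer accounts there are cheap; the provider list only makes sense for closed communities like the university example.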

  24. In my opinion, no OpenID, CAPTCHA, or webmaster can fight spam completely on its own. The best solution, in the Web 2.0 spirit, is to let the users fight the spam. Who is better at picking out which comments are spam than the readers of the site?
    In my case, Akismet has worked fine on several websites, but if it were combined with a smart system that let my visitors flag spam, it could fight spam quickly and easily. Including OpenID in this could also make a global, excellent anti-spam system for every CMS.
    CAPTCHAs cause more trouble than they do good. I often type the wrong letters, and after a couple of tries, I just give up.

  25. @Robert Accettura: Domains can actually only be “tasted” for a 4-5 day grace period (not 30 days) where they have the opportunity to request a full refund. Fortunately ICANN is planning on taking action against this practice soon, as it’s being abused by spammers with millions of “tasted” registrations.

  26. “Commenting, though, seems a fairly lame place to enable consumption since 99% of our blogs don’t require registration.” “That seems like a lot of work for such a small problem.”

    I see a theme in Matt’s comments: a solution has to address a big enough problem to be worth messing with. OpenID was designed to prove you own the URL associated with your blog comment, but commenter impersonation just isn’t a big problem for most blogs. OpenID is being extended to encompass reputation and therefore, conceivably, to become useful as a spam-blocking tool. But it isn’t there yet, and content-based techniques like Akismet work right now.

    “OpenID has a ton of promise for the web — let’s not hurt it by setting people up for disappointment by telling them it’s a spam blocker when it’s not.” “I just want a system that puts an end to the endless signing up to websites I end up doing.”

    The third way OpenID ended up getting used is as a web single sign-on system, and here is where I think a lot of people see a solution to a big problem. To me it makes a lot of sense to focus on getting some real traction for Web SSO before tackling the much tougher reputation / spam problem, which unlike Web SSO already has some workable solutions.

  27. Gmail had an interesting “invitation only” approach when it first launched its service; perhaps there’s food for thought there.

    If that service had featured a sister option to apply to be chosen for an invitation (by another member), maybe that would be enough to block most spammers.

  28. OpenID was designed to prove you own the URL associated with your blog comment

    No, no, no. That’s not *at all* what OpenID was designed to do, and the fact that people keep getting this wrong is what I find so annoying about the whole thing.

    OpenID was designed to be an alternative to having a username and password on every bloody site you use.

    Sign into digg? Enter your username and password. Sign into wordpress.com? Enter a different username and password. Annoying, no? OpenID was designed to allow single sign on. You login to your own site, and then your site tells anybody else that asks whether you’re valid or not.

    It has been extended to allow for easier site-registration. If I want to create an account on newdigg.com or something, I can sign-in with my OpenID and it gets my info from my OpenID provider.

    But applying OpenID to any other problem is a complete and total misuse of it. It does not stop spam and it does not prove identity. It does provide authentication and limited information transfer, but that is all.

  29. Kjetil has the right idea: we need to start having more “mark as spam” and “report” buttons in more systems.

    Plus, it would be cool if we could use Google’s social API (with XFN) to check that only people who have been around on the web (have a blog/Twitter/Myspace/etc.) are allowed to comment. (Of course, that would mean grandpa would be in trouble.)

  30. I don’t understand either of Otto’s comments.

    His first talks about “I told people this will happen”. What’s happened? Ma.gnolia permanently picked an alternate authentication mechanism and found that it cut spam in the process?

    Ok, fine. OPENID IS NOT THE SOLUTION TO SPAM. We get it, Otto. Thanks. Anyone who thinks so didn’t bother to learn anything about OpenID, or was incorrectly educated.
    There’s really no fix for that.

    His second is an attempt to laugh right in Brad Fitzpatrick’s face…

    OpenID Authentication provides a way to prove that an end user controls an Identifier. It does this without the Relying Party needing access to end user credentials such as a password or to other sensitive information such as an email address.

    That’s the very first line, and it has ALWAYS been present in the OpenID specs, even when it was called YADIS.

    OpenID doesn’t give you a universal username; it says (for example) “Matt Mullenweg owns ma.tt”. It does, however, serve a good purpose as a decentralized, globally unique identifier.

    I hear you, Otto. I don’t want to kill anonymity. I do, however, appreciate the improvement to identity.

  31. Hello,

    To avoid spam registrations, the WP plugin SABRE works like a charm (with CAPTCHA, confirmation, and various detection schemes, without being too annoying to a regular subscriber).

    Luc

  32. At an SXSW panel on OpenID, I asked the panelists whether it would reduce comment spam. A vigorous debate ensued, with no conclusion. One made the comment that OpenID would help in compiling a whitelist, which is a better method in the long term.

  33. Jason:

    If you cannot see the difference between “OpenID Authentication provides a way to prove that an end user controls an Identifier.” vs. “OpenID was designed to prove you own the URL”, then I’m not certain that I can explain it to you. “Control” and “own” are two entirely different things. Yes, proof of ownership would provide proof of identity, but “control” is wider in scope, and provides no clue to identity.

    Furthermore, neither one of these really has anything to do with “Identity”, as Brad only talks about “Identifier”, which is a critical difference. An Identifier is the same thing as the letter I in URI.
