Percentage of Splogs

I’ve seen it reported in a few places that a third of blogs are spam. Someone arrived at this figure from my saying we’ve axed around 800,000 splogs on WordPress.com and comparing that with our total number of blogs, which is 2.5m.
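The "a third" figure is just a ratio of the two numbers above. Here is a quick sketch of that arithmetic. It assumes the 2.5m total counts the removed splogs, which the post doesn't actually say.

```python
# Both figures are the approximate ones quoted in the post.
splogs_removed = 800_000   # splogs axed on WordPress.com
total_blogs = 2_500_000    # WordPress.com blog count ("2.5m")

# Assumption: the 2.5m total includes the removed splogs.
share = splogs_removed / total_blogs
print(f"{share:.0%}")  # prints 32%, i.e. roughly a third
```

That ratio describes WordPress.com only. It says nothing about the wider blogosphere, which is the point of the next paragraph.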

As for the percentage of the total blogosphere (reported by Technorati as north of 100 million blogs) that are splogs, I’d say the number is much higher, probably 80%. This isn’t as bad as it sounds. I just think spammers are very effective at creating hundreds of thousands to millions of blogs, that those blogs tend to stick around, and that Technorati’s number doesn’t adequately scrub them out.

While I’m making data-less estimates, I’d say there are about 25-30 million non-spam blogs, and about 8-14 million of those are active in terms of getting traffic or new posts. You could cover a meaningful portion of the blogosphere by just indexing 4 or 5 million blogs.

Splogs and blogger attrition are two problems no one really talks about, but that’s okay because I don’t think either is hindering anyone’s growth as measured by metrics that matter, like pageviews or uniques. (Though many of the services supporting so many splogs must have an inordinate amount of resources devoted to them.)

See also: Blog Ping and Spam Statistics, WordPress.com February wrap-up.

36 thoughts on “Percentage of Splogs”

  1. I’d say the raw spam numbers in the pingosphere are a bit higher.

    I blogged about this a few weeks ago:

    http://blog.spinn3r.com/2008/01/blog-ping-and-s.html

    We’re seeing closer to 90%.

    I’m not sure I’d say that 80% of spam blogs are on hosted platforms though. I’m seeing low spam numbers on wordpress.com. Oddly enough the biggest splog host by FAR is Blogger.

    (They really should be ashamed of themselves. This problem should have been fixed years ago.)

    The spam problem is one big reason we convert a lot of clients to Spinn3r. They’d just rather NOT deal with spam if they can avoid it…

    Kevin

  2. In the words of someone famous,
    “Spam, spam, spam, spam, spam…oh, I hate spam!”

    Maybe all we need one day is a gigantic barbecue? 🙂

  3. More are definitely on Blogger, and a good chunk of the self-hosted ones that steal content from me are on Dreamhost servers. Dreamhost won’t do anything about it unless contacted by a court (I e-mailed them about it, and they basically told me to take a flying leap).

  4. @Sarah Sorry to hear that. If a genuine blog had posted an unwelcome excerpt of your content, however, I would hope that contacting the author would lead to a more pleasant resolution.

    To the actual post: For everything that enhances the ways we use the internet there is something nasty ready to adapt and abuse it. The guessed statistics are quite frankly scary to consider… I for one do not envy you, even if you are wearing some impressive armour out there on the front-line.

  5. You are falling into the same hole that many North American Internet folks fall into… 😉 Don’t forget the world has almost as many Chinese-language netizens, many millions of Indian bloggers, tons of Spanish bloggers… and the list goes on. And many of them use WordPress. I know you may be referring to English-language blogs, but there are probably as many other-language splogs (and blogs).

  6. I was astounded by those February wrap-up numbers. If there are only 725,000 active blogs, give or take, yet more than 2.5 million wordpress.com blogs, that’s a lot of chaff in the wheat. If the vast majority are spam or dead blogs, what good are the numbers, really? A hundred million times zero is still zero.

  7. One of the problems that I have found with splogs is not so much the amount of time and effort these ‘people’ are able to put into their work, but the ease with which they can do it. Many I have recently seen do not need to create individual splogs as such; they can use WordPress MU and similar tools and just keep creating new sub-domains whenever one gets knocked off the map.

  8. Spam is just entropy, it spins the bits around, but doesn’t accomplish anything useful. And, if you’re unlucky and can’t automatically filter it all out without discarding real stuff, you waste human time too.

  9. It’s easy to hate spammers. The people I hate are the morons who click on their links. Without those clicks the spam would go away.

  10. As long as there are blogs there will be splogs. I’ve accepted it as a fact of blogging that my content will be stripped and placed on a splog somewhere. I no longer worry too much about it since it’s inevitable.

    And Kevin is right. Something should have been done about Blogger years ago.

    Sarah, if Dreamhost is refusing to kill spam blogs, then make a post about it on WebHostingTalk.com. It is a major forum for web hosts that sees tons of traffic every day. I’ve seen the owners of all the major web hosts on that forum at one time or another.

  11. This carefree attitude about splogs is distressing coming from someone with clout in the WordPress community. Splogs are already a significant burden on search quality; I must hit them at least a third of the time when searching blogs. Instead of saying “this isn’t hurting me or mine”, I wish you would think about technical ways to limit their growth. None of the solutions look appealing, but this is a major problem. Do we need some sort of captcha validation mechanism on syndication? Or how about a cooperative database like Akismet?

  12. Fred, we spend significant resources on limiting the growth of splogs, and our product Akismet was one of the first to target web spam specifically, it has blocked billions of spams from getting on blogs.

  13. The one thing that really amazes me about spam is that most of it has no point: no real products in mind, no way to contact them to buy anything. It doesn’t promote anything at all. Just meaningless gibberish. Why send something like that out at all? Are there competitions out there that reward whoever sends the most spam messages or posts the most spam posts?

  14. No-one has mentioned the motivation driving sploggers.

    Ads.

    Doesn’t take a genius to connect the dots mentioned in these comments.

  15. @Pi, speaking of how spammers can set up so many sites so quickly, I ran across this article a while back giving step-by-step instructions on how to find WordPress MU sites to abuse.

    (I’ll put some spaces in the URL so as not to link to them.)
    http:// www. earnersblog .com/ wordpress-mu/

    Basically, they are using Google to search for text within wp-signup.php, which indicates a MU site. To make it even worse, they’re working on automating the process.

    The good news is we can cut down on these splog registrations by making sure Google doesn’t index those files.

  16. Personally I think splogs are ruining the internet. They take legitimate blog posts and reap the profits of all the good traffic those posts get, just by plagiarizing other writers. Spam is tasty fried on a sandwich, though that’s rather off topic.

  17. A salute to Matt for talking about the down side.

    After a month of daily-refreshed registration-spams from Poland, I disabled Registration (no loss..). In the next few days, I will hack registration.php to install a little do-something I/O so their script gets a pass-through.

    (Actually, I will probably first try to borrow the “Required Fields” code from comments.php. My spam-reg is not filling in names, etc.)

    I doubt this Poland-domain is about Ads. It looks like a setup. It could be bad.

    Though most comment-spam is certainly the Ad-game, a few are probably playing some other game.

    For example, TanTanNoodles’ Simple Spam Filter reads all the words in all my spams. I had to make it quit doing that (every time), since it is ridiculous. But note, a simple little script has no problem accessing all my old spams. Obviously, I’m keeping them, and equally obviously, their retention could be handy to someone who appears to be stuffing ‘senseless’, ‘pointless’ spams in my box. Again, it could be a set-up.

    Most spam is no more than what it looks like, but the sheer volume provides plenty of cover for more worrisome operations.

  18. I have a big problem with splogs scraping my content and republishing it. Thanks to Akismet I find trackbacks from several of them a week. With comment spam, I can easily delete it (or in most cases Akismet gets it before I even see it), but there’s not much we can do about splogs scraping & stealing our content.

    I did report one splog hosted at wordpress.com and it was taken down the same day.

  19. If the numbers you guesstimate are believable, it’s weird to think that our two-parent, two-kid household (with its 10 active non-splogs — I have seven(?!), the rest have one each) makes up one millionth of the entire active non-splogosphere by ourselves!

  20. We are still reviewing every single blog that joins our community and have to say we turn away around a third of all blogs joining as they are spam or porn. This has remained constant for the past 6 months. I’d be happy to forward to someone any WP splogs when we reject them if you guys would like that 🙂
