Bloglines is DOSing blog providers. Every other major crawler implements some sort of per-resolved-IP throttle, so why can’t Bloglines? Even if there were a way to opt out of their hundreds of simultaneous crawlers descending on your service, it seems to me the default behavior should be to not be harmful, and then work with large providers on a case-by-case basis to increase the concurrency of requests. We don’t have this problem with any other aggregator or crawler, hosted or non-hosted. Test: freedbacking
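For anyone wondering what a per-resolved-IP throttle looks like in practice, here is a minimal sketch. It is not how Bloglines or any other crawler actually does it; the class name `PerIPThrottle`, the default of two concurrent requests, and the injectable `resolve` function are all assumptions made for illustration. The idea is that the limit is keyed on the IP a hostname resolves to, so thousands of blogs on one hosted provider share one budget.

```python
import socket
import threading
from collections import defaultdict


class PerIPThrottle:
    """Caps in-flight fetches per resolved IP (hypothetical helper)."""

    def __init__(self, max_per_ip=2, resolve=None):
        # max_per_ip is an assumed default, not a known crawler setting.
        self.max_per_ip = max_per_ip
        # resolve maps hostname -> IP; defaults to a real DNS lookup.
        self.resolve = resolve or socket.gethostbyname
        self._lock = threading.Lock()
        self._in_flight = defaultdict(int)

    def try_acquire(self, hostname):
        """Return the resolved IP if a slot is free, else None."""
        ip = self.resolve(hostname)
        with self._lock:
            if self._in_flight[ip] >= self.max_per_ip:
                return None  # caller should requeue this URL for later
            self._in_flight[ip] += 1
            return ip

    def release(self, ip):
        """Free the slot once the fetch for this IP has finished."""
        with self._lock:
            if self._in_flight[ip] > 0:
                self._in_flight[ip] -= 1
```

A crawler worker would call `try_acquire(host)` before fetching, push the URL back on its queue when it gets `None`, and call `release(ip)` in a `finally` block after the request completes.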
And unfortunately Bloglines has now reached critical mass, where they’re the one company folks think of for online newsreading. Sigh. There are better alternatives out there (I use the NewsGator/FeedDemon combo) but unfortunately folks keep gravitating to Bloglines…
Perhaps they could instruct the bloglines bot to only visit blogs under hosted blog providers, like WordPress.com, when it receives a ping.
James, we already ping them through Ping-O-Matic and they’ve said this changes their crawling behavior, but I imagine edublogs is using Ping-O-Matic as well.
Matt, I hate to trouble you but care to share the IP address or block that these requests are coming out of? I see that the BlogLines website is at 65.214.39.* but the closest IP I can find on our routers right off is 65.214.44.* and that’s a UUNET dialup.
Sadly, many of the “free blog” providers are not good at patrolling the content of their domains, and are not quick to block obvious splog activity. With some spammers coming into the game and setting up hundreds of splogs and thousands of pages of junk, it is a serious problem. Indexes like Bloglines can actually be overwhelmed by aggressive spammers, which in turn makes their site (and commercial product) unusable.
It is all about satisfying the end users, and the end users don’t want spam.
I don’t see it as an issue of bloglines, but rather an issue of blog providers who just don’t seem to care enough to keep their space clean.
Alex, that’s 100% wrong for two reasons. One, WordPress.com is aggressively policed against splogs. Two, Bloglines only polls things that people are subscribed to. So unless people are subscribing to splogs, they wouldn’t have the problem you describe.
drmike, they come from 65.214.44.29.
I am sorry, but I can set up a splog in one minute and subscribe to it one minute later, not really a big deal. You keep forgetting that if spammers see a crack in the system, they will exploit it, even if it means setting up a method to subscribe or cross subscribe to as many splogs as they can put up. Whatever the requirement, they can make it happen. How else do you think some smart boy got 5 billion pages into Google so fast? Comment spam and splogs.
Well, am kinda glad I’m not alone… if you know what I mean. Maybe they’ll do something if we kick up a big enough fuss.
They are coming from 65.214.44.29 at me too.
Problem is I’ve got a server and the ability to shout at tech support whereas you present seminars on enterprise architecture :)
Am probably hosting 25k+ blogs now though (c. 10k active) – any thoughts for PM readers on how I can still allow Bloglines in but stop it from crashing the server are v. much appreciated.
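One possible answer, offered as a rough sketch rather than a tested fix for a WPMU install: rate-limit each client IP on the server side with a token bucket, so a single aggressive crawler gets slowed down instead of blocked outright. The names `TokenBucket` and `PerClientLimiter`, and the rate and burst numbers, are assumptions for illustration; in practice this kind of limit is often enforced in the web server or firewall rather than in application code.

```python
import time


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)  # start with a full bucket
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """Keeps one bucket per client IP (hypothetical helper)."""

    def __init__(self, rate=1.0, burst=5, clock=time.monotonic):
        # rate and burst are assumed values; tune them to what the server can take.
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets = {}

    def allow(self, client_ip):
        bucket = self.buckets.get(client_ip)
        if bucket is None:
            bucket = self.buckets[client_ip] = TokenBucket(
                self.rate, self.burst, self.clock
            )
        return bucket.allow()
```

When `allow("65.214.44.29")` returns `False`, the request handler could answer with a 503 and a `Retry-After` header instead of rendering the feed, which keeps the crawler welcome but stops it from piling up expensive requests.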
Alex, I have personally been monitoring all content on edublogs.org and associated sites since it was created… pretty much every post if possible. Yeh a few splogs get through (and then get squashed eventually) but with that and comment spam I couldn’t have been working harder to sort it… not 100% there yet but doing me best – there are some people with WPMU setups who don’t do this but please don’t lump us all into the same basket.
Matt, I’m pretty sure I still get Bloglines asking for a few feeds that I know have no subscribers, at Bloglines or anywhere else. Minor point, but they do continue to crawl feeds with no subscribers. Based on that and your comment above, mightn’t they crawl any feed they get a ping from, too?
I was always under the impression that Bloglines would cache feeds to blogs on their own system, but I guess that was just wishful thinking (or common sense). For instance, why would they want to increase their own bandwidth by sending out requests and then parsing the response for each and every user, when they could easily create a list of all the feeds subscribed to by their users and cache the individual posts as they come in?
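To make that idea concrete, here is a small sketch of collapsing every user's subscriptions into one list of unique feeds and fetching each feed once per polling pass. This says nothing about how Bloglines is actually built; `unique_feeds`, `poll_once`, and the `fetch` callback are hypothetical names used only for illustration.

```python
from collections import defaultdict


def unique_feeds(subscriptions):
    """Map each feed URL to the set of users subscribed to it.

    `subscriptions` is a dict of user -> iterable of feed URLs.
    """
    readers = defaultdict(set)
    for user, feeds in subscriptions.items():
        for url in feeds:
            readers[url].add(user)
    return dict(readers)


def poll_once(subscriptions, fetch):
    """Fetch every distinct feed exactly once and return a url -> content cache."""
    cache = {}
    for url in unique_feeds(subscriptions):
        cache[url] = fetch(url)  # one request per feed, however many readers
    return cache
```

With this shape, the number of requests a blog sees per pass depends on how many distinct feeds it publishes, not on how many readers it has.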
Based on this, I may have to pack up my OPML data and move on to another service (and stop recommending Bloglines). I was due for a cleanup of my feeds anyway.
Thanks, Matt.
Don’t know why I was thinking that was a dialup. It’s even labeled as such. Router #1’s log shows it as the 5th heaviest IP address making connections on my boxes, right behind 3 Google IPs and that idiot spammer that I booted years ago out of Midland, Texas on a RR address.
I know that Bloglines started responding to Pingomatic pings back in April. I wonder as well if they’re spidering with a ping even if they have no subscribers.
How widespread is this Bloglines pestilence? Is this affecting all blog hosts, whether WP(MU) or not?
I was just reading my Stats on my privately hosted blog, the 65.214.44.29 address has sucked up more than 18 Megs of bandwidth this month. That’s excessive – they are the highest bandwidth user I have. I have the bandwidth to spare, but I wonder if this will bog down the server?