Trip Begins

I’m hoping to blog this trip better than my last one, so here goes. I got through security without any trouble and began the trek to my gate, which was quite literally the furthest in the airport. I got here fine, but my shoulder hurt a bit from my carry-on, so I was glad to finally be able to sit down. This part of the airport looks like a mall, just with more places to sit. The food smells great. Anyway, as I sat down and opened the Powerbook to write this entry I noticed my pants were sitting a little low: I had left my belt at security. Half an hour later I’m now back where I started. At least I got a little exercise. 🙂 This is going to be my first long trip since I got the iPod and I’m looking forward to enjoying good music and not having to worry about battery life. I’ll see you guys again when I get to the Golden State.

Bloggers Declare Bore

Online Journalism Review writes “Bloggers Declare War on Comment Spam, but Can They Win?” I’m not sure what that has to do with journalism, but they talk to the same old people and read the same old sites and (not surprisingly) come to the same old tired conclusions. I’m trying to figure it out because I like everyone the article refers to and the article itself is well-written, but it feels very contrived. I think it may be because it draws a lot from blog material a year or more old, and selectively, like the writer had an agenda and Googled until there were enough quotes to fill the space. For example, Mark Pilgrim’s blog is called “comment-free” when the entry on the front page for the last three weeks clearly has comments. Is it too much to ask to look at the front page of a blog you’re quoting? The article talks about Blogger redirecting URIs but not about Blogger’s registration aspect. It talks about TypeKey but not the PATRIOT Act. (Totally kidding there.)

You probably saw this coming from me, but most of all I think it’s silly that they don’t mention a single one of the dozens of other blogging systems that deal effectively with these issues every day. You can’t discuss the Movable Type spam epidemic without talking about people like Molly, who tried everything out there, including MT-Blacklist, to no avail, then switched software and got on with their lives. There is a lot more to the story, but that’s been the conversation over the past year and a lot has come of it. The essence of blogging is communication and comments are here to stay; it’s just a matter of moderation.

Random Photo Returns

The random photos are back. I wrote a quick hack to loop over every photo in every album and read its relevant info into a MySQL table. Now instead of taking a few seconds to get a random photo using the Gallery data stores, it uses a single query and takes a millisecond. Long-time visitors to the site remember that the random image in the corner has always been one of my favorite things about this site, but as the photolog grew to a thousand, two thousand, and then nine thousand images it slowed down more and more. I had to start caching it so it would change once every 15 seconds, then every minute, then every 5 minutes, and then I just manually rotated it for a while. Finally I put the random image out of its misery.

Now it’s fully dynamic: every page you view is completely unique, just like you.

(There goes my bandwidth.)
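As a rough sketch of the approach described above (index every photo once, then pick a random one with a single query), here is a small Python version. It is not the original hack: it uses an in-memory SQLite database as a stand-in for MySQL, and the album data, table name, and column names are all hypothetical.

```python
import sqlite3

# Hypothetical album data standing in for the Gallery data stores.
ALBUMS = {
    "california": ["golden-gate.jpg", "airport.jpg"],
    "houston": ["skyline.jpg"],
}

def build_index(conn, albums):
    """Loop over every photo in every album and store one row per photo."""
    conn.execute("CREATE TABLE IF NOT EXISTS photos (album TEXT, filename TEXT)")
    conn.execute("DELETE FROM photos")  # rebuild the index from scratch
    rows = [(album, name) for album, names in albums.items() for name in names]
    conn.executemany("INSERT INTO photos (album, filename) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)

def random_photo(conn):
    """Fetch one random photo with a single query."""
    # SQLite spells this RANDOM(); the MySQL equivalent is ORDER BY RAND().
    return conn.execute(
        "SELECT album, filename FROM photos ORDER BY RANDOM() LIMIT 1"
    ).fetchone()

conn = sqlite3.connect(":memory:")
count = build_index(conn, ALBUMS)
album, filename = random_photo(conn)
print(f"{count} photos indexed; showing {album}/{filename}")
```

The slow part (walking the albums) happens once at index time, so each page view only pays for the one SELECT.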

RSS Bandwidth Usage

Robert Scoble looks at RSS bandwidth usage but unfortunately doesn’t give real numbers. There are a couple of important points to keep in mind when comparing HTML and RSS bandwidth usage. Some were brought up in his comments, but I’ll review them here:

  1. RSS is not a transport mechanism, and these problems should be handled on the HTTP level. This is faster, better tested, works with caches and proxies, et cetera.
  2. I don’t care if an aggregator checks my feed every 5 minutes. If it supports HTTP properly (Last-Modified headers), the load is negligible for me and for them. The bandwidth used each time is around 250 bytes.
  3. Speaking of HTTP, gzip encoding works just as well for RSS feeds.
  4. Bloated HTML in full content feeds will make for bloated feeds. We’ve upped our standards, up yours.
  5. Adjust your number of items to match your posting schedule. If you update two or three times a week, you don’t need 20 items in your news feed; try five. If you update a lot, like Robert or me, you run the opposite risk: people who don’t check your feed for a while might actually miss content. I talked to aggregator developers a few months ago about a way to address this; perhaps we need to look at it again.
  6. A visit to a homepage like this generates a lot of requests. First you get the HTML, then your browser requests the CSS files, then it gets all the images, probably about a dozen of them. RSS is generally a single request, and then images embedded in posts may be requested later if that entry is viewed.
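Points 2 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular server or blogging tool does it: the `respond` function, its arguments, and the sample feed are hypothetical. It answers a repeat request with an empty 304 when the feed has not changed since the date the aggregator sends back, and gzips the body otherwise if the client accepts it.

```python
import gzip
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def respond(body, last_modified, request_headers):
    """Return (status, headers, payload) for a feed request.

    body: the feed as bytes; last_modified: timezone-aware UTC datetime;
    request_headers: dict of request headers with lower-cased names.
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    since = request_headers.get("if-modified-since")
    if since:
        try:
            if last_modified <= parsedate_to_datetime(since):
                return 304, headers, b""  # nothing new: headers only
        except (TypeError, ValueError):
            pass  # unparseable date: fall through and send the full feed

    if "gzip" in request_headers.get("accept-encoding", ""):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return 200, headers, body

# Hypothetical feed and timestamp, just to exercise the function.
feed = b"<rss>" + b"<item>hello world</item>" * 200 + b"</rss>"
changed = datetime(2004, 9, 1, 12, 0, tzinfo=timezone.utc)

status, headers, payload = respond(feed, changed, {"accept-encoding": "gzip"})
print(status, len(feed), "->", len(payload), "bytes")

# A well-behaved aggregator sends the Last-Modified value back next time.
repeat = {"if-modified-since": headers["Last-Modified"]}
repeat_status, _, repeat_payload = respond(feed, changed, repeat)
print(repeat_status, len(repeat_payload), "bytes")
```

The second request costs only the status line and headers, which is why frequent polling is cheap when both sides speak HTTP properly.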

That out of the way, how about some real numbers? I can give the best stats for photomatt.net because my RSS feed is on a separate subdomain. Here’s bandwidth usage for August:

photomatt.net – HTML and images, et cetera: 56.0 GB
xml.photomatt.net – RSS and other variants: 1.7 GB

The ratio for July was similar, 77.6/2.0 GB. I guess I lost some readers in the summer slowdown.

My front page is an average of 17K of HTML and about 30K of images, so let’s say 50K. The images and CSS are all cached, but I don’t output the proper headers on the HTML because it would be a pain. I would have to check the time of the latest post, check the latest updated link, make sure there’s not a random photo, and basically go through a lot of trouble that isn’t really worth it. (When I was caching everything with Staticize the page was stored in its entirety so I sent correct headers, but I don’t bother with that anymore because everything is so fast it doesn’t really make a difference.) All that said, I’m happy with the level of optimization; my HTML and CSS are as streamlined as I want them to be.

My feed, on the other hand, is completely self-contained. I can send headers with confidence because I know everything in that file and can say authoritatively when it was last changed. (Actually WordPress handles it all automatically, so I don’t worry about it.) Most of the aggregators that hit my site support this. I don’t think about it and they don’t think about it; HTTP just works. My feed has 25 items because of my posting frequency, more than twice the number of items most feeds have. The feed is usually around 10K, and as I already mentioned it’s only one request.
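To illustrate why a self-contained feed makes the header easy, here is a minimal sketch that derives a Last-Modified value from the items alone. It is an assumption about one reasonable way to do it, not a description of what WordPress actually does, and the item slugs and dates are made up.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical feed items as (slug, time last changed in UTC).
ITEMS = [
    ("post-a", datetime(2004, 9, 1, 9, 15, tzinfo=timezone.utc)),
    ("post-b", datetime(2004, 9, 2, 18, 40, tzinfo=timezone.utc)),
    ("post-c", datetime(2004, 8, 30, 7, 5, tzinfo=timezone.utc)),
]

def feed_last_modified(items):
    """The feed changed when its most recently changed item did."""
    newest = max(changed for _slug, changed in items)
    return format_datetime(newest, usegmt=True)

print(feed_last_modified(ITEMS))
```

A front page with a random photo and a links sidebar has no single timestamp like this, which is the trouble described earlier.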

Here’s the kicker: my RSS feed is requested 3 times for every time my front page is loaded. So an HTML page with 1/3 the traffic is using over 30 times the bandwidth. What was that about scalability again?
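For anyone who wants to check the arithmetic, the figures above work out as follows. The per-request comparison at the end is a rough inference rather than a measured number: it treats all of the 56.0 GB as if it came from front page loads, which overstates things since that total covers the whole site.

```python
# Monthly bandwidth figures quoted above, in GB.
august_html, august_feed = 56.0, 1.7
july_html, july_feed = 77.6, 2.0

# The feed is requested about 3 times per front page load.
feed_requests_per_page_load = 3

august_ratio = august_html / august_feed
july_ratio = july_html / july_feed

# Rough inference: if HTML uses ~33x the bandwidth on 1/3 the requests,
# an average page load costs about 3 * 33 times an average feed request.
per_request_ratio = august_ratio * feed_requests_per_page_load

print(f"August: HTML uses {august_ratio:.1f}x the feed's bandwidth")
print(f"July:   HTML uses {july_ratio:.1f}x the feed's bandwidth")
print(f"Per request (rough): {per_request_ratio:.1f}x")
```

Both months land above the "over 30 times" mark, and the per-request gap is larger still because so many feed requests are answered with a tiny not-modified response instead of the full 10K.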