Monthly Archives: September 2004

Random Photo Returns

The random photos are back. I wrote a quick hack to loop over every photo in every album and read its relevant info into a MySQL table. Now instead of taking a few seconds to get a random photo using the Gallery data stores, it uses a single query and takes a millisecond. Long-time visitors will remember that the random image in the corner has always been one of my favorite things about this site, but as the photolog grew to a thousand, two thousand, and then nine thousand images, it slowed down more and more. I had to start caching it so it would change once every 15 seconds, then every minute, then every 5 minutes, and then I just manually rotated it for a while. Finally I put the random image out of its misery.
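For the curious, here's a rough sketch of the idea in Python, with SQLite standing in for MySQL and made-up column names, since the point is just the single-query lookup:

    # Sketch only: SQLite stands in for MySQL, and the schema here is hypothetical.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, album TEXT, path TEXT)")
    db.executemany(
        "INSERT INTO photos (album, path) VALUES (?, ?)",
        [("beach", "beach/001.jpg"), ("pets", "pets/cat.jpg"), ("misc", "misc/sunset.jpg")],
    )

    # One table, one query -- no walking the Gallery data stores on every page view.
    path = db.execute("SELECT path FROM photos ORDER BY RANDOM() LIMIT 1").fetchone()[0]
    print(path)

(MySQL spells it RAND() rather than RANDOM(), but the shape of the query is the same.)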

Now it’s fully dynamic: every page you view is completely unique, just like you.

(There goes my bandwidth.)

RSS Bandwidth Usage

Robert Scoble looks at RSS bandwidth usage but unfortunately doesn’t give real numbers. There are a few important points to keep in mind when comparing HTML and RSS bandwidth usage, some of which were brought up in his comments, but I’ll review them here:

  1. RSS is not a transport mechanism, and these problems should be handled on the HTTP level. This is faster, better tested, works with caches and proxies, et cetera.
  2. I don’t care if an aggregator checks my feed every 5 minutes; if they support HTTP properly (last-modified headers), the load is negligible for me and for them. The bandwidth used for each check is around 250 bytes.
  3. Speaking of HTTP, gzip encoding works just as well for RSS feeds. (There’s a client-side sketch of both of these points after this list.)
  4. Bloated HTML in full content feeds will make for bloated feeds. We’ve upped our standards, up yours.
  5. Adjust your number of items to match your posting schedule. If you update two or three times a week, you don’t need 20 items in your news feed; try five. If you update a lot, like Robert or me, you run the opposite risk: people who don’t check your feed for a while might actually miss content. I talked to aggregator developers a few months ago about a way to address this; perhaps we need to look at it again.
  6. A visit to a homepage like this generates a lot of requests. First you get the HTML, then your browser requests the CSS files, then it gets all the images, probably about a dozen of them. RSS is generally a single request, and then images embedded in posts may be requested later if that entry is viewed.
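To make points 2 and 3 concrete, here's roughly what a well-behaved aggregator's fetch looks like, sketched with Python's standard library; the feed URL is just a placeholder, not any particular aggregator's code:

    # Conditional, gzip-aware feed fetch -- a sketch, not a real aggregator.
    import gzip
    import urllib.error
    import urllib.request

    FEED_URL = "https://example.com/feed.xml"  # placeholder

    def fetch(last_modified=None):
        headers = {"Accept-Encoding": "gzip"}
        if last_modified:
            # Tell the server what we already have; an unchanged feed comes back as a 304.
            headers["If-Modified-Since"] = last_modified
        req = urllib.request.Request(FEED_URL, headers=headers)
        try:
            with urllib.request.urlopen(req) as resp:
                body = resp.read()
                if resp.headers.get("Content-Encoding") == "gzip":
                    body = gzip.decompress(body)
                return resp.headers.get("Last-Modified"), body
        except urllib.error.HTTPError as e:
            if e.code == 304:
                # Nothing new: the whole exchange is a few hundred bytes of headers.
                return last_modified, None
            raise

    # Usage (against a real feed URL):
    # stamp, feed = fetch()       # first poll: full (compressed) feed
    # stamp, feed = fetch(stamp)  # later polls: usually just a 304, no body

Every poll after the first costs a couple hundred bytes each way, which is why point 2 doesn’t bother me.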

That out of the way, how about some real numbers? I can give the best stats for photomatt.net because my RSS feed is on a separate subdomain. Here’s bandwidth usage for August:

photomatt.net (HTML, images, et cetera): 56.0 GB
xml.photomatt.net (RSS and other variants): 1.7 GB

The ratio for July was similar, 77.6 GB to 2.0 GB; I guess I lost some readers in the summer slowdown.

My front page is an average of 17K of HTML and about 30K of images, so let’s say 50K. The images and CSS are all cached, but I don’t output the proper headers on the HTML because it would be a pain: I would have to check the time of the latest post, check the latest updated link, make sure there’s not a random photo, and basically go through a lot of trouble that isn’t really worth it. (When I was caching everything with Staticize the page was stored in its entirety, so I sent correct headers, but I don’t bother with that anymore because everything is so fast it doesn’t really make a difference.) All that said, I’m happy with the level of optimization; my HTML and CSS are as streamlined as I want them to be.

My feed, on the other hand, is completely self-contained. I can send headers with confidence because I know everything in that file and can say authoritatively when it was last changed. (Actually, WordPress handles it all automatically; I don’t worry about it.) Most of the aggregators that hit my site support this. I don’t think about it and they don’t think about it; HTTP just works. My feed has 25 items because of my posting frequency, more than twice the number most feeds carry. The feed is usually around 10K, and as I already mentioned, it’s only one request.
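WordPress takes care of this for me, but purely as an illustration of what “HTTP just works” means on the server side, here’s a toy handler (not the WordPress code) that advertises Last-Modified and answers conditional requests with a 304:

    # Toy server-side illustration of conditional GET; not the WordPress code.
    from email.utils import formatdate, parsedate_to_datetime
    from http.server import BaseHTTPRequestHandler, HTTPServer

    FEED_BODY = b"<rss version='2.0'><channel><title>demo</title></channel></rss>"
    FEED_MTIME = 1094000000.0  # hypothetical timestamp of the last post

    class FeedHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            since = self.headers.get("If-Modified-Since")
            if since:
                try:
                    if parsedate_to_datetime(since).timestamp() >= FEED_MTIME:
                        self.send_response(304)  # nothing new: headers only, no body
                        self.end_headers()
                        return
                except (TypeError, ValueError):
                    pass  # unparseable date; just send the full feed
            self.send_response(200)
            self.send_header("Last-Modified", formatdate(FEED_MTIME, usegmt=True))
            self.send_header("Content-Type", "application/rss+xml")
            self.send_header("Content-Length", str(len(FEED_BODY)))
            self.end_headers()
            self.wfile.write(FEED_BODY)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), FeedHandler).serve_forever()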

Here’s the kicker: my RSS feed is requested 3 times for every time my front page is loaded. So an HTML page with a third of the traffic is using over 30 times the bandwidth. What was that about scalability again?
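Spelling out the arithmetic from the August numbers above:

    # Back-of-the-envelope check of the August numbers.
    html_gb, rss_gb = 56.0, 1.7
    print(f"HTML is {html_gb / rss_gb:.1f}x the RSS bandwidth")  # about 33x, i.e. "over 30 times"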

Firefox Worm

Adam Kalsey doesn’t recommend Firefox because it doesn’t address the needs of users who don’t understand what a “browser” is, and he jabs at the Firefox site. I’ve helped people like this, and it’s a humbling experience. The IE info page is much worse, especially if you click on any of the links, but people don’t worry about it because IE is always there. Which prompts the obvious answer: a worm that transparently replaces IE with Firefox.