Robert Scoble looks at RSS bandwidth usage but unfortunately doesn’t give real numbers. There are a few important points to keep in mind when comparing HTML and RSS bandwidth usage. Some were brought up in his comments, but I’ll review them here:
- RSS is not a transport mechanism, and these problems should be handled on the HTTP level. This is faster, better tested, works with caches and proxies, et cetera.
- I don’t care if an aggregator checks my feed every 5 minutes. If it supports HTTP properly (Last-Modified headers), the load is negligible for me and for them. The bandwidth used each time is around 250 bytes.
- Speaking of HTTP, gzip encoding works just as well for RSS feeds.
- Bloated HTML in full content feeds will make for bloated feeds. We’ve upped our standards, up yours.
- Adjust your number of items to match your posting schedule. If you update two or three times a week, you don’t need 20 items in your news feed; try five. If you update a lot, like Robert or me, you run the opposite risk: people who don’t check your feed for a while might actually miss content. I talked to aggregator developers a few months ago about a way to address this. Perhaps we need to look at it again.
- A visit to a homepage like this generates a lot of requests. First you get the HTML, then your browser requests the CSS files, then it gets all the images, probably about a dozen of them. RSS is generally a single request, and then images embedded in posts may be requested later if that entry is viewed.
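To make the HTTP points above concrete, here is a minimal sketch of how a well-behaved aggregator might poll a feed. It is not taken from any particular aggregator: the function names `conditional_headers` and `fetch_feed` are made up for illustration, and it only uses Python's standard library. It sends `If-Modified-Since` with the date from the previous response and asks for gzip encoding.

```python
import gzip
import urllib.error
import urllib.request


def conditional_headers(last_modified=None):
    """Build request headers for a polite feed poll (hypothetical helper)."""
    headers = {"Accept-Encoding": "gzip"}
    if last_modified:
        # Echo back the Last-Modified value saved from the previous response.
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, last_modified=None):
    """Return (body, last_modified); body is None when the feed is unchanged."""
    request = urllib.request.Request(url, headers=conditional_headers(last_modified))
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read()
            if response.headers.get("Content-Encoding") == "gzip":
                body = gzip.decompress(body)
            return body, response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not modified: only headers crossed the wire, no feed body.
            return None, last_modified
        raise
```

At roughly 250 bytes per unchanged check, polling every 5 minutes works out to 288 × 250 = 72,000 bytes, or about 70 KB per subscriber per day.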
That out of the way, how about some real numbers? I can give the best stats for photomatt.net because my RSS feed is on a separate subdomain. Here’s bandwidth usage for August:
- photomatt.net (HTML, images, et cetera): 56.0 GB
- xml.photomatt.net (RSS and other variants): 1.7 GB
The ratio for July was similar, 77.6 GB to 2.0 GB. I guess I lost some readers in the summer slowdown.
My front page is an average of 17K of HTML and about 30K of images, so let’s say 50K. The images and CSS are all cached, but I don’t output the proper headers on the HTML because it would be a pain. I would have to check the time of the latest post, check the latest updated link, make sure there’s not a random photo, and basically go through a lot of trouble that isn’t really worth it. (When I was caching everything with Staticize the page was stored in its entirety so I sent correct headers, but I don’t bother with that anymore because everything is so fast it doesn’t really make a difference.) All that said, I’m happy with the level of optimization; my HTML and CSS are as streamlined as I want them to be.
My feed, on the other hand, is completely self-contained. I can send headers with confidence because I know everything in that file and can say authoritatively when it was last changed. (Actually WordPress handles it all automatically, so I don’t worry about it.) Most of the aggregators that hit my site support this. I don’t think about it and they don’t think about it; HTTP just works. My feed has 25 items because of my posting frequency, more than twice the number of items most feeds have. The feed is usually around 10K, and as I already mentioned it’s only one request.
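For the curious, the server side of that exchange can be sketched in a few lines. This is not how WordPress implements it; it is a rough Python illustration with a made-up function name, `feed_response`, and it assumes the only thing that matters is the time the feed last changed.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def feed_response(last_changed, if_modified_since=None):
    """Return (status, headers) for a feed last changed at an aware UTC datetime."""
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    last_changed = last_changed.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_changed, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # Unparseable header: fall through to a full response.
        if since is not None and since.tzinfo is not None and last_changed <= since:
            # The client's copy is current: send headers only, no body.
            return 304, headers
    return 200, headers
```

The first request from an aggregator gets a 200 with the full feed. Every later request that echoes the `Last-Modified` value back gets a 304 until something new is posted.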
Here’s the kicker: my RSS feed is requested 3 times for every time my front page is loaded. So an HTML page with 1/3 the requests is using over 30 times the bandwidth. What was that about scalability again?
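If you want to check the arithmetic, here is a quick sketch using only the figures quoted above. The function names are just for illustration, and the per-request number is rough: the 56.0 GB covers the whole site rather than only the front page, so treat it as an order-of-magnitude estimate.

```python
# Monthly bandwidth in GB, as quoted in the post.
usage = {
    "August": {"html": 56.0, "feed": 1.7},
    "July": {"html": 77.6, "feed": 2.0},
}


def bandwidth_ratio(html_gb, feed_gb):
    """How many times more bandwidth the HTML site used than the feeds."""
    return html_gb / feed_gb


def per_request_ratio(html_gb, feed_gb, feed_requests_per_page_load=3):
    """Bandwidth per front-page load relative to bandwidth per feed request."""
    return bandwidth_ratio(html_gb, feed_gb) * feed_requests_per_page_load


for month, gb in usage.items():
    print(f"{month}: {bandwidth_ratio(gb['html'], gb['feed']):.1f}x total")

august = usage["August"]
print(f"August, per request: about {per_request_ratio(august['html'], august['feed']):.0f}x")
```

That comes to roughly 33 times the bandwidth in August and 39 times in July. With three feed requests per front page load, each page load costs on the order of 100 times what a feed request does.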