Tag Archives: Web

News.com Leads Blog Communication

This is the coolest thing I’ve seen all year. Check out the HTML of this article I linked a few days ago. Notice anything at the top?

<link rel="pingback" href="http://tb.news.com/p2t.cgi/2100-1032-5368454" />

Houston, we have Pingback support! Let’s dig deeper:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
<rdf:Description
rdf:about="http://news.com.com/2100-1032-5368454.html"
dc:title="Microsoft flip-flop may signal blog clog"
dc:identifier="http://news.com.com/2100-1032-5368454.html"
trackback:ping="http://tb.news.com/tb.cgi/2100-1032-5368454" />
</rdf:RDF>

Ugly as sin, but that’s trackback. It gets better…

A little URI hacking takes us to this page, which lists all the trackbacks and pingbacks the article received. How cool is that?

It’s my understanding that even though they’ve had the trackback autodiscovery code for a while, they’ve been receiving mostly pingbacks, which makes sense given that Pingback is more fully and elegantly automatic. It would be cool if they added support for the nascent rel="trackback" discovery method and saved themselves the trouble of the RDF hack. Hopefully spammers won’t exploit their trackback server too soon, so they can keep supporting legacy systems that don’t implement Pingback yet.
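For the curious, here's a rough sketch of what a client does to find these endpoints, using only Python's standard library. It assumes the page advertises Pingback through a `<link rel="pingback">` element and Trackback through either a `rel="trackback"` link (the nascent method) or the `trackback:ping` attribute in the embedded RDF. The function and class names are illustrative, and the sketch ignores the `X-Pingback` HTTP header that the Pingback specification also allows.

```python
import re
from html.parser import HTMLParser

# Trackback's RDF block usually sits inside an HTML comment, so scan the raw text for it.
TRACKBACK_RDF = re.compile(r'trackback:ping="([^"]+)"')


class LinkFinder(HTMLParser):
    """Collect hrefs of <link> elements whose rel includes pingback or trackback."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here via handle_startendtag.
        if tag != "link":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href")
        for kind in ("pingback", "trackback"):
            if kind in rels and href:
                self.found.setdefault(kind, href)


def discover(page):
    """Return a dict with 'pingback' and/or 'trackback' endpoint URLs, if advertised."""
    finder = LinkFinder()
    finder.feed(page)
    finder.close()
    found = dict(finder.found)
    # Fall back to the RDF hack only when no rel="trackback" link was found.
    if "trackback" not in found:
        match = TRACKBACK_RDF.search(page)
        if match:
            found["trackback"] = match.group(1)
    return found
```

Run against the News.com markup above, this would return the p2t.cgi URL for Pingback and the tb.cgi URL for Trackback.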

The implications of this are fairly large. News.com is obviously bootstrapping code that will involve their readers in the blog conversation surrounding their articles. How long will it take other sites to catch up? Will they plug into Technorati or Pubsub next? As far as I know, this is the first major media organization to implement Trackback and Pingback. The team at News.com should be commended for their effort and leadership in this area.

Bloggers Declare Bore

Online Journalism Review writes Bloggers Declare War on Comment Spam, but Can They Win? I’m not sure what that has to do with journalism, but they talk to the same old people, read the same old sites, and (not surprisingly) come to the same old tired conclusions. I’m trying to figure out why it bothers me, because I like everyone the article refers to and the article itself is well-written, but it feels very contrived. I think it’s because it draws heavily, and selectively, on blog material a year or more old, as if the writer had an agenda and Googled until there were enough quotes to fill the space. For example, Mark Pilgrim’s blog is called “comment-free” when the entry that has been on his front page for the last three weeks clearly has comments. Is it too much to ask to look at the front page of a blog you’re quoting? The article talks about Blogger redirecting URIs but not about Blogger’s registration aspect. It talks about TypeKey but not the PATRIOT Act. (Totally kidding there.)

You probably saw this coming from me, but most of all I think it’s silly that they don’t mention a single one of the dozens of other blogging systems that deal effectively with these issues every day. You can’t discuss the Movable Type spam epidemic without talking about people like Molly, who tried everything out there, including MT-Blacklist, to no avail, then switched software and got on with their lives. There is a lot more to the story, but that’s been the conversation over the past year, and a lot has come of it. The essence of blogging is communication, and comments are here to stay; it’s just a matter of moderation.

RSS Bandwidth Usage

Robert Scoble looks at RSS bandwidth usage but unfortunately doesn’t give real numbers. There are a few important points to keep in mind when comparing HTML and RSS bandwidth usage; some were brought up in his comments, but I’ll review them here:

  1. RSS is not a transport mechanism, and these problems should be handled on the HTTP level. This is faster, better tested, works with caches and proxies, et cetera.
  2. I don’t care if an aggregator checks my feed every 5 minutes. If it supports HTTP properly (Last-Modified headers and conditional GETs), the load is negligible for me and for them: the bandwidth used each time is around 250 bytes.
  3. Speaking of HTTP, gzip encoding works just as well for RSS feeds.
  4. Bloated HTML in full content feeds will make for bloated feeds. We’ve upped our standards, up yours.
  5. Adjust your number of items to match your posting schedule. If you update two or three times a week, you don’t need 20 items in your news feed; try five. If you update a lot, like Robert or me, you run the opposite risk: people who don’t check your feed for a while might actually miss content. I talked to aggregator developers a few months ago about a way to address this; perhaps we need to look at it again.
  6. A visit to a homepage like this generates a lot of requests. First you get the HTML, then your browser requests the CSS files, then it gets all the images, probably about a dozen of them. RSS is generally a single request, and then images embedded in posts may be requested later if that entry is viewed.
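Points 1 through 3 boil down to a few lines of server logic. Here's a minimal Python sketch of a feed handler that answers a conditional GET with an empty 304 when nothing has changed and gzips the body when the client asks for it. The function name and the plain-dict request/response shapes are assumptions for illustration, not how WordPress or any particular server does it.

```python
import gzip
from email.utils import format_datetime, parsedate_to_datetime


def serve_feed(body, last_modified, request_headers):
    """Return (status, headers, payload) for a feed request.

    body: the feed as bytes; last_modified: an aware UTC datetime of the newest change.
    """
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            if parsedate_to_datetime(since) >= last_modified:
                return 304, headers, b""  # unchanged: headers only, no body
        except (TypeError, ValueError):
            pass  # unparseable or timezone-less date: send the full feed

    # Simplification: a bare substring check that ignores q-values like "gzip;q=0".
    if "gzip" in request_headers.get("Accept-Encoding", ""):
        headers["Content-Encoding"] = "gzip"
        body = gzip.compress(body)
    headers["Content-Length"] = str(len(body))
    return 200, headers, body
```

The 304 branch is what keeps an aggregator polling every five minutes down to a response of a few hundred bytes.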

That out of the way, how about some real numbers? I can give the best stats for photomatt.net because my RSS feed is on a separate subdomain. Here’s bandwidth usage for August:

photomatt.net – HTML, images, et cetera: 56.0 GB
xml.photomatt.net – RSS and other variants: 1.7 GB

The ratio for July was similar (77.6 GB to 2.0 GB); I guess I lost some readers in the summer slowdown.

My front page averages 17K of HTML and about 30K of images, so let’s say 50K. The images and CSS are all cached, but I don’t send the proper headers for the HTML because it would be a pain: I would have to check the time of the latest post, check the latest updated link, make sure there’s not a random photo, and basically go through a lot of trouble that isn’t really worth it. (When I was caching everything with Staticize, the page was stored in its entirety so I sent correct headers, but I don’t bother with that anymore because everything is so fast it doesn’t really make a difference.) All that said, I’m happy with the level of optimization; my HTML and CSS are as streamlined as I want them to be.
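To make that trouble concrete, here's a hypothetical helper (the name and arguments are illustrative, not WordPress code) that works out a front-page Last-Modified value from the inputs listed above. It assumes the page depends only on the latest post and the latest updated link, and that a random photo makes the page different on every request, in which case no honest validator can be sent at all.

```python
from email.utils import format_datetime


def front_page_last_modified(latest_post, latest_link_update, has_random_photo):
    """Return a Last-Modified header value for the front page, or None.

    latest_post, latest_link_update: aware UTC datetimes (hypothetical inputs).
    None means the page varies per request and must always be sent in full.
    """
    if has_random_photo:
        return None
    # The page last changed when the newest of its inputs last changed.
    newest = max(latest_post, latest_link_update).replace(microsecond=0)
    return format_datetime(newest, usegmt=True)
```

Every extra dynamic element on the page is another input to that `max`, or another reason to return `None`, which is why a self-contained feed is so much easier to get right.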

My feed, on the other hand, is completely self-contained. I can send headers with confidence because I know everything in that file and can say authoritatively when it was last changed. (Actually, WordPress handles it all automatically; I don’t worry about it.) Most of the aggregators that hit my site support this. I don’t think about it, they don’t think about it, and HTTP just works. My feed has 25 items because of my posting frequency, more than twice as many as most feeds. The feed is usually around 10K, and as I already mentioned, it’s only one request.

Here’s the kicker: my RSS feed is requested three times for every time my front page is loaded. So the HTML side, with a third of the requests, is using over 30 times the bandwidth. What was that about scalability again?
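Here's the arithmetic behind that, as a small Python sketch using only the monthly totals quoted above. The per-request figure is rough: it applies the 3:1 front-page ratio to all photomatt.net traffic, which also includes other pages and images, and the function name is illustrative.

```python
def bandwidth_ratios(html_gb, rss_gb, feed_requests_per_page_view=3):
    """Return (bandwidth ratio, rough per-request ratio) for HTML versus feed traffic."""
    ratio = html_gb / rss_gb
    # Feed requests outnumber page views, so per request the gap is wider still.
    return ratio, ratio * feed_requests_per_page_view


# Monthly totals quoted above, in GB: (photomatt.net, xml.photomatt.net).
MONTHS = {"July": (77.6, 2.0), "August": (56.0, 1.7)}

for month, (html_gb, rss_gb) in MONTHS.items():
    ratio, per_request = bandwidth_ratios(html_gb, rss_gb)
    print(f"{month}: {ratio:.1f}x the bandwidth, roughly {per_request:.0f}x per request")
```

That prints 38.8x for July and 32.9x for August, or very roughly 100 times the bandwidth per request once you account for the feed being hit three times as often.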

WordPress.org Search

I’ve ripped out the guts of the search on the WordPress.org support forums and redone it in the hope of making it something more people will use. Try it out! The new system searches the wiki (hosted on a different machine), thread titles, and recent posts, and does a FULLTEXT post search for the most relevant posts. It has contextual search highlighting (like Google).
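For anyone wondering how contextual highlighting works in general, here's a minimal Python sketch: pull an excerpt of the text around the first match and wrap each matched term in `<strong>`, escaping everything else. This isn't the WordPress.org code; it assumes plain case-insensitive term matching with no stemming, and the `width` parameter is an illustrative choice.

```python
import html
import re


def highlight_snippet(text, query, width=60):
    """Return an HTML-escaped excerpt of text around the first query match,
    with every matched term wrapped in <strong>."""
    terms = query.split()
    if not terms:
        return html.escape(text[:width])
    # Longest terms first so "permalinks" wins over "permalink" in the alternation.
    terms.sort(key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    first = pattern.search(text)
    start = 0 if first is None else max(0, first.start() - width // 2)
    excerpt = text[start:start + width]

    # Escape the plain text between matches separately so the markup stays intact.
    parts, pos = [], 0
    for m in pattern.finditer(excerpt):
        parts.append(html.escape(excerpt[pos:m.start()]))
        parts.append("<strong>" + html.escape(m.group(0)) + "</strong>")
        pos = m.end()
    parts.append(html.escape(excerpt[pos:]))

    prefix = "..." if start > 0 else ""
    suffix = "..." if start + width < len(text) else ""
    return prefix + "".join(parts) + suffix
```

For example, searching "Permalink" in "How do I change the permalink structure?" gives the sentence back with `<strong>permalink</strong>` in place, keeping the original casing.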

When I have some time to get back to this, every section will have a “more of this” link to take you to more results (paged). The wiki search already works this way: it counts the total results and links directly to the wiki search if there are more than five. There are probably still a few bugs to work out. The fulltext query was taking over two seconds to run until I tweaked the JOIN type to get the MySQL optimizer to use the proper index and join order. Everything should validate as XHTML.
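The "more of this" logic is simple enough to sketch in a few lines of Python. This assumes each section already has its top few results (say, from a query with LIMIT 5) plus a total count (say, from a COUNT query); the function name and the `full_search_url` parameter are hypothetical.

```python
from urllib.parse import urlencode

RESULTS_PER_SECTION = 5


def section(top_results, total_count, query, full_search_url):
    """Return (results to show, optional "more of this" link) for one result section.

    top_results: the best few matches already fetched
    total_count: how many matches exist in total
    full_search_url: hypothetical URL of the section's own paged search
    """
    shown = list(top_results)[:RESULTS_PER_SECTION]
    more = None
    if total_count > RESULTS_PER_SECTION:
        # Hand the same query off to the full search so the user lands on page one.
        more = full_search_url + "?" + urlencode({"q": query})
    return shown, more
```

Counting separately rather than fetching every row keeps each section cheap even when a common term matches thousands of posts.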

A new system is also in place to inject custom results at the top of the page. We’ve been logging searches for the last few months (over 129,000 so far, about 43,000 of them unique), and I’m going to be working closely with the documentation team to identify the most common searches and decide what tailored information would be best to present when someone searches for those terms, be it a blog post, an external resource, someplace on WordPress.org itself, a wiki page, or a specific thread. We can watch trends and spikes in searches to identify problems in the application itself, or features that may be insufficiently documented or hard to use.
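Here's a rough Python sketch of both halves of that: summarizing a search log into total, unique, and most common searches, and pinning a hand-picked result above the normal ones. The normalization rule (lowercase, collapse whitespace), the function names, and the `CURATED` table are all assumptions for illustration, not what's actually running on WordPress.org.

```python
from collections import Counter


def normalize(query):
    """Lowercase and collapse whitespace so trivially different searches count together."""
    return " ".join(query.lower().split())


def summarize(log):
    """Return (total searches, unique searches) for a list of logged query strings."""
    queries = [q for q in map(normalize, log) if q]
    return len(queries), len(set(queries))


def top_searches(log, n=10):
    """The n most common normalized searches with their counts."""
    return Counter(q for q in map(normalize, log) if q).most_common(n)


# Hypothetical hand-picked results, keyed by normalized query.
CURATED = {"permalinks": "Curated: permalinks help page (hypothetical)"}


def results_with_injection(query, organic_results):
    """Put a curated result, if one exists for this query, above the normal results."""
    pinned = CURATED.get(normalize(query))
    return ([pinned] if pinned else []) + list(organic_results)
```

Running `top_searches` over a week's log and comparing it with the previous week is one simple way to spot the trends and spikes mentioned above.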

The work is far from finished, but I think it’s a strong first step into fully integrating search as a support mechanism and bringing the WordPress team even closer to the pulse of the users.

Zoto

Zoto looks pretty neat. I signed up earlier and also set up an account for my Mom. But why isn’t WordPress a “supported blog”? There are only 136 users so far as I write this. Go sign up and try it out. They’ve done some interesting things with the interface, and they have an open source photo client available for Windows, OS X, and Linux. I couldn’t get the Linux client working on my Mom’s computer, though I had no trouble at home on Gentoo. I think her machine had an old version of Python.

I uploaded a few not-yet-on-the-photolog photos to my Zoto page to get the party started. (The fact that some of those are from Christmas and New Year’s means I really need to catch up.)

Staticize 2.5

Version 2.5 of the Staticize Reloaded plugin is now available for download. Installation instructions are included in the archive. What does Staticize Reloaded do? It is a highly advanced caching engine that dynamically and automatically caches the pages on your site that need to be cached, when they need to be cached. It also allows some parts of a page to be cached and others not, so for example your menu could always be dynamically included from a single file while your main blog content is cached. With Staticize Reloaded you don’t have to worry about rebuilding, stale caches, slow posting times, or any of that. It works silently, efficiently, and transparently for both the end user and the author.

This version adds the ability to have dynamic functions on a page in addition to dynamic includes. It also adds full support for ETag and Last-Modified headers, though you must turn it on in the plugin file. My one tip: temporarily deactivate the plugin when you redesign or tweak your template. Staticize Reloaded is well-suited for sites that run on older servers or receive more than twenty thousand visitors per day. WordPress is so fast anyway that I find caching isn’t worth it on lower-traffic sites.
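To illustrate the general idea of a cached page with dynamic holes plus an ETag, here's a minimal Python sketch. It is not Staticize Reloaded's code: the `<!--dynamic:name-->` marker syntax, the in-memory dict standing in for cache files, and the class name are all hypothetical.

```python
import hashlib

# Hypothetical marker format: cached pages keep placeholders for parts that stay dynamic.
DYNAMIC = "<!--dynamic:{}-->"


class PageCache:
    def __init__(self):
        self._pages = {}  # url -> cached HTML with placeholders (a dict instead of files)

    def store(self, url, page_html):
        self._pages[url] = page_html

    def invalidate(self, url):
        # e.g. called when a post shown on this page is published or edited
        self._pages.pop(url, None)

    def render(self, url, fragments, if_none_match=None):
        """Return (status, etag, body), or None if the page isn't cached.

        fragments maps placeholder names to callables that produce fresh HTML.
        """
        cached = self._pages.get(url)
        if cached is None:
            return None
        body = cached
        for name, produce in fragments.items():
            body = body.replace(DYNAMIC.format(name), produce())
        # Hash the final body, so the ETag changes whenever a dynamic part does.
        etag = '"' + hashlib.sha256(body.encode("utf-8")).hexdigest() + '"'
        if if_none_match == etag:
            return 304, etag, ""
        return 200, etag, body
```

Hashing after the dynamic parts are filled in is the important choice: hashing only the cached template would hand out a stale ETag whenever the menu changed.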

Update: The zip archive had a slightly older version of the plugin than the final 2.5 release. Please re-download to get the latest and greatest and fastest.

Mac IE 5 Support Worth It?

In Joining the Dark Side -OR- Is Mac IE 5 Support Worth $1,500, Scott responds to Tantek calling out the new Feedster for its lack of Mac IE support. Personally, I’m sympathetic to Feedster’s case: I’ve had to spend hours talking to someone with a Mac, trying to debug Mac IE issues with this site and wordpress.org, and I ended up having to change my favorite list menu technique from floats to display: inline, which meant changing all the other menu styles to compensate. It was a pain.

I know that when I’m tweaking and checking things in different browsers, the share of my audience that uses a given browser isn’t always the most important thing. In the case above, the only Mac IE user I had heard from since both sites started was Tantek, and that was important enough to spend a couple hours of my time on. Imagine you’re doing a job and the client’s boss uses Netscape 4. (God help you.) Suddenly that browser becomes much more important in your testing, and you should triple your rate.

However, is this something the Web Standards Project should be concerned about in the same way we have been about All Music or Odeon? I don’t speak for anyone but myself, but in my opinion it’s not the same at all. Feedster’s pages are a few trivial mistakes away from valid XHTML 1.1 and valid CSS, which is no easy task (MIME issues aside). Of course they should fix those mistakes, but that is a matter of a few minutes rather than 1 to 1.5 days. They aren’t writing to one browser or to proprietary technologies; they’re writing to modern standards and excluding browsers that have serious flaws in that area. Is that so different from the browser upgrade campaign?

From a user experience point of view, excluding Mac IE users might be a good idea as well. If Feedster let Mac IE users in, as Tantek has suggested, and they saw a messed-up layout (or no layout at all), their perception of the Feedster brand, its reliability, and its image would be negative. I bet Keith would have some great thoughts on this. If they’re given a message that the site doesn’t support Mac IE, they’ve (honestly) probably seen this before and will just switch to another browser for that site. In my experience, Mac users tend to be total browser flirts and have every browser you’ve ever heard of installed. I would rather they open my site in Safari or Firefox.
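For what it's worth, showing that message is the easy part. Here's a hypothetical Python sketch of the check; it assumes Mac IE identifies itself with an "MSIE" token plus a "Mac_" platform token (as in "Mozilla/4.0 (compatible; MSIE 5.23; Mac_PowerPC)"), and the function names and notice text are made up for illustration. User-agent sniffing is heuristic and easy to fool.

```python
UNSUPPORTED_NOTICE = (
    "<p>Sorry, this site doesn't support Internet Explorer for Mac. "
    "Please try Safari or Firefox.</p>"
)


def is_mac_ie(user_agent):
    """Heuristic check for Internet Explorer on the Mac.

    Assumes an "MSIE" token plus a "Mac_" platform token such as "Mac_PowerPC".
    """
    ua = user_agent or ""
    return "MSIE" in ua and "Mac_" in ua


def page_for(user_agent, render_page):
    """Serve the notice to Mac IE and the normal page (from render_page) to everyone else."""
    if is_mac_ie(user_agent):
        return UNSUPPORTED_NOTICE
    return render_page()
```

Safari and Firefox on the Mac report "Macintosh" rather than "Mac_", so they get the real page.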

If Tantek were here, I imagine he would counter that those browser options are really only valid for users on OS X, and that this ignores hundreds of older iMacs in libraries and elsewhere. Of course, the question a site owner then needs to ask, in terms of costs and benefits, is whether that half-a-percent audience on older library computers overlaps with the audience the site is targeting. If I were running an ecommerce site selling something like BMW accessories, I wouldn’t even give it a second thought. This isn’t about the many innovations that Mac IE introduced or its excellent standards support for its time; the issue is where Mac IE stands today.

On the bright side, Feedster has characterized this as a business cost/benefit decision and said that if anyone sends them Mac IE CSS they’ll use it, which seems like a good concession. Of course I think Feedster should support Mac IE, and a day and a half to add support seems a little high, but if they choose not to, I can understand.