Category Archives: Search

Search engines, discovery, and finding things on the web.

Twenty-Five

Today I am a quarter of a century old. To be honest, I never thought I would be this old; it was a number beyond what I could imagine or visualize. But the last few years have gone by in a blur and here I am, 25 years young and finally able to rent a car without paying an age penalty.

Following up from the open source resolutions, here’s what I’m going to aim for this year in no particular order:

  • Learn a language where WP has a big impact (probably Spanish).
  • Take more videos, post at least 2 a month.
  • Post 10,000 photos in 2009.
  • Post at least one book a month I’ve enjoyed.
  • Don’t try to do everything myself.
  • Redesign Ma.tt! (And get back up in the search engine rankings for “Matt” on Google.)
  • Post more personal stuff. (Like this.)
  • Spend more time working with and coaching other young entrepreneurs and startups.
  • Donate to 5 Open Source projects that touch my life daily.
  • Learn to make/prepare one food item a month.
  • Launch, launch, launch! (Real artists ship.)
  • Get people to capitalize WordPress correctly, and stop using the fake mis-proportioned W. 🙂 (Here are some correct ones.)
  • Print my favorite picture of another person every month and send it to that person in a picture frame.
  • Reinstate WordPress Wednesdays and make it easier to do an amazing photoblog with WP.

(Hat tip to Boris Mann, Benji, Niall Kennedy, John Roberts, Titanas, Network Geek, Avinash, Kirb, Julie, Mark Jaquith, and Kabatology for the resolutions.)

This is the seventh year I’ve blogged my birthday: 19, 20, 21, 22, 23, and 24. If you had asked me 7 years ago where I would be today I couldn’t have imagined all of the amazing things that have happened, the incredible people I’ve met, and the communities that I’ve become a part of. Thank you. Here’s to the next 25.

All birthday posts: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42.

Yahoo on WordPress

Stephen Steele (is that a real name?) just wrote in that the new Yahoo Mail updates blog is on WordPress. As far as I know this is the first official Yahoo blog on WP I’ve seen. What makes it really interesting is it’s the first time I’ve seen third-party software (like WordPress) on the yahoo.com domain. You’ll notice every time they’ve done blogs before it’s been on a different domain like yahoo.net or ysearchblog.com, I imagine because of the incredibly strict security requirements anything with access to Yahoo.com cookies must meet. This is very exciting news. 🙂

Google Analytics

Google Analytics is what I’ve been saying a search engine should have done years ago: provide hosted stats to any website that uses it. People love stats so this will get huge uptake, and probably provide Google really invaluable information. A year or so ago I hoped Technorati would do this for blogs, but they probably saw it as outside their core business. Interesting note: this is the first stat tracking javascript I’ve seen that validates as XHTML Strict. Good job, guys. (Most examples leave in the language attribute, which Strict doesn’t allow.)
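For the curious, here is a rough Python sketch of how you might spot the script-tag problem mentioned above. It only looks at two things, the `language` attribute (not permitted in XHTML Strict) and a missing `type` attribute, so it is nowhere near a real validator. The function and class names are made up for this example.

```python
from html.parser import HTMLParser


class ScriptAttrChecker(HTMLParser):
    """Collects complaints about <script> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "script":
            return
        names = {name for name, _ in attrs}
        if "language" in names:
            self.problems.append("script uses the deprecated language attribute")
        if "type" not in names:
            self.problems.append("script is missing the required type attribute")


def check_scripts(markup):
    """Return a list of problems found in the script tags of `markup`."""
    checker = ScriptAttrChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems
```

Calling `check_scripts('<script type="text/javascript" src="a.js"></script>')` returns an empty list, while the old-style `language="JavaScript"` form gets flagged.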

On Rita

Houston is the 4th largest city in the entire United States. The neighborhoods that flood the worst are the poor areas, but it doesn’t sound like that will matter given the magnitude of this hurricane heading to my home of 20 years. My parents have been on the road for 10 hours now and haven’t made it out of the city yet. Many other members of my family are staying, along with my Grandmother, who is too sick to travel. After Katrina there was a rush of metasearches and directories for finding people; NOW would be a good time for Amazon, Yahoo, Google, and the other giants to pool their resources and get the infrastructure in place to help before it hits. This one is hitting much closer to home for me, and it’s hard to think about.

The Word is WYSI

It’s official now, the excellent TinyMCE has been integrated into WordPress 1.6. The real test after the search was if I could use it on a day to day basis, even though I normally can’t stand WYSIWYG-type things. With this integration I’ve actually found it more enjoyable to use and the code it produces is top-notch. Of course if you don’t like it, there’s a new checkbox to disable it under Options > Writing.

New WP.org Search

At the last IRC meetup the WordPress community asked for better search that included both the forums and the Codex and was integrated with the look and feel of the rest of the site. When I did this before it was horribly slow and it involved several queries across several different programs and MySQL hosts to get the results from the wiki, the forums, the blog, and then splice them together somehow. Later we switched to a plain Google site-search but they didn’t like the HTML we used for the search form so we took it down. Well, after the meeting I remembered the Yahoo Developer Network, which had some sort of API for their search with a much higher limit than Google’s.

I went to the site to see how much of a pain it would be so I could start properly procrastinating, but I was taken aback by how incredibly easy it was to get an application ID and start getting the results back as simple XML. I began hacking on it right then. It was about 5 minutes to set up a search form with URIs the way I wanted, 7 minutes to get the XML and parse it out, 5 minutes to write in some paging, and then about 20 minutes tweaking the search page to make it look a little better. The result is the new search.wordpress.org WordPress Search.
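The parse-and-page steps described above might look something like this Python sketch. To be clear, this is not the code behind the real search page, and the XML shape (`ResultSet`, `Result`, `Title`, `Url`, `totalResultsAvailable`) is a made-up sample for illustration, not a claim about what the Yahoo API returns. There is no network call; it works on a canned string.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body, invented for this example.
SAMPLE_XML = """<ResultSet totalResultsAvailable="23">
  <Result><Title>Installing WordPress</Title><Url>http://example.org/install</Url></Result>
  <Result><Title>Permalinks</Title><Url>http://example.org/permalinks</Url></Result>
</ResultSet>"""


def parse_results(xml_text):
    """Return (total_available, list of {'title', 'url'} dicts)."""
    root = ET.fromstring(xml_text)
    total = int(root.get("totalResultsAvailable", "0"))
    results = [
        {"title": r.findtext("Title", default=""), "url": r.findtext("Url", default="")}
        for r in root.findall("Result")
    ]
    return total, results


def page_window(total, page, per_page=10):
    """Clamp `page` into range and work out the paging links to show."""
    last_page = max(1, -(-total // per_page))  # ceiling division
    page = min(max(1, page), last_page)
    start = (page - 1) * per_page + 1  # 1-based index of the first result
    return {
        "page": page,
        "last_page": last_page,
        "start": start,
        "has_prev": page > 1,
        "has_next": page < last_page,
    }
```

The `start` value is what you would hand to whatever search service you are calling as the offset for the requested page.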

It still needs some more work. There seems to be a dupe problem, which is actually a problem with our site, not Yahoo Search. I’d like to tweak the results to highlight newer topics more, or at least allow for a date-based weighting. Finally I think it would be nice to include some WP-related blogs like Blogging Pro and Weblog Tools Collection in the results. Most importantly we now have a clean URI structure and home for searches which is abstracted from any piece of software or particular service provider. Yahoo deserves major kudos for opening up their information in such a free way and making it so easy that it’s taken me longer to write this post than start using their API.
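Here is one way the dupe filtering and date-based weighting could be sketched in Python. Everything in it is an assumption: the result dicts with `url` and `updated` keys, the URL normalization rules, and the 180-day half-life are all invented for illustration, and none of it describes what the site actually does.

```python
from datetime import date
from urllib.parse import urlsplit


def normalize_url(url):
    """Collapse trivial variants (www., trailing slash, case of host)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    return host + path  # query strings are ignored in this sketch


def dedupe(results):
    """Keep the first result for each normalized URL, preserving order."""
    seen = set()
    unique = []
    for r in results:
        key = normalize_url(r["url"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def date_weighted(results, today, half_life_days=180):
    """Re-rank results so that newer topics float up.

    Score = (1 / original rank) * 0.5 ** (age / half_life_days).
    """
    def score(item):
        rank, r = item
        age = max(0, (today - r["updated"]).days)
        freshness = 0.5 ** (age / half_life_days)
        relevance = 1.0 / (rank + 1)
        return relevance * freshness

    ranked = sorted(enumerate(results), key=score, reverse=True)
    return [r for _, r in ranked]
```

A shorter half-life pushes new topics up harder; a very long one leaves the original ordering mostly alone.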

Search Engine Markshowdown

I decided to run the web page analyzer (excellent tool) against the front pages of a few of the latest and greatest search engines and also do a little analysis of my own. Here are some of the results in one of the only tables you’ll ever see on this site:

            Feedster  Technorati  Google  Yahoo Search
HTML            6.11        3.72    1.18          7.82
Ext. CSS       11.47       11.63    0             1.45
Other           9.10        6.70   15.10          1.72
Total          26.70       22.05   16.27         11.00
Compressed     No          No      Yes           No

Numbers are kilobytes, and may not add up exactly due to rounding. CSS is external, linked files. “Other” includes images and javascript.

Yahoo was the surprise winner here. Their HTML was alright but I think it could be reduced quite a bit without losing anything. You’ll note they have the heaviest HTML of the bunch, heavier than other sites showing quite a bit more on their front page. They should probably talk to Doug. Overall though I think Yahoo has consistently been doing great nearly-standards-compliant work in their new designs. Yahoo could save about 67% of their HTML size with compression. Interestingly, Yahoo was the only site to specify ISO-8859-1 encoding; all the others claimed UTF-8.
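If you want to estimate a compression saving like that for your own pages, a minimal Python sketch follows. It assumes you already have the page markup as a string, and it uses gzip at the default level, which may not match what a given web server does. The function name is made up for this example.

```python
import gzip


def compression_savings(markup):
    """Return (raw_bytes, gzipped_bytes, percent_saved) for a page."""
    raw = markup.encode("utf-8")
    if not raw:
        return 0, 0, 0.0
    packed = gzip.compress(raw)
    saved = 1 - len(packed) / len(raw)
    return len(raw), len(packed), round(saved * 100, 1)
```

Very small pages can come out with little or even negative savings, because the gzip header and trailer add a fixed overhead.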

Google was optimized to the hilt, but it’s kind of silly that they put so much effort into their markup but couldn’t go the last inch and make it valid HTML 4. They could probably make it a bit smaller with some more intelligent CSS usage. At least they don’t have font tags anymore. I think under normal circumstances they would have won but they have an Olympic logo right now that’s pretty heavy. Google was the only site that used gzip compression for their HTML, but even uncompressed they only weighed in at about 2.4 kilobytes, still the lightest of the group.

Technorati clearly had the smartest markup of the group, and was the only one that validated. (An impressive feat for any website in this day and age.) Their markup is clean as a whistle with excellent structure and logic, and their numbers aren’t bad when you consider that they have a lot of stuff on their front page. This isn’t too surprising since Tantek did it. Their CSS, however, is pretty heavy. It’s strange because it’s very optimized in some ways but bloated in others; I think they could cut a few K from it pretty easily. One smart thing they did is have the CSS named with the date, so it’s name-versioned and they can update it monthly without caching issues. All that said, they’re so far ahead of everything else they don’t need to worry about much. Technorati could save about 53% of their XHTML size with compression.
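The date-in-the-filename trick is easy to copy. Here is a small Python sketch of the idea; the `name-YYYYMM.css` pattern and the `/css/` prefix are my own assumptions for illustration, not the naming Technorati actually uses.

```python
from datetime import date


def versioned_stylesheet(base_name, when):
    """Build a stylesheet filename that changes once a month."""
    return f"{base_name}-{when:%Y%m}.css"


def link_tag(base_name, when, prefix="/css/"):
    """Emit the <link> element pointing at the dated stylesheet."""
    href = prefix + versioned_stylesheet(base_name, when)
    return f'<link rel="stylesheet" type="text/css" href="{href}" />'
```

Because the URL changes whenever the stylesheet does, browsers and proxies can cache each version for as long as they like and still fetch the new one right away.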

Feedster has its heart in the right place, but the implementation falls far short. For example it has an XHTML 1.1 doctype but then has the needless XML declaration at the top throwing IE into quirks mode. They use CSS in places, but then they have a table with 75 non-breaking spaces in it for positioning. There’s a ton of needless markup, including a full kilobyte of HTML comments. On the bright side, they have the most room to improve. Feedster could save about 61% of their XHTML size with compression.
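A few of these checks are simple enough to script. The Python sketch below is a rough approximation, not the tool I used: it flags a leading XML declaration, counts non-breaking spaces, and totals the bytes spent on HTML comments. The function name and the report keys are invented for this example, and the regex will miscount in odd cases such as comment-like text inside scripts.

```python
import re


def audit_markup(markup):
    """Return a small report on three kinds of markup cruft."""
    comments = re.findall(r"<!--.*?-->", markup, flags=re.DOTALL)
    return {
        # An XML declaration ahead of the doctype is what trips quirks mode.
        "xml_declaration_first": markup.lstrip().startswith("<?xml"),
        # Count entity, numeric and literal forms of the non-breaking space.
        "nbsp_count": (
            markup.count("&nbsp;")
            + markup.count("&#160;")
            + markup.count("\u00a0")
        ),
        "comment_bytes": sum(len(c.encode("utf-8")) for c in comments),
    }
```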