New WP.org Search

At the last IRC meetup the WordPress community asked for better search that included both the forums and the Codex and was integrated with the look and feel of the rest of the site. When I did this before it was horribly slow: it involved several queries across several different programs and MySQL hosts to get the results from the wiki, the forums, and the blog, and then splice them together somehow. Later we switched to a plain Google site-search, but they didn’t like the HTML we used for the search form, so we took it down. Well, after the meeting I remembered the Yahoo Developer Network, which had some sort of API for their search with a much higher limit than Google’s.

I went to the site to see how much of a pain it would be so I could start properly procrastinating, but I was taken aback by how incredibly easy it was to get an application ID and start getting the results back as simple XML. I began hacking on it right then. It was about 5 minutes to set up a search form with URIs the way I wanted, 7 minutes to get the XML and parse it out, 5 minutes to write in some paging, and then about 20 minutes tweaking the search page to make it look a little better. The result is the new search.wordpress.org WordPress Search.
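For a rough idea of what those steps involve (clean search URIs, parsing the XML, and paging), here is a minimal Python sketch. Python is used only for illustration. The XML shape, the element names `Result`, `Title`, and `Url`, the `totalResultsAvailable` attribute, the sample URLs, and all helper names are assumptions, not Yahoo's documented response format or the code actually running on search.wordpress.org.

```python
import math
import xml.etree.ElementTree as ET
from urllib.parse import unquote_plus

# Hypothetical response body; the real service's element names may differ.
SAMPLE_XML = """<ResultSet totalResultsAvailable="23">
  <Result><Title>Installing WordPress</Title><Url>http://codex.wordpress.org/Installing_WordPress</Url></Result>
  <Result><Title>WordPress as a CMS</Title><Url>http://wordpress.org/support/topic/123</Url></Result>
</ResultSet>"""


def query_from_path(path):
    """Turn a clean URI path such as '/custom+fields' into a search term."""
    return unquote_plus(path.strip("/"))


def parse_results(xml_text):
    """Return (total, [(title, url), ...]) from the assumed XML shape."""
    root = ET.fromstring(xml_text)
    total = int(root.get("totalResultsAvailable", "0"))
    results = [
        (r.findtext("Title", ""), r.findtext("Url", ""))
        for r in root.findall("Result")
    ]
    return total, results


def paging(total, page, per_page=10):
    """Return (start, last_page): the 1-based offset of the first result
    on the requested page (clamped to a valid page) and the page count."""
    last_page = max(1, math.ceil(total / per_page))
    page = min(max(1, page), last_page)
    start = (page - 1) * per_page + 1
    return start, last_page
```

With a path-based URI, the search term comes straight from the request path, and the `start` value from `paging` would be passed along as the offset for the next request.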

It still needs some more work. There seems to be a dupe problem, which is actually a problem with our site, not Yahoo Search. I’d like to tweak the results to highlight newer topics more, or at least allow for a date-based weighting. Finally, I think it would be nice to include some WP-related blogs like Blogging Pro and Weblog Tools Collection in the results. Most importantly, we now have a clean URI structure and home for searches which is abstracted from any piece of software or particular service provider. Yahoo deserves major kudos for opening up their information in such a free way and making it so easy that it’s taken me longer to write this post than to start using their API.
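One possible way to handle the dupes at display time is to drop feed copies and collapse repeated addresses before rendering. The sketch below is in Python and assumes the duplicates are feed variants of pages whose path contains an `rss` segment, and that results arrive as (title, url) pairs. Both are assumptions, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def drop_feed_dupes(results, feed_segment="rss"):
    """Remove results whose path contains the feed segment,
    then de-duplicate the rest by URL (ignoring a trailing slash)."""
    seen = set()
    kept = []
    for title, url in results:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        if feed_segment in segments:
            continue  # assumed to be a feed copy of a page that is also indexed
        key = url.rstrip("/")
        if key in seen:
            continue  # same address already kept
        seen.add(key)
        kept.append((title, url))
    return kept
```

Filtering after parsing keeps the fix independent of the search provider; excluding the feed URLs from indexing in the first place would address the cause rather than the symptom.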

33 thoughts on “New WP.org Search”

  1. Yep, very nice. Wondered what was going on with all these search changes btw ( :p ) but by far this is the best and quickest solution I’ve used.

  2. Good idea alright. I agree that I’d like plug-in and theme websites included in the search. This is the best thing you can do to attempt to “centralize” all the themes, plug-ins and information on WordPress into a single location. The toughest part for me when I came over to WP was finding stuff. I knew the same question was answered 100 times, but where? Thanks Matt

  3. I also wrote my own PHP Wrapper for Yahoo Search but I still use Google on my site. I don’t like SOAP, but then again, I don’t like yahoo either. Eh, Islands in the stream.

    P.S. If anyone’s interested in my PHP Yahoo wrapper email me (dante at dantecubed dot com).

  4. Nice functionality, but what is it with HUGE buttons and textfields? Whether on WP (login screen) or on the new search, that’s an ugly trend. Not trashing, just looking for some answers.

  5. I kind of like the larger form widgets. It’s a little bit daring– going against the inman-esque “how small can it be before they give up on reading it”-trend.

    Gorgeous job on the appearance and the URLs. Looks terrific.

  6. I have to admit I’m not a big fan of search URLs that pretend to be normal URLs. I always tend to go for blah.com/search/?q=term – it’s nice and short but still lets you know that you’re on a search page by glancing at the URL. ?q= is what Google uses.

    I’d love to hear the argument for blah.com/term style search URLs.

  7. I agree with Simon; even url.com/search/term isn’t that good; search URLs shouldn’t “pretend to be normal URLs” (as Simon so eloquently put it).

  8. A while back I wanted to build something using Google’s API but their usage limits seemed severe. I was glad to see Yahoo default to a higher limit.

  9. Very nice search, Matt. Especially for 37 minutes of work! I’m glad YDN (Yahoo Dev Network) is working for you. Keep an eye on developer.yahoo.net, we’ve got more APIs coming soon.

  10. Matt, that’s really cool! I would be interested in how Yahoo makes money off other sites using their search, since the results are in that site’s customized search template. I mean, is Yahoo going to start throwing in sponsored links in the future? How long will this last, being able to sponge off of Yahoo or Google without ads? I did see that you added a link to Yahoo. Was that required in the user agreement, or was that something you just did because you liked Yahoo’s search API so much?

  11. Here’s a good PHP library for using yahoo search : http://sourceforge.net/projects/yahoolib
    It’s designed to only require the xml extension with the intention of running on restrictive shared hosting environments.

    Looks good Matt 🙂 I was fed up with trying to use Google for my upcoming search changes; 15 minutes later it’s up and running from my dev server using Yahoo. Kudos to Yahoo for realising that API usability is paramount to its uptake!

  12. well from an *outsiders* view and lacking the php lingo to fight my way out of a wet paper bag, all I can say is it’s a ton better than anything that has been there before, and has helped the usability out considerably….

  13. Regex all the way.

    Dante and Simon, I like that searches can become essentially permanent snapshots into a stream of content just like every blog’s front page is. A9.com which is my current search engine default, also does the same thing.

  14. Why should we have normal URLs and search URLs? In my mind, every URL should be a normal URL. Normal people don’t know or care about query strings, and normal URLs are easier for both normal and non-normal people to read and type. Being able to type “search.wordpress.org/whatever” more than makes up for any unfamiliarity caused by not being a search URL.

  15. Regex all the way.

    I tend to do this as well, but only for simple files (or where PHP5 is not available). The function I use is thus:

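    function parseTag($tag, $context, $tagContents = 'A-Za-z0-9 ', $whichTag = 1)
    {
        // Capture the text between <$tag> and </$tag>; $tagContents is the allowed character class.
        $matched = preg_match("|<$tag>([$tagContents]+)</$tag>|", $context, $matches);
        // Return null instead of reading a missing index when nothing matches.
        return ($matched && isset($matches[$whichTag])) ? $matches[$whichTag] : null;
    }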

    Not pretty, but serves its purpose well as a “poor man’s DOM Parser”.

  16. Definitely needed! One suggestion…

    Is there any way to clean up the headers from the search results so that, for a forum result, the first two lines or so of every result aren’t

    Support Forums. Search First. Register or login: Username: Password: WordPress as a CMS. Topic started 1 year ago. 16 posts so far. Latest reply from

    and similarly for other search sources?

    Otherwise looking very good…

  17. I think the artificial delineation between search URIs and regular URIs is going to fall by the wayside soon anyway. We’re already close to the age where the physical location of a file on your computer’s filesystem will be less important than using a tool like Spotlight to organize things. Why shouldn’t the web lead the way in this? Having a term like http://www.wordpress.org/search/some_search_term makes a lot of sense, especially when you want to create permalinks to search results.

    Adding a bunch of extraneous cruft to a URL doesn’t make things easier, it just tends to get in the way. A virtual path like /search/ makes some sense, but WordPress has always been an innovator in crafting human-readable URIs in blogging, and it only makes sense to continue to do so with search terms.

  18. Matt,

    This is Lucas Spath. I know it’s been a while since we talked, but Rachel Speight died yesterday. You can read more about it at http://www.chron.com if you search for Speight. I know that you’re good at keeping up with some old people, so if you could, would you let anyone and everyone you know from PVA know about it, and if they are in Houston could you please give them my phone number.

    281-798-7592. Call me yourself if you want. There is going to be visitation on Thursday and the service is probably going to be on Friday or on Saturday and I know that at the very least a few vocal majors are going to try to be there and sing at her funeral on her mom’s request. This is crazy and just…. wow.

    I just don’t have the words to say much of anything. Just trying to spread the word.

    Peace
    -Lucas

  19. Matt, the problem with the dupes seems to be that it returns RSS feeds as well. Perhaps the robots.txt should specify that /rss URLs are not indexed, or you can add a NOT /rss to your search so that the dupes are removed. Perhaps you could also remove /rss results at parse time. Good job though, I am finding things I didn’t know about.

  20. We’re already close to the age where the physical location of a file on your computer’s filesystem will be less important than using a tool like Spotlight to organize things.

    I agree 100%. My hard drive is an absolute disaster-area in terms of organization, and without tools like Picasa, I’d be completely sunk.

  21. well from an *outsiders* view and lacking the php lingo to fight my way out of a wet paper bag, all I can say is it’s a ton better than anything that has been there before, and has helped the usability out considerably.
