Open Source Search WordPress

New Search

At the last IRC meetup, the WordPress community asked for a better search that covered both the forums and the Codex and was integrated with the look and feel of the rest of the site. When I built this before, it was horribly slow: it took several queries across several different programs and MySQL hosts to get results from the wiki, the forums, and the blog, and then splice them together somehow. Later we switched to a plain Google site search, but Google didn't like the HTML we used for the search form, so we took it down. Well, after the meeting I remembered the Yahoo Developer Network, which had some sort of API for its search with a much higher limit than Google's.

I went to the site to see how much of a pain it would be, so I could start properly procrastinating, but I was taken aback by how incredibly easy it was to get an application ID and start getting results back as simple XML. I began hacking on it right then. It took about 5 minutes to set up a search form with URIs the way I wanted, 7 minutes to fetch the XML and parse it, 5 minutes to add some paging, and then about 20 minutes tweaking the search page to make it look a little better. The result is the new WordPress Search.
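The URI and paging steps can be sketched in a few lines of PHP. This sketch assumes two things. The first is a hypothetical path layout of /search/<terms>/, with /page/<n>/ appended for later pages. The second is a search service that takes a 1-based position for the first result on each page. Neither detail comes from the actual WordPress.org code.

```php
<?php
// Minimal sketch. ASSUMPTIONS: the /search/<terms>/page/<n>/ layout is
// hypothetical, and the search service is assumed to take a 1-based
// position for the first result on each page.

// Build a clean, human-readable search URI from raw terms and a page number.
function searchUri($terms, $page = 1)
{
    $uri = '/search/' . rawurlencode(trim($terms)) . '/';
    // Page 1 keeps the short form; later pages get a /page/<n>/ suffix.
    return $page > 1 ? $uri . 'page/' . (int) $page . '/' : $uri;
}

// Convert a 1-based page number into the 1-based position of that
// page's first result (page 1 -> 1, page 2 -> $perPage + 1, ...).
function firstResultPosition($page, $perPage)
{
    return (max(1, (int) $page) - 1) * $perPage + 1;
}

echo searchUri('theme switcher', 2), "\n"; // /search/theme%20switcher/page/2/
echo firstResultPosition(3, 10), "\n";     // 21
```

Keeping page 1 at the bare /search/<terms>/ form means the most commonly shared link stays as short as possible.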

It still needs some more work. There seems to be a dupe problem, which is actually a problem with our site, not with Yahoo Search. I'd like to tweak the results to highlight newer topics more, or at least allow for a date-based weighting. Finally, I think it would be nice to include some WP-related blogs like Blogging Pro and Weblog Tools Collection in the results. Most importantly, we now have a clean URI structure and a home for searches that is abstracted from any piece of software or particular service provider. Yahoo deserves major kudos for opening up their information so freely and making it so easy that it's taken me longer to write this post than to start using their API.
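Here is one way a date-based weighting could work, sketched in PHP purely as an illustration. It blends each result's original rank with an exponential recency decay. Several details are assumptions rather than anything the search service returns or the site uses: the `modified` Unix-timestamp field, the equal 50/50 blend, and the 180-day half-life.

```php
<?php
// Illustrative re-ranking sketch. ASSUMPTIONS: $results is already in the
// search service's rank order, each item carries a hypothetical 'modified'
// Unix timestamp, and a 50/50 blend with a 180-day half-life is an
// arbitrary starting point.
function reweightByDate(array $results, $now, $halfLifeDays = 180.0)
{
    $results = array_values($results); // ensure keys 0..n-1 match rank order
    $n = count($results);
    foreach ($results as $i => $r) {
        // 1.0 for the top hit, falling linearly toward 1/$n for the last.
        $rankScore = ($n - $i) / $n;
        // Age in days; future timestamps are clamped to "brand new".
        $ageDays = max(0, $now - $r['modified']) / 86400;
        // 1.0 when new, 0.5 after one half-life, 0.25 after two, ...
        $recency = pow(0.5, $ageDays / $halfLifeDays);
        $results[$i]['score'] = 0.5 * $rankScore + 0.5 * $recency;
    }
    // Highest blended score first.
    usort($results, function ($a, $b) {
        return $b['score'] <=> $a['score'];
    });
    return $results;
}
```

A shorter half-life favours newer topics more aggressively. A lower recency weight keeps the service's own relevance ranking dominant.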

33 replies on “New Search”

Good idea, alright. I agree that I'd like the plug-in and theme websites included in the search. This is the best thing you can do to try to “centralize” all the themes, plug-ins, and information on WordPress in a single location. The toughest part for me when I came over to WP was finding stuff. I knew the same question had been answered 100 times, but where? Thanks, Matt.

I also wrote my own PHP wrapper for Yahoo Search, but I still use Google on my site. I don't like SOAP, but then again, I don't like Yahoo either. Eh, Islands in the stream.

P.S. If anyone’s interested in my PHP Yahoo wrapper email me (dante at dantecubed dot com).

I kind of like the larger form widgets. It's a little bit daring, going against the Inman-esque “how small can it be before they give up on reading it” trend.

Gorgeous job on the appearance and the URLs. Looks terrific.

I have to admit I’m not a big fan of search URLs that pretend to be normal URLs. I always tend to go for – it’s nice and short but still lets you know that you’re on a search page by glancing at the URL. ?q= is what Google uses.

I’d love to hear the argument for style search URLs.

Very nice search, Matt. Especially for 37 minutes of work! I’m glad YDN (Yahoo Dev Network) is working for you. Keep an eye on, we’ve got more APIs coming soon.

Matt, that's really cool! I would be interested to know how Yahoo makes money off other sites using their search, since the results appear in each site's customized search template. Is Yahoo going to start throwing in sponsored links in the future? How long will this last, being able to sponge off Yahoo or Google without ads? I did see that you added a link to Yahoo. Was that required by the user agreement, or was that something you did just because you liked Yahoo's search API so much?

[…] After many fruitless attempts, Matt (Mullenweg) has just added a search engine to the official WordPress site. It is a system based on the Yahoo Developer Network. As the author himself acknowledges, much remains to be fine-tuned. Even so, it is already returning results from the forum and the Codex, as well as references to posts on the blogs Blogging Pro and Weblog Tools Collection, which Matt considers essential. […]

Here's a good PHP library for using Yahoo Search:
It’s designed to only require the xml extension with the intention of running on restrictive shared hosting environments.

Looks good, Matt 🙂 I was fed up with trying to use Google for my upcoming search changes; 15 minutes later it's up and running on my dev server using Yahoo. Kudos to Yahoo for realising that API usability is paramount to its uptake!

Well, from an *outsider's* view, and lacking the PHP lingo to fight my way out of a wet paper bag, all I can say is it's a ton better than anything that has been there before, and it has helped the usability out considerably….

Regex all the way.

Dante and Simon, I like that searches can become essentially permanent snapshots into a stream of content just like every blog’s front page is. which is my current search engine default, also does the same thing.

Why should we have normal URLs and search URLs? In my mind, every URL should be a normal URL. Normal people don’t know or care about query strings, and normal URLs are easier for both normal and non-normal people to read and type. Being able to type “” more than makes up for any unfamiliarity caused by not being a search URL.

[…] 1) The simplest way to implement site search – Hoorah for XML! 2) Regarding my previous post about indelicate remarks in coding comments, it turns out this generalizes to film: special effects artists need to be careful too sometimes… – link. […]

Regex all the way.

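I tend to do this as well, but only for simple files (or where PHP5 is not available). The function I use is this:

function parseTag($tag, $context, $tagContents = 'A-Za-z0-9 ', $whichTag = 1) {
    // Capture the text between <$tag> and </$tag>, allowing only the characters in $tagContents
    preg_match("|<$tag>([$tagContents]+)</$tag>|", $context, $matches);
    // Group 1 is the tag's contents (0 is the whole match); return null rather than an undefined index when nothing matched
    return isset($matches[$whichTag]) ? $matches[$whichTag] : null;
}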

Not pretty, but serves its purpose well as a “poor man’s DOM Parser”.
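Where PHP5 is available, the DOM extension can do the same lookup without the character-class restriction. A hedged sketch follows. The function name getTagText() is hypothetical. DOMDocument::loadXML(), getElementsByTagName(), and textContent are standard parts of PHP's DOM extension.

```php
<?php
// Sketch: a DOM-based counterpart to the regex parseTag() above.
// getTagText() is a hypothetical name; $index picks the Nth occurrence
// (0-based) of the tag, whereas the regex version only sees the first.
function getTagText($tag, $xml, $index = 0)
{
    $doc = new DOMDocument();
    // loadXML() returns false on malformed input; @ hides its warnings.
    // Empty input is rejected up front because newer PHP versions throw on it.
    if ($xml === '' || !@$doc->loadXML($xml)) {
        return null;
    }
    $node = $doc->getElementsByTagName($tag)->item($index);
    // textContent decodes entities such as &amp; and includes any child text.
    return $node === null ? null : $node->textContent;
}
```

Unlike parseTag() with its default 'A-Za-z0-9 ' class, this also returns contents that contain punctuation, such as "Hello, world & more".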

Definitely needed! One suggestion…

Is there any way to clean up the headers in the search results so that, for a forum result, the first two lines or so of every result aren't

Support Forums. Search First. Register or login: Username: Password: WordPress as a CMS. Topic started 1 year ago. 16 posts so far. Latest reply from

and similarly for other search sources?

Otherwise looking very good…

I think the artificial delineation between search URIs and regular URIs is going to fall by the wayside soon anyway. We’re already close to the age where the physical location of a file on your computer’s filesystem will be less important than using a tool like Spotlight to organize things. Why shouldn’t the web lead the way in this? Having a term like makes a lot of sense, especially when you want to create permalinks to search results.

Adding a bunch of extraneous cruft to a URL doesn’t make things easier, it just tends to get in the way. A virtual path like /search/ makes some sense, but WordPress has always been an innovator in crafting human-readable URIs in blogging, and it only makes sense to continue to do so with search terms.
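To make the readable-URI argument concrete, here is a small PHP sketch that turns such a path back into a query. The /search/<terms>/ prefix and the optional /page/<n>/ suffix are assumed for illustration and are not taken from the WordPress.org code.

```php
<?php
// Sketch: map a readable path such as /search/plugin%20api/page/2/ back to
// search terms and a page number. ASSUMPTION: the /search/ prefix and the
// optional /page/<n>/ suffix are an illustrative layout.
function parseSearchPath($path)
{
    if (!preg_match('#^/search/([^/]+)/(?:page/(\d+)/)?$#', $path, $m)) {
        return null; // not a search URI
    }
    return array(
        'terms' => rawurldecode($m[1]),
        // preg_match omits a trailing group that did not participate.
        'page'  => isset($m[2]) ? (int) $m[2] : 1,
    );
}
```

Because the terms are the path itself, the same URL works as a permalink to a search.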


This is Lucas Spath. I know it's been a while since we talked, but Rachel Speight died yesterday. You can read more about it at if you search for Speight. I know that you're good at keeping up with some old people, so if you could, would you let anyone you know from PVA know about it, and if they are in Houston, could you please give them my phone number?

281-798-7592. Call me yourself if you want. There is going to be a visitation on Thursday, and the service is probably going to be on Friday or Saturday. I know that at the very least a few vocal majors are going to try to be there and sing at her funeral at her mom's request. This is crazy and just…. wow.

I just don’t have the words to say much of anything. Just trying to spread the word.


Matt, the problem with the dupes seems to be that it returns RSS feeds as well. Perhaps robots.txt should specify that /rss URLs are not indexed, or you could add a NOT /rss to your search so that the dupes are removed. You could also remove /rss results at parse time. Good job though; I'm finding things I didn't know about.
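The parse-time option might look something like this PHP sketch. It makes two assumptions about the result format and the site's feed URLs. First, each result is an array with a 'url' key. Second, any URL with an rss path segment is a feed copy of an HTML page.

```php
<?php
// Sketch of parse-time de-duplication. ASSUMPTIONS: each result is an array
// with a 'url' key, and any URL whose path contains an "rss" segment
// (e.g. /support/rss/topic/42) is a feed copy of an HTML page.
function filterResults(array $results)
{
    $seen = array();
    $kept = array();
    foreach ($results as $r) {
        // parse_url() may return null or false; both cast to ''.
        $path = (string) parse_url($r['url'], PHP_URL_PATH);
        if (preg_match('#(^|/)rss(/|$)#', $path)) {
            continue; // skip feed URLs
        }
        if (isset($seen[$r['url']])) {
            continue; // skip exact duplicate URLs
        }
        $seen[$r['url']] = true;
        $kept[] = $r;
    }
    return $kept;
}
```

Matching a whole path segment rather than the substring "rss" keeps pages like /docs/rss-feeds in the results.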

We’re already close to the age where the physical location of a file on your computer’s filesystem will be less important than using a tool like Spotlight to organize things.

I agree 100%. My hard drive is an absolute disaster-area in terms of organization, and without tools like Picasa, I’d be completely sunk.

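WordPress Search is now handled through the Yahoo! Search API

The Yahoo! Search Web Services mentioned in the post “Yahoo! Search Web Services” are finally being put to use… (maybe lots of people already use them and I just didn't know :p)
In Matt's New Search, he mentions that at the IRC meetup some people wanted WordP…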

[…] Yahoo! Search Web Services. Somehow Yahoo! has suddenly become the “good guy” again, while the “do good” Google falls to the dark side. Jeremy Zawodny commented on how now uses Yahoo to back its site-wide search. It would be interesting to play around with the API, or replace the default WP search engine with Yahoo's as a plugin (hint hint). […]

[…] Another option is using Google and Yahoo Search, something what search does. It is not very difficult to set up. However, its problems lie elsewhere. First, it needs a public web presence, which might not be the case for corporate internal blogs, and such blogs have become more feasible now that the WordPress multiuser version is out. Secondly, these services will have their own restrictions, and thirdly, there is the dependency on them. This is not going to help make WordPress a serious contender in the corporate space. […]