Keeping Links Kosher

As part of the re-vamp I’ve put together a 404 script that emails me whenever it’s called. This has certainly been an eye-opener as to the misguided traffic that this site gets. Getting an email is very different from just seeing the hits in your logs, and I would recommend that anyone serious about maintaining a site do something similar.

There are a few links to my old curly quotes entry that point to a rather funky perversion of a fly-by-night URI scheme that has long since gone by the by. These links worked just fine until I deleted the file that was keeping things going. Now it’s time to move things into the magic .htaccess file.

Let’s take a look at the URI in question:
http://www.photomatt.net/archives/m/200209?p=186

My first thought was to just plop the latter part of the request into a rule written just for this link, like so:

Redirect Permanent /archives/m/200209?p=186 http://photomatt.net/p186

Didn’t work, never matched. Next try I decided to go for something a little broader:

RedirectMatch 301 .*p=([0-9]+) http://photomatt.net/p$1

Didn’t work, never matched. Some research found that the problem lies in the query string: the Apache redirect directives match only the URL path and never see the query string. So let’s give mod_rewrite a go:

RewriteRule ^.*p=([0-9]+) http://photomatt.net/p$1 [R=301,L]

Still no luck. (For those that wonder, the HTTP response code 301 indicates that the resource has been permanently moved. “Permanent” in the first try is just a synonym of “301”.) It looks like the magical mod_rewrite doesn’t match query strings in a RewriteRule pattern either. Some more research turned up that while Redirect doesn’t match or rewrite query strings, it does pass them along. So we are left with:

Redirect Permanent /archives/m/200209 http://photomatt.net/

Which, counter-intuitively, works. The ?p=186 on the end is just passed to the root of the site, which gives it to my index file, which knows just what to do with it. I would like to eliminate the query string entirely and forward the URI to http://photomatt.net/p186, but while that would be trivial in any scripting language, I can’t nail down how to do it on the Apache or mod_rewrite level. One option is to add something to the global header to catch p=something query strings and redirect them, but I’d like to keep that file clean. More likely I’ll start adding some URI management code to the 404 handler and make that file more sophisticated in general. We’ll see.

16 thoughts on “Keeping Links Kosher”

  1. I put together an intelligent 404 for the company I work for (Nationwide.co.uk), and depending on whether the referrer was a) a search engine, b) a web site or c) one of Nationwide’s sites, the message displayed changed to suggest what went wrong and how we might fix it. In addition, I chose to make the reporting of the error voluntary, otherwise we would get overloaded. It seemed to me to make sense to have those pages that were important to people reported as missing rather than every 404. That way we got some level of prioritisation. See this for an example: Error 404 [direct link], or for other referring sites with incorrect URLs: Page not there

  2. What you need is a hook for the query string which is conveniently provided by a variable %{QUERY_STRING}

    So in effect you just need to prepend your RewriteRule with a RewriteCond something like:

    RewriteCond %{QUERY_STRING} p=([0-9]+)
    RewriteRule ^.* http://photomatt.net/p%1 [R=301,L]

    Note that you need the % instead of $ to backreference matches from the RewriteCond. I am also pretty sure that the ? disappears and does not occur in the query string, but I guess that’s kind of a moot point since it would match anyway.
    This is untested, but at least should lead you down the right track. Hope it helps.

  3. Pingback: geek ramblings
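  4. mod_rewrite doesn’t match query strings in the rule pattern, but it can carry them over to the target. The QSA flag appends the original query string to any query string in the substitution. Like: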

    RewriteBase /
    RewriteRule ^archives/m/([0-9]+)/?$ index.php [QSA,L]

    The query string is now passed to your index.php and can be handled there. I guess there are several ways that lead to Rome :p

  5. Whew! Glad this page still exists. I was having a similar problem: my RewriteCond wasn’t working. Turns out I just needed %1 instead of $1. Not quite Perl regexp!
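
Pulling the suggestions in the comments together, one way to drop the query string and land on the clean URI might look like the sketch below. It is untested against this site’s setup and assumes that mod_rewrite is enabled, that the rules live in the .htaccess file at the document root, and that every old link follows the /archives/m/<digits>?p=<digits> shape. The (^|&) and (&|$) anchors and the path pattern are additions for illustration and were not part of the suggestions above.

```apache
# Assumes mod_rewrite is available and .htaccess overrides are allowed.
RewriteEngine On

# Only act when the query string carries p=<digits>.
# (^|&) and (&|$) stop "p" from matching inside another parameter, e.g. "pp=1".
RewriteCond %{QUERY_STRING} (^|&)p=([0-9]+)(&|$)

# In .htaccess the pattern is matched against the path without its leading slash.
# %2 is the second group captured by the RewriteCond above (the digits).
# The trailing "?" replaces the original query string with an empty one,
# so the redirect goes to /p186 rather than /p186?p=186.
RewriteRule ^archives/m/[0-9]+/?$ http://photomatt.net/p%2? [R=301,L]
```

With this in place, a request for /archives/m/200209?p=186 should be answered with a 301 to http://photomatt.net/p186. On Apache 2.4 and later, the QSD flag is an alternative to the trailing question mark.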
