As part of the re-vamp I’ve put together a 404 script that emails me whenever it’s called. This has certainly been an eye-opener as to the misguided traffic that this site gets. An email is so much different than just seeing the hits in your logs, and I would recommend anyone serious about maintaining a site do something similar.
There are a few links to my old curly quotes entry that point to a rather funky perversion of a fly-by-night URI scheme that has long since gone by the by. These links worked just fine until I deleted the file that was keeping things going, so now it’s time to move things into the magic .htaccess file.
Let’s take a look at the URI in question:
http://www.photomatt.net/archives/m/200209?p=186
My first thought was to just take the latter part of the request and create a rule just for this link, like so:
Redirect Permanent /archives/m/200209?p=186 http://photomatt.net/p186
Didn’t work, never matched. Next try I decided to go for something a little broader:
RedirectMatch 301 .*p=([0-9]+) http://photomatt.net/p$1
Didn’t work, never matched. Some research found that the problem lies in the query string: the Apache redirect directives only match against the path, not the query string. So let’s give mod_rewrite a go:
RewriteRule ^.*p=([0-9]+) http://photomatt.net/p$1 [R=301,L]
Still no luck. (For those who wonder, the HTTP response code 301 indicates that the resource has been permanently moved. “Permanent” in the first try is just a synonym for “301”.) It looks like the magical mod_rewrite doesn’t match query strings either, at least not in the RewriteRule pattern. Some more research turned up that while Redirect doesn’t match or rewrite query strings, it does pass them along. So we are left with:
Redirect Permanent /archives/m/200209 http://photomatt.net/
Which, counter-intuitively, works. The ?p=186 on the end is just passed to the root of the site, which gives it to my index file, which knows just what to do with it. I would like to eliminate the query string entirely and forward the URI to http://photomatt.net/p186, but while that would be trivial in any scripting language, I can’t nail down how to do it at the Apache or mod_rewrite level. One option now is to add something to the global header to catch p=something query strings and redirect them, but I’d like to keep that file clean, so more likely I’ll start adding some URI management code to the 404 handler and make that file more sophisticated in general. We’ll see.
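For anyone wanting to wire up a similar 404 handler, here is a minimal sketch of the Apache side. The script path /404.php is a hypothetical name, not the one used on this site, and what the script does with the request (emailing, redirecting) is left out.

```apache
# Hand every "not found" request to a local script.
# /404.php is a placeholder path; any local script would do.
ErrorDocument 404 /404.php
```

When the error document is a local path like this, Apache makes the original request available to the script through REDIRECT_-prefixed environment variables such as REDIRECT_URL and REDIRECT_QUERY_STRING, so the handler can inspect a p=something query string from there.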
geek language?
“It’s all geek to me.” — Bad pun
Try
RedirectMatch 301 ^p=(.*)$ /$1
Uh huh… that’s why you do the hosting.
“blah blah blah computer blah blah blah hardrive” 😉
I put together an intelligent 404 for the company I work for (Nationwide.co.uk). Depending on whether the referrer was a) a search engine, b) a web site, or c) one of Nationwide’s sites, the message displayed changed to suggest what went wrong and how we might fix it. In addition, I chose to make the reporting of the error voluntary, otherwise we would get overloaded; it seemed to me to make sense to have those pages that were important to people reported as missing rather than every 404. That way we got some level of prioritisation. See this for an example: Error 404 [direct link], or for other referring sites with incorrect URLs: Page not there
Actually, both those links generate the same message – I can’t remember the URL for the error404 page directly 🙂
hey! sarahw stole my line! ‘Cept I was going to say “geek geek geek, computer, geek geek geek, hardrive” Love, C
How do you catch 404’s? Care to share the knowledge? 😉
if you don’t update, i am going to cry!
Oooooh, you got linkage from Zeldman!
Cool – for IIS I do this: http://one-network.com/junkasspage
However, for BSD/Apache/PHP I find that the 404 redirect is different.
I would like to know the ideal page (probably PHP code?) for that, so that I can preformat the email to include the info I want about the 404, like I do on the ASP/IIS sites.
What you need is a hook for the query string which is conveniently provided by a variable %{QUERY_STRING}
So in effect you just need to prepend your RewriteRule with a RewriteCond something like:
RewriteCond %{QUERY_STRING} p=([0-9]+)
RewriteRule ^.* http://photomatt.net/p%1 [R=301,L]
Note that you need the % instead of $ to backreference matches from the RewriteCond. I am also pretty sure that the ? disappears and does not occur in the query string, but I guess that’s kind of a moot point since it would match anyway.
This is untested, but at least should lead you down the right track. Hope it helps.
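A slightly tightened variant of the same idea, as an untested sketch. It assumes the rules live in an .htaccess file at the document root and that old links always look like /archives/m/NNNNNN; the path pattern is my guess at what should be matched, not something confirmed in the post.

```apache
RewriteEngine On

# Match p=<digits> as a whole parameter, so "xp=1" or "p=12abc" is ignored.
# The (?:...) groups do not capture, so the digits are still %1.
RewriteCond %{QUERY_STRING} (?:^|&)p=([0-9]+)(?:&|$)

# In .htaccess the pattern sees the path without its leading slash.
# The trailing "?" on the target drops the original query string.
RewriteRule ^archives/m/[0-9]+/?$ http://photomatt.net/p%1? [R=301,L]
```

Without the trailing ? the original query string is carried over, so the redirect would land on /p186?p=186 rather than /p186. On Apache 2.4 and later the QSD flag does the same job.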
mod_rewrite does match query strings, but only if the QSA flag is present. Like:
RewriteBase /
RewriteRule ^archives/m/([0-9]+)/?$ index.php [QSA,L]
The query string is now passed to your index.php and can be handled there. I guess there are several ways that lead to Rome :p
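To make the effect of QSA more concrete, here is a small sketch. The m parameter is a made-up name for illustration; whether index.php understands it is an assumption.

```apache
RewriteEngine On
RewriteBase /

# The target carries its own query string (m=...). QSA appends the
# original one to it instead of letting the new one replace it.
RewriteRule ^archives/m/([0-9]+)/?$ index.php?m=$1 [QSA,L]
```

With this rule a request for /archives/m/200209?p=186 is rewritten internally to index.php?m=200209&p=186. Without QSA the original p=186 would be discarded, because the target supplies its own query string. When the target has no query string at all, the original one is passed through unchanged whether or not the flag is set.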
Whew! Glad this page still exists. I was having a similar problem: my RewriteCond wasn’t working. Turns out I just needed %1 instead of $1. Not quite Perl regexp!