(Pardon my verbification.)
Here’s an idea for any website, though it could be particularly applicable to weblogs. I’m a reading junkie; I can’t get enough. When I come across a blog I like, I often go back through its archives, which is a great way to get a feel for a site. It’s fascinating to see how some blogs have evolved over the years, how posting styles change, and what people were thinking around the time of important events.
There is one common thread in every archive I browse: I constantly run across dead links. Long-dead links. Dead permalinks, even. I have read that the average life span of a web page is 100 days, and I think that may be generous. What good is the wonderful archiving of modern weblog software if those archives become irrelevant less than a year after they’re written?
I think the answer lies in some kind of automatic archiving of all linked content. When you publish a new post, an intelligent spider tied to your blog engine could go and grab the content of the page you link to and store it locally. Once a week the spider checks all links on your weblog, and if a resource no longer exists, it updates the link in the entry to point to the locally archived version. The local archive would have a disclaimer and a link to the original location of the resource, much like Google’s cache. The link in the entry could also be modified in some way, perhaps with a different CSS class or rel value than normal links. The engine could also alert you so you could be sure to be wary of that website or publisher in the future.
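To make the idea a little more concrete, here is a rough Python sketch of the two passes described above: one that saves a copy of every linked page at publish time, and one that later repoints dead links at the saved copy. Everything named here is my own invention for illustration (the function names, the `archived-link` class, the `data-original` attribute, the hashed file names); it is not how any real blog engine does it. It also assumes links are written as plain `href="http..."` attributes and it skips the disclaimer page entirely.

```python
import hashlib
import re
import urllib.error
import urllib.request
from pathlib import Path

# Assumption: links appear as double-quoted absolute http(s) hrefs.
HREF_RE = re.compile(r'href="(https?://[^"]+)"')


def fetch(url, timeout=10):
    """Return the page body as bytes, or None if the resource is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.URLError, OSError, ValueError):
        return None


def archive_path(url, archive_dir):
    """Hypothetical naming scheme: a short hash of the URL as the file name."""
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ".html"
    return Path(archive_dir) / name


def archive_links(entry_html, archive_dir, fetcher=fetch):
    """Publish-time pass: store a local copy of every page the entry links to."""
    Path(archive_dir).mkdir(parents=True, exist_ok=True)
    saved = []
    for url in HREF_RE.findall(entry_html):
        body = fetcher(url)
        if body is not None:
            archive_path(url, archive_dir).write_bytes(body)
            saved.append(url)
    return saved


def repoint_dead_links(entry_html, archive_dir, archive_base_url, fetcher=fetch):
    """Weekly pass: point dead links at the local copy and mark them with a class.

    Returns the rewritten HTML and the list of dead URLs, so the engine
    can alert the author about them.
    """
    dead = []

    def swap(match):
        url = match.group(1)
        local = archive_path(url, archive_dir)
        # Leave the link alone if it still resolves or we never archived it.
        if fetcher(url) is not None or not local.exists():
            return match.group(0)
        dead.append(url)
        return (
            f'href="{archive_base_url}/{local.name}" '
            f'class="archived-link" data-original="{url}"'
        )

    return HREF_RE.sub(swap, entry_html), dead
```

A cron job could call `repoint_dead_links` on each entry once a week and email you the `dead` list. A real version would want an HTML parser instead of a regex (this one would add a second `class` attribute to a link that already has one), and it should probably wait for a few failed checks in a row before declaring a link dead.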
How hard would this be? I know there are copyright issues that I’m ignoring, but I don’t foresee them being something that would hold this back. I doubt copyright holders who can’t keep their URIs cool are going to devote many resources to tracking down blogs that keep copies of their missing content. Besides, this could be covered.