Impartial Cache

News from the White House sends a chill up my spine less often lately, yet it seems the White House is using technical means to prevent spidering and archival of key documents. This is highly questionable. I hope there is a good reason for it, or that it will be reversed quickly, but one has to wonder whether such a deliberate action could have been taken by someone who is not a stakeholder, such as a web lackey.

Of course, this is a situation that could be addressed by technical means. A spidering robot that did not follow the robot exclusion rules could crawl a number of public government web pages at set intervals, say twice a day, and archive the results of each crawl. A summary of the differences between versions could then be offered as a government transparency service. WhiteHouse.gov would certainly be worth watching, and others such as the Fed could be interesting as well. It’s not a trivial task, but I would imagine one of the groups interested in such things would have no problem funding the development and maintenance of such a tool. For complete transparency the tool could be open source. I can’t think of a legitimate objection that operators of the websites in question could bring against such a service. Bandwidth use would be trivial compared to the amount of traffic such sites must get every day.
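To make the idea concrete, here is a minimal sketch of the archive-and-diff core in Python, using only the standard library. Every name in it (fetch, archive, check_page, summarize_changes, the directory layout, the .html suffix) is my own invention for illustration, not an existing tool. It assumes pages are UTF-8 text and that a plain line-by-line diff is a good enough summary. Scheduling the twice-daily run is left to something like cron.

```python
import difflib
import hashlib
import urllib.request
from datetime import datetime
from pathlib import Path


def fetch(url: str, timeout: float = 30.0) -> str:
    """Download one page as text. urllib does not consult robots.txt."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def page_dir(store: Path, url: str) -> Path:
    """Return (and create) the snapshot directory for one URL (hypothetical layout)."""
    directory = store / hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    directory.mkdir(parents=True, exist_ok=True)
    return directory


def archive(store: Path, url: str, body: str, when: datetime) -> Path:
    """Write a timestamped snapshot; `when` is assumed to be in UTC."""
    path = page_dir(store, url) / (when.strftime("%Y%m%dT%H%M%SZ") + ".html")
    path.write_text(body, encoding="utf-8")
    return path


def summarize_changes(old: str, new: str) -> dict:
    """Count added and removed lines and return them with a unified diff."""
    diff = list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="previous", tofile="current", lineterm=""))
    hunks = diff[2:]  # skip the two "---" / "+++" header lines
    return {
        "added": sum(1 for line in hunks if line.startswith("+")),
        "removed": sum(1 for line in hunks if line.startswith("-")),
        "diff": "\n".join(diff),
    }


def check_page(store: Path, url: str, body: str, when: datetime) -> dict:
    """Compare `body` with the latest stored snapshot, then archive it."""
    # Timestamped file names sort chronologically, so the last one is newest.
    previous = sorted(page_dir(store, url).glob("*.html"))
    old = previous[-1].read_text(encoding="utf-8") if previous else ""
    archive(store, url, body, when)
    return summarize_changes(old, body)
```

A scheduled job would call check_page(store, url, fetch(url), datetime.now(timezone.utc)) for each watched URL (with timezone imported from datetime) and publish any result where "added" or "removed" is non-zero. On the first run there is no earlier snapshot, so every line is reported as added.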

2 replies on “Impartial Cache”

  1. Mark Pilgrim used to have a thing called “Winer Watch”, which I remember he renamed at some point, but I can’t find any references on his site in my admittedly quick search. Anyhow, he used to monitor Dave Winer’s RSS feed, diff out the changes as Dave edited his posts throughout the day, and generate a summary of the changes.

    I’m pretty sure he had the code available, and that you could track it down, if you tried hard enough. At the worst, I’m sure an email to Mark would be productive.
