doubleplusundead

February 27, 2009

Down the Memory Hole

So, I was reading Instapundit this morning and this post motivated me to register the domain www.thememoryhole.us.

The service I want to provide takes daily or weekly snapshots of high-profile websites like www.whitehouse.gov and keeps them in an archive, so we can easily yell "gotcha" when they disappear a statement that has become inconvenient to their current policies.

Now, I just need help understanding how web crawlers work and how I'd be able to pull that data down. The rest of the presentation won't be a problem.

Anybody want to help? Alice H., I'm looking in your direction.

Update: Okay, something is there. I'll be putting it together in my very limited spare time this weekend. Basically, I figure I'll use a web crawler to scrape the pages at various websites* of interest and then archive versions of the pages by date. That's all it will do initially, but eventually we could even have a tool that locates differences in a website over time, which would save us from having to hand-explore the archived pages.

* - I will need to look into the legality of doing this for private/corporate websites. I imagine anything political or governmental is pretty much fine under fair use laws, but I'm not a lawyer.

*ahem*. I'm not a lawyer...
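
For anyone wondering what the snapshot-and-compare idea from the update might look like, here is a rough sketch in Python using only the standard library. It is not the site's actual code. The function names (fetch, snapshot_path, save_snapshot, diff_snapshots) and the archive/<host>/<date>/<page>.html layout are made up for illustration, and it grabs a single page rather than following links the way a real crawler would.

```python
import difflib
import urllib.request
from datetime import date
from pathlib import Path
from urllib.parse import urlparse


def fetch(url: str) -> str:
    """Download one page and return its text (no link following)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def snapshot_path(root: Path, url: str, day: date) -> Path:
    """Map a full URL and a date to a file under root.

    Hypothetical layout: root/<host>/<YYYY-MM-DD>/<page>.html
    """
    parts = urlparse(url)
    name = parts.path.strip("/").replace("/", "_") or "index"
    return root / parts.netloc / day.isoformat() / f"{name}.html"


def save_snapshot(root: Path, url: str, html: str, day: date) -> Path:
    """Write the page text into the dated archive and return the file path."""
    path = snapshot_path(root, url, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return path


def diff_snapshots(old: Path, new: Path) -> list[str]:
    """Return a unified diff (list of lines) between two saved snapshots."""
    old_lines = old.read_text(encoding="utf-8").splitlines()
    new_lines = new.read_text(encoding="utf-8").splitlines()
    return list(
        difflib.unified_diff(old_lines, new_lines, str(old), str(new), lineterm="")
    )
```

Calling save_snapshot(Path("archive"), url, fetch(url), date.today()) once a day (from cron, say) would build up the dated archive, and diff_snapshots on any two dates would list the lines that were added or removed. In a diff, a line starting with "-" is one that was in the older snapshot and is gone from the newer one, which is exactly the "disappeared statement" case.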

Posted by: Moron Pundit at 09:51 AM
