We all know that the legal element of patent searching can add an interesting twist to scientific and technical literature searches. Those of us in the Intellogist community who do lots of validity investigations, for example, know that it’s not only *if* material is publicly available, but *when* it became publicly available, that counts. Lots of folks I have talked to recommend the Wayback Machine, the web archive run by the Internet Archive (archive.org), as a way to provide useful evidence that web content was available online before a certain date.
The Internet Archive is an ambitious project to catalog historical versions of web pages. In other words, using the archive, you can see a certain web page not only as it exists today, but as it appeared years ago. The archive does this by using a web crawler to take “snapshots” of web pages. These snapshots are then stored in chronological order, meaning that you can follow a web page’s history as the content displayed on it evolves.
Some uses for the Internet Archive are obvious, like turning back the clock to see what headlines were listed on CNN’s home page ten years ago. However, it can also be used to build evidence of copyright or trademark infringement (e.g. a text passage or logo, once captured, stays in the archive even if the live page later changes). And of course, it can help us during patentability and validity investigations to gather evidence about the date that materials were first publicly available on the web.
To use the Wayback Machine, first go to http://www.archive.org/. Enter the URL containing your content of interest into the search bar at the top of the page (the bar prompts you for a URL and already contains the first part of the address, http://), then select “Take Me Back.” The display shows the captured versions of the page in chronological order. As a test of http://www.cnn.com shows, the archive does not capture a page every day, only sporadically.
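If you would rather jump straight to a capture than click through the listing, archived pages also have predictable addresses. Here is a minimal Python sketch, assuming the `https://web.archive.org/web/<timestamp>/<url>` address pattern used for captured pages and the 14-digit `YYYYMMDDhhmmss` timestamp that appears in it. The helper names `snapshot_url` and `parse_capture_time` are hypothetical, made up for this illustration, and treating the timestamp as UTC is an assumption you should confirm before citing a time of day.

```python
from datetime import datetime, timezone

# Assumed address pattern for archived captures (not taken from the post itself).
WAYBACK_PREFIX = "https://web.archive.org/web"
STAMP_FORMAT = "%Y%m%d%H%M%S"  # 14-digit capture timestamp, e.g. 20010101000000


def snapshot_url(page_url: str, when: datetime) -> str:
    """Build the archive address for page_url at (or near) the moment `when`."""
    return f"{WAYBACK_PREFIX}/{when.strftime(STAMP_FORMAT)}/{page_url}"


def parse_capture_time(stamp: str) -> datetime:
    """Turn a 14-digit capture timestamp into a datetime (assumed to be UTC)."""
    return datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    # Ask for CNN's home page as of 1 January 2001.
    print(snapshot_url("http://www.cnn.com", datetime(2001, 1, 1)))
    # Read back the capture time from a timestamp copied out of an archive address.
    print(parse_capture_time("20010101000000").isoformat())
```

Because captures are sporadic, the page you land on may carry a different timestamp from the one you asked for, so record the timestamp shown in the address of the page you actually viewed, not the date you requested.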
There are some caveats to using the Wayback Machine. First, snapshots of a given site may not be taken very frequently, which means that you can use the archive to prove a date on which content existed on a page, but *not* the date it was first added. Second, the archive does not index pages on sites whose robots.txt file tells web crawlers not to capture the content. Third, it may take six months to a year before snapshots are actually available in the archive for viewing. One frequent user of the archive told me that it’s good practice to check as many iterations of the page as you can manage: content can be put up, taken down, and put back up again during revisions to a site, so if you really need to beat a certain date it’s best to be thorough. For more information about the legal uses of the site, you can check out the site’s FAQ, which has a section written especially for attorneys.
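The first caveat and the "check every iteration" advice can be made concrete with a small bookkeeping sketch in Python. It assumes you have already looked at each capture yourself and noted whether the passage of interest was on the page; the `Capture` and `DatingResult` types and the `date_content` function are hypothetical names invented for this example, and nothing here talks to the archive.

```python
from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class Capture(NamedTuple):
    taken: datetime     # when the snapshot was captured
    has_content: bool   # did you find the passage in this snapshot?


class DatingResult(NamedTuple):
    earliest_present: Optional[datetime]    # provable "existed by" date
    last_absent_before: Optional[datetime]  # latest earlier capture without it
    reappeared: bool                        # content vanished and later came back


def date_content(captures: Iterable[Capture]) -> DatingResult:
    """Summarize what a set of reviewed captures can and cannot prove."""
    ordered = sorted(captures, key=lambda c: c.taken)

    earliest_present = None
    last_absent_before = None
    for cap in ordered:
        if cap.has_content:
            earliest_present = cap.taken
            break
        last_absent_before = cap.taken
    if earliest_present is None:
        # The passage was never found, so there is nothing to date.
        return DatingResult(None, None, False)

    # Look for a present -> absent -> present pattern across all captures.
    seen_present = seen_gap = reappeared = False
    for cap in ordered:
        if cap.has_content and seen_gap:
            reappeared = True
            break
        if cap.has_content:
            seen_present = True
        elif seen_present:
            seen_gap = True

    return DatingResult(earliest_present, last_absent_before, reappeared)
```

The result reflects the limits described above: `earliest_present` is only the date by which the content demonstrably existed, and the content could have been added at any point after `last_absent_before`. A `reappeared` value of `True` is a reminder that one capture without the passage does not rule out an earlier one with it.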
What other tips and tricks do you have for dating material on the web? Share them with us in our comments section!
This post was contributed by Intellogist team member Kristin Whitman.