Here at Intellogist we tend to focus on strategies to find patents and prior art of interest, but working with individual patent documents can also be challenging. Recently I started looking into which patent search providers offer searchable PDF patent documents as downloads. PDF is one of the most common file formats for vendors to use when providing patent document images, and many legal researchers would like these files to arrive in a form that is already keyword searchable in Adobe. There are programs (such as certain versions of Adobe Acrobat) that can run optical character recognition (OCR) on a document image, but it would save legal researchers even more time if the patent documents were to come pre-treated so that they are already keyword searchable.
After some quick research, I found that whether vendors provide searchable PDFs depends on the country collection you are interested in. I assume that differences between country collections arise because each patent office has different standards for how patent documents are treated during the publication process. However, a cursory field test revealed that there may be some differences between the PDFs from common patent search sources.
When I started to investigate, I quickly found that both Questel’s orbit.com platform (including QPAT) and LexisNexis TotalPatent specifically mention “searchable PDFs” as part of their product benefits. Questel makes it clear on their full text coverage page that they offer searchable PDFs, and LexisNexis highlighted this feature in their initial press release about TotalPatent. (On the other hand, PatBase is clear that PDFs do not come in searchable format, and users must run OCR on the document after download.)
To try to figure out who else was offering searchable PDFs, I used a very small and unscientific sample of two EP documents, two US grants, and one WO document. I tested only documents published in 2009 or 2010 because I assumed these were the most likely to be offered in searchable format. Based on this brief investigation, it seems that recently published EP patent documents are usually keyword searchable upon download from most vendors, while US and WO documents are not commonly in a searchable format.
I ran a quick test in the following systems:
- esp@cenet
- Google Patents (US documents only)
- MicroPatent PatentWeb
- Patent Lens
- Thomson Innovation
The results were fairly consistent: the US and WO documents I picked were not searchable when downloaded from any of the systems I tested, while the EP documents were searchable from every source I tried except esp@cenet. Esp@cenet’s failure to provide searchable PDFs for my two EP test documents derails my hypothesis: I was expecting to conclude that EP documents are usually available in searchable format because the EPO produces them that way, but now I’m not sure that’s the case.
I should also mention that the ability to search a PDF document could vary by publication stage. During the “fiddling around” stage of my light research, I did find several EP documents where the B1 stage was searchable, for example, but the A3 stage was not, because it was simply a republished WO application that had entered the EP phase.
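For readers who want to repeat this kind of spot check on a folder of downloads without opening each file in Adobe, here is a rough sketch in Python. It is not how any vendor or Adobe decides whether a file is searchable. It relies on a crude assumption: a PDF with a text layer normally references at least one font, while a pure page-image scan normally does not. The function names `looks_searchable` and `scan_folder` are made up for this illustration, and the check can be fooled (for example, a scanned document with a single stamped text header would pass).

```python
import re
import zlib
from pathlib import Path

# Matches the raw bytes between "stream" and "endstream" in a PDF file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def looks_searchable(pdf_bytes: bytes) -> bool:
    """Heuristic (an assumption, not a guarantee): True if the PDF
    appears to reference at least one font."""
    if b"/Font" in pdf_bytes:
        return True
    # Newer PDFs can pack their object dictionaries into compressed
    # streams, so try to inflate each stream and look again.
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example, a JPEG page image)
        if b"/Font" in inflated:
            return True
    return False


def scan_folder(folder: str) -> dict[str, bool]:
    """Map each PDF file name in `folder` to the heuristic's verdict."""
    return {
        path.name: looks_searchable(path.read_bytes())
        for path in sorted(Path(folder).glob("*.pdf"))
    }
```

Calling `scan_folder("downloads")` (the folder name is only an example) returns a dictionary of file names and True/False values. Anything flagged False is a candidate for running through OCR, and anything flagged True is still worth a quick manual keyword search to confirm.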
I have to admit that, even after some research, I’m still not as knowledgeable on this issue as I’d like to be. Does anyone know the secrets behind which country collections are most often available as searchable PDFs, and why? Anyone have any quick conversion tricks they’d like to share?
This post was contributed by Intellogist Team member Kristin Whitman.