First, let me say that I think “patents on the cloud” is a radical idea, and one that may well change patent information products. Read on for a summary of Alexandria, a new product that provides “patents on the cloud,” and my analysis of this new approach!
I recently spoke to Mike Baycroft of Fairview Research, Inc., the company that recently acquired IFI and launched a new patent information product, Alexandria, billing the service as “Patents on the Cloud.” Alexandria is a database of patent information hosted on the Amazon Elastic Compute Cloud, ready to license. For those who haven’t followed the jargon, the “cloud” refers to cloud computing, and it basically means that you can get a lot of stuff, like data and software, directly from the delocalized network (or “cloud”) of sources that make up the internet, rather than having to load it on your own local servers.
The idea for the Alexandria database came years ago, when researchers complained that there was no good international database formatted in a way that would support academic research into the patent art. Rather than build another commercial search product, Mike envisioned a database of standardized patent information unconstrained by the functional requirements of a specific application or interface. He wanted to build a raw data source that could be taken and molded by an academic researcher.
There are a number of major sources of raw patent data, but until now, constructing a truly international database meant sourcing the data from multiple vendors and then spending time standardizing and normalizing it into a single source. This is a big, demanding task, which is why most of us choose to subscribe to commercial patent search products rather than building our databases from scratch. However, with Alexandria, we now have the ability to extract a standardized, integrated collection of international patent data from the Amazon Cloud in as little as an hour.
Alexandria aims to take the work out of building a patent data set by offering a single source of international patent data. The structure of Alexandria is based on the DOCDB (INPADOC) bibliographic and family file from the European Patent Office. DOCDB is used to associate full text patent documents into family records; this is similar to the way the PatBase database is structured. It’s not that Alexandria’s full text collection is particularly unique; many of the collections are currently available from other platforms, and some are licensed from LexisNexis’ Univentio data. It appears the collection will be updated on a schedule comparable to what other commercial vendors can provide, including the regular weekly updates released for DOCDB. Mike highlighted to me that the notable aspect of Alexandria is the effort that has gone into unifying the data into a single format (ST.36 XML).
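To make the family idea concrete, here is a minimal Python sketch of grouping documents into family records. The XML is a made-up, heavily simplified stand-in, not the real ST.36 schema or Alexandria’s actual layout. The element and attribute names (`doc`, `family-id`, and so on) and the `group_by_family` helper are my own assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified records. This is NOT the real ST.36 schema
# or Alexandria's data layout; names and numbers are invented.
SAMPLE = """
<documents>
  <doc country="US" number="1234567" family-id="F1" lang="en"/>
  <doc country="EP" number="7654321" family-id="F1" lang="en"/>
  <doc country="JP" number="2010000001" family-id="F1" lang="ja"/>
  <doc country="KR" number="100000001" family-id="F2" lang="ko"/>
</documents>
"""


def group_by_family(xml_text):
    """Map each family id to the list of publication numbers in that family."""
    root = ET.fromstring(xml_text.strip())
    families = defaultdict(list)
    for doc in root.iter("doc"):
        publication = doc.get("country") + doc.get("number")
        families[doc.get("family-id")].append(publication)
    return dict(families)


families = group_by_family(SAMPLE)
for family_id, members in sorted(families.items()):
    print(family_id, members)
```

The point is only that once every record carries a shared family identifier in one consistent format, pulling together the US, EP, and JP members of a family becomes a simple grouping step rather than a cross-vendor reconciliation project.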
The capability to get the Alexandria collection on demand makes a big difference for anyone looking to set up a customized patent collection, for whatever reason. It is also game-changing for the producers of patent analytical tools out there. Up until now, vendors with software products in the patent analytics field were often left unable to demo their products effectively due to the lack of a complete patent collection within the product: they needed to import patent data sets in order to manipulate them. I have actually seen this first hand at patent information conferences, where analysis product vendors were stuck using canned datasets (and to me at least, it’s all too easy to suspect them of cherry picking their data to make their analytics look wonderful!). Furthermore, the lack of underlying patent data sent a message to the customer that the analysis product is something you buy *on top of* your commercial patent search product, because you need somewhere to get your initial dataset from. With the availability of a clean, usable patent data set “on the cloud,” a subscription to another end-user patent search product (e.g. Delphion, Micropatent, etc.) looks less necessary. Unfortunately, for now, Alexandria’s coverage may not quite stack up to the unique patent data products and collections offered by commercial search vendors, so I think serious patent searchers won’t be giving up their patent search tool subscriptions any time soon.
Notably, the database is multilingual and includes full text native language records for Chinese, Japanese, and Korean patent collections. This serves as the basis for a second major utility Mike and his team envision for Alexandria: as a test-bed for improving machine translation technologies. In order for machine translation technologies to “learn” language, they need to have a large body of data from which to create dictionaries and make associations between languages. Creating dictionaries for technical jargon is quite different than creating dictionaries used for everyday language, and patent documents represent a large, public domain resource of technical data that can be used for this purpose. (In related news, Google recently announced an agreement with the European Patent Office that would allow them to use the EPO’s collections to improve their machine translation technology.)
Those who noted Fairview Research’s acquisition of IFI may wonder how that company’s data may be implemented in Alexandria; it remains to be seen if and how this data becomes a product offering from Fairview.
I think this is a fascinating development in the patent information industry. Patent information, like other public domain information, is in a unique position to be disseminated via the cloud because it is unencumbered by copyright restrictions. Making large datasets available in a form that can be easily loaded and “played with” is a big advantage for information professionals, especially those who work on analysis and big-picture type projects. I am wondering whether the next thing we will see is easy interface-design software, sort of like the Android App Inventor, which will allow non-computer-y types to quickly put together and customize their own search interfaces, effectively allowing them to create their own search tools. If advanced users are able to easily build their own search interfaces, patent information search providers will have to go beyond simply providing access to patent data (and we have already started to see this in products such as Innography). Patent information providers may have to focus on adding value by integrating multiple data sources together, such as linking patent records to business and financial data, non-patent literature records, and litigation records, and designing more complex interfaces that allow users to quickly cross reference between collections.
How do you think patent information products will evolve now that we’ve got “the cloud” on our side?
This post was edited by Intellogist Team member Kristin Whitman. The Intellogist blog is provided for free by Intellogist’s parent company, Landon IP, a major provider of patent search, technical translation, and information services.