Only a few commercial information providers can cope with the challenge of querying the chemical information disclosed in Markush structure claims in patents. If you’re not familiar with Markush claims, they are patent claims that describe generic chemical structures with many interchangeable parts. By describing compounds in generic terms, a single complex claim can disclose hundreds of potential chemical compounds. For an example, see the chemical structure searching section of our Best Practices wiki article on Chemistry and Pharmaceuticals Searching.
Some chemical information companies have focused on creating registries of known chemical substances, wherever they are reported (not just in patent art). For example, the Chemical Abstracts Service has a well-known file of chemical substances called CAS REGISTRY, and the ChemSpider database is a newer service that aggregates publicly available chemical data from the web into a single repository. But searching Markush claims is not just a matter of querying a database of known structures. To conduct a successful Markush search, a search engine must be able to work through the patent claim language and understand all the possible compounds that may be covered by structures described in generic chemical terms. For example, how would you teach a computer to understand that a patent claiming a compound substituted by “an alkyl, an alkoxy, hydroxy, or amino” is a good match for the specific chemical structure you drew as a query?
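To make that question concrete, here is a deliberately tiny Python sketch of the matching idea. It is a toy illustration only, not how MARPAT, MMS, or any commercial system actually works: the names `GENERIC_TERMS`, `matching_terms`, and `claim_covers` are hypothetical, substituents are written as simple condensed formulas rather than real structures, and "alkyl" and "alkoxy" are assumed to mean straight chains only.

```python
import re

# Hypothetical mapping from a generic claim term to a test on a substituent
# written as a condensed formula (e.g. "CH3", "OCH2CH3", "OH", "NH2").
# Assumption: only straight-chain alkyl/alkoxy groups are recognized.
GENERIC_TERMS = {
    "alkyl": lambda s: re.fullmatch(r"(CH2)*CH3", s) is not None,
    "alkoxy": lambda s: re.fullmatch(r"O(CH2)*CH3", s) is not None,
    "hydroxy": lambda s: s == "OH",
    "amino": lambda s: s == "NH2",
}


def matching_terms(substituent, allowed_terms):
    """Return the generic terms (from allowed_terms) that cover a substituent."""
    return [t for t in allowed_terms if GENERIC_TERMS[t](substituent)]


def claim_covers(claim, query):
    """Check whether a toy Markush claim covers a specific query compound.

    claim: {"core": str, "positions": {"R1": ["alkyl", ...], ...}}
    query: {"core": str, "substituents": {"R1": "OCH3", ...}}
    """
    if claim["core"] != query["core"]:
        return False
    # Every variable position in the claim must accept the query's substituent.
    return all(
        matching_terms(query["substituents"].get(position, ""), terms)
        for position, terms in claim["positions"].items()
    )


# A claim with one variable position, using the wording quoted above.
claim = {
    "core": "benzene",
    "positions": {"R1": ["alkyl", "alkoxy", "hydroxy", "amino"]},
}
ethoxy_query = {"core": "benzene", "substituents": {"R1": "OCH2CH3"}}
chloro_query = {"core": "benzene", "substituents": {"R1": "Cl"}}

print(claim_covers(claim, ethoxy_query))  # ethoxy falls under "alkoxy"
print(claim_covers(claim, chloro_query))  # chloro is not in the claim's list
```

Even this toy shows why the real problem is hard: a production system has to handle branched and cyclic groups, nested definitions, variable attachment points, and claim language that no one has normalized in advance.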
Certain information providers have aggressively tackled this problem by funding hand-built collections of chemical patent records, which are treated with complex indexing systems to make the Markush information structure-searchable. This is not a simple task. There are at least three widely used commercial sources of such carefully indexed information: Derwent Fragmentation Codes, the Chemical Abstracts Service (CAS) MARPAT file, and the Merged Markush Service (MMS). Fragmentation codes are an older, simpler method of indexing chemical information, and they were adopted by Derwent in the 1960s. The other two methods are more advanced, and both were adopted in the late 1980s. While CAS developed an indexing system to create the MARPAT file, a joint venture called Markush DARC was formed by Derwent Information, the French Patent Office (INPI), and Questel to index the DWPI and Pharm files. This joint venture later came to be known as MMS, or the Merged Markush Service, and has been available exclusively on a command-line platform offered by Questel. Today, the MMS frontfile indexing is created by Thomson Reuters, the company that acquired Derwent Information.
MMS is a valuable service that gets little attention right now; as far as I have seen, Thomson Reuters does not promote it at industry conferences. In addition, the platform used to access this data, currently provided by Questel, has aged and badly needs an overhaul, specifically a redesign focused on usability (the interface for this older search service could charitably be described as “quirky”).
I have learned recently that ChemAxon, producer of MarvinSketch and other chemical information products, is introducing a new platform that may be used to query the MMS database. Many questions remain about this new platform, including whether it will include the MMS Pharm content indexed by the French Patent Office (it is unclear to me whether Thomson Reuters has obtained ownership of this data). I would also like to know whether the system will use the older Derwent Fragmentation Codes to create structure search queries against the DWPI. An initial slideshow about the ChemAxon product is available, although I am waiting for a more end-user-friendly presentation before I can understand exactly how it works.
This post only touches lightly on the deep and complex challenge of querying chemical information in patents. For those who want to learn more, here is a quick list of other patent information providers to know about. IFI Claims, formerly of Wolters Kluwer and recently acquired by Fairview Research, has a chemical structure registry, and I understand it may also have a fragmentation coding system. SureChem supports chemical structure searching through machine-harvesting of chemical information, and DecrIPt is also worth a look.
For chemical structure searching, I definitely recommend hiring an experienced professional search provider (for example, my employer Landon IP provides chemical structure search services). This is necessary because end user search products in this subject area have not evolved to the point where untrained laypeople can easily use them.
What have your experiences been with Markush chemical structure searching? Let us know in the comments!
This post was edited by Intellogist Team member Kristin Whitman. The Intellogist blog is provided for free by Intellogist’s parent company, Landon IP, a major provider of patent search, technical translation, and information services.