Are your chemical structure searches catching Markush claims?


Only a few commercial information providers can cope with the challenge of querying the chemical information disclosed in Markush structure claims in patents. If you’re not familiar with Markush patent claims, they are claims that describe generic chemical structures with many interchangeable parts. By describing compounds in generic terms, a single claim of this kind can disclose hundreds of potential chemical compounds. For an example, see the chemical structure searching section of our Best Practices wiki article on Chemistry and Pharmaceuticals Searching.
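
To get a feel for how quickly a generic claim multiplies into individual compounds, here is a minimal Python sketch. The core, the position names R1 and R2, and the substituent lists are all invented for illustration; they do not come from any real patent.

```python
from itertools import product

# Hypothetical generic claim: a fixed core with two variable positions.
# The substituent lists are invented for illustration only.
r_groups = {
    "R1": ["methyl", "ethyl", "methoxy", "hydroxy", "amino"],
    "R2": ["H", "fluoro", "chloro"],
}

def enumerate_compounds(core, r_groups):
    """Yield one description per combination of substituent choices."""
    positions = sorted(r_groups)
    for choice in product(*(r_groups[p] for p in positions)):
        assignment = ", ".join(f"{p}={s}" for p, s in zip(positions, choice))
        yield f"{core} ({assignment})"

compounds = list(enumerate_compounds("phenyl core", r_groups))
print(len(compounds))  # 5 choices for R1 times 3 for R2 = 15
print(compounds[0])
```

Two positions with a handful of options already give 15 compounds. Real claims often use class terms such as "alkyl" that each stand for many specific groups, so the true count grows far faster than this toy suggests.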

Some chemical information companies have focused on creating registries of known chemical substances from any source (not just patent art). For example, the Chemical Abstracts Service has a well-known file of chemical substances called CAS REGISTRY, and the ChemSpider database is a newer service that aggregates publicly available chemical data from the web into a single repository. But searching Markush claims is not just a matter of querying a database of known structures. To conduct a successful Markush search, a search engine must be able to work through the patent claim language and understand all the possible compounds that may be covered by structures described in generic chemical terms. For example, how would you teach a computer to understand that a patent which claims a compound substituted by “an alkyl, an alkoxy, hydroxy, or amino” is a good match for the specific chemical structure you drew as a query?
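
As a rough illustration of that question, the sketch below treats a claim as a list of allowed terms per position and checks a specific query against it. It is a toy under stated assumptions: the `GENERIC_CLASSES` lookup, the function names, and the position label R1 are hypothetical, and no commercial Markush system is claimed to work this way.

```python
# Toy lookup from generic class terms to specific members (far from complete).
GENERIC_CLASSES = {
    "alkyl": {"methyl", "ethyl", "propyl", "isopropyl", "butyl"},
    "alkoxy": {"methoxy", "ethoxy", "propoxy"},
}

def term_covers(claim_term, substituent):
    """True if a claim term names the substituent or a class containing it."""
    if claim_term == substituent:
        return True
    return substituent in GENERIC_CLASSES.get(claim_term, set())

def claim_matches(claim, query):
    """claim maps position -> allowed terms; query maps position -> one substituent."""
    return all(
        any(term_covers(term, query.get(position)) for term in allowed)
        for position, allowed in claim.items()
    )

# The claim language quoted above, reduced to a single variable position.
claim = {"R1": ["alkyl", "alkoxy", "hydroxy", "amino"]}

print(claim_matches(claim, {"R1": "ethyl"}))   # True: ethyl is an alkyl
print(claim_matches(claim, {"R1": "chloro"}))  # False: no term covers chloro
```

The hard part in practice is everything this toy leaves out: open-ended classes, variable attachment points, nested definitions, and phrases like “optionally substituted”.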

Certain information providers have tackled this problem aggressively by funding hand-built collections of chemical patent records, indexed with complex systems that make the Markush information structure-searchable. This is not a simple task. There are at least three widely used commercial sources of such carefully indexed information: Derwent Fragmentation Codes, the Chemical Abstracts Service (CAS) MARPAT file, and the Merged Markush Service (MMS). Fragmentation codes are an older, simpler method of indexing chemical information, and they were adopted by Derwent in the 1960s. The other two methods are more advanced, and both were adopted in the late 1980s. While CAS developed an indexing system to create the MARPAT file, a joint venture called Markush DARC was formed by Derwent Information, the French Patent Office (INPI), and Questel to index the DWPI and Pharm files. This joint venture later came to be known as MMS, or the Merged Markush Service, and has been available exclusively on a command line platform offered by Questel. Today, the MMS frontfile indexing is created by Thomson Reuters, the company that acquired Derwent Information.
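
To show the general idea behind fragment-based indexing, as opposed to true structure searching, here is a small Python sketch. The codes (`F01`, `F07`, `F12`), the record names, and the `search` function are invented for illustration and are not real Derwent Fragmentation Codes or any vendor's query syntax.

```python
# Each record is indexed by the set of fragment codes assigned to it.
# Codes and records are made up for this example.
index = {
    "patent A": {"F01", "F07", "F12"},
    "patent B": {"F01", "F12"},
    "patent C": {"F07"},
}

def search(index, required, excluded=frozenset()):
    """Return records that carry every required code and no excluded code."""
    required = set(required)
    excluded = set(excluded)
    return sorted(
        record
        for record, codes in index.items()
        if required <= codes and not (excluded & codes)
    )

print(search(index, {"F01", "F12"}))          # ['patent A', 'patent B']
print(search(index, {"F01"}, excluded={"F07"}))  # ['patent B']
```

A query of this kind only asks which fragments are present, not how they are connected, which is one reason code-based searches tend to return records that a structure-based search would reject.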

MMS is a valuable service that does not get much limelight right now; as far as I have seen, Thomson Reuters does not promote it at industry conferences. In addition, the platform used to access this data, currently provided by Questel, has aged and badly needs an overhaul, specifically an investment in a redesign focused on usability (the interface for this older search service could charitably be described as “quirky”).

I have learned recently that ChemAxon, producer of MarvinSketch and other chemical information products, is introducing a new platform that may be used to query the MMS database. Many questions remain about this new platform, including whether it will include the MMS Pharm content indexed by the French Patent Office (it is unclear to me whether Thomson Reuters has obtained ownership of this data). I would also like to know whether the system will support the older Derwent Fragmentation codes for building structure search queries against the DWPI. An initial slideshow about the supporting ChemAxon product is available, although I am waiting for a more end-user-friendly presentation before I can understand exactly how it works.

This post only touches lightly on the deep and complex challenge of querying chemical information in patents. For those who want to learn more, here is a quick list of other patent information providers to know about. IFI Claims, formerly of Wolters Kluwer and recently acquired by Fairview Research, has a chemical structure registry, and I understand it may also have a fragmentation coding system. SureChem supports chemical structure searching through machine-harvesting of chemical information, and DecrIPt is also worth a look.

For chemical structure searching, I definitely recommend hiring an experienced professional search provider (for example, my employer Landon IP provides chemical structure search services). This is necessary because end-user search products in this subject area have not evolved to the point where untrained laypeople can easily use them.

What have your experiences been with Markush chemical structure searching?  Let us know in the comments!


This post was edited by Intellogist Team member Kristin Whitman. The Intellogist blog is provided for free by Intellogist’s parent company, Landon IP, a major provider of patent search, technical translation, and information services.


8 Responses

  1. Just to clarify the INPI situation from TR – Thomson Reuters do own the MMS content originally indexed by INPI, and it’s planned to make this available via the ChemAxon tools in 2011

  2. Thanks for the update Alex! We appreciate it, and that’s good news for everyone.

  3. A comprehensive search by structure should search both exemplified compound databases (such as CAS REGISTRY/CAplus) and Markush databases (such as MARPAT). This is necessary because the exemplified compound databases cover only disclosed compounds. In the claims, Markush structures — generic structures that can have a number of definitions for each R-group — often cover many more combinations than were actually synthesized or disclosed.

    CAS scientists index the R-group definitions as indicated by the claims and disclosure, and then upload the indexed structure so that it is structure-searchable with the same query used to search CAS REGISTRY/CAplus and the other structure-searchable databases on STN. CAS recommends searching both REGISTRY/CAplus and MARPAT because REGISTRY retrieves the exemplified compounds that fall under the query whereas MARPAT retrieves the Markush structures.

    The MARPAT settings allow a searcher to choose the level of specificity for each node or group. Each node or group can be set to match only what is specifically disclosed (match level atom); what is disclosed specifically or generically (match level class); or what is disclosed specifically, generically, or with a text descriptor such as “optionally substituted” (match level any). These choices let a searcher balance the type of hits retrieved against their number.

    MARPAT covers Markush structures found in patents covered by Chemical Abstracts from 1988-present. CAS has also added the Markush indexing done by INPI, which extends the coverage back to the early 1960s. The Markush structures covered by MARPAT represent organic and organometallic compounds.

  4. Thank you very much for your expertise Tony. Your explanation of finding example matches vs. generic matches was helpful for my understanding of the subject.

  5. I completely agree, Tony. One thing I regret about this post is that I did not really have the time or space to emphasize why both REGISTRY/CAplus and MARPAT should be searched, and that a complete search on Markush claims should also involve all available Markush patent claim databases, definitely including both MARPAT and MMS. Thanks for adding your expertise here.

  6. […] Searching for Markush structures is a difficult task, and professional searchers need training and experience to search competently […]

  7. […] both STN and SciFinder: a search for generic chemical structures defined in patent claims, called Markush searching.  Markush searches in both STN and SciFinder are conducted in the MARPAT database, but the results […]

  8. […] Do you know the best resources for searching Markush chemical structures in patent documents? If you need to freshen up your […]
