Beyond Latent Semantic Analysis: Linguistic Genomics on M·CAM DOORS™

Professional patent analysts are always looking for new tools with unique patent data analysis and visualization features.  I recently viewed a demonstration of the “intellectual property risk management tool” M·CAM DOORS™, which uses a distinctive technology known as linguistic genomics to identify related prior art and displays the results through interactive visualizations.  I had the opportunity to speak with the creator of the linguistic genomics technology, Dr. David E. Martin, CEO of M·CAM, Inc., who explained that linguistic genomics is an analytical technology that has evolved beyond keyword searching and latent semantic analysis because of its ability to analyze unstructured data systems and identify multiple perspectives on the target simultaneously.

Read on to learn how linguistic genomics can be used to locate previously unidentified prior art, and see a few of the visualizations and data sets produced using the linguistic genomics technology on M·CAM DOORS Version 7.

What is Linguistic Genomics?

M·CAM DOORS™ runs on linguistic genomics technology, originally developed by Dr. David E. Martin. Linguistic genomics is described in a 2004 technical summary prepared by M·CAM:

Using a series of syntactic and linguistic behavioral modeling cues, our technology learns expression patterns during a complex process of compressing and encrypting data so that noise is eliminated, relevance of expression is determined, and human intent is deciphered. Deployed on web-based or portable systems, our SmartText technology can acquire, process and analyze any text or numeric expression to filter for redundancy, systematic obfuscation, and encrypted meaning in real-time.

Dr. Martin described how the linguistic genomics technology can be used on large, heterogeneous data sets to identify multiple perspectives based on a target document, so that users can identify similar technologies, even if the relevant patent documents don’t use the same language as the original document.  According to Dr. Martin, linguistic genomics technology is an excellent way to “out the obfuscators” who’ve tried to hide their technology from examiners and prior art searchers by “getting cute with a thesaurus.”

Dr. Martin defined latent semantic analysis (LSA) as a mathematical process for identifying core themes of interest within an already-narrowed theme space.  In his view, LSA doesn’t work well on data that is not normally distributed, so it can only be run across a homogeneous subset rather than an entire corpus of patents.  LSA also often tends to “break” if more than just a few thousand patents are included in the data set.  Linguistic genomics, meanwhile, can analyze non-normal data sets of 10-100,000 million documents simultaneously.
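The post does not spell out what LSA actually computes. As a rough illustration of “identifying core themes,” here is a minimal sketch of textbook LSA in Python. It is not M·CAM’s or any other vendor’s implementation. The sketch builds a term-document count matrix A, takes a rank-k truncated SVD, and compares documents by cosine similarity in the resulting k-dimensional “theme” space. The function names and the toy corpus are hypothetical. The SVD is obtained by power iteration on A^T A only to keep the example dependency-free. Practical LSA systems usually weight the counts (for example with tf-idf or log-entropy) and call a sparse SVD solver instead. Nothing here models linguistic genomics, whose workings the post does not describe.

```python
import math
import random


def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw term counts."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    row_of = {word: i for i, word in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for word in doc.lower().split():
            matrix[row_of[word]][j] += 1.0
    return vocab, matrix


def gram(matrix):
    """Document-by-document matrix A^T A for a term-document matrix A."""
    n = len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
            for i in range(n)]


def top_eigenpairs(sym, k, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    n = len(sym)
    m = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                lam = 0.0
                break
            v = [x / norm for x in w]
            lam = norm  # ||M v|| for unit v tends to the dominant eigenvalue
        pairs.append((lam, v))
        # Deflate so the next pass converges to the next-largest eigenpair.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsa_document_vectors(docs, k):
    """Rows of V_k * S_k: each document expressed as k latent 'theme' coordinates."""
    _, a = term_document_matrix(docs)
    pairs = top_eigenpairs(gram(a), k)
    # Singular values of A are the square roots of the eigenvalues of A^T A.
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs]
            for j in range(len(docs))]


def cosine(x, y):
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return sum(p * q for p, q in zip(x, y)) / (nx * ny)


# Hypothetical toy corpus: two mechanical documents, two radio documents.
docs = [
    "gear shaft bearing",
    "gear shaft housing",
    "antenna signal receiver",
    "signal receiver amplifier",
]
vecs = lsa_document_vectors(docs, k=2)
print(cosine(vecs[0], vecs[1]))  # near 1: same latent theme
print(cosine(vecs[0], vecs[2]))  # near 0: different latent theme
```

With k=2, the two mechanical documents land on one theme and the two radio documents on another, so the first pair scores close to 1 and the cross pair close to 0.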

Patent Analysis Tools on M·CAM DOORS™

M·CAM DOORS™ Version 7 includes a number of features applicable to prior art searching and patent analysis, and I was able to view a few of these search and analysis options during a demonstration of the product. Through the platform, users can simply search for bibliographic and citation data on a patent document by entering a patent number in a search form, and the resulting record will include:

  • Bibliographic data on the document
  • Document full text (claims, description)
  • A link to a PDF of the document
  • A list of all “Cited Prior Art”
  • A list of “Citing Subsequent Art”
  • A list of “Front Page Non-Patent References”
  • A list of all patent family members
  • A link to view related documents
  • Life bars beside each patent document number (including the original document, cited and citing patent documents, and patent family members) that display the amount of time left before the patent expires.
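The post does not say how DOORS computes the time remaining shown in a life bar. A minimal sketch follows. It assumes the basic US rule for utility patents filed on or after June 8, 1995, under which the nominal term ends 20 years from the earliest non-provisional filing date. It ignores patent term adjustment or extension, terminal disclaimers, and lapses for unpaid maintenance fees. `nominal_expiry` and `life_bar` are hypothetical names, and the text bar merely stands in for the graphical one.

```python
from datetime import date


def nominal_expiry(filed: date, term_years: int = 20) -> date:
    """Nominal expiry: `term_years` after the filing date.

    Assumption: no term adjustment/extension, no terminal disclaimer, and all
    maintenance fees paid, so real expiry dates can differ.
    """
    try:
        return filed.replace(year=filed.year + term_years)
    except ValueError:
        # Filed on Feb 29 and the expiry year is not a leap year; this sketch
        # assumes Feb 28 in that case.
        return date(filed.year + term_years, 2, 28)


def life_bar(filed: date, as_of: date, width: int = 20) -> str:
    """Text stand-in for a life bar: '#' cells show the share of the term left."""
    expiry = nominal_expiry(filed)
    total = (expiry - filed).days
    left = min(max((expiry - as_of).days, 0), total)  # clamp to [0, total]
    filled = round(width * left / total)
    return "[" + "#" * filled + "-" * (width - filled) + f"] {left} days left"


print(life_bar(date(2000, 1, 1), as_of=date(2010, 1, 1)))
# [##########----------] 3652 days left
```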

Users can also view a graphic representation of all cited and citing patents related to a patent document through the Magellan Telescope. The Magellan Telescope displays the central target patent document, with bubbles branching off to both the left and the right of the target patent, representing groups of cited or citing patent documents related to a specific assignee.

Magellan Telescope visualization.

The Magellan Satellite displays a color-coded chart centered on the target patent document, illustrating cited prior art, citing subsequent art, precedent innovation (relevant prior art not cited by the target document), subsequent innovation (relevant later documents that failed to cite the target document), and concurrent innovation (documents that were in prosecution during the prosecution period of the target document).

Magellan Satellite visualization.
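The five Magellan Satellite categories amount to a set of citation and date rules applied to documents that have already been judged relevant. The sketch below encodes one possible reading of those definitions, not M·CAM’s implementation. It assumes relevance is decided elsewhere (in DOORS, by linguistic genomics, which is not modeled here). It also treats a document’s prosecution period as the span from filing to grant, ignores pending applications and publication dates, and checks citations before dates. `PatentDoc`, `classify`, and the sample documents are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PatentDoc:
    """Hypothetical minimal record; the post does not document DOORS data fields."""
    number: str
    filed: date
    granted: date
    cites: set = field(default_factory=set)  # numbers of documents this one cites


def classify(target: PatentDoc, other: PatentDoc) -> str:
    """Assign an already-relevant document to one of five categories (assumed rule order)."""
    if other.number in target.cites:
        return "cited prior art"
    if target.number in other.cites:
        return "citing art"
    # Prosecution periods [filed, granted] overlap.
    if other.filed <= target.granted and target.filed <= other.granted:
        return "concurrent innovation"
    # No overlap, so `other` lies entirely before or entirely after the target.
    if other.granted < target.filed:
        return "precedent innovation"
    return "subsequent innovation"


target = PatentDoc("T", date(2005, 1, 1), date(2008, 1, 1), cites={"B"})
others = [
    PatentDoc("B", date(2000, 1, 1), date(2003, 1, 1)),
    PatentDoc("C", date(1999, 1, 1), date(2002, 6, 1)),
    PatentDoc("D", date(2006, 1, 1), date(2009, 1, 1)),
    PatentDoc("E", date(2010, 1, 1), date(2012, 1, 1)),
    PatentDoc("F", date(2010, 1, 1), date(2012, 1, 1), cites={"T"}),
]
for doc in others:
    print(doc.number, classify(target, doc))
```

Checking citations first means a citing document is never reported as “subsequent innovation,” which matches the definition of that category as relevant art that failed to cite the target.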

Each color-coded section of the chart can be selected to view the full result set for that section. The documents in the result set list are color-coded by relevance.

Color-coded result set from Magellan Satellite.


Many of the search and analysis features on M·CAM DOORS, such as the citation visualization in the Magellan Telescope and the document record view with its lists of citations and patent family members, are fairly standard features that may be found on other subscription-based patent search systems, like PatBase or TotalPatent.  The most distinctive tool on M·CAM DOORS appears to be the Magellan Satellite, which uses the linguistic genomics technology to identify the precedent innovation, subsequent innovation, and concurrent innovation result sets. Prior art searchers may find Magellan Satellite useful for locating prior art that was initially missed because of obfuscated language within the document.  Patent analysts may find this tool useful for identifying potential licensing opportunities through the “subsequent innovation” result set.

Have you utilized M·CAM DOORS as a patent analysis or prior art search tool?  What are your views on linguistic genomics technology?  Let us know in the comments.

Patent Analysis from Landon IP

This post was contributed by Joelle Mornini. The Intellogist blog is provided for free by Intellogist’s parent company Landon IP, a major provider of patent searches, trademark searches, technical translations, and information retrieval services.


3 Responses

  1. In one of my former lives I had the opportunity to experience MCAM Doors in its very early stages, and had a head-to-head with it as well to help determine whether it was useful for our company. It didn’t fail as such, but it can’t compete with a necktop computer, although it was faster of course. You can’t beat the human brain for semantic analysis and comprehension.

  2. Dr. Martin’s knowledge of Latent Semantic Analysis is dated and inaccurate. LSA, in its original form (18 years ago), certainly had limitations, but there are some very powerful recent implementations that have significantly advanced the art even beyond what Dr. Martin claims as his current capabilities. He says, “LSA also often tends to ‘break’ if more than just a few thousand patents are included in the data set”. The machine intelligence technology used in Lexis’ Total Patent product is built from the entire corpus of USPTO patents (just a bit more than “a few thousand patents”) and solves the obfuscation problem by exposing the machine intelligence to the user. Linguistic genomics sounds good, but after watching 3 videos of Dr. Martin discussing it, I still never found a good description of what it actually does to create a result. Lots of talk about words as “metaphors” or “identifying multiple perspectives based on a target document” but nothing that makes sense to an engineer. A whitepaper, without the metaphors, would be great so that we tech-types could make comparisons against other current solutions.

  3. Very informative article. I’ve never been on this site before, but this has been one of the most helpful blogs for me. Thank you for sharing this post on Linguistic Genomics on M·CAM DOORS™.

