Professional patent analysts are always looking for new tools with unique patent data analysis and visualization features. I recently viewed a demonstration of the “intellectual property risk management tool” M·CAM DOORS™, which uses a distinctive technology known as linguistic genomics to identify related prior art and displays the results through interactive visualizations. I also had the opportunity to speak with the creator of the linguistic genomics technology, Dr. David E. Martin, CEO of M·CAM, Inc. He explained that linguistic genomics is an analytical technology that has evolved beyond keyword searching and latent semantic analysis because of its ability to analyze unstructured data systems and identify multiple perspectives on the target simultaneously.
Read on to learn how linguistic genomics can be used to locate previously unidentified prior art, and see a few of the visualizations and data sets produced using the linguistic genomics technology on M·CAM DOORS Version 7.
What is Linguistic Genomics?
M·CAM DOORS™ runs on linguistic genomics technology, originally developed by Dr. David E. Martin. Linguistic genomics is described in a 2004 technical summary prepared by M·CAM:
Using a series of syntactic and linguistic behavioral modeling cues, our technology learns expression patterns during a complex process of compressing and encrypting data so that noise is eliminated, relevance of expression is determined, and human intent is deciphered. Deployed on web-based or portable systems, our SmartText technology can acquire, process and analyze any text or numeric expression to filter for redundancy, systematic obfuscation, and encrypted meaning in real-time.
Dr. Martin described how the linguistic genomics technology can be used on large, heterogeneous data sets to identify multiple perspectives based on a target document, so that users can identify similar technologies, even if the relevant patent documents don’t use the same language as the original document. According to Dr. Martin, linguistic genomics technology is an excellent way to “out the obfuscators” who’ve tried to hide their technology from examiners and prior art searchers by “getting cute with a thesaurus.”
Dr. Martin defined latent semantic analysis (LSA) as a mathematical process for identifying core themes of interest in an already-narrowed theme space. LSA doesn’t work well on data that is not normally distributed, so you can only run it across a homogeneous subset rather than an entire corpus of patents. LSA also tends to “break” if more than a few thousand patents are included in the data set. Linguistic genomics, meanwhile, can analyze non-normal data sets of 10-100,000 million documents simultaneously.
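For readers unfamiliar with LSA, here is a minimal sketch of the textbook technique: build a term-by-document count matrix, take a truncated singular value decomposition, and compare documents in the reduced "theme" space. This illustrates generic LSA only; it is not M·CAM's implementation and says nothing about how linguistic genomics works. The function name `lsa_similarity`, the toy documents, and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np


def lsa_similarity(docs, k=2):
    """Return cosine similarities between documents in a k-dimensional LSA space."""
    # Tokenize naively on whitespace (an assumption; real systems do far more).
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {t: i for i, t in enumerate(vocab)}

    # Term-by-document matrix of raw counts.
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokenized):
        for t in doc:
            counts[index[t], j] += 1.0

    # Truncated SVD: keep only the k largest singular values ("themes").
    _, s, vt = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    doc_vecs = (s[:k, None] * vt[:k]).T  # one row per document

    # Cosine similarity between the reduced document vectors.
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = doc_vecs / norms
    return unit @ unit.T


# Hypothetical toy corpus: two mechanical documents, two biological ones.
docs = [
    "engine piston cylinder",
    "piston cylinder valve",
    "antibody protein cell",
    "protein cell enzyme",
]
sim = lsa_similarity(docs, k=2)
print(np.round(sim, 2))
```

With two themes retained, documents that share vocabulary should land close together in the reduced space while documents from the other topic should not. The decomposition works on the whole matrix at once, which is one reason the approach becomes expensive as the corpus grows.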
Patent Analysis Tools on M·CAM DOORS™
M·CAM DOORS™ Version 7 includes a number of features applicable to prior art searching and patent analysis, and I was able to view a few of these search and analysis options during a demonstration of the product. Through the platform, users can simply search for bibliographic and citation data on a patent document by entering a patent number in a search form, and the resulting record will include:
- Bibliographic data on the document
- Document full text (claims, description)
- A link to a PDF of the document
- A list of all “Cited Prior Art”
- A list of “Citing Subsequent Art”
- A list of “Front Page Non-Patent References”
- A list of all Patent Family Members
- A link to view related documents
- Life bars beside each patent document number (including the original document, cited and citing patent documents, and patent family members) that display the amount of time left before each patent expires.
Users can also view a graphic representation of all cited and citing patents related to a patent document through the Magellan Telescope. The Magellan Telescope displays the central target patent document, with bubbles branching off to both the left and the right of the target patent, representing groups of cited or citing patent documents related to a specific assignee.
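The grouping behind a view like this can be sketched in a few lines: cited and citing documents are each bucketed by assignee, with the target document in the middle. This is only an illustration of the idea, not how M·CAM DOORS builds the Magellan Telescope; the function names, patent numbers, and assignee names below are all hypothetical.

```python
from collections import defaultdict


def group_by_assignee(records):
    """Group (patent_number, assignee) pairs into {assignee: [patent_numbers]}."""
    groups = defaultdict(list)
    for number, assignee in records:
        groups[assignee].append(number)
    return dict(groups)


def telescope_view(target, cited, citing):
    """Arrange a target document between its cited and citing assignee groups."""
    return {
        "target": target,
        "cited": group_by_assignee(cited),
        "citing": group_by_assignee(citing),
    }


# Hypothetical data: identifiers and assignees are placeholders.
view = telescope_view(
    "TARGET-001",
    cited=[("P-100", "Acme Corp"), ("P-101", "Acme Corp"), ("P-102", "Globex")],
    citing=[("P-200", "Initech")],
)
print(view)
```

Each key under "cited" or "citing" corresponds to one bubble in such a diagram, and the length of its list would determine how many documents that bubble represents.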
The Magellan Satellite displays a color-coded chart centered on the target patent document, illustrating cited prior art, citing subsequent art, precedent innovation (relevant prior art not cited by the target document), subsequent innovation (relevant later art that failed to cite the target document), and concurrent innovation (art that was in prosecution during the prosecution period of the target document).
All color-coded sections of the chart can be selected to view a full result set related to the section. The documents in the result set list are color-coded by relevance.
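To make the five categories concrete, here is a rough sketch of how a related document could be sorted into one of them once its relevance to the target has already been established. The sketch assumes that the prosecution period runs from filing date to grant date, that citation relationships take priority over the date-based categories, and that relevance has been decided elsewhere. The `Doc` class, the `satellite_category` function, and all identifiers and dates are hypothetical; none of this is M·CAM's actual logic.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Doc:
    number: str    # hypothetical identifier
    filed: date    # assumed start of the prosecution period
    granted: date  # assumed end of the prosecution period


def satellite_category(target, doc, cited, citing):
    """Classify a relevant document relative to the target (assumed rules)."""
    if doc.number in cited:
        return "cited prior art"
    if doc.number in citing:
        return "citing art"
    if doc.granted < target.filed:
        # Prosecution finished before the target was filed, and not cited.
        return "precedent innovation"
    if doc.filed > target.granted:
        # Filed after the target was granted, and did not cite it.
        return "subsequent innovation"
    # Otherwise the two prosecution periods overlap.
    return "concurrent innovation"


target = Doc("T-1", date(2005, 1, 1), date(2008, 1, 1))
related = [
    Doc("A-1", date(2000, 1, 1), date(2003, 1, 1)),
    Doc("B-1", date(1999, 1, 1), date(2002, 1, 1)),
    Doc("C-1", date(2006, 1, 1), date(2009, 1, 1)),
    Doc("D-1", date(2009, 1, 1), date(2011, 1, 1)),
    Doc("E-1", date(2010, 1, 1), date(2012, 1, 1)),
]
for d in related:
    print(d.number, satellite_category(target, d, cited={"A-1"}, citing={"E-1"}))
```

The hard part, deciding which uncited documents are relevant in the first place, is exactly what the linguistic genomics technology is said to provide, and it is deliberately left out of this sketch.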
Many of the search and analysis features on M·CAM DOORS, such as the citation visualization in the Magellan Telescope and the document record view that includes lists of citations and patent family members, are relatively standard features that may be found on other subscription-based patent search systems, such as PatBase or TotalPatent. The most distinctive tool on M·CAM DOORS appears to be the Magellan Satellite feature, which uses the linguistic genomics technology to identify the precedent innovation, subsequent innovation, and concurrent innovation result sets. Prior art searchers may find Magellan Satellite useful for locating prior art that was previously missed because of obfuscated language within the document. Patent analysts may find this tool useful for identifying potential licensing opportunities through the “subsequent innovation” result set.
Have you utilized M·CAM DOORS as a patent analysis or prior art search tool? What are your views on linguistic genomics technology? Let us know in the comments.
This post was contributed by Joelle Mornini. The Intellogist blog is provided for free by Intellogist’s parent company Landon IP, a major provider of patent searches, trademark searches, technical translations, and information retrieval services.