Chemical Structure Searching Now Available on Wikipedia

This is a guest post contributed by Shankar Manyem, a Patent Analyst in Landon IP’s Patent Search Group’s Chemistry Team.

Wikipedia is a very useful resource for information on chemicals, hosting entries for more than 15,000 of them. Until now, however, Wikipedia could be searched only with text terms such as chemical names (and fragment names), trade names, CAS Registry Numbers, and, to a limited extent, SMILES strings. A structure search engine, the Wikipedia Chemical Structure Explorer, now allows Wikipedia's chemical entries to be searched by structure. The Explorer was developed jointly by researchers from Novartis, the École polytechnique fédérale de Lausanne (EPFL), and Actelion Pharmaceuticals.

The website supports exact, similarity, and substructure searching. On opening the website, the user is presented with four modules:

  1. A drawing module, in which the query structure is drawn with the JSME Molecular Editor.
  2. A basic information module, with options to view the selected result in Wikipedia and to search for similar molecules.
  3. A list module, which lists the results of the structure search.
  4. A synopsis module, which shows the Wikipedia entry for the top-hit structure.
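To make the three search modes more concrete, here is a minimal Python sketch of how exact, similarity, and substructure searches are commonly implemented. It is not the Explorer's actual code, whose internals are not described in this post, and everything in it is an assumption for illustration: the `Entry` record, the canonical identifier strings, and the toy fingerprint bits (sets of integers standing in for structural features). Exact search compares canonical identifiers, similarity search ranks entries by the Tanimoto coefficient of their fingerprints, and the substructure function performs only the fingerprint screening step (every bit set by the query must also be set in the candidate); a real engine would follow that screen with an atom-by-atom match.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    title: str            # Wikipedia article title
    canonical: str        # hypothetical canonical structure identifier
    bits: frozenset[int]  # hypothetical fingerprint: indices of "on" bits


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Shared bits divided by bits set in either fingerprint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def exact_search(query: str, entries: list[Entry]) -> list[str]:
    # Exact match: identical canonical identifiers.
    return [e.title for e in entries if e.canonical == query]


def similarity_search(query: frozenset[int], entries: list[Entry],
                      threshold: float = 0.7) -> list[tuple[float, str]]:
    # Keep entries at or above the threshold, best score first.
    scored = [(tanimoto(query, e.bits), e.title) for e in entries]
    hits = [(s, t) for s, t in scored if s >= threshold]
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))


def substructure_screen(query: frozenset[int], entries: list[Entry]) -> list[str]:
    # Screening step only: the query's bits must be a subset of the entry's.
    return [e.title for e in entries if query <= e.bits]


# Toy library; the SMILES-style identifiers and bit sets are illustrative.
LIBRARY = [
    Entry("Benzene", "c1ccccc1", frozenset({1, 2, 3})),
    Entry("Aniline", "Nc1ccccc1", frozenset({1, 2, 3, 7})),
    Entry("Ethanol", "CCO", frozenset({10, 11})),
]

ring = frozenset({1, 2, 3})  # toy bits for a benzene-ring query
print(exact_search("c1ccccc1", LIBRARY))    # ['Benzene']
print(similarity_search(ring, LIBRARY))     # [(1.0, 'Benzene'), (0.75, 'Aniline')]
print(substructure_screen(ring, LIBRARY))   # ['Benzene', 'Aniline']
```

With this toy data, the benzene-ring query returns Benzene with a Tanimoto score of 1.0 and Aniline with 0.75 (3 shared bits out of 4 set in either), and both pass the substructure screen, while Ethanol matches neither.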

At the time of writing there are about 13,263 structure-searchable entries, and the index is typically updated nightly (e.g., on 4/10/15 there were only 13,250 structure-searchable entries).

Wikipedia Chemical Structure Search User Interface.


Wikipedia Chemical Structure Explorer Advantages:

  • Results are shown on the fly as the structure is drawn, so the query can be modified according to the number of hits.
  • Structure searching can be combined with a keyword search, although the keyword must appear in the title of the Wikipedia article. For example, all amines containing a benzene ring can be identified by drawing a benzene ring and entering “amin” as a text filter.
  • The results are limited to common chemicals, e.g., active pharmaceutical ingredients and pesticides, for which the Wikipedia page would typically provide a reference.
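The combined structure and keyword search described in these advantages amounts to intersecting structure hits with a title match. The Python sketch below rests on hypothetical assumptions, not on the Explorer's documented behavior: each entry is a (title, fingerprint bits) pair with toy bit values, the structure step is a fingerprint subset screen, and title matching is a case-insensitive substring test.

```python
from __future__ import annotations

# Hypothetical entries: (Wikipedia article title, toy fingerprint bits).
# Bits 1-3 stand in for a benzene ring and bit 7 for an amino group.
ENTRIES: list[tuple[str, frozenset[int]]] = [
    ("Benzene", frozenset({1, 2, 3})),
    ("Aniline", frozenset({1, 2, 3, 7})),
    ("Amphetamine", frozenset({1, 2, 3, 7, 12})),
    ("Methylamine", frozenset({7, 20})),
]


def search(query: frozenset[int],
           entries: list[tuple[str, frozenset[int]]],
           title_filter: str = "") -> list[str]:
    """Titles that pass the substructure screen and contain the filter text."""
    needle = title_filter.lower()  # assumed case-insensitive matching
    return [title for title, bits in entries
            if query <= bits and needle in title.lower()]


ring = frozenset({1, 2, 3})
print(search(ring, ENTRIES))          # ['Benzene', 'Aniline', 'Amphetamine']
print(search(ring, ENTRIES, "amin"))  # ['Amphetamine']
```

Aniline is an amine with a benzene ring, yet the “amin” filter drops it because its title does not contain that string; this mirrors the title-only restriction on keywords. Methylamine passes the title filter but fails the structure screen because it has no benzene ring.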

Wikipedia Chemical Structure Explorer Limitations:

  • The data set is, of course, limited to chemicals in Wikipedia, i.e., popular chemicals. Even then, not every chemical in Wikipedia is structure-searchable, since many entries contain only text and lack structure data in a form the search script can recognize.
  • Keywords other than those in the article title are not searchable and cannot be used as text filters; e.g., property data such as molecular weight or formula cannot be used to limit a search.
  • The results are limited to, and link to, the English-language pages of Wikipedia; chemicals in the French or German versions may be included in future versions.

A further description of the Wikipedia Chemical Structure Explorer can be found in an article published in the Journal of Cheminformatics.

Patent Searches from Landon IP

This post was contributed by Abhishek Tiwari. The Intellogist blog and Intellogist are provided for free by Landon IP, which is a CPA Global company. Landon IP is a major provider of professional services meeting the needs of the IP community, including patent searches; analytics and technology consulting; patent, legal, and technical translations; and information research and retrieval.
