GQ-Pat Crosses 300 Million Sequences Milestone

GenomeQuest's database GQ-Pat has crossed the 300-million-sequence milestone, the company announced in a blog post, making it the largest private or public biological sequence database on Earth.

The database now houses 256 million nucleotide sequences and more than 45 million protein sequences. Unlike the entries in TrEMBL, these sequences are not simply automated translations of nucleotide sequences; they are drawn from patents and patent applications published by patent authorities across the globe.

As of 2015, the GenBank/EMBL/DDBJ consortium held 185 million nucleotide sequences, GenomeQuest pointed out.

Patent Searches from Landon IP

This post was contributed by Abhishek Tiwari. The Intellogist blog and Intellogist are provided for free by Landon IP, which is a CPA Global company. Landon IP is a major provider of professional services meeting the needs of the IP community, including patent searches; analytics and technology consulting; patent, legal, and technical translations; and information research and retrieval.

Important Changes to Patent Sequence Search Tools

Major updates have been made to two of the largest search platforms for locating peptide and nucleic acid sequences within patent records, and Intellogist is here to bring you all the details on these recent changes to the coverage, search, and display features on GenomeQuest and the USGENE BLAST Search Portal. GenomeQuest is a biological sequence data management platform created by GenomeQuest, Inc., which maintains its own patent sequence database, GQ-PAT. Many non-patent data files are also simultaneously searchable on the platform, such as GenBank, RefSeq, Swiss-Prot, and other NCBI and EBI files. We last highlighted the Chinese patent office sequence data added to GenomeQuest in 2010, and today we’ll look at additional Indian, Brazilian, and Chinese patent sequence data now available in an Emerging Countries Domestic Patents Archive. Additional GenomeQuest updates include normalized patent assignee names and subject database filtering.

The USGENE® BLAST® Search Portal is a subscription-based online search platform created by the SequenceBase Corporation that allows users to search USGENE®, a database of peptide and nucleotide sequences from US published applications and issued patents. A number of updates were added to the portal in July 2012, and additional updates were implemented in December 2012, including options to expand and download all alignments for search results.

Continue reading to learn about all the new coverage, search, display, and download options recently added on two major sequence search platforms, GenomeQuest and the USGENE BLAST Search Portal!

Search Pharmaceutical Information for Free with DrugBank

How many ways can you search or browse a database of drug information? The DrugBank website seems to offer almost every possible option for searching and browsing its database of:

6789 drug entries including 1437 FDA-approved small molecule drugs, 134 FDA-approved biotech (protein/peptide) drugs, 83 nutraceuticals and 5167 experimental drugs. Additionally, 4274 non-redundant protein (i.e. drug target/enzyme/transporter/carrier) sequences are linked to these drug entries.

Each drug profile (“DrugCard”) on DrugBank has “more than 150 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data.”  DrugBank is supported by Dr. David Wishart, the Departments of Computing Science & Biological Sciences at the University of Alberta, Genome Alberta, Genome Canada, and GenomeQuest, Inc.

DrugBank contains over 6,700 detailed drug profiles, which users can search or browse in 11 different ways.  It must cost an arm and a leg to access this database, right?  Actually, it’s free.  Continue reading to learn all the options for searching and browsing this treasure trove of pharmaceutical information!

Patent Information Updates to Intellogist

There have been a lot of changes to Intellogist lately (with more to come), reflecting the ever-changing nature of patent information and search systems. After the jump, you’ll find samples of these updates, including info on: FreePatentsOnline, IEEE Xplore, Rospatent, GenomeQuest, USGENE, and a fun crossword!

GenomeQuest adds Chinese patent office sequence data

Recently the Intellogist blog discussed GenomeQuest as a source for searching patent sequence data. This month GenomeQuest announced that it is adding sequences filed at the Chinese Patent Office (SIPO) to its collection. The GenomeQuest press release explains that 40,000 sequences from over 5,000 Chinese patents have already been indexed into its GQ-IP product, which contains sequence information from patent collections and public sources such as GenBank, EMBL, and DDBJ.

Warning: your electronic patent search databases have gaps!

UPDATE: For a further enlightening discussion of the gaps in the USPTO full text database, please see the comment section of this post (click the word “Comments” where it appears at the very end of this post).

Recently, a message came over Carl Oppedahl’s PAIR discussion list highlighting a mysterious gap in the USPTO’s online patent database: data seemed to be missing for patent numbers between 6,363,527 and 6,412,112.

Rick Neifeld, of Neifeld IP Law, responded that his 1999 survey of the USPTO’s data revealed many errors, as many of us have probably suspected for some time. Rick’s description of these errors is very interesting:

The dirt consisted of things as minor as numerous misspellings of assignee names, or HTML pages non compliant with HTML standards, to HTML text that could not be deconstructed into component sections due to HTML formatting errors, assignment records that were combined, corrupt, unreadable.

I absolutely expect our current crop of electronic patent databases to contain massive numbers of errors. We have to expect this, if only because of the sheer amount of information involved. Another reason might be that the economic model of patent data production does not really encourage the national patent offices to maintain high-quality electronic patent data. There are millions and millions of patent documents pouring out of government-run institutions (without a profit motive for perfection), and errors are bound to be rampant.

Spotlight on GenomeQuest

Recently I was able to attend a demo of the GenomeQuest sequence searching tool, which is designed to support sequence searching for prior art investigations.   GenomeQuest provides access to proprietary patent database collections which have been indexed especially for sequence searching, as well as to public access databases of genetic and protein sequences.

One of GenomeQuest’s most notable databases, GQ-PAT, contains a proprietary collection of nucleotide and protein sequences extracted from patent collections, including the US, EPO, WO/PCT, and the DNA Databank of Japan (where the JPO deposits patents that contain sequences). Because some WO/PCT documents are only available as images and not as electronic text, GenomeQuest employs an in-house Optical Character Recognition (OCR) process that can involve human editing with the assistance of a related machine-readable document, such as a US family member. The patents in GQ-PAT are also supplemented by corresponding INPADOC records to ensure that their legal status and assignee information stay up to date with this source.
