Searching through genetic data is a highly specialized skill, but prior art searchers looking for systems or tools to support genetic searches have a variety of options, including many free tools. The Intellogist article on “Biotechnology Searching Best Practices” lists subscription-based systems and databases that support sequence searching in patent collections: GenomeQuest, DGENE, USGENE, and PCTGENE. The article also mentions that the free NCBI network hosts a patent file searchable by “BLAST (Basic Local Alignment Search Tool) algorithms for polypeptide and nucleic acid sequence similarity searching.” The NCBI network of resources hosts a variety of free genetic search tools and databases, which we’ll discuss after the jump. Other free online tools also use the BLAST algorithm or link to NCBI PubMed documents.
Read on to learn about a variety of free tools for searching genetic data, including search systems on NCBI (BLAST and OMIM), iHOP, and NextBio!
NCBI – BLAST and OMIM
The National Center for Biotechnology Information (NCBI) hosts a wide range of tools, including many resources related to DNA, genetics, proteins, and sequence analysis. This post will focus on two of the search tools available on NCBI:
- NCBI BLAST – The BLAST homepage on the NCBI network gives a brief description of how the tool functions:
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
Searchers can find a variety of useful help guides on searching with BLAST through the NCBI Bookshelf. A detailed description of how to correctly search on the NCBI platform using BLAST can be found in Chapter 16 of the NCBI Handbook. A more detailed manual, BLAST® Help, is available to view online. The BLAST homepage also links to many help resources (including the previously mentioned guides) through the BLAST help section.
- Online Mendelian Inheritance in Man ® (OMIM) – According to its homepage, OMIM contains full text overviews detailing “all known mendelian disorders and over 12,000 genes.” The database is authored and edited at the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, under the direction of Dr. Ada Hamosh and is updated daily. The collection is also searchable through OMIM.org.
Users can search OMIM through five available search forms (between the NCBI portal and OMIM.org):
- NCBI search form with limiters – Users can enter search criteria into a single search form (which does support field tags and Boolean operators). Users can then select specific limiters to narrow the search, such as “Search in Field(s),” “MIM Number Prefix,” “Chromosome(s),” “Only Records with,” “Creation Date,” and “Last Modification.”
- Simple search form on OMIM.org, which supports Boolean operators, field searching, parentheses, proximity searching, wildcard operators, (+/-) operators, and “term weight boosting.” See the search guide for instructions on query construction and all available field codes in OMIM.
- Advanced search form on OMIM.org, which has the same functionality as the simple search form, but limiters similar to those on the NCBI search form are available.
- Clinical Synopsis advanced search form on OMIM.org, which allows users to search clinical synopses, sort them by relevance or date, and limit records to certain topics. This form has its own search fields.
- Gene Map advanced search on OMIM.org (also linked to on NCBI portal), which searches “the cytogenetic locations of genes and disorders, respectively, that are described in OMIM. Only OMIM entries for which a cytogenetic location has been published in the cited references are represented” in the Gene Map. Within the search form, users can narrow the search by selecting chromosome numbers as limiters. This form has its own search fields.
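Searchers who would rather script a BLAST query than use the web form described earlier may find the following sketch useful. It assumes NCBI's public BLAST URL API at `Blast.cgi` and the parameter names `CMD`, `PROGRAM`, `DATABASE`, `QUERY`, `RID`, and `FORMAT_TYPE` as I understand them from NCBI's developer documentation. The database name `pat` for the patent nucleotide division is also an assumption, so please verify all of these against the current NCBI documentation before relying on them. The helper function names are hypothetical, and the code only builds URLs and parses a response; it does not send any request.

```python
import re
from urllib.parse import urlencode

# Assumed endpoint of the NCBI BLAST URL API (verify against NCBI's documentation).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_submit_url(sequence, program="blastn", database="pat"):
    """Build a URL that would submit a sequence for searching.

    'pat' is assumed to be the name of the patent nucleotide database.
    """
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": program,    # e.g. blastn for nucleotides, blastp for proteins
        "DATABASE": database,
        "QUERY": sequence,
    }
    return BLAST_URL + "?" + urlencode(params)


def parse_request_id(response_text):
    """Extract the request ID (RID) from the text of a submission response.

    Returns None when no 'RID = ...' line is present.
    """
    match = re.search(r"RID\s*=\s*(\S+)", response_text)
    return match.group(1) if match else None


def build_results_url(rid, format_type="Text"):
    """Build a URL that would fetch the results for a previously issued RID."""
    return BLAST_URL + "?" + urlencode(
        {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type}
    )


if __name__ == "__main__":
    # A made-up query sequence and a made-up response fragment, for illustration only.
    print(build_submit_url("ACGTACGTACGT"))
    rid = parse_request_id("    RID = ABC123\n    RTOE = 18\n")
    print(build_results_url(rid))
```

The split into "submit" and "fetch" steps reflects the fact that a BLAST search runs asynchronously: the submission returns a request ID, and results are retrieved later with that ID.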
iHOP
Information Hyperlinked Over Proteins (iHOP), created by Robert Hoffmann, is a free resource which uses the genes and proteins listed in PubMed as hyperlinks between sentences and abstracts. According to the homepage, the iHOP network contains more than 2,700 organisms, 110,000 genes, and 28.4 million sentences and is updated daily.
Through the home page, users can search for a gene synonym or accession number. A drop-down menu beside the search form can limit the search to all fields, synonyms, any accession number, NCBI Gene, UniProt, or Google. Users can also limit the search to genes of specific organisms via a second drop-down menu.
The hit list shows the genes that match the query, with the gene symbol, name, synonym/DB-reference, and organism for each. The iHOP help section lists the following record views for each gene:
- Defining information for this gene – This view contains all sentences found in the literature that mention the main gene (gene X) together with relevant biomedical terms (e.g. lymphoma). Sentences are ranked by significance. Click on the “double paper” icon beside sentences to read the corresponding abstracts or full text papers. Interesting sentences can be collected into a Gene model by clicking on the “plus sign” icon.
- Interaction information for this gene – This view contains all sentences found in the literature for the main gene of a page (gene A) and other genes (gene B). Gene symbols within sentences are hyperlinked. Therefore, clicking on a gene symbol (e.g. gene B) will bring you to the page of gene B. Besides literature information, these pages contain interactions collected from external resources (e.g. large scale experimental data). Users can access this interaction data by clicking on the “beaker” icon beside each sentence.
- Most recent information for this gene
- Minimal information for this gene – General information (symbol, name, organism, etc.), Useful links to external resources (e.g. UniProt, NCBI, OMIM, etc.), links to other iHOP views on this gene, homologues, enhanced PubMed/Google query.
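To make the hyperlinking idea behind these views concrete, here is a toy sketch that turns known gene symbols in a sentence into links to per-gene pages. It is not iHOP's actual method, which must also handle synonyms and ambiguous names. The gene table, the page paths, and the function name are all hypothetical.

```python
import re

# Hypothetical lookup table: gene symbol -> path of that gene's page.
GENE_PAGES = {
    "TP53": "gene/TP53",
    "MDM2": "gene/MDM2",
}


def hyperlink_genes(sentence, gene_pages):
    """Wrap every known gene symbol in the sentence in an HTML link."""
    if not gene_pages:
        return sentence
    # Try longer symbols first so a short symbol never shadows a longer one.
    symbols = sorted(gene_pages, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(s) for s in symbols) + r")\b")

    def to_link(match):
        symbol = match.group(1)
        return '<a href="{}">{}</a>'.format(gene_pages[symbol], symbol)

    return pattern.sub(to_link, sentence)


if __name__ == "__main__":
    print(hyperlink_genes("MDM2 binds TP53.", GENE_PAGES))
```

Clicking a linked symbol in one sentence would lead to the page for that gene, which is the navigation pattern the interaction view describes.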
NextBio
NextBio Public is an online library of genomic data curated from sources such as GEO and dbGaP which allows users to mine billions of precomputed data correlations and thousands of public genomic studies. Users can register for a free account with NextBio (with the use of an email address from a valid government, academic, or non-profit institution) in order to access the following features:
- Access to full view of search results and all search sections (“Apps”).
- Import data into NextBio and correlate it with public data.
- Save and share search results (through My Studies, My Projects, Bookmarks, and Collaborations).
Apps available for registered users on NextBio include:
- QuickView – Under “NextBio Summary,” lists of top 5 relevant results from different search Apps (Disease Atlas, Curated Studies, Literature, Clinical Trials, etc). Under “General Info,” a brief definition of the term.
- Genomic Applications – Search and rank public genomic data through these applications.
- Curated Studies – Query or browse all studies curated by NextBio. You can query by gene, SNP, sequence region, biogroup, bioset, phenotype, compound, tissue, or keyword. Or browse using filters and text-based search.
- Body Atlas – View the tissues, cell types, and cell lines in which a queried gene, biogroup, or bioset is significantly expressed or enriched.
- Disease Atlas – Find diseases, traits, conditions, and surrogate endpoints associated with a queried gene, sequence region, SNP, biogroup, or bioset.
- Pharmaco Atlas – Discover which compounds and treatments affect a queried gene, sequence region, biogroup, or bioset.
- Knockdown Atlas – Perform a knockdown, knockout, or overexpression experiment in reverse: See which genetic perturbations affect a queried gene, sequence region, biogroup, or bioset.
- Genetic Markers – Locate genes and SNPs that are significantly linked to a queried phenotype, tissue, or compound.
- Biogroups – Find biogroups for which your queried bioset, phenotype, tissue, or compound is highly enriched.
- Genome Browser – View experimental datasets on genomes through an interactive display.
- Literature – Do a classic PubMed literature search, or search hundreds of biology and health-related news sources with any query term or keyword.
- Clinical Trials – Find clinical trials with any query term or keyword.
- Meta-Analysis – Discover which genes are significantly regulated in common, across up to 50 biosets.
All apps (except the interactive Genome Browser display) have two search forms: the keyword search form and the “search sequence regions” form. The “search sequence regions” form allows the user to select the organism type and chromosome via drop-down menus and define the start and stop points on the sequence.
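A sequence region query of this kind boils down to four values: organism, chromosome, start, and stop. The sketch below models that as a small data structure with basic validation and an overlap check. The class, its field names, and the coordinates are hypothetical illustrations and are not taken from NextBio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceRegion:
    """A region on one chromosome of one organism (1-based, inclusive)."""
    organism: str
    chromosome: str
    start: int
    stop: int

    def __post_init__(self):
        # Reject regions whose coordinates cannot describe a real interval.
        if self.start < 1 or self.stop < self.start:
            raise ValueError("start must be >= 1 and stop must be >= start")

    def overlaps(self, other):
        """Return True when both regions share at least one position."""
        return (
            self.organism == other.organism
            and self.chromosome == other.chromosome
            and self.start <= other.stop
            and other.start <= self.stop
        )


if __name__ == "__main__":
    # Arbitrary coordinates chosen only for illustration.
    query = SequenceRegion("Homo sapiens", "17", 1000, 2000)
    feature = SequenceRegion("Homo sapiens", "17", 1500, 2500)
    print(query.overlaps(feature))
```

Any record annotated with a region that overlaps the queried one would be a candidate hit, which is why the form asks for both start and stop points rather than a single position.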
Most free genetic search tools available on the internet use open-access data from NCBI network databases and resources, although each search tool provides unique search options to manipulate this open-access data. The iHOP tool allows users to search through PubMed documents for genetic data by turning gene and protein names in the sentences and abstracts of the records into hyperlinks. NextBio provides an entire toolkit of applications which searchers can use for locating genetic data in a variety of document types. Prior art searchers should use subscription databases and systems, like GenomeQuest, to locate genetic data in patent literature. The free genetic search tools on the NCBI network, as well as additional free tools like iHOP and NextBio, can be useful for locating genetic data in non-patent literature.
Do you know of any free genetic search tools? Have you searched through NCBI BLAST, OMIM, iHOP, or NextBio? Let us know in the comments!
This post was contributed by Joelle Mornini. The Intellogist blog is provided for free by Intellogist’s parent company Landon IP, a major provider of patent searches, trademark searches, technical translations, and information retrieval services.