Mining Scientific Literature

There’s an interesting piece in the most recent issue of Nature.  Corie Lok discusses how researchers are dealing with the information firehose that is scientific literature.  How big is that firehose?

The 19 million citations and abstracts covered by the US National Library of Medicine’s PubMed search engine include nearly 830,000 articles published in 2009, up from some 814,000 in 2008 and around 772,000 in 2007. That growth rate shows no signs of abating, especially as emerging countries such as China and Brazil continue to ratchet up their research.

With that much data, how is an established researcher going to keep up with relevant work in their field? Never mind how a new investigator can get a handle on establishing a research focus, or how a midcareer scientist can switch tracks.  That’s where literature mining comes in.  There are several start-up services on the web to help scientists find relevant research, make connections, and generate hypotheses. A few of these are discussed in Lok’s article, but there are many others.

PubMed – My default search engine for research papers, this is the first stop for many scientists.  So let’s try a little experiment.  Searching “alcoholism” generates the following – Results: 1 to 20 of 66827. PubMed does have tools to help you narrow your search.  Say I’m interested in the genetics of alcoholism. Searching “alcoholism genetics” returns – Results: 1 to 20 of 5940. A lot of research to go through. Restricting that search to only publications in the last 5 years, in English, and about humans gives Results: 1 to 20 of 1283. Still a lot of literature, even if you’re only reading the abstracts.
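If you would rather script that narrowing than click through the filters, here is a minimal sketch that builds a query URL for NCBI's E-utilities ESearch endpoint. This is not how I ran the searches above; the function name `pubmed_search_url` and its arguments are made up for illustration, and I am assuming the field tags `english[lang]` and `humans[mesh]` plus the `datetype`/`reldate` parameters behave like the website's language, species, and date filters. The code only builds the URL and does not send a request, and any counts you get back today will differ from the 2010 numbers quoted here.

```python
from urllib.parse import urlencode

# Base URL of NCBI's ESearch utility.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(keywords, english_only=False, humans_only=False, last_years=None):
    """Build (but do not send) an ESearch URL for a PubMed keyword search.

    Hypothetical helper: the filters are assumed to approximate the
    language, species, and publication-date limits on the PubMed site.
    """
    term = keywords
    if english_only:
        term += " AND english[lang]"   # assumed language field tag
    if humans_only:
        term += " AND humans[mesh]"    # assumed MeSH field tag
    # rettype=count asks ESearch for the number of hits only.
    params = {"db": "pubmed", "term": term, "rettype": "count"}
    if last_years is not None:
        # Restrict by publication date, counted back in days from today.
        params["datetype"] = "pdat"
        params["reldate"] = str(365 * last_years)
    return ESEARCH + "?" + urlencode(params)


# The three steps of the experiment: broad, narrower, narrowest.
broad = pubmed_search_url("alcoholism")
narrower = pubmed_search_url("alcoholism genetics")
narrowest = pubmed_search_url(
    "alcoholism genetics", english_only=True, humans_only=True, last_years=5
)
```

Pasting any of the three URLs into a browser should return a small XML document with a hit count, which makes it easy to see how much each restriction trims the list.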

That’s where literature mining can really make a difference.

Scientific Literature Mining Services

  • pubget – One of the most time-consuming tasks in research is going through the literature and trying to stay current. Add to that most institutions’ clunky access to online resources, and the process can be painfully slow.  I can search for articles from my library homepage, but I have to go through several extra steps to actually get my hands on the pdf. The “find full text” function in the newest version of EndNote has been a tremendous help in accessing content, but for researchers at institutions that don’t provide that service (and even those who have it but want an additional resource), pubget is a handy tool.  When you create an account, pubget signs in to your institution and allows you to search the subscribed resources. When you find a reference you want, just click the pdf icon and there it is. No clicking through to content provider websites. You can tag references as “keepers” to come back to them later, or search for the newest articles from a particular journal. Unfortunately, it doesn’t allow you to annotate the pdfs. Not yet anyway.
  • GoPubMed – A search engine where “your keywords are submitted to PubMed and the resulting abstracts are classified using Gene Ontology and Medical Subject Headings (MeSH).” So using our example search “alcoholism genetics” pulls 5,940 abstracts from PubMed. What’s different about GoPubMed’s results is the navigation panel.

GoPubMed top terms

You can click the check boxes next to the relevant terms. Restricting the search to “genetic predisposition to disease” gives 720 references. Under the Knowledge Base > Named Groups category, you could restrict it further to only studies using adult subjects, or to exclude studies in children. GoPubMed provides additional information on your search topic as well, including top authors (who’s doing the research) and top journals (who’s publishing the research). There’s even a network of top authors, so you can see how they collaborate on your search topic.

Author network for genetic predisposition to alcoholism.

Philanthropologist might recognize a name toward the top of that network. A useful tool for finding potential collaborators.

  • NextBio – A freemium service (basic is free, pro is subscription) that allows researchers to set up a profile, but also has a database for lit mining.  Searching for “alcoholism” in NextBio brings up several relevant sources, including associated genes, literature, researchers, news, even clinical trials.

    NextBio genes associated with alcoholism

  • EbiMed – analyzes PubMed results “to offer a complete overview on associations between UniProt protein/gene names, GO annotations, Drugs and Species.” Searching “alcoholism” here turns up a table of links to various sources, including proteins, genes, and biological processes related to the search term.
  • PubGene – specifically for finding genes, PubGene draws a network of genes associated with a particular keyword.

    PubGene network for alcoholism

This network is searchable. Clicking on a gene allows you to browse the literature associated with that gene, in addition to highlighting associations with other genes.  As a geneticist, I find this tool has considerable utility.

  • PubAnatomy – “integrates [the] Allen Brain Atlas gene expression data, relationships between brain regions and diseases for more efficient exploration of Medline database and gene expression data.” A keyword search for “alcoholism” lights up regions on the brain map associated with the disease.

PubAnatomy brain map

And clicking one of those highlighted regions displays the relevant references for that brain structure, as well as genes that are active in that region.

  • Neuroscience Information Framework – “An initiative of the NIH Blueprint for Neuroscience Research, the Neuroscience Information Framework advances neuroscience research by enabling discovery and access to public research data and tools worldwide through an open source, networked environment.” This tool can search the full text of articles (at least those that are open access), rather than just the abstract. But it does much more.

Second from the bottom, NIF displays grants related to your search. You can read the abstract, see who is doing research on your topic, and see which granting agencies are funding that research. Useful info when preparing your own proposals.

All of these tools help scientists sip from that firehose, giving different ways to access and interact with the data. What an exciting time to be starting a career in science.

Lok, C. (2010). Literature mining: Speed reading. Nature, 463 (7280), 416-418. DOI: 10.1038/463416a
