There’s an interesting piece in the most recent issue of Nature. Corie Lok discusses how researchers are dealing with the information firehose that is the scientific literature. How big is that firehose?
The 19 million citations and abstracts covered by the US National Library of Medicine’s PubMed search engine include nearly 830,000 articles published in 2009, up from some 814,000 in 2008 and around 772,000 in 2007. That growth rate shows no signs of abating, especially as emerging countries such as China and Brazil continue to ratchet up their research.
With that much data, how is an established researcher supposed to keep up with relevant work in their field? Never mind how a new investigator can get a handle on establishing a research focus, or how a midcareer scientist can switch tracks. That’s where literature mining comes in. Several start-up services on the web help scientists find relevant research, make connections, and generate hypotheses. Lok’s article discusses a few of them, but there are many others.
PubMed – my default search engine for research papers and the first stop for many scientists. So let’s try a little experiment. Searching “alcoholism” returns “Results: 1 to 20 of 66827”. PubMed does have tools to help you narrow your search. Say I’m interested in the genetics of alcoholism. Searching “alcoholism genetics” returns “Results: 1 to 20 of 5940”, which is still a lot of research to go through. Restricting that search to English-language publications about humans from the last five years gives “Results: 1 to 20 of 1283”. That’s still a lot of literature, even if you only read the abstracts.
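If you run this kind of search often, it can be scripted through NCBI’s E-utilities instead of clicking through the filters each time. The sketch below assumes the `esearch` endpoint, and that the web filters used above correspond to the `english[la]` (language) and `humans[mh]` (MeSH “Humans”) field tags plus a relative publication-date window; the helper names are hypothetical, and “last five years” is approximated as 5 × 365 days. Counts returned today will differ from the ones quoted above, since PubMed keeps growing.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities esearch endpoint
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_params(query: str, years: Optional[int] = None,
                         english_only: bool = False,
                         humans_only: bool = False) -> dict:
    """Hypothetical helper: esearch parameters for a PubMed query plus filters."""
    terms = [f"({query})"]
    if english_only:
        terms.append("english[la]")  # assumed mapping of the "English" filter
    if humans_only:
        terms.append("humans[mh]")   # assumed mapping of the "Humans" filter
    params = {
        "db": "pubmed",
        "term": " AND ".join(terms),
        "retmode": "json",
        "retmax": 20,  # first 20 PMIDs, like the first page of web results
    }
    if years is not None:
        params["datetype"] = "pdat"      # filter on publication date
        params["reldate"] = 365 * years  # approximation: last N*365 days
    return params


def build_esearch_url(query: str, **filters) -> str:
    """Hypothetical helper: encode the parameters into a request URL."""
    return f"{ESEARCH_URL}?{urlencode(build_esearch_params(query, **filters))}"


def count_hits(url: str) -> int:
    """Fetch an esearch URL and return the total hit count (needs network access)."""
    with urlopen(url) as resp:
        data = json.load(resp)
    # The JSON response reports the total count as a string under "esearchresult"
    return int(data["esearchresult"]["count"])
```

For example, `count_hits(build_esearch_url("alcoholism genetics", years=5, english_only=True, humans_only=True))` would repeat the narrowed search. NCBI limits how fast unauthenticated clients may send requests, so a loop over many queries should pause between calls. Scripting the query makes it repeatable, but it doesn’t shrink the pile of abstracts you still have to read.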
That’s where literature mining can really make a difference.


