<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Freethinker's Asylum &#187; text mining</title>
	<atom:link href="http://freethinkersasylum.com/tag/text-mining/feed/" rel="self" type="application/rss+xml" />
	<link>http://freethinkersasylum.com</link>
	<description>A Bastion of Sanity in the Land of Oz</description>
	<lastBuildDate>Sat, 14 Aug 2010 23:24:44 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=abc</generator>
		<item>
		<title>Mining Scientific Literature</title>
		<link>http://freethinkersasylum.com/2010/01/mining-scientific-literature/</link>
		<comments>http://freethinkersasylum.com/2010/01/mining-scientific-literature/#comments</comments>
		<pubDate>Sat, 30 Jan 2010 19:10:23 +0000</pubDate>
		<dc:creator>Kris</dc:creator>
				<category><![CDATA[Education]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[online tools]]></category>
		<category><![CDATA[science 2.0]]></category>
		<category><![CDATA[text mining]]></category>

		<guid isPermaLink="false">http://freethinkersasylum.com/?p=598</guid>
		<description><![CDATA[There&#8217;s an interesting piece in the most recent issue of Nature.  Corie Lok discusses how researchers are dealing with the information firehose that is scientific literature.  How big is that firehose? The 19 million citations and abstracts covered by the US National Library of Medicine’s PubMed search engine include nearly 830,000 articles published in 2009, [...]]]></description>
			<content:encoded><![CDATA[
<p><span style="float: left; padding: 5px;"><a href="http://www.researchblogging.org"><img style="border: 0;" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" alt="ResearchBlogging.org" /></a></span>There&#8217;s an interesting piece in the most recent issue of <em>Nature</em>.  Corie Lok discusses how researchers are dealing with the information firehose that is scientific literature.  How big is that firehose?</p>
<blockquote><p>The 19 million citations and abstracts covered by the US National Library of Medicine’s <a href="http://www.ncbi.nlm.nih.gov/pubmed/">PubMed</a> search engine include nearly 830,000 articles published in 2009, up from some 814,000 in 2008 and around 772,000 in 2007. That growth rate shows no signs of abating, especially as emerging countries such as China and Brazil continue to ratchet up their research.</p></blockquote>
<p>With that amount of data overload, how is an established researcher going to keep up with relevant work in their field? Never mind how a new investigator can get a handle on establishing a research focus, or how a midcareer scientist can switch tracks.  That&#8217;s where literature mining comes in.  There are several start-up services on the web to help scientists find relevant research, make connections, and generate hypotheses. A few of these are discussed in Lok&#8217;s article, but there are many others.</p>
<p><a href="http://www.ncbi.nlm.nih.gov/pubmed/">PubMed</a> &#8211; My default search engine for research papers, this is the first stop for many scientists.  So let&#8217;s try a little experiment.  Searching &#8220;alcoholism&#8221; generates the following &#8211; <strong>Results: 1 to 20 of 66827. </strong>PubMed does have tools to help you narrow your search.  Say I&#8217;m interested in the genetics of alcoholism. Searching &#8220;alcoholism genetics&#8221; returns &#8211; <strong>Results: 1 to 20 of 5940. </strong>A lot of research to go through. Restricting that search to only publications in the last 5 years, in English, and about humans gives <strong>Results: 1 to 20 of 1283. </strong>Still a lot of literature, even if you&#8217;re only reading the abstracts.</p>
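<p>The same narrowing can be scripted. What follows is a minimal sketch, assuming NCBI&#8217;s public E-utilities <code>esearch</code> endpoint and the standard PubMed field tags <code>[lang]</code> and <code>[mh]</code>; it is not how the searches above were actually run. The helper names <code>build_pubmed_query</code> and <code>build_esearch_url</code> are hypothetical, and the code only builds the request URL without sending it.</p>

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint for Entrez databases such as PubMed.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(keywords, english_only=False, humans_only=False):
    """Combine free-text keywords with optional PubMed field-tag filters.

    Hypothetical helper: the filters mirror the language and species
    limits described in the post.
    """
    parts = ["(" + keywords + ")"]
    if english_only:
        parts.append("english[lang]")  # language field tag
    if humans_only:
        parts.append("humans[mh]")  # MeSH term field tag
    return " AND ".join(parts)


def build_esearch_url(term, last_n_years=None):
    """Return an esearch URL that asks only for the hit count, as JSON."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": "0"}
    if last_n_years is not None:
        # reldate counts back in days from today; pdat means publication date.
        params["reldate"] = str(365 * last_n_years)
        params["datetype"] = "pdat"
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    term = build_pubmed_query(
        "alcoholism genetics", english_only=True, humans_only=True
    )
    print(build_esearch_url(term, last_n_years=5))
```

<p>Fetching the printed URL should return a JSON document with the hit count under <code>esearchresult.count</code>. The number will not match the figures quoted above, since PubMed has kept growing.</p>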
<p>That&#8217;s where literature mining can really make a difference.</p>
<p><span id="more-598"></span></p>
<h3>Scientific Literature Mining Services</h3>
<ul>
<li><a href="http://pubget.com">pubget</a> &#8211; One of the most time-consuming tasks in research is going through the literature and trying to stay current. Add to that most institutions&#8217; clunky access to online resources, and the process can be painfully slow.  I can search for articles from my library homepage, but have to go through several extra steps to actually get my hands on the PDF. The &#8220;find full text&#8221; function in the newest version of EndNote has been a tremendous help in accessing content, but for those researchers at institutions that don&#8217;t provide that service (and even those that do but want an additional resource), pubget is a handy tool.  When you create an account, pubget signs in to your institution and allows you to search the subscribed resources. When you find a reference you want, just click the PDF icon and there it is. No clicking through to content provider websites. You can tag references as &#8220;keepers&#8221; to come back to them later, or search for the newest articles from a particular journal. Unfortunately, it doesn&#8217;t allow you to annotate the PDFs. Not yet anyway.</li>
</ul>
<ul>
<li><a href="http://www.gopubmed.com/">GoPubMed</a> &#8211; A search engine where &#8220;your keywords are submitted to PubMed and the resulting abstracts are classified using Gene Ontology and Medical Subject Headings (MeSH).&#8221; Running our example search, &#8220;alcoholism genetics&#8221;, pulls 5,940 abstracts from PubMed. What&#8217;s different about GoPubMed&#8217;s results is the navigation panel.</li>
</ul>
<div id="attachment_599" class="wp-caption aligncenter" style="width: 381px"><a href="http://freethinkersasylum.com/wp-content/uploads/2010/01/gopubmed.jpg"><img class="size-full wp-image-599" title="gopubmed" src="http://freethinkersasylum.com/wp-content/uploads/2010/01/gopubmed.jpg" alt="" width="371" height="520" /></a><p class="wp-caption-text">GoPubMed top terms</p></div>
<p style="text-align: left;">You can click the check boxes next to the relevant terms. Restricting the search to &#8220;genetic predisposition to disease&#8221; gives 720 references. Under the Knowledge Base &gt; Named Groups category, you could restrict it further to only studies using adult subjects, or to exclude studies in children. GoPubMed provides additional information on your search topic as well, including top authors (who&#8217;s doing the research) and top journals (who&#8217;s publishing the research). There&#8217;s even a network of top authors, so you can see how they collaborate on your search topic.</p>
<div id="attachment_600" class="wp-caption aligncenter" style="width: 310px"><a href="http://freethinkersasylum.com/wp-content/uploads/2010/01/network.jpg"><img class="size-medium wp-image-600" title="network" src="http://freethinkersasylum.com/wp-content/uploads/2010/01/network-300x246.jpg" alt="" width="300" height="246" /></a><p class="wp-caption-text">Author network for genetic predisposition to alcoholism.</p></div>
<p style="text-align: left;">Philanthropologist might recognize a name toward the top of that network. A useful tool for finding potential collaborators.</p>
<ul>
<li><a href="http://www.nextbio.com">NextBio</a> &#8211; A freemium service (basic is free, pro is subscription) that allows researchers to set up a profile, but also has a database for lit mining.  Searching for &#8220;alcoholism&#8221; in NextBio brings up several relevant sources, including associated genes, literature, researchers, news, even clinical trials.
<div id="attachment_602" class="wp-caption aligncenter" style="width: 487px"><a href="http://freethinkersasylum.com/wp-content/uploads/2010/01/genes.jpg"><img class="size-full wp-image-602" title="genes" src="http://freethinkersasylum.com/wp-content/uploads/2010/01/genes.jpg" alt="" width="477" height="207" /></a><p class="wp-caption-text">NextBio genes associated with alcoholism</p></div></li>
</ul>
<p style="text-align: left;"><a href="http://freethinkersasylum.com/wp-content/uploads/2010/01/researchers.jpg"><img class="aligncenter size-full wp-image-601" title="researchers" src="http://freethinkersasylum.com/wp-content/uploads/2010/01/researchers.jpg" alt="" width="467" height="446" /></a></p>
<ul>
<li><a href="http://www.ebi.ac.uk/Rebholz-srv/ebimed/index.jsp">EbiMed</a> &#8211; analyzes PubMed results &#8220;to offer a complete overview on  associations between  <a href="http://www.ebi.uniprot.org/">UniProt</a> protein/gene names,  <a href="http://www.geneontology.org/">GO</a> annotations,  <a href="http://www.nlm.nih.gov/medlineplus/druginformation.html">Drugs</a> and  <a href="http://www.ncbi.nih.gov/Taxonomy">Species</a>.&#8221; Searching &#8220;alcoholism&#8221; here turns up a table of links to various sources, including proteins, genes, and biological processes related to the search term.</li>
<li><a href="http://www.pubgene.org/">PubGene</a> &#8211; specifically for finding genes, PubGene draws a network of genes associated with a particular keyword.
<div id="attachment_605" class="wp-caption aligncenter" style="width: 494px"><a href="http://freethinkersasylum.com/wp-content/uploads/2010/01/network1.jpg"><img class="size-full wp-image-605" title="network" src="http://freethinkersasylum.com/wp-content/uploads/2010/01/network1.jpg" alt="" width="484" height="272" /></a><p class="wp-caption-text">PubGene network for alcoholism</p></div></li>
</ul>
<p>This network is searchable. Clicking on a gene allows you to browse the literature associated with that gene, in addition to highlighting associations with other genes.  As a geneticist, I find this tool very useful.</p>
<ul>
<li><a href="http://brainarray.mbni.med.umich.edu/Brainarray/prototype/PubAnatomy/">PubAnatomy</a> &#8211; &#8220;integrates [the] Allen Brain Atlas gene expression data, relationships between brain regions and diseases for more efficient exploration of Medline database and gene expression data.&#8221; A keyword search for &#8220;alcoholism&#8221; lights up regions on the brain map associated with the disease.</li>
</ul>
<div id="attachment_606" class="wp-caption aligncenter" style="width: 489px"><a href="http://freethinkersasylum.com/wp-content/uploads/2010/01/brain.jpg"><img class="size-full wp-image-606 " title="brain" src="http://freethinkersasylum.com/wp-content/uploads/2010/01/brain.jpg" alt="" width="479" height="276" /></a><p class="wp-caption-text">PubAnatomy brain map</p></div>
<p>And clicking one of those highlighted regions displays the relevant references for that brain structure, as well as genes that are active in that region.</p>
<ul>
<li><a href="http://www.neuinfo.org">Neuroscience Information Framework</a> &#8211; &#8220;An initiative of the NIH Blueprint for Neuroscience Research, the Neuroscience Information Framework advances neuroscience research by enabling discovery and access to public research data and tools worldwide through an open source, networked environment.&#8221; This tool can search the full text of articles (at least those that are open access), rather than just the abstract. But it does much more.</li>
</ul>
<p><a href="http://freethinkersasylum.com/wp-content/uploads/2010/01/nif.jpg"><img class="aligncenter size-full wp-image-607" title="nif" src="http://freethinkersasylum.com/wp-content/uploads/2010/01/nif.jpg" alt="" width="194" height="283" /></a></p>
<p>Second from the bottom, NIF displays <em>grants</em> related to your search. You can read the abstract, see who is doing research on your topic, and which granting agencies are funding that research. Useful info when preparing your own proposals.</p>
<p>All of these tools help scientists sip from that firehose, giving different ways to access and interact with the data. What an exciting time to be starting a career in science.</p>
<p>&#8211;</p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature&amp;rft_id=info%3Adoi%2F10.1038%2F463416a&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Literature+mining%3A+Speed+reading&amp;rft.issn=0028-0836&amp;rft.date=2010&amp;rft.volume=463&amp;rft.issue=7280&amp;rft.spage=416&amp;rft.epage=418&amp;rft.artnum=http%3A%2F%2Fwww.nature.com%2Fdoifinder%2F10.1038%2F463416a&amp;rft.au=Lok%2C+C.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Clinical+Research%2CResearch+%2F+Scholarship%2CGenetics%2C+Publishing">Lok, C. (2010). Literature mining: Speed reading <span style="font-style: italic;">Nature, 463</span> (7280), 416-418 DOI: <a rev="review" href="http://dx.doi.org/10.1038/463416a">10.1038/463416a</a></span></p>

]]></content:encoded>
			<wfw:commentRss>http://freethinkersasylum.com/2010/01/mining-scientific-literature/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
