By Dai Cooper, via my grad student listserv.

A Bastion of Sanity in the Land of Oz
We need to do a better job advertising this, because I’m a scientist and this is the first I’ve heard of it. In the spirit of science literacy, here are a few links to explore.
The entire photo collection is an inspiration to women in science.
***
Speaking of science literacy: On the book club front, I’m about 2/3 of the way through January’s book. It’s interesting but dense reading, and I haven’t had as much time to read while nursing as I thought. Baby keeps grabbing the book. It’s easier to read while pumping, but that gives me roughly 30 minutes a day. I have had a couple of long waits at the car shop over the last week, so I’m making good progress. Maybe I’ll be finished by this weekend.
There’s an interesting piece in the most recent issue of Nature. Corie Lok discusses how researchers are dealing with the information firehose that is scientific literature. How big is that firehose?
The 19 million citations and abstracts covered by the US National Library of Medicine’s PubMed search engine include nearly 830,000 articles published in 2009, up from some 814,000 in 2008 and around 772,000 in 2007. That growth rate shows no signs of abating, especially as emerging countries such as China and Brazil continue to ratchet up their research.
With that much data, how is an established researcher going to keep up with relevant work in their field? Never mind how a new investigator can get a handle on establishing a research focus, or how a midcareer scientist can switch tracks. That's where literature mining comes in. Several start-up services on the web help scientists find relevant research, make connections, and generate hypotheses. Lok's article discusses a few of these, but there are many others.
PubMed – My default search engine for research papers, this is the first stop for many scientists. So let’s try a little experiment. Searching “alcoholism” generates the following – Results: 1 to 20 of 66827. PubMed does have tools to help you narrow your search. Say I’m interested in the genetics of alcoholism. Searching “alcoholism genetics” returns – Results: 1 to 20 of 5940. A lot of research to go through. Restricting that search to only publications in the last 5 years, in English, and about humans gives Results: 1 to 20 of 1283. Still a lot of literature, even if you’re only reading the abstracts.
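PubMed searches like these can also be scripted through NCBI's E-utilities. The minimal sketch below only builds an esearch request URL that approximates the narrowed search above (English, humans, last 5 years). The helper name `build_pubmed_query` is hypothetical. I'm also assuming that the `[Language]` and `[MeSH Terms]` field tags and the `datetype`/`reldate` parameters mirror the web filters closely enough. Any hit counts you get today will not match the numbers quoted here.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint (returns matching PubMed IDs and a hit count).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(topic: str, years: int = 5) -> str:
    """Hypothetical helper: build an esearch URL for `topic`, limited to
    English-language human studies published in roughly the last `years` years."""
    term = f"{topic} AND english[Language] AND humans[MeSH Terms]"
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",           # filter on publication date
        "reldate": str(365 * years),  # last N days (ignores leap days)
        "retmax": "20",               # first 20 IDs, like one page of results
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Fetching this URL (e.g. with urllib.request.urlopen) returns JSON whose
    # "esearchresult" -> "count" field holds the total number of hits.
    print(build_pubmed_query("alcoholism genetics"))
```

The function only builds the URL. No network request is made, so you can inspect the query before sending it.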
That’s where literature mining can really make a difference.
A new study in PLoS Biology suggests that one of the most common Western European Y-chromosome haplogroups, R1b1b2, might have originated in Turkey and radiated into Europe with the spread of agriculture during the Neolithic. This is significant because this haplogroup is the most frequent in Western Europe, and it has been interpreted as a signal of Paleolithic populations that were less affected by the Neolithic Revolution.
The researchers compared STR variance for this haplogroup in several European populations and three Turkish groups. They found a significant correlation (R² = 0.358; p = 0.004) between that variance and the longitude of the population (i.e., how far east the population was located).
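An R² of 0.358 means that, in a straight-line fit, about 36% of the between-population variation in STR variance tracks longitude. The corresponding correlation coefficient is r ≈ √0.358 ≈ 0.60. The minimal sketch below shows how R² is computed for an ordinary least-squares line. I'm assuming a simple linear fit of variance on longitude rather than taking that method from the paper. The sketch also leaves out the p-value, which requires the t distribution with n − 2 degrees of freedom. If SciPy is available, `scipy.stats.linregress` reports both r and p.

```python
from statistics import mean


def r_squared(x: list[float], y: list[float]) -> float:
    """Coefficient of determination for a least-squares line y = a + b*x.
    For simple linear regression this equals the squared Pearson correlation."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need paired samples with at least three points")
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    return (sxy * sxy) / (sxx * syy)
```

Here `x` would hold each population's longitude and `y` its STR variance. The function name is illustrative.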
From the plot, the greatest variance (indicated by the most intense color) within haplogroup R1b1b2 is found in Turkey. The researchers also calculated the time to most recent common ancestor (TMRCA) from STR variance. They found that the oldest lineages, dated to between 7,000 and 7,989 years ago, are also in Turkey. The youngest lineage is in Cornwall, dating from 5,460 years ago. From these results, the researchers inferred that R1b1b2 originated in Anatolia and spread rapidly into Europe with agriculture.
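The idea behind a variance-based TMRCA fits in a few lines. Under a simple stepwise mutation model, each mutation shifts an STR's repeat count by one. The variance in repeat counts among descendants of a common ancestor therefore grows by roughly the mutation rate μ per generation. Dividing the average per-locus variance by μ gives generations, and multiplying by a generation time gives years.

This is a simplified textbook estimator, not necessarily the exact method Balaresque et al. used. The function names, mutation rate, and generation time are placeholders you would have to supply. Because the estimate scales as 1/μ, the choice of mutation rate matters a great deal.

```python
from statistics import mean, variance


def mean_str_variance(haplotypes: list[list[int]]) -> float:
    """Average over loci of the sample variance in repeat count.
    Each haplotype lists repeat counts for the same STR loci in the same order."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    n_loci = len(haplotypes[0])
    if any(len(h) != n_loci for h in haplotypes):
        raise ValueError("all haplotypes must cover the same loci")
    per_locus = [variance([h[i] for h in haplotypes]) for i in range(n_loci)]
    return mean(per_locus)


def tmrca_years(str_variance: float, mutation_rate: float, generation_years: float) -> float:
    """Simplified estimate: generations ≈ variance / mutation rate (per locus per
    generation), converted to years with an assumed generation time."""
    if mutation_rate <= 0 or generation_years <= 0:
        raise ValueError("mutation_rate and generation_years must be positive")
    return str_variance / mutation_rate * generation_years
```

A call would look like `tmrca_years(mean_str_variance(haps), mu, g)`, with your own haplotypes and chosen values for `mu` and `g`.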

Balaresque et al. Figure 1B. Frequency distribution of haplogroup R1b1b2. More intense color indicates higher frequency.
A couple of things strike me about this study.
1) Haplogroup R1b1b2 reaches its highest frequencies in Western Europe: up to 85% of Y chromosomes in Ireland belong to this haplogroup (Figure 1B). In addition, two populations have TMRCA dates in the range of the Turkish dates: one in Germany (GE1), at 7,282 years, and one on the northwest coast of France (FR2), at 7,384 years.
2) The Turkish data come from Cinnioğlu et al. (2004). They consist of samples collected in 90 cities from blood banks, paternity clinics, and university students, and the samples were classified into geographical areas by self-reported "paternal residential heritage" (p. 128). That self-reported residence could introduce error into the sample. It's also possible that the high variance in the Turkish R1b1b2 lineages reflects more recent immigration. Furthermore, TMRCA applies to the molecule, not to the populations in which it is found. A particular lineage may be 7,000 years old, but that does not mean the population has been in that location for that length of time. The authors themselves note, in the supplemental information, "…there is a tendency for TMRCA to be underestimated when single-haplogroup data are considered."
It’s an interesting hypothesis, though, and I’m curious to see what analyses with additional populations will show.
–
Balaresque P, Bowden GR, Adams SM, Leung HY, King TE, Rosser ZH, Goodwin J, Moisan JP, Richard C, Millward A, Demaine AG, Barbujani G, Previderè C, Wilson IJ, Tyler-Smith C, & Jobling MA (2010). A predominantly neolithic origin for European paternal lineages. PLoS Biology, 8 (1). PMID: 20087410
Cinnioğlu C, King R, Kivisild T, Kalfoğlu E, Atasoy S, Cavalleri GL, Lillie AS, Roseman CC, Lin AA, Prince K, Oefner PJ, Shen P, Semino O, Cavalli-Sforza LL, & Underhill PA (2004). Excavating Y-chromosome haplotype strata in Anatolia. Human Genetics, 114 (2), 127-148. PMID: 14586639