
R1b1* DNA Project

Project News

I recently undertook an examination of R1b1*. In case anyone wants

to peer over my shoulders, here's what I have so far.



R1b1* is a paragroup, generally defined as having the SNPs known as

M343 and P25 but lacking the downstream SNPs (M18, M73, M269, and

M335). R1b1* is much less common in Europe than its descendant

subclade R1b1c.



Through a variety of search methods, and with the help of friends on

this list, I found thirty-six haplotypes which are conclusively or

almost certainly R1b1*. These haplotypes, along with their

associated ysearch IDs, can be viewed in PDF or XLS form:



<a href='http://vizachero.com/R1b1/R1b1Table.pdf'>http://vizachero.com/R1b1/R1b1Table.pdf</a>



<a href='http://vizachero.com/R1b1/R1b1Table.xls'>http://vizachero.com/R1b1/R1b1Table.xls</a>





It is likely that there are additional R1b1* haplotypes in ysearch,

but except in a few limited cases it is necessary to have tested 25

or 37 STRs before one can conclusively draw a distinction between

R1b1* and R1b1c, and this necessarily limited the pool. In some cases

personal knowledge of SNP tests provided some insights, and very near

or identical matches with established R1b1* members allowed the

inclusion of some 12-marker haplotypes in this analysis.



For the most part, R1b1* can be detected rather easily because it

almost always presents with DYS438=11 in conjunction with DYS464a=12

(or sometimes 13). In virtually all cases, nearby haplotypes (e.g.

R1a or R1b1c) could be excluded by examining their haplotypes at

alternate markers (e.g. DYS385) or their genetic proximity to known

Q, R1a, or R1b1c haplotypes.
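
The motif check described above can be sketched in a few lines of Python. This is only an illustration: the function name and the dictionary layout are assumptions, the sample values are made up, and a match is a hint to look closer rather than a substitute for the exclusion steps or an SNP test.

```python
def looks_like_r1b1_star(haplotype):
    """Return True when the haplotype shows DYS438=11 with DYS464a=12 or 13."""
    return haplotype.get("DYS438") == 11 and haplotype.get("DYS464a") in (12, 13)

# Made-up marker values, for illustration only.
candidate = {"DYS438": 11, "DYS464a": 12}
non_matching = {"DYS438": 12, "DYS464a": 15}

print(looks_like_r1b1_star(candidate))
print(looks_like_r1b1_star(non_matching))
```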



Thankfully, an analysis of the R1b1* haplotypes revealed substantial

structure in the extant R1b1* population. This structure is robust,

and I was able to independently reproduce several clusters using a

variety of methods including median-joining networks, parsimony

trees, and distance trees. The following link is an example of a

distance-based tree, with the clades color-coded to match the

previous table of haplotypes.



<a href='http://vizachero.com/R1b1/R1b1Tree.pdf'>http://vizachero.com/R1b1/R1b1Tree.pdf</a>



All of the haplotypes in the tree above are in haplogroup R1b1* with

the exception of XKNX6, which is in haplogroup R2 and was included as

an out-group to root the tree. The tree shown is a UPGMA (distance-

based) tree, constructed using the Fitch method in Phylip and drawn

using FigTree. The inferred branching order of the clusters is

schematic but is also the most parsimonious explanation of the data.
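
For readers unfamiliar with UPGMA, the idea is to repeatedly merge the two clusters with the smallest average pairwise distance. The following pure-Python sketch illustrates that procedure on made-up distances. It is not the Phylip workflow used for the linked tree, and the function name and input format are assumptions.

```python
from itertools import combinations

def upgma(names, dist):
    """Average-linkage clustering.

    `dist` maps frozenset({a, b}) to the distance between leaves a and b.
    Returns a nested tuple that records the order of the merges.
    """
    # Each active cluster maps its (nested) label to the leaves it contains.
    clusters = {name: [name] for name in names}

    def average_distance(c1, c2):
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest mean leaf-to-leaf distance.
        a, b = min(combinations(clusters, 2), key=lambda p: average_distance(*p))
        merged_leaves = clusters.pop(a) + clusters.pop(b)
        clusters[(a, b)] = merged_leaves
    return next(iter(clusters))

# Made-up distances: A and B are close, C is further away, O is an out-group.
dist = {
    frozenset(("A", "B")): 2,
    frozenset(("A", "C")): 6,
    frozenset(("B", "C")): 6,
    frozenset(("A", "O")): 10,
    frozenset(("B", "O")): 10,
    frozenset(("C", "O")): 10,
}
tree = upgma(["A", "B", "C", "O"], dist)
print(tree)
```

In this toy example the out-group joins last, which is the behaviour one hopes for when an out-group is included.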



Further, the structure of R1b1* was such that I could manually

construct a phylogeny using only three or four STRs with very low

mutation rates: DYS388, DYS426, DYS454, and YCA II. For simplicity,

I'll describe the clusters using this manual method.



Starting at the root, R1b1* can be divided into two large subgroups.

One group has DYS454=11 and the other group has DYS454=12.



The DYS454=11 group can be further divided into three clusters, one

each represented by DYS388=12, DYS388=13, and DYS388=14. It would

appear that the DYS388=14 cluster is a subclade of the DYS388=13

cluster.



The DYS454=12 group can be further divided into two smaller groups

using DYS426: one smaller group has DYS426=11 and the other has

DYS426=12. Additionally, the DYS426=12 group appears to have further

substructure, with one cluster having had a RecLOH event affecting YCA

II. One cluster has YCA II=21-21 and the other cluster has YCA

II=21-24.



It is also possible to distinguish five of these six clusters from

each other using only the YCA II marker. The five distinct alleles are

18-23, 19-22, 21-21, 21-23, and 21-24, and the structure presented by

these alleles perfectly corresponds to the structure established

using DYS388, DYS426, and DYS454.



In summary, the six identified clusters are:



Cluster 1 (purple): DYS426=12, DYS388=12, DYS454=11, YCA II=19-22



Cluster 2 (blue): DYS426=12, DYS388=14, DYS454=11, YCA II=18-23



Cluster 3 (green): DYS426=12, DYS388=13, DYS454=11, YCA II=18-23



Cluster 4 (yellow): DYS426=11, DYS388=12, DYS454=12, YCA II=21-23



Cluster 5 (orange): DYS426=12, DYS388=12, DYS454=12, YCA II=21-21



Cluster 6 (red): DYS426=12, DYS388=12, DYS454=12, YCA II=21-24
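
The manual phylogeny above amounts to a small decision tree. A minimal Python sketch follows; the function name and argument layout are hypothetical, and a haplotype that does not fit any of the six definitions returns None rather than a best guess.

```python
def classify_cluster(dys426, dys388, dys454, yca2):
    """Return the cluster number (1-6) from the summary list, or None.

    `yca2` is a pair such as (21, 24). Following the manual method, it is
    only consulted to separate clusters 5 and 6.
    """
    if dys454 == 11:
        if dys426 != 12:
            return None
        # The DYS454=11 group splits on DYS388.
        return {12: 1, 14: 2, 13: 3}.get(dys388)
    if dys454 == 12:
        if dys388 != 12:
            return None
        # The DYS454=12 group splits on DYS426 first.
        if dys426 == 11:
            return 4
        if dys426 == 12:
            return {(21, 21): 5, (21, 24): 6}.get(tuple(yca2))
    return None

print(classify_cluster(12, 14, 11, (18, 23)))
print(classify_cluster(12, 12, 12, (21, 24)))
```

The two calls use the marker values listed for clusters 2 and 6 respectively.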



It is worth noting that these four markers are among the slowest

mutating markers in FTDNA's 37-marker panel. Even the fastest

mutating of the four, YCA II, has an estimated mutation rate of

0.00123 or 1 mutation in 813 generations (24,000 years). The average

mutation rate for these four markers is 0.000425, or 1 mutation in

2,300 generations (70,000 years). It is for this reason that I am

not surprised that there is so little variance on these markers

WITHIN each of the six clusters. Moreover, the fact that there is so

much variance on these markers ACROSS the six clusters suggests to me

that the age of R1b1 is much, much older than any of these six

clusters or than its two major subclades (R1b1b and R1b1c), since neither

of these subclades shows any variance in the four markers at all. My

analysis would, I think, be consistent with an age for R1b1 of 20,000

or more years. A proper dating using STR variance is problematic,

given the systematic nature of the search I used to find these

haplotypes in the first place, so I have not presented it.
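
The generation and year figures quoted above follow from inverting the mutation rate. A quick check in Python is sketched below. The 30 years per generation is an assumption inferred from the quoted figures, since the generation length is not stated, and the function name is hypothetical.

```python
YEARS_PER_GENERATION = 30  # assumption: not stated in the text

def generations_per_mutation(rate):
    """Expected number of generations between mutations at a given rate."""
    return 1.0 / rate

rates = {"YCA II": 0.00123, "four-marker average": 0.000425}
for label, rate in rates.items():
    generations = generations_per_mutation(rate)
    years = generations * YEARS_PER_GENERATION
    print(f"{label}: about {generations:.0f} generations, about {years:.0f} years")
```

The results agree with the rounded figures in the text (813 generations and roughly 24,000 years; roughly 2,300 generations and 70,000 years).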



It is worth noting that most clusters show strong geographic

localization, some more concentrated in Eastern Europe and others

more concentrated in Western Europe. Several of the haplotypes from

Eastern Europe are members of Sean Silver's Jewish R1b project. Some

of these Eastern European clusters also show signs of bottlenecking

during the historic period, which may be of interest to researchers

interested in Jewish genealogy.



A few words about search methods: in most cases I started with a

handful of haplotypes that I knew or suspected to be R1b1* based on

SNP tests or the DYS464/DYS438 motif mentioned earlier. In one case,

I asked a member of one of the clusters (surname Lumsden, cluster 5)

to confirm the suspected R1b1* assignment using a DeepSNP test, and they

willingly did so. I then searched for neighbors in ysearch.



I iteratively examined each member of each cluster in ysearch, to

ensure that 1) I captured all nearby examples within R1b1*, and 2)

I excluded all nearby examples outside R1b1*. In nearly every

case, when considering 37 markers, each cluster spanned a genetic

distance of significantly less than 10 and the nearest non-R1b1*

neighbor was a genetic distance of 19 or more away. Some of these

clusters were closer to R1a and R1b1c than others, but I am confident

that I have made few errors of inclusion or exclusion. To my

knowledge, each cluster includes at least one known SNP-tested

individual, and no cluster has a neighbor within a GD of 19 (at 37

markers) that has been SNP tested as anything BUT R1b1*. I made no

effort to exclude haplotypes that were very closely related or shared

similar surnames.
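
The genetic distances quoted above can be approximated with a simple stepwise count: sum the absolute allele differences over the markers two haplotypes share. The sketch below assumes that model. The calculation in ysearch may treat multi-copy markers such as DYS464 or YCA II differently, and the function names and sample haplotypes here are made up for illustration.

```python
from itertools import combinations

def genetic_distance(a, b):
    """Stepwise distance: sum of absolute allele differences on shared markers."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())

def cluster_span(members):
    """Largest pairwise distance among the haplotypes of one cluster.

    Needs at least two members.
    """
    return max(genetic_distance(x, y) for x, y in combinations(members, 2))

# Made-up three-marker haplotypes, for illustration only.
h1 = {"DYS388": 12, "DYS426": 12, "DYS454": 11}
h2 = {"DYS388": 13, "DYS426": 12, "DYS454": 11}
h3 = {"DYS388": 12, "DYS426": 11, "DYS454": 12}

print(genetic_distance(h1, h2))
print(cluster_span([h1, h2, h3]))
```

With full 37-marker haplotypes, the same two functions would give the within-cluster span and the distance to the nearest outside neighbor that the inclusion check relies on.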



It would be possible, I think, to effect a similar search at SMGF.org

but probably not at yhrd.org since the latter database includes few of

the slow-moving markers needed to discriminate R1b1* from R1b1c or R1a.



In summary, I came away from this exercise with a few conclusions:



1) The age of R1b1 is much older than an analysis of R1b1c would

suggest, a conclusion I reach based on the genetic distance between

these six clusters. I mention this as a caveat to those who would

extrapolate variance-based ages for R1b1c further upstream.



2) It should be possible to hypothesize where in R1b1* the major

subclades branched away. It looks to me like R1b1b is most closely

related to clusters 5 and 6 whereas R1b1c is most closely related to

cluster 2, though I admit that this notion could use a little more

development. Those interested could compare the modals for R1b1b

(4FNSC) or R1b1c (55GU9) to these R1b1* haplotype clusters.



3) The current Y-tree could need a massive revision if SNPs are found

to correspond to some or all of these clusters. Based on the trees I

constructed, I think it is not at all unlikely that two or more

non-redundant SNPs could be inserted between P25 and M269 and/or

between P25 and M73. While it is possible that some of these

clusters may ultimately be shown to be brother clades to R1b1b and

R1b1c, I think it is more likely that one of these clusters could end

up being a parent clade.



4) Tests of SNPs for placement on the Y-tree should include a wide

variety of R1b1* haplotypes, and at least one from each haplotype

cluster identified here. It is possible that SNPs currently believed

to be redundant based on testing one or two R1b1* haplotypes could be

found to be discriminatory based on testing haplotypes from a distant

cluster.