New DNA
evidence is solving the most fought-over question in Indian history. And you
will be surprised at how sure-footed the answer is, writes Tony Joseph
The thorniest, most fought-over
question in Indian history is slowly but surely getting answered: did
Indo-European language speakers, who called themselves Aryans, stream into
India sometime around 2,000 BC – 1,500 BC when the Indus Valley civilisation
came to an end, bringing with them Sanskrit and a distinctive set of cultural
practices? Genetic research based on an avalanche of new DNA evidence is making
scientists around the world converge on an unambiguous answer: yes, they did.
This may come as a surprise to many
— and a shock to some — because the dominant narrative in recent years has been
that genetics research had thoroughly disproved the Aryan migration theory.
This interpretation was always a bit of a stretch, as anyone who read the
nuanced scientific papers in the original knew. But now it has broken apart
altogether under a flood of new data on Y-chromosomes (or chromosomes that are
transmitted through the male parental line, from father to son).
Lines of descent
Until recently, only data on mtDNA
(or mitochondrial DNA, which is passed down only through the maternal line) were available
and that seemed to suggest there was little external infusion into the Indian
gene pool over the last 12,500 years or so. New Y-DNA data has turned that
conclusion upside down, with strong evidence of external infusion of genes into
the Indian male lineage during the period in question.
The reason for the difference in
mtDNA and Y-DNA data is obvious in hindsight: there was strong sex bias in
Bronze Age migrations. In other words, those who migrated were predominantly
male and, therefore, those gene flows do not really show up in the mtDNA data. On
the other hand, they do show up in the Y-DNA data: specifically, about 17.5% of
Indian male lineage has been found to belong to haplogroup R1a (haplogroups
identify a single line of descent), which is today spread across Central Asia,
Europe and South Asia. The Pontic-Caspian Steppe is seen as the region from where
R1a spread both west and east, splitting into different sub-branches along the
way.
The paper that put all of the
recent discoveries together into a tight and coherent history of migrations
into India was published just three months ago in a peer-reviewed journal
called ‘BMC Evolutionary Biology’. In that paper, titled “A Genetic Chronology
for the Indian Subcontinent Points to Heavily Sex-biased Dispersals”, 16
scientists led by Prof. Martin P. Richards of the University of Huddersfield,
U.K., concluded: “Genetic influx from Central Asia in the Bronze Age was
strongly male-driven, consistent with the patriarchal, patrilocal and
patrilineal social structure attributed to the inferred pastoralist early
Indo-European society. This was part of a much wider process of Indo-European
expansion, with an ultimate source in the Pontic-Caspian region, which carried
closely related Y-chromosome lineages… across a vast swathe of Eurasia between
5,000 and 3,500 years ago”.
In an email exchange, Prof.
Richards said the prevalence of R1a in India was “very powerful evidence for a
substantial Bronze Age migration from central Asia that most likely brought
Indo-European speakers to India.” The robust conclusions of Professor Richards
and his team rest on their own substantive research as well as a vast trove of
new data and findings that have become available in recent years, through the
work of genetic scientists around the world.
Peter Underhill, scientist at the
Department of Genetics at the Stanford University School of Medicine, is one of
those at the centre of the action. Three years ago, a team of 32 scientists he
led published a massive study mapping the distribution and linkages of R1a. It
used a panel of 16,244 male subjects from 126 populations across Eurasia. Dr.
Underhill’s research found that R1a had two sub-haplogroups, one found
primarily in Europe and the other confined to Central and South Asia.
Ninety-six per cent of the R1a samples in Europe belonged to sub-haplogroup
Z282, while 98.4% of the Central and South Asian R1a lineages belonged to
sub-haplogroup Z93. The two groups diverged from each other only about 5,800 years
ago. Dr. Underhill’s research showed that within the Z93 that is predominant in
India, there is a further splintering into multiple branches. The paper found
this “star-like branching” indicative of rapid growth and dispersal. So if you
want to know the approximate period when Indo-European language speakers came
and rapidly spread across India, you need to discover the date when Z93
splintered into its own various subgroups or lineages. We will come back to
this later.
So in a nutshell: R1a is distributed
all over Europe, Central Asia and South Asia; its sub-group Z282 is distributed
only in Europe while another subgroup Z93 is distributed only in parts of
Central Asia and South Asia; and three major subgroups of Z93 are distributed
only in India, Pakistan, Afghanistan and the Himalayas. This clear picture of
the distribution of R1a has finally put paid to an earlier hypothesis that this
haplogroup perhaps originated in India and then spread outwards. This
hypothesis was based on the erroneous assumption that R1a lineages in India had
huge diversity compared to other regions, which could be indicative of its
origin here. As Prof. Richards puts it, “the idea that R1a is very diverse in
India, which was largely based on fuzzy microsatellite data, has been laid to
rest” thanks to the arrival of large amounts of genomic Y-chromosome data.
Gene-dating the migration
Now that we know that there WAS
indeed a significant inflow of genes from Central Asia into India in the Bronze
Age, can we get a better fix on the timing, especially the splintering of Z93
into its own sub-lineages? Yes, we can. The research paper that answers this
question, titled “Punctuated bursts in human male demography inferred from
1,244 worldwide Y-chromosome sequences”, was published just last year, in April
2016. This paper, which looked at major expansions of Y-DNA haplogroups within
five continental populations, was lead-authored by David Poznik of Stanford
University, with Dr. Underhill as one of the 42 co-authors. The study
found “the most striking expansions within Z93 occurring approximately 4,000 to
4,500 years ago”. This is remarkable, because roughly 4,000 years ago is when
the Indus Valley civilization began falling apart. (There is no evidence so
far, archaeologically or otherwise, to suggest that one caused the other; it is
quite possible that the two events happened to coincide.)
The avalanche of new data has been
so overwhelming that many scientists who were either sceptical or neutral about
significant Bronze Age migrations into India have changed their opinions. Dr.
Underhill himself is one of them. In a 2010 paper, for example, he had written
that there was evidence “against substantial patrilineal gene flow from East
Europe to Asia, including to India” in the last five or six millennia. Today,
Dr. Underhill says there is no comparison between the kind of data available in
2010 and now. “Then, it was like looking into a darkened room from the outside
through a keyhole with a little torch in hand; you could see some corners but
not all, and not the whole picture. With whole genome sequencing, we can now
see nearly the entire room, in clearer light.”
Dr. Underhill is not the only one
whose older work has been used to argue against Bronze Age migrations by
Indo-European language speakers into India. David Reich, geneticist and
professor in the Department of Genetics at Harvard Medical School, is
another one, even though he was very cautious in his older papers. The best
example is a study lead-authored by Reich in 2009, titled “Reconstructing
Indian Population History” and published in Nature. This study used
the theoretical construct of “Ancestral North Indians” (ANI) and “Ancestral
South Indians” (ASI) to discover the genetic substructure of the Indian
population. The study showed that the ANI are “genetically close to Middle
Easterners, Central Asians, and Europeans”, while the ASI are unique to India.
The study also showed that most groups in India today can be approximated as a
mixture of these two populations, with ANI ancestry higher among traditionally
upper-caste groups and Indo-European speakers. By itself, the study didn’t disprove
the arrival of Indo-European language speakers; if anything, it suggested the
opposite, by pointing to the genetic linkage of ANI to Central Asians.
However, this theoretical structure
was stretched beyond reason and was used to argue that these two groups came to
India tens of thousands of years ago, long before the migration of
Indo-European language speakers that is supposed to have happened only about
4,000 to 3,500 years ago. In fact, the study had included a strong caveat that
suggested the opposite: “We caution that ‘models’ in population genetics should
be treated with caution. While they provide an important framework for testing
historical hypothesis, they are oversimplifications. For example, the true
ancestral populations were probably not homogenous as we assume in our model
but instead were likely to have been formed by clusters of related groups that
mixed at different times.” In other words, ANI is likely to have resulted from
multiple migrations, possibly including the migration of Indo-European language
speakers.
The spin and the facts
But how was this research covered
in the media? “Aryan-Dravidian divide a myth: Study,” screamed a newspaper
headline on September 25, 2009. The article quoted Lalji Singh, a co-author of
the study and a former director of the Centre for Cellular and Molecular
Biology (CCMB), Hyderabad, as saying: “This paper rewrites history… there is no
north-south divide”. The report also carried statements such as: “The initial
settlement took place 65,000 years ago in the Andamans and in ancient south
India around the same time, which led to population growth in this part. At a
later stage, 40,000 years ago, the ancient north Indians emerged which in turn
led to rise in numbers there. But at some point in time, the ancient north and
the ancient south mixed, giving birth to a different set of population. And
that is the population which exists now and there is a genetic relationship
between the population within India.” The study, however, makes no such
statements whatsoever — in fact, even the figures 65,000 and 40,000 do not
figure in it!
This stark contrast between what
the study says and what the media reports said did not go unnoticed. In his
column for Discover magazine, geneticist Razib Khan said this about
the media coverage of the study: “But in the quotes in the media the other
authors (other than Reich that is - ed) seem to be leading you to totally different
conclusions from this. Instead of leaning toward ANI being proto-Indo-European,
they deny that it is.”
Let’s leave that there, and ask
what Reich says now, when so much new data have become available? In an
interview with Edge in February last year, while talking about the thesis that
Indo-European languages originated in the Steppes and then spread to both
Europe and South Asia, he said: “The genetics is tending to support the Steppe
hypothesis because in the last year, we have identified a very strong pattern
that this ancient North Eurasian ancestry that you see in Europe today, we now
know when it arrived in Europe. It arrived 4500 years ago from the East from
the Steppe...” About India, he said: “In India, you can see, for example, that
there is this profound population mixture event that happens between 2000 to
4000 years ago. It corresponds to the time of the composition of the Rigveda,
the oldest Hindu religious text, one of the oldest pieces of literature in the
world, which describes a mixed society...” In essence, according to Reich, in
broadly the same time frame, we see Indo-European language speakers spreading
out both to Europe and to South Asia, causing major population upheavals.
The dating of the “profound
population mixture event” that Reich refers to was arrived at in a paper that
was published in the American Journal of Human Genetics in 2013, and
was lead-authored by Priya Moorjani of Harvard Medical School, and
co-authored, among others, by Reich and Lalji Singh. This paper too has been
pushed into serving the case against migrations of Indo-European language
speakers into India, but the paper itself says no such thing, once again!
Here’s what it says in one place: “The dates we report have significant implications for Indian history in the sense that they document a period of demographic and cultural change in which mixture between highly differentiated populations became pervasive before it eventually became uncommon. The period of around 1,900–4,200 years before present was a time of profound change in India, characterized by the de-urbanization of the Indus civilization, increasing population density in the central and downstream portions of the Gangetic system, shifts in burial practices, and the likely first appearance of Indo-European languages and Vedic religion in the subcontinent.”
The study didn’t “prove” the
migration of Indo-European language speakers since its focus was different:
finding the dates for the population mixture. But it is clear that the authors
think its findings fit in well with the traditional reading of the dates for
this migration. In fact, the paper goes on to correlate the ending of
population mixing with the shifting attitudes towards mixing of the races in
ancient texts. It says: “The shift from widespread mixture to strict endogamy
that we document is mirrored in ancient Indian texts.”
So irrespective of the use to which
Priya Moorjani et al’s 2013 study is put, what is clear is that the authors
themselves admit their study is fully compatible with, and perhaps even
strongly suggests, Bronze Age migration of Indo-European language speakers. In
an email to this writer, Moorjani said as much. In answer to a question about
the conclusions of the recent paper of Prof. Richards et al that there were strong,
male-driven genetic inflows from Central Asia about 4,000 years ago, she said
she found their results “to be broadly consistent with our model”. She also
said the authors of the new study had access to ancient West Eurasian samples
“that were not available when we published in 2013”, and that these samples had
provided them additional information about the sources of ANI ancestry in South
Asia.
One by one, therefore, every single
one of the genetic arguments that were earlier put forward to make the case
against Bronze Age migrations of Indo-European language speakers has been
disproved. To recap:
1. The first argument was that
there were no major gene flows from outside to India in the last 12,500 years
or so because mtDNA data showed no signs of it. This argument was found faulty
when it was shown that Y-DNA did indeed show major gene flows from outside into
India within the last 4,000 to 4,500 years or so, especially R1a, which now forms
17.5% of the Indian male lineage. The reason why mtDNA data behaved differently
was that Bronze Age migrations were severely sex-biased.
2. The second argument put forward
was that R1a lineages exhibited much greater diversity in India than elsewhere
and, therefore, it must have originated in India and spread outward. This has
been proved false because a mammoth, global study of the R1a haplogroup published
three years ago showed that R1a lineages in India mostly belong to just three
subclades of R1a-Z93, and a study published last year dated the most striking
expansions within Z93 to only about 4,000 to 4,500 years ago.
3. The third argument was that
there were two ancient groups in India, ANI and ASI, both of which settled here
tens of thousands of years earlier, much before the supposed migration of
Indo-European language speakers to India. This argument was false to begin
with because ANI — as the original paper that put forward this theoretical
construct itself had warned — is a mixture of multiple migrations, including
probably the migration of Indo-European language speakers.
Connecting the dots
Two additional things should be
kept in mind while looking at all this evidence. The first is how multiple
studies in different disciplines have arrived at one specific period as an
important marker in the history of India: around 2000 B.C. According to the
Priya Moorjani et al study, this is when population mixing began on a large
scale, leaving few population groups anywhere in the subcontinent untouched.
The Onge in the Andaman and Nicobar Islands are the only ones we know to have
been completely unaffected by what must have been a tumultuous period. And
according to the David Poznik et al study of 2016 on the Y-chromosome, 2000
B.C. is around the time when the dominant R1a subclade in India, Z93, began
splintering in a “most striking” manner, suggesting “rapid growth and
expansion”. Lastly, from long-established archaeological studies, we also know
that 2000 B.C. was around the time when the Indus Valley civilization began to
decline. For anyone looking at all of these data objectively, it is difficult
to avoid the feeling that the missing pieces of India’s historical puzzle are
finally falling into place.
The second is that many studies
mentioned in this piece are global in scale, both in terms of the questions
they address and in terms of the sampling and research methodology. For
example, the Poznik study, which arrived at 4,000-4,500 years ago as the dating
for the splintering of the R1a-Z93 lineage, looked at major Y-DNA expansions
not just in India, but in four other continental populations. In the Americas,
the study showed the expansion of haplogroup Q1a-M3 around 15,000 years ago,
which fits in with the generally accepted time for the initial colonisation of
the continent. So the pieces that are falling into place are not merely in India,
but all across the globe. The more the global migration picture gets filled in,
the more difficult it will be to overturn the consensus that is forming on how
the world got populated.
Nobody explains what is happening
now better than Reich: “What’s happened very rapidly, dramatically, and
powerfully in the last few years has been the explosion of genome-wide studies
of human history based on modern and ancient DNA, and that’s been enabled by
the technology of genomics and the technology of ancient DNA. Basically, it’s a
gold rush right now; it’s a new technology and that technology is being applied
to everything we can apply it to, and there are many low-hanging fruits, many
gold nuggets strewn on the ground that are being picked up very rapidly.”
So far, we have only looked at the
migrations of Indo-European language speakers because that has been the most
debated and argued-about historical event. But one must not lose sight of the bigger
picture: R1a lineages form only about 17.5% of Indian male lineage, and an
even smaller percentage of the female lineage. The vast majority of Indians owe
their ancestry mostly to people from other migrations, starting with the
original Out of Africa migrations of around 55,000 to 65,000 years ago, the
farming-related migrations from West Asia that probably occurred in multiple
waves after 10,000 B.C., the migrations of Austro-Asiatic speakers such as
the Munda from East Asia, the dating of which is yet to be determined, and the
migrations of Tibeto-Burman speakers such as the Garo, again from East Asia, the
dating of which is also yet to be determined.
What is abundantly clear is that ours
is a multi-source civilization, not a single-source one, drawing its cultural
impulses, its traditions and practices from a variety of lineages and migration
histories. The Out of Africa immigrants, the pioneering, fearless explorers who
discovered this land originally and settled in it and whose lineages still form
the bedrock of our population; those who arrived later with a package of
farming techniques and built the Indus Valley civilization whose cultural ideas
and practices perhaps enrich much of our traditions today; those who arrived
from East Asia, probably bringing with them the practice of rice cultivation
and all that goes with it; those who came later with a language called Sanskrit
and its associated beliefs and practices and reshaped our society in
fundamental ways; and those who came even later for trade or for conquest and
chose to stay, all have mingled and contributed to this civilization we call
Indian. We are all migrants.
Tony
Joseph is a writer and former editor of BusinessWorld. Twitter: @tjoseph0010
Source: thehindu