This was part of a much wider process of Indo-European expansion, with an ultimate source in the Pontic-Caspian region, which carried closely related Y-chromosome lineages, a smaller fraction of autosomal genome-wide variation and an even smaller fraction of mitogenomes across a vast swathe of Eurasia between 5 and 3.5 ka.
Following the out-of-Africa (OOA) migration, South Asia (or the Indian Subcontinent, here comprising India, Pakistan, Bangladesh, Sri Lanka, Nepal and Bhutan) was probably one of the earliest corridors of dispersal taken by anatomically modern humans (AMH).
We included three additional 1KGP populations, namely Han Chinese from Beijing, China (CHB), Tuscans from Italy (TSI) and Yoruba from Nigeria (YRI), for ADMIXTURE v1.23 [...] 0.25, window size of 100 SNPs and step size of 1) included 66,245 SNPs for the ADMIXTURE analysis and 64,926 SNPs for the PCA.
Indo-European, spoken across northern and central India, and also in Pakistan and Bangladesh, has been frequently connected to the so-called “Indo-Aryan invasions” from Central Asia ~3.5 ka and the establishment of the caste system, but the extent of immigration at this time remains extremely controversial.
Whilst current genome-wide analyses conflate all dispersals from Southwest and Central Asia, we were able to tease out from the mitogenome data distinct dispersal episodes dating from the Last Glacial Maximum to the Bronze Age.
Moreover, we found an extremely marked sex bias by comparing the different genetic systems.
India, the second most populous country worldwide, includes a patchwork of different religions and languages, including tribal groups (~8% of the population, speaking over 700 different dialects of the Austro-Asiatic, Dravidian and Tibeto-Burman families) and non-tribal populations, who mostly practice Hinduism, grounded in a strictly hierarchical caste system, and speak Indo-European or Dravidian languages.
Indo-European is often associated with northern Indian populations, Pakistan and Bangladesh, and a putative arrival in South Asia from Southwest Asia ~3.5 ka (the so-called “Indo-Aryan invasions”) has been frequently connected with the origins of the caste system. All of this, combined with a very complex history, makes the genetic study of Indian populations challenging.