The Roma population is the largest transnational ethnic minority in Europe and is characterized by linguistic, cultural and historical heterogeneity. Comparative linguistics and genetic studies have placed the origin of the European Roma in Northwest India. After migrating across Persia, they entered the Balkan Peninsula, from where they spread into Europe, arriving in the Iberian Peninsula in the 15th century. Their particular demographic history has genetic implications linked to rare and common diseases. However, the South Asian source of the proto-Roma has not yet been identified, and the West Eurasian Roma component has not yet been characterized in depth. Here, in order to describe both the South Asian and West Eurasian ancestries, we analyze previously published genome-wide data from 152 European Roma together with 34 new Iberian Roma samples at a fine-scale, haplotype-based level, with a special focus on the genetic substructure of the Iberian Roma. Our results suggest that the putative origin of the proto-Roma involves a Punjabi group with low levels of West Eurasian ancestry. In addition, we have identified a complex West Eurasian component (around 65%) in the Roma, resulting from admixture events with non-proto-Roma populations that occurred between 1270 and 1580. In particular, we have detected the Balkan genetic footprint in all European Roma, and Baltic and Iberian components in the Northern and Western Roma groups, respectively. Finally, our results show genetic substructure within the Iberian Roma, with different levels of West Eurasian admixture, resulting from the complex historical events that occurred in the Peninsula.


