Population genetic models of GERP scores suggest pervasive turnover of constrained sites across mammalian evolution

Authors: Christian D. Huber aff001; Bernard Y. Kim aff002; Kirk E. Lohmueller aff003
Author affiliations: School of Biological Sciences, University of Adelaide, Adelaide, South Australia, Australia aff001; Department of Biology, Stanford University, Stanford, California, United States of America aff002; Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, United States of America aff003; Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America aff004; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America aff005
Published in: Population genetic models of GERP scores suggest pervasive turnover of constrained sites across mammalian evolution. PLoS Genet 16(5): e1008827. doi:10.1371/journal.pgen.1008827
Category: Research Article
doi: 10.1371/journal.pgen.1008827


Comparative genomic approaches have been used to identify sites where mutations are under purifying selection and of functional consequence by searching for sequences that are conserved across distantly related species. However, the performance of these approaches has not been rigorously evaluated under population genetic models. Further, short-lived functional elements may not leave a footprint of sequence conservation across many species. We use simulations to study how one measure of conservation, the Genomic Evolutionary Rate Profiling (GERP) score, relates to the strength of selection (Nes). We show that the GERP score is related to the strength of purifying selection. However, changes in selection coefficients or functional elements over time (i.e. functional turnover) can strongly affect the GERP distribution, leading to unexpected relationships between GERP and Nes. Further, we show that for functional elements that have a high turnover rate, adding more species to the analysis does not necessarily increase statistical power. Finally, we use the distribution of GERP scores across the human genome to compare models with and without turnover of sites where mutations are under purifying selection. We show that mutations in 4.51% of the noncoding human genome are under purifying selection and that most of this sequence has likely experienced changes in selection coefficients throughout mammalian evolution. Our work reveals limitations to using comparative genomic approaches to identify deleterious mutations. Commonly used GERP score thresholds miss over half of the noncoding sites in the human genome where mutations are under purifying selection.
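
The link between conservation and selection strength described in the abstract can be illustrated with a small numerical sketch. This is not the simulation framework used in the study. It combines two textbook ingredients under assumptions that are ours: (1) Kimura's diffusion result for the substitution rate of a selected site relative to a neutral one, written with a scaled coefficient gamma = 4 * Nes (one common diploid, additive convention; other conventions use 2 * Nes), and (2) the GERP "rejected substitutions" idea, taken here as the expected neutral number of substitutions minus the number expected under selection. The parameter `constrained_fraction` is a hypothetical, deliberately crude stand-in for functional turnover: the share of the tree's branch length during which the site is under selection. The neutral tree length of 6.0 is an illustrative value only.

```python
import math


def relative_substitution_rate(ne_s: float) -> float:
    """Substitution rate of a selected site relative to a neutral site.

    Assumes gamma = 4 * Ne * s (diploid, additive selection); this scaling
    is an assumption of the sketch, not a value taken from the article.
    Negative ne_s means deleterious mutations.
    """
    gamma = 4.0 * ne_s
    if gamma == 0.0:
        return 1.0  # neutral limit
    if gamma > 0.0:
        # gamma / (1 - exp(-gamma)), using expm1 for numerical accuracy
        return gamma / -math.expm1(-gamma)
    # Deleterious case, rewritten so that large |gamma| cannot overflow:
    # gamma / (1 - exp(-gamma)) == a * exp(-a) / (1 - exp(-a)) with a = -gamma
    a = -gamma
    return a * math.exp(-a) / -math.expm1(-a)


def expected_rs(ne_s: float, neutral_tree_length: float,
                constrained_fraction: float = 1.0) -> float:
    """Expected rejected substitutions (neutral expectation minus selected).

    constrained_fraction is a hypothetical turnover knob: the share of the
    tree's branch length during which the site is under selection. On the
    remaining branches the site is treated as neutral.
    """
    if not 0.0 <= constrained_fraction <= 1.0:
        raise ValueError("constrained_fraction must lie in [0, 1]")
    omega = relative_substitution_rate(ne_s)
    return neutral_tree_length * constrained_fraction * (1.0 - omega)


if __name__ == "__main__":
    tree_length = 6.0  # illustrative neutral substitutions per site
    for ne_s in (-0.1, -1.0, -10.0):
        for fraction in (1.0, 0.25):
            rs = expected_rs(ne_s, tree_length, fraction)
            print(f"Nes={ne_s:6.1f}  constrained fraction={fraction:4.2f}  "
                  f"expected RS={rs:5.2f}")
```

In this toy model a strongly selected site that is constrained on only a quarter of the tree has a lower expected score than a weakly selected site constrained everywhere, so a fixed score cutoff can miss it. That is the qualitative effect of turnover the abstract describes; the numbers themselves carry no empirical weight.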

Keywords:

Comparative genomics – Deletion mutation – Genome evolution – Human genomics – Natural selection – Phylogenetic analysis – Sequence alignment – Substitution mutation


Article published in the journal

PLOS Genetics

2020, Issue 5