Prioritizing sequence variants in conserved non-coding elements in the chicken genome using chCADD


Authors: Christian Groß aff001; Chiara Bortoluzzi aff003; Dick de Ridder aff001; Hendrik-Jan Megens aff003; Martien A. M. Groenen aff003; Marcel Reinders aff002; Mirte Bosse aff003
Author affiliations: Bioinformatics Group, Wageningen University & Research, 6708 PB, Wageningen, The Netherlands aff001; Delft Bioinformatics Lab, University of Technology Delft, 2600 GA, Delft, The Netherlands aff002; Animal Breeding and Genomics Group, Wageningen University & Research, 6708 PB, Wageningen, The Netherlands aff003
Published in: Prioritizing sequence variants in conserved non-coding elements in the chicken genome using chCADD. PLoS Genet 16(9): e1009027. doi:10.1371/journal.pgen.1009027
Category: Research Article
doi: 10.1371/journal.pgen.1009027

Abstract

The availability of genomes for many species has advanced our understanding of the non-protein-coding fraction of the genome. Comparative genomics has proven to be an invaluable approach for the systematic, genome-wide identification of conserved non-protein-coding elements (CNEs). However, for many non-mammalian model species, including chicken, our ability to interpret the functional importance of variants overlapping CNEs has been limited by current genomic annotations, which rely on a single information type (e.g. conservation). Here, we studied CNEs in chicken using a combination of population genomics and comparative genomics. To investigate the functional importance of variants found in CNEs, we developed a ch(icken) Combined Annotation-Dependent Depletion (chCADD) model, a variant effect prediction tool first introduced for humans and later for mouse and pig. We show that 73 Mb of the chicken genome has been conserved across more than 280 million years of vertebrate evolution. The vast majority of the conserved elements are in non-protein-coding regions, which display SNP densities and allele frequency distributions characteristic of genomic regions constrained by purifying selection. By annotating SNPs with the chCADD score, we are able to pinpoint specific subregions of the CNEs that are of higher functional importance, as supported by the finding that SNPs in these subregions are associated with known disease genes in humans, mice, and rats. Taken together, our findings indicate that CNEs harbor variants of functional significance that should be the object of further investigation along with protein-coding mutations. We therefore anticipate that chCADD will be of great use to the scientific community and breeding companies in future functional studies in chicken.

Keywords:

Bird genomics – Genome annotation – Genomics – Chickens – Invertebrate genomics – Mammalian genomics – Sequence alignment – Single nucleotide polymorphisms




Article published in the journal

PLOS Genetics


2020, Issue 9
