Pan-genomic open reading frames: A potential supplement of single nucleotide polymorphisms in estimation of heritability and genomic prediction

Authors: Zhengcao Li aff001;  Henner Simianer aff001
Author affiliations: Animal Breeding and Genetics Group, Center for Integrated Breeding Research, Department of Animal Sciences, University of Goettingen, Goettingen, Germany aff001;  State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China aff002
Published in: Pan-genomic open reading frames: A potential supplement of single nucleotide polymorphisms in estimation of heritability and genomic prediction. PLoS Genet 16(8): e1008995. doi:10.1371/journal.pgen.1008995
Category: Research Article
doi: 10.1371/journal.pgen.1008995


Pan-genomic open reading frames (ORFs) potentially carry protein-coding gene or coding variant information in a population. In this study, we suggest that pan-genomic ORFs are a promising source of information for estimating heritability and for genomic prediction. A Saccharomyces cerevisiae dataset with whole-genome SNPs, pan-genomic ORFs, and the copy numbers of those ORFs is used to test the effectiveness of ORF data as a predictor in three prediction models for 35 traits. Our results show that ORF-based heritability captures more genetic effects than SNP-based heritability for all traits. Compared to SNP-based genomic prediction (GBLUP), pan-genomic ORF-based genomic prediction (OBLUP) is distinctly more accurate for all traits, and predictive abilities are on average more than doubled across all traits. For four traits, prediction based on ORF copy number (CBLUP) is more accurate than OBLUP. When different numbers of isolates are used in the training sets for ORF-based prediction, the predictive abilities for all traits increase as more isolates are added, suggesting that with very large training sets the prediction accuracy will be in the range of the square root of the heritability. We conclude that pan-genomic ORFs have the potential to supplement single nucleotide polymorphisms in estimation of heritability and genomic prediction.
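To make the comparison between predictors concrete, the following is a minimal sketch of how a BLUP-type prediction from ORF presence/absence data could look. It is not the authors' implementation: the abstract does not spell out how the relationship matrix is built or how variance components are estimated, so the sketch assumes a VanRaden-style centred and scaled cross-product, a known heritability `h2`, and a plain training mean as the only fixed effect. The data are simulated, and all names (`relationship_matrix`, `blup_predict`, `X`, `K`) are hypothetical.

```python
import numpy as np

def relationship_matrix(X):
    """Build an isolate-by-isolate relationship matrix from a 0/1 matrix.

    X has one row per isolate and one column per feature (e.g. ORF
    presence/absence). Assumption: VanRaden-style centring and scaling.
    """
    X = np.asarray(X, dtype=float)
    p = X.mean(axis=0)                 # frequency of each feature
    keep = (p > 0) & (p < 1)           # monomorphic columns carry no information
    Z = X[:, keep] - p[keep]           # centre each column
    return Z @ Z.T / np.sum(p[keep] * (1 - p[keep]))

def blup_predict(K, y, train, test, h2):
    """Predict test phenotypes from training phenotypes via BLUP.

    Assumption: h2 is treated as known; in practice it would be
    estimated (e.g. by REML) from the training data.
    """
    lam = (1 - h2) / h2                # ratio of residual to genetic variance
    mu = y[train].mean()               # simple mean as the only fixed effect
    K_tt = K[np.ix_(train, train)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + K[np.ix_(test, train)] @ alpha

# Simulated example (not the yeast data from the study)
rng = np.random.default_rng(0)
n, m, h2 = 300, 100, 0.6
X = rng.binomial(1, 0.3, size=(n, m))          # hypothetical presence/absence matrix
g = X @ rng.normal(size=m)                     # simulated genetic values
g = (g - g.mean()) / g.std() * np.sqrt(h2)     # scale genetic variance to h2
y = g + rng.normal(scale=np.sqrt(1 - h2), size=n)

train, test = np.arange(250), np.arange(250, n)
K = relationship_matrix(X)
y_hat = blup_predict(K, y, train, test, h2)

# Predictive ability: correlation between predicted and observed phenotypes
print(np.corrcoef(y_hat, y[test])[0, 1])
```

Swapping the presence/absence matrix for SNP genotypes or for ORF copy numbers changes only the input to `relationship_matrix` (with a scaling suited to that coding), which is the sense in which the three models differ only in their predictor. Because the correlation is taken against noisy phenotypes, its expected upper limit is the square root of the heritability, consistent with the trend described for growing training sets.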

Keywords:

Gene prediction – Genetics – Genomics – Heredity – Human genomics – Principal component analysis – Saccharomyces cerevisiae – Single nucleotide polymorphisms


Article published in the journal

PLOS Genetics

2020 Issue 8