Leveraging correlations between variants in polygenic risk scores to detect heterogeneity in GWAS cohorts

Authors: Jie Yuan aff001;  Henry Xing aff001;  Alexandre Louis Lamy aff001;  Todd Lencz aff002;  Itsik Pe’er aff001
Author affiliations: Department of Computer Science, Columbia University, New York, United States of America aff001;  The Center for Psychiatric Neuroscience, Feinstein Institutes for Medical Research, New York, United States of America aff002
Published in: Leveraging correlations between variants in polygenic risk scores to detect heterogeneity in GWAS cohorts. PLoS Genet 16(9): e1009015. doi:10.1371/journal.pgen.1009015
Category: Research Article
doi: 10.1371/journal.pgen.1009015


Evidence from both GWAS and clinical observation has suggested that certain psychiatric, metabolic, and autoimmune diseases are heterogeneous, comprising multiple subtypes with distinct genomic etiologies and Polygenic Risk Scores (PRS). However, whether a given phenotype contains subtypes is frequently unknown. We present CLiP (Correlated Liability Predictors), a method to detect heterogeneity in single GWAS cohorts. CLiP calculates a weighted sum of correlations between SNPs contributing to a PRS on the case/control liability scale. We demonstrate mathematically and through simulation that among i.i.d. homogeneous cases generated by a liability threshold model, significant anti-correlations are expected between otherwise independent predictors due to ascertainment on the hidden liability score. In the presence of heterogeneity from distinct etiologies, confounding by covariates, or mislabeling, these correlation patterns are altered predictably. We further extend our method to two additional association study designs: CLiP-X for quantitative predictors in applications such as transcriptome-wide association, and CLiP-Y for quantitative phenotypes, where there is no clear distinction between cases and controls. Through simulations, we demonstrate that CLiP and its extensions reliably distinguish between homogeneous and heterogeneous cohorts when the PRS explains as little as 3% of variance on the liability scale and cohorts comprise 50,000 to 100,000 samples, an increasingly practical size for modern GWAS. We apply CLiP to heterogeneity detection in schizophrenia cohorts totaling more than 50,000 cases and controls collected by the Psychiatric Genomics Consortium. We observe significant heterogeneity in a mega-analysis of the combined PGC data (p-value 8.54 × 10⁻⁴), as well as in individual cohorts meta-analyzed using Fisher’s method (p-value 0.03), based on significantly associated variants.
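Fisher’s method, used above to combine per-cohort results, can be sketched in a few lines. This is a minimal illustration and not the authors’ code: the function name `fisher_combined_pvalue` and the example p-values are hypothetical. It relies on the standard result that −2 Σ ln pᵢ follows a chi-square distribution with 2k degrees of freedom under the null, whose upper tail has a closed form when the degrees of freedom are even.

```python
import math

def fisher_combined_pvalue(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) is chi-square with 2k degrees of
    freedom under the null. For even degrees of freedom the upper tail is
    exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!, so no external library is needed.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2

    # Accumulate the finite series term by term to avoid large factorials.
    term = 1.0
    series = 1.0
    for j in range(1, k):
        term *= half_stat / j
        series += term
    return math.exp(-half_stat) * series

if __name__ == "__main__":
    # Hypothetical per-cohort p-values, chosen only for illustration.
    example = [0.20, 0.08, 0.35, 0.04]
    print(f"combined p-value: {fisher_combined_pvalue(example):.4f}")
```

With a single input the function returns that p-value unchanged, which is a quick sanity check on the series.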
We also apply CLiP-Y to detect heterogeneity in neuroticism in over 10,000 individuals from the UK Biobank and detect heterogeneity with a p-value of 1.68 × 10⁻⁹. Scores were not significantly reduced when partitioning by known subclusters (“Depression” and “Worry”), suggesting that these factors are not the primary source of observed heterogeneity.
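The anti-correlation among homogeneous cases described in the abstract can be reproduced with a small simulation. The sketch below is an illustration under stated assumptions, not the CLiP implementation: all SNPs are independent with equal allele frequency and equal effect size, the summary is an unweighted mean of pairwise correlations rather than the weighted sum used by CLiP, and every name and parameter value (`simulate_population`, `run_demo`, 10 SNPs, 50% of liability variance explained, 5% prevalence) is hypothetical. Heterogeneity is modeled as mislabeling, by replacing half of the cases with controls.

```python
import numpy as np

def simulate_population(n_pop, n_snps, maf, h2, prevalence, rng):
    """Draw genotypes and case status under a liability threshold model."""
    # Independent SNPs coded as 0/1/2 copies of the risk allele.
    genotypes = rng.binomial(2, maf, size=(n_pop, n_snps)).astype(float)
    # Equal effects, scaled so the score explains a fraction h2 of the
    # unit-variance liability (an assumption made for simplicity).
    snp_var = 2.0 * maf * (1.0 - maf)
    beta = np.sqrt(h2 / (n_snps * snp_var))
    score = beta * (genotypes - 2.0 * maf).sum(axis=1)
    liability = score + rng.normal(0.0, np.sqrt(1.0 - h2), size=n_pop)
    # Individuals in the top `prevalence` fraction of liability are cases.
    threshold = np.quantile(liability, 1.0 - prevalence)
    return genotypes, liability > threshold

def mean_pairwise_correlation(genotypes):
    """Unweighted mean of the correlations over all distinct SNP pairs."""
    corr = np.corrcoef(genotypes, rowvar=False)
    upper = np.triu_indices_from(corr, k=1)
    return float(corr[upper].mean())

def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    genotypes, is_case = simulate_population(
        n_pop=200_000, n_snps=10, maf=0.3, h2=0.5, prevalence=0.05, rng=rng
    )
    cases = genotypes[is_case]
    controls = genotypes[~is_case]

    # Heterogeneous "case" set: half true cases, half mislabeled controls.
    n_half = len(cases) // 2
    picked = rng.choice(len(controls), size=n_half, replace=False)
    mixed = np.vstack([cases[:n_half], controls[picked]])

    return {
        "homogeneous_cases": mean_pairwise_correlation(cases),
        "heterogeneous_cases": mean_pairwise_correlation(mixed),
    }

if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:+.4f}")
```

In this setup the homogeneous cases should show a clearly negative mean correlation, because selecting on high liability makes a high value at one SNP less necessary given a high value at another. Mixing in mislabeled controls shifts the mean correlation upward, which is the kind of altered pattern a heterogeneity test looks for.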

Keywords:

Gene expression – Genome-wide association studies – Genomics – Medical risk factors – Normal distribution – Polynomials – Schizophrenia – Single nucleotide polymorphisms


Article published in

PLOS Genetics

2020, Issue 9