A general framework for functionally informed set-based analysis: Application to a large-scale colorectal cancer study

Authors: Xinyuan Dong aff001; Yu-Ru Su aff001; Richard Barfield aff001; Stephanie A. Bien aff001; Qianchuan He aff001; Tabitha A. Harrison aff001; Jeroen R. Huyghe aff001; Temitope O. Keku aff003; Noralane M. Lindor aff004; Clemens Schafmayer aff005; Andrew T. Chan aff006; Stephen B. Gruber aff007; Mark A. Jenkins aff008; Charles Kooperberg aff001; Ulrike Peters aff001; Li Hsu aff001
Author affiliations: Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA aff001; Department of Biostatistics, University of Washington, Seattle, WA, USA aff002; Center for Gastrointestinal Biology and Disease, University of North Carolina, Chapel Hill, North Carolina, USA aff003; Department of Health Science Research, Mayo Clinic, Scottsdale, Arizona, USA aff004; Department of General Surgery, University Hospital Rostock, Rostock, Germany aff005; Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, and Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA aff006; City of Hope National Medical Center, Duarte, and Department of Preventive Medicine & USC Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, California, USA aff007; Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Victoria, Australia aff008
Published in: A general framework for functionally informed set-based analysis: Application to a large-scale colorectal cancer study. PLoS Genet 16(8): e1008947. doi:10.1371/journal.pgen.1008947
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1008947


Genome-wide association studies (GWAS) have successfully identified tens of thousands of genetic variants associated with various phenotypes, but together these variants explain only a fraction of heritability, suggesting that many variants have yet to be discovered. It has recently been recognized that incorporating functional information on genetic variants can improve power for identifying novel loci. For example, S-PrediXcan and TWAS test the association of predicted gene expression with phenotypes using GWAS summary statistics, leveraging information on the genetic regulation of gene expression, and have found many novel loci. However, because genetic variants may affect more than one gene and act through different mechanisms, these methods likely capture only part of the total effect of these variants. In this paper, we propose a summary statistics-based mixed effects score test (sMiST) that tests for the total effect of a set of variants: the effect mediated through genetically predicted gene expression, imputed as in S-PrediXcan and TWAS, together with the direct effects of individual variants. sMiST allows for multiple functional annotations and multiple genetically predicted mediators. It can also perform conditional association analysis while adjusting for other genetic variants (e.g., known loci for the phenotype). Extensive simulation and real data analyses demonstrate that sMiST yields p-values that agree well with those obtained from individual-level data, but with substantially improved computational speed. Importantly, broad application of sMiST to GWAS is possible, as only summary statistics of genetic variant associations are required. We apply sMiST to a large-scale GWAS of colorectal cancer using summary statistics from ∼120,000 study participants and gene expression data from the Genotype-Tissue Expression (GTEx) project. We identify several novel and secondary independent genetic loci.
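To make the idea of a "total effect" test from summary statistics concrete, the following is a minimal sketch and not the authors' sMiST implementation. It assumes per-variant GWAS z-scores z, a variant correlation (LD) matrix R, and expression prediction weights w for a single gene, all on standardized genotypes. The mediated component uses the familiar TWAS-style statistic w'z / sqrt(w'Rw); the direct component is a variance-component-type statistic on the part of z that is uncorrelated with w'z under the null; the two p-values are combined with Fisher's method. The function names, the toy numbers, and the Monte Carlo null are illustrative assumptions only.

```python
import math
import numpy as np


def predicted_expression_z(z, R, w):
    """TWAS-style z-score for genetically predicted expression: w'z / sqrt(w'Rw)."""
    return float(w @ z) / math.sqrt(float(w @ R @ w))


def two_sided_normal_pvalue(zscore):
    """Two-sided p-value of a standard normal z-score."""
    return math.erfc(abs(zscore) / math.sqrt(2.0))


def direct_effect_pvalue(z, R, w, n_sim=20000, seed=0):
    """Variance-component-type test for effects not carried by w'z.

    The null distribution (a weighted sum of 1-df chi-squares) is
    approximated by seeded Monte Carlo; this is an illustrative choice,
    not necessarily what sMiST uses.
    """
    Rw = R @ w
    wRw = float(w @ Rw)
    z_res = z - Rw * float(w @ z) / wRw       # uncorrelated with w'z under the null
    cov_res = R - np.outer(Rw, Rw) / wRw      # null covariance of z_res
    q = float(z_res @ z_res)
    lam = np.clip(np.linalg.eigvalsh(cov_res), 0.0, None)
    rng = np.random.default_rng(seed)
    sims = (rng.standard_normal((n_sim, lam.size)) ** 2) @ lam
    return (1 + int(np.sum(sims >= q))) / (1 + n_sim)


def fisher_combine(p1, p2):
    """Fisher's method for two independent p-values.

    -2*log(p1*p2) is chi-square with 4 df under the null, whose tail
    probability has the closed form p1*p2*(1 - log(p1*p2)).
    """
    prod = p1 * p2
    if prod == 0.0:
        return 0.0
    return prod * (1.0 - math.log(prod))


# Toy inputs (made up for illustration): three variants in one gene region.
z = np.array([2.5, 1.8, -0.4])
R = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
w = np.array([0.5, 0.3, 0.0])

p_mediated = two_sided_normal_pvalue(predicted_expression_z(z, R, w))
p_direct = direct_effect_pvalue(z, R, w)
print("mediated p:", p_mediated)
print("direct p:", p_direct)
print("combined p:", fisher_combine(p_mediated, p_direct))
```

The two component statistics are built to be uncorrelated under the null, which is what makes a Fisher-type combination reasonable in this sketch. A real analysis would need a more accurate tail approximation for small p-values than Monte Carlo can provide.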

Keywords:

Colorectal cancer – Covariance – Gene expression – Gene prediction – Genetic loci – Genetics – Genome-wide association studies – Test statistics


Article published in the journal PLOS Genetics, 2020, Issue 8