Leveraging expression from multiple tissues using sparse canonical correlation analysis and aggregate tests improves the power of transcriptome-wide association studies

Authors: Helian Feng aff001;  Nicholas Mancuso aff003;  Alexander Gusev aff005;  Arunabha Majumdar aff008;  Megan Major aff010;  Bogdan Pasaniuc aff008;  Peter Kraft aff001
Author affiliations: Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America aff001;  Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America aff002;  Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America aff003;  Division of Biostatistics, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America aff004;  Department of Medical Oncology, Dana-Farber Cancer Institute & Harvard Medical School, Boston, Massachusetts, United States of America aff005;  Division of Genetics, Brigham & Women’s Hospital, Boston, MA, United States of America aff006;  Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, United States of America aff007;  Department of Human Genetics, University of California Los Angeles, Los Angeles, California, United States of America aff008;  Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, California, United States of America aff009;  Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, California, United States of America aff010
Published in: Leveraging expression from multiple tissues using sparse canonical correlation analysis and aggregate tests improves the power of transcriptome-wide association studies. PLoS Genet 17(4): e1008973. doi:10.1371/journal.pgen.1008973
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1008973


Transcriptome-wide association studies (TWAS) test the association between traits and genetically predicted gene expression levels. The power of a TWAS depends in part on the strength of the correlation between a genetic predictor of gene expression and the causally relevant gene expression values. Consequently, TWAS power can be low when the expression quantitative trait locus (eQTL) data used to train the genetic predictors have small sample sizes, or when data from causally relevant tissues are not available. Here, we propose to address these issues by integrating multiple tissues in the TWAS using sparse canonical correlation analysis (sCCA). We show that sCCA-TWAS combined with single-tissue TWAS using an aggregate Cauchy association test (ACAT) outperforms traditional single-tissue TWAS. In empirically motivated simulations, the sCCA+ACAT approach yielded the highest power to detect a gene associated with phenotype, even when expression in the causal tissue was not directly measured, while controlling the Type I error when there is no association between gene expression and phenotype. For example, when gene expression explains 2% of the variability in outcome and the GWAS sample size is 20,000, the average power difference between the ACAT test combining sCCA features with single-tissue tests and each alternative was 5% versus single-tissue tests combined with the Generalized Berk-Jones (GBJ) method, 8% versus single-tissue tests combined with S-MultiXcan, 5% versus UTMOST, and 38% versus summarizing cross-tissue expression patterns using Principal Component Analysis (PCA). The gain in power likely arises because sCCA cross-tissue features are more often detectably heritable. When applied to publicly available summary statistics from 10 complex traits, the sCCA+ACAT test increased the number of testable genes and identified on average 400 additional gene-trait associations that single-tissue TWAS missed.
Our results suggest that aggregating eQTL data across multiple tissues using sCCA can improve the sensitivity of TWAS while controlling for the false positive rate.
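
To make the aggregation step concrete, the following is a minimal sketch of a Cauchy combination of p-values, the idea behind ACAT. It assumes the commonly described form of the statistic, in which each p-value is mapped to a Cauchy variate via tan((0.5 - p)π), the variates are averaged with non-negative weights, and the result is converted back to a p-value. The function name `acat_combine`, the equal default weights, and the example p-values are illustrative assumptions, not the authors' implementation.

```python
import math


def acat_combine(p_values, weights=None):
    """Combine p-values with a Cauchy combination (ACAT-style) test.

    Assumption: equal weights unless given; p-values strictly in (0, 1).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")

    total_weight = sum(weights)
    # Weighted average of Cauchy-transformed p-values.
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for p, w in zip(p_values, weights)
    ) / total_weight
    # Upper tail of the standard Cauchy distribution at the statistic.
    return 0.5 - math.atan(statistic) / math.pi


# Hypothetical p-values for one gene: two single-tissue tests and one
# sCCA cross-tissue feature (made-up numbers for illustration only).
combined = acat_combine([0.20, 0.65, 0.003])
print(f"combined p-value: {combined:.4g}")
```

Because the Cauchy transform is dominated by the smallest p-values, a single strong signal among many null tests still yields a small combined p-value, which is one reason this kind of test suits combining sCCA features with single-tissue results.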

Keywords:

Body weight – Gene expression – Genetic polymorphism – Genetics – Genome-wide association studies – Heredity – Phenotypes – Research errors



Article published in the journal

PLOS Genetics

2021, Issue 4