Phenotype-genotype comorbidity analysis of patients with rare disorders provides insight into their pathological and molecular bases

Authors: Elena Díaz-Santiago aff001; Fernando M. Jabato aff001; Elena Rojano aff001; Pedro Seoane aff001; Florencio Pazos aff003; James R. Perkins aff001; Juan A. G. Ranea aff001
Author affiliations: Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain aff001; CIBER de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain aff002; National Centre for Biotechnology (CNB-CSIC), Madrid, Spain aff003; The Biomedical Research Institute of Malaga (IBIMA), Malaga, Spain aff004
Published in: PLoS Genet. 2020;16(10):e1009054. doi:10.1371/journal.pgen.1009054
Category: Research Article


Genetic and molecular analysis of rare diseases is hampered by the small number of affected patients. Phenotypic comorbidity analysis can help address this by combining information from individuals with similar phenotypes and looking for overlap in terms of shared genes and underlying functional systems. However, few studies have combined comorbidity analysis with genomic data. We present a computational approach that connects patient phenotypes based on their co-occurrence across patients and uses genomic information on the patients' mutations to assign genes to each phenotype; these genes are then used to detect enriched functional systems. The connected phenotypes are clustered using network analysis to obtain functionally coherent phenotype clusters. We applied the approach to the DECIPHER database, which contains phenotypic and genomic information for thousands of patients with heterogeneous rare disorders and copy number variants. Validity was demonstrated through overlap with known diseases, co-mention within the biomedical literature, semantic similarity measures, and patient cluster membership. The connected phenotype pairs formed multiple functionally coherent clusters that mapped to genes and systems involved in similar pathological processes. Examples include claudin genes from the 22q11 genomic region, associated with a cluster of phenotypes related to DiGeorge syndrome, and genes annotated with the GO term anterior/posterior pattern specification, associated with abnormal development. The clusters generated can help with the diagnosis of rare diseases by suggesting additional phenotypes for a given patient and potential underlying functional systems. Other tools for finding causal genes based on phenotype were also investigated. The approach has been implemented as a workflow, named PhenCo, which can be adapted to any set of patients for which phenomic and genomic data are available.
Full details of the analysis, including the clusters formed, their constituent functional systems and underlying genes, are given. Code to implement the workflow is available from GitHub.
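The abstract does not state which statistics PhenCo uses, so the following is only a minimal sketch of the first two steps under stated assumptions. It assumes phenotype pairs are connected with a one-sided hypergeometric test on the number of patients sharing both phenotypes, and that a gene is assigned to a phenotype when CNVs covering it appear in at least `min_patients` patients with that phenotype. The function names, the thresholds `alpha` and `min_patients`, and the input layout (patient ID mapped to a set of phenotype terms or a set of CNV-overlapping genes) are hypothetical. This is not the authors' GitHub code.

```python
# Hypothetical sketch of two PhenCo-like steps; not the published implementation.
from collections import defaultdict
from itertools import combinations
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible terms contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def phenotype_pairs(patients, alpha=0.01):
    """Connect phenotype pairs whose co-occurrence is unexpectedly high.

    patients: dict mapping patient ID -> set of phenotype terms (e.g. HPO IDs).
    Returns {(term_a, term_b): p_value} for pairs with p_value < alpha.
    """
    N = len(patients)
    carriers = defaultdict(set)  # phenotype term -> patients showing it
    for pid, terms in patients.items():
        for term in terms:
            carriers[term].add(pid)

    pairs = {}
    for a, b in combinations(sorted(carriers), 2):
        shared = len(carriers[a] & carriers[b])
        if shared == 0:
            continue  # the two phenotypes never co-occur
        p = hypergeom_sf(shared, N, len(carriers[a]), len(carriers[b]))
        if p < alpha:
            pairs[(a, b)] = p
    return pairs


def genes_for_phenotype(patients, patient_genes, term, min_patients=2):
    """Genes overlapped by CNVs in at least `min_patients` patients with `term`.

    patient_genes: dict mapping patient ID -> set of genes overlapped by that
    patient's CNVs (assumed to be precomputed from CNV coordinates).
    """
    counts = defaultdict(int)
    for pid, terms in patients.items():
        if term in terms:
            for gene in patient_genes.get(pid, ()):
                counts[gene] += 1
    return {gene for gene, c in counts.items() if c >= min_patients}
```

With ten toy patients, three of whom share phenotypes `pheno_A` and `pheno_B`, the pair gets p = 1/120 and is kept at `alpha=0.01`. A real analysis over all phenotype pairs would also need a multiple-testing correction, which this sketch omits.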

Keywords:

Gene mapping – Gene prediction – Genetics of disease – Genomics – Homeobox – Human genetics – Mutation databases – Phenotypes

Published in PLOS Genetics, 2020, Issue 10.