Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts

Authors: Naeimeh Atabaki-Pasdar aff001; Mattias Ohlsson aff002; Ana Viñuela aff004; Francesca Frau aff007; Hugo Pomares-Millan aff001; Mark Haid aff008; Angus G. Jones aff009; E. Louise Thomas aff010; Robert W. Koivula aff001; Azra Kurbasic aff001; Pascal M. Mutie aff001; Hugo Fitipaldi aff001; Juan Fernandez aff001; Adem Y. Dawed aff012; Giuseppe N. Giordano aff001; Ian M. Forgie aff012; Timothy J. McDonald aff009; Femke Rutters aff014; Henna Cederberg aff015; Elizaveta Chabanova aff016; Matilda Dale aff017; Federico De Masi aff018; Cecilia Engel Thomas aff017; Kristine H. Allin aff019; Tue H. Hansen aff019; Alison Heggie aff022; Mun-Gwan Hong aff017; Petra J. M. Elders aff023; Gwen Kennedy aff024; Tarja Kokkola aff025; Helle Krogh Pedersen aff019; Anubha Mahajan aff026; Donna McEvoy aff022; Francois Pattou aff027; Violeta Raverdy aff027; Ragna S. Häussler aff017; Sapna Sharma aff028; Henrik S. Thomsen aff016; Jagadish Vangipurapu aff025; Henrik Vestergaard aff019; Leen M. ‘t Hart aff014; Jerzy Adamski aff008; Petra B. Musholt aff035; Soren Brage aff036; Søren Brunak aff018; Emmanouil Dermitzakis aff004; Gary Frost aff038; Torben Hansen aff019; Markku Laakso aff025; Oluf Pedersen aff019; Martin Ridderstråle aff041; Hartmut Ruetten aff007; Andrew T. Hattersley aff009; Mark Walker aff022; Joline W. J. Beulens aff014; Andrea Mari aff043; Jochen M. Schwenk aff017; Ramneek Gupta aff018; Mark I. McCarthy aff011; Ewan R. Pearson aff012; Jimmy D. Bell aff010; Imre Pavo aff046; Paul W. Franks aff001
Author affiliations: Genetic and Molecular Epidemiology Unit, Department of Clinical Sciences, Lund University, Malmö, Sweden aff001; Computational Biology and Biological Physics Unit, Department of Astronomy and Theoretical Physics, Lund University, Lund, Sweden aff002; Center for Applied Intelligent Systems Research, Halmstad University, Halmstad, Sweden aff003; Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland aff004; Institute for Genetics and Genomics in Geneva, University of Geneva Medical School, Geneva, Switzerland aff005; Swiss Institute of Bioinformatics, Geneva, Switzerland aff006; Sanofi-Aventis Deutschland, Frankfurt am Main, Germany aff007; Research Unit Molecular Endocrinology and Metabolism, Helmholtz Zentrum München, Neuherberg, Germany aff008; Institute of Biomedical and Clinical Science, College of Medicine and Health, University of Exeter, Exeter, United Kingdom aff009; Research Centre for Optimal Health, School of Life Sciences, University of Westminster, London, United Kingdom aff010; Oxford Centre for Diabetes, Endocrinology and Metabolism, Radcliffe Department of Medicine, University of Oxford, Oxford, United Kingdom aff011; Division of Population Health and Genomics, School of Medicine, University of Dundee, Ninewells Hospital, Dundee, United Kingdom aff012; Blood Sciences, Royal Devon and Exeter NHS Foundation Trust, Exeter, United Kingdom aff013; Department of Epidemiology and Biostatistics, Amsterdam Public Health Research Institute, Amsterdam UMC, Amsterdam, the Netherlands aff014; Department of Endocrinology, Abdominal Centre, Helsinki University Hospital, Helsinki, Finland aff015; Department of Diagnostic Radiology, Copenhagen University Hospital Herlev Gentofte, Herlev, Denmark aff016; Affinity Proteomics, Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Solna, Sweden aff017;
Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark aff018; Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark aff019; Center for Clinical Research and Prevention, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark aff020; Department of Cardiology and Endocrinology, Slagelse Hospital, Slagelse, Denmark aff021; Institute of Cellular Medicine, Newcastle University, Newcastle upon Tyne, United Kingdom aff022; Department of General Practice, Amsterdam Public Health Research Institute, Amsterdam UMC, Amsterdam, the Netherlands aff023; Immunoassay Biomarker Core Laboratory, School of Medicine, University of Dundee, Ninewells Hospital, Dundee, United Kingdom aff024; Internal Medicine, Institute of Clinical Medicine, University of Eastern Finland, Kuopio, Finland aff025; Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom aff026; University of Lille, Inserm, UMR 1190, Translational Research in Diabetes, Department of Endocrine Surgery, CHU Lille, Lille, France aff027; German Center for Diabetes Research, Neuherberg, Germany aff028; Unit of Molecular Epidemiology, Institute of Epidemiology II, Helmholtz Zentrum München, Neuherberg, Germany aff029; Steno Diabetes Center Copenhagen, Gentofte, Denmark aff030; Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, The Netherlands aff031; Molecular Epidemiology, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands aff032; Lehrstuhl für Experimentelle Genetik, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Technische Universität München, Freising, Germany aff033; Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore aff034; Diabetes Division, Research and Development, Sanofi, Frankfurt, Germany aff035;
MRC Epidemiology Unit, University of Cambridge, Cambridge, United Kingdom aff036; Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark aff037; Section for Nutrition Research, Department of Metabolism, Digestion and Reproduction, Imperial College London, London, United Kingdom aff038; Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark aff039; Department of Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland aff040; Clinical Pharmacology and Translational Medicine, Novo Nordisk, Søborg, Denmark aff041; Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, the Netherlands aff042; Institute of Neuroscience, National Research Council, Padua, Italy aff043; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Oxford, United Kingdom aff044; OMNI Human Genetics, Genentech, South San Francisco, California, United States of America aff045; Eli Lilly Regional Operations, Vienna, Austria aff046; Department of Nutrition, Harvard School of Public Health, Boston, Massachusetts, United States of America aff047
Published in: Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts. PLoS Med 17(6): e1003149. doi:10.1371/journal.pmed.1003149
Category: Research Article
doi: 10.1371/journal.pmed.1003149



Background

Non-alcoholic fatty liver disease (NAFLD) is highly prevalent and causes serious health complications in individuals with and without type 2 diabetes (T2D). Early diagnosis of NAFLD is important, as it can help prevent irreversible damage to the liver and, ultimately, hepatocellular carcinomas. We sought to expand etiological understanding and develop a diagnostic tool for NAFLD using machine learning.

Methods and findings

We used baseline data from IMI DIRECT, a multicenter prospective cohort study of 3,029 European-ancestry adults recently diagnosed with T2D (n = 795) or at high risk of developing the disease (n = 2,234). Multi-omics (genetic, transcriptomic, proteomic, and metabolomic) and clinical (liver enzymes and other serological biomarkers, anthropometry, measures of beta-cell function, insulin sensitivity, and lifestyle) data comprised the key input variables. The models were trained on MRI-image-derived liver fat content (<5% or ≥5%), available for 1,514 participants. We applied LASSO (least absolute shrinkage and selection operator) to select features from the different layers of omics data and random forest analysis to develop the models. The prediction models included clinical and omics variables separately or in combination. A model including all omics and clinical variables yielded a cross-validated receiver operating characteristic area under the curve (ROCAUC) of 0.84 (95% CI 0.82, 0.86; p < 0.001), compared with a ROCAUC of 0.82 (95% CI 0.81, 0.83; p < 0.001) for a model including 9 clinically accessible variables. The IMI DIRECT prediction models outperformed existing noninvasive NAFLD prediction tools. One limitation is that these analyses were performed in adults of European ancestry residing in northern Europe, and it is unknown how well the findings will translate to people of other ancestries or to those exposed to environmental risk factors that differ from those of the present cohort. Another key limitation is that prediction was done on a binary outcome of liver fat quantity (<5% or ≥5%) rather than a continuous one.
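To make the outcome definition and the reported metric concrete, the following is a minimal sketch, not the authors' pipeline (which used LASSO and random forest on the cohort data). It assumes a hypothetical set of six participants with invented liver fat percentages and invented model scores, binarizes liver fat at the 5% cut-off described above, and computes ROC AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function and variable names are illustrative only.

```python
from typing import List, Sequence

# Cut-off taken from the abstract: liver fat <5% versus >=5%.
LIVER_FAT_THRESHOLD = 5.0


def binarize_liver_fat(fat_percent: Sequence[float]) -> List[int]:
    """Return 1 where liver fat is >= 5%, otherwise 0."""
    return [1 if f >= LIVER_FAT_THRESHOLD else 0 for f in fat_percent]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores.

    Each (positive, negative) pair counts 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise; the AUC is the mean over all pairs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study).
liver_fat = [2.1, 7.4, 4.9, 12.0, 5.0, 3.3]          # MRI liver fat, percent
model_scores = [0.20, 0.80, 0.55, 0.90, 0.40, 0.10]  # predicted risk

labels = binarize_liver_fat(liver_fat)
auc = roc_auc(labels, model_scores)
print(labels, round(auc, 3))
```

The pairwise form is quadratic in the number of participants and is used here only because it mirrors the definition directly; for a cohort-sized dataset a rank-based implementation from a statistics library would be the more practical choice.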


Conclusions

In this study, we developed several models with different combinations of clinical and omics data and identified biological features that appear to be associated with liver fat accumulation. In general, the clinical variables showed better prediction ability than the complex omics variables. However, the combination of omics and clinical variables yielded the highest accuracy. We have incorporated the developed clinical models into a web interface and made it available to the community.

Trial registration: ClinicalTrials.gov NCT03814915.

Keywords:

Diabetes mellitus – Fats – Fatty liver – Insulin – Machine learning – Metabolomics – Oral glucose suppression test – Proteomic databases


Internal medicine

Article published in the journal

PLOS Medicine

2020, Issue 6

