Is reliance on an inaccurate genome sequence sabotaging your experiments?

Authors: Rodrigo P. Baptista; Jessica C. Kissinger
Author affiliations: Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, Georgia, United States of America; Institute of Bioinformatics, University of Georgia, Athens, Georgia, United States of America; Department of Genetics, University of Georgia, Athens, Georgia, United States of America
Published in: Is reliance on an inaccurate genome sequence sabotaging your experiments? PLoS Pathog 15(9): e1007901. doi:10.1371/journal.ppat.1007901
Category: Pearls


Advances in genomics have made whole-genome studies increasingly feasible across the life sciences. However, new technologies and algorithmic advances do not guarantee flawless genome sequences or annotations. Bias, errors, and artifacts can enter at any stage of the process, from library preparation to annotation. When planning an experiment whose design is based on a genome sequence, a few basic checks can better inform that design and, ideally, help avoid a failed experiment or an inconclusive result.
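The abstract does not list the checks themselves. As one illustration only (not necessarily a check the authors describe), a minimal Python sketch can scan a target region for assembly gaps (runs of N) and other IUPAC ambiguity codes before primers, probes, or guide RNAs are designed against it. The sketch assumes the region is available as FASTA text. The helper names `parse_fasta`, `ambiguous_runs`, and `report` are hypothetical.

```python
import re

# Any character other than an unambiguous base: N (gap or unknown base),
# IUPAC ambiguity codes such as R or Y, gap symbols, and so on.
NON_ACGT = re.compile(r"[^ACGT]+")


def parse_fasta(text):
    """Return {record_id: uppercase sequence} from FASTA-formatted text (hypothetical helper)."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].strip()
            # Use the first word of the header as the record ID.
            name = header.split()[0] if header else "unnamed"
            chunks = []
        elif name is not None:
            # Uppercase so that soft-masked (lowercase) bases count as ordinary bases.
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def ambiguous_runs(seq):
    """Yield (start, end, bases) for each run of non-ACGT characters.

    Coordinates are 0-based and end-exclusive.
    """
    for match in NON_ACGT.finditer(seq.upper()):
        yield match.start(), match.end(), match.group()


def report(fasta_text):
    """Print, for each record, where the sequence is not an unambiguous base."""
    for name, seq in parse_fasta(fasta_text).items():
        runs = list(ambiguous_runs(seq))
        total = sum(end - start for start, end, _ in runs)
        print(f"{name}: length {len(seq)}, {len(runs)} ambiguous run(s), {total} base(s)")
        for start, end, bases in runs:
            shown = bases if len(bases) <= 10 else bases[:10] + "..."
            print(f"  {start}-{end}: {shown}")


if __name__ == "__main__":
    # Toy region: a 10-base N gap followed later by a single R (A or G) code.
    example = ">contig_1 toy example\n" + "ACGTACGT" + "N" * 10 + "ACGTRACGT" + "\n"
    report(example)
    # contig_1: length 27, 2 ambiguous run(s), 11 base(s)
    #   8-18: NNNNNNNNNN
    #   22-23: R
```

This check inspects only the reference sequence itself. It cannot reveal a base that was miscalled as a different unambiguous base. Catching that kind of error requires comparing the reference against sequencing reads or an independent assembly.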

Keywords:

Biology and life sciences – Computational biology – Genomic libraries – Genetics – Genomics – Genome analysis – Sequence assembly tools – Genome annotation – Molecular biology – Molecular biology techniques – DNA construction – DNA library construction – Genomic library construction – Molecular biology assays and analysis techniques – Library screening – Genomic library screening – Gene mapping – Organisms – Eukaryota – Protozoans – Parasitic protozoans – Cryptosporidium – Cryptosporidium parvum – Research and analysis methods – Database and informatics methods – Bioinformatics – Sequence analysis – Sequence alignment



The article was published in PLOS Pathogens, 2019, Issue 9.