Analysis and Performance Assessment of the Whole Genome Bisulfite Sequencing Data Workflow: Currently Available Tools and a Practical Guide to Advance DNA Methylation Studies
Ting Gong
Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii at Manoa, Honolulu, HI, 96813 USA
Heather Borgard
Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii at Manoa, Honolulu, HI, 96813 USA
Zao Zhang
Department of Medicine, The Queen's Medical Center, Honolulu, HI, 96813 USA
Shaoqiu Chen
Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii at Manoa, Honolulu, HI, 96813 USA
Zitong Gao
Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii at Manoa, Honolulu, HI, 96813 USA
Corresponding Author
Youping Deng
Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii at Manoa, Honolulu, HI, 96813 USA
E-mail: [email protected]
Abstract
DNA methylation is associated with transcriptional repression, genomic imprinting, stem cell differentiation, embryonic development, and inflammation. Aberrant DNA methylation can indicate disease states, including cancer and neurological disorders. The prevalence and location of 5-methylcytosine (5mC) in the human genome are therefore of considerable interest. Whole-genome bisulfite sequencing (WGBS) is a high-throughput method for profiling DNA methylation at single-base resolution. A WGBS workflow involves library preparation, read alignment, and quality control. Advances in epigenetic technology have led to an increase in DNA methylation studies. This review details the experimental methodology of WGBS and compares accessible, up-to-date analysis tools. Through a comprehensive case study, it provides practical code for WGBS data processing as a general guide to advance DNA methylation studies.
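To make the principle behind WGBS data processing concrete, the following is a minimal, hypothetical Python sketch; it is not the pipeline used in this review. Bisulfite treatment converts unmethylated cytosines to uracil, which is read as thymine after PCR, while 5mC is protected and still read as cytosine. The methylation level at a CpG can therefore be estimated as C / (C + T) over the reads covering it. The sketch assumes a simplified model: forward strand only, reads already aligned gap-free to the start of the reference, and no sequencing errors or C/T SNPs. The function names are illustrative only.

```python
from __future__ import annotations


def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Return the forward-strand read expected after bisulfite conversion.

    Simplified model (assumption): every unmethylated C converts to T,
    every methylated C is fully protected.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")  # unmethylated C -> U -> read as T after PCR
        else:
            out.append(base)  # methylated C and non-C bases are unchanged
    return "".join(out)


def cpg_sites(ref: str) -> list[int]:
    """0-based positions of the C in each forward-strand CpG dinucleotide."""
    ref = ref.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def methylation_levels(ref: str, reads: list[str]) -> dict[int, float]:
    """Per-CpG methylation level, C / (C + T), over reads aligned at offset 0."""
    levels = {}
    for pos in cpg_sites(ref):
        c = sum(1 for r in reads if r[pos] == "C")  # protected (methylated)
        t = sum(1 for r in reads if r[pos] == "T")  # converted (unmethylated)
        if c + t > 0:
            levels[pos] = c / (c + t)
    return levels


if __name__ == "__main__":
    ref = "ACGTTCGACG"  # toy reference with CpGs at positions 1, 5 and 8
    # Three hypothetical molecules with different methylation patterns.
    reads = [
        bisulfite_convert(ref, {1, 5}),
        bisulfite_convert(ref, {1}),
        bisulfite_convert(ref, set()),
    ]
    for pos, level in methylation_levels(ref, reads).items():
        print(f"CpG at {pos}: {level:.2f}")
    # prints: CpG at 1: 0.67 / CpG at 5: 0.33 / CpG at 8: 0.00
```

Dedicated bisulfite aligners and methylation extractors additionally handle both strands, non-CpG contexts, mismatches, and coverage thresholds; the sketch only illustrates the C / (C + T) estimate that underlies methylation calling.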
Conflict of Interest
The authors declare no conflict of interest.
Open Research
Data Availability Statement
The raw WGBS read dataset analyzed during the current study is available from the corresponding author upon reasonable request. The Homo sapiens genome assembly GRCh38 (hg38) was obtained from https://hgdownload.soe.ucsc.edu/downloads.html.