Orthology Guided Assembly in highly heterozygous crops: creating a reference transcriptome to uncover genetic diversity in Lolium perenne
Corresponding Author
Tom Ruttink
Plant Sciences Unit – Growth and Development, Institute for Agricultural and Fisheries Research (ILVO), Melle, Belgium
Correspondence (fax +32(0)9 272 29 01; email [email protected])
Lieven Sterck
Department of Plant Systems Biology, VIB, Ghent, Belgium
Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
Antje Rohde
Plant Sciences Unit – Growth and Development, Institute for Agricultural and Fisheries Research (ILVO), Melle, Belgium
Christian Bendixen
Department of Molecular Biology and Genetics, Research Centre Foulum, Aarhus University, Tjele, Denmark
Pierre Rouzé
Department of Plant Systems Biology, VIB, Ghent, Belgium
Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
Torben Asp
Department of Molecular Biology and Genetics, Research Centre Flakkebjerg, Aarhus University, Slagelse, Denmark
Yves Van de Peer
Department of Plant Systems Biology, VIB, Ghent, Belgium
Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
Isabel Roldan-Ruiz
Plant Sciences Unit – Growth and Development, Institute for Agricultural and Fisheries Research (ILVO), Melle, Belgium
Tom Ruttink and Lieven Sterck contributed equally to the manuscript.
Summary
Despite current advances in next-generation sequencing data analysis procedures, de novo assembly of the reference sequence required for SNP discovery and expression analysis remains a major challenge in genetically uncharacterized, highly heterozygous species. The high levels of polymorphism inherent to outbreeding crop species hamper De Bruijn graph-based de novo assembly algorithms, causing transcript fragmentation and the redundant assembly of allelic contigs. If multiple genotypes are sequenced to study genetic diversity, primary de novo assembly is best performed per genotype to limit the level of polymorphism and avoid transcript fragmentation. Here, we propose an Orthology Guided Assembly procedure that first uses sequence similarity (tBLASTn) to proteins of a model species to select allelic and fragmented contigs from all genotypes and then performs CAP3 clustering on a gene-by-gene basis. In this way, we simultaneously annotate putative orthologues for each protein of the model species, resolve allelic redundancy and fragmentation, and create a de novo transcript sequence representing the consensus of all alleles present in the sequenced genotypes. We demonstrate the procedure using RNA-seq data from 14 genotypes of Lolium perenne to generate a reference transcriptome for gene discovery and translational research, to reveal the transcriptome-wide distribution and density of SNPs in an outbreeding crop and to illustrate the effect of polymorphisms on the assembly procedure. The results presented here show that a non-redundant reference sequence is essential not only for comparative genomics, orthology-based annotation and candidate gene selection, but also for read mapping and subsequent polymorphism discovery and/or read count-based gene expression analysis.
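The contig selection and gene-by-gene grouping step of Orthology Guided Assembly can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes tBLASTn results in BLAST+ tabular format (`-outfmt 6`, default columns) with model-species proteins as queries and per-genotype contigs as subjects, assigns each contig to its single best-scoring protein, and applies a hypothetical bitscore cutoff of 50. The function names and identifiers are hypothetical.

```python
from collections import defaultdict

# Column indices in BLAST+ tabular output (-outfmt 6, default fields).
# For tBLASTn the query (qseqid) is the model-species protein and the
# subject (sseqid) is an assembled nucleotide contig.
QSEQID, SSEQID, BITSCORE = 0, 1, 11


def best_protein_per_contig(lines, min_bitscore=50.0):
    """Return {contig: (protein, bitscore)}, keeping each contig's best hit.

    Assumption: hits below the (hypothetical) bitscore cutoff are dropped,
    so contigs without convincing similarity to a model protein are excluded.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        protein, contig = fields[QSEQID], fields[SSEQID]
        bitscore = float(fields[BITSCORE])
        if bitscore < min_bitscore:
            continue
        if contig not in best or bitscore > best[contig][1]:
            best[contig] = (protein, bitscore)
    return best


def group_by_protein(best_hits):
    """Invert the mapping to {protein: sorted contigs pooled from all genotypes}."""
    groups = defaultdict(list)
    for contig, (protein, _score) in best_hits.items():
        groups[protein].append(contig)
    return {protein: sorted(contigs) for protein, contigs in groups.items()}


def write_group_fasta(contigs, sequences, handle):
    """Write one gene's contigs as FASTA so they can be clustered on their own."""
    for contig in contigs:
        handle.write(f">{contig}\n{sequences[contig]}\n")
```

Each per-protein FASTA file could then be passed to a separate CAP3 run (`cap3 <file>`), so that allelic and fragmented contigs from different genotypes are merged only with contigs matching the same model protein. Grouping before clustering keeps each CAP3 job small and attaches the putative orthologue annotation to every resulting consensus sequence.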
Supporting Information
Filename | Description |
---|---|
pbi12051-sup-0001-FigS1.eps (image/eps, 300.2 KB) | Figure S1 Fragmentation patterns and allelic redundancy after de novo transcript assembly in 14 Lolium perenne genotypes. |
pbi12051-sup-0002-TableS1.pdf (application/pdf, 36.3 KB) | Table S1 Number of contigs generated from RNA-seq data from 14 independent Lolium perenne genotypes, using Trinity or CLCbio de novo assembly algorithms in different configurations. |