Volume 21, Issue 1 pp. 212-225

RESOURCE ARTICLE

De novo assemblies of Luffa acutangula and Luffa cylindrica genomes reveal an expansion associated with substantial accumulation of transposable elements

Wirulda Pootakham (Corresponding Author), [email protected], orcid.org/0000-0001-6721-6453, National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Chutima Sonthirod, National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Chaiwat Naktang, National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Wanapinun Nawae, orcid.org/0000-0001-9228-6963, National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Thippawan Yoocha, National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Wasitthee Kongkachana, National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Duangjai Sangsrakru, National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Nukoon Jomchai, National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Sonicha U-thoomporn, National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

John R. Sheedy, Chia Tai Company Limited, Phra Khanong District, Bangkok, Thailand

Jarunee Buaboocha, Chia Tai Company Limited, Phra Khanong District, Bangkok, Thailand

Supat Mekiyanon, Chia Tai Company Limited, Phra Khanong District, Bangkok, Thailand

Sithichoke Tangphatsornruang (Corresponding Author), [email protected], National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Correspondence

Wirulda Pootakham and Sithichoke Tangphatsornruang, National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand.

Email: [email protected]; [email protected]

First published: 09 August 2020

https://doi.org/10.1111/1755-0998.13240

Citations: 25

Chutima Sonthirod, Chaiwat Naktang and Wanapinun Nawae contributed equally.

Abstract

Luffa spp. (sponge gourd or ridge gourd) is an economically important vegetable crop widely cultivated in China, India and Southeast Asia. Here, we employed PacBio long-read single-molecule real-time (SMRT) sequencing to perform de novo genome assemblies of two commonly cultivated Luffa species, L. acutangula and L. cylindrica. We obtained preliminary draft genomes of 734.6 Mb and 689.8 Mb with scaffold N50 of 786,130 and 578,616 bases for L. acutangula and L. cylindrica, respectively. We also applied long-range Chicago and HiC techniques to obtain the first chromosome-scale whole-genome assembly of L. acutangula. The final assembly contained 13 pseudomolecules, corresponding to the haploid chromosome number in Luffa spp. (1n = 13, 2n = 26). The sizes of the assembled Luffa genomes are approximately twice as large as the genome assemblies of related Cucurbitaceae. A large proportion of L. acutangula (62.17%; 456.69 Mb) and L. cylindrica (56.78%; 391.65 Mb) genome assemblies contained repetitive elements. Phylogenetic analyses revealed that the substantial accumulation of transposable elements likely contributed to the expansion of the Luffa genomes. We also investigated alternative splicing events in Luffa using full-length transcript sequences obtained from PacBio Isoform Sequencing (Iso-seq). While the predominant form of alternative splicing in most plant species examined was intron retention, alternative 3’ acceptor site selection appeared to be a major event observed in Luffa. High-quality genome assemblies for L. acutangula and L. cylindrica reported here provide valuable resources for Luffa breeding and future genetics and comparative genomics studies in Cucurbitaceae.

1 INTRODUCTION

Luffa spp. (commonly known as sponge gourd, ridge gourd, loofah or dishcloth gourd (Joshi, Tiwari, Kc, Ghale, & Gyawali, 2013)) belong to the family Cucurbitaceae. They are cross-pollinated diploid species with 26 chromosomes (2n = 26) (Wu et al., 2016). The genus Luffa comprises nine species (Filipowicz & Schaefer, 2014; Prakash, Pandey, Jalli, & Bisht, 2013), two of which, Luffa acutangula (L.) Roxb. (ridge gourd) and Luffa cylindrica (L.) Roem. (sponge gourd), are domesticated (Dassanayake & Forsberg, 1988). Luffa spp. are prevalent in the subtropical regions of Asia, and the genus is believed to have an Asian origin (Heiser & Schilling, 1988; Heiser, Schilling, & Dutt, 1988). Both L. acutangula and L. cylindrica are widely cultivated in India, China, Thailand, Central America and Africa (Oboh & Aluyor, 2009; Rabei, Rizk, & Khedr, 2013; Wu et al., 2014). Immature Luffa fruits serve as nutrient-rich vegetables that are abundant in bioactive compounds beneficial to human health, such as glycosides, alkaloids, flavonoids and sterols (Partap, Kumar, Sharma, & Jha, 2012). Mature fruits contain a tough, fibrous network of cellulose that can be used as bathing or cleaning sponges as well as biodegradable filters (Oboh & Aluyor, 2009; Zhang, Hu, Zhang, Guan, & Zhang, 2007). Luffa fruits have also been used in traditional medicine to treat anaemia, leucoderma and tumours (Manikandaselvi, Vadivel, & Brindha, 2016).

Luffa breeding programmes tend to employ conventional approaches because marker-assisted selection is still at an early stage owing to the limited genetic and genomic resources available. Using an F₂ population derived from an interspecific cross between L. acutangula and L. cylindrica, Cui et al. (2015) reported the first genetic linkage map, constructed with 258 sequence-related amplified polymorphism (SRAP) markers. More recently, another linkage map was constructed using simple sequence repeat (SSR) markers developed from transcriptome sequencing (Wu et al., 2014); however, the map resolution was relatively low, with an average distance of 8.11 cM between adjacent markers (Wu et al., 2016). In addition, two studies on fruit browning reported the genome-wide transcriptome sequencing of L. cylindrica (Chen et al., 2015; Zhu et al., 2017). The first attempt to perform a genome survey sequencing of Luffa was carried out by An et al. (2017). The first available assembly for L. cylindrica, obtained from a small-insert (220-bp) library, was highly fragmented with a scaffold N50 length of merely 807 bp (An et al., 2017). Recently, Zhang et al. (2020) reported a de novo assembly of the L. cylindrica genome utilizing the Pacific Biosciences (PacBio) sequencing platform, which offers kilobase-sized reads without GC-bias or systematic errors. A combination of long-read PacBio assembly and the long-range HiC scaffolding technique provides an effective approach to produce a high-quality reference assembly. Here, we combined PacBio long-read single-molecule real-time (SMRT) sequencing and Chicago/HiC techniques to obtain de novo genome assemblies of two cultivated Luffa species, L. acutangula and L. cylindrica. Comparative genomics and phylogenetic analyses revealed that the substantial accumulation of repetitive elements, especially long terminal repeat (LTR) retrotransposons, contributed to the large genome sizes of Luffa spp. These high-quality genome assemblies, along with the genomic variation information from L. acutangula and L. cylindrica germplasm, will provide a foundation for basic and applied research, expediting progress towards the development of elite varieties through marker-trait association analyses.

2 MATERIALS AND METHODS

2.1 Plant materials and DNA/RNA isolation

Sixty-one ridge gourd (L. acutangula) and twenty-three sponge gourd (L. cylindrica) accessions maintained at Chia Tai Company Limited (Thailand) were used in this study. Fresh healthy leaf tissues were collected, immediately frozen in liquid nitrogen and stored at −80°C until DNA extraction. To obtain high-molecular-weight DNA for PacBio single-molecule real-time (SMRT) sequencing, frozen tissues were pulverized in liquid nitrogen, and the CTAB buffer (2% CTAB, 1.4 M NaCl, 2% PVP, 20 mM EDTA pH 8.0, 100 mM Tris-HCl pH 8.0, 0.4% SDS) was added. DNA was extracted from the aqueous phase twice using 25:24:1 phenol:chloroform:isoamyl alcohol and precipitated in 2.5 volumes of absolute ethanol. DNA pellets were washed twice with 70% ethanol, air-dried and resuspended in 10 mM Tris-HCl pH 8.0. DNA samples were subsequently purified with AMPure® PB beads (Pacific Biosciences, Menlo Park, CA, USA), and the DNA integrity was assessed using the Pippin Pulse Electrophoresis System (Sage Science, Beverly, MA, USA). Total RNA was extracted from above-ground and root tissues using CTAB buffer and 25:24:1 phenol:chloroform:isoamyl alcohol as described above. RNA was precipitated overnight in ¼ volume of 8 M LiCl, washed with 70% ethanol, air-dried and resuspended in RNase-free water. Poly(A) mRNAs were enriched from total RNA samples using the Dynabeads mRNA Purification Kit (Thermo Fisher Scientific, Waltham, MA, USA).

2.2 Genomic and transcriptomics (Iso-seq) library preparation and sequencing

SMRTbell libraries with an insert size of 12,000 nt were constructed for the PacBio RSII sequencing system. Sequencing was performed with P6-C4 polymerase and chemistry using 360-min movie times according to manufacturer's protocols. For short-read whole-genome shotgun sequencing of the 84 accessions described previously, DNA was isolated using the protocol reported in Pootakham et al. (2017). Illumina paired-end libraries (2 × 150 bp) were prepared and sequenced on the HiSeqX by NovogeneAIT Genomics Singapore Pte Ltd (Singapore). Iso-seq libraries were prepared according to a previously published protocol (Pootakham et al., 2017) using the SMARTer PCR cDNA Synthesis Kit (Clontech, Mountain View, USA) and size-selected using the BluePippin Size Selection System (Sage Science, Beverly, USA) into 1–2 kb, 2–3 kb and 3–6 kb bins. Sequencing was performed using polymerase chemistry and the movie time mentioned above.

2.3 Chicago library preparation and sequencing

Chicago library preparation and sequencing were carried out by Dovetail Genomics (Scotts Valley, CA, USA). A Chicago library was prepared as described previously (Putnam et al., 2016). Briefly, ~500 ng of HMW gDNA (mean fragment length = 61) was reconstituted into chromatin in vitro and fixed with formaldehyde. Fixed chromatin was digested with DpnII, the 5’ overhangs filled in with biotinylated nucleotides, and then, free blunt ends were ligated. After ligation, cross-links were reversed and the DNA purified from protein. Purified DNA was treated to remove biotin that was not internal to ligated fragments. The DNA was then sheared to ~350-bp mean fragment size, and sequencing libraries were generated using NEBNext Ultra enzymes and Illumina-compatible adapters. Biotin-containing fragments were isolated using streptavidin beads before PCR enrichment of each library. The libraries were sequenced on an Illumina HiSeq X to produce 107 million 2 × 150 bp paired-end reads, which provided 6.30 × physical coverage of the genome (1–100 kb pairs).

2.4 Dovetail HiC library preparation and sequencing

Dovetail HiC library preparation and sequencing were carried out by Dovetail Genomics (Scotts Valley, CA, USA). A Dovetail HiC library was prepared in a similar manner as described previously (Lieberman-Aiden et al., 2009). Briefly, for each library, chromatin was fixed in place with formaldehyde in the nucleus, and then, extracted fixed chromatin was digested with DpnII, the 5’ overhangs filled in with biotinylated nucleotides, and then, free blunt ends were ligated. After ligation, cross-links were reversed and the DNA purified from protein. Purified DNA was treated to remove biotin that was not internal to ligated fragments. The DNA was then sheared to ~350-bp mean fragment size, and sequencing libraries were generated using NEBNext Ultra enzymes and Illumina-compatible adapters. Biotin-containing fragments were isolated using streptavidin beads before PCR enrichment of each library. The libraries were sequenced on an Illumina HiSeq X to produce 101 million 2 × 150 bp paired-end reads, which provided 393.88 x physical coverage of the genome (10–10,000 kb pairs).

2.5 De novo genome assembly

A total of 2,823,498 and 2,404,358 raw reads (totalling 39.73 and 39.65 Gb) from L. acutangula and L. cylindrica, respectively, were subjected to read correction, trimming, overlap detection and de novo assembly by Canu v1.8 (Koren et al., 2017) using the following parameters: genomeSize=790m correctedErrorRate=0.040. An estimated genome size of 790 Mb (An et al., 2017) was assumed for both L. acutangula and L. cylindrica; all other parameters were left at their default values. Polishing was carried out using the GenomicConsensus package in the SMRT Analysis software suite version 2.3. The GenomicConsensus package contains a main driver programme (variantCaller), which provides two consensus/variant calling algorithms: Arrow and Quiver (https://github.com/PacificBiosciences/GenomicConsensus). The PacBio preliminary assembly was used as an input for the subsequent scaffolding with HiRise. Calculation of the k-mer depth distribution for clean Illumina sequence reads and estimation of the L. acutangula and L. cylindrica genome sizes were performed using Jellyfish software version 2.2.10 with the -C setting (Marcais & Kingsford, 2011).
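As a rough illustration of how a genome size can be derived from a k-mer depth distribution, the sketch below divides the total number of counted k-mers by the depth of the main peak. This is a common first-order estimate and not necessarily the exact procedure used in this study. The function names, the `min_depth` cut-off used to discard low-depth (likely erroneous) k-mers, and the two-column "depth count" input layout are assumptions made for the example.

```python
def parse_histogram(text):
    """Parse two-column 'depth count' lines into a dict {depth: count}."""
    histogram = {}
    for line in text.splitlines():
        if line.strip():
            depth, count = line.split()
            histogram[int(depth)] = int(count)
    return histogram


def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size (in bases) as total k-mers / peak depth.

    min_depth is a hypothetical cut-off: k-mers seen fewer times than this
    are treated as sequencing errors and ignored.
    """
    usable = {d: c for d, c in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the largest number of distinct k-mers (main peak).
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences contributed by the retained depths.
    total_kmers = sum(d * c for d, c in usable.items())
    return total_kmers / peak_depth
```

In practice the peak can be distorted by heterozygosity and repeats, so an estimate of this kind is best read as approximate.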

2.6 Scaffolding the assembly of L. acutangula with HiRise

The input de novo assembly, shotgun reads, Chicago library reads and Dovetail HiC library reads were used as input data for HiRise, a software pipeline designed specifically for using proximity ligation data to scaffold genome assemblies (Putnam et al., 2016). An iterative analysis was conducted. First, shotgun and Chicago library sequences were aligned to the draft input assembly using a modified SNAP read mapper (http://snap.cs.berkeley.edu). The separations of Chicago read pairs mapped within draft scaffolds were analysed by HiRise to produce a likelihood model for genomic distance between read pairs, and the model was used to identify and break putative misjoins, to score prospective joins, and make joins above a threshold. After aligning and scaffolding Chicago data, Dovetail HiC library sequences were aligned and scaffolded following the same method. After scaffolding, shotgun sequences were used to close gaps between contigs.

2.7 Assembly quality assessment

Prior to short-read alignment, we used TrimGalore-0.6.0 (https://github.com/FelixKrueger/TrimGalore) for trimming adapter sequences and removing low-quality bases from Illumina reads. Iso-seq reads were processed using the Iso-Seq3 pipeline (https://github.com/PacificBiosciences/IsoSeq) with default parameter setting. The quality of the assemblies was evaluated by aligning short-read Illumina sequences, Iso-seq transcript sequences and available genome/transcriptome sequences from previous studies (An et al., 2017; Chen et al., 2015; Zhu et al., 2017) using BLASTN at an e-value cut-off of 10^–10. The completeness of the final genome assemblies was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO) (Simão, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015). The BUSCO pipeline version 3 was used to test for the presence and completeness of orthologs using the Embryophyta OrthoDB release 9 (Kriventseva et al., 2015).

2.8 Annotation of repetitive elements and repeat masking

To generate a de novo repeat library, RepeatModeler version 2.0.1 (http://www.repeatmasker.org/RepeatModeler/) was used to predict transposable elements in the unannotated genome assemblies. Two de novo repeat-finding programmes, RECON version 1.08 and RepeatScout version 1.0.5, were employed to identify the boundaries of repetitive elements and to build consensus models of interspersed repeats. To ensure that repeat sequences in the library did not contain large families of protein-coding sequences that are not transposable elements, we aligned them to GenBank's nr protein database using BLASTX (e-value cut-off of 10^–6). The custom Luffa-specific repeat library generated by RepeatModeler, along with the repetitive sequences in the RepBase plant repeat database (release 20150807; https://www.girinst.org/), was used to mask the assembled genome sequences using RepeatMasker version 4.0.9_p2 (default parameters) (Tempel, 2012). To estimate the insertion time for the LTR retrotransposons, we first employed the LTR_FINDER (Ou & Jiang, 2019) and LTRHarvest (Ellinghaus, Kurtz, & Willhoeft, 2008) programs to predict full-length LTRs using default parameter settings. We subsequently used the LTR_retriever program (Ou & Jiang, 2018) to filter out false positives from the initial predictions of LTR_FINDER and LTRHarvest. The insertion times of LTRs (T) were calculated according to the following formula: T = K/(2μ), where K is the divergence rate calculated with the Jukes-Cantor model for non-coding sequences and μ is the neutral mutation rate (4.5 × 10⁻⁹; estimated based on the known divergence time between cucumber and melon (Sebastian, Schaefer, Telford, & Renner, 2010)).
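The insertion-time formula can be illustrated with a short sketch. It assumes that the input is the observed proportion of differing sites between the two LTRs of one element, applies the standard Jukes-Cantor correction to obtain K, and then divides by twice the mutation rate. The function names are hypothetical, and the mutation rate is assumed here to be expressed per site per year so that T comes out in years.

```python
import math

# Neutral mutation rate quoted in the text (assumed per site per year).
MUTATION_RATE = 4.5e-9


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance K from the observed proportion p
    of mismatched sites between two aligned sequences."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor model")
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)


def ltr_insertion_time(p, mutation_rate=MUTATION_RATE):
    """Insertion time T = K / (2 * mu) for one LTR retrotransposon."""
    return jukes_cantor_distance(p) / (2 * mutation_rate)
```

For example, `ltr_insertion_time(0.05)` corresponds to two LTRs that differ at 5% of aligned sites and gives an estimate on the order of a few million years under these assumptions.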

2.9 Gene annotation

Evidence from transcriptome-based prediction, ab initio gene prediction and homology-based prediction was combined to predict protein-coding sequences in the unmasked Luffa genomes using EvidenceModeler (EVM) version 1.1.1 r2015-07-03 (Haas et al., 2008). Transcriptome-based prediction methods combined information from PacBio Iso-seq data obtained from leaf, root, apical meristem and flower and available short-read transcriptome data (Chen et al., 2015; Wu et al., 2014; Zhu et al., 2017). Full-length transcripts were mapped to the genome assemblies using Genomic Mapping and Alignment Program (GMAP; version r20160630) (Wu & Watanabe, 2005), and short-read RNA-seq data were mapped to the assemblies during the initial step of annotation using the PASA2 pipeline version 2.0.1 (Haas et al., 2008). Protein sequences from L. cylindrica, Cucumis sativus, Cucumis melo, Citrullus lanatus, Arabidopsis thaliana, Cucurbita maxima and Momordica charantia obtained from public databases were aligned to the unmasked genome using AAT version 1.52 (Huang, Adams, Zhou, & Kerlavage, 1997). Two ab initio gene predictors were run on the unmasked assemblies. Protein-coding gene predictions were obtained with Augustus version 3.2.1 (Stanke, Steinkamp, Waack, & Morgenstern, 2004) trained with C. sativus, C. melo, C. lanatus, A. thaliana, C. maxima and M. charantia PASA transcriptome alignment assembly and BRAKER (Hoff, Lange, Lomsadze, Borodovsky, & Stanke, 2016; Hoff, Lomsadze, Borodovsky, & Stanke, 2019) using Luffa Iso-seq and RNA-seq (Chen et al., 2015) alignment files as inputs. All gene predictions were integrated by EVM to generate consensus gene models using the following weight for each evidence type: PASA2 – 1, GMAP – 0.5, AAT – 0.3, Augustus – 0.3 and BRAKER – 0.3. The positions of annotated genes were cross-checked with those of known repeats, and any gene that had more than 20% of its sequence overlapping with repetitive elements was excluded from the list of annotated genes.
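The final repeat-overlap filter can be sketched as an interval computation. The example below is illustrative only: it assumes half-open (start, end) coordinates on a single sequence, measures overlap against the full gene span, and merges overlapping repeat annotations first so that no base is counted twice. The function names and the coordinate convention are assumptions, not details taken from the annotation pipeline.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def repeat_overlap_fraction(gene, repeats):
    """Fraction of the gene span (start, end) covered by repeat intervals."""
    gene_start, gene_end = gene
    covered = 0
    for rep_start, rep_end in merge_intervals(repeats):
        # Length of the intersection between the gene and this repeat block.
        covered += max(0, min(gene_end, rep_end) - max(gene_start, rep_start))
    return covered / (gene_end - gene_start)


def keep_gene(gene, repeats, max_fraction=0.20):
    """Keep a gene unless more than max_fraction of it overlaps repeats."""
    return repeat_overlap_fraction(gene, repeats) <= max_fraction
```

A gene spanning (0, 1000) with repeats at (100, 200) and (150, 250) has 150 covered bases after merging, so it would be retained under the 20% threshold.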

2.10 Comparative genomics and phylogenetic analysis

We used OrthoFinder (Emms & Kelly, 2019) to identify orthologous groups in L. acutangula, L. cylindrica and seven other cucurbits (melon, cucumber, bottle gourd, watermelon, squash, pumpkin, bitter melon), three rosid species (peach, Arabidopsis, grape), one asterid species (tomato) and one monocot (rice). Protein sequences from single-copy orthologous groups were used to construct a phylogenetic tree using the RAxML-ng program (Kozlov, Darriba, Flouri, Morel, & Stamatakis, 2019). We first aligned protein sequences in each single-copy orthologous group with MUSCLE (Edgar, 2004) and removed alignment gaps with trimAl (Capella-Gutiérrez, Silla-Martínez, & Gabaldón, 2009) using the automated1 heuristic method. All alignment blocks were concatenated using the catsequences program (https://github.com/ChrisCreevey/catsequences), and the substitution model for each alignment block was estimated using the ModelTest-NG program (Darriba et al., 2019). The outputs were subsequently used to compute a maximum-likelihood phylogenetic tree. Divergence times of species in the phylogenetic tree were estimated with the MCMCtree program (PAML4 package) (Yang, 2007) using the relaxed-clock model with the known divergence time between cucumber and melon, which was estimated at 8.4–11.8 million years ago (MYA) (Sebastian et al., 2010).

2.11 Cucurbitaceae evolutionary analysis

We adapted the method described in Ren, Huang, and Cannon (2019) to reconstruct the ancestral genome of L. acutangula, pumpkin, bottle gourd, watermelon, melon and cucumber. In brief, we used OrthoFinder to identify orthologous groups in these six species. Syntenic blocks were then constructed from the orthologous groups using the DAGchainer program (Haas, Delcher, Wortman, & Salzberg, 2004) in the Synima pipeline (Farrer, 2017). The outputs from DAGchainer were used to specify “markers” representing features that were shared by the selected genomes using the scripts provided in Ren et al. (2019). In the following step, we used MLGO web service (http://www.geneorder.org/server.php) (Hu, Lin, & Tang, 2014) to infer the ancestral genome from the order of the markers in each individual genome and the information from the phylogenetic tree constructed using single-copy orthologs (Kozlov et al., 2019).

2.12 Luffa phylogenetic relationship and population structure analysis

Short-read whole-genome shotgun sequences of the 84 Luffa accessions were used for the phylogenetic analysis. Illumina reads were mapped to their respective genome assemblies using Minimap2 version 2.11-r797-v03 (Li, 2018), and single nucleotide polymorphism (SNP) markers were called using GATK HaplotypeCaller 3.8 (McKenna et al., 2010). We used a set of 11,704 SNP markers at fourfold-degenerate sites that met the following criteria: (a) a minor allele frequency >0.05; (b) depth of coverage between 20X and 200X; (c) fewer than 10% missing data. The ape package in R was used to construct a neighbour-joining tree with 1,000 bootstrap replicates (Paradis, Claude, & Strimmer, 2004; R Core Team, 2016). We applied the same set of SNPs to examine the population structure using the STRUCTURE program (version 2.3.4) (Falush, Stephens, & Pritchard, 2003) with 10,000 iterations and the number of clusters (K) set from 2 to 4.
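The three SNP selection criteria can be expressed as a simple predicate. The sketch below is a minimal illustration under stated assumptions: the `SiteStats` record and its field names are hypothetical, per-site statistics are assumed to have been computed beforehand, and the depth bounds are treated as inclusive because the text does not specify how the boundaries were handled.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    """Hypothetical per-site summary computed from the called genotypes."""
    minor_allele_freq: float   # frequency of the less common allele
    depth: float               # depth of coverage at the site
    missing_fraction: float    # fraction of accessions without a genotype


def passes_filters(site, min_maf=0.05, min_depth=20, max_depth=200,
                   max_missing=0.10):
    """Apply criteria (a) to (c): MAF > 0.05, depth 20X to 200X,
    and fewer than 10% missing data."""
    return (
        site.minor_allele_freq > min_maf
        and min_depth <= site.depth <= max_depth
        and site.missing_fraction < max_missing
    )


def filter_sites(sites):
    """Return only the sites that satisfy all three criteria."""
    return [s for s in sites if passes_filters(s)]
```

Restricting the retained sites to fourfold-degenerate positions would be a separate step that requires the gene annotation and is not shown here.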

3 RESULTS

3.1 Genome sequencing and assembly

We selected two Thai elite inbred lines for sequencing: one ridge gourd (L. acutangula) cultivar AG-4 and one sponge gourd (L. cylindrica) cultivar SO-3. A whole-genome shotgun strategy was used to sequence and assemble both Luffa genomes from PacBio long-read data. A total of 2,823,498 (39.73 Gb) and 2,404,358 raw reads (39.65 Gb), representing 50.29X and 50.19X coverage based on the estimated genome size of 789.97 Mb (An et al., 2017), were generated for L. acutangula and L. cylindrica, respectively. De novo assemblies of PacBio sequences from L. acutangula and L. cylindrica yielded draft genomes of 734.6 Mb and 689.8 Mb in 2,280 and 3,570 scaffolds with scaffold N50 of 786,130 (L50 = 220 scaffolds) and 578,616 bases (L50 = 316 scaffolds), respectively (Table 1). Analyses of k-mer distribution of the genome sequencing reads provided estimated genome sizes of 760 Mb and 773 Mb for L. acutangula and L. cylindrica, respectively, close to the figure previously reported for L. cylindrica (An et al., 2017; Zhang et al., 2020) (Figure S1). The heterozygosity of L. acutangula and L. cylindrica was 0.41 and 0.25, respectively. The preliminary L. acutangula genome was further assembled using the Chicago (in vitro proximity ligation; 107 million read pairs) and HiC (in vivo fixation of chromosomes; 101 million read pairs) library data scaffolded with the HiRise software (Dovetail Genomics, Santa Cruz, CA, USA). The final assembly contained 13 chromosome-scale pseudomolecules (hereafter referred to as chromosomes, numbered according to size; Figure 1, Figure S2) greater than 1 Mb in length, corresponding to the haploid chromosome number in Luffa spp (1n = 13, 2n = 26). The 13 chromosomes covered 618,333,454 bases or 84.06% of the 735-Mb L. acutangula assembly.
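The coverage figures quoted above follow from dividing the total sequenced bases by the assumed genome size. The short sketch below reproduces that arithmetic; the function name is hypothetical and the inputs are the values given in the text.

```python
def sequencing_coverage(total_bases, genome_size):
    """Fold coverage = total sequenced bases / genome size."""
    return total_bases / genome_size


GENOME_SIZE = 789.97e6  # estimated genome size in bases (An et al., 2017)

# Total PacBio bases reported for each species.
acutangula_cov = sequencing_coverage(39.73e9, GENOME_SIZE)
cylindrica_cov = sequencing_coverage(39.65e9, GENOME_SIZE)

print(f"L. acutangula: {acutangula_cov:.2f}X")
print(f"L. cylindrica: {cylindrica_cov:.2f}X")
```

The same division applied to the 13 chromosomes (618,333,454 bases out of the 735,610,612-base final assembly) gives the 84.06% figure.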

TABLE 1. Assembly statistics of L. acutangula and L. cylindrica genomes

Statistic	L. acutangula (PacBio)	L. acutangula (PacBio + Chicago)	L. acutangula (PacBio + Chicago + HiC)	L. cylindrica (PacBio)
N50 scaffold size (bases)	786,130	104,669	47,609,564	578,616
L50 scaffold number	220	1,430	8	316
N75 scaffold size (bases)	360,404	46,914	42,543,272	222,372
L75 scaffold number	571	4,103	12	789
N90 scaffold size (bases)	145,274	23,392	35,168	62,111
L90 scaffold number	1,037	7,385	834	1,667
Total (bases)	734,615,403	734,942,309	735,610,612	689,872,192
Number of scaffolds	2,280	15,410	7,871	3,570
Number of scaffolds ≥ 100 kb	1,230	1,547	46	1,267
Number of scaffolds ≥ 1 Mb	143	50	13	122
Number of scaffolds ≥ 10 Mb	—	—	13	—
Longest scaffold (bases)	8,225,708	3,510,179	56,032,585	7,054,290
% N	—	0.0004	0.00135	—
GC content (%)	36	36	36	36
BUSCO evaluation (% completeness)	—	—	92.7	93.0
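For readers unfamiliar with the N50 and L50 statistics in Table 1, the sketch below shows the usual definition: scaffolds are sorted from longest to shortest, and N50 is the length of the scaffold at which the running total first reaches half of the assembly size, while L50 is the number of scaffolds needed to get there. The function name is hypothetical, and this is a generic illustration, not the script used to produce the table.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is N50; 'rank' is the number of scaffolds used (L50).
            return length, rank
    raise ValueError("no scaffold lengths supplied")
```

N75 and N90 follow the same pattern with 75% and 90% of the total length as the target.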

FIGURE 1. Genomic landscape of L. acutangula chromosomes. (a) Physical map of 13 assembled chromosomes (Mb scale) numbered according to size. (b) Repeat density represented by proportion of genomic regions covered by repetitive sequences in 500-kb windows. (c) Gene density represented by number of genes in 500-kb windows. (d) SNP density represented by number of SNP markers in 500-kb windows. (e) GC content represented by percentage of G + C bases in 500-kb windows. Syntenic blocks are depicted by connected lines.

To assess the quality of our de novo assemblies, we aligned genomic DNA short reads back to the genomes. Approximately 91% and 93.9% of the reads from Illumina shotgun libraries could be mapped back to the L. acutangula and L. cylindrica genomes, respectively. We also aligned Iso-seq reads and publicly available RNA-seq reads (from L. cylindrica (Chen et al., 2015; Zhu et al., 2017)) to the assemblies; 99.0% of Iso-seq transcripts were mapped to each respective genome, while 98.3% of the RNA-seq reads could be mapped to the L. cylindrica genome. To further evaluate the completeness of both genome assemblies, we checked the gene content with the BUSCO software using a plant-specific database of 1,440 genes (Simão et al., 2015). Our gene predictions for L. acutangula and L. cylindrica recovered 92.7% and 93.0% of the highly conserved orthologs in the Embryophyta lineage, respectively (Table 1). Together, this evidence supports the high quality of both Luffa genome assemblies.

A combination of ab initio prediction and transcript evidence obtained from both Iso-seq and RNA-seq data was used for gene prediction. The genome annotations of L. acutangula and L. cylindrica contained 42,211 and 50,340 predicted gene models, of which 32,233 and 43,828 were protein-coding genes, respectively (Tables S1 and S2; Figures S3–S5). Genes were preferentially distributed near the telomeres for most of the chromosomes (Figure 1). The numbers of protein-coding genes in Luffa were higher than the figures reported for other cucurbit genomes (cucumber = 23,248 (Li et al., 2011), watermelon = 23,440 (Guo et al., 2013), bottle gourd = 22,472 (Wu et al., 2017) and wax gourd = 27,467 (Xie et al., 2019)). In total, we found transcript support for 25,402 (78.8%; Iso-seq support) and 37,931 (86.5%; Iso-seq and RNA-seq support (Chen et al., 2015)) protein-coding genes in L. acutangula and L. cylindrica, respectively. The proportions of protein-coding genes supported by transcript evidence in Luffa were comparable to that of the bottle gourd genome (79.3%) (Wu et al., 2017) but slightly higher than that of the wax gourd genome (72.7%) (Xie et al., 2019). The average gene sizes for L. acutangula and L. cylindrica were 2,866 and 2,582 nt with 4.50 and 4.28 exons per gene, respectively (Table S1).

3.2 Comparative genomics and phylogenetic analyses

To investigate the evolutionary relationships between Luffa and other Cucurbitaceae species, we analysed the gene sets from nine cucurbits: cucumber (Cucumis sativus), melon (Cucumis melo), watermelon (Citrullus lanatus), bottle gourd (Lagenaria siceraria), squash (Cucurbita moschata), pumpkin (Cucurbita maxima), bitter gourd (Momordica charantia), L. acutangula and L. cylindrica; three rosids: Arabidopsis (Arabidopsis thaliana), grape (Vitis vinifera) and peach (Prunus persica); one asterid: tomato (Solanum lycopersicum); and one monocot: rice (Oryza sativa). Of 508,876 input proteins from 14 species, 433,861 (85.25%) were clustered into 19,759 orthologous groups. Sequence information from single-copy orthologous genes was used to construct a maximum-likelihood phylogenetic tree, and the divergence time was estimated based on the topology and branch length, revealing that the two Luffa species sequenced in this work diverged about 7.9 million years ago (MYA; Figure 2a). The ancestor of Luffa formed a sister clade to the ancestor of the tribes Cucurbiteae and Benincaseae, and the two clades diverged about 33.62 MYA. This placement in the phylogenetic tree was consistent with previous reports (Chomicki, Schaefer, & Renner, 2019; Renner & Schaefer, 2017).

We used the transversion rate at fourfold-degenerate synonymous sites (4DTv approach) to analyse orthologous gene pairs in order to estimate the relative timing of evolutionary divergence between L. acutangula and closely related cucurbit species. This analysis showed that the speciation event between Luffa and other cucurbits occurred at approximately the same time as the duplication event in C. maxima (Figure 2b and c). The distribution of 4DTv values among paralogous gene pairs for most species, including Luffa, had a peak that ranged from 0.3 to 0.6, with the maximum around 0.4 (Figure 2c). Examination of synteny between L. acutangula and L. cylindrica revealed extensive conservation in genome structure (Figure S6). With the exception of L. acutangula chromosomes 6 and 12, the remaining chromosomes exhibited a one-to-one relationship with L. cylindrica chromosomes. There appeared to be a shuffling of regions between L. acutangula chromosomes 6 and 12 and L. cylindrica chromosomes 4 and 13 (Figure S6).
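The 4DTv statistic described above can be illustrated with a minimal Python sketch. It assumes two coding sequences that are already codon-aligned and in frame, uses the standard genetic code, and counts a site only when both codons share a fourfold-degenerate two-base prefix. The function name `four_dtv` and the correction for multiple transversions, -0.5 ln(1 - 2p), are assumptions made for illustration; this section does not state which implementation or correction was used in the study.

```python
import math

# First two codon bases whose third position is fourfold degenerate
# in the standard genetic code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def four_dtv(cds_a, cds_b):
    """Return (raw, corrected) 4DTv for two aligned, in-frame coding sequences."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    sites = 0
    transversions = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        # A site counts only if both codons share a fourfold-degenerate prefix.
        if codon_a[:2] != codon_b[:2] or codon_a[:2] not in FOURFOLD_PREFIXES:
            continue
        base_a, base_b = codon_a[2], codon_b[2]
        valid = PURINES | PYRIMIDINES
        if base_a not in valid or base_b not in valid:
            continue  # skip gaps and ambiguous bases
        sites += 1
        # A transversion swaps a purine for a pyrimidine or vice versa.
        if (base_a in PURINES) != (base_b in PURINES):
            transversions += 1
    if sites == 0:
        raise ValueError("no shared fourfold-degenerate sites")
    raw = transversions / sites
    # Assumed correction for multiple transversions at the same site.
    corrected = -0.5 * math.log(1 - 2 * raw) if raw < 0.5 else float("inf")
    return raw, corrected


# Three shared 4D sites: A/T (transversion), C/C (identical), A/G (transition).
raw, corrected = four_dtv("CTAGTCGGA", "CTTGTCGGG")
print(raw, corrected)
```

Applied to every orthologous or paralogous gene pair, the resulting values form the distributions whose peaks are compared between species.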

3.3 Expansion of repetitive elements leads to larger genome sizes in Luffa

We identified 456.69 and 391.65 Mb of repetitive elements in the L. acutangula and L. cylindrica assemblies, representing 62.17% and 56.78% of the genomes, respectively (Figure 3, Table S3). The length and proportion of repetitive sequences in the Luffa genomes were higher than those in the watermelon (160 Mb; 45.2%) (Guo et al., 2013), bottle gourd (147 Mb; 46.9%) (Wu et al., 2017) and bitter gourd (159 Mb; 52.5%) (Matsumura et al., 2019) genomes but not as high as in the wax gourd genome (689 Mb; 75.5%) (Xie et al., 2019). LTR retrotransposons represented the majority of repetitive elements, comprising 41% and 35% of the genomes and 66% and 61% of all repetitive elements in L. acutangula and L. cylindrica, respectively (Figure 3a, Table S3). The most abundant LTR superfamilies, Gypsy and Copia, occupied 34% and 31% of the repetitive elements in L. acutangula and 31% and 26% of the repetitive elements in L. cylindrica, respectively (Table S3). A genome-wide distribution plot of each repeat type showed that DNA elements, LTRs, long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs) were enriched near the centromeric regions (Figure 1). The assembled Luffa genomes are approximately twice as large as the genome assemblies of related Cucurbitaceae (Garcia-Mas et al., 2012; Guo et al., 2013; Huang et al., 2009; Ruggieri et al., 2018; Sun et al., 2017; Wu et al., 2017), with the exception of the wax gourd genome (Xie et al., 2019). The total length of LTR elements in Luffa is 8-fold greater than that in cucumber (Huang et al., 2009) and 6-fold greater than that in bottle gourd (Wu et al., 2017) (Table S4). In the absence of evidence supporting a recent whole-genome duplication event in Luffa, the substantial accumulation of transposable elements, especially LTR retrotransposons, likely contributed to the expansion of the Luffa genomes. The insertion times for LTR retrotransposons were estimated based on predicted full-length LTRs in the genome assembly.
Interestingly, LTRs started accumulating after the divergence of Luffa and Benincaseae species (Figure 3b, Figure S7). LTRs appeared to accumulate earlier in the wax gourd (around 6–8 MYA), melon (4–6 MYA) and bitter gourd (2–4 MYA) genomes than in the Luffa genome. A substantial proportion of LTRs in the Luffa genome has proliferated relatively recently (0–1 MYA), similar to the situations observed in the cucumber and squash genomes (Figure S7).
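The idea behind dating LTR insertions can be sketched in Python. The sketch relies on the common assumption that the two LTRs of an element are identical at insertion and then diverge independently, so that the insertion time is T = K / (2r), where K is the divergence between the 5' and 3' LTRs and r is the substitution rate per site per year. The Jukes-Cantor distance, the function name `ltr_insertion_time` and the rate passed in the usage line are assumptions of this example; the rate and distance model used in the study are not given in this section.

```python
import math


def ltr_insertion_time(ltr5, ltr3, subs_per_site_per_year):
    """Estimate insertion time in years from an aligned 5'/3' LTR pair."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTR sequences must be aligned to equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # ignore gaps and ambiguous bases
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = mismatches / compared
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor correction")
    # Jukes-Cantor distance (an assumed model for this sketch).
    k = -0.75 * math.log(1 - 4 * p / 3)
    # Both LTRs accumulate substitutions, hence the factor of two.
    return k / (2 * subs_per_site_per_year)


# Hypothetical pair: 1 mismatch over 100 aligned bases.
five_prime = "A" * 100
three_prime = "A" * 99 + "G"
# Placeholder rate for illustration only, not the value used in the study.
print(ltr_insertion_time(five_prime, three_prime, 1.3e-8))
```

Binning the estimates for all intact elements by age gives a distribution of insertion times per species, which is the kind of profile compared across cucurbit genomes above.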

3.4 Cucurbit genome evolution

To investigate chromosome evolution of Luffa and other cucurbits including cucumber, melon, watermelon, bottle gourd and pumpkin, we analysed syntenic blocks across their genomes. Twelve ancestral chromosomes were inferred on the basis of syntenic and phylogenetic relationships among cucurbit genomes based on 20,026 orthologous groups. Three L. acutangula chromosomes (chromosomes 1, 2 and 9) exhibited a one-to-one relationship with the M. charantia genome (chromosomes 1, 11 and 2, respectively), while only two L. acutangula chromosomes exhibited one-to-one syntenic relationship with C. melo and L. siceraria and one-to-two relationship with C. maxima (due to the whole-genome duplication event in the Cucurbita genus; Figure 4). Relative to the twelve ancestral chromosomes configuration, L. acutangula and M. charantia genomes appeared to best preserve the ancestral karyotype, followed by C. maxima genome with six out of twenty chromosomes (chromosomes 3, 7, 10, 13, 15 and 20) remaining in the ancestral state despite the recent whole-genome duplication. All of C. sativus and C. lanatus chromosomes and most L. siceraria and C. melo chromosomes were derived from a series of fusion and fission events.

3.5 Luffa population structure

To investigate genetic diversity and variation in Luffa germplasm, 61 L. acutangula and 23 L. cylindrica accessions were selected and shotgun-sequenced using an Illumina sequencing platform. A total of 1,402 and 558 Gb of high-quality, cleaned data with an average of 32.8- and 34.6-fold depth coverage were mapped to the L. acutangula and L. cylindrica genome assemblies with average mapping rates of 92.2% and 93.9%, respectively (Table S5). The population structure was explored at fourfold-degenerate sites using STRUCTURE (Pritchard, Stephens, & Donnelly, 2000). We tested for a population structure ranging from 2 (K = 2) up to 4 subpopulations (K = 4; Figure 5). The results supported clustering of L. acutangula and L. cylindrica accessions into three distinct subgroups (Figure 5). Accessions originating from the same geographical location appeared to cluster together.

3.6 Identification of alternative splicing variants

The availability of both the whole-genome sequence and the full-length transcript isoforms enabled the investigation of alternative splicing events in Luffa. To the best of our knowledge, this is the first report on alternative splicing in Luffa spp. We employed the TAPIS pipeline (version 1.2.1) (Abdel-Ghany et al., 2016) and the SpliceGrapher program (Rogers, Thomas, Reddy, & Ben-Hur, 2012) to identify transcript variants exhibiting the following alternative splicing events: alternative 5’ donor site selection, alternative 3’ acceptor site selection, exon skipping and intron retention. A total of 1,191 and 1,641 alternative splicing events were detected in L. acutangula and L. cylindrica, respectively (Figure 6a). Alternative 3’ acceptor site selection was the most frequent event in both species (41% in L. acutangula and 37% in L. cylindrica), followed by alternative 5’ donor site selection (29% in L. acutangula and 28% in L. cylindrica). Exon skipping was the least prevalent mode of alternative splicing identified in L. acutangula, whereas intron retention was the least common form found in L. cylindrica (Figure 6a). In contrast to the observations made in Luffa, intron retention has been reported as the most prevalent alternative splicing mechanism in several plant species such as Arabidopsis (Marquez, Brown, Simpson, Barta, & Kalyna, 2012), soya bean (Shen et al., 2014), cotton (Feng, Xu, Liu, Cui, & Zhou, 2019), maize (Thatcher et al., 2016; Wang et al., 2016) and rice (Zhang et al., 2019). Different types of alternative splicing were also observed in a combinatorial manner in a single gene. Figure 6b illustrates an example of a transcript that was subjected to multiple forms of alternative splicing. Alternative splicing serves to diversify an organism's transcriptome, and recent data suggest that it is one of the mechanisms that plants use to adapt to a changing environment (Reddy, 2007; Shang, Cao, & Ma, 2017; Wang & Brendel, 2006).
Further studies using RNA samples from different tissues and various growth conditions, including biotic and abiotic stresses, will help elucidate the complete repertoire of transcript isoforms in Luffa species.
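The four event types named above can be made concrete with a simplified Python sketch that compares the exon structure of an alternative isoform with that of a reference isoform. This is not the TAPIS or SpliceGrapher algorithm; it is a toy classifier under stated assumptions: both isoforms belong to the same gene on the plus strand, exons are given as sorted (start, end) pairs, and the function names `introns` and `classify_events` are hypothetical. On the minus strand the donor and acceptor labels would be swapped.

```python
def introns(exons):
    """Introns implied by a sorted list of (start, end) exons on the plus strand."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def classify_events(ref_exons, alt_exons):
    """List splicing differences of an alternative isoform relative to a reference."""
    ref_introns = introns(ref_exons)
    alt_introns = introns(alt_exons)
    events = set()
    for donor, acceptor in ref_introns:
        if (donor, acceptor) in alt_introns:
            continue  # intron spliced identically in both isoforms
        # Intron retention: the whole reference intron lies inside an alternative exon.
        if any(s <= donor and acceptor <= e for s, e in alt_exons):
            events.add(("intron_retention", donor, acceptor))
            continue
        for alt_donor, alt_acceptor in alt_introns:
            same_donor = alt_donor == donor
            same_acceptor = alt_acceptor == acceptor
            if same_donor == same_acceptor:
                continue  # the two introns share neither splice site
            # Exon skipping: a reference exon lies inside the alternative intron.
            skipped = [(s, e) for s, e in ref_exons
                       if alt_donor <= s and e <= alt_acceptor]
            if skipped:
                for s, e in skipped:
                    events.add(("exon_skipping", s, e))
            elif same_donor:
                events.add(("alt_3prime_acceptor", acceptor, alt_acceptor))
            else:
                events.add(("alt_5prime_donor", donor, alt_donor))
    return sorted(events)


# Hypothetical gene: the alternative isoform uses a different acceptor for
# exon 2 and retains the last intron, i.e. two event types in one transcript.
reference = [(100, 200), (300, 400), (500, 600), (700, 800)]
alternative = [(100, 200), (320, 400), (500, 800)]
print(classify_events(reference, alternative))
```

The usage example mirrors the combinatorial case mentioned above, where a single transcript carries more than one form of alternative splicing.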

3.7 Cucurbitacin biosynthesis pathway in Luffa

Bitterness is an economically undesirable trait in cucurbits, including Luffa, with the exception of bitter gourd. We identified ten putative cucurbitacin biosynthetic enzymes in L. acutangula and L. cylindrica, including an oxidosqualene cyclase (Bi), eight cytochrome P450s (CYPs) and an acyltransferase (ACT), similar to the number reported in watermelon (Zhou et al., 2016). Since cucurbitacin biosynthetic pathways appeared to be conserved among cucurbits, we carried out a comparative analysis among Luffa, cucumber, watermelon and melon. We identified colinear regions on L. acutangula (chromosome 4) and L. cylindrica (chromosome 3) where the Bi clusters (containing Bi, three CYPs and ACT) were localized (Figure S8). These clusters were highly conserved in cucumber (chromosome 6), watermelon (chromosome 6) and melon (chromosome 11) (Zhou et al., 2016). The syntenic regions containing La490 and La890 (Figure S8) were also conserved except for the presence of two paralogous genes in the watermelon genome (Cl890A and Cl890B). The colinear regions encompassing La510 and La560 exhibited a lower degree of synteny among Luffa, cucumber, watermelon and melon (Figure S8), with full-length Cs540 and Cs550 orthologs being truncated in melon and missing in Luffa and watermelon.

4 DISCUSSION

Here, we present de novo genome assemblies of two cultivated Luffa species (L. acutangula and L. cylindrica) obtained from long-read PacBio sequencing. We obtained preliminary draft genomes of 734.6 Mb and 689.8 Mb with scaffold N50 of 786,130 and 578,616 bases for L. acutangula and L. cylindrica, respectively. The sizes of the preliminary assemblies were 96.5% and 89% of the estimated genome sizes in L. acutangula and L. cylindrica, respectively, based on k-mer analyses. A previous publication on the L. cylindrica genome assembly also reported that the assembly size (669 Mb) was smaller than the estimated genome size (737 Mb) based on k-mer analysis (91% of the estimated size) (Zhang et al., 2020). The discrepancy observed between the assembly size and the k-mer size estimation could be due to a large number of repetitive sequences (Pflug, Holmes, Burrus, Spencer Johnston, & Maddison, 2020). We further assembled the L. acutangula genome using the long-range Chicago and HiC techniques. The final assembly is the first high-quality, chromosome-scale genome assembly in L. acutangula. The numbers of protein-coding genes in L. acutangula and L. cylindrica obtained from our assemblies were similar to the figure previously reported for the L. cylindrica genome (Zhang et al., 2020), while the proportions of repetitive sequences identified in our genome assemblies (62% for L. acutangula and 57% for L. cylindrica) were lower than the percentage reported for L. cylindrica (73%) in Zhang et al. (2020). The availability of this assembly enabled comparative genomics/phylogenetic studies of the Cucurbitaceae family members. The sequence information from single-copy orthologous genes revealed that L. acutangula and L. cylindrica diverged approximately 7.9 MYA. The assembly also revealed no evidence supporting a recent whole-genome duplication event in Luffa, unlike in C. maxima and C. moschata.
We also observed a substantial accumulation of transposable elements, especially the LTR retrotransposons, which likely contributed to the expansion of the Luffa genomes. Obtaining elite, nonbitter varieties for human consumption is one of the major goals for Luffa breeding programmes. The availability of Luffa genome assemblies facilitated the identification of putative cucurbitacin biosynthetic genes. Collinearity analysis among cucurbit species showed that the Bi clusters as well as other regions encompassing CYP genes exhibited synteny among Luffa, cucumber, watermelon and melon. Our high-quality genome assemblies along with the genomic variation information from L. acutangula and L. cylindrica germplasms provide invaluable resources for studying marker-trait association at a whole-genome level and for future comparative genomics and phylogenetic studies in Cucurbitaceae.
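The scaffold N50 values and assembly-to-estimate ratios quoted in this discussion follow standard definitions, which the Python sketch below makes explicit. N50 is taken here as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function names `n50` and `percent_of_estimate` are hypothetical, and the scaffold lengths in the usage example are toy values rather than data from the Luffa assemblies.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def percent_of_estimate(assembly_size, estimated_size):
    """Assembly size as a percentage of the k-mer-based genome size estimate."""
    return 100.0 * assembly_size / estimated_size


# Toy scaffold lengths (not from the study): total 20, so N50 is 5.
print(n50([2, 3, 4, 5, 6]))
# Sizes in Mb quoted above for the earlier L. cylindrica assembly.
print(round(percent_of_estimate(669, 737)))
```

The second call reproduces the 91% figure cited for the previously published L. cylindrica assembly (669 Mb assembled against a 737 Mb estimate).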

ACKNOWLEDGEMENTS

This study was supported by the National Omics Center under the National Science and Technology Development Agency, Thailand, grant number: 1000221.

CONFLICT OF INTEREST

The authors declare no competing financial interests.

AUTHOR CONTRIBUTIONS

The research study was designed by W.P. and S.T. Laboratory work was performed by W.P., T.Y., D.S., N.J., S.U., J.R.S., J.B. and S.M. (sample collection, DNA and RNA extraction, library construction and sequencing). Bioinformatics analyses were performed by C.S., C.N., W.N. and W.K. The manuscript was written and revised by W.P., and all authors reviewed it.

Open Research

DATA AVAILABILITY STATEMENT

L. acutangula and L. cylindrica genome assemblies and Iso-seq data have been submitted to the DDBJ/EMBL/GenBank databases under the following accession numbers: JAATNF000000000 (L. acutangula genome assembly), SRR11445640 (L. acutangula Iso-seq data), JAAVXE000000000 (L. cylindrica genome assembly) and SRR11452010 (L. cylindrica Iso-seq data). The scripts used to perform the assembly and analyses of the Luffa genomes are available at https://github.com/BeeKento/Luffa-acutangula-genome.

REFERENCES

Abdel-Ghany, S. E., Hamilton, M., Jacobi, J. L., Ngam, P., Devitt, N., Schilkey, F., … Reddy, A. S. (2016). A survey of the sorghum transcriptome using single-molecule long reads. Nature Communications, 7, 11706. https://doi.org/10.1038/ncomms11706
An, J., Yin, M., Zhang, Q., Gong, D., Jia, X., Guan, Y., & Hu, J. (2017). Genome survey sequencing of Luffa Cylindrica L. and microsatellite high resolution melting (SSR-HRM) analysis for genetic relationship of Luffa genotypes. International Journal of Molecular Sciences, 18, 1942. https://doi.org/10.3390/ijms18091942
Capella-Gutiérrez, S., Silla-Martínez, J. M., & Gabaldón, T. (2009). trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics (Oxford, England), 25, 1972–1973. https://doi.org/10.1093/bioinformatics/btp348
Chen, X., Tan, T., Xu, C., Huang, S., Tan, J., Zhang, M., … Xie, C. (2015). Genome-wide transcriptome profiling reveals novel insights into Luffa cylindrica browning. Biochemical and Biophysical Research Communications, 463, 1243–1249. https://doi.org/10.1016/j.bbrc.2015.06.093
Chomicki, G., Schaefer, H., & Renner, S. S. (2019). Origin and domestication of Cucurbitaceae crops: Insights from phylogenies, genomics and archaeology. New Phytologist, 226(5), 1240–1255. https://doi.org/10.1111/nph.16015
Cui, J., Cheng, J., Wang, G., Tang, X., Wu, Z., Lin, M., … Hu, K. (2015). QTL analysis of three flower-related traits based on an interspecific genetic map of Luffa. Euphytica, 202, 45–54. https://doi.org/10.1007/s10681-014-1208-z
Darriba, D., Posada, D., Kozlov, A. M., Stamatakis, A., Morel, B., & Flouri, T. (2019). ModelTest-NG: A new and scalable tool for the selection of DNA and protein evolutionary models. Molecular Biology and Evolution, 37, 291–294. https://doi.org/10.1093/molbev/msz189
Dassanayake, M. D., & Forsberg, F. R. (1988). A revised handbook of the flora of Ceylon. Sri Lanka: CRC Press.
Edgar, R. C. (2004). MUSCLE: A multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, 5, 113. https://doi.org/10.1186/1471-2105-5-113
Ellinghaus, D., Kurtz, S., & Willhoeft, U. (2008). LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics, 9, 18. https://doi.org/10.1186/1471-2105-9-18
Emms, D. M., & Kelly, S. (2019). OrthoFinder: Phylogenetic orthology inference for comparative genomics. Genome Biology, 20, 238. https://doi.org/10.1186/s13059-019-1832-y
Falush, D., Stephens, M., & Pritchard, J. K. (2003). Inference of population structure using multilocus genotype data: Linked loci and correlated allele frequencies. Genetics, 164, 1567–1587.
Farrer, R. A. (2017). Synima: A Synteny imaging tool for annotated genome assemblies. BMC Bioinformatics, 18, 507. https://doi.org/10.1186/s12859-017-1939-7
Feng, S., Xu, M., Liu, F., Cui, C., & Zhou, B. (2019). Reconstruction of the full-length transcriptome atlas using PacBio Iso-Seq provides insight into the alternative splicing in Gossypium australe. BMC Plant Biology, 19, 365. https://doi.org/10.1186/s12870-019-1968-7
Filipowicz, N., & Schaefer, H. (2014). Revisiting Luffa (Cucurbitaceae) 25 years after C. Heiser: Species boundaries and application of names tested with plastid and nuclear DNA sequences. Systematic Botany, 39, 205–215. https://doi.org/10.1600/036364414X678215
Garcia-Mas, J., Benjak, A., Sanseverino, W., Bourgeois, M., Mir, G., Gonzalez, V. M., … Puigdomenech, P. (2012). The genome of melon (Cucumis melo L.). Proceedings of the National Academy of Sciences, 109, 11872–11877. https://doi.org/10.1073/pnas.1205415109
Guo, S., Zhang, J., Sun, H., Salse, J., Lucas, W. J., Zhang, H., … Xu, Y. (2013). The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions. Nature Genetics, 45, 51–58. https://doi.org/10.1038/ng.2470
Haas, B. J., Delcher, A. L., Wortman, J. R., & Salzberg, S. L. (2004). DAGchainer: A tool for mining segmental genome duplications and synteny. Bioinformatics, 20, 3643–3646. https://doi.org/10.1093/bioinformatics/bth397
Haas, B. J., Salzberg, S. L., Zhu, W., Pertea, M., Allen, J. E., Orvis, J., … Wortman, J. R. (2008). Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biology, 9, R7. https://doi.org/10.1186/gb-2008-9-1-r7
Heiser, C. B., & Schilling, E. E. (1988). Phylogeny and distribution of Luffa (Cucurbitaceae). Biotropica, 20, 185–191. https://doi.org/10.2307/2388233
Heiser, C. B., Schilling, E. E., & Dutt, B. (1988). The American species of Luffa (Cucurbitaceae). Systematic Botany, 13, 138–145. https://doi.org/10.2307/2419250
Hoff, K. J., Lange, S., Lomsadze, A., Borodovsky, M., & Stanke, M. (2016). BRAKER1: Unsupervised RNA-Seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics, 32, 767–769. https://doi.org/10.1093/bioinformatics/btv661
Hoff, K. J., Lomsadze, A., Borodovsky, M., & Stanke, M. (2019). Whole-genome annotation with BRAKER. Methods in Molecular Biology, 1962, 65–95. https://doi.org/10.1007/978-1-4939-9173-0_5
Hu, F., Lin, Y., & Tang, J. (2014). MLGO: Phylogeny reconstruction and ancestral inference from gene-order data. BMC Bioinformatics, 15, 354. https://doi.org/10.1186/s12859-014-0354-6
Huang, S., Li, R., Zhang, Z., Li, L. I., Gu, X., Fan, W., … Li, S. (2009). The genome of the cucumber, Cucumis sativus L. Nature Genetics, 41, 1275–1281. https://doi.org/10.1038/ng.475
Huang, X., Adams, M. D., Zhou, H., & Kerlavage, A. R. (1997). A tool for analyzing and annotating genomic sequences. Genomics, 46, 37–45. https://doi.org/10.1006/geno.1997.4984
Joshi, B., Tiwari, R., Kc, H., Ghale, M., & Gyawali, S. (2013). Nepalese landraces of sponge gourd for the production of tender fruits. The Journal of Agriculture and Environment, 14, 13–22. https://doi.org/10.3126/aej.v14i0.19782
Koren, S., Walenz, B. P., Berlin, K., Miller, J. R., Bergman, N. H., & Phillippy, A. M. (2017). Canu: Scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research, 27, 722–736. https://doi.org/10.1101/gr.215087.116
Kozlov, A. M., Darriba, D., Flouri, T., Morel, B., & Stamatakis, A. (2019). RAxML-NG: A fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics, 35, 4453–4455. https://doi.org/10.1093/bioinformatics/btz305
Kriventseva, E. V., Tegenfeldt, F., Petty, T. J., Waterhouse, R. M., Simao, F. A., Pozdnyakov, I. A., … Zdobnov, E. M. (2015). OrthoDB v8: Update of the hierarchical catalog of orthologs and the underlying free software. Nucleic Acids Research, 43, D250–D256. https://doi.org/10.1093/nar/gku1220
Li, H. (2018). Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics, 34, 3094–3100. https://doi.org/10.1093/bioinformatics/bty191
Li, Z., Zhang, Z., Yan, P., Huang, S., Fei, Z., & Lin, K. (2011). RNA-Seq improves annotation of protein-coding genes in the cucumber genome. BMC Genomics, 12, 540. https://doi.org/10.1186/1471-2164-12-540
Lieberman-Aiden, E., van Berkum, N. L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., … Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326, 289–293. https://doi.org/10.1126/science.1181369
Manikandaselvi, S., Vadivel, V., & Brindha, P. (2016). Review on Luffa acutangula L.: Ethnobotany, phytochemistry, nutritional value and pharmacological properties. International Journal of Current Pharmaceutical Review and Research, 7, 151–155.
Marcais, G., & Kingsford, C. (2011). A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics, 27, 764–770. https://doi.org/10.1093/bioinformatics/btr011
Marquez, Y., Brown, J., Simpson, C., Barta, A., & Kalyna, M. (2012). Transcriptome survey reveals increased complexity of the alternative splicing landscape in Arabidopsis. Genome Research, 22, 1184–1195. https://doi.org/10.1101/gr.134106.111
McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., … DePristo, M. A. (2010). The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20, 1297–1303. https://doi.org/10.1101/gr.107524.110
Oboh, I., & Aluyor, E. (2009). Luffa cylindrica-an emerging cash crop. African Journal of Agricultural Research, 4, 684–688.
Ou, S., & Jiang, N. (2018). LTR_retriever: A highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiology, 176, 1410–1422. https://doi.org/10.1104/pp.17.01310
Ou, S., & Jiang, N. (2019). LTR_FINDER_parallel: Parallelization of LTR_FINDER enabling rapid identification of long terminal repeat retrotransposons. Mobile DNA, 10, 48. https://doi.org/10.1186/s13100-019-0193-0
Paradis, E., Claude, J., & Strimmer, K. (2004). APE: Analyses of phylogenetics and evolution in R language. Bioinformatics, 20, 289–290. https://doi.org/10.1093/bioinformatics/btg412
Partap, S., Kumar, A., Sharma, N. K., & Jha, K. K. (2012). Luffa cylindrica: An important medicinal plant. Journal of Natural Product and Plant Resources, 2, 127–134.
Pflug, J. M., Holmes, V. R., Burrus, C., Spencer Johnston, J., & Maddison, D. R. (2020). Measuring genome sizes using read-depth, k-mers, and flow cytometry: Methodological comparisons in beetles (Coleoptera). G3: Genes, Genomes, Genetics. https://doi.org/10.1534/g3.120.401028
Pootakham, W., Sonthirod, C., Naktang, C., Ruang-Areerate, P., Yoocha, T., Sangsrakru, D., … Tangphatsornruang, S. (2017). De novo hybrid assembly of the rubber tree genome reveals evidence of paleotetraploidy in Hevea species. Scientific Reports, 7, 41457. https://doi.org/10.1038/srep41457
Prakash, K., Pandey, A., Jalli, R., & Bisht, I. (2013). Morphological variability in cultivated and wild species of Luffa (Cucurbitaceae) from India. Genetic Resources and Crop Evolution, 60, 2319–2329. https://doi.org/10.1007/s10722-013-9999-7
Pritchard, J. K., Stephens, M., & Donnelly, P. (2000). Inference of population structure using multilocus genotype data. Genetics, 155, 945–959.
Putnam, N. H., O'Connell, B. L., Stites, J. C., Rice, B. J., Blanchette, M., Calef, R., … Green, R. E. (2016). Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Research, 26, 342–350. https://doi.org/10.1101/gr.193474.115
Rabei, S., Rizk, R. M., & Khedr, A.-H.-A. (2013). Keys for and morphological character variation in some Egyptian cultivars of Cucurbitaceae. Genetic Resources and Crop Evolution, 60, 1353–1364. https://doi.org/10.1007/s10722-012-9924-5
Reddy, A. S. (2007). Alternative splicing of pre-messenger RNAs in plants in the genomic era. Annual Review of Plant Biology, 58, 267–294. https://doi.org/10.1146/annurev.arplant.58.032806.103754
Ren, L., Huang, W., & Cannon, S. B. (2019). Reconstruction of ancestral genome reveals chromosome evolution history for selected legume species. New Phytologist, 223, 2090–2103. https://doi.org/10.1111/nph.15770
Renner, S. S., & Schaefer, H. (2017). Phylogeny and evolution of the Cucurbitaceae. In R. Grumet, N. Katzir, & J. Garcia-Mas (Eds.), Genetics and genomics of Cucurbitaceae (pp. 13–23). Cham: Springer International Publishing.
Rogers, M. F., Thomas, J., Reddy, A. S. N., & Ben-Hur, A. (2012). SpliceGrapher: Detecting patterns of alternative splicing from RNA-Seq data in the context of gene models and EST data. Genome Biology, 13, R4. https://doi.org/10.1186/gb-2012-13-1-r4
Ruggieri, V., Alexiou, K. G., Morata, J., Argyris, J., Pujol, M., Yano, R., … Garcia-Mas, J. (2018). An improved assembly and annotation of the melon (Cucumis melo L.) reference genome. Scientific Reports, 8, 8088. https://doi.org/10.1038/s41598-018-26416-2
Sebastian, P., Schaefer, H., Telford, I. R., & Renner, S. S. (2010). Cucumber (Cucumis sativus) and melon (C. melo) have numerous wild relatives in Asia and Australia, and the sister species of melon is from Australia. Proceedings of the National Academy of Sciences, 107, 14269–14273. https://doi.org/10.1073/pnas.1005338107
Shang, X., Cao, Y., & Ma, L. (2017). Alternative splicing in plant genes: A means of regulating the environmental fitness of plants. International Journal of Molecular Sciences, 18, 432. https://doi.org/10.3390/ijms18020432
Shen, Y., Zhou, Z., Wang, Z., Li, W., Fang, C., Wu, M., … Tian, Z. (2014). Global dissection of alternative splicing in paleopolyploid soybean. The Plant Cell, 26, 996–1008. https://doi.org/10.1105/tpc.114.122739
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V., & Zdobnov, E. M. (2015). BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 31(19), 3210–3212. https://doi.org/10.1093/bioinformatics/btv351
Stanke, M., Steinkamp, R., Waack, S., & Morgenstern, B. (2004). AUGUSTUS: A web server for gene finding in eukaryotes. Nucleic Acids Research, 32, W309–W312. https://doi.org/10.1093/nar/gkh379
Sun, H., Wu, S., Zhang, G., Jiao, C., Guo, S., Ren, Y., … Xu, Y. (2017). Karyotype stability and unbiased fractionation in the paleo-allotetraploid cucurbita genomes. Molecular Plant, 10, 1293–1306. https://doi.org/10.1016/j.molp.2017.09.003
R Core Team. (2016). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Tempel, S. (2012). Using and understanding RepeatMasker. Methods in Molecular Biology, 859, 29–51. https://doi.org/10.1007/978-1-61779-603-6_2
Thatcher, S. R., Danilevskaya, O. N., Meng, X., Beatty, M., Zastrow-Hayes, G., Harris, C., … Li, B. (2016). Genome-wide analysis of alternative splicing during development and drought stress in maize. Plant Physiology, 170, 586–599. https://doi.org/10.1104/pp.15.01267
Wang, B. B., & Brendel, V. (2006). Genomewide comparative analysis of alternative splicing in plants. Proceedings of the National Academy of Sciences, 103, 7175–7180. https://doi.org/10.1073/pnas.0602039103
Wang, B., Tseng, E., Regulski, M., Clark, T. A., Hon, T., Jiao, Y., … Ware, D. (2016). Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing. Nature Communications, 7, 11708. https://doi.org/10.1038/ncomms11708
Wu, H.-B., Gong, H., Liu, P., He, X.-L., Luo, S.-B., Zheng, X.-M., … Luo, J. (2014). Large-scale development of EST-SSR markers in sponge gourd via transcriptome sequencing. Molecular Breeding, 34, 1903–1915. https://doi.org/10.1007/s11032-014-0148-6
Wu, H., He, X., Gong, H., Luo, S., Li, M., Chen, J., … Luo, J. (2016). Genetic linkage map construction and QTL analysis of two interspecific reproductive isolation traits in sponge gourd. Frontiers in Plant Science, 7, 980. https://doi.org/10.3389/fpls.2016.00980
Wu, S., Shamimuzzaman, M., Sun, H., Salse, J., Sui, X., Wilder, A., … Fei, Z. (2017). The bottle gourd genome provides insights into Cucurbitaceae evolution and facilitates mapping of a Papaya ring-spot virus resistance locus. The Plant Journal, 92, 963–975. https://doi.org/10.1111/tpj.13722
Wu, T., & Watanabe, C. (2005). GMAP: A genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics, 21, 1859–1875. https://doi.org/10.1093/bioinformatics/bti310
Xie, D., Xu, Y., Wang, J., Liu, W., Zhou, Q., Luo, S., … Zhang, Z. (2019). The wax gourd genomes offer insights into the genetic diversity and ancestral cucurbit karyotype. Nature Communications, 10, 5158. https://doi.org/10.1038/s41467-019-13185-3
Yang, Z. (2007). PAML 4: Phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution, 24, 1586–1591. https://doi.org/10.1093/molbev/msm088
Zhang, G., Sun, M., Wang, J., Lei, M., Li, C., Zhao, D., … Zhang, B. (2019). PacBio full-length cDNA sequencing integrated with RNA-seq reads drastically improves the discovery of splicing transcripts in rice. The Plant Journal, 97, 296–305. https://doi.org/10.1111/tpj.14120
Zhang, S., Hu, J., Zhang, C. F., Guan, Y. J., & Zhang, Y. (2007). Genetic analysis of fruit shape traits at different maturation stages in sponge gourd. Journal of Zhejiang University Science B, 8, 338–344. https://doi.org/10.1631/jzus.2007.B0338
Zhang, T., Ren, X., Zhang, Z., Ming, Y., Yang, Z., Hu, J., … Sun, Z. (2020). Long-read sequencing and de novo assembly of the Luffa cylindrica (L.) Roem. genome. Molecular Ecology Resources, 20, 511–519. https://doi.org/10.1111/1755-0998.13129
Zhou, Y., Ma, Y., Zeng, J., Duan, L., Xue, X., Wang, H., … Huang, S. (2016). Convergence and divergence of bitterness biosynthesis and regulation in Cucurbitaceae. Nature Plants, 2, 16183. https://doi.org/10.1038/nplants.2016.183
Zhu, H., Liu, J., Wen, Q., Chen, M., Wang, B., Zhang, Q., & Xue, Z. (2017). De novo sequencing and analysis of the transcriptome during the browning of fresh-cut Luffa cylindrica 'Fusi-3' fruits. PLoS One, 12, e0187117. https://doi.org/10.1371/journal.pone.0187117

Volume 21, Issue 1

January 2021

Pages 212-225
