Volume 21, Issue 1 pp. 238-250

RESOURCE ARTICLE

A chromosome-scale assembly of the black gram (Vigna mungo) genome

Wirulda Pootakham (Corresponding Author; [email protected]; orcid.org/0000-0001-6721-6453), National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Wanapinun Nawae (orcid.org/0000-0001-9228-6963), National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Chaiwat Naktang, National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Chutima Sonthirod, National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Thippawan Yoocha, National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Wasitthee Kongkachana, National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Duangjai Sangsrakru, National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Nukoon Jomchai, National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Sonicha U-thoomporn, National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Prakit Somta (orcid.org/0000-0002-1002-8360), Department of Agronomy, Faculty of Agriculture at Kamphaeng Saen, Kasetsart University, Nakhon Pathom, Thailand

Kularb Laosatit, Department of Agronomy, Faculty of Agriculture at Kamphaeng Saen, Kasetsart University, Nakhon Pathom, Thailand

Sithichoke Tangphatsornruang (Corresponding Author; [email protected]; orcid.org/0000-0003-2673-0012), National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

Correspondence

Wirulda Pootakham and Sithichoke Tangphatsornruang, National Omics Center, National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand.

Emails: [email protected](WP); [email protected](ST)

First published: 13 August 2020

https://doi.org/10.1111/1755-0998.13243

Nawae, Naktang and Sonthirod equal contributors.

Abstract

Black gram (Vigna mungo) is an important short-duration grain legume crop. Black gram seeds provide an inexpensive source of dietary protein. Here, we applied the 10X Genomics linked-read technology to obtain a de novo whole genome assembly of V. mungo cultivated variety Chai Nat 80 (CN80). The preliminary assembly contained 12,228 contigs and had an N50 length of 5.2 Mb. Subsequent scaffolding using the long-range Chicago and HiC techniques yielded the first high-quality, chromosome-level assembly of 499 Mb comprising 11 pseudomolecules. Comparative genomics analyses based on sequence information from single-copy orthologous genes revealed that black gram and mungbean (Vigna radiata) diverged about 2.7 million years ago. The transversion rate (4DTv) analysis in V. mungo revealed no evidence supporting a recent genome-wide duplication event such as the one observed in the tetraploid créole bean (Vigna reflexo-pilosa). The proportion of repetitive elements in the black gram genome is slightly lower than the numbers reported for related Vigna species. The majority of long terminal repeat retrotransposons appeared to integrate into the genome within the last five million years. We also examined alternative splicing events in V. mungo using full-length transcript sequences. While intron retention was the most prevalent mode of alternative splicing in several plant species, alternative 3' acceptor site selection represented the majority of events in black gram. Our high-quality genome assembly along with the genomic variation information from the germplasm provides valuable resources for accelerating the development of elite varieties through marker-assisted breeding and for future comparative genomics and phylogenetic studies in legume species.

1 INTRODUCTION

Black gram (Vigna mungo [L.] Hepper) is an important short-duration grain legume crop with high protein content in seeds. Black gram is a self-pollinating diploid species (2n = 2x = 22) with an estimated genome size of 574 Mb (Arumuganathan & Earle, 1991). Black gram seeds are an inexpensive source of dietary protein, starch, vitamins and mineral elements, containing a high level of folate and iron (Kakati, Deka, Kotoki, & Saikia, 2010). V. mungo var. mungo (L.) Hepper appears to have been domesticated in India from its wild progenitor, V. mungo var. silvestris (Chandel, Lester, & Starling, 1984). This pulse crop is widely cultivated in South and Southeast Asian countries including India, Bangladesh, Pakistan, Sri Lanka, Myanmar, the Philippines and Thailand (Kaewwongwal et al., 2015). India is the world's largest producer of black gram, followed by Myanmar and Pakistan (Kaewwongwal et al., 2015; Raizada & Souframanien, 2019). To date, genetic improvement in black gram has been achieved primarily through conventional breeding, as marker-assisted approaches are still in their infancy. While yield has improved, the average yield per hectare is still low due to losses from biotic (e.g., powdery mildew and yellow mosaic disease) and abiotic (e.g., drought and salinity) stresses.

Over the past few years, several studies have utilized Illumina short-read sequencing technology to obtain transcriptome assemblies for the purpose of developing simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) markers in black gram (Jasrotia et al., 2017; Raizada & Souframanien, 2019; Souframanien & Reddy, 2015). Nevertheless, there has not been a report on a genome assembly for this legume species. Here, we employed the 10X Genomics linked-read technology (Paajanen et al., 2019) to perform de novo genome assembly of V. mungo. We also applied the long-range Chicago (in vitro proximity ligation) and HiC (in vivo fixation of chromosomes) techniques (Putnam et al., 2016) to obtain the first chromosome-scale whole genome assembly for this species. A combination of the 10X Genomics linked-read technology and the long-range HiC scaffolding technique provides an effective approach to produce a high-quality reference assembly. Along with the genomic variation information from V. mungo germplasm, this genome assembly provides invaluable resources for accelerating the development of improved elite black gram varieties through molecular breeding and future phylogenetics and comparative genomics studies in Vigna species.

2 MATERIALS AND METHODS

2.1 Plant materials and DNA/RNA extraction

Ninety black gram accessions maintained at Kasetsart University (Thailand) were used in this study. For DNA extraction, fresh leaf tissues were collected, flash-frozen in liquid nitrogen and stored at –80°C until use. To obtain high molecular weight DNA for 10X Genomics linked-read sequencing, frozen tissues (cultivar Chai Nat 80 [CN80]) were homogenized, and DNA was extracted using QIAGEN Genomic-tip 100/G following the manufacturer's protocol (Qiagen). The DNA integrity was assessed using the Pippin Pulse Electrophoresis System (Sage Science). Total RNA was isolated from the following CN80 tissues: leaf, root, stem, flower, 1-week-old pod and 3-week-old pod, using the CTAB buffer (2% CTAB, 1.4 M NaCl, 2% PVP, 20 mM EDTA pH 8.0, 100 mM Tris-HCl pH 8.0, 0.4% SDS). The aqueous phase was extracted three times using 25:24:1 phenol:chloroform:isoamyl alcohol, and RNA was precipitated overnight in ¼ volume of 8 M LiCl. The pellets were washed with 70% ethanol, air-dried and resuspended in RNase-free water. Poly(A) mRNAs were enriched from total RNA samples using the Dynabeads mRNA Purification Kit (ThermoFisher Scientific).

2.2 DNA and RNA library preparation and sequencing

A total of 1.25 ng of high molecular weight DNA was used to prepare the linked-read library using the Chromium Genome Library Kit & Gel Bead Kit v2, the Chromium Genome Chip Kit v2 and the Chromium i7 Multiplex Kit according to the manufacturer's instructions (10X Genomics). The resulting 10X library was sequenced on a single lane of Illumina HiSeq X Ten (2 × 150 bp paired-end reads). For whole genome shotgun sequencing of 89 V. mungo accessions (Table S1), approximately 300 ng of each DNA sample was used for library construction following the protocol in the MGIEasy FS Library Prep Kit (MGI Tech). Paired-end (150 bp) sequencing was performed on the MGISEQ-2000RS according to the manufacturer's instructions.

RNA integrity was assessed with a Fragment Analyser System (Agilent) prior to the construction of RNA sequencing libraries. Two Iso-seq libraries were prepared according to a previously published protocol (Pootakham et al., 2017) using the SMARTer PCR cDNA Synthesis Kit (Clontech) and size-selected using the BluePippin Size Selection System (Sage Science) into 1–2 kb, 2–3 kb and 3–6 kb bins. One library was prepared from RNA extracted from leaf tissue, and the other was prepared from pooled RNA samples (root, stem, flower, 1-week-old pod and 3-week-old pod). Sequencing was performed on the PacBio RSII sequencing system using P6-C4 polymerase and chemistry and 360 min movie times according to the manufacturer's protocol. To obtain short-read RNA sequences, six RNA libraries (one for each tissue type) were prepared according to the protocol reported in Pootakham et al. (2018). Briefly, 200 ng of poly(A) mRNA was used to construct a library using the Ion Total RNA Sequencing Kit (ThermoFisher Scientific). The libraries were sequenced on the Ion S5 XL using the Ion 540 chip (ThermoFisher Scientific).

2.3 Chicago library preparation and sequencing

Chicago library preparation and sequencing were carried out by Dovetail Genomics. A Chicago library was prepared as described previously (Putnam et al., 2016). Briefly, ~500 ng of high molecular weight genomic DNA (mean fragment length = 61 kbp) was reconstituted into chromatin in vitro and fixed with formaldehyde. Fixed chromatin was digested with DpnII, the 5' overhangs filled in with biotinylated nucleotides, and then free blunt ends were ligated. After ligation, crosslinks were reversed and the DNA purified from protein. Purified DNA was treated to remove biotin that was not internal to ligated fragments. The DNA was then sheared to ~350 bp mean fragment size, and sequencing libraries were generated using NEBNext Ultra enzymes and Illumina-compatible adapters. Biotin-containing fragments were isolated using streptavidin beads before PCR enrichment of each library. The libraries were sequenced on an Illumina HiSeq X to produce 103 million 2 × 150 bp paired-end reads, which provided 21.72X physical coverage of the genome (1–100 kb pairs).

2.4 Dovetail HiC library preparation and sequencing

Dovetail HiC library preparation and sequencing were carried out by Dovetail Genomics. A Dovetail HiC library was prepared in a similar manner as described previously (Lieberman-Aiden et al., 2009). Briefly, for each library, chromatin was fixed in place with formaldehyde in the nucleus and then extracted. Fixed chromatin was digested with DpnII, the 5' overhangs were filled in with biotinylated nucleotides, and free blunt ends were then ligated. After ligation, crosslinks were reversed and the DNA purified from protein. Purified DNA was treated to remove biotin that was not internal to ligated fragments. The DNA was then sheared to ~350 bp mean fragment size, and sequencing libraries were generated using NEBNext Ultra enzymes and Illumina-compatible adapters. Biotin-containing fragments were isolated using streptavidin beads before PCR enrichment of each library. The libraries were sequenced on an Illumina HiSeq X to produce 86 million 2 × 150 bp paired-end reads, which provided 2,521.27X physical coverage of the genome (10–10,000 kb pairs).

2.5 De novo genome assembly

Linked-read data were assembled using the Supernova assembler version 2.1.1 using the default settings (https://support.10xgenomics.com/de-novo-assembly/software/pipelines/latest/using/running; 10X Genomics). The Supernova scaffolds along with Chicago library reads, shotgun reads, and Dovetail HiC library reads were used as the input data for HiRise, a software pipeline designed specifically for using proximity ligation data to scaffold genome assemblies (Putnam et al., 2016). An iterative analysis was conducted. First, shotgun and Chicago library sequences were aligned to the draft input assembly using a modified SNAP read mapper (http://snap.cs.berkeley.edu). The separations of Chicago read pairs mapped within draft scaffolds were analysed by HiRise to produce a likelihood model for genomic distance between read pairs, and the model was used to identify and break putative misjoins, to score prospective joins, and make joins above a threshold. After aligning and scaffolding Chicago data, Dovetail HiC library sequences were aligned and scaffolded following the same method. After scaffolding, shotgun sequences were used to close gaps between contigs.

2.6 Assembly quality assessment

The assembly quality assessment was carried out by aligning short-read DNA and RNA-seq sequences, Iso-seq transcript sequences and publicly available genomic and transcriptomic sequences (Kundu, Patel, Patel, & Pal, 2015; Kundu, Singh, Dey, Ganguli, & Pal, 2019) using BLASTN at an e-value cutoff of 10^–10. The completeness of the final genome assembly was also evaluated using Benchmarking Universal Single-Copy Orthologues (BUSCO) (Simão, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015). The BUSCO pipeline version 3 was used to test for the presence and completeness of orthologues using the Embryophyta OrthoDB release 9 (Kriventseva et al., 2015).

2.7 Annotation of repetitive elements and repeat masking

To generate a de novo repeat library, RepeatModeler version 2.0.1 (http://www.repeatmasker.org/RepeatModeler/) was used to predict transposable elements in the unannotated genome assembly. Two de novo repeat-finding programs, RECON version 1.08 and RepeatScout version 1.0.5, were employed to identify the boundaries of repetitive elements and to build consensus models of interspersed repeats. To mask the assembled genome sequences, we employed both the custom black gram-specific repeat library generated by RepeatModeler and the repetitive sequences in the RepBase plant repeat database (20150807; https://www.girinst.org/) using RepeatMasker version 4.0.9_p2 (default parameters) (Tempel, 2012). The insertion times for the full-length LTR retrotransposons were estimated using the LTR_retriever program (Ou & Jiang, 2018). The insertion time of each LTR (T) was calculated according to the formula T = K/(2μ), where K is the divergence rate calculated with the Jukes-Cantor model (Jukes & Cantor, 1969) for non-coding sequences, and μ is the substitution rate in repeat regions, 1.64 × 10^–8 (Zhuang et al., 2019). The Jukes-Cantor model assumes equal base frequencies and equal mutation rates. We also assumed that the mutation rates were similar among the species analysed.
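
The insertion-time formula can be illustrated with a minimal Python sketch. It is not the LTR_retriever implementation: the function names are hypothetical, the two LTR sequences are assumed to be already aligned, and μ is assumed to be expressed per site per year so that T comes out in years.

```python
import math

# Substitution rate in repeat regions quoted in the text
# (assumed here to be per site per year).
MU = 1.64e-8


def jukes_cantor_distance(ltr5: str, ltr3: str) -> float:
    """Jukes-Cantor corrected divergence K between two aligned LTR sequences."""
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a in "ACGT" and b in "ACGT":  # skip gaps and ambiguous bases
            compared += 1
            mismatches += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    p = mismatches / compared  # observed proportion of differing sites
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_time(k: float, mu: float = MU) -> float:
    """T = K / (2 * mu): both LTRs accumulate substitutions independently."""
    return k / (2.0 * mu)


if __name__ == "__main__":
    k = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA")
    print(f"K = {k:.4f}, T = {insertion_time(k):,.0f} years")
```

The factor of two reflects that the 5' and 3' LTRs of one element were identical at insertion and have diverged along two independent lineages since.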

2.8 Gene annotation

To predict protein-coding sequences in the unmasked genome, evidence from transcriptome-based prediction, ab initio gene prediction and homology-based prediction was combined using EvidenceModeler (EVM) version 1.1.1 r2015-07-03 (Haas et al., 2008). Transcriptome-based prediction methods combined information from RNA-seq and Iso-seq data obtained from leaf, root, stem, flower and pod. Full-length transcripts were mapped to the genome assembly using Genomic Mapping and Alignment Program (GMAP; version r20160630) (Wu & Watanabe, 2005), and short-read RNA-seq data were mapped to the assembly using PASA2 version 2.0.1 (Haas et al., 2008). Protein sequences from Vigna radiata (mungbean), Vigna angularis (adzuki bean), Vigna unguiculata (cowpea), Phaseolus vulgaris (common bean), Glycine max (soybean) and Arabidopsis thaliana obtained from public databases were aligned to the unmasked genome using AAT version 1.52 (Huang, Adams, Zhou, & Kerlavage, 1997). Two ab initio gene predictors were run on the unmasked assembly. Protein-coding gene predictions were obtained with Augustus version 3.2.1 (Stanke, Steinkamp, Waack, & Morgenstern, 2004) trained with V. radiata, V. angularis, V. unguiculata, P. vulgaris, G. max and A. thaliana PASA transcriptome alignment assembly and BRAKER (Hoff, Lange, Lomsadze, Borodovsky, & Stanke, 2016; Hoff, Lomsadze, Borodovsky, & Stanke, 2019) using an Iso-seq alignment file as an input. All gene predictions were integrated by EVM to generate consensus gene models using the following weight for each evidence type: PASA2, 1; GMAP, 1; AAT, 0.3; and Augustus, 0.3. The positions of annotated genes were cross-checked with those of known repeats, and any gene that had more than 20% overlapping sequence with repetitive elements was excluded from the list of annotated genes. Predicted genes were functionally annotated using OmicsBox version 1.3.11 (https://www.biobam.com/download-omicsbox/). Protein sequences were aligned with the following protein databases: UniProtKB/Swiss-Prot (swissprot v5) and GenBank nonredundant database (nr v5) using local BLASTP with an e-value cutoff of 1.0e-5. Gene ontology (GO) terms were retrieved and assigned to V. mungo query sequences. Enzyme codes corresponding to V. mungo gene ontology terms were retrieved and mapped to KEGG pathway annotations.
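
The repeat-overlap filter described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the pipeline actually used: the function names are hypothetical, coordinates are half-open intervals on a single chromosome, and the overlap is measured against the full gene span (the text does not say whether introns were included).

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def repeat_overlap_fraction(gene: Interval, repeats: List[Interval]) -> float:
    """Fraction of the gene span covered by annotated repeats."""
    g_start, g_end = gene
    covered = 0
    for r_start, r_end in merge_intervals(repeats):
        covered += max(0, min(g_end, r_end) - max(g_start, r_start))
    return covered / (g_end - g_start)


def keep_gene(gene: Interval, repeats: List[Interval],
              max_fraction: float = 0.20) -> bool:
    """Keep a gene unless more than 20% of it overlaps repeats."""
    return repeat_overlap_fraction(gene, repeats) <= max_fraction


if __name__ == "__main__":
    gene = (0, 1000)
    print(keep_gene(gene, [(100, 200), (150, 300)]))  # 20% covered
    print(keep_gene(gene, [(0, 250)]))                # 25% covered
```

Merging the repeat intervals first matters because RepeatMasker annotations can overlap one another, and counting the shared bases twice would inflate the overlap fraction.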

2.9 Comparative genomics and phylogenetic analysis

We used OrthoFinder (Emms & Kelly, 2019) to identify orthologous groups in V. mungo and nine other legumes (G. max, V. unguiculata, V. angularis, V. radiata, Vigna reflexo-pilosa, P. vulgaris, Arachis duranensis, Cicer arietinum and Medicago truncatula), two cucurbit species (Cucumis sativus and Cucumis melo), two rosid species (Prunus persica and A. thaliana) and one monocot (Oryza sativa). Protein sequences from single-copy orthologous groups were used to construct a phylogenetic tree using the RAxML-ng program (Kozlov, Darriba, Flouri, Morel, & Stamatakis, 2019). We first aligned sequences in each single-copy orthologous group with MUSCLE (Edgar, 2004) and removed alignment gaps with trimAl (Capella-Gutiérrez, Silla-Martínez, & Gabaldón, 2009) using the automated1 heuristic method. All alignment blocks were concatenated using the catsequences program (https://github.com/ChrisCreevey/catsequences), and the substitution model for each alignment block was estimated using the ModelTest-NG program (Darriba et al., 2019). The outputs were subsequently used to compute a maximum-likelihood phylogenetic tree. Divergence times of species in the phylogenetic tree were estimated with the MCMCtree program (PAML4 package) (Yang, 2007) using the relaxed-clock model with the known divergence time between Phaseolus and Vigna, estimated at 6.4–10.4 million years ago (MYA) (Lavin, Herendeen, & Wojciechowski, 2005) and the known divergence time between cucumber and melon, estimated at 8.4–11.8 MYA (Sebastian, Schaefer, Telford, & Renner, 2010). Significantly expanded or contracted gene families across the phylogenetic tree (p-value < .01) were identified using the CAFE software version 4.2 (Han, Thomas, Lugo-Martinez, & Hahn, 2013) with the gene birth-death (λ) parameters estimated using the maximum-likelihood method.

2.10 Phaseoloid evolutionary analysis

We adapted the method described in Ren, Huang, and Cannon (2019) to reconstruct the ancestral genome of V. mungo, V. radiata, V. unguiculata, V. angularis, P. vulgaris and G. max. In brief, we used OrthoFinder to identify orthologous groups in these six species. Syntenic blocks were then constructed from the orthologous groups using the DAGchainer program (Haas, Delcher, Wortman, & Salzberg, 2004) in the Synima pipeline (Farrer, 2017). The outputs from DAGchainer were used to specify “markers” representing features that were shared by the selected genomes using the scripts provided in Ren et al. (2019). In the following step, we used the MLGO web service (http://www.geneorder.org/server.php) (Hu, Lin, & Tang, 2014) to infer the ancestral genome from the order of the markers in each individual genome and the information from the phylogenetic tree constructed using single-copy orthologues (Kozlov et al., 2019).

2.11 V. mungo phylogenetic relationship and population structure analysis

For the phylogenetic analysis, we used a set of 6,657 SNP markers at four-fold-degenerate sites that met the following criteria: (a) a minor allele frequency >0.05; (b) a depth of coverage between 20X and 200X; and (c) fewer than 10% missing data. The R package ape was used to construct a neighbour-joining tree with 1,000 bootstrap replicates (Paradis, Claude, & Strimmer, 2004; R Core Team, 2016). We applied the same set of SNPs to examine the population structure using the STRUCTURE program (version 2.3.4) (Falush, Stephens, & Pritchard, 2003) with 10,000 iterations and the number of clusters (K) ranging from 2 to 4. The Evanno method was used to detect the number of K groups that best fitted the data set (Earl & vonHoldt, 2012; Evanno, Regnaut, & Goudet, 2005).
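
The three SNP selection criteria can be expressed as a short Python sketch. The data layout is an assumption made for illustration only: genotypes are two-letter strings such as "AG" with None for missing calls, the depth is a single per-site value, and the function names are hypothetical rather than taken from any tool used in the study.

```python
from collections import Counter
from typing import Optional, Sequence


def minor_allele_frequency(genotypes: Sequence[Optional[str]]) -> float:
    """Frequency of the second most common allele among called genotypes."""
    counts = Counter(a for g in genotypes if g is not None for a in g)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # monomorphic or entirely missing site
    return sorted(counts.values(), reverse=True)[1] / total


def passes_filters(genotypes: Sequence[Optional[str]], depth: float,
                   min_maf: float = 0.05, min_depth: float = 20,
                   max_depth: float = 200, max_missing: float = 0.10) -> bool:
    """Apply criteria (a) MAF > 0.05, (b) 20X-200X depth, (c) < 10% missing."""
    missing = sum(g is None for g in genotypes) / len(genotypes)
    return (minor_allele_frequency(genotypes) > min_maf
            and min_depth <= depth <= max_depth
            and missing < max_missing)


if __name__ == "__main__":
    site = ["AA"] * 17 + ["AG"] * 2 + [None]
    print(passes_filters(site, depth=50))
```

Whether the depth bounds are inclusive, and whether depth is summed across accessions or averaged, is not stated in the text; the sketch treats the bounds as inclusive.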

3 RESULTS

3.1 Genome assembly and annotation

An elite cultivar, Chai Nat 80, was selected for whole-genome shotgun sequencing using the 10X Genomics linked-read strategy. We generated a total of 133 Gb of Illumina paired-end 150 bp sequencing data from 892,194,918 raw reads, representing 232X coverage based on the estimated genome size of 574 Mb (Arumuganathan & Earle, 1991; Pal, 2006). A de novo assembly of linked-read sequences using Supernova yielded a draft genome of 498.9 Mb. The preliminary assembly contained 12,228 contigs and had an N50 length of 5.2 Mb (Table 1). The analysis of k-mer distribution of the genome sequencing reads provided an estimated genome size of 531.3 Mb (Figure S1), close to the previously reported figure (Pal, 2006). The preliminary assembly of the V. mungo genome was further scaffolded with the HiRise software (Dovetail Genomics, Santa Cruz, USA; Figure S2) using the long-range Chicago (in vitro proximity ligation; 103 million read pairs; 21.72X physical coverage of the genome) and HiC (in vivo fixation of chromosomes; 86 million read pairs; 2,521.27X physical coverage of the genome) library data. The final assembly contained 11 pseudomolecules greater than 10 Mb in length (hereafter referred to as chromosomes, numbered according to size; Figure 1), corresponding to the haploid chromosome number in V. mungo (n = 11, 2n = 22). The 11 chromosomes covered 463,352,435 bases or 92.8% of the 499 Mb assembly.

TABLE 1. Assembly statistics of the V. mungo genome

	10X Genomics	10X Genomics + Chicago	10X Genomics + Chicago + HiC
N50 scaffold size (bases)	5,235,471	98,299	43,171,434
L50 scaffold number	30	131	5
N75 scaffold size (bases)	2,307,364	408,224	35,890,399
L75 scaffold number	68	331	9
N90 scaffold size (bases)	22,699	55,295	24,711,178
L90 scaffold number	351	732	11
Assembly size (bases)	498,929,800	499,131,000	498,872,271
Number of scaffolds	12,228	11,461	9,224
Number of scaffolds ≥100 kb	147	633	13
Number of scaffolds ≥1 Mb	95	128	11
Number of scaffolds ≥10 Mb	5	0	11
Longest scaffold (bases)	16,127,237	6,075,845	65,136,230
% N	0.06	0.06	0.06
GC content (%)	33.55	33.55	33.55
BUSCO evaluation (% completeness)	-	-	94.4
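
For readers unfamiliar with the contiguity statistics in Table 1, the following Python sketch shows how Nx and Lx values are conventionally derived from a list of scaffold lengths. The function name and the toy lengths are illustrative and are not taken from the assembly.

```python
from typing import Iterable, Tuple


def nx_lx(lengths: Iterable[int], x: float = 50) -> Tuple[int, int]:
    """Return (Nx, Lx) for a collection of scaffold lengths.

    Nx is the length of the shortest scaffold in the smallest set of
    longest scaffolds that together cover at least x% of the assembly;
    Lx is the number of scaffolds in that set.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    toy = [80, 70, 50, 40, 30, 20, 10]  # toy scaffold lengths, total 300
    print("N50, L50:", nx_lx(toy, 50))
    print("N90, L90:", nx_lx(toy, 90))
```

Because scaffolds are accumulated from longest to shortest, Nx can only decrease and Lx can only increase as x grows, which is a quick sanity check for any column of the table.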

FIGURE 1. Genomic landscape of V. mungo chromosomes. (a) Physical map of 11 assembled chromosomes (Mb scale) numbered according to size. (b) Repeat density represented by proportion of genomic regions covered by repetitive sequences in 500 kb windows. (c) Gene density represented by number of genes in 500 kb windows. (d) GC content represented by percentage of G + C bases in 500 kb windows. (e) SNP density represented by number of SNP markers in 500 kb windows. (f) Syntenic blocks are depicted by connected lines [Colour figure can be viewed at wileyonlinelibrary.com]

To evaluate the quality of the final genome assembly, we aligned genomic DNA reads to the genome sequences and found that 93.8% of the reads from the MGI shotgun libraries could be mapped back to the assembly. We also aligned our RNA-seq reads and full-length Iso-seq transcript sequences as well as publicly available RNA-seq reads to the assembly. The percentages of our RNA-seq and Iso-seq reads that could be mapped to the genome were 94.6% and 99.0%, respectively. The percentages of reads mapped to the genome assembly were 95.4%, 97.4% and 99.8% for RNA-seq reads from the NCBI GenBank accessions SRR3141655 (Kundu et al., 2015), SRR2058996 (Kundu et al., 2019) and SRR554452, respectively. To further assess the completeness of our V. mungo genome assembly, we employed the BUSCO software to check the gene content using a plant-specific database of 1,440 genes (Simão et al., 2015). Our gene predictions recovered 94.4% of the highly conserved orthologues in the Embryophyta lineage, with 91.9% identified as “complete” and 2.5% identified as “partial” (Table 1).

To annotate the genome, we used a combination of ab initio prediction, homology-based search and transcript evidence from both Iso-seq and RNA-seq data for gene prediction. The genome annotation contained 32,729 predicted gene models, of which 29,411 (89.86%) were protein-coding genes (Tables S2, S3, S4). The most prevalent gene ontology (GO) term associated with cellular component was integral component of the membrane (7,265), followed by nucleus (2,788; Figure S3). The largest category of genes annotated to molecular function was ATP binding (2,718), followed by metal ion binding (1,178) and DNA binding (1,149; Figure S3). Only 3,318 genes (10.13%) remained functionally unannotated. We observed an uneven distribution of genes, with an increase in density towards the ends of the chromosomes (Figure 1). Compared with other sequenced legume genomes, the number of predicted genes in the V. mungo genome was lower than that of V. angularis (Yang et al., 2015), G. max (Schmutz et al., 2010) and Cajanus cajan (Varshney et al., 2012), higher than that of P. vulgaris (Schmutz et al., 2014), V. radiata (Kang et al., 2014) and C. arietinum (Varshney et al., 2013), and similar to that of V. unguiculata (Lonardi et al., 2019). We found transcript support for 21,926 protein-coding genes (74.6%), comparable to the percentage reported in barrel medic (76.7%) (Young et al., 2011) but considerably higher than the percentage reported in adzuki bean (53%) (Yang et al., 2015).

The average gene length was 3,123 nt with 5.22 exons per gene, and the average exon length was 226 nt (Table S2). In addition to coding sequences, we identified 5,202 microRNAs, 979 tRNAs, 271 rRNAs and 322 small nuclear RNAs (Table S5). The GC content of the V. mungo genome is 33.6%, similar to other sequenced legume genomes (Sato et al., 2008; Schmutz et al., 2010, 2014; Varshney et al., 2012, 2013; Yang et al., 2015; Young et al., 2011). The GC content in coding regions (43.0%) was higher than that observed in introns and untranslated regions (32.9%; Table S2).

3.2 Comparative genomics and phylogenetic analyses

To investigate the evolutionary relationships between black gram and other plant species, we analysed the gene sets from nine legumes: soybean (G. max), cowpea (V. unguiculata), adzuki bean (V. angularis), mungbean (V. radiata), créole bean (V. reflexo-pilosa), common bean (P. vulgaris), a diploid progenitor of cultivated peanut (A. duranensis), chickpea (C. arietinum) and barrel medic (M. truncatula); two cucurbit species: cucumber (C. sativus) and melon (C. melo); two rosid species: peach (P. persica) and Arabidopsis; and one monocot: rice (O. sativa). Rice was included in the analysis as an outgroup species. We chose to include cucumber and melon because of their known divergence time (Sebastian et al., 2010), whereas peach and Arabidopsis were selected as representatives of rosid species because of the availability of their complete genome sequences. Of 609,118 input proteins from 15 species, 546,885 (89.78%) were clustered into 20,784 orthologous groups. Sequence information from single-copy orthologous genes was used to construct a maximum-likelihood phylogenetic tree, revealing that V. mungo and V. radiata diverged approximately 2.7 MYA (Figure 2a, Figure S4). The ancestor of V. mungo and V. radiata formed a sister clade to the ancestor of V. reflexo-pilosa and V. angularis, and the two clades diverged about 4.2 MYA. This placement in the phylogenetic tree was consistent with previous reports (Doi, Kaga, Tomooka, & Vaughan, 2002; Tun Tun & Yamaguchi, 2007).

We analysed gene family expansion and contraction in 10 legumes and five other plant species. Of the 20,784 gene families identified among 15 species, 32 (0.15%) and 352 (1.69%) were significantly expanded or contracted in V. mungo, respectively, after the speciation from V. radiata (Figure 2a, Figure S4). We further investigated the functions of genes in the expanded families and observed a number of genes encoding protein kinases and transcription factors (Table S6). In contrast, many of the contracted gene families were involved in the ubiquitination pathway or encoded pentatricopeptide repeat-containing proteins, which appear to mediate gene expression through the regulation of RNA stability and translation (Manna, 2015) (Table S6). Of 25,859 protein-coding genes in black gram that had orthologues present in the other 14 species analysed, 7,091 (21.62%) genes were shared only among legume species and only 404 (1.23%) genes were specific to black gram (Figure 2a). The proportions of species-specific (1.23%–1.56%) and legume-specific (20.87%–22.61%) genes were similar among the diploid Vigna species.

We used the 4DTv approach, which measures the number of transversions at fourfold degenerate synonymous sites (i.e., codons in which any base at the third position is translated into the same amino acid), to analyse orthologous gene pairs and estimate the relative timing of evolutionary divergence between V. mungo and closely related phaseoloid species. Figure 2b showed a peak 4DTv distance of 0.053 (V. mungo–P. vulgaris), which was higher than the peak 4DTv distances of 0.037 (V. mungo–V. unguiculata), 0.021 (V. mungo–V. angularis), 0.0167 (V. mungo–V. radiata) and 0.0157 (V. mungo–V. reflexo-pilosa), implying that V. mungo and P. vulgaris diverged prior to the speciation events separating V. mungo, V. radiata, V. angularis and V. reflexo-pilosa. Comparison of 5,527 pairs of paralogous genes residing in duplicated collinear blocks within the V. mungo genome revealed a single peak at 0.25, suggesting that black gram has experienced only one ancient whole genome duplication event, in contrast to the V. reflexo-pilosa genome, which had a sharp peak at 0.019 indicative of a recent genome-wide duplication event (Figure 2c). Examination of synteny with other phaseoloid species revealed extensive conservation between V. mungo and four warm-season legumes (V. radiata, V. angularis, V. unguiculata and P. vulgaris; Figure S5).
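
As an illustration of the 4DTv measure described above, the following minimal Python sketch computes the raw proportion of transversions at fourfold degenerate third codon positions for one pair of codon-aligned coding sequences. It is not the pipeline used in this study: the function names are hypothetical, the standard genetic code is assumed, and no correction for multiple substitutions is applied.

```python
# Codon prefixes (first two bases) whose third position is fourfold
# degenerate in the standard genetic code (assumed here).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(a, b):
    """Return True if a and b differ by a purine <-> pyrimidine change."""
    return (a in PURINES and b in PYRIMIDINES) or (a in PYRIMIDINES and b in PURINES)


def four_dtv(cds_a, cds_b):
    """Raw 4DTv for two codon-aligned coding sequences of equal length."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be codon-aligned and of equal length")
    sites = 0
    transversions = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        prefix = codon_a[:2]
        # Only count codon pairs that share a fourfold degenerate prefix,
        # so that any third-position difference is synonymous.
        if prefix != codon_b[:2] or prefix not in FOURFOLD_PREFIXES:
            continue
        third_a, third_b = codon_a[2], codon_b[2]
        if third_a not in "ACGT" or third_b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if is_transversion(third_a, third_b):
            transversions += 1
    return transversions / sites if sites else float("nan")


# Toy example (made-up sequences): three fourfold sites, two transversions.
print(four_dtv("GCTCTAGGAATG", "GCCCTTGGCATG"))
```

Applied to every orthologous or paralogous gene pair, the resulting values form the distributions whose peaks are compared in the text.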

3.3 Repetitive sequence analysis

To analyse the repetitive sequences in V. mungo, we used a combination of a de novo repeat identification tool, RepeatModeler, and homology search tools, and found that 205.4 Mb (41.1%) of the genome assembly contained repetitive DNA (Table S7). The proportion of the repetitive sequences in V. mungo was lower than that reported for mungbean (50.1%) (Kang et al., 2014), V. angularis (44.5%) (Yang et al., 2015), V. unguiculata (49.5%) (Lonardi et al., 2019) and P. vulgaris (45%) (Schmutz et al., 2014). A genome-wide distribution plot showed that DNA elements, long terminal repeat (LTR) retrotransposons, long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs) were enriched near the centromeric regions (Figure 1). Classification of the observed transposable elements into known classes revealed that the majority of them were retrotransposons (24% of the assembly), whereas DNA transposons represented only 4.9% of the genome, consistent with the observations in other legume genomes (Table S7) (Varshney et al., 2012; Yang et al., 2015). LTRs were the predominant class of transposable elements in the V. mungo genome, occupying 57.1% of the repetitive DNA identified. The most abundant LTR superfamilies, Gypsy and Copia, occupied 33.6% and 23.1% of the repetitive elements in the genome, respectively (Figure 3a, Table S7). The proportions of LTR retroelements, LINEs, SINEs and DNA transposons in the V. mungo genome are comparable to those in the V. radiata (Kang et al., 2014), V. unguiculata (Lonardi et al., 2019) and V. angularis (Yang et al., 2015) genomes (Figure 3a).

The insertion times for LTR retrotransposons were estimated based on predicted full-length LTRs in the genome assembly. Although some ancient repeat sequences were inserted into the black gram genome over 10 MYA, nearly three quarters of the LTR retroelements (73%) integrated into the genome within the last five million years. Notably, the insertion times of ~23% of the LTR elements were more recent than 2 MYA (Figure 3b), suggesting that they started accumulating after the divergence of V. mungo and V. radiata from V. angularis. We also observed that LTR retrotransposons appeared to accumulate earlier in the V. mungo, V. radiata and P. vulgaris genomes (2–5 MYA) than in the V. unguiculata genome (1–2 MYA; Figure 3b).
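
Insertion times of this kind are commonly derived from the divergence between the two LTRs of a single element, which are identical at the time of insertion. The sketch below shows one widely used formulation, T = K / (2r), with K obtained from the Jukes-Cantor correction. It is only an illustration under stated assumptions: the function names are hypothetical, the input LTRs are assumed to be already aligned, and the substitution rate is a placeholder rather than the value used in this study.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor corrected distance from the proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_time_mya(ltr5, ltr3, rate=1.3e-8):
    """Estimate insertion time (MYA) from aligned 5' and 3' LTR sequences.

    rate is in substitutions per site per year; the default is an
    ASSUMED placeholder, not a value taken from the article.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTR sequences must be aligned to equal length")
    # Keep only columns where both LTRs carry an unambiguous base.
    pairs = [(a, b) for a, b in zip(ltr5.upper(), ltr3.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    k = jc69_distance(p)
    # Both LTRs accumulate substitutions independently, hence the factor 2.
    return k / (2 * rate) / 1e6


# Toy example (made-up sequences): 3 differences over 100 aligned sites.
print(round(insertion_time_mya("A" * 100, "C" * 3 + "A" * 97), 2))
```

Because the estimate scales inversely with the assumed rate, absolute ages shift with the rate chosen, whereas the relative ordering of elements does not.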

3.4 Genome evolution in the phaseoloid clade

To investigate chromosome evolution of black gram and other legumes within the phaseoloid clade, we analysed syntenic blocks across their genomes. Nine ancestral chromosomes were inferred from the syntenic and phylogenetic relationships among those genomes, using 21,969 orthologous groups. Seven V. mungo chromosomes (chromosomes 1, 3, 4, 5, 7, 8 and 9) exhibited a one-to-one relationship with the V. radiata and V. unguiculata genomes, while six V. mungo chromosomes exhibited a one-to-one relationship with the V. angularis and P. vulgaris genomes (Figure 4). Two V. mungo chromosomes (chromosomes 3 and 4) exhibited a one-to-two syntenic relationship with G. max owing to the whole genome duplication event in the Glycine genome (Figure 4). Relative to the nine-chromosome ancestral configuration, the V. mungo, V. unguiculata and P. vulgaris genomes appeared to best preserve the ancestral karyotype, followed by the V. angularis and V. radiata genomes, with three out of nine chromosomes remaining in the ancestral state. Half of the chromosomes in Phaseolinae species and most G. max chromosomes were derived from a series of fusion and fission events.

3.5 V. mungo population structure

To investigate genetic diversity and variations in the germplasm, 89 black gram accessions were selected and shotgun-sequenced using the MGI sequencing platform. A total of 3,164,866,082 high-quality, cleaned reads (474 Gb) were mapped to the V. mungo genome assembly with an average mapping rate of 97.6% (Table S1). The population structure was explored with STRUCTURE (Pritchard, Stephens, & Donnelly, 2000) using SNP markers at fourfold degenerate sites. Based on the Evanno method (Earl & vonHoldt, 2012; Evanno et al., 2005), we found that K = 3 was the best fit to the data (Figure 5, Figure S6). Our results showed that the 89 black gram accessions were an admixture of three subpopulations, and each subpopulation comprised accessions from different geographical origins/countries (Figure 5, Figure S6). These results were consistent with the previous genetic diversity study using simple sequence repeat markers (Kaewwongwal et al., 2015). The majority of V. mungo accessions used in this study were from India, Nepal and Pakistan (Table S1). Geographic vicinity, cultural ties and migration of people (Gartaula & Niehof, 2013) among these countries may account for the genetic admixture observed.
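
The Evanno method selects K from the rate of change of the log-likelihood across successive K values. The sketch below shows one common formulation of the ΔK statistic, the absolute second difference of the mean Ln P(D) divided by the standard deviation across replicate runs at that K. The function name and the likelihood values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute delta K for each interior K.

    lnp_by_k maps K to a list of Ln P(D) values from replicate runs
    (at least two replicates per K are needed for a standard deviation).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        # delta K is undefined for the smallest and largest K tested.
        if k - 1 not in means or k + 1 not in means:
            continue
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_diff / sd
    return delta


# Made-up replicate likelihoods for K = 1..5 (illustration only).
runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4503.0, -4497.0],
    3: [-4100.0, -4101.0, -4099.0],
    4: [-4090.0, -4110.0, -4070.0],
    5: [-4085.0, -4120.0, -4050.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(best_k, delta_k)
```

In this toy data set the likelihood stops improving sharply after K = 3 and the replicates at K = 3 agree closely, so ΔK peaks there.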

3.6 Alternative splicing in V. mungo

Alternative splicing contributes to the diversity of the transcriptome and proteome. The availability of full-length transcript isoforms from the long-read PacBio sequencing technology allowed us to investigate the following alternative splicing events in black gram: alternative 5' donor site selection, alternative 3' acceptor site selection, exon skipping and intron retention. A total of 2,590 alternative splicing events were detected in V. mungo (Figure S7a). While the alternative 3' acceptor site selection (33%), alternative 5' donor site selection (30%) and intron retention (29%) events were observed at similar frequencies, exon skipping was the least prevalent mode of alternative splicing, representing only 8% of the total events (Figure S7a). Occasionally, different types of alternative splicing were observed in combination in a single gene. Figure S7b illustrated an example of a transcript that was subjected to multiple forms of alternative splicing. Recent data suggest that alternative splicing is one of the mechanisms that plants use to adapt to a changing environment (Reddy, 2007; Shang, Cao, & Ma, 2017; Wang & Brendel, 2006). Future studies using RNA samples from different developmental stages and various growth conditions will be required to thoroughly probe alternative splicing events in order to obtain a complete repertoire of transcript isoforms in V. mungo.

4 DISCUSSION

In this study, we employed the 10X Genomics technology to obtain a de novo whole genome assembly of V. mungo. The 10X Genomics linked-read strategy utilizes emulsion technology to partition long DNA fragments into micelles, within which small DNA fragments are amplified and tagged by a shared barcode. After sequencing, the barcodes are used to identify sequences that are in close proximity in the genome, and long DNA fragments can be reconstructed based on these linked reads (Ott et al., 2018). As the 10X Genomics linked-read technology utilizes Illumina sequencing, it is more cost-effective to generate the preliminary assembly using this approach than with long-read PacBio sequencing (Zhang, Sun, et al., 2019; Zhang, Zhou, et al., 2019). The preliminary assembly obtained from the 10X Genomics linked-read strategy was 498.9 Mb with a scaffold N50 length of 5.2 Mb. To achieve higher contiguity, we further assembled the black gram genome using the long-range positional information from the Chicago and HiC techniques. The HiC approach identifies chromosomal interactions using chromosome conformation capture. HiC data provide long-range linkage information up to tens of megabases and can be used to generate chromosome-scale scaffolds (Burton et al., 2013; Marie-Nelly et al., 2014). The final assembly reported here is the first high-quality, chromosome-scale genome assembly of V. mungo, containing 11 chromosomes corresponding to the haploid chromosome number. The preliminary assembly covered 86.9% of the estimated genome size based on flow cytometry analyses (Arumuganathan & Earle, 1991; Pal, 2006). Comparative genomics analyses based on sequence information from single-copy orthologous genes revealed that V. mungo and V. radiata diverged about 2.7 MYA. Unlike in V. reflexo-pilosa, there was no evidence supporting a recent whole genome duplication event in V. mungo. The proportion of repetitive elements in the black gram genome is slightly lower than the numbers reported for related Vigna species; however, it should be noted that the quality of the genome assemblies and/or the repeat identification methods used might affect the percentages of repetitive sequences reported in each species. The majority of LTR retrotransposons appeared to integrate into the genome within the last five million years.
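
Scaffold N50, the contiguity metric quoted for the preliminary assembly, is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of the calculation follows; the function name and the scaffold lengths are hypothetical and unrelated to the V. mungo assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy example: total = 30, and the two longest scaffolds (10 + 8) cover half.
print(n50([2, 4, 6, 8, 10]))
```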

We also obtained Iso-seq data from multiple tissues (leaf, root, stem, flower, 1-week-old pod and 3-week-old pod) and identified transcript variants exhibiting alternative splicing events. In contrast to the observations made in black gram, intron retention has been reported as the most prevalent alternative splicing mechanism in several plant species such as Arabidopsis (Marquez, Brown, Simpson, Barta, & Kalyna, 2012), G. max (Shen et al., 2014), V. radiata (Satyawan, Kim, & Lee, 2017), cotton (Feng, Xu, Liu, Cui, & Zhou, 2019), maize (Thatcher et al., 2016; Wang et al., 2016) and rice (Zhang, Sun, et al., 2019; Zhang, Zhou, et al., 2019). Our high-quality genome assembly along with the genomic variation information from the germplasm provides an invaluable resource for investigating marker-trait association at a whole genome level, gene expression analyses and comparative genomics and phylogenetic studies in legume species.

ACKNOWLEDGEMENTS

This study was supported by the National Omics Center under the National Science and Technology Development Agency, Thailand, grant number: 1000221.

AUTHOR CONTRIBUTIONS

W.P. and S.T. designed the research study. W.P., T.Y., D.S., N.J., S.U., P.S. and K.L. performed laboratory work (sample collection, DNA and RNA extraction, library construction and sequencing). C.N., C.S., W.N. and W.K. performed bioinformatics analyses. W.P. wrote and revised the manuscript, and all authors reviewed it.

Open Research

DATA AVAILABILITY STATEMENT

The V. mungo genome assembly, Iso-seq and RNA-seq data have been submitted to the DDBJ/EMBL/GenBank databases under BioProject number PRJNA623719: genome assembly–JABCND000000000; Iso-seq data–SRR11787985 and SRR11787359; RNA-seq data–SRR11775845, SRR11775821, SRR11775823, SRR11775824, SRR11775822, SRR11775544.

Supporting Information

REFERENCES

Arumuganathan, K., & Earle, E. D. (1991). Nuclear DNA content of some important plant species. Plant Molecular Biology Reporter, 9, 208–218. https://doi.org/10.1007/BF02672069
Burton, J. N., Adey, A., Patwardhan, R. P., Qiu, R., Kitzman, J. O., & Shendure, J. (2013). Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nature Biotechnology, 31, 1119–1125. https://doi.org/10.1038/nbt.2727
Capella-Gutiérrez, S., Silla-Martínez, J. M., & Gabaldón, T. (2009). trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics (Oxford, England), 25, 1972–1973. https://doi.org/10.1093/bioinformatics/btp348
Chandel, K., Lester, R., & Starling, R. (1984). The wild ancestors of urid and mung beans (Vigna mungo (L.) Hepper and V. radiata (L.) Wilczek). Botanical Journal of the Linnean Society, 89, 85–96. https://doi.org/10.1111/j.1095-8339.1984.tb01002.x
Darriba, D., Posada, D., Kozlov, A. M., Stamatakis, A., Morel, B., & Flouri, T. (2019). ModelTest-NG: A new and scalable tool for the selection of DNA and Protein evolutionary models. Molecular Biology and Evolution, 37, 291–294. https://doi.org/10.1093/molbev/msz189
Doi, K., Kaga, A., Tomooka, N., & Vaughan, D. A. (2002). Molecular phylogeny of genus Vigna subgenus Ceratotropis based on rDNA ITS and atpB-rbcL intergenic spacer of cpDNA sequences. Genetica, 114, 129–145. https://doi.org/10.1023/A:1015158408227
Earl, D. A., & vonHoldt, B. M. (2012). STRUCTURE HARVESTER: A website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources, 4, 359–361. https://doi.org/10.1007/s12686-011-9548-7
Edgar, R. C. (2004). MUSCLE: A multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, 5, 113. https://doi.org/10.1186/1471-2105-5-113
Emms, D. M., & Kelly, S. (2019). OrthoFinder: Phylogenetic orthology inference for comparative genomics. Genome Biology, 20, 238. https://doi.org/10.1186/s13059-019-1832-y
Evanno, G., Regnaut, S., & Goudet, J. (2005). Detecting the number of clusters of individuals using the software STRUCTURE: A simulation study. Molecular Ecology, 14, 2611–2620. https://doi.org/10.1111/j.1365-294X.2005.02553.x
Falush, D., Stephens, M., & Pritchard, J. K. (2003). Inference of population structure using multilocus genotype data: Linked loci and correlated allele frequencies. Genetics, 164, 1567–1587.
Farrer, R. A. (2017). Synima: A Synteny imaging tool for annotated genome assemblies. BMC Bioinformatics, 18, 507. https://doi.org/10.1186/s12859-017-1939-7
Feng, S., Xu, M., Liu, F., Cui, C., & Zhou, B. (2019). Reconstruction of the full-length transcriptome atlas using PacBio Iso-Seq provides insight into the alternative splicing in Gossypium australe. BMC Plant Biology, 19, 365. https://doi.org/10.1186/s12870-019-1968-7
Gartaula, H., & Niehof, A. (2013). Migration to and from the Nepal terai: Shifting movements and motives. The South Asianist Journal, 2(2), 28–50.
Haas, B. J., Delcher, A. L., Wortman, J. R., & Salzberg, S. L. (2004). DAGchainer: A tool for mining segmental genome duplications and synteny. Bioinformatics, 20, 3643–3646. https://doi.org/10.1093/bioinformatics/bth397
Haas, B. J., Salzberg, S. L., Zhu, W., Pertea, M., Allen, J. E., Orvis, J., … Wortman, J. R. (2008). Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biology, 9, R7. https://doi.org/10.1186/gb-2008-9-1-r7
Han, M. V., Thomas, G. W. C., Lugo-Martinez, J., & Hahn, M. W. (2013). Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Molecular Biology and Evolution, 30, 1987–1997. https://doi.org/10.1093/molbev/mst100
Hoff, K. J., Lange, S., Lomsadze, A., Borodovsky, M., & Stanke, M. (2016). BRAKER1: Unsupervised RNA-seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics, 32, 767–769. https://doi.org/10.1093/bioinformatics/btv661
Hoff, K. J., Lomsadze, A., Borodovsky, M., & Stanke, M. (2019). Whole-genome annotation with BRAKER. Methods in Molecular Biology, 1962, 65–95. https://doi.org/10.1007/978-1-4939-9173-0_5
Hu, F., Lin, Y., & Tang, J. (2014). MLGO: Phylogeny reconstruction and ancestral inference from gene-order data. BMC Bioinformatics, 15, 354. https://doi.org/10.1186/s12859-014-0354-6
Huang, X., Adams, M. D., Zhou, H., & Kerlavage, A. R. (1997). A tool for analyzing and annotating genomic sequences. Genomics, 46, 37–45. https://doi.org/10.1006/geno.1997.4984
Jasrotia, R. S., Iquebal, M. A., Yadav, P. K., Kumar, N., Jaiswal, S., Angadi, U. B., … Kumar, D. (2017). Development of transcriptome based web genomic resources of yellow mosaic disease in Vigna mungo. Physiology and Molecular Biology of Plants, 23, 767–777. https://doi.org/10.1007/s12298-017-0470-7
Jukes, T. H., & Cantor, C. R. (1969). Evolution of protein molecules. In H. N. Munro (Ed.), Mammalian Protein Metabolism, III (pp. 21–132). New York, NY: Academic Press. https://doi.org/10.1016/B978-1-4832-3211-9.50009-7
Kaewwongwal, A., Kongjaimun, A., Somta, P., Chankaew, S., Yimram, T., & Srinives, P. (2015). Genetic diversity of the black gram [Vigna mungo (L.) Hepper] gene pool as revealed by SSR markers. Breeding Science, 65, 127–137. https://doi.org/10.1270/jsbbs.65.127
Kakati, P., Deka, S., Kotoki, D., & Saikia, S. (2010). Effect of traditional methods of processing on the nutrient contents and some antinutritional factors in newly developed cultivars of green gram [Vigna radiata (L.) Wilczek] and black gram [Vigna mungo (L.) Hepper] of Assam, India. International Food Research Journal, 17, 377–384.
Kang, Y. J., Kim, S. K., Kim, M. Y., Lestari, P., Kim, K. H., Ha, B.-K., … Lee, S.-H. (2014). Genome sequence of mungbean and insights into evolution within Vigna species. Nature Communications, 5, 5443. https://doi.org/10.1038/ncomms6443
Kozlov, A. M., Darriba, D., Flouri, T., Morel, B., & Stamatakis, A. (2019). RAxML-NG: A fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics, 35, 4453–4455. https://doi.org/10.1093/bioinformatics/btz305
Kriventseva, E. V., Tegenfeldt, F., Petty, T. J., Waterhouse, R. M., Simão, F. A., Pozdnyakov, I. A., … Zdobnov, E. M. (2015). OrthoDB v8: Update of the hierarchical catalog of orthologs and the underlying free software. Nucleic Acids Research, 43, D250–D256. https://doi.org/10.1093/nar/gku1220
Kundu, A., Patel, A., Paul, S., & Pal, A. (2015). Transcript dynamics at early stages of molecular interactions of MYMIV with resistant and susceptible genotypes of the leguminous host, Vigna mungo. PLoS One, 10(4), e0124687. https://doi.org/10.1371/journal.pone.0124687
Kundu, A., Singh, P. K., Dey, A., Ganguli, S., & Pal, A. (2019). Complex molecular mechanisms underlying MYMIV-resistance in Vigna mungo revealed by comparative transcriptome profiling. Scientific Reports, 9, 1–13. https://doi.org/10.1038/s41598-019-45383-w
Lavin, M., Herendeen, P. S., & Wojciechowski, M. F. (2005). Evolutionary rates analysis of Leguminosae implicates a rapid diversification of lineages during the tertiary. Systematic Biology, 54, 575–594. https://doi.org/10.1080/10635150590947131
Lieberman-Aiden, E., van Berkum, N. L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., … Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326, 289–293. https://doi.org/10.1126/science.1181369
Lonardi, S., Muñoz-Amatriaín, M., Liang, Q., Shu, S., Wanamaker, S. I., Lo, S., … Close, T. J. (2019). The genome of cowpea (Vigna unguiculata [L.] Walp.). The Plant Journal, 98, 767–782. https://doi.org/10.1111/tpj.14349
Manna, S. (2015). An overview of pentatricopeptide repeat proteins and their applications. Biochimie, 113, 93–99. https://doi.org/10.1016/j.biochi.2015.04.004
Marie-Nelly, H., Marbouty, M., Cournac, A., Flot, J.-F., Liti, G., Parodi, D. P., … Koszul, R. (2014). High-quality genome (re) assembly using chromosomal contact data. Nature Communications, 5, 1–10. https://doi.org/10.1038/ncomms6695
Marquez, Y., Brown, J., Simpson, C., Barta, A., & Kalyna, M. (2012). Transcriptome survey reveals increased complexity of the alternative splicing landscape in Arabidopsis. Genome Research, 22, 1184–1195. https://doi.org/10.1101/gr.134106.111
Ott, A., Schnable, J. C., Yeh, C.-T., Wu, L., Liu, C., Hu, H.-C., … Schnable, P. S. (2018). Linked read technology for assembling large complex and polyploid genomes. BMC Genomics, 19, 651. https://doi.org/10.1186/s12864-018-5040-z
Ou, S., & Jiang, N. (2018). LTR_retriever: A highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiology, 176, 1410–1422. https://doi.org/10.1104/pp.17.01310
Paajanen, P., Kettleborough, G., López-Girona, E., Giolai, M., Heavens, D., Baker, D., … Clark, M. D. (2019). A critical comparison of technologies for a plant genome sequencing project. Gigascience, 8, 1–12. https://doi.org/10.1093/gigascience/giy163
Pal, A. (2006). Flow cytometry: A comparatively new tool in Plant Biotechnology. In A. Kumar, S. Roy, & S. K. Sopory (Eds.), Plant biotechnology and its application (pp. 193–204). New Delhi, India: International Pvt Ltd.
Paradis, E., Claude, J., & Strimmer, K. (2004). APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics, 20, 289–290. https://doi.org/10.1093/bioinformatics/btg412
Pootakham, W., Naktang, C., Sonthirod, C., Yoocha, T., Sangsrakru, D., Jomchai, N., … Tangphatsornruang, S. (2018). Development of a novel reference transcriptome for scleractinian coral Porites lutea using single-molecule long-read isoform sequencing (Iso-Seq). Frontiers in Marine Science, 5(122). https://doi.org/10.3389/fmars.2018.00122
Pootakham, W., Sonthirod, C., Naktang, C., Ruang-Areerate, P., Yoocha, T., Sangsrakru, D., … Tangphatsornruang, S. (2017). De novo hybrid assembly of the rubber tree genome reveals evidence of paleotetraploidy in Hevea species. Scientific Reports, 7, 41457. https://doi.org/10.1038/srep41457
Pritchard, J. K., Stephens, M., & Donnelly, P. (2000). Inference of population structure using multilocus genotype data. Genetics, 155, 945–959.
Putnam, N. H., O'Connell, B. L., Stites, J. C., Rice, B. J., Blanchette, M., Calef, R., … Green, R. E. (2016). Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Research, 26, 342–350. https://doi.org/10.1101/gr.193474.115
R Core Team (2016). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Raizada, A., & Souframanien, J. (2019). Transcriptome sequencing, de novo assembly, characterisation of wild accession of blackgram (Vigna mungo var. silvestris) as a rich resource for development of molecular markers and validation of SNPs by high resolution melting (HRM) analysis. BMC Plant Biology, 19, 358. https://doi.org/10.1186/s12870-019-1954-0
Reddy, A. S. (2007). Alternative splicing of pre-messenger RNAs in plants in the genomic era. Annual Review of Plant Biology, 58, 267–294. https://doi.org/10.1146/annurev.arplant.58.032806.103754
Ren, L., Huang, W., & Cannon, S. B. (2019). Reconstruction of ancestral genome reveals chromosome evolution history for selected legume species. New Phytologist, 223, 2090–2103. https://doi.org/10.1111/nph.15770
Sato, S., Nakamura, Y., Kaneko, T., Asamizu, E., Kato, T., Nakao, M., … Tabata, S. (2008). Genome structure of the legume, Lotus japonicus. DNA Research, 15, 227–239. https://doi.org/10.1093/dnares/dsn008
Satyawan, D., Kim, M. Y., & Lee, S.-H. (2017). Stochastic alternative splicing is prevalent in mungbean (Vigna radiata). Plant Biotechnology Journal, 15, 174–182. https://doi.org/10.1111/pbi.12600
Schmutz, J., Cannon, S. B., Schlueter, J., Ma, J., Mitros, T., Nelson, W., … Jackson, S. A. (2010). Genome sequence of the palaeopolyploid soybean. Nature, 463, 178–183. https://doi.org/10.1038/nature08670
Schmutz, J., McClean, P. E., Mamidi, S., Wu, G. A., Cannon, S. B., Grimwood, J., … Jackson, S. A. (2014). A reference genome for common bean and genome-wide analysis of dual domestications. Nature Genetics, 46, 707. https://doi.org/10.1038/ng.3008
Sebastian, P., Schaefer, H., Telford, I. R. H., & Renner, S. S. (2010). Cucumber (Cucumis sativus) and melon (C. melo) have numerous wild relatives in Asia and Australia, and the sister species of melon is from Australia. Proceedings of the National Academy of Sciences USA, 107, 14269–14273. https://doi.org/10.1073/pnas.1005338107
Shang, X., Cao, Y., & Ma, L. (2017). Alternative splicing in plant genes: A means of regulating the environmental fitness of plants. International Journal of Molecular Sciences, 18. https://doi.org/10.3390/ijms18020432
Shen, Y., Zhou, Z., Wang, Z., Li, W., Fang, C., Wu, M., … Tian, Z. (2014). Global dissection of alternative splicing in paleopolyploid soybean. The Plant Cell, 26, 996–1008. https://doi.org/10.1105/tpc.114.122739
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V., & Zdobnov, E. M. (2015). BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 31, 3210–3212. https://doi.org/10.1093/bioinformatics/btv351
Souframanien, J., & Reddy, K. S. (2015). De novo assembly, characterization of immature seed transcriptome and development of genic-SSR markers in black gram [Vigna mungo (L.) Hepper]. PLoS One, 10, e0128748. https://doi.org/10.1371/journal.pone.0128748
Stanke, M., Steinkamp, R., Waack, S., & Morgenstern, B. (2004). AUGUSTUS: A web server for gene finding in eukaryotes. Nucleic Acids Research, 32, W309–W312. https://doi.org/10.1093/nar/gkh379
Tempel, S. (2012). Using and understanding RepeatMasker. Methods in Molecular Biology, 859, 29–51. https://doi.org/10.1007/978-1-61779-603-6_2
Thatcher, S. R., Danilevskaya, O. N., Meng, X., Beatty, M., Zastrow-Hayes, G., Harris, C., … Li, B. (2016). Genome-wide analysis of alternative splicing during development and drought stress in maize. Plant Physiology, 170, 586–599. https://doi.org/10.1104/pp.15.01267
Tun Tun, Y., & Yamaguchi, H. (2007). Phylogenetic relationship of wild and cultivated vigna (Subgenus Ceratotropis, Fabaceae) from Myanmar based on sequence variations in non-coding regions of trnT-F. Breeding Science, 57, 271–280. https://doi.org/10.1270/jsbbs.57.271
Varshney, R. K., Chen, W., Li, Y., Bharti, A. K., Saxena, R. K., Schlueter, J. A., … Jackson, S. A. (2012). Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nature Biotechnology, 30, 83. https://doi.org/10.1038/nbt.2022
Varshney, R. K., Song, C., Saxena, R. K., Azam, S., Yu, S., Sharpe, A. G., … Cook, D. R. (2013). Draft genome sequence of chickpea (Cicer arietinum) provides a resource for trait improvement. Nature Biotechnology, 31, 240. https://doi.org/10.1038/nbt.2491
Wang, B. B., & Brendel, V. (2006). Genomewide comparative analysis of alternative splicing in plants. Proceedings of the National Academy of Sciences USA, 103, 7175–7180. https://doi.org/10.1073/pnas.0602039103
Wang, B. O., Tseng, E., Regulski, M., Clark, T. A., Hon, T., Jiao, Y., … Ware, D. (2016). Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing. Nature Communications, 7, 11708. https://doi.org/10.1038/ncomms11708
Wu, T., & Watanabe, C. (2005). GMAP: A genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics, 21, 1859–1875. https://doi.org/10.1093/bioinformatics/bti310
Yang, K., Tian, Z., Chen, C., Luo, L., Zhao, B., Wang, Z., … Wan, P. (2015). Genome sequencing of adzuki bean (Vigna angularis) provides insight into high starch and low fat accumulation and domestication. Proceedings of the National Academy of Sciences USA, 112, 13213–13218. https://doi.org/10.1073/pnas.1420949112
Yang, Z. (2007). PAML 4: Phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution, 24, 1586–1591. https://doi.org/10.1093/molbev/msm088
Young, N. D., Debellé, F., Oldroyd, G. E. D., Geurts, R., Cannon, S. B., Udvardi, M. K., … Roe, B. A. (2011). The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature, 480, 520–524. https://doi.org/10.1038/nature10625
Zhang, G., Sun, M., Wang, J., Lei, M., Li, C., Zhao, D., … Zhang, B. (2019). PacBio full-length cDNA sequencing integrated with RNA-seq reads drastically improves the discovery of splicing transcripts in rice. The Plant Journal, 97, 296–305. https://doi.org/10.1111/tpj.14120
Zhang, L., Zhou, X., Weng, Z., & Sidow, A. (2019). Assessment of human diploid genome assembly with 10x Linked-Reads data. GigaScience, 8. https://doi.org/10.1093/gigascience/giz141
Zhuang, W., Chen, H., Yang, M., Wang, J., Pandey, M. K., Zhang, C., … Varshney, R. K. (2019). The genome of cultivated peanut provides insight into legume karyotypes, polyploid evolution and crop domestication. Nature Genetics, 51, 865–876. https://doi.org/10.1038/s41588-019-0402-2

Volume 21, Issue 1, January 2021, Pages 238-250

Filename	Description
men13243-sup-0001-FigS1-S7.pdf (PDF document, 1.2 MB)	Fig S1-S7
men13243-sup-0002-TabS1-S7.xlsx (application/excel, 1.6 MB)	Tab S1-S7
