QTL-Seq Associated With Grain Elongation in Rice (Oryza sativa L.) Using Bulked Segregant Analysis (BSA) Approach
Funding: The authors received no specific funding for this work.
ABSTRACT
Quantitative trait locus (QTL) identification is a prerequisite for an effective molecular plant breeding programme that introgresses genes of interest through marker-assisted selection (MAS). Bulked segregant analysis (BSA) is a high-throughput QTL mapping approach for rapidly identifying genomic loci that regulate a trait of interest. QTL-seq identifies candidate genomic regions more efficiently and improves understanding of the molecular mechanisms underlying traits. Among the grain quality characteristics of basmati rice, grain elongation (GE) after cooking is one of the most important and most targeted traits, and it is influenced by starch properties. The inheritance of GE in rice has not been clearly elucidated because of its complex and variable pattern. In this study, we identified QTLs for GE using a QTL-seq (bulked segregant analysis + whole genome resequencing) approach in an F2 population of rice. Genome-wide SNP profiling of extreme phenotypic bulks from the F2 population of Basmati 370 × Pusa Basmati 1121 identified genomic regions on Chromosomes 1, 2, 3 and 6. These genomic regions were further tested using CAPS/dCAPS markers in the F3 population, and the marker on Chromosome 2 showed a significant association with GE. The identified markers can be used in future rice improvement programmes to develop basmati rice with improved GE and distinct starch properties, enhancing its export potential.
1 Introduction
Rice (Oryza sativa L.) is key to food security for at least half of the world's population (Kumar et al. 2021). The green revolution led to the development of new cultivars, together with advances in irrigation, infrastructure, novel management approaches and synthetic fertilisers and pesticides (Bazrkar-Khatibani et al. 2019). Basmati rice is cultivated in the Indian subcontinent and is an important contributor to the economies of India and Pakistan (Bao 2014). It holds a significant position in the international market because of its long slender grains, pleasant aroma, long shelf-life, fine cooking quality, sweet taste, fluffy texture, palatability and easy digestibility. The grain quality characteristics of basmati rice are determined by various physiological and biochemical properties; among the physicochemical properties, the protein and starch composition of the endosperm plays a major role in determining rice quality. India accounts for 70% of world basmati rice exports. Rice breeders are currently focused on improving grain quality for export to global markets (Asante 2017). The grain quality of basmati rice comprises distinct characteristics such as grain length (L), grain breadth (B), L/B ratio, elongation ratio, milling percentage, organoleptic properties, nutritional properties and aroma. In addition, basmati rice has a low glycemic index and contains substantial amounts of micronutrients such as iron and zinc (Ahuja, Ahuja, and Ahuja 2019). The cooking quality and organoleptic properties of rice depend on gel consistency, gelatinization temperature (GT), amylose content, volume expansion and water absorption (Butt et al. 2008). In basmati rice, linear grain elongation (GE) is a desirable trait that is mainly influenced by GT and by the amylose content of the endosperm.
Complex traits are controlled by QTLs, and the approach used to identify QTLs/genes is known as QTL mapping. Both QTL mapping and positional cloning are used to dissect complex phenotypic traits. The candidate region is then refined by increasing the density of molecular markers across it (fine mapping), followed by locating it on the chromosome (physical mapping). However, these approaches have limited efficiency in crop breeding because they offer low resolution and low throughput and are time-consuming (Majeed et al. 2022). Bulked segregant analysis (BSA) is a high-throughput alternative that identifies the genomic regions linked with a particular trait. Advances in sequencing technologies have allowed BSA to be combined with sequencing to detect marker-trait associations, an approach termed BSA-seq (Zhang and Panthee 2020). In BSA-seq, only individuals with extreme, contrasting phenotypes are selected from a segregating population such as an F2, backcross or recombinant inbred line (RIL) population. Their DNA is pooled into two bulks instead of genotyping the entire population, and DNA markers that differ between the two bulks are screened. By sampling selectively, BSA combined with whole-genome sequencing can substantially reduce genotyping costs while achieving statistical power for QTL mapping comparable to that of a full-population study. With the rapid development of next-generation sequencing, sequencing data have been used to construct high-density genetic maps, which markedly improves the efficiency of mining QTLs that regulate important agronomic traits in crops (Li et al. 2016; Lu et al. 2017).
Cooked GE is an important component of rice cooking quality and is influenced by starch properties (Nur Suraya et al. 2020). Consumers and the market prefer basmati grains that elongate lengthwise after cooking. GE can be measured as the GE ratio, that is, the ratio of cooked grain length to milled grain length (Arikit et al. 2019), or as the GE index, the proportional change in grain dimensions after cooking. GE is a physicochemical trait influenced by various factors such as the genetic makeup of the variety, ageing time, ageing temperature, water uptake, GT and amylose content (Ojha et al. 2018). The inheritance of GE has not been clearly elucidated because of its complex and inconsistent pattern (Arikit et al. 2019).
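The GE ratio defined above is a simple quotient. The Python sketch below computes it per kernel and averages per-kernel ratios over a sample; the function names are illustrative and not part of any published tool, and the 22 mm (cooked) and 9 mm (raw) lengths for Pusa Basmati 1121 are the mean values reported in Section 3.1.

```python
from statistics import mean


def ge_ratio(cooked_length_mm: float, milled_length_mm: float) -> float:
    """GE ratio = cooked grain length / milled (raw) grain length."""
    if milled_length_mm <= 0:
        raise ValueError("milled grain length must be positive")
    return cooked_length_mm / milled_length_mm


def mean_ge_ratio(kernels: list[tuple[float, float]]) -> float:
    """Average of per-kernel GE ratios for (cooked, milled) length pairs.

    Note: this is generally not equal to mean(cooked) / mean(milled).
    """
    return mean(ge_ratio(cooked, milled) for cooked, milled in kernels)


# Mean lengths reported for Pusa Basmati 1121: 22 mm cooked, 9 mm raw
print(round(ge_ratio(22.0, 9.0), 2))  # 2.44
print(mean_ge_ratio([(10.0, 5.0), (12.0, 4.0)]))  # 2.5
```

The second call shows why the averaging order matters: the per-kernel ratios 2.0 and 3.0 average to 2.5, whereas dividing the mean lengths (11 mm / 4.5 mm) gives about 2.44.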
The basmati rice cultivated in the Jammu region of Jammu and Kashmir, India, is world famous but has lower GE than other basmati varieties. Although the microclimate of this region gives locally bred basmati varieties a strong aroma, their GE after cooking is somewhat lower than that of improved basmati varieties grown in other parts of the country. Because lengthwise GE after cooking is preferred by consumers and by the world market, the GE of local basmati varieties grown in this region needs to be improved to increase their export potential. In this context, the present study was undertaken to identify QTLs associated with GE in basmati rice using a QTL-seq approach based on an F2 population derived from a cross between Basmati 370 (low GE variety) and Pusa Basmati 1121 (high GE variety). The candidate genomic regions identified in the study suggest that the QTLs/genes responsible for GE could be those involved in the starch biosynthetic pathway. These QTLs/genes can be used in basmati rice breeding programmes to increase the GE ratio of basmati varieties.
2 Materials and Methods
2.1 Plant Material
The study involved two contrasting parents, Basmati 370 and Pusa Basmati 1121. Basmati 370 is a very old variety that has been widely cultivated since 1933, before the partition of India and Pakistan. The seed material consisted of Basmati 370, an elite short grain variety of the Jammu region (R.S. Pura), and Pusa Basmati 1121, an essentially derived variety developed through hybridization over a long breeding process. The School of Biotechnology, Faculty of Agriculture, Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu, purifies and multiplies nucleus seed and seed stocks of Basmati 370. Pusa Basmati 1121 seeds were procured from the Indian Agricultural Research Institute (IARI), New Delhi, India, through the National Bureau of Plant Genetic Resources, Government of India, in accordance with the Standard Material Transfer Agreement (SMTA) of the International Treaty on Plant Genetic Resources for Food and Agriculture under Article 12.4 of the Treaty. All standard guidelines were followed for the use of the rice varieties in this study.
2.2 Site Description
The field experiments were conducted at the Research Farm of the School of Biotechnology, Faculty of Agriculture, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, Chatha, Jammu & Kashmir, India. The experimental field lies at an elevation of 332 m above mean sea level (32°39′N, 74°58′E) and receives an annual rainfall of 1000 mm, giving it a subtropical climate. The soil of the experimental field was clay loam, with an organic carbon content not exceeding 0.50%. Temperatures in the region ranged from 30°C to 41°C over the growth phases of rice, from seed broadcasting in the nursery in mid-May to maturity at the end of October. The off-season crop was raised at the Research Farm of the National Rice Research Institute, Cuttack, Odisha, India, to advance the generation. The laboratory work was carried out in the Plant Molecular Biology Laboratory of the School of Biotechnology, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, Chatha, Jammu & Kashmir, India, during 2019–2022.
2.3 Development of Mapping Population for GE
Pusa Basmati 1121 was selected as the high GE parent, and Basmati 370, an elite variety of the Jammu region (Ranbir Singh Pura), was selected as the low GE parent to generate a segregating population for evaluating the cooked GE ratio. Crosses between Basmati 370 and Pusa Basmati 1121 were made by emasculation to generate F1 seeds. The F1 plants were raised at the Research Farm of the School of Biotechnology, Faculty of Agriculture, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, Chatha, Jammu and Kashmir, India, during kharif 2020. F2 seeds were collected from a few self-pollinated F1 plants and grown at the Research Farm of the National Rice Research Institute, Cuttack, Odisha, India, during the rabi season 2020–2021 to generate the F2 population. Approximately 200 F2 plants were grown and self-pollinated to produce F3 seeds. The F2 plants were used for DNA extraction, and the F2:3 seeds derived from the F2 plants were used for phenotypic evaluation.
2.4 Evaluation of GE
2.5 Generation of High Grain Elongation and Low Grain Elongation Bulk
QTL-seq analysis (Zhang et al. 2020) was performed using two classes of F2 lines with contrasting phenotypes: high grain-elongation (HGE) and low grain-elongation (LGE) ratios. Forty F2 plants were chosen to develop the HGE and LGE bulks, 20 plants per bulk. Leaf samples were collected from each F2 plant, and genomic DNA was isolated individually using the CTAB method (Aboul-Maaty and Oraby 2019). DNA from the 20 plants with the highest GE ratios was mixed in equal amounts to form the HGE pool, and DNA from the 20 plants with the lowest GE ratios was mixed in equal amounts to form the LGE pool.
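The bulk construction described above can be sketched as follows: rank F2 plants by GE ratio, take the n lowest and n highest as the LGE and HGE bulks, and compute the volume of each DNA extract needed to contribute an equal mass to its pool. This is a minimal sketch, not the authors' procedure; the 100 ng contribution per plant and the DNA concentrations in the example are hypothetical, while the plant IDs and GE ratios are taken from Table 1.

```python
def select_extreme_bulks(ge_by_plant: dict[int, float], n: int = 20) -> tuple[list[int], list[int]]:
    """Return (low_bulk, high_bulk) plant IDs: the n lowest and n highest GE ratios."""
    if 2 * n > len(ge_by_plant):
        raise ValueError("population too small for two non-overlapping bulks")
    ranked = sorted(ge_by_plant, key=ge_by_plant.get)  # ascending GE ratio
    return ranked[:n], ranked[-n:]


def pooling_volumes_ul(conc_ng_per_ul: dict[int, float], ng_per_plant: float = 100.0) -> dict[int, float]:
    """Volume (uL) of each extract that contributes ng_per_plant of DNA.

    ng_per_plant = 100 is a hypothetical amount chosen for illustration.
    """
    return {pid: ng_per_plant / conc for pid, conc in conc_ng_per_ul.items()}


# A few plants from Table 1 (plant ID -> GE ratio)
ge = {77: 1.672, 47: 1.731, 82: 2.361, 11: 2.365, 29: 2.554}
low, high = select_extreme_bulks(ge, n=2)
print(low, high)  # [77, 47] [11, 29]
print(pooling_volumes_ul({77: 50.0, 47: 25.0}))  # {77: 2.0, 47: 4.0}
```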
2.6 Whole Genome Sequencing
DNA-seq libraries were constructed from the genomic DNA of the two pools and of the two parents using the TruSeq DNA Nano kit. The libraries were sequenced on an Illumina NovaSeq 6000 platform (Walkowiak et al. 2020) to produce paired-end reads at a sequencing depth of about 40× of the rice genome (~400 Mb) for each pool and 20× for each parent.
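The target depths above follow from the usual mean-coverage relation of the Lander-Waterman model, depth = total sequenced bases / genome size. A minimal sketch of that arithmetic, assuming the ~400 Mb genome stated in the text and a hypothetical 150-bp read length (the reads reported in Table 2 average about 159 to 161 bp):

```python
import math


def required_bases(target_depth: float, genome_size_bp: int) -> float:
    """Total bases needed for a mean depth of target_depth (depth = bases / genome size)."""
    return target_depth * genome_size_bp


def required_read_pairs(target_depth: float, genome_size_bp: int, read_length_bp: int) -> int:
    """Paired-end read pairs needed; each pair contributes 2 * read_length_bp bases."""
    return math.ceil(required_bases(target_depth, genome_size_bp) / (2 * read_length_bp))


GENOME_BP = 400_000_000  # ~400 Mb rice genome, as stated in the text
for label, depth in [("bulk", 40), ("parent", 20)]:
    gb = required_bases(depth, GENOME_BP) / 1e9
    pairs = required_read_pairs(depth, GENOME_BP, read_length_bp=150)  # 150 bp is hypothetical
    print(f"{label}: {gb:.0f} Gb, {pairs:,} read pairs")
```

For a 40× bulk this gives 16 Gb of sequence, and for a 20× parent 8 Gb; the read-pair counts scale inversely with the assumed read length.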
2.7 Read Mapping, SNP Calling and SNP Index Analysis
The raw sequencing data were filtered using strict quality parameters to obtain high-quality data. QTL-seq analysis was performed with the QTL-seq pipeline (Thianthavon et al. 2021) using default parameters to identify variants, calculate SNP indices and locate QTLs associated with GE. The pipeline first requires sequencing data from one parent to build a parental reference genome, onto which the reads of the two bulks are mapped. In this study, Basmati 370 was used as the reference throughout the pipeline. First, the clean reads of Basmati 370 were aligned to the public reference genome (Nipponbare, IRGSP-1.0) using HISAT2. The variants identified in Basmati 370 were then substituted into the reference sequence to build the Basmati 370 reference genome. DNA variants, including single nucleotide polymorphisms (SNPs) and small insertions and deletions (Indels), were identified in the HGE and LGE bulks by mapping their reads onto the Basmati 370 reference genome. The SNP index was then computed at each SNP position for the HGE and LGE bulks. At a given SNP position, the SNP index is the proportion of paired-end (PE) short reads aligned to that position that carry a nucleotide different from the reference sequence: a base identical to the reference is the reference (REF) base, a differing base is the alternative (ALT) base, and the SNP index is the number of ALT reads divided by the total number of reads at that position. Positions at which all short reads match the reference sequence were assigned an SNP index of 0. The ∆(SNP index) was calculated as ∆(SNP index) = SNP index (HGE bulk) − SNP index (LGE bulk). SNP positions with an SNP index < 0.3 in Bulk 1 (LGE), > 0.8 in Bulk 2 (HGE) and a read depth > 29 in both bulks were selected. Bulk 2 (HGE) had the greater influence when identifying peaks or valleys in the SNP index plot.
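The SNP index, ∆(SNP index) and selection thresholds described above can be written as a few small functions. This is a minimal sketch assuming biallelic sites with REF/ALT read counts already extracted per bulk (for example, from a VCF); it is not the QTL-seq pipeline's code, and the read counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleCounts:
    ref: int  # reads carrying the reference (Basmati 370) base
    alt: int  # reads carrying the alternative base

    @property
    def depth(self) -> int:
        return self.ref + self.alt


def snp_index(c: AlleleCounts) -> float:
    """Fraction of reads carrying the ALT base; 0 when every read matches the reference."""
    if c.depth == 0:
        raise ValueError("SNP index is undefined at zero depth")
    return c.alt / c.depth


def delta_snp_index(hge: AlleleCounts, lge: AlleleCounts) -> float:
    """Delta(SNP index) = SNP index (HGE bulk) - SNP index (LGE bulk)."""
    return snp_index(hge) - snp_index(lge)


def is_candidate(hge: AlleleCounts, lge: AlleleCounts, lge_max: float = 0.3,
                 hge_min: float = 0.8, min_depth_exclusive: int = 29) -> bool:
    """Thresholds from the text: LGE < 0.3, HGE > 0.8, read depth > 29 in both bulks."""
    # Depth is checked first, so snp_index is never called on a zero-depth site.
    return (hge.depth > min_depth_exclusive and lge.depth > min_depth_exclusive
            and snp_index(lge) < lge_max and snp_index(hge) > hge_min)


hge, lge = AlleleCounts(ref=3, alt=37), AlleleCounts(ref=30, alt=6)
print(round(delta_snp_index(hge, lge), 3), is_candidate(hge, lge))  # 0.758 True
```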
The average SNP index and ∆(SNP index) were calculated across each genomic interval using a sliding window approach with a 2-Mb window size and a 10-kb step, and were plotted to generate SNP index plots for all rice chromosomes. Averaging SNP indices within sliding windows (after taking the average of 10 SNP indices) was important to reduce noise in the plots. Candidate genomic regions for GE were determined from the sliding window plots. Only regions in which the average ∆(SNP index) was significantly greater than in the surrounding region, and windows with an average p value < 0.05, were considered. After a candidate genomic region was identified, the SNPs within it were retrieved from the SNP index p99 file (the SNP index output at the 99% probability level). SNPs meeting the thresholds (SNP index < 0.3 in LGE and > 0.8 in HGE) within the identified confidence interval were selected, and their positions were checked in the VCF file to obtain locus IDs. The effects of these SNPs were annotated using SnpEff 5.0 (Cingolani et al. 2012).
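The sliding-window averaging can be sketched as below, using the 2-Mb window and 10-kb step given above as defaults. The window handling (half-open windows, windows reported at their midpoint, an optional minimum SNP count) is an assumption of this sketch and may differ from the QTL-seq pipeline.

```python
from bisect import bisect_left
from statistics import mean


def sliding_window_mean(positions: list[int], values: list[float], chrom_length: int,
                        window: int = 2_000_000, step: int = 10_000,
                        min_snps: int = 1) -> list[tuple[int, float]]:
    """Average values in half-open windows [start, start + window) moved by `step`.

    `positions` must be sorted ascending and aligned with `values`.
    Returns (window midpoint, mean value) for windows with at least `min_snps` SNPs.
    """
    out = []
    last_start = max(chrom_length - window, 0)
    for start in range(0, last_start + 1, step):
        lo = bisect_left(positions, start)
        hi = bisect_left(positions, start + window)
        if hi - lo >= min_snps:
            out.append((start + window // 2, mean(values[lo:hi])))
    return out


# Toy example with a 10-bp window and 5-bp step
print(sliding_window_mean([0, 5, 10, 15], [0.0, 1.0, 1.0, 0.0], chrom_length=20, window=10, step=5))
# [(5, 0.5), (10, 1.0), (15, 0.5)]
```

Binary search on the sorted positions keeps each window lookup logarithmic, which matters when a 10-kb step produces thousands of overlapping windows per chromosome.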
2.8 SNP Genotyping and Marker-Trait Association Analysis
To study the effect of the identified candidate markers in the population, the F3 population (Basmati 370 × Pusa Basmati 1121) was genotyped with dCAPS markers. The identified SNPs were converted into cost-effective, gel-based CAPS and dCAPS markers using dCAPS Finder 2.0. dCAPS markers were designed at SNP positions on Chromosomes 1, 2 and 6. The candidate dCAPS markers were amplified in the parental genotypes (Basmati 370 and Pusa Basmati 1121) and in the HGE and LGE bulks. The PCR amplicons of each dCAPS marker were digested with the corresponding restriction enzyme, and the digested products were resolved by agarose gel electrophoresis (Genetix Biotech Asia Pvt. Ltd.). Marker-trait association was tested by single marker analysis in QTL IciMapping (Meng et al. 2015), using genotypic data for the three markers and phenotypic data for 80 F3 lines. The mean GE values used to constitute the pools were used in this analysis. The percentage of phenotypic variance explained by each QTL (R2) was estimated by simple regression analysis (Haley and Knott 1992). Whether a marker was associated with the trait of interest was determined from its p value.
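Whether a dCAPS amplicon is cut depends only on whether the allele it carries (together with any mismatch introduced by the dCAPS primer) completes the enzyme's recognition site. The sketch below checks this for the three enzymes named in Section 3.5 (SpeI, ACTAGT; RsaI, GTAC; HinfI, GANTC). The amplicon fragments are hypothetical, and the sketch does not design primers the way dCAPS Finder 2.0 does.

```python
import re

# Recognition sequences (all three are palindromic, so scanning one strand suffices)
SITES = {"SpeI": "ACTAGT", "RsaI": "GTAC", "HinfI": "GANTC"}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]"}


def site_positions(amplicon: str, enzyme: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) recognition site."""
    pattern = "".join(IUPAC[base] for base in SITES[enzyme])
    return [m.start() for m in re.finditer(f"(?=({pattern}))", amplicon.upper())]


def is_cut(amplicon: str, enzyme: str) -> bool:
    return bool(site_positions(amplicon, enzyme))


# Hypothetical fragments differing at one SNP (T vs C at index 5)
allele_1, allele_2 = "TTGAATCTT", "TTGAACCTT"
print(is_cut(allele_1, "HinfI"), is_cut(allele_2, "HinfI"))  # True False
```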
3 Results
3.1 Performance of GE-Related Traits in Parents and the Two Extreme Pools
To rapidly identify QTLs associated with GE by QTL-seq, we developed an F2 mapping population from a cross between Basmati 370 and Pusa Basmati 1121. Both parental lines are aromatic but differ in GE. Pusa Basmati 1121 was selected as the HGE parent and Basmati 370 as the LGE parent to generate a population segregating for GE. Basmati 370 has a low GE ratio (1.5) and an intermediate-low amylose content (AC, 21%), whereas Pusa Basmati 1121 has a high GE ratio (2.5) and an intermediate-high AC (24%). The average raw kernel length of Pusa Basmati 1121 was greater than that of Basmati 370 (9 mm vs. 7.7 mm), as was its average cooked grain length (22 mm vs. 11 mm). In the F2 progenies, raw kernel lengths ranged from 6.027 to 8.111 mm and cooked kernel lengths from 11.071 to 18.364 mm (Table S1). The GE ratios of the 200 F2 lines ranged from 1.67 to 2.53, and their frequency distribution was approximately normal (Gaussian), indicating a polygenic mode of inheritance. Grain length, grain breadth, length/breadth ratio, cooked kernel length and GE were evaluated with a digital vernier calliper (Mitutoyo, Japan), and the mean values with standard errors for the 200 F2 plants are presented in Table S1. Based on the phenotypic data from the F2 mapping population (Figure 1), two extreme bulks for GE were prepared from the average GE value of each plant and subjected to the QTL-seq pipeline for BSA.
Twenty plants from each of the two extreme groups of F2 lines (Basmati 370 × Pusa Basmati 1121), with HGE and LGE ratios respectively, were selected to generate the HGE and LGE bulks (Figure 2). The GE ratios of the individual F2 plants in each bulk and the bulk averages (1.765 for LGE and 2.415 for HGE) are given in Table 1.


Low GE bulk: GE ratio | Low GE bulk: F2 plant | High GE bulk: GE ratio | High GE bulk: F2 plant |
---|---|---|---|
1.672 | 77 | 2.361 | 82 |
1.731 | 47 | 2.364 | 69 |
1.746 | 18 | 2.365 | 11 |
1.748 | 84 | 2.374 | 37 |
1.762 | 149 | 2.374 | 101 |
1.766 | 147 | 2.378 | 38 |
1.770 | 130 | 2.382 | 94 |
1.770 | 54 | 2.394 | 22 |
1.772 | 126 | 2.395 | 30 |
1.773 | 106 | 2.409 | 61 |
1.774 | 164 | 2.410 | 52 |
1.776 | 153 | 2.412 | 25 |
1.779 | 127 | 2.415 | 80 |
1.780 | 120 | 2.432 | 43 |
1.781 | 132 | 2.432 | 26 |
1.781 | 136 | 2.436 | 20 |
1.782 | 112 | 2.462 | 65 |
1.784 | 118 | 2.469 | 70 |
1.784 | 119 | 2.484 | 59 |
1.784 | 137 | 2.554 | 29 |
1.765 | Average value | 2.415 | Average value |
3.2 Whole Genome Resequencing and Read Mapping
DNA samples of the HGE and LGE bulks and of the two parents were subjected to whole genome sequencing on an Illumina NovaSeq 6000, generating clean paired-end reads with a mean length of 159 to 161 bp (Table 2) and yielding approximately 36 Gb for the HGE bulk, 33 Gb for the LGE bulk, 9 Gb for Basmati 370 and 15 Gb for Pusa Basmati 1121. Low-quality reads were filtered out, retaining only reads in which at least 90% of bases had a Phred quality score of 30 or greater. Approximately 226, 207, 99 and 57 million paired-end reads were retained for the HGE bulk, LGE bulk, Pusa Basmati 1121 and Basmati 370, respectively, equivalent to 41×, 36×, 17× and 9× coverage of the rice genome (Table 3). The high-quality reads of Basmati 370 were used to generate the Basmati 370 reference sequence. Alignment of the Basmati 370 paired-end (PE) reads to the Nipponbare reference genome IRGSP-1.0 gave an average depth of 9.63× and 81.99% genome coverage, allowing us to prepare a reference-based assembly of Basmati 370. Mapping the PE reads of the HGE and LGE bulks to this Basmati 370 reference assembly gave average depths of 41.72× and 35.91× and coverage of 83.61% and 82.42%, respectively (Table 3). By aligning the high-quality reads of the two bulks to the Basmati 370 reference sequence, a total of 1,343,246 SNPs were identified (Table 4).
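The read-level quality criterion above (at least 90% of bases with a Phred score of 30 or greater) can be checked directly from FASTQ quality strings. A minimal sketch, assuming Phred+33 encoding (standard for current Illumina output); the filtering tool actually used is not stated, and how a pair is treated when only one mate fails is left open here.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def passes_q30_rule(quality: str, min_q: int = 30, min_fraction: float = 0.9) -> bool:
    """True if at least `min_fraction` of bases have a Phred score >= `min_q`."""
    scores = phred_scores(quality)
    if not scores:
        return False
    return sum(q >= min_q for q in scores) / len(scores) >= min_fraction


# 'I' = Q40, '?' = Q30, '5' = Q20 in Phred+33
print(passes_q30_rule("IIIIIIIII5"), passes_q30_rule("IIIIIIII55"))  # True False
```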
S. No. | Sample ID | Total reads | Total bases | Mean read length (bp) | Raw data (Gb) |
---|---|---|---|---|---|
1 | Pusa1121 | 98,843,196 | 15,716,068,164 | 159 | 15.7161 |
2 | Basmati370 | 57,038,990 | 9,069,199,410 | 159 | 9.0692 |
3 | HGE | 226,186,728 | 36,416,063,208 | 161 | 36.4161 |
4 | LGE | 207,873,854 | 33,467,690,494 | 161 | 33.4677 |
S. No. | Sample | Sequencing coverage (×) | Paired total | Unpaired total | % Aligned |
---|---|---|---|---|---|
1 | Pusa 1121 | 17.65 | 47,717,935 | 32,337,748 | 82.08 |
2 | Basmati 370 | 9.63 | 26,033,390 | 15,856,744 | 81.99 |
3 | HGE | 41.72 | 111,417,976 | 64,867,878 | 83.61 |
4 | LGE | 35.91 | 95,918,111 | 55,034,828 | 82.42 |
Chromosome | Length (bp) | Variants (SNPs) | Variant rate (bp per variant) |
---|---|---|---|
1 | 43,270,923 | 136,429 | 317 |
2 | 35,937,250 | 122,795 | 292 |
3 | 36,413,819 | 122,774 | 296 |
4 | 35,502,694 | 114,377 | 310 |
5 | 29,958,434 | 84,558 | 354 |
6 | 31,248,787 | 142,218 | 219 |
7 | 29,697,621 | 125,999 | 235 |
8 | 28,443,022 | 116,620 | 243 |
9 | 23,012,720 | 81,123 | 283 |
10 | 23,207,287 | 113,824 | 203 |
11 | 29,021,106 | 83,840 | 346 |
12 | 27,531,856 | 98,689 | 278 |
Total | 373,245,519 | 1,343,246 | 277 |
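The variant rate in Table 4 is the chromosome length divided by the number of SNPs, reported as a whole number of bases per variant. The short check below, using the Table 4 values, reproduces every listed rate with integer (floor) division; that the table used floor division is an inference from these numbers, not something the text states.

```python
# Table 4: chromosome -> (length in bp, number of SNPs, reported variant rate)
TABLE_4 = {
    1: (43_270_923, 136_429, 317), 2: (35_937_250, 122_795, 292),
    3: (36_413_819, 122_774, 296), 4: (35_502_694, 114_377, 310),
    5: (29_958_434, 84_558, 354), 6: (31_248_787, 142_218, 219),
    7: (29_697_621, 125_999, 235), 8: (28_443_022, 116_620, 243),
    9: (23_012_720, 81_123, 283), 10: (23_207_287, 113_824, 203),
    11: (29_021_106, 83_840, 346), 12: (27_531_856, 98_689, 278),
}


def variant_rate(length_bp: int, n_variants: int) -> int:
    """Average number of bases per variant, truncated to an integer."""
    return length_bp // n_variants


for chrom, (length, n_snps, reported) in TABLE_4.items():
    assert variant_rate(length, n_snps) == reported, chrom

total_length = sum(v[0] for v in TABLE_4.values())
total_snps = sum(v[1] for v in TABLE_4.values())
print(total_length, total_snps, variant_rate(total_length, total_snps))
# 373245519 1343246 277
```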
3.3 Candidate Genomic Region for GE
To determine candidate genomic regions associated with GE in basmati rice, the SNP-index was compared between the low and high bulks. At a given SNP position, the SNP-index is the proportion of PE short reads aligned to that position that carry a nucleotide different from the reference sequence, and it reflects the frequencies of the parental alleles in the bulked individuals. Because Basmati 370 was used as the reference genome, an SNP-index of 1 indicates that all reads in a bulk carry the Pusa Basmati 1121 (non-reference) allele, whereas an SNP-index of 0 indicates that all reads carry the reference (Basmati 370) allele. An SNP-index of 0.5 indicates equal contributions from both parents, and a significant deviation from 0.5 may indicate that the SNP contributes to the phenotypic difference between the bulks (Singh et al. 2016). SNP-index values were computed across the genome, and sliding window averages over 2-Mb intervals with 10-kb increments were plotted for the HGE and LGE bulks. To simplify the detection of differences between the bulks, ΔSNP-index values were plotted with statistical confidence intervals at p < 0.05 and p < 0.01 (Figure 3), and genomic positions significant at p < 0.05 were identified.
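Confidence intervals for the ΔSNP-index in QTL-seq are commonly obtained by simulation under the null hypothesis of no linkage (Takagi et al. 2013). The sketch below follows that idea in simplified form: at an unlinked SNP each F2 plant carries 0, 1 or 2 alternative alleles with probabilities 1/4, 1/2 and 1/4, the bulk allele frequency is sampled accordingly, reads are drawn at the given depth, and the quantiles of the simulated ΔSNP-index give two-sided p < 0.05 and p < 0.01 limits. The bulk size of 20 matches this study; the depth, number of replicates and seed are arbitrary choices, and the sketch omits any additional filters the QTL-seq pipeline may apply.

```python
import random


def _binomial(rng: random.Random, n: int, p: float) -> int:
    """Number of successes in n Bernoulli(p) trials (stdlib-only sampler)."""
    return sum(rng.random() < p for _ in range(n))


def _simulated_snp_index(rng: random.Random, bulk_size: int, depth: int) -> float:
    # Unlinked F2 locus: summing Binomial(2, 0.5) over plants gives Binomial(2n, 0.5)
    alt_alleles = _binomial(rng, 2 * bulk_size, 0.5)
    freq = alt_alleles / (2 * bulk_size)
    # Reads sampled from the pooled DNA at the given depth
    return _binomial(rng, depth, freq) / depth


def null_delta_ci(depth: int, bulk_size: int = 20, reps: int = 10_000,
                  seed: int = 0) -> dict[float, tuple[float, float]]:
    """Two-sided null intervals of the delta SNP-index at the 95% and 99% levels."""
    rng = random.Random(seed)
    deltas = sorted(
        _simulated_snp_index(rng, bulk_size, depth) - _simulated_snp_index(rng, bulk_size, depth)
        for _ in range(reps)
    )

    def q(p: float) -> float:  # nearest-rank quantile of the simulated deltas
        return deltas[min(reps - 1, int(p * reps))]

    return {0.95: (q(0.025), q(0.975)), 0.99: (q(0.005), q(0.995))}


ci = null_delta_ci(depth=40, reps=2_000, seed=1)
print(ci)
```

Because both sampling steps add variance, the interval narrows as read depth increases, which is why QTL-seq plots show depth-dependent confidence bands rather than a single fixed threshold.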

Based on the SNP-index plots of the HGE and LGE bulks and the ∆(SNP index) plots, we identified four candidate genomic regions for GE, one each on Chromosomes 1, 2, 3 and 6 (Table 5). In these regions, the SNP-index plots of the HGE and LGE bulks showed contrasting patterns (Figure 3): plants in the HGE bulk mainly carried Pusa Basmati 1121 genomic segments, whereas plants in the LGE bulk mainly carried the Basmati 370-type genome. Candidate regions were identified using an average SNP-index > 0.8 in the HGE bulk and < 0.3 in the LGE bulk, with read support of > 29 reads. In these regions, the ∆(SNP index) plots lay mainly above or near the statistical confidence intervals for this read depth (statistical significance under the null hypothesis: p < 0.05). These SNPs were located near starch biosynthetic genes, namely granule-bound starch synthase I (GBSS I, or Wx), starch synthases, starch branching enzymes (SBEs), starch debranching enzymes (DBEs), pullulanase (Pul) and ADP-glucose pyrophosphorylase (AGPase) (Tables S2 and S3).
Chromosome | Start (bp) | End (bp) | Interval (Mb) |
---|---|---|---|
1 | 13,298,129 | 20,662,043 | 7.36 |
2 | 122,730 | 966,598 | 0.84 |
3 | 32,102,495 | 35,245,799 | 3.14 |
6 | 21,969,348 | 29,054,306 | 7.08 |
3.4 SNP Annotation
SNPs were identified from the BAM files obtained in the previous step using SAMtools 1.0. Annotation of these SNPs with SnpEff 5.0 produced an annotated VCF file, an HTML file with summary statistics for the variants and their annotations, and a text file summarising the number of variant types per gene. The annotation predicted 1,343,246 variants (SNPs) across the genome, a variant rate of 1 variant every 277 bases (Table 4). The variants were classified by number of effects by impact, functional class, type and region (Table S4; Figure 4). Transitions occur more frequently than transversions; because they are less likely to cause amino acid substitutions, they are more likely to persist in populations as silent substitutions (SNPs). In total, 3,281,165 transitions and 1,472,635 transversions were found in the basmati rice genome, giving a transition/transversion (Ts/Tv) ratio of 2.2281 (Table 6).

Base | A | C | G | T |
---|---|---|---|---|
A | 0 | 52,452 | 212,899 | 60,101 |
C | 57,116 | 0 | 37,346 | 251,229 |
G | 251,7936 | 37,534 | 0 | 57,054 |
T | 60,603 | 213,529 | 51,590 | 0 |
- Note: Transitions: 3,281,165; transversions: 1,472,635; Ts/Tv ratio: 2.2281.
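The Ts/Tv ratio is the number of transitions (A↔G, C↔T) divided by the number of transversions (all other base changes). A minimal sketch that classifies REF→ALT substitution counts is given below; the toy counts are hypothetical, and the final line only checks the ratio reported in the note to Table 6.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv(counts: dict[tuple[str, str], int]) -> tuple[int, int, float]:
    """Return (transitions, transversions, Ts/Tv) from {(ref, alt): count}."""
    ts = tv = 0
    for (ref, alt), n in counts.items():
        if ref == alt:
            continue  # diagonal of a substitution matrix: not a change
        if (ref, alt) in TRANSITIONS:
            ts += n
        else:
            tv += n
    return ts, tv, (ts / tv if tv else float("inf"))


toy = {("A", "G"): 2, ("C", "T"): 2, ("A", "C"): 1, ("G", "T"): 1, ("A", "A"): 0}
print(ts_tv(toy))  # (4, 2, 2.0)
print(round(3_281_165 / 1_472_635, 4))  # 2.2281, as reported for Table 6
```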
3.5 Validation of the Identified Genomic Regions on Chromosomes 1, 2 and 6
To validate the candidate regions, we designed dCAPS markers for SNPs on Chromosomes 1, 2 and 6. Three dCAPS markers were developed: one based on a SNP (G/A) at position 14,257,147 on Chromosome 1 (LOC_ Os01g0354700), a second based on a SNP (C/A) at position 122,730 on Chromosome 2 (LOC_ Os02g0102300) and a third based on a SNP (A/C) at position 21,969,348 on Chromosome 6. These markers were used to genotype 80 individual plants in the HGE and LGE pools, and single marker analysis was performed to test the association of the markers with the GE phenotype. The Chromosome 1 marker showed no restriction pattern after digestion with SpeI, so only the genotypic data for the other two markers were used in the single marker analysis. The dCAPS markers on Chromosomes 2 and 6 revealed clear polymorphism between Basmati 370 and Pusa Basmati 1121 after digestion with RsaI and HinfI, respectively. The digested amplicons of the HGE and LGE bulks also showed patterns corresponding to those of the two parental lines (Basmati 370, the LGE parent, and Pusa Basmati 1121, the HGE parent) of the mapping population. In the single marker analysis, the marker on Chromosome 2 explained 11.93% of the phenotypic variation (PVE) with a LOD score of 4.0448, whereas the marker on Chromosome 6 showed no significant association with the phenotype because its LOD score (2.6105) was below 3 (Table 7).
Position | Chromosome | LOD | PVE (%) | Additive effect |
---|---|---|---|---|
122730 | 2 | 4.0448 | 11.93 | 0.102 |
21969348 | 6 | 2.6105 | 5.89 | 0.061 |
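Single marker analysis regresses the phenotype on the marker genotype. The sketch below codes genotypes as −1 (Basmati 370 homozygote), 0 (heterozygote) and 1 (Pusa Basmati 1121 homozygote), so that the slope estimates the additive effect; PVE is the regression R², and LOD = (n/2)·log10(RSS0/RSS1), which equals −(n/2)·log10(1 − R²). It is a textbook version under these assumptions with hypothetical data, not the QTL IciMapping implementation, and it is not expected to reproduce the values in Table 7 exactly.

```python
import math
from statistics import mean


def single_marker_regression(genotypes: list[int], phenotypes: list[float]) -> dict[str, float]:
    """Least-squares regression of phenotype on genotype code (-1, 0, 1).

    Returns the additive effect (slope), PVE (R^2, in %) and LOD score.
    """
    n = len(phenotypes)
    gx, py = mean(genotypes), mean(phenotypes)
    sxx = sum((g - gx) ** 2 for g in genotypes)
    sxy = sum((g - gx) * (p - py) for g, p in zip(genotypes, phenotypes))
    slope = sxy / sxx
    intercept = py - slope * gx
    rss0 = sum((p - py) ** 2 for p in phenotypes)  # null model: overall mean only
    rss1 = sum((p - (intercept + slope * g)) ** 2 for g, p in zip(genotypes, phenotypes))
    lod = math.inf if rss1 == 0 else (n / 2) * math.log10(rss0 / rss1)
    return {"additive": slope, "pve_percent": 100 * (1 - rss1 / rss0), "lod": lod}


# Hypothetical data: six F3 lines, two per genotype class
res = single_marker_regression([-1, -1, 0, 0, 1, 1], [1.7, 1.9, 2.0, 2.2, 2.3, 2.5])
print({k: round(v, 3) for k, v in res.items()})
# {'additive': 0.3, 'pve_percent': 85.714, 'lod': 2.535}
```

Under this sketch, a marker would be called significant when its LOD exceeds 3, the threshold used in the text.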
4 Discussion
Kernel elongation after cooking is an important characteristic of fine rice, as most consumers prefer grains that elongate lengthwise. Rice kernels absorb water during cooking and swell, increasing in length. A lengthwise increase without any increase in girth or any cracking of the kernel is considered the most desirable attribute of good-quality rice (Rajendran, Devi, and Prabhakaran 2021). Cooked GE is an essential component of the cooking quality of rice (Nirmaladevi et al. 2015) and is influenced by starch properties. The texture of cooked rice is also affected by attributes such as amylose content, gelatinization temperature, gel consistency and pasting viscosity (Balet et al. 2019). In the present study, cooked GE in basmati rice was measured as the GE ratio. The primary aim of this study was to use QTL-seq to detect QTLs for GE in an F2 population derived from a cross between Basmati 370 and Pusa Basmati 1121, which differ in GE. Pusa Basmati 1121 is a prominent basmati rice cultivar with a cooked kernel elongation of 2.5-fold, that is, a cooked length of up to 22 mm and a fourfold increase in volume (Zhou, Xia, and He 2020). In this study, the average GE ratio of Pusa Basmati 1121 was 2.44 with a cooked kernel length of 21 mm, whereas Basmati 370 had an average elongation ratio of 2.32 with a cooked kernel length of 16 to 18 mm. The GE ratios of the 200 F2 lines ranged from 1.673 to 2.53 and showed a frequency distribution close to normal, indicating a polygenic mode of inheritance.
QTL-seq and related approaches have previously proven useful for the rapid identification of trait-specific genomic regions in various crop plants, including rice, and many studies have demonstrated their use in other organisms, such as yeast. By replacing the laborious procedures of traditional QTL mapping, the approach enables rapid, accurate, high-resolution and high-throughput detection of marker-trait associations, making the analysis simpler (Takagi et al. 2013). Its most attractive feature is that it requires only an F2 generation to locate trait-specific QTLs, saving considerable time. In this study, four QTLs were identified, one each on Chromosomes 1, 2, 3 and 6. The SNP markers located within the identified candidate genomic regions can be used in further high-resolution mapping experiments. Functional annotation of the variants based on their genomic locations was performed using SnpEff. The output of this variant effect predictor reported common annotations for each variant, including its putative impact, type, functional class and sequence change. In these regions, the plants in the HGE bulk mainly carried Pusa Basmati 1121 genomic segments, whereas those in the LGE bulk mainly carried the Basmati 370-type genome. In Figure 4, the SNP-index plots of the ‘highest’ and ‘lowest’ bulks show similar patterns rather than mirror images, because the two parents used to form the bulks (Basmati 370 and Pusa Basmati 1121) are closely related basmati varieties. We deliberately used closely related parents rather than contrasting ones (e.g., basmati and non-basmati), which is why mirror images were not obtained.
SnpEff provides a simple evaluation of putative impact, or deleteriousness, by classifying variants as high, moderate, low or modifier. These predefined impact levels are based on the effect of the variant and help users prioritise the more significant variants. Variants (SNPs) with a high impact affect splice sites and start and stop codons. Variants with a moderate impact cause non-synonymous changes in coding regions, including changes at start and stop codons. Variants with a low impact are synonymous changes in coding regions, including synonymous changes at start and stop codons. Modifier variants lie in upstream, downstream and intergenic regions or in the 5′ and 3′ UTRs of a gene. The variants identified in this study will be useful for selecting candidate genes in specific target regions during map-based gene cloning with populations derived from crosses between elite Basmati varieties. Examining patterns among SNPs is a useful approach for deciphering the evolution of species at the genomic level, the genetic variation among individuals and the role of selection pressure in shaping that variation. In total, 1,343,246 SNPs were distributed across all 12 chromosomes, covering the entire genome. SNPs are currently the marker of choice because of their abundance throughout the genome in virtually all individuals of a population. They were the most common type of genetic variant in this study, with a variant rate of one variant every 277 bases. Identifying these markers is important for assessing functional significance and for genetic mapping and population genetics studies, and SNPs also contribute to phenotypic differences among rice plants. They can further help plant breeders simplify selection for important phenotypic traits and locate rare recombinants in large populations.
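SnpEff writes its impact classes into the `ANN` entry of the VCF `INFO` column. Each comma-separated annotation there is a `|`-delimited record whose third subfield is the putative impact (HIGH, MODERATE, LOW or MODIFIER). The sketch below tallies the most severe impact per variant and computes the average variant spacing behind a "variant rate" figure. The helper names are hypothetical, and the code assumes an uncompressed, SnpEff-annotated VCF. For the example call it also assumes the Nipponbare IRGSP-1.0 reference length (373,245,519 bp), which this section does not name. Under that assumption, 1,343,246 SNPs give roughly one variant every 277 bases.

```python
from collections import Counter

# SnpEff putative impact classes, ordered from most to least severe.
IMPACT_ORDER = ("HIGH", "MODERATE", "LOW", "MODIFIER")


def most_severe_impact(info: str):
    """Return the most severe impact listed in a VCF INFO string's ANN entry, or None."""
    for entry in info.split(";"):
        if not entry.startswith("ANN="):
            continue
        impacts = []
        # Each comma-separated annotation is a '|'-delimited record;
        # subfield index 2 is Annotation_Impact.
        for annotation in entry[len("ANN="):].split(","):
            fields = annotation.split("|")
            if len(fields) > 2 and fields[2] in IMPACT_ORDER:
                impacts.append(fields[2])
        if impacts:
            return min(impacts, key=IMPACT_ORDER.index)
    return None


def tally_impacts(vcf_lines):
    """Count variants by most severe impact, skipping headers and unannotated records."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue  # malformed record without an INFO column
        impact = most_severe_impact(columns[7])  # INFO is the 8th VCF column
        if impact is not None:
            counts[impact] += 1
    return counts


def variant_rate(genome_length_bp: int, n_variants: int) -> int:
    """Average spacing between variants: one variant every N bases (rounded down)."""
    if n_variants <= 0:
        raise ValueError("need at least one variant")
    return genome_length_bp // n_variants


if __name__ == "__main__":
    # Assumed reference length (IRGSP-1.0); the section does not state its reference genome.
    print(variant_rate(373_245_519, 1_343_246))  # → 277
    # Hypothetical usage on an annotated file:
    # with open("annotated.vcf") as fh:
    #     print(tally_impacts(fh))
```

Taking the most severe impact per variant avoids counting one SNP several times when it overlaps multiple transcripts. Whether the study's reported counts used this convention is not stated.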
All of the detected QTLs were located near previously identified starch biosynthetic genes or their isoforms. In the marker-trait analysis, a dCAPS marker developed from a SNP (C/A) at Position 122730 on Chromosome 2 was associated with the GE phenotype, with a LOD score of 4.0448, and explained 11.93% of the phenotypic variation. In contrast, another dCAPS marker developed from a SNP (A/C) at Position 21969348 on Chromosome 6 showed no strong association with the phenotype, as its LOD score was below 3. The marker with a significant association with the trait is therefore likely to be linked to a QTL governing GE, which supports its suitability for marker-assisted selection. The identified QTLs associated with GE in Basmati rice provide new genomic resources for this trait and can be used in rice breeding programmes to develop cultivars with increased GE, enhancing the export potential of Basmati rice from this region. Although these genes are plausible candidates because they are involved in starch biosynthesis, fine mapping is still required to determine the causal gene within each candidate genomic region.
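A marker-trait association of the kind reported for the Chromosome 2 dCAPS marker can be tested by single-marker analysis. The sketch below uses one common formulation, a one-way ANOVA on marker genotype classes; it is not necessarily the software or model used in this study. It compares the residual sum of squares without a marker effect (RSS0) with that after fitting genotype-class means (RSS1). From these it derives LOD = (n/2)·log10(RSS0/RSS1) and R² = 1 − RSS1/RSS0, the proportion of phenotypic variance explained. The function name, input format and example values are hypothetical.

```python
import math
from collections import defaultdict


def single_marker_lod(genotypes, phenotypes):
    """One-way ANOVA single-marker test; returns (LOD, R^2).

    RSS0: residual sum of squares around the overall mean (no marker effect).
    RSS1: residual sum of squares around each genotype-class mean (marker fitted).
    LOD = (n / 2) * log10(RSS0 / RSS1); R^2 = 1 - RSS1 / RSS0.
    """
    if len(genotypes) != len(phenotypes) or not phenotypes:
        raise ValueError("need one phenotype per genotype and at least one plant")
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss0 = sum((y - grand_mean) ** 2 for y in phenotypes)

    # Group phenotypes by marker genotype class (e.g. AA, AB, BB for a codominant dCAPS marker).
    groups = defaultdict(list)
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)
    rss1 = 0.0
    for values in groups.values():
        class_mean = sum(values) / len(values)
        rss1 += sum((y - class_mean) ** 2 for y in values)

    if rss0 == 0 or rss1 == 0:
        raise ValueError("degenerate data: no residual variation")
    lod = (n / 2) * math.log10(rss0 / rss1)
    r_squared = 1.0 - rss1 / rss0
    return lod, r_squared


if __name__ == "__main__":
    # Invented GE ratios for six F3 families; not data from this study.
    genos = ["AA", "AA", "AB", "AB", "BB", "BB"]
    ge = [2.30, 2.34, 2.38, 2.36, 2.45, 2.41]
    lod, r2 = single_marker_lod(genos, ge)
    print(f"LOD = {lod:.2f}, R^2 = {r2:.1%}")
```

Under this model the two statistics are linked by LOD = −(n/2)·log10(1 − R²), so for a fixed population size a marker that explains more variance also has a higher LOD. The conventional LOD 3 threshold used above corresponds to 1000:1 odds in favour of linkage.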
Cooking quality is a complex trait that needs to be targeted for improvement (Faruq, Hadjim, and Meisner 2004). Previous reports describe diverse genetic bases for these traits, apart from the commonly detected waxy (Wx) and alkali degeneration (Alk) loci on Chromosome 6 (Chen et al. 2020). The Wx locus encodes GBSSI, which is responsible for amylose synthesis in the rice grain (Vrinten and Nakamura 2000) and thereby underlies natural variation in amylose content, gel consistency and RVA pasting viscosity (Bao 2014). Wx/GBSSI and ALK/SSIIa account for most of the natural variation in cooking and eating quality attributes (Zhou, Xia, and He 2020). An association between GE and gelatinization temperature (GT) has been reported in earlier studies, in which GE was regarded as a physical phenomenon influenced by GT (Danbaba et al. 2011). In rice, GT is predominantly controlled by starch synthase IIa (SSIIa), located on Chromosome 6, which is a major QTL regulating thermal properties, GT and amylopectin structure (Sharma and Khanna 2019). SSIIa elongates short A and B1 chains (DP < 10) into longer B1 chains of amylopectin (Fujita et al. 2011). An earlier study by Govindaraj et al. (2009), based on 86 doubled haploid lines derived from IR64 and Azucena, which differ in GT but have similar amylose content, revealed one major QTL for GE near the SSIIa (Alk) locus. Arikit et al. (2019) reported similar results, characterising SSIIa, the gene responsible for GT, as a candidate gene in the qGE6.1 region. In the present study, a QTL for cooked GE was likewise detected on Chromosome 6.
In addition, the QTL on Chromosome 6 contained a gene annotated as encoding a hypothetical protein.
The high GE (HGE) of basmati rice may also be determined by the structural arrangement of starch molecules in the endosperm. The starch biosynthetic pathway in rice has been well characterised (Bao 2014). The physicochemical properties of starch are largely determined by its two fundamental components, amylose and amylopectin. In higher plants, six classes of enzymes are involved in amylose and amylopectin biosynthesis: ADP-Glc pyrophosphorylase (AGPase), granule-bound starch synthase (GBSS), soluble starch synthase (SS), starch branching enzyme (SBE), debranching enzyme (DBE) and starch phosphorylase (SP) (Irshad et al. 2021). A large number of genes encoding distinct isoforms of these enzymes, including starch synthases and SBEs, carry out starch biosynthesis in rice grains. Starch synthases use ADP-glucose to elongate glucan chains through α-1,4-glycosidic linkages and thus directly drive amylopectin biosynthesis (Miao et al. 2017). Genes encoding these enzymes or their isoforms were found within the QTLs detected on different chromosomes. Starch synthases comprise granule-bound starch synthase (GBSS), encoded by the Wx gene and involved in amylose biosynthesis, and the soluble starch synthase classes SSI, SSII, SSIII and SSIV, which are involved in amylopectin synthesis (Kordrostami, Mafakheri, and Chaleshtori 2021). Plants carry several SBE genes, and the different SBE isoforms control the structural and functional properties of starch. Amylose content is known to correlate positively with GE ratio, and QTLs at the Wx locus controlling cooked GE have also been reported. Nevertheless, the dCAPS marker on Chromosome 6 showed no significant association with the GE ratio in the F3 population derived from the cross between Basmati 370 and Pusa Basmati 1121.
This finding points to a role for amylopectin rather than amylose in GE in this type of population, since the dCAPS marker on Chromosome 2 showed a significant association with the GE ratio. The markers located within the identified candidate genomic regions can be further used for fine mapping and cloning experiments.
Author Contributions
R.K.S. designed the experiments. P.J. performed the experiments. P.J. and R.K.S. wrote the manuscript. All authors have read and approved the final manuscript.
Acknowledgements
We thank the Vice Chancellor and the School of Biotechnology, SKUAST-J, for providing financial assistance and laboratory facilities for this research.
Conflicts of Interest
The authors declare no conflicts of interest.
Open Research
Data Availability Statement
The datasets generated and/or analysed during the current study are available in the INSDC under accession number PRJEB60058 and in the Indian Nucleotide Data Archive (INDA; https://ibdc.rcb.res.in/inda/submittedStudyHome) under accession number INRP000051.