Funding: This work was supported by the Korea Institute of Planning and Evaluation for Technology in Food, Agriculture, and Forestry (IPET) through the Digital Breeding Transformation Technology Development Program funded by the Ministry of Agriculture, Food, and Rural Affairs (MAFRA) (RS-2022-IP322069).

ABSTRACT

Radish (Raphanus sativus), which belongs to the family Brassicaceae, has relatively limited genomic resources, especially for elite lines used in commercial breeding and other agricultural applications. Thus, this study aimed to provide a comprehensive catalogue of genome sequences for 100 elite radish lines used in the Korean industry for commercial breeding purposes. These lines were sequenced and mapped to the elite Bakdal genome. A total of 33,919 high-quality single nucleotide polymorphisms (SNPs) were identified and were found to be associated with eight distinct phenotypic traits. Five diverse machine learning (ML) models revealed that a subset of 198 SNPs had high predictive potential for the eight horticultural traits. Furthermore, the 100 elite lines were grouped into four clusters based on the eight traits, and their predictive potential was evaluated using the ML models trained using both individual and pooled SNPs. The accuracy ranged from 0.83 to 0.96 for the individually trained models and from 0.84 to 0.95 for the pooled models. This study provides a substantial basis for the advancement of digital/precision radish breeding.

1 Introduction

Radish (Raphanus sativus) is a valuable horticultural crop because of its nutritional content, multiple culinary uses and ability to thrive under various environmental conditions (Park et al. 2024; Al-Hamadany, Al-Jubouri, and Al-Shakarchy 2023; Raman 2022). Radish domestication is believed to have occurred in Asia prior to the Roman era, and its cultivation and consumption have since become widespread worldwide (Lewis-Jones, Thorpe, and Wallis 1982). The popularity of radish has further increased owing to the widespread consumption of kimchi, a taste- and health-promoting fermented food (Park et al. 2024). Radishes offer a wide range of flavours, textures and biochemical properties, making them versatile ingredients in various culinary traditions (Gamba et al. 2021; Cai et al. 2024). Given the significance of radishes in global vegetable production, understanding their genetic diversity is important. This information is essential for various aspects of plant breeding because radishes account for approximately 2% of the global vegetable production (Huh et al. 2024). Additionally, radish breeding programs face several challenges with the development of horticultural traits such as those associated with reasonable root shape, root length, root weight, flowering time, drought tolerance and soil adaptability (Huh et al. 2024; Kumar and Kaushik 2021). Furthermore, the primary challenges associated with radish breeding, particularly in the context of root and tuber crops, arise from the fact that these crops exhibit belowground traits that make observation and measurement difficult (Lebot 2018). In addition, the complex ploidy and irregular flowering patterns of these crops pose significant obstacles for genotype-based breeding. Considering these challenges, the development of innovative strategies for radish breeding that incorporate genome-wide molecular markers is urgently needed to improve radish crop development (Yang et al. 2020).

Challenges associated with radish breeding have prompted ongoing efforts to establish a superior reference genome for R. sativus. Over the past decade, these efforts have included the sequencing of wild radish and different cultivated radish genomes using short-read sequencing techniques, which produced highly fragmented assemblies compared with more recent genomes constructed using long-read sequencing technologies (Kitashiba et al. 2014; Moghe et al. 2014; Xu et al. 2023). Subsequently, a pan-genome for radishes of various wild types was constructed. The size of the radish genome ranged from 418 to 515 Mb, with nine chromosomes and a completeness score of approximately 95% (Zhang et al. 2021). Furthermore, quantitative trait loci (QTLs) for different agronomic traits can be identified using variant capture methods that provide a reduced representation of the radish genome, such as Random Amplicon Sequencing-Direct (Ezeah et al. 2023), Kompetitive Allele-Specific PCR (Xing et al. 2024) and genotyping-by-sequencing (Kobayashi et al. 2020). To increase the coverage of genome-wide markers, a few further studies have also examined 100 agronomically important radish traits and root colour horticultural traits worldwide (Huh et al. 2024; Xu et al. 2023; Kim et al. 2024). Despite these extensive efforts, the genetic resources of elite radish lines have so far been overlooked, except for the genome of the Korean Bakdal elite line (Park et al. 2024). Elite lines are valuable assets in commercial plant breeding programs and represent the culmination of selective breeding efforts to enhance crop performance. However, it is important to balance the use of elite lines with strategies to conserve genetic diversity and ensure the resilience and adaptability of agricultural systems (Sanchez et al. 2023). Furthermore, identifying unique genomic loci in elite lines, such as genes associated with desirable agronomic traits, is crucial for mapping QTLs that are significantly associated with phenotypes (Lv et al. 2017). In addition, the assessment of genetic gains in breeding programs typically focuses on polygenic traits, which are influenced by multiple genes, rather than on oligogenic traits, which are primarily controlled by a small number of genes (Epstein et al. 2023). Breeders therefore need detailed genetic insights into radish diversity and the genomic loci associated with oligogenic or polygenic traits (Epstein et al. 2023).

The emergence of cost-effective genotyping technologies, made possible by massive advancements in next-generation sequencing, has created a big data environment for plant breeding (van Dijk, Shiu, and de Ridder 2022). Concurrently, advancements in machine learning have opened new avenues for improving breeding programs by associating polygenic markers with multiple agronomic attributes and enabling phenotypic predictions from genotypes (van Dijk, Shiu, and de Ridder 2022). These advancements have shifted the scientific community from time-consuming and resource-intensive breeding methods to advanced genomic marker-assisted breeding methods, paving the way for precision/digital farming and the use of empirical genome-wide genetic markers for a wide array of traits. The radish research community has obtained genetic insights mostly through linear regression methods, which are highly effective for oligogenic traits but not for polygenic traits, which are nonlinear in nature (Kumar and Kaushik 2021). Machine learning models, such as regression and classification models and deep learning approaches, along with statistical methods such as genomic best linear unbiased prediction (GBLUP) and ridge regression best linear unbiased prediction (rrBLUP), are increasingly used to determine the association between genotypes and phenotypes in crop/plant breeding (Tong and Nikoloski 2021). These machine learning models are used to predict complex polygenic traits and show promise in improving prediction accuracy (Luo and Gu 2020; Danilevicz et al. 2022; Sandhu et al. 2021; Niazian and Niedbała 2020; Yang et al. 2013). Although GBLUP has been used effectively, it might not capture nonlinear relationships as well as machine learning algorithms do (Tong and Nikoloski 2021; Danilevicz et al. 2022). Machine learning/deep learning methods have demonstrated promise in improving the prediction accuracy for complex traits, whereas statistical methods such as rrBLUP and GBLUP are used to predict quantitative traits in various crop/plant breeding programs (Tong and Nikoloski 2021). Machine learning models offer improved prediction accuracy when associating SNPs spread across long genomic distances and can handle the complexity of polygenic traits, which is essential for the development of improved radish varieties (Luo and Gu 2020; Danilevicz et al. 2022; Sandhu et al. 2021). For instance, the application of machine learning algorithms in genome-wide association studies (GWAS) may improve the efficiency of genomics-assisted breeding programs through the identification of QTLs and marker-trait associations that are important for various agronomic traits (Tong and Nikoloski 2021; Yoosefzadeh-Najafabadi et al. 2022).

As described above, machine learning methods can be effectively applied in GWAS to improve radish breeding programs. In this study, we sequenced 100 elite lines commonly used in Korean industries for backcrossing, with the ultimate goal of creating a desired radish variety for a wide range of applications. The sequenced genomes were genotyped using the reference Bakdal, an elite line, and the genotypes were mapped to eight traits and assessed for their predictive potential for phenotypes using five machine-learning models.

2 Materials and Methods

2.1 Plant Sampling

A total of 100 elite lines sourced from Dasan Bio (a South Korea-based seed company) were used. These lines were obtained directly from breeding procedures. The samples were collected on 27 August 2023 and sown in an open field covered with green plastic mulch that had holes spaced 25 cm apart in a grid-like pattern. Three seeds were planted in each hole. Fifteen days later, the plants were thinned so that only one healthy plant remained in each hole. Ten days after thinning, young leaves 1.5 cm in size were carefully harvested, and DNA was extracted using the cetyltrimethylammonium bromide method.

2.2 DNA Sequencing and Variant Calling

Total DNA was isolated from each sample individually according to standard sequencing protocols. Sequencing libraries were prepared using a TruSeq Nano DNA Prep Kit for Illumina sequencing. Each sample was sequenced on the short-read NovaSeq 6000 platform (Illumina, CA, USA). Sequencing was performed by DNALink, an authorised service provider in South Korea. Illumina paired-end sequences were subjected to quality and adapter trimming using BBDuk (v28.26). The processed reads were mapped to the recently sequenced radish elite parental line (Bakdal), which served as the reference genome (Park et al. 2024), using Bowtie2 (v.2.2.5) (Langmead and Salzberg 2012). Variant calling was performed with HaplotypeCaller in the Genome Analysis Toolkit (GATK; v4.2.0.0) (McKenna et al. 2010), and the SNPs were annotated using SnpEff (v.4.2) (Cingolani et al. 2012).

2.3 Selection of Trait-Associated SNPs

SNPs were selected using parametric filters in stages. Initially, GATK variant call parameters, specifically a normalised quality score ≥ 2 and mapping quality ≥ 40, were applied. Subsequently, high-quality SNPs were selected using PLINK v1.9 (Purcell et al. 2007) with the following criteria: (1) bi-allelic sites, (2) genotyping rate of the samples at each variable site ≥ 90%, (3) minor allele frequency (MAF) > 5%, and (4) Hardy–Weinberg equilibrium (HWE) < 0.001. The selected high-quality SNPs were subjected to population stratification analysis using the STRUCTURE algorithm, with a K range of 1–7 and 10,000 iterations (Pritchard, Stephens, and Donnelly 2000). Furthermore, for each phenotypic trait, the 100 elite lines were divided into two categories (high and low) based on their trait values: the top 30% were classified as the high group (case) and the bottom 30% as the low group (control), and the two groups were subjected to the PLINK association test as a case–control model. Significantly associated SNPs were selected based on a P-value < 0.01. This process was repeated independently for all eight traits. Finally, the individual (per-trait) and pooled subsets of high-quality SNPs were used as features for machine learning to assess their predictive potential.
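The high/low grouping and the case–control test described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names (`split_high_low`, `allelic_chi2`), the line names and the allele counts are hypothetical, and the test shown is a plain 1-df chi-square on allele counts, which is the kind of statistic a basic case–control association test reports. Tie handling and the exact group sizes used in the study may differ.

```python
import math


def split_high_low(phenotypes, fraction=0.30):
    """Return (high, low) sets: the top and bottom `fraction` of lines by trait value.

    `phenotypes` maps a line name to its trait value; lines in the middle are excluded.
    """
    ranked = sorted(phenotypes, key=phenotypes.get)  # ascending trait value
    n = int(round(len(ranked) * fraction))
    if n == 0:
        return set(), set()
    return set(ranked[-n:]), set(ranked[:n])


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """1-df chi-square test on a 2x2 table of allele counts; returns (chi2, p_value)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    rows = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    cols = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    total = sum(rows)
    if 0 in rows or 0 in cols:
        return 0.0, 1.0  # monomorphic site or empty group: no evidence of association
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical data: 100 lines with made-up trait values.
phenos = {f"line{i:03d}": float(i) for i in range(100)}
high, low = split_high_low(phenos)  # 30 case lines, 30 control lines

# Hypothetical allele counts for one SNP (30 diploid lines = 60 alleles per group).
chi2, p_value = allelic_chi2(case_alt=45, case_ref=15, ctrl_alt=20, ctrl_ref=40)
is_selected = p_value < 0.01  # threshold taken from the text
```

Running the same test once per SNP and per trait, and keeping the SNPs that pass the threshold, would yield one per-trait SNP set for each of the eight traits.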

2.4 Construction and Validation of Machine Learning Models

Five supervised machine learning algorithms were used to determine the efficacy of the selected SNPs: support vector machine (SVM), k-nearest neighbour (k-NN), random forest (RF), C5.0 decision tree (C5.0) and partial least squares (PLS). Each dataset was divided into training and validation datasets in a 7:3 ratio for the prediction models. The accuracy of the five models was evaluated using the ‘caret’ package to select the optimal model (Kuhn 2008). To evaluate the prediction methods, sensitivity, specificity and accuracy were calculated using the following equations: sensitivity = true positives/(true positives + false negatives); specificity = true negatives/(true negatives + false positives); and accuracy = (true positives + true negatives)/(true positives + false negatives + true negatives + false positives). The performance of the prediction models was assessed using receiver operating characteristic (ROC) curves, which plot sensitivity as a function of (1 − specificity) for different decision thresholds. To further compare the ROC curves quantitatively, the area under the curve (AUC) was computed, and significant differences between two ROC curves were assessed using a two-tailed Student's t-test. Evaluation metrics were calculated as described by Kang et al. (2019). The ‘plotROC’ package (Sachs 2017) was used to plot the ROC curves and calculate the AUC.
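The evaluation metrics defined above can be illustrated with a short sketch. It is written in Python for illustration only, whereas the study used the R packages named in the text. All function names, labels and scores below are hypothetical, and the AUC is computed with the rank-based (Mann-Whitney) formulation, which is a swapped-in technique rather than the procedure of the ‘plotROC’ package.

```python
import random


def split_train_validation(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples and split them into training and validation sets (7:3 by default)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred, positive="high"):
    """Sensitivity, specificity and accuracy, following the equations given in the text."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def roc_auc(y_true, scores, positive="high"):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute an AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 62 sample names split 7:3.
train, validation = split_train_validation([f"line{i}" for i in range(62)])

# Hypothetical validation labels: 9 'high' and 10 'low', with two misclassifications.
y_true = ["high"] * 9 + ["low"] * 10
y_pred = ["high"] * 8 + ["low"] + ["low"] * 9 + ["high"]
metrics = classification_metrics(y_true, y_pred)

auc = roc_auc(["high", "high", "low", "low"], [0.9, 0.4, 0.5, 0.1])
```

In this toy case the split gives 43 training and 19 validation samples, and the two misclassifications give a sensitivity of 8/9, a specificity of 9/10 and an accuracy of 17/19.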

3 Results

3.1 Genome Sequencing and High-Quality SNPs

The sequencing-to-variant selection process is shown in Figure S1 and Table S1. Short-read sequencing of each of the 100 elite radish lines yielded approximately 40× coverage, corresponding to approximately 17.4 Gb of sequencing data (Figure S2A). Of these data, 98.8% were processed for sequencing artefacts as outlined in the Methods section, and 93% of the processed sequences were mapped to the reference genome. The mapped sequences covered 90% of the genomic region, and on average, 97.2% of the genes were also covered. Among these, 63% of the genic loci passed the variant call protocol with mapped bases (Figures S2B and S2C). A total of 337 Mb of genomic regions were identified during the initial genotyping, of which only 1 Mb comprised high-quality SNP regions that passed the quality filter. Subsequently, 33,919 high-quality SNPs were selected for downstream analysis, as illustrated in the workflow (Figure S1). Additionally, the STRUCTURE methodology was employed to evaluate the genetic population stratification of the 100 Korean elite lines, resulting in the identification of three sub-populations, which exhibited phenotypic patterns corresponding to high and low horticultural trait values (Figure 1).
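As a quick plausibility check on the figures above, sequencing depth relates to data volume as depth = total bases / genome size. The sketch below assumes that the 17.4 Gb figure refers to a single line (the text does not state this unambiguously); the function name is hypothetical.

```python
def implied_genome_size_mb(total_bases_gb, depth):
    """Genome size (Mb) implied by a data volume (Gb) sequenced at a given depth."""
    return total_bases_gb * 1000.0 / depth


# 17.4 Gb at roughly 40x coverage implies a genome of about 435 Mb.
size_mb = implied_genome_size_mb(17.4, 40)
```

Under that assumption, the implied size falls within the 418 to 515 Mb range cited for radish genomes in the Introduction.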

FIGURE 1. Assessment of 100 Korean elite lines using the STRUCTURE methodology identified three distinct subgroups. The phenotype values, which were classified into high and low, were also mapped and were associated with genetic population structure.

3.2 Selected SNPs and Their Trait Associations

As illustrated in Figure 2A,B, the 100 elite lines were categorised into four distinct groups (clusters 1 to 4) based on the eight traits. The observed and calculated phenotype values (Table S1) were subjected to the neighbour-joining method, and the resulting dendrogram was plotted with the iTOL visualisation tool. Clusters 1 and 2 primarily comprised oval-shaped roots of two to three different colours, whereas clusters 3 and 4 comprised longer barrel-shaped roots with an increased number of leaflets. These clusters were established based on a phylogenetic tree constructed using the phenotypic values derived from all samples. Similarly, each phenotypic value was classified as high or low based on the trait values (Figure 3A). The weight and length of the root were highly correlated. Whole body weight was derived from the weights of the roots and leaves, which were also highly correlated (Figure 3B). To identify the SNP set, a subset of high-quality SNPs that were strongly associated with the eight horticultural traits was selected from the GWAS and principal component analysis (PCA) assessments. Manhattan plot P-values (Figure S3) were used to identify SNPs that demonstrated a high correlation with each trait. PCA was performed to better describe the overall diversity of the samples (Figure S4A). A total of 198 SNPs were chosen from all eight traits: root length (37 SNPs), leaf length (31 SNPs), leaf number (33 SNPs), leaflet number (25 SNPs), root weight (36 SNPs), leaf weight (21 SNPs), whole body weight (40 SNPs) and flowering time (23 SNPs). Interestingly, the PCA results showed a distinct separation between the high and low categories using all 198 SNPs (Figure S4B). The combinations of SNPs are depicted in a heatmap, which revealed that 163 of the 198 SNPs were associated with a single trait, 25 were associated with two traits, nine were associated with three traits, and the RASAT00001:9289382 SNP was associated with six traits (Figure S5 and Table S2). Subsequently, the selected SNP sets were subjected to machine learning-based feature selection and evaluation of their predictive capacity, first as individual trait SNP sets and then as a combined (pooled) set, using five separate machine learning models for classification tasks.
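The per-trait counts listed above sum to more than 198 because a SNP shared by several traits is counted once per trait. The following sketch tallies how many traits each SNP is associated with and checks that the reported numbers are mutually consistent. The helper names and the toy SNP identifiers are hypothetical; the counts themselves are taken from the text.

```python
from collections import Counter


def traits_per_snp(trait_to_snps):
    """Count, for each SNP, how many traits list it as associated."""
    counts = Counter()
    for snps in trait_to_snps.values():
        counts.update(set(snps))
    return counts


def sharing_histogram(trait_to_snps):
    """Map 'number of traits' to 'number of SNPs associated with that many traits'."""
    return Counter(traits_per_snp(trait_to_snps).values())


# Toy example with made-up SNP identifiers: snpB is shared by three traits.
toy = {
    "root length": {"snpA", "snpB"},
    "root weight": {"snpB", "snpC"},
    "flowering time": {"snpB"},
}
toy_hist = sharing_histogram(toy)  # two single-trait SNPs, one three-trait SNP

# Counts reported in the text.
reported_per_trait = {
    "root length": 37, "leaf length": 31, "leaf number": 33, "leaflet number": 25,
    "root weight": 36, "leaf weight": 21, "whole body weight": 40, "flowering time": 23,
}
reported_sharing = {1: 163, 2: 25, 3: 9, 6: 1}  # traits per SNP -> number of SNPs

unique_snps = sum(reported_sharing.values())
associations_by_trait = sum(reported_per_trait.values())
associations_by_sharing = sum(k * v for k, v in reported_sharing.items())
```

Both ways of counting give 246 SNP-trait associations across 198 unique SNPs, so the per-trait counts and the sharing pattern agree.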

3.3 Machine Learning-Based Classification of SNPs for the Eight Traits

The eight traits depicted in Figures 2A and 3A were used to classify the data into training and validation datasets (Table 1), which consisted of 42–58 genomes for training and testing purposes. Additionally, a separate external validation dataset consisting of 18–25 genomes was used (Table 1). Five distinct machine learning models, namely, SVM, k-NN, RF, C5.0 and PLS, were used to assess the SNP sets, both individually and collectively, to evaluate their predictive potential for traits. The results are presented in Table 1. Among the five machine learning models, SVM was the best-performing model for six traits across the individual and pooled SNP sets. Additionally, the pooled SNPs exhibited a stronger predictive capability for the traits (Figure 4) than individual SNPs (Figure 5). This is evidenced by the PCA plot, which clearly shows the classification potential of both the pooled (Figure 6) and individual SNPs (Figure 7) for the eight traits. Various markers were linked to the following biological processes, which are related to crop development: DNA replication and methylation (factor of DNA methylation 4, replication protein A 70 kDa DNA-binding subunit, centromere protein C, regulator of telomere elongation helicase 1, and ATP-dependent DNA helicase Q-like 4A), RNA degradation (CCR4-NOT transcription complex subunit and IAA-amino acid hydrolase ILR1-like 2), plant hormone signal transduction (abscisic acid receptor, ETHYLENE INSENSITIVE 3-like 4 protein and DELLA protein RGL1), plant-pathogen interaction (receptor-like protein EIX2, patellin-3 and defensin-like protein 1), carbon fixation in photosynthesis (NADP-dependent malic enzyme 2), inositol phosphate metabolism (phosphoinositide phospholipase C7 and phosphatidylinositol 4-phosphate 5-kinase 7), biosynthesis of secondary metabolites (caffeoylshikimate esterase, isopentenyl phosphate kinase, amyloplastic, regulator of telomere elongation helicase 1 homologue and transcription factor TFIIIB component B′) and organelle genes (pentatricopeptide repeat-containing protein). These markers are essential for understanding crucial factors that contribute to crop growth and development (Table S2).

TABLE 1. Evaluation metrics of external datasets for machine learning models are summarized for the best model derived from individual and pooled SNPs.

S. no.	Trait	Individual SNPs				Pooled SNPs				Dataset	
		ML model	Accuracy	Sp	Sn	ML model	Accuracy	Sp	Sn	Training (high/low)	Validation
1	Length of root	k-NN	0.89	0.9	0.88	C5.0	0.89	0.9	0.88	43 (22/21)	19
2	Length of leaf	SVM	0.84	0.8	0.89	C5.0	0.79	0.6	1	42 (21/21)	19
3	Number of leaves	C5.0	0.89	0.83	1	k-NN	0.95	0.92	1	42 (24/18)	19
4	Number of leaflets	SVM	0.89	1	0.78	RF	0.95	1	0.89	44 (23/21)	19
5	Weight of root	RF	0.84	0.9	0.78	SVM	0.84	0.9	0.78	43 (23/20)	19
6	Weight of leaf	SVM	0.9	0.92	0.88	C5.0	0.9	1	0.75	44 (22/22)	20
7	Weight of whole body	RF	0.83	0.78	0.89	SVM	0.94	0.89	1	42 (21/21)	18
8	Flowering time	k-NN	0.96	1	0.67	SVM	0.92	0.67	0.95	58 (18/40)	25

Abbreviations: Sn, sensitivity; Sp, specificity.

4 Discussion

Genome sequencing is a critical resource for plant breeding because it provides a comprehensive understanding of the genetic makeup of plants, which is essential for identifying desirable traits and accelerating breeding (Henry 2022). The availability of genome sequences allows breeders to connect phenotypic traits with their underlying genotypes, thereby facilitating the selection of improved cultivars (Poland and Rife 2012). Radish (R. sativus L.) is a member of the Brassicaceae family, which includes Arabidopsis thaliana and Brassica species that have been the subject of advanced genetic and genomic studies. Similarly, various efforts have been made to develop genetic resources for radishes, which involve the construction of meta-genomes and high-density genetic maps and the identification of molecular markers associated with traits such as glucosinolate (GL) content and root colour (Huh et al. 2024; Xing et al. 2024; Yi et al. 2016; Kim et al. 2021; Masukawa et al. 2019; Shirasawa and Kitashiba 2017). Moreover, germplasm resources for radishes have been widely sequenced, such as the 100 radish varieties that are currently being cultivated (Huh et al. 2024). Although these advancements have facilitated the identification of genes for important agronomic traits and are expected to improve radish breeding programs, radish remains a less-studied crop. The underground growth of root and tuber crops, such as radishes, onions, carrots, potatoes, sweet potatoes and yams, complicates the phenotyping of desirable traits, presenting a significant obstacle in plant breeding (Paez-Garcia et al. 2015; Divya, Thangaraj, and Krishna Radhika 2024). An alternative method for accelerating the breeding process in such scenarios is marker-assisted breeding, which can be achieved using genome sequencing. For cereals, genomic resources have been extensively developed, and genomics-assisted breeding (GAB) techniques such as marker-assisted selection and genomic selection are being applied to traits such as drought tolerance and disease resistance (Thudi et al. 2014; Singh et al. 2017). These approaches are also being applied to other crops such as tomato (Tiwari et al. 2022) and millets (Satyavathi et al. 2019), indicating a broader trend towards the integration of genomic information into breeding practices. However, the genomic resources available for root and tuber crops are currently limited. Therefore, the genomic resources we developed for the elite lines and the markers associated with the eight traits (Figures 4 and 6) could serve as important resources for radish and other root crops and could aid in uncovering various challenging genetic factors associated with root/tuber crops (Mun et al. 2015; Kumar et al. 2012).

Large-scale phenotypic and genotypic datasets have been increasingly integrated with machine learning models, facilitating predictive breeding and crop development with improved yield, resilience and quality (Bose et al. 2024; Yu et al. 2021). Similarly, we used five machine learning algorithms that are often used in predicting the functions of biomolecules to predict the association of the identified SNPs with the eight traits (Figures 4 and 5) (Yu et al. 2021; Noh et al. 2023; Malik et al. 2022). This methodology is advantageous for discerning linkages between polygenic traits and genetic improvement during backcrossing. The results of the present study are consistent with those of several studies conducted on other crops. For example, the identification of meiotic crossover genes is crucial for various genetic gains (Epstein et al. 2023). Similarly, this study identified genes involved in DNA replication and methylation, including factor of DNA methylation 4, replication protein A 70 kDa DNA-binding subunit, centromere protein C, regulator of telomere elongation helicase 1 and ATP-dependent DNA helicase Q-like 4A. Furthermore, genes with other biological functions, such as plant hormone signal transduction and plant-pathogen interactions, could serve as valuable genetic markers for the development of disease resistance in radish breeding programs. This study generated valuable genetic resources and identified significant SNP markers that may serve as a foundation for various genome-assisted breeding applications. To supplement conventional phenotypic methods, we incorporated genome-wide SNPs and identified their associations with corresponding traits by selecting specific SNP subsets using machine learning techniques. The carefully selected set of SNPs, which could serve as effective variables in machine learning methods, was highly capable of classifying genotypic patterns by phenotype and was also consistent with the genotype subpopulation assessment (Figure 1), supporting the broad applicability of the machine learning methodology to radish research and other breeding applications.

Although machine learning models have shown efficiency in GWAS for crop breeding by identifying relevant genetic markers associated with important traits and show promise in handling complex, high-dimensional data and capturing nonlinear relationships, they also face challenges such as the need for large, high-quality datasets and the interpretation of model outputs (Danilevicz et al. 2022). The high dimensionality of data can impede the scalability and generalisation of machine learning algorithms. For instance, the genomic BLUP method performs well in the presence of a population structure, suggesting that machine learning methods require refinement to incorporate such information (Danilevicz et al. 2022; Sandhu et al. 2021; Yang et al. 2013; Grinberg, Orhobor, and King 2020). With these cautions, we utilised GWAS to enhance radish genetic resources.

Author Contributions

Myunghee Jung, Yu-Jin Lim, Sunghyun Cho, and Younhee Shin performed genome mapping, variant analysis and machine learning modelling. Younhee Shin, Sathiyamoorthy Subramaniyam and Han Yong Park drafted the manuscript. Han Yong Park and Byeong Jun Park performed the sampling and sequencing. Han Yong Park, Byeong Jun Park and Younhee Shin funded and modelled the study.

Conflicts of Interest

The authors declare no conflicts of interest.

Open Research

Data Availability Statement

The complete sequences generated in this study have been deposited in the Sequence Read Archive repository under the accession number PRJNA1173361.

Supporting Information

pbr13250-sup-0001-Table_S1.xlsx (Excel 2007 spreadsheet, 27.2 KB)

Table S1. Comprehensive summary of the genetic analysis of 100 elite radish lines including sequencing, mapping, coverage and phenotypic assessments.

pbr13250-sup-0002-Table_S2.xlsx (Excel 2007 spreadsheet, 38.7 KB)

Table S2. Comprehensive overview of the selected SNPs and their genomic locations and gene functional annotations.

pbr13250-sup-0003-Figures.docx (Word 2007 document, 3.1 MB)

Figure S1. Complete variant calling protocol used in this research, converting fastq files to high-quality SNPs and with specific software and metrics employed along with established cut-offs.

Figure S2. Assessment of the mapping and coverage of the sequenced reads from the 100 elite lines mapped to the Raphanus sativus Bakdal reference genome, including the quantity of sequence artefacts processed (A), the coverage of genic regions (B), and the coverage of genes (C).

Figure S3. Individual Manhattan plots of -Log10 (P) versus chromosomal position of SNP markers associated with all eight traits.

Figure S4. PCA evaluation of the obtained high-quality SNPs (A) and the pooled trait-associated SNPs chosen from the Manhattan plots of -Log10 (P) versus chromosomal position of SNP markers associated with all eight traits (B).

Figure S5. Heat map representation of the 198 SNPs shows the detailed genotypes present in the reference genome and the called bases.


References

Al-Hamadany, S. Y. H., A. A. H. Al-Jubouri, and W. Y. R. Al-Shakarchy. 2023. “Variability and Expectant Genetic Advance for Yield and Its Components in Radish (Raphanus sativus L.).” In IOP Conference Series: Earth and Environmental Science, vol. 1213, 012018. Bristol: IOP Publishing.
10.1088/1755-1315/1213/1/012018
Bose, S., S. Banerjee, S. Kumar, A. Saha, D. Nandy, and S. Hazra. 2024. “Review of Applications of Artificial Intelligence (AI) Methods in Crop Research.” Journal of Applied Genetics 65: 225–240.
10.1007/s13353-023-00826-z
Cai, X., K. Zhu, W. Li, et al. 2024. “Characterization of Flavor and Taste Profile of Different Radish (Raphanus sativus L.) Varieties by Headspace-Gas Chromatography-Ion Mobility Spectrometry (GC/IMS) and E-Nose/Tongue.” Food Chemistry 22: 101419.
Cingolani, P., A. Platts, L. Wang le, et al. 2012. “A Program for Annotating and Predicting the Effects of Single Nucleotide Polymorphisms, SnpEff: SNPs in the Genome of Drosophila melanogaster Strain w1118; iso-2; iso-3.” Fly (Austin) 6: 80–92.
10.4161/fly.19695
Danilevicz, M. F., M. Gill, R. Anderson, et al. 2022. “Plant Genotype to Phenotype Prediction Using Machine Learning.” Frontiers in Genetics 13: 822173.
10.3389/fgene.2022.822173
Divya, K., M. Thangaraj, and N. Krishna Radhika. 2024. “CRISPR/Cas9: An Advanced Platform for Root and Tuber Crops Improvement.” Frontiers in Genome Editing 5: 1242510.
10.3389/fgeed.2023.1242510
Epstein, R., N. Sajai, M. Zelkowski, A. Zhou, K. R. Robbins, and W. P. Pawlowski. 2023. “Exploring Impact of Recombination Landscapes on Breeding Outcomes.” Proceedings of the National Academy of Sciences 120: e2205785119.
10.1073/pnas.2205785119
Ezeah, C. S. A., J. Shimazu, T. Kawanabe, et al. 2023. “Quantitative Trait Locus (QTL) Analysis and Fine-Mapping for Fusarium oxysporum Disease Resistance in Raphanus sativus Using GRAS-Di Technology.” Breeding Science 73: 421–434.
10.1270/jsbbs.23032
Gamba, M., E. Asllanaj, P. F. Raguindin, et al. 2021. “Nutritional and Phytochemical Characterization of Radish (Raphanus sativus): A Systematic Review.” Trends in Food Science & Technology 113: 205–218.
10.1016/j.tifs.2021.04.045
Grinberg, N. F., O. I. Orhobor, and R. D. King. 2020. “An Evaluation of Machine-Learning for Predicting Phenotype: Studies in Yeast, Rice, and Wheat.” Machine Learning 109: 251–277.
10.1007/s10994-019-05848-5
Henry, R. J. 2022. “Progress in Plant Genome Sequencing.” Applied Biosciences 1, no. 2: 113–128. https://doi.org/10.3390/applbiosci1020008.
Huh, S. M., A. Cho, B. Yim, et al. 2024. “Characterization of Agronomic Traits and Genomic Diversity in a Newly Assembled Radish Core Collection.” Crop Science 64: 88–109.
10.1002/csc2.21135
Kang, M. J., A. Y. Shin, Y. Shin, et al. 2019. “Identification of Transcriptome-Wide, Nut Weight-Associated SNPs in Castanea crenata.” Scientific Reports 9: 13161.
10.1038/s41598-019-49618-8
Kim, J., H. Jang, S. M. Huh, et al. 2024. “Effect of Structural Variation in the Promoter Region of RsMYB1.1 on the Skin Color of Radish Taproot.” Frontiers in Plant Science 14: 1327009.
10.3389/fpls.2023.1327009
Kim, S., K. Yun, H. Y. Park, et al. 2021. “Development of Molecular Markers for Predicting Radish (Raphanus sativus) Flesh Color Based on Polymorphisms in the RsTT8 Gene.” Plants 10: 1386.
10.3390/plants10071386
Kitashiba, H., F. Li, H. Hirakawa, et al. 2014. “Draft Sequences of the Radish (Raphanus sativus L.) Genome.” DNA Research 21: 481–490.
10.1093/dnares/dsu014
Kobayashi, H., K. Shirasawa, N. Fukino, H. Hirakawa, T. Akanuma, and H. Kitashiba. 2020. “Identification of Genome-Wide Single-Nucleotide Polymorphisms Among Geographically Diverse Radish Accessions.” DNA Research 27: dsaa001.
10.1093/dnares/dsaa001
Kuhn, M. 2008. “Building Predictive Models in R Using the Caret Package.” Journal of Statistical Software 28: 1–26.
10.18637/jss.v028.i05
Kumar, A., and P. Kaushik. 2021. Advances and Milestones of Radish Breeding: An Update. Preprints. https://doi.org/10.20944/preprints202108.0514.v1.
Kumar, R., R. Sharma, R. K. Gupta, and M. Singh. 2012. “Determination of Genetic Variability and Divergence for Root Yield and Quality Characters in Temperate Radishes.” International Journal of Vegetable Science 18: 307–318.
10.1080/19315260.2011.623761
Langmead, B., and S. L. Salzberg. 2012. “Fast Gapped-Read Alignment With Bowtie 2.” Nature Methods 9: 357–359.
10.1038/nmeth.1923
Lebot, V. 2018. Tropical Root and Tuber Crops Breeding in the Pacific: A Review of 35 Years of Efforts. 1205th ed, 589–602. Leuven, Belgium: International Society for Horticultural Science (ISHS).
Lewis-Jones, L. J., J. P. Thorpe, and G. P. Wallis. 1982. “Genetic Divergence in Four Species of the Genus Raphanus: Implications for the Ancestry of the Domestic Radish R. sativus.” Biological Journal of the Linnean Society 18: 35–48.
10.1111/j.1095-8312.1982.tb02032.x
Luo, M., and S. Gu. 2020. “Polygenic Prediction of Complex Traits With Iterative Screen Regression Models.” bioRxiv 2020.11.29.402180.
Lv, H., Q. Wang, F. Han, et al. 2017. “Genome-Wide Indel/SSR Scanning Reveals Significant Loci Associated With Excellent Agronomic Traits of a Cabbage (Brassica oleracea) Elite Parental Line ‘01–20’.” Scientific Reports 7: 41696.
10.1038/srep41696
Malik, A., S. Subramaniyam, C.-B. Kim, and B. Manavalan. 2022. “SortPred: The First Machine Learning Based Predictor to Identify Bacterial Sortases and Their Classes Using Sequence-Derived Information.” Computational and Structural Biotechnology Journal 20: 165–174.
10.1016/j.csbj.2021.12.014
Masukawa, T., K.-S. Cheon, D. Mizuta, M. Kadowaki, A. Nakatsuka, and N. Kobayashi. 2019. “Development of Mutant RsF3′H Allele-Based Marker for Selection of Purple and Red Root in Radish (Raphanus sativus L. var. longipinnatus L. H. Bailey).” Euphytica 215: 119.
10.1007/s10681-019-2442-1
McKenna, A., M. Hanna, E. Banks, et al. 2010. “The Genome Analysis Toolkit: A MapReduce Framework for Analyzing Next-Generation DNA Sequencing Data.” Genome Research 20: 1297–1303.
10.1101/gr.107524.110
Moghe, G. D., D. E. Hufnagel, H. Tang, et al. 2014. “Consequences of Whole-Genome Triplication as Revealed by Comparative Genomic Analyses of the Wild Radish Raphanus raphanistrum and Three Other Brassicaceae Species.” Plant Cell 26: 1925–1937.
10.1105/tpc.114.124297
Mun, J.-H., H. Chung, W.-H. Chung, et al. 2015. “Construction of a Reference Genetic Map of Raphanus sativus Based on Genotyping by Whole-Genome Resequencing.” Theoretical and Applied Genetics 128: 259–272.
10.1007/s00122-014-2426-4
Niazian, M., and G. Niedbała. 2020. “Machine Learning for Plant Breeding and Biotechnology.” Agriculture 10, no. 10: 436.
10.3390/agriculture10100436
Noh, E. S., S. Subramaniyam, S. Cho, et al. 2023. “Genotyping of Haliotis discus hannai and Machine Learning Models to Predict the Heat Resistant Phenotype Based on Genotype.” Frontiers in Genetics 14: 1151427. https://doi.org/10.3389/fgene.2023.1151427.
Paez-Garcia, A., C. M. Motes, W.-R. Scheible, R. Chen, E. B. Blancaflor, and M. J. Monteros. 2015. “Root Traits and Phenotyping Strategies for Plant Improvement.” Plants 4, no. 2: 334–355. https://doi.org/10.3390/plants4020334.
Park, H. Y., Y. J. Lim, M. Jung, et al. 2024. “Genome of Raphanus sativus L. Bakdal, an Elite Line of Large Cultivated Korean Radish.” Frontiers in Genetics 15: 1328050.
10.3389/fgene.2024.1328050
Poland, J. A., and T. W. Rife. 2012. “Genotyping-By-Sequencing for Plant Breeding and Genetics.” Plant Genome 5: 92–102.
10.3835/plantgenome2012.05.0005
Pritchard, J. K., M. Stephens, and P. Donnelly. 2000. “Inference of Population Structure Using Multilocus Genotype Data.” Genetics 155: 945–959.
10.1093/genetics/155.2.945
Purcell, S., B. Neale, K. Todd-Brown, et al. 2007. “PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses.” American Journal of Human Genetics 81: 559–575.
10.1086/519795
Raman, S. 2022. “An Update on Radish Breeding Strategies: An Overview.” In Case Studies of Breeding Strategies in Major Plant Species, edited by W. Haiping, Ch. 14. Rijeka: IntechOpen.
Sachs, M. C. 2017. “plotROC: A Tool for Plotting ROC Curves.” Journal of Statistical Software, Code Snippets 79, no. 2: 1–19. https://doi.org/10.18637/jss.v079.c02.
Sanchez, D., S. B. Sadoun, T. Mary-Huard, A. Allier, L. Moreau, and A. Charcosset. 2023. “Improving the Use of Plant Genetic Resources to Sustain Breeding Programs' Efficiency.” Proceedings of the National Academy of Sciences 120: e2205780119.
10.1073/pnas.2205780119
Sandhu, K. S., D. N. Lozada, Z. Zhang, M. O. Pumphrey, and A. H. Carter. 2021. “Deep Learning for Predicting Complex Traits in Spring Wheat Breeding Program.” Frontiers in Plant Science 11: 613325.
10.3389/fpls.2020.613325
Satyavathi, C. T., R. K. Solanki, R. K. Kakani, et al. 2019. “Genomics Assisted Breeding for Abiotic Stress Tolerance in Millets.” In Genomics Assisted Breeding of Crops for Abiotic Stress Tolerance, edited by V. R. Rajpal, D. Sehgal, A. Kumar, and S. N. Raina, vol. II, 241–255. Cham: Springer International Publishing.
10.1007/978-3-319-99573-1_13
Shirasawa, K., and H. Kitashiba. 2017. “Genetic Maps and Whole Genome Sequences of Radish.” In The Radish Genome, edited by T. Nishio and H. Kitashiba, 31–42. Cham: Springer International Publishing.
10.1007/978-3-319-59253-4_3
Singh, R. K., P. P. Sahu, M. Muthamilarasan, A. Dhaka, and M. Prasad. 2017. “Genomics-Assisted Breeding for Improving Stress Tolerance of Graminaceous Crops to Biotic and Abiotic Stresses: Progress and Prospects.” In Plant Tolerance to Individual and Concurrent Stresses, edited by M. Senthil-Kumar, 59–81. New Delhi: Springer India.
10.1007/978-81-322-3706-8_5
Thudi, M., P. M. Gaur, L. Krishnamurthy, et al. 2014. “Genomics-Assisted Breeding for Drought Tolerance in Chickpea.” Functional Plant Biology 41: 1178–1190.
10.1071/FP13318
Tiwari, J. K., S. R. Yerasu, N. Rai, et al. 2022. “Progress in Marker-Assisted Selection to Genomics-Assisted Breeding in Tomato.” Critical Reviews in Plant Sciences 41: 321–350.
10.1080/07352689.2022.2130361
Tong, H., and Z. Nikoloski. 2021. “Machine Learning Approaches for Crop Improvement: Leveraging Phenotypic and Genotypic Big Data.” Journal of Plant Physiology 257: 153354.
10.1016/j.jplph.2020.153354
van Dijk, A. D. J., S. H. Shiu, and D. de Ridder. 2022. “Editorial: Artificial Intelligence and Machine Learning Applications in Plant Genomics and Genetics.” Frontiers in Artificial Intelligence 5: 959470.
10.3389/frai.2022.959470
Xing, X., T. Hu, Y. Wang, et al. 2024. “Construction of SNP Fingerprints and Genetic Diversity Analysis of Radish (Raphanus sativus L.).” Frontiers in Plant Science 15: 1329890.
10.3389/fpls.2024.1329890
Xu, L., Y. Wang, J. Dong, et al. 2023. “A Chromosome-Level Genome Assembly of Radish (Raphanus sativus L.) Reveals Insights Into Genome Adaptation and Differential Bolting Regulation.” Plant Biotechnology Journal 21: 990–1004.
10.1111/pbi.14011
Yang, R., H. Li, L. Fu, and Y. Liu. 2013. “An Efficient Approach to Large-Scale Genotype–Phenotype Association Analyses.” Briefings in Bioinformatics 15: 814–822.
10.1093/bib/bbt061
Yang, W., H. Feng, X. Zhang, et al. 2020. “Crop Phenomics and High-Throughput Phenotyping: Past Decades, Current Challenges, and Future Perspectives.” Molecular Plant 13: 187–214.
10.1016/j.molp.2020.01.008
Yi, G., S. Lim, W. B. Chae, et al. 2016. “Root Glucosinolate Profiles for Screening of Radish (Raphanus sativus L.) Genetic Resources.” Journal of Agricultural and Food Chemistry 64: 61–70.
10.1021/acs.jafc.5b04575
Yoosefzadeh-Najafabadi, M., M. Eskandari, S. Torabi, D. Torkamaneh, D. Tulpan, and I. Rajcan. 2022. “Machine-Learning-Based Genome-Wide Association Studies for Uncovering QTL Underlying Soybean Yield and Its Components.” International Journal of Molecular Sciences 23, no. 10: 5538.
10.3390/ijms23105538
Yu, G.-E., Y. Shin, S. Subramaniyam, et al. 2021. “Machine Learning, Transcriptome, and Genotyping Chip Analyses Provide Insights Into SNP Markers Identifying Flower Color in Platycodon grandiflorus.” Scientific Reports 11: 8019.
10.1038/s41598-021-87281-0
Zhang, X., T. Liu, J. Wang, et al. 2021. “Pan-Genome of Raphanus Highlights Genetic Variation and Introgression Among Domesticated, Wild, and Weedy Radishes.” Molecular Plant 14: 2032–2055.
10.1016/j.molp.2021.08.005

Volume 144, Issue 3

June 2025

Pages 350-359

Genome Resources for Identifying SNPs Associated With Eight Horticultural Traits in Commercial Korean Elite Radish (Raphanus sativus) Lines
