Determination of soybean routine quality parameters using near-infrared spectroscopy
Abstract
Soybean quality varies widely between samples. To rapidly assess the quality of soybeans from different areas, we developed near-infrared spectroscopy (NIRS) models for the moisture, crude fat, and protein content of soybeans, based on 360 samples collected from different areas. Compared with whole kernels, ground soybean powder (40 mesh for moisture, 60 mesh for crude fat and protein) was more suitable for modeling moisture, crude fat, and protein content. To increase the reproducibility of the prediction models, soybeans of different sizes and colors were ground and sieved to a uniform particle size. Modeling analysis showed that the internal cross-validation correlation coefficients (Rcv) for the moisture, crude fat, and protein content of soybeans were .965, .949, and .941, respectively, and the external validation determination coefficients (R2) were .966, .958, and .958. NIRS performed well as a rapid method for determining routine quality parameters and provides reference data for the analysis of soybean quality using Fourier transform NIRS (FT-NIRS).
1 INTRODUCTION
China consumes large quantities of soybean, which is regarded as a health food (He & Chen, 2013). Soybean is one of China's main agricultural products, and several hundred varieties are grown, with large differences in composition that arise from rich genetic diversity and regional planting (Lam et al., 2010). At present, the main soybean quality indices, namely moisture, crude fat, and protein content, are determined by traditional analytical methods. These methods are highly accurate, but they are time-consuming and laborious, and the chemical reagents they use contribute to environmental pollution (Liu, 1997). Rapid and accurate alternative methods are, therefore, urgently needed (Baianu et al., 2012; Martin, 1992; Zhu et al., 2011).
Near-infrared spectroscopy (NIRS) is a rapid technique for the simultaneous detection and analysis of multiple components (Acquah, Via, Billor, Fasina, & Eckhardt, 2016; Baianu et al., 2012; Louw & Theron, 2010; Wehling, Pierce, & Froning, 1988; Williams, Norris, & Sobering, 1985). Different chemical components in a sample can be quantified rapidly by NIRS because the compounds have vibrational absorption bands in the NIR region of the spectrum (Martelovidal & Vazquez, 2014). The overtone and combination bands of the hydrogen-containing groups in water, protein, fat, and carbohydrate all fall within the NIR region, and the characteristic vibrational information of these groups can be used to determine the chemical composition of mixtures (Givens, De Boever, & Deaville, 1997).
NIRS is nondestructive and fast, and it requires no complicated sample pretreatment. Because of these advantages, the technique has been evaluated for the analysis of many agricultural products, including beef, eggs, apples, and tomatoes (Mitsumoto, Maeda, Mitsuhashi, & Ozawa, 1991; Peirs, Scheerlinck, De Baerdemaeker, & Nicolai, 2003; Slaughter, Barrett, & Boersig, 1996; Uddin & Okazaki, 2004; Wehling et al., 1988). It is also widely used to evaluate the quality of crops such as rice, wheat, corn, rapeseed, and soybean (Agelet et al., 2012; Baianu, You, Guo, Costescu, & Prisecaru, 2011; Bao, Cai, & Corke, 2001; Barton, Shenk, Westerhaus, & Funk, 2000; Dowell et al., 2006; Kovalenko, Rippke, & Hurburgh, 2006; Liu et al., 2008; Peiris, Bockus, & Dowell, 2015; Peiris, Dong, Bockus, & Dowell, 2014; Peiris et al., 2010). For soybean, AACC International (formerly the American Association of Cereal Chemists) currently recommends a near-infrared reflectance method for the analysis of protein, crude fat, and moisture content in intact seeds (International, 2010b). However, experimental outcomes are affected by a number of factors, including instrument sophistication and sample particle size, moisture content, temperature, and color (Fernandezahumada et al., 2006). Sample particle size and uniformity have been shown to be the main factors affecting the accuracy of NIR analysis, so well-controlled particle size and uniformity provide the basis for a good model (Williams & Thompson, 1978). In our study, we found marked differences in grain size and color among soybeans, especially among the complex and diverse Chinese soybeans, which means that soybean quality in China needs to be investigated in order to update the Chinese standards.
We therefore ground whole soybean grains to determine the appropriate particle size for establishing a diffuse reflectance Fourier transform NIRS (FT-NIRS) prediction model. Soybean quality index models were established using a uniform particle size, to avoid the poor reproducibility and accuracy caused by differences in variety, growing region, and grain size among soybean samples, and to provide reference data for the analysis of soybean quality using FT-NIRS.
2 EXPERIMENTAL SECTION
2.1 Reagents and apparatus
Concentrated sulfuric acid, sodium hydroxide, boric acid, hydrochloric acid, petroleum ether, bromocresol green, methyl red, anhydrous sodium carbonate, potassium sulfate, copper sulfate, and ethanol were all analytical grade (AR) reagents and were purchased from Sigma-Aldrich Shanghai Trading Co., Ltd. (Shanghai, China).
The MB3600 FT-NIR spectrometer was purchased from ABB-Bomem (Quebec, Canada).
2.2 Sample Collection
A total of 360 soybean samples (50 varieties) were collected from two representative areas. Two hundred and forty samples were collected from Northeast China through the National Research Center of Soybean Engineering and Technology in October 2013, and 120 samples were collected from the Yangtze River Basin through Zhenjiang Grain and Oil Co., Ltd. in September 2014. Samples were dried in an oven at 25°C for 24 hr.
2.3 Preparation of soybean sample sets and classification of model samples
Ninety samples were used to select the particle size. Each sample (500 g) was divided into two equal parts: one part was stored for later use, and the other was divided into five equal portions. The portions were crushed using a high-speed multifunction mill and screened through 10, 20, 40, 60, and 80 mesh sieves to give particles with diameters of 2, 0.9, 0.45, 0.3, and 0.2 mm, respectively. When more than 95% of the particles had passed through the sieve, each powder was thoroughly mixed and scanned to determine the best particle size for modeling. The remaining 270 soybean samples were crushed and screened using the best method identified from the comprehensive modeling of the first 90 samples, and were then divided into a calibration set (n = 216) and an external validation set (n = 54) to establish the best models for moisture, crude fat, and protein content.
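The sample allocation above can be sketched as follows. The mesh-to-diameter mapping uses the values stated in this section; everything else is an assumption for illustration: the text does not say how the 270 samples were assigned to the two sets or which 90 samples were used for screening, so the hypothetical helper `split_calibration_validation` uses a seeded random split.

```python
import random

# Sieve sizes and nominal particle diameters (mm) as stated in Section 2.3.
MESH_TO_DIAMETER_MM = {10: 2.0, 20: 0.9, 40: 0.45, 60: 0.3, 80: 0.2}

def split_calibration_validation(sample_ids, n_validation, seed=0):
    """Randomly split sample IDs into calibration and validation sets.

    Hypothetical helper: a seeded random split is assumed here; Kennard-Stone
    or concentration-stratified splits are common alternatives in NIR work.
    """
    ids = list(sample_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    validation = sorted(ids[:n_validation])
    calibration = sorted(ids[n_validation:])
    return calibration, validation

# 360 samples in total. Using the first 90 IDs for particle-size screening
# is an assumption; the remaining 270 go to calibration and validation.
all_ids = list(range(1, 361))
screening_ids = all_ids[:90]
calibration_ids, validation_ids = split_calibration_validation(all_ids[90:], n_validation=54)
print(len(screening_ids), len(calibration_ids), len(validation_ids))  # 90 216 54
```

A random split keeps the validation set independent of the calibration set; a stratified split would additionally guarantee that the validation ranges fall inside the calibration ranges.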
2.4 Chemical analysis of soybean samples
The moisture content of the soybeans was determined according to AACC Method 44-15.02 (International, 2010c), the crude fat content according to AACC Method 30-25.01 (International, 2010a), and the protein content according to AACC Method 46-11.02 (International, 2010d), with protein calculated from nitrogen using a conversion factor of 6.25 (protein = %N × 6.25). Each sample was analyzed three times, and the final results are presented as mean values.
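The reference-value arithmetic described above (nitrogen converted to protein with the factor 6.25, then the mean of three replicate determinations) can be sketched as below. The nitrogen values are hypothetical and are not data from this study.

```python
from statistics import mean

NITROGEN_TO_PROTEIN = 6.25  # conversion factor stated in Section 2.4

def protein_from_nitrogen(nitrogen_percent):
    """Convert total nitrogen (% w/w) to crude protein (% w/w)."""
    return nitrogen_percent * NITROGEN_TO_PROTEIN

def reference_value(replicates):
    """Final reference value: mean of the three replicate determinations."""
    if len(replicates) != 3:
        raise ValueError("expected three replicate determinations")
    return mean(replicates)

# Hypothetical nitrogen results (% w/w) for one sample.
nitrogen_replicates = [6.41, 6.38, 6.44]
protein = reference_value([protein_from_nitrogen(n) for n in nitrogen_replicates])
print(round(protein, 2))  # 40.06
```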
2.5 Collection of near-infrared spectra
To ensure consistent sampling for NIR scanning, the sample thickness was kept at 2 cm. A high-efficiency MB3600 FT-NIR spectrometer, with a spectral range of 3700–15,000 cm⁻¹ and built-in Horizon MB chemometric modeling software, was used to collect the spectra of the soybean samples. The spectrometer was allowed to warm up for 30 min, and spectra were then collected over 4000–12,600 cm⁻¹ at a resolution of 16 cm⁻¹ with 60 scans per spectrum; this range contains the absorption regions of the traits of interest (4000–9000 cm⁻¹ for protein, moisture, and fat). Each sample was scanned three times to reduce random measurement variation.
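For readers used to wavelength units, the ranges above convert via λ (nm) = 10⁷ / ν̃ (cm⁻¹), so 4000–12,600 cm⁻¹ corresponds to roughly 794–2500 nm. The sketch also shows averaging of the three replicate scans; the spectral axis spacing and the spectra themselves are synthetic assumptions, not instrument output.

```python
import numpy as np

def wavenumber_to_nm(wavenumber_cm1):
    """Convert wavenumber (cm^-1) to wavelength (nm) using lambda = 1e7 / nu."""
    return 1e7 / np.asarray(wavenumber_cm1, dtype=float)

# Ranges stated in Section 2.5.
scan_range_nm = wavenumber_to_nm([12600, 4000])      # about 794 to 2500 nm
analyte_region_nm = wavenumber_to_nm([9000, 4000])   # about 1111 to 2500 nm

def average_replicate_scans(replicates):
    """Average repeated spectra of one sample (rows = scans, columns = points).

    Averaging n independent scans reduces random noise by about sqrt(n).
    """
    return np.asarray(replicates, dtype=float).mean(axis=0)

# Synthetic spectral axis with 16 cm^-1 spacing (assumption: the instrument's
# actual data-point spacing is not stated) and three noisy replicate scans.
wavenumbers = np.arange(4000, 12600, 16)
true_spectrum = 0.3 + 0.1 * np.sin(wavenumbers / 500.0)
rng = np.random.default_rng(0)
replicates = true_spectrum + 0.005 * rng.normal(size=(3, wavenumbers.size))
mean_spectrum = average_replicate_scans(replicates)
```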
2.6 Evaluation of the NIR model
The performance of the prediction model was evaluated by internal cross-validation, using the root mean square error of calibration (RMSEC), the standard error of cross-validation (SECV), and the correlation coefficient of cross-validation (Rcv). Smaller RMSEC and SECV values and a higher Rcv indicate better performance of the prediction model (Ferreira, Galão, Pallone, & Poppi, 2014). External validation evaluates the predictive performance of the calibration model on a separate validation sample set, using the determination coefficient (R2) and the p value of the regression between predicted and measured values. A higher R2 and a p value <0.05 indicate better predictive performance.
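The three internal-validation statistics can be written down explicitly. The sketch below uses common textbook definitions: RMSEC as the root mean square of calibration residuals, SECV as the bias-corrected standard deviation of cross-validation residuals, and Rcv as the Pearson correlation between reference and cross-validated values. These are assumptions; the Horizon MB software may apply different degrees-of-freedom corrections. The example values are hypothetical.

```python
import numpy as np

def rmsec(y_ref, y_fit):
    """Root mean square error of calibration (fitted vs. reference values).

    Uses n in the denominator; some software divides by n - k - 1
    (k = number of PLS factors) instead.
    """
    y_ref, y_fit = np.asarray(y_ref, float), np.asarray(y_fit, float)
    return float(np.sqrt(np.mean((y_fit - y_ref) ** 2)))

def secv(y_ref, y_cv):
    """Standard error of cross-validation: bias-corrected SD of CV residuals."""
    e = np.asarray(y_cv, float) - np.asarray(y_ref, float)
    return float(np.sqrt(np.sum((e - e.mean()) ** 2) / (e.size - 1)))

def rcv(y_ref, y_cv):
    """Pearson correlation between reference values and CV predictions."""
    return float(np.corrcoef(y_ref, y_cv)[0, 1])

# Hypothetical reference, fitted and cross-validated crude fat values (% w/w).
y_ref = [19.2, 20.5, 21.8, 22.4, 23.9, 25.1]
y_fit = [19.4, 20.3, 21.9, 22.2, 24.0, 24.9]
y_cv = [19.6, 20.1, 22.1, 22.0, 24.3, 24.6]
print(rmsec(y_ref, y_fit), secv(y_ref, y_cv), rcv(y_ref, y_cv))
```

Because SECV removes the mean residual (bias), a model whose cross-validated predictions are all shifted by a constant still has SECV = 0; bias must therefore be checked separately.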
3 RESULTS AND DISCUSSION
3.1 Analysis of soybean components
The quality indices (moisture, crude fat, and protein content) of the soybean samples used in this study are presented in Table 1. For the 90 samples used for particle-size selection, moisture, crude fat, and protein content ranged over 8.47%–10.67%, 17.71%–25.14%, and 37.37%–43.20%, respectively. For the 216 samples in the calibration set, they ranged over 7.42%–13.71%, 15.78%–25.57%, and 37.37%–43.21%, respectively, and for the 54 samples in the external validation set over 6.92%–11.24%, 17.75%–25.39%, and 37.04%–43.56%, respectively. The number of samples used for modeling was much higher than 50, the minimum sample size proposed for NIRS modeling (Williams et al., 1985). The quality indices were widely distributed and representative of the sample population, providing favorable conditions for establishing quality models.
| Parameters | Sample number (n) | Maximum value (%) | Minimum value (%) | Mean value (%) | Standard deviation (%) |
|---|---|---|---|---|---|
| Moisture | 90 | 10.67 | 8.47 | 9.37 | 0.46 |
| | 216 | 13.71 | 7.42 | 9.30 | 1.09 |
| | 54 | 11.24 | 6.92 | 9.41 | 0.90 |
| Crude fat | 90 | 25.14 | 17.71 | 19.68 | 1.41 |
| | 216 | 25.57 | 15.78 | 21.91 | 1.95 |
| | 54 | 25.39 | 17.75 | 21.89 | 2.16 |
| Protein | 90 | 43.20 | 37.37 | 39.97 | 1.46 |
| | 216 | 43.20 | 37.37 | 40.60 | 1.02 |
| | 54 | 43.56 | 37.04 | 40.54 | 1.17 |
3.2 Spectrogram of soybean samples
NIRS analysis is based on the characteristic overtone and combination absorption bands of NH, CH, OH, and CO groups in the chemical components of samples (Martin, 1992). The position of an absorption band indicates which chemical groups are present, and its intensity is proportional to the amount of the hydrogen-containing group present. The NIR spectra of soybean samples can therefore be used for quantitative analysis of quality indices. If the sample set is too small or too large, it does not accurately reflect the distribution of the population under investigation, and useful information may be obscured because irrelevant statistical differences are emphasized, greatly reducing model performance. Fewer, but more informative, samples should therefore be chosen to establish a model with the best predictive power. Figure 1 shows the variation in absorbance across 4000–12,600 cm⁻¹ for five samples of the same soybean ground to different particle sizes. Absorbance tended to increase with increasing particle size, and spectral variation was greater at longer wavelengths, which affects the reliability of the NIR prediction model.
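The proportionality between band intensity and concentration is usually expressed through the pseudo-absorbance A = log10(1/R) of a diffuse-reflectance spectrum and an idealized Beer-Lambert relation A = k·c. The sketch below assumes that idealization and a hypothetical band constant k; in real diffuse reflectance the relation is only approximately linear and is distorted by particle-size scattering, which is why multivariate calibration and scatter corrections are used.

```python
import numpy as np

def pseudo_absorbance(reflectance):
    """Diffuse-reflectance pseudo-absorbance A = log10(1 / R), with R in (0, 1]."""
    r = np.asarray(reflectance, dtype=float)
    if np.any((r <= 0) | (r > 1)):
        raise ValueError("reflectance must lie in (0, 1]")
    return np.log10(1.0 / r)

# Idealized Beer-Lambert relation A = k * c: absorbance at a band scales
# linearly with the concentration c of the absorbing group.
k = 0.02                                        # hypothetical band constant
concentrations = np.array([18.0, 21.0, 24.0])  # e.g. crude fat, % w/w
reflectances = 10.0 ** (-k * concentrations)
recovered = pseudo_absorbance(reflectances) / k
print(recovered)  # recovers the concentrations
```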

3.3 Selection of optimal particle size for soybean modeling
Many studies describing models for evaluating soybean quality have been published, and in these studies crushed soybean kernels generally gave better models for quality prediction. Pazdernik et al. analyzed the amino acid and fatty acid content of whole and crushed soybean samples using NIRS and obtained cross-validation R2 values of .380–.850 for the crushed samples and .060–.830 for the whole grains, demonstrating the higher accuracy obtained with crushed samples (Pazdernik, Killam, & Orf, 1997). Haughey et al. analyzed different soybean meals by NIR and obtained model correlation coefficients between 0.890 and 0.990 (Haughey, Graham, Cancouët, & Elliott, 2013).
We first analyzed the moisture, crude fat, and protein content of whole kernels using NIRS. The Rcv of the moisture model was .971, but the Rcv values of the crude fat and protein models were only .520 and .495, respectively, indicating poor predictive ability. We therefore modeled the 90 crushed soybean samples of different particle sizes using Horizon MB chemometric software with partial least squares (PLS) regression. The spectral data were pretreated with mathematical procedures, including multiplicative scatter correction (MSC), derivatives, detrending, normalization, offset correction, and standard normal variate (SNV), to determine the optimal particle size for modeling the moisture, protein, and crude fat content of soybeans. The crushed samples were sieved to 10, 20, 40, 60, and 80 mesh, and a model was established for each size to find the best modeling particle size; the Horizon MB modeling results are given in Table 2.
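The PLS modeling step can be illustrated with a minimal from-scratch PLS1 (NIPALS) regression and k-fold cross-validation that yields the Rcv statistic. This is a sketch under stated assumptions, not the Horizon MB implementation: the number of latent factors (3), the 10-fold scheme, and the synthetic spectra are illustrative choices, and only NumPy is used.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a PLS1 model with the NIPALS algorithm.

    Returns (x_mean, y_mean, coef) so that y_hat = y_mean + (X - x_mean) @ coef.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)      # weight vector
        t = Xr @ w                     # scores
        tt = t @ t
        p = Xr.T @ t / tt              # X loadings
        q = (yr @ t) / tt              # y loading
        Xr = Xr - np.outer(t, p)       # deflate X
        yr = yr - q * t                # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # B = W (P'W)^-1 q
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef

def cross_val_predict_pls(X, y, n_components, n_folds=10, seed=0):
    """k-fold cross-validated predictions (each sample predicted exactly once)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    y_cv = np.empty(len(y))
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        model = pls1_fit(X[train], y[train], n_components)
        y_cv[fold] = pls1_predict(model, X[fold])
    return y_cv

# Synthetic "spectra": 90 samples, 200 variables driven by 3 latent factors.
rng = np.random.default_rng(1)
T = rng.normal(size=(90, 3))
X = T @ rng.normal(size=(3, 200)) + 0.01 * rng.normal(size=(90, 200))
y = 20 + T @ np.array([1.0, -0.5, 0.3]) + 0.05 * rng.normal(size=90)

y_cv = cross_val_predict_pls(X, y, n_components=3)
r_cv = np.corrcoef(y, y_cv)[0, 1]
print(round(r_cv, 3))
```

In practice the number of factors would itself be chosen by cross-validation (for example, the smallest number that minimizes SECV), and the comparison across particle sizes amounts to repeating this procedure on the spectra of each sieve fraction.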
| Parameters | Particle size (mesh) | RMSEC | SECV | Rcv |
|---|---|---|---|---|
| Moisture | 10 | 0.522 | 0.273 | .954 |
| | 20 | 0.520 | 0.270 | .955 |
| | 40 | 0.503 | 0.253 | .960 |
| | 60 | 0.612 | 0.374 | .914 |
| | 80 | 0.637 | 0.405 | .899 |
| Crude fat | 10 | 0.477 | 0.228 | .895 |
| | 20 | 0.361 | 0.096 | .913 |
| | 40 | 0.281 | 0.078 | .934 |
| | 60 | 0.266 | 0.071 | .939 |
| | 80 | 0.282 | 0.079 | .933 |
| Protein | 10 | 0.554 | 0.307 | .928 |
| | 20 | 0.431 | 0.186 | .930 |
| | 40 | 0.379 | 0.144 | .950 |
| | 60 | 0.390 | 0.152 | .953 |
| | 80 | 0.409 | 0.167 | .948 |
As shown in Table 2, the Rcv of the soybean moisture model increased from .954 to .960 as the particle size decreased from 10 mesh to 40 mesh, whereas the 60 and 80 mesh models gave lower Rcv values (.914 and .899, respectively). The Rcv of the crude fat model increased from .895 to .939 as the particle size decreased from 10 mesh to 60 mesh and fell to .933 at 80 mesh. Likewise, the Rcv of the protein model increased from .928 to .953 from 10 mesh to 60 mesh and fell to .948 at 80 mesh. This decline at the finest sizes may be caused by the grinding process, because the longer grinding needed for finer particles may slightly change soybean composition. Based on an overall consideration of the modeling results for the crushed samples, the standardized multivariate scatter correction method was selected as the best method for moisture; crushed particles screened through a 40 mesh sieve gave the minimum RMSEC and SECV values and the maximum Rcv (0.503, 0.253, and .960, respectively). The normalized multivariate scatter correction method was selected as the best method for crude fat; particles screened through a 60 mesh sieve gave the minimum RMSEC and SECV values and the maximum Rcv (0.266, 0.071, and .939, respectively). The deviation-corrected multivariate scatter correction method was selected as the best method for protein; particles screened through a 60 mesh sieve gave the maximum Rcv (.953), although their RMSEC and SECV (0.390 and 0.152) were slightly higher than those at 40 mesh (0.379 and 0.144).
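Selecting the sieve size with the highest Rcv in Table 2 reproduces the choices above (40 mesh for moisture, 60 mesh for crude fat and protein). The short sketch encodes the Table 2 values. Treating Rcv as the sole criterion is a simplification: for protein, the minimum SECV falls at 40 mesh rather than 60 mesh.

```python
# Rcv values from Table 2, keyed by sieve size (mesh).
TABLE2_RCV = {
    "moisture": {10: .954, 20: .955, 40: .960, 60: .914, 80: .899},
    "crude fat": {10: .895, 20: .913, 40: .934, 60: .939, 80: .933},
    "protein": {10: .928, 20: .930, 40: .950, 60: .953, 80: .948},
}

# SECV values for protein from Table 2, to show that criteria can disagree.
TABLE2_SECV_PROTEIN = {10: 0.307, 20: 0.186, 40: 0.144, 60: 0.152, 80: 0.167}

def best_mesh(rcv_by_mesh):
    """Return the sieve size whose model has the highest Rcv."""
    return max(rcv_by_mesh, key=rcv_by_mesh.get)

for parameter, rcv_by_mesh in TABLE2_RCV.items():
    print(parameter, best_mesh(rcv_by_mesh))
# prints: moisture 40, crude fat 60, protein 60 (one per line)

print("protein by min SECV:", min(TABLE2_SECV_PROTEIN, key=TABLE2_SECV_PROTEIN.get))  # 40
```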
The models for moisture, crude fat, and protein content, established under optimal conditions, are shown in Figure 2.

3.4 Establishment of NIR calibration model
In this step, calibration models were built using the 216 calibration samples, crushed to the optimal particle size. The spectral data were processed with the mathematical procedures described above: multiplicative scatter correction (MSC), derivatives, detrending, normalization, offset correction, and standard normal variate. The predictive performance of each model was evaluated by internal cross-validation using RMSEC, SECV, and the cross-validation correlation coefficient Rcv; smaller RMSEC and SECV values and a larger Rcv indicate better predictive performance.
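The pretreatments listed above have common textbook forms, which are sketched below with NumPy. MSC and SNV follow their usual formulations. The unsmoothed central-difference derivative, the straight-line detrend, and the vector normalization are simplified stand-ins (assumptions), because the exact settings used by Horizon MB are not given. Offset correction is omitted because vendors define it differently. Combined pretreatments such as "Standard normal variate/MSC" can be composed, for example `msc(snv(X))`, although the order applied by the software is not stated.

```python
import numpy as np

def snv(X):
    """Standard normal variate: center and scale each spectrum (row) individually."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std

def msc(X, reference=None):
    """Multiplicative scatter correction.

    Each spectrum x is regressed on a reference spectrum r (default: the mean
    spectrum) as x = a + b * r and corrected to (x - a) / b.
    """
    X = np.asarray(X, dtype=float)
    r = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        b, a = np.polyfit(r, x, 1)  # slope, intercept
        corrected[i] = (x - a) / b
    return corrected

def detrend(X):
    """Subtract a least-squares straight-line baseline from each spectrum."""
    X = np.asarray(X, dtype=float)
    axis = np.arange(X.shape[1])
    out = np.empty_like(X)
    for i, x in enumerate(X):
        slope, intercept = np.polyfit(axis, x, 1)
        out[i] = x - (slope * axis + intercept)
    return out

def first_derivative(X):
    """Unsmoothed first derivative along the spectral axis (central differences).

    Simplification: instrument software typically uses smoothed derivatives
    (e.g. Savitzky-Golay or gap-segment), which are not reproduced here.
    """
    return np.gradient(np.asarray(X, dtype=float), axis=1)

def vector_normalize(X):
    """Scale each spectrum to unit Euclidean norm (one possible 'normalization')."""
    X = np.asarray(X, dtype=float)
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Synthetic spectra: one absorption band distorted by additive offsets and
# multiplicative scatter factors, as particle-size effects would produce.
wn = np.linspace(4000, 9000, 300)
band = np.exp(-((wn - 5200) / 150) ** 2)
offsets = np.array([0.05, 0.10, 0.20])[:, None]
scales = np.array([0.8, 1.0, 1.3])[:, None]
X = offsets + scales * band

X_msc = msc(X)                                        # rows collapse onto the mean spectrum
X_snv_msc = msc(snv(X))                               # e.g. "Standard normal variate/MSC"
X_deriv_norm = vector_normalize(first_derivative(X))  # e.g. "Derivative/Normalization"
```

In this idealized case, MSC and SNV both remove the offset and scale differences completely, which is why they help when particle size, rather than chemistry, drives spectral variation.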
In this study, the calibration spectra were pretreated and modeled with the Horizon MB chemometric software of the NIR instrument. As shown in Table 3, for moisture (40 mesh), derivative/normalization pretreatment gave the smallest RMSEC and SECV and the largest Rcv (RMSEC 0.451, SECV 0.203, Rcv .965). As shown in Table 4, for crude fat (60 mesh), standard normal variate/MSC gave the best calibration, with the smallest RMSEC and SECV and the largest Rcv (RMSEC 0.735, SECV 0.540, Rcv .922). As shown in Table 5, for protein (60 mesh), the MSC-based pretreatments performed best; offset correction/MSC gave an RMSEC of 0.537, an SECV of 0.287, and an Rcv of .920.
| Parameters | Processing method | RMSEC | SECV | Rcv | R |
|---|---|---|---|---|---|
| Moisture | MSC | 0.464 | 0.215 | .961 | .980 |
| | Derivative | 0.474 | 0.224 | .958 | .979 |
| | Detrending | 0.479 | 0.230 | .956 | .978 |
| | Normalization | 0.466 | 0.218 | .960 | .980 |
| | MSC/Derivative | 0.458 | 0.210 | .963 | .981 |
| | MSC/Detrending | 0.462 | 0.214 | .962 | .981 |
| | MSC/Normalization | 0.466 | 0.218 | .960 | .980 |
| | Derivative/Detrending | 0.476 | 0.226 | .957 | .978 |
| | Derivative/Normalization | 0.451 | 0.203 | .965 | .983 |
| | Detrending/Normalization | 0.454 | 0.206 | .964 | .982 |

| Parameters | Processing method | RMSEC | SECV | Rcv | R |
|---|---|---|---|---|---|
| Crude fat | MSC | 0.735 | 0.541 | .922 | .960 |
| | Detrending | 0.758 | 0.574 | .912 | .955 |
| | Offset correction | 0.741 | 0.549 | .920 | .959 |
| | Standard normal variate | 0.750 | 0.562 | .916 | .957 |
| | MSC/Detrending | 0.756 | 0.571 | .913 | .956 |
| | Offset correction/MSC | 0.735 | 0.541 | .922 | .960 |
| | Standard normal variate/MSC | 0.735 | 0.540 | .922 | .960 |
| | Detrending/Offset correction | 0.758 | 0.575 | .912 | .955 |
| | Detrending/Standard normal variate | 0.773 | 0.597 | .905 | .952 |
| | Offset correction/Standard normal variate | 0.747 | 0.558 | .917 | .958 |

| Parameters | Processing method | RMSEC | SECV | Rcv | R |
|---|---|---|---|---|---|
| Protein | MSC | 0.537 | 0.288 | .920 | .959 |
| | Detrending | 0.604 | 0.365 | .871 | .933 |
| | Offset correction | 0.637 | 0.406 | .840 | .917 |
| | Standard normal variate | 0.663 | 0.439 | .814 | .902 |
| | MSC/Detrending | 0.612 | 0.375 | .864 | .930 |
| | Offset correction/MSC | 0.537 | 0.287 | .920 | .959 |
| | Standard normal variate/MSC | 0.537 | 0.288 | .920 | .959 |
| | Detrending/Offset correction | 0.610 | 0.373 | .866 | .930 |
| | Detrending/Standard normal variate | 0.627 | 0.393 | .851 | .922 |
| | Offset correction/Standard normal variate | 0.602 | 0.362 | .873 | .935 |
Under the best pretreatment, the Rcv values for moisture, crude fat, and protein content were .965, .922, and .920, respectively (Table 6). However, outliers inevitably occur when a prediction model is built from NIR spectral data, and they can seriously degrade the accuracy of the model. To avoid removing samples by mistake, the reference values and spectra of suspected outliers were measured again: a sample that was still an outlier was permanently removed from the calibration set; otherwise, it was retained. The results of the corrected NIR calibration models are shown in Table 6, and the corresponding data are plotted in Figure 3. After correction, the Rcv values for crude fat and protein content increased to .949 and .941, respectively. No outliers were found in the moisture model, which we attribute to the simplicity and effectiveness of moisture determination by both the AACC method and NIR.
Parameters | Processing Method | RMSEC | SECV | Rcv |
---|---|---|---|---|
Moisture | Derivative/Normalization | 0.451 | 0.203 | .965 |
Crude fat | Standard normal variate/MSC | 0.735 | 0.540 | .922 |
Correction of crude fat | Standard normal variate/MSC | 0.648 | 0.420 | .949 |
Protein | Offset correction/MSC | 0.537 | 0.288 | .920 |
Correction of protein | Offset correction/MSC | 0.506 | 0.256 | .941 |
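The remeasure-and-recheck rule for outliers described above can be sketched as follows. The flagging criterion is an assumption (an absolute standardized residual above 2.5, a commonly used cut-off); the text does not state the criterion used by the modeling software. `is_still_outlier` is a hypothetical callback that stands in for remeasuring a sample's reference value and spectrum.

```python
import numpy as np

def flag_outliers(y_ref, y_pred, z_limit=2.5):
    """Return indices whose standardized residual exceeds z_limit in magnitude.

    The 2.5 cut-off is an assumed threshold, not the paper's stated criterion.
    """
    e = np.asarray(y_pred, float) - np.asarray(y_ref, float)
    z = (e - e.mean()) / e.std(ddof=1)
    return [int(i) for i in np.flatnonzero(np.abs(z) > z_limit)]

def resolve_outliers(candidates, is_still_outlier):
    """Remove candidates that remain outliers after remeasurement; keep the rest.

    `is_still_outlier(i)` is a hypothetical callback that remeasures the
    reference value and spectrum of sample i and rechecks it.
    """
    removed = [i for i in candidates if is_still_outlier(i)]
    retained = [i for i in candidates if i not in removed]
    return removed, retained

# Synthetic example: 20 calibration samples, one with a gross error.
y_ref = 20 + 0.3 * np.arange(20)
residuals = 0.1 * (-1.0) ** np.arange(20)
residuals[7] = 3.0
y_pred = y_ref + residuals

suspects = flag_outliers(y_ref, y_pred)  # [7]
# Hypothetical recheck: pretend remeasurement confirms sample 7 as an outlier.
removed, retained = resolve_outliers(suspects, is_still_outlier=lambda i: True)
```

Rechecking before removal guards against discarding samples that only looked anomalous because of a transient measurement error.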

Ferreira et al. also established NIR models for soybean moisture and protein content and obtained R2 values of .800 and .810, respectively (Ferreira, Pallone, & Poppi, 2013). In another study, Ferreira et al. used a total of 40 soybean samples and applied near-infrared and mid-infrared spectroscopy with diffuse reflectance measurements, combined with multivariate calibration based on the partial least squares algorithm. The determination coefficients (R2) for moisture, ash, protein, and lipid content were 0.72, 0.73, and 0.88, respectively, with a root mean square error of cross-validation (RMSECV) <2.09% (Ferreira et al., 2014). In the present study, the calibration model based on 216 soybean samples was improved by crushing the samples to a uniform particle size, giving better predictive ability: the cross-validation Rcv values for moisture, crude fat, and protein content were .965, .949, and .941, respectively, with small SECVs (0.203, 0.420, and 0.256, respectively).
3.5 External validation of NIR soybean model
SPSS linear regression was performed between the NIR-predicted values of the external validation samples and their chemically determined reference values. The ANOVA table (Table 7) was used for an F test of the linearity of the regression. The F statistic is the ratio of the regression mean square to the residual mean square. A small F value indicates that the independent variable explains little of the variation in the dependent variable, so fitting a regression line is meaningless; the smaller the significance value (Sig.), the stronger the evidence of a linear relationship.
As shown in Table 7, the F values for moisture, crude fat, and protein in the crushed soybean samples were very large (1494.903, 1192.713, and 1173.284, respectively), and all Sig. values were reported as .000 (p < 0.001). In the normal P–P plots of the standardized regression residuals, the points for each index lay close to a straight line. These results indicate that the external-validation regressions are meaningful and that the NIR calibration models are applicable and can be used to accurately determine the quality indices of unknown soybean samples.
| Parameters | Model | Sum of squares | df | Mean square | F | Sig. |
|---|---|---|---|---|---|---|
| Moisture | Regression | 37.389 | 1 | 37.389 | 1494.903 | .000ᵃ |
| | Residual | 1.301 | 52 | 0.025 | | |
| | Total | 38.689 | 53 | | | |
| Crude fat | Regression | 212.231 | 1 | 212.231 | 1192.713 | .000ᵃ |
| | Residual | 9.253 | 52 | 0.178 | | |
| | Total | 221.484 | 53 | | | |
| Protein | Regression | 58.207 | 1 | 58.207 | 1173.284 | .000ᵃ |
| | Residual | 2.580 | 52 | 0.050 | | |
| | Total | 60.787 | 53 | | | |

- ᵃ Dependent variables: predicted whole-grain moisture, crushed-sample moisture, crude fat, and protein content.
- ᵇ Predictors: (constant), measured crushed-sample moisture, crude fat, and protein content.
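The F statistics in Table 7 and the R2 and R values in Table 8 follow directly from the sums of squares: F = (SSreg/1)/(SSres/52) and R2 = SSreg/(SSreg + SSres). The sketch below recomputes them from the Table 7 entries; small differences from the reported F values come from rounding of the tabulated sums of squares. `simple_regression_anova` is a hypothetical helper showing how the sums of squares arise from paired measured and predicted values. A p value would additionally require the upper tail of the F(1, 52) distribution (for example `scipy.stats.f.sf`), which is omitted here to keep the sketch NumPy-only.

```python
import numpy as np

def anova_from_ss(ss_regression, ss_residual, df_regression=1, df_residual=52):
    """F statistic, R^2 and R for a regression ANOVA table."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    f_stat = ms_regression / ms_residual
    r2 = ss_regression / (ss_regression + ss_residual)
    return f_stat, r2, r2 ** 0.5

def simple_regression_anova(x, y):
    """Regression and residual sums of squares for the least-squares line y = a + b x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b, a = np.polyfit(x, y, 1)
    fitted = a + b * x
    ss_total = np.sum((y - y.mean()) ** 2)
    ss_residual = np.sum((y - fitted) ** 2)
    return ss_total - ss_residual, ss_residual

# Sums of squares reported in Table 7 (54 validation samples, df = 1 and 52).
table7 = {"moisture": (37.389, 1.301), "crude fat": (212.231, 9.253), "protein": (58.207, 2.580)}
for name, (ssr, sse) in table7.items():
    f_stat, r2, r = anova_from_ss(ssr, sse)
    print(f"{name}: F = {f_stat:.1f}, R2 = {r2:.3f}, R = {r:.3f}")
```

The recomputed R2 values (.966, .958, .958) match Table 8, confirming that R2 there is the share of total variance in the predictions explained by the reference values.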
External validation showed that, for the 54 soybean samples, the minimum, maximum, and mean deviations for the moisture content of the crushed grains were 0.004%, 0.349%, and 0.129%, respectively; the corresponding values were 0.006%, 0.941%, and 0.329% for crude fat and 0.012%, 0.607%, and 0.204% for protein. Established NIR models need to be verified, and external prediction samples are used to test their applicability. The R2 values of the soybean quality index models were close to 1, indicating a close fit between measured and predicted values. The Durbin–Watson test is commonly used to detect autocorrelation in regression residuals; the statistic lies between 0 and 4, and a value close to 2 indicates that the residuals are independent of each other. We used this test to assess the independence of the residuals between predicted and measured values in the soybean models. The calibration models were imported into the high-efficiency MB3600 FT-NIR spectrometer used in the experiment, and the 54 samples, comprising different soybean varieties from Northeast China and the Yangtze River Basin, were scanned to generate predicted moisture, crude fat, and protein contents. The statistics of the external validation are shown in Table 8, and correlations between the measured and predicted values are shown in Figure 4.
| Statistic | Moisture | Crude fat | Protein |
|---|---|---|---|
| Minimum deviation (%) | 0.004 | 0.006 | 0.012 |
| Maximum deviation (%) | 0.349 | 0.941 | 0.607 |
| Mean deviation (%) | 0.129 | 0.329 | 0.204 |
| R | .983 | .979 | .979 |
| R2 | .966 | .958 | .958 |
| Durbin–Watson | 1.803 | 1.838 | 1.923 |
| p value | .000 | .000 | .000 |
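The deviation statistics and the Durbin–Watson statistic in Table 8 can be computed as sketched below, with DW = Σ(eₜ − eₜ₋₁)² / Σeₜ². The measured and predicted values are hypothetical, not data from this study. Because DW compares successive residuals, its value depends on the order in which samples are listed; that order is not stated in the text.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences / sum of squares."""
    e = np.asarray(residuals, float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

def deviation_summary(measured, predicted):
    """Minimum, maximum and mean absolute deviation (percentage points)."""
    d = np.abs(np.asarray(predicted, float) - np.asarray(measured, float))
    return float(d.min()), float(d.max()), float(d.mean())

# Hypothetical measured vs. predicted protein values (% w/w).
measured = np.array([38.2, 39.5, 40.1, 41.0, 42.3])
predicted = np.array([38.4, 39.4, 40.4, 40.9, 42.2])
min_dev, max_dev, mean_dev = deviation_summary(measured, predicted)

# Independent (white-noise) residuals give a statistic near 2;
# strongly positively correlated residuals push it toward 0.
rng = np.random.default_rng(0)
dw_white = durbin_watson(rng.normal(size=1000))
dw_trend = durbin_watson(np.cumsum(rng.normal(size=1000)))
```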

Many studies have established good models for the rapid evaluation of crop quality. Heman et al. established a model to determine the moisture content of rice and obtained an R2 value of .920 in external validation (Heman & Hsieh, 2016). Fassio et al. established a model to determine the crude fat content of corn and obtained an R2 value of .900 in external validation (Fassio, Restaino, & Cozzolino, 2015). Mao et al. established a model to determine the protein content of wheat and obtained an R value of 0.975 in external validation (Mao, Sun, Hui, & Xu, 2014). In the present study, the external validation R2 values of the NIR models for moisture, crude fat, and protein content were .966, .958, and .958, respectively, and the Rcv values were .965, .949, and .941, respectively, demonstrating that the soybean moisture, crude fat, and protein models have good predictive value. The Durbin–Watson statistics for moisture, crude fat, and protein content of crushed soybean grains (1.803, 1.838, and 1.923) were close to 2, indicating that the model residuals are not autocorrelated. Additionally, the p values of the regressions were all reported as .000 (p < 0.001), well below the 0.05 significance level, showing that the model predictions are highly significant in explaining the reference values of the soybean samples. Taken together, the external validation results show that the predictive performance of the calibration models on the external validation set is reliable, indicating that the NIR calibration models for the moisture, crude fat, and protein content of crushed soybean are representative and have good predictive ability.
4 CONCLUSION
In this paper, we showed that achieving a uniform particle size by crushing and sieving provides a good solution to the poor reproducibility of prediction models for soybean quality indices caused by differences among samples of different varieties. We determined the optimal particle sizes for FT-NIRS models of the moisture, crude fat, and protein content of soybeans. External validation of the calibration models with soybean samples from Northeast China and the Yangtze River Basin showed that models based on a uniform particle size had good predictive ability for soybean samples of different varieties, regions, and sizes. The models showed high prediction accuracy and reproducibility for soybean moisture, crude fat, and protein content, with external validation R2 values of .966, .958, and .958, respectively. Both internal cross-validation and external validation were performed, and the predictive performance of the calibration models on the external validation set was found to be reliable, indicating that NIRS models for the main soybean components are feasible and can be used for the rapid determination of soybean composition.
ACKNOWLEDGMENTS
This work was financially supported by the National Key Research and Development Plan (2017YFD0401404), the National Natural Science Foundation of China (Grant 31601547), the Fundamental Research Funds for the Provincial Universities (Grant 16KJB550003), the Public Welfare Science Research of Grain Industry (201313007), and the National Science and Technology Support Program (2014BAD04B10).
CONFLICT OF INTEREST
There was no conflict of interest.