Feature selection based on multiview entropy measures in multiperspective rough set
Jiucheng Xu
Engineering Lab of Intelligence Business & Internet of Things, Henan Province, Xinxiang, China
College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
Corresponding Author
Kanglin Qu
Engineering Lab of Intelligence Business & Internet of Things, Henan Province, Xinxiang, China
College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
Correspondence: Kanglin Qu, College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China.
Email: [email protected]
Xiangru Meng
Engineering Lab of Intelligence Business & Internet of Things, Henan Province, Xinxiang, China
College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
Yuanhao Sun
Engineering Lab of Intelligence Business & Internet of Things, Henan Province, Xinxiang, China
College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
Qincheng Hou
Engineering Lab of Intelligence Business & Internet of Things, Henan Province, Xinxiang, China
College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
Abstract
The performance of the neighborhood rough set model in feature selection is limited by three problems: a nonobjective parameter selection method, uncertainty measures that consider only a single view, and the high time cost of processing high-dimensional data. To address these problems, this study first defines the interclass boundary to granulate the samples in different classes and, based on different cognitive perspectives, puts forward three types of neighborhood concepts (negative perspective, neutral perspective, and positive perspective). On this basis, the multiperspective rough set model is developed. The most prominent feature of this model is that it discovers the differences between classes from the given data without any parameters. Second, by integrating the information theory view and the algebraic view under the multiperspective rough set model, multiview entropy measures are proposed to measure the uncertainty in data effectively. Moreover, a nonmonotonic feature selection algorithm is designed that uses the mutual information in the multiview entropy measures under the neutral perspective as the evaluation function of feature importance, which resolves the disadvantages of algorithms based on a monotone evaluation function. Finally, Information Gain is introduced to preliminarily reduce the dimension of high-dimensional data sets, which improves classification accuracy and reduces time consumption. The experimental results confirm that the proposed algorithm is efficient in eliminating noise and increasing classification accuracy.
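The Information Gain prefiltering step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how features are discretized or how many are retained, so this sketch assumes already-discrete feature values and a hypothetical `keep` parameter; the function names are illustrative and are not taken from the paper. It uses the standard definition IG(D; a) = H(D) - H(D | a) with Shannon entropy.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(D; a) = H(D) - H(D | a) for one discrete feature a."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy: entropy of each value group, weighted by its size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def prefilter_by_information_gain(rows, labels, keep):
    """Return the indices of the `keep` highest-gain features, plus all gains.

    `keep` is a hypothetical parameter; the abstract does not state a cutoff.
    """
    n_features = len(rows[0])
    gains = [
        information_gain([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return ranked[:keep], gains


# Toy data: feature 0 determines the class, feature 1 is constant,
# and feature 2 is independent of the class.
rows = [
    (0, 5, 0),
    (0, 5, 1),
    (1, 5, 0),
    (1, 5, 1),
]
labels = ["a", "a", "b", "b"]
selected, gains = prefilter_by_information_gain(rows, labels, keep=1)
```

In this toy example only feature 0 carries class information, so it is the one retained. Continuous data such as gene expression values would need to be discretized first, or scored with a different estimator, before a ranking like this applies.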
CONFLICTS OF INTEREST
The authors declare no conflicts of interest.