Feature selection is one of the most important concepts in data mining when dimensionality reduction is needed. Its performance is measured by predictive accuracy and by the comprehensibility of the result. Consistency-based methods are a significant category of feature selection research: by applying the parsimony principle, they substantially improve the comprehensibility of the result. In this work, the biobjective version of the Logical Analysis of Inconsistent Data (LAID) algorithm is applied to large volumes of data. To handle hundreds of thousands of attributes, a heuristic decomposition uses parallel processing to solve a set covering problem, combined with a cross-validation technique. Each biobjective solution pairs the size of the reduced feature set with the resulting accuracy. The algorithm is applied to omics datasets with genome-like characteristics from patients with rare diseases.
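To illustrate how consistency-based feature selection can be cast as a set covering problem, the Python sketch below assumes discrete (for example, binary) attributes and builds one covering row per pair of observations from different classes: the row lists the attributes on which the two observations differ. Any feature subset that hits every row keeps the classes distinguishable, and a greedy rule in the style of Chvátal's set covering heuristic (unit costs) picks such a subset. A small Pareto filter then keeps the non-dominated (number of features, accuracy) pairs, mirroring the two objectives named above. The function names `disjoint_rows`, `greedy_set_cover`, and `pareto_front` are hypothetical. Skipping inconsistent pairs (identical values, different classes) is an assumption. The sketch does not reproduce the LAID implementation, its heuristic decomposition, its parallel processing, or its cross-validation.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def disjoint_rows(X: Sequence[Sequence[int]], y: Sequence[int]) -> List[Set[int]]:
    """Hypothetical helper: for every pair of observations with different
    class labels, collect the attribute indices on which the pair differs.

    Inconsistent pairs (same values, different class) differ on no attribute,
    so no subset can separate them; this sketch simply skips them (assumption).
    """
    rows: List[Set[int]] = []
    for i, j in combinations(range(len(X)), 2):
        if y[i] == y[j]:
            continue
        differing = {k for k, (a, b) in enumerate(zip(X[i], X[j])) if a != b}
        if differing:
            rows.append(differing)
    return rows


def greedy_set_cover(rows: List[Set[int]]) -> List[int]:
    """Greedy heuristic with unit costs: repeatedly choose the attribute that
    covers the most still-uncovered rows. Rows are assumed non-empty."""
    uncovered = list(range(len(rows)))
    chosen: List[int] = []
    while uncovered:
        counts: Dict[int, int] = {}
        for r in uncovered:
            for k in rows[r]:
                counts[k] = counts.get(k, 0) + 1
        # Ties go to the smallest attribute index, so the result is deterministic.
        best = min(counts, key=lambda k: (-counts[k], k))
        chosen.append(best)
        uncovered = [r for r in uncovered if best not in rows[r]]
    return sorted(chosen)


def pareto_front(points: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Keep the (number_of_features, accuracy) pairs that no other pair
    dominates, i.e. none has fewer-or-equal features and higher-or-equal
    accuracy with at least one strict improvement."""
    front = []
    for n, acc in points:
        dominated = any(
            n2 <= n and a2 >= acc and (n2 < n or a2 > acc) for n2, a2 in points
        )
        if not dominated:
            front.append((n, acc))
    return sorted(set(front))


if __name__ == "__main__":
    # Toy data (illustrative only): attribute 1 alone separates the classes.
    X = [[0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]
    y = [0, 0, 1, 1]
    print(greedy_set_cover(disjoint_rows(X, y)))
    # Made-up (features, accuracy) pairs, not results from any experiment.
    print(pareto_front([(1, 0.80), (2, 0.85), (3, 0.85), (2, 0.70)]))
```

On the toy data the greedy cover selects only attribute 1, the single column whose values match the class labels. Building every pairwise row is quadratic in the number of observations, which is one reason a decomposition strategy becomes attractive once the matrix grows to hundreds of thousands of attributes.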

Volume 35, Issue 4

Fourth special issue on knowledge discovery and business intelligence

August 2018

e12301

A biobjective feature selection algorithm for large omics datasets