Multilayer hybrid strategy for phishing email zero-day filtering
M. U. Chowdhury
School of Information Technology, Deakin University, Locked Bag 20000, Geelong, 3220 Vic, Australia
J. H. Abawajy
School of Information Technology, Deakin University, Locked Bag 20000, Geelong, 3220 Vic, Australia
A. V. Kelarev (corresponding author)
School of Information Technology, Deakin University, Locked Bag 20000, Geelong, 3220 Vic, Australia
Correspondence to: Andrei V. Kelarev, School of Information Technology, Deakin University, 221 Burwood Hwy, Melbourne, Vic 3125, Australia.
E-mail: [email protected]
T. Hochin
Division of Information Science, Graduate School of Science and Technology, Kyoto Institute of Technology, Kyoto, Japan
Summary
The cyber security threat from phishing emails has been growing, buoyed by the capacity of their distributors to fine-tune their trickery and defeat previously known filtering techniques. The detection of novel phishing emails that have not appeared previously, also known as zero-day phishing emails, remains a particular challenge. This paper proposes a multilayer hybrid strategy (MHS) for zero-day filtering of phishing emails: the filter is trained on data collected during one time span and applied to emails that appear during a separate, later time span. This strategy creates a large ensemble of classifiers and then applies a novel method for pruning the ensemble. Most known pruning algorithms fall into three categories: ranking-based, clustering-based, and optimization-based pruning. This paper introduces and investigates multilayer hybrid pruning, whose application in MHS combines all three approaches (ranking, clustering, and optimization) in one scheme. Furthermore, we carry out a thorough empirical study of the performance of MHS for the filtering of phishing emails, comparing it with other machine learning classifiers. The results demonstrate that MHS achieved the best outcomes and that multilayer hybrid pruning performed better than other pruning techniques. Copyright © 2016 John Wiley & Sons, Ltd.
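The summary does not spell out how the three pruning layers are chained, so the following is only a minimal sketch of one plausible arrangement, not the authors' algorithm. It assumes binary predictions (1 = phishing) from each base classifier on a held-out validation set. The function names, the `keep` and `max_disagreement` parameters, the tie-breaking rule, and the toy data are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical input format: classifier name -> 0/1 predictions on validation emails.
Predictions = Dict[str, List[int]]


def accuracy(pred: Sequence[int], target: Sequence[int]) -> float:
    """Fraction of positions where two 0/1 sequences agree."""
    return sum(p == t for p, t in zip(pred, target)) / len(target)


def majority_vote(members: Sequence[str], preds: Predictions) -> List[int]:
    """Combine members by majority vote; ties count as phishing (an assumption)."""
    n = len(preds[members[0]])
    return [int(2 * sum(preds[m][i] for m in members) >= len(members))
            for i in range(n)]


def rank_layer(preds: Predictions, labels: Sequence[int], keep: int) -> List[str]:
    """Ranking: keep the `keep` classifiers with the best individual accuracy."""
    ranked = sorted(preds, key=lambda m: accuracy(preds[m], labels), reverse=True)
    return ranked[:keep]


def cluster_layer(ranked: Sequence[str], preds: Predictions,
                  max_disagreement: float) -> List[str]:
    """Clustering: drop a classifier if it behaves almost like a better-ranked one."""
    representatives: List[str] = []
    for m in ranked:
        if all(1 - accuracy(preds[m], preds[r]) > max_disagreement
               for r in representatives):
            representatives.append(m)
    return representatives


def optimize_layer(candidates: Sequence[str], preds: Predictions,
                   labels: Sequence[int]) -> List[str]:
    """Optimization: greedy forward selection on majority-vote accuracy."""
    selected: List[str] = []
    best = -1.0
    improved = True
    while improved:
        improved = False
        for m in candidates:
            if m in selected:
                continue
            score = accuracy(majority_vote(selected + [m], preds), labels)
            if score > best:
                best, choice, improved = score, m, True
        if improved:
            selected.append(choice)
    return selected


def multilayer_prune(preds: Predictions, labels: Sequence[int],
                     keep: int, max_disagreement: float) -> List[str]:
    """Chain the three layers: ranking, then clustering, then optimization."""
    ranked = rank_layer(preds, labels, keep)
    diverse = cluster_layer(ranked, preds, max_disagreement)
    return optimize_layer(diverse, preds, labels)


# Toy validation data, invented for this example only.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
preds = {
    "c1": [1, 1, 1, 0, 0, 0, 0, 0],
    "c2": [1, 1, 1, 0, 0, 0, 0, 0],  # behaves exactly like c1
    "c3": [0, 1, 1, 1, 0, 0, 0, 0],
    "c4": [1, 1, 0, 1, 0, 0, 0, 1],
    "c5": [0, 0, 1, 0, 1, 1, 0, 0],  # weak classifier
}
selected = multilayer_prune(preds, labels, keep=4, max_disagreement=0.2)
print(selected)  # ['c1', 'c3']
```

On this toy data the ranking layer discards the weak `c5`, the clustering layer discards `c2` because it duplicates `c1`, and the greedy search stops once `c1` and `c3` together classify every validation email correctly. In a zero-day setting like the one described above, the validation emails would need to come from the earlier time span only, so that the later emails remain unseen.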
10 December 2017
e3929