Concurrency and Computation: Practice and Experience

Volume 29, Issue 23 e3929

SPECIAL ISSUE PAPER

Multilayer hybrid strategy for phishing email zero-day filtering

M. U. Chowdhury
School of Information Technology, Deakin University, Locked Bag 20000, Geelong, 3220 Vic, Australia

J. H. Abawajy
School of Information Technology, Deakin University, Locked Bag 20000, Geelong, 3220 Vic, Australia

A. V. Kelarev (Corresponding Author)
School of Information Technology, Deakin University, Locked Bag 20000, Geelong, 3220 Vic, Australia
Correspondence to: Andrei V. Kelarev, School of Information Technology, Deakin University, 221 Burwood Hwy, Melbourne, Vic 3125, Australia.
E-mail: [email protected]

T. Hochin
Division of Information Science, Graduate School of Science and Technology, Kyoto Institute of Technology, Kyoto, Japan

First published: 22 July 2016

https://doi.org/10.1002/cpe.3929

Citations: 15

Summary

The cyber security threat posed by phishing emails has been growing, buoyed by the capacity of their distributors to fine-tune their trickery and defeat previously known filtering techniques. The detection of novel phishing emails that have not appeared previously, also known as zero-day phishing emails, remains a particular challenge. This paper proposes a multilayer hybrid strategy (MHS) for zero-day filtering of phishing emails that appear during one time span, using training data collected previously during another time span. The strategy creates a large ensemble of classifiers and then applies a novel method for pruning the ensemble. Most known pruning algorithms fall into three categories: ranking-based, clustering-based, and optimization-based pruning. This paper introduces and investigates multilayer hybrid pruning, whose application in MHS combines all three approaches (ranking, clustering, and optimization) in one scheme. Furthermore, we carry out a thorough empirical study of the performance of MHS for filtering phishing emails, comparing it with other machine learning classifiers. The results demonstrate that MHS achieved the best outcomes and that multilayer hybrid pruning performed better than other pruning techniques. Copyright © 2016 John Wiley & Sons, Ltd.
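
To make the idea of layering ranking, clustering, and optimization in one pruning scheme more concrete, the following Python sketch prunes a toy ensemble from each member's 0/1 predictions on a validation set. It is not the MHS algorithm itself, whose details are not given in this summary. The function names (`prune_ensemble`, `majority_vote`, and so on), the parameters `keep_top` and `max_disagreement`, the leader-style clustering, and the exhaustive subset search are all assumptions chosen for illustration.

```python
from itertools import combinations
from typing import List, Sequence


def accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of positions where the prediction matches the label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions where two members predict differently."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def majority_vote(preds: Sequence[Sequence[int]]) -> List[int]:
    """Combine 0/1 predictions; ties resolve to class 1 (assumed to mean phishing)."""
    n_members = len(preds)
    return [1 if 2 * sum(column) >= n_members else 0 for column in zip(*preds)]


def prune_ensemble(
    preds: Sequence[Sequence[int]],
    truth: Sequence[int],
    keep_top: int = 4,              # hypothetical parameter
    max_disagreement: float = 0.2,  # hypothetical parameter
) -> List[int]:
    """Return indices of the ensemble members kept after three pruning layers."""
    # Layer 1 (ranking): keep the keep_top members with the best validation accuracy.
    ranked = sorted(
        range(len(preds)),
        key=lambda i: accuracy(preds[i], truth),
        reverse=True,
    )[:keep_top]

    # Layer 2 (clustering): greedy leader clustering on prediction disagreement.
    # A member close to an existing leader joins that cluster and is dropped,
    # so each cluster is represented by its best-ranked member.
    leaders: List[int] = []
    for i in ranked:
        if all(disagreement(preds[i], preds[j]) > max_disagreement for j in leaders):
            leaders.append(i)

    # Layer 3 (optimization): exhaustive search over subsets of the leaders for
    # the best majority-vote accuracy; smaller subsets win ties.
    best_subset: List[int] = []
    best_score = -1.0
    for size in range(1, len(leaders) + 1):
        for subset in combinations(leaders, size):
            score = accuracy(majority_vote([preds[i] for i in subset]), truth)
            if score > best_score:
                best_subset, best_score = list(subset), score
    return best_subset
```

The exhaustive search in the last layer is only practical because the first two layers have already shrunk the candidate set. For a large ensemble, a heuristic search would be the more realistic choice.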

Combined Special issues on Applications and techniques in information and network security (CSTA2015) and International conference on innovative network systems and applications held under the federated conference on computer science and information systems (FedCSis‐INetSApp2015)

10 December 2017
