Mining frequent itemsets is a flourishing research area. Many papers on this topic have been published and even some contests have been held. Most papers focus on speed and introduce ad hoc algorithms and data structures. In this paper we put most of the algorithms in one framework, using classical Operations Research paradigms such as backtracking, depth-first and breadth-first search and branch-and-bound. Moreover, we present experimental results where the different algorithms are implemented under similar designs.

References

Agrawal, R. and R. Srikant (1994), Fast algorithms for mining association rules, in: Proceedings 20th International Conference on Very Large Data Bases (VLDB’94), Morgan Kaufmann, 487–499.
Google Scholar
Agrawal, R., H. Mannila, R. Srikant, H. Toivonen and A. Verkamo (1996), Fast discovery of association rules, in: Advances in knowledge discovery and data mining, AAAI/MIT Press, 307–328.
Google Scholar
Bayardo, R., B. Goethals and M. J. Zaki (eds) (2004), Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’04), CEUR Workshop Proceedings [online], http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-126/; accessed 10 February 2010.
Web of Science® Google Scholar
Bayardo, R. (1998), Efficiently mining long patterns from databases, in: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data (SIGMOD’98), 89–93.
Google Scholar
Bodon, F. (2006), A survey on frequent itemset mining, Tech. rep., Budapest University of Technology and Economics.
Google Scholar
Borgelt, C. (2005), Keeping things simple: finding frequent item sets by recursive elimination, in: B. Goethals, S. Rÿosenand M. Zaki(eds), Procedings of the First International Workshop on open Source Data Missing: Frequent Pattern Mining Implementations, ACM.pp. 66–70.
Google Scholar
Boros, E., P. Hammer, T. Ibaraki, A. Kogan, E. Mayoraz and I. Muchnik (2000), An implementation of logical analysis of data, IEEE Transactions on Knowledge and Data Engineering 12, 292–306.
10.1109/69.842268
Web of Science® Google Scholar
Briandais, R. (1959), File searching using variable length keys, in: Proceedings Western Joint Computer Conference, 295–298.
Google Scholar
Brijs, T., G. Swinnen, K. Vanhoof and G. Wets (1999), Using association rules for product assortment decisions: a case study, in: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 1999), 254–260.
10.1145/312129.312241
Google Scholar
Brin, S., R. Motwani, J. Ullman and S. Tsur (1997), Dynamic itemset counting and implication rules for market basket data, in: Proceedings of the 1997 ACM SIGMOD International Conference (SIGMOD’97), 255–264.
Google Scholar
Brin, S., R. Motwani and C. Silverstein (1998), Beyond market baskets: generalizing association rules to correlations, Data Mining and Knowledge Discovery 2, 39–68.
10.1023/A:1009713703947
Web of Science® Google Scholar
Fredkin, E. (1960), Trie memory, Communications of the ACM 3, 490–499.
10.1145/367390.367400
Web of Science® Google Scholar
Garey, M. and D. Johnson (1979), Computers and intractability, a guide to the theory of NP-completeness, Freeman, New York
Google Scholar
Geurts, K. (2003), Traffic accidents data set [online], http://fimi.cs.helsinki.fi/data/accidents.pdf; accessed 10 February 2010.
Google Scholar
Geurts, K., G. Wets, T. Brijs and K. Vanhoof (2003), Profiling of high frequency accident locations by use of association rules, in: Proceedings of the 82nd Annual Transportation Research Board, US, 123–130.
Web of Science® Google Scholar
Goethals, B. (2003), Survey on frequent pattern mining, [online], http://www.adrem.ua.ac.be/~goethals/software/survey.pdf; accessed 10 February 2010.
Google Scholar
Goethals, B. and M. Zaki (eds) (2003), Proceedings of the First IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’03), CEUR Workshop Proceedings [online], http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-90/; accessed 10 February 2010.
Google Scholar
Golomb, S. W. and L. D. Baumert (1965), Backtrack programming, Journal of the ACM 12, 516–524.
10.1145/321296.321300
Web of Science® Google Scholar
Goodrich, M. and R. Tamassia (2005), Data structures and algorithms in Java, 4th edn, John Wiley and Sons, London.
Google Scholar
Gouda, K. and M. Zaki (2005), Genmax: an efficient algorithm for mining maximal frequent itemsets, Data Mining and Knowledge Discovery 11, 1–20.
10.1007/s10618-005-0002-x
Web of Science® Google Scholar
Grahne, G. and J. Zhu (2003), Efficiently using prefix-trees in mining frequent itemsets, in: B. Goethals and M. Zaki (eds), Proceedings of the First IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’03), CEUR Workshop Proceedings [online], http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-90/; accessed 10 February 2010.
Web of Science® Google Scholar
Gunopulos, D., R. Khardon, H. Mannila and H. Toivonen (1997), Data mining, hypergraph transversals, and machine learning, in: Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS’97), 209–216.
10.1145/263661.263684
Google Scholar
Gunopulos, D., R. Khardon, H. Mannila, S. Sajula, H. Toivonen and R. Sharm (2003), Discovering all most specfic sentences, ACM Transactions on Database Systems 28, 140–174.
10.1145/777943.777945
Web of Science® Google Scholar
Han, J., J. Pei and Y. Yin (2000), Mining frequent patterns without candidate generation, in: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD’00), 1–12.
10.1145/342009.335372
Google Scholar
Han, J., J. Pei, Y. Yin and R. Mao (2004), Mining frequent patterns without candidate generation: a frequent-pattern tree approach, Data Mining and Knowledge Discovery 8, 53–87.
10.1023/B:DAMI.0000005258.31418.83
Web of Science® Google Scholar
Hu, K., Y. Lu, L. Zhou and C. Shi (1999), Integrating classification and association rule mining: a concept lattice framework, in: Proceedings of the Seventh International Workshop on New Directions in Rough Sets, Data Mining and Granular Soft Computing, Springer, 443–447.
10.1007/978-3-540-48061-7_53
Google Scholar
Huhtala, Y., J. Kärkkäinen, P. Porkka and H. Toivonen (1999), TANE: an efficient algorithm for discovering functional and approximate dependencies, The Computer Journal 42, 100–111.
10.1093/comjnl/42.2.100
Web of Science® Google Scholar
Kosters, W. and W. Pijls (2003), Apriori: a depth-first implementation, In: B. Goethals and M. Zaki (eds), Proceedings of the First IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’03), CEUR Workshop Proceedings [online], http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-90/; accessed 10 February 2010.
Google Scholar
Kosters, W. A., E. Marchiori and A. J. Oerlemans (1999), Mining clusters with association rules, in: Proceedings of the Third International Symposium on Advances in Intelligent Data Analysis (IDA’99), Springer Lecture Notes in Computer Science 1642, 39–50.
10.1007/3-540-48412-4_4
Web of Science® Google Scholar
Liu, J., Y. Pan, K. Wang and J. Han (2002), Mining frequent item sets by opportunistic projection, in: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’02), 229–238.
10.1145/775047.775081
Google Scholar
Liu, G., H. Lu, J. Yu, W. Wei and X. Xiao (2003), AFOPT: an efficient implementation of pattern growth approach, in: B. Goethals and M. Zaki (eds), Proceedings of the First IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’03), CEUR Workshop Proceedings [online], http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-90/; accessed 10 February 2010.
Google Scholar
Liu, G., H. Lu, W. Lou, Y. Xu and J. Yu (2004), Efficiently mining of frequent patterns using ascending frequency ordered prefix-tree, Data Mining and Knowledge Discovery 9, 249–274.
10.1023/B:DAMI.0000040905.52966.1a
Web of Science® Google Scholar
Mannila, H., H. Toivonen and A. Verkamo (1995), Discovering frequent episodes in sequences, in: Proceedings of the First International Conference on Knowledge Discovery and Data Mining, 210–215.
Google Scholar
Papadimitriou, C. H. (1994), Computational complexity, Addison Wesley, New York.
Google Scholar
Pawlak, Z. (1991), Rough sets, theoretical aspects of reasoning about data, Kluwer Academic Publishers, Dordrecht, Netherlands.
Google Scholar
Pei, J., J. Han, H. Lu, S. Nishio, S. Tang and D. Yang (2001), H-mine: Hyper-structure mining of frequent patterns in large databases, in: Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM 2001), 441–448.
Web of Science® Google Scholar
Pijls, W. and R. Potharst (2000), Classification and target group selection based upon frequent patterns, in: Proceedings Twelfth Belgium-Netherlands Artificial Intelligence Conference (BNAIC’00), 125–132.
Google Scholar
Rácz, B. (2004), nonordfp: an FP-growth variation without rebuilding the FP-tree, in: R. Bayardo B. Goethals and M.J. Zaki (eds), Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’04), CEUR Workshop Proceedings [online], http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-126/; accessed 10 February 2010.
Google Scholar
Rácz, B., F. Bodon and L. Schmidt-Thieme (2005), On benchmarking frequent itemset mining algorithms: from measurement to analysis, in: B. Goethals, S. Rÿosenand M. Zaki (eds), Proceedings of the First International Workshop on open Source Data Mining: Frequent Pattern Mining Implementations, ACM. pp. 36–45.
10.1145/1133905.1133911
Google Scholar
Rymon, R. (1992), Search through systematic set enumeration, in: Proceedings of the Third International Conference on Principles of Knowledge Representation and Reasoning, 539–550.
Web of Science® Google Scholar
Savarese, A., E. Omiecinsky and S. Navathe (1995), An efficient algorithm for mining association rules in large databases, in: Proceedings of the 21st International Conference on Very Large Databases (VLDB’95), 432–443.
Web of Science® Google Scholar
Tan, P.-N., M. Steinbach and V. Kumar (2006), Introduction to data mining, Addison Wesley, New York.
Google Scholar
Uno, T. and K. Satoh (2003), Detailed description of an algorithm for enumeration of maximal frequent sets with irredundant dualization, in: B. Goethals and M. Zaki (eds), Proceedings of the First IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’03), CEUR Workshop Proceedings [online], http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-90/; accessed 10 February 2010.
Google Scholar
Uno, T., T. Asai, Y. Uchida and H. Arimura (2004), LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets, in: R. Bayardo, B. Goethals and M.J. Zaki (eds), Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’04), CEUR Workshop Pro-ceedings [online], http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-126/; accessed 10 February 2010.
Google Scholar
Valiant, L. (1979), The complexity of computing the permanent, Theoretical Computer Science 8, 189–201.
10.1016/0304-3975(79)90044-6
Google Scholar
Wang, K., L. Tang, J. Han and J. Liu (2002), Top-down FP-growth for association rule mining, in: Advances in Knowledge Discovery and Data Mining, Proceedings of the Sixth Pacific-Asia Conference (PAKDD 2002), Springer Lecture Notes in Artificial Intelligence 2336. 334–340.
Google Scholar
Wu, C. (2006), Applying frequent itemset mining to identify a small itemset that satisfies a large percentage of orders in a warehouse, Computers and Operations Research 33, 3161–3170.
10.1016/j.cor.2005.01.026
Web of Science® Google Scholar
Yang, G. (2004), The complexity of mining maximal frequent itemsets and maximal frequent patterns, in: Proceedings of the Tenth ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2004), 344–353.
10.1145/1014052.1014091
Google Scholar
Zaki, M. (2000), Scalable algorithms for association mining, IEEE Transactions on Knowledge and Data Engineering 12, 372–390.
10.1109/69.846291
Web of Science® Google Scholar
Zaki, M. and K. Gouda (2003), Fast vertical mining using diffsets, in: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2003), 326–335.
10.1145/956750.956788
Google Scholar

Citing Literature

Volume64, Issue4

November 2010

Pages 367-387

Mining frequent itemsets: a perspective from operations research

Abstract

References

Citing Literature

References

Information

About Wiley Online Library

Help & Support

Opportunities

Connect with Wiley

Mining frequent itemsets: a perspective from operations research

Abstract

References

Citing Literature

References

Related

Information