Mining frequent itemsets: a perspective from operations research
Wim Pijls
Econometric Institute, Erasmus University, PO Box 1738, Rotterdam 3000 DR, The Netherlands
Search for more papers by this authorWalter A. Kosters
Leiden Institute of Advanced Computer Science, Leiden University, P.O. Box g512, 2300 RA Leiden, The Netherlands
Search for more papers by this authorWim Pijls
Econometric Institute, Erasmus University, PO Box 1738, Rotterdam 3000 DR, The Netherlands
Search for more papers by this authorWalter A. Kosters
Leiden Institute of Advanced Computer Science, Leiden University, P.O. Box g512, 2300 RA Leiden, The Netherlands
Search for more papers by this authorAbstract
Mining frequent itemsets is a flourishing research area. Many papers on this topic have been published and even some contests have been held. Most papers focus on speed and introduce ad hoc algorithms and data structures. In this paper we put most of the algorithms in one framework, using classical Operations Research paradigms such as backtracking, depth-first and breadth-first search and branch-and-bound. Moreover, we present experimental results where the different algorithms are implemented under similar designs.
References
- Agrawal, R. and R. Srikant (1994), Fast algorithms for mining association rules, in: Proceedings 20th International Conference on Very Large Data Bases (VLDB’94), Morgan Kaufmann, 487–499.
- Agrawal, R., H. Mannila, R. Srikant, H. Toivonen and A. Verkamo (1996), Fast discovery of association rules, in: Advances in knowledge discovery and data mining, AAAI/MIT Press, 307–328.
- Bayardo, R., B. Goethals and M. J. Zaki (eds) (2004), Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’04), CEUR Workshop Proceedings [online], http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-126/; accessed 10 February 2010.
- Bayardo, R. (1998), Efficiently mining long patterns from databases, in: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data (SIGMOD’98), 89–93.
- Bodon, F. (2006), A survey on frequent itemset mining, Tech. rep., Budapest University of Technology and Economics.
- Borgelt, C. (2005), Keeping things simple: finding frequent item sets by recursive elimination, in: B. Goethals, S. Rÿosenand M. Zaki(eds), Procedings of the First International Workshop on open Source Data Missing: Frequent Pattern Mining Implementations, ACM.pp. 66–70.
- Boros, E., P. Hammer, T. Ibaraki, A. Kogan, E. Mayoraz and I. Muchnik (2000), An implementation of logical analysis of data, IEEE Transactions on Knowledge and Data Engineering 12, 292–306.
- Briandais, R. (1959), File searching using variable length keys, in: Proceedings Western Joint Computer Conference, 295–298.
-
Brijs, T.,
G. Swinnen,
K. Vanhoof and
G. Wets (1999), Using association rules for product assortment decisions: a case study, in: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 1999), 254–260.
10.1145/312129.312241 Google Scholar
- Brin, S., R. Motwani, J. Ullman and S. Tsur (1997), Dynamic itemset counting and implication rules for market basket data, in: Proceedings of the 1997 ACM SIGMOD International Conference (SIGMOD’97), 255–264.
- Brin, S., R. Motwani and C. Silverstein (1998), Beyond market baskets: generalizing association rules to correlations, Data Mining and Knowledge Discovery 2, 39–68.
- Fredkin, E. (1960), Trie memory, Communications of the ACM 3, 490–499.
- Garey, M. and D. Johnson (1979), Computers and intractability, a guide to the theory of NP-completeness, Freeman, New York
- Geurts, K. (2003), Traffic accidents data set [online], http://fimi.cs.helsinki.fi/data/accidents.pdf; accessed 10 February 2010.
- Geurts, K., G. Wets, T. Brijs and K. Vanhoof (2003), Profiling of high frequency accident locations by use of association rules, in: Proceedings of the 82nd Annual Transportation Research Board, US, 123–130.
- Goethals, B. (2003), Survey on frequent pattern mining, [online], http://www.adrem.ua.ac.be/~goethals/software/survey.pdf; accessed 10 February 2010.
- Goethals, B. and M. Zaki (eds) (2003), Proceedings of the First IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’03), CEUR Workshop Proceedings [online], http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-90/; accessed 10 February 2010.
- Golomb, S. W. and L. D. Baumert (1965), Backtrack programming, Journal of the ACM 12, 516–524.
- Goodrich, M. and R. Tamassia (2005), Data structures and algorithms in Java, 4th edn, John Wiley and Sons, London.
- Gouda, K. and M. Zaki (2005), Genmax: an efficient algorithm for mining maximal frequent itemsets, Data Mining and Knowledge Discovery 11, 1–20.
- Grahne, G. and J. Zhu (2003), Efficiently using prefix-trees in mining frequent itemsets, in: B. Goethals and M. Zaki (eds), Proceedings of the First IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’03), CEUR Workshop Proceedings [online], http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-90/; accessed 10 February 2010.
-
Gunopulos, D.,
R. Khardon,
H. Mannila and
H. Toivonen (1997), Data mining, hypergraph transversals, and machine learning, in: Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS’97), 209–216.
10.1145/263661.263684 Google Scholar
- Gunopulos, D., R. Khardon, H. Mannila, S. Sajula, H. Toivonen and R. Sharm (2003), Discovering all most specfic sentences, ACM Transactions on Database Systems 28, 140–174.
-
Han, J.,
J. Pei and
Y. Yin (2000), Mining frequent patterns without candidate generation, in: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD’00), 1–12.
10.1145/342009.335372 Google Scholar
- Han, J., J. Pei, Y. Yin and R. Mao (2004), Mining frequent patterns without candidate generation: a frequent-pattern tree approach, Data Mining and Knowledge Discovery 8, 53–87.
-
Hu, K.,
Y. Lu,
L. Zhou and
C. Shi (1999), Integrating classification and association rule mining: a concept lattice framework, in: Proceedings of the Seventh International Workshop on New Directions in Rough Sets, Data Mining and Granular Soft Computing, Springer, 443–447.
10.1007/978-3-540-48061-7_53 Google Scholar
- Huhtala, Y., J. Kärkkäinen, P. Porkka and H. Toivonen (1999), TANE: an efficient algorithm for discovering functional and approximate dependencies, The Computer Journal 42, 100–111.
- Kosters, W. and W. Pijls (2003), Apriori: a depth-first implementation, In: B. Goethals and M. Zaki (eds), Proceedings of the First IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’03), CEUR Workshop Proceedings [online], http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-90/; accessed 10 February 2010.
- Kosters, W. A., E. Marchiori and A. J. Oerlemans (1999), Mining clusters with association rules, in: Proceedings of the Third International Symposium on Advances in Intelligent Data Analysis (IDA’99), Springer Lecture Notes in Computer Science 1642, 39–50.
-
Liu, J.,
Y. Pan,
K. Wang and
J. Han (2002), Mining frequent item sets by opportunistic projection, in: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’02), 229–238.
10.1145/775047.775081 Google Scholar
- Liu, G., H. Lu, J. Yu, W. Wei and X. Xiao (2003), AFOPT: an efficient implementation of pattern growth approach, in: B. Goethals and M. Zaki (eds), Proceedings of the First IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’03), CEUR Workshop Proceedings [online], http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-90/; accessed 10 February 2010.
- Liu, G., H. Lu, W. Lou, Y. Xu and J. Yu (2004), Efficiently mining of frequent patterns using ascending frequency ordered prefix-tree, Data Mining and Knowledge Discovery 9, 249–274.
- Mannila, H., H. Toivonen and A. Verkamo (1995), Discovering frequent episodes in sequences, in: Proceedings of the First International Conference on Knowledge Discovery and Data Mining, 210–215.
- Papadimitriou, C. H. (1994), Computational complexity, Addison Wesley, New York.
- Pawlak, Z. (1991), Rough sets, theoretical aspects of reasoning about data, Kluwer Academic Publishers, Dordrecht, Netherlands.
- Pei, J., J. Han, H. Lu, S. Nishio, S. Tang and D. Yang (2001), H-mine: Hyper-structure mining of frequent patterns in large databases, in: Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM 2001), 441–448.
- Pijls, W. and R. Potharst (2000), Classification and target group selection based upon frequent patterns, in: Proceedings Twelfth Belgium-Netherlands Artificial Intelligence Conference (BNAIC’00), 125–132.
- Rácz, B. (2004), nonordfp: an FP-growth variation without rebuilding the FP-tree, in: R. Bayardo B. Goethals and M.J. Zaki (eds), Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’04), CEUR Workshop Proceedings [online], http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-126/; accessed 10 February 2010.
-
Rácz, B.,
F. Bodon and
L. Schmidt-Thieme (2005), On benchmarking frequent itemset mining algorithms: from measurement to analysis, in: B. Goethals, S. Rÿosenand M. Zaki (eds), Proceedings of the First International Workshop on open Source Data Mining: Frequent Pattern Mining Implementations, ACM. pp. 36–45.
10.1145/1133905.1133911 Google Scholar
- Rymon, R. (1992), Search through systematic set enumeration, in: Proceedings of the Third International Conference on Principles of Knowledge Representation and Reasoning, 539–550.
- Savarese, A., E. Omiecinsky and S. Navathe (1995), An efficient algorithm for mining association rules in large databases, in: Proceedings of the 21st International Conference on Very Large Databases (VLDB’95), 432–443.
- Tan, P.-N., M. Steinbach and V. Kumar (2006), Introduction to data mining, Addison Wesley, New York.
- Uno, T. and K. Satoh (2003), Detailed description of an algorithm for enumeration of maximal frequent sets with irredundant dualization, in: B. Goethals and M. Zaki (eds), Proceedings of the First IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’03), CEUR Workshop Proceedings [online], http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-90/; accessed 10 February 2010.
- Uno, T., T. Asai, Y. Uchida and H. Arimura (2004), LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets, in: R. Bayardo, B. Goethals and M.J. Zaki (eds), Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’04), CEUR Workshop Pro-ceedings [online], http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-126/; accessed 10 February 2010.
-
Valiant, L. (1979), The complexity of computing the permanent, Theoretical Computer Science
8, 189–201.
10.1016/0304-3975(79)90044-6 Google Scholar
- Wang, K., L. Tang, J. Han and J. Liu (2002), Top-down FP-growth for association rule mining, in: Advances in Knowledge Discovery and Data Mining, Proceedings of the Sixth Pacific-Asia Conference (PAKDD 2002), Springer Lecture Notes in Artificial Intelligence 2336. 334–340.
- Wu, C. (2006), Applying frequent itemset mining to identify a small itemset that satisfies a large percentage of orders in a warehouse, Computers and Operations Research 33, 3161–3170.
-
Yang, G. (2004), The complexity of mining maximal frequent itemsets and maximal frequent patterns, in: Proceedings of the Tenth ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2004), 344–353.
10.1145/1014052.1014091 Google Scholar
- Zaki, M. (2000), Scalable algorithms for association mining, IEEE Transactions on Knowledge and Data Engineering 12, 372–390.
-
Zaki, M. and
K. Gouda (2003), Fast vertical mining using diffsets, in: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2003), 326–335.
10.1145/956750.956788 Google Scholar