A horizontal partitioning-based method for frequent pattern mining in transport timetable
Claudio Teixeira
CEFET/RJ - Federal Center for Technological Education of Rio de Janeiro, Rio de Janeiro, Brazil
Search for more papers by this authorMarta Mattoso
COPPE/UFRJ, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
Search for more papers by this authorDiego Carvalho
CEFET/RJ - Federal Center for Technological Education of Rio de Janeiro, Rio de Janeiro, Brazil
Search for more papers by this authorEduardo Bezerra
CEFET/RJ - Federal Center for Technological Education of Rio de Janeiro, Rio de Janeiro, Brazil
Search for more papers by this authorJorge Soares
CEFET/RJ - Federal Center for Technological Education of Rio de Janeiro, Rio de Janeiro, Brazil
Search for more papers by this authorGlauco Amorim
CEFET/RJ - Federal Center for Technological Education of Rio de Janeiro, Rio de Janeiro, Brazil
Search for more papers by this authorCorresponding Author
Eduardo Ogasawara
CEFET/RJ - Federal Center for Technological Education of Rio de Janeiro, Rio de Janeiro, Brazil
Correspondence
Eduardo Ogasawara, CEFET/RJ - Federal Center for Technological Education of Rio de Janeiro, Rio de Janeiro, Brazil.
Email: [email protected]
Search for more papers by this authorClaudio Teixeira
CEFET/RJ - Federal Center for Technological Education of Rio de Janeiro, Rio de Janeiro, Brazil
Search for more papers by this authorMarta Mattoso
COPPE/UFRJ, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
Search for more papers by this authorDiego Carvalho
CEFET/RJ - Federal Center for Technological Education of Rio de Janeiro, Rio de Janeiro, Brazil
Search for more papers by this authorEduardo Bezerra
CEFET/RJ - Federal Center for Technological Education of Rio de Janeiro, Rio de Janeiro, Brazil
Search for more papers by this authorJorge Soares
CEFET/RJ - Federal Center for Technological Education of Rio de Janeiro, Rio de Janeiro, Brazil
Search for more papers by this authorGlauco Amorim
CEFET/RJ - Federal Center for Technological Education of Rio de Janeiro, Rio de Janeiro, Brazil
Search for more papers by this authorCorresponding Author
Eduardo Ogasawara
CEFET/RJ - Federal Center for Technological Education of Rio de Janeiro, Rio de Janeiro, Brazil
Correspondence
Eduardo Ogasawara, CEFET/RJ - Federal Center for Technological Education of Rio de Janeiro, Rio de Janeiro, Brazil.
Email: [email protected]
Search for more papers by this authorFunding information: Conselho Nacional de Desenvolvimento Científico e Tecnológico; Coordenação de Aperfeiçoamento de Pessoal de Nível Superior; Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro
Abstract
Analysing transport timetables is an important task, as it brings the opportunity to discover which routes commonly lead to delays. Frequent pattern mining is a technique used to support such type of discovery. However, functional dependencies are intrinsic properties present in timetables, particularly related to attributes derived from the origin–destination matrix. Such functional dependencies compromise the search for patterns in timetables in both the number of association rules (ARs) generated and the computational cost. Several of these ARs refer to the same information. Redundancy removal techniques can reduce the number of ARs. However, these techniques are designed to be used after mining finishes, which increases the computational cost of finding useful ARs. This work presents timetable pattern mining (T-mine), a novel method for frequent pattern mining that improves knowledge discovery in timetables. We evaluated T-mine using Brazilian Flight Data and compared T-mine with the direct application of frequent pattern mining approaches with and without functional dependencies. Our experiments indicate that T-mine is about one order magnitude faster than other methods with functional dependencies.
Open Research
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in Brazilian Flights Dataset at https://doi.org/10.21227/k10b-qn21.
REFERENCES
- Aggarwal, C. C., & Han, J. (2014). Frequent pattern mining. 1st ed. ( 1–471). Springer.
10.1007/978-3-319-07821-2_1 Google Scholar
- Albalooshi, F., & Shatnawi, S. (2021). Timetable generation: Applying a modified FP-tree algorithm on mined students' and faculty preferences. International Journal of Applied Metaheuristic Computing, 12(1), 20–40.
- Bayardo, R., Jr., Agrawal, R., & Gunopulos, D. (2000). Constraint-based rule mining in large, dense databases. Data Mining and Knowledge Discovery, 4(2–3), 217–240.
- Belcastro, L., Marozzo, F., Talia, D., & Trunfio, P. (2016). Using scalable data mining for predicting flight delays. ACM Transactions on Intelligent Systems and Technology, 8(1), 1–20.
- Biswas, S., Biswas, N., & Mondal, K. (2018). Parallel apriori based distributed association rule mining: A comprehensive survey. Proceedings - 2018 4th IEEE International Conference on Research in Computational Intelligence and Communication Networks, ICRCICN 2018 (pp. 202–207).
- Bouraoui, M., & Touzi, A. (2019). Efficient mining of association rules based on clustering from distributed data. International Journal of Advanced Computer Science and Applications, 10(4), 401–409.
- Caruccio, L., & Cirillo, S. (2020). Incremental discovery of imprecise functional dependencies. Journal of Data and Information Quality, 12(4), 1–25.
- Caruccio, L., Deufemia, V., & Polese, G. (2020). Mining relaxed functional dependencies from data. Data Mining and Knowledge Discovery, 34(2), 443–477.
- Carvalho, L., Sternberg, A., Maia Gonçalves, L., Beatriz Cruz, A., Soares, J., Brandão, D., Carvalho, D., & Ogasawara, E. (2020). On the relevance of data science for flight delay research: A systematic review. Transport Reviews, 41, 499–528.
- Ceder, A. (2016). Public transit planning and operation: Modeling, practice and behavior. CRC Press.
10.1201/b18689 Google Scholar
- Chung, S.-H., Ma, H.-L., Hansen, M., & Choi, T.-M. (2020). Data science and analytics in aviation. Transportation Research Part E: Logistics and Transportation Review, 134, 101837.
- Codocedo, V., Baixeries, J., Kaytoue, M., & Napoli, A. (2018). Characterizing covers of functional dependencies using FCA. International Conference on Concept Lattices and Their Applications (Vol. CLA2018, pp. 279–290).
- Elmasri, R., & Navathe, S. B. (2016). Fundamentals of database systems. Pearson Education Limited.
- Fournier-Viger, P., Lin, J.-W., Vo, B., Chi, T., Zhang, J., & Le, H. (2017). A survey of itemset mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 7(4), 1–18.
- Ghofrani, F., He, Q., Goverde, R., & Liu, X. (2018). Recent applications of big data analytics in railway transportation systems: A survey. Transportation Research Part C: Emerging Technologies, 90, 226–246.
- Greeshma, L., & Pradeepini, G. (2016). Mining maximal efficient closed itemsets without any redundancy. Advances in Intelligent Systems and Computing, 433, 339–347.
10.1007/978-81-322-2755-7_36 Google Scholar
- Hahsler, M., Buchta, C., Gruen, B., & Hornik, K. (2020). arules: Mining association rules and frequent itemsets (Technical Report). https://cran.r-project.org/package=arules.
- Han, J., Pei, J., & Kamber, M. (2011). Data mining: Concepts and techniques. Elsevier.
- Jiang, Y., & Ceder, A. (2021). Incorporating personalization and bounded rationality into stochastic transit assignment model. Transportation Research Part C: Emerging Technologies, 127, 103127.
- Khiari, J., Moreira-Matias, L., Cerqueira, V., & Cats, O. (2016). Automated setting of bus schedule coverage using unsupervised machine learning. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9651, 552–564.
- Lee, W.-H., Yen, L.-H., & Chou, C.-M. (2016). A delay root cause discovery and timetable adjustment model for enhancing the punctuality of railway services. Transportation Research Part C: Emerging Technologies, 73, 49–64.
- Luna, J., Fournier-Viger, P., & Ventura, S. (2019). Frequent itemset mining: A 25years review. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9(6), 1–15.
- Mahmud, M., Huang, J., Salloum, S., Emara, T., & Sadatdiynov, K. (2020). A survey of data partitioning and sampling methods to support big data analysis. Big Data Mining and Analytics, 3(2), 85–101.
10.26599/BDMA.2019.9020015 Google Scholar
- Nasr, M., Hamdy, M., Hegazy, D., & Bahnasy, K. (2021). An efficient algorithm for unique class association rule mining. Expert Systems with Applications, 164, 113978.
- Papenbrock, T., Ehrlich, J., Marten, J., Neubert, T., Rudolph, J.-P., Schönberg, M., Zwiener, J., & Naumann, F. (2015). Functional dependency discovery: An experimental evaluation of seven algorithms. Proceedings of the VLDB Endowment, 8, 1082–1093.
- Prakash, R., Sarma, S., & Sheshikala, M. (2018). Generating non-redundant multilevel association rules using min-max exact rules. International Journal of Electrical and Computer Engineering, 8(6), 4568–4576.
- Rahim, N., Taib, S., & Mahamad, S. (2013). Courses enrollment pattern analysis. Lecture Notes in Electrical Engineering (Vol. 152, pp. 283–292). New York, NY: Springer.
- Sethi, M., & Jindal, R. (2020). Distributed association rules mining of varying data partition size using nodesets. International Journal of Advanced Trends in Computer Science and Engineering, 9(4), 5433–5441.
10.30534/ijatcse/2020/181942020 Google Scholar
- Shaharanee, I., & Hadzic, F. (2014). Evaluation and optimization of frequent, closed and maximal association rule based classification. Statistics and Computing, 24(5), 821–843.
- Shayegan Fard, M., & Namin, P. (2020). Review of apriori based frequent itemset mining solutions on big data. 2020 6th International Conference on Web Research, ICWR 2020 (pp. 157–164).
- Sternberg, A., Carvalho, D., Murta, L., Soares, J., & Ogasawara, E. (2016). An analysis of Brazilian flight delays based on frequent patterns. Transportation Research Part E: Logistics and Transportation Review, 95, 282–298.
- Susan, S., & Bhutani, A. (2020). Data mining with association rules for scheduling open elective courses using optimization algorithms. Advances in Intelligent Systems and Computing, 941, 770–778.
10.1007/978-3-030-16660-1_75 Google Scholar
- Teixeira, C., Teixeira, L., dos Santos, J., Amorim, G., Soares, J., & Ogasawara, E. (2020). Integrated Brazilian Flight Datasets description (Technical Report). https://doi.org/10.21227/k10b-qn21.
10.21227/k10b-qn21 Google Scholar
- Wei, Z., & Link, S. (2019). Discovery and ranking of functional dependencies. Proceedings - International Conference on Data Engineering (Vol. 2019-April, pp. 1526–1537).
- Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2016). Data mining: Practical machine learning tools and techniques. Morgan Kaufmann.
- Özsu, M. T., & Valduriez, P. (2019). Principles of distributed database systems. Springer Nature.