Overcoming traditional ETL systems architectural problems using a service-oriented approach
Corresponding Author
Bruno Oliveira
CIICESI, School of Management and Technology, Porto Polytechnic, Felgueiras, Portugal
Correspondence
Bruno Oliveira, CIICESI, School of Management and Technology, Porto Polytechnic, Felgueiras, Portugal.
Email: [email protected]
Óscar Oliveira
CIICESI, School of Management and Technology, Porto Polytechnic, Felgueiras, Portugal
Orlando Belo
ALGORITMI R&D Centre/LASI, University of Minho, Campus de Gualtar, Braga, Portugal
Abstract
Developing analytical systems poses several challenges, related not only to the volume and heterogeneity of the data involved but also to the constant need to adapt and evolve in response to new business challenges. Data are a determining factor in the success of analytical and decision-making applications: their nature, availability, and quality are crucial to planning and structuring the population of analytical systems. Today's users are more demanding and require adaptable, flexible analytical applications. This places serious demands on the design and development of extract-transform-load (ETL) systems, which must provide flexible and robust data-populating services that operate 24/7 and manage and process large volumes of data. ETL processes should therefore be designed and implemented using up-to-date approaches backed by evidence from real applications. In this paper, we present a service-oriented implementation for ETL design and development. We mapped and implemented some of the most conventional ETL processes in a service-oriented architecture to demonstrate how this kind of approach can be applied and the benefits it brings to ETL system development projects.
CONFLICT OF INTEREST STATEMENT
The authors declare that they have no conflict of interest.
Open Research
DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.