Mapping Relational Data into Property Graphs

Student

  • Moueiad Alnashi

Description

The ISEBEL project aims to build an international search engine that harvests data from folktale databases. The initial project concentrates on belief legends [1] found in three well-known digital collections: those of Evald Tang Kristensen from Denmark (etkspace), Richard Wossidlo from Mecklenburg (wossidia), and several collectors and narrators from the Netherlands (verhaalenbank). Part of the project deals with data and graph mining [2,3] for frequent patterns. To this end, the story data is harvested using OAI-PMH and managed in a CKAN repository. The data can be accessed as XML or JSON from that repository via a REST API, as well as from the WossiDiA system's REST API. However, the local databases provide more information than the exported story data, and they are often implemented using relational database systems. Therefore, mapping data from relational sources into graph data is essential.

The WossiDiA system [4], one of the databases harvested by ISEBEL, uses typed, directed hypergraphs [5] to represent the collections of Richard Wossidlo. The content encompasses field notes, correspondence with scholars, contributors and informants, as well as references to published work on everyday life in the region of Mecklenburg from the late 19th century to the 1930s. The hypergraph database is realized as an extension to an object-relational database system using PostgreSQL.

To support researchers in analyzing, browsing and visualizing certain aspects of the collected data using the graph paradigm and graph algorithms, and not only for the WossiDiA graph data, the bachelor thesis should aim at transforming data from such relational data sources into property graph data [6] in general. The concepts of the relational model, such as relations, attributes, and primary and foreign key relationships, have to be mapped to the concepts of the property graph model [7], i.e. nodes, edges, labels, and properties as key-value pairs.
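The correspondence between the two models can be illustrated with a minimal sketch of a direct mapping. It is not the methodology the thesis is meant to develop; it only assumes that every row becomes a node labeled with its table name, every non-key attribute becomes a property, and every foreign key reference becomes an edge. The tables `narrator` and `story`, their columns, and the function name are hypothetical, and the sketch assumes single-column primary keys and an SQLite database for the sake of a self-contained example.

```python
import sqlite3


def relational_to_property_graph(conn, tables):
    """Directly map rows to nodes and foreign-key references to edges."""
    conn.row_factory = sqlite3.Row
    nodes = {}   # node id -> {"label": ..., "properties": {...}}
    edges = []   # (source node id, edge label, target node id)
    for table in tables:
        # PRAGMA table_info columns: cid, name, type, notnull, dflt_value, pk
        info = conn.execute(f"PRAGMA table_info({table})").fetchall()
        pk = next(col[1] for col in info if col[5])  # assumes a single-column key
        # PRAGMA foreign_key_list columns: id, seq, table, from, to, ...
        fks = {fk[3]: fk[2]
               for fk in conn.execute(f"PRAGMA foreign_key_list({table})")}
        for row in conn.execute(f"SELECT * FROM {table}"):
            node_id = f"{table}:{row[pk]}"
            # Foreign key columns become edges, all other columns properties.
            props = {k: row[k] for k in row.keys() if k not in fks}
            nodes[node_id] = {"label": table, "properties": props}
            for column, target in fks.items():
                if row[column] is not None:
                    edges.append((node_id, column, f"{target}:{row[column]}"))
    return nodes, edges


# Hypothetical sample schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE narrator (id INTEGER PRIMARY KEY, name TEXT, gender TEXT);
    CREATE TABLE story (
        id INTEGER PRIMARY KEY,
        title TEXT,
        narrator_id INTEGER REFERENCES narrator(id)
    );
    INSERT INTO narrator VALUES (1, 'Narrator A', 'f');
    INSERT INTO story VALUES (10, 'A werewolf legend', 1);
""")
nodes, edges = relational_to_property_graph(conn, ["narrator", "story"])
```

Here the story row yields a node labeled `story` with the properties `id` and `title`, and its `narrator_id` column yields an edge to the narrator node. Such a fixed mapping leaves no room for choosing labels or selecting content, which is one reason a rule-guided approach is of interest.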

Therefore, a general methodology has to be developed for transforming relational data into graph data [7]. The transformation should be guided by user-defined rules which describe a mapping from relational concepts [8] to Property Graph Model concepts [9, 10]. To this end, an extension of the rule specification language has to be designed based on the X2G rule language [11, 12].
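What a rule-guided transformation could look like is sketched below. The rule format shown here is purely hypothetical and is not the syntax of the X2G rule language; it only illustrates the idea that each rule selects columns from a relational result set and produces either nodes or edges. The function name `apply_rules`, the rule keys, and the sample rows are assumptions made for this example.

```python
def apply_rules(rows, rules):
    """Apply hypothetical node and edge rules to rows given as dicts."""
    nodes = {}   # node id -> {"label": ..., "properties": {...}}
    edges = []   # (source node id, edge label, target node id)
    for row in rows:
        for rule in rules:
            if rule["kind"] == "node":
                # The key column identifies the node within its label.
                node_id = f'{rule["label"]}:{row[rule["key"]]}'
                props = {name: row[col]
                         for name, col in rule["properties"].items()}
                nodes[node_id] = {"label": rule["label"], "properties": props}
            elif rule["kind"] == "edge":
                (src_label, src_col) = rule["source"]
                (dst_label, dst_col) = rule["target"]
                edge = (f"{src_label}:{row[src_col]}", rule["label"],
                        f"{dst_label}:{row[dst_col]}")
                if edge not in edges:  # avoid duplicate edges
                    edges.append(edge)
    return nodes, edges


# Hypothetical rule set: two node rules and one edge rule.
rules = [
    {"kind": "node", "label": "Story", "key": "story_id",
     "properties": {"title": "title"}},
    {"kind": "node", "label": "Narrator", "key": "narrator_id",
     "properties": {"gender": "gender"}},
    {"kind": "edge", "label": "TOLD_BY",
     "source": ("Story", "story_id"), "target": ("Narrator", "narrator_id")},
]

# Hypothetical rows, e.g. the result of joining a story and a narrator table.
rows = [
    {"story_id": 10, "title": "A werewolf legend", "narrator_id": 1, "gender": "f"},
    {"story_id": 11, "title": "A witch legend", "narrator_id": 1, "gender": "f"},
]

nodes, edges = apply_rules(rows, rules)
```

In contrast to a fixed direct mapping, the user decides here which columns become node keys, which become properties, and how edges are labeled. A real rule language would additionally need a way to express the selection (the query) itself.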

In this way, ethnologists and researchers should be able to define rules which select attributes and content from relational data sources and generate the nodes, labels, edges, and properties of a graph from that selection. Finally, a tool has to be developed which automatically generates the property graph data in different output formats (e.g. CSV, GEXF, GraphML, DOT) based on a rule set given by the user. Its functionality has to be demonstrated by sample scenarios from the ISEBEL project, e.g. the witch and werewolf hunter scenario, which examines the influence of gender on witch or werewolf stories.
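As a rough illustration of the output step, the following sketch serializes a small graph into two of the formats named above, DOT and CSV. The in-memory representation (a dict of nodes and a list of edge triples), the function names, and the sample data are assumptions made for this example, and the DOT writer assumes that identifiers and labels contain no double quotes.

```python
import csv
import io


def to_dot(nodes, edges):
    """Serialize nodes and edges as a Graphviz DOT digraph."""
    lines = ["digraph G {"]
    for node_id, node in nodes.items():
        lines.append(f'  "{node_id}" [label="{node["label"]}"];')
    for source, label, target in edges:
        lines.append(f'  "{source}" -> "{target}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


def edges_to_csv(edges):
    """Serialize the edge list as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "label", "target"])
    writer.writerows(edges)
    return buffer.getvalue()


# Hypothetical sample graph, for illustration only.
nodes = {
    "Story:10": {"label": "Story", "properties": {"title": "A werewolf legend"}},
    "Narrator:1": {"label": "Narrator", "properties": {"gender": "f"}},
}
edges = [("Story:10", "TOLD_BY", "Narrator:1")]

dot_text = to_dot(nodes, edges)
csv_text = edges_to_csv(edges)
```

Keeping the graph in one neutral in-memory form and adding one writer per format is only one possible design; node properties are omitted from both writers here to keep the sketch short.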

Road Map

  • Researching, analyzing and summarizing the concepts of the relational data model and the Property Graph Model

  • Presenting the state-of-the-art in relational and graph transformation concepts, techniques and tools

  • Requirements analysis of the graph visualization scenarios of the ISEBEL project

  • Defining a transformation rule language based on the X2G rule language

  • Designing a software tool for rule-based transformation

  • Prototype implementation and quality-based evaluation using ISEBEL sample scenarios

References

  1. Usó-Doménech, J.L. & Nescolarde-Selva, J.: What are Belief Systems? Found Sci (2016) 21: 147.

  2. Charu C. Aggarwal, Haixun Wang: Managing and Mining Graph Data. Advances in Database Systems 40, Springer 2010, ISBN 978-1-4419-6044-3

  3. Diane J. Cook, Lawrence B. Holder (eds): Mining Graph Data. Wiley, Hoboken, New Jersey, 2006

  4. Holger Meyer, Alf-Christian Schering and Christoph Schmitt: WossiDiA - The Digital Wossidlo Archive, in: Holger Meyer, Christoph Schmitt, Thomas Jansen and Alf-Christian Schering (Hrsg.), Corpora ethnographica online - Strategien der Digitalisierung kultureller Archive und ihrer Präsentation im Internet, Volume 5 of Rostocker Beiträge zur Volkskunde und Kulturgeschichte, Waxmann, 2014, 61–84.

  5. Meyer, Holger, Alf-Christian Schering, and Andreas Heuer. "The Hydra.PowerGraph System." Datenbank-Spektrum (2017): 1-17.

  6. Angela Bonifati, George Fletcher, Hannes Voigt, Nikolay Yakovets: Querying Graphs. Morgan & Claypool, Synthesis Lectures on Data Management, 2018.

  7. Chiba, H., Yamanaka, R., Matsumoto, S. (2020). G2GML: Graph to Graph Mapping Language for Bridging RDF and Property Graphs. In: The Semantic Web – ISWC 2020. Lecture Notes in Computer Science, vol 12507. Springer, Cham.

  8. Radu Stoica, George H. L. Fletcher, Juan F. Sequeda: On Directly Mapping Relational Databases to Property Graphs. AMW 2019

  9. Genoveva Vargas-Solar, José-Luis Zechinelli-Martini, Javier A. Espinosa-Oviedo: Enacting Data Science Pipelines for Exploring Graphs: From Libraries to Studios. ADBIS, TPDL and EDA 2020 Common Workshops and Doctoral Consortium, 271-280, 2020.

  10. Dominik Tomaszuk, Renzo Angles, Łukasz Szeremeta, Karol Litman, and Diego Cisterna: Serialization for Property Graphs. BDAS 2019: Beyond Databases, Architectures and Structures. Paving the Road to Smart Data Processing and Analysis, 57-69, 2019.

  11. Zakkor, Safwat: XML to Graph Mapping Tool. Bachelor thesis, University of Rostock, 2022.

  12. The X2G Rule Language. git.informatik.uni-rostock.de/dbis/Hydra/x2g