Posted on 2020-08-06, 16:16, authored by Abdullah Q. F. Alqahtani
Enterprise data integration is the process of combining data from multiple sources to meet business and market needs. Traditional approaches such as shared databases and messaging-oriented systems struggle to keep pace with the evolution of big data in changing markets. These challenges arise at two levels. First, at the design level, integration requires a concise, visual and loosely coupled mapping that supports data integration without disturbing internal business processes and applications. Second, at the execution level, it requires efficient bidirectional transformations and synchronisations that respond to evolving requirements in a cost-effective manner. Graph databases such as neo4j are designed to handle and integrate big data from heterogeneous sources using efficient navigation-based query languages such as Gremlin. For flexibility and performance, they do not enforce data quality through schemata but leave it to the application level. They are built on a rich data model that can represent the interconnectedness of real-world data domains. Model-driven design and implementation techniques, on the other hand, have been widely adopted to build systems at a high level of abstraction. They use visual relational models such as the Unified Modelling Language (UML) to describe systems at various stages of development and allow application logic to be generated automatically. In this thesis, we present a model-driven approach to bidirectional transformations and synchronisations between relational sources in neo4j. We describe data sharing between relational sources using declarative Triple Graph Grammars (TGGs), then compile them, using a Model-To-Text (M2T) transformation approach, into Gremlin query and update operations that execute controlled transformations and synchronisations in neo4j.
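The abstract does not describe the TGG-to-Gremlin compiler itself, so the following is only a toy illustration of the general idea, not the thesis's generator. It assumes a hypothetical one-rule format (`Rule`, `forward_query`), hypothetical vertex labels (`Table`, `Class`, `Corr`) and hypothetical correspondence edge labels (`src`, `trg`). It performs a Model-To-Text step: a rule object goes in, and the text of a Gremlin (Groovy syntax) forward-transformation traversal comes out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical, minimal TGG-style rule: for each source vertex, create one
    target vertex plus one correspondence vertex linking the two."""
    source_label: str                 # e.g. a relational 'Table' vertex (assumed)
    target_label: str                 # e.g. a 'Class' vertex (assumed)
    copied_props: list[str] = field(default_factory=list)
    corr_label: str = "Corr"          # correspondence vertex label (assumed)


def forward_query(rule: Rule) -> str:
    """M2T step: emit the text of a Gremlin traversal that applies `rule`
    forwards. Correspondence edges are hard-coded as 'src'/'trg' here."""
    steps = [
        "g.V()",
        f"hasLabel('{rule.source_label}')",
        # Skip source vertices already translated, i.e. those that already
        # have an incoming 'src' edge from a correspondence vertex.
        "not(__.inE('src'))",
        "as('s')",
        f"addV('{rule.target_label}')",
    ]
    # Copy each listed attribute from the matched source vertex to the new
    # target vertex (assumes the source vertex actually has that property).
    for prop in rule.copied_props:
        steps.append(f"property('{prop}', __.select('s').values('{prop}'))")
    steps += [
        "as('t')",
        f"addV('{rule.corr_label}')",
        "as('c')",
        # Link the correspondence vertex to both sides of the translation.
        "addE('src').from('c').to('s')",
        "addE('trg').from('c').to('t')",
        "iterate()",
    ]
    return ".".join(steps)


if __name__ == "__main__":
    print(forward_query(Rule("Table", "Class", ["name"])))
```

The correspondence vertex is what makes the sketch incremental in a small way: re-running the query only translates `Table` vertices that no `Corr` vertex points at yet. The emitted text uses Gremlin-Groovy syntax; in gremlin-python the `from` step is spelled `from_`.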
To assess the validity and efficiency of our approach, we evaluate the correctness and performance of the generated Gremlin queries using well-known bidirectional transformation test suites. The results show that our approach is valid and efficient for supporting enterprise data sharing and synchronisation. The approach combines the data quality of schema-based solutions with the performance of Gremlin queries on neo4j to perform loosely coupled, incremental data transformations for data-driven applications.