A Distributed Transaction Model for Read-Write Linked Data Applications

Nandana Mihindukulasooriya, Raúl García-Castro, and Asunción Gómez-Pérez

Ontology Engineering Group, Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid, Madrid, Spain {nmihindu,rgarcia,asun}@fi.upm.es

Abstract. Read-write Linked Data applications provide a novel alternative for application integration that helps break data silos by combining Semantic Web technologies with the REST design principles. One drawback that hinders the adoption of this approach in enterprise systems is the lack of transaction support. Transactions play a vital role in enterprise systems because inconsistent data can lead to problems such as monetary losses or legal issues. This paper presents a thesis that aims to define a REST-compliant transaction model for distributed read-write Linked Data applications. The model extends the 'transactions as resources' approach using a set of hypermedia controls defined by a transactions ontology and a multiversion concurrency mechanism. The author plans to formalize the transaction model, use the formalization to verify its correctness, and run a performance benchmark to evaluate the feasibility of using the model in real-world Linked Data applications.

Keywords: Transactions · Linked data · RESTful design

1 Introduction

Linked Data¹-based application integration is gaining traction as a novel approach for integrating data-intensive applications because of the benefits of Linked Data and Semantic Web technologies. Advantages of this approach over existing approaches include: (a) global identifiers for data that can be accessed using the Web infrastructure, and typed links between data from different applications; (b) the graph-based RDF data model, which allows consuming and merging data from different sources without complex structural transformations; and (c) explicit semantics of data expressed in RDF Schema or OWL ontologies, which can be aligned and mapped to the data models of other applications using techniques such as ontology matching [1]. In this context, the W3C Linked Data Platform² (LDP) provides a standard RESTful protocol for read-write Linked Data, ensuring interoperability.

¹ http://www.w3.org/DesignIssues/LinkedData.html
² http://www.w3.org/TR/ldp-primer/

© Springer International Publishing Switzerland 2015. P. Cimiano et al. (Eds.): ICWE 2015, LNCS 9114, pp. 631–634, 2015. DOI: 10.1007/978-3-319-19890-3_45

Despite these benefits, one of the main barriers to the wide adoption of this approach is the lack of transaction support. Traditionally, transactions ensure the atomicity, consistency, isolation, and durability (ACID) properties. However, the strong consistency properties of the ACID model may hinder other quality aspects of data-sharing systems, as discussed by the CAP theorem [2] and the PACELC theorem [3]. To overcome these issues, newer consistency models such as BASE [4] propose compromises between consistency and availability/latency. The objective of this thesis is to develop a transaction model for read-write Linked Data applications that provides strong consistency guarantees.

2 Related Work

Though a transaction model for Linked Data applications is a fairly new topic, several approaches for RESTful transactions have been proposed. One of the earliest approaches, widely used in RESTful services, is batched transactions using the overloaded POST method [5]. Atomic REST follows an approach similar to batched transactions using mediators. The transactions as resources approach introduces a novel way of modeling transactions, and the RETRO model develops it further by using hypermedia controls to drive transaction state. An optimistic technique for transactions using REST and a timestamp-based two-phase protocol for RESTful services rely on optimistic mechanisms, in contrast to pessimistic locking. The Try-Cancel/Confirm (TCC) pattern has been proposed to solve the specific business use case of reservations, which requires atomicity but not isolation. The author has studied the aforementioned models, and the analysis shows that they fail to provide the consistency properties required by Linked Data applications due to several challenges [6]. For instance, the TCC model does not guarantee the isolation property; the other models that guarantee isolation do not support distributed transactions; and the RETRO model needs a large number of HTTP round trips, which leads to a high overhead (refer to [6] for details).

3 Research Problems

The core research problem addressed in this thesis is how to design a transaction model that ensures strong consistency in distributed read-write Linked Data applications. The core problem is divided into three sub-problems:

RQ1. Which existing transaction models are suitable for Linked Data applications? This question analyzes the state of the art of REST-compliant transaction models and evaluates the current approaches based on their consistency model, applicability in the context of Linked Data, challenges, and limitations.

RQ2. How to design a REST-compliant transaction model for Linked Data applications? This question explores the possible compromises between REST constraints and strong consistency guarantees, and investigates the possible ways of designing a transaction model that fits Linked Data applications.

RQ3. How to evaluate the proposed transaction model? This question evaluates the proposed model for both correctness and practical usefulness in real-world applications.

4 Methodology

First, a comprehensive analysis of the state of the art of RESTful transaction models was carried out as a Systematic Literature Review. The results were used to identify the existing RESTful transaction models, their applicability in the Linked Data application domain, and their gaps and challenges [6].

Second, a RESTful transaction model for Linked Data applications has been defined that incorporates the strengths of the existing models whilst addressing their limitations. The proposed model takes the transactions as resources model as its base and extends it with a well-defined transaction ontology to represent the transaction metadata as Linked Data. Using the ontology, transaction metadata will be represented as dereferenceable structured data serialized in machine-readable RDF formats. The model explicitly defines the semantics of the media types used and allows clients to easily extract and process transaction metadata. To achieve isolation, the transaction protocol adapts two-version two-phase locking [7], a technique used in the database domain, to Linked Data applications. The proposed model provides solutions to the novel challenges that arise when this approach is applied in the REST domain, such as the management of provisional resource identifiers, identity conversions on commit, and the management of relative URLs. Distributed transactions are handled by a transaction management service that communicates with the network of resource management services involved in the transaction, so that the distribution remains transparent to the client. The model is currently implemented as an extension to the LDP4j framework [8], an open source Java-based framework for the development of interoperable read-write Linked Data applications.

Finally, the transaction model will be evaluated based on two main hypotheses. The first hypothesis is that the transaction model is correct, i.e., it provides strong transaction guarantees. For this, the transaction model will be specified using a formal model that defines the restrictions of the different states and the state transitions enforced by the model. We will follow an approach similar to the one used by existing Web Service transaction protocols and prove correctness by formalizing the proposed model using a method such as the pi-calculus or temporal logic (TLA+). The second hypothesis is that the transaction model has a low overhead and does not considerably affect the performance of the applications, i.e., it will be feasible to use the model in real-world Linked Data applications. A performance benchmark of the model implementation will be carried out to measure the performance of transaction processing, and the findings will be used to fine-tune the implementation and optimize the model.
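The two-version two-phase locking scheme referred to in this section can be illustrated with a small sketch. The Python code below is not the LDP4j extension and is not taken from the thesis; it is a single-process toy that assumes the lock compatibility rules described for two-version locking in [7] (read and write locks may coexist, certify locks exclude everything). The class name `TwoVersionStore`, its method names, and the resource path are hypothetical, and the methods return `False` instead of blocking, which is a simplification.

```python
from collections import defaultdict

READ, WRITE, CERTIFY = "read", "write", "certify"

# Keys are (requested mode, mode already held by another transaction).
# Readers and one writer may coexist; a certify lock excludes everything.
COMPATIBLE = {
    (READ, READ): True, (READ, WRITE): True, (READ, CERTIFY): False,
    (WRITE, READ): True, (WRITE, WRITE): False, (WRITE, CERTIFY): False,
    (CERTIFY, READ): False, (CERTIFY, WRITE): False, (CERTIFY, CERTIFY): False,
}


class TwoVersionStore:
    """Toy store keeping a committed and at most one pending version per resource."""

    def __init__(self):
        self.committed = {}            # resource -> committed value
        self.pending = {}              # (txn, resource) -> uncommitted value
        self.locks = defaultdict(set)  # resource -> {(txn, mode)}

    def _grant(self, txn, resource, mode):
        # Refuse the lock if any other transaction holds an incompatible one.
        for holder, held in self.locks[resource]:
            if holder != txn and not COMPATIBLE[(mode, held)]:
                return False
        self.locks[resource].add((txn, mode))
        return True

    def _release(self, txn):
        for holders in self.locks.values():
            holders.difference_update({h for h in holders if h[0] == txn})

    def read(self, txn, resource):
        """Return (granted, value); other transactions see the committed version."""
        if (txn, resource) in self.pending:
            return True, self.pending[(txn, resource)]
        if not self._grant(txn, resource, READ):
            return False, None
        return True, self.committed.get(resource)

    def write(self, txn, resource, value):
        """Stage a new version without disturbing concurrent readers."""
        if not self._grant(txn, resource, WRITE):
            return False
        self.pending[(txn, resource)] = value
        return True

    def commit(self, txn):
        """Upgrade write locks to certify locks, then install the new versions."""
        written = [r for (t, r) in self.pending if t == txn]
        for r in written:
            for holder, held in self.locks[r]:
                if holder != txn and not COMPATIBLE[(CERTIFY, held)]:
                    return False  # e.g. a reader is still active; retry later
        for r in written:
            self.committed[r] = self.pending.pop((txn, r))
        self._release(txn)
        return True

    def abort(self, txn):
        """Discard pending versions and release all locks of the transaction."""
        for key in [k for k in self.pending if k[0] == txn]:
            del self.pending[key]
        self._release(txn)


store = TwoVersionStore()
store.committed["/accounts/1"] = 100
store.write("t1", "/accounts/1", 150)   # t1 stages a new version
print(store.read("t2", "/accounts/1"))  # (True, 100): t2 still sees the old one
print(store.commit("t1"))               # False: t2 holds a read lock
store.commit("t2")                      # t2 finishes and releases its lock
print(store.commit("t1"))               # True: the new version is installed
```

In this sketch a writer never blocks readers while it prepares its changes; the conflict is deferred to commit time, when the certify step has to wait for active readers. A REST-based realization would additionally need the provisional resource identifiers and identity conversion on commit mentioned above, which the sketch does not attempt to model.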

5 Conclusions and Future Work

This paper presents a thesis on a novel REST-compliant transaction model for distributed read-write Linked Data applications. The model addresses the limitations of the previous approaches and provides strong consistency guarantees. The main contributions of this work include: (a) a thorough analysis of the impedance mismatches between the ACID transaction properties and the REST constraints, and (b) the definition, implementation, and evaluation of a transaction model for Linked Data applications that provides strong consistency guarantees. As future work, we plan to expand the transaction protocol to cover different consistency levels and to define a high-level framework for transaction negotiation. The model is designed to provide a mechanism for clients to express their preferences and expectations (e.g., similar to content negotiation on the Web). This enables the model to be extended to support other types of transactions (e.g., compensating transactions). Another aspect is the integration of the transaction model with the W3C LDP protocol, so that it can be proposed as a standardized W3C LDP extension.

Acknowledgments. The author is supported by the 4V: Volumen, Velocidad, Variedad y Validez en la gestión innovadora de datos (TIN2013-46238-C4-2-R) project and thanks Miguel Esteban-Gutiérrez for his valuable input related to this thesis.

References

1. Mihindukulasooriya, N., García-Castro, R., Esteban-Gutiérrez, M.: Linked data platform as a novel approach for enterprise application integration. In: Proceedings of the 4th International Workshop on Consuming Linked Data (COLD2013), Sydney, Australia (2013)
2. Brewer, E.A.: Towards robust distributed systems. In: Proceedings of the Nineteenth Annual ACM Symposium on Principles of Distributed Computing, PODC 2000, p. 7. ACM, New York (2000)
3. Abadi, D.J.: Consistency Tradeoffs in Modern Distributed Database System Design: CAP is Only Part of the Story. Computer 45(2), 37–42 (2012)
4. Pritchett, D.: BASE: An Acid Alternative. Queue 6(3), 48–55 (2008)
5. Kochman, S., Wojciechowski, P.T., Kmieciak, M.: Batched transactions for RESTful web services. In: Harth, A., Koch, N. (eds.) ICWE 2011 Workshops. LNCS, vol. 7059, pp. 86–98. Springer, Heidelberg (2012)
6. Mihindukulasooriya, N., Esteban-Gutiérrez, M., García-Castro, R.: Seven challenges for RESTful transaction models. In: Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web, Seoul, South Korea, pp. 949–952 (2014)
7. Bernstein, P.A., Hadzilacos, V., Goodman, N.: Concurrency Control and Recovery in Database Systems, vol. 370. Addison-Wesley, New York (1987)
8. Esteban-Gutiérrez, M., Mihindukulasooriya, N., García-Castro, R.: LDP4j: a framework for the development of interoperable read-write linked data applications. In: Proceedings of the 1st ISWC Developers Workshop, Riva del Garda, Italy (2014)