Consistency in Real-Time Collaborative Editing Systems Based on Partial Persistent Sequences
Total Page:16
File Type:pdf, Size:1020Kb
Consistency in Real-time Collaborative Editing Systems Based on Partial Persistent Sequences Qinyi Wu Calton Pu CERCS, Georgia CERCS, Georgia Institute of Technology Institute of Technology Atlanta, USA Atlanta, USA [email protected] [email protected] ABSTRACT broadcasting local updates to other sites. In real-time collaborative editing systems, users create a Several data consistency models for collaborative editing shared document by issuing insert, delete, and undo oper- systems have been proposed in the literature [10, 17, 20, ations on their local replica anytime and anywhere. Data 29]. They require 1) all document replicas converge to the consistency issues arise due to concurrent editing conflicts. same view; and 2) execution of editing operations conform Traditional consistency models put restrictions on editing to the happen-before precedent order defined by Lamport’s operations updating different portions of a shared docu- Clock Condition [14]. These earlier data consistency models ment, which is unnecessary for many editing scenarios, and are conservative in the sense that the happen-before rela- cause their view synchronization strategies to become less tion captures potential causality, not necessarily real causal- efficient. To address these problems, we propose a new data ity. For example, if two accountants work on the financial consistency model that preserves convergence and synchro- reports of different departments through a shared spread- nizes editing operations only when they access overlapped sheet, there is no real causality between the editing opera- or contiguous characters. Our view synchronization strategy tions from these two accountants. We are aware that there is implemented by a novel data structure–partial persistent are scenarios that people work closely on a shared document. sequence. A partial persistent sequence is an ordered set of For example, the web page for the 2008 Olympic Game at items indexed by persistent and unique position identifiers. Wikipedia was updated more than a thousand times dur- It captures data dependencies of editing operations and en- ing August 2008 [33]. In these scenarios, the happen-before codes them in a way that they can be correctly executed on relationships may closely mimic real causalities. However, any document replica. As a result, a simple and efficient the temporary violation of cause-effect order is generally re- view synchronization strategy can be implemented. garded acceptable in these non-mission-critical scenarios as long as the final version contains all the updates. View synchronization strategies have been proposed in Keywords the past to resolve editing conflicts such as locking, trans- collaborative editing system, data consistency model, per- action mechanisms, and turn-taking protocols [11]. These sistent data structure approaches have limited use in real-time collaborative edit- ing scenarios due to high overhead in lock management and 1. INTRODUCTION restricted collaboration. A well-accepted approach is called Operational Transformation (OT) [10] due to its non-blocking With the technological advance in real-time collaborative feature in view synchronization. The limitation of OT is editors (such as SubEthaEdit [2], Coward [34] and TeNDaX that each operation has to be transformed with all its con- [15]), more and more documents are coauthored. The col- current operations and all operations that happen after it if laboration includes the process of preparing a presentation they reach other sites out of order. Li et al. [16] analyzed with colleagues [1] to the more structured sharing of docu- one of OT algorithms and shows that when an editing op- ments in business process management [13]. As a result, a eration is broadcast to a remote site, the complexity of its document is no longer shared as an atomic unit. Instead, transformation cost is O(n2) on average and O(n3)inthe it is processed at a finer granularity such as sections, para- worst case, where n is the number of editing operation in graphs, or even sentences. To maintain data consistency, the history. this type of communication must be carefully coordinated To address the above problems, the first contribution of due to concurrent editing conflicts. this paper is the DDP consistency model that requires con- This paper focuses on a data-dependency preservation vergence and preserves data-dependency precedent order on (DDP) consistency model (Section 3) and its correspond- execution of editing operations. The data-dependency prece- ing view synchronization strategy for collaborative editing dence preservation property is more relaxed than the happen- systems. In the targeted editing scenario, a shared docu- before precedence preservation property (as defined in OT). ment is replicated at multiple sites connected by communi- Therefore, the DDP consistency model allows higher con- cation networks. The user at each site can update his/her currency and can be enforced more efficiently (see below). local replica by issuing insert, delete, and undo operations Concretely, a data dependency is defined on a pair of oper- anytime and anywhere. Local updates are executed immedi- ations if they modify overlapped or contiguous characters. ately for fast response time. The underlying editing system Users working on different portions of the document can is responsible for view synchronization among all replicas by collaborate efficiently without any interference. user1 user2 user3 The second contribution of the paper is an efficient and abcd abcd abcd simple method to enforce the DDP consistency model, based o1 o2 on a view synchronization strategy that uses a novel data o : insert ‘e’ at offset 0 structure–partial persistent sequence (PPS). A PPS is an or- 1 o o : insert ‘f’ at offset 2 dered set of items indexed by persistent and unique position 3 2 o3: delete ‘c’ identifiers. This new data structure is able to create a global site1 site2 site3 position identifier for every character, which carries enough information to directly locate its right position in any docu- Figure 1: A collaborative editing scenario ment replica. With the help of this data structure, it is also straightforward to construct an undo operation to cancel the Logical View effect of any editing operation. This greatly simplifies our a b c d view synchronization strategy for shared documents edited by insert, delete, and undo operations. Furthermore, the position identifiers are totally ordered, we can maintain a header a b c d tail PPS in a search tree (such as a B-tree). In section 5.1, we show that the computational complexity of our view syn- physical View chronization strategy is bounded by O(m), where m is the number of operations that are data-dependency related in Figure 2: Two views of a document an editing history. Finally. the PPS data structure allows us to easily capture the data dependency between any pair of editing operations that modify overlapped or contiguous remove the footprint of an editing operation from the edit- characters. As a result, editing conflicts due to shared access ing history. Instead, a new operation is created to cancel its can be efficiently detected and resolved. effect. The advantage of this compensated-undo approach Roadmap. The paper is organized as follows. Section 2 is that an undo operation is treated like a normal insert or first gives a brief background for collaborative editing sys- delete operation. Therefore, the view synchronization strat- tems. Then we introduce the notion of logical view and egy does not need to be changed. We will devote a separate physical view of a document. Based on these two levels of section for discussing the compensated undo approach in view, we describe two types of logical view synchronization Section 5.2. approaches and explain why we choose one of them. Section 3 defines the DDP consistency model. Section 4 first gives 2.1 Two Levels of View on Documents a formal definition for PPSs, and then shows how this new From a user’s perspective, a document consists of a sequence data structure maintains the two levels of view of a docu- of characters. If a new character is inserted, a portion of the ment and its properties. In Section 5, we show the appli- sequence will be shifted right to vacate the space for the new cation of PPSs to view synchronization and undo handling. character. Correspondingly, if a character is deleted, a por- Section 6 covers related work. We conclude in Section 7. tion of the sequence will be shifted left to reclaim the space. On the other hand, the underlying editing system keeps the 2. COLLABORATIVE EDITING SYSTEMS characters of the document in a selected data structure, such as an array and a linked list. Figure 2 illustrates the idea A collaborative editing system (CES) consists of a set of by a linked list. We call the sequence data structure from users editing a shared document, located on sites connected the user’s perspective logical view and the implementation by communication networks. For simplicity, each site has data structure from the editing system’s perspective physical one user. The users collaborate in a real-time fashion to view. In the rest of the paper, we use the term document as create a shared document. Each site maintains a replica abbreviation for the logical view of a document and the term of the shared document and executes four types of activi- physical document as abbreviation for the physical view of ties: 1) generate local insert, delete, and undo operations; adocument. 2) broadcast local updates to other sites; 3) receive opera- At the logical view, a document is defined by a sequence tions from other sites; 4) execute operations (either local or c of characters E = ci ∈ Σ , 1 ≤ i ≤ n, n ∈ N,where remote update). c Σ is the alphabet of the document. denotes an empty A collaborative editing scenario is shown in Figure 1.