Chapter Three

3. Distributed Transaction Management

Transactions are used to protect shared data. A process can make use of a transaction to access and modify multiple data items as a single atomic operation. If a transaction operates on data distributed across multiple machines, it is a distributed transaction.

There are three main components in a general distributed transaction management organization: the data manager, which performs the actual read/write operations; the scheduler, which handles the concurrency control of operations; and the transaction manager, which manages the entire transaction (Figure 1) [7].

Figure 1: Components in a Distributed Transaction

Each transactional resource maintains the ACID properties locally. A distributed transaction maintains the following global properties:

- Global atomicity: all resource managers must commit, or all must abort

- Global serialization: the distributed transaction must be globally serializable

- Global deadlock freedom: there must be no deadlocks involving multiple sites

3.1. Java Transaction API

The Java Transaction API (JTA) implements the X/Open Distributed Transaction Processing model [12]. X/Open is an independent, worldwide, open systems organization that aims to create and promote a vendor-independent interface standard called the Common Applications Environment (CAE), supported by most of the world's largest information systems suppliers, user organizations and software companies [38].

In the JTA specification, distributed transactions are also referred to as global transactions. In this model there are three components which get involved in a transaction (Figure 2) [12].

- Application Program

- Resource Manager

- Transaction Manager

The application program uses resources from a set of resource managers and interacts with the transaction manager to define transaction boundaries.

Figure 2: Components in a JTA Distributed Transaction

A distributed transaction accesses resource managers distributed across a network. The transaction manager interacts with the resource managers to carry out the commit protocol. Here the resource manager is equivalent to the scheduler and the data manager of Figure 1 put together.

Typically a JTA implementation is used within the J2EE environment. Therefore the application server, or more generically the container, also has an important role to play in such distributed transactions. Figure 3 illustrates the interfaces used and the interactions involved in carrying out a distributed transaction in a JTA implementation.

Figure 3: Interfaces Involved in a JTA Transaction Note: Original is in color

Java Transaction Service (JTS) specifies the implementation of a transaction manager: at the high level it implements JTA, and at the low level it implements the Java mapping of the CORBA Object Transaction Service (OTS) 1.1 specification [13].

The following is a brief description of the execution flow of transaction handling in a JTA implementation [12].

1. A client application invokes a method (in an EJB, or in a POJO in an IoC container) with the TX_REQUIRED transaction attribute. At that point the container starts a global transaction by asking the transaction manager to start a transaction before making the invocation.

2. After the transaction starts, the container invokes the method. The method requires a handle to the connection-based resource; for example, if the resource is a database, it requires a database connection. This handle can be obtained using the API provided by the relevant resource adapter. In the case of Java, the resource adapter for a database is the JDBC driver.

3. At this point the application server (container) obtains a resource from the resource adapter.

4. The resource adapter creates the Transactional Resource object and the associated XAResource and Connection objects.

5. The application server requests the XAResource reference associated with the transactional resource.

6. The application server enlists the resource with the transaction manager.

7. The transaction manager calls the start method of the XAResource to associate the current transaction with the resource.

8. The application server requests a Connection object reference from the transactional resource.

9. The application server returns the Connection object reference to the application.

10. The application performs one or more operations on the connection.

11. The application closes the connection.

12. The application server receives a notification on connection closure from the resource adapter and at that point delists the resource from the transaction manager.

13. The transaction manager calls the end method of the XAResource to disassociate the transaction from the resource.

14. The application server asks the transaction manager to commit the transaction.

15. The transaction manager calls the prepare method of the XAResource to inform the resource manager to prepare the transaction work for commit.

16. If the resource manager is ready the transaction manager calls the commit method of the XAResource to commit the transaction.

Figure 4: Sequence of Events in a JTA Transaction

The sequence diagram in Figure 4 illustrates the sequence of method invocations relevant to the above description in a JTA implementation [12].
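The XAResource call sequence in steps 7, 13, 15 and 16 above can be sketched as a small state machine. The class below is a self-contained illustration, not the real javax.transaction.xa.XAResource interface; the class, state and method names are simplified stand-ins.

```java
// Hedged sketch of the XAResource call order from the steps above:
// start (step 7), end (step 13), prepare (step 15), commit (step 16).
class XALifecycle {
    enum State { IDLE, ASSOCIATED, ENDED, PREPARED, COMMITTED }

    static class SimpleXAResource {
        State state = State.IDLE;

        void start()   { require(State.IDLE);       state = State.ASSOCIATED; } // step 7
        void end()     { require(State.ASSOCIATED); state = State.ENDED; }      // step 13
        void prepare() { require(State.ENDED);      state = State.PREPARED; }   // step 15
        void commit()  { require(State.PREPARED);   state = State.COMMITTED; }  // step 16

        private void require(State expected) {
            if (state != expected)
                throw new IllegalStateException("expected " + expected + ", was " + state);
        }
    }

    public static void main(String[] args) {
        SimpleXAResource xa = new SimpleXAResource();
        xa.start();
        xa.end();
        xa.prepare();
        xa.commit();
        System.out.println(xa.state); // COMMITTED
    }
}
```

A real resource manager would additionally support rollback, recovery and the XA flag arguments, all of which are omitted here.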

3.2. Two Phase Commit Protocol

An atomic commit protocol, initiated by the coordinator, ensures global atomicity. To maintain atomicity in a distributed transaction, the protocol must allow the whole transaction to be aborted if any one of the participants (resource managers) aborts the transaction. The two phase commit protocol is designed to handle this [7].

Figure 5: Two Phase Commit Protocol State Changes in the Coordinator

In the first phase of the two phase commit protocol, the coordinator sends a VOTE_REQUEST message to all participants. When a participant receives the VOTE_REQUEST message from the coordinator, it replies with either VOTE_COMMIT or VOTE_ABORT, depending on its ability to commit its part of the transaction locally. The coordinator collects all the votes; if all participants have replied with VOTE_COMMIT, it also votes for a commit and sends a GLOBAL_COMMIT message to all participants. If at least one participant has replied VOTE_ABORT, it sends a GLOBAL_ABORT message to all participants, aborting the distributed transaction. Each participant that has voted for a commit waits for the final message from the coordinator: if it receives a GLOBAL_COMMIT message it locally commits its part of the transaction, and if it receives a GLOBAL_ABORT message it locally aborts its part of the transaction.

Figure 6: Two Phase Commit Protocol State Changes in the participant
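The coordinator's decision rule described above can be captured in a few lines. The enums mirror the message names in the text, while the class itself is only an illustrative sketch: there is no networking, failure handling or logging.

```java
import java.util.List;

// Minimal sketch of the two phase commit decision logic: phase one has
// already collected one vote per participant in reply to VOTE_REQUEST,
// and phase two derives the global decision from those votes.
class TwoPhaseCommit {
    enum Vote { VOTE_COMMIT, VOTE_ABORT }
    enum Decision { GLOBAL_COMMIT, GLOBAL_ABORT }

    static Decision decide(List<Vote> votes) {
        for (Vote v : votes) {
            if (v == Vote.VOTE_ABORT) {
                return Decision.GLOBAL_ABORT; // one abort vote aborts the whole transaction
            }
        }
        return Decision.GLOBAL_COMMIT;        // unanimous commit votes
    }

    public static void main(String[] args) {
        System.out.println(decide(List.of(Vote.VOTE_COMMIT, Vote.VOTE_COMMIT)));
        System.out.println(decide(List.of(Vote.VOTE_COMMIT, Vote.VOTE_ABORT)));
    }
}
```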

3.2.1. Two Phase Commit Protocol under Failure

Let us now evaluate the challenges of implementing a two phase commit protocol under failure. In a standard communication environment, even if there are no Byzantine failures, there can be communication failures and process failures. Any such failure could block the protocol execution. Typically, timeout mechanisms are used to identify these types of failures.

There are three scenarios where the protocol execution could get blocked.

- A blocked coordinator in the WAIT state, waiting for VOTE_COMMIT or VOTE_ABORT from a participant

- A blocked participant in the INIT state, waiting for VOTE_REQUEST from the coordinator

- A blocked participant in the READY state, waiting for GLOBAL_COMMIT or GLOBAL_ABORT from the coordinator

Since the coordinator is the one who initiates the protocol, it is not going to wait in the INIT state. Also, the coordinator does not expect an acknowledgment for GLOBAL_COMMIT or GLOBAL_ABORT, so it is not going to wait in those states either. A participant is not going to get blocked in the COMMIT or ABORT state, since these are the last states of the protocol execution.

3.2.1.1. Blocked Coordinator in the WAIT State

The loss of a VOTE_REQUEST, VOTE_COMMIT or VOTE_ABORT message, or a crashed participant, could cause the coordinator to time out in the WAIT state (Figure 7).

Figure 7: Coordinator Blocked at State WAIT Note: Original is in color

If a timeout occurs in this state, the coordinator votes for an abort and sends a GLOBAL_ABORT to all participants.

3.2.1.2. Blocked Participant in the INIT State

The loss of the VOTE_REQUEST message or a crashed coordinator could cause a participant to time out in the INIT state (Figure 8).

If a timeout occurs at this stage, the participant locally aborts the transaction and sends a VOTE_ABORT to the coordinator.

3.2.1.3. Blocked Participant in the READY State

The loss of a GLOBAL_COMMIT or GLOBAL_ABORT message, or a crashed coordinator, could cause a participant to time out in the READY state (Figure 9).

Figure 8: Participant Blocked at State INIT Note: Original is in color

Figure 9: Participant Blocked at State READY Note: Original is in color

In this case the participant has to find out whether the coordinator has sent a GLOBAL_COMMIT message or a GLOBAL_ABORT message. The participant can contact another participant and check its state. If the second participant has received a GLOBAL_COMMIT or a GLOBAL_ABORT, the participant that did not receive the message can decide whether to locally commit or locally abort its part of the transaction. The second participant could still be in the INIT state if the coordinator crashed while multicasting the VOTE_REQUEST message; in that case it is safe to locally abort the transaction. If the second participant is also in the READY state, then no decision can be made, since to commit the transaction all participants must vote for it. In this case the participant has to wait until the coordinator recovers and resends the decision.
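The timeout actions of sections 3.2.1.1 to 3.2.1.3 can be summarized as a decision table. This is a hedged sketch: the state and message names follow the text, but the messaging layer and the choice of which participant to contact are elided, and the action strings are illustrative labels only.

```java
import java.util.Optional;

// Decision table for the three blocking scenarios of the two phase
// commit protocol, as described in sections 3.2.1.1 to 3.2.1.3.
class TimeoutActions {
    enum State { INIT, WAIT, READY, COMMIT, ABORT }

    // 3.2.1.1: coordinator blocked in WAIT votes abort and sends GLOBAL_ABORT.
    static String coordinatorOnTimeout(State s) {
        if (s == State.WAIT) return "GLOBAL_ABORT";
        throw new IllegalStateException("coordinator does not block in " + s);
    }

    // 3.2.1.2: participant blocked in INIT aborts locally and replies VOTE_ABORT.
    // 3.2.1.3: participant blocked in READY consults another participant's state.
    static String participantOnTimeout(State own, Optional<State> peer) {
        switch (own) {
            case INIT:
                return "VOTE_ABORT";
            case READY: {
                State p = peer.orElseThrow();
                switch (p) {
                    case COMMIT: return "LOCAL_COMMIT"; // peer saw GLOBAL_COMMIT
                    case ABORT:  return "LOCAL_ABORT";  // peer saw GLOBAL_ABORT
                    case INIT:   return "LOCAL_ABORT";  // coordinator crashed mid-multicast
                    default:     return "BLOCK";        // peer also READY: wait for coordinator
                }
            }
            default:
                throw new IllegalStateException("participant does not block in " + own);
        }
    }

    public static void main(String[] args) {
        System.out.println(coordinatorOnTimeout(State.WAIT));
        System.out.println(participantOnTimeout(State.READY, Optional.of(State.READY)));
    }
}
```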

3.2.1.4. Process Recovery

When a crashed process recovers, it needs to figure out the next steps to follow in order to either complete the transaction, if possible, or abort it. Hence, during the protocol execution a process should save its state so that it can execute a recovery algorithm once it recovers from a crash.

A coordinator could be in any of the states INIT, WAIT, COMMIT or ABORT when a crash occurs. The coordinator does not need to keep track of the INIT state, since that is the default state on recovery. Even if the coordinator crashed during the transmission of the VOTE_REQUEST message, it is acceptable to retransmit the message.

If it was in the WAIT state, it should retransmit the VOTE_REQUEST message, since there could be participants who have not received it.

If the coordinator was in the COMMIT or the ABORT state it should retransmit the decision to the participants since there could be participants who have not received the decision.

A participant could be in any of the states INIT, READY, COMMIT or ABORT when a crash occurs. The participant needs to keep track of all these states in order to perform a successful crash recovery.

If it was in the INIT state it can safely abort its part of the transaction locally and send a VOTE_ABORT message to the coordinator.

If the participant was in the COMMIT or the ABORT state, it should resend the decision to the coordinator. However, the coordinator may have timed out by this time and aborted the transaction, which could cause the recovered participant to time out again. At that point the participant can safely abort the transaction locally.

If the participant was in the READY state, it has to contact another participant to find out whether the coordinator has sent a GLOBAL_COMMIT message or a GLOBAL_ABORT message. This is similar to the situation discussed in section 3.2.1.3, Blocked Participant in the READY State.
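The recovery rules above amount to a mapping from the logged protocol state to the next action. The following sketch makes that mapping explicit; the action strings are illustrative labels for this sketch, not part of any protocol specification.

```java
// Sketch of the recovery actions in section 3.2.1.4: each process logs
// its protocol state, and on restart maps the logged state to a next step.
class Recovery {
    enum State { INIT, WAIT, READY, COMMIT, ABORT }

    static String coordinatorOnRecover(State logged) {
        switch (logged) {
            case INIT:   // default state on recovery; retransmitting is safe
            case WAIT:   return "RETRANSMIT VOTE_REQUEST";
            case COMMIT: return "RETRANSMIT GLOBAL_COMMIT";
            case ABORT:  return "RETRANSMIT GLOBAL_ABORT";
            default:     throw new IllegalStateException("coordinator never logs " + logged);
        }
    }

    static String participantOnRecover(State logged) {
        switch (logged) {
            case INIT:   return "LOCAL_ABORT AND SEND VOTE_ABORT";
            case READY:  return "ASK ANOTHER PARTICIPANT"; // as in section 3.2.1.3
            case COMMIT: return "RESEND COMMIT DECISION";
            case ABORT:  return "RESEND ABORT DECISION";
            default:     throw new IllegalStateException("participant never logs " + logged);
        }
    }

    public static void main(String[] args) {
        System.out.println(coordinatorOnRecover(State.WAIT));
        System.out.println(participantOnRecover(State.READY));
    }
}
```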

3.2.2. Pseudo Code for Two Phase Commit Protocol

Figure 10 illustrates the algorithm to be followed by the coordinator. Figure 11 and Figure 12 illustrate the algorithms to be followed by the participants [7].

Figure 10: Pseudo Code Algorithm for the Coordinator

3.3. Global Serialization and Distributed Deadlocks

The two phase commit protocol ensures the global atomicity of a transaction. Global serialization is achieved using concurrency control algorithms. Serialization guarantees that multiple transactions can be executed simultaneously while still remaining isolated from each other. Each resource manager has a scheduler which handles concurrent operations in local transactions.

Figure 11: Pseudo Code Algorithm for Participants

Figure 12: Pseudo Code Algorithm for Participants' Decision Request

3.3.1. Concurrency Control in a Local Transaction

The most widely used concurrency control technique is to make use of locks. Two-phase locking [7] and strict two-phase locking [7] are two popular algorithms used for concurrency control. The disadvantage of locking techniques is that the use of locks could lead to deadlocks. Another approach to concurrency control is timestamp ordering [7]. In timestamp ordering, transactions do not wait for shared resources; hence it is deadlock free. However, it has a higher probability of transaction aborts compared to locking. A third approach to concurrency control is optimistic concurrency control [7]. This is again a deadlock-free technique which maximizes parallelism, but in an environment with a heavy load that includes many write operations it is a poor choice.
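As an illustration of the locking discipline, strict two-phase locking can be sketched from a single transaction's point of view. The class below is a simplified stand-in for a real lock manager, assumed for illustration: there are no lock modes and no blocking, only the two-phase rule itself.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Minimal sketch of strict two-phase locking: locks may only be acquired
// while none have been released (growing phase), and all locks are
// released together when the transaction commits or aborts, which is
// what makes the protocol "strict".
class StrictTwoPhaseLocking {
    private final Set<String> held = new LinkedHashSet<>();
    private boolean finished = false;

    // growing phase: acquire a lock on a resource
    void acquire(String resource) {
        if (finished) throw new IllegalStateException("cannot lock after releasing");
        held.add(resource);
    }

    // commit or abort: the single point where every held lock is released
    Set<String> finish() {
        finished = true;
        Set<String> released = new LinkedHashSet<>(held);
        held.clear();
        return released;
    }

    public static void main(String[] args) {
        StrictTwoPhaseLocking tx = new StrictTwoPhaseLocking();
        tx.acquire("A");
        tx.acquire("C");
        System.out.println(tx.finish()); // [A, C]
    }
}
```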

Deadlock prevention is not a popular approach, since it unnecessarily restricts access to shared resources and greatly reduces concurrency. The wait-for graph [24] is a technique used for deadlock detection. Typically the lock manager is responsible for deadlock detection. When a deadlock is detected, one of the transactions that is part of the deadlock is aborted to break the cycle. A commonly used approach is lock timeouts: when a lock times out, if there is another transaction waiting for that resource, the lock is broken, causing the transaction to be aborted.
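Deadlock detection with a wait-for graph reduces to finding a cycle, for example by depth-first search, as in this minimal sketch. An edge T -> U means transaction T waits for a lock held by U; the class and method names are illustrative only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A minimal wait-for graph with cycle detection by depth-first search.
class WaitForGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();

    void addWait(String waiter, String holder) {
        edges.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
    }

    boolean hasDeadlock() {
        Set<String> visiting = new HashSet<>(), done = new HashSet<>();
        for (String t : edges.keySet())
            if (dfs(t, visiting, done)) return true;
        return false;
    }

    private boolean dfs(String t, Set<String> visiting, Set<String> done) {
        if (visiting.contains(t)) return true; // back edge: a cycle exists
        if (done.contains(t)) return false;
        visiting.add(t);
        for (String next : edges.getOrDefault(t, Set.of()))
            if (dfs(next, visiting, done)) return true;
        visiting.remove(t);
        done.add(t);
        return false;
    }

    public static void main(String[] args) {
        WaitForGraph g = new WaitForGraph();
        g.addWait("T", "U");
        System.out.println(g.hasDeadlock()); // false: no cycle yet
        g.addWait("U", "T");
        System.out.println(g.hasDeadlock()); // true: T and U wait on each other
    }
}
```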

3.3.2. Concurrency Control in a Distributed Transaction

It can be seen that from the resource manager's point of view there is no difference between concurrent local transactions and concurrent distributed transactions. However, in a distributed transaction, while each resource manager ensures local serialization of transactions, there is a joint responsibility for ensuring global serialization. Even if every local schedule of a single database is serializable, the global schedule of the whole system is not necessarily serializable. Distributed two phase locking, distributed timestamp ordering, and commitment ordering are a few prominent global serialization techniques.

3.3.2.1. Global Serialization Using Distributed Two Phase Locking

A lock manager responsible for managing locks for distributed resources across concurrent distributed transactions is required. Each resource manager is expected to obtain a lock from the lock manager before performing any operation on its local resources. There are two types of distributed two phase locking techniques, based on the lock manager implementation.

- Centralized Two Phase Locking

- Primary Two Phase Locking

In centralized two phase locking, one site is appointed as the lock manager. When the number of sites and the number of concurrent transactions are high, the lock manager could become a bottleneck. Another disadvantage of this approach is the lower reliability of the system, due to the central lock manager being a single point of failure.

In primary two phase locking, each site is made responsible for managing global locks for its own resources. This approach overcomes both problems of centralized two phase locking: lock management is distributed across different sites, and a failure of a single site does not bring the entire system down. However, handling deadlocks becomes difficult, since lock management is distributed across multiple sites.

Figure 13: Example of a Distributed Deadlock Note: Original is in color

It can be observed that if a locking policy is used for concurrency control, since the locks are held globally for each resource and different servers may have different local orderings, there is a higher possibility of distributed deadlocks. An example of this is demonstrated in the event ordering illustrated in Figure 13. In this example, transaction T waits for U's write lock on A while holding a write lock on C, and at the same time transaction U waits for T's write lock on C while holding a write lock on A. Distributed deadlock prevention and distributed deadlock detection are discussed in section 3.3.3 and section 3.3.4 respectively.

3.3.2.2. Global Serialization Using Timestamp Ordering

In timestamp ordering concurrency control, serial equivalence is enforced by committing the versions of objects in the order of the timestamps of the transactions that accessed them. To ensure global uniqueness and ordering, a timestamp is generated using Lamport's algorithm [7].

This globally unique transaction timestamp is issued to the client by the first resource manager accessed by a transaction. It is then passed to the resource managers whose objects perform an operation in the transaction. Resource managers can make use of this globally unique timestamp to ensure that the operations are carried out in a serially equivalent manner. A detailed description of timestamp ordering concurrency control can be found in [7].

In timestamp ordering concurrency control, conflicts are resolved as each operation is performed. If a conflict occurs which requires the transaction to be aborted, the transaction manager is informed. A transaction which could have waited and completed successfully under a lock-based concurrency control scheme may have to abort under a timestamp ordering scheme. The advantage of a timestamp ordering scheme for distributed transaction management, however, is that it is free of distributed deadlocks.
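The basic timestamp ordering rule can be illustrated with per-object read and write timestamps. The sketch below shows only the conflict test, assuming timestamps are already globally unique (for example, generated with Lamport's algorithm); versioning and recovery are elided, and the names are illustrative.

```java
// Sketch of the basic timestamp ordering rule: an operation may proceed
// only if the transaction's timestamp is not older than the conflicting
// timestamps already recorded on the object; otherwise the transaction
// must abort (it arrived "too late").
class TimestampOrdering {
    static class Item {
        long readTs = 0, writeTs = 0; // largest timestamps seen so far
    }

    // returns true if the read may proceed, false if the transaction aborts
    static boolean read(Item item, long ts) {
        if (ts < item.writeTs) return false; // would read an already-overwritten version
        item.readTs = Math.max(item.readTs, ts);
        return true;
    }

    // returns true if the write may proceed, false if the transaction aborts
    static boolean write(Item item, long ts) {
        if (ts < item.readTs || ts < item.writeTs) return false; // too late to write
        item.writeTs = ts;
        return true;
    }

    public static void main(String[] args) {
        Item a = new Item();
        System.out.println(write(a, 2)); // true
        System.out.println(read(a, 1));  // false: older than the last writer, abort
        System.out.println(read(a, 3));  // true
    }
}
```

Note how the second call aborts a transaction that, under locking, would simply have waited; this is the trade-off discussed above.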

3.3.2.3. Global Serialization Using Commitment Ordering

In transaction processing, commitment order is a property of the transaction scheduler's transaction history. It has been identified that this property can be used to effectively achieve global serialization of distributed transactions [41].

In commitment ordering, if T1 and T2 are two concurrent committed transactions where T1 precedes T2, a commitment ordered schedule ensures that T1 commits before T2 commits. In order to achieve global commitment order, a scheduler can delay its vote in the two phase commit protocol for a particular transaction until all preceding transactions have completed.

Since transactions are scheduled independently by each local scheduler, there could be a different precedence ordering of transactions at each scheduler. In such scenarios, when a transaction tries to commit, it is not going to receive all the votes required to complete the two phase commit protocol, and it will eventually be aborted rather than blocked indefinitely. It can therefore be seen that commitment ordering is deadlock free.

Let us consider three concurrent distributed transactions T1, T2 and T3, with the global precedence order T1 > T2 > T3. Scheduler A and scheduler B are involved in all three transactions (Figure 14).

          T1          T2          T3
        A     B     A     B     A     B
        a1          a3                b2
        a2                b1    a2
              b1          b2

Figure 14: Resource access order of each transaction

Figure 15: Transaction schedules at each resource manager (Schedule A, time slots t1, t2, t4: operations a1, a2 of T1, a3 of T2, a2 of T3; Schedule B, time slots t3, t4, t5: operations b1, b2 of T2, b1 of T1, b2 of T3)

The scheduled order of these transactions at each scheduler could differ from the global precedence order. Figure 15 illustrates one such possible schedule. Here, the precedence order at scheduler B is T2 > T1 > T3.

It can be seen that even though the transactions are locally serialized by each scheduler, they are not globally serialized. In this schedule, transactions T1 and T3 will try to complete first, but scheduler A will delay its vote for transaction T3 until transaction T2 has completed (either committed or aborted), in order to maintain the commitment order property. This guarantees global serialization of the transactions.
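The vote-delay rule above can be sketched as follows: the scheduler records the local precedence order of transactions and withholds its two phase commit vote while any predecessor is incomplete. The class and method names are illustrative assumptions, not part of any commitment ordering implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the commitment ordering vote delay: a scheduler answers the
// coordinator's vote request for a transaction only after every transaction
// that locally precedes it has completed (committed or aborted).
class CommitmentOrdering {
    private final List<String> localPrecedence = new ArrayList<>(); // order of first access
    private final Set<String> completed = new HashSet<>();

    void schedule(String tx) { localPrecedence.add(tx); }
    void complete(String tx) { completed.add(tx); }

    boolean mayVote(String tx) {
        for (String earlier : localPrecedence) {
            if (earlier.equals(tx)) return true;            // no incomplete predecessor
            if (!completed.contains(earlier)) return false; // predecessor still running
        }
        return true; // transaction unknown here: nothing to wait for
    }

    public static void main(String[] args) {
        CommitmentOrdering b = new CommitmentOrdering();
        b.schedule("T2"); b.schedule("T1"); b.schedule("T3"); // local order T2 > T1 > T3
        System.out.println(b.mayVote("T3")); // false: T2 and T1 are incomplete
        b.complete("T2"); b.complete("T1");
        System.out.println(b.mayVote("T3")); // true
    }
}
```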

3.3.3. Distributed Deadlock Prevention

Deadlocks can be prevented if there is prior knowledge of the resources acquired by a transaction. This is not practical in an interactive application: it is not possible to decide in advance which resources, and in what order, a transaction may acquire. Therefore distributed deadlock prevention strategies are not popular in distributed transaction handling.

3.3.4. Distributed Deadlock Detection

Deadlocks are rare. Hence, deadlock detection schemes are popular in distributed transaction handling. In deadlock detection schemes, a transaction is aborted when it is involved in a deadlock.

3.3.4.1. Distributed Deadlock Detection by Edge Chasing

A distributed approach to deadlock detection is edge chasing, also called path pushing (Figure 16). In this approach each resource manager has knowledge of the edges it is involved in within the global wait-for graph. This information is forwarded in messages which follow the edges of the graph. A resource manager receives information about the global wait-for graph from other resource managers through the same edges. Once such information is received, the resource manager adds its own edge information to the message and forwards it. By analyzing these messages, deadlocks can be detected.

When a deadlock is detected, a transaction in the cycle is aborted to break the deadlock. The transaction to be aborted can be decided by assigning each transaction a priority. Transactions can be given priorities using Lamport's algorithm in such a way that they are totally ordered. When a deadlock is detected, the transaction with the lowest priority is aborted.

Figure 16: Edge Chasing Algorithm Note: Original is in color
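A minimal probe-forwarding sketch of edge chasing is shown below: each site knows only its own wait-for edges, and a deadlock is reported when the probe reaches a transaction it has already visited. The data layout (one map of edges per site) is an assumption for illustration; real implementations forward the probe as a message between sites.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of edge chasing: a probe listing the transactions seen so far is
// forwarded along wait-for edges, each site contributing the edge it knows.
class EdgeChasing {
    // each site maps a waiting transaction to the transaction it waits for
    static boolean probeFindsCycle(List<Map<String, String>> sites, String start) {
        List<String> probe = new ArrayList<>();
        probe.add(start);
        String current = start;
        while (true) {
            String next = null;
            for (Map<String, String> site : sites) { // find the site holding this edge
                if (site.containsKey(current)) {
                    next = site.get(current);
                    break;
                }
            }
            if (next == null) return false;        // no outgoing edge: not blocked further
            if (probe.contains(next)) return true; // probe met a known transaction: deadlock
            probe.add(next);
            current = next;
        }
    }

    public static void main(String[] args) {
        Map<String, String> siteA = new HashMap<>();
        siteA.put("T", "U"); // at site A, T waits for U
        Map<String, String> siteB = new HashMap<>();
        siteB.put("U", "T"); // at site B, U waits for T
        System.out.println(probeFindsCycle(List.of(siteA, siteB), "T")); // true
    }
}
```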

3.3.4.2. Distributed Deadlock Detection by Timeouts

If a transaction has been waiting too long for a lock, it could be in a deadlock cycle. If the application does not have long-running transactions, this provides a simple, pessimistic approach to deadlock detection. This is the approach used in the JBoss Transaction Service for deadlock resolution (as of October 2007).
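A lock timeout can be sketched with the standard java.util.concurrent primitives: a transaction that cannot acquire a ReentrantLock within the bound treats the situation as a possible deadlock and aborts. The method name and the abort-by-returning-false convention are illustrative assumptions.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of timeout-based deadlock resolution: waiting is bounded, and a
// timed-out transaction gives up (aborts) instead of blocking forever.
class LockTimeout {
    static boolean runWithTimeout(ReentrantLock lock, long millis)
            throws InterruptedException {
        if (!lock.tryLock(millis, TimeUnit.MILLISECONDS)) {
            return false; // possible deadlock: abort this transaction
        }
        try {
            return true;  // lock acquired: the operation would run here
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        ReentrantLock lock = new ReentrantLock();
        System.out.println(runWithTimeout(lock, 10)); // true: uncontended

        lock.lock(); // hold the lock from this thread...
        Thread other = new Thread(() -> {
            try {
                // ...so the "other transaction" times out and aborts
                System.out.println(runWithTimeout(lock, 10));
            } catch (InterruptedException ignored) { }
        });
        other.start();
        other.join();
        lock.unlock();
    }
}
```

Note that the timeout bound trades false positives (a slow but deadlock-free transaction is aborted) against detection latency, which is why the text calls this suitable only when transactions are short.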
