Transaction Processing and Management in Distributed Database Systems

IJCST Vol. 2, Issue 3, September 2011
ISSN: 2229-4333 (Print) | ISSN: 0976-8491 (Online)
International Journal of Computer Science and Technology, www.ijcst.com, p. 282

Gunjan Verma (1), Vineeta Verma (2), Komal Singhal (3), Megha Maheshwari (4)
(1) Meerut Institute of Engineering & Technology, Meerut
(2) Sardar Vallabhbhai Patel University of Agriculture & Technology, Meerut
(3) Meerut Institute of Engineering & Technology, Meerut
(4) Meerut Institute of Engineering & Technology, Meerut

Abstract
Distributed database systems (DDBS) face many problems when databases are accessed in a distributed, multiuser environment and when data is replicated. Access control and transaction management require dedicated processes to handle and monitor data access and updates in a DDBS. Multi-tier client/server networks make a DDBS a better solution for accessing and controlling databases. Some leading database management systems use the two-phase commit technique to maintain a consistent state for the database. The objective of this paper is to explain transaction processing and management in DDBS and how this technique can be implemented.

Keywords
Transaction, commit, nodes, DB2, Commit phase.

1. Introduction to Distributed Database System (DDBS)
A distributed database is a database that is under the control of a central database management system (DBMS) and whose storage devices are not all attached to a common CPU. It may be stored on multiple computers located in the same physical location, or it may be dispersed over a network of interconnected computers. Collections of data can be distributed across multiple physical locations. A distributed database system (DDBS) is a system whose data is distributed and replicated over several locations. Data may be partitioned across the network using horizontal and vertical fragmentation, which are analogous to the selection and projection operations, respectively, in Structured Query Language (SQL). A distributed database shares the access control and transaction management problems of a centralized database, such as concurrent user access control and deadlock detection and resolution. However, a DDBS must also cope with additional problems: access control and transaction management in a DDBS need different methods to monitor data access and updates to distributed and replicated databases. IBM DB2, a database management system (DBMS), employs the two-phase commit technique to maintain a consistent state for its databases. This paper describes the implementation of two-phase commit for transaction management in a DDBMS and how DB2 implements this technique.

Distributed Database Security
For distributed database systems, the database supports all of the security features that are available in a non-distributed database environment, including:
• Password authentication for users and roles
• Some types of external authentication for users and roles, including:
  o Kerberos version 5 for connected user links
  o DCE for connected user links

2. Properties of Transactions
A transaction has four properties that lead to the consistency and reliability of a distributed database. These are Atomicity, Consistency, Isolation, and Durability.

Atomicity: Atomicity requires that database modifications follow an “all or nothing” rule. Each transaction is said to be atomic. If one part of the transaction fails, the entire transaction fails and the database state is left unchanged. It is critical that the database management system maintain the atomic nature of transactions in spite of any application, DBMS, operating system, or hardware failure. An atomic transaction cannot be subdivided and must be processed in its entirety or not at all. Atomicity means that users do not have to worry about the effect of incomplete transactions.

Transactions can fail for several kinds of reasons:
1. Hardware failure: A disk drive fails, preventing some of the transaction’s database changes from taking effect.
2. System failure: The user loses their connection to the application before providing all necessary information.
3. Database failure: For example, the database runs out of room to hold additional data.
4. Application failure: The application attempts to post data that violates a rule that the database itself enforces, such as attempting to insert a duplicate value in a column.

Consistency: This property refers to correctness and deals with maintaining consistent data in a database system. Consistency falls under the subject of concurrency control. For example, “dirty data” is data that has been modified by a transaction that has not yet committed. The job of concurrency control is therefore to prevent transactions from reading or updating “dirty data.”

Isolation: According to this property, each transaction should see a consistent database at all times. Consequently, no other transaction can read or [...]

Durability: Durability is the ability of the DBMS to recover committed transaction updates after any kind of system failure (hardware or software). Durability is the DBMS’s guarantee that once the user has been notified of a transaction’s success, the transaction will not be lost, the transaction’s data changes will survive system failure, and all integrity constraints have been satisfied, so the DBMS will not need to reverse the transaction. Many DBMSs implement durability by writing transactions into a transaction log that can be reprocessed to recreate the system state as it was right before any later failure. A transaction is deemed committed only after it is entered in the log. Durability does not imply a permanent state of the database. A subsequent transaction may modify data changed by a prior transaction without violating the durability principle.

3. Transaction Processing in a Distributed System
A transaction is a logical unit of work consisting of one or more SQL statements executed by a single user. A transaction begins with the user’s first executable SQL statement and ends when it is committed or rolled back by that user. A remote transaction contains only statements that access a single remote node. A distributed transaction contains statements that access more than one node: it includes one or more statements that, individually or as a group, update data on two or more distinct nodes of a distributed database.

4. Two-Phase Commit of Transactions in a Distributed Database System
In transaction processing, databases, and computer networking, the two-phase commit protocol (2PC) is a type of atomic commitment protocol (ACP). It is a distributed algorithm that coordinates all the processes that participate in a distributed atomic transaction on whether to commit or abort (roll back) the transaction (it is a specialized type of consensus protocol). The protocol achieves its goal even in many cases of temporary system failure (involving process, network node, communication, and other failures), and is thus widely utilized.[1-3] However, it is not resilient to all possible failure configurations, and in rare cases user (e.g., a system administrator) intervention is needed to remedy an outcome. To accommodate recovery from failure (automatic in most cases), the protocol’s participants log the protocol’s states. Log records, which are typically slow to generate but survive failures, are used by the protocol’s recovery procedures. Many protocol variants exist that differ primarily in logging strategies and recovery mechanisms. Though usually intended to be used infrequently, recovery procedures comprise a substantial portion of the protocol, because of the many possible failure scenarios that the protocol must consider and support. In a “normal execution” of any single distributed transaction, i.e., when no failure occurs, which is typically the most frequent situation, the protocol comprises two phases:
1. The commit-request phase (or voting phase), in which a coordinator process attempts to prepare all the transaction’s participating processes (named participants, cohorts, or workers) to take the necessary steps for either committing or [...]

1. The coordinator sends a commit message to all the cohorts.
2. Each cohort completes the operation and releases all the locks and resources held during the transaction.
3. Each cohort sends an acknowledgment to the coordinator.
4. The coordinator completes the transaction when all acknowledgments have been received.

Failure
If any cohort votes No during the commit-request phase (or the coordinator’s timeout expires):
1. The coordinator sends a rollback message to all the cohorts.
2. Each cohort undoes the transaction using the undo log and releases the resources and locks held during the transaction.
3. Each cohort sends an acknowledgment to the coordinator.
4. The coordinator undoes the transaction when all acknowledgments have been received.

5. Two-Phase Commit: DB2 Database Management System
The DB2 database is a distributed database management system that employs two-phase commit to achieve and maintain data reliability. The following sections explain DB2’s two-phase commit implementation procedures.

How Sessions Are Maintained Between Nodes
In each transaction, DB2 constructs a session tree for the participating nodes. The session tree describes the relations between the nodes participating in any given transaction. Each node plays one or more of the following roles:
1. Client: A client is a node that references data from another node.
2. Database Server: A server
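
The commit and rollback steps of the two-phase commit protocol described in Section 4 can be illustrated with a minimal Python sketch. The Cohort class, its will_vote_yes flag, and the two_phase_commit function are hypothetical names introduced here for illustration only; they are not part of DB2 or of any real DBMS interface. Timeouts, persistent logging, and recovery are deliberately left out, so the sketch covers only the failure-free message flow and the case in which a cohort votes No.

```python
from dataclasses import dataclass, field


@dataclass
class Cohort:
    """Hypothetical in-memory participant (assumption, not a DB2 API)."""
    name: str
    will_vote_yes: bool = True
    state: str = "idle"
    log: list = field(default_factory=list)

    def prepare(self) -> bool:
        # Commit-request (voting) phase: record the vote before replying.
        vote = self.will_vote_yes
        self.log.append("prepared" if vote else "vote-no")
        self.state = "prepared" if vote else "aborted"
        return vote

    def commit(self) -> str:
        # Complete the operation; a real cohort would release locks here.
        self.log.append("commit")
        self.state = "committed"
        return "ack"

    def rollback(self) -> str:
        # Undo the work; a real cohort would replay its undo log here.
        self.log.append("rollback")
        self.state = "aborted"
        return "ack"


def two_phase_commit(cohorts) -> str:
    """Coordinator logic: returns "committed" or "aborted"."""
    cohorts = list(cohorts)
    # Phase 1: collect a vote from every cohort.
    votes = [c.prepare() for c in cohorts]
    # Phase 2: commit only if every cohort voted Yes, otherwise roll back.
    if all(votes):
        acks = [c.commit() for c in cohorts]
        outcome = "committed"
    else:
        acks = [c.rollback() for c in cohorts]
        outcome = "aborted"
    # The coordinator finishes once every cohort has acknowledged.
    assert len(acks) == len(cohorts)
    return outcome


if __name__ == "__main__":
    nodes = [Cohort("A"), Cohort("B", will_vote_yes=False)]
    print(two_phase_commit(nodes))  # prints "aborted"
```

A single No vote is enough to roll back every cohort, which is how the protocol preserves atomicity across nodes. A production implementation would also write each state change to stable storage before replying, as the discussion of log records in Section 4 requires.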
