(12) United States Patent -- Mendel
(10) Patent No.: US 6,212,595 B1
(45) Date of Patent: *Apr. 3, 2001

(54) COMPUTER PROGRAM PRODUCT FOR FENCING A MEMBER OF A GROUP OF PROCESSES IN A DISTRIBUTED PROCESSING ENVIRONMENT

(75) Inventor: Gili Mendel, Cary, NC (US)

(73) Assignee: International Business Machines Corporation, Armonk, NY (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days. This patent is subject to a terminal disclaimer.

(21) Appl. No.: 09/124,672

(22) Filed: Jul. 29, 1998

(51) Int. Cl.7: G06F 12/00; G06F 17/60

(52) U.S. Cl.: 711/1; 709/220; 711/154; 711/156; 711/159

(58) Field of Search: 711/162, 154, 165, 201, 1, 156, 159; 714/7-8, 4; 709/220; 370/254, 258

(56) References Cited

U.S. PATENT DOCUMENTS

  4,683,563    7/1987   Rouse et al. ............ 370/224
  4,919,545    4/1990   Yu ...................... 380/25
  5,301,283    4/1994   Thacker et al. .......... 395/325
  5,313,585 *  5/1994   Jeffries et al. ......... 711/201
  5,386,551    1/1995   Chikira et al. .......... 395/575
  5,416,921    5/1995   Frey et al. ............. 395/575
  5,423,044    6/1995   Sutton et al. ........... 395/725
  5,568,491 * 10/1996   Beal et al. ............. 714/746
  5,675,724 * 10/1997   Beal et al. ............. 714/4
  5,963,963 * 10/1999   Schmuck et al. .......... 707/205
  5,991,264 * 11/1999   Croslin ................. 370/225
  5,996,075   11/1999   Matena .................. 713/200
  5,999,712 * 12/1999   Moiin et al. ............ 395/200.5
  6,038,604    3/2000   Bender et al. ........... 709/233

OTHER PUBLICATIONS

Chung-Sheng Li et al., "Automatic Fault Detection, Isolation, and Recovery in Transparent All-Optical Networks", Journal of Lightwave Technology, pp. 1784-1793, Oct. 1997.*
Y. Ofek et al., "Generating a Fault-Tolerant Global Clock Using High-Speed Control Signals for the MetaNet Architecture", IEEE Transactions on Communications, pp. 2179-2188, May 1994.*
G. Alari et al., "Fault-Tolerant Hierarchical Routing", IEEE International Conference on Performance, Computing, and Communications, pp. 159-165, 1997.*
Aldred, M., "A Distributed Lock Manager on Fault Tolerant MPP", System Sciences, pp. 134-136, 1995.*
Sankar, R., et al., "An Automatic Failure Isolation and Reconfiguration Methodology for Fiber Distributed Data Interface (FDDI)", Communications, pp. 186-190, 1992.*
* cited by examiner

Primary Examiner: Reginald G. Bragdon
Assistant Examiner: Pierre-Michel Bataille
(74) Attorney, Agent, or Firm: Floyd A. Gonzalez; Lawrence D. Cutter

(57) ABSTRACT

A computer program product for use in a distributed processing system having a plurality of nodes wherein selected nodes are fenced or unfenced from selected ones of peripheral device server nodes in a fence/unfence operation. A common memory is provided for storing a fence map listing nodes fenced from server nodes. In the fence/unfence operation, a request processing node proposes changes to the fence map, and if no node fails during the fence/unfence operation, the proposed changes are changed into committed changes. If a node fails during the fence/unfence operation, the proposed changes are erased, the previous committed changes are restored, and the fence/unfence request is removed from the queue for processing by the request processing node.

10 Claims, 7 Drawing Sheets

[Drawing sheets 1-7 (only the figure identifications are recoverable from the scanned pages):
Front page: representative figure (excerpt of FIG. 5C).
Sheet 2: FIG. 1, schematic of the distributed computer system 100.
Sheet 3: FIG. 2, the nodes and VSDs participating in the fence/unfence operation; FIGS. 3 and 4, the fence map 202 stored in the SDR 114, with commit flag and bit map columns.
Sheet 4: FIG. 5A, flowchart of request queue handling and Phase 1 (steps 501-522).
Sheet 5: FIG. 5B, flowchart of Phase 2 (steps 523-536).
Sheet 6: FIG. 5C, flowchart of Phase 3 (steps 540-555).
Sheet 7: FIG. 6, APPROVE routine (steps 601-603); FIG. 7, REJECT routine (steps 701-703).]

COMPUTER PROGRAM PRODUCT FOR FENCING A MEMBER OF A GROUP OF PROCESSES IN A DISTRIBUTED PROCESSING ENVIRONMENT

The present application is related to co-pending applications bearing Ser. No. 09/124,677 and Ser. No. 09/124,394, both of which were filed on the same day as the present application, namely, Jul. 29, 1998, and both of which are assigned to the same assignee as the present invention.

The present invention is related to fencing of nodes in a distributed processing environment, and is more particularly related to fencing of nodes in a shared disk subsystem.

BACKGROUND OF THE INVENTION

U.S. Pat. No. 4,919,545 issued Apr. 24, 1990 to Yu for DISTRIBUTED SECURITY PROCEDURE FOR INTELLIGENT NETWORKS discloses a security technique for use in an intelligent network and includes steps of granting permission to an invocation node to access an object by transmitting a capability and a signature from an execution node to the invocation node, thereby providing a method for authorizing a node to gain access to a network resource by using a form of signature encryption at the node.

U.S. Pat. No. 5,301,283 issued Apr. 5, 1994 to Thacker et al. for DYNAMIC ARBITRATION FOR SYSTEM BUS CONTROL IN MULTIPROCESSOR DATA PROCESSING SYSTEM discloses a data processing system having a plurality of commander nodes and at least one resource node interconnected by a system bus, and a bus arbitration technique for determining which commander node is to gain control of the system bus to access the resource node, thereby providing a node lockout which prevents nodes from gaining access to the system bus.

U.S. Pat. No. 5,386,551 issued Jan. 31, 1995 to Chikira et al. for DEFERRED RESOURCES RECOVERY discloses a resources management system for fencing all autonomous resources, and a protocol is followed to allow all activities in a work stream to be completed before all fencing is removed.

U.S. Pat. No. 5,416,921 issued May 16, 1995 to Frey et al. for APPARATUS AND ACCOMPANYING METHOD FOR USE IN A SYSPLEX ENVIRONMENT FOR PERFORMING ESCALATED ISOLATION OF A SYSPLEX COMPONENT IN THE EVENT OF A FAILURE discloses an apparatus for use in a multi-system shared data environment which fences, through a pre-defined hierarchical order, failed components from accessing shared data in order to protect data integrity.

U.S. Pat. No. 5,423,044 issued Jun. 6, 1995 to Sutton et al. for SHARED, DISTRIBUTED LOCK MANAGER FOR LOOSELY COUPLED PROCESSING SYSTEMS discloses apparatus for managing shared, distributed locks in a complex for synchronizing data access to identifiable subunits of direct access storage devices.

The Virtual Shared Disk (VSD) product, which is a component of the Parallel System Support Programs for AIX (PSSP) from the International Business Machines Corp. of Armonk, N.Y., provides raw disk access to all nodes on an RS/6000 Scalable POWERparallel (SP) system. The disk itself, however, is physically connected to only two nodes. One of these nodes is a VSD primary server, and the other is a backup server. If a disk is not locally attached, the VSD kernel extension will use Internet Protocol to route the requests to the server node. If the primary node is unavailable for any reason, access is switched to the secondary node, and the data on the disk drive may still be accessed by the secondary node.

The Group Services product of PSSP keeps a record of member nodes in a group of nodes. It is desirable to provide a fencing function to the VSD subsystem to provide fencing support.

In the case that a process instance using VSDs on node X is unresponsive, a distributed subsystem may wish to ensure that X's access to a set of virtual disks (VSDs) is severed, and all outstanding I/O initiated by X to these disks is flushed before recovery can proceed. Fencing X from a set of VSDs denotes that X will not be able to access these VSDs (until it is unfenced). Fence attributes must survive node Initial Program Loads (IPLs).
SUMMARY OF THE INVENTION

The present invention provides, in a distributed computer system having a plurality of nodes, one of the nodes being a request processing node (A node) and one or more nodes being peripheral device server nodes (S nodes), an apparatus for fencing or unfencing, in a fence/unfence operation, one or more nodes (X nodes) from said S nodes. The apparatus includes a common memory for storing a fence map having entries therein, each entry for storing an indication of an S node to be fenced, a commit bit indicating if the entry is proposed or committed, and a bit map indicating which X nodes are to be fenced from the S node of the entry. Each of the plurality of nodes includes a local memory for storing a local copy of said fence map. The A node processes a request specifying X nodes to be fenced or unfenced from specified S nodes during said fence/unfence operation, and computes the nodes to participate (F nodes) in the fence/unfence operation. The participating nodes include the A node, the X nodes to be either fenced or unfenced from said S nodes, and the S nodes thus fenced or unfenced. The A node sends messages to the F nodes instructing each F node to begin the fence/unfence operation for that node. The fence/unfence operation includes a first phase for proposing changes in the fence map reflecting the fencing or unfencing of said X nodes; a second phase for refreshing the local map of each of the F nodes from the proposed changes in the fence map in said central memory, for eliminating access to specified S nodes from specified X nodes to be fenced, if any, and for restoring access to specified S nodes with specified X nodes to be unfenced, if any; and a third phase for flushing I/O operations from specified X nodes to be fenced from specified S nodes, if any, and for a selected one of the F nodes to erase all entries in the fence map of the common memory whose commit bit indicates the entry is committed, and for changing all entries whose commit bit indicates the entry is proposed, to a committed entry.

Thus a primary object of the present invention is to provide a computer program product for fencing selected ones of the X nodes from access to selected ones of the S nodes, and for unfencing selected ones of said X nodes such that they have access to selected ones of said S nodes.

It is also an object of the present invention to provide a computer program product wherein the lowest numbered node of the F nodes changes the proposed changes to the fence map stored in the common memory to committed entries at the end of the fence/unfence operation.

It is another object of the present invention to provide a computer program product for allowing any node of the plurality of nodes to send a request to the A node to start a fence/unfence operation.

It is another object of the present invention to provide a computer program product wherein a protocol undoes the proposed changes to the fence map in the event that a node fails during the fence/unfence operation.

It is another object of the present invention to provide a computer program product wherein a protocol removes the request from the request queue for processing by the A node in the event that a node fails during the fence/unfence operation.

The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of the preferred embodiment of the invention as illustrated in the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of a distributed computer system usable with the present invention;

FIG. 2 is a portion of the system of FIG. 1 showing the nodes which participate in the fence/unfence operation of the present invention;

FIG. 3 is a diagram of a fence table stored in the system data repository (SDR) of the system of FIG. 1, the fence table including both committed entries of a previous fence/unfence operation and proposed entries of the present fence/unfence operation of the present invention;

FIG. 4 is a diagram of the fence table of FIG. 3 with the proposed entries of the present fence/unfence operation changed to committed entries;

FIGS. 5A-5C, joined by connectors A-E, taken together form a flowchart of the protocol or computer program of the fence/unfence operation of the present invention;

FIG. 6 is a flowchart of an APPROVE protocol of the fence/unfence operation of the present invention; and

FIG. 7 is a flowchart of a REJECT protocol of the fence/unfence operation of the present invention.
DESCRIPTION OF THE PREFERRED EMBODIMENT

The VSD subsystem is an abstracting layer that simulates a shared disk model. Issuing a fence request on node A to fence node X from VSDs (that are served by node S) implies that node A must have a reliable mechanism to talk to S (and X, if X is still up). This is achieved by the recoverable aspects of Recoverable VSD (RVSD). The RVSD product, also available from International Business Machines Corp., is designed to work in the "sometimes uncertain" and changing status of nodes in the cluster. This is the targeted time window for which distributed instances will require the fence function to be satisfied by RVSD.

Given that a fence operation requires the following:
1. Route the fence request to the server(s).
2. Actually fence the designated nodes.
3. Flush I/Os from the fenced nodes on all the VSD servers.
4. Commit the fence request in a common registry (SDR).
and given the high probability of failures (nodes, networks, resource allocations) during the fence operation (which could take up to a matter of a few minutes for many VSDs and designated nodes), a simple, regular (Finite State Automaton) algorithm to achieve the fencing function reliably, tolerating failures at any stage of the protocol, was needed.

This implementation of the fence function marks the fenced VSDs on the clients (vs. denoting the fenced nodes to the server). This reduces the memory and CPU utilization during normal operations of a VSD server. After a VSD client is marked "fenced", the server still needs to flush outstanding I/O before the fence request is satisfied. Given that RVSD is employed, if a client node, C, is considered dead, it is not necessary to mark the fenced VSDs on C, as RVSD will guarantee that C has NO access to any VSD as part of a node recovery procedure. When C comes up and reintegrates with RVSD (say C was rebooted), it will have to consult a common registry as a condition for its re-integration.

The fence protocol is a 3-phase protocol that is driven by the Group Services (GS) infrastructure of PSSP. A Group Services protocol comprises the following:
a. An RVSD instance proposes to initiate a barrier synchronization protocol.
b. If no RVSD protocol is currently driven, all RVSD instances are notified of the first phase of the state change protocol.
c. Each RVSD instance performs its "phase 1" tasks and votes continue, with a default vote of 'continue'. At this time RVSD waits for the "phase 2" notification (which will come when all RVSD instances complete their votes).
Note:
The continue vote means that an additional phase is needed. Group Services will continue to drive the protocol so long as any RVSD instance votes continue. GS stops the protocol if any instance votes reject or when ALL RVSD instances vote approve.
The default vote denotes the action that GS takes in the case that one or more RVSD instances fails during the protocol. The default vote of continue means that in the case that an RVSD instance(s) dies during the protocol, GS will continue to drive the protocol, with a "default vote" of approve.
d. In the last phase, each RVSD instance votes 'approve' and the fence protocol completes.
Note:
GS drives ONE protocol at a time.
Node failure protocols have precedence over regular barrier protocols.
In the case of failure, GS will indicate the failure with a 'default vote' in the next phase (given that approve was a default). If a reject was the default vote, then GS will notify all surviving instances of the rejection of the protocol.
In the case of a 'default approve' vote there is no mechanism to determine which node failed.
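The phase-driven voting just described can be pictured as each RVSD instance doing its per-phase work, then casting a vote of continue, approve, or reject together with a default vote that GS applies on its behalf if the instance dies before the next phase. The following C fragment is only a sketch of that control flow; the names (vote_t, do_phase_work) are invented for illustration and are not the Group Services API.

#include <stdio.h>

typedef enum { VOTE_CONTINUE, VOTE_APPROVE, VOTE_REJECT } vote_t;

/* Per-phase work of one RVSD instance.  Returns its vote and sets the
 * default vote GS should cast for it if it fails before the next phase. */
static vote_t do_phase_work(int phase, vote_t *default_vote)
{
    switch (phase) {
    case 1: /* node A writes proposed records; default reject discards the
             * protocol if node A is lost */
        *default_vote = VOTE_REJECT;
        return VOTE_CONTINUE;
    case 2: /* refresh local maps, fence/unfence VSD access; a failure here
             * is exposed as a default approve and handled by RVSD recovery */
        *default_vote = VOTE_APPROVE;
        return VOTE_CONTINUE;
    case 3: /* flush I/O; lowest-numbered node commits the registry, so the
             * default is reject to guard against the committing node failing */
        *default_vote = VOTE_REJECT;
        return VOTE_APPROVE;
    default:
        return VOTE_REJECT;
    }
}

int main(void)
{
    vote_t vote = VOTE_CONTINUE, def = VOTE_REJECT;

    /* GS keeps driving phases while any instance votes continue. */
    for (int phase = 1; vote == VOTE_CONTINUE; phase++) {
        vote = do_phase_work(phase, &def);
        printf("phase %d: vote=%d default=%d\n", phase, vote, def);
    }
    printf("protocol %s\n", vote == VOTE_APPROVE ? "approved" : "rejected");
    return 0;
}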
A command and fence map is shown in Table I, as follows:

TABLE I

    VSD    Commit Flag    Bit Map (fenced nodes)
    V1          0          9
    V2          0          9, 6
    V3          0          9
    V4          0          9

Where:
the VSD column indicates the VSD entry for a VSD instance;
the Commit Flag records whether this entry is committed (a '0' value) or proposed (a '1' value); and
the Bit Map records the node number(s) of the nodes fenced from this VSD entry.

It will be understood that only VSDs that are fenced are recorded in the registry. VSDs that are not fenced by any node will be removed from the fence map.
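For concreteness, the fence map of Table I can be modeled as a small array of entries, each carrying a VSD identifier, a commit flag (0 = committed, 1 = proposed), and a bit map of fenced node numbers. The sketch below simply mirrors the table; the names (struct fence_entry, NODE_BIT) are assumptions for illustration and are not taken from the PSSP/RVSD code.

#include <stdio.h>
#include <stdint.h>

/* One row of the fence map (Table I).  Node numbers 0..63 fit in a 64-bit
 * map; commit == 0 means committed, commit == 1 means proposed. */
struct fence_entry {
    int      vsd;      /* VSD instance number, e.g. 1 for V1 */
    int      commit;   /* 0 = committed, 1 = proposed        */
    uint64_t nodes;    /* bit i set => node i is fenced      */
};

#define NODE_BIT(n) ((uint64_t)1 << (n))

static void print_map(const struct fence_entry *map, int n)
{
    for (int i = 0; i < n; i++) {
        printf("V%d  commit=%d  fenced:", map[i].vsd, map[i].commit);
        for (int node = 0; node < 64; node++)
            if (map[i].nodes & NODE_BIT(node))
                printf(" %d", node);
        printf("\n");
    }
}

int main(void)
{
    /* Committed state of Table I: V1-V4 fenced from node 9, V2 also from node 6. */
    struct fence_entry map[] = {
        { 1, 0, NODE_BIT(9) },
        { 2, 0, NODE_BIT(9) | NODE_BIT(6) },
        { 3, 0, NODE_BIT(9) },
        { 4, 0, NODE_BIT(9) },
    };
    print_map(map, 4);
    return 0;
}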
The control 2.1.2.3 Nodes not in F: workstation has a direct access storage device (DASD) 114 2.1.2.3.1 Mark current Fence knowledge as stale. on which is stored the system data repository files (SDR). 2.1.2.3.2 Vote continue, with a default of 50 The SDR files include Such information as a list of nodes approve. that are in the system and their configuration. Each node 106 2.1.3 When notified on phase 3: also includes a DASD device 107 for storing data processed 2.1.3.1 a node in F (typically the node with the by the SP computer 100. lowest number) will commit proposed records in In one embodiment, the nodes in each frame are also the registry. 55 connected to an IBM Scalable POWERparallel switch (SPS) 2.1.3.2 Nodes in S will flush I/O for the designated 105. Each of the SPS Switches 105 of each frame is VSDS. connected to neighboring SPS Switches 105 of other frames 2.1.3.3 All nodes will vote approve with a default by a bus 110. of REJECT (to guard against the failure of the As well understood in the art, the CWS node 112 sends committing node). 60 system data and control signals to the frames of the SP 2.2. In the case of reject notification computer 100 by means of the LAN 102 while messages and 2.2.1. If the failure occurred in phase 3: data may be sent from one to another of the nodes 106 by 2.2.1.1 All nodes in F refresh their information from means of the high performance switches 105. the registry. Each of the nodes 106 and the CWS 112 of the SP 2.2.1.2 node A goes back to Step 2. 65 computer System 100 includes a group Services daemon, as In the case that no failures occurred, the registry is explained in U.S. patent application No. 08/640,412 by P. R. updated with the new fence map (to be explained). (Note Badovinatz et al. for “A Method for Managing Membership US 6,212,595 B1 7 8 of a Group of Processors in a proposed section 205 of the fence map 202 are then merged Environment', assigned to the assignee of the present inven to indicate that the proposed changes are now committed tion and incorporated herein by reference. Each of the nodes entries. If a failure of any of the nodes occurs, the fence 106 may include Software or hardware, or a combination, protocol yields to the recovery protocol for RVSD such that which reports to all the other nodes those nodes which are when the fence protocol does complete, the committed fence up and available for use. If any of the nodes 106 fails, its map is up to date. identity is made known to the other nodes by a proceSS FIGS. 5A-5C joined at connectors A-E form a flowchart known as a heartbeat function. AS is known, the nodes 106 for a program to be run on each of the nodes 106 partici of the SP computer system 100 may be divided into parti pating in the fencing protocol. At 501, node A looks at the tions. top of the queue request, and at 502 gets any fence map FIG. 2 is a portion of the system of FIG. 1 showing nine change requests from the top of the queue. Node A checks at 502 to see if the fencing protocol is needed to effect the nodes (nodes 1-9) which may be any mix of the nodes 106 map change request. If it is not, the request is removed from shown in FIG. 1. The system of FIG. 2 includes four DASD the queue at 519, and the program returns to 501. If the devices 107 herein referred to as VSDs and labeled V to V. check at 503 is yes, node A checks at 504 to see if all VSD The primary server for V and V is Node 1, the primary 15 servers are up. 
FIG. 1 is a schematic diagram of a distributed computer system 100 useable with the present invention. The distributed computer system 100 may be an IBM RISC System/6000 Scalable POWERparallel Systems (SP) distributed computer system available from International Business Machines Corporation of Armonk, N.Y. The embodiment disclosed in FIG. 1 is an SP computer having a total of 8 frames, with each frame having up to 16 nodes, for a total of 128 nodes. All of the nodes 106 are joined by a local area network (LAN) 102. Each node 106 is a computer itself, and may be a RISC System/6000 workstation, as is well known by those skilled in the art.

All of the nodes in a frame of the SP computer 100 are included in a LAN segment which is joined to the other LAN segments through LAN gates 104. Also connected to the LAN 102 is a control workstation (CWS) 112 which controls operation of the SP computer 100. The control workstation has a direct access storage device (DASD) 114 on which is stored the system data repository files (SDR). The SDR files include such information as a list of nodes that are in the system and their configuration. Each node 106 also includes a DASD device 107 for storing data processed by the SP computer 100.

In one embodiment, the nodes in each frame are also connected to an IBM Scalable POWERparallel switch (SPS) 105. Each of the SPS switches 105 of each frame is connected to neighboring SPS switches 105 of other frames by a bus 110.

As well understood in the art, the CWS node 112 sends system data and control signals to the frames of the SP computer 100 by means of the LAN 102, while messages and data may be sent from one to another of the nodes 106 by means of the high performance switches 105.

Each of the nodes 106 and the CWS 112 of the SP computer system 100 includes a group services daemon, as explained in U.S. patent application Ser. No. 08/640,412 by P. R. Badovinatz et al. for "A Method for Managing Membership of a Group of Processors in a Distributed Computing Environment", assigned to the assignee of the present invention and incorporated herein by reference. Each of the nodes 106 may include software or hardware, or a combination, which reports to all the other nodes those nodes which are up and available for use. If any of the nodes 106 fails, its identity is made known to the other nodes by a process known as a heartbeat function. As is known, the nodes 106 of the SP computer system 100 may be divided into partitions.

FIG. 2 is a portion of the system of FIG. 1 showing nine nodes (nodes 1-9) which may be any mix of the nodes 106 shown in FIG. 1. The system of FIG. 2 includes four DASD devices 107, herein referred to as VSDs and labeled V1 to V4. The primary server for V1 and V2 is Node 1, the primary server for V3 is Node 2, and the primary server for V4 is Node 4. The secondary server for V1 and V2 is Node 2, the secondary server for V3 is Node 3, and the secondary server for V4 is Node 5. Nodes 2, 3, 5, 6, 7, and 8 are in a single group, and Node 5 is assigned the Group Leader (GL) by Group Services, and will perform the fencing operation (will be node A) in the present example. It will be understood that any node can initiate a fence command by directing a fencing operation to node A.
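The server assignment of FIG. 2 (a primary and a secondary VSD server per disk) can be captured in a small table: a request for a VSD that is not locally attached is routed to the primary server, or to the secondary server when the primary is unavailable. The table and routing helper below are only an illustration of that description, with invented names.

#include <stdio.h>

/* VSD server assignment from FIG. 2. */
struct vsd_servers { const char *vsd; int primary; int secondary; };

static const struct vsd_servers topology[] = {
    { "V1", 1, 2 },
    { "V2", 1, 2 },
    { "V3", 2, 3 },
    { "V4", 4, 5 },
};

/* Pick the node that should serve a VSD, falling back to the secondary
 * server when the primary is down (primary_up == 0). */
static int serving_node(const struct vsd_servers *v, int primary_up)
{
    return primary_up ? v->primary : v->secondary;
}

int main(void)
{
    for (unsigned i = 0; i < sizeof topology / sizeof topology[0]; i++)
        printf("%s: primary=%d secondary=%d (if primary down, use %d)\n",
               topology[i].vsd, topology[i].primary, topology[i].secondary,
               serving_node(&topology[i], 0));
    return 0;
}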
FIG. 3 shows a portion of the SDR 114 which is available to all nodes in the system of FIG. 1. The SDR 114 includes a fence map 202 (also shown in Table I) which includes a list of all VSDs V1-V4 that are fenced from a node or nodes as shown in the bit map entry, a commit flag for each VSD which indicates if the entry is a proposed entry or a committed entry, and a bit map which contains a bit for each node that is fenced from the VSD of the entry. In the first portion 204 of the fence map 202 of FIG. 3, the committed entries indicate that VSDs V1-V4 are fenced from Node 9, and that VSD V2 is also fenced from Node 6. The second portion 205 of the fence map 202 will be discussed further later. In the present example, Node A (Node 5 of the present embodiment) starts the protocol to fence Node 7 from VSDs V1-V4, and to unfence Node 6 from VSD V2. The commands which start the protocol may be illustrated as follows:
Fence(V1, V2, V3, V4) 7
Unfence(V2) 6.

In Phase 1 of the protocol, Node A computes the set F of nodes which will participate in the fence/unfence operation. Set F consists of the server nodes and those nodes which will be acted upon. In the present example, this will be Nodes 7, 6, 4, 2 and 1 (7 and 6 being the nodes being fenced and unfenced, and 4, 2 and 1 being the servers). Also in Phase 1, stale messages for the remaining nodes (Nodes 9, 8, 5 and 3) will be sent to notify those nodes that their local copies of the fence map are now stale, and updated copies will have to be read from the SDR 114. When each of the nodes responds OK, the protocol will enter Phase 2.

In Phase 2, the nodes in F read the proposed changes in the fence map 202. Node 6 enables access to V2 by adding an entry 206 to the proposed portion 205 of the fence map 202. Entry 206 is designated a proposed entry by setting its commit flag to 1. The entry 206 does not include a bit for Node 6, indicating that Node 6 is no longer fenced from V2. Node 7 is fenced by blocking access to all the VSDs. This is done by setting the bit for Node 7 in the entries 206, 207, 208 and 209 for VSDs V1, V2, V3, and V4, respectively. Entries 207, 208 and 209 also have their commit flags set to 1 to indicate that these entries are proposed entries. The protocol then enters Phase 3.

In Phase 3, all of the server nodes (Nodes 1, 2 and 4) flush all I/O from nodes in X. The committed section 204 and the proposed section 205 of the fence map 202 are then merged to indicate that the proposed changes are now committed entries. If a failure of any of the nodes occurs, the fence protocol yields to the recovery protocol for RVSD such that when the fence protocol does complete, the committed fence map is up to date.
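In this example, the request Fence(V1, V2, V3, V4) 7 together with Unfence(V2) 6 yields the proposed entries 206-209: each proposed entry starts from the committed bit map, sets the bit for Node 7, and clears the bit for Node 6 on V2. The sketch below mimics that transformation on the Table I map; the names and layout are assumptions for illustration only.

#include <stdint.h>
#include <stdio.h>

#define NODE_BIT(n) ((uint64_t)1 << (n))

struct entry { int vsd; int commit; uint64_t nodes; };  /* commit: 0 committed, 1 proposed */

int main(void)
{
    /* Committed section 204: V1-V4 fenced from node 9, V2 also from node 6. */
    struct entry committed[4] = {
        { 1, 0, NODE_BIT(9) },
        { 2, 0, NODE_BIT(9) | NODE_BIT(6) },
        { 3, 0, NODE_BIT(9) },
        { 4, 0, NODE_BIT(9) },
    };
    struct entry proposed[4];

    /* Build proposed section 205: Fence(V1,V2,V3,V4) 7 and Unfence(V2) 6. */
    for (int i = 0; i < 4; i++) {
        proposed[i] = committed[i];
        proposed[i].commit = 1;                 /* mark as proposed        */
        proposed[i].nodes |= NODE_BIT(7);       /* fence node 7 everywhere */
        if (proposed[i].vsd == 2)
            proposed[i].nodes &= ~NODE_BIT(6);  /* unfence node 6 from V2  */
    }

    for (int i = 0; i < 4; i++)
        printf("V%d proposed bit map: 0x%llx\n",
               proposed[i].vsd, (unsigned long long)proposed[i].nodes);
    return 0;
}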
FIGS. 5A-5C, joined at connectors A-E, form a flowchart for a program to be run on each of the nodes 106 participating in the fencing protocol. At 501, node A looks at the top of the request queue, and at 502 gets any fence map change request from the top of the queue. Node A checks at 503 to see if the fencing protocol is needed to effect the map change request. If it is not, the request is removed from the queue at 519, and the program returns to 501. If the check at 503 is yes, node A checks at 504 to see if all VSD servers are up. If they are not, at 505 node A cleans up the registry, votes reject, and returns to 501, causing the protocol to fail. If the servers are up, node A checks at 506 to see if the fencing protocol is running. If the protocol is running, node A waits for a specified time at 507, and returns to 501 to look again at the top of the queue.

If the protocol is not running at 506, node A proposes a fence map protocol at 508, which makes an entry for this VSD server with the commit bit set to 1 and the bit map indicating the node to be fenced, if any, after this fence/unfence operation. If any node fails, the APPROVE protocol of FIG. 6 is the default. It will be understood that, if node A proposes changes and then fails before GS starts the protocol, the fence request will be removed from the queue, as shown at 603 of FIG. 6, as will be explained. Thus, if an application on node A asks for a node to be fenced, but then node A fails, the fencing request will be removed and the application will have to recover node A's instance before the application on node A tries fencing again.

At this point in the protocol, all nodes enter the PHASE 1 state, as shown at 509. At 510, the program checks to see if this node is the A node. If it is, the node computes F at 511, cleans the registry of proposed records (see Table I and FIG. 4) at 512, writes proposed changes to the registry (see 205 of FIG. 3) at 513, votes continue at 514, and attaches F as a message to the nodes in F at 515. The messages are sent by the A node to the F nodes to notify those nodes that they should start the fence/unfence protocol for that node. At 516, the node waits until the other nodes indicate that they are ready to enter PHASE 2 in their protocols.

If the check at 510 is no, the node votes to continue at 520 and waits at 522 for the other nodes to reach PHASE 2 in their protocols. When all of the nodes are ready, they enter PHASE 2 at 523. A check is made at 524 to see if any of the nodes died. If any did die, a check is made at 525 to see if this is node A. If yes, the registry is cleaned at 526, and the node votes REJECT at 527. If the check at 525 is no, the program goes directly to 527. After 527, the program returns to 501 to look at the top of the request queue for new change requests.

If the check at 524 is no, a check is made at 530 to see if this node is in F. If yes, at 531 the local map for the node is refreshed from the registry if the local map is stale. A check is made at 532 to see if this node is in X. If yes, at 533 the bit map for this entry is updated to eliminate (fence) or restore (unfence) VSD access for this node, depending on the change requested.

If the check at 530 is no, the local map for this node is made stale at 534. If the check at 532 is no, and likewise after 533 and 534, the node votes to continue at 535, and goes to 536 to wait for all of the other nodes to enter PHASE 3. If during PHASE 2 any of the nodes fail, the default is APPROVE.

When all of the nodes are finished with PHASE 2, the node enters PHASE 3 at 540. A check is made at 541 to see if any of the nodes in F died. If no nodes died, the program goes to 542 and checks to see if this node is in F. If this node is in F, a check is made at 543 to see if this node is in S. If this is a server node in S, at 544 the I/O is flushed for all messages to this node from nodes to be fenced. If, at 542, this node is not in F, the node votes APPROVE at 545 and waits at 546 until all other nodes are at this point. Finally, a check is made at 550 to see if this is the smallest numbered node in F. If yes, at 551 the registry is changed to a committed registry by erasing the old committed registry and changing all of the proposed entries to committed entries by changing the commit bit from 1 to 0. If the check at 550 is no, and after 551, the program goes to 552 where the node votes APPROVE. The default is changed to REJECT, and the program returns to the beginning at 501 to look at the top of the request queue to get the next fence map change request at 502. If a node has died at 541, the program votes REJECT at 555, and returns to the beginning at 501.

FIG. 6 is a flowchart of the APPROVE routine. If a node votes APPROVE, a check is made at 601 to see if this node is A. If no, at 602 the program returns to the beginning. If the check at 601 is yes, the program at 603 removes the fence request from the queue, and goes to 602 to return to the beginning.

FIG. 7 is a flowchart of the REJECT routine. If a node votes REJECT, a check is made at 701 to see if this node is in F and the phase number is greater than 1. If the check is no, the program goes to 702 to return to the beginning. If the check at 701 is yes, at 703 the program restores the local state to the last committed version of the registry, and goes to 702 to return to the beginning.
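Steps 550-551 and the REJECT routine of FIG. 7 are two sides of the same bookkeeping: on success, the smallest-numbered node in F replaces the committed entries with the proposed ones (clearing the commit flag), while on rejection every F node discards its local changes and reloads the last committed version. The following minimal sketch, with assumed names and an in-memory map standing in for the SDR registry, shows both operations.

#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define MAX_ENTRIES 16

struct entry { int vsd; int commit; uint64_t nodes; };  /* commit: 0 committed, 1 proposed */
struct map   { struct entry e[MAX_ENTRIES]; int n; };

/* Step 551: erase the old committed entries and turn the proposed entries
 * into the new committed map (commit flag 1 -> 0). */
static void commit_registry(struct map *registry)
{
    int k = 0;
    for (int i = 0; i < registry->n; i++) {
        if (registry->e[i].commit == 1) {
            registry->e[k] = registry->e[i];
            registry->e[k].commit = 0;
            k++;
        }
    }
    registry->n = k;
}

/* Step 703 (FIG. 7): a node that saw REJECT restores its local state from
 * the committed copy of the registry saved at the start of the protocol. */
static void restore_local(struct map *local, const struct map *committed_copy)
{
    memcpy(local, committed_copy, sizeof *local);
}

int main(void)
{
    struct map registry = { .n = 2, .e = {
        { 2, 0, 0x240 },   /* old committed entry: V2 fenced from nodes 6 and 9 */
        { 2, 1, 0x280 },   /* proposed replacement: node 6 unfenced, node 7 fenced */
    }};
    struct map saved = registry;   /* committed copy saved at protocol start */
    struct map local = registry;

    commit_registry(&registry);    /* what the smallest-numbered F node does */
    printf("after commit: %d entries, commit flag %d\n",
           registry.n, registry.e[0].commit);

    restore_local(&local, &saved); /* what a REJECT notification would do instead */
    return 0;
}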
While I have illustrated and described the preferred embodiment of my invention, it is to be understood that I do not limit myself to the precise construction herein disclosed, and the right is reserved to all changes and modifications coming within the scope of the invention as defined in the appended claims.

Having thus described my invention, what I claim as new, and desire to secure by Letters Patent, is:

1. In a distributed computer system having a plurality of nodes, one of the nodes being a request processing node (A node) and one or more nodes being peripheral device server nodes (S nodes), a common memory for storing a fence map, and local memory in each of said plurality of nodes for storing a local copy of said fence map, a computer program product for fencing or unfencing, in a fence/unfence operation, one or more nodes (X nodes) from one or more S nodes, said computer program product comprising:
  a computer readable media in which is recorded computer program coding;
  computer program coding for storing in said common memory said fence map having entries therein, each entry for storing an indication of an S node to be fenced, a commit bit indicating if the entry is proposed or committed, and a bit map indicating which X nodes are to be fenced from the S node of the entry;
  computer program coding for processing a request in said A node, said request specifying X nodes to be fenced or unfenced from specified S nodes during said fence/unfence operation;
  computer program coding for computing in said A node the nodes to participate (F nodes) in said fence/unfence operation, said participating nodes including the A node, the X nodes to be either fenced or unfenced from said S nodes, and the S nodes thus fenced or unfenced;
  computer program coding for sending messages from said A node to said F nodes instructing each F node to begin said fence/unfence operation for that node;
  computer program coding for proposing changes in said fence map in a first phase of said fence/unfence operation, said changes reflecting the fencing or unfencing of said X nodes;
  computer program coding for providing, in a second phase in said fence/unfence operation, refreshing the local map of each of the F nodes from the proposed changes in the fence map in said central memory, eliminating access to specified S nodes from specified X nodes to be fenced, if any, and restoring access to specified S nodes with specified X nodes to be unfenced, if any; and
  computer program coding for providing, in a third phase in said fence/unfence operation, flushing I/O operations from specified X nodes to be fenced from specified S nodes, if any, and a selected one of said F nodes erasing all entries in the fence map of said common memory whose commit bit indicates the entry is committed, and changing all entries whose commit bit indicates the entry is proposed, to a committed entry.

2. The computer program product of claim 1 wherein each node of said plurality of nodes has an identification number, and computer program coding for erasing said entries in the fence map of said common memory by the F node with the lowest identification number.

3. The computer program product of claim 1 wherein said second phase includes computer program coding for making the local copies of said plurality of nodes not F nodes stale.

4. The computer program product of claim 1 further comprising computer program coding for initiating said fence/unfence operation by sending a request to be queued for execution by said A node, thereby requesting the start of a fence/unfence operation.

5. The computer program product of claim 4 wherein said fence/unfence operation includes computer program product for ending, by an APPROVE protocol, said fence/unfence operation in the event that any of said nodes fail in said first phase.

6. The computer program product of claim 5 wherein said APPROVE protocol includes computer program coding for removing said request queued for execution by said A node, when said A node is processed in said fence/unfence operation.

7. The computer program product of claim 6 wherein said second phase includes computer program coding for cleaning, when said A node is processed by said second phase, the fence map in said common memory from all uncommitted entries in the event that any node dies before the beginning of said second phase.

8. The computer program product of claim 7 wherein said fence/unfence operation includes computer program coding for ending, by a REJECT protocol, said fence/unfence operation in the event that any node fails in said second or third phases.

9. The computer program product of claim 8 wherein said REJECT protocol includes computer program coding for restoring the local copy of the fence map of F nodes being processed in said fence/unfence operation from the last committed version of the fence map in the common memory in the event that any node fails during said second or third phases.

10. The computer program product of claim 9 wherein said third phase includes computer program coding for calling said APPROVE protocol for nodes in the third phase not F nodes.

UNITED STATES PATENT AND TRADEMARK OFFICE CERTIFICATE OF CORRECTION

PATENT NO. : 6,212,595 B1 Page 1 of 1 DATED : April 3, 2001 INVENTOR(S) : Gili Mendel et al.

It is certified that error appears in the above-identified patent and that said Letters Patent is hereby corrected as shown below:

Column 10, Line 56, reads "product claim" and should read -- product of claim --

Signed and Sealed this Nineteenth Day of March, 2002

Attesting Officer    JAMES E. ROGAN, Director of the United States Patent and Trademark Office