Front cover

IBM TotalStorage: SAN Product, Design, and Optimization Guide
Use real-life case studies to learn SAN designs
Understand channel extension solutions
Learn best practices for your SAN design
Jon Tate
Jim Kelly
Pauli Rämö
Leos Stehlik

ibm.com/redbooks
International Technical Support Organization
IBM TotalStorage: SAN Product, Design, and Optimization Guide
September 2005
SG24-6384-01

Note: Before using this information and the product it supports, read the information in “Notices” on page xxxv.
Second Edition (July 2005)
This edition applies to the products described within.
© Copyright International Business Machines Corporation 2005. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents
Figures ...... xxvii
Notices ...... xxxv
Trademarks ...... xxxvi
Preface ...... xxxvii
The team that wrote this redbook ...... xxxvii
Become a published author ...... xli
Comments welcome ...... xli
Chapter 1. Introduction ...... 1
1.1 Beyond disaster recovery ...... 2
1.1.1 Whose responsibility is it? ...... 3
1.1.2 The Internet brings increased risks ...... 4
1.1.3 Planning for business continuity ...... 5
1.2 Using a SAN for business continuance ...... 6
1.2.1 SANs and business continuance ...... 7
1.3 SAN business benefits ...... 8
1.3.1 Storage consolidation and sharing of resources ...... 8
1.3.2 Data sharing ...... 10
1.3.3 Nondisruptive scalability for growth ...... 11
1.3.4 Improved backup and recovery ...... 11
1.3.5 High performance ...... 13
1.3.6 High availability server clustering ...... 13
1.3.7 Improved disaster tolerance ...... 14
1.3.8 Allow selection of best of breed storage ...... 14
1.3.9 Ease of data migration ...... 14
1.3.10 Reduced total costs of ownership ...... 15
1.3.11 Storage resources match e-business enterprise needs ...... 15
Chapter 2. SAN fabric components ...... 17
2.1 Fibre Channel technology sub-components ...... 18
2.2 Fibre Channel interconnects ...... 18
2.2.1 Fibre Channel transmission rates ...... 19
2.2.2 Small Form Factor Pluggable Module ...... 19
2.2.3 Gigabit Interface Converters ...... 22
2.2.4 Gigabit Link Modules ...... 23
2.2.5 Media Interface Adapters ...... 24
2.2.6 1x9 transceivers ...... 25
2.2.7 Fibre Channel adapter cable ...... 25
2.2.8 Host Bus Adapters ...... 26
2.2.9 Loop Switches ...... 27
2.2.10 Switches ...... 28
2.2.11 Directors ...... 29
2.2.12 Fibre Channel routers ...... 32
2.2.13 Switch, director and router features ...... 32
2.2.14 Test equipment ...... 34
Chapter 3. SAN features ...... 39
3.1 Fabric implementation ...... 40
3.1.1 Blocking ...... 41
3.1.2 Ports ...... 42
3.1.3 Fabric topologies ...... 44
3.1.4 Point-to-point ...... 45
3.1.5 Arbitrated loop ...... 46
3.1.6 Switched fabric ...... 48
3.1.7 Inter Switch Links ...... 51
3.1.8 Adding new devices ...... 58
3.2 Classes of service ...... 59
3.2.1 Class 1 ...... 60
3.2.2 Class 2 ...... 60
3.2.3 Class 3 ...... 60
3.2.4 Class 4 ...... 61
3.2.5 Class 5 ...... 61
3.2.6 Class 6 ...... 61
3.2.7 Class F ...... 62
3.2.8 Communication ...... 62
3.3 Buffers ...... 62
3.4 Addressing ...... 66
3.4.1 World Wide Name ...... 66
3.4.2 WWN and WWPN ...... 67
3.4.3 24-bit port address ...... 70
3.4.4 Loop address ...... 72
3.4.5 FICON addressing ...... 72
3.5 Fabric services ...... 77
3.5.1 Management services ...... 78
3.5.2 Time services ...... 78
3.5.3 Name services ...... 78
3.5.4 Login services ...... 78
3.5.5 Registered State Change Notification ...... 78
3.6 Logins ...... 78
3.6.1 Fabric login ...... 79
3.6.2 Port login ...... 79
3.6.3 Process login ...... 80
3.7 Path routing mechanisms ...... 80
3.7.1 Spanning tree ...... 80
3.7.2 Fabric Shortest Path First ...... 81
3.7.3 What is FSPF? ...... 82
3.7.4 How does FSPF work? ...... 84
3.7.5 How does FSPF help? ...... 84
3.7.6 What happens when there is more than one shortest path? ...... 84
3.7.7 Can FSPF cause any problems? ...... 86
3.7.8 FC-PH-2 and speed ...... 88
3.7.9 1, 2 and 4 Gbps and beyond ...... 90
3.7.10 FC-PH, FC-PH-2, and FC-PH-3 ...... 91
3.7.11 Layers ...... 93
3.8 Zoning ...... 96
3.8.1 Hardware zoning ...... 98
3.8.2 Software zoning ...... 101
3.9 Trunking ...... 104
3.9.1 Frame filtering ...... 106
3.9.2 Oversubscription ...... 106
3.9.3 Congestion ...... 107
3.9.4 Information units ...... 107
3.9.5 The movement of data ...... 107
3.9.6 Data encoding ...... 108
3.10 Ordered set, frames, sequences, and exchanges ...... 111
3.10.1 Ordered set ...... 112
3.10.2 Frames ...... 113
3.10.3 Sequences ...... 113
3.10.4 Exchanges ...... 113
3.10.5 Frames ...... 114
3.10.6 In order and out of order ...... 116
3.10.7 Latency ...... 116
3.10.8 Heterogeneousness ...... 117
3.10.9 Open Fiber Control ...... 117
3.11 Fibre Channel Arbitrated Loop (FC-AL) ...... 118
3.11.1 Loop protocols ...... 118
3.11.2 Fairness algorithm ...... 121
3.11.3 Loop addressing ...... 121
3.11.4 Private devices on NL_Ports ...... 121
3.12 Factors and considerations ...... 124
3.12.1 Limits ...... 124
3.12.2 Security ...... 125
3.12.3 Interoperability ...... 126
3.13 Standards ...... 127
3.14 SAN industry associations and organizations ...... 128
3.14.1 Storage Networking Industry Association ...... 128
3.14.2 Fibre Channel Industry Association ...... 129
3.14.3 SCSI Trade Association ...... 129
3.14.4 International Committee for Information Technology Standards ...... 130
3.14.5 INCITS technical committee T11 ...... 130
3.14.6 Information Storage Industry Consortium ...... 130
3.14.7 Internet Engineering Task Force ...... 131
3.14.8 American National Standards Institute ...... 131
3.14.9 Institute of Electrical and Electronics Engineers ...... 131
3.14.10 Distributed Management Task Force ...... 132
3.14.11 List of evolved Fibre Channel standards ...... 132
3.15 SAN software management standards ...... 136
3.16 Standards-based management initiatives ...... 137
3.16.1 The Storage Management Initiative ...... 137
3.16.2 Open storage management with CIM ...... 138
3.16.3 CIM Object Manager ...... 138
3.16.4 Simple Network Management Protocol ...... 140
3.16.5 Application Program Interface ...... 141
3.16.6 In-band management ...... 141
3.16.7 Out-of-band management ...... 142
3.16.8 Service Location Protocol ...... 143
3.16.9 Tivoli Common Agent Services ...... 144
3.16.10 Management of growing SANs ...... 145
3.16.11 Application management ...... 146
3.16.12 Data management ...... 147
3.16.13 Resource management ...... 147
3.16.14 Network management ...... 147
3.16.15 Device Management ...... 149
3.16.16 Fabric management methods ...... 150
3.16.17 Common access methods ...... 150
3.16.18 The SNIA Shared Storage Model ...... 161
3.16.19 Long distance links ...... 162
3.16.20 Backup windows ...... 162
3.16.21 Restore and disaster recovery time ...... 164
3.17 IBM eServer zSeries and S/390 ...... 164
3.17.1 IBM eServer pSeries ...... 165
3.17.2 IBM eServer xSeries ...... 165
3.17.3 IBM eServer iSeries ...... 166
3.18 Security ...... 166
3.18.1 Fibre Channel security ...... 167
3.19 Security mechanisms ...... 168
3.19.1 Encryption ...... 168
3.19.2 Authorization database ...... 172
3.19.3 Authentication database ...... 172
3.19.4 Authentication mechanisms ...... 172
3.19.5 Accountability ...... 172
3.19.6 Zoning ...... 172
3.19.7 Isolating the fabric ...... 173
3.19.8 LUN masking ...... 173
3.19.9 Fibre Channel Authentication Protocol ...... 174
3.19.10 Persistent binding ...... 174
3.19.11 Port binding ...... 174
3.19.12 Port type controls ...... 174
3.19.13 IP security ...... 175
3.20 Best practices ...... 175
3.21 Virtualization ...... 176
3.22 Solutions ...... 177
3.23 Emerging technologies ...... 179
3.24 iSCSI ...... 179
3.25 iFCP ...... 180
3.26 FCIP ...... 181
Chapter 4. SAN disciplines ...... 183
4.1 Floor plan ...... 184
4.1.1 SAN inventory ...... 184
4.1.2 Cable types and cable routing ...... 185
4.1.3 Planning considerations and recommendations ...... 189
4.1.4 Structured cabling ...... 191
4.1.5 Data center fiber cabling options ...... 191
4.1.6 Cabinets ...... 194
4.1.7 Phone sockets ...... 195
4.1.8 Environmental considerations ...... 196
4.1.9 Location ...... 196
4.1.10 Sequence for design ...... 196
4.2 Naming conventions ...... 198
4.2.1 Servers ...... 198
4.2.2 Storage devices ...... 199
4.2.3 Cabinets ...... 200
4.2.4 Trunk cables ...... 200
4.2.5 SAN fabric components ...... 200
4.2.6 Cable labels ...... 201
4.2.7 Zones ...... 202
4.3 Documentation ...... 202
4.4 Power-on sequence ...... 203
4.5 Security ...... 204
4.5.1 General ...... 204
4.5.2 Physical access ...... 205
4.5.3 Remote access ...... 205
4.6 Education ...... 207
4.6.1 SAN administrators ...... 207
4.6.2 Skills ...... 208
4.6.3 Certification ...... 208
Chapter 5. Host Bus Adapters ...... 211
5.1 Selection criteria ...... 212
5.1.1 IBM supported HBAs ...... 212
5.1.2 Special features ...... 212
5.1.3 Quantity of servers ...... 212
5.1.4 HBA parameter settings ...... 213
Chapter 6. SAN design considerations ...... 215
6.1 What do you want to achieve with a SAN? ...... 216
6.1.1 Storage consolidation ...... 216
6.1.2 High availability solutions ...... 216
6.1.3 LAN-free backup ...... 217
6.1.4 Server-free backup ...... 217
6.1.5 Server-less backup ...... 217
6.1.6 Disaster recovery ...... 217
6.1.7 Flexibility ...... 218
6.1.8 Goals ...... 218
6.1.9 Benefits expected ...... 219
6.1.10 TCO/ROI ...... 219
6.1.11 Investment protection ...... 219
6.2 Existing resources, needs, and planned growth ...... 219
6.2.1 Collecting the data about existing resources ...... 219
6.2.2 Planning for future needs ...... 221
6.2.3 Platforms and storage ...... 221
6.3 Select the core design for your environment ...... 222
6.3.1 Selecting the topology ...... 223
6.3.2 Scalability ...... 224
6.3.3 Performance ...... 224
6.3.4 Redundancy and resiliency ...... 226
6.4 Host connectivity and Host Bus Adapters ...... 230
6.4.1 Selection criteria ...... 230
6.4.2 Multipathing software ...... 231
6.4.3 Storage sizing ...... 234
6.4.4 Management software ...... 234
6.5 Director class or switch technology ...... 235
6.6 General considerations ...... 252
6.6.1 Ports and ASICs ...... 253
6.6.2 Class F ...... 253
6.6.3 Domain IDs ...... 253
6.6.4 Zoning ...... 253
6.6.5 Physical infrastructure and distance ...... 254
6.7 Interoperability issues in the design ...... 255
6.7.1 Interoperability ...... 255
6.7.2 Standards ...... 255
6.7.3 Legacy equipment and technology ...... 256
6.7.4 Heterogeneous support ...... 256
6.7.5 Certification and support ...... 257
6.7.6 OEM/IBM mixes ...... 257
6.8 Pilot and test the design ...... 258
Chapter 7. IBM TotalStorage SAN Switch L10 ...... 259
7.1 Product description ...... 260
7.1.1 Specifications ...... 261
7.1.2 Management ...... 261
7.2 Fibre Channel Arbitrated Loop (FC-AL) ...... 262
7.3 Loop switch operation ...... 262
7.4 FC-AL Active Trunking ...... 264
7.5 Interoperability ...... 264
7.5.1 Connecting the L10 to a fabric switch ...... 264
7.6 Managing Streaming Data Flows ...... 265
7.7 Part Numbers ...... 265
Chapter 8. IBM TotalStorage SAN b-type family ...... 267
8.1 Product description ...... 268
8.1.1 IBM TotalStorage SAN16B-2 fabric switch ...... 268
8.1.2 IBM TotalStorage SAN32B-2 fabric switch ...... 269
8.1.3 IBM TotalStorage SAN Switch M14 ...... 271
8.1.4 IBM TotalStorage SAN256B director ...... 276
8.1.5 IBM TotalStorage SAN16B-R ...... 281
8.2 Switch features ...... 285
8.2.1 Advanced WEB TOOLS ...... 285
8.2.2 Advanced Performance Monitoring ...... 286
8.2.3 Advanced Security ...... 286
8.2.4 Advanced Zoning ...... 286
8.2.5 Extended Fabric ...... 286
8.2.6 Fabric Manager ...... 287
8.2.7 Fabric Watch ...... 287
8.2.8 ISL Trunking ...... 287
8.2.9 Dynamic Path Selection ...... 287
8.2.10 Remote Switch ...... 287
8.3 Advanced Security ...... 288
8.3.1 Host-to-Switch Domain ...... 288
8.3.2 Administrator-to-Security Management Domain ...... 289
8.3.3 Security Management-to-Fabric Domain ...... 289
8.3.4 Switch-to-Switch Domain ...... 289
8.3.5 Fabric configuration servers ...... 289
8.3.6 Management access controls ...... 290
8.3.7 Device connection controls ...... 290
8.3.8 Switch connection controls ...... 290
8.3.9 Fibre Channel Authentication Protocol ...... 291
8.4 ISL ...... 291
8.4.1 ISLs without trunking or dynamic path selection ...... 292
8.4.2 ISLs with trunking ...... 293
8.4.3 Dynamic Path Selection ...... 294
8.4.4 Switch count ...... 296
8.4.5 Distributed fabrics ...... 297
8.5 FICON ...... 300
8.5.1 FICON servers ...... 300
8.5.2 Intermixed FICON and FCP ...... 300
8.5.3 Cascaded FICON support ...... 300
8.6 Fabric management ...... 301
8.6.1 User accounts and Role-Based Access Control ...... 301
8.6.2 WEB TOOLS ...... 302
8.6.3 Advanced Performance Monitoring ...... 304
8.6.4 Fabric Watch ...... 306
8.6.5 Fabric Manager ...... 308
8.6.6 SCSI Enclosure Services ...... 310
8.7 Zoning ...... 312
8.7.1 Preparing to use zoning ...... 313
8.7.2 Increasing availability ...... 314
8.7.3 Advanced zoning terminology ...... 314
8.7.4 Zoning types ...... 316
8.7.5 Zone configuration ...... 317
8.7.6 Zoning administration ...... 318
8.8 Switch interoperability ...... 319
Chapter 9. IBM TotalStorage SAN m-type family ...... 321
9.1 IBM SAN components ...... 322
9.2 Product description ...... 323
9.2.1 Machine type and model number changes ...... 324
9.2.2 IBM TotalStorage SAN12M-1 Fabric Switch ...... 324
9.2.3 IBM TotalStorage SAN16M-2 Fabric Switch ...... 326
9.2.4 IBM TotalStorage SAN24M-1 Fabric Switch ...... 328
9.2.5 IBM TotalStorage SAN32M-1 Fabric Switch ...... 330
9.2.6 IBM TotalStorage SAN32M-2 Fabric Switch ...... 333
9.2.7 IBM TotalStorage SAN140M Director ...... 335
9.2.8 IBM TotalStorage SAN256M director ...... 344
9.2.9 IBM TotalStorage SAN04M-R ...... 353
9.2.10 IBM TotalStorage SAN16M-R ...... 357
9.2.11 IBM eServer BladeCenter switch module ...... 361
9.2.12 IBM TotalStorage SANC40M ...... 362
9.3 Fabric planning ...... 362
9.3.1 Dual fabrics and directors ...... 363
9.3.2 Server-to-storage ratio ...... 363
9.3.3 ISLs ...... 363
9.3.4 Load balancing ...... 364
9.3.5 Principal switch selection ...... 364
9.3.6 Special considerations ...... 367
9.3.7 Open Fabric ...... 368
9.3.8 Supported devices, servers and HBAs ...... 368
9.4 Features of directors and switches ...... 368
9.4.1 Element Manager ...... 369
9.4.2 FICON Management Server ...... 369
9.4.3 Full Volatility Option ...... 369
9.4.4 Open Systems Management Server ...... 369
9.4.5 Open Trunking ...... 370
9.4.6 Preferred Path ...... 373
9.4.7 SANtegrity Binding ...... 373
9.4.8 Feature activation ...... 374
9.5 FICON support ...... 375
9.6 Fabric management ...... 375
9.6.1 In-band management ...... 375
9.6.2 Out-of-band management ...... 376
9.6.3 EFC Server ...... 377
9.6.4 EFC Manager ...... 382
9.6.5 Troubleshooting ...... 384
9.6.6 SANpilot interface ...... 385
9.6.7 Command line interface ...... 386
9.6.8 SNMP ...... 387
9.7 Zoning ...... 387
9.7.1 Configuring zones ...... 388
9.7.2 Zoning and LUN masking ...... 390
9.7.3 Persistent binding ...... 391
9.7.4 Blocking a port ...... 391
9.7.5 Merging fabrics ...... 391
9.8 Performance ...... 392
9.9 Security ...... 393
9.9.1 Restricting access to those that need it ...... 393
9.9.2 Controlling access at the switch ...... 394
9.9.3 SANtegrity Authentication ...... 394
9.10 Licensing ...... 394
9.10.1 Warranties ...... 395
Chapter 10. Cisco switches and directors ...... 397
10.1 Product description ...... 398
10.1.1 MDS 9120 and 9140 Multilayer Switches ...... 399
10.1.2 MDS 9216A Multilayer Switch ...... 400
10.1.3 Cisco MDS 9216i Multilayer Switch ...... 403
10.1.4 MDS 9506 Multilayer Director ...... 405
10.1.5 MDS 9509 Multilayer Director ...... 406
10.2 MDS 9000 family features ...... 410
10.2.1 Supported attachments ...... 410
10.2.2 Port addressing and port modes ...... 410
10.2.3 Fibre Channel IDs and Persistent FC_ID ...... 411
10.2.4 Supported port types ...... 412
10.3 Supervisor module ...... 415
10.3.1 Control and management ...... 415
10.3.2 Optional modules ...... 417
10.4 MDS 9000 SAN-OS 2.1 ...... 423
10.5 Fabric management ...... 424
10.5.1 Cisco MDS 9000 Fabric Manager ...... 424
10.5.2 In-band management and out-of-band management ...... 425
10.5.3 Using the setup routine ...... 427
10.5.4 Controlling administrator access with users and roles ...... 428
10.5.5 Accessing Cisco Fabric Manager ...... 428
10.5.6 Connecting to a supervisor module ...... 429
10.5.7 Licensed feature packages ...... 429
10.5.8 PortChanneling ...... 433
10.5.9 Virtual SAN (VSAN) ...... 434
10.5.10 Trunking ...... 442
10.5.11 Quality of Service (QoS) ...... 443
10.5.12 Fibre Channel Congestion Control (FCC) ...... 444
10.5.13 Call home ...... 446
10.6 Security management ...... 446
10.6.1 Switch access security ...... 446
10.6.2 User authentication ...... 446
10.7 Troubleshooting features ...... 449
10.7.1 Troubleshooting with Fabric Manager ...... 449
10.7.2 Monitoring network traffic using SPAN ...... 451
10.7.3 Monitoring traffic using Fibre Channel analyzers ...... 456
10.8 FICON ...... 458
10.9 Zoning ...... 459
10.9.1 Zone features ...... 460
10.9.2 Zone membership ...... 461
10.9.3 Configuring a zone ...... 461
10.9.4 Zone enforcement ...... 461
10.9.5 Zone sets ...... 462
10.9.6 Default zone ...... 462
10.9.7 LUN zoning ...... 463
10.10 Switch interoperability mode ...... 463
10.10.1 Interoperability matrix ...... 465
Chapter 11. General solutions ...... 467
11.1 Objectives of SAN implementation ...... 468
11.2 Servers and host bus adapters ...... 468
11.2.1 Path and dual-redundant HBA ...... 469
11.2.2 Multiple paths ...... 469
11.3 Software ...... 470
11.4 Storage ...... 470
11.5 Fabric ...... 472
11.5.1 The fabric-is-a-switch approach ...... 472
11.5.2 The fabric-is-a-network approach ...... 473
11.6 High-level fabric design ...... 473
11.7 Definitions ...... 477
11.7.1 Port formulas ...... 479
11.8 Our solutions ...... 480
Chapter 12. SAN event data gathering tips ...... 481
12.1 Overview ...... 482
12.2 Hosts ...... 482
12.2.1 AIX ...... 482
12.2.2 HP-UX ...... 483
12.2.3 Linux ...... 484
12.2.4 Microsoft Windows ...... 485
12.2.5 Novell NetWare ...... 486
12.2.6 SUN Solaris ...... 487
12.3 Switches ...... 488
12.3.1 SAN Switch 2031/2032 (McDATA) ...... 488
12.3.2 SAN Switch 2062 (Cisco) ...... 489
12.3.3 SAN Switch 2109 (Brocade) ...... 489
12.3.4 SAN Switch 2042 and 2045 (CNT) ...... 490
12.4 Storage ...... 491
12.4.1 IBM TotalStorage DS Family disk subsystem ...... 491
12.4.2 IBM TotalStorage Enterprise Storage Server ...... 492
12.4.3 3583 Tape Library and SDGM ...... 492
Chapter 13. IBM TotalStorage SAN Switch L10 solutions ...... 495
13.1 Performance solutions ...... 496
13.2 Availability solutions ...... 499
13.2.1 Dual loop ...... 499
13.3 Clustering solutions ...... 502
13.3.1 Two-node clustering ...... 502
Chapter 14. IBM TotalStorage SAN b-type family solutions ...... 505
14.1 Performance solutions ...... 506
14.2 Availability solutions ...... 510
14.2.1 Single fabric ...... 511
14.2.2 Dual fabric ...... 514
14.3 Clustering solutions ...... 516
14.3.1 Two-node clustering ...... 516
14.3.2 Multi-node clustering ...... 519
14.4 Secure solutions ...... 522
Chapter 15. IBM TotalStorage SAN m-type family solutions ...... 525
15.1 Performance solutions ...... 526
15.1.1 Components ...... 527
15.1.2 Checklist ...... 528
15.1.3 Performance ...... 528
15.1.4 Scalability ...... 528
15.1.5 Availability ...... 529
15.1.6 Security ...... 530
15.1.7 What if failure scenarios ...... 530
15.2 Availability solutions ...... 530
15.2.1 Dual fabric ...... 530
15.2.2 Components ...... 531
15.2.3 Checklist ...... 532
15.2.4 Performance ...... 532
15.2.5 Scalability ...... 532
15.2.6 Security ...... 532
15.2.7 Availability ...... 532
15.2.8 What if failure scenarios ...... 532
15.3 Dual sites ...... 533
15.3.1 Components ...... 534
15.3.2 Checklist ...... 534
15.3.3 Performance ...... 535
15.3.4 Scalability ...... 535
15.3.5 Security ...... 535
15.3.6 What if failure scenarios ...... 535
15.4 Clustering solutions ...... 536
15.4.1 Components ...... 537
15.4.2 Checklist ...... 537
15.4.3 Performance ...... 538
15.4.4 Scalability ...... 538
15.4.5 Security ...... 538
15.4.6 What if failure scenarios ...... 539
15.5 Secure solutions ...... 540
15.5.1 Components ...... 541
15.5.2 Checklist ...... 541
15.5.3 Security ...... 542
15.5.4 Performance ...... 543
15.5.5 Scalability ...... 543
15.5.6 What if security scenarios ...... 543
15.6 Loop solutions ...... 544
15.6.1 Components ...... 546
15.6.2 Checklist ...... 547
15.6.3 Performance ...... 547
15.6.4 Scalability ...... 547
15.6.5 Security ...... 548
15.6.6 What if failure scenarios ...... 548
15.6.7 Switch capable tape drives ...... 549
Chapter 16. Cisco solutions ...... 551
16.1 Performance solutions ...... 552
16.1.1 Components ...... 553
16.1.2 Checklist ...... 554
16.1.3 Performance ...... 554
16.1.4 Scalability ...... 555
16.1.5 Availability ...... 556
16.1.6 Security ...... 556
16.1.7 What if failure scenarios ...... 556
16.2 Availability solutions ...... 557
16.2.1 Dual fabric ...... 557
16.2.2 Dual sites ...... 560
16.3 Clustering solutions ...... 564
16.3.1 Two-node clustering ...... 565
16.3.2 Multi-node clustering ...... 567
16.4 Secure solutions ...... 570
16.4.1 Zoning security solution ...... 570
16.5 Loop solutions ...... 573
16.5.1 Using the translative loop port ...... 574
Chapter 17. Case studies ...... 577
17.1 Case study 1: Company One ...... 578
17.1.1 Company One profile ...... 578
17.1.2 High-level business requirements ...... 578
17.1.3 Current infrastructure ...... 578
17.1.4 Detailed requirements ...... 578
17.1.5 Analysis of ports and throughput ...... 579
17.2 Case study 2: Company Two ...... 581
17.2.1 Company profile ...... 581
17.2.2 High-level business requirements ...... 581
17.2.3 Current infrastructure ...... 581
17.2.4 Detailed requirements ...... 583
17.2.5 Analysis of ports and throughput ...... 584
17.3 Case study 3: ElectricityFirst company ...... 589
17.3.1 Company profile ...... 589
17.3.2 High-level business requirements ...... 590
17.3.3 Infrastructure requirements ...... 590
17.3.4 Analysis of ports and throughput ...... 591
17.4 Case study 4: Company Four ...... 594
17.4.1 Company profile ...... 594
17.4.2 High-level business requirements ...... 594
17.4.3 Current infrastructure ...... 594
17.4.4 Detailed requirements ...... 596
17.4.5 Analysis of ports and throughput ...... 597
17.5 Case study 5: Company Five ...... 599
17.5.1 Company profile ...... 599
17.5.2 High-level business requirements ...... 599
17.5.3 Current infrastructure ...... 600
17.5.4 Detailed requirements ...... 601
17.5.5 Analysis of ports and throughput ...... 602
17.6 Case study 6: Company Six ...... 604
17.6.1 Company profile ...... 604
17.6.2 High-level business requirements ...... 604
17.6.3 Current infrastructure ...... 604
17.6.4 Detailed requirements ...... 605
17.6.5 Analysis of ports and throughput ...... 606
Chapter 18. IBM TotalStorage SAN b-type case study solutions ...... 609
18.1 Case study 1: Company One ...... 610
18.1.1 Switch design ...... 610
18.1.2 Performance ...... 617
18.1.3 Availability ...... 617
18.1.4 Security ...... 617
18.1.5 Distance ...... 618
18.1.6 Scalability ...... 618
18.1.7 What if failure scenarios ...... 618
18.1.8 Manageability and management software ...... 619
18.1.9 Core switch design ...... 620
18.2 Case study 2: Company Two ...... 623
18.2.1 Design ...... 623
18.2.2 Performance ...... 626
18.2.3 Availability ...... 629
18.2.4 Security ...... 629
18.2.5 Distance ...... 630
18.2.6 Scalability ...... 630
18.2.7 What if failure scenarios ...... 631
18.2.8 Manageability and management software ...... 631
18.3 Case study 3: ElectricityFirst ...... 634
18.3.1 Solution design ...... 634
18.3.2 Performance ...... 637
18.3.3 Availability ...... 637
18.3.4 Security ...... 637
18.3.5 Distance ...... 638
18.3.6 Scalability ...... 638
18.3.7 What if failure scenarios ...... 638
18.3.8 Manageability and management software ...... 639
18.4 Case study 4: Company Four ...... 639
18.4.1 Design ...... 639
18.4.2 Performance ...... 641
18.4.3 Availability ...... 641
18.4.4 Security ...... 641
18.4.5 Distance ...... 642
18.4.6 Scalability ...... 642
18.4.7 What if failure scenarios ...... 642
18.4.8 Manageability and management software ...... 643
18.5 Case study 5: Company Five ...... 643
18.5.1 Design ...... 643
18.5.2 Performance ...... 645
18.5.3 Availability ...... 645
18.5.4 Security ...... 645
18.5.5 Distance ...... 646
18.5.6 Scalability ...... 646
18.5.7 What if failure scenarios ...... 646
18.5.8 Manageability and management software ...... 647
18.6 Case study 6: Company Six ...... 647
18.6.1 Design ...... 647
18.6.2 Performance ...... 651
18.6.3 Availability ...... 651
18.6.4 Security ...... 651
18.6.5 Distance ...... 651
18.6.6 Scalability ...... 652
18.6.7 What if failure scenarios ...... 652
18.6.8 Manageability and management software ...... 653
Chapter 19. IBM TotalStorage SAN m-type case study solutions ...... 655
19.1 Case study 1: Company One ...... 656
19.1.1 Design using directors ...... 656
19.1.2 Performance ...... 660
19.1.3 Availability ...... 660
19.1.4 Security ...... 660
19.1.5 Distance ...... 661
19.1.6 Scalability ...... 661
19.1.7 What if failure scenarios ...... 661
19.1.8 Manageability and management software ...... 662
19.1.9 Design using switches ...... 663
19.1.10 Performance ...... 667
19.1.11 Availability ...... 667
19.1.12 Security ...... 668
19.1.13 Distance ...... 668
19.1.14 Scalability ...... 668
19.1.15 What if failure scenarios ...... 668
19.1.16 Manageability and management software ...... 669
19.2 Case study 2: Company Two ...... 670
19.2.1 Design ...... 670
19.2.2 Performance ...... 673
19.2.3 Availability ...... 674
19.2.4 Security ...... 674
19.2.5 Distance ...... 675
19.2.6 Scalability ...... 675
19.2.7 What if failure scenarios ...... 675
19.2.8 Manageability and management software ...... 676
19.3 Case study 3: ElectricityFirst ...... 677
19.3.1 Solution design ...... 677
19.3.2 Performance ...... 680
19.3.3 Availability ...... 680
19.3.4 Security ...... 680
19.3.5 Distance ...... 681
19.3.6 Scalability ...... 681
19.3.7 What if failure scenarios ...... 681
19.3.8 Manageability and management software ...... 682
19.4 Case study 4: Company Four ...... 682
19.4.1 Design ...... 682
19.4.2 Performance ...... 684
19.4.3 Availability ...... 684
19.4.4 Security ...... 684
19.4.5 Distance ...... 685
19.4.6 Scalability ...... 685
19.4.7 What if failure scenarios ...... 685
19.4.8 Manageability and management software ...... 686
19.5 Case study 5: Company Five ...... 687
19.5.1 Design ...... 687
19.5.2 Performance ...... 688
19.5.3 Availability ...... 689
19.5.4 Security ...... 689
19.5.5 Distance ...... 689
19.5.6 Scalability ...... 690
19.5.7 What if failure scenarios ...... 690
19.5.8 Manageability and management software ...... 690
19.6 Case study 6: Company Six ...... 691
19.6.1 Design ...... 691
19.6.2 Performance ...... 695
19.6.3 Availability ...... 696
19.6.4 Security ...... 696
19.6.5 Distance ...... 696
19.6.6 Scalability ...... 697
19.6.7 What if failure scenarios ...... 697
19.6.8 Manageability and management software ...... 697
Chapter 20. Cisco case study solutions ...... 699
20.1 Case study 1: Company One ...... 700
20.1.1 Design using directors ...... 700
20.1.2 Performance ...... 704
20.1.3 Availability ...... 704
20.1.4 Security ...... 704
20.1.5 Distance ...... 705
20.1.6 Scalability ...... 705
20.1.7 What if failure scenarios ...... 705
20.1.8 Manageability and management software ...... 706
20.1.9 Design using switches ...... 707
20.1.10 Performance ...... 711
20.1.11 Availability ...... 711
20.1.12 Security ...... 712
20.1.13 Distance ...... 712
20.1.14 Scalability ...... 712
20.1.15 What if failure scenarios ...... 712
20.1.16 Manageability and management software ...... 713
20.2 Case study 2: Company Two ...... 714
20.2.1 Design ...... 714
20.2.2 Performance ...... 717
20.2.3 Availability ...... 718
20.2.4 Security ...... 718
20.2.5 Distance ...... 719
20.2.6 Scalability ...... 719
20.2.7 What if failure scenarios ...... 719
20.2.8 Manageability and management software ...... 720
20.3 Case study 3: ElectricityFirst ...... 720
20.3.1 Solution design ...... 720
20.3.2 Performance ...... 722
20.3.3 Availability ...... 723
20.3.4 Security ...... 723
20.3.5 Distance ...... 723
20.3.6 Scalability ...... 723
20.3.7 What if failure scenarios ...... 724
20.3.8 Manageability and management software ...... 725
20.4 Case study 4: Company Four ...... 725
20.4.1 Design ...... 725
20.4.2 Performance ...... 727
20.4.3 Availability ...... 727
20.4.4 Security ...... 727
20.4.5 Distance ...... 728
20.4.6 Scalability ...... 728
20.4.7 What if failure scenarios ...... 728
20.4.8 Manageability and management software ...... 729
20.5 Case study 5: Company Five ...... 729
20.5.1 Design ...... 730
20.5.2 Performance ...... 731
20.5.3 Availability ...... 732
20.5.4 Security ...... 732
20.5.5 Distance ...... 732
20.5.6 Scalability ...... 732
20.5.7 What if failure scenarios ...... 732
20.5.8 Manageability and management software ...... 733
20.6 Case study 6: Company Six ...... 734
20.6.1 Design ...... 734
20.6.2 Performance ...... 738
20.6.3 Availability ...... 739
20.6.4 Security ...... 739
20.6.5 Distance ...... 739
20.6.6 Scalability ...... 740
20.6.7 What if failure scenarios ...... 740
20.6.8 Manageability and management software ...... 740
Chapter 21. Channel extension concepts ...... 743
21.1 Channel extenders ...... 743
21.2 Amplifiers ...... 744
21.3 Repeaters ...... 744
21.4 Multiplexers ...... 744
21.5 Time-Division Multiplexers ...... 745
21.6 Wave Division Multiplexing ...... 746
21.6.1 Coarse Wave Division Multiplexing (CWDM) ...... 746
21.6.2 Dense Wave Division Multiplexing (DWDM) ...... 747
21.6.3 DWDM components ...... 749
21.6.4 Optical add/drop multiplexers ...... 751
21.7 DWDM topologies ...... 752
21.7.1 Point-to-point ...... 752
21.7.2 Linear ...... 753
21.7.3 Ring ...... 753
21.8 Factors that affect distance ...... 757
21.8.1 Terminology ...... 758
21.8.2 Protocol definitions ...... 759
21.8.3 Light or link budget ...... 761
21.8.4 Buffer credits ...... 762
21.8.5 Fiber quality ...... 763
21.8.6 Cable types ...... 763
21.8.7 Droop ...... 765
21.8.8 Latency ...... 767
21.8.9 Bandwidth sizing ...... 767
21.8.10 Hops ...... 768
21.8.11 Physical location of repeaters ...... 769
21.8.12 Standards ...... 769
Chapter 22. IBM TotalStorage SAN b-type family channel extension solutions ...... 771
22.1 Brocade-compatible channel extension devices ...... 771
22.1.1 Cisco channel extension devices ...... 772
22.1.2 ADVA FSP 2000 channel extension devices ...... 772
22.1.3 Ciena CN 2000 channel extension devices ...... 773
22.1.4 Nortel Optical Metro 5200 ...... 773
22.2 Consolidation to remote disk less than 10 km away ...... 774
22.2.1 Buffer credits ...... 775
22.2.2 Do we have enough ISLs and enough ISL bandwidth? ...... 779
22.2.3 Cabling and interface issues ...... 779
22.3 Business continuance ...... 780
22.4 Synchronous replication up to 10 km apart ...... 781
22.4.1 Buffer credits ...... 781
22.4.2 Do we have enough ISLs and enough ISL bandwidth? ...... 782
22.4.3 Cabling and interface issues ...... 782
22.5 Synchronous replication up to 300 km apart ...... 783
22.5.1 Buffer credits ...... 784
22.5.2 Do we have enough ISLs and enough ISL bandwidth? ...... 785
22.5.3 Cabling and interface issues ...... 786
22.6 Multiple site ring DWDM example ...... 786
22.6.1 Buffer credits ...... 787
22.6.2 Do we have enough ISLs and enough ISL bandwidth? ...... 788
22.6.3 Cabling and interface issues ...... 788
22.7 Remote tape vaulting ...... 788
22.7.1 Buffer credits ...... 790
22.7.2 Do we have enough ISLs and enough ISL bandwidth? ...... 790
22.7.3 Cabling and interface issues ...... 791
22.8 Long distance disaster recovery over IP ...... 791
22.8.1 Customer environment and requirements ...... 791
22.8.2 The solution ...... 792
22.8.3 Normal operation ...... 794
22.8.4 Failure scenarios ...... 794
Chapter 23. IBM TotalStorage SAN m-type family channel extension solutions ...... 797 23.1 McDATA-compatible channel extension devices ...... 797 23.1.1 Cisco channel extension devices ...... 798 23.1.2 ADVA FSP 2000 channel extension devices ...... 798 23.1.3 Ciena CN 2000 channel extension devices ...... 799 23.1.4 Nortel Optical Metro 5200 ...... 799 23.2 Consolidation to remote disk less than 10Km away ...... 800 23.2.1 Buffer credits ...... 801 23.2.2 Do we have enough ISLs and enough ISL bandwidth? ...... 801 23.2.3 Cabling and interface issues ...... 801
23.3 Business continuance ...... 802 23.4 Synchronous replication up to 10 km apart ...... 803 23.4.1 Buffer credits ...... 803 23.4.2 Do we have enough ISLs and enough ISL bandwidth? ...... 804 23.4.3 Cabling and interface issues ...... 804 23.5 Synchronous replication up to 300 Km apart ...... 805 23.5.1 Buffer credits ...... 806 23.5.2 Do we have enough ISLs and enough ISL bandwidth? ...... 807 23.5.3 Cabling and interface issues ...... 807 23.6 Multiple site ring DWDM example ...... 808 23.6.1 Buffer credits ...... 809 23.6.2 Do we have enough ISLs and enough ISL bandwidth? ...... 810 23.6.3 Cabling and interface issues ...... 810 23.7 Remote tape vaulting ...... 810 23.7.1 Buffer credits ...... 812 23.7.2 Do we have enough ISLs and enough ISL bandwidth? ...... 812 23.7.3 Cabling and interface issues ...... 812 23.8 Long distance disaster recovery over IP ...... 812 23.8.1 Customer environment and requirements...... 812 23.8.2 The solution...... 814 23.8.3 Normal operation...... 816 23.8.4 Failure scenarios...... 816
Chapter 24. Cisco channel extension solutions...... 819 24.1 Cisco channel extension devices ...... 819 24.1.1 Cisco MDS 9000 with CWDM transceivers...... 820 24.1.2 Cisco 2062-CW1 ...... 820 24.1.3 Cisco ONS 15530, 15540 ...... 821 24.2 Consolidation to remote disk less than 10Km away ...... 823 24.2.1 Buffer credits ...... 824 24.2.2 Do we have enough ISLs and enough ISL bandwidth? ...... 825 24.2.3 Cabling and interface issues ...... 825 24.2.4 Use of VSAN ...... 826 24.3 Business continuance ...... 827 24.4 Synchronous replication up to 10 km apart ...... 828 24.4.1 Buffer credits ...... 829 24.4.2 Do we have enough ISLs and enough ISL bandwidth? ...... 829 24.4.3 Cabling and interface issues ...... 830 24.4.4 Use of VSAN ...... 830 24.5 Synchronous replication up to 300 Km apart ...... 830 24.5.1 Buffer credits ...... 831 24.5.2 Do we have enough ISLs and enough ISL bandwidth? ...... 832 24.5.3 Cabling and interface issues ...... 832
24.5.4 Use of VSAN ...... 833 24.6 Multiple site ring DWDM example ...... 833 24.6.1 Buffer credits ...... 835 24.6.2 Do we have enough ISLs and enough ISL bandwidth? ...... 835 24.6.3 Cabling and interface issues ...... 835 24.6.4 Use of VSAN ...... 836 24.7 Remote tape vaulting ...... 836 24.7.1 Buffer credits ...... 838 24.7.2 Do we have enough ISLs and enough ISL bandwidth? ...... 838 24.7.3 Cabling and interface issues ...... 838 24.7.4 Use of VSAN ...... 838 24.8 Disaster recovery with FCIP ...... 839 24.8.1 Existing systems ...... 839 24.8.2 IT improvement objectives ...... 840 24.8.3 New technology deployed and DR site established ...... 841 24.8.4 Global Mirroring established to the DR site...... 843
Chapter 25. SAN best practices ...... 847 25.1 Scaling...... 848 25.1.1 How to scale easily ...... 848 25.1.2 How to avoid downtime ...... 848 25.1.3 Adding a switch or director ...... 849 25.1.4 Adding ISLs...... 850 25.1.5 Performance monitoring and reporting ...... 850 25.2 Know your workloads ...... 850 25.3 Port placement ...... 851 25.3.1 IBM TotalStorage b-type switches and directors...... 851 25.3.2 IBM TotalStorage m-type switches and directors ...... 852 25.3.3 Cisco switches and directors...... 853 25.4 WWNs ...... 853 25.5 Tools ...... 853 25.6 Documentation ...... 855 25.7 Configurations ...... 856 25.8 Avoiding common SAN setup errors ...... 856 25.9 Zoning ...... 857 25.9.1 General zoning recommendations ...... 857 25.9.2 IBM TotalStorage b-type switches and directors...... 857 25.9.3 IBM TotalStorage m-type switches and directors ...... 857 25.9.4 Cisco switches and directors...... 858
Glossary ...... 859
Related publications ...... 883 IBM Redbooks ...... 883 Other resources ...... 883 Referenced Web sites ...... 884 How to get IBM Redbooks ...... 885 IBM Redbooks collections...... 885
Index ...... 887
Figures
L-R: Jon, Pauli, Leos, and Jim ...... xxxix 1-1 Business outage causes ...... 6 1-2 Storage consolidation ...... 9 1-3 Logical storage consolidation...... 10 1-4 Loading the IP network ...... 12 1-5 SAN total storage solutions ...... 16 2-1 SFP Hot Pluggable optical transceiver ...... 19 2-2 Small Form Fixed pin-through-hole Transceiver ...... 20 2-3 SFF hot-pluggable transceiver (SFP) with LC connector fiber cable . . . 21 2-4 Dual SC fiber-optic plug connector ...... 22 2-5 Gigabit Interface Converter ...... 23 2-6 Gigabit Link Module ...... 24 2-7 Media Interface Adapter...... 24 2-8 1x9 transceivers...... 25 2-9 Fibre Channel adapter cable ...... 25 2-10 HBA ...... 26 2-11 Fibre Channel core and edge switches ...... 29 2-12 A diagram of a backplane and blades architecture ...... 31 2-13 Meshed topology switched fabric...... 34 2-14 Connecting an FC analyzer ...... 36 3-1 Cascading directors ...... 40 3-2 Non-blocking and blocking switching ...... 41 3-3 Fibre Channel port types...... 44 3-4 Point-to-point ...... 45 3-5 Arbitrated loop ...... 46 3-6 Sample switched fabric configuration ...... 49 3-7 Cascading in a switched fabric ...... 51 3-8 Parallel ISLs with low traffic ...... 52 3-9 Parallel ISLs with high traffic ...... 52 3-10 ISL Trunking...... 53 3-11 Four-switch fabric...... 55 3-12 Exchange-based Dynamic Path Selection...... 56 3-13 Adjacent FC devices ...... 64 3-14 World Wide Name addressing scheme ...... 67 3-15 WWN and WWPN ...... 68 3-16 WWN and WWPN entries in a name server table ...... 69 3-17 Fabric port address ...... 71 3-18 Ficon port addressing ...... 74
3-19 FICON single switch: Switched point-to-point link address ...... 75 3-20 FICON addressing for cascaded directors...... 76 3-21 Two cascaded director FICON addressing ...... 77 3-22 Fabric shortest path first ...... 82 3-23 FSPF calculates the route taking the least hops ...... 83 3-24 Other possible paths ...... 83 3-25 FSPF and round robin ...... 85 3-26 Oversubscription and congestion...... 86 3-27 Hops and their cost, speed ...... 87 3-28 Mixing 2 Gbps and 1 Gbps ...... 91 3-29 Fibre Channel layers ...... 94 3-30 Zoning ...... 97 3-31 An example of zoning ...... 98 3-32 Zoning based on the switch port number...... 99 3-33 Hardware zoning ...... 100 3-34 Zoning based on the devices’ WWNs ...... 102 3-35 Trunking ...... 105 3-36 8b/10b encoding logic ...... 109 3-37 Public loop implementation ...... 120 3-38 Arbitrated loop address translation ...... 122 3-39 CIMOM component structure...... 140 3-40 SAN management hierarchy ...... 145 3-41 Common Interface Model for SAN management ...... 146 3-42 Typical SAN environment ...... 148 3-43 Device management elements ...... 149 3-44 MIB tree ...... 154 3-45 FlashCopy-based backup combined with file-based backup ...... 163 4-1 Mode differences through the fiber optic cable ...... 186 4-2 Messy cabling, no cabinet, and no cable labels...... 195 6-1 Single fabric: Nonresilient ...... 227 6-2 Single fabric: Resilient ...... 228 6-3 Redundant fabric: Nonresilient...... 229 6-4 Redundant fabric: Resilient ...... 230 6-5 Multiple paths to the same LUN...... 232 6-6 Multipath in single fabric SAN ...... 233 6-7 Director class or switch dilemma ...... 235 6-8 External managing of director class product ...... 237 6-9 Ports on different blades ...... 238 6-10 Routes in director class product...... 238 6-11 Blade zoning ...... 239 6-12 Director class product versus full meshed switch fabric ...... 240 6-13 Director class 64 ports versus 64 ports switch fabric...... 241 6-14 Adding tapes to the director class SAN ...... 243
6-15 Switches with loop support ...... 244 6-16 ISL between two redundant fabrics ...... 245 6-17 Two director class products with FC-AL support solution ...... 246 6-18 Two director class products without FC-AL support ...... 247 6-19 Edge switch with only one connection ...... 248 6-20 One director class product without FC-AL support ...... 249 6-21 Two-switch solution with FC-AL support ...... 250 6-22 Two-switch solution without FC-AL support...... 251 6-23 Single switch solution with FC-AL support...... 252 7-1 IBM TotalStorage Switch L10...... 260 7-2 L10 example, as an alternative to iSCSI ...... 263 8-1 IBM TotalStorage SAN16B-2 fabric switch ...... 269 8-2 IBM TotalStorage SAN32B-2 fabric switch ...... 271 8-3 IBM TotalStorage SAN Switch M14 ...... 272 8-4 Port side of 2109-M14 ...... 275 8-5 2109-M14 Port card ...... 276 8-6 IBM TotalStorage SAN256B director ...... 278 8-7 IBM TotalStorage SAN256B director 256-port numbering scheme . . . 281 8-8 IBM TotalStorage SAN 16B-R ...... 282 8-9 Parallel ISLs without trunking...... 292 8-10 2109 ISL trunking...... 293 8-11 Dynamic Path Selection in core-to-edge fabrics ...... 296 8-12 Extended Fabrics feature using dark fiber and DWDM ...... 299 8-13 Remote Switch feature using ATM ...... 300 8-14 SES management ...... 311 8-15 Zoning with the IBM TotalStorage b-type switches ...... 313 8-16 Overlapping zones ...... 315 9-1 SAN12M-1 ...... 324 9-2 SAN24M-1 ...... 328 9-3 SAN32M-1 ...... 330 9-4 SAN140M...... 336 9-5 SAN140M front port map ...... 338 9-6 SAN140M rear port map ...... 339 9-7 SAN140M front view ...... 341 9-8 McDATA Intrepid 6140 Director rear view ...... 343 9-9 SAN256M director ...... 345 9-10 McDATA Open Trunking ...... 372 9-11 LCD panel on front of EFC Management Server ...... 378 9-12 Rear view of EFC Management Server ...... 378 9-13 EFC Server public intranet with one ethernet connection ...... 380 9-14 EFC Server private network with two ethernet connections ...... 381 9-15 EFCM 8.0 main window ...... 383 10-1 MDS 9120 Multilayer Switch (IBM 2061-020) ...... 400
10-2 MDS 9140 Multilayer Switch (IBM 2061-040) ...... 400 10-3 MDS 9216A Multilayer Switch (IBM 2062-D1A) with 48 ports ...... 401 10-4 Cisco MDS 9216A Multilayer Fabric Switch layout ...... 402 10-5 Cisco MDS 9216i ...... 404 10-6 MDS 9506 Multilayer Director (IBM 2062-D04) ...... 405 10-7 MDS 9509 Multilayer Director (IBM 2062-D07) ...... 407 10-8 Cisco MDS 9509 Multilayer Director layout ...... 409 10-9 Cisco MDS 9000 family port types...... 414 10-10 MDS 9500 Series supervisor module ...... 416 10-11 16 port switching module ...... 418 10-12 32 port switching module ...... 419 10-13 Cisco MDS 9000 14+2 Multi-Protocol Services Module ...... 419 10-14 8-port IP Services Module ...... 420 10-15 Storage Services Module...... 422 10-16 Cisco MDS 9000 Port Analyzer Adapter -2 ...... 422 10-17 Cisco MDS 9000 Fabric Manager user interface ...... 425 10-18 Out-of-band management connection ...... 426 10-19 In-band management connection ...... 427 10-20 PortChannels and ISLs on the Cisco MDS 9000 switches ...... 434 10-21 Traditional SAN ...... 436 10-22 Virtual SAN ...... 437 10-23 Inter-VSAN Routing ...... 441 10-24 Trunking and PortChanneling ...... 443 10-25 Forward Congestion Control ...... 445 10-26 Security with local authentication...... 447 10-27 Security with RADIUS server ...... 448 10-28 SPAN destination ports ...... 451 10-29 SD_Port for ingress (incoming) traffic ...... 452 10-30 SD_Port for egress (outgoing) traffic ...... 453 10-31 Fibre Channel analyzer without SPAN...... 456 10-32 Fibre Channel analyzer using SPAN ...... 457 10-33 Using a single SD_Port to monitor traffic ...... 458 10-34 Zoning overview...... 460 11-1 Two examples of switch cascading ...... 474 11-2 Ring design ...... 474 11-3 Meshed network design ...... 475 11-4 Host-tier and storage-tier ...... 476 11-5 Tier to tier...... 476 11-6 Core-edge design ...... 477 12-1 Storage Manager View Event Log icon ...... 492 13-1 Simple SAN with the IBM TotalStorage Switch L10...... 496 13-2 Expanded SAN with the IBM TotalStorage Switch L10 ...... 498 13-3 Simple dual loop design...... 500
13-4 Expanded dual-loop design ...... 501 13-5 Simple clustering solution with the IBM TotalStorage SAN Switch L10 503 14-1 High performance design...... 506 14-2 Expanding the SAN fabric with E_Ports...... 509 14-3 Core-edge solution...... 511 14-4 High availability dual enterprise SAN fabric ...... 514 14-5 Simple HACMP cluster with dual switch with redundant fabric ...... 517 14-6 Large HACMP cluster ...... 520 14-7 Secure SAN ...... 523 15-1 High performance design...... 526 15-2 Expanded SAN fabric with E_Ports ...... 529 15-3 Redundant fabrics ...... 531 15-4 Dual sites ...... 533 15-5 Single director clustering solution ...... 536 15-6 Secure solution ...... 541 15-7 Tape attachment using IBM TotalStorage SAN24M-1 switches . . . . . 545 15-8 Tape zoning ...... 546 16-1 High performance design...... 552 16-2 Expanding the SAN fabric with E_Ports...... 555 16-3 Traditional dual fabric design without VSANs ...... 558 16-4 Dual fabric design with VSANs ...... 559 16-5 Traditional across site dual fabric design...... 561 16-6 Across site Dual fabric design using VSANs ...... 562 16-7 IBM HACMP cluster with redundant fabric...... 565 16-8 Large HACMP cluster ...... 568 16-9 Protecting your data from human error and sabotage ...... 571 16-10 Utilizing the Cisco MDS 9000 TL_Port...... 574 17-1 Case study 2: Server schematic ...... 583 17-2 Different ISL for data access and for data replication ...... 587 17-3 Case study 4: Server schematic ...... 596 17-4 Case Study 5: Server Schematic ...... 601 17-5 Case Study 6: Server schematic ...... 605 18-1 Core SAN design ...... 610 18-2 Adding an additional two switches to the SAN...... 612 18-3 SAN after first year of expansion ...... 613 18-4 SAN design after three years of operation...... 614 18-5 Core edge design...... 616 18-6 Management network ...... 619 18-7 Core switch design...... 620 18-8 Expanding to 40 servers using core switch technology ...... 622 18-9 Getwell initial design ...... 623 18-10 Feelinbad initial design ...... 625 18-11 Adding additional storage ports for non-SGI servers ...... 627
18-12 Trunking in Getwell Center ...... 629 18-13 Core switch design for Getwell site ...... 632 18-14 Core switch design for Feelinbad site ...... 633 18-15 ElectricityFirst solution based on IBM TotalStorage SAN32B-2 . . . . . 635 18-16 ElectricityFirst - changes to the SAN design to connect new servers . 636 18-17 Initial design using IBM TotalStorage SAN Switch M14 ...... 640 18-18 Proposed design for Company Five...... 644 18-19 Proposed design for Primary site...... 648 18-20 Proposed design for Secondary site ...... 649 18-21 DWDM connection between sites ...... 650 19-1 Core SAN design using a SAN140M Director ...... 656 19-2 Fully redundant SAN140M Director solution ...... 658 19-3 SAN140M solution with all potential servers ...... 659 19-4 Management network ...... 663 19-5 Initial design using IBM TotalStorage SAN32M-2 switches ...... 664 19-6 Final design to accommodate all potential servers ...... 666 19-7 Management network ...... 670 19-8 Getwell SAN: SAN140M Director and SAN32M-2 switches ...... 671 19-9 Feelinbad SAN: McDATA SAN32M-2 switches ...... 672 19-10 Management network ...... 677 19-11 ElectricityFirst solution based on IBM TotalStorage SAN32M-2 . . . . . 678 19-12 ElectricityFirst - changes to the SAN design to connect new servers . 679 19-13 Initial design using SAN140M Directors...... 683 19-14 Management network ...... 686 19-15 Proposed design for Company Five...... 687 19-16 Management network ...... 691 19-17 Proposed design for the Primary site...... 692 19-18 Proposed design for the Secondary site ...... 693 19-19 Complete solution ...... 694 19-20 Management network ...... 698 20-1 Core SAN design using a Cisco MDS 9506 Director ...... 700 20-2 Fully redundant Cisco MDS 9506 Director solution ...... 702 20-3 Cisco MDS 9506 solution with all potential servers ...... 703 20-4 Management network ...... 707 20-5 Initial design using Cisco MDS 9140 switches...... 708 20-6 Final design to accommodate all potential servers ...... 710 20-7 Management network ...... 714 20-8 Getwell SAN design using Cisco MDS 9506 Director ...... 715 20-9 Feelinbad SAN design using Cisco MDS 9506 Directors ...... 716 20-10 ElectricityFirst solution based on Cisco MDS 9140 switches...... 721 20-11 Scenario after adding two more Cisco MDS 9140 switches ...... 724 20-12 Initial design using Cisco MDS 9506 Director ...... 726 20-13 Management network ...... 729
20-14 Proposed design for Company Five...... 730 20-15 Management network ...... 734 20-16 The proposed design for the Primary site ...... 735 20-17 Proposed design for the Secondary site ...... 736 20-18 The complete solution ...... 737 21-1 Time Division Multiplexer concepts ...... 746 21-2 Coarse Wave Division Multiplexer concepts ...... 747 21-3 DWDM overview ...... 748 21-4 Multiplexer to demultiplexer ...... 750 21-5 Both multiplexer and demultiplexer ...... 750 21-6 Light dropped and added ...... 751 21-7 Example of OADM using dielectric filter...... 752 21-8 Point-to-point topology ...... 752 21-9 Linear topology between three locations ...... 753 21-10 Ring topology using two DWDM and two OADM...... 754 21-11 Ring topology with three DWDM ...... 755 21-12 DWDM module showing east and west ...... 755 21-13 East and west: Same wavelengths within the same band ...... 756 21-14 Light propagation through fiber ...... 764 21-15 Light propagation in single-mode fiber...... 764 21-16 Light propagation in multi-mode fiber...... 765 21-17 ESCON droop example ...... 766 21-18 ESCON compared to 1 Gbps FICON ...... 766 21-19 Async PPRC bandwidth estimator ...... 767 21-20 Sample output from Async PPRC bandwidth estimator...... 768 22-1 Consolidation of disk storage across a business park (<10Km) . . . . . 774 22-2 Output from the portbuffershow command ...... 776 22-3 Brocade WebTools shows up buffer limited ports...... 778 22-4 SAN distance extension up to 10 km with synchronous replication. . . 781 22-5 Metro Mirror up to 300 km with DWDM ...... 784 22-6 Multiple site: Ring topology DWDM solution ...... 787 22-7 Seven tiers of disaster recovery...... 789 22-8 Remote tape vaulting ...... 790 22-9 Customer environment...... 792 22-10 Disaster recovery solution ...... 793 23-1 Consolidation of disk storage across a business park (<10Km) . . . . . 800 23-2 SAN distance extension up to 10 km with synchronous replication. . . 803 23-3 Metro Mirror up to 300 km with DWDM ...... 806 23-4 Multiple site: Ring topology DWDM solution ...... 809 23-5 Seven tiers of disaster recovery...... 810 23-6 Remote tape vaulting ...... 811 23-7 Customer environment...... 814 23-8 Disaster recovery solution ...... 815
24-1 Cisco 2062-CW1 CWDM ...... 821 24-2 Cisco ONS 15530 distance applications ...... 822 24-3 Cisco ONS 15540 and 15530 ...... 822 24-4 Consolidation of disk storage across a business park (<10Km) . . . . . 824 24-5 SAN distance extension up to 10 km with synchronous replication. . . 828 24-6 Metro Mirror up to 300 km with DWDM ...... 831 24-7 Multiple site: Ring topology DWDM solution ...... 834 24-8 Seven tiers of disaster recovery...... 836 24-9 Remote tape vaulting ...... 837 24-10 The existing SAN environment at Power Transmission Company ZYX 840 24-11 Separation of development/test from production; DR site established. 842 24-12 Async PPRC bandwidth estimator ...... 843 24-13 Output from Async PPRC bandwidth estimator ...... 844 24-14 Utilization statistics from IBM Disk Magic for the DR DS6800 at 5,000 IOPs 845 24-15 Global Mirroring has been established using FCIP tunneling and IVR 846 25-1 Connecting high I/O servers to the core switches ...... 851
Notices
This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A. The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. 
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. 
You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.
Trademarks
The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
Eserver® ESCON® Redbooks™ Eserver® ETE™ RS/6000® Redbooks (logo) ™ FlashCopy® S/360™ ibm.com® FICON® S/370™ iSeries™ HACMP™ S/390® pSeries® Illustra™ Storage Tank™ xSeries® Informix® System/36™ z/Architecture™ IBM TotalStorage Proven™ System/360™ z/OS® IBM® System/370™ zSeries® Lotus Notes® System/38™ z9™ Lotus® System/390® AFS® Magstar® SANergy® AIX® Netfinity® Tivoli Enterprise™ AS/400® NetView® Tivoli® BladeCenter® Notes® TotalStorage Proven™ CICS® NUMA-Q® TotalStorage® DB2® OS/390® Tracer™ Enterprise Storage Server® Parallel Sysplex® Wave® Enterprise Systems PowerPC® WebSphere® Architecture/390® POWER™ Everyplace® PR/SM™
The following terms are trademarks of other companies: Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.
Preface
In this IBM® Redbook, we visit some of the core components and technologies that underpin a storage area network (SAN). We cover some of the latest additions to the IBM SAN portfolio, discuss general SAN design considerations, and build these considerations into a selection of real-world case studies.
We have also consolidated material from other SAN redbooks to create a complete overview of the depth and breadth of the IBM TotalStorage® SAN portfolio.
We realize that there are many ways to design a SAN and put all the components together. In our examples, we have incorporated the major considerations that you need to think about, but still left room to maneuver on the SAN field of play.
This redbook focuses on the SAN products that are generally considered to form the backbone of the SAN fabric today: switches and directors. The development of this backbone has prompted distinct approaches to SAN fabric design. Because each vendor's implementation of technology leaves its own characteristic footprint in its switches and directors, we have an opportunity to answer the same design challenges in different ways.
We will show examples where strength can be built into our SAN using the network and the features of the components themselves. Our aim is to show that you can cut your SAN fabric according to your cloth.
The team that wrote this redbook
This redbook was produced by a team of specialists from around the world working at the International Technical Support Organization, San Jose Center.
Jon Tate is a Project Manager for IBM TotalStorage SAN Solutions at the International Technical Support Organization, San Jose Center. Before joining the ITSO in 1999, he worked in the IBM Technical Support Center, providing Level 2 support for IBM storage products. Jon has 19 years of experience in storage software and management, services, and support, and is both an IBM Certified IT Specialist and an IBM SAN Certified Specialist. He is also a Member of the British Computer Society, Chartered IT Professional (MBCS CITP).
Jim Kelly is a storage Field Technical Sales Support specialist for the Systems and Technology Group in IBM New Zealand, and an SNIA Certified Professional (SCP). Prior to joining IBM in 1999, he spent 13 years at Data General, including a brief period with EMC. His early career was spent working in an IBM VSE mainframe environment.
Pauli Rämö is an Advisory IT Specialist in IBM Global Services, Finland. He has 13 years of experience with RS/6000®, IBM eServer pSeries®, AIX®, HACMP™, and Linux®. His areas of expertise also include open systems storage solutions and SAP R/3 Basis. He has contributed to two SAN-related Redbooks™ in the past.
Leos Stehlik is an IT Architect for Storage Solutions at IBM ITS in the Czech Republic. He has eight years of experience in the fields of SAN, storage hardware and software, Tivoli® Storage Management and UNIX®. He has written four IBM Redbooks, and developed IBM classes in many areas of storage and storage management. His previous publications include the IBM Redbook Using Tivoli Storage Manager in a SAN Environment, SG24-6132-00 and Introducing the SAN File System, SG24-7057-01.
L-R: Jon, Pauli, Leos, and Jim
Thanks to the following people for their contributions to this project:
Tom Cady Deanna Polm Sangam Racherla International Technical Support Organization, San Jose Center
Stephen Garraway Ronda Hruby Alexander Ignacio Russell Nunag Glen Routley Madhav Vaze Bruce Wilson The previous authors of this redbook
Lisa Dorr IBM Systems and Technology Group
Jim Banask Cal Blombaum William Champion Scott Drummond Parker Grannis Pam Lukes Michael Starling Jeremy Stroup Ernie Williamson Michelle Wright IBM Storage Systems Group
Anthony Vandewerdt IBM Global Services
Jim Baldyga Brian Steffler Brocade Communications Systems
Reena Choudhry Mark Allen Kamal Bakshi Seth Mason Cuong Tran Cisco Systems
Tom Hammond-Doel Greg Singhaus Lovest Watson Emulex Corporation
Brent Anderson McDATA Corporation
Tom and Jenny Chang Garden Inn Hotel, Los Gatos, California
Become a published author
Join us for a two- to six-week residency program! Help write an IBM Redbook dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You'll team with IBM technical professionals, Business Partners and/or customers.
Your efforts will help increase product acceptance and customer satisfaction. As a bonus, you'll develop a network of contacts in IBM development labs, and increase your productivity and marketability.
Find out more about the residency program, browse the residency index, and apply online at: ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our Redbooks to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways:
Use the online Contact us review redbook form found at: ibm.com/redbooks
Send your comments in an Internet note to: [email protected]
Mail your comments to: IBM Corporation, International Technical Support Organization, Dept. QXXE, Building 80-E2, 650 Harry Road, San Jose, California 95120-6099
Chapter 1. Introduction
Until recently, disaster planning for businesses focused on recovering centralized data centers following a catastrophe, either natural or man-made. While these measures remain important to disaster planning, the protection they provide is far from adequate for today's distributed computing environments.
The goal for companies today is to achieve a state of business continuity, where critical systems and networks are always available. To attain and sustain business continuity, companies must engineer availability, security, and reliability into every process from the outset.
In this chapter, we consider the many benefits a SAN has to offer in these areas.
1.1 Beyond disaster recovery
When disaster recovery emerged as a formal discipline and a commercial business in the 1980s, the focus was on protecting the data center, the heart of a company’s heavily centralized IT structure. This model began to shift in the early 1990s to distributed computing and client/server technology.
At the same time, information technology became embedded in the fabric of virtually every aspect of a business. Computing was no longer something done in the background. Instead, critical business data could be found across the enterprise, on desktop PCs and departmental local area networks, as well as in the data center.
This evolution continues today. Key business initiatives such as enterprise resource planning (ERP), supply chain management, customer relationship management, and e-business have all made continuous, ubiquitous access to information crucial to an organization. This means business can no longer function without information technology in the following areas:
- Data
- Software
- Hardware
- Networks
- Call centers
- Laptop computers
A company that sells products on the Web, for example, or supports customers with an around-the-clock call center, must be operational 24 hours a day, seven days a week, or customers will go elsewhere. An enterprise that uses e-business to acquire and distribute parts and products is not only dependent on its own technology but that of its suppliers. As a result, protecting critical business processes, with all their complex interdependencies, has become as important as safeguarding data itself.
The goal for companies with no business tolerance for downtime is to achieve a state of business continuity, where critical systems and networks are continuously available, no matter what happens. This means thinking proactively: engineering availability, security, and reliability into business processes from the outset, not retrofitting a disaster recovery plan to accommodate ongoing business continuity requirements.
1.1.1 Whose responsibility is it?
Many senior executives and business managers consider business continuity the responsibility of the IT department. However, it is no longer sufficient or practical to vest the responsibility exclusively in one group. Web-based and distributed computing have made business processes too complex and decentralized. More than that, a company’s reputation, customer base and, of course, revenue and profits are at stake. All executives, managers, and employees must therefore participate in the development, implementation, and ongoing support of continuity assessment and planning.
The same information technology driving new sources of competitive advantage has also created new expectations and vulnerabilities. On the Web, companies have the potential to deliver immediate satisfaction or dissatisfaction to millions of people. Within ERP and supply chain environments, organizations can reap the rewards of improved efficiencies, or feel the impact of a disruption anywhere within their integrated processes.
With serious business interruption now measured in minutes rather than hours, even success can bring about a business disaster. Web companies today worry more about their ability to handle unexpected peaks in customer traffic than about fires or floods, and for good reason. For example, an infrastructure that cannot accommodate a sudden 200 percent increase in Web site traffic generated by a successful advertising campaign can result in missed opportunities, reduced revenues, and a tarnished brand image. Because electronic transactions and communications take place so quickly, the amount of work and business lost in an hour far exceeds the toll of previous decades. According to reports, the financial impact of a major system outage can be enormous:
- US$6.5 million per hour for a brokerage operation
- US$2.6 million per hour for a credit-card sales authorization system
- US$14,500 per hour in automated teller machine (ATM) fees if an ATM system is offline
Even what was once considered a minor problem, a faulty hard drive or a software glitch, can cause the same level of loss as a power outage or a flooded data center, if a critical business process is affected. For example, it has been calculated that the average financial loss per hour of disk array downtime stands at:
- US$29,301 in the securities industry
- US$26,761 for manufacturing
- US$17,093 for banking
- US$9,435 for transportation
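To put these hourly rates in perspective, the exposure for a given incident is simply rate multiplied by duration. The sketch below is a minimal illustration using the per-hour figures quoted above; the four-hour outage duration is an invented example value, not a figure from this redbook.

```python
# Illustrative downtime-cost calculation. The hourly loss figures
# come from the text above; the outage duration is an assumed
# example value, not a figure from this book.

HOURLY_LOSS = {            # USD lost per hour of disk array downtime
    "securities": 29_301,
    "manufacturing": 26_761,
    "banking": 17_093,
    "transportation": 9_435,
}

def outage_cost(industry: str, hours: float) -> float:
    """Estimated financial loss for an outage of the given length."""
    return HOURLY_LOSS[industry] * hours

# A hypothetical four-hour disk array outage in the securities industry:
print(outage_cost("securities", 4))  # 117204
```

Even this crude arithmetic shows why the business case for continuity spending is usually framed per hour of downtime avoided.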
More difficult to calculate are the intangible damages a company can suffer: lower morale and productivity, increased employee stress, delays in key project time lines, diverted resources, regulatory scrutiny, and a tainted public image. In this climate, executives responsible for company performance now find their personal reputations at risk. Routinely, companies that suffer online business disruptions for any reason make headlines the next day, with individuals singled out by the press. Moreover, corporate directors and officers can be liable for the consequences of business interruption or loss of business-critical information. Most large companies stipulate in their contracts that suppliers must deliver services or products under any circumstances. What’s more, adequate protection of data may be required by law, particularly for a public company, financial institution, utility, health care organization, or government agency.
Together, these factors make business continuity the shared responsibility of an organization’s entire senior management, from the CEO to line-of-business executives in charge of crucial business processes. Although IT remains central to the business continuity formula, IT management alone cannot determine which processes are critical to the business and how much the company should pay to protect those resources.
1.1.2 The Internet brings increased risks
A recent IBM survey of 226 business recovery corporate managers revealed that only eight percent of Internet businesses are prepared for a computer system disaster. Yet doing business online means exposing many business-critical applications to a host of new risks. While the Internet creates tremendous opportunity for competitive advantage, it can also give partners, suppliers, customers, employees, and hackers increased access to corporate IT infrastructures. Unintentional or malicious acts can result in a major IT disruption. Moreover, operating a Web site generates organizational and system-related interdependencies that fall outside of a company’s control, from Internet Service Providers (ISPs) and telecommunications carriers to the hundreds of millions of public network users.
Therefore, the greatest risk to a company’s IT operations may no longer be a hurricane, a 100-year flood, a power outage, or even a burst pipe. Planning for continuity in an e-business environment must address vulnerability to network attacks, hacker intrusions, viruses, and spam, as well as ISP and telecommunication line failures.
1.1.3 Planning for business continuity
Few organizations have the need or the resources to assure business continuity equally for every functional area. Therefore, any company that has implemented a single business continuity strategy for the entire organization is likely under-prepared, or spending money unnecessarily. The key to business continuity lies in understanding your business, determining which processes are critical to staying in that business, and identifying all the elements crucial to those processes. Specialized skills and knowledge, physical facilities, training, and employee satisfaction, as well as information technology, should all be considered. By thoroughly analyzing these elements, you can accurately identify potential risks and make informed business decisions about accepting, mitigating or transferring those risks.
Once you have developed a program for assuring that critical processes will be available around the clock, you should assume that it can still fail, and commit to keeping your program current with business and technology infrastructure changes. A fail-safe strategy assumes that no business continuity program can provide absolute protection from every type of damage, no matter how comprehensive your high-availability, redundancy, fault tolerance, clustering, and mirroring strategies may be.
Today, the disasters most likely to bring your business to a halt are the result of human error or malice: the employee who accidentally deletes a crucial block of data; the disgruntled ex-employee seeking revenge by introducing a debilitating virus; the thief who steals vital trade secrets from your mainframe; or the hacker who invades your network. According to a joint study by the U.S. Federal Bureau of Investigation and the Computer Security Institute, the number and severity of successful corporate hacks is increasing dramatically, particularly intrusions by company insiders. In one study, 250 Fortune 1000 companies reported losses totaling US $137 million in 1997, an increase of 37 percent over the previous year.
Making an executive commitment to regularly testing, validating, and refreshing your business continuity program can protect your company against perhaps the greatest risk of all, complacency. In the current environment of rapid business and technology change, even the smallest alteration to a critical application or system within your enterprise or supply chain can cause an unanticipated failure, impacting your business continuity. Effective business protection planning addresses not only what you need today, but what you will need tomorrow and into the future.
1.2 Using a SAN for business continuance
Although some of the concepts that we detail apply purely to the SAN environment, there are general considerations that need to be taken into account in any environment. Any company that is serious about business continuance will have considered and applied processes or procedures to take into account any of the eventualities that might occur, such as those listed in Figure 1-1.
Figure 1-1 Business outage causes (a table of several dozen potential causes, ranging from natural events such as fire, flood, hurricane, earthquake, tornado, and ice storm, through infrastructure failures such as power outage, HVAC failure, network failure, UPS failure, and burst pipes, to human factors such as hacker intrusion, sabotage, terrorism, fraud, human error, virus, and strike action)
Some of these problems are not necessarily common to all regions throughout the world, but they should be considered nonetheless, even if only to dismiss the eventuality that they might happen. Careful consideration will result in a deeper understanding of what is likely to cause a business outage, rather than adopting an “it will not happen to me” attitude. After all, the Titanic was once thought unsinkable.
1.2.1 SANs and business continuance
So why would the risk increase if you were to implement a SAN in your environment? The short answer is that it might not increase the risk. It might expose you to more risk over a greater area; for example, the SCSI 25 m restriction means that a small bomb planted in the correct position would do quite nicely. If you are using a SAN for distance solutions, then it might be necessary to increase the size of the bomb, or plant many more of them, to cause the same effect.
What a SAN means is that you now are beginning to explore the potential for ensuring that your business can actually continue in the wake of a disaster. It may be able to do this by:
- Providing for greater operational distances
- Providing mirrored storage solutions for local disasters
- Providing failover support for local disasters
- Providing remote vaulting anywhere in the world
- Providing high availability file serving functionality
- Providing the ability to avoid space outage situations for higher availability
If we take the simple example of distance, what a SAN will allow you to do is to break the SCSI distance barrier. Does this in itself make you any safer? Of course, it does not. Does it give you an opportunity to minimize the risk to your business? Of course, it does.
It is up to you to decide whether to use that to your advantage, or to ignore it and the other benefits that it can bring to your business. One thing is certain, though: if you do not exploit the SAN’s potential to its fullest, other people might. Those other people might be your competitors. Does that worry you? If it does not, then you can stop reading right now, because this redbook is not for you! We are targeting those people who are concerned with unleashing the potential of their SAN, or who are interested in seeing what a SAN can do.
But that is not all we will do. We will provide you with as much information as we can that will cover the data center environment from floor to ceiling and the considerations that you should take to ensure minimal exposure to any outage.
As availability is linked to business continuance and recovery, we will also cover methods that can be employed to ensure that the data in your SAN is available to those that are authorized to access it, and protected from those that are not.
1.3 SAN business benefits
Today’s business environment creates many challenges for the enterprise IT planner. This is true of more than just business continuance, so perhaps now is a good time to look at whether deploying a SAN will solve more than just one problem. It can be an opportunity to look at where you are today and where you want to be in three years. Is it better to plan for migration to a SAN from the start, or to try to implement one later, after other solutions have been considered and, possibly, implemented? Are you sure that the equipment that you install today will still be usable three years later? Is there any use that you can make of it outside of business continuance? A journey of a thousand miles begins with one step.
In the topics that follow, we will remind you of some of the business benefits that SANs can provide. We have identified some of the operational problems that a business faces today, and which could potentially be solved by a SAN implementation.
1.3.1 Storage consolidation and sharing of resources
By enabling storage capacity to be connected to servers at a greater distance, and by disconnecting storage resource management from individual hosts, a SAN enables disk storage capacity to be consolidated. The results can be lower overall costs through better use of the storage equipment, lower management costs, increased flexibility, and increased control.
This can be achieved physically or logically, as we explain in the following sections.
Physical consolidation
Data from disparate storage subsystems can be combined onto large, enterprise class shared disk arrays, which may be located at some distance from the servers. The capacity of these disk arrays can be shared by multiple servers, and users may also benefit from the advanced functions typically offered with such subsystems. These may include RAID capabilities, remote mirroring, and instantaneous data replication functions, which might not be available with smaller, integrated disks. The array capacity may be partitioned, so that each server has an appropriate portion of the available gigabytes.
Physical consolidation of storage is shown in Figure 1-2.
Figure 1-2 Storage consolidation (Servers A, B, and C each connected to a shared disk array, with the array partitioned into per-server capacity plus unused capacity available for reallocation)
Available capacity can be allocated dynamically to any server requiring additional space. Capacity not required by a server application can be reallocated to other servers. This avoids the inefficiency of free disk capacity on one server not being usable by other servers. Extra capacity may be added, in a nondisruptive manner.
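The dynamic allocation behavior just described can be sketched as a simple capacity pool. This is a toy model with invented names (`StoragePool`, `allocate`, `release`), not an actual IBM management interface:

```python
class StoragePool:
    """Toy model of a shared disk array whose capacity is partitioned
    dynamically among servers. Names are illustrative only."""

    def __init__(self, total_gb: int):
        self.total_gb = total_gb
        self.allocations = {}  # server name -> GB assigned

    @property
    def free_gb(self) -> int:
        return self.total_gb - sum(self.allocations.values())

    def allocate(self, server: str, gb: int) -> None:
        if gb > self.free_gb:
            raise ValueError("insufficient free capacity in the pool")
        self.allocations[server] = self.allocations.get(server, 0) + gb

    def release(self, server: str, gb: int) -> None:
        current = self.allocations.get(server, 0)
        self.allocations[server] = max(0, current - gb)

pool = StoragePool(total_gb=1000)
pool.allocate("ServerA", 400)
pool.allocate("ServerB", 300)
pool.release("ServerA", 100)   # capacity returns to the pool...
pool.allocate("ServerC", 350)  # ...and can be reassigned elsewhere
print(pool.free_gb)  # 50
```

The point of the sketch is the contrast with server-attached disk: freed gigabytes go back to a common pool rather than stranding on one host.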
Logical consolidation
It is possible to achieve shared resource benefits from the SAN without moving existing equipment. A SAN relationship can be established between a client and a group of storage devices that are not physically collocated, excluding devices that are internally attached to servers. A logical view of the combined disk resources may allow available capacity to be allocated and reallocated between different applications running on distributed servers, to achieve better utilization. Consolidation is covered in greater depth in the redbook IBM Storage Solutions for Server Consolidation, SG24-5355.
In Figure 1-3 we show a logical consolidation of storage.
Figure 1-3 Logical storage consolidation (heterogeneous NT, AIX, Solaris, and Linux clients using NFS, CIFS, FTP, and HTTP over the existing IP network; meta-data servers with per-client IFS caches share a persistent store; a Fibre Channel SAN fabric connects the shared storage server cluster to disk and tape devices, providing load balancing, failover processing, scalability, and device-to-device data movement for backup, archive, and inactive data)
1.3.2 Data sharing
The term data sharing is used somewhat loosely by users and some vendors. It is sometimes interpreted to mean the replication of files or databases to enable two or more users, or applications, to concurrently use separate copies of the data. The applications can operate on different host platforms. A SAN can ease the creation of such duplicated copies of data using facilities such as remote mirroring.
Data sharing can also be used to describe multiple users accessing a single copy of a file. This could be called true data sharing. In a homogeneous server environment, with appropriate application software controls, multiple servers may access a single copy of data stored on a consolidated storage subsystem.
If attached servers are heterogeneous platforms, for example, a mix of UNIX and Microsoft® Windows® NT, sharing of data between such disparate operating system environments is complex. This is due to differences in file systems, data formats, and encoding structures. IBM, however, uniquely offers a true data-sharing capability, with concurrent update, for selected heterogeneous server environments, using the Tivoli SANergy® File Sharing solution.
The SAN advantage in enabling enhanced data sharing can reduce the need to hold multiple copies of the same file or database, reducing duplication of hardware costs to store copies. It also enhances the ability to implement cross-enterprise applications, such as e-business, which is inhibited when multiple data copies are stored.
1.3.3 Nondisruptive scalability for growth
There is an explosion in the quantity of data stored by the majority of organizations. This is fueled by the implementation of applications, such as e-business, e-mail, business intelligence, data warehouse, and enterprise resource planning. Some industry analysts, such as IDC and Gartner Group, estimate that electronically stored data is doubling every year. In the case of e-business applications, opening the business to the Internet, there have been reports of data growing by more than 10 times annually. This is a nightmare for planners, because it is increasingly difficult to predict storage requirements.
A finite amount of disk storage can be connected physically to an individual server due to adapter, cabling, and distance limitations. With a SAN, new capacity can be added as required, without disrupting ongoing operations. SANs enable disk storage to be scaled independently of servers.
1.3.4 Improved backup and recovery
With data doubling every year, what effect does this have on the backup window? Backup to tape, and recovery, are operations that are problematic in the parallel SCSI or LAN-based environments. For disk subsystems attached to specific servers, two options exist for tape backup: either it must be done onto a server-attached tape subsystem, or by moving data across the LAN.
Tape pooling
Providing tape drives to each server is costly. It also involves the added administrative overhead of scheduling the tasks and managing the tape media. SANs allow for greater connectivity of tape drives and tape libraries, especially at greater distances. Tape pooling is the ability for more than one server to logically share tape drives within an automated library. This can be achieved by software management, using tools such as Tivoli Storage Manager, or with tape libraries with outboard management, such as the IBM 3494.
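Conceptually, tape pooling reduces to arbitration of shared drives: a server claims a free drive from the library and returns it when its job completes. The following toy sketch uses invented names and omits the scheduling and media management that real tools such as Tivoli Storage Manager provide:

```python
class TapeDrivePool:
    """Toy arbitration of shared tape drives among servers.
    Not a real library-management API."""

    def __init__(self, drives):
        self.free = set(drives)
        self.in_use = {}   # drive -> server currently holding it

    def claim(self, server):
        if not self.free:
            return None          # caller must wait or queue the job
        drive = self.free.pop()
        self.in_use[drive] = server
        return drive

    def release(self, drive):
        self.in_use.pop(drive, None)
        self.free.add(drive)

pool = TapeDrivePool(["drive0", "drive1"])
d = pool.claim("ServerA")
pool.claim("ServerB")
print(pool.claim("ServerC"))  # None -- both drives busy
pool.release(d)               # ServerA finishes; the drive is shared again
```

The economic argument in the text follows directly: two pooled drives can serve many servers, instead of one dedicated drive per server.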
LAN-free and server-free data movement
Backup using the LAN moves the administration to centralized tape drives or automated tape libraries. However, at the same time, the LAN experiences very high traffic volume during the backup or recovery operations, and this can be extremely disruptive to normal application access to the network. Although backups can be scheduled during non-peak periods, this might not allow sufficient time. Also, it might not be practical in an enterprise operating in multiple time zones.
We illustrate loading the IP network in Figure 1-4.
Figure 1-4 Loading the IP network (LAN backup/restore today: control information and client data both move across the existing IP network between the Storage Manager client and the Storage Manager server, each attached to its own disk and tape)
A SAN provides the solution by enabling the elimination of backup and recovery data movement across the LAN. Fibre Channel’s high bandwidth and multi-path switched fabric capabilities enable multiple servers to stream backup data concurrently to high speed tape drives. This frees the LAN for other application traffic. The IBM Tivoli software solution for LAN-free backup offers the capability for clients to move data directly to tape using the SAN. A future enhancement to be provided by IBM Tivoli will allow data to be read directly from disk to tape (and tape to disk), bypassing the server. This solution is known as server-free backup.
1.3.5 High performance
Applications benefit from the more efficient transport mechanism of Fibre Channel. Currently, Fibre Channel transfers data at 200 MBps, several times faster than typical SCSI capabilities, and many times faster than standard LAN data transfers. Future implementations of Fibre Channel at 400 and 800 MBps have been defined, offering the promise of even greater performance benefits in the future. Indeed, prototypes of storage components which meet the 2-Gigabit transport specification are already in existence.
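As a rough feel for what these rates mean in practice, transfer time scales inversely with bandwidth. In the sketch below, the 200, 400, and 800 MBps figures come from the text; the 40 MBps parallel SCSI comparison rate and the 500 GB dataset size are assumed illustration values, and protocol overhead and device limits are ignored:

```python
def transfer_seconds(size_gb: float, rate_mbps: float) -> float:
    """Idealized time to move size_gb gigabytes at rate_mbps
    megabytes per second (1 GB taken as 1000 MB; no overhead)."""
    return size_gb * 1000 / rate_mbps

# Moving a hypothetical 500 GB backup set:
for label, rate in [("40 MBps parallel SCSI (assumed)", 40),
                    ("200 MBps Fibre Channel", 200),
                    ("400 MBps Fibre Channel", 400),
                    ("800 MBps Fibre Channel", 800)]:
    print(f"{label}: {transfer_seconds(500, rate) / 60:.0f} minutes")
```

Under these idealized assumptions, the same backup that ties up a parallel SCSI bus for several hours completes in well under an hour at Fibre Channel rates, which is exactly the backup-window relief discussed in 1.3.4.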
The elimination of conflicts on LANs, by removing storage data transfers from the LAN to the SAN, might also significantly improve application performance on servers.
1.3.6 High availability server clustering
Reliable and continuous access to information is an essential prerequisite in any business. As applications have shifted from robust mainframes to the less reliable client/file server environment, so have server and software vendors developed high availability solutions to address the exposure. These are based on clusters of servers. A cluster is a group of independent computers managed as a single system for higher availability, easier manageability, and greater scalability. Server system components are interconnected using specialized cluster interconnects, or open clustering technologies, such as Fibre Channel - Virtual Interface mapping.
Complex software is required to manage the failover of any component of the hardware, the network, or the application. SCSI cabling tends to limit clusters to no more than two servers. A Fibre Channel SAN allows clusters to scale to 4, 8, 16, and even to 100 or more servers, as required, to provide very large shared data configurations, including redundant pathing, RAID protection, and so forth. Storage can be shared and easily switched from one server to another. Just as storage capacity can be scaled nondisruptively in a SAN, so can the number of servers in a cluster be increased or decreased dynamically, without impacting the storage environment.
1.3.7 Improved disaster tolerance
Advanced disk arrays, such as the IBM Enterprise Storage Server® (ESS), provide sophisticated functions, like Peer-to-Peer Remote Copy services, to address the need for secure and rapid recovery of data in the event of a disaster. Failures can be due to natural occurrences, such as fire, flood, or earthquake; or to human error. A SAN implementation allows multiple open servers to benefit from this type of disaster protection, and the servers can even be located some distance, up to 10 km, from the disk array which holds the primary copy of the data. The secondary site, holding the mirror image of the data, can be located up to a further 100 km from the primary site.
IBM has also announced Peer-to-Peer Copy capability for its Virtual Tape Server (VTS). With VTS, users maintain local and remote copies of virtual tape volumes, improving data availability by eliminating all single points of failure.
1.3.8 Allow selection of best of breed storage
Internal storage, purchased as a feature of the associated server, is often relatively costly. A SAN implementation enables storage purchase decisions to be made independently of the server. Buyers are free to choose the best of breed solution to meet their performance, function, and cost needs. Large capacity external disk arrays may provide an extensive selection of advanced functions. For instance, the ESS includes cross platform functions, such as high performance RAID 5, Peer-to-Peer Remote Copy, and FlashCopy, and functions specific to S/390®, such as Parallel Access Volumes (PAV), Multiple Allegiance, and I/O Priority Queuing. This makes it an ideal SAN attached solution to consolidate enterprise data.
Client/server backup solutions often include attachment of low capacity tape drives, or small automated tape subsystems, to individual PCs and departmental servers. This introduces a significant administrative overhead as users, or departmental storage administrators, often have to control the backup and recovery processes manually. A SAN allows the alternative strategy of sharing fewer, highly reliable, powerful tape solutions, such as the IBM Magstar® family of drives and automated libraries, between multiple users and departments.
1.3.9 Ease of data migration Data can be moved nondisruptively from one storage subsystem to another using a SAN, without server intervention. This may greatly ease the migration of data associated with the introduction of new technology, and the retirement of old devices.
1.3.10 Reduced total costs of ownership
Expenditure on storage today is estimated to be in the region of 50% of a typical IT hardware budget. This is expected to increase as financial regulations, government legislation, offshoring, globalization, and so on, all increase the need to store more information, and for longer. IT managers are becoming increasingly focused on controlling these growing costs.
Consistent, centralized management
As we have shown, consolidation of storage can reduce wasteful fragmentation of storage attached to multiple servers. It also enables a single, consistent data and storage resource management solution to be implemented, such as IBM StorWatch tools, combined with software such as Tivoli Storage Network Manager, Tivoli Storage Manager, and Tivoli SAN Manager, which can reduce the software and human resource costs of storage management.
Reduced hardware costs
By moving data to SAN-attached storage subsystems, the servers themselves might no longer need to be configured with native storage. In addition, the introduction of LAN-free and server-free data transfers largely eliminates the use of server cycles to manage housekeeping tasks, such as backup and recovery, and archive and recall. The configuration of what might be termed thin servers might therefore be possible, and this could result in significant hardware cost savings to offset against the costs of installing the SAN fabric.
1.3.11 Storage resources match e-business enterprise needs
By eliminating islands of information, typical of the client/server model of computing, and introducing an integrated storage infrastructure, SAN solutions match the strategic needs of today’s e-business.
We show this in Figure 1-5 on page 16.
Figure 1-5 SAN total storage solutions (dynamic storage resource management and automatic data management for storage within a SAN, providing universal data access, scalability and flexibility, and 24x7 connectivity for server and storage platforms including UNIX variants (AIX, HP, Sun, DEC, SGI), Intel NT/2000/NetWare/Linux, z/OS, and OS/400)
A well-designed, well-thought-out SAN can bring many benefits, and not only those related to business continuance. Using the storage network will be key to the storage and successful retrieval of data in the future, and the days of server-centric storage are rapidly becoming a distant memory.
Chapter 2. SAN fabric components
In this chapter we describe the Fibre Channel products that are used in an IBM Enterprise SAN implementation. This does not mean that you cannot implement other SAN compatible products, including those from other vendors, but the interoperability agreement must be clearly documented and agreed upon.
Fibre Channel is an open standard communications and transport protocol, as defined by ANSI (Committee X3T11), and operates over copper and fiber optic cabling at distances of up to 10 kilometers (media dependent). IBM’s implementation is mainly in fiber optic cabling (copper SFPs are supported on the IBM TotalStorage SAN16M-R multiprotocol SAN router), which we refer to as Fibre Channel cabling, or FC cabling, in this redbook.
We start by covering hardware that can be used to build a networked storage solution. Because the whole purpose of a SAN is to interconnect servers and storage, there are also the components and their subcomponents that make up the SAN itself. We use the abbreviation SAN to describe the complete storage area network including fabric and disk systems, and the term fabric to describe the Fibre Channel switching and networking environment.
Fibre or Fiber?: Fibre Channel was originally designed to support fiber optic cabling only. When copper support was added, the committee decided to keep the name in principle, but to use the UK English spelling (Fibre) when referring to the standard. We retain the US English spelling when referring generically to fiber optics and cabling.
2.1 Fibre Channel technology sub-components
This chapter focuses on the more visible components on the SAN fabric, but it is worth noting that the IBM Microelectronics division plays a major role behind the scenes in the development and manufacture of less visible FC fabric sub-components.
Communication over Fibre Channel, whether optical or copper, is serial. Computer busses on the other hand are parallel. This means that Fibre Channel devices need to be able to convert between the two. For this they use a serializer/deserializer, commonly referred to as a SerDes. IBM is a major manufacturer and supplier of SerDes ASICs.
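As a rough illustration of the serializer/deserializer concept, the following Python sketch (purely illustrative; the function names are ours, and real SerDes hardware also performs encoding such as 8b/10b and clock recovery) converts parallel bytes into a serial bit stream and back:

```python
def serialize(data: bytes) -> list:
    """Transmit path: flatten parallel bytes into a serial bit stream, MSB first."""
    bits = []
    for byte in data:
        for i in range(7, -1, -1):
            bits.append((byte >> i) & 1)
    return bits

def deserialize(bits: list) -> bytes:
    """Receive path: reassemble the serial bit stream into parallel bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)
```

A real SerDes does this conversion at gigabit line rates in silicon; the sketch only shows the parallel-to-serial relationship.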
Designers of FC switches and directors need to be able to deliver features and performance within tight time-to-market and cost constraints. Designers often therefore use application-specific integrated circuits (ASICs) sourced from third parties. IBM Microelectronics designs and manufactures processors and ASICs for Brocade, McDATA and Cisco. For example, the Cisco MDS Storage Services Module (SSM) uses eight PowerPC® processors for SCSI data-path processing, and the control processor on the Brocade Silkworm 4800 is also PowerPC.
For more information about IBM Microelectronics, refer to: http://www.ibm.com/chips
2.2 Fibre Channel interconnects
In Fibre Channel technology, frames are moved from source to destination using the Fibre Channel Protocol (FCP) which, in most cases, is transmitted over fiber optic cable. Both sides of the conversation need to be able to convert the light signals into electrical signals. The fiber optic interfaces can be provided by building interfaces directly into the device (as is typically done with HBAs) or by using separate fiber optic interface modules that plug into the device.
Although FCP can be transmitted over copper cables, fiber optic implementations are much more common due to the longer distances they can achieve.
The interfaces that can be used to convert light signals to electrical signals are: Small Form Factor Pluggable Module (SFP) Gigabit Interface Converters (GBIC) Gigabit Link Modules (GLM) Media Interface Adapters (MIA) 1x9 transceivers
We provide a brief description of the types of cables and connectors, and their functions, in the following sections.
2.2.1 Fibre Channel transmission rates
The current set of vendor offerings for switches, host bus adapters, and storage devices offers rates of 1, 2, and 4 Gbps, with some, such as the IBM TotalStorage SAN256M (McDATA i10K) director, also offering 10-Gbps ISLs. Typically, both 4-Gbps and 2-Gbps hardware can autonegotiate down to slower speeds, while 10-Gbps hardware cannot.
It is not yet clear, however, how quickly the industry will transition to 10 Gbps for uses other than ISLs. Another technology likely to appear in the future is 8 Gbps, which has some backward compatibility with 4 Gbps, so it might prove more popular.
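Speed autonegotiation amounts to picking the highest rate both ends of a link support. A minimal sketch, assuming each port advertises its supported rates as a set of Gbps values (the helper name `negotiate_speed` is ours, not a standard API):

```python
def negotiate_speed(port_a, port_b):
    """Return the highest transmission rate (Gbps) common to both ports,
    or None if the ports share no rate (for example, a fixed 10-Gbps
    ISL port facing 1/2/4-Gbps hardware)."""
    common = set(port_a) & set(port_b)
    return max(common) if common else None
```

A 4-Gbps HBA plugged into a 1/2-Gbps switch port negotiates down to 2 Gbps; a 10-Gbps-only port facing the same HBA yields no usable link.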
2.2.2 Small Form Factor Pluggable Module
The most common Fibre Channel interconnect component in use today is the small form factor pluggable (SFP) module, shown in Figure 2-1. This component is hot-pluggable on the I/O module or on some HBAs, and the cable is also hot-pluggable. Different SFPs are used for longwave (transmitted over 9-micron single-mode cable) or for shortwave (transmitted over 50-micron or 62.5-micron multi-mode cable) communications, so remember to use longwave SFPs if you need to connect over longer distances, such as more than 150 m at 4 Gbps.
Figure 2-1 SFP Hot Pluggable optical transceiver
Another version of the transceiver is the Small Form Fixed optical transceiver, which is mounted on the I/O module or the HBA through pin-through-hole technology, as shown in Figure 2-2 on page 20. These transceivers, which are designed for increased density, performance, and reduced power, are well-suited for Gigabit Ethernet, Fibre Channel, and 1394b (FireWire) applications.
Figure 2-2 Small Form Fixed pin-through-hole Transceiver
The small dimensions of SFP optical transceivers are ideal in switches and other products where many transceivers have to be configured in a small space. SFPs designed for use with 4-Gbps transmission rates can also be used on 2-Gbps and 1-Gbps connections. The quality of SFPs may vary and it is always best to order SFPs with the device into which you are planning to plug them.
SFPs are integrated fiber optic transceivers providing a high-speed, serial, electrical interface for connecting processors, switches, and peripherals through a fiber optic cable. In the Gigabit Ethernet environment, these transceivers can be used in local area network (LAN) switches or hubs, as well as in interconnecting processors. In SANs, they can be used for transmitting data between peripheral devices and processors.
Cabling for shortwave SFPs is multi-mode optical fiber (50 micron or 62.5 micron); longwave SFPs use single-mode optical fiber (9 micron). Both are terminated with industry-standard LC connectors, as illustrated in Figure 2-3 on page 21.
Figure 2-3 SFF hot-pluggable transceiver (SFP) with LC connector fiber cable
The distances that can be achieved using short wavelength and long wavelength SFPs are listed in Table 2-1.
Table 2-1 Distances using SFP-based fiber optics

9/125 µm optical fiber (longwave):
• 1.250 Gbps: 2 m - 5 km
• 2.125 Gbps: 2 m - 10 km
• 4.250 Gbps: 2 m - 10 km
• 10 Gbps: 2 m - 10 km

50/125 µm optical fiber (shortwave):
• 1.0625 Gbps: 2 - 500 m
• 1.250 Gbps: 2 - 550 m
• 2.125 Gbps: 2 - 300 m
• 4.250 Gbps: 2 - 150 m
• 10 Gbps: 2 - 82 m

62.5/125 µm optical fiber (shortwave):
• 1.0625 Gbps: 2 - 300 m
• 1.250 Gbps: 2 - 275 m
• 2.125 Gbps: 2 - 150 m
The distances shown are not necessarily the supported distances, and you will have to verify this with the switch and HBA vendor and the fiber optic installer.
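When planning cable runs, it can help to encode the nominal limits from Table 2-1 in a small checking tool. The sketch below is illustrative only: it includes just the 2.125-Gbps rows, and, as noted, vendor-supported distances may differ from these nominal figures.

```python
# Nominal maximum distances in meters, taken from the 2.125-Gbps rows
# of Table 2-1. Verify supported distances with your switch and HBA vendor.
MAX_DISTANCE_M = {
    ("9um-longwave", 2.125): 10_000,
    ("50um-shortwave", 2.125): 300,
    ("62.5um-shortwave", 2.125): 150,
}

def link_ok(fiber: str, speed_gbps: float, length_m: float) -> bool:
    """Check a planned cable run against the nominal distance limit."""
    limit = MAX_DISTANCE_M.get((fiber, speed_gbps))
    return limit is not None and length_m <= limit
```

For example, a 250 m run of 50-micron fiber is within the 2.125-Gbps limit, but the same run on 62.5-micron fiber is not.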
There are also some especially high-powered laser versions of longwave SFPs that provide extended distances of up to 35 km or 80 km. Check your individual switch or director specifications to see whether these options are supported.
LC to SC converter cables can be used when cabling SFPs to GBICs.
SFP: Not all SFPs will work with each of the different switches, so we suggest you buy specific SFPs for the switches you are implementing. Also, different SFPs are used for shortwave and longwave connections.
2.2.3 Gigabit Interface Converters
Gigabit Interface Converters (GBICs) are fiber optic transceivers providing a serial, electrical interface at gigabit speeds. GBICs are now less common in FC networks because many installations have moved from 1-Gbps FC to higher speeds.
GBICs support connection over single-mode or multi-mode fiber optic cables. The standard dual SC plug is used to connect to the fiber optic cable. The plug is shown in Figure 2-4.
Figure 2-4 Dual SC fiber-optic plug connector
The distances that can be achieved using short wavelength and long wavelength GBICs are listed in Table 2-2.
Table 2-2 Distance using 1 Gbps GBIC-based fiber optics Type of fiber SWL LWL
9/125 µm Optical Fiber n/a 10 km
50/125 µm Optical Fiber 2 - 550 m 2 - 550 m
62.5/125 µm Optical Fiber 2 - 300 m 2 - 550 m
Shortwave, or multi-mode, GBICs are usually color-coded beige with a black exposed surface; longwave, or single-mode, GBICs are usually color-coded blue with blue exposed surfaces.
A GBIC is shown in Figure 2-5.
Figure 2-5 Gigabit Interface Converter
2.2.4 Gigabit Link Modules
Gigabit Link Modules (GLMs), sometimes referred to as Gigabaud Link Modules, were used in early Fibre Channel applications. GLMs are interfaces for 266-Mbps and 1-Gbps transmission and are now seldom used. GLMs are not hot-pluggable.
With 1063 Mbps you can achieve the distances listed in Table 2-2, “Distance using 1 Gbps GBIC-based fiber optics” on page 22.
A GLM is shown in Figure 2-6 on page 24.
Figure 2-6 Gigabit Link Module
2.2.5 Media Interface Adapters
Media Interface Adapters (MIAs) can be used to facilitate conversion between optical and copper interface connections. Typically, MIAs are attached to host bus adapters to convert the signal to the appropriate media type, copper or optical. Best practice is usually to avoid MIAs as they introduce an extra set of connections and an additional potential point of failure, especially if they protrude from a device.
An MIA is shown in Figure 2-7.
Figure 2-7 Media Interface Adapter
2.2.6 1x9 transceivers
Early FC implementations sometimes relied on 1x9 transceivers to provide SC connection to their devices. These are typically no longer used, but are shown in Figure 2-8.
Figure 2-8 1x9 transceivers
2.2.7 Fibre Channel adapter cable
The LC-SC adapter cable attaches to the end of an LC-LC cable to support SC device connections. A combination of one LC/LC fiber optic cable and one LC/SC adapter cable is required for each connection. This is used to connect from some of the older 1-Gbps devices to a 2-Gbps capable and LC interface-based SAN.
Shown in Figure 2-9 is a Fibre Channel adapter cable.
Figure 2-9 Fibre Channel adapter cable
2.2.8 Host Bus Adapters
The device that acts as the interface between the fabric of a SAN and either a host or a storage device is a Host Bus Adapter (HBA). In the case of storage devices, they are often just referred to as Host Adapters.
The HBA connects to the bus of the host or storage system. It has some means of connecting to the cable leading to the fabric. The function of the HBA is to convert the parallel electrical signals from the bus into a serial signal to pass to the fabric.
Some server HBAs are dual-ported, which can be useful when server I/O slots are constrained. However, dual-port HBAs are typically around twice the price of single-port HBAs, may use a shared cache architecture that can affect performance, and provide lower overall availability than a pair of single-port HBAs.
An example of an HBA is shown in Figure 2-10.
Figure 2-10 HBA
Various cables may be supported by the HBAs, for example:
Glass fiber
– Single-mode
– Multi-mode
Copper
– Twisted pair
– Coaxial
There are several manufacturers of HBAs, and an important consideration when planning a SAN is the choice of HBAs. Some HBAs may have interoperability problems with some other Fibre Channel components.
A server or storage device may have one HBA, or it may have many. Depending upon the particular configuration of the SAN, if there is more than one, they might all be identical, or they could be of different types.
The adapters in storage arrays are usually determined by the manufacturer. Factors influencing the choice of HBAs in servers are dealt with in 6.4, “Host connectivity and Host Bus Adapters” on page 230.
2.2.9 Loop Switches
Some devices require FC Arbitrated Loop support; one example is the LTO1 FC drive in an IBM TotalStorage 3584 Ultrascalable Tape Library. Some full fabric switches, such as the IBM TotalStorage SAN24M-1 Mid-Range Switch, allow an administrator to set specific ports as FC-AL ports.
There is also a device called a Loop Switch. In this case each of the attached devices is in its own Arbitrated Loop. These loops are then internally connected by a nonblocking switched fabric.
A loop switch is useful to connect several FC-AL devices, but allow them to each communicate at full Fibre Channel bandwidth rather than them all sharing the bandwidth.
Loop switches differ from FC-AL hubs in that hubs share the available bandwidth on an arbitrated (excuse-me) basis.
Note: Sometimes the term nonblocking is used to describe a switch that does not over-subscribe its bandwidth. Perhaps more technically correct is when the term is used to describe a switch that has a pipelined architecture, so that one frame’s progress through the switch is not dependent on the frame in front of it being passed successfully to a destination.
Typically, modern switches are all multipipelined and so are nonblocking in that sense, but many also use over-subscription of backplane bandwidth and this is becoming more common as FC speeds increase.
2.2.10 Switches
Switches allow Fibre Channel devices to be connected together, implementing a switched fabric topology between them. In a fabric switch, all devices operate at up to full Fibre Channel bandwidth, although some switches use over-subscription (see 3.9.2, “Oversubscription” on page 106 for a description) at rates of up to 3.2 to one. The switch creates a direct communication path between any two ports that are exchanging data, intelligently routing frames from the initiator to the responder.
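The over-subscription figure quoted for a switch is simply the ratio of aggregate port bandwidth to the bandwidth the switch can actually carry internally. A quick sanity check, using hypothetical numbers:

```python
def oversubscription_ratio(port_count: int, port_speed_gbps: float,
                           internal_bandwidth_gbps: float) -> float:
    """Ratio of total port bandwidth to the switch's internal bandwidth.
    A value above 1.0 means not all ports can run at full speed at once."""
    return (port_count * port_speed_gbps) / internal_bandwidth_gbps
```

For example, 32 ports at 4 Gbps behind 40 Gbps of internal bandwidth gives a 3.2-to-one over-subscription, matching the worst-case figure mentioned above.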
It is possible (though seldom desirable) to connect switches together in cascades and meshes using Inter-Switch links (ISLs). It should be noted that switches from different manufacturers might not interoperate fully.
As well as implementing this switched fabric, the switch also provides a variety of fabric services and features, such as:
Name services
Fabric control
Time services
Automatic discovery and registration of host and storage devices
Rerouting of frames, if possible, in the event of a port problem
Storage services (virtualization, replication, extended distances)
Features which can be implemented in Fibre Channel switches include:
Telnet and RS-232 interface for management
HTTP server for Web-based management
MIB for SNMP monitoring
Hot-swappable, redundant power supplies and cooling devices
Hot-pluggable SFPs
Zoning
Trunking (transparent bandwidth sharing between ports)
Exchange-based path selection/load balancing between ports (called trunking by some vendors)
VSAN (Virtual SAN, a way to create a logical sub-fabric)
VSAN trunking (carrying more than one VSAN over a single ISL or FCIP link)
Fibre Channel Protocol (FCP)
FICON®
FICON CUP
iSCSI
FCIP
iFCP
It is common to refer to a fabric as either core or edge, depending on its location in the SAN, and switches are also sometimes referred to as core or edge switches. If the switch forms, or is part of, the SAN backbone, then it is a core switch. If it is mainly used to connect to hosts or storage, then it is called an edge switch. There are certain cases where it is appropriate for storage, servers, or both to be connected directly to core switches.
Figure 2-11 Fibre Channel core and edge switches
2.2.11 Directors
Switches which are designed to be in the core of a large fabric generally have higher resilience and more features than might be needed for edge devices. A Fibre Channel director is a switch which can be used to carry data for many edge switches.
Some additional features that can be implemented in directors include:
Enhanced security features
Backplane and blade-based design for ease of expansion
Potentially 99.999% uptime
Nondisruptive upgrade of firmware
Hot-swap redundant components
Support for large numbers of ISLs or very high-bandwidth ISLs
Additional module options for advanced functions, such as virtualization
The following sections discuss some of the typical differences between switches and directors.
Port capacity
Directors tend to be larger than switches in both physical size and port capacity. Directors are typically 256-port capable, while switches currently run up to 48 ports or fewer.
MTBF
Director manufacturers typically claim 99.999% uptime (an average of 25 minutes of total unplanned downtime over a five-year period), while switches tend to be specified as delivering 99.9% uptime (an average of eight hours of total unplanned downtime per year), and dual fabric redundant switch pairs as delivering 99.99% uptime (an average of 50 minutes of total unplanned downtime per year).
Despite this, many SAN architects are wary of claims of 99.999% uptime and may prefer to deploy two directors on separate fabrics.
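The uptime percentages above translate directly into expected downtime. A small helper (assuming a 365.25-day year) reproduces the quoted figures to within rounding:

```python
def downtime_minutes_per_year(availability_pct: float) -> float:
    """Average unplanned downtime per year implied by an availability figure."""
    minutes_per_year = 365.25 * 24 * 60
    return (1 - availability_pct / 100) * minutes_per_year
```

99.999% works out to roughly 5.3 minutes per year (about 26 minutes over five years), 99.99% to roughly 53 minutes per year, and 99.9% to roughly 8.8 hours per year.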
Latency
When building large networks, the latency between ports on a director will tend to be significantly lower than the latency between ports on multiple switches that are connected using Inter-Switch Links.
Firmware updates
Director firmware should be able to be upgraded online. This feature is increasingly available in smaller switches as well, although some switches may still require a reboot. Architects should check individual product specifications, because some “directors” sold in recent times have also required a reboot when updating firmware.
If a reboot is required, that fabric loses service during the reboot. This is another good reason for configuring dual fabrics, even when deploying directors. In this way, a fault with a new level of firmware can sometimes be detected and resolved before the entire SAN is committed to run on the new firmware.
Advanced storage services
Some switch manufacturers are moving to expand the functionality of the directors and switches by offering modules which plug into a switch slot to offer advanced functions like storage virtualization and storage replication.
Backplane and blades
Rather than having a single printed circuit assembly containing all the components in a device, directors are usually designed with a backplane and blades (sometimes simply referred to as cards or modules). If the backplane is in the center of the unit, with blades being plugged in at the back and the front, then it is usually referred to as a midplane.
We show a backplane and blades architecture diagram in Figure 2-12.
Figure 2-12 A diagram of a backplane and blades architecture
If a backplane has components such as transistors or integrated circuits, then it is an active backplane. If it has no components at all, or just passive components such as resistors and capacitors then it is a passive backplane. In most implementations, the backplane is passive.
Some major benefits are possible using this design:
On-the-fly upgrades, by adding extra blades giving additional ports
On-the-fly implementation of other functionality, for example, new protocol support, by adding blades with different functionality
Potential to have different levels of firmware on different blades
A passive backplane, leading to a very high level of reliability of the unit as a whole, because faults can be isolated to a blade. This is especially true if the backplane has no components, but is just a circuit board with sockets and conductor tracks.
Note: A product might be described by its manufacturer as director class, but if it has a single backplane then this becomes a single point of failure.
It has long been a common practice for mainframe sites to implement duplicate ESCON® directors. Such companies might find it necessary to use duplicate Fibre Channel directors in a SAN.
2.2.12 Fibre Channel routers
FC routers are devices designed to isolate and connect FC fabrics in the same way that IP routers have traditionally isolated and connected IP networks. Routers typically also have additional features such as the ability to route FC over IP (FCIP), or convert FC to iFCP, or provide an iSCSI gateway.
2.2.13 Switch, director and router features
In this section, we discuss some of the main features for these components.
Frame buffering
The number of buffer-to-buffer credits varies greatly depending on the device, and can be an important selection criterion when FC devices are separated by any significant distance. The buffer-to-buffer credit limit sets the number of unacknowledged frames that are allowed to exist between two FC devices before they stop sending new data.
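The credit mechanism can be sketched as a simple counter: the sender spends one credit per frame and regains one for each R_RDY primitive returned by the receiver. The class below is an illustrative model of this behavior, not vendor code:

```python
class BBCreditPort:
    """Minimal model of buffer-to-buffer credit flow control."""

    def __init__(self, credits: int):
        self.credits = credits  # credit count negotiated at fabric login

    def can_send(self) -> bool:
        return self.credits > 0

    def send_frame(self) -> None:
        if self.credits == 0:
            raise RuntimeError("no BB credits left: sender must wait")
        self.credits -= 1

    def receive_r_rdy(self) -> None:
        # The receiver has freed a buffer; one credit comes back.
        self.credits += 1
```

On long links, the round-trip delay of the R_RDY matters: with too few credits the sender idles waiting for acknowledgements, which is why extended-distance links need devices with deep credit pools.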
Domain number routing decision
Because the destination address is divided into domain, area, and port, it is possible to make the routing decision on a single byte. As one example of this, if the domain number of the destination address indicates that the frame is intended for a different switch, the routing process can forward the frame to the appropriate interconnection without the need to process the entire 24-bit address and the associated overhead.
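The 24-bit destination ID packs the domain, area, and port fields into one byte each, so the forwarding decision really can inspect a single byte. A sketch (the helper names are ours):

```python
def split_fc_address(d_id: int):
    """Split a 24-bit FC destination ID into (domain, area, port) bytes."""
    return (d_id >> 16) & 0xFF, (d_id >> 8) & 0xFF, d_id & 0xFF

def route(d_id: int, local_domain: int) -> str:
    """Decide on the domain byte alone: frames for another domain go
    straight to the ISL toward the switch that owns that domain."""
    domain, area, port = split_fc_address(d_id)
    if domain != local_domain:
        return "forward to ISL toward domain %d" % domain
    return "deliver locally to area %d, port %d" % (area, port)
```

Only frames addressed to the local domain need the area and port bytes examined at all.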
Data path in switched fabric
Typically, the best practice is to keep switched fabrics as simple as possible, and to use routing to interconnect fabrics. A complex switched fabric can be created by interconnecting Fibre Channel switches. Switch-to-switch connections are made through E_Port connections. This means that if you want to interconnect switches, they need to support E_Ports. Switches can also support multiple E_Port connections to expand the bandwidth.
In a switched fabric, a cut-through switching mechanism is used. This is not unique to switched fabrics; it is also used in Ethernet switches. Its function is to speed frame routing from port to port.
When a frame enters the switch, cut-through logic examines only the link level destination ID of the frame. Based on the destination ID, a routing decision is made, and the frame is switched to the appropriate port by internal routing logic contained in the switch. It is this cut-through which increases performance by reducing the time required to make a routing decision. The reason for this is that the destination ID resides in the first four bytes of the frame header, allowing the cut-through to be accomplished quickly. A routing decision can be made at the instant the frame enters the switch, without interpretation of anything other than the four bytes.
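Cut-through forwarding can be mimicked by reading only the first word of the frame header, which carries the R_CTL byte followed by the 3-byte destination ID. A sketch under those assumptions (the routing table keyed by domain is hypothetical):

```python
import io

def cut_through_route(stream, routing_table):
    """Choose the egress port as soon as the first four header bytes
    arrive, before the rest of the frame has been received."""
    header = stream.read(4)                 # R_CTL byte + 3-byte D_ID
    d_id = int.from_bytes(header[1:4], "big")
    domain = d_id >> 16                     # route on one byte only
    return routing_table[domain]

# A frame destined for domain 5 is routed before its payload arrives.
frame = bytes([0x02, 0x05, 0x01, 0x00]) + b"...rest of frame..."
egress = cut_through_route(io.BytesIO(frame), {5: "port7"})
```

The routing decision is made the moment those four bytes are in, which is exactly why cut-through reduces per-frame latency.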
In such a configuration with interconnected switches, known as a meshed topology, multiple paths from one N_Port to another can exist.
An example of a meshed topology is shown in Figure 2-13 on page 34.
Figure 2-13 Meshed topology switched fabric
2.2.14 Test equipment
There are a few different pieces of test gear available in the area of fiber optics.
GO/NOGO testers GO/NOGO testers are simple devices which allow the user to prove that light is passing through the cable. Commonly a laser source is attached to one end of the fiber. If light reaches the other end, then the fiber is continuous. This is a useful way to quickly identify the two ends of a particular fiber in a bundle routed out of sight. The emerging light can be detected if the loose end of the fiber is placed near a sheet of writing paper.
The laser in a tester can be much higher powered than those used by Fibre Channel devices; for example, it might be a Class 3 laser.
Attention! Lasers are dangerous and there is a risk of serious injury to your eyes. Do not look directly into the laser.
Light sources and attenuation meters
GO/NOGO testers do not prove that the quality of the fiber is high enough for reliable communication. Specialized light sources and attenuation or power meters can be used to validate short distance cables.
The exact method of using the test equipment will depend on the test gear itself. The result is that the tester will be able to determine the attenuation along the fiber, usually measured in decibels (dB).
This test is considerably more time consuming than the GO/NOGO test.
The equipment needs to be regularly calibrated by an authorized agency in order to be sure that the results are accurate.
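The attenuation figure an installer reports follows from the ratio of launched to received optical power. A quick illustration of the decibel arithmetic:

```python
import math

def attenuation_db(power_in_mw: float, power_out_mw: float) -> float:
    """Attenuation along the fiber: 10 * log10(Pin / Pout), in dB."""
    return 10 * math.log10(power_in_mw / power_out_mw)
```

Losing half the launched power corresponds to about 3 dB of attenuation.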
Optical Time Domain Reflectometer
An Optical Time Domain Reflectometer (OTDR) is used to investigate the quality of long fiber optic cables, perhaps as long as hundreds or even thousands of kilometers.
The OTDR sends out a pulse of light along the fiber and looks for reflections back. There will be a reflection at:
The point where the fiber is plugged into the OTDR
The end of the fiber, or a break
Any splices in the fiber
Sharp bends
Damage to the fiber
When long distance fibers are laid, it is best practice for them to be tested using an OTDR.
The device creates a trace of where the backscatter or reflections take place. This is either displayed on a screen, printed, or both. The trace shows time or distance, directly proportional, on the horizontal axis and power on the vertical axis.
Fibre Channel analyzer
In much the same way as there are Ethernet network analyzers for looking at traffic going over a network, there are similar devices for Fibre Channel.
Ethernet analyzers are quite common today; however, their Fibre Channel counterparts are less common, and there are a few points to be made about them: Ethernet analyzers (on twisted pair) can be connected to the network without disruption and can analyze all data in the network. Fibre Channel analyzers are placed in a Fibre Channel link. They monitor frames going through that link, not all data on the fabric. The insertion of the analyzer is disruptive: it is connected by unplugging the link and inserting the analyzer into the link. This is shown in Figure 2-14.
Figure 2-14 Connecting an FC analyzer