Open Source Development Labs Data Center

Goals and Capabilities Version 1.2

Open Source Development Labs, Inc. 127 SW Millikan Way, Suite 400 Beaverton, OR 97005 USA Phone: +1-503-626-2455

Copyright © 2006 by The Open Source Development Labs, Inc. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is available at http://www.opencontent.org/openpub/). Distribution of substantively modified versions of this document is prohibited without the explicit permission of the copyright holder. Other company, product or service names may be the trademarks of others. Linux is a Registered Trademark of Linus Torvalds.


Contents

CONTENTS ...... 2
WHAT'S NEW? REVISIONS OF THIS DOCUMENT ...... 9
DATA CENTER LINUX INITIATIVE GOALS ...... 10
The Market and Technical Scope of This Document ...... 11
CATEGORIES OF MARKETING GOALS ...... 11
CATEGORIES OF TECHNICAL CAPABILITIES ...... 11
Join Our Global Community Resource ...... 12
WE INVITE PARTICIPATION ...... 12
ACKNOWLEDGEMENTS ...... 12
MARKETING GOALS ...... 14
Market Priorities ...... 14
Categories of Marketing Goals ...... 15
LINUX AWARENESS AND CONFIDENCE ...... 15
GLOBAL ENTERPRISE SERVICES AND SUPPORT ...... 15
WORKLOADS: SOLUTION STACK LAYERS AND WORKLOAD ENABLERS ...... 15
TECHNICAL TRAINING AND EDUCATION ...... 16
DEVELOPMENT COMMUNITY ...... 16
TOTAL COST OF OWNERSHIP ...... 16
STABILITY ...... 16
Guide to Marketing Goal Table Entries ...... 17
Market Goal Tables ...... 18
LINUX AWARENESS AND CONFIDENCE ...... 18
GLOBAL ENTERPRISE SERVICES AND SUPPORT SERVICES ...... 29
WORKLOADS: SOLUTION STACK LAYERS AND WORKLOAD ENABLERS ...... 36
TECHNICAL TRAINING AND EDUCATION ...... 51
DEVELOPMENT COMMUNITY ...... 54
TOTAL COST OF OWNERSHIP ...... 61
STABILITY ...... 64
TECHNICAL CAPABILITIES ...... 68
Technical Overview ...... 68
Description of Technical Categories ...... 68
SCALABILITY ...... 68
PERFORMANCE ...... 68
RAS (RELIABILITY, AVAILABILITY, SERVICEABILITY) ...... 69
MANAGEABILITY ...... 69
VIRTUALIZATION ...... 69
CLUSTERS ...... 69
STANDARDS ...... 70
SECURITY ...... 70
USABILITY ...... 70
Priority One Technical Capabilities ...... 71
Guide to Technical Capability Table Entries ...... 78
GENERAL TABLE FORMAT ...... 78
MATURITY LEVEL DEFINITIONS ...... 79
Technical Capability Tables ...... 80
SCALABILITY ...... 80
CPUs—1 way ...... 80
CPUs—2 way ...... 80
CPUs—4 way ...... 81
CPUs—8 way ...... 81


CPUs—16 way ...... 82
CPUs—32 way ...... 82
CPUs—64 way ...... 83
Network I/O—10/sec ...... 83
Network I/O—100/sec ...... 83
Network I/O—1000/sec ...... 84
Network I/O—10Mb/sec ...... 84
Network I/O—100Mb/sec ...... 84
Network I/O—1000Mb/sec ...... 85
Network I/O—10Gb/sec ...... 85
Network—Sendfile ...... 85
Network—Copyless Send/Receive ...... 86
Network—Scalable Poll ...... 86
Network—Asynchronous I/O ...... 87
Network—Segment Offloading ...... 87
Network—Checksum Offloading ...... 88
Network—Denial of Service Protection ...... 88
Network—High Speed Routing (Especially IPV6) ...... 88
Network—Better Quality of Service and Queuing ...... 89
Network—APIs for High Speed Interconnect ...... 89
Network—Support for High Speed TCP ...... 89
Disk I/O Connectivity—2 Storage Devices ...... 90
Disk I/O Connectivity—8 Storage Devices ...... 90
Disk I/O Connectivity—12 Storage Devices ...... 90
Disk I/O Connectivity—256 Storage Devices ...... 91
Disk I/O Connectivity—4096 Storage Devices ...... 92
Disk I/O Connectivity—8K Storage Devices ...... 93
Disk I/O Maximum File Size—160GB ...... 94
Disk I/O Maximum File Size—1TB ...... 94
Disk I/O Maximum File Size—16TB ...... 94
Disk I/O Maximum File Size—32TB ...... 94
Disk I/O—625/sec ...... 95
Disk I/O—5000/sec ...... 95
Disk I/O—80,000/sec ...... 95
Disk I/O—160,000/sec ...... 96
Disk I/O Throughput—40MB/sec ...... 96
Disk I/O Throughput—300MB/sec ...... 96
Disk I/O Throughput—5GB/sec ...... 97
Disk I/O—Scalable Disk Locking ...... 97
Disk I/O— ...... 97
Disk I/O—Vectored I/O ...... 98
Disk I/O—Async I/O (Raw) ...... 98
Disk I/O—Async I/O (File System) ...... 99
Disk I/O—Direct I/O (Raw) ...... 99
Memory—1GB ...... 100
Memory—4GB ...... 100
Memory—8GB ...... 100
Memory—16GB ...... 100
Memory—64GB ...... 101
Memory—256GB ...... 101
Memory—1TB ...... 101
Layered Software—ERP ...... 102
Layered Software—SCM ...... 102
Layered Software—CRM ...... 102
Layered Software—MRO ...... 103
Layered Software—SFA ...... 103
Layered Software—Java ...... 103
Layered Software—ORB ...... 103


Layered Software—RDBMS ...... 104
I/O Interface—PCI ...... 104
I/O Interface—PCI-X ...... 105
I/O Interface—InfiniBand ...... 105
Kernel—Huge Number of Threads ...... 106
Kernel—Memory ...... 106
Kernel—CPU ...... 106
Kernel—I/O Interface ...... 106
Kernel—Node ...... 107
Non-Uniform Memory Access (NUMA) APIs ...... 107
Non-Uniform Memory Access (NUMA) Topology ...... 108
Symmetric Multi-Threading (e.g. Hyperthreading) ...... 109
PERFORMANCE ...... 110
Packet Tests ...... 110
Forwarding/Firewall Test ...... 110
Load Balancing ...... 110
Security Keys/Second ...... 111
File Server ...... 111
Web Server ...... 111
Mail Server ...... 112
Directory Services ...... 112
Network File System (NFS) V2/V3 Performance Server/Client ...... 113
Network File System (NFS) V4 Performance and Functionality ...... 114
GCC Optimizations ...... 115
Java Performance ...... 116
Data Base Connection Performance ...... 117
File System Performance ...... 118
Large Multi-Task Performance ...... 119
Performance Features & APIs ...... 119
Port Quality ...... 120
Middleware Performance—Open Database Connectivity (ODBC) ...... 120
Application Performance ...... 121
Workloads: Online Transaction Processing (OLTP)—4 Processors or Less ...... 122
Workloads: Online Transaction Processing (OLTP)—Greater than 4 Processors ...... 122
Workloads: Decision Support System (DSS)—4 Processors or Less ...... 122
Workloads: Decision Support System (DSS)—Greater than 4 Processors ...... 123
Workload—ECommerce ...... 123
Workload—Financial (Trades) ...... 123
Tunable Parameters ...... 124
Current Technology Implementations ...... 124
Performance Measurement Infrastructure ...... 124
RELIABILITY, AVAILABILITY AND SERVICEABILITY (RAS) ...... 125
Crash Dump ...... 126
Package Change History/Logging ...... 127
Hardware Change History/Logging ...... 127
Customization History/Logging ...... 127
Install on Alternate Target ...... 128
Revert Installation/Patch Sets if Failure ...... 128
Version and Integrity Checking ...... 129
Update Notification—Security ...... 129
Update Notification—Data Corruption ...... 130
Debugger—Kernel ...... 131
Debugger—Application ...... 132
Dynamic Tracer ...... 133
Hardware Fault Prediction and Fault Location ...... 135
Software Fault Location Identification ...... 137
Live Snapshot—Kernel Level ...... 138
Live Snapshot—Process Level ...... 138


Proactive System Health Monitoring ...... 139
Remote Serviceability ...... 140
Performance Monitoring ...... 141
Reproducible Server Replacement—Procedural ...... 141
Reproducible Server Replacement—Automated ...... 142
System Checkpoint/Server Replacement ...... 142
Hot Swap: I/O Bus Level—PCI, PCI-X, cPCI ...... 143
Hot Swap: I/O Bus Level—SCSI ...... 144
Hot Swap: I/O Bus Level—PCI Express ...... 145
Hot Swap: I/O Bus Level—USB ...... 146
Hot Swap: I/O Bus Level—iSCSI ...... 146
Hot Swap: I/O Bus Level—InfiniBand ...... 147
Hot Swap: I/O Bus Level—Serial Advanced Technology Attachment (S-ATA) ...... 147
Hot Swap: I/O Bus Level—Firewire ...... 148
Hot Swap: Component Level—I/O ...... 148
Hot Swap: Component Level—Memory Remove ...... 149
Hot Swap: Component Level—Memory Add ...... 150
Hot Swap: Component Level—CPU ...... 151
Hot Swap: Component Level—Node ...... 152
Component Notification: Mem/IO/Power Failure, Temperature ...... 153
Fast System Boot ...... 154
Fast Install ...... 154
Journaling File Systems ...... 155
Reliable File System Writes ...... 156
Atomic File System Operation ...... 157
Failover—Network Services ...... 158
Failover—Stateful ...... 158
Failover—Open RDBMS Dependent ...... 158
Failover—Proprietary RDBMS Dependent ...... 159
Multipath I/O ...... 160
Backup Solution—GB Range ...... 161
Backup Solution—TB Range ...... 161
Volume Manager ...... 162
& IPC Parameter Changes without Reboot ...... 162
Replacing Modules without Reboot ...... 163
Graceful Handling of Memory Exhaustion ...... 163
MANAGEABILITY ...... 164
Local Software Stack Install and Update (Pull) ...... 164
Remote Multi-System Software Stack Install, Update or Replication (Push) ...... 164
Common Interface for Third Party Integration to Install Tools ...... 165
Software Package Management ...... 166
Software Package Management 1—Local Software Stack Install and Update ...... 167
Software Package Management 2—Reversion of Software Installs and Updates ...... 168
Software Package Management 3—Remote Multi-system SW Stack Install ...... 169
Software Package Management 4—Package Content Identification ...... 170
Configuration Management (Expanded to Full Stack) ...... 171
Volume Management ...... 172
Device Configuration Discovery ...... 172
Persistent Storage Device Naming ...... 173
Remote Console Access ...... 174
Job Management ...... 174
Problem Management ...... 175
Remote Management ...... 175
Network Management ...... 176
User Management ...... 176
Log Monitoring/Event Notification/Agents ...... 177
Resource Management—Asset Management ...... 177
Resource Management—Usage Tracking ...... 178


Resource Management—Load Balancing and Tracking ...... 178
Workload Management ...... 179
Process and Resource Monitoring ...... 180
Enhanced Process and Resource Monitoring ...... 181
System Error Management ...... 182
Scripting—Development Tools ...... 182
Capacity Planning ...... 183
VIRTUALIZATION ...... 184
Run Application Software Unmodified ...... 184
Application Separation (Security) ...... 184
Full Virtualization—Unmodified Guest OS ...... 185
Full Virtualization—Performance with Unmodified Guest OS ...... 185
Paravirtualization—Run a Paravirtualized Guest ...... 185
32-bit Linux Guest on 32-bit Hardware ...... 186
32-bit Guest on 64-bit Hardware ...... 186
64-bit Guest on 64-bit Hardware (or 64-bit Host/64-bit Hardware) ...... 187
64-bit Guest on 32-bit Host OS and 64-bit Hardware ...... 187
Windows 32-bit Guest on 32-bit Hardware ...... 187
Windows 32-bit Guest on 64-bit Hardware ...... 188
Windows 64-bit Guest on 64-bit Hardware ...... 188
Other Guest OSs (Solaris, Netware, Linux 2.4.x…) ...... 188
Architecture Support—Support for X86-64 ...... 189
Architecture Support—Support for IA64 ...... 189
Architecture Support—Support for IA32 ...... 189
Architecture Support—Support for Power PC-64 ...... 189
Architecture Support—Support for Power PC-32 ...... 190
SMP Support—2-CPU Host ...... 190
SMP Support—4-CPU SMP Host ...... 190
SMP Support—8-CPU SMP Host ...... 190
SMP Support—16-CPU SMP Host ...... 191
SMP Support—32-CPU SMP Host ...... 191
SMP Support—64-CPU SMP Host ...... 191
SMP Support—Greater than 64-CPU SMP Host ...... 191
Multi-core Support—2-CPU SMP Host ...... 192
Multi-core Support—4-CPU SMP Host ...... 192
Multi-core Support—8-CPU SMP Host ...... 192
Multi-core Support—16-CPU SMP Host ...... 192
Multi-core Support—32-CPU SMP Host ...... 193
Multi-core Support—64-CPU SMP Host ...... 193
Multi-core Support—Greater than 64-CPU SMP Host ...... 193
Plug-in Schedulers ...... 194
SMP Guest on SMP Host ...... 194
Non-SMP Guest on SMP Host ...... 194
Non-SMP Guest on Non-SMP Host ...... 194
SMP Guest on a Non-SMP Host ...... 195
Shared Drivers: Network Interfaces ...... 195
Shared Drivers: Graphics (Including AGP) ...... 196
Shared Drivers: Storage ...... 196
Shared Drivers: Miscellaneous Others ...... 197
Pass Thru: Network Interfaces ...... 197
Pass Thru: Graphics (Including AGP) ...... 198
Pass Thru: Storage ...... 198
Pass Thru: Miscellaneous Others ...... 198
Hardware Pass Thru: Network Interfaces ...... 199
Hardware Pass Thru: Graphics ...... 199
Hardware Pass Thru: Storage ...... 199
Hotplug CPU ...... 200
Hotplug Memory ...... 200


Hotplug I/O Bus ...... 200
Metering ...... 201
Quality of Service Support ...... 201
Automatic Resource Balancing ...... 201
Checkpoint/Restart Support for Guest OS ...... 202
VM Migration to Another Physical Machine ...... 202
Save/Restore of a Guest OS ...... 202
CIM Support ...... 202
Tie-in to Legacy Data Center Management Tools ...... 203
Enhanced Serviceability ...... 203
VM Clustering ...... 203
Virtual Network Provisioning, NAT ...... 203
Remote Console ...... 204
Console Not Required ...... 204
Single Kernel Binary for Paravirtualized Solutions ...... 204
Dynticks ...... 205
Large Page Support in the VM ...... 205
Debugger ...... 205
Conversion Tools ...... 206
Test Suites for Validation and Regression Testing ...... 206
VM Aware Performance Tools ...... 206
CLUSTERS ...... 207
Administrative—User Management ...... 207
Administrative—Software Deployment ...... 207
Administrative—Software Upgrade ...... 208
Administrative—Central Cluster (Log/Notification/Monitoring) ...... 208
Administrative—Cluster Commands ...... 208
Cluster-Wide Persistent Storage Device Naming ...... 209
Cluster Volume Management ...... 209
Cluster File System ...... 210
Group Messaging ...... 210
Event Notification ...... 211
Checkpoint ...... 212
Kernel— ...... 213
Membership ...... 214
Communications ...... 215
Load Balancing—Connection Based ...... 215
Load Balancing—Wide-Area Network (WAN) ...... 216
Load Balancing—Resource Based ...... 216
Load Balancing—Dynamic Balancing ...... 216
HA—Failover: Transaction Based ...... 217
HA—Failover: Continuous ...... 217
Single-System Image—Process ...... 218
Single-System Image—File System View ...... 218
Single System Image—I/O ...... 218
Single System Image—User/Group ...... 219
STANDARDS ...... 220
Linux Standard Base (LSB) 2.0 Compliance ...... 220
Linux Standard Base (LSB) 3.0 Compliance ...... 221
CIM ...... 222
Simple Network Management Protocol (SNMP) through v3 ...... 222
Internet Protocol (IP) ...... 223
Internet Protocol—SEC ...... 223
Intelligent Platform Management Interface (IPMI) ...... 224
SAF—AIS ...... 224
SAF—HPI ...... 225
Advanced Configuration and Power Interface (ACPI) ...... 225
Globalization/Internationalization ...... 226


Open Printing ...... 226
I/O Interface—Peripheral Component Interconnect (PCI) ...... 226
I/O Interface—PCI-X ...... 227
I/O Interface—InfiniBand ...... 227
SECURITY ...... 228
User Stack Overflow Protection ...... 231
User and System Stack Not Executable ...... 231
Linux Security Module (LSM) Support ...... 232
System Integrity Check ...... 232
Static Analysis Tools ...... 233
Run Time Analysis Tools ...... 233
Fast Security Fix Process ...... 234
Discretionary Access Control ...... 234
Mandatory Access Control ...... 235
Restrict Net Access ...... 235
Distributed User Authentication ...... 235
Process Rights Management ...... 236
Application Isolation (Vserver) ...... 236
Encryption Per File ...... 236
Whole Disk Encryption ...... 237
Security Auditing ...... 237
Security Auditing: Tamper Evident Audit Logs ...... 237
Intrusion Detection: Detect File Tampering ...... 239
Trusted Applications: Signed Applications ...... 239
Trusted System: Signed System Drivers ...... 239
Trusted System: Signed Libraries ...... 240
Signed Kernel ...... 240
Core Root of Trust ...... 240
Trusted Network Connect ...... 241
Filter Incoming/Outgoing Ports ...... 241
Filter Forwarding Traffic ...... 241
Filter Control Protocols ...... 242
Level 3 + Filtering ...... 242
Application Level Filtering ...... 242
Integrated Cryptographic Framework ...... 243
Authentication and Access Control ...... 243
Full Peer Domain ...... 243
Active Directory Server ...... 244
USABILITY ...... 245
Common Command Line Administration across Distributions ...... 245
Third Party Software Integration ...... 246
Migration Tools ...... 247
GENERAL REFERENCES ...... 248

What's New? Revisions of This Document

Revisions of the Data Center Linux Capabilities Document

Version 1.0, February 2004
• The original DCL Technical Capabilities document.

Version 1.1, February 2005
• The scope is expanded to include Marketing Goals.
• The Marketing Goals and Technical Capabilities are synchronized.
• The acknowledgements include technical and marketing participants.

• The Technical Capabilities section is reorganized: Priority One and Priority Two items are listed together for each category. A table of Priority One items links to their descriptions for easy reference.

• Input from the DCL Japan work group is included throughout the document.

• The new Security section approach is based upon feedback from the security community and the security SIG.

• Priorities shifted for some items during 2004.

• The new 2.6 version of the kernel is reflected in the Technical Capability maturities. A new level of technical maturity, Product Available, indicates at least one distribution or ISV provides a solution.

• There are new technical entries that are submitted for clustering, networking and system monitoring.

• Errata – several omissions and changes were adopted via the DCL committees.

Version 1.2, March 2006
The following have been added:
• A new Security section based on feedback from the Security SIG.

• Updated information for technical capabilities, in particular the maturity of items based on features in Linux 2.6.15.

• A new Virtualization section that focuses on capabilities for Linux as a guest OS.

• Errata – several omissions and changes to update DCL direction and the Marketing Goals.


Data Center Linux Initiative Goals

The Data Center Linux (DCL) Initiative is hosted by the non-profit Open Source Development Labs (OSDL). OSDL has embraced the mission to provide a forum for industry leaders to accelerate development and adoption of Linux in data centers that use multiprocessor servers as platforms for a variety of mission-critical, global enterprise applications. Toward that goal, the DCL Initiative, consisting of global OSDL member companies, industry leaders, IT technology vendors, end users and interested individuals, identifies capabilities required for data center operation and assembles them into an ever-evolving prioritized list. DCL Initiative working groups then track and promote customer availability of the features and services that provide these capabilities on Linux, making it a viable choice in enterprise data centers.

Data center needs are examined across a broad spectrum that represents many industries. In the DCL Technical Capabilities 1.0 document, the Initiative focused purely on technical issues for these tiers: edge servers, application servers and database servers. We recognized, however, that not all inhibitors to Linux adoption are technical. In the DCL Goals and Capabilities 1.1 document, the Initiative expanded the scope to examine some market drivers for data centers and how those drivers correspond to the Technical Capabilities. This expanded analysis exposed additional gaps that can now be addressed by solutions. Based upon the feedback received and on new functionality in the Linux 2.6.15 kernel, we have updated the DCL Capabilities.

The intent of the DCL Working Group is to use this document to stimulate discussion and public review. This document is not a requirements document or a specification that would facilitate building solutions. It is a list of capabilities, goals and priorities we will use to focus our efforts on achieving our goal of Linux adoption for mission-critical applications deep in the data center.


The Market and Technical Scope of This Document This version includes Marketing Goals and Technical Capabilities across a broad range of categories.

Categories of Marketing Goals
• Features that meet the expectations for global services and support for the Linux platform
• Features that support key workloads for the data center, which span several software layers
• Features required by workload enablers, such as independent software vendors (ISVs) who provide solutions for the data center
• Services for training and education that help expand usage of the Linux platform
• Features and support that provide the level of stability expected for demanding enterprise usage
• Promotion of pervasive awareness of, and confidence in, the overall Linux solution among end-user companies who deploy applications
• A total cost of ownership of the Linux solution that meets the needs of companies acquiring solutions delivered on the Linux platform.

Categories of Technical Capabilities
• Features to meet or exceed the performance required for data center applications
• Features needed so that the Linux solution stack can scale on enterprise-class server hardware
• Functionality to drive manageability of data center servers
• Customer-visible capabilities needed when deploying Linux as a guest OS in a virtualized environment
• Functionality to achieve the reliability, availability, and serviceability (RAS) necessary for data center applications
• Standards of security that meet or exceed those provided by existing operating system alternatives
• Standards that encourage end-user, ISV and third-party software adoption of Linux
• Functionality that provides a high degree of usability for activities requiring human interaction
• Functionality to allow the clustering of resources for data center applications


Join Our Global Community Resource

We Invite Participation OSDL invites participation in this process at the following site: http://www.osdl.org/projects/dcl. You may join the discussion on the dcl_discussion mailing list: http://lists.osdl.org/mailman/listinfo/dcl_discussion. You can receive informational postings about the current status of DCL on the dcl_info list: see http://lists.osdl.org/mailman/listinfo/dcl_info. Direct inquiries about other opportunities for involvement or about the specific technical processes used to create DCL Capabilities to the DCL Roadmap Coordinator, mailto:[email protected]

Acknowledgements Contributors to each section of this capabilities definition include the following:

Technical Contributors
Trent Shue – Bull
Ric Wheeler – EMC
Jack Lo – EMC
Pratap Subrahmanyam – EMC
Dong-Jae Kang – ETRI
Chei-Yol Kim – ETRI
Ping-Hui Kao – Hewlett Packard
Martine Silbermann – Hewlett Packard
Hisashi Hashimoto – Hitachi
Yumiko Sugita – Hitachi
Gerrit Huizenga – IBM
Greg Kroah-Hartman – IBM
Emily Ratliff – IBM
Julie Fleischer – Intel
Rusty Lynch – Intel
Inaky Perez-Gonzalez – Intel
Mats Wichmann – Intel
Hiro Yoshioka – Miracle Linux
Clyde Griffin – Novell
Ross Maxfield – Novell
Ed Reed – Novell
Isao Arakawa – NTT
Yusuke Hori – NTT COMWARE
Yoshifumi Manabe – NTT
Hiroshi Miura – NTT DATA INTELLILINK
Keisuke Mori – NTT DATA INTELLILINK
Ken-ichi Okuyama – NTT DATA INTELLILINK
Miyoshi Omori – NTT COMWARE
Koichi Suzuki – NTT DATA INTELLILINK
John Cherry – OSDL
Lynn de la Torre – OSDL
Steve Hemminger – OSDL
Mary Edie Meredith – OSDL


Craig Thomas – OSDL
Chris Wright – Red Hat
Bruce Vessey – Unisys

Marketing Contributors
Marc Miller – AMD
Andrew Bowles – Bakbone
Doug Mason – Hewlett Packard
Hisashi Hashimoto – Hitachi
Andy Wachs – IBM
Jim Wasko – IBM
Rammohan Peddibhotla – Intel
Matt Semenza – Intel
Hiro Yoshioka – Miracle Linux
J.D. Nyland – Novell
Tracy Thayne – Novell
Isao Arakawa – NTT
Yoshifumi Manabe – NTT
Yusuke Hori – NTT COMWARE
Miyoshi Omori – NTT COMWARE
Hiroshi Miura – NTT DATA INTELLILINK
Lynn de la Torre – OSDL
Alex Doumani – OSDL
Derek Rodner – Unisys


Marketing Goals

One of the functions the Data Center Linux (DCL) Marketing team undertakes is market analysis, which helps drive DCL Initiative efforts. The DCL Marketing team has specified Marketing Goals and divided them into categories. Within each category, Priority One Marketing Goals are considered the most important for Data Center Linux mission critical readiness, and Priority Two Marketing Goals are presented to stimulate thought and discussion.

Market Priorities The market analysis performed by the DCL Marketing committee during the 1.1 document timeframe was done at a broad level: the committee looked at data center needs from a high-level perspective and isolated common characteristics within large data centers. The committee recognizes the need to drive this analysis deeper by examining specific industry needs, which will allow it to further refine the analysis of market drivers and inhibitors to adoption of Linux in the data center. Based upon its current analysis, the DCL Marketing committee has defined seven categories. Our current focus is on the following three categories, which are driving adoption of the Linux solution and which also illustrate the challenges that are faced.

• Workload Enablers (subset of the Workloads section)—Independent software vendors (ISVs) who deliver solutions for the data center must be successful in delivering their solutions on Linux.
• Total Cost of Ownership—The total cost of ownership for an enterprise-class solution must meet the expectations of companies acquiring and maintaining those solutions for their deployments.
• Stability of the Platform—The Linux solution must deliver overall stability as a well-supported platform for mission-critical applications.

What follows is a list of all of the Marketing Goals, by category.


Categories of Marketing Goals Data Center Linux Marketing Goals are grouped into one of the following categories. Each goal is described individually within a category.

Linux Awareness and Confidence Linux awareness refers to general awareness of the Linux platform’s suitability for use in data center enterprise applications. Linux confidence refers to enterprise company confidence in deploying Linux as a strategic platform in a mission critical setting. End-user companies implementing Linux-based solutions are the primary audience for most of the goals related to awareness and confidence. However, some deliverables help form partnerships and strengthen the overall ecosystem surrounding Linux.

Global Enterprise Services and Support This category describes the services and support that will make an enterprise-class deployment of the Linux operating system successful.

Workloads: Solution Stack Layers and Workload Enablers Our goal within the data center ecosystem is to enable broad Linux adoption in enterprise data centers. One way to help accomplish this goal is to provide for wide availability of key workloads under Linux that span all common enterprise mission critical requirements. This section contains two sub-sections: Solution Stack Layers and Workload Enablers. The Data Center Linux Marketing team has defined a workload to be a customized combination of stack layers built to meet a business requirement. Stack layers are a combination of various software components layered between an operating system and an end-user software solution. Independent software vendors (ISVs) produce software solutions to provide functions within stack layers. An enterprise typically requires targeted applications specific to its vertical industry segment. Some vertical applications are developed in-house, but many others, especially those in market segments of significant size, are offered by independent software vendors. Most major industry segments contain at least a handful of vertical solutions. Examples include the following:

• Financial & Insurance (analysis tools, wealth management)
• Manufacturing (mechanical design, electronic purchasing, manufacturing test)
• Engineering (electronic design automation, development tools, project management)
• Health Care Providers (hospital management, medical history)
• Retail (stock handling, gift card management)


Here are some details regarding the Workloads section:

• The individual stack layers are defined in the table provided in the Workloads category section. In addition, workload enablers, such as the ISV software solutions required to meet business needs, are also addressed in that section.
• Workloads are a useful tool to analyze and communicate a cohesive Data Center Linux goal. We have identified the use of workloads as a strategy to accomplish further market analysis. However, in the scope of this document, we outline an overall approach as opposed to specific workloads.

Technical Training and Education This category addresses training for all technical users, such as system administrators and developers. End-user training is related to desktops and therefore is not in scope for Data Center Linux.

Development Community This category encompasses goals that would enable development under Linux, through proliferation of development, diagnostic and migration tools for multiple distributions and through developer programs.

Total Cost of Ownership Within this document, the term “total cost of ownership” is used in the broadest sense. The category addresses all aspects of the costs and benefits associated with Linux platform ownership. This initial description is required to illustrate that the perceived and actual costs of Linux solution ownership are a market driver.

Stability In order for enterprises to seriously consider deploying Linux for mission critical settings in their data centers, the Linux operating system, stacks and applications must meet a high level of stability requirements. Stability is a key selection criterion for most enterprises, and it is especially critical for larger corporations.

Guide to Marketing Goal Table Entries

Each table within the category sections describes a unique Marketing Goal. This is the format of the tables:

Marketing Goal Name

The name is a short description of the goal. This name is used in all internal and external communication. Orange table headers signify Priority One goals, and gray table headers signify Priority Two goals.

ID Number: Each capability is assigned an identifier in the format XX.number. The XX abbreviation indicates the Marketing Goal category, as defined here:
AC: Linux Awareness and Confidence
GES: Global Enterprise Services and Support
WL: Workloads
TTE: Technical Training and Education
DC: Development Community
TCO: Total Cost of Ownership
STBL: Stability

Priority Level 1 or 2: Priority One goals are most important for Data Center Linux readiness; their table headers are orange. Priority Two goals are presented to stimulate thought and discussion; their table headers are gray.

Blank: This field is reserved for future use.

Description This section describes the DCL Marketing team’s goal and scope for the Marketing Goal.

Metrics Each metric listed is a measurable accomplishment by which to gauge the extent of completion of the action plan for the Marketing Goal.

References References often include links to website pages that specifically address topics related to the Marketing Goal.


Market Goal Tables The detailed descriptions of the DCL Marketing Goals follow, organized by the marketing categories previously described.

Linux Awareness and Confidence Linux awareness refers to general awareness of the Linux platform for use in data center enterprise applications. Linux confidence refers to enterprise company confidence in deploying Linux as a strategic platform in mission critical settings. The primary audience for most of the deliverables related to awareness and confidence is end- user companies implementing Linux-based solutions. However, some deliverables help form partnerships and strengthen the overall ecosystem surrounding Linux.

Data Center Linux Message

AC-1. Priority Level 1

Description Determine the DCL message for each target audience, and ensure that OSDL and its members communicate it. The following message has been developed and communicated for the end-user audience: there is ubiquitous use of Linux in application and database server environments, yet this usage does not compromise DCL usage within the edge environment. Develop a message for these audiences within each important industry segment: developers, independent software vendors (ISVs), independent hardware vendors (IHVs), system integrators (SIs), and IT influencers such as trade analysts, standards bodies and consultants.

Metrics:

• The pervasiveness of the message in the ecosystem can be determined. However, this metric is constrained by the level of effort required to measure it.

References OSDL documents: http://www.osdl.org/lab_activities/data_center_linux/articles.html/document_view


Return on Investment

AC-2. Priority Level 1

Description Make it possible for companies who are considering deploying a specific application or application set on Linux to examine the return on investment of deploying the Linux stack.

Metrics

• A variety of measurement techniques are used in the marketplace. Two common techniques are Acquisition Cost and Total Cost of Ownership. However, there is no common approach within the Linux ecosystem that has produced a definitive study reflecting the price and performance of the Linux solution.

References Total Cost of Ownership section in DCL Capabilities 1.2 document.


General Outreach and Collateral

AC-3. Priority Level 1

Description Ensure OSDL and its member companies support Linux in the data center through outreach. This support will help ensure that the following important DCL Initiative messages are received by the Linux ecosystem:

• DCL goals and capabilities are well prioritized, as reflected by this document.

• Linux usage is dominant on a variety of enterprise platforms, as reflected in outreach and collateral.

• Server usage is an increasingly strategic decision for large Fortune 500 companies. Member information is gathered and leveraged to help build this point. Note: The Data Center Linux Data Sheet is published, and it meets many of these goals.

Metrics

• Metrics are developed that associate increased Linux deployments with outreach and collateral.

• Note: Some metrics exist for server deployments, but there is not a well-referenced metric used across the Linux industry for this.

References Data Center Linux Data Sheet: http://www.osdl.org/lab_activities/data_center_linux/articles.html/document_view


Outreach to Information Technology (IT) Management: Conferences and Newsletters

AC-4. Priority Level 1

Description In order to gain acceptance of Linux as a development platform, reach out to the IT managers in companies that are deploying Linux.

Metrics

• Electronic newsletters exist.

• The OSDL DCL and DCL member websites publish customer success stories for Linux.

• Webcasts featuring topics of interest to implementers of Linux-based solutions occur.

• Specific Linux deployment guides, hints/tips and white papers exist to illustrate particular Linux solution sets.

• Linux “best practices” for deployments of solutions on Linux are developed.

• Participation occurs in industry conferences surrounding enterprise solutions in edge, applications and database/data warehouse areas.


Performance Proofs and Benchmarks

AC-5. Priority Level 1

Description Help demonstrate that Data Center Linux solutions perform favorably as measured by widely accepted performance proofs and benchmarks. This capability encompasses both informal performance testing (performance proofs) and formal performance testing (benchmarks).

A performance proof is a relatively informal benchmark that illustrates one or more aspects of a solution. The proof might consider single components, complete stacks or subsets of stacks. A benchmark is a formal, rigorous performance test with a defined testing and publishing methodology, such as a Standard Performance Evaluation Corporation (SPEC) methodology or a Transaction Processing Performance Council (TPC) methodology.

Formal benchmarks are most important because they are widely recognized and referenced within data centers. In addition to benchmarks from organizations like SPEC and TPC, application-level benchmarks that are coordinated with independent software vendors are an important gauge for how critical applications perform in the data center.

Metrics

• Based upon standard, widely accepted benchmarks from organizations such as SPEC and TPC, Linux-based data center solutions occupy a competitive position when compared to UNIX. Linux solutions for database and application server workloads perform consistently within the top five positions of several key benchmarks (for example, TPC-C and TPC-H).

• Published scalability tests demonstrate that very large workloads perform favorably on Linux-based platforms.

• Linux data center solutions perform favorably as measured both by the quantity of data (which relates to awareness) and by the quality of data (which relates to confidence).

References Database and application server benchmarks: Transaction Processing Performance Council: http://www.tpc.org/tpch/results/tpch_perf_results.asp?resulttype=all http://www.tpc.org/tpcc/results/tpcc_perf_results.asp SPEC HPC results: http://www.spec.org/hpc2002/results/


Public Relations for Data Center Linux (DCL) Initiative

AC-6. Priority Level 2

Description OSDL and DCL member companies should generate and manage outreach programs by the following means:

• White papers and requirements/capabilities documents

• PR activities: Press releases related to DCL, trade show presence for DCL, and trade show presence for data center applications such as data warehouse, enterprise resource planning (ERP) and Java development

Metrics

• An acceptable total number of OSDL and member press releases occur each month.

• The reactions to press releases are measured, for example, the number of hits on the OSDL and DCL member-company web pages is tracked.

• OSDL and DCL member-company marketing and event plans include DCL-related items.

• The trade press discusses data center-related topics and cites press releases relating to DCL Initiative activities.

• Member company pages point to DCL initiative-related articles.

• DCL initiative and related special interest group (SIG) information is accessible on the Web. This includes the www.osdl.org site and developer.osdl.org.

References The articles section of the DCL osdl.org web page: http://www.osdl.org/lab_activities/data_center_linux/articles.html/document_view A page that links to the SIG pages on developer.osdl.org: http://groups.osdl.org/sigs


Analyst Coverage and Meetings

AC-7. Priority Level 2

Description Drive analyst coverage of DCL and Linux adoption through meetings and Linux industry events.

Metrics

• Analyst briefing materials provide favorable press for the DCL initiative and enterprise applications on Linux.

• DCL member companies work together to publicize the key DCL initiative messages within the analyst community.

• The quantity and quality of analyst coverage is good, both at trade shows and at specific analyst events.

Solution Collateral

AC-8. Priority Level 2

Description Target collateral material at particular data center-oriented solutions (such as data warehouse).

Metrics

• Customer success stories or case studies are published for the three major DCL tiers: edge, applications, and database.


Success Stories: Testimonials

AC-9. Priority Level 2

Description Gather and present end-user success stories that illustrate successful deployments on the Linux platform.

Metrics

• Success stories for key data center applications on Linux are published.

Case Studies

AC-10. Priority Level 2

Description Present case studies that examine usage scenarios of data center-oriented solutions.

Metrics

• Case studies for key data center applications on Linux are published.

Published Research and White Papers

AC-11. Priority Level 2

Description Compile detailed analysis documents for specific, targeted purposes. For example, analyze performance on the latest version of the kernel.

Metrics

• OSDL staff, DCL members and other contributors within the ecosystem publish a compendium of DCL-related white papers on osdl.org.

• A favorable quantity and quality of white papers exists. The white papers illustrate or explain solutions for database and data warehouse, applications, and edge environments.


Outreach to Executives: Conferences

AC-12. Priority Level 2

Description Ensure that OSDL and its DCL members take part in conferences targeted to an executive audience.

Metrics

• Visibility of Data Center Linux usage occurs at executive-level conferences.

References Open Source Business Conference: http://www.osbc.com

Outreach to Developers: Conferences and Discussion Groups

AC-13. Priority Level 2

Description OSDL and its DCL members should target outreach programs to application and kernel developers at conferences in order to gain acceptance of Linux as a platform for development. OSDL has formed special interest groups (SIGs) that provide an open forum for investigation into specific technical areas such as security, storage, and so on. One of the primary goals of these forums is to improve collaboration with the OSS community regarding these areas. Additionally, members should reach a broad array of developers in the community by participating in discussion lists that are targeted to a particular topic of interest.

Metrics

• There are active lists on list servers, for example, [email protected].

• High-priority data center usage topics are considered important discussion points at Linux developer conferences.

References Page that points to OSDL SIGs information on developer.osdl.org: http://groups.osdl.org/sigs


Outreach to ISV/IHV/SI: Conferences and Forums

AC-14. Priority Level 2

Description Help to gain acceptance by independent software vendors (ISV), independent hardware vendors (IHV) and system integrators (SI) for deployment and support of the Linux platform.

Metrics

• Forums for discussion of issues important to porting of applications and support for Linux are developed and well publicized.

• Forums for discussion of issues related to supporting multiple architectures on Linux are developed and well publicized.

• To increase the visibility of Linux, a community voice within the ISV, IHV and SI communities is built.

International Outreach

AC-15. Priority Level 2

Description Gain acceptance of Linux as a global platform for international deployments of enterprise applications.

Metrics

• Each of the other goals within this Awareness and Confidence section has an international component.


Open Source Business Model

AC-16. Priority Level 2

Description Present materials that explain, illustrate and show the benefits of the open source development model.

Metrics

• A clear business-level description of the open source development process as it applies to Linux is published.

• One or more white papers or articles illustrating the advantages of the open model are published. For example, a topic might be “Improved Security with Open Source Allows Immediate Remediation.”

References Linux Development Process (Graphic): http://www.osdl.org/newsroom/graphics/linux_dev_process_graphic.jpg


Global Enterprise Services and Support Services

This category describes the Services and Support needs that will make an enterprise-class deployment successful.

Global Support Infrastructure and Distribution

GES-1. Priority Level 1

Description Encourage vendor or partner organizations to staff for international sales and support of the Linux platform. Staffing includes native language, local time zone coverage and local pricing.

Metrics

• Packages can be ordered in each region.

• Distribution is localized based on OpenI18N standard. For example, local input method, font support, printing and more are localized.

• International companies with operations in multiple geographic areas can utilize one vendor to support a single Linux distribution.

References The DCL Marketing committee has identified at least five offerings that meet these metrics. This information is readily available by searching on the Web.


Performance Diagnostic Services

GES-2. Priority Level 1

Description Support the availability of the online diagnostic skills and tools necessary to provide system-level performance optimization services on multiprocessor systems. Missing tool components include the following:

• Per-task statistics are insufficient. The introduction of NPTL threading has made the situation worse, because the few per-task statistics that are maintained are reported only per process; thus there is no visibility into individual threads.

• A lightweight application program interface (API) for retrieving kernel data is required.

• Tools such as ltrace(1) need to work on all architectures. strace(1) can't follow new threads across a clone(2) call, even on i386.

• Documentation must better explain how to interpret the statistical reports produced by sar(1), iostat(1), vmstat(1) and top(1). For example, undefined units of measure such as “blocks” should be eliminated.

Metrics

• Multiprocessor kernel hooks are enabled in enterprise distributions.

• Kernel changes to support necessary improvements for diagnostic tools are accepted into the kernel.org kernel.

• A defined percentage of the required tools (top, top2, sar, iostat and more) is available.
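As an illustration of the per-thread visibility gap described above, the following is a minimal sketch (in Python, reading the standard Linux procfs; the helper name is illustrative, not part of any shipped tool) of how per-thread CPU statistics can be recovered today:

```python
import os

def per_thread_cpu(pid):
    """Collect utime/stime (in clock ticks) for each thread of a process
    from /proc/<pid>/task/<tid>/stat -- the per-thread view that the
    process-level statistics described above do not provide."""
    stats = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        with open(f"/proc/{pid}/task/{tid}/stat") as f:
            # The comm field may contain spaces, so split after the ")".
            fields = f.read().rsplit(")", 1)[1].split()
        # Relative to the full stat line, fields[11] is utime (field 14)
        # and fields[12] is stime (field 15).
        stats[int(tid)] = (int(fields[11]), int(fields[12]))
    return stats

print(per_thread_cpu(os.getpid()))
```

A lightweight kernel API for retrieving this data, as the goal requests, would avoid opening and parsing one procfs file per thread.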


Product Maintenance

GES-3. Priority Level 2

Description Encourage regular Linux software stack product updates to existing releases, for a period of years and beyond. Updates should be based on generally available (GA) releases.

Metrics

• There is a regular release cycle of fix rollups (quarterly, semi-annually or annually).

• There are at least two worldwide and/or local distributions for each product release.

References The DCL Marketing committee has identified at least three offerings that meet these metrics. This information is readily available on the Web.

Operational Support

GES-4. Priority Level 2

Description Using standard service-level agreements common to commercial-grade UNIX operating systems, encourage level one, two and three support (question-and-answer support and defect support) on a 7-day, 24-hour basis.

Metrics

• Two or more vendors offer all three levels of support in all major geographical regions.

References The DCL Marketing committee has identified at least three offerings that meet these metrics. This information is readily available on the Web.


Migration Services

GES-5. Priority Level 2

Description Encourage availability of fee-based migration services from well-known proprietary operating systems to Linux.

Metrics

• More than one vendor offers migration services for Windows, OS/2, UNIX variants and other proprietary operating systems.

• Services are available in multiple geographic areas.

References The DCL Marketing committee has identified at least three offerings that meet these metrics. This information is readily available on the Web.

Information Technology (IT) Outsourcing

GES-6. Priority Level 2

Description Nurture availability of outsourced offerings that can proficiently run data center workloads on Linux.

Metrics

• More than one vendor offers IT outsourcing.

• Outsourcing is offered in multiple regions.

• Outsourcing targeted at Linux solutions specifically exists.

References The DCL Marketing committee has identified at least two offerings that meet these metrics. This information is readily available on the Web.


Disaster Recovery Services

GES-7. Priority Level 2

Description Encourage disaster recovery-service offerings in multiple geographic regions.

Metrics

• Two or more vendors offer services and delivery on Linux across a variety of distributions and architectures.

References The DCL Marketing committee has identified at least two offerings that meet these metrics. This information is readily available on the Web.

Certified Linux Engineer

GES-8. Priority Level 2

Description Foster Linux certification for IT support staff and administrators. Certification is obtainable through examination in local languages.

Metrics:

• Twenty or more enterprises use Linux certification as a job requirement, hiring advantage or promotional opportunity.

• In order to measure this objective, utilize feedback from the OSDL Linux User Advisory Council (LUAC) or other industry survey.

References The DCL Marketing committee has identified at least two offerings that meet these metrics. This information is readily available on the Web.


System Integrators (SIs)

GES-9. Priority Level 2

Description Offer support provided by SI companies to build integrated solution stacks (hardware, operating systems, middleware and applications in a “turnkey” solution set). Encourage vendors to offer a support channel for system integrators (SIs) with Linux offerings.

Metrics

• In each major region, there are two or three integrators available per focus area.

• Two or more vendors support major system integrators.

References The DCL Marketing committee has identified at least three offerings that meet these metrics. This information is readily available on the Web.

Consulting

Business Consulting

GES-10. Priority Level 2

Description Foster business-consulting offerings that include Enterprise Linux solutions.

Metrics

• Two or more companies provide this service in multiple geographic areas.

References The DCL Marketing committee has identified at least four offerings that meet these metrics. This information is readily available on the Web.


Security Consulting

GES-11. Priority Level 2

Description Support security consulting to evaluate and implement Linux security in the enterprise.

Metrics

• There is an ability to provide a suite of security offerings similar to those provided by proprietary operating systems.

• Two or more vendors provide the suites.

• The suites are provided in all major geographical regions.

References The DCL Marketing committee has identified at least three offerings that meet these metrics. This information is readily available on the Web.


Workloads: Solution Stack Layers and Workload Enablers

Our goal within the data center ecosystem is to enable broad Linux adoption in enterprise data centers. Within this document, we use the term data center to refer to server-based systems, not the desktop platform. The Desktop Linux (DTL) initiative is addressing the desktop platform; interoperability capabilities across servers and desktops will be evaluated for later inclusion, in conjunction with the DTL initiative.

One way to help accomplish this goal is to provide for wide availability of key workloads under Linux that span all common enterprise requirements. The Data Center Linux Marketing team has defined a workload to be a customized combination of stack layers built to meet a business requirement. Stack layers are a combination of various software components layered between an operating system and an end-user software solution. Independent software vendors (ISVs) and the open source community produce solutions to provide functions within stack layers.

This section consists of two sub-sections: Solution Stack Layers and Workload Enablers. An illustration of those stack layers is described in this table:

Example of Stack Layers

• Transaction Management
• High Availability (HA)
• Storage Management
• Enterprise Application Integration (EAI)
• Media Services
• Systems and Network Management
• Security Support
• Portal
• Business Process Integration
• Data Management


Solution Stack Layers

Individual stack layers and ISV software solutions required to meet business needs are described in detail in the tables of this section.

Transaction Management

WL-1. Priority Level 1

Description Nurture transaction management solutions, which manage transactions requested by application components and allow resources to be enlisted and delisted. Transaction management solutions can conduct a two-phase commit or recovery protocol with the resource managers.

Metrics

• An open transaction management solution is available.

• There is a breadth and depth of transaction management Linux offerings as compared to UNIX.

References http://jotm.objectweb.org http://sourceforge.net/projects/tyrex
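The two-phase commit protocol mentioned above can be sketched as follows. This is a minimal illustration using toy objects, not the API of any real transaction manager such as JOTM or Tyrex:

```python
class ResourceManager:
    """Toy stand-in for a resource manager enlisted in a transaction."""
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "enlisted"

    def prepare(self):
        # Phase 1: vote on whether this resource can commit its work.
        self.state = "prepared" if self.can_commit else "vote-abort"
        return self.can_commit

    def commit(self):
        self.state = "committed"      # Phase 2: make the work durable

    def rollback(self):
        self.state = "rolled-back"    # Phase 2: undo the work

def two_phase_commit(resources):
    # Phase 1: the transaction manager asks every resource to prepare.
    votes = [rm.prepare() for rm in resources]
    if all(votes):
        for rm in resources:          # Phase 2: unanimous yes -> commit all
            rm.commit()
        return "committed"
    for rm in resources:              # any no vote -> roll everyone back
        rm.rollback()
    return "rolled-back"
```

For example, if one enlisted resource votes no during the prepare phase, every resource in the transaction is rolled back; only a unanimous yes vote leads to commit.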


High Availability (HA)

WL-2. Priority Level 1

Description Foster 24-hours-a-day, 7-days-a-week proactive protection of hardware resources, applications, data and communication paths by recognizing faults before the system goes down or before users are affected. Provide efficient, continuous access to mission-critical applications, information and services. Another approach to HA involves the use of clusters: clustered data-sharing technologies allow for the dynamic redirection of work to available servers, thus providing near-continuous application availability. Open source and proprietary HA solutions are available today.

Metrics

• A breadth of HA solutions and a level of reliability, availability and serviceability (RAS) features comparable to UNIX exist in Linux.

• HA solutions that meet mission-critical needs exist on Linux.

References The DCL Marketing committee has identified at least two solutions that meet these metrics. This information is readily available on the Web.

Storage Management

WL-3. Priority Level 1

Description Encourage the use of automated management of storage networking resources, which allows for increased efficiency through centralized control. This will enable information management such that data is accessed in a time-critical manner and in a desired format (presentation). Data is recoverable in case of system failure.

Metrics

• As compared to UNIX, the following Linux storage management tools are available: backup and recovery tools (local and remote), asset inventory, array configuration and problem identification tools.

• As compared to UNIX, Linux support for network attached storage (NAS) and storage area network (SAN) devices is equally available.


Enterprise Application Integration (EAI)

WL-4. Priority Level 1

Description Encourage the availability of a breadth of EAI components under Linux. Enterprise application integration is the process of bringing data or a function from one application together with that of another application to provide nearly real-time integration. Application integration is used in many areas, including B2B integration, customer relationship management (CRM) systems implementation, and the building of websites that leverage legacy systems.

Metrics

• As compared to UNIX, a breadth of EAI components is equally available.

Media Services

WL-5. Priority Level 1

Description Encourage services that provide quality streaming audio and video, piracy protection and enhanced commerce capabilities.

Metrics

• A breadth and depth of products is available as compared to Windows Media Services.


Systems and Network Management

WL-6. Priority Level 1

Description Encourage systems and network management under Linux. Systems management is the ability to monitor, control and report the health of the entire information technology (IT) environment, including servers, storage, clients and printers. Network management is the ability to monitor, control and report the health of the network.

Metrics

• Compared to UNIX, a breadth of systems management tools is equally available.

• Top-tier systems management tools are available on Linux.

Security Support

WL-7. Priority Level 1

Description In the broad sense, encourage security support through many venues, including authentication, authorization, single sign-on, encryption and more.

Metrics

• A breadth of security support is offered that is equal or superior to all other offerings.


Portal

WL-8. Priority Level 1

Description Encourage offerings that support a website featuring a suite of commonly used web-based services. The site should serve as a starting point and frequent gateway to the Web.

Metrics

• As compared to UNIX, a breadth of portal support services is offered, such as support for a website featuring a suite of commonly used services.

Business Process Integration

WL-9. Priority Level 1

Description Encourage offerings related to automation of strategic processes on Linux across a company’s applications, data and people.

Metrics

• As compared to UNIX, a robust breadth and depth of business process integration offerings are available.

Data Management

WL-10. Priority Level 1

Description Encourage a collection of tools that manage large structured sets of persistent data, offering a broad range of data management services, for example, ad hoc querying by multiple users.

Metrics

• As compared to UNIX, a robust breadth and depth of data management tools are widely available.


Workload Enablers

Top-Echelon (ISV-1) Software Solution Availability

WL-11. Priority Level 1

Description Encourage the furnishing and support of a set of core Linux software solutions to achieve the minimum critical mass required for major enterprises to seriously consider using Linux in their data centers. Available solutions should cover all three data center tiers: database, applications and edge. Edge tier solutions are well addressed by a variety of open source products; database and application tier solutions are the focus area for this Marketing Goal.

DCL has identified 15 top-echelon key software solutions that represent a core set required for most enterprises. For purposes of this document, we refer to them as ISV-1. The DCL Marketing committee maintains the ISV-1 list for analysis purposes only. For the database tier, the ISV-1 list includes the primary enterprise-grade commercial databases, which have been identified. For the application tier, the list includes enterprise resource planning (ERP) and customer relationship management (CRM), which are the key horizontal, enterprise-wide categories.

The middle-echelon software solutions (which we refer to as ISV-2) are the remaining solutions required to support these areas: system management and infrastructure software, storage management software, application servers and a Java Virtual Machine (JVM) to round out the required foundation.

The ISV-1 solutions do not represent all of the horizontal solutions used by large enterprise data centers. However, if these solutions are available and supported on the latest Linux releases, they will present a large enough footprint to have a significant impact toward Linux acceptance by major enterprises.

Metrics

• Top-echelon database software solution availability is considered by many to be the key metric for the maturity of Linux in the enterprise.

• At a minimum, two of the three databases listed in ISV-1 have visible and current Linux support.

• At the application tier, all categories are present, and ideally all the software solutions listed in ISV-1 are available for Linux. At a minimum, for stacks with more than one software solution on the ISV-1 list, missing only one application is acceptable in the short term.

References See WL-12 Middle-Echelon (ISV-2) Software Solution Availability (Not Desktop) for details on ISV-2 software solutions.


Middle-Echelon (ISV-2) Software Solution Availability (Not Desktop)

WL-12. Priority Level 1

Description In order for a top-echelon software solution to be usable by an enterprise, encourage sufficient representation of middle-tier supporting applications under Linux. In most cases, a major enterprise software solution is complemented by other solutions or components of the underlying solution stack; the collective becomes the software stack implemented by the enterprise. Some of these applications are vertical and unique to certain market segments, while others are horizontal and have become an integral part of most implementations.

A good example of such a stack is the storage-handling, backup, performance monitoring and other data management applications that complement a core database. Another example is the broad cottage industry of add-on software solutions that complement each of the major enterprise resource planning (ERP) systems on the market today.

These middle-echelon software solutions and middleware components are referred to as ISV-2. The DCL Marketing committee maintains the ISV-2 list for analysis purposes only. These software solutions are sometimes offered by the same independent software vendor (ISV) who offers the corresponding top-echelon software, but more typically they are available from different ISVs. The group of second-order and middleware ISVs is significantly larger than the group of core-application ISVs.

Metrics

• For each of the major software solutions identified in the DCL Top Echelon Software Solution Availability goal (WL-11), all required complementary software solutions needed to provide a complete industry solution are also available on Linux.


Software Solutions Availability for Workloads (ISV-3)

WL-13. Priority Level 1

Description Encourage availability of more top-echelon and middle-echelon software solutions needed to build all stack layers (WL-1 through WL-10) under Linux. For a given workload, for example CRM, many enterprises may have different implementations that involve software solutions not listed in ISV-1 or ISV-2. For example, an enterprise might use a customer relationship management (CRM) solution other than the best-of-breed product on Linux, and the customer typically will not accept switching software solutions as part of the enterprise's migration to Linux. In addition, the software listed in ISV-1 does not include all common workloads, so this category captures those that are missing.

Metrics

• The DCL Marketing committee will develop the ISV-3 list based upon future industry analysis.

• More than one independent software vendor (ISV) provides software solutions on Linux for each of the 10 most common workloads associated with that industry analysis.


Linux Standards Base (LSB) Support

WL-14. Priority Level 1

Description Encourage independent software vendors (ISVs) to certify their products against the Linux Standards Base (LSB). The LSB represents the only set of standards targeted to promote compatibility among Linux distributions and to enable software to run on any compliant system. The LSB specifies a detailed certification process for both Linux software packages and Linux distributions. Products fall into the following two classes: software packages and runtime environments, the latter consisting of a Linux distribution or hosted environment.

LSB software package certification is a self-test process by which the ISV demonstrates that the software executes correctly on the LSB Sample Implementation and two different LSB runtime environments. The steps for testing are followed exactly as defined by the LSB certification program. For the purposes of Data Center Linux, the two runtime environments must be two enterprise Linux distributions that hold current LSB certification.

Following initial certification, the ISV must maintain certification currency for its software packages. This requires re-certification after software changes and after expiry of the initial certification period as defined by the LSB.

Metrics

• Fifty percent of Linux enterprise software (part of ISV-1, ISV-2 or ISV-3) is certified to the most current LSB standard.

• At any point in time, 80% of previously certified software in a representative sample is found to be current in its certification.

References The LSB website: www.linuxbase.org Certification information: http://www.freestandards.org/certify/


Multiple Architecture Support

WL-15. Priority Level 1

Description In order to accelerate the adoption of Linux in the data center, encourage Linux independent software vendors (ISVs) to support as many architectures as possible. The Linux stack runs on a variety of common processor architectures. Examples of common architectures include (but are not limited to) the following:

• x86

• Itanium

• Intel EM64T

• AMD64

• Power architecture

• Mainframe architectures

A software solution may run under Linux on one or more of these architectures. It is typically insufficient for an ISV to state simply that its software runs on Linux without specifying the architectures for which the software is available and supported. The number of architectures supported varies widely from solution to solution. As enterprises migrate to Linux, they are unlikely to change their hardware platforms simply to accommodate one or more software offerings.

Metrics

• ISVs specify which hardware architecture(s) are supported by each of their software products.

• At least 75% of Linux-based software products support more than one architecture.

• At least 30% of Linux-based software products support three or more architectures.

• ISVs certify their Linux software on more than one hardware architecture.


Multiple Distribution Support

WL-16. Priority Level 1

Description Nurture availability and support of complete software solution stacks on multiple distributions. A number of enterprise-class server distributions have evolved to address the data center requirements of scalability, robustness, consistency and a high level of support. Some of these distributions target global coverage, while others are focused regionally. It is desirable to have software supported on multiple distributions; currently, however, such support is limited. Many software offerings are supported on only one enterprise distribution, and very few support more than one. This makes it difficult for an enterprise to assemble a complete solution stack that covers all its needs on its distribution and version of choice.

Metrics

• Every ISV-1 software solution, together with one complete instance of a stack associated with that solution, is available on at least two major global distributions and, in each major geographical region, on one regional distribution.

• Eighty percent of Linux software offerings support more than one distribution.

• All Linux based software offerings indicate specifically which version numbers of enterprise distributions they support.


Open Source Software Availability

WL-17. Priority Level 1

Description Encourage open source vendors to advance their software offerings to levels of availability deemed acceptable to enterprise management. One of the benefits of adopting Linux is ready access to a wide variety of open source software and solutions. For an enterprise to take full advantage of these open source solutions, the applications must be enterprise-ready. Some open source solutions, such as the Apache web server, have achieved the levels of reliability, scalability and stability required by enterprise data centers, and they enjoy broad usage across market segments. Many other solutions have yet to evolve to a level acceptable to a broad cross section of enterprises. There are also emerging areas where open source solutions are needed; one example is the need for an open transaction-processing framework.

Metrics

• At least 50% of common open source solutions are enterprise-ready on Linux.

• Open Source projects initiated to address new areas where open source solutions are not commonly available.


DCL Definition of Independent Software Vendor (ISV) Support for Linux

WL-18. Priority Level 1

Description Encourage every ISV that provides software supporting Linux to communicate publicly its support matrix and its support policy. One important difference between enterprise users and the general Linux community is that enterprise management typically demands a high standard of support from the ISVs providing software for the data center. To satisfy the needs of the enterprise user, at a minimum, ISVs must specify exactly which environments their software solutions support. Ideally this takes the form of a matrix delineating the ISV's software versions, supported distribution revisions and processor architectures. ISVs must also publicly communicate their support policies, including the following: what support means to the ISV's customer, available methods for submitting support requests, response times honored by the ISV, how fixes for discovered issues (patches or otherwise) are handled and released, the support period, and the extent of support provided for older versions of the software. An ISV may elect to offer a tiered approach to Linux support, providing multiple levels of support with different policies and fee structures. In these cases, the support policy must be clearly articulated for each level.

Metrics

• Every ISV whose software supports Linux clearly and publicly documents a support matrix and a support policy.
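As a sketch of what a machine-readable support matrix might look like, the following Python fragment models a matrix keyed by (software version, distribution revision, processor architecture). The product name, distribution names and support levels are invented for illustration only; this is not a format defined by DCL.

```python
# Hypothetical support matrix for an illustrative product "ExampleApp".
# Keys are (software version, distribution revision, architecture);
# values are the ISV's declared support level for that combination.
SUPPORT_MATRIX = {
    ("ExampleApp 3.1", "Enterprise Distro A 4", "x86"): "supported",
    ("ExampleApp 3.1", "Enterprise Distro A 4", "AMD64"): "supported",
    ("ExampleApp 3.1", "Enterprise Distro B 9", "x86"): "supported",
    ("ExampleApp 3.0", "Enterprise Distro A 3", "x86"): "extended support only",
}

def support_status(version, distribution, architecture, matrix=SUPPORT_MATRIX):
    """Look up the declared support level for one combination;
    anything absent from the matrix is reported as unsupported."""
    return matrix.get((version, distribution, architecture), "unsupported")
```

A data center manager could then answer "is this exact combination supported?" with a single lookup, rather than interpreting a vague "runs on Linux" claim.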


Heterogeneous Systems Interoperability

WL-19. Priority Level 1

Description Linux within the data center is commonly deployed in a heterogeneous operating system environment. In order to gain market acceptance in the enterprise, Linux should interoperate with the other key platforms that are deployed. ISVs that provide applications to the data center should incorporate interoperability into their support for the Linux platform. In addition, some ISV offerings should target specific solutions that fill interoperability requirements for Linux. One example of this is Samba, which provides interoperability between Microsoft Windows and Linux.

Metrics

• Sufficient ISV solutions for Linux exist to enable basic interoperability among the key platforms in the data center.

References http://www.samba.org


Technical Training and Education

This category addresses training for all technical users, such as system administrators and developers. End-user training is not in scope for Data Center Linux.

Developer Training

TTE-1. Priority Level 1

Description Support training for company-specific software and application developer programs.

Metrics

• Courses for application developers on Linux are offered at a targeted number and frequency.

• Attendance at these programs is sufficient.

Administrator Training

TTE-2. Priority Level 1

Description Encourage certification-oriented Linux administrator training.

Metrics:

• Linux certifications exist.

• Enterprise class companies consider Linux certifications a valuable tool for evaluating staff.


Internal Company Training

TTE-3. Priority Level 2

Description Foster outsourced or developed-in-house Linux training, hosted at business locations.

Metrics:

• As compared to other server platforms, Linux server training seat capacity is favorable.

• The quantity of Linux training programs is sufficient for enterprises that want to deploy the Linux stack.

• An acceptable number of Fortune 500 companies host courses for the Linux platform.

• Appropriate geographic coverage for training is in place.

Private Training Companies

TTE-4. Priority Level 2

Description Assure ready availability of privately provided training courses, both for certification and non-certification.

Metrics

• As compared to other server platforms, the defined Linux training seat capacity has been achieved.

• Private training companies provide Linux and Linux-stack course offerings.


Colleges and Universities

TTE-5. Priority Level 2

Description Nurture courses at private and public colleges and universities on Linux and Linux-related topics. Include degree and non-degree programs at technical colleges.

Metrics

• A DCL defined number of institutions offer Linux courses.

• A favorable percentage of courses address Linux as compared to other operating systems.

• A specified number of top-tier worldwide universities use Linux rather than Windows.

• The top five universities offer Linux courses as part of engineering degree programs.

• Executive education programs offer Linux as a topic.


Development Community

This category encompasses requirements that enable development under Linux, through proliferation of development, diagnostic and migration tools for multiple distributions and through developer programs.

Common Application Development Environment (ADE) and Integrated Development Environment (IDE) Tools

DC-1. Priority Level 1

Description Nurture a collection of development environments. Ideally, each environment will support multiple architectures, and the collection will cover all Linux architectures. A variety of robust and well-supported development environments is required for accelerating the availability of applications under Linux. Application developers have come to expect the ready availability of well-integrated tools that simplify and accelerate their development and testing efforts. This applies both to application development by third-party independent software vendors (ISVs) and to the myriad custom applications developed in-house or commissioned by the enterprise.

ISVs developing applications for the enterprise require integrated development environments (IDEs) that support all the major languages used within the enterprise, including C, C++, and Java. Each IDE must include all the basic development tools, such as an editor, integrated compilers, project build management, an interactive debugger, a code browser, and an integrated revision control system. Availability of advanced IDEs that include optional tools such as performance profilers, runtime error checking and code coverage analysis is also highly valuable to ISVs. A number of these tools are discussed in more detail as separate requirements below.

Application developers within the enterprise will benefit more from a higher-level application development environment (ADE) that simplifies targeted development such as web-based applications or transaction processing applications. ADEs with full support for web applications, transaction management and connectivity will increase the attractiveness of Linux to the enterprise information technology (IT) organization. The ADEs should support common web development architectures such as Jakarta or Struts.

Ideally, a complete IDE or ADE is cross-platform and supports all Linux hardware architectures. Given the wide range of architectures that run Linux, however, it is unlikely that a single development environment will support all of them.

Metrics

• Linux support exists for the most common enterprise class IDEs and ADEs that are available on the market for other UNIX operating systems.

• Development environments completely support multiple processor architectures and all common development languages.

• Linux support exists for the most common IDE tools.

• At least one common ADE is available for every Linux processor architecture.


Process Health Check Tool

DC-2. Priority Level 1

Description Foster availability of tools that can run a process health check. As part of the health check, thresholds can be set to determine whether critical processes on a system are healthy. When a problem is encountered, the system administrator can be notified of the process abort and the reason for it. Notification can occur via any of a number of mechanisms, including email or paging. The administrator can specify which processes or classes of processes are monitored and which process-abort reasons trigger notification.

Metrics

• More than one comprehensive process health check tool is available for Linux.

• An open source health check tool is available on Linux.
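The core of such a health check can be sketched in a few lines. The Python fragment below is purely illustrative: the process names are hypothetical, the `notify` callback stands in for the email or paging mechanisms described above, and a real tool would gather the set of running processes from /proc or `ps` rather than take it as an argument.

```python
# Illustrative process health check sketch; names and the notify
# callback are hypothetical stand-ins for a real monitoring tool.

CRITICAL = {"db-server", "app-server"}  # processes the administrator chose to monitor

def check_processes(running, critical=CRITICAL, notify=print):
    """Return the critical processes missing from `running`, calling
    `notify` (a stand-in for email or paging) once per missing process.
    """
    missing = sorted(critical - set(running))
    for name in missing:
        notify(f"ALERT: critical process '{name}' is not running")
    return missing
```

For example, `check_processes(["db-server", "cron"])` would raise an alert for the absent "app-server" process.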

Operating System Migration Paths (Two-Distribution Minimum)

DC-3. Priority Level 2

Description Encourage Linux distributors and Linux platform providers to furnish automated operating system migration tools, quality documentation, and training and support programs, particularly aids for migrating applications from other UNIX operating systems. Enable migration for multiple Linux distributions. Help ease the burden of migration by guiding new Linux developers around its pitfalls. Over time, more and more new applications will be developed specifically for Linux; today, however, most enterprise applications considered for Linux support already exist under some other flavor of UNIX or under Microsoft Windows. The cost of application migration is one of the inhibitors slowing the adoption of Linux.

Metrics

• Gap analysis tools are available for migration from all primary UNIX operating systems.

• Training and migration support programs are available from all major distributions and platform providers; formal programs are available from at least two distributions.

• Migration guides are available that help developers identify and address major differences between the various popular UNIX implementations and Linux.

• White papers comparing Linux distributions are available.


Developer Conferences

DC-4. Priority Level 2

Description Extend coverage of Linux topics at developer conferences around the world. Technical conferences dedicated specifically to Linux application developers are very limited in number, especially when one considers the global landscape. However, there is a growing number of general conferences targeted at the Linux community. These conferences present an opportunity to provide coverage for Data Center Linux through technical tracks that target application developers by addressing challenges of migration to Linux, presenting highlights of the latest releases of the kernel and key packages, discussing case studies of data center implementations, and the like.

Metrics

• A sufficient number of conferences occur at regular frequency across all major geographies.

• A significant number of tutorials and technical sessions targeted at data center developers are offered at conferences.

• Attendance at tutorials and technical sessions is above the average for all sessions in the conference.

• A sufficient number of papers addressing developer issues are published.


Complete Tool Set

Analysis Tools

DC-5. Priority Level 2

Description Foster a range of user-friendly software analysis tools under Linux for enterprise application developers. The range must include tools both for high-level application developers and for developers of system-level code, such as drivers. Examples include tools that statically check source code for coding errors and security vulnerabilities, such as lint, clint, splint, and Jlint, which together cover C, C++ and Java. Other examples include memory analysis tools for both static code analysis and runtime memory management, and thread analysis tools that identify data race conditions and missed synchronization in multi-threaded applications.

Metrics

• A sufficient breadth of static and dynamic analysis tools is available for Linux.


Debuggers

DC-6. Priority Level 2

Description Encourage availability of a breadth of kernel-level debuggers and application/user-level debuggers for all core programming languages and all Linux platforms. It is desirable to have integrated debuggers that handle both kernel-mode and user-mode code. These debuggers should support all standard debug features, including viewing source code, setting breakpoints, viewing variables (including complex objects), stack traces, thread awareness and remote debugging of kernel code. Other required features include support for a graphical user interface and scripting.

At a minimum, debuggers must cover source-level debugging of C, C++, and Java. The ability to debug assembly code for the various platforms is also important. A debugger that supports multiple Linux platforms is a useful addition that gives developers of multi-platform applications the luxury of using a common critical tool as they port to each platform. Cross-platform debuggers allow developers of embedded applications to download the application to the embedded device and debug it remotely, typically from a workstation of a different architecture. These cross-platform debuggers, though important for embedded applications, are not critical for data center support.

Metrics

• A breadth of debugging tools is available that supports all standard debug features, kernel and user mode debug features, all core programming languages and all Linux platforms.


Trace and Profiling Tools

DC-7. Priority Level 2

Description Nurture a range of trace and profiling tools that complements basic debuggers for all Linux platforms and major distributions. Though trace and profiling capabilities are sometimes built into debuggers, the more sophisticated tools tend to be separate offerings that are available either standalone or integrated into IDEs. Trace tools outside of the debugger also tend to be less intrusive on application execution.

Trace tools that trace through all API calls, both user-level and kernel, provide developers invaluable insight into how their code interacts with the operating system and all the other libraries and packages that the code calls upon. A complete trace tool also traces and displays kernel events.

In addition to call tracing, a variety of profiling tools that are commonly available for other operating systems must be supported on Linux. These include memory profilers that help developers analyze and reduce memory usage (when necessary), find memory leaks and speed up execution through efficient memory usage. Profiling tools also include cache and heap profilers. Cache profilers typically perform cache simulation to pinpoint accurately the sources of cache misses in the code. Heap profilers trace heap usage over time. Finally, code coverage profilers help developers substantially improve their test processes and product quality.

Metrics

• A range of trace and profiling tools that complement basic debuggers is available for all Linux platforms and major distributions.
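The kind of per-function profiling feedback described above can be illustrated with Python's built-in cProfile module. This is only an analogy to the standalone Linux trace and profiling tools discussed here, and `busy_loop` is an invented workload used solely to generate something to profile.

```python
import cProfile
import io
import pstats

def busy_loop(n):
    """A deliberately compute-heavy function to give the profiler something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total

# Collect call counts and timings while the workload runs.
profiler = cProfile.Profile()
profiler.enable()
busy_loop(100_000)
profiler.disable()

# Render the top five functions by cumulative time into a text report.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
```

The resulting report lists each function with its call count and cumulative time, which is exactly the information a developer uses to locate hot spots.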


Performance Monitors and Gauges

DC-8. Priority Level 2

Description In order to improve overall application and system performance for Linux-based enterprise systems, encourage availability of performance tools and gauges under Linux for application developers, architects and maintainers of enterprise systems. Through the collection of runtime execution and resource statistics, performance monitors and counters help users identify performance bottlenecks in application code and in their usage of kernel facilities and I/O resources (disk, network, etc.). A variety of performance tools and gauges exists with varying degrees of system intrusiveness and quality of feedback. The performance monitoring tools need to support a variety of system architectures, such as distributed, clustered, NUMA, MPP and SMP systems.

Metrics

• A substantial portion of common UNIX-based performance monitoring tools is available for Linux, and those tools support the latest two production releases of the Linux kernel.
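As a minimal illustration of collecting the resource statistics mentioned above, the sketch below parses /proc/meminfo-style output in Python. The sample text is canned so the fragment stays self-contained; a real monitor would read the live /proc files (e.g. `open("/proc/meminfo")`) and sample them periodically.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:   value kB' lines into a dict of integer kB values."""
    stats = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            stats[key.strip()] = int(fields[0])
    return stats

# Canned sample standing in for the live contents of /proc/meminfo.
sample = "MemTotal:       16384 kB\nMemFree:         4096 kB"
mem = parse_meminfo(sample)
free_percent = 100 * mem["MemFree"] / mem["MemTotal"]
```

A gauge built this way could flag a bottleneck when, say, `free_percent` stays below an administrator-chosen threshold.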

Log Monitoring Tool

DC-9. Priority Level 2

Description Encourage the supply of an automated tool that monitors system logs produced by the kernel and application-specific logs. When abnormalities are detected in a log, the system administrator is notified via any one of a number of notification mechanisms, including email or paging. The administrator has the ability to specify which logs are monitored and which situations trigger notification.

Metrics

• At least one comprehensive log-monitoring tool is available for Linux.
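The matching-and-notifying core of such a log monitor can be sketched briefly. In the Python fragment below, the alert patterns are hypothetical examples, and `notify` again stands in for the email or paging mechanisms described above; a real tool would follow live log files and make both the watched logs and the trigger patterns configurable.

```python
import re

# Hypothetical alert patterns; a real tool would let the administrator
# configure which logs are watched and which patterns trigger notification.
ALERT_PATTERNS = [re.compile(p, re.IGNORECASE)
                  for p in (r"\boops\b", r"\bpanic\b", r"segfault")]

def scan_log(lines, patterns=ALERT_PATTERNS, notify=print):
    """Return the log lines matching an alert pattern, calling `notify`
    (a stand-in for email or paging) once per matching line."""
    hits = []
    for line in lines:
        if any(p.search(line) for p in patterns):
            notify(f"ALERT: {line.strip()}")
            hits.append(line)
    return hits
```

For example, a kernel "Oops" line in the scanned input would produce one alert, while routine messages pass through silently.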


Total Cost of Ownership

Within this document, the term “total cost of ownership” is used in the broadest sense. The category addresses all aspects of the costs and benefits associated with Linux platform ownership. In future versions of this document, this section will be developed further. This initial description is required to illustrate that the perceived and actual costs of Linux solution ownership are a market driver. Five major areas have been identified for study: Acquisition Cost, Deployment Cost, Maintenance Cost, Support Cost and Operation Cost.

Acquisition Cost

TCO-1. Priority Level 1

Description These costs are generally associated with the initial acquisition of the Linux solution:

• Hardware purchase cost

• Software purchase cost

• Clustering—pricing needs to reflect the entire cluster, not just the components.

• Multi-core and multiprocessor licensing considerations

• Consulting services cost

• Initial installation cost related to machine room space, air conditioning, power supply, backup power, and so on

Metrics

• ISV-1, ISV-2, and ISV-3 software providers and Linux distributors provide cluster-specific pricing for their products (See Workloads section WL-11, WL-12, and WL-13 for definitions of ISV-1, 2 and 3).

• ISV-1, ISV-2, and ISV-3 software providers and Linux distributors provide multi-processor and multi-core based pricing for their products.

• Linux runs on the most price-competitive hardware architectures.


Deployment Cost

TCO-2. Priority Level 1

Description These types of costs are generally associated with deployment:

• System Installation

• Operating system and application deployment

• Development costs – custom applications and extensions to acquired software

• Application integration

• User training

Metrics: Costs are competitive with offerings for other platforms.

Maintenance Cost

TCO-3. Priority Level 1

Description These types of costs are generally associated with the maintenance of the data center solution:

• Licensing/Subscription Costs

• Clustering—licensing schemes need to reflect the entire cluster, and not just the components.

• Upgrade costs—the total costs associated with upgrading to a new version of the software

• Hardware maintenance costs

• Migration costs—costs associated with migrating platforms, for example, migration from legacy operating systems to Linux, custom application migration costs, and database migration costs

Metrics:

• ISV-1, ISV-2, and ISV-3 software providers and Linux distributors provide cluster-specific license options.

• Maintenance costs are competitive with offerings for other software platforms.


Support Cost

TCO-4. Priority Level 1

Description: These types of costs are generally associated with ongoing usage of the Linux solution:

• System Monitoring—monitoring the network, operating system and so on

• End-User Support Cost—for example, Enterprise Support Center costs

• Education Costs—for the end user and the internal IT department

Metrics: Support costs are competitive with offerings for other platforms.

Operational Cost

TCO-5. Priority Level 1

Description: These types of costs are generally associated with ongoing usage of the Linux solution:

• Backup Costs—for example, data backup

• Problem Determination Costs—for example, problem troubleshooting, reporting, and so on

• Outages Costs—for example, loss of productivity, profit and so on, due to outages associated with running the solution

• Security measures costs—for example, security monitoring, prevention and so on

• Facilities—for example, the machine room, printer areas, space, air conditioning, power, backup power, and so forth

• Server power consumption

Metrics: Operational costs are competitive with offerings for other platforms.


Stability

In order for enterprises to seriously consider deploying Linux for the long term in their data centers, the Linux operating system, stacks and applications must meet a high level of stability requirements. Stability is a key selection criterion for most enterprises, and it is especially critical for larger corporations. This category addresses those requirements.

Kernel Stability

STBL-1. Priority Level 1

Description Since the Linux kernel is a driver for the pace at which surrounding stacks, libraries and other components evolve, its stability is fundamental to the stability of the Linux environment as a whole. The slower the revision and release rate of production kernels, the less disruption is caused within the Linux ecosystem, especially to ISVs and Linux distributions. Other factors critical to the stability of the kernel are related to this goal, such as regression testing and integration testing. In addition, when production kernel revisions are released, one of the most critical threats to software package stability is changes in kernel APIs. Any API change that forfeits backwards-compatibility has a ripple effect across a potentially huge number of packages and applications. Therefore, changes in kernel APIs should be minimized to meet the stability requirement for enterprise deployment.

Metrics

• The frequency of distribution releases (based upon kernel releases) is reduced to match enterprise expectations.

• Minor revisions of the kernel maintain backwards-compatibility for all APIs.

Software Stack Stability

STBL-2. Priority Level 1

Description In addition to the kernel, the complete software stack that ultimately depends on the kernel should meet the demanding stability requirements of the enterprise. Software solutions that run on the Linux platform must be reliable in the enterprise context. Anticipating, detecting and diagnosing failures within the stack are important. For example, it would be helpful to detect hardware failures so that corrective measures could be taken.


Stack Documentation and Testing

STBL-3. Priority Level 1

Description A common problem in complex systems is that integration and/or re-integration of adequately tested updated components still results in failures. The failures are caused by inadequate integration and regression testing of the new components in all variants of the integrated system. Sometimes, when different feature sets are developed by separate communities, unanticipated problems arise when they are used together. Software stacks within the Linux environment are no exception. Another common cause of instability in the stack is the introduction of a new software version into a hitherto stable system that results in failures; the failures then require one or more additional package revisions and interim patches to correct.

Making software updates available for testing early in their validation cycles mitigates this stability concern. The community should test new functionality and/or updates across a wide range of stacks and application workloads. In addition, package providers should describe in detail, in early documentation, the changes in their releases and the potential impact on package users. This documentation helps data center managers, application ISVs and other providers of inter-dependent stack components proactively address the impact of changes and minimize disruption to production system stability.

Metrics

• Prior to releasing revised components, providers of open source packages and other middleware make and maintain publicly available matrices that address the testing of their revisions with stack configurations. In addition, when new releases of the Linux stack are available, the complete stack is re-tested.

• Eighty percent of open source package releases include the availability of early release notes that comprehensively describe the impact of the changes in the releases.


Package Revisions

STBL-4. Priority Level 2

Description In addition to the kernel, a typical Linux operating environment contains several hundred open source packages. The stability of these packages is a major contributor to the overall stability of Linux based systems. Changes in a package may force changes in applications using that package. The frequent release of application revisions results in loss of stability within the data center. Though there are several thousand such packages available today, only a small subset is used pervasively in data center applications. OSDL has identified a set of important base packages most commonly used across system implementations. This set is collectively referred to as the OSDL Working Set. We identify this set of packages in order to monitor package evolution and consistency across revisions, through regression testing. The stability of these core packages contributes heavily to the overall stability of Linux.

Metrics

• A binary regression test suite does not uncover any backwards-compatibility issues across minor revisions of the packages included in a software stack.

References Attempts to create binary testing are underway. See the Binary SIG:


Application and Distribution Support

STBL-5. Priority Level 2

Description Due to the cost and disruption associated with software changes, managers of enterprise data centers strive to keep a particular version of a software package or operating system in operation for several years. The providers of software competitive to Linux have historically satisfied this enterprise requirement by offering active support for extended periods of time. In order for Linux to be considered seriously in the data centers of major enterprises, Linux distributions and core application ISVs must at a minimum offer the same level of extended support to which data center managers are accustomed.

Metrics

• Global and major regional enterprise Linux distributors commit to supporting each major Linux revision for a minimum of seven years from release date.

• The major revisions of all top tier and at least 50% of middle tier applications (see Workloads section) are supported for a minimum of seven years from release date.

Linux Stack Roadmap

STBL-6. Priority Level 2

Description The demands for stability within the enterprise require IT management to have a long planning horizon that often extends up to three years out. This requires data center software suppliers to provide product and feature release roadmaps with a similar planning horizon. In addition, data center managers expect suppliers to publish firm commitments for deliverables over a 12-month period, at a minimum, so that managers can plan their short-term budgets and schedules. Linux enterprise distributors are stepping up to the enterprise requirements, and many of them are publishing roadmaps. However, these are typically planning roadmaps only, and they do not reflect guaranteed feature commitments over the stated timeframes. One factor that complicates a distributor's ability to provide these commitments is the lack of committed roadmaps from the upstream kernel and package developer communities.

Metrics

• An adequate metric needs to be developed for this goal.


Technical Capabilities

Technical Overview

While updating the Technical Capabilities to create this version of the document, the DCL Technical Committee took into account the following:

• The Linux 2.6.15 release's impact on the maturity levels of the Technical Capabilities

• The DCL Marketing Committee's analysis of the priority levels of the Marketing Goals

The Technical Capabilities and Marketing Goals are tied together, as illustrated by the Marketing Goals That Influenced Priority One Technical Capabilities table below. The table illustrates which Marketing Goals influenced each Priority One Technical Capability. As before, Technical Capability maturity levels are measured for each of three tiers:

• The Edge Tier, which includes edge and infrastructure servers
• The Application Tier
• The Database and Content Tiers

Description of Technical Categories Each capability is described in table format, and capabilities are organized within categories. The categories are described below.

Scalability Capabilities in the Scalability category support horizontal and vertical scaling of data center servers such that the addition of hardware resources results in acceptable increases in capacity. Certain minimum capacity limits should be met in the areas of CPU, I/O, memory and networking.

Performance Capabilities in the Performance category support performance levels expected in data center environments. We measure performance with a workload focus. In particular, we’ve chosen workloads for which recognized industry-standard benchmarks exist, so performance can be measured.

To judge maturity, we use actual benchmark results when they exist. If benchmark results do not exist, we use our knowledge of existing open or commercial-solution performance.


RAS (Reliability, Availability, Serviceability) Capabilities in this category support greater system and application availability. These provide features that enhance software component robustness or support hardware failure recovery. The RAS category includes serviceability components that would typically be handled or directed by an outside service organization (as opposed to the IT staff). RAS encompasses tools or features required to prevent, locate, circumvent and recover from situations that aren’t normal or desired for a customer. It also includes the typical tools and features needed for initial installation and major updates. Service actions on behalf of the customer are either passive (automated) or active (involving human interaction). RAS capabilities that require active service actions can have a usability descriptor in their tables. The descriptors include criteria that define the solution’s usability.

Manageability Capabilities in the Manageability category address day-to-day operation of a system. For this analysis, we focus on administrators, not end-users or vendor service-personnel. These capabilities manage activities, either passive (automated) or active (involving human interaction). Manageability capabilities that require active service actions can have a usability descriptor in their tables. The descriptors include criteria that define the solution’s usability.

Virtualization Capabilities in the Virtualization category identify the customer-visible capabilities important to characterizing a virtualization implementation. These capabilities focus on Linux as a guest operating system. Completeness for this category means that two or more solutions on Linux have the stated capability. If only one solution exists, the maturity for a capability is assigned “Available.” Virtualization capabilities that have a customer-visible implementation detail affecting usability will have a usability descriptor in their tables.

Clusters Capabilities in the Clusters category support the use of multiple-server systems to provide the following features: (1) higher levels of service availability through redundant resources and recovery capabilities, and (2) a horizontally-scaled environment supporting increased throughput. Components needed specifically for Clustering that might otherwise logically appear under another category will be found in the Clusters category. For example, while cluster-related administration capabilities are related to the Manageability category, you will find them in the Clusters category of this document. Clustering is a very important solution for those who need it. From an organizational standpoint, clustering capabilities are easier to track if components needed for clustering are found in one place.


Standards Capabilities in the Standards category reference specifications controlled outside Data Center Linux working groups. This category includes only capabilities with standards related to adopting Linux in data centers.

Security Capabilities in the Security category provide mechanisms for Data Center systems to help protect confidential data and to help ensure high availability and reliability. Security mechanisms reduce downtime caused by security issues. Demands for more secure systems are driven by a diversity of end-users and increased vulnerability due to enterprise system access to the Internet. Security mechanisms are designed to take into account both potential internal and external attacks. Systems also need to be designed to minimize damage should an attack succeed. Completeness of capabilities identified in this section does not guarantee a secure system; security experts use these capabilities as part of the infrastructure needed to build a secure system. These capabilities were reviewed by the OSDL Security Special Interest Group and are based on the server descriptions found in the use cases they provided (see http://developer.osdl.org/dev/usecases/security.shtml). The DCL analysis of security doesn’t include special government agency needs. Capabilities that have implementation details that affect usability have a usability descriptor in their tables.

Usability Capabilities in the Usability category represent the usability of tools, utilities and services that a system administrator uses for servicing or managing in a non-passive way. Capabilities listed in the Usability category are unique to Usability. Usability information is also listed in the usability descriptors of certain RAS, Manageability, Virtualization, and Security capabilities. A solution’s usability is judged based on many factors, such as (1) its ability to be learned quickly and in-depth, and to be remembered by administrators, (2) its ability to avoid and easily correct errors, (3) how well integrated it is with complementary features, and (4) how pleasant it is to use.


Priority One Technical Capabilities Priority One capabilities are most important for Data Center Linux readiness. They are in tables with orange headers. The following table illustrates the categories of Marketing Goals that have influenced the choice of each Priority One Technical Capability.

Marketing Goals That Influenced Priority One Technical Capabilities

Priority One Technical Capabilities | Linux Awareness & Confidence | Global Enterprise Services & Support | Workloads | Technical Training & Education | Development Community | Total Cost of Ownership | Stability

Scalability

CPUs—16 Way √ √

Disk I/O Connectivity—4096 Storage Devices √ √

Disk I/O—Async I/O—File System √ √

Memory—64GB √ √

Non-Uniform Memory Access (NUMA) APIs √ √

Non-Uniform Memory Access (NUMA) Topology √ √

Symmetric Multi-Threading (e.g. Hyperthreading) √ √


Performance

Network File System (NFS) V2/V3 Performance Server/Client √ √

Network File System (NFS) V4 Performance and Functionality √ √

Java Performance √

File System Performance √

Port Quality √

Application Performance √ √

Reliability, Availability and Serviceability (RAS)

Crash Dump √

Update Notification—Data Corruption √

Debugger—Kernel √ √

Dynamic Tracer √ √

Hardware Fault Prediction and Fault Location √ √ √


Software Fault Location Identification √ √ √

Live Snapshot—Kernel Level √

Performance Monitoring √ √ √

Hot Swap: I/O Bus Level—PCI, PCI-X, cPCI √ √ √

Hot Swap: I/O Bus Level—SCSI √ √ √

Hot Swap: I/O Bus Level—PCI Express √ √ √

Hot Swap: Component Level—Memory Remove √ √ √

Hot Swap: Component Level—Memory Add √ √ √

Hot Swap: Component Level—CPU √ √ √

Hot Swap: Component Level—Node √ √ √


Component Notification: Mem/IO/Power Failure, Temperature √

Fast System Boot √ √

Reliable File System Writes √ √

Multipath I/O √ √ √

Shared Memory & IPC Parameter Changes without Reboot √

Manageability

Common Interface for Third Party Integration to Install Tools √

Software Package Management √ √

Configuration Management (Expanded to Full Stack) √

Volume Management √ √

Persistent Storage Device Naming √

Remote Management √ √ √


Log Monitoring/Event Notification/Agents √ √ √

Workload Management √

Enhanced Process and Resource Monitoring √ √ √

Virtualization

Run Application Software Unmodified √ √ √ √

Application Separation √ √ √ √

Clusters

Cluster-Wide Persistent Storage Device Naming √ √

Cluster Volume Management √ √

Cluster File System √ √

Membership √

Load Balancing—Resource Based √


Single System Image—File System View √

Standards

Linux Standard Base (LSB) 2.0 Compliance √ √

Linux Standard Base (LSB) 3.0 Compliance √ √

Security

User Stack Overflow Protection √ √

User and System Stack Not Executable √ √

Linux Security Module (LSM) Support √ √ √ √

System Integrity Check √ √

Static Analysis Tools √ √

Run-time Analysis Tools √ √

Fast Security Fix Process √ √


Usability

Third Party Software Integration √ √


Guide to Technical Capability Table Entries Each table within the category sections describes a unique technical capability. The next section describes the format of the tables, followed by an explanation of the maturity levels given in the tables. Then the capabilities themselves are listed by category.

General Table Format Each capability is described in a unique table, and within each table are the following descriptors:

• Capability Identifier (ID) — CC.XXXX, where CC is an abbreviation of the capability’s category: SC Scalability, P Performance, R RAS, M Management, V Virtualization, ST Standards, SE Security, C Clusters, U Usability.

• Capability Name — A short descriptor.

• Priority Level of the Capability — Priority Ones are the most important for Linux data center readiness. Priority Twos are listed to stimulate thought and discussion.

• Category of the Capability — See the technical category descriptions.

• Maturity Levels of the Capability’s Three Tiers — The three tiers are Edge, Application, and Database/Content. Maturity levels are described in another table that follows this one.

Description Each capability’s description includes the rationale behind and an explanation of its features, its applications and its priority level.

Usability Some capabilities organized under categories other than Usability include a usability descriptor. In these cases, usability applies to the non-passive activities needed to take advantage of those features. See the Usability category description for how usability is judged.

References These are usually links, but they can be descriptions. The following are referenced for each capability, as they apply: status, active projects / proof of concept (POC), applicable standards, relevant article links, and POC dependencies. We strive to provide URLs as references to projects or implementations that might or might not fully meet the capabilities described. The references do not represent an endorsement of any particular implementation by the Data Center Linux working group.


Maturity Level Definitions Each capability is assigned a maturity level for each of three tiers: Edge, Application and Database/Content. If the maturity is the same for all three tiers, only one maturity level is listed.

Maturity Level Range of Completion Description

Investigation 0%-9% The project is in the concept phase.

Development 10%-39% The project is started.

Released 40%-49% Early releases are available.

Usable 50%-69% Working, usable releases are available.

Stable 70%-79% The project has released a stable version.

Integrated 80%-89% The code is in the development kernel tree, and/or it is included in distributions as patches.

Mainline 90%-94% The code is in a stable kernel tree (currently the 2.6 baseline), but it might not be available or complete.

Product Available 95%-99% One customer-available version meets specification.

Completed 100% A customer-available version meets specification in more than one distribution. Capabilities can be completed, yet further integration or regression testing may still be needed.

N/A Not applicable for a particular tier.


Technical Capability Tables

Scalability Capabilities in the Scalability category support horizontal and vertical scaling of data center servers such that the addition of hardware resources results in acceptable increases in capacity.

CPUs

ID Name of Capability Priority Level Category Maturity Level

SC.CPU1 CPUs—1 way 2 Scalability Edge: Completed Application: N/A DB/Content: N/A

Description This capability enables Linux to allow CPU-bound applications to utilize close to 100% of the processing power of one CPU.

ID Name of Capability Priority Level Category Maturity Level

SC.CPU2 CPUs—2 way 2 Scalability Edge: Completed Application: N/A DB/Content: N/A

Description This capability enables Linux to allow CPU-bound applications to utilize close to 100% of the processing power of two CPUs.


ID Name of Capability Priority Level Category Maturity Level

SC.CPU4 CPUs—4 way 2 Scalability Edge: Completed Application: Completed DB/Content: N/A

Description This capability enables Linux to allow CPU-bound applications to utilize close to 100% of the processing power of four CPUs.

ID Name of Capability Priority Level Category Maturity Level

SC.CPU8 CPUs—8 way 2 Scalability Completed

Description This capability enables Linux to allow CPU-bound applications to utilize close to 100% of the processing power of eight CPUs.


ID Name of Capability Priority Level Category Maturity Level

SC.CPU16 CPUs—16 way 1 Scalability Completed

Description Several vendors are selling 16-CPU and larger platforms into data centers. Some vendors have 32-CPU systems coming on architectures such as IA32 or zSeries; others have 64+ CPU systems on architectures such as PPC64, IA64 and so on. Several large applications are ill-suited to clustering; these include content servers and databases that typically support very large OLTP or decision-support applications. The actual measure of success for Linux will be industry-competitive benchmarks compared to other operating systems on the same hardware platforms (for example: Linux on IA32, IA64, Power, zSeries, and so on). Some aspects of performance relate directly to the Linux kernel; others relate to applications.

Many of the improvements in this area are driven by specific benchmarks or related activities on 16-CPU machines. Several vendors are engaged in improving scalability on various platforms; most of those activities are reflected on LKML or the lse-tech mailing list.

Currently, Linux 2.6.9 and later scale quite well to at least 16 CPUs and likely more. The scalability depends upon the workload, and in some cases it depends upon whether specific applications in the workload have been ported to Linux so as to take advantage of key APIs related to scalability.

References Mail can be sent to the Lse-tech mailing list at [email protected]. For more information: https://lists.sourceforge.net/lists/listinfo/lse-tech
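Exploiting large SMP machines usually means applications use the scalability-related APIs mentioned above, of which CPU affinity is one of the simplest. The following sketch (Python's `os.sched_*` wrappers over the Linux-specific `sched_setaffinity(2)` interface; the fallback branch is an assumption for other platforms) pins the current process to one CPU and then restores its original mask:

```python
import os

total = os.cpu_count()

if hasattr(os, "sched_getaffinity"):        # Linux-specific affinity API
    allowed = os.sched_getaffinity(0)       # CPUs this process may run on
    one_cpu = {next(iter(allowed))}
    os.sched_setaffinity(0, one_cpu)        # pin to a single CPU
    pinned = os.sched_getaffinity(0)
    os.sched_setaffinity(0, allowed)        # restore the original mask
else:
    # assumed fallback for platforms without sched_setaffinity
    pinned = {0}
```

A workload manager might use the same calls to partition a 16-way machine among competing workloads.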

ID Name of Capability Priority Level Category Maturity Level

SC.CPU32 CPUs—32 way 2 Scalability Edge: N/A Application: N/A DB/Content: Stable

Description This capability enables Linux to allow CPU-bound applications to utilize close to 100% of the processing power of 32 CPUs.


ID Name of Capability Priority Level Category Maturity Level

SC.CPU64 CPUs—64 way 2 Scalability Edge: N/A Application: N/A DB/Content: Stable

Description This capability enables Linux to allow CPU-bound applications to utilize close to 100% of the processing power of 64 CPUs.

Network I/O—Connections

ID Name of Capability Priority Level Category Maturity Level

SC.NetCon10ps Network I/O—10/sec 2 Scalability Edge: Completed Application: N/A DB/Content: N/A

Description Assuming the system configuration is capable, this feature enables Linux to allow inbound or outbound network connections to meet or exceed the rate of 10 connections per second.

ID Name of Capability Priority Level Category Maturity Level

SC.NetCon100ps Network I/O—100/sec 2 Scalability Edge: Completed Application: N/A DB/Content: Completed

Description Assuming the system configuration is capable, this feature enables Linux to allow inbound or outbound network connections to meet or exceed the rate of 100 connections per second.


ID Name of Capability Priority Level Category Maturity Level

SC.NetCon1000ps Network I/O—1000/sec 2 Scalability Edge: Completed Application: N/A DB/Content: N/A

Description Assuming the system configuration is capable, this feature enables Linux to allow inbound or outbound network connections to meet or exceed the rate of 1000 connections per second.

Network I/O—Total Throughput/sec

ID Name of Capability Priority Level Category Maturity Level

SC.NetThru10Mbps Network I/O—10Mb/sec 2 Scalability Edge: Completed Application: N/A DB/Content: N/A

Description Assuming the system configuration is capable, this feature enables Linux to allow inbound or outbound I/O throughput to meet or exceed the rate of 10 megabits per second.

ID Name of Capability Priority Level Category Maturity Level

SC.NetThru100Mbps Network I/O—100Mb/sec 2 Scalability Completed

Description Assuming the system configuration is capable, this feature enables Linux to allow inbound or outbound I/O throughput to meet or exceed the rate of 100 megabits per second.


ID Name of Capability Priority Level Category Maturity Level

SC.NetThru1000Mbps Network I/O—1000Mb/sec 2 Scalability Completed

Description Assuming the system configuration is capable, this feature enables Linux to allow inbound or outbound I/O throughput to meet or exceed the rate of 1000 megabits per second.

ID Name of Capability Priority Level Category Maturity Level

SC.NetThru10Gbps Network I/O—10Gb/sec 2 Scalability Stable

Description Assuming the system configuration is capable, this feature enables Linux to allow inbound or outbound I/O throughput to meet or exceed the rate of 10 gigabits per second.

Network Improvement

ID Name of Capability Priority Level Category Maturity Level

SC.NetSendFile Network—Sendfile 2 Scalability Edge: Completed Application: N/A DB/Content: N/A

Description The sendfile system call makes network transfers more efficient by eliminating data copies between the user and kernel levels.

References LinuxForum.com: http://www.linuxforum.com/man/sendfile.2.php
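A minimal sketch of the call, assuming a Linux kernel where `sendfile(2)` may target a connected socket (a `socketpair` stands in for a real network peer):

```python
import os
import socket
import tempfile

payload = b"hello " * 1000               # 6000 bytes of known data

# write the payload to a temporary file
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

# a connected socket pair stands in for a network connection
out_sock, in_sock = socket.socketpair()

sent = 0
with open(path, "rb") as src:
    while sent < len(payload):
        # the kernel moves file pages straight to the socket;
        # no data passes through a userspace buffer
        n = os.sendfile(out_sock.fileno(), src.fileno(), sent, len(payload) - sent)
        if n == 0:
            break
        sent += n
out_sock.close()

received = b""
while True:
    chunk = in_sock.recv(65536)
    if not chunk:
        break
    received += chunk
in_sock.close()
os.unlink(path)
```

The loop around `os.sendfile` matters: like `write(2)`, the call may transfer fewer bytes than requested, so the explicit offset is advanced until the whole file has been sent.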


ID Name of Capability Priority Level Category Maturity Level

SC.NetCopylessSend Network—Copyless Send/Receive 2 Scalability Stable

Description Large network file transmissions cause excessive memory copy operations and system calls between the kernel and user space, which are expensive and require many CPU cycles for processing. This capability reduces the CPU cost per transaction by eliminating CPU copies between the kernel and user space and by avoiding unnecessary system calls in user space. The improved CPU efficiency means that a larger number of requests can be serviced with the same CPU configuration.

References Maturity is based on this project: https://sourceforge.net/projects/zero-copy Previous efforts include Linux Zero Copy: http://www.spinics.net/lists/linux-net/msg08264.html

ID Name of Capability Priority Level Category Maturity Level

SC.ScalablePoll Network—Scalable Poll 2 Scalability Completed

Description This capability provides for “poll and select” scaling beyond 1000 file descriptors.

References Center for Information Technology Integration, Linux Scalability project: http://www.citi.umich.edu/projects/linux-scalability/reports/poll.html
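The `poll(2)` interface avoids `select(2)`'s fixed `FD_SETSIZE` limit, which is what makes scaling past 1000 descriptors possible; Linux's epoll goes further by avoiding the per-call scan of the whole registration set. A sketch of the pattern through Python's `select.poll`, with a deliberately small descriptor count for illustration:

```python
import os
import select

# 64 pipes is modest; poll(2) itself imposes no fixed descriptor
# limit, unlike select(2)'s FD_SETSIZE
pipes = [os.pipe() for _ in range(64)]
poller = select.poll()
for r, _ in pipes:
    poller.register(r, select.POLLIN)

# make five of the read ends readable
for r, w in pipes[:5]:
    os.write(w, b"x")

# a zero timeout returns immediately with whatever is ready
ready_fds = {fd for fd, _ in poller.poll(0)}
expected = {r for r, _ in pipes[:5]}

for r, w in pipes:
    os.close(r)
    os.close(w)
```

The same registration/poll loop works unchanged with thousands of descriptors; only the cost per call differs between poll and epoll.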


ID Name of Capability Priority Level Category Maturity Level

SC.NetAIO Network—Asynchronous I/O 2 Scalability Usable

Description A variety of applications require a generalized asynchronous mechanism for networking I/O for high performance Internet throughput. Applications like web servers benefit from zero-copy sendfile(), for instance, although that is still roughly a synchronous call and isn't typically used for dynamic web content. A network AIO mechanism was originally implemented for one of the distributions at one point, but that code was not actively carried forward into the Linux 2.6 kernel. This is one of the primary advantages that the in-kernel implementation still has over public or commercial web server solutions. Other corporate application suites and Java applications could also benefit from the ability to send and receive data asynchronously over the network. The current block device-based asynchronous IO work doesn’t currently provide a networking solution. Note that () provides some of the key information that enables some networking products to scale reasonably, although this addresses only one class of problems that rely on asynchronous networking support.

References The Network-Asynchronous I/O project: http://www.sourceforge.net/projects/naio AIO support for Linux: http://lse.sourceforge.net/io/aio.html C10K notes: http://www.kegel.com/c10k.html Design notes on network AIO: http://lse.sourceforge.net/io/aionotes.txt
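The asynchronous programming model these applications want can be illustrated in user space with Python's asyncio (an event loop built over epoll, not the in-kernel network AIO discussed above, so this is an analogy rather than the mechanism itself). The sketch runs a tiny echo server and client in one process:

```python
import asyncio

async def echo_upper(reader, writer):
    # server side: read a request and reply without blocking the loop
    data = await reader.read(100)
    writer.write(data.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    # port 0 lets the OS pick a free port
    server = await asyncio.start_server(echo_upper, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"ping")
    await writer.drain()
    resp = await reader.read(100)

    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return resp

reply = asyncio.run(main())
```

Every `await` point is a place where the single thread can service other connections, which is the property true network AIO would provide without the event-loop bookkeeping.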

ID Name of Capability Priority Level Category Maturity Level

SC.NetSegOffload Network—Segment Offloading 2 Scalability Completed

Description This capability enables systems running Linux to take advantage of modern Network Interface Controllers that are capable of performing the low layer segmentation of packets. This frees the CPU from having to do this activity.

References LWN.net on Kernel Development: https://lwn.net/Articles/8779/


ID Name of Capability Priority Level Category Maturity Level

SC.NetChksmOffload Network—Checksum Offloading 2 Scalability Edge: Completed Application: N/A DB/Content: N/A

Description This capability enables systems running Linux to take advantage of modern Network Interface Controllers that can perform TCP or UDP checksumming. This frees the CPU from having to do this activity. This capability is generally applicable only to edge servers.

ID Name of Capability Priority Level Category Maturity Level

SC.Net-DoS Network—Denial of Service Protection 2 Scalability Edge: Stable Application: N/A DB/Content: N/A

Description Without denial of service (DoS) protection, a Linux system cannot perform well under attack. This capability is generally applicable only to edge servers.

References Internet Advisory Board research recommendations, see section 3.4.6, “Denial of Service Protection”: http://www.faqs.org/rfcs/rfc3869.html

ID Name of Capability Priority Level Category Maturity Level

SC.Net-HiSpeedRouting Network—High Speed Routing (Especially IPv6) 2 Scalability Edge: Mainline Application: N/A DB/Content: N/A

Description The existing Linux routing code has scalability problems when it runs directly on Internet backbone routers. This capability generally applies only to edge servers.

References Internet Advisory Board: research recommendations, see section 3.3, “Routing”: http://www.faqs.org/rfcs/rfc3869.html


ID Name of Capability Priority Level Category Maturity Level

SC.Net-QoS Network—Better Quality of Service and Queuing 2 Scalability Edge: Integrated Application: Integrated DB/Content: N/A

Description Linux has a rich array of queuing support, but there is a need for more support tools and research. This capability is generally applicable only to edge servers.

References Internet Advisory Board research recommendations, see section 3.6.2, “New Queuing Disciplines”: http://www.faqs.org/rfcs/rfc3869.html

ID Name of Capability Priority Level Category Maturity Level

SC.Net-HiSpeedIC-APIs Network—APIs for High Speed Interconnect 2 Scalability Edge: N/A Application: N/A DB/Content: Development

Description There are many competing solutions (AIO, RDMA, and TCP off load) that are not currently implemented in Linux. Investigation is required to determine if there are requirements for cluster and data center applications.

References An InfiniBand comment, regarding RDMA, that applies broadly: http://www.scl.ameslab.gov/Publications/Troy/usenix-ib-04/node13.html

ID Name of Capability Priority Level Category Maturity Level

SC.Net-HiSpeedTCP Network—Support for High Speed TCP 2 Scalability Mainline

Description This capability evaluates and applies TCP improvements for performance over high-speed, long-delay paths. Current TCP supports several alternative congestion control and tuning options; evaluation is required to ensure that the system defaults for these options are reasonable.
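On Linux, the congestion control algorithm in effect for a socket can be inspected (and, with privilege, changed) through the `TCP_CONGESTION` socket option. The option is Linux-specific, so this sketch degrades gracefully on other platforms:

```python
import socket

algo = None
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
if hasattr(socket, "TCP_CONGESTION"):          # Linux-specific option
    # the kernel returns the algorithm name as a NUL-padded string,
    # e.g. the system default set via net.ipv4.tcp_congestion_control
    raw = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
    algo = raw.split(b"\0")[0].decode()
s.close()
```

Evaluating the defaults the document calls for amounts to checking this value, and the alternatives listed in `net.ipv4.tcp_available_congestion_control`, against the deployment's path characteristics.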


Disk I/O Connectivity

ID Name of Capability Priority Level Category Maturity Level

SC.2-SD Disk I/O Connectivity—2 Storage Devices 2 Scalability Edge: Completed Application: N/A DB/Content: N/A

Description This capability ensures that Linux can connect at least two storage devices.

ID Name of Capability Priority Level Category Maturity Level

SC.8-SD Disk I/O Connectivity—8 Storage Devices 2 Scalability Edge: Completed Application: N/A DB/Content: N/A

Description This capability ensures that Linux can connect at least eight storage devices.

ID Name of Capability Priority Level Category Maturity Level

SC.12-SD Disk I/O Connectivity—12 Storage Devices 2 Scalability Edge: Completed Application: N/A DB/Content: N/A

Description This capability ensures that Linux can connect at least 12 storage devices.


ID Name of Capability Priority Level Category Maturity Level

SC.256-SD Disk I/O Connectivity—256 Storage Devices 2 Scalability Edge: Completed Application: N/A DB/Content: N/A

Description This capability ensures that Linux can connect at least 256 storage devices.


ID Name of Capability Priority Category Maturity Level Level

SC.4096-SD Disk I/O Connectivity—4096 1 Scalability Completed Storage Devices

Description Data Center Linux systems need to support large amounts of disk storage with thousands of disk devices. The falling price of storage and the availability of SANs mean Linux systems need to efficiently support up to 4096 block I/O devices today. In releases beyond Linux 2.6, we can easily expect 16K devices to be practical.

Many large servers with SANs need to use a lot of logical disks for performance. This feature applies to OLTP and DSS (Data Warehousing) workloads in the database tier. It’s less important but has some applicability in the application tier. The TPC-C benchmark attempts to represent common customer OLTP solutions; it also has a need for a large number of disk devices. It has very little applicability at the edge tier.

Though data centers typically allow only 1000 or so devices on 32-bit systems, large data center configurations will require 64-bit systems. This includes support for RAID arrays, so these are not necessarily disks, but rather LUNs exported by the RAID arrays. One approach might be to make a small number of really big LUNs, but the counter arguments include the following:

• Various RAID arrays have surprisingly low limits to the size a given LUN can be.

• Customers with large amounts of storage will usually break it down into fixed-size chunks (perhaps 50 GB or so) in order to be able to keep track of the storage with less chance of insanity. Then 1,000 LUNs gets you to 50 TB, which isn’t that many for high-end systems today.

• The larger the LUNs, the more storage lost due to internal fragmentation, and the uglier things get at data recovery time--for example, backup windows get larger, reducing availability.

• Recent laws intended to improve corporate governance and protect individual privacy have the side effect of requiring a lot more data be retained on local storage.

• Large clustered file systems operate on a lot of data. You can make each node access only local data, but in many cases you then run into data-skew problems and data-transfer bottlenecks. Similar issues show up in large data centers that have workloads requiring random queries over historical data (for example, fraud detection that is mandated in some financial industries). Think of "one record per stock trade over seven years" as an example.

The Linux 2.6.0-test6 kernel introduced support for sufficient minor device numbers with the new 32-bit dev_t and the 12/20 bit major/minor split.

Additional work to support >1000/4000/5000 devices in the data center includes validation and testing on 32-bit and 64-bit hardware platforms. Integration testing with other data center features like udev, LVM, and Multipath I/O is also needed.
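From user space, the expanded dev_t is visible through the `makedev`/`major`/`minor` interfaces. This sketch simply shows a device number whose minor is well beyond the old 8-bit (255) limit round-tripping intact, which is what makes thousands of block devices addressable:

```python
import os

# a minor number well beyond the old 8-bit limit of 255
dev = os.makedev(8, 4097)      # e.g. major 8 (sd), minor 4097
maj = os.major(dev)
minr = os.minor(dev)
```

With the pre-expansion 8/8-bit encoding, a minor of 4097 could not have been represented at all; on a 2.6 kernel both halves survive the round trip.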


References The effort to implement this feature was commonly known as "64-bit dev_t," although the initial support as of Linux 2.6.0-test has been a 32-bit dev_t solution. glibc support is now present in the latest glibc sources, but it requires distributions to pick up the latest glibc. For more details about the status, see the following websites: LWN.net on dev_t expansion status: http://lwn.net/Articles/46678/ BitKeeper on the Linux kernel tree: http://linus.bkbits.net:8080/linux-2.5/search/?expr=dev_t&search=ChangeSet+comments For a test plan for integration and configuration testing of large numbers of storage devices, see the OSDL Storage Networking SIG status page and look for the 4096 Disk focus area: http://developer.osdl.org/maryedie/STORAGE_NETWORKING/ Integration testing with large numbers of LUNs with udev, LVM, and Multipath I/O: http://www.osdl.org/cgi-bin/mpio_wiki.pl?Integration

ID Name of Capability Priority Level Category Maturity Level

SC.8K-SD Disk I/O Connectivity—8K 2 Scalability Completed Storage Devices

Description This capability ensures that Linux can connect at least 8192 storage devices.

Reference Integration testing with large numbers of LUNs with udev, LVM, and Multipath I/O: http://www.osdl.org/cgi-bin/mpio_wiki.pl?Integration


Disk I/O – Max File Size

ID Name of Capability Priority Level Category Maturity Level

SC.MaxFileSz160GB Disk I/O Maximum File Size—160GB 2 Scalability Edge: Completed Application: N/A DB/Content: N/A

Description This capability ensures that Linux can support file sizes of at least 160 gigabytes on storage devices.

ID Name of Capability Priority Level Category Maturity Level

SC.MaxFileSz1TB Disk I/O Maximum File Size—1TB 2 Scalability Completed

Description This capability ensures that Linux can support file sizes of at least one terabyte.

ID Name of Capability Priority Level Category Maturity Level

SC.MaxFileSz16TB Disk I/O Maximum File Size—16TB 2 Scalability Completed

Description This capability ensures that Linux can support file sizes of at least 16 terabytes.

ID Name of Capability Priority Level Category Maturity Level

SC.MaxFileSz32TB Disk I/O Maximum File Size—32TB 2 Scalability Edge: N/A Application: Usable DB/Content: Usable

Description This capability ensures that Linux can support file sizes of at least 32 terabytes.


Disk I/O per Second

ID Name of Capability Priority Level Category Maturity Level

SC.625iops Disk I/O—625/sec 2 Scalability Completed

Description Assuming a system is configured to support enough I/O operations, this capability enables Linux to support at least 625 I/O storage device operations per second on the system.

ID Name of Capability Priority Level Category Maturity Level

SC.5000iops Disk I/O—5000/sec 2 Scalability Edge: N/A Application: Product Available DB/Content: Product Available

Description Assuming a system is configured to support enough I/O operations, this capability enables Linux to support at least 5000 I/O storage device operations per second on the system.

ID Name of Capability Priority Level Category Maturity Level

SC.80000iops Disk I/O—80,000/sec 2 Scalability Edge: N/A Application: N/A DB/Content: Integrated

Description Assuming a system is configured to support enough I/O operations, this capability enables Linux to support at least 80,000 I/O storage device operations per second on the system.


ID Name of Capability Priority Level Category Maturity Level

SC.160000iops Disk I/O—160,000/sec 2 Scalability Edge: N/A Application: N/A DB/Content: Usable

Description Assuming a system is configured to support enough I/O operations, this capability enables Linux to support at least 160,000 I/O storage device operations per second on the system.

Disk I/O Total Throughput/sec

ID Name of Capability Priority Level Category Maturity Level

SC.Thru40MBps Disk I/O Throughput—40MB/sec 2 Scalability Edge: Completed Application: N/A DB/Content: N/A

Description Assuming a system is configured to support enough I/O bandwidth, this capability enables Linux to support at least 40 megabytes per second storage device throughput on the system.

ID Name of Capability Priority Level Category Maturity Level

SC.Thru300MBps Disk I/O Throughput—300MB/sec 2 Scalability Completed

Description Assuming a system is configured to support enough I/O bandwidth, this capability enables Linux to support at least 300 megabytes per second storage device throughput on the system.


ID Name of Capability Priority Level Category Maturity Level

SC.Thru5GBps Disk I/O Throughput—5GB/sec 2 Scalability Completed

Description Assuming a system is configured to support enough I/O bandwidth, this capability enables Linux to support at least 5 gigabytes per second storage device throughput on the system.

Disk I/O Improvement

ID Name of Capability Priority Level Category Maturity Level

SC.DiskIOLocking Disk I/O—Scalable Disk Locking 2 Scalability Edge: N/A Application: Mainline DB/Content: Integrated

Description This capability provides scalable kernel locking in Linux for kernel structures associated with storage devices.

ID Name of Capability Priority Level Category Maturity Level

SC.DiskIOReadAhead Disk I/O—ReadAhead 2 Scalability Edge: Completed Application: N/A DB/Content: N/A

Description This capability provides the ability for the Linux kernel to recognize and anticipate a sequential read pattern and then read the next logical record before the application requests it. This increases the chance that when a sequential read is requested, the block has already been read.


ID Name of Capability Priority Level Category Maturity Level

SC.DiskVectoredIO Disk I/O—Vectored I/O 2 Scalability Completed

Description This feature allows an I/O operation to read or write data using a vector of addresses. This is convenient for applications such as database servers that might have data blocks scattered throughout their buffer caches and cache sizes that might not match their read or write sizes.

ID Name of Capability Priority Level Category Maturity Level

SC.Disk-AIO-raw Disk I/O—Async I/O (Raw) 2 Scalability Edge: N/A Application: Completed DB/Content: Completed

Description The capability supports un-buffered asynchronous I/O for Linux.


ID Name of Capability Priority Level Category Maturity Level

SC.Disk-AIO-fs Disk I/O—Async I/O—File System 1 Scalability Edge: N/A Application: Stable DB/Content: Stable

Description Scalable applications, in particular database applications, need the ability to issue buffered or un-buffered I/O asynchronously, without blocking. For CPU-intensive applications, this permits use of CPU cycles that would otherwise be wasted while waiting for I/O to complete. The POSIX IEEE Std 1003.1 standard defines the minimum requirements for async I/O as well as several optional features. Both Linux 2.6+ and the updated glibc library required to support the new AIO calls meet the minimal POSIX standard. However, for additional performance boosts, application developers expect implementation of some of the optional POSIX features (for example, async fsync). For these feature patches to be acceptable to the community, more investigation is needed to determine which optional POSIX capabilities are high priority.

References
Linux Magazine, brief mention of the AIO calls (for members only): http://www.linux-mag.com/2004-06/compile_01.html
OSDL Storage SIG status: http://www.developer.osdl.org/maryedie/STORAGE_NETWORKING/AIO/status.txt
Linux Scalability Effort AIO page: http://lse.sourceforge.net/io/aio.html
POSIX standards: http://www.opengroup.org/onlinepubs/009695399/mindex.html

ID Name of Capability Priority Level Category Maturity Level

SC.Disk-DirectIO-raw Disk I/O—Direct I/O—Raw 2 Scalability Edge: Completed Application: N/A DB/Content: N/A

Description This capability guarantees that Linux can support un-buffered I/O for I/O operations.


Memory

ID Name of Capability Priority Level Category Maturity Level

SC.MEM1GB Memory—1GB 2 Scalability Edge: Completed Application: N/A DB/Content: N/A

Description This capability enables Linux to support systems with at least one gigabyte of physical memory.

ID Name of Capability Priority Level Category Maturity Level

SC.MEM4GB Memory—4GB 2 Scalability Edge: Completed Application: N/A DB/Content: N/A

Description This capability enables Linux to support systems with at least four gigabytes of physical memory.

ID Name of Capability Priority Level Category Maturity Level

SC.MEM8GB Memory—8GB 2 Scalability Edge: Completed Application: Completed DB/Content: N/A

Description This capability enables Linux to support systems with at least eight gigabytes of physical memory.

ID Name of Capability Priority Level Category Maturity Level

SC.MEM16GB Memory—16GB 2 Scalability Completed

Description This capability enables Linux to support systems with at least 16 gigabytes of physical memory.


ID Name of Capability Priority Level Category Maturity Level

SC.MEM64GB Memory—64GB 1 Scalability Completed

Description Data Center Linux systems often use as much memory as the system will provide. Most distributions support at least 16 GB on 32-bit architectures with reasonable stability. In addition, user-level applications have access to a maximum of 3 GB of user address space. Major distributions also provide a means for using at least 32 GB or 64 GB of physical memory, typically using an alternate kernel.

Database applications in particular tend to rely on the presence of a large shared memory cache.

References Notes: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test5/2.6.0-test5-mm4/broken- out/4g-2.6.0-test2-mm2-A5.patch

ID Name of Capability Priority Level Category Maturity Level

SC.MEM256GB Memory—256GB 2 Scalability Completed

Description This capability enables Linux to support systems with at least 256 gigabytes of physical memory.

ID Name of Capability Priority Level Category Maturity Level

SC.MEM1TB Memory—1TB 2 Scalability Product Available

Description This capability enables Linux to support systems with at least one terabyte of physical memory.


Layered Software

ID Name of Capability Priority Level Category Maturity Level

SC.Layered-SW-ERP Layered Software—ERP 2 Scalability Edge: N/A Application: Integrated DB/Content: N/A

Description The capability enables Linux to support scaled ERP applications.

ID Name of Capability Priority Level Category Maturity Level

SC.Layered-SW-SCM Layered Software—SCM 2 Scalability Edge: N/A Application: Integrated DB/Content: N/A

Description The capability enables Linux to support scaled SCM applications.

ID Name of Capability Priority Level Category Maturity Level

SC.Layered-SW-CRM Layered Software—CRM 2 Scalability Edge: N/A Application: Integrated DB/Content: N/A

Description The capability enables Linux to support scaled CRM applications.


ID Name of Capability Priority Level Category Maturity Level

SC.Layered-SW-MRO Layered Software—MRO 2 Scalability Edge: N/A Application: Integrated DB/Content: N/A

Description The capability enables Linux to support scaled MRO applications.

ID Name of Capability Priority Level Category Maturity Level

SC.Layered-SW-SFA Layered Software—SFA 2 Scalability Edge: N/A Application: Integrated DB/Content: N/A

Description The capability enables Linux to support scaled SFA applications.

ID Name of Capability Priority Level Category Maturity Level

SC.Layered-SW-Java Layered Software—Java 2 Scalability Edge: N/A Application: Integrated DB/Content: Integrated

Description The capability enables Linux to support scaled Java-based applications.

ID Name of Capability Priority Level Category Maturity Level

SC.Layered-SW-ORB Layered Software—ORB 2 Scalability Edge: N/A Application: Integrated DB/Content: Integrated

Description The capability enables Linux to support scaled ORB applications.


ID Name of Capability Priority Level Category Maturity Level

SC.Layered-SW-RDBMS Layered Software—RDBMS 2 Scalability Edge: N/A Application: Integrated DB/Content: Integrated

Description The capability enables Linux to support scaled RDBMS applications.

I/O Interface

ID Name of Capability Priority Level Category Maturity Level

SC.IOPCI I/O Interface—PCI 2 Scalability Completed

Description The capability enables Linux to support scaled performance of the PCI I/O interface.


ID Name of Capability Priority Level Category Maturity Level

SC.IOPCI-X I/O Interface—PCI-X 2 Scalability Edge: N/A Application: Completed DB/Content: Completed

Description This item refers to the support and scalable performance of PCI-X (not PCI-X 2.0 or PCI-Express). We’ll add PCI-X 2.0 and PCI-Express technologies as separate entries when they emerge as important for Linux adoption in the data center. PCI-X is very similar to PCI "Conventional," and it needs very little extra support. For future reference, once hardware is available for testers, PCI-X 2.0 and PCI-Express support isn’t expected to be difficult (assuming the specifications are open). See references. Scalable support for PCI-X is needed for the application and database/content tiers for its speed and throughput capabilities for disk interfaces. At the edge, our primary interest is NIC cards, and very few if any NIC cards today require PCI-X to support throughput or speed requirements.

References: Specifications for PCI-X, PCI-X 2.0 and PCI Express (available for PCI-SIG members): http://www.pcisig.com/specifications/order_form

PCI-SIG members appear to be free to pass on the specs (there doesn’t appear to be a “do not distribute” disclaimer).

ID Name of Capability Priority Level Category Maturity Level

SC.IOIB I/O Interface—InfiniBand 2 Scalability Edge: N/A Application: Completed DB/Content: Completed

Description The capability enables Linux to support scaled performance of the InfiniBand I/O.


Kernel Improvement

ID Name of Capability Priority Level Category Maturity Level

SC.KernelHugeThreads Kernel—Huge Number of Threads 2 Scalability Completed

Description The capability enables Linux to support a huge number of threads. The ability to create 100,000 threads within a few seconds would be considered huge for the purpose of this item.

ID Name of Capability Priority Level Category Maturity Level

SC.KernelMemory Kernel—Memory 2 Scalability Completed

Description This capability addresses efficient memory placement policies for user memory. It lays out kernel memory allocation so more kernel data is kept in nodes where it’s more likely to be accessed.

ID Name of Capability Priority Level Category Maturity Level

SC.KernelCPU Kernel—CPU 2 Scalability Completed

Description The kernel needs to provide efficient scheduler rebalancing policies to keep processes near their physical memory and near other processes with which they communicate.

ID Name of Capability Priority Level Category Maturity Level

SC.KernelIOInterface Kernel—I/O Interface 2 Scalability Completed

Description This capability covers the rewrite of the block layer (completed in the Linux 2.6 mainline kernel).


ID Name of Capability Priority Level Category Maturity Level

SC.KernelNode Kernel—Node 2 Scalability Completed

Description This capability deals with jobs that don't "fit" in a single node, either because of CPU usage or physical memory usage.

ID Name of Capability Priority Level Category Maturity Level

SC.NUMA-API Non-Uniform Memory Access (NUMA) APIs 1 Scalability Edge: N/A Application: Completed DB/Content: Completed

Description Many large hardware platforms today have decreasing uniformity in access to memory from multiple CPUs. The most obvious cases are those machines designed with NUMA architecture in mind, including several IA32 and IA64 platforms. Other machines, such as AMD x86-64-based platforms, are referred to as "SUMA," or Sufficiently Uniform Memory Architecture. However, some key applications that are SMP-aware are also able to avoid the memory-access penalties of NUMA (or generally non-uniform memory hierarchies) by collecting information about the hardware topology and making application decisions at run time to increase the applications’ overall performance. Aspects of the topology important to some applications include the number of CPUs, the presence or absence of symmetric multi-threading or hyper-threading capabilities, the affiliation between processors and blocks of memory, the relationships between I/O controllers, processors, and memory blocks, and so on.

Linux 2.6 contains a number of application program interfaces (APIs) to help NUMA-aware applications express CPU and memory allocation preferences to the operating system. These include APIs to set and get the application’s CPU affinity and to determine the CPUs associated with nodes, thereby providing affinity of an application to a node. Linux today offers APIs for hugetlbfs, mmap placement (in -mm 2.6.10-rc2-mm4), shared memory (System V shm*() functionality), a mount option for tmpfs memory interleaving, and memory placement policies via mbind(), set_mempolicy(), and get_mempolicy().

References Linux scalability effort mailing lists: http://sourceforge.net/mailarchive/forum.php?thread_id=2375117&forum_id=5292


ID Name of Capability Priority Level Category Maturity Level

SC.NUMA-Topo Non-Uniform Memory Access (NUMA) Topology 1 Scalability Completed

Description Many large hardware platforms today have decreasing uniformity in access to memory from multiple CPUs. The most obvious cases are machines designed with NUMA architecture in mind, including several IA32 and IA64 platforms. Other machines, such as AMD x86-64-based platforms, are referred to as "SUMA," or Sufficiently Uniform Memory Architecture. However, some key applications that are SMP-aware are also able to avoid the memory-access penalties of NUMA (or generally non-uniform memory hierarchies) by collecting information about the hardware topology and making application decisions at run time to increase the applications’ overall performance. Aspects of the topology that are important to some applications include the number of CPUs, the presence or absence of symmetric multi-threading or hyper-threading capabilities, the affiliation between processors and blocks of memory, the relationships between I/O controllers, processors, and memory blocks, and so on.

The Linux 2.6 kernel contains some portions of this topology information, which is exported to user space via sysfs. A few subsystems have been converted; for example, PCI buses have been implemented in this topology representation, although not all possible I/O bus types have been converted to provide topology information to user space. Additionally, for more complex hardware configurations, being able to assign distance or cost values to the relationships between blocks of memory and CPUs or I/O controllers would improve application scalability and performance for high-end IA64 or x86-64 platforms. Patches for this are now available for CPU and node distances within sysfs, allowing general access to varying node distances from user level. The topology information is available for all platforms that use SRAT, SLIT, and Open Firmware, and most IA64 platforms support the topology information as of Linux 2.6.10-rc2-mm4.

Any new hardware platform should complete the sysfs information for that platform. Completion depends on this work being pushed to mainline from the -mm tree and on all NUMA platforms implementing the distance information.

References See also in this document: SC.NUMA-API (Non-Uniform Memory Access (NUMA) APIs)


ID Name of Capability Priority Level Category Maturity Level

SC.SMT Symmetric Multi-Threading (e.g. Hyperthreading) 1 Scalability Stable

Description At the 2003 Linux Kernel Summit, a variety of key processor architects spoke about upcoming trends in processor technologies. One of the key points the architects made was that most upcoming processors will provide multiple processing elements on a single die. In some cases, those processing elements will share key portions of instruction cache, data cache, L2 cache, and so on. With the increasing trend toward multiple cores per die and symmetric multi-threading (hyperthreading), Linux will continue to need innovation and support for scheduling processes intelligently on the various processing elements of the system. Early experiences on Intel's Hyper-Threaded Pentium 4 Xeon(tm) processors indicate that performance can vary, depending primarily on the scheduling decisions. At one end of the spectrum, poor scheduling can yield lower throughput than scheduling on only a single processing element. At the other end, good scheduling can produce a 30% increase in overall throughput. To remain competitive on such platforms, Linux will need continued improvements in scheduling for SMT, hyperthreading, and multi-core processors.

References KernelTrap.org: http://kerneltrap.org/node/view/2554 eWeek article: http://www.eweek.com/article2/0,1759,1545607,00.asp


Performance

Capabilities in the Performance category support performance levels expected in data center environments.

ID Name of Capability Priority Level Category Maturity Level

P.Packets Packet Tests 2 Performance Edge: Completed Application: N/A DB/Content: N/A

Description This capability addresses packets/second and bytes/second network performance.

ID Name of Capability Priority Level Category Maturity Level

P.Forwarding Forwarding/Firewall Test 2 Performance Edge: Integrated Application: N/A DB/Content: N/A

Description This capability requires that Linux perform forwarding (including firewalling) as well as or better than existing industry-leading commercial solutions (against measures such as sessions/second). The maturity level here is judged in comparison with dedicated hardware solutions.

ID Name of Capability Priority Level Category Maturity Level

P.LoadBalancing Load Balancing 2 Performance Edge: Completed Application: N/A DB/Content: N/A

Description This capability requires that Linux perform load balancing as well as or superior to existing industry- leading commercial solutions (against measures such as sessions/second). Edge maturity of this capability is not considered 100% since it is compared against hardware solutions.


ID Name of Capability Priority Level Category Maturity Level

P.SecKeysps Security Keys/Second 2 Performance Edge: Mainline Application: N/A DB/Content: N/A

Description This capability is concerned with SSL performance measured as the number of security keys per second. Tests include SPEC web SSL and SSL source micro benchmarks. To reach 100% edge maturity, we would need hardware support.

ID Name of Capability Priority Level Category Maturity Level

P.FileSrvr File Server 2 Performance Edge: Completed Application: N/A DB/Content: N/A

Description This capability is concerned with file server performance measured as results from tests like NetBench. The maturity level is measured relative to other industry-leading operating systems and any other solution in the same price range.

ID Name of Capability Priority Level Category Maturity Level

P.WebSrvr Web Server 2 Performance Edge: Completed Application: N/A DB/Content: N/A

Description This capability's performance is measured using SPECweb test results.


ID Name of Capability Priority Level Category Maturity Level

P.MailSrvr Mail Server 2 Performance Edge: Completed Application: N/A DB/Content: N/A

Description SPECmail is the performance test used to measure the maturity of this capability.

ID Name of Capability Priority Level Category Maturity Level

P.DirectorySvcs Directory Services 2 Performance Edge: Mainline Application: N/A DB/Content: N/A

Description This capability reflects directory services performance, measured against an LDAP BMS specification and DNS benchmarks. To achieve 100% edge maturity, further maturity in LDAP is required.


ID Name of Capability Priority Level Category Maturity Level

P.NFSV3-Server/Client Network File System (NFS) V2/V3 Performance 1 Performance Mainline

Description Choosing an adaptable and dynamic storage solution is key to the future of an enterprise. The Network File System versions 2 and 3 are two of the most widely deployed file sharing protocols used in proprietary enterprise operating systems, for both Storage Area Network (SAN) and network-attached storage. Data Center Linux systems should provide stable and competitively performing NFS v2/v3 client and server implementations.

Historically, NFS has trailed the rest of Linux in providing the stability, performance and scalability that is appropriate for enterprise workloads. This capability seeks to maximize Linux NFS RAS and performance to meet or exceed competitive pressure from other enterprise operating systems.

Maximizing NFS v2/v3 performance and stability requires iterative testing and analysis of open and industry benchmarks, analyzing bottlenecks and failures, hypothesizing and refining improvements.

Examples of areas of concern for today's Linux NFS v2/v3 implementations include out-of-box performance/tuning issues, performance regressions, silent failures caused by lack of error messages/return codes, lack of metrics for performance monitoring and debugging, issues with diagnosing client and network problems, documentation, legacy support and so on.

Examples of proposed or ongoing development work that might impact NFS v2/v3 RAS and performance include dynamic nfsd threads, RPC client work queues, direct I/O and readahead modifications, the RPC transport switch, SMP affinity for threads/sockets, and continuing BKL removal.

Given the fast-changing nature of the Linux kernel, frequent workload and regression testing and analysis are necessary, particularly as NFSv4 features continue to be retrofitted into the Linux NFS V3 client and server implementations.

References
Request for Comments 1094, “NFS: Network File System Protocol Specification”: http://www.faqs.org/rfcs/rfc1094.html
Request for Comments 1813, “NFS Version 3 Protocol Specification”: http://www.faqs.org/rfcs/rfc1813.html
Technical report, “Using the Linux NFS Client with Network Appliance Filers”: http://www.netapp.com/tech_library/ftp/3183.pdf


ID Name of Capability Priority Level Category Maturity Level

P.NFSV4-perf Network File System (NFS) V4 Performance and Functionality 1 Performance Mainline

Description The Network File System (NFS) has been the standard distributed file system for *NIX systems for almost two decades. The latest version of the NFS protocol, version 4, is in the process of being retrofitted into the 2.6 Linux NFS client and server implementations. Certain features are available today, but others, such as read and write delegation and replication and migration support, are still under development. Data Center Linux should deliver on the promise of NFSv4 to provide a production-quality, full-featured and scalable NFSv4 implementation that meets the challenge of the proprietary enterprise operating systems solutions currently available.

Important enterprise enhancements in NFSv4 include expanded file system name space and a file sharing model that supports Windows, performance and scalability, RPC and communications transport, and security improvements. NFSv4 introduces stateful file sharing with sophisticated client and server reboot recovery mechanisms, byte-range and share reservation, client file delegation, compound RPCs for performance, and mandated strong security mechanisms. Standardized use and interpretation of ACLs across Posix and Windows, collecting the disparate NFS protocols into a single protocol specification, and support for file migration and replication make NFSv4 a key enterprise file-sharing protocol for the data center.

Maximizing NFSv4 performance and functionality requires development of test tools, iterative testing and analysis of open and industry benchmarks, analysis of bottlenecks and failures, hypothesizing and refinements.

Given the fast-changing nature of the Linux kernel, frequent workload and regression testing and analysis are necessary, particularly as NFSv4 features continue to be retrofitted into the Linux NFS client and server implementations.

References NFS client patches for Linux: http://www.linux-nfs.org

Request for Comments 3530, “Network File System (NFS) Version 4 Protocol” (April 2003): http://www.faqs.org/rfcs/rfc3530.html
Center for Information Technology Integration, NFS Version 4 Open Source Reference Implementation Project: http://www.citi.umich.edu/projects/nfsv4/
OSDL Storage SIG NFS Testing Matrix: http://developer.osdl.org/dev/nfsv4/


ID Name of Capability Priority Level Category Maturity Level

P.GCCopt GCC Optimizations 2 Performance Product Available

Description The performance measure for this capability is SPECcpu2000.


ID Name of Capability Priority Level Category Maturity Level

P.Java-Perf Java Performance 1 Performance Edge: N/A Application: Usable DB/Content: Integrated

Description There is an industry perception that Java performance on Linux is not competitive with proprietary enterprise operating systems. To win acceptance in the Data Center, Linux should demonstrate real world competitive Java performance with key current and future workloads, particularly those that demonstrate J2EE enterprise applications.

This capability seeks to maximize Linux Java/J2EE performance and provide quantifiable results that meet or exceed competitive pressure from other enterprise platforms such as .NET.

The history of Java performance refinements has resulted in ever-increasing performance, from the introduction of JIT compilation through the recent integration of the Native POSIX Thread Library (NPTL) with the Java threading abstraction.

Maximizing Java/J2EE performance requires iteratively measuring application response using open and industry benchmarks, analyzing bottlenecks, hypothesizing, and refining improvements.

Open benchmarks: open workloads and micro-benchmarks that accurately model key characteristics of enterprise applications (particularly in multi-tier configurations) and are deployable in standard test frameworks, such as LTP and OSDL STP, are required for open source developers to successfully measure and improve application performance.

Quantifiable results: to support customer decisions and verify the performance of Data Center Linux distributions, industry benchmark runs with large configurations are necessary to meet competitive pressure.

Performance analysis tools: to resolve bottlenecks and identify architectural limitations, improved performance analysis tools are necessary.

In order to continue to increase system performance, automated workloads, micro-benchmarks, and performance measurement/observability tools available across Linux JVMs and Data Center Linux distributions are necessary.

Ongoing analysis of the performance of Linux JVMs across Data Center Linux distributions and identification and removal of bottlenecks is key to meeting or exceeding competitive pressures, particularly to reinforce the viability of J2EE on Linux versus .NET.

Given the fast-changing nature of the Linux kernel, frequent measurements will be necessary to confirm performance gains.


References
Related links:
Java Linux (Blackdown): http://www.blackdown.org
Kaffe.org: http://www.kaffe.org
java.net: http://java.net
IBM developerWorks on Java technology: http://www.ibm.com/developerworks/java/
More on Java: http://java.sun.com

Benchmarks:
Standard Performance Evaluation Corporation (SPEC): http://www.spec.org/benchmarks.html#java
SPECjAppServer2002: http://www.spec.org/jAppServer2002
SPECjAppServer2001: http://www.spec.org/jAppServer2001
SPECjbb2000: http://www.spec.org/jbb2000
SPECjvm98: http://www.spec.org/jvm98

Volano Chat: http://www.volano.com/benchmarks.html

Apache Jakarta Project: http://jakarta.apache.org/jmeter/

ID Name of Capability Priority Level Category Maturity Level

P.DBConnections Database Connection Performance 2 Performance Edge: N/A Application: Product Available DB/Content: N/A

Description Third-party applications, web integrated development environments (IDEs), and application development environments (ADEs) indirectly test this capability.


ID Name of Capability Priority Level Category Maturity Level

P.FILESYS File System Performance 1 Performance Integrated

Description Competitive file system performance is required for Data Center Linux acceptance. There is a customer perception that Linux open source file system performance is not competitive with proprietary enterprise operating systems, particularly for database applications.

This capability seeks to maximize Linux file system performance and provide quantifiable results that meet or exceed competitive pressure from other enterprise operating systems.

The file system has long been the heart of UNIX storage management, providing the key abstraction for data storage allowing application portability at the cost of performance. Data Center Linux DB/Content servers require improved performance when using file systems for database and content applications, particularly with large, persistent files. Less importantly, application and edge servers will also benefit from improved performance, typically with many small, short-lived files.

Maximizing file system performance requires iteratively measuring file system response using open and industry benchmarks, analyzing bottlenecks, hypothesizing, and refining improvements.

Open benchmarks: open benchmarks and micro-benchmarks that accurately model key characteristics of database applications and are deployable in standard test frameworks such as LTP and OSDL STP are required for open source developers to successfully measure and improve file system performance.

Quantifiable results: to support customer decisions and verify the performance of Data Center Linux distributions, industry benchmark runs with large configurations are necessary to meet competitive pressure.

Performance analysis tools: to resolve bottlenecks and identify architectural limitations, improved performance analysis tools are necessary. Projects that address the performance characteristics of individual file systems, the kernel Multiple Devices (md) and SCSI mid-layers, data copying, and device drivers may be needed to meet competitive pressure.

Given the fast-changing nature of the Linux kernel, frequent measurements are necessary to confirm performance gains.
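As an illustration of the kind of open micro-benchmark described above, the following sketch times the many-small-short-lived-files pattern attributed to application and edge servers. It is a hypothetical example, not one of the cited suites; the file count, file size, and reported metric are arbitrary choices for demonstration.

```python
import os
import tempfile
import time

def small_file_churn(directory, count=200, size=4096):
    """Create, write, fsync, and delete many small files, timing the run.

    Returns files per second, the kind of figure an edge/application tier
    micro-benchmark would track from release to release.
    """
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"f{i}")
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the write through the page cache
        os.unlink(path)
    elapsed = time.perf_counter() - start
    return count / elapsed

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(f"{small_file_churn(d):.0f} files/sec")
```

Running the same loop against different file systems and kernel versions, as the document suggests, would surface regressions in the small-file path; large-file database patterns would need a separate workload.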

References

Linux Performance Projects:
Linux Technology Center: http://www.ibm.com/developerworks/linux/ltc
Linux Benchmark Suite Homepage: http://lbs.sourceforge.net/

Related projects include:

Benchmarks:
Penguinometer: http://pgmeter.sourceforge.net/
The IOzone Filesystem Benchmark: http://www.iozone.org/
Bonnie++ Benchmark Suite: http://www.coker.com.au/bonnie++/
The Postmark Source Code: http://www.netapp.com/ftp/postmark-1_5.c
USENIX whitepaper: http://www.usenix.org/events/usenix02/tech/freenix/bryant.html


ID Name of Capability Priority Level Category Maturity Level

P.LrgMultiTask Large Multi-Task Performance 2 Performance Completed

Description This capability addresses overall system performance when there are many active processes and/or threads. Large here means active tasks (threads or processes) numbering more than 10 times the CPU count, or thousands of started tasks.

ID Name of Capability Priority Level Category Maturity Level

P.FeaturesAPIs Performance Features & APIs 2 Performance Product Available

Description This capability addresses whether or not Linux has the APIs that the ISVs need for performance levels equal to or better than other industry-leading OS solutions.


ID Name of Capability Priority Level Category Maturity Level

P.PORT Port Quality 1 Performance Product Available

Description Competitive performance for ported applications and kernel modules is required for Data Center Linux acceptance. There is a customer perception that Linux performance is not competitive with proprietary enterprise operating systems, particularly when the quality of an application ported from its previous operating environment is not optimal.

This capability seeks to maximize the performance of applications and kernel modules ported to Linux, to meet or exceed competitive pressure from other enterprise operating systems.

Application availability is frequently cited as one of the primary barriers to Data Center Linux adoption. A minority of applications that have been ported to Linux take advantage of high performance Linux features and design practices, which are not well known or well documented. Examples include the use of recent POSIX interfaces, such as sendfile(), large pages, mutexes, reserved kernel memory, interrupt and process affinity and so on.

Maximizing application performance requires performance porting guides and best practice documents to ensure ISVs take full advantage of Data Center Linux's performance Application Programming Interfaces (APIs). Standardization of high performance Linux features and interfaces through the LSB will also speed adoption by ISVs. Finally, Application Binary Interface (ABI) checking tools that identify the use of obsolete interfaces, and enhanced application performance analysis tools that identify characteristic bottlenecks and solutions, are also needed.
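One of the high-performance interfaces mentioned above, sendfile(), can be exercised even from a scripting language. The sketch below assumes Linux, where sendfile(2) accepts regular-file descriptors; it copies file data in kernel space, avoiding the user-space read/write round trip a naive port would make.

```python
import os
import tempfile

def copy_with_sendfile(src_path, dst_path):
    """Copy a file using sendfile(2), keeping the data transfer in the kernel
    instead of bouncing it through user-space buffers (Linux-specific)."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        offset = 0
        while offset < size:
            # os.sendfile(out_fd, in_fd, offset, count) returns bytes sent.
            sent = os.sendfile(dst.fileno(), src.fileno(), offset, size - offset)
            if sent == 0:
                break
            offset += sent
    return offset

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "a"), os.path.join(d, "b")
        with open(src, "wb") as f:
            f.write(b"data" * 1024)
        print("copied", copy_with_sendfile(src, dst), "bytes")
```

The same idea applies to the other features the document lists (large pages, affinity, and so on): each needs an explicit, documented API call that a ported application must be taught to make.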

References

Linux Performance Projects:
Linux Technology Center: http://www.ibm.com/developerworks/linux/ltc
Linux Benchmark Suite Homepage: http://lbs.sourceforge.net

Related projects include:

Porting resources:
Unix Porting: http://www.unixporting.com/

ID Name of Capability Priority Level Category Maturity Level

P.Middleware Middleware Performance—Open Database Connectivity (ODBC) 2 Performance Product Available

Description Third party applications and database benchmarks indirectly test the performance of ODBC.


ID Name of Capability Priority Level Category Maturity Level

P.APP Application Performance 1 Performance Product Available

Description Competitive enterprise applications performance is required for Data Center Linux acceptance. There is a customer perception that Linux enterprise applications performance is not competitive with proprietary enterprise operating systems. This capability seeks to maximize Linux enterprise applications performance and provide quantifiable results that meet or exceed competitive pressure from other enterprise operating systems.

The 64 billion dollar enterprise applications (EA) market includes tightly integrated suites of enterprise resource planning (ERP), financials, human resources (HR), supply chain management/planning (SCM), E-business suites and employee and customer relationship management (ERM/CRM) applications, increasingly deployed across the entire enterprise. To be a viable EA platform, Data Center Linux servers require improved performance when benchmarked with the industry/capacity planning EA benchmarks associated with the ISV-1 List. See also WL-11 (Top-Echelon (ISV-1) Software Solution Availability) in this document.

Maximizing EA performance requires iteratively measuring application response by using open and industry benchmarks, analyzing bottlenecks, hypothesizing and refining improvements.

Open benchmarks: open workloads and micro-benchmarks that accurately model key characteristics of enterprise applications are required for open source developers to successfully measure and improve application performance (particularly in multi-tier configurations). Benchmarks should be deployable in standard test frameworks such as LTP and OSDL STP.

Quantifiable results: to support customer decisions and verify the performance of Data Center Linux distributions, industry benchmark runs identified in the ISV-1 list are necessary with large configurations to meet competitive pressure.

Performance analysis tools: to resolve bottlenecks and identify architectural limitations, improved performance analysis tools are necessary.

Given the fast-changing nature of the Linux kernel, frequent measurements will be necessary to confirm performance gains.

References

Linux Performance Projects:
Linux Technology Center: http://www.ibm.com/developerworks/linux/ltc
Linux Benchmark Suite Homepage: http://lbs.sourceforge.net

Benchmark examples:
http://www.sap.com/benchmark
http://www.oracle.com/apps_benchmark/index.html
http://www.siebel.com/products/performance_benchmark


ID Name of Capability Priority Level Category Maturity Level

P.OLTP≤4Procs Workloads: Online Transaction Processing (OLTP)—4 Processors or Less 2 Performance Edge: N/A; Application: N/A; DB/Content: Integrated

Description Single node 4 CPU or less OLTP benchmarks, for example, TPC-C, are the measure of performance for this capability.

ID Name of Capability Priority Level Category Maturity Level

P.OLTP>4Procs Workloads: Online Transaction Processing (OLTP)—Greater than 4 Processors 2 Performance Edge: N/A; Application: N/A; DB/Content: Usable

Description Single node greater than 4 CPU OLTP benchmarks, for example, TPC-C, are the measure of performance for this capability.

ID Name of Capability Priority Level Category Maturity Level

P.DSS≤4Procs Workloads: Decision Support System (DSS)—4 Processors or Less 2 Performance Edge: N/A; Application: N/A; DB/Content: Integrated

Description Single node 4 CPU or less DSS benchmarks, for example, TPC-H/R, are the measure of performance for this capability. Database audited benchmark results are required to move the DB/Content maturity higher.


ID Name of Capability Priority Level Category Maturity Level

P.DSS>4Procs Workloads: Decision Support System (DSS)—Greater than 4 Processors 2 Performance Edge: N/A; Application: N/A; DB/Content: Usable

Description Single node greater than 4 CPU DSS benchmarks, for example, TPC-H/R, are the measure of performance for this capability. Database audited benchmark results are required to move the DB/Content maturity higher.

ID Name of Capability Priority Level Category Maturity Level

P.ECommerce Workload—ECommerce 2 Performance Edge: Product Available; Application: Integrated; DB/Content: Product Available

Description SPECjAppServer benchmark results are the measure of performance for this capability.

ID Name of Capability Priority Level Category Maturity Level

P.Financial Workload—Financial (Trades) 2 Performance Edge: Mainline; Application: Integrated; DB/Content: Mainline

Description ECperf, Trade2 and Trade3 benchmark results are the measure of performance for this capability.


ID Name of Capability Priority Level Category Maturity Level

P.TunableParams Tunable Parameters 2 Performance Stable

Description This capability allows tunable parameters related to operating system structures that affect performance characteristics to be adjusted without having to reboot a system. The parameters should be documented, with information about how they affect performance.
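On Linux, most such tunables are already adjustable at run time through /proc/sys (or the sysctl(8) tool). A minimal sketch follows, assuming only the conventional dotted-name-to-path mapping; writing a tunable requires root privileges but no reboot, which is exactly the property this capability asks for.

```python
import os

def sysctl_path(name):
    """Map a dotted sysctl name to its /proc/sys file."""
    return "/proc/sys/" + name.replace(".", "/")

def read_sysctl(name):
    """Read the current value of a tunable from the running kernel."""
    with open(sysctl_path(name)) as f:
        return f.read().strip()

def write_sysctl(name, value):
    """Adjust a tunable on the live system (needs root; takes effect at once)."""
    with open(sysctl_path(name), "w") as f:
        f.write(str(value))

if __name__ == "__main__":
    # Read-only demonstration; skipped gracefully if the tunable is absent.
    if os.path.exists(sysctl_path("vm.swappiness")):
        print("vm.swappiness =", read_sysctl("vm.swappiness"))
```

The documentation requirement in the description maps to Documentation/sysctl/ in the kernel source tree, which describes what each parameter affects.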

ID Name of Capability Priority Level Category Maturity Level

P.CurrentTech Current Technology Implementations 2 Performance Product Available

Description Complete maturity for this capability focuses on Linux having ports to new technology, hardware, chipsets and so on, and includes Linux taking advantage of hardware performance features (for example, hyper-threading).

ID Name of Capability Priority Level Category Maturity Level

P.MeasureInfra Performance Measurement Infrastructure 2 Performance Mainline

Description This capability provides hooks to the operating system to pull performance data. It identifies important information needed for performance tuning, and it defines APIs that any tool can use to pull measurements. The capability documents how data is captured and what it means.
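In the absence of a single standardized measurement API, /proc is the hook most Linux tools pull performance data from. The sketch below parses the aggregate "cpu" line of /proc/stat; the field names follow the documented /proc/stat layout, while the busy-fraction reduction is an illustrative calculation, not a defined interface.

```python
import os

def parse_cpu_line(line):
    """Split a /proc/stat 'cpu' line into named jiffy counters, the raw
    material any measurement API would expose to tools."""
    fields = line.split()
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]
    return dict(zip(names, (int(v) for v in fields[1:1 + len(names)])))

def cpu_busy_fraction(sample):
    """Fraction of jiffies spent doing work (everything but idle and iowait)."""
    total = sum(sample.values())
    return (total - sample["idle"] - sample.get("iowait", 0)) / total

if __name__ == "__main__":
    if os.path.exists("/proc/stat"):
        with open("/proc/stat") as f:
            sample = parse_cpu_line(f.readline())
        print(f"cpu busy: {cpu_busy_fraction(sample):.1%}")
```

A defined API of the kind this capability describes would let any tool obtain such counters without hard-coding file formats, and would document how each counter is captured and what it means.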


Reliability, Availability and Serviceability (RAS)

Capabilities in the RAS category support greater reliability, availability and serviceability.


ID Name of Capability Priority Level Category Maturity Level

R.CRASH Crash Dump 1 RAS Mainline

Description When a system crashes, the OS kernel should be able to produce and save a retrievable image of the system at the moment of the crash. The image should be saved to permanent or transient storage (local or remote). The data should allow for identification of the following:

• the precise CPU action that caused the crash

• the context information for each CPU

• the different kernel data structures

• memory dumps

This is a post-mortem facility needed to determine the cause of a system crash and to assist in gathering data that might be fed back to the development/support teams to fix the root cause. This improves general system availability in the long run.

Usability The usability aspect of this capability requires further investigation.

References

Not all platforms are supported, and the solution has reliability problems, but it is in customer hands today in open distros.

Linux Kernel Crash Dump Project: http://lkcd.sf.net
LKCD provides a full set of dump targets, including network, memory, raw block devices (problematic) and IDE disks (experimental). The memory dump facility preserves memory across reboot.

Red Hat's Network Console and Crash Dump Facility: http://www.redhat.com/support/wpapers/redhat/netdump/
This is a separate version that splintered off from an earlier version of LKCD. The LKCD patches were rejected for inclusion in the mainline Linux 2.6.0 kernel, but parts of the project receive regular contributions. The network device changes necessary for network dumping have been accepted by the maintainer. Some discussion of using software suspend for crash dumping has occurred.

There is a new open kernel crash dump tool, Project Mini-Kernel Dump, based on kexec in the Linux 2.6 kernel: http://www.sourceforge.net/projects/mkdump/
At the moment of a crash, kexec boots a separate mini kernel, and it is this mini kernel that obtains the crash dump. The operation of the mini kernel crash dump is documented on the SourceForge project site cited above.


ID Name of Capability Priority Level Category Maturity Level

R.PkgChgHist Package Change History/Logging 2 RAS Product Available

Description This capability tracks the RPM history and which package versions and kernel are running.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this capability requires further investigation.
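RPM itself records transactions in its database, but the change history described here could be sketched as an append-only journal that also captures the running kernel at the time of each change. The file format and field names below are hypothetical, chosen only to illustrate the idea.

```python
import json
import os
import time

def record_package_change(log_path, package, old_version, new_version, kernel):
    """Append one package transition to the change-history log."""
    entry = {
        "time": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "package": package,
        "from": old_version,
        "to": new_version,
        "running_kernel": kernel,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")

def history(log_path, package):
    """Return all recorded transitions for one package, oldest first."""
    if not os.path.exists(log_path):
        return []
    with open(log_path) as f:
        entries = [json.loads(line) for line in f]
    return [e for e in entries if e["package"] == package]
```

In practice the kernel field would come from os.uname().release, and the journal would be written by the package manager's post-transaction hook.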

ID Name of Capability Priority Level Category Maturity Level

R.HwChgHist Hardware Change History/Logging 2 RAS Investigation

Description This capability checks which devices were present the last time a system was booted. It also checks for changes in any kind of BIOS data.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this capability requires further investigation.
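The boot-to-boot comparison this capability implies can be sketched simply: persist a device inventory at each boot and diff it against the previous one. The inventory shape (device identifier mapped to a description string) is an assumption made for illustration.

```python
def diff_inventory(previous, current):
    """Compare two boot-time device inventories (device id -> description)
    and report what was added, removed, or changed since the last boot."""
    added = {d: current[d] for d in current.keys() - previous.keys()}
    removed = {d: previous[d] for d in previous.keys() - current.keys()}
    changed = {d: (previous[d], current[d])
               for d in previous.keys() & current.keys()
               if previous[d] != current[d]}
    return added, removed, changed
```

A real implementation would build the inventories from sources such as PCI configuration space and DMI/BIOS tables, then log the three result sets for the service history.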

ID Name of Capability Priority Level Category Maturity Level

R.CustomHist Customization History/Logging 2 RAS Investigation

Description This includes tracking tunable parameters/changes, configuration files, and patches not included in packaging.

Usability The usability aspect of this capability requires further investigation.


ID Name of Capability Priority Level Category Maturity Level

R.AltTargetInst Install on Alternate Target 2 RAS Edge: N/A; Application: Completed; DB/Content: Completed

Description This capability allows installation on alternate target disks, and it includes installations of distributions and/or applications. The product stack is installed without rebooting until it is time for use. This minimizes downtime and allows a fast back-out strategy in case of upgrade problems.

Usability The usability aspect of this capability requires further investigation.

ID Name of Capability Priority Level Category Maturity Level

R.RevertInst Revert Installation/Patch Sets if Failure 2 RAS Edge: N/A; Application: Released; DB/Content: Released

Description This capability allows a system administrator to revert to an old distribution if the new distribution has problems, or to revert to a series of patches if those patches prove to be inadequate. In regard to the application maturity level, there is some availability through distributions.

Usability The usability aspect of this capability requires further investigation.


ID Name of Capability Priority Level Category Maturity Level

R.IntegrityCkg Version and Integrity Checking 2 RAS Completed

Description This ensures that new versions being installed are not corrupted and do not conflict with already-installed packages. Dependency checking, version control, integrity checking and security checking are all needed.

Usability The usability aspect of this capability requires further investigation.
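The integrity-checking portion can be illustrated with content digests: record a hash per file at install time, then compare before use. This is a minimal sketch using SHA-256; real package managers (for example, rpm with its verify mode) also check sizes, permissions, and cryptographic signatures.

```python
import hashlib

def digest(data):
    """Content hash used as the integrity fingerprint for one file."""
    return hashlib.sha256(data).hexdigest()

def verify(manifest, files):
    """Check each file's content hash against the recorded manifest;
    return the names whose contents no longer match (i.e. are corrupted)."""
    return [name for name, data in files.items()
            if manifest.get(name) != digest(data)]
```

At install time the manifest would be built from the package payload; any later mismatch indicates corruption or tampering and should block the installation.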

ID Name of Capability Priority Level Category Maturity Level

R.UpdNotific-Sec Update Notification—Security 2 RAS Completed

Description This capability enables automatic vendor notification of package updates to end customers. The edge maturity level is estimated based on this capability involving security, firewall and spam filters. The application and DB/Content maturity level is based on data corruption issues rather than security issues.

Usability The usability aspect of this capability requires further investigation.


ID Name of Capability Priority Level Category Maturity Level

R.UpdNotifi-DC Update Notification—Data Corruption 1 RAS Development

Description The large number of systems in a data center requires a mechanism for receiving notification of package updates for defects such as security exploits exposed from a buffer overrun bug.

In addition to receiving notification messages for package updates, the package management system needs to provide the ability to detect and notify the user about data corruption for already existing software.

Usability The usability aspect of this capability requires further investigation.

References Red Hat Enterprise Systems Management: http://www.redhat.com/software/rhn/

RPM provides the ability to verify the integrity of an install package: http://www.rpm.org/


ID Name of Capability Priority Level Category Maturity Level

R.KDEBUGGER Debugger—Kernel 1 RAS Product Available

Description DCL implementations require high quality tools for development, monitoring and problem analysis/resolution. Among the most important of these tools is a highly functional and feature-complete kernel debugger, accompanied by an OS environment that allows the debugger to be easily deployed.

Problem resolution in data center environments often takes place on "live" systems. In these situations the kernel debugger may be called upon to resolve difficult issues that take a long time to reproduce or are only reproducible on the production server with production code in place. The assumption that the system can be taken down to load a debug kernel may not be acceptable. Furthermore, the assumption that the problem can be reproduced and debugged on a debug kernel may also be false. Consequently, DCL implementations require support for loading the debugger on the production server without rebuilding the kernel or rebooting the system.

For kernel debugging, three main capabilities are needed, each with its own additional required features: "on-box" kernel debugging, "off-box" remote source-level debugging and, finally, the OS infrastructure to support loadable debug agents. We address these capabilities in that order.

• Kernel Debugger Capabilities (On-Box)

DCL implementations require a good on-box kernel debugger. A debug agent should be provided to support a variety of features and capabilities. These features include but are not limited to the following:

• Stepping features
• Breakpoint capabilities, including conditional breakpoints
• An API for extending the debugger
• The capability to examine CPU and system state
• Program screen viewing
• Expression evaluation
• Ability to display thread and process information
• Help screens
• Disassembly
• Stack trace
• Support for events such as breaking on module load/unload, thread create/destroy, and break on read/write (as CPU architectures permit)
• Memory manipulation
• The capability to change the focus CPU and read per-CPU state/memory
• Ability to reboot the system
• Catch kernel panics
• List loaded modules
• Symbol support
• Dump machine state to an external agent for offline analysis
• Access to CPU- and hardware-specific registers


• Debugger Capabilities (Off-Box—Remote Source Level)

DCL implementations also require the capability to support remote source-level debugging. A debug agent should be provided to support a variety of features and capabilities. These features include but are not limited to the following:

• Support for the GDB transport and its associated verbs and events (or another well-documented protocol)
• The ability to connect remotely via LAN, serial I/O and so on
• All the standard features of a remote source-level debugger, supported by the kernel agent

• Debugger Capabilities (Support for Loadable Debug Agents)

DCL implementations require the ability to examine the state of a "live" system. Kernel debugging, whether on-box or remote, must be possible on a live system, without the requirement to reboot or rebuild the kernel. To support this capability, a framework should be provided, at the OS exception level, that allows trusted and signed debug agents to be loaded by root and hooked into the exception handling framework, thus supporting on-the-fly loading of kernel debuggers.

Usability The usability aspect of this capability requires further investigation.

References

Linux Kernel Level Source Debugger (KGDB): http://kgdb.linsyssoft.com/index.html
GDB, GNU Project Debugger: http://www.gnu.org/software/gdb/gdb.html
Kernel Dynamic Probes (dprobes): http://dprobes.sourceforge.net/
Kernel Debugging with Kprobes: http://sourceware.org/systemtap/kprobes

Maturity level is based on NLKD: http://forge.novell.com/modules/xfmod/project/?nlkd

ID Name of Capability Priority Level Category Maturity Level

R.AppDEBUGGER Debugger—Application 2 RAS Product Available

Description The edge maturity level for this capability is based on the fact that it currently doesn’t support robust threading. There are C++ issues and many other issues.

Usability The usability aspect of this capability requires further investigation.


ID Name of Capability Priority Level Category Maturity Level

R.DynamicTracer Dynamic Tracer 1 RAS Integrated

Description Reliability is required on enterprise systems. Here, "reliability" means not only availability and serviceability but also a means of quickly investigating system failures so that the findings can result in bug fixes. A tracer is an important and useful tool for providing this information. A Linux kernel tracer should be a fully featured tracing system for the Linux kernel.

• When a system failure occurs, the tracer can collect and save data for analysis of behavior in the kernel.

• On a live system, the tracer can collect and save data for evaluation of kernel performance.

• Events (trace points) can be selected dynamically.

• The modules (handlers) that collect kernel data can change dynamically.

• Trace data can be recorded cyclically in a buffer (in memory), so collection can be continuous.

• Trace data can be recorded in a file (on disk).
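The behaviors listed above — dynamically selected trace points, a cyclic in-memory buffer, and on-demand flushing to disk — can be modeled in a few lines. This is a user-space toy to make the semantics concrete, not a kernel tracer; class and method names are invented for illustration.

```python
from collections import deque

class FlightRecorderTracer:
    """Toy model of a kernel-style tracer: events are recorded cyclically
    in a fixed-size in-memory buffer and can be dumped to disk on demand."""

    def __init__(self, capacity=1024):
        self.buffer = deque(maxlen=capacity)   # cyclic: oldest entries drop off
        self.enabled_events = set()            # trace points selected dynamically

    def enable(self, event):
        self.enabled_events.add(event)

    def disable(self, event):
        self.enabled_events.discard(event)

    def record(self, event, **data):
        """Record an event only if its trace point is currently enabled."""
        if event in self.enabled_events:
            self.buffer.append((event, data))

    def dump(self, path):
        """Flush the current buffer contents to a file on disk."""
        with open(path, "w") as f:
            for event, data in self.buffer:
                f.write(f"{event} {data}\n")
```

In a real kernel tracer the record step runs at the instrumented trace point, the buffer is per-CPU to avoid contention, and enabling/disabling happens without a reboot, as the description requires.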

Usability This capability allows an OS to intercede in the following situation: a problem occurs at one point in time, but the actual failure happens much later. To debug this situation, a mechanism that is non-burdensome resource-wise is required to trace what happened and, if necessary, to focus on a particular subsystem. The tracing mechanism needs to be highly controllable, so you can specify what you want to capture, and where. The mechanism lives in the kernel, but it is not enabled all the time. In addition, a good dynamic tracing tool can be used without requiring a reboot.

Existing solutions have fixed places where the reporting can occur, and these places are set in advance (whereas the Linux kernel tracer could be turned on without reboot). LTT, for example, does not go far enough in this regard. What is needed is DTrace-like: a language to write expressions that can capture stacks and locks and find information. Kprobes and dprobes are the existing Linux tracing tools, and they provide the building blocks of this capability. One of the problems with dprobes is that it is expensive in terms of the number of "hot points"; solving this would likely mean extending dprobes and kprobes. DTrace, on another commercial UNIX platform, offers a free-form kernel tracing capability with a high level of control for choosing which "parts" of the system should be monitored.


References

SystemTap project: http://sourceware.org/systemtap/

Linux Kernel State Tracer Project: http://lkst.sourceforge.net (as of 2.6.12, included in some Asian distributions)
The Linux Kernel State Tracer (LKST) provides flexible and extensible logging facilities. LKST records kernel information as trace data at events in the Linux kernel. LKST not only cyclically records data into a buffer as a flight recorder, but also saves it to disk. LKST users can customize events and event handlers (modules), and buffering can change dynamically. Users can use LKST data for analysis of behavior in the kernel, analysis of state transitions for a process, evaluation of kernel performance, and so on. Additionally, LKST cooperates with LKCD. Example events include process context switch, send signal, exception, memory allocation, send packet, and so on.

Linux Trace Toolkit Project:
Linux Tips and Tricks: http://www.patoche.org/LTT/
Opersys Inc.: http://www.opersys.com/LTT/
LTT provides its user with all the information required to reconstruct a system's behavior during a certain period of time. LTT includes the kernel components required for tracing and the user-level tools required to view the traces. The features support the Real-Time Application Interface (RTAI), dynamic creation and logging of custom events in kernel and in user space, custom formatting of custom events, and so on. (From the LTT web site.)

Linux Technology Center:
Dynamic Probes (dprobes): http://dprobes.sourceforge.net/
Kernel Probes (kprobes): http://sourceware.org/systemtap/kprobes
DTrace: http://www.sun.com/bigadmin/content/dtrace/

USENIX paper summarizing DTrace: http://www.sun.com/bigadmin/content/dtrace/dtrace_usenix.pdf
NPTL Trace Tool Project: http://sourceforge.net/projects/nptltracetool/


ID Name of Capability Priority Level Category Maturity Level

R.HdwrFPFL Hardware Fault Prediction and Fault Location 1 RAS Integrated

Description In order to increase system availability, software should detect hardware that is close to failure. In this scenario, hardware can be replaced before it fails, time spent repairing damage is reduced, and the possibility of catastrophic data loss is reduced. DCL implementations should provide means for kernel and user space applications to collect and collate system-provided data that can be used to locate or predict hardware faults. To predict and locate faults, this feature uses statistical processing of intermittent or non-deterministic hardware malfunctions. The data gathering includes, but is not limited to, the following sources:

• Memory parity errors

• Sensor readings of fan speeds, CPU/disk/device temperatures, voltages, and so on

• Statistics from the different CPUs and devices on their performance estimations

The data can be used to predict a trend or detect a possible fault or behavioral change (for example, if a disk drive's temperature rises and the temperature of the enclosing cabinet has not risen, a failure might be forthcoming). When properly logged, the data can also be used to identify what has failed, and why. To complete fault prediction and location, two functions are necessary:

• A collection mechanism that continuously watches hardware and collects information about the hardware (system) condition into a log file. The kernel piece of this is exposing common software counters (for example, network errors).

• An analysis mechanism that need not reside in the kernel. The mechanism statistically analyzes the log file in order to detect an error and its location.

This feature is very important for the database tier and important for the application tier. It is slightly important for the edge tier, since it is assumed that edge server-based solutions inherently guard against a hardware fault actually affecting the end user of the application.
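The disk-versus-cabinet temperature example can be made concrete with a simple trend test: fit a slope to each sensor's recent samples and flag a drive that heats up while its enclosure does not. The threshold here is an arbitrary illustration; a production predictor would use tuned, customer- or history-derived limits.

```python
def slope(samples):
    """Least-squares slope of evenly spaced samples (change per interval)."""
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def disk_overheating(disk_temps, cabinet_temps, threshold=0.5):
    """Flag a drive whose temperature trends upward while the enclosing
    cabinet's does not — the forthcoming-failure example given above."""
    return slope(disk_temps) > threshold and slope(cabinet_temps) <= threshold
```

This corresponds to the analysis mechanism described above: it runs in user space over the collected sensor log and needs no kernel support beyond the counters and sensor readings themselves.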


Usability Effective usability is critical for effective implementation. The following examples illustrate effective usability.

• Serious errors should be easily distinguishable from minor issues.

• Information regarding failing components should be provided in a timely, easily recognizable fashion.

• Notification mechanisms should be included.

• Identification and resolution of problems (such as hot swapping components) should be highly integrated.

• There should be customizable or rule-based error limits that are determined by customer experience and/or historical data.

Currently some errors are reported in logs, but they are difficult to find and harder yet to analyze. This subject needs further definition and research. The feature is dependent upon system and event-logging enhancements. The existence of hot-swap CPU and memory tools is necessary to integrate failure identification with proactive response. The kernel includes proper support for reading i2c sensors, whose different drivers are being merged in.

References

Related projects include the following:
Autogen Automated Event Management Project information: http://autogen.sourceforge.net/autoevents.html
SWATCH active logfile monitoring tool: http://swatch.sourceforge.net
Disk drive impending failures: monitoring storage systems using the Self-Monitoring, Analysis and Reporting Technology (SMART) system supported by many recent ATA and SCSI hard disks: http://smartmontools.sourceforge.net/
Event logging for enterprise-class systems: http://evlog.sourceforge.net/
Linux Diagnostic Tools project: http://linux-diag.sourceforge.net/
OpenIPMI: http://openipmi.sourceforge.net
Management interface to obtain Linux ECC information from memory modules, for Linux 2.2 kernels.


ID Name of Capability Priority Level Category Maturity Level

R.SW-FAULT-ISOLATION Software Fault Location Identification 1 RAS Investigation

Description The complex nature of data center operations requires an operating system to be able to isolate software failures rapidly. An operating system suitable for data center operations requires the ability to trace software faults, enabling fault isolation.

Usability Linux already provides mechanisms for logging and persisting system software messages, but it requires deep knowledge of the specific subsystem to determine the source of a fault.

This capability associates faults, surfaced through mechanisms such as event logging or printf, with customer- or service-usable information.

References Needs further definition and research.


ID Name of Capability Priority Level Category Maturity Level

R.LKSNAP Live Snapshot—Kernel Level 1 RAS Edge: N/A; Application: Mainline; DB/Content: Mainline

Description DCL implementations should provide facilities to take live snapshots of the running kernel image, without disturbing system operation. These live kernel snapshots should provide a complete image of the system status at the time of the snapshot. In many aspects, this feature is similar to the requirement described in R.CRASH. The intention behind this capability is to provide ways to dissect a system, as in the following examples:

• Some nonfatal inconsistency has been detected. The feature looks for preemptive ways to predict and fix it.

• The feature provides optimization analysis with real, running-system data.

• The feature provides a primitive for implementing system checkpointing, which adds fast recovery and service replication.

Usability The usability aspect of this capability requires further definition and research.

References Linux Kernel Crash Dump (LKCD) project: http://lkcd.sf.net

ID: R.LPSNAP
Name of Capability: Live Snapshot—Process Level
Priority Level: 2
Category: RAS
Maturity Level: Edge: N/A; Application: Mainline; DB/Content: Mainline

Description This capability provides live snapshots at the process level, without kernel shutdown or crash dump.

Usability The usability aspect of this capability requires further definition and research.


ID: R.ProactiveMon
Name of Capability: Proactive System Health Monitoring
Priority Level: 2
Category: RAS
Maturity Level: Edge: N/A; Application: Investigation; DB/Content: Investigation

Description This capability provides the ability for an OS to proactively detect (and avoid, when possible) potential errors. By monitoring different system components and gathering information, this capability can check for data consistency and hardware reliability. If the OS detects an inconsistency, further analysis determines whether the nature and type of the inconsistency can be handled so that an error will not occur. This capability is not meant to be an event log analyzer but to go one step further in the prevention of potential errors by detecting inconsistencies before events reporting them are logged. Environmental monitoring is not considered part of this capability.

Examples of system components that could benefit from such a capability are memory, storage devices, kernel structures, file systems and so on.

This capability should run in the background without affecting the performance of the system.

Usability In general, this capability is considered passive. However, depending on the system component affected, human intervention might be required at some point (for example, when a component is determined to be faulty and is therefore isolated, replacing that component requires human intervention).

References See also in this document: the Environmental Monitoring capability and R.HdwrFPFL (Hardware Fault Prediction and Fault Location).


ID: R.RemoteSrvc
Name of Capability: Remote Serviceability
Priority Level: 2
Category: RAS
Maturity Level: Completed

Description This addresses remote servicing of a machine outside of the local network. It includes the ability to access a system outside the normal user access mode (for example net console and serial console). In addition, considerations are being made for servicing systems without their OSs running.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this capability requires further definition and research.


ID: R.PerfMonitor
Name of Capability: Performance Monitoring
Priority Level: 1
Category: RAS
Maturity Level: Mainline

Description This capability enables service personnel to investigate the performance of a system. Through the collection of runtime execution and resource statistics, performance monitors and counters help users identify performance bottlenecks in application code and in their usage of kernel facilities and I/O resources (disk, network, and so on). The performance monitoring tools should support a variety of system architectures, such as distributed, clustered, NUMA, MPP and SMP systems. This capability requires a stable, complete set of performance data. For example, the items that the iostat command reports should be improved: iostat on UNIX-based systems reports more items than iostat on Linux systems. Figures such as “%w” (the percentage of time that transactions are waiting for service) and “%b” (the percentage of time the disk is busy) are very useful for noticing when disk I/O is in a critical condition.

Usability The DCL performance monitoring tools deal with the same information supplied by vmstat, iostat, netstat, and so forth. This information (or these figures) should be organized and integrated into an easily viewable user interface. The appearance of the UI could be similar to the “top” command, with a ranking of bottlenecks reported, so users can identify system status at a glance. These tools should provide the ability to monitor each status in real time and the ability to analyze statistical information over a specified term. Recording of live data for later investigation is also needed.

References SYSTAT utilities home page: http://perso.wanadoo.fr/sebastien.godard/ RRDtool: http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/ Performance Co-Pilot: http://oss.sgi.com/projects/pcp/
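As an illustration of the kind of figures discussed above, the following sketch derives an iostat-style “%b” (disk busy) value from two samples of the per-device statistics in /proc/diskstats. The helper names are ours, and the field layout assumed is the standard Linux one, where the tenth per-device statistic (index 12 on a diskstats line) is the cumulative milliseconds the device spent doing I/O:

```python
def ms_doing_io(diskstats_text, device):
    # Parse a /proc/diskstats snapshot and return, for the named device,
    # the cumulative milliseconds spent doing I/O (field 10 of the
    # per-device statistics, index 12 on the line).
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 13 and fields[2] == device:
            return int(fields[12])
    raise KeyError(device)

def busy_percent(sample_before, sample_after, device, interval_ms):
    # iostat-style %b: the fraction of the sampling interval during
    # which the device was busy servicing I/O requests.
    delta = ms_doing_io(sample_after, device) - ms_doing_io(sample_before, device)
    return 100.0 * delta / interval_ms
```

In practice a monitoring tool would read /proc/diskstats twice, a known interval apart, and feed both snapshots to busy_percent.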

ID: R.SrvrReplacePrcdrl
Name of Capability: Reproducible Server Replacement—Procedural
Priority Level: 2
Category: RAS
Maturity Level: Completed

Description This capability addresses server level replacement: bringing servers in/out within a group by following procedures.


ID: R.SrvrReplace-Auto
Name of Capability: Reproducible Server Replacement—Automated
Priority Level: 2
Category: RAS
Maturity Level: Development

Description This capability enables server level replacement, bringing servers in/out within a group in an automated fashion, simplifying the effort. It assumes similar hardware.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this capability requires further definition and research.

ID: R.SysCkptSrvrReplace
Name of Capability: System Checkpoint/Server Replacement
Priority Level: 2
Category: RAS
Maturity Level: Edge: N/A; Application: Investigation; DB/Content: Investigation

Description This enables an added capability (beyond reproducible server replacement) that allows a full system state image/restore.

Usability The usability aspect of this capability requires further investigation.


ID: R.HOTSWAPPCI
Name of Capability: Hot Swap: I/O Bus Level—PCI, PCI-X, cPCI
Priority Level: 1
Category: RAS
Maturity Level: Edge: N/A; Application: Completed; DB/Content: Completed

Description DCL implementations should allow the plugging and unplugging of entire PCI I/O bus hierarchies without taking down the system, properly handling the disconnection of each device that was attached to a now-disconnected bus and the probing and connection of a device present in a recently attached bus. The system should provide means to handle ordered device unplugs, where the bus and all its child buses and devices are placed in quiesce mode until all current transactions are flushed and completed. As well, the system should support surprise extractions as initiated by the system motherboard on platforms that support an ejection capability, where devices and/or entire buses can disappear from the system without previous notice to the software stack.

This requirement is needed so that it is possible to increase the system resources (as provided by added devices), and so that the different devices connected to the system can be serviced or repaired without increasing the system downtime. Also, due to the wide variety of hardware that can be connected at the PCI level, it is impossible to predict or assume maintenance schedules will fit all of them. Thus, it is simpler, less expensive and more cost-effective to be able to take a whole bus tree offline for service: for example, a fiber-channel connection that needs to be serviced due to a defective cable, or a NIC that provides the connection to an iSCSI server.

A system administrator should be able to identify a component to be replaced/unplugged, and then automatically quiesce all software uses of that device before unplugging (for example, unmount file systems, disable network connections, and so on).

Usability The usability aspect of this capability requires further investigation.

References Linux hotplugging: http://linux-hotplug.sourceforge.net/ Linux PCI hotplugging: http://linux-hotplug.sourceforge.net/?selected=pci


ID: R.HOTSWAPSCSI
Name of Capability: Hot Swap: I/O Bus Level—SCSI
Priority Level: 1
Category: RAS
Maturity Level: Edge: N/A; Application: Completed; DB/Content: Completed

Description DCL implementations should allow the plugging and unplugging of an entire SCSI I/O subsystem without taking down the system, properly handling the disconnection of each SCSI device connected to a particular SCSI bus or adding an entire SCSI bus and all SCSI devices connected to it. This requirement is needed so it is possible to increase the system resources (as provided by added devices) and so the different SCSI devices can be added, removed or serviced without increasing the system downtime. A system administrator should be able to identify a component to be replaced/unplugged, and then automatically quiesce all software uses of that device before unplugging (for example, unmount file systems, disable volume managers, and so on).

Usability Although the information needed to achieve this capability is in the kernel, to make this seamless, one needs a user-level tool to locate all the devices on a bus and to remove them individually. It is relatively easy to trigger a scan of a bus to discover all the devices on it. Work is underway to handle file systems.

References Linux SCSI hotplugging: http://linux-hotplug.sourceforge.net/?selected=scsi Linux Hotplugging: http://linux-hotplug.sourceforge.net/ Mailing list: [email protected] Mailing list archive: http://marc.theaimsgroup.com/?l=linux-scsi
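The usability note above says it is relatively easy to trigger a scan of a bus to discover the devices on it. On Linux this is conventionally done by writing a wildcard triple to the host adapter's sysfs scan attribute. The sketch below illustrates that idea; the helper name is ours, and the sysfs root is parameterized so it can be exercised against a mock directory tree rather than a live system (where root privileges would be required):

```python
import os

def rescan_scsi_host(host, sysfs_root="/sys/class/scsi_host"):
    # Writing "- - -" (wildcard channel, target, and LUN) to the host's
    # 'scan' attribute asks the kernel to probe for newly attached SCSI
    # devices on that adapter.
    scan_path = os.path.join(sysfs_root, host, "scan")
    with open(scan_path, "w") as f:
        f.write("- - -")
```

On a real system this would be invoked as rescan_scsi_host("host0") after cabling in a new device.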


ID: R.HOTSWAPPCI-EX
Name of Capability: Hot Swap: I/O Bus Level—PCI Express
Priority Level: 1
Category: RAS
Maturity Level: Edge: N/A; Application: Completed; DB/Content: Completed

Description DCL implementations should allow the plugging and unplugging of entire PCI Express I/O bus hierarchies without taking the system down, properly handling the disconnection of each device that was attached to a now-disconnected bus and the probing and connection of a device present in a recently attached bus. The system should provide means to handle ordered device unplugs, where the bus and all its child buses and devices are placed in quiesce mode until all current transactions are flushed and completed. As well, the system should support surprise extractions as initiated by the system motherboard on platforms that support an ejection capability, where devices and/or entire buses can disappear from the system without previous notice to the software stack.

This requirement is needed so that it is possible to increase the system resources (as provided by added devices) and so that the different devices connected to the system can be serviced or repaired without increasing the system downtime. Also, due to the wide variety of hardware that can be connected at the PCI level, it is impossible to predict or assume maintenance schedules will fit all of them. Thus, it is simpler, less expensive and more cost-effective to be able to take a whole bus tree offline for service: for example, a fiber-channel connection that needs to be serviced due to a defective cable, or a NIC that provides the connection to an iSCSI server.

A system administrator should be able to identify a component to be replaced/unplugged, and then automatically quiesce all software uses of that device before unplugging (for example, unmount file systems, disable network connections and so on).

Usability The usability aspect of this capability requires further investigation.

References Linux hotplugging: http://linux-hotplug.sourceforge.net/


ID: R.HOTSWAPUSB
Name of Capability: Hot Swap: I/O Bus Level—USB
Priority Level: 2
Category: RAS
Maturity Level: Edge: N/A; Application: Completed; DB/Content: Completed

Description DCL implementations should allow the plugging and unplugging of an entire USB subsystem without taking down the system, properly handling the disconnection of each USB device connected to a particular USB bus or adding an entire USB bus and all USB devices connected to it.

Usability The usability aspect of this capability requires further investigation.

References Linux hotplugging: http://linux-hotplug.sourceforge.net/?selected=usb Linux USB project: http://www.linux-usb.org/

ID: R.HOTSWAPiSCSI
Name of Capability: Hot Swap: I/O Bus Level—iSCSI
Priority Level: 2
Category: RAS
Maturity Level: Edge: N/A; Application: Completed; DB/Content: Completed

Description DCL implementations should allow the plugging and unplugging of an entire iSCSI subsystem without taking down the system, properly handling the disconnection of each iSCSI device connected to a particular iSCSI bus, or adding an entire iSCSI bus and all iSCSI devices connected to it.

Usability The usability aspect of this capability requires further investigation.

References Linux-iscsi project: http://sourceforge.net/projects/linux-iscsi/ UNH-iSCSI Initiator and Target for Linux project: http://sourceforge.net/projects/unh-iscsi/


ID: R.HOTSWAPIB
Name of Capability: Hot Swap: I/O Bus Level—InfiniBand
Priority Level: 2
Category: RAS
Maturity Level: Edge: N/A; Application: Investigation; DB/Content: Investigation

Description DCL implementations should allow the plugging and unplugging of an entire IB subsystem without taking down the system, properly handling the disconnection of each IB device connected to a particular IB bus, or adding an entire IB bus and all IB devices connected to it.

Usability The usability aspect of this capability requires further investigation.

ID: R.HOTSWAPSATA
Name of Capability: Hot Swap: I/O Bus Level—Serial Advanced Technology Attachment (S-ATA)
Priority Level: 2
Category: RAS
Maturity Level: Edge: N/A; Application: Usable; DB/Content: Usable

Description DCL implementations should allow the plugging and unplugging of an entire Serial ATA (S-ATA) subsystem without taking down the system, properly handling the disconnection of each S-ATA device connected to a particular S-ATA bus, or adding an entire S-ATA bus and all S-ATA devices connected to it.

Usability The usability aspect of this capability requires further investigation.


ID: R.HOTSWAPFire
Name of Capability: Hot Swap: I/O Bus Level—Firewire
Priority Level: 2
Category: RAS
Maturity Level: Edge: N/A; Application: Investigation; DB/Content: Investigation

Description DCL implementations should allow the plugging and unplugging of an entire Firewire subsystem without taking down the system, properly handling the disconnection of each Firewire device connected to a particular Firewire bus, or adding an entire Firewire bus and all Firewire devices connected to it.

Usability The usability aspect of this capability requires further investigation.

ID: R.HOTSWAPCompIO
Name of Capability: Hot Swap: Component Level—I/O
Priority Level: 2
Category: RAS
Maturity Level: Completed

Description This capability considers whether a system can swap disk drives and cards (for example, NICs).

Usability The usability aspect of this capability requires further investigation.

References See the Hot Swap I/O Bus architecture items for Hot Swap at the I/O bus level, beginning with R.HOTSWAPPCI (Hot Swap: I/O Bus Level—PCI, PCI-X, cPCI).


ID: R.HOTSWAPMEM-RM
Name of Capability: Hot Swap: Component Level—Memory Remove
Priority Level: 1
Category: RAS
Maturity Level: Edge: N/A; Application: Usable; DB/Content: Usable

Description DCL implementations should allow disabling of specific areas of physical memory. The kernel should be asked to quiesce and remap the areas of physical memory that are being taken offline. The kernel should be able to do this while the system is running normally, without taking the system down. Consequences derived from memory pressure caused by the removal are acceptable. Note that this requirement does not impose the handling of surprise memory removals. While software deconfiguration of memory may be useful, this capability adds value to a data center system under the following conditions:

• When the underlying hardware platform provides a dynamic memory reconfiguration capability via a partition manager

• When the hardware provides hot plugging/removal of physical memory modules, non-interleaved memory cells, means to map the memory modules to physical memory areas, and (optionally) memory fault prediction.

Component-level memory reconfiguration allows the user to physically remove hardware. The following cases illustrate situations when this would be helpful:

• An ECC memory failure is predicted, based on platform prediction capabilities.

• An ECC memory failure is handled by using a dynamic partitioning system to reconfigure memory sizes on an OS partition.

Additionally, if necessary, memory component failure and identification capabilities should allow failing memory to be replaced without taking the system offline.

Usability The usability aspect of this capability requires further investigation.

References There has been a lot of activity on this project, particularly around hot removal of user-space memory. The infrastructure to support this effort is in place, and memory migration patches have been accepted. The capability to hot-add a memory module (R.HOTSWAPMEM-ADD) has been completed and is in the mainline kernel. The first attempts at creating patches to reduce memory fragmentation to assist in removal were met with resistance; other approaches to memory removal that use memory pools have been submitted. Mel Gorman's patches to reduce memory fragmentation (two approaches): http://www.skynet.ie/~mel/projects/patches/brokenout/ SourceForge project, Linux Hotplug Memory support: http://sourceforge.net/projects/lhms
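To make the sysfs side of memory hotplug concrete, the sketch below reads and changes the state of hot-pluggable memory blocks. It assumes the standard Linux layout under /sys/devices/system/memory, where each block exposes a 'state' file; the helper names are ours, and the root path is parameterized so the logic can be exercised against a mock tree (a live system would require root, and offlining can fail if pages cannot be migrated):

```python
import glob
import os

def memory_block_states(sysfs_root="/sys/devices/system/memory"):
    # Each hot-pluggable memory block appears as memoryN with a 'state'
    # file reading 'online' or 'offline'.
    states = {}
    for path in sorted(glob.glob(os.path.join(sysfs_root, "memory[0-9]*"))):
        state_file = os.path.join(path, "state")
        if os.path.exists(state_file):
            with open(state_file) as f:
                states[os.path.basename(path)] = f.read().strip()
    return states

def request_offline(block, sysfs_root="/sys/devices/system/memory"):
    # Writing 'offline' asks the kernel to quiesce and vacate the block;
    # on a real system this may fail (EBUSY) under memory pressure.
    with open(os.path.join(sysfs_root, block, "state"), "w") as f:
        f.write("offline")
```

A replacement workflow would offline the failing block, physically swap the module, then write 'online' to bring the new memory into use.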


ID: R.HOTSWAPMEM-ADD
Name of Capability: Hot Swap: Component Level—Memory Add
Priority Level: 1
Category: RAS
Maturity Level: Edge: N/A; Application: Completed; DB/Content: Completed

Description DCL implementations should allow enabling of specific areas of physical memory. The kernel should be able to do this while the system is running normally, without taking the system down. The additional physical memory should be usable after it is added.

The capability to add memory is used in data centers when the growth of system activity is causing memory pressure. Adding additional memory on-line allows systems to resume expected performance levels without sacrificing availability. It can also be used in conjunction with a memory remove capability to proactively replace memory that is close to failure.

References This capability is now in the mainline kernel. See R.HOTSWAPMEM-RM for related capabilities of memory remove.


ID: R.HOTSWAPCPU
Name of Capability: Hot Swap: Component Level—CPU
Priority Level: 1
Category: RAS
Maturity Level: Edge: N/A; Application: Completed; DB/Content: Completed

Description DCL implementations should allow enabling and disabling of one or more CPUs (at least one has to remain online) while the system is running, properly handling all the necessary state changes, without any loss of information or downtime. Note that this requirement does not impose the handling of surprise CPU removals. The ability to disable and enable processors should enable the actual component removal of disabled CPUs on hardware platforms that support CPU hot plugging.

In the data center, a system that provides CPU fault prediction should be able to take processors offline if they are likely to fail and, if electronically possible, replace a failed CPU, thus improving overall system availability. Another use would be to dynamically increase the processing power of an SMP system by adding an additional CPU, either electronically or through dynamic repartitioning.

Usability The usability aspect of this capability requires further investigation.

References At the time of this writing, the code that handles the common cases is complete and available in the Linux 2.6 kernel. There are still special cases that are under investigation, such as removing the boot CPU or dealing with mixed speed CPUs. Linux Hotplug CPU Support project: http://sourceforge.net/projects/lhcs OSDL Hotplug SIG home page: http://developer.osdl.org/maryedie/HOTPLUG/
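The Linux 2.6 interface mentioned above exposes CPU hotplug through sysfs: each removable CPU has an 'online' file under /sys/devices/system/cpu. The sketch below (helper names ours; root path parameterized so the logic can be tested against a mock tree rather than a live system, where root would be required) enumerates the hot-pluggable CPUs and toggles one:

```python
import glob
import os

def hotpluggable_cpus(sysfs_root="/sys/devices/system/cpu"):
    # CPUs exposing an 'online' control file can be taken offline; the
    # boot CPU (one of the special cases noted above) typically has no
    # such file and so is omitted from the result.
    cpus = {}
    for path in sorted(glob.glob(os.path.join(sysfs_root, "cpu[0-9]*"))):
        ctl = os.path.join(path, "online")
        if os.path.exists(ctl):
            with open(ctl) as f:
                cpus[os.path.basename(path)] = f.read().strip() == "1"
    return cpus

def set_cpu_online(cpu, online, sysfs_root="/sys/devices/system/cpu"):
    # Writing '0' asks the kernel to quiesce and offline the CPU;
    # writing '1' brings it back into the scheduler.
    with open(os.path.join(sysfs_root, cpu, "online"), "w") as f:
        f.write("1" if online else "0")
```

A fault-prediction agent could call set_cpu_online("cpu3", False) when a processor reports correctable-error rates above threshold.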


ID: R.HOTSWAPNODE
Name of Capability: Hot Swap: Component Level—Node
Priority Level: 1
Category: RAS
Maturity Level: Edge: N/A; Application: Released; DB/Content: Released

Description DCL implementations should allow the addition and removal of entire nodes while a system is running. The implementation should simultaneously handle all necessary state changes, without any loss of information or downtime. In this context, nodes are defined as container devices that include CPU, memory and/or I/O devices.

While node hotplug takes advantage of existing individual hotplug mechanisms (such as CPU hotplug, memory hotplug and I/O hotplug), it is important to consider the specifics of the node and to add and remove nodes in a resource-aware manner. For example, when you add a node that contains CPU and memory, the memory should be added first so that the CPUs can allocate data in that memory while they are being added. Otherwise a CPU needs to allocate it in memory on another node, which might cause performance loss.

In a data center, if a failure is predicted, a system with component fault prediction capability should be able to take a whole node with a faulty component offline and replace the node, assuming the hardware can support it. This scenario improves overall system availability.

Usability The usability aspect of this capability requires further investigation.

References There is currently a Linux Hotplug Node Support project that is actively working on the implementation of node hotplug on Linux 2.6.x. They have adopted ACPI as the hardware manipulation interface: http://lhns.sourceforge.net/ OSDL Hot Plug SIG home page: http://developer.osdl.org/maryedie/HOTPLUG/


ID: R.Comp-Notifi
Name of Capability: Component Notification: Mem/IO/Power Failure, Temperature
Priority Level: 1
Category: RAS
Maturity Level: Completed

Description Data center hardware contains a rich set of intelligent components (such as baseboard management controllers) that can autonomously manage resources, as well as simple sensors. This management layer provides raw data for all aspects of platform operation (such as the on-device temperature sensors commonly found on data center-class processors).

Providing a complete, high-availability solution requires listening for events (CPU temperature threshold exceeded, fan failure and so on) from these devices. A complete solution also requires taking appropriate actions in these situations, such as alerting the system administrator, offloading service to a hot standby machine, and so on. This capability calls for the low-level operating system functionality to query for and receive events as noted above.

Usability Linux currently provides mechanisms for pulling raw sensor data, be it direct sysfs access to i2c devices on a Linux 2.6-based kernel or through the lm_sensors libraries on 2.4- and 2.6-based kernels. Linux also has the ability to communicate with intelligent components via the IPMI driver, but it lacks a generalized framework that would allow HA middleware and/or applications to be written for all platforms.

The Service Availability Forum (SAF) has published the Hardware Platform Interface (HPI) specification, which defines a C-callable API for just this purpose. For a Linux operating system to meet the usability needs demanded of the data center, an implementation of the SAF HPI specification should be provided.

References More information on SAF (along with a free copy of the HPI specification) is available at Service Availability Forum: http://www.saforum.org.

The OpenHPI project has created an open source implementation of the HPI specification for devices that support the IPMI interface. Other plug-ins can be created to support other management frameworks using the HPI interface: http://openhpi.sf.net
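As a small illustration of the raw-sensor side described above, the sketch below polls the sysfs hwmon interface (the Linux 2.6 successor to direct lm_sensors access), where each device exposes tempN_input files in millidegrees Celsius. The helper name is ours, and the root path is parameterized so the logic can be exercised against a mock tree:

```python
import glob
import os

def read_temperatures(hwmon_root="/sys/class/hwmon"):
    # Collect every tempN_input reading, converting from the sysfs
    # convention of millidegrees Celsius to degrees. A monitoring daemon
    # could poll this and raise events when thresholds are exceeded.
    temps = {}
    for dev in sorted(glob.glob(os.path.join(hwmon_root, "hwmon*"))):
        for sensor in sorted(glob.glob(os.path.join(dev, "temp*_input"))):
            with open(sensor) as f:
                millideg = int(f.read().strip())
            key = os.path.basename(dev) + "/" + os.path.basename(sensor)
            temps[key] = millideg / 1000.0
    return temps
```

A full HPI implementation goes far beyond this (events, thresholds, resource discovery), but the polling loop shows where the raw data comes from.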


ID: R.FastBoot
Name of Capability: Fast System Boot
Priority Level: 1
Category: RAS
Maturity Level: Mainline

Description Linux startup time varies widely between different architectures, setups and distributions, and in most cases, the boot and reboot times are too long. To improve uptime, RAS requires that a boot or a reboot be as fast as possible. This capability tries to ensure the following:

• Time from power-on/reset to kernel boot is minimized: this is a BIOS issue that is out of the scope of DCL, but collaboration with BIOS and/or platform vendors would only help. It should be possible to skip this step on reboot by launching a new OS kernel from inside the rebooting kernel.

• Time from kernel boot to user space initialization (init) is minimized: probing of resources can prove to be a time-consuming task, especially when it is necessary to enumerate and reset devices and buses. Reboots should use resource topology information passed forward by the rebooting kernel to avoid probing. Parallel resource initialization should be considered as well.

• Time from user space initialization (init) to full system online is minimized: ISVs and distributors need to improve their system start-up scripts to reduce the time until the full system is online. Parallelization should be considered.

Usability The usability aspect of this capability requires further investigation.

ID: R.FASTInst
Name of Capability: Fast Install
Priority Level: 2
Category: RAS
Maturity Level: Mainline

Description This capability enables a speedy system install process, which addresses a serviceability issue (as opposed to a manageability issue).

Usability This item is a non-passive item, meaning interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this capability requires further investigation.


ID: R.JournalFS
Name of Capability: Journaling File Systems
Priority Level: 2
Category: RAS
Maturity Level: Completed

Description This capability checks whether, after a system failure, a system can recover its file system within a time period that meets the customer's availability requirements.

Usability The usability aspect of this capability requires further investigation.


ID: R.ReliableWrites
Name of Capability: Reliable File System Writes
Priority Level: 1
Category: RAS
Maturity Level: Completed

Description Many applications used in data centers guarantee that “write” system calls have been completed, or written to the physical device [1]. Thus data center file systems should allow for reliable file system writes. This capability is critical for a database management system (DBMS). A DBMS frequently saves log records to files in order to guarantee committed results of transactions. Before the database can commit a database transaction, it needs to know that the log record has been written to the physical media.

A well-defined approach is required for application developers to guarantee writes for buffered and un-buffered I/O and for synchronous or asynchronous I/O. Since Linux 2.4 (since Linux 2.6 for asynchronous I/O), an application can guarantee a file system write has been completed in any of the ways listed below. The approach varies, depending upon whether you do buffered or non-buffered (direct) I/O and whether you do synchronous [2] or asynchronous I/O. In all cases, the write is guaranteed only when the call sequence returns to the application [3].

• write()/fsync() on fd opened without O_SYNC or O_DIRECT (buffered, synchronous I/O)

• write() on fd where file system is mounted with option “sync” ( for file systems supporting the sync option)

• write() on fd opened O_SYNC (buffered, synchronous I/O)

• write() on fd opened O_DIRECT (un-buffered, synchronous I/O)

• async write() / wait for completion / fsync() (buffered, asynchronous I/O)

• async write() / wait for completion on fd O_SYNC (buffered, asynchronous I/O)

• async write() / wait for completion on fd O_DIRECT (un-buffered, asynchronous I/O)

For completion of writes, fdatasync can be used in place of fsync, as long as the metadata does not change (for example, when the file is pre-allocated before writes occur). Pre-allocating files is a common approach for databases. The O_DSYNC open() flag is defined as an alternative for synchronous I/O, but in Linux 2.6.9, it is implemented as the O_SYNC flag. When mmap is used to accelerate I/O, msync with the MS_SYNC flag is used instead of fsync/fdatasync.
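Two of the approaches in the list above can be sketched briefly (the helper names are ours; shown here with the POSIX calls as exposed by Python's os module):

```python
import os

def durable_write(path, data):
    # First approach above: write() followed by fsync() on an fd opened
    # without O_SYNC or O_DIRECT (buffered, synchronous I/O). fsync()
    # returns only after the data has been pushed to the device.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)
    finally:
        os.close(fd)

def durable_write_osync(path, data):
    # O_SYNC approach: every write() on the fd is itself synchronous,
    # so no separate fsync() call is needed.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o600)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
```

A DBMS writing log records would use a pattern like durable_write before acknowledging a transaction commit; with pre-allocated log files, fdatasync could replace fsync as noted above.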

Usability This guarantee should be guarded with regular regression testing.

[1] “Guarantee a write completion” means the operating system has issued a write to the I/O subsystem, and the device has returned an affirmative response. Once an affirmative response is sent, recovery from power-down without data loss is the responsibility of the I/O subsystem.

[2] By asynchronous, we mean that the application proactively determines when the I/O completes. Synchronous means that once the write returns, the write buffer can be reused immediately.

[3] If a system crashes before the appropriate sequence returns to the application, the modification is not always guaranteed to have been completed by the OS. For example, partial writes may occur. It may be possible for file systems to support features to resolve this problem.


ID: R.AtomicFile
Name of Capability: Atomic File System Operation
Priority Level: 2
Category: RAS
Maturity Level: Usable

Description In this context, an “atomic file system operation” satisfies the following: if the system crashes before a file system can complete a system call, then the last uncompleted modification should be either completely executed or not executed at all. Journaled file systems usually provide atomic file system operations: metadata/data operations are recorded in the journal and played back to recover the file system in the event of failure. However, for any journaled file system, there are edge cases that can result in partial writes (to data or metadata) when a system crashes before a write returns. Linux file systems need to document these cases and provide enough control that applications needing atomicity can avoid these situations. Rigorous test cases must be developed to prove atomicity is provided under the circumstances documented.

Note: Even with this capability, not all file corruption is prevented. Bad hardware can write over the drives in ways that cannot be anticipated in advance. In particular, hardware can write over the journals that are used to make such a guarantee. There is no substitute for backups and log archiving for application availability and reliability. Databases should be able to recognize the last good transaction logged and roll forward to that point.
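Independent of the journaled-file-system mechanisms described above, applications that need all-or-nothing file updates commonly use a write-to-temporary, fsync, then rename idiom, relying on the atomicity of rename() within a single file system. The sketch below (helper name and temporary-file naming are ours) illustrates it:

```python
import os

def atomic_replace(path, data):
    # Write the new contents to a temporary file, force them to the
    # device with fsync, then rename over the target. Readers observe
    # either the old contents or the new, never a partial write.
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)
    finally:
        os.close(fd)
    os.rename(tmp, path)  # atomic within a single file system
```

This is an application-level complement to, not a substitute for, the file-system-level atomicity guarantees this capability calls for.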

Usability This guarantee should be guarded with regular regression testing. File systems providing this capability should document options that limit it.

References Project DOUBT: http://www.osdl.org/lab_activities/lab_projects/active_projects/display_single.html?uid=1268 ReiserFS: http://www.namesys.com/v4/v4.html Log-Structured File System Project, NILFS: http://www.nilfs.org/


ID: R.FO-NetSrvcs
Name of Capability: Failover—Network Services
Priority Level: 2
Category: RAS
Maturity Level: Product Available

Description This is a failover and load-balancer type of service.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this capability requires further investigation.

ID Name of Capability Priority Level Category Maturity Level

R.FOStateful Failover—Stateful 2 RAS Edge: N/A Application: Investigation DB/Content: Investigation

Description This capability is important for market segments such as the financial and airline segments. It provides Linux support and services for application layer switchover.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this capability requires further investigation.

ID Name of Capability Priority Level Category Maturity Level

R.ORDBMS Failover—Open RDBMS Dependent 2 RAS Edge: N/A Application: Completed DB/Content: N/A

Description This provides support for Open Source database-style failover.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this capability requires further investigation.


ID Name of Capability Priority Level Category Maturity Level

R.FOPropRDBMS Failover—Proprietary RDBMS Dependent 2 RAS Edge: N/A Application: Product Available DB/Content: Product Available

Description This provides support for proprietary database-style failover.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this capability requires further investigation.


ID Name of Priority Level Category Maturity Level Capability

R.Multipath Multipath I/O 1 RAS Edge: N/A Application: N/A DB/Content: Mainline

Description A common need for data center systems is the ability for devices to address the same physical disk over multiple paths. Multipathing allows NUMA systems to have a direct path to each device in the storage fabric, reducing the need for DMA across an interconnect. This leads to better IO performance overall. Multipathing also provides increased bandwidth to storage arrays by allowing the use of the full bandwidth of several host bus adapters when connected to a storage array or storage fabric. And, perhaps most importantly, multipathing allows device failures to be isolated and identified and IO to be rerouted around failing device components. A multipath solution should work well with the Persistent Storage Device Naming, Volume Management and 4096-SD solutions (other Priority One DCL capabilities).

Usability The solution should find all paths to the same storage device, without human intervention or the need to specify paths in a configuration file. User tools need to be well-documented, and they should not limit the capability of underlying support. For example, if load balancing across paths is available with several different algorithms, the tools should allow selection from all the different algorithms available. The solution should provide well-documented ways (for example, APIs) for I/O sub-system vendors to provide custom features that they typically support. The investigation of the usability aspect of this capability continues within the OSDL Storage Networking special interest group ([email protected]).
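The multipath-tools package referenced below expresses such policies in an /etc/multipath.conf file; a minimal illustrative fragment (the option values are examples, not recommendations):

```
# /etc/multipath.conf -- illustrative fragment for multipath-tools
defaults {
        polling_interval        10          # seconds between path health checks
        path_grouping_policy    multibus    # spread I/O across all usable paths
        failback                immediate   # fail back as soon as a path recovers
}
```

Per-device sections can override these defaults, which is one way I/O sub-system vendors can expose the custom features mentioned above.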

References An open MPIO solution is under development with user tools. This MPIO is built upon hotplug, udev, Device Mapper (DM) and the DM library. Email regarding development: [email protected] Multi-path tools, user tools: http://christophe.varoqui.free.fr Linux hotplug: http://linux-hotplug.sourceforge.net/ Linux hotplug development mail list archive: http://marc.theaimsgroup.com/?l=linux-hotplug-devel&w=2&r=1&s=documentation&q=b For gap analysis of some solutions in this area, see the OSDL Storage Networking SIG status page and look for the Multipath I/O focus area: http://developer.osdl.org/maryedie/STORAGE_NETWORKING An on-going gap analysis: http://developer.osdl.org/maryedie/STORAGE_NETWORKING/MPIO/Gap_analysis.html Integration testing with large numbers of LUNs with udev, LVM, and Multipath I/O: http://www.osdl.org/cgi-bin/mpio_wiki.pl?Integration


ID Name of Capability Priority Level Category Maturity Level

R.Backup-GB Backup Solution—GB Range 2 RAS Edge: N/A Application: Completed DB/Content: Completed

Description This capability provides a well-tested backup solution for up to 1 gigabyte of application or database data.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this capability requires further investigation.

ID Name of Capability Priority Level Category Maturity Level

R.Backup-TB Backup Solution—TB Range 2 RAS Edge: N/A Application: N/A DB/Content: Integrated

Description This capability provides a well-tested backup solution for up to 1 terabyte of application or database data.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this capability requires further investigation.


ID Name of Capability Priority Level Category Maturity Level

R.VolMngr Volume Manager 2 RAS Product Available

Description This provides simple mirroring capability, software RAID and disk replacement. The DB/Content maturity level is based on the expectation of HW RAID more than SW RAID.

Usability The usability aspect of this capability requires further investigation.

References Integration testing with large numbers of LUNs with udev, LVM, and Multipath I/O: http://www.osdl.org/cgi-bin/mpio_wiki.pl?Integration

ID Name of Capability Priority Level Category Maturity Level

R.Dynamic-IPC-Configuration Shared Memory & IPC Parameter Changes without Reboot 1 RAS Edge: N/A Application: Investigation DB/Content: Investigation

Description Data center applications require flexible IPC configuration, but not at the cost of availability. This requires run time-configurable limits for message queues, shared memory and semaphores.

Usability The usability aspect of this capability requires further investigation.

References The Linux 2.4 and 2.6 kernels provide sysctl controls for message queues, shared memory and semaphores. Needs further definition and research.
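As the reference notes, the 2.4 and 2.6 kernels already expose these limits through sysctl; a sketch of the relevant keys follows (the values shown are illustrative only):

```
# Illustrative /etc/sysctl.conf fragment; "sysctl -p" applies it without
# a reboot, as does "sysctl -w kernel.shmmax=..." for a single key.
kernel.shmmax = 4294967295       # largest shared memory segment, in bytes
kernel.shmall = 268435456        # total shared memory, in pages
kernel.msgmnb = 65536            # maximum bytes per message queue
kernel.sem = 250 32000 100 128   # semaphores: semmsl semmns semopm semmni
```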


ID Name of Capability Priority Level Category Maturity Level

R.ModuleReplace Replacing Modules without Reboot 2 RAS Edge: N/A Application: Completed DB/Content: Completed

Description This capability ensures that failed kernel modules can be replaced without the need to reboot the system, thus avoiding unnecessary downtime.

Usability The usability aspect of this capability requires further investigation.

ID Name of Capability Priority Level Category Maturity Level

R.MemExhaustion Graceful Handling of Memory Exhaustion 2 RAS Usable

Description An operating system in the data center is expected either (1) to have an option to prevent over-commitment of virtual memory, or (2) to handle an out-of-memory (OOM) condition gracefully. By gracefully, we mean "by using approaches as good as or better than those used by operating systems currently found in the data center." Regarding all of the maturity levels, the Linux 2.4 releases are clearly deficient, providing neither (1) nor (2). Linux 2.6 provides (1) with strict over-commit. Appropriate data-center-quality solutions to (2) are under investigation.
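The strict over-commit option that Linux 2.6 provides for (1) is also controlled through sysctl; an illustrative fragment (the ratio shown is an example, not a recommendation):

```
# Illustrative sysctl settings for strict over-commit on Linux 2.6.
vm.overcommit_memory = 2    # 2 = never over-commit: refuse allocations beyond
                            #     swap + overcommit_ratio percent of RAM
vm.overcommit_ratio = 80    # percent of physical RAM counted in the limit
```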

Usability The usability aspect of this capability requires further investigation.


Manageability Capabilities in this category address the administrative, day-to-day operation of a system, covering both active and passive activities.

Software Management

ID Name of Capability Priority Level Category Maturity Level

M.LocalSWInst Local Software Stack Install and Update (Pull) 2 Manageability Product Available

Description This capability provides the ability to pull from remote sources to install or update the OS, packages and third party software (the whole stack). It includes upgrade reversal, dependency checking, conflict resolution and security checking.

Usability The usability aspect of this capability requires further investigation.

ID Name of Capability Priority Level Category Maturity Level

M.RemoteSWInst Remote Multi-System Software Stack Install, Update or Replication (Push) 2 Manageability Integrated

Description This provides the ability to remotely install, update or replicate a system, including the OS, packages and third party software (the whole stack). It includes the ability to do upgrade reversal, dependency checking, conflict resolution and security checking. Multi System Install does not apply to the DB/Content maturity level, but remote install is important.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this capability requires further investigation and research.


ID Name of Capability Priority Level Category Maturity Level

M.3PI Common Interface for Third Party Integration to Install Tools 1 Manageability Development

Description Independent software vendors (ISVs) face the challenge of porting their tools to multiple platforms and operating systems. If there is no assurance that each Linux distribution shares a common method of installation and a similar run time environment, ISVs incur costs for testing installation of their applications on multiple Linux distributions.

The Linux Standard Base (LSB) is addressing the need for all Linux distributions to adhere to requirements that will provide the basics of a common runtime environment. It is also addressing some aspects of how to specify and build a package for installation on Linux. However, this is not enough. Currently, vendors work to fit their third party applications into the installation package format used by a particular distribution, such as RPM. This means that a third party ISV needs to create several versions of an application install module so it can be installed on specific distributions.

There is a need to define and provide a common installation format for ISVs who need to distribute their software applications. This format should work across all distributions, so that third party application developers can create one install package and trust that their application can be installed on all compliant distributions. The common installation package format needs to address problems that can arise from conflicts in file names and versions of dependent files (such as shared archived libraries). The installation package should be able to detect such conflicts and provide a mechanism to resolve them.

A set of application programming interfaces (APIs) should be specified to allow the creation or porting of installer tools that adhere to the API. The APIs will allow an installer tool to register certain files belonging to a package. Installer tools that are created or ported should have the capability to install and de-install packages while maintaining the integrity of the overall system (the tools should follow the capabilities defined in the package management section). The installer package should adhere to the conventions specified by the Filesystem Hierarchy Standard version 2.3 to ensure that application files are installed in the proper directories. The package should also check for dependencies and name conflicts.

Usability The installer package should display a status window that displays the installation process status. An administrator installing the application would find this helpful.

References The Linux Standard Base (LSB) has defined some of these principles: http://www.linuxbase.org


ID Name of Capability Priority Level Category Maturity Level

M.SWPM Software Package Management 1 Manageability Mainline

Description Existing software package management systems for Linux distributions don’t sufficiently address the needs of mission-critical enterprise systems. Current package managers are specific to their distributions, and therefore a system administrator would face different procedures in a heterogeneous Linux environment. Software dependency management is complicated under Linux, and if a product is missing a dependency, it usually falls upon the system administrator to manually install that dependency before proceeding with the intended package install.

Data centers need a software package management system that can perform the following:

• Install to an alternate target without rebooting.

• Revert an installation in case of failure, or delete a package altogether (uninstall).

• Update and manage distribution across a set of machines.

• List installed packages on a particular platform, providing package location and version at a minimum.

• Search for packages to determine if they’re installed on a system, and if so, provide package locations and versions.

• List the contents of a package to be installed, without actually installing the package (package preview).

• Provide package integrity-checking for installations and reverted installations.

It’s highly desirable that the package management system work across different Linux distributions. This may involve defining a framework with a common set of APIs that all distributions can adhere to. It also involves setting a standard package format, so that any conforming package is assured of installing successfully on any distribution that contains the defined framework.

Usability Many of the package managers are command-line driven. A common graphical user interface needs to be created for a package manager to be successful in meeting the user's expectations.

References Gentoo Portage: http://gentoo-portage.com/ RPM: http://www.rpm.org/ dpkg: http://packages.debian.org/dpkg Autopackage project: http://autopackage.org/ This is an example of an open solution that works across multiple distributions. There are many other examples of distribution-specific installation tools that support online package installation. Some examples include Linux Auto YaST and Red Hat Network.


ID Name of Capability Priority Level Category Maturity Level

M.SWPM-1 Software Package Management 1—Local Software Stack Install and Update 1 Manageability Mainline

Description Software package management should enable an installer to pull source or binary packages from remote locations and install or update them. The software may include OS packages, distribution releases and third party software. The process to install or update software should minimize downtime of the system. Features that support installations and upgrades should interact with and support significant capabilities for identifying and tracking the software installed on the system.

A package management system should provide remote software upgrade capabilities that include provisions for version compatibility and dependency checking at the software package file level. The upgrade process should allow the coexistence of new and old executables, shared libraries, configuration files and data. It’s reasonable that this capability be implemented as a combination of the installer and the chosen package management system.

The package manager should also perform conflict identification and resolution for packages installed or updated. When downloading sources from remote locations, the package manager should have the capability to perform security checking to ensure the content installed is authentic and not corrupted. Consideration is also given to the ability to log dates, times, changes and the identity of the administrator who performed the install or upgrade.
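As one example of the security-checking requirement, the yum tool listed in the references can verify GPG signatures on packages pulled from remote repositories; an illustrative configuration fragment (the repository name and URL are hypothetical):

```
# Illustrative /etc/yum.conf fragment
[main]
gpgcheck=1                                  # verify package signatures

[updates]
name=Distribution updates
baseurl=http://updates.example.org/linux/   # hypothetical repository URL
gpgcheck=1                                  # also check this repository
```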

Usability The usability aspect of this capability requires further investigation.

References Distribution-specific installation tools: RPM, Linux Auto YaST, Portage, dpkg Linux at Duke project: http://www.linux.duke.edu/projects/yum/


ID Name of Capability Priority Level Category Maturity Level

M.SWPM-2 Software Package Management 2—Reversion of Software Installs and Updates 1 Manageability Usable

Description Administrators need the capability to easily revert software version packages if they determine a current install or update of a software package is unacceptable for their data center needs.

A software package management system should provide mechanisms that allow manual rollback to a previous version of the software without having to reinstall the previous version. In addition, the package manager should be able to perform a complete uninstall, with dependency checking and conflict identification/resolution, so that the software remaining on the system doesn’t have dependency conflicts after deinstallation of a software package.
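One existing mechanism of this kind is RPM's repackage-and-rollback support, described in the Linux Journal reference below; a sketch of how it is typically enabled (the macro and option names are from RPM of that era and should be verified against the distribution in use):

```
# Illustrative /etc/rpm/macros fragment: repackage anything RPM erases,
# so that an upgrade can later be reverted without the original media.
%_repackage_all_erasures 1

# With repackaging enabled, a recent upgrade can be undone with, e.g.:
#   rpm -Uvh --rollback '1 hour ago'
```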

Usability The usability aspect of this capability requires further investigation.

References Rollback failed RPM transactions: http://linuxjournal.com/article/7034 Distribution-specific installation tools: RPM, Portage, dpkg Linux at Duke yum project: http://www.linux.duke.edu/projects/yum/


ID Name of Capability Priority Level Category Maturity Level

M.SWPM-3 Software Package Management 3—Remote Multi-system SW Stack Install 1 Manageability Usable

Description There is a need to perform software package installation (and removal) on multiple systems in a data center. An administrator should be able to control installations on several systems, or remove software packages from many systems, from a single access point.

A package management system should provide software remote upgrade mechanisms that support multiple versions of applications and kernels on image target nodes. The new version should be installable without interfering with the execution of the older version or with the older version's configuration, logs and other files or information.

An image server on which all of the code to be installed resides can be used. An image target is any node installed or updated from an image server. Each target node can be customized, using post-install scripts or other techniques, to allow node-specific data such as hostnames, IP addresses, and application configuration information to be configured on the remotely installed nodes. This data is maintained on the server and automatically configured.

Usability Usability is improved by the ability to access and control the install/upgrade from a single access point that can communicate with the source of the packages and the destination nodes where the packages are to be installed.

References System Installation Suite (SIS): http://sisuite.sourceforge.net/ Distribution-specific installation tools: Kickstart, Linux Auto YaST


ID Name of Capability Priority Category Maturity Level Level

M.SWPM-4 Software Package Management 4—Package Content Identification 1 Manageability Stable

Description Administrators need to know the current software versions on their managed systems or whether or not a specific version of a package (or even a file) is installed. It’s also desirable to determine package content before actually installing on a system. The software package manager should be able to provide capabilities to search for packages installed on a system and provide information about the package, such as

• The location of the compressed package file

• Information about the package, such as its version and dependencies

• The date it was installed

In addition, the installer or update feature of the package manager needs to provide a method whereby the contents of a package can be shown without actually installing its files on the system.

Usability A graphical user interface with a search function would be ideal for improving the usability of this capability. In addition, a command line option where an administrator can search for a package name would be beneficial. Administrators need to assess the security risk of the patch.

References Distribution-specific installation tools: RPM, Portage, dpkg Linux at Duke yum project: http://www.linux.duke.edu/projects/yum/


ID Name of Capability Priority Level Category Maturity Level

M.Config-Mgmt Configuration Management (Expanded to Full Stack) 1 Manageability Usable

Description Configuration Management provides the ability to manage the overall operating environment in an automated way. To better manage the environment, automatic tracking of changes needs to be done. The scope of this capability is for the complete software stack including the operating system, patches, packages and third-party products.

This item includes the following two specific functions:

• Setting software configuration parameters

• Tracking changes to software configuration parameters, recording the date, time, change made, and identity of the user

Usability The capabilities in all three tiers are mixed and need further investigation for usability. In every tier, the ability to set configuration parameters is very important. The need for historical tracking differs by tier: it is least important for the edge tier, more important for the applications tier, and most important for the DB/Content tier.

References These references are for tracking change items:

System Installation Suite (SIS): http://sisuite.sourceforge.net/ Distribution-specific installation tools: Kickstart, YaST Cfengine, a configuration engine: www.cfengine.org


Hardware Management

ID Name of Capability Priority Category Maturity Level Level

M.Vol-Mgmt Volume Management 1 Manageability Completed

Description Volume management provides a way to manage storage by assigning a logical interface to the underlying physical disks. Most volume managers provide logical naming, storage aggregation and snapshots.

On enterprise systems, managing disks directly is error-prone and it limits the size of the file system or database to the size of the device.

Usability User tools need to be well-documented. User tools should not limit the capability of underlying support. For example, if the kernel supports segment sizes greater than 512 bytes, the user tools should not limit segment size to 512.

References There are two open source volume managers: LVM2 and EVMS. These solutions are based on Device Mapper (DM). EVMS can also use the MD driver. For gap analysis of some solutions in this area, see the OSDL Storage Networking SIG status page and look for the Volume Management focus area: http://developer.osdl.org/maryedie/STORAGE_NETWORKING The gap analysis based on Device Mapper functionality: http://developer.osdl.org/maryedie/STORAGE_NETWORKING/VOLMGMT/Gap_analysis_VMGT.html

ID Name of Capability Priority Level Category Maturity Level

M.Config-Discovery Device Configuration Discovery 2 Manageability Completed

Description This capability is concerned with boot time discovery and dynamic discovery (for example, hot plug).

Usability The usability aspect of this capability requires further investigation.


ID Name of Capability Priority Category Maturity Level Level

M.PSDN Persistent Storage Device Naming 1 Manageability Completed

Description This provides device recognition and persistent device naming. This capability is important in any large environment utilizing a cluster, SAN or device reconfiguration. Although any solution should cover all types of devices, disks are Priority One, and other types are Priority Two.

File system naming and applications are associated with the contents of a storage device, not an I/O address. On enterprise systems, many RAS options like Multipath, Clusters and Volume Management can cause physical device addresses to change. Therefore there needs to be a dynamic and consistent association between device naming and storage contents.
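The udev project listed in the references implements this association by matching device properties rather than probe order; an illustrative rule (the serial number and name are examples, and matching syntax varies between udev versions):

```
# Illustrative /etc/udev/rules.d/ rule: name a disk by the serial number
# reported through sysfs, so the name tracks the storage contents even if
# the device's I/O address changes after a SAN or multipath reconfiguration.
BUS=="scsi", SYSFS{serial}=="3600508b400105e21", NAME="oradata%n"
```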

Usability The application maturity differs from the edge and DB/Content maturities due to the number of devices served by the latter two tiers.

References Network devices can already be assigned names. disklabel is shipped by some vendors, and other work is done by CGL and OpenGFS. udev: active project that replaces devfs in Linux 2.6 and later: http://www.kernel.org/pub/linux/utils/kernel/hotplug/udev.html Another (inactive) project based on Linux 2.6 is User Space System Device Enumeration: Der Keiler Linux: http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-10/5715.html SourceForge.net: http://sourceforge.net/projects/usde For gap analysis and use cases for some solutions in this area, see the OSDL Storage Networking SIG status page and look for the Persistent Storage Device Name focus area: http://developer.osdl.org/maryedie/STORAGE_NETWORKING Integration testing with large numbers of LUNs with udev, LVM, and Multipath I/O: http://www.osdl.org/cgi-bin/mpio_wiki.pl?Integration


System Management

ID Name of Capability Priority Level Category Maturity Level

M.Remote-Console Remote Console Access 2 Manageability Mainline

Description An example implementation of this feature is netconsole.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this item requires further investigation and research.

ID Name of Capability Priority Level Category Maturity Level

M.Jobs Job Management 2 Manageability Stable

Description This capability enables system administrators to schedule a series of activities (jobs) at particular times or on assigned frequencies, in a predefined order.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this item requires further investigation and research.


ID Name of Capability Priority Level Category Maturity Level

M.Problem Problem Management 2 Manageability Edge: Investigation Application: Investigation DB/Content: N/A

Description This capability provides tools for system administrators to perform problem determination.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this item requires further investigation and research.

ID Name of Capability Priority Category Maturity Level Level

M.Remote Remote Management 1 Manageability Edge: Stable Application: Stable DB/Content: Integrated

Description Edge maturity level is based on tracking open source solutions, and application and DB/Content maturity level is dependent on tools available from database suppliers on Linux.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this item requires further investigation and research.


ID Name of Capability Priority Level Category Maturity Level

M.Network Network Management 2 Manageability Mainline

Description This capability provides tools for system administrators to manage enterprise networks.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this item requires further investigation and research.

ID Name of Capability Priority Level Category Maturity Level

M.User User Management 2 Manageability Stable

Description This capability includes single sign-on.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this item requires further investigation and research.


ID Name of Capability Priority Level Category Maturity Level

M.Events Log Monitoring/Event Notification/Agents 1 Manageability Mainline

Description This capability is concerned with event notification for system administrators. When an OS detects abnormalities in a log, the OS notifies the system administrator via any one of a number of notification mechanisms, including email or paging in real time. The administrator can specify which logs are monitored and which situations trigger notification. By monitoring logs in real time, this tool is useful for detecting unauthorized access, among other things. Commercial products satisfy the capability (for all tiers).
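The Swatch tool listed in the references is driven by a small configuration file of pattern/action pairs; an illustrative fragment (the pattern, recipient, and exact action syntax are examples and vary by Swatch version):

```
# Illustrative ~/.swatchrc for the Swatch log monitor: alert the
# administrator in real time when authentication failures appear.
watchfor /authentication failure|FAILED LOGIN/
        echo bold
        mail addresses=root,subject="possible unauthorized access"
```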

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately.

References At least one comprehensive log-monitoring tool is available for Linux. Swatch: http://swatch.sourceforge.net/

ID Name of Capability Priority Level Category Maturity Level

M.Assets Resource Management—Asset Management 2 Manageability Development

Description This capability enables system administrators to globally track system components, including serial numbers and history of removal and insertion.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability aspect of this item requires further investigation and research.


ID Name of Capability Priority Level Category Maturity Level

M.Usage Resource Management—Usage Tracking 2 Manageability Usable

Description This capability provides tools for system administrators to track system usage via many dimensions (user, application, department and so on), usually with the goal of back-charging departments.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. As Linux systems integrate with heterogeneous environments, it is necessary for typical usage tracking tools in data centers to have the ability to manage Linux systems.

ID Name of Capability Priority Level Category Maturity Level

M.LB-Tracking Resource Management—Load Balancing and Tracking 2 Manageability Edge: Integrated Application: Integrated DB/Content: N/A

Description This capability provides tools for system administrators to control and track resource utilization via many dimensions (disk, CPU and memory). This enables efficient use of resources during peak loads, and it provides an understanding of resource utilization within the enterprise.

Usability The usability aspect of this capability requires further investigation. As Linux systems integrate with heterogeneous environments, it is necessary for typical load balancing and tracking tools in data centers to have the ability to manage Linux systems.


ID Name of Capability Priority Level Category Maturity Level

M.Workload Workload Management 1 Manageability Mainline

Description Workload management utilizes one or more mechanisms to manage multiple applications on a single machine. The primary goal is to manage process resource consumption in terms of CPU, memory, I/O blocks and eventually TCP/IP traffic. The specific goal is to provide resource sharing between groups of processes. It allows administrators to divide resources between applications without having to physically partition the system.

System administrators can add workload management capabilities to a system to gain better control of system workload. This is particularly needed in larger, more complex operating environments that run multiple workloads with differing priority schemes. The nice program cannot set group priority schemes: it operates on a single process, and it only allows that process's existing priority to be raised or lowered. The nice program is therefore not sufficient for complete workload management. A workload manager should dynamically modify scheduler and VM behavior for a group of processes according to administrator-defined parameters. It should allow the priority of a process group to be established and the relative priority of the process groups on the machine to be controlled dynamically. To accomplish this, a solution can include virtual machines or logical partitions as well as a logical grouping of processes for prioritization. Specific techniques include User-Mode Linux (UML) and workload managers such as the one the CKRM project is developing. Machine virtualization is another approach to workload management: the Xen project is an emerging open source virtual machine monitor with Linux support that can execute multiple virtual machines, each running its own operating system.
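The group-versus-process distinction can be sketched with the POSIX priority calls that Python exposes: nice adjusts one process, while setpriority with PRIO_PGRP applies one value to a whole process group. This is only a minimal illustration of group-level priority control, not a full workload manager like CKRM:

```python
import os

def reprioritize_group(pgid: int, niceness: int) -> None:
    """Apply one niceness value to every process in a process group.
    This is the group-level control that nice(1) lacks: nice adjusts
    only a single process, while a workload manager acts on groups."""
    os.setpriority(os.PRIO_PGRP, pgid, niceness)

# Demote our own process group by up to 5 niceness steps (raising
# niceness needs no privilege; 19 is the maximum on Linux).
current = os.getpriority(os.PRIO_PROCESS, 0)  # 0 means the calling process
target = min(current + 5, 19)
reprioritize_group(os.getpgrp(), target)
```

A real workload manager goes further than a static niceness: it continuously rebalances CPU, memory, and I/O shares between groups against administrator-defined targets.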

Usability The usability aspect of this capability requires further investigation.

References Several major vendors have products for workload management. There is also an open source project, Class-based Kernel Resource Management (CKRM); the CKRM website lists the project as being in development. User-Mode Linux Kernel Homepage: http://user-mode-linux.sourceforge.net/ Class-Based Linux Kernel Resource Management Project (CKRM): http://sourceforge.net/projects/ckrm/ For details on an interesting, somewhat related project, see the CPUSET proposal, used for controlling CPU placement: http://lwn.net/Articles/50690/ Several commercial solutions exist for machine virtualization; one open source virtualization project is Xen. “Linux: Xen 2.0 Released”: http://kerneltrap.org/node/4168

ID Name of Capability Priority Level Category Maturity Level

M.Process Process and Resource Monitoring 2 Manageability Product Available

Description This capability provides process and resource monitoring through tools such as GTOP, SAR, and other monitoring utilities.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability of this item requires further investigation and research.

ID Name of Capability Priority Level Category Maturity Level

M.ProcessPlus Enhanced Process and Resource Monitoring 1 Manageability Stable

Description A system administrator needs to check whether systems are healthy. Because many processes typically run on a system, a mechanism is needed that can detect a sudden process abort or the exhaustion of resources. This capability provides process monitoring tools that can determine whether critical processes on a system are healthy. When a problem is encountered, the tools notify the system administrator of the process abort and the reason for it. An rc script is generally executed only once and does not detect a later process abort, so today the administrator must proactively check whether a given process is running, either manually with the ps command or with a script that re-executes the process. Enhanced process and resource monitoring is essential to replace this inconvenient manual approach. Enhanced resource monitoring tools also determine whether enough system resources are available; when a problem is encountered, for example when a system is out of disk space, the system administrator can be notified automatically. The following functions are needed to monitor processes and resources:

• A web-based user interface, which reduces the effort to learn

• Real-time monitoring of current status, history records and statistical information

• Visualization of this information

• Notification of error detection via email or paging

• Monitoring against service level agreements (SLAs)

• Monitoring of SNMP version 1, 2, and 3 devices

Integrated tools like ZABBIX are now available, but ZABBIX does not yet provide all of the functions needed. With improvements, it could be a candidate solution for this capability.
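The two basic checks such a tool automates, process liveness and resource exhaustion, can be sketched as follows; the function name and alert format are invented for illustration, and real tools add history, visualization, and email or pager notification on top of this core:

```python
import os
import shutil
import subprocess
import sys

def health_check(pids, paths, min_free_frac=0.10):
    """Return alert strings for dead processes and nearly full
    filesystems -- the checks an administrator would otherwise run
    by hand with ps and df."""
    alerts = []
    for pid in pids:
        try:
            os.kill(pid, 0)            # signal 0 only tests existence
        except ProcessLookupError:
            alerts.append(f"process {pid} is not running")
        except PermissionError:
            pass                       # exists, but owned by another user
    for path in paths:
        usage = shutil.disk_usage(path)
        if usage.free / usage.total < min_free_frac:
            alerts.append(f"{path} is low on disk space")
    return alerts

# Demo: our own pid is alive; a reaped child's pid is reported dead.
child = subprocess.Popen([sys.executable, "-c", ""])
child.wait()
alerts = health_check([os.getpid(), child.pid], [])
```

In practice the pid list would come from pid files or a service registry, and the alerts would feed a notification mechanism rather than a return value.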

Usability To meet this capability, a mechanism that displays the status of processes and resources is needed.

References There are simple ways to check processes and resources, such as the GTOP and SAR monitoring tools, and ZABBIX is an integrated tool for this issue (see M.Process, Process and Resource Monitoring). ZABBIX: http://www.zabbix.com/ daemontools: http://cr.yp.to/daemontools.html

ID Name of Capability Priority Level Category Maturity Level

M.SysError System Error Management 2 Manageability Product Available

Description This capability provides tools for system administrators to track and manage system errors.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. As Linux systems integrate with heterogeneous environments, it is necessary for typical error tracking tools in data centers to have the ability to manage Linux systems.

Development

ID Name of Capability Priority Level Category Maturity Level

M.Tools Scripting—Development Tools 2 Manageability Completed

Description This capability utilizes Perl and the System Administrator Tool Bag.

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability of this item requires further investigation and research.

ID Name of Capability Priority Level Category Maturity Level

M.Capacity Capacity Planning 2 Manageability Edge: N/A; Application: Mainline; DB/Content: Mainline

Description This capability provides the tools required to track system workload changes over time. It also provides tools to predict future hardware requirements (CPU, memory, disk interfaces, and network interfaces) for budget planning.
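The simplest trend model a capacity planner might start from is a least-squares line through past usage samples, projected forward to the point where capacity runs out. The sketch below is illustrative (the function name is invented) and assumes equally spaced samples and linear growth:

```python
def intervals_until_full(samples, capacity):
    """Project when a linearly growing resource reaches capacity.
    samples: usage readings at equal intervals (e.g. monthly GB used).
    Returns the number of further intervals until capacity is reached,
    or None if the least-squares trend shows no growth."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    # Least-squares slope of usage versus time
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples)) \
            / sum((x - mean_x) ** 2 for x in xs)
    if slope <= 0:
        return None                  # flat or shrinking: never fills
    return (capacity - samples[-1]) / slope
```

For example, disk usage of 100, 110, 120, 130 GB over four months against a 200 GB volume projects seven more months of headroom. Real tools would combine several resources (CPU, memory, interfaces) and report the earliest exhaustion date for budget planning.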

Usability This item is a non-passive item, meaning human interaction is required for its use. Therefore, usability of this item is tracked separately. The usability of this item requires further investigation and research.

Virtualization

Capabilities in the Virtualization category identify the customer-visible capabilities important to characterizing a virtualization implementation. These capabilities focus on Linux as a guest operating system. Two popular virtualization solutions, VMware and Xen, were evaluated to determine the maturities in this category. “Completed” maturity for this category means that both solutions on Linux have the stated capability. If only one solution provides the capability, it is assigned “Product Available.” If neither solution provides it, the capability is given the maturity of the more mature of the two implementations. Virtualization capabilities with a customer-visible implementation detail affecting usability have a usability descriptor in their tables.

Basic Customer Visible Attributes

ID Name of Capability Priority Level Category Maturity Level

V.UnmodifiedApps Run Application Software Unmodified 1 Virtualization Completed

Description This capability supports running unmodified application software under an operating system running within a virtual machine.

ID Name of Capability Priority Level Category Maturity Level

V.AppSeparation Application Separation (security) 1 Virtualization Completed

Description This capability enables applications running in separate virtualized environments on the same machine to be separated from each other as though they were running on two physically separate machines. Activities of one should not interfere with activities of the other. The separation should be as good as can be achieved through physical means.

Virtualization Approaches

ID Name of Capability Priority Level Category Maturity Level

V.FullUnmodGuest Full Virtualization – Unmodified Guest OS 2 Virtualization Product Available

Description This capability provides a virtualized machine environment that can run the guest operating system without requiring that it be recompiled, although different drivers may need to be loaded depending on how the underlying devices are exposed. This is one of three possible approaches to providing a virtualization solution: V.FullUnmodGuest, V.FullPerf, and V.ParaVirtGuest.

References QEMU is an example of this approach: http://fabrice.bellard.free.fr/qemu/

ID Name of Capability Priority Level Category Maturity Level

V.FullPerf Full Virtualization – Performance with Unmodified Guest OS 2 Virtualization Product Available

Description This capability allows virtualization solutions to achieve greater performance either by hardware means, such as architecture-specific technologies like AMD’s SVM (Secure Virtual Machine) or Intel’s Virtualization Technology, or by software means, such as binary translation of operation codes from the guest to the base hardware. This is one of three possible approaches to providing a virtualization solution: V.FullUnmodGuest, V.FullPerf, and V.ParaVirtGuest.

ID Name of Capability Priority Level Category Maturity Level

V.ParaVirtGuest Paravirtualization – Run a Paravirtualized Guest 2 Virtualization Product Available

Description This capability supports a virtualization solution in which a modified guest OS is required to run in the virtual machine. This is one of three possible approaches to providing a virtualization solution: V.FullUnmodGuest, V.FullPerf, and V.ParaVirtGuest.

Supports Guest OS Type

ID Name of Capability Priority Level Category Maturity Level

V.32bGuest32bHw 32-bit Linux Guest on 32-bit Hardware 2 Virtualization Completed

Description This capability allows a virtualization solution to support the following configurations:

• A 32-bit host running on 32-bit hardware

• A 32-bit guest running on a "bare metal" virtualization solution running on 32-bit hardware (for example, a hypervisor running on 32-bit hardware). (Note: Hypervisors that run directly on the hardware, as opposed to running on top of a host OS, are often referred to as "bare metal" virtualization solutions.)

ID Name of Capability Priority Level Category Maturity Level

V.32bGuest64bHw 32-bit Guest on 64-bit Hardware 2 Virtualization Product Available

Description This capability allows a virtualization solution to support the following configurations:

• A 32-bit host running on 64-bit hardware

• A 32-bit guest running on a "bare metal" virtualization solution running on 64-bit hardware (for example, a hypervisor running on 64-bit hardware). (Note: Hypervisors that run directly on the hardware as opposed to running on top of a host OS are often referred to as "bare metal" virtualization solutions.)

ID Name of Capability Priority Level Category Maturity Level

V.All64b 64-bit Guest on 64-bit Hardware (or 64-bit host/64-bit hardware) 2 Virtualization Completed

Description This capability allows a virtualization solution to support the following configurations:

• A 64-bit host OS running on 64-bit hardware

• A 64-bit guest running on a "bare metal" virtualization solution running on 64-bit hardware (for example, a hypervisor running on 64-bit hardware). (Note: Hypervisors that run directly on the hardware as opposed to running on top of a host OS are often referred to as "bare metal" virtualization solutions.)

ID Name of Capability Priority Level Category Maturity Level

V.64bGuest32bHost64bHw 64-bit Guest on 32-bit Host OS and 64-bit Hardware 2 Virtualization Product Available

Description This capability may be used when a hosted model is implemented using a 32-bit host OS on 64-bit hardware running a 64-bit guest within the Virtual Machine. This is an interesting scenario for customers who want to invest in 64-bit system hardware but want to keep their 32-bit OS as an interim step. A motivating factor may be that not all drivers used with the 32-bit OS are yet available on the 64-bit system, or that other components are not yet mature enough for them to migrate their whole environment to a relatively new 64-bit OS. With these capabilities, they can run their old OS as a host and try out a 64-bit guest that can utilize the 64-bit hardware.

ID Name of Capability Priority Level Category Maturity Level

V.Win32bGuest32bHw Windows 32-bit Guest on 32-bit Hardware 2 Virtualization Product Available

Description In addition to supporting Linux as a guest, this capability allows a 32-bit Windows OS to run as a guest on 32-bit hardware.

ID Name of Capability Priority Level Category Maturity Level

V.Win32bGuest64bHw Windows 32-bit Guest on 64-bit Hardware 2 Virtualization Product Available

Description In addition to supporting Linux as a guest, this capability allows a 32-bit Windows OS to run as a guest on 64-bit hardware.

ID Name of Capability Priority Level Category Maturity Level

V.WinAll64b Windows 64-bit Guest on 64-bit Hardware 2 Virtualization Product Available

Description In addition to supporting Linux as a guest, this capability allows a 64-bit Windows OS to run as a guest on 64-bit hardware.

ID Name of Capability Priority Level Category Maturity Level

V.OtherGuestOS Other Guest OSs (Solaris, Netware, Linux 2.4.x…) 2 Virtualization Completed

Description In addition to supporting Linux as a guest, this capability allows at least one other UNIX-based OS to run as a guest.

Hardware Support

ID Name of Capability Priority Level Category Maturity Level

V.X86-64 Architecture Support – Support for X86-64 2 Virtualization Completed

Description This capability allows a virtualization solution to support X86-64 architectures.

ID Name of Capability Priority Level Category Maturity Level

V.IA64 Architecture Support – Support for IA64 2 Virtualization Usable

Description This capability allows a virtualization solution to support IA-64 architectures.

ID Name of Capability Priority Level Category Maturity Level

V.IA32 Architecture Support – Support for IA32 2 Virtualization Completed

Description This capability allows a virtualization solution to support IA-32 architectures.

ID Name of Capability Priority Level Category Maturity Level

V.PPC64 Architecture Support – Support for Power PC-64 2 Virtualization Development

Description This capability allows a virtualization solution to support Power PC-64 architectures.

ID Name of Capability Priority Level Category Maturity Level

V.PPC32 Architecture Support – Support for Power PC-32 2 Virtualization Investigation

Description This capability allows a virtualization solution to support Power PC-32 architectures.

ID Name of Capability Priority Level Category Maturity Level

V.SMP2 SMP Support – 2-CPU Host 2 Virtualization Completed

Description This capability allows a virtualization solution to provide a host that supports symmetric multi-processing with up to 2 CPUs.

ID Name of Capability Priority Level Category Maturity Level

V.SMP4 SMP Support – 4-CPU SMP Host 2 Virtualization Completed

Description This capability allows a virtualization solution to provide a host that supports symmetric multi-processing with up to 4 CPUs.

ID Name of Capability Priority Level Category Maturity Level

V.SMP8 SMP Support – 8-CPU SMP Host 2 Virtualization Completed

Description This capability allows a virtualization solution to provide a host that supports symmetric multi-processing with up to 8 CPUs.

ID Name of Capability Priority Level Category Maturity Level

V.SMP16 SMP Support – 16-CPU SMP Host 2 Virtualization Completed

Description This capability allows a virtualization solution to provide a host that supports symmetric multi-processing with up to 16 CPUs.

ID Name of Capability Priority Level Category Maturity Level

V.SMP32 SMP Support – 32-CPU SMP Host 2 Virtualization Product Available

Description This capability allows a virtualization solution to provide a host that supports symmetric multi-processing with up to 32 CPUs.

ID Name of Capability Priority Level Category Maturity Level

V.SMP64 SMP Support – 64-CPU SMP Host 2 Virtualization Investigation

Description This capability allows a virtualization solution to provide a host that supports symmetric multi-processing with up to 64 CPUs.

ID Name of Capability Priority Level Category Maturity Level

V.SMPgt64 SMP Support – Greater than 64-CPU SMP Host 2 Virtualization Investigation

Description This capability allows a virtualization solution to provide a host that supports symmetric multi-processing with greater than 64 CPUs.

ID Name of Capability Priority Level Category Maturity Level

V.MultiCore2 Multi-core Support – 2-CPU SMP Host 2 Virtualization Completed

Description This capability allows a virtualization solution to provide multi-core support for a 2-CPU SMP host.

ID Name of Capability Priority Level Category Maturity Level

V.MultiCore4 Multi-core Support – 4-CPU SMP Host 2 Virtualization Completed

Description This capability allows a virtualization solution to provide multi-core support for a 4-CPU SMP host.

ID Name of Capability Priority Level Category Maturity Level

V.MultiCore8 Multi-core Support – 8-CPU SMP Host 2 Virtualization Completed

Description This capability allows a virtualization solution to provide multi-core support for an 8-CPU SMP host.

ID Name of Capability Priority Level Category Maturity Level

V.MultiCore16 Multi-core Support – 16-CPU SMP Host 2 Virtualization Product Available

Description This capability allows a virtualization solution to provide multi-core support for a 16-CPU SMP host.

ID Name of Capability Priority Level Category Maturity Level

V.MultiCore32 Multi-core Support – 32-CPU SMP Host 2 Virtualization Product Available

Description This capability allows a virtualization solution to provide multi-core support for a 32-CPU SMP host.

ID Name of Capability Priority Level Category Maturity Level

V.MultiCore64 Multi-core Support – 64-CPU SMP Host 2 Virtualization Investigation

Description This capability allows a virtualization solution to provide multi-core support for a 64-CPU SMP host.

ID Name of Capability Priority Level Category Maturity Level

V.MultiCoregt64 Multi-core Support – Greater than 64-CPU SMP Host 2 Virtualization Investigation

Description This capability allows a virtualization solution to provide multi-core support for a greater than 64-CPU SMP host.

Virtualized Guest SMP Characteristics

ID Name of Capability Priority Level Category Maturity Level

V.PlugSched Plug-in Schedulers 2 Virtualization Integrated

Description This capability allows the appropriate I/O and/or CPU scheduler to be deployed for a workload running in any particular virtualized environment. This allows workload dependent scheduling and priority scheduling for various guests running simultaneously.

ID Name of Capability Priority Level Category Maturity Level

V.SMPGuestSMPHost SMP Guest on SMP Host 2 Virtualization Completed

Description This capability allows an SMP guest to run on an SMP Host.

ID Name of Capability Priority Level Category Maturity Level

V.NonSMPGuestSMPHost Non-SMP Guest on SMP Host 2 Virtualization Completed

Description This capability allows a non-SMP guest to run on an SMP host.

ID Name of Capability Priority Level Category Maturity Level

V.NonSMPGuestNonSMPHost Non-SMP Guest on Non-SMP Host 2 Virtualization Completed

Description This capability allows a non-SMP guest to run on a non-SMP host.

ID Name of Capability Priority Level Category Maturity Level

V.SMPGuestNonSMPHost SMP Guest on a Non-SMP Host 2 Virtualization Completed

Description This capability allows an SMP guest to run on a non-SMP host.

Virtualized Guest OS Drivers

ID Name of Capability Priority Level Category Maturity Level

V.SharedNetIF Shared Drivers: Network Interfaces 2 Virtualization Completed

Description This capability enables a virtualization solution to allow drivers on multiple virtualized guest OSs to share network interfaces.

Usability IT staff prefer that a guest use the same drivers in the shared virtualized environment as those loaded in a non-virtualized environment, avoiding the need to validate or maintain another version. The maturity for the usability of this item is “Product Available.”

ID Name of Capability Priority Level Category Maturity Level

V.SharedGraphicsIF Shared Drivers: Graphics (Including AGP) 2 Virtualization Integrated

Description This capability enables a virtualization solution to allow drivers on multiple virtualized guest OSs to share graphics interfaces, including AGP.

Usability IT staff prefer that a guest use the same drivers in the shared virtualized environment as those loaded in a non-virtualized environment, avoiding the need to validate or maintain another version. The maturity for the usability of this item is “Integrated.” (The maturity would be higher if the shared capability were complete.)

ID Name of Capability Priority Level Category Maturity Level

V.SharedStorageIF Shared Drivers: Storage 2 Virtualization Completed

Description This capability enables a virtualization solution to allow drivers on multiple virtualized guest OSs to share storage interfaces.

Usability IT staff prefer that a guest use the same drivers in the shared virtualized environment as those loaded in a non-virtualized environment, avoiding the need to validate or maintain another version. The maturity for the usability of this item is “Product Available.”

ID Name of Capability Priority Level Category Maturity Level

V.SharedMisc Shared Drivers: Miscellaneous Others 2 Virtualization Usable

Description This capability enables a virtualization solution to allow drivers on multiple virtualized guest OSs to share other interfaces, such as USB.

Usability IT staff prefer that a guest use the same drivers in the shared virtualized environment as those loaded in a non-virtualized environment, avoiding the need to validate or maintain another version. The maturity for the usability of this item is “Usable.” (The maturity would be higher if the shared capability were complete.)

Pass-through Guest Drivers

ID Name of Capability Priority Level Category Maturity Level

V.PassThruNetIF Pass Thru: Network Interfaces 2 Virtualization Investigation

Description This capability enables a virtualization solution to permit the guest OS to have total use of the network interface. No other guest can use that interface simultaneously. NOTE: Pass-through has two cases: (a) the guest OS has total use of the interface, and (b) interfaces with the right hardware support can handle multiple Virtual Machines if the software can take advantage of the hardware. This capability is the first type.

ID Name of Capability Priority Level Category Maturity Level

V.PassThruGraphicsIF Pass Thru: Graphics (Including AGP) 2 Virtualization Product Available

Description This capability enables a virtualization solution to permit the guest OS to have total use of the graphics interface (including AGP). No other guest can use that interface simultaneously. NOTE: Pass-through has two cases: (a) the guest OS has total use of the interface, and (b) interfaces with the right hardware support can handle multiple Virtual Machines if the software can take advantage of the hardware. This capability is the first type.

ID Name of Capability Priority Level Category Maturity Level

V.PassThruStorageIF Pass Thru: Storage 2 Virtualization Development

Description This capability enables a virtualization solution to permit the guest OS to have total use of the storage interface. No other guest can use that interface simultaneously. NOTE: Pass-through has two cases: (a) the guest OS has total use of the interface, and (b) interfaces with the right hardware support can handle multiple Virtual Machines if the software can take advantage of the hardware. This capability is the first type.

ID Name of Capability Priority Level Category Maturity Level

V.PassThruMiscIF Pass Thru: Miscellaneous Others 2 Virtualization Development

Description This capability enables a virtualization solution to permit the guest OS to have total use of miscellaneous other interfaces, such as USB. No other guest can use that interface simultaneously. NOTE: Pass-through has two cases: (a) the guest OS has total use of the interface, and (b) interfaces with the right hardware support can handle multiple Virtual Machines if the software can take advantage of the hardware. This capability is the first type.

Device Specific Hardware Supported for Virtualization

ID Name of Capability Priority Level Category Maturity Level

V.HdwrNetIF Hardware Pass Thru: Network Interfaces 2 Virtualization Development

Description This capability enables a virtualization solution to support special hardware found on some network interfaces that allow Virtual Machines to transparently share the same interface. NOTE: Pass-through has two cases: (a) the guest OS has total use of the interface, and (b) interfaces with the right hardware support can handle multiple Virtual Machines if the software can take advantage of the hardware. This capability is the second type.

ID Name of Capability Priority Level Category Maturity Level

V.HdwrGraphicsIF Hardware Pass Thru: Graphics 2 Virtualization Development

Description This capability enables a virtualization solution to support special hardware found on some graphics card interfaces that allow Virtual Machines to transparently share the same interface. NOTE: Pass-through has two cases: (a) the guest OS has total use of the interface, and (b) interfaces with the right hardware support can handle multiple Virtual Machines if the software can take advantage of the hardware. This capability is the second type.

ID Name of Capability Priority Level Category Maturity Level

V.HdwrStorageIF Hardware Pass Thru: Storage 2 Virtualization Development

Description This capability enables a virtualization solution to support special hardware found on some storage interface cards that allow Virtual Machines to transparently share the same interface. NOTE: Pass-through has two cases: (a) the guest OS has total use of the interface, and (b) interfaces with the right hardware support can handle multiple Virtual Machines if the software can take advantage of the hardware. This capability is the second type.

Hotplug

ID Name of Capability Priority Level Category Maturity Level

V.HotplugCpu Hotplug CPU 2 Virtualization Product Available

Description This capability provides the ability to add and remove CPUs in a virtualized environment. This includes both the guest OS’s ability to recognize more resources and the Virtual Machine Monitor’s ability to add and remove resources. Linux supports CPU add and remove in mainline kernels.

ID Name of Capability Priority Level Category Maturity Level

V.HotplugMemory Hotplug memory 2 Virtualization Product Available

Description This capability provides the ability to add and remove memory in a virtualized environment. This includes both the guest OS’s ability to recognize more resources and the Virtual Machine Monitor’s ability to add and remove resources. Linux supports memory add, but not memory remove, in mainline kernels. Alternative ways of reducing the guest OS’s memory footprint (such as balloon drivers) are acceptable if well integrated.

References http://www.usenix.org/events/osdi02/tech/waldspurger/waldspurger.pdf

ID Name of Capability Priority Level Category Maturity Level

V.HotplugIO Hotplug I/O bus 2 Virtualization Product Available

Description This capability provides the ability to add and remove I/O buses in a virtualized environment. This includes both the guest OS’s ability to recognize more resources and the Virtual Machine Monitor’s ability to add and remove resources. Linux supports I/O buses add and remove for most I/O bus types in mainline kernels (refer to the RAS section).

Virtual Machine Management

ID Name of Capability Priority Level Category Maturity Level

V.Metering Metering 2 Virtualization Product Available

Description This capability provides per-Virtual-Machine billing support in tools and drivers, for example, for bandwidth used on an interface (storage, network). If an architecture is supported, then this feature must be present.

ID Name of Capability Priority Level Category Maturity Level

V.QSS Quality of Service Support 2 Virtualization Product Available

Description This capability provides quality of service support in tools and drivers, allowing dynamic adjustment of the virtual resources (CPU, memory, network, storage) to meet service requirements. If an architecture is supported, then this feature must be present.

ID Name of Capability Priority Level Category Maturity Level

V.AutoRebalance Automatic Resource Balancing 2 Virtualization Product Available

Description This capability provides automatic resource balancing for memory, CPU, and I/O. If an architecture is supported, then this feature must be present.

ID Name of Capability Priority Level Category Maturity Level

V.ChkptRestart Checkpoint/Restart Support for Guest OS 2 Virtualization Product Available

Description This capability provides a checkpoint/restart mechanism for the guest OS for recovery from a failure or a planned downtime. If an architecture is supported, then this feature must be present.

ID Name of Capability Priority Level Category Maturity Level

V.VMMigration VM Migration to Another Physical Machine 2 Virtualization Completed

Description This capability enables live migration with low overhead of a guest running on a Virtual Machine to a different system. If an architecture is supported, then this feature must be present.

ID Name of Capability Priority Level Category Maturity Level

V.SaveRestore Save/Restore of a Guest OS 2 Virtualization Completed

Description This capability allows administrators to save a guest OS and restore it on another physical machine. This is typically used as a deployment approach, for example, for the creation of an application server that is then deployed on 1000 nodes simultaneously. If an architecture is supported, then this feature must be present.

ID Name of Capability Priority Level Category Maturity Level

V.CIMSupport CIM Support 2 Virtualization Product Available

Description The CIM virtualization model needs to be completed and providers made available.

ID Name of Capability Priority Level Category Maturity Level

V.LegacyTieIn Tie-in to Legacy Data Center Management Tools 2 Virtualization Product Available

Description This capability allows the virtualization solution to integrate well with legacy data center management tools.

ID Name of Capability Priority Level Category Maturity Level

V.EnhanceServ Enhanced Serviceability 2 Virtualization Development

Description This capability enables enhanced serviceability for multiple guests per host to support routine activities related to dealing with multiple copies of the operating system, such as applying patches or entering “console” commands.

ID Name of Capability Priority Level Category Maturity Level

V.VMClustering VM Clustering 2 Virtualization Development

Description This capability allows virtual machines (presumably those running on separate hardware) to be clustered for higher availability.

ID Name of Capability Priority Level Category Maturity Level

V.VirtNetProvisioning Virtual Network Provisioning, NAT 2 Virtualization Product Available

Description This capability enables a virtualization solution to provide virtual network provisioning combined with network address translation (NAT) for secure virtual network services.
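At its core, NAT for a virtual network maintains a per-flow translation table. The sketch below (illustrative Python; the class, addresses and port range are invented for the example, not part of any DCL or hypervisor interface) shows that table mapping guest-private address/port pairs onto a shared public address:

```python
# Minimal illustrative sketch of the NAT table a virtual network
# provisioner maintains (not a real virtual-switch implementation).
class Nat:
    def __init__(self, public_ip: str):
        self.public_ip = public_ip
        self.next_port = 40000
        self.table = {}                      # (guest_ip, guest_port) -> public port

    def outbound(self, guest_ip: str, guest_port: int):
        key = (guest_ip, guest_port)
        if key not in self.table:            # allocate a port for a new flow
            self.table[key] = self.next_port
            self.next_port += 1
        return (self.public_ip, self.table[key])

nat = Nat("203.0.113.10")
assert nat.outbound("10.0.0.5", 5432) == ("203.0.113.10", 40000)
assert nat.outbound("10.0.0.6", 5432) == ("203.0.113.10", 40001)
assert nat.outbound("10.0.0.5", 5432) == ("203.0.113.10", 40000)  # same flow reuses its port
```

Because guests share one public address, their private addressing stays hidden, which is what makes the provisioned virtual network "secure" in the sense used above.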


ID Name of Capability Priority Level Category Maturity Level

V.RemoteConsole Remote Console 2 Virtualization Completed

Description This capability provides a secure remote virtual console.

ID Name of Capability Priority Level Category Maturity Level

V.ConsoleNotRequired Console Not Required 2 Virtualization Completed

Description This capability enables a virtualization solution not to require a hardware management console.

Kernel Improvements

ID Name of Capability Priority Level Category Maturity Level

V.SingleBinary Single Kernel Binary for Paravirtualized Solutions 2 Virtualization Development

Description This capability allows the modified guest of a paravirtualized solution to run as a stand-alone operating system without running in a virtualized machine. This would allow a customer to use the same operating system under both conditions (in a Virtual Machine or on bare metal). This reduces the number of OS versions the end-user has to maintain.


ID Name of Capability Priority Level Category Maturity Level

V.Dynticks Dynticks 2 Virtualization Product Available

Description This capability supports dynamic clock ticks, allowing the Virtual Machine to wake up only when an interrupt needs to be serviced. In Linux 2.6, the Virtual Machine wakes up 1000 times per second, whether an interrupt needs to be serviced or not. With dynamic ticks, no clock interrupt occurs unless there is work to do.
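The contrast between fixed 1000 Hz ticks and dynamic ticks can be shown with a toy wakeup count. This is a simulation only, not kernel code; the function names and timer values are invented for the example.

```python
# Toy comparison: fixed periodic ticks vs. dynamic ticks over a one-second
# window containing three pending timer events (illustrative only).
def periodic_wakeups(hz: int, window_s: float) -> int:
    """Fixed-tick kernel: fires every tick, whether there is work or not."""
    return int(hz * window_s)

def dyntick_wakeups(timer_deadlines_s, window_s: float) -> int:
    """Dynamic ticks: fires only when a timer actually expires."""
    return sum(1 for t in timer_deadlines_s if t <= window_s)

assert periodic_wakeups(1000, 1.0) == 1000
assert dyntick_wakeups([0.25, 0.5, 0.9], 1.0) == 3
```

For an idle guest the difference is three wakeups versus a thousand, which is why dynamic ticks matter so much when many Virtual Machines share one physical host.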

ID Name of Capability Priority Level Category Maturity Level

V.LgPage Large Page Support in the VM 2 Virtualization Completed

Description This capability enables a Virtual Machine to support a guest OS configured to use large pages.

References See “Huge TLB Filesystem” in http://www.ussg.iu.edu/hypermail/linux/kernel/0306.3/1647.html

Development

ID Name of Capability Priority Level Category Maturity Level

V.Debugger Debugger 2 Virtualization Product Available

Description A full-featured, robust debugger is needed that is aware of virtual machines. A debugger that will work for the virtualization environment and all kernels it supports is preferred.


ID Name of Capability Priority Level Category Maturity Level

V.ConvTools Conversion Tools 2 Virtualization Product Available

Description Conversion tools are needed to automate physical-to-virtual conversion across different systems, numbers of CPUs, storage configurations, and so forth.

ID Name of Capability Priority Level Category Maturity Level

V.TestSuites Test Suites for Validation and Regression Testing 2 Virtualization Product Available

Description To provide stability, test suites and harnesses need to be created and utilized. Many capabilities must be tested, for example, virtual hotplug. A wide variety of operating systems, not just Linux, need to be tested as paravirtualized and fully virtualized guests (and in combination).

ID Name of Capability Priority Level Category Maturity Level

V.PerfTools VM Aware Performance Tools 2 Virtualization Product Available

Description Various performance tools are needed that are aware of the virtualized machine, such as a profiler in the VMM, trace tools, and monitors.


Clusters

Capabilities in the Clusters category support the use of multiple-server systems.

Administrative Cluster

ID Name of Capability Priority Level Category Maturity Level

C.Adm-User Administrative—User Management 2 Clusters Usable

Description This capability enables administrators to manage users/groups on computer nodes in a cluster. The capability includes the following:

• Cluster-wide commands for adding/removing users

• A file synchronization tool to synchronize files that define users

• A single file system view (including root directory)

ID Name of Capability Priority Level Category Maturity Level

C.Adm-Deploy Administrative—Software Deployment 2 Clusters Integrated

Description This capability deploys software onto a new or existing cluster. Numerous tools are available at the edge tier: SystemImager, ClusterWorX, PowerCockpit and Kickstart; PowerCockpit and ClusterWorX are released products. Not all of these tools use multicast, which limits their scalability. In general, the clusters in the DB/Content tier are smaller. While deployment is less of an issue at that tier, it is still important because servers there must tolerate less downtime.


ID Name of Capability Priority Level Category Maturity Level

C.Adm-Upgrade Administrative—Software Upgrade 2 Clusters Integrated

Description This capability upgrades the software on an existing cluster.

ID Name of Capability Priority Level Category Maturity Level

C.Adm-CentralClus Administrative—Central Cluster (Log/Notification/Monitoring) 2 Clusters Usable

Description This capability enables administrators to centrally access logs, receive notifications and monitor the health of computer nodes in a cluster. The edge maturity level reflects the use of ClusterWorX and VACM. Some tools update only the files that have changed, which reduces the time needed for software upgrades in larger clusters.

ID Name of Capability Priority Level Category Maturity Level

C.Adm-ClusComd Administrative—Cluster Commands 2 Clusters Usable

Description This capability provides commands that affect the whole cluster or a part of it. The edge maturity level reflects the Cluster Command and Control (C3) project.

References The Cluster Command and Control Project (C3): http://www.csm.ornl.gov/torc/C3/


ID Name of Capability Priority Level Category Maturity Level

C.CW-PDSN Cluster-Wide Persistent Storage Device Naming 1 Clusters Investigation

Description This capability provides device recognition and persistent storage device naming for computing nodes in a cluster. It is important in clusters that run applications either in parallel or transparently on any given computing node. Disk naming matters more than the naming of other devices. File system and disk device mappings should be maintained consistently across the cluster. The “by uuid” policy of udev provides persistent naming on each node, but there is no easy way to coordinate names across the cluster; if every node uses the “by uuid” approach, the policy files can simply be copied between nodes.
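Why UUID-based naming is node-independent can be shown with a small sketch. This is illustrative Python, not a real udev integration; the function, prefix and UUID values are invented for the example.

```python
# Illustrative sketch: derive a cluster-wide stable device name from a
# filesystem UUID, so every node computes the same name regardless of its
# local /dev/sdX probe order (not a real udev integration).
def stable_name(uuid: str, prefix: str = "clusterdisk") -> str:
    """Map a filesystem UUID to a deterministic, node-independent name."""
    return f"{prefix}-{uuid.lower()}"

# Each node may see the same shared disk under a different kernel name ...
node_a = {"/dev/sda1": "E2A1-77F0"}
node_b = {"/dev/sdc1": "E2A1-77F0"}

# ... yet both derive the identical persistent name from the UUID.
name_a = stable_name(next(iter(node_a.values())))
name_b = stable_name(next(iter(node_b.values())))
assert name_a == name_b == "clusterdisk-e2a1-77f0"
```

This is exactly the property the “by uuid” udev policy gives each node individually; the open problem the capability addresses is coordinating such names cluster-wide without copying policy files by hand.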

ID Name of Capability Priority Level Category Maturity Level

C.C-VM Cluster Volume Management 1 Clusters Development

Description Cluster Volume Management extends the Volume Management capability on a single computer node to clusters. The capabilities required are similar to single-node volume manager capabilities, but applied in a cluster environment, including the following:

• Enabling remote nodes to be informed of volume definition changes

• Providing consistent and persistent cluster-wide device names

• Managing volumes from different cluster nodes consistently

• Providing support for striping and concatenation of storage. Clustered mirroring of shared storage is included in this capability

References EVMS on Linux 2.4 and 2.6 is available. We are unsure of how complete coverage is for the .


ID Name of Capability Priority Level Category Maturity Level

C.C-FS Cluster File System 1 Clusters Integrated

Description This capability provides a consistent file system image and service across the computing nodes in a cluster. A file system can be accessed independently from any node, and the file system integrity should be maintained. The physical storage can be in a SAN environment or distributed among file servers.

References Open Source: , OpenGFS, Oracle Cluster File System (OCFS)

AIS Services for Linux-HA and OpenAIS

ID Name of Capability Priority Level Category Maturity Level

C.Gp-Messaging Group Messaging 2 Clusters Edge: N/A Application: Development DB/Content: Development

Description Group messaging is one of several services that can be used by applications built over cluster technology, although not all applications require group messaging. This capability provides a mechanism by which processes in a cluster can reliably exchange messages. The service relies on the membership service to determine what nodes are active. Group messaging allows subsetting of the membership in a message domain.

References The SA-Forum AIS specifications define an interface to this mechanism. The interface currently under development will be used by the Linux-HA and OpenAIS cluster solutions. The OpenAIS project: http://developer.osdl.org/dev/openais/ The High-Availability Linux Project: http://www.linux-ha.org/ RTC Magazine article on the Service Availability Forum Application Interface Specification: http://www.rtcmagazine.com/home/article.php?id=100199
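The membership-driven delivery and domain subsetting described above can be sketched in miniature. The sketch below is a single-process Python stand-in with invented class and node names; it is not the SA-Forum AIS messaging API.

```python
# In-memory sketch of group messaging over a membership list
# (illustrative only; a real service uses reliable cluster transport).
class GroupMessaging:
    def __init__(self, active_nodes):
        self.active = set(active_nodes)        # supplied by the membership service
        self.inbox = {n: [] for n in active_nodes}

    def domain(self, members):
        """A message domain may subset the current membership."""
        return self.active & set(members)

    def send(self, domain, msg):
        for node in domain:                    # deliver only to active members
            self.inbox[node].append(msg)

gm = GroupMessaging(["n1", "n2", "n3"])
d = gm.domain(["n1", "n3", "n4"])              # n4 is not active, so it is excluded
gm.send(d, "commit")
assert gm.inbox["n1"] == ["commit"] and gm.inbox["n2"] == []
```

The key behaviors shown are the two stated in the description: delivery is gated on the membership view, and a domain may address a subset of the membership.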


ID Name of Capability Priority Level Category Maturity Level

C.Events Event Notification 2 Clusters Edge: N/A Application: Integrated DB/Content: Integrated

Description Event Notification is one of several services that can be used by applications built over cluster technology, although not all applications require event notification. This capability provides a unified way to publish events to interested subscribers. The service spans all cluster software layers.

References The SA-Forum AIS specifications define an interface to this mechanism. The interface currently under development will be used by the Linux-HA and OpenAIS cluster solutions. The OpenAIS project: http://developer.osdl.org/dev/openais/ The High-Availability Linux Project: http://www.linux-ha.org/ RTC Magazine article on the Service Availability Forum Application Interface Specification: http://www.rtcmagazine.com/home/article.php?id=100199


ID Name of Capability Priority Level Category Maturity Level

C.Checkpoint Checkpoint 2 Clusters Edge: N/A Application: Integrated DB/Content: Integrated

Description Checkpoint service is one of several services that can be used by applications built over cluster technology, although not all applications require checkpointing. The checkpoint service saves checkpoint data. It retrieves the previous checkpoint data if it is needed to resume execution from the state recorded before a cluster node failed. The service is provided for applications that require rapid fail-over at the application layer.

References The SA-Forum AIS specifications define an interface to this mechanism. The interface currently under development will be used by the Linux-HA and OpenAIS cluster solutions. The OpenAIS project: http://developer.osdl.org/dev/openais/ The High-Availability Linux Project: http://www.linux-ha.org/ RTC Magazine article on the Service Availability Forum Application Interface Specification: http://www.rtcmagazine.com/home/article.php?id=100199
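The save/retrieve contract of a checkpoint service can be sketched as follows. This is an in-memory Python stand-in with invented names, not the AIS checkpoint API; a real implementation replicates checkpoints to surviving nodes so they outlive a node failure.

```python
# In-memory sketch of the checkpoint service contract (illustrative only).
class CheckpointService:
    def __init__(self):
        self.store = {}                         # checkpoint name -> saved states

    def save(self, name, state):
        """Record application state; called periodically by the application."""
        self.store.setdefault(name, []).append(dict(state))

    def latest(self, name):
        """After a failure, the standby resumes from the last saved state."""
        return self.store[name][-1]

svc = CheckpointService()
svc.save("order-app", {"last_txn": 41})
svc.save("order-app", {"last_txn": 42})
# Simulated fail-over: the standby instance retrieves the newest checkpoint.
assert svc.latest("order-app") == {"last_txn": 42}
```

The design point is that the application, not the OS, decides what state is worth saving; the service only guarantees that the most recent checkpoint is retrievable after the node that wrote it has failed.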


ID Name of Capability Priority Level Category Maturity Level

C.DLM Kernel—Distributed Lock Manager 2 Clusters Edge: N/A Application: Usable DB/Content: Usable

Description Distributed lock manager is one of several services that can be used by applications built over cluster technology, although not all applications require a distributed lock manager (DLM). DLM is a distributed lock service that provides a mechanism to coordinate access to shared resources across a cluster. Typically, transaction-oriented services like databases, file systems or resource managers need this service. DLM depends on the membership service.

References The SA-Forum AIS specifications define an interface to this mechanism. The interface currently under development will be used by the Linux-HA and OpenAIS cluster solutions. The OpenAIS project: http://developer.osdl.org/dev/openais/ The High-Availability Linux Project: http://www.linux-ha.org/ RTC Magazine article on the Service Availability Forum Application Interface Specification: http://www.rtcmagazine.com/home/article.php?id=100199
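The core coordination rule a DLM enforces — shared locks may coexist, while an exclusive lock conflicts with everything — can be sketched as below. This is an in-memory stand-in with invented names, not the actual DLM API; a real DLM distributes the lock table and tracks membership.

```python
# In-memory sketch of the DLM coordination rule (illustrative only).
class Dlm:
    def __init__(self):
        self.locks = {}                       # resource -> (mode, {holder nodes})

    def acquire(self, resource, node, mode):  # mode: "shared" | "exclusive"
        held = self.locks.get(resource)
        if held is None:
            self.locks[resource] = (mode, {node})
            return True
        held_mode, holders = held
        if mode == "shared" and held_mode == "shared":
            holders.add(node)                 # shared locks coexist
            return True
        return False                          # conflicting request must wait

    def release(self, resource, node):
        mode, holders = self.locks[resource]
        holders.discard(node)
        if not holders:
            del self.locks[resource]          # resource becomes free again

dlm = Dlm()
assert dlm.acquire("table:orders", "n1", "shared")
assert dlm.acquire("table:orders", "n2", "shared")      # second reader admitted
assert not dlm.acquire("table:orders", "n3", "exclusive")  # writer must wait
```

Transaction-oriented services such as databases and cluster file systems use exactly this pattern to serialize writers while letting readers proceed in parallel.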


ID Name of Capability Priority Level Category Maturity Level

C.Membership Membership 1 Clusters Edge: N/A Application: Mainline DB/Content: Mainline

Description Membership service is one of several services that can be used by applications built over cluster technology. Cluster applications all depend on this service, which makes it a Priority One item. The membership service provides information to applications about active nodes (the nodes in the cluster that can send and receive messages). It ensures that every active node has the same view of the consensus membership (which nodes are part of the cluster). It notifies members when membership changes via an event. Membership depends on the communication service.

References The SA-Forum AIS specifications define an interface to this mechanism. The interface currently under development will be used by the Linux-HA and OpenAIS cluster solutions. The OpenAIS project: http://developer.osdl.org/dev/openais/ The High-Availability Linux Project: http://www.linux-ha.org/ RTC Magazine article on the Service Availability Forum Application Interface Specification: http://www.rtcmagazine.com/home/article.php?id=100199
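The two guarantees named above — every member sees the same consensus view, and changes arrive as events — can be sketched in a single-process stand-in. This is illustrative Python with invented names, not the AIS membership API; real membership additionally handles failure detection and network partitions.

```python
# In-memory sketch of a membership service: one consensus view, change
# notifications delivered to all subscribers (illustrative only).
class Membership:
    def __init__(self):
        self.view = set()
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def _notify(self, change):
        for cb in self.subscribers:           # every subscriber gets the same view
            cb(change, frozenset(self.view))

    def join(self, node):
        self.view.add(node)
        self._notify(("join", node))

    def leave(self, node):
        self.view.discard(node)
        self._notify(("leave", node))

events = []
m = Membership()
m.subscribe(lambda change, view: events.append((change, view)))
m.join("n1"); m.join("n2"); m.leave("n1")
assert events[-1] == (("leave", "n1"), frozenset({"n2"}))
```

Because every other cluster service (messaging, DLM, checkpoint) consults this view, a correct and consistent membership service is the foundation of the stack, which is why it is the Priority One item in this set.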


ID Name of Capability Priority Level Category Maturity Level

C.Comms Communications 2 Clusters Edge: N/A Application: Usable DB/Content: Usable

Description Communication service is one of several services that can be used by applications built over cluster technology. The membership service is the primary user of the communication service. The communication service provides high-bandwidth, low-latency, ordered, reliable point-to-point transfers, virtual circuit establishment, failure detection, dynamic path discovery and network media configuration.

References The SA-Forum AIS specifications define an interface to this mechanism. The interface currently under development will be used by the Linux-HA and OpenAIS cluster solutions. The OpenAIS project: http://developer.osdl.org/dev/openais/ The High-Availability Linux Project: http://www.linux-ha.org/ RTC Magazine article on the Service Availability Forum Application Interface Specification: http://www.rtcmagazine.com/home/article.php?id=100199

Load Balancing

For scaling purposes, all features related to load balancing clusters need to scale to the following numbers of nodes: Edge Local = 100, Edge WAN = 300, Application = 32, DB/Content = N/A.

ID Name of Capability Priority Level Category Maturity Level

C.LB-Conn Load Balancing—Connection Based 2 Clusters Edge: Integrated Application: Integrated DB/Content: N/A

Description This capability balances the load across computer nodes based on the number of connections. A connection can be redirected to other nodes. The edge maturity level is due to LVS (Linux Virtual Server). The DB/Content maturity level is N/A because the database application, rather than the OS, primarily handles this function.

References Linux Virtual Server: http://www.linuxvirtualserver.org/
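Connection-based balancing in the least-connection style used by LVS can be sketched as below. This is illustrative Python, not LVS code; the class and node names are invented for the example.

```python
# Sketch of least-connection scheduling: each new connection goes to the
# node with the fewest active connections (illustrative only).
class LeastConnBalancer:
    def __init__(self, nodes):
        self.conns = {n: 0 for n in nodes}    # active connection count per node

    def assign(self):
        node = min(self.conns, key=self.conns.get)
        self.conns[node] += 1
        return node

    def close(self, node):
        self.conns[node] -= 1

lb = LeastConnBalancer(["web1", "web2"])
first, second = lb.assign(), lb.assign()
assert {first, second} == {"web1", "web2"}    # load spreads across both nodes
lb.close(first)
assert lb.assign() == first                   # the freed node has the fewest connections
```

LVS supports several schedulers (round-robin, weighted variants, and so on); least-connection is shown here only because it makes the connection-counting idea explicit.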


ID Name of Capability Priority Level Category Maturity Level

C.LB-WAN Load Balancing—Wide-Area Network (WAN) 2 Clusters Edge: Usable Application: N/A DB/Content: N/A

Description The edge maturity level is due to LVS (Linux Virtual Server).

References Linux Virtual Server: http://www.linuxvirtualserver.org/

ID Name of Capability Priority Level Category Maturity Level

C.LB-Resource Load Balancing—Resource Based 1 Clusters Edge: Usable Application: Usable DB/Content: N/A

Description This capability load balances computation based on the resource availability of the cluster nodes. Resources include CPU, memory and I/O (for example, networking and storage). Resources can be reserved for computations.

References PBS, OpenPBS (not GPL), SGE, LSF (commercial). The Portable Batch System: http://www.openpbs.org/

ID Name of Capability Priority Level Category Maturity Level

C.LB-DynBal Load Balancing—Dynamic Balancing 2 Clusters Edge: N/A Application: Usable DB/Content: N/A

Description This capability rebalances the load of computer nodes by migrating processes from heavily loaded computers to lightly loaded ones. The application maturity level is based on Mosix, Scyld and OpenSSI.
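The rebalancing idea — migrate work from the most-loaded node to the least-loaded one until loads are even — can be sketched as below. This is an illustrative Python sketch with invented names; systems such as Mosix migrate live processes, which this toy model only counts.

```python
# Toy sketch of dynamic rebalancing by process migration (illustrative only):
# repeatedly move one task from the busiest node to the idlest node.
def rebalance(loads):
    loads = dict(loads)                       # node -> number of running tasks
    migrations = []
    while max(loads.values()) - min(loads.values()) > 1:
        src = max(loads, key=loads.get)
        dst = min(loads, key=loads.get)
        loads[src] -= 1                       # "migrate" one process src -> dst
        loads[dst] += 1
        migrations.append((src, dst))
    return loads, migrations

after, moves = rebalance({"n1": 8, "n2": 1, "n3": 3})
assert max(after.values()) - min(after.values()) <= 1
assert sum(after.values()) == 12              # migration conserves total work
```

Real implementations also weigh migration cost (memory footprint, open files, network state) against the imbalance, which this sketch deliberately omits.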


High Availability (HA) Cluster

For scaling purposes, all features related to HA clusters need to scale to the following numbers of nodes: Edge = N/A, Transaction Application = 32, Continuous Apps = 2, DB/Content = database handles/N/A.

ID Name of Capability Priority Level Category Maturity Level

C.HA-Trans HA—Failover: Transaction Based 2 Clusters Edge: N/A Application: Integrated DB/Content: N/A

Description This capability concerns failover for transaction-based computing (for example, clients that need to resend a request). The maturity level is based on ServiceGuard, LifeKeeper and OpenSSI.

ID Name of Capability Priority Level Category Maturity Level

C.HA-Cont HA—Failover: Continuous 2 Clusters Edge: N/A Application: Usable DB/Content: N/A

Description This capability provides failover for applications that can restart from the last checkpoints.

Single System Image (SSI)

For scaling purposes, all features related to SSI clusters need to scale to the following numbers of nodes: Edge = N/A, Application < 16, DB/Content = 16.


ID Name of Capability Priority Level Category Maturity Level

C.SSI-Process Single System Image—Process 2 Clusters Edge: N/A Application: Usable DB/Content: N/A

Description This capability addresses a single process name space among computer nodes in a cluster. Maturity levels are based on the maturity of bproc, Mosix and OpenSSI.

ID Name of Capability Priority Level Category Maturity Level

C.SSI-FS Single System Image—File System View 1 Clusters Edge: N/A Application: Usable DB/Content: N/A

Description This capability provides a single-view file system hierarchy, image and service across the computing nodes in a cluster, including the root directory. Due to the nature of "single view," an HA feature (at minimum for the root directory) is essential. A full implementation of Single System Image (SSI) would include the other items in this SSI set, but implementations offering only some of the SSI features exist on other OSs. This feature has therefore been identified as the most important one in the SSI set.

References Open Source: OpenSSI-CFS

ID Name of Capability Priority Level Category Maturity Level

C.SSI-I/O Single System Image—I/O 2 Clusters Edge: N/A Application: Usable DB/Content: N/A

Description This capability provides a single I/O name space (for example, disk and network).


ID Name of Capability Priority Level Category Maturity Level

C.SSI-User Single System Image—User/Group 2 Clusters Edge: N/A Application: Usable DB/Content: N/A

Description For single file system views, the maturity level of this capability is based on OpenSSI, where it is offered by default. Other solutions include Kerberos and NIS.


Standards

Capabilities in the Standards category reference specifications controlled outside the Data Center Linux working groups.

ID Name of Capability Priority Level Category Maturity Level

ST.LSB2 Linux Standard Base (LSB) 2.0 Compliance 1 Standards Completed

Description To attain compliance with the Linux Standard Base (LSB) Specification version 2.0 and to win acceptance in the data center, a Linux distribution should be certified to meet the specifications for a run-time environment. Specifically, the distribution should meet the specifications documented in the generic LSB 2.0 and in at least one of the following architecture-specific specifications:

• Linux Standard Base Specification for the IA32 Architecture 2.0

• Linux Standard Base Specification for the Itanium(tm) Architecture 2.0

• Linux Standard Base Specification for the PPC32 Architecture 2.0

• Linux Standard Base Specification for the S390 Architecture 2.0

• Linux Standard Base Specification for the z/Architecture 2.0.

• Linux Standard Base Specification for the PPC64 Architecture 2.0

• Linux Standard Base Specification for the AMD64/x86_64 Architecture 2.0

Compliance with the LSB certification process assures independent software vendors that they can create applications requiring minimal to no porting effort across all compliant distributions. The LSB Specification requires applications to comply with the Filesystem Hierarchy Standard (FHS) version 2.3. Part of the LSB specification requires compliance with the level 1 (ABI) portion of the OpenI18N globalization specification. The remainder of the LSB specification concerns OpenI18N conformance for distributions that certify to the LSB Internationalized Runtime environment; this conformance is recommended for all distributions aimed at the data center.

References The LSB Specifications: http://www.linuxbase.org/spec/ The Guide to the LSB Certification Program: http://www.opengroup.org/lsb/cert/docs/LSB_Certification_Guide.html The Free Standards Group LSB Certification information: http://www.freestandards.org/certify/ The LSB home page: http://www.linuxbase.org/ The architectures supported: http://refspecs.freestandards.org/lsb.shtml#LSB_2_0_1


ID Name of Capability Priority Level Category Maturity Level

ST.LSB3 Linux Standard Base (LSB) 3.0 Compliance 1 Standards Product Available

Description To attain compliance with the Linux Standard Base (LSB) Specification version 3.0 and to win acceptance in the data center, a Linux distribution should be certified to meet the specifications for a run-time environment. Specifically, the distribution should meet the specifications documented in the generic LSB 3.0 and in at least one of the following architecture-specific specifications:

• Linux Standard Base Specification for the IA32 Architecture 3.0

• Linux Standard Base Specification for the Itanium(tm) Architecture 3.0

• Linux Standard Base Specification for the PPC32 Architecture 3.0

• Linux Standard Base Specification for the S390 Architecture 3.0

• Linux Standard Base Specification for the z/Architecture 3.0.

• Linux Standard Base Specification for the PPC64 Architecture 3.0

• Linux Standard Base Specification for the AMD64/x86_64 Architecture 3.0

Compliance with the LSB certification process assures independent software vendors that they can create applications requiring minimal to no porting effort across all compliant distributions. The LSB Specification requires applications to comply with the Filesystem Hierarchy Standard (FHS) version 2.3. Part of the LSB specification requires compliance with the level 1 (ABI) portion of the OpenI18N globalization specification. The remainder of the LSB specification concerns OpenI18N conformance for distributions that certify to the LSB Internationalized Runtime environment; this conformance is recommended for all distributions aimed at the data center.

References The LSB Specifications: http://www.linuxbase.org/spec/ The Guide to the LSB Certification Program: http://www.opengroup.org/lsb/cert/docs/LSB_Certification_Guide.html The Free Standards Group LSB Certification information: http://www.freestandards.org/certify/ The LSB home page: http://www.linuxbase.org/ The architectures supported: http://refspecs.freestandards.org/lsb.shtml#LSB_3_0_0


ID Name of Capability Priority Level Category Maturity Level

ST.CIM CIM 2 Standards Completed

Description This capability refers to the Common Information Model (CIM) Standard as defined by the Distributed Management Task Force (DMTF).

References The Distributed Management Task Force, Inc.: http://www.dmtf.org/standards/cim

ID Name of Capability Priority Level Category Maturity Level

ST.SNMP Simple Network Management Protocol (SNMP) through v3 2 Standards Completed

Description This capability refers to the Simple Network Management Protocol (SNMP) Standard as defined by the Internet Engineering Task Force (IETF). SNMP is one of the application-layer protocols of the Internet Protocol Suite.

References The SNMP Research International, Inc., SNMP 3 Specifications and Documentation: http://www.snmp.com/snmpv3/ The IETF homepage: http://www.ietf.org/


ID Name of Capability Priority Level Category Maturity Level

ST.IP Internet Protocol (IP) 2 Standards Mainline

Description This capability ensures that Linux complies with the Internet Protocol (IP) Standard as defined by the Internet Engineering Task Force (IETF). IP is one of the network layer protocols of the Internet Protocol Suite.

References Internet Protocol definition: http://en.wikipedia.org/wiki/Internet_Protocol The IETF homepage: http://www.ietf.org/

ID Name of Capability Priority Level Category Maturity Level

ST.IP-SEC Internet Protocol—SEC 2 Standards Completed

Description This capability specifies that Linux comply with the Internet Protocol Security Protocol (IPsec) as defined by the Internet Engineering Task Force (IETF). IPsec is a cryptographic protocol that is optional in IPv4 and required in IPv6.

References IPsec definition: http://en.wikipedia.org/wiki/IPSEC The IETF homepage: http://www.ietf.org/


ID Name of Capability Priority Level Category Maturity Level

ST.IPMI Intelligent Platform Management Interface (IPMI) 2 Standards Product Available

Description IPMI defines a standardized, abstracted, message-based interface to intelligent platform management hardware. It defines records for describing platform management devices and their characteristics. By reducing time to market and development costs, IPMI enables cross-platform server management software in a heterogeneous, multiple-server, highly available computing environment.

References Intel Intelligent Platform Management Interface: http://www.intel.com/design/servers/ipmi/ The OpenHPI project: http://openhpi.sourceforge.net/

ID Name of Capability Priority Level Category Maturity Level

ST.SAF-AIS SAF—AIS 2 Standards Stable

Description The Service Availability Forum (SAF) Application Interface Specification (AIS) is the de facto standard for application interfaces to cluster services.

References The Service Availability Forum: http://www.saforum.org


ID Name of Capability Priority Level Category Maturity Level

ST.SAF-HPI SAF—HPI 2 Standards Product Available

Description The Service Availability Forum's Hardware Platform Interface (HPI) provides an abstracted interface for managing computer hardware, typically chassis- and rack-based servers. HPI includes resource modeling; access to and control over sensor, control, watchdog and inventory data associated with resources; abstracted system event log interfaces; hardware events and alerts; and a managed hotswap interface. OpenHPI was first released in January 2003; OpenHPI version 2.4 was released in February 2006.

References The latest developments at the OpenHPI Project: http://openhpi.sourceforge.net/

ID Name of Capability Priority Level Category Maturity Level

ST.ACPI Advanced Configuration and Power Interface (ACPI) 2 Standards Stable

Description The ACPI protocol was developed by an industry consortium to enable new configuration and power management technologies across a variety of hardware and operating system platforms.

References ACPI: http://www.acpi.info/


ID Name of Capability Priority Level Category Maturity Level

ST.International Globalization/Internationalization 2 Standards Integrated

Description This capability refers to a wide variety of international standards. Further research is needed to identify all that are required for global Linux adoption.

References OpenI18N in the Free Standards Group: http://www.openi18n.org Localization is complete for Japanese. See Enterprise Linux for Public Sectors: http://www.osdl.jp/docs/elps3-1.0.pdf

ID Name of Capability Priority Level Category Maturity Level

ST.OpenPrinting Open Printing 2 Standards Released

Description This capability refers to the OpenPrinting specifications developed under the Free Standards Group. Not all specifications are complete. The PAPI (Print Application Programming Interface) and JTAPI (Job Ticket Application Programming Interface) standards have been approved, while others remain under development.

References OpenPrinting in the Free Standards Group: http://www.openprinting.org/

ID Name of Capability Priority Level Category Maturity Level

ST.PCI I/O Interface—Peripheral Component Interconnect (PCI) 2 Standards Completed

Description This capability refers to the I/O interface specification for PCI. The PCI-SIG industry organization maintains the specification.

References The PCI specification at the PCI SIG organization: http://www.pcisig.com/specifications/conventional/


ID Name of Capability Priority Level Category Maturity Level

ST.PCI-X I/O Interface—PCI-X 2 Standards Completed

Description This capability refers to the I/O interface specification for PCI-Express. The PCI-SIG industry organization maintains the specification.

References The PCI-Express specification at the PCI SIG organization: http://www.pcisig.com/specifications/pciexpress/

ID Name of Capability Priority Level Category Maturity Level

ST.PCI-IB I/O Interface—InfiniBand 2 Standards Mainline

Description This capability refers to the InfiniBand® standard, which the InfiniBand Trade Association maintains. InfiniBand provides high-speed connections between servers, remotely located storage devices and network devices.

References OpenIB project: http://openib.org/index.html Linux Infiniband Project: http://infiniband.sourceforge.net/


Security

Publishing of the DCL Capabilities 1.0 document generated significant interest in the area of security, which led to the formation of the OSDL Security Special Interest Group (SIG). The Security SIG examined the original capabilities in more detail after identifying some basic assumptions and threats for a typical data center deployment model. To provide a secure system, each security manager must identify the basic assumptions and threats for their specific data center environment, then build a security protection profile to protect against those threats.

DCL Security Assumptions

The Data Center Linux target is enterprises with raised-floor data centers, implying institutional support for information systems and requirements for information assurance (for example, business continuity policies). The analysis technique used has been to extract key aspects of security for database and application servers in a data center environment, rather than examining specific usage scenarios. To date, no attempt has been made to look at specific industries or needs. We believe these assumptions apply across a broad range of applications, allowing this document to serve as a starting point for most DCL deployments. However, it will be necessary to analyze and address any security risks and threats unique to each installation.

Below is a non-exhaustive list of security assumptions related to DCL operating systems, and why these assumptions were made. These requirements were derived with database and application servers in mind; however, they will likely also be appropriate for web and infrastructure support servers.

• Administrative duties should be separated, because administrators authorized to perform some administrative duties should not need to be authorized to perform all administrative functions. Separation of duties is a pillar of good security practice, and it is required in order to have a good audit trail of actions performed. Examples of technologies that implement a solution for this requirement include sudo and SELinux, which should be configured to support role separation.
• Least-privilege integrity protections should exist, because when applications break, whether through code bugs, administrative error or malicious attack, there is no reason the broken application should ever be able to damage anything it is not allowed to update, modify or delete in the first place. Examples of technologies implementing a solution for this requirement include chroot, jail and SELinux.
• Local and perimeter network firewall protections should be included, because it is common sense to limit exposure to network attacks on ports and protocols you are not supposed to be using, so you can concentrate on defending against attacks on the protocols and ports you must use. An example of a host-based firewall for Linux is iptables.
• Assume that any user may be hostile. At the very least, DCL systems must defend against amateur (cracker/hacker) attacks of the sort popular with virus and worm writers, meaning buffer overflow and cross-site scripting attacks. Examples of technologies addressing parts of this requirement include NX support, StackGuard and Nikto.
• Assume that MOST administrators can be trusted. However, without background checks, they should not be trusted with unlimited authority on production DCL systems performing mission-critical, customer privacy-related or financial reporting functions.
Arguably, data center operators who fail to provide adequate controls (either technical or non-technical) in this regard may be failing to meet, for instance, Sarbanes-Oxley requirements to assure that financial systems are protected from access or manipulation by insiders. For example, data center system administrators who have any of the privileges

228

listed below may have the ability to view and/or modify sensitive data such as general ledger reports and financial data.

• Anyone with root privileges
• Anyone with database installation/modification privileges
• Anyone with database administration privileges
• Anyone with financial applications administration privileges
• Anyone with general ledger application administration privileges
• Anyone with print queue administration privileges, if reports are queued to disk where they can be viewed or modified before printing
• Anyone with backup/restore privileges, if tapes can be mounted on test or other machines
• Anyone with access to the backup tape vault, if tapes can be mounted on test or other machines and the backups are not protected by password or otherwise encoded against unauthorized use
• Anyone with web services administrator privileges, if they can intercept transactions or make duplicates of reports or screen shots accessed through the financial applications portal
• Anyone who can create accounts with root privileges on the system, the database or the financial applications
• Anyone who can change passwords on a root account or DBA account or in the financial applications (think help desk operators)

The Security SIG examined five scenarios for server types in the data center and published these descriptions as use cases. They can be reviewed at the following public URLs:

• Database Server: http://www.developer.osdl.org/dev/security/docs/DatabaseServer_70cols.txt
• Mid-Tier Application Server: http://www.developer.osdl.org/dev/security/docs/MidTierServer_70cols.txt
• Edge Server: http://www.developer.osdl.org/dev/security/docs/EdgeServer_70cols.txt
• Infrastructure Server: http://www.developer.osdl.org/dev/security/docs/InfrastructureServer_70cols.txt
• Departmental Server: http://www.developer.osdl.org/dev/security/docs/DepartmentServer_70cols.txt
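The separation-of-duties assumption above, implemented with sudo configured for role separation, could be expressed with a sudoers fragment along these lines. The group name, file path and commands are hypothetical, shown only to illustrate the idea of granting one administrative role without granting full root authority:

```
# Hypothetical /etc/sudoers.d/ fragment: members of "dbadmins" may manage
# the database service, but receive no other root authority.
%dbadmins ALL = (root) /usr/sbin/service postgresql start, \
                       /usr/sbin/service postgresql stop
```

Because each sudo invocation is logged, such a configuration also supports the audit-trail requirement noted above.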

Capabilities Needed, Based upon Assumptions

From the above assumptions and use cases, we conclude that the following classes of capabilities are necessary to provide the protection needed in the data center:

• Vulnerability Avoidance
The capabilities in this category are needed to detect and remove the inevitable vulnerabilities that occur, such as software bugs that create opportunities for attack. Also included is the infrastructure (Linux Security Modules) that upper-layer security solutions need to implement many of the other capabilities in this list.
• Access Control
These capabilities give object owners and system administrators the ability to control access to system resources (for example, files and directories) at their discretion. The goal is to grant access to resources only as needed. For applications that must run in a

229

mixed environment (for instance, when hosting multiple unrelated applications for the purpose of outsourcing or consolidating hardware), application confinement is necessary to provide the level of isolation needed to prevent unwanted access to restricted resources, such as personnel, financial or medical diagnostic data. Although everyone strives to minimize the risks of common attacks, such as buffer overflows, additional levels of assurance are needed to help prevent unauthorized access to certain types of critical resources. Application confinement is an additional, effective way to control access.
• Intrusion Detection
Two different scales of protection can be implemented, depending upon the needs of the deployment. A tamper-evident audit log makes it possible to detect when tampering has occurred on a log. Having a tamper-evident log certainly does not prevent a trusted user from tampering with an audit log; rather, the expectation that there will be a reliable audit log acts as a deterrent to tampering. A tamper-resistant audit log makes it impossible for an attacker to create or alter events in the audit log. An example implementation of tamper-evident auditing is LogCrypt.
• Trusted Architecture
These capabilities provide a means of signing binaries, such as drivers, kernels and applications, to detect hostile manipulation.
• Host-based Firewall
A software firewall is usually the first line of defense against malicious agents on external networks attempting to gain access to local resources. As part of an overall firewall strategy, it is useful to incorporate host-based firewalls. A host-based firewall can typically be more restrictive than a general-purpose firewall, because an application server can be tailored to a specific application purpose. For example, an application server may only need to allow access via a few ports, whereas a general-purpose firewall must serve all legal entry points.
• Remote Access/Secure Data Exchange
This capability provides an integrated cryptographic framework.
• Interoperability
For networks that include Microsoft servers, the ability for Linux to provide all levels of Active Directory server/controller types helps to ensure that Active Directory will continue to operate when Microsoft systems are under attack.

References

The three URLs below provide details on topics related to our security capabilities.

This reference is useful for security terminology used within our document: http://www.kernelthread.com/publications/security/ac.html

This reference is useful regarding host-based firewalls: http://www.sun.com/blueprints/1103/817-4403.pdf

LogCrypt: http://www.lunkwill.org/src/LogCrypt0.1.readme

230

Vulnerability Avoidance

ID Name of Capability Priority Level Category Maturity Level

SE.UserStackOvflow User Stack Overflow Protection 1 Security Stable

Description This capability helps developers detect and eliminate user stack overflows. Tools that provide this capability are readily available, but their licenses are restrictive and expensive, so they cannot be used as part of the development process for Linux.

References Stanford (the “Stanford Checker”) and Coverity offer good tools, but their licenses are restrictive and expensive. However, recently Coverity has indicated that they have “set up a framework internally to continually scan open source projects and provide the results of their analysis back to the community to the developers of those projects. Linux is one of the 32 projects currently scanned at: http://scan.coverity.com/.” See http://marc.theaimsgroup.com/?l=linux-kernel&m=114162333515369&w=2

ID Name of Capability Priority Level Category Maturity Level

SE.StackNotExec User and System Stack Not Executable 1 Security Product Available

Description To avoid attacks that attempt to execute code in places like the user and system stack, this capability takes advantage of the no-execute (NX) bit. One distro currently offers a solution, but unfortunately it is not in the mainline kernel.

231

ID Name of Capability Priority Level Category Maturity Level

SE.LSMSupport Linux Security Module (LSM) Support 1 Security Completed

Description This capability provides a common framework for loading security kernel modules needed for the development of security mechanisms. This approach allows a security mechanism to be installed or added without having to recompile the kernel.

References Two security mechanisms currently use LSM: SELinux and AppArmor.

ID Name of Capability Priority Level Category Maturity Level

SE.SysIntegrityCk System Integrity Check 1 Security Completed

Description This capability provides a way to verify the integrity of critical system configuration files and directories and report any integrity violations. Damage assessment and some reconstruction capability are expected.

References http://www.tripwire.com

232

ID Name of Capability Priority Level Category Maturity Level

SE.StaticAnal Static Analysis Tools 1 Security Development

Description Highly accurate, open source static analysis tools should be available and used as standard practice by all OSS communities. Bugs in the kernel or kernel modules typically become security holes, and static analysis tools are designed to find bugs and thus avoid security holes. Static analysis of applications is harder to provide than analysis of the kernel, due to multiple languages and home-grown applications; the best return is to do static analysis in the core application areas (for example, Apache). Two aspects of this capability are (1) tool creation, and (2) the use of the tools. Tools are available, but they are still research tools (University of Texas, Cornell), and much work remains to achieve a good tool; in the end, it is likely these tools will be spun off into for-profit companies. Integrating the use of a static analysis tool into a community project is another large step: the build system needs to take advantage of the tool, making sure the tool is producing useful information (avoiding false positives). The Linux HA project has had some success in integrating a proprietary tool.

References Open source tools under development are:

• sparse. See the linux-sparse mailing list.

• Flawfinder. http://www.dwheeler.com/flawfinder/ . A list of open source solutions can be found under “Other static analysis tools for security.”

ID Name of Capability Priority Level Category Maturity Level

SE.RunTimeAnalysis Run Time Analysis Tools 1 Security Stable

Description Highly accurate, open source run-time analysis tools should be used in regression testing as standard practice in all OSS communities. Like static analysis tools, run-time analysis tools detect bugs that can become security holes. Valgrind is a tool that could be extended to look for common security holes; CUPS uses Valgrind in its regression suite to detect buffer overruns. Valgrind is currently more of a profiler than a security tool, but it can still help find problems. The project itself is mature, but it is not integrated into regression test processes, and it has some gaps, as it does not cover all aspects of security regressions.

References http://valgrind.org

233

ID Name of Capability Priority Level Category Maturity Level

SE.FastFixProcess Fast Security Fix Process 1 Security Completed

Description This capability provides the kernel community with a viable and fast process for identifying and resolving security problems.

Usability An ease-of-use issue exists for those who are not using a distro stable release (and for the distros themselves). While a process is in place to quickly produce fixes as patches, the fixes are against the head of the kernel tree, not the particular release used by anyone maintaining their own kernel.

Access Control

ID Name of Capability Priority Level Category Maturity Level

SE.DAC Discretionary Access Control 2 Security Product Available

Description This capability controls access to file systems and devices using discretionary access, where the owner of each entity allows access according to the owner’s discretion. One solution has achieved Controlled Access Protection Profile (CAPP) evaluation, which implies that the kernel is capable of supporting this certification in any solution.

234

ID Name of Capability Priority Level Category Maturity Level

SE.MAC Mandatory Access Control 2 Security Integrated

Description This capability supports access control by means of assigning classifications to entities (such as files, devices, processes, or people) and only allowing access to those entities based on levels of authorization. Solutions in this area need the Linux Security Module to implement MAC. The maturity is based on the progress of the Labeled Security Protection Profile (LSPP) evaluation project, which is well underway.

Usability The solutions need high-level administrative tools, documentation, and a template that makes it easy to create the policies needed to support implementing MAC.

ID Name of Capability Priority Level Category Maturity Level

SE.RestrictNet Restrict Net Access 2 Security Product Available

Description This capability permits fine-grained control of network access. The maturity level assessment is based on the fact that one distro has passed CAPP certification, which includes this capability.

ID Name of Capability Priority Level Category Maturity Level

SE.DistbUserAuth Distributed User Authentication 2 Security Integrated

Description This capability provides the ability to verify legitimacy of a user within a distributed computing environment. This mechanism allows a user to authenticate once, allowing access to all systems to which the user has legitimate access. The maturity is based on LDAP/Kerberos, which is integrated into the mainline but is still a bit clumsy to deploy.

235

ID Name of Capability Priority Level Category Maturity Level

SE.PRM Process Rights Management 2 Security Release

Description This capability allows a process to run without root access in a traditional DAC environment, similar to Solaris Process Rights Management. The motivation for this capability is that ISVs do not want to start their daemons as root. The current Linux approach requires a different initialization procedure to remove root privileges. Code exists, but the patch is not integrated in mainline.
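The "different initialization procedure" mentioned above, in which a daemon starts as root only long enough to perform privileged setup and then permanently gives up root, can be sketched as follows. The uid/gid values and the surrounding daemon functions are hypothetical; the point is the ordering, which matters: supplementary groups first, then gid, then uid.

```python
# Sketch of dropping root privileges after privileged setup (e.g. binding
# a port below 1024). Once setuid() succeeds, root authority is gone.
import os

def drop_privileges(uid, gid):
    if os.geteuid() == 0:
        os.setgroups([])  # shed supplementary groups (only root may do this)
    os.setgid(gid)        # set group before user, or setgid() would fail
    os.setuid(uid)        # irreversible: process now runs unprivileged

# Hypothetical usage in a daemon:
#   bind_socket(port=80)                  # privileged step while still root
#   drop_privileges(uid=1001, gid=1001)   # give up root before serving
#   serve_forever()                       # all request handling runs unprivileged
```

Capabilities such as SE.PRM aim to make this dance unnecessary by letting the daemon never hold full root in the first place.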

ID Name of Capability Priority Level Category Maturity Level

SE.AppIsolation Application Isolation (Vserver) 2 Security Stable

Description This capability provides application isolation and containment in a way similar to Solaris Zones. The maturity level is based on the Linux VServer project, which has a stable 2.1 release as of December 2005 but is not yet integrated into mainline.

References The linux vserver project homepage: http://linux-vserver.org

ID Name of Capability Priority Level Category Maturity Level

SE.EncryptFSperFile Encryption Per File 2 Security Release

Description This capability supports encrypted file system technology that is transparent to the user. That is, the file does not have to be decrypted to be used or reencrypted to be saved. The GnuPG project offers file encryption without transparency, while the eCryptfs project intends to offer file encryption with transparency.

236

ID Name of Capability Priority Level Category Maturity Level

SE.EncryptFS Whole Disk Encryption 2 Security

Description This capability provides whole disk or file system encryption with transparency for the user. That is, the user does not need to decrypt and reencrypt the file system to use it. Many solutions are not transparent. The eCryptfs project intends to offer this feature with transparency.

References For an article on encryption for security, read: http://www.esj.com/news/article.aspx?editorialsID=1531

Intrusion Detection

ID Name of Capability Priority Level Category Maturity Level

SE.Audit Security Auditing 1 Security Usable

Description This capability enables auditing of security-related events on a system. One distro proved that a security audit could be done on Linux by achieving CAPP certification, but at the time the code used was not part of the Linux kernel. Since then, the linux-audit project has been working on code to support audit functionality that is headed for mainline adoption.

ID Name of Capability Priority Level Category Maturity Level

SE.TamperEvident Security Auditing: Tamper Evident Audit Logs 2 Security Development

Description Given the ability to audit logs, this capability enables detection of any attempt to tamper with an audit log. It is not as difficult to achieve as tamper resistance, since it does not prevent log tampering. A project is underway to provide tamper-evident system logs, though not tamper-evident audit logs; however, the concept should apply to audit logs as well.
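The idea behind LogCrypt-style tamper evidence can be illustrated with a hash chain: each entry stores a digest that covers the previous entry's digest, so editing or deleting any record breaks every later link. This sketch shows only the chaining concept; it makes tampering evident, not impossible, and a real implementation would also key the digests so an attacker cannot simply rebuild the chain.

```python
# Hash-chained log sketch: tampering with any entry invalidates the chain.
import hashlib

GENESIS = "0" * 64  # fixed starting link for an empty log

def append(log, message):
    """Append (message, link) where link covers the previous link."""
    prev = log[-1][1] if log else GENESIS
    link = hashlib.sha256((prev + message).encode()).hexdigest()
    log.append((message, link))

def verify(log):
    """Recompute every link; any retroactive edit breaks verification."""
    prev = GENESIS
    for message, link in log:
        if hashlib.sha256((prev + message).encode()).hexdigest() != link:
            return False
        prev = link
    return True
```

Keying each link with a secret that is rotated and forgotten (as LogCrypt does) is what prevents an attacker with full access from recomputing the chain after an edit.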

237

ID Name of Capability Priority Level Category Maturity Level

SE.TamperResistant Security Auditing: Tamper Resistant 2 Security Investigation

Description Given the ability to do security auditing, this capability prevents attackers from changing security logs to avoid detection. It is extremely difficult to achieve. Feedback from users indicated that tamper-evident audit logs are a much higher priority than tamper-resistant audit logs.

ID Name of Capability Priority Level Category Maturity Level

SE.DetectAccess Intrusion Detection: Detect Unauthorized Access 2 Security Usable

Description This capability enables detection and comprehension of unauthorized access attempts. It is not necessary to have auditing available, although an audit is useful as a source of information.

ID Name of Capability Priority Level Category Maturity Level

SE.ReportAccess Intrusion Detection: Report Unauthorized Access 2 Security Stable

Description This capability enables reporting of an unauthorized access, assuming the access has been detected. Only what has been detected can be reported, so it is easier to provide reporting than detection. This capability is not based on any audit requirement.

238

ID Name of Capability Priority Level Category Maturity Level

SE.DetectTampering Intrusion Detection: Detect File Tampering 2 Security Product Available

Description This capability enables detection of file tampering and is not dependent on an audit capability. Commercial and open source products are available on Linux that can detect file tampering.

References http://la-samhna.de/samhain http://packages.debian.org/unstable/admin/aide http://www.tripwire.com

Trusted Architecture

ID Name of Capability Priority Level Category Maturity Level

SE.SignedApps Trusted Applications: Signed Applications 2 Security Usable

Description This capability allows signing of applications as a method for users to avoid running sinister applications masquerading as legitimate ones. Various projects are underway. DIGSIG recently added 64-bit support, RSA-2048 support (2-kbit keys), script support, and module unloading passwords.
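The signed-application idea can be illustrated as a verification flow: a vendor signs a digest of the binary, and the loader refuses anything whose signature does not verify. Systems such as DIGSIG use RSA public-key signatures; the sketch below substitutes an HMAC with a shared key purely as a standard-library stand-in, so the key and messages here are hypothetical and not how a real deployment would distribute trust.

```python
# Signed-binary verification flow (HMAC used as a stand-in for RSA signing).
import hashlib
import hmac

KEY = b"vendor-signing-key"  # hypothetical; real systems use public-key crypto

def sign(binary: bytes) -> str:
    """Vendor side: sign the SHA-256 digest of the binary."""
    return hmac.new(KEY, hashlib.sha256(binary).digest(), "sha256").hexdigest()

def verify(binary: bytes, signature: str) -> bool:
    """Loader side: refuse any binary whose signature does not check out."""
    return hmac.compare_digest(sign(binary), signature)
```

The essential property is the same as in DIGSIG: any modification to the binary after signing, however small, causes verification to fail.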

ID Name of Capability Priority Level Category Maturity Level

SE.SignedDrivers Trusted System: Signed System Drivers 2 Security Release

Description This capability allows signing of drivers as a method for system administrators to avoid running sinister drivers masquerading as real. Driver suppliers (ISVs, distros) want to digitally sign their drivers to make sure customers are running their supported versions.

239

ID Name of Capability Priority Level Category Maturity Level

SE.SignedLibs Trusted System: Signed Libraries 2 Security Usable

Description This capability allows signing of libraries as a method for system administrators to know their libraries have not been compromised. Library suppliers (ISVs, distros) want to digitally sign their libraries to make sure customers are running their supported versions. DIGSIG claims to support signed libraries.

ID Name of Capability Priority Level Category Maturity Level

SE.SignedKernel Signed Kernel 2 Security Development

Description This capability allows signing of a kernel as a method for system administrators to know a kernel (like one remotely downloaded for a diskless system) has not been compromised. Kernel suppliers (distros) want to digitally sign their kernel to make sure customers are running their supported versions.

References Projects include secureboot.

ID Name of Capability Priority Level Category Maturity Level

SE.CoreRoot Core Root of Trust 2 Security Development

Description This capability provides support for core root of trust measurements to establish a chain of trust. The capability is provided in hardware, but must also be supported in software. Two projects with early code support "trusted GRUB," which is a subset of what is needed for core root of trust measurements: one is Trousers (on SourceForge), and the second is the Trusted GRUB project.

References The trusted grub project: http://www.prosec.rub.de/trusted_grub.html

240

ID Name of Capability Priority Level Category Maturity Level

SE.NetworkConnect Trusted Network Connect 2 Security Investigation

Description This capability provides additional checks when a system tries to attach itself to the network. It is a standard of the Trusted Computing Group.

Host-based Firewalling

ID Name of Capability Priority Level Category Maturity Level

SE.Ports Filter Incoming/Outgoing Ports 2 Security Completed

Description This capability enables the host to examine various parts of the communication stack headers on an incoming or outgoing port to detect, prevent, and audit security breaches that can be detected based on this information.
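An iptables policy of the kind described above might look like the following for a dedicated web server. The ports and ordering are hypothetical; the point is that a host-based filter can be far more restrictive than a perimeter firewall because the host knows exactly which services it offers:

```
# Hypothetical host-based policy for a web server: accept replies to
# established connections and the two service ports, drop everything else.
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
iptables -P INPUT DROP
```

Setting the default policy to DROP last, after the accept rules are in place, avoids locking out existing sessions while the rules load.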

ID Name of Capability Priority Level Category Maturity Level

SE.Forwarding Filter Forwarding Traffic 2 Security Completed

Description In the event the host machine is acting as a router, this capability enables filtering of the communication header information in network packets, to enable detection, avoidance and auditing of security attacks that can be detected from that information.

241

ID Name of Capability Priority Level Category Maturity Level

SE.ControlProtocols Filter Control Protocols 2 Security Completed

Description This capability enables filtering of Internet Control Message Protocol (ICMP) and Internet Group Management Protocol (IGMP) messages.

ID Name of Capability Priority Level Category Maturity Level

SE.Level3Plus Level 3 + Filtering 2 Security Stable

Description This capability provides stateful filtering at a level higher (from the application point of view) than the networking layer in the communication stack, determining more about a packet in relation to a connection and in the context of other packets seen by the firewall.
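The difference from a stateless port filter can be shown with a toy connection tracker: inbound packets are accepted only if they are replies on a flow the host itself initiated, a decision no per-port rule can express. This is a conceptual sketch with hypothetical addresses, not a model of any particular firewall implementation:

```python
# Toy stateful filter: track outbound flows, accept only matching replies.
class StatefulFilter:
    def __init__(self):
        # Each flow is (local_addr, local_port, remote_addr, remote_port).
        self.flows = set()

    def outbound(self, src, sport, dst, dport):
        """Record an outbound packet so its replies are recognized later."""
        self.flows.add((src, sport, dst, dport))

    def inbound(self, src, sport, dst, dport):
        """Accept an inbound packet only if it answers an established flow."""
        return (dst, dport, src, sport) in self.flows
```

Real connection tracking also ages flows out and follows protocol state (TCP handshakes, related ICMP), but the accept/reject decision has the same shape.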

ID Name of Capability Priority Level Category Maturity Level

SE.AppLevel Application Level Filtering 2 Security Integrated

Description This capability supports a type of host-based filtering in which the characteristics of an application are examined before the application is executed, in order to avoid executing malicious code. The decision is based on a known-good application configuration, known-good types of system calls and/or known-good programming interfaces.

242

Remote Access/Secure Data Exchange

ID Name of Capability Priority Level Category Maturity Level

SE.CryptoFramework Integrated Cryptographic Framework 2 Security Investigation

Description This capability provides an integrated cryptographic framework as a single point of FIPS certification. Currently there are countless competing cryptographic libraries rather than one good standard library. Crypto code is sensitive code that must be correct, or it can seriously impair security; therefore, it is desirable to use as little crypto code as possible. The government requires that crypto code be certified and that you be able to prove you are calling only certified code. The end result is that, with so many different libraries, it becomes impossible to verify the correctness of the code on a system. To solve this problem, distros should standardize on one cryptographic library.

Full-peer Interoperability

ID Name of Capability Priority Level Category Maturity Level

SE.ADAuthAccess Authentication and Access Control 2 Security Completed

Description This capability supports LDAP and SAMBA services that accept and enforce Microsoft Active Directory authentication and access control (group membership) for files and resources. It allows Linux servers to maintain these services in a heterogeneous environment whenever Microsoft servers are under attack.

ID Name of Capability Priority Level Category Maturity Level

SE.DomainController Full Peer Domain 2 Security Completed

Description Linux can provide services accepted as full peer domains with Microsoft Active Directory. SAMBA version 3 provides LDAP and KERBEROS, which are accepted as full peer domains. This capability allows Linux servers to maintain these services in a heterogeneous environment whenever Microsoft servers are under attack.

243

ID Name of Capability Priority Level Category Maturity Level

SE.ADServer Active Directory Server 2 Security Development

Description Services provided from LDAP and KERBEROS are accepted as full domain participants, able to be either full domain or backup domain controllers or Active Directory replicas, including Global Catalog servers, so that Linux services are able to share the load for a Microsoft Active Directory environment. This capability allows Linux servers to maintain these services in a heterogeneous environment whenever Microsoft servers are under attack.

244

Usability

Capabilities categorized as "Usability" and listed in this Usability section represent the usability of tools, utilities and services that a system administrator uses for servicing or managing in a non-passive way.

ID Name of Capability Priority Level Category Maturity Level

U.CommonCommandLine Common Command Line Administration across Distributions 2 Usability Completed

Description This capability represents a set of common administrative commands that are usable in a consistent way across all Linux distributions. Although the code is complete, in practice, distributions use different versions of libraries that provide varying functionality. Thus the commands supported by any two distributions might not support exactly the same features.

245

ID Name of Capability Priority Level Category Maturity Level

U.3PSI Third Party Software Integration 1 Usability Usable

Description Customers cannot adopt Linux if the third party software they require does not have a port to their distribution of choice. This is an investigation item to determine the barriers to rapid integration of third party Unix-based software into a wide range of existing Linux distributions. We believe the most likely barrier is that ISVs currently cannot port to Linux once; that is, there is no uniformity among distributions that provides the following:

• Standard APIs

• Standard install procedures

• Standard layout

• Standard startup

• Standard administration GUI hooks

Another related barrier appears to be the complexity (and therefore cost) to ISVs of supporting multiple architectures, types of distributions and releases of distributions, given the current lack of standardization.

This investigation would determine the following:

• Any other issues, while confirming port uniformity is a concern

• What solutions just need compromise for completion

• Where are there workable standards that can be adopted or modified

• Which standards body owns which part of these problems and needs to be involved

Two other ISV-related items defined in this Capabilities document detail important aspects of these issues. We consider them obvious and important enough to call out separately at this time:

• M.3PI Common interface for third party integration to install tools

• P.PORT Application Porting Quality

References See also in this document: M.3PI (Common Interface for Third Party Integration to Install Tools) and P.PORT (Port Quality). For admin GUI standardization, there are many active projects but little consensus. Webmin provides a nice web-based tool: http://www.webmin.com/

246

ID Name of Capability Priority Level Category Maturity Level

U.Migration Migration Tools 2 Usability Usable

Description This capability provides tools that facilitate the migration from other UNIX-based applications to Linux. The existence of migration tools to Linux is the measure of completeness for this item.

247

General References

Information specific to DCL is at the OSDL website: http://www.osdl.org/projects/dcl

248