Operations Practice What high-reliability organizations get right

Technology isn’t the only—or even the most important—reason high-reliability organizations outperform their peers.

by Raffaele Carpi, Peter Claus, Imke Mattik, and Patrick Schulze

© iStockphoto/Getty Images

May 2019 As Industry 4.0 continues to advance with The three core business practices that breathtaking speed, unleashing new capabilities drive reliability at equally breathtaking speed, it’s all too easy We selected eight best-in-class reliability for business leaders to succumb to relying solely organizations from a cross-section of industries, on technology to drive operational improvement. based on internal reliability metrics (such as Automation, advanced analytics, digital percentage of downtime and overall reliability) and performance management, cloud computing, external performance benchmarks and industry machine learning—all offer powerful and game- awards for operational excellence. We then changing ways for organizations to achieve new interviewed in depth a dozen of their leaders from heights in operational performance. the reliability, operations, or functions to identify the organizations’ key characteristics and But the costs and effort these technologies and practices. platforms entail can often exceed their payoff. The expectations surrounding them, it turns out, As varied as our study sample was—it spanned are often inflated. Take, for example, advanced the mining, oil and gas, power generation and analytics-driven predictive maintenance. As a distribution, pharmaceuticals, airline, and military means of boosting reliability, it is not the panacea sectors—all the organizations adhere to three many think it is. Without engineers who are trained fundamental business practices (Exhibit 1). in data analysis and in developing solutions based on those analytics, companies cannot HROs implement robust reliability processes. possibly expect to realize the full potential of the All eight organizations follow strong reliability technologies. Often, there are simpler, more cost- processes across their operations, from the ground effective ways to accomplish the same goal. level up. In this respect, they stand out from the average reliability organization, whose processes Moreover, technology alone does not make for are either lacking specifics or inconsistently excellence in reliability. In industries that live by followed. the laws of science, leaders often underestimate the role of management processes and skills in For example, HROs clearly define the assets reliability- success. critical to their operations, ensuring that the list is not merely well-understood but also considered in Research we conducted in a cross-section of decision making. They are skillful in disseminating predominantly heavy-asset industries reveals what the definitions and standards throughout their distinguishes high-reliability organizations (HROs) companies. They create equipment-reliability from the rest. These companies focus as much on strategies and execute them by strictly following the enablers—the rigorous processes, role clarity, preventative-maintenance schedules, closely and accountability systems—as they do on the monitoring equipment health, and identifying issues Industry 4.0 technologies. and proactively or promptly resolving them.

Yet as essential as these enablers are, they’re still HROs also engage in root-cause problem solving to not enough. HROs also focus on talent: they put a determine underlying issues and implement holistic, premium on certain skills that other companies don’t, practical solutions. Their reliability engineers draw and they invest more in professional development. on a variety of data sources, tools, capabilities and Finally, HROs structure their organizations subject-matter expertise. according to how centralized the function and its accountability are. To be sure, advanced Another common practice among HROs is that technologies can deliver dramatic improvements, they all have robust systems in place for managing, but ultimately, it’s the human element that spells preserving, disseminating, and updating their success.

2 What high-reliability organizations get right Exhibit 1 Principle 1: High-reliability organizations follow three fundamental business practices.

reliability knowledge base—including both done when they need to get done. Period.” HROs reliability analysis and reliability design standards. employ systematic methods to carry out root-cause For instance, these companies effectively share problem solving on the front lines. learnings from every reliability event, and update their equipment-design standards and work They define roles clearly and institutionalize processes accordingly to ensure the event is knowledge. not repeated. Finally, HROs hold other functions Roles and responsibilities are clearly defined and accountable to execute the reliability processes well understood by operational leaders as well as they’ve put in place. all those with whom they work: the plant managers, maintenance leaders, operators, technicians, One manager from an energy company noted supply-chain managers, and so forth. Each member that his organization eschews advanced reliability of the organization has a clear understanding of the techniques or “fancy predictive maintenance role they play in driving reliability, so the guidelines models,” relying instead on traditional root-cause for dealing with and engaging personnel are thus problem solving and defect-elimination approaches unambiguous. A leading pharmaceutical company to get results. Another interviewee, a former US in our research, for example, rotates personnel— Naval submarine officer, put it plainly: the reason including reliability engineers—to give them a there’s rarely a failure of critical equipment “is firsthand understanding of the critical roles in the two-fold: the design is robust, and things just get organization and how they interact

What high-reliability organizations get right 3 “The design is robust, and things just get done when they need to get done. Period.” – Former US military officer

They set accountability at the executive level levels with those of their competitors. Salaries at and delegate it down. the HROs—average, as well as the low and high HROs believe accountability resides at the top. end of pay scales—were 15 percent higher than To ensure that, they set executive compensation those of their peers (Exhibit 2). HROs also reward according to reliability-specific metrics and high performance; several we interviewed have outcomes. These organizations establish clear developed specific key performance indicators corporate reliability standards and communicate (KPIs) for performance-based compensation and them well: for example, they’re included in bonuses. capability models and reliability metrics, which are tracked publicly on scoreboards. In addition, HROs The fact that HROs pay better is hardly surprising: discuss outcomes at all levels of the organization, in any industry, offering a higher salary is an from the frontline control room to the boardroom. obvious way to attract top talent. But it is by no At a major power generation company, executive means the only way. Nor does it guarantee talent sponsorship is considered a key success factor. retention or reliability success. “Senior executives really bought into frontline support for reliability and communicated its HROs prize communication and problem- importance for our business clearly and frequently.” solving skills. They appreciate that technical expertise alone is not enough for a first-rate reliability HROs put people first engineer. Knowing how to solve problems and HROs recognize that it takes more than technical how to communicate—up, down, and across expertise to make a great reliability engineer. the organization, in ways that earn trust and Our research revealed that to attract and retain support—are critical skills. In fact, HROs rank the best and the brightest, HROs follow three communication and problem-solving among the specific talent-management practices: they five most critical skills in job candidates. pay higher salaries, emphasize communication and coordination skills relevant to the reliability HROs recognize the importance of being able engineer’s role as cross-functional problem solver, to communicate equally well with the frontline and provide well-defined career options and paths. and management; their engineers are adept at translating technical issues into laymen’s HROs offer higher pay than their peers. terms that their non-technically trained peers We analyzed two years’ worth of job postings from can understand. Effective problem solvers are the companies in our sample, comparing their pay solid conceptual thinkers who can understand a

4 What high-reliability organizations get right Exhibit 2

problem or condition by identifying patterns and HROs create attractive paths for career connections that reveal which underlying issues advancement. to address. They’re also leaders: HROs give more Reliability engineers typically follow one of three weight to leadership skills than the more than 70 main career tracks, each with different rates of other organizations in our comparison set (Exhibit 3). retention (Exhibit 4).

Creativity and rigor in problem solving are always —— Field-based engineer to operations manager: valuable, particularly at a time when organizational Entry-level personnel with technical degrees complexity is growing and the costs of failure make up this track. While each organization is (financial, social, and environmental) are ever- different, we see common patterns. Generally, increasing. Each day that a plant is sidelined can there is little opportunity for advancement in translate into millions of dollars lost; a malfunction the initial years. Near the five-year mark, field that releases toxins in the environment can cause engineers have two options if they want to stay untold damage and even loss of life. Similarly, with the company: they can either become a today’s higher-stakes operating environment supervisor, or switch career tracks. There is heightens the importance of communication skills: no ultimate role in for specifically, the ability to engage different teams in reliability-focused personnel. Not surprisingly, constructive dialogue, raise concerns and potential this track experiences the lowest retention issues proactively, and foster consensus on the levels: fewer than one in four remain in it to appropriate action to take. become field operations leaders.

What high-reliability organizations get right 5 Exhibit 3

4 Problem solving

6 Leadership

Leadership

—— Junior to senior onsite subject matter expert them at a specific career milestone. They will (SME): These individuals typically have five years be able to choose among different tracks, of industry experience or an advanced degree including corporate-level functional expert. (or both), and usually spend the first few years Not surprisingly, employees on this track have as a functional area expert. At that point, their the highest retention levels, as they have more choices are either to become a senior SME, opportunity to progress to higher-level roles and change tracks—or leave the company. Engineers influence decision making across multiple sites. in this track have greater longevity than the first However, companies must manage expectations track; 50 percent remain in the track, most likely and performance effectively to ensure that because companies effectively prescreen for the corporate SME stays connected with site this role, and senior onsite SME is considered an operations and continues to deliver value across ultimate role. the network.

—— Field-based engineer to corporate SME: From HROs demonstrate that they value quality talent by the get-go, these engineers (either entry- investing accordingly in their reliability bench. They level or experienced technical personnel) establish well-defined career tracks to give their know that corporate-level opportunities await engineers ample opportunities for professional

6 What high-reliability organizations get right Exhibit 4

Site-based Pre-dened decision Corporate engineer point: Can choose SME between tracks, including SME

Junior Become senior Senior eld-based SME² eld-based SME, eld-based (within corporate change track, or SME reliability group) leave?

Field-based Become Operations engineer supervisor, change manager track, or leave?

and personal development. The HROs we studied At a major energy company, field-based reliability employ a variety of talent-management and employees are “truly engineers, closely linked to -retention practices. All strive to offer numerous the equipment,” as a former manager noted. The career-path options, even within the technical or company considers them high-potential employees managerial ranks. and “gives them the option to pursue other technical, commercial, or managerial roles.” Moreover, HROs are committed to training their people on an ongoing basis, whether through The former head of reliability at another major internal classroom sessions for professional energy company commented on the multiple career certification, paid time off to attend industry- options of field reliability engineers, including sponsored events, on-the-job training, or informal moving into operations or risk management. “More mentoring. importantly,” the leader added, “their career is

What high-reliability organizations get right 7 well-managed from the start, with performance A center of excellence. reviews every six months. We tend to keep our good With this archetype, the corporate center engineers.” establishes rigorous reliability protocols, but each local site is responsible for demonstrating its adherence to processes and procedure. This HROs recognize that form follows archetype works well for organizations whose function assets are uniform and where there is little For all their similarities, HROs vary widely in variability or change in the production process. structure, according to the specific characteristics and challenges of their industry, their overall At a major airline, each fleet has a dedicated organization structure and culture, and the nature team of reliability engineers who work remotely of their products. Essentially, there are four basic to monitor equipment and oversee basic archetypes that vary along two dimensions: the maintenance (when heavy maintenance is needed, strength of the central reliability function, and how they travel to sites). The center’s reliability centralized accountability is: that is, who tracks analysts study big data and their analysis asset reliability and who is ultimately responsible for informs decisions about preventive maintenance. outcomes (Exhibit 5). Reliability engineers engage subject-matter

Exhibit 5

8 What high-reliability organizations get right experts from the central corporate-reliability A major resource company illustrates the benefit function (along with other technical resources) to of this model, with its dozens of facilities and help make recommendations. The senior manager assets that include all manner of heavy equipment, of the company’s large maintenance organization refineries, and processing operations. Although reports directly to the COO. reliability programs and accountability are decentralized, the company defines metrics for use Command-and-control. across different sites—and considers it a priority to In this model, a central team designs, implements, make them transparent to the entire organization. and enforces the reliability programs and standards throughout locations. This approach works well Corporate oversight. for industries or companies that are very process- With this archetype, local operations define the oriented and where there are distinct differences reliability program, but the corporate center tracks between production or operating facilities. In such (and has ultimate responsibility for) outcomes. The companies, strong oversight is needed; there is a central office often develops KPIs for use enterprise high risk of catastrophic failure, and repeat failures wide. This model is effective for businesses whose are unacceptable. A top-down approach ensures products vary considerably and which require all facilities and businesses comply with corporate relatively tailored processes across facilities (such reliability standards. as pharmaceutical companies). The hybrid structure makes sense, given the strong local leadership At a leading oil and gas company, a functional group that can be relied upon to carry out reliability sets global reliability strategy, and a small central without direct responsibility for outcomes. Crucial SWAT team implements procedures throughout prerequisites include either having strong, local the sites. (SWAT, because they are quick, tactical, reliability processes and capabilities, or having other and execute with precision.) Integrity and reliability robust processes and strict metrics that reinforce teams are responsible for site-level reliability. reliability excellence, such as through quality control. Each of the company’s business units has a reliability engineer who oversees maintenance and At a leading pharmaceutical company with dozens implementation for each discipline (for example, of facilities, corporate maintains consistency in equipment rotation or electrical operations). The reliability practices by establishing clearly defined functional group works with the local reliability KPIs, which local sites report on via dashboards. engineer, operators, technicians, operations The central team also shares best practices (and managers, and vice presidents, all of whom are failures) companywide. It holds weekly meetings versed in the reliability standards that apply to their with local facilities to review key topics and issues, roles. and provides resources for major initiatives, such as implementing new technologies. Bottom-up reliability. Here, local entities define reliability standards But it’s field engineers who lead such initiatives. and practices and are responsible for reliability Local reliability personnel also focus on root- outcomes. Technicians conduct tests and file cause investigations (RCIs) and failure-mode and reports to central reliability teams with the help of effect analysis (FMEAs). Led by a maintenance process engineers. In some cases, reliability work manager and general manager, the local engineers, is outsourced. Such an approach is well-suited to maintenance group, project teams, and planning decentralized businesses where no two sites are and scheduling teams work in concert with the alike, either in their culture, operating environment, central reliability function. or both.

What high-reliability organizations get right 9 A driving principle and enterprise financial pressures, and intensified public and priority regulatory scrutiny all mean that reliability Reliability engineering emphasizes statistical organizations, regardless of industry, have analysis, but experience and history show that their work cut out for them. Leaders cannot quantitative methods alone are insufficient for expect Industry 4.0 technologies alone to be success. In many of the most dramatic operational a cure-all. Rigorous reliability processes, role failures in modern times, miscommunication or clarity, and clear accountability structures that poor decision making exacerbated a fundamental align with the broader organization—all are engineering failure—in some cases resulting in essential components of reliability success. So catastrophic loss of life that could have been is talent management and development. But averted. Such events stand as sobering reminders high-reliability organizations go one step further: of the importance of rigorous management, they make reliability an explicit priority, not just transparency, and accountability in reliability an afterthought. As one reliability manager put engineering. it: “Everyone at our company takes reliability seriously, not just the reliability engineers. They Today, supply-chain complexity, heightened know that reliability is a top priority, and one of our business interdependencies, competitive and main criteria for success.”

Matt Gentzel is a partner in McKinsey’s Pittsburg office, where Bill McDonnell is a business analyst; Ethan Hessney is an engagement manager in the New York office; and Joël Thibert is an associate partner in the Santiago office.

Designed by Global Editorial Services Copyright © 2019 McKinsey & Company. All rights reserved.

10 What high-reliability organizations get right