<<

Accepted for Publication: IEEE Trans. on

The and Impact of the Loss of Critical Human Skills Necessary for Supporting Legacy Systems

Peter Sandborn and Varun J. Prabhakar CALCE Electronic Products and Systems Center Department of Mechanical Engineering University of Maryland College Park, MD 20742 USA

Index Terms

Workforce planning, workforce management, obsolescence, DMSMS, life-cycle cost, organizational forgetting, legacy systems

Abstract

The loss of critical human skills that are either non-replenishable or take very long periods of time to reconstitute, impacts the support of legacy systems ranging from infrastructure, military and aerospace to IT. Many legacy systems must be supported for long periods of time because they are prohibitively expensive to replace. Loss of critical human skills is a problem for legacy system support organizations as they try to understand and mitigate the effects of an aging workforce with highly specialized, low-demand skill sets. The existing research focuses on workers that have skills that are obsolete and therefore need to be retrained to remain employable; alternatively this paper addresses the system support impacts due to the lack of workers with the required skill set. This paper develops a model for forecasting the loss of critical human skills and the impact of that loss on the future cost of system support. The model can be used to support business cases for system replacement. A detailed case study of a legacy system from the chemical manufacturing industry is provided and managerial insights associated with the support of the system drawn.

Accepted for Publication: IEEE Trans. on Engineering Management

Managerial Relevance Statement

The model developed in this paper allows managers to provide quantitative answers to the following questions: 1) when will the impact of the loss of critical human skills be felt, i.e., is

there a point in the future where the system starts to become more expensive to support; 2) “how

bad is bad?”, i.e., when will the cost of supporting the system become prohibitive; and 3) is there

an alternative to replacing the system?, i.e., could a change in hiring or training practices (if

possible) resolve the problem, and if so, what hiring rates are needed? The case study provided

in the paper demonstrates the loss of younger workers (seeking what they perceive to be more

lucrative opportunities) and clearly supports a business case for the timing of future system

replacement.

Accepted for Publication: IEEE Trans. on Engineering Management

I. INTRODUCTION

There are many types of mission, safety and infrastructure critical systems that have very long field support lives of 20-30 years or more. These systems are often referred to as legacy systems because they are based on, or are composed of, out-of-date (old) processes, methodologies, , parts, and/or application software.1 Due to mismatches between the fast rate of advancement and the massive expense associated with the replacement of these mission, safety and infrastructure critical systems, legacy system dependence is likely to be a reality forever and many of today’s new systems are guaranteed to become tomorrow’s legacy systems. Examples of legacy systems include military systems, industrial and transportation control systems, and infrastructure. Technology and part obsolescence are common problems faced by these systems, e.g., the inability to procure spare parts to maintain the systems through the system’s support life. This problem is commonly referred to as Diminishing Manufacturing Sources and Material Shortages (DMSMS) and is particularly challenging for electronic part management, [1]. Obsolescence, however, is not solely a hardware problem; it also impacts software [2] and the human skills necessary to maintain legacy systems. Human skills obsolescence is a growing problem for organizations that must support legacy systems. These organizations need to understand, forecast and mitigate the effects of an aging work-force with highly specialized (and in many cases effectively irreplaceable) skill sets.

Product manufacturing and support organizations require a dependable pool of skills so that they can perform their businesses efficiently and without interruption. However, the pool of available people and their associated skills is always in flux, requiring careful management and

1 In software engineering, legacy systems definitions are usually confined to software and architecture, this paper uses a broader definition that includes hardware and processes as well.

Accepted for Publication: IEEE Trans. on Engineering Management forecasting. Workforce planning is commonly defined as getting the right number of people with the right competencies in the right jobs at the right time. There are numerous issues that arise associated with mismatches between the skills possessed by the workforce and the skills needed by employers, which we broadly classify into the following three categories: skills obsolescence, skill shortage, and critical skills loss.

Skills obsolescence (also referred to as human capital obsolescence) is the situation in which workers (for whatever reason) do not have the skills that they need, which includes workers that have skills that are obsolete and need to be retrained. De Grip and Van Loo [3] define five types of skills obsolescence: the wear of skills due to aging or illness that may be related to working conditions; the atrophy of skills due to insufficient use (skills decay); job-specific obsolescence due to technological and organizational change, including skills that are simply no longer needed

(skill depreciation); sector-specific obsolescence due to shifts in employment; and firm-specific skills obsolescence due to displacement. A significant body of literature is devoted to the analysis of skills obsolescence, e.g., [4] and the references contained therein. Skill shortage refers to insufficient current skill competences, e.g., [5]. Skills shortage expresses the need to identify, train and retain the workforce to fill current and expected future skill gaps. Skills shortage also includes an organization’s failure to protect core skill competencies in economic downturns [6].

The third category is critical skills loss, which is the topic of this paper. Critical skills loss describes the loss of skills that are either non-replenishable or take very long periods of time

(many years) to reconstitute. Critical skills loss is a special case of “organizational forgetting”.

Organizational forgetting describes the situation where organizations lose knowledge gained through learning-by-doing due to labor turnover, periods of inactivity, and/or failure to

Accepted for Publication: IEEE Trans. on Engineering Management

institutionalize tacit knowledge [7]. Most existing analyses of organizational forgetting

implicitly assume that the situation is relatively short term and seek to forecast the recovery

period and the associated disruption in productivity and schedule, e.g., [8-10]. Critical skills loss

(the subject of this paper) represents a permanent (non-recoverable) and involuntary form of

organizational forgetting. Critical skills loss is usually the result of long-term (20+ years)

attrition where skilled workers retire and there are an insufficient number of younger workers to

take their place.2 Critical skills loss is not the result of inactivity, poor planning, or a lack of

foresight by an organization. It is simply the inevitable outcome of the organization’s dependence on specialized skills for which there is relatively little, but non-zero, demand (which

is also the nature of DMSMS-type obsolescence problems for hardware and software, see [1]).

System support and management challenges created by the loss of necessary human skills have

been reported in a number of industries including healthcare [11], nuclear power [12], aerospace

[13], and other enterprises [14]. For example, Goodridge and McGee [15] and Hilson [16]

discuss the IT industry and the shortage of mainframe application programmers experienced in legacy applications – the appropriate skills are no longer taught and younger workers are not interested in learning them. The loss of critical skills problem is particularly troublesome for organizations that must support legacy systems for long periods of time, e.g., 20-30 years or longer. For many defense systems, the problems are potentially devastating: “Even a 1-year delay in funding for CVN-76 [aircraft carrier] will result in the loss of critical skills which will take up to 5 years to reconstitute through new hires and training. A longer delay could cause a permanent loss in the skills necessary to maintain our carrier force.” [17].

2 In the case of legacy system support, 5 (or in some cases much longer) years of on-the-job experience by an engineer may be necessary to become competent.

Accepted for Publication: IEEE Trans. on Engineering Management

As the human capital capable of supporting a system shrinks, eventually the time required to

support the system will increase – the objective of this paper is to understand when the increase will occur and its magnitude. Increases in support time increase business interrupt time for manufacturing causing a loss of revenue. Increases in support time in the service, transportation and defense industries can decrease system availability, which can lead to loss of revenue, compromised safety, and possibly result in the loss of life (e.g., air support for Marines compromised by lack of helicopters due to repair delays). Due to prohibitive costs, legacy systems are often not replaced until the consequences of their age (and possible lack of support)

result in catastrophic failures, or a viable replacement business case can be constructed and sold

to their customers.3 In order to construct business cases, management must be able to provide

quantitative answers to the following questions: 1) when will the impact of human obsolescence

be felt, i.e., is there a point in time where the system starts to become more expensive to support;

2) “how bad is bad?”, i.e., when will the cost of supporting the system exceed my profit margin

(or budget)? 3) “is there an alternative to replacing the system?” could a change in hiring or

training practices (if possible) resolve the problem? The model described in this paper provides

a forecast of the loss of human skills as a function of time and the associated impact of that loss

on the system support cost. These outcomes can be used as direct support for system

replacement business case development.

A. Existing Work

The existing literature suggests that some of the possible causes for critical skills loss include: declines in education and training (e.g., university educated engineers are no longer

3 For example, “customers” may be the company owners (e.g., manufacturing system replacement), the tax-paying public (e.g., civil infrastructure replacement), or governments (e.g., military asset replacement).

Accepted for Publication: IEEE Trans. on Engineering Management

trained in the programming languages used in many legacy systems, e.g., [18]); younger workers

that are discouraged from entering particular workforces that they perceive to be in decline (e.g.,

nuclear power [12]) or not cutting-edge [19,20]; younger workers leaving legacy system sustainment jobs to pursue what they perceive to be more lucrative and exciting opportunities

(this observation is partially supported by Fig. 2 in Section III, which shows the distribution of exit age for a legacy control system); shrinking “feeder” occupations (e.g., the U.S. Navy feeding

skilled workers to the nuclear power industry [12]); older workers being protective of what they

know (not passing along their knowledge in order to protect their jobs, e.g., [21]); and

differences in social and cultural influences between younger and older workers regarding the perceptions of their jobs [15].

While there are many qualitative statements of the problem of critical skills loss and non- replenishable knowledge loss [11-17], to the authors’ knowledge there is no quantitative

forecasting of its rate or impact. The majority of existing research addresses skills obsolescence,

which is the opposite of the problem addressed in this paper wherein workers’ skills are no

longer useful or become outdated as a result of automation and the rapid growth in technology,

e.g., [3,4,22-25]. These works address ways to mitigate skill “decay” over time for a given

workforce. The existing work on critical skill loss focuses on knowledge preservation – the

realization that non-replenishable knowledge is being lost and the need to capture it, [26,27].

There is also work on retirement wave planning [28], but this work is focused on head count not

skill content.

The analytical techniques presented by Bohlander and Snell [29] modeled a situation similar

to critical skills loss, however, they do not include the attrition or the costs associated with the

varying availability of the workers. Bordoloi [30] developed a model for workers that are at

Accepted for Publication: IEEE Trans. on Engineering Management

different skill levels entering and exiting a company; additionally, the rate at which the employer

gains and loses employees is also taken into account in their planning model. However,

Bordoloi [30] stops short of estimating the experience in the skills pool as a function of time and

thereby determining the impact that critical skills loss has on the support of systems. Huang et al. [31] developed a planning model using a where the goal was to

determine an ideal hiring rate based on the different levels of skill that workers had. Although

this model uses workforce simulation and calculates an ideal hiring rate, this model does not account for the costs incurred by the unavailability of workers in their determination of an ideal

hiring rate. Holt [32] points out that the traditional understanding of a workforce has been in

terms of the “physical sum of people employed,” which is the basis for most common workforce planning models. A more useful approach is to observe workers in an organization as assets with varying skill levels. Holt defines human capital investment as the “total amount and quality of talent, knowledge, expertise, and training that these workers possess.” Holt minimizes cost while maintaining balanced levels of human capital. The model is based on categorizing workers by

“classes” wherein all workers in a class share similar attributes. However, the Holt model lacks aging and age-related effects of individual workers over time.

The maintenance workforce planning literature, e.g., [33-36], focus on resource allocation and scheduling optimization. The goals of a maintenance policy are maximizing plant availability and minimizing cost via the timely presence of maintenance workers. Koochaki et al. [33] points out that “maintenance workers are usually highly skilled and therefore difficult to recruit” and that “the efficient and effective use of a scarce maintenance workforce is very important”. Koochaki et al. [33] considers the impact of maintenance resource constraints (i.e., limited maintenance workers) on the grouping of maintenance activities and comparing

Accepted for Publication: IEEE Trans. on Engineering Management condition-based maintenance (CBM) and age-based replacement. Ahire et al. [36], minimizes the makespan (the total length of the schedule) for a sets of preventive maintenance tasks when there are workforce availability constraints. Other papers in this area have treated the influence of CBM on workforce planning and maintenance scheduling, e.g., [33] and references contained therein, however, these papers focus on the determining the optimum size of the maintenance workforce. None of the existing work treating the maintenance workforce addresses the problem of loss of irreplaceable maintenance resources.

The existing literature makes no attempt to study or verify the postulated causes of critical skills obsolescence, or quantify its impacts on an organization. The objective of this paper is to develop a model that uses existing historical workforce data (that organizations have) to forecast the future impacts of that loss and to relate the impacts to management decision. This paper is concerned with the dynamics of the cumulative skills pool (“head content”) and its relation to the cost of sustaining a system as a function of time. The questions that need to be addressed are

“what will today’s skills pool look like after ‘x’ years?” and “what impact will the future skills pool have on the ability to continue to support the system?” The model described in this paper offers a projection into the future and a way to quantify cumulative experience versus the influx of new inexperienced workers and worker departures. This model enables management to not only project future costs, but also to make quantitative business cases for system replacement.

Section II describes the dynamic human skills model developed in this paper. This is followed in Section III by a numerical case study of a legacy control system from the chemical manufacturing industry. The case study includes the analysis of the life-cycle cost impacts of critical skills loss and the managerial insights derived from the analysis.

Accepted for Publication: IEEE Trans. on Engineering Management

II. A MODEL FOR ESTIMATING THE HUMAN SKILLS POOL OVER TIME

The motivation for the model developed in this section is to forecast the future change in

“experience” relative to today’s skills pool. The model assumes that sufficient experience to support the system exists today, and determines how long it takes for the experience to decrease to a point where there is an impact on the cost and how large that impact will be.

The model presented in this section estimates the number of skilled employees (“head count” or pool size) and the cumulative experience in the skills pool (“head content”) in order to determine the resources that will be available to maintain a legacy system or family of systems in the future. The cumulative experience in the skills pool impacts the time to perform the maintenance activities to sustain a system. The time to perform maintenance to support a system is a significant cost driver for availability-centric systems. Availability is the ability of a service or a system to be functional when it is requested for use or operation and is determined by the

combination of reliability and maintainability. For many systems, the cost of unavailability is

very large, e.g., high-volume manufacturing, airlines, medical devices, etc.

The assumptions and parameters used to build the model developed in this section were

driven by the actual data available for the numerical case study (see Section III). In several cases

the simplest possible assumption was made because there was no data to support more complex

assumptions – these assumptions are addressed as they appear in the model. The model

developed in this section has four primary inputs: the distribution of current ages (fC), the

distribution of hiring ages (fH), the distribution of exit ages (fL) and the hiring rate (H). The data

from which the distributions can be constructed can be readily obtained from human resource

records kept by organizations. Using these inputs, we wish to determine the quantity of people

Accepted for Publication: IEEE Trans. on Engineering Management of a specific age in the pool as a function of time.4 Once the quantity of people of each age at a point in time are computed, the model converts the head count to cumulative experience at that point in time using a parametric relationship derived from actual data. The experience can then be used to calculate the time to resolve a maintenance event and that time (along with the quantity of maintenance events to be resolved) can be used to calculate the cost.

A. The Pool Size and Experience Calculation

In order to assess pool size and experience, we have to project (i.e., forward calculate into the future) the experience of the people in the pool based on when they were hired while for age related loss.

The level of experience within the skills pool varies over time and can be determined from the following: 1) the new hires added to the skills pool; 2) the attrition rate or loss of skilled employees due to job changes, promotions, retirement, etc.; and 3) accounting for the varying skill levels of the people in the pool and how those skill levels improve as people remain in the pool.

Initially, we assume that the organization’s policy dictates the hiring rate, H, resulting in a pool with varying skills that can be modeled over time. Since generally the data (see Section III for example) is only available on an annual basis, we utilize discrete one year time steps, i, throughout the analyses. Let H be the new hires per year as a fraction of the pool size at the start of the analysis period (i = 0). Let fC be the probability mass function (PMF) of age for the current skills pool, fH be the PMF of age for new hires, and fL be the PMF of age for people exiting the

4 Both stationary and non-stationary models are considered herein, i.e., stationary models assume that the distributions do not change over time and non-stationary models allow them to change over time governed by drift and volatility parameters.

Accepted for Publication: IEEE Trans. on Engineering Management

pool (attrition). Then, Ni(a), the net frequency of people in the pool of age a during year i, is

given by,

Ni (a)  Ni1 (a 1)  Hf H (a)1 f L (a) (1)

where, i is the number of years from the start of the analysis period. The first term in the

brackets is the current pool, the second term in the brackets is the new hires, and the multiplier

models the retention rate. Note, (1) assumes that the hiring rate, H, is the same for all ages. As a

boundary condition, we require that at i = 0, N0(a) = fc(a) corresponding to the current distribution of worker age in the skills pool. Ni(a) is a fraction that is measured relative to the starting (i = 0) pool; for example, the number of people of age a in year 0 would be the product

of N0(a) and the total initial (i = 0) head count. Therefore, the cumulative net frequency of people

in the skills pool, NNET, in year i is the sum of Ni(a) over all the ages given by,

r N NET (i)   N i (a) (2) a y

where, r is retirement age and y is the age of the youngest employee in the skills pool. We use

the term “net frequency” because NNET(i) can be greater than 1, which can occur if the number of

new hires exceeds the number of people exiting the pool. The age of the youngest worker, y, corresponds to the age of the youngest worker at the start of the analysis period (from fC) or the youngest worker that is hired during the analysis period (from fH). This model assumes that a

mandatory retirement age, r, exists. Early retirement along with other reasons for the loss of

skilled employees are captured in the PMF for attrition, fL. Conversely, the PMF of retention, fR for the skills pool can be estimated annually by, fR(a) = 1 – fL(a).

Estimating the size of the pool (head count) over time is necessary but not sufficient to capture the ability to support a system into the future because not all workers in the pool have an equivalent level of experience and therefore an equivalent level of “value” to the support of the

Accepted for Publication: IEEE Trans. on Engineering Management

system. The following analysis estimates cumulative experience in order to track the true value

of the pool of skilled workers where “experience” is defined as the length of time that a worker

has spent in a particular position. The cumulative experience of the pool in year i, Ei, can be

calculated using,

r Ei  aRE  I E N i (a) (3) a y

where, RE and IE are parameters used to map age to effective experience measured in years (RE and IE are determined using a curve fit to actual data, e.g., see Section III.B). Note, “experience”

has units of time, however, Ei is only used in the model to determine the change in cumulative

experience from the initial condition.

The cumulative experience of the pool in year i as a fraction of the current pool experience,

E0, can be calculated as,

E E  i (4) ci E0

Using the cumulative experience, the time to perform maintenance in year i is given by,

T m T m  0 (5) i E ci

m where, T0 is the time to perform a maintenance activity with a skills pool having E0 experience

at time i = 0. The time to perform maintenance modeled in (5) increases as experience decreases

due to several factors: 1) less experienced workers may require more time to perform the task

(learning curve effects), and/or 2) the number of workers who can perform the required

maintenance activity is smaller, so they may not (for example) be available at every site, i.e.,

they may have to travel from a different location.

Accepted for Publication: IEEE Trans. on Engineering Management

All of the development so far assumes a fixed hiring rate that may or may not be sufficient to maintain the current level of experience. The model can be used to estimate the hiring rate required to maintain the current level of experience over the system’s support life. The required hiring rate (assuming hiring is possible) in year i, Hi, can be solved for from (1) as,

 N (a)  1  i   Hi    Ni1(a 1)  (6) 1 fL (a)  fH (a)  subject to the following constraint which ensures that the cumulative experience is constant at

E0,

E E  i  1 (7) ci E0

The simplest assumption for solving (1) or (6) is that we have a “stationary process”. A stationary process is a stochastic process whose joint probability distribution does not change when shifted in time or space (time is the relevant parameter for this model). For a stationary process, the mean and variance do not change over time. In reality, hiring age (fH) and exit age

(fL) are not stationary processes. To model the problem as a non-stationary process, we model the change in the mean of the distribution using geometric Brownian motion,5

dX t  X t dt X t dWt (8) where Xt is the value of the quantity being simulated at time t (the mean of the hiring age or exit age distributions in our case), μ is a drift component, σ is a variance component (also called shock or volatility), and Wt is a Brownian motion (also known as a Weiner or continuous-time stochastic process). Wt is modeled as dt in our case. No shape change in the distributions is modeled (because no case study information could be obtained to support it).

5 Continuous-time models (such as Brownian motion) are commonly used to model continuous degradation processes for asset management of complex engineering systems [37].

Accepted for Publication: IEEE Trans. on Engineering Management

B. The Cost Impact of Human Skills Obsolescence

The most immediate impact of the loss of critical human skills for the support of legacy

systems is in the ability to perform system support activities in a timely manner. In general,

corrective maintenance costs consist of: spare parts, labor, downtime, overhead,

consumables/handling, and equipment/facilities [38]. When a maintenance event occurs for a

system, the cost of performing the necessary maintenance action is modeled using,

T m  T m  C m  C p   0 C l  f  0 C b (9) i i E i bj E i  ci   ci 

where the term in brackets is the maintenance time from (5), f is the fraction of the bj

p maintenance events of severity level j that result in a business interrupt, Ci is the cost of

l replacement parts (if replacements are needed) in the ith year,Ci is the cost of labor (per unit

b time) in the ith year (assumed to be burdened, i.e., includes overhead), and Ci is the cost of

business interrupt (per unit time) in the ith year. All equipment and facilities necessary for performing maintenance are assumed to be subsumed in the overhead, and consumables and

l b p handling are assumed to be negligible. Ci , Ci and Ci are discounted to a base year using the

p weighted average cost of capital. Ci may include holding (inventory) costs if the part is obsolete

in year i and was procured in a lifetime or bridge buy at an earlier time.6 The failure modes of

6 A lifetime buy refers to purchasing a sufficient number of parts to sustain a system through its entire support life (until its end of support date) at the point in time when the part is discontinued (i.e., the point in time when the part is no longer procurable from its original manufacturer), while a bridge buy only procures enough parts to sustain the system to a redesign that will replace the part [39]. Lifetime and bridge buys are a common obsolescence mitigation approach used for managing DMSMS of electronic parts where the ability to procure parts after discontinuance may be limited or introduce unacceptable risks of procuring counterfeit parts. If parts are purchased at a lifetime or bridge buy, the cost of the replacement part (this is a per part cost) in year i is given by,

Accepted for Publication: IEEE Trans. on Engineering Management

the system are assigned “severity levels” (j) that are correlated to business interrupt fractions f bj that can range from 0 to 1 (0 = no business interrupt, 1 = all the maintenance time is business interrupt). Time-to-failure distributions for each severity level are constructed from the time-to- failure distributions of the relevant failure mechanisms and these are sampled to create maintenance events using a discrete event simulation (see Section III).

The default model for maintenance time given in (5), which is embedded in (9), assumes that only the pool experience impacts the time to perform maintenance. The problem is that an organization may have sufficient experience in the pool, but there is a minimum head count that must be staffed per system (a plant, for example, would be a single installation of a system under analysis). If head count drops too low, maintenance time may be impacted even if sufficient experience exists. For such a case, the average number of people per plant is given by,

f M P  s orig (10) N

where fs is the pool size relative to today, Morig is the original number of people (corresponding

to a fully staffed organization), and N is the number of plants. Rearranging (10) to solve for the

threshold (minimum) pool size gives,

P N f  min (11) sthreshold M orig

When f  f the maintenance time is penalized using, s sthreshold

C p in C p p I p For i  n : Ci  n   k ; For i  n : Ci  i (1 w) kn1 (1 w) (1 w) where C p is the purchase price of the part, w is the weighted average cost of capital, n is the year of the lifetime buy or bridge buy, and I is the holding cost per part per year. Note, both i and n are measured relative to the base year for money.

Accepted for Publication: IEEE Trans. on Engineering Management

m T  fs T m   0  threshold (12) i E f  ci  s and (9) becomes,

m p m b l Ci  Ci  Ti fbCi  Ci  (13)

Equation (13) is used to compute the cost of maintenance events in the ith year of system support.

III. NUMERICAL CASE STUDY

This section presents a case study that implements the model developed in Section II with data from a chemical product manufacturing company. The system considered is a legacy control system (developed in the 1970s) with over 2000 instances (plants) installed and currently supported worldwide. The control system has two identical controllers designed so that one can take over for the other upon failure (or stoppage for maintenance). The system consists of software and obsolete electronics hardware, which is supported via existing lifetime buys and aftermarket procurements. The software is machine code and represents the most serious critical skills loss problem. The model developed in Section II requires three primary distribution inputs, which are determined from actual data: the current age distribution (fC), the distribution of hiring age (fH) and the distribution of exit age (fL). However, the two distribution inputs that are available from the actual data are the hiring age (fH) and a current age distribution (fC) shown in

Fig. 1. Notice that the current age distribution (Fig. 1b) has a mode of approximately 55 years of age, which, unfortunately is very close to the early retirement age from the company and demonstrates the issue that this paper focuses on.

Accepted for Publication: IEEE Trans. on Engineering Management

60 70 (a) (b) 50 60 50 40 t t n n40 u u o30 o C C30 20 20

10 10

0 0 15 20 25 30 35 40 45 50 55 60 65 15 20 25 30 35 40 45 50 55 60 65 70 Hiring Age into MOD Current Age in MOD

Fig. 1. Actual data: (a) hiring age distribution (fH), (b) current age distribution (fC).

A. Synthesizing an Exit Age (fL) Distribution

The exit age distribution (fL) for this case study is not known and needs to be synthesized in a

way that is consistent with the known hiring age and current age distributions shown in Fig.1.

A simulator was created that starts with the distribution in Fig. 1a and uses a manually defined exit age distribution to generate the current age distribution in Fig. 1b. The simulator assumes that the hiring age and exit age distributions remain constant over time (a “stationary process”) and that the company is propagated long enough to reach a steady state.7 The exit age

distribution was adjusted until a result that approximately matches the known current age

distribution (Fig. 1b) is obtained. We used 5 year increments in the simulator (assuming that this

allows perturbations in year-to-year hiring and exiting to be averaged out). The resulting

synthesized exit age distribution is shown in Fig. 2.

7 Note, the assumption that the hiring age and exit age distribution models remain constant over time is NOT an assumption of the general model developed in Section II. This assumption is specific to this stationary process case study. Note, in Section III.D the non-stationary process version of the case study lifts this assumption.

Accepted for Publication: IEEE Trans. on Engineering Management

l 0.35 o o p e 0.3 th g in 0.25 it in x b e is 0.2 le h p t o g e n0.15 p ri l u al d f 0.1 o n io 0.05 ct ra F 0 15 20 25 30 35 40 45 50 55 60 65 70 Synthesized Exit Age from MOD

Fig. 2. Synthesized exit age distribution (fL).

The distribution in Fig. 2 is a “bathtub curve”.8 It shows that a large portion of the workers exit early – most of these are workers who are changing jobs within the company. For the company modeled in this case study, it has been difficult to retain new people to support the legacy system as younger employees relocate to other job opportunities that they perceive as having better long-term career prospects. Between the ages of 45 and 60 almost no workers leave, and above 60 the workers are retiring.

B. Mapping Pool Size to Experience

The distributions shown in Figs. 1 and 2 only capture the number of workers (pool size), not

their relative experience. To get from pool size to experience one more input is needed: the

mapping from age to experience within the company. To generate the parameters for the

mapping function in (3), both the years of experience and the years of service to the company

were considered (Fig. 3). The following weighting was adjusted to maximize the correlation

coefficient between the two:

8 The bathtub curve is commonly used in reliability engineering to describe a particular form of the hazard function comprised on infant mortality, random failures, and wear-out. In our case the hazard rate (instantaneous failure rate in time) is analogous to the exit rate (in age).

Accepted for Publication: IEEE Trans. on Engineering Management

Average of Years of Service and Years of MOD Exp 40

35 y = 0.8546x ‐ 21.068 30 R² = 0.708

ce 25 n e ri e 20 p Ex 15 67% confidence intervals 10

5

0 0 10203040506070 Age

Fig. 3. Weighted experience to worker age correlation. R2 = 0.708.

Weighted Experience = 0.32(Years of Experience) + 0.68(Years in the Company) (14)

This weighting does not imply that years in the company is twice as important as experience -

experience is embedded within the years in the company.

At some point, workers cannot gain any more experience. Depending on the role of the

particular workers, experience was capped at between 5 and 15 years, i.e., the aRE  I E term

in (3) is limited to a maximum value. Ideally, if enough data was available, we could have

synthesized separate hiring, loss and current age distributions for each role, however, the number

of usable records available did not allow this.

The best-fit line shown in Fig. 3 gives the values of the parameters in (3) as RE = 0.8446 and

IE = -21.068 for this case study.

C. Stationary Process Case Study Results

This section presents a case study that implements the model developed in Section II with

data from a chemical product manufacturing company. The fC, fH and fL distributions are shown

in Figs. 1 and 2. The analysis in this section assumes that all the input distributions are

Accepted for Publication: IEEE Trans. on Engineering Management

stationary (unchanging) for the entire analysis period. The mandatory retirement age, r, is

assumed to be 65 years at which workers are required to leave the skills pool, and the youngest

worker is y = 22.

Figure 4a shows NNET, the net pool size (number of workers) over time as a fraction of the

pool size in 2010. Fig. 4b shows the relative experience (relative to 2010), and Fig. 4c shows the

average age of the workers in the pool. These results assume no hiring, H = 0. The results

indicate that although a 10% drop in head count occurs in the first 6 years, the experience

remains approximately constant (existing workers are gaining enough experience to offset the

drop in head count). After 2016, the experience drops quickly as the most experienced workers leave and are not replenished.

Accepted for Publication: IEEE Trans. on Engineering Management

1.2 (a) 1 ay d o T o 0.8 t ve ti a 0.6 el R e iz 0.4 l S o o P 0.2

0 2010 2015 2020 2025 2030 2035 2040 2045 2050 2055 YearYear

1.2 (b) 1

ce n0.8 ie er xp E0.6 e iv at el 0.4 R

0.2

0 2010 2015 2020 2025 2030 2035 2040 2045 2050 2055 YearYear

70 (c) 60

50 ge A40 ge ra ve30 A 20

10

0 2010 2015 2020 2025 2030 2035 2040 2045 2050 2055 YearYear

Fig. 4. (a) Pool size, (b) relative experience, and (c) average age. No hiring (H = 0) assumed.

If the lost skills are replenishable, we now wish to estimate the future hiring rate, Hi, required to preserve the current level of experience, E0, in the skills pool. The analysis identifies a

solution for the system of equations (the objective function in (6) is subject to the constraint in

(7)) to determine the annual hiring rate, Hi, needed to replenish the loss in cumulative experience

Accepted for Publication: IEEE Trans. on Engineering Management

10.00% e iz 9.00% l S o o 8.00% P in 7.00% ta ) n % ai ( 6.00% M ce n to e 5.00% ri te e a p 4.00% R Ex g r in o 3.00% ir H 2.00% d e ir u 1.00% q e R 0.00% 2010 2015 2020 2025 2030 2035 2040 2045 2050 2055 Year

Fig. 5. Required hiring rate, Hi, (if hiring is possible) to maintain pool experience.

as a result of attrition and retirement. The estimation of the required hiring rate can be performed

at any point during the analysis or as frequently as needed to replenish the skills pool and

maintain the desired experience level. For this example, the operation is repeated periodically at

1 year time steps until the end of the analysis period (end of the system’s support life). Figure 5

shows results for hiring rate, Hi, as a function of the number of years from the start of the

analysis.

Figure 5 shows that no hiring is initially required (note, hiring is not allowed to drop below

0, which would reflect a layoff situation). Starting in 2017 a hiring rate of over 6% is required

for 9 years and settles to 2-5% thereafter. Note, when H is greater than zero in (1), the hiring

rate is assumed to be applied to the entire hiring age distribution, fH. It is important to note that the required hiring rate in Fig. 5 takes into account both the time required to learn the skills necessary to support the system and the exit age distribution found in Fig. 2.

The impact of the results developed in this example appears in Fig. 6, which shows the annual cost of supporting the legacy control system from the chemical product manufacturing company through year 2040 (over 2000 instances of the system require support in this case).

The cost model used is a stochastic discrete event simulator that samples time-to-failure

Accepted for Publication: IEEE Trans. on Engineering Management

$16,000,000

$14,000,000 Human obsolescence included in cost model $12,000,000 $2M st o$10,000,000 C rt o $8,000,000 p p u $6,000,000 S Human obsolescence not included $4,000,000

$2,000,000

$0 2010 2015 2020 2025 2030 2035 2040 Year

Fig. 6. Annual cost to support a legacy system with and without the inclusion of skills obsolescence (H = 0). For this figure w = 0 (no cost of money), therefore, the results are actual dollars. distributions for the components of the control system to obtain maintenance events (event dates and components that need replacement). Subsystem-specific (and severity category specific9) failure distributions are sampled to obtain failure dates for the system. At each maintenance event, maintenance resources are drawn and a cost is estimated using (13). The majority of the maintenance events do not result in business interrupt time because they only impact one of the two parallel control systems and f = 0, however, a small fraction (the most severe events) bj result in dual control system failures. The risk of dual failures and the resulting business interrupt is captured by the differing severity categories. The specific data associated with the system count, the subsystem/severity category reliabilities, and the cost of business interrupt time is proprietary to the customer and not included here.

9 Severity categories indicate both the level of maintenance required (which translates into the maintenance resources needed) and the amount of business interrupt (if any) associated with the maintenance event.

Accepted for Publication: IEEE Trans. on Engineering Management

For the case study, f was determined to be 0.54, or once the number of people in the sthreshold pool drops below 54% of the number that are in the pool today (in 2010), extra maintenance time

penalty (modeled by (12)) will occur.

Two results are shown in Fig. 6. The results show that there is little effect of skill

obsolescence prior to 2025, after 2030 the impact of the loss of skills becomes significant. Note,

in year 2028 the availability of spares (hardware) also becomes a problem resulting in the cost

step shown between years 2028 and 2030. The lowest curve in Fig. 6 assumes that there is no

human obsolescence ( E = 1 for all i in (9)). Even in this case, the annual cost increases due to ci

part obsolescence that results in lifetime buys of parts that tie up significant capital in pre-

purchased parts and long-term holding costs is evident. The highest cost curve shown in Fig. 6

assumes that no replenishment of lost skills is possible (H = 0, which is close to reality for this

case study).

D. Non-Stationary Process Case Study Results

All the results presented so far are for the stationary process, i.e., the hiring age and exit age

distributions assumed to be the same at all times in the future. Figure 7 is for the same case as

Fig. 4b except that a non-stationary model with the following parameters: μ = 0.005, σ = 0.01

(hiring age), and μ = -0.005, σ = 0.01 (exit age) has been used. 100 samples are shown in Fig. 7

and the stationary result is indicated. It is evident that the stationary result represents a better-

than-average case. A histogram of the experience in 2021 relative to 2010 is shown as an inset to demonstrate the utility of the non-stationary solution.

Accepted for Publication: IEEE Trans. on Engineering Management

Fig. 7. Non-stationary model. Relative experience. No hiring (H = 0) assumed.

E. Managerial Insights

Figure 6 is only an example result but it serves to demonstrate how the model developed in this paper can be used to draw important managerial insights. The two questions posed in the introduction to this paper were: (a) when will the company start to see the effects of the loss of critical skills, and (b) how bad will it be? Based on Fig. 6, one would conclude that for the example considered in the case study the impact of human obsolescence will not be noticed until

2030 and the cost of supporting the system as a result of impending human skills obsolescence will increase to over $13M/year by 2040. What could management do with this information?

This paper offers a means to estimate the theoretical hiring strategy for replenishing a skills pool based on experience. The result shown in Fig. 5 emphasizes a potential deficiency in the planning and allocation of human skills to address future sustainment needs - unfortunately, in the case study described in this paper, changing the hiring profile to that shown in Fig. 5 is impractical (practically speaking, the hiring into the sustainment groups is close to zero). An alternate strategy would be to manage sustainment needs through planned “phase-out” of

Accepted for Publication: IEEE Trans. on Engineering Management

existing systems. The result in Fig. 6 tells management that a replacement system needs to be

developed and deployed to the plants before 2030; this information can be used as part of a

business case made to higher levels of management to support long-term strategic planning. It should be noted (and as demonstrated in Fig. 7) that the stationary solution in Fig. 6 is a better- than-average case. Also, as discussed in the next section, while in the case study “break-fix” may be adequately supported until 2030, the personnel performing break-fix tasks provide additional value that will be lost long before 2030.

In addition to the application-specific insights, Fig. 2 in the case study supports the supposition that younger workers leave legacy system sustainment jobs for other jobs in the company - exit age peaks at 35 years and nobody exits the company between 45 and 55 years.

IV. DISCUSSION AND CONCLUSIONS

The general workforce planning problem is ensuring that you have the right number of people, with the right skills sets, in the right jobs, at the right time. To address this problem in cases where the workforce is non-replenishable, a model for the loss of critical skills necessary to support legacy systems has been developed. The model estimates the number of skilled employees (pool size) and the combined (cumulative) experience of the skills pool in order to determine the resources that will be available to maintain a legacy system or family of systems.

The cumulative experience of the skills pool impacts the time (and thereby the cost) to perform the necessary maintenance activities to sustain a system. Because of the prohibitive cost of replacement, legacy systems are often not replaced until their age results in catastrophic failures or support costs that are untenable. The model presented in this paper can be used to support

Accepted for Publication: IEEE Trans. on Engineering Management

business cases for system replacement by forecasting when issues will arise and how expensive

they will be.

Several simplifying assumptions have been made in the model described in this paper. In our

solution, we have assumed that everyone entering the pool has no experience – depending on the

application this may not be the case; some workers could enter the pool with relevant experience

gained from other positions. Our solution also assumes that the only way workers gain

experience is through years on the job. There may be other methods that can accelerate the rate

at which workers become more experienced, e.g., knowledge bases created to capture the experience of older workers [26,27]. A discrete time analysis is presented in this paper because the available input data only exists on an annual basis, however, a continuous time solution could also be developed.

There are also indirect consequences of a shrinking skills pool that are difficult to quantify in terms of money. The people who are maintaining systems (especially if they are engineers) are

most likely performing other tasks as well. Besides “break-fix,” their tasks may include

preventative maintenance, upgrade projects and knowledge transfer. As resources decrease, we

assume that all tasks except break-fix tasks would decrease; however, even if sufficient resources

are available for break-fix tasks, an acceleration in critical skills loss can occur as a result of

dwindling resources. Reducing non-break-fix tasks can help by increasing or improving: future

maintenance efficiency, system reliability that could decrease future maintenance requirements,

system performance, and retention of skilled people. There are many other factors that could

modify the case study results presented. These include geography (location in the world),

gender, the type of product produced, etc. These effects can certainly be analyzed with the

model if sufficient data existed to distinguish groups of workers and places of employment.

Accepted for Publication: IEEE Trans. on Engineering Management

The model presented in this paper has only been used to assess support (maintenance and

business interrupt) costs, but it could also be used to assess system availability penalties (beyond

business interrupts) that may be assessed by customers for systems subject to availability

contracts such as performance-based contracts [40].

Finally, the initial use case presented in this paper should be followed-up with a larger

population that potentially enlists the cooperation of more than one company possibly via the

solicitation of support from relevant trade organizations whose members share a common

interest in this topic.

REFERENCES

[1] P. Sandborn, “Trapped on technology’s trailing edge,” IEEE Spectrum, vol. 45, no. 4, pp. 42- 58, 2008. [2] P. Sandborn, “Software obsolescence - Complicating the part and technology obsolescence management problem,” IEEE Transactions on Components and Packaging Technologies, vol. 30, no. 4, pp. 886-888, 2007. [3] A. De Grip and J. Van Loo, “The of skills obsolescence: A review,” Research in Labor Economics, vol. 21, ed. A. De Grip, J. Van Loo, and K. Mayhew, Elsevier, pp. 1-26, 2002. [4] A. De Grip, J. Van Loo, and K. Mayhew, The Econonics of Skills Obsolescence: Theoretical and Emperical Applications, Research in Labor Economics, vol. 21, Elsevier, 2002. [5] F. Green, S. Machin, and D. Wilkinson, “The meaning and determinants of skills shortages,” Oxford Bulletin on Economics and Statistics, vol. 60, no. 1, pp. 165-187, May 1998. [6] K. Melymuka, “Don’t just stand there – Get ready,” Computerworld, vol. 36, no. 26, pp. 48- 49, 2002. [7] D. Besanko, U. Doraszelski, Y. Kryukov, and M. Satterthwaite, “Learning-by-doing, organizational forgetting and industry dynamics,” Econometrica, vol. 78, no. 2, pp. 453-508, March 2010. [8] C. D. Bailey, “Forgetting and the learning curve: A laboratory study,” Management , vol. 35, no. 3, pp. 340-352, March 1989. [9] L. Argote, S. Beckman, D. and Epple, “The persistence and transfer of learning in an industrial setting,” , vol. 36, no. 2, pp. 140-154, 1990.

Accepted for Publication: IEEE Trans. on Engineering Management

[10] L. Argote, Organizational Learning: Creating, Retaining and Transferring Knowledge, Springer, New York, 2013. [11] J. Waldman, “Measuring retention rather than turnover: A different and complementary HR calculus,” Human Resource Planning, vol. 27, no. 3, pp. 6-9, January 2004. [12] “Nuclear workforce planning: Preparing for the long haul,” ScottMadden Perspectives, Spring 2008. Available at: http://www.scottmadden.com/insight/233/Nuclear-Workforce- Planning.html [13] “Testimony of Elliot Pulham President & CEO, The Space Foundation, Before the Commission on the Future of the U.S. Aerospace Industry” May 14, 2002, Available at: http://www.spaceref.com/news/viewpr.html?pid=8335 [14] M. Leibold and S. Voelpel, Managing the Aging Workforce: Challenges and Solutions, John Wiley & Sons, New York, NY, 2006. [15] E. Goodridge and M. K. McGee, “IT’s generation gap,” Informationweek.com, 2002. [16] G. Hilson, “Generation gap is big iron’s only threat,” Computing Canada, vol., 27, no. 7, pp. 1 and 8, March 23, 2001. [17] Congressional Record Volume 140, Number 65 (Monday, May 23, 1994). Available at: http://www.gpo.gov/fdsys/pkg/CREC-1994-05-23/html/CREC-1994-05-23-pt1-PgH68.htm [18] S. Shead, “Universities won’t teach ‘uncool’ Cobol anymore – but should they?” ZDNet, March 7, 2013. Available at: http://www.zdnet.com/universities-wont-teach-uncool-cobol- anymore-but-should-they-7000012250/ [19] J. D. Ahrens, N. Prywes and E. Lock, “Software process reengineering: Towards a new generation of CASE technology,” J. Systems Software, vol. 20, pp. 71-84, 1995. [20] W. S. Adolph, “Cash cow in the tar pit: Reengineering a legacy system,” IEEE Software, vol. 13, no. 3, pp. 41-47, 1996. [21] D. M. Andolšek, “Knowledge hoarding or sharing,” Proceedings of the Management, Knowledge and Learning International Conference, Celje Slovenia, 2011. [22] S. S. Dubin, “Obsolescence of lifelong education,” American Psychologist, vol. 27, pp. 486- 498, 1972. [23] J. Allen and A. De Grip, Skill Obsolescence Lifelong Learning and Labour Market Participation, Research Centre for Education and the Labour Market, Maastrict: Maastricht University, 2004. [24] W. Arthur, W. Bennett Jr., P. L. Stanush, and T. L. McNelly, “Factors that influence skill decay and retention: A quantitiatve review and analysis,” Human Performance, vol. 11, no. 1, pp. 57-101, 1998. [25] J. D. Hagman, Effects of Training Schedule and Equipment Variety on Retention and Transfer of Maintenance Skill, Alexandria, VA: U.S. Army Research Institute for Behavioral and Social , Research Report 1309, 1980. [26] C. Joe and P. Yoong, “Harnessing the knowledge assets of older workers: A work in progress report,” Decision Support in an Uncertain and Complex World: The IFIP TC8/WG88.3 International Conference, pp. 392-401, 2004.

Accepted for Publication: IEEE Trans. on Engineering Management

[27] D. E. Hailey and C. E. Hailey, “The nature of process preservation: Tools for capturing and preserving professional skills.” Available at: http://imrl.usu.edu/publications/ProcessP.pdf [28] B. Friel, “Reality check,” Government Executive, May 15, 2002. [29] G. W. Bohlander and S. A. Snell, Managing , 15th edition, Mason, Ohio: South-Western CENEGAGE Learning, 2010. [30] S. K. Bordoloi, “Flexibility, adaptability and efficiency in dynamic manufacturing systems,” Journal of Production and , vol. 8, no. 2, pp. 133-150, 1999. [31] H.-C. Huang, L.-H. Lee, H. Song, and B. T. Eck, “SimMan - A simulation model for workforce capacity planning,” Computers and , vol. 36, no. 8, pp. 2490- 2497, 2009. [32] B. A. Holt, Development of an Optimal Replenishment Policy for Human Capital Inventory, Ph.D. Dissertation from the University of Tennessee Knoxville, 2011. [33] J. Koochaki, J. A. C. Bokhorst, H. Wortmann, and W. Klingenberg, “The influence of condition-based maintenance on workforce planning and maintenance scheduling,” International Journal of Production Research, vol. 51, no. 8, pp. 2339-2351, 2013. [34] S. Martorell, M. Villamizar, S. Carlos, and A. Sanchez, “Maintenance modeling and optimization integrating human and material resources,” Reliability Engineering and System Safety, vol. 95, no. 12, pp. 1293-1239, December 2010. [35] D. Ait-Kaki, J.-B. Menye, and H. Kane, “Resources assignment model in maintenance activities scheduling,” International Journal of Production Research, vol. 49, no. 22, pp. 6677- 6689, November 2011. [36] S. Ahire, G. Greenwood, A. Gupta, and M. Terwillinger, “Workforce-constrained preventative maintenance scheduling using evolution strategies,” Decision Sciences, vol. 31, no. 4, pp. 833-859, 2000. [37] N. Gorjian, L. Ma, M. Mittinty, P. Yarlagadda, and Y. Sun. “A review on degradation models in reliability analysis,” Proceedings 4th World Congress on Engineering Asset Management, Athens, Greece, 2009. [38] W. J. Fabrycky and B. S. Blanchard, Life-Cycle Cost and Economic Analysis, Prentice Hall, Upper Saddle River, NJ, 1991. [39] D. Feng, P. Singh, and P. Sandborn, “Optimizing lifetime buys to minimize lifecycle cost," Proceedings of the 2007 Aging Aircraft Conference, Palm Springs, CA, April 2007. [40] R. Beanum, “Performance-Based logistics and contractor support methods,” Proceedings of the IEEE Systems Readiness Technology Conference (AUTOTESTCON), 2006.