
IMPLEMENTATION OF PROGRAMMATIC QUALITY AND THE IMPACT ON SAFETY

Dale T. Huls, Kevin M. Meehan

National Aeronautics and Space Administration, Code OE, 2101 NASA Parkway, Houston, Texas 77058, dale.huls-1@.gov

Proc. of the First IAASS Conference "Space Safety, a New Beginning", 25-27 October 2005, Nice, France (ESA SP-599, December 2005)

ABSTRACT

The implementation of an inadequate programmatic quality assurance discipline has the potential to adversely affect safety and mission success. This is best demonstrated in the lessons provided by the Apollo 13, Challenger, and Columbia accidents; NASA Safety and Mission Assurance (S&MA) benchmarking exchanges; and conclusions reached by the Shuttle Return-to-Flight Task Group established following the Columbia Shuttle accident. Examples from the ISS Program demonstrate continuing issues with programmatic quality. Failure to adequately address programmatic quality assurance issues has a real potential to lead to continued inefficiency, increases in program costs, and additional catastrophic accidents.

1. INTRODUCTION

1.1. Purpose

The purpose of this paper is to provide a generic perspective of how the implementation of an inadequate programmatic quality assurance discipline has the potential to adversely affect safety and mission success. It is also to demonstrate that while the NASA "culture" continues to focus on improving its safety processes and organizations following the Columbia tragedy, an equal level of effort and management focus on improving the NASA quality assurance processes is essential for ensuring the safety and mission success of future human space flight missions.

2. BACKGROUND

2.1. Quality of Products and Services

There are essentially four basic criteria for determining the success of a product or service:

a) Utility -- a product or service must perform as expected;
b) Reliability -- the product or service must be dependable when called upon;
c) Safety -- the product or service must be safe for use; and
d) Affordability -- the product or service must be cost-effective.

The "quality" of a product or service can therefore be best determined by how well it satisfies the above criteria. While quality is "achieved" by defining efficient and effective processes to design and fabricate parts or provide services, it is "assured" by verifying that those processes are adhered to and remain effective and efficient.

In the case of human space flight, and the ISS in particular, quality assurance can be broken into two distinct areas. The first area is "quality control," or the real-time verification that certain activities have been satisfactorily completed in compliance with requirements. The second area, which is the focus of this paper, is the "programmatic quality assurance function" (hereafter referred to as the QA function). The QA function establishes the requirements and processes governing (a) the design and fabrication of systems (hardware and software); (b) the assembly and operation of those systems, both ground and on-orbit; (c) the identification, documentation, and resolution of deviations from requirements; and (d) the oversight function that assesses the overall effectiveness of those processes, adherence to requirements, and process improvement.

For most organizations that involve highly complex, high-risk endeavors, organizational responsibility for the QA function is typically structured such that:

• Senior management establishes overall quality expectations, roles, and responsibilities;
• Quality is achieved and maintained by those assigned responsibility for performing the work; and
• Quality is verified, or "assured," by those not directly responsible for performing the work.

2.2. ISS QA Organizational Structure and External Interrelationships

An example of NASA QA organizational structure is provided by the ISS Program. NASA's QA function for the ISS Program resides within the ISS
Safety & Mission Assurance/Program Risk (ISS S&MA/PR) Office. Within the ISS S&MA/PR Office, an ISS QA Manager and limited support staff have been assigned responsibility for overall development and implementation of ISS quality requirements.

The majority of the staff responsible for performing ISS S&MA tasks are "matrixed" (i.e., assigned to support) to the ISS Program from NASA institutional organizations. In the case of the ISS S&MA/PR Office, the majority of its safety, quality, and reliability support personnel are matrixed from the Johnson Space Center (JSC) S&MA ISS Directorate, as illustrated in Figure 1. Personnel from other NASA Centers performing ISS S&MA functions are matrixed in a similar fashion.

[Figure 1. ISS Programmatic QA Structure. Organization chart with boxes for: NASA Administrator; Associate Administrator for ...; ISS Program Manager; NASA Center Director; ISS S&MA/PR Manager; Support Agreements; Center S&MA Manager; ISS QA Manager; QA Personnel; Center S&MA Personnel.]

The roles and responsibilities to be fulfilled by the matrixed personnel on behalf of the ISS QA Manager, as well as by the other staff matrixed to the ISS S&MA/PR Office, are documented in agreements established between the ISS S&MA/PR Office and the JSC S&MA Office.

In addition to being responsible for ensuring proper implementation of the quality assurance requirements defined by the ISS S&MA/PR Office, the JSC S&MA QA personnel matrixed to the ISS QA Manager have also been delegated responsibility for developing most of the various processes intended to satisfy those requirements.

Since JSC is home to the primary NASA human space flight programs, the JSC S&MA organizations that support the ISS Program also support the other human space flight programs, such as the Space Shuttle Program and the Crew Exploration Vehicle (CEV) Project Office currently being established at JSC. Therefore, programmatic quality assurance and other "cultural" deficiencies in one program are often common to the other programs at JSC.

3. PROGRAMMATIC QA LESSONS LEARNED

The most visible and still most relevant examples of the impact that deficient programmatic quality assurance processes can have on safety involve the four major accidents that NASA has experienced in its human space flight program (the Apollo 1 fire, Apollo 13, Challenger, and Columbia) and the failure of the corrective actions implemented in response to the first three accidents to prevent future failures. From these examples, one can draw a direct correlation between poor quality implementation and a catastrophic incident. In addition, the corrective actions implemented after each accident show that the NASA "culture" continues to place an emphasis on "safety" over "quality" without fully comprehending that while "safety" is a technical discipline (e.g., development of hazard assessments; failure modes and effects analysis), similar to the engineering discipline, quality is an "assurance" discipline that requires a different set of skills and experience to adequately oversee the overall effectiveness of other relevant disciplines, such as engineering and safety.

While the Columbia tragedy is still recent and corrective actions and recurrence controls are still being implemented by NASA, there is evidence that NASA has once again failed to grasp the importance of programmatic quality assurance and its impact on safety and mission success, as documented in Appendix A.2 of the Final Report of the Return-to-Flight Task Group [1]. This report documents an independent assessment of NASA's progress and effectiveness in resolving the findings identified by the Columbia Accident Investigation Board (CAIB) in its report following the Columbia accident, with Appendix A.2 documenting dissenting opinions about the effectiveness of those corrective actions and recurrence controls.

Another set of examples that provides a basis for improving NASA's "culture" with respect to quality assurance is contained within the benchmarking assessments performed by NASA Headquarters' S&MA organizations with other non-NASA government and corporate organizations.

The primary QA lessons still to be learned from each of the major accidents, the benchmarking studies, and the independent Return-to-Flight Task Group findings are summarized in the following subsections.

3.1. Quality Lessons Learned from NASA Accidents

Although "safety deficiencies" always seem to be the primary culprits blamed by the media in the aftermath of a major accident, a strong case can be made that it was the failure of quality processes and the lack of adherence to engineering, safety, and other processes that eventually led to the major NASA human space flight accidents. For example, the Phillips Report [2] documenting the investigation of the fatal Apollo 1 fire cited quality problems as a major contributor to the accident. The report charged that principles and procedures of configuration management were not followed, that poor workmanship was evidenced by continual high rates of rejection and rework, and that poor quality was evidenced by a large number of "correction" Engineering Orders and manufacturing discrepancies. Consequently, it was determined that process deficiencies in these areas led to an inability to understand the general status of the Apollo hardware at any particular point in time. It was also determined that, due to the lack of sufficient quality oversight, deficiencies in workmanship went unidentified. Subsequently, in addition to fixing the specific hardware deficiencies that contributed to the fire, NASA established a new organization called the Office of Flight Safety in an attempt to resolve the programmatic deficiencies that let deficient quality go undetected and unresolved.

The investigation into why an oxygen tank exploded on Apollo 13 en route to the Moon uncovered quality control problems. Although the investigation report [3] found that the tank was poorly designed, it also cited significant quality issues related to configuration management and nonconformance reporting, and it considered these quality deficiencies major contributors to the tank explosion. A primary example of the lack of rigor and adherence to process was when the contractor ordered thermostatic switches but failed to inform its vendor that the design specifications had been changed from 28 V to 65 V. The vendor provided 28 V switches that were installed into the oxygen tank. Investigation after the accident determined that during a Count Down Demonstration Test (CDDT) conducted at the Kennedy Space Center (KSC), these switches failed to automatically shut off at the nominal high-temperature set-point of 80°F, thereby allowing the tank to reach an internal temperature of approximately 1000°F. The investigation concluded that the resulting damage to the tank wiring from this high temperature likely caused the wiring to fail and act as the primary ignition source for the tank explosion during the flight. During the CDDT, several workarounds had to be implemented to "successfully" complete the test. However, the anomalies that led to the need for these workarounds were not adequately investigated, and as a result, because the root cause of the anomalies was never identified, the failure of the heaters and the resulting damage to the tank were not discovered.

The Rogers Commission, which investigated the Challenger accident, pointed out in its report [4] that:

"Quality Assurance is closely related to both safety and reliability. All NASA elements prepare plans and institute procedures to insure that high standards of quality are maintained. To accomplish that goal, elements charged with responsibility for quality assurance establish procedural controls, assess inspection programs, and participate in problem identification and reporting."

However, the Rogers Commission concluded that this philosophy was not in evidence for the S&MA function (which includes quality), with the following findings:

1. Reductions in the safety, reliability and quality assurance work force at Marshall and NASA Headquarters (HQ) had seriously limited capability in those vital functions.
2. Organizational structures at Kennedy and Marshall had placed safety, reliability, and quality assurance offices under the supervision of the very organizations and activities whose efforts they are to check.
3. Problem reporting requirements were not concise and failed to get critical information to the proper levels of management.
4. Little or no trend analysis was performed on O-ring erosion and blow-by problems.
5. As the flight rate increased, the Marshall safety, reliability, and quality assurance work force was decreasing, which adversely affected mission success.
6. Five weeks after the Challenger accident, the criticality of the Solid Rocket Motor field joint was still not properly documented in the problem reporting system at Marshall.

In addition to fixing specific deficiencies associated with the O-ring design, NASA established a new Associate Administrator for S&MA at NASA Headquarters and requirements for establishing a NASA Problem Reporting and Corrective Action (PRACA) process.

In its investigation report [5] of the Columbia accident, the CAIB cited numerous instances of quality failures and noncompliance by NASA management, engineering, and S&MA. A primary case in point relates directly to the handling of nonconformances. Just as NASA management came to accept an "allowable degree of erosion" in the O-rings that led to the Challenger accident, NASA management also came to accept foam debris losses and impacts to the Orbiter
that violated design requirements as acceptable "in-family" anomalies representing only a maintenance turnaround problem rather than a safety-of-flight risk. Although foam shedding was observed multiple times, the resulting in-flight anomalies were closed without sufficient root cause analysis and validation of corrective actions. Indeed, the significant foam shedding incident that occurred two flights prior to the Columbia flight was not documented as an "In-Flight Anomaly," but was instead classified as a lower-priority "action" by the Shuttle Program Requirements Control Board. Consequently, two additional Shuttle flights were authorized (Columbia being the second) without an acceptable root cause determination and rationale for accepting the foam shedding and impact risk.

In addition to implementing specific design and fabrication changes to the External Tank to prevent or minimize foam debris, and to detect Orbiter impacts from debris should they occur, NASA also responded to the CAIB findings on cultural and S&MA deficiencies by creating a new and independent organization, the NASA Engineering & Safety Center (NESC), to independently assess and audit technical engineering issues within the human space flight programs.

A common thread in each of these accidents is the lack of emphasis on the quality assurance process. In each case, the nonconformance process lacked the rigor not only to identify the root cause, but also to fully understand and convey to management the impacts of not identifying or not correcting that root cause. In addition, the quality assurance process lacked rigor in its audit, trending, and data mining functions (when those functions existed at all), which could have identified weaknesses in the nonconformance, engineering, change control, and other configuration management processes. Finally, NASA's response to each of these accidents was to establish a separate organizational structure intended to have greater and more independent responsibility and authority for safety.

3.2. Quality Lessons Learned from NASA/Navy Benchmarking Exchange

Because the U.S. Navy also manages several highly complex, high-risk programs, NASA established a joint benchmarking venture with the Navy to exchange best practices and lessons learned with respect to S&MA policies, processes, accountability, and control measures.

The NASA/Navy Benchmarking team concluded [6] that NASA could benefit from establishing a safety and quality philosophy similar to that of the Navy's SUBSAFE Program. Specifically, the Navy embeds its safety and quality processes within its line organizations and relies on its quality assurance organization to "audit to requirements" to ensure that those line organizations comply with requirements and processes. Findings of noncompliance by the quality assurance organization are taken very seriously by the line organizations and all levels of program management, and findings cannot be closed without concurrence from the quality assurance organization. When disagreement regarding a finding or its resolution does occur, the disagreement is often resolved at a level above and external to the program.

This philosophy works because the Navy has embedded in its "culture" the belief that every individual is ultimately responsible for rigorous adherence to requirements and that safety is not achievable without a strong and vigorous programmatic quality assurance process.

3.3. Final Report of the Return to Flight Task Group

The Return to Flight Task Group was chartered by the NASA Administrator to provide an independent assessment of NASA's implementation of the return-to-flight recommendations documented by the CAIB. As part of its assessment report, the task group documented several new observations, two of which are directly related to the quality assurance discipline: rigor and requirements.

Observations made by some members of the task group indicate that NASA still lacks sufficient rigor in its quality processes. The task group report defines rigor as "the scrupulous adherence to established standards for conduct of work." While the safe and reliable execution of high-risk, complex technical endeavors requires a rigorous and consistent understanding of, and adherence to, standard processes, the task group observed a lack of rigor that has resulted in adjustable performance standards and in an environment where a "best effort" to meet a goal or standard, without strict adherence to processes and requirements, is considered acceptable. Additionally, ad hoc efforts often result in redundant work, increased costs, and an unclear picture of what has actually been accomplished.

The second main area of concern identified by the task group was NASA's understanding of requirements. The task group stated that the "Space Shuttle Program does not seem to have a basic understanding of what requirements are, what they can do for the program, and what they can do to the program." The essence of this issue is captured in the following excerpt from the report:

"Because of this lack of discipline, the Space Shuttle Program experienced instances where flight hardware was manufactured, accepted, and manifested prior to the completion of design reviews and the release of approved engineering documentation. Major testing and design activities were undertaken without specific requirements or success criteria. In some cases, the program simply refused to write down requirements, citing the 'work' as more important than documentation. Lacking specific direction from the program, working-level personnel proceeded to perform test, design, and analysis activities based on their best guess of what was required. This resulted in designs that failed to meet the requirements that were ultimately written, tests that did not apply to actual environments, models based on flawed assumptions, and a general expenditure of resources in an uncoordinated manner."

4. ANALYSIS OF ISS QUALITY FUNCTIONS

Three ISS quality functions were chosen by the authors to further demonstrate deficiencies in the NASA culture regarding programmatic quality assurance.

4.1. ISS Waiver Process

Since adherence to requirements is considered critical in understanding the overall quality of a system or service, it is important to understand when requirements have not been satisfied, and to fully document and assess the risk when "waiving" compliance with a requirement is accepted. This is especially important for high-risk activities, in particular when multiple instances of noncompliance to requirements are accepted and it is necessary to understand the cumulative and synergistic risk associated with accepting multiple deviations from requirements.

In the case of the ISS Program, many processes are used to accept noncompliance to requirements. These range from waivers/deviations processed through the ISS Change Request process, to Safety Noncompliance Reports processed through the Safety Review Panel, to acceptance of on-orbit nonconformances within Problem Reports approved by System Problem Resolution Teams. These multiple types of "waiver" processes and the tracking systems in which the waiver information is stored, along with the different organizations and approval levels for the waivers, make it extremely difficult, if not impossible, for the ISS Program to assess the true risk of not satisfying a requirement.

In addition, the ISS Program has recognized the need to periodically assess waivers to determine whether the risk accepted is still adequately understood and remains acceptable. The need to identify potentially undiscovered synergistic and cumulative impacts that were not identified when each waiver was approved is also recognized. However, it is the perspective of the authors that these objectives have not yet been achieved and that further resolution is required. This is due in part to the multiple processes and different tracking systems, which greatly complicate trending and data mining efforts. Furthermore, this review is extremely difficult because it is not always clear that acceptance of a noncompliance to a requirement is being categorized or appropriately reflected as a waiver. Finally, because of the large number of waivers processed by the ISS Program due to the size, complexity, and uniqueness of its mission, assessing the cumulative effects of waivers is a monumental task.

4.2. ISS Problem Reporting and Corrective Action Process

Although the ISS S&MA/PR Office is responsible for defining the requirements governing the ISS Problem Reporting and Corrective Action (PRACA) process, implementation responsibility resides with the ISS Program's engineering line organizations. These line organizations have the authority to manage investigation and resolution activities and often continue to operate or change the configuration of nonconforming hardware in parallel with the ongoing investigation. It is not uncommon in the ISS Program for a nonconformance report and investigation to remain open for several months or even years while the degraded hardware or system continues to operate without formal and documented concurrence from all members of the investigation team, including the programmatic quality assurance and safety members of the team.

Another perceived deficiency in the ISS problem reporting process is that it tends to focus on hardware and software anomalies rather than operational or planning requirement violations. There are no Program-level requirements for operational or process-based requirement noncompliances to be documented, investigated, and resolved with the same level of rigor as a hardware or software nonconformance. While these violations are sometimes documented and investigated by the organizations responsible for the activities or functions governed by the noncompliant requirements, final closure of the investigation does not typically include the same types of personnel (e.g., safety, programmatic quality assurance) that are involved in investigating and resolving hardware and software nonconformances.

Because different organizations are responsible for investigating and resolving requirement violations, multiple databases are used by the ISS Program and its support organizations to document the investigation and resolution of requirement violations and other problems. Because data is distributed among these various tools, and because data in one tool often conflicts with data in another tool for the same hardware or anomaly, it is extremely difficult, if not impossible, for the ISS Program to perform trending and data mining activities to identify adverse trends and uncover other potential problem areas.

4.3. ISS Audit Process

The ISS audit function was established by the ISS S&MA/PR Manager as documented in the Quality Assurance Audit and Surveillance Team Letter of Delegation (LOD) [7]. This LOD defines the scope of audits to be performed on behalf of the ISS Program. Specific audits are identified by the ISS QA Manager, who delegates responsibility for performing those audits to the JSC S&MA Directorate for the consolidated ISS contracts. Rather than establishing an independent programmatic quality audit function with the authority to audit the entire ISS Program, its contractors, and other non-ISS NASA supporting organizations, the LOD limits the scope of the audit authority to a few select contractor organizations. While the ISS S&MA/PR organization believes the LOD does not limit its ability to audit other Program organizations or contractors, there is no documentation beyond the LOD to define the Program's audit expectations or goals.

For those audits that have been performed, there is still no approved ISS process, procedure, or other governing documented instruction that defines how audits are to be selected, performed, documented, resolved, and retained.

4.4. Recent Improvements in ISS Quality Assurance

The ISS Program has made some recent improvements to ISS quality assurance. The ISS Program created a senior civil service position (Grade 15) for an ISS S&MA Quality Manager to manage the ISS Program's quality program and improvement efforts. Subsequently, the ISS Program formally chartered [8] a Quality & Product Assurance Panel (QPAP). The JSC S&MA Space Station Division and the ISS Sustaining Contractor have hired additional staff to support their quality assurance organizations. Until these recent staff increases were made, there was no quality assurance presence on the teams responsible for investigating and dispositioning problem reports; prior S&MA/PR representatives assigned to the teams were skilled in safety and reliability, with little or no training or experience in the quality discipline. JSC S&MA has also expanded the scope of its support contract to include an ISS Contracts audit group. While all of these improvements demonstrate a positive trend towards improving the quality assurance discipline, they all remain a work in progress.

5. PROGRAMMATIC QA IMPACTS ON THE SAFETY OF EXPLORATION MISSIONS

As NASA continues to expand its exploration of space with continued assembly and operation of the ISS and development of vehicles to return to the Moon and proceed to Mars, failure to improve programmatic quality assurance and resolve the aforementioned deficiencies has the potential to cause catastrophic events similar to the Apollo and Space Shuttle accidents. Unfortunately, the ISS Program and new human space exploration programs continue to grow in complexity, both in terms of the ground support and flight systems themselves and in the increasing participation and coordination required with International Partners and other organizations. While the complexity of systems and integration activities is increasing, budgets have remained the same in certain areas and have shrunk in others, with no significant turnaround anticipated. As a result, implementing wholesale organizational structure changes, significantly modifying existing tools and processes to resolve problems, or creating new tools and processes altogether is often assumed to be unaffordable.

Making smaller changes over time that result in minor improvements without full resolution of the underlying quality deficiencies, as NASA has historically done, not only tends to drive costs higher later in a Program's life but also tends to significantly increase risk. Inefficient and ineffective tools and processes are allowed to degrade quality during the development of the Program and its systems to the point that these deficiencies cannot be overcome later, regardless of how much money and effort is put toward resolving them.

Future exploration missions will also tend to be much longer in duration than Shuttle missions, such as setting up remote stations on the Moon and continuing missions on the ISS lasting several years. As these Programs evolve and personnel turnover inevitably occurs over the years, quality assurance and other processes that do not ensure sufficient documentation of requirements, investigations, and other critical historical information will make it extremely difficult for future personnel to adequately investigate and sustain those Programs, thus increasing the likelihood of a catastrophic event.

For example, hardware anomalies on future Moon and Mars missions will need to be investigated and resolved not only while the spacecraft is a long distance from Earth, but often years after the hardware was initially designed and fabricated. The individuals who designed, fabricated, and tested the hardware may no longer be available, and the only resources available to the investigation teams may be the documented design, fabrication, test, maintenance, and operation history. If the quality and accuracy of this historical documentation cannot be relied upon, the ability of the investigation teams to effectively investigate and resolve the anomalies, and so prevent further impacts and potentially catastrophic results, will be significantly hampered.

6. CONCLUSION

While the quality deficiencies discussed in this paper focused on the major and very public accidents that have occurred in NASA's human space flight program, the underlying issues and deficiencies that led to those accidents remain unresolved. Many of these issues have been recognized by the ISS Program in its implementation plan for continuing flight [9]. However, nearly three years after the Columbia accident and the subsequent release of the implementation plan, many of these issues have yet to be resolved. Failure to adequately address the issues raised within this paper will lead to continued inefficiency, rising costs, and potentially another catastrophic accident.

The perspective of the authors is that while quality process improvements in the ISS Program are ongoing, NASA as an Agency, and to a large extent even the ISS Program, still continues to emphasize "safety" and the need for an improved "safety function" over quality, and that this emphasis seems to stem from a failure to fully understand the importance and influence of an effective programmatic quality assurance function on safety. This conclusion is further supported by the fact that while NASA has established many requirements for hardware and software quality assurance, programmatic quality assurance requirements and expectations have never been defined by NASA at the Agency level and, in many cases, at the Program level.

Because NASA has placed an emphasis on safety over quality, the NASA "culture" within the human space flight programs has ingrained in itself the perception that safety and quality are mutually exclusive. Based on observations made by the authors, as well as discussions with various personnel throughout the ISS Program and at JSC, the perception is that engineering, logistics, operations, and safety personnel consider quality to be a function satisfied by the quality organization rather than an inherent part of how they conduct their day-to-day activities and responsibilities. The balkanization and implementation philosophy of S&MA have bred a culture whereby engineers are responsible for design and technical issues, safety personnel are responsible for safety, reliability personnel are responsible for reliability, and quality personnel are responsible for quality. This is contrary to the lesson learned, but not implemented, by NASA from the joint NASA/Navy benchmarking exercises: that engineering, logistics, operations, and safety personnel are all responsible, and know that they are responsible, for quality, with quality assurance personnel being responsible for independently verifying the quality of a product or service. In conclusion, poor "safety" does not impact quality, but poor "quality" can always potentially impact safety.

The following sections provide recommendations that, if implemented, the authors believe will significantly improve quality assurance throughout NASA and therefore improve overall safety and reduce risk. Implementation of these recommendations within NASA would significantly contribute to resolving the existing quality deficiencies within NASA's human space flight programs and help ensure safe and successful future exploration of space on behalf of humankind.

7. RECOMMENDATIONS

7.1. Defense-In-Depth Approach To Quality Assurance

The responses taken by NASA to create new and more independent organizations responsible for quality and safety following each of the major accidents discussed previously have not prevented the recurrence of significant accidents and anomalies. It is human nature to transfer responsibility for those things over which people have no control to those who do (in this case, quality and safety).

Therefore, rather than strive for an "independent" quality assurance function operated out of NASA Headquarters or the NASA Centers separate from Program/Project Offices, as many within NASA believe was called for by the CAIB, a better approach would be to build "defense-in-depth" into quality assurance. Independence within an organization is possible and, in fact, is widely practiced throughout industry, the Department of Defense (DoD), and the Department of Energy (DOE). This "defense-in-depth" approach would consist of increased management emphasis and training that stresses quality (and safety, for that matter) as the responsibility of all personnel (especially engineers); a robust quality assurance organization that resides within a Program to assure and verify adherence to requirements and processes; and an external oversight quality assurance function (NASA Centers and/or Headquarters) that would assess adherence to requirements and the degree to which internal quality assurance is implemented, independent of cost and schedule. It is believed that this approach, which was recognized as a strength of the Navy's SUBSAFE program in the joint NASA/Navy Benchmarking Exercise, would result over time in the fundamental changes to NASA's culture that the CAIB Report clearly indicates are necessary for the long-term survival of NASA.

7.2.

Board) an in-depth assessment of NASA quality throughout the Agency that, at a minimum, addresses the following:

a) historical and current management philosophies on which NASA's QA programs and initiatives are based;
b) requirements and guidance that NASA has provided to explain its expectations to NASA/Contractor organizations and personnel;
c) quality training programs and initiatives at all levels within the Agency;
d) NASA/Contractor organizations' manuals of practice that are responsive to NASA's quality requirements and guidance; and
e) implementation
Programmatic Quality Audit/Surveillances of those practices by the NASA/Contractor organizations as they design, procure/fabricate, The second quality assurance improvement strategy that inspect/accept, install, test, and operate/sustain should be implemented by NASA is the establishment equipment serving important mission critical or of an independent quality audit and surveillance effort safety functions. to provide Program Managers and Center Directors with 2. Develop a consistent set of NASA HQ issued an objective assessment of how well a Program is quality requirements applicable to all human space adhering to requirements, the rigor to which its flight programs and projects that include processes are implemented, and the risks associated procurement, manufacturing, test, product with areas where noncompliances are identified. This acceptance, operations, and sustaining activities. would ensure that deficiencies and their resolution are 3. Provide mandatory NASA-wide basic quality elevated to Program Managers with respect to training for all NASA employees to explain basic engineering, operations, safety, reliability, quality, and quality concepts and techniques, as well as, other factors, and that senior management outside of the explaining the “value” of rigorous compliance to Programs is also aware, with an independent requirements and processes. perspective, of how well the Programs are performing, 4. Develop an Agency-wide quality training/ and to make decisions to override a Program manager certification program for NASA quality personnel. when necessary. 5. 
Establish a skill retention and career path program for quality personnel to ensure that an acceptable Within the context of establishing an independent skill level among civil servants is maintained with quality audit function, NASA human space flight respect to quality and that serving within the quality programs should be required to implement a self-audit field is not a detriment to an individual’s career function. Self-audits are critical to an organization’s path. ability to identify and resolve problem areas and to help 6. Provide NASA HQ requirements and guidance for prepare for external oversight audits. NASA human the establishment of internal and external audit space flight programs are notorious for not performing programs for compliance to requirements and self-audits of their internal processes and requirements processes at all organizational levels. compliance. Quality organizations within a 7. Charter and empower quality assurance boards and Program/Project are often only allowed to audit their panels within all levels of NASA organizations own processes and rarely are they allowed to audit an tasked with strict adherence to requirements and engineering or operations critical process. This processes with no technical stake in the outcome predicament is usually the result of sandbox politics, (e.g., engineering, operations, safety, reliability) to lack of adequate quality audit resources, or both. provide a strong quality assurance presence Consequently, the majority of human space flight throughout the Agency. disciplines (e.g., engineering, operations, and training) 8. Develop standardized programmatic QA processes are never evaluated for process breaks or opportunities and tools across the Agency (e.g., waivers, for improvement by trained quality personnel. PRACA, auditing). 9. Continue quality benchmarking exchange exercises 7.3. 
Generic Recommendations and require that NASA assessments and implementation plans be developed for lessons The following generic recommendations are provided to learned from these exchanges before a particular help foster an improved NASA “quality culture.” case study can be statused as completed.

1. An external organization should be invited to perform (e.g., Defense Facilities Nuclear Safety 8. REFERENCES

1. Final Report of the Return to Flight Task Group: Assessing the Implementation of the Columbia Accident Investigation Board Return-to-Flight Recommendations, July 2005.
2. The Phillips Report 1965-1966, NASA Historical Reference Collection, NASA History Office, NASA Headquarters, Washington, DC.
3. Report of the Apollo 13 Review Board [Cortright Commission], June 1970.
4. Report of the Presidential Commission on the Space Shuttle Challenger Accident, June 1986.
5. Columbia Accident Investigation Board Report, Volume 1, August 2003.
6. NASA/Navy Benchmarking Exchange (NNBE), Volume I: Interim Report, December 20, 2002.
7. Quality Assurance Audit and Surveillance Team Letter of Delegation (LOD), OE-04-046, October 27, 2004.
8. Charter for the Quality & Product Assurance Panel (QPAP), ISS-JPD-358, November 29, 2004.
9. NASA's Implementation Plan for the International Space Station Continuing Flight, Volume 2, Revision 1, January 30, 2004.