Achieving Improved Reliability with Failure Analysis Bhanu Sood

Reliability and Risk Assessment Branch
NASA Goddard Space Flight Center
[email protected]
IPC High Reliability Forum, Linthicum, MD
May 15, 2018

Disclaimer
The material herein is presented “for guidance only”. We do not warrant the accuracy of the information set out on this presentation. It may contain technical inaccuracies or errors and/or non-updated data. Information may be changed or updated without notice.

PDC Outline
Section 1: What is reliability and root cause?
Section 2: Overview of failure mechanisms
Section 3: Failure analysis techniques
– Non-destructive analysis techniques
– Destructive analysis
– Materials characterization
Section 4: Summary and closure
Discussions and case studies of actual failures and subsequent analysis.

NASA GSFC: One World-Class Organization
What makes Goddard one-of-a-kind?
• NASA’s leading science center, with cross-disciplinary, end-to-end capabilities
• World leader in understanding the Sun’s impact on Earth
• Executing NASA’s most complex missions
• Communications backbone – 98% of NASA’s data is transmitted via Goddard infrastructure
• Independent Verification and Validation Facility assures NASA’s most complex software functions as planned
• 1 of 2 US routes for ISS cargo; 1 of 4 US orbital launch facilities

Who We Are
THE GODDARD COMMUNITY: More than 10,000 People
The Nation’s largest community of scientists, engineers, and technologists

GSFC Workforce
• 3,000+ Civil Servants
• 6,000+ Contractors
• 1,000s of Others*
*Including off-site contractors, interns, and Emeritus

Workforce by category
• Scientists & Engineers: 61%
• Professional & Administrative: 28%
• Technicians and Others: 6%
• Clerical: 5%

GSFC: A Diverse Mission Portfolio
QuikSCAT, Wind, Voyager, EO-1, ERBS, Stereo, ACRIMSAT, RHESSI, TOMS, TRMM, Geotail, Aqua, ICESat-2, SOHO, THEMIS, TIMED, Landsat 7, TOPEX, TRACE, CALIPSO, Terra, ACE, GRACE, SORCE, Cluster, POES, SDO, IMAGE, Aura, AIM, FAST, GOES, Polar, GPM, MMS, Solar-B, IBEX, NPP, Aquarius, CloudSat, TDRSS, LDCM, Osiris-Rex (Sample Return), RBSP, WMAP, TWINS, Cassini (Instrument), NuSTAR, JWST, HST, GALEX, New Horizons, WISE, Messenger, Spitzer, SWAS, Astro-H, RXTE, Galileo, IUE, Juno, MAVEN, Integral, Pioneer, EUVE, FUSE, COBE, Mars Science Laboratory, Swift, Fermi, Compton GRO, LADEE, LRO

What is Reliability?
Reliability is the ability of a product to properly function, within specified performance limits, for a specified period of time, under the life cycle application conditions.
– Within specified performance limits: A product must function within certain tolerances in order to be reliable.
– For a specified period of time: A product has a useful life during which it is expected to function within specifications.
– Under the life cycle application conditions: Reliability is dependent on the product’s life cycle operational and environmental conditions.

Physics of Failure Perspective of Reliability
• Reliability statisticians are interested in tracking system level failure data during the service life for logistical purposes, and in determining how the hazard rate curve looks.
• PoF reliability engineers are interested in understanding and controlling the individual failures that cause the curve.
• PoF engineers do so through systematic and detailed assessment of
  – influence of hardware configuration and life-cycle stresses…
  – on root-cause failure mechanisms…
  – in the materials at potential failure sites.
[Figure: Failure distribution (Weibull), f(t) versus time, for three cases of the shape parameter: < 1 (hyper-exponential, infant mortality), = 1 (exponential, “random” failures), > 1 (wearout).]
[Figure: f(t) versus time for a ‘defective’ population and a nominal population, with random overstress events.]

Influence of ‘Durability’ and ‘Quality’ on ‘Reliability’
Overstress failures (stress-strength interference):
[Figure: overlapping stress and strength pdfs separated by a stress margin. A change in mean (changes in durability) or a change in variance (changes in quality) produces a change in failures.]
Wearout failures (damage-endurance interference):
[Figure: damage and endurance pdfs versus time (t), with the cumulative distribution function F(t), the life margin, the desired life, t0.1 and t50.]

When a Product Fails, There Are Costs
• To the Manufacturer
  o Time-to-market can increase
  o Warranty costs can increase
  o Market share can decrease. Failures can stain the reputation of a company, and deter new customers.
  o Claims for damages caused by product failure can increase
• To the Customer
  o Personal injury
  o Loss of mission, service or capacity
  o Cost of repair or replacement
  o Indirect costs, such as increase in insurance, damage to reputation, loss of market share

Failure Definitions
• Failure: A product no longer performs the function for which it was intended.
• Failure Mode: The effect by which a failure is observed.
• Failure Site: The location of the failure.
• Failure Mechanism: The physical, chemical, thermodynamic or other process that results in failure.
• Failure Model: Quantitative relationship between lifetime or probability of failure and loads.
• Load: Application/environmental condition (electrical, thermal, mechanical, chemical...) needed to precipitate a failure mechanism.

Classification of Failures
• It is helpful to distinguish between two key classes of failure mechanism:
  – overstress: use conditions exceed strength of materials; often sudden and catastrophic
  – wearout: accumulation of damage with extended usage or repeated stress
• It is also helpful to recognize early life failures:
  – infant mortality: failures occurring early in expected life; should be eliminated through process control, part selection and management, and quality improvement procedures

Failure Mechanism Identification
Overstress Mechanisms
– Mechanical: Yield, Fracture, Interfacial de-adhesion
– Thermal: Glass transition (Tg), Phase transition
– Electrical: Dielectric breakdown, Electrical overstress, Electrostatic discharge, Second breakdown
– Radiation: Single event upset
– Chemical: Contamination
Wearout Mechanisms
– Mechanical: Fatigue, Creep, Wear
– Thermal: Stress driven diffusion voiding (SDDV)
– Electrical: TDDB, Electromigration, Surface charge spreading, Hot electrons, CAF, Slow trapping
– Radiation: Radiation embrittlement, Charge trapping in oxides
– Chemical: Corrosion, Dendrite growth, Depolymerization, Intermetallic Growth

Cost of a Single Unplanned Data Center Outage Across 16 Industries
The average cost of data center downtime across industries was approximately $5,600 per minute.
Ref: Ponemon Inst., “Calculating the Cost of Data Center Outages,” Feb. 1, 2011.

Iceberg Model of Cost of Poor Quality
Ref: A. Buthmann, “Cost of Quality: Not Only Failure Costs,” iSixSigma.

Failure Analysis

What Causes Products to Fail?
Generally, failures do not “just happen.” Failures may arise during any of the following stages of a product’s life cycle:
The damage (failure mode) may not be detected until a later phase of the life cycle.

What is Root Cause Analysis?
Root Cause analysis has four major objectives:
• Verify that a failure occurred;
• Determine the symptom or the apparent way a part has failed (the mode);
• Determine the mechanism and root cause of the failure;
• Recommend corrective and preventative action.
While generally synonymous, “Failure analysis” is commonly understood to include all of this except determination of root cause.

What is a Root Cause?
The root cause is the most basic causal factor or factors that, if corrected or removed, will prevent the recurrence of the situation.*
The purpose of determining the root cause(s) is to fix the problem at its most basic source so it doesn’t occur again, even in other products, as opposed to merely fixing a failure symptom.
Identifying root causes is the key to preventing similar occurrences in the future.
*Ref: ABS Group, Inc., Root Cause Analysis Handbook, A Guide to Effective Incident Investigation, ABS Group, Inc., Risk & Reliability Division, Rockville MD.

Root Cause Analysis is Different from Troubleshooting
• Troubleshooting is generally employed to eliminate a symptom in a given product, or to identify a failed component in order to effect a repair.
• Root cause analysis is dedicated to finding the fundamental reason why the problem occurred in the first place, to prevent future failures.
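
Worked sketch: Weibull hazard-rate regimes
The Physics of Failure slide distinguishes three regimes of the Weibull failure distribution (infant mortality, “random” failures, wearout). The sketch below is an illustration only and is not taken from the presentation. It assumes the standard two-parameter Weibull form, and the names `beta` (shape), `eta` (scale), `weibull_hazard`, `weibull_unreliability` and `hazard_trend`, along with the numeric values, are hypothetical choices made for this example.

```python
import math


def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Hazard rate h(t) = (beta / eta) * (t / eta)**(beta - 1).

    Assumes the standard two-parameter Weibull distribution; beta is the
    shape parameter and eta the scale parameter (illustrative names).
    """
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta and eta must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


def weibull_unreliability(t: float, beta: float, eta: float) -> float:
    """Cumulative probability of failure F(t) = 1 - exp(-(t / eta)**beta)."""
    if t < 0 or beta <= 0 or eta <= 0:
        raise ValueError("t must be non-negative; beta and eta positive")
    return 1.0 - math.exp(-((t / eta) ** beta))


def hazard_trend(beta: float, eta: float = 1000.0,
                 t_early: float = 100.0, t_late: float = 900.0) -> str:
    """Compare the hazard rate at two times to label its trend.

    The default eta, t_early and t_late are arbitrary example values.
    """
    early = weibull_hazard(t_early, beta, eta)
    late = weibull_hazard(t_late, beta, eta)
    if math.isclose(early, late, rel_tol=1e-9):
        return "constant"
    return "decreasing" if late < early else "increasing"


if __name__ == "__main__":
    # Shape < 1: infant mortality; = 1: "random" failures; > 1: wearout.
    for shape in (0.5, 1.0, 3.0):
        print(shape, hazard_trend(shape))
```

Under these assumptions, a shape parameter below 1 gives a falling hazard rate, exactly 1 gives a constant one, and above 1 gives a rising one, which is the statistical view of the curve. It says nothing about the failure mechanisms behind it, which is the part the PoF approach addresses.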
