PERSES: Data Layout for Low Impact Failures

Total Page:16

File Type:pdf, Size:1020Kb

PERSES: Data Layout for Low Impact Failures UC Santa Cruz UC Santa Cruz Previously Published Works Title PERSES: Data layout for low impact failures Permalink https://escholarship.org/uc/item/35h4t5dj Journal Proceedings - IEEE Computer Society's Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, MASCOTS, 2015- February(February) ISSN 1526-7539 ISBN 9781479956104 Authors Wildani, A Miller, EL Adams, IF et al. Publication Date 2015-02-05 DOI 10.1109/MASCOTS.2014.17 Peer reviewed eScholarship.org Powered by the California Digital Library University of California 2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems PERSES: Data Layout for Low Impact Failures Avani Wildani∗, Ethan L. Miller†, Ian F. Adams‡, and Darrell D.E. Long† ∗The Salk Institute, †University of California: Santa Cruz, ‡EVault Inc. Abstract—Growth in disk capacity continues to outpace ad- over 2012 [5]. Cloud applications such as web hosting and vances in read speed and device reliability. This has led to storage document sharing maintain SLAs that guarantee low response systems spending increasing amounts of time in a degraded state time to users. In this environment, the cloud provider saves while failed disks reconstruct. Users and applications that do money by isolating failures to, and thus breaking SLAs with, not use the data on the failed or degraded drives are negligibly as few customers as possible. While they experience more impacted by the failure, increasing the perceived performance degradation, the marginal cost of additional slow response time of the system. We leverage this observation with PERSES,a statistical data allocation scheme to reduce the performance is often lower for users once any slowdown has occurred. impact of reconstruction after disk failure. Exacerbating this degradation, many systems sacrifice re- PERSES reduces degradation from the perspective of the user build speed in exchange for increased reliability. These systems by clustering data on disks such that data with high probability of typically place data evenly across disks in order to distribute co-access is placed on the same device as often as possible. Trace- and thus dilute the impact of failure events. When a failure driven simulations show that, by laying out data with PERSES, event does necessitate a rebuild, however, the rebuild can im- we can reduce the perceived time lost due to failure over three pact a wide array of people or projects even though each could years by up to 80% compared to arbitrary allocation. have only lost a very small amount of data. By combining existing reliability measures with selectively laying out data I. INTRODUCTION according to group membership, we can isolate failures so that they impact fewer users or projects, increasing the perceived Over time, failure events are inevitable in large disk-based availability of the system. storage systems [1], [2]. Even in failure events where data is not lost, there is a system-wide cost for accessing data on the Other use cases that are disproportionately affected by failed disk. The failed disk is inaccessible, and other disks and partial data degradation include compilation and select scien- the interconnecting network links may be saturated. This cost tific workloads. Compiling large code bases typically involve is increasing as growth in disk size outpaces growth in disk, multiple dependencies, and developers working with large code CPU, and network bandwidth [3], [4]. While a 100 GB drive bases often have to wait to compile until all of the files they that can be rebuilt at 50 MB/s is fully operational in about half need are available. An unlucky failure can halt the compilation an hour, a terabyte drive at 50 MB/s can take up to six hours to of large swathes of code for want of one file. Similarly, be back at full bandwidth. Additionally, during a rebuild there scientific data analysis can rely on a small number of key is often significant network overhead that can have negative files such as format descriptors or configuration files that slow side effects on disks that are not even involved in the rebuild. the analysis of terabytes of data. According to the Tech Lead of Facebook’s HBase engineering team: “Even losing a single We examine the issue of excessive rebuild time from the block of data incurs a high fixed cost, due to the overhead of perspective of projects: groups of records that are frequently locating and recovering the unavailable data. Therefore, given accessed together. If a failed disk contains even a small amount a fixed amount of unavailable data each year, it is much better of critical data for multiple projects, all of those projects could to have fewer incidents of data loss with more data each than suffer degraded performance until the rebuild is completed. more incidents with less data.” [6] Examples include developers losing a header file that a build depends on, a high performance system that loses an input PERSES is also a good fit for large storage systems where and misses its time slot, or a cloud provider that breaks a a small but constantly shifting subset of the data is in active performance SLA. use. Systems that organically grow around a dynamic set of requirements naturally tend to have a small set of data in active If, however, the failure occurs on a device that is not use and a long tail of accesses across the remainder of the actively being accessed, it has almost no productivity cost: data [7]. These systems resemble archives in that while there it is the proverbial tree fallen in a forest. PERSES is a data are distinct, consistent projects, there is little repeat access for allocation framework designed to decrease the impact of device popularity based allocation to take advantage of. We study failures on the productivity and perceived availability of a such archival-like workloads in this paper to form a baseline storage system. We name our system after PERSES, the Greek for improvement. titan of cleansing destruction, because it focuses destruction in a storage system to a small number of projects so that others PERSES improves availability by laying out project groups adjacently on disk to isolate faults. Maintaining an out-of- may thrive. Our goal with PERSES is to isolate failures away from as many projects as possible. band index of all of the data in a project requires significant administrative overhead. To avoid this, PERSES dynamically Multi-user cloud storage systems, a primary target of calculates associations in historical trace data to identify likely PERSES, are caught in the data deluge. According to the project groups. Since elements of a project are co-located 2013 IDC report, digital content is expected to grow to 4 on disk, any single failure impacts a very small number of zettabytes by the end of 2013, which would be a 48% increase projects. We derive project groups from either trace metadata 1526-7539/14 $31.00 © 2014 IEEE 71 DOI 10.1109/MASCOTS.2014.17 or access patterns. We use a statistical method to calculate an exabyte scale storage system with 99.999% annual data projects in O(n), where n is the number of accesses in the retention will lose data in a decade comparable to what a training set, without the overhead of maintaining groups by petabyte scale storage system with equal retention loses in hand. a century. While the percent remains constant, the value of data does not necessarily decrease as more of it is stored. To demonstrate PERSES, we injected failures into two Therefore, the point at which advanced reliability measures dataset groupings and ran them through a disk simulator with must be taken to ensure availability in a large storage system a variety of layouts based on calculated project groups. We is likely to be sooner than we are accustomed to. Several show that layout under PERSES leads to faults that affect studies have shown that storing data reliably over the archival fewer projects during the rebuild phase than a group-agnostic time scale presents additional challenges, such as accumulated layout. Our fault injection simulation compares the real time latent sector errors and data migrations, to the already difficult lost during disk rebuilds across layouts by measuring project field of storage reliability [10]–[13]. A challenge of this size hours gained, which we define as the delay avoided by all requires combining known techniques of enhancing reliability project groups as a result of the access requests that are delayed with methods optimized for the expected workload and time while disks are rebuilding compared to the reference allocation. scale of archival storage. We found that in our best case, PERSES gained over 4000 Recent work has shown failures remain an issue for large hours of project time during the three year trace compared disk-based storage systems. For example, Schroeder and Gib- to a random allocation and gained over 1000 hours compared son showed that the average disk replacement rates in the field to an ideal temporal allocation. Additionally, we found that are typically between 2 and 4%, implying a high reconstruction increasing the IOPs did not significantly hurt the fault isolation. rate [1]. Other studies have shown that latent sector errors can Finally, we discovered that restricting the minimum group size, lead to drive failure [2]. Pinheiro et al. showed that failures thereby reducing grouping noise, improved the performance are both difficult to predict and common [14]. Scrubbing under PERSES even on smaller disks, with a maximum pro- techniques can efficiently catch these otherwise unobserved ductivity improvement of 94% across large groups. Our main errors so that the disks can be rebuilt promptly, but it is still contributions are: often necessary to rebuild much of the disk [15].
Recommended publications
  • (12) United States Patent (10) Patent No.: US 9,460,782 B2 Seol Et Al
    USOO946O782B2 (12) United States Patent (10) Patent No.: US 9,460,782 B2 Seol et al. (45) Date of Patent: Oct. 4, 2016 (54) METHOD OF OPERATING MEMORY (56) References Cited CONTROLLER AND DEVICES INCLUDING MEMORY CONTROLLER U.S. PATENT DOCUMENTS 8,050,086 B2 11/2011 Shalvi et al. (71) Applicant: SAMSUNGELECTRONICS CO., 8,085,605 B2 12/2011 Yang et al. LTD., Suwon-si, Gyeonggi-do (KR) 2005.0089121 A1* 4/2005 Tsai ...................... HO3M13/41 375,341 (72) Inventors: Chang Kyu Seol, Osan-si (KR); Jun 2011 0145487 A1 6/2011 Haratsch et al. Jin Kong, Yongin-si (KR); Hong Rak 2011 (0289.376 A1 1 1/2011 Maccarrone et al. 2012,0005409 A1 1/2012 Yang Son, Anyang-Si (KR) 2012/0066436 A1 3/2012 Yang (73) Assignee: Samsung Electronics Co., Ltd., Suwon-si, Gyeonggi-do (KR) FOREIGN PATENT DOCUMENTS JP 2011504276 A 2, 2011 (*) Notice: Subject to any disclaimer, the term of this KR 2011.0128852. A 11/2011 patent is extended or adjusted under 35 KR 2012O061214 A 6, 2012 U.S.C. 154(b) by 274 days. * cited by examiner (21) Appl. No.: 14/205,496 Primary Examiner — Idriss N Alrobaye (22) Filed: Mar. 12, 2014 Assistant Examiner — Dayton Lewis-Taylor (65) Prior Publication Data (74) Attorney, Agent, or Firm — Volentine & Whitt, US 2014/0281293 A1 Sep. 18, 2014 PLLC (30) Foreign Application Priority Data (57) ABSTRACT Mar. 15, 2013 (KR) ........................ 10-2013-0O28O21 A method of operating a memory controller includes receiv ing a first data sequence and generating a coset representa (51) Int.
    [Show full text]
  • PERSES: Data Layout for Low Impact Failures
    PERSES:DataLayoutforLowImpact Failures Technical Report UCSC-SSRC-12-06 September 2012 A. Wildani E. L. Miller [email protected] [email protected] I. F. Adams D. D. E. Long [email protected] [email protected] Storage Systems Research Center Baskin School of Engineering University of California, Santa Cruz Santa Cruz, CA 95064 http://www.ssrc.ucsc.edu/ PERSES: Data Layout for Low Impact Failures Abstract If a failed disk contains data for multiple working sets or projects, all of those projects could stall until the rebuild Disk failures remain common, and the speed of recon- is completed. If the failure occurs in a working set that struction has not kept up with the increasing size of disks. is not actively being accessed, it could potentially have Thus, as drives become larger, systems are spending an zero productivity cost to the system users: the proverbial increasing amount of time with one or more failed drives, tree fallen in a forest. potentially resulting in lower performance. However, if To leverage this working set locality, we introduce an application does not use data on the failed drives, PERSES,adataallocationmodeldesignedtodecrease the failure has negligible direct impact on that applica- the impact of device failures on the productivity and per- tion and its users. We formalize this observation with ceived availability of a storage system. PERSES is named PERSES,adataallocationschemetoreducetheperfor- for the titan of cleansing destruction in Greek mythology. mance impact of reconstruction after disk failure. We name our system P ERSES because it focuses destruc- PERSES reduces the length of degradation from the tion in a storage system to a small number of users so reference frame of the user by clustering data on disks that others may thrive.
    [Show full text]
  • MVME8100/MVME8105/MVME8110 Installation and Use P/N: 6806800P25O September 2019
    MVME8100/MVME8105/MVME8110 Installation and Use P/N: 6806800P25O September 2019 © 2019 SMART™ Embedded Computing, Inc. All Rights Reserved. Trademarks The stylized "S" and "SMART" is a registered trademark of SMART Modular Technologies, Inc. and “SMART Embedded Computing” and the SMART Embedded Computing logo are trademarks of SMART Modular Technologies, Inc. All other names and logos referred to are trade names, trademarks, or registered trademarks of their respective owners. These materials are provided by SMART Embedded Computing as a service to its customers and may be used for informational purposes only. Disclaimer* SMART Embedded Computing (SMART EC) assumes no responsibility for errors or omissions in these materials. These materials are provided "AS IS" without warranty of any kind, either expressed or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. SMART EC further does not warrant the accuracy or completeness of the information, text, graphics, links or other items contained within these materials. SMART EC shall not be liable for any special, indirect, incidental, or consequential damages, including without limitation, lost revenues or lost profits, which may result from the use of these materials. SMART EC may make changes to these materials, or to the products described therein, at any time without notice. SMART EC makes no commitment to update the information contained within these materials. Electronic versions of this material may be read online, downloaded for personal use, or referenced in another document as a URL to a SMART EC website. The text itself may not be published commercially in print or electronic form, edited, translated, or otherwise altered without the permission of SMART EC.
    [Show full text]
  • NEC LXFS Brochure
    High Performance Computing NEC LxFS Storage Appliance NEC LxFS-z Storage Appliance NEC LxFS-z Storage Appliance In scientific computing the efficient delivery of data to and from the compute nodes is critical and often chal- lenging to execute. Scientific computing nowadays generates and consumes data in high performance comput- ing or Big Data systems at such speed that turns the storage components into a major bottleneck for scientific computing. Getting maximum performance for applications and data requires a high performance scalable storage solution. Designed specifically for high performance computing, the open source Lustre parallel file system is one of the most powerful and scalable data storage systems currently available. However, the man- aging and monitoring of a complex storage system based on various hardware and software components will add to the burden on storage administrators and researchers. NEC LxFS-z Storage Appliance based on open source Lustre customized by NEC can deliver on the performance and storage capacity needs without adding complexity to the management and monitoring of the system. NEC LxFS-z Storage Appliance is a true software defined storage platform based on open source software. NEC LxFS-z Storage Appliance relies on two pillars: Lustre delivering data to the frontend compute nodes and ZFS being used as filesystem for the backend, all running on reliable NEC hardware. As scientific computing is moving from simulation-driven to data-centric computing data integrity and protec- tion is becoming a major requirement for storage systems. Highest possible data integrity can be achieved by combining the RAID and caching mechanisms of ZFS with the features of Lustre.
    [Show full text]
  • Techniques for Reducing Communication Bottlenecks in Large-Scale Parallel Systems
    1 Approximate Communication: Techniques for Reducing Communication Bottlenecks in Large-Scale Parallel Systems FILIPE BETZEL, KAREN KHATAMIFARD, HARINI SURESH, DAVID J. LILJA, JOHN SARTORI, and ULYA KARPUZCU, University of Minnesota Approximate computing has gained research attention recently as a way to increase energy efficiency and/or performance by exploiting some applications’ intrinsic error resiliency. However, little attention has been given to its potential for tackling the communication bottleneck that remains one of the looming challenges to be tackled for efficient parallelism. This article explores the potential benefits of approximate computing for communication reduction by surveying three promising techniques for approximate communication: com- pression, relaxed synchronization, and value prediction. The techniques are compared based on an evaluation framework composed of communication cost reduction, performance, energy reduction, applicability, over- heads, and output degradation. Comparison results demonstrate that lossy link compression and approximate value prediction show great promise for reducing the communication bottleneck in bandwidth-constrained applications. Meanwhile, relaxed synchronization is found to provide large speedups for select error-tolerant applications, but suffers from limited general applicability and unreliable output degradation guarantees. Finally, this article concludes with several suggestions for future research on approximate communication techniques. CCS Concepts: • Computer systems
    [Show full text]
  • Non-Volatile Flash Memory Characteristics Implementing High-K Blocking Layer
    NON-VOLATILE FLASH MEMORY CHARACTERISTICS IMPLEMENTING HIGH-K BLOCKING LAYER SEUM BIN RAHMAN UNIVERSITI TEKNOLOGI MALAYSIA Replace this page with form PSZ 19:16 (Pind. 1/07), which can be obtained from SPS or your faculty. Replace this page with the Cooperation Declaration form, which can be obtained from SPS or your faculty. This page is OPTIONAL when your research is done in collaboration with other institutions that requires their consent to publish the finding in this document.] NON-VOLATILE FLASH MEMORY CHARACTERISTICS IMPLEMENTING HIGH-K BLOCKING LAYER SEUM BIN RAHMAN A thesis submitted in fulfilment of the requirements for the award of the degree of Master’s In Electrical Engineering Faculty of Electrical Engineering Universiti Teknologi Malaysia JUNE 2018 iii This work is dedicated to my respected parents and UTM authority for having faith on me. iv ACKNOWLEDGEMENT I want to thank DR. NURUL EZAILA BINTI ALIAS for her great guidance of the project. I also thank all my examiners for asking creative questions that helped improving the report. Besides, my gratitude goes to my fellow research group members, specially Muhammad Afiq Nuruddin and Farah Aqilah for their kind and patient support. v ABSTRACT An Erasable Programmable Read Only Memory (EPROM) is a special kind of memory chip, that can retain the memory even when the power is turned off. This type of memory is known as non-volatile memory (NWM) cell. An EPROM, as a non-volatile memory is widely accepted for its excellent reliability and data storage capability for a large scale of time without noticeable data degradation.
    [Show full text]
  • Data Degradation
    Data degradation Data degradation is the gradual corruption of computer data due to an accumulation of non-critical failures in a data storage device. The phenomenon is also known as data decay, data rot or bit rot. Contents Visual example In RAM In storage Component and system failures See also References Visual example Below are several digital images illustrating data degradation. All images consist of 326,272 bits. The original photo is displayed on the left. In the next image to the right, a single bit was changed from 0 to 1. In the next two images, two and three bits were flipped. On Linux systems, the binary difference between files can be revealed using 'cmp' command (e.g. 'cmp -b bitrot-original.jpg bitrot-1bit-changed.jpg'). 0 bits flipped 1 bit flipped 2 bits flipped 3 bits flipped In RAM Data degradation in dynamic random-access memory (DRAM) can occur when the electric charge of a bit in DRAM disperses, possibly altering program code or stored data. DRAM may be altered by cosmic rays or other high-energy particles. Such data degradation is known as a soft error.[1] ECC memory can be used to mitigate this type of data degradation. In storage Data degradation results from the gradual decay of storage media over the course of years or longer. Causes vary by medium: Solid-state media, such as EPROMs, flash memory and other solid-state drives, store data using electrical charges, which can slowly leak away due to imperfect insulation. The chip itself is not affected by this, so reprogramming it once per decade or so prevents decay.
    [Show full text]
  • Your Internet Data Is Rotting Paul Royster University of Nebraska-Lincoln, [email protected]
    University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Faculty Publications, UNL Libraries Libraries at University of Nebraska-Lincoln 5-15-2019 Your internet data is rotting Paul Royster University of Nebraska-Lincoln, [email protected] Follow this and additional works at: https://digitalcommons.unl.edu/libraryscience Part of the Archival Science Commons, Computer and Systems Architecture Commons, Data Storage Systems Commons, Digital Communications and Networking Commons, Scholarly Communication Commons, and the Scholarly Publishing Commons Royster, Paul, "Your internet data is rotting" (2019). Faculty Publications, UNL Libraries. 376. https://digitalcommons.unl.edu/libraryscience/376 This Article is brought to you for free and open access by the Libraries at University of Nebraska-Lincoln at DigitalCommons@University of Nebraska - Lincoln. It has been accepted for inclusion in Faculty Publications, UNL Libraries by an authorized administrator of DigitalCommons@University of Nebraska - Lincoln. Your internet data is rotting The Conversation May 15, 2019, 6.47am EDT The internet is growing, but old information continues to disappear daily. Paul Royster, Coordinator of Scholarly Communications, University of Nebraska-Lincoln Disclosure statement: Paul Royster does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no rel- evant affiliations beyond their academic appointment. Partners: University of Nebraska-Lincoln provides funding as a member of The Conversation US. Many MySpace users were dismayed to discover earlier this year that the so- cial media platform lost 50 million files uploaded between 2003 and 2015. The failure of MySpace to care for and preserve its users’ content should serve as a reminder that relying on free third-party services can be risky.
    [Show full text]
  • Meeting the Storage Challenge of Exponential Data Growth in Genomics
    SOLUTION BRIEF Meeting the Storage Challenge of Exponential Data Growth in Genomics Highlights The life sciences and healthcare industries are in the midst of a dramatic transformation that will make personalized medicine common-place. Completing the sequencing of the • Budget Constraints. Need affordable high- first human genome in 2003 was a key breakthrough that took more than 10 years and cost density system to implement an enterprise-class over US$3 billion. Since then, the speed at which genomes can be sequenced has more than storage cloud doubled, easily outpacing Moore’s Law. Today’s high-throughput sequencing machines can • Collaboration. Enable research partners across process the human genome in a matter of hours at a cost approaching US$1 thousand. These the country or around the world to share data advancements have allowed researchers to run more sequencing tests in less time, greatly regardless of device platform increasing the pace of scientific discovery. Clinicians, too, have begun leveraging genome • Scalability/Serviceability. Deliver non-disruptive sequencing to more effectively treat patients with medications tailored to the patient’s unique capacity and performance scaling, data repairs genetic makeup. The result has been an explosion in genomic data, and organizations are now and upgrades looking for ways to affordably store and manage this avalanche of data. • Availability Requirements. Ensure easy access Traditional Archival Systems Have Limitations to data regardless of where it is stored, how old it is or even if a data center goes offline Archival systems are often comprised of complex disk and magnetic tape storage subsystems • Complexity Sprawl. Simplify overall deployment organized into performance tiers.
    [Show full text]
  • Silent Corruption: Bit Rot and Its Impact on Digital Archives
    Ashley Aberg MLIS Candidate ‘21 Silent Corruption: Bit Rot and Its Impact on Simmons University Digital Archives WHAT IS BIT ROT? THREE MITIGATION OPTIONS Preservica services are the highest cost option, but provide the most personal service in creating unique systems. Since 2018 they have offered their “low cost option,” a cloud system that can be implemented for only $3,000 a year.¹ That service, however, may be inadequate for some institutions. ReFS (Resilient File System) is a Microsoft owned self healing file system. ReFS was developed for use with Windows 10 and can handle both large file sizes and large numbers of files.² This system is easy to use and intuitive to regular Windows users, with many guides available online to help get the most out of ReFS. This is a cost effective option for 0 bits flipped 1 bit flipped 2 bits flipped “Bitrot in JPEG files” Jim Salter, Data Degradation, CC BY-SA 4.0 institutions already using Windows. Bit rot is the slow loss of digital data What does it mean to be a self healing file ZFS is a fully open source self healing file system, and thus, is a good over time. This occurs spontaneously system? It means the system in place is set option for institutions not using Windows. However, implementing this as digital files sit on a hard drive and to back up on a regular basis and check system requires a certain level of complex digital literacy that may not some bits are flipped or dropped, those backups against previous versions exist in-house at a given institution.³ Guides are available on the internet, compromising the readability of the that are known to be whole.
    [Show full text]
  • ERM- Glossary
    Electronic Records Management Guidelines GLOSSARY Application: A piece of software that performs Backup: A copy of all or portions of software or a function; a computer program. data files on a system kept on storage media, Architecture: The design and specifications of such as tape or disk, or on a separate system so hardware, software, or a combination of that the files can be restored if the original data thereof, especially how those components are deleted or damaged. In information interact. An enterprise-wide architecture is a technology, archive is commonly used as a logically consistent set of principles that guide synonym for backup. the design and development of an Backward-compatible: The ability of a organization’s information systems and software program or piece of hardware to read technology infrastructure. files in previous versions of the software or Archive: To transfer records from the hardware. individual or office of creation to a repository Bit rot: Also known as data degradation, it is authorized to appraise, preserve, and provide the gradual corruption of computer data due to access to those records. Computing — Data an accumulation of non-critical failures in a stored offline and retained for recordkeeping data storage device. Overtime, the bits and purposes; also — A backup that supports bytes (the 1s and 0s) of computer data will restoration of the entire system after a disaster. reverse until the data is no longer Archives: The division within an organization readable. This is another reason digital files responsible for maintaining the organization’s must be migrated and converted at least every records of enduring value; can also refer to 10 years.
    [Show full text]
  • Experienced Miras Anomalies and Mission Impact
    FOS team, 14/02/13 EXPERIENCED MIRAS ANOMALIES AND MISSION IMPACT This note describes briefly the anomalies experienced so far by MIRAS In Flight, and aims to describe the impact they may have both over the data production (i.e. data gaps) and on the possible degradation of the data itself. It is intended for the Data Users to understand references to anomalies in the weekly reports. For further information on the anomalies, please consult the MoM of the Anomaly Review Boards regularly held. CMN UNLOCK WITH WRONG RELOCKING. 2 CMN UNLOCK WITH PROPER RELOCKING 3 FALSE TEMPERATURE ACQUISITIONS BY CMN-6 (B1) 4 MASS MEMORY PARTITION LATCHUP 5 AUTONOMOUS CCU RESETS 6 PLATFORM MANOEUVRES 7 MASS MEMORY DATA GAPS 8 MASS MEMORY DOUBLE BIT ERRORS 9 1 FOS team, 14/02/13 CMN Unlock with wrong relocking. Description This anomaly was impacting shortly after launch the A2 CMN of the redundant Arm, known before launch to be a lazy one. It consisted in the CMN local oscillator moving away from the common frequency, and not regaining after the event a common frequency to the rest of the instrument CMN’s. It has not been seen since January 2010, once the arm was moved to the nominal branch. Geolocation The anomaly tends to occur at high or low latitudes, or over the South Atlantic Anomaly. Likelihood Before arm swap to nominal, on the 11th of January 2010, the event happened 5 times in 40 days, thus making it quite likely. It has not been seen again in any other CMN.
    [Show full text]