NOISE REDUCTION IN SOLID STATE DRIVE (SSD) SYSTEM VALIDATION

A Project

Presented to the faculty of the Department of Electrical and Electronic Engineering

California State University, Sacramento

Submitted in partial satisfaction of the requirements for the degree of

MASTER OF SCIENCE

in

Electrical and Electronic Engineering

by

Srishti Gupta

FALL 2020

NOISE REDUCTION IN SOLID STATE DRIVE (SSD) SYSTEM VALIDATION

A Project

by

Srishti Gupta

Approved by:

______, Committee Chair Dr. Praveen Meduri

______, Second Reader Dr. Preetham Kumar

______Date


Student: Srishti Gupta

I certify that this student has met the requirements for format contained in the University format manual, and this project is suitable for electronic submission to the library and credit is to be awarded for the project.

______, Graduate Coordinator ______Dr. Preetham B. Kumar Date

Department of Electrical and Electronic Engineering


Abstract

of

NOISE REDUCTION IN SOLID STATE DRIVE (SSD) SYSTEM VALIDATION

by

Srishti Gupta

SSD development includes several stages from design to release. Once the hardware and software for the drive have been implemented, complete testing on the actual system is critical to verify the functionality and performance of the SSD in the real world. Validation systems involve a platform of components such as datacenter servers, the PCIe bus, Quarch modules, network switches, and software tools. The major challenge with this kind of validation is that the many environmental components can introduce significant noise. Since the system validation stage is one of the last phases before the release of a product, there is a strong emphasis on debugging issues at a high velocity. Hence, considering the complexity of the system and the fixed timelines to deliver with quality, noise reduction is crucial. This project analyzes the various noise parameters at the system level, the tools to evaluate the noise, and methods to minimize it.

______, Committee Chair Dr. Praveen Meduri

______Date


TABLE OF CONTENTS

Page

List of Tables ...... vii

List of Figures ...... viii

Chapter

1. INTRODUCTION ...... 1

2. SYSTEM DESIGN AND COMPONENTS ...... 2

2.1 Overall System Design ...... 2

2.2 Hardware Components ...... 3

2.2.1 Torridon System...... 3

2.2.2 Server Platform ...... 6

2.2.3 PCIe Switches ...... 9

2.2.4 Network Infrastructure ...... 10

2.3 Software Components ...... 12

2.3.1 Operating Systems ...... 12

2.3.2 Test Framework ...... 14

2.3.3 Flexible I/O Tester ...... 16

2.3.4 Medusa Labs Test Tools Suite ...... 18

2.3.5 NVMe CLI ...... 21

2.3.6 PCIMEM ...... 23

2.3.7 Link Training Status and State Machine ...... 25


3. DATA COLLECTION AND ANALYSIS ...... 27

3.1 JIRA Tools for Data Collection ...... 27

3.2 Noise Categorization ...... 28

4. FACTORS CONTRIBUTING TO NOISE ...... 29

4.1 Key Hot Plug and Hot Swap Issues ...... 29

4.2 Linux Kernel Crash Events ...... 34

4.3 Windows Blue Screen of Death ...... 39

4.4 Unexpected Shutdown Due to Network Instability ...... 45

5. CONCLUSION ...... 48

References ...... 50


LIST OF TABLES

Tables Page

1. Differences Between Pain and Maim ...... 19

2. NVMe CLI Commands ...... 22

3. PCIe Error Messages ...... 26


LIST OF FIGURES

Figures Page

1. 28-Port Quarch Controller ...... 4

2. Flex Cable and Quarch...... 4

3. Quarch Connection to the Drive ...... 4

4. Complete Testing Set Up Using the Torridon System...... 5

5. Intel Server Board S2600WF Components...... 6

6. M.2 SSD Connectors on the Server Board ...... 7

7. Onboard OcuLink Connectors ...... 8

8. NVMe Error Handling Using VMD ...... 8

9. Broadcom Gen 4 Switch Topology...... 9

10. Data Center Network Topology ...... 10

11. cat/proc/cpuinfo Output ...... 13

12. /proc/iomem Output ...... 14

13. PCIe Bus Connects to Controller and to the Namespaces ...... 15

14. Accessing NVMe by Creating PCIe and Controller Object ...... 16

15. Admin Command Format ...... 16

16. FIO Output ...... 18

17. Medusa Logging Output ...... 20


18. NVMe CLI Smart Log Output ...... 23

19. PCIMEM Command Syntax ...... 24

20. LTSSM States ...... 25

21. Hot Plug Scope Capture ...... 31

22. Effect of Pin Bounce During a Pull Event ...... 31

23. Effect of Pin Bounce During Hot Plug ...... 32

24. Interposer Layout ...... 33

25. Linux Crash Utility ...... 37

26. Log Command to Display Message Buffer ...... 38

27. Windows BSOD with Bug Check Code ...... 39

28. Bug Check Code Reference ...... 41

29. Bug Check Code Parameter Details ...... 42

30. Bug Analysis in Debug Mode ...... 42

31. SFC Command to Check for File System Errors ...... 45

32. Redundancy Network Group ...... 47



CHAPTER 1

INTRODUCTION

With the advent of modern computing, which involves processing large quantities of data for technologies such as big data and cloud computing, there is a high demand for reliable storage techniques. SSDs provide a more scalable solution to the storage demand as compared to Hard Disk Drives (HDDs). With fewer moving parts such as magnetic disks compared to HDDs, SSDs are more durable, faster and more compact. However, SSDs have a limited number of program and erase cycles over their lifetime, after which the performance begins to degrade. Therefore, engineering SSD hardware and firmware in a way that maximizes the lifetime is critical. There are several techniques, such as Wear Levelling and Garbage Collection, which help in maximizing performance. [14] Once these features have been implemented by the design and development teams, validation of the implementation is crucial in order to ensure that the product meets the high industry standards of reliability and performance. For complete testing of SSDs, there are many different test suites and tools that are used. The tools enable executing various workloads such as stress, power, performance and functional level testing. However, the many tools and platform variables involved in the testing introduce a significant amount of noise. This needs to be minimized in order to improve validation efficiency, so that the focus can be on finding SSD related issues. The following sections discuss the different variables used in testing which could potentially lead to noise, noise categorization and analysis, factors contributing to noise, and methods to minimize it.


CHAPTER 2

SYSTEM DESIGN AND COMPONENTS

2.1 Overall System Design

Data centers are facilities comprising racks of servers, power modules, network infrastructure, storage, cooling fans and software components. Data centers, along with the several software and hardware components, create the environment for SSD testing.

Rows of servers connected over the network form the basic building blocks of a data center. Server rack topologies can be subdivided into two categories:

• Top of Rack (ToR) Topology: With this topology, 1 or 2 Rack Unit (RU) access layer switches and switching devices are installed at the top of the rack. These provide the connections within the server rack. Using this topology, a smaller number of cables is required between the rack and end switches, but a larger number of switches is required to extend the connections to other racks. [21]

• End of Row Topology: The switching devices are at the end of the row in this case, as compared to the top of the rack in the ToR topology. The main benefit of this is that it reduces the number of switches required, but it increases the cabling connections between the servers on a rack. [21]

The servers are interconnected using the network subnets infrastructure. The power modules provide the supply and circuit breaker connections. SSD testing also requires

components such as Quarch modules, software tests and tools in order to execute the various workloads needed for validation.

2.2 Hardware Components

The hardware components comprise the equipment required to enable testing, such as Torridon Quarch systems for hot plug cycles, server boards for the platform, PCIe switches for Gen 4 speed support, and network infrastructure for connectivity.

2.2.1 Torridon System

The Torridon System provides hot plug capabilities for the drive. Without human intervention, drives can be removed and inserted back into the server using the Torridon

System. This allows running many cycles of hot plug and hot swap tests, without the requirement of manual monitoring. [1] There are several applications of the Torridon

System in the storage industry. During the product development stage, bench testing involving cycles of drive insertion and removal followed by checking for drive enumeration can be performed easily. Boundary test conditions, hot swap performance tests and hardware fault tests can be run in an automated environment. The ability to automate hot plug is beneficial during the testing phase, as test engineers can stress the drive with back-to-back hot plug tests without any human intervention. On average, about ten thousand hot plug cycles can be performed within the span of a few days. [1] Using the automated system, several other tests such as


Format, IO, Sanitize and Resets can be combined to run concurrently with hot plug and hot swap testing. The Torridon System consists of a controller and a control module.

The controller provides an interface to the control module. The simplest controller can run a single module, but an array controller can run up to 28 modules. Figure 1 shows a 28-Port Quarch controller with 24 ports in the front and 4 at the back.

Figure 1: 28-Port Quarch Controller [1]

Figure 2: Flex Cable and Quarch [1]

Figure 3: Quarch Connection to the Drive [1]

The drive is connected using a control module and flex cable to the controller. Figure 2 illustrates the Quarch flex cable and module components. The drive is connected to the


Quarch module as shown in Figure 3. The other end of the module can be connected to the server platform. [1] An array controller can connect to 28 modules and can be installed on the server for an easy connection to the drives. A single interface can be used to control all the modules installed on the array controller. [1] Figure 4 shows the complete system set up using the 28-port array controller. The drive is connected to the back plane of the server via the Quarch control module. Flex cables then connect the module to the 28-port controller, which is further connected to a 12V power supply. The system is connected through the network or via RS232 cable and hot plug commands can be sent via test scripts.

Figure 4: Complete Testing Set Up Using the Torridon System [1]


The Torridon System supports several commands which a script can send to perform different operations. The run power up and run power down commands are the primary commands used to hot insert and remove the drive, respectively. Some other supported commands are reset, identify device, assign signals to control, and measure voltage.
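As an illustration of how a test script can drive these operations, the sketch below sends power down and power up commands to an array controller over a TCP connection and loops them for a hot plug stress run. The controller address, port number, module addressing, and exact command strings are assumptions based on the description above; the Torridon documentation (and the vendor's quarchpy Python package, where available) defines the authoritative syntax.

import socket
import time

# Hypothetical controller address and port; check the Torridon documentation
# for the network interface actually exposed by the array controller.
CONTROLLER_ADDR = ("192.168.1.50", 9760)

def send_command(sock, command):
    """Send one text command to the controller and return its reply."""
    sock.sendall((command + "\r\n").encode())
    return sock.recv(4096).decode().strip()

def hot_plug_cycle(sock, module=1, off_time=5, on_time=30):
    """One hot plug cycle: pull the drive, wait, re-insert, wait for enumeration."""
    send_command(sock, f"run power down module {module}")  # assumed command syntax
    time.sleep(off_time)
    send_command(sock, f"run power up module {module}")    # assumed command syntax
    time.sleep(on_time)  # give the host time to re-enumerate the drive

if __name__ == "__main__":
    with socket.create_connection(CONTROLLER_ADDR) as sock:
        for cycle in range(10000):  # back-to-back hot plug stress
            hot_plug_cycle(sock)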

2.2.2 Server Platform

The server platform is one of the primary components of the data center environment and provides the platform for connecting the storage drives. The Intel Server Board S2600WF is a high-performance server board that uses first and second generation Intel Xeon processors and the Intel C624 chipset. [2] The main components of the server are the processor, riser slots for connecting modules, fan connectors, power connectors, NIC ports and

RAID/VROC module as shown in figure 5.

Figure 5: Intel Server Board S2600WF Components [2]


Two sockets are provided to support the connection of the 1st and 2nd generation Xeon processors, with a maximum Thermal Design Power (TDP) of 205W. The memory support involves

24 DIMM slots and DDR4 configuration. The onboard PCIe and NVMe modules comprise 4 OCuLink connectors, 2 M.2 connectors and Intel VMD support. The integrated Baseboard Management Controller (BMC) provides advanced server management capabilities via the Intel RMM4 Lite. There are six system fans supported in two different connector forms to provide the cooling mechanism for the system. [2]

The M.2 SSD form factors can be connected to the two M.2 connectors as shown in

Figure 6. Each M.2 connector can support 80mm form factor PCIe or SATA modules.

Figure 6: M.2 SSD Connectors on the Server Board [2]

The onboard OcuLink connectors provide further options for connecting SSDs using

PCIe to the backplane of the server, as shown in figure 7. Connections for OCuLink SSD0 and SSD1 are routed from CPU 1, and those for SSD2 and SSD3 are routed from CPU 2.


Figure 7: Onboard OcuLink Connectors [2]

The Intel Volume Management Device (VMD) is hardware logic for managing PCIe NVMe SSDs. It provides robust support for functionalities such as hot plug by handling system crashes and hangs while SSDs are inserted and removed. [2] Figure 8 shows the extra hardware involved in the error handling mechanism.

Figure 8: NVMe Error Handling Using VMD [2]


2.2.3 PCIe Switches

PCIe Gen 4 provides about double the speed as compared to PCIe Gen 3. While Gen 3 speed gives the ability for the SSD to operate at 8GT/s, Gen 4 adds the capability for the

SSD to function at 16 GT/s. The higher data transfer rate at lower latencies leads to Gen 4 being a beneficial alternative. [3] However, not all server boards have inbuilt or native support for Gen 4, hence a PCIe switch is required. The SSD can then be connected to the server port via the switch and operate at Gen 4 speed. Microsemi provides several options for Gen 4 switches via the Switchtec technology. Switchtec provides options from 96 to

24 lanes, supporting features such as port bifurcation, Advanced Error Reporting and debug capabilities. [3] Alternatively, the PEX88000 series of switches is provided by

Broadcom for adding Gen 4 functionalities. The PEX88000 series offer up to 48 DMA channels associated with each x2 PCIe port, allowing data transfer between host and the

SSD. The switches include an ARM Cortex-R4 CPU, timers and internal RAM, which can be programmed for I/O and hot plug capabilities. The embedded CPU provides functionalities such as chassis management, LED control, hot add and so on. [4] Figure 9 shows the topology for supporting up to 32 dual-port x2 or 16 single-port x4 NVMe drives.

Figure 9: Broadcom Gen 4 Switch Topology [4]


2.2.4 Network Infrastructure

The network infrastructure provides the communication system that connects the servers, services and the external users. The network topology, routing and protocols (security, ethernet, IP) characterize the network infrastructure of the data center. Figure 10 shows the network topology of a conventional network for a data center. The Top of Rack has the switch that provides the connectivity to the servers on the rack. The aggregation or distribution layer has the aggregation switch (AS) that forwards the traffic from the ToR layer to the core through multiple connections. The core layer is responsible for providing the secure connection between the AS and routers in the core. [5]

Figure 10: Data Center Network Topology [5]


A VLAN (virtual local area network) provides subnetworks to group devices that may reside on different physical LANs. VLANs allow network administrators to provide security and partition traffic based on requirements, enabling systems to be divided into logical groups. Data centers using this design can implement it using switches and hypervisors. Implementing security protocols is important in the case of virtual networks, as any malicious attack on the network can possibly affect all the systems on the subnet.

The subnet mask divides the IP address of a server into a network part and a host address part. [5] Further, the subnets are connected to a Thin Client that provides a secure means to store applications and programs. Using a Thin Client is an effective way to provide centralized computing abilities. It allows functionalities such as software upgrades and application installations to be carried out easily in the data center environment. The applications being installed on the systems can be limited, implementing secure installation of software.

The testing framework and tools can be backed up on the thin client and installed on the servers over the network. [6] This is advantageous if similar configurations need to be loaded on multiple systems, as in the case of the data center servers used for testing

SSDs. Thin Clients usually involve processors with low processing power, since their purpose is mainly to provide centralized application access securely. The computations and intensive workloads are executed by the servers. [6] A firewall forms a layer of security by monitoring incoming and outgoing traffic on the network. The network infrastructure not only connects the servers with end users, but also with several other applications such as remote power control, which provides the ability to remotely turn the systems on and off, essential for testing functionalities that require power cycling the drive.
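As a small illustration of the network/host split described above, the following sketch uses Python's standard ipaddress module with a hypothetical server address and a /24 subnet mask:

import ipaddress

# Hypothetical server address with a 255.255.255.0 (/24) subnet mask.
iface = ipaddress.ip_interface("10.20.30.42/24")

print(iface.network)          # 10.20.30.0/24 -> network portion of the address
print(iface.network.netmask)  # 255.255.255.0 -> the subnet mask itself

# Host portion of the address (the last octet here, i.e. 42).
print(int(iface.ip) & ~int(iface.network.netmask) & 0xFFFFFFFF)

# Two servers on the same /24 subnet can reach each other without routing.
print(ipaddress.ip_address("10.20.30.77") in iface.network)  # True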


2.3 Software Components

Software components such as Python test scripts along with tools such as FIO, Medusa,

NVMe CLI and PCIMEM enable issuing different IO workloads and Admin commands such as format, sanitize and identify to the drive. Linux and Windows Operating Systems

(OS) enable testing in both environments, ensuring the drive is compatible with the two popularly used OS.

2.3.1 Operating Systems

Red Hat Enterprise Linux (RHEL) distributions are popularly used as data center operating systems. RHEL 7 was released in June 2014, followed by RHEL 8 in May

2019. Fedora is the upstream distribution from which the RHEL releases are forked.

RHEL 7 is based on Fedora 19 and RHEL 8 on Fedora 28. Both RHEL and Fedora are open source, giving the user the ability to modify and innovate. [7] RHEL file systems follow the Filesystem Hierarchy Standard (FHS), where the /boot/ directory stores the files that are required during system startup. The /dev/ directory lists the devices physically and virtually attached to the system. This is where the NVMe drive would be listed, along with any RAID volumes. The /etc/ directory stores the system-specific configuration files, and the /opt/ directory is where optional software packages are stored. The /var/ directory is useful for debug purposes as it stores the system logs, such as messages indicating the state of the system and the last log indicating the time the last reboot happened. [8] Journaling file systems such as ext3 provide options to choose from different levels of data protection that help in preventing loss of data during

an unsafe shutdown event. After an unexpected event such as a crash, the ext3 file system should be verified using the e2fsck application. The ext4 file system is an extension to the ext3 file system, which improves efficiency when working with large data by reordering writes issued to the drive. The debug4fs command can be used to find issues with the file system, and e4fsck can be utilized to repair it. [8] These commands are useful when debugging issues such as a kernel crash that can possibly add noise in the validation environment. The /proc/ directory is another useful debug tool which can be used to find the current kernel state and the system view, including what processes are currently running. [8] For example, the cat /proc/cpuinfo command provides information about the

CPU in detail as shown in figure 11. /proc/iomem in figure 12 shows the system memory for the physical devices at a current point in time. This can be used along with the system logs to map the device id with the system memory map.

Figure 11: cat/proc/cpuinfo output [8]


Figure 12: /proc/iomem Output [8]

The Linux distributions have long term and stable kernel versions, which are frequently updated with bug fixes. The latest kernel version and bug fix reports are downloaded from Linux kernel archives. The uname -r command can be used to find the kernel version installed on a system.

Microsoft provides a series of server specific OS as well such as Windows 2012,

Windows 2016 and Windows 2019. Other distributions used for datacenter OS are SUSE,

CentOS and Ubuntu Server.

2.3.2 Test Framework

Python is popularly used in the testing environment as it provides many powerful APIs that can be utilized for extensive testing. As an example, the Pynvme open-source test driver provides APIs that test developers can use to create SSD-specific test code. Pynvme is based on the Storage Performance Development Kit (SPDK), developed by Intel to provide solutions for developing storage applications. Pynvme supports many features for testing functionality and performance.


NVMe devices use the PCIe bus. The PCIe bus connects to the NVMe controller and then to the different namespaces, as shown in figure 13. Hence, a PCIe object is created first and then a controller object to access the NVMe device. [9] This can be done as shown in figure 14.

Admin commands can be sent using the Pynvme APIs. Figure 15 shows an example of the format command being sent after setting the timeout. Several other testing tools such as Flexible I/O Tester, Medusa, NVMe CLI, PCIMEM and Link Training Status State

Machine can be combined into the testing framework built with Python. These tools can then be used to exercise different workloads on the drive such as I/O, back-to-back formats, sanitize, resets and power cycles. The test scenarios can also be combined to run in concurrent threads using the concurrent.futures module of Python (see the sketch after Figure 15). The test tools combined with Python scripts together form the testing framework. The following sections discuss each tool of the framework in detail.

Figure 13: PCIe Bus Connects to Controller and to the Namespaces [14]


Figure 14: Accessing NVMe by Creating PCIe and Controller Object [9]

Figure 15: Admin Command Format [9]
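To illustrate how the framework combines tools into concurrent workloads, the sketch below uses Python's concurrent.futures module to run an I/O job and an admin-command loop in parallel threads. The helper functions and device paths are illustrative placeholders rather than the project's actual test code; the FIO and NVMe CLI invocations they wrap are described in the following sections.

import concurrent.futures
import subprocess
import time

DEVICE = "/dev/nvme0n1"  # hypothetical device node of the drive under test

def io_workload():
    """Issue a short random-write workload with FIO (see section 2.3.3)."""
    return subprocess.run(
        ["fio", "--name=noise_check", f"--filename={DEVICE}",
         "--rw=randwrite", "--ioengine=libaio", "--iodepth=2",
         "--bs=16k", "--runtime=60", "--time_based"],
        capture_output=True, text=True).returncode

def admin_loop(cycles=5):
    """Poll the drive health with NVMe CLI while the I/O job runs (see section 2.3.5)."""
    for _ in range(cycles):
        subprocess.run(["nvme", "smart-log", "/dev/nvme0"], capture_output=True)
        time.sleep(10)
    return 0

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(io_workload), pool.submit(admin_loop)]
    print("workload return codes:", [f.result() for f in futures])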

2.3.3 Flexible I/O Tester

The Flexible I/O Tester (FIO) gives the ability to issue I/O workloads to the drive without the need to write extensive test functions, defining each workload. FIO was written by

Jens Axboe and is available as open source from git. It works by spawning multiple threads that perform a specific IO pattern defined by the user. FIO commands are straightforward to run via the command line or terminal. The format of the command is:

$ fio [options] [job file 1] [Job file 2] …

The above command starts the jobs specified in the job files. To run consecutive

IO workloads, multiple job files can be listed with the command, which are then scheduled in series by the FIO engine. The job file is thus an important parameter to define in order to run workloads. [10] The main parameters to define in the job file are as follows.


• I/O type - Specifies whether the workload issued to the drive is sequential or random. This parameter also specifies if reads and writes are mixed and if the I/O is buffered or direct.

• Block Size - Specifies the size of the chunks in which the I/O is issued. The default block size is 4096. Other sizes can be specified using bs, for example: bs=256k.

• I/O Size - Specifies the amount of data to be read or written.

• I/O Engine - Specifies whether the I/O is memory mapped, spliced, asynchronous or SCSI. Libaio, for example, is the I/O engine for Linux asynchronous workloads.

• I/O Depth - The queue depth for cases when the I/O engine is asynchronous.

• Target file/device - The number of files or devices the workload is issued to.

• Threads/Processes - The number of threads or processes used for the workload. [10]

Apart from the above parameters, FIO also provides several command line options, such as --debug to enable logging of FIO workloads and --output to direct output to a file. An example of running an FIO command is:

$ fio --rw=randwrite --ioengine=libaio --iodepth=2 --bs=16k --direct=0 --numjobs=2

The above command issues an asynchronous random write workload using the libaio engine with an IO depth of 2. The number of jobs specified is 2, which results in spawning 2 identical jobs. After running the FIO workload on the drive, any failures can be investigated and debugged by referring to the output file, an example of which is shown in figure 16. The first line prints the job name with the id of the group, number of jobs and error id if any found. Next the type of I/O is listed, which is a write with

Bandwidth 623 KiB in this case. Slat stands for the submission latency, that is, the time it takes to submit the I/O. Clat is the completion latency and specifies the time for the completion of the I/O. Lat is the total latency, i.e., the total time from I/O issue to completion. CPU usage specifies the details about the system time and number of page faults. [10]

Figure 16: FIO Output [10]
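Because scraping the human-readable output in Figure 16 is error-prone inside a test framework, FIO can be asked for machine-readable output instead. The sketch below runs a similar random-write job with --output-format=json and extracts bandwidth and completion latency; the JSON key names (for example clat_ns) can differ between FIO versions, so treat them as assumptions to verify against the installed release.

import json
import subprocess

def run_fio_json(filename="/dev/nvme0n1"):
    """Run a short random-write job and return FIO's parsed JSON result."""
    cmd = ["fio", "--name=json_demo", f"--filename={filename}",
           "--rw=randwrite", "--ioengine=libaio", "--iodepth=2",
           "--bs=16k", "--size=64m", "--output-format=json"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(out.stdout)

result = run_fio_json()
write = result["jobs"][0]["write"]
print("bandwidth (KiB/s):", write["bw"])            # corresponds to bw in Figure 16
print("IOPS:", write["iops"])
print("mean clat (ns):", write["clat_ns"]["mean"])  # completion latency (clat)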

2.3.4 Medusa Labs Test Tools Suite

The Medusa Labs Test Tools Suite provided by Viavi Solutions gives an efficient way of running data integrity workloads on the drive. Medusa workloads follow a host-target interaction design, where the I/O is initiated by the host and targets the SSD. The workloads can be run using the command line, the Medusa GUI, or by embedding the

commands in the Python scripts. The main tools for I/O used by Medusa are Pain and

Maim. Pain is used for scheduling synchronous I/O per thread, while Maim issues multiple asynchronous I/Os per thread. Table 1 shows the differences between Pain and

Maim. Depending on these attributes, the two can be used for different use cases. [11]

Table 1: Differences Between Pain and Maim [11]

The FindLBA utility of Medusa provides the ability to debug issues such as data corruption. It is possible that the tool logging shows an address that does not match with the exact physical address. Using the FindLBA utility along with an analyzer, the exact memory areas can be determined to see which LBA the data corruption corresponds to.

To start a Medusa workload, a license needs to be checked out from the server. The

GetKey utility provides the ability to check out a license and is especially useful for checking out licenses in offsite testing. The Medusa Agent is a service that uses the TCP protocol to provide functionalities such as license client, system discovery and remote execution.

Running Medusa from the command line requires injecting the test parameters based on the drive size and queue depth to test. As an example, in the following Medusa command a synchronous workload is issued using Pain on physical drive 1. The buffer size is 512k, the thread count is 6, and 125 is the data pattern used. [11]


pain -f\\.\physicaldrive1 -b512k -t6 -125

Figure 17: Medusa Logging Output [11]

The buffer size corresponds to the block size on the drive. The -Q option can be used to specify the queue depth, i.e., the number of outstanding commands to be issued. The

outputs generated after running a Medusa workload are stored in a log file, which can be referenced to debug any errors generated during the testing. There is also an error log file generated in case the workload fails. This is usually under the name thread.bad and has information about the thread that failed due to data corruption. The generation of extensive logs makes Medusa an efficient tool for debugging data integrity issues related to

SSDs. [11] Figure 17 shows the output generated by Medusa along with a description of each log entry. An error with code 13 is logged in this case for a Read workload on

Physical drive 1. The LBA on the drive where the data corruption occurred is also specified.
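The Pain invocation above can also be launched from the Python framework so that the return code and error logs are captured automatically. The following is a minimal wrapper around the exact command shown earlier; the executable path, the assumed -Q8 queue depth, and the thread.bad log naming follow the description in this section and should be confirmed against the installed Medusa Labs Test Tools version.

import pathlib
import subprocess

# Same synchronous Pain workload as shown above: physical drive 1, 512k buffer,
# 6 threads, data pattern 125, plus an assumed queue depth of 8 via -Q.
PAIN_CMD = ["pain", r"-f\\.\physicaldrive1", "-b512k", "-t6", "-125", "-Q8"]

def run_pain_workload(workdir="."):
    """Run the Pain workload and report whether an error log (thread.bad) appeared."""
    proc = subprocess.run(PAIN_CMD, cwd=workdir, capture_output=True, text=True)
    bad_logs = list(pathlib.Path(workdir).glob("*thread.bad*"))  # assumed log name
    if proc.returncode != 0 or bad_logs:
        print("Data integrity failure; inspect:", [str(p) for p in bad_logs])
    else:
        print("Pain workload completed without data corruption errors.")
    return proc.returncode

run_pain_workload()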

2.3.5 NVMe CLI

NVMe CLI is a tool specifically designed for NVMe SSDs to provide functionalities such as drive health monitoring, endurance checks, and issuing admin commands such as format and sanitize. The NVMe CLI commands are available on Linux as open source and directly match the NVMe specifications. This makes the tool readily available and easy to use. For example, the specs define the identify data structure to find device information such as the model number. NVMe CLI has an equivalent command called nvme id-ctrl that corresponds to the identify data structure. [12] On RHEL distributions, the

NVMe CLI package can be installed using the sudo yum install nvme-cli command.

Some useful NVMe CLI commands are shown in Table 2.


Table 2: NVMe CLI Commands [12]

These commands help in testing the basic functionality of the SSD and the output can be used to obtain important drive debug information. Figure 18 shows the output for the smart log command run with nvme CLI. This is useful in obtaining important debug information such as power cycle counts, temperature and critical warnings on the drive.

In cases where the drive temperature exceeds the allowed safe limit of the drive, this command can be run to find out the exact temperature of the drive at a given time. Similarly, to determine whether any unsafe shutdown events happened on the drive, the logging from this command can be used to get information about the unsafe and safe shutdown counts. Media errors during I/O are also indicated in the logs.


Figure 18: NVMe CLI Smart Log Output [12]
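When the smart log is collected inside the test framework, the JSON output mode of NVMe CLI avoids scraping the text shown in Figure 18. The sketch below is a minimal health check; the field names (critical_warning, unsafe_shutdowns, media_errors, power_cycles) follow current nvme-cli JSON output but should be verified against the installed version, and the temperature field is reported by the drive in Kelvin.

import json
import subprocess

def smart_log(dev="/dev/nvme0"):
    """Collect the SMART / health information log for an NVMe controller as JSON."""
    out = subprocess.run(["nvme", "smart-log", dev, "--output-format=json"],
                         capture_output=True, text=True, check=True)
    return json.loads(out.stdout)

log = smart_log()
print("critical warning :", log["critical_warning"])
print("temperature (C)  :", log["temperature"] - 273)  # drive reports Kelvin
print("unsafe shutdowns :", log["unsafe_shutdowns"])
print("media errors     :", log["media_errors"])
print("power cycles     :", log["power_cycles"])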

2.3.6 PCIMEM

Since NVMe SSDs are connected over the PCIe bus, debugging issues often requires accessing and checking PCIe register status. Setting PCIe and controller register values to inject different states is another use case in validating SSDs. PCIMEM gives the ability to read and write the PCIe registers and is available as a git repository. [13] The NVMe specs define several system bus and controller registers. The controller registers are in the PCIe

Bar 0 and Bar 1 mapped to the in-order access space. The Controller Configuration (CC)

register is useful in the validation environment as it gives the ability to exercise control over the drive, such as controller enable and disable. Bit 0 of the CC register corresponds to CC.EN. When this bit is set to 1, the controller processes commands based on the entries in the Submission Queue. When this bit is cleared, the controller is not allowed to process commands or add entries to the Completion Queue. The controller sets the CSTS.RDY bit to 0 in response to the CC.EN bit being cleared. A controller reset occurs when the CC.EN bit goes from 1 to 0, causing deletion of Queues and resetting of the Admin

Queues, causing the drive to go into idle mode. A reset does not affect PCIe level registers such as the MMIO and MSI registers. Setting the CC.EN bit when the CSTS.RDY bit is cleared is not a defined action. [14] The above functionality and register actions can be validated using PCIMEM, as it gives access to these registers. The CC.EN bit can be cleared to 0 using PCIMEM. Once this is done, various checks on the controller can be made to see if it is following the required protocols as per the NVMe specs. One of these would be to check that the CSTS.RDY bit is set to 0 by the controller in response to this action, and within the required amount of time. The syntax for PCIMEM commands is shown in figure 19.

Figure 19: PCIMEM Command Syntax [13]
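Putting the CC and CSTS discussion above into practice, the sketch below drives PCIMEM from Python to clear CC.EN and then polls CSTS.RDY. The register offsets (CC at 0x14, CSTS at 0x1C, bit 0 in each) come from the NVMe specification; the pcimem binary path, the resource0 BAR file for a hypothetical device address, the 'w' word access type, and the way the tool prints its result are assumptions to adapt to the local build (see Figure 19 for the command syntax).

import re
import subprocess
import time

PCIMEM = "./pcimem"                                   # assumed path to the pcimem binary
BAR0 = "/sys/bus/pci/devices/0000:3d:00.0/resource0"  # hypothetical NVMe device BDF

CC_OFFSET, CSTS_OFFSET = 0x14, 0x1C  # NVMe controller register offsets in BAR0
EN_BIT = RDY_BIT = 0x1               # CC.EN and CSTS.RDY are both bit 0

def read32(offset):
    """Read a 32-bit register through pcimem and parse the hex value it prints."""
    out = subprocess.run([PCIMEM, BAR0, hex(offset), "w"],
                         capture_output=True, text=True, check=True).stdout
    return int(re.findall(r"0x[0-9a-fA-F]+", out)[-1], 16)

def write32(offset, value):
    subprocess.run([PCIMEM, BAR0, hex(offset), "w", hex(value)], check=True)

# Clear CC.EN (1 -> 0) to request a controller reset.
write32(CC_OFFSET, read32(CC_OFFSET) & ~EN_BIT)

# Per the spec, the controller must then clear CSTS.RDY within its advertised timeout.
deadline = time.time() + 10
while read32(CSTS_OFFSET) & RDY_BIT:
    if time.time() > deadline:
        raise TimeoutError("CSTS.RDY did not clear after CC.EN was cleared")
    time.sleep(0.1)
print("Controller reported not-ready as expected after the disable.")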


2.3.7 Link Training Status and State Machine

PCIe bus connects the host to the endpoint devices through three different layers:

Transaction Layer, Data Link Layer and Physical Layer. The physical layer configures the link and initializes the connected endpoint devices to enumerate over the PCIe bus.

During the initialization process the link transitions into different states specified by the

Link Training Status and State Machine (LTSSM) illustrated in figure 20. [15]

Figure 20: LTSSM States [15]


During the Detect state the presence of an endpoint device is discovered. The training ordered sets are transmitted during the Polling state. The Configuration state is when the host and drive send and receive packets at a specific data rate and the SSD drive is configured. The Recovery state allows the ability to change the configured data rate. L0 is the normal state where packets are transmitted and received as per the protocols. L0s,

L1 and L2 are increasing levels of power-saving states. The Loopback state is used in case of a fault, and the Disabled state is entered when the device enters electrical idle. [15] The different LTSSM states can be exercised and tested by issuing events such as Hot Reset, Link Disable/Enable, Hot Plug and Link Equalization. The test results are monitored to see if any errors happen during these workloads. PCIe errors can be categorized as Correctable or Uncorrectable errors. Correctable errors can be recovered by the hardware and data is not lost. Uncorrectable errors, on the other hand, are more fatal as they affect the device functionality and cannot be recovered by the hardware. [15] The different error messages indicating the various severities of reported errors are shown in Table 3.

Table 3: PCIe Error Messages [15]
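On Linux hosts, one low-overhead way to watch for the correctable and uncorrectable errors described above between test steps is to read the AER counters the kernel exposes in sysfs. The sketch below assumes a kernel recent enough to provide the aer_dev_* attributes and a hypothetical device address; on older kernels the same information has to be pulled from lspci or the kernel log instead.

from pathlib import Path

# Hypothetical NVMe device; substitute the BDF of the drive under test.
DEV = Path("/sys/bus/pci/devices/0000:3d:00.0")

def read_aer_counters(dev=DEV):
    """Return {counter_name: count} from the AER attributes exposed in sysfs."""
    counters = {}
    for name in ("aer_dev_correctable", "aer_dev_nonfatal", "aer_dev_fatal"):
        path = dev / name
        if not path.exists():  # attribute absent on older kernels
            continue
        for line in path.read_text().splitlines():
            key, _, value = line.rpartition(" ")
            counters[f"{name}:{key}"] = int(value)
    return counters

before = read_aer_counters()
# ... run a hot plug / link training workload here ...
after = read_aer_counters()
deltas = {k: after[k] - before.get(k, 0) for k in after if after[k] != before.get(k, 0)}
print("new PCIe errors during the workload:", deltas or "none")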


CHAPTER 3

DATA COLLECTION AND ANALYSIS

3.1 JIRA Tools for Data Collection

Jira is a popular software tool used widely in the industry for issue tracking and sprint management. Developed by Atlassian for agile product management, it is used by around 75,000 customers around the world. Initially, Jira was designed for the purpose of bug tracking. [16] However, it has evolved over time to have several applications during the requirements, development and testing phases of products. Use cases of Jira include scrum, task management and bug tracking.

The issue tracking feature has an important application in the validation environment.

Issues arising out of the testing are tracked using this feature. Jira filters provide the ability to list issues related to certain features and products. When a Jira ticket is opened, the software allows several customizable data entry options such as product, firmware version, created date, updated date and test. JIRA dashboards provide options to list issues in a personalized way using filters based on these data entries. Further, the filtered issues can be analyzed by adding pie charts, column views and line graphs. The

GUI makes analyzing issues and large data sets efficient. [16] The Jira tool helped in analyzing issues related to noise in the validation environment. Many issues opened from the start of the year to mid-year could be filtered out and analyzed. The created date, organization and noise fields of Jira could be used to list all such issues on a Jira dashboard. The issues could further be broken down into a pie chart based on the hardware or

software component that led to the issues. Using this, it could be determined which components were resulting in what percentage of the total issues filed.

3.2 Noise Categorization

Using the Jira charting tools, an estimate could be made of the main components resulting in the noise. These categories were then further bucketized into larger categories to group issues with a smaller contribution to the pie chart percentages. Categorizing the issues based on the different system components causing the noise helped narrow down the focus for the project. Accordingly, different strategies could be built around how to manage the various categories of noise. The priorities were set for the categories that were contributing the highest percentage of noise. Jira also provides the ability to download the issues from the GUI to a spreadsheet. [16] This was another method used to extract and categorize the issues. The issues from the Jira dashboard could be downloaded into a spreadsheet and then clubbed together into the larger buckets. For example, FIO, Medusa Labs Test Suite, NVMe CLI, PCIMEM and LTSSM issues could be grouped as software tool issues. Issues due to the Python test scripts could be listed as test issues. PCIe switch and Torridon System issues were categorized as hardware issues. Network and server issues were bucketized as infrastructure issues. Adding a column in the spreadsheet according to the new larger buckets enabled filtering issues based on these categories. Pivot tables in the spreadsheet further allowed creating pie charts based on these new buckets. Pie charts generated using the spreadsheet pivot tools indicated which of the categories was leading to what percentage of the issues.


CHAPTER 4

FACTORS CONTRIBUTING TO NOISE

4.1 Key Hot Plug and Hot Swap Issues

Many factors are involved in keeping the validation system and services running.

Torridon Systems execute hot plug and hot swap tests on the drives. The Quarch module is connected to the drive before it is plugged into the server to perform drive unsafe shutdown test scenarios. This way hot plug cycles can be carried out without the need to manually remove the drive. [1] However, any glitches in the

Quarch set up or functioning can result in unexpected power down events during testing.

Such events are undesired in the validation environment and are a source of noise. The ability to do hot swap forms an important testing aspect for SSDs. During a hot swap, the drive is replaced with another drive or switched with a drive in another slot. Hot plug, on the other hand, refers to just the insertion of the SSD into the system. The variability introduced by human and mechanical factors makes hot swap a complex process. In the last few years, SSDs have become faster and more dependent on the host. The tight coupling with the host has resulted in increased failure rates. [17] One of the key issues is the pin connection sequence during hot plug. The pins in the connectors can have differences in length, resulting in not all pins mating at the same time, which can cause unexpected behavior during operation. A planned sequence for the pins to mate is a possible solution to reduce this effect. The connectors in this case have

multiple pin lengths such that more critical pins can be mated at the required time. For example, ground pins are usually connected first to create a system ground as early as possible. [17] The issue of pin bounce occurs when pins do not mate cleanly and bounce in the connection repeatedly before finally connecting successfully. Since pins bounce at an indeterminate rate, the connection each time is unique. This is the reason why hot plug may not always fail but instead exhibits a failure rate such as 1 out of 100 cycles.

Some pins, for example reset and mode select, remain idle until the drive detection is complete. Such pins are less likely to suffer from pin bounce failures as they get sufficient time to successfully connect. Pin bounce can lead to issues such as causing

SMBus transactions to be corrupted. [17] The host status at the time when the hot swap takes place is another key parameter to consider. When the system is in the idle state, there are fewer variables and the hot swap operation is straightforward. However, if the system is busy with an intensive workload, then it involves handling the different transactions, pending workloads, OS tasks and bus transactions. An issue with the hot swap may leave the memory in a corrupted or invalid state, even if this is not visible at that specific point in time. Figure 21 shows the scope capture with Ground, pre-Charge and Power signals. Due to pins mating at different points in time, the signals change state at various time intervals. Figure 22 shows a zoomed-in view of a pull-out event, where pin bounce causes spikes in the signal levels. Figure 23 shows the scope capture during a hot plug event and the effect of pin bounce during the drive insertion operation.


Figure 21: Hot Plug Scope Capture (Gold- Ground, Pink- PreCharge and Blue- Power)[17]

Figure 22: Effect of Pin Bounce During a Pull Event [17]


Figure 23: Effect of Pin Bounce During Hot Plug [17]

The type of host system also affects the hot plug operation. Systems that have more isolation between drive and host provide a more stable environment. This makes some systems simpler than others when it comes to hot plug. On SATA or SAS SSDs, failures are represented by an error at the controller end and managed by the OS. This is because the controller separates the drive and the host. [17] NVMe SSDs, on the other hand, may be connected with no additional controller logic, hence coupling the drive to the host more closely. This way better performance can be achieved, but it comes at the cost of increased dependency on host behavior. The close coupling with the host can even allow a failure in the drive to cause a system crash. Based on the aforementioned key issues with hot plug and hot swap, the following are some important considerations to take into account during testing.


• Scenarios need to be created that are precise and repeatable irrespective of pin mating irregularities.

• Pin bounce cases need to be repeatable in case of a failure, so that they can be debugged and root caused.

• Host timing needs to be simulated and characterized.

• The host type needs to be analyzed and its failure points listed.

Automated testing gives control over the hot swap process but not over the drive and host interaction. An interposer in the bus which does not interfere or retime is required so that the drive and host communication is not altered in the operation. Figure 24 shows an interposer layout with individual power and sideband connectors. Some of these signals require isolation and driving to simulate backplane timings. Conducting interface research enables characterizing the extent of pin bounce involved in a specific scenario.

This can be done using a Protocol Analyzer or by doing a scope capture on the signals. In the case of pin bounce, there is a large set of possible scenarios; therefore, by characterizing a limited set of points and the extreme conditions, the intermediate levels can be assumed to operate as desired as well. [17]

Figure 24: Interposer Layout [17]


4.2 Linux Kernel Crash Events

Drive failures can result in system crash events. However, a kernel crash can have other underlying causes such as faulty drivers, host characteristics, OS installation issues and kernel bugs. Separating the drive failures from other causes of kernel crash is critical. When the

OS detects a fatal error that occurred internally in the system from which it cannot recover safely, it goes into a kernel panic state. The kernel panic state in Linux causes the system to stop functioning, preventing any loss of data that could happen if the system continued to run. [18] Kernel panic can occur due to hardware failures or software bugs. In most cases, the OS can handle the error and continue to run. However, if the system is unstable, leading to a possible security breach, the OS goes into a freeze state as an action to prevent any damage. One common cause of kernel panic is when the kernel is incorrectly configured or installed, resulting in a kernel panic state during boot up after the kernel binary image is recompiled. Faulty devices or RAM installed on the system can also result in kernel panic issues. A panic state can be reached if the drivers are incompatible with the OS or if a root file system cannot be found on the system. Failure to spawn init or the init process terminating can trigger a kernel panic during the final stage of the initialization sequence. [18] Due to the various factors resulting in a kernel panic, several debug tools are used to find the root cause of the crash. In Linux systems, logs such as dmesg and the logs under /var/log store messages about the kernel state and can be used to find more information on the events that may have caused the kernel crash. However, the logging in these may not be sufficient to find the root cause of the crash, and a complete system memory dump may be required for analysis. Kdump is a utility that dumps the system

memory at the time of a crash and saves it so that it can be analyzed once the system has recovered from the crash. It uses kexec, which allows booting a Linux kernel from another kernel's context, bypassing the BIOS. This way the first kernel's memory can be saved. When a crash occurs, kexec is used by kdump to boot into a second kernel that is loaded into a reserved system memory area and cannot be accessed by the first kernel. [8] This kernel captures the system memory of the first kernel in the event of a crash and saves it in the form of a crash dump. This is the only information available when the crash happens, making it a critical source of information for analyzing failures. It is recommended to update the kexec tools during the kernel update cycles. For the crash dump to be saved, it is important for kdump to reserve a part of the system memory, inaccessible to the main kernel, to capture the crash events. The amount of system memory required changes based on the host type. The command uname -m can be used to find out the machine architecture name. On most systems kdump can automatically calculate the amount of memory that will be required and reserve it for storing the crash dump. In cases where the system has less available memory than that typically used by the crash dump engine, the memory can be set aside manually. Kdump is installed by default with the OS install for RHEL 7 in many cases. A kdump configuration screen is provided by the Anaconda installer, which provides an interactive interface with the options for kdump. [8] This provides the options to enable kdump and configure the amount of system memory that will be reserved. The kickstart installation does not provide this option. To add kdump in these cases, the following command needs to be executed: yum install kexec-tools. To check if kdump is installed, the following command is used: rpm -q kexec-tools. RHEL 7.4 and onwards supports the


Intel IOMMU driver. For versions before this, it is recommended to disable the Intel IOMMU when enabling kdump. [8] The system memory reserved for kdump is configured during boot and is specified with the system boot loader. The crashkernel option can be used to specify the value to be assigned. When the value auto is assigned to this option, the configuration of the system memory is done automatically by kdump. The crash dump can be stored as a file local to the system or sent over the network. To send the file via the network, the Network File System (NFS) or Secure Shell (SSH) can be used. By default, it is stored in the crash directory under /var. To analyze the crash, the crash utility can be installed. This gives the ability to analyze the crash dump created by the kdump mechanism. The "yum install crash" command can be used to install this utility via the shell. The debug info package also needs to be installed along with this utility to provide the appropriate packages for the server. [8] After installation, the crash utility can be run with the command shown in figure 25. Various commands can be used with the crash utility for analysis. The log command in figure 26 is used to display the kernel message buffer. The most critical system crash information is stored in the kernel message buffer. This is the first information that is dumped in case of a crash into the dmesg file. It is specifically of use when the vmcore file cannot be accessed due to an issue such as the lack of memory available at the target. The default location for the vmcore is the crash directory. The backtrace (bt) command can be used to display the backtrace for the crash events and to find the address that led to the crash event. The sym command along with this address loads the name of the module at that address. The ps command is used to display the status of the processes in the system. The

vm command is used to display the virtual memory of the processes in the system. The files command is used to display the files that are opened by a specific process. To quit the crash utility, the exit command can be used. [8]

Figure 25: Linux Crash Utility [8]


Figure 26: Log Command to Display Message Buffer [8]
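Because a missing or misconfigured kdump means a crash leaves no vmcore to analyze, it is worth verifying the setup before long test runs. The sketch below performs the checks described above from Python: that kexec-tools is installed, that a crashkernel reservation is present on the kernel command line, and that the kdump service is active. The rpm and systemctl invocations are standard RHEL tooling; treat the overall flow as an illustrative pre-test check rather than the project's actual script.

import subprocess
from pathlib import Path

def kdump_ready():
    """Return True if the host looks ready to capture a crash dump."""
    checks = {}

    # 1. kexec-tools package installed (equivalent to: rpm -q kexec-tools)
    rpm = subprocess.run(["rpm", "-q", "kexec-tools"], capture_output=True, text=True)
    checks["kexec-tools installed"] = rpm.returncode == 0

    # 2. crashkernel memory reserved on the kernel command line
    cmdline = Path("/proc/cmdline").read_text()
    checks["crashkernel reserved"] = "crashkernel=" in cmdline

    # 3. kdump service active
    svc = subprocess.run(["systemctl", "is-active", "--quiet", "kdump"])
    checks["kdump service active"] = svc.returncode == 0

    for name, ok in checks.items():
        print(f"{name:25s}: {'OK' if ok else 'MISSING'}")
    return all(checks.values())

if not kdump_ready():
    print("Fix the kdump configuration before starting long stress runs.")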

Linux provides options for installing different kernel releases available from the Linux kernel archives to be downloaded and installed. The Mainline kernel released every couple of months is where new features are added and is maintained by Linux creator,

Linus Torvalds. Stable kernel versions are the release versions of the Mainline kernel with bug fixes from the mainline release tree. Long Term kernel versions backport fixes from kernel trees that are older and have critical bug fixes. These versions are based on

Stable releases over a longer period of time. [8] Bug fixes are frequently reported and resolved with new releases. The details on specific fixes are available in the kernel archives. Using the Long-Term kernel version can help in avoiding noise as these have bug fixes that have been tested over a longer period of time as compared to the other kernel version types.


4.3 Windows Blue Screen of Death

The equivalent term for kernel panic in Windows is Stop error or Blue Screen of Death

(BSOD) where the bug check code with information on what possibly caused the system crash is displayed on a blue background. Bug check or system crash or stop error all refer to the condition that indicates that safe operation is compromised, and the OS needs to stop the running state. Continuing operation in such conditions can be deemed to be unsafe and compromise the system or data integrity. In case of such an event a crash dump file is saved when crash dump capture is enabled in the system settings.

Alternatively, when a live debugger is attached to the system, the system view switches to the debug mode for further investigation. A blue screen with a bug check code appears when the live debugger is not attached. The bug check or stop code indicates the source of the error such as Page Fault. [19]

Figure 27: Windows BSOD with Bug Check Code [19]


Figure 27 shows the Blue Screen with the Bug Check code in one such scenario.

Sometimes the name of the module may also be available and is displayed. A corresponding hex number is associated with each code, which can be matched with the bug check code reference as shown in figure 28. This contains four parameters that describe the stop code and provides useful information on the events leading to the crash.

Clicking on the specific bug check shown in figure 28 on the online Bug Check reference page leads to the details of that stop code, with a description of each parameter. Figure 29 shows the details of each parameter for the APC Index Mismatch bug check code as an example. Further, the reference also contains insights on possible causes for the bug and debugging techniques. This information can be critical for analyzing a crash; hence there are several ways provided to obtain the code:

• The Event Viewer logs available on Windows systems have event properties that list the stop code parameters.

• The crash dump file can be opened using WinDbg, and the !analyze command can be used to obtain the 4 parameters.

• A live kernel debugger can be attached, which will receive the stop code parameters when a crash happens. [19]

When a live kernel debugger is attached to the system, the system with the crash moves into the debug mode. The Blue Screen does not appear in such a case and the crash details are sent to the debugger interface. Figure 30 shows the debug mode and the

!analyze command executed with the debugger. The Driver Power State Failure bug check code in this case is associated with the hex code 9F. This code can be looked up in

the bug check reference to find information on its parameters, possible causes and recommendations for further debugging of the issue. The kernel debug option is useful for debugging a repetitive issue or when the crash dump does not generate enough data to find the root cause of the crash. The events causing the issue can be recorded, and a workaround can be created once the information on the crash is found from the analysis. Breakpoints are a useful technique that can be utilized to step into and debug code sections. [19]

Figure 28: Bug Check code reference [19]


Figure 29: Bug Check Code Parameter Details [19]

Figure 30: Bug Analysis in Debug Mode [19]

On average, it is observed that the majority of BSOD issues are caused by faulty drivers. To debug driver related bugs, Driver Verifier is a useful utility for analyzing driver faults. Using Driver Verifier, memory resources like memory pools can be examined in real time. When an error is found in the code, an exception for that part is created. This code can then be examined in detail by debugging line by line.


The Driver Verifier utility is built into Windows systems and can be started via the command prompt. Typing verifier opens the utility, which can be further configured to specify which drivers need to be examined. Specifying a limited number of suspected drivers reduces the overhead of loading a large amount of code. Driver Verifier works by stressing drivers, such as kernel and graphics drivers, to test them and expose any illegal behavior or memory corruption issues. The ability to configure which tests and workloads the driver under test is subjected to helps in targeting specific scenarios. Multiple drivers or individual drivers can be tested at the same time by selecting the names of the drivers from a list. Driver Verifier is used with WinDbg to debug driver issues live. [19]

In order to reduce crash issues due to factors not related to the drive, the following are some measures that can be taken:

• Update the chipset driver to the latest version on the system and ensure no yellow exclamation marks are seen in the device manager for the driver.

• Track changes and upgrades made to drivers and system services. If a new bug is found, the deltas in the testing can then be determined and the recently added service can be investigated for bugs.

• If any exclamation marks are seen in the device manager, the event logs can be checked against the related driver properties to determine if there is a faulty driver. [19]

• Monitor the Event logs for any errors by checking the critical errors section. During a BSOD event, check the timestamp to determine which critical error coincides with the time of the BSOD.

• Run hardware diagnosis reports for the different hardware installed on the system.

• Use the Memory Diagnostics tool provided by Windows to check for memory violations. This tool can be accessed from the control panel on the system. The Event logs track the results from the diagnosis in a separate section called Memory Diagnostics results. [19]

• The OS and system applications require free memory for swap files. Ensure that there is sufficient space for these operations; typically 10% to 15% of the drive space should remain available for such OS level operations.

• Use the Safe Mode option when removing hardware. This enables loading only the installed services on boot up. Safe Mode can be accessed from the security update settings. The maintenance mode can be booted into from the recovery startup option, and Safe Mode can be set during the system boot.

• Check compatibility of the installed hardware with the Windows specifications.

• Confirm file system errors are not present by running a file system scan.

• Repair corrupted system files using the System File Checker (SFC) tool. The SFC scan command shown in figure 31 can be used from the command prompt in Windows to run a diagnostic on the files. [19]

• Enable the crash dump option on the system and install debug tools, especially to investigate issues that may not re-occur.


Figure 31: SFC command to check for File System Errors [19]

4.4 Unexpected Shutdown due to Network Instability

Shutdown cycles form an important part of SSD testing. There are several different types of shutdowns defined by the NVMe specs. These include safe shutdown, unsafe shutdown and hot plug. Various drive behaviors can be validated by scheduling the different shutdown events. For example, the SMART counter for unsafe shutdowns should increment when the system suddenly loses power, and the count should not increment during a safe shutdown event or during normal testing. [14] An unexpected loss of system power can result in undesired behavior such as incrementing drive counters and downtime while testing. Functional tests are also affected by these events. For example, if there is a workload issued to the drive, an unexpected shutdown can cause unexpected command aborts. Such events can be difficult to track with the many components involved in the testing. Several debug techniques and logs can enable debugging such events. The main factors resulting in unexpected shutdown events include network glitches, power module failures and Quarch issues. The power module can be used to remotely schedule a shutdown event, and the Quarch can be used for remote hot plug. The network connects the test scripts, power module and Quarch to the system. Network failures can disrupt the traffic on the network, causing, for example, power cycle events to be scheduled at incorrect points in time. Data center failures can be bucketized into link failures or device failures.


• Link Failures: The data center infrastructure is connected by a network of links and switches, and buses such as PCIe connect the various devices. When a link experiences a fault, the failure is categorized as a link failure.

• Device Failures: When a device stops routing traffic in the required way, the failure is categorized as a device failure. This can have different causes, such as hardware faults and downtime.

Analysis of the two types of failures indicated that link issues are variable in occurrence and dependent on the protocol. [20] A closer look at link failures showed that many of them also originate from underlying device failures, and one significant cause of device failures is maintenance updates. It was observed that Top of Rack switches had the lowest number of failures per day, suggesting that low-cost commodity switches were not a cause of the noise. Load balancer links, on the other hand, were found to have a high number of failures, causing traffic associated with load balancers to fail frequently. The links at the higher levels of the network topology were the next most likely failure points compared with the lower links, which had an average failure rate of 5%. The Time to Repair plays an important role when gauging the extent of the failures. [20] The Time to Repair is the downtime from when the failure occurred to when it was resolved, and it indicates the impact of the failure on the service. It was calculated that load balancer issues can be resolved on average within 10 minutes, while link failures on the buses connecting servers on different racks took between 20 and 30 minutes to resolve. Since hardware errors can require replacement of the device, they can take longer to repair than software failures; software issues have downtime associated with upgrades, bug fixes and patching.

Redundancy methods can be employed to lower the total number of network-related failures. Redundant groups of network devices and interconnects can mask failures in the network. [20]
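As a small worked example of the Time to Repair metric (the timestamps and component names below are invented for illustration and are not data from [20]), the downtime per failure can be computed directly from the logged outage start and end times:

# Sketch: compute Time to Repair (TTR) per failure from outage timestamps.
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"
failures = [
    {"component": "load_balancer",   "down": "2020-10-05 10:00", "up": "2020-10-05 10:09"},
    {"component": "inter_rack_link", "down": "2020-10-05 11:30", "up": "2020-10-05 11:55"},
]

for f in failures:
    # TTR is the downtime from when the failure occurred to when it was resolved.
    ttr = datetime.strptime(f["up"], FMT) - datetime.strptime(f["down"], FMT)
    print(f"{f['component']}: time to repair = {ttr.total_seconds() / 60:.0f} minutes")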

The ratio of the average traffic during the failure to the traffic before the failure was computed on a link-by-link basis. To calculate the effectiveness of redundancy, this comparison can be made for different links in the network. If the failure is completely masked, the ratio is expected to be 1, as this indicates that the traffic during the failure is the same as when the link was functioning normally prior to the failure. [20]
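A minimal sketch of this normalization is shown below. The per-link traffic samples are invented for illustration, and the use of the mean as the averaging function is an assumption that follows the description above rather than the exact methodology of [20].

# Sketch: estimate how well redundancy masks a failure by computing, per link,
# the ratio of average traffic during the failure to average traffic before it.
# A ratio close to 1 suggests the failure was fully masked.
from statistics import mean

def masking_ratio(traffic_before, traffic_during):
    return mean(traffic_during) / mean(traffic_before)

# Hypothetical per-link byte counts sampled before and during a failure.
links = {
    "failed_primary_link":   ([980, 1010, 995],  [40, 35, 50]),
    "redundant_backup_link": ([960, 990, 1005],  [930, 970, 1015]),
}

for name, (before, during) in links.items():
    print(f"{name}: ratio = {masking_ratio(before, during):.2f}")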

Figure 32 shows a redundancy group example, where the primary aggregation switch (AggS-P) is connected to the primary access router (AccR-P) as well as to the backup aggregation switch (AggS-B) and the backup access router (AccR-B) to implement redundancy. The backup devices take over routing in case of failures, providing a separate route for traffic. However, implementing redundancy may not always be feasible for all devices due to limitations on cost and space. It may therefore be most useful to establish redundancy for the core devices involved in critical functionality, to reduce the impact of failures.

Figure 32: Redundancy Network Group [20]


CHAPTER 5

CONCLUSION

SSD validation involves several hardware and software components. Hardware components such as server platforms and network infrastructure provide the environment for validation, and Torridon Systems implement the hardware required to perform hot plug and hot swap testing. [1] The OS and test tools are used to issue workloads such as sequential and random I/O. While the tools and the platform allow SSDs to be tested efficiently before release, bugs related to the environment can be a significant source of noise in validation and can mask actual SSD related issues. JIRA dashboards are useful for filtering issues from a specific period of time; using JIRA tools and spreadsheet pivot charts, the main sources of noise could be analyzed, and the factors frequently leading to noise could then be targeted for debugging with a higher priority.

Pin bounce issues during hot plug testing can result in corrupted SMBus transactions. Further, the dependence of hot swap on the host and Quarch related bugs cause hot plug and hot swap to add a significant amount of noise. To improve testing efficiency, solutions for hot plug testing include creating scenarios for precise and repeatable pin mating irregularities and characterizing host timing using the protocol analyzer. [17] To prevent the drive and host communication from being altered, an interposer that does not retime the transactions can be used on the bus.

Linux kernel crashes can have several underlying causes such as faulty drivers, host characteristics, OS installation issues and kernel bugs. When a fatal error occurs on a Linux system, the OS goes into a kernel panic state, where certain functionalities are not allowed in order to prevent any security breaches. [18] The equivalent of this on a Windows system is the Blue Screen of Death (BSOD). Many tools are available for debugging the underlying cause of a kernel panic or BSOD. Kdump is a RHEL utility that dumps and saves the system memory at the time of a crash so that it can be analyzed once the system has recovered. [8] The WinDbg utility on Windows can be used to find the bug check code and parameters leading to the crash. Dmesg logs in Linux and the event logs in Windows also provide more details on the devices and events leading to the crash state of the OS. Using a long-term support kernel version can help in avoiding noise, as its bug fixes have been tested over a longer period of time than the other kernel version types. The majority of BSOD issues are observed to be caused by faulty drivers. Measures that can be taken to avoid crash issues include updating the chipset driver, monitoring for faulty drivers in the Device Manager, using the Memory Diagnostics tool provided by Windows to check for memory violations, and running the SFC scan command from the Windows command prompt to run a diagnostic on the system files. [19]

Network failures can disrupt traffic on the network, for example by causing power cycle events to be scheduled at incorrect points in time. Data center failures can be bucketized into link failures and device failures, and the Time to Repair plays an important role when gauging the extent of the failures. Redundancy methods can be employed to lower the total number of network-related failures. [20] Given the limitations on cost and space, it may be most useful to implement redundancy for the core devices.


REFERENCES

[1] “Quarch Technology Ltd Torridon System User Manual”, Quarch Technology. [Online]. Available: https://quarch.com/downloads/manual/. [Accessed Oct 1, 2020]

[2] “Technical Specifications for the Intel® Server Board and Intel® Server System Based on Intel® Server Board S2600WF Family”, Intel. [Online]. Available: https://www.intel.com/content/www/us/en/support/articles/000023750/server-products/server-boards.html. [Accessed Oct 3, 2020]

[3] “PCI Express® Solutions Field-Proven, Interoperable and Standards-Compliant Portfolio”, Microchip. [Online]. Available: http://ww1.microchip.com/downloads/en/DeviceDoc/00003074A.pdf. [Accessed Oct 6, 2020]

[4] “ExpressFabric PCIe Gen 4.0 and Gen 3.0 Switch and Retimer Solutions”, Broadcom. [Online]. Available: https://docs.broadcom.com/doc/BC-0484EN. [Accessed Oct 15, 2020]

[5] M. F. Bari et al., "Data Center Network Virtualization: A Survey," IEEE Communications Surveys & Tutorials, vol. 15, 2013.

[6] J. Saetent, N. Vejkanchana and S. Chittayasothorn, "A thin client application development using OCL and conceptual schema," 2011 International Conference for Internet Technology and Secured Transactions, Abu Dhabi, 2011.

[7] “Red Hat Enterprise Linux”, Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux. [Accessed Oct 3, 2020]

[8] “Deployment, configuration and administration of Red Hat Enterprise Linux 5”, Red Hat. [Online]. Available: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/deployment_guide/index. [Accessed Oct 9, 2020]

[9] “Pynvme Docs”, Pynvme. [Online]. Available: https://pynvme.readthedocs.io/features.html. [Accessed Oct 14, 2020]

[10] “fio - Flexible I/O tester rev. 3.23”, FIO. [Online]. Available: https://fio.readthedocs.io/en/latest/fio_doc.html. [Accessed Oct 21, 2020]

[11] “Medusa Labs Test Tools Suite Version 7.4”, Viavi Solutions. [Online]. Available: https://www.viavisolutions.com/en-us/literature/medusa-labs-test-tools-suite-users-guide-version-74-manual-user-guide-en.pdf. [Accessed Oct 15, 2020]

[12] “Open Source NVMe™ Management Utility – NVMe Command Line Interface (NVMe-CLI)”, NVM Express. [Online]. Available: https://nvmexpress.org/open-source-nvme-management-utility-nvme-command-line-interface-nvme-cli. [Accessed Sep 30, 2020]

[13] B. Farrow, “PCIMEM”. [Online]. Available: https://github.com/billfarrow/pcimem. [Accessed Oct 6, 2020]

[14] “NVM Express Base Specification, Revision 1.4”, NVM Express, June 10, 2019. [Online]. Available: https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4-2019.06.10-Ratified.pdf.

[15] “PCI Express® Base Specification Revision 5.0 Version 1.0”, PCI-SIG, May 22, 2019. [Online]. Available: https://pcisig.com/specifications.

[16] “Jira use cases”, Atlassian. [Online]. Available: https://www.atlassian.com/software/jira/guides/use-cases. [Accessed Oct 3, 2020]

[17] “Testing hot-swap on storage devices”, Quarch Technology. [Online]. Available: https://quarch.com/downloads/manual/. [Accessed Oct 19, 2020]

[18] “Kernel panic”, Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/Kernel_panic#Causes. [Accessed Oct 12, 2020]

[19] “Debugging Tools for Windows”, Microsoft. [Online]. Available: https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger. [Accessed Oct 21, 2020]

[20] P. Gill, N. Jain, and N. Nagappan, "Understanding network failures in data centers: measurement, analysis, and implications," SIGCOMM Comput. Commun. Rev., vol. 41, 2011.

[21] S. R. Smoot and N. K. Tan, "Private Cloud Computing", MK Publishers, 2012.