Venti Analysis and Memventi Implementation
Total Page:16
File Type:pdf, Size:1020Kb
Master’s thesis Venti analysis and memventi implementation Designing a trace-based simulator and implementing a venti with in-memory index Mechiel Lukkien [email protected] August 8, 2007 Committee: Faculty of EEMCS prof. dr. Sape J. Mullender DIES, Distributed and Embedded Systems ir. J. Scholten University of Twente ir. P.G. Jansen Enschede, The Netherlands 2 Abstract [The next page has a Dutch summary] Venti is a write-once content-addressed archival storage system, storing its data on magnetic disks: each data block is addressed by its 20-byte SHA-1 hash (called score). This project initially aimed to design and implement a trace-based simula- tor matching Venti behaviour closely enough to be able to use it to determine good configuration parameters (such as cache sizes), and for testing new opti- misations. A simplistic simulator has been implemented, but it does not model Venti behaviour accurately enough for its intended goal, nor is it polished enough for use. Modelled behaviour is inaccurate because the advanced optimisations of Venti have not been implemented in the simulator. However, implementation suggestions for these optimisations are presented. In the process of designing the simulator, the Venti source code has been investigated, the optimisations have been documented, and disk and Venti per- formance have been measured. This allowed for recommendations about per- formance, even without a simulator. Beside magnetic disks, also flash memory and the upcoming mems-based storage devices have been investigated for use with Venti; they may be usable in the near future, but require explicit support. The focus of this project has shifted towards designing and implementing memventi, an alternative implementation of the venti protocol. Memventi keeps a small part of the scores in main memory for lookups (whereas Venti keeps them on separate index disks). Memventi therefore does not scale to high storage capacity, but is much easier to set up, much simpler in design, and has good performance. 4 Abstract [The remainder of this report is written in English.] Venti is een programma dat blokken data opslaat op magnetische harde schijven. Een eenmaal geschreven blok data kan nooit meer verwijderd of over- schreven worden; het kan uitsluitend worden geadresseerd aan de hand van de 20-byte SHA-1 hash (genaamd de score) van de data. In eerste instantie was het doel van dit project het ontwerpen en imple- menteren van een trace-based simulator die Venti’s gedrag nauwkeurig genoeg simuleert om gebruikt te kunnen worden voor het bepalen van configuratiepa- rameters zoals groottes van de diverse caches, maar ook voor het testen van nieuwe optimalisaties. De ge¨ımplementeerde simulator is echter niet nauwkeurig genoeg om hiervoor te kunnen worden gebruikt, en is ook niet genoeg door- ontwikkeld voor normaal gebruik. Het gemodelleerde gedrag wijkt te veel af voornamelijk doordat de geavanceerde optimalisaties die in Venti aanwezig zijn niet in de simulator zijn ge¨ımplementeerd. Wel worden ontwerpen voor, en opmerkingen over mogelijke simulatorimplementaties van deze optimalisaties vermeld. Tijdens het ontwerpen van de simulator is de broncode van Venti gelezen, zijn de aanwezige optimalisaties gedocumenteerd, en zijn benchmarks van harde schrijven en van Venti uitgevoerd. Met het inzicht dat hiermee verkregen is, kun- nen ook zonder simulator aanbevelingen worden gedaan over goed presenterende Venti configuraties. Behalve magnetische harde schijven zijn ook flash geheugen en de nog in ontwikkeling zijnde mems-schijven onderzocht voor gebruik met Venti. Ze zullen in de toekomst te gebruiken zijn, maar hebben dan expliciete ondersteuning nodig in Venti. De focus van dit project is verschoven naar het ontwerpen en implementeren van memventi, een alternatieve implementatie van het venti protocol. Memventi houdt van elke aanwezige score een klein deel van de 20 bytes in het geheugen, precies genoeg om de kans op dubbelen hierin klein genoeg te houden (Venti zelf bewaart alle scores op index schijven). Memventi gebruikt relatief weinig geheugen, maar schaalt niet naar de hoge opslagcapaciteit waar Venti wel naar schaalt. Memventi is echter veel eenvoudiger in ontwerp en om in gebruik te nemen, en levert goede prestaties. 6 Preface I would like to thank everyone who has helped me with my master’s project, both on technical and non-technical level. This master’s thesis has taken me longer to finish than I (and others) had hoped, but I am content with the results. I am very thankful to Sape Mullender for supervising this project, helping with technical issues and being patient and helpful from start to finish. The same goes for Axel Belinfante, our regular progress talks have been very useful on the technical level and have helped keeping this thesis on track. I hope you enjoy reading this thesis. I am continuing development of tools related to Venti, so please direct further questions or discussion to my e-mail address, [email protected]. Mechiel Lukkien Enschede, July 11th 2007 8 Contents 1 Introduction 13 1.1 Protocol ................................ 15 1.2 Hash trees ............................... 15 1.3 Performance requirements ...................... 18 1.4 Report ................................. 20 2 Venti design 21 2.1 Index sections, Arenas ........................ 21 2.2 Terminology .............................. 22 2.3 Optimisations ............................. 25 2.3.1 Disk block cache, dcache ................... 25 2.3.2 Index entry cache, icache .................. 26 2.3.3 Lump cache, lcache ...................... 26 2.3.4 Bloom filter .......................... 27 2.3.5 Queued writes ........................ 27 2.3.6 Lump compression ...................... 27 2.3.7 Disk block read-ahead .................... 28 2.3.8 Lump read-ahead ....................... 28 2.3.9 Prefetch index entries .................... 28 2.3.10 Sequential writes of index entries .............. 28 2.3.11 Sequential writes of disk blocks ............... 28 2.3.12 Opportunistic hash collision checking ............ 29 2.3.13 Scheduler ........................... 29 2.3.14 Zero-copy packets ...................... 29 3 Venti clients 31 3.1 Vac ................................... 31 3.2 Fossil .................................. 34 3.3 Conclusion .............................. 35 4 Disks & performance 37 4.1 Disk internals ............................. 37 4.1.1 Implications for Venti .................... 39 4.2 Operating system disk handling ................... 40 4.3 Testing hard disks .......................... 40 4.4 SCSI disk results ........................... 42 4.5 IDE disk results ............................ 44 4.6 Conclusion .............................. 45 9 4.7 Alternatives to magnetic disks .................... 46 4.7.1 Compact-flash memory ................... 46 4.7.2 MEMS-based storage ..................... 50 5 Venti performance 53 5.1 Basic Venti performance ....................... 53 5.1.1 Analysis ............................ 54 5.2 SHA-1 performance .......................... 55 5.3 Whack performance ......................... 55 6 Venti simulator 57 6.1 Design ................................. 58 6.2 Trace files ............................... 60 6.3 Vsim and vtrace ........................... 62 6.4 Future work .............................. 63 6.4.1 Bloom filter .......................... 63 6.4.2 Index entry prefetch from arena directory ......... 63 6.4.3 Better disk model and multiple disks ............ 64 6.5 Conclusions .............................. 64 7 Memventi design & implementation 67 7.1 Storing only a part of the score ................... 68 7.2 Implementation ............................ 70 7.2.1 Features ............................ 71 7.2.2 Data integrity ......................... 73 7.2.3 Differences with design .................... 74 7.2.4 Performance ......................... 75 7.2.5 Problems ........................... 79 7.2.6 Future work .......................... 79 7.2.7 Conclusion .......................... 82 8 Conclusions 83 9 Future work 87 Bibliography 93 A Test setup 95 B Disk tests 97 B.1 SCSI disk ............................... 97 B.2 IDE disk ................................ 97 C Basic Venti tests 103 C.1 sametest ................................ 103 D Memventi tests 105 E Compact flash tests 107 E.1 IDE to flash adaptor ......................... 107 E.2 USB flash memory card reader ................... 107 10 F Technical documentation 111 F.1 Pseudo code for handling venti operations ............. 111 G Tools 115 G.1 ptio.c ................................. 115 G.2 test-sha1.c ............................... 120 G.3 test-whack.c .............................. 121 11 12 Chapter 1 Introduction Venti [1, 2, 3] is a write-once content-addressed archival block storage server. It is a user-space daemon communicating with software clients over tcp connec- tions. The blocks stored can be up to 56kb in size. Once stored, blocks can never be removed. The address of the data is the 160-bits SHA-1 hash (called a score) of that data. SHA-1 is a cryptographic secure hash, the implications of which are explained later in this section. Data blocks are stored on magnetic disk allowing for fast random access. In essence, Venti provides the following two functions: 1. write(data, type) → score 2. read(score, type) → data Venti only stores data, and retrieves it later on request. It does not interpret the data, nor does it need to. The type parameter to the read and write functions is part of the address, and used mostly for convenience,