XtreemFS - a Distributed and Replicated Cloud File System

Michael Berlin, Zuse Institute Berlin
DESY Computing Seminar, 16.05.2011

Who We Are
- Zuse Institute Berlin:
  - operates the HLRN supercomputer (ranked #63 and #64)
  - research in computer science and mathematics
- Parallel and Distributed Systems Group:
  - led by Prof. Alexander Reinefeld (Humboldt University)
  - works on distributed and failure-tolerant storage systems

Who We Are (2)
- Michael Berlin
- PhD student since 03/2011
- studied computer science at Humboldt-Universität zu Berlin
- Diplom thesis dealt with XtreemFS
- currently working on the XtreemFS client

Motivation
- Problem: multiple copies of data. Where is each copy? Is it complete? Are there different versions?
- [Diagram: a PC and cluster nodes holding copies on a local file server and on internal and external cluster storage]

Motivation (2)
- Problem: different access interfaces for the same data
- [Diagram: a laptop via 3G/Wi-Fi, an external PC, and cluster nodes reach the data through SCP, VPN, NFS/Samba, SSHFS, or a parallel file system]

Motivation (3)
- XtreemFS goals: transparency and availability
- [Diagram: all devices access the same data through XtreemFS]

File Systems Landscape
- [Figure: overview of the file systems landscape]

Outline
1. XtreemFS Architecture
2. Client Interfaces
3. Read-Only Replication
4. Read-Write Replication
5. Metadata Replication
6. Customization through Policies
7. Security
8. Use Case: MoSGrid
9. Snapshots

XtreemFS Architecture (1)
- Volume on a metadata server: provides the hierarchical namespace
- File content on storage servers: accessed directly by clients

XtreemFS Architecture (2)
- Metadata and Replica Catalog (MRC): holds volumes
- Object Storage Devices (OSDs):
  - file content is split into objects
  - objects can be striped across OSDs (see the sketch after this section)
- This is an object-based file system architecture.

Scalability
- File I/O throughput: parallel I/O scales with the number of OSDs
- Storage capacity: OSDs can be added and removed; OSDs may be used by multiple volumes
- Metadata throughput: limited by the MRC hardware; use many volumes spread over multiple MRCs

Accessing Components
- Directory Service (DIR): central registry
  - all servers (MRC, OSD) register there with their ID
  - provides: the list of available volumes, the mapping from service ID to URL, and the list of available OSDs

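To make the object mapping concrete, here is a minimal sketch of how an object-based, striped layout can map a byte offset to an object number and an OSD. The fixed object size and round-robin placement are assumptions for illustration; XtreemFS's actual striping policies are configurable and may differ.

```java
/**
 * Sketch of object-based striping, assuming a fixed object size and
 * round-robin placement across OSDs. All constants and names here
 * are illustrative, not the actual XtreemFS defaults.
 */
public class StripingSketch {
    static final int OBJECT_SIZE = 128 * 1024; // assumed: 128 KiB objects

    /** Object number that contains the given byte offset. */
    static long objectNumber(long offset) {
        return offset / OBJECT_SIZE;
    }

    /** Index of the OSD holding that object (round-robin striping). */
    static int osdIndex(long objectNumber, int numOsds) {
        return (int) (objectNumber % numOsds);
    }

    public static void main(String[] args) {
        int numOsds = 3;
        long offset = 1_000_000; // byte offset within the file
        long obj = objectNumber(offset);
        System.out.printf("offset %d -> object %d on OSD %d%n",
                offset, obj, osdIndex(obj, numOsds));
    }
}
```

Because clients compute this mapping themselves, they can read and write objects on different OSDs in parallel, which is why I/O throughput scales with the number of OSDs.
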
Client Interfaces
- XtreemFS supports the POSIX interface and semantics
- mount.xtreemfs: uses FUSE; runs on Linux, FreeBSD, OS X, and Windows (via Dokan)
- libxtreemfs for Java and C++

Read-Only Replication
- Requirement: the file must be marked as read-only
- Replica types:
  a. full replica: requires a complete copy
  b. partial replica: fills itself on demand; instantly ready to use

Read-Only Replication (2)
- [Figure: a partial replica on external cluster storage fetching objects from internal cluster storage]

Read-Only Replication (3)
- Receiver-initiated transfer at the object level; OSDs exchange object lists
- Filling strategies: fetch objects in order, or rarest first (sketched below)
- Prefetching is available
- On-Close Replication: automatic replica creation when a file is closed

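As an illustration of the "rarest first" filling strategy, the sketch below picks the next object to fetch as the missing object held by the fewest other replicas. All names are hypothetical; this is not the actual XtreemFS code.

```java
import java.util.*;

/**
 * Sketch of a "rarest first" filling strategy for a partial replica:
 * among the objects this replica is still missing, fetch the one held
 * by the fewest other replicas first.
 */
public class RarestFirstSketch {
    /**
     * @param missing     object numbers this replica does not have yet
     * @param remoteLists object lists reported by the other replicas
     * @return the missing object held by the fewest replicas
     */
    static long pickNext(Set<Long> missing, List<Set<Long>> remoteLists) {
        long best = -1;
        int bestCount = Integer.MAX_VALUE;
        for (long obj : missing) {
            int count = 0;
            for (Set<Long> remote : remoteLists) {
                if (remote.contains(obj)) count++;
            }
            if (count > 0 && count < bestCount) { // must exist somewhere
                bestCount = count;
                best = obj;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<Long> missing = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        List<Set<Long>> remotes = Arrays.asList(
                new HashSet<>(Arrays.asList(1L, 2L)),
                new HashSet<>(Arrays.asList(2L, 3L)));
        // Objects 1 and 3 are each held by one replica, object 2 by two,
        // so a rare object (1 or 3) is fetched before object 2.
        System.out.println("fetch object " + pickNext(missing, remotes));
    }
}
```
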
Read-Write Replication
- Goals: availability, data safety, allowing modifications
- [Diagram: important.cpp replicated on the local file server and on internal cluster storage]

Read-Write Replication (2-4)
- Approach: primary/backup replication
  1. Lease acquisition: at most one valid lease per file; revocation = lease timeout (see the sketch after this section)
  2. Data dissemination

Read-Write Replication (5)
- Lease acquisition: instead of a central lock service, XtreemFS uses Flease, which is scalable and majority-based
- Data dissemination update strategies: Write All / Read 1, or Write Quorum / Read Quorum

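The following minimal sketch illustrates the lease rule itself: at most one valid lease per file, with revocation realized purely by timeout. It is a single-process illustration with an assumed timeout value; in XtreemFS, leases are actually negotiated between OSDs with the majority-based Flease algorithm.

```java
/**
 * Sketch of the lease rule used for primary/backup replication:
 * at most one valid lease per file, and "revocation" simply means
 * waiting for the current lease to time out.
 */
public class LeaseSketch {
    static final long LEASE_TIMEOUT_MS = 15_000; // assumed lease span

    private String owner;          // current lease holder, or null
    private long expiresAtMillis;  // when the current lease times out

    /** Grant the lease if it is free or expired; otherwise refuse. */
    synchronized boolean tryAcquire(String replicaId, long nowMillis) {
        if (owner == null || nowMillis >= expiresAtMillis) {
            owner = replicaId;                      // becomes the primary
            expiresAtMillis = nowMillis + LEASE_TIMEOUT_MS;
            return true;
        }
        return owner.equals(replicaId);             // renewal by the holder
    }

    public static void main(String[] args) {
        LeaseSketch lease = new LeaseSketch();
        long now = System.currentTimeMillis();
        System.out.println(lease.tryAcquire("osd1", now)); // true: osd1 is primary
        System.out.println(lease.tryAcquire("osd2", now)); // false: lease still valid
        // After the timeout, osd2 can take over without explicit revocation.
        System.out.println(lease.tryAcquire("osd2", now + LEASE_TIMEOUT_MS)); // true
    }
}
```
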
Metadata Replication
- Primary/backup replication: a volume = a database
  - the database is replicated transparently
  - leases are used to elect the primary
  - inserts/updates/deletes are replicated
- The database is a key/value store; own implementation: BabuDB

Customization through Policies
- Example: which replica shall the client select? This is determined by policies.
- Policies exist for: authentication, authorization, UID/GID mappings, replica placement, replica selection

Customization through Policies (2)
- Replica placement/selection policies filter, sort, or group the replica list
- Available default policies: FQDN-based, datacenter map, Vivaldi (latency estimation)
- Policies can be chained; own policies are possible (in Java); a sketch follows below
- [Diagram: on open(), the MRC returns the replica list sorted for the client, e.g. node1.ext-cluster is directed to osd1.ext-cluster before osd1.int-cluster]

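As a sketch of what an FQDN-based selection policy does, the code below sorts the replica list by the length of the common domain suffix with the client's hostname. The interface and names are hypothetical and far simpler than the real XtreemFS policy plugin API.

```java
import java.util.*;

/**
 * Sketch of an FQDN-based replica selection policy: sort the replica
 * list so that OSDs sharing the longest domain suffix with the client
 * come first.
 */
public class FqdnPolicySketch {

    /** Number of trailing domain labels two hostnames share. */
    static int commonSuffixLabels(String a, String b) {
        String[] la = a.split("\\."), lb = b.split("\\.");
        int n = 0;
        while (n < la.length && n < lb.length
                && la[la.length - 1 - n].equals(lb[lb.length - 1 - n])) {
            n++;
        }
        return n;
    }

    /** Sort replicas: best FQDN match for the client first. */
    static List<String> sortReplicas(String clientFqdn, List<String> osds) {
        List<String> sorted = new ArrayList<>(osds);
        sorted.sort(Comparator.comparingInt(
                (String osd) -> commonSuffixLabels(clientFqdn, osd)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<String> osds = Arrays.asList("osd1.int-cluster", "osd1.ext-cluster");
        // A client in the external cluster is directed to the external OSD first:
        // prints [osd1.ext-cluster, osd1.int-cluster]
        System.out.println(sortReplicas("node1.ext-cluster", osds));
    }
}
```

This reproduces the example in the diagram above: the client on node1.ext-cluster sees osd1.ext-cluster at the head of its replica list.
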
Security
- X.509 certificate support for authentication
- SSL to encrypt communication
- [Diagram: a laptop via 3G/Wi-Fi mounts XtreemFS with a user certificate; external cluster nodes mount with a host certificate]

Use Case: MoSGrid
- MoSGrid eases running experiments in computational chemistry
- Grid resources are used through a web portal; the portal allows users to submit compute jobs and retrieve their results
- XtreemFS serves as the global data repository

Use Case: MoSGrid (2)
- [Diagram: the browser submits jobs and retrieves results through the UNICORE web portal frontend, which uses libxtreemfs (Java); cluster nodes mount XtreemFS with a host certificate, PCs with a user certificate; the XtreemFS installation spans Berlin, Köln, and Dresden]

Snapshots
- Backups are needed in case of accidental deletion/modification or virus infections
- A snapshot is a stable image of the file system at a given point in time
- [Diagram: unlink("important.cpp") removes the current copy; the snapshot on the internal cluster storage preserves it]

Snapshots (2)
- MRC: creates a snapshot when requested
- OSDs: copy-on-write
  - on modify: create a new object version instead of overwriting
  - on delete: only mark the object as deleted
- [Timeline: snapshot() at t0; write("file.txt") produces version V1 at t1 and V2 at t2]

Snapshots (3)
- There is no exact global time: clocks are only loosely synchronized; assumption: maximum drift ε
- Snapshots are therefore time span-based: a snapshot at t0 must consider writes in the interval [t0 - ε, t0 + ε]
- [Timeline: writes at t0 - ε and t0 + ε around snapshot() at t0 may both be included]

Snapshots (4)
- OSDs limit the number of versions: no version on every write
- Instead, close-to-open semantics are used; problem: the client sends no explicit close
  - implicit close: create a new version if the last write was at least X seconds ago
- A cleanup tool deletes versions which belong to no snapshot
- Snapshots are also possible at the directory level
- (A copy-on-write versioning sketch follows below.)

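To make the copy-on-write versioning concrete, here is a small sketch where each write creates a new timestamped version and a snapshot read returns the newest version that existed at snapshot time. It deliberately ignores the ±ε ambiguity window and the implicit-close batching described above; all names are illustrative.

```java
import java.util.*;

/**
 * Sketch of copy-on-write versioning on an OSD: every write creates a
 * new timestamped version instead of overwriting, and reading through
 * a snapshot picks the newest version that existed at snapshot time.
 */
public class CowVersionsSketch {
    /** version timestamp -> object content */
    private final NavigableMap<Long, String> versions = new TreeMap<>();

    /** Copy-on-write: a write adds a version; older ones stay intact. */
    void write(long timestamp, String content) {
        versions.put(timestamp, content);
    }

    /** Content visible in a snapshot taken at snapshotTime. */
    String readAtSnapshot(long snapshotTime) {
        Map.Entry<Long, String> e = versions.floorEntry(snapshotTime);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        CowVersionsSketch file = new CowVersionsSketch();
        file.write(100, "V1");
        file.write(200, "V2");                        // does not overwrite V1
        System.out.println(file.readAtSnapshot(150)); // V1
        System.out.println(file.readAtSnapshot(250)); // V2
    }
}
```
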
Future Research
- Self-tuning
- Quota support
- Data de-duplication
- Hierarchical storage management

XtreemFS Software
- Open source: www.xtreemfs.org
- Development: 5 core developers at ZIB; integration tests for quality assurance
- Community: users and bug reporters; mailing list with 102 subscribers
- Release 1.3: experimental support for read/write replication and snapshots

Thank You!
References: http://www.xtreemfs.org/publications.php
www.contrail-project.eu: The Contrail project is supported by funding under the Seventh Framework Programme of the European Commission: ICT, Internet of Services, Software and Virtualization (GA no. FP7-ICT-257438).