Virtually Persistent Data

Virtually Persistent Data

Virtually Persistent Data Andrew Warfield XenSource Quick Overview • Blktap driver overview/update – Performance vs. Safety – Architecture – Tapdisks • Block consistency and live migration – Problem, current solution, future solution Why BlkTap? • It turns out that performance isn’t the only requirement for VM storage. • Other requirements: – Correctness – Managability (e.g. file-based disks) – Crazier things (CoW/encryption/network) • Doing these things at the Linux block layer is tricky. Example: blkback + loopback Dom0 DomU blkback blkfront disk Example 1: Normal block device operation. • Go through safety concerns, mention fix in loopback2, also mention qos/block shed issues. Example: blkback + loopback Dom0 DomU blkback blkfront disk Example 1: Normal block device operation. • Blkback preserves the semantics of the physical disk. • DomU is notified of completion once data is really written. Example: blkback + loopback Dom0 DomU OOM Killer nfs loop blkback blkfront Example 2: Loopback and NFS • New loopback driver should fix this. • Associating requests with a process has qos/sheduling benefits. Architecture Blktapctrl tapdisk tapdisk Each tapdisk process handles requests for a VBD. Blktap Blkfront Blkfront Dom0 DomU DomU • Blktap handles IDC • Tapdisk implements a virtual block device – Synchronous, AIO, QCoW, VMDK, Shared mem • Zero-copy throughout • Tapdisks are isolated processes Current tapdisks • Individual tapdisk drivers may struct tap_disk be implemented as plugins. tapdisk_aio = { • Generally a few hundred lines "tapdisk_aio", of code. sizeof(struct • Very similar to qemu’s block tdaio_state), plugins, but with an tdaio_open, asynchronous interface. tdaio_queue_read, • Currently have asynchronous raw (device or image file), tdaio_queue_write, QCoW, VMDK, and shared- tdaio_submit, memory disks. tdaio_get_fd, tdaio_close, tdaio_do_callbacks, }; Blktap Performance CoW/Image formats • Many image formats now exist for block devices – QCoW, VMDK, VHD • Work a lot like page tables: mapping tree for block address resolution – Typically use a bitmap at the bottom level • Ensuring consistency adds overhead – Metadata is dependent on data – Must be written before acknowledging request to DomU • We are currently working with a modified version of the QCoW format – 4K blocks, all-at-once extent allocation, bitmaps. Migration Consistency Dom0 W DomU Dom0 W W W blkfront W W Host A Host B • Drivers use request shadow rings to handle migration • Unacked requests are reissued on arrival • Slight risk of write-after-write hazard. • Now fixed, more optimal plan for later. What’s next? • Copy on Write • Live snapshots • Maybe some network-based/distributed storage. end Performance(2).

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    14 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us