
About SSD

Dongjun Shin, Samsung Electronics

Outline

- SSD primer
- Optimal I/O for SSD
- Benchmarking FS on SSD
  • Case study: ext4, btrfs, xfs
- Design considerations for SSD
- What's next?
  • New interfaces for SSD
  • Parallel processing of small I/O

SSD Primer (1/2)

- Physical units of NAND flash
  • PageNAND – unit for read & write
  • BlockNAND – unit for erase (a.k.a. erasable block)
- Physical characteristics
  • Erase before re-write
  • Sequential write within an erasable block

[Figure: the Flash Translation Layer (FTL) maps the LBA space (visible to the OS) onto the flash memory space; a NAND page is 2-4kB, and a NAND block is 64-128 NAND pages]
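To make the erase-before-write constraint concrete, here is a toy page-mapped FTL sketch. The table layout, names, and sizes are illustrative assumptions, not the design of any real SSD firmware:

    #include <stdio.h>

    #define PAGES_PER_BLOCK 64            /* NAND block = 64-128 pages        */
    #define NUM_BLOCKS      16
    #define NUM_PAGES       (PAGES_PER_BLOCK * NUM_BLOCKS)

    static int l2p[NUM_PAGES];            /* logical page -> physical page    */
    static int next_free;                 /* write pointer, moves sequentially */

    /* Overwrites never erase in place: the data goes to the next free page
       and the mapping is updated; the stale page is left for garbage
       collection to reclaim later. */
    static void ftl_write(int lpn)
    {
        l2p[lpn] = next_free++;
    }

    int main(void)
    {
        for (int i = 0; i < NUM_PAGES; i++)
            l2p[i] = -1;
        ftl_write(5);
        ftl_write(5);                     /* overwrite: remapped, not erased  */
        printf("logical page 5 now at physical page %d\n", l2p[5]);
        return 0;
    }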

SSD Primer (2/2)

- Internal organization: 2-dimensional (N x M parallelism)
  • Similar to RAID-0 (stripe size = sector or pageNAND)
  • Effective page & block size is multiplied by N x M (max)

[Figure: internal SSD organization. The host interface (ex. SATA) feeds a controller running the FTL firmware; LBAs are striped round-robin across N channels (Ch0-Ch3), and each channel pipelines across M chips (Chip0-Chip3): N-channel striping, M-way pipelining]
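As code, one plausible reading of the figure's numbering is the mapping below; the stripe unit and the rule for advancing to the next chip are assumptions, since real controllers do not publish their internal geometry:

    #include <stdio.h>

    #define N_CHANNELS   4                /* striping width (figure: Ch0-Ch3) */
    #define M_CHIPS      4                /* pipeline depth (figure: Chip0-3) */
    #define PAGE_SECTORS 8                /* 4kB NAND page / 512B sector      */

    /* RAID-0-style mapping: consecutive sectors go to consecutive channels;
       a chip advances once every channel has received a full NAND page
       worth of stripes. */
    static void map_lba(unsigned lba, unsigned *chan, unsigned *chip)
    {
        *chan = lba % N_CHANNELS;
        *chip = (lba / (N_CHANNELS * PAGE_SECTORS)) % M_CHIPS;
    }

    int main(void)
    {
        for (unsigned lba = 0; lba < 8; lba++) {
            unsigned chan, chip;
            map_lba(lba, &chan, &chip);
            printf("LBA %2u -> Ch%u, Chip%u\n", lba, chan, chip);
        }
        return 0;
    }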

Optimal I/O for SSD

- Key points
  • Parallelism
    - The larger the size of the I/O request, the better
  • Match with physical characteristics
    - Alignment with the page or block size of NAND*
    - Segmented sequential write (within an erasable block)

- What about Linux?
  • HDD also favors larger I/O → read-ahead, deferred aggregated write
  • Segmented FS layout → good if aligned with an erasable block boundary
  • Write optimization → FS dependent (ex. allocation policy)

* Usually, the partition layout is not aligned (1st partition at LBA 63)
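As a minimal illustration of the alignment point, here is a sketch that rounds a partition start up to the next erase-block boundary; the 512kB erasable block size is an assumption (in practice it would come from SSD identification):

    #include <stdint.h>
    #include <stdio.h>

    #define SECTOR_SIZE 512u
    #define ERASE_BLOCK (512u * 1024u)               /* assumed: 128 x 4kB pages */
    #define EB_SECTORS  (ERASE_BLOCK / SECTOR_SIZE)  /* 1024 sectors             */

    /* Round a start LBA up to the next erase-block boundary. */
    static uint64_t align_lba(uint64_t lba)
    {
        return (lba + EB_SECTORS - 1) / EB_SECTORS * EB_SECTORS;
    }

    int main(void)
    {
        /* legacy DOS layout: first partition at LBA 63 is misaligned */
        printf("LBA 63    -> aligned start at LBA %llu\n",
               (unsigned long long)align_lba(63));
        /* the test partition below (LBA 16384 = 8MB) is already aligned */
        printf("LBA 16384 -> aligned start at LBA %llu\n",
               (unsigned long long)align_lba(16384));
        return 0;
    }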

Test environment (1/2)

- Hardware
  • Intel Core 2 Duo [email protected], 1GB RAM
- Software
  • Fedora 7 (kernel 2.6.24)
  • Benchmark: postmark
- Filesystems
  • No journaling – ext2
  • Journaling – ext3, ext4, reiserfs, xfs
    - ext3, ext4: data=writeback,barrier=1[,extents]
    - xfs: logbsize=128k
  • COW, log-structured – btrfs (latest unstable, 4k block), nilfs (testing-8)
- SSD
  • Vendor M (32GB, SATA): read 100MB/s, write 80MB/s
  • Test partition starts at LBA 16384 (8MB, aligned)

Test environment (2/2)

- Postmark workload
  • Ref: Evaluating Block-level Optimization through the IO Path (USENIX 2007)

  Workload   File size   # of files (work-set)   # of transactions   Total app read/write
  --------   ---------   ---------------------   -----------------   --------------------
  SS         9-15K       10,000                  100,000             630M/755M*
  SL         9-15K       100,000                 100,000             600M/1.8G
  LS         0.1-3M      1,000                   10,000              9.7G/12G
  LL         0.1-3M      4,250                   10,000              9G/17G

  * Mostly write-only

Benchmark results (1/2)

- Small file size (SS, SL)

[Chart: transactions/sec for the SS and SL workloads, per filesystem (ext2, ext3, ext4, reiserfs, xfs, btrfs, nilfs); y-axis 0-2500]

Benchmark results (2/2)

- Large file size (LS, LL)

[Chart: transactions/sec for the LS and LL workloads, per filesystem (ext2, ext3, ext4, reiserfs, xfs, btrfs, nilfs); y-axis 0-30]

I/O statistics (1/2)

- Average size of I/O

[Chart: average I/O size (kB) for reads and writes, per workload (SS, SL, LS, LL) and per filesystem (ext2, ext3, ext4, reiserfs, xfs, btrfs, nilfs); y-axis 0-140]

I/O statistics (2/2)

- Segmented sequentiality of write I/O (segment: 1MB)

[Chart: fraction of write I/O that is segment-sequential, per workload (SS, SL, LS, LL) and per filesystem; y-axis 0-20%, with one series pegged at 100% in all four workloads]
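For reference, here is a small sketch of how such a metric can be computed from a write trace. The exact definition used for the chart is not spelled out, so the rule below (a write counts as sequential if it directly continues the previous write inside the same 1MB segment) is an assumption:

    #include <stdint.h>
    #include <stdio.h>

    #define SEG_SIZE (1024u * 1024u)      /* 1MB segment */

    struct wreq { uint64_t off; uint32_t len; };

    /* Count a write as segment-sequential if it starts exactly where the
       previous write ended and both fall in the same 1MB segment. */
    static double seq_ratio(const struct wreq *w, int n)
    {
        int seq = 0;
        for (int i = 1; i < n; i++)
            if (w[i].off / SEG_SIZE == w[i - 1].off / SEG_SIZE &&
                w[i].off == w[i - 1].off + w[i - 1].len)
                seq++;
        return n > 1 ? (double)seq / (n - 1) : 0.0;
    }

    int main(void)
    {
        struct wreq trace[] = {
            { 0,             4096 },      /* start of segment 0      */
            { 4096,          4096 },      /* sequential continuation */
            { 3u * SEG_SIZE, 4096 },      /* jump to another segment */
            { 9u * SEG_SIZE, 4096 },      /* another random jump     */
        };
        printf("segmented sequentiality: %.0f%%\n",
               100.0 * seq_ratio(trace, 4));
        return 0;
    }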

Case study – ext4

- Conditions
  • data=ordered; allocation: default/noreservation/oldalloc

[Chart: transactions/sec for SS and SL with ext4-wb, ext4-ord, ext4-nores, ext4-olda; y-axis 0-1200]

Observations:
1. Almost no difference between the allocation policies
2. Why is data=ordered better for SL?

Case study – btrfs

- Conditions
  • Block size: 4k/16k; allocation: ssd option on/off

[Chart: transactions/sec for SS, SL, LS and LL with btrfs-4k, btrfs-16k, btrfs-ssd-4k; y-axis 0-1800]

Observations:
1. 4k is better than 16k (sequentiality = 12% vs. 2%)
2. The ssd option is effective (10-40% improvement)

Case study – xfs

- Conditions
  • Mount with barrier on/off

[Chart: transactions/sec for SS, SL, LS and LL with xfs-bar and xfs-nobar; y-axis 0-800]

Observation: large barrier overhead

Design considerations for SSD

- Lessons from flash FS (ex. …)
  • Sequential writing at multiple logging points
  • Wandering tree
    - Trade-off between sequentiality and amount of write
    - Cf. space map (Sun ZFS)
  • Need to optimize garbage collection overhead
    - Either in the FS itself or in the FTL in the SSD

- Next topic: end-to-end optimization
  • Exchange info with the SSD (Trim, SSD identification)
  • Make the best use of parallelism

New interfaces for SSD (t13.org)

- Trim command
  • Lets the device know which LBA range is no longer in use
    - This will be helpful for optimizing the FTL
  • Should be passed all the way through: FS → bio → scsi → libata
    - Passing a bio with no data
    - What about I/O reordering & I/O queuing? (a user-space sketch follows this list)
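The Trim proposal itself predates any kernel plumbing, but as a rough user-space illustration of "telling the device an LBA range is unused", here is a sketch using today's Linux BLKDISCARD ioctl, which the block layer may translate into a device-level Trim/discard; the device path and range are placeholders:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>                 /* BLKDISCARD */

    int main(void)
    {
        int fd = open("/dev/sdX", O_WRONLY);      /* placeholder device */
        if (fd < 0) { perror("open"); return 1; }

        /* byte range {start, length}: discard 8MB starting at 1GB */
        uint64_t range[2] = { 1ull << 30, 8ull << 20 };
        if (ioctl(fd, BLKDISCARD, &range) < 0)
            perror("BLKDISCARD");
        close(fd);
        return 0;
    }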

- SSD identification (added to "ATA identify")
  • Reports the size of the page and the erasable block
    - Physical or effective?
  • Useful for the FS and the volume manager

Parallel processing of small I/O

- Make better use of I/O queuing (TCQ or NCQ)
  • Parallel processing of small I/O (see the sketch below)
  • Desktop environment? Barrier?
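A rough sketch of what "keeping the queue full" looks like from user space, using Linux libaio (link with -laio); the device path, request count, and offsets are all placeholders. With several small reads outstanding at once, the drive can dispatch them to different channels in parallel, as the figure below shows:

    #define _GNU_SOURCE                   /* O_DIRECT */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <libaio.h>

    #define NREQ 4                        /* A, B, C, D in the figure */
    #define BUF  4096

    int main(void)
    {
        int fd = open("/dev/sdX", O_RDONLY | O_DIRECT);   /* placeholder */
        if (fd < 0) { perror("open"); return 1; }

        io_context_t ctx = 0;
        if (io_setup(NREQ, &ctx) < 0) { fprintf(stderr, "io_setup failed\n"); return 1; }

        struct iocb cbs[NREQ], *list[NREQ];
        for (int i = 0; i < NREQ; i++) {
            void *buf;
            if (posix_memalign(&buf, BUF, BUF)) return 1; /* O_DIRECT alignment */
            /* four small reads at scattered offsets, all submitted at once */
            io_prep_pread(&cbs[i], fd, buf, BUF, (long long)i << 20);
            list[i] = &cbs[i];
        }
        if (io_submit(ctx, NREQ, list) != NREQ) { fprintf(stderr, "io_submit failed\n"); return 1; }

        struct io_event ev[NREQ];
        io_getevents(ctx, NREQ, NREQ, ev, NULL);          /* wait for all */
        printf("completed %d queued reads\n", NREQ);
        return 0;
    }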

[Figure: request queue A, B, C, D, where A targets Ch0, B and C target Ch1, and D targets Ch3. Without I/O queuing the requests are issued one at a time and complete in 4 steps, leaving chips idle; with I/O queuing A, B and D proceed in parallel and all four complete in 2 steps, keeping the chips busy]

Summary

- Optimization for SSD
  • Alignment is important
  • Segmented sequentiality
  • Make better use of parallelism (either small or large)
    - An I/O barrier may stall the pipelined processing

- What can you do?
  • Filesystem: alignment, allocation policy, design (ex. COW)
  • Block layer: bio w/ hint, barrier, I/O queueing, scheduler(?)
  • Volume manager: alignment, allocation
  • Virtual memory: read-ahead

References

- T13 specs for SSD
  • http://www.t13.org/documents/UploadedDocuments/docs2007/e07153r0-Soild_State_Drive_Identify_Proposal.doc
  • http://www.t13.org/documents/UploadedDocuments/docs2007/e07154r0-Notification_for_Deleted_Data_Proposal_for_ATA-ACS2.doc
- Introduction to SSD and flash memory
  • http://download.microsoft.com/download/a/f/d/afdfd50d-6eb9-425e-84e1-b4085a80e34e/WNS-T432_WH07.pptx
  • http://download.microsoft.com/download/d/f/6/df6accd5-4bf2-4984-8285-f4f23b7b1f37/WinHEC2007_Micron_NAND_FlashMemory.doc
  • http://download.microsoft.com/download/a/f/d/afdfd50d-6eb9-425e-84e1-b4085a80e34e/SS-S486_WH07.pptx
- FTL description & optimization
  • BPLRU: A Buffer Management Scheme for Improving Random Writes in Flash Storage (FAST '08)

Appendix. I/O Pattern

[Figures: block-level I/O pattern plots for each workload]
- SS workload – ext4, xfs
- SS workload – btrfs, nilfs
- SL workload – ext4, xfs
- SL workload – btrfs, nilfs
- LS workload – ext4, reiserfs, xfs
- LS workload – btrfs, nilfs
- LL workload – ext4, reiserfs, xfs
- LL workload – btrfs, nilfs