Speculative Encryption on GPU Applied to Cryptographic File Systems

Vandeir Eduardo1,2, Wagner M. Nunan Zola1, and Luis C. Erpen de Bona1

1 Federal University of Paraná 2 University of Blumenau

Agenda

➢ Introduction and motivation
➢ Rationale: Cryptographic File Systems (CFSs), CBC and CTR modes, EncFS (user space) and the GPU library WAESlib
➢ CTR encryption mode applied to CFSs (in the EncFS++ file system): generation and storage of nonces
➢ Spawning parallel encryption tasks in EncFS++ (challenges in the organization and management of encryption contexts)
➢ Experimental performance analysis of EncFS++
➢ Conclusions

Introduction and motivation

➢ Security in data storage: especially in the era of cloud computing.
➢ Natural evolution: integration of encryption in file systems: FSs → CFSs
➢ Use of symmetric block ciphers (good security/speed ratio)
➢ Problems: larger data volumes + faster media + alternative ciphers + larger keys = increase in CPU utilization

Motivation (cont.)

➢ Wanted: use parallel processors for the task (e.g. GPUs or multicore processors)
➢ Previous study on the acceleration of AES on GPU: the GPU kernel WAES and WAESlib, exploring CTR mode → defines priorities for the generation of encryption masks
➢ Current work: “Explore advantages of CTR mode in the context of CFSs”, with parallel multicore or manycore processors:

- using GPU cryptographic functions (current work)
- get higher throughput with more efficient CPU usage
- extend to other accelerators, multicore or heterogeneous (future work)

Cryptographic File Systems

➢ Integrated at different system levels:

① User space: FUSE-based CFSs (EncFS, through libfuse and FUSE)
② Kernel space: CFSs ↔ VFS (eCryptfs)
③ Kernel space: cryptographic systems ↔ block I/O (dm-crypt)

[Figure: I/O stack from the application in user space through VFS, file system, device mapper and block I/O down to the storage device in kernel space, marking where EncFS/libfuse/FUSE (①), eCryptfs (②) and dm-crypt (③) are integrated.]

Usually: CBC mode of operation

➢ Detailed in NIST document SP 800-38A
➢ Sequential encryption (data dependency)
➢ Security requirement: it is necessary to use an “unpredictable” Initialization Vector (IV)

[Figure: CBC encryption and decryption of clear text blocks 1..N with AES under the same key. In encryption, each clear text block is combined with the previous cipher text block (the IV for the first block) before being encrypted. In decryption, each cipher text block is decrypted and then combined with the previous cipher text block (the IV for the first block) to recover the clear text.]
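The data dependency noted above can be made concrete with a small sketch. The Python below chains blocks the way CBC does. `toy_encrypt_block` and `toy_decrypt_block` are hypothetical placeholders standing in for AES (they provide no security), and every function name here is an illustrative assumption, not something taken from EncFS or OpenSSL.

```python
BLOCK = 16  # bytes, the AES block size


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # Placeholder for AES (NOT secure): XOR with the key, rotate left one byte.
    mixed = xor_bytes(block, key)
    return mixed[1:] + mixed[:1]


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    # Inverse of toy_encrypt_block: rotate right one byte, XOR with the key.
    mixed = block[-1:] + block[:-1]
    return xor_bytes(mixed, key)


def cbc_encrypt(key: bytes, iv: bytes, blocks: list) -> list:
    cipher = []
    prev = iv
    for plain in blocks:
        # Block i cannot start before block i-1 has finished: sequential.
        prev = toy_encrypt_block(key, xor_bytes(plain, prev))
        cipher.append(prev)
    return cipher


def cbc_decrypt_block(key: bytes, iv: bytes, cipher: list, i: int) -> bytes:
    # Needs only cipher[i] and cipher[i-1], so blocks can be decrypted
    # independently of each other.
    prev = iv if i == 0 else cipher[i - 1]
    return xor_bytes(toy_decrypt_block(key, cipher[i]), prev)
```

Encryption has to walk the blocks in order because each step consumes the previous cipher text block, whereas any single block can be decrypted from just two cipher text blocks.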

Wanted: work with CTR mode

➢ Parallelizable
➢ Possibility of encryption anticipation (of encryption masks)
➢ Security requirement (uniqueness requirement): it is necessary to use a given (key, IV) pair only once, in any encryption
➢ The IV is called a “nonce”

[Figure: CTR encryption and decryption. In both directions the counter values 1..N are encrypted with AES under the key, independently of each other, and each result is combined with the corresponding clear text block (encryption) or cipher text block (decryption).]

EncFS file system (some features)

➢ Based on FUSE → works in user space → facilitates development / testing → allows easier GPU library integration in EncFS++
➢ The CUDA API and libfuse are both in user space
➢ If a kernel-space FS module were used, an intermediate process would be needed to use the CUDA API (+ complexity, + latency)

EncFS features

→ based on FUSE / user space: facilitates development / testing
→ uses OpenSSL (CPU)
→ file content is encrypted in data blocks
→ uses CBC for each data block

File format: Header (IVA) | Data Block 0 | Data Block 1 | ... | Data Block n

vK = Key, IVV = Volume IV, IVA = File IV, IVB = data Block IV

IV use (unpredictability requirement):
* the data block IV is calculated dynamically with an encryption hash (no need to store it)
* it is reusable when a block is rewritten

data Block IV (IVB) = HMAC_CTX (vK, IVV || (NumBlock ⊕ IVA))
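The block IV formula above can be sketched with Python's standard `hmac` module. This is only an illustration under stated assumptions: the hash (SHA-1), the 64-bit big-endian encoding of `NumBlock ⊕ IVA`, and the truncation to 16 bytes are choices made for this example, and `block_iv` is a hypothetical name, not the EncFS function.

```python
import hashlib
import hmac


def block_iv(volume_key: bytes, volume_iv: bytes, file_iv: int, block_num: int) -> bytes:
    """IVB = HMAC(vK, IVV || (NumBlock xor IVA)), truncated to one AES block."""
    # Assumption: the XOR result is serialized as a 64-bit big-endian integer.
    mixed = (block_num ^ file_iv).to_bytes(8, "big")
    # Assumption: SHA-1 as the HMAC hash, digest truncated to 16 bytes.
    return hmac.new(volume_key, volume_iv + mixed, hashlib.sha1).digest()[:16]
```

Because the IV is a deterministic function of the key, the volume IV, the file IV and the block number, it can be recomputed on every access instead of being stored, which is the property the slide points out.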

GPU Encryption Acceleration

➢ Extensively studied for various symmetric ciphers such as AES, Blowfish, IDEA, Camellia, etc.
➢ Related work: acceleration of cryptographic functions in some applications:
→ user space: Engine-CUDA, CrystalGPU, CRSFS
→ kernel space: OCF, Gdev, GPUStore
➢ Usually: CBC + GPU → usually pays off only for larger requests (> 16 KiB)
➢ Applied to CFSs: no previous work has exploited the benefits of CTR mode

Why CTR?

➢ CTR mode:
→ parallelizable
→ allows speculative encryption (creation of encryption masks ahead of time)
→ XOR on CPU (avoids CPU → GPU data transfer)
→ as safe as CBC
➢ Library available from previous work: WAESlib
→ reduces GPU processing complexity
→ aggregation of small (4 KiB) contexts:
∙ fewer WAES kernel activations
∙ higher throughput (GPU → CPU)
∙ more control over the order in which masks are produced (with priorities)

Challenges of using CTR in CFSs

➢ Each write and rewrite of a block requires a new nonce (due to the uniqueness requirement)
➢ Problem: it is necessary to store one nonce per block (the same unique nonce used in encryption is needed for decryption)
➢ The overhead of nonce storage could negatively impact CFS performance
➢ Nonce storage format, access mechanism and granularity are all important for performance
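The sketch below illustrates why the nonce has to be fresh on every write and kept for decryption, and how a mask can be produced before the data exists. It assumes a 4 KiB file-system block and uses HMAC-SHA256 as a stand-in pseudorandom function in place of AES (CTR mode only needs the forward direction of the cipher). The names `keystream_block`, `make_mask` and `apply_mask` are hypothetical and are not the WAESlib API.

```python
import hashlib
import hmac

BLOCK = 16       # bytes per cipher block (AES block size)
FS_BLOCK = 4096  # bytes per file-system block (assumption for this sketch)


def keystream_block(key: bytes, nonce: int, counter: int) -> bytes:
    # Stand-in for AES_key(nonce + counter); NOT the real cipher.
    data = (nonce + counter).to_bytes(16, "big")
    return hmac.new(key, data, hashlib.sha256).digest()[:BLOCK]


def make_mask(key: bytes, nonce: int, size: int = FS_BLOCK) -> bytes:
    # Depends only on (key, nonce), so it can be computed ahead of time
    # (speculatively), e.g. on a GPU, before any data arrives.
    return b"".join(keystream_block(key, nonce, c) for c in range(size // BLOCK))


def apply_mask(data: bytes, mask: bytes) -> bytes:
    # Encryption and decryption are the same cheap XOR, done on the CPU.
    return bytes(d ^ m for d, m in zip(data, mask))
```

With this structure, decrypting a block means regenerating the mask from the stored nonce and XORing again. If two different contents of a block were encrypted under the same (key, nonce) pair, XORing the two cipher texts would cancel the mask and expose the XOR of the two clear texts, which is what the uniqueness requirement prevents.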

Nnodes: how nonce nodes are stored in EncFS++

Nonce nodes file format (loaded when the CFS is mounted):
CFS Global Counter (128 bits) | Number of nnodes used (32 bits) | Occupation map (524,128 bits) | Nonce Node 0 (260 bytes) | Nonce Node 1 (260 bytes) | ... | Nonce Node n
(the three leading fields add up to 64 KiB)

Nonce node format:
Inode number (32 bits) | Nonce 0 (16 bytes) | Nonce 1 (16 bytes) | ... | Nonce 15

Nonce format (128 bits):
① value obtained from the CFS Global Counter | ② bits reserved for the CTR internal counter

Exclusive nonces file format (only for files > 64 KiB (16 * 4 KiB); loaded when the file is opened):
Nonce Group 0 (4096 bytes, 256 nonces) | Nonce Group 1 (4096 bytes, 256 nonces) | ... | Nonce Group N
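The field sizes above can be checked with a short sketch using Python's `struct` module. The big-endian byte order, the constant names and the `pack_nnode` / `unpack_nnode` helpers are assumptions made for illustration; the slides give only the field widths, not the on-disk encoding.

```python
import struct

NONCE_BYTES = 16       # one 128-bit nonce
NONCES_PER_NODE = 16   # Nonce 0 .. Nonce 15

# Header of the nonce nodes file, widths taken from the slide (in bytes).
GLOBAL_COUNTER_BYTES = 128 // 8
NNODES_USED_BYTES = 32 // 8
OCCUPATION_MAP_BYTES = 524_128 // 8
HEADER_BYTES = GLOBAL_COUNTER_BYTES + NNODES_USED_BYTES + OCCUPATION_MAP_BYTES

# One nonce node: 32-bit inode number followed by 16 nonces of 16 bytes.
# Assumption: big-endian, no padding.
NNODE = struct.Struct(">I" + "16s" * NONCES_PER_NODE)

# One group of the exclusive nonces file.
NONCES_PER_GROUP = 4096 // NONCE_BYTES


def pack_nnode(inode: int, nonces: list) -> bytes:
    return NNODE.pack(inode, *(n.to_bytes(NONCE_BYTES, "big") for n in nonces))


def unpack_nnode(raw: bytes):
    inode, *raw_nonces = NNODE.unpack(raw)
    return inode, [int.from_bytes(r, "big") for r in raw_nonces]
```

Under these assumptions the header comes to exactly 64 KiB, a nonce node to 260 bytes and a nonce group to 256 nonces, matching the figures on the slide. The 16 nonces of a node cover 16 * 4 KiB = 64 KiB of file data, which is consistent with larger files needing the exclusive nonces file.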

Challenges in using speculative encryption in CFSs

➢ Managing encryption contexts
→ How to organize the encryption contexts within the FS application?
→ How to use these contexts in the different CFS operations?
➢ When is the best time to trigger the generation of encryption masks (define contexts)?
➢ How to take advantage of the priority feature?

Write context pool: maintained for encryption + writing

[Figure: write context pool with 8 contexts (context indexes 0 to 7). Before a block encryption (CFS Global Counter value: 0), the contexts hold masks produced ahead of time with nonces 0, 256, 512, 768, 1024, 1280, 1536 and 1792, and the pool beginning indicator (next mask to be consumed) points to context 0. After a block encryption (encrypt+write operation; CFS Global Counter value: 256), the consumed context 0 has been redefined with nonce 2048 to produce a new mask, and the pool beginning indicator points to context 1.]

➢ Used for sequential and random writing (only one write context pool needed per CFS)
➢ Contexts are initially defined at the CFS mount operation
➢ Contexts in this pool are redefined as masks are consumed (using lower priority)
➢ Implemented as a virtual circular queue (no storage → performance)

Context pool for decryption / read (seq.)

[Figure: window of 8 contexts sliding over file blocks 0 to n (n “virtual” contexts with nonces). The window first covers nonces 0 to 1792 with context indexes 0 1 2 3 4 5 6 7, then 256 to 2048 with indexes 1 2 3 4 5 6 7 0, then 512 to 2304 with indexes 2 3 4 5 6 7 0 1. Masks at the start of the window are being consumed while new masks are being produced at its end; the window moves forward and the context indexes rotate (modulo window size).]

➢ Used for sequential and random reading (1 per file)
➢ Contexts are initially defined at each file open operation (decreasing priority according to position)
➢ Contexts are redefined as masks are consumed (using lower priority)
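A minimal sketch of such a rotating window follows. It is an interpretation, not the EncFS++ implementation: `ContextPool`, `consume`, `seek` and the `make_context` callback (which would hand a nonce to the GPU library and return a handle to the pending mask) are all hypothetical names, and comparing the jump distance with the pool size to decide between reusing and restarting contexts is an assumption.

```python
class ContextPool:
    """Fixed-size ring of mask contexts covering positions start .. start+size-1."""

    def __init__(self, size, first_pos, make_context):
        self.size = size
        self.start = first_pos            # position of the next mask to consume
        self.make_context = make_context  # hypothetical: requests a mask for a position
        self.slots = [None] * size
        for pos in range(first_pos, first_pos + size):
            self.slots[pos % size] = make_context(pos)

    def consume(self):
        """Return the context at the window start and refill its slot
        for the position one full window ahead."""
        idx = self.start % self.size
        ctx = self.slots[idx]
        self.slots[idx] = self.make_context(self.start + self.size)
        self.start += 1
        return ctx

    def seek(self, new_start):
        """Move the window and return how many contexts had to be redefined."""
        if abs(new_start - self.start) > self.size:
            # Far jump: nothing is reusable, restart every context.
            todo = range(new_start, new_start + self.size)
        elif new_start >= self.start:
            # Short forward shift: only the newly exposed tail is produced.
            todo = range(self.start + self.size, new_start + self.size)
        else:
            # Short backward shift: only the newly exposed head is produced.
            todo = range(new_start, self.start)
        for pos in todo:
            self.slots[pos % self.size] = self.make_context(pos)
        self.start = new_start
        return len(todo)
```

The slot index is always the position modulo the pool size, so no queue structure has to be stored or shifted. With a pool of 8 and `make_context` returning `pos * 256`, consuming the first context leaves slot 0 holding 2048 and the window starting at position 1, which mirrors the numbers shown for the write pool.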

Context pool for decryption / read (random)

→ Total window displacement, (x-y) > z: restart all contexts in the pool (higher speculation overhead)

[Figure: file blocks 37 to 54 with two non-overlapping windows of 8 contexts, one covering nonces 9472 to 11264 (context indexes 5 6 7 0 1 2 3 4) and one covering nonces 12032 to 13824 (context indexes 7 0 1 2 3 4 5 6). x marks the old start position, y the new start position and z the distance threshold; an arrow shows the window move direction.]

→ Partial window shift, (y-x) <= z: masks in the overlap are reused and only the new masks are produced. Use of priorities!

[Figure: file blocks 36 to 49 with the old window covering nonces 9728 to 11520 (context indexes 6 7 0 1 2 3 4 5) and the new window covering nonces 10496 to 12288 (context indexes 1 2 3 4 5 6 7 0). x marks the old start position and y the new start position; an arrow shows the window move direction.]

Performance Analysis

➢ Performance comparison of EncFS (CBC), eCryptfs (CBC), EncFS++ (CPU / GPU) and eCryptfs with AESNI (kernel mode)

➢ Microbenchmark measuring throughput in sequential and random read and write operations (requests of 4, 64 and 128 KiB on a large 16 GiB file)

➢ Macrobenchmark using filebench workloads with a varying number of threads (fileserver.f and webserver.f)

kernel 4.10.0, Intel Core i7-7700HQ at 2.8 GHz (fixed frequency), 32 GiB RAM, SSD disk (≈500 MB/s) and ramdisk (/run/shm), libfuse 2.9.4, OpenSSL 1.0.2g, WAESlib 2.01g0, base FS, NVIDIA GeForce GTX 1070 mobile (Pascal architecture)

Microbenchmark (sequential read + decrypt)

[Figure: two charts, Sequential Read+Decrypt (SSD) and Sequential Read+Decrypt (Memory), plotting throughput of EncFS (CBC, CPU) and EncFS++ (CTR, GPU) and the EncFS++/EncFS throughput variation (%) against request size (4, 64 and 128 KiB).]

Sequential Read+Decrypt (SSD)

Req. size   EncFS (CBC, CPU)         EncFS++ (CTR, GPU)       EncFS++/EncFS
(KiB)       Thrput (MB/s)  CPU (%)   Thrput (MB/s)  CPU (%)   Thrput var. (%)  CPU use efficiency
4           278.47         11.32     419.66         14.71     50.70            1.16
64          297.04         12.10     453.64         11.54     52.72            1.60
128         297.69         12.11     454.20         11.54     52.58            1.60

Sequential Read+Decrypt (Memory)

Req. size   EncFS (CBC, CPU)         EncFS++ (CTR, GPU)       EncFS++/EncFS
(KiB)       Thrput (MB/s)  CPU (%)   Thrput (MB/s)  CPU (%)   Thrput var. (%)  CPU use efficiency
4           323.25         12.20     714.82         21.76     121.14           1.24
64          322.91         12.20     1,355.20       23.90     319.68           2.14
128         323.03         12.20     1,485.59       23.90     359.90           2.35
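The two derived columns can be reproduced from the measured ones. The formulas below are inferred from the table values rather than stated on the slides, so treat them as an assumption: throughput variation as the relative gain over EncFS, and CPU-use efficiency as the ratio of throughput per unit of CPU utilization. The function names are hypothetical.

```python
def throughput_variation(base_mbps: float, new_mbps: float) -> float:
    """Relative throughput change of the new system over the baseline, in %."""
    return (new_mbps / base_mbps - 1.0) * 100.0


def cpu_use_efficiency(base_mbps: float, base_cpu: float,
                       new_mbps: float, new_cpu: float) -> float:
    """Throughput per CPU percentage point, new system relative to baseline."""
    return (new_mbps / new_cpu) / (base_mbps / base_cpu)
```

For the 4 KiB in-memory row (323.25 MB/s at 12.20% CPU for EncFS, 714.82 MB/s at 21.76% CPU for EncFS++), these give roughly 121.1% and 1.24, in line with the reported values. An efficiency above 1 means EncFS++ moves more data per unit of CPU time even when its absolute CPU utilization is higher.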

Microbenchmark (sequential write + encrypt)

[Figure: two charts, Sequential Write+Encrypt (SSD) and Sequential Write+Encrypt (Memory), plotting throughput of EncFS (CBC, CPU) and EncFS++ (CTR, GPU) and the EncFS++/EncFS throughput variation (%) against request size (4, 64 and 128 KiB).]

Sequential Write+Encrypt (SSD)

Req. size   EncFS (CBC, CPU)         EncFS++ (CTR, GPU)       EncFS++/EncFS
(KiB)       Thrput (MB/s)  CPU (%)   Thrput (MB/s)  CPU (%)   Thrput var. (%)  CPU use efficiency
4           97.43          10.41     179.86         11.75     84.61            1.64
64          203.79         10.55     650.71         14.16     219.30           2.38
128         199.87         9.93      738.43         11.75     269.44           3.12

Sequential Write+Encrypt (Memory)

Req. size   EncFS (CBC, CPU)         EncFS++ (CTR, GPU)       EncFS++/EncFS
(KiB)       Thrput (MB/s)  CPU (%)   Thrput (MB/s)  CPU (%)   Thrput var. (%)  CPU use efficiency
4           102.88         10.42     188.11         12.19     82.83            1.56
64          214.71         11.00     806.92         18.59     275.82           2.22
128         211.56         10.43     897.25         18.94     324.12           2.33

Microbenchmark (random read + decrypt)

[Figure: two charts, Random Read+Decrypt (SSD) and Random Read+Decrypt (Memory), plotting throughput of EncFS (CBC, CPU) and EncFS++ (CTR, GPU) and the EncFS++/EncFS throughput variation (%) against request size (4, 64 and 128 KiB).]

Random Read+Decrypt (SSD)

Req. size   EncFS (CBC, CPU)         EncFS++ (CTR, GPU)       EncFS++/EncFS
(KiB)       Thrput (MB/s)  CPU (%)   Thrput (MB/s)  CPU (%)   Thrput var. (%)  CPU use efficiency
4           17.80          1.93      18.71          3.75      5.14             0.54
64          96.48          4.37      107.95         5.35      11.88            0.91
128         120.17         5.34      142.50         5.38      18.57            1.18

Random Read+Decrypt (Memory)

Req. size   EncFS (CBC, CPU)         EncFS++ (CTR, GPU)       EncFS++/EncFS
(KiB)       Thrput (MB/s)  CPU (%)   Thrput (MB/s)  CPU (%)   Thrput var. (%)  CPU use efficiency
4           166.68         10.37     59.72          8.30      -64.17           0.45
64          290.74         11.40     541.54         17.12     86.26            1.24
128         297.11         11.45     684.10         17.37     130.25           1.52

Microbenchmark (random write + encrypt)

[Figure: two charts, Random Write+Encrypt (SSD) and Random Write+Encrypt (Memory), plotting throughput of EncFS (CBC, CPU) and EncFS++ (CTR, GPU) and the EncFS++/EncFS throughput variation (%) against request size (4, 64 and 128 KiB).]

Random Write+Encrypt (SSD)

Req. size   EncFS (CBC, CPU)         EncFS++ (CTR, GPU)       EncFS++/EncFS
(KiB)       Thrput (MB/s)  CPU (%)   Thrput (MB/s)  CPU (%)   Thrput var. (%)  CPU use efficiency
4           96.61          10.36     173.46         11.54     79.54            1.61
64          202.53         10.50     640.83         14.06     216.42           2.36
128         198.91         9.91      739.30         11.60     271.69           3.17

Random Write+Encrypt (Memory)

Req. size   EncFS (CBC, CPU)         EncFS++ (CTR, GPU)       EncFS++/EncFS
(KiB)       Thrput (MB/s)  CPU (%)   Thrput (MB/s)  CPU (%)   Thrput var. (%)  CPU use efficiency
4           99.81          10.28     182.13         11.94     82.47            1.57
64          213.83         10.97     801.79         18.57     274.96           2.21
128         211.47         10.43     888.46         18.89     320.13           2.32

Macrobenchmark (fileserver.f)

[Figure: Filebench - fileserver.f (SSD): throughput (MB/s) with 1, 2 and 4 threads for EncFS (CBC, CPU), EncFS++ (CTR, GPU), eCryptfs (CBC, CPU) and eCryptfs (CBC, CPU, AESNI).]

Measured throughput and utilization:

Threads  EncFS (CBC, CPU)   eCryptfs (CBC, CPU)   eCryptfs (CBC, CPU, AESNI)   EncFS++ (CTR, GPU)
         MB/s     CPU %     MB/s     CPU %        MB/s     CPU %               MB/s     CPU %   GPU %
1        164.76   9.94      157.22   8.84         206.90   4.61                209.38   4.68    10.46
2        188.24   11.42     201.88   11.46        212.36   4.37                214.04   5.83    12.75
4        196.24   12.33     209.74   16.14        216.06   10.92               215.12   5.79    11.39

EncFS++ relative to each baseline (throughput variation and CPU-use efficiency):

Threads  vs EncFS (CBC)     vs eCryptfs (CPU)   vs eCryptfs (CPU, AESNI)
         Var. %   Effic.    Var. %   Effic.     Var. %   Effic.
1        27.08    2.70      33.18    2.52       1.20     1.00
2        13.71    2.23      6.02     2.08       0.79     0.75
4        9.62     2.33      2.57     2.86       -0.44    1.88

Macrobenchmark (webserver.f)

[Figure: Filebench - webserver.f (SSD): throughput (MB/s) with 1, 2 and 4 threads for EncFS (CBC, CPU), EncFS++ (CTR, GPU), eCryptfs (CBC, CPU) and eCryptfs (CBC, CPU, AESNI).]

Measured throughput and utilization:

Threads  EncFS (CBC, CPU)   eCryptfs (CBC, CPU)   eCryptfs (CBC, CPU, AESNI)   EncFS++ (CTR, GPU)
         MB/s     CPU %     MB/s     CPU %        MB/s     CPU %               MB/s     CPU %   GPU %
1        192.28   8.03      204.88   10.85        422.54   3.61                471.80   11.24   25.47
2        239.74   10.76     211.06   11.17        316.12   3.63                418.34   10.46   23.25
4        234.76   10.03     222.88   11.68        296.20   6.96                322.30   8.76    19.07

EncFS++ relative to each baseline (throughput variation and CPU-use efficiency):

Threads  vs EncFS (CBC)     vs eCryptfs (CPU)   vs eCryptfs (CPU, AESNI)
         Var. %   Effic.    Var. %   Effic.     Var. %   Effic.
1        145.37   1.75      130.28   2.22       11.66    0.36
2        74.50    1.79      98.21    2.12       32.34    0.46
4        37.29    1.57      44.61    1.93       8.81     0.87

Conclusions

➢ Microbenchmark, with the FS in memory:
Throughput: gains up to ≈360% (sequential read), ≈130% (random read), ≈320% (sequential and random write).
CPU efficiency: gains up to ≈2.3x (sequential read and write, random write), ≈1.52x (random read)
➢ Macrobenchmark (fileserver), with the FS on SSD:
Throughput: gains up to ≈27% (vs EncFS), ≈33% (vs eCryptfs).
CPU efficiency: ≈2.7x (vs EncFS), ≈2.9x (vs eCryptfs)
➢ Macrobenchmark (webserver), with the FS on SSD:
Throughput: gains up to ≈145% (vs EncFS), ≈130% (vs eCryptfs).
CPU efficiency: ≈1.8x (vs EncFS), ≈2.2x (vs eCryptfs)
➢ Competitive even with AESNI, reaching up to ≈32% throughput gain (vs eCryptfs, webserver). However, CPU efficiency is lower there: ≈0.4x (vs eCryptfs, webserver)

Conclusions (cont.)

➢ Main contributions:
→ advantages of applying CTR mode in CFSs (generation, storage and management of nonces)
→ exploring additional advantages of CTR mode (parallelization, speculative encryption and encryption context management)
➢ WAESlib applied to CFSs (abstracts GPU processing complexity, successfully exploits CTR mode, allows different techniques to be built on top of the encryption contexts)

Conclusions (cont.)

➢ GPU processing: significant increases in throughput (including for small requests) and more efficient CPU utilization in environments where processors do not support the acceleration of cryptographic functions (or when other ciphers are used)

➢ Future work:
→ performance analysis with real workloads (better testing / creating new techniques with context pools)
→ extend to other accelerators, multicore or heterogeneous
→ explore CTR / encryption on GPU (WAESlib) with a kernel-space client (e.g. dm-crypt, Linux Crypto API FS client)

Thank you!

Questions?

Backup slide

Amount of time to CTR counter “wraparound”:

Suppose a (current) 200 Gbps encryption capacity: 2 × 10¹¹ bps
AES block size: 128 bits
Time to encrypt 1 AES block: 128 / (2 × 10¹¹) = 6.4 × 10⁻¹⁰ s (0.64 ns)
Number of possible nonces: 2¹²⁸ ≈ 3.4 × 10³⁸ blocks
Time to uniquely encrypt all blocks: (3.4 × 10³⁸) × (6.4 × 10⁻¹⁰) ≈ 2.17 × 10²⁹ s

Years to “wraparound” = 2.17 × 10²⁹ / 31,536,000 ≈ 6.88 × 10²¹ years

Suppose a machine a billion times faster: 2 × 10²⁰ bps
Time to encrypt 1 AES block: 128 / (2 × 10²⁰) = 6.4 × 10⁻¹⁹ s
Time to uniquely encrypt all blocks: (3.4 × 10³⁸) × (6.4 × 10⁻¹⁹) ≈ 2.17 × 10²⁰ s

Years to “wraparound” = 2.17 × 10²⁰ / 31,536,000 ≈ 6.88 × 10¹² years, that is: 6.88 trillion years!
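The same estimate can be recomputed in a few lines of Python. The function name and its parameters are hypothetical, and the sketch simply assumes one 128-bit counter value is consumed per 128-bit AES block at a constant encryption rate.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000


def years_to_wraparound(bits_per_second: float,
                        counter_bits: int = 128,
                        block_bits: int = 128) -> float:
    """Years needed to consume every counter value at a constant rate."""
    seconds_per_block = block_bits / bits_per_second
    return (2 ** counter_bits) * seconds_per_block / SECONDS_PER_YEAR
```

Calling `years_to_wraparound(2e11)` gives about 6.9 × 10²¹ years and `years_to_wraparound(2e20)` about 6.9 × 10¹² years. The small difference from the figures above comes from using the exact value of 2¹²⁸ instead of the rounded 3.4 × 10³⁸.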