Context-Aware Prefetching at the Storage Server

Gokul Soundararajan, Madalin Mihailescu†, and Cristiana Amza
Department of Electrical and Computer Engineering, Department of Computer Science†, University of Toronto

USENIX Association, USENIX '08: 2008 USENIX Annual Technical Conference

Abstract

In many of today's applications, access to storage constitutes the major cost of processing a user request. Data prefetching has been used to alleviate the storage access latency. Under current prefetching techniques, the storage system prefetches a batch of blocks upon detecting an access pattern. However, the high level of concurrency in today's applications typically leads to interleaved block accesses, which makes detecting an access pattern a very challenging problem. Towards this, we propose and evaluate QuickMine, a novel, lightweight and minimally intrusive method for context-aware prefetching. Under QuickMine, we capture application contexts, such as a transaction or query, and leverage them for context-aware prediction and improved prefetching effectiveness in the storage cache.

We implement a prototype of our context-aware prefetching algorithm in a storage-area network (SAN) built using Network Block Device (NBD). Our prototype shows that context-aware prefetching clearly outperforms existing context-oblivious prefetching algorithms, resulting in factors of up to 2 improvements in application latency for two e-commerce workloads with repeatable access patterns, TPC-W and RUBiS.

1 Introduction

In many of today's applications, such as e-commerce, on-line stores, file utilities, photo galleries, etc., access to storage constitutes the major cost of processing a user request. Therefore, recent research has focused on techniques for alleviating the storage access latency through storage caching [14, 23, 29] and prefetching techniques [24, 25, 35, 36, 37]. Many traditional storage prefetching algorithms implement sequential prefetching, where the storage server prefetches a batch of sequential blocks upon detecting a sequential access pattern. Recent algorithms, like C-Miner∗ [24, 25], capture repeatable non-sequential access patterns as well. However, the storage system receives interleaved requests originating from many concurrent application streams. Thus, even if the logical I/O sequence of a particular application translates into physically sequential accesses, and/or the application pattern is highly repeatable, this pattern may be hard to recognize at the storage system. This is the case for concurrent execution of several applications sharing a network attached storage, e.g., as shown in Figure 1, and also for a single application with multiple threads exhibiting different access patterns, e.g., a database application running multiple queries, as also shown in the figure.

We investigate prefetching in storage systems and present a novel caching and prefetching technique that exploits logical application contexts to improve prefetching effectiveness. Our technique employs a context tracking mechanism, as well as a lightweight frequent sequence mining [38] technique. The context tracking mechanism captures application contexts in an application-independent manner, with minimal instrumentation. These contexts are leveraged by the sequence mining technique for detecting block access patterns.

In our context tracking mechanism, we simply tag each application I/O block request with a context identifier corresponding to the higher-level application context, e.g., a web interaction, database transaction, application thread, or database query, where the I/O request to the storage manager occurs. Such contexts are readily available in any application and can be easily captured. We then pass this context identifier along with each read block request, through the operating system, to the storage server. This allows the storage server to correlate the block accesses that it sees into frequent block sequences according to their higher-level context. Based on the derived block correlations, the storage cache manager then issues block prefetches per context rather than globally.

At the storage server, correlating block accesses is performed by the frequent sequence mining component of our approach. In particular, we design and implement a lightweight and dynamic frequent sequence mining technique, called QuickMine. Just like state-of-the-art prefetching algorithms [24, 25], QuickMine detects sequential as well as non-sequential correlations using a history-based mechanism. QuickMine's key novelty lies in detecting and leveraging block correlations within logical application contexts. In addition, QuickMine generates and adapts block correlations incrementally, on-the-fly, through a lightweight mining algorithm. As we will show in our experimental evaluation, these novel features make QuickMine uniquely suitable for on-line pattern mining and prefetching by i) substantially reducing the footprint of the block correlations it generates, ii) improving the likelihood that the block correlations maintained will lead to accurate prefetches, and iii) providing flexibility to dynamic changes in the application pattern and concurrency degree.

Figure 1: Interleaved Accesses. Accesses from concurrent processes/threads are interleaved at the storage server.

We implement QuickMine in a lightweight storage cache prototype embedded into the Network Block Device (NBD) code. We also implement several alternative approaches for comparison with our scheme, including a baseline LRU cache replacement algorithm with no prefetching, and the following state-of-the-art context-oblivious prefetching schemes: two adaptive sequential prefetching schemes [10, 16] and the recently proposed C-Miner∗ storage prefetching algorithm [24, 25].

In our experimental evaluation, we use three standard database applications: the TPC-W e-commerce benchmark [1], the RUBiS auctions benchmark, and DBT-2 [40], a TPC-C-like benchmark [30]. The applications have a wide range of access patterns. TPC-W and RUBiS are read-intensive workloads with highly repeatable access patterns; they contain 80% and 85% read-only transactions, respectively, in their workload mix. In contrast, DBT-2 is a write-intensive application with rapidly changing access patterns; it contains only 4% read-only transactions in its workload mix. We instrument the MySQL/InnoDB database engine to track the contexts of interest. We found that changing the DBMS to incorporate the context into an I/O request was trivial; the DBMS already tracks various contexts, such as database thread, transaction or query, and these contexts are easily obtained for each I/O operation. We perform experiments using our storage cache deployed within NBD, running in a storage area network (SAN) environment.

Our experiments show that the context-aware QuickMine brings substantial latency reductions of up to factors of 2.0. The latency reductions correspond to reductions of miss rates in the storage cache of up to 60%. In contrast, the context-oblivious schemes perform poorly for all benchmarks, with latencies comparable to, or worse than, the baseline. This is due to either i) inaccurate prefetches or ii) non-repeatable (false) block correlations at context boundaries, hence useless prefetch rules in the context-oblivious approaches. Our evaluation shows that QuickMine generates substantially more effective block correlation rules overall, in terms of both the number of prefetches triggered and the prefetching accuracy. We also show that QuickMine is capable of adjusting its correlation rules dynamically, without incurring undue overhead for rapidly changing patterns.

The rest of this paper is organized as follows. Section 2 provides the necessary background and motivates our dynamic, context-aware algorithm. Section 3 introduces our QuickMine context-aware prefetching solution. Section 4 provides details of our implementation. Section 5 describes our experimental platform, methodology, and the other approaches in storage cache management that we evaluate in our experiments. Section 6 presents our experimental results. Section 7 discusses related work and Section 8 concludes the paper.

2 Background and Motivation

We focus on improving the cache hit rate at the storage cache in a SAN environment through prefetching. Our techniques are applicable to situations where the working set of storage clients, like a database system or file system, does not fit into the storage cache hierarchy, i.e., into the combined caching capabilities of storage client and server. This situation is, and will remain, common in the foreseeable future for the following reasons. First, while both client and storage server cache sizes are increasing, so do the memory demands of storage applications, e.g., very large databases. Second, previous research has shown that access patterns at the storage server cache typically exhibit long reuse distances [7, 26], hence poor cache utilization. Third, due to server consolidation trends towards reducing the costs of management in large datacenters, several applications typically run on a cluster with consolidated storage, or even on the same physical server. This creates application interference, hence potential capacity misses, and reduced prefetching effectiveness [19] in the shared storage-level cache.

In the following, we motivate our context-aware prefetching approach through a workload characterization for two e-commerce applications.

2.1 Workload Characterization

We analyze the access patterns of two popular e-commerce benchmarks: TPC-W and RUBiS. We conduct experiments using a network storage server based on the NBD (network block device) protocol built into Linux. The NBD server manages access to physical storage and provides virtual block devices to applications. We experiment with each benchmark separately, varying the number of clients from 1 to 100. Each application runs on MySQL/InnoDB, which uses the NBD client to mount the virtual block device. We provide more details on our experimental setup in Section 5.

We characterize the access patterns of the two applications using the following metrics. The average/maximum sequential run length [39] is the average/maximum length of physically sequential block sequences for the duration of the application's execution. The average context access length is the average number of I/O requests issued by a logical unit of work in the application, i.e., by a transaction. Finally, the interference count [19] measures the interference in the storage cache, defined as the number of requests from other transactions that occur between consecutive requests from a given transaction stream.

In our experiments, we first compute the sequential run lengths when each thread is run in isolation, i.e., on the de-tangled trace [39] for each of the two benchmarks. The lengths are: 1.05 (average) and 277 (maximum) for TPC-W, and 1.14 (average) and 64 (maximum) for RUBiS. We then measure the sequential run lengths on the interleaved access traces, while progressively increasing the number of clients. We find that the sequential run length decreases significantly as we increase the concurrency degree. For example, with 10 concurrently running clients, the sequential run length is already affected: 1.04 (average) and 65 (maximum) for TPC-W, and 1.05 (average) and 64 (maximum) for RUBiS. With the common e-commerce workload of 100 clients, the average sequential run length asymptotically approaches 1.0 for both benchmarks. To further understand the drop in sequential run length, we plot the interference count for each benchmark when increasing the number of clients in Figure 2. The figure shows that the interference count increases steadily with the number of concurrent clients, from 5.87 for TPC-W and 2.91 for RUBiS at 10 clients, to 82.22 for TPC-W and 15.95 for RUBiS with 100 concurrently running clients.

Figure 2: Interference count. Access interleavings of different threads with increasing number of concurrent clients.

To study the lower interference count in RUBiS compared to TPC-W, we compute the average context access length per transaction in the two benchmarks. We find that the average context access length for RUBiS is 71 blocks, compared to 1223 blocks for TPC-W; 87% of the RUBiS transactions are short, reading only 1 to 10 blocks of data, compared to 79% in TPC-W, and several RUBiS transactions access a single block. Hence, in TPC-W, longer logical access sequences result in higher interference opportunities, and for both benchmarks only a few logical sequences translate into physically sequential accesses.

The preliminary results presented in this section show that: i) opportunities for sequential prefetching in e-commerce workloads are low, and ii) random (non-repeatable) access interleavings can occur for the high levels of application concurrency common in e-commerce workloads. Accordingly, these results motivate the need for a prefetching scheme that i) exploits generic (non-sequential) access patterns in the application, and ii) is aware of application concurrency and capable of detecting access patterns per application context. For this purpose, in the following, we introduce the QuickMine algorithm that employs data mining principles to discover access correlations at runtime in a context-aware manner.

3 Context-aware Mining and Prefetching

In this section, we describe our approach to context-aware prefetching at the storage server. We first present an overview of our approach and introduce the terminology we use. We then describe in more detail our technique for tracking application contexts and the QuickMine algorithm for discovering block correlations, and discuss how we leverage them in our prefetching algorithm.

Figure 3: Walk-through. We compare QuickMine with a context-oblivious mining algorithm. (Panel 1: accesses of T1 and T2 interleaved at the storage server; Panel 2: QuickMine's per-context rules, prefixes kept MRU-to-LRU and suffixes MFU-to-LFU; Panel 3: rules mined with no contexts.)

3.1 Overview

We use application-level contexts to guide I/O block prefetching at the storage server. An application-level context is a logical unit of work corresponding to a specific level in the application's structural hierarchy, e.g., a thread, a web interaction, a transaction, a query template, or a query instance.

We tag each I/O access with a context identifier provided by the application and pass these identifiers through the operating system to the storage server. This allows the storage server to group block accesses per application-level context. In the example in Figure 1, assuming that the context identifier of an I/O access is the thread identifier, the storage server is able to differentiate that blocks {1, 2, 3} are accessed by Thread-1 and blocks {5, 7, 9} are accessed by Thread-2, from the interleaved access pattern.

Within each sequence of block accesses thus grouped by application context, the storage server applies our frequent sequence mining algorithm, called QuickMine. The QuickMine algorithm predicts a set of blocks to be accessed with high probability in the near future. The predictions are made based on mining past access patterns. Specifically, QuickMine derives per-context correlation rules for blocks that appear together frequently for a given context. Creation of new correlation rules and pruning of useless old rules for an application and its various contexts occur incrementally, on-the-fly, while the system performs its regular activities, including running other applications. The QuickMine algorithm is embedded in the storage server cache. The storage cache uses the derived rules to issue prefetches for blocks that are expected to occur within short order after a sequence of already seen blocks.

Terminology: For the purposes of our data mining algorithm, a sequence is a list of I/O reads issued by an application context, ordered by the timestamp of their disk requests. A sequence database D = {S1, S2, . . . , Sn} is a set of sequences. The support of a sequence R in the database D is the number of sequences for which R is a subsequence. A subsequence is considered frequent if it occurs with a frequency higher than a predefined min-support threshold. Blocks in frequent subsequences are said to be correlated. Correlated blocks do not have to be consecutive, but they should occur within a small distance, called a gap or lookahead distance, denoted G. The larger the lookahead distance, the more aggressive the algorithm is in determining correlations. For the purposes of our storage caching and prefetching algorithm, we distinguish between three types of cache accesses. A cache hit is an application demand access to a block currently in the storage cache. A promote is a block demand access for which a prefetch has been previously issued; the respective block may or may not have arrived at the cache at the time of the demand access. All other accesses are cache misses.

3.2 Context Tracking

Contexts are delineated with begin and end delimiters and can be nested. We use our context tracking for database applications and track information about three types of contexts: application thread, database transaction, and database query. For database queries, we can track the context of each query instance, or of each query template, i.e., the same query with different argument values. We reuse pre-existing begin and end markers, such as connection establishment/connection tear-down, BEGIN and COMMIT/ROLLBACK statements, and thread creation and destruction, to identify the start and end of a context. For application thread contexts, we tag block accesses with the thread identifier of the database system thread running the interaction.
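The delimiter-based tracking described above can be sketched as follows. This is an illustrative model under our own assumptions (the class and method names are ours, not the paper's API): a per-connection stack of nested contexts, where each I/O is tagged with the innermost active context.

```python
class ContextTracker:
    """Sketch: contexts are delimited by begin/end markers and can be
    nested, e.g., a query context inside a transaction context."""

    def __init__(self):
        self.stack = []    # innermost active context on top
        self.tagged = []   # (context_id, block) pairs, one per I/O request

    def ctx_begin(self, ctx_id):
        self.stack.append(ctx_id)

    def ctx_end(self):
        self.stack.pop()

    def on_io(self, block):
        # Tag the I/O with the innermost active context, if any.
        ctx = self.stack[-1] if self.stack else None
        self.tagged.append((ctx, block))

t = ContextTracker()
t.ctx_begin("txn-7")             # e.g., at a BEGIN statement
t.ctx_begin("query-template-3")  # a query running inside the transaction
t.on_io(42)                      # tagged with the query context
t.ctx_end()                      # query finishes
t.on_io(43)                      # tagged with the enclosing transaction
t.ctx_end()                      # COMMIT/ROLLBACK
```

In the real system the tag travels with each block request to the storage server rather than being recorded locally; the sketch only shows the tagging discipline.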

380 USENIX ’08: 2008 USENIX Annual Technical Conference USENIX Association database transaction contexts by tagging all block ac- the rule prefix {a2&a3}. QuickMine predicts that the cesses between the BEGIN and COMMIT/ABORT with next block to be accessed will be suffix {a4} or {a5} the transaction identifier. A database query context sim- in this order of probability because {a2,a3,a4} occurred ply associates each block access with the query or query twice while {a2,a3,a5} occurred only once. template identifier. We studied the feasibility of our tag- ging approach in three open-source database engines: MySQL, PostgreSQL, and Apache Derby, and we found We track the blocks accessed within each context and the necessary changes to be trivial in all these existing create/update block correlations for that context when- code bases. The implementation and results presented in ever a context ends. We maintain all block correlation this paper are based on minimal instrumentation of the rules in a rule cache, which allows pruning old rules MySQL/InnoDB database server to track transaction and through simple cache replacement policies. The cache query template contexts. replacement policies act in two dimensions: i) for rule While defining meaningful contexts is intuitive, defin- prefixes and ii) within the suffixes of each prefix. We ing the right context granularity for optimizing the keep the most recent max-prefix prefixes in the cache. prefetching algorithm may be non-trivial. There is For each prefix, we keep max-suffix most probable suf- a trade-off between using coarse-grained contexts and fixes. Hence, the cache replacement policies are LRU fine-grained contexts. Fine-grained contexts provide (Least-Recently-Used) for rule prefixes and LFU (Least- greater prediction accuracy, but may limit prefetch- Frequently-Used) for suffixes. Intuitively, these policies ing aggressiveness because they contain fewer accesses. match the goals of our mining algorithm well. 
Since Coarse-grained contexts, on the other hand, provide access paths change over time as the underlying data more prefetching opportunities, but lower accuracy due changes, we need to remember recent hot paths and for- to more variability in access patterns, e.g., due to control get past paths. Furthermore, as mentioned before, we flow within a transaction or thread. need to remember only the most probable suffixes for each prefix. To prevent quick evictions, newly added suf- 3.3 QuickMine fixes are given a grace period. QuickMine derives block correlation rules for each ap- plication context as follows. Given a sequence of al- Example: We show an example of QuickMine in Fig-

ready accessed blocks {a1,a2,...,ak}, and a lookahead ure 3. We contrast context-aware mining with context- parameter G, QuickMine derives block correlation rules oblivious mining. On the left hand side, we show an of the form {ai&aj → ak} for all i, j and k, where example of an interleaved access pattern that is seen at {ai,aj,ak} is a subsequence and i

taken repeatedly during the execution of the correspond- Panel 2, we obtain the list of block accesses after T1 ing application context. For the same rule, {c} is one and T2 finish execution. T1 accesses {6, 7, 8} so we of the block accesses about to follow the prefix with obtain the correlation {6&7 → 8} (rule 1 in Panel 2). high probability and is called a sequence suffix. Typi- T2 accesses {2, 3, 4, 5, 6, 7} leading to more correlations. cally, the same prefix has several different suffixes de- We set the lookahead parameter, G =5, so we look pending on the lookahead distance G and on the variabil- ahead by at most 5 entries when discovering correla- ity of the access patterns within the given context, e.g., tions. We discover correlations {(6&7 → 8), (2&3 → due to control flow. For each prefix, we maintain a list 4), (2&3 → 5),...,(5&6 → 7)}. Finally, we show a of possible suffixes, up to a cut-off max-suffix number. snapshot of context-oblivious mining, with false correla- In addition, with each suffix, we maintain a frequency tions highlighted, in Panel 3. At the end of transaction

counter to track the number of times the suffix was ac- T2, the access pattern {2, 6, 3, 7, 4, 8, 5, 6, 7} is extracted cessed i.e., the support for that block. The list of suffixes and the mined correlations are {(2&6 → 3), (2&6 → is maintained in order of their respective frequencies to 7), (2&6 → 4), (2&6 → 8),...,(5&6 → 7)}. With help predict the most probable block(s) to be accessed context-oblivious mining, false correlations are gener- next. For example, assume that QuickMine has seen ac- ated, e.g., {(2&6 → 3), (2&6 → 4), (2&6 → 8)} are cess patterns {(a1,a2,a3,a4), (a2,a3,a4), (a2,a3,a5)} incorrect. False correlations will be eventually pruned in the past for a given context. QuickMine creates rules since they occur infrequently, hence have low support,

{a2&a3 → a4} and {a2&a3 → a5} for this context. but the pruning process may evict some true correlations Further assume that the current access sequence matches as well.
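The rule derivation in the walk-through can be reproduced with a short sketch. This is our own illustration, not the authors' implementation: it counts support for every {ai & aj → ak} triple whose members fall within the lookahead window of the first element.

```python
from collections import defaultdict

def mine_rules(sequence, G):
    """Derive {a_i & a_j -> a_k} rules with support counts from one
    per-context access sequence, looking ahead at most G entries."""
    rules = defaultdict(int)   # (ai, aj, ak) -> support count
    n = len(sequence)
    for i in range(n):
        window_end = min(i + G + 1, n)   # ai, aj, ak within G entries of ai
        for j in range(i + 1, window_end):
            for k in range(j + 1, window_end):
                rules[(sequence[i], sequence[j], sequence[k])] += 1
    return rules

# T2's context from the walk-through: accesses {2, 3, 4, 5, 6, 7}, G = 5,
# yielding (2&3 -> 4), (2&3 -> 5), ..., (5&6 -> 7).
rules_t2 = mine_rules([2, 3, 4, 5, 6, 7], G=5)
# T1's context: accesses {6, 7, 8}, yielding (6&7 -> 8).
rules_t1 = mine_rules([6, 7, 8], G=5)
```

Running the same function on the undifferentiated interleaving {2, 6, 3, 7, 4, 8, 5, 6, 7} produces the false correlations such as (2&6 → 3) shown in Panel 3, which is exactly the context-oblivious behavior the paper argues against.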

USENIX Association USENIX ’08: 2008 USENIX Annual Technical Conference 381 ity storage firmware. The architecture of our prototype Virtual Storage MySQL is shown in Figure 4. MySQL communicates to the vir- /dev/raw/raw1 NBD Cache Disk tual storage device through standard Linux system calls and drivers, either iSCSI or NBD (network block device),

Linux as shown in the figure. Our storage cache is located on Linux /dev/sdb1 the same physical node as the storage controller, which NBD in our case does not have a cache of its own. The stor- /dev/nbd1 age cache communicates to a disk module which maps virtual disk accesses to physical disk accesses. We mod- CLIENT SERVER ified existing client and server NBD protocol processing modules for the storage client and server, respectively, in order to incorporate context awareness on the I/O com- Figure 4: Storage Architecture: We show one client munication path. connected to a storage server using NBD. In the following, we first describe the interfaces and communication between the core modules, then describe 3.4 Prefetching Algorithm the role of each module in more detail. The storage cache uses the block correlation rules to is- 4.1 Interfaces and Communication sue block prefetches for predicted future read accesses. Block prefetches are issued upon a read block miss. We Storage clients, such as MySQL, use NBD for read- use the last two accesses of the corresponding context to ing and writing logical blocks. For example, as search the rule cache for the prefix just seen in the I/O shown in Figure 4, MySQL mounts the NBD device block sequence. We determine the set of possible blocks (/dev/nbd1)on/dev/raw/raw1. The Linux vir- to be accessed next as the set of suffixes stored for the tual disk driver uses the NBD protocol to communicate corresponding prefix. We prefetch blocks that are not with the storage server. In NBD, an I/O request from currently in the cache starting with the highest support the client takes the form block up to either the maximum suffixes stored or the where type is a read or write. The I/O request is passed maximum prefetching degree. 
The number of block prefetches issued upon a block miss, called prefetch aggressiveness, is a configurable parameter. We set the prefetching aggressiveness to the same value as max-suffix for all contexts. However, in heavy load situations, we limit the prefetching aggressiveness to prevent saturating the storage bandwidth. Specifically, we leverage our context information to selectively throttle or disable prefetching for contexts where prefetching is not beneficial. Prefetching benefit is determined per context, as the ratio of prefetched blocks used by the application to prefetches issued.

The prefetched blocks brought in by one application context may, however, be consumed by a different application context due to data affinity. We call such contexts symbiotic contexts. We determine symbiotic context sets by assigning context identifiers tracking the issuing and using contexts for each prefetched cache block. We then monitor the prefetching benefit at the level of symbiotic context sets rather than per individual context. We disable prefetching for contexts (or symbiotic context sets) performing poorly.

4 Implementation Details

We implement our algorithms in our Linux-based virtual storage prototype, which can be deployed over commodity hardware. I/O requests to the virtual disk are passed by the OS to the NBD kernel driver on the client, which transfers the request over the network to the NBD protocol module running on the storage server.

4.2 Modules

Each module consists of several threads processing requests. The modules are interconnected through in-memory bounded buffers. The modular design allows us to build many storage configurations by simply connecting different modules together.

Disk module: The disk module sits at the lowest level of the module hierarchy. It provides the interface with the underlying physical disk by translating application I/O requests to the virtual disk into pread()/pwrite() system calls, reading/writing the underlying physical data. We disable the operating system buffer cache by using direct I/O, i.e., the O_DIRECT flag in Linux.

Cache module: The cache module supports context-aware caching and prefetching. We developed a portable caching library providing a simple hashtable-like interface modelled after Berkeley DB. If the requested block is found in the cache, the access is a cache hit and we return the data. Otherwise, the access is a cache miss: we fetch the block from the next level in the storage hierarchy, store it in the cache, then return the data. When prefetching is enabled, the cache is partitioned into two
382 USENIX ’08: 2008 USENIX Annual Technical Conference — USENIX Association

areas: a main cache (MC) and a prefetch cache (PFC). The PFC contains blocks that were fetched from disk by the prefetching algorithm. The MC contains blocks that were requested by application threads. If an application thread requests a block for which a prefetch has been issued, we classify the access as a promote. A block promote may imply either waiting for the block to arrive from disk, or simply moving the block from PFC to MC if the block is already in the cache.

We use Berkeley DB to implement the rule cache and store the mined correlations. The caching keys are rule prefixes and the cached data are the rule suffixes. The suffixes are maintained using the LFU replacement algorithm and the prefixes are maintained using LRU. The LRU policy is implemented using timeouts, where we periodically purge old entries. We configure Berkeley DB’s environment to use a memory pool of 4MB.

NBD Protocol module: We modify the original NBD protocol module on the server side, used in Linux for virtual disk access, to convert the NBD packets into our own internal protocol packets, i.e., into calls to our server cache module.

4.3 Code Changes

To allow context awareness, we make minor changes to MySQL, the Linux kernel, and the NBD protocol.

Linux: The changes required in the kernel are straightforward and minimal. In the simplest case, we need to pass a context identifier on I/O calls as a separate argument into the kernel. In order to allow more flexibility in our implementation, and enhancements such as per-context tracking of prefetch effectiveness, we pass a handle to a context structure, which contains the transaction identifier and query template identifier. We add three new system calls: ctx_pread(), ctx_pwrite() and ctx_action(). ctx_action() allows us to inform the storage server of context begin/end delimiters. Each system call takes a struct context * as a parameter representing the context of the I/O call. This context handle is passed along the kernel until it reaches the lowest level, where the kernel contacts the block storage device. Specifically, we add a field context to struct request, which allows us to pass the context information through the I/O subsystem with no additional code changes. Once the I/O request reaches the NBD driver code, we copy the context information into the NBD request packet and pass the information to the storage server.

NBD Protocol: We simply piggyback the context information on the NBD packet. In addition, we add two new messages to the NBD protocol, for the corresponding system call ctx_action(), to signify the beginning of a context (CTX_BEG) and the end of a context (CTX_END).

MySQL: A THD object is allocated for each connection made to MySQL. The THD object contains all contextual information that we communicate to the storage server. For example, THD.query contains the query currently being executed by the thread. We generate the query template identifier using the query string. In addition, we call our ctx_action() as appropriate, e.g., at transaction begin/end and at connection setup/tear-down, to inform the storage server of the start/end of a context.

5 Evaluation

In this section, we describe several prefetching algorithms we use for comparison with QuickMine and evaluate the performance using three industry-standard benchmarks: TPC-W, RUBiS, and DBT-2.

5.1 Prefetching Algorithms used for Comparison

In this section, we describe several prefetching algorithms that we use for comparison with QuickMine. These algorithms are categorized into sequential prefetching schemes (RUN and SEQ) and history-based prefetching schemes (C-Miner∗). The RUN and C-Miner∗ algorithms share some of the features of QuickMine, specifically, some form of concurrency awareness (RUN) and history-based access pattern detection and prefetching (C-Miner∗).

Adaptive Sequential Prefetching (SEQ): We implement an adaptive prefetching scheme following the sequence-based read-ahead algorithm implemented in Linux [10]. Sequence-based prefetching is activated when the algorithm detects a sequence (S) of accesses to K contiguous blocks. The algorithm uses two windows: a current window and a read-ahead window. Prefetches are issued for blocks contained in both windows. In addition, the number of prefetched blocks used within each window is monitored, i.e., the block hits in the current window and the read-ahead window, S.curHit and S.reaHit, respectively. When a block is accessed in the read-ahead window, the current window is set to the read-ahead window and a new read-ahead window of size 2 ∗ S.curHit is created. To limit the prefetching aggressiveness, the size of the read-ahead window is limited to 128KB, as suggested by the authors [10].

Run-Based Prefetching (RUN): Hsu et al. [16] show that many workloads, particularly database workloads, do not exhibit strict sequential behavior, mainly due to high application concurrency. To capture sequentiality in a multi-threaded environment, Hsu et al. introduce run-based prefetching (RUN) [17]. In run-based prefetch-
ing, prefetching is initiated when a sequential run is detected. A reference r to block b is considered to be part of a sequential run R if b lies within −extent_backward and +extent_forward of the largest block accessed in R, denoted by R.maxBlock. This modified definition of sequentiality thus accommodates small jumps and small reverses within an access trace. Once the size of the run R exceeds a sequentiality threshold (32KB), prefetching is initiated for 16 blocks, from R.maxBlock + 1 to R.maxBlock + 16.

History-based Prefetching: There are several history-based prefetching algorithms proposed in recent work [17, 13, 20, 26]. C-Miner∗ is a static mining algorithm that extracts frequent block subsequences by mining the entire sequence database [25]. C-Miner∗ builds on a frequent subsequence mining algorithm, CloSpan [42]. It differs from QuickMine by mining block correlations off-line, on a long sequence of I/O block accesses. First, C-Miner∗ breaks the long sequence trace into smaller sequences and creates a sequence database. From these sequences, as in QuickMine, the algorithm considers frequent sequences of blocks that occur within a gap window. Given the sequence databases and using the gap and min_support parameters, the algorithm extracts frequent closed sequences, i.e., subsequences whose support value is different from that of their super-sequences. For example, if {a1,a2,a3,a4} is a frequent subsequence with support value of 5 and {a1,a2,a3} is a subsequence with support value of 5, then only {a1,a2,a3,a4} will be used in the final result. On the other hand, if {a1,a2,a3} has a support of 6, then both sequences are recorded. For each closed frequent sequence, e.g., {a1,a2,a3,a4}, C-Miner∗ generates association rules of the form {(a1 → a2), (a1&a2 → a3), ..., (a3 → a4)}. As an optimization, C-Miner∗ uses the frequency of a rule suffix in the rule set to prune predictions of low probability, through a parameter called min_confidence. For example, if the mined trace contains 80 sequences with {a2&a3 → a4} and 20 sequences with {a2&a3 → a5}, then {a2&a3 → a5} has a (relatively low) confidence of 20% and might be pruned depending on the min_confidence threshold. In our experiments, we use max_gap = 10, min_support = 1, and min_confidence = 10% for C-Miner∗.

5.2 Benchmarks

TPC-W10: The TPC-W benchmark from the Transaction Processing Council [1] is a transactional web benchmark designed for evaluating e-commerce systems. Several web interactions are used to simulate the activity of a retail store. The database size is determined by the number of items in the inventory and the size of the customer population. We use 100K items and 2.8 million customers, which results in a database of about 4 GB. We use the shopping workload that consists of 20% writes. To fully stress our architecture, we create TPC-W10 by running 10 TPC-W instances in parallel, creating a database of 40 GB.

RUBiS10: We use the RUBiS Auction Benchmark to simulate a bidding workload similar to e-Bay. The benchmark implements the core functionality of an auction site: selling, browsing, and bidding. We do not implement complementary services like instant messaging, or newsgroups. We distinguish between three kinds of user sessions: visitor, buyer, and seller. For a visitor session, users need not register but are only allowed to browse. Buyer and seller sessions require registration. In addition to the functionality provided during the visitor sessions, during a buyer session, users can bid on items and consult a summary of their current bid, rating, and comments left by other users. We are using the default RUBiS bidding workload containing 15% writes, considered the most representative of an auction site workload according to an earlier study of e-Bay workloads [34]. We create a scaled workload, RUBiS10, by running 10 RUBiS instances in parallel.

DBT-2: DBT-2 is an OLTP workload derived from the TPC-C benchmark [30, 40]. It simulates a wholesale parts supplier that operates using a number of warehouses and sales districts. Each warehouse has 10 sales districts and each district serves 3000 customers. The workload involves transactions from a number of terminal operators centered around an order entry environment. There are 5 main transactions for: (1) entering orders (New Order), (2) delivering orders (Delivery), (3) recording payments (Payment), (4) checking the status of the orders (Order Status), and (5) monitoring the level of stock at the warehouses (Stock Level). Of the 5 transactions, only Stock Level is read-only, but it constitutes only 4% of the workload mix. We scale DBT-2 by using 256 warehouses, which gives a database footprint of 60GB.

5.3 Evaluation Methodology

We run our Web based applications on a dynamic content infrastructure consisting of the Apache web server, the PHP application server and the MySQL/InnoDB (version 5.0.24) database storage engine. For the database applications, we use the test harness provided by each benchmark while hosting the database on MySQL. We run the Apache Web server and MySQL on a Dell PowerEdge SC1450 with dual Intel Xeon processors running at 3.0 GHz with 2GB of memory. MySQL connects to the raw device hosted by the NBD server. We run the NBD server on a Dell PowerEdge PE1950 with 8 Intel Xeon processors running at 2.8 GHz with 3GB of memory. To maximize I/O bandwidth, we use RAID 0 on 15
10K RPM 250GB hard disks. We install Ubuntu 6.06 on both the client and server machines with Linux kernel version 2.6.18-smp.

We configure our caching library to use a 16KB block size to match the MySQL/InnoDB block size. We use 100 clients for TPC-W10 and RUBiS10. For DBT-2, we use 256 warehouses. We run each experiment for two hours. We train C-Miner∗ on a trace collected from the first hour of the experiment. We measure statistics for both C-Miner∗ and QuickMine during the second hour of the experiment. We use a lookahead value of 10 for C-Miner∗, which is the best value found experimentally for the given applications and number of clients used in the experiments. QuickMine is less sensitive to the lookahead value, and any lookahead value between 5 and 10 gives similar results. We use a lookahead value of 5 for QuickMine.

[Figure 5: TPC-W. Prefetching benefit in terms of miss rate and average read latency. Panels: (a) Hit/Miss/Promote (512M); (b) Hit/Miss/Promote (1024M); (c) Hit/Miss/Promote (2048M); (d) Average Read Latency (ms) versus total storage cache size (MB), for LRU, SEQ, RUN, CMINER, and QMINE.]

6 Results

We evaluate the performance of the following schemes: a baseline caching scheme with no prefetching (denoted as LRU), adaptive sequential prefetching (SEQ), run-based prefetching (RUN), C-Miner∗ (CMINER), and QuickMine (QMINE). In Section 6.1, we present the overall performance of those schemes, whereas in Section 6.2, we provide detailed analysis to further understand the achieved overall performance of each scheme.

6.1 Overall Performance

In this section, we measure the storage cache hit rates, miss rates, promote rates and the average read latency for each of the prefetching schemes by running our three benchmarks in several cache configurations. For all experiments, the MySQL/InnoDB buffer pool is set to 1024MB and we partition the storage cache such that the prefetching area is a fixed 4% of the total storage cache size. For TPC-W10 and RUBiS10, we use 100 clients and we vary the storage cache size, showing results for 512MB, 1024MB, and 2048MB storage caches for each benchmark. For DBT-2, we show results only for the 1024MB storage cache, since results for other cache sizes are similar. In QuickMine, we use query-template contexts for TPC-W and RUBiS and transaction contexts for DBT-2. However, the results vary only slightly with the context granularity for our three benchmarks.

TPC-W: Figures 5(a)-5(c) show the hit rates (black), miss rates (white), and promote rates (shaded) for all prefetching schemes with TPC-W10. For a 512MB storage cache, as shown in Fig. 5(a), the baseline (LRU) miss rate is 89%. The sequential prefetching schemes reduce the miss rate by 5% on average. C-Miner∗ reduces the miss rate by 15%, while QuickMine reduces the miss rate by 60%. The benefit of the sequential prefetching schemes is low due to lack of sequentiality in the workload. With larger cache sizes, the baseline miss rates are reduced to 45% for the 1024MB cache (Fig. 5(b)) and to 20% for the 2048MB cache (Fig. 5(c)). With lower miss rates, there are lower opportunities for prefetching. In spite of this, QuickMine still provides a 30% and 17% reduction in miss rates for the 1024MB and 2048MB cache sizes, respectively.

For QuickMine, the cache miss reductions translate into substantial read latency reductions as well, as shown in Figure 5(d). In their turn, these translate into decreases in overall storage access latency by factors ranging from 2.0 for the 512MB to 1.2 for the 2048MB caches, respectively. This is in contrast to the other prefetching algorithms, where the average read latency is comparable to the baseline average latency for all storage cache sizes.

RUBiS: Context-aware prefetching benefits RUBiS as well, as shown in Figure 6. The baseline (LRU) cache miss rates are 85%, 36% and 2% for the 512MB, 1024MB, and 2048MB cache sizes, respectively. For a 512MB cache, as shown in Figure 6(a), the SEQ and RUN schemes reduce the miss rate by 6% and 9%, respectively. C-Miner∗ reduces the miss rate by only 3%, while QuickMine reduces the miss rate by 48%. In RUBiS, the queries access a spatially-local cluster of blocks. Thus, triggering prefetching for a weakly sequential access pattern, as in RUN, results in more promotes than for SEQ. C-Miner∗ performs poorly for RUBiS because many RUBiS contexts are short. This results in many false correlations across context boundaries in C-Miner∗.
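The effect of such cross-context false correlations can be illustrated with a small sketch (ours, not QuickMine's or C-Miner∗'s code; the context identifiers and block numbers are invented for the example): a context-oblivious miner learns adjacent-block rules from the interleaved trace as seen at the storage server, while a context-aware miner first separates the trace by context identifier.

```python
# Illustrative sketch of why context awareness avoids false block
# correlations. A context-oblivious miner learns "a is followed by b"
# rules across an interleaved trace; a context-aware miner separates
# the trace by context identifier before mining.
from collections import defaultdict

def mine_pairs(blocks):
    """Learn naive adjacent-pair rules (a, b): block b follows block a."""
    return set(zip(blocks, blocks[1:]))

def mine_pairs_per_context(trace):
    """trace: list of (context_id, block). Mine within each context only."""
    streams = defaultdict(list)
    for ctx, block in trace:
        streams[ctx].append(block)
    rules = set()
    for blocks in streams.values():
        rules |= mine_pairs(blocks)
    return rules

# Two query contexts, each with a perfectly repeatable access pattern,
# interleaved by concurrency at the storage server:
trace = [("q1", 10), ("q2", 70), ("q1", 11), ("q2", 71),
         ("q1", 12), ("q2", 72)]

oblivious = mine_pairs([b for _, b in trace])
aware = mine_pairs_per_context(trace)
# The oblivious miner learns only cross-context pairs such as (10, 70),
# which never help prediction; the context-aware miner learns the true
# per-stream rules (10, 11), (11, 12), (70, 71), (71, 72).
```

Separating the streams removes the spurious cross-stream pairs, which is the same reason the de-tangled traces discussed in Section 6.2 yield an order of magnitude fewer rules.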

[Figure 6: RUBiS. Prefetching benefit in terms of miss rate and average read latency. Panels: (a) Hit/Miss/Promote (512M); (b) Hit/Miss/Promote (1024M); (c) Hit/Miss/Promote (2048M); (d) Average Read Latency (ms) versus total storage cache size (MB), for LRU, SEQ, RUN, CMINER, and QMINE.]

Hence, only a few prefetch rule prefixes derived during training can be later matched on-line, while many rule suffixes are pruned due to low confidence. QuickMine overcomes the limitations of C-Miner∗ by tracking correlations per context.

The performance of the prefetching algorithms is reflected in the average read latency as well. As shown in Figure 6(d), the sequential prefetching schemes (SEQ and RUN) reduce the average read latency by up to 10% compared to LRU. The reductions in miss rate using QuickMine translate to reductions in read latencies of 45% (512MB) and 22% (1024MB) compared to LRU, corresponding to an overall storage access latency reduction by a factor of 1.63 for the 512MB cache and 1.3 for the 1024MB cache.

DBT-2: Prefetching is difficult for DBT-2, since the workload mix for this benchmark contains a high fraction of writes; furthermore, some of its transactions issue very few I/Os. The I/O accesses are not sequential. As Figure 7 shows, the lack of sequentiality causes the sequential prefetching algorithms to perform poorly. Sequential prefetching schemes decrease the miss rate by less than 1%. C-Miner∗ and QuickMine perform slightly better. C-Miner∗ lowers the miss rate by 2% and QuickMine reduces the miss rate by 6%. However, the high I/O footprint of this benchmark causes disk congestion, hence increases the promote latency. Overall, the average read latency increases by 2% for the sequential prefetching schemes and C-Miner∗, while the read latency is reduced by 3% for QuickMine.

[Figure 7: DBT-2. Prefetching benefit in terms of miss rate and average read latency for the 1024MB storage cache. Panels: (a) Hit/Miss/Promote; (b) Average Read Latency (ms), for LRU, SEQ, RUN, CMINER, and QMINE.]

6.2 Detailed Analysis

In this section, we evaluate the prefetching effectiveness of the different schemes by measuring the number of prefetches issued and their accuracy, i.e., the percentage of prefetched blocks consumed by the application. Both metrics are important since, if only a few prefetches are issued, their overall impact is low, even at high accuracy for these prefetches. We also compare the two history-based schemes, QuickMine and C-Miner∗, in more detail. Specifically, we show the benefits of context awareness and the benefit of incremental mining, versus static mining.

[Figure 8: RUBiS10/TPC-W10. We show the number of prefetches issued (in thousands) and the accuracy of the prefetches, versus total cache size (MB). Panels: (a) TPC-W (Issued); (b) TPC-W (Accuracy); (c) RUBiS (Issued); (d) RUBiS (Accuracy), for SEQ, RUN, CMINER, and QMINE.]

Detailed Comparison of Prefetching Effectiveness: In Figure 8, we show the number of prefetches issued, and their corresponding accuracy, for all prefetching algorithms. For TPC-W10 with a 512MB cache, shown in Figure 8(a), QuickMine issues 2M prefetches, while C-Miner∗, SEQ, and RUN issue less than 500K prefetches. The RUN scheme is the least accurate (< 50%) since many prefetches are spuriously triggered. The SEQ scheme exhibits a better prefetch accuracy of between 50% and 75% for the three cache sizes. Both QuickMine
and C-Miner∗ achieve greater than 75% accuracy. While C-Miner∗ has slightly higher accuracy than QuickMine for the rules issued, this accuracy corresponds to substantially fewer rules than QuickMine. This is because many of the C-Miner∗ correlation rules correspond to false correlations at the context switch boundary, hence are not triggered at runtime. As a positive side-effect, the higher number of issued prefetches in QuickMine allows the disk scheduler to re-order the requests for optimizing seeks, thus reducing the average prefetch latency. As a result, average promote latencies are significantly lower in QuickMine compared to C-Miner∗, specifically, 600µs versus 2400µs for the 512MB cache. For comparison, a cache hit takes 7µs on average and a cache miss takes 3200µs on average, for all algorithms.

For RUBiS10 with a 512MB cache, shown in Figure 8(c), QuickMine issues 1.5M prefetches, which is ten times more than C-Miner∗ and SEQ. In the RUN scheme, the spatial locality of RUBiS causes more prefetches (250K) to be issued compared to SEQ, but only 38% of these are accurate, as shown in Figure 8(d). As before, while C-Miner∗ is highly accurate (92%) for the prefetches it issues, substantially fewer correlation rules are matched at runtime compared to QuickMine, due to false correlations. With larger cache sizes, there is less opportunity for prefetching, because there are fewer storage cache misses, but at all cache sizes QuickMine issues more prefetch requests than other prefetching schemes. As for TPC-W, the higher number of prefetches results in a lower promote latency for QuickMine compared to C-Miner∗, i.e., 150µs versus 650µs for RUBiS in the 512MB cache configuration.

Benefit of Context Awareness: We compare the total number of correlation rules generated by frequent sequence mining, with and without context awareness. In our evaluation, we isolate the impact of context awareness from other algorithm artifacts, by running C-Miner∗ without rule pruning, on its original access traces of RUBiS and DBT-2, and the de-tangled access traces of the same. In the de-tangled trace, the accesses are separated by thread identifier, then concatenated. We notice an order of magnitude reduction in the number of rules generated by C-Miner∗. Specifically, on the original traces, C-Miner∗ without rule pruning generates 8M rules and 36M rules for RUBiS and DBT-2, respectively. Using the de-tangled trace, C-Miner∗ without rule pruning generates 800K rules for RUBiS and 2.8M rules for DBT-2. These experiments show that context awareness reduces the number of rules generated, because it avoids generating false block correlation rules for the blocks at the context switch boundaries.

Another benefit of context-awareness is that it makes parameter settings in QuickMine insensitive to the concurrency degree. For example, the value of the lookahead/gap parameter correlates with the concurrency degree in context-oblivious approaches, such as C-Miner∗, i.e., needs to be higher for a higher concurrency degree. In contrast, QuickMine's parameters, including the value of the lookahead parameter, are independent of the concurrency degree; they mainly control the prefetch aggressiveness. While the ideal prefetch aggressiveness does depend on the application and environment, QuickMine has built-in dynamic tuning mechanisms that make it robust to overly aggressive parameter settings.

The ability to dynamically tune the prefetching decisions at run-time is yet another benefit of context-awareness. For example, in TPC-W, QuickMine automatically detects that the BestSeller and the symbiotic pair of Search and NewProducts benefit the most from prefetching, while other queries in TPC-W do not. Similarly, it detects that only the StockLevel transaction in DBT-2 benefits from prefetching. Tracking the prefetching benefit per context allows QuickMine to selectively disable or throttle prefetching for low-performing query templates, thus avoiding unnecessary disk congestion caused by useless prefetches. In particular, this feature allows QuickMine to provide a small benefit for DBT-2, while C-Miner∗ degrades the application performance.

Benefit of Incremental Mining: We show the benefit of dynamically and incrementally updating correlation rules through the use of the LRU-based rule cache as in QuickMine, versus statically mining the entire sequence database to generate association rules as in C-Miner∗ [24, 25]. Figure 9 shows the number of promotes, cumulatively, over the duration of the experiment for C-Miner∗, C-Miner∗ with periodic retraining (denoted as C-Miner+) and QuickMine. For these experiments, we train C-Miner∗ on the de-tangled trace to eliminate the effects of interleaved I/O, hence make it comparable with QuickMine. As Figure 9 shows, the change in the access patterns limits the prefetching effectiveness of C-Miner∗, since many of its mined correlations become obsolete quickly. Thus, no new promotes are issued after the first 10 minutes of the experiment. In C-Miner+, where we retrain C-Miner∗ at the 10 minute mark, and at the 20 minute mark of the experiment, the effectiveness of prefetching improves. However, C-Miner+ still lags behind QuickMine, which adjusts its rules continuously, on-the-fly. By dynamically aging old correlation rules and incrementally learning new correlations, QuickMine maintains a steady performance throughout the experiment. The dynamic nature of QuickMine allows it to automatically and gracefully adapt to the changes in the I/O access pattern, hence eliminating the need for explicit retraining decisions.
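The incremental alternative can be sketched as follows (our simplified illustration, not QuickMine's implementation, which builds on Berkeley DB as described in Section 4.2; the class and parameter names are invented): rule prefixes are kept in LRU order, the suffixes under each prefix are replaced LFU, and every newly observed correlation updates the cache in place, so stale rules age out without any retraining pass.

```python
# Illustrative sketch of an incremental rule cache: prefixes are kept
# LRU, the suffixes under each prefix are kept LFU, so obsolete block
# correlations age out as new ones are learned on-the-fly.
from collections import Counter, OrderedDict

class RuleCache:
    def __init__(self, max_prefixes=4, max_suffixes=4):
        self.max_prefixes = max_prefixes
        self.max_suffixes = max_suffixes
        self.rules = OrderedDict()  # prefix -> Counter of suffix frequencies

    def learn(self, prefix, suffix):
        # Insert or refresh the prefix; OrderedDict order doubles as LRU order.
        suffixes = self.rules.setdefault(prefix, Counter())
        self.rules.move_to_end(prefix)
        suffixes[suffix] += 1
        if len(suffixes) > self.max_suffixes:
            # Evict the least frequently used suffix for this prefix.
            lfu = min(suffixes, key=suffixes.get)
            del suffixes[lfu]
        if len(self.rules) > self.max_prefixes:
            self.rules.popitem(last=False)  # evict the LRU prefix

    def predict(self, prefix, n=1):
        if prefix not in self.rules:
            return []
        self.rules.move_to_end(prefix)  # touching a prefix keeps it warm
        return [s for s, _ in self.rules[prefix].most_common(n)]
```

When the access pattern shifts, new (prefix, suffix) pairs simply displace cold prefixes and infrequent suffixes, which is the behavior contrasted above with C-Miner+'s periodic retraining.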

[Figure 9: Incremental Mining. When access patterns change, the performance of static mining deteriorates over time. Cumulative number of promotes (thousands) over time (minutes), for CMINER, CMINER+, and QMINE.]

7 Related Work

This section discusses related techniques for improving caching efficiency at the storage server, including: i) collaborative approaches like our own, which pass explicit hints between client and storage caches, or require more extensive code restructuring and reorganization, ii) gray-box approaches, which infer application patterns at the storage based on application semantics known a priori, and iii) black box approaches, which infer application patterns at the storage server in the absence of any semantic information.

Explicitly Collaborative Approaches. Several approaches pass explicit hints from the client cache to the storage cache [6, 12, 23, 28]. These hints can indicate, for example, the reason behind a write block request to storage [23], explicit demotions of blocks from the storage client to the server cache [41], or the relative importance of requested blocks [7]. These techniques modify the interface between the storage client and server, by requiring that an additional identifier (representing the hint) be passed to the storage server. Thus, similar to QuickMine, these techniques improve storage cache efficiency through explicit context information. However, as opposed to our work, inserting the appropriate hints needs thorough understanding of the application internals. For example, Li et al. [23] require the understanding of database system internals to distinguish the context surrounding each block I/O request. In contrast, we use readily available information within the application about preexisting contexts.

In general, collaboration between the application and storage server has been extensively studied in the context of database systems, e.g., by providing the DBMS with more knowledge of the underlying storage characteristics [32], by providing application semantic knowledge to storage servers, i.e., by mapping database relations to objects [33], or by offloading some tasks to the storage server [31]. Other recent approaches in this area [4, 11, 15] take advantage of context information available to the database query optimizer [11, 15], or add new middleware components for exploiting explicit query dependencies, e.g., by SQL code re-writing to group related queries together [4]. Unlike our technique, these explicitly collaborative approaches require substantial restructuring of the database system, code reorganization in the application, or modifications to the software stack in order to effectively leverage semantic contexts. In contrast, we show that substantial performance advantages can be obtained with minimal changes to existing software components and interfaces.

Gray-Box Approaches. Transparent techniques for storage cache optimization leverage I/O meta-data, or application semantics known a priori. Meta-data based approaches include using file-system meta-data, i.e., distinguishing between i-node and data blocks explicitly, or using file names [5, 20, 22, 35, 43], or indirectly by probing at the storage client [2, 3, 37]. Alternative techniques based on application semantics leverage the program backtrace [12], user information [43], or specific characteristics, such as in-memory addresses of I/O requests [8, 18], to classify or infer application patterns. Sivathanu et al. [36] use minimally intrusive instrumentation to the DBMS and log snooping to record a number of statistics, such that the storage system can provide cache exclusiveness and reliability for the database system running on top. However, this technique is DBMS-specific; the storage server needs to be aware of the characteristics of the particular database system. Graph-based prefetching techniques based on discovering correlations among files in filesystems [13, 21] also fall into this category, although they are not scalable to the number of blocks typical in storage systems [24].

In contrast to the above approaches, our work is generally applicable to any type of storage client and application; any database and file-based application can benefit from QuickMine. We can use arbitrary contexts, not necessarily tied to the accesses of a particular user [43], known application code paths [12], or certain types of meta-data accesses, which may be client or application specific.

Black Box Approaches. Our work is also related to caching/prefetching techniques that treat the storage as a black box, and use fully transparent techniques to infer access patterns [10, 17, 26, 27]. These techniques use sophisticated sequence detection algorithms for detecting application I/O patterns, in spite of access interleaving due to concurrency at the storage server. In this paper, we have implemented and compared against two such techniques, run-based prefetching [17], and C-Miner∗ [24, 25]. We have shown that the high concurrency degree common in e-commerce applications makes these techniques ineffective. We have also argued

388 USENIX ’08: 2008 USENIX Annual Technical Conference USENIX Association

that QuickMine's incremental, dynamic approach is the most suitable in modern environments, where the number of applications, and the number of clients for each application, hence the degree of concurrency at the storage server, vary dynamically.

8 Conclusions and Future Work

The high concurrency degree in modern applications makes recognizing higher-level application access patterns challenging at the storage level, because the storage server sees random interleavings of accesses from different application streams. We introduce QuickMine, a novel caching and prefetching approach that exploits the knowledge of logical application sequences to improve prefetching effectiveness for storage systems.

QuickMine is based on a minimally intrusive method for capturing high-level application contexts, such as an application thread, database transaction, or query. QuickMine leverages these contexts at the storage cache through a dynamic, incremental approach to I/O block prefetching.

We implement our context-aware, incremental mining technique at the storage cache in the Network Block Device (NBD), and we compare it with three state-of-the-art context-oblivious sequential and non-sequential prefetching algorithms. In our evaluation, we use three dynamic content applications accessing a MySQL database engine: the TPC-W e-commerce benchmark, the RUBiS auctions benchmark, and DBT-2, a TPC-C-like benchmark. Our results show that context-awareness improves the effectiveness of block prefetching, which results in reduced cache miss rates, by up to 60%, and substantial reductions in storage access latencies, by up to a factor of 2, for the read-intensive TPC-W and RUBiS. Due to the write-intensive nature and rapidly changing access patterns of DBT-2, QuickMine has fewer opportunities for improvement in this benchmark. However, we show that our algorithm does not degrade performance: by pruning useless prefetches for low-performing contexts, it avoids unnecessary disk congestion while gracefully adapting to the changing application pattern.

We expect that context-aware caching and prefetching techniques will be of most benefit in modern environments, where the client load and the number of co-scheduled applications change continuously, and largely unpredictably. In these environments, a fully on-line, incremental technique that is robust to changes and insensitive to the concurrency degree, such as QuickMine, has clear advantages. We believe that our approach can match the needs of many state-of-the-art database and file-based applications. For example, various persistence solutions, such as Berkeley DB or Amazon's Dynamo [9], use a mapping scheme between logical identifiers and physical block numbers, e.g., corresponding to the MD5 hash function [9]. Extending the applicability of our QuickMine algorithm to such logical-to-physical mappings is an area of future work.

Acknowledgements

We thank the anonymous reviewers for their comments, Livio Soares for helping with the Linux kernel changes, and Mohamed Sharaf for helping us improve the writing of this paper. We also acknowledge the generous support of the Natural Sciences and Engineering Research Council of Canada (NSERC), through an NSERC Canada CGS scholarship for Gokul Soundararajan, and several NSERC Discovery and NSERC CRD faculty grants, Ontario Centres of Excellence (OCE), IBM Center of Advanced Studies (IBM CAS), IBM Research, and Intel.

References

[1] Transaction Processing Council. http://www.tpc.org.

[2] ARPACI-DUSSEAU, A. C., AND ARPACI-DUSSEAU, R. H. Information and Control in Gray-Box Systems. In Proc. of the 18th ACM Symposium on Operating System Principles (October 2001), pp. 43–56.

[3] BAIRAVASUNDARAM, L. N., SIVATHANU, M., ARPACI-DUSSEAU, A. C., AND ARPACI-DUSSEAU, R. H. X-RAY: A Non-Invasive Exclusive Caching Mechanism for RAIDs. In Proc. of the 31st International Symposium on Computer Architecture (June 2004), pp. 176–187.

[4] BOWMAN, I. T., AND SALEM, K. Optimization of Query Streams Using Semantic Prefetching. ACM Transactions on Database Systems 30, 4 (December 2005), 1056–1101.

[5] CAO, P., FELTEN, E. W., KARLIN, A. R., AND LI, K. A study of integrated prefetching and caching strategies. In Proc. of the International Conference on Measurements and Modeling of Computer Systems, SIGMETRICS (1995), pp. 188–197.

[6] CHANG, F. W., AND GIBSON, G. A. Automatic I/O hint generation through speculative execution. In Proc. of the 3rd USENIX Symposium on Operating Systems Design and Implementation (1999), pp. 1–14.

[7] CHEN, Z., ZHANG, Y., ZHOU, Y., SCOTT, H., AND SCHIEFER, B. Empirical Evaluation of Multi-level Buffer Cache Collaboration for Storage Systems. In Proc. of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (June 2005), pp. 145–156.

[8] CHEN, Z., ZHOU, Y., AND LI, K. Eviction-based Cache Placement for Storage Caches. In Proc. of the USENIX Annual Technical Conference, General Track (June 2003), pp. 269–281.

[9] DECANDIA, G., HASTORUN, D., JAMPANI, M., KAKULAPATI, G., LAKSHMAN, A., PILCHIN, A., SIVASUBRAMANIAN, S., VOSSHALL, P., AND VOGELS, W. Dynamo: Amazon's highly available key-value store. In Proc. of the 21st ACM Symposium on Operating Systems Principles (2007), pp. 205–220.

[10] DING, X., JIANG, S., CHEN, F., DAVIS, K., AND ZHANG, X. DiskSeen: Exploiting disk layout and access history to enhance I/O prefetch. In Proc. of the USENIX Annual Technical Conference (2007), pp. 261–274.

[11] GAO, K., HARIZOPOULOS, S., PANDIS, I., SHKAPENYUK, V., AND AILAMAKI, A. Simultaneous pipelining in QPipe: Exploiting work sharing opportunities across queries. In Proc. of the 22nd International Conference on Data Engineering, ICDE (April 2006), pp. 162–174.

[12] GNIADY, C., BUTT, A. R., AND HU, Y. C. Program-counter-based pattern classification in buffer caching. In Proc. of the 6th Symposium on Operating System Design and Implementation (2004), pp. 395–408.

[13] GRIFFIOEN, J., AND APPLETON, R. Reducing file system latency using a predictive approach. In Proc. of the USENIX Summer Technical Conference (1994), pp. 197–207.

[14] HAAS, L. M., KOSSMANN, D., AND URSU, I. Loading a cache with query results. In Proc. of the 25th International Conference on Very Large Data Bases (1999), pp. 351–362.

[15] HARIZOPOULOS, S., AND AILAMAKI, A. StagedDB: Designing database servers for modern hardware. IEEE Data Eng. Bull. 28, 2 (2005), 11–16.

[16] HSU, W. W., SMITH, A. J., AND YOUNG, H. C. I/O reference behavior of production database workloads and the TPC benchmarks - an analysis at the logical level. ACM Transactions on Database Systems 26, 1 (2001), 96–143.

[17] HSU, W. W., SMITH, A. J., AND YOUNG, H. C. The automatic improvement of locality in storage systems. ACM Transactions on Computer Systems 23, 4 (2005), 424–473.

[18] JONES, S. T., ARPACI-DUSSEAU, A. C., AND ARPACI-DUSSEAU, R. H. Geiger: Monitoring the buffer cache in a virtual machine environment. In Proc. of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (2006), pp. 14–24.

[19] KEETON, K., ALVAREZ, G., RIEDEL, E., AND UYSAL, M. Characterizing I/O-intensive workload sequentiality on modern disk arrays. In 4th Workshop on Computer Architecture Evaluation using Commercial Workloads (2001).

[20] KROEGER, T. M., AND LONG, D. D. E. Design and implementation of a predictive file prefetching algorithm. In Proc. of the USENIX Annual Technical Conference (2001), pp. 105–118.

[21] KUENNING, G. H., POPEK, G. J., AND REIHER, P. L. An analysis of trace data for predictive file caching in mobile computing. In Proc. of the USENIX Summer Technical Conference (1994), pp. 291–303.

[22] LEI, H., AND DUCHAMP, D. An analytical approach to file prefetching. In Proc. of the USENIX Annual Technical Conference (1997), pp. 21–21.

[23] LI, X., ABOULNAGA, A., SALEM, K., SACHEDINA, A., AND GAO, S. Second-Tier Cache Management Using Write Hints. In Proc. of the Conference on File and Storage Technologies (December 2005).

[24] LI, Z., CHEN, Z., SRINIVASAN, S. M., AND ZHOU, Y. C-Miner: Mining Block Correlations in Storage Systems. In Proc. of the FAST '04 Conference on File and Storage Technologies (San Francisco, California, USA, March 2004), pp. 173–186.

[25] LI, Z., CHEN, Z., AND ZHOU, Y. Mining Block Correlations to Improve Storage Performance. ACM Transactions on Storage 1, 2 (May 2005), 213–245.

[26] LIANG, S., JIANG, S., AND ZHANG, X. STEP: Sequentiality and thrashing detection based prefetching to improve performance of networked storage servers. In Proc. of the 27th IEEE International Conference on Distributed Computing Systems (2007), p. 64.

[27] MADHYASTHA, T. M., GIBSON, G. A., AND FALOUTSOS, C. Informed Prefetching of Collective Input/Output Requests. In Proc. of the ACM/IEEE Conference on Supercomputing: High Performance Networking and Computing (1999).

[28] PATTERSON, R. H., GIBSON, G. A., GINTING, E., STODOLSKY, D., AND ZELENKA, J. Informed prefetching and caching. In Proc. of the 15th ACM Symposium on Operating System Principles (1995), pp. 79–95.

[29] PHILLIPS, L., AND FITZPATRICK, B. LiveJournal's backend and memcached: Past, present, and future. In Proc. of the 19th Conference on Systems Administration (2004).

[30] RAAB, F. TPC-C - The Standard Benchmark for Online Transaction Processing (OLTP). In The Benchmark Handbook for Database and Transaction Systems (2nd Edition), 1993.

[31] RIEDEL, E., FALOUTSOS, C., GIBSON, G. A., AND NAGLE, D. Active disks for large-scale data processing. IEEE Computer 34, 6 (2001), 68–74.

[32] SCHINDLER, J., GRIFFIN, J. L., LUMB, C. R., AND GANGER, G. R. Track-aligned Extents: Matching Access Patterns to Disk Drive Characteristics. In Proc. of the Conference on File and Storage Technologies (January 2002), pp. 259–274.

[33] SCHLOSSER, S. W., AND IREN, S. Database Storage Management with Object-based Storage Devices. In Workshop on Data Management on New Hardware (June 2005).

[34] SHEN, K., YANG, T., CHU, L., HOLLIDAY, J., KUSCHNER, D. A., AND ZHU, H. Neptune: Scalable Replication Management and Programming Support for Cluster-based Network Services. In Proc. of the 3rd USENIX Symposium on Internet Technologies and Systems (March 2001), pp. 197–208.

[35] SIVATHANU, G., SUNDARARAMAN, S., AND ZADOK, E. Type-Safe Disks. In Proc. of the 7th Symposium on Operating Systems Design and Implementation (Seattle, WA, USA, November 2006).

[36] SIVATHANU, M., BAIRAVASUNDARAM, L. N., ARPACI-DUSSEAU, A. C., AND ARPACI-DUSSEAU, R. H. Database-Aware Semantically-Smart Storage. In Proc. of the FAST '05 Conference on File and Storage Technologies (December 2005), pp. 239–252.

[37] SIVATHANU, M., PRABHAKARAN, V., POPOVICI, F. I., DENEHY, T. E., ARPACI-DUSSEAU, A. C., AND ARPACI-DUSSEAU, R. H. Semantically-Smart Disk Systems. In Proc. of the FAST '03 Conference on File and Storage Technologies (March 2003).

[38] SRIKANT, R., AND AGRAWAL, R. Mining Sequential Patterns: Generalizations and Performance Improvements. In Proc. of the 5th International Conference on Extending Database Technology (March 1996), pp. 3–17.

[39] WILKES, J. Traveling to Rome: QoS specifications for automated storage system management. In Proc. of the 9th International Workshop on Quality of Service (2001), pp. 75–91.

[40] WONG, M., ZHANG, J., THOMAS, C., OLMSTEAD, B., AND WHITE, C. Database Test 2 Architecture, 0.4 ed. Open Source Development Lab, http://www.osdl.org/lab_activities/kernel_testing/osdl_database_test_suite/osdl_dbt-2/dbt_2_architecture.pdf, June 2002.

[41] WONG, T. M., AND WILKES, J. My Cache or Yours? Making Storage More Exclusive. In Proc. of the USENIX Annual Technical Conference, General Track (June 2002), pp. 161–175.

[42] YAN, X., HAN, J., AND AFSHAR, R. CloSpan: Mining closed sequential patterns in large databases. In Proc. of the 3rd SIAM International Conference on Data Mining (2003).

[43] YEH, T., LONG, D. D. E., AND BRANDT, S. A. Increasing predictive accuracy by prefetching multiple program and user specific files. In Proc. of the 16th Annual International Symposium on High Performance Computing Systems and Applications (2002), pp. 12–19.
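As a concrete illustration of the context-aware mining idea summarized above, the following is a minimal Python sketch, not the authors' implementation: each block access is tagged with a context identifier (e.g., a transaction or query id), and block-to-block correlations are counted only within a context's own recent-access window, so interleaving across concurrent streams does not obscure per-stream patterns. All class, method, and variable names here are hypothetical; the actual QuickMine algorithm additionally bounds its state and prunes low-performing contexts.

```python
from collections import defaultdict, deque

class QuickMineSketch:
    """Toy sketch of context-aware block correlation mining (illustrative only).

    Accesses are mined per context, so the storage-level interleaving of
    concurrent streams does not pollute the learned correlations.
    """

    def __init__(self, history=2, min_support=2):
        self.history = history            # per-context lookback window length
        self.min_support = min_support    # repetitions required to trust a rule
        self.recent = defaultdict(deque)  # context id -> recently accessed blocks
        self.counts = defaultdict(int)    # (block, successor) -> observed frequency

    def record(self, ctx, block):
        """Record one access; update correlations within its context only."""
        window = self.recent[ctx]
        for prev in window:               # count prev -> block inside this context
            self.counts[(prev, block)] += 1
        window.append(block)
        if len(window) > self.history:
            window.popleft()

    def prefetch_candidates(self, block, k=2):
        """Return up to k blocks most strongly correlated with `block`."""
        cands = [(nxt, c) for (prev, nxt), c in self.counts.items()
                 if prev == block and c >= self.min_support]
        cands.sort(key=lambda pair: -pair[1])
        return [nxt for nxt, _ in cands[:k]]

# Two repeating per-context patterns whose accesses interleave at the server:
miner = QuickMineSketch()
for _ in range(3):
    for a, b in zip([1, 2, 3], [10, 20, 30]):
        miner.record("txA", a)  # stream of transaction A
        miner.record("txB", b)  # stream of transaction B
print(miner.prefetch_candidates(1))  # -> [2, 3]
```

A context-oblivious miner fed the same interleaved trace (1, 10, 2, 20, 3, 30, ...) would instead learn spurious cross-stream pairs such as (1, 10); keying the history window by context is what removes the interleaving problem.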
