
A Survey on Some of the Existing Attacks on RC4

A dissertation submitted in partial fulfillment of the requirements for the Master of Technology in Computer Science degree of the Indian Statistical Institute

by Atanu Acharyya (mtc0509)
under the supervision of Prof. Subhamoy Maitra

Indian Statistical Institute
203, Barrackpore Trunk Road
Kolkata, 700108

CERTIFICATE OF APPROVAL

This is to certify that the thesis entitled "A Survey on Some of the Existing Attacks on RC4", submitted by Atanu Acharyya towards partial fulfillment of the requirement for the award of the degree of Master of Technology in Computer Science at the Indian Statistical Institute, Kolkata, embodies the work done under my supervision.

Dated: July 20, 2007

Supervisor: Professor Subhamoy Maitra

Acknowledgements

With great pleasure and a sense of obligation I express my heartfelt gratitude to my guide and supervisor Prof. Subhamoy Maitra of the Applied Statistics Unit, Indian Statistical Institute, Kolkata. I thank him for the guidance, for the endless patience that was required for his precise revisions of my work, and for his support when things seemed not to work out.

I thank all Computer Science graduates of the Indian Statistical Institute for being enjoyable comrades. I thank them for making my time at ISI a most pleasant one, and for the long and interesting discussions on academic and non-academic issues. Special thanks to Kaushik Nath for his all-round support.

Lastly, I sincerely thank all my friends and well-wishers who helped me directly or indirectly towards the completion of this work.

I owe a lot to the Indian Statistical Institute for giving me this opportunity and for providing me with the best environment in which to pursue it.

Atanu Acharyya

Abstract

RC4 is the most widely deployed stream cipher in software applications, due to its simplicity and efficiency. It has a huge internal state but very lightweight scheduling and output generation processes, which motivated our cryptanalytic efforts.

In this thesis we analyze the KSA (key scheduling algorithm) of RC4, and describe several weaknesses in it. We identify a large number of weak keys, in which knowledge of a small number of key bits suffices to determine many state and output bits with non-negligible probability. We use these weak keys to construct new distinguishers for RC4, and to mount related-key attacks with practical complexities.

Another weakness of the RC4 initialization mechanism is a major statistical bias in the distribution of the first output words. This bias makes it trivial to distinguish between several hundred short outputs of RC4 and random strings by analyzing their second word. This weakness can be used to mount a practical ciphertext-only attack on RC4 in some broadcast applications, in which the same plaintext is sent to multiple recipients under different keys. This unique statistical behavior is independent of the KSA, and remains applicable even when RC4 starts with a totally random permutation.

Preface

My first meeting with RC4 was in a lecture of our third-semester course "Cryptology", delivered by Prof. Subhamoy Maitra at the Indian Statistical Institute. I had no idea then that this cipher is one of the most popular in the world. The algorithm was unexpectedly short and of great simplicity. Later I came to know how widely it is applied in the world of secure electronic communication. The futility of the attacks mounted on it so far attests to its standing among the existing stream ciphers. By the middle of our course work I got the opportunity from Prof. Subhamoy Maitra to work on RC4 under his guidance. Later, after looking deeply at previous work on RC4 published by other researchers, I was amazed to find that most of it was quite rudimentary, culminating in a paper by Fluhrer and McGrew that described a distinguishing algorithm based on simple counting of output pairs. I concluded that RC4 still had not received the attention it deserves.

Since there was a lot of pressure due to the course work during the final semester, I was never able to concentrate much on RC4. Still, I have tried to understand its nature and the existing attacks, applying them analytically and experimentally, failing most of the time but succeeding here and there, though finally unable to mount an attack of my own. Eventually, I managed to analyze and implement some of the existing techniques for analyzing this unique cipher, pointing out both its weaknesses and its strength. This is my modest contribution to the field.

List of Tables

3.1 Important parameters (mod 2) of the first few rounds of RC4*, applied to a 2-conserving permutation
3.2 Data required for a reliable distinguisher, for different key sizes

List of Figures

1.1 Typical key management in stream ciphers
1.2 The Key Scheduling Algorithm and the Pseudo-Random Generation Algorithm
2.1 The first two rounds of RC4 when S_0[2] = 0 and S_0[1] ≠ 2
3.1 KSA vs. KSA*
3.2 Schematic representation of Theorem 3.1.2 and Theorem 3.1.4
3.3 This graph demonstrates the probabilities of special 2^q-exact keys of RC4_{8,16} to produce streams with long 2^q-patterned prefixes
3.4 This graph demonstrates the output prefix that is 2-patterned with fixed probabilities (1/4, 1/32, 1/256 and 2^{-12}) when RC4_{n,16} is applied to a 2-exact key, as a function of n

Table of Contents

1 Introduction
  1.1 Introduction to Cryptography
    1.1.1 Symmetric and Asymmetric Schemes
    1.1.2 Stream Ciphers vs Block Ciphers
    1.1.3 Stream Ciphers
  1.2 The RC4 Stream Cipher
    1.2.1 Description of RC4
    1.2.2 Related Work
  1.3 General Notations and Conventions
2 A Practical Attack on Broadcast RC4
  2.1 The Bias
    2.1.1 The Biased Second Output of RC4
    2.1.2 The Biased First and Second Output of RC4
  2.2 Cryptanalytic Applications
    2.2.1 Distinguishing RC4 from Random Sources
    2.2.2 A Ciphertext-Only Attack on Broadcast RC4
3 Weaknesses in the Key Scheduling Algorithm of RC4
  3.1 The Invariance Weakness
    3.1.1 The Weakness in KSA*
    3.1.2 The Weakness in KSA
  3.2 Key-Output Correlation
  3.3 Cryptanalytic Applications of the Invariance Weakness
4 Conclusion and Scope of Future Works
  4.1 Conclusion
  4.2 Scope of Future Works
Bibliography

Chapter 1

Introduction

1.1 Introduction to Cryptography

Cryptography is a remarkable field in computer science which deals with very human issues such as privacy, authenticity, and trust. The word "cryptography" comes from the Greek kryptos, meaning hidden or secret, and graphein, meaning to write. So "cryptography" is literally "secret writing": the study of how to obscure what you write so as to make it unintelligible to those who should not read it.

Mathematically speaking, a cryptosystem is a five-tuple (P, C, K, E, D), where the following conditions are satisfied:
1. P is a finite set of possible plaintexts.
2. C is a finite set of possible ciphertexts.
3. K, the keyspace, is a finite set of possible keys.
4. For each k ∈ K, there is an encryption rule e_k ∈ E and a corresponding decryption rule d_k ∈ D. Each e_k: P → C and d_k: C → P are functions such that d_k(e_k(x)) = x for all x ∈ P.

In the last few decades cryptographic algorithms, being mathematical by nature, have become sufficiently advanced that they can only be handled by computers. The encryption scheme that pioneered the modern age of cryptography was the Data Encryption Standard (DES), which was designed in IBM laboratories in the early seventies. Although many ciphers were designed and used before DES, the only similarity between ciphers like Caesar's code or Enigma on one side, and DES and Rijndael on the other side, is that all of them are solutions to the same fundamental problem.

1.1.1 Symmetric and Asymmetric Schemes

Encryption schemes are divided into two main types, symmetric schemes (sometimes called secret-key schemes) and asymmetric schemes (sometimes called public-key schemes). The main difference between these types is the requirement for a shared piece of secret information (the key) between the encryptor and the decryptor in symmetric schemes. Symmetric and asymmetric encryption schemes have various advantages and disadvantages, some of which are common to both of them.

Advantages of Symmetric-Key Cryptography
1. Throughput rates for the most popular asymmetric encryption methods are several orders of magnitude slower than those of the best known symmetric schemes.
2. Key sizes for asymmetric schemes are typically much longer than those required for symmetric schemes.
3. Symmetric ciphers can be employed as primitives to construct various cryptographic mechanisms including pseudorandom number generators, hash functions, and computationally efficient digital signature schemes, to name just a few.
4. Symmetric ciphers can be composed to produce stronger ciphers. Simple transformations, which are easy to analyze but weak on their own, can be used to construct strong product ciphers.

Advantages of Asymmetric-Key Cryptography
1. In a two-party communication, there is only one secret key, and this key is generated and used by the same party. The key agreement stage can be carried out using regular channels, and requires no "out-of-band" interaction.
2. In order to achieve pairwise privacy in a large network, a symmetric cipher would require a quadratic number of keys (one key per pair), while an asymmetric cipher would require a linear number of keys (one key per user).
3. Depending on the mode of operation, the keys of an asymmetric scheme may remain unchanged for a considerable period of time. For symmetric schemes, it is common practice to change keys frequently, sometimes for each communication session.

Summary of Comparison

Symmetric and asymmetric encryption have a number of complementary advantages. The most significant disadvantage of asymmetric schemes is their low efficiency, while their most significant advantages are the ability to communicate securely without any previous interaction, and the relatively simple key management (a linear number of keys which are rarely changed). Current cryptographic systems exploit the strengths of both by integrating the two types into protocols that run in two phases. During the handshaking phase, the parties use an asymmetric encryption technique to set up a connection and to establish a symmetric key. During the second phase, this key is used for an efficient interaction using a symmetric scheme.

If this two-phase protocol is executed for every session, the parties take advantage of the long-term nature of the keys of the asymmetric scheme and the efficiency of the symmetric scheme, since the asymmetric part of the protocol is a small fraction of the total encryption time.
1.1.2 Stream Ciphers vs Block Ciphers

Stream ciphers are an important class of encryption algorithms. They encrypt individual characters (usually binary digits) of a plaintext message one at a time, using a simple time-dependent encryption transformation. Block ciphers simultaneously encrypt groups of characters of a plaintext message using a fixed encryption transformation. Stream ciphers are generally faster than block ciphers in hardware, and have less complex hardware circuitry. The Blum-Goldwasser probabilistic public-key encryption scheme described in [6] is an example of an asymmetric stream cipher. However, most stream ciphers are based on symmetric schemes.

Block ciphers are memoryless, whereas in stream ciphers the encryption function may vary as the plaintext is processed. Stream ciphers are sometimes called state ciphers since encryption depends not only on the key and plaintext, but also on the current state. This distinction between block and stream ciphers is not definitive; adding a small amount of memory to a block cipher (as in the CBC mode) results in a stream cipher with large blocks.

Stream ciphers are more appropriate, and in some cases mandatory (e.g., in some telecommunications applications), when buffering is limited or when characters must be individually processed as they are received. Because they have limited or no error propagation, stream ciphers may also be advantageous in situations where transmission errors are highly probable.

There is a vast body of theoretical knowledge on stream ciphers, and various design principles for stream ciphers have been proposed and extensively analyzed. However, there are relatively few fully specified stream cipher algorithms in the open literature. This unfortunate state of affairs can be partially explained by the fact that most stream ciphers used in practice tend to be proprietary and confidential. By contrast, numerous concrete block cipher proposals have been published, some of which have been standardized or placed in the public domain. Nevertheless, because of their significant advantages, stream ciphers are widely used today, and one can expect many more proposals in the coming years.
1.1.3 Stream Ciphers

The only encryption scheme that is information-theoretically secure is the Vernam cipher, better known as the One-Time Pad (OTP). Using this scheme requires a key that is as long as the message, and the ciphertext is produced by XORing the plaintext with the key. An obvious drawback of the OTP is that the huge key length increases the difficulty of key distribution and storage. This motivates the design of stream ciphers in which the keystream is pseudorandomly generated from a smaller secret key (seed), so that the keystream appears random to a computationally bounded adversary.

Stream ciphers are commonly classified as being synchronous or asynchronous. A synchronous stream cipher is one in which the keystream is generated independently of the plaintext message and of the ciphertext. An asynchronous stream cipher (also called a self-synchronizing stream cipher) is one in which the keystream is generated as a function of the key and a fixed number of previous ciphertext digits. The main difference between these types is the ability of self-synchronizing stream ciphers to continue the decryption even when some part of the ciphertext was lost. However, most of the popular stream ciphers are of the synchronous type, and RC4 in particular is one of them.
An inherent property of stream ciphers is the absolute loss of security when encrypting more than one message with the same key. A single key k produces a single keystream Z, and knowing the ciphertexts e_1 and e_2 of the messages m_1 and m_2 makes it easy to calculate

e_1 ⊕ e_2 = (Z ⊕ m_1) ⊕ (Z ⊕ m_2) = (Z ⊕ Z) ⊕ (m_1 ⊕ m_2) = m_1 ⊕ m_2,

which contains significant information about the plaintexts. There are two main approaches to overcome this problem; the first is to use a single stream, and the second is to change keys after every session. To implement the first approach, the parties must share the exact position in the stream from which the keystream words for the next session are going to be taken. However, communication problems may cause loss of data and loss of synchronization, and when this happens the parties cannot communicate until they resynchronize. There are keystream generators that have the Random Access property, which means that it is possible to reach every point in the stream within a logarithmic amount of time. For schemes of this type, it is possible to divide the stream into long finite intervals, and use one interval per session. An example of such a stream cipher is Leviathan, designed by Fluhrer and McGrew ([10]).

The second approach is more popular, and includes a mechanism that takes as its input the secret key and an additional piece of data, and derives a session key from them. This piece of data is usually called the Initialization Vector, and is usually transmitted in the clear (see Figure 1.1).
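The keystream-reuse identity e_1 ⊕ e_2 = m_1 ⊕ m_2 can be illustrated with a minimal Python sketch. The keystream bytes and the two messages below are made-up values chosen only for illustration; any keystream of the same length gives the same result.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings position by position."""
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical 10-byte keystream Z (illustrative values, not RC4 output).
keystream = bytes.fromhex("a3f10c5577e2904b6d18")
m1 = b"attack now"
m2 = b"hold still"

# Both messages are encrypted under the same keystream.
e1 = xor_bytes(m1, keystream)
e2 = xor_bytes(m2, keystream)

# The keystream cancels: the attacker learns m1 XOR m2 from ciphertexts alone.
leak = xor_bytes(e1, e2)
```

An eavesdropper who additionally knows or guesses one of the two messages recovers the other as `xor_bytes(leak, m1)`.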

The complexity of the session key derivation varies from simple methods such as concatenation and XORing, to complex methods which hash the two values. The hash function approach preserves the security of most ciphers, but the concatenation approach sometimes reduces their security. An example of this phenomenon is the IV attack on RC4. Even though there are no known practical attacks on this stream cipher, it becomes extremely weak when combined with a concatenation-based session key derivation mechanism.

1.2 The RC4 Stream Cipher

A large number of stream ciphers were proposed and implemented over the last twenty years. Most of these ciphers were based on various combinations of linear feedback shift registers, which were easy to implement in hardware, but relatively slow in software. In 1987 Ron Rivest designed the RC4 stream cipher, which was based on a different and more software-friendly paradigm. RC4 is most commonly used to protect Internet traffic using the SSL (Secure Sockets Layer) protocol. Moreover, it was integrated into Microsoft Windows, Lotus Notes, Apple AOCE, Oracle Secure SQL, and many other applications. In addition, it was chosen to be part of the Cellular Digital Packet Data specification. Indeed, these uses of RC4 may make RC4 the most widely used stream cipher in the world. Its design was kept a trade secret until 1994, when someone anonymously posted its source code to the Cypherpunks mailing list. The correctness of this unofficial description was confirmed by comparing its outputs to those produced by licensed implementations.

RC4 has a secret internal state which is a permutation S ∈ S_N of all the N = 2^n possible n-bit words, and two indices i, j ∈ [N] in it. The initial state is derived from a variable-size key by a Key Scheduling Algorithm (KSA), and then RC4 alternately modifies the state (by exchanging two out of the N values) and produces an output (by picking one of the N values).

In practical applications n is typically chosen as 8, and thus RC4 has a huge state of

log_2(|[256]|^2 · |S_256|) = log_2(2^16 · 256!) ≈ 1700 bits.

It is thus impractical to guess even a small part of this state, or to use standard time/memory/data tradeoff attacks. In addition, the state evolves in a complex non-linear way, and thus it is difficult to combine partial information about states that are far away in time. Consequently, all the techniques developed to attack stream ciphers based on linear feedback shift registers seem to be inapplicable to RC4.

Since RC4 is such a widely used stream cipher, it has attracted considerable attention in the research community, but so far no one has found an attack on RC4 which is even close to being practical: for n = 8 and sufficiently long keys, the best known attack requires more than 2^700 time to find its initial state.
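The state-size figure can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function name rc4_state_bits is an assumption made for illustration.

```python
import math

def rc4_state_bits(n: int) -> float:
    """log2 of the number of RC4 states: N^2 index pairs times N! permutations."""
    N = 1 << n
    # log2(N^2) = 2n; math.log2 accepts arbitrarily large Python integers.
    return 2 * n + math.log2(math.factorial(N))

bits_n8 = rc4_state_bits(8)  # close to 1700 for the usual n = 8
```

For n = 8 the 16 bits contributed by the indices i and j are negligible next to the roughly 1684 bits contributed by the permutation.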

1.2.1 Description of RC4

RC4 consists of two parts (described in Figure 1.2): a Key Scheduling Algorithm (KSA), which turns a random key (whose typical size is 40-256 bits) of ℓ words into an initial permutation S ∈ S_N, and a Pseudo-Random Generation Algorithm (PRGA), which uses this permutation to generate a pseudorandom output sequence.

The PRGA initializes two indices i and j to 0, and then loops over four simple operations which increment i as a counter, increment j pseudorandomly, exchange the two values of S pointed to by i and j, and output the value of S pointed to by S[i] + S[j]. Note that every entry of S is swapped at least once (possibly with itself) within any N consecutive rounds, and thus the permutation S evolves fairly rapidly during the output generation process.

The KSA consists of N loops which are similar to the PRGA round operation. It initializes S to be the identity permutation and i and j to 0, and applies a PRGA-like round operation N times, stepping i across S, and updating j by adding S[i] and the next word of the key (in cyclic order).
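The description above can be turned into a short Python sketch for the common case n = 8 (byte-sized words). The function names ksa, prga and rc4_encrypt are chosen here for illustration; the round operations follow the text.

```python
def ksa(key: bytes, n: int = 8):
    """Key Scheduling Algorithm: derive the initial permutation from the key."""
    N = 1 << n
    S = list(range(N))           # identity permutation
    j = 0
    for i in range(N):           # i steps across S
        j = (j + S[i] + key[i % len(key)]) % N   # key words used cyclically
        S[i], S[j] = S[j], S[i]
    return S

def prga(S0, count: int):
    """Pseudo-Random Generation Algorithm: produce `count` output words."""
    S = list(S0)                 # work on a copy of the initial permutation
    N = len(S)
    i = j = 0
    out = []
    for _ in range(count):
        i = (i + 1) % N          # i is a counter
        j = (j + S[i]) % N       # j moves pseudorandomly
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % N])
    return out

def rc4_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR the data with the RC4 keystream (encryption equals decryption)."""
    keystream = prga(ksa(key), len(data))
    return bytes(d ^ z for d, z in zip(data, keystream))
```

As a usage check, encrypting b"Plaintext" under the key b"Key" with this sketch gives the widely published RC4 test vector bbf316e8d940af0ad3 (hex).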

1.2.2 Related Work

Attacks on the Keystream Generation

A branch-and-bound attack which is based on the "Guess on Demand" paradigm is analyzed in [4] and [9]. The attack simulates the generation process, and keeps track of all the known values in S that have been deduced so far. Whenever an unknown entry in S is needed in order to continue the simulation, the attacker tries all the possible values (or guesses a value). Notice that the number of trials is typically smaller than N since known values in the permutation S cannot be repeated. Actual outputs are used by the simulation either to deduce additional values in S (if the pointed output value is unknown) or to backtrack when the pointed value is known and different from the actual output, thus contradicting this branch.

The time complexity of this attack was analyzed analytically in [9], and the analytic results were found compatible with experimental estimations. This tree search is simple to implement, and needs only N output words. However, it is very inefficient in time, and its enormous running time makes it worse than exhaustive search for typical key sizes, and completely impractical for RC4_{n>4}. For redundant regions of the stream, branches representing incorrect guesses tend to shorten, and the time complexity improves, threatening the security of RC4_5.

Distinguishers of RC4 Streams from Randomness

A different research direction was to analyze the statistical properties of RC4 outputs, and in particular to construct distinguishers between RC4 and truly random bit generators. Golić described in [5] a linear statistical weakness of RC4, caused by a positive correlation between the second binary derivative of the LSB and 1. This weakness implies that RC4 outputs of length 2^{6n−7.8} (2^{40.2} for the typical n = 8) can be reliably distinguished from random strings.
This result was subsequently improved by Fluhrer and McGrew in [3]. They analyzed the distribution of triplets consisting of the two outputs produced at times t and t+1 and the known value of i ≡ t (mod N), and found small biases in the distribution of (7N − 8) of these N^3 triplets: some of these probabilities are positively biased (1/N^3 · (1 + 1/N)), and some are negatively biased (1/N^3 · (1 − 1/N)). They used information-theoretic methods to prove that these biases can be used to distinguish between RC4 and a truly random source by analyzing sequences of 2^{30.6} output words. This number of output words was estimated rigorously, considering a success probability of 90%.

Their analysis of this typical behavior of RC4 streams yielded a classification of special RC4 states, which they denoted as fortuitous states, and which are the source of most of these biases. They noted that these states can be used to extract parts of the internal state with non-trivial probability, but since RC4 states are huge this does not lead to practical attacks on RC4 for n > 5.

Weaknesses of the Initialization Mechanism of RC4

The major difference between the effective key size of RC4 and the real key, together with the simplicity of the key extension mechanism, stimulated considerable research on the initialization mechanism of RC4.

In particular, Andrew Roos noted in [11] that for keys which have K[0] + K[1] ≡ 0 (mod N), the first output is equal to K[2] + 3 with probability 2^{-2.85}. The cryptanalyst can use this fact to deduce two bytes of information about the key (K[0] + K[1] and K[2]) with probability 2^{-10.85} instead of the trivial 2^{-16}, and thus reduce the effective key length in exhaustive search by about five bits.

Grosul and Wallach showed in [8] that for large keys whose size is close to N bytes, similar keys produce similar initial states and similar streams.
This observation implies that for large keys, RC4 is vulnerable to a related-key attack. However, for typical short keys, every byte of the key is used fairly early during the KSA execution, and consequently a change in one byte causes a difference in the index j from a relatively early stage. The difference in the values of j implies that different entries are swapped in the rounds that succeed this change. This causes a significant difference in the output of the KSA (which is the initial permutation) and in the generated stream.

Other Results

Interesting properties of RC4 were described in several papers. Finney specified in [7] a class of states that RC4 can never enter. This class contains all the states for which i = a, j = a + 1 and S[a + 1] = 1 (a fraction of 1/N^2 of RC4 states are in this class). Analysis of states of this type indicates that this class is closed under the RC4 round operation. Given some arbitrary state in this class, corresponding to the value a, the round operation will transfer this state into another state in the class, corresponding to a' = a + 1. The round operation will give i' = i + 1 = a + 1 = a' and j' = j + S[i'] = (a + 1) + S[a + 1] = (a + 1) + 1 = a' + 1 (satisfying the first condition), and the swap will transfer the value 1 from a' to a' + 1, implying S[a' + 1] = 1 and satisfying the second condition. Thus these states are connected by short cycles of length N(N − 1). Since the state transition in RC4 is invertible and the initial state (i = j = 0) is not of this type, RC4 can never enter these states for any key. However, if RC4 were initialized to such a state, it would be totally insecure: the generated stream would have a very short period. Given such an RC4 stream, Pudovkina noticed that the initial permutation can be elegantly extracted by looking at the outputs with indices that are equivalent to −1 (mod N − 1), that is z_{(N−1)−1}, z_{2(N−1)−1}, ..., z_{N(N−1)−1}. It appears that these N output values are a shift of the initial permutation, and since one of the permutation values is known (S[i + 1] = 1), the exact shift can also be isolated.

1.3 General Notations and Conventions

We denote the number of bits in a keystream word as n, and the size of the permutation S by N = 2^n. ℓ stands for the number of key words, and whenever we discuss an RC4 version with a specified n and ℓ, we add a subscript indicating these parameters (e.g. RC4_{8,16} is the commonly used 8-bit version with a 128-bit key). Sometimes we omit ℓ (typically when it is irrelevant to the discussion) and write RC4_8, or even simply RC4.

To denote the symmetric group of permutations of {0, ..., N − 1} we use the standard notation S_N. The set of indices in S ∈ S_N is denoted by [N]. S ∈ S_N, i ∈ [N] and j ∈ [N] are used to specify the components of RC4 states and r is usually used to indicate round numbers (the last four notations are used for both KSA and PRGA). We use the triplets (S, i, j) to indicate the permutation and indices of specific states.

The rounds of the KSA as well as those of the PRGA are numbered according to the value of i, which means that the KSA has rounds 0, ..., N − 1, whereas the PRGA has rounds 1, 2, .... To indicate the RC4 state after round r (which is the state after r + 1 KSA rounds or after r PRGA rounds), we use the subscript r in S_r, i_r and j_r.
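The closure of Finney's class of states under the round operation (Section 1.2.2) can be checked mechanically. The sketch below is an illustration under stated assumptions: it uses a reduced word size (N = 8) so that a full cycle is short, and the helper names prga_round, is_finney and finney_cycle_length are introduced here, not taken from [7].

```python
def prga_round(S, i, j):
    """Apply one RC4 round in place; return the new (i, j) and the output word."""
    N = len(S)
    i = (i + 1) % N
    j = (j + S[i]) % N
    S[i], S[j] = S[j], S[i]
    return i, j, S[(S[i] + S[j]) % N]

def is_finney(S, i, j):
    """Finney's condition: j = i + 1 and S[j] = 1."""
    return j == (i + 1) % len(S) and S[j] == 1

def finney_cycle_length(S0, i, j):
    """Step a Finney state until it recurs, checking closure at every round."""
    start = (tuple(S0), i, j)
    S = list(S0)
    steps = 0
    while True:
        i, j, _ = prga_round(S, i, j)
        steps += 1
        assert is_finney(S, i, j), "left the Finney class"
        if (tuple(S), i, j) == start:
            return steps

# The identity permutation with i = 0, j = 1 satisfies S[j] = 1.
N = 8
cycle = finney_cycle_length(list(range(N)), 0, 1)
```

For this toy size the loop returns after N(N − 1) = 56 rounds, in line with the cycle length stated in the text.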

Chapter 2

A Practical Attack on Broadcast RC4

In this chapter we describe a major statistical bias in the distribution of the initial bytes of RC4 streams, and discuss its cryptanalytic applications.

2.1 The Bias

Our main observation is that the second output word of RC4 has a very strong bias, viz. it takes on the value 0 with twice the expected probability (2/N instead of 1/N). Other values of the second output and all the values of other outputs have almost uniform distributions.

2.1.1 The Biased Second Output of RC4

Theorem 2.1.1 Assume that the initial permutation S is randomly chosen from the set of all the possible permutations S_N. Then the probability that the second output word of RC4 is 0 is approximately 2/N.

Proof:
Notation: Denote the permutation S after it has been updated in round t by S_t (S_0 is the initial permutation) and the output of this round by z_t.

Claim 2.1.2 When S_0[2] = 0 and S_0[1] ≠ 2, the second output (z_2) is 0 with probability 1.

Figure 2.1: The first two rounds of RC4 when S_0[2] = 0 and S_0[1] ≠ 2

Proof: Assume S_0[2] = 0 throughout.

When S_0[1] ≠ 2:
Round 0: i_0 = 0 = j_0. Denote S_0[1] by X.
Round 1: i_1 = 1
  j_1 = j_0 + S_0[i_1] = 0 + S_0[1] = X
  S_0[1] ↔ S_0[X] ⇒ S_1[1] = S_0[X] = Y, say, and S_1[X] = S_0[1] = X
  z_1 = S_1[S_1[1] + S_1[X]] = S_1[Y + X]
Round 2: i_2 = 2
  j_2 = j_1 + S_1[i_2] = X + S_1[2] = X (since S_1[2] = S_0[2] = 0)
  S_1[2] ↔ S_1[X] ⇒ S_2[2] = S_1[X] = X and S_2[X] = S_1[2] = 0
  z_2 = S_2[S_2[2] + S_2[X]] = S_2[X + 0] = S_2[X] = 0

When S_0[1] = 2 (the excluded case):
Round 1: i_1 = 1
  j_1 = j_0 + S_0[i_1] = 0 + S_0[1] = 2
  S_0[1] ↔ S_0[2] ⇒ S_1[1] = S_0[2] = 0 and S_1[2] = S_0[1] = 2
  z_1 = S_1[S_1[1] + S_1[2]] = S_1[0 + 2] = S_1[2] = 2
Round 2: i_2 = 2
  j_2 = j_1 + S_1[i_2] = 2 + S_1[2] = 2 + 2 = 4
  S_1[2] ↔ S_1[4] ⇒ S_2[2] = S_1[4] = U, say (U ≠ 0, since 0 = S_1[1]), and S_2[4] = S_1[2] = 2
  z_2 = S_2[S_2[2] + S_2[4]] = S_2[U + 2], which is 0 only when U = N − 1 (255 for N = 256), since then z_2 = S_2[1] = 0.
Thus in this case z_2 = 0 only with probability about 1/N. □

Proof (of Theorem 2.1.1):
P[z_2 = 0] = P[z_2 = 0 | S_0[2] = 0] · P[S_0[2] = 0] + P[z_2 = 0 | S_0[2] ≠ 0] · P[S_0[2] ≠ 0]
           ≈ 1 · 1/N + 1/N · (1 − 1/N)
           = 1/N · (1 + 1 − 1/N)
           ≈ 2/N,
which is twice its expected probability. □

An Interesting Observation Based On This Bias

By applying Bayes' rule to the above result, we get
P[S_0[2] = 0 | z_2 = 0] = P[S_0[2] = 0] · P[z_2 = 0 | S_0[2] = 0] / P[z_2 = 0]
                        ≈ ((1/N) · 1) / (2/N)
                        = 1/2.
Consequently, whenever the second output byte is 0 we can extract an entry of S with probability 1/2, which significantly exceeds the trivial probability of 1/N. This fact can be used to accelerate most of the known attacks by a factor of N/2, but this improvement does not suffice to mount a practical attack on RC4_{n>5}.

2.1.2 The Biased First and Second Output of RC4

A similar phenomenon appears in the distribution of the first pair of values, where the output pair (z_1 = 0, z_2 = 0) has a probability that is three times the expected 1/N^2. This result is described in the form of the following theorem:

Theorem 2.1.3 Assume that the initial permutation S is randomly chosen from the set of all the possible permutations S_N. Then the probability that both the first and second output words of RC4 are 0 is approximately 3/N^2.

Proof: The probability mass is distributed over the following three cases:

Claim 2.1.4 When S_0[2] ≠ 0, P(z_1 = 0, z_2 = 0) = 1/N^2.

Proof:
Round 0: i_0 = 0 = j_0. Let S_0[1] = X and S_0[2] = Y.
Round 1: i_1 = 1
  j_1 = j_0 + S_0[i_1] = 0 + S_0[1] = X
  S_0[1] ↔ S_0[X] ⇒ S_1[1] = S_0[X] = W, say, and S_1[X] = S_0[1] = X
  z_1 = S_1[S_1[1] + S_1[X]] = S_1[W + X] (= 0 with probability 1/N)
Round 2: i_2 = 2
  j_2 = j_1 + S_1[i_2] = X + S_1[2] = X + Y (since S_1[2] = S_0[2] = Y, assuming X ≠ 2)
  S_1[2] ↔ S_1[X + Y] ⇒ S_2[2] = S_1[X + Y] = U, say, and S_2[X + Y] = S_1[2] = Y
  z_2 = S_2[S_2[2] + S_2[X + Y]] = S_2[U + Y] (= 0 with probability 1/N)
Thus P(z_1 = 0, z_2 = 0) = 1/N · 1/N = 1/N^2. □

Claim 2.1.5 When S_0[2] = 0 and S_0[1] ≠ 1, P(z_1 = 0, z_2 = 0) = 1/N.

Proof:
Round 0: i_0 = 0 = j_0. Let S_0[1] = X.
Round 1: i_1 = 1
  j_1 = j_0 + S_0[i_1] = 0 + S_0[1] = X
  S_0[1] ↔ S_0[X] ⇒ S_1[1] = S_0[X] = Y, say, and S_1[X] = S_0[1] = X
  z_1 = S_1[S_1[1] + S_1[X]] = S_1[Y + X] (= 0 with probability 1/N)
Round 2: i_2 = 2
  j_2 = j_1 + S_1[i_2] = X + S_1[2] = X (since S_1[2] = S_0[2] = 0)
  S_1[2] ↔ S_1[X] ⇒ S_2[2] = S_1[X] = X and S_2[X] = S_1[2] = 0
  z_2 = S_2[S_2[2] + S_2[X]] = S_2[X + 0] = 0
Thus P(z_1 = 0, z_2 = 0) = 1/N · 1 = 1/N. □

Claim 2.1.6 When S_0[2] = 0 and S_0[1] = 1, P(z_1 = 0, z_2 = 0) = 1.

Proof:
Round 0: i_0 = 0 = j_0.
Round 1: i_1 = 1
  j_1 = j_0 + S_0[i_1] = 0 + S_0[1] = 1
  S_0[1] ↔ S_0[1] ⇒ S_1[1] = S_0[1] = 1
  z_1 = S_1[S_1[1] + S_1[1]] = S_1[1 + 1] = 0 (since S_1[2] = S_0[2] = 0)
Round 2: i_2 = 2
  j_2 = j_1 + S_1[i_2] = 1 + S_1[2] = 1 (since S_1[2] = 0)
  S_1[2] ↔ S_1[1] ⇒ S_2[1] = S_1[2] = 0 and S_2[2] = S_1[1] = 1
  z_2 = S_2[S_2[2] + S_2[1]] = S_2[1 + 0] = 0
Thus P(z_1 = 0, z_2 = 0) = 1. □

Proof (of Theorem 2.1.3):
P[z_1 = 0, z_2 = 0]
  = P[z_1 = 0, z_2 = 0 | S_0[2] ≠ 0] · P[S_0[2] ≠ 0]
  + P[z_1 = 0, z_2 = 0 | S_0[2] = 0, S_0[1] ≠ 1] · P[S_0[2] = 0, S_0[1] ≠ 1]
  + P[z_1 = 0, z_2 = 0 | S_0[2] = 0, S_0[1] = 1] · P[S_0[2] = 0, S_0[1] = 1]
  ≈ 1/N^2 · (1 − 1/N) + 1/N · 1/N · (1 − 1/N) + 1 · 1/N · 1/N
  = 1/N^2 · (1 − 1/N + 1 − 1/N + 1)
  = 1/N^2 · (3 − 2/N)
  ≈ 3/N^2,
which is thrice its expected probability. □

An Interesting Observation Based On This Bias

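By applying Bayes' rule to the above result, we get
P[S_0[2] = 0 | z_1 = 0, z_2 = 0] = P[S_0[2] = 0] · P[z_1 = 0, z_2 = 0 | S_0[2] = 0] / P[z_1 = 0, z_2 = 0]
                                 ≈ ((1/N) · (1/N + 1/N)) / (3/N^2)
                                 = 2/3.
Consequently, whenever the first and second output bytes are both 0 we can conclude that S_0[2] = 0 with probability 2/3, which significantly exceeds the trivial probability of 1/N. By Claims 2.1.5 and 2.1.6, in half of these cases (probability 1/3 overall) S_0[1] = 1 holds as well, so two entries of S are extracted with a probability far above the trivial 1/N^2.

2.2 Cryptanalytic Applications

The strong bias described so far has several practical cryptanalytic applications.

2.2.1 Distinguishing RC4 from Random Sources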

The following observation can be used to construct a strong distinguisher for RC4 which requires only O(N) output words.

Theorem 2.2.1 Let X, Y be distributions, and suppose that the event e happens in X with probability p and in Y with probability p(1 + q). Then for small p and q, O(1/(p q^2)) samples suffice to distinguish X from Y with a constant probability of success.

Proof: Let X_e, Y_e be the random variables specifying the number of occurrences of e in t samples. Then X_e and Y_e have binomial distributions with parameters (t, p) and (t, p(1 + q)), and their expectations, variances and standard deviations are:

E[X_e] = tp,  E[Y_e] = tp(1 + q)
V(X_e) = tp(1 − p) ≈ tp
V(Y_e) = tp(1 + q)(1 − p(1 + q)) = tp(1 + q − p(1 + q)^2) ≈ tp(1 + q)
SD(X_e) ≈ √(tp)
SD(Y_e) ≈ √(tp(1 + q)) ≈ √(tp)

We analyze the size of t that implies a difference of at least one standard deviation between the expectations of the two distributions:

E[Y_e] − E[X_e] ≥ SD(X_e)
⇔ tp(1 + q) − tp ≥ √(tp)
⇔ tpq ≥ √(tp)
⇔ t ≥ 1/(p q^2)

Consequently, O(1/(p q^2)) samples (the constant depends on the desired success probability) suffice for the distinguishing. □

Let X be the probability distribution of the second output in uniformly distributed streams, and let Y be the probability distribution of the second output in streams produced by RC4 for randomly chosen keys. The event e denotes an output value of 0, which happens with probability 1/N in X and 2/N in Y. By using the previous theorem with p = 1/N and q = 1, we can conclude that we need about 1/(p q^2) = N outputs to reliably distinguish the two distributions.
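The bias and the sample estimate can both be checked with a small Python sketch. It assumes, as Theorem 2.1.1 does, a uniformly random initial permutation; the names first_outputs, samples_needed and zero_fraction are hypothetical helpers introduced for this illustration.

```python
import random

def first_outputs(S0, count=2):
    """Run `count` PRGA rounds from the initial permutation S0 (not modified)."""
    S = list(S0)
    N = len(S)
    i = j = 0
    out = []
    for _ in range(count):
        i = (i + 1) % N
        j = (j + S[i]) % N
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % N])
    return out

def samples_needed(p, q):
    """Order-of-magnitude sample count 1/(p q^2) from Theorem 2.2.1."""
    return 1.0 / (p * q * q)

def zero_fraction(trials, N=256, seed=1):
    """Fraction of random initial permutations whose second output is 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        S = list(range(N))
        rng.shuffle(S)
        if first_outputs(S)[1] == 0:
            hits += 1
    return hits / trials
```

With p = 1/256 and q = 1, `samples_needed` returns 256, matching the "about N outputs" estimate; `zero_fraction` should come out near 2/N rather than 1/N, up to sampling noise.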

2.2.2 A Ciphertext-Only Attack on Broadcast RC4

There are many broadcasting protocols which are used today in a variety of applications. For example, many users send the same email message to multiple recipients (encrypted under different keys), and many groupware applications enable multiple users to synchronize their documents by broadcasting encrypted modification lists to all the other group members. All these applications are vulnerable to this attack.

Theorem 2.2.2 Let M be a plaintext, and let C_1, C_2, ..., C_k be the RC4 encryptions of M under k uniformly distributed keys. Then if k = Ω(N), the second byte of M can be reliably extracted from C_1, C_2, ..., C_k.

Proof: Let K_i be the i-th key and Z_i the corresponding RC4 output stream, where i = 1, ..., k. Then
C_i[2] = M[2] ⊕ Z_i[2].
Since, according to Theorem 2.1.1, Z_i[2] = 0 with probability 2/N, we have C_i[2] = M[2] with probability 2/N. Thus a fraction of 2/N of the second ciphertext bytes are expected to have the same value as the second plaintext byte, and for k = Ω(N) the most frequent character in C_1[2], ..., C_k[2] is likely to be M[2] itself. □

The above attack can be extended by using Theorem 2.1.3, and the result is as follows.

Theorem 2.2.3 Let M be a plaintext, and let C_1, C_2, ..., C_k be the RC4 encryptions of M under k uniformly distributed keys. Then if k = Ω(N^2), the first and second bytes of M can be reliably extracted from C_1, C_2, ..., C_k.

Proof:
C_i[1] = M[1] ⊕ Z_i[1] and C_i[2] = M[2] ⊕ Z_i[2].
Since, according to Theorem 2.1.3, Z_i[1] = 0 and Z_i[2] = 0 hold simultaneously with probability 3/N^2, we have C_i[1] = M[1] and C_i[2] = M[2] with probability 3/N^2. Thus a fraction of 3/N^2 of the ciphertext byte pairs (C_i[1], C_i[2]) are expected to equal the plaintext pair (M[1], M[2]), and for k = Ω(N^2) the most frequent pair among (C_1[1], C_1[2]), ..., (C_k[1], C_k[2]) is likely to be (M[1], M[2]) itself. □
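The attack of Theorem 2.2.2 can be simulated end to end. The following Python sketch is illustrative only: the 16-byte random keys, the seed and the helper names (rc4_keystream, broadcast, recover_second_byte) are assumptions, and the number of recipients must be a generous multiple of N for the majority vote to be reliable.

```python
import random
from collections import Counter

def rc4_keystream(key: bytes, count: int):
    """Standard RC4 (n = 8): KSA followed by `count` PRGA output bytes."""
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = []
    for _ in range(count):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return out

def broadcast(message: bytes, k: int, seed: int = 7):
    """Encrypt the same message under k independent random 16-byte keys."""
    rng = random.Random(seed)
    ciphertexts = []
    for _ in range(k):
        key = bytes(rng.randrange(256) for _ in range(16))
        z = rc4_keystream(key, len(message))
        ciphertexts.append(bytes(m ^ x for m, x in zip(message, z)))
    return ciphertexts

def recover_second_byte(ciphertexts):
    """Majority vote on the second ciphertext byte (M[2] in the text, index 1 here)."""
    counts = Counter(c[1] for c in ciphertexts)
    return counts.most_common(1)[0][0]
```

Note the indexing: the text counts bytes from 1, so M[2] corresponds to Python index 1. With tens of thousands of ciphertexts the vote is expected to return the second plaintext byte, since it appears about twice as often as any other value.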

Chapter 3

Weaknesses in the Key Scheduling Algorithm of RC4

Here we present a special kind of weakness, called the invariance weakness, in the key scheduling algorithm of RC4. We identify a large number of weak keys, in which knowledge of a small number of key bits suffices to determine many state and output bits with non-negligible probability.

3.1 The Invariance Weakness

3.1.1 The Weakness in KSA*

Weproveheretheinvarianceweaknessonlyforasimplifiedvariantofthe KSA, which we denote as KSA* as described in Figure 1. The only differencebetweenthemisthatKSA*updatesiatthebeginningoftheloop, whereasKSAupdatesiattheendoftheloop.Afterformulatingandproving the existence of this weakness in KSA*, we describe the modifications requiredtoapplythisanalysistotherealKSA. Figure3.1.KSAvs.KSA* Definition 3.1.1Let Sbeapermutationof SN, tbeanindexin Sand bbe someinteger.Thenif S[t]≡mod b t ,thepermutation Sissaidto bconservethe indext .Otherwise,thepermutation Sissaidto bunconserve theindex t. Denotethepermutation Sandtheindices iand jafterround tof KSA* as St, it and jt respectively. Denote the number of indicesthat a permutation b conserves as Ib(S).Forthesakeofsimplicity,weoftenwrite it instead of Ib(St). Definition 3.1.2Apermutation Sof SN,is bconserving if Ib(S)= N,andis almostbconserving if Ib(S)≥ N2 . Definition 3.1.3Let b,l beintegers,andlet Kbean l wordskey.Then Kis calleda bexactkey ifforanyindex tK[t mod l ]≡mod b ( 1t) .IncaseK[0] = 1and MSB (K[1]) = 1, Kiscalleda specialbexactkey . Proposition 3.1.1 Fora bexactkey,itisnecessary(butnotsufficient)that b| ℓ. Proof : K[i]≡mod b(1– i) K[i+ℓ]≡mod b (1– i) ≡mod b(1–i– ℓ) => K[i+ℓ]=(1– i)+λb =(1–i– ℓ)+b => ℓ=( –λ)b => b| ℓ. Theorem 3.1.2 Letq≤nandℓbeintegersandb 2 q.Supposethatb|ℓ andletKbeabexactkeyofℓwords.ThenthepermutationS=KSA*(K)is bconserving. Beforegettingtotheproofitself,wewillproveanauxiliarylemma. Lemma 3.1.3 Ifi t+1 ≡j t+1 (modb),theni t+1 =i t,i.e.I b(S t+1 )=I b(S t). Proof: Theonlyoperationthatmightaffect S(andmaybe I)istheswapping operation. However, when it + 1 and jt + 1 are equivalent (mod b), St + 1 b conserves it+1 (jt+1)ifandonlyif Stbconserved jt( it).Thusthenumberof indices Stbconservesremainsthesame. St(it + 1 ), and St(jt + 1) are swapped to get St + 1(jt + 1)and St + 1(it + 1) respectively.Thus St+ 1(it + 1)= St(jt + 1).Nowthefollowingtwocasesmay arise: Case1:Let Stbconservesindex jt+1. 
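$\Rightarrow S_t[j_{t+1}] \equiv j_{t+1} \equiv i_{t+1} \pmod b$ (since $j_{t+1} \equiv i_{t+1} \pmod b$)
$\Rightarrow S_{t+1}[i_{t+1}] \equiv i_{t+1} \pmod b$
$\Rightarrow S_{t+1}$ b-conserves the index $i_{t+1}$.

Case 2: Suppose $S_t$ does not b-conserve the index $j_{t+1}$.
$\Rightarrow S_t[j_{t+1}] \not\equiv j_{t+1} \pmod b$
$\Rightarrow S_t[j_{t+1}] \not\equiv i_{t+1} \pmod b$ (since $j_{t+1} \equiv i_{t+1} \pmod b$)
$\Rightarrow S_{t+1}[i_{t+1}] \not\equiv i_{t+1} \pmod b$
$\Rightarrow S_{t+1}$ does not b-conserve the index $i_{t+1}$.

The same argument with the roles of $i_{t+1}$ and $j_{t+1}$ exchanged applies to the index $j_{t+1}$. ∎

Proof (of Theorem 3.1.2): We will prove by induction on t that for any $0 \le t \le N$, $I_b(S_t) = N$ and $i_t \equiv j_t \pmod b$. This in particular implies that $I_N = N$, which makes the output permutation b-conserving.

For t = 0 (before the first round), the claim is trivial because $i_0 = j_0 = 0$ and $S_0$ is the identity permutation, which is b-conserving for every b. Suppose that $j_t \equiv i_t \pmod b$ and $S_t$ is b-conserving. Then $i_{t+1} = i_t + 1$ and
$$j_{t+1} = j_t + S_t[i_{t+1}] + K[i_{t+1} \bmod \ell] \equiv i_t + i_{t+1} + (1 - i_{t+1}) = i_t + 1 = i_{t+1} \pmod b.$$
Thus $i_{t+1} \equiv j_{t+1} \pmod b$, and by applying Lemma 3.1.3 we get $I_{t+1} = I_t = N$; therefore $S_{t+1}$ is b-conserving. ∎

KSA* thus transforms special patterns in the key into corresponding patterns in the initial permutation.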
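Theorem 3.1.2 can be checked numerically. The following is a minimal Python sketch, not part of the original analysis: the helper names `ksa_star`, `random_b_exact_key` and `conserved_count`, the word size N = 256 and the key length of 16 words are all illustrative assumptions.

```python
import random

def ksa_star(key, N=256):
    """KSA*: like the RC4 KSA, except that i is incremented at the start of each round."""
    S = list(range(N))
    i = j = 0
    for _ in range(N):
        i = (i + 1) % N
        j = (j + S[i] + key[i % len(key)]) % N
        S[i], S[j] = S[j], S[i]
    return S

def random_b_exact_key(b, length, N=256, rng=random):
    """Random key with K[t] = (1 - t) mod b; the remaining high-order bits are free."""
    assert length % b == 0 and N % b == 0
    return [((1 - t) % b) + b * rng.randrange(N // b) for t in range(length)]

def conserved_count(S, b):
    """I_b(S): number of indices t with S[t] congruent to t modulo b."""
    return sum(1 for t, v in enumerate(S) if (v - t) % b == 0)

rng = random.Random(1)
key = random_b_exact_key(b=4, length=16, rng=rng)
print(conserved_count(ksa_star(key), 4))  # the theorem predicts N = 256
```

Because the theorem is deterministic, every b-exact key generated this way should yield a count of exactly N, whatever the free high-order bits are.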

3.1.2. The Weakness in KSA

The small difference between KSA* and KSA is essential in that KSA, applied to a b-exact key, does not preserve the equivalence (mod b) of i and j even after the first round. Numbering the rounds of the KSA by the value of i (so that the first round is round 0 and the initial values are $S_{-1}$ and $j_{-1} = 0$), its execution on a b-exact key gives
$$j_0 = j_{-1} + S_{-1}[i_0] + K[i_0] = 0 + S_{-1}[0] + K[0] = K[0] \equiv 1 \not\equiv 0 = i_0 \pmod b,$$
and thus the structure described in Section 3.1.1 cannot be preserved by the cyclic use of the words of K. However, the invariance weakness can be adjusted to the real KSA, and the proper modifications are formulated in the following theorem:

Theorem 3.1.4 Let $q \le n$ and $\ell$ be integers and $b = 2^q$. Suppose that $b \mid \ell$ and let K be a special b-exact key of $\ell$ words. Then
$$P[\mathrm{KSA}(K) \text{ is almost b-conserving}] \ge 2/5,$$
where the probability is over the rest of the key bits.

Extensive experimentation indicates that this bound is not tight, and the probability is actually very close to one half.

First we prove the following lemma, which indicates special properties of the RC4 round operation (for the KSA, as well as the PRGA).

Lemma 3.1.5 Let $i_r$ and $j_r$ be the indices of round r of PRGA (or KSA). Let X be the value pointed to by j in this round before the swap (i.e., $S_{r-1}[j_r] = X$). Then X will not be involved in determining j during rounds $r+1, \dots, r+N-1$.

Proof: We can consider the permutation S as a queue of elements used to update j. Assuming a random behavior of the entries of S, this queue has the following properties:

Random Entering: A new value that enters the queue is entered into a random position. When the value X is pointed to by i (before the swap), it is used to forward j and afterwards it enters the permutation in the pseudo-random position j (relative position $j + 1 - i$).

Turn Loss: On every round, a randomly chosen element in the queue loses its turn and is thrown to the end of the line. When the value X is pointed to by j (before the swap), it is swapped to position i, which is the worst relative position ($N - 1$). The choice of this deprived value is pseudo-random.

No Overpass: The k-th element in the queue must wait at least k rounds for its turn. The only transfers in the queue are of the first two types, and consequently no element can move forward.
The correctness of the lemma stems from these properties of the RC4 round operation. The value pointed to by j (before the swap) has a Turn Loss and must wait at least N rounds before being used as S[i] (No Overpass). ∎

Figure 3.2: Schematic representation of Theorem 3.1.2 and Theorem 3.1.4

Recall that the swap operation of the first round caused the entries $i_0$ and $j_0$ to be unconserved, and ruined the equivalence of i and j. However, the additional constraints on the key ensure that in the second round j becomes equivalent to i, and that the unconserved entries will not affect the preservation of the structure during the rest of the KSA. The following lemma summarizes this scenario.

Lemma 3.1.6 If K is a special 2-exact key for which $K[1] = N - 2$, then the indices i and j of rounds $1, \dots, N-2$ are equivalent (mod 2).

Proof: K[0] = 1, which causes the corrupted indices after round 0 to be $i_0 = 0$ and
$$j_0 = j_{-1} + S_{-1}[0] + K[0] = 0 + 0 + 1 = 1.$$
The discrepancy of $S_0[1]$ is used to fix the non-equivalence of i and j during round 1:
$$i_1 = 1, \qquad j_1 = j_0 + S_0[1] + K[1] = 1 + 0 + (N - 2) = N - 1 \equiv i_1 \pmod 2.$$
Consequently, $i_1 \equiv j_1 \pmod 2$, $I_1 = N - 2$, and the unconserved indices in $S_1$ are 0 and $N - 1$. Lemma 3.1.5 ensures that these values (which are almost the last ones in the queue at this point) are not involved in determining j during rounds $2, \dots, N-2$, a fact that ensures that only conserved indices are involved in determining $j_2, \dots, j_{N-2}$. This property, along with the exactness of K, preserves the equivalence of j and i (just as in the proof of Theorem 3.1.2) at least until after round $N - 2$. ∎

Lemma 3.1.7 Let K be a special 2-exact key of an even size. Suppose that $K[1] = N - 2$ (this constraint is compatible with the requirements of special 2-exact keys). Then the outcome of the KSA applied to K is an almost 2-conserving permutation, regardless of the other bits of the key.

Proof: $I_1 = N - 2$. Since the indices are equivalent in rounds $2, \dots, N-2$ (Lemma 3.1.6), Lemma 3.1.3 gives $I_{N-2} = I_1 = N - 2$. We analyze two possible scenarios for the last round (round $N - 1$). If $S_{N-2}$ conserves the index $N - 1$, j is updated appropriately and we can reuse Lemma 3.1.3 to conclude that $I_{N-1} = I_{N-2}$. Otherwise, the index $N - 1$ is unconserved by $S_{N-2}$, causing j to be updated inappropriately, i.e., $i_{N-1} \not\equiv j_{N-1} \pmod 2$. However, $N - 1$ is unconserved, and thus swapping it with the non-equivalent index $j_{N-1}$ causes the index $j_{N-1}$ (independently of its a priori status) to become conserved (we use here the fact that b = 2, which implies that if $a \not\equiv c$ and $c \not\equiv d$ then $a \equiv d$). Thus, at most one of the indices $i_{N-1}$, $j_{N-1}$ was conserved by S before round $N - 1$ and at least one of them is conserved after round $N - 1$. Consequently, the number of conserved indices cannot decrease and $I_{N-1} \ge I_{N-2} = N - 2$, implying that the output permutation is almost 2-conserving. ∎
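Since Lemma 3.1.7 holds regardless of the remaining key bits, it lends itself to a direct check. The sketch below is an illustration only; the function names, N = 256, the 16-word key length and the number of sampled keys are assumptions chosen for the example.

```python
import random

def ksa(key, N=256):
    """Standard RC4 key scheduling; i takes the values 0, 1, ..., N-1."""
    S = list(range(N))
    j = 0
    for i in range(N):
        j = (j + S[i] + key[i % len(key)]) % N
        S[i], S[j] = S[j], S[i]
    return S

def special_2_exact_key(length, N=256, rng=random):
    """K[0] = 1, K[1] = N - 2, and K[t] = (1 - t) mod 2 for all other words."""
    assert length % 2 == 0 and length >= 2
    key = [((1 - t) % 2) + 2 * rng.randrange(N // 2) for t in range(length)]
    key[0], key[1] = 1, N - 2
    return key

def conserved_count(S, b=2):
    """I_b(S): number of indices t with S[t] congruent to t modulo b."""
    return sum(1 for t, v in enumerate(S) if (v - t) % b == 0)

rng = random.Random(2007)
counts = [conserved_count(ksa(special_2_exact_key(16, rng=rng))) for _ in range(1000)]
print(min(counts))  # the lemma predicts at least N - 2 = 254 for every key
```

Dropping the line that forces K[1] = N - 2 (and only setting its most significant bit) turns this into an experiment for the probabilistic statement of Lemma 3.1.8 instead.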

Lemma 3.1.8 Let K be a special 2-exact key and let S = KSA(K). Then
$$P[\mathrm{KSA}(K) \text{ is almost 2-conserving}] \ge 2/5.$$

Proof: Recall that the conditions of this lemma include predetermining the whole of K[0], but only the MSB of K[1]. The eliminated condition (with respect to Lemma 3.1.6) is that $K[1] = N - 2$, which sent the corrupted entry to the last position, making it unlikely that it will affect the other entries. Without this condition, the unconserved entries might interfere with the updates of j, ruin its equivalence to i, and generate more unconserved indices. However, with relatively high probability, this problematic scenario will be prevented even under these weakened conditions. As in the previous case, the corrupted entry in $i_0 = 0$ is promised not to be touched by i during the remaining $N - 2$ rounds. The second corrupted entry ($j_1$) might be used to update j in round $j_1$, which would ruin the equivalence of the indices. However, if this location is pointed to by j before round $j_1$, the discrepancy is moved to an entry which i will not visit (it is possible that this entry will be pointed to by j on some intermediate round r, and the discrepancy will move from $j_1$ to $i_r = r$, but the entry r will be untouchable by i, and thus the discrepancy can move only between entries which i will never visit), and will not interfere with the updates of j. Notice that $j_1 = 1 + K[1] > N/2$ (recall that MSB(K[1]) = 1), and thus j has many opportunities (at least N/2) to visit position $j_1$ before i does. The probability that N/2 pseudo-random values of j reach some specific value is well approximated by $1 - (1 - 1/N)^{N/2} \approx 1 - 1/\sqrt{e} \approx 2/5$. This gives a lower bound on the probability that KSA(K) is almost 2-conserving. ∎

Finally, we derive the proof of Theorem 3.1.4 from the proof of Lemma 3.1.8. The equivalence of the indices after the second round (round 1) is independent of b, and the only part of the proof that might change is the probability of the unconserved entries to corrupt this equivalence. However, the corrupted entries are still 0 and $j_1 > N/2$, and thus this probability can still be bounded from below by 2/5.
∎

KSA thus transforms special patterns in the key into corresponding patterns in the initial permutation. For $\mathrm{RC4}_n$ that uses a key of $\ell = 2^p \cdot m$ words (of n bits each), we found that for every $q \le p$, there exists an assignment of $n + 1 + q(\ell - 1)$ bits of K (the first word, q LSBs of each of the other $\ell - 1$ words, and the MSB of the second word) that determines $\Theta(qN)$ bits of $S_0$ with a significant probability of about one half. For the commonly used $\mathrm{RC4}_8$ with a key of 6 bytes, 14 bits of K determine 238 bits of $S_0$, 19 bits of K determine 472 bits of $S_0$, etc. This correlation implies that the KSA of RC4 does not mix the bits of the key equally between the bits of the permutation, and this phenomenon induces a weakness of the KSA.

3.2 Key-Output Correlation

Here we will analyze the propagation of the weak key patterns into the generated outputs. First we prove Claim 3.2.1 and Theorem 3.2.2, which deal with the highly biased behavior of a weakened variant of the PRGA, applied to a b-conserving permutation. Next, we will argue that the prefix of the output of the original PRGA is highly correlated to the prefix of the swapless variant (on the same initial permutation), which implies the existence of biases in the PRGA distribution for these weak keys.

3.2.1 Correlation for PRGA*

Definition 3.2.1 Let PRGA* be a weakened variant of PRGA with no swap operations, and let RC4* be a weakened variant of RC4 that uses PRGA*.

Definition 3.2.2 Let $q \in \mathbb{N}$, $b = 2^q$. Let $\{x_r\}_{r=1}^{\infty}$ be the stream of q-bit words produced by applying the q-bit PRGA* to the identity permutation in $S_b$. Then the stream $\{x_r\}$ is called a b-pattern, and every stream $\{X_r\}_{r=1}^{h}$ of n-bit words ($n \ge q$) that satisfies $\forall r \le h,\; X_r \equiv x_r \pmod b$ is called a b-patterned stream.

Claim 3.2.1 Any stream produced by PRGA* from a b-conserving permutation is b-patterned.

Proof: Let $\{X_r\}_{r=1}^{\infty}$ be this stream. Denote the state component sequences induced by this stream by $\{S_r\}_{r=1}^{\infty}$, $\{I_r\}_{r=1}^{\infty}$ and $\{J_r\}_{r=1}^{\infty}$. Denote the same sequences induced by the q-bit stream $\{x_r\}_{r=1}^{\infty}$ by $\{s_r\}$, $\{i_r\}$, $\{j_r\}$. We first prove by induction on r that $\forall r$, $I_r \equiv i_r \pmod b$ and $J_r \equiv j_r \pmod b$. We use the fact that there are no swap operations, and thus the permutations S and s do not change and remain b-conserving throughout the generation process.
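Base Case: For r = 0 (before the first round), $I_0 = i_0 = J_0 = j_0 = 0$.

Inductive Step: Suppose that $I_{r-1} \equiv i_{r-1}$ and $J_{r-1} \equiv j_{r-1} \pmod b$. Then
$$I_r = I_{r-1} + 1 \equiv i_{r-1} + 1 \equiv i_r \pmod b,$$
$$J_r = J_{r-1} + S_{r-1}[I_r] \equiv j_{r-1} + I_r \equiv j_{r-1} + i_r \equiv j_{r-1} + s[i_r] \equiv j_r \pmod b.$$
Having these equivalences, we can easily derive the equivalence of the streams:
$$x_r = s_r[s_r[i_r] + s_r[j_r]] \equiv s_r[i_r] + s_r[j_r] \equiv i_r + j_r \pmod b,$$
$$X_r = S_r[S_r[I_r] + S_r[J_r]] \equiv S_r[I_r] + S_r[J_r] \equiv I_r + J_r \equiv i_r + j_r \equiv x_r \pmod b. \qquad ∎$$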
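Claim 3.2.1 can be illustrated with a small experiment. In the Python sketch below, `prga_star` and `random_b_conserving_permutation` are hypothetical helper names, and the parameters N = 256, b = 8 and a prefix length of 64 words are arbitrary choices for the example.

```python
import random

def prga_star(S, rounds):
    """PRGA without the swap: the permutation S is read but never modified."""
    N = len(S)
    i = j = 0
    out = []
    for _ in range(rounds):
        i = (i + 1) % N
        j = (j + S[i]) % N
        out.append(S[(S[i] + S[j]) % N])
    return out

def random_b_conserving_permutation(N, b, rng=random):
    """Shuffle values inside each residue class mod b, so S[t] = t (mod b) for all t."""
    S = [0] * N
    for r in range(b):
        idx = list(range(r, N, b))
        vals = idx[:]
        rng.shuffle(vals)
        for t, v in zip(idx, vals):
            S[t] = v
    return S

N, b, h = 256, 8, 64
S0 = random_b_conserving_permutation(N, b, random.Random(3))
pattern = prga_star(list(range(b)), h)  # the b-pattern of Definition 3.2.2
stream = prga_star(S0, h)
print(all(X % b == x for X, x in zip(stream, pattern)))  # True
```

The b-pattern is obtained by running the same routine on the identity permutation of size b, which is exactly the q-bit PRGA* of Definition 3.2.2.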

Analysis of this q-bit stream shows that it is periodic with period 2b.

Theorem 3.2.2 Let $q \le n$, $b = 2^q$ and let $S_0$ be a b-conserving permutation. Let $\{X_t\}_{t=1}^{\infty}$ be the output sequence generated by applying PRGA* to $S_0$, and let $x_t \equiv X_t \bmod b$. Then the sequence $\{x_t\}_{t=1}^{\infty}$ is periodic with period 2b.

Proof: Since there is no swap operation, the permutation does not change and remains b-conserving throughout the generation process. Notice that all the values of $S_0$ are known (mod b), as well as the initial indices $i_0 = j_0 = 0$, and thus the round operation (and the output values) can be simulated (mod b), independently of $S_0$:
$$i_t = t, \qquad j_t = j_{t-1} + S[i_t] = j_{t-1} + S[t] \equiv j_{t-1} + t \pmod b.$$
Thus we have
$$j_t \equiv j_{t-1} + t, \quad j_{t-1} \equiv j_{t-2} + (t - 1), \quad \dots, \quad j_1 \equiv j_0 + 1 \pmod b.$$
By adding the above equations we have
$$j_t \equiv j_0 + \sum_{i=1}^{t} i = \frac{t(t+1)}{2} \pmod b.$$
Now
$$x_t = S[S[i_t] + S[j_t]] \equiv S[i_t] + S[j_t] \equiv i_t + j_t \equiv t + \frac{t(t+1)}{2} = \frac{t(t+3)}{2} \pmod b.$$
Thus
$$x_{2b+1} \equiv \frac{(2b+1)(2b+4)}{2} = (2b+1)(b+2) \equiv 2 \pmod b,$$
and also $x_1 \equiv 1 \cdot (1+3)/2 = 2 \equiv x_{2b+1} \pmod b$. More generally, for every t,
$$x_{t+2b} \equiv \frac{(t+2b)(t+2b+3)}{2} = \frac{t(t+3)}{2} + b(2t+3) + 2b^2 \equiv x_t \pmod b,$$
which proves the periodicity. ∎

The following table illustrates the above theorem for b = 2.

Table 3.1: Important parameters (mod 2) of the first few rounds of RC4*, applied to a 2-conserving permutation

3.2.2 Correlation for PRGA

After proving this biased behavior of PRGA* on b-conserving permutations, we analyze the expression of this phenomenon when using the original PRGA. Recall that at each round of the PRGA, S changes in at most two locations. Thus we can expect a diminishing correlation between the sequences of permutations produced by PRGA and PRGA* from the same initial permutation.

This correlation fades out as r increases, since the more swaps are made by the PRGA, the more entries are "spoiled" by these swaps. This diminishing correlation is also expressed in the output words, which are completely determined by the correlated permutations. Consequently, special exact keys are likely to be transformed by the KSA into almost b-conserving permutations, which are likely to be transformed by the PRGA into relatively long b-patterned streams. The correlation between special exact keys and patterned stream prefixes is demonstrated in Figure 3.3, where the function $h \mapsto P[\forall\, 1 \le r \le h:\; Z_r \equiv x_r \bmod 2^q]$ (for special $2^q$-exact keys) is empirically derived for n = 8, $\ell = 16$ and different values of q. For example, a special 2-exact key completely determines 20 output bits (the LSBs of the first 20 outputs) with probability $2^{-4.2}$ instead of $2^{-20}$, and a special 16-exact key completely determines 40 output bits (4 LSBs from each of the first 10 outputs) with probability $2^{-2.3}$ instead of $2^{-40}$.

We have thus demonstrated a strong probabilistic correlation between some bits of the secret key and some bits of the output stream for a large class of weak keys. An important observation about this correlation is its unexpected dependency on n. Notice that the size of the correlated prefix depends on the probability that all the entries that are used to produce the output (three entries per round) are "virgin", that is, were not swapped in an earlier round. The probability that the three indices which are used to generate the h-th output were not swapped in the previous $h - 1$ rounds has a negative linear dependence on the probability of a random entry to hit a specific entry, which is proportional to 1/N. Thus the size of the correlated output depends linearly on N (and exponentially on n). This is somewhat surprising, since one would expect that enlarging the array would strengthen the overall security.
However, the expression of the invariance weakness is amplified, which counterbalances this expectation. Moreover, assuming a fixed-size key, the fraction of $2^q$-exact keys for q > 1 "prefers" large values of n. This is due to the fact that the dependency on $\ell$ = (number of key bits)/n is stronger than the dependency on n (recall that a $2^q$-exact key requires fixing $n + 1 + q(\ell - 1)$ bits). Consequently, using a 128-bit key in $\mathrm{RC4}_{8,16}$ is more immune to this weakness than using this key in $\mathrm{RC4}_{16,8}$. This phenomenon, where the expression of the invariance weakness is amplified when n is increased, is demonstrated in Figure 3.4.
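The empirical curve described above can be reproduced in spirit with a small Monte Carlo experiment. The following Python sketch is only an illustration under assumed parameters (n = 8, a 16-word key, 2000 random special 2-exact keys); the helper names are hypothetical, and the b-pattern is taken from the closed form $x_t \equiv t(t+3)/2 \pmod b$ of Theorem 3.2.2. It makes no claim to reproduce the exact figures quoted in the text.

```python
import random

def rc4_prefix(key, h, N=256):
    """First h output words of RC4 (standard KSA followed by the PRGA)."""
    S = list(range(N))
    j = 0
    for i in range(N):
        j = (j + S[i] + key[i % len(key)]) % N
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = []
    for _ in range(h):
        i = (i + 1) % N
        j = (j + S[i]) % N
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % N])
    return out

def special_exact_key(b, length, N=256, rng=random):
    """K[0] = 1, MSB(K[1]) = 1, and K[t] = (1 - t) mod b for every word."""
    assert b < N and length % b == 0
    key = [((1 - t) % b) + b * rng.randrange(N // b) for t in range(length)]
    key[0] = 1
    key[1] |= N // 2  # set the most significant bit; the low bits are unchanged
    return key

def patterned_prefix_rates(b, length, h, trials, seed=0):
    """Fraction of keys whose first r outputs are b-patterned, for r = 1..h."""
    rng = random.Random(seed)
    pattern = [(t * (t + 3) // 2) % b for t in range(1, h + 1)]
    hits = [0] * h
    for _ in range(trials):
        z = rc4_prefix(special_exact_key(b, length, rng=rng), h)
        for r in range(h):
            if z[r] % b != pattern[r]:
                break
            hits[r] += 1
    return [c / trials for c in hits]

rates = patterned_prefix_rates(b=2, length=16, h=20, trials=2000)
print(rates[0], rates[-1])  # estimates for prefix lengths 1 and 20
```

For a uniformly random stream the corresponding rate for a prefix of length h would be $2^{-h}$, so any markedly larger estimate reflects the key-output correlation discussed above.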

Figure 3.3: This graph demonstrates the probabilities of special $2^q$-exact keys of $\mathrm{RC4}_{8,16}$ to produce streams with long $2^q$-patterned prefixes

Figure 3.4: This graph demonstrates the output prefix that is 2-patterned with fixed probabilities (1/4, 1/32, 1/256 and $2^{-12}$) when $\mathrm{RC4}_{n,16}$ is applied to a 2-exact key, as a function of n.

3.3 Cryptanalytic Applications of the Invariance Weakness

3.3.1 Distinguishing RC4 Streams from Randomness

In [2] Mantin and Shamir described a significant statistical bias in the second output word of RC4. They used this bias to construct an efficient algorithm which distinguishes between RC4 outputs and truly random sequences by analyzing only one word from O(N) different output streams. This is an extremely efficient distinguisher, but it can be easily avoided by discarding the first two words from each output stream. If these two words are discarded, the best known distinguisher requires about $2^{30}$ output words (see [3]). The new observation yields a significantly better distinguisher for most of the typical key sizes. The new distinguisher is based on the fact that for a significant fraction of keys, a significant number of initial output words contain an easily recognizable pattern. This bias is flattened when the keys are chosen from a uniform distribution, but it does not completely disappear and can be used to construct an efficient distinguisher even when the first two words of each output sequence are discarded.

Notice that the probability of a special $2^q$-exact key to be transformed into an almost $2^q$-conserving permutation does not depend on the key length $\ell$ (see Theorem 3.1.4). However, the number of predetermined bits is linear in $\ell$, and consequently the size of this bias (and thus the number of required outputs) also depends on $\ell$. In Table 3.2 we specify the quantity of data required for a reliable distinguisher, for different key sizes. In particular, for 64-bit keys the new distinguisher requires only $2^{21}$ data instead of the previously best number of $2^{30}$ output words.
It is important to notice that the specified output patterns extend over several dozen output words, and thus the quality of the distinguisher is almost unaffected by discarding the first few words. For example, discarding the first two words causes the data required for the distinguisher to grow by a factor of between $2^{0.5}$ and $2^{2}$ (depending on $\ell$). Another important observation is that the biases in the distribution of the LSBs can be combined in a natural way with the biased distribution of the LSBs of English texts into an efficient distinguisher of RC4 streams from randomness in a ciphertext-only attack, in which the attacker does not know the actual English plaintext which was encrypted by RC4. This type of distinguisher is discussed in the next section.

Table 3.2: Data required for a reliable distinguisher, for different key sizes

3.3.2 Ciphertext-Only Distinguishers based on the Invariance Weakness

The distinguishers we presented in Section 3.3.1, as well as most of the distinguishers mentioned in the literature (for RC4 and other stream ciphers), assume knowledge of the plaintext in order to isolate the XORed keystream. However, in practice the only information the attacker has is typically some statistical knowledge about the plaintext, e.g., that it contains English text. Combining the non-random behaviors of the plaintext and the keystream is not always possible, and there are cases where XORing biased streams results in a totally random stream (e.g. when one stream is biased in its even positions and the other stream is biased in its odd positions). We prove here that if the plaintexts are English texts, it is easy to construct a ciphertext-only distinguisher from the aforesaid biases. The intuition of this construction is that the biases described in Section 3.3.1 are in the distribution of the LSBs, and consequently they can be combined with the non-random distribution of the LSBs of English texts.

There are many major biases in the distribution of the LSBs of English texts, and they can be combined with biases of the keystream words in various ways. In Theorem 3.3.1 we estimate the bias caused by XORing two streams which are biased in their LSB distributions.

Theorem 3.3.1 Let $\{m_t\}$, $\{z_t\}$ and $\{c_t\}$ be the plaintext, keystream and ciphertext (mod 2) of a stream cipher respectively (we assume independence between the plaintext and the keystream). Suppose that $z_t$ and $m_t$ have positive biases $b_z > 0$, $b_m > 0$ towards the bits $v_z$, $v_m$ respectively, i.e., $\Pr[z_t = v_z] = 0.5 + b_z$ and $\Pr[m_t = v_m] = 0.5 + b_m$. Then
$$\Pr[c_t = v_m \oplus v_z] = 0.5 + 2 b_m b_z.$$

Proof:
$$\Pr[c_t = v_m \oplus v_z] = \Pr[m_t = v_m, z_t = v_z] + \Pr[m_t = \bar{v}_m, z_t = \bar{v}_z]$$
$$= \Pr[m_t = v_m] \cdot \Pr[z_t = v_z] + \Pr[m_t = \bar{v}_m] \cdot \Pr[z_t = \bar{v}_z]$$
$$= \left(\tfrac{1}{2} + b_m\right)\left(\tfrac{1}{2} + b_z\right) + \left(\tfrac{1}{2} - b_m\right)\left(\tfrac{1}{2} - b_z\right) = \tfrac{1}{2} + 2 b_m b_z. \qquad ∎$$

Next we exemplify constructions of ciphertext-only distinguishers, by concentrating on specific biases of the LSB distributions in English texts.
Corollary 3.3.2 Let C be the ciphertext generated by RC4 from a random key and the ASCII representation of plaintexts, distributed according to the first-order statistics of English texts. Let $p_\ell$ be the probability of a random $\ell$-word key to be special 2-exact. Then C can be distinguished from a random stream by analyzing the first few words of about $800 / p_\ell^2$ different RC4 streams.

Proof: The first-order statistics of English texts give a 55% probability of a character to have LSB 0, and thus $b_m \approx 0.05$. Analyzing the results from Section 3.2 we get
$$b_z = \Pr[z_1 = 0] - 0.5 = \Pr[z_1 = 0 \mid \text{key is special 2-exact}] \cdot p_\ell + \Pr[z_1 = 0 \mid \text{key is not special 2-exact}] \cdot (1 - p_\ell) - 0.5$$
$$\approx 0.75 \cdot p_\ell + 0.5 \cdot (1 - p_\ell) - 0.5 = 0.25 \cdot p_\ell.$$
Thus, by Theorem 3.3.1, $\Pr[c_1 = 0] = 0.5 + 2 \cdot 0.05 \cdot 0.25 \cdot p_\ell = 0.5 + 0.025\, p_\ell = 0.5 \cdot (1 + 0.05\, p_\ell)$, and the data required for a reliable ciphertext-only distinguisher is (again we use Theorem 2.2.1)
$$O\!\left(\frac{1}{p q^2}\right) = O\!\left(\frac{1}{0.5 \cdot (0.05\, p_\ell)^2}\right) = O\!\left(\frac{800}{p_\ell^2}\right). \qquad ∎$$

For the typical $\mathrm{RC4}_{8,8}$, $p_\ell = 2^{-16}$ and we get a $2^{41.6}$ ciphertext-only distinguisher. A drawback of this approach is that our distinguisher works on prefixes of messages, which have a distribution different from the first-order statistics of English. For example, the space character is not likely to occur at the beginning of a message, whereas in the first-order statistics of English it is the most probable character. Experiments we made imply that the 4th character of English texts has a probability of 60% of having an LSB of 0. Thus we get $b_m = 0.1$ and $b_z = \Pr[z_4 = 0] - 0.5 \approx 0.25 \cdot p_\ell$ (from the graph), which gives $\Pr[c_4 = 0] = 0.5 + 2 \cdot 0.1 \cdot 0.25 \cdot p_\ell = 0.5 \cdot (1 + 0.1\, p_\ell)$. Thus the data required for the distinguisher is $2^{39.6}$ outputs.

It is important to note that we did not use all the statistical information that is available in either the keystream or the plaintext distributions, and consequently this analysis does not represent the best possible attack.
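The arithmetic behind Theorem 3.3.1 and the two data estimates can be retraced with a few lines of Python. This is a sketch under the assumptions stated in the text (independent plaintext and keystream bits, a conditional keystream bias of 0.25 for special 2-exact keys, and the $1/(p q^2)$ rule of Theorem 2.2.1 with constant factors ignored); the function names are hypothetical.

```python
import math

def xor_bias(b_m, b_z):
    """Bias of c = m XOR z towards v_m XOR v_z, for independent biased bits."""
    p_m, p_z = 0.5 + b_m, 0.5 + b_z
    return p_m * p_z + (1 - p_m) * (1 - p_z) - 0.5

def log2_streams_needed(b_m, p_key, key_bias=0.25):
    """log2 of 1 / (p * q^2), writing Pr[c = 0] as p * (1 + q) with p = 1/2."""
    b_c = xor_bias(b_m, key_bias * p_key)
    q = b_c / 0.5
    return -math.log2(0.5 * q * q)

p_key = 2.0 ** -16  # fraction of special 2-exact keys for RC4 with n = 8, 8 key words
print(round(log2_streams_needed(0.05, p_key), 1))  # 41.6 (first-order English statistics)
print(round(log2_streams_needed(0.10, p_key), 1))  # 39.6 (4th-character statistics)
```

The first function evaluates the exact two-term sum from the proof rather than the closed form $2 b_m b_z$, so the agreement of the two is itself a check of the theorem.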

Chapter 4

Conclusion and Scope of Future Works

4.1 Conclusion

In this thesis we have described several flaws in the initialization mechanism of RC4, which are caused by its extreme simplicity. The complicated task of the KSA, which is to extend relatively short random keys into large pseudo-random permutations, is reasonably but only partially fulfilled. The initialization of the pseudo-random index j to 0 seems to be the most problematic operation, and the second byte bias could be avoided by using a more complex initialization of j. Possible methods for initializing j are to use j from the end of the KSA or to give it the value of one of the key words. The invariance weakness is the inherent consequence of the structure of the KSA.

A perfect initialization mechanism is not easy to achieve. We would like to avoid patterns that are independent of the key (like the second byte bias), while on the other hand we do not want any trivial dependency between the key and the first output bits (like the invariance weakness). A common mode of operation to achieve these contradicting goals is to discard a prefix of output bits. These mute rounds usually disconnect the generated stream from the initialization process, and improve the "randomness" of the generated stream. Discarding the first two bytes voids the practical attacks, but retains the invariance weakness. Consequently, it is recommended to discard at least one complete sequence of N words.

One can notice that when enlarging RC4 words to 16 bits (which is sometimes recommended for faster encryption of large amounts of data), the discarded prefix should also grow in the same way (exponentially). The expression of the invariance weakness spreads over several hundred words in $\mathrm{RC4}_{16}$, and eliminating only 256 words is not sufficient when N is larger.

The final, and positive, conclusion about the security of RC4 is that using RC4 in this way seems to be secure, even when using a concatenation-based session key derivation. It is believed that no information leaks about either the key or any part of the encrypted messages.
4.2 Scope of Future Works

Many new observations about the internal state, the output distribution and the correlation between them in RC4 are available in the literature. Several methods have been described to extract partial information about the internal state, and we believe that these methods can be further improved.

Excluding prefix distinguishers, the best known distinguisher for RC4 ([3]) is based on counting the number of occurrences of output pairs in periodic positions in the stream, which is an amazingly simple method. A non-negligible portion of our thesis was dedicated to describing better distinguishers.

The most promising way to construct a distinguisher is based on the correlations of Jenkins. We know that whenever a value X is pointed to by j before the swap, it will not be pointed to by i (before the swap) for at least N rounds. However, the values that are pointed to by i can be guessed to be $i - z$. Thus knowing S[j] = X can be used to predict that $i - z$ will differ from X during at least N rounds. Furthermore, the value that was used as S[j] has a constant probability (1/e) to be used as S[i] after exactly N rounds, and a doubled probability to be equal to $i - z$ in this round. The lower bound on the success probability of guessing S[j] is
$$\frac{1}{N}\left(1 + \frac{c}{N}\right)$$
(for a small constant c), but it is not tight, and unfortunately a tighter bound has not yet been found.

A promising research direction is to extend the analysis of the KSA in terms of its transfer function. Calculating (recursively or explicitly) the pairs transfer function, i.e. the probability that the values in positions a and b reach positions c and d respectively within r rounds, might lead to new observations. This function is at least as biased as the singles transfer function, and might be more useful. It is possible that the pairs transfer function can be easily derived from the singles transfer function. If this is the case, it will provide new insights into the complete distribution of KSA outputs, which might be surprisingly biased.

Bibliography

[1] Scott R. Fluhrer, Itsik Mantin, and Adi Shamir, Weaknesses in the key scheduling algorithm of RC4, SAC: Annual International Workshop on Selected Areas in Cryptography, SAC'2001, LNCS, 2001.
[2] Itsik Mantin and Adi Shamir, A practical attack on broadcast RC4, FSE: Fast Software Encryption, FSE'2001 (Yokohama, Japan), Springer-Verlag, April 2001.
[3] Scott R. Fluhrer and David A. McGrew, Statistical analysis of the alleged RC4 keystream generator, FSE: Fast Software Encryption, FSE'2000, Springer-Verlag, 2000, pp. 19-30.
[4] Serge Mister and Stafford E. Tavares, Cryptanalysis of RC4-like ciphers, SAC: Selected Areas in Cryptography, SAC'98 (Kingston, Ontario, Canada), LNCS, vol. 1556, Springer-Verlag, August 1999, pp. 131-143.
[5] Jovan Dj. Golić, Linear statistical weakness of alleged RC4 keystream generator, EUROCRYPT: Advances in Cryptology, EUROCRYPT'97, International Conference on the Theory and Application of Cryptographic Techniques (Konstanz, Germany), LNCS, vol. 1233, Springer-Verlag, May 1997, pp. 226-238.
[6] Manuel Blum and Shafi Goldwasser, An efficient probabilistic public-key encryption scheme which hides all partial information, Advances in Cryptology: Proceedings of CRYPTO'84 (Santa Barbara, California, USA), Springer-Verlag, August 1985, pp. 289-302.
[7] Hal Finney, An RC4 cycle that can't happen, September 1994.
[8] Alexander L. Grosul and Dan S. Wallach, A related-key cryptanalysis of RC4, Technical Report TR-00-358, Department of Computer Science, Rice University, October 2000.
[9] Lars R. Knudsen, Willi Meier, Bart Preneel, Vincent Rijmen, and Sven Verdoolaege, Analysis methods for (alleged) RC4, ASIACRYPT: Advances in Cryptology, International Conference on the Theory and Applications of Cryptology and Information Security, ASIACRYPT'98 (Beijing, China), LNCS, vol. 1514, Springer, October 1998, pp. 327-341.
[10] David A. McGrew and Scott R. Fluhrer, The stream cipher LEVIATHAN, https://www.cosic.esat.kuleuven.ac.be/nessie/ , October 2000, NESSIE project submission.
[11] Andrew Roos, A class of weak keys in the RC4 stream cipher, sci.crypt posting, September 1995.
[12] Itsik Mantin, Analysis of the stream cipher RC4, November 2001.