The National Security Agency’s Review of Emerging Technologies | Vol. 18, No. 3, 2010

Clumps, Hoops, and Bubbles

How Akamai Maps the Net

Compressed Sensing and Network Monitoring

Revealing Social Networks of Spammers

Challenges in Geolocation

Letter from the Editor

This issue of The Next Wave is largely derived from talks given at two Network Mapping and Measurement Conferences (NMMCs) held at the Laboratory for Telecommunications Sciences (LTS) in College Park, Maryland. These conferences evolved from the NetTomo workshops I–IV, which grew out of research on network tomography sponsored by the Information Technology Industry Council (ITIC). Over time it became obvious that a broader scope was needed than strictly network tomography, and the name change was instituted.

Network tomography and mapping are closely related fields. To explain the difference between tomography and mapping, here are two simple definitions. Network tomography is the study of a network’s internal characteristics using information derived from endpoint data. Network mapping is the study of the physical connectivity of the Internet, determining what servers and operating systems are running and where. A deeper explanation of tomography follows. For a longer discussion of mapping, please see the article “Mapping Out Faster, Safer Networks.”

Network tomography is generally of two types, both of them massive inverse problems. The first type uses end-to-end data to estimate link-level characteristics. This form of tomography often is active in nature, using many pings, traceroutes, and other mapping tools to obtain the necessary data. Due to the large amount of undesirable traffic experienced by many networks, routers or other network equipment may not respond to ping or traceroute requests. This deficiency has led to a second form of network tomography that is sometimes called inferential network tomography. This form of network tomography uses individual router or node-level measurements to recover path-level information. This data can be obtained passively, and it does not create a traffic burden that has the potential to change the logical network structure.

The study of network tomography includes network topology (both logical and physical), the origin-destination traffic matrix, and quality of service parameters such as loss rates or delay characteristics. Accurate and timely information about traffic flows is necessary for good network management.

Network tomography research leads to other topics of interest:

1. How do you measure the network?
2. What kind of networks do these techniques apply to?
3. Does it matter if you test parts of the network individually and then put them all together, or does the entire network need to be in the test (integration testing)?
4. What sensing techniques are best to use?
5. Exactly what kind of data do you need to gather?
6. What about techniques from other disciplines, such as social networking? Will they apply to the networks you are interested in?
7. How does industry do their network mapping?
8. What about attribution?

Some of these questions were addressed at the NMMC sessions and therefore are addressed in the following articles. See “Compressed Sensing and Network Monitoring,” for example, regarding question number four above. Many more questions arise in the study of network mapping and measurement.

The NMMC series has been a huge success, with participants from different countries, federal agencies, universities, and industry. The next NMMC will be held in August at McGill University in Montreal, Canada.
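The first type of tomography described above can be written as a linear inverse problem. If each end-to-end path delay is the sum of the delays of the links that path crosses, then the path measurements y and the unknown link delays x are related by a 0/1 routing matrix A, so that y = A x. The sketch below is a toy illustration under strong assumptions. It uses a hypothetical three-link topology with as many independent probe paths as links, so A is square and invertible and x can be solved exactly. Real networks usually give an underdetermined system, which is what makes tomography a hard inverse problem. The topology, names, and delay values are invented.

```python
from fractions import Fraction

def solve_linear(A, y):
    """Solve A x = y for a square, invertible A by Gauss-Jordan elimination (exact arithmetic)."""
    n = len(A)
    # Augmented matrix [A | y]; Fractions avoid floating-point rounding.
    M = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(A, y)]
    for col in range(n):
        # Find a row with a nonzero entry in this column (assumes A is invertible).
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot becomes 1.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[-1] for row in M]

# Hypothetical topology: each row marks which links (L1, L2, L3) a probe path crosses.
links = ["L1", "L2", "L3"]
routing = [
    [1, 1, 0],  # path P1 crosses L1 and L2
    [0, 1, 1],  # path P2 crosses L2 and L3
    [1, 0, 1],  # path P3 crosses L1 and L3
]
path_delays_ms = [30, 50, 40]  # hypothetical end-to-end delay measurements

link_delays = solve_linear(routing, path_delays_ms)
for name, d in zip(links, link_delays):
    print(f"{name}: {float(d):.1f} ms")
```

With these made-up measurements, the solver recovers link delays of 10, 20, and 30 ms.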

The image that appears on the cover of this issue of The Next Wave shows the connectivity of the Internet as measured by the Internet Mapping Project. The work is being commercially developed by Lumeta Corporation. Credit: Patent(s) pending and copyright © Lumeta Corporation. All rights reserved.

The Next Wave is published to disseminate significant technical advancements in telecommunications and information technologies. Mentions of company names or commercial products do not imply endorsement by the US government. Articles present views of the authors and not necessarily those of NSA or the TNW staff. For information, please contact us at [email protected]

CONTENTS

FEATURES

Mapping Out Faster, Safer Networks

How Akamai Maps the Net: An Industry Perspective

Compressed Sensing and Network Monitoring

Revealing Social Networks of Spammers

Challenges in Internet Geolocation, or Where’s Waldo Online?

Clumps, Hoops, and Bubbles: Moving Beyond Clustering in the Analysis of Data

Mapping Out Faster, Safer Networks

Maps. We use them every day. Your GPS guides you to that new restaurant you’ve wanted to try. The information map in the mall points out where HERE is. Online gamers pull up battle maps to navigate virtual worlds. The social network of your friends and your friends’ friends weaves a cat’s cradle of intertwined relationships. Your computer files are stored in folders that are displayed hierarchically. Site maps lay out how web pages link up. And think how much easier life would be if you had a map of the labyrinth of telephone options you need to navigate (“Press 1 for hours and locations,” “Press 2 to report a problem,” “Press 3 for account information”) when you try to pay your electric bill over the phone.

Maps don’t just show how things are connected. They can also identify trouble spots and weak points you need to be aware of. GPS maps can alert you to traffic tie-ups due to accidents or lane closures so you can adjust your route. Your security system might display a floor plan that shows which windows and doors are unlocked so you can protect your property. Network mapping does the same things for the Internet, helping to direct traffic and expose vulnerabilities.

Network mapping can happen at different layers of the Internet, including applications, routing, or physical infrastructure, or in different parts of the Internet. Because the Internet changes constantly, any map of any variety (there are many Internet maps, and no two agree) addresses a moving target.

4 Mapping Out Faster, Safer Networks FEATURE

Tracing network routes

Network maps track the routes packets take across an IP (Internet protocol) network to reach a remote host. Network routing is opportunistic, assigning packets to the first available router. This approach means traffic can be directed along different paths to reach a destination, and the number of hops needed to get there can vary. Network mapping makes it easy to visualize what routes are being taken.

The traceroute network utility was introduced on Unix operating systems to map network traffic. Variants of the traceroute program are used on other operating systems: tracert and ping utilities are used on Windows operating systems, and tracepath is the network tool used on current Linux installations.

Network technicians use the traceroute utility to troubleshoot network problems. Knowing a packet’s traceroute can help identify failed routers or firewalls that are obstructing traffic. Traceroute can also be used for penetration testing to hunt for network entry points that could pose a security risk. Hackers are especially interested in finding back doors into networks, and they have readily adopted traceroute as an easy way to exploit network information vulnerabilities. It didn’t take cybercriminals long to discover that not only can the utility be used to locate a network’s weak points, but initiating traceroute from multiple systems can flood a network to launch a denial-of-service attack.

The Internet Mapping Project

Traceroutes were initially used by network administrators to troubleshoot and tune local networks, but the utility would eventually be applied on a much larger scale. As the World Wide Web rapidly grew in popularity during the 1990s, the need for a worldwide map was realized. Efforts to map network traffic globally began in earnest with the Internet Mapping Project, started by Bill Cheswick and Hal Burch at Bell Labs. Every day for eight years, the project recorded traceroutes for trillions of packets traveling across hundreds of thousands of IP networks. The network map that emerged painted a picture resembling a sky filled with fireworks on the Fourth of July. (See the cover image for an example.) Now managed by the Lumeta Corporation, which spun off from Bell Labs, the Internet Mapping Project continues to chart the back roads and thoroughfares of Internet traffic.

The goal of the project has been to provide global network visibility through the accurate measurement of four factors: (1) network topology, (2) address space, (3) leaks, and (4) device fingerprints. Independent discovery processes are used to reveal these four components that define a network.

The Next Wave | Vol 18 No 3 | 2010

Network topology

Network topology describes the flow of network traffic and the bottlenecks that slow it down. A computer’s network discovery setting affects whether it can see other computers on the network or be seen by them. A computer can operate in stealth mode by setting its network discovery setting to off. Such “dark” components, when discovered, can add details to a network map for a clearer picture of the network’s topology.

Visualizing network topology makes it easier to find ways to accelerate network traffic. The Internet highway is getting clogged with streaming videos, mountains of emails, music downloads, online photo albums, high-definition movies, and epic battles in virtual worlds. Grid computing may someday usher in an age when bandwidth is virtually unlimited, but in the meantime, pressure is on to squeeze the broadband tube a little harder.

An obvious solution for speeding up network traffic is to increase the bandwidth it travels on. Service providers worldwide have been challenged to roll out much faster broadband over the next decade, and trials of even higher speeds are already underway. But another approach to moving network traffic faster is to move it smarter. Mapping out a comprehensive route-based topology defines the true perimeter of the network, a first step in understanding network limitations. Network maps can then be used to identify bottlenecks and chart shortcuts, making it possible to devise more efficient ways to move packets to their destinations.

Address space

As enterprises and government agencies try to balance the forces for network change with the requirements for risk management and compliance initiatives, IT security managers are faced with the formidable task of securing what they aren’t even aware of. The solution lies partly in discovering all of a network’s entities, those that are authorized as well as those that are unauthorized. Network host discovery is used to conduct a census of IP addresses across protocols and reveal known and previously undetected network entities. Host discovery is one of the earliest phases of network reconnaissance.

Address space determines the amount of memory allocated to a computational entity, such as a networked computer, a file, a server, or some other device. A unique number assigned by the Internet Assigned Numbers Authority (IANA) identifies individual network nodes. IPv4 (Internet Protocol Version 4) address space is limited to a 32-bit field, yielding a maximum of 2^32 (4,294,967,296) unique addresses. But the supply of available IPv4 addresses is rapidly running out. The move to IPv6, with a 128-bit field, should extend the availability of new addresses well into the future. As the number of IP addresses increases, identifying the hosts that are active and then focusing on them becomes even more important for securing a network.

Network leaks

Network leaks occur at nodes that inadvertently let information packets pass from a local network to the Internet or, more important, that let packets from the outside get in. Leak discovery tools identify unauthorized or previously undetected inbound and outbound network traffic and the nodes that passed them through. This information is vital for setting up network defenses.

A common way to probe for network leaks is by tracking the routes of IP packets that use a forged source address. When the targeted machine responds to a traceroute request, logs from these spoofed IP requests reveal which routers passed the packets on to their destination. Tests from outside a network that turn up inside indicate a firewall leak. Once network leaks have been detected, IT managers can plug the holes by deleting links to the Internet that shouldn’t be there and eliminating unauthorized devices. Lumeta’s Cheswick considers taking such precautions simply good “network hygiene.”

Device fingerprints

Even with the most rigorous network defenses, leaks are likely to persist. Knowing the IP addresses of potential attackers isn’t enough to protect a network. Operators can and do change IP addresses, often intentionally to avoid identification. But most devices also carry a code that serves as a fingerprint, making their identities harder to conceal. Many vendors assign a unique CDI (client device identification) code to the products they manufacture. These device IDs, or device fingerprints, make it possible to challenge offsite computers and other computational devices trying to access a local network. And as with human fingerprinting, a device fingerprint can be a valuable forensics tool.

Device fingerprint discovery provides a summary of the software and hardware settings collected from remote computing devices to identify the source of new attacks or other hosts of interest. Although a sophisticated attacker can spoof a CDI, there are ways to know if the code has been tampered with. Identifying a device with a fingerprint that has been altered or even removed can tip off nefarious activity, providing another source of information that can be used to enhance network security.

Faster, more secure networks

Network maps are useful tools for improving existing networks, and they can be crucial for the evolution of IPv6 and beyond. Network mapping makes it possible to visualize network topology, identify address spaces, find network leaks, and match device fingerprints. What’s more, as a network’s topology is filled in, a clearer picture of the network’s characteristics is revealed, providing for even more detailed analyses. The analysis of network maps can lead to moving traffic faster, keeping information safer, and finding cybercriminals more easily.

The Internet Mapping Project has been a driving force behind the development of effective network mapping tools and practices. By producing varied and detailed perspectives of global Internet connections, the Internet Mapping Project has provided insight into better ways to connect the world.
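The traceroute utility described under “Tracing network routes” relies on the IP time-to-live (TTL) field. Each router decrements the TTL, and a router that drops a packet whose TTL has reached zero typically returns an ICMP Time Exceeded message, which reveals its address. Sending probes with TTL = 1, 2, 3, and so on therefore uncovers the path hop by hop. The sketch below simulates that logic over a hypothetical, hard-coded path rather than sending real packets. The router addresses (from the 192.0.2.0/24 documentation range) and the silent hop are illustrative assumptions. The silent hop shows why routers that do not answer appear as gaps ("*") in a trace.

```python
from typing import List, Optional

# Hypothetical path from the probing host to a destination. None marks a router
# that silently drops expired probes instead of replying (e.g., due to filtering).
PATH: List[Optional[str]] = ["192.0.2.1", "192.0.2.17", None, "192.0.2.65", "192.0.2.200"]
DESTINATION = PATH[-1]

def send_probe(ttl: int) -> Optional[str]:
    """Simulate one probe with the given TTL; return the replying address or None."""
    for hop_number, router in enumerate(PATH, start=1):
        if router == DESTINATION:
            return router  # probe reached the destination, which answers directly
        if ttl - hop_number == 0:
            return router  # TTL expired at this hop: Time Exceeded (None if silent)
    return None

def traceroute(max_hops: int = 30) -> List[Optional[str]]:
    """Probe with TTL = 1, 2, 3, ... until the destination answers or max_hops is hit."""
    hops: List[Optional[str]] = []
    for ttl in range(1, max_hops + 1):
        reply = send_probe(ttl)
        hops.append(reply)
        if reply == DESTINATION:
            break
    return hops

for ttl, addr in enumerate(traceroute(), start=1):
    print(ttl, addr if addr is not None else "*")
```

The third line of output is "3 *". A mapping project that aggregates many such traces must infer what sits behind silent hops rather than observe it directly.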

How Akamai Maps the Net: An Industry Perspective

Today, everyone uses the Internet. Even if you don’t browse the Web, your computer, DVD player, and other appliances try to pull software and firmware updates without your interaction.

8 How Akamai Maps the Net FEATURE

to scale in ways traditional IT systems do not. Since it’s a truly distributed system, multiple components operate physically separate from each other, yet they are interdependent. Akamai Mapping tackles the need to map resources to one another across the network. Internally to Akamai, a map simply expresses how two or more groups are related. Akamai calculates thousands of maps continuously.

This article describes three major types of mapping that Akamai performs. The first and most common type is end-user request mapping. The second is mapping connections between two different points on the Internet through a third point. And the third is mapping the geographic location of a network address.

Figure 3: DNS Workflow.
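The redirection step in the Figure 3 workflow depends on DNS aliasing. The customer’s hostname is a CNAME that points into a name space the CDN controls, and the CDN’s DNS then answers with addresses chosen by its mapping system. The sketch below follows such an alias chain through a hypothetical in-memory table. The hostnames (under the reserved .example domain) and the addresses (from a documentation range) are invented. A real resolver would query name servers over the network instead of reading a dictionary.

```python
# Hypothetical DNS data: name -> ("CNAME", alias target) or ("A", [addresses]).
ZONE = {
    "www.customer.example": ("CNAME", "www.customer.example.cdn.example"),
    "www.customer.example.cdn.example": ("CNAME", "a1.map.cdn.example"),
    "a1.map.cdn.example": ("A", ["198.51.100.10", "198.51.100.11"]),
}

def resolve(name: str, max_depth: int = 8) -> tuple:
    """Follow CNAME aliases until an A record is found; return (alias chain, addresses)."""
    chain = [name]
    for _ in range(max_depth):
        rtype, value = ZONE[name]
        if rtype == "A":
            return chain, value
        name = value          # CNAME: continue resolution at the alias target
        chain.append(name)
    raise RuntimeError("CNAME chain too long (possible loop)")

chain, addrs = resolve("www.customer.example")
print(" -> ".join(chain))
print(addrs)
```

In a live system, the final answer would not be fixed. The last lookup is the point where request mapping selects addresses for the requester.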

End-user request mapping

The most heavily used map translates domain name service (DNS) requests for a resource to network addresses where the resource can be located. In other words, when someone requests some Web content, such as a download or a video, the response tells the user’s machine which Akamai server would provide the best connection. The location of the optimal server is calculated based on the structure of the Internet and how it is performing, the user’s location, where Akamai servers are located, and how much load exists on Akamai’s individual servers at the time. The basic workflow depicted in Figure 3 is as follows:

• A user types a website hostname such as www.nfl.com into their browser.
• The machine’s operating system uses DNS to look up the IP address for that website.
• The request is redirected to Akamai, transparent to the user, when the DNS name www.nfl.com is aliased to an Akamai hostname.
• Behind the scenes, Akamai maps the user to the datacenter that has the best performance when communicating to the user’s location. Akamai’s DNS responds to the user with a list of network addresses that will provide the best performance at that time.
• The user’s computer connects to the Akamai server address and downloads NFL content and videos from a nearby server, usually on the same ISP used to access the Internet.

To map requests to resources on the Internet, Akamai needs to know both the structure and the characteristics of the network between any two relevant points. The first step is to understand the structure of the Internet from the vantage points of Akamai’s servers. For purposes of mapping requests to resources, Akamai tries to develop objective observations of every possible path a particular end user could take in communicating with an Akamai server. If, for example, Akamai has servers located in three different datacenters on the end user’s ISP, it makes sense to direct

the structure and performance of the Internet. To simplify the computational complexity of the problem, Akamai uses the notion of a core point on the edge of the network. Multiple end users often come into the network through a particular piece of infrastructure that acts as a gateway for a group of network addresses. As an example, all users at a major corporation may be forwarded to headquarters before their internal network touches the Internet, or users connecting over DSL may all pass through a specific network node before their communications reach the public Internet. Akamai uses two pieces of information to map the topological structure of the network: BGP and traceroutes.

Border Gateway Protocol data

BGP, or Border Gateway Protocol, is a protocol used by routers to identify where IP addresses exist on the Internet.
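The core-point idea above can be sketched as two lookups. First, map a client address to the core point (gateway prefix) it sits behind, using longest-prefix match as a BGP-style routing table would. Second, answer with servers from the datacenter currently scored best for that core point. Everything here is a hypothetical illustration: the prefixes, core-point names, costs, and server addresses are invented. The single cost number stands in for whatever combination of performance and load measurements the real system uses.

```python
import ipaddress

# Hypothetical core points: each aggregates many end users behind one network prefix.
CORE_POINTS = {
    ipaddress.ip_network("203.0.113.0/24"): "cp-corporate-hq",
    ipaddress.ip_network("203.0.113.128/25"): "cp-dsl-aggregator",
    ipaddress.ip_network("198.51.100.0/24"): "cp-university",
}

# Hypothetical cost of serving each core point from each datacenter (lower is better).
COST = {
    "cp-corporate-hq": {"dc-east": 12.0, "dc-west": 48.0},
    "cp-dsl-aggregator": {"dc-east": 35.0, "dc-west": 9.0},
    "cp-university": {"dc-east": 20.0, "dc-west": 22.0},
}

SERVERS = {"dc-east": ["192.0.2.10", "192.0.2.11"], "dc-west": ["192.0.2.20"]}

def core_point_for(client_ip: str) -> str:
    """Longest-prefix match: the most specific prefix containing the address wins."""
    addr = ipaddress.ip_address(client_ip)
    matches = [net for net in CORE_POINTS if addr in net]
    if not matches:
        raise LookupError(f"no core point covers {client_ip}")
    return CORE_POINTS[max(matches, key=lambda net: net.prefixlen)]

def answer_dns(client_ip: str) -> list:
    """Return server addresses from the lowest-cost datacenter for the client's core point."""
    cp = core_point_for(client_ip)
    best_dc = min(COST[cp], key=COST[cp].get)
    return SERVERS[best_dc]

print(core_point_for("203.0.113.200"), answer_dns("203.0.113.200"))
print(core_point_for("203.0.113.5"), answer_dns("203.0.113.5"))
```

Aggregating clients by core point is what keeps the problem tractable. Costs are tracked per gateway prefix rather than per individual user address.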

BGP dictates which way networks will send traffic with a particular destination, and it is the underlying mechanism through which the macroscopic Internet maintains interconnectivity. When Akamai deploys to a network provider, it usually negotiates access to that network’s BGP data. The data is obtained through a passive peering session to receive an unaggregated view of the network from one or more of its routers. Understanding BGP data benefits the network because it allows Akamai

directions between two street addresses determine all the streets in between. Although traceroutes operate between two network locations, the request must be initiated from one of the locations, which causes some difficulty. Conducting traceroutes from just one or two network locations will not provide a good sense of the overall interconnectivity of the network, just as running MapQuest from Chicago to every major US city will not reveal any of the roads between, say, New

to Akamai datacenters and how well each datacenter can communicate to the core points. To do this, Akamai conducts specific measurements designed to measure the latency, loss, capacity, and overall availability of the connection. For this calculation, Akamai no longer cares about the structure of the network. Because Akamai tries to optimize the quality of delivery, which is dictated strictly by factors that impact the service protocol’s operation, the structure of the


Figure 4: The basic Akamai mapping problem is a bipartite graph optimization, as shown. End users, aggregated behind “Core Points” on the right, are mapped to the best Akamai datacenter, on the left, based on real-world network conditions.
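Figure 4 frames request mapping as a bipartite graph, with datacenters on one side, core points on the other, and a cost on every edge. The sketch below illustrates the “best and second-best datacenter per core point” step under hypothetical assumptions. Each edge cost is a weighted sum of measured latency and loss, plus a large penalty when a datacenter is unavailable. The weights, names, and measurements are illustrative. This per-core-point ranking also ignores the server load and storage limits that the text assigns to the later, region-local calculation.

```python
from dataclasses import dataclass

@dataclass
class EdgeMeasurement:
    latency_ms: float
    loss_pct: float
    available: bool = True

# Hypothetical weights: how much one percent of loss "costs" relative to a millisecond.
LOSS_WEIGHT = 50.0
UNAVAILABLE_PENALTY = 1e6

def edge_cost(m: EdgeMeasurement) -> float:
    """Combine availability, loss, and latency into one edge cost (lower is better)."""
    penalty = 0.0 if m.available else UNAVAILABLE_PENALTY
    return m.latency_ms + LOSS_WEIGHT * m.loss_pct + penalty

def rank_datacenters(measurements: dict) -> dict:
    """For each core point, return its (best, second-best) datacenters by edge cost."""
    ranking = {}
    for core_point, per_dc in measurements.items():
        ordered = sorted(per_dc, key=lambda dc: edge_cost(per_dc[dc]))
        ranking[core_point] = tuple(ordered[:2])
    return ranking

# Hypothetical edges of the bipartite graph: core point -> datacenter -> measurement.
MEASUREMENTS = {
    "cp-1": {"dc-a": EdgeMeasurement(20, 0.1), "dc-b": EdgeMeasurement(15, 1.0),
             "dc-c": EdgeMeasurement(40, 0.0)},
    "cp-2": {"dc-a": EdgeMeasurement(10, 0.0, available=False),
             "dc-b": EdgeMeasurement(30, 0.2), "dc-c": EdgeMeasurement(25, 0.0)},
}

print(rank_datacenters(MEASUREMENTS))
```

In this example, "dc-b" has the lowest latency for "cp-1" but loses on packet loss. "dc-a" is ruled out for "cp-2" by the availability penalty.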

Once Akamai measures the latency and loss of the network between each nearby datacenter and the core point, it treats the problem as a bipartite graph problem. As shown in Figure 4, the two groups on the graph are Akamai datacenters on the left and core points on the right. Between each node on the left and each node on the right, a cost is assigned based on the characteristics of the protocol to be optimized: for example, a combination of availability, loss, and latency specific to the application, including historically expected values and passive measurements from previously delivered traffic. The costs are optimized between servers and core points, allowing a high-level mapping of the best and second-best datacenter for a given block of end-user IP space represented by a core point.

It would be inefficient to randomly use servers within a given datacenter, and Akamai must also account for other factors, such as the load on a given server and the limits on storage and processing for each machine. For these reasons, Akamai conducts a secondary calculation local to regions of the Internet. Regions of the Internet are used because the previous calculations should have identified what subset of the Internet the core points and datacenters reside in. The use of smaller, region-specific calculations reduces the computational complexity and hence the turnaround time of recalculations.

This secondary calculation uses a bipartite graph model as well, but maps more information. It takes into account the optimal mappings of requests for particular applications between a core point and datacenters, information on existing load, expected load based on DNS responses already provided, and historic load variation. This calculation’s results are then overlaid with a consistent hashing technique.

For Web content, Akamai caches data on servers, and each server only has a limited storage capacity. Given that Akamai delivers tens of thousands of websites, it is inefficient to use every server indiscriminately for caching every site. Thus the notion of consistent hashing for this purpose was developed at MIT as part of the algorithms devised before Akamai’s inception. Each customer is associated with a “bucket” of content referenced via a hash index. In each datacenter, a minimum number of machines are dedicated to handling that hash to minimize the impact on overall storage resources in the datacenter. When load escalates, however, new servers are recruited consistently as needed and released consistently when load drops. Thus additional storage resources are used only when load escalates, and overall efficiency of scale exists even in a world of unpredictable supply and demand.

Consistent hashing means that the mapping processes, which run separately from the delivery servers, can operate without knowing how many “buckets” exist: hashes are consistently arrived at regardless of the size of the space being mapped using the hash.

Akamai uses variations of this mapping mechanism very heavily for a variety of purposes beyond just the mapping of end users to Akamai machines. For example, some Akamai customers host their static websites on distributed persistent storage facilities Akamai has deployed around the world. Mapping allows Akamai to load-balance and optimize the performance between multiple storage centers when an “edge” server needs to fetch content. Other edge servers forward requests through a hierarchy of machines, and the various levels of the hierarchy use maps to automatically identify the next node in the request chain. Live streaming feeds are replicated through the Internet to reduce packet loss, and the selection of replication nodes is done using a similar mapping mechanism to optimize reliable streaming media delivery. Akamai also allows customers to run Java applications on its platform; these applications have their own unique load-balancing parameters and timing considerations fed into their maps. Some of Akamai’s customers even use a similar

Figure 5: To map through intermediate nodes, Akamai calculates a more complex graph, where paths between Akamai servers, on the right, and some central infrastructure, on the left, are optimized by finding the best intermediate nodes to forward traffic through.


mapping mechanism to load-balance their own datacenters using their own combination of criteria, allowing them not only to load-balance but also to optimize performance and cost, and to define how they want failover scenarios to be handled.

Traffic on the US highway system is unpredictable, and it can change at any minute on any road. Sending a delivery by truck from one city in the US to another can be done over a variety of routes, but picking the best route is not always straightforward. To provide the best possible speed of delivery, the sender might look at MapQuest.com to determine an initial delivery path and then pick two or three alternate paths, using different highways and possibly different cities as intermediate stopping points to the destination. Multiple copies of the delivery can be sent over each path, and whichever arrives first is delivered. This concept of improving delivery speed by taking multiple, possibly faster, indirect routes is the same concept Akamai uses to speed long-haul communications on the Internet. In Akamai’s early days, some of the network engineers noticed that it was very difficult to reliably connect to some remote machines for manual diagnostic checks. For example, it was far easier to connect to machines in South Korea from another Akamai machine in Japan than directly from the US.

Mapping through intermediate nodes

This basic approach uses intermediates for optimizing performance and is a uniquely interesting variation on the traditional Akamai mapping problem: it optimizes the path between two points by using a third point as an intermediate. For this technique, the mapping model switches from being a bipartite graph to being a tripartite graph. (See Figure 5 for a visual representation.) As an example, consider Taleo, an Akamai customer that provides Software as a Service. Their Web application is secure and highly dynamic, and they primarily use Akamai to make the performance of the site consistently fast worldwide. Akamai’s technique of mapping through intermediate nodes helps them maintain performance without distributing their application and database infrastructure geographically.

End users make a request for Taleo’s application, and they connect to the best Akamai server as determined by the results of the previous mapping calculation. This Akamai server, however, may be located in Australia, and it needs to access Taleo’s central infrastructure, which is located on another continent. The three parts of the graph calculation for this mapping are Taleo’s central infrastructure, Akamai’s edge datacenter (in this case, the example in Australia), and a set of Akamai nodes that should be considered as intermediates to improve performance and reliability.

The set of intermediates is chosen using a calculation that relies on global BGP data. Since wide-area network communications are dictated by BGP, a direct connection will travel across the path BGP has in place already. BGP does not accommodate performance, however, and in many cases provides a suboptimal path across the Internet. The only way around this is to “trick” BGP by forwarding communications between different network addresses, causing the communications to take a different network path in the process. Akamai also looks at the possible nodes it can use as intermediates, chosen from its global population of datacenters, and picks nodes that are likely to provide a diversity of paths in comparison to the existing direct BGP path and that may also provide lower-latency communications between the two endpoints.

As part of its continuous mapping processes, Akamai measures the performance between its possible intermediate nodes and Taleo’s infrastructure, between the intermediate nodes and the Akamai edge datacenter, and between the endpoints directly. The basic principle is to choose the set of intermediates that provides the lowest overall latency for endpoint-to-endpoint communications, if possible. In many cases the performance of the direct path is fine, but in many other cases using an intermediate can provide dramatic performance improvements, sometimes more than two or three times faster than the direct route. Using intermediates can also bypass network routing problems, which BGP does not always react to. Since this technique does not rely on caching to operate, Akamai has been able to provide optimizations to both TCP traffic and raw IP communications. Each optimization has its own criteria, and these are all built into custom “flavors” of mapping.

Mapping data

In addition to mapping its services, Akamai conducts mapping on data. A good example is Akamai’s commercial IP geolocation service, which provides geolocation and other information for IP addresses. Because of the nature of the information desired, the data collected and processed differs from that used in other Akamai techniques. Akamai looks at all available sources of possible geolocation data for inference. The first step in geolocation mapping is to take into account the structure of the network. First, using its BGP feeds, Akamai effectively maps the breakdown of IP address blocks and the associated registry information, which is not very accurate in general but can be useful if no other information is available. Akamai also examines registries to determine further subdivisions in the network space implicit in how blocks have been registered and assigned. Akamai then performs traceroute queries, targeted at identifying the path to each portion of the blocks identified on the Internet and conducted over a variety of periodicities and different levels of coverage of IP space. DNS reverse lookups of IP addresses are added to the data available, sometimes indicating geographic information about the path identified through a traceroute to an IP address or about the IP address itself. The data is combined with manual information continuously entered by Akamai’s interactions with

Code Red Infections (observed activity, 07/19 to 07/21)

Graph showing the aggregate level of observed activity during the first few days of the Code Red virus outbreak in 2001. Akamai observed the surge in connection attempts from infected machines around the world.

The Next Wave, Vol. 18, No. 3, 2010

…users on the system with the networks its servers are deployed in, and from ongoing manual investigations of geolocation data. Lastly, Akamai leverages some passive TCP latency data extracted from real-world interactions between end-user IP addresses and its edge servers. The end result is derived by applying heuristics to all of these sources to determine an accurate geolocation product at both the country and city level.

Akamai is currently conducting research on using passively observed latency measurements in larger volume, with higher fidelity, and with greater rigor to improve the overall accuracy of the geolocation.

The geolocation service also provides two other interesting pieces of data about an address that are determined via alternate ways of "mapping" the network: throughput data and proxy data. Whenever Akamai delivers a piece of content to an end user, be it a picture from a news site, an antivirus patch, or a video downloaded from iTunes, Akamai records the amount of time taken to download the content. Using this passive observation, Akamai can very accurately model the throughput and connection speed available at that IP address on the network and provides the result. While not strictly a "map," this modeling is more a mapping of characteristics onto the IP address space, providing a greater level of detail.

Proxy data is also inferred using passively collected observations from the hundreds of billions of transactions Akamai serves each day. Specific to geolocation, Akamai looks for the presence of a particular HTTP header that is passed by well-configured proxies: X-Forwarded-For. This header indicates that the IP address issuing the HTTP request is doing so on behalf of another IP address. If the identified secondary IP address in the header is a public IP address, then the issuing IP is flagged as a proxy.

Mapping other characteristics of the network: Attacks, proxies, performance

To track the spread of certain viruses across the network, Akamai has deployed a "darknet" of servers in about […] different networks that passively observes attempts to connect to it. Using the darknet, Akamai can passively observe the spread of different types of virus or worm outbreaks. To heighten awareness of intrusions specifically targeting a specific organization, rather than the Internet as a whole, Akamai also models the baseline level of virus and worm intrusions as "background noise" that should not be perceived as targeted intrusion attacks.

Akamai is conducting research into determining more information about proxies as well. Beyond its current techniques, Akamai is looking into ways of modeling what appear to be proxies from other characteristics, such as an abnormal amount of traffic over time or the number of unique entities identified behind an IP address.

Akamai possesses other information that may be useful for passive analysis of the Internet's IP space. Through the sheer volume of normal Internet traffic Akamai delivers, estimated to be about […] percent of the Web, Akamai can model what parts of the Internet appear to have certain types of activity patterns. For example, some parts of the world may be active at certain times of day and may have software installed that automatically identifies the time zone of the user's machine or localized software in place. Some parts of the Internet may also have very specific interest in categories of content over time, such as online shopping, media, or music downloads. Akamai provides some aggregated views into this data, an example of which is shown in Figure 7. In this view, activity across about 100 different news-related websites is aggregated, allowing users to determine if a big news story has hit, or allowing a news website to determine if its traffic fluctuations are in line with the rest of the industry.

Akamai provides some customers with a specific view of what Akamai sees with respect to symptoms of activity on the network relative to what that customer sees. As an example of the type of data available, Akamai can measure if hourly performance to core points in a particular geography is changing as the result of a network-based or physical event, such as a natural disaster, and how it impacts performance. It can also be used to determine if a local view of what is coming in and out of network gateways, and the performance thereof, is similar to what's happening in the rest of the Internet or is just a localized issue.

Summary

The dynamic nature of Akamai's scalable and flexible distributed systems design relies heavily on, and benefits greatly from, the rigorous efforts invested in network mapping. Akamai's notion of network mapping is relatively broad, and it is crafted into several specific methods for real-time service operation or long-term data analysis. Akamai's network presence and access to traffic provide a unique vantage point for understanding the Internet and how it is operating; these examples provide a sampling of how Akamai takes advantage of this information for very specific purposes. Whatever shapes the Internet morphs into in the future, you can bet that Akamai will be present and will have new ways of mapping it.
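The two passively derived data items described above (proxy flags from the X-Forwarded-For header, and throughput from observed download times) can be illustrated with a short Python sketch. This is not Akamai's implementation. The helper names `is_flagged_proxy` and `throughput_by_ip` are hypothetical, and two further choices are assumptions: any globally routable address listed in the header counts as the "secondary IP," and a per-IP median summarizes the throughput. Only the standard `ipaddress` and `statistics` modules are used.

```python
import ipaddress
import statistics
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def is_flagged_proxy(xff_header: Optional[str]) -> bool:
    """Return True if the requesting IP should be flagged as a proxy.

    Assumption: flag the request when X-Forwarded-For lists at least one
    globally routable (public) address on whose behalf it was made.
    """
    if not xff_header:
        return False
    for token in xff_header.split(","):
        try:
            addr = ipaddress.ip_address(token.strip())
        except ValueError:
            continue  # skip tokens such as "unknown" that are not bare IP addresses
        if addr.is_global:
            return True
    return False


def throughput_by_ip(samples: Iterable[Tuple[str, int, float]]) -> dict:
    """Estimate throughput in bits/s per client IP from passive download records.

    Each sample is (client_ip, bytes_delivered, seconds_taken). The per-IP
    median is used as a simple outlier-resistant summary (an assumption).
    """
    rates = defaultdict(list)
    for ip, nbytes, seconds in samples:
        if seconds > 0:
            rates[ip].append(8 * nbytes / seconds)
    return {ip: statistics.median(r) for ip, r in rates.items()}


print(is_flagged_proxy("10.0.0.5, 8.8.4.4"))  # a public address is listed
print(is_flagged_proxy("192.168.1.20"))       # only a private address is listed
print(throughput_by_ip([
    ("198.51.100.9", 1_000_000, 1.0),
    ("198.51.100.9", 3_000_000, 2.0),
    ("198.51.100.9", 500_000, 0.25),
]))
```

A production system would also have to decide which hop in a multi-proxy chain to trust, because clients can forge this header. The sketch only shows the basic public-address test that the text describes.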


Figure 7: Snapshot of Akamai’s Net Usage Index for News sites, providing a view into the overall usage of about 100 of the Web’s news sites.

References

[1] M. Afergan, J. Wein, and A. LaMeyer, "Experience with some principles for building an internet-scale reliable system," WORLDS: Second Workshop on Real, Large Distributed Systems.

[2] D. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin, and R. Panigrahy, "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web."

Many of Akamai's technical publications can be found online at http://www.akamai.com/publications. Several of the described methods are patented.

Compressed Sensing & Network Monitoring

Reprinted with permission of IEEE. Originally published in IEEE Signal Processing Magazine (March) as J. Haupt, W. U. Bajwa, M. Rabbat, and R. Nowak, "Compressed Sensing for Networked Data." © IEEE

Introduction

Network monitoring and inference is an increasingly important component of intelligence gathering, from mapping the structure of the Internet to discovering clandestine social networks, as well as to information fusion in wireless sensor networks. Indeed, several international conferences are dedicated to the nascent field of network science. This article considers a particularly salient aspect of network science that revolves around large-scale distributed sources of data and their storage, transmission, and retrieval. The task of …

… These data could be files to be shared or simply scalar values corresponding to node attributes or sensor measurements. For the sake of this illustration, let us assume that each $x_j$ is a scalar quantity. Collectively, these data, $x = [x_1, \dots, x_n]^T$, arranged in a vector, are called networked data to emphasize both the distributed nature of the data and the fact that they may be shared over the underlying communications infrastructure of the network. The networked data vector may be very large; $n$ may be a thousand, a million, or more. Thus, even the process of gathering $x$ at a single point is daunting (requiring at least $n$ communications) …

16 Compressed Sensing and Network Monitoring FEATURE

FRGLQJ WHFKQLTXHV VXFK DV 6OHSLDQ:ROI FRGLQJ E\ WKH GHYLDQW QRGHV VHQGLQJ WKDW QRWLÀFDWLRQ FDQ EH XVHG WR GHVLJQ FRPSUHVVLRQ VFKHPHV ZLWKRXW 6HFRQG LI WKH QRPLQDO YDOXH ZHUH QRW NQRZQ EXW FROODERUDWLRQ EHWZHHQ QRGHV 6HH >@ DQG WKH WKH GHYLDQW FDVHV ZHUH DVVXPHG WR EH LVRODWHG WKHQ UHIHUHQFHV WKHUHLQ IRU DQ H[FHOOHQW RYHUYLHZ RI VXFK WKH QRGHV FRXOG VLPSO\ FRPSDUH WKHLU RZQ YDOXHV DSSURDFKHV 8QIRUWXQDWHO\ LQ PDQ\ DSSOLFDWLRQV WR WKRVH RI WKHLU QHDUHVW QHLJKERUV WR GHWHUPLQH SULRU NQRZOHGJH RI WKH SUHFLVH FRUUHODWLRQV LQ WKH QRPLQDO YDOXH DQG DQ\ GHYLDWLRQ RI WKHLU RZQ WKH GDWD LV XQDYDLODEOH PDNLQJ LW GLIÀFXOW RU $JDLQ QRWLÀFDWLRQV IURP WKH GHYLDQW QRGHV ZRXOG LPSRVVLEOH WR DSSO\ VXFK GLVWULEXWHG VRXUFH FRGLQJ SURYLGH WKH GHVLUHG FRPSUHVVLRQ 7KHUH LV D WKLUG WHFKQLTXHV 7KLV VLWXDWLRQ PRWLYDWHV FROODERUDWLYH PRUH JHQHUDO VFHQDULR LQ ZKLFK VXFK VLPSOH ORFDO LQQHWZRUN SURFHVVLQJ DQG FRPSUHVVLRQ LQ ZKLFK SURFHVVLQJ VFKHPHV FDQ EUHDN GRZQ 6XSSRVH WKDW XQNQRZQ FRUUHODWLRQV DQG GHSHQGHQFLHV EHWZHHQ WKH QRPLQDO YDOXH LV XQNQRZQ WR WKH QRGHV D SULRUL WKH QHWZRUNHG GDWD FDQ EH OHDUQHG DQG H[SORLWHG DQG WKDW WKH GHYLDQW FDVHV FRXOG EH LVRODWHG RU E\ H[FKDQJLQJ LQIRUPDWLRQ EHWZHHQ QHWZRUN FOXVWHUHG 6LQFH WKH GHYLDQW QRGHV PD\ EH FOXVWHUHG QRGHV +RZHYHU WKH GHVLJQ DQG LPSOHPHQWDWLRQ RI WRJHWKHU VLPSO\ FRPSDULQJ YDOXHV EHWZHHQ HIIHFWLYH FROODERUDWLYH SURFHVVLQJ DOJRULWKPV FDQ QHLJKERULQJ QRGHV PD\ QRW UHYHDO WKHP DOO DQG EH TXLWH FKDOOHQJLQJ VLQFH WKH\ WRR UHO\ RQ VRPH SHUKDSV QRW HYHQ WKH PDMRULW\ RI WKHP GHSHQGLQJ SULRU NQRZOHGJH RI WKH DQWLFLSDWHG FRUUHODWLRQV DQG RQ WKH H[WHQW RI FOXVWHULQJ ,QGHHG GLVWULEXWHG GHSHQG RQ VRPHZKDW VRSKLVWLFDWHG FRPPXQLFDWLRQV SURFHVVLQJ VFKHPHV LQ JHQHUDO DUH GLIÀFXOW WR GHVLJQ DQG QRGH SURFHVVLQJ FDSDELOLWLHV ZLWKRXW SULRU NQRZOHGJH RI WKH DQWLFLSDWHG UHODWLRQV 7KLV DUWLFOH GHVFULEHV D YHU\ GLIIHUHQW DSSURDFK DPRQJ GDWD DW QHLJKERULQJ QRGHV 7KLV VHUYHV DV D PRWLYDWLRQ IRU WKH WKHRU\ DQG PHWKRGV GLVFXVVHG WR WKH GHFHQWUDOL]HG FRPSUHVVLRQ RI 
QHWZRUNHG GDWD KHUH 6SHFLÀFDOO\ FRQVLGHU D FRPSUHVVLRQ RI WKH IRUP &RPSUHVVHG VHQVLQJ RIIHUV DQ DOWHUQDWLYH \ $[ ZKHUH $ ^$LM` LV D N Q ´VHQVLQJµ PD WUL[ ZLWK IDU IHZHU URZV WKDQ FROXPQV LH N << Q  PHDVXUHPHQW DSSURDFK WKDW GRHV QRW UHTXLUH DQ\ 7KH FRPSUHVVHG GDWD YHFWRU \ LV N  DQG WKHUH VSHFLÀF SULRU VLJQDO NQRZOHGJH DQG LV DQ HIIHFWLYH IRUH LV PXFK HDVLHU WR VWRUH WUDQVPLW DQG UHWULHYH DQG HIÀFLHQW VWUDWHJ\ LQ HDFK RI WKH VLWXDWLRQV FRPSDUHG WR WKH XQFRPSUHVVHG QHWZRUNHG GDWD [ GHVFULEHG DERYH 7KH YDOXHV RI DOO QRGHV FDQ 7KH WKHRU\ RI FRPSUHVVHG VHQVLQJ JXDUDQWHHV WKDW EH UHFRYHUHG IURP WKH FRPSUHVVHG GDWD \ $[ IRU FHUWDLQ PDWULFHV $ ZKLFK DUH QRQDGDSWLYH DQG SURYLGHG LWV VL]H N LV SURSRUWLRQDO WR WKH QXPEHU RI RIWHQ TXLWH XQVWUXFWXUHG [ FDQ EH DFFXUDWHO\ UHFRY GHYLDQW QRGHV $V ZH VKDOO VHH \ FDQ EH HIÀFLHQWO\ HUHG IURP \ ZKHQHYHU [ LWVHOI LV FRPSUHVVLEOH LQ FRPSXWHG LQ D GLVWULEXWHG PDQQHU DQG E\ YLUWXH RI LWV VRPH GRPDLQ HJIUHTXHQF\ ZDYHOHWWLPH >@²>@ VPDOO VL]H LW LV QDWXUDOO\ HDV\ WR VWRUH DQG WUDQVPLW ,Q IDFW LQ FHUWDLQ ZLUHOHVV QHWZRUN DSSOLFDWLRQV VHH 7R FDUU\ WKH LOOXVWUDWLRQ IXUWKHU DQG WR :LUHOHVV 6HQVRU 1HWZRUNV LQ WKH 1HWZRUNHG 'DWD PRWLYDWH WKH DSSURDFKHV SURSRVHG LQ WKLV DUWLFOH &RPSUHVVLRQ LQ $FWLRQ VHFWLRQ RI WKLV DUWLFOH IRU OHW XV ORRN DW D YHU\ FRQFUHWH H[DPSOH 6XSSRVH GHWDLOV  \ FDQ EH FRPSXWHG LQ WKH DLU LWVHOI UDWKHU WKDW PRVW RI WKH QHWZRUN QRGHV KDYH WKH VDPH WKDQ LQ VLOLFRQ 7KXV FRPSUHVVHG VHQVLQJ RIIHUV QRPLQDO GDWD YDOXH EXW WKH IHZ UHPDLQLQJ QRGHV WZR KLJKO\ GHVLUDEOH IHDWXUHV IRU QHWZRUNHG GDWD KDYH GLIIHUHQW YDOXHV )RU LQVWDQFH WKH YDOXHV FRXOG DQDO\VLV 7KH PHWKRG LV GHFHQWUDOL]HG PHDQLQJ WKDW FRUUHVSRQG WR VHFXULW\ VWDWLVWLFV RU VHQVRU UHDGLQJV GLVWULEXWHG GDWD FDQ EH HQFRGHG ZLWKRXW D FHQWUDO DW HDFK QRGH 7KH QHWZRUNHG GDWD YHFWRU LQ WKLV FDVH FRQWUROOHU DQG XQLYHUVDO LQ WKH VHQVH WKDW VDPSOLQJ LV PRVWO\ FRQVWDQW H[FHSW IRU D IHZ GHYLDWLRQV LQ GRHV QRW UHTXLUH D SULRUL NQRZOHGJH RU DVVXPSWLRQV FHUWDLQ ORFDWLRQV 7KLV PLQRULW\ PD\ EH 
RI PRVW DERXW WKH GDWD )RU WKHVH UHDVRQV WKH DGYDQWDJHV RI LQWHUHVW LQ VHFXULW\ RU VHQVLQJ DSSOLFDWLRQV &OHDUO\ FRPSUHVVHG VHQVLQJ KDYH DOUHDG\ FDXJKW RQ LQ WKH [ LV TXLWH FRPSUHVVLEOH WKH QRPLQDO YDOXH SOXV WKH UHVHDUFK FRPPXQLW\ DV HYLGHQFHG E\ VHYHUDO UHFHQW ORFDWLRQV DQG YDOXHV RI WKH IHZ GHYLDQW FDVHV VXIÀFH ZRUNV >@²>@ IRU LWV VSHFLÀFDWLRQ &RQVLGHU D IHZ SRVVLEOH VLWXDWLRQV LQ WKLV Compressed sensing basics QHWZRUNHG GDWD FRPSUHVVLRQ SUREOHP )LUVW LI 7KH WKHRU\ RI FRPSUHVVHG VHQVLQJ &6 WKH QRPLQDO YDOXH ZHUH NQRZQ WR DOO QRGHV WKHQ H[WHQGV WUDGLWLRQDO VHQVLQJ DQG VDPSOLQJ V\VWHPV WR WKH GHVLUHG FRPSUHVVLRQ LV DFFRPSOLVKHG VLPSO\ D PXFK EURDGHU FODVV RI VLJQDOV $FFRUGLQJ WR &6

The Next Wave „ Vol 18 No 3 „ 2010 17 WKHRU\ DQ\ VXIÀFLHQWO\ FRPSUHVVLEOH VLJQDO FDQ EH DUH YHU\ SURPLVLQJ EHFDXVH DW OHDVW P SLHFHV RI DFFXUDWHO\ UHFRYHUHG IURP D VPDOO QXPEHU RI QRQ LQIRUPDWLRQ WKH ORFDWLRQ DQG DPSOLWXGH RI HDFK DGDSWLYH UDQGRPL]HG OLQHDU SURMHFWLRQ VDPSOHV )RU QRQ]HUR HQWU\ DUH JHQHUDOO\ UHTXLUHG WR GHVFULEH H[DPSOH VXSSRVH WKDW [ D RQ LV PVSDUVH LH LW KDV DQ\ PVSDUVH VLJQDO DQG &6 LV DQ HIIHFWLYH ZD\ WR QR PRUH WKDQ P QRQ]HUR HQWULHV ZKHUH P LV PXFK REWDLQ WKLV LQIRUPDWLRQ LQ D VLPSOH QRQDGDSWLYH VPDOOHU WKDQ WKH VLJQDO OHQJWK Q 6SDUVH YHFWRUV DUH PDQQHU 7KH QH[W IHZ VXEVHFWLRQV H[SODLQ LQ VRPH YHU\ FRPSUHVVLEOH VLQFH WKH\ FDQ EH FRPSOHWHO\ GHWDLO KRZ WKLV LV DFFRPSOLVKHG GHVFULEHG E\ WKH ORFDWLRQV DQG DPSOLWXGHV RI WKH Compressed sensing for networked data QRQ]HUR HQWULHV 5DWKHU WKDQ VDPSOLQJ HDFK HOHPHQW RI [ &6 GLUHFWV XV WR ÀUVW SUHFRQGLWLRQ WKH VLJQDO E\ 7R LOOXVWUDWH WKH &6 UDQGRP SURMHFWLRQ HQFRG RSHUDWLQJ RQ LW ZLWK D GLYHUVLI\LQJ PDWUL[ \LHOGLQJ LQJ DQG UHFRQVWUXFWLRQ LGHDV FRQVLGHU WKH VLPSOH D VLJQDO ZKRVH HQWULHV DUH PL[WXUHV RI WKH QRQ]HUR UHFRQVWUXFWLRQ H[DPSOH )LJXUH   6XSSRVH WKDW HQWULHV RI WKH RULJLQDO VLJQDO 7KH UHVXOWLQJ VLJQDO LV LQ D QHWZRUN RI Q VHQVRUV RQO\ RQH RI WKH VHQVRUV WKHQ VDPSOHG N WLPHV WR REWDLQ D ORZGLPHQVLRQDO LV REVHUYLQJ VRPH SRVLWLYH YDOXH ZKLOH WKH UHVW RI YHFWRU RI REVHUYDWLRQV 2YHUDOO WKH DFTXLVLWLRQ WKH VHQVRUV REVHUYH ]HUR 7KH JRDO LV WR LGHQWLI\ SURFHVV FDQ EH GHVFULEHG E\ WKH REVHUYDWLRQ ZKLFK VHQVRU PHDVXUHV WKH QRQ]HUR YDOXH XVLQJ D PRGHO \ $[  Ǿ  ZKHUH WKH PDWUL[ $ LV D N PLQLPXP QXPEHU RI REVHUYDWLRQV &RQVLGHU PDN Q &6 PDWUL[ WKDW GHVFULEHV WKH MRLQW RSHUDWLRQV RI LQJ UDQGRP SURMHFWLRQ REVHUYDWLRQV RI WKH GDWD SUHFRQGLWLRQLQJ DQG VXEVDPSOLQJ DQG Ǿ UHSUHVHQWV ZKHUH HDFK REVHUYDWLRQ LV WKH SURMHFWLRQ RI WKH VHQ HUURUV GXH WR QRLVH RU RWKHU SHUWXUEDWLRQV 7KH PDLQ VRU UHDGLQJV RQWR D UDQGRP YHFWRU KDYLQJ HQWULHV “ UHVXOWV RI &6 WKHRU\ KDYH HVWDEOLVKHG WKDW LI WKH HDFK ZLWK SUREDELOLW\  7KH YDOXH RI 
HDFK REVHU QXPEHU RI &6 VDPSOHV LV D VPDOO LQWHJHU PXOWLSOH YDWLRQ DORQJ ZLWK NQRZOHGJH RI WKH UDQGRP YHFWRU JUHDWHU WKDQ WKH QXPEHU RI QRQ]HUR HQWULHV LQ [ RQWR ZKLFK WKH GDWD ZDV SURMHFWHG FDQ EH XVHG WR WKHQ WKHVH VDPSOHV VXIÀFLHQWO\ ´HQFRGHµ WKH VDOLHQW LGHQWLI\ D VHW RI DERXW Q K\SRWKHVLV VHQVRUV WKDW LQIRUPDWLRQ LQ WKH VSDUVH VLJQDO DQG DQ DFFXUDWH DUH FRQVLVWHQW ZLWK WKDW SDUWLFXODU REVHUYDWLRQ 7KH UHFRQVWUXFWLRQ IURP \ LV SRVVLEOH 7KHVH UHVXOWV HVWLPDWH RI WKH DQRPDORXV VHQVRU JLYHQ N REVHUYD WLRQV LV VLPSO\ WKH LQWHUVHFWLRQ RI WKH K\SRWKHVLV Consistent with all Random vector Hypotheses prior observations VHWV FRQVLVWHQW ZLWK HDFK RI WKH N REVHUYDWLRQV ,W LV HDV\ WR VHH WKDW RQ DYHUDJH DERXW ORJ Q REVHUYD WLRQV DUH UHTXLUHG EHIRUH WKH FRUUHFW XQLTXH VHQ ,((( VRU LV LGHQWLÀHG 'HÀQH WKH  TXDVLQRUP __ ]__  WR

‹ EH HTXDO WR WKH QXPEHU RI QRQ]HUR HQWULHV LQ WKH Networked data YHFWRU ] 7KHQ WKLV VLPSOH SURFHGXUH FDQ EH WKRXJKW RI DV WKH VROXWLRQ RI WKH RSWLPL]DWLRQ SUREOHP

DUJ P]LQ__ ]__  VXEMHFW WR \ $]  Encoding requirements 6XSSRVH WKDW IRU VRPH REVHUYDWLRQ PDWUL[ $ WKHUH LV D QRQ]HUR PVSDUVH VLJQDO [ VXFK WKDW WKH REVHUYDWLRQV \ $[  5HFRYHU\ RI [ LV LPSRV VLEOH LQ WKLV VHWWLQJ VLQFH WKH REVHUYDWLRQV SURYLGH Figure 1: $ VLPSOH UHFRQVWUXFWLRQ H[DPSOH RQ D QHWZRUN RI Q  QRGHV 2QH QR LQIRUPDWLRQ DERXW WKH VSHFLÀF VLJQDO EHLQJ RE GLVWLQJXLVKHG VHQVRU REVHUYHV D SRVLWLYH YDOXH ZKLOH WKH UHPDLQLQJ Q ï  REVHUYH ]HUR VHUYHG 0DWULFHV WKDW DUH UHVLOLHQW WR VXFK DPELJXL 7KH WDVN LV WR LGHQWLI\ ZKLFK VHQVRU LV GLIIHUHQW E\ XVLQJ DV IHZ REVHUYDWLRQV DV SRVVLEOH WLHV DUH WKRVH WKDW VDWLVI\ WKH 5HVWULFWHG ,VRPHWU\ ,Q WKH &6 DSSURDFK WKH GDWD DUH SURMHFWHG RQWR UDQGRP YHFWRUV VXFK DV WKRVH GHSLFWHG 3URSHUW\ 5,3 >@ >@ >@ (VVHQWLDOO\ D N Q LQ WKH VHFRQG FROXPQ ZKHUH QRGHV LQGLFDWHG LQ EODFN PXOWLSO\ WKHLU GDWD YDOXH E\ ï Q DQG WKRVH LQ ZKLWH E\   7KH WKLUG FROXPQ VKRZV WKDW DERXW Q K\SRWKHVLV VHQVRUV DUH VHQVLQJ PDWUL[ $ ZLWK XQLWQRUPHG URZV LH ȸM   FRQVLVWHQW ZLWK HDFK UDQGRP SURMHFWLRQ REVHUYDWLRQ EXW WKH QXPEHU RI K\SRWKHVHV WKDW DUH $LM  IRU L N LV VDLG WR VDWLVI\ D 5,3 RI VLPXOWDQHRXVO\ FRQVLVWHQW ZLWK DOO REVHUYDWLRQV VKRZQ LQ WKH IRXUWK FROXPQ GHFUHDVHV RUGHU V ZKHQHYHU $[  Ⱦ N [  Q KROGV VLPXO H[SRQHQWLDOO\ ZLWK WKH QXPEHU RI REVHUYDWLRQV 7KH UDQGRP SURMHFWLRQ REVHUYDWLRQV DUH __ __  __ __  Q DSSUR[LPDWHO\ SHUIRUPLQJ ELQDU\ ELVHFWLRQV RI WKH K\SRWKHVLV VSDFH DQG RQO\ DERXW ORJ Q WDQHRXVO\ IRU DOO VVSDUVH YHFWRUV [ D R  7KH 5,3 LV REVHUYDWLRQV DUH QHHGHG WR GHWHUPLQH ZKLFK VHQVRU UHDGV WKH QRQ]HUR YDOXH VRQDPHG EHFDXVH LW GHVFULEHV PDWULFHV WKDW LPSRVH

18 Compressed Sensing and Network Monitoring FEATURE

  QHDULVRPHWU\ DSSUR[LPDWH OHQJWK SUHVHUYDWLRQ DUJ P]LQ ^__ \$]__  ƪ__]__ `  RQ D UHVWULFWHG VHW RI VXEVSDFHV WKH VXEVSDFHV RI VVSDUVH YHFWRUV  ,Q VLPSOHU WHUPV D PDWUL[ VDWLV DV SURSRVHG LQ >@ IRU DSSURSULDWHO\ FKRVHQ

ÀHV 5,3 LI DQG RQO\ LI YHFWRUV WKDW DUH VXIÀFLHQWO\ UHJXODUL]DWLRQ FRQVWDQWV ƪ DQG ƪ WKDW HDFK VSDUVH DUH QRW LQ WKH QXOO VSDFH RI WKH PDWUL[ GHSHQG RQ WKH QRLVH YDULDQFH ,Q HLWKHU FDVH WKH  ,Q SUDFWLFH VHQVLQJ PDWULFHV WKDW VDWLVI\ WKH UHFRQVWUXFWLRQ HUURU E >__ [Ø[__   Q@ GHFD\V DW D UDWH RI 5,3 DUH HDV\ WR JHQHUDWH ,W KDV EHHQ HVWDEOLVKHG WKDW P ORJ QN  ,Q SUDFWLFH WKH RSWLPL]DWLRQ  FDQ EH N Q PDWULFHV ZKRVH HQWULHV DUH LQGHSHQGHQW DQG VROYHG E\ D OLQHDU SURJUDP ZKLOH  LV RIWHQ VROYHG LGHQWLFDOO\ GLVWULEXWHG UHDOL]DWLRQV RI FHUWDLQ ]HUR E\ FRQYH[ UHOD[DWLRQ³UHSODFLQJ WKH  SHQDOW\ ZLWK PHDQ UDQGRP YDULDEOHV ZLWK YDULDQFH Q VDWLVI\ WKH  SHQDOW\ 7KH DSSHDO RI &6 LV UHDGLO\ DSSDUHQW D 5,3 ZLWK YHU\ KLJK SUREDELOLW\ ZKHQ N • FRQVW à IURP WKH HUURU UDWH ZKLFK LJQRULQJ WKH ORJDULWKPLF ORJ Qà P >@ >@ >@ 3K\VLFDO OLPLWDWLRQV RI UHDO IDFWRU LV SURSRUWLRQDO WR PN WKH YDULDQFH RI DQ VHQVLQJ V\VWHPV PRWLYDWH WKH XQLWQRUP UHVWULF HVWLPDWRU RI P SDUDPHWHUV IURP N REVHUYDWLRQV ,Q WLRQ RQ WKH URZV RI $ ZKLFK HVVHQWLDOO\ OLPLWV WKH RWKHU ZRUGV &6 LV DEOH WR ERWK LGHQWLI\ WKH ORFDWLRQV DPRXQW RI ´VDPSOLQJ HQHUJ\µ DOORWWHG WR HDFK RE DQG HVWLPDWH WKH DPSOLWXGHV RI WKH QRQ]HUR HQWULHV VHUYDWLRQ ZLWKRXW DQ\ VSHFLÀF SULRU NQRZOHGJH DERXW WKH Decoding: Algorithms and bounds VLJQDO H[FHSW WKH DVVXPSWLRQ RI VSDUVLW\ &RPSUHVVHG VHQVLQJ LV D IRUP RI VXEVDPSOLQJ Transform domain sparsity VR DOLDVLQJ LV SUHVHQW DQG QHHGV WR EH DFFRXQWHG IRU 6XSSRVH WKH REVHUYHG VLJQDO [ LV QRW VSDUVH LQ WKH UHFRQVWUXFWLRQ SURFHVV 7KH VDPH FRPSUHVVHG EXW LQVWHDG D VXLWDEO\ WUDQVIRUPHG YHUVLRQ LV GDWD FRXOG EH JHQHUDWHG E\ PDQ\ QGLPHQVLRQDO 6SHFLÀFDOO\ OHW 7 EH D WUDQVIRUPDWLRQ PDWUL[ DQG YHFWRUV EXW WKH 5,3 LPSOLHV WKDW RQO\ RQH RI WKHVH DVVXPH WKDW Ƨ 7[ LV VSDUVH 7KH &6 REVHUYDWLRQV LV VSDUVH 7KLV PLJKW VHHP WR UHTXLUH WKDW DQ\ FDQ EH ZULWWHQ DV \ $[ $7ïƧ ,I $ LV D UDQGRP UHFRQVWUXFWLRQ DOJRULWKP PXVW H[KDXVWLYHO\ VHDUFK &6 PDWUL[ VDWLVI\LQJ WKH 5,3 WKHQ LQ PDQ\ FDVHV DOO VSDUVH YHFWRUV EXW IRUWXQDWHO\ WKH 
SURFHVV LV VR LV WKH SURGXFW PDWUL[ $7ï >@ &RQVHTXHQWO\ PXFK PRUH WUDFWDEOH *LYHQ D YHFWRU RI QRLVHIUHH WKH &6 REVHUYDWLRQ SURFHVV GRHV QRW UHTXLUH SULRU REVHUYDWLRQV \ $[ WKH XQNQRZQ PVSDUVH VLJQDO NQRZOHGJH RI WKH GRPDLQ LQ ZKLFK WKH GDWD DUH [ FDQ EH UHFRYHUHG H[DFWO\ DV WKH XQLTXH VROXWLRQ WR FRPSUHVVLEOH 7KH VSDUVH YHFWRU Ƨ DQG KHQFH [ FDQ EH DFFXUDWHO\ UHFRYHUHG IURP \ XVLQJ WKH DUJ PLQ ]  VXEMHFW WR \ $]  ] __ __ UHFRQVWUXFWLRQ WHFKQLTXHV GHVFULEHG DERYH )RU Q H[DPSOH LQ WKH QRLVHOHVV VHWWLQJ RQH FDQ VROYH ZKHUH __ ]__  ȸ L  _]L_ GHQRWHV WKH QRUP SURYLGHG $ VDWLVÀHV 5,3 RI RUGHU P >@ 7KH  ƧØ DUJ PLQ ]  VXEMHFW WR \ $7 ]  UHFRYHU\ SURFHGXUH FDQ EH FDVW DV D OLQHDU SURJUDP ] __ __ VR VROXWLRQ PHWKRGV DUH WUDFWDEOH HYHQ ZKHQ Q LV YHU\ ODUJH WR REWDLQ DQ H[DFW UHFRQVWUXFWLRQ RI WKH WUDQVIRUP FRHIÀFLHQWV RI [ 1RWH WKDW ZKLOH WKH VDPSOHV GR &RPSUHVVHG VHQVLQJ UHPDLQV TXLWH HIIHFWLYH QRW UHTXLUH VHOHFWLRQ RI DQ DSSURSULDWH VSDUVLI\LQJ HYHQ ZKHQ WKH VDPSOHV DUH FRUUXSWHG E\ DGGLWLYH WUDQVIRUP WKH UHFRQVWUXFWLRQ GRHV QRLVH ZKLFK LV LPSRUWDQW IURP D SUDFWLFDO SRLQW RI YLHZ VLQFH DQ\ UHDO V\VWHP ZLOO EH VXEMHFWHG 2IWHQ VLJQDOV RI LQWHUHVW ZLOO QRW EH H[DFWO\ WR PHDVXUHPHQW LQDFFXUDFLHV $ YDULHW\ RI VSDUVH EXW LQVWHDG PRVW RI WKH HQHUJ\ LV FRQFHQWUDWHG UHFRQVWUXFWLRQ PHWKRGV KDYH EHHQ SURSRVHG WR RQ D UHODWLYHO\ VPDOO VHW RI HQWULHV ZKLOH WKH UHFRYHU DQ DSSUR[LPDWLRQ RI [ ZKHQ REVHUYDWLRQV UHPDLQLQJ HQWULHV DUH YHU\ VPDOO 7KH GHJUHH RI DUH FRUUXSWHG E\ QRLVH )RU H[DPSOH HVWLPDWHV Ø[ HIIHFWLYH VSDUVLW\ RI VXFK VLJQDOV FDQ EH TXDQWLÀHG FDQ EH REWDLQHG DV WKH VROXWLRQV RI HLWKHU ZLWK UHVSHFW WR D JLYHQ EDVLV )RUPDOO\ IRU D VLJQDO [ OHW [V EH WKH DSSUR[LPDWLRQ RI [ IRUPHG E\ UH 7  V DUJ P]LQ __ ]__ VXEMHFW WR __ $ \$] __ ’ ” ƪ  WDLQLQJ WKH FRHIÀFLHQWV KDYLQJ ODUJHVW PDJQLWXGH LQ WKH WUDQVIRUPHG UHSUHVHQWDWLRQ Ƨ 7[ 7KHQ [ ZKHUH __ ]__ ’ PD[L Q _]L_ >@ RU WKH SHQDOL]HG LV FDOOHG ƠFRPSUHVVLEOH LI WKH DSSUR[LPDWLRQ HUURU OHDVW VTXDUHV PLQLPL]DWLRQ REH\V

The Next Wave „ Vol 18 No 3 „ 2010 19 ,((( ‹

Figure 2: 6SDUVLI\LQJ WUDQVIRUPDWLRQ WHFKQLTXHV GHSHQG RQ QHWZRUN WRSRORJLHV 7KH VPRRWKO\ YDU\LQJ ÀHOG LQ D LV PRQLWRUHG E\ D QHWZRUN RI ZLUHOHVV VHQVRUV GHSOR\HG XQLIRUPO\ RYHU WKH UHJLRQ DQG VWDQGDUG WUDQVIRUP WHFKQLTXHV FDQ EH XVHG WR VSDUVLI\ WKH QHWZRUNHG GDWD )RU PRUH DEVWUDFW WRSRORJLHV JUDSK ZDYHOHWV FDQ EH HIIHFWLYH ,Q E  WKH JUDSK +DDU ZDYHOHW FRHIÀFLHQW DW WKH ORFDWLRQ RI WKH EODFN QRGH DQG VFDOH WKUHH LV JLYHQ E\ WKH GLIIHUHQFH RI WKH DYHUDJH GDWD YDOXHV DW WKH QRGHV LQ WKH UHG DQG EOXH UHJLRQV D E

V  [[  VXFK DV LQ )LJXUH  D  WKHQ VSDUVLI\LQJ WUDQVIRUPV __ __ ” FRQVW ‡ VƠ  Q PD\ EH UHDGLO\ ERUURZHG IURP WUDGLWLRQDO VLJQDO IRU VRPH Ơ Ơ [7 !  7KLV PRGHO GHVFULEHV SURFHVVLQJ ,Q WKHVH VHWWLQJV WKH VHQVRU ORFDWLRQV IRU H[DPSOH VLJQDOV ZKRVH RUGHUHG WUDQVIRUPHG FDQ EH YLHZHG DV VDPSOLQJ ORFDWLRQV DQG WRROV OLNH FRHIÀFLHQW DPSOLWXGHV H[KLELW SRZHUODZ GHFD\ WKH GLVFUHWH )RXULHU WUDQVIRUP ')7 RU GLVFUHWH 6XFK EHKDYLRU LV DVVRFLDWHG ZLWK LPDJHV WKDW DUH ZDYHOHW WUDQVIRUP ':7 PD\ EH XVHG WR VSDUVLI\ VPRRWK RU KDYH ERXQGHG YDULDWLRQ >@ >@ DQG LV WKH QHWZRUNHG GDWD ,Q PRUH JHQHUDO VHWWLQJV RIWHQ REVHUYHG LQ WKH ZDYHOHW FRHIÀFLHQWV RI QDWXUDO ZDYHOHW WHFKQLTXHV FDQ EH H[WHQGHG WR DOVR KDQGOH LPDJHV ,Q WKLV VHWWLQJ &6 UHFRQVWUXFWLRQ WHFKQLTXHV QRQXQLIRUP GLVWULEXWLRQ RI VHQVRUV >@ FDQ DJDLQ EH DSSOLHG WR REWDLQ DQ HVWLPDWH RI WKH Graph wavelets WUDQVIRUPHG FRHIÀFLHQWV GLUHFWO\ )RU H[DPSOH WKH 6WDQGDUG VLJQDO WUDQVIRUPV FDQQRW EH DSSOLHG VROXWLRQV RI RSWLPL]DWLRQV DQDORJRXV WR  DQG  LQ PRUH JHQHUDO VLWXDWLRQV )RU H[DPSOH PDQ\ \LHOG HVWLPDWHV ZKRVH HVWLPDWLRQ HUURU GHFD\V DW WKH QHWZRUN PRQLWRULQJ DSSOLFDWLRQV UHO\ RQ WKH DQDO\VLV ƠƠ UDWH ORJ QN  TXDQWLI\LQJ WKH VLPXOWDQHRXV RI WUDIÀF OHYHOV DW WKH QHWZRUN QRGHV &KDQJHV LQ EDODQFLQJ RI WKH HUURUV GXH WR DSSUR[LPDWLRQ DQG WKH EHKDYLRU RI WUDIÀF OHYHOV FDQ EH LQGLFDWLYH RI HVWLPDWLRQ >@ 7KLV UHVXOW JXDUDQWHHV WKDW HYHQ YDULDWLRQV LQ QHWZRUN XVDJH FRPSRQHQW IDLOXUHV RU ZKHQ VLJQDOV DUH RQO\ DSSUR[LPDWHO\ VSDUVH PDOLFLRXV DFWLYLWLHV 7KHUH DUH VWURQJ FRUUHODWLRQV FRQVLVWHQW HVWLPDWLRQ LV VWLOO SRVVLEOH EHWZHHQ WUDIÀF OHYHOV DW GLIIHUHQW QRGHV EXW WKH Sparsifying networked data WRSRORJ\ DQG URXWLQJ DIIHFW WKH QDWXUH RI WKHVH UHODWLRQVKLSV LQ FRPSOH[ ZD\V *UDSK ZDYHOHWV &RPSUHVVHG VHQVLQJ FDQ EH YHU\ HIIHFWLYH GHYHORSHG ZLWK WKHVH FKDOOHQJHV LQ PLQG DGDSW WKH ZKHQ [ LV VSDUVH RU KLJKO\ FRPSUHVVLEOH LQ D FHUWDLQ GHVLJQ SULQFLSOHV RI WKH ':7 WR DUELWUDU\ QHWZRUNHG EDVLV RU GLFWLRQDU\ %XW ZKLOH WUDQVIRUPEDVHG GDWD >@ FRPSUHVVLRQ LV 
ZHOOGHYHORSHG LQ WUDGLWLRQDO VLJQDO 7R XQGHUVWDQG JUDSK ZDYHOHWV LW LV XVHIXO DQG LPDJH SURFHVVLQJ GRPDLQV WKH XQGHUVWDQGLQJ WR ÀUVW FRQVLGHU WKH +DDU ZDYHOHW WUDQVIRUP RI VSDUVLI\LQJFRPSUHVVLQJ EDVHV IRU QHWZRUNHG ZKLFK LV WKH VLPSOHVW IRUP RI ':7 7KH +DDU GDWD LV IDU IURP FRPSOHWH 7KHUH DUH KRZHYHU D ZDYHOHW FRHIÀFLHQWV DUH HVVHQWLDOO\ REWDLQHG DV IHZ SURPLVLQJ QHZ DSSURDFKHV WR WKH GHVLJQ RI GLJLWDO GLIIHUHQFHV RI WKH GDWD DW GLIIHUHQW VFDOHV RI WUDQVIRUPV IRU QHWZRUNHG GDWD VRPH RI ZKLFK DUH DJJUHJDWLRQ 7KH FRHIÀFLHQWV DW WKH ÀUVW VFDOH DUH GHVFULEHG EHORZ GLIIHUHQFHV EHWZHHQ QHLJKERULQJ GDWD SRLQWV DQG Spatial compression WKRVH DW VXEVHTXHQW VSDWLDO VFDOHV DUH FRPSXWHG 6XSSRVH D ZLUHOHVV VHQVRU QHWZRUN LV E\ ÀUVW DJJUHJDWLQJ GDWD LQ QHLJKERUKRRGV G\DGLF GHSOR\HG WR PRQLWRU D FHUWDLQ VSDWLDOO\YDU\LQJ LQWHUYDOV LQ RQH GLPHQVLRQ DQG VTXDUH UHJLRQV LQ SKHQRPHQRQ VXFK DV WHPSHUDWXUH OLJKW RU WZR GLPHQVLRQV DQG WKHQ FRPSXWLQJ GLIIHUHQFHV PRLVWXUH 7KH SK\VLFDO ÀHOG EHLQJ PHDVXUHG FDQ EHWZHHQ QHLJKERULQJ DJJUHJDWLRQV EH YLHZHG DV D VLJQDO RU LPDJH ZLWK D GHJUHH RI *UDSK ZDYHOHWV DUH D JHQHUDOL]DWLRQ RI WKLV VSDWLDO FRUUHODWLRQ RU VPRRWKQHVV ,I WKH VHQVRUV FRQVWUXFWLRQ ZKHUH WKH QXPEHU RI KRSV EHWZHHQ DUH JHRJUDSKLFDOO\ SODFHG LQ D XQLIRUP IDVKLRQ QRGHV LQ D QHWZRUN SURYLGHV D QDWXUDO GLVWDQFH

20 Compressed Sensing and Network Monitoring FEATURE

PHDVXUH WKDW FDQ EH XVHG WR GHÀQH QHLJKERUKRRGV VSHFLÀF DSSURDFK GHVFULEHG ,((( 7KH VL]H RI HDFK QHLJKERUKRRG ZLWK UDGLXV GHÀQHG EHORZ LV PRWLYDWHG E\ PDQ\ ‹ E\ WKH QXPEHU RI KRSV SURYLGHV D QDWXUDO PHDVXUH ZLUHOHVV VHQVRU QHWZRUNLQJ RI VFDOH ZLWK VPDOOHU VL]HV FRUUHVSRQGLQJ WR ÀQHU DSSOLFDWLRQV LQ ZKLFK H[SOLFLW VSDWLDO DQDO\VLV RI WKH QHWZRUNHG GDWD *UDSK URXWLQJ LQIRUPDWLRQ LV GLIÀFXOW ZDYHOHW FRHIÀFLHQWV DUH WKHQ GHÀQHG E\ DJJUHJDWLQJ WR REWDLQ DQG PDLQWDLQ LQ GDWD DW GLIIHUHQW VFDOHV DQG FRPSXWLQJ GLIIHUHQFHV UHDO WLPH ,Q WKLV VHWWLQJ EHWZHHQ DJJUHJDWHG GDWD DV VKRZQ LQ )LJXUH  E  VHQVRUV FRQWULEXWH WKHLU )XUWKHU GHWDLOV DQG JHQHUDOL]DWLRQV RI WKLV FDQ EH PHDVXUHPHQWV LQ D MRLQW IDVKLRQ E\ VLPXOWDQHRXV ZLUHOHVV IRXQG LQ >@ D /RFDO WUDIÀF UDWHV WUDQVPLVVLRQV WR D GLVWDQW Diffusion wavelets SURFHVVLQJ ORFDWLRQ DQG WKH 150 'LIIXVLRQ ZDYHOHWV SURYLGH DQ DOWHUQDWLYH Original REVHUYDWLRQV DUH DFFXPXODWHG Transformed DSSURDFKWRFRQVWUXFWLQJDPXOWLVFDOHUHSUHVHQWDWLRQ DQG SURFHVVHG DW WKDW VLQJOH IRU QHWZRUNHG GDWD 8QOLNH JUDSK ZDYHOHWV ZKLFK GHVWLQDWLRQ SRLQW 100 SURGXFH DQ RYHUFRPSOHWH GLFWLRQDU\ GLIIXVLRQ Compressed sensing for ZDYHOHWV SURGXFH DQ RUWKRQRUPDO EDVLV WDLORUHG WR Magnitude 50 D VSHFLÀF QHWZRUN E\ DQDO\]LQJ HLJHQYHFWRUV RI D networked data storage GLIIXVLRQ PDWUL[ GHULYHG IURP WKH QHWZRUN DGMDFHQF\ and retrieval 0 PDWUL[ KHQFH WKH QDPH ´GLIIXVLRQ ZDYHOHWVµ  7KH ,Q JHQHUDO PXOWLKRS 0 100 200 300 400 500 UHVXOWLQJ EDVLV YHFWRUV DUH JHQHUDOO\ ORFDOL]HG WR QHWZRUNV WZR VLPSOH VWHSV FDQ Coefficient Index (Sorted) QHLJKERUKRRGV RI YDU\LQJ VL]H DQG PD\ DOVR OHDG EH XVHG IRU WKH GHFHQWUDOL]HG E 2UGHUHG &RHIÀFLHQWV WR D VSDUVLI\LQJ UHSUHVHQWDWLRQ RI QHWZRUNHG GDWD FRPSXWDWLRQ DQG GLVWULEXWLRQ $ WKRURXJK WUHDWPHQW RI WKLV WRSLF FDQ EH IRXQG LQ RI HDFK &6 REVHUYDWLRQ RI WKH Q Figure 3: >@ IRUP \L ȸM  $LM [M L N $Q LOOXVWUDWLRQ RI WKH 2QH H[DPSOH RI VSDUVLÀFDWLRQ XVLQJ GLIIXVLRQ 6WHS  (DFK RI WKH Q VHQVRUV M Q ORFDOO\ FRPSUHVVLELOLW\ RI VSDWLDOO\ ZDYHOHWV LV VKRZQ LQ 
)LJXUH  ZKHUH WKH QRGH FRPSXWHV WKH WHUP $ [ E\ PXOWLSO\LQJ LWV GDWD ZLWK FRUUHODWHG QHWZRUNHG GDWD GDWD FRUUHVSRQG WR WUDIÀF UDWHV WKURXJK URXWHUV LM M XVLQJ GLIIXVLRQ ZDYHOHWV WKH FRUUHVSRQGLQJ HOHPHQW RI WKH VHQVLQJ PDWUL[ 7KH DFWXDO QHWZRUNHG GDWD LQ D FRPSXWHU QHWZRUN 7KHUH DUH VHYHUDO KLJKO\ 7KH VHQVLQJ PDWUL[ FDQ EH JHQHUDWHG LQ D GLVWULEXWHG VKRZQ LQ D DUH QRW VSDUVH ORFDOL]HG UHJLRQV RI DFWLYLW\ ZKLOH PRVW RI WKH IDVKLRQ E\ OHWWLQJ HDFK QRGH ORFDOO\ JHQHUDWH D EXW FDQ EH UHSUHVHQWHG ZLWK UHPDLQLQJ QHWZRUN H[KLELWV RQO\ PRGHUDWH OHYHOV RI D VPDOO QXPEHU RI GLIIXVLRQ UHDOL]DWLRQ RI $LM XVLQJ D SVHXGRUDQGRP QXPEHU WUDIÀF 7KH WUDIÀF GDWD DUH VSDUVHO\ UHSUHVHQWHG LQ ZDYHOHW FRHIÀFLHQWV DV JHQHUDWRU VHHGHG ZLWK LWV LGHQWLÀHU ,Q WKLV H[DPSOH VHHQ LQ E  WKH GLIIXVLRQ ZDYHOHW EDVLV DQG D VPDOO QXPEHU RI WKH LQWHJHUV M Q VHUYH DV WKH LGHQWLÀHUV FRHIÀFLHQWV FDQ SURYLGH DQ DFFXUDWH HVWLPDWH RI WKH *LYHQ WKH LGHQWLÀHUV RI WKH QRGHV WKH GHVWLQDWLRQ DFWXDO WUDIÀF SDWWHUQV QRGH V FDQ DOVR HDVLO\ JHQHUDWH WKH UDQGRP YHFWRUV N Networked data compression ^$LM`L  IRU HDFK VHQVRU M Q in action 6WHS  7KH ORFDO WHUPV $LM [M DUH VLPXOWDQHRXVO\ 7KLV VHFWLRQ GHVFULEHV WZR WHFKQLTXHV IRU DJJUHJDWHG DQG GLVWULEXWHG DFURVV WKH QHWZRUN XVLQJ REWDLQLQJ SURMHFWLRQV RI QHWZRUNHG GDWD RQWR UDQGRPL]HG JRVVLS ZKLFK LV D VLPSOH LWHUDWLYH JHQHUDO YHFWRUV ZKLFK FDQ EH WKRXJKW RI DV WKH GHFHQWUDOL]HG DOJRULWKP IRU FRPSXWLQJ OLQHDU Q URZV RI WKH VHQVLQJ PDWUL[ $ 7KH ÀUVW DSSURDFK IXQFWLRQV VXFK DV \L ȸM  $LM [M VHH )LJXUH   GHVFULEHG EHORZ DVVXPHV WKDW WKH QHWZRUN LV DQ\ 1RWH WKDW JRVVLS DOJRULWKPV DUH KLJKO\ UHVLOLHQW WR JHQHUDO PXOWLKRS QHWZRUN 7KLV PRGHO FRXOG QRGH IDLOXUHV EHFDXVH L HDFK QRGH RQO\ H[FKDQJHV H[SODLQ IRU H[DPSOH ZLUHOHVV VHQVRU QHWZRUNV LQIRUPDWLRQ ZLWK LWV LPPHGLDWH QHLJKERUV DQG LL ZLUHG ORFDO DUHD QHWZRUNV RU HYHQ SRUWLRQV RI WKH ZKHQ WKH\ WHUPLQDWH WKH YDOXH RI \L LV DYDLODEOH DW ,QWHUQHW ,Q WKH PXOWLKRS VHWWLQJ WKH SURMHFWLRQV FDQ HYHU\ QRGH LQ WKH QHWZRUN EH FRPSXWHG DQG 
GHOLYHUHG WR HYHU\ VXEVHW RI QRGHV 6LQFH WKH DERYH SURFHGXUH HQVXUHV WKDW WKH LQ WKH QHWZRUN XVLQJ JRVVLSFRQVHQVXV WHFKQLTXHV QHWZRUNHG GDWD SURMHFWLRQV DUH NQRZQ DW HYHU\ RU WKH\ PLJKW EH GHOLYHUHG WR D VLQJOH SRLQW XVLQJ QRGH D XVHU FDQ TXHU\ DQ\ QRGH LQ WKH QHWZRUN DQG FOXVWHULQJ DQG DJJUHJDWLRQ 7KH VHFRQG PRUH FRPSXWH Ø[ YLD RQH RI WKH UHFRQVWUXFWLRQ PHWKRGV

The Next Wave „ Vol 18 No 3 „ 2010 21 Figure 4: 5DQGRPL]HG JRVVLS D GHSLFWV RQH LWHUDWLRQ LQ ZKLFK WKH FRORU RI D QRGH FRUUHVSRQGV WR LWV ORFDO YDOXH 7R EHJLQ WKH QHWZRUN LV LQLWLDOL]HG WR D

VWDWH ZKHUH HDFK QRGH KDV D YDOXH [L   L Q 7KHQ LQ DQ LWHUDWLYH DV\QFKURQRXV IDVKLRQ D UDQGRP QRGH D ´DFWLYDWHVµ DQG FKRRVHV RQH RI LWV QHLJKERUV E DW UDQGRP 7KH WZR QRGHV WKHQ ´JRVVLSµ E\ H[FKDQJLQJ WKHLU YDOXHV

[D W DQG [E W  RU LQ WKH &6 VHWWLQJ WKH YDOXHV PXOWLSOLHG D E\ SVHXGRUDQGRP QXPEHUV DQG SHUIRUP WKH XSGDWH [D W   [E W   @ [D W  [E W  ZKLOH WKH GDWD DW DOO WKH RWKHU QRGHV UHPDLQV XQFKDQJHG ,Q E  ZH KDYH DQ H[DPSOH QHWZRUN RI  QRGHV ZLWK L UDQGRP LQLWLDO YDOXHV OHIW  LL DIWHU HDFK QRGH KDV FRPPXQLFDWHG ÀYH WLPHV ZLWK HDFK RI LWV QHLJKERUV PLGGOH  DQG LLL DIWHU HDFK QRGH KDV FRPPXQLFDWHG  WLPHV ZLWK HDFK RI LWV QHLJKERUV ULJKW  ,W FDQ EH VKRZQ WKDW IRU WKLV VLPSOH

SURFHGXUH [L W FRQYHUJHV WR WKH DYHUDJH RI WKH LQLWLDO Q YDOXHV Q ȸ M  [M   DW HYHU\ QRGH LQ WKH QHWZRUN DV W WHQGV WR LQÀQLW\

E ‹ ,(((

RXWOLQHG LQ WKH &RPSUHVVHG 6HQVLQJ %DVLFV VHFWLRQ VHQVRU QHWZRUNV >@²>@ 7KH SURSRVHG GLVWULEXWHG )XUWKHU WKLV FDQ EH TXLWH DQ HIÀFLHQW SURFHGXUH LQ FRPPXQLFDWLRQ DUFKLWHFWXUH³LQWURGXFHG LQ >@ >@ PDQ\ VFHQDULRV )RU H[DPSOH LQ QHWZRUNV ZLWK DQG UHÀQHG LQ >@³UHTXLUHV RQO\ RQH QHWZRUN SRZHUODZ GHJUHH GLVWULEXWLRQV VXFK DV WKH ,QWHUQHW WUDQVPLVVLRQ SHU UDQGRP SURMHFWLRQ DQG LV EDVHG DQ RSWLPL]HG JRVVLS DOJRULWKP XVHV RQ WKH RUGHU RI RQ WKH QRWLRQ RI VRFDOOHG ´PDWFKHG VRXUFHFKDQQHO NQ WUDQVPLVVLRQV WR FRPSXWH N &6 REVHUYDWLRQV >@ FRPPXQLFDWLRQµ >@ >@ +HUH WKH &6 SURMHFWLRQ JHQHUDOO\ N << Q 6R WKLV LV PXFK PRUH HIÀFLHQW WKDQ REVHUYDWLRQV DUH VLPXOWDQHRXVO\ FDOFXODWHG E\ WKH H[KDXVWLYHO\ H[FKDQJLQJ UDZ GDWD YDOXHV ZKLFK VXSHUSRVLWLRQ RI UDGLR ZDYHV DQG FRPPXQLFDWHG  ZRXOG WDNH DERXW Q WUDQVPLVVLRQV 2I FRXUVH LI XVLQJ DPSOLWXGHPRGXODWHG FRKHUHQW WUDQVPLVVLRQV WKH ORFDWLRQ RI WKH QRGH WR EH TXHULHG LV À[HG D RI UDQGRPO\ ZHLJKWHG VHQVHG YDOXHV GLUHFWO\ IURP SULRUL³DQG LI WKH QHWZRUN SURYLGHV UHOLDEOH URXWLQJ WKH VHQVRU QRGHV WR WKH )& YLD WKH DLU LQWHUIDFH VHUYLFH³WKHQ LW PD\ EH PRUH HIÀFLHQW WR UHSODFH $OJRULWKPLFDOO\ VHQVRU QRGHV VHTXHQWLDOO\ SHUIRUP JRVVLS FRPSXWDWLRQ ZLWK DJJUHJDWLRQ XS D VSDQQLQJ WKH IROORZLQJ VWHSV LQ RUGHU WR FRPPXQLFDWH N WUHH RU DURXQG D F\FOH )RU PRUH RQ XVLQJ JRVVLS UDQGRP SURMHFWLRQV RI WKH QHWZRUNHG GDWD WR WKH )& DOJRULWKPV WR FRPSXWHGLVWULEXWH FRPSUHVVHG GDWD LQ PXOWLKRS QHWZRUNV VHH >@ 6WHS  (DFK RI WKH Q VHQVRUV ORFDOO\ GUDZV N HOHPHQWV RI WKH UDQGRP SURMHFWLRQ YHFWRUV Compressed sensing in wireless sensor ^$ `N E\ XVLQJ LWV QHWZRUN DGGUHVV DV WKH VHHG networks LM L  RI D SVHXGRUDQGRP QXPEHU JHQHUDWRU *LYHQ WKH $ W\SLFDO ZLUHOHVV VHQVRU QHWZRUN DV LQ )LJXUH QHWZRUN DGGUHVVHV RI WKH QRGHV WKH )& FDQ DOVR  FRQVLVWV RI D ODUJH QXPEHU RI VPDOO LQH[SHQVLYH NQ HDVLO\ UHFRQVWUXFW WKH UDQGRP YHFWRUV ^$LM`LM  ZLUHOHVV VHQVRUV VSDWLDOO\ GLVWULEXWHG RYHU D UHJLRQ 6WHS  7KH VHQVRU DW ORFDWLRQ M PXOWLSOLHV LWV RI LQWHUHVW WKDW FDQ VHQVH WKH SK\VLFDO 
HQYLURQPHQW PHDVXUHPHQW [ ZLWK ^$ `N WR REWDLQ D NWXSOH LQ D YDULHW\ RI PRGDOLWLHV 7KH HVVHQWLDO WDVN LQ M LM L  PDQ\ DSSOLFDWLRQV RI VHQVRU QHWZRUNV LV WR H[WUDFW Y $ [ $ [ 7 M Q  VRPH UHOHYDQW LQIRUPDWLRQ IURP GLVWULEXWHG GDWD DQG M  M M NM M WKHQ ZLUHOHVVO\ GHOLYHU LW WR D GLVWDQW GHVWLQDWLRQ FDOOHG WKH IXVLRQ FHQWHU )&  :KLOH WKLV WDVN FDQ EH DQG DOO WKH QRGHV FRKHUHQWO\ WUDQVPLW WKHLU UHVSHFWLYH DFFRPSOLVKHG LQ D QXPEHU RI ZD\V RQH SDUWLFXODUO\ YM·V LQ DQ DQDORJ IDVKLRQ RYHU WKH QHWZRUNWR)& DWWUDFWLYH WHFKQLTXH FRUUHVSRQGV WR GHOLYHULQJ DLU LQWHUIDFH XVLQJ N WUDQVPLVVLRQV %HFDXVH RI WKH UDQGRP SURMHFWLRQV RI WKH VHQVRU QHWZRUNHG GDWD DGGLWLYH QDWXUH RI UDGLR ZDYHV WKH FRUUHVSRQGLQJ WR WKH )& E\ H[SORLWLQJ UHFHQW UHVXOWV RQ XQFRGHG UHFHLYHG VLJQDO DW WKH )& DW WKH HQG RI WKH NWK DQDORJ FRKHUHQW WUDQVPLVVLRQ VFKHPHV LQ ZLUHOHVV WUDQVPLVVLRQ LV JLYHQ E\

22 Compressed Sensing and Network Monitoring FEATURE

$$y = \sum_{j=1}^{n} v_j + w = Ax + w,$$

where w is the noise generated by the communication receiver circuitry of the FC. The steps above correspond to a completely decentralized way of delivering k random projections of the networked data to the FC by employing k network transmissions. The final estimate $\hat{x}$ can be computed at the FC via any of the methods outlined earlier.

As noted earlier, the main advantage of using this approach for computing random projections is that it can be implemented without any complex routing information and, as a result, might be a more suitable and scalable option in many sensor networking applications.

Figure 5: An illustration of a wireless sensor network and fusion center. A number of sensor nodes monitor the river water for various forms of contamination and periodically report their findings over the air to the fusion center. CS projection observations are obtained by each sensor transmitting a sinusoid with amplitude given by the product of the sensor measurement and a pseudorandom weight. When the transmissions arrive in phase at the fusion center, the amplitude of the resulting received waveform is the sum of the component wave amplitudes. (© IEEE)

Conclusions and extensions

This article has described how compressed sensing techniques could be utilized to reconstruct sparse or compressible networked data in a variety of practical settings, including general multihop networks and wireless sensor networks. Compressed sensing provides two key features, universal sampling and decentralized encoding, making it a promising new paradigm for networked data analysis. The focus here was primarily on managing resources during the encoding process, but it is important to note that the decoding step also poses a significant challenge. Indeed, the study of efficient decoding algorithms remains at the forefront of current research [?]–[?]. In addition, specialized measurement matrices, such as those resulting from Toeplitz-structured matrices [?] and the incoherent basis sampling methods described in [?], lead to significant reductions in the complexity of convex decoding methods. Fortunately, the sampling matrices inherent to these methods can be easily implemented using the network projection approaches described above. For example, Toeplitz-structured CS matrices naturally result when each node uses the same random number generation scheme and seed value, with each node advancing its own random sequence by its unique integer identifier at initialization. Similarly, random samples from any orthonormal basis (the observation model described in [?]) can easily be obtained in the settings described above if each node is preloaded with its weights for each basis element in the corresponding orthonormal transformation matrix. For each observation, the requesting node or fusion center broadcasts a random integer between 1 and n to the nodes to specify which transform coefficient from the predetermined basis should be obtained, and the projection is delivered using any suitable method described above.

Finally, it is worth noting that matrices satisfying the RIP also approximately preserve additional geometrical structure on subspaces of sparse vectors, such as angles and inner products, as shown in [?]. A useful consequence of this result is that an ensemble of CS observations can be "data mined" for events of interest [?], [?]. For example, consider a network whose data may contain an anomaly that originated at one of m candidate nodes. An ensemble of CS observations of the networked data, collected without any a priori information about the anomaly, can be analyzed "postmortem" to accurately determine which candidate node was the likely source of the anomaly. Such extensions of CS theory suggest efficient and scalable techniques for monitoring large-scale distributed networks, many of which can be performed without the computational burden of reconstructing the complete networked data.
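The following sketch shows how shared-seed weights and over-the-air superposition can produce the observations y = Ax + w without any routing. It is a minimal simulation built on several assumptions. Python's `random.Random` stands in for the shared generator. The weights are ±1. The helper names `node_weights` and `collect_projections` are hypothetical. Node j discards its first j draws, which is one reading of "advances its own random sequence by its unique integer identifier." Under that reading, A[i][j] depends only on i + j, so the matrix has constant anti-diagonals and becomes Toeplitz once the node order is reversed.

```python
import random


def node_weights(seed, node_id, k):
    """Return node `node_id`'s k pseudorandom +/-1 weights (hypothetical helper).

    All nodes share the same generator and seed; each node first discards
    `node_id` draws, i.e. it advances the shared sequence by its own ID.
    """
    rng = random.Random(seed)
    for _ in range(node_id):
        rng.random()  # skip ahead by the node's integer identifier
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(k)]


def collect_projections(x, k, seed=2010, noise_std=0.0, noise_seed=0):
    """Simulate k over-the-air transmissions y = A x + w at the fusion center.

    x[j] is node j's measurement. In transmission i, node j scales its
    measurement by its i-th weight; phase-aligned arrivals add up, so the
    receiver observes their sum plus Gaussian receiver noise w_i.
    """
    n = len(x)
    columns = [node_weights(seed, j, k) for j in range(n)]  # column j lives on node j
    receiver = random.Random(noise_seed)
    y = []
    for i in range(k):
        superposed = sum(columns[j][i] * x[j] for j in range(n))
        y.append(superposed + receiver.gauss(0.0, noise_std))
    A = [[columns[j][i] for j in range(n)] for i in range(k)]  # k x n matrix
    return A, y
```

Consider a 16-node field with two nonzero readings. Calling `A, y = collect_projections(x, k=6)` returns a 6 × 16 matrix and six noiseless observations, and A[i][j] equals A[i+1][j-1] everywhere. The sketch does not recover $\hat{x}$ from y. That step would use one of the decoding methods discussed above.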

The Next Wave, Vol. 18, No. 3, 2010

References

S. S. Pradhan, J. Kusuma, and K. Ramchandran. Distributed compression in a dense microsensor network. IEEE Signal Processing Mag., March.
E. J. Candès and T. Tao. Decoding by linear programming. IEEE Trans. Inform. Theory, December.
D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, April.
J. Haupt and R. Nowak. Signal reconstruction from noisy random projections. IEEE Trans. Inform. Theory, September.
E. Candès and T. Tao. The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics, December.
W. U. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak. Compressive wireless sensing. In Proc. IPSN, Nashville, TN, April.


M. Rabbat, J. Haupt, A. Singh, and R. Nowak. Decentralized compression and predistribution via randomized gossiping. In Proc. IPSN, Nashville, TN, April.
W. U. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak. A universal matched source-channel communication scheme for wireless sensor ensembles. In Proc. ICASSP, Toulouse, France, May.
D. Baron, M. B. Wakin, M. F. Duarte, S. Sarvotham, and R. G. Baraniuk. Distributed compressed sensing. Preprint. [Online]. Available: http://www.ece.rice.edu/~drorb/pdf/DCS.pdf
W. Wang, M. Garofalakis, and K. Ramchandran. Distributed sparse random projections for refinable approximation. In Proc. IPSN, Cambridge, MA, April.
E. J. Candès and T. Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inform. Theory, December.
E. J. Candès. The restricted isometry property and its implications for compressed sensing. In C. R. Acad. Sci. Ser. I, Paris.
R. Baraniuk, M. Davenport, R. A. DeVore, and M. B. Wakin. A simple proof of the restricted isometry property for random matrices. Constructive Approximation, December.
R. Wagner, R. Baraniuk, S. Du, D. Johnson, and A. Cohen. An architecture for distributed wavelet analysis and processing in sensor networks. In Proc. IPSN, Nashville, TN, April.
M. Crovella and E. Kolaczyk. Graph wavelets for spatial traffic analysis. In Proc. IEEE Infocom, March.
R. Coifman and M. Maggioni. Diffusion wavelets. Applied Computational and Harmonic Analysis, July.
S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Randomized gossip algorithms. IEEE Trans. Inform. Theory, June.
M. Gastpar and M. Vetterli. Source-channel communication in sensor networks. In Proc. […]
K. Liu and A. M. Sayeed. Optimal distributed detection strategies for wireless sensor networks. In Proc. Annual Allerton Conference on Communication, Control, and Computing, October.
M. Gastpar and M. Vetterli. Power, spatio-temporal bandwidth, and distortion in large sensor networks. IEEE Journal Select. Areas Commun., April.
W. U. Bajwa, A. M. Sayeed, and R. Nowak. Matched source-channel communication for field estimation in wireless sensor networks. In Proc. IPSN, Los Angeles, CA, April.
W. U. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak. Joint source-channel communication for distributed estimation in sensor networks. IEEE Trans. Inform. Theory, October.
A. C. Gilbert and J. Tropp. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inform. Theory, December.
M. Figueiredo, R. Nowak, and S. Wright. Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems. IEEE Journal Select. Topics in Signal Processing.
S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky. An interior point method for large-scale ℓ1-regularized least squares. IEEE Journal Select. Topics in Signal Processing.
W. Bajwa, J. Haupt, G. Raz, and R. Nowak. Toeplitz-structured compressed sensing matrices. In Proc. SSP, Madison, WI, August.
E. Candès and J. Romberg. Sparsity and incoherence in compressive sampling. Inverse Problems.
J. Haupt and R. Nowak. A generalized restricted isometry property. University of Wisconsin–Madison Tech. Rep. ECE-[?], May.
J. Haupt and R. Nowak. Compressive sampling for signal detection. In Proc. ICASSP, Honolulu, HI, April.
J. Haupt, R. Castro, R. Nowak, G. Fudge, and A. […] Grove, CA, October.

Revealing Social Networks of Spammers

Spam doesn't really need an introduction: anyone who owns an email address likely receives spam emails every day. However, spam is much more than just an annoyance. Spam's hidden economic cost for companies in wasted storage, bandwidth, technical support, and, most important, the loss of employee productivity, is astronomical. The annual cost of spam for a company with 12,000 employees is approximately $2.4 million, according to a study conducted by Windows & .NET Magazine in 2003 [1]. Since then, the amount of spam received has only increased. According to estimates from MessageLabs, over 80 percent of emails received from 2005 to 2008 were spam [2].

26 Revealing Social Networks of Spammers FEATURE


The magnitude of the spam problem has not gone unnoticed by the US government. In 2003, the United States government drafted the Controlling the Assault of Non-Solicited Pornography and Marketing (CAN-SPAM) Act to address the issue. CAN-SPAM provided guidelines on unsolicited email practices and specified how unsolicited email could be sent legally. Unfortunately, compliance has been extremely low; therefore, the act has had virtually no effect on lowering the volume of spam. On the other hand, CAN-SPAM allowed Internet service providers (ISPs) and web site owners to file lawsuits against spammers, resulting in fines and occasional jail sentences for convicted spammers. Lawsuits are certainly a way to fight back against spammers. Given the vast number of spammers, however, suing an individual has a negligible effect on reducing the overall volume of spam, especially when lawsuits are brought regardless of the impact of the offense. Unfortunately, spammers have responded by taking greater measures to conceal their identities to avoid being detected. Clearly, other mechanisms are necessary to combat spam effectively.

One type of spam that represents a significant threat to individuals and companies alike is phishing spam. Phishing is an attempt to fraudulently acquire sensitive information by appearing to represent a trustworthy entity. Phishing spam often takes the form of emails appearing to be from a trusted financial institution with which the recipient does business. These emails are written to persuade the recipient to reveal confidential information such as online banking passwords, credit card numbers, or a social security number. Many victims of identity theft have been fooled into revealing sensitive information by phishing emails.

Current methods to combat spam before it reaches a user include content-based filtering at the recipient's email server as well as blacklisting email servers known to send only spam emails. Both measures reduce the annoyance of spam and the loss of employee productivity by decreasing the number of spam emails arriving in employee inboxes. However, these strategies can also backfire. For example, content-based filtering has the unintended side effect of misclassifying legitimate email as spam. Furthermore, filtering does nothing to reduce the volume of spam that is sent. When spammers know that a smaller percentage of emails are getting past the spam filters to the intended recipients, they might compensate by sending more spam emails. Thus, content-based filtering may even increase the volume of spam sent.

Email servers that send only spam can be blacklisted to filter out all emails sent from them. Blacklisting differs from content-based filtering in that the filtering is done on email servers instead of on individual emails. Blacklisting is a more efficient filtering approach. Its disadvantage is that many email servers send both legitimate email and spam, so blacklisting such a server would result in legitimate emails being misclassified as spam.

Current antispam methods share one common weakness: they are local. That is, they detect and filter out spam at a single location, which is the recipient's email server. Local antispam solutions are easy to maintain because a single administrator (usually the information technology group of the company or ISP) manages the process. But what could an analyst discern by examining how spam operates on a greater network level?

In this article, we investigate the spam problem using a global approach, which requires detection and monitoring of an entire network or at multiple locations within a network. By taking a global approach, an analyst can correlate data over multiple email servers, times, and locations to infer the behavior of spammers on a large scale. These inferences can then be used to combat spam nearer to its source.

The best defense spammers have against antispam techniques is to send spam emails without being detected. So how do they do this? Consider the path of spam illustrated in Figure 1. First, a spammer acquires email addresses on a web page using a harvester, which is a piece of software designed to visit web sites and extract email addresses from the HTML source code. Next, spam servers send emails to the acquired addresses. These can be servers that belong to the spammers, or they can be zombie computers (computers compromised by viruses or other malware that end up sending spam without their owners' knowledge). Finally, these spam emails make their way to the recipients' inbox or junk mail folder.

Figure 1: The path of spam from an email address on a web page to your inbox (diagram elements: Web page, Harvester, Spammer, Spam server, Recipient)

The address acquisition process, known as harvesting, is an often overlooked part of the spam problem. Malicious spammers typically take measures to conceal their identities when sending spam. One common method is to use massive networks of compromised computers known as botnets. However, studies have indicated that spammers do not take comparable precautions when harvesting [3], perhaps because harvesting is seen as a safer and more acceptable activity than sending spam. Hence, monitoring harvesting activity and tracking harvesters can be useful for identifying spammers. This is one of the goals of Project Honey Pot, created by antispam company Unspam Technologies, Inc. [4].

Project Honey Pot

Project Honey Pot was started in [?] to monitor harvesting and spamming activity via a network of decoy web pages set with trap email addresses, known as honey pots. These honey pots are embedded in the HTML source code of a web page and are invisible to human visitors. Harvesters looking for email addresses in HTML source code sometimes stumble across the trap addresses and acquire them. Harvesters can also be directed to trap addresses by links to honey pots from legitimate web sites that they also scan for email addresses.

Each time a honey pot is visited, the centralized Project Honey Pot server generates a unique trap email address. The visitor's IP address is associated with the trap email address and then recorded on the server. The email address embedded in the honey pot is unique, so only the visitor to that honey pot could have collected it. Because these trap email addresses are not published anywhere besides the honey pot, all emails received at these addresses are assumed to be spam.

Project Honey Pot provides a unique opportunity to investigate the social structure of spammers. It is normally very difficult to uncover anything at the spammer level because we cannot associate a spam email with a particular spammer. The "from" address can be easily spoofed, and spam served from a compromised computer has little association with the spammer. With Project Honey Pot, each spam email is associated with the harvester that acquired the recipient's email address. When spammers fail to conceal their identities while harvesting, the IP address of the harvester is likely to be closely related to the actual location of the spammer. Because each email received at a trap email address is associated with the harvester that acquired it, the identity of the spammer is revealed.

As of March [?], Project Honey Pot comprised over [?] million honey pots distributed all over the world [4]. The data collected by Project Honey Pot provides a global perspective on spam and makes it possible to investigate correlations over many spam servers and time periods.

Discovering communities of spammers

As mentioned earlier, understanding the behavior of spammers on an expanded scale is one of the benefits of a global approach for fighting spam. But what do the social networks of spammers look like? In particular, how well organized are spammers? Do they operate alone or in groups? Are there meaningful communities or organizations of spammers? Sending spam emails is profitable for spammers; otherwise, there wouldn't be so much spam. Can a business model be derived from the community structure of spammers? These questions can be answered using the data collected by Project Honey Pot and a technique known as spectral clustering [5].

The social network of spammers can be represented as a graph consisting of nodes and edges, as shown in Figure 2. The nodes correspond to spammers, and an edge between two nodes corresponds to a social relationship between the corresponding spammers. A social relationship can be inferred by the use of common resources or by similar behavior patterns over time. Communities in a social network emerge by partitioning the graph into groups of nodes. Sets of nodes in the same group are highly similar, and sets of nodes in different groups are not similar. Spectral clustering aims to minimize the normalized cut between groups, which is defined by

$$\text{Normalized cut} = \frac{\text{Sum of all edge weights between groups}}{\text{Sum of all edge weights within groups}}$$

For example, spectral clustering divides the graph shown in Figure 2 into the two communities indicated by the blue and green nodes, respectively. The groups revealed by spectral clustering correspond to communities in the social network. For these communities to be meaningful, the graph must be constructed so the edges between nodes correspond to actual relationships between spammers.

Figure 2: An example of a graph and its separation into two communities by spectral clustering

The main challenge in constructing the graph is choosing the edges and edge weights, because we cannot observe relationships among spammers. This problem does not arise in most other community detection studies. For example, in friendship or collaboration networks, users willingly participate in the study, and information on relationships among members of the network is readily available. However, in spammer network discovery, relationships between spammers can only be inferred through correlations between behavior patterns. Two spammers who have high behavioral correlation are likely to be collaborating. This likelihood, which is treated as the strength of the relationship between these two spammers, can be used as the weight of the edge between the corresponding nodes in the graph. For this research, we investigate two types of behavioral correlation between spammers: correlation in spam server usage and temporal correlation.

Correlation in spam server usage

Correlation in spam server usage between two spammers corresponds to common usage of a set of spam servers. Spammers typically try to conceal their identity by using spam servers that aren't traceable back to them, such as botnets. Thus, spam servers can be viewed as resources for spammers, and common usage of a set of spam servers between two spammers translates into resource sharing, which suggests that the two spammers are collaborating. By constructing the graph using correlation in spam server usage between all active spammers over a period of time, many interesting communities of spammers are revealed, as shown in Figure 3. Each node in the graph corresponds to a spammer, and the color and shape of a node indicate the community to which he or she belongs. Note that the majority of spammers belong to a large, loosely connected community identified by the red nodes. These are the spammers who do not exhibit extremely high correlation with other spammers. Hence, it is not a true community but a collection of spammers who appear to be operating alone. The interesting communities are the smaller, tightly connected ones surrounding the large red community. We believe that these nodes correspond to actual social communities of spammers working together and sharing substantial email server resources.

Figure 3: Community structure of spammers inferred by correlation in spam server usage in October 2006

Reinforcing our belief is the observation that the discovered communities tend to divide into phishing and non-phishing communities, as shown in Figure 4. The shade of each node corresponds to the phishing level of each spammer, which is defined by

$$\text{Phishing level} = \frac{\text{Number of phishing emails sent}}{\text{Total number of emails sent}}$$

We denote spammers with high phishing levels as phishers and the rest as nonphishers. Notice that phishers tend to form communities with other phishers and that nonphishers tend to form communities with other nonphishers. This is also evident from the most frequent subject lines of emails from all spammers in a community. For example, Table 1 lists the most frequent subject lines from two communities in Figure 3. One is a phishing community, namely the orange community of triangular nodes at the top. The other is a non-phishing community, namely the blue community of circular nodes on the right. Notice the distinct separation between phishing subject lines and non-phishing subject lines. The subject headings were not provided to the clustering algorithm. The separation therefore confirms that server usage patterns alone can provide evidence of coordinated phishing behavior. We note that phishers tend to concentrate in small, tightly connected communities. This observation provides empirical evidence that communities of phishing spammers are sharing resources, namely spam servers, among the community. This suggests that phishers tend to exist in isolated, well-organized social communities or teams.

Figure 4: Alternate view of the same social network shown in Figure 3, shaded by phishing level

Table 1: Most common subject lines from a phishing and a non-phishing community (truncated to 50 characters by the Project Honey Pot database)

Phishing community | Non-phishing community
Password Change Required | Make Money by Sharing
Question from eBay Member | Premiere Professional & Executive Registries Invit
Credit Union Online [?] Reward Survey | Texas Land [?] Golf is the Buzz
PayPal Account | Keys to Stock Market Success
PayPal Account [?] Suspicious Activity | An Entire Case of Fine Wine plus Exclusive Gift to

Temporal correlation

Temporal correlation refers to correlation of the times when emails were sent. High correlation is expected among spammers who are working together. Because we do not know the times when emails were sent, we correlate the times when emails were received. The community structure as revealed using temporal correlation is shown in Figure 5. Again, the shape and color of a node represent the community that a particular spammer belongs to. Two large communities appear, and as before they can be interpreted as loosely connected communities of individuals who do not exhibit much correlation with each other. However, in the smaller communities, some interesting patterns emerge. In particular, we discover groups of spammers with nearly coherent temporal spamming behavior.

Figure 5: Community structure of spammers inferred by temporal correlation in October 2006

Consider the group of ten spammers whose temporal spamming behavior is shown in Figure 6. In this plot, the horizontal axis corresponds to days in a month and the vertical axis corresponds to the number of emails sent each day. The figure consists of ten lines overlaid onto the same plot, with each line corresponding to the temporal spamming behavior of one spammer in the group. It is striking that the ten spammers in Figure 6 are sending almost identical numbers of emails over time. This makes it highly probable that they are working together and belong to an actual social community.

Figure 6: Temporal spamming behavior of a group of ten spammers over the month of October 2006, by IP (plot of Number of emails, 0 to 3,000, versus Day in month; one line per IP address, 208.66.195.2 through 208.66.195.11)

These ten spammers, found in the community of dark-blue colored nodes in the top left of Figure 5, are especially interesting because they are among the heaviest spammers in the Project Honey Pot data set, where a heavy spammer denotes someone who sends a large number of spam emails. In addition to their highly coherent temporal behavior, these spammers also have IP addresses in the same address block, indicating that they are operating from the same physical location, perhaps in the same building. Furthermore, these ten spammers' IP addresses are in the IP address range of a known rogue ISP, McColo Corp., which had been hosting and providing services for cybercriminals until it was taken down in November 2008 [6]. All of the above-mentioned observations point to this group of spammers being very well organized, and thus we conclude that they form a tight social community.

Conclusions

Current methods of fighting spam are local and take place at the receiving end, which does not help to reduce the amount of network traffic consumed by spam emails. By studying spam from a global perspective using the data collected by Project Honey Pot, we were able to correlate the behavior of spammers, allowing us to identify different communities of spammers. We found that the majority of spammers appeared to be working alone, but a significant number of them appear to form communities or organizations. In particular, we discovered many small communities of spammers who predominantly sent phishing emails, likely attempting to acquire sensitive information to engage in identity theft. We also discovered several communities of spammers operating from the same physical location, suggesting strong social connections between these spammers.

By analyzing spam and spammer behavior from a global perspective, we were able to identify meaningful communities of spammers. The next step would be to use these findings to combat spam. One avenue is to identify social cliques that could perhaps be linked to an organization. Another is to identify important members of the social network who could be sued, which would have a much greater effect than randomly targeting spammers. There is also potential for online detection of communities, that is, updating the detected communities as emails are received. This would allow for a new method of spam filtering, not by content or blacklisting, but by the behavioral patterns of spammers, which are less variable. Thus, filtering by behavioral patterns has the potential to be more effective than existing filtering methods.

Although the problem of spam does not appear to be going away anytime soon, methods and tools for combating it are improving. Spectral clustering and network discovery can lead to insights into how spammers operate by revealing their social networks. The methods described in this paper might also be applied to discovery of illicit behavior patterns in other applications, such as financial transaction networks or chat room interaction networks. For additional details on our methods, the reader is referred to "Revealing Social Networks of Spammers Through Spectral Clustering" [7].
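The sketch below makes the two ratios defined in this article (normalized cut and phishing level) concrete, building the graph from spam server usage and searching for the best split. Several assumptions apply, and all names and data are hypothetical. The article does not specify how "correlation in spam server usage" is measured, so this sketch uses the cosine similarity of per-server email counts as the edge weight. The best two-way split is then found by brute force, which only works on toy graphs. Spectral clustering [5], the method the article actually uses, approximates this kind of minimization efficiently instead of enumerating partitions.

```python
from itertools import combinations
from math import sqrt


def phishing_level(phishing_sent, total_sent):
    """Phishing level = number of phishing emails sent / total emails sent."""
    return phishing_sent / total_sent if total_sent else 0.0


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (server -> email count).

    Assumed stand-in for 'correlation in spam server usage'.
    """
    dot = sum(u[s] * v[s] for s in u.keys() & v.keys())
    nu = sqrt(sum(c * c for c in u.values()))
    nv = sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def build_graph(usage):
    """Weighted edges between spammers that share at least one spam server."""
    edges = {}
    for a, b in combinations(sorted(usage), 2):
        w = cosine(usage[a], usage[b])
        if w > 0:
            edges[(a, b)] = w
    return edges


def normalized_cut(edges, group):
    """Sum of edge weights between groups / sum of edge weights within groups."""
    between = within = 0.0
    for (a, b), w in edges.items():
        if (a in group) == (b in group):
            within += w
        else:
            between += w
    return between / within if within else float("inf")


def best_bipartition(nodes, edges):
    """Exhaustive search over two-way splits; feasible only for tiny graphs."""
    nodes = sorted(nodes)
    anchor, rest = nodes[0], nodes[1:]  # fix one node to skip mirror-image splits
    best = (float("inf"), frozenset())
    for r in range(len(rest)):  # r < len(rest) keeps the other group non-empty
        for combo in combinations(rest, r):
            group = {anchor, *combo}
            score = normalized_cut(edges, group)
            if score < best[0]:
                best = (score, frozenset(group))
    return best
```

Consider a toy setup in which spammers s1 to s3 share servers A and B, s4 to s6 share servers C and D, and a single shared server X weakly links s3 and s4. Here `best_bipartition(usage, build_graph(usage))` separates the two groups with a normalized cut close to zero. Swapping in a different correlation measure only requires changing `cosine`.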


References

[1] The Secret Cost of Spam. Windows & .NET Magazine. [Online]. Available: http://www.itmanagement.com/whitepaper/the-secret-cost-of-spam/, 2003.
[2] MessageLabs Intelligence: 2008 Annual Security Report. Symantec Corp. [Online]. Available: http://www.messagelabs.com/download.get?filename=MLIReport_Annual_2008_FINAL.pdf.
[3] M. Prince, L. Holloway, E. Langheinrich, B. M. Dahl, and A. M. Keller. Understanding How Spammers Steal Your E-Mail Address: An Analysis of the First Six Months of Data from Project Honey Pot. In Proc. 2nd Conf. Email and Anti-Spam, 2005.
[4] Project Honey Pot. Unspam Technologies Inc. [Online]. Available: http://www.projecthoneypot.org/, 2009.
[5] S. Yu and J. Shi. Multiclass Spectral Clustering. In Proc. 9th IEEE Int. Conf. Computer Vision, 2003.
[6] J. Nazario. Third ‘Bad ISP’ Disappears—McColo Gone. Arbor Networks. [Online]. Available: http://asert.arbornetworks.com/2008/11/third-bad-isp-dissolves-mccolo-gone/, 2008.
[7] K. S. Xu, M. Kliger, Y. Chen, P. J. Woolf, and A. O. Hero III. Revealing Social Networks of Spammers Through Spectral Clustering. In Proc. IEEE Int. Conf. Communications, 2009.


Challenges in Internet Geolocation, or Where’s Waldo Online?

As more and more of our daily lives are conducted online, conventional tools and techniques used by law-enforcement and intelligence organizations for surveillance and monitoring have become less effective. The decentralization that distinguishes the Internet from traditional telephony networks, and that makes the Internet so robust and resilient, also makes it very difficult to identify its users. Unlike participants in real-world interactions, participants in online transactions are often very hard to locate. A poster on a message board, a participant in an online funds transfer, or the sender of a particular email message could be connected to the Internet from anywhere, and it is difficult to determine the computer's precise location when that information is needed.

The process of determining the location of a network participant on the globe is called geolocation.

Challenges to geolocation

Geolocation on the Internet would be substantially less difficult if Internet protocol (IP) addresses corresponded to physical locations, much as area codes in phone numbers do. However, the Internet was designed to be fundamentally decentralized, without the rigid region-based routing hierarchy that existed in the original phone networks. Instead, Internet service providers (ISPs) cover overlapping geographic regions that in some cases can span entire continents. Hence, an IP address, even when narrowed down to its issuing ISP, only provides very coarse-grain geographic information.

Since IP addresses are mostly opaque identifiers that provide little information on the location of a node, geolocation on the Internet requires going back to first principles. The basics of locating physical objects on the globe have been worked out in great detail over the last few centuries. They comprise three techniques:

- triangulation, where bearings to known landmarks are used to determine location [?];
- multilateration, where time differences of arrival from a common emitter are used [?];
- trilateration, where distances to known landmarks are used to determine location [?].

Applying these approaches to wired, wide-area computer networks poses significant challenges. First, bearings are not applicable, as wired networks do not support traditional notions of angles, grids, or even a Cartesian space. Second, measurements are inherently imprecise. Latencies on a network depend on the circuitous paths that packets follow over fiber networks, instead of a straight line from a lighthouse or satellite, and also on the queuing delays encountered in routers along the way. Finally, unlike the extensive lighthouse network or the GPS satellites that provide almost ubiquitous coverage for navigation, the Internet lacks well-placed, well-known landmarks whose positions are known precisely. As a result, the Internet geolocation problem is akin to navigating on a map where most lighthouses are not marked, in a world where light does not necessarily travel in a straight line or at a constant speed. Consequently, a naïve application of navigational geolocation techniques to Internet geolocation does not yield accurate or precise locations.

Octant framework

Our group has been developing a general-purpose, comprehensive framework for geolocation called Octant. The key insight behind Octant is to view the geolocation process as solving a system of geographic constraints. Octant aggressively extracts these constraints from network measurements, attaches a weight corresponding to the confidence associated with each constraint, and determines a feasible region in which the node of interest is expected to reside. This approach gains its accuracy through three novel techniques. Firstly, Octant can take advantage of negative information (information on where a node is not) in addition to positive information (information on where the node might be). Secondly, Octant utilizes available structural information about the network to extract additional geographic constraints from routers on the network path, thus compensating for the indirect and circuitous nature of routing paths on the Internet. Finally, Octant can reason in the presence of uncertainty by deriving constraints from landmarks whose positions are not known precisely but are instead computed by Octant itself. The result is a system that can extract and combine all available position-related information to geolocate lighthouses and nodes alike.

Geographic constraints

Constraints in Octant are geographic rules that describe where a node can or cannot be relative to a landmark on the globe. The constraints are derived from network measurements between nodes and landmarks. These constraints can be of the positive form "node A is within x miles of Landmark L." They can also encompass negative information of the form "node A is further than y miles from Landmark L." Both kinds of constraints carry valuable information, and a comprehensive framework must be able to take advantage of both.

Establishing suitable landmarks on the Internet is essential for generating precise constraints. Landmarks with known locations, such as nodes at universities and datacenters with well-established positions, can serve as a basis for precise constraints, but they are relatively few in number and distributed unevenly throughout the globe. To compensate for this, Octant co-opts nodes within the network fabric and uses them as additional landmarks. Since the positions of these nodes are not known, Octant first uses the well-established landmarks to geolocate these additional nodes, which are used in turn to geolocate the final node of interest. Extracting and using constraints based on these additional landmarks is nontrivial because the position of such a landmark is typically an uncertain region rather than a precise point.

In the simple case where the location of a landmark is known with pinpoint accuracy, the two types of constraints combine to form an annulus centered on the landmark that describes the possible location of the node of interest. This case is illustrated in Figure 1a. Octant enables meaningful extraction of constraint regions even when the position of the landmark is uncertain and consists of an irregular region. Consider a landmark k whose position estimate is the region $\beta_k$. A constraint that places the node within distance d of the landmark defines the union of all circles of radius d centered at points inside $\beta_k$, as shown in Figure 1b. In contrast, a constraint that places the node further than distance d from the landmark can only safely rule out the intersection of all circles of radius d centered at points inside $\beta_k$, regardless of where the landmark might actually be within $\beta_k$. This condition is illustrated in Figure 1c.

Figure 1 (a – d): Comprehensive use of constraints in Octant. The exact location of an IP address is usually available from the ISP that dynamically assigned that IP address to a particular machine. However, acquiring IP addresses from the ISPs leads to a scalability problem when monitoring many IP addresses: special relationships with potentially thousands of ISPs around the world could be required, or court orders from multiple jurisdictions (which might not even be available) could critically delay proceedings.

A scalable Octant implementation may decide to approximate certain complex $\beta_k$ with a simple bounding circle in order to keep the number of curves per region in check, and thus gain scalability at the cost of modest error. Figure 1d illustrates the constraint approximation.

Given a set of constraints, a precise region can be efficiently computed geometrically by taking the intersection of positive constraints and subtracting the […]

[…] a node can be translated into a distance constraint using the propagation delay of light in fiber, approximately 2/3 the speed of light. This yields a conservative constraint on node locations that can then be solved using the Octant framework to yield a sound estimated position for the node; such an estimate will never yield an infeasible solution. In practice, however, Internet paths deviate so much from great-circle distances that such constraints are so loose that they lead to very low precision.

[…] lengths between landmarks and the node are proportional to great-circle distances. However, this is often not the case in practice due to ISPs' use of policy routing based on business agreements. A geolocation system with a built-in assumption of proportionality would not be able to achieve good accuracy. Specifically, nodes might choose unexpectedly long and circuitous routes for certain IP addresses. This occurs often enough in practice that accurate geolocation requires a mechanism to

ORFDWLRQ UHJLRQ GRZQ WR WKH HPSW\ WZR WLJKW ERXQGV 5/ G DQG U/ G  IRU 2FWDQW FDQ SHUIRUP ORFDOL]DWLRQ EDVHG VHW 2QH VWUDWHJ\ LV WR XVH RQO\ KLJKO\ ODQGPDUN / DQG ODWHQF\ PHDVXUHPHQW G VROHO\ RQ URXQGWULS WLPLQJV ORFDOL]LQJ FRQVHUYDWLYH FRQVWUDLQWV GHULYHG IURP WKH )RU D QRGH $ ZKRVH SLQJ WLPH WR ODQGPDUN URXWHUV GRHV QRW UHTXLUH DQ\ DGGLWLRQDO

VSHHG RI OLJKW ERXQGLQJ WKH PD[LPXP / LV G$ 2FWDQW FDQ GHULYH WKH FRQVWUDLQW FRGH WR EH GHSOR\HG ZLWKLQ WKH QHWZRUN GLVWDQFH D SDFNHW FDQ WKHRUHWLFDOO\ WUDYHO U G ” ""ORF / ï ORF $ "" ” 5 G  / $ / $ Handling uncertainty LQ D JLYHQ WLPH :H VKRZ ODWHU KRZ WR ERXQGLQJ WKH QRGH·V GLVWDQFH IURP WKH FRPSXWH UREXVW VROXWLRQV WKDW DUH UHVLOLHQW ODQGPDUN ,Q SUDFWLFH WKLV DSSURDFK :LWK WKH PDQ\ DYHQXHV WR H[WUDFW WR HUURU DQG PHDVXUHPHQW QRLVH \LHOGV JRRG UHVXOWV ZKHQ WKHUH DUH JHRJUDSKLF FRQVWUDLQWV IURP WKH QHWZRUN D PHFKDQLVP WR KDQGOH DQG ÀOWHU RXW Mapping latencies to distances VXIÀFLHQW ODQGPDUNV WKDW LQWHUODQGPDUN PHDVXUHPHQWV DSSUR[LPDWH ODQGPDUNWR HUURQHRXV FRQVWUDLQWV LV FULWLFDO IRU 7KH QHWZRUN ODWHQF\ EHWZHHQ D QRGH PHDVXUHPHQWV PDLQWDLQLQJ KLJK ORFDOL]DWLRQ DFFXUDF\ QRGH DQG D ODQGPDUN SK\VLFDOO\ ERXQGV 2FWDQW XVHV D ZHLJKW DVVLJQPHQW WKHLU PD[LPXP JHRJUDSKLFDO GLVWDQFH Indirect routes PHFKDQLVP WR FKDUDFWHUL]H WKH FRQÀGHQFH $ URXQGWULS ODWHQF\ PHDVXUHPHQW RI G 7KH SUHFHGLQJ GLVFXVVLRQ PDGH RI GLIIHUHQW FRQVWUDLQWV $ FRQVWUDLQW·V PLOOLVHFRQGV EHWZHHQ D ODQGPDUN DQG WKH VLPSOLI\LQJ DVVXPSWLRQ WKDW URXWH UHODWLYH ZHLJKW YDOXH DPSOLÀHV RU

The Next Wave „ Vol 18 No 3 „ 2010 37 GDPSHQV LWV FRQWULEXWLRQ LQ HVWLPDWLQJ WKH H[FHHG D GHVLUHG ZHLJKW RU UHJLRQ VL]H LQJ D JOREDO PDSSLQJ RI ,3 DGGUHVV UDQJHV ORFDWLRQ UHJLRQ RI WKH QRGH RI LQWHUHVW WKUHVKROG WR WKHVH NH\ SRLQWV WR SURYLGH D ORFDWLRQ HVWLPDWH RI HYHU\ ,3 DGGUHVV :H FDQ )RU ODWHQF\EDVHG FRQVWUDLQWV Future directions ODQGPDUNV IDUWKHU IURP D QRGH DUH OHVV SHUIRUP PRUH SUHFLVH JHRORFDWLRQ RQ WUXVWZRUWK\ WKDQ WKRVH WKDW DUH QHDUE\ 7KH 7KH H[LVWLQJ 2FWDQW IUDPHZRUN VSHFLÀF QRGHV ZLWK IHZHU RQGHPDQG VLPSOH LQWXLWLRQ EHKLQG WKLV UHODWLRQVKLS LV LV DFFXUDWH DQG FRPSUHKHQVLYH EXW LW SUREHV E\ OHYHUDJLQJ WKLV JOREDO PDSSLQJ WKDW ODWHQF\ LQ IDUDZD\ QRGHV LQFUHDVHV ZDV GHVLJQHG WR SHUIRUP RQGHPDQG RU XVH WKH JOREDO PDSSLQJ WR LGHQWLI\ GXH WR WKH KLJKHU SUREDELOLW\ RI GDWD QHWZRUN PHDVXUHPHQWV WR JHRORFDWH D QHLJKERUV WKDW FDQ EH SUREHG DV SUR[LHV LI SDFNHWV WUDYHUVLQJ LQGLUHFW PHDQGHULQJ VLQJOH QRGH DW D WLPH &RQVHTXHQWO\ WKH QRGH LV VHQVLWLYH WR SURELQJ URXWHV RU KLJKO\ FRQJHVWHG SDWKV ,Q 2FWDQW UHOLHV DOPRVW HQWLUHO\ RQ DFWLYH 7KHUH DUH RI FRXUVH DVVRFLDWHG 2FWDQW HYHU\ FRQVWUDLQW KDV DQ DVVRFLDWHG QHWZRUN PHDVXUHPHQWV SHUIRUPHG RQ SULYDF\ FRQFHUQV ZLWK JHRORFDWLRQ FRQÀGHQFH OHYHO ZKLFK LV WUDFNHG WKURXJK GHPDQG LW GRHV QRW SHUIRUP DQ\ SUH 6XFK WHFKQRORJLHV SRLQW WR SRWHQWLDO WKH FRQVWUDLQWVDWLVIDFWLRQ SURFHVV 7KLV FRPSXWDWLRQ DQG LW GRHV QRW WDNH LQWR WKUHDWV IRU XQDXWKRUL]HG SHUVRQQHO WR SURFHVV \LHOGV QRW RQO\ WKH VHW RI IHDVLEOH DFFRXQW DQ\ ORQJWHUP QHWZRUN HIIHFWV LGHQWLI\ WKH ORFDWLRQ RI VHQVLWLYH DVVHWV SRLQWV ZKHUH WKH QRGH FDQ SRWHQWLDOO\ OLH RU SHUIRUP ORQJWHUP PHDVXUHPHQWV WR VXJJHVWLQJ WKH QHHG IRU IXUWKHU ZRUN RQ EXW DOVR EH WKH DVVRFLDWHG SUREDELOLW\ IRU DLG JHRORFDWLRQ 2QGHPDQG SURELQJ XQGHUVWDQGLQJ WKH WKHRUHWLFDO OLPLWV RI WKH QRGH UHVLGLQJ DW HDFK SRLQW LQ ODUJHVFDOH GHSOR\PHQWV SRVHV VRPH JHRORFDWLRQ WHFKQLTXHV DQG GHYHORSLQJ DGGLWLRQDO FKDOOHQJHV )LUVWO\ JLYHQ ,Q WKH DEVHQFH RI ZHLJKWV UHJLRQV FRXQWHUPHDVXUHV ZKHUH QHFHVVDU\ WKH 
VHFXULW\ FRQVFLRXVQHVV RI PRGHUQ FDQ EH FRPELQHG YLD LQWHUVHFWLRQ QHWZRUN PDQDJHPHQW SROLFLHV RQ RSHUDWLRQV OHDGLQJ WR D GLVFUHWH VROXWLRQ GHPDQG QHWZRUN SURELQJ LV FRQVLGHUHG IRU D ORFDWLRQ HVWLPDWH³WKH QRGH LV KLJKO\ XQGHVLUDEOH ZKHQ SHUIRUPHG DW HLWKHU ZLWKLQ D UHJLRQ RU LW OLHV RXWVLGH D ODUJH VFDOH LQ EXUVWV DQG WR DUELWUDU\ 7KH LQWURGXFWLRQ RI ZHLJKWV FKDQJHV WKH FOLHQWV 6XFK SUREHV FDQ EH PLVFODVVLÀHG LPSOHPHQWDWLRQ RI ORFDWLRQ HVWLPDWHV DQG WKHLU GHOLYHU\ PD\ EH LQWHQWLRQDOO\ :KHQ FRPELQLQJ WZR UHJLRQV 2FWDQW GHOD\HG RU GURSSHG LQ UHVSRQVH 6HFRQGO\ GHWHUPLQHV DOO SRVVLEOH UHVXOWLQJ UHJLRQV WKH H[SHQVLYH FRQVWUDLQW HYDOXDWLRQ YLD LQWHUVHFWLRQV DQG RYHUODSSLQJ UHJLRQV FRPSXWDWLRQ LV GHSHQGHQW RQ WKH DUH DVVLJQHG WKH VXP RI WKHLU FRPSRQHQW FRQVWUDLQWV H[WUDFWHG IURP WKH RQGHPDQG ZHLJKWV 1RQRYHUODSSLQJ UHJLRQV DUH QHWZRUN PHDVXUHPHQWV 7KH FRQVWUDLQW UHWDLQHG ZLWK WKHLU RULJLQDO ZHLJKWV HYDOXDWLRQ PXVW WKHUHIRUH EH SHUIRUPHG 7KLV FRQGLWLRQ LV LOOXVWUDWHG LQ )LJXUH DW UXQWLPH )LQDOO\ RQGHPDQG SURELQJ LV  7KH ÀQDO HVWLPDWHG ORFDWLRQ UHJLRQ REVHUYDEOH ZDUQLQJ WKH QRGH RI LQWHUHVW LV FRPSXWHG E\ WDNLQJ WKH XQLRQ RI DOO RI SRVVLEOH VXUYHLOODQFH UHJLRQV VRUWHG E\ ZHLJKW VXFK WKDW WKH\ :H DUH FXUUHQWO\ LQYHVWLJDWLQJ JHRORFDWLRQ WHFKQLTXHV EDVHG RQ SDVVLYH PHDVXUHPHQWV 2XU UHVHDUFK FRQGXFWHG LQ FROODERUDWLRQ ZLWK UHVHDUFKHUV DW 'XNH 8QLYHUVLW\ DQG $NDPDL 7HFKQRORJLHV FHQWHUV RQ LGHQWLI\LQJ NH\ LQJUHVV SRLQWV LQ WKH QHWZRUN ZKHUH HQGXVHUV DUH FRQQHFWHG WR WKH ,QWHUQHW FRUH DFFXUDWHO\ ORFDWLQJ WKHVH SRLQWV RQ WKH Figure 2: Octant assigns weights to constraints based on JOREH XVLQJ SHULRGLF their inherent accuracy. Overlapping regions are given the of the weights of their components. PHDVXUHPHQWV DQG FUHDW

38 Challenges in Internet Geolocation FEATURE

Clumps, Hoops, and Bubbles— Moving Beyond Clustering n the Analysis of Data

xploratory data analysis is the search for structure in complex data. In many cases the origins and statistical properties of a d(ata set may be poorly understood. Data may also be incomplete or noisy. How can we understand the information contained in such a data set when we don’t even know what questions to ask?

One of the most common first steps in data analysis is to cluster the data points—that is, to look for groups of data points that appear to have some set of characteristics in common. Data clusters are statistical features that can be discovered by a computer and then investigated further. In a topological sense, however, data clusters are the simplest of structure—a connected mass of points.

The rapidly developing new field of topological data analysis has given us algorithms that can be thought of as a higher-dimensional analogue to data clustering. This paper will explain some of these topological methods and give examples of how they have been used.

The Next Wave „ Vol 18 No 3 „ 2010 39 ,Q IDFW ZKDW \RX ZRXOG KRSH WR KRPHRPRUSKLF $OJHEUDLF WRSRORJ\ ZDV REVHUYH LV D VRUW RI KROH LQ WKH VTXLUUHO GHYHORSHG WR FRPSDUH WKH SURSHUWLHV RI ORFDWLRQ GDWD :KLOH GDWD FOXVWHULQJ WRSRORJLFDO VSDFHV ZLWKRXW GHDOLQJ ZLWK DOJRULWKPV GHWHFW FRQQHFWHG FRPSRQHQWV WKH VSDFHV GLUHFWO\ $OJHEUDLF WRSRORJLVWV LQ D GDWD VHW D KROH LV WKH DEVHQFH RI D KDYH GHÀQHG WRSRORJLFDO LQYDULDQWV WKDW FRQQHFWHG FRPSRQHQW 7RSRORJLVWV KDYH ODEHO HYHU\ WRSRORJLFDO VSDFH ZLWK VRPH IRUPDOL]HG WKLV QRWLRQ RI ´KROHVµ LQ PRUH FRQFUHWH PDWKHPDWLFDO REMHFW VXFK DUELWUDU\ GLPHQVLRQV DV D QXPEHU RU D JURXS 7KH NH\ SURSHUW\ 6WXG\LQJ WKH WRSRORJLFDO SURSHUWLHV RI DQ LQYDULDQW LV WKDW VSDFHV WKDW DUH RI GLVFUHWH GDWD LV GLIÀFXOW IRU VHYHUDO WRSRORJLFDOO\ HTXLYDOHQW PXVW EH DVVLJQHG UHDVRQV 7KH WUDGLWLRQDO DOJRULWKPV IRU WKH VDPH ODEHO ,I WZR VSDFHV KDYH DQDO\]LQJ WRSRORJ\ WHQG WR EH VORZ DQG GLIIHUHQW ODEHOV \RX NQRZ IRU VXUH WKH\ WKH FRPSXWDWLRQV WHQG WR EH VHQVLWLYH DUH WRSRORJLFDOO\ GLIIHUHQW )RU H[DPSOH WR QRLVH ,Q UHFHQW \HDUV KRZHYHU D WRSRORJLFDO LQYDULDQW ZRXOG KDYH WR JLYH Figure 1: This squirrel will your plants PDWKHPDWLFLDQV DQG FRPSXWHU VFLHQWLVWV WKH VDPH ODEHO WR D FLUFOH DQG D VTXDUH and not feel bad about it KDYH GHYHORSHG VHYHUDO QHZ DOJRULWKPV WKDW DUH WDNLQJ H[SORUDWRU\ GDWD DQDO\VLV Motivating example: LQWR QHZ GLPHQVLRQV Bushy-tailed rats Algebraic topology in pictures /HW·V IDFH LW³VTXLUUHOV DUH WKH $W D EDVLF OHYHO WRSRORJ\ LV D WHUURULVWV RI WKH DQLPDO NLQJGRP *LYHQ PRUH OHQLHQW YHUVLRQ RI JHRPHWU\ DFUHV DQG DFUHV RI GLUW WR GLJ LQ WKH\ ZLOO 7RSRORJ\ LV RIWHQ FDOOHG ´UXEEHU VKHHW LQYDULDEO\ MXPS LQWR WKH QHDUHVW ÁRZHUSRW JHRPHWU\µ EHFDXVH VSDFHV WKDW FDQ DQG XSURRW \RXU EHORYHG JHUDQLXPV ,Q EH FRQWLQXRXVO\ WUDQVIRUPHG LQWR RQH WKH IDOO WKH\ VLW LQ WUHHV DQG WKURZ KDOI DQRWKHU DUH WRSRORJLFDOO\ WKH VDPH )RU HDWHQ DFRUQV DW \RX  H[DPSOH WR D WRSRORJLVW D VTXDUH DQG D 6XSSRVH D QDWXUDOLVW PDUNV DOO FLUFOH DUH HTXLYDOHQW D WRSRORJLVW ZRXOG 
Figure 2: Data points are represented by squirrel shapes. The data points form two WKH VTXLUUHOV LQ D ZLOGOLIH SUHVHUYH ZLWK VD\ ´KRPHRPRUSKLFµ EHFDXVH WKHUH LV distinct clusters. These clusters represent HOHFWURQLF WDJV WKDW UHFRUG WKHLU ORFDWLRQV D ZD\ WR SDLU HYHU\ SRLQW RQ D VTXDUH an interesting bit of statistical structure HYHU\ KRXU ([SHULPHQWV OLNH WKLV KDYH ZLWK D SRLQW RQ D FLUFOH LQ VXFK D ZD\ that can be further explored. Refuge map is adapted from [3]. EHHQ GRQH ZLWK ]HEUDV DQG ZKDOHV >@ WKDW SRLQWV WKDW DUH FORVH WRJHWKHU RQ WKH >@  (DFK GDWD SRLQW ZRXOG FRQVLVW RI FLUFOH DUH SDLUHG ZLWK SRLQWV WKDW DUH FORVH WKH JHRJUDSKLF FRRUGLQDWHV DQG ,' RI WKH WRJHWKHU RQ WKH VTXDUH DQG YLFH YHUVD  VTXLUUHO ,I WKHUH ZHUH WZR SRSXODWLRQV 2Q WKH RWKHU KDQG D ÀJXUHHLJKW LV QRW RI VTXLUUHOV OLYLQJ LQ RSSRVLWH FRUQHUV RI WRSRORJLFDOO\ HTXLYDOHQW WR D FLUFOH 2QH WKH SUHVHUYH )LJXUH   WKDW IDFW FRXOG ZD\ WR VHH WKLV GLIIHUHQFH LV E\ UHPRYLQJ EH GLVFRYHUHG E\ DQ\ RQH RI D QXPEHU RI DQ\ RQH SRLQW IURP D FLUFOH ZKLFK OHDYHV GDWD FOXVWHULQJ DOJRULWKPV RQH FRQQHFWHG DUF %XW UHPRYLQJ WKH 1RZ VXSSRVH \RX ZDQW WR GHWHFW LQWHUVHFWLQJ SRLQW IURP D ÀJXUHHLJKW D ODFN RI GDWD SRLQWV LQ D UHJLRQ )RU OHDYHV WZR GLVFRQQHFWHG DUFV H[DPSOH VXSSRVH WKDW D IR[ OLYHV LQ WKH 8QIRUWXQDWHO\ GHFLGLQJ ZKHQ FHQWHU RI WKH QDWXUH SUHVHUYH DQG SUH\V RQ WZR VSDFHV DUH WRSRORJLFDOO\ HTXLYDOHQW WKH VTXLUUHOV 7KHUHIRUH QR VTXLUUHOV FDQ KRPHRPRUSKLF LV QRW HDV\ EHFDXVH Figure 3: In this example, the squirrels EH IRXQG LQ WKDW UHJLRQ RI WKH SDUN )LJXUH LW LV GLIÀFXOW WR ÀQG D ZD\ WR SDLU WKH seem to avoid the region in the center. This is also an interesting bit of structure in the   +RZ FRXOG D GDWD FOXVWHULQJ DOJRULWKP SRLQWV RI WZR VSDFHV (YHQ PRUH GLIÀFXOW data. But how do you discover or interpret GHWHFW WKLV DYRLGDQFH EHKDYLRU" LV WR SURYH WKDW WZR VSDFHV DUH QRW the absence of data points in a region?

 7KLV H[DPSOH LV FRPSOHWHO\ PDGH XS DQG WKH DXWKRU KDV DEVROXWHO\ QR H[SHUWLVH LQ VTXLUUHO EHKDYLRU RU ELRORJ\

40 Clumps, Hoops, and Bubbles FEATURE

KDYH D VLQJOH RQHGLPHQVLRQDO KROH $ ÀJXUHHLJKW RQ WKH RWKHU KDQG Fact: A computer can

KDV ơ  EHFDXVH LW KDV WZR GLVWLQFW compute the Betti numbers RQHGLPHQVLRQDO KROHV 0RYLQJ XS D of a simplicial complex.

GLPHQVLRQ D VSKHUH KDV ơ  EXW

ơ  ,W KDV QR RQHGLPHQVLRQDO KROHV EXW GRHV KDYH D YRLG LQVLGH LW )LJXUH  $ VLPSOLFLDO FRPSOH[ LV D FROOHFWLRQ VKRZV VHYHUDO FRPPRQ KRXVHKROG REMHFWV RI REMHFWV FDOOHG VLPSOLFHV $ VLPSOH[

DORQJ ZLWK WKHLU %HWWL QXPEHUV 7KH LV D SRLQW ZULWWHQ >D@ $ VLPSOH[ FDQ

]HURWK %HWWL QXPEHU ơ FRUUHVSRQGV WR EH WKRXJKW RI DV D OLQH FRQQHFWLQJ WZR

WKH QXPEHU RI FRQQHFWHG FRPSRQHQWV LQ VLPSOLFHVZULWWHQ>D D@DQGDVLPSOH[ WKH VSDFH ; ,Q WKLV VHQVH FRPSXWLQJ WKH LV D WULDQJOH ZLWK VLPSOLFHV DW HDFK Figure 4: To a topologist a coffee cup and a ]HURWK %HWWL QXPEHU LV DQDORJRXV WR GRLQJ YHUWH[ DQG VLPSOLFHV DV HGJHV ZULWWHQ donut are equivalent objects. They are both solid three-dimensional objects with one GDWD FOXVWHULQJ >D D D@ $Q\ N   SRLQWV DDDN FDQ hole. Triangulating spaces GHÀQH D NVLPSOH[ >D D  DN@ ZKRVH IDFHV DUH N    VLPSOLFHV )LJXUH  LOOXVWUDWHV 0RVW LQWHUHVWLQJ WRSRORJLFDO VSDFHV VRPH ORZGLPHQVLRQDO VLPSOLFHV DQG DQ 7KH VLPSOHVW WRSRORJLFDO LQYDULDQWV DUH FRQWLQXRXV REMHFWV $ FROOHFWLRQ RI H[DPSOH RI D VLPSOLFLDO FRPSOH[ DUH WKH %HWWL QXPEHUV %HWWL QXPEHUV GR GDWD RQ WKH RWKHU KDQG LV MXVW D EXQFK 6HH UHIHUHQFHV >@ DQG >@ IRU PRUH KDYH D SUHFLVH PDWKHPDWLFDO GHÀQLWLRQ RI GLVFUHWH SRLQWV $ORQH WKH GDWD SRLQWV LQGHSWK PDWKHPDWLFDO EDFNJURXQG EXW LW LV SRVVLEOH WR H[SODLQ WKH LGHD KDYH QR LQWHUHVWLQJ WRSRORJ\ 7R VWXG\ 5HDGHUV IDPLOLDU ZLWK DOJHEUDLF WRSRORJ\ LQ SLFWXUHV )RU QRZ IRFXV LQVWHDG LQ D GDWD VHW WRSRORJLFDOO\ RQH KDV WR N WHUPV RI SLFWXUHV 6XSSRVH \RX KDYH D DVVXPH WKDW WKH GDWD SRLQWV DUH VDPSOHG DUH UHPLQGHG WKDW WKH WK %HWWL QXPEHU LV WRSRORJLFDO VSDFH FDOO LW ; ,QWXLWLYHO\ WKH IURP VRPH XQGHUO\LQJ VSDFH DQG EXLOG HTXDO WR WKH UDQN RI WKH IUHH SDUW RI WKH NWK KRPRORJ\ JURXS $V VXFK IRU VRPH NWK %HWWL QXPEHU ơN FRXQWV WKH QXPEHU D FRQWLQXRXV VWUXFWXUH RQ WKH SRLQWV RI NGLPHQVLRQDO ´KROHVµ DQG WKH ]HURWK 8VXDOO\ WKLV DPRXQWV WR EXLOGLQJ ZKDW VSDFHV WKH NWK %HWWL QXPEHU·V YDOXH ZLOO GHSHQG RQ WKH ULQJ RYHU ZKLFK WKH %HWWL QXPEHU ơ FRXQWV WKH QXPEHU RI WRSRORJLVWV FDOO D VLPSOLFLDO FRPSOH[ FRQQHFWHG FRPSRQHQWV LQ ; )LJXUH  LOOXVWUDWHV WKH GLIIHUHQFH EHWZHHQ FRPSXWDWLRQ LV SHUIRUPHG ,Q WKLV SDSHU )RU H[DPSOH D VTXDUH DQG D D FRQWLQXRXV VSDFH GLVFUHWH GDWD SRLQWV DOO FRPSXWDWLRQV DUH RYHU D ÀHOG VR QR WRUVLRQ SDUWV HYHU DSSHDU FLUFOH ERWK KDYH ơ  VLQFH WKH\ ERWK DQG D WULDQJXODWLRQ Persistent 
homology A solid 2-dimensional blob A sphere /RRN DW WKH FROOHFWLRQ RI GDWD SRLQWV LQ )LJXUH  D 

‡ LI "[L  [M 7 WKHQ DGG WKH VLPSOH[

>[L [M@

‡ LI "[L  [M" "[M  [N" "[L  [N 7 WKHQ DGG

Figure 5: Some simple topological spaces and their Betti numbers. Intuitively, the kth Betti WKH VLPSOH[ >[L [M [N@ number counts the number of k-dimensional holes in the space.

The Next Wave „ Vol 18 No 3 „ 2010 41 Continuous Space Discrete Data Points Triangulation Simplices Simplicial Complex

Figure 6: A set of discrete points has no interesting topology. Studying the Figure 7: Low-dimensional simplices and triangulation leads to finding the interesting topology. a Simplicial Complex

)LJXUHV  EH VKRZ WULDQJXODWLRQV WRSRORJLFDO IHDWXUHV OLNH ORRSV WKURXJK 7RSRORJLFDO IHDWXUHV WKDW SHUVLVW WKURXJK FRQVWUXFWHG XVLQJ VXFK D SURFHGXUH D QHVWHG VHTXHQFH RI VSDFHV DQG GLVFRYHUV PRVW RI WKH VHTXHQFH DUH UHSUHVHQWHG E\ IRU VHYHUDO GLIIHUHQW WKUHVKROG YDOXHV ZKHQ WKH\ DSSHDU DQG ZKHQ WKH\ DUH ORQJ OLQHV DQG IHDWXUHV WKDW DSSHDU DQG DUH 6LQFH WKH WKUHVKROG LV LQFUHDVLQJ HDFK ÀOOHG LQ 7RSRORJLFDO IHDWXUHV WKDW IRUP TXLFNO\ ÀOOHG LQ QRLVH DUH UHSUHVHQWHG WULDQJXODWLRQ LQ WKH VHTXHQFH LV D VXEVHW RI DQG TXLFNO\ GLVDSSHDU DUH FRQVLGHUHG WKH E\ VKRUW OLQHV 7KH KRUL]RQWDO D[LV LQ D WKH QH[W 2QH ZRXOG KRSH WKDW FRPSXWLQJ SURGXFW RI QRLVH EXW WKH IHDWXUHV WKDW SHUVLVWHQFH EDUFRGH LV ODEHOHG ZLWK WKH WKH %HWWL QXPEHUV RI WKHVH WULDQJXODWLRQV SHUVLVW IURP HDUO\ LQ WKH VHTXHQFH WR WKH WKUHVKROG YDOXHV WKDW ZHUH XVHG WR EXLOG ZRXOG JLYH WKH %HWWL QXPEHUV RI D FLUFOH HQG DUH PRUH IXQGDPHQWDO DQG LQWHUHVWLQJ WKH WULDQJXODWLRQV QDPHO\ ơ  $V LW WXUQV RXW HYHU\  6XSSRVH D ORRS ÀUVW IRUPV LQ WKH )LJXUH  VKRZV WKH %HWWL SHUVLVWHQFH WULDQJXODWLRQ SLFWXUHG JLYHV WKH ZURQJ VSDFH FRQVWUXFWHG ZLWK WKUHVKROG YDOXH EDUFRGH IRU WKH GDWD LQ )LJXUH  7KHUH DUH DQVZHU 'XH WR WKH QRLVH LQ WKH GDWD DV 7 E DQG ÀUVW JHWV ÀOOHG LQ DW WKUHVKROG VHYHUDO VKRUW EOXH OLQHV FRUUHVSRQGLQJ WKH WKUHVKROG 7 LQFUHDVHV GDWD SRLQWV YDOXH 7 G 7KHQ WKH SHUVLVWHQFH LQWHUYDO WR ORRSV WKDW IRUPHG DQG ZHUH TXLFNO\ JHW FRQQHFWHG IRUPLQJ ORRSV 7KH ORRSV IRU WKDW ORRS LV EG  7KH VDPH LV WUXH IRU ÀOOHG LQ DV WKH WKUHVKROG XVHG WR FRQVWUXFW JHQHUDWHG E\ WKH QRLVH DUH TXLFNO\ ÀOOHG LQ KLJKHU GLPHQVLRQDO IHDWXUHV WKH WULDQJXODWLRQV LQFUHDVHG 7KH ORQJ ZLWK WULDQJOHV DV 7 FRQWLQXHV WR LQFUHDVH EOXH OLQH FRUUHVSRQGV WR WKH RYHUDOO Visualizing persistence EXW QHZ ORRSV IRUP SUHYHQWLQJ XV IURP FLUFXODU VWUXFWXUH RI WKH GDWD RU UDWKHU

REWDLQLQJ D VSDFH ZLWK ơ  7KHUH DUH VHYHUDO ZD\V WR YLVXDOO\ WKH KROH LQ WKH PLGGOH RI WKH GDWD  7KH ,W ZRXOG EH LGHDO WR GLVWLQJXLVK UHSUHVHQW SHUVLVWHQFH LQIRUPDWLRQ YDOXHV RQ WKH KRUL]RQWDO D[LV FRUUHVSRQG EHWZHHQ WKRVH ORRSV WKDW IRUP DQG DUH 2QH SRSXODU YLVXDO UHSUHVHQWDWLRQ LV WR WKUHVKROG YDOXHV 7KLV H[DPSOH LV TXLFNO\ ÀOOHG LQ DQG WKRVH ORRSV WKDW FDOOHG WKH SHUVLVWHQFH EDUFRGH ,Q D MXVW WKH EDUFRGH IRU RQHGLPHQVLRQDO SHUVLVW IRU D ORQJ WLPH DV 7 LQFUHDVHV SHUVLVWHQFH EDUFRGH HDFK WRSRORJLFDO WRSRORJLFDO IHDWXUHV ORRSV  )RU DQ\ 7KLV LV H[DFWO\ ZKDW SHUVLVWHQW KRPRORJ\ IHDWXUH WKDW DULVHV LQ WKH VHTXHQFH RI VSDFH WKHUH DUH DOVR SHUVLVWHQFH EDUFRGHV DOORZV XV WR GR ,W WUDFNV LQGLYLGXDO VSDFHV LV UHSUHVHQWHG E\ D KRUL]RQWDO OLQH IRU WRSRORJLFDO IHDWXUHV LQ GLPHQVLRQ 

Figure 8(a–e): A nested sequence of triangulations of a set of data points. Because the data points form what looks like a circle, it seems reasonable to compute the Betti number . In fact, computing the first Betti number for every triangulation in the sequence ơ  gives the wrong answer. This outcome shows how noise can be a problem in topological computations.

42 Clumps, Hoops, and Bubbles FEATURE

&+RP3 >@ KDV GHYHORSHG D YDULHW\ RI DOJRULWKPV DQG VRIWZDUH IRU HIÀFLHQWO\ FRPSXWLQJ %HWWL QXPEHUV KRPRORJ\ DQG IXQFWLRQV RQ KRPRORJ\ 1RWH WKDW &+RP3 ZRUNV ZLWK FXELFDO VLPSOLFHV LQVWHDG 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 RI VLPSOLFLDO FRPSOH[HV 7KH WKHRU\ LV HTXLYDOHQW EXW VRPHWLPHV D FRPSXWHU FDQ Figure 9: The persistence barcode for the data pictured in Figure 8. The short blue lines correspond to loops that appeared and were quickly filled in as the threshold increased PRUH QDWXUDOO\ UHSUHVHQW D VSDFH LQ WHUPV (noise), and the long line corresponds to the circular structure of the data. RI VTXDUHV DQG FXEHV WKDQ DV WULDQJOHV DQG WHWUDKHGUD 2QH W\SH RI VLPSOLÀFDWLRQ FOXPSV  GLPHQVLRQ  EXEEOHV  DQG XVHV OLQHDU DOJHEUD D ORW RI PDWULFHV  7R GHYHORSHG E\ &+RP3 UHVHDUFKHUV WKDW KDV KLJKHU GLPHQVLRQV 7KH EDUFRGH IRU DQ\ FRPSXWH WKH NWK %HWWL QXPEHU ơ  RI D N DQ HIIHFWLYH YLVXDO LQWHUSUHWDWLRQ LV WKH XVH GLPHQVLRQ KLJKHU WKDQ WKH GLPHQVLRQ RI VLPSOLFLDO FRPSOH[ ZLWK 1 NGLPHQVLRQDO RI ´UHGXFWLRQµ ,W WXUQV RXW WKDW RIWHQ D ORW WKH VSDFH ZLOO DOZD\V EH HPSW\ VLPSOLFHV WKH FRPSXWHU QHHGV WR GHDO RI WKH VLPSOLFHV LQ D VLPSOLFLDO FRPSOH[ ZLWK DQ 1 = 1 PDWUL[ 7KH FRPSOH[LW\ RI $OO RI WKH SHUVLVWHQFH UHVXOWV FDQ EH FROODSVHG ZLWKRXW DIIHFWLQJ WKH WKH DOJRULWKP LV SRO\QRPLDO LQ 1 ZKLFK SUHVHQWHG KHUH ZLOO EH H[SUHVVHG LQ WHUPV KRPRORJ\ RU %HWWL QXPEHUV RI WKH VSDFH SUHVHQWV D SUREOHP ZKHQ ZRUNLQJ ZLWK RI EDUFRGHV VR D VLPSOHU H[DPSOH LV ZRUWK 6LQFH WKH KRPRORJ\ FRPSXWDWLRQ FDQ EH GDWD VHWV ZLWK PLOOLRQV RU HYHQ WKRXVDQGV D FORVHU ORRN )LJXUH  VKRZV D SRSXODU SRO\QRPLDO LQ WKH QXPEHU RI VLPSOLFHV RI GDWD SRLQWV H[DPSOH RI D VHTXHQFH RI QHVWHG VSDFHV SUHSURFHVVLQJ WR UHGXFH WKH QXPEHU RI ,W WXUQV RXW WKDW WKH PDWULFHV XVHG Computing homology VLPSOLFHV KDV D ELJ SD\RII WR FRPSXWH %HWWL QXPEHUV KDYH D ORW RI ,I D VSDFH FDQ EH VKRZQ DV D VWUXFWXUH DQG ERWK WKH\ DQG WKH VLPSOLFLDO Computing persistent homology VLPSOLFLDO FRPSOH[ WKHQ D FRPSXWHU FDQ FRPSOH[ LWVHOI FDQ EH VLPSOLÀHG WR 7KH 
FRPSXWDWLRQV GRQH E\ &+RP3 FRPSXWH LWV %HWWL QXPEHUV 7KH WUDGLWLRQDO JLYH PRUH HIÀFLHQW FRPSXWDWLRQV 7KH DOO GHDO ZLWK D VLQJOH VSDFH RU IXQFWLRQ DOJRULWKP IRU GRLQJ WKLV FRPSXWDWLRQ &RPSXWDWLRQDO +RPRORJ\ 3URMHFW $V DOUHDG\ PHQWLRQHG %HWWL QXPEHU 0 Betti 1 Betti

Figure 10: A simple example of a nested sequence of spaces and the corresponding persistence barcodes for dimensions  and . At there are two disconnected points, so there are two lines in the persistence barcode. At those two points become 7  %HWWL 7  connected, so only one of the lines in the barcode persists. A new disconnected point appears, though, so a new line in the barcode %HWWL begins at . At the first loop forms, so a line appears in the barcode. All of the points are connected at this point. From 7  7  %HWWL on there is only one line in the barcode. At the loop is divided into two distinct loops, so a new line appears in the 7  %HWWL 7  %HWWL barcode. At one of the extra loops is filled in, so the second line in the barcode ends. At all loops are filled in, so all lines 7  %HWWL 7  in the barcode have ended. %HWWL

The Next Wave „ Vol 18 No 3 „ 2010 43 FRPSXWDWLRQV FDQ EH YHU\ VHQVLWLYH WR QXPEHU RI GDWD SRLQWV RQH VWDUWV ZLWK DQG EHWZHHQ SHUVLVWHQFH GLDJUDPV $OVR QRLVH DQG SHUVLVWHQW KRPRORJ\ FDQ EH EXLOGV D PRUH HIÀFLHQW WULDQJXODWLRQ >@ WKH SURSHUWLHV RI D SHUVLVWHQFH EDUFRGH XVHG WR GLVWLQJXLVK EHWZHHQ LPSRUWDQW ,QYHVWLJDWLRQV DUH XQGHUZD\ WR PDNH WKH DUH YHU\ GHSHQGHQW RQ WKH PHWKRG XVHG WRSRORJLFDO IHDWXUHV DQG IHDWXUHV DULVLQJ &+RP3VW\OH UHGXFWLRQ FRPSDWLEOH ZLWK WR WULDQJXODWH WKH GDWD DQG JHQHUDWH WKH IURP QRLVH $Q DOJRULWKP WR FRPSXWH WKH SHUVLVWHQFH DOJRULWKP QHVWHG VHTXHQFH RI VLPSOLFLDO FRPSOH[HV SHUVLVWHQW KRPRORJ\ RYHU WKH ÀQLWH ÀHOG Software tools ,Q >@ GH 6LOYD DQG *KULVW SURYH D = ZDV ÀUVW SXEOLVKHG E\ (GHOVEUXQQHU UHODWLRQVKLS EHWZHHQ WKH 5LSV DQG ÿHFK /HWVFKHU DQG =RPRURGLDQ LQ  >@ 7KH &+RP3 WRROV DUH DYDLODEOH FRPSOH[ RI D GDWD VHW ,Q >@ &KD]DO DQG $ PRUH JHQHUDO DSSURDFK DQG DOJRULWKP IURP 'DUWPRXWK >@ 7KH SHUVLVWHQW 2XGRW VWXG\ WKH UHODWLRQVKLS EHWZHHQ DORQJ ZLWK D PRUH SRZHUIXO PDWKHPDWLFDO KRPRORJ\ WRRO 3OH[ LV D FROOHFWLRQ RI 5LSV ÿHFK DQG ZLWQHVV FRPSOH[HV DQG FRQWH[W IRU WKH DOJRULWKP ZDV SXEOLVKHG 0$7/$% PRGXOHV DQG VFULSWV UHOHDVHG E\ WKHLU HIIHFWV RQ SHUVLVWHQFH FRPSXWDWLRQV E\ =RPRURGLDQ DQG &DUOVVRQ LQ  >@ UHVHDUFKHUV DW 6WDQIRUG >@ ,W VXSSRUWV WKH FRPSXWDWLRQ DQG YLVXDOL]DWLRQ RI Applications and examples 7KH SHUVLVWHQFH DOJRULWKP WDNHV D QHVWHG VHTXHQFHV RI VLPSOLFLDO FRPSOH[HV 3HUVLVWHQW KRPRORJ\ KDV SURYHQ QHVWHG VHTXHQFH RI VLPSOLFLDO FRPSOH[HV DQG SHUVLVWHQW KRPRORJ\ 7KH 6WDQIRUG XVHIXO IRU H[WUDFWLQJ WRSRORJLFDO DQG JHQHUDWHV D FROOHFWLRQ RI SHUVLVWHQFH UHVHDUFKHUV KDYH UHFHQWO\ UHOHDVHG D QHZ LQIRUPDWLRQ IURP GLVFUHWH QRLV\ GDWD 7KH LQWHUYDOV 7KH SHUVLVWHQFH LQWHUYDOV FDQ EH -DYDEDVHG LPSOHPHQWDWLRQ FDOOHG -3OH[ NH\ SURSHUWLHV WKDW PDNH LW VR XVHIXO DUH LWV GLVSOD\HG DV EDUFRGHV RU E\ XVLQJ RWKHU $OO RI WKH SHUVLVWHQFH EDUFRGHV DQG ' DELOLW\ WR WLH WRJHWKHU WRSRORJLFDO IHDWXUHV YLVXDOL]DWLRQ WHFKQLTXHV VLPSOLFLDO FRPSOH[HV LQ WKLV 
SDSHU ZHUH DSSHDULQJ RQ GLIIHUHQW VFDOHV DQG WKH 5HVWULFWLQJ WKH FRPSXWDWLRQ WR JHQHUDWHG XVLQJ 3OH[ DQG WKH DVVRFLDWHG H[LVWHQFH RI IDVW DOJRULWKPV WR FRPSXWH KRPRORJ\ RYHU ÀHOGV PDNHV FHUWDLQ 0$7/$% VFULSWV LW 3HUVLVWHQW KRPRORJ\ WHFKQLTXHV KDYH VKRUWFXWV SRVVLEOH LQ WKH OLQHDU DOJHEUD EHHQ DSSOLHG WR D QXPEHU RI SUREOHPV Homology and statistics $OVR LQVWHDG RI SHUIRUPLQJ VHSDUDWH LQFOXGLQJ QDWXUDO LPDJH DQDO\VLV >@ FRPSXWDWLRQV IRU HDFK VSDFH LQ WKH QHVWHG $ SHUVLVWHQFH EDUFRGH RU VHW RI PROHFXODU SURWHLQ VKDSHV >@ VXUIDFH VHTXHQFH RI VSDFHV WKH DOJRULWKP DFWXDOO\ SHUVLVWHQFH LQWHUYDOV JHQHUDWHG IURP D GHVFULSWLRQ >@ DQG VHQVRU QHWZRUN GRHV D VLQJOH KRPRORJ\ FRPSXWDWLRQ WKDW GDWD VHW LV D VWDWLVWLF ,Q DOO RI WKH H[DPSOHV FRYHUDJH >@ HQFRGHV DOO WKH LQIRUPDWLRQ DERXW ZKHUH LQ WKLV SDSHU WKLV VWDWLVWLF LV HYDOXDWHG LQ D 0XFK RI WKH UHVHDUFK LQ WKLV DUHD KDV LQ WKH VHTXHQFH GLIIHUHQW VLPSOLFHV DSSHDU IDLUO\ TXDOLWDWLYH ZD\ E\ ORRNLQJ IRU ORQJ EHHQ VXSSRUWHG E\ WKH 'HIHQVH $GYDQFHG OLQHV LQ WKH SHUVLVWHQFH EDUFRGH +RZHYHU 7KH UXQQLQJ WLPH RI WKH SHUVLVWHQFH 5HVHDUFK 3URMHFWV $JHQF\ '$53$ D PRUH REMHFWLYH ZD\ WR GLVWLQJXLVK DOJRULWKP WHQGV WR JURZ OLQHDUO\ ZLWK 7RSRORJLFDO 'DWD $QDO\VLV 7'$ DQG WKH QXPEHU RI VLPSOLFHV 7KH ZRUVW EHWZHHQ LQWHUHVWLQJ WRSRORJLFDO IHDWXUHV 6HQVRU 7RSRORJ\ IRU 0LQLPDO 3ODQQLQJ FDVH FRPSOH[LW\ LV VWLOO SRO\QRPLDO EXW DQG QRLVH LV QHFHVVDU\ 4XHVWLRQV OLNH 67R03 SURJUDPV 5REHUW *KULVW·V SHUIRUPDQFH WHQGV WR EH PXFK EHWWHU ´+RZ ORQJ GRHV D SHUVLVWHQFH LQWHUYDO UHFHQW DUWLFOH >@ FRQWDLQV D VXUYH\ RI WKDQ WKDW LQ SUDFWLFH 7KH PRVW VHULRXV QHHG WR EH EHIRUH EHLQJ FRQVLGHUHG VRPH UHVXOWV IURP WKH 7'$ SURJUDP FRPSXWDWLRQDO SUREOHP LV GXH WR WKH LQWHUHVWLQJ"µ DQG ´+RZ VHQVLWLYH LV D $Q H[DPSOH RI P\ RZQ DSSOLFDWLRQ QXPEHU RI VLPSOLFHV WKDW DSSHDU DV WKH SHUVLVWHQFH EDUFRGH WR QRLVH"µ PXVW EH RI SHUVLVWHQW KRPRORJ\ DSSHDUV ODWHU LQ VHTXHQFH RI WULDQJXODWLRQV LV FRQVWUXFWHG DQVZHUHG WKLV DUWLFOH DQG LQ PRUH GHWDLO LQ >@ 7KH 7KHUHIRUH WKH PRVW 
SUDFWLFDO XVH RI WKH 7KHVH SUREOHPV DUH MXVW EHJLQQLQJ JHQHUDO SURFHGXUH XVHG LQ P\ H[DPSOH LV SHUVLVWHQFH DOJRULWKP LV WR HPSOR\ D WR EH DGGUHVVHG )RU H[DPSOH LQ >@ WUDGLWLRQDO GDWD FOXVWHULQJ DOJRULWKP WR %XEHQLN DQG .LP FRPSXWH WKH H[SHFWHG ‡ 6WDUW ZLWK D GDWD VHW LGHQWLI\ FRQQHFWHG FRPSRQHQWV DQG WKHQ SHUVLVWHQFH EDUFRGHV IRU FHUWDLQ ‡ 'HÀQH D PHWULF GLVWDQFH IXQFWLRQ FRPSXWH WKH SHUVLVWHQW KRPRORJ\ RI HDFK SUREDELOLW\ GLVWULEXWLRQV RQ FLUFXODU DQG RQ WKH GDWD SRLQWV FOXVWHU LQGLYLGXDOO\ VSKHULFDO VSDFHV ,Q >@ &RKHQ6WHLQHU ‡ %XLOG D QHVWHG VHTXHQFH RI 2WKHU DSSURDFKHV KDYH EHHQ (GHOVEUXQQHU DQG +DUHU SUHVHQW UHVXOWV VLPSOLFLDO FRPSOH[HV EDVHG RQ WKH WDNHQ WR OLPLW WKH QXPEHU RI VLPSOLFHV RQ WKH VWDELOLW\ RI SHUVLVWHQFH GLDJUDPV PHWULF QHFHVVDU\ WR FRPSXWH SHUVLVWHQFH )RU RI IXQFWLRQV VRPHWKLQJ QRW DGGUHVVHG ‡ 8VH D SHUVLVWHQFH DOJRULWKP WR H[DPSOH D W\SH RI VLPSOLFLDO FRPSOH[ KHUH  ,Q WKH SURFHVV WKH\ GHÀQH D FRPSXWH SHUVLVWHQFH EDUFRGHV FDOOHG D ZLWQHVV FRPSOH[ UHGXFHV WKH IXQFWLRQ IRU PHDVXULQJ WKH ´GLVWDQFHµ ‡ ,QWHUSUHW WKH UHVXOWV

44 Clumps, Hoops, and Bubbles FEATURE

Encounter traces FRPPRQ 7KHVH HQFRXQWHUV RFFXUUHG& &DW x x 7KH SHUIRUPDQFH RI ZLUHOHVV WZR SDUWLFXODU &SRLQWV&LQ VSDFH i DQG j QHWZRUNV ZLWK PRELOH QRGHV LV LQÁXHQFHG 7KH ORFDWLRQV xi RU xj DUH QRW NQRZQ EXW E\WKHPRELOLW\RIWKHQRGHV8QIRUWXQDWHO\ LI WKH WZR HQFRXQWHUV DUH FORVH LQ WLPH QRGH PRELOLW\ LV IDQWDVWLFDOO\ FRPSOLFDWHG WKHQ WKH\ PXVW EH FORVH LQ VSDFH ,W FDQ 5HVHDUFKHUV KDYH IRFXVHG LQVWHDG RQ WKH EH GHGXFHG QRGH HQFRXQWHU SDWWHUQV WKDW WKH PRELOLW\ "WL  WM" 7 ‰ "[L  [M" 7 à YPD[ SURGXFHV 7R WKLV HQG WKHUH KDYH EHHQ 7KHUHIRUH LI WZR HQFRXQWHUV HL VHYHUDO H[SHULPHQWV WKDW WDJ SHRSOH RU Figure 11: The topology of the space DQLPDOV ZLWK ZLUHOHVV PRWHV VPDOO VKRUW affects the type of encounter patterns that DQG HM KDYH D QRGH LQ FRPPRQ WKH\ UDQJH %OXHWRRWK UDGLRV WKDW UHFRUG ZKLFK are possible. If the space is like a line, node FDQ EH FRQQHFWHG ZLWK DQ HGJH ZLWK A cannot encounter node C without one of ZHLJKW "W  W" 7KLV DOORZV XV WR GHÀQH RWKHU PRWHV WKH\ FRPH LQ FRQWDFW ZLWK them encountering node B. If the space is L M DQG ZKHQ 7KHVH H[SHULPHQWV SURGXFH like a loop, nodes A and C can encounter D PHWULF RQ WKH VHW RI HQFRXQWHUV each other without encountering node B. 
HQFRXQWHU WUDFHV D VHULHV RI GDWD SRLQWV ¨d G ( ,ee ji ) if ei and e j are connected in G d( ,ee ji )  © HDFK FRQVLVWLQJ RI WKH HQFRXQWHU WLPH DQG ª' otherwise WKH ,'V RI WKH WZR QRGHV LQYROYHG GR QRW HYHQ FRQWDLQ UHODWLYH SRVLWLRQ ZKHUH G H H LV WKH PLQLPXP GLVWDQFH (QFRXQWHU WUDFH H[SHULPHQWV LQFOXGH LQIRUPDWLRQ 7KH FRQFHSW PDNHV G L M EHWZHHQ WKH YHUWLFHV H DQG H LQ WKH WKH IDPRXV +DJJOH SURMHFW H[SHULPHQWV VRPH LQWXLWLYH VHQVH KRZHYHU ZKHQ L M ZHLJKWHG JUDSK G >@ DQG D VWXGHQW H[SHULPHQW DW 87 FRQVLGHULQJ )LJXUH  ZKLFK LOOXVWUDWHV $XVWLQ >@ :LUHOHVV /$1 WUDFHV VXFK KRZ WKH WRSRORJ\ RI D VSDFH KDV DQ HIIHFW 7KH UHYHUVH RI WKH LPSOLFDWLRQ DERYH WKH 0,7 WUDFH >@ WKH 8&6' WUDFH RQ WKH W\SHV RI HQFRXQWHU SDWWHUQV WKDW DUH LV QRW YDOLG 7ZR HQFRXQWHUV PD\ KDSSHQ >@ DQG WKH 'DUWPRXWK WUDFH >@ DUH SRVVLEOH DW WKH VDPH SK\VLFDO VSRW EXW EH IDU DSDUW DOVR FRPPRQO\ UHSXUSRVHG IRU XVH DV Defining a metric on an LQ WLPH %HFDXVH RI WKLV LW LV EHWWHU WR WKLQN HQFRXQWHU WUDFHV >@ encounter trace RI WKH VSDFH EHLQJ VWXGLHG DV WKH SK\VLFDO VSDFH FURVVHG ZLWK WLPH 7KLV LGHD ZDV 7KH GDWD SRLQWV LQ DQ HQFRXQWHU %XLOGLQJ D ZHLJKWHG JUDSK RQ HYLGHQW LQ )LJXUH  )RU H[DPSOH WUDFH FRQVLVW RI WKH ,'V RI WKH WZR WKH VHW RI HQFRXQWHUV ZLOO JLYH ULVH WR D LI WKH QRGHV DUH PRYLQJ RQ D FLUFOH ZLUHOHVV QRGHV LQYROYHG DQG WKH WLPH RI PHWULF 7KLV PHWULF ZLOO EH GLIIHUHQW IURP ; 6 ½ ZLWK FRRUGLQDWHV [\  WKH HQFRXQWHU )RU H[DPSOH D VHFWLRQ RI WKH (XFOLGHDQ PHWULF DVVRFLDWHG ZLWK WKH WKHQ WKH VSDFH ZKRVH WRSRORJ\ VKRXOG WKH WUDFH FRXOG ORRN OLNH SK\VLFDO VSDFH RI WKH H[SHULPHQW EH UHFRQVWUXFWHG LV D F\OLQGHU ZLWK $VVXPH WKDW DQ HQFRXQWHU WUDFH Node ID 1 Node ID 2 FRRUGLQDWHV [\W  ; = ½  ZKLFK LV FRQWDLQV 1 GDWD SRLQWV RI WKH IRUP 9:42:30 20 12 KRPRORJLFDOO\ HTXLYDOHQW WR ; VR WKH H W  QRGH$  QRGH%     %HWWL QXPEHUV RI WKH SURGXFW VSDFH EHLQJ 9:47:01 72 31 H W  QRGH$  QRGH%     UHFRQVWUXFWHG ZLOO EH WKH VDPH DV WKH %HWWL 9:47:21 58 20  QXPEHUV RI ; 10:02:55 64 45   Building a witness complex ...... 
H W  QRGH$  QRGH% 1 1 1 1 $ VHW RI GDWD SRLQWV !HL#L 1 ZLWK

7KHVH GDWD SRLQWV SURYLGH YHU\ OLWWOH +HUH HL UHSUHVHQWV WKH LWK HQFRXQWHU D PHWULF RQ WKHP LV QRZ LQ SODFH 7KHUH

LQIRUPDWLRQ ,Q SDUWLFXODU WKHUH LV QRW DQ\ FRQVLVWLQJ RI WL WKH WLPH WKH HQFRXQWHU DUH D YDULHW\ RI ZD\V WR EXLOG D QHVWHG

explicit information about the locations of the nodes. Persistent homology techniques are used to deduce information about the topology of the space the nodes live in from the encounter trace. The same techniques can be used to detect certain changes in the space. The idea is summarized in Figure __. Surprisingly, physical information can be deduced from data points that […]

[…] took place, and $node_{A_i}$ and $node_{B_i}$ are the two nodes involved in the encounter. A maximum node velocity $v_{\max}$ is assumed. Construct a weighted graph $G$ in which the vertices correspond to the encounters $\{e_i\}_{i=1}^{N}$. Suppose two encounters $e_i$ and $e_j$ have a node in common […]

Figure 12: An example of four encounters involving four nodes and the resulting weighted graph. Each encounter becomes a vertex in the graph, and encounters with nodes in common are weighted with the time difference.

[…] sequence of simplicial complexes from these data, but building the witness complex in the manner of de Silva and Carlsson [ ] seems to give the most efficient triangulation and the best results. The concept of the witness complex is based on the Delaunay triangulation [ ]. The first step is to select a subset of landmark data points to use in the analysis. The remaining data points are used to decide which of the landmarks to connect. The construction incorporates a variable threshold $T$ that can be used to build a nested sequence of simplicial complexes.

The Next Wave • Vol 18 No 3 • 2010 45

Experiments and results

The first step focuses on encounter traces generated from simulations. This level of access makes it possible to add certain characteristics to the experiment and observe how those changes manifest themselves in the results.

One-dimensional experiments

Remember the squirrel example at the beginning of this paper. Squirrels can move fairly freely around a two-dimensional area, so their encounter traces are somewhat complex. To start with something simpler, consider an imaginary example involving the squirrel's harmless cousin, the naked mole rat (Figure 13). Naked mole rats spend their lives in networks of underground tunnels. Understanding their burrowing habits would require one of two approaches:
• excavating their burrow and destroying it in the process, or
• gathering an encounter trace from the mole rats themselves.

The encounter trace could reveal the topological connectivity of the burrow, and changes could be detected as the mole rats extend some parts and abandon others. Such research is of particular interest given the prominent role naked mole rats have played in movies [ ] and television programs [ ].

Studying the encounter traces of naked mole rats is simpler because the tunnels they live in are effectively one-dimensional spaces. Two mole rats cannot avoid each other when passing in a tunnel.

Figure 13: Naked mole rats live in networks of underground tunnels. These tunnels are effectively one-dimensional spaces. This is what the squirrel in Figure 1 would look like without fur.

Our first experiments are simulations of nodes (mole rats) doing random walks in one-dimensional spaces. Encounters are recorded the moment two nodes pass each other. A simple event-driven simulator was built to generate these data. Compare three types of 1D experiments:

• A line segment
• A single loop
• A multi-loop

The line segment experiment used __ nodes following random walks. Topologically, this space has one connected component and no higher-dimensional topological features, so $\beta_0 = 1$ and $\beta_k = 0$ for $k > 0$. Setting the witness complex threshold high enough always produces a fully contractible topology. As a result, it is difficult to quantify how well the method is working for this type of space.

In the single-loop experiments, __ nodes followed random walks in a circular space. This space also has one connected component, so $\beta_0 = 1$, but it has a single nontrivial cycle, so $\beta_1 = 1$. Building a filtered witness complex and computing its persistent homology recovers the correct Betti numbers __ percent of the time.

In the multi-loop experiments, adding __ nodes per extra loop tends to generate enough encounters to reconstruct the spatial topology. The node mobility $k$ is the same as before. The correct Betti numbers are $\beta_0 = 1$, since there is one connected component, and $\beta_1 = l$, where $l$ is the number of loops. The same technique correctly recovered this information for all two- and three-loop examples attempted. Figure 14 shows a witness complex and persistence barcode for a two-loop experiment.

Interpreting the results

These data were generated from controlled simulations, so it was known in advance how to interpret the results. The one-dimensional topological features discovered correspond to loops in the space the nodes live in. In general, interpreting persistence results is not so straightforward.

46 Clumps, Hoops, and Bubbles FEATURE
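The encounter-graph construction described earlier in this section (Figure 12) can be sketched in Python, followed by the landmark and threshold steps. In this construction, encounters become vertices, and encounters that share a node are joined by an edge weighted with their time difference.

This is a minimal illustration, not the article's implementation. Several choices are assumptions made only for this example:
- the `(time, node_a, node_b)` tuple layout;
- shortest-path distances in the graph;
- the greedy maxmin landmark rule;
- de Silva and Carlsson's "lazy" witness rule with ν = 0.

All function names are hypothetical.

```python
import heapq
import itertools

INF = float("inf")


def encounter_graph(encounters):
    """Adjacency dict over encounter indices.

    Encounters are (time, node_a, node_b) tuples (assumed layout). Two encounters
    are joined when they share a node; the edge weight is their time difference.
    """
    adj = {i: {} for i in range(len(encounters))}
    for i, j in itertools.combinations(range(len(encounters)), 2):
        ti, ai, bi = encounters[i]
        tj, aj, bj = encounters[j]
        if {ai, bi} & {aj, bj}:
            adj[i][j] = adj[j][i] = abs(ti - tj)
    return adj


def shortest_paths(adj, source):
    """Dijkstra distances from source; unreachable vertices are left out."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def maxmin_landmarks(adj, k, first=0):
    """Greedy maxmin selection: repeatedly add the vertex farthest from the chosen set."""
    landmarks = [first]
    nearest = shortest_paths(adj, first)
    while len(landmarks) < min(k, len(adj)):
        nxt = max((v for v in adj if v not in landmarks),
                  key=lambda v: nearest.get(v, INF))
        landmarks.append(nxt)
        for v, d in shortest_paths(adj, nxt).items():
            nearest[v] = min(nearest.get(v, INF), d)
    return landmarks


def lazy_witness_edges(adj, landmarks):
    """Entry threshold T of each landmark edge (a, b) in a lazy witness complex
    with nu = 0: the smallest max(d(w, a), d(w, b)) over all vertices w."""
    dists = {l: shortest_paths(adj, l) for l in landmarks}
    edges = {}
    for a, b in itertools.combinations(landmarks, 2):
        values = [max(dists[a][w], dists[b][w])
                  for w in adj if w in dists[a] and w in dists[b]]
        if values:
            edges[(a, b)] = min(values)
    return edges


def complex_at(edges, T):
    """Edges present at threshold T; these sets are nested as T grows."""
    return sorted(e for e, t in edges.items() if t <= T)


# Four encounters among four nodes, in the spirit of Figure 12.
trace = [(0.0, "A", "B"), (1.0, "B", "C"), (3.0, "C", "D"), (4.0, "D", "A")]
g = encounter_graph(trace)
marks = maxmin_landmarks(g, 2)        # [0, 3]
edges = lazy_witness_edges(g, marks)  # {(0, 3): 3.0}
```

In the lazy variant, higher-dimensional simplices are the cliques of this 1-skeleton, so sweeping $T$ upward gives the nested sequence of complexes described above.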

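The article's event-driven simulator is not given. As a rough, hypothetical stand-in for the single-loop experiment, the sketch below works as follows:
- it advances integer random walks on a circle in discrete time steps rather than events;
- each walker moves by -1, 0, or +1 per step;
- it records an encounter whenever two walkers land on the same site or swap sides.

The circle size, step rule, and names are all assumptions.

```python
import random


def passes(d_before, d_after, circumference):
    """Did two walkers on a circle meet or pass each other during one step?

    d_before and d_after are the unwrapped separations x_i - x_j before and
    after the step. A pair that was already together is not counted again.
    """
    if d_before % circumference == 0:
        return False  # already recorded when they met
    if d_after % circumference == 0:
        return True   # they landed on the same site
    # They swapped sides if the separation crossed a multiple of the circumference.
    return d_before // circumference != d_after // circumference


def simulate_loop(n_nodes, circumference, steps, seed=0):
    """Random walks on a discrete circle; returns encounters as (t, i, j) with i < j."""
    if circumference < 3:
        # The separation changes by at most 2 per step, so 3 sites avoid skipped meetings.
        raise ValueError("circumference must be at least 3")
    rng = random.Random(seed)
    # Unwrapped integer positions; wrap-around is handled by passes().
    pos = [rng.randrange(circumference) for _ in range(n_nodes)]
    encounters = []
    for t in range(1, steps + 1):
        new = [x + rng.choice((-1, 0, 1)) for x in pos]
        for i in range(n_nodes):
            for j in range(i + 1, n_nodes):
                if passes(pos[i] - pos[j], new[i] - new[j], circumference):
                    encounters.append((t, i, j))
        pos = new
    return encounters


trace = simulate_loop(n_nodes=6, circumference=40, steps=500, seed=1)
```

The `(t, i, j)` tuples have the same shape as the encounter trace described above (time plus the two nodes involved), so they can feed directly into a witness-complex analysis.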
Just as with data clustering, this analysis technique can find structure in a data set, but it cannot explain what that structure means. It can be used in exploratory analysis to better focus any further investigation. The examples in the following sections provide some cases where interpretation of the results is not so straightforward.

Figure 14: The witness complex and persistence barcode for an experiment in the two-loop space. The x and y coordinates in the witness complex plot correspond to the physical location of the encounters and are used to visualize the results, but are not used in the computation. The z-axis corresponds to the encounter time.

Detecting changes in a space

Persistent homology can detect certain types of changes in a space. For example, consider a single loop that starts out small, enlarges to a certain extent, and then shrinks down again. Assume the time for a node to circumnavigate the shrunken loop is comparable to the times between encounters on the expanded loop. Then the shrunken loop will appear contractible, and the persistent homology of the encounter complex should approximate that of a sphere, that is, $\beta_0 = 1$, $\beta_1 = 0$, and $\beta_2 = 1$.

This experiment was performed using the same simulator as before, in three phases:
• At time $t_{start}$, the graph was scaled down by a factor of __ relative to the regular size, while the node velocities stayed the same as usual.
• The graph was then scaled up at a constant rate, reaching normal size (edge length __) at time $t_{mid}$.
• Finally, the graph was scaled back down to the initial size at the opposite rate, ending at $t_{end}$.

The persistent homology algorithm does recover the correct Betti numbers. Figure 15 shows the witness complex recovered from this experiment. Because the goal was to recover the second Betti number, it was necessary to fill in higher-dimensional simplices in the witness complex. The number of simplices in a complex tends to increase rapidly with dimension, and that was the case here: there were __ 0-cells, __ 1-cells, __ 2-cells, and __ 3-cells.

Figure 15: The witness complex obtained from the expanding/contracting loop experiment. The Betti numbers of a sphere were recovered, demonstrating that persistent homology can detect certain types of changes in the physical space.

Detecting changes in a 2D space

Let us return to the squirrel example from the introduction. Squirrels are not restricted to linear and loop-shaped spaces, so it is important to determine whether topological information can be recovered from nodes living in a more general space. In one experiment, __ simulated squirrels performed discrete random walks on a bounded two-dimensional grid. After __ simulation steps, the squirrels' mobility model was changed so that the squirrels were repelled by the center of the grid. This repulsive force causes them to congregate near the boundary of the space. After another __ steps, the repulsive force was removed and the squirrels randomly filled up the grid again. The encounter complex during the random walk phases would be expected to have no real topological features. By contrast, the encounters during the middle phase, when the squirrels are repelled by the center of the grid, should have the homology of a loop (see Figure 16).

Figure 16: The witness complex obtained for the repulsion phase of the 2D random walk experiment. The number of simplices made it intractable to compute the homology of the entire data set at once.
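The Betti numbers quoted in this section can be checked on small hand-built complexes:
- $\beta_0 = 1$, $\beta_1 = 0$, $\beta_2 = 1$ for a sphere;
- $\beta_1 = l$ for a space with $l$ loops.

The sketch below computes ordinary (not persistent) Betti numbers of a single simplicial complex from boundary-matrix ranks, using $\beta_k = \dim C_k - \operatorname{rank}\partial_k - \operatorname{rank}\partial_{k+1}$. It uses coefficients in $\mathbb{Z}/2$, which is an assumption: the article does not say which coefficients its MATLAB modules used. The function names are hypothetical.

```python
from itertools import combinations


def closure(maximal_simplices):
    """All faces of the given simplices, as sorted vertex tuples."""
    faces = set()
    for s in maximal_simplices:
        s = tuple(sorted(s))
        for size in range(1, len(s) + 1):
            faces.update(combinations(s, size))
    return faces


def rank_gf2(rows):
    """Rank over Z/2 of a matrix whose rows are int bitmasks (Gaussian elimination)."""
    pivots = {}  # leading bit -> stored row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]
    return len(pivots)


def betti_numbers(maximal_simplices):
    """[beta_0, beta_1, ...] with beta_k = dim C_k - rank d_k - rank d_{k+1}."""
    by_dim = {}
    for s in closure(maximal_simplices):
        by_dim.setdefault(len(s) - 1, []).append(s)
    index = {d: {s: i for i, s in enumerate(sorted(ss))} for d, ss in by_dim.items()}

    def boundary_rank(d):
        # Boundary map from d-simplices to (d-1)-simplices; zero for d = 0.
        if d <= 0 or d not in by_dim:
            return 0
        rows = []
        for s in by_dim[d]:
            mask = 0
            for face in combinations(s, d):  # drop one vertex at a time
                mask |= 1 << index[d - 1][face]
            rows.append(mask)
        return rank_gf2(rows)

    top = max(by_dim)
    return [len(by_dim[d]) - boundary_rank(d) - boundary_rank(d + 1)
            for d in range(top + 1)]


sphere = list(combinations(range(4), 3))  # hollow tetrahedron
figure_eight = [(0, 1), (1, 2), (0, 2), (0, 3), (3, 4), (0, 4)]
print(betti_numbers(sphere))        # [1, 0, 1]
print(betti_numbers(figure_eight))  # [1, 2]
```

The hollow tetrahedron plays the role of the sphere. The figure-eight is a two-loop space with $\beta_1 = l = 2$.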

Remarkably, the expected features were recovered for both phases. Combining all three phases into one computation should yield a nonzero second Betti number, much like the expanding/contracting loop experiment. Unfortunately, the number of simplices involved made the combined computation intractable with the MATLAB modules.

Experiments with real encounter data

The Haggle Project encounter data includes data from three experiments. The methods from the previous sections were applied to the data from the Cambridge Computer Lab experiment. According to the documentation, the experiment was conducted over seven days in January __ at the University of Cambridge Computer Lab. Nineteen iMotes were carried by graduate students from the System Research Group. Only __ of the mobile motes yielded usable data, and an additional __ external Bluetooth devices appeared in the traces. For the analysis, encounters with external devices were filtered out and the data were sorted by encounter time. This sorting assumes that the iMote clocks were sufficiently well synchronized for a single time order to be meaningful.

Persistent Betti numbers were computed for dimensions 0 and 1 for each day's data individually. The persistence diagrams for the first day are shown in Figure 17. A single fairly persistent cycle on the scale of 50-65 minutes was observed in the results for each day. The cycle was particularly prominent on day one, but similar features appear in days two and three.

Figure 17: The persistence barcodes from the first day of the Haggle Cambridge Computer Lab experiment. A fairly persistent 1-cycle on the scale of 50-65 minutes was observed. These results cannot conclusively be explained, but they do demonstrate how persistent homology can reveal topological structure in real encounter trace data.

In this case, the observed features are probably not due to the space the experiment was conducted in, though that cannot be ruled out either. Based on the simulation experiments, many more mobile nodes would be required to reliably identify a spatial cycle. The most likely explanation for these results is some sort of scheduling.
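The preprocessing described above has three steps: drop encounters with external devices, sort by encounter time, and analyze each day separately. It can be sketched as below.

Several details are assumptions:
- the record layout `(timestamp_seconds, device_a, device_b)`;
- the set of mobile mote IDs;
- day boundaries measured from a chosen start time.

The Haggle files use their own format, which this sketch does not parse.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86_400


def prepare_trace(records, mobile_ids):
    """Drop encounters involving external devices, then sort by encounter time.

    records: iterable of (timestamp_seconds, device_a, device_b) tuples (assumed layout).
    mobile_ids: IDs of the carried motes that yielded usable data.
    """
    kept = [r for r in records if r[1] in mobile_ids and r[2] in mobile_ids]
    # Sorting across devices is only meaningful if their clocks are well synchronized.
    return sorted(kept, key=lambda r: r[0])


def split_by_day(encounters, start_time):
    """Group a time-sorted trace into per-day lists, day 0 starting at start_time."""
    days = defaultdict(list)
    for t, a, b in encounters:
        days[int((t - start_time) // SECONDS_PER_DAY)].append((t, a, b))
    return dict(days)
```

Each per-day list can then be analyzed on its own, as was done for the daily persistence diagrams.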

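A persistence barcode records, for each topological feature, the threshold at which it appears and the threshold at which it disappears. For dimension 0 (connected components) on a weighted graph, the bars can be computed with a union-find pass over edges sorted by weight, essentially Kruskal's algorithm.

The sketch below covers only this dimension-0 case. The 1-cycles discussed above need a boundary-matrix reduction, which is not shown. Function names are hypothetical.

```python
import math


def zero_dim_barcode(n_vertices, weighted_edges):
    """Dimension-0 persistence bars (birth, death). All vertices are present from
    threshold 0; each edge (weight, u, v) enters at its weight."""
    parent = list(range(n_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    bars = []
    for w, u, v in sorted(weighted_edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # two components merge, so one bar dies at w
            parent[ru] = rv
            bars.append((0.0, w))
    # One infinite bar per component that never merges.
    bars.extend((0.0, math.inf) for _ in {find(x) for x in range(n_vertices)})
    return bars


def betti0_at(bars, T):
    """Number of components at threshold T: bars with birth <= T < death."""
    return sum(1 for b, d in bars if b <= T < d)
```

Long bars are the "persistent" features. Short bars correspond to components that merge almost as soon as the threshold starts to grow.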
For example, a group of mote carriers may encounter another group at a meeting or class, and then another encounter may take place at lunch. Unfortunately, not enough is known about the particular experiment […] be worth performing a similar experiment that records more details and is perhaps more tightly controlled, to see how different behaviors affect the persistent homology results.

These results show that topological analysis methods such as persistent homology can find structure in real encounter data that may not have been accessible via traditional statistical methods.

Conclusions

Algebraic topology gives us powerful tools for uncovering features that may not be accessible through traditional statistical methods, and powerful and elegant ways of describing these phenomena. These tools will be useful for discovering structure in complex data.

References

T. Small and Z. J. Haas. The shared wireless infostation model: a new ad hoc networking paradigm (or where there is a whale, there is a way). In MobiHoc: Proceedings of the ACM International Symposium on Mobile Ad Hoc Networking & Computing.
P. Juang, H. Oki, Y. Wang, M. Martonosi, L. Peh, and D. Rubenstein. Energy-efficient computing for wildlife tracking: Design tradeoffs and early experiences with ZebraNet. In ASPLOS, San Jose, CA.
US Fish and Wildlife Service. Patuxent Research Refuge visitor brochure. [Online]. Available: http://www.fws.gov/northeast/Patuxent/prrinfopage.html
G. E. Bredon. Topology and Geometry. Springer.
A. Hatcher. Algebraic Topology. Cambridge University Press.
K. Mischaikow et al. CHomP: Computational homology project. [Online]. Available: http://chomp.rutgers.edu
H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence and simplification. In FOCS: Proceedings of the Annual Symposium on Foundations of Computer Science, Washington, DC, USA. IEEE Computer Society.
A. Zomorodian and G. Carlsson. Computing persistent homology. In SCG: Proceedings of the Twentieth Annual Symposium on Computational Geometry.
V. de Silva and G. Carlsson. Topological estimation using witness complexes. In Symposium on Point-Based Graphics.
Plex: Persistent Homology Computations. [Online]. Available: http://comptop.stanford.edu/
P. Bubenik and P. T. Kim. A statistical approach to persistent homology. Homology, Homotopy and Applications.
D. Cohen-Steiner, H. Edelsbrunner, and J. Harer. Stability of persistence diagrams. In SCG: Proceedings of the Twenty-first Annual Symposium on Computational Geometry. ACM.
V. de Silva and R. Ghrist. Coverage in sensor networks via persistent homology. Algebraic & Geometric Topology.
F. Chazal and S. Y. Oudot. Towards persistence-based reconstruction in Euclidean spaces. In SCG: Proceedings of the Twenty-fourth Annual Symposium on Computational Geometry. ACM.
G. Carlsson, T. Ishkhanov, V. de Silva, and A. Zomorodian. On the local behavior of spaces of natural images. International Journal of Computer Vision.
H. Edelsbrunner and J. Harer. Persistent homology: a survey. In Twenty […]
E. Carlsson, G. Carlsson, and V. de Silva. An algebraic topological method for feature identification. Int. J. Comput. Geometry Appl.
R. Ghrist. Barcodes: The persistent topology of data. AMS Current Events Bulletin.
B. Walker. Using persistent homology to recover spatial information from encounter traces. In MobiHoc: Proceedings of the ACM International Symposium on Mobile Ad Hoc Networking and Computing.
P. Hui, A. Chaintreau, J. Scott, R. Gass, J. Crowcroft, and C. Diot. Pocket switched networks and human mobility in conference environments. In WDTN: Proceedings of the ACM SIGCOMM Workshop on Delay-tolerant Networking.
S. Shakkottai. Technical report on Bluetooth mote encounter experiments. Technical report, UT Austin, in preparation.
M. Balazinska and P. Castro. Characterizing mobility and network usage in a corporate wireless local-area network. In International Conference on Mobile Systems, Applications, and Services (MobiSys), San Francisco, CA.
M. McNett and G. M. Voelker. Access and mobility of wireless PDA users. SIGMOBILE Mob. Comput. Commun. Rev.
T. Henderson, D. Kotz, and I. Abyzov. The changing usage of a mature campus-wide wireless network. In MobiCom: Proceedings of the Annual International Conference on Mobile Computing and Networking.
W.-J. Hsu and A. Helmy. On nodal encounter patterns in wireless LAN traces. In Proceedings of the Second Workshop on Wireless Network Measurements (WiNMee).
V. de Silva. A weak characterization of the Delaunay triangulation. [Online]. Available: pages.pomona.edu
E. Morris. Fast, Cheap & Out of Control [DVD]. Columbia TriStar Home Entertainment.
Walt Disney Television Animation. Disney's Kim Possible. Disney Channel. [Online]. Available: http://tv.disney.go.com/disneychannel/kimpossible/

The Information Assurance Mission at NSA

Network Integrity through Trusted Computing

Take Control of your network

Overview

One of the biggest challenges facing computer network administrators today is keeping track of the hosts on their networks. Without this knowledge, it is impossible to keep all hosts patched, up-to-date, and protected from infection and exploitation by malware. Trusted computing technologies can help administrators take control of their networks so that they can begin to address security problems. Products that leverage these technologies are becoming more and more widely available. Network owners should position themselves to take full advantage of these new products by making sure that they purchase hosts that support the full range of trusted computing technologies.

Trusted Computing Group

The Trusted Computing Group (TCG) is an industry and government consortium formed to develop and promote standards for trusted computing technologies. Among other things, it has produced specifications and guidance for the hardware TPM, the measured boot and launch of PC operating systems, and the TNC network security architecture.

Trusted Platform Module

Trusted computing technologies are included in most PC desktop systems sold today. The most common is the Trusted Platform Module (TPM). The TPM is a motherboard-based cryptoprocessor whose capabilities include secure generation and storage of cryptographic keys and generation of random numbers. An important capability of the TPM with respect to host integrity is the accumulation and secure storage of system measurements. Measurements are hashes of host software computed by the host and accumulated within the TPM. If the same components are measured at a later time and the measurements have changed, then the components have changed. This mechanism can be used to detect whether system software has been infected with malware.

Measured Boot and Measured Launch

Measurement is a powerful capability for generating information about the integrity of software and data. Many hosts that support a TPM include a Trusted Computing Group (TCG)-compliant BIOS that automatically measures the host's pre-boot environment. When compared with prior measurements, this measurement indicates whether the BIOS, boot loader, and other low-level system components have been modified since the last system boot.

Many modern microprocessors support a measured launch capability that can be leveraged to ensure the integrity of a post-boot software environment, such as a kernel or virtual machine hypervisor. Measured launch may be used in conjunction with pre-boot measurements to provide reasonable assurance that critical system components have not been modified since the last launch. This potentially powerful capability is provided by microprocessors that support Intel Trusted Execution Technology (TXT) and AMD-V virtualization.

Network Access Control

Simply measuring pre- and post-boot environments is not enough to ensure network integrity. To actually improve the security of a network, the measurements computed for individual hosts must be collected and acted upon. At the very least, measurements should be reported to system administrators, who can then decide whether action is needed. Ultimately, systems can attest their integrity to a centralized network access-control point using an architecture such as Trusted Network Connect (TNC). The control point can then decide whether the host should be allowed on the network.

Recommendations

Trusted computing technologies can provide network administrators with basic information about host integrity without expensive hardware or excessive administrative overhead. The potential benefits of trusted computing are well worth the minimal investment. Today it is hard to buy a PC that does not come with a TPM, but hosts that support measured launch are less common. When purchasing new hosts, system owners should look for desktops and servers that include a TPM and support for measured launch and protected execution, such as Intel's Trusted Execution Technology (TXT) or AMD-V virtualization technology.

Hosts that support TPMs should have their TPMs turned on and activated from the BIOS. This enables measurement of the pre-boot environment and is necessary for measured launch. For more information on trusted computing and taking advantage of the TPM, see "How to Use the TPM: A Guide to Hardware-Based Endpoint Security" on the TCG website.

Enable your Trusted Platform Module

www.trustedcomputing.org

For more information, Email: [email protected]
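The measurement chain described above can be modeled in a few lines. The sketch assumes TPM 1.2 semantics: a measurement is folded into a 20-byte Platform Configuration Register (PCR) by computing SHA-1(old PCR value || new measurement digest). TPM 2.0 banks may use other hash algorithms.

This is a software model for illustration, not a TPM interface. The `admit` policy is a hypothetical stand-in for a TNC-style access-control decision.

```python
import hashlib

PCR_SIZE = 20  # bytes in a SHA-1 digest, the PCR width in TPM 1.2


def extend(pcr: bytes, digest: bytes) -> bytes:
    """Model of the TPM 1.2 extend step: new PCR = SHA-1(old PCR || digest)."""
    return hashlib.sha1(pcr + digest).digest()


def measure_chain(components: list[bytes]) -> bytes:
    """Hash each component in load order and fold the digests into one PCR value."""
    pcr = bytes(PCR_SIZE)  # static PCRs start as all zeros at reset
    for blob in components:
        pcr = extend(pcr, hashlib.sha1(blob).digest())
    return pcr


def admit(reported_pcr: bytes, known_good: set[bytes]) -> bool:
    """Hypothetical access-control rule: admit only hosts with a known-good value."""
    return reported_pcr in known_good


baseline = measure_chain([b"bios", b"bootloader", b"kernel"])
tampered = measure_chain([b"bios", b"bootloader", b"kernel-with-malware"])
```

Because each step hashes the previous value, changing any component or its load order changes the final value. This matches the article's point that a changed measurement signals a changed component.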