motd
by Rob Kolstad Excellence Dr. Rob Kolstad has long On Debbie Scherrer’s recommendation from several years ago, I took my parents to served as editor of view the “World Famous” Lippazzaner Stallions last night at their Pueblo, Colorado ;login:. He is also head coach of the USENIX- tour stop. I had no expectations, having ridden horses a few times and even enjoyed sponsored USA Com- seeing well-made photographs of them. Most people’s favorite part is the “airs”: the puting Olympiad. jumping and rearing up (as in “Hi ho Silver, away!” from the Lone Ranger intro). Those horses were definitely coordinated, ever moreso considering they weigh in four digits. As horses go, they were excellent.
Radio Buttons and Resumes by Tina A while back I had Dave Clark, owner of a system administration recruiting Darmohray and placement company, write a series of articles about preparing for a job Tina Darmohray, co- editor of ;login:, is a search. One of his articles focused on how to prepare a good resume. While computer security and networking consultant. there are variations on the theme, for the most part, resume format has She was a founding been a fairly static entity. At least that’s what I thought until I came across member of SAGE. an interesting twist during some Web surfing the other day.
Stanford has a Web site full of information about how to apply for positions with the
June 2001 ;login: 3 letters to the editor
BEWARE OF WHAT YOU WISH FOR As far as your question is concerned, the prevent a cracked box from being used as from Frederick M Avolio prospect of using someone's computer as an attack platform. Alternatively, if you
4 Vol. 26, No. 3 ;login:
June 2001 drive down costs. Though capacity is capacity Though drive costs. down backbone requires utilizationto high a businessmodelof The expensive. very Backbones are difficultto buildandare The third bottleneckisthebackbone. ously. more thanfive to 10places simultane- and itisdifficultto peertwo networks at pipesize islimited, isinadequate: peering for thetechnology Also, works compete. there’s noincentive to peerwell —net- Peering isabottleneckbecause works. Peering to how iscritical theInternet manywith small ISPs. the Internet isbecoming more diverse, edgeof The thosenetworks isgrowing. of andthenumber theaccess traffic, 95% of Around 7,000networks account forupto createstraffic aninherent bottleneck. Thiscentralized andbandwidth. routers, switches, loadbalancers, Web servers, applicationservers, provider’s databases, The firstmileincludesthecontent andthelastmile. backbone, thenetwork points, peering the firstmile, Thisscheme hasfourbottlenecks: versa. andvice contentuser to every provider, made sure there wasapathfrom every network providers In between, other. the usersonthe “Internet,” one sideof content providers on simple: was very the Web In thebeginning, company does. andhow thatinfluences whathis built, emphasizing how itisreally Internet, the introduction to thedevelopment of abrief histalkwith started Daniel Lewin Summarized byDarrellAnderson Daniel Lewin,AkamaiTechnologies, Inc. C S KEYNOTE ADDRESS O Implementation (OSDI2000) Systems Designand 4th SymposiumonOperating S YSTEMS ONTENT AN CTOBER D IEGO I SE IN SSUES D ;login: ELIVERY 23–25, 2000 , C ALIFORNIA G LOBAL I NTERNET , USA OSDI 2000 maintaining consistency inamassively invalidation introduce problems in and distribution Object thesystem. with andallowally to humans alert to interact management needto represent datavisu- system and monitoring Also, able setting. andinanunreli-imperfect information andmust work with fault tolerant, and selection mustserver bedistributed and Datagathering selection. server This dataenablesresource management/ in real time? How canwe gatherthisdatareliably and data dowe needto predict performance? Storage? network organize servers? What How shouldacontent distribution tions: ques- then posedthefollowing Lewin consistency andavailability. improves edgedistribution addition, In 86% reduction times. indownload an with between two and46timesfaster, Akamai estimates onaverage speeds are are anditsservers free. tools, Akamai provides network monitoring andcapacity. reliability, performance, improving andbackbone), peering, mile, mostbottlenecks(first tion bypasses Edgedistribu- the endusersaspossible. content from ascloseto serving servers, data from thecontent provider to these distributing 7,000 networks thatmatter, Thegoalisto infuseall many networks. inside deploying servers distribution,” Akamaidoes “edge level, At high avery slow andunreliable. centralized delivery is bottom lineis, The theInternet would collapse. ture of thecurrent infrastruc- a cablemodem, you gave everybody if However, get faster. would expect thatthewholeInternet will one Once thelastmilegetsfast, neck. The lastmilepresents thefinalbottle- increase faster thancapacity. Demandcan thenetwork. of capacity closeto the “useful” ward proxy logs), is around 200Gbps(usagedatafrom for- thebackbone Peak utilizationof demand. itcannotkeep upwith increasing, < see For more information, usersmay besplit. because groups of Theproblem iseasilyparallelized ted. orlinksmay servers beovercommit- fail, When theseassumptions never too large. users behindany is DNSserver particular active andthenumber of are poison, popularities that utilizationsconverge, assumptions: (and sometimeswrong) Akamai’s solutionmakes afewsimple problem. Thisisahard load available resources. shouldnotover- inaccuracies, the face of andeven in tion quicklyandaccurately, wantsinforma- selection Server capacity. and distributions, predict popularity, It isdifficultto principle. uncertainty subject to theHeisenberg place, evil concludedLewin thattheInternet isan selection into usablechunks. up resource managementandserver Akamai usestheDNShierarchy to break selection. server andperforms data, distributes gathersnetwork data, mon pointsets, thesystem computes thecom- together, Pulling it enabling feasiblemeasurement. reduce thesetsize from 200Kto 6K, common points coverage problem, As aset point andenablingcorrelation. endinginacommon routes, ments of seg- measure latency of users andservers, than measure actuallatency between Rather between endusersandservers. estimatingthelatency metric, point” the described Lewin “common Later, would. tion thesameway acentralized server tion system must provide access informa- content A distribu- system. distributed http://www.akamai.com >. 11 CONFERENCE REPORTS SESSION: APPLYING LANGUAGE TECH- MC allows automated checking of many The Devil IDL is based on a few key NOLOGY TO SYSTEMS system-level properties at compile time. abstractions: ports, which serve as a com- munication point and correspond to an CHECKING SYSTEM RULES USING Engler applied an MC system consisting address range; registers, which serve as a SYSTEM-SPECIFIC, PROGRAMMER-WRITTEN of metal specifications that extend the COMPILER EXTENSIONS repository of data and are the grain of xg++ compiler, to Linux, OpenBSD, the data exchange between a device and the Dawson Engler, Benjamin Chelf, Andy Stanford FLASH machine’s protocol Chou, and Seth Hallem, Computer Sys- CPU; and variables, which serve as the code, and the Xok exokernel. By checking tems Laboratory, Stanford University programmer interface and correspond to rules, Engler’s system found over 600 collections of register fragments that are Summarized by David Oppenheimer bugs in these systems. Most extensions given semantic values, such as bounded Dawson Engler described a compiler were written in fewer than 100 lines of integers or enumerated types. extension system that allows compile- code and by individuals unfamiliar with time checking of application-specific the MC system itself. The IDL compiler checks the consistency rules. The extensions are written in a of a Devil specification with respect to During Q&A, several conference partici- state-machine language called metal, properties such as conformance to type pants asked about opportunities to which somewhat resembles the yacc rules, use of all declared entities, absence improve the system in areas such as specification language, and are dynami- of multiple definition of entities, and reducing the number of false-positive cally linked into a modified version of absence of overlap of port and register errors reported, handling function point- g++ called xg++. The metal specification descriptions. It then generates the neces- ers, and automatically deriving rules describes patterns in the source language sary low-level driver code, which includes from code by looking at what the code that, at compile time, cause state transi- code to check for the proper use of the “usually” does (from a static code path tions in the state machine described by generated interface. the metal specification. At the time xg++ perspective). In response to a question translates a function into the compiler’s about the feasibility of using the tool Muller described the Devil description of internal representation, it applies the throughout the software development a device driver for the Logitech Bus- metal extensions down every code path process, Engler indicated that this mouse and several other devices, includ- in the function. States corresponding to sounded like a potentially good idea, and ing an IDE disk controller. Muller has rule violations signal a potential rule vio- he pointed out that the system’s useful- found that device drivers generated using lation in the source program. This system ness is improved when programmers Devil are five times less prone to errors is one instantiation of a general concept structure their code to be as simple as than is C code, and that Devil offers neg- Engler calls Meta-Level Compilation possible, so that it is more amenable to ligible performance overhead while (MC), which raises compilation from the compiler analysis. improving programmer productivity. Muller’s vision is that hardware vendors level of the programming language to the For more information, see will supply device specifications as a “meta level” of the code itself (e.g., inter-
12 Vol. 26, No. 3 ;login: irr htsan olo pthreads thatspawns apoolof a library layer runtime The isimplemented as OS. apage to the either prefetch orrelease of and decideswhento issue arequest for OS-supplied system status the dynamic, compiler-supplied hintsand the static, layerruntime dynamically analyzes both The mediate Format (SUIF)compiler. a passintheStanford University Inter- Theimplementationis and release hints. order to decidewhere prefetch to insert andsoftware pipelining in loop splitting, analysis, locality reuseforms analysis, Thecompiler per- algorithm IRIX 6.5. June 2001 a kernel daemoncalled managementpolicymoduleand memory Thiswasimplemented asa availability. and usage, onpagelocation, information aswell as for pageprefetch andreleases, includesprimitives TheOSsupport layer. andruntime compiler analysis, support, OS three parts: The system consists of additional burden ontheprogrammer. placingany without “good neighbor” order to anout-of-core turn taskinto a theextensions madein described they Brown then environment. grammed other applicationsinamultipro- tend to have asevere negative impacton of-core aggressive taskswith prefetching out- However, on adedicated machine. achieve goodperformance improvement out-of-core applicationscould the code, page prefetchesanalyze andinsert into by usingthecompiler to that, had shown An earlierwork by thesameteam policy. management memory default virtual rates causedby system’s theoperating tions often sufferfrom page-fault high Such applica- applications. “out-of-core” termed largedatasets, problems with computational sumption behavior of con- memory pointing outtherapid Angela Demke Brown beganhertalkby Summarized byTep Narula Mowry, Carnegie MellonUniversity Angela DemkeBrownandTodd C. P C T MN THE AMING HYSICAL OMPILER M -I ;login: NSERTED EMORY M EMORY I R NTELLIGENTLY LAE TO ELEASES H OGS releaser : U SING M on SGI ANAGE OSDI 2000 A P scheduling three threads ontwo CPUs. ( such algorithm, Abhishek how illustrated one Chandra processor systems? work well butdothey onmulti- in use, algorithms There are anumber of weight. toallocates proportional CPUbandwidth each applicationand ates with aweight It associ- addresses theserequirements. that scheduling algorithms class of Proportional-share scheduling isone tems. for efficientimplementationinreal sys- low overheadsapplications andachieving isolating misbehaving oroverloaded Otherrequirements include allocation. resource fairandproportional vide OSmechanismslenge isto to design pro- Akey chal- multiprocessor servers. large, Applications are often hosted on games. and e-commerce, streaming, HTTP, such as and multimedia applications, fordiverseScheduling isimportant Web Summarized byDarrellAnderson Corporation chusetts, Amherst;PawanGoyal,Ensim ofMassa- and MicahAdler,University Abhishek Chandra,PrashantShenoy, M A S SESSION: SCHEDULING scenarios. multiprogrammed improvements bothindedicated and showed performance significant theresults Overall, for theexperiments. fourprocessors 200with wasused Origin An SGI suite plustheMATVEC kernel. taken from theNAS Parallel benchmark five applications out-of-core version of wereExperiments usingthe performed buffered release. and priority-based aggressive release mented inthelibrary: There are two release policiesimple- theapplicationspace.requests within to handletheprefetch andrelease SFQ URPLUS GRTMFOR LGORITHM ULTIPROCESSORS ROPORTIONAL ), can lead to starvation when canleadto starvation ), F AIR S CHEDULING S -S Start-Time Fair Queuing YMMETRIC HARE CPU S : CHEDULING Surplus FairSurplus Scheduling processors. whenschedulingdown across multiple on uniprocessor systems butbreaks correctly SFQperforms Again, problem.” calls the which Chandra “short jobs jobs, short of anddepartures arrivals A second problem underfrequent arises rithms. easilyintorated existing scheduling algo- andcanbeincorpo- CPUs inthesystem, toin timeproportional thenumber of running Weight readjustment isefficient, overall ratios. weight preserving width, thread’s CPU’s to weight asingle band- canbemadelimitingany one ment,” or readjust-“weight simple correction, A ing isdifferent from actual allocation. where theaccount- assignment,” weight “infeasible isaresult of This starvation multiprocessors. Does your Does algorithm multiprocessors. notwork well on will uling algorithm You claimaproportional-share sched- Q: weight. tickets to donottranslate scheduling sor, Onamultiproces- on amultiprocessor. shareproportional whenapplieddirectly scheduling hasproblems Lottery with A: tems? extensions formultiprocessortrivial sys- have nearly scheduling, such aslottery Don’t many scheduling algorithms, Q: uling overhead. isolationatmodestadditionalsched- rior SFSprovides supe- scheduling overhead. forisolationand timesharing with SFSwascompared Second, allocation. tion tests nearoptimal where SFSyields presentedChandra alloca- proportional andmultiprocessoring onsingle systems. jobsproblem seesfairschedul-the short increasing surplus, threads inorder of By scheduling foreach thread. surplus computing thescheduling the idealshare, which itcompares with share, bandwidth scheduler measures processor observed the With thisalgorithm, this problem. ( SFS ) addresses 13 CONFERENCE REPORTS scale? What happens to the overhead requires coordination between medium- work is a study of proposed algorithms with a very large number of processors? and long-term schedulers. and is a negative result. A: We have developed some algorithms Corbalán used the NANOS execution Instantaneous power consumption of independent of the number of proces- environment on a shared memory multi- CMOS components is proportional to sors. processor. NANOS uses a queuing system the square of voltage, times frequency. and CPU Manager to schedule OpenMP Batteries will perform better if drained at Q: How important is the virtual time in parallel applications. The dynamic a lower rate. Secondly, frequency depends your algorithm? performance analysis is done by the Self- on voltage. Greater voltage permits A: The idea is basically heuristic. Analyzer, a tool that estimates execution higher frequency, to a point. Lower fre- time for processes. quencies provide more than linear reduc- Q: It doesn’t seem that the problem is tion in power needs. It is better to run a inherent in SFQ. It seems you need to The SelfAnalyzer remembers baseline system slowly and steadily than to run it have a notion of a global virtual time, performance results for two and four as fast as possible and then shut the rather than weight readjustment. processors, which are then used to pre- processor off. Clock scaling algorithms dict performance. Performance-driven A: We looked at a few algorithms we require load prediction and speed adjust- processor allocation is space sharing, could apply, but did not come up with a ment. allocating for acceptable efficiency. simple answer. Processes run to completion with mini- Adding to prior work, Neufeld’s contri- For more information, see mum allocation of one processor. bution includes a real implementation
execution. One solution involves a priori POLICIES FOR DYNAMIC CLOCK SCHEDULING characteristics would translate well to calculation by measuring multiple execu- Dirk Grunwald and Philip Levis, Univer- clock scaling. In practice, the algorithm tions, using previous results to predict sity of Colorado; Keith Farkas, Compaq performed only marginally better. later performance. Julita Corbalán pro- Western Research Laboratory; Charles The negative result: weighted averaging posed an alternative approach where Morrey III and Michael Neufeld, Univer- methods do not appear to work well. processors are allocated to those applica- sity of Colorado What went wrong? Averaging attenuates tions that can take advantage of them. Summarized by Darrell Anderson oscillations, but does not remove them. This runtime dynamic performance “Saving energy or power is important, Larger interval history might help, but analysis approach, called Performance- both for battery life and at the micro- would reduce responsiveness. Even if Driven Processor Allocation (PDPA) architecture level for clock speeds,”said tuning were possible, it would be fragile. Michael Neufeld, as he indicated that this Also, a linear change in frequency doesn’t
14 Vol. 26, No. 3 ;login: June 2001 E ing authors callthis process The request would take. time theoriginal the affecting for otherreads without therotational latency can beused ever, How- seektimeisinevitable. The tively. respec- time andtherotational latency, Theseare theseek rotate to thediskhead. and waitingforthedesired sectors to timeseekingthedesired track amount of disk headsto spend arelatively large Typical requires diskusegenerally the ground reads andwrites. rotational latencies getsusedforback- spenton onessothattime normally ority requests between high-pri- low-priority by isextracted bandwidth scheduling the This “free” pletion islessimportant. com- must getdonebutwhosetimeof tasksthat low-priority ground requests, andback- thedisk, workload of ority andhigh-pri- which includethenormal foreground requests, requests: pools of two Theirmethoddescribes requests. theoriginal timesof theservice affecting disk’s potential without bandwidth a anadditional20–50%of can extract scheduling diskaccesses that method of His presentation outlineda width. diskband- need forbetter utilization of Ganger pointed outasheexplained the mance Greg inmany computer systems,” “Disk drives limitperfor- increasingly Summarized byMacNewbold Hewlett-Packard Labs Carnegie MellonUniversity;ErikRiedel, R.Ganger,Gregory DavidF. Nagle, Christopher R.Lumb,JiriSchindler, D T SESSION: STORAGEMANAGEMENT usefulforexperimentation. very hardware would be of range Awider ties. has onlylimited voltage-scaling capabili- Neufeld pointsoutthatexisting hardware explanation. The authorsdon’t have aconclusive always meanalinearchange time. inidle XTRACTING OWARDS ISK . D RIVES H ;login: IGHER “F REE ” B D ISK NWDHFROM ANDWIDTH -H EAD freeblock schedul- U TILIZATION B USY : OSDI 2000 nAI SLEDsallow applicationsto An API, (SLEDs) Descriptors Estimation Latency authors proposed theconcept of the To address theabove problem, access to RAM. that of access to atapeand between latency of 11-orders-of-magnitude difference motivation forthiswork wasthe mary Rodney Van Meter indicated thatthepri- Summarized byVijay Gupta nia, Berkeley tion; MinxiGao,UniversityofCalifor- Rodney Van Meter, QuantumCorpora- L increase diskbandwidth. beneficialandcouldbe very significantly itcould disk-drivemodern technology, ing indeedproves to becompatible with freeblock schedul- If low-level interfaces. andthelackof their internal algorithms given thecomplexity of diskdrives, ern mod- scheduling canbedoneoutsideof this It remains to beseenhow much of utilization. theoriginal times thatof 10 utilizationontheorderbandwidth of disk alsoshow They increases of lized. tial free canactuallybeuti- bandwidth thepoten- of over half has beenscanned, andeven untiltheentire disk scanned, thediskhasalready been 90% of total potential) canbeutilized untilover potential free (35–40%of bandwidth thefull entire diskinthebackground, that foraprocess thatwantsto read the demonstrate They are promising. very The actualresults publishedinthepaper or prefetching andprewriting. internal storage optimization, scanning, requirements includethosethatperform Applications that fitthese footprints. andhave smallworking memory access, require order of noparticular blocks, desired have largesetsof low priority, appropriate forthisare processes thatare Tasks thatare most ground requests. requests thatfitwell between thefore- torelies findbackground ontheability freeblock scheduling The effectiveness of ATENCY M NGMN IN ANAGEMENT S TORAGE Storage S YSTEMS . utilities such as modifiedseveral UNIX they addition, In ioctl optionswhich return SLEDsdata. authorsaddednew The Linux kernel. the The SLEDswere addedto v2.2.12of largegains. yields Thus reordering I/Os two hard faults. in thesecond passsothatthere are only SLEDreorders reads On theotherhand, the second have passwill five hard faults. then isLRU, page replacement strategy the If andthefilesize isfive pages. buffer, Suppose there are three pagesinthe file. which makes two sequentialpassesover a anapplication theexample of giving SLEDsby Van Meter motivated the useof are given by theOSto theapplications. SLEDs given by applicationsto theOS, Whereas hintsare prefetching). informed hints(which are usedintransparent of SLEDs are to thenotion complementary thestorage system. dynamic state of the understand andtake advantage of grep SLEDs. the benefitof (which isaGUIfilemanager)to assess common. which are becoming increasingly tems, issuesforheterogeneousimportant sys- somevery thepaperraised Overall, theI/Olibrary.optimized with wasalreadythe factthatprogram reduction inexecution timeinspite of Thisreordering achieved 11–25% tions. was modifiedto reorder theI/Oopera- Theapplication 100,000-line I/Olibrary. It hasa CandFortran. 840,000 linesof which ismadeupof tion (LHEASOFT) anastronomy applica- plex example of com-Then Van Meter presented alarge, skipped. thefilesystem would be of portion thenthat beyond thespecifiedlatency, thefilesystem isgoingto be of portion thelatency to access some cate isthatif search trees; ct.Tefaueo the Thefeature of icate. reorders directory and prunes find find uses their , wc wc , -latency grep reorders I/O; -latency and predi- GMC pred- 15 CONFERENCE REPORTS A LOW-OVERHEAD, HIGH-PERFORMANCE To wrap up, Noh showed the results for user. Typically this is solved by creating UNIFIED BUFFER MANAGEMENT SCHEME THAT trace-driven simulations and showed an account locally for the non-local user EXPLOITS SEQUENTIAL AND LOOPING graphs for how UBM takes into account by sharing passwords, or by installing a REFERENCES the looping references. gateway that is implicitly trusted by the Jong Min Kim, Jongmoo Choi, Jesung server, all of which we know have many Kim, Sang Lyul Min, Yookun Cho, and SESSION: SECURITY weaknesses. The solution proposed by Chong Sang Kim, Seoul National Uni- Howell is a system for delegating author- versity; Sam H. Noh, Hong-Ik Univer- HOW TO BUILD A TRUSTED DATABASE ON ity in a way that the server has a complete sity; UNTRUSTED STORAGE Umesh Maheshwari, Radek Vingralek, proof of correctness and an audit trail for Summarized by Tamara Balac and William Shapiro, STAR Lab, the access the remote user was given. This talk focused on a new solution to an InterTrust Technologies Corporation old problem. The problem is determining The system is based on delegations. For Summarized by Tamara Balac which pages the OS should cache in the instance, local user Alice wants to give main memory. The motivation for their Digital rights management protects remote user Bob access to some of her solution approach was that LRU has rights of the provider (i.e., database bal- files. So she delegates her authority over problems for sequential and looping ref- ances, contracts). Existing systems are those files to Bob. Then when Bob asks erences. If the access is purely sequential, missing trusted storage in bulk. Umesh for file X in that set of files, the server is then caching recently used pages is Maheshwari presented a Trusted Data- presented with a collection of statements: wasteful. If the access is looping, then Base system (TDB) that resists accidental first the request, “Bob wants to access file LRU cannot figure it out. To overcome and malicious corruption by using X,”then the delegation, “Bob speaks for this difficulty, they proposed a new con- Crypto Basis, which leverages persistent Alice concerning file X,”and finally the cept called Inter-Reference Gap (IRG). storage in a trusted environment. basis for delegation, “Alice owns file X.” The server then can make a decision IRG for a block is the difference between The TDB architecture consists of three about allowing Bob to access X, instead the points when the block is successively layers: a Collection Store, an Object of relying on one or more gateways to accessed. Store, and a Chunk Store. The Chunk decide. In this scheme, gateways are Store provides trusted storage for vari- The types of references which the authors required only for translation and relay of able-sized sequences of bytes which are considered are: sequential, looping, and requests and proofs. Of course, at this the unit of encryption and validations other. They developed a new scheme, point, the client trusts the gateway not to (100B–10KB). The Object Store manages called unified buffer management (UBM), abuse its authority. To deal with this, a a set of named objects. The Collection which includes automatic detection, gateway authentication scheme can be Store manages a set of named collections replacement, and allocation, to exploit used. these references. of objects. One advantage of a system like this is Performance evaluation demonstrated In automatic detection, they take into that the gateway is very simple. It only that Crypto overhead is small compared account “fileID, start, end, period” for needs to carefully quote each request and to I/O and that TDB performs well com- references. Initially, the period is infinity. only use Alice’s authority for Alice’s pared to commercial packages (e.g., As more references are encountered, the requests. It need not make access deci- XDB). period is adjusted. For sequential access, sions. This also allows for multiple gate- the period always stays at infinity. But in END-TO-END AUTHORIZATION ways to bridge the gap between client and the case of looping references, there is a Jon Howell, Consystant Design server, and ultimately the server is the definite period. For replacement, they Technologies; David Kotz, Dartmouth one who grants or denies access. adopt different schemes depending on College the type of reference. For sequential The implementation of the authorization Summarized by Mac Newbold access, they use MRU; for looping refer- scheme uses Simple Public Key Infra- ences, they use a period-based replace- In a very entertaining presentation, Jon structure (SPKI) and is part of the ment scheme; and for other references, Howell explained the need for an end-to- Snowflake project. The paper includes they use LRU. The allocation scheme is a end authorization scheme. Currently, performance evaluations that reflect bit too mathematical to explain here, when a local user needs to grant access to some additional overhead for the end-to- although the intuition involves using a resource to a non-local user, it creates end authorization but points out that a marginal gains and IRG. an authentication problem, because the large part of the added overhead is server doesn’t know about the non-local directly attributed to slow and inefficient
16 Vol. 26, No. 3 ;login: IN June 2001 tional systems use guessing, while S4’s while tional systems use guessing, ventional systems showed that conven- S4andcon- comparison of Diagnostic accidental filemodifications). recoveryallowing from accidents (like and enabling speedy recovery, mises, compro- to analyze security opportunity S4includeproviding an The benefitsof detection window). time(called for aguaranteed amountof andto store these write, tions) onevery creating anewversion (calledcollec- by add internal reasoning andauditing, next The step isto system. operating rate hardware asepa- thatruns rate piece of S4isasepa- allmodifications. of history modifications by providing acomplete thatprevents undetectable named S4, a of tion John presented Strunk animplementa- ALL stored information. recovery canmanipulate isthatintruders and diagnosis the majorproblem with Nevertheless, andrecovery. do? Diagnosis What are system to administrators ing. comput- modern areIntrusions of afact Summarized byTamara Balac Mellon University Soules, Gregory R.Ganger, Carnegie Michael L.Scheinholtz,CraigA.N. John D.Strunk,GarthR.Goodson, S resource forend-to-end authentication. valuable nique ispotentially avery Theseresults show thattheirtech- SSL. proving steps thatare by notperformed as SSLplusasmalloverhead forthe mized Snowflake would aswell perform itisexpected thatanopti- operations, expensiveputationally cryptographic which are com- mostof byformed SSL, closelycorrespondvery to thoseper- Because Snowflake steps performs that authorization. HTTP implementation of almost aswell asaJava andJava SSL would perform mized SPKIlibraries It isestimated thatopti- SPKI libraries. ELF C -S OMPROMISED ECURING Self-Securing StorageSelf-Securing System, ;login: S TORAGE S YSTEMS : P ROTECTING D ATA OSDI 2000 venient becauseyou canaccess itfrom It iscon- onecanpublishany data. this, With read-only data. access to public, scalable tion system providing secure, Thisactsasacontent distribu- servers. replicatedto bewidely onuntrusted only filesystem propose they a So, the user. and itrequires thecontinued attention of mostusersignore signatures, purpose; PGPisthatitnotgeneral problem with The packages contain PGPsignatures. software lotsof As anexample, tions. to peopleresort adhocsolu- problem, To mitigate this abreak-in. chance of thegreater the more replicas you have, system isthatthe issue inadistributed Another andconvenience. reliability, ity, scalabil- performance, ple avoid security: He alsogave reasons why peo- network. installinganOSover the an example of by providing David Mazières off started presented atSOSP1999. (SFS) thatwasbuiltby theauthorsand Thiswork usesasecure filesystem tees. guaran- orauthenticity nointegrity with public content ontheInternet comes of thevastmajority Unfortunately, sions. software installationto investment deci- from licly available dataforeverything Internet rely usersincreasingly onpub- The motivation forthistalkwasthat Summarized byVijay Gupta Science Mazières, MITLaboratoryforComputer Kevin Fu,M.FransKaashoek,andDavid F F was lessthan15%. theperformance overhead importantly, More are possible. detection windows, even multi-week windows, detection showedFeasibility evaluation thatlarge time. point of thestorage atany can recreate thestate of anS4administrator Additionally, events. storage audit logshows thesequence of S AND AST ILE S YSTEM S ECURE which hasbeendesigned , D ISTRIBUTED self-certifying read- self-certifying R EAD -O NLY tion shows that Aperformance evalua- the SFSsystem). < see For more information, tothe server prevent break-ins. re-signing dataandputtingthemonto periodically of mentioned thenecessity Mazières also Finally, secure Web server. which is92timesbetter thana on aPC, 1,012 short-lived connections persecond as two daemons( The read-only filesystem isimplemented reader isreferred to thepaper. theinterested details aboutthescheme, For hashes. resistant cryptographic This isaccomplished usingcollision- ontheclientsislow. cryptography of andtheoverhead onservers, performed are operations that nocryptographic The goodthingabouttheirapproach is data. of SFSchecks theauthenticity tions, before dataisreturned to theapplica- pick theirdatafrom thesemachines and Clientmachines machines. untrusted replicatesthen widely thedatabaseon Theadministrator system’s private key. itofflineusingthefile signs and digitally afilesystem’s contents ates adatabaseof cre- anadministrator In theirapproach, infrastructure. distribution publishing hasbeenseparated from the It isscalablebecause any application. mosbe Another alternative isacon- impossible. nearly accurate billingandgoodsecurity andmakes andclients, servers, routers, requires in support guarantees, delivery hasno be usedforlive transmission, It canonly but ithasseveral weaknesses. OneoptionisIPmulticast, the Internet. current multicasting solutionsin with theproblems Overcast addresses many of Summarized byMacNewbold O’Toole Jr.,CiscoSystems Johnson, M.FransKaashoek,JamesW. John Jannotti,DavidK.Gifford, L. Kirk O O SESSION: NETWORKING http://www.fs.net VERLAY VERCAST N : R ETWORK ELIABLE sfsrosd sfsrocd >. M LIATN IHAN WITH ULTICASTING can support and sfsrosd in 17 CONFERENCE REPORTS tent distribution system, which provides ered dead. Through the use of “death cer- just one TCP connection. But HTTP uses on-demand content. But it doesn’t per- tificates” and “birth certificates,”it noti- four connections in parallel between the form so well with live content, often is fies the hierarchy of changes in topology. same two endpoints in the Internet. Fur- not scalable, and doesn’t allow for nodes A key to scalability here is a system for thermore, not all applications necessarily to be added or deleted on the fly. quenching messages that aren’t necessary need the reliability of TCP, so such appli- for nodes further up the hierarchy. Over- cations use UDP. To make them TCP- John Jannotti indicated that Overcast cast network usage for these messages friendly, they end up using some provides a unified reliable multicast solu- scales sublinearly, and space usage on the home-grown congestion control. The tion. It is self-organized, handles live, root node is linear, but even in a group of goal of this work is to have some kernel- time-delayed, and on-demand content, millions of nodes, total RAM cost for the level mechanisms to facilitate TCP and any HTTP client can join without root would be under $1,000. friendliness. modification. As an overlay network, it is also incrementally deployable and Jannotti referred to the paper which also Andersen showed where the congestion requires no special support from the outlines a system for replicating the root controller occurs in the Linux kernel, underlying network or its routers. It uses node to increase reliability of the root where they implemented their scheme. HTTP over TCP port 80 and has other server itself. The system uses techniques They use callbacks for orchestrating features that make it compatible with used by normal redundant server setups, transmissions and application notifica- NATs, firewalls, and HTTP proxies. such as DNS redirection or round-robin tion. The clients for these callbacks are load balancing and, in the case of a failed in-kernel TCP clients. One key to the Overcast design is the server, IP address takeover. The Overcast- process of building an efficient multicast The implementation handles flow con- specific solution is to replicate root state distribution tree. The source of the mul- trol and congestion control. It also has a by setting up the servers linearly, with ticast is designated as the root of the tree, user-level rate callback API, called libcm, each replicated root being the only child and any nodes that want to join contact which tells if bandwidth goes up by some of the root node above it, so that the rest that root node and attach to the tree as factor (e.g., a factor of 2). of the hierarchy is descended from each its child. Periodically, each node consid- of the root nodes. This way when one Andersen addressed evaluation issues ers its “sibling” nodes and “grandparent” fails, another can immediately take over such as the impact on the network and nodes as possible new parents. If it finds without any interruption of service or on the connections. Their approach has a that its connection to a sibling node is loss of state. positive effect on the TCP friendliness of better than its connection to its current host-to-host flows, although the parent (for instance, in terms of latency, Jannotti concluded that the reliable mul- throughput of the connections may be bandwidth, or hop counts), it makes that ticast solution proposed in Overcast is a slightly worse. sibling node its new parent. It could also very feasible solution in terms of deploy- find that a connection directly to its ability, scalability, efficiency, flexibility, For testing the flow integration, they grandparent would be a better connec- and usability in real-world situations. used a series of Web-like requests on the tion than through its current parent, and Utah testbed (see it would, in effect, become a sibling to its SYSTEM SUPPORT FOR BANDWIDTH
18 Vol. 26, No. 3 ;login: June 2001 ( of overview an John beganthetalkwith Griffin Summarized byVijay Gupta F. Nagle,CarnegieMellonUniversity R.Ganger,andDavid Schlosser, Gregory John LinwoodGriffin, StevenW. MEMS-B O SESSION: STORAGEDEVICES >. < see For more information, have ananswer. responded Griffin thathedidnot MTTF, for while upcoming ASPLOS paper, hereferred to their requirements, For power expected inafewyears. andhereplied thatitcould be systems, asked whenwe could seetheminreal Someoneelse be cheaper thanhard disk. saidthatitmight andGriffin price/GB, onepersonasked about Q&A, During large-streaming mediaobjects. andthattheborder beusedfor objects, center beusedformetadataandsmall Sotheauthorssuggestthat borders. andslower fordataatthe in thecenter, storage isthattheaccess isfaster fordata MEMS-based An interesting aspectof storage. for disk-scheduling andMEMS-based which hadthesameshape graphs of ety through Thiswasshown avari- devices. also applicableto MEMS-basedstorage work is ondisk-scheduling algorithms thetalkwasto show that of The purpose disks. lessthanwith magnitude order of isthatthe seektimeisan age devices MEMS-based stor- The real advantage of theMEMSlabatCMU. with ration results inthispaperare basedoncollabo- The alreadysuch ascar airbags usethem. Real-life systems andacademia. industry currently underdevelopment in ogy MEMSisanewstorage technol- devices. MEMS http://lcs.Web.cmu.edu/research/MEMS PERATING ) andMEMS-basedstorage ASED ;login: S YSTEM microelectromechanicals S TORAGE M NGMN OF ANAGEMENT D EVICES OSDI 2000 NA IN a request switching filter, a microproxy a or a request filter, switching networks to high-speed interpose tage of It takes advan- architecture calledSlice. anewstorage system Chasedescribed Jeff Summarized byMacNewbold Amin M.Vahdat, DukeUniversity Jeffrey C.Anderson, S.Chase, Darrell N I < see For more information, ventional approaches. throughput results unmatched by con- candeliverprototype latency and The bottom linewasthattheMimdRAID to test.simulator) thatputsthetheory MimdRAID implementation(which isa have They aprototype theoretical model. their of derivedtions which they aspart theequa- Wang went onto show someof tional delay forreads isreduced to half. andtherota- average rotational delay, themaximum to of theratio third of age seekdistance becomes lessthanone- Theaver- both seekandrotational delay. rotational replicationing with to reduce which flexibly combines strip- SR-array, goes beyond RAIDby an contributing thiswork However, improve reliability. thesystems andto throughput of write tems were made to improve theread/ RAID sys- and diskskeeps onincreasing, Since theaccess timesbetween memory disks. of proposes away to reduce theusagelevel Randolph Wang In thispaper, use them. one canbemore creative abouthow to andhence, usedto whatthey cost, tion of Large disksare now available forafrac- Summarized byVijay Gupta sity ofWashington University, ThomasE.Anderson,Univer- University; ArvindKrishnamurthy, Yale Randolph Y. Wang, KaiLi,Princeton Xiang Yu, BenjaminGum,Yuqun Chen, T NTERPOSED http://www.cs.princeton.edu/~rywang/mimdraid RADING ETWORK D ISK C A S PCT FOR APACITY RRAY TORAGE R EQUEST R UIGFOR OUTING P ERFORMANCE S CALABLE > . µ given clientallpassthrough thesame theconstraint thatrequests fora with replicated freely to provide scalability, itsfunctionscanbe Allof age servers. or inanetwork elementcloseto thestor- inanetwork interface, on theclientitself, soitcaneasilybeplaced across clients, softstate thatisnotshared amount of It maintains abounded response packets. addresses orotherfieldsinrequest and may source rewrite ordestination It able packet filter moduleforFreeBSD. It hasbeenimplemented asaload- fast. and simple, to besmall, and wasdesigned The data. of foreach type theservers cialization of around managernodesandallows spe- Thisdiverts high-volume dataflow utes. onnamespaceoperations orfileattrib- and I/Oonsmallfiles, I/O to largefiles, high-volume separated into three classes: Client requests are fileserver. a single seesitas who to aclient, volume” “virtual large cooperate to provide anarbitrarily that servers isagroup of Slice fileservice The provide scalablenetwork storage. can beusedinaLANenvironment to andsimilarapproaches longer required, nections (like Fibre Channel)isno Network faster network con- (SAN)with aspecialized Storage Area performance, recent advances inLAN Because of andcapacity.well inbothbandwidth clients thathasabackendwhich scales µ int ces Slice isalsocompatible sion to access. directories thatitsclientshave permis- compromised limitingdamagefrom a object identifiers, to protect anduseencryption boundary theserver’s trust located outsideof Thisfeature letsthe tor-based. object-based methodasopposedto sec- The storage nodesthemselves usean nodes. intervention by management further anydirectly to without thestorage array rx.The proxy. andpresents anNFSinterface to proxy, µ proxy handlesallbulkI/Orequests µ proxy routes requests µ proxy to those filesand µ proxy be 19 CONFERENCE REPORTS with sector-based storage if every µproxy For more information, see is trusted. Redundancy can also be pro-
through mirroring and striping. It can be PROACTIVE RECOVERY IN A configured on a per-file basis, and a Slice BYZANTINE-FAULT-TOLERANT SYSTEM configuration could even use redundancy Miguel Castro and Barbara Liskov, MIT at both levels for stronger protection. Laboratory for Computer Science The two types of management nodes, the Summarized by Tamara Balac directory servers and the small file Today’s computer systems provide crucial servers, take load off of the storage nodes information and services which make and allow for further specialization for them more vulnerable to malicious these types of operations. These man- attacks and make the consequences of agers are data-less, and all their state these attacks, as well as software bugs, comes from the storage arrays, so they more serious. As an alternative to the provide only memory and CPU resources usual technique of rebooting the system, to cache and manipulate the structures. this paper proposes a new means of sys- The directory server handles all lookups, tem recovery that does not use public key creation, renaming, and deletion of cryptography. directories and files and their attributes. Miguel Castro presented an asynchro- The small file servers provide more effi- nous state-machine replication system cient space allocation and file growth, as that offers both integrity and high avail- well as batching of multiple small ability and is able to tolerate Byzantine requests into larger ones for efficient disk faults which can be caused by malicious writes. The managers also help provide attacks or software errors. The paper atomicity and recovery features through presents a number of new techniques, a write-ahead log and two-phase com- like proactive recovery of replicas, fresh mits. messages, and efficient state transfer, Performance data indicates that Slice needed to provide good recovery service. does indeed scale very well. Name-inten- The task of recovery from Byzantine sive benchmarks showed directory serv- faults is made harder by the fact that the ice can be improved simply by adding recovery protocol itself needs to tolerate more directory server sites, and for any other Byzantine-faulty replicas. Attackers given configuration, the performance must be prevented from impersonating scales linearly with the number of clients. recovered replicas. The advantage of this Performance was evaluated with the system over previous state-machine industry-standard SPECsfs97 bench- replication algorithms is the use of sym- mark, and Slice kept up perfectly with the metric cryptography for authentication offered load up to its saturation point, of all protocol messages, which bypasses which can be easily raised by adding the major public key cryptography bot- more storage nodes. tleneck. Slice definitely provides a network stor- This algorithm has been implemented as age solution that is scalable, reliable, a simple interface, generic program practical, easy to upgrade, and compara- library that can be used to provide ble to similar commercial solutions in Byzantine-fault-tolerant versions of dif- terms of performance. With benefits like ferent services. these, we are sure to hear more about Slice in the near future. For more information, see
20 Vol. 26, No. 3 ;login: In additionto disk asstablestorage. for theinteractive workloads whenusing 13–40% He foundoverhead of storage. asstable system usedreliable memory 0–12%whenhisrecovery overhead of For theseapplicationshefoundan space. protocols from theprotocol number of Marks June 2001 okivrat”By injectingfaultsinto work invariant.” performance with for measuring “lose- applications: using “save-work forfour invariant,” Lowell presented performance results, over thespace. vary variables design andother recovery time, reliability, simplicity, discussed how performance, protocols onto thisspace and ber of Lowell mappedanum- ministic events. made to identify andconvert non-deter- whiletheotheriseffort events, visible madetospace commit istheeffort only this Oneaxisof upholding save-work. represents adifferent technique for Each pointinthespace protocols. ery recov- Lowell’s definesaspace of theory application state. recovery from failures thataffectthe tion state must bediscarded to allow specifies whatapplica- work invariant” The “lose- preserved to maskthefailures. fies whatapplicationstate must be speci- invariant” “save-work The parency. must beupheldto achieve failure trans- His work proposes two invariants that explicit assistance. programmer without or applicationfailures system, operating recovering applicationsafter hardware, ure-free by operation automatically fail- ating system provides of theillusion “failure inwhich transparency,” anoper- of David Lowell theabstraction described Summarized byDavidOppenheimer M. Chen,UniversityofMichigan tion; SubhachandraChandraandPeter oratory, Compaq ComputerCorpora- David E.Lowell,Western Research Lab- L E MT OF IMITS XPLORING He each applicationona ran . G ;login: F ENERIC AILURE nvi nvi , magic Lowell used , R T ECOVERY ASAEC N THE AND RANSPARENCY , xpilot and , postgres Tread- OSDI 2000 which presented fault-injectionstudy results in Lowell frequently thanapplicationfaults. lose-work less of faults causeviolation OS not manifestaspropagation failures, systemBecause operating faultsoften do cases. recoverygeneric impossibleinthose makingapplication lose-work, violate inthefield applicationcrashes 90% of he estimated greater thatperhaps than publishedfaultdistributions, result with Merging this non-deterministic faults. from applicationcrashes least 35%of caused themto lose-work inat violate ing save-work fortheseapplications He foundthatuphold- fault isactivated. late lose-work by committing after the applicationfaultsthatvio- of the fraction nvi < see For more information, opposed to software failures. lose-work forhardware failures as violating the difference inthechance of and before anerror haspropagated, gram tions asamechanism forstopping apro- rebooting applica- of about thefeasibility conference attendees asked Q&A, During the application. and musttransparently involve helpfrom tion failures cannotbeaccomplished butthatrecovery from propaga- feasible, failure forstop transparency failures is Lowell concluded thatproviding study, From this lose-work. forces of violation because upholdingsave-work often tion failures ismuch more difficult Recovering from propaga- applications. parency ispossibleformany real low-overhead failure trans- cation state, application before any corrupting appli- work since the by crash definitionthey which donot require upholdinglose- Lowell concluded thatforstop failures, respectively. OScrashes, work in15%and3%of http://www.eecs.umich.edu/Rio> and nvi postgres and postgres Lowell alsomeasured , violated lose- violated . hrceitc.Three systems have been characteristics. andrequest network, based onservice, consistencycally trade forperformance It allows applications to dynami- conits. bounds specifiedby theapplication’s layer thatenforces theconsistency TACT isimplemented asamiddleware replicas. nism formaintainingconsistency among anti-entropystyle isusedasthemecha- Bayou- is accepted to beappliedlocally. timebefore anon-localwrite amount of which specifiesamaximum staleness, and thoseupdates inthe “final image”; of applied to alocalreplica andtheordering ference intheorder inwhich updates are which boundsthedif- order error, data; the of and thevaluein “final image” data apiece of between thelocalvalueof which bounds thediscrepancy ical error, aconit are numer- The three elementsof system). airlinereservation distributed in a seatsonaflight ablockof tency (e.g., consis- specific physical unitof orlogical vector each application- associated with conits desired consistency semanticsusing Applications thatuseTACT specifytheir and optimisticconcurrency. are inthespace varied between strong and performance asconsistency policies availability, amongconsistency, tradeoffs ness —TACT allows theinvestigation of andstale- order error, error, numerical quantify theconsistency — spectrum to metrics By proposing asetof policies. tional strong andoptimisticconcurrency thansimplythetradi- rather cations, consistency appli- optionsfordistributed model proposes acontinuous of range This tency modelforreplicated services. TACT, continuous a consis- of evaluation Amin Vahdat and the design described Summarized byDavidOppenheimer University Haifeng Yu andAminVahdat, Duke S C D ERVICES ONSISTENCY SG AND ESIGN . A conit A isathree-dimensional . E M AUTO FA OF VALUATION DLFOR ODEL R EPLICATED C ONTINUOUS 21 CONFERENCE REPORTS built using the TACT platform: a bulletin so any client can work with any server for DDS, except that the partitions on the board, an airline reservation system, and any transaction, which simplifies load failed brick were available only for read- a system for enforcing quality of service balancing and request routing. ing. (QoS) guarantees among distributed This design provides incremental scala- An example of the usefulness of DDS for Web servers. For these systems Vahdat bility as nodes are added to the cluster rapid construction of Internet Services is evaluated such issues as latency for post- and was tested up to terabytes of storage Sanctio, an instant messaging gateway. It ing a bulletin board message as a func- space. Because individual nodes and translates between ICQ, AOL’s AIM pro- tion of the numerical error bound; disks might fail, each partition of data is tocol, email, and voice messaging over reservation conflict rate, throughput, and replicated on multiple nodes, forming a cellular phones. It also uses AltaVista’s reservation latency as a function of replica group, which are all kept strictly BabelFish to do language translation. inconsistency in the airline reservation coherent. Any replica can service a data During his presentation, Gribble told of system; and number of consistency mes- lookup, but any state changes must hap- using Sanctio to translate his English sages as a function of relative error for pen in all replicas. ICQ connection to an Italian AIM con- the distributed Web server QoS system. nection to communicate with a friend’s A simple algorithm is also in place to During Q&A, Vahdat indicated that his grandmother in Italy. Using DDS for its provide fault tolerance. If a replica group is currently investigating issues storage, Sanctio was completed in less crashes, it is simply removed from the such as how responsive the system is to than one person month, and the code replica group and operation continues. rapid variation in desired consistency that interacts with the DDS took less When a node joins or rejoins a replica levels and how much effort is required by than a day to develop. group, the partitions that it will duplicate a programmer to incorporate conits into are copied from existing replicas, and the This work shows that a Distributed Data an application. node is added to the replica group. Indi- Structure can provide a scalable, highly For more information, see
SCALABLE DISTRIBUTED DATA STRUCTURES The partition to be copied is locked by PROCESSES IN KAFFEOS: ISOLATION, FOR INTERNET SERVICE CONSTRUCTION the joining node, copied, and the replica RESOURCE MANAGEMENT, AND SHARING IN Steven D. Gribble, Eric A. Brewer, group maps are updated, and the locks JAVA Joseph M. Hellerstein, and David Culler, are released. Any write operations on that Godmar Back, Wilson H. Hsieh, and Jay University of California, Berkeley partition will have failed during that Lepreau, University of Utah Summarized by Mac Newbold time, but after the lock is released, retries will succeed. Summarized by Tamara Balac Building and running a cluster-based Godmar Back described the design and Internet service is hard. Steven Gribble Performance data shows that the maxi- implementation of KaffeOS, a Java vir- explained the implementation of a Dis- mum throughput of the DDS scales lin- tual machine that supports the operating tributed Data Structure (DDS) that is early with the number of bricks. Read system abstraction of a process and pro- designed to be a reusable storage layer for operations also scale linearly up to the vides the ability to isolate applications Internet services. The goals of the DDS saturation point, where read throughput from each other or to limit their resource are to be scalable, highly available and plateaus, but again, by adding bricks, the consumption and still share objects reliable in the face of failures, to maintain throughput is increased. Write operations directly. Processes enable several impor- strict consistency of distributed data, and run into problems, however, because tant features. First, the resource demands to provide operational manageability. In garbage collection ends up causing an for Java processes can be accounted for particular, they have designed and imple- imbalance between the nodes in a replica separately, including memory consump- mented a distributed hash table that group, and because the writes have to tion and GC time. Second, Java processes meets these goals. happen on all replicas, the slow one can be terminated, if their resource becomes the bottleneck. Even perfor- The basic design starts with a cluster. demands are too high, without damaging mance in recovery was very promising; This provides low latency, redundancy, the system. Third, termination reclaims an N-brick DDS during a single failure and makes a two-phase or multiple the resources of the terminated Java and recovery appeared to yield perfor- round-trip system feasible. All instances process. mance near to that of an (N-1)-brick of the service see the same data structure,
22 Vol. 26, No. 3 ;login:
needles in the craystack: when machines get sick
by Mark Burgess Part 4: Entropy: The Good, the Bad, and Mark is an associate professor at Oslo the Aged College and is the program chair for You and I, and everyone on the planet, are doomed to die because of a LISA 2001. memory leak in the human genome. For better or for worse, whether bug or a feature, DNA contains a sequence of repeated molecular material called telomeres which is used in the unzipping and replication of the DNA strand. Each time DNA is replicated, one of these telomeres is used up and does not
24 Vol. 26, No. 2 ;login: ideal gases, but it was later extended to many other phenomena, as general principles Entropy is a useful measure, were understood. It was many years before the full meaning of entropy was revealed. Many of us have acquired a mythological understanding of entropy, through accounts and its increase is an OMPUTING of popular physics, as being the expression of what we all know in our guts: that every- C
important principle, so thing eventually goes to hell. Disorder increases. Things break down. We grow old and fall apart. understanding it is key to all Entropy: Information’s Lost+Found science. Although there is nothing wrong with the essence of this mythological entropy, it is imprecise and doesn’t help us to understand why resource consumption has inevitable consequences, nor what it might have to do with computers. Entropy is a useful meas- ure, and its increase is an important principle, so understanding it is key to all science. What makes entropy a poorly understood concept is its subtlety. The exuberant mathe- matician John Von Neumann is reputed to have told Claude Shannon, the founder of information theory, that he should call his quantity H informational entropy, because it would give him a great advantage at conferences where no one really understood what entropy was anyway. Before statistical methods were introduced by Boltzmann and others, entropy was defined to be the amount of energy in a system which is unavailable for conversion into useful work, i.e., the amount of resources which are already reduced to waste. This had a clear meaning to the designers of steam engines and cooling towers but did not seem to have much meaning to mathematicians. Physicists like Boltzmann and Brillouin, and later Shannon, made the idea of entropy gradually more precise so that, today, entropy has a precise definition, based on the idea of digitization – discussed in the last part of this series. The definition turns out to encompass the old physical idea of entropy as unusable resources while providing a microscopic picture of its meaning. Think of a digitized signal over time. At each time interval, the signal is sampled, and the value measured is one of a number of classes or digits C, so that a changing signal is classified into a sequence of digits. Over time, we can count the occurrences of each digit. If the number of each type i is ni, and the total number is N, we can speak of the probability of measuring each type of digit since measurement started. It is just the frac- tion of all the digits in each class pi=ni/N. Shannon then showed that the entropy could be defined by
H = - p1 log2 p1 - p2 log2 p2 ...- pC log2 pC where the base 2 logarithm is used so that the measurement turns out to be measured in “bits.”This quantity has many deep and useful properties that we don’t have to go into here. Shannon showed that the quantity H was a measure of the amount of information in the original signal, in the sense that it measured the amount of its variation. He also showed that it is a lower limit on the length of an equivalent message (in bits) to which a signal can be compressed. The scale of entropy tells us about the distribution of numbers of digits. It has a mini- mum value of zero if all of the pi are zero except for one, i.e., if all of the signal lies in the same class, or is composed of the same single digit for all time, e.g., “AAAAAAAAA....” This corresponds to a minimum amount of information or a maximum amount of order in the signal. Entropy has a maximum value if all of the pi are the same. This means that the digitized signal wanders over all possible classes evenly and thus contains the maximum amount of variation, or information, i.e., it is a very disordered signal such as “QWERTYUIOPASD....”
June 2001 ;login: NEEDLES IN THE CRAYSTACK 25 Noise is indeed information: The entropy is just a number: it does not “remember” the sequence of events which led to the value describing state of the system, because it involves an implicit averaging over in fact, it is very rich time (we cannot recover a sequence of changes from a single number). But it does information. record how much average information was present in the sequence since measurements began. As a metaphor, entropy is discussed with three distinct interpretations: gradual degrada- tion (the ugly), total information content (the good), and loss of certainty (the bad). Let’s consider these in turn. Ugly: this interpretation is based on the assumption that there are many random changes in a system (due to unexpected external factors, like users, for instance), which cause the measured signal (or state of the system) to gradually wander randomly over all possible states. Entropy will tend to increase due to random influence from the environ- ment (noise). This corresponds to a gradual degradation from its state of order at the initial time. This interpretation comes from physics and is perhaps more appropriate in physics than in computer science, because nature has more hidden complexity than do computers; still, it carries some truth because users introduce a lot of randomness into computer systems. Thus, entropy is decay or disorder. Good: information can only be coded into a signal by variation, so the greater the varia- tion, the greater the amount of information which it could contain. Information is a good thing, some would say, but this view does not have any prejudice about what kind of information is being coded. Thus entropy is information. Bad: if there is a lot of information, distributed evenly over every possibility, then it is hard to find any meaning in the data. Thus entropy is uncertainty, because uncertainty is conflicting information. It is not hard to see that these viewpoints all amount to the same thing. It is simply a matter of interpretation: whether we consider information to be good or bad, wanted or unwanted; change is information, and information can only be represented by a pattern of change. Our prejudicial values might tend to favor an interpretation where a noisy radio transmission contains less information than a clear signal, but that is not objec- tively true. Noise is indeed information: in fact, it is very rich information. It contains information about all the unrelated things which are happening in the environment. It is not desired information, but it is information. Much of the confusion about entropy comes from confusing information with meaning. We cannot derive meaning from noise, because it contains too much information without a context to decipher it. Meaning is found by restricting and isolating information strings and attaching significance to them, with context. It is about looking for order in chaos, i.e., a subset of the total information. In fact, at the deepest level, the meaning of entropy or information is simple: when you put the same labels on a whole bunch of objects, you can’t tell the difference between them anymore. That means you can shuffle them however you like, and you won’t be able to reverse the process. Entropy grows when distinctions lose their meaning and the system spreads into every possible configuration. Entropy is reduced when only one of many possibilities is preva- lent. What does this have to do with forgetfulness and wastefulness? There are two prin- ciples at work.
26 Vol. 26, No. 3 ;login: The first, grouping by digitization (ignoring detail), involves reducing the number of Memory fragmentation of classes or distinctions into fewer, larger groups called digits. By assimilating individual classes into collective groups, the number of types of digits is fewer, but the numbers of resources can be discussed in OMPUTING each type increase and thus the entropy is smaller. This, of course, is the reason for our C
terms of entropy. current obsession with the digital: digital signals are more stable than continuous sig- nals, because the difference between 1 and 0 is coarse and robust, whereas continuous (analog) signals are sensitive to every little variation. The second principle is about repeated occurrences of the same type of digit. One digit in the same class is as good as the next, and this means that repeated occurrences yield no more information than can be gleaned from counting. Similarity and distinction can be judged by many criteria and this is where the subtlety arises. When we measure basic resources in terms of abstract definitions (car, computer, sector, variable, etc.) there are often a number of overlapping alternative interpretations, which means the entropy of one abstract classification differs from that of a different abstract classification. Consider an example: fragmentation of memory resources can be discussed in terms of entropy. In memory management, one region of memory is as good as the next and thus it can be used and reused freely. The random way in which allocation and de-allocation of memory occurs leads to a fragmented array of usage. As memory is freed, holes appear amidst blocks of used memory; these are available for reuse, provided there are enough of them for the task at hand. What tends to happen, however, is that memory is allocated and de-allocated in random amounts, which leaves patches of random sizes, some of which are too small to be reused. Eventually, there is so much randomization of gap size and usage that the memory is of little use. This is an increase of entropy. Several small patches are not the same as one large patch. There is fragmentation of memory, or wasted resources. One solution is to defragment, or shunt all of the allo- cated memory together, to close the gaps, leaving one large contiguous pool of resources. This is expensive to do all the time. Another solution is quantization of the resources into fixed-size containers. By making memory blocks of a fixed size (e.g., pages or sectors), recycling old resources is made easier. If every unit of information is allocated in fixed amounts of the same size, then any new unit will slot nicely into an old hole, and nothing will go to waste. This is essentially the principle of car parks (aka, parking lots in the US). Imagine a makeshift car park in a yard. As new cars come and park, they park at random, leaving random gaps with a distribution of sizes (non-zero entropy). New cars may or may not fit into these gaps. There is disorder and this ends up in wastefulness. The lack of disci- pline soon means that the random gaps are all mixed up in a form which means that they cannot be put to useful work. The solution to this problem is to buy a can of paint and mark out digital parking spaces. This means that the use of space is now standard- ized: all the spaces are placed into one size/space category (zero entropy). If one car leaves, another will fit into its space. The reason for dividing memory up into pages and disks into sectors is precisely to lower the entropy; by placing all the spaces into the same class, one has zero entropy and less wastage. C’s union construction seems like an oddball data type, until one under- stands fragmentation; then, it is clear that it was intended for the purpose of making standard containers. Standardization of resource transactions is a feature which allows for an efficient use of memory, but the downside of this is that it results in increased difficulty of location. Since the containers all look the same, we have to go and open every one to see what is
June 2001 ;login: NEEDLES IN THE CRAYSTACK 27 The accumulated entropy of a inside. In order to keep track of the data stored, different labels have to be coded into them which distinguish them from one another. If these labels are ordered in a simple change is a measure of the structure, this is easy. But if they are spread at random, the search time required to amount of work which would recover those resources begins to increase. This is also entropy at work, but now entropy of the physical distribution of data, rather than the size distribution. These problems be needed to remember how haunt everyone who designs computer storage. the change was made. The accumulated entropy of a change is a measure of the amount of work which would be needed to remember how the change was made. It is therefore also that amount of information which is required to undo a change. In the car parking example, it was a measure of the amount of resources which were lost because of disorder. In sector frag- mentation it is related to the average seek time. Entropy of the universe is a measure of the amount of energy which is evenly spread and therefore cannot be used to do work. We begin to see a pattern of principle: inefficient resource usage due to the variability of change with respect to our own classification scheme. Entropy reflects these qualities and is often used as a compact way of describing them. It is not essential, but it is precise and lends a unifying idea to the notion of order and disorder. Rendezvous with Ram In classifying data we coarse grain, or reduce resolution. This means actively forgetting, or discarding the details. Is this forgetfulness deadly or life-giving? If we recycle the old, we use less resources but are prevented from undoing changes and going back. The earlier state of order is lost forever to change, by erasure. We could choose to remember every change, accumulate every bit of data, keep the packaging from everything we buy, keep all of the garbage, in which case we drown in our own waste. This is not a sustainable option, but it is the price of posterity. Forgetting the past is an important principle. In a state of equilibrium, the past is unim- portant. As long as things are chugging along the same with no prospect of change, it doesn’t matter how that condition arose. Even in periods of change, the distant past is less important than the recent past. Imagine how insane we would be if we were unable to forget. One theory of dreaming is based on the idea that dreams are used for short- term memory erasure, collating and integrating with long-term experience. In Star Trek: The Next Generation it was suggested that the android Data never forgets. That being the case, one might ask how come he hasn’t exploded already? Where do all those memories go? If elephants never forget, no wonder they are so big! Taxation has long been a way of opposing the accumulation of material wealth or potential resources (money). Throughout the centuries, all manners of scheme have been introduced in order to measure that wealth: hearth (fireplace) tax, window tax, poll tax, income tax, road toll, and entrance fees. Since income tax was introduced, records in the UK have been kept on all citizens’ activities for years at a time, although there is an uncanny feel- ing that tax inspectors might pursue them to the grave for lost revenue, replacing the accumulation of wealth with an accumulation of records. In fact, after 12 years, tax records are destroyed in the UK. A sliding-window sampling model, rather than a cumulative model is the essence of recycling. Queuing and Entropy Memory is about recording patterns in space, but entropy of spatial classification is not the only way that resources get used up. Another way is through entropy of time resources. Everyone is familiar with the problem of scheduling time to different tasks. Interruption is the system administrator’s lot. As one reader commented to me, his
28 Vol. 26, No. 3 ;login: company often insists: drop what you are doing and do this instead! It results in frag- Queues build up because mentation of process: only a small piece of each task gets done. In computer systems it is algorithmic complexity which is responsible for sharing time amongst different tasks. there is a limit to how fast OMPUTING
Context switching is the algorithm multi-tasking computers use for sharing time strictly C
transactions can be expedited between different tasks. This sharing of time implies some kind of queuing with all its attendant problems: starvation and priorities. Context switching places tasks in a by a communications channel. round-robin queue, in which the system goes through each task and spends a little time on it, by assigning it to a CPU. This is an efficient way of getting jobs done, because the number of objects or classes is approximately constant, and thus the parking lot princi- ple applies to the time fragments. It does not work if the number of jobs grows out of control. But if one simply piles new jobs on top of one another, then none of the jobs will get finished. Anyone who has played strategy games like Risk knows that it does not pay to spread one’s army of resources too thinly. This is much the same problem that is considered in traffic analysis (cars and network packets). At a junction, cars or packets are arriving at a certain rate. The junction allows a certain number to flow through from each adjoining route, but if the junction capac- ity is too slow, then the entropy of the total resources grows to infinity because the num- ber of different digits (cars or packets) is growing. No algorithm can solve this problem, because it has to focus on smaller and smaller parts of the whole. This is the essence of the expression a watched kettle never boils taken to extremes. Spamming or denial of ser- vice attacks succeed because resources are used up without replacement. This leads to “starvation” of time resources and/or memory resources. It was once suggested to me that cars should not have to drive more slowly on the motorway when lanes are closed for repair: according to hydrodynamics, everyone should drive much faster when they pass through the constricted channel, to keep up the flow. Unfortunately, unlike ideal fluids, cars do not have purely elastic collisions. Queues build up because there is a limit to how fast transactions can be expedited by a communications channel. Reversible Health and Its Escape Velocity The message I have been underlining above is that there is a fundamental problem where limited resources are involved. The problem is that reversibility (the ability to undo) depends on information stored, but that information stored is information lost in terms of resources. There is an exception to this idea, however. Some corrections occur in spite of no log being made. What goes up must come down. Common colds and other mild diseases are not fatal to otherwise healthy individuals. The pinball will end up in its firing slot. A compass always points to magnetic North. Moths fly toward a flame. Adults prefer to look younger. Follow the light at the end of the tunnel. Ideals. If you drop a ball to the ground, it does not usually land at your feet and remain there: the ground is seldom flat, so it begins to roll downhill until it finds a suitable local mini- mum. It tries to minimize its potential energy under the incentive of gravitation. Now energy is just a fictitious book-keeping parameter which keeps track of how much it cost to lift the ball up earlier and how much will be repaid by letting it roll down again. Like entropy, energy is a summary of information about how resources are distributed in the face of an unevenness. In thermodynamics, energy U and entropy S appear in relationships with the opposite sign: dF = dU - TdS
June 2001 ;login: NEEDLES IN THE CRAYSTACK 29 A potential is an anti-entropy F is the free energy, or the amount of energy available for conversion into useful work, while U is the total energy and S is the entropy (the amount of energy which is spread device. It makes a difference about in a useless form). T is the temperature, which acts as an essentially irrelevant by weighting the possibilities. integrating factor for the entropy in this formulation. A potential is an anti-entropy device. It makes a difference by weighting the possibilities. It tips the balance from pure randomness, to favor one or few possibilities or ideals. Think of it as a subsidy, according to some accounting principle, which makes certain configurations cheaper and therefore more likely. A potential can guide us into right or wrong, healthy or unhealthy states if it is chosen appropriately. While entropy is simply a result of statistical likelihood (deviation from ideal due to random change), a potential actually makes a choice. Potentials are all around us. Indeed, they are the only thing that make the world an interesting place. Without these constraints on behavior, the Earth would not go around the sun; in fact it would never have formed. The universe would just be a bad tasting soup. Emotions are potentials which guide animal behavior and help us to survive. I have often wondered why the writers of Star Trek made such an issue about why the android Data supposedly has no emotions. For an android without emotions, he exhibits constantly emotional behavior. He is motivated, makes decisions, shows com- plex “state” behavior, worries about friends and has “ethical subroutines.”The fact that he does not have much of a sense of humor could just as well mean that he originated from somewhere in Scandinavia (this would perhaps explain the pale skin, too). Probably, complex animals could not develop adaptive behavior (intelligence) without emotions. Whatever we call complex-state information, it amounts to an emotional condition. Emotions may have a regulatory function or a motivational function. They provide a potential landscape which drives us in particular directions at different times. They alter the path of least resistance by enticing us to roll into their local minima. It’s pinball with our heads. There is probably no other way of building complex adaptive behavior than through this kind of internal condition. We think of our emotions as being fairly coarse: happy, sad, depressed, aroused, but in fact, they have complex shades. We just don’t have names for them, because as was noted in the last issue, we digitize. Reversal of state can be accomplished without memory of the details, if there is an inde- pendent notion of what ideal state is: a potential well, like an emotional context, driving the system towards some tempting goal. That is because a potential curves the pathways and effectively makes them distinguishable from one another, labeling them with the value of the potential function at every point. Health is such a state: a complex multifac- eted state whose potential is implemented in terms of a dynamical immune system rather than a static force. Ecologies and societies also have emergent goals and prefer- ences. These represent a highly compressed form of information, which perhaps sum- marizes a great deal of complexity, just as a curve through a set of points approximates possibly more complex behavior. If one could make a computer’s ideal condition, its path of least resistance, how simple it would be to maintain its stability. Usually, it’s not that simple, however. The playing field is rather flat, and armies battle for their position. It is as though hordes of barbaric changes are trying to escape into the system, while administrators representing the forces of law and order try to annex them. Neither one has any particular advantage other than surprise, unless the enemy can be placed in a pit.
30 Vol. 26, No. 3 ;login: Computer systems remain healthy and alive when they recycle their resources and do not drift from their ideal state. This was the idea behind cfengine when I started writing it eight years ago. The ideal computer state is decided by system policy, and this weights OMPUTING the probabilities pn so that randomness favors a part of the good, rather than the bad or C the ugly. Although a potential can only guide the behavior on average (there will always be some escape velocity which will allow a system to break its bounds), the likelihood of its long-term survival, in a world of limited resources, can only be increased by com- pressing complex information into simple potentials. This was the message of cfengine and the message of my papers at LISA 1998 and 2000. Now all we need is to design the pinball table.
So pn – you feelin’ lucky?
June 2001 ;login: NEEDLES IN THE CRAYSTACK 31 Declarations in c9x
Last time, we started looking at some of the new language features in C9X, by Glen the recent update to C. In this column we’ll look at C9X declarations and McCluskey how they have changed from what you are familiar with. Glen McCluskey is a consultant with 20 years of experience Function Declarations and has focused on programming lan- If you’ve used C for a long time, you might remember that the language was once much guages since 1988. looser about function declarations. You didn’t have to declare functions before use; for He specializes in Java and C++ perfor- example, you could say: mance, testing, and technical documenta- void f() tion areas. {
32 Vol. 26, No. 3 ;login: These tighter rules are not arbitrary; they lead to better code. Consider, for example, this program:
#include
f(37); ROGRAMMING P return 0; } void f(int a, int b) { printf(“%d %d\n”, a, b); } We use an old-style declaration of f(), then call the function with a single argument. The function actually requires two arguments, so when the printf() is encountered, one of the parameters has a garbage value. The same consideration applies to return statements, if, for example, you fail to return a value from a function declared to return an int. Inline Functions With C9X you can define a function using the inline keyword, like this: inline void f() {...} inline is a hint to the compiler that it should optimize calls to the function, perhaps by expanding them in the context of the caller. There is no requirement that a compiler actually observe this hint. Inline functions are quite subtle in a couple of areas, illustrated by the following exam- ple, composed of two translation units: // file #1 void f() { } void g(void); int main() { f(); g(); return 0; } // file #2 inline void f() { } void g() {
June 2001 ;login: DECLARATIONS IN C9X 33 f(); } The inline definition of f() has external linkage. To give the function internal linkage, we would use static inline void f(). This definition is not an external definition, and because f() is actually called, an external definition is required somewhere within the program. This definition is found in the first translation unit. A program can contain both an inline and an external definition of a function. The pro- gram cannot rely on the inline definition being used, even if it is clearly visible, as in the case above where g() calls f(). The inline and external definitions are considered distinct, and a program’s behavior should not depend on which is actually called. As another example of this idea, con- sider: // file #1 const int* f() { static const int x = 0; return &x; } // file #2 inline const int* f() { static const int x = 0; return &x; } int main() { return f() == f(); } Given these definitions of f(), it is not necessarily the case that main() always returns 1. Different f()s may be called, containing different static objects. An example of where this might happen would be with a compiler that applies some heuristic about how big inline expansions are allowed to get, with the external definition of the function used in cases where the inline expansion would be too large. The C inline model is similar to that of C++, but differs in that (1) if a function is declared inline in one translation unit, it need not be declared so in all translation units, and (2) all definitions of an inline function need not be the same, but the program’s behavior should not depend on whether an external definition or an inline version of a function is called. Mixed Declarations and Code In C9X you can intersperse declarations and code, a feature also found in C++. It’s no longer necessary to place all declarations at the beginning of a block. Here’s an example that counts the number of “A”characters in a file, using this coding style: #include
34 Vol. 26, No. 3 ;login: { if (argv[1] == NULL) { fprintf(stderr, “Missing input file\n”); return 1; OMPUTING } | C FILE* fp = fopen(argv[1], “r”); if (fp == NULL) { fprintf(stderr, “Cannot open input file\n”); ROGRAMMING
return 1; P } int cnt = 0; int c; while ((c = getc(fp)) != EOF) { if (c == ‘A’) cnt++; } fclose(fp); printf(“count = %d\n”, cnt); return 0; } This style is in many ways more natural than the older approach of grouping all declara- tions at the top of a block. Each declaration is introduced as needed, in a meaningful context, and there’s less temptation to recycle declarations, by using a variable in multi- ple ways within a function. The scope of each identifier declared in this way is from its point of declaration to the end of the block. Declarations in fo r Statements You can also use declarations within for statements, like this: #include
June 2001 ;login: DECLARATIONS IN C9X 35 return 0; } The scope of the loop index goes to the end of the for statement, so the index name is unknown outside. If you want to code this way, you need to continue declaring the loop variable before the loop. Here’s another (legal) example of for statements, one that demonstrates the subtleties of scoping: void f() { int i = 37; for (int i = 47; i <= 100; i++) { int i = 57; } } The for statement introduces a new block, and the statement body is also a distinct block. To make the blocks more visible, we could write the code like this: void f() { int i = 37; { for (int i = 47; i <= 100; i++) { int i = 57; } } } The three declarations of i are all valid, because each occurs in a new scope. Each decla- ration hides variables of the same name in outer scopes. The new declaration features promote better programming and more natural documen- tation. They are natural to use and reduce programming errors.
36 Vol. 26, No. 3 ;login: using CORBA with java
A Mini Napster, Part II by Prithvi Rao
Introduction Prithvi Rao is the co- OMPUTING
founder of KiwiLabs, | C In Part I of this two-part series, I presented the development of a Java which specializes in software engineering client/server application using CORBA. Specifically, I embarked on the devel- methodology and Java/CORBA training. opment of a “mini Napster” example. He has also worked on
the development of ROGRAMMING The development effort began with the writing of a Napster IDL (Interface Definition the MACH OS and a P Language) and ended with the description of the functionality of the files that were gen- real-time version of MACH. He is an erated as a result of compiling Napster.idl using the idltojava compiler. adjunct faculty at Carnegie Mellon and teaches in the Heinz In this article I wish to present the client and server code to complete this example. I School of Public Policy and Management. hope to demonstrate that writing a CORBA application is neither overwhelming nor
June 2001 ;login: USING CORBA WITH JAVA 37 if(record.album_name.equals(albumName)) { if(record.version > highest_version_number) { highest_version_number = record.version; index = records_database.indexOf(record); } } } // Check if a record with album_name = albumName was found if(highest_version_number == -1) { // The record does not exist in the database so return a dummy record // with an empty album name. Also you need to set the other fields to // some dummy values even though you don’t need them logically, // because otherwise CORBA will throw an Exception Record dummy_record = new Record(); dummy_record.album_name = " "; dummy_record.artist_name = " "; dummy_record.owner_name = " "; dummy_record.version = -1; return dummy_record; } // If we are here then we must have found the record in the database, // so return the found record return (Record) records_database.elementAt(index); } Write the addRecordInServer method: public String addRecordInServer(Record desiredRecord) { // Get the record with the highest version number Enumeration record_iter = records_database.elements(); while(record_iter.hasMoreElements()) { Record record = (Record)record_iter.nextElement(); if((record.album_name.equals(desiredRecord.album_name)) && (record.artist_name.equals(desiredRecord.artist_name))) if(record.version >= desiredRecord.version) return "Duplicate Record, Record Not Added to the Server"; } // while // We can now safely add the record to the database records_database.addElement(desiredRecord); return "Record Successfully Added"; } Write the deleteItemInServer method: public boolean deleteItemInServer(Record desiredRecord) { int num_of_records = records_database.size(); boolean record_deleted = false; for(int i = 0; i < num_of_records; i++) {
38 Vol. 26, No. 3 ;login: Record record = (Record) records_database.elementAt(i); if((record.album_name.equals(desiredRecord.album_name)) && (record.artist_name.equals(desiredRecord.artist_name))) { OMPUTING records_database.removeElementAt(i); | C record_deleted = true; } } ROGRAMMING return record_deleted; P } Write the updateRecordInServer method public boolean updateRecordInServer(Record desiredRecord, String newOwner) { // Get the record with the highest version number Enumeration record_iter = records_database.elements(); int highest_version_number = 0; int index = 0; while(record_iter.hasMoreElements()) { Record record = (Record)record_iter.nextElement(); if((record.album_name.equals(desiredRecord.album_name)) && (record.artist_name.equals(desiredRecord.artist_name))) { if(record.version > highest_version_number) { highest_version_number = record.version; index = records_database.indexOf(record); } } } // If the record was not found then return false if(index == 0) return false; // Otherwise update the record Record record = (Record) records_database.elementAt(index); record.owner_name = newOwner; record.version + = 1; records_database.addElement(record); return true; } } The Napster Main Class The following is the Napster main class (Napster.java) in which all the CORBA work happens. First import all the CORBA packages: import Napster.*; import org.omg.CosNaming.*; import org.omg.CosNaming.NamingContextPackage.*;
June 2001 ;login: USING CORBA WITH JAVA 39 import org.omg.CORBA.*; public class Napster { public static void main(String args[]) { try { Create the ORB and initialize it. This is where the ORB knows how to find the location of the NameServer with which to register the name of the server objects. ORB orb = ORB.init(args, null); Create a Server Object here: NapsterServer napster_server = new NapsterServer(); Connect the Server Object to the ORB: orb.connect(napster_server); Get a reference to the nameserver: org.omg.CORBA.Object objRef = orb.resolve_initial_references("NameService"); Get an object reference that is a “Java” object and not a CORBA object: NamingContext ncRef = NamingContextHelper.narrow(objRef); Make sure that the server object can be located using “NapsterServer”: NameComponent nc = new NameComponent("NapsterServer", ""); Save this in a component array: NameComponent path[] = {nc}; Bind this component name with the ORB so that the client can get a reference to the server object: ncRef.rebind(path, napster_server); Make sure that you can have multiple clients connect to the server: java.lang.Object sync = new java.lang.Object(); synchronized(sync) { sync.wait(); } // try catch(Exception e) { System.err.println("ERROR: " + e); e.printStackTrace(System.out); } } } The Napster Client In this section I present the Napster Client (NapsterClient.java). First import the CORBA packages and other Java packages:
40 Vol. 26, No. 3 ;login: import Napster.*; import java.util.*; import java.lang.*; import java.io.*; OMPUTING import org.omg.CosNaming.*; | C import org.omg.CORBA.*; public class NapsterClient { ROGRAMMING private NapsterServerI napster_server; P private BufferedReader stdin; Write the constructor for the NapsterClient: public NapsterClient(NapsterServerI p_napster_server, BufferedReader p_stdin) { napster_server = p_napster_server; stdin = p_stdin; } Write the method to get a record: public void getRecord() { try { System.out.print("Please Enter the Album Name: "); String album_name = stdin.readLine(); System.out.print("Please Enter Your Name: "); String user_name = stdin.readLine() ; Record found_record = new Record(); found_record = napster_server.findItemInServer(album_name) ; // If we don’t find the record then give an appropriate message to the user if(!found_record.album_name.equals(album_name)) System.out.print("The Album Name That You Entered Does Not Exist"); else { // We found the record. So create a new record with the current user // as owner and an increased version number to make it the latest record System.out.println("Information Regarding the Album:"); System.out.println("Album Name : " + found_record.album_name); System.out.println("Artist Name : " + found_record.artist_name); System.out.println("Owner : " + found_record.owner_name); System.out.println("Version : " + found_record.version); Record new_record = new Record(); new_record.album_name = album_name; new_record.artist_name = found_record.artist_name; new_record.owner_name = user_name; new_record.version = ++found_record.version; String server_response = napster_server.addRecordInServer(new_record); System.out.println("New Record with the Following Info Added to the Server:"); System.out.println("Album Name : " + new_record.album_name); System.out.println("Artist Name : " + new_record.artist_name); System.out.println("Owne : " + new_record.owner_name);
June 2001 ;login: USING CORBA WITH JAVA 41 System.out.println("Version: " + new_record.version); System.out.print(server_response); } // else } // try catch(Exception e) { System.out.println("Error: " + e); e.printStackTrace(System.out); } // catch } Write the method to add a record: public void addRecord() { try { // Get the fields of the new record to create System.out.print("Please Enter Your Name: "); String user_name = stdin.readLine(); System.out.print("Please Enter the Album Name: "); String album_name = stdin.readLine(); System.out.print("Please Enter the Artist Name: "); String artist_name = stdin.readLine() ; // Create a new record and initialize its fields Record new_record = new Record(); new_record.album_name = album_name; new_record.owner_name = user_name; new_record.artist_name = artist_name; new_record.version = 1; String server_response = napster_server.addRecordInServer(new_record); system.out.print(server_response); } //try catch(Exception e) { System.out.println("Error: " + e); e.printStackTrace(System.out); } // catch } Write a method to delete a method: public void deleteRecord() { try { System.out.print("Please Enter the Album Name: "); String album_name = stdin.readLine()j; System.out.print("Please Enter the Artist Name: "); String artist_name = stdin.readLine();
42 Vol. 26, No. 3 ;login: // Create a record that you would like to be deleted from the server // database
Record new_record = new Record(); OMPUTING
new_record.album_name = album_name; | C new_record.owner_name = new String(“Aravind”); new_record.artist_name = artist_name; new_record.version = 1; ROGRAMMING
System.out.println("Before Calling Server Delete"); P boolean record_deleted = napster_server.deleteItemInServer(new_record); System.out.println("After Calling Server Delete"); if(record_deleted) System.out.print("The Record Was Successfully Deleted on the Server"); else System.out.print("The Record Could Not Be Deleted on the Server"); } // try catch(Exception e) { System.out.println("Error: " + e); e.printStackTrace(System.out); } // catch } Write a method to update a record: public void updateRecord() { try { System.out.print("Please Enter the Album Name: "); String album_name = stdin.readLine(); System.out.print("Please Enter the Artist Name: "); String artist_name = stdin.readLine() ; // Create a record that you would like to be updated from the server // database Record update_record = new Record(); update_record.album_name = album_name; update_record.artist_name = artist_name; // Got to give dummy values for the other values in the record otherwise // CORBA will complain update_record.owner_name = new String(" "); update_record.version = 1; System.out.print("Please Enter the New Owner Name: "); String new_owner = stdin.readLine() ; // Call the update function on the server boolean record_updated = napster_server.updateRecordInServer(update_record, new_owner); if(record_updated) { System.out.print("The Record Was Successfully Updated on the Server");
June 2001 ;login: USING CORBA WITH JAVA 43 System.out.println("Album Name : " + album_name); System.out.println("Artist Name : " + artist_name); System.out.println("Owne : " + new_owner); } else System.out.print("The Record Could Not Be Updated on the Server"); } // try catch(Exception e) { System.out.println("Error: " + e); e.printStackTrace(System.out); } // catch } Write a method to display menu options: public void displayMenu() { // Display the Menu System.out.println("\n "); System.out.println("Enter One of the Following Options"); System.out.println("1. To Get an Album on the Server"); System.out.println("2. To Add an Album to the Server"); System.out.println("3. To Delete an Album on the Server"); System.out.println("4. To Update an Album on the Server"); System.out.println("5. To Exit"); } Write the client main method: public static void main(String args[]) { try { Initialize the ORB. The args array tells the ORB on the client machine where the name- server is running (machine name and port to use to connect to the nameserver): ORB orb = ORB.init(args, null); Get a reference to the NameServer: org.omg.CORBA.Object objRef = orb.resolve_initial_references ("NameService"); Create a Java object from a CORBA object: NamingContext ncRef = NamingContextHelper.narrow(objRef); Find the path on the NameServer where the server object is located: NameComponent nc = new NameComponent("NapsterServer", " "); Set the path: NameComponent path[]= {nc}; Get a server object and narrow the reference from a CORBA object to a java object (all in one call):
44 Vol. 26, No. 3 ;login: NapsterServerI napster_server = NapsterServerIHelper.narrow(ncRef.resolve(path)); BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in)); OMPUTING | C Start the client: NapsterClient napster_client = new NapsterClient(napster_server, stdin);
String choice_str; ROGRAMMING P
int choice_int = 1; // Keep looping till the user tells us to quit while(choice_int != 5) { napster_client.displayMenu() ; choice_str = stdin.readLine() ; choice_int = Integer.parseInt(choice_str) ; switch(choice_int) { case 1: napster_client.getRecord(); break; case 2: napster_client.addRecord() ; break; case 3: napster_client.deleteRecord(); break; case 4: napster_client.updateRecord() ; break; case 5: choice_int = 5; break; default: System.out.println("Invalid Option. Please Try Again"); break; } // switch } // while } catch(Exception e) { System.out.println("Error: " + e); e.printStackTrace(System.out); } // catch } // main } // class Running Napster 1. Install the idltojava compiler. 2. Install JDK1.2.1. 3. Compile the Napster.idl file: idltojava Napster.idl. 4. Compile the remaining files: Napster.java, NapsterServer.java, NapsterClient.java. javac *.java Napster/*.java
June 2001 ;login: USING CORBA WITH JAVA 45 5. Run the nameserver. tnameserv -ORBInitialHost