conference reports
This issue’s reports are on the Linux 2.5 Kernel Developers Summit.

OUR THANKS TO THE SUMMARIZER:
Rik Farrow, with thanks to La Monte Yarroll and Chris Mason for sharing their notes.

For additional information on the Linux 2.5 Kernel Developers Summit, see the following sites:

Linux 2.5 Kernel Developers Summit
SAN JOSE, CALIFORNIA
MARCH 30-31, 2001
Summarized by Rik Farrow

The purpose of this workshop was to provide a forum for discussion of changes to be made in the 2.5 release of Linux (a trademark of Linus Torvalds). I assume that many people reading this will be familiar with Linux, and I will [...]

[...] Linux development, but I certainly thought that, in all of this time, someone would have brought this group together before.

Another difference appeared when the first session started on Friday morning. The conference room was set up with circular tables, each with power strips for laptops, and only a few attendees were not using a laptop. USENIX had provided Aeronet wireless setup via the hotel’s T1 link, and people were busy [...]
June 2001 ;login: 5

Larsh also suggested that Linux do away with the elevator algorithm and let the hardware do the work. Linus Torvalds asked if Larsh had tried setting some elvtune parameter to one, and Larsh said he hadn’t. One thing that became clear to me was that most of the Linux kernel developers were software guys (something that Andre Hedrick really made a point of later). Modern hard drives reorder the physical location of tracks on the fly based on the current location of the heads, so using any elevator algorithm makes little sense.

Oracle also has problems with the memory model used by Linux. Some IA32 (Intel x86)-based systems can have in excess of 4GB of RAM, but Linux device drivers handle this by using a bounce buffer to copy data to a region below 1GB, losing performance to the copy. Asynchronous I/O was also a problem, as is the use of the O_SYNC and O_DSYNC flags. Quite a lively debate started at this point, with one participant saying that O_DSYNC was the default in 2.4. ( [...]

[...] other process has been locked but is not scheduled.” Ted Ts’o, who moderated the event, called a break at that point. Breaks were always 30 minutes, giving ample time for discussion.

SCTP
La Monte H.P. Yarroll, Motorola
La Monte Yarroll described a new protocol that will be peer to UDP and TCP (layer four for OSI fans). SCTP stands for the Stream Control Transmission Protocol (RFC2960) and has several design goals:

Sequenced delivery of user messages within multiple streams
Network-level fault tolerance through support of multi-homing at either or both ends of an association
MTU set at layer four to prevent fragmentation at the IP layer
Optional bundling of multiple messages within the same packet

SCTP [...]

[...] multiple streams can use the same connection. Also, there are no bitwise flags, and all options are word-aligned.

Someone else asked if there is any talk of moving part of the protocol into hardware. Yarroll answered, “It is a dream. There are a lot of properties that should make SCTP hardware implementable.” Ted Ts’o pointed out that fiber channels are very expensive, and SCSI over SCTP would be a viable option.

During the break, Stephen Tweedie, the next presenter, moved toward the front and Linus intercepted him at the table where I was sitting. Soon, Ben LaHaise joined in a spirited discussion about zero copy writes. Zero copy writes avoid the performance hit of a memory to memory copy, and Linus shared his skepticism about how it is being implemented. My impression was of a professor with not a lot of seniority arguing with his grad students and other professors. At one point, Linus said something that I thought was very revealing: “We don’t want to wind [...]
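To make the first SCTP design goal above concrete (sequenced delivery within multiple streams), here is a minimal sketch in Python. It is a toy model, not SCTP itself and not any real socket API: the class name `StreamReassembler`, the `receive` method, and the per-stream sequence numbers starting at zero are all assumptions made for illustration. It shows only the ordering property: a message missing on one stream holds back later messages on that stream, while other streams keep delivering.

```python
from collections import defaultdict


class StreamReassembler:
    """Toy receiver (hypothetical): keeps messages in order per stream only."""

    def __init__(self):
        # Next sequence number expected on each stream (assumed to start at 0).
        self.next_seq = defaultdict(int)
        # Messages that arrived early, held until the gap before them is filled.
        self.pending = defaultdict(dict)

    def receive(self, stream, seq, payload):
        """Accept one message and return the (stream, payload) pairs now deliverable."""
        self.pending[stream][seq] = payload
        delivered = []
        # Release every consecutive message that is ready on this stream.
        while self.next_seq[stream] in self.pending[stream]:
            delivered.append((stream, self.pending[stream].pop(self.next_seq[stream])))
            self.next_seq[stream] += 1
        return delivered


rx = StreamReassembler()
log = []
# The first message on stream 0 is delayed and arrives last.
arrivals = [(0, 1, "a1"), (1, 0, "b0"), (1, 1, "b1"), (0, 0, "a0")]
for stream, seq, payload in arrivals:
    log.extend(rx.receive(stream, seq, payload))
print(log)
```

In this sketch stream 1 delivers both of its messages while stream 0 is still waiting for its delayed first message, which is the behavior a single ordered byte stream cannot offer.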
6 Vol. 26, No. 3 ;login: June 2001

[...] Jens Axboe mentioned that he had done some tests with Andre Hedrick where they moved the io_request_locks within the device layer and got better performance. In the future, elevator scans should only occur once, either when inserting or scanning, not twice as happens now. Because of writeback caching on ATA, you have to do a write-back flush even if you disable the cache on the drive. Don Duggan quipped, “The placebo bit.”

Tweedie agreed that each driver should maintain its own queue. Someone said that we build devices that look like 36 logical devices but are really hundreds of spindles. We would prefer that you can disable part of the elevator. Tweedie responded, “That can be done, but you will still want us to merge requests, just not order them.” [...] contiguous on disk but discontinuous in memory. “Each I/O merge is a scan of up to 8,192 requests!” Tweedie said. Scheduling could be made more fair by scheduling per spindle rather than per host bus adapter.

Device naming has been and will continue to be an issue. There are worldwide namings in fiber channel, SCSI naming by probe order, and the same with IDE devices. Ts’o mentioned that in some cases you will not be able to enumerate all devices at boot time, and sixteen bits for dev_t (that holds the minor device number) will not be enough.

ADVANCED FILE SYSTEM INTEGRATION
Stephen Lord, SGI
Lord discussed some advanced features of XFS, suggesting that some of them could be moved into the kernel as part of the VFS interface. The key ideas surrounded the notion of delayed writes. Instead of scheduling a write of data immediately to disk, in delayed writes only the space for the data is reserved, and the actual write is done later. As most programs will continue to write data, delayed writes cause disk writes to become batched, and temporary files may never need to be written to disk at all.

After some comments in the Q&A about the need for Linux to scale larger, Dave Miller asked if it makes sense for anyone to use 128 CPU systems. Lord answered that SGI’s customers were buying them. Miller suggested that the latency issues could be reduced by using clusters of eight CPU machines. Lord replied that clusters can fail because of variable bandwidth needed by different parts of the application. Others seemed to think that smaller scale clusters might be the future. Miller brought up NUMA (non-uniform memory access) architectures and wondered if it is necessary to scale up to so many CPUs.

The discussion shifted back to talk about delayed I/O and what happens if insufficient space is reserved for file metadata. Lord replied that SGI has been successfully doing this for years.

ILLUMINATING THE NETDRIVER API . . . ONE MORE TIME
Jamal Hadi Salim, Znyx Networks (also Robert Olsson and Alexey Kuznetsov)
Hadi Salim exposed a serious problem with existing Linux kernels: under extreme network loads, the system collapses. A comparison between a FreeBSD system and a Linux system showed that the BSD kernel could handle up to 70kpps (kilopackets per second) but that Linux folds at only 24kpps. Keeping in mind that a T1 filled to capacity with 64 byte packets is about 25kpps, this is not good. Furthermore, on SMP systems, performance is worse. This behavior also starves other network interfaces that are not overloaded. When Andy Grover asked why BSD did so much better, someone replied, “They lied!”

Hadi Salim, working with Olsson and Kuznetsov, had experimented with a solution that appears to provide a tenfold increase in performance. Their goals were to reduce interrupts on overload, drop packets early on overload, remove or reduce unfairness, and balance latency and throughput, without requiring specific hardware solutions (e.g., Tulip chip).

An obvious solution occurred to me (too bad no one asked me about this earlier). Each packet arriving at the network interface generates an interrupt, and it is this interrupt processing that soon overwhelms the kernel. Serial card designers of the early nineties had already figured this out, and the fastest drivers used polling instead of interrupts. Hadi Salim’s solution is similar in that interrupts are disabled when the load increases and only re-enabled when the DMA ring assigned to the device is empty. If more packets arrive when the DMA ring is full, they are dropped without any further processing. Using 2.4.0 beta 10 they could get 200kpps peak using commodity hardware that supports DMA and polling.

Larry McVoy, who seems to have worked for all the UNIX workstation vendors at some point, remarked, “Rob Warnick [SGI] did an enormous amount of good work in this area, forced because they had to work with slow processors. If you do this stuff right, you just pass the packets right up, but if you get a blast, then you start to do interrupt collapsing.” When Miller remarked that this would clean up the drivers a lot, there was a round of applause.

There was more discussion about the effect of this solution on servers (better performance), latency (less latency), and older drivers. Hadi Salim said that the system reverts to old behavior when the driver does not support polling. The session ended with everyone feeling good about this solution, which is likely to find its way into the 2.5 kernel.

LINUX HOT PLUG
Johannes Erdfelt, VA Linux; Greg Kroah-Hartman, WireX
Erdfelt started off by saying that they don’t have a solution to this problem. His own area of expertise is USB, particularly hubs, and hot plug needs to deal with SCSI, Firewire, PCMCIA/Cardbus, and hotswap PCI in addition to USB devices. The problems include name and file permissions: that is, the device name presented to the user should not change and neither should the permissions associated with that device. Since not all of these devices have anything resembling a serial number or UUID, identifying them can be a problem.

Other issues include what to do when there are multiple drivers (e.g., using a parallel port) and how to notify applications (hey, a joystick just appeared!).

[...] sion (or flame war) be done offline in the 10 p.m. BoF. This did not stop the discussion, which continued until the end of the session. Anvin made an interesting point when he mentioned that the Japanese wanted Japanese device names rather than English ones.

RECEPTION
The reception drew most of the attendees, even some people who had said earlier that they couldn’t stay. I wound up talking to Peter Loscocco of the NSA about his part in a project to add real security to the Linux kernel and missed the BoFs. Well, I could have attended since they went until midnight, but I packed it in. There were also BoFs after the conference finished. You can read about the BoFs, and get another perspective on the conference, at LWN:

[...] prompted concern from Jes Sorensen, who is apparently porting Linux to IA64. “Can I make it work without Python?” he asked. Many people pointed out that the work could be done on the cross-compiling system and that Python was not required on the target system.

There was actually a fair amount of resistance to this effort. Raymond pointed out that the new version uses 40% less code, runs faster, and has features the old tool does not. At one point, the discussion veered into arguing about the kernel symbol table. But it ended on a good note, with Raymond saying, “Down the road, I want to make configuration so easy that your Aunt Tilly could do it.” This was followed by enthusiastic applause.

FUTURE VM WORK
[...] not suitable for all uses, so the ideal would be to mix large and small pages. Shared page tables would also help in the case where processes share memory. Again, this is also something that Oracle asked for, but it would involve constraints (like having to be aligned on a 4MB boundary on some architectures). Another layer of locking would also be needed, leading to the possibility of deadlocks.

The Linux kernel still has some problems with deadlocks: for example, it needs to allocate memory to free some pages. Thrashing can also occur; Van Riel suggested suspending processes when the system begins to thrash, in order to stabilize the load, and gradually releasing each process as the system recovers.

MANDATORY ACCESS CONTROLS / SELINUX
Peter Loscocco, NSA
I thought that this proposal was well received, although Loscocco told me later that he was disappointed that he didn’t get more support. You can get more details, or the code if you like, at <http://www.nsa.gov/selinux/>. Loscocco also told me that in the eighties, he was responsible for the ARPANET IMP that passed through the NSA site, and that stories about how the NSA was copying all the data on ARPANET were really funny. Eventually, the NSF moved the IMP somewhere else because of difficulties in keeping it up and running. But that’s another story.

SELinux uses a large kernel patch to add Mandatory Access Controls (MAC) to a standard Linux kernel. MAC means that permissions are no longer discretionary, and that only a security administrator can change them. The changes to the kernel are pervasive and fine-grained, meaning that it is possible to tie down everything. One important point is that SELinux is not an Orange Book system. The system has not been evaluated, although it has evolved from earlier work such as DTOS within MACH, Fluke and Flask (see a paper presented at a USENIX Security Symposium). SELinux uses Type Enforcement, which goes way beyond Orange Book designs. Type Enforcement includes the usual subject (user) and object (resource) found in earlier models but adds the program to the equation. With Type Enforcement, you can specify that a user can write to /etc/shadow, but only when running the passwd program.

The current implementation focuses on enforcement, not policy. Loscocco said that the sample implementation does come with some policy models that could be used to nail down a Web server, for example.

The design goals of SELinux include development of:

Separation policies to keep data separate from some other part of the system
Containment policies to restrict Web server access to authorized data
Integrity policies to protect applications from modification
Invocation policies to control the chain of processing

This session ended with Linus saying that he liked the code he had seen, but he was aware that there were many security projects in progress right now, and that he did not want to have to choose between them. If a function call could replace each section of code the NSA design had added, then it might be acceptable. People who did not want to use the security could replace this with a function that simply returns.

ASYNCHRONOUS I/O FOR LINUX
Ben LaHaise, RedHat Canada Ltd.
Asynchronous I/O would allow programs to write data to some device and then continue without waiting for the operation to complete. LaHaise pointed out that this would be useful in event-based applications when you want to avoid using thread-based I/O (8KB overhead per thread per each request, and there could be tens or thousands of requests). It also makes efficient use of raw I/O, fits well with zero copy, and has lower overhead for daemons than select/poll system calls.

Asynchronous I/O, as proposed by LaHaise, would add four new system calls:

_submit_ios() allows a process to fire off an asynchronous operation.
_io_cancel() is for canceling outstanding operations.
_io_wait() allows a process to wait for the completion of a specific operation, essentially turning it synchronous.
_io_getevents() gets information on completed operations.

Ulrich Drepper, defender of the system call API, asked, “Why do you want to add all this support to character devices? It is completely wrong to do this.” LaHaise pointed out that POSIX AIO doesn’t do a very good job of the semantics, and that his approach will really make a difference with sockets, particularly UDP. Drepper asked if you can have AIO for fsync(), and LaHaise responded that he had code that could currently do that.

This project is not finished yet, and is currently about a ~3000 line patch. Work to be done included adding network sockets, limiting memory pinning, and writing some documentation.

Toward the end of his defense of [...], Linus, who obviously was already familiar with the proposal, interrupted LaHaise at this point: “You have already added four system calls, why have a new device?” The device named in question would be /dev/aio, and be used to arrange for a section of mapped memory that the calling process could use to detect I/O completion. Linus was not at all happy with this use of mapped memory and continued to ask if LaHaise really thought that mmap was necessary.

LINUX AND POWER MANAGEMENT, ACPI
Andy Grover, Intel
Grover started out by explaining the reasons why PC power management is important. He suggested some obvious things, like green PCs and mobile PCs. When he mentioned that servers might also be able to do this to reduce power when sitting idly in large clusters, Linus reacted. It was obvious this is something he wanted.

In the older scheme, currently supported by the Linux kernel, APM handles power management. The trouble with that is that APM is an obsolete interface, and that the BIOS keeps track of APM so that it is not managed by the OS. The replacement for APM is the Advanced Configuration and Power Interface (ACPI), conceived by Intel, Microsoft, and Toshiba, which allows OS-directed power management.

Grover explained that implementing level three means that it must be possible to shut off devices, and then to turn them all on again while restoring sufficient state for them to return to working condition. This implies having a device manager that knows enough about all the devices to shut them down or reinitialize them.

A lively discussion began at this point, with different people wondering how to make this work. Linus pointed out that the PCI device has extra, unused fields in its structure that would allow it to be extended to handle any device, and build an internal device tree. He had actually considered adding the changes in earlier, but the patches are very large, and adding large patches disturbs people. Ts’o pointed out that 2.5 might be the ideal time to do this.

The following Web sites have more info:

[...] Linux kernel, suggesting that SMP clusters, with one kernel monitoring every four kernels, was the way to go.

McVoy was really there to talk about BitKeeper, a source code control system. Subversion, another version control system, had been the topic of a Friday night BoF, which McVoy could not attend. According to McVoy, as well as Victor Yodaiken, BitKeeper had the only GUI interface that would really work for kernel hackers. He went on to describe various features, as well as demonstrating them for the kernel hackers.

“BitKeeper is a peer-to-peer system. You get revision control files; we merge revision histories. We can sync sideways, rather than go up and then down, which is what we did when I worked at Sun (and I wrote their systems). You have to keep track of revisions, but we compress them.”
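Returning briefly to the power management session: the requirement Grover described (shut devices off, then turn them all on again with enough state restored) together with the internal device tree Linus mentioned can be sketched as a tree walk. The following Python is a minimal illustration under stated assumptions, not kernel code and not any ACPI interface: the `Device` class, its `state` dictionary, and the device names are all hypothetical. The one design point it shows is ordering: children are suspended before their parent, and parents are resumed before their children.

```python
class Device:
    """Hypothetical device-tree node; fields are illustrative only."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.powered = True
        self.state = {"config": f"{name}-settings"}  # stand-in for register state
        self.saved_state = None


def suspend(dev, order):
    # Suspend children first so no device loses its parent bus while active.
    for child in dev.children:
        suspend(child, order)
    dev.saved_state = dict(dev.state)  # remember enough to reinitialize later
    dev.state = {}
    dev.powered = False
    order.append(dev.name)


def resume(dev, order):
    # Resume the parent first so children find a working bus above them.
    dev.powered = True
    dev.state = dict(dev.saved_state)
    order.append(dev.name)
    for child in dev.children:
        resume(child, order)


tree = Device("pci0", [Device("usb-hub", [Device("joystick")]), Device("ide0")])
down, up = [], []
suspend(tree, down)
resume(tree, up)
print("suspend order:", down)
print("resume order:", up)
```

A real device manager would also need per-device save and restore routines and error handling for devices that fail to come back; this sketch leaves both out.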
Summit group photo. See the Linux Weekly News for a color version with IDs of the participants.
[...] were for vi/vim (Raymond is an emacs fan).

In the end, McVoy said that Alan Cox says we are about 75% of the way to getting to the way that Alan works now. BitKeeper is a product but is free for noncommercial use. I enjoyed listening to McVoy, and it was obvious that lots of attendees were willing to listen to an experienced “old-timer” (McVoy is about 40 years old).

BUGTRACKING OPEN SESSION
Ted Ts’o took the helm for this final official session. The focus was on bugtracking, which appears (to me) to be very lame at this point. The current mechanism is that someone posts a bug to the kernel developers mailing list, and then someone (usually Alan Cox) notices the bug, saves it, and perhaps dispatches it to the person who manages that portion of code. Ts’o said he tried this for a while and found it very difficult to do.

Ts’o also made “a modest proposal.” Rather than have long periods between releases, which leads to a last-minute rush of code submissions, he proposed that a date be announced for a feature freeze. While there was serious discussion of this, no date for 2.5 was announced.

Suggestions for improving bugtracking included using a database or trying Bugzilla. Another suggestion was to use RedHat’s sendbug script, which collects information about the system and includes it in the bug report, something that turns out to be really important when trying to support many CPU architectures, disk subsystems, etc.

When Ts’o ended the discussion it was 6:35 p.m. on Saturday night. The room broke into six large groups who just kept on talking. Perhaps the workshop should have been three days long.