
Community Notebook: Kernel News

Zack’s Kernel News

Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the kernel community.

By Zack Brown

Zack Brown
The mailing list comprises the core of Linux development activities. Traffic volumes are immense, often reaching 10,000 messages in a week, and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls to take on this task is Zack Brown.

Status of OverlayFS and Union Filesystems in General

Recently, Miklos Szeredi requested that OverlayFS be included in the main kernel tree. OverlayFS allows two trees to appear as one: files that live under the same directory path in each tree appear together in a single directory of the overlaid filesystem. The project has been in existence for several years, but this time Linus Torvalds replied, “Yes, I think we should just do it. It’s in use, it’s pretty small, and the other alternatives are worse. Let’s just plan on getting this thing done with.”

Al Viro said he’d start reviewing the code. He also suggested that if they were going to merge a union filesystem such as OverlayFS, they might as well consider merging other similar projects, such as Unionmount and …. Unionmount in particular, he said, had been getting some good work lately from David Howells.

Meanwhile, Sedat Dilek jumped for joy at seeing OverlayFS close to acceptance. Al also followed up with his initial review. He’d identified some security issues and other technical problems, and he went back and forth with Miklos about them. At first, the two didn’t see eye-to-eye about how to fix the issues, or even whether a given issue was really a problem.

At one point, George Spelvin offered an admittedly somewhat hacky solution to one of Al’s problems. The whole thing boiled down to the way OverlayFS, or any union filesystem, would behave under the full range of possible uses. Regarding George’s particular suggestion, Al walked through the convoluted process necessary to remove a directory [1] and replied, “I’m sorry, but this is insane.”

Elsewhere, in an entirely different thread, Sedat asked about the status of David’s Unionmount project. David replied, “It’s being reengineered again to take account of VFS changes that went in in the last merge window.” He added, “It’s a maze of twisty locking problems – some of which also apply to things like … :-(”.

The discussion in both threads ended there. It appears everyone, including Linus, is ready to see union filesystems like OverlayFS in the kernel. But no one, including Al Viro and the maintainers of the various union filesystem projects, has been able to satisfactorily solve the technical problems that remain. At the moment, none of the projects seem close to getting past Al’s laser-beam code reviews, and until that happens, I’m certain none of them will be merged.

Astonishing Tux3 Performance Claims

There seems to be some suspicion between certain kernel developers and Tux3 developers. Tux3 is a versioning filesystem that’s been in development since 2008. Recently, Daniel Phillips, the project leader, posted some benchmarks that showed Tux3 outperforming tmpfs. As he put it, “To put this in perspective, we normally regard tmpfs as unbeatable because it is just a thin shim between the standard VFS mechanisms that every filesystem must use, and the swap device.”

Dave Chinner took a look at Daniel’s numbers and found some issues that he felt indicated a deliberate attempt to mislead people. In particular, he pointed out that the Tux3 benchmark didn’t include any “flush” operations – the Tux3 front end was off-loading all of its work to a back end that could take all the time it needed to complete the job. The front end would never block, so it could simply race through the benchmark and exit. Dave said, “You’ve carefully crafted the benchmark to demonstrate a best case workload for the tux3 architecture, then carefully not measured the overhead of the work tux3 has offloaded, and then not disclosed any of this in the hope that all people will look at is the headline.”

Hirofumi Ogawa, one of the Tux3 developers, responded that fsync() had not yet been implemented, and that the benchmarks were intended to compare only the parts of the code that had already been written.

Daniel also responded to Dave’s post, saying, “I should indeed have noted that ‘modified dbench’ was used for this benchmark, thus amplifying Tux3’s advantage in delete performance. This literary oversight does not make the results any less interesting: we beat Tmpfs on that particular load. Beating tmpfs at anything is worthy of note.”
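A toy model makes Dave’s objection about off-loaded work concrete. The Python sketch below is not Tux3 code and does not reflect its actual design. It assumes a hypothetical DeferredWriter class whose write() method only queues data for a background thread. That thread stands in for a back end and spends a simulated cost_per_write on each item. The class’s fsync() method waits for the queue to drain. Timing a run without fsync() measures little more than the cost of queueing. Timing it with fsync() brings the deferred back-end work back into the result.

```python
import queue
import threading
import time


class DeferredWriter:
    """Toy front end/back end split (hypothetical; not Tux3's actual code).

    write() only queues data and returns at once; a background thread
    plays the back end and does the slow "disk" work later. fsync()
    blocks until the back end has processed everything queued so far.
    """

    def __init__(self, cost_per_write=0.001):
        self.cost_per_write = cost_per_write  # simulated I/O cost, in seconds
        self.pending = queue.Queue()
        self.flushed = []  # items the back end has "made durable"
        threading.Thread(target=self._back_end, daemon=True).start()

    def _back_end(self):
        while True:
            item = self.pending.get()
            time.sleep(self.cost_per_write)  # stand-in for real block writes
            self.flushed.append(item)
            self.pending.task_done()

    def write(self, item):
        # Front end: never waits for the back end.
        self.pending.put(item)

    def fsync(self):
        # Temporal coupling: wait until every queued write is processed.
        self.pending.join()


def run_benchmark(n_writes, call_fsync):
    """Time n_writes writes, optionally followed by fsync()."""
    writer = DeferredWriter()
    start = time.perf_counter()
    for i in range(n_writes):
        writer.write(i)
    if call_fsync:
        writer.fsync()
    elapsed = time.perf_counter() - start
    return elapsed, len(writer.flushed)


if __name__ == "__main__":
    for call_fsync in (False, True):
        elapsed, done = run_benchmark(100, call_fsync)
        print(f"fsync={call_fsync}: {elapsed:.4f}s, {done} of 100 writes flushed")
```

The default cost_per_write is one millisecond. At that cost, the fsync=True run must take roughly 100 ms or more for 100 writes. The fsync=False run returns after little more than the time needed to queue the writes, typically with most of them still unflushed. That gap is the kind of off-loaded overhead Dave argued a benchmark has to measure and disclose.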

Regarding the specific issue Dave had raised about off-loading 100% of Tux3’s work, Daniel said, “Yes, that is the entire point of our front/back design: reduce application latency for buffered filesystem transactions.”

Theodore Ts’o pointed out that one couldn’t simply ignore fsync() and expect a meaningful benchmark result. As he put it, “Since fsync() is defined as not returning until the data written to the file is flushed out to stable storage – so it is guaranteed to be seen after a system crash – it means that the foreground application must not continue until the data is written by Tux3’s back-end.” He added, “any advantage of decoupling the front/back end is nullified, since fsync() requires a temporal coupling.”

Daniel replied that once they optimized fsync(), he expected “… Tux3 to perform competitively, because our delta commit scheme does manage the job with a minimal number of block writes …” [2].

Elsewhere in the thread, Dave remarked on his real concern. He said, “I don’t care how fast tux3 is – I care about being able to reproduce other people’s results. Hence if you are going to report benchmark results comparing filesystems then you need to tell everyone exactly what you’ve tweaked and why, from the hardware all the way up to the benchmark config.”

The discussion trailed off around there. Some kernel folks also seemed to feel that Daniel’s approach was too marketing-oriented. In their view, he was making big announcements at the expense of clarifying the real progress made.

Dealing with Empty Symlinks

Back in January, Pádraig Brady noticed that Linux didn’t allow users to create empty symlinks, that is, symlinks whose target is an empty string. He asked why this was, because POSIX specified that it should be allowed, and other operating systems supported it. There was no discussion at the time, but he recently followed up, asking if this was going to be fixed.

Part of the idea was that symlinks could be valuable just to store data in the link target alone, without using them for their traditional purpose of linking to other files.

But Al Viro thought this was “utterly pointless,” especially considering that the behavior would end up being operating-system-dependent anyway. He said, “blanket refusal to traverse such beasts is a legitimate option.”

Eric Blake replied that the real point was not whether creating an empty symlink should be allowed in Linux. The real point was how Linux should behave when it encountered an empty symlink during path resolution.

After all, even if Linux didn’t allow empty symlinks to be created, other operating systems did, and the filesystems containing those symlinks could be mounted under Linux. It would make sense to handle those cases correctly.

Eric remarked:

“I personally don’t care whether you fix the Linux kernel symlink() to allow empty symlinks, or successfully argue for a bug fix against POSIX to permit the existing Linux symlink() behavior. I’d love to see Linux obtain POSIX certification someday, and either of those two courses of action would get us closer. Meanwhile, I know there are enough other issues in the kernel … that it will be a long time before we ever get a POSIX certification of a Linux system.”

Pavel Machek started exploring the extent of the issue under Linux. He tried to identify which tools would break when encountering empty symlinks, and how badly. The discussion ended at that point, with no clear resolution on a course of action, or even on whether it was worth doing anything about the situation.

Linus Torvalds is notoriously disdainful of compliance for compliance’s sake. If there’s no cost to it, he’s not opposed. But if there are valid technical reasons to implement something in a non-compliant way, he’ll choose that over compliance every time. He makes no secret of his contempt for certain parts of the POSIX standard.

On the other hand, if there’s a danger that users might get burned if they mount a filesystem on which another OS has created an empty symlink, Linus would rather eat sand than let that go unfixed. The real question may boil down to whether the status quo would burn anyone. At the moment, it still seems unclear.
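The difference between a dangling symlink and an empty one can be checked from userspace. The Python sketch below assumes a Linux system. The helper names store_in_symlink() and try_empty_symlink(), and the stored string, are hypothetical. The sketch does three things:

1. It stores arbitrary data in a link target and reads it back with os.readlink(), which is the “data in the link” idea.
2. It creates a symlink to a file that does not exist, which Linux permits.
3. It attempts a symlink with an empty target, which Linux’s symlink() rejects with ENOENT.

```python
import errno
import os
import tempfile


def store_in_symlink(path, value):
    # The link target is just a string; nothing requires it to name a real file.
    os.symlink(value, path)
    return os.readlink(path)


def try_empty_symlink(path):
    """Return None if an empty-target symlink was created, else the errno."""
    try:
        os.symlink("", path)
    except OSError as exc:
        return exc.errno
    return None


with tempfile.TemporaryDirectory() as scratch:
    # 1. Use the link target purely as a data store.
    data = store_in_symlink(os.path.join(scratch, "note"), "pid=1234,host=example")

    # 2. A dangling symlink: the target does not exist, yet creation succeeds.
    dangling = os.path.join(scratch, "dangling")
    os.symlink(os.path.join(scratch, "no-such-file"), dangling)
    dangling_is_link = os.path.islink(dangling)
    dangling_resolves = os.path.exists(dangling)  # False: nothing at the target

    # 3. An empty target: Linux's symlink() refuses it with ENOENT.
    empty_result = try_empty_symlink(os.path.join(scratch, "empty"))

print("stored:", data)
print("dangling link created:", dangling_is_link, "resolves:", dangling_resolves)
print("empty target:", "created" if empty_result is None else errno.errorcode[empty_result])
```

On Linux, the last line reports ENOENT. The sketch says nothing about how path resolution would treat an empty symlink created by another operating system, which is the open question Eric Blake raised.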


Difficult Bug Hunt

Michael Hocko used git bisect to track down a problem resuming a suspended system. Instead of resuming, the system would just reboot. He posted a patch to revert the commit that seemed to cause the problem.

H. Peter Anvin asked for more details about Michael’s system, saying, “This is one of a series of extremely bizarre suspend to RAM failures we are trying to make sense of.” The particular cause of the problem, he said, was “not just bizarre, this is extremely disturbing.”

H. Peter found this so disturbing because the piece of code Michael had reverted did nothing more than flip the NX bit. Some CPUs use the NX bit to mark areas of memory as “never executable,” and flipping that bit should never affect anything on its own. It could only be involved in a suspend-to-RAM problem if there were some deeper malignancy.

Linus joined the discussion and traced the problem to __initdata. This annotation marks certain data as needed solely during initialization, so that once initialization is completed, the kernel can free the associated memory.

Apparently, the NX bit had been preventing a particular region of memory from executing as code, and that region of memory had been getting corrupted by __initdata. As long as the NX bit had been set as it was, the corruption problem had been covered up. That’s why Michael had been able to bisect the issue and trace it to a patch that flipped the NX bit. It’s also why reverting that patch had seemed to fix the problem. Michael posted a fresh patch to fix the underlying issue with the __initdata, and Linus incorporated it immediately into his tree.

License Discussion

Eric Appleman opened a wriggling can of worms when he suggested gathering Linux kernel copyright holders (i.e., anyone who’s contributed code to the kernel) into a group to enforce the GPL against license violators. Luke Leighton said he’d love to join this sort of thing, although he admitted that his copyrighted contributions were few. Almost as an aside, he remarked, “can it be specifically noted, from this moment onwards, that all contributions that I have made to the linux kernel are dual-licensed under both the GPLv2 and also the GPLv3+ license?”

Cole Johnson replied that as far as he knew, “Linus said he will NOT use the GPLv3 for the kernel.” This apparently threw Luke into a rage. He said that neither Linus Torvalds nor anybody else could prevent him from releasing his copyrighted code under any license he wanted. Among the variety of relatively heated posts in this thread was Theodore Ts’o’s reply to the whole idea of dual-licensing with the GPLv3:

It’s not just Linus; many senior Linux kernel developers have spoken very clearly that the anti-Tivoization clause in GPLv3 is totally unacceptable …. This means that GPLv3-only code is always going to be incompatible with code released as part of the Linux kernel, because substantial parts of the kernel have and will be available only under a GPLv2 only license.

If anyone wants to release their code under a dual-license, it’s easiest if that’s how you submitted the code originally. For example, … to encourage its use in other operating systems.

If you have only contributed a few lines … specifying that “these 15 lines of the function I_worship_at_the_altar_of_rms() are under the GPLv2/v3, even though the rest of the file is GPLv2-only” is not something that we generally do. Speaking as a subsystem maintainer, … if someone insisted on line-level copyright statements, I’d just simply reject the patch rather than dealing with the accounting nightmare. If you want to add a GPLv2/GPLv3 dual license to a file, … you’ll need to get the consent of everyone who has contributed changes to that file.

Finally, as Jonas has stated, if you are trying to impose the anti-Tivoization clause through the back door, it’s not going to have that effect, since people can always choose either license for dual-licensed code, and for the kernel GPLv2 always has to be one of the choices.

A bit later in the discussion, he posted this equally fascinating follow-up [3]:

The more subtle thing to consider is that with dual-licensed code, ***anyone*** has the ability to strip one of the licenses from the code in the course of making [a] modification. … That’s a completely legal thing to do …. The reason why I dislike someone taking GPLv2/v3 code and stripping out the GPLv2 license is because it makes new versions of code which I had originally written becoming available only under a GPLv3 license.

But there’s a flip side to this, which is, the same legal argument ***also*** allows a kernel maintainer to take a contribution which is under a GPLv2/v3 or GPLv2+ license, and incorporate it into a GPLv2-only file, and not bother to mark that it originally came from a GPLv2+ or GPLv2/v3 contribution. … you could find that contribution and extract that code and use it in some other GPLv3 project. But we are under no obligation to mark that a particular set of lines in a file originally came from a GPLv2/v3 or GPLv2+ contribution. … That’s not to say that certain drivers won’t be dual licensed, for specific reasons, but you shouldn’t expect that core kernel files will be GPLv3 compatible in the near future.

Rob Landley gave a very bleak yet highly interesting assessment of the modern history of “copyleft” [4]. To Luke, he said:

You’re aware that copyleft in general is declining, right?

“The GPL” was synonymous with copyleft … and the only thing programmers had to know is whether or not some other license was GPL-compatible. If it was: treat it as GPL. If it wasn’t: ignore it.

But there’s no “the GPL” anymore. Linux and Samba can’t share code, even though they implement two ends of the same protocol. And making your project “GPLv2 or later” means you can’t take code from _either_ source. These days the GPL largely serves to _prevent_ code reuse, and people have responded to the perceived problems with “GPL-next” initiatives where they fragment copyleft further with Affero variants, by using creative commons on code, and so on. But copyleft only ever worked as one big universal license, and now it doesn’t.

… the most common license on Github is “no license specified,” and that’s not just ignorance, that’s napster-style civil disobedience from a generation of coders who lump copyright in with software patents and consider it all “too dumb to live.” (You think GPL enforcement suits are viewed any differently than DMCA takedown notices on youtube, both coming from clueless old people?)

Now add in Android’s official “no GPL in userspace” policy, which means that if you preinstall GPL (or LGPL) software in your install image, you can’t use the Android trademark to describe your product. (Did I mention that smartphones are replacing the PC? … ) I’m sorry, but Richard Stallman _screwed up_. GPLv3 succeeded where Sun’s CDDL failed: it split copyleft into incompatible warring factions which are collectively shrinking in market share because none of them are as useful as “The GPL” was. … Advocating for GPLv2 to go away is sad, but understandable. Expecting GPLv2 to be replaced by GPLv3 is just delusional.

Info
[1] Al Viro on George Spelvin’s “hackish” solution: http://lkml.indiana.edu/hypermail/linux/kernel/1303.2/02764.html
[2] Tux3 fsync debate: http://lkml.indiana.edu/hypermail/linux/kernel/1305.1/02246.html
[3] Theodore Ts’o on dual-licensed code in the kernel: https://lkml.org/lkml/2013/5/20/218
[4] Rob Landley on copyleft: https://lkml.org/lkml/2013/5/21/139

August 2013 Issue 153 linux-magazine.com | Linuxpromagazine.com