Zack's Kernel News
Community Notebook: Kernel News
AUGUST 2013 ISSUE 153 LINUX-MAGAZINE.COM | LINUXPROMAGAZINE.COM

Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the Linux kernel community.

By Zack Brown

ZACK BROWN
The Linux kernel mailing list comprises the core of Linux development activities. Traffic volumes are immense, often reaching 10,000 messages in a week, and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls to take on this task is Zack Brown.

Status of OverlayFS and Union Filesystems in General

Recently, Miklos Szeredi requested that OverlayFS be included in the main kernel tree. OverlayFS allows two directory trees to appear as one. Two files with the same path on each tree would appear to occupy the same directory in the overlaid filesystem. The project has been in existence for several years, but this time Linus Torvalds replied, “Yes, I think we should just do it. It’s in use, it’s pretty small, and the other alternatives are worse. Let’s just plan on getting this thing done with.”

Al Viro said he’d start reviewing the code, but he also suggested that if they were going to merge a union filesystem such as OverlayFS, they might as well consider merging other similar projects, such as Unionmount and Aufs. Unionmount in particular, he said, had been getting some good work lately from David Howells.

Meanwhile, Sedat Dilek jumped for joy at seeing OverlayFS close to acceptance. Al also replied again with his initial review. He’d identified some security issues and other technical problems, and he went back and forth with Miklos about them. The two at first didn’t see eye-to-eye about how to fix the issues, or even whether a given issue was really a problem.

At one point, George Spelvin offered his admittedly somewhat hacky solution to one of Al’s problems. The whole thing boiled down to the way OverlayFS or any union filesystem would behave under the full range of possible uses. Regarding George’s particular suggestion, Al walked through the convoluted process necessary to remove a directory [1] and replied, “I’m sorry, but this is insane.”

Elsewhere, in an entirely different thread, Sedat asked about the status of David’s Unionmount project. David replied, “It’s being reengineered again to take account of VFS changes that went in in the last merge window.” He added, “It’s a maze of twisty locking problems – some of which also apply to things like overlayfs:-(”.

The discussion in both threads ended there. It appears everyone, including Linus, is ready to see union filesystems like OverlayFS in the kernel. But no one, including Al Viro and the maintainers of the various union filesystem projects, is able to satisfactorily solve the technical problems that remain. At the moment, none of the projects seem close to getting past Al’s laser-beam code reviews, and until that happens, I’m certain none of them will be merged.

Astonishing Tux3 Performance Claims

There seems to be some suspicion between certain kernel developers and Tux3 developers. Tux3 is a versioning filesystem that’s been in development since 2008. Recently, Daniel Phillips, the project leader, posted some benchmarks that showed Tux3 outperforming tmpfs. As he put it, “To put this in perspective, we normally regard tmpfs as unbeatable because it is just a thin shim between the standard VFS mechanisms that every filesystem must use, and the swap device.”

Dave Chinner took a look at Daniel’s numbers and found some issues that he felt indicated a deliberate attempt to mislead people. In particular, he pointed out that the Tux3 benchmark didn’t include any “flush” operations – the Tux3 front end was off-loading all of its work to a back end that could take all the time it needed to complete the job. The front end would never block, and so it could simply race through the benchmark and exit. Dave said, “You’ve carefully crafted the benchmark to demonstrate a best case workload for the tux3 architecture, then carefully not measured the overhead of the work tux3 has offloaded, and then not disclosed any of this in the hope that all people will look at is the headline.”

Hirofumi Ogawa, one of the Tux3 developers, responded, saying fsync() had not yet been implemented, and the benchmarks were intended to show comparisons between just the parts of the code that had already been written.

Daniel also responded to Dave’s post, saying, “I should indeed have noted that ‘modified dbench’ was used for this benchmark, thus amplifying Tux3’s advantage in delete performance. This literary oversight does not make the results any less interesting: we beat Tmpfs on that particular load. Beating tmpfs at anything is worthy of note.”

Regarding the specific issue Dave had raised about off-loading 100% of Tux3’s work, Daniel said, “Yes, that is the entire point of our front/back design: reduce application latency for buffered filesystem transactions.”

Theodore Ts’o pointed out that one couldn’t simply ignore the fsync() data and expect a meaningful benchmark result. As he put it, “Since fsync() is defined as not returning until the data written to the file descriptor is flushed out to stable storage – so it is guaranteed to be seen after a system crash – it means that the foreground application must not continue until the data is written by Tux3’s back-end.” He added, “any advantage of decoupling the front/back end is nullified, since fsync() requires a temporal coupling.”

Daniel replied that when they optimized fsync, he expects “… Tux3 to perform competitively, because our delta commit scheme does manage the job with a minimal number of block writes …” [2].

Elsewhere in the thread, Dave remarked on his real concern. He said, “I don’t care how fast tux3 is – I care about being able to reproduce other people’s results. Hence if you are going to report benchmark results comparing filesystems then you need to tell everyone exactly what you’ve tweaked and why, from the hardware all the way up to the benchmark config.”

The discussion trailed out around there, but some kernel folks also seemed to feel that Daniel’s approach was too marketing-oriented, trying to make big announcements at the expense of clarifying the real progress made.

Dealing with Empty Symlinks

Back in January, Pádraig Brady noticed that Linux didn’t allow users to create symlinks that pointed to non-existent files. […]

[…] the behavior would end up being operating-system-dependent anyway. He said, “blanket refusal to traverse such beasts is a legitimate option.”

Eric Blake replied that the real point was not whether creating an empty symlink should be allowed in Linux – it was the way Linux should behave when it encountered an empty symlink during path resolution.

After all, even if Linux didn’t allow empty symlinks to be created, other operating systems did, and the filesystems containing those symlinks could be mounted under Linux. It would make sense to handle those cases correctly. Eric remarked:

“I personally don’t care whether you fix the Linux kernel symlink() to allow empty symlinks, or successfully argue for a bug fix against POSIX to permit the existing Linux symlink() behavior. I’d love to see Linux obtain POSIX certification someday, and either of those two courses of action would get us closer. Meanwhile, I know there are enough other issues in the kernel … that it will be a long time before we ever get a POSIX certification of a Linux system.”

Pavel Machek started exploring the extent of the issue under Linux, trying to identify which tools would break when encountering empty symlinks and how bad a break it would be, but the discussion ended at that point, with no clear resolution on a course of action, or even whether it was worth doing anything about the situation.

Linus Torvalds is notoriously disdainful of compliance for compliance’s sake. If there’s no cost to it, he’s not opposed, but if there are valid technical reasons to implement something in a non-compliant way, he’ll choose that over compliance every time, and he makes no secret of his contempt for cer- […]

[…] status quo would burn anyone. At the moment, it still seems unclear.

Difficult Bug Hunt

Michael Hocko used git bisect to track down a problem resuming a suspended system. Instead of resuming, the system would just reboot. He posted a patch to revert the commit that seemed to cause the problem.

H. Peter Anvin asked for more details about Michael’s system; H. Peter said, “This is one of a series of extremely bizarre suspend to RAM failures we are trying to make sense of.” The particular cause of the problem, he said, was “not just bizarre, this is extremely disturbing.”

The reason H. Peter found this so disturbing is that the piece of code Michael had reverted did nothing more than flip the NX bit. The NX bit is used in some CPUs to mark areas of memory as being “never executable,” and flipping that bit should never affect anything just on its own. The only way it could be involved in a problem with suspend-to-RAM is if there were some deeper malignancy.

Linus joined the discussion and traced the problem to __initdata, a special part of the kernel that marks certain things as being solely related to initialization, so that once initial-