Oversubscribing Inotify on Embedded Platforms

By Donald Percivalle (CPE) and Scott Vanderlind (CSC)
Senior Project
California Polytechnic State University, San Luis Obispo
Dr. Zachary Peterson
June 11, 2015

Abstract

For most computers running the popular Linux operating system, the integrated kernel component inotify provides adequate functionality for monitoring changes to files present on the filesystem. However, for certain embedded platforms where resources are very limited and filesystems are very populated (like network attached storage (NAS) devices), inotify may not have enough resources to provide watchers for every file. This results in applications missing change notifications for files they have watched. This paper explores methods for using inotify most effectively on embedded systems by leveraging more latent storage. Benefits of this include a reduction in dropped notifications in favor of an introduced delay on notifications for files that are less frequently changed.

Contents

1 Introduction
    1.1 Application
    1.2 Problem
    1.3 Problem Statement
2 Possible Solutions
    2.1 Modification of inotify
    2.2 Wholesale replacement of inotify
    2.3 Development of user-space watch aggregator
    2.4 Chosen Approach
    2.5 Related Work
3 Design
    3.1 Constraints
    3.2 Implementation
4 Takeaways
    4.1 Solution Viability
    4.2 Future Work
5 Reference Implementation
    5.1 Driver Application
    5.2 Watch Aggregation Module
    5.3 Directory Analysis Module

1 Introduction

Western Digital produces lines of Network Attached Storage (NAS) devices that are built around an embedded Linux computer. This embedded system acts as the brains of the operation: it interfaces with the hard disk drives, manages the filesystem, and provides all the services that you'd expect in a NAS, like AFP, SMB, NFS, etc. NAS appliances that strictly provide filesystem services are pretty cut-and-dry; however, Western Digital also supports a suite of mobile and desktop applications to interface with these NAS devices that require additional services to be provided, like DLNA and iTunes library sharing and media "cloud" integration that makes media available through an iOS or Android app.

Targeted towards both consumers and businesses, Western Digital sells My Cloud devices with storage sizes ranging from 2 TB to 24 TB. They are intended to provide a seamless NAS experience where the end user can't tell if they are interfacing with local storage on their client machine, or with their My Cloud, which could be across the Internet. Along with seamless reads and writes, Western Digital aims to provide an always-on media server environment for My Cloud customers. If a customer uploads a movie or an album of photos to their My Cloud, the goal is to have the iOS and Android apps, as well as any DLNA or media servers running locally on the My Cloud itself, be notified of the new file as soon as possible. This goal requires a system to monitor the entire filesystem for new files.

1.1 Application

Historically, applications on Linux-like systems have tracked changes to files using a kernel API called inotify. Inotify provides applications with the means to subscribe to filesystem events at a folder level: applications can be notified of files that have been created, modified, and deleted. This can be useful to many types of applications, especially those providing media sharing capabilities. For instance, when a media service is notified of a new file placed into a watched directory, that service might examine the file for validity and make its contents available through some other abstracted API.

Western Digital's devices provide more advanced media sharing services that synchronize file state and maintain parity between multiple devices. In doing so, the services running depend heavily on knowledge about the state of the filesystem as provided by inotify.

1.2 Problem

The processes running on the embedded system in the My Cloud NAS depend on knowledge of the state of a large tree of directories. In the native implementation of inotify, a watch is created for each file or directory that needs to be monitored. When a directory is being watched, the events that trigger a notification include a file creation, update, or delete within that directory. However, events that occur in subfolders of a watched directory do not trigger notifications. It is therefore necessary to create a watch for every folder in a tree should you want to subscribe to all file events.

The first hurdle is the hard limit on the number of watches that each user on the system may allocate. By default this limit is set to 524,288, and it therefore limits the number of files eligible for concurrent watching to 524,288. Theoretically, this limit is further lowered by redundant watches: should two application processes desire to watch for changes in the same directory (for instance, DLNA and iTunes both watching a directory called media), they both may allocate (and expend) a watch for that directory. In practice, this limit is easily raised by tweaking the contents of the file /proc/sys/fs/inotify/max_user_watches.

Let's consider the anatomy of an inotify watch struct, one of which is created for each file and is registered to an inotify watcher (a glorified file descriptor), the parent struct that aggregates a set of watches in the context of an application.

    /* Excerpt from include/linux/inotify.h:L20 */
    struct inotify_watch {
        struct list_head      h_list;  /* entry in inotify_handle's list */
        struct list_head      i_list;  /* entry in inode's list */
        atomic_t              count;   /* reference count */
        struct inotify_handle *ih;     /* associated inotify handle */
        struct inode          *inode;  /* associated inode */
        __s32                 wd;      /* watch descriptor */
        __u32                 mask;    /* event mask for this watch */
    };

Considering the structures that a single inotify watch contains, allocating a single watch requires (on a 32-bit system) 540 bytes of kernel memory. Unfortunately, the My Cloud EX2 device only has 500MB of memory available to the entire system. Assuming the watch limit mentioned above has been raised, this allows for 940,740 simultaneous watches. This limit cannot be raised without adding more memory to the system, as kernel memory cannot be swapped. Worth noting is that this theoretical maximum does not give any allowances for other applications using memory.

1.3 Problem Statement

It is not feasible for multiple applications to simultaneously be running on low-power integrated systems while also monitoring sufficiently large file systems for changes (with 'large' meaning more than 940,740 folders).

2 Possible Solutions

A number of solutions have presented themselves. Let's examine them.

2.1 Modification of inotify

The most obvious solution is to modify how inotify works so that it is inherently recursive. The immediate benefit is that only a single watch would be required on the root folder of the file tree to be monitored. Unfortunately, this avenue has been explored and decided unfit for any system, let alone embedded ones:

    In many filesystems, including Unix-style ones, recursing down a directory hierarchy isn't a first class operation. [...] You list the contents of a directory and for each content that is itself a directory, you transcend into it, and repeat. [...]

    You could replicate this algorithm in the kernel [...] but the performance of the system would suffer. This is much too long-running an operation to perform inside the kernel, particularly if you were to do so while holding the appropriate locks.

    - Robert Love, inotify author[2]

2.2 Wholesale replacement of inotify

While inotify is bundled with Linux (including the distribution used on My Cloud devices), it is by no means the end-all solution to file monitoring across all platforms. Certainly commercial operating systems like Mac OS and Windows have developed respective analogs, as have other distributions of Linux-like and Unix-like operating systems. The barriers to porting another system from a fellow POSIX compliant OS are theoretically low.

However, no system exists that is sufficiently appealing. *BSD kernels (FreeBSD, OpenBSD, NetBSD, et al.) provide a module called kqueue, which has certain advantages over inotify but is similarly not recursive. A potential advantage is that kqueue can provide subscribers with notifications for all filesystem events, starting at the root directory instead of a specific folder. This is akin to drinking from a firehose while standing next to a water fountain, and leads to much application-level overhead just to filter through irrelevant events.

Other alternatives have similar enough limitations as to obviate the need to discuss them.

2.3 Development of user-space watch aggregator

The final option considered is a user-space watch aggregator and event simulator. This hybrid system depends on inotify for reporting file change events for as many files as possible, and uses manual scanning of file meta-data at set intervals to find the events that were missed by inotify.

search and indexing tool) in 2004. The result of his investigations is an algorithm dubbed the Love-Trowbridge Recursive Directory Scanning Algorithm[1]. The purpose of this algorithm is to ensure that
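As a rough illustration of the manual scanning half of the hybrid approach from Section 2.3, the C sketch below remembers a directory's modification time and reports whether it has moved since the previous scan. The names dir_snapshot, snapshot_take, and snapshot_changed are hypothetical and are not taken from the reference implementation; the sketch also assumes a POSIX.1-2008 system that exposes st_mtim.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical record kept for a directory that has no inotify watch. */
struct dir_snapshot {
    struct timespec mtime; /* modification time seen at the last scan */
};

/* Record the directory's current modification time.
 * Returns 0 on success, -1 if the path cannot be examined. */
static int snapshot_take(struct dir_snapshot *snap, const char *path)
{
    assert(snap != NULL && path != NULL);

    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    snap->mtime = st.st_mtim;
    return 0;
}

/* Compare the directory's current modification time with the snapshot.
 * Returns true if it differs (or if the path can no longer be examined),
 * and refreshes the snapshot so the next scan starts from the new state. */
static bool snapshot_changed(struct dir_snapshot *snap, const char *path)
{
    assert(snap != NULL && path != NULL);

    struct stat st;
    if (stat(path, &st) != 0)
        return true; /* vanished or unreadable: treat as a change */

    bool changed = st.st_mtim.tv_sec != snap->mtime.tv_sec ||
                   st.st_mtim.tv_nsec != snap->mtime.tv_nsec;
    snap->mtime = st.st_mtim;
    return changed;
}
```

A periodic scanner would call snapshot_changed for each unwatched directory and synthesize events for those that return true. Note that a directory's modification time moves when entries are created, removed, or renamed inside it, but not when the contents of an existing file change, so detecting modifications this way would also require examining the files themselves.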

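Section 1.2 observes that inotify watches are not recursive, so covering a tree costs one watch per directory. The C sketch below shows one way that cost arises in practice: it walks a tree and calls inotify_add_watch on every directory it finds. The helper name watch_tree is hypothetical, the event mask is only an example matching the create, modify, and delete events mentioned above, and the sketch assumes a Linux system with glibc.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>
#include <sys/stat.h>

/* Hypothetical helper: add one inotify watch for `path` and one for every
 * directory below it. Returns the number of watches added, or -1 if the
 * top-level directory itself cannot be watched. */
static long watch_tree(int ifd, const char *path)
{
    assert(path != NULL);

    /* One watch covers only this directory, not its subdirectories. */
    if (inotify_add_watch(ifd, path, IN_CREATE | IN_MODIFY | IN_DELETE) < 0)
        return -1;

    long added = 1;
    DIR *dir = opendir(path);
    if (dir == NULL)
        return added; /* watched, but its children cannot be listed */

    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;

        char child[PATH_MAX];
        int len = snprintf(child, sizeof child, "%s/%s", path, entry->d_name);
        if (len < 0 || (size_t)len >= sizeof child)
            continue; /* path too long for this sketch: skip it */

        struct stat st;
        /* lstat() so that symbolic links to directories are not followed. */
        if (lstat(child, &st) == 0 && S_ISDIR(st.st_mode)) {
            long sub = watch_tree(ifd, child);
            if (sub > 0)
                added += sub; /* subtrees that fail are simply skipped */
        }
    }
    closedir(dir);
    return added;
}
```

Called with a descriptor from inotify_init1, the return value is the number of watches the tree consumed, which is the quantity that runs into the per-user limit and the kernel memory budget discussed in Section 1.2.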