Proceedings of FREENIX Track: 2000 USENIX Annual Technical Conference

San Diego, California, USA, June 18–23, 2000

UNIX FILE SYSTEM EXTENSIONS IN THE GNOME ENVIRONMENT

Ettore Perazzoli

THE ADVANCED COMPUTING SYSTEMS ASSOCIATION

© 2000 by The USENIX Association. All Rights Reserved.

For more information about the USENIX Association: Phone: 1 510 528 8649, FAX: 1 510 548 5738, Email: [email protected], WWW: http://www.usenix.org

Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein.

Unix file system extensions in the GNOME environment

Ettore Perazzoli, Helix Code, Inc. [email protected], http://primates.helixcode.com/~ettore/

1 Introduction

This paper explains how the GNOME [GNOME] project is extending the functionality of the Unix file system for use in both the desktop and applications, by using a user-level library called the GNOME Virtual File System.

There are various reasons for extending the Unix file system and its API, as explained in the following sections.

1.1 Uniform file access

In a modern desktop environment, users have to deal with files that are available in different ways. For example, files can be remotely available from a Web or FTP site. Sometimes they are stored locally, but are contained in tar or zip files. Sometimes they are stored in a compressed form.

Each of these cases usually requires the user to use a different tool; using different tools means that users have to learn the user interface for each of them, and have to manually make these tools talk to the applications, for example by using temporary files.

To avoid this problem, there should be a global file system namespace that all applications can use to access these different kinds of files. Users should not need to install and use different programs to access files available through different methods, and all the applications should be able to access this global file system namespace.

So basically we need a consistent API that all the applications can use to access these files in a simple way.

1.2 Representing special non-file objects

Current Unix desktop environments lack a unified file system-like representation of the system resources. In the Windows operating system, users can access all the system resources from the “My Computer” folder, which acts as the root of the operating system shell’s “namespace”. This means that the control panel, the printer’s spool, the trashcan, the removable devices and so on are all accessible through the same simple “point-and-click” mechanism within the same application (Explorer.exe). Unlike in the current Unix desktop environments, users don’t need to deal with different applications for the various objects that are in their computer: they can just use the desktop shell.

To implement similar functionality on Unix, we would need to have an extensible mechanism for providing the contents of these virtual folders and implementing operations that can be performed on them.

1.3 Asynchronous operation

On Unix, there is no API to let applications do fully asynchronous I/O in an easy and portable way.

The reason why this is so important is that GUI applications need to be responsive all the time. If a GUI application is blocked during a long synchronous file operation, the user cannot stop the operation, nor perform any other task with the same application. This is especially important for a file manager, which needs to be able to perform multiple I/O tasks in parallel without taking control away from the user.

One of the ways of dealing with asynchronous I/O, i.e. using the select() system call, does not work with all the operations; for example, there is no way to make an asynchronous gethostbyname(), open() or stat() without complicated hacks.

Unix programmers could also use POSIX threads for performing asynchronous operations, but thread programming is very error-prone and can easily lead to problems. Writing a threaded application requires more programming skills than writing a non-threaded one, and in the free software world it is very important to make things easy for programmers. Moreover, POSIX threads are not fully portable across different platforms, as not all implementations are reliable and efficient.

Finally, although a POSIX API for asynchronous I/O exists, it is not implemented on all systems and does not work with all system calls. For example, you cannot execute the open() system call in an asynchronous fashion. Moreover, this API does not fit nicely in the event-based model of GUI systems; consequently, it is not suitable for rapid development of GUI applications.

What we need for asynchronous operation is an easy-to-use API that hides most of the details from the programmer and integrates nicely with the existing GUI toolkits. The API should give a better abstraction for doing async I/O, in the simplest way possible.

2 Existing User Space Virtual File Systems

In the past, there have been other attempts to implement extensions to the Unix file system in the free software world, none of which completely fulfilled the needs of the GNOME project.

2.1 The GNU Midnight Commander

The GNU Midnight Commander is the file manager of choice for the GNOME project at the time of writing. It implements a user-space Virtual File System library that solves the problem of accessing files within archive files or on remote Internet sites through an extended URI system, which is explained in greater detail in section 4.

Although the VFS library source code could be made independent of the GNU Midnight Commander, and thus its functionality could be used by all the applications, it suffers from a few design problems:

• It does not support asynchronous operation: if you perform an operation on a remote system, the Midnight Commander will be blocked until the operation is complete, and the user is not even able to stop the operation.
• The library is hidden under a Unix API that is not very extensible.

The latter problem is the most important one: as a Virtual File System has to deal with kinds of files that can be quite different from the standard Unix files, it needs some API extensions to support this functionality. Moreover, it needs some API extensions to deal with asynchronous operations. Unfortunately, it is not possible to add functionality to the API in a clean way without breaking Unix compatibility, so this makes the GNU Midnight Commander’s Virtual File System unsuitable as a generic library for GNOME.

2.2 KDE’s kio

The K Desktop Environment [KDE] provides a completely asynchronous virtual file system library called kio.

kio uses the same extended URI syntax that the GNU Midnight Commander uses; but unlike the Midnight Commander, all the file operations are performed through external helper processes (kioslaves) in an asynchronous fashion. The helper process communicates with the master process using Unix domain sockets and custom protocols; the library performs encoding/decoding of such protocols automatically and notifies the programmer using the Qt [Qt] signal system.

Although this system is closer to the requirements that we have listed at the beginning of this paper, it is still suboptimal, for the following reasons:

• Operations are very simple: the API allows the programmer to download/upload a whole file and perform some simple file operations, but it does not provide all the versatility of the Unix API.
• Each file operation requires an external process: kio does not take advantage of threads.
• There is no API to perform operations in a synchronous fashion, so the programmer needs to spawn a process, request an operation and then wait for the result from it.

3 The GNOME Virtual File System

The solution we gave to these issues in the GNOME project is in the form of a new library, designed from scratch to meet all the requirements. This library (the GNOME Virtual File System, or GNOME VFS for short) takes advantage of the existing GNOME development libraries, such as GLib and GTK+ [GTK].

The following design ideas were kept in mind while implementing the GNOME VFS:

• Like the rest of GNOME, the API should be C-based and follow the standard GNOME programming style.
• The implementation should be portable, so that making GNOME VFS a core component of GNOME would not restrict the availability of GNOME on various Unix platforms.
• Using GNOME VFS should not make the programmer’s life harder.
• The asynchronous API should be simple to use, and be nicely integrated with the standard GLib main loop that is central to GNOME applications. (The GLib main loop is the main event-handling loop in GTK+ and GNOME applications. The X main loop is nicely wrapped by the GLib loop.)
• Adding new access methods should be possible, and programmers should not need to care about too many of the details, such as asynchronous behavior. (By “access method”, we mean the implementation of a protocol, of a file format, or any other way to retrieve, create or modify a GNOME VFS file. For example, there should be an HTTP access method, a ZIP file access method and so on.)

4 Extended URIs

The GNOME VFS uses an extension of the traditional web Uniform Resource Identifiers (URIs) to access files, instead of the standard Unix file paths. This enables us to access different kinds of resources in a way that is both generic and easy to understand for users.

GNOME VFS extended URIs are reminiscent of the GNU Midnight Commander’s URI syntax, and consequently use the # character to stack access methods on top of each other.

In the simplest form, a GNOME VFS URI looks exactly like a normal Web URI, that is:

    method://user:password@some.host/path/to/file

As in normal Web URIs, user and password are optional; if they are left out, the @ character is omitted as well.

For example, the URI

    http://www.gnome.org/index.html

will refer to the file index.html from the host www.gnome.org through the http access method, which is an implementation of the HTTP protocol.

The access method for the local file system is called file, so, for example, the file /etc/passwd is accessed through the URI

    file:/etc/passwd

Unlike normal Web URIs, though, GNOME VFS URIs let you “stack” access methods on top of each other. GNOME VFS, in fact, supports two kinds of access methods: “toplevel” access methods, that access files directly, and “archive” access methods, that access files within other files.

For example, there is a zip access method that lets you access files contained in a .zip archive file: the zip access method needs a “parent” access method to access the archive in which the file is contained.

Stacking is achieved in GNOME VFS URIs by using the special character #. The generic syntax is:

    uri#method[/sub_uri]

When this syntax is used, it specifies that sub_uri must be accessed within the file available through uri using the specified method.

For example, imagine you have a foo.zip archive containing a file called bar.c. Also suppose that bar.c is contained in a directory called baz within foo.zip, and that foo.zip is in your home directory /home/joe. The URI to specify bar.c is:

    file:/home/joe/foo.zip#zip/baz/bar.c

If the file was available through FTP instead, the URI would be something like:

    ftp://joe:[email protected]/home/joe/foo.zip#zip/baz/bar.c

Some access methods don’t require a subpath; for example, this is the case with the gzip access method that can be used to read and create compressed files in gzip format. If you wanted to read the contents of a compressed foo.gz file in your home directory, you would simply have to specify the following URI:

    file:/home/joe/foo.gz#gzip

By combining the gzip access method with the tar access method, it is possible to access files contained in tar.gz archives:

    file:/home/ettore/download/gnome-vfs-0.1.tar.gz#gzip#tar/gnome-vfs-0.1/AUTHORS

There is no limit on the amount of stacking that can be performed. Every access method in GNOME VFS is implemented as a dynamically loaded plug-in, as described in section 10: anyone can extend the VFS with new access modules.

5 GNOME VFS API basics

Almost all the standard Unix file operations have counterparts in GNOME VFS. In the following sections, we will give a brief overview of the GNOME VFS API.

5.1 The GnomeVFSURI object

A GNOME VFS URI is represented by a special GNOME VFS object called GnomeVFSURI. GnomeVFSURI objects are the preferred way to specify files: users of the library can use GnomeVFSURIs to store and manipulate URIs, and the interface between the library and its plug-ins uses GnomeVFSURI objects.

A GnomeVFSURI object is created by using the call

    GnomeVFSURI *gnome_vfs_uri_new (const char *s)

GnomeVFSURIs also have a reference count that can be controlled by using the calls:

    void gnome_vfs_uri_ref (GnomeVFSURI *uri)
    void gnome_vfs_uri_unref (GnomeVFSURI *uri)

A GnomeVFSURI can also be converted into a printable string by using the following call:

    char *gnome_vfs_uri_to_string (const GnomeVFSURI *uri,
                                   GnomeVFSURIHideOptions hopt)

It is also possible to extract the host, user name and password information from a GnomeVFSURI, as well as to compare and combine URIs. Virtually any path operation that the programmer might need is supported directly through the GnomeVFSURI API.

5.2 The GnomeVFSResult enumeration

All the GNOME VFS operations return a result value of type GnomeVFSResult that represents the result of the operation. GnomeVFSResult is a numeric enumeration:

    enum _GnomeVFSResult {
        GNOME_VFS_OK,
        GNOME_VFS_ERROR_NOTFOUND,
        GNOME_VFS_ERROR_GENERIC,
        GNOME_VFS_ERROR_INTERNAL,
        GNOME_VFS_ERROR_BADPARAMS,
        GNOME_VFS_ERROR_NOTSUPPORTED,
        GNOME_VFS_ERROR_IO,
        GNOME_VFS_ERROR_CORRUPTEDDATA,
        /* ... */
        GNOME_VFS_NUM_ERRORS
    };
    typedef enum _GnomeVFSResult GnomeVFSResult;

Just as with the Unix errno variable, you can get a string description from a GnomeVFSResult value by using the following function:

    const gchar *gnome_vfs_result_to_string (GnomeVFSResult result)

6 Synchronous API

In GNOME VFS, both a synchronous and an asynchronous API call exist for most file operations. The synchronous versions work like normal Unix calls: they perform the operation, then return and report success/failure using a GnomeVFSResult value.

6.1 The GnomeVFSHandle object

As in the Unix API, files need to be “open” before being ready to be read or written. But while the Unix API returns a simple integer to represent a “file handle”, the GNOME VFS API provides a special object type for that, called GnomeVFSHandle.

GnomeVFSHandle objects are created using the “open” or “create” calls

    GnomeVFSResult gnome_vfs_open_uri
        (GnomeVFSHandle **handle_return,
         GnomeVFSURI *uri,
         GnomeVFSOpenMode open_mode)

    GnomeVFSResult gnome_vfs_create_uri
        (GnomeVFSHandle **handle_return,
         GnomeVFSURI *uri,
         GnomeVFSOpenMode open_mode,
         gboolean exclusive,
         guint perm)

and destroyed by using the “close” call:

    GnomeVFSResult gnome_vfs_close (GnomeVFSHandle *handle)

6.2 Synchronous I/O Example

Here is a simple example demonstrating synchronous operation in GNOME VFS. This subroutine will read a file and output its contents to stdout. The code should be rather self-explanatory.

    #include <stdio.h>
    /* GNOME VFS umbrella header; the exact path may differ between releases. */
    #include <libgnomevfs/gnome-vfs.h>

    gboolean
    vfs_cat (const char *uri)
    {
        GnomeVFSURI *vfs_uri;
        GnomeVFSHandle *handle;
        GnomeVFSResult result;

        vfs_uri = gnome_vfs_uri_new (uri);
        if (vfs_uri == NULL) {
            printf ("'%s' is not a valid URI.\n", uri);
            return FALSE;
        }

        result = gnome_vfs_open_uri (&handle, vfs_uri, GNOME_VFS_OPEN_READ);
        if (result != GNOME_VFS_OK) {
            printf ("Error opening '%s': %s\n", uri,
                    gnome_vfs_result_to_string (result));
            gnome_vfs_uri_unref (vfs_uri);   /* do not leak the URI */
            return FALSE;
        }

        while (1) {
            GnomeVFSFileSize bytes_read;
            GnomeVFSFileSize i;
            char buffer[4096];

            result = gnome_vfs_read (handle, buffer, sizeof (buffer),
                                     &bytes_read);
            if (result != GNOME_VFS_OK) {
                printf ("%s: %s\n", uri,
                        gnome_vfs_result_to_string (result));
                gnome_vfs_close (handle);    /* release resources on error */
                gnome_vfs_uri_unref (vfs_uri);
                return FALSE;
            }

            /* Zero bytes read means end of file. */
            if (bytes_read == 0)
                break;
            for (i = 0; i < bytes_read; i++)
                putchar (buffer[i]);
        }

        gnome_vfs_close (handle);
        gnome_vfs_uri_unref (vfs_uri);
        return TRUE;
    }

7 Asynchronous API

In the asynchronous API calls, instead, the operation requested is started in a separate thread of execution and control returns to the caller immediately. The caller will have to specify a callback function that will be called when the operation is completed. If the operation takes a long time to perform (such as a “copy” operation), the callback might be called more than once to report progress.

All the synchronous API calls have asynchronous counterparts; their names are the same as the synchronous ones, but use the gnome_vfs_async prefix instead of just gnome_vfs.

Callbacks for asynchronous operations are triggered in the normal GLib/GTK+ event loop. This means that the application will be able to handle GUI events and GNOME VFS events simultaneously and transparently. The programmer just needs to set up the callback functions and make sure the event loop is running all the time. This makes GNOME VFS very convenient to use in GUI applications.

7.1 The GnomeVFSAsyncHandle object

If an asynchronous operation is started successfully, the caller is given back a GnomeVFSAsyncHandle object that can be used to stop the operation after it has been started, by using the following call:

    GnomeVFSResult gnome_vfs_async_cancel (GnomeVFSAsyncHandle *handle)

In the case of file “open” and “create” operations, the GnomeVFSAsyncHandle object can also be used to request read/write operations on the file after it has been opened or created.

7.2 Opening a file as a GLib I/O channel

A common application case is when the program needs to get data from a file stream as soon as it becomes available from the transport layer. In the case of Unix, this is achieved through the select() system call.

GLib abstracts this mechanism by merging it with the event handling loop, through objects known as GIOChannels. With GIOChannels, it is possible to attach callbacks to a file descriptor, and have functions called as soon as data becomes available on it.

Special GNOME VFS API functions allow the programmer to open and read (or write) a file through a GIOChannel. For example, a file can be opened by using the following function:

    GnomeVFSResult gnome_vfs_async_open_as_channel
        (GnomeVFSAsyncHandle **handle_return,
         const gchar *text_uri,
         GnomeVFSOpenMode open_mode,
         guint advised_block_size,
         GnomeVFSAsyncOpenAsChannelCallback callback,
         gpointer closure);

Notice that, unlike the normal Unix select() call, this is reliable with local files too: the application will never be blocked when reading from a file for which the “data available” callback has been called.

This is possible because of the GNOME VFS asynchronous engine described in section 11.

7.3 Asynchronous API example

The following function reads a file asynchronously, with the output sent to stdout.

    #define BUFFER_SIZE 4096

    /* This is the callback that will be called whenever something
       happens on the I/O channel associated with the file. */
    static gboolean
    io_channel_callback (GIOChannel *source,
                         GIOCondition condition,
                         gpointer data)
    {
        char buffer[BUFFER_SIZE + 1];
        unsigned int bytes_read = 0;
        unsigned int i;

        if (condition & G_IO_IN) {
            /* Data is available. */
            g_io_channel_read (source, buffer, sizeof (buffer),
                               &bytes_read);
            for (i = 0; i < bytes_read; i++)
                putchar (buffer[i]);
            fflush (stdout);
        }

        /* An error happened while reading the file. */
        if (condition & G_IO_NVAL)
            return FALSE;

        /* We have reached the end of the file. */
        if (condition & G_IO_HUP) {
            g_io_channel_close (source);
            return FALSE;
        }

        /* Returning TRUE will make sure the callback remains
           associated to the channel. */
        return TRUE;
    }

    static void
    open_callback (GnomeVFSAsyncHandle *handle,
                   GIOChannel *channel,
                   GnomeVFSResult result,
                   gpointer data)
    {
        if (result != GNOME_VFS_OK) {
            printf ("Error: %s.\n",
                    gnome_vfs_result_to_string (result));
            return;
        }

        printf ("Open: '%s'.\n", (char *) data);
        g_io_add_watch_full (channel, G_PRIORITY_HIGH,
                             G_IO_IN | G_IO_NVAL | G_IO_HUP,
                             io_channel_callback, handle, NULL);
    }

    void
    start_cat (const char *uri)
    {
        GnomeVFSAsyncHandle *handle;

        /* Start the 'read' operation. */
        gnome_vfs_async_open_as_channel (&handle, uri,
                                         GNOME_VFS_OPEN_READ,
                                         BUFFER_SIZE,
                                         open_callback, "data");
    }

8 File attributes

GNOME VFS also provides a GnomeVFSFileInfo object that works as an extension of struct stat in Unix. In addition to the standard attributes that struct stat provides, GnomeVFSFileInfo also gives access to:

• MIME type information. GNOME VFS is able to take advantage of plug-ins for which the MIME type is part of the information that the access method provides. For example, in the case of the HTTP back-end, GNOME VFS can provide the MIME type as returned by the HTTP server.
• Metadata. Arbitrary key/value pairs can be associated with files, for generic purposes. For example, you can use an “icon” key for specifying a file’s icon, or a “description” key for a verbose description of the file.

As some kinds of information might not be supported by certain plug-ins (for example, it is not possible to know the number of physical blocks occupied by a file via HTTP), GnomeVFSFileInfo provides a bitmask specifying which fields are actually valid and which are not.

GNOME VFS also provides a simple API for loading directory information in a progressive way, calling an asynchronous callback as data is copied into memory. This functionality is particularly useful for a file manager, as the directory view can be updated without blocking the user interface, thus giving the user effective visual feedback of what is going on, even with those slow back-ends for which reading a directory is an expensive operation (such as a tar file access method, which requires a sequential scan of the whole .tar file).

9 File transfer support

In GNOME VFS there is a special API call for copying and moving files that provides the caller with progress information while the operation is being performed. GNOME VFS is able to automatically optimize the case in which a file is moved between two different locations on the same physical file system, by using the information provided by the source and destination plug-ins.

When this API call is used, the whole file transfer is performed in a separate thread or process, which dispatches the progress information to the main thread/process using the same mechanism that is used by the other asynchronous calls.

In order to reduce the impact of dispatching progress information across process/thread boundaries, the actual calls take place periodically, at a rate specified by the programmer. In the case of a file manager, for example, this will be done only a few times per second, just enough to make sure the GUI is updated the way the user would expect.

10 Implementation of access plug-ins

As explained in section 4, access methods are implemented as dynamically loaded plug-ins. Even the file access method that accesses local files is implemented as a plug-in.

Dynamic loading of plug-ins happens during GnomeVFSURI parsing (see section 5.1): the URI is split into its #-separated subparts, the library looks up the access method names (such as file, http and so on), and tries to locate the corresponding plug-in library, by using both a system-wide and a user-wide configuration file called gnome-vfs.conf. If the library is located, it gets linked in dynamically; otherwise, an error code is returned, reporting that the URI is invalid.

Every access method module must provide an initialization function; the initialization function returns a simple vtable containing pointers to the implementations of the GNOME VFS operations for that access method. Pointers to these vtables are stored directly into the GnomeVFSURI object, ready for the first operation to be invoked on it.

At the time of writing, the following modules have been implemented:

• A bzip2 module, for files that are compressed with the bzip2 program.
• A gzip module, for files that are compressed with the gzip program.
• An ftp method for accessing remote sites through the Internet FTP protocol.
• An http method for accessing remote sites through the HTTP 1.1 protocol. This module also supports WebDAV.
• An extfs module, supporting Midnight Commander’s generic archive file support based on simple shell scripts. This multipurpose module makes GNOME VFS able to deal with zip, zoo, rpm, deb, arj and other formats.
• A gconf protocol to access the new GNOME configuration database, called GConf.

Other modules, including a tar one, are being developed.

Although this is not yet supported at the time of writing, there are also plans to support CORBA-based [CORBA] access method plug-ins, possibly using the Bonobo component model.

This will have important consequences:

• It will become possible to write plug-ins in any CORBA-aware language. For example, you could have plug-ins written in Perl, Python, or other scripting languages for which CORBA support exists. This will make it extremely simple for people to come up with their own special scripts for special directories or file systems.
• It will become possible to make GNOME VFS see file systems that are implemented in a different process. The external process might be running all the time, and would be contacted through CORBA. For example, you could see the list of active print jobs in the print spooler as a normal GNOME VFS directory.

11 Implementation of asynchronous operations

To make operations totally asynchronous, we need to be able to perform them independently of the code that wants them to be executed. This can be done using either external helper processes or helper threads.

The thread-based solution has a number of advantages:

• It requires less memory to work.
• It is faster, as no data needs to be transferred from the helper to the master.

Unfortunately, it is not fully portable, as many systems don’t have suitable thread support. For this reason, GNOME VFS implements asynchronous operation in both ways. This is done by splitting the library in two parts: the basic file access library, and the asynchronous wrapper library. While there is one single version of the former, there are two versions of the latter: one that is based on POSIX threads, and one that uses external processes.

At run time, GNOME VFS applications are dynamically linked to either of the two wrapper libraries. This makes it possible to use either method without changes in the source code. In the case of helper processes, the GNOME Virtual File System uses CORBA to communicate with them.

Using CORBA has the advantage of not requiring the creation of a custom inter-process communication protocol; moreover, adding new operations is very simple (you just need to extend the IDL). GNOME comes with its own ORB by default, so this does not add any further constraints to the applications willing to use GNOME VFS.

12 GNOME VFS and GNOME

GNOME VFS is a core component of the upcoming GNOME 2.0 platform. When GNOME 2.0 is released, GNOME VFS will be available for all applications (GUI or non-GUI) to use. GNOME VFS is also a central component of the new GNOME file manager and shell, which is called Nautilus [Nautilus] and is currently being developed by Eazel, Inc. [Eazel]. Nautilus makes extensive use of asynchronous operations and file system abstractions to improve the usability of the GNOME desktop. The first public release of Nautilus is expected in Summer 2000.

13 Availability

GNOME VFS is available from the GNOME FTP repository:

    ftp://ftp.gnome.org/pub/GNOME/unstable/sources/gnome-vfs

It is also possible to access the source code through the GNOME anonymous CVS system anoncvs.gnome.org, as explained at the URL

    http://developer.gnome.org/tools/cvs.html

The name of the module is gnome-vfs.

The GNOME CVS has a web front-end available at the URL

    http://cvs.gnome.org

References

[GNOME] The GNOME home page, http://www.gnome.org
[HelixCode] Helix Code, Inc., http://www.helixcode.com
[KDE] The KDE home page, http://www.kde.org
[Qt] Qt, http://www.troll.no
[GTK] GTK+, http://www.gtk.org
[Nautilus] Nautilus, http://nautilus.eazel.com
[CORBA] CORBA, http://www.corba.org
[Eazel] Eazel, Inc., http://www.eazel.com
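As an illustration of the stacking syntax described in section 4, the following is a minimal sketch of how a stacked URI could be split at the # character into (method, path) layers. It is not the GNOME VFS parser: the uri_layer type, the split_stacked_uri() function and the fixed size limits are hypothetical names chosen for this sketch, and it ignores escaping and the user/password/host parts of the toplevel URI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LAYERS 8
#define MAX_LEN    256

/* One layer of a stacked URI: an access method name plus the path
 * (or sub-URI) handed to that method.  Hypothetical type, not part
 * of the GNOME VFS API. */
struct uri_layer {
    char method[MAX_LEN];
    char path[MAX_LEN];
};

/* Copy at most MAX_LEN - 1 bytes of [src, src + n) into dst. */
static void copy_span(char *dst, const char *src, size_t n)
{
    if (n >= MAX_LEN)
        n = MAX_LEN - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Split `uri` into layers.  The first layer has the form "method:path";
 * every later layer follows a '#' and has the form "method[/sub_uri]".
 * Returns the number of layers (0 for an empty string), or -1 if the
 * URI is malformed or has more than MAX_LAYERS layers. */
int split_stacked_uri(const char *uri, struct uri_layer *out)
{
    const char *p = uri;
    int count = 0;

    assert(uri != NULL && out != NULL);

    while (*p != '\0') {
        size_t len = strcspn(p, "#");   /* length of this layer */
        const char *sep;
        size_t mlen;

        if (count == MAX_LAYERS || len == 0)
            return -1;

        /* The toplevel layer separates method and path with ':',
         * stacked layers with the first '/'. */
        sep = memchr(p, count == 0 ? ':' : '/', len);
        mlen = sep ? (size_t)(sep - p) : len;
        if (mlen == 0 || (count == 0 && sep == NULL))
            return -1;

        copy_span(out[count].method, p, mlen);
        if (sep == NULL)
            out[count].path[0] = '\0';                            /* e.g. "#gzip" */
        else if (count == 0)
            copy_span(out[count].path, sep + 1, len - mlen - 1);  /* skip ':' */
        else
            copy_span(out[count].path, sep, len - mlen);          /* keep '/' */
        count++;

        p += len;
        if (*p == '#') {
            p++;
            if (*p == '\0')     /* trailing '#' with no method name */
                return -1;
        }
    }
    return count;
}
```

Applied to the tar.gz URI from section 4, this sketch yields three layers: file with the archive path, gzip with an empty path, and tar with /gnome-vfs-0.1/AUTHORS. Each layer's method name is what a lookup such as the one described in section 10 would then resolve to a plug-in.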
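Section 8 mentions that GnomeVFSFileInfo carries a bitmask saying which fields are valid. The sketch below shows the general idea with a deliberately simplified structure. The file_info type, the INFO_FIELD_* flags and the helper functions are assumptions made for this illustration and do not reproduce the real GnomeVFSFileInfo layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags: one bit per optional field. */
enum info_field {
    INFO_FIELD_SIZE        = 1 << 0,
    INFO_FIELD_BLOCK_COUNT = 1 << 1,
    INFO_FIELD_MIME_TYPE   = 1 << 2
};

/* Simplified stand-in for a file information record. */
struct file_info {
    unsigned int  valid_fields;   /* OR of enum info_field bits */
    unsigned long size;
    unsigned long block_count;
    const char   *mime_type;
};

/* Nonzero if every field named in `fields` is valid in `info`. */
int file_info_has(const struct file_info *info, unsigned int fields)
{
    assert(info != NULL);
    return (info->valid_fields & fields) == fields;
}

/* What an HTTP-like back-end might be able to report: the size and
 * the MIME type are known, the physical block count is not. */
void fill_info_from_http(struct file_info *info,
                         unsigned long content_length,
                         const char *content_type)
{
    assert(info != NULL);
    info->valid_fields = 0;

    info->size = content_length;
    info->valid_fields |= INFO_FIELD_SIZE;

    info->block_count = 0;        /* unknown: its bit stays clear */

    info->mime_type = content_type;
    if (content_type != NULL)
        info->valid_fields |= INFO_FIELD_MIME_TYPE;
}

/* Return the block count, or `fallback` when the back-end did not
 * supply one. */
unsigned long file_info_blocks_or(const struct file_info *info,
                                  unsigned long fallback)
{
    return file_info_has(info, INFO_FIELD_BLOCK_COUNT)
        ? info->block_count : fallback;
}
```

The point of the design is that callers test the mask before trusting a field, so a back-end that cannot supply a value never has to invent one.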
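The vtable mechanism described in section 10 can be sketched roughly as follows. This is only an illustration under stated assumptions: a static table stands in for the gnome-vfs.conf lookup and the dynamic loading step, and every name here (method_vtable, lookup_method(), the sketch_result codes, the file and gzip stubs) is hypothetical rather than taken from the GNOME VFS sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    SKETCH_OK,
    SKETCH_ERROR_NOTSUPPORTED,
    SKETCH_ERROR_INVALID_URI
} sketch_result;

/* Operations an access method may implement; NULL means unsupported. */
struct method_vtable {
    sketch_result (*open)(const char *path);
    sketch_result (*make_directory)(const char *path);
};

/* Stub implementations standing in for real plug-in code. */
static sketch_result file_open(const char *path)
{
    return path[0] == '/' ? SKETCH_OK : SKETCH_ERROR_INVALID_URI;
}
static sketch_result file_mkdir(const char *path) { (void)path; return SKETCH_OK; }
static sketch_result gzip_open(const char *path)  { (void)path; return SKETCH_OK; }

/* Each module exposes one initialization function returning its vtable. */
static const struct method_vtable *file_init(void)
{
    static const struct method_vtable vt = { file_open, file_mkdir };
    return &vt;
}
static const struct method_vtable *gzip_init(void)
{
    static const struct method_vtable vt = { gzip_open, NULL };
    return &vt;
}

/* Static registry used here in place of a configuration file and a
 * dynamic loader. */
static const struct {
    const char *name;
    const struct method_vtable *(*init)(void);
} registry[] = {
    { "file", file_init },
    { "gzip", gzip_init }
};

/* Resolve an access method name to its vtable, or NULL if unknown. */
const struct method_vtable *lookup_method(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof registry / sizeof registry[0]; i++)
        if (strcmp(registry[i].name, name) == 0)
            return registry[i].init();
    return NULL;
}

/* Dispatch "open" through the vtable of the named method. */
sketch_result method_open(const char *method, const char *path)
{
    const struct method_vtable *vt = lookup_method(method);

    if (vt == NULL)
        return SKETCH_ERROR_INVALID_URI;
    if (vt->open == NULL)
        return SKETCH_ERROR_NOTSUPPORTED;
    return vt->open(path);
}

/* Dispatch "make directory"; methods may leave this slot empty. */
sketch_result method_make_directory(const char *method, const char *path)
{
    const struct method_vtable *vt = lookup_method(method);

    if (vt == NULL)
        return SKETCH_ERROR_INVALID_URI;
    if (vt->make_directory == NULL)
        return SKETCH_ERROR_NOTSUPPORTED;
    return vt->make_directory(path);
}
```

An unknown method name maps to an "invalid URI" result, mirroring the behavior described in section 10, while an empty vtable slot maps to "not supported".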