Usmux: Unix-domain socket multiplexing

Steven Simpson December 3, 2018

Abstract

Although Java programs are executed on a virtual machine, they can run as fast as programs compiled to the native architecture when just-in-time (JIT) compilation is applied, along with other optimizations involving profiling and in-lining. These are most effective in long-running executions, where the cost of initialization and analysis can be amortized. Consequently, Java programs can run efficiently over long periods as server processes and desktop applications, but perform significantly less well as short-lived programs such as CGI scripts.

A long-running Java application could be adapted for use by short-lived commands by making them connect to (say) a socket which the application is listening on. The short-lived commands are then simple clients, whose job is merely to relay I/O between the user and the application. The only built-in and portable mechanism to connect with such a Java process is to use network sockets, possibly restricted to the loopback address for security purposes.

There are also third-party Java libraries which provide access to Unix-domain sockets, giving additional security (especially on multi-user machines), but these do not fit well with Java’s native socket abstraction, which allows its Internet-specific nature to leak through. They also usually require JNI, and therefore introduce additional complexity in terms of installation.

This report details an approach to supporting Unix-domain socket access to persistent Java processes, without using any JNI, and just a small amount of C in a separate wrapper or launcher program. This brings an additional benefit of launching the Java process in the background, with a synchronization point just after start-up to allow the process to abort execution if failure is detected early. Under the approach, the launcher creates a pair of named pipes in the filesystem, starts the Java process (informing it of the names of the two pipes), opens a Unix-domain socket, and multiplexes connections on the socket to the Java process via the pipes. A protocol is defined to permit the multiplexing.

Specifications of the protocol and the Java API are independent of this particular use of sockets and pipes, so they could be re-applied on other platforms with alternative concepts, while allowing the Java code that uses them to remain portable between platforms.

Introduction

At first, Java appears inherently slow by its design. Java source code is first compiled into bytecode, an assembly language for a virtual processor called the Java Virtual Machine (JVM). The JVM then processes each (virtual-)machine code instruction to produce the behaviour represented by the source code, and it is this processing which is slower than an equivalent program compiled to native machine code. The benefit is a write-once, run-anywhere (WORA) environment making Java programs maximally portable. The additional layer of abstraction (the JVM must ultimately be written in native machine code) also provides greater integrity, increasing security, making dynamic loading of foreign code possible, and detecting programmer error.

The creators of Java have also sought speed improvements along several avenues. Start-up times have been reduced by increasing the modularity of the standard library, and by permitting some lazy class initialization. Bytecode can be compiled to native code as it is loaded with a just-in-time (JIT) compiler installed in the JVM. The JVM may also monitor its process, looking for sections of code most frequently executed, in order to prioritize its optimization. It may also in-line sections of code to reduce the overhead of method invocation. Note that these optimizations apply at run time in the JVM, whereas the compiler mostly avoids them, as the effectiveness of these optimizations depends greatly on run-time characteristics not available during compilation.

Altogether, these optimizations can allow a Java process to run at least as fast as a natively compiled program, while still retaining the benefits of portability. However, each one also comes with an overhead, which must be amortized by its results being applicable over a sufficiently long period. For example, a Java program used as a short-lived command cannot easily benefit from JIT because JIT-ed code may only be executed a small number of times before exiting (and the next invocation will have to be re-JIT-ed), nor from HotSpot because the process terminates before enough analysis has been performed. Consequently, Java performs best when used for long-running programs, such as desktop applications and server processes.

Some environments are designed to work with short-lived processes. A command shell such as Bash repeatedly executes external commands, and the Common Gateway Interface (CGI) used by web servers to delegate requests to user programs mandates one process per request. These are therefore poor environments in which to use Java.

Persistence for CGI programs

Java isn’t the only language to suffer great overheads due to repeated start-ups. Any per-process delay on a CGI program, in whatever language, makes CGI the bottleneck for any high-demand service. FastCGI [1] was devised to address this issue by using a regular TCP connection between the web server and the CGI program, which runs persistently and serves multiple requests, instead of running once per request. This is a good approach, but the Usmux project aims for a more general mechanism than one built to support CGI; it should be possible to implement a FastCGI-like mechanism over Usmux.

[upcgi]

Java support for Unix-domain sockets

Another approach to allow a Java process to run persistently is to provide a library supporting Unix-domain sockets. Several implementations are summarized here:

JUDS [2]
    [Investigate.]

junixsocket [3]
    This is a JNI-based library that uses the Java Sockets API, and supports RMI. Fitting in with the Sockets API is a feat, as much of it is based on Internet-domain sockets, and doesn’t seem to fit well when a broader abstraction is required.

J-BUDS
    This is an orphaned project. [Investigate.]

gnu.net.local
    This is another orphaned project. [Investigate.]

Providing native support for Unix-domain sockets in Java seems to create more problems than it solves. It is necessarily platform-specific, so the Java code that uses the support must be wrapped behind a platform-independent abstraction if the program is to remain platform-independent itself. The fact that the Internet-domain nature of the Java Sockets API leaks through its own abstraction means that that API does not help to hide platform-specificity.

Usmux

Usmux is an approach to allow one-invocation-per-request environments like CGI to be used with persistent Java processes. At the time of writing, the implementation is POSIX-specific, but the architecture is platform-independent. Implementations on other platforms may be possible.

On a POSIX system, the usmux command takes a Java command as its trailing arguments. It also opens a Unix-domain socket, creates two named pipes, and injects the names of these pipes as a configuration string into the Java command, which is then executed in the background. It then accepts connections on the socket, and presents them as sessions to the Java process by creating another pair of named pipes per session, using its original pipes to notify the Java process of them, and relaying data between the pipes and the socket. As a result, the Java process can run persistently, but be invoked by multiple clients over a Unix-domain socket.

On other platforms, a similar command could use other platform-specific features to interact with a server running the same Java software. Only the injected configuration string needs to change, and the Usmux Java library will dynamically load in the matching server-side implementation.

[1] http://www.fastcgi.com/drupal/
[2] https://github.com/mcfunley/juds
[3] http://code.google.com/p/junixsocket/

Even under POSIX, other mechanisms are possible too. For example, usmux could create a pair of named pipes, and multiplex several sessions over them. (However, experience has shown that this is error-prone to code, and hampers overall throughput.)

The use of a Unix-domain socket to communicate with clients has several advantages. First, as its rendezvous point is part of the filesystem, the host’s native access control can be applied to it. For more sophisticated control, it is possible to inform the server of which user or group is actually connecting. Finally, invocation of the client could exploit external authentication protocols such as SSH.

Architecture

Usmux aims to allow a client to initiate a session with a server. Each session is a pair of simplex streams carrying byte-oriented data. Upstream denotes travel from the client to the server, while downstream denotes travel in the opposite direction. Either peer may terminate either stream at any time, and the session is terminated when both streams are terminated.

The basic Usmux architecture is shown in Figure 1. The Usmux daemon sits between the server and its clients, and relays messages between them. The daemon is also responsible for forking the server into the background.

Various communication schemes may be employed between client and daemon, and between daemon and server. In some schemes, client and server communicate directly, with the daemon playing no part after start-up. Client-daemon schemes may involve a Unix-domain socket, an Internet-domain socket, or any other connection-oriented communication supported by the host platform(s). Daemon-server schemes may be similarly varied, but are constrained mainly by having to appear platform-independent in order to be accessible from Java. Effectively, the daemon acts as an adapter between the client’s use of a platform-specific technology and the server’s requirement for a platform-independent one.

The session abstraction must necessarily be very primitive, as the only characteristic that a session has under any scheme is that it consists of two streams. However, the server may be able to make use of information about the connecting client, even though it is scheme-specific. When the client connects to the daemon with a Unix-domain socket, its details (process id, user id, group id) can be detected by the daemon, and be relayed to the server. When the client connects directly to the server via TCP, its host and port can be made available to the server. These are exposed to the server in a generic form as client meta-data, described in §2.1. When the daemon is involved in implementing sessions, the protocols involved in talking to the server allow the daemon to relay this information as opaque binary data.

Figure 2 depicts one daemon-server scheme which can appear platform-independent to the server, in which all connections accepted by the daemon are multiplexed over two simplex named pipes. This protocol is described in §3.2, and is capable of relaying arbitrary client meta-data to the server.

Alternatively, the daemon may establish a pair of pipes per session, and inform the server of each one, plus client meta-data, over a pipe pair dedicated for signalling, as shown in Figure 3. This scheme is described in §??, and is also capable of relaying arbitrary client meta-data to the server.

Under another alternative, the daemon serves only to fork the server into the background. The server then opens an Internet-domain socket, and accepts TCP connections directly from the client as sessions.

It is essential that a Java implementation of the server component can operate as independently as possible from these various communication schemes between client, daemon and server. To this end, a Java API (§2.2) is defined to provide an abstraction of sessions. This abstraction provides only an input stream and an output stream, with generic access to client meta-data.

Client meta-data

Meta-data provides information on connecting clients which is otherwise hidden from the server behind the Usmux daemon. The format of this information depends on the connecting mechanism. For example, if the client talks to the daemon via a Unix-domain socket, the platform may be able to provide the daemon with (say) the user id of the client process. This information is potentially useful to the server process, but has no fixed form, so it is relayed from daemon to server in a binary format, which the multiplexing scheme can regard as opaque. An extensible framework then allows multiple binary formats to be interpreted as various Java structures, by making the data self-describing by its initial bytes, length and/or context.

(Under some schemes, the client talks directly to the

Figure 1: Basic architecture

Figure 2: Multiplexing between daemon and server over named pipes

Figure 3: Per-client pipe pairs between daemon and server

server, leaving the daemon uninvolved as soon as the server is established. Under these schemes, the data is likely already available to the server, and does not need a binary format.)

§?? describes how meta-data is transmitted in the multipipe protocol. §3.2.1 describes how meta-data is transmitted in the multiplex protocol.

Two meta-data types are specified here.

Unix-domain client meta-data

When the client that instigates a session is communicating through a connection-oriented Unix-domain socket, the meta-data can be a UnixDomainSocketAttributes [4] object containing user id, group id and process id of the calling client. As binary data relayed between daemon and server, this is represented as a 10-byte block, starting with the US-ASCII codes for ‘UNIX’ (Figure 4). The next two bytes are the process id, followed by two bytes for the user id, and two bytes for the group id, all in big-endian format.

     0              15 16             31
    +-----------------+-----------------+
    |   0x55 0x4e     |   0x49 0x58     |
    +-----------------+-----------------+
    |   process id    |     user id     |
    +-----------------+-----------------+
    |    group id     |
    +-----------------+

Figure 4: Binary meta-data format for a Unix-domain socket

Internet-domain client meta-data

When the client that instigates a session is communicating through a connection-oriented Internet-domain socket, the meta-data can be an InetSocketAddress [5] object containing the IP address and port number of the calling client.

This type of meta-data has no binary representation, as the client is expected to connect to the server directly.

Java API

The Javadoc-generated documentation is available on-line [6]. This section gives an overview of its use.

The Java server should be able to operate with different schemes, depending on the platform available. It remains platform-independent by accepting an opaque string as an argument, and passing this to SessionServerFactory.makeServer. This method loads and instantiates an appropriate SessionServer implementation which can make use of information extracted from the string.

Subsequently, the Java server merely needs to start() the SessionServer object, and invoke accept on it repeatedly to service sessions. These calls each yield a Session object, which provides the basic input and output streams of the session. Table 1 shows this main loop, where config is the opaque configuration string, and SessionThread is an application-specific class to process each session.

Session also provides a getAttributes method to extract scheme-specific information, especially about the client. For example, Table 2 shows how to get a structure providing the process id, user id and group id of the caller who connected to the Unix-domain socket, if that scheme was used.

Multiplexing schemes

Three multiplexing schemes are defined. Historically, the single-pipe scheme (§3.2) was defined first. This attempts to multiplex sessions over a single pair of simplex channels, so communication must be divided into discrete messages, and peers must carefully signal availability of buffer space to avoid session activities impeding each other. The result appears to have poor performance.

The multipipe scheme (§3.1) attempts to improve performance by allowing each session to use a dedicated pair of pipes. An additional pair is used for signalling.

The TCP scheme (§3.3) simply allows the server process to use a regular TCP socket. However, to keep parity with other schemes, there is an additional step enabling the daemon to detect that the server has reached readiness, and then leave it running in the background. The scheme also supports simple access control.

Multipipe scheme

Under the multipipe scheme, each session operates over a dedicated duplex channel, while another duplex channel (the control channel) is used for signalling.

[4] uk.ac.lancs.scc.usmux.unix.UnixDomainSocketAttributes
[5] java.net.InetSocketAddress
[6] http://www.comp.lancs.ac.uk/~ss/apis/usmux/overview-summary.html

    SessionServer server = SessionServerFactory.makeServer(config);
    server.start();
    Session sess;
    while ((sess = server.accept()) != null)
        new SessionThread(sess).start();

Table 1: Main server loop

    UnixDomainSocketAttributes attrs =
        sess.getAttributes(UnixDomainSocketAttributes.class);
    System.out.println("User: " + attrs.getUserId());
    System.out.println("Group: " + attrs.getGroupId());
    System.out.println("Process: " + attrs.getProcessId());

Table 2: Accessing Unix-domain socket attributes of the client
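As an illustration of the 10-byte binary form of Unix-domain meta-data (Figure 4), the following is a minimal sketch of a decoder. The class UnixMetaData and its field names are hypothetical stand-ins invented for this example, not part of the Usmux library, and treating each 16-bit field as unsigned is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical holder for the decoded fields; not the real Usmux class. */
final class UnixMetaData {
    final int processId;
    final int userId;
    final int groupId;

    UnixMetaData(int processId, int userId, int groupId) {
        this.processId = processId;
        this.userId = userId;
        this.groupId = groupId;
    }

    /** Decodes 'UNIX' followed by pid, uid and gid as 16-bit big-endian fields. */
    static UnixMetaData decode(byte[] block) {
        if (block.length < 10)
            throw new IllegalArgumentException("need 10 bytes, got " + block.length);
        String tag = new String(block, 0, 4, StandardCharsets.US_ASCII);
        if (!tag.equals("UNIX"))
            throw new IllegalArgumentException("not a UNIX meta-data block");
        ByteBuffer buf = ByteBuffer.wrap(block, 4, 6).order(ByteOrder.BIG_ENDIAN);
        // Mask each field so it is read as unsigned (an assumption of this sketch).
        int pid = buf.getShort() & 0xffff;
        int uid = buf.getShort() & 0xffff;
        int gid = buf.getShort() & 0xffff;
        return new UnixMetaData(pid, uid, gid);
    }
}
```

Checking the four-byte tag first is what would let a receiver distinguish this format from other self-describing meta-data blocks.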

The scheme uses a configuration string of the form multipipe:upfile;downfile, where upfile is the name of a named pipe that forms the upstream half of the control channel, and downfile is the name of the pipe that forms the downstream half.

The downstream control channel is used initially to detect readiness of the server (signaled by a single byte), and later to detect when the server stops accepting future connections (the channel is closed).

For each session, the daemon creates an additional pair of named pipes, and uses the upstream control channel to transmit their names to the server. The downstream pipe’s name is printed on a line of its own, followed by the upstream pipe’s name. One channel is then used to transmit upstream data of the session, and one downstream.

The daemon creates all named pipes, and should delete them on termination if not yet passed to the server. However, the server should delete them itself as soon as it has opened them.

The meta-data for each session is transmitted as a prefix to the upstream data, beginning with a 2-byte big-endian length. The downstream channel is used exclusively for application data.

The single-pipe scheme

The single-pipe scheme allows multiple sessions to share a single duplex channel, over which a multiplexing protocol is run. The scheme uses a configuration string of the form pipe:downfile;upfile, where upfile is the name of a named pipe that forms the upstream half of the channel, and downfile is the name of the pipe that forms the downstream half.

The multiplexing protocol consists of several byte-oriented messages transmitted over a reliable, full-duplex channel between two end-points. Each end-point is the peer of the other. One end-point is the server process (or just server), while the other is the daemon process (or just daemon). Each session consists of a pair of streams, one in each direction between the two end-points.

The protocol is designed to support connections accepted on a Unix-domain socket as sessions multiplexed over the pair of named pipes used by Usmux, and permits meta-data about each connection to be supplied to the Java process. However, it is not dependent on named pipes or Unix-domain sockets, so it can be applied to hosts that support neither of these concepts. To this end, the protocol regards the meta-data as an opaque block. Similarly, it is not tied to Java in any way, so the server process can be written in any language.

The protocol has an initialization phase, in which the server process transmits a ‘ready’ message to indicate its readiness to receive sessions. This phase also allows the server to indicate how much meta-data it will accept per session.

After initialization, sessions can be relayed over the channel. Each session has a handshake phase, during which meta-data is supplied to the server, and the server and daemon exchange references. A data phase follows, in which each end-point repeatedly updates its peer about the space available in its reception buffer, and transmits data according to space available in its peer’s buffer.
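For the multipipe scheme, the meta-data prefix on a session’s upstream pipe (a 2-byte big-endian length followed by the meta-data itself) could be consumed as in the sketch below. MetaDataPrefix is a hypothetical helper written for this illustration; it assumes the caller has already opened the upstream pipe as an ordinary InputStream.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper; not part of the Usmux library. */
final class MetaDataPrefix {
    /** Reads a 2-byte big-endian length, then that many bytes of meta-data. */
    static byte[] read(InputStream upstream) throws IOException {
        DataInputStream in = new DataInputStream(upstream);
        int length = in.readUnsignedShort(); // DataInputStream is big-endian
        byte[] meta = new byte[length];
        in.readFully(meta);                  // throws EOFException if truncated
        return meta;
    }
}
```

Because DataInputStream does no buffering of its own, the bytes following the prefix remain unread on the underlying stream, so the application can continue reading its upstream data from the same InputStream.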

Initialization phase

The initialization phase consists of the daemon waiting for the server to send a RDY message. This message may contain a meta-data limit, the maximum number of bytes the server is prepared to receive as payload for future NEW messages sent to initiate each session.

No other messages may be transmitted over the channel until RDY has been received.

Handshake phase

Each session begins with a handshake phase. The daemon initiates each session by transmitting a NEW message. This includes the daemon’s session id, which the server must use in all future messages concerning this session. Any additional data is considered to be meta-data, which the server may use in any way it sees fit.

The server responds to each NEW message with an ACK message. This echoes back the daemon’s session id, and includes the server’s session id.

From this point on, each message related to this session and transmitted by the daemon will use the server’s session id, while the server will correspondingly use the daemon’s session id.

Data phase

After a session has been established by a handshake phase, either party may transmit CTS messages according to the space it has available to receive data for the relevant session. The effect of multiple CTS messages is cumulative, so a peer can send up to the sum of the amounts specified by received CTS messages, minus that which it has already sent. An end-point can use this to ensure that its peer can send a complete message without blocking due to the receiver’s buffer being full.

To transmit data, a peer sends a STR message, whose payload starts by identifying the session id, and continues with the data. The sum of the message’s data length (the payload length minus two) and of the data lengths of all prior STR messages for the same session must not exceed the sum of the clearances of all CTS messages received for the same session.

When a peer has no more data to send, it must send an EOS message to close the session. A peer can also indicate that it will not use any more data by sending a STP message.

Messages

Each message has a type and a length (Figure 5), so unrecognized messages can be skipped. The message type is a two-byte big-endian field at the start of the message. The length is the two-byte big-endian field that follows it. The whole message need not be word-aligned. Message types are summarized in Table 3.

     0              15 16             31
    +-----------------+-----------------+
    |  message type   | payload length  |
    +-----------------+-----------------+
    |                                   |
    :              payload              :
    |                                   |
    +-----------------------------------+

Figure 5: Message format

Unused message types

The UNK and NOP message types must never be sent. The receiver may ignore them entirely. They are intended to be used by the daemon or the server internally to indicate that no complete message header has been received, or that the remainder of a message may be ignored.

RDY – Server ready

The RDY message type (Figure 6) is the first one sent over the channel. It is sent only once, and by the server to the daemon. If its payload is at least two bytes, these are taken as a big-endian unsigned integer specifying the maximum number of meta-data bytes that the server will accept with each session. Any other payload, or an incomplete payload, may be ignored.

     0              15 16             31
    +-----------------+-----------------+
    |     2 (RDY)     | payload length  |
    +-----------------+-----------------+
    | meta-data limit |
    +-----------------+

Figure 6: The RDY message – server ready
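The cumulative clearance rule of the data phase can be sketched as a small per-session counter. SendCredit is a hypothetical class written for this illustration (it is not taken from the Usmux sources); it tracks only the sending direction of one session at one end-point.

```java
/** Hypothetical per-session send-side accounting; not part of Usmux. */
final class SendCredit {
    private long cleared; // sum of clearances from received CTS messages
    private long sent;    // sum of data lengths of STR messages already sent

    /** Records the clearance carried by a received CTS message. */
    void onCts(int clearance) {
        cleared += clearance;
    }

    /** Bytes that may still be sent without exceeding the peer's clearance. */
    long available() {
        return Math.max(0, cleared - sent);
    }

    /** Records an outgoing STR message of dataLength bytes; rejects overruns. */
    void onStrSent(int dataLength) {
        if (dataLength < 0 || dataLength > available())
            throw new IllegalStateException("STR would exceed clearance");
        sent += dataLength;
    }
}
```

For example, after two CTS messages clearing 100 and 50 bytes and one STR carrying 120 bytes, available() reports 30, so a further STR of 31 bytes would be refused until another CTS arrives.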

Type  Value  Direction         Meaning               Parameters
UNK   0      never sent        Unknown               None
NOP   1      never sent        No operation          None
RDY   2      server-to-daemon  Server ready          Meta-data limit
NEW   3      daemon-to-server  New session           Session id and meta-data
ACK   4      server-to-daemon  Session acknowledged  Session ids
CTS   5      both              Clear to send         Session id and clearance
STR   6      both              Stream data           Session id, data length and data
EOS   7      both              End of stream         Session id
STP   8      both              Stop stream           Session id

Table 3: Message types
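Because every message starts with a two-byte type and a two-byte payload length, a receiver can skip messages it does not recognize. The following sketch shows one way to do this; MessageReader and its nested Message class are hypothetical names for this example, and the numeric type values are those of Table 3.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical framing helper; not part of the Usmux library. */
final class MessageReader {
    // Type values as listed in Table 3.
    static final int RDY = 2, NEW = 3, ACK = 4, CTS = 5, STR = 6, EOS = 7, STP = 8;

    static final class Message {
        final int type;
        final byte[] payload;

        Message(int type, byte[] payload) {
            this.type = type;
            this.payload = payload;
        }
    }

    /** Returns the next message with a recognized type, skipping all others. */
    static Message next(InputStream channel) throws IOException {
        DataInputStream in = new DataInputStream(channel);
        while (true) {
            int type = in.readUnsignedShort();   // two-byte big-endian type
            int length = in.readUnsignedShort(); // two-byte big-endian length
            byte[] payload = new byte[length];
            in.readFully(payload);
            if (type >= RDY && type <= STP)
                return new Message(type, payload);
            // Unrecognized type (or UNK/NOP): its payload has been consumed,
            // so the loop continues with the next header.
        }
    }
}
```

The end of the channel surfaces as an EOFException from the first readUnsignedShort call, which a caller could treat as orderly closure when it occurs on a message boundary.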

Handshake message types

Sessions are initiated by the daemon. For each session, it issues a NEW message (Figure 7), with at least two bytes of payload giving the daemon’s session id, which the server must use in all future communication pertaining to this session.

Any remaining payload is interpreted by the server as meta-data. For example, if the session is a relay for a Unix-domain socket, the meta-data may contain the process id, user id and group id of the calling process. §2.1 lists some meta-data types.

     0              15 16             31
    +-----------------+-----------------+
    |     3 (NEW)     | payload length  |
    +-----------------+-----------------+
    |daemon session id|                 |
    +-----------------+    meta-data    :
    :                                   :
    +-----------------------------------+

Figure 7: The NEW message – new session

On receipt of a NEW message, the server must generate a session id of its own, and respond with an ACK message (Figure 8). The first two bytes of its payload form the daemon’s session id; the next two form the server’s session id, which the daemon must use in all future communication pertaining to this session. Additional payload bytes carry no meaning, and can be ignored.

     0              15 16             31
    +-----------------+-----------------+
    |     4 (ACK)     | payload length  |
    +-----------------+-----------------+
    |daemon session id|server session id|
    +-----------------+-----------------+

Figure 8: The ACK message – session acknowledged

Data message types

For each session, each end-point must only send the amount of data it has been permitted by its peer. One end-point permits its peer to send data by sending CTS messages (Figure 9), which include the peer’s session id, and a 32-bit signed length. Each message instructs the receiver to accumulate provided lengths, and to send no more data than the accumulated value, including data already sent on the session.

     0              15 16             31
    +-----------------+-----------------+
    |     5 (CTS)     | payload length  |
    +-----------------+-----------------+
    |peer's session id|    clearance    |
    +-----------------+-----------------+
    |clearance (cont.)|
    +-----------------+

Figure 9: The CTS message – clear to send

To send data, an end-point sends a STR message (Figure 10) to its peer. This includes the peer’s session id, with the remaining payload being the stream data.

     0              15 16             31
    +-----------------+-----------------+
    |     6 (STR)     | payload length  |
    +-----------------+-----------------+
    |peer's session id|                 |
    +-----------------+   stream data   :
    :                                   :
    +-----------------------------------+

Figure 10: The STR message – stream data

When an end-point has no more data to send on its stream, it must send an EOS message (Figure 11), unless it has already received a STP message for the session. This includes the peer’s session id. It must not send any further STR messages after that.

     0              15 16             31
    +-----------------+-----------------+
    |     7 (EOS)     | payload length  |
    +-----------------+-----------------+
    |peer's session id|
    +-----------------+

Figure 11: The EOS message – end of stream

When an end-point cannot use data received on a stream, it may send a STP message (Figure 12). This includes the peer’s session id. The peer need not send any further STR or EOS messages on the session, and the end-point need not use them.

     0              15 16             31
    +-----------------+-----------------+
    |     8 (STP)     | payload length  |
    +-----------------+-----------------+
    |peer's session id|
    +-----------------+

Figure 12: The STP message – stop stream

TCP scheme

The scheme uses a configuration string of the form pipetcp:sigfile;bindip;bindport;acl, where bindip is the IP address to which the server should bind a listening TCP socket, and bindport is the port number to which the socket shall be bound. acl is a semicolon-separated list of IP addresses, IP address prefixes, and hostnames (possibly with wildcards), from which the server should accept connections. Connections from entries prefixed with a dash (-) should be rejected.

sigfile should be a named pipe. The server should open and close it to indicate that it is ready. The daemon should not return control to the user until this has happened.

Extensions

Creating daemon-server communication schemes

The Java server does not need to communicate with the Usmux daemon in one particular way. Indeed, it does not need to involve the daemon at all. New schemes can be defined, and then implemented by deriving from SessionServerFactory, and providing a createServer method. This takes an opaque configuration string, which the factory class either recognizes or ignores (to be passed to other factories). When the string is recognized, its data can be used to configure the SessionServer that the factory class must create.

The recommended format for a configuration string is one that appears URI-like, i.e.

    scheme:params

The factory class can then either recognize the scheme and process the parameters to create a SessionServer, or return null.

When the factory class is packaged, its jar should include an entry in the file named in Table 4 giving the factory’s class name. This makes it available automatically to the static makeServer method that checks an opaque configuration string against all registered factories.

Creating client-daemon communication schemes

Consider the meta-data that can be provided to the application through Session.getAttributes. Define a class to provide this data.
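As an illustration of how a factory might recognize or decline a configuration string, the sketch below parses the pipetcp form described for the TCP scheme. TcpConfig is a hypothetical class invented for this example, and returning null for an unrecognized scheme mirrors the behaviour described for factories; matching a client address against the accept and reject lists is left out, since the precise matching rules are not specified here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parsed form of pipetcp:sigfile;bindip;bindport;acl. */
final class TcpConfig {
    final String sigfile;
    final String bindIp;
    final int bindPort;
    final List<String> accept = new ArrayList<>(); // plain ACL entries
    final List<String> reject = new ArrayList<>(); // entries given with a leading '-'

    private TcpConfig(String sigfile, String bindIp, int bindPort) {
        this.sigfile = sigfile;
        this.bindIp = bindIp;
        this.bindPort = bindPort;
    }

    /** Returns null when the scheme is not pipetcp, so another factory may try. */
    static TcpConfig parse(String config) {
        String prefix = "pipetcp:";
        if (!config.startsWith(prefix))
            return null;
        String[] parts = config.substring(prefix.length()).split(";");
        if (parts.length < 3)
            throw new IllegalArgumentException("expected sigfile;bindip;bindport");
        TcpConfig result = new TcpConfig(parts[0], parts[1], Integer.parseInt(parts[2]));
        // Everything after the third field is the semicolon-separated ACL.
        for (int i = 3; i < parts.length; i++) {
            String entry = parts[i];
            if (entry.isEmpty())
                continue;
            if (entry.startsWith("-"))
                result.reject.add(entry.substring(1));
            else
                result.accept.add(entry);
        }
        return result;
    }
}
```

A call such as TcpConfig.parse("pipe:a;b") returns null, which is the signal for the string to be offered to other registered factories.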

META-INF/services/uk.ac.lancs.scc.usmux.ServerSessionFactory

Table 4: File holding registrations of server factories

META-INF/services/uk.ac.lancs.scc.usmux.meta.MetadataHandler

Table 5: File holding registrations of meta-data handlers
