Improving C/C++ Open Source Software Discoverability by Utilizing Rust and Node.js Ecosystems

Kyriakos-Ioannis D. Kyriakou1, Nikolaos D. Tselikas1 and Georgia M. Kapitsaki2

1 Communication Networks and Applications Laboratory, Department of Informatics and Telecommunications, University of Peloponnese, End of Karaiskaki Street, 22 100, Tripolis, Greece

2 Department of Computer Science, University of Cyprus, 75 Kallipoleos Street, P.O. Box 20537, CY-1678 Nicosia, Cyprus

1 {kyriakou, ntsel}@uop.gr, 2 [email protected]

OSS 2018 14th International Conference on Open Source Systems June 8-10, 2018 | Athens, Greece

Motivation

o Translating programs written in C/C++ for the Web has recently attracted interest in several fields of research
o The potential of using Rust in place of other systems programming languages is another emerging topic
o Can these technologies be combined to:
    o Enhance C/C++ OSS with modern development techniques
    o Improve the state of C/C++ OSS discoverability

JavaScript: A Success Story in OSS

o Declared the most popular programming language for the fifth consecutive year (Stack Overflow 2017 survey)
o Was the programming language that appeared most frequently in multi-language projects (GitHub study by T. F. Bissyandé et al.)
o The Node.js platform is itself such a multi-language project: its components are written in both JS and C/C++
o What is the key to its proliferation in OSS?

NPM: Node.js package manager and registry

o Houses the largest distribution of open source libraries in the world, used in browsers, servers, cross-platform and mobile applications, etc.
o It already contains non-JavaScript code

WebAssembly

o “WebAssembly is a portable, size- and load-time-efficient format, suitable for compilation to the Web”, and more, e.g. the Nebulet kernel
o Implemented in all major Web browsers
o Still in the process of standardization via the W3C WebAssembly Working Group
o Near-native performance and portability
o Garbage collection, threads, SIMD, etc. are in progress
o Its predecessor, asm.js, can be used as a fallback mechanism

Inherent Challenges in C/C++ OSS

Inflexible Codebase Componentization

o Groups of programs in directories represent logical components, with interfaces defined by a set of header files
o A large number of third-party build systems are available, and they are rather complex
    o e.g. CMake, qmake, SCons, GYP, etc.
    o Incapable of providing the cohesion needed across OSS
o Codebases that do not use build systems gradually degrade in maintainability, as observed by Dayani-Fard et al.

Lack of query-able directory or repository and enforced conventions

o Manual pursuit of components
    o Generic OSS repositories
    o Web search engines
o Various third-party tools
    o e.g. buckaroo, cget, conan, conda, cpm, cppan, etc.
    o Incapable of providing the cohesion needed across OSS
o No enforced conventions
    o Unlike the tooling of other popular OSS languages: mvn, npm, gem, pip, etc.
    o These tools facilitate interaction with code repositories, documentation, testing, building, etc.

Memory management and undefined behaviors

o Type safety is not guaranteed by C/C++
    o Programs may not exhibit type errors
    o Undefined behaviors are incorporated in the standard specification, e.g. iterator invalidation
o Memory safety is put at risk
    o Null-pointer dereferences
    o Dangling pointers
    o Buffer overruns
o Malicious software exploits the way C and C++ programs handle memory
o Hence, OSS discovered in the wild may propagate unwanted effects to derivative projects, even when used from a garbage-collected language

Rust: A Young Contender in OSS

o Was created in order to address the aforementioned challenges
o “Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety”
o The integrated command-line application cargo
    o Serves as a complete project management tool
    o Enforces the practices that enable OSS to be discoverable and maintainable
o Interaction with C APIs is free of overhead, and the binaries produced can be called from C with no setup
o Opportunity of utilizing this system to
    o Bridge the gap between C/C++ codebases and modern OSS development practices
    o Target WebAssembly
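
As a minimal illustration of how safe Rust handles the hazards listed for C/C++ (null-pointer dereferences, buffer overruns, iterator invalidation), the sketch below uses only the standard library. The function names are hypothetical, chosen for this example, and are not part of the presented work.

```rust
/// Bounds-checked read: an out-of-range index yields `None`
/// instead of reading past the end of the buffer.
pub fn checked_read(buffer: &[u8], index: usize) -> Option<u8> {
    buffer.get(index).copied()
}

/// Absence is modelled with `Option` rather than a null pointer,
/// so the caller has to handle the "not found" case explicitly.
pub fn first_even(values: &[i32]) -> Option<i32> {
    values.iter().copied().find(|v| v % 2 == 0)
}

/// Removes odd elements in place. `retain` replaces the
/// erase-while-iterating pattern that can invalidate iterators in C++.
pub fn drop_odd(values: &mut Vec<i32>) {
    values.retain(|v| v % 2 == 0);
}
```

Calling `checked_read(&[1, 2, 3], 3)` returns `None` rather than touching memory outside the slice, which is the behavior the "prevents segfaults" claim refers to for safe code.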

Incremental procedure of repackaging and publishing C/C++ OSS

We define and examine a two-step incremental procedure for wrapping a C/C++ codebase using the tooling and conventions of Rust and Node.js, targeting OSS repositories as well as the NPM and crates.io registries

Step 1: Using Rust to package and publish on crates.io

o cargo is responsible for instantiating the project
    o Local git repository
    o Metadata and dependencies
    o Hard-to-discover OSS can be published on public repositories, e.g. GitHub
o The cc, bindgen and cbindgen packages provide automation
    o Generate bindings statically via header files
    o Compile and link files
    o Generate C bindings
    o Specify parameters in the build.rs file instead of using third-party build systems
o cargo build
    o Produces native as well as WebAssembly binaries
o cargo publish
    o Package becomes discoverable through the crates.io registry

Step 2: Using Node.js to Package and Publish on NPM

o npm is responsible for instantiating the project
    o Metadata and dependencies
o Expose the compiled library to JS via WebAssembly and/or the ffi module
    o Idiomatic JS API
o Optionally bind the library to V8 to produce a Node.js addon
o npm publish
    o Package becomes discoverable through the npmjs.com registry

Pragmatic Proof Of Concept Experiment: Search for “Park Miller Carta PRNG”

o GitHub
    o One JS project
o Web search
    o A page dedicated to the algorithm, including documented sources in assembly, C and C++
o Typical case of a hard-to-discover C/C++ open-source library
o Download and extract the source files
o Perform the first step of the proposed methodology, i.e. repackaging the project with Rust tooling and conventions
o Sanity check in the form of unit tests

Observations during the procedure

o The process of bindings creation was fully automated
o Uploading the project to popular OSS repositories and crates.io is possible at this point
o Creating an idiomatic Rust interface
    o Requires the use of the “unsafe” keyword for interacting with foreign code
    o Marks the parts of the code the Rust compiler cannot provide guarantees for
o Sanity checks fail
    o In this particular PRNG library, the seed state is held in a global mutable variable
    o Unit tests run in parallel by default, but the library is not thread-safe; --test-threads=1 runs them serially
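
The failure mode can be reproduced without the C library. The sketch below is a pure-Rust stand-in, not the authors' code: `SEED`, `legacy_seed`, `legacy_next` and `interleaved_first_draw` are hypothetical names, and the step function assumes the commonly published Park-Miller-Carta form (multiplier 16807, modulus 2^31 - 1). It replays, on a single thread, the interleaving that two parallel unit tests can produce when they share one global seed.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// One Park-Miller step (multiplier 16807, modulus 2^31 - 1) using
/// Carta's 32-bit decomposition, so no 64-bit arithmetic is needed.
fn step(seed: u32) -> u32 {
    let mut lo = 16807 * (seed & 0xFFFF);
    let hi = 16807 * (seed >> 16);
    lo += (hi & 0x7FFF) << 16;
    lo += hi >> 15;
    if lo > 0x7FFF_FFFF {
        lo -= 0x7FFF_FFFF;
    }
    lo
}

/// Hypothetical stand-in for the library's global mutable seed.
static SEED: AtomicU32 = AtomicU32::new(1);

/// Mirrors a C-style `seed(x)` call that overwrites the shared state.
pub fn legacy_seed(seed: u32) {
    SEED.store(seed, Ordering::SeqCst);
}

/// Mirrors a C-style `next()` call. The load and the store are two
/// separate operations, so concurrent callers can interleave between them.
pub fn legacy_next() -> u32 {
    let next = step(SEED.load(Ordering::SeqCst));
    SEED.store(next, Ordering::SeqCst);
    next
}

/// Replays what happens when "test B" reseeds between the seed and the
/// draw of "test A": A expects 16807 but continues from B's seed instead.
pub fn interleaved_first_draw() -> u32 {
    legacy_seed(1); // test A seeds with 1
    legacy_seed(42); // test B reseeds before A draws
    legacy_next() // A's draw now depends on B's seed
}
```

Under these assumptions, test A would assert the first value for seed 1 and receive the first value for seed 42, which is why serializing the tests hides the problem while leaving the underlying thread-safety weakness in place.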

Observations during the procedure

o The original sources were retained in an archive module and documented
o Rust could be used to provide guarantees
    o In this case, it was trivial to replace the unsafe blocks by altering the way the state is stored and accessed
o Automatically generated C bindings
o The product was an idiomatic Rust API, including a C-compatible API, that had been documented and tested
o Pushed to GitHub and published on crates.io
    o Can be directly discovered and used from C/C++/Rust
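
One way the state could be moved out of a global is sketched below. This is an assumption about the general shape of such a rewrite, not the published crate's API: the type `ParkMillerCarta`, its methods and the function `pmc_next` are hypothetical names, and the arithmetic assumes the commonly published form of the algorithm (multiplier 16807, modulus 2^31 - 1).

```rust
/// Generator that owns its state instead of relying on a global variable.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParkMillerCarta {
    state: u32,
}

impl ParkMillerCarta {
    /// Valid seeds lie in 1..=0x7FFF_FFFE; anything else is rejected.
    pub fn new(seed: u32) -> Option<Self> {
        if (1..=0x7FFF_FFFE).contains(&seed) {
            Some(Self { state: seed })
        } else {
            None
        }
    }

    /// Advances the generator. Taking `&mut self` makes exclusive access
    /// a compile-time requirement, so no `unsafe` block is needed.
    pub fn next_u31(&mut self) -> u32 {
        let mut lo = 16807 * (self.state & 0xFFFF);
        let hi = 16807 * (self.state >> 16);
        lo += (hi & 0x7FFF) << 16;
        lo += hi >> 15;
        if lo > 0x7FFF_FFFF {
            lo -= 0x7FFF_FFFF;
        }
        self.state = lo;
        lo
    }
}

/// C-ABI entry point operating on caller-owned state. `Option<&mut u32>`
/// has the layout of a nullable pointer; a null or invalid state yields 0,
/// a value the generator never produces. A real export would additionally
/// need an unmangled symbol name and a matching C header.
pub extern "C" fn pmc_next(state: Option<&mut u32>) -> u32 {
    match state {
        Some(slot) => match ParkMillerCarta::new(*slot) {
            Some(mut rng) => {
                let value = rng.next_u31();
                *slot = value;
                value
            }
            None => 0,
        },
        None => 0,
    }
}
```

Because each generator is an independent value, unit tests that construct their own `ParkMillerCarta` can run in parallel without the `--test-threads=1` workaround.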

Harvesting JavaScript’s popularity

o Library compiled to WebAssembly (and asm.js)
    o Create a high-level idiomatic JS API to simplify engagement
o Library compiled to a shared-object binary
    o Dynamically linked via the ffi module, for comparison
o Produced documentation and published on NPM
    o Direct distribution and use in all JS environments, by filling in the package.json metadata
o Set a 2.5-month monitoring period

Performance Evaluation: What is the cost?

o The idiomatic Rust implementation performed the same as the original C implementation
o asm.js and WebAssembly were 44x and 144x faster, respectively, compared to ffi
o Best case of 8.7x slower execution compared to the standalone native library
    o context switching >> actual computation
    o Hence, the modules were operating in their weakest possible scenario
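
The reported ratios can be put on a common scale with a little arithmetic. The sketch below assumes that all three ratios come from the same workload and that the 8.7x figure refers to WebAssembly; the struct and function names are hypothetical. Under those assumptions the implied slowdowns relative to the native library are roughly 1250x for ffi and 28x for asm.js. These are derived values, not measurements reported by the authors.

```rust
/// Slowdown factors relative to the standalone native library.
pub struct Slowdowns {
    pub wasm: f64,
    pub asmjs: f64,
    pub ffi: f64,
}

/// Derives slowdowns versus native from:
/// - `wasm_vs_native`: how many times slower WebAssembly is than native,
/// - `wasm_vs_ffi`: how many times faster WebAssembly is than ffi,
/// - `asmjs_vs_ffi`: how many times faster asm.js is than ffi.
pub fn slowdowns_vs_native(wasm_vs_native: f64, wasm_vs_ffi: f64, asmjs_vs_ffi: f64) -> Slowdowns {
    // ffi is `wasm_vs_ffi` times slower than WebAssembly.
    let ffi = wasm_vs_native * wasm_vs_ffi;
    // asm.js is `asmjs_vs_ffi` times faster than ffi.
    let asmjs = ffi / asmjs_vs_ffi;
    Slowdowns {
        wasm: wasm_vs_native,
        asmjs,
        ffi,
    }
}
```

Calling `slowdowns_vs_native(8.7, 144.0, 44.0)` applies the figures from the slide.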

Discoverability Improvement

o The original website hosting the PRNG implementations receives 1 page view/day (Alexa); downloads/day: N/A
o Observed period of 2.5 months
    o NPM: 2.15 downloads/day
    o Crates.io: 0.7 downloads/day
    o Total of 2.85 actual downloads/day
o Reported metrics suggest that OSS repackaged according to our proposed methodology can improve the state of C/C++ codebase distribution and discovery
o By providing a higher-level API, easier engagement can be achieved for C/C++ OSS
o In the case of more complex codebases, publishing focused components could potentially realize exposure improvements collectively

Code Quality

o By interfacing the foreign C/C++ code with Rust, an undocumented thread-safety weakness was discovered
o Wrapping error-prone code in Rust’s unsafe blocks and documenting them is a reasonable method to minimize the debugging surface
o Largely unfocused codebases can benefit from the concept of smaller components in the form of modules

Performance degradation

o The scenario served well to stress the performance of function invocation during context-switching interoperability and to qualitatively determine the overhead
o Rust was found to be able to call into C/C++ libraries without associated overhead
o The minimum observed overhead, achieved by WebAssembly while operating in a scenario biased against it, was about 8.7x slower execution compared to the standalone native library
o The common ffi method of interfacing with shared objects showed worse performance while being less portable

Code Portability

o Portability can be greatly improved by the proposed methodology
o With practically every system incorporating a Web browser, the coverage gains are immeasurable for direct distribution of libraries
o The aforementioned observations may help to collectively modernize the state of C/C++ OSS distribution and discovery, while addressing typical shortcomings and pitfalls

Thank you
