Improving C/C++ Open Source Software Discoverability by Utilizing Rust and Node.js Ecosystems
Kyriakos-Ioannis D. Kyriakou1, Nikolaos D. Tselikas1 and Georgia M. Kapitsaki2
1 Communication Networks and Applications Laboratory, Department of Informatics and Telecommunications, University of Peloponnese, End of Karaiskaki Street, 22 100, Tripolis, Greece
2 Department of Computer Science, University of Cyprus, 75 Kallipoleos Street, P.O. Box 20537, CY-1678 Nicosia, Cyprus
1 {kyriakou, ntsel}@uop.gr, 2 [email protected]
OSS 2018, 14th International Conference on Open Source Systems, June 8-10, 2018 | Athens, Greece

Motivation
o Translating programs written in C/C++ for the Web has recently attracted interest in various fields of research
o Using Rust instead of other systems programming languages is another emerging topic
o Can these technologies be combined to enhance C/C++ OSS with modern development techniques and improve its discoverability?

JavaScript: A Success Story in OSS
o Declared the most popular programming language for the fifth consecutive year (Stack Overflow 2017 survey)
o The language that appeared most frequently in multi-language projects (GitHub study by T. F. Bissyandé et al.)
o The Node.js platform is itself such a multi-language project: its components are written in both JS and C/C++
o What is the key to its proliferation in OSS?

NPM: Node.js package manager and registry
o Houses the world's largest collection of open source libraries, used in browsers, servers, cross-platform and mobile applications, etc.
o Already contains non-JavaScript code
WebAssembly
o "WebAssembly is a portable, size- and load-time-efficient format suitable for compilation to the Web", and beyond it, e.g., the Nebulet kernel
o Implemented in all major Web browsers
o Still being standardized by the W3C WebAssembly Working Group
o Near-native performance and portability
o Garbage collection, threads, SIMD, etc. are in progress
o Its predecessor, asm.js, can be used as a fallback mechanism

Inherent Challenges in C/C++ OSS

Inflexible Codebase Componentization
o Program files are grouped in directories representing logical components, with interfaces defined by a set of header files
o A large number of third-party build systems is available (e.g., CMake, qmake, SCons, GYP); they are rather complex and incapable of providing the cohesion needed across OSS
o Codebases that do not use build systems gradually degrade in maintainability, as observed by Dayani-Fard et al.

Lack of a queryable directory or repository and of enforced conventions
o Components must be pursued manually, through generic OSS repositories and Web search engines
o Various third-party tools (e.g., buckaroo, cget, conan, conda, cpm, cppan) are incapable of providing the cohesion needed across OSS
o No enforced conventions, unlike other popular OSS languages, whose tools (mvn, npm, gem, pip, etc.) facilitate interaction with code repositories, documentation, testing, building, etc.

Memory management and undefined behaviors
o Type safety is not guaranteed by C/C++: programs are not guaranteed to be free of type errors
o Undefined behaviors are incorporated in the standard specification, e.g.
iterator invalidation
o Memory safety is put at risk: null-pointer dereferences, dangling pointers, buffer overruns
o Malicious software exploits the way C and C++ programs handle memory
o Hence, OSS discovered in the wild may propagate unwanted effects to derivative projects, even when used from a garbage-collected language

Rust: A Young Contender in OSS
o Created in order to address the aforementioned challenges
o "Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety"
o Its integrated command-line application, cargo, serves as a complete project management tool and enforces the practices that make OSS discoverable and maintainable
o Interaction with C APIs is free of overhead, and the binaries produced can be called from C with no setup
o Opportunity to use this system to bridge the gap between C/C++ codebases and modern OSS development practices, and to target WebAssembly

Incremental procedure of repackaging and publishing C/C++ OSS
o We define and examine a two-step incremental procedure for wrapping a C/C++ codebase using Rust's and Node.js' tooling and conventions, targeting OSS repositories as well as the NPM and crates.io registries

Step 1: Using Rust to package and publish on crates.io
o cargo is responsible for instantiating the project: local git repository, metadata and dependencies
o Hard-to-discover OSS can be published on public repositories, e.g.
GitHub
o The cc, bindgen and cbindgen packages provide automation: bindgen generates bindings statically from header files, cc compiles and links the source files, and cbindgen generates C bindings
o Parameters are specified in a build.rs file instead of a third-party build system
o cargo build produces native as well as WebAssembly binaries
o cargo publish makes the package discoverable through the crates.io registry

Step 2: Using Node.js to package and publish on NPM
o npm is responsible for instantiating the project: metadata and dependencies
o Expose the compiled library to JS via WebAssembly and/or the ffi module, behind an idiomatic JS API
o Optionally, bind the library to V8 to produce a Node.js Addon
o npm publish makes the package discoverable through the npmjs.com registry

Pragmatic Proof of Concept
o Experiment: search for "Park Miller Carta PRNG": GitHub returns one JS project; a Web search returns a page dedicated to the algorithm, including documented sources in assembly, C and C++
o A typical case of a hard-to-discover C/C++ open source library
o Download and extract the source files
o Perform the first step of the proposed methodology, i.e.
repackaging the project with Rust tooling and conventions
o Sanity checks in the form of unit tests

Observations during the procedure
o The process of bindings creation was fully automated
o At this point, the project can already be uploaded to popular OSS repositories and crates.io
o Creating an idiomatic Rust interface requires the "unsafe" keyword for interacting with foreign code; it marks the parts of the code for which the Rust compiler cannot provide guarantees
o Sanity checks fail: in this particular PRNG library, the seed state is held in a global mutable variable, so the library is not thread-safe, while unit tests run in parallel by default (workaround: --test-threads=1)

Observations during the procedure
o The original sources were retained in an archive module and documented
o Rust could be used to provide guarantees: in this case, it was trivial to replace the unsafe blocks by altering the way the state is stored and accessed
o C bindings were generated automatically
o The product was an idiomatic Rust API, including a C-compatible API, documented and tested
o Pushed on GitHub and published on crates.io, so it can be directly discovered and used from C/C++/Rust

Harvesting JavaScript's popularity
o Library compiled to WebAssembly (and asm.js), with a high-level idiomatic JS API to simplify engagement
o Library compiled to a shared-object binary and dynamically linked via the ffi module, for comparison
o Produced documentation and published on NPM, filling in the package.json metadata, for direct distribution and use in all JS environments
o Set a 2.5-month monitoring period

Performance Evaluation: What is the cost?
o The idiomatic Rust implementation performed the same as the original C implementation
o asm.js and WebAssembly were 44x and 144x faster, respectively, than ffi
o At best, execution was 8.7x slower than the standalone native library, since context switching far outweighs the actual computation
o Hence, the modules were operating in their weakest possible scenario

Discoverability Improvement
o The original website hosting the PRNG implementations receives 1 page-view/day (Alexa); downloads/day: N/A
o Over the observed period of 2.5 months: NPM 2.15 downloads/day and crates.io 0.7 downloads/day, a total of 2.85 actual downloads/day
o The reported metrics suggest that repackaging OSS according to our proposed methodology can improve the discoverability of C/C++ codebases
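The proof-of-concept observations report that the original PRNG kept its seed in a global mutable variable, which broke parallel unit tests, and that the unsafe blocks were replaced by changing how the state is stored and accessed. The Rust sketch below illustrates that kind of change. It is not the authors' code. It assumes the Park-Miller "minimal standard" recurrence (multiplier 16807, modulus 2^31 - 1) evaluated with Carta's division-free method, and it keeps the seed in a caller-owned value instead of a global. It also adds a C-compatible entry point. The names `Rand31`, `next_u31` and `rand31_next`, and the zero-seed policy, are hypothetical.

```rust
// Minimal sketch, not the authors' implementation: a Park-Miller-Carta PRNG whose
// state is owned by the caller instead of living in a global mutable variable.
// All names (Rand31, next_u31, rand31_next) are hypothetical.

/// Modulus of the Park-Miller "minimal standard" generator: 2^31 - 1.
const MODULUS: u32 = 0x7FFF_FFFF;
/// Multiplier of the minimal standard generator.
const MULTIPLIER: u32 = 16_807;

/// Generator state; `#[repr(C)]` keeps the layout C-compatible so it could be
/// described in a C header (for example, one generated by cbindgen).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rand31 {
    seed: u32,
}

impl Rand31 {
    /// Creates a generator. The seed is reduced modulo 2^31 - 1, and 0 (a fixed
    /// point of the recurrence) is replaced by 1; this policy is an assumption.
    pub fn new(seed: u32) -> Self {
        let seed = seed % MODULUS;
        Rand31 {
            seed: if seed == 0 { 1 } else { seed },
        }
    }

    /// Advances the state, seed = 16807 * seed mod (2^31 - 1), using Carta's
    /// method: 32-bit multiplications and additions, no division.
    pub fn next_u31(&mut self) -> u32 {
        // Multiply the low 16 bits and the high 15 bits of the seed separately.
        let mut lo = MULTIPLIER * (self.seed & 0xFFFF);
        let hi = MULTIPLIER * (self.seed >> 16);
        // hi * 2^16 = (hi & 0x7FFF) * 2^16 + (hi >> 15) * 2^31, and
        // 2^31 = 1 (mod 2^31 - 1), so the high part folds back into lo.
        lo += (hi & 0x7FFF) << 16;
        lo += hi >> 15;
        // Here lo < 2 * MODULUS, so one conditional subtraction finishes the reduction.
        if lo > MODULUS {
            lo -= MODULUS;
        }
        self.seed = lo;
        lo
    }
}

/// C-compatible entry point taking the caller-owned state. To export it under
/// this exact symbol name, add `#[unsafe(no_mangle)]` (plain `#[no_mangle]` is
/// also accepted before the 2024 edition).
pub extern "C" fn rand31_next(state: &mut Rand31) -> u32 {
    state.next_u31()
}
```

Because each caller (for example, each unit test) owns its own `Rand31`, tests no longer share mutable state and can run in parallel without --test-threads=1. As a check, `Rand31::new(1)` followed by 10,000 calls should yield 1043618065, the value the C++ standard requires from the same recurrence in `std::minstd_rand0`.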