Notebooks to Document Workflow
Total Page:16
File Type:pdf, Size:1020Kb
X XXX XX Notebooks to document workflow: X XX XX XX XXX XX XX XX XXX XX XX XX XXX a possible XX useful component in Virtual Research Environments X Oceanography Data analysis Reproducibility SeaDataCloud ODIP Virtual Research Environments Python Julia Jupyter 0 What do we want to do? Figure 5: Jupyter ( http://jupyter.org/) 6 Anatomy of a notebook Web application for the creation and sharing of notebook-type documents. Notebooks contain: To use jupyter notebooks both as a user guide and Evolved from IPython ,a command shell for interactive computing (2001). 1. Text cell to document the code as a user interface to describes the different steps to Reproducibility ! More than 40 language kernels available 2. Cell codes that can be exectuted piece by piece generate research products. ! Can be used as a multi-user server ( jupyterhub ) 3. Results from the code execution ! avoid installation steps on several users’ machine ”Interactive notebooks: Sharing the code”, Nature (2014) http://www.nature.com/ 4. Figures or animations news/interactive-notebooks-sharing-the-code-1.16261 3 Why do we use Jupyter? 1 A few definitions & acronyms… Text cell for documenting ( ) 1. Open source… as the others Application Programming Interface (API): rules and specifications that software Code cell to be executed 2. Many programming languages… as the others programs can follow to communicate with each other. 3. Easy installation… as some of the others Result of the cell execution DIVAnd: a software tool designed for the interpolation of in situ measurements in n 4. A nice solution to deploy on a cloud: JupyterHub dimensions. Notebook: web application for the creation and share of documents that include: Tool name R-Markdown Jupyter Beaker Cocalc Zeppelin I code .../rstudio/rmarkdown .../jupyter/notebook .../twosigma/beakerx .../sagemathinc/cocalc .../apache/zeppelin https://github.com/... I Julia, Python, R, figures and animations Julia, Python, R, Scala, R ,Python, Octave , R, Python, SQL, Bash, Javascript, C++, Torch, Scala, Python, SparkSQL, Languages Bash, Octave, Rubi, Cython, Julia,Java, I Rcpp, Stan, JavaScript Scala, Bash, Octave, Rubi, Hive, Markdown text, including equations Fortran, PHP, … C/C++, Perl, Ruby Fortran, … HTML, PDF, MS Word, PDF, LaTeX, HMTL, Virtual research environment (VRE): online infrastructure which aims to help researchers Export formats Beaker format JSON Order of execution ComparisonBeamer, HTML5 slides, … Markdown, reST to work collaboratively by managing Cloud deployment – JupyterHub Beaker Lab (discontinued) – Yes I the software tools, services and technologies, Figure obtained from the cell execution I the access to data resources, public or private, 4 A few more words about I the publication of the results obtained. 8 9 < spawns = Code cell to generate the plot Multi-user Hub which : manages ; multiple instances of the single-user Jupyter notebook 2 Notebooks: what exists today? proxies server ( https://github.com/jupyterhub/jupyterhub). Figure 1: Beaker ( http://beakernotebook.com/ ) Running Notebook-style development environment for working interactively with large and complex datasets. notebooks ! Usage of different languages in different cells, and terminals within the same notebook New notebooks ! Language manager with available language kernels Figure 2: ”Collaborative Calculation in the Cloud” ( https://cocalc.com/ ) Web-based cloud computing platform, formerly called formerly called SageMathCloud. ! Support of many languages ! Users to upload their file on the platform Available files to be later read or processed and directories Figure 6: Test instance of jupyterhub deployed for the SeaDataCloud VRE. Figure 3: R-Markdown ( http://rmarkdown.rstudio.com/ ) Spawner: responsible for the start of the computer environment for the user, either directly Dynamic, self-contained documents with embedded on the server or on a cluster. Several spawners are available, among them DockerSpawner, Figure 7: Examples of notebook cells. chunks of code. ! Possible to export in journal which enables JupyterHub to spawn single user notebook servers in Docker containers ( https://github.com/rstudio/rticles ) or ( https://github.com/jupyterhub/dockerspawner). presentation formats 7 Next steps ! A LTEXtemplates to ensure journal standards 5 How will we use notebooks in the VRE? ! Use the files produced by webODV as an input for DIVAnd ! Build a RESTful API to make easier the integration into the VRE 3 steps into 1 Figure 4: Apache Zeppelin ! Publish the notebooks along with the data and the products ( https://zeppelin.apache.org/ ) Web-based notebook for data-driven, interactive and 8With DIVA 9 with DIVAnd collaborative documents. < Read the doc = Acknowledgements Intended for big data and large scale projects. Compile and run the code ! Run the notebook! ! : ; The work presented in this poster has been developed in the frame of Languages can be mixed in the same notebook Document the execution: parameters, configuration, … ! Users can write their own interpreter (language SeaDataCloud–Further developing the pan-European infrastructure for marine and ocean backend) data management, Project ID 730960 and ODIP 2–Extending the Ocean Data Interoperability Platform, Project ID: 654310. Authors: Charles Troupin, Alexander Barth, Sylvain Watelet and Jean-Marie Beckers GHER_ULiege https://github.com/gher-ulg/ http://labos.ulg.ac.be/gher/.