Introduction Jupyter Prof Hans Fangohr, 5 October 2019 1
09:00 - 09:30 Jupyter Notebook and Ecosystem 30', Hans Fangohr (European XFEL GmbH)
09:30 - 10:00 Status updates from facilities
► Jupyter at Synchrotron Soleil (Alain Buteau & Gwenaelle Abeille) 5'
► Jupyter at CERN (Manuel Gonzalez & Jakub Wozniak) 5'
► Jupyter at European Southern Observatory (Gianluca Chiozzi) 5'
► Jupyter at J-PARC MLF (Kentaro Moriyama) 5'
► Jupyter at MAX IV Synchrotron (Vincent Hardion) 5'
10:00 - 10:30 Coffee break
10:30 - 12:30 JupyterLab Tutorial 2h0', Saul Shanabrook (Quansight)
12:30 - 14:00 Lunch and networking
14:00 - 14:30 Jupyter for processing neutron event data at ESS 30', Dr. Jonathan Taylor (European Spallation Source)
14:30 - 15:00 Jupyter at Brookhaven National Laboratory 30', Dr. Daniel B. Allan (Brookhaven National Lab)
15:00 - 15:20 Jupyter for Accelerator Physics 20', Dr. Jonathan Edelen (RadiaSoft LLC)
15:30 - 16:00 Coffee break
16:00 - 17:00 Questions and answers, discussion, Hans Fangohr (European XFEL GmbH)

Introduction
Jupyter notebook and ecosystem
Hans Fangohr
Data Analysis, European X-ray Free Electron Laser (EuXFEL), Germany
Professor of Computational Modelling, University of Southampton, United Kingdom
[email protected] @ProfCompMod
Jupyter Workshop, ICALEPCS 2019, New York, US, Saturday 5 October 2019
https://indico.desy.de/indico/event/23354
Outline
Quick introduction Jupyter Notebook
Introduce a number of use cases (most from European XFEL)
Introduce a number of tools from Project Jupyter
Format (for the day)
Informal
► Ask questions to make it informal
► To guide the presenter on what is useful
Slides will be made available on the workshop site (speakers, please send yours to Hans Fangohr)

Jupyter Notebook

Document hosted in web browser (demo)
Combines
► text (markdown with LaTeX support)
► computer code (Python)
► output from code
Saved in one document
► *.ipynb [IPYthon NoteBook]
► Combines input and output cells
► JSON format

Demo notebook shown on the slide:

In [1]: # Import Python and libraries we need later
        %matplotlib inline
        from numpy import exp, cos, linspace
        import pylab
        from ipywidgets import interact

Mathematical model: We would like to understand $f(t, \alpha, \omega) = \exp(-\alpha t) \cos(\omega t)$

Code: Here is an implementation:

In [2]: def f(t, alpha, omega):
            """Computes and returns exp(-alpha*t) * cos(omega*t)"""
            return exp(-alpha * t) * cos(omega * t)

Interactive exploration: We can execute the function for values of $t$, $\alpha$ and $\omega$:

In [3]: f(t=0.1, alpha=1, omega=10)
Out[3]: 0.48888574340060287

Or produce a plot (in a function plot_f so it can be re-used for different parameters):

In [4]: def plot_f(alpha, omega):
            ts = linspace(0, 5, 500)   # 500 points in interval [0, 5]
            ys = f(ts, alpha, omega)
            pylab.plot(ts, ys, '-')

In [5]: plot_f(alpha=0.1, omega=10)    # call function and create plot

Using interaction widgets, we can re-trigger execution of the plot_f function via GUI elements such as sliders.

In [6]: interact(plot_f, alpha=(0, 2, 0.1), omega=(0, 50, 0.5));

[Widget output: sliders set to alpha = 1.50 and omega = 45.50, with the resulting plot]
Conclusion: We observe that parameter $\alpha$ is responsible for damping, and $\omega$ for the frequency.

[Figure: annotated screenshot of the notebook interface, with labels: Toolbar, Notebook name, Kernel name, Markdown cell, Currently selected cell, Code cell, Code output]

History of Jupyter

IPython → IPYthon NoteBook → *.ipynb
IPython notebooks were language agnostic, as they run over open network protocols
The community began adding other languages to the notebooks, starting with Julia and R
As the project expanded away from just Python, the notebooks had to be renamed
JUlia, PYThon, and R → Jupyter
“Jupyter” is also a homage to Galileo’s notebooks which recorded the discovery of Jupiter’s moons
How does it work?
Starting jupyter-notebook starts the notebook server
Opening a notebook starts up the kernel
The kernel communicates with the notebook server via ZMQ
The notebook server shows content via the browser
JavaScript is used in the browser
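The exchange between notebook server and kernel can be illustrated with a minimal sketch of the dictionary form of an "execute_request" message, as described in the Jupyter messaging protocol. This is only an illustration under stated assumptions: the function name make_execute_request and the user name are hypothetical, and real messages are additionally signed and sent as multipart ZMQ frames (normally handled by the jupyter_client library), which this sketch does not attempt.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="demo-user"):
    """Build the dictionary form of an 'execute_request' message.

    Hypothetical helper for illustration; signing and ZMQ framing are omitted.
    """
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,        # unique id of this message
            "session": session_id,             # id of the client session
            "username": username,
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",                  # assumed protocol version
        },
        "parent_header": {},                   # empty: this message starts a conversation
        "metadata": {},
        "content": {
            "code": code,                      # source code of the cell to run
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


message = make_execute_request("f(t=0.1, alpha=1, omega=10)", session_id=uuid.uuid4().hex)
wire_text = json.dumps(message, indent=2)
print(wire_text)
```

The kernel answers such a request with further messages (status, results, output streams) that carry this header as their parent_header, which is how the front end matches output to the cell that produced it.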
Supported Kernels
50+ languages supported. More complete list at https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
Use case 1: data analysis in notebook
Explorative data analysis
Convenient combination of processing, results and interpretation
Complete capture of all computational steps: good record for reproducibility and re-use ► FAIR data
Through export to HTML, easy to share with collaborators & supervisors
Scientists are confident drivers of this
Example on the right from SCS instrument
Sharing and exporting notebooks
Can share *.ipynb files
► Can be displayed using jupyter-notebook
► Code can be re-executed on collaborator's machine, but only if software is available, and data is available, and collaborators know how to start notebook

“Static sharing” through html and email (for example)
► Often sufficient – does not require installation or additional skills
► Effective to communicate with supervisors, line managers etc.
► Convert using menu “File -> Download as -> html”
► Or using nbconvert:

$ jupyter-nbconvert --to html 2-widgets.ipynb
[NbConvertApp] Converting notebook 2-widgets.ipynb to html
[NbConvertApp] Writing 382609 bytes to 2-widgets.html
Use case 2: notebooks as recipes
Pre-populate notebook with cells to carry out a particular type of data analysis
► Provide a directory full of such recipes to users
► Users execute cells during beamtime and later

Convenient compromise between
► Static recipe (= script)
► Interactive exploration

Experience
► Keep code in notebook cells short and move functionality into library (here “ToolBox”)
► Archive directory of modified recipes with data
Use case 4: notebooks as a script
Use Jupyter Notebook as a script
► Can use nbconvert to take the commands in a notebook, execute them, and save the resulting notebook
► Can create data files and plots in the process

Use case: detector calibration pipeline
► Use of nbparametrize (or papermill) to insert run parameters into notebook before execution
► Automatic execution and creation of pdf
► Error messages embedded in output
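The idea of inserting run parameters before execution can be sketched with the standard library alone. The sketch below assumes the convention used by papermill, where a cell tagged "parameters" holds defaults and a new cell with the actual values is placed right after it. It is not the implementation of papermill or nbparametrize; the function inject_parameters, the variable names, and the example values are hypothetical.

```python
import copy


def inject_parameters(notebook, parameters):
    """Return a copy of `notebook` with a new code cell assigning `parameters`.

    The new cell is placed directly after the first cell tagged "parameters",
    or at the top of the notebook if no such cell exists.
    """
    nb = copy.deepcopy(notebook)  # leave the template notebook untouched
    source = "\n".join(f"{name} = {value!r}" for name, value in parameters.items())
    injected = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        "source": source,
    }
    position = 0
    for index, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            position = index + 1
            break
    nb["cells"].insert(position, injected)
    return nb


# Hypothetical template notebook: defaults in a tagged cell, then the analysis.
template = {
    "cells": [
        {"cell_type": "code", "execution_count": None,
         "metadata": {"tags": ["parameters"]}, "outputs": [],
         "source": "run_number = 0\ndetector = 'default'"},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": "print(run_number, detector)"},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}

prepared = inject_parameters(template, {"run_number": 42, "detector": "example-detector"})
print(prepared["cells"][1]["source"])
```

Because the injected cell runs after the defaults, its assignments override them when the prepared notebook is executed, for example with nbconvert.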
Executing a notebook from the command line (nbconvert)

Convert from ipynb to html:
$ jupyter-nbconvert --to html 2-widgets.ipynb

Can optionally set output name:
$ jupyter-nbconvert --to html --output myout.html 2-widgets.ipynb

Can execute notebook before conversion:
$ jupyter-nbconvert --execute --to html --output myout.html 2-widgets.ipynb

Execute notebook and save results as notebook:
$ jupyter-nbconvert --execute --to ipynb --output myout.ipynb 2-widgets.ipynb
[NbConvertApp] Converting notebook 2-widgets.ipynb to ipynb
[NbConvertApp] Executing notebook with kernel: python3
[NbConvertApp] Writing 103398 bytes to myout.ipynb

nbconvert
nbconvert is a tool which can convert notebooks to various other formats such as: HTML, LaTeX, PDF, Markdown, ReStructuredText, Python, …
Static HTML pages can be created as documentation or tutorial
► nbsphinx: a Jupyter notebook provides a section in the sphinx documentation
Homepage: https://github.com/jupyter/nbconvert
Notebook-as-a-script approach
► Can change parameters inside notebook before execution: nbparametrize or papermill
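In a pipeline, the nbconvert invocations shown on the earlier slides are typically assembled by a script. The following is a minimal sketch that builds the same command lines using only the flags that appear in this talk (--to, --output, --execute). The helper name nbconvert_command is hypothetical, and the command is only run if both the tool and the notebook file are actually present.

```python
import os
import shutil
import subprocess


def nbconvert_command(notebook, to="html", output=None, execute=False):
    """Assemble a jupyter-nbconvert command line as a list of arguments."""
    command = ["jupyter-nbconvert"]
    if execute:
        command.append("--execute")      # run all cells before converting
    command += ["--to", to]              # target format, e.g. html or ipynb
    if output is not None:
        command += ["--output", output]  # optional output file name
    command.append(notebook)
    return command


command = nbconvert_command("2-widgets.ipynb", to="ipynb",
                            output="myout.ipynb", execute=True)
print(" ".join(command))

# Only run the conversion if the tool and the notebook are available here.
if shutil.which(command[0]) and os.path.exists("2-widgets.ipynb"):
    subprocess.run(command, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with file names that contain spaces.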
Use case 5: (remote) data analysis environment (JupyterHub)
JupyterHub allows
► users to connect through browser and https
► use of existing authentication systems
► serving notebooks on facility hardware
► connection to the user’s file storage

Example: JupyterHub at EuXFEL & DESY
► Uses Maxwell HPC cluster

Popular with users:
► no software installation & browser of choice
► works locally and remotely the same
Using remote resources: JupyterHub
In principle, a user can ssh to the HPC resource and use port forwarding to connect the local machine with the HPC computer running the notebook server
JupyterHub helps orchestrate and manage individual Jupyter instances for multiple users
It provides an interface allowing users to easily spawn and connect their own Jupyter Server
Different options for resource allocation
► Integrated with HPC scheduler
► . . .
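For comparison with JupyterHub, the manual route via ssh port forwarding can be sketched as follows. This is an illustration only: the helper ssh_tunnel_command, the user name, the host name hpc.example.org and the port 8888 are assumptions, and the sketch only builds the command rather than opening a connection.

```python
def ssh_tunnel_command(user, host, remote_port=8888, local_port=8888):
    """Build an ssh command that forwards a local port to a remote notebook server.

    -N : do not run a remote command, only forward ports
    -L : forward local_port on this machine to localhost:remote_port on `host`
    """
    forward = f"{local_port}:localhost:{remote_port}"
    return ["ssh", "-N", "-L", forward, f"{user}@{host}"]


# Hypothetical user and host; the notebook server is assumed to listen on port 8888.
command = ssh_tunnel_command("alice", "hpc.example.org")
print(" ".join(command))
```

With such a tunnel open, the browser on the local machine would connect to http://localhost:8888. Each user has to start the notebook server and the tunnel by hand, which is the bookkeeping that JupyterHub takes over.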
JupyterHub – manage individual Jupyter instances for multiple users
Use case 6: blending GUI and script
JupyterWidgets provide graphical control elements in notebook
► Buttons, sliders etc. trigger code execution and update of plot

Useful for
► Data analysis of fixed type
► Data exploration of data sets

Discussion
► Less powerful than, for example, Qt GUI
► Popular with users due to being embedded in notebook and needing no software installation (via JupyterHub)
Binder project
Given a (github) repository with
► Jupyter notebooks
► Software requirements (Dockerfile, requirements.txt, environment.yml)

Binder service
► builds a container with the required software
► starts Jupyter notebook server in that container offering the notebooks
Binder project provides free pilot at https://mybinder.org
Institutional Binder instances are being deployed

Example: https://github.com/fangohr/jupyter-demo
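A Binder launch link for a repository such as the example above can be constructed from the repository name. The sketch below assumes the mybinder.org URL scheme for GitHub repositories (/v2/gh/user/repo/ref) and the filepath query parameter for opening a particular notebook; newer Binder versions may prefer other parameters. The branch name master and the notebook name 2-widgets.ipynb are taken from the nbval terminal output elsewhere in this talk and are assumptions about the current state of the repository.

```python
from urllib.parse import quote


def binder_url(user, repo, ref="master", filepath=None, base="https://mybinder.org"):
    """Build a Binder launch URL for a GitHub repository (assumed URL scheme)."""
    url = f"{base}/v2/gh/{user}/{repo}/{quote(ref, safe='')}"
    if filepath:
        # Open this notebook directly once the container has started.
        url += "?filepath=" + quote(filepath)
    return url


print(binder_url("fangohr", "jupyter-demo", filepath="2-widgets.ipynb"))
```

The base argument is there because institutional Binder instances would use the same scheme under a different host name.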
BinderHub – mybinder.org
Use case 7: documenting software library
Use notebook as chapter in documentation
► Supported by sphinx → html, pdf as usual

Documentation easy to create:
► enter commands in notebook
► output is produced automatically
► updating docs means re-running notebook

Can run regression test on documentation notebooks using NoteBook VALidate (nbval)
With Binder, can make documentation executable (for example DiscretisedField Tutorials)

nbval
py.test is a popular Python testing framework
nbval is a py.test plugin which lets py.test recognise and collect Jupyter notebooks
In each notebook, each cell is a test:
► The test passes if execution of the input creates the stored output
► The test fails otherwise
There is a variety of configuration parameters
Home page: https://github.com/computationalmodelling/nbval

nbval example
$ py.test --verbose --nbval 2-widgets.ipynb
====== test session starts ======
platform darwin -- Python 3.6.8, pytest-3.10.0, py-1.7.0, pluggy-0.8.0 -- /Users/fangohr/anaconda3/bin/python
rootdir: /Users/fangohr/Desktop/jupyter-demo, inifile:
plugins: remotedata-0.3.1, openfiles-0.3.0, doctestplus-0.1.3, arraydiff-0.2, nbval-0.9.1
collected 9 items

2-widgets::ipynb::Cell 0 PASSED [ 11%]
2-widgets::ipynb::Cell 1 PASSED [ 22%]
2-widgets::ipynb::Cell 2 PASSED [ 33%]
2-widgets::ipynb::Cell 3 PASSED [ 44%]
2-widgets::ipynb::Cell 4 PASSED [ 55%]
2-widgets::ipynb::Cell 5 PASSED [ 66%]
2-widgets::ipynb::Cell 6 PASSED [ 77%]
2-widgets::ipynb::Cell 7 PASSED [ 88%]
2-widgets::ipynb::Cell 8 PASSED [100%]

====== 9 passed, 1 warning in 2.11 seconds ======
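The principle behind the output above, re-executing each cell and comparing the new output with the stored one, can be sketched in a few lines. This is a simplified illustration and not how nbval is implemented: it runs cells with exec in the current Python process instead of a Jupyter kernel, compares only text written to stdout, and does not handle exceptions. The names stored_stdout and check_notebook and the example notebook are hypothetical.

```python
import contextlib
import io


def stored_stdout(cell):
    """Collect the stdout text that is saved with a code cell."""
    chunks = []
    for output in cell.get("outputs", []):
        if output.get("output_type") == "stream" and output.get("name") == "stdout":
            text = output["text"]
            # The notebook format allows a string or a list of strings here.
            chunks.append("".join(text) if isinstance(text, list) else text)
    return "".join(chunks)


def check_notebook(notebook):
    """Re-run every code cell; return a list of (cell index, passed) pairs."""
    namespace = {}  # shared between cells, like the state of a kernel
    results = []
    for index, cell in enumerate(notebook["cells"]):
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        if isinstance(source, list):
            source = "".join(source)
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(source, namespace)
        results.append((index, buffer.getvalue() == stored_stdout(cell)))
    return results


# Hypothetical notebook: the last cell carries a stale stored output.
notebook = {"cells": [
    {"cell_type": "markdown", "metadata": {}, "source": "A small example"},
    {"cell_type": "code", "metadata": {}, "execution_count": 1,
     "source": "x = 2 + 3\nprint(x)",
     "outputs": [{"output_type": "stream", "name": "stdout", "text": "5\n"}]},
    {"cell_type": "code", "metadata": {}, "execution_count": 2,
     "source": "print(x * 2)",
     "outputs": [{"output_type": "stream", "name": "stdout", "text": "11\n"}]},
]}

for index, passed in check_notebook(notebook):
    print(f"Cell {index}", "PASSED" if passed else "FAILED")
```

The second code cell fails because the stored output no longer matches what the code produces, which is the kind of regression such a check is meant to catch in documentation notebooks.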
Use case 8: reproducible publication
Create github repository to complement publication
► Create one notebook per figure / main result
► Define software environment in github repository using Binder syntax

Close to reproducible publication:
► fully specified software environment
► fully specified data analysis
► Data access → Andy Götz talk, PaNOSC

Zenodo for long term preservation
► Create Zenodo deposit for repository
► Cite Zenodo DOI in publication

Example: https://github.com/maxalbert/paper-supplement-nanoparticle-sensing
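One way to approach a "fully specified software environment" is to record the exact versions of the packages a notebook uses in a requirements.txt, which is one of the files Binder reads. The sketch below is an assumption about how this could be done with the standard library (Python 3.8 or later); the helper pinned_requirements is hypothetical and the package list is taken from the demo notebook's imports.

```python
from importlib import metadata


def pinned_requirements(packages, lookup=metadata.version):
    """Return requirements.txt content pinning each package to its installed version."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={lookup(name)}")
        except metadata.PackageNotFoundError:
            # Keep a visible note instead of silently dropping the package.
            lines.append(f"# {name}: not installed, version unknown")
    return "\n".join(lines) + "\n"


# Packages used by the demo notebook; the output depends on the local installation.
print(pinned_requirements(["numpy", "matplotlib", "ipywidgets"]))
```

The resulting text would be saved as requirements.txt in the repository, so that the environment built by Binder matches the one used to produce the published figures.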
Jupyter Lab
Next generation Jupyter notebook interface
Window manager embedded in browser
“classic notebook“ in one window
Additional features in other windows
► File browser
► Extensions
► CSV viewer

[Screenshots of JupyterLab: multiple tabs; multiple panels; exploring files (e.g. CSV); Table of Contents extension; dark theme]
Jupyter Notebook vs. JupyterLab
JupyterLab is the newer interface to the notebooks
It provides a more flexible interface closer to a modern IDE
► Flexible layouts and panes
► More flexible console/text editor
► Drag/Drop/Expand/Collapse cells
► Themes
► Extensions
More on this later in the day
Nbdime – NoteBook DIff and MErge
Jupyter notebooks are rich media documents, stored as plain text JSON files
Basic diff and merge tools (such as those used by git) do not handle this format well
► Small changes to text/plots result in unreadable diffs

nbdime provides multiple tools to help with “content-aware” diffing and merging for Jupyter notebook files
► nbdiff: compare notebooks in a terminal-friendly way
► nbmerge: three-way merge of notebooks with automatic conflict resolution
► nbdiff-web: shows you a rich rendered diff of notebooks
► nbmerge-web: gives you a web-based three-way merge tool for notebooks
► nbshow: present a single notebook in a terminal-friendly way

Homepage: https://github.com/jupyter/nbdime
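Why line-based diffs of notebooks are noisy can be seen with a small experiment using only the standard library. The sketch below builds two versions of a minimal notebook (assuming the nbformat 4 layout) that differ only in the execution counter, as happens when a notebook is simply re-run, and compares their JSON text with difflib. It illustrates the problem only and does not reproduce the content-aware comparison that nbdime performs; the helper notebook_json is hypothetical.

```python
import difflib
import json


def notebook_json(execution_count):
    """Return the JSON lines of a tiny notebook with one markdown and one code cell."""
    nb = {
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["Damped oscillation"]},
            {"cell_type": "code", "execution_count": execution_count, "metadata": {},
             "source": ["f(t=0.1, alpha=1, omega=10)"],
             "outputs": [{"output_type": "execute_result",
                          "execution_count": execution_count,
                          "metadata": {},
                          "data": {"text/plain": ["0.48888574340060287"]}}]},
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 2,
    }
    return json.dumps(nb, indent=1).splitlines()


old = notebook_json(execution_count=1)
new = notebook_json(execution_count=2)  # same code and result, cell was re-run

diff = list(difflib.unified_diff(old, new, "old.ipynb", "new.ipynb", lineterm=""))
print("\n".join(diff))
```

Although the code and its result are identical, the text diff reports changes, buried in JSON punctuation. With embedded plots (stored as long encoded strings) the effect is much stronger.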
Summary – tools
Jupyter Notebook
Jupyter Lab – next generation Jupyter user interface
Jupyter Hub – serving notebook from compute facility for multiple users
NBDIME – DIffing and MErging tools
NBVAL – VALidation tool; use each cell as a test
NBCONVERT – conversion of notebooks to other formats & execution
NBParametrize and Papermill – inject parameters into notebook files
IPyWidgets – GUI like elements in notebook
Binder – Cloud hosted execution of notebooks from github repositories

There is much more.
Summary – use cases
Use cases
► Data analysis
► Provision of recipes
► Notebook-as-a-script
► Remote data analysis (JupyterHub)
► Mixing GUI and script-driven analysis
► Documentation of Software
► Reproducibility
► . . .
Summary
The Jupyter notebook and ecosystem provide many options

Acknowledgements
► Authors and co-authors of ICALEPCS contribution TUCPR02 (Tuesday 14:30)
► Robert Rosca and Thomas Kluyver
► OpenDreamKit Horizon 2020, European Research Infrastructures project (#676541), http://opendreamkit.org
► PaNOSC: Photon and Neutron Open Science Cloud, European Union’s Horizon 2020 research and innovation programme under grant agreement No 654220
► The Gordon and Betty Moore Foundation through Grant GBMF #4856, by the Alfred P. Sloan Foundation and by the Helmsley Trust
► EPSRC's Centre for Doctoral Training in Next Generation Computational Modelling, http://ngcm.soton.ac.uk (#EP/L015382/1)