2017-08-Iucr Computing School-Favre-Nicolin

High-performance computing using Python, GPUs & Modular approach to crystallographic software development
Vincent Favre-Nicolin
X-ray NanoProbe group, ESRF

OUTLINE OF MY TALK
• Modular, object-oriented programming
• Efficient programming in Python
• GPU computing from Python
• A few words about Cloud computing

Page 2 l IUCr Computing School, Bangalore, 2017-08-16 l Vincent FAVRE-NICOLIN 26/07/2013

MAIN COMPUTING PROJECTS
• Ab initio structure solution from powder diffraction: FOX: http://fox.vincefn.net
• Coherent diffraction imaging using GPU: PyNX:
  • https://software.pan-data.eu/software/102/pynx
  • http://ftp.esrf.fr/pub/scisoft/PyNX/

PROGRAMMING STYLES: FLAT
As simple as it gets

    import numpy as np

    # Linear programming
    l, k, h = np.mgrid[-10:11, -10:11, -10:11]
    n_atoms = 1000
    x = np.random.uniform(0, 1, size=n_atoms)
    y = np.random.uniform(0, 1, size=n_atoms)
    z = np.random.uniform(0, 1, size=n_atoms)
    fhkl = np.empty(h.shape, dtype=np.complex128)
    for i in range(h.size):
        # Sum over all atoms for one reflection (flat index into the 3D array)
        fhkl.flat[i] = np.exp(2j * np.pi * (h.flat[i] * x + k.flat[i] * y
                                            + l.flat[i] * z)).sum()

PROGRAMMING STYLES: FUNCTIONS

    # …
    # Function
    def calc_f_hkl(x, y, z, h, k, l):
        assert (x.shape == y.shape == z.shape)
        fhkl = np.empty(h.shape, dtype=np.complex128)
        for i in range(h.size):
            fhkl.flat[i] = np.exp(2j * np.pi * (h.flat[i] * x + k.flat[i] * y
                                                + l.flat[i] * z)).sum()
        return fhkl

    fhkl = calc_f_hkl(x, y, z, h, k, l)

You can:
• Re-use the function
• Optimize it

PROGRAMMING STYLES: OBJECT-ORIENTED

    class reciprocal_space:
        def __init__(self, h, k, l):
            assert (h.shape == k.shape == l.shape)
            self.h = h
            self.k = k
            self.l = l
            self._fhkl = None

        def get_f_hkl(self):
            return self._fhkl

        def calc_f_hkl(self, x, y, z):
            self._fhkl = np.empty(self.h.shape, dtype=np.complex128)
            for i in range(self.h.size):
                self._fhkl.flat[i] = np.exp(2j * np.pi * (self.h.flat[i] * x
                                                          + self.k.flat[i] * y
                                                          + self.l.flat[i] * z)).sum()

    rec = reciprocal_space(h, k, l)
    rec.calc_f_hkl(x, y, z)
    fhkl = rec.get_f_hkl()

You can:
• Re-use the class
• Optimize it
• Use instance as storage for result

OBJECT-ORIENTED: CLASSES
Vocabulary, using the reciprocal_space class above:
• Class definition: class reciprocal_space:
• Initialization function: def __init__(self, h, k, l):
• Attribute (data member): self.h, self.k, self.l
• Protected attribute (leading _): self._fhkl
• Method (member function): def get_f_hkl(self):
• Class instance: rec = reciprocal_space(h, k, l)

WHY OBJECT-ORIENTED
• Separate:
  • Code (calculations)
  • Data (storage)
• Abstract layer:
  • No direct access to data
  • Methods must keep their signature (API)
  • Calculation/storage can be optimized later:
    • Limited or full-space hkl (FFT vs direct sum)
    • CPU or GPU?
• Better code organization
• Easy to document
• Easy to share

WHY NOT OBJECT-ORIENTED
• It's complicated
• It's longer to write
• Simple code doesn't need objects
• Why document, it's trivial?

PYTHON CLASSES: PROTECTED ATTRIBUTES
• All attributes can be accessed in Python
• Members with a leading _ are a programmer's hint that the method or data may be changed in the future
• Use public access methods
• Use _ for data/methods which may change

    fhkl = rec.get_f_hkl()  # YES
    fhkl = rec._fhkl        # NO - may trigger the end of the world

OO PROGRAMMING: BIG OR SMALL CLASSES
For any crystallographic computing, you can choose:
• Large classes with many methods
• A large number of small classes easily re-used

EXAMPLE PROBLEM: STRUCTURE SOLUTION
Workflow: Sample -> Diffraction Data -> Unit Cell, Spacegroup(s) -> Structure Solution -> Structure Refinement
• Structure Solution: get atomic positions within 0.1-0.5 Å
• Structure Refinement: Least-Squares
Two approaches to structure solution:
• Reciprocal-space methods (need high-quality |F(hkl)| !)
  • Extracted |F(hkl)| -> Fourier, phases recycling -> Electronic density
• Global optimization in real space
  • Modeling by atoms or atom groups
  • (More or less) random search in direct space from the model degrees of freedom -> Structural model
  • Grid Search, Monte-Carlo, Genetic Algorithms

BIG CLASSES: REVERSE MONTE-CARLO
Parametrization -> Degrees Of Freedom (atomic positions, torsion angles, ..)
Hypersurface: Cost = f(DOF)
1. Starting configuration
2. Random change of parameters
3. Evaluation of the new configuration: Cost (C)
4. Is the configuration better? $C_n < C_{n-1}$
   • yes: keep configuration
   • no: keep configuration with probability $P = e^{-\Delta C / T}$, where $\Delta C = C_n - C_{n-1}$ and $T$ is the temperature of the algorithm
5. Repeat from step 2
This generates a distribution of configurations following Boltzmann's law.
Need two classes:
• Object to optimize
• Algorithm

APPROACH 1: BIG CLASSES / OBJCRYST++
Classes shown in the diagram:
OptimizationObj
  - Monte-Carlo: Simulated Annealing, Parallel Tempering
  - Least squares
RefinableObj
  - List of parameters
  - Cost: log(likelihood)
  - Random moves
  - Derivatives
ScatteringPower
  - scattering factor
  - anomalous scattering factor
  - temperature factor
ScatteringPowerAtom
Scatterer
  - (list of) position(s)
  - scattering power(s)
  - specific moves
Atom
Molecule (restraints)
Crystal
  - unit cell
  - spacegroup
  - list of scatterers
ScatteringData
  - crystal
  - reflections
PowderPattern
  - Background
  - Crystalline Phases
SingleCrystal
Profiles: Pseudo-Voigt, TOF-

OBJCRYST++ API
See API from: http://fox.vincefn.net
• Molecule
• UnitCell
• LeastSquares

OBJCRYST++ API: AVOID RE-COMPUTATIONS
Powder pattern calculation:
• Did crystal structure change?
  • No: re-use structure factors
  • Yes:
    • Re-compute structure factors
    • Re-use table of atomic scattering factors
• Did unit cell change?
  • No: keep reflection positions
  • Yes: re-compute reflection positions
• Did reflection profiles change?
  • No: re-use reflection profiles
  • Yes: re-compute reflection profiles
Note:
• Top objects know nothing about the implementation of derived objects: all is hidden behind the OO API
• Optimization algorithms know nothing about crystallography!
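
The "did it change?" checks above can be sketched in Python with a simple change counter. This is only an illustration of the idea, not ObjCryst++ code: the names `Cell`, `Pattern`, `version` and `n_computations` are hypothetical, and a cubic cell is assumed so that d = a / sqrt(h² + k² + l²).

```python
import numpy as np


class Cell:
    """Hypothetical cubic unit cell with a change counter."""

    def __init__(self, a):
        self._a = a
        self.version = 0  # incremented on every modification

    @property
    def a(self):
        return self._a

    @a.setter
    def a(self, value):
        self._a = value
        self.version += 1


class Pattern:
    """Hypothetical pattern: re-computes reflection positions only when needed."""

    def __init__(self, cell, hkl):
        self.cell = cell
        self.hkl = np.asarray(hkl, dtype=float)
        self._d = None
        self._cell_version = None  # cell version used for the cached values
        self.n_computations = 0    # only here to show when work is done

    def get_d_spacings(self):
        if self._cell_version != self.cell.version:
            # Unit cell changed: re-compute reflection positions
            self._d = self.cell.a / np.sqrt((self.hkl ** 2).sum(axis=1))
            self._cell_version = self.cell.version
            self.n_computations += 1
        # Unit cell unchanged: keep cached reflection positions
        return self._d


cell = Cell(a=4.0)
pattern = Pattern(cell, [(1, 0, 0), (1, 1, 0), (2, 0, 0)])
pattern.get_d_spacings()      # first call: computed
pattern.get_d_spacings()      # cell unchanged: cached values re-used
cell.a = 5.0
d = pattern.get_d_spacings()  # cell changed: re-computed
```

The caller never asks whether a re-computation is needed; it only calls `get_d_spacings()`, which is what lets an optimization algorithm stay ignorant of the crystallography behind it.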
FOX/OBJCRYST++ SPEED
• 20 independent atoms, 100 reflections
• 10^4 to 5×10^4 configurations/s
Drawback of the 'big object' approach: the library is optimized for algorithms needing a large number of trials/s

APPROACH 2: SMALL CLASSES / TOOLKIT / PYNX
Idea: decompose the computing problem in as many independent snippets as possible

    class ERProj(CLOperatorCDI):
        """
        Error reduction.
        """
        def __init__(self, positivity=False):
            super(ERProj, self).__init__()
            self.positivity = positivity

        def op(self, cdi):
            if self.positivity:
                self.processing_unit.cl_er_real(cdi._cl_obj, cdi._cl_support)
            else:
                self.processing_unit.cl_er(cdi._cl_obj, cdi._cl_support)
            return cdi

    class ER(CLOperatorCDI):
        """
        Error reduction cycle
        """
        def __new__(cls, positivity=False, calc_llk=False):
            return ERProj(positivity=positivity) * FourierApplyAmplitude(calc_llk=calc_llk)
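
The `*` in `ER.__new__` chains two small operators into one. How PyNX implements this is not shown on the slide; the sketch below only illustrates the pattern, assuming operators are chained with `__mul__` and applied right to left. All names (`Data`, `Operator`, `Scale`, `Shift`, `ScaleShift`) are hypothetical.

```python
class Data:
    """Hypothetical data holder that the operators act on."""

    def __init__(self, value):
        self.value = value


class Operator:
    """Hypothetical base class: small operators chained with '*'."""

    def __init__(self):
        self.ops = [self]  # chain in the order written: A * B -> [A, B]

    def op(self, data):
        return data  # identity; derived classes override this

    def __mul__(self, other):
        if isinstance(other, Operator):
            # Operator * Operator: build a new chain
            chain = Operator()
            chain.ops = self.ops + other.ops
            return chain
        # Operator * data: apply the chain, rightmost operator first
        for o in reversed(self.ops):
            other = o.op(other)
        return other


class Scale(Operator):
    def __init__(self, factor):
        super().__init__()
        self.factor = factor

    def op(self, data):
        data.value *= self.factor
        return data


class Shift(Operator):
    def __init__(self, offset):
        super().__init__()
        self.offset = offset

    def op(self, data):
        data.value += self.offset
        return data


class ScaleShift(Operator):
    """Named shortcut for a chain, built in __new__ like ER on the slide."""

    def __new__(cls, factor, offset):
        return Scale(factor) * Shift(offset)


cycle = Scale(2) * Shift(1)  # Shift is applied first, then Scale
result = cycle * Data(3)     # (3 + 1) * 2 = 8
same = ScaleShift(2, 1) * Data(3)
```

Because `__new__` returns an object that is not a `ScaleShift` instance, Python skips `__init__`, so the class acts purely as a readable name for the composition.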
