Unsupervised Learning of Anomalous Diffusion Data
Total Page:16
File Type:pdf, Size:1020Kb
Unsupervised learning of anomalous diffusion data Gorka Muñoz-Gil,1, ∗ Guillem Guigó i Corominas,1 and Maciej Lewenstein1, 2 1ICFO – Institut de Ciències Fotòniques, The Barcelona Institute of Science and Technology, Av. Carl Friedrich Gauss 3, 08860 Castelldefels (Barcelona), Spain 2ICREA, Pg. Lluís Companys 23, 08010 Barcelona, Spain The characterization of diffusion processes is a keystone in our understanding of a variety of physical phenomena. Many of these deviate from Brownian motion, giving rise to anomalous diffu- sion. Various theoretical models exists nowadays to describe such processes, but their application to experimental setups is often challenging, due to the stochastic nature of the phenomena and the difficulty to harness reliable data. The latter often consists on short and noisy trajectories, which are hard to characterize with usual statistical approaches. In recent years, we have witnessed an impressive effort to bridge theory and experiments by means of supervised machine learning techniques, with astonishing results. In this work, we explore the use of unsupervised methods in anomalous diffusion data. We show that the main diffusion characteristics can be learnt without the need of any labelling of the data. We use such method to discriminate between anomalous diffusion models and extract their physical parameters. Moreover, we explore the feasibility of finding novel types of diffusion, in this case represented by compositions of existing diffusion models. At last, we showcase the use of the method in experimental data and demonstrate its advantages for cases where supervised learning is not applicable. Stochastic diffusion processes are ubiquitous in nature, the use of ensemble averages [10, 11] or extensive sta- with applications in diverse fields such as physics, chem- tistical analysis [12, 13]. Moreover, trajectories arising istry, biology or social sciences. The best-known model from SPT are usually short and cannot be compared to for these is Brownian motion [1], but recently, deviations the long time predictions theories usually make. On top from such paradigmatic phenomena have been discovered of that, the experimental data is often very noisy and in a huge range of systems [2]. We define as anomalous rather difficult to gather, hence increasing even more the diffusion any of these deviations, which are usually char- difficulty of the problem. acterized by a power-law scaling of the mean squared Lately, considerable interest has been devoted to de- displacement (MSD), velop and improve single trajectory methods, i.e. tools which extract the maximum amount of information from hx2(t)i ∼ tα; (1) a single anomalous diffusion trajectory. Here, machine where α is defined as the anomalous exponent. For α = 1 learning (ML) approaches have shown incredible success one typically recovers normal diffusion, and usually a re- and are able to beat state of the art methods in a variety turn to the Brownian motion universality class (for ex- of scenarios [14]. A wide range of ML architectures have ceptions, see Ref. [3]). Any α 6= 1 showcases anomalous been tested: from usual neural networks [15], convolu- diffusion, while other features such as non-ergodicity or tional [16] and recurrent layers [17, 18], graph neural correlated displacements may also be signals of its ap- networks [19], Bayesian inference [20], or extreme learn- pearance [4,5]. ing machines [21]. All these methods are trained in a In recent years, there has been an increasing interest in supervised scheme, which means that the machine learns the topic, motivated by the development of single particle to characterize the data by training with human-labelled tracking techniques (SPT), which open the possibility of data. While for some applications, such as parameter es- studying the motion of particles beyond the diffraction timation, this approach is completely valid and advanta- limit [6]. Such achievements have shown that diffusion geous, for others it creates an inherent problem that may in many different biological scenarios is indeed anoma- deeply affect our understanding of the data analyzed. lous [7]. Moreover, researchers have satisfactorily applied An example of such inherent problem is the discrimi- such theoretical framework to systems at all scales, rang- nation of the diffusion model of the particles, which has ing from ultra cold atom experiments [5], to animals [8] been usually stated as a ML classification problem [22– arXiv:2108.03411v1 [cond-mat.stat-mech] 7 Aug 2021 or economic signals [9]. 24]. When classifying between diffusion modes (i.e. nor- While currently both theory and experimental tech- mal, anomalous, confined or directed diffusion), the prob- niques are very well developed, connecting these two is lem is simplified, as they probably contain all possi- a non-trivial task. Due to the stochastic nature of the ble kinds of diffusion. However, a different problem is problem, the analysis of anomalous diffusion data and to classify between diffusion models, such as e.g. frac- its comparison to theoretical predictions often relies on tional Brownian motion (FBM) or continuous time ran- dom walk (CTRW) (see SectionIC for further details on these). In this case, it is very possible that the actual model of the data analyzed is not contained in the train- ∗ [email protected] ing set, or may even be a combination of these. Usual 2 supervised approaches lack the tools to discriminate these mean squared error (MSE), which for a single trajectory cases and they associate them with the most resembling reads model [24]. T In this work, we explore the use of unsupervised tech- 1 X (i) (i) MSE = (f(x ) − x )2; (2) niques for single trajectory characterization and show T t t that they can indeed be used to analyze anomalous diffu- t=1 sion data coming both from simulations and various ex- where f(x(i)) is the AE output for the input trajectory perimental setups. In particular, we focus on the use of x(i). neural network autoencoders, a typical ML architecture The AE used consists of a stack of convolutional and for unsupervised and semi-supervised approaches. Our fully-connected layers. Moreover, we consider features aim is: first, to investigate if unsupervised learning yields such as batch normalization, as well as a global aver- the same results of supervised techniques previously stud- age pooling and upsampling before and after the latent ied; second, to study the use of unsupervised approaches space. An schematic representation of such AE is pre- to understand and compare different anomalous diffusion sented in Fig.1(b). In this scheme, connections exist models beyond supervised methods; third, to explore the between subsequent layers, as typically occurs in most possibility of finding new physical processes related to neural networks architectures, but extra connections are anomalous diffusion by means of autoencoders. created between not subsequent ones. Such a special fea- The work is organized as follows: in SectionI we ture is often referred to as skip connections, and is widely present the unsupervised approach of use, namely con- used in residual neural networks [25]. Heuristically, we volutional autoencoders (SectionIA) and anomaly de- find that the use of such connections boosts the power of tection (SectionIB). Moreover, in SectionIC we present the AE, increasing vastly its accuracy. Moreover, due to the anomalous data used, as well as further details on the stochastic nature of the trajectories, any decrease in the anomalous diffusion models considered. Then, the the dimensionality of the trajectory (i.e. to the size of the results of the paper are presented in Section II. First, latent space) will cause the loss of necessary information we explore the feasibility of anomaly detection in simu- for its reconstruction. In that sense, the reconstruction lated data (Section IIA), both for model discrimination error will always have a lower bound related to the size (SectionIIA1) and parameter estimation (SectionIIA2, of the latent space [26]. In our particular implementa- but also to characterize novel types of diffusion, show- tion, the latent space consists of a fully connected layer cased in this case by the combination of existing models. whose size will depend on the length of the trajectories (Section IIA3) We then apply the knowledge gained to T . A detailed description of the architecture is presented analyze experimental data arising from SPT trajectories in AppendixA as well as in the repository of Ref. [27]. (Section IIB). Importantly, in order to avoid the effect of overfitting of the AEs, and to ensure that their predictions are re- lated to the learnt features and not the memorization of I. METHODS the data, all the results presented in this manuscript cor- respond to predictions over sets of trajectories not used A. Autoencoders in the training. Autoencoders (AE) are artificial neural networks that learn efficient encodings of high-dimensional data. They B. Anomaly detection consist of two parts: an encoder that compresses the in- put data into a low-dimensional embedding, referred to Anomaly detection refers to the problem of detecting as the latent space, and a decoder that recovers the orig- instances or samples from a certain dataset that differ inal input from the compressed representation. Ideally, in some sort from the rest. In other words, the goal is the output of the decoder is equal to the input of the en- to detect outliers, a.k.a. anomalies, which while having coder. However, due to the compression of information similarities with the rest of the dataset, also have special happening in the latent space, such an ideal scenario is features that crucially make them different. In recent often not found. Because of such compression, in order years, machine learning methods have shown great capa- to improve the similarity between reconstructions and in- bilities in dealing with such problem [28], with interesting puts, the AE has to learn features of the dataset during application in Physics [29].