Interpolation Technique for Sparse Data Based on Information Diffusion Principle—Ellipse Model

Vol.19 No.1 JOURNAL OF TROPICAL METEOROLOGY March 2013 Article ID: 1006-8775(2013) 01-0059-08 INTERPOLATION TECHNIQUE FOR SPARSE DATA BASED ON INFORMATION DIFFUSION PRINCIPLE—ELLIPSE MODEL 1 1 1 2 ZHANG Ren (张韧) , HUANG Zhi-song (黄志松) , LI Jia-xun (李佳讯) , LIU Wei (刘巍) (1. College of Meteorology and Oceanography, PLA University of Science and Technology, Nanjing 211101 China; 2. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031 China) Abstract: Addressing the difficulties of scattered and sparse observational data in ocean science, a new interpolation technique based on information diffusion is proposed in this paper. Based on a fuzzy mapping idea, sparse data samples are diffused and mapped into corresponding fuzzy sets in the form of probability in an interpolation ellipse model. To avoid the shortcoming of normal diffusion function on the asymmetric structure, a kind of asymmetric information diffusion function is developed and a corresponding algorithm-ellipse model for diffusion of asymmetric information is established. Through interpolation experiments and contrast analysis of the sea surface temperature data with ARGO data, the rationality and validity of the ellipse model are assessed. Key words: information diffusion; interpolation algorithm; sparse data; ellipse model CLC number: P731/O241.3 Document code: A 1 INTRODUCTION the ARGO data: the average array interval of ARGO floats was about 300 km, and there is only one profile Accurate and reliable oceanic observational data about every 10 days for each ARGO float. For are very important for marine research and regional oceanic and atmospheric research, the ARGO ocean-atmosphere numerical forecast. However, the data is sparse in space and discrete in time. For partial most difficult obstacle nowadays is the lack of key sea areas, such as the South China Sea and the oceanic environmental data and the lack of data northwest Pacific, the temporal-spatial resolution of analysis and information extraction technique. ARGO data is far from enough. Considering the ARGO (Array for Real-time Geostrophic factors of floats shift and the difference of measuring Oceanography) is a global oceanic observation and periods and regional distribution, the available ARGO investigation project proposed and supported by the data are even rare[2]. United States, Great Britain, France, Japan, China, Interpolation algorithms are common techniques and other countries. The goal of the ARGO project is for estimating and approaching the missing to establish a global real-time observational network information with nearby data. The current which consists of about 3000 automation floats and interpolation algorithms contain the Newton method, can quickly and accurately collect the global Lagrange method, spline interpolation, polynomial temperature and salinity profiling data and their interpolation, finite element method, weighing characteristics. The global ARGO observational function method, variational method, spectrum network can provide more than 10,000 temperature method, successive correction method, optimal and salinity real-time observational profiles monthly. interpolation and so on[3, 4], which can basically meet The ARGO data has such advantages as vast spatial the needs of the interpolation and fitting of large-scale range, long duration and sufficient observational atmospheric and oceanic data. factors and supports whole weather detection, which However, a key premise involved is that the can efficiently make up for the shortage of the oceanic above interpolated techniques need to be provided environment data, especially the observations in the [1] with sufficient data and related information. If the deep layers . observational data is too rare, the precision and However, there are some inherent limitations in reliability of the interpolation algorithm will be Received 2011-04-21; Revised 2012-10-23; Accepted 2013-01-15 Foundation item: Project of Natural Science Foundation of China (41276088) Biography: ZHANG Ren, Ph.D., professor, primarily undertaking research on air-sea interaction. Corresponding author: ZHANG Ren, e-mail: [email protected] 60 Journal of Tropical Meteorology Vol.19 restricted greatly. There is universal scattered and Matrix probability density function estimation sparse observational data such as ARGO data in the through the information diffusion principle is called oceanic environment, but the general interpolation diffusion estimation. The exact definition of the techniques have serious limitations in dealing with diffusion estimation is defined as follows. If μ ( x) such type of data[5]. From the analysis above, it has important scientific meaning and practical value to is defined on a Borel measurable function ll− develop new interpolation techniques to address the in (−∞+∞, ) , d > 0 is a constant, x = i , then issue of sparse observation data. Ellipse interpolation d algorithm model—a new interpolation technique for n ˆ 1 ⎡⎤ll− i dealing with sparse data based on the information fl()= ∑ μ ⎢⎥ (1) diffusion principle—was proposed in this paper. ndi i=1 ⎣⎦ d is the diffusion estimation of the matrix probability 2 THE PRINCIPLE OF INFORMATION density function f ()l where μ ()x is the diffusion DIFFUSION INTERPOLATION function and d is the window width. 2.1 The Principle of information diffusion 2.2 The idea of information diffusion The technique of information diffusion is a concept of research and a mathematic model put In this paper, the idea of information diffusion is forward for solving the imperfect information existing introduced into the fitting and interpolation of sparse in evaluating strong natural disasters, such as data and a new interpolation technique—algorithm for earthquake, storm surge, mudslides, and etc[6], which information diffusion and interpolation, which is are characterized by high degree of severity and small suitable for sparse data and small samples, is samples. Information diffusion is a fuzzy mathematic proposed. method by optimizing the fuzzy information of 2.2.1 FUZZY MAPPING RELATIONSHIP BETWEEN INPUT samples in order to make up for the missing AND OUTPUT information and can effectively deal with the imperfect sample information by transforming single For an input-output system, Ω is denoted as the samples into fuzzy-set samples in the form of matrix, x as the input variable, y as the output [7] probability . Now, the information diffusion variable, X as the input set, and Y as the output technique is only used for the risk assessment of the set, viz. x ∈ X ， y ∈Y ， Ω=X ×Y . small-sample events[8-10], and its idea is suitable for solving the imperfect data such as sparse data Let f (,xy ) be the probability density function interpolation. of matrix Ω , then the density of condition probability of y is described as follows with Suppose that WWW= { 12L Wn} is the x = u . knowledge sample series, L the underlying domain, fYX| (|)yu= fuy (,) fuvdv (,) . (2) and the observational value of Wi is li . Let ∫ vY∈ x =−φ ()lli , then if W is imperfect, there is a Based on the fuzzy-set idea, the Ω input-output system is defined to be an output fuzzy set B under function μ ()x which can make the li point % a given input, the membership function of fuzzy set information (value-1) diffuse into l with μ x . ( ) B% is corresponding to the probability density The diffused information distribution pattern function of output, and the normalized results of the nn probability density function are a membership Ql()==∑∑μμφ ( x )() ( l − li ) can better function, i.e., the membership function of fuzzy set jj==11 B% is described as follows, describe the whole structure of W , which is called the information diffusion principle[6]. fuy(, ) fuvdv (,) ∫ fuy(, ) μ ()y ==vY∈ . (3) B% ⎧⎫max{}f (u ,y ) max⎨⎬fuy ( , ) fuvdv ( , ) yY∈ yY∈ ∫ ⎩⎭vY∈ Therefore, in the input-output system, for any given input x ( x ∈ X ), its whole potential output 60 No.1 ZHANG Ren (张韧), HUANG Zhi-song (黄志松) et al. 61 can be denoted by the fuzzy set B% , which is the n Let qxy= μ , , marked as q , fuzzy mapping relationship between input and output. uvjk∑ uv jk() i i jk Generally, the probability density function i=1 st f (,xy ) of the matrix Ω is hardly acquired at hand tq= ∑∑ jk . By information diffusion, the but currently estimated by mass statistics of samples. jk==11 Under the condition of rare data information, it will estimation of probability density function at point still be given an approximated estimation of the (,)uv in the Ω input-output system can be distribution of whole probability density by the jk information diffusion based on a few sample data denoted as obtained. q fuˆ(,) v = jk . (4) jk t 2.2.2 ESTIMATION OF INFORMATION DIFFUSION FUNCTION S is a set of small sample series for a 2.2.3 MAPPING INTERPOLATION WITH INFORMATION DIFFUSION constructed Ω input/output system, denoted as Substitute the information diffusion estimation Sxyxyxy= {(11 , ),( 2 , 2 ),L (nn , )} . fˆ(,)uv of the probability density function into the For the lack of samples, the probability density jk function is often hardly constructed by current input-output mapping relationship formula (3). When common statistical analysis methods. the input is u , the output is as follows: In the principle of information diffusion, small fuyˆ(, ) samples series S can be regarded as the information B% ==μ ()yy y.(5) ∫∫B% ˆ points scattering in the input/output phase space yY∈∈ yYmaxfuy ( , ) yY∈ {} X ×Y . By point-set mapping, each sample data can be diffused

Interpolation Technique for Sparse Data Based on Information Diffusion Principle—Ellipse Model

The Romance of the Three Kingdoms Podcast. This Is Episode 78. Hey

2018 IEEE CSAA Guidance, Navigation and Control Conference

Immortality of the Spirit: Chinese Funerary Art from the Han and Tang Dynasties Exhibition Catalogue

Dynasty Warriors 4 TOTAL Guide

The Romance of the Three Kingdoms Podcast. This Is Episode 80. Last

Community Sanctions: a Comparative Study Between China and Europe

The Romance of the Three Kingdoms Podcast. This Is Episode 81. Last

RACR-07 September 25-26, 2007 China Executive Leadership Academy Pudong Shanghai, China

2019 IEEE Global Communications Conference (GLOBECOM 2019)

The Romance of the Three Kingdoms Podcast. This Is Episode 79. Last

Paradox of Legal Reform Under the Kuomintang Regime in Mainland

A Robotic Gripper Based on Advanced System Set-Up and Fuzzy Control Algorithm