Appendix A Smoothing Techniques for Imaging Problems

This appendix presents issues in nonparametric regression and image denoising that are relevant for the analyses of neuroimaging experiments presented in this book. It aims at a deeper understanding of the methods used in this book and does not aim for a comprehensive overview of available methods. For a broad overview of nonparametric regression and smoothing techniques, we refer, e.g., to the books of Fan and Gijbels (1996), Simonoff (1996), Wand and Jones (1995), Bowman and Azzalini (1997), Gu (2013), Mallat (2009), Nason (2008), and Katkovnik et al. (2006). A more detailed text on adaptive smoothing methods available in R is Polzehl et al. (2019).

A.1 Nonparametric Regression

Regression is commonly used to describe and analyze the relation between explanatory input variables $X \in \mathcal{X} \subset \mathbb{R}^d$ and responses $Y \in \mathcal{Y} \subset \mathbb{R}^q$. Usually, this relation is given in terms of the mean response as

\[
\mathrm{E}(Y \mid X = x) = f(x) \tag{A.1}
\]

or as a model with varying coefficients $Y \mid (X = x) \sim P_{f(x)}$, assuming that the conditional probability law $P$ of $Y$ depends on $x$ only through a function $f$.

Images naturally fit into this scenario, with $X$ classically being a two-dimensional grid and the response variable $Y$ denoting random intensity values at the grid locations. Moreover, in neuroimaging, see Chaps. 4–6, images are generally considered to be defined on a three-dimensional grid. In a more general setting, the response variable $Y$ can also be a vector of observations, see Chaps. 4 and 5 for examples. In all these cases, the design variable $X$ usually is fixed. We therefore denote the design points by $x_i$.

The probability law $P_{f(x)}$ generally depends on the physical process that generates the image. Examples of $P_{f(x)}$ considered in this book are Gaussian distributions with expectation $f(x)$ and known variance $\sigma^2$, or $\chi$-distributions with $f(x) = \mu(\zeta(x), \sigma, L)$ depending on an unknown noncentrality parameter $\zeta(x)$ and known scale $\sigma$ and degrees of freedom $L$. We will see later how the assumption of a known $\sigma$ can be relaxed. Regression analysis then infers on the parameter function $f(x)$ and its spatial structure.

Parametric regression assumes that the regression function $f = f(x, \theta)$ is known up to a parameter $\theta$, i.e., $f : \mathcal{X} \times \Theta \to \mathcal{Y}$, with some parameter space $\Theta \subset \mathbb{R}^p$ and $\mathcal{X} \subset \mathbb{R}^d$. In the simplest case, $f$ may be assumed linear, i.e., $f(x) = \theta_0 + \theta_1^T x$ with $\theta = (\theta_0, \theta_1) \in \mathbb{R}^{d+1}$, or quadratic, $f(x) = \theta_0 + \theta_1^T x + x^T \Theta_2 x$ with $\theta = (\theta_0, \theta_1, \Theta_2) \in \mathbb{R}^{d^2+d+1}$. The inference problem for the model (A.1) then simplifies to the estimation of a global parameter $\theta$ rather than a whole function $f$. For a given data sample $(x_i, Y_i)_{i=1,\dots,n}$, this can be done either by least squares

\[
\hat\theta = \operatorname*{argmin}_{\theta} \sum_{i=1}^n |Y_i - f(x_i, \theta)|^2 \tag{A.2}
\]

or by maximum likelihood

\[
\hat\theta = \operatorname*{argmax}_{\theta} \sum_{i=1}^n \log p_{\theta(x_i)}(Y_i), \tag{A.3}
\]

where $p_{\theta(x)}$ denotes the density associated with the probability law $P_{\theta(x)} = P_{f(x,\theta)}$. Both estimates coincide in the case of independent Gaussian response variables with homoskedastic errors. In many applications, like image processing, the assumption of a parametric regression function $f(x, \theta)$ is by far too restrictive.
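Before turning to more flexible models, a minimal sketch of how the global estimates (A.2) can be computed numerically in R may be helpful. The data below are simulated solely for this illustration and the linear model with $d = 1$ is an assumption of the sketch; for Gaussian homoskedastic errors, the numerical least squares solution coincides with the maximum-likelihood estimate (A.3) and with the closed-form solution coef(lm(y ~ x)).

set.seed(1)
x <- seq(0, 1, length = 100)                   # fixed design points x_i
y <- 1 + 2 * x + rnorm(100, sd = 0.2)          # simulated Gaussian responses Y_i
## residual sum of squares from (A.2) for f(x, theta) = theta[1] + theta[2] * x
rss <- function(theta) sum((y - theta[1] - theta[2] * x)^2)
## numerical minimizer of (A.2); Nelder-Mead via optim()
thetahat <- optim(c(0, 0), rss)$par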
A common way to increase the flexibility of parametric regression models is to assume that a parametric model $g(\xi - x, \theta(x))$, for some function $g$, provides a good local approximation of $f$ in a neighborhood $U(x) = \{\xi : \|\xi - x\| < h\}$ of $x \in \mathcal{X}$. The parameter $h > 0$ is usually referred to as the bandwidth and obviously controls the size of the neighborhood $U(x)$. The regression function $f(x)$ is then characterized by the parameter function $\theta(x)$ and the function $g$.

To estimate $\theta(x)$ at location $x$, we define a local model by assigning a weight $w_i(x)$ with $0 \le w_i(x) \le 1$ to each pair of observations $(x_i, Y_i)$. We call $W(x) = (w_1(x), \dots, w_n(x))$ a weighting scheme. Local estimates of $\theta(x)$ are then obtained by weighted least squares

\[
\hat\theta(x) = \operatorname*{argmin}_{\theta} R\bigl((Y_i)_{i=1}^n, W(x); \theta\bigr) = \operatorname*{argmin}_{\theta} \sum_{i=1}^n w_i(x) \, |Y_i - g(x_i - x, \theta)|^2
\]

or weighted maximum likelihood

\[
\hat\theta(x) = \operatorname*{argmax}_{\theta} l\bigl((Y_i)_{i=1}^n, W(x); \theta\bigr) = \operatorname*{argmax}_{\theta} \sum_{i=1}^n w_i(x) \log p_{g(x_i - x, \theta)}(Y_i),
\]

providing local estimates of the regression function as $\hat f(x) = g(0, \hat\theta(x))$. The weighting scheme $W(x)$ determines the influence of the observation pairs on the estimate. To determine a local parametric estimate, we thus need to specify the local parametric function $g(\cdot, \theta)$ and a method for generating the local weighting scheme $W(x)$.

A.1.1 Kernel Smoothing

The Nadaraya–Watson kernel estimator (Nadaraya 1964; Watson 1964) is a special local parametric regression method obtained from the general framework above by specifying $g(\cdot, \theta) \equiv \theta$, i.e., as a constant function, and defining weights $w_i(x) = K(\delta(x_i, x)/h)$. Here, $K$ is a monotone nonincreasing function $K : \mathbb{R}^+ \to [0, 1]$ with $K(0) = 1$, called the kernel.¹ $h > 0$ denotes a bandwidth, and $\delta : \mathcal{X} \times \mathcal{X} \to \mathbb{R}^+$ defines a distance on $\mathcal{X}$. Common choices for the kernel $K$ include

\[
K(t) = \bigl((1 - t^2)_+\bigr)^{\gamma}, \quad \gamma \in \{0, 1, 2, 3\},
\]

corresponding to rescaled versions of the uniform ($\gamma = 0$), Epanechnikov ($\gamma = 1$), bi-weight ($\gamma = 2$), or tri-weight ($\gamma = 3$) kernel. Another popular choice is the Gaussian kernel

\[
K(t) = \frac{1}{\sqrt{2\pi}} \, e^{-t^2/2} = \frac{1}{\sqrt{2\pi}} \lim_{\gamma \to \infty} \Bigl(1 - \frac{t^2}{2\gamma}\Bigr)^{\gamma}.
\]

¹ This definition slightly differs from the standard assumptions, i.e., $K : \mathbb{R}^+ \to \mathbb{R}^+$, $K$ monotone nonincreasing, and $\int_0^\infty K(t)\, dt = 1$, but only by a scale factor. The requirements on the kernel can be relaxed, changing the properties of the resulting estimates, see, e.g., Fan and Gijbels (1996).

Using weighted least squares, the kernel estimate at design point $x$ is then explicitly given as

\[
\hat f(x) = \frac{\sum_i K(\delta(x_i, x)/h) \, Y_i}{\sum_i K(\delta(x_i, x)/h)} = \frac{\sum_i w_i(x) \, Y_i}{\sum_i w_i(x)}. \tag{A.4}
\]

Equation (A.4) also provides the weighted likelihood estimate for one-parameter exponential families parametrized such that $\mathrm{E} Y_i = \theta(x_i)$. The regression function can be estimated at any point $x$ with $w_i(x) > 0$ for at least one $i$, i.e., not only at points $x_i$ with observations $Y_i$.

Kernel estimates can, in the case of $\delta(x_i, x) = \tilde\delta(\|x_i - x\|)$, be efficiently computed as convolutions of the image data with the kernel function using the fast Fourier transform. This approach is implemented for 1D, 2D, and 3D images in the function kernsm from the package aws.
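For intuition, Equation (A.4) can also be implemented directly. The following minimal sketch computes a one-dimensional Nadaraya–Watson estimate with the Epanechnikov kernel and Euclidean distance; the function name, data, and bandwidth are hypothetical and serve only to illustrate the formula, not the FFT-based implementation used by kernsm.

## direct evaluation of the Nadaraya-Watson estimate (A.4) in 1D
nwkern <- function(x, y, xeval, h) {
  K <- function(t) pmax(1 - t^2, 0)        # Epanechnikov kernel (gamma = 1)
  sapply(xeval, function(x0) {
    w <- K(abs(x - x0) / h)                # weights w_i(x0) = K(delta(x_i, x0)/h)
    sum(w * y) / sum(w)                    # weighted mean, Equation (A.4)
  })
}
## hypothetical test data
x <- seq(0, 1, length = 200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
fhat <- nwkern(x, y, xeval = x, h = 0.05)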
We use the first T1-weighted image from the quantitative MRI experiment considered in Chap. 6 to illustrate the properties of kernel smoothers. The 3D image is read using

library(oro.nifti)
t1Name <- file.path("data", "MPM", "t1w_mfc_3dflash_v1i_R4_0015",
                    "anon_s2018-02-28_18-26-190921-00001-00224-1.nii")
T1 <- as.array(readNIfTI(t1Name, reorient = FALSE))

We now calculate a kernel estimate of the intensity function.

library(aws)
T1sm <- kernsm(T1, h = 2, kern = "Epanechnikov")@yhat

For visualization of the result, we use the function rimage from the package adimpro, with the function rimage.options used to modify the default behavior.

library(adimpro)
rimage.options(ylab = "z", zquantile = c(0.001, 0.999))

A comparison between the original T1 image and its smoothed version is provided in Fig. A.1.

rimage(T1[, 160, ], main = "Original", zlim = c(0, 2600))
rimage(T1sm[, 160, ], main = "Smoothed image", zlim = c(0, 2600))

Even with a small bandwidth, 3D kernel smoothing leads to significant noise reduction. Unfortunately, this comes at the cost of severe blurring of tissue borders and, consequently, a loss of spatial resolution.

A.2 Adaptive Weights Smoothing

In this section, we introduce a class of smoothing procedures that do not require global smoothness assumptions. Instead, they rely on a concept of local homogeneity: the design space is assumed to consist of regions where the regression function is smooth or even approximately constant, and discontinuities show up as edges of homogeneous regions. Since these procedures focus on recovering the homogeneity structure, they are edge preserving.

The package aws (Polzehl 2019) implements four main classes of edge-preserving smoothing procedures: the propagation-separation approach (Polzehl and Spokoiny 2000, 2006), pointwise adaptive estimates as presented in Katkovnik et al. (2006), the nonlocal means filter (Buades et al. 2005), and total variation (TV) or total generalized variation (TGV) regularization (Rudin et al. 1992; Bredies et al. 2010). Here, we concentrate on the first class. For a comprehensive coverage of the package capabilities, we refer to Polzehl et al. (2019).

A.2.1 Local Constant Likelihood Models

The nonparametric regression method described in Sect. A.1.1 is designed for the estimation of a smooth regression function $f$. The assumption that $f$ is twice differentiable is questionable in many applications. The relationship between explanatory variables and the response may exhibit abrupt changes at some design points or contours.
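To give a first impression of the adaptive procedures discussed here, a minimal sketch applies the propagation-separation approach to the T1 image from Sect. A.1.1 via the function aws from the package aws. The choice hmax = 4 for the maximal bandwidth and the slot theta holding the adaptive estimates are assumptions of this sketch, not code from this appendix.

library(aws)
## propagation-separation (PS) smoothing of the T1 image from above;
## hmax sets the maximal bandwidth of the iterative procedure, and the
## slot theta is assumed to hold the adaptive estimates
T1ps <- aws(T1, hmax = 4)@theta
rimage(T1ps[, 160, ], main = "PS smoothed", zlim = c(0, 2600))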
