
Data Integration with Adjustment Techniques

Dr.-Ing. Frank Gielsdorf, University of Technology Berlin, Department for Geodesy and Adjustment Techniques, Email: [email protected], Web: www.survey.tu-berlin.de


Contents

2 Concepts and strategies for PAI
2.4 Data Integration with Adjustment Techniques
2.4.1 Introduction
2.4.2 Estimation of Parameters (Least Squares Method)
2.4.2.1 Arithmetic Mean
2.4.2.2 Weighted Mean
2.4.2.3 Adjustment with Several Unknown Parameters
2.4.3 The Law of Error Propagation
2.4.3.1 Error Propagation for Linear Functions
2.4.3.2 The Importance of Covariances
2.4.3.3 Adjustment and Error Propagation
2.4.4 PAI as Adjustment Problem
2.4.4.1 Improving Absolute Geometry
2.4.4.2 Improving Relative Geometry
2.4.5 Appendix: Information on Matrix Operations in MS Excel


2 Concepts and strategies for PAI

2.4 Data Integration with Adjustment Techniques

2.4.1 Introduction

The geometrical properties of objects in GIS are almost exclusively described by point coordinates referring to a global reference frame. But it is impossible to measure these point coordinates directly; they are the result of a calculation process. The input parameters of these calculations are measured values. Even GPS receivers do not measure coordinates directly but calculate them from distance measurements to satellites. There exist several types of measured values, for instance distances, directions, local coordinates from maps or orthophotos etc. Mostly, single measured values are grouped into sets of local coordinates. So the measured values of a total station (horizontal direction, vertical direction and distance) can be seen as a set of spherical coordinates, row and column of digitized pixels as Cartesian coordinates in a local reference frame, etc. The final aim of most evaluation processes in surveying is the determination of point coordinates in a global reference frame. But measured values have two essential properties:

1. They are random variables, because it is impossible to measure a value with arbitrary accuracy; any measured value therefore contains some uncertainty.
2. They are redundant. Commonly there exist more measured values than necessary to calculate unique point coordinates.

A function of random variables is again a random variable. Because point coordinates are functions of measured values, they are random variables as well. The uncertainties contained in measured values necessarily lead to uncertainties in point coordinates. For the unique determination of a number of coordinates, exactly the same number of measured values is necessary. As an example consider Figure 1.

Figure 1: Unique Arc Section (new point N, control points A and B, measured distances d1 and d2)

The positions of the control points A and B are assumed to be known. The distances d1 and d2 from the control points to the new point N were measured. The position of the new point can be calculated by intersection of arcs. The two unknown coordinate values of the new point, xN and yN, can be determined uniquely with the two measured values d1 and d2.

But what happens if we measure a third distance d3 from a control point C to N?

Figure 2: Ambiguous Arc Section (new point N, control points A, B and C, measured distances d1, d2 and d3)

A geometrical construction with a pair of compasses would yield a small triangle. The size of this triangle depends on the magnitude of the uncertainties in the measured values. For the calculation this means that its result depends on which measured values are used: a calculation with the distances d1 and d2 yields a different result than a calculation with the distances d2 and d3. The example raises several questions. How can one get a unique result from redundant measured values containing uncertainties? How can one quantify the accuracy of the measured values on the one hand and of the result on the other hand? Answering these questions is the objective of adjustment theory.

Objectives of Adjustment Theory:
1. Determination of optimal and unique output parameters (coordinates) from input parameters (measured values), which are redundant random variables, under consideration of their accuracy.
2. Estimation of accuracy values for the output parameters.
3. Detection and localisation of blunders.

But where is the relevance of adjustment techniques for positional accuracy improvement in GIS? The coordinates in GIS result from the evaluation of measured values. In a first step these measured values often were local coordinates of digitized analogue maps, which were transformed into a global reference frame. The global coordinates determined in this way describe the geometry of the GIS objects uniquely, whereby the coordinates have to be regarded as random variables. During the process of PAI, new measured values (even global coordinates can be seen as special measured values) with higher accuracy are introduced. The new measured values are redundant to the already existing coordinates. Therefore, the determination of new global coordinates with improved positional accuracy is a typical adjustment problem. But before the application of adjustment techniques for PAI is presented in detail, it is necessary to consider the basics of adjustment theory.

2.4.2 Estimation of parameters (Least Squares Method)

This section describes the process of parameter estimation on the basis of the least squares method. Figure 3 shows a symbolic representation of an adjustment process.

Figure 3: Symbolic Representation of an Adjustment (observations l1…ln as input; unknown parameters x1…xu and residual errors v1…vn as output)

Input parameters are measured values called observations. Output parameters are the so-called unknown parameters (mostly coordinate values) and the residual errors of the observations. The number of observations is denoted by n, that of the unknown parameters by u. Typical for an adjustment problem is the fact that the number of observations is greater than the number of unknown parameters. The difference between both values equals the number of supernumerary observations and is called the redundancy r.

r = n-u

Observations are denoted by the symbol l, unknown parameters by the symbol x and residual errors by the symbol v. To be able to formulate adjustment problems in a clear way it is necessary to use vector/matrix notation. The observations, unknown parameters and residual errors can then be grouped in vectors.

l = (l_1, l_2, …, l_n)^T,   x = (x_1, x_2, …, x_u)^T,   v = (v_1, v_2, …, v_n)^T

In this course we only regard the adjustment with observation equations. The basis of this model is the representation of the observations as explicit functions of the unknown parameters. The observation equations in matrix notation are

l + v = A x

or transformed

v = A x - l.    (1)

The matrix A in this observation equation system is called configuration matrix or design matrix. It describes the linear dependency between observations and unknown parameters. The observation equation system has an infinite number of solutions. To gain the optimal solution it is necessary to formulate an additional constraint. This additional constraint is the demand that the square sum of the residual errors should be minimal. In matrix notation it is

v^T v → min.

With this constraint it is possible to formulate an extreme value problem and to derive an optimal and consequently unique solution for the unknown parameters:

x = (A^T A)^{-1} A^T l    (2)

The equation system (2) is called the normal equation system. After solving the normal equations, the residual errors can be calculated by inserting x into the observation equations (1).
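As a minimal numerical sketch (Python/NumPy rather than the MS Excel used in the exercises; the observation values and matrix are invented for illustration), the normal equation solution (2) and the residuals (1) can be computed as follows:

import numpy as np

# Hypothetical example: three observations, two unknown parameters.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
l = np.array([2.01, 2.99, 5.02])

N = A.T @ A                       # normal equation matrix N = A^T A
x = np.linalg.solve(N, A.T @ l)   # x = (A^T A)^-1 A^T l, equation (2)
v = A @ x - l                     # residual errors, equation (1)
print(x, v)                       # v.T @ v is minimal for this x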

2.4.2.1 Arithmetic Mean

The adjustment theory is very complex and it is impossible to impart its whole content during this course. Therefore we will begin with very simple special examples and try to generalise them step by step. The simplest case of an adjustment is the calculation of an arithmetic mean value: We consider two points A and B. The distance between these two points was measured 10 times. The results of the measurement are 10 observation values which we call l1…l10. Required is the unknown distance between A and B, which we call x. Now, it would be possible to solve the problem by formulating the equation

x = l_3;

all the other observations would be redundant in that case. For each observation we could calculate a residual error v_i.

v_1 = x - l_1
v_2 = x - l_2
⋮                (3)
v_10 = x - l_10

The residual error v_3 would have the value 0, whilst the other residuals would have values not equal to 0. The value of x depends in that case on which observation was used for its determination. But this result is not optimal. The demand for an optimal result can be formulated as follows: Find exactly that solution for x for which the sum of the squares of the residual errors v_i becomes minimal! The first step of calculating an optimal x is the formulation of the observation equations.

v = A x - l

In our special case the configuration matrix A has a very simple structure with just one column filled with ones and the vector x has just one row.

A = (1, 1, …, 1)^T,   x = (x)

With A and l we are able to calculate x.

x = (A^T A)^{-1} A^T l

We now want to apply this formula to our special problem of the arithmetic mean. The matrix product A^T A equals 10.

A^T A = (1 1 ⋯ 1) · (1, 1, …, 1)^T = 10

The product A^T A is called the normal equation matrix and is denoted by the symbol N. We have the special case of an N-matrix with one row and one column, which is equivalent to a scalar. The inverse (A^T A)^{-1} of A^T A is therefore equal to the reciprocal of the scalar 10.

(A^T A)^{-1} = 1/10

The expression (A^T A)^{-1}, respectively N^{-1}, is the cofactor matrix of the unknown parameters and is denoted by the symbol Qxx. The expression A^T l results in the sum of the observations

A^T l = (1 1 ⋯ 1) · (l_1, l_2, …, l_10)^T = l_1 + l_2 + … + l_10 = Σ l_i

so that x results in

x = (1/10) · (l_1 + l_2 + … + l_10).
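This can also be checked numerically; the following NumPy sketch (with invented observation values) shows that the least squares solution with A being a column of ones equals the arithmetic mean:

import numpy as np

# Ten repeated distance observations (hypothetical values).
l = np.array([52.43, 52.45, 52.44, 52.46, 52.44,
              52.43, 52.45, 52.44, 52.45, 52.44])
A = np.ones((10, 1))                   # configuration matrix: one column of ones

x = np.linalg.solve(A.T @ A, A.T @ l)  # least squares solution
print(x[0], l.mean())                  # both values are identical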

As we can see, the least squares solution is identical to the arithmetic mean.

Exercise 1
- Name the given array of observations l.
- Enter the array with the configuration matrix A and name it A.

- Calculate the normal equation matrix N = A^T A with =mmult(mtrans(A);A) and name the resulting array N.
- Calculate the cofactor matrix of the parameters by inverting the normal equation matrix, Qxx = N^{-1}, with =minv(N) and name the resulting array Qxx.
- Calculate the matrix product A^T l with =mmult(mtrans(A);l) and name the resulting array Atl.
- Calculate the parameter vector x = Qxx A^T l with =mmult(Qxx;Atl) and name the resulting array (one cell) x.
- Calculate the residual vector v = Ax - l with =mmult(A;x)-l.

2.4.2.2 Weighted Mean

In the arithmetic mean example we assumed that all observations have the same accuracy. In reality this assumption is mostly not applicable. In the general case the incoming observations have different accuracies, e.g. because of different measuring devices. For that case it is necessary to modify the adjustment model of the preceding section. We now have to introduce a weight for each observation. The observation weight steers the influence of the observation on the resulting unknown parameters: the higher the weight, the higher the influence of the corresponding observation. The weight of an observation is a function of its standard deviation. The standard deviation quantifies the accuracy of a value. The standard deviations of the observations are mostly known before the adjustment. Usually a standard deviation is denoted by the symbol σ. The square of a standard deviation, σ², is called variance. The meaning of a standard deviation will be explained in the following sections. The formula for the weight p_i of an observation is

p_i = σ_0² / σ_i².

In this formula:

p_i – weight of observation i
σ_i² – variance of observation i
σ_0² – variance of unit weight

The variance of unit weight is a freely selectable constant. Commonly the value of σ_0 is chosen such that the weights get values close to 1. If one introduces weighted observations in an adjustment, the least squares constraint changes from

v^T v → min

to

v^T P v → min.    (4)

In expression (4), P is the weight matrix, which is a diagonal matrix of dimension n × n with the observation weights on the principal diagonal.

P = diag(p_1, p_2, …, p_n)

The formula for the calculation of the unknown parameters x changes to

x = (A^T P A)^{-1} A^T P l

and the formula for the cofactor matrix of the unknown parameters Qxx to

Qxx = (A^T P A)^{-1}.
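The complete weighted adjustment can be wrapped into a few lines of code; the following NumPy sketch is illustrative only (the function name and interface are not part of the course material), assuming sigma holds the observation standard deviations and sigma0 the freely chosen standard deviation of unit weight:

import numpy as np

def weighted_adjustment(A, l, sigma, sigma0=1.0):
    """Sketch of a weighted least squares adjustment."""
    P = np.diag((sigma0 / sigma) ** 2)   # weights p_i = sigma0^2 / sigma_i^2
    N = A.T @ P @ A                      # normal equation matrix
    Qxx = np.linalg.inv(N)               # cofactor matrix of the unknowns
    x = Qxx @ (A.T @ P @ l)              # x = (A^T P A)^-1 A^T P l
    v = A @ x - l                        # residual errors
    return x, v, Qxx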

Exercise 2

- Define an arbitrary value for the standard deviation of unit weight σ_0.
- Calculate the weight of each observation on the basis of its standard deviation: p_i = σ_0² / σ_i².
- Fill an array of size 10 × 10 with zeros. Enter the weights on the principal diagonal of the array. Name the array P.
- Calculate the normal equation matrix N = A^T P A.

- Calculate the cofactor matrix of the unknown parameters Qxx = (A^T P A)^{-1}.
- Calculate the vector A^T P l.
- Calculate the parameter vector (one cell) x = Qxx A^T P l.
- Calculate the residual vector v = Ax - l.

2.4.2.3 Adjustment with Several Unknown Parameters

The arithmetic mean represents a special case of adjustment with just one unknown parameter. In the general case the number of unknown parameters is greater than one. This case shall be explained with a simple example again.

Figure 4: Points in a One Dimensional Coordinate System (control points A and B, new points 1…9, measured distances d1…d10)

Figure 4 shows points in a one dimensional coordinate system. The coordinates x_A and x_B of the control points A and B are known. The distances d_1…d_10 between the points were measured. Required are the coordinates x_1…x_9 of the new points. The standard deviations of the distances are proportional to the corresponding distance:

σ_i = 0.01 · d_i    (5)

In a first step we have to formulate the observation equations:

d_1 + v_1 = 1·x_1 - x_A
d_2 + v_2 = -1·x_1 + 1·x_2
d_3 + v_3 = -1·x_2 + 1·x_3
d_4 + v_4 = -1·x_3 + 1·x_4
d_5 + v_5 = -1·x_4 + 1·x_5
d_6 + v_6 = -1·x_5 + 1·x_6
d_7 + v_7 = -1·x_6 + 1·x_7
d_8 + v_8 = -1·x_7 + 1·x_8
d_9 + v_9 = -1·x_8 + 1·x_9
d_10 + v_10 = -1·x_9 + x_B

If we shift the observations to the right side of the equation system we get:

v_1 = 1·x_1 - x_A - d_1
v_2 = -1·x_1 + 1·x_2 - d_2
v_3 = -1·x_2 + 1·x_3 - d_3
v_4 = -1·x_3 + 1·x_4 - d_4
v_5 = -1·x_4 + 1·x_5 - d_5
v_6 = -1·x_5 + 1·x_6 - d_6
v_7 = -1·x_6 + 1·x_7 - d_7
v_8 = -1·x_7 + 1·x_8 - d_8
v_9 = -1·x_8 + 1·x_9 - d_9
v_10 = -1·x_9 + x_B - d_10

From this equation system we can directly derive the structure of our matrices:

A = \begin{pmatrix}
 1 &    &    &    &    &    &    &    &    \\
-1 &  1 &    &    &    &    &    &    &    \\
   & -1 &  1 &    &    &    &    &    &    \\
   &    & -1 &  1 &    &    &    &    &    \\
   &    &    & -1 &  1 &    &    &    &    \\
   &    &    &    & -1 &  1 &    &    &    \\
   &    &    &    &    & -1 &  1 &    &    \\
   &    &    &    &    &    & -1 &  1 &    \\
   &    &    &    &    &    &    & -1 &  1 \\
   &    &    &    &    &    &    &    & -1
\end{pmatrix}, \quad
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_9 \end{pmatrix}, \quad
l = \begin{pmatrix} d_1 + x_A \\ d_2 \\ d_3 \\ \vdots \\ d_9 \\ d_{10} - x_B \end{pmatrix}

For the calculation of the parameter vector x, the weight matrix P is still needed. The weights are calculated from the standard deviations given by formula (5).

P = diag(p_1, p_2, …, p_10)

Now the parameter vector and the residual vector can be calculated:

x = (A^T P A)^{-1} A^T P l,    v = A x - l
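The matrices of exercise 3 can also be set up programmatically. The following NumPy sketch uses invented distance values and an assumed σ_0 = 0.1 m (both are not given in the text):

import numpy as np

xA, xB = 0.0, 100.0
d = np.array([10.02, 9.98, 10.01, 10.03, 9.97,
              10.00, 10.02, 9.99, 10.01, 9.96])   # invented observations

A = np.zeros((10, 9))
for i in range(10):
    if i < 9:
        A[i, i] = 1.0        # coefficient +1 of x_{i+1}
    if i > 0:
        A[i, i - 1] = -1.0   # coefficient -1 of x_i
l = d.copy()
l[0] = d[0] + xA             # l_1  = d_1  + x_A
l[9] = d[9] - xB             # l_10 = d_10 - x_B

sigma = 0.01 * d             # standard deviations from formula (5)
sigma0 = 0.1                 # chosen so that the weights are close to 1
P = np.diag((sigma0 / sigma) ** 2)

x = np.linalg.solve(A.T @ P @ A, A.T @ P @ l)   # adjusted coordinates x_1..x_9
v = A @ x - l                                   # residual errors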

Exercise 3
- Calculate the weight of each observation on the basis of its standard deviation.
- Build the configuration matrix A.
- Build the weight matrix P.
- Calculate the parameter vector x.
- Calculate the residual vector v.

2.4.3 The Law of Error Propagation

As already mentioned above, adjustment techniques have two main objectives. One task is the estimation of optimal parameter values; the second task is the estimation of the parameter accuracy. The basis for that second task is the law of error propagation. In mathematical statistics as well as in adjustment theory, the measure for the scattering of a random value is its standard deviation. The meaning of this value shall be explained with an example: If we measure a distance d with an observation value of 123.45 m, then this value has to be seen as a random sample of an infinite population of observations. The average value of this infinite quantity of observations is the so-called true value. A standard deviation of σ_d = ±2 cm means that the true but unknown value of the distance d lies with a probability of about 67% in the interval (123.45 m - 2 cm)…(123.45 m + 2 cm). The standard deviation is a measure of the average deviation of the observed value from the theoretical true value. Usually the standard deviation is denoted by σ. The square of the standard deviation is called variance and denoted by σ². The standard deviation is not identical to the maximal deviation from the theoretical true value, but there exists a rule of thumb for the ratio of both values:

maximal deviation ≈ 3 · standard deviation

The input parameters of an adjustment are observations which are, in the sense of mathematical statistics, random samples of a population. The true value λ_i of an observation l_i

is unknown and exists only in theory, but its standard deviation σ_i is mostly known. The output parameters of an adjustment are the parameters x_j and the residual errors v_i. Parameters as well as residual errors are functions of the observations and therefore also random variables. The law of error propagation describes the propagation of accuracies for linear functions of random variables. Applying this law to an adjustment, it is possible to calculate the standard deviations of the unknown parameters and those of the residual errors.

2.4.3.1 Error Propagation for Linear Functions If we have a linear function of random variables with the structure

x = f_1·l_1 + f_2·l_2 + … + f_n·l_n

and the standard deviations σ_1…σ_n of the random variables l_1…l_n are known, then the standard deviation of the parameter x can be calculated by the formula

σ_x² = f_1²·σ_1² + f_2²·σ_2² + … + f_n²·σ_n²    (6)

The simplest case of a linear function is a sum. For a sum of random variables

x = Σ_{i=1..n} l_i = l_1 + l_2 + … + l_n

the formula for the calculation of its standard deviation is

σ_x² = Σ_{i=1..n} σ_i² = σ_1² + σ_2² + … + σ_n²   ⇒   σ_x = √(σ_1² + σ_2² + … + σ_n²)

In case of a difference the application of (6) yields

x = l_1 - l_2   ⇒   σ_x² = σ_1² + σ_2²    (7)

Consider Figure 5 as an example.

Figure 5: Addition of Distances (control point A, new points 1…9, distances d1…d9)

We see our one dimensional coordinate system of the previous section. The control point A is fixed and we want to calculate the coordinates of the new points by adding the distances.

x_1 = x_A + d_1
x_2 = x_A + d_1 + d_2
⋮
x_9 = x_A + d_1 + d_2 + … + d_9

We now apply the law of error propagation to these functions and get:

σ_x1² = σ_d1²
σ_x2² = σ_d1² + σ_d2²
⋮
σ_x9² = σ_d1² + σ_d2² + … + σ_d9²
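A short sketch of this accumulation (assuming, for illustration, nine distances of 10 m each, so that σ_di = 0.1 m according to formula (5)):

import numpy as np

sigma_d = 0.01 * np.full(9, 10.0)    # sigma_di = 0.01 * d_i = 0.1 m (assumed distances)

var_x = np.cumsum(sigma_d ** 2)      # sigma_xi^2 = sigma_d1^2 + ... + sigma_di^2
sigma_x = np.sqrt(var_x)
print(sigma_x)                       # grows with the square root of the number of terms

# Naive difference x8 - x7 via formula (7), i.e. ignoring any covariance:
print(np.sqrt(sigma_x[7] ** 2 + sigma_x[6] ** 2))   # about 0.39 m, much larger than sigma_d8 = 0.1 m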

Exercise 4

- Calculate the standard deviations σ_x1 … σ_x9.

- Calculate the standard deviation for the coordinate difference x8-x7.

- Compare the standard deviation of the coordinate difference x8 - x7 with that of the distance d8.

2.4.3.2 The Importance of Covariances

In exercise 4 we calculated the standard deviation of the difference x8-x7 and compared it with the standard deviation of the distance d8. As we could see the values are not identical. Why?

The answer is that we neglected the covariance between the random values x7 and x8. But before we explain calculations with covariances, we want to interpret the standard deviations of the distances d_i and those of the coordinates x_i. The standard deviations of the coordinates σ_xi represent the absolute accuracy of the coordinates in relation to the reference frame. On the other hand, the standard deviations of the distances σ_di represent the relative accuracy of the coordinates related to each other. But what is the meaning of the covariance? If two calculated random values are functions of partially the same random arguments, they are stochastically dependent. The degree of their stochastical dependency is quantified by their covariance.

The parameters x7 and x8 are stochastically dependent because they are functions of partially the same random variables. The distances d1…d7 are arguments of both functions; just d8 is an argument of only one of the two functions. But how can we account for these dependencies in the error propagation? This problem can be solved by a generalisation of the law of error propagation. The general form of the law of error propagation can be represented in matrix notation. If there is a system of linear equations describing the functional dependency of the parameters x_j on the arguments l_i

x = F · l    (8)

and the standard deviations of the arguments l_i are known, then the variances and covariances of the parameters x_j can be calculated by the formula

C_xx = F · C_ll · F^T.

In this formula the functional matrix F contains the coefficients of the linear functions. The matrix C_ll is called the covariance matrix of the observations; it contains the variances of the observations on its principal diagonal and their covariances in its off-diagonal elements. In the most common case of stochastically independent observations, C_ll is a diagonal matrix. C_xx is the covariance matrix of the unknown parameters and contains their variances and covariances.

C_ll = diag(σ_l1², σ_l2², …, σ_ln²), \quad
C_xx = \begin{pmatrix}
σ_{x_1}^2 & cov(x_1,x_2) & \cdots & cov(x_1,x_u) \\
cov(x_1,x_2) & σ_{x_2}^2 & & \vdots \\
\vdots & & \ddots & \\
cov(x_1,x_u) & \cdots & & σ_{x_u}^2
\end{pmatrix}

Covariance matrices are always square and symmetric. Now we want to apply the general law of error propagation to our example. At first we have to build the functional matrix F. Because the functions are simple sums, F contains just ones and zeros.

F = \begin{pmatrix}
1 & & & \\
1 & 1 & & \\
\vdots & \vdots & \ddots & \\
1 & 1 & \cdots & 1
\end{pmatrix} (a 9 × 9 lower triangular matrix of ones), \quad
C_ll = diag(σ_d1², σ_d2², …, σ_d9²)

As the result of the calculation we get the covariance matrix C_xx, which is fully populated.

Now, we can use elements of Cxx for a further calculation to get the standard deviation of the coordinate difference x8-x7. The functional matrix F for that case is simple again:

F = (-1  1)

The Cll matrix contains the variances of x7 and x8 as well as their covariance from the previous calculation:

C_ll = \begin{pmatrix} σ_{x_7}^2 & cov(x_7,x_8) \\ cov(x_7,x_8) & σ_{x_8}^2 \end{pmatrix}

If we solve the matrix equation for the general law of error propagation in a symbolic way then we get the expression

σ_{x8-x7}² = σ_x7² + σ_x8² - 2·cov(x_7, x_8).

As we can see, this formula, which does not neglect the covariance between dependent random variables, yields the correct result.
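The same calculation as a NumPy sketch (again assuming nine distances of 10 m, i.e. σ_di = 0.1 m):

import numpy as np

sigma_d = np.full(9, 0.1)              # assumed distance standard deviations
Cll = np.diag(sigma_d ** 2)            # independent observations: diagonal C_ll
F = np.tril(np.ones((9, 9)))           # x_i = d_1 + ... + d_i: lower triangle of ones

Cxx = F @ Cll @ F.T                    # fully populated covariance matrix of x_1..x_9

# Difference x8 - x7 with the covariance taken into account:
Fd = np.array([[-1.0, 1.0]])
var_diff = Fd @ Cxx[6:8, 6:8] @ Fd.T   # sigma_x7^2 + sigma_x8^2 - 2 cov(x7, x8)
print(np.sqrt(var_diff[0, 0]))         # 0.1 m, identical to sigma_d8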

Exercise 5

- Build the functional matrix F for the coordinates x1…x9 as functions of the distances d1…d9.

- Build the covariance matrix Cll of the observations d1…d9.

- Calculate the covariance matrix Cxx of the coordinates x1…x9.

- Calculate the standard deviations σx1...σx9 of the coordinates x1…x9.

- Build the functional matrix Fd for the coordinate difference x8 - x7.

- Calculate the variance and the standard deviation of the coordinate difference x8 - x7.

- Compare the calculated value with σ_d8.

2.4.3.3 Adjustment and Error Propagation

If we consider the calculation formula for the unknown parameters in an adjustment, then we can see that the parameters are linear functions of the observations.

x = \underbrace{(A^T P A)^{-1} A^T P}_{F} · l

The expression (A^T P A)^{-1} A^T P in this formula is equivalent to the functional matrix F in formula (8). This allows for the application of the law of error propagation to calculate the covariance matrix of the unknown parameters. The basis of further calculations is the empirical standard deviation of unit weight. The formula for its calculation is

s_0 = √(v^T P v / r),

where v denotes the residual vector and r the redundancy. The redundancy is the number of supernumerary observations; it equals the difference between the number of observations n and the number of unknown parameters u:

r = n - u

The value s_0 can be interpreted as the empirical standard deviation of an observation with the weight p = 1. A standard deviation σ should not be confused with an empirical standard deviation s. A standard deviation is a constant value and an input parameter of an adjustment calculation, whereas an empirical standard deviation is a random variable which is estimated as an output parameter of an adjustment. After an adjustment calculation the empirical standard deviation s_0 should be approximately equal to the standard deviation σ_0, which means that the quotient s_0/σ_0 should lie in the interval 0.7…1.3. If s_0 is too small, the a priori assumption of the observation accuracy (σ_i) was too pessimistic. A too large s_0 often indicates that there is a blunder among the observations. The complete derivation of the formula for the covariance matrix of the unknown parameters is too complex to be presented in this course; therefore, the formula is just given.

C_xx = s_0² · Q_xx = s_0² · (A^T P A)^{-1}
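Based on the quantities of a weighted adjustment, these accuracy estimates can be computed as in the following sketch (the function name and interface are illustrative only, not part of the course material):

import numpy as np

def accuracy_estimates(A, P, v):
    """Sketch: empirical standard deviation of unit weight and covariance
    matrix of the unknowns from the results of a weighted adjustment."""
    n, u = A.shape
    r = n - u                              # redundancy
    s0_sq = float(v @ P @ v) / r           # empirical variance of unit weight
    Qxx = np.linalg.inv(A.T @ P @ A)       # cofactor matrix of the unknowns
    Cxx = s0_sq * Qxx                      # covariance matrix C_xx = s0^2 * Q_xx
    s_x = np.sqrt(np.diag(Cxx))            # empirical standard deviations of x
    return np.sqrt(s0_sq), Cxx, s_x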

Exercise 6
This exercise is based on the results of the adjustment of exercise 3.
- Calculate the product v^T P v with =mmult(mtrans(v);mmult(P;v)).

- Calculate the empirical variance of unit weight s_0² = v^T P v / r.

- Calculate the empirical standard deviation of unit weight s_0 = √(s_0²) = √(v^T P v / r).
- Calculate the covariance matrix of the parameters C_xx = s_0² · Q_xx.

- Calculate the empirical standard deviations of the coordinates x7 and x8

- Calculate the empirical standard deviation of the coordinate difference x8-x7

2.4.4 PAI as Adjustment Problem

In this section we want to give a typical example of a PAI adjustment problem. At first we consider the workflow from map digitization to improved global coordinates under adjustment aspects. Then we simplify the example to a one dimensional problem and reproduce the process with a concrete calculation.

Figure 6: PAI Workflow (map digitization → local coordinates xl, yl; transformation using control points → global coordinates xg, yg; GPS measurement → GPS coordinates xGPS, yGPS; PAI → improved global coordinates xg, yg)

Map Digitization
The result of scanning a map is a raster image with a row-column coordinate system. The coordinates of specific points (building corners, boundary points etc.) are determined in the raster system. On the basis of the scan resolution the raster coordinates can be converted into metrical map coordinates. If the scan resolution is measured in dots per inch (dpi), then the conversion formula is

x_map = x_raster · 0.0254 m / resolution [dpi].
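For example, a point digitized at raster coordinate 1000 (pixels) in a 400 dpi scan lies at 1000 · 0.0254 m / 400 ≈ 0.0635 m in the metrical map coordinate system.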

The standard deviations, respectively variances, of the digitized map coordinates depend on the scale factor and the quality of the underlying map. As a rule of thumb, the standard deviation of a map coordinate value σ_map is about 0.5 mm.

The measured coordinates are random variables and they are stochastically dependent! Why? The point positions on a map result from real world measurements and plots. Both measurements and plots have been carried out by the principle of adjacent points. Therefore, the coordinate values are stochastically dependent, analogous to the example in exercise 5. But in general it is impossible to reconstruct a map's history in detail, and therefore the covariances between digitized coordinates are not known. The outcome of a digitization process is an observation vector l with map coordinates and a covariance matrix of these observations C_ll with known principal diagonal but unknown covariances.

Transformation
Map coordinates are transformed into a higher-level global reference frame. Control points with known coordinates in both the local map coordinate system and the global reference frame are used to determine the necessary transformation parameters. The general transformation approach is

x_global = t + R · x_map

with

x_global: vector of global coordinates
t: translation vector of the transformation
R: rotation matrix of the transformation
x_map: vector of map coordinates

As we can see, the global coordinates are linear functions of the map coordinates. Applying the law of error propagation to this equation, it is possible to calculate the standard deviations of the global coordinates, whilst their covariances remain unknown.

GPS Measurement
For a number of points, global coordinates with higher accuracy are determined by GPS measurements. Each GPS point yields two redundant coordinates.

PAI
If the global coordinates of a GIS were stochastically independent, then we could simply replace the less accurate coordinates with more accurate ones. But such a procedure would lead to a violation of geometrical neighbourhood relationships.

Figure 7: PAI Neglecting and Considering Correlations (panels: the two data sets before integration, integration with neglected correlations, integration with considered correlations)

Figure 7 illustrates the necessity to consider stochastical dependencies between coordinates of adjacent points. In the presented example coordinates with higher accuracy for the building corners were determined but not for the adjacent tree. If the correlations remain unconsidered and the coordinates of the building are exchanged then the tree seems to stand inside the building after PAI. But if the stochastical dependencies are considered the tree is shifted with the building during PAI.

2.4.4.1 Improving Absolute Geometry

But how can correlations be quantified and taken into account? One option is the introduction of artificial covariances. These can be calculated as functions of the distance between each pair of points: the smaller the distance between two points, the bigger their covariance. But in practice this approach is difficult to handle. Often there are more than 100,000 points to be processed, which would lead to extremely large covariance matrices. A more practical solution is the introduction of pseudo observations. In this approach, stochastical dependencies between GIS coordinates are modelled by coordinate differences. The basis for the determination of these pseudo observations is a Delaunay triangulation (see Figure 8).

Figure 8: Delaunay Triangulation

Before triangulation the point positions are expressed uniquely by global coordinates. This parameterization is replaced by pseudo observations generated from the coordinate differences of points which are adjacent in the triangle network. This new parameterization is redundant but still consistent.

x_GIS = (x_1, y_1, x_2, y_2, x_3, y_3, x_4, y_4)^T, \quad
l_GIS = (Δx_12, Δy_12, Δx_23, Δy_23, Δx_34, Δy_34, Δx_41, Δy_41, Δx_24, Δy_24)^T

(here for four points 1…4 which are adjacent in the triangulation)

Now the observation vector is extended with GPS coordinates of higher accuracy (here for the points 1 and 3):

l = (Δx_12, Δy_12, Δx_23, Δy_23, Δx_34, Δy_34, Δx_41, Δy_41, Δx_24, Δy_24, x_GPS_1, y_GPS_1, x_GPS_3, y_GPS_3)^T

This leads to an adjustment problem with the improved global coordinates as unknown parameters and the coordinate differences and GPS coordinates as observations. The observation equations have the structure:

⋮
Δx_ij + v_Δx_ij = x_j - x_i
Δy_ij + v_Δy_ij = y_j - y_i
⋮
x_GPS_i + v_x_i = x_i
y_GPS_i + v_y_i = y_i
⋮

These equations represent the functional model of the adjustment. The stochastical model is determined by the observation weights, which are in turn functions of standard deviations. The standard deviations of the GPS coordinates depend on the applied measurement procedure and on the reproduction accuracy of the measured points. A practical value is σ_xy = ±2 cm. The standard deviations of the coordinate differences are functions of those of the underlying map coordinates and of the corresponding point distances. We want to explain this with an example. If map coordinates with a standard deviation of σ_xy = ±1 m are given, then the formula for the calculation of a coordinate difference standard deviation could be:

σ_Δ = σ_xy · d / d_0   with   d_0 = 100 m

In that case an observation weight is inversely proportional to the square of the point distance. The results of the adjustment are improved coordinates for all points and their covariance matrix. The point accuracy then depends on the distance to the newly introduced GPS points.
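The following NumPy sketch sets up such a PAI adjustment for the one dimensional case of exercise 7 below (the GIS and GPS coordinate values are invented; σ_GIS = 1 m, σ_GPS = 0.02 m, d_0 = 10 m and σ_0 = 1 m as in the exercise):

import numpy as np

x_gis = np.arange(12) * 10.0 + 5.0                   # digitized coordinates of points 1..12
gps_idx = [1, 7, 11]                                 # points 2, 8 and 12 (0-based indices)
x_gps = x_gis[gps_idx] + np.array([0.8, -0.5, 0.3])  # invented GPS coordinates

dx = np.diff(x_gis)                                  # pseudo observations: differences of neighbours
n, u = len(dx) + len(gps_idx), len(x_gis)
A = np.zeros((n, u)); l = np.zeros(n); sigma = np.zeros(n)

for i in range(len(dx)):                             # dx_i + v = x_{i+1} - x_i
    A[i, i], A[i, i + 1] = -1.0, 1.0
    l[i] = dx[i]
    sigma[i] = 1.0 * dx[i] / 10.0                    # sigma_GIS * dx / d0

for k, j in enumerate(gps_idx):                      # x_GPS + v = x_j
    A[len(dx) + k, j] = 1.0
    l[len(dx) + k] = x_gps[k]
    sigma[len(dx) + k] = 0.02                        # sigma_GPS

P = np.diag((1.0 / sigma) ** 2)                      # sigma0 = 1 m
x_pai = np.linalg.solve(A.T @ P @ A, A.T @ P @ l)    # improved coordinates
v = A @ x_pai - l                                    # residuals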

Exercise 7
We consider a one dimensional coordinate system with 12 points.


The global point coordinates x_GIS are known and were derived from a map digitization. Their standard deviation is σ_GIS = ±1 m.

The coordinates x_GPS of the points 2, 8 and 12 are newly determined by GPS measurements. The standard deviation of the GPS coordinates is σ_GPS = ±0.02 m.


- Calculate the coordinate differences Δx_GIS as pseudo observations.
- Calculate the standard deviations of the pseudo observations by the formula σ_ΔGIS = σ_xGIS · Δx_GIS / d_0 with d_0 = 10 m.
- Calculate the weights of the pseudo observations and of the GPS observations using a standard deviation of unit weight of σ_0 = ±1 m.
- Build the configuration matrix A for the observed coordinate differences and the observed GPS coordinates.
- Build the observation vector l.
- Build the weight matrix P.
- Calculate the improved coordinates x_PAI = (A^T P A)^{-1} A^T P l.

- Calculate the residual vector v = A x_PAI - l.

- Calculate the empirical variance and the empirical standard deviation of unit weight s_0² = v^T P v / r, s_0 = √(s_0²).
- Calculate the empirical covariance matrix of the improved coordinates C_xx = s_0² · Q_xx = s_0² · (A^T P A)^{-1}.

- Calculate the empirical standard deviations of the improved coordinates s_xi = √((C_xx)_ii).

- Draw an MS Excel diagram with the empirical standard deviations of the coordinates s_xi over the coordinates x_i.

2.4.4.2 Improving Relative Geometry

The accuracy improvement of GIS coordinates is not exclusively achieved by the introduction of precise global coordinates. There are also observations describing the relative geometry between adjacent points. For instance, it is mostly known that the sides of buildings are rectangular or that parcel limits are straight. Such geometrical constraints can be modelled by the introduction of corresponding observations. Rectangularity, for instance, can be expressed by a scalar product. The standard deviation of this scalar product observation can be calculated by error propagation of the corresponding point coordinates. Because a mason can build a house with an accuracy of approximately 2 cm, a standard deviation of about the same size is feasible for such coordinates.

Exercise 8
We want to simulate such a case by introducing an accurate distance observation into the adjustment model of exercise 7. Observed was the distance between the points 5 and 6, with d_56 = 10.34 m. The standard deviation is σ_d56 = ±0.02 m.
- Extend the configuration matrix A, the observation vector l and the weight matrix P by the distance observation (see the sketch below).
- Repeat the adjustment calculation of exercise 7.
- Compare the results of exercises 7 and 8.
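A sketch of this extension, continuing the PAI sketch above (it assumes the arrays A, l and sigma from that sketch are still in scope):

import numpy as np

d56, sigma_d56 = 10.34, 0.02                         # observed distance between points 5 and 6

row = np.zeros(A.shape[1])
row[4], row[5] = -1.0, 1.0                           # d56 + v = x6 - x5
A = np.vstack([A, row])
l = np.append(l, d56)
sigma = np.append(sigma, sigma_d56)

P = np.diag((1.0 / sigma) ** 2)
x_pai = np.linalg.solve(A.T @ P @ A, A.T @ P @ l)    # repeat the adjustment of exercise 7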

2.4.5 Appendix: Information on Matrix Operations in MS Excel

To be able to use matrix operations in MS Excel it is necessary to activate the analysis functions:
- Choose Extras > Add-Ins in the main menu.
- Activate the check box 'Analyse Functions' (or keep it activated).
- Press OK.

Terminating the Entry of Matrix Expressions
The entry of each matrix operation expression has to be terminated by keeping the keys Control and Shift pressed and then pressing Enter.


Marking of Arrays
Option 1: Set the cell cursor in the upper left cell of the array, keep the left mouse button pressed and pull the cursor to the lower right corner of the array.
Option 2: Enter the coordinates of the upper left and the lower right cell of the array separated by a colon. Example: (G27:N40)

Transpose of a Matrix
Highlight the resulting array with the cursor. Enter =mtrans( ) in the command line. To solve the expression keep the keys Control and Shift pressed and press Enter.

Product of Two Matrices
Highlight the resulting array with the cursor. Enter =mmult( ; ) in the command line. To solve the expression keep the keys Control and Shift pressed and press Enter.

Inverting of a Matrix
Highlight the resulting array with the cursor. Enter =minv( ) in the command line. To solve the expression keep the keys Control and Shift pressed and press Enter.

Indicate an Array
There are two options to indicate an array in MS Excel:
1. By cell addresses: Enter the addresses of the upper left and the lower right cell separated by ':', e.g. C7:F13, or highlight the array with the cursor.
2. By naming the array: Mark the array with the cursor, click in the name field in the upper left corner of the window and enter the name of the array, e.g. A for the configuration matrix.

It is possible to nest the single commands e.g. =mmult(mtrans(A);mmult(P;A))