Technische Universität Chemnitz-Zwickau

Sonderforschungsbereich 393
Numerische Simulation auf massiv parallelen Rechnern

G. Globisch and S.V. Nepomnyaschikh

The hierarchical preconditioning
having unstructured grids

Preprint SFB 393

Abstract

In this paper we present two hierarchically preconditioned methods for the fast solution of mesh equations that approximate 2D elliptic boundary value problems on arbitrary unstructured quasi-uniform triangulations. Based on the fictitious space approach, the original problem can be embedded into an auxiliary one, where both the hierarchical grid information and the preconditioner by decomposing functions on it are well defined. We implemented the corresponding Yserentant-preconditioned conjugate gradient method as well as the BPX-preconditioned cg-iteration having optimal computational costs. Several numerical examples demonstrate the efficiency of the artificially constructed hierarchical methods, which can be of enormous importance in industrial engineering, where often only the nodal coordinates and the element connectivity of the underlying fine discretization are available.

Key words: partial differential equations, parallel computing, multilevel methods, hierarchical preconditioning, finite element methods, automatic grid generation

The work of this author is also supported by the Deutsche Forschungsgemeinschaft (DFG).

Preprint-Reihe des Chemnitzer SFB 393

April

Contents

1 Introduction
2 The description of the original problem
3 The construction of the auxiliary problem
4 The application of the fictitious space lemma
5 Aspects of the numerical implementation
6 Numerical results
  6.1 Sequential Computing
  6.2 First results of the parallel computing
References

Authors' addresses:

Dr. rer. nat. Gerhard Globisch
Faculty for Mathematics
Technical University Chemnitz-Zwickau
D-Chemnitz
email: gerhard.globisch@mathematik.tu-chemnitz.de
http://www.tu-chemnitz.de/sfb/people/globisch.html

Dr. Sergey V. Nepomnyaschikh
Computing Center
Siberian Branch, Russian Academy of Sciences
Lavrentiev av.
Novosibirsk, Russia
email: svnep@comcen.nsk.su

1 Introduction

In the next section we introduce the 2D boundary value problem of second order having a formally selfadjoint differential operator. Our aim is the numerical solution of the problem by hierarchical methods, although in practice we only have its unstructured discretization available. For this, in section 3 we construct the structured auxiliary problem into which the original one can be embedded. We define the one-to-one correspondence between the nodes of the unstructured mesh and the nodes of the hierarchically discretized square defining the fictitious space (cf. also the references). This approach is used for applying the fictitious space lemma to derive the corresponding spectral equivalence inequality describing the preconditioning property of the artificially constructed hierarchical preconditioner belonging to the auxiliary grid points. In section 4 we give a short survey of the underlying theory, presented in detail in the cited literature. In the mentioned papers the convergence rate of the iterative process was proved to be as fast as for the conventional hierarchical solution method, i.e. it is nearly independent of the mesh size. In section 5 we discuss the various aspects of the numerical implementation of the new hierarchical method. We do it in the case of the auxiliary Yserentant preconditioning as well as in the more important case of the artificial BPX preconditioner. In section 6 we illustrate the efficient implementation of the two algorithms computing several 2D potential problems. Moreover, in each case we compare our artificially constructed hierarchical iteration, based on the canonically performed refinement of the coarse and structured user triangulation, with this method using unstructured fine grids generated by an advancing front mesh generator. Finally, first numerical results of the parallel implementation of our approach are given, where the corresponding numerical analysis is still under consideration. The iteration numbers are rather satisfactory, although the comparison with the parallelized structured methods is avoided. The basis of the implementation of the unstructured parallel solvers is a non-overlapping domain decomposition data structure (see e.g. the references), such that they are well-suited for parallel machines with MIMD architecture. Section 6 also serves to demonstrate the practical importance of the designed methods. Often in industrial engineering boundary value problems have to be solved where a rather fine mesh of the domain and the discretization concept are given, sometimes already resulting in the corresponding system of equations (see e.g. the references). But no fast hierarchical solver can be applied because nothing is known about the grid structure. Using our approach this bottleneck does not exist any more. We only mention here that our method can be transferred to the 3D calculation of boundary value problems.

2 The description of the original problem

Let $\Omega\subset\mathbb{R}^2$ be a bounded plane domain with a piecewise smooth boundary $\Gamma=\partial\Omega$ which belongs piecewise to the class $C^2$ and satisfies the Lipschitz condition (see the references). We consider the elliptic boundary value problem

$$-\sum_{i,j=1}^{2}\frac{\partial}{\partial x_i}\Big(a_{ij}(x)\,\frac{\partial u}{\partial x_j}\Big)+a(x)\,u=f(x),\qquad x=(x_1,x_2)^T\in\Omega,$$
$$u(x)=g_1(x),\quad x\in\Gamma_1,\qquad\qquad \frac{\partial u}{\partial N}(x)=g_2(x),\quad x\in\Gamma_2.$$

Here the symbol $\frac{\partial}{\partial N}$ denotes the conormal derivative with respect to the outward normal. On the boundary $\Gamma$ of the domain $\Omega$ both Dirichlet boundary conditions and Neumann boundary conditions are imposed. We have $\Gamma=\bar\Gamma_1\cup\bar\Gamma_2$. We introduce the following subspaces of the Sobolev space $H^1(\Omega)$:

$$H^1_g(\Omega)=\{u\in H^1(\Omega):\ u(x)=g_1(x),\ x\in\Gamma_1\},\qquad
\overset{o}{H}{}^1(\Omega)=\{v\in H^1(\Omega):\ v(x)=0,\ x\in\Gamma_1\}.$$

Let us suppose that the coefficient functions $a_{ij}(x)$, $i,j=1,2$, and the right-hand side $f(x)$ of the above boundary value problem are such that from it we may derive the symmetric and coercive bilinear form

$$a(u,v)=\int_\Omega\Big(\sum_{i,j=1}^{2}a_{ij}(x)\,\frac{\partial u}{\partial x_i}\,\frac{\partial v}{\partial x_j}+a(x)\,u\,v\Big)\,dx$$

and the continuous linear functional

$$l(v)=\int_\Omega f(x)\,v\,dx+\int_{\Gamma_2}g_2(x)\,v\,ds,$$

which define the well-known variational problem

$$u\in H^1_g(\Omega):\qquad a(u,v)=l(v)\quad\text{for all }v\in\overset{o}{H}{}^1(\Omega),$$

where we seek the solution $u\in H^1_g(\Omega)$. Having this, as we know e.g. from the references, the variational problem has a unique solution $u\in H^1_g(\Omega)$, which we want to compute numerically. Hereafter, for simplicity, we may suppose $g_1(x)$ to be equal to zero.

Let a positive parameter $h$ be fixed which is sufficiently small, and let $\Omega^h=\bigcup_{i=1}^{M}\bar\tau_i$ be a quasi-uniform triangulation of the domain $\Omega$. In practice the triangulation is often rather fine and unstructured, i.e. the mesh data information consists of the nodal coordinates and the element connectivity only, see e.g. Figure 1. The quasi-uniformity of the triangulation $\Omega^h$ means that there are positive constants $l_1$, $l_2$ and $s_0$, independent of the discretization parameter $h$, such that

$$\frac{\rho_i}{r_i}\ \ge\ s_0,\qquad l_1 h\ \le\ r_i\ \le\ l_2 h,\qquad i=1,\dots,M,$$

where $r_i$ and $\rho_i$ are the radii of the circumscribed and the inscribed circles for the triangles $\tau_i$, respectively (see also the references). Moreover, we also assume that the triangulation boundary $\partial\Omega^h$ approximates the boundary $\Gamma$ with an error $O(h^2)$; see the references for more details.

Figure 1: The domain $\Omega$ and its unstructured quasi-uniform triangulation $\Omega^h$.
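For orientation only, the following small Python sketch (not part of the paper) shows how the quasi-uniformity quantities above can be evaluated for a given mesh; coords is assumed to be an (N,2) NumPy array of nodal coordinates and triangles an (M,3) array of 0-based vertex indices.

    import numpy as np

    def quasi_uniformity_constants(coords, triangles, h):
        # For every triangle compute the circumscribed radius r_i and the inscribed
        # radius rho_i; return the observed constants s0, l1 and l2 of the text.
        ratios, radii = [], []
        for tri in triangles:
            p = coords[tri]
            a = np.linalg.norm(p[1] - p[2])
            b = np.linalg.norm(p[2] - p[0])
            c = np.linalg.norm(p[0] - p[1])
            area = 0.5 * abs((p[1, 0] - p[0, 0]) * (p[2, 1] - p[0, 1])
                             - (p[1, 1] - p[0, 1]) * (p[2, 0] - p[0, 0]))
            r = a * b * c / (4.0 * area)        # circumscribed radius r_i
            rho = 2.0 * area / (a + b + c)      # inscribed radius rho_i
            ratios.append(rho / r)
            radii.append(r)
        s0 = min(ratios)
        l1, l2 = min(radii) / h, max(radii) / h
        return s0, l1, l2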

For the triangulation $\Omega^h$ we define the space $H^h$ of real continuous functions which are linear on each triangle of $\Omega^h$ and vanish at the boundary part $\Gamma_1^h$. We extend these functions on $\Omega\setminus\Omega^h$ by zero. The solution $u^h$ of the following projection problem

$$u^h\in H^h:\qquad a(u^h,v^h)=l(v^h)\quad\text{for all }v^h\in H^h$$

is called an approximate solution. Aspects of approximation have been thoroughly studied in the references. Each function $u^h\in H^h$ is put in standard correspondence with a real column vector $\bar u^h\in\mathbb{R}^N$ whose components are the values of the function $u^h$ at the corresponding nodes of the triangulation $\Omega^h$. Then the projection problem is equivalent to the solution of the system of mesh equations

$$A\bar u^h=\bar f,\qquad\text{where}\qquad
(A\bar u^h,\bar v^h)=a(u^h,v^h),\qquad (\bar f,\bar v^h)=l(v^h)\qquad\text{for all }u^h,v^h\in H^h.$$

Here $u^h$ and $v^h$ are the corresponding prolongations of the vectors $\bar u^h$ and $\bar v^h$. The symbol $(\cdot\,,\cdot)$ denotes the Euclidean scalar product in $\mathbb{R}^N$.

The aim of this paper is to construct a symmetric positive definite preconditioning operator $C$ for the above problem satisfying the spectral equivalence inequality

$$c_1\,(C\bar u,\bar u)\ \le\ (A\bar u,\bar u)\ \le\ c_2\,(C\bar u,\bar u)\qquad\text{for all }\bar u\in\mathbb{R}^N,$$

where the positive constants $c_1$ and $c_2$ are independent of the discretization parameter $h$. Furthermore, jumping coefficients $a_{ij}(x)$, $i,j=1,2$, should not essentially deteriorate this fast convergence property of the corresponding solution method capitalizing on the above preconditioning. Clearly, the multiplication of a vector by $C^{-1}$ should be easy to implement.
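For reference, a minimal sketch of the preconditioned conjugate gradient iteration into which such a preconditioner is plugged; apply_Cinv is a placeholder for the action of $C^{-1}$ and is not part of the paper.

    import numpy as np

    def pcg(A, f, apply_Cinv, tol=1e-6, maxit=1000):
        # Preconditioned cg-iteration for A u = f with A symmetric positive definite;
        # apply_Cinv(r) returns C^{-1} r.
        u = np.zeros_like(f)
        r = f - A @ u
        w = apply_Cinv(r)
        p = w.copy()
        rho = r @ w
        rho0 = rho
        for it in range(maxit):
            q = A @ p
            alpha = rho / (p @ q)
            u += alpha * p
            r -= alpha * q
            w = apply_Cinv(r)
            rho_new = r @ w
            if rho_new <= tol**2 * rho0:     # relative decrease of r^T C^{-1} r
                break
            p = w + (rho_new / rho) * p
            rho = rho_new
        return u, it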

3 The construction of the auxiliary problem

The preconditioner $C$ is constructed applying the method of fictitious space (see e.g. the references) in two stages. At the first and interim stage we pass from the arbitrary unstructured triangulation $\Omega^h$ to an auxiliary structured non-hierarchical mesh, and using this, at the second stage, to the hierarchical mesh, which is defined to be the hierarchical mesh of the square containing the original domain $\Omega$. We note that the passage from an arbitrary triangulation to a structured mesh was used earlier in the literature; the cited preprint includes the development of this idea for the case of locally refined grids. Other techniques for constructing preconditioners on unstructured meshes were proposed in the references, where the definition of preconditioning operators having non-hierarchical grids was considered as well.

In order to use the lemma of fictitious space for analysing the artificially defined preconditioners, we construct the discretized auxiliary space $\Pi^h$ and the corresponding operators between $\Omega^h$ and $\Pi^h$ as follows. We embed the domain $\Omega$ in a square $\Pi$, see Figure 2. Let $K_i$ denote the union of triangles in the triangulation $\Omega^h$ which have the vertex $z_i$ in common, and let $d_i$ be the maximum radius of the circles which may be inscribed into $K_i$. In the square $\Pi$ we introduce an auxiliary rectangular grid having the step size $h_0$ such that

$$h_0\ \le\ \frac{1}{\sqrt2}\ \min_{1\le i\le N} d_i.$$

Now let us fix $h_0=l_0/2^J$, where $l_0$ is the length of the sides of $\Pi$ and $J$ is an appropriately chosen positive integer. Throughout the paper, speaking about the set $\Pi^h$ and its subsets, we identify $h_0$ by $h$. We denote the nodes of the grid $\Pi^h$ by $Z_{ij}=(x_i,y_j)$, $i,j=1,\dots,2^J+1$, their total number being $L$. Using the cell diagonals we triangulate $\Pi^h$; for this see also Figure 2.

Figure 2: Embedding the unstructured mesh $\Omega^h$ into the auxiliary grid $\Pi^h$.

Let the cells of $\Pi^h$ be denoted by $D_{ij}=\{(x,y):\ x_i\le x\le x_{i+1},\ y_j\le y\le y_{j+1}\}$. Doing as given we get $\Pi^h=\bigcup_{i,j}\bar D_{ij}$.

Let $Q^h$ be the minimum figure consisting of cells $D_{ij}$ and containing $\Omega^h$. Hence we have $\Omega^h\subset Q^h\subset\Pi^h$. Hereafter, because of the above triangulation of $\Pi^h$, the sets $\Pi^h$ and $Q^h$ and their subsets refer to triangulations as well. Let $S^h$ be the set of the boundary nodes of $Q^h$. We subdivide the set $S^h$ into two subsets $S_1^h$ and $S_2^h$ as follows: if $\bar D_{ij}\cap\Gamma_1^h\ne\emptyset$, all the nodes of $\bar D_{ij}\cap S^h$ are in $S_1^h$; consequently we have $S_2^h=S^h\setminus S_1^h$.

Let $H^h(Q)$ be the space of the real continuous functions which are linear on the triangles of $Q^h$ and vanish at the nodes of $S_1^h$.

Now we define the projection operator $R:\ H^h(Q)\to H^h$ and the extension operator $T:\ H^h\to H^h(Q)$. For a given mesh function $U^h(Z_{ij})\in H^h(Q)$ we define a function $u^h\in H^h$ as follows. Let $z_l$ be a vertex in the triangulation $\Omega^h$. Assuming that $z_l\in\bar D_{ij}$, we put

$$u^h(z_l)=(RU^h)(z_l)=U^h(Z_{ij}).$$

The function $u^h$ is equal to zero at the nodes $z_l\in\Gamma_1^h$. Let us define the expanded operator $\tilde R:\ H^h(\Pi)\to H^h$, the operator of restriction onto $\Omega^h$, as follows:

$$(\tilde R U^h)(Z_{ij})=U^h(Z_{ij})\qquad\text{for all }Z_{ij}\in Q^h.$$

Subdividing the nodes of $\Pi^h$ into the nodes of $Q^h$, including those of $S^h$, and the remaining nodes, and ordering them accordingly, we obtain the matrix representation of $\tilde R$ to be $\tilde R=(R\ \ O)$, cf. also the references.

The definition of the operator $T$ is given in the following. For the mesh function $u^h\in H^h$ we suppose a function $U^h\in H^h(Q)$. The function $U^h$ is equal to zero at the nodes $Z_{ij}\in S_1^h$. At all of the other nodes the function $U^h$ is defined as follows. If a cell $D_{ij}$ contains a certain vertex $z_l$ of the triangulation $\Omega^h$, we put

$$U^h(Z_{ij})=(Tu^h)(Z_{ij})=u^h(z_l).$$

For each of the remaining nodes $Z_{ij}\in Q^h$ we find the closest vertex $z_l$ of the triangulation $\Omega^h$ (in the case of several closest vertices we may choose any of them) and, using it, we put the same as above. By the theorem of the extension of mesh functions given in the references, there exists the extension operator $T:\ H^h\to H^h(Q)$ which is also uniformly bounded with respect to the parameter $h$. This operator spreads $u^h$ given on $\Omega^h$ over $Q^h$.
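As an illustration, a minimal Python sketch of how R and T can be realized by index look-ups; it is not the authors' code. The fine nodes are assumed to be an (N,2) NumPy array, the auxiliary grid is the uniform (n+1) x (n+1) grid with lower-left corner (x0, y0), step h0 and n = 2^J cells per direction, and for brevity the node Z_ij is identified with the lower-left corner of the cell D_ij; the treatment of the boundary set S_1^h and the restriction to Q^h are omitted.

    import numpy as np

    def cell_index(p, x0, y0, h0, n):
        # Indices (i, j) of the auxiliary cell D_ij containing the point p.
        i = min(int((p[0] - x0) / h0), n - 1)
        j = min(int((p[1] - y0) / h0), n - 1)
        return i, j

    def apply_R(U, fine_nodes, x0, y0, h0, n):
        # (R U)(z_l) = U(Z_ij) for the cell D_ij containing the fine vertex z_l.
        u = np.empty(len(fine_nodes))
        for l, p in enumerate(fine_nodes):
            i, j = cell_index(p, x0, y0, h0, n)
            u[l] = U[i, j]
        return u

    def apply_T(u, fine_nodes, x0, y0, h0, n):
        # (T u)(Z_ij) = u(z_l), z_l the fine vertex in D_ij or the closest fine vertex.
        U = np.zeros((n + 1, n + 1))
        owner = -np.ones((n + 1, n + 1), dtype=int)
        for l, p in enumerate(fine_nodes):           # vertices mark the cells they lie in
            i, j = cell_index(p, x0, y0, h0, n)
            owner[i, j] = l
        grid_x = x0 + h0 * np.arange(n + 1)
        grid_y = y0 + h0 * np.arange(n + 1)
        for i in range(n + 1):
            for j in range(n + 1):
                if owner[i, j] >= 0:
                    U[i, j] = u[owner[i, j]]
                else:                                # remaining nodes: closest fine vertex
                    d = (fine_nodes[:, 0] - grid_x[i])**2 + (fine_nodes[:, 1] - grid_y[j])**2
                    U[i, j] = u[np.argmin(d)]
        return U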

Finally, in the space $H^h(Q)$ we define the operator $A_Q$ to be

$$(A_Q U^h,V^h)=\int_{Q^h}\big(\nabla U^h\cdot\nabla V^h+U^h\,V^h\big)\,dx$$

for all $U^h,V^h\in H^h(Q)$. This is an auxiliary problem which we did not discretize, but we use the operator $A_Q$ to make the application of the fictitious space lemma possible, as it is done in the next section.

4 The application of the fictitious space lemma

Taking into account the conventions which we adopted in the previous section, the preconditioning operator $C$ in the spectral equivalence inequality of section 2 can be constructed by means of the lemma of fictitious space (see also the references). For convenience we give this lemma here.

Lemma. Let $H_0$ and $H$ be Hilbert spaces with the scalar products $(\cdot,\cdot)_{H_0}$ and $(\cdot,\cdot)_H$, respectively. Let $A_0:\ H_0\to H_0$ and $A:\ H\to H$ be symmetric and positive definite continuous operators in the spaces $H_0$ and $H$. Suppose that $R$ is a linear operator such that $R:\ H_0\to H$ and $(A R v_0,R v_0)_H\le c_R\,(A_0 v_0,v_0)_{H_0}$ is fulfilled for all $v_0\in H_0$. Moreover, there exists an operator $T$ such that $T:\ H\to H_0$, for which the conditions $RTu=u$ and $(A_0 Tu,Tu)_{H_0}\le c_T\,(Au,u)_H$ are valid for all $u\in H$. Here $c_R$ and $c_T$ are positive constants. Then

$$c_T^{-1}\,(A^{-1}u,u)_H\ \le\ (R A_0^{-1} R^{*} u,u)_H\ \le\ c_R\,(A^{-1}u,u)_H$$

holds for all $u\in H$. The operator $R^{*}$ is the adjoint to $R$ with respect to the scalar products $(\cdot,\cdot)_{H_0}$ and $(\cdot,\cdot)_H$, i.e. we have $R^{*}:\ H\to H_0$ and $(R^{*}u,v_0)_{H_0}=(u,Rv_0)_H$.

We note that for constructing and implementing the preconditioner, i.e. the operator $R A_0^{-1}R^{*}$, we only require the existence of the operator $T$. Having the situation given in sections 2 and 3, the role of the operator $A$ is played by the stiffness matrix $A$ of the mesh equations. The finite dimensional space $H^h$ plays the role of the space $H$. The space $H^h(Q)$ is used to be the fictitious space $H_0$. Thus, for the operator $A_0$ may stand the $A_Q$.

Now, according to the above lemma, there exist positive constants $c_1$ and $c_2$, independent of the mesh size parameter $h$, such that

$$c_1\,(A^{-1}\bar u,\bar u)\ \le\ (R A_Q^{-1}R^{*}\bar u,\bar u)\ \le\ c_2\,(A^{-1}\bar u,\bar u)$$

is valid for all vectors $\bar u\in\mathbb{R}^N$. For the proof of this see also the references. Hereafter we use the same designation for an operator and its matrix representation.

Considering it inversely, we finally get the following result, which was proved in the references taking distinct boundary conditions on $\Gamma^h$ into account: there are positive constants $\hat c$ and $\check c$ such that

$$\check c\,(A^{-1}\bar u,\bar u)\ \le\ (R\,C_h^{-1}\,R^{*}\bar u,\bar u)\ \le\ \hat c\,(A^{-1}\bar u,\bar u)$$

is fulfilled for all vectors $\bar u\in\mathbb{R}^N$ belonging to the original discretization. In this inequality the preconditioner $C_h$ is either the BPX multilevel preconditioner (see also the references) or the Yserentant hierarchical preconditioner (see also the references), which we may now construct on the structured hierarchical grid $\Pi^h$. As was expected, in the case of the artificial BPX multilevel preconditioner the constants $\hat c$ and $\check c$ are independent of the auxiliary mesh size parameter $h$. Hence the condition number of the operator $R C_h^{-1}R^{*}A$, which we applied numerically within the cg-iteration process, is of order $O(1)$. In the case of the artificially constructed Yserentant preconditioning we may have the condition number $\kappa(R C_h^{-1}R^{*}A)=O(J^2)$, where the index $J$ indicates the depth of the artificially constructed hierarchical mesh $\Pi^h$. The numerical results given in section 6 illustrate the good convergence behaviour of the corresponding cg-methods impressively.

We implement the corresponding hierarchical preconditioners $C_h$ using the auxiliary grid $\Pi^h$ as given in the next section.
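For orientation, a matrix-free sketch of the composed preconditioner action $\bar w=R\,C_h^{-1}\,R^{*}\bar r$, assuming the restriction R is stored as the index vector described in point 3 of the next section (one auxiliary node per fine node, here 0-based) and apply_Ch_inv is a placeholder for the hierarchical preconditioner on the auxiliary grid; this is an illustration, not the authors' implementation.

    import numpy as np

    def apply_fictitious_space_preconditioner(r, node_map, L, apply_Ch_inv):
        # node_map[l] = index of the auxiliary grid node assigned to fine node l,
        # so that (R U)_l = U_{node_map[l]} and R^* scatters fine residuals to Pi^h.
        r_aux = np.zeros(L)
        np.add.at(r_aux, node_map, r)      # r_aux = R^* r
        w_aux = apply_Ch_inv(r_aux)        # C_h^{-1} on the structured hierarchical grid
        return w_aux[node_map]             # w = R w_aux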

5 Aspects of the numerical implementation

In this section we itemize and analyse the numerical operations which are additionally necessary for implementing the artificially constructed preconditioners within the corresponding conjugate gradient method.

In the case of the Yserentant preconditioning the correction vector $\bar w$ of the cg-iteration process is computed to be

$$\bar w=C_h^{-1}\bar r=Q\,J_h^{-1}\;\big\|\;Q^{T}\bar r,$$

where $Q$ is the well-known basis transformation between the usual finite element nodal basis and the hierarchical basis (see e.g. the references), the symbol $\bar r$ denotes the residual vector, and the diagonal matrix $J_h$, whose entries approximate the main diagonal of $A_Q$, performs the Jacobi preconditioning as given in point 5 below. The symbol $\|$ marks the stage at which the communication takes place in the parallel version, see point 7 below.

In the case of the BPX preconditioning we have

$$\bar w=C_h^{-1}\bar r=\sum_{j=0}^{J}Q_j\;\big\|\;Q_j^{T}\bar r,$$

where $Q_j$ is the basis transformation belonging to the $j$-th level, $j=0,\dots,J$ (see also the references). Note that in this case the level-depending Jacobi preconditioning can also be applied.
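As an illustration only, a minimal sketch of how the transformations $Q^T$ and $Q$ can be applied via the hierarchical list, under the usual two-father convention (each node created on level j > 0 stores its two father nodes); levels and fathers are assumed data structures, not the paper's.

    def apply_QT(r, levels, fathers):
        # Nodal -> hierarchical basis: traverse the levels from fine to coarse and
        # pass half of each node's value to its two father nodes.
        c = list(r)
        for level in reversed(levels[1:]):        # levels[0] holds the coarsest nodes
            for k in level:
                f1, f2 = fathers[k]
                c[f1] += 0.5 * c[k]
                c[f2] += 0.5 * c[k]
        return c

    def apply_Q(c, levels, fathers):
        # Hierarchical -> nodal basis: traverse the levels from coarse to fine and
        # interpolate each node from its two fathers.
        w = list(c)
        for level in levels[1:]:
            for k in level:
                f1, f2 = fathers[k]
                w[k] += 0.5 * (w[f1] + w[f2])
        return w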

The numerical implementation of our preconditioning methods $C_h^{-1}$ means that we work in $\Pi^h$ and $Q^h$ using the longer vectors $\bar v,\bar r\in\mathbb{R}^L$, consisting of either the $L$ components belonging to the usual hierarchical list of the grid $\Pi^h$ or the corresponding number of components defined by the hierarchical BPX list of $\Pi^h$. We call the first artificially constructed hierarchical preconditioning artYs, and the second method is epitomized by artBPX. Let us note that in both cases we do not make use of any coarse grid solver, as is usually the case when the application of the hierarchical preconditioners is classically implemented. Now we describe the following actions.

1. The computation of the parameter $h_0\le\frac{1}{\sqrt2}\min_{1\le i\le N}d_i$.

Instead of calculating the maximum of the radii of the inscribed circles with respect to the triangle sets $K_i$ having the point $i$ in common, we compute the three heights of all of the triangles $\tau_k$, $k=1,\dots,M$. Then, taking the minimum of them for defining $d$, it must be divided by two to get the parameter $h_0$. For this we need a number of operations proportional to $M+N$.

The denition of the auxiliary grid depending on h

h i

Using only the co ordinates of the p oints at the b oundary we calculate x minx

l

i i i h

x maxx y minx and y maxx i card Then we have

r b u

J

l maxy y x x and J log l h log Thus we arrive at l h

u b r l

J

which is the length of the square and at L which is the numb er of

h h

p oints in The ab ove symb ol denotes the entierop eration Finally we center

h

with resp ect to using x y and l The numb er of op erations which is necessary for

l b

p erforming is negligible in comparison with the other eorts analysed in this section
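For illustration, a sketch of this step in Python; boundary_coords is assumed to be an array of boundary node coordinates, and the rounding is written with a ceiling so that the resulting step size does not exceed h0 (the paper states the entier operation).

    import numpy as np

    def auxiliary_grid(boundary_coords, h0):
        # Bounding box of the boundary nodes, side length l0 of the square Pi,
        # depth J, step size h and number of auxiliary grid points L = (2^J + 1)^2.
        xl, yb = boundary_coords.min(axis=0)
        xr, yu = boundary_coords.max(axis=0)
        l0 = max(xr - xl, yu - yb)
        J = int(np.ceil(np.log(l0 / h0) / np.log(2.0)))
        h = l0 / 2**J
        L = (2**J + 1)**2
        x0 = xl - 0.5 * (l0 - (xr - xl))   # center the square w.r.t. the bounding box
        y0 = yb - 0.5 * (l0 - (yu - yb))
        return x0, y0, l0, h, J, L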



3. The definition of the matrices $R$ and $R^{*}$, respectively.

For the matrix $R^{*}$ we use a vector having $L$ components which are defined according to section 3. Speaking in more detail, for each point $k$, $k=1,\dots,N$, of $\Omega^h$ we seek the cell $D_{ij}$, $i,j=1,\dots,2^J$, of $\Pi^h$ containing the point $k$. Provided that the nodes of $\Pi^h$ are numbered linewise from below to above, and knowing $L$, for the point $k$ the row and column indices $i$ and $j$ are computed. Then we get the number $k_p$ of a vertex in $\Pi^h$. We put $k$ into $R^{*}$ at the $k_p$-th position. The other components of $R^{*}$ are set to be zero. We get

$$R^{*}(k_p)=\begin{cases}k & \text{if } k\in\{1,\dots,N\},\\ 0 & \text{otherwise},\end{cases}\qquad k_p\in\{1,\dots,L\}.$$

The number of arithmetical operations for implementing the above calculation of the nodal point in $\Pi^h$ uniquely assigned to the vertex in $\Omega^h$ is a total proportional to $N$.
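A compact Python sketch of this construction (illustrative; the array position kp is 0-based while the fine vertices keep the 1-based numbering of the text):

    import numpy as np

    def build_Rstar(coords, x0, y0, h, J):
        # Rstar[kp] = k if the auxiliary node kp is assigned to fine vertex k, else 0.
        # Auxiliary nodes are numbered linewise from below to above.
        n = 2**J + 1
        Rstar = np.zeros(n * n, dtype=int)
        for k, (x, y) in enumerate(coords, start=1):
            i = min(int((x - x0) / h), n - 2)     # column index of the cell D_ij
            j = min(int((y - y0) / h), n - 2)     # row index of the cell D_ij
            kp = j * n + i                        # linewise numbering of the grid node
            Rstar[kp] = k
        return Rstar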

4. The determination of the hierarchical list depending on the mesh $\Pi^h$.

Let the points of the grid $\Pi^h$ be numbered linewise from below to above throughout the structured hierarchical mesh having the depth $J$. By the little subroutine called locpoint — with input the node number $k$ and the grid parameters $L$ and $J$, and output the position $j$ of $k$ in the auxiliary hierarchical list together with its two father nodes — we compute, for each input node $k$, $k=1,\dots,L$, this position and the two fathers, taking the mesh connectivity of the hierarchical grid $\Pi^h$ of depth $J$ into account. For this we need a total amount of approximately $L\cdot C_h$ numerical operations, where the constant $C_h$ depends on the given homogeneity of the unstructured grid. Because the operations of the type $Q^T$ and $Q$ (see the Yserentant and BPX formulas above, respectively) can be reduced to the simple utilization of the corresponding hierarchical list, the amount for this is equivalent to $L$ in each iteration step of the cg-method. Although the memory size of $O(L)$ words for the auxiliary Yserentant hierarchical list is considerable, we note that its determination, performed only once at the beginning of the cg-iteration, is really more effective than the utilization of nested differences of the coordinates of all of the nodes $k$ of the grid $\Pi^h$, which would have to be used twice within each iteration step for doing the same as required by the two formulas above. The auxiliary Yserentant hierarchical list can easily be extended to the BPX list of the grid $\Pi^h$ by calling the program hbbpx. The amount for implementing this is equivalent to the one given above, where correspondingly for the auxiliary BPX list a memory size of the same order is necessary.
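For illustration, a small sketch of a locpoint-style computation for the structured grid, written here in the index pair (i, j), 0 <= i, j <= 2^J, rather than the linewise number; the diagonal direction of the cells (lower left to upper right) is an assumption, as is the whole interface.

    def locpoint(i, j, J):
        # Level of the grid point (i, j) in the hierarchy of depth J and its two
        # father nodes; coarsest-level nodes have no fathers.
        n = 2**J
        def v2(m):                       # largest power of two dividing m (v2(0) = n)
            return n if m == 0 else (m & -m)
        s = min(v2(i), v2(j), n)         # s = 2**(J - level)
        level = J - s.bit_length() + 1
        if level == 0:
            return level, None, None
        if i % (2 * s) == 0:             # midpoint of a vertical edge
            return level, (i, j - s), (i, j + s)
        if j % (2 * s) == 0:             # midpoint of a horizontal edge
            return level, (i - s, j), (i + s, j)
        return level, (i - s, j - s), (i + s, j + s)   # midpoint of a cell diagonal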

5. Computing the diagonal matrix $J_h$ for also performing the Jacobi preconditioning.

At first we set the real vector $J_h(k)$, $k=1,\dots,L$, to zero. Now, for all points $k$ in the closure $\bar\Omega^h$ we want to compute a real number approximating the inverse of the corresponding main diagonal element of the auxiliary stiffness matrix $A_Q$ introduced in section 3. Then we put this number at the $k$-th position of the vector $J_h$. We implement this numerically for all of the triangles $\tau_i$, $i=1,\dots,M$, as follows.

Using the three vertices of $\tau_i$ we define the minimum rectangular union of cells $D_{ij}$ encompassing the triangle $\tau_i$. Now for all vertices $x$ of this cell union we perform the decision whether $x$ is inside or outside the closure of the triangle $\tau_i$, $i=1,\dots,M$. In the case of interior points we mark them by putting the number of the corresponding triangle into the vector positions $J_h(k)$. In the case of a vertex which is located at an edge of the triangle $\tau_i$ we also mark this location by the number of the triangle, but additionally in a specific way. The process for performing the inner-outer decision is considered in the following.

Let $P_i(t_i)$, $i=1,2,3$, be the parametrizations of the three straight lines defined by the three edges of the triangle $\tau_i$. Obviously, to get the $i$-th edge including both the start and the end vertex, we vary the real parameter $t_i$ in the interval $[0,1]$. Fixing the vertex $x$, we define the parametrized equation of the horizontal straight line to be $P_x(t_x)=x+t_x\,(1,0)^T$, where $t_x\in\mathbb{R}$. Cutting $P_x(t_x)$ with $P_i(t_i)$, $i=1,2,3$, we count the number of positive and negative signs of the parameter $t_x$, respectively. We do this whenever the horizontal straight line has a non-empty intersection with one of the straight lines $P_i(t_i)$, $i=1,2,3$, where the parameter $t_i$ is in the open interval $(0,1)$. We get

$$n_+=\mathrm{card}\{\,\mathrm{sign}(t_x)=+1:\ x+t_x(1,0)^T=P_i(t_i),\ t_i\in(0,1)\,\},$$
$$n_-=\mathrm{card}\{\,\mathrm{sign}(t_x)=-1:\ x+t_x(1,0)^T=P_i(t_i),\ t_i\in(0,1)\,\}.$$

Having the triangle domain, it is easy to see that the two sets above have either an even or an odd cardinal number. In the case of the odd number the discrete vertex $x$ is in the closure of the triangle $\tau_i$, otherwise it is outside. Naturally, if the horizontal line parameter $t_x$ is equal to zero we have $x\in\partial\tau_i$.

In the limit case, i.e. if one edge-related straight line parameter $t_i$ is equal to 0 or 1, we shift the horizontal test straight line orthogonally upward and downward using the shift vectors $(0,\varepsilon)^T$ and $(0,-\varepsilon)^T$, respectively, where the shift $\varepsilon$ is sufficiently small. Then, according to the above regulation, again testing the cut behaviour, we get the desired signs of the parameter $t_x$, putting them into the decision set. This algorithm for performing the inner-outer decision can be generalized to the case of a domain which is bounded by arbitrary piecewise parametrized boundary descriptions. In our case, taking all triangles of the unstructured grid $\Omega^h$ into account, we approximately need a numerical amount proportional to $M+L$ to do as described.
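A minimal Python sketch of this inner-outer decision (illustrative; the shift magnitude is a placeholder and should be small compared with h); tri is a sequence of three vertex coordinates and x the test point.

    def inside_or_on_triangle(x, tri, eps=1e-12):
        # Ray parity test: cast the horizontal ray x + t_x (1, 0)^T, count the signs of
        # t_x at intersections with the edge lines P_i(t_i), t_i in (0, 1); an odd count
        # of positive signs means x lies in the closure of the triangle.
        def parity(point):
            pos = 0
            for a, b in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
                dy = b[1] - a[1]
                if abs(dy) < eps:
                    continue                               # edge parallel to the ray
                t_i = (point[1] - a[1]) / dy
                if abs(t_i) < eps or abs(t_i - 1.0) < eps:
                    return None                            # limit case: shift the ray
                if not (0.0 < t_i < 1.0):
                    continue                               # no intersection with this edge
                t_x = a[0] + t_i * (b[0] - a[0]) - point[0]
                if abs(t_x) < eps:
                    return 1                               # x lies on the edge itself
                pos += t_x > 0.0
            return pos % 2
        for shift in (0.0, 1e-9, -1e-9):                   # orthogonal up/down shifts
            p = parity((x[0], x[1] + shift))
            if p is not None:
                return bool(p)
        return False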

Let us note the following. By the special marking of all of the cell points of the grid $\Pi^h$ that have remained unmarked so far and have a directly horizontal and/or vertical and/or inclined edge-connection with at least one marked interior point (which is not located at the edge of a triangle), we get the shape of the auxiliary domain $Q^h$ enclosing $\Omega^h$ in the form of steps. This marking process runs globally throughout $\Pi^h$, i.e. step by step for all the points $k$, $k=1,\dots,L$, in $\Pi^h$ the corresponding triangular seven-point star is considered making the above decision.

When the above marking is done, we may use the interim number entries in the vector $J_h$ to approximate real main diagonal values as follows, where we have the entries $a_{ii}$, $i=1,\dots,N$, of the stiffness matrix $A$ of the original problem available:

$$J_h(k)=\begin{cases}1/a_{ii} & \text{if } R^{*}(k)=i,\ i\in\{1,\dots,N\},\\[2pt] 1/\tilde a_{kk} & \text{if } R^{*}(k)=0.\end{cases}$$

When the vector component $J_h(k)$ was marked by a triangle number belonging to $k$, the value $\tilde a_{kk}$ can be computed to be e.g. the arithmetic mean

$$\tilde a_{kk}=\tfrac13\,(a_{i_1 i_1}+a_{i_2 i_2}+a_{i_3 i_3}),$$

where the values $a_{i_1 i_1}$, $a_{i_2 i_2}$ and $a_{i_3 i_3}$ are the main diagonal entries of $A$ belonging to the three nodes of the corresponding triangle whose number was previously set into the $k$-th position of $J_h(k)$. When the component $J_h(k)$ was unmarked, the zero value remains as it was set at the beginning. Moreover, we can define $\tilde a_{kk}$ to be the edge-related weighted mean

$$\tilde a_{kk}=\frac{a_{i_1 i_1}\,s_{i_2 i_3}+a_{i_2 i_2}\,s_{i_1 i_3}+a_{i_3 i_3}\,s_{i_1 i_2}}{s},\qquad s=s_{i_1 i_2}+s_{i_1 i_3}+s_{i_2 i_3},$$

where the magnitude $s_{i_m i_n}$ is the distance of the vertex $x$ in the closure of the triangle $\tau_i$ to the triangle edge which has the start point $i_m$ and the end point $i_n$, $m\ne n$. Choosing the above definition of the vector $J_h$ in comparison with the arithmetic mean definition, the convergence behaviour of the method is hardly improved. For the above two opportunities to set real values into $J_h(k)$ we need a number of operations proportional to $L$ in either case. When we apply the artificially constructed BPX preconditioner, the vector $J_h(k)$, $k=1,\dots,L$, can be extended to the corresponding BPX length with negligible amount.
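A compact sketch of this assembly for the arithmetic-mean variant (illustrative only), assuming marks[k] holds the 1-based triangle number for covered auxiliary nodes (0 otherwise), Rstar is the index vector from point 3, triangles stores 0-based vertex indices and diagA the main diagonal of A:

    import numpy as np

    def build_Jh(L, Rstar, marks, triangles, diagA):
        # J_h(k) approximates the inverse of the k-th main diagonal entry of A_Q.
        Jh = np.zeros(L)
        for k in range(L):
            if Rstar[k] > 0:                       # node assigned to fine vertex Rstar[k]
                Jh[k] = 1.0 / diagA[Rstar[k] - 1]
            elif marks[k] > 0:                     # node covered by triangle marks[k]
                i1, i2, i3 = triangles[marks[k] - 1]
                Jh[k] = 3.0 / (diagA[i1] + diagA[i2] + diagA[i3])   # inverse arithmetic mean
        return Jh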

Thus we may conclude as follows: in the case of artYs as well as in the case of artBPX the preconditioner has an optimal computational cost, i.e. the number of arithmetic operations required for its implementation is proportional to the number of unknowns in the problem.

6. If the problem includes jumping coefficients $a_{ij}(x)$, $i,j=1,2$, we perform the outer Jacobi preconditioning that corresponds to the calculation $\bar w=J^{-1}\,C^{-1}\,J^{-1}\,\bar r$. Especially if the ratio of the jumps is large, the artificial preconditioning methods would fail without doing as given. In this case the diagonal matrix $J_h(k)$ between $Q^{T}$ and $Q$ simply has the entries 1 when $k$ was marked, $k=1,\dots,L$ (see point 5), and 0 otherwise. The above outer vector $J$ of length $N$ contains the roots of the main diagonal entries of $A$.
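A small sketch of this outer scaling, assuming apply_Cinv realizes the artificial preconditioner of the previous points (illustrative names):

    import numpy as np

    def outer_jacobi_apply(r, diagA, apply_Cinv):
        # w = J^{-1} C^{-1} J^{-1} r with J holding sqrt(a_ii), i.e. the artificial
        # preconditioner is applied to the symmetrically scaled residual.
        J = np.sqrt(diagA)
        return apply_Cinv(r / J) / J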

7. To get the first parallel implementation of our methods we have done the following. Whereas for the parallelization of the classical hierarchical methods by the non-overlapping domain decomposition the communication with respect to the correction values belonging to the coupling nodes is performed at the stage marked by the symbol $\|$ in the Yserentant and BPX formulas above (see e.g. the references), in our case we cannot use this approach. Embedding the $p$ subdomain meshes $\Omega_s^h$, $s=1,\dots,p$, arising from the domain decomposition of the domain $\Omega$ into the corresponding auxiliary grids $\Pi_s^h$, $s=1,\dots,p$, we in general get an overlapping union of the auxiliary regions. Having the described situation, the corresponding vector types of the parallel cg-method can not be handled as is well known up to now (see e.g. the references therein). To overcome the difficulties that occurred when the first experiments in section 6.2 were made, we perform one communication before applying $Q^T$ and one communication after $Q$ has been completed. Hence we accumulate the correction vectors $\bar w_s$, $s=1,\dots,p$, before starting the next cg-step. By means of this strategy we get a parallelizable preconditioner. Thus, e.g. for the Yserentant hierarchical preconditioning we have

$$\bar w=\sum_{s=1}^{p}H_s\;\big\|\;R_s\,C_{h,s}^{-1}\,R_s^{*}\;\big\|\;\bar r_s,$$

where the accumulation matrices $H_s$ symbolically handle the communication with respect to the residual vectors $\bar r_s$, $s=1,\dots,p$, distributed to the $p$ processors and having $L_s$ components there.

As is shown in section 6.2, it seems that this approach gives rather bad iteration numbers when the auxiliary grids $\Pi_s^h$, $s=1,\dots,p$, really do overlap. The problem of defining the preconditioner in the case of the parallel cg-method such that the effect of the preconditioning is independent of the mesh size remains yet to be solved.
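For orientation only, a heavily simplified sketch of the accumulation step: every processor is assumed to store its local correction already scattered into a global-length array (the paper's accumulation matrices H_s act on distributed vectors of length L_s); mpi4py is used purely for illustration.

    import numpy as np
    from mpi4py import MPI

    def accumulate_correction(w_local_global):
        # Sum the subdomain contributions w_s over all p processors, which plays the
        # role of the accumulation H_s in the formula above.
        comm = MPI.COMM_WORLD
        w = np.zeros_like(w_local_global)
        comm.Allreduce(w_local_global, w, op=MPI.SUM)
        return w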

6 Numerical results

This section is divided into two subsections consisting of the numerical tests computing potential problems sequentially on a large HP workstation and in parallel using the GC PowerPlus multiprocessor computer, respectively.

The tables presented here contain the results for the cg-algorithm preconditioned by the artificially constructed Yserentant preconditioner (artYs) as well as by the artificially constructed BPX preconditioner (artBPX), both computing the itemized test examples. The subcolumn marked by "struct. grid" means that we perform computations using a coarse structured initial grid, successively refined canonically as the level depth J increases, but embedded in the corresponding auxiliary grid $\Pi^h$ consisting of L points (1). For comparison, the subcolumn marked by "unstr. grid" contains the results belonging to really unstructured grids generated by the mesh generator given in the references, having nearly the same number N of degrees of freedom. Here both the number of cg-iterations and the corresponding CPU-time in seconds are given which were needed to get the relative error of the cg-iteration below the previously defined accuracy (2). The relative error was measured in the $AC^{-1}A$-norm. In the first column, indicating the depth J, sometimes two numbers divided by the symbol / are given which differ from each other. Then the first number belongs to the auxiliary grid depth due to the canonical refinement of the structured initial mesh, and the second one is the depth of the auxiliary grid having some inhomogeneities, causing the different J by means of the computation of the triangle heights. Naturally, here we have another number L given in the corresponding row below.

At the bottom of all of the tables the percentages of the CPU-time are given which were necessary for performing the operations indicated by $R^{*}$, $R$ and the preconditioning $C_h^{-1}$ within the cg-iteration, respectively, where the third percentage also includes the amount of the cg-iteration itself. The percentages are measured on an average with respect to the given depths J of the auxiliary grids. Taking these percentages into account, we finally discover that the artificially constructed hierarchical methods, using only the nodal coordinates and the element connection, need a numerical effort which is approximately a few times more than the effort of the original hierarchical approach having a lot of additional mesh data information to be input. Therefore the application of our new methods is good practice, especially for industrial engineering.

(1) In every table, in the columns marked by "struct. grid", the added brackets (set in scriptsize) include the iteration number and the corresponding CPU-time for the real structured hierarchical methods.

(2) In the given CPU-time neither the times for computing the hierarchical lists of the auxiliary grid $\Pi^h$ and the step form approximation $Q^h$ inside $\Pi^h$, nor the time for considering the support of the corresponding grid functions with respect to the boundary conditions on $\Gamma^h$ are incorporated. In practice this hidden amount does enlarge the real CPU-time substantially.

We do not hide the following, which we observed e.g. when computing some of the examples: the more the unstructured meshes lose their quasi-uniformity, the more the iteration numbers of the corresponding preconditioned cg-method increase. But this is in accordance with our theory. For the approach to get rid of the behaviour caused by locally refined grids see e.g. the references.

6.1 Sequential Computing

The results are computed by means of an HP K-class workstation with a large memory (in the GigaByte range) and a corresponding average MFlop performance. The executable programs are called pmhiartYsHPPApx in the case of the artificially performed Yserentant hierarchical preconditioning and pmhiartBPXHPPApx in the BPX case, respectively. The information about the software background of these packages, including tools of the pre- and postprocessing, is contained e.g. in the references.

1. Preconditioning having the potential problem in the square.

The potential equation $-\Delta u=0$ is solved in the square $\Omega$, with two different constant Dirichlet values prescribed on two parts of the boundary (the jumping boundary condition mentioned below) and with the homogeneous Neumann condition $\partial u/\partial N=0$ on the remaining part of $\Gamma$.

[Table 1: cg-iterations and CPU-times for the computing in the square. Columns: J, N, L; artYs (struct. grid, unstr. grid); artBPX (struct. grid, unstr. grid); bottom rows: average CPU-time percentages for $R^{*}$, $R$ and $C_h^{-1}$; "mem. ex." = memory exceeded.]

Figure 3: The structured mesh (N=25) and the unstructured mesh (N=82).

Preconditioning having the potential problem in the club shaped domain

u in fx x x g

x marked by in Figure

u

x marked by in Figure

u N on n

Figure 4: The structured mesh (N=16) and a subsequence of the unstructured meshes (N=47, N=158, N=563).

[Table 2: cg-iterations and CPU-times for the computing in the club-shaped domain; columns as in Table 1.]

3. Preconditioning having the problems (a) and (b) in the circular domain.

The equation $-\mathrm{div}(a(x)\,\mathrm{grad}\,u(x))=0$ is solved in the circular domain $\Omega=\{x:\ x_1^2+x_2^2<r^2\}$, where in case (a) the coefficient is $a(x)\equiv 1$ in $\Omega$ and in case (b) $a(x)$ is piecewise constant with a jump in the subregion marked in Figure 5. Dirichlet data $u=g_1(x)$ are prescribed on a part of the circular boundary, and the homogeneous Neumann condition $\partial u/\partial N=0$ holds on the remaining part.

Figure 5: The structured mesh (N=41) and the unstructured mesh (N=40).

[Table 3: cg-iterations and CPU-times for the homogeneous problem (a) in the circular domain; columns as in Table 1.]

[Table 4: cg-iterations and CPU-times for the material problem (b) in the circular domain; columns as in Table 1.]

Obviously this test example and also the first one have the peculiarity of the given jumping boundary condition in common. Naturally, the real unstructured meshes for computing the inhomogeneous problem (b) were also generated by the advancing front algorithm given in the references, where the interfaces can be taken into account. To abbreviate this section we renounce presenting the corresponding grids.

4. Preconditioning having the problems (a) and (b) in the "SFB" domain.

The equation $-\mathrm{div}(a(x)\,\mathrm{grad}\,u(x))=0$ is solved in the "SFB" domain (see the isoline picture in Figure 6), where in case (a) the coefficient is $a(x)\equiv 1$ in $\Omega$ and in case (b) $a(x)$ is piecewise constant, taking three different values in the letters "S", "F" and "B". A given function of $x_1$ and $x_2$ is prescribed as Dirichlet datum on the exterior part of $\Gamma$, and $\partial u/\partial N=0$ holds on the interior boundary pieces.

For this problem a really unstructured mesh was already shown in one of the figures above. For completing the structured part of the tables below we used a coarse structured initial mesh, consecutively refining it canonically.

[Table 5: cg-iterations and CPU-times for the homogeneous problem (a) in the SFB domain; columns as in Table 1.]

[Table 6: cg-iterations and CPU-times for the material problem (b) in the SFB domain; columns as in Table 1.]

Figure 6: The filled "SFB" isoline picture ("SFB"-Isolines, 4 Level) delivered by our postprocessing.

5. Magnetic field computation in an electric motor.

This example is of important practical interest (cf. also the references for details). The domain is the fourth of the cross section of an electric motor in which the magnetic field has to be calculated. The following Figure 7 presents the motor's geometry with its distinct material properties, additionally connected with geometric peculiarities which cause solution singularities in several indicated points $P_i$ (see also the references for more details).

The legend of Figure 7 lists the absolute permeability [Vs/Am], the relative permeabilities and the given materials: (a) iron rotor, (b), (c) permanent magnet, (d) sheet-metal shell, (e) air gap; the points $P_i$ mark the geometric peculiarities.

Figure 7: The fourth of the cross section of the electric motor containing the materials (a)-(e).

By means of Maxwell's laws the magnetic field problem defined on the motor's cross section can be rewritten in the following variational formulation (cf. also the references): find the function $u\in\overset{o}{H}{}^1(\Omega)$ such that for all $v\in\overset{o}{H}{}^1(\Omega)$ there holds

$$\int_{\Omega}\frac{1}{\mu}\,\nabla^{T}u\,\nabla v\;dx=\int_{\Omega}\frac{1}{\mu}\Big(B_x^{r}\,\frac{\partial v}{\partial x_2}-B_y^{r}\,\frac{\partial v}{\partial x_1}\Big)\,dx,$$

where $\mu$ is the permeability and $B_x^{r}$ and $B_y^{r}$ denote the remanent inductions of the permanent magnet in $x_1$- and $x_2$-direction, respectively.

Because of the complicated inner geometry no structured grids for discretizing this domain are available. Moreover, as can already be seen in Figure 8, we hint at the fact that by the automatic mesh generator given in the references the unstructured mesh can be initially adapted to the given point singularities.

[Table 7: cg-iterations and CPU-times for the magnetic field problem; columns as in Table 1.]

Figure 8: The adaptive mesh of the motor's fourth (N=469), losing the quasi-uniformity.

For problems with inhomogeneous coefficient functions $a_{ij}(x)$, $i,j=1,2$, having jumps we used the outer Jacobi preconditioning described in point 6 of the previous section. Moreover, if all of the interfaces defined by the different material properties coincide with edges of the auxiliary mesh $\Pi^h$, no difficulties occurred when nevertheless using the auxiliarily defined inner Jacobi preconditioning. Hence we may conclude that this inner Jacobi preconditioner becomes inadequately disturbed when the interface approximation is performed in the form of steps. Therefore we propose the shifting of appropriately chosen nodes of the domain $Q^h$ to the interfaces to overcome the described difficulties in another way, which may be even more successful. Computing the inhomogeneous problems, the weaker increase of the iteration numbers starting at a certain stage of J, observed e.g. in the tables above, is due to the better approximation of the interfaces as it becomes more precise in the form of steps.

6.2 First results of the parallel computing

To get the results of this subsection we used the well-known Parsytec parallel computer GC PowerPlus, having some MBytes of memory at each processor node and a corresponding peak MFlop performance. The programs are called pmhiartYsppcpx in the case of the artificially performed Yserentant hierarchical preconditioning and pmhiartBPXppcpx in the BPX case, respectively. For more details describing the related software tools see also the references.

The next three examples are computed in parallel using p processors in each case. The domain decompositions used as the basis of the parallelization in the case of the structured grid calculation are given according to the meshes presented in Figures 3 and 4, respectively. Computing in the square we have the subsquares consisting of the two initial triangles shown in Figure 3. Computing in the club-shaped domain we have the subtriangles given at the lower left of Figure 4 as the subdomains for the DD-based parallelism. For the parallelization of the real unstructured grid computations, especially in the case of the magnetic field calculation, we applied the FE data distribution of the meshes after their generation was done by the parallel mesh generator given in the references. Here the parameter $L$ is the sum $L=\sum_{s=1}^{p}L_s$, and for $J$ there holds $J=\max\{J_s,\ s=1,\dots,p\}$.

The decomposed problem no. 1 of the previous subsection.

[Table 8: cg-iterations and CPU-times for problem 1, computed with the subsquares and with the FE data distribution, respectively; columns as in Table 1.]

The decomposed problem no. 2 of the previous subsection.

[Table 9: cg-iterations and CPU-times for problem 2, computed with the subtriangles and with the FE data distribution, respectively; columns as in Table 1.]

The parallel magnetic field computation in the fourth of the motor.

[Table 10: cg-iterations and CPU-times for problem 5, where the FE data distribution was made; columns as in Table 1.]

Finally, let us give the following remarks comparing the results of the three tables presented here. If all of the subdomain meshes $\Omega_s^h$, $s=1,\dots,p$, into which the whole mesh $\Omega^h$ is decomposed coincide with the auxiliary square grids $\Pi_s^h$, $s=1,\dots,p$, the computation in parallel is very efficient, as was expected (see the first of the tables above). Otherwise the step form approximation of the coupling boundaries defined by the domains $Q_s^h$, which are subsets of the overlapped grids $\Pi_s^h$, $s=1,\dots,p$, deteriorates the convergence of the preconditioned parallel cg-method substantially. In comparison with the results of the parallel artYs method, the worse iteration numbers of the parallel version of artBPX are caused by the weakness that in the latter case the communication is, up to now, only performed with respect to the coupling nodes assigned to the finest level zone of the artificial BPX list. We are seeking a remedy to recover the fast convergence of the artificially preconditioned cg-methods also in the general case of their parallelization.

References

T. Apel: SPC-PM Po 3D — Users manual. Preprint SPC, TU Chemnitz-Zwickau, December.

T. Apel, F. Milde and M. Theß: SPC-PM Po 3D — Programmers manual. Preprint SPC, TU Chemnitz-Zwickau, December.

G.P. Astrachanzev: Fictitious domain method for the second-order elliptic equation with natural boundary conditions. Zh. Vychisl. Mat. Mat. Fiz.

J.P. Aubin: Approximation of Elliptic Boundary-value Problems. Wiley-Interscience, New York, London, Sydney, Toronto.

R.E. Bank and J. Xu: An algorithm for coarsening unstructured meshes. Numer. Math., to appear.

R.E. Bank and J. Xu: The hierarchical basis multigrid method and incomplete LU decomposition. Contemporary Mathematics.

F.A. Bornemann and H. Yserentant: A basic norm equivalence for the theory of multilevel methods. Numer. Math.

J.H. Bramble, J.E. Pasciak and J. Xu: Parallel multilevel preconditioners. Math. Comp.

T.F. Chan and B.F. Smith: Domain Decomposition and Multigrid algorithms for elliptic problems on unstructured meshes. Contemporary Mathematics.

Ph. Ciarlet: The Finite Element Method for Elliptic Problems. North-Holland, Amsterdam.

G. Globisch: PARMESH — a parallel mesh generator. Parallel Computing.

G. Globisch: Robuste Mehrgitterverfahren für einige elliptische Randwertaufgaben in zweidimensionalen Gebieten. Dissertation, Technische Universität Chemnitz.

G. Haase, U. Langer and A. Meyer: Parallelisierung und Vorkonditionierung des CG-Verfahrens durch Gebietszerlegung. In: G. Bader, R. Rannacher, G. Wittum (eds.), Numerische Algorithmen auf Transputer-Systemen, Teubner-Skripten zur Numerik, Teubner-Verlag, Stuttgart.

G. Haase, T. Hommel, A. Meyer and M. Pester: Bibliotheken zur Entwicklung paralleler Algorithmen. Preprint SPC, TU Chemnitz-Zwickau, June.

B. Heise: Analysis of a fully discrete finite element method for a nonlinear magnetic field problem. SIAM J. Numer. Anal.

M. Jung: Parallelization of multigrid methods based on domain decomposition ideas. Preprint SPC, TU Chemnitz-Zwickau, November.

R. Kornhuber and H. Yserentant: Multilevel methods for elliptic problems on domains not resolved by the coarse grid. In: D.E. Keyes, J. Xu (eds.), Domain decomposition for PDEs, Contemporary Mathematics.

A.M. Matsokin: Extension of mesh functions with norm preservation. In: Variational Methods of Numerical Analysis, Comp. Centre, Siberian Branch of Acad. Sci. of the USSR, Novosibirsk (in Russian).

A.M. Matsokin: Solution of grid equations on non-regular grids. Preprint, Comp. Centre, Siberian Branch of Acad. Sci. of the USSR, Novosibirsk (in Russian).

A.M. Matsokin and S.V. Nepomnyaschikh: The fictitious domain method and explicit continuation operators. Zh. Vychisl. Mat. Mat. Fiz.

A. Meyer: A parallel preconditioned conjugate gradient method using domain decomposition and inexact solvers on each subdomain. Computing.

A. Meyer and M. Pester: Verarbeitung von Sparse-Matrizen in Kompaktspeicherform KLZ/KZU. Preprint SPC, TU Chemnitz-Zwickau, June.

S.V. Nepomnyaschikh: Fictitious space method on unstructured meshes. East-West J. Numer. Math.

S.V. Nepomnyaschikh: Mesh theorems of traces, normalization of function traces and their inversion. Sov. J. Numer. Anal. Math. Model.

S.V. Nepomnyaschikh: Method of splitting into subspaces for solving elliptic boundary value problems in complex-form domains. Sov. J. Numer. Anal. Math. Model.

S.V. Nepomnyaschikh: Optimal multilevel extension operators. Preprint SPC, TU Chemnitz-Zwickau, March.

S.V. Nepomnyaschikh: Preconditioning operators on unstructured grids. Preprint, Weierstraß-Institut für Angewandte Analysis und Stochastik, Berlin.

L.A. Oganesyan and L.A. Ruchovets: Variational Difference Methods for Solving Elliptic Equations. Izdat. Akad. Nauk Arm. SSR, Erevan (in Russian).

P. Oswald: Multilevel Finite Element Approximation: Theory and Application. B.G. Teubner, Stuttgart.

M. Pester and S. Rjasanow: A parallel version of the preconditioned conjugate gradient method. Preprint SPC, TU Chemnitz-Zwickau, June; Journal of Num. Lin. Alg. with Appl.

W. Queck (ed.): FEMGP — Finite Element Multigrid Package, Programmdokumentation. Technologieberatungszentrum Parallele Informationsverarbeitung GmbH (TBZ-PARIV), Bernsdorfer Str., Chemnitz.

J. Xu: Iterative methods by space decomposition and subspace correction. SIAM Review.

J. Xu: The auxiliary space method and optimal multigrid preconditioning techniques for unstructured grids. Submitted to Computing.

G.N. Yakovlev: On traces of piecewise smooth surfaces of functions from the space $W_p^l$. Mat. Sbornik.

H. Yserentant: On the multilevel splitting of finite element spaces. Numer. Math.

Other titles in the SFB 393 series:

V. Mehrmann, H. Xu: Choosing poles so that the single-input pole placement problem is well conditioned. January.

T. Penzl: Numerical solution of generalized Lyapunov equations. January.

M. Scherzer, A. Meyer: Zur Berechnung von Spannungs- und Deformationsfeldern an Interface-Ecken im nichtlinearen Deformationsbereich auf Parallelrechnern. March.

Th. Frank, E. Wassen: Parallel solution algorithms for Lagrangian simulation of disperse multiphase flows. Proc. of 2nd Int. Symposium on Numerical Methods for Multiphase Flows, ASME Fluids Engineering Division Summer Meeting, July, San Diego, CA, USA. June.

P. Benner, V. Mehrmann, H. Xu: A numerically stable, structure preserving method for computing the eigenvalues of real Hamiltonian or symplectic pencils. April.

P. Benner, R. Byers, E. Barth: HAMEV and SQRED: Fortran Subroutines for Computing the Eigenvalues of Hamiltonian Matrices Using Van Loan's Square Reduced Method. May.

W. Rehm (Ed.): Portierbare numerische Simulation auf parallelen Architekturen. April.

J. Weickert: Navier-Stokes equations as a differential-algebraic system. August.

R. Byers, C. He, V. Mehrmann: Where is the nearest non-regular pencil? August.

Th. Apel: A note on anisotropic interpolation error estimates for isoparametric quadrilateral finite elements. November.

Th. Apel, G. Lube: Anisotropic mesh refinement for singularly perturbed reaction diffusion problems. November.

B. Heise, M. Jung: Scalability, efficiency, and robustness of parallel multilevel solvers for nonlinear equations. September.

F. Milde, R. A. Römer, M. Schreiber: Multifractal analysis of the metal-insulator transition in anisotropic systems. October.

R. Schneider, P. L. Levin, M. Spasojevic: Multiscale compression of BEM equations for electrostatic systems. October.

M. Spasojevic, R. Schneider, P. L. Levin: On the creation of sparse Boundary Element matrices for two dimensional electrostatics problems using the orthogonal Haar wavelet. October.

S. Dahlke, W. Dahmen, R. Hochmuth, R. Schneider: Stable multiscale bases and local error estimation for elliptic problems. October.

B. H. Kleemann, A. Rathsfeld, R. Schneider: Multiscale methods for Boundary Integral Equations and their application to boundary value problems in scattering theory and geodesy. October.

U. Reichel: Partitionierung von Finite-Elemente-Netzen. November.

W. Dahmen, R. Schneider: Composite wavelet bases for operator equations. November.

R. A. Römer, M. Schreiber: No enhancement of the localization length for two interacting particles in a random potential. December. To appear in Phys. Rev. Lett., March.

G. Windisch: Two-point boundary value problems with piecewise constant coefficients: weak solution and exact discretization. December.

M. Jung, S. V. Nepomnyaschikh: Variable preconditioning procedures for elliptic problems. December.

P. Benner, V. Mehrmann, H. Xu: A new method for computing the stable invariant subspace of a real Hamiltonian matrix or Breaking Van Loan's curse? January.

B. Benhammouda: Rank-revealing top-down ULV factorizations. January.

U. Schrader: Convergence of Asynchronous Jacobi-Newton-Iterations. January.

U.-J. Görke, R. Kreißig: Einflußfaktoren bei der Identifikation von Materialparametern elastisch-plastischer Deformationsgesetze aus inhomogenen Verschiebungsfeldern. March.

U. Groh: FEM auf irregulären hierarchischen Dreiecksnetzen. March.

Th. Apel: Interpolation of non-smooth functions on anisotropic finite element meshes. March.

Th. Apel, S. Nicaise: The finite element method with anisotropic mesh grading for elliptic problems in domains with corners and edges.

L. Grabowsky, Th. Ermer, J. Werner: Nutzung von MPI für parallele FEM-Systeme. March.

The complete list of current and former preprints is available via
http://www.tu-chemnitz.de/sfb/sfbpr.html