Fuzzy Identification from a Grey Box Modeling Point of View

P. Lindskog

Linköping University, Linköping

Introduction

The design of mathematical models of complex, real-world, and typically nonlinear systems is essential in many fields of science and engineering. The developed models can be used, e.g., to explain the behavior of the underlying system as well as for prediction and control purposes. A common approach for building mathematical models is so-called black box modeling (Ljung; Söderström and Stoica), as opposed to more traditional physical modeling (or white box modeling), where everything is considered known a priori from physics. Strictly speaking, a black-box model is designed entirely from data, using no physical or verbal insight whatsoever. The structure of the model is chosen from families that are known to be very flexible and successful in past applications. This also means that the model parameters lack physical or verbal significance; they are tuned just to fit the observed data as well as possible.

The term black-box modeling is sometimes used almost as a synonym for system identification, although a much more convenient definition, and the one often used, is that system identification is the theory of designing mathematical models of dynamical systems from observed data. Hence, by combining the black-box approach with physical or verbal modeling in such a way that certain prior knowledge from the system is taken into account, we end up with special identification procedures that are commonly referred to as grey box modeling approaches (see, e.g., Bohlin; Hangos).

Two important facts make such methods intuitively appealing:

- In a real-world modeling situation we never have complete process knowledge. There are always uncertain factors affecting the system, thus indicating that a complete physical model can hardly ever be constructed. However, uncertain factors can be revealed through experiments and, at least partly, taken care of by employing sufficiently flexible model families.

- The modeling procedure, on the other hand, allows us to restrict the flexibility to comply with the prior knowledge. This makes it possible to follow, at least partly, another basic identification principle, namely to only estimate what is still unknown.

Traditional grey box approaches assume that the structure of the model is given directly as a parameterized mathematical function, which at least partly is based on physical principles. However, for many real-world systems a great deal of information is provided by human experts, who do not reason in such mathematical terms but instead describe the system verbally through vague or imprecise statements. For example, in case it is hard to design a suitable mathematical model of a heating system, an important part of its behavior can still be characterized, e.g., through

    If more energy is supplied to the heater element, then the temperature will increase.

Because so much human knowledge and expertise come in terms of verbal rules, a sound engineering approach is to try to integrate such linguistic information into the identification process. A convenient and common way of doing this is to use fuzzy logic concepts in order to cast the verbal knowledge into a conventional mathematical representation (a model structure), which subsequently can be fine-tuned using input-output data. It turns out that the structure so obtained can very well be viewed as a layered network having much in common with an ordinary neural network (see, e.g., Brown and Harris; Haykin; Roger Jang and Sun; Lin; Lee; Chen). As a matter of fact, the kinship is so evident that many researchers refer to this approach as neuro-fuzzy modeling.

With this in mind, the palpable question is: what is conceptually gained by this approach compared to standard black-box neural network modeling?

- Firstly, and contrary to neural networks, neuro-fuzzy modeling (or just fuzzy modeling) offers a high-level, structured, and convenient way of incorporating linguistic prior knowledge into the models.

- Secondly, the basic linguistic knowledge entered is of the form "speed(t) is high". In fuzzy modeling such a proposition is given a precise mathematical meaning through a basis function (membership function) having parameters associated with the property "high", thus meaning that the parameters can be assigned reasonable initial values. This is important in that the parameter estimation algorithm, which often is iterative, can be started from a point where the risk of getting stuck in an undesired local minimum is reduced compared to if the initial parameters are chosen at random, which often is the case for neural networks.

- Thirdly, physically unsound regions can be avoided. By randomly choosing initial parameter values in a neural network this cannot be guaranteed, and although regularization (see below) is applied in the estimation phase, basis functions corresponding to unsound regions are seldom removed from the final model, which then becomes more complex than necessary.

- A fourth potential advantage comes in terms of extrapolation capabilities. While data can be used to explain certain system features, the linguistic expert knowledge (here the rules) can be employed to pick up other phenomena that are not revealed in the available data.

- Finally, the human expert who supplied the verbal knowledge can always be consulted for model validation.

This contribution concentrates on how to maintain these advantages when fuzzy modeling is complemented with system identification techniques. More precisely, the aim is to provide answers to a number of central questions of grey-box type:

- What kind of mathematical rule base interpretation is suited when system identification aspects are also taken into account?

- What parameter estimation algorithms should be used?

- How can the knowledge provided by the domain expert, i.e., the meaning of the rule base, be preserved throughout the parameter estimation step?

- How can different nonstructural system features be built into the models? By nonstructural knowledge we mean, e.g., that the step response is known to be monotone, that the steady-state gain curve is monotonic in certain input variables, or some other qualitative property.

To be able to address these issues, we first give a brief introduction to the field of parametric system identification, focusing mainly on basic concepts, ideas, and algorithms from which the following sections can depart. The subsequent section addresses various fuzzy modeling matters. It is argued that a Mamdani type of rule base interpretation (Mamdani and Assilian; Roger Jang and Sun) is suited when the rules are of the form considered here and when identification aspects are also accounted for. The remaining three main questions from above are then considered and answered, whereupon the usefulness of the suggested framework is illustrated on a real-world laboratory-scale application example. Some practical aspects of the proposed modeling approach are thereafter discussed, and we finally put forward some concluding remarks and give a few directions for further research within the fuzzy identification area.

Footnote: In fact, the considered model representation turns out to be structurally equivalent to a zero-order Takagi-Sugeno fuzzy structure (Takagi and Sugeno; Sugeno and Kang), which is just a special case of the general Takagi-Sugeno fuzzy model family.

System identification

Basic ingredients and notation

System identification deals with the problem of how to infer relationships between past input-output measurements and future outputs (Ljung; Söderström and Stoica). In practice this is a procedure that is highly iterative in nature and is made up of three main ingredients: the data, the model structure, and the selection criterion, all of which include choices that are subject to personal judgment.

The data $Z^N$. By the row vector

\[ z(t) = \begin{bmatrix} y(t) & u_1(t) & \ldots & u_m(t) \end{bmatrix} \in \mathbb{R}^{1+m} \]

we denote one particular data sample at time $t$ collected from a system having one output and $m$ input signals, i.e., we consider a multi-input single-output (MISO) system. This restriction is mainly for ease of notation, and the extension to multi-output (MIMO) systems is fairly straightforward (see Ljung; Lee; Wang). Stacking $N$ consecutive samples on top of each other gives the data matrix

\[ Z^N = \begin{bmatrix} z^T(1) & z^T(2) & \ldots & z^T(N) \end{bmatrix}^T \in \mathbb{R}^{N \times (1+m)}. \]

It is of course crucial that the data reflect the important features of the underlying system. This will typically be the case if the input signals are sufficiently exciting and if large enough data sets are collected. However, such a situation is unrealistic in many real-world applications since, firstly, the experimental time is limited and, secondly, many of the inputs are restricted to certain signal classes. Having to live with this reality, it is worth stressing that the problem of having incomplete data can be alleviated considerably by building various prior system properties into the models, or rather into the applied model structure.

The model structure $g(\varphi(t), \theta)$. It is generally agreed that the single most difficult step in identification is that of model structure selection. Roughly speaking, the problem can be divided into three subproblems. The first one is to specify the type of model set to use. This involves the selection between linear and nonlinear representations, between black-box, grey-box, and physically parameterized approaches, and so forth. The next issue is to decide the size of the model set. This includes the choice of possible variables (inputs and outputs) and combinations of variables to use in the models. It also involves fixing orders and degrees of the chosen model types, often to some intervals. The last item to consider is how to parameterize the model set: what basis functions should be used, how should these be parameterized, etc. With the type already determined here, we will in the sequel focus on the latter two issues.

Mathematically speaking, a quite general MISO predictor family, or model structure, is

\[ \hat{y}(t|\theta) = g(\varphi(t), \theta) \in \mathbb{R}, \]

where $\hat{y}(t|\theta)$ accentuates that the function $g$ is a predictor, i.e., it is based on signals that are known at time $t-1$. The predictor structure is ensured by the regressor $\varphi(t)$, which maps output signals up to index $t-1$ and input signals up to index $t-1$ to an $r$-dimensional regression vector. This vector is often of the form

\[ \varphi(t) = \begin{bmatrix} y(t-1) & \ldots & y(t-k_y) & u_1(t-1) & \ldots & u_1(t-k_1) & \ldots & u_m(t-1) & \ldots & u_m(t-k_m) \end{bmatrix}^T, \]

although in general its entries can be any combinations of input-output signals known at time $t-1$. The mapping $g$ from $\mathbb{R}^r$ to $\mathbb{R}$ is parameterized by $\theta \in D \subset \mathbb{R}^d$, with the set $D$ denoting the set of values over which $\theta$ is allowed to range due to parameter restrictions. With this formulation, the work on finding a suitable model structure splits naturally into two subtasks, both possibly being nonlinear in nature:

- the choice of dynamics, i.e., the choice of regression vector $\varphi(t)$, followed by

- the choice of static mapping $g(\cdot, \cdot)$.
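
As an illustration of the first subtask, the following minimal sketch builds lagged regression vectors from recorded data. It assumes that every regressor is delayed by at least one sample and that all inputs share the same number of lags; the function name `build_regressors` and the arguments `k_y` and `k_u` are hypothetical and not taken from the paper.

```python
import numpy as np

def build_regressors(y, u, k_y, k_u):
    """Stack phi(t) = [y(t-1..t-k_y), u_1(t-1..t-k_u), ..., u_m(t-1..t-k_u)].

    y has shape (N,), u has shape (N,) or (N, m). Returns the regression
    matrix (one row per usable time index) and the matching outputs y(t).
    The common input lag k_u is an assumption made for brevity.
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float).reshape(len(y), -1)  # force shape (N, m)
    k = max(k_y, k_u)                                   # first usable index
    rows = []
    for t in range(k, len(y)):
        lagged_y = [y[t - i] for i in range(1, k_y + 1)]
        lagged_u = [u[t - i, j]
                    for j in range(u.shape[1])
                    for i in range(1, k_u + 1)]
        rows.append(lagged_y + lagged_u)
    return np.array(rows), y[k:]
```

Each row of the returned matrix is one regression vector, so the static mapping $g$ can afterwards be fitted as an ordinary function from rows to targets.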

The selection criterion $V_N(\theta, Z^N)$. Measured and model outputs never match perfectly in practice but differ as

\[ \varepsilon(t|\theta) = y(t) - \hat{y}(t|\theta), \]

where $\varepsilon(t|\theta)$ is an error term reflecting unmodeled dynamics on the one hand and noise on the other. An obvious modeling goal must be that this discrepancy is small in some sense. This is achieved by the selection criterion, which ranks different models according to some predetermined cost function. The selection criterion can come in several shapes, although we will here start off with the usual quadratic measure of the fit between measured and predicted values, i.e., with

\[ V_N(\theta, Z^N) = \frac{1}{N} \sum_{t=1}^{N} \bigl( y(t) - \hat{y}(t|\theta) \bigr)^2 = \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2(t|\theta). \]

Once these three issues are settled, we have in principle defined the searched-for model. It then only remains to estimate the parameters $\theta$ and to decide whether the model is good enough or not. If the model cannot be accepted, some or even all of the entities above have to be reconsidered; in the worst conceivable case one must start from the very beginning and collect new data.

Nonlinear model structures: The series expansion approach

In the introduction we stated that fuzzy models have much in common with neural networks. As a matter of fact, these and many other nonlinear modeling approaches can be viewed as series expansions. Adopting the ideas of the comprehensive and unifying work of Sjöberg et al., such a function expansion can be written

\[ \hat{y}(t|\theta) = g(\varphi(t), \theta) = \sum_{j=1}^{n} \alpha_j \, g_j(\varphi(t), \beta_j, \gamma_j), \qquad \theta = \begin{bmatrix} \alpha_1 & \ldots & \alpha_n & \beta_1^T & \ldots & \beta_n^T & \gamma_1^T & \ldots & \gamma_n^T \end{bmatrix}^T, \]

where, sometimes with abuse of notation, we call a $g_j(\cdot, \cdot, \cdot)$ a basis function. These are usually rather simple, and typically they are all of one single type. The basis functions are also local in the sense that each $g_j(\cdot, \cdot, \cdot)$ essentially covers a certain part of the total regression space. Which part is specified by the parameters $\beta_j$ and $\gamma_j$, where $\beta_j$ is related to the scale or direction of the basis function and $\gamma_j$ specifies its position or translation. The remaining parameter $\alpha_j$ is a coordinate parameter (a weight) giving the basis function its final amplitude (shape).

The basic difference from one series expansion approach to another is the choice of basis functions. In principle there are three fundamentally different ways of generalizing simple univariate basis functions to multivariate ones.

Ridge construction. A ridge basis function has the form

\[ g_j(\varphi(t), \beta_j, \gamma_j) = \kappa\bigl( \beta_j^T \varphi(t) + \gamma_j \bigr), \]

where $\kappa(\cdot)$ is a function of one variable having parameters $\beta_j \in \mathbb{R}^r$ and $\gamma_j \in \mathbb{R}$. Notice that the ridge nature has nothing to do with the choice of $\kappa(\cdot)$. It is attributed to the fact that $\kappa(\cdot)$ is constant for all regression vectors in the subspace where $\beta_j^T \varphi(t)$ is constant, thus forming a ridge along that direction (see Fig.). With $n$ weighted ridge basis functions, the dimension of $\theta$ becomes $n(r+2)$. Typical examples of this family are feedforward neural networks with one hidden layer (Kung; Haykin; Ljung et al.) and hinging hyperplane models (Breiman; Pucar and Sjöberg a; Pucar and Sjöberg b).

Fig. From left to right: ridge construction, radial construction, and composition.

Radial construction. Radial basis functions do not show the ridge directional property but have true local support, as is illustrated in Fig. Such a radial support can be obtained by using basis functions of the form

\[ g_j(\varphi(t), \beta_j, \gamma_j) = \kappa\bigl( \| \varphi(t) - \gamma_j \|_{\beta_j} \bigr), \]

where the weighted norm $\| \cdot \|_{\beta_j}$ specifies the scaling of the basis function. In general $\beta_j$ is a positive semidefinite and symmetric matrix of dimension $r \times r$, although quite often it is chosen to be a scaled identity matrix. This means that the dimension of $\theta$ is at least $n(r+2)$ and at most $n(r^2 + r + 1)$ if every entry of $\beta_j$ is counted as a parameter, or $n\bigl(1 + r + r(r+1)/2\bigr)$ if the symmetry of $\beta_j$ is exploited. Popular choices within this category are kernel estimators (Watson), radial basis function networks, RBFN (Poggio and Girosi; Chen and Billings), and wavelet networks (Zhang and Benveniste).

Composition. A composition (tensor product in Sjöberg et al.) is obtained whenever ridge and radial constructions are combined when forming the basis functions. A typical example is shown in the rightmost plot of Fig. The most extreme composition is

\[ g_j(\varphi(t), \beta_j, \gamma_j) = \prod_{k=1}^{r} g_{jk}\bigl( \varphi_k(t), \beta_{jk}, \gamma_{jk} \bigr), \]

where each $g_{jk}(\cdot, \cdot, \cdot) \in \mathbb{R}$ is either a ridge or a radial function. In a more general setting such an element need not live in $\mathbb{R}$ but can be defined in any subspace of $\mathbb{R}^r$. If all $n$ basis functions are of this commonly encountered product form, then it is easy to verify that the dimension of $\theta$ becomes $n(2r+1)$. Within this model class we find certain regression tree approaches (Breiman et al.; Strömberg et al.) and, as will be discussed in the following section, the kind of fuzzy identifiers considered in this contribution.
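
The three constructions can be sketched as follows. This is only an illustration under stated assumptions: the mother function is taken to be `tanh` for the ridge case and a Gaussian for the radial and composition cases, and all function names are hypothetical rather than taken from the paper.

```python
import numpy as np

def ridge_basis(phi, beta, gamma, kappa=np.tanh):
    """Ridge: kappa(beta^T phi + gamma), constant wherever beta^T phi is constant."""
    return kappa(beta @ phi + gamma)

def radial_basis(phi, beta, gamma):
    """Radial: Gaussian of the weighted norm (phi - gamma)^T beta (phi - gamma)."""
    diff = phi - gamma
    return np.exp(-diff @ beta @ diff)

def composition_basis(phi, beta, gamma):
    """Composition: product of r univariate Gaussians, one per regressor."""
    return np.prod(np.exp(-(beta * (phi - gamma)) ** 2))

def series_expansion(phi, alpha, basis, params):
    """Weighted sum of n basis functions; params holds one (beta_j, gamma_j) per term."""
    return sum(a * basis(phi, b, g) for a, (b, g) in zip(alpha, params))
```

For instance, `ridge_basis` returns the same value for any two regression vectors that differ only in directions orthogonal to `beta`, which is exactly the ridge property described above.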

General parameter estimation techniques

After having determined the type of basis functions to apply, the next step is to use input-output data to estimate what is still unknown. It is here useful to distinguish the estimation needs by the kind of parameters involved in the models. The following three categories can be identified.

Structure estimation. This is the case when the type of basis functions to use has been decided, but where the size, i.e., the number of basis functions $n$ to employ, is estimated. Selecting the $r$ best regressors out of a set of possible regressors is a typical example in this category. It should be noted that structure estimation can often be viewed as a combinatorial optimization problem whose complexity grows exponentially, e.g., with the number of regressors. This means that exhaustive algorithms soon become impractical, which has motivated schemes that provide, if not optimal, then at least good enough solutions. However, another way to reduce the complexity is to use prior structural system knowledge, as is the case when a grey box approach is adopted.

Nonlinear-in-the-parameters estimation. Having decided the size of the model structure, it remains to find reasonable parameter values. With the scalar loss function $V_N(\theta, Z^N)$ as the performance criterion, the parameter estimate is given by

\[ \hat{\theta}_N = \arg\min_{\theta \in D} V_N(\theta, Z^N), \]

where $\arg\min$ is the operator that returns the argument that minimizes the loss function. This is a very important and well-known problem formulation leading to prediction error minimization (PEM) methods. The type of PEM algorithm to apply depends on whether the parameters enter the model structure in a linear or a nonlinear way. The latter situation leads to a nonlinear least-squares problem and appears whenever the model structure contains unknown direction $\beta_j$ or translation $\gamma_j$ parameters.

Linear-in-the-parameters estimation. In case all parameters enter the structure in a linear fashion, one usually talks about a linear least-squares problem. For the series expansion, such an approach is applicable if only coordinate parameters $\alpha_j$ are to be estimated.

It should be emphasized that the complexity of the estimation problem decreases in the listed order, yet at the price that the amount of prior knowledge needed to arrive at a useful model typically increases. With these preliminary observations, we next present some different minimization algorithms, unconstrained as well as constrained ones.

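Unconstrained linear least-squares algorithms. The parameters of an unconstrained linear least-squares structure can be estimated efficiently and analytically by solving the normal equations

\[ \Bigl[ \sum_{t=1}^{N} \varphi(t) \varphi^T(t) \Bigr] \hat{\theta}_N = \sum_{t=1}^{N} \varphi(t) y(t), \]

with $\varphi(t)$ here denoting the $d$-dimensional vector of regressors that enter the structure linearly. The optimal parameter estimate is

\[ \hat{\theta}_N = \Bigl[ \sum_{t=1}^{N} \varphi(t) \varphi^T(t) \Bigr]^{-1} \sum_{t=1}^{N} \varphi(t) y(t) = R_N^{-1} f_N, \]

provided that the inverse of the $d \times d$ regression matrix $R_N$ exists. For numerical reasons this inverse is rarely formed; instead, the estimate is computed via so-called QR or singular value decomposition, SVD (Golub and Van Loan; Björck), both of which are able to handle rank-deficient regression matrices.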
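
A minimal sketch of this step, assuming the regressors have already been stacked row-wise into a matrix `Phi` (a hypothetical name). It relies on NumPy's `numpy.linalg.lstsq`, which solves the least-squares problem through an SVD-based routine instead of forming the inverse of the regression matrix.

```python
import numpy as np

def linear_least_squares(Phi, y):
    """Estimate theta in y ~ Phi @ theta without forming (Phi^T Phi)^{-1}.

    Returns the minimum-norm least-squares solution and the numerical rank
    of Phi, so rank-deficient regression matrices are handled as well.
    """
    Phi = np.asarray(Phi, dtype=float)
    y = np.asarray(y, dtype=float)
    theta, _, rank, _ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta, rank
```

When `Phi` has full column rank the result coincides with the solution of the normal equations; when it does not, the returned rank signals the deficiency and the minimum-norm solution is returned.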

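Unconstrained nonlinear least-squares algorithms. When the parameters appear in a nonlinear fashion, the typical situation is that the minimum of the loss function cannot be computed analytically. Instead we have to resort to certain iterative search routines, most of which can be seen as special cases of Newton's algorithm (see, among many others, Dennis and Schnabel; Scales; Fletcher):

\[ \hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} - \Bigl[ V_N''\bigl( \hat{\theta}_N^{(i)}, Z^N \bigr) \Bigr]^{-1} V_N'\bigl( \hat{\theta}_N^{(i)}, Z^N \bigr), \]

where $\hat{\theta}_N^{(i)} \in \mathbb{R}^d$ is the parameter estimate at the $i$th iteration, $V_N'(\cdot, \cdot) \in \mathbb{R}^d$ is the gradient of the loss function, and $V_N''(\cdot, \cdot) \in \mathbb{R}^{d \times d}$ the Hessian of it, both computed with respect to the current parameter vector. More specifically, the gradient is given by

\[ V_N'\bigl( \hat{\theta}_N^{(i)}, Z^N \bigr) = -\frac{2}{N} \sum_{t=1}^{N} J\bigl( t | \hat{\theta}_N^{(i)} \bigr) \, \varepsilon\bigl( t | \hat{\theta}_N^{(i)} \bigr), \]

with $J( t | \hat{\theta}_N^{(i)} ) \in \mathbb{R}^d$ being the Jacobian vector

\[ J\bigl( t | \hat{\theta}_N^{(i)} \bigr) = \begin{bmatrix} \dfrac{\partial \hat{y}( t | \hat{\theta}_N^{(i)} )}{\partial \theta_1} & \ldots & \dfrac{\partial \hat{y}( t | \hat{\theta}_N^{(i)} )}{\partial \theta_d} \end{bmatrix}^T. \]

Differentiating the gradient with respect to the parameters yields the Hessian

\[ V_N''\bigl( \hat{\theta}_N^{(i)}, Z^N \bigr) = \frac{2}{N} \sum_{t=1}^{N} \Bigl[ J\bigl( t | \hat{\theta}_N^{(i)} \bigr) J^T\bigl( t | \hat{\theta}_N^{(i)} \bigr) - \frac{\partial J( t | \hat{\theta}_N^{(i)} )}{\partial \theta} \, \varepsilon\bigl( t | \hat{\theta}_N^{(i)} \bigr) \Bigr], \]

which thus means that the second derivative of the loss function is needed in the Newton update. Simply put, Newton's algorithm searches for the new parameter vector along a Hessian-modified gradient of the current loss function.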

The availability of derivatives of the loss function with respect to the parameters is of course of paramount importance in all Newton-based estimation schemes. In case arbitrary, though differentiable, predictor structures are considered, these may very well be too hard to obtain analytically or too expensive to compute. One way around this difficulty is to numerically approximate the derivatives by finite differences. The simplest such method is just to replace each of the $d$ elements of the Jacobian by the forward difference

\[ J_j\bigl( t | \hat{\theta}_N^{(i)} \bigr) = \frac{\partial \hat{y}( t | \hat{\theta}_N^{(i)} )}{\partial \theta_j} \approx \frac{\hat{y}\bigl( t | \hat{\theta}_N^{(i)} + h_j e_j \bigr) - \hat{y}\bigl( t | \hat{\theta}_N^{(i)} \bigr)}{h_j}, \]

with $e_j$ being a column vector with a one in the $j$th position and zeros elsewhere, and with $h_j$ being a small positive scalar perturbation. Because the parameters may differ substantially in magnitude, it is here expedient to choose these perturbations individually. A typical choice is $h_j = \sqrt{\eta} \, \max\bigl( h_{\min}, | \hat{\theta}_j^{(i)} | \bigr)$, where $\eta$ is the relative machine precision and $h_{\min}$ is the smallest perturbation allowed; consult Dennis and Schnabel or Scales for further details on this. If a more accurate approximation is deemed necessary, one can employ the central difference

\[ J_j\bigl( t | \hat{\theta}_N^{(i)} \bigr) = \frac{\partial \hat{y}( t | \hat{\theta}_N^{(i)} )}{\partial \theta_j} \approx \frac{\hat{y}\bigl( t | \hat{\theta}_N^{(i)} + h_j e_j \bigr) - \hat{y}\bigl( t | \hat{\theta}_N^{(i)} - h_j e_j \bigr)}{2 h_j} \]

at the cost of $d$ additional function evaluations.
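
The two difference schemes can be sketched as below. The function name `fd_jacobian`, the callable `predict`, and the default value of `h_min` are assumptions made for illustration; the step rule is read here as the square root of the machine precision times the larger of `h_min` and the parameter magnitude.

```python
import numpy as np

def fd_jacobian(predict, theta, central=False, h_min=1e-3):
    """Approximate the Jacobian of predict(theta) by finite differences.

    predict maps a parameter vector (d,) to predictions (N,). The result has
    shape (N, d): row t holds J^T(t|theta). Each parameter gets its own step
    h_j = sqrt(eta) * max(h_min, |theta_j|).
    """
    theta = np.asarray(theta, dtype=float)
    eta = np.finfo(float).eps                   # relative machine precision
    y0 = np.atleast_1d(predict(theta))
    J = np.empty((y0.size, theta.size))
    for j in range(theta.size):
        h = np.sqrt(eta) * max(h_min, abs(theta[j]))
        e = np.zeros_like(theta)
        e[j] = 1.0
        if central:
            # two extra-accurate evaluations per parameter
            J[:, j] = (predict(theta + h * e) - predict(theta - h * e)) / (2.0 * h)
        else:
            # one additional evaluation per parameter
            J[:, j] = (predict(theta + h * e) - y0) / h
    return J
```

The forward variant needs `d + 1` evaluations of the predictor and the central variant `2d`, which matches the cost statement above.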

It now turns out that the Newton update has some severe drawbacks, most of which are associated with the computation of the Hessian. Firstly, it is in general expensive to compute the derivative of the Jacobian. It may also happen that the inverse of the Hessian does not exist, so if further progress towards a minimum is to be made, the update vector must be constructed in a different way. Furthermore, even if the inverse exists it is not guaranteed to be positive definite, and it may therefore happen that the parameter update vector is such that the loss function actually becomes larger. Finally, although the parameter update vector is a descent one, it might be much too large, locating the new parameters at a point with higher loss than what is currently the case. To avoid these problems, search directions other than the Newton direction are much more common in practice.

Gradient method. Simply replace the Hessian by an identity matrix of appropriate size. This, however, does not prevent the update vector from being so large that $V_N$ becomes larger. To avoid such behavior, the updating is often complemented with a line search technique,

\[ \hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} - \mu^{(i)} V_N'\bigl( \hat{\theta}_N^{(i)}, Z^N \bigr), \]

where $\mu^{(i)} > 0$, thereby giving a damped gradient algorithm. The choice of step length $\mu^{(i)}$ is not critical, and the procedure often used is to start with $\mu^{(i)} = 1$ and then repeatedly halve it until a lower value of the loss function is obtained.
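
A minimal sketch of such a damped gradient search, assuming the user supplies the loss and its gradient as callables. The function name, the tolerance, and the iteration limits are hypothetical choices; the step length starts at one and is halved until the loss decreases.

```python
import numpy as np

def damped_gradient(loss, grad, theta0, max_iter=200, tol=1e-8, max_halvings=30):
    """Gradient search with step halving; never accepts a step that raises the loss."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = grad(theta)
        if np.linalg.norm(g) < tol:        # gradient (almost) zero: stop
            break
        current = loss(theta)
        mu = 1.0                           # start with a full step
        for _ in range(max_halvings):
            candidate = theta - mu * g
            if loss(candidate) < current:  # accept first step that lowers the loss
                theta = candidate
                break
            mu *= 0.5
        else:
            break                          # line search failed: give up
    return theta
```

Because every accepted step lowers the loss, the returned estimate is at least as good as the starting point, although convergence near the minimum can be slow for badly scaled problems.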

Gauss-Newton method. By neglecting the second-derivative term of the Hessian and including line search as above, we arrive at a damped Gauss-Newton algorithm with update

\[ \hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} + \mu^{(i)} \Bigl[ \sum_{t=1}^{N} J\bigl( t | \hat{\theta}_N^{(i)} \bigr) J^T\bigl( t | \hat{\theta}_N^{(i)} \bigr) \Bigr]^{-1} \sum_{t=1}^{N} J\bigl( t | \hat{\theta}_N^{(i)} \bigr) \, \varepsilon\bigl( t | \hat{\theta}_N^{(i)} \bigr), \]

which is of the same form as the linear least-squares formula. To cope with a singular or nearly singular Hessian approximation, the inverse is normally replaced by the so-called pseudoinverse, which can easily be obtained by computing the SVD of the Jacobian (Golub and Van Loan).
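
The damped Gauss-Newton iteration can be sketched as follows, under the assumption that the residuals and the Jacobian are available as callables (both names are hypothetical). The pseudoinverse comes from `numpy.linalg.pinv`, which is SVD based, and the step is halved until the sum of squared residuals decreases.

```python
import numpy as np

def damped_gauss_newton(residual, jacobian, theta0,
                        max_iter=50, tol=1e-10, max_halvings=20):
    """residual(theta) -> eps = y - yhat, shape (N,).
    jacobian(theta) -> d yhat / d theta, shape (N, d)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        eps = residual(theta)
        loss = eps @ eps
        # pinv(J) @ eps equals [sum J J^T]^+ sum J eps in the notation above
        step = np.linalg.pinv(jacobian(theta)) @ eps
        if np.linalg.norm(step) <= tol * (1.0 + np.linalg.norm(theta)):
            break                          # relative parameter change is tiny
        mu = 1.0
        for _ in range(max_halvings):
            candidate = theta + mu * step
            r = residual(candidate)
            if r @ r < loss:
                theta = candidate
                break
            mu *= 0.5
        else:
            break                          # no decrease found: stop
    return theta
```

A typical use is to fit a model such as `theta[0] * exp(theta[1] * x)` by passing the residual and the analytic (or finite-difference) Jacobian of the prediction.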

Levenberg-Marquardt method. The Levenberg-Marquardt algorithm handles the update step size and the singularity problems simultaneously through the update

\[ \hat{\theta}_N^{(i+1)} = \hat{\theta}_N^{(i)} + \Bigl[ \sum_{t=1}^{N} J\bigl( t | \hat{\theta}_N^{(i)} \bigr) J^T\bigl( t | \hat{\theta}_N^{(i)} \bigr) + \lambda^{(i)} I \Bigr]^{-1} \sum_{t=1}^{N} J\bigl( t | \hat{\theta}_N^{(i)} \bigr) \, \varepsilon\bigl( t | \hat{\theta}_N^{(i)} \bigr), \]

where the Hessian approximation is guaranteed to be positive definite since $\lambda^{(i)} > 0$. As is the case for the above procedures, it can be shown that this update is in a descent direction. However, $\lambda^{(i)}$ must be carefully chosen so that the loss function also decreases. The method by Marquardt (Scales) achieves this by starting with a $\lambda^{(i)} > 0$, whereupon it is reduced by some factor at the beginning of each iteration, thereby aiming at mimicking a Gauss-Newton update step. If this results in an increased loss, then $\lambda^{(i)}$ is repeatedly increased by some factor until the step satisfies $V_N\bigl( \hat{\theta}_N^{(i+1)}, Z^N \bigr) < V_N\bigl( \hat{\theta}_N^{(i)}, Z^N \bigr)$, which means that the update is forced towards a scaled gradient direction. Other and more elaborate choices of $\lambda^{(i)}$ are discussed in, e.g., Fletcher.
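
A minimal sketch of this strategy is given below. The initial damping value, the common factor of ten used for both decreasing and increasing the damping, and the upper bound on the damping are assumptions chosen for illustration, not values taken from the text.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, theta0, lam=1.0, factor=10.0,
                        max_iter=100, tol=1e-10, lam_max=1e12):
    """residual(theta) -> eps = y - yhat (N,), jacobian(theta) -> (N, d)."""
    theta = np.asarray(theta0, dtype=float)
    eps = residual(theta)
    loss = eps @ eps
    for _ in range(max_iter):
        J = jacobian(theta)
        lam /= factor                      # optimistic: lean towards Gauss-Newton
        while True:
            A = J.T @ J + lam * np.eye(theta.size)
            step = np.linalg.solve(A, J.T @ eps)
            candidate = theta + step
            r = residual(candidate)
            if r @ r < loss:
                break                      # loss decreased: accept the step
            lam *= factor                  # otherwise lean towards the gradient
            if lam > lam_max:
                return theta               # no improving step could be found
        theta, eps, loss = candidate, r, r @ r
        if np.linalg.norm(step) <= tol * (1.0 + np.linalg.norm(theta)):
            break
    return theta
```

With a small damping value the step approaches the Gauss-Newton step, and with a large one it approaches a short step along the negative gradient, which is the behavior described above.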

Although simple, a major drawback with the gradient method is that the convergence rate can be fairly poor close to the minimum. This fact favors the latter two methods, which, especially near the minimum, show similar convergence properties as the full Newton algorithm (Dennis and Schnabel). For ill-conditioned problems, Dennis and Schnabel recommend the Levenberg-Marquardt modification. However, this choice is far less obvious when the pseudoinverse is used in the Gauss-Newton update. In such a case both methods try to update the parameters that really influence the criterion fit most, whereas the remaining parameters are kept unchanged. This means that so-called regularization is built into the algorithms (see below).

A last algorithmic issue to consider here is when to terminate the search. In theory $V_N'$ is zero at a minimum, so an obvious practical test is to terminate once $| V_N' |$ is sufficiently small. Another useful test is to investigate the relative change in parameters from one iteration to another and terminate if this quantity falls below some tolerance level. The algorithms will also terminate when a certain maximum number of iterations has been carried out, or if the line search algorithm fails to decrease the loss function in a predetermined number of iterations.

It is worth stressing that the three schemes above all return estimates that are at least as good as the starting point. Nonetheless, should the algorithm converge to a minimum, then it is important to remember that convergence need not be to a global minimum but can be to a local one.

Constrained minimization algorithms. In a grey box modeling situation the parameters usually have physical or linguistic significance. To really maintain such a property, it is necessary to take the corresponding parameter restrictions into account in the estimation procedure, i.e., constrained optimization methods are needed.

Therefore, assume that there are $l$ parameter constraints collected in a vector

\[ c(\theta) = \begin{bmatrix} c_1(\theta) & c_2(\theta) & \ldots & c_l(\theta) \end{bmatrix}^T \in \mathbb{R}^l, \]

where each $c_j(\theta)$ is a well-defined function such that $c_j(\theta) \geq 0$ for $j = 1, \ldots, l$, hence specifying a feasible parameter region $D$. There exist quite a few schemes that handle such constraints (see, e.g., Scales). An old but simple and versatile idea is to rephrase the original problem into a sequence of unconstrained minimization problems, for which a Newton type of method (like the gradient one) can be applied without too much extra coding effort.

This is the basic idea behind the barrier function estimation procedure. Algorithmically, the method starts with a feasible parameter vector, whereupon the parameter estimate is iteratively obtained by solving (each iteration is started with the estimate of the previous iteration)

\[ \hat{\theta}_N^{(k)} = \arg\min_{\theta \in D} W_N(\theta, Z^N) = \arg\min_{\theta \in D} \Bigl( V_N(\theta, Z^N) + \sigma^{(k)} \sum_{j=1}^{l} \psi\bigl( c_j(\theta) \bigr) \Bigr), \]

where the barrier weight $\sigma^{(k)} > 0$ typically decreases with the outer iteration index $k$, which is increased for each iteration until convergence is obtained. In order to maintain a feasible estimate, the barrier function $\psi(\cdot)$ is chosen so that an increasingly larger value is added to the objective function $W_N$ as the boundary of the feasibility region $D$ is approached from the interior; at the boundary itself this quantity should be infinite. A good choice of barrier function for many kinds of problems seems to be the log-barrier function $\psi\bigl( c_j(\theta) \bigr) = -\ln c_j(\theta)$; see Scales for further details on this.

At this stage one may wonder why it is not sufficient to set $\sigma^{(k)}$ to a much smaller value in the beginning. One reason is that if the true minimum is near the boundary, then it could be difficult to minimize the overall cost function because of its rapidly changing curvature near this minimum, thus giving rise to an ill-conditioned problem. One could also argue that the method is too complex, as an outer iteration is added. This is only partially true, as the inner estimate, especially at the first few outer iterations, need not be that accurate. A rule of thumb is to only perform some five iterations in the inner loop. Finally, the outer loop is terminated once the parameter update is sufficiently small or when a maximum number of outer iterations has been carried out.
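
The procedure can be sketched as follows, using the log-barrier function and a damped gradient search in the inner loop. The barrier weight schedule `10.0 ** (-k)`, the function names, and the default iteration counts are assumptions made for illustration only; five inner iterations follow the rule of thumb mentioned above.

```python
import numpy as np

def log_barrier_minimize(loss, grad_loss, cons, grad_cons, theta0,
                         n_outer=8, n_inner=5, max_halvings=60):
    """Minimize loss(theta) subject to cons(theta) >= 0 elementwise.

    cons(theta) -> (l,), grad_cons(theta) -> (l, d). The start must be
    strictly feasible; every accepted step stays strictly feasible.
    """
    theta = np.asarray(theta0, dtype=float)
    if not np.all(cons(theta) > 0):
        raise ValueError("theta0 must be strictly feasible")
    for k in range(n_outer):
        sigma = 10.0 ** (-k)               # assumed decreasing barrier weight

        def W(th):
            return loss(th) - sigma * np.sum(np.log(cons(th)))

        def grad_W(th):
            return grad_loss(th) - sigma * (grad_cons(th).T @ (1.0 / cons(th)))

        for _ in range(n_inner):           # inexact inner minimization
            g = grad_W(theta)
            w0 = W(theta)
            mu = 1.0
            for _ in range(max_halvings):
                candidate = theta - mu * g
                # reject infeasible candidates before evaluating the logarithm
                if np.all(cons(candidate) > 0) and W(candidate) < w0:
                    theta = candidate
                    break
                mu *= 0.5
    return theta
```

As a small usage example, minimizing `(theta - 2)**2` subject to `1 - theta >= 0` from the feasible start `theta = 0` drives the estimate towards the boundary value one from the interior.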

The bias-variance tradeoff

The series expansion approach has been widely used in nonlinear black-box identification, where the idea is to employ a parameterization that covers as broad a system class as possible. In practice, however, the typical situation is that merely a fraction of the available flexibility is really needed, i.e., the applied model structures are often overparameterized. This fact, possibly in combination with an insufficiently informative data set $Z^N$, leads to ill-conditioning of the Jacobian and the Hessian. This observation also suggests that the parameters should be divided into two sets: the set of spurious parameters, which do not influence the criterion fit that much, and the set of efficient parameters, which do affect the fit. Having such a decomposition, it is intuitively reasonable to treat the spurious (or redundant) parameters as constants that are not estimated. The problem is that it is in general hard to make this decomposition beforehand.

However, using data, one can overcome the ill-conditioning problem and automatically unveil an efficient parameterization by incorporating regularization techniques (or trust region techniques; Dennis and Schnabel). When such an effect is built into the estimation procedure, as in the Levenberg-Marquardt algorithm, we get so-called implicit regularization, as opposed to explicit regularization, which is obtained by adding a penalty term to the criterion function, e.g., as

\[ W_N(\theta, Z^N) = \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2(t|\theta) + \delta \, | \theta - \theta^{\#} |^2 = V_N(\theta, Z^N) + \delta \, | \theta - \theta^{\#} |^2, \]

where $\delta$ is a small user-tunable parameter ensuring a positive definite Hessian and $\theta^{\#} \in \mathbb{R}^d$ is some a priori determined parameter vector, possibly representing prior parameter knowledge. Here the important point is that a parameter not affecting the first term that much will be kept close to $\theta^{\#}$ by the second term. This means that the regularization parameter $\delta$ can be viewed as a threshold that labels the parameters to be either efficient or spurious (Sjöberg et al.). A large $\delta$ simply means that the number of efficient parameters $d^{\#}$ becomes small.
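
For a structure that is linear in the parameters, the regularized criterion has a closed-form minimizer, which makes the effect on spurious parameters easy to see. The sketch below assumes such a linear structure with regressors stacked in `Phi`; the function and argument names are hypothetical.

```python
import numpy as np

def regularized_least_squares(Phi, y, delta, theta_sharp):
    """Minimize (1/N) * |y - Phi theta|^2 + delta * |theta - theta_sharp|^2.

    Setting the gradient to zero gives
    (Phi^T Phi / N + delta I) theta = Phi^T y / N + delta theta_sharp.
    """
    Phi = np.asarray(Phi, dtype=float)
    y = np.asarray(y, dtype=float)
    theta_sharp = np.asarray(theta_sharp, dtype=float)
    N, d = Phi.shape
    A = Phi.T @ Phi / N + delta * np.eye(d)   # positive definite for delta > 0
    b = Phi.T @ y / N + delta * theta_sharp
    return np.linalg.solve(A, b)
```

A parameter whose regressor column is identically zero does not influence the fit at all, so the penalty term pins it exactly to the corresponding entry of `theta_sharp`, while well-determined parameters are hardly affected by a small `delta`.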

From a system identification viewpoint, regularization is a very important means for addressing the ever-present bias-variance tradeoff, as is emphasized in Ljung and in Ljung et al. There it is shown, under fairly general assumptions, that the asymptotic criterion misfit essentially depends on two factors that can be affected by the choice of model structure. First we have the bias error, which reflects the misfit between the true system and the best possible approximation of it, given a certain model structure. Typically, this error decreases when the number of parameters $d$ increases. The other term is the parameter variance error, which usually grows with $d$ but decreases with $N$. There is thus a clear tradeoff between the bias and the variance contributions.

At this point, suppose that a flexible enough model structure has been decided upon. Decreasing the number of parameters that are actually updated, $d^{\#}$, by increasing $\delta$ is beneficial for the total misfit as long as the decrease in variance error is larger than the increase in bias error. In other words, the purpose of regularization is to decrease the variance error contribution to a level where it balances the bias misfit.

Fuzzy modeling framework

The history of methods based on fuzzy concepts is rather short. It all started in the mid-1960s with Zadeh's pioneering article (Zadeh), in which a new way of characterizing nonprobabilistic uncertainties via so-called fuzzy sets was suggested. Since then, and especially in the last ten or so years, there has been a dramatic growth of subdisciplines in science and engineering that have adopted fuzzy ideas. To a great extent this development is due to a large number of successful industrial applications, spanning such diverse fields as robotics, consumer electronics, signal processing, bioengineering, image processing, pattern recognition, management, and control. See the comprehensive compilations by Marks II and by Chen.

The fields of fuzzy control and fuzzy identification have been developed largely in parallel. A good first book on fuzzy control is Driankov et al., and a shorter but informative overview is given by Lee. Various fuzzy identification methods have been proposed by several authors. The work by Sugeno and coworkers (Takagi and Sugeno; Sugeno and Kang; Sugeno and Yasukawa) and by Wang constitute some of the most influential contributions. The merging of fuzzy control and fuzzy identification is discussed, e.g., in Wang and in Roger Jang and Sun. Many of the ideas detailed in this section can be found in the latter reference, which is exceptionally well written and highly recommended. With these sources as a basis, the aim of the section is to derive and motivate the use of one particular fuzzy rule base interpretation that is suited for identification purposes.

Components of a fuzzy model

The basic configuration of a fuzzy model is shown in the figure below. The model involves six components, of which the four lowermost ones are fuzzy model specific.

Scaling. The physical values of the actual inputs and outputs may differ significantly in magnitude. By mapping these to proper normalized domains via scaling, one can instead work with signals that are roughly of the same magnitude, which is desirable from an estimation point of view. However, the need for scaling is highly problem dependent and therefore not considered any further here, i.e., from now on we assume that $\varphi(t)$ is formed directly from $z^t$ and that $\hat{y}(t|\theta) = \hat{y}_s(t|\theta)$.

Regressor generator. The kind of dynamics to include in a fuzzy model is determined by the regressor generator. The regression vector $\varphi(t)$ can contain any combinations of input-output measurements $z^t$ that are known at time $t$, although for such a combination to make sense it ought to have a linguistic interpretation. The reason is that the entries of $\varphi(t)$ are specified by the linguistic database, or actually by so-called linguistic variables (see below). A typical such variable is speed(t), which in terms of input-output data may be interpreted as a quantity formed from $z^t$. This also means that the mathematical purpose of the remaining components of a fuzzy model is to provide a static map from $\varphi(t) \in \mathbf{R}^r$ to $\hat{y}(t|\theta) \in \mathbf{R}$.

Linguistic database. The linguistic database is the heart of a fuzzy model. The expert knowledge, which is assumed to be given as a number of if-then rules, is stored in a fuzzy rule base. These rules are subsequently given a precise mathematical meaning through user-supplied definitions of the employed linguistic variables and connectives (and, or, etc.).

[Figure: block diagram. Crisp data $z^t$ pass through scaling and the regressor generator, giving the crisp $\varphi(t) \in \mathbf{R}^r$. The fuzzifier produces fuzzy sets in $u \in U$, the fuzzy inference engine produces fuzzy sets in $y \in Y$, and the defuzzifier returns the crisp $\hat{y}_s(t|\theta) \in \mathbf{R}$, which is scaled to $\hat{y}(t|\theta)$. The linguistic database holds the linguistic variables, the linguistic connectives, and the fuzzy rule base.]

Fig. Structure of a MISO fuzzy model. Thin arrows indicate the computational flow and thick arrows the information flow. The grey box is a linguistic database reflecting prior knowledge.

Fuzzifier. The fuzzifier maps the crisp values of $\varphi(t)$ into suitable fuzzy sets (discussed below).

Fuzzy inference engine. The fuzzy sets provided by the fuzzifier are then interpreted by the fuzzy inference engine, which uses the fuzzy rule base knowledge in order to produce some fuzzy sets in the output $y$.

Defuzzifier. As a last step, the defuzzifier converts the output fuzzy sets to a standard crisp signal $\hat{y}(t|\theta) \in \mathbf{R}$.

From this short description, it should be clear that fuzzy sets are vital objects to comprehend in order to understand how a fuzzy model operates. Let us therefore discuss such sets in more detail.

Fuzzy sets and membership functions

An ordinary set is a set with a crisp boundary, i.e., an element can either be or not be a member of that set. A fuzzy set, on the other hand, does not show this absolute either-or membership property. The transition from belonging to a fuzzy set to not belonging to it is instead gradual, where the degree of belonging is characterized by a membership function. Mathematically speaking, the definition is as follows (Driankov et al.).

Definition. If $u$ is an element in the universe of discourse $U$, then a fuzzy set $A$ in $U$ is the set of ordered pairs

$$A = \{ (u, \mu_A(u)) : u \in U \},$$

where $\mu_A(u)$ is a membership function carrying an element from $U$ into a membership value between 0 (no degree of membership) and 1 (full degree of membership).

Example. Suppose that we want to describe a car traveling at high speed on the motorway. As a first step, let $u$ denote the speed of any car and introduce $U = [0, [...]] \subset \mathbf{R}$, which states that no car can go faster than [...] km/h. By Nordic standards, a car running at, say, [...] km/h is considered to have a high speed, while this is not the case when the speed is, say, [...] km/h. Moreover, in case the car is running at around [...] km/h, most people would say that the speed is neither low nor high. Based on this information, a fuzzy set describing that the speed of a car is high is, e.g.,

$$\text{high} = \{ (u, \mu_{\text{high}}(u)) : u \in U \}, \qquad \mu_{\text{high}}(u) = \frac{1}{1 + e^{[...]}}.$$

This subjective choice of membership function gives that cars running at [...], [...], and [...] km/h are considered to go at a high speed to a degree of [...], [...], and [...], respectively.

An important point illustrated in this example is that the fuzziness does not emanate from the fuzzy set itself, but rather from the vagueness of what it describes. This is manifested by the subjective and nonrandom nature of the choice of membership function, which may vary considerably depending on who determined it. This is also the main philosophical difference between fuzzy memberships and probabilities, which convey objective information about random phenomena.

As noted above, the membership function (MF) can be any function producing a value between 0 and 1. Here we will focus on three common classes of MFs, all being convex in nature, i.e., the membership functions are of the form increasing, decreasing, or bell-shaped (see Driankov et al. for the mathematical definition). First, we have what may be called the network-classic MFs, which because of their smoothness are becoming increasingly popular in fuzzy modeling.

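Definition (Network-classic MFs). This class consists of the sigmoidal and the Gaussian membership functions, defined as

$$\mathrm{mfsig}(u, \beta, \gamma) = \mu_A(u) = \frac{1}{1 + e^{-\beta (u - \gamma)}},$$

$$\mathrm{mfgauss}(u, \beta, \gamma) = \mu_A(u) = e^{-\frac{1}{2} (\beta (u - \gamma))^2},$$

where $\beta$ and $\gamma$ are related to the scale and the position of the membership function, respectively.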

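The second class, widely used in fuzzy logic theory, was originally suggested by Zadeh, thus meriting the label Zadeh-formed MFs.

Definition (Zadeh-formed MFs). The Zadeh-formed MFs are the Z, the S, and the $\pi$ functions (named after their shape), in order defined as

$$\mathrm{mfz}(u, \gamma_1, \gamma_2) = \mu_A(u) = \begin{cases} 1, & u \le \gamma_1 \\ 1 - 2 \left( \dfrac{u - \gamma_1}{\gamma_2 - \gamma_1} \right)^2, & \gamma_1 < u \le \dfrac{\gamma_1 + \gamma_2}{2} \\ 2 \left( \dfrac{\gamma_2 - u}{\gamma_2 - \gamma_1} \right)^2, & \dfrac{\gamma_1 + \gamma_2}{2} < u \le \gamma_2 \\ 0, & u > \gamma_2 \end{cases}$$

$$\mathrm{mfs}(u, \gamma_1, \gamma_2) = \mu_A(u) = 1 - \mathrm{mfz}(u, \gamma_1, \gamma_2)$$

$$\mathrm{mfpi}(u, \gamma_1, \gamma_2, \gamma_3, \gamma_4) = \mu_A(u) = \begin{cases} \mathrm{mfs}(u, \gamma_1, \gamma_2), & u \le \gamma_2 \\ \mathrm{mfz}(u, \gamma_3, \gamma_4), & u > \gamma_2 \end{cases}$$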

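The last category is piecewise linear MFs, which, primarily because of real-time aspects, have been extensively used in various fuzzy control applications (Driankov et al.).

Definition (Piecewise linear MFs). The piecewise linear MFs are the open left, the open right, the triangular, and the trapezoidal functions:

$$\mathrm{mfl}(u, \gamma_1, \gamma_2) = \mu_A(u) = \max\left( \min\left( \frac{\gamma_2 - u}{\gamma_2 - \gamma_1}, 1 \right), 0 \right)$$

$$\mathrm{mfr}(u, \gamma_1, \gamma_2) = \mu_A(u) = \max\left( \min\left( \frac{u - \gamma_1}{\gamma_2 - \gamma_1}, 1 \right), 0 \right)$$

$$\mathrm{mftri}(u, \gamma_1, \gamma_2, \gamma_3) = \mu_A(u) = \max\left( \min\left( \frac{u - \gamma_1}{\gamma_2 - \gamma_1}, \frac{\gamma_3 - u}{\gamma_3 - \gamma_2} \right), 0 \right)$$

$$\mathrm{mftrap}(u, \gamma_1, \gamma_2, \gamma_3, \gamma_4) = \mu_A(u) = \max\left( \min\left( \frac{u - \gamma_1}{\gamma_2 - \gamma_1}, 1, \frac{\gamma_4 - u}{\gamma_4 - \gamma_3} \right), 0 \right)$$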

Notice that, with the terminology from the previous section, an MF is really nothing but a basis function, and since it involves one variable $u$ only, it is of composition type. As will be evident in the following section, fuzzy sets constitute the main building block of a linguistic variable.
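As a rough illustration of the membership function classes above, the following Python sketch evaluates the sigmoidal, the Gaussian, and the four piecewise linear MFs. The function names follow the text, but the parameterization is an assumption: the breakpoints are called a, b, c, d (start of rise, start of full membership, start of fall, zero point), the Gaussian is taken as exp(-(beta (u - gamma))^2 / 2), and the speed breakpoints in the usage lines are made-up values, not those of the car example.

```python
import math

def mfsig(u, beta, gamma):
    # Sigmoid with scale beta and position gamma; beta < 0 gives a decreasing MF.
    return 1.0 / (1.0 + math.exp(-beta * (u - gamma)))

def mfgauss(u, beta, gamma):
    # Bell-shaped MF centred at gamma (the factor 1/2 is an assumption).
    return math.exp(-0.5 * (beta * (u - gamma)) ** 2)

def mfl(u, c, d):
    # Open left: full membership up to c, linear fall to zero at d (c < d).
    return max(min((d - u) / (d - c), 1.0), 0.0)

def mfr(u, a, b):
    # Open right: zero up to a, linear rise to full membership at b (a < b).
    return max(min((u - a) / (b - a), 1.0), 0.0)

def mftri(u, a, b, d):
    # Triangle with its peak at b (a < b < d).
    return max(min((u - a) / (b - a), (d - u) / (d - b)), 0.0)

def mftrap(u, a, b, c, d):
    # Trapezoid with a plateau on [b, c] (a < b <= c < d).
    return max(min((u - a) / (b - a), 1.0, (d - u) / (d - c)), 0.0)

# Hypothetical speed breakpoints in km/h, chosen only for illustration.
for speed in (50.0, 70.0, 90.0, 110.0, 130.0):
    low = mfl(speed, 60.0, 80.0)
    medium = mftrap(speed, 60.0, 80.0, 100.0, 120.0)
    high = mfr(speed, 100.0, 120.0)
    print(speed, low, medium, high)
```

The max/min form keeps every function inside [0, 1] without explicit case distinctions, which is one reason the piecewise linear class is cheap to evaluate.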

Linguistic variables and fuzzy propositions

Linguistic variables are fundamental in approximate, or fuzzy, reasoning. In a generalized form (cf. Driankov et al.), such a variable is conveniently described by a three-tuple

$$\langle U, A(U), D \rangle,$$

where $U$ is the name of the variable, $A(U)$ is a set of linguistic values (each of which is characterized by a fuzzy set) that can be assigned to $U$, and $D$ provides information on how to connect the linguistic domain to the physical measurement domain.

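Example. The linguistic variable speed(t) of a car on a motorway is, e.g.,

$$\langle \text{speed}(t),\; A(U) = \{\text{low}, \text{medium}, \text{high}\},\; D: [...] \rangle$$

$$\text{low} = \{ (u, \mu_{\text{low}}(u) = \mathrm{mfl}(u, [...])) : u \in U \}$$

$$\text{medium} = \{ (u, \mu_{\text{medium}}(u) = \mathrm{mftrap}(u, [...])) : u \in U \}$$

$$\text{high} = \{ (u, \mu_{\text{high}}(u) = \mathrm{mfr}(u, [...])) : u \in U \}$$

where $U = [...]$.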

The assignment of values to a linguistic variable is simply achieved by an atomic fuzzy proposition using the syntax "U is property", e.g., "speed(t) is low".

Several atomic fuzzy propositions can now be combined using linguistic connectives such as not, and, and or, thus forming more complex propositions such as

$$U_1 \text{ is not } A_1 \text{ and } U_2 \text{ is } A_2,$$

where $A_1$ and $A_2$ refer to two different fuzzy sets, which normally are defined in different universes $U_1$ and $U_2$, respectively. While it is mathematically natural to interpret "$U_1$ is not $A_1$" as the fuzzy set with MF $1 - \mu_{A_1}(u_1)$, with $\mu_{A_1}(u_1)$ being the MF associated with $A_1$, there are many different ways of interpreting and and or. Often, however, a fuzzy conjunction (and) is defined in terms of a triangular norm $\otimes$, which combines MFs as $\mu_{A_1}(u_1) \otimes \mu_{A_2}(u_2)$; see Driankov et al. for the details. The most widely used triangular norms are intersection (the min operator) and algebraic product (multiplication). Similarly, a fuzzy disjunction (or) is usually defined as a triangular conorm $\oplus$, syntactically written $\mu_{A_1}(u_1) \oplus \mu_{A_2}(u_2)$. The most commonly encountered conorms are union (the max operator) and algebraic sum, $\mu_{A_1}(u_1) + \mu_{A_2}(u_2) - \mu_{A_1}(u_1) \mu_{A_2}(u_2)$.

If, in the above operations, $\mu_{A_1}(u_1)$ and $\mu_{A_2}(u_2)$ are defined in different universes, then a triangular norm or conorm performs a mapping from $U_1 \times U_2$ to $[0, 1]$. Otherwise, the mapping is from $U$ to $[0, 1]$. By combining several atomic fuzzy expressions using suitable connectives (others than those above can of course also be defined), it is possible to construct arbitrarily complex fuzzy sets. In doing so, the important point is that the result always is a new fuzzy set, although the space in which it is defined is not restricted to one or two dimensions.
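The connectives discussed above can be sketched in a few lines of Python. The function names and the membership degrees used below are illustrative assumptions; the operators themselves are the complement, the min and product triangular norms, and the max and algebraic sum conorms named in the text.

```python
def fuzzy_not(mu):
    # Complement of a membership degree.
    return 1.0 - mu

def t_min(mu1, mu2):
    # Triangular norm: intersection (min operator).
    return min(mu1, mu2)

def t_prod(mu1, mu2):
    # Triangular norm: algebraic product.
    return mu1 * mu2

def s_max(mu1, mu2):
    # Triangular conorm: union (max operator).
    return max(mu1, mu2)

def s_asum(mu1, mu2):
    # Triangular conorm: algebraic sum.
    return mu1 + mu2 - mu1 * mu2

# Hypothetical membership degrees of "U1 is A1" and "U2 is A2".
mu1, mu2 = 0.3, 0.6

# "U1 is not A1 and U2 is A2" under the two triangular norms.
print(t_min(fuzzy_not(mu1), mu2))
print(t_prod(fuzzy_not(mu1), mu2))

# "U1 is A1 or U2 is A2" under the two conorms.
print(s_max(mu1, mu2))
print(s_asum(mu1, mu2))
```

Each norm pairs with a conorm through De Morgan's law: negating the norm of the two complements gives the conorm, which is what allows or-constructs to be rewritten in terms of not and and.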

Fuzzy model structure

We are now in a position to discuss the computational units shown in the figure of the MISO fuzzy model structure. As with the interpretation of the basic connectives, there are also here a number of choices to be made. However, various grey box identification aspects (e.g., the complexity of computing predictors, Jacobians, and possibly Hessians, the approximation capability, as well as the interpretability of the estimated models) naturally lead to one particular type of fuzzy model structure, as will be derived next.

Fuzzification. The fuzzification unit is conceptually quite simple. For each value of $\varphi_k(t)$, $k = 1, \ldots, r$, it returns a fuzzy set (a fuzzy fact), denoted $A'_k$, with MF $\mu_{A'_k}(u_k)$, $u_k \in U_k$. Thus, if we are given $N$ measurements and use $r$ regressors, then this implies that $N \cdot r$ fuzzy sets will be generated by the fuzzifier. These sets can in principle be constructed either by a singleton fuzzifier or by a nonsingleton one. The latter approach may be used to capture noise properties of the regressors (Wang), though at a rather high computational cost. This fact motivates the use of a singleton fuzzifier, which simply returns a 1 in one single point in $U_k$:

$$\mu_{A'_k}(u_k) = \begin{cases} 1 & \text{if } u_k = \varphi_k(t) \\ 0 & \text{otherwise.} \end{cases}$$

Fuzzy rule base. A fuzzy rule base $R$ consists of a set of, say, $n$ fuzzy rules. Using a somewhat unorthodox grid-oriented multi-indexing labeling system, such a rule base often takes on the form

$$\begin{aligned} R_{1,1,\ldots,1} &: \text{If } U_1 \text{ is } A_{1,1} \text{ and } \ldots \text{ and } U_r \text{ is } A_{1,r} \text{ then } Y \text{ is } B_{1,1,\ldots,1} \\ R_{2,1,\ldots,1} &: \text{If } U_1 \text{ is } A_{2,1} \text{ and } \ldots \text{ and } U_r \text{ is } A_{1,r} \text{ then } Y \text{ is } B_{2,1,\ldots,1} \\ &\;\;\vdots \\ R_{j_1,j_2,\ldots,j_r} &: \text{If } U_1 \text{ is } A_{j_1,1} \text{ and } \ldots \text{ and } U_r \text{ is } A_{j_r,r} \text{ then } Y \text{ is } B_{j_1,j_2,\ldots,j_r} \\ &\;\;\vdots \\ R_{n_1,n_2,\ldots,n_r} &: \text{If } U_1 \text{ is } A_{n_1,1} \text{ and } \ldots \text{ and } U_r \text{ is } A_{n_r,r} \text{ then } Y \text{ is } B_{n_1,n_2,\ldots,n_r} \end{aligned}$$

where $A_{1,1}, \ldots, A_{n_r,r}$ are the linguistic values that can be assigned to the linguistic variables $U_1, \ldots, U_r$, while $B_{1,\ldots,1}, \ldots, B_{n_1,\ldots,n_r}$ denote the linguistic values that can be allotted to the linguistic output $Y$. As usual, the mathematical meaning of any $A_{j_k,k}$ and $B_{j_1,\ldots,j_r}$ is given by suitable membership functions, denoted $\mu_{A_{j_k,k}}(u_k)$ and $\mu_{B_{j_1,\ldots,j_r}}(y)$, respectively.

With the rule base, it is worthwhile stressing that each $B_{j_1,\ldots,j_r}$ need not be linguistically unique. On the contrary, the typical situation is that many rules share the same linguistic consequence, yet they have different antecedents. Furthermore, each $U_k$ can be assigned $n_k$ different linguistic values, which means that the number of rules in a complete fuzzy rule base becomes $n = \prod_{k=1}^{r} n_k$. In case the rule base is incomplete, there will be regions (e.g., physically impossible ones) in the overall regression space for which no output can be inferred. Notice also that a rule base with and connectives only is not as restricted as one might first suspect. The reason is that many other constructs, e.g., not, and, or, can be logically converted (e.g., by De Morgan's law) to the desired form, as is shown in (Wang; Lindskog).

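Interpreting fuzzy if-then rules. Each of the rules in the rule base describes essentially a relation between $U_1, \ldots, U_r$ on one side and $Y$ on the other side. This suggests that a rule $R_{j_1,\ldots,j_r}$ should be defined as a fuzzy relation with MF

$$\mu_{A_{\mathbf{j}} \to B_{\mathbf{j}}}(u, y) = \left( \mu_{A_{j_1,1}}(u_1) \otimes \cdots \otimes \mu_{A_{j_r,r}}(u_r) \right) \to \mu_{B_{\mathbf{j}}}(y), \qquad \mathbf{j} = (j_1, \ldots, j_r),$$

defined on $U_1 \times \cdots \times U_r \times Y$. There now exist some different ways of interpreting implication (Lee), but striving for computational simplicity, we also here choose the simplest and most widely used translation, namely

$$\mu_{A_{\mathbf{j}} \to B_{\mathbf{j}}}(u, y) = \mu_{A_{\mathbf{j}}}(u) \otimes \mu_{B_{\mathbf{j}}}(y) = \mu_{A_{j_1,1}}(u_1) \otimes \cdots \otimes \mu_{A_{j_r,r}}(u_r) \otimes \mu_{B_{\mathbf{j}}}(y),$$

which is sometimes referred to as Mamdani implication.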

Inference engine. Faced with some facts from the fuzzifier and a fuzzy rule base, the fuzzy inference engine is responsible for inferring conclusions in terms of output fuzzy sets. The most used inference mechanism is generalized modus ponens (GMP), which, as the name indicates, is a generalization of the classical modus ponens rule to the fuzzy domain; see, e.g., Driankov et al. More specifically, the GMP inference scheme takes some fuzzy facts as input, maps them via a fuzzy relation (the rule representation), and returns a fuzzy conclusion:

Fact: $U_1$ is $A'_1$ and ... and $U_r$ is $A'_r$

Rule: If $U_1$ is $A_{j_1,1}$ and ... and $U_r$ is $A_{j_r,r}$ then $Y$ is $B_{j_1,\ldots,j_r}$

Conclusion: $Y$ is $B'_{j_1,\ldots,j_r}$

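Mathematically speaking (Dubois and Prade), this scheme first combines the overall MFs of the given fact and the rule. Projecting the so generated MF onto the $y$ axis gives a new MF $\mu_{B'_{\mathbf{j}}}(y)$, $y \in Y$, with $\mathbf{j} = (j_1, \ldots, j_r)$, which can be thought of as the possibility distribution of the fuzzy output given the fuzzy fact. Representing the fuzzy fact by an MF $\mu_{A'}(u) = \mu_{A'_1}(u_1) \otimes \cdots \otimes \mu_{A'_r}(u_r)$ defined on $U = U_1 \times \cdots \times U_r$, we thus get

$$\mu_{B'_{\mathbf{j}}}(y) = \mathrm{proj}\left( \mu_{A'}(u) \text{ and } \mu_{A_{\mathbf{j}} \to B_{\mathbf{j}}}(u, y) \right) = \sup_{u \in U} \left[ \mu_{A'}(u) \otimes \mu_{A_{\mathbf{j}} \to B_{\mathbf{j}}}(u, y) \right],$$

which is known as sup-star composition.

Assuming now that the facts are represented by fuzzy singletons, $\mu_{A'}(u)$ will be 1 in one single point in $U$, namely when $u = \varphi(t)$, and zero elsewhere. Because a triangular norm returns the membership degree of its other operand when one of its operands is 1 (and 0 when one of them is 0), the sup-star composition simplifies to

$$\mu_{B'_{\mathbf{j}}}(y) = \mu_{A_{\mathbf{j}} \to B_{\mathbf{j}}}(\varphi(t), y),$$

which means that $\mu_{B'_{\mathbf{j}}}(y)$ is obtained by slicing $\mu_{A_{\mathbf{j}} \to B_{\mathbf{j}}}(u, y)$ along the $u$ coordinate specified by $\varphi(t)$. Compare this with a standard function evaluation. Notice also that the sup-star computation becomes much more involved in case a nonsingleton fuzzifier is used.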

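Finally, by using the Mamdani rule interpretation, the inferred output MF can be written

$$\mu_{B'_{\mathbf{j}}}(y) = \mu_{A_{\mathbf{j}}}(\varphi(t)) \otimes \mu_{B_{\mathbf{j}}}(y) = w_{A_{\mathbf{j}}} \otimes \mu_{B_{\mathbf{j}}}(y), \qquad \mathbf{j} = (j_1, \ldots, j_r),$$

where $w_{A_{\mathbf{j}}} = \mu_{A_{j_1,1}}(\varphi_1(t)) \otimes \cdots \otimes \mu_{A_{j_r,r}}(\varphi_r(t))$ is a weight known as the degree of fulfilment, or firing strength, of the rule. Clearly, the higher the value of this weight, the higher the value of $\mu_{B'_{\mathbf{j}}}(y)$. In particular, with $w_{A_{\mathbf{j}}} = 1$ we get the intuitively reasonable result that $\mu_{B'_{\mathbf{j}}}(y)$ equals $\mu_{B_{\mathbf{j}}}(y)$.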

Defuzzification. The last issue to consider is how to aggregate the generated fuzzy sets into a form that can be converted into a crisp output $\hat{y}(t|\theta) \in \mathbf{R}$. For this purpose, we use a defuzzifier, which returns the crisp value that in some sense best corresponds to the possibility distribution of the combined output fuzzy sets. Because there is no universally correct way of doing this, quite a few different defuzzification schemes have been suggested in the literature (Driankov et al.). Dominating in the fuzzy control genre is center of area (COA) or center of sums (COS) defuzzification, which are chosen largely for performance reasons (Lee). The center of sums method operates on the rules on an individual basis,

$$\hat{y}(t|\theta) = \frac{\displaystyle \sum_{j_1=1}^{n_1} \cdots \sum_{j_r=1}^{n_r} \int_Y y \, \mu_{B'_{j_1,\ldots,j_r}}(y) \, dy}{\displaystyle \sum_{j_1=1}^{n_1} \cdots \sum_{j_r=1}^{n_r} \int_Y \mu_{B'_{j_1,\ldots,j_r}}(y) \, dy},$$

and is here preferred to the COA method since it does not involve a complex rule aggregation part.

[Figure: network with an input layer ($\varphi(t)$), rule layers (Rule 1 to Rule $n$, each multiplying its antecedent MFs into a weight $w_A$), rule combination layers (summation and normalization), and an output layer ($\hat{y}(t|\theta)$).]

Fig. Fuzzy model structure suitable for identification purposes.

Network representation. To predict $y(t)$ using the center of sums expression, it now only remains to specify which $\otimes$ operator to use. Although min is often employed for $\otimes$ in control applications (Lee), it is not that well suited from an estimation point of view. The main reason is that it introduces discontinuities in the derivatives, which may lead to problems when computing gradients. Another problem that may occur when updating MF parameters is that certain MFs can affect the overall mapping at one iteration, while at the next iteration they have no impact on the mapping whatsoever. It is not hard to realize that such behavior can confuse the estimation algorithm quite a bit. Algebraic product (multiplication), on the other hand, shows none of these deficiencies, thus making it better suited for identification purposes.

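The integrals of the center of sums expression are in general also rather costly to compute. This is serious, as the number of integrals that must be computed is at least the number of parameters $d$ times the number of iterations needed in the estimation procedure. However, by restricting the MFs of the rule consequences to be fuzzy singletons, i.e., each $\mu_{B_{\mathbf{j}}}(y)$ is 1 in one single, yet unknown, point $\alpha_{\mathbf{j}} \in Y$, with $\mathbf{j} = (j_1, \ldots, j_r)$, the predictor structure simplifies to the Mamdani fuzzy model structure

$$\hat{y}(t|\theta) = \frac{\displaystyle \sum_{j_1=1}^{n_1} \cdots \sum_{j_r=1}^{n_r} \alpha_{\mathbf{j}} \, w_{A_{\mathbf{j}}}}{\displaystyle \sum_{j_1=1}^{n_1} \cdots \sum_{j_r=1}^{n_r} w_{A_{\mathbf{j}}}} = \frac{\displaystyle \sum_{j_1=1}^{n_1} \cdots \sum_{j_r=1}^{n_r} \alpha_{\mathbf{j}} \prod_{k=1}^{r} \mu_{A_{j_k,k}}(\varphi_k(t), \beta_{j_k,k}, \gamma_{j_k,k})}{\displaystyle \sum_{j_1=1}^{n_1} \cdots \sum_{j_r=1}^{n_r} \prod_{k=1}^{r} \mu_{A_{j_k,k}}(\varphi_k(t), \beta_{j_k,k}, \gamma_{j_k,k})}.$$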

The corresponding network is reproduced in the figure captioned "Fuzzy model structure suitable for identification purposes", from which it is evident that it belongs to the composition type of series expansions discussed earlier, although here all parameters have linguistic meanings. The simplicity of this structure is the main reason why we focus on it in the following sections. Another, and more theoretical, reason is the fact that it is actually capable of approximating any real continuous function on a compact (closed and bounded) domain to any degree of accuracy (Kosko; Wang; Wang and Mendel).

To this end, observe that each $\alpha_{\mathbf{j}}$ can be viewed as a local model (a leaf in the model or regression tree terminology), which is active to a degree $w_{A_{\mathbf{j}}}$. An obvious generalization is to replace the constants $\alpha_{\mathbf{j}}$ with more complex local models, such as linear regressions. This gives what is known as the Takagi-Sugeno model structure (Takagi and Sugeno; Sugeno and Kang; Sugeno and Yasukawa), which also has been suggested by Johansen (Johansen and Foss), who suppressed the fuzziness and instead referred to the approach as operating regime based identification. The concluding part of each fuzzy rule is thus replaced by a linear-in-the-parameters predictor structure, for which it is hard to find a linguistic interpretation.
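To make the Mamdani fuzzy model structure concrete, the following Python sketch evaluates the predictor as a normalized weighted average of singleton consequences, with algebraic product as the triangular norm. It is a minimal sketch under stated assumptions: Gaussian input MFs of the form exp(-(beta (u - gamma))^2 / 2), a complete rule base stored in a dictionary keyed by the index tuple (j_1, ..., j_r), and made-up parameter values. The name `predict` and the data layout are hypothetical.

```python
import math
from itertools import product

def mfgauss(u, beta, gamma):
    # Gaussian input MF with scale beta and position gamma (assumed form).
    return math.exp(-0.5 * (beta * (u - gamma)) ** 2)

def predict(phi, beta, gamma, alpha):
    """Normalized weighted average of the singleton centers alpha[j].

    phi         -- list with the r regressor values
    beta, gamma -- beta[k][i] and gamma[k][i]: scale and position of MF i
                   of regressor k
    alpha       -- dict mapping rule index tuples (j_1, ..., j_r) to centers
    """
    num = 0.0
    den = 0.0
    # Enumerate every rule of the complete rule base.
    for j in product(*(range(len(g)) for g in gamma)):
        w = 1.0  # firing strength: product of the antecedent MFs
        for k, jk in enumerate(j):
            w *= mfgauss(phi[k], beta[k][jk], gamma[k][jk])
        num += alpha[j] * w
        den += w
    if den == 0.0:
        return None  # no rule explains phi, so the prediction is undefined
    return num / den

# Two regressors with two linguistic values each (illustrative numbers).
beta = [[1.0, 1.0], [1.0, 1.0]]
gamma = [[0.0, 1.0], [0.0, 1.0]]
alpha = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
print(predict([0.5, 0.5], beta, gamma, alpha))
print(predict([0.1, 0.9], beta, gamma, alpha))
```

Replacing the constant `alpha[j]` with a linear function of `phi` would give a Takagi-Sugeno type of predictor in the same loop.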

Fuzzy identification based on prior knowledge

This section considers various system identification issues based on the Mamdani fuzzy model structure. We first further discuss and motivate the use of a rule base provided by an expert, as opposed to a pure black-box approach where the rule base itself is estimated. The following three subsections constitute the core of the section. Their purpose is to provide answers to three fundamental grey box type of questions: how to estimate the parameters of the structure, how to preserve the expert knowledge when updating these parameters, and lastly, how to ensure certain nonstructural system features, in this case how to guarantee a monotone steady-state gain curve. Finally, the last subsection presents three fuzzy hybrid modeling approaches, of which two aim at reducing the number of parameters to estimate and one aims at modeling dynamics that could not be captured by the fuzzy model.

Black versus grey box approaches

There are two main ways of determining the regressors, the number of MFs, and the number of rules to use in the model structure. The black-box way implies that the structure is estimated from data, and the modeling (grey box) way that it is provided by an expert.

Structure estimation. Structure estimation can be split into two separate tasks. The first one is to determine what regressors $\varphi(t)$ to construct from the data $z^t$. This is clearly not a fuzzy-specific problem, but is present in all black box approaches. The matter is usually resolved by restricting the regressors to delayed versions of the measurements. What regressors to include can be determined in many different ways, e.g., by statistical screening tests (Draper and Smith), by certain clustering techniques (Aguirre and Billings), by trial and error, and so forth. Here the key thing to recognize is that regressor selection is a combinatorial problem, growing rapidly in complexity with the number of possible regressors.

Having chosen $r$ regressors, the second task is to find the number of different membership functions associated with each regressor, the number of different centers $\alpha_{\mathbf{j}}$, and the number of rules $n$. As mentioned earlier, the number of rules in a complete rule base is

$$n = \prod_{k=1}^{r} n_k,$$

which clearly suffers from the curse of dimensionality problem, i.e., the number of possible rules increases exponentially with $r$. Observe that even moderate values of $r$ and $n_k$ give a large $n$, e.g., $r = [...]$ and $n_1 = \cdots = n_r = [...]$ result in $n = [...]$. Moreover, a large $n$ often leads to many MF parameters to estimate, which in turn easily leads to overfit problems (recall the bias-variance trade-off). For many complex modeling problems, the typical situation is that only some of the $n$ rules are important for describing the underlying system. The modeling of glycemic variations in the human body detailed in (Sjöberg et al.), e.g., includes only [...] out of [...] possible rules. Hence, we should keep the $m$ out of $n$ rules that are most influential, i.e., we should again solve a combinatorial optimization problem.
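The growth of the complete rule base, and of the number of candidate rule subsets, can be checked with a few lines of Python. The figures used here (four regressors with five linguistic values each, ten rules kept) are illustrative assumptions and not the values of the application referred to above.

```python
import math

def complete_rule_count(values_per_regressor):
    # n = n_1 * n_2 * ... * n_r for a complete rule base.
    return math.prod(values_per_regressor)

def rule_subsets(n, m):
    # Number of ways of keeping m out of n rules.
    return math.comb(n, m)

n = complete_rule_count([5, 5, 5, 5])  # hypothetical: r = 4, n_k = 5
print(n)                    # size of the complete rule base
print(rule_subsets(n, 10))  # candidate subsets when keeping m = 10 rules
```

Even for this small case the second number is astronomically large, which is why exhaustive search over rule subsets is not a realistic option.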

Now, in the light of the fact that iterative search schemes typically must be employed for MF parameter estimation (see the next section), it is highly impractical to apply exhaustive search algorithms for rule base structure estimation. This has motivated methods that try to explore the most promising alternatives in one way or another. It seems that three main classes of structure estimation algorithms have been suggested in the literature: those that successively try to partition the regression space in a tree-like manner (Takagi and Sugeno; Sugeno and Kang; Higgins and Goodman; Sun), those that provide nonoptimal but often good enough solutions, such as simulated annealing, genetic algorithms, etc. (Kirkpatrick et al.; Ishigami et al.; Lin and Lee), and those that apply clustering techniques (Sugeno and Yasukawa; Yoshinari et al.; Babuška and Verbruggen; Kaymak and Babuška).

Even if these methods usually are far less computationally demanding than exhaustive search, they are still quite complex, chiefly because structure estimation in general cannot be separated from the task of adjusting MF parameters. Another problem is that it is not that easy to say how many different output centers $\alpha_{\mathbf{j}}$ to estimate. The fuzzy modeling advantages listed in the first section are also weakened, or even lost, if a black-box approach is followed.

Modeling. The modeling approach overcomes these difficulties, at least partly. The computational cost is reduced at the price of the time it takes for the expert to provide the knowledge. At the linguistic level, the knowledge required is the names of the linguistic variables, the corresponding linguistic values, and the rule base itself. This is later complemented with descriptions of how to create the regressors, the initial shape of the MFs, and other numeric information, e.g., parameter restrictions. Because natural language is rather coarse, it can here be argued that the rule base often becomes relatively small, and that superfluous parameters therefore can be more easily avoided than if a black-box approach is adopted.

Perhaps the most important advantage of the modeling approach is due to the curse of dimensionality and comes in terms of extrapolation capabilities. To see this, recall that the regression vector lives in $\mathbf{R}^r$. Even for a moderate $r$, the observations $\varphi(t)$ are by necessity sparse in any bounded region of $\mathbf{R}^r$. For example, filling up the unit cube in $\mathbf{R}^{[...]}$ using a grid of granularity [...] requires a million measurements. Since such an excessive amount of data cannot be collected in practice, there will always be regression regions having no real data support. By using a black-box method, rules corresponding to such regions are likely to be removed because they do not improve the optimization criterion. This is in contrast to the modeling approach, which allows the combination of regression regions explained by both data and expert with regions explained purely by the expert. However, in case it is hard to determine the latter regions, it must here be required that only data-supported MF parameters are updated by the estimation algorithm.
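The sparsity argument can be quantified with a small Python sketch. The helper names are hypothetical, and the grid resolution (ten points per axis) and sample size are assumptions picked for illustration; with ten points per axis, six regressors is one combination that yields a million grid points.

```python
def grid_points(points_per_axis, r):
    # Number of grid points needed to fill the unit cube in r dimensions.
    return points_per_axis ** r

def max_coverage(n_samples, points_per_axis, r):
    # Upper bound on the fraction of grid points that can carry a sample.
    return min(1.0, n_samples / grid_points(points_per_axis, r))

# Ten grid points per axis; a data set of 1000 samples (illustrative).
for r in range(1, 9):
    print(r, grid_points(10, r), max_coverage(1000, 10, r))
```

Already beyond three regressors, most of the grid is necessarily left without data, which is where expert-supplied rules can contribute.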

A potential drawback with the modeling approach is that it can be quite arduous to capture all the important system features, especially in a complex modeling situation. It is then reasonable to condense what is actually linguistically known into a fuzzy rule base and then try to describe the remaining dynamics within some other model structure, e.g., by letting a black-box structure (a standard neural network, etc.) operate in parallel with the fuzzy one. Such hybrid approaches will be further investigated in a later section. Altogether, these facts and possibilities suggest that the strength of fuzzy identification really shows up when the modeling path is followed.

Parameter estimation

The Mamdani model structure is a regular mathematical function with tunable parameters $\theta$. By adopting the quadratic performance criterion, and for a moment neglecting that the parameters have linguistic meanings, this merely reduces to a standard unconstrained nonlinear least-squares problem, where the nonlinear nature stems from the fact that the scale ($\beta$) and the position ($\gamma$) parameters enter the predictor in a nonlinear fashion.

The nonlinear least-squares estimation algorithms presented earlier rely on the assumption that the Jacobian can be constructed for any $\theta \in D \subset \mathbf{R}^d$. A possible complication with the fuzzy predictor occurs if $\varphi(t)$ and $\theta$ are such that the denominator of the predictor evaluates to zero. This takes place in the rare situations when also the numerator is zero, and is due to the fact that no rule is able to explain the current $\varphi(t)$. The natural way around the difficulty is simply to exclude samples causing an undefined $\hat{y}(t|\theta)$.

The Jacobian can now be constructed if the derivatives of the individual MFs with respect to their parameters exist. This is always the case for the $\alpha_{\mathbf{j}}$ parameters, owing to the fact that they enter the predictor linearly, and thus it suffices to investigate the derivatives of the MFs at the input side. Both network-classic and Zadeh-formed MFs have well-defined and continuous derivatives with respect to any of their parameters, at least if pathological cases are excluded (e.g., degenerate parameter values of a Gaussian, of a Z-formed MF, etc.). This is, however, not the case for piecewise linear MFs, and a rather common misunderstanding is therefore that Jacobian-based estimation algorithms cannot be used in such a case. To put an end to this misconception, we first observe that the only points without well-defined derivatives correspond to data located at the breakpoints (corners) of these MFs. Since the breakpoints are finitely many in any universe of practical interest, it follows that there will be no data points at these positions in the generic case. Nevertheless, in order to really guarantee a well-defined estimation problem, we simply adopt the convention that derivatives at the breakpoints are zero, which means that any such data is excluded from the criterion fit. We thus conclude that the Jacobian can be formed regardless of which of the mentioned MFs are used, and that algorithms built on it can be employed to estimate the parameters of the structure.

Based on these observations, a suitable MF parameter estimation procedure is as follows:

1. Fix all $\beta$ and $\gamma$ parameters and estimate $\alpha$ using an unconstrained linear least-squares method. This gives a rough idea of the quality of the rule base and provides further clues on how to choose the initial parameter values.

2. Next, let all parameters loose and estimate these using either the Levenberg-Marquardt or the damped pseudo-inverse variant of the Gauss-Newton algorithm, both of which are equipped with regularization, meaning that only data-supported MF parameters are updated. Besides this extrapolation feature, it is worth emphasizing that another decisive reason for using one of these schemes instead of a pure gradient procedure (which is quite common in this area) is their superior convergence properties near the minimum (Dennis and Schnabel). Notice, though, that a gradient descent method can be warranted in online situations where real-time aspects are especially important.
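The first step of the procedure can be sketched as follows for a single regressor. This is a minimal illustration under stated assumptions, not the estimation code of the paper: the input MFs are Gaussians of the form exp(-(beta (u - gamma))^2 / 2), the data are synthetic, the linear least-squares problem is solved through the normal equations with a small hand-written Gaussian elimination (a QR-based solver would be the more robust choice), and all function names are hypothetical.

```python
import math

def mfgauss(u, beta, gamma):
    # Gaussian input MF with scale beta and position gamma (assumed form).
    return math.exp(-0.5 * (beta * (u - gamma)) ** 2)

def regressors(u, beta, gamma):
    # Normalized firing strengths; None when no rule fires (sample is skipped).
    w = [mfgauss(u, b, g) for b, g in zip(beta, gamma)]
    total = sum(w)
    return [wi / total for wi in w] if total > 0.0 else None

def solve(a, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda q: abs(m[q][i]))
        m[i], m[p] = m[p], m[i]
        for q in range(i + 1, n):
            f = m[q][i] / m[i][i]
            for c in range(i, n + 1):
                m[q][c] -= f * m[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def estimate_alpha(us, ys, beta, gamma):
    # Step 1: beta and gamma are fixed, alpha follows from the normal equations.
    rows, targets = [], []
    for u, y in zip(us, ys):
        row = regressors(u, beta, gamma)
        if row is not None:
            rows.append(row)
            targets.append(y)
    n = len(gamma)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]
    return solve(ata, aty)

# Synthetic, noise-free data generated from known centers (illustrative values).
beta, gamma, alpha_true = [3.0, 3.0, 3.0], [0.0, 1.0, 2.0], [1.0, 3.0, 2.0]
us = [i / 10.0 for i in range(21)]
ys = [sum(a * w for a, w in zip(alpha_true, regressors(u, beta, gamma))) for u in us]
alpha_hat = estimate_alpha(us, ys, beta, gamma)
print(alpha_hat)
```

With the scale and position parameters fixed, the predictor is linear in the centers, which is why this step needs no iteration. The second step would start a regularized Gauss-Newton type of search from the resulting estimate.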

The main problem with this straightforward approach is that nothing hampers the parameters from being updated in such a way that the original meaning of the parameters is lost. In what follows, we show how to avoid such behavior.

Preserving the meaning of a fuzzy rule base

The MFs and their parameters are directly coupled to certain linguistic values. What is important to recognize is that these values are ordered. Consider, e.g., the linguistic variable speed(t),

$$\langle \text{speed}(t),\; A = \{\text{low}, \text{rather low}, \text{medium}, \text{rather high}, \text{high}\},\; D \rangle,$$

which can be assigned five different values, here listed in an order representing higher and higher speed. It is of course crucial that this order is reflected by the corresponding MFs. If this is the case for all involved linguistic variables, then we say that the corresponding fuzzy model is linguistically sound. The important question is now how to relate this soundness concept to the parameters of the membership functions.

[Figure: four panels showing the MFs of the linguistic values low, rather low, medium, rather high, and high. (a) Output MFs, on $Y$. (b) Input MFs of network-classic type, on $U_k$. (c) Input MFs of Zadeh-formed type, on $U_k$. (d) Input MFs of piecewise linear type, on $U_k$.]

Fig. Restrictions on the parameters that are necessary for a linguistic variable (here speed(t)) to have a reasonable linguistic interpretation.

In case speed(t) is an output linguistic variable, then the MFs characterizing its linguistic values are fuzzy singletons located at positions $\alpha_{j_1}, \ldots, \alpha_{j_5} \in Y$. Assuming that the order of these centers corresponds to the order of the linguistic values, the interpretation of speed(t) is preserved if and only if

$$\alpha_{j_1} \le \alpha_{j_2} \le \alpha_{j_3} \le \alpha_{j_4} \le \alpha_{j_5}.$$

See plot (a) of the figure. For $n$ different linguistic values of the output variable, this merely generalizes to the parameter restrictions

$$\alpha_{j_1} \le \alpha_{j_2} \le \cdots \le \alpha_{j_n}.$$

The ordering of the linguistic values is also essential for the linguistic variables involved in the rule antecedents, but then the situation becomes more complicated, owing to the fact that the corresponding membership functions are more complex.

Suppose that speed(t) is a linguistic input variable coupled to the $k$th regressor, with linguistic values described by network-classic MFs. For these MFs to have a linguistic interpretation, a first requirement is that the five position parameters $\gamma_{j_k,k}$ are ordered as

$$\gamma_{1,k} \le \gamma_{2,k} \le \gamma_{3,k} \le \gamma_{4,k} \le \gamma_{5,k}.$$

Furthermore, the scale parameters $\beta_{j_k,k}$ can be divided into two classes: those that must be positive and those that must be negative. The latter category is applicable for MFs of sigmoidal type and is used to reflect that the degree of membership decreases when the input increases. If, on the other hand, the membership degree should increase with the input, or if the MF is a Gaussian, then the scale parameter must be positive. See (b) of the figure. The generalization to $r$ regressors, each with $n_k$ different linguistic values (listed in increasing order), is obvious:

$$\gamma_{1,k} \le \gamma_{2,k} \le \cdots \le \gamma_{n_k,k}, \qquad k = 1, \ldots, r.$$

These restrictions still allow some inconsistencies, as is also illustrated in (b) of the figure. The problem is that the membership degree of medium exceeds the membership degree of rather high for large input values, which is quite illogical. To cope with this difficulty, one idea is to impose further restrictions on $\beta$. However, this is not that easy, unless unnecessarily hard restrictions are inflicted, because what can be accepted depends partly on the current universe of discourse and partly on the values of the involved position parameters.

Zadeh-formed or piecewise linear MFs are both able to resolve this dilemma, as they do not
involve scale parameters. The $j_k$th linguistic value of the kth linguistic variable is instead
characterized by an MF with two, three or four position parameters. Let the components of
$\gamma_{j_k,k}$ be ordered on $U_k$ and denoted $\gamma^{i}_{j_k,k}$ for $i \in \{a, b, c, d\}$, where a denotes the point in $U_k$ from
which the membership degree starts to increase, b is the point where it reaches a full degree of
membership, c is the point where the membership degree starts to decrease (a parameter if and
only if $b \neq c$), and finally d is the point from which the membership degree is zero. For each
individual Zadeh-formed or piecewise linear MF it must thus hold that

$$\gamma^{a}_{j_k,k} < \gamma^{b}_{j_k,k} < \gamma^{c}_{j_k,k} < \gamma^{d}_{j_k,k},$$

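where parameters not present in the current MF are simply removed from the chain. In order to
maintain a reasonable meaning of the linguistic variables, we must also guarantee the ordering of
their respective MFs. For the linguistic variable speed(t) we obtain a language-consistent meaning
if, besides the per-MF ordering, it additionally holds that (see (c) or (d) of Fig.)

$$\gamma^{a}_{2,k} < \gamma^{a}_{3,k} < \gamma^{a}_{4,k} < \gamma^{a}_{5,k},$$
$$\gamma^{c}_{1,k} < \gamma^{b}_{2,k} < \gamma^{b}_{3,k} < \gamma^{c}_{3,k} < \gamma^{b}_{4,k} < \gamma^{b}_{5,k},$$
$$\gamma^{d}_{1,k} < \gamma^{d}_{2,k} < \gamma^{d}_{3,k} < \gamma^{d}_{4,k}.$$

In case there are r linguistic variables, each with $n_k$ ($k = 1, \ldots, r$) different and ordered linguistic
values, this generalizes to

$$\gamma^{a}_{1,k} < \gamma^{a}_{2,k} < \cdots < \gamma^{a}_{n_k,k},$$
$$\gamma^{b}_{1,k} < \gamma^{c}_{1,k} < \gamma^{b}_{2,k} < \gamma^{c}_{2,k} < \cdots < \gamma^{b}_{n_k,k} < \gamma^{c}_{n_k,k},$$
$$\gamma^{d}_{1,k} < \gamma^{d}_{2,k} < \cdots < \gamma^{d}_{n_k,k},$$

where, as before, parameters not present in the corresponding MFs are removed from the chains.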

By a proper modeling procedure, the initial parameter values will agree with the applicable
restrictions from above, i.e., the initial value of $\theta$ can be assumed to lie in the feasibility
region D. In order to also ensure a feasible parameter estimate when using an unconstrained
algorithm, it must here be required that the parameter update at any stage is such that the
constraints are not violated. Otherwise, convergence can be to a local and undesired minimum
where the parameters cannot be linked to the linguistic domain.

Although this problem does not always occur, it still appears surprisingly often in practice, even
for simple static systems, as is demonstrated in Lindskog. There are several plausible reasons
for this. One is that certain regression regions may be reflected by few and noisy data points that
actually suggest an infeasible update of the parameter vector. Notice that this is likely to happen
when many parameters are estimated, i.e., when the model structure is too flexible. Another reason
is that the initial parameter values, and thereby the initial shape of the corresponding membership
functions, are just too inaccurate.

The only way to really guarantee that the expert knowledge is preserved is to take the parameter
restrictions into account in the estimation phase. Since all restrictions considered thus far are
inequalities, this is a situation that can be handled straightforwardly through the barrier function
estimation approach discussed in Sect. More specifically, we introduce a number of simple
inequality constraints $c_j(\theta) > 0$:

    if $\theta_i > 0$, then let $c_j(\theta) = \theta_i$,
    if $\theta_{i_1} < \theta_{i_2}$, then let $c_j(\theta) = \theta_{i_2} - \theta_{i_1}$,

one for every link of the parameter restriction chains, which, when put together, bring about an
inequality constraint vector $c(\theta) > 0$, thereby defining D.
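The construction of the constraints can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the index layout of the parameter vector and the choice of a logarithmic barrier are hypothetical and are not taken from the software described later in the text.

```python
import math

def chain_constraints(theta, chain):
    """One constraint per link of an ordering chain theta[i0] < theta[i1] < ...

    Each link (a, b) yields c_j = theta[b] - theta[a], which must be > 0.
    """
    return [theta[b] - theta[a] for a, b in zip(chain, chain[1:])]

def sign_constraints(theta, positive=(), negative=()):
    """c_j = theta[i] for parameters that must be positive, -theta[i] for negative ones."""
    return [theta[i] for i in positive] + [-theta[i] for i in negative]

def log_barrier(c_values):
    """Assumed logarithmic barrier -sum(ln c_j); +inf outside the feasible region D."""
    if any(c <= 0.0 for c in c_values):
        return math.inf
    return -sum(math.log(c) for c in c_values)

# Hypothetical parameter vector: three ordered positions followed by two scales.
theta = [1.0, 2.5, 4.0, 0.8, -1.2]
c = chain_constraints(theta, chain=[0, 1, 2]) + sign_constraints(theta, positive=[3], negative=[4])
print(c, log_barrier(c))
```

A parameter vector lies in D exactly when every entry of `c` is positive, so an update that swaps two positions makes the barrier infinite and is rejected.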

Further parameter restrictions of inequality type can of course also be imposed via barrier
functions. We can, e.g., restrict a position parameter to any interval, thus guaranteeing that an
MF stays in a position where the corresponding linguistic value is considered valid.

Another possibility is to allow soft parameter knowledge, which is more or less consultative in
nature and whose purpose is to try to balance the expert and the data knowledge by assessing the
quality of the initial parameter values. The easiest method to achieve such a behavior is to use
explicit regularization, as was briefly discussed in Sect. As for the barrier function method, the
basic idea is to add a penalty term to the objective function, i.e., to find parameters by
iteratively minimizing something like

$$\hat\theta_N^{(k)} = \arg\min_{\theta \in D} \left( \frac{1}{N} \sum_{t=1}^{N} \big( y(t) - \hat y(t|\theta) \big)^2 - \mu^{(k)} \sum_{j=1}^{l} \ln c_j(\theta) + \sum_{i} \delta_i \big( \theta_i - \theta_i^{\#} \big)^2 \right),$$

where $\theta_i^{\#}$ denotes the initial value of the ith parameter and $\delta_i$ is a user-tunable parameter
expressing the relative belief in the value of $\theta_i^{\#}$. In effect, a large $\delta_i$ compared to the other
$\delta_j$ implies that the cost for moving away from $\theta_i^{\#}$ becomes high, thus expressing that its value is
believed to be close to $\theta_i^{\#}$.
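The effect of the regularization weight can be illustrated with a scalar toy problem. The sketch below assumes a single measurement y, a single parameter and no barrier term, so that the minimizer of (y - theta)^2 + delta (theta - theta0)^2 is available in closed form; the names `theta0` and `delta` are illustrative only.

```python
def regularized_estimate(y, theta0, delta):
    """Minimizer of (y - theta)**2 + delta * (theta - theta0)**2 for scalar theta.

    Setting the derivative to zero gives theta * (1 + delta) = y + delta * theta0.
    """
    return (y + delta * theta0) / (1.0 + delta)

# The data suggest theta = 3, the expert's initial value is theta0 = 1.
for delta in (0.0, 1.0, 100.0):
    print(delta, regularized_estimate(y=3.0, theta0=1.0, delta=delta))
```

With delta = 0 the data alone decide the estimate, whereas a large delta keeps the estimate close to the initial value, which is the balancing behavior described above.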

Guaranteeing certain non-structural system properties

Many dynamic processes are known from simple physics to have a steady-state gain curve that
is monotonically increasing (decreasing) in the inputs. Consider, e.g., a simple tank system where
the inflow is the input and the liquid level the output. Here it is known that a certain constant
inflow eventually leads to a constant liquid level. Starting from such a steady-state condition, it
is also known that an increase in the inflow causes the liquid level to increase in a non-oscillatory
manner and settle at a higher level. This is a non-structural system property that is extremely
important to retain in certain applications, e.g., when the model is going to be used in a predictive
control arrangement (Koivisto).

The main problem is now that with a flexible nonlinear black-box model structure (a
neural network, etc.) it can be quite hard to ensure this monotonicity feature, especially
if there are regression regions with few and noisy data. To remedy this, we will here consider a
restricted variant of the fuzzy model structure that guarantees an increasing (decreasing)
function mapping from the regression space $\mathbb{R}^r$ to the output space $\mathbb{R}$. This structure, together with
a proper choice of regressors $\varphi(t)$ (delayed inputs and outputs only), results in dynamical models
showing the desired monotone behavior.

A conceptually simple way to ensure monotonicity is to first restrict the MFs at the input side
to correspond to fuzzy partitions (Brown and Harris; Sjöberg et al.).

[Figure: two panels over the universe $U_k$, showing MFs of type mfz, mfpi and mfs (left) and of
type mfl, mftri, mftrap and mfr (right).]

Fig. Fuzzy partitions formed by Zadeh-formed MFs (left) and piecewise linear MFs (right).

Definition (fuzzy partition). Suppose that the kth linguistic variable can be assigned
to $n_k$ different values, each described by a membership function
$\mu_{A_{j_k,k}}(\varphi_k(t), \beta_{j_k,k}, \gamma_{j_k,k}) : U_k \to [0, 1]$.
These MFs form a fuzzy partition if it holds on the entire domain $U_k$ that

$$\sum_{j_k=1}^{n_k} \mu_{A_{j_k,k}}(\varphi_k(t), \beta_{j_k,k}, \gamma_{j_k,k}) = 1.$$
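As an illustration of the definition, the following Python sketch builds a fuzzy partition from piecewise linear MFs that share their break points and checks numerically that the membership degrees sum to one. The function name and the parameterization by peak positions only are assumptions made for this example.

```python
def tri_partition(x, centers):
    """Membership degrees of x in piecewise linear MFs peaking at sorted `centers`.

    The first and last MFs are shoulders (full membership outside the outermost
    centers); each interior MF is a triangle whose feet sit on the neighboring centers.
    """
    n = len(centers)
    mu = [0.0] * n
    if x <= centers[0]:
        mu[0] = 1.0
    elif x >= centers[-1]:
        mu[-1] = 1.0
    else:
        for j in range(n - 1):
            lo, hi = centers[j], centers[j + 1]
            if lo <= x <= hi:
                w = (x - lo) / (hi - lo)
                mu[j], mu[j + 1] = 1.0 - w, w
                break
    return mu

centers = [0.0, 2.0, 5.0]  # hypothetical peak positions on the universe
for x in (-1.0, 1.0, 3.5, 6.0):
    mu = tri_partition(x, centers)
    print(x, mu, sum(mu))
```

Because neighboring MFs share their break points, only the peak positions need to be stored, which is one way to see why a fuzzy partition uses fewer input-side parameters than freely parameterized MFs.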

By imposing this restriction on all the r linguistic variables (regressors), and additionally assuming
that the rule base is complete in the sense that it covers the whole input domain U, it immediately
follows that the model structure simplifies to

$$\hat y(t|\theta) = \sum_{j_1=1}^{n_1} \cdots \sum_{j_r=1}^{n_r} \alpha_{j_1,\ldots,j_r} \prod_{k=1}^{r} \mu_{A_{j_k,k}}(\varphi_k(t), \beta_{j_k,k}, \gamma_{j_k,k}).$$

Before proceeding, notice that a fuzzy partition puts certain demands on the MFs and their
parameters. For example, we cannot in general use sigmoidal or Gaussian MFs because of their
spreading and curvature. Zadeh-formed or piecewise linear MFs, on the other hand, can readily be
parameterized so that a fuzzy partition is obtained. See Fig.

Besides simplifying the predictor structure (no normalization is needed), a fuzzy partition always
leads to fewer parameters at the input side. The five MFs shown in Fig. are, e.g., described
by fewer parameters than a full degree of freedom parameterization of the same MFs would
require, as is shown in Fig. Still, of course, the remaining problem is that the complexity
of the predictor typically increases rapidly with r.

Consider now the case with a single input linguistic variable (r = 1). Guaranteeing that the
predictor is monotonically increasing in $\varphi(t)$ can be done in many different ways (especially if the
origin of $\theta$ is neglected), but then it can be quite hard to express restrictions on the parameters
that ensure that monotonicity is preserved in the estimation step. However, this is a simple task
when the input MFs form a fuzzy partition.

To see that this is the case, assume that all input MFs are ordered on the universe U in such
a way that $\mu_{A_j}$ reaches a full degree of membership for a value of $\varphi(t)$ that is lower than what
is the case for $\mu_{A_{j+1}}$. See Fig. If the ordered MFs at the input side form a fuzzy partition
and the corresponding centers $\alpha_j$ (reflecting the output MFs) are such that

$$\alpha_1 \le \alpha_2 \le \cdots \le \alpha_n,$$

then $\hat y(t|\theta)$ will show a monotonically increasing behavior. In verifying this, we first notice that at
intervals where the jth input MF is fully active, the corresponding output becomes $\alpha_j$. With
fuzzy partitions constructed by Zadeh-formed or piecewise linear MFs we also have that

$$\hat y(t|\theta) = \alpha_j \mu_{A_j} + \alpha_{j+1} \mu_{A_{j+1}} = \alpha_j + (\alpha_{j+1} - \alpha_j)\, \mu_{A_{j+1}}$$

on every interval of U where $\mu_{A_j}$ and $\mu_{A_{j+1}}$ are not identically zero. Since $\alpha_j \le \alpha_{j+1}$
(equality gives a constant output on the current interval) and $\mu_{A_{j+1}}$ is an increasing function on
that interval, it follows that also $\hat y(t|\theta)$ is an increasing function on that interval, with values ranging
from $\alpha_j$ to $\alpha_{j+1}$. These facts altogether give that the overall predictor is a nondecreasing function.
To get a strictly increasing mapping, it must additionally be required that none of the input MFs
has an interval with a full degree of membership.
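The one-dimensional argument can be checked numerically. The sketch below assumes piecewise linear MFs forming a fuzzy partition (parameterized by hypothetical peak positions) and compares a predictor with ordered centers against one with unordered centers; all names and numbers are illustrative.

```python
def partition_memberships(x, centers):
    """Triangular fuzzy-partition memberships of x; shoulders saturate outside the centers."""
    x = min(max(x, centers[0]), centers[-1])
    mu = [0.0] * len(centers)
    for j in range(len(centers) - 1):
        lo, hi = centers[j], centers[j + 1]
        if lo <= x <= hi:
            w = (x - lo) / (hi - lo)
            mu[j], mu[j + 1] = 1.0 - w, w
            break
    return mu

def predict(x, centers, alphas):
    """y_hat = sum_j alpha_j * mu_j(x); no normalization is needed since sum_j mu_j = 1."""
    return sum(a * m for a, m in zip(alphas, partition_memberships(x, centers)))

def is_nondecreasing(values, tol=1e-12):
    """True if the sequence never drops by more than a rounding tolerance."""
    return all(b >= a - tol for a, b in zip(values, values[1:]))

centers = [0.0, 2.0, 5.0, 10.0]            # hypothetical MF peak positions
grid = [0.1 * i for i in range(-10, 121)]  # covers the universe and a margin outside it
ordered = [predict(x, centers, [0.0, 1.0, 2.5, 4.0]) for x in grid]
unordered = [predict(x, centers, [0.0, 3.0, 1.0, 4.0]) for x in grid]
print(is_nondecreasing(ordered), is_nondecreasing(unordered))
```

With ordered centers the predictor is a nondecreasing interpolation between the centers, while swapping two centers produces a decreasing stretch between the corresponding peaks.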

Two things are worth emphasizing before considering systems with several input linguistic
variables. The first is that Zadeh-formed MFs always result in models with local plateaus at
each position $\alpha_j$. This is a behavior that is quite unrealistic in certain applications, thus favoring
piecewise linear MFs, as these do not introduce such plateaus unless trapezoidal MFs are used.
The second observation is that the restrictions imposed by a fuzzy partition typically reduce the
risk for position changes among the parameters $\alpha$ and $\gamma$, which in terms of estimation algorithms
means that it is often sufficient to use an unconstrained procedure instead of a constrained one.

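Now, in order to generalize the above result to predictors having r regressors, it is instructive
to first formally define what is meant by a monotonically increasing predictor in $\varphi(t)$.

Definition (regressor ordering). Let $\varphi^{(1)}(t), \varphi^{(2)}(t) \in \mathbb{R}^r$. We say that $\varphi^{(1)}(t) \le \varphi^{(2)}(t)$ if
$\varphi^{(1)}_j(t) \le \varphi^{(2)}_j(t)$ for $j = 1, \ldots, r$.

Definition (monotonically increasing predictor). Let $\varphi^{(1)}(t), \varphi^{(2)}(t) \in \mathbb{R}^r$. We say that a
predictor $g(\varphi(t))$ is monotonically increasing in the regressors if, whenever $\varphi^{(1)}(t) \le \varphi^{(2)}(t)$, it holds
that $g(\varphi^{(1)}(t)) \le g(\varphi^{(2)}(t))$.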

Using the latter definition, we now have the following central theorem.

Theorem. Let the model structure be complete and given by the simplified form above. If for all $k = 1, \ldots, r$
it holds that

$$\sum_{j_k=1}^{n_k} \alpha_{j_1,\ldots,j_r}\, \mu_{A_{j_k,k}}(\varphi_k(t), \beta_{j_k,k}, \gamma_{j_k,k})$$

are monotonically increasing functions in $\varphi_k(t)$ on $U_k$ for all possible combinations of fixed values
of $j_1, \ldots, j_{k-1}, j_{k+1}, \ldots, j_r$, then the predictor is monotonically increasing in $\varphi(t)$.

Proof. See Lindskog.

The main point with the theorem is that it is sufficient to work with one-dimensional functions. A
simple way to ensure increasing functions in all $\varphi_k(t)$ is therefore to restrict the input MFs to fuzzy
partitions and order the corresponding centers as was done in the one-dimensional case.

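Lemma. Let the model structure be the simplified form above and let $\varphi_k(t)$ denote one of its regressors. Assume
that the (ordered on $U_k$) MFs associated with $\varphi_k(t)$ are either Zadeh-formed or piecewise linear and
such that they form a fuzzy partition. If for all possible combinations of $j_1, \ldots, j_{k-1}, j_{k+1}, \ldots, j_r$
it holds that

$$\alpha_{j_1,\ldots,j_k,\ldots,j_r} \le \alpha_{j_1,\ldots,j_k+1,\ldots,j_r}$$

for all $j_k = 1, \ldots, n_k - 1$, then every

$$\sum_{j_k=1}^{n_k} \alpha_{j_1,\ldots,j_r}\, \mu_{A_{j_k,k}}(\varphi_k(t), \beta_{j_k,k}, \gamma_{j_k,k})$$

is a monotonically increasing function in $\varphi_k(t)$ on $U_k$.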

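[Figure: fuzzy rule base laid out over the MFs of $\varphi_1(t)$ on $U_1$ and $\varphi_2(t)$ on $U_2$ (left); resulting surface
$\hat y(t|\theta)$ over $U_1 \times U_2$ (right).]

Fig. Graphical representation of a complete fuzzy rule base (left). Both linguistic
variables at the input side have MFs forming fuzzy partitions. Ordering the centers as
$\alpha_{j_1,j_2} \le \alpha_{j_1+1,j_2}$ and as $\alpha_{j_1,j_2} \le \alpha_{j_1,j_2+1}$ over the respective index ranges of $j_1$ and $j_2$ gives an
increasing function mapping, as is shown to the right.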

Proof. Follows directly from the one-dimensional case discussed above.

The requirements for the theorem to hold are fulfilled if all the MFs are chosen according to
the lemma. This is the case for the fuzzy rule base shown in Fig., from which it is clear that
the resulting predictor returns a larger or unchanged output if one or more of the regressors
become larger. Notice that this property continues to hold if only the original orders among
the parameters $\alpha$ and $\gamma$ are maintained. Since this will be the case when pursuing constrained
estimation subject to these order constraints, we conclude that the monotonicity property can be
preserved throughout the estimation phase.

At this point, assume that the regressors include dynamics,

$$\varphi(t) = [\, y(t-1)\ \ y(t-2)\ \cdots\ \ u(t-1)\ \ u(t-2)\ \cdots \,]^T,$$

where without loss of generality only one input signal is present. A globally asymptotically stable
predictor $g(\cdot)$ in $\varphi(t)$ implies that a constant input $u = u(t-1) = u(t-2) = \cdots$ leads to a
constant output y as $t \to \infty$. Plotting y for each value of u gives the so-called steady-state gain
curve.

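Lemma. Let $(u^{(1)}, y^{(1)})$ and $(u^{(2)}, y^{(2)})$ be two steady-state solutions to a globally asymptotically stable
predictor $g(\varphi(t))$, i.e.,

$$y^{(1)} = g\big( [\, y^{(1)}\ \cdots\ y^{(1)}\ \ u^{(1)}\ \cdots\ u^{(1)} \,]^T \big),$$
$$y^{(2)} = g\big( [\, y^{(2)}\ \cdots\ y^{(2)}\ \ u^{(2)}\ \cdots\ u^{(2)} \,]^T \big).$$

If $g(\varphi(t))$ is monotonically increasing in $\varphi(t)$ and $u^{(1)} \le u^{(2)}$, then $y^{(1)} \le y^{(2)}$.

Proof. See Lindskog.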

If the requirements of the lemma are fulfilled, then we get a predictor with a monotonically
increasing steady-state gain curve in the input. Moreover, starting from a steady-state solution
and increasing the input in a stepwise fashion, it follows by simple induction that $\hat y(t|\theta)$ increases
monotonically with t. This in particular means that the predictor shows a non-oscillatory step
response behavior, which is a restriction, but also a property that is valid for many industrial
processes, e.g., thermal systems. We will apply fuzzy identification based on this structure to one such
process in Sect.
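The step response property can be illustrated by simulating a small first-order predictor. The sketch below assumes two regressors, y(t-1) and u(t-1), piecewise linear fuzzy partitions on both, and a hypothetical table of centers that is ordered along both indices and chosen so that the recursion is stable; none of the numbers come from the source.

```python
def interp_weights(x, nodes):
    """Piecewise linear fuzzy-partition memberships of x over sorted nodes."""
    x = min(max(x, nodes[0]), nodes[-1])
    w = [0.0] * len(nodes)
    for j in range(len(nodes) - 1):
        lo, hi = nodes[j], nodes[j + 1]
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)
            w[j], w[j + 1] = 1.0 - t, t
            break
    return w

Y_NODES = [0.0, 5.0, 10.0]   # MF peaks for the regressor y(t-1)
U_NODES = [0.0, 1.0]         # MF peaks for the regressor u(t-1)
# Centers alpha[j1][j2], nondecreasing along both indices (assumed values).
ALPHA = [[0.0, 3.0], [2.0, 6.0], [6.0, 10.0]]

def g(y_prev, u_prev):
    """Predictor: sum over all rules of center times product of memberships."""
    wy = interp_weights(y_prev, Y_NODES)
    wu = interp_weights(u_prev, U_NODES)
    return sum(ALPHA[i][k] * wy[i] * wu[k] for i in range(3) for k in range(2))

def step_response(y0, u, steps):
    """Simulate y(t) = g(y(t-1), u) from the initial level y0 with constant input u."""
    ys = [y0]
    for _ in range(steps):
        ys.append(g(ys[-1], u))
    return ys

# y = 0 is the steady state for u = 0; the input then steps to u = 1.
response = step_response(y0=0.0, u=1.0, steps=30)
print(response[:5], response[-1])
```

The simulated output rises without overshoot toward the new steady state, in line with the induction argument above.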

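[Figure: block diagram in which a fuzzy model structure with output $\hat y_f(t|\theta_f)$ is combined with a
black-box model structure with output $\hat y_b(t|\theta_b)$, the latter driven by the residuals $\varepsilon(t|\theta_f)$.]

Fig. Combined two-stage fuzzy and black-box estimation procedure. The fuzzy model is first obtained,
whereupon the residuals $\varepsilon(t|\theta_f)$ are used for the tuning of the black-box parameters $\theta_b$.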

Fuzzy hybrid modeling: Some possibilities

Although a fuzzy model of this form is theoretically able to approximate any well-behaved
system to a desired degree of accuracy, this may require a too complex rule base, particularly if
r is large. With the aim of reducing the complexity of the models while also maintaining or even
enhancing their performance, it is interesting to marry together fuzzy and other identification
approaches. This can be done in many different ways.

Fuzzy modeling based on physically induced regressors. A novel first idea is to keep the fuzzy
structure but apply more involved and physically motivated regressors (linguistic variables) than
just delayed inputs and outputs. Parts of the important system nonlinearities can then be captured
directly in the regressors, which typically means that fewer MFs, parameters and/or regressors r
are needed in the resulting models. For example, in order to model the power delivered by a heater
element (a resistor of some kind), an obvious physically motivated regressor to use would be the
squared voltage applied to the heater. In other and more sophisticated modeling situations, suitable
regressors can be implicitly given in terms of some dynamic and/or static equations. To then
arrive at a set of physically induced regressors requires both symbolic and numeric computations,
as is stressed in Lindskog and Ljung; Lindskog.

Combining fuzzy and traditional grey box modeling. Many real-world systems are composed of
several subcomponents, some of which are well described directly in terms of physical principles
(conservation laws, etc.) and some of which are better described in linguistic terms. This fact
strongly motivates the use of several small and interacting fuzzy and grey box model structures
which, when combined, give the overall predictor. It is our opinion that such a model decomposition
always should be considered in a complex modeling situation.

Combining fuzzy and black-box modeling. Even if the above possibilities are contemplated, there
can still be important system phenomena that are hard to reflect within the fuzzy framework. As
mentioned earlier, it is then appealing to complement the expert-determined fuzzy structure by
a sufficiently flexible black-box ditto (typically a neural network), which is solely responsible for
picking up the remaining dynamics. This naturally leads to the setup shown in Fig., which in
structure is similar to what is discussed and proved useful in Forssell and Lindskog.
Notice that the parameters $\theta_f$ of the fuzzy structure are first estimated based on the measurements
y(t). The residuals $\varepsilon(t|\theta_f) = y(t) - \hat y_f(t|\theta_f)$ are then formed and used for the tuning of the
black-box parameters $\theta_b$. The main reason for using this particular scheme is that the obtained
fuzzy model gives useful insight into the choice of black-box structure, its size and so forth.
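The two-stage idea can be sketched with deliberately simple stand-ins. In the Python example below, a one-parameter linear model plays the role of the first-stage (fuzzy) structure and a one-parameter quadratic term plays the role of the black-box structure; both stand-ins and the synthetic data are assumptions made purely for illustration.

```python
def ls_gain(xs, ys):
    """Least-squares gain a minimizing sum((y - a * x)**2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def rms(values):
    """Root mean square of a sequence."""
    return (sum(v * v for v in values) / len(values)) ** 0.5

# Synthetic data with a part the first-stage model cannot describe.
x = [0.5 * k for k in range(1, 11)]
y = [2.0 * v + 0.5 * v * v for v in x]

# Stage 1: estimate the first model from the measurements y.
a = ls_gain(x, y)
eps_f = [yi - a * xi for xi, yi in zip(x, y)]       # residuals of stage 1

# Stage 2: tune the second model on the residuals only.
x_sq = [v * v for v in x]
b = ls_gain(x_sq, eps_f)
eps = [e - b * q for q, e in zip(x_sq, eps_f)]      # remaining residuals

print(rms(eps_f), rms(eps))
```

Since the second stage is fitted by least squares to the stage-one residuals, the remaining residual norm cannot exceed the one left by the first stage. Whether the combined model generalizes better still has to be judged on validation data.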

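[Figure: schematic of the tank with pump voltage u(t) [V] and liquid level h(t) [cm] (left); plots of
u(t) [V] and h(t) [cm] versus time [min] (right).]

Fig. Schematic picture of the laboratory-scale tank system (left). Experimental data used for
estimation (right).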

Example: Tank level modeling

The objective in this application example is to model how the liquid level h(t) of a simple
laboratory-scale tank system, shown in Fig., changes with the inflow that is generated by
the voltage u(t) applied to the pump. We see that the measured estimation data samples
cover the interesting modeling domain rather well.

To get a feeling for the nonlinearities, it is useful to first take a closer look at the
behavior of a simple linear regression model and compare this with experimental validation data.
One of the best model structures having delayed inputs and outputs as regressors involves three parameters only:

$$\hat h(t|\theta) = \theta_1 h(t-1) + \theta_2 u(t-1) + \theta_3.$$

Simulated outputs from the corresponding linear least-squares fitted model are compared to real
tank measurements (new samples) in the left plot of Fig. The fit is clearly not that bad,
yet the model output is physically impossible since it is sometimes negative. This is of course a
nontrivial complication if we are going to use the model to study the behavior of the real system.
In fact, all linear regression models with delayed inputs and outputs as regressors show this defect.

A simple idea to overcome this difficulty is now to try some semi-physical modeling. The tank
level change depends on the difference between in- and outflow (conservation of mass). While the
inflow is roughly proportional to u(t), the outflow can be approximated using Bernoulli's law,
which for a small outlet hole states that the outflow is proportional to $\sqrt{h(t)}$. By combining these
facts, it is pretty straightforward to arrive at the nonlinear model structure (a linear regression)

$$\hat h(t|\theta) = \theta_1 h(t-1) + \theta_2 \sqrt{h(t-1)} + \theta_3 u(t-1) + \theta_4.$$

By tuning the four parameters of this structure we obtain a model whose simulation behavior is
detailed in the right plot of Fig. Compared to the linear regression model, the semi-physical
model gives a physically sound response and is seemingly better, except for large tank levels. Still,
however, there is no guarantee that the model outputs are physically sound for other input values.
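The semi-physical structure remains a linear regression, so it can be fitted with ordinary least squares. The Python sketch below assumes the regressors 1, h(t-1), sqrt(h(t-1)) and u(t-1), generates noise-free synthetic data from hypothetical parameter values (not the tank data of the source) and recovers those values through the normal equations.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def least_squares(rows, targets):
    """Solve the normal equations (Phi^T Phi) theta = Phi^T y of a small regression."""
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    return solve(A, b)

def regressors(h_prev, u_prev):
    """Semi-physical regressor vector: offset, level, square root of level, voltage."""
    return [1.0, h_prev, math.sqrt(h_prev), u_prev]

# Hypothetical "true" parameters and a piecewise constant input sequence.
true_theta = [0.1, 1.0, -0.3, 0.5]
levels = [1.0, 3.0, 2.0, 2.5, 1.5]
u = [levels[(t // 40) % 5] for t in range(200)]
h = [4.0]
for t in range(1, 200):
    phi = regressors(h[t - 1], u[t - 1])
    h.append(sum(th * p for th, p in zip(true_theta, phi)))

rows = [regressors(h[t - 1], u[t - 1]) for t in range(1, 200)]
theta_hat = least_squares(rows, h[1:])
print(theta_hat)
```

On real, noisy data the estimate would of course deviate from any underlying values, and a numerically safer solver (QR or SVD based) would be preferable to the plain normal equations used here.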

Before trying to counteract this, it is expedient to list what is actually known about the process.

1. First of all, we know that the more inflow, the higher the liquid level will be. The steady-state
   gain curve of the model should thus be monotonically increasing in u(t) = u.

2. The input u(t) is restricted to a bounded voltage interval, which means that there is always a flow across
   the tank. Even if the estimation data set is of rather high quality, it still shows some gaps.
   The tank is, e.g., never emptied, nor is it completely filled up. However, we know for sure
   that these situations can occur for suitable input voltages u(t). A good model should be equipped with
   these extrapolation capabilities.

3. The true function mapping shows no intermediate local plateaus.

[Figure: two panels of tank level h(t) [cm] versus time [min], each showing measured and simulated
outputs together with the RMS error and the max error.]

Fig. Simulation behavior (based on validation data) of typical ARX (left) and semi-physical (right)
models describing the level of the tank depicted in Fig.

Since these features can be captured using the earlier discussed fuzzy framework, we next turn
to some fuzzy identification. The ARX and the semi-physical models indicate that
h(t-1) and u(t-1) are useful signals (regressors). Taking the ARX
model as the starting point and noticing its good performance at high levels, it
is reasonable to put further modeling effort into regions where h(t-1) is low. Desiring also a low
complexity model, it is sensible to describe each linguistic variable with few linguistic values:

$\langle$ level(t), A = {zero, very low, low, rather low, high, max}, D: $\hat h(t|\theta) \in Y$ $\rangle$
$\langle$ level(t-1), $A_1$ = {zero, low, high}, $D_1$: $\varphi_1(t) = h(t-1) \in U_1$ $\rangle$
$\langle$ voltage(t-1), $A_2$ = {low, high}, $D_2$: $\varphi_2(t) = u(t-1) \in U_2$ $\rangle$

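The listed system properties can now be guaranteed if the output MFs (fuzzy singletons)

$$\mu_{\text{zero}}(\hat h(t|\theta)),\ \mu_{\text{very low}}(\hat h(t|\theta)),\ \mu_{\text{low}}(\hat h(t|\theta)),\ \mu_{\text{rather low}}(\hat h(t|\theta)),\ \mu_{\text{high}}(\hat h(t|\theta)),\ \mu_{\text{max}}(\hat h(t|\theta))$$

and the input MFs

$$\mu_{\text{zero}}(\varphi_1(t)) = \mu_{A_{1,1}}(\varphi_1(t)) = \text{mfl}(\varphi_1(t), \gamma_{1,1}),$$
$$\mu_{\text{low}}(\varphi_1(t)) = \mu_{A_{2,1}}(\varphi_1(t)) = \text{mftri}(\varphi_1(t), \gamma_{2,1}),$$
$$\mu_{\text{high}}(\varphi_1(t)) = \mu_{A_{3,1}}(\varphi_1(t)) = \text{mfr}(\varphi_1(t), \gamma_{3,1}),$$
$$\mu_{\text{low}}(\varphi_2(t)) = \mu_{A_{1,2}}(\varphi_2(t)) = \text{mfl}(\varphi_2(t), \gamma_{1,2}),$$
$$\mu_{\text{high}}(\varphi_2(t)) = \mu_{A_{2,2}}(\varphi_2(t)) = \text{mfr}(\varphi_2(t), \gamma_{2,2})$$

are used in the fuzzy predictor

$$\hat h(t|\theta) = \sum_{j_1=1}^{3} \sum_{j_2=1}^{2} \alpha_{j_1,j_2} \prod_{k=1}^{2} \mu_{A_{j_k,k}}(\varphi_k(t)),$$

which contains the free parameters collected in $\theta$, chosen so that the ordering restrictions
discussed above are satisfied.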

[Figure: premise MFs for level(t-1) on $U_1$ [cm] (zero, low, high) and voltage(t-1) on $U_2$ [V] (low, high)
to the left; consequence MFs on Y [cm] (zero, very low, low, rather low, high, max) to the right.]

Fig. Premise (left) and consequence (right) MFs for describing the liquid level of the tank system.
Dotted curves show the situation when only the centers are estimated. Solid curves show the situation
after constrained estimation subject to the constraints.

[Figure: two panels of tank level h(t) [cm] versus time [min], each showing measured and simulated
outputs together with the RMS error and the max error.]

Fig. Simulation (based on validation data) of unconstrained linear least-squares (left) and constrained
(right) estimated fuzzy models reflecting the liquid level of the tank from Fig.

A graphical representation of the corresponding complete fuzzy rule base is shown in Fig.
Notice that the MFs associated with each regressor form a fuzzy partition and that this fact,
together with the ordering restrictions, guarantees a monotonically increasing predictor in $\varphi(t)$. By
the steady-state lemma we also get a steady-state gain curve that is monotonically increasing in u(t) = u
(the first property). Furthermore, the extrapolation property is ensured by fixing some of the
parameters, thereby assuring that the predictor is able to
return values in the whole output universe Y. The third property is finally guaranteed by the use
of piecewise linear MFs.

With $\gamma$ fixed according to the left plot of Fig. (dotted curves), unconstrained linear
least-squares estimation of the four free centers yields a feasible parameter estimate; see the upper right
plot of Fig. The simulation detailed to the left of Fig. also indicates that this first model
is rather good. Starting from this point, it is now true that unconstrained estimation of all seven
parameters renders a model with a lower root mean square (RMS) error than
the first model, but then it becomes difficult to linguistically interpret the obtained model.

[Figure: surface of $\hat h(t|\theta)$ [cm] over $U_1$ [cm] and $U_2$ [V] (left); steady-state level [cm] versus
u(t) = u [V] (right).]

Fig. Function mapping (left) and steady-state gain curve (right) of the fuzzy tank model obtained
using constrained estimation.

Resolving this dilemma by performing constrained estimation subject to the constraints gives
linguistically sound MFs, as is shown in Fig. (solid curves). On top of that, this final model
shows the best simulation performance of all models derived; compare the simulation figures. The
built-in increasing nature of the final predictor is now evident from the left plot of Fig., and although
this mapping is at first sight quite similar to a linear one, it is clear from the steady-state gain curve
of Fig. that the model has an important nonlinear behavior in the interesting operating
region. Notice also that this steady-state gain curve is monotonically increasing in u(t) = u.

From this discussion we conclude that the tank system can be accurately described by a fuzzy
model having few estimated parameters. A sound physical behavior is guaranteed by applying
the fuzzy partition based model structure, which allows inclusion of certain extrapolation and steady-state gain
monotonicity features. The latter property is especially important to reflect in certain predictive
control applications, as is stressed in Koivisto. Even if models that are good from a loss
function point of view are used, it is there illustrated that without such a property (when known
from physics) severe stability problems often arise. This behavior is indeed related to the difficulties
occurring when performing standard adaptive control based on linear models for which the sign of
the first B(q) parameter is incorrect (Åström and Wittenmark).

Practical aspects

This section addresses a number of practical issues that ought to be considered in connection with
fuzzy grey box identification.

Model complexity

We have earlier stressed that the model complexity typically increases rapidly with the number
of possible linguistic variables r, particularly if these can be assigned to many different linguistic
values. While the number of variables can be reduced by a fuzzy hybrid approach (see Sect.),
the number of linguistic values can be kept down by using a coarse description language.

In a way this is in conflict with the expert's attempt of pursuing accurate linguistic modeling,
yet the use of few MFs is often desirable from an estimation point of view, especially in complex
modeling situations where the data is sparse in the regression space. The main reason for this is
that a too dense MF configuration implies that the corresponding parameters are fit to few data,
which typically leads to models that perform rather poorly for other data records.

To overcome these difficulties, a good practice is often to start with a rather coarse rule base
and, if necessary, successively refine it. This can be accomplished by lumping together similar
linguistic values into one single notion (treat, e.g., very low, low and somewhat low as one
linguistic value described by one MF). If the estimated coarse model is not good enough, then
introduce new MFs based partly on the expert knowledge and partly on the performance of the
coarse model. The use of the coarse model is foremost motivated by the fact that it provides
local performance information, i.e., it gives useful data-guided refinement information. It may
also provide information about phenomena that were overlooked in the modeling phase. This
fine-grain procedure is now iterated until, hopefully, a good enough model is found. Although user
interaction is good for identifying and avoiding pitfalls, the major drawback with the approach is
of course that it is rather time-consuming compared to pure black-box modeling.

Robustness of the identification method

The success of any identification method relies on the descriptive power of the model structure
as well as on the quality of the estimation data. Compared to a pure data-driven identification
approach, it is easier to avoid data-caused pitfalls by using an expert-determined model structure.
Concerning estimation algorithms, it is worth stressing that the preferred schemes of Sect.
are all robust in the sense that the fit of the tuned model is at least as good as what is obtained
with the initial parameters. With a constrained estimation procedure we can in addition guarantee
that the estimated models are linguistically sound. Notice, though, that this does not imply that
a better model is obtained, as this can only be assessed after a careful validation procedure.

A distinct advantage with expert modeling is that redundancy in terms of similar MFs, as well
as physically unsound regions, can be avoided. Apart from reducing the model complexity, this
also leads to fewer ill-conditioning problems. To fully handle ill-conditioning we used regularization,
which has the nice add-on property that it enables extrapolation into more or less expert-explained
regression regions. Such a regularizing effect is mediated in the algorithms through SVD
computations, which are known to be numerically robust to carry out (Golub and Van Loan).

Software

An always present and relevant issue in system identification is the availability of software tools.
The prototype package used for the above experiments consists of a number of MATLAB (The
MathWorks, Inc.) m-files, which can be downloaded from the library

ftp://ftp.control.isy.liu.se/pub/Software/Fuzzy

Since the model structure as well as the constraints are represented as strings, this package
can be used for rather general predictors and not just fuzzy ones. For example,
it can be applied directly to the fuzzy hybrid approaches suggested in Sect.

However, working with strings on a textual basis is a bit awkward and error prone. This
is a problem that can be relaxed significantly through a graphical user interface (GUI) of the
kind provided by MathWorks' fuzzy logic toolbox (Roger Jang and Gulley). The design of
such a GUI is an obvious project for the future. Other important software projects for the
future include the implementation of more efficient constrained estimation algorithms, as well as
the development of general and versatile validation procedures.

Conclusions and future work

After experiment design and data collection, a typical system identification session involves two
main issues: model structure determination followed by parameter estimation. In this contribution
we have considered fuzzy grey box identification, which assumes that the former problem is
addressed (at least partly) by a human domain expert, who indirectly describes the model structure
in terms of a number of if-then rules. Taking various fuzzy and identification aspects into account,
we arrived at the Mamdani fuzzy model structure, which in a more traditional identification
setting is nothing but a series expansion of composition type, having much in common with
feedforward neural networks, RBFN networks, model regression trees, etc.

This kinship in particular means that efficient Newton-type algorithms (the pseudo-inverse
Gauss-Newton or the Levenberg-Marquardt procedures) can be applied for MF parameter estimation.
Since these schemes are equipped with regularization, it is to some extent possible to preserve
expert knowledge having minor data support. However, the series expansions are usually rich in
terms of the number of parameters. This fact, especially in combination with few and noisy data,
sometimes means that the original linguistic interpretation of the rules is lost in the estimation
step. To avoid such an undesired behavior, it is necessary to impose certain restrictions on the MF
parameters and then solve the obtained constrained minimization problem.

For some model-based control applications it is also extremely important that the applied models
reflect certain non-structural system properties, e.g., a monotonically increasing steady-state gain
curve and/or a non-oscillatory step response behavior. Whereas such features are in general difficult
to guarantee when using neural networks or other flexible series expansions, these can be dealt with
by employing the special fuzzy partition based model structure. Experiments on real-world
data (in this case a tank system), as well as other applications (Lindskog), have demonstrated
the feasibility and the usefulness of this approach.

To this end let us nally p oint to some extensions and op en problems related to the fuzzy

identication framework discussed ab ove

Stability and various robustness issues are very important when the models are going to be used in control applications. How, then, does the choice of MFs affect stability? How, and to what extent, can the linguistic system knowledge be exploited for robust control design? Can we apply modern stability tools stemming from the robust control field (unstructured uncertainties, etc.)? For this problem, Suykens et al. have already suggested an interesting method based on a particular neural network model. The basic idea is to view the neural network as a nominal linear model with bounded nonlinear feedback perturbations and then use a standard robust control design scheme. The obvious question here is whether a similar procedure can be devised for fuzzy models as well.

To ensure a monotonic steady-state gain curve, we restricted the MFs to correspond to fuzzy partitions. What other, and perhaps better, MF configurations are able to preserve this knowledge? Also, what other kinds of nonstructural properties can be captured within the fuzzy framework?

A water heating system (Koivisto) with a known increasing steady-state gain curve behavior is successfully modeled in Lindskog using the fuzzy model structure. Because of seasonal temperature variations and some other factors, this model is only valid under certain operating conditions. To also handle long-term seasonal changes, there is a need for a fuzzy-specific and monotonicity-preserving recursive estimation algorithm.

The applications considered and mentioned above are rather small. However, it is our belief that the use of linguistic expert knowledge really pays off for more involved processes. To investigate this, it is worth looking further into application fields where verbal knowledge dominates, as is the case, e.g., for many biomedical or biochemical systems.

Bibliography

Aguirre, L. A. and S. A. Billings. Improved structure selection for nonlinear models based on term clustering. International Journal of Control.

Åström, K. J. and B. Wittenmark. Adaptive Control, 2nd ed. Electrical Engineering: Control Engineering. Addison-Wesley.

Babuška, R. and H. B. Verbruggen. Applied fuzzy modeling. In Proceedings of the IFAC Symposium on Artificial Intelligence in Real Time Control, pp., Valencia, Spain.

Björck, Å. Numerical Methods for Least Squares Problems. SIAM.

Bohlin, T. Interactive System Identification: Prospects and Pitfalls. Communications and Control Engineering. Springer-Verlag.

Breiman, L. Hinging hyperplanes for regression, classification and function approximation. IEEE Transactions on Information Theory, May.

Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. The Wadsworth Probability Series. Wadsworth & Brooks.

Brown, M. and C. Harris. Neurofuzzy Adaptive Modelling and Control. Systems and Control Engineering. Prentice Hall International.

Chen, C. H. (Ed.). Fuzzy Logic and Neural Network Handbook. McGraw-Hill.

Chen, S. and S. A. Billings. Neural networks for nonlinear dynamic system modelling and identification. International Journal of Control.

Dennis, J. E. and R. B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice Hall.

Draper, N. and H. Smith. Applied Regression Analysis, 2nd ed. John Wiley & Sons.

Driankov, D., H. Hellendoorn, and M. Reinfrank. An Introduction to Fuzzy Control. Springer-Verlag.

Dubois, D. and H. Prade. Fuzzy sets in approximate reasoning, part. Fuzzy Sets and Systems.

Fletcher, R. Practical Methods of Optimization. John Wiley & Sons.

Forssell, U. and P. Lindskog. Combining semi-physical and neural network modeling: an example of its usefulness. Submitted to the th IFAC Symposium on System Identification (SYSID), to be held in Fukuoka, Japan, July.

Golub, G. H. and C. F. Van Loan. Matrix Computations, 2nd ed. Johns Hopkins University Press.

Hangos, K. M. (Ed.). International Journal of Adaptive Control and Signal Processing, Special Issue on Grey Box Modelling, Vol., November-December. John Wiley & Sons.

Haykin, S. Neural Networks: A Comprehensive Foundation. Macmillan.

Higgins, C. M. and R. M. Goodman. Fuzzy rule-based networks for control. IEEE Transactions on Fuzzy Systems, February.

Ishigami, H., T. Fukuda, T. Shibata, and F. Arai. Structure optimization of fuzzy neural network by genetic algorithm. Fuzzy Sets and Systems, May.

Johansen, T. A. Operating Regime Based Process Modeling and Identification. PhD thesis W, Division of Engineering Cybernetics, University of Trondheim, Trondheim, Norway, November.

Johansen, T. A. and B. A. Foss. Identification of nonlinear system structure and parameters using regime decomposition. In Preprints of the th IFAC Symposium on System Identification, M. Blanke and T. Söderström (Eds.), Vol., July, pp., Copenhagen, Denmark.

Kaymak, U. and R. Babuška. Compatible cluster merging for fuzzy modeling. In Proceedings FUZZ-IEEE/IFES, pp., Yokohama, Japan.

Kirkpatrick, S., C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, May.

Koivisto, H. A Practical Approach to Model Based Neural Network Control. PhD thesis, Tampere University of Technology, Tampere, Finland, December.

Kosko, B. Fuzzy systems as universal approximators. In Proceedings of the 1st IEEE International Conference on Fuzzy Systems, pp., San Diego, CA, USA.

Kung, S. Y. Digital Neural Networks. Prentice Hall.

Lee, C. C. Fuzzy logic in control systems: fuzzy logic controller, parts I and II. IEEE Transactions on Systems, Man, and Cybernetics SMC, March-April.

Lin, C.-T. and C. S. G. Lee. Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems. Prentice Hall.

Lindskog, P. Methods, Algorithms and Tools for System Identification Based on Prior Knowledge. PhD thesis, Department of Electrical Engineering, Linköping University, Linköping, Sweden, May.

Lindskog, P. and L. Ljung. Tools for semiphysical modelling. International Journal of Adaptive Control and Signal Processing, November-December. Identification nonlinear systems.

Ljung, L. System Identification: Theory for the User. Prentice Hall.

Ljung, L., J. Sjöberg, and H. Hjalmarsson. On neural network model structures in system identification. In Identification, Adaptation, Learning: The Science of Learning Models from Data, S. Bittanti and G. Picci (Eds.), Vol. of Series F: Computer and Systems Sciences, pp. Springer-Verlag.

Mamdani, E. H. and S. Assilian. An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies.

Marks II, R. J. (Ed.). Fuzzy Logic Technology and Applications. IEEE Technology Update. IEEE Technical Activities Board.

Poggio, T. and F. Girosi. Networks for approximation and learning. Proceedings of the IEEE.

Pucar, P. and J. Sjöberg (a). On the hinge finding algorithm for hinging hyperplanes, revised version. Technical Report LiTH-ISY-R, Department of Electrical Engineering, Linköping University, Linköping, Sweden. Available by anonymous ftp.

Pucar, P. and J. Sjöberg (b). Parameterization and conditioning of hinging hyperplane models. Technical Report LiTH-ISY-R, Department of Electrical Engineering, Linköping University, Linköping, Sweden. Available by anonymous ftp.

Roger Jang, J.-S. and N. Gulley. Fuzzy Logic Toolbox. The MathWorks, Inc., Cochituate Place, Natick, MA, USA.

Roger Jang, J.-S. and C.-T. Sun. Neuro-fuzzy modeling and control. Proceedings of the IEEE, March.

Scales, L. E. Introduction to Nonlinear Optimization. Computer Science Series. Macmillan.

Sjöberg, J., Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P.-Y. Glorennec, H. Hjalmarsson, and A. Juditsky. Nonlinear black-box modeling in system identification: a unified overview. Automatica, December.

Söderström, T. and P. Stoica. System Identification. Prentice Hall International.

Strömberg, J.-E., F. Gustafsson, and L. Ljung. Trees as black-box model structures for dynamical systems. Technical Report LiTH-ISY-I, Department of Electrical Engineering, Linköping University, Linköping, Sweden.

Sugeno, M. and G. T. Kang. Structure identification of fuzzy model. Fuzzy Sets and Systems.

Sugeno, M. and T. Yasukawa. A fuzzy-logic-based approach to qualitative modeling. IEEE Transactions on Fuzzy Systems, February.

Sun, C.-T. Rulebase structure identification in an adaptive-network-based fuzzy inference system. IEEE Transactions on Fuzzy Systems, February.

Suykens, J. A. K., B. L. R. De Moor, and J. Vandewalle. identification using neural state space models, applicable to robust control design. International Journal of Control, July.

Takagi, T. and M. Sugeno. Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man, and Cybernetics SMC, January-February.

The MathWorks, Inc. MATLAB: High-Performance Numeric Computation and Visualization Software. The MathWorks, Inc., Cochituate Place, Natick, MA, USA.

Wang, L.-X. Fuzzy systems are universal approximators. In Proceedings of the 1st IEEE International Conference on Fuzzy Systems, pp., San Diego, CA, USA.

Wang, L.-X. Adaptive Fuzzy Systems and Control: Design and Stability Analysis. Prentice Hall.

Wang, L.-X. Design and analysis of fuzzy identifiers of nonlinear dynamic systems. IEEE Transactions on Automatic Control AC, January.

Wang, L.-X. and J. M. Mendel. Fuzzy basis functions, universal approximation, and orthogonal least-squares learning. IEEE Transactions on Neural Networks, September.

Watson, G. Smooth regression analysis. Sankhya, Series A.

Yoshinari, Y., W. Pedrycz, and K. Hiroto. Construction of fuzzy models through clustering techniques. Fuzzy Sets and Systems, March.

Zadeh, L. A. Fuzzy sets. Information and Control.

Zhang, Q. and A. Benveniste. Wavelet networks. IEEE Transactions on Neural Networks.