Robust Neurofuzzy Rule Base Knowledge Extraction and Estimation using Subspace Decomposition Combined with Regularization and D-optimality

Xia Hong, Senior Member, IEEE, Chris J. Harris, and Sheng Chen, Senior Member, IEEE

IEEE Transactions on Systems, Man and Cybernetics--Part B: Cybernetics, vol. ?, no. ?, 2003

Abstract— A new robust neurofuzzy model construction algorithm is introduced for the modelling of a priori unknown dynamical systems from observed finite data sets, in the form of a set of fuzzy rules. Based on a Takagi and Sugeno (T-S) inference mechanism, a one to one mapping between a fuzzy rule base and a model matrix feature subspace is established. This link enables rule based knowledge to be extracted from the matrix subspace, enhancing model transparency. In order to achieve maximized model robustness and sparsity, a new robust extended Gram-Schmidt method is introduced via two effective and complementary approaches: regularization and D-optimality experimental design. Model rule bases are decomposed into orthogonal subspaces, so as to enhance model transparency with the capability of interpreting the derived rule base energy level. A locally regularized orthogonal least squares algorithm, combined with a D-optimality criterion for subspace based rule selection, is extended for fuzzy rule regularization and subspace based information extraction. By using a weighting for the D-optimality cost function, the entire model construction procedure becomes automatic. Numerical examples are included to demonstrate the effectiveness of the proposed new algorithm.

Index Terms— Neurofuzzy networks, orthogonal decomposition, subspace, regularization, optimal experimental design.

I. INTRODUCTION

Associative memory networks (such as B-spline networks, RBFs, and support vector machines (SVMs)) have been extensively developed [1], [2], [3], [4]. Most conventional neural networks lead only to a 'black box' model representation, yet a neurofuzzy network has an inherent model transparency that helps users to understand the system behaviour, oversee critical system operating regions, and/or extract the physical laws or relationships that underpin the system. Based on the fuzzy rule inference and model representation of Takagi and Sugeno [5], a neurofuzzy model can be functionally expressed as an operating point dependent fuzzy model with a local linear description that lends itself directly to conventional estimation and control synthesis [1], [6], [7]. The model output is decomposed into a convex combination of the outputs of the individual rules, and the basis functions can be interpreted as the fuzzy membership functions of the individual rules. This property is critically desirable for problems requiring insight into the underlying phenomenology, i.e. internal system behaviour interpretability and/or knowledge (rule) representation of the underlying process.

The problem of the curse of dimensionality [8] has been a main obstacle in nonlinear modelling using associative memory networks or fuzzy logic. Networks or knowledge representations that suffer from the curse of dimensionality include all lattice based networks such as fuzzy logic (FL) systems, radial basis function (RBF) networks, Kanerva distributed memory maps, and all neurofuzzy networks (e.g. the adaptive network based fuzzy inference system (ANFIS) [9], the Takagi and Sugeno model [5], etc.). This problem also militates against model transparency for high dimensional systems, since such networks generate massive rule sets, or require too many parameters, making it impossible for a human to comprehend the resultant rule set. Consequently, the major purpose of neurofuzzy model construction algorithms is to select a parsimonious model structure that resolves the bias/variance dilemma (for finite training data), has a smooth prediction surface (e.g. parameter control via regularization), produces good generalization (for unseen data), and has an interpretable representation, often in the form of (fuzzy) rules. For general linear-in-the-parameters systems, an orthogonal least squares (OLS) algorithm based on Gram-Schmidt orthogonal decomposition can be used to determine the model's significant elements, the associated parameter estimates, and the overall model structure [10]. Regularization techniques have been incorporated into the OLS algorithm to produce a regularized orthogonal least squares (ROLS) algorithm that reduces the variance of the parameter estimates [11], [12]. To produce a model with good generalization capabilities, model selection criteria such as the Akaike information criterion (AIC) [13] are usually incorporated into the procedure to terminate the model construction process. Yet the use of AIC or other information based criteria in forward regression only affects the stopping point of the model selection; it does not penalize a regressor that might cause poor model performance, e.g. too large a parameter variance or ill-posedness of the regression matrix, if that regressor is selected. This is because AIC and other information based criteria are usually simplified measures, derived as approximation formulae that are particularly sensitive to model complexity.

(Manuscript received ?, 2002; revised May 16, 2003. This work was supported by the EPSRC, UK. Xia Hong is with the Cybernetic Intelligence Research Group, Department of Cybernetics, University of Reading, Reading RG6 6AY, UK. Chris J. Harris and Sheng Chen are with the Department of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK.)

In order to achieve a model structure with improved model generalization, it is natural that a model generalization capability cost function should be used in the overall model searching process, rather than only being applied as a measure of model complexity. Optimum experimental designs have been used [14] to construct smooth network response surfaces based on the settings of the experimental variables under well controlled experimental conditions. In optimum design, model adequacy is evaluated by design criteria that are statistical measures of the goodness of experimental designs, by virtue of design efficiency and experimental effort. Quantitatively, model adequacy is measured as a function of the eigenvalues of the design matrix. In recent studies [15], [16], the authors have outlined efficient learning algorithms in which composite cost functions were introduced to optimize the model approximation ability using the forward orthogonal least squares (OLS) algorithm [10], while simultaneously determining model adequacy using an A-optimality design criterion (i.e. minimizing the variance of the parameter estimates) or a D-optimality criterion (i.e. optimizing the parameter efficiency and model robustness via the maximization of the determinant of the design matrix). It was shown that the resultant models can be improved based on A- or D-optimality. These algorithms lead automatically to an unbiased model parameter estimate with an overall robust and parsimonious model structure. Combining locally regularized orthogonal least squares (LROLS) model selection [17] with D-optimality experimental design further enhances model robustness [18].

Due to the inherent transparency properties of a neurofuzzy network, a parsimonious model construction approach should also lead to a logical rule extraction process that increases model transparency, as simpler models inherently involve fewer rules, which are in turn easier to interpret. One drawback of most current neurofuzzy learning algorithms is that learning is based upon a set of one-dimensional regressors, or basis functions (such as B-splines, Gaussians, etc.), but not upon a set of fuzzy rules (usually in the form of multi-dimensional input variables), resulting in opaque models during the learning process. Since modelling is inevitably iterative, it can be greatly enhanced if the modeller can interpret or interrogate the derived rule base during learning itself, allowing him/her to terminate the process when his/her objectives are achieved. There are valuable recent developments on rule based learning and model construction, including a linear approximation approach combined with uncertainty modelling [19], and various fuzzy similarity measures combined with genetic algorithms [20], [21]. Recently the authors have introduced a new neurofuzzy model construction and parameter estimation algorithm from observed finite data sets, based on a Takagi and Sugeno (T-S) inference mechanism and a new extended Gram-Schmidt orthogonal decomposition algorithm, for the modelling of a priori unknown dynamical systems in the form of a set of fuzzy rules [22]; this algorithm establishes a one to one mapping between a fuzzy rule base and a model matrix feature subspace.

In this paper, a new neurofuzzy model construction and parameter estimation algorithm is introduced. Based on a Takagi and Sugeno (T-S) inference mechanism, a one to one mapping between a fuzzy rule base and a model matrix feature subspace is established [22]. This link enables rule based knowledge to be extracted from the matrix subspace to enhance model transparency. In order to achieve maximized model robustness and sparsity, a new robust extended Gram-Schmidt algorithm is introduced via two effective and complementary approaches: regularization and D-optimality experimental design. The new algorithm decomposes the rule bases via an orthogonal subspace decomposition approach, so as to enhance model transparency with the capability of interpreting the derived rule base energy level. A locally regularized orthogonal least squares algorithm tailored for rule regularization is combined with a D-optimality cost for subspace selection. By using a weighting for the D-optimality cost function, the entire model construction procedure becomes automatic. The proposed algorithm enhances the previous algorithm [22] via the combined LROLS and D-optimality criterion for robust rule selection, and is based on the extension of the combined LROLS and D-optimality algorithm [18] from conventional regressor regression to orthogonal subspace regression.

This paper is organized as follows. Section II introduces a general class of neurofuzzy systems as a modelling approach. Section III introduces the proposed new algorithm, with analysis of the associated model transparency and the robustness enhancement via D-optimality and rule based regularization. Numerical examples are provided in Section IV to illustrate the effectiveness of the approach, and Section V is devoted to conclusions.

II. A NEUROFUZZY MODELLING APPROACH

This section briefly presents a general class of neurofuzzy systems as a nonlinear data modelling approach, within a coherent framework of both mathematical representation for learning and linguistic logic rule representation for model transparency. Given a finite data set $D_N = \{\mathbf{x}(t), y(t)\}_{t=1}^{N}$ of observed input/output data pairs, consider the identification of a general nonlinear system that generates this data:

$$ y(t) = f(\mathbf{x}(t); \boldsymbol{\Theta}) + \xi(t) \qquad (1) $$

where

$$ \mathbf{x}(t) = [x_1(t), x_2(t), \ldots, x_n(t)]^T \in \mathbb{R}^n \qquad (2) $$

is an observed system input vector and $f(\cdot)$ is a priori unknown. The observation noise $\xi(t)$ is assumed uncorrelated, with variance $\sigma^2$. $\boldsymbol{\Theta}$ is an unknown parameter vector associated with an appropriate, but yet to be determined, model structure.

Model (1) can be simplified by decomposing it into a set of $p$ local models $f_i(\mathbf{x}^{(i)}(t); \boldsymbol{\Theta}_i)$, $i = 1, \ldots, p$, where $p$ is to be determined. Each local model operates on a local region depending on the sub-measurement vector $\mathbf{x}^{(i)} \in \mathbb{R}^{n_i}$, a subset of the input vector $\mathbf{x}$, with $n_i \leq n$. Each of the local models $f_i(\mathbf{x}^{(i)}(t); \boldsymbol{\Theta}_i)$ can be represented by a set of linguistic rules

$$ \text{IF } \mathbf{x}^{(i)} \text{ is } \mathbf{A}^{(i)} \text{ THEN } y(t) = f_i(\mathbf{x}^{(i)}(t); \boldsymbol{\Theta}_i) + \xi(t) \qquad (3) $$

where the fuzzy set $\mathbf{A}^{(i)} = A_1^{(i)} \times A_2^{(i)} \times \cdots \times A_{n_i}^{(i)}$ denotes a fuzzy set in the $n_i$-dimensional input space, given as an array of linguistic values based on a predetermined partition of the input space into fuzzy sets via some prior system knowledge of the operating range of the data set. Usually, if $A_j^{(i)} \neq A_j^{(l)}$ for $i \neq l$, then $\mathbf{A}^{(i)} \cap \mathbf{A}^{(l)} = \emptyset$, where $\emptyset$ denotes the empty set, and $\{\mathbf{A}^{(i)}\}$ defines a complete fuzzy partition of the input space. For an appropriate input space decomposition, the local models can have essentially local linear behaviour. In this case, using the well known Takagi-Sugeno fuzzy inference mechanism [5], the output of system (1) can be represented by

$$ f(\mathbf{x}(t)) \approx \sum_{i=1}^{p} f_i(\mathbf{x}^{(i)}(t); \boldsymbol{\Theta}_i)\, \mu_i(\mathbf{x}^{(i)}(t)) \qquad (4) $$

where $f_i(\mathbf{x}^{(i)}(t); \boldsymbol{\Theta}_i)$ is a linear function of $\mathbf{x}^{(i)}$, given by

$$ f_i(\mathbf{x}^{(i)}(t); \boldsymbol{\Theta}_i) = (\mathbf{x}^{(i)}(t))^T \boldsymbol{\Theta}_i \qquad (5) $$

and $\boldsymbol{\Theta}_i \in \mathbb{R}^{n_i}$ denotes the parameter vector of the $i$th fuzzy rule or local model. $\mu_i(\mathbf{x}^{(i)}(t))$ is the fuzzy membership function of rule (3), subject to a unity of support condition: $\sum_{i=1}^{p} \mu_i(\mathbf{x}^{(i)}(t)) = 1$, with $0 \leq \mu_i(\mathbf{x}^{(i)}(t)) \leq 1$. Each of the linguistic rules (3) can thus be evaluated via the known fuzzy membership function $\mu_i(\mathbf{x}^{(i)}(t))$.
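As a concrete illustration of (4) and (5), the following minimal Python sketch (our own, not from the paper; the membership values and local parameters are assumed toy quantities) evaluates a T-S model output as the membership-weighted sum of local linear models:

```python
import numpy as np

def ts_output(x_sub, mu, theta):
    """Takagi-Sugeno output, eqs. (4)-(5): sum_i mu_i(x^(i)) * (x^(i))' Theta_i.

    x_sub : list of p sub-measurement vectors x^(i)
    mu    : list of p membership values mu_i(x^(i)), assumed to sum to one
    theta : list of p local parameter vectors Theta_i
    """
    return sum(m * float(np.dot(xi, th)) for xi, m, th in zip(x_sub, mu, theta))

# Two overlapping rules on a scalar input; the memberships form a partition of unity.
x = np.array([0.3])
mu = [0.7, 0.3]                                # mu_1(x), mu_2(x)
theta = [np.array([1.0]), np.array([-2.0])]    # Theta_1, Theta_2
y_hat = ts_output([x, x], mu, theta)           # convex combination of local models
```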

Consider a neurofuzzy network using B-spline functions [23] as membership functions. A general one-dimensional B-spline model can be formed as a linear combination of B-spline basis functions $N_j^k(x)$ of order $k$:

$$ f(x) = \sum_{j} N_j^k(x)\, \theta_j \qquad (6) $$

The coefficients $\theta_j$ represent the set of adjustable parameters associated with the set of basis functions. The $N_j^k(x)$, which are piecewise polynomials of degree $(k-1)$, are uniquely defined by an ordered sequence of real values denoted as a knot vector, $\lambda_0 \leq \lambda_1 \leq \cdots \leq \lambda_m$. The knot sequence forms a partitioning of the input domain into disjoint intervals. The basis function set can be defined by the recursive equation [23]

$$ N_j^k(x) = \frac{x - \lambda_j}{\lambda_{j+k-1} - \lambda_j}\, N_j^{k-1}(x) + \frac{\lambda_{j+k} - x}{\lambda_{j+k} - \lambda_{j+1}}\, N_{j+1}^{k-1}(x) \qquad (7) $$

with

$$ N_j^1(x) = \begin{cases} 1, & x \in [\lambda_j, \lambda_{j+1}) \\ 0, & \text{otherwise} \end{cases} $$

Multidimensional B-spline basis functions are formed by the direct multiplication (tensor product) of univariate basis functions via

$$ \mu_i(\mathbf{x}^{(i)}) = \prod_{l=1}^{n_i} N_{j_l}^{k}(x_l^{(i)}) \qquad (8) $$

for $i = 1, \ldots, p$, where $x_l^{(i)}$ is the $l$th component of $\mathbf{x}^{(i)}$ and $j_l$ indexes one of the B-spline basis functions defined on that component.

Note that for a complete model base, the number of rules $p$ increases exponentially as the input dimension $n$ increases (this is commonly known as the curse of dimensionality). To alleviate this disadvantage, input dimension or variable reduction can be used. Notably, an ANOVA (analysis of variance) representation of multivariable functions uses lower dimensional tensor products of the model inputs; in many practical applications the number of multiplication terms may be limited to as low as 3, yet sufficient modelling capability is maintained [1]. For practical applications, not only is the ANOVA approach effective in overcoming the curse of dimensionality, it has the additional advantage of model transparency, because a lower input dimension can be visualized and interpreted [24].

Substituting (5) and (4) into (1) gives

$$ y(t) = \sum_{i=1}^{p} \mu_i(\mathbf{x}^{(i)}(t))\, (\mathbf{x}^{(i)}(t))^T \boldsymbol{\Theta}_i + \xi(t) = \boldsymbol{\phi}^T(t)\, \boldsymbol{\Theta} + \xi(t) \qquad (9) $$

where $\boldsymbol{\phi}(t) = [\mu_1(\mathbf{x}^{(1)}(t))(\mathbf{x}^{(1)}(t))^T, \ldots, \mu_p(\mathbf{x}^{(p)}(t))(\mathbf{x}^{(p)}(t))^T]^T \in \mathbb{R}^M$, $\boldsymbol{\Theta} = [\boldsymbol{\Theta}_1^T, \ldots, \boldsymbol{\Theta}_p^T]^T \in \mathbb{R}^M$, and $M = \sum_{i=1}^{p} n_i$.

For the finite data set $D_N = \{\mathbf{x}(t), y(t)\}_{t=1}^{N}$, (9) can be written in the matrix form

$$ \mathbf{y} = \mathbf{P}\boldsymbol{\Theta} + \boldsymbol{\Xi} = \sum_{i=1}^{p} \mathbf{P}^{(i)} \boldsymbol{\Theta}_i + \boldsymbol{\Xi} \qquad (10) $$

where $\mathbf{y} = [y(1), \ldots, y(N)]^T \in \mathbb{R}^N$ is the output vector, $\mathbf{P}^{(i)} = [\mu_i(\mathbf{x}^{(i)}(1))\mathbf{x}^{(i)}(1), \ldots, \mu_i(\mathbf{x}^{(i)}(N))\mathbf{x}^{(i)}(N)]^T \in \mathbb{R}^{N \times n_i}$ is the regression matrix associated with the $i$th fuzzy rule, $\boldsymbol{\Xi} = [\xi(1), \ldots, \xi(N)]^T$ is the model residual vector, and $\mathbf{P} = [\mathbf{P}^{(1)}, \ldots, \mathbf{P}^{(p)}] \in \mathbb{R}^{N \times M}$ is the full regression matrix.

An effective way of overcoming the curse of dimensionality is to start with a moderately sized rule base constructed according to the actual data distribution. In this paper, the selection of disjoint local models as an initial model base is based on model identifiability via an A-optimality design criterion [14], with the advantage of enhanced model transparency to quantify and interpret fuzzy rules and their identifiability.

III. RULE BASED MODEL CONSTRUCTION AND LEARNING ALGORITHMS

A. Rule based learning and initial model base construction

Rule based knowledge, i.e. the information associated with a fuzzy rule, is highly appropriate for users seeking to understand a derived data based model. Most current learning algorithms for neurofuzzy models are based on an ordinary $p$-dimensional linear-in-the-parameters model. Model transparency during learning cannot be automatically achieved unless the regressors have a clear physical interpretation, or are directly associated with physical variables. Alternatively, a neurofuzzy network is inherently transparent for rule based model construction. In (10), each $\mathbf{P}^{(i)}$ is constructed based on a unique fuzzy membership function $\mu_i(\cdot)$, providing a link between a fuzzy rule base and a matrix feature subspace spanned by $\mathbf{P}^{(i)}$. Rule based knowledge can be easily extracted by exploiting this link.
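The constructions (7), (8) and (10)-(11) translate directly into code. The following Python sketch (our own illustration; the knot placement, spline order and data are assumptions) evaluates the B-spline recursion, forms a tensor-product rule membership, and assembles a rule sub-matrix as the membership-weighted input matrix:

```python
import numpy as np

def bspline_basis(x, knots, j, k):
    """Order-k B-spline N_j^k(x) evaluated by the recursion (7) on `knots`."""
    if k == 1:
        return 1.0 if knots[j] <= x < knots[j + 1] else 0.0
    left = right = 0.0
    if knots[j + k - 1] > knots[j]:
        left = (x - knots[j]) / (knots[j + k - 1] - knots[j]) \
               * bspline_basis(x, knots, j, k - 1)
    if knots[j + k] > knots[j + 1]:
        right = (knots[j + k] - x) / (knots[j + k] - knots[j + 1]) \
                * bspline_basis(x, knots, j + 1, k - 1)
    return left + right

def rule_membership(x_vec, knots, idx, k=2):
    """Tensor-product membership (8): product of one univariate B-spline per input."""
    return float(np.prod([bspline_basis(x, knots, j, k) for x, j in zip(x_vec, idx)]))

def rule_submatrix(X, knots, idx, k=2):
    """Rule sub-matrix P^(i) of (10)-(11): row t equals mu_i(x(t)) * x(t)'."""
    mu = np.array([rule_membership(x, knots, idx, k) for x in X])
    return mu[:, None] * X                       # Phi^(i) X^(i), shape (N, n_i)

# Assumed setup: N = 200 points in the unit square, piecewise linear splines (k = 2),
# knots spaced 0.2 apart, and one particular rule indexed by (j_1, j_2) = (2, 3).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
knots = np.arange(-0.2, 1.4001, 0.2)
P_i = rule_submatrix(X, knots, idx=(2, 3))
```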

Definition 1 (Basis of a subspace): If the vectors $\mathbf{p}_j^{(i)} \in \mathbb{R}^N$, $j = 1, \ldots, n_i$, satisfy the nonsingularity condition that $\mathbf{P}^{(i)} = [\mathbf{p}_1^{(i)}, \ldots, \mathbf{p}_{n_i}^{(i)}]$ has full rank $n_i$, they span an $n_i$-dimensional subspace $\mathcal{S}^{(i)}$, and $\mathbf{P}^{(i)}$ is a basis of the subspace $\mathcal{S}^{(i)}$.

Definition 2 (Fuzzy rule subspace): Suppose that $(\mathbf{P}^{(i)})^T \mathbf{P}^{(i)}$ is nonsingular; then $\mathbf{P}^{(i)}$ is clearly a basis of an $n_i$-dimensional subspace $\mathcal{S}^{(i)}$, which is a functional representation of the fuzzy rule (3) under the Takagi-Sugeno fuzzy inference mechanism, with the unique label $\mathbf{A}^{(i)}$. $\mathcal{S}^{(i)}$ is defined as the fuzzy rule subspace of the $i$th fuzzy rule.

The sub-matrix $\mathbf{P}^{(i)} \in \mathbb{R}^{N \times n_i}$ associated with the $i$th rule can be expanded as

$$ \mathbf{P}^{(i)} = \boldsymbol{\Phi}^{(i)} \mathbf{X}^{(i)} \qquad (11) $$

where $\boldsymbol{\Phi}^{(i)} = \mathrm{diag}\{\mu_i(\mathbf{x}^{(i)}(1)), \ldots, \mu_i(\mathbf{x}^{(i)}(N))\} \in \mathbb{R}^{N \times N}$ and $\mathbf{X}^{(i)} = [\mathbf{x}^{(i)}(1), \ldots, \mathbf{x}^{(i)}(N)]^T \in \mathbb{R}^{N \times n_i}$. Equation (11) shows that each rule base is simply constructed as a weighting matrix multiplying the regression matrix of the original input variables; the weighting matrix $\boldsymbol{\Phi}^{(i)}$ can be regarded as a data based spatial prefiltering over the input region. Without loss of generality, it is assumed that $(\mathbf{X}^{(i)})^T \mathbf{X}^{(i)}$ is nonsingular, so that $\mathrm{rank}(\mathbf{X}^{(i)}) = n_i$. As

$$ \mathrm{rank}(\mathbf{P}^{(i)}) \leq \min\{\mathrm{rank}(\boldsymbol{\Phi}^{(i)}),\, \mathrm{rank}(\mathbf{X}^{(i)})\} \qquad (12) $$

for $(\mathbf{P}^{(i)})^T \mathbf{P}^{(i)}$ to be nonsingular we require $\mathrm{rank}(\boldsymbol{\Phi}^{(i)}) \geq n_i$; this means that, for the input region denoted by $\mathbf{A}^{(i)}$, the basis function $\mu_i(\cdot)$ needs to be excited by at least $n_i$ data points.

The A-optimality design criterion for the weighting matrix $\boldsymbol{\Phi}^{(i)}$ is given by [14], [22]

$$ \delta_i = \mathrm{trace}\{(\boldsymbol{\Phi}^{(i)})^T \boldsymbol{\Phi}^{(i)}\} = \sum_{t=1}^{N} \mu_i^2(\mathbf{x}^{(i)}(t)) \qquad (13) $$

which provides an indication, for each fuzzy rule, of its identifiability, and hence a metric for selecting appropriate model rules. The derived model rules can then be rearranged in descending order of identifiability, followed by utilizing only the first $p$ rules with the highest identifiability to construct a model rule base set.

B. Orthogonal subspace decomposition and regularization in orthogonal subspaces

For ease of exposition, we initially introduce some notation and definitions that are used in the development of the new extended Gram-Schmidt orthogonal decomposition algorithm.

Definition 3 (Orthogonal subspaces): For the $M$-dimensional matrix space spanned by $\mathbf{P} \in \mathbb{R}^{N \times M}$, two of its subspaces $\mathcal{S}^{(i)}$ and $\mathcal{S}^{(j)}$ ($i \neq j$) are orthogonal if and only if any two vectors $\mathbf{u} \in \mathcal{S}^{(i)}$ and $\mathbf{v} \in \mathcal{S}^{(j)}$, located in the two subspaces respectively, are orthogonal, that is, $\mathbf{u}^T \mathbf{v} = 0$.

The $M$-dimensional space spanned by $\mathbf{P}$, with $M = \sum_{i=1}^{p} n_i$, can be decomposed into $p$ mutually orthogonal subspaces $\mathcal{Q}^{(i)}$, $i = 1, \ldots, p$, given by [25], [26]

$$ \mathrm{span}\{\mathbf{P}\} = \mathcal{Q}^{(1)} \oplus \mathcal{Q}^{(2)} \oplus \cdots \oplus \mathcal{Q}^{(p)} \qquad (14) $$

where $\oplus$ denotes the sum of orthogonal sets. From Definition 1, if there are $n_i$ linearly independent vectors $\mathbf{q}_j^{(i)}$ located in $\mathcal{Q}^{(i)}$, then the matrix $\mathbf{Q}^{(i)} = [\mathbf{q}_1^{(i)}, \ldots, \mathbf{q}_{n_i}^{(i)}]$ forms a basis of $\mathcal{Q}^{(i)}$. Note that these vectors need not be mutually orthogonal, i.e. $(\mathbf{Q}^{(i)})^T \mathbf{Q}^{(i)} = \mathbf{D}^{(i)} \in \mathbb{R}^{n_i \times n_i}$, where $\mathbf{D}^{(i)}$ is not required to be diagonal.

Clearly, if two matrix subspaces $\mathcal{Q}^{(i)}$ and $\mathcal{Q}^{(j)}$ have bases given by the full rank matrices $\mathbf{Q}^{(i)} \in \mathbb{R}^{N \times n_i}$ and $\mathbf{Q}^{(j)} \in \mathbb{R}^{N \times n_j}$, then they are orthogonal if and only if

$$ (\mathbf{Q}^{(i)})^T \mathbf{Q}^{(j)} = \mathbf{0} \in \mathbb{R}^{n_i \times n_j}, \quad i \neq j \qquad (15) $$

Definition 4 (Vector decomposition onto subspace bases): If $p$ mutually orthogonal subspaces $\mathcal{Q}^{(i)}$, $i = 1, \ldots, p$, are defined by a series of $N \times n_i$ matrices $\mathbf{Q}^{(i)}$ as subspace bases, then, based on Definition 3, an arbitrary vector $\mathbf{v} \in \mathrm{span}\{\mathbf{P}\}$ can be uniquely decomposed as

$$ \mathbf{v} = \sum_{i=1}^{p} \mathbf{Q}^{(i)} \boldsymbol{\gamma}_i \qquad (16) $$

where the $\boldsymbol{\gamma}_i \in \mathbb{R}^{n_i}$ are combination coefficient vectors. As a result of the orthogonality of the subspaces ($(\mathbf{Q}^{(i)})^T \mathbf{Q}^{(j)} = \mathbf{0}$ for $i \neq j$), from (16),

$$ \mathbf{v}^T \mathbf{v} = \sum_{i=1}^{p} \boldsymbol{\gamma}_i^T (\mathbf{Q}^{(i)})^T \mathbf{Q}^{(i)} \boldsymbol{\gamma}_i \qquad (17) $$

Clearly the variance of the vector $\mathbf{v}$ projected onto each subspace can be computed as $\boldsymbol{\gamma}_i^T (\mathbf{Q}^{(i)})^T \mathbf{Q}^{(i)} \boldsymbol{\gamma}_i$, for $i = 1, \ldots, p$.
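The decomposition (16)-(17) is easy to check numerically. The short Python sketch below (our own, built on randomly generated mutually orthogonal subspace bases) projects a vector onto the subspaces and verifies that its squared norm splits into the per-subspace energies:

```python
import numpy as np

rng = np.random.default_rng(1)
Q_full, _ = np.linalg.qr(rng.standard_normal((50, 6)))   # orthonormal columns
Q = [Q_full[:, :2], Q_full[:, 2:4], Q_full[:, 4:6]]      # three orthogonal bases Q^(i)

v = sum(Qi @ rng.standard_normal(Qi.shape[1]) for Qi in Q)   # v in the span, eq. (16)
gamma = [np.linalg.solve(Qi.T @ Qi, Qi.T @ v) for Qi in Q]   # coefficients gamma_i

energies = [float(g @ (Qi.T @ Qi) @ g) for Qi, g in zip(Q, gamma)]
assert np.isclose(sum(energies), float(v @ v))               # eq. (17)
```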

Consider the nonlinear system (1) given in the vector form (10). By introducing an orthogonal subspace decomposition $\mathbf{P} = \mathbf{Q}\mathbf{A}$, (10) can be written as

$$ \mathbf{y} = \mathbf{Q}\mathbf{A}\boldsymbol{\Theta} + \boldsymbol{\Xi} = \sum_{i=1}^{p} \mathbf{Q}^{(i)} \boldsymbol{\gamma}_i + \boldsymbol{\Xi} \qquad (18) $$

where $\mathbf{Q} = [\mathbf{Q}^{(1)}, \ldots, \mathbf{Q}^{(p)}] \in \mathbb{R}^{N \times M}$ spans the $M$-dimensional space $\mathrm{span}\{\mathbf{P}\}$, with $\mathbf{Q}^{(i)} \in \mathbb{R}^{N \times n_i}$, $i = 1, \ldots, p$, spanning its orthogonal subspaces $\mathcal{Q}^{(i)}$ as defined via Definition 3. The auxiliary parameter vector $\boldsymbol{\gamma} = [\boldsymbol{\gamma}_1^T, \ldots, \boldsymbol{\gamma}_p^T]^T = \mathbf{A}\boldsymbol{\Theta}$, where $\mathbf{A}$ is the block upper triangular matrix

$$ \mathbf{A} = \begin{bmatrix} \mathbf{A}_{1,1} & \mathbf{A}_{1,2} & \cdots & \mathbf{A}_{1,p} \\ \mathbf{0} & \mathbf{A}_{2,2} & \cdots & \mathbf{A}_{2,p} \\ \vdots & & \ddots & \vdots \\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{A}_{p,p} \end{bmatrix} \qquad (19) $$

in which $\mathbf{A}_{j,i} \in \mathbb{R}^{n_j \times n_i}$ and each diagonal block $\mathbf{A}_{i,i} = \mathbf{I} \in \mathbb{R}^{n_i \times n_i}$, a unit matrix.

Definition 5 (The extended Gram-Schmidt orthogonal decomposition algorithm [22]): An orthogonal subspace decomposition for model (18) can be realized based on an extended Gram-Schmidt orthogonal decomposition algorithm as follows. Set $\mathbf{Q}^{(1)} = \mathbf{P}^{(1)}$ and $\mathbf{A}_{1,1} = \mathbf{I}_{n_1 \times n_1}$, and, for $i = 2, \ldots, p$, set $\mathbf{A}_{i,i} = \mathbf{I}_{n_i \times n_i}$ and compute

$$ \mathbf{Q}^{(i)} = \mathbf{P}^{(i)} - \sum_{j=1}^{i-1} \mathbf{Q}^{(j)} \mathbf{A}_{j,i} \qquad (20) $$

where

$$ \mathbf{A}_{j,i} = [(\mathbf{Q}^{(j)})^T \mathbf{Q}^{(j)}]^{-1} (\mathbf{Q}^{(j)})^T \mathbf{P}^{(i)} \in \mathbb{R}^{n_j \times n_i} \qquad (21) $$

for $j = 1, \ldots, i-1$.

Definition 6 (Locally regularized least squares cost function in orthogonal subspaces): The orthogonal subspace based regularized least squares estimator uses the error criterion

$$ J(\boldsymbol{\gamma}, \boldsymbol{\lambda}) = \boldsymbol{\Xi}^T \boldsymbol{\Xi} + \boldsymbol{\gamma}^T \boldsymbol{\Lambda} \boldsymbol{\gamma} \qquad (22) $$

where $\boldsymbol{\lambda} = [\lambda_1, \ldots, \lambda_p]^T$, $\lambda_i \geq 0$, are the regularization parameters and $\boldsymbol{\Lambda} = \mathrm{diag}\{\lambda_1 \mathbf{I}_{n_1 \times n_1}, \ldots, \lambda_p \mathbf{I}_{n_p \times n_p}\}$, each $\mathbf{I}_{n_i \times n_i}$ being a unit matrix. The regularized least squares estimate of $\boldsymbol{\gamma}$ is given by [27]

$$ \hat{\boldsymbol{\gamma}} = (\mathbf{Q}^T \mathbf{Q} + \boldsymbol{\Lambda})^{-1} \mathbf{Q}^T \mathbf{y} \qquad (23) $$

An appropriate choice of $\boldsymbol{\lambda}$ can smooth the parameter estimates (noise rejection), and $\boldsymbol{\lambda}$ can be optimized by a separate procedure, such as Bayesian hyperparameter optimization [18] or a genetic algorithm. In this paper it is assumed that an appropriate $\boldsymbol{\lambda}$ is predetermined, to simplify the procedure.

The regularized least squares solution of (18) is given by

$$ \hat{\boldsymbol{\gamma}}_i = [(\mathbf{Q}^{(i)})^T \mathbf{Q}^{(i)} + \lambda_i \mathbf{I}]^{-1} (\mathbf{Q}^{(i)})^T \mathbf{y} \qquad (24) $$

which follows from the fact that the $\mathbf{Q}^{(i)}$, $i = 1, \ldots, p$, are mutually orthogonal subspace bases, so that $\mathbf{Q}^T \mathbf{Q} + \boldsymbol{\Lambda}$ is block diagonal.

From (16), the system output vector $\mathbf{y}$ can be decomposed into a term obtained by projecting $\mathbf{y}$ onto the orthogonal subspaces $\mathcal{Q}^{(i)}$, $i = 1, \ldots, p$, and an uncorrelated term $\boldsymbol{\Xi}$ that is unexplained by the model, such that the projection onto each subspace basis (or the percentage energy contribution of each subspace towards the construction of $\mathbf{y}$) can be readily calculated via

$$ [\mathrm{err}]_i = \frac{\hat{\boldsymbol{\gamma}}_i^T (\mathbf{Q}^{(i)})^T \mathbf{Q}^{(i)} \hat{\boldsymbol{\gamma}}_i}{\mathbf{y}^T \mathbf{y}} \qquad (25) $$

The output variance projected onto each subspace can be interpreted as the contribution of each fuzzy rule in the fuzzy system, subject to the existence of the previously selected fuzzy rules; that is, as the output energy contribution explained by a new rule, demonstrating the significance of the new rule towards the model. Including, as a forward regression procedure, the most significant subspace basis, i.e. the one with the largest $[\mathrm{err}]_i$, is a direct extension of the conventional forward OLS algorithm [10]. At each regression step, a new orthogonal subspace basis is formed by using a new fuzzy rule and the existing fuzzy rules in the model, with the rule basis with the largest $[\mathrm{err}]_i$ included in the final model, until

$$ 1 - \sum_{i=1}^{n_s} [\mathrm{err}]_i < \rho \qquad (26) $$

is satisfied for an error tolerance $\rho$, so as to construct a model with $n_s$ rules. The parameter vectors $\hat{\boldsymbol{\Theta}}_i$, $i = 1, \ldots, n_s$, can then be computed by the following back substitution procedure: set $\hat{\boldsymbol{\Theta}}_{n_s} = \hat{\boldsymbol{\gamma}}_{n_s}$, and, for $i = n_s - 1, \ldots, 1$,

$$ \hat{\boldsymbol{\Theta}}_i = \hat{\boldsymbol{\gamma}}_i - \sum_{j=i+1}^{n_s} \mathbf{A}_{i,j} \hat{\boldsymbol{\Theta}}_j \qquad (27) $$

The concept of orthogonal subspace decomposition based on fuzzy rule bases is illustrated in Figure 1, which depicts how (20) forms the orthogonal bases. Because of the one to one mapping of a fuzzy rule to a matrix subspace, a series of orthogonal subspace bases is formed by using the fuzzy rule subspace bases in a forward regression manner, such that $(\mathbf{Q}^{(i)})^T \mathbf{Q}^{(j)} = \mathbf{0}$ for $i \neq j$, whilst maximizing the output variance explained by the model at each regression step. Note that well known orthogonalization schemes such as the classical Gram-Schmidt method construct orthogonal vectors as bases from (one-dimensional) regression vectors, whereas the new algorithm extends the classical Gram-Schmidt orthogonal decomposition scheme to the orthogonalization of (multidimensional) subspace bases. The extended Gram-Schmidt orthogonal decomposition algorithm is thus not only an extension from classical Gram-Schmidt orthogonal axis decomposition to orthogonal subspace decomposition, but also an extension from basis function regression to matrix subspace regression, introducing the significant advantage of model transparency in interpreting fuzzy rule energy levels.


Fig. 1. Orthogonal subspace decomposition based on fuzzy rule bases.
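Definition 5 and the regularized estimates (24)-(25) admit a compact implementation. The sketch below (our own illustration with random sub-matrices; the regularization values in `lam` are assumed) orthogonalizes the rule sub-matrices by (20)-(21) and computes the per-subspace estimates and energy contributions:

```python
import numpy as np

def extended_gram_schmidt(P_list):
    """Eqs. (20)-(21): turn rule sub-matrices into mutually orthogonal bases Q^(i)."""
    Q_list, A_blocks = [], {}
    for i, Pi in enumerate(P_list):
        Qi = Pi.copy()
        for j, Qj in enumerate(Q_list):
            A_ji = np.linalg.solve(Qj.T @ Qj, Qj.T @ Pi)    # (21)
            A_blocks[(j, i)] = A_ji
            Qi = Qi - Qj @ A_ji                             # (20)
        Q_list.append(Qi)
    return Q_list, A_blocks

def regularized_gamma(Q_list, y, lam):
    """Eq. (24): per-subspace locally regularized least squares estimates."""
    return [np.linalg.solve(Qi.T @ Qi + li * np.eye(Qi.shape[1]), Qi.T @ y)
            for Qi, li in zip(Q_list, lam)]

rng = np.random.default_rng(2)
P_list = [rng.standard_normal((100, 2)) for _ in range(4)]  # four 2-column rule bases
y = rng.standard_normal(100)
Q_list, A = extended_gram_schmidt(P_list)
gamma = regularized_gamma(Q_list, y, lam=[1e-3] * 4)
err = [float(g @ (Qi.T @ Qi) @ g) / float(y @ y)            # eq. (25)
       for Qi, g in zip(Q_list, gamma)]
```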

C. New extended Gram-Schmidt orthogonal decomposition algorithm with regularization and D-optimality in orthogonal subspaces

The above discussion was largely introduced in [22], except that in [22] the ratio $[\mathrm{err}]_i$ was used for subset selection without parameter regularization ($\boldsymbol{\lambda} = \mathbf{0}$). Regularization can be used as an effective means of overcoming overfitting to noise. Note, however, that the use of $[\mathrm{err}]_i$ aims to optimize the model in terms of approximation capability, but not in terms of model robustness. In addition to parameter regularization, a composite cost function, such as least squares plus a penalty term based on the D-optimality experimental design criterion, can be used [16]. To enhance rule model robustness, the proposed algorithm combines two separate previous works, the subspace based rule model construction [22] and the combined LROLS and D-optimality algorithm [18], for robust rule based model construction. The combined LROLS and D-optimality algorithm [18] was not previously introduced as a rule based learning algorithm, hence some extensions to the orthogonal subspace decomposition domain are necessary, as introduced in the following.

The concept of parameter regularization may be incorporated into the forward orthogonal least squares algorithm as a locally regularized orthogonal least squares estimator for subspace selection, by defining a regularized error reduction ratio due to each sub-matrix as follows. After some simplification, it can be shown that the criterion (22) can be expressed as

$$ J(\hat{\boldsymbol{\gamma}}, \boldsymbol{\lambda}) = \mathbf{y}^T \mathbf{y} - \sum_{i=1}^{p} \hat{\boldsymbol{\gamma}}_i^T \left[(\mathbf{Q}^{(i)})^T \mathbf{Q}^{(i)} + \lambda_i \mathbf{I}\right] \hat{\boldsymbol{\gamma}}_i \qquad (28) $$

Normalizing (28) by $\mathbf{y}^T \mathbf{y}$ yields

$$ \frac{J(\hat{\boldsymbol{\gamma}}, \boldsymbol{\lambda})}{\mathbf{y}^T \mathbf{y}} = 1 - \sum_{i=1}^{p} \frac{\hat{\boldsymbol{\gamma}}_i^T \left[(\mathbf{Q}^{(i)})^T \mathbf{Q}^{(i)} + \lambda_i \mathbf{I}\right] \hat{\boldsymbol{\gamma}}_i}{\mathbf{y}^T \mathbf{y}} \qquad (29) $$

The regularized error reduction ratio due to the sub-matrix $\mathbf{Q}^{(i)}$ is then defined as

$$ [\mathrm{rerr}]_i = \frac{\hat{\boldsymbol{\gamma}}_i^T \left[(\mathbf{Q}^{(i)})^T \mathbf{Q}^{(i)} + \lambda_i \mathbf{I}\right] \hat{\boldsymbol{\gamma}}_i}{\mathbf{y}^T \mathbf{y}} \qquad (30) $$

Definition 7 (D-optimality experimental design cost function in orthogonal subspaces): In experimental design, $\mathbf{Q}^T \mathbf{Q}$ is called the design matrix. The D-optimality design criterion maximizes the determinant of the design matrix for the constructed model. Consider a model with orthogonal subspaces with design matrix $\mathbf{Q}^T \mathbf{Q}$, from which a subset of subspaces is selected in order to construct a $k$-subspace model ($k \leq p$) that maximizes the D-optimality criterion $\det(\mathbf{Q}_k^T \mathbf{Q}_k)$, where $\mathbf{Q}_k$ is a column subset of $\mathbf{Q}$ representing a constructed subset model with $k$ sub-matrices selected from $\mathbf{Q}$ (which consists of $p$ sub-matrices). It is straightforward to verify that the maximization of $\det(\mathbf{Q}_k^T \mathbf{Q}_k)$ is equivalent to the minimization of $-\log \det(\mathbf{Q}_k^T \mathbf{Q}_k)$ [22]. Clearly

$$ -\log \det(\mathbf{Q}^T \mathbf{Q}) = \sum_{i=1}^{p} \left( -\log \det\left[(\mathbf{Q}^{(i)})^T \mathbf{Q}^{(i)}\right] \right) \qquad (31) $$

It can easily be verified that the maximization of $\det(\mathbf{Q}_k^T \mathbf{Q}_k)$ is identical to the maximization of $\det(\mathbf{P}_k^T \mathbf{P}_k)$, where $\mathbf{P}_k$ is a column subset of $\mathbf{P}$ representing a constructed subset model with $k$ sub-matrices selected from $\mathbf{P}$ (which consists of $p$ sub-matrices) [22].

Definition 8 (Combined locally regularized cost function and D-optimality in orthogonal subspaces): The combined LROLS and D-optimality algorithm based on orthogonal subspace decomposition uses the combined criterion

$$ J_c(\hat{\boldsymbol{\gamma}}, \boldsymbol{\lambda}, \beta) = J(\hat{\boldsymbol{\gamma}}, \boldsymbol{\lambda}) + \beta \sum_{i=1}^{p} \left( -\log \det\left[(\mathbf{Q}^{(i)})^T \mathbf{Q}^{(i)}\right] \right) \qquad (32) $$

where $\beta$ is a fixed small positive weighting for the D-optimality cost. Equivalently, a combined error reduction ratio, defined as

$$ [\mathrm{cerr}]_i = \frac{\hat{\boldsymbol{\gamma}}_i^T \left[(\mathbf{Q}^{(i)})^T \mathbf{Q}^{(i)} + \lambda_i \mathbf{I}\right] \hat{\boldsymbol{\gamma}}_i + \beta \log \det\left[(\mathbf{Q}^{(i)})^T \mathbf{Q}^{(i)}\right]}{\mathbf{y}^T \mathbf{y}} \qquad (33) $$

is used for model selection, and the selection is terminated with a $k$-subspace model when

$$ [\mathrm{cerr}]_i \leq 0 \quad \text{for all remaining candidate rules} \qquad (34) $$

The introduction of the D-optimality cost enhances model robustness and simplifies the model selection procedure [18]. Given a proper $\beta$, the new extended Gram-Schmidt orthogonal subspace decomposition algorithm with regularization and D-optimality for rule based model construction is given in Appendix I.
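The resulting selection loop based on (33)-(34) can be condensed as follows (our own Python sketch of the procedure detailed in Appendix I; the `lam` and `beta` settings are assumptions):

```python
import numpy as np

def cerr(Qi, y, lam_i, beta):
    """Combined error reduction ratio (33) for one candidate subspace basis."""
    G = Qi.T @ Qi
    g = np.linalg.solve(G + lam_i * np.eye(G.shape[0]), Qi.T @ y)
    _, logdet = np.linalg.slogdet(G)            # log det of the design block
    num = float(g @ (G + lam_i * np.eye(G.shape[0])) @ g) + beta * logdet
    return num / float(y @ y)

def forward_select(P_list, y, lam=1e-3, beta=1e-4):
    """Greedy subspace selection; stops when no candidate has positive [cerr] (34)."""
    selected, Q_sel = [], []
    remaining = list(range(len(P_list)))
    while remaining:
        cands = []
        for j in remaining:
            Qj = P_list[j].copy()
            for Qs in Q_sel:                    # orthogonalize against selected bases
                Qj = Qj - Qs @ np.linalg.solve(Qs.T @ Qs, Qs.T @ Qj)
            cands.append((cerr(Qj, y, lam, beta), j, Qj))
        score, j_best, Q_best = max(cands, key=lambda c: c[0])
        if score <= 0:                          # termination test (34)
            break
        selected.append(j_best); Q_sel.append(Q_best); remaining.remove(j_best)
    return selected
```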

IV. NUMERICAL EXAMPLES

Example 1: We start with a simple illustrative mapping example. Consider the nonlinear functional approximation of a scalar function $y = f(x)$ observed in noise. 500 data pairs are generated, where the system input $x$ is generated as a uniformly distributed random number in $[0, 1]$. Define a knot vector with knots evenly spaced 0.2 apart covering the operating range, and use piecewise linear B-spline fuzzy membership functions to build a one-dimensional model, resulting in six basis functions. These basis functions, shown in Figure 2, correspond to 6 fuzzy rules: (1) If (x = 0) (very small); (2) IF (x = 0.2) (small); (3) IF (x = 0.4) (medium-small); (4) IF (x = 0.6) (medium-large); (5) IF (x = 0.8) (large); and (6) If (x = 1) (very large).

By using the fuzzy model (4) for the approximation of $f(x)$, the neurofuzzy model is simply given as

$$ \hat{y}(t) = \sum_{j=1}^{6} \mu_j(x(t))\, x(t)\, \theta_j \qquad (35) $$

where $t$ denotes the data label, with each fuzzy rule regressor $\mathbf{p}^{(j)} = [\mu_j(x(1))x(1), \ldots, \mu_j(x(N))x(N)]^T$ spanning a one-dimensional space, i.e. $n_j = 1$, $\forall j$. The identifiability of these fuzzy rules is computed based on (13) and listed in Table I. Because this example involves only a scalar input variable, the extended Gram-Schmidt orthogonal decomposition algorithm reduces to the conventional OLS algorithm, with each rule subspace spanned by a one-dimensional rule basis. The proposed algorithm produces rule based information on the percentage energy increment (or model error reduction ratio) contributed by each selected rule, as shown in Table II (in the order of the selected rules) for two cases, with and without parameter regularization. Each rule's contribution to reducing the model error (or increasing the explained model energy level) provides model transparency for fuzzy rule interpretability. To verify the model's approximation and robustness, Table III lists the mean square error (MSE) of the model with respect to the noisy observations and to the true function, respectively.

For this example, the modelling results are insensitive to a wide range of the weighting $\beta$ associated with the D-optimality cost. However, for larger $\beta$, the model selection process automatically terminates at a 5-rule model (rule 1 is excluded). This insensitivity means that varying $\beta$ within a certain range terminates the modelling within a suitable structural range. The modelling results of the model using 5 rules are plotted in Figure 3. This example demonstrates that the proposed method has good approximation with some robustness improvement. Clearly the proposed modelling approach is additionally advantageous via its significant model transparency during the modelling process.

Example 2: Nonlinear 2-D surface modelling. The Matlab logo was generated by the first eigenfunction of the L-shaped membrane. A meshed data set was generated by using Matlab commands that define uniform meshes $x$ and $y$ over $[0, 1]$ and evaluate the first eigenfunction of the L-shaped membrane on the resulting grid (36), such that the output is defined over a unit square input region. The data set, shown in Figure 5(a), is used to model the target function (the first eigenfunction of the L-shaped membrane function).

For both $x$ and $y$, define a knot vector covering the unit interval, and use piecewise quadratic B-spline fuzzy membership functions to build a one-dimensional model, resulting in six basis functions. These basis functions, shown in Figure 4, correspond to 6 fuzzy rules: (1) If (x or y) is (very small) (VS); (2) IF (x or y) is (small) (S); (3) IF (x or y) is (medium-small) (MS); (4) IF (x or y) is (medium-large) (ML); (5) IF (x or y) is (large) (L); and (6) If (x or y) is (very large) (VL).

The univariate and bivariate membership functions (the interactions between the univariate membership functions of $x$ and $y$ via the tensor product) are used as the model set, as shown in Table IV, in which the identifiability of the fuzzy rules is listed based on (13). From Table IV it is seen that all the rules have been uniformly excited. There are 48 rules in total.

By using the fuzzy model (4) for the modelling of $f(x, y)$, the neurofuzzy model is simply given as

$$ \hat{y}(t) = \sum_{j=1}^{48} \mu_j(\mathbf{x}(t))\, (\mathbf{x}(t))^T \boldsymbol{\Theta}_j \qquad (37) $$

where $t$ denotes the data label and $\mathbf{x}(t) = [x(t), y(t)]^T$ is given by the meshed values of $(x, y)$ in the input region $[0, 1]^2$. Hence each fuzzy rule spans a 2-dimensional space, i.e. $n_j = 2$, $\forall j$. The proposed algorithm based on the extended Gram-Schmidt orthogonal decomposition has been applied, in which each rule subspace, spanned by a 2-dimensional rule basis, is mapped into an orthogonal subspace. The modelling results contain rule based information on the percentage energy increment (or model error reduction ratio) contributed by each selected rule, as shown in Table V. The resultant 20-rule model achieves a small MSE. In Table V, the selected rules are ordered in the sequence of being selected, and the model selection automatically terminates at a 20-rule model ($[\mathrm{cerr}]_i \leq 0$ for all remaining rules). The model prediction of the 20-rule model is shown in Figure 5(b). For this example, the modelling results are insensitive to the value of $\beta$.

Fig. 2. The fuzzy membership functions for x in Example 1 (rules 1 to 6, plotted over x in [0, 1]).
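To reproduce the flavour of this example, the sketch below (our own; the paper's exact target function and noise level are not shown here, so a placeholder target and noise standard deviation are assumed) constructs the six piecewise linear rule regressors used in (35):

```python
import numpy as np

# Assumed stand-ins for the unrecoverable specifics of Example 1.
f = lambda x: np.sin(2.0 * np.pi * x)               # placeholder target function
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 500)                      # 500 uniform inputs on [0, 1]
y = f(x) + 0.1 * rng.standard_normal(500)           # noisy observations

def tri(x, c, w=0.2):
    """Piecewise linear (order-2) B-spline membership centred at c, half-width w."""
    return np.clip(1.0 - np.abs(x - c) / w, 0.0, 1.0)

centres = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])  # the six rule centres of Fig. 2
P = np.stack([tri(x, c) * x for c in centres], axis=1)  # p^(j)(t) = mu_j(x(t)) x(t)
```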

TABLE I. FUZZY RULE IDENTIFIABILITY IN EXAMPLE 1: the identifiability measure (13) evaluated for rule indices 1 to 6.

TABLE II. SYSTEM ERROR REDUCTION RATIO BY THE SELECTED RULES IN EXAMPLE 1: rules selected in the order 5, 3, 4, 6, 2, 1, with the corresponding error reduction ratios listed for the cases with and without parameter regularization.

TABLE III. MODEL MEAN SQUARE ERRORS (MSE) WITH RESPECT TO THE NOISY OBSERVATIONS AND TO THE UNDERLYING FUNCTION, FOR THE 6-RULE AND 5-RULE MODELS.

Fig. 3. The modelling results of a 5-rule model for Example 1: model predictions, noisy observations, the underlying true function, and the model residual with respect to the true function, plotted over x in [0, 1].

Fig. 4. The fuzzy membership functions for x, or y, in Example 2 (rules 1 to 6, plotted over [0, 1]).

TABLE IV. FUZZY RULE IDENTIFIABILITY IN EXAMPLE 2: (a) rules about x; (b) rules about y; (c) rules about x AND y (the star '*' indicates rules included in the final model). Rules are indexed by the linguistic labels VS, S, MS, ML, L and VL on each axis.

It has been shown that by using a weighting for the D-optimality cost function, the entire model construction procedure becomes automatic. It can be seen that the model has some limitations in modelling the corners and edges of the surface, due to the data being only piecewise smooth and piecewise nonlinear. This factor may contribute to the fact that regularization may not help in reducing the misfit in some strongly nonlinear behaviour regions. Global nonlinear modelling using B-splines for strongly nonlinear behaviour, such as piecewise smooth and piecewise nonlinear data, is under investigation.

Example 3: Consider the benchmark Henon time series given by

$$ y(t) = 1 - 1.4\, y^2(t-1) + 0.3\, y(t-2) \qquad (38) $$

500 data points are generated from fixed initial conditions $y(1)$ and $y(2)$, and all the data points were used in the modelling by the proposed approach. The modelling process is briefly described here. The input vector is $\mathbf{x}(t) = [y(t-1), y(t-2)]^T$. For each input, define a knot vector covering the operating range of the series, and use piecewise quadratic B-spline fuzzy membership functions to build a one-dimensional model, resulting in four basis functions per input, i.e., for $i = 1, 2$: (1) If (y(t-i)) is (Small) (S); (2) If (y(t-i)) is (Medium Small) (MS); (3) If (y(t-i)) is (Medium Large) (ML); (4) If (y(t-i)) is (Large) (L). The bivariate membership functions are then formed by using the tensor product.

The modelling results derived by the subspace forward regression process are given in Table VI, with the final model consisting of 13 fuzzy rules. This table shows the energy level per rule extracted for this chaotic time series. Figure 6 demonstrates the excellent approximation of the derived model. The final model MSE is very small compared with the signal variance.

V. CONCLUSIONS

This paper has introduced a new robust neurofuzzy model construction algorithm for the modelling of a priori unknown dynamical systems in the form of a set of fuzzy rules. A one to one mapping between a fuzzy rule base and a model matrix feature subspace has been established by extending a Takagi and Sugeno (T-S) inference mechanism; rule based knowledge is extracted from the matrix subspace, enhancing model transparency through this mapping link. In order to achieve maximized model robustness and sparsity, a new robust extended Gram-Schmidt method has been introduced via two effective and complementary approaches of regularization and D-optimality experimental design. By combining a subspace approach with the concept of robust model construction, a locally regularized orthogonal least squares algorithm has been extended for fuzzy rule regularization and subspace based information extraction, combined with a D-optimality criterion for subspace based rule selection. Model rule bases are decomposed into orthogonal subspaces, so as to enhance model transparency with the capability of interpreting the derived rule base energy level, and the rules are automatically selected to yield a robust model.

TABLE V. SYSTEM ERROR REDUCTION RATIO BY THE SELECTED RULES IN EXAMPLE 2. The 20 selected rules are listed in the order of selection (the rule labels are univariate and bivariate combinations of the linguistic values in Table IV), with error reduction ratios:

Rules 1-5:   0.7044  0.1181  0.0954  0.0292  0.0193
Rules 6-10:  0.0086  0.0056  0.0040  0.0028  0.0021
Rules 11-15: 0.0037  0.0016  0.0011  0.0004  0.0003
Rules 16-20: 0.0002  0.0005  0.0000  0.0002  0.0001

Fig. 5. Modelling results for Example 2: (a) target function and (b) model output of the 20-rule model, plotted over the unit square input region.
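MATLAB's membrane eigenfunction has no NumPy equivalent, so a Python sketch of the Example 2 data generation must substitute a stand-in surface; the following is purely illustrative (the grid size and the surface are assumptions):

```python
import numpy as np

n = 31                                   # assumed grid size for the meshed data set
x = np.linspace(0.0, 1.0, n)
y = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(x, y)
# Smooth stand-in for the first eigenfunction of the L-shaped membrane (the MATLAB logo).
Z = np.sin(np.pi * X) * np.sin(0.5 * np.pi * Y)
data = np.column_stack([X.ravel(), Y.ravel(), Z.ravel()])  # (x, y, z) modelling triples
```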

TABLE VI. SYSTEM ERROR REDUCTION RATIO BY THE SELECTED RULES IN EXAMPLE 3: the 13 selected rules, in the order of selection, with their error reduction ratios; the rule labels are tensor products of the fuzzy sets of y(t-1) and y(t-2).

Fig. 6. Modelling results for Example 3: system output and model predictions plotted in the (y(t-1), y(t-2)) phase plane.
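The Henon series (38) is straightforward to regenerate; the minimal Python sketch below (our own; zero initial conditions are assumed) builds the series and the input pairs [y(t-1), y(t-2)]:

```python
import numpy as np

def henon_series(n=500, y0=0.0, y1=0.0):
    """Iterate y(t) = 1 - 1.4*y(t-1)**2 + 0.3*y(t-2), eq. (38)."""
    y = np.empty(n)
    y[0], y[1] = y0, y1
    for t in range(2, n):
        y[t] = 1.0 - 1.4 * y[t - 1] ** 2 + 0.3 * y[t - 2]
    return y

y = henon_series()
X = np.column_stack([y[1:-1], y[:-2]])   # x(t) = [y(t-1), y(t-2)]^T
target = y[2:]                           # one-step-ahead modelling target
```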

APPENDIX I
THE ALGORITHM

An extended classical Gram-Schmidt scheme, combined with parameter regularization and a D-optimality selection criterion in orthogonal subspaces, can be summarized as the following procedure:

1) At the $k$th forward regression step ($k \geq 1$), for each candidate rule $j$, $k \leq j \leq p$, orthogonalize the candidate sub-matrix with respect to the $(k-1)$ already selected subspace bases:

$$ \mathbf{A}_{l,j} = [(\mathbf{Q}^{(l)})^T \mathbf{Q}^{(l)}]^{-1} (\mathbf{Q}^{(l)})^T \mathbf{P}^{(j)}, \quad l = 1, \ldots, k-1 \qquad (39) $$

$$ \mathbf{Q}^{(j)} = \mathbf{P}^{(j)} - \sum_{l=1}^{k-1} \mathbf{Q}^{(l)} \mathbf{A}_{l,j} \qquad (40) $$

Compute the associated regularized estimate

$$ \hat{\boldsymbol{\gamma}}_j = [(\mathbf{Q}^{(j)})^T \mathbf{Q}^{(j)} + \lambda_j \mathbf{I}]^{-1} (\mathbf{Q}^{(j)})^T \mathbf{y} \qquad (41) $$

and, for rule base selection, the combined error reduction ratio

$$ [\mathrm{cerr}]_j = \frac{\hat{\boldsymbol{\gamma}}_j^T [(\mathbf{Q}^{(j)})^T \mathbf{Q}^{(j)} + \lambda_j \mathbf{I}] \hat{\boldsymbol{\gamma}}_j + \beta \log \det[(\mathbf{Q}^{(j)})^T \mathbf{Q}^{(j)}]}{\mathbf{y}^T \mathbf{y}} \qquad (42) $$

Select the candidate rule that maximizes $[\mathrm{cerr}]_j$; the $[\mathrm{cerr}]$ value of the selected rule is recorded as its rule base energy level, for information extraction (43). The selected sub-matrix exchanges position with the $k$th sub-matrix; for notational convenience, all the sub-matrices are still referred to according to the new column sub-matrix order in $\mathbf{P}$, even if some of the column sub-matrices have been interchanged.

2) The procedure is monitored and terminated at the derived $k = n_s$ step, when $[\mathrm{cerr}]_j \leq 0$ for all remaining candidate rules, for the predetermined $\beta$. Otherwise, set $k = k + 1$ and go to step 1.

3) Calculate the original parameters according to (27).

ACKNOWLEDGMENT

XH gratefully acknowledges that part of this work was supported by the EPSRC in the UK. The authors would like to thank the referees for their constructive comments.

REFERENCES

[1] C. J. Harris, X. Hong, and Q. Gan, Adaptive Modelling, Estimation and Fusion from Data: A Neurofuzzy Approach. Springer-Verlag, 2002.
[2] M. Brown and C. J. Harris, Neurofuzzy Adaptive Modelling and Control. Hemel Hempstead: Prentice Hall, 1994.
[3] K. M. Bossley, Neurofuzzy Modelling Approaches in System Identification. Ph.D. thesis, Dept. of ECS, University of Southampton, 1997.
[4] R. Murray-Smith and T. A. Johansen, Multiple Model Approaches to Modelling and Control. Taylor and Francis, 1997.
[5] T. Takagi and M. Sugeno, "Fuzzy identification of systems and its applications to modelling and control," IEEE Trans. on Systems, Man, and Cybernetics, vol. 15, pp. 116–132, 1985.
[6] M. Feng and C. J. Harris, "Adaptive neurofuzzy control for a class of state-dependent nonlinear processes," International Journal of Systems Science, vol. 29, no. 7, pp. 759–771, 1998.
[7] H. Wang, M. Brown, and C. J. Harris, "Modelling and control of nonlinear, operating point dependent systems via associative memory networks," J. Dynamics and Control, vol. 6, pp. 199–218, 1996.
[8] R. Bellman, Adaptive Control Processes. Princeton University Press, 1966.
[9] J. S. R. Jang, C. T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Upper Saddle River, NJ: Prentice Hall, 1997.
[10] S. Chen, S. A. Billings, and W. Luo, "Orthogonal least squares methods and their applications to non-linear system identification," International Journal of Control, vol. 50, pp. 1873–1896, 1989.
[11] S. Chen, Y. Wu, and B. L. Luk, "Combined genetic algorithm optimization and regularized orthogonal least squares learning for radial basis function networks," IEEE Trans. on Neural Networks, vol. 10, pp. 1239–1243, 1999.
[12] M. J. L. Orr, "Regularisation in the selection of radial basis function centers," Neural Computation, vol. 7, no. 3, pp. 954–975, 1995.
[13] H. Akaike, "A new look at the statistical model identification," IEEE Trans. on Automatic Control, vol. AC-19, pp. 716–723, 1974.
[14] A. C. Atkinson and A. N. Donev, Optimum Experimental Designs. Oxford: Clarendon Press, 1992.
[15] X. Hong and C. J. Harris, "Nonlinear model structure detection using optimum experimental design and orthogonal least squares," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 435–439, 2001.
[16] X. Hong and C. J. Harris, "Nonlinear model structure design and construction using orthogonal least squares and D-optimality design," IEEE Transactions on Neural Networks, vol. 13, no. 5, pp. 1245–1250, 2001.
[17] S. Chen, "Local regularization assisted orthogonal least squares regression," submitted to International Journal of Control, 2003.
[18] S. Chen, X. Hong, and C. J. Harris, "Sparse kernel regression modelling using combined locally regularised orthogonal least squares and D-optimality experimental design," IEEE Trans. on Automatic Control, accepted for publication, 2003.
[19] T. Taniguchi, K. Tanaka, H. Ohtake, and H. O. Wang, "Model construction, rule reduction and robust compensation for generalised form of Takagi-Sugeno fuzzy systems," IEEE Trans. on Fuzzy Systems, vol. 9, no. 4, pp. 525–538, 2001.
[20] Y. Jin, "Fuzzy modelling of high dimensional systems: complexity reduction and interpretability improvement," IEEE Trans. on Fuzzy Systems, vol. 8, no. 2, pp. 212–221, 2000.
[21] H. Roubos and M. Setnes, "Compact and transparent fuzzy models and classifiers through iterative complexity reduction," IEEE Trans. on Fuzzy Systems, vol. 9, no. 4, pp. 516–524, 2001.
[22] X. Hong and C. J. Harris, "A neurofuzzy network knowledge extraction and extended Gram-Schmidt algorithm for model subspace decomposition," IEEE Transactions on Fuzzy Systems, accepted for publication, 2002.
[23] P. Dierckx, Curve and Surface Fitting with Splines. Oxford: Clarendon Press, 1995.
[24] S. R. Gunn, M. Brown, and K. Bossley, "Network performance assessment for neurofuzzy data modelling," in Intelligent Data Analysis, pp. 313–323, 1997.
[25] K. W. Gruenberg and A. J. Weir, Linear Geometry. D. Van Nostrand Company, Inc., 1967.
[26] T. Söderström and P. Stoica, System Identification. Prentice Hall, 1989.
[27] D. W. Marquardt, "Generalised inverse, ridge regression, biased linear estimation and nonlinear estimation," Technometrics, vol. 12, no. 3, pp. 591–612, 1970.

Xia Hong received her university education at the National University of Defense Technology, P. R. China (BSc, 1984; MSc, 1987), and the University of Sheffield, UK (PhD, 1998), all in automatic control. She worked as a research assistant at the Beijing Institute of Systems Engineering, Beijing, China from 1987 to 1993, and as a research fellow in the Department of Electronics and Computer Science at the University of Southampton from 1997 to 2001. She is currently a lecturer in the Department of Cybernetics, University of Reading. She is actively engaged in research into neurofuzzy systems, data modelling and learning theory and their applications. Her research interests include system identification, estimation, neural networks, intelligent data modelling and control. She has published over 30 research papers and coauthored a research book. She was awarded a Donald Julius Groen Prize by the IMechE in 1999.

Chris J. Harris received his university education at Leicester (BSc), Oxford (MA) and Southampton (PhD). He previously held appointments at the Universities of Hull, UMIST, Oxford and Cranfield, as well as being employed by the UK Ministry of Defence. His research interests are in the area of intelligent and adaptive systems theory and its application to intelligent autonomous systems, management infrastructures, intelligent control and estimation of dynamic processes, multi-sensor data fusion and systems integration. He has authored or co-authored 12 books and over 300 research papers, and is an associate editor of numerous international journals, including Automatica, Engineering Applications of AI, Int. J. General Systems Engineering, International J. of System Science and the Int. J. on Mathematical Control and Information Theory. He was elected to the Royal Academy of Engineering in 1996, was awarded the IEE Senior Achievement medal in 1998 for his work on autonomous systems, and received the highest international award of the IEE, the IEE Faraday medal, in 2001 for his work in intelligent control and neurofuzzy systems.

Sheng Chen obtained a BEng degree in control engineering from the East China Petroleum Institute in 1982, and a PhD degree in control engineering from the City University at London in 1986. He joined the University of Southampton in September 1999, having previously held research and academic appointments at the Universities of Sheffield, Edinburgh and Portsmouth. Dr Chen is a Senior Member of the IEEE in the USA. His recent research work includes adaptive nonlinear signal processing, modelling and identification of nonlinear systems, neural network research, finite-precision digital controller design, and evolutionary computation methods and optimization. He has published over 200 research papers.