FISHERIES RESEARCH BOARD OF CANADA Translation Series No. 2348

A method of mathematical modelling of complex ecological systems

by A. G. Ivakhnenko, Yu. V. Koppa, N. N. Todua, and G. Petrake

Original title: Metod matematychnoho modelyuvannya skladnykh ekologichnykh system

From: Avtomatyka, Instytut kibernetyky AN URSR (Automatic control, Institute of Cybernetics of the Academy of Sciences of the Ukrainian Soviet Socialist Republic), (4): 20-34, 1971

Translated by the Translation Bureau (JS/TTH) Foreign Languages Division Department of the Secretary of State of Canada

Department of the Environment Fisheries Research Board of Canada Marine Ecology Laboratory Dartmouth, N. S. 1973

33 pages typescript

DEPARTMENT OF THE SECRETARY OF STATE, TRANSLATION BUREAU, MULTILINGUAL SERVICES DIVISION

Translated from: Ukrainian

Into: English

Author: A. G. Ivakhnenko, Yu. V. Koppa, N. N. Todua and H. Petrake

Title in English: A method of mathematical modelling of complex ecological systems.

Title in foreign language (transliterated): Metod matematychnoho modelyuvannya skladnykh ekologichnykh system.

Reference in foreign language: Avtomatyka, Instytut kibernetyky AN URSR, No. 4, 1971.

Reference in English: Automatic control, Institute of Cybernetics of the Academy of Sciences of the Ukrainian Soviet Socialist Republic, No. 4, 1971.

Publisher: Institute of Cybernetics of the Academy of Sciences of the Ukrainian SSR

Place of publication: Kyiv, Ukrainian SSR

Date of publication: year 1971, issue no. 4

Page numbers in original: 20-34

Number of typed pages: 33

Requesting department: Environment, Fisheries Research Board (client's no. F.R.B.)

Branch or division: Marine Ecology Laboratory, Bedford Institute of Oceanography, Dartmouth, N.S.

Person requesting: Dr. K. H. Mann

Translation Bureau no.: 183074

Translator (initials): J.S.

Date of request: July 25, 1972

Date stamp: Dec. 14, 1972

Unedited translation, for information only.

Automatic control.

Institute of Cybernetics of the Academy of Sciences of the Ukrainian

Soviet Socialist Republic.

A method of mathematical modelling of complex ecological systems.

By: A. G. Ivakhnenko, Yu. V. Koppa, N. N. Todua and G. Petrake

(Kyiv.)

Summary

The method of data handling by group (MDHG) is applied to synthesize an analogue predicting the quantity of bacteria in the Rybinsk reservoir with an extrapolation time of one year. The method is based on the principle of self-organization, under which it is enough to observe only a small part of the components of the characteristic vector, so that a complex simulation problem turns into a comparatively simple one.


Statement of the problem of modelling aquatic

ecological systems.

In the coming years, automatic computerized control centres

will be created. These centres will be connected by means of

telemetry systems to the transducers, which will act upon the

active elements controlling the ecological conditions in reservoirs.

A reservoir will thus become an object of automatic control, and,

because of this, mathematical modelling of ecological systems in the

reservoirs will be more and more necessary.

Below, an attempt is made to adopt a new approach for the

simulation of aquatic ecological systems by introducing heuristic

self-organization in which, among others, nonlinear high-degree

finite-difference equations ("polynomial descriptions") are used

instead of differential equations. This method is more adequate for

the problems involved in the simulation of complex systems, and may

provide not only qualitative, but also quantitative estimations of

the variables.

The models available so far are useful only for a qualitative

analysis of various processes, a fact admitted even by the authors of

these models. For example, in (2), where one of the best deterministic

models is described, we read the following: "The results of the analysis

of an aquatic ecosystem provided by this model can only be treated as

purely qualitative; in order to obtain well-founded quantitative data,

a considerable amount of further work is required".


The authors of the present paper claim, however, to have

developed a mathematical model, which provides not only qualitative,

but also quantitative estimates.

Accuracy in the simulation of complex systems requires

an increase in the complexity of mathematical descriptions.

There exists a certain discrepancy between the complexity of

the objects of mathematical modelling and the simplicity of the means

employed for this purpose. Until recently, modelling was done either

by means of deterministic methods (based on the study of simple

differential equations, e.g. of the linear equations of convective

diffusion), or by statistical methods of simple regression analysis

(which do not reach beyond the scope of the linear or quadratic

regression). A simple substitution of finite differences for derivatives

suffices to show that the complexity of the mathematical

description is extremely low in all cases and that, in principle, it

cannot ensure accuracy in modelling complex systems.

Example. The equation of convective diffusion in the differential

form is

$$\frac{\partial S}{\partial t} + v_i \frac{\partial S}{\partial x_i} = \frac{\partial}{\partial x_i}\left(K_{ij}\,\frac{\partial S}{\partial x_j}\right) - K S,$$

where S stands for the concentration of matter; t for time; v_i for stream velocity; x_i for coordinates; K_ij for the coefficient of turbulent diffusion; K for the coefficient of non-conservation.

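Let us approximate the derivatives by finite differences

$$\frac{\partial S}{\partial t} \approx \frac{\Delta S}{\Delta t}, \qquad \frac{\partial S}{\partial x_i} \approx \frac{\Delta S}{\Delta x_i},$$

where each difference ΔS is taken between the values of S at neighbouring nodes.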

By substituting the expressions obtained in the original

equation, we obtain the algebraic equation

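[Equation illegible in the typescript: a linear algebraic equation in the node values of S, with a constant term a_0.]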

It may thus be seen that the above differential equation

corresponds, from the point of view of its complexity, to the linear

regression equation.
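The linearity claimed here can be checked numerically. The following is a minimal sketch, assuming a one-dimensional version of the equation with constant coefficients and an explicit finite-difference scheme; the function name `step`, the grid and all parameter values are illustrative assumptions, not taken from the paper. It verifies that one discretized time step acts on the node values as a linear map (superposition holds).

```python
def step(S, v=0.5, Kd=0.1, K=0.05, dx=1.0, dt=0.1):
    """One explicit step of dS/dt + v dS/dx = Kd d2S/dx2 - K S.

    Boundary values are held fixed; all parameter values are illustrative.
    """
    new = list(S)
    for j in range(1, len(S) - 1):
        advection = v * (S[j] - S[j - 1]) / dx
        diffusion = Kd * (S[j + 1] - 2 * S[j] + S[j - 1]) / dx ** 2
        new[j] = S[j] + dt * (-advection + diffusion - K * S[j])
    return new

# Two hypothetical concentration profiles with zero boundary values.
a = [0.0, 1.0, 4.0, 2.0, 0.0]
b = [0.0, 3.0, 1.0, 5.0, 0.0]

# Superposition: stepping 2a + 3b equals 2*step(a) + 3*step(b).
combo = [2 * x + 3 * y for x, y in zip(a, b)]
lhs = step(combo)
rhs = [2 * x + 3 * y for x, y in zip(step(a), step(b))]
is_linear = all(abs(p - q) < 1e-12 for p, q in zip(lhs, rhs))
print(is_linear)
```

Each new node value is a fixed weighted sum of neighbouring node values, which is the same form as a linear regression equation.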

The fact that the striking discrepancy between the complexity

of the mathematical apparatus on the one hand, and the complexity

of the object on the other, is not even noticed (the reasons for the

inaccuracy are often explained by the fact that some other factors

should also be considered) is due to the deterministic way of thinking

of the researchers, which has become deeply rooted and represents the main shortcoming of contemporary mathematical modelling. This deficiency

is eliminated by means of the MDHG.

Method of data handling by group (MDHG).

The method of data handling by group (MDHG) is similar to

the methods of mass selection of plants or animals (7). A certain number of input data (which are called factors or arguments) is used

for the construction of all their possible combinations by pairs.

Each pair of arguments provides a "partial description", the coefficients of which are determined by solving a small system of normal equations (on the basis of the minimum mean square deviation). This procedure uses a certain experimental selection of data referred to as the learning sequence. The complete description is obtained by eliminating the intermediate variables from the set of partial descriptions.
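The construction of partial descriptions by pairs can be sketched as follows. This is a minimal illustration, assuming linear partial descriptions of the form y = a0 + a1*xi + a2*xj; the helper names `solve` and `fit_partial` and the data are hypothetical and do not come from the paper.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_partial(xi, xj, y):
    """Least-squares coefficients of y = a0 + a1*xi + a2*xj via normal equations."""
    rows = [[1.0, u, v] for u, v in zip(xi, xj)]
    A = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
    b = [sum(r[p] * t for r, t in zip(rows, y)) for p in range(3)]
    return solve(A, b)

# Hypothetical learning sequence: three arguments, output y = 1 + 2*x0 - x2.
X = [[0, 1, 2, 3, 4], [1, 0, 2, 1, 3], [2, 5, 1, 0, 3]]
y = [1 + 2 * a - c for a, c in zip(X[0], X[2])]

# One partial description per pair of arguments.
partials = {(i, j): fit_partial(X[i], X[j], y)
            for i, j in combinations(range(3), 2)}
print(partials[(0, 2)])
```

In this toy case the pair (0, 2) reproduces the generating coefficients, while the other pairs give only approximate descriptions; selection among them is the subject of the following paragraphs.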

[Figure 1: deviation on the control sequence plotted against the number of selection steps; the point chosen according to "the rule of the left angle" is marked.]

Fig. 1. Use of the index of regularity (deviations in the control sequence) in increasing the number of selection steps (the problem of forecasting the number of bacteria; number of nodes in the learning sequence = 9, in the control sequence = 5).

The essential difference between the MDHG and other mathematical methods (recurrent methods, decomposition methods, etc.) is that both initial sequences of experimental data (the learning and the control sequence) increase in each consecutive selection step: the best results of a preceding step are used as the initial data in the next selection step, and so on.

The structure of the MDHG algorithm is multiple-stage: the intermediate variables obtained in the first step are used to form combinations by pairs in the next step of selection, etc. At the end of each step a threshold self-selection takes place, similar to that observed in the mass selection of plants or animals: only a certain percentage of the most regular intermediate variables is admitted into the next step.

The regularity of the variables is defined either by their correlation coefficient, or by the value inverse to the mean square deviation, determined from a separate control sequence.
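The two regularity measures named above can be written down directly. The sketch below is an illustration only; the function names and the control-sequence values are hypothetical.

```python
from math import sqrt

def correlation(u, v):
    """Pearson correlation coefficient between two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def inverse_msd(u, v):
    """Value inverse to the mean square deviation (undefined for an exact fit)."""
    msd = sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)
    return 1.0 / msd

# Hypothetical control sequence and two candidate intermediate variables.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
candidate_a = [1.1, 1.9, 3.2, 3.9, 5.1]
candidate_b = [2.0, 1.0, 4.0, 3.0, 5.0]

for name, cand in (("a", candidate_a), ("b", candidate_b)):
    print(name, correlation(observed, cand), inverse_msd(observed, cand))
```

Both measures rank candidate `a` above candidate `b` here; in general the two criteria need not agree.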

The rule for stopping the increase in the number of selection steps is as follows: as soon as the regularity criterion rises to the permissible level or begins to decrease systematically, the increase in the complexity of the full description discontinues ("the rule of the left angle").

We may choose the first or the second local minimum of deviation (Fig. 1).
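One possible reading of this stopping rule is sketched below. It is an assumption-laden illustration: it works on the control-sequence deviation (the inverse of regularity), it treats a single rise in the deviation as the start of a systematic worsening (that is, it stops at the first local minimum), and the function name and the sample deviations are hypothetical.

```python
def stop_step(deviations, permissible=0.0):
    """Return the 1-based selection step at which to stop.

    deviations[k] is the control-sequence deviation after step k + 1.
    Stop as soon as the deviation reaches the permissible level,
    or at the first local minimum, whichever comes first.
    """
    for k, d in enumerate(deviations):
        if d <= permissible:
            return k + 1
        if k + 1 < len(deviations) and deviations[k + 1] > d:
            return k + 1
    return len(deviations)

# Hypothetical deviations over seven selection steps.
devs = [0.10, 0.05, 0.04, 0.045, 0.02, 0.015, 0.03]
print(stop_step(devs))                    # first local minimum
print(stop_step(devs, permissible=0.06))  # permissible level reached earlier
```

Choosing the second local minimum instead, as the text allows, would require continuing the scan past the first rise.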

Over 20 different modifications of the MDHG algorithm have been proposed so far. They differ in the type of the key functions that are used for the construction of "complete" and "partial" descriptions and which are similar to one another in each individual algorithm.

In the main MDHG algorithm, quadratic polynomials are used as key functions (3). The degree of the complete description then increases with each successive selection step: a quadratic regression is realized in the first step, regression to the fourth degree in the second step, regression to the eighth degree in the third step, etc.
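The doubling of the degree follows from substituting quadratic polynomials into one another. A short sketch (the function name is hypothetical) tabulates the highest degree reachable after each step:

```python
def complete_degree(step, key_degree=2):
    """Highest degree of the complete description after `step` selection steps."""
    return key_degree ** step

degrees = [complete_degree(k) for k in range(1, 7)]
print(degrees)  # [2, 4, 8, 16, 32, 64]
```

Under this rule a complete description of the 64th degree corresponds to six selection steps.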

The examples of computation available so far show that a

high degree of accuracy in modelling is attained at a high degree

of complete description. For example, in building a mathematical

model of the balance of payments in England for 1969, it was

discovered that an error of Z = 0.168% can be attained at the 64th

degree of the complete regression equation (4). Such a high degree

of accuracy of the quantitative estimations in modelling had been

previously unattainable.

Optimization of the complexity of a model by means of the MDHG.

The method of data handling by group (MDHG), discussed below,

represents an attempt at developing a new technique of mathematical

modelling proceeding from the principles of heuristic

self-organization, a technique which would make it possible to gradually

increase the complexity of a mathematical representation (model) as

long as it leads to an increase in the accuracy of the modelling.

In other words, the MDHG solves the problem of attaining optimum

complexity of a model. The gradual increase in the complexity is

controlled according to the value of the mean square deviation, which

is determined from a separate control sequence of data. Usually,

with an increase in the complexity of the model the deviation decreases

drastically, producing two or three minima, followed by a slow, gradual

increase. The first (or global) deviation minimum determines a single

model of the optimum complexity.

Plurality of mathematical models and singleness of

the most regular (optimal) model.

Until now emphasis was placed on the so-called plurality of regression equations. This consists in the fact that, with a change in the complexity of the regression equation (e.g., with a change in the number of its terms or in the degree), the numerical values of the coefficients at a given variable also change. Thus, coefficients of a regression equation cannot be regarded as coefficients of value indicating the function of a given variable.

For example, V. V. Nalimov and N. O. Chernovaya state in their well-known work "Stochastic methods of planning experiments"* that "there is no point in attaching a value to individual coefficients of regression". Taking into account that the principles of regularization make it possible to determine the one single optimal polynomial of regression, the above statement appears to be erroneous. Coefficients of the one single regular polynomial provide one single value for each variable.

Addition to the regression analysis of various procedures aimed at regularizing the solutions (according to M. O. Tikhonov, V. K. Ivanov et al.) makes it possible to find the one single optimum equation of regression, the coefficients of which may be regarded as the value coefficients of corresponding variables.

Irregularity or incorrectness manifests itself in the fact that the regression equation, obtained on the basis of a given set of interpolation nodes, differs from the regression equation obtained by using another set of nodes of the same process. The purpose of the regularization is to decrease the deviation obtained for the new points.

* Stokhasticheskie metody planirovaniya eksperimenta.

Therefore, particular attention is paid in the MDHG to making

the regression equations regular, and the problem of determining the accuracy of description is treated in a different manner. If the

interpolation nodes, which were used to estimate the coefficients, are also utilized to verify the accuracy, then the more complex

the model, the higher will be its apparent accuracy. It is easy to prove

that in a situation where the number of terms in a new model is greater

than the number of interpolation nodes (n > N), the solution is incorrect:

small changes in the data induce great changes in the values of the

coefficients. For the purpose of regularization, all the data are

divided into two approximately equal groups: a learning sequence and

a control sequence. The former sequence is used only for the estimation

of the coefficients, the latter - only for the estimation of

accuracy. This technique makes it possible to determine the optimum

complexity of the mathematical model, because, while the complexity

(degree of polynomial) of the complete model increases, the accuracy

first increases, then (having attained the maximum), begins to decrease.

The accuracy maximum enables us to determine the one single mathematical model of a complex object, which is both regular and optimal from the

viewpoint of complexity. A magnitude inverse to the mean square

deviation of the control sequence, or the correlation coefficient, may

serve as a criterion of regularity of the regression equation.
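The role of the separate control sequence can be illustrated with a small numerical experiment. The sketch below is not the authors' computation: it assumes a hypothetical, roughly linear process observed with small disturbances, and compares a straight-line fit with a polynomial that passes exactly through all learning nodes. All names and data are made up.

```python
def linear_fit(ts, ys):
    """Least-squares straight line through the learning nodes."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    slope = (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
             / sum((t - mt) ** 2 for t in ts))
    return lambda t: my + slope * (t - mt)

def interpolant(ts, ys):
    """Lagrange polynomial passing exactly through every learning node."""
    def p(x):
        total = 0.0
        for k, (tk, yk) in enumerate(zip(ts, ys)):
            w = 1.0
            for j, tj in enumerate(ts):
                if j != k:
                    w *= (x - tj) / (tk - tj)
            total += yk * w
        return total
    return p

def msd(model, ts, ys):
    """Mean square deviation of a model on a sequence of nodes."""
    return sum((model(t) - y) ** 2 for t, y in zip(ts, ys)) / len(ts)

learn_t, learn_y = [0, 1, 2, 3, 4], [0.0, 1.2, 1.8, 3.1, 3.9]
ctrl_t, ctrl_y = [0.5, 1.5, 2.5, 3.5], [0.5, 1.5, 2.5, 3.5]

simple = linear_fit(learn_t, learn_y)
complex_ = interpolant(learn_t, learn_y)
for name, model in (("linear", simple), ("degree 4", complex_)):
    print(name, msd(model, learn_t, learn_y), msd(model, ctrl_t, ctrl_y))
```

On the learning nodes the more complex model appears perfect, while on the control nodes it is clearly worse than the straight line; only the control sequence reveals which complexity is preferable.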

If we had not adopted the principle of regularization, we would have arrived at an erroneous result: with the increase in complexity of the model, the accuracy would increase continuously and, eventually, if the number of possible states of the model should become equal to the number of experimental points (interpolation nodes), the deviation would seem to disappear. It is impossible to find the optimum complexity of a model without regularization.

When the principles of regularization are applied, the plurality of regression equations does not contradict the singleness of the optimal equation (optimal model) for a given set of variables.

"Verification of hypotheses".

The one single regular optimal regression equation has an interesting feature: its coefficients reflect the value of the arguments before which they stand.

This makes it possible to verify hypotheses. A hypothesis may represent the assumption that an equation should include a certain term, usually non-linear and complex, corresponding to a given phenomenon.

Should it be determined as a result of the regularization that the coefficient standing before the term in question is close to zero, it would mean that the physical mechanism of the process is such that this term does not exist in the given law. Only the terms with coefficients that are not equal to zero remain in the equation. Mathematical verification of physical, biological and other hypotheses thus becomes possible.

Example. Let us consider the function of time

$$\varphi(t) = 1 + t + t^2 + t^3 + t^4 + t^5.$$

Let us assume that the function is unknown, and that we are given only seven points of this function (Table 1). The problem consists in determining it by means of the MDHG.

We shall apply the MDHG algorithm with linear polynomials (8).

Assuming that we choose m = 8, as for our first example, we would obtain the following complete regression equation:

$$\varphi = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \dots + a_8 t^8.$$

If by using the MDHG we obtain a_6 = a_7 = a_8 = 0, it will confirm that the method examined provides a "verification of hypotheses" in the above sense.

As a criterion of regularity we shall choose the coefficient of correlation between a given intermediate variable and the primary variable φ, calculated from the data of the control sequence: the higher the coefficient, the greater the regularity.

Table 1

t        -3     -2    -1    0    1    2     3
φ(t)   -182    -21     0    1    6   63   364

(The column headings assigning the nodes to the learning and control sequences are illegible in the typescript.)
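The legible entries of Table 1 (364, 63, 6, 1, -21, -182) agree with the polynomial 1 + t + t^2 + t^3 + t^4 + t^5 evaluated at integer values of t from -3 to 3. Under that assumption (the function is inferred from the table entries, and the node positions are assumed), the seven points can be regenerated as follows:

```python
def phi(t):
    """Assumed generating function: 1 + t + t**2 + t**3 + t**4 + t**5."""
    return sum(t ** k for k in range(6))

nodes = [-3, -2, -1, 0, 1, 2, 3]   # assumed node positions
values = [phi(t) for t in nodes]
print(values)  # [-182, -21, 0, 1, 6, 63, 364]
```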

First step of selection. There are eight variables on the input of the

MDHG algorithm:

$$x_1 = t,\quad x_2 = t^2,\quad x_3 = t^3,\quad \dots,\quad x_8 = t^8.$$

These variables provide 28 partial equations of the first step. The following five equations (with corresponding correlation coefficients 0.9992866; 0.9958979; 0.9909868; 0.9884087; 0.9689235) turned out to be the most regular ones (characters marked [?] are illegible in the typescript):

$$\begin{aligned}
y_1\,(4\text{-}5) &= 1.9 + 1.1\,x_4 - 1.1235\,x_5,\\
y_2\,(2\text{-}3) &= -8 + 11\,x_{[?]} - 10.101\,x_{[?]},\\
y_3\,(7\text{-}8) &= 2.987 + 0.12[?]1729\,x_7 + 0.013[?]1146\,x_8,\\
y_4\,(6\text{-}7) &= 2.87913 + 0.120879\,x_6 - 0.124829\,x_7,\\
y_5\,(1\text{-}2) &= -8 + 82.2\,x_1 + 11\,x_2.
\end{aligned}$$

Second step of selection. The five intermediate variables y_i provide ten partial equations of the second step. The most regular ones among them are the following four equations (the corresponding correlation coefficients are 0.9999233; 0.9998364; 0.9997113; 0.9982104):

$$\begin{aligned}
z_1\,(45\text{-}12) &= -0.09817363 + 0.023\,y_{12} + 0.979\,y_{45},\\
z_2\,(23\text{-}45) &= -0.00832 + 0.208577\,y_{23} + 0.791504\,y_{45},\\
z_3\,(67\text{-}23) &= -0.0123 + 0.712148\,y_{67} + 0.2881134\,y_{23},\\
z_4\,(78\text{-}23) &= -0.0122 + 0.712466\,y_{78} + 0.287745\,y_{23}.
\end{aligned}$$

Third step of selection. The four intermediate variables z_i provide six partial equations of the third step. The most regular one among them is the last equation (correlation coefficient 0.9999730); the character marked [?] is illegible in the typescript:

$$u_1\,(12\text{-}23\text{-}45) = -0.00[?] + 0.5\,z_1 + 0.5\,z_2.$$
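The numbers of partial equations quoted for the three steps (28, 10 and 6) are the numbers of unordered pairs that can be formed from 8, 5 and 4 variables. A short check, with a hypothetical helper name:

```python
from itertools import combinations

def pair_count(n):
    """Number of partial equations formed from n variables taken by pairs."""
    return len(list(combinations(range(n), 2)))

counts = [pair_count(n) for n in (8, 5, 4)]
print(counts)  # [28, 10, 6]
```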

Having excluded the intermediate variables, we obtain a so-called "analogue" of the initial complete regression equation (the signs between the terms and the characters marked [?] are illegible in the typescript):

$$\varphi = 0.702663 \;[\pm]\; 0.9455877\,x_1 \;[\pm]\; 1.273712\,x_2 \;[\pm]\; 0.973723\,x_{[?]} \;[\pm]\; 0.99[?]16671\,x_{[?]}.$$

In fact we obtain a_6 = a_7 = a_8 = 0, which is what we wished to prove. If the specified points were valid for several polynomials rather than for one polynomial, we would obtain at the end of the selection all these polynomials with a zero deviation.

Relation between regularization and the Gedel* theorem.

The problem of separating the information concerning a process

into the basic and supplementary information is related to the

"principle of external complement" of Stafford Beer, as well as to the

Gedel theorem.

In the above problem four nodes have been assigned to the basic information, and three nodes were classified as supplementary information**.

Since the number of terms of the regression equation (ten) exceeds the

number of basic nodes (four), it is possible to find an infinite multitude

of polynomials which will satisfy these nodes. The supplementary

information (three nodes) is used only for the purpose of choosing the

one single regular polynomial out of this infinite multitude of polynomials.

If we used a different approach to this problem, we could include

all the eight nodes among the basic information. In that case some other

supplementary (supporting) information would be required to enable us

to choose the one single regular polynomial. Apparently "additional

complement" is necessary in principle for solving the problem of

selection of the one single polynomial.
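The need for supplementary information can be seen in a small constructed case. The sketch below is an illustration only: the base polynomial, the node positions and the function names are hypothetical. It builds a rival polynomial that coincides with the base polynomial at four basic nodes and differs at a supplementary node.

```python
basic_nodes = [-3, -1, 1, 3]   # hypothetical basic nodes
supplementary_node = 0         # hypothetical supplementary node

def base(t):
    """A hypothetical polynomial satisfying the basic nodes."""
    return 1 + t + t ** 2

def rival(t, c=1):
    """Another polynomial: base(t) plus c times a product vanishing at every basic node."""
    bump = c
    for node in basic_nodes:
        bump *= (t - node)
    return base(t) + bump

agree = all(base(t) == rival(t) for t in basic_nodes)
differ = base(supplementary_node) != rival(supplementary_node)
print(agree, differ)
```

Every value of `c` gives a different polynomial through the same four nodes, so only a point outside the basic nodes can discriminate between them.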

Peculiarity of the self-organization approach and indirect

determination of essential variables in complex systems.

By exaggerating slightly, we can formulate the self-organization

approach in the following manner: "I know that I do not know anything

about a complex system; let us leave it to the computer to deal with the whole problem on the basis of experimental data only".

* Transliteration. Might stand for Gödel. (Translator)

** Several nodes are required in the control sequence only in the presence of noise. One supplementary point suffices for deterministic functions without disturbances (see the algebraic minimum of interpolation nodes presented below).

The initial parameters, qualitative indices of the situation, must, however, be provided by man. In problems concerning the optimal

control we must also indicate the controlling factors. Depending on

the choice of the initial values and controlling factors, we can

obtain a vast variety of optimal mathematical models of the complex

system.

Let us assume that the characteristic vector of a given complex

system includes M variables which may be interconnected by m relationships

(equations). In order to obtain a complete mathematical description

of the system it suffices then to use (M - m) variables. The other

intermediate variables can be eliminated. Knowing the composition and

number of the variables, as well as the number of their interrelations,

it is easy to calculate the minimum number of the variables which

should be included in the model*.

For example, if the system is characterized by 21 variables

related to one another by 16 equations, it suffices to study the relation

between the initial magnitude and only 21 - 16 = 5 variables (example

of the mathematical model of England's economy). It is important to

note that these five variables may be any of the variables included

in the original set. This simplifies considerably the problem of

finding the necessary set of initial data, since we can use readily

measurable values.

The peculiarity of the models obtained by the method of self-organization consists in the fact that it is possible to exclude the seemingly important factors. For example, in the model of the biomass of aquatic plants in a reservoir, variables such as the solar radiation or the inflow of foreign matter etc. may be absent. This does not mean that the model is wrong. It only indicates that these factors are measured indirectly by means of other variables, since all the variables of a complex system are interrelated.

* We may include more, but not fewer variables than the number indicated above.

The indirect measuring of variables has been used for a long

time in automatic control, where it bears the name of the "differential

fork method". The task of simulation becomes considerably easier,

since it does not matter greatly whether it is possible to measure a

certain important variable. Experimental values of the other variables

replace the variable in question according to the principle of the

indirect determination of the variables. The peculiarity of the

self-organization approach consists in the fact that in the general case

one does not even know which variables replace one another: the diagram

of the model similar to that shown in Fig. 2, is not self-explanatory.

We may thus see, that a physical model of the kind shown in Fig. 2, is

by no means better suited for obtaining quantitative results. For

this purpose we need models that are completely different from the

physical models both in the type of functions and in the arguments. A

great multitude of equivalent, though diverse representations is obtained,

depending on which intermediate variables are excluded and which ones

remain in the description. In this sense, the optimal models are

multiple. But each set of variables has one single optimal model.

[Figure 2: block diagram of the ecological system. Legible labels: O2 exchange; CO2 exchange; solar radiation; inflow of foreign organic matter; inflow of foreign inorganic matter; outflow and deposition of organic matter; outflow and deposition of inorganic matter; breeding of fish; catching of fish.]

Fig. 2. Structure of a model of an ecological system (constructed on the basis of the data contained in (2)). X - oxygen; W - carbonic acid; B_A - biomass of bacteria; B_A [second index illegible] - saprophytic bacteria; D - organic matter (oxidizability); N - biogenic inorganic matter; B_1 - biomass of plankton; B_1 [second index illegible] - primary productivity; B_2 - biomass of zooplankton; B_3 - planktophagous fish.

Combined method of simulation.

We refer to our method of simulation as the combined method

and to the models themselves as "post-balance" models. The reason

for these names lies in the fact that the method rests on the choice

of the characteristic vector on the basis of the grouping of the

equations of balances (the deterministic part), and that after that the

data are handled by the group method (the self-organization). This

combined method makes it possible to solve problems faster, since it eliminates or reduces considerably the accidental surplus of arguments by using the information criteria or the criterion of accuracy.

In spite of the fact that the deterministic approach and the self-organization approach are contradictory in principle, the most effective methods are, nevertheless, the combined methods, in which the composition of the characteristic vector describing a complex system is established by the common deterministic methods (composition of differential equations, equations of the balance of matter or of energy etc.), while the synthesis of the equations of the mathematical model and the determination of its parameters are realized by the methods of self-organization.

Optimization of the complexity of the mathematical model of an ecological system by means of the MDHG algorithm with quadratic

partial polynomials (using the example of the Rybinsk reservoir).

Initial information for the model of a complex system may be represented by a set of mean annual data from 10-15 years of observations.

These data suffice for the determination of the coefficients of the so-called "partial" descriptions, which relate the principal variable to any two arguments, by way of a simple regression analysis (i.e., based on the criterion of the minimum mean square deviation), e.g.,

$$y_l = a_0 + a_1 x_i + a_2 x_h + a_3 x_i x_h + a_4 x_i^2 + a_5 x_h^2,$$

where i = 1, 2, ..., n-1; h = 2, 3, ..., n; l = 1, 2, ..., C_n^2; n - number of arguments.

"Partial" descriptions of this type are composed for all the possible pairs of arguments which characterize the state of the ecological system. This constitutes the system of "partial" descriptions of the first step. The resultant variables of the first step are used as arguments of the second step, etc. The threshold elements allow only the most regular variables to pass from one step to the next (which corresponds to the law of mass selection). The important point is that a small number of interpolation nodes, remaining the same throughout the entire procedure, enables us to determine the values of the coefficients of all the partial models in any step by a simple regression analysis. Subsequent elimination of the intermediate variables enables us to obtain "complete" descriptions of arbitrary complexity for a complex system with a number of arguments of up to N = 100. The complexity of a model is determined by the number of steps of partial descriptions used or, to put it in different words, by the degree of the complete polynomial.

The necessary minimum number of nodes of the learning sequence (with a matrix of 6 x 6 elements) is equal to seven. The higher the number of its nodes, the more reliable (under certain conditions) will be the results. The minimum number of nodes in the control sequence depends on the "degree of stochasticity" of the system: in the case of deterministic systems it is sufficient if one single node is used. The algebraic minimum of nodes is therefore 7 + 1 = 8 nodes. It has been suggested that this number be reduced to 5 nodes.

The last figure represents the maximum possible learning speed of the model. The speed of the decision-making response of a "taught" model is determined by the maximum displacement of its arguments in time (usually comes to 3 - 4 years). This displacement (time lag) is proportional to the duration of transient processes within the system.
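The quadratic partial description and its 6 x 6 system of normal equations can be sketched as follows. This is an illustration under stated assumptions: the form y = a0 + a1*xi + a2*xh + a3*xi*xh + a4*xi^2 + a5*xh^2 is the usual quadratic key function, and the helper names, the seven learning nodes and the generating coefficients are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_quadratic_partial(xi, xh, y):
    """Six least-squares coefficients of the quadratic partial description."""
    rows = [[1.0, u, v, u * v, u * u, v * v] for u, v in zip(xi, xh)]
    A = [[sum(r[p] * r[q] for r in rows) for q in range(6)] for p in range(6)]
    b = [sum(r[p] * t for r, t in zip(rows, y)) for p in range(6)]
    return solve(A, b)  # A is the 6 x 6 matrix of the normal equations

# Seven hypothetical learning nodes for one pair of arguments.
xi = [0, 1, 0, 1, 2, 1, 2]
xh = [0, 0, 1, 1, 1, 2, 2]
true = [2.0, 1.0, -1.0, 0.5, 1.0, -0.25]
y = [true[0] + true[1] * u + true[2] * v + true[3] * u * v
     + true[4] * u * u + true[5] * v * v for u, v in zip(xi, xh)]

coeffs = fit_quadratic_partial(xi, xh, y)
print([round(c, 6) for c in coeffs])
```

With noise-free data the six coefficients are recovered; with noisy data the seventh and further nodes provide the averaging on which the least-squares estimate relies.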

Table 2

Variables                                                               Correlation coefficient

φ   - mean annual abundance of bacteria                                 1.0
x1  - abundance of blue-green algae (τ = 1)                             0.59
x2  - area of the summer feeding grounds (τ = 4)                        0.77
x3  - P/B coefficient of zooplankton (τ = 1)                            0.633
x4  - abundance of zooplankton (May-October) (τ = [?])                  0.630
x5  - biomass of the blue-green algae (τ = 1)                           0.6508
x6  - biomass of the blue-green algae (τ = 4)                           0.586
x7  - biomass of zooplankton (per year) (τ = 1)                         0.576
x8  - permanganate oxidizability at the station POM (τ = 2)             0.568
x9  - abundance of zooplankton (per year) (τ = 1)                       0.561
x10 - abundance of blue-green algae (τ = 4)                             0.539
x11 - biomass of diatomaceous algae (τ = 1)                             0.545
x12 - ion content (τ = 1)                                               0.516
x13 - saprophytic bacteria (at MPA) (τ = 2)                             0.879
x14 - accumulation of water in the trough of the reservoir (τ = 4)      0.557

Table 3 (translation of captions)

(0) Source data (departure of the normalized deviations from the average)

(1) Variant I

(2) Learning sequence

(3) Control sequence

(4) Variant II

(5) Learning sequence (optimal)

(6) Control sequence

(7) Variant III

(8) Learning sequence

(9) Control sequence

(10) Correlation coefficient

(11) Years

(12) Dispersion D²

(13) Parameters

(14) Abundance of bacteria in the reservoir

(15) Abundance of blue-green algae

(16) Area of the summer feeding grounds

(17) P/B coefficient of zooplankton

(18) Abundance of zooplankton

(19) Biomass of blue-green algae

(20) Biomass of blue-green algae

(21) Biomass of zooplankton per year

(22) Permanganate oxidizability at the POM station

(23) Abundance of zooplankton per year

(24) Abundance of blue-green algae

(25) Biomass of diatomaceous algae

(26) NO3 content (average from the three stations)

(27) Resistance of the algae which grow in the MPA

(28) Accumulation of water in the trough of the reservoir

Table 3

[The numerical body of Table 3 is not legible in the scanned original. According to the captions translated above, it gives the normalized deviations of the bacterial abundance φ and of the arguments x1 to x14 for each year, the dispersion D of each year, the division of the years into learning and control sequences for Variants I to III, and the correlation coefficient of each parameter.]

Example. The MDHG algorithm with quadratic "partial" regression equations is described in (3) and a program of computation with the help of this method is outlined in (8).

The problem consists in forecasting the average annual numbers of bacteria for 1968. Correlation analysis enabled us to determine the 14 variables correlated most closely with the numbers of bacteria (Table 2).

The value of the time lag which should be used with each argument is given in parentheses. For example, in order to make the forecast for 1968, we must take the value of x2 for 1964 (τ = 4), etc.

The correlation coefficient of the initial variable with an argument xk (k = 1, 2, ..., 14) is calculated from the formula

[formula not legible in the scan]

where N2 is the number of points of the control sequence of data and i is the consecutive number of the point. The formula for calculating the mean square deviation in the control sequence is given in the example. When the correlation coefficient has its maximum value (equal to one), the mean square deviation has its minimum value (equal to zero). Either of these estimates may, however, serve as an index of regularity of the solution (the sought function).

Two formulations of the problem of forecasting. Let us not forget (4) that there are two formulations of the problem of forecasting:

a) the problem of passive forecasting;

b) the problem of active forecasting for control purposes.

The former formulation precludes the use of arguments with the time lag τ = 0 (i.e. with the subscripts "m"). In the latter formulation of the forecasting problem such arguments may be used if they are at our disposal and, consequently, represent controlling factors. We can assign future values to these factors (for example, taxes for the next year). In the present example we shall confine ourselves to passive forecasting, so that all the arguments with τ = 0 and with the subscripts "m" will be excluded.
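The use of lagged arguments in passive forecasting can be illustrated with a minimal sketch. The function names and the synthetic series below are assumptions made for illustration only; they are not the authors' program or data, and the correlation used is the ordinary sample correlation coefficient rather than the formula of the original, which is illegible in the scan.

```python
from math import sqrt

def argument_year(forecast_year, lag):
    """Year whose measurement feeds a forecast made for `forecast_year`."""
    return forecast_year - lag

def lagged_pairs(years, phi, x, lag):
    """Pair phi(t) with x(t - lag) for every year t where both values exist."""
    x_by_year = dict(zip(years, x))
    return [(p, x_by_year[t - lag]) for t, p in zip(years, phi) if t - lag in x_by_year]

def correlation(pairs):
    """Ordinary sample correlation coefficient of (phi, x) pairs."""
    n = len(pairs)
    mean_p = sum(p for p, _ in pairs) / n
    mean_x = sum(v for _, v in pairs) / n
    cov = sum((p - mean_p) * (v - mean_x) for p, v in pairs)
    norm_p = sqrt(sum((p - mean_p) ** 2 for p, _ in pairs))
    norm_x = sqrt(sum((v - mean_x) ** 2 for _, v in pairs))
    return cov / (norm_p * norm_x)

def passive_arguments(lags):
    """Keep only arguments with a lag of at least one year (lag 0 is excluded)."""
    return {name: lag for name, lag in lags.items() if lag >= 1}

# Synthetic illustration (not the reservoir data): x leads phi by two years.
years = list(range(1954, 1968))
x = [float((7 * i) % 5 - 2) for i in range(len(years))]
phi = [0.0, 0.0] + [2.0 * v + 1.0 for v in x[:-2]]
r = correlation(lagged_pairs(years, phi, x, lag=2))
```

With a lag of four years, `argument_year(1968, 4)` gives 1964, which is the x2 case quoted in the text.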

As is required by the MDHG, the initial data (presented in Table 3) were divided (according to the value of dispersion) into the learning sequence (nine points) and the control sequence (five points).
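A division by dispersion could look like the following sketch. It assumes that the points with the largest dispersion go to the learning sequence and that the dispersion of a point is the sum of its squared normalized deviations; the excerpt does not state the exact rule, so both are labelled assumptions, and `split_by_dispersion` is a hypothetical name.

```python
def split_by_dispersion(rows, n_learning):
    """rows: list of (label, normalized_deviations).

    Returns (learning_labels, control_labels). Assumption: rows are ranked by
    the sum of squared deviations and the largest ones form the learning
    sequence.
    """
    def dispersion(values):
        return sum(v * v for v in values)

    ranked = sorted(rows, key=lambda row: dispersion(row[1]), reverse=True)
    learning = [label for label, _ in ranked[:n_learning]]
    control = [label for label, _ in ranked[n_learning:]]
    return learning, control

# Synthetic rows (not the Table 3 data): 14 points, dispersion grows with i.
rows = [(1954 + i, [0.1 * i, -0.05 * i]) for i in range(14)]
learning, control = split_by_dispersion(rows, n_learning=9)
```

In this synthetic case the five points with the smallest dispersion end up in the control sequence, giving the 9 : 5 proportion used in the example.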

Practice has shown that the accuracy of forecasting increases if the so-called trend of the random process is isolated, i.e. if the process is presented as the sum of a deterministic and a random function of time. The method of isolating the trend is shown in (3).

In this example it is not necessary to isolate the trend, since the mean values of the variable φ agree fairly well with the trend. The data shown in the tables are therefore expressed as deviations from the mean value, normalized by it. For example,

φi = (Φi - Φmean) / Φmean
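The normalization described here (deviation from the mean, divided by the mean) can be written as a short sketch; `normalize_by_mean` is a hypothetical helper name and the numbers are illustrative only.

```python
def normalize_by_mean(series):
    """Express each value as its deviation from the series mean, divided by the mean."""
    mean = sum(series) / len(series)
    return [(value - mean) / mean for value in series]

normalized = normalize_by_mean([2.0, 4.0, 6.0])
```

By construction the normalized values sum to zero, so a value of 0 in the tables corresponds to a year at the long-term mean.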

First step of selection. Fourteen arguments allow us to build

91 partial equations of the first step. It turned out that the most

regular ones among them are 14 equations of the following type:

y = a0 + a1 xi + a2 xj + a3 xi xj + a4 xi² + a5 xj²

[The numerical coefficients of the example equations printed at this point are not legible in the scan.]
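One selection step of this kind can be sketched as follows: every pair of arguments gets a quadratic partial description fitted on the learning sequence, each fit is scored by its mean square deviation on the control sequence, and the best ones are kept. This is a minimal illustration under stated assumptions (ordinary least squares via the normal equations, synthetic data, hypothetical function names), not the ALGOL program referred to in (8).

```python
import random
from itertools import combinations

def features(u, v):
    """Terms of a0 + a1*u + a2*v + a3*u*v + a4*u^2 + a5*v^2."""
    return [1.0, u, v, u * v, u * u, v * v]

def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_partial(u, v, y):
    """Least-squares coefficients of one partial description (normal equations)."""
    rows = [features(a, b) for a, b in zip(u, v)]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(ata, aty)

def predict(coef, u, v):
    return [sum(c * f for c, f in zip(coef, features(a, b))) for a, b in zip(u, v)]

def mean_square_deviation(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def selection_step(learn_x, learn_y, ctrl_x, ctrl_y, keep):
    """Fit all pairs on the learning data, rank by control error, keep the best."""
    scored = []
    for i, j in combinations(range(len(learn_x)), 2):
        coef = fit_partial(learn_x[i], learn_x[j], learn_y)
        err = mean_square_deviation(ctrl_y, predict(coef, ctrl_x[i], ctrl_x[j]))
        scored.append((err, (i, j), coef))
    scored.sort(key=lambda item: item[0])
    return scored[:keep]

# Fourteen arguments give 91 distinct pairs, as stated in the text.
n_pairs = len(list(combinations(range(14), 2)))

# Synthetic check (not the reservoir data): the target depends on columns 0 and 2.
rng = random.Random(0)
learn_x = [[rng.uniform(-1.0, 1.0) for _ in range(9)] for _ in range(4)]
ctrl_x = [[rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(4)]

def target(u, v):
    return 0.1 + 0.5 * u - 0.3 * v + 0.8 * u * v + 0.2 * u * u - 0.4 * v * v

learn_y = [target(u, v) for u, v in zip(learn_x[0], learn_x[2])]
ctrl_y = [target(u, v) for u, v in zip(ctrl_x[0], ctrl_x[2])]
best = selection_step(learn_x, learn_y, ctrl_x, ctrl_y, keep=3)
```

In the paper's setting `keep` would be 14 and the outputs of the retained equations would become the arguments of the next step.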

Second step and the subsequent steps of selection. For the sake of brevity of exposition we shall not quote the results of the second or of the subsequent steps. The measure of regularity (mean square deviation for the control sequence) varied as is shown in Fig. 1. In each step the 14 most regular regression equations were selected.

Presented schematically, the selection proceeded as follows:

14-91-14-91-14-91-14-91-14-91 etc.

Using the "rule of the left angle" (see above) we discontinue the selection at the fifth step. As a result of this we obtain a system of regression equations (Table 4).
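The bookkeeping of the 14-91-14 scheme and one possible stopping test can be sketched as below. The "rule of the left angle" is defined earlier in the paper and is not reproduced here; the criterion in `stopping_step` (stop at the first step after which the control error no longer falls by more than a relative tolerance) is only an assumed stand-in, and the error values are invented for illustration, not read from Fig. 1.

```python
from math import comb

def candidates_per_step(kept):
    """Number of pairwise partial descriptions built from `kept` variables."""
    return comb(kept, 2)

def stopping_step(control_errors, rel_tol=0.05):
    """Return the 1-based step at which selection stops.

    Assumed criterion: stop at the first step whose successor reduces the
    control-sequence error by no more than rel_tol (relative to that step).
    """
    for k in range(len(control_errors) - 1):
        current, following = control_errors[k], control_errors[k + 1]
        if current - following <= rel_tol * current:
            return k + 1
    return len(control_errors)

# Invented error curve: it flattens after the fifth step.
errors = [0.30, 0.18, 0.10, 0.06, 0.03, 0.029, 0.031]
stop = stopping_step(errors)
```

Because 14 variables are kept at every step, each step again produces `candidates_per_step(14)` candidate equations.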

Estimation of accuracy and forecast for 1968. The accuracy of the model is characterized by two indices (determined from the control sequence).

Table 4

List of partial descriptions (regression equations)

[The coefficient values of Table 4 are not reliably legible in the scan. Each entry is a quadratic partial description in two arguments, of the form y = a0 + a1 u + a2 v + a3 uv + a4 u² + a5 v², where u and v are either original arguments x or intermediate variables obtained in a preceding selection step, and the coefficients a0 to a5 are tabulated for each equation.]

The mean square deviation in the control sequence [formula not legible in the scan; the surviving figures are 0.016826 and 100] amounts to 2.8%.

For 1968, the arguments have the following values:

x1 = 0; x2 = 0; x3 = 0.55353; x4 = -0.45013; x5 = 0; x6 = -0.464; x7 = 0.41681; x8 = 0.333; x9 = 0.48717; x10 = -0.232; x11 = 0; x12 = -0.2; x13 = 0; x14 = -1.784.

By using the model obtained, we find the forecast φ = -0.058. However, the correct value is φ = -0.042, and the absolute deviation of the forecast amounts to Δφ = -0.016.

Method of optimization of the division of a specified number of interpolation nodes into the learning sequence and control sequence.

A total of 14 interpolation nodes was specified in the original data. The nodes were divided into the learning and control sequences in a proportion of 9 : 5.

It may be seen from Fig. 3 that this choice is optimal with respect to the number of selection steps. The simplest and, therefore, the most reliable model is obtained at the subdivision of the nodes adopted here. A similar curve was also obtained for a number of other problems (e.g., for the model of England's economy).

As was noted earlier, the role of the index of regularity is played by the value inverse to the mean square deviation in the control sequence (or by the correlation coefficient). The value inverse to the number of selection steps may be used as the index of reliability. The problem consists in achieving the maximum regularity at the maximum reliability, i.e., in our example, in selecting the point 2 (Fig. 3). By changing the value of the threshold (the number of variables in each step which are allowed to pass to the next step), we may attempt to increase the regularity and reliability still further.
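The two indices named here can be combined in a simple comparison of candidate divisions. The sketch below assumes one particular way of combining them (fewest selection steps first, smaller control error as tie-breaker); the paper does not prescribe this ordering, `choose_split` is a hypothetical name, and the numbers are invented for illustration rather than taken from Fig. 3.

```python
def reliability(steps):
    """Index of reliability: inverse of the number of selection steps."""
    return 1.0 / steps

def regularity(control_msd):
    """Index of regularity: inverse of the control mean square deviation."""
    return 1.0 / control_msd

def choose_split(results):
    """results maps (n_learning, n_control) to (steps, control_msd).

    Assumed ordering: prefer fewer steps (higher reliability), then the
    smaller control error (higher regularity).
    """
    return min(results, key=lambda split: results[split])

# Invented outcomes for three ways of dividing 14 nodes.
results = {
    (7, 7): (8, 0.05),
    (9, 5): (5, 0.03),
    (11, 3): (7, 0.02),
}
chosen = choose_split(results)
```

A different weighting of the two indices would be equally legitimate; the point is only that each candidate division is run through the full selection and then compared on both indices.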

Fig. 3. Determination of the optimal division of the data into the learning and the control sequences. s - number of selection steps; [symbol illegible] - mean square deviation; 1 - number of selection steps; 2 - our choice.

Conclusions

The heuristic self-organization approach and the method of data handling by group, which is based on it, create the possibility of constructing mathematical models of ecological systems, which may be used not only for the qualitative, but also for the quantitative estimation of any variable that is of interest to us. This opens new prospects for the optimal control of the regime in reservoirs.

In addition to the model of the abundance of bacteria, a mathematical model was constructed for forecasting permanganate oxidizability. The principle of self-organization allowed us to obtain an equation (optimal with respect to both complexity and accuracy) forecasting the oxidizability at the three central stations of the Rybinsk reservoir (on the Volga) a year in advance.

The general mathematical model of a reservoir appears to us in the form of a collection of such forecasting and control equations for all main indices of the given reservoir, which would be brought up to date every time new data appear.

The authors express their appreciation to Prof. N.M. Kamshilov, who introduced them to the problem of the Rybinsk reservoir. The present work would have been impossible otherwise.

Bibliography

1. G.G. Vinberg and C.A. Anisimov. Matematicheskaya model' vodnoi ekologicheskoi sistemy, Sbornik "Fotosintez sistemy vysokoi produktivnosti", izdatel'stvo "Nauka", Moskva, 1966.
(Mathematical model of an aquatic ecological system, Collection "Photosynthesis of a system of high efficiency", Publishers "Science", Moscow, 1966).

2. V.V. Menshutkin and A.A. Umnov. Matematicheskaya model' prosteishei vodnoi ekologicheskoi sistemy. "Gidrobiologicheskii zhurnal", tom VI, No. 2, 1970.
(Mathematical model of the simplest aquatic ecological system. "Journal of Hydrobiology", vol. 6, No. 2, 1970).

3. O.G. Ivakhnenko, Metod hrupovoho vrakhuvannya arhumentiv - konkurent metodu stokhastychnoi aproksymatsii, "Avtomatika", No. 3, 1968.
(Method of data handling by group - a competitor of the method of stochastic approximation, "Automatic control", No. 3, 1968).

4. O.G. Ivakhnenko, Yu. V. Koppa and Vu Suan Min'. Polinomial'na i lohichna teoriya skladnykh system, ch. I i II, "Avtomatika", No. 3 i 4, 1970.
(Polynomial and logical theory of complex systems, parts I and II, "Automatic control", No. 3 and 4, 1970).

5. O.G. Ivakhnenko and Yu. V. Koppa. Rehulyaryzatsiya rozv'yazuyuchykh funktsii u metodi hrupovoho vrakhuvannya arhumentiv, "Avtomatika", No. 2, 1970.
(Regularization of the problem-solving functions in the method of data handling by group, "Automatic control", No. 2, 1970).

6. A.G. Ivakhnenko, Samoobuchivayushchiesya sistemy raspoznavaniya i avtomaticheskogo upravleniya, izdatel'stvo "Tekhnika", 1969.
(Self-learning systems of recognition and automatic control, Publishers "Technics", 1969).

7. A.G. Ivakhnenko, Heuristic self-organization in problems of engineering cybernetics, "Automatica", vol. 6, Pergamon Press, 1970.

8. Yu. V. Koppa, Prohrama na movi translyatora ALGOL-BESM 6 dlya rozv'yazannya interpolyatsiinykh zadach za alhorytmom MGVA z polinomamy pershoho abo druhoho stepenya, "Avtomatika", No. 1, 1971.
(Program in the language of the ALGOL-BESM 6 translator for the solution of interpolation problems by means of the MDHG algorithm with polynomials of the first or second degree, "Automatic control", No. 1, 1971).

9. A.G. Ivakhnenko, Yu. V. Koppa and V.I. Braverman, Opredelenie statisticheskikh kharakteristik kolonny sinteza metilkhlorsilanov po algoritmu MGUA s kvadratichnymi polinomami. Sb. "Tekhnicheskaya kibernetika", vypusk 22, IK AN USSR, 1971.
(The determination of statical characteristics of the column of methyl-chlor-silane synthesis by means of the MDHG algorithm with second-order polynomials. Collection "Technical cybernetics", Issue 22, Institute of Cybernetics of the Academy of Sciences of the Ukrainian Soviet Socialist Republic, 1971).

10. A.G. Ivakhnenko, V.D. Dimitrov and G.D. Strukova, Mnogoryadnaya veroyatnostnaya model' dlya upravleniya proizvodstvom metilkhlorsilanov, tam zhe.
(Multiple-stage probabilistic model for controlling the production of methyl-chlor-silanes, ibid).

Manuscript received: March 24, 1971.