FISHERIES RESEARCH BOARD OF CANADA Translation Series No. 2348
A method of mathematical modelling of complex ecological systems
by A. G. Ivakhnenko, Yu. V. Koppa, N. N. Todua, and G. Petrake
Original title: Metod matematychnoho modelyuvannya skladnykh ekologichnykh system
From: Avtomatyka, Instytut kibernetyky AN URSR (Automatic control, Institute of Cybernetics of the Academy of Sciences of the Ukrainian Soviet Socialist Republic), (4) : 20-34, 1971
Translated by the Translation Bureau (JS/TTH) Foreign Languages Division Department of the Secretary of State of Canada
Department of the Environment Fisheries Research Board of Canada Marine Ecology Laboratory Dartmouth, N. S. 1973
33 pages typescript
DEPARTMENT OF THE SECRETARY OF STATE, TRANSLATION BUREAU, MULTILINGUAL SERVICES DIVISION
SECRÉTARIAT D'ÉTAT, BUREAU DES TRADUCTIONS, DIVISION DES SERVICES MULTILINGUES
Translated from: Ukrainian
Into: English
Author: A. G. Ivakhnenko, Yu. V. Koppa, N. N. Todua and H. Petrake
Title in English: A method of mathematical modelling of complex ecological systems.
Title in foreign language (transliterated): Metod matematychnoho modelyuvannya skladnykh ekologichnykh system.
Reference in foreign language (transliterated): Avtomatyka, Instytut kibernetyky AN URSR, No. 4, 1971.
Reference in English: Automatic control, Institute of Cybernetics of the Academy of Sciences of the Ukrainian Soviet Socialist Republic, No. 4, 1971.
Publisher: Institute of Cybernetics of the Acad. of Sci. of the Ukrainian SSR
Place of publication: Kyiv, UKR.SSR
Date of publication: 1971, issue No. 4
Page numbers in original: 20-34
Number of typed pages: 33
Requesting department: Environment, Fisheries Res. Board
Branch or division: Marine Ecology Lab., Bedford Inst. of Oceanography
Person requesting: Dr. K. H. Mann
Translation Bureau No.: 183074
Translator (initials): J.S.
Date of request: July 25, 1972
UNEDITED TRANSLATION. For information only.
Client's No.: F.R.B. Department: Environment. Division/Branch: Marine Ecology Lab. City: Dartmouth, N.S.
Bureau No.: 183074. Language: Ukrainian. Translator (initials): J.S. Stamped: DEC 14 1972.
Automatic control.
Institute of Cybernetics of the Academy of Sciences of the Ukrainian
Soviet Socialist Republic.
A method of mathematical modelling of complex ecological systems.
By: A. G. Ivakhnenko, Yu. V. Koppa, N. N. Todua and G. Petrake
(Kyiv.)
Summary
The method of data handling by group (MDHG) is applied to synthesize an analog predicting the quantity of bacteria in the Rybinsk reservoir with an extrapolation time of one year. The method is based on the principle of self-organization, under which it is enough to observe only a small part of the components of the characteristic vector, so that a complex simulation problem turns into a comparatively simple one.
Statement of the problem of modelling aquatic
ecological systems.
In the coming years, automatic computerized control centres will be created. These centres will be connected by means of telemetry systems to the transducers, which will act upon the active elements controlling the ecological conditions in reservoirs. A reservoir will thus become an object of automatic control, and, because of this, mathematical modelling of ecological systems in the reservoirs will become more and more necessary.
Below, an attempt is made to adopt a new approach to the simulation of aquatic ecological systems by introducing heuristic self-organization in which, among others, nonlinear high-degree finite-difference equations ("polynomial descriptions") are used instead of differential equations. This method is more adequate for the problems involved in the simulation of complex systems, and may provide not only qualitative, but also quantitative estimates of the variables.
The models available so far are useful only for a qualitative analysis of various processes, a fact admitted even by the authors of these models. For example, in (2), where one of the best deterministic models is described, we read the following: "The results of the analysis of an aquatic ecosystem provided by this model can only be treated as purely qualitative; in order to obtain well-founded quantitative data, a considerable amount of further work is required".
The authors of the present paper claim, however, to have
developed a mathematical model, which provides not only qualitative,
but also quantitative estimates.
Accuracy in the simulation of complex systems requires
an increase in the complexity of mathematical descriptions.
There exists a certain discrepancy between the complexity of the objects of mathematical modelling and the simplicity of the means employed for this purpose. Until recently, modelling was done either by means of deterministic methods (based on the study of simple differential equations, e.g. of the linear equations of convective diffusion), or by statistical methods of simple regression analysis (which do not reach beyond the scope of linear or quadratic regression). A simple substitution of finite differences for derivatives suffices to show that the complexity of the mathematical description is extremely low in all cases and that, in principle, it cannot ensure accuracy in modelling complex systems.
Example. The equation of convective diffusion in the differential form is
$$\frac{\partial S}{\partial t} + v_i \frac{\partial S}{\partial x_i} = \frac{\partial}{\partial x_i}\left(K_{ij}\frac{\partial S}{\partial x_j}\right) - KS,$$
where S stands for the concentration of matter; t for time; $v_i$ for stream velocity; $x_i$ for coordinates; $K_{ij}$ for the coefficient of turbulent diffusion; K for the coefficient of non-conservation.
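Let us approximate the derivatives by finite differences
$$\frac{\partial S}{\partial t} \approx \frac{\Delta S}{\Delta t} = \frac{S_i - S_{i-1}}{\Delta t}, \qquad \frac{\partial S}{\partial x_i} \approx \frac{\Delta S}{\Delta x_i}.$$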
By substituting the expressions obtained into the original equation, we obtain an algebraic equation that is linear in the values of S
[equation only partly legible in the source: "... S_i v_i + a_0"]
It may thus be seen that the above differential equation corresponds, from the point of view of its complexity, to the linear regression equation.
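The point of this example can be checked numerically. The following is a minimal sketch, assuming a one-dimensional equation with constant coefficients and an explicit difference scheme; the names `v`, `K_diff`, `K_loss`, `dt`, `dx` and all numerical values are hypothetical and do not come from the paper. It shows that one finite-difference step is a fixed linear combination of neighbouring values of S, which an ordinary linear regression recovers.

```python
import numpy as np

# Hypothetical constants for dS/dt + v dS/dx = K_diff d2S/dx2 - K_loss S
v, K_diff, K_loss = 0.5, 0.1, 0.05
dt, dx = 0.01, 0.1

def step(S):
    """One explicit finite-difference step for the interior points of S."""
    adv = -v * (S[1:-1] - S[:-2]) / dx                       # backward difference
    dif = K_diff * (S[2:] - 2.0 * S[1:-1] + S[:-2]) / dx**2  # central second difference
    return S[1:-1] + dt * (adv + dif - K_loss * S[1:-1])

# Coefficients implied by the scheme:
# S_new[i] = c_left * S[i-1] + c_mid * S[i] + c_right * S[i+1]
c_left = dt * (v / dx + K_diff / dx**2)
c_mid = 1.0 - dt * (v / dx + 2.0 * K_diff / dx**2 + K_loss)
c_right = dt * K_diff / dx**2

rng = np.random.default_rng(0)
S = rng.random(50)          # arbitrary concentration profile
S_new = step(S)

# Linear regression of the new values on the three neighbouring old values
X = np.column_stack([S[:-2], S[1:-1], S[2:]])
coef, *_ = np.linalg.lstsq(X, S_new, rcond=None)
```

Under these assumptions `coef` coincides with `(c_left, c_mid, c_right)`: the discretized equation carries no more structure than a three-term linear regression.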
The fact that the striking discrepancy between the complexity of the mathematical apparatus on the one hand, and the complexity of the object on the other, is not even noticed (the inaccuracy is often explained by saying that some other factors should also be considered) is due to the deterministic way of thinking of the researchers, which has become deeply rooted and represents the main shortcoming of contemporary mathematical modelling. This deficiency is eliminated by means of the MDHG.
Method of data handling by group (MDHG).
The method of data handling by group (MDHG) is similar to the methods of mass selection of plants or animals (7). A certain number of input data (which are called factors or arguments) is used for the construction of all their possible combinations by pairs.
Each pair of arguments provides a "partial description", the coefficients of which are determined by the solution of a small system of normal equations (on the basis of the minimum mean square deviation). In the above procedure use is made of a certain experimental sample of data referred to as the learning sequence. The complete description is obtained by excluding intermediate variables from the set of partial descriptions.
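The construction of partial descriptions for all pairs of arguments can be sketched as follows. This is an illustration only: it assumes linear partial descriptions of the form a0 + a1*xi + a2*xj, and the function names and the synthetic learning sequence are hypothetical.

```python
from itertools import combinations
import numpy as np

def fit_partial(xi, xj, y):
    """Least-squares fit of y ~ a0 + a1*xi + a2*xj (a linear partial description)."""
    A = np.column_stack([np.ones_like(xi), xi, xj])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def partial_descriptions(X, y):
    """Fit one partial description for every pair of columns (arguments) of X."""
    return {(i, j): fit_partial(X[:, i], X[:, j], y)
            for i, j in combinations(range(X.shape[1]), 2)}

# Hypothetical learning sequence: 9 observations of 4 arguments
rng = np.random.default_rng(1)
X_learn = rng.normal(size=(9, 4))
y_learn = 2.0 + 3.0 * X_learn[:, 0] - 1.0 * X_learn[:, 2]

models = partial_descriptions(X_learn, y_learn)
```

With four arguments there are six pairs; in this synthetic case the pair (0, 2) reproduces the generating coefficients, while the other pairs give only approximate descriptions.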
[Fig. 1: plot of the deviation on the control sequence against the number of selection steps; the choice according to "the rule of the left angle" is marked on the curve.]
Fig. 1. Use of the index of regularity (deviations in the control sequence) in increasing the number of selection steps (the problem of forecasting the number of bacteria, with 9 nodes in the learning sequence and 5 nodes in the control sequence).
The essential difference between the MDHG and other mathematical methods (recurrent methods, decomposition methods, and others) is that both initial sequences of experimental data (the learning and the control sequence) increase in each consecutive selection step: the best results of a preceding step are used as the initial data in the next selection step, and so on.
The structure of the MDHG algorithm is multiple-stage: the intermediate variables obtained in the first step are used to form combinations by pairs in the next step of selection, etc. At the end of each step a threshold self-selection takes place, similar to that observed in the mass selection of plants or animals: only a certain percentage of the most regular intermediate variables is admitted into the next step.
The regularity of the variables is defined either by their correlation coefficient, or by the value inverse to the mean square deviation, determined from a separate control sequence.
The rule for stopping the increase in the number of selection steps is as follows: as soon as the regularity criterion rises to the permissible level or begins to decrease systematically, the increase in the complexity of the full description discontinues ("the rule of the left angle").
We may choose the first or the second local minimum of deviation (Fig. 1).
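The stopping rule can be sketched in code. The sketch below is a simplified reading of the rule and not the authors' procedure: it works with the deviation on the control sequence (the inverse of the regularity), stops at the first step whose deviation reaches a permissible level if one is given, and otherwise at the first local minimum. The function name and the deviation values are hypothetical.

```python
def choose_stopping_step(deviations, permissible=None):
    """Return the 1-based selection step at which to stop.

    deviations[k] is the control-sequence deviation after step k + 1.
    Stop at the first step whose deviation is at or below `permissible`
    (if given), otherwise at the first local minimum of the deviation.
    """
    for k, d in enumerate(deviations):
        if permissible is not None and d <= permissible:
            return k + 1
        is_last = k == len(deviations) - 1
        if is_last or deviations[k + 1] > d:
            return k + 1
    raise ValueError("deviations must not be empty")

# Hypothetical deviations for eight selection steps
devs = [0.110, 0.070, 0.045, 0.050, 0.040, 0.042, 0.048, 0.060]
first_min = choose_stopping_step(devs)                    # first local minimum
early = choose_stopping_step(devs, permissible=0.08)      # permissible level reached
```

Choosing the second local minimum instead, as the text allows, would only require continuing the scan past the first one.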
Over 20 different modifications of the MDHG algorithm have been proposed so far. They differ in the type of the key functions that are used for the construction of "complete" and "partial" descriptions and which are similar to one another in each individual algorithm.
In the main MDHG algorithm, quadratic polynomials are used as key functions (3). The degree of the complete description then increases in each successive selection step: a quadratic regression is realized in the first step, a regression of the fourth degree in the second step, a regression of the eighth degree in the third step, etc.
The examples of computation available so far show that a high degree of accuracy in modelling is attained at a high degree of the complete description. For example, in building a mathematical model of the balance of payments in England for 1969, it was discovered that an error of 0.168% can be attained at the 64th degree of the complete regression equation (4). Such a high degree of accuracy of the quantitative estimates in modelling had been previously unattainable.
Optimization of the complexity of a model by means of the MDHG.
The method of data handling by group (MDHG), discussed below, represents an attempt at developing a new technique of mathematical modelling proceeding from the principles of heuristic self-organization, a technique which would make it possible to increase the complexity of a mathematical representation (model) gradually, as long as this leads to an increase in the accuracy of the modelling. In other words, the MDHG solves the problem of attaining the optimum complexity of a model. The gradual increase in the complexity is controlled according to the value of the mean square deviation, which is determined from a separate control sequence of data. Usually, with an increase in the complexity of the model the deviation decreases drastically, producing two or three minima, followed by a slow, gradual increase. The first (or global) deviation minimum determines a single model of the optimum complexity.
Plurality of mathematical models and singleness of
the most regular (optimal) model.
Until now emphasis was placed on the so-called plurality of regression equations. This consists in the fact that, with a change in the complexity of the regression equation (e.g., with a change in the number of its terms or in the degree), the numerical values of the coefficients at a given variable also change. Thus, coefficients of a regression equation cannot be regarded as coefficients of value indicating the function of a given variable.
For example, V. V. Nalimov and N. O. Chernovaya state in their well-known work "Stochastic methods of planning experiments"* that "there is no point in attaching a value to individual coefficients of regression". Taking into account that the principles of regularization make it possible to determine the one single optimal polynomial of regression, the above statement appears to be erroneous. Coefficients of the one single regular polynomial provide one single value for each variable.
* Stokhasticheskie metody planirovaniya eksperimenta.
Addition to the regression analysis of various procedures aimed at regularizing the solutions (according to M. O. Tikhonov, V. K. Ivanov et al.) makes it possible to find the one single optimum equation of regression, the coefficients of which may be regarded as the value coefficients of the corresponding variables.
Irregularity or incorrectness manifests itself in the fact that the regression equation obtained on the basis of a given set of interpolation nodes differs from the regression equation obtained by using another set of nodes of the same process. The purpose of the regularization is to decrease the deviation obtained for the new points.
Therefore, particular attention is paid in the MDHG to making the regression equations regular, and the problem of determining the accuracy of description is treated in a different manner. If the interpolation nodes which were used to estimate the coefficients are also utilized to verify the accuracy, then the more complex the model is, the higher its accuracy will appear to be. It is easy to prove that in a situation where the number of terms in a new model is greater than the number of interpolation nodes (n > N), the solution is incorrect: small changes in the data induce great changes in the values of the coefficients. For the purpose of regularization, all the data are divided into two approximately equal groups: a learning sequence and a control sequence. The former sequence is used only for the estimation of the coefficients, the latter only for the estimation of accuracy. This technique makes it possible to determine the optimum complexity of the mathematical model, because, while the complexity (degree of polynomial) of the complete model increases, the accuracy first increases, then (having attained the maximum) begins to decrease. The accuracy maximum enables us to determine the one single mathematical model of a complex object which is both regular and optimal from the viewpoint of complexity. A magnitude inverse to the mean square deviation of the control sequence, or the correlation coefficient, may serve as a criterion of regularity of the regression equation.
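The division into a learning and a control sequence can be illustrated by a small sketch. It is not the MDHG itself: for brevity the complexity is varied simply as the degree of an ordinary polynomial in one variable, and the data, the noise level and the function name `control_deviation` are hypothetical.

```python
import numpy as np

def control_deviation(t_learn, y_learn, t_ctrl, y_ctrl, degree):
    """Fit a polynomial on the learning sequence and return the
    root-mean-square deviation on the control sequence."""
    coef = np.polynomial.polynomial.polyfit(t_learn, y_learn, degree)
    pred = np.polynomial.polynomial.polyval(t_ctrl, coef)
    return float(np.sqrt(np.mean((pred - y_ctrl) ** 2)))

# Hypothetical noisy observations of a quadratic process
rng = np.random.default_rng(0)
t = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * t - 3.0 * t**2 + rng.normal(scale=0.2, size=t.size)

# Two approximately equal groups: even-indexed nodes learn, odd-indexed nodes control
t_learn, y_learn = t[0::2], y[0::2]
t_ctrl, y_ctrl = t[1::2], y[1::2]

# Deviation on the control sequence for increasing complexity
deviations = {d: control_deviation(t_learn, y_learn, t_ctrl, y_ctrl, d)
              for d in range(1, 8)}
best_degree = min(deviations, key=deviations.get)
```

Because the coefficients never see the control nodes, the control deviation stops improving once the added terms only fit noise, which is what makes a single optimum of complexity identifiable.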
If we had not adopted the principle of regularization, we would have arrived at an erroneous result: with the increase in complexity of the model, the accuracy would increase continuously and, eventually, if the number of possible states of the model should become equal to the number of experimental points (interpolation nodes), the deviation would seem to disappear. It is impossible to find the optimum complexity of a model without regularization.
When the principles of regularization are applied, the plurality of regression equations does not contradict the singleness of the optimal equation (optimal model) for a given set of variables.
"Verification of hypotheses".
The one single regular optimal regression equation has an interesting feature: its coefficients reflect the value of the arguments before which they stand.
This makes it possible to verify hypotheses. A hypothesis may represent the assumption that an equation should include a certain term, usually non-linear and complex, corresponding to a given phenomenon.
Should it be determined as a result of the regularization that the coefficient standing before the term in question is close to zero, it would mean that the physical mechanism of the process is such that this term does not exist in the given law. Only the terms with coefficients that are not equal to zero remain in the equation. Mathematical verification of physical, biological and other hypotheses thus becomes possible.
Example. Let us consider the function of time
$$\varphi(t) = 1 + t + t^2 + t^3 + t^4 + t^5.$$
Let us assume that the function is unknown, and that we are given only seven points of this function (Table 1). The problem consists in determining it by means of the MDHG.
We shall apply the MDHG algorithm with linear polynomials (8).
Let us assume that we choose m = 8; as in our first example, we would obtain the following complete regression equation:
$$\varphi(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \dots + a_8 t^8.$$
If by using the MDHG we obtain $a_6 = a_7 = a_8 = 0$, it will confirm that the method examined provides a "verification of hypotheses" in the above sense.
As a criterion of regularity we shall choose the coefficient of correlation between a given intermediate variable and the primary variable $\varphi$, calculated from the data of the control sequence: the higher the coefficient, the greater the regularity.
Table 1
t        -3     -2     -1      0      1      2      3
φ(t)   -182    -21      0      1      6     63    364
First step of selection. There are eight variables on the input of the MDHG algorithm:
$$x_1 = t,\quad x_2 = t^2,\quad x_3 = t^3,\quad \dots,\quad x_8 = t^8.$$
These variables provide 28 partial equations of the first step. The following five equations (with the corresponding correlation coefficients 0.9992866; 0.9958979; 0.9909868; 0.9884087 and 0.9689235) turned out to be the most regular ones:
$$\begin{aligned}
y_1\,(4\text{-}5) &= 1.9 + 1.1\,x_4 + 1.1235\,x_5,\\
y_2\,(2\text{-}3) &= -8 + 11\,x_2 + 10.101\,x_3,\\
y_3\,(7\text{-}8) &= 2.987 + 0.124729\,x_7 + 0.0134146\,x_8,\\
y_4\,(6\text{-}7) &= 2.87913 + 0.120879\,x_6 + 0.124829\,x_7,\\
y_5\,(1\text{-}2) &= -8 + 82.2\,x_1 + 11\,x_2.
\end{aligned}$$
Second step of selection. The five intermediate variables $y_i$ provide ten partial equations of the second step. The most regular ones among them are the following four equations (the corresponding correlation coefficients are 0.9999233; 0.9998364; 0.9997113; 0.9982104):
$$\begin{aligned}
z_1\,(45\text{-}12) &= -0.09817363 + 0.023\,y_{12} + 0.979\,y_{45},\\
z_2\,(23\text{-}45) &= -0.00832 + 0.208577\,y_{23} + 0.791504\,y_{45},\\
z_3\,(67\text{-}23) &= -0.0123 + 0.712148\,y_{67} + 0.2881134\,y_{23},\\
z_4\,(78\text{-}23) &= -0.0122 + 0.712466\,y_{78} + 0.287745\,y_{23}.
\end{aligned}$$
Third step of selection. The four intermediate variables $z_i$ provide six partial equations of the third step. The most regular one among them is the last equation (correlation coefficient 0.9999730)
ul (12-23-45) = —0,00)J 0,5z1 -1- 5z,
Having excluded the intermediate variables, we obtain a so-called "analogue" of the initial complete regression equation
0,702663 -- 0,9455S77x, 1.273712x., — 0,973723.v., 0,99.16671A%.
In fact we obtain $a_6 = a_7 = a_8 = 0$, which is what we wished to prove. If the specified points were valid for several polynomials rather than for one polynomial, we would obtain at the end of the selection all these polynomials with a zero deviation.
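The three selection steps of this example can be sketched as follows. This is a hedged illustration, not a reproduction of the computation in (8): the assignment of the seven nodes to the learning and control sequences is an assumption (the paper's split is not legible here), the function names are hypothetical, and the numbers produced need not coincide with the coefficients printed above. The numbers of variables kept after each step (5, 4 and 1) follow the example.

```python
from itertools import combinations
import numpy as np

def fit_linear_pair(u, v, y):
    """Least-squares coefficients of y ~ a0 + a1*u + a2*v."""
    A = np.column_stack([np.ones_like(u), u, v])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def selection_step(Z, y, learn, ctrl, keep):
    """Fit all pairwise partial descriptions on the learning nodes and keep the
    `keep` candidates best correlated with y on the control nodes."""
    candidates = []
    for i, j in combinations(range(Z.shape[1]), 2):
        a = fit_linear_pair(Z[learn, i], Z[learn, j], y[learn])
        out = a[0] + a[1] * Z[:, i] + a[2] * Z[:, j]
        r = np.corrcoef(out[ctrl], y[ctrl])[0, 1]   # regularity criterion
        candidates.append((r, out))
    candidates.sort(key=lambda c: c[0], reverse=True)
    best = candidates[:keep]
    return np.column_stack([out for _, out in best]), best[0][0]

t = np.arange(-3.0, 4.0)                            # seven nodes
phi = sum(t**k for k in range(6))                   # 1 + t + ... + t^5
X = np.column_stack([t**k for k in range(1, 9)])    # x_k = t^k, k = 1..8

learn = np.array([0, 2, 4, 6])   # assumed split: four learning nodes
ctrl = np.array([1, 3, 5])       # assumed split: three control nodes

Z, history = X, []
for keep in (5, 4, 1):           # variables admitted after steps 1, 2 and 3
    Z, r_best = selection_step(Z, phi, learn, ctrl, keep)
    history.append(r_best)
```

Here `history` holds the best control-sequence correlation of each step and the single column of `Z` is the final intermediate variable. The sketch stops short of eliminating the intermediate variables, which is the step that yields the explicit polynomial in the arguments.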
Relation between regularization and the Gedel* theorem.
The problem of separating the information concerning a process into the basic and supplementary information is related to the "principle of external complement" of Stafford Beer, as well as to the Gedel theorem.
In the above problem four nodes have been assigned to the basic information, and three nodes were classified as supplementary information**. Since the number of terms of the regression equation (ten) exceeds the number of basic nodes (four), it is possible to find an infinite multitude of polynomials which will satisfy these nodes. The supplementary information (three nodes) is used only for the purpose of choosing the one single regular polynomial out of this infinite multitude of polynomials.
If we used a different approach to this problem, we could include all the eight nodes among the basic information. In that case some other supplementary (supporting) information would be required to enable us to choose the one single regular polynomial. Apparently "additional complement" is necessary in principle for solving the problem of selection of the one single polynomial.
* Transliteration. Might stand for Gödel. (Translator.)
** Several nodes are required in the control sequence only in the presence of noise. One supplementary point suffices for deterministic functions without disturbances (see the algebraic minimum of interpolation nodes presented below).
Peculiarity of the self-organization approach and indirect
determination of essential variables in complex systems.
By exaggerating slightly, we can formulate the self-organization approach in the following manner: "I know that I do not know anything about a complex system; let us leave it to the computer to deal with the whole problem on the basis of experimental data only".
The initial parameters, qualitative indices of the situation, must, however, be provided by man. In problems concerning the optimal
control we must also indicate the controlling factors. Depending on
the choice of the initial values and controlling factors, we can
obtain a vast variety of optimal mathematical models of the complex
system.
Let us assume that the characteristic vector of a given complex system includes M variables which may be interconnected by m relationships (equations). In order to obtain a complete mathematical description of the system it then suffices to use (M - m) variables. The other, intermediate variables can be eliminated. Knowing the composition and number of the variables, as well as the number of their interrelations, it is easy to calculate the minimum number of the variables which should be included in the model*.
* We may include more, but not fewer variables than the number indicated above.
For example, if the system is characterized by 21 variables related to one another by 16 equations, it suffices to study the relation between the initial magnitude and only 21 - 16 = 5 variables (example of the mathematical model of England's economy). It is important to note that these five variables may be any of the variables included in the original set. This simplifies considerably the problem of finding the necessary set of initial data, since we can use readily measurable values.
The peculiarity of the models obtained by the method of self-organization consists in the fact that it is possible to exclude seemingly important factors. For example, in the model of the biomass of aquatic plants in a reservoir, variables such as the solar radiation or the inflow of foreign matter etc. may be absent. This does not mean that the model is wrong. It only indicates that these factors are measured indirectly by means of other variables, since all the variables of a complex system are interrelated.
The indirect measuring of variables has been used for a long time in automatic control, where it bears the name of the "differential fork method". The task of simulation becomes considerably easier, since it does not matter greatly whether it is possible to measure a certain important variable. Experimental values of the other variables replace the variable in question according to the principle of the indirect determination of the variables. The peculiarity of the self-organization approach consists in the fact that in the general case one does not even know which variables replace one another: the diagram of the model, similar to that shown in Fig. 2, is not self-explanatory. We may thus see that a physical model of the kind shown in Fig. 2 is by no means better suited for obtaining quantitative results. For this purpose we need models that are completely different from the physical models both in the type of functions and in the arguments. A great multitude of equivalent, though diverse, representations is obtained, depending on which intermediate variables are excluded and which ones remain in the description. In this sense, the optimal models are multiple. But each set of variables has one single optimal model.
[Fig. 2: block diagram. Legible labels: O2 exchange; CO2 exchange; solar radiation; inflow of foreign organic matter; inflow of foreign inorganic matter; outflow and deposition of organic matter; outflow and deposition of inorganic matter; breeding of fish; catching of fish.]
Fig. 2. Structure of a model of an ecological system (constructed on the basis of the data contained in (2)). X - oxygen; W - carbonic acid; B_A - biomass of bacteria; B_A - saprophytic bacteria; D - organic matter (oxidizability); N - biogenic inorganic matter; B_1 - biomass of plankton; B_1 - primary productivity; B_2 - biomass of zooplankton; B_3 - planktophagous fish.
Combined method of simulation.
We refer to our method of simulation as the combined method and to the models themselves as "post-balance" models. The reason for these names lies in the fact that the method rests on the choice of the characteristic vector on the basis of the grouping of the equations of balances (the deterministic part), and that after that the data are handled by the group method (the self-organization). This combined method makes it possible to solve problems faster, since it eliminates or considerably reduces the accidental surplus of arguments by using the information criteria or the criterion of accuracy.
In spite of the fact that the deterministic approach and the self-organization approach are contradictory in principle, the most effective methods are, nevertheless, the combined methods, in which the composition of the characteristic vector describing a complex system is established by the common deterministic methods (composition of differential equations, equations of the balance of matter or of energy etc.), while the synthesis of the equations of the mathematical model and the determination of its parameters are realized by the methods of self-organization.
Optimization of the complexity of the mathematical model of an ecological system by means of the MDHG algorithm with quadratic
partial polynomials (using the example of the Rybinsk reservoir).
Initial information for the model of a complex system may be represented by a set of mean annual data from 10-15 years of observations.
These data suffice for the determination of the coefficients of the so-called "partial" descriptions, which relate the principal variable to any two arguments, by way of a simple regression analysis (i.e., based on the criterion of the minimum mean square deviation), e.g.,
$$y_{ih} = a_0 + a_1 x_i + a_2 x_h + a_3 x_i x_h + a_4 x_i^2 + a_5 x_h^2,$$
where i = 1, 2, ..., n-1; h = 2, 3, ..., n; the number of such descriptions is $C_n^2$; n is the number of arguments.
"Partial" descriptions of this type are composed for all the possible pairs of arguments which characterize the state of the ecological system. This constitutes the system of "partial" descriptions of the first step. The resultant variables of the first step are used as arguments of the second step, etc. The threshold elements allow only the most regular variables to pass from one step to the next (which corresponds to the law of mass selection). The important point is that a small number of interpolation nodes, remaining the same throughout the entire procedure, enables us to determine the values of the coefficients of all the partial models in any step by a simple regression analysis. The subsequent elimination of intermediate variables enables us to obtain "complete" descriptions of a complex system of any desired complexity, with a number of arguments of up to N = 100. The complexity of a model is determined by the number of the steps of partial descriptions used or, to put it in different words, by the degree of the complete polynomial.
The necessary minimum number of nodes of the learning sequence (with a matrix of 6 x 6 elements) is equal to seven. The higher the number of its nodes, the more reliable (under certain conditions) will be the results. The minimum number of nodes in the control sequence depends on the "degree of stochasticity" of the system: in the case of deterministic systems it is sufficient if one single node is used. The algebraic minimum of nodes is therefore 7 + 1 = 8 nodes. It has been suggested that this number be reduced to 5 nodes. The last figure represents the maximum possible learning speed of the model. The speed of the decision-making response of a "taught" model is determined by the maximum displacement of its arguments in time (usually comes to 3 - 4 years). This displacement (time lag) is proportional to the duration of the transient processes within the system.
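A quadratic partial description with six coefficients leads to normal equations with a 6 x 6 matrix, which the following sketch solves for one pair of arguments. The ordering of the six terms, the function names and the synthetic seven-node learning sequence are assumptions made for illustration; the algorithm itself is the one described in (3).

```python
from math import comb
import numpy as np

def quadratic_design(xi, xh):
    """Columns for a0 + a1*xi + a2*xh + a3*xi*xh + a4*xi**2 + a5*xh**2."""
    return np.column_stack([np.ones_like(xi), xi, xh, xi * xh, xi**2, xh**2])

def fit_quadratic_partial(xi, xh, y):
    """Solve the 6 x 6 normal equations for one quadratic partial description."""
    A = quadratic_design(xi, xh)
    return np.linalg.solve(A.T @ A, A.T @ y)

# Hypothetical learning sequence with seven nodes
rng = np.random.default_rng(2)
xi, xh = rng.normal(size=7), rng.normal(size=7)
true_coef = np.array([1.0, 0.5, -2.0, 0.3, 0.1, -0.4])
y = quadratic_design(xi, xh) @ true_coef

coef = fit_quadratic_partial(xi, xh, y)
n_pairs = comb(14, 2)   # number of partial descriptions for 14 arguments
```

For 14 arguments the first step therefore requires 91 such small systems, each of which stays 6 x 6 regardless of the number of arguments.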
Table 2
Variables                                                        Time lag τ    Correlation coefficient
φ   - mean annual abundance of bacteria                              -             1.0
x1  - abundance of blue-green algae                                  1             0.59
x2  - area of the summer feeding grounds                             4             0.77
x3  - P/B coefficient of zooplankton                                 1             0.633
x4  - abundance of zooplankton (May-October)                    (illegible)        0.630
x5  - biomass of the blue-green algae                                1             0.6508
x6  - biomass of the blue-green algae                                4             0.586
x7  - biomass of zooplankton (per year)                              1             0.576
x8  - permanganate oxidizability at the station POM                  2             0.568
x9  - abundance of zooplankton (per year)                            1             0.561
x10 - abundance of blue-green algae                                  4             0.539
x11 - biomass of diatomaceous algae                                  1             0.545
x12 - ion content                                                    1             0.516
x13 - saprophytic bacteria (at MPA)                                  2             0.879
x14 - accumulation of water in the trough of reservoir               4             0.557
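The time lags listed with the arguments in Table 2 determine which year's value of each argument enters the forecast. A minimal sketch of this bookkeeping follows; the yearly values are made-up placeholders rather than data from the paper, and `lagged_row` is a hypothetical helper. Only the lags of x1 and x2 are taken from Table 2.

```python
import numpy as np

def lagged_row(series_by_name, years, lags, target_year):
    """Collect, for one forecast year, each argument taken `lag` years earlier."""
    index = {year: k for k, year in enumerate(years)}
    return np.array([series_by_name[name][index[target_year - lag]]
                     for name, lag in lags.items()])

# Placeholder yearly values for two arguments (not data from the paper)
years = list(range(1960, 1968))
series = {
    "x1": np.arange(8, dtype=float),          # 0, 1, ..., 7
    "x2": 10.0 * np.arange(8, dtype=float),   # 0, 10, ..., 70
}
lags = {"x1": 1, "x2": 4}   # time lags of x1 and x2 from Table 2

row_1968 = lagged_row(series, years, lags, target_year=1968)
```

For the 1968 forecast this picks the 1967 value of x1 and the 1964 value of x2, so the forecast uses only observations that are already available.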
Table 3 (translation of captions)
(0) Source data (departure of the normalized deviations from the average)
(1) Variant I
(2) Learning sequence
(3) Control sequence
(4) Variant II
(5) Learning sequence (optimal)
(6) Control sequence
(7) Variant III
(8) Learning sequence
(9) Control sequence
(10) Correlation coefficient
(11) Years
(12) Dispersion D²
(13) Parameters
(14)Abundance of bacteria in the reservoir
(15) Abundance of blue-green algae
(16) Area of the summer feeding grounds
(17) P/B coefficient of zooplankton
(18) Abundance of zooplankton
(19) Biomass of blue-green algae
(20) Biomass of blue-green algae
(21) Biomass of zooplankton per year
(22) Permanganate oxidizability at the POM station
(23) Abundance of zooplankton per year
(24) Abundance of blue-green algae
(25) Biomass of diatomaceous algae
(26) NO3 content (average from the three stations)
(27) Resistance of the algae which grow in the MPA
(28) Accumulation of water in the trough of the reservoir
Table 3
Cs:
(c1 (:;;;,...-14: y itupw ,:;:orlx CiX.' I11' U , ccpcsmboro)
. ... . .. . (2) (3 ¶1• -•,o. • • (I ) r•sp.mir 1 ) I , - ii,,,,,o,, ,,,,...::,.,,..,e;:. „„„,,,,..,,,,„, 1 ;,„...„.„.1 1 ..M.---:;,(, ■1:11:‘ 71, (4;) (1.1.:mr 11 (J-} (6) ti. I — ------ _...—_____. — Franxv1'e ,.. rt.li,er -;c.d iii,.:,, (9) 1: 1_ ,, , ,..,..-ii:-. • .., -i• / ii.4.i.urr III 1 ( 6') I __ (7)--- - I . . 19f;t ■ lei 19:;') • 1,1*;7.) ito..),i 11•131 19 ,5 19•,2 1957 J P.:4 ( ii) Poil 1967 ol,r;oft -- I .:.ii.::%:., - 1,4Get ' 5.'1;4 5,3.17 4.4r;.-,.: (i2) "Incul..pciR I." 473.77. jp9.152 210,69 35.72:3 2r..7• 4.149 i.3b9
5 , $; 7 e 9 19 . 11 12 13 1; (i3) rbpa.miTot 1 2 3 4
a KTepin V 0,493 Oldt111Ce.lbl1iCTb er 0, I 83 : —0.9887 0,2042 0.0282 —0,4014 0.2254 0,1901 0,1338 - --0.12b 6 - --0.(Y, 04 0,1972 0,08-15 --0.5845 1 1:02.01:X0Bi1111 0.52-111Ce.11-.111CTb BO 1 0;70- X1',..X.t --0.092 -- 0.000 0 o 2,597 —0,451 •- 0,2:',' --- 0.6 ;5 --0,52 1,11 —0,38! —0,133 ' —0.081 0 70,595 1 1.-.outa niTysa 0 0 0 o o 1.4 5372 j 057 --0.7000 0 o o 0 —0.537 o 0.674 I 0- -,, X,:,1 7).11:5 KOC:PilliE117 30011.1311X3 0,0381 0 0.43944 .--0,45,;) 0 —0,03114 0,12155 —0,1484 0.02422 0 0,633 (// 1 - 0.0959 0.295 0,3737 0 T o;s a tr=1 (ioeitte.e.lbUiCib 300111a11R- --.0,3315 0,05511 --0.5795 —0,29919 0 0,4986 0,50683 --0,06102 0,00269 0 0,2073: --0,00808 0.37412 0 0,630
—0,315 —0,5-19 0 o 3,793 —0.339 : —0,464. -0,72 —0,639 1,057 --0,847 —0,036 —0,113 o —0.0508 (/5)510%1:Ica c,'s oz.; pocrerr 1 ,
1.057 --0,03-3 - 0,639 - - :.1,72 0 3,79 —0.847 —0.51933 —0,41593 0 --0,859 0 • —0,315 0 0,583 (e,/).)iomaLa CI3 sc,..LopocTeii - 4
2,013ioNnIc3 soorua KKTOKS 3a --0,17920 --0,07127 —0,78-101 --0,11447 0 —0,02807 0,53347 —0,13506 0,447 0 —0,15765 0.44708 0,46858 0 0,573 ( piK I e;•1.r.pma Kra :1 aT;:a --0.1279 0.050 -.0,1.S71 --0.1100 0 ' 0.0076 —0,1194 —0,0008 0.0609 0 —0,1104 0,0669 0,0754 0 0,503 ... B3i:iCTb ILO CT31.1il1i 130M 2 I (23)-lace.-.btaTb soormal:KTolin 0.',1',313.3 0,11652 —0.55101 —0.181 0 0.22579 0,45731 —0,01547 —0,01655 0 0,32316 (.1,0335F 0,52000 0 0,561 33 p:K 1 (24,thic,:.:ibi1iCTS C/3 so,s.opo- xio= 1,11 -- 0,13.3 --0,52 —0,645 0 2.597 —0.33! --u.569 --0,08! 0 —0,451 0 —0,092 0 0,532 cTeR 4 n= (2oy.cco .3.f3Tomos;ix so- 'e —0.348 — 0.676 0 o 1,7 : 5 --0.823 —0,313 —0,696 —0,62 -.1,4541 --0.503 1.431 —0,586 0 --0.5-15 sopocTe.fi I NO, c-pc,].;tiii 33 xi.2 .---1:31 — 0,05953 — 0,15 -- 0,08778 0,13684 --0,17894 0.1:O 2 0,20736 0.38947 —0,07:368 —0.04421 —0.03137 0.12 i 3- - r/.-1:: 0 = 1
(21,.1.iTiri so:to p ocicii, 3Ui pc-x13.=; 0 0,49 —0,302 0,144 1,56 0 —0.632 —0,355 —0,154 0 . --0,353 0 —0,193 0 —0,879 - cVTI. :1:1 , ,- 2,./ --19,5 14,416 5.1r7 0 0,838 - — 1,5..!4 1,62 1,98 0 1..059 0 !' 1' ,',, 9 —0,557 t 4 24
Example. The MDHG algorithm with quadratic "partial" regression equations is described in (3) and a program of computation with the help of this method is outlined in (8).
The problem consists in forecasting the average annual numbers of bacteria for 1968. Correlation analysis enabled us to determine 14 variables correlated most closely to the numbers of bacteria (Table 2).
The value of the time lag which should be used with each argument is given in parentheses. For example, in order to make the forecast for 1968, we must take the value of x2 for 1964 (τ = 4), etc.
Two formulations of the problem of forecasting. Let us not forget (4) that there are two formulations of the problem of forecasting:
a) the problem of passive forecasting;
b) the problem of active forecasting for control purposes.
The former formulation precludes the use of arguments with the time lag τ = 0 (i.e. with the subscripts "m"). In the latter formulation of the forecasting problem such arguments may be used if they are at our disposal and, consequently, represent controlling factors. We can assign future values to these factors (for example, taxes for the next year). In the present example we shall confine ourselves to passive forecasting, so that all the arguments with τ = 0 and with the subscripts "m" will be excluded.
The correlation coefficient of the initial variable with an argument xk (k = 1, 2, ..., s) is calculated from the formula
[formula illegible in the source]
where N2 is the number of points of the control sequence of data and i is the consecutive number of the point. The formula for calculating the mean square deviation in the control sequence is given in the example. When the correlation coefficient has the maximum value (is equal to one), the mean square deviation has the minimum value (is equal to zero). Each of these estimates may serve, however, as an index of regularity of the solution (sought function).
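The two regularity indices named in the text (the correlation coefficient and the mean square deviation on the control sequence) can be sketched in Python as follows. This is a minimal sketch, not the authors' program: the correlation coefficient is written in the standard Pearson form, and the normalization used in `relative_msd` is an assumption, since the corresponding formula is not legible in the source. The function names and the sample numbers are hypothetical.

```python
from math import sqrt

def correlation(phi, x):
    """Pearson correlation coefficient between the output phi and an argument x."""
    n = len(phi)
    mean_phi = sum(phi) / n
    mean_x = sum(x) / n
    cov = sum((p - mean_phi) * (q - mean_x) for p, q in zip(phi, x))
    var_phi = sum((p - mean_phi) ** 2 for p in phi)
    var_x = sum((q - mean_x) ** 2 for q in x)
    return cov / sqrt(var_phi * var_x)

def relative_msd(actual, predicted):
    """Square root of (summed squared errors / summed squared actual values).
    Assumed normalization; the formula in the source is not legible."""
    num = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    den = sum(a ** 2 for a in actual)
    return sqrt(num / den)

# Hypothetical control sequence of N2 = 5 points (not the Rybinsk data).
control_actual = [0.12, -0.05, 0.30, -0.21, 0.08]
perfect_model = list(control_actual)  # a model that reproduces the data exactly

r = correlation(control_actual, perfect_model)
delta = relative_msd(control_actual, perfect_model)
```

For a model that reproduces the control points exactly, `r` reaches its maximum and `delta` its minimum, which is the limiting case the text describes.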
As is required by the MDHG, the initial data (presented in Table 3) were divided (according to the value of dispersion) into the learning sequence (nine points) and the control sequence (five points).
It has been proven by practice that the accuracy of forecasting increases if the so-called trend of the random process is isolated, i.e. if the process is presented as the sum of a deterministic and a random function of time. The method of isolating the trend is shown in (3).
In this example it is not necessary to isolate the trend, since the mean values of the variable φ agree fairly well with the trend. The data shown in the tables are therefore given as deviations from the mean value, normalized by it. For example,
$$x_1 = \frac{X_1 - X_{1,\mathrm{mean}}}{X_{1,\mathrm{mean}}}, \qquad \varphi = \frac{\Phi - \Phi_{\mathrm{mean}}}{\Phi_{\mathrm{mean}}}.$$
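The normalization and the 9 : 5 division described above can be illustrated with a short Python sketch. The normalization follows the text (deviation from the mean, divided by the mean). The splitting rule is an assumption: the text says only that the division was made "according to the value of dispersion", so here the points with the largest squared deviation are assigned to the learning sequence. The series `raw` is hypothetical, not the Rybinsk data.

```python
from statistics import mean

def normalize_by_mean(values):
    """Return (X - X_mean) / X_mean for every entry of the series."""
    m = mean(values)
    return [(v - m) / m for v in values]

def split_by_dispersion(phi, n_learn):
    """Assumed rule: the n_learn points with the largest squared deviation
    from the mean form the learning sequence, the rest the control sequence."""
    m = mean(phi)
    order = sorted(range(len(phi)), key=lambda i: (phi[i] - m) ** 2, reverse=True)
    return sorted(order[:n_learn]), sorted(order[n_learn:])

# Hypothetical yearly series of 14 points.
raw = [12.0, 9.5, 14.2, 10.1, 8.7, 11.3, 13.8, 9.9, 10.6, 12.4, 7.9, 11.0, 13.1, 10.5]
phi = normalize_by_mean(raw)
learn_idx, control_idx = split_by_dispersion(phi, n_learn=9)
```

The indices in `learn_idx` would be used to estimate coefficients and those in `control_idx` to judge regularity.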
First step of selection. Fourteen arguments allow us to build
91 partial equations of the first step. It turned out that the most
regular ones among them are 14 equations of the following type:
— 6) — 0,11848 -1- 2,7312,t i — 0,5588xu + 3,856 -1- 4,0181:1 —
0.16015 0,0t5C)577.v, 1- 0,01358,v., — O. 82513x 4x,,
• • 0,169x.; . 1('2x , — 101 — 0.11631 — (,-161)23x3 — 0, I (51515x 10 — 0,8060:.;8x ;x1„ 0,516.q
1 rr1 — Yo ( 14 1.) 0,01012 -; 0,437xi 0,021.1xt 9,788:: — 0,40544 -- 0,00955.,,r,. 26
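The partial equations of the first step share the quadratic form y = a0 + a1·xi + a2·xj + a3·xi·xj + a4·xi² + a5·xj² (the form listed in Table 4), one for each of the C(14, 2) = 91 pairs of arguments. The Python sketch below enumerates the pairs and fits one such equation by ordinary least squares. It is an illustration under stated assumptions, not the ALGOL program of (8): the solver, the function names and the test grid are hypothetical.

```python
from itertools import combinations
from math import comb

def quad_features(u, v):
    """Terms of one quadratic partial description: 1, u, v, u*v, u^2, v^2."""
    return [1.0, u, v, u * v, u * u, v * v]

def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_partial(u, v, y):
    """Least-squares coefficients a0..a5 via the normal equations."""
    rows = [quad_features(p, q) for p, q in zip(u, v)]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(6)]
    return solve_linear(ata, aty)

# All pairs of the 14 arguments x1..x14.
pairs = list(combinations(range(1, 15), 2))

# Hypothetical learning sequence of nine points on a 3 x 3 grid.
grid = [(u, v) for u in (-1.0, 0.0, 1.0) for v in (-1.0, 0.0, 1.0)]
true_coeffs = [0.1, 0.5, -0.3, 0.2, 0.05, -0.4]
us = [g[0] for g in grid]
vs = [g[1] for g in grid]
ys = [sum(c * f for c, f in zip(true_coeffs, quad_features(u, v))) for u, v in grid]
fitted = fit_partial(us, vs, ys)
```

Nine learning points against six coefficients leave little redundancy, which is one reason the regularity of each equation is judged on the separate control sequence.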
Second step and the subsequent steps of selection. For the sake of brevity of exposition we shall not quote the results of the second or of the subsequent steps. The measure of regularity
(mean square deviation for the control sequence) varied as is shown in Fig. 1. In each step the 14 most regular regression equations were selected.
Presented schematically, the selection proceeded as follows:
14-91-14-91-14-91-14-91-14-91 etc.
Using the "rule of the left angle" (see above) we discontinue the selection at the fifth step. As a result of this we obtain a system of regression equations (Table 4).
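The bookkeeping of the 14-91-14 scheme can be sketched as follows. The scoring function is a placeholder, and the stopping rule shown (stop at the first step whose best control error is not improved by the next step) is an assumption standing in for the "rule of the left angle", which is defined elsewhere in the paper. The error values are hypothetical and were chosen only so that the selection stops at the fifth step.

```python
from itertools import combinations

def selection_step(variables, control_error, keep):
    """Form all pairs of the current variables and keep the `keep` pairs
    with the smallest error on the control sequence."""
    pairs = list(combinations(variables, 2))
    ranked = sorted(pairs, key=control_error)
    return pairs, ranked[:keep]

def stop_step(min_errors):
    """Assumed stopping rule: return the 1-based number of the first step
    whose best control error is not improved by the following step."""
    for s in range(len(min_errors) - 1):
        if min_errors[s + 1] >= min_errors[s]:
            return s + 1
    return len(min_errors)

variables = list(range(1, 15))  # x1..x14

def placeholder_error(pair):
    """Hypothetical stand-in for the control-sequence error of a fitted pair."""
    return abs(pair[0] - pair[1]) + 0.01 * pair[0]

pairs, kept = selection_step(variables, placeholder_error, keep=14)

# Hypothetical best control errors at steps 1..6.
best_errors = [0.090, 0.060, 0.045, 0.035, 0.028, 0.031]
last_step = stop_step(best_errors)
```

In a full implementation the outputs of the 14 retained equations would become the variables of the next step, so every step again produces 91 candidate equations.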
Estimation of accuracy and forecast for 1968. The accuracy of the model is characterized by two indices (determined from the control sequence).
Table 4
List of partial descriptions of regression equations
ao 1- niy] •1. n2Y3 01313 'r 02 4Y! 2 1- 05Ys2
• -*- -0,039075 0.1181,' 0,8302 • a= 99 - 2.273121 - 1,235574 1I3 ,- -1,726672 a 3 - 3.ti56111 04 .- 1,254919 4,0181798 ai,= -1,678156 n2 = -0;482695
a2Y74- a3.1/14Yrfa4qi1 2 + Y3- 4i F al v3 -1- a..,x10+ a3x3):10+ aoy i2
ao = 0,0002373455 :10=-0.116571 L1= 1,04151 al=-0,489233 03 =-0,01681926 2 -.-0,161516 a a3 -= -0.2711322 33 =-0,836038 04 = -0,25333925 0.516047 02 = 0,4701865 a3= 0,3035 23
21 ao -1- alY6+ a2y7+ 131/01/7-1- a41/22-1- Y14=00-1-alx14-Fa3x1-i-03214x1+ +asY72 -1- a4x14.2 -1- asxt a • a=-0010!2557 00=-0,121847 . • = 0,4370541 0 1 = 1,107143 . • 0,0211406 03= 0,9075258 03 = .0,7889551 (13 --.8,11022 04 -0,105898 04 = 2,959377 (12 -0,00965536 • =‘ 3,583206
2 ÷. y2 = 00 + a l x74-a2x1 + a3x7x1 -1- x7=.00-FaiY7-1-02Ye+a3y7ye,+‘24y..r . -';-0 4x72 4- n2x1 2 asYa a0 =-0,0459635 00=-0,1190251 a l .= 0,734754 . ai = 1,1002154 ay= 0,151264 02 = 0,7701988 423 -2,2286787 03=-9,654656 04 = 0,583286 04 = 4,66158 as = 0,357512 02 = 3,70143 Po= a0 -Fa1x6+c-34 -1- a3x.6x2 + -1-04.262 = 05x82 23 = ao+ aw2+a2Y7+ azY2-,Y7+ + 041/72-F 05Y72 00 =-0,3345039 • ---0,787132 n0 =-0,1832440524 03 =-0,662727 al = 1,340948 a3 - -5,996684 a= 0,727201 04 = 0,277284 02 = -33,111923 02 = 10,33776 04 - 11,44996 08 = 20,50573 A=avi- 014 -1- a2x10 -1- (13xelvi- +a4x22 +as-rio2 a0 = -0,22i 725 al = 0,1060015 a=-0,439457 3 =-2,796954 0 a 4 = 8,528394 02 = 0,255586 = fl + 01-r2-1- -a33-24÷ a4x22 -1- a 2x22 - -0,160157
02 0,0115m06ii a=-0,826l31 0.41b>117 02 - 1.316268 I I
Table 4 (cont'd)
illz:+ 1-1?zi4i. 176 at) -1- a /,16 - 1 1/2111+ 001 .2 1(1S 4 0 29.4+ 0 :./.5cf; (1,1162 t7,,qeq:4 ‘14116 2 • aw 1t2 —0,0355 .871 aor • 0.002 9381C, ao 0 1 = 0.ei2.:,.; ■ a 2 = 0,474(;51 ,•.; a l 0,4 ;7:bri 0.3 • 1 529 n2 0,52.3P.79 • 2.94 u0 24 e* 0,b0e.445 0, 2,9:i 3,22; i:42 a, = 0,605657 • —0,03750165 as--, 0 =-- Clo + a 1 :6 0.1Z -h aszei ± C,,es2 4- asz 72 -1- a14;,7 1+ 04,71 4 2 -1. usg 1 2 00 = 0,00059 ; 97 ao — 0,000157881 = 0,976:127 • —0.037016476 az = 0,01 ; 021;07 = 1,03s3447 as = —0,22.i781 01= 0.292989 (34 —0,10853374 04 =-- 0,288262 as = 0,353438 as = 0 an4-0 ..:14-`-a222+ + 0314+ a4z1 4 2 -r- aszil 00=-- —0,025576 a l -= 0,944372 02=-- 0,392049 113 =- —8,534277 04 = 1,0518968 ab = 6,8203543
The mean square deviation in the control sequence is given as 2.8% [the formula is illegible in the source; its legible fragments read "0.016826 · 100" and "2.8%"].
For 1968, the arguments have the following values:
x1 = 0; x2 = 0; x3 = 0.55353; x4 = -0.45013; x5 = 0; x6 = -0.464; x7 = 0.41681; x8 = 0.333; x9 = 0.48717; x10 = -0.232; x11 = 0; x12 = -0.2; x13 = 0; x14 = -1.784.
By using the model obtained, we find the forecast
φ = -0.058.
However, the correct value is φ = -0.042, and the absolute deviation of the forecast amounts to
Δφ = -0.016.
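The arithmetic of the forecast error can be checked in a few lines of Python. The conversion back to absolute units inverts the normalization φ = (Φ - Φ_mean) / Φ_mean. The mean value used here is a hypothetical placeholder, since the actual mean abundance is not legible in this part of the source.

```python
def denormalize(phi, mean_value):
    """Invert phi = (Phi - Phi_mean) / Phi_mean."""
    return mean_value * (1.0 + phi)

forecast_phi = -0.058  # forecast for 1968 from the model
actual_phi = -0.042    # value observed in 1968
deviation = forecast_phi - actual_phi  # absolute deviation of the forecast

hypothetical_mean = 1000.0  # placeholder mean in arbitrary units
forecast_abs = denormalize(forecast_phi, hypothetical_mean)
actual_abs = denormalize(actual_phi, hypothetical_mean)
```

With this placeholder mean, a deviation of -0.016 in normalized units corresponds to a difference of 16 units between the forecast and the observed value.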
Method of optimization of the division of a specified number of interpolation nodes into the learning sequence and the control sequence. A total of 14 interpolation nodes was specified in the original data. The nodes were divided into the learning and control sequences in a proportion of 9 : 5.
It may be seen from Fig. 3 that this choice is optimal with respect to the number of selection steps. The simplest and, therefore, the most reliable model is obtained at the subdivision of the nodes adopted here. A similar curve was also obtained for a number of other problems (e.g., for the model of England's economy).
As was noted earlier, the role of the index of regularity is played by the value inverse to the mean square deviation in the control sequence (or by the correlation coefficient). The value inverse to the number of selection steps may be used as the index of reliability. The problem consists in achieving the maximum regularity at the maximum reliability, i.e., in our example, in selecting point 2 (Fig. 3). By changing the value of the threshold (the number of the variables in each step which are allowed to pass to the next step), we may attempt to increase the regularity and reliability still further.
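One way to express the choice of the division in code is sketched below. The text does not state how regularity and reliability are combined, so the rule used here (fewest selection steps first, then the smallest mean square deviation) is an assumption, and the candidate results are hypothetical values, not numbers read from Fig. 3.

```python
def choose_split(results):
    """Assumed rule: prefer the fewest selection steps (highest reliability),
    then the smallest control mean square deviation (highest regularity)."""
    return min(results, key=lambda r: (r["steps"], r["msd"]))

# Hypothetical outcomes of running the selection for several divisions of 14 nodes.
hypothetical_results = [
    {"learn": 7, "control": 7, "steps": 8, "msd": 0.041},
    {"learn": 8, "control": 6, "steps": 6, "msd": 0.033},
    {"learn": 9, "control": 5, "steps": 5, "msd": 0.028},
    {"learn": 10, "control": 4, "steps": 7, "msd": 0.036},
]

best = choose_split(hypothetical_results)
reliability = 1.0 / best["steps"]  # index of reliability: inverse number of steps
regularity = 1.0 / best["msd"]     # index of regularity: inverse mean square deviation
```

The same loop could be repeated for several values of the threshold to see whether both indices can be raised further.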
Fig. 3. Determination of the optimal division of the data into the learning and the control sequences. s - number of selection steps; [symbol illegible] - mean square deviation; 1 - number of selection steps; 2 - our choice.
Conclusions
The heuristic self-organization approach and the method of data handling by group, which is based on it, create the possibility of constructing mathematical models of ecological systems, which may be used not only for the qualitative, but also for the quantitative estimation of any variable that is of interest to us. This opens new prospects for the optimal control of the regime in reservoirs.
In addition to the model of the abundance of bacteria, a mathematical model was constructed for forecasting permanganate oxidizability. The principle of self-organization allowed us to obtain an equation (optimal with respect to both complexity and accuracy) forecasting the oxidizability in the three central stations of the Rybinsk reservoir (on the Volga) a year in advance.
The general mathematical model of a reservoir appears to us
in the form of a collection of such forecasting and control equations
for all main indices of the given reservoir, which would be brought up to date every time new data appear.
The authors express their appreciation to Prof. N.M. Kamshilov, who introduced them to the problem of the Rybinsk reservoir. The present work would have been impossible otherwise.
Bibliography
1. G.G. Vinberg and C.A. Anisimov. Matematicheskaya model' vodnoi
ekologicheskoi sistemy, Sbornik "Fotosintez sistemy vysokoi
produktivnosti", izdatel'stvo "Nauka", Moskva, 1966.
(Mathematical model of an aquatic ecological system, Collection
"Photosynthesis of a system of high efficiency", Publishers
"Science", Moscow, 1966).
2. V.V. Menshutkin and A.A. Umnov. Matematicheskaya model' prosteishei
vodnoi ekologicheskoi sistemy. "Gidrobiologicheskii zhurnal",
tom VI, No. 2, 1970.
(Mathematical model of the simplest aquatic ecological system.
"Journal of Hydrobiology", vol. 6, No. 2, 1970).
3. O.G. Ivakhnenko, Metod hrupovoho vrakhuvannya arhumentiv - konkurent
metodu stokhastychnoi aproksymatsii, "Avtomatika", No. 3, 1968.
(Method of data handling by group - a competitor of the method of
stochastic approximation, "Automatic control", No. 3, 1968).
4. O.G. Ivakhnenko, Yu. V. Koppa and Vu Suan Min'. Polinomial'na
i lohichna teoriya skladnykh system, ch. I i II, "Avtomatika",
No. 3 i 4, 1970.
(Polynomial and logical theory of complex systems, parts I and II,
"Automatic control", No. 3 and 4, 1970).
5. O.G. Ivakhnenko and Yu. V. Koppa. Rehulyaryzatsiya rozv'yazu-
yuchykh funktsii u metodi hrupovoho vrakhuvannya arhumentiv,
"Avtomatika", No. 2, 1970.
(Regularization of the problem-solving functions in the method
of data handling by group, "Automatic control", No. 2, 1970).
6. A.G. Ivakhnenko, Samoobuchivayushchiesya sistemy raspoznavaniya
i avtomaticheskogo upravleniya, izdatel'stvo "Tekhnika", 1969.
(Self-learning systems of recognition and automatic control,
Publishers "Technics", 1969).
7. A.G. Ivakhnenko, Heuristic self-organization in problems of
engineering cybernetics, "Automatica", vol. 6, Pergamon Press, 1970.
8. Yu. V. Koppa, Prohrama na movi translyatora ALGOL-BESM 6 dlya
rozv'yazannya interpolyatsiinykh zadach za alhorytmom MGVA z
polinomamy pershoho abo druhoho stepenya, "Avtomatika", No. 1,
1971.
(Program in the language of the ALGOL-BESM 6 translator for the
solution of interpolation problems by means of the MDHG algorithm
with polynomials of the first or second degree, "Automatic
control", No. 1, 1971).
9. A.G. Ivakhnenko, Yu. V. Koppa and V.I. Braverman, Opredelenie
statisticheskikh kharakteristik kolonny sinteza metilkhlorsilanov
po algoritmu MGUA s kvadratichnymi polinomami. Sb. "Tekhnicheskaya
kibernetika", vypusk 22, IK AN USSR, 1971.
(The determination of statistical characteristics of the column
of methyl-chlor-silane synthesis by means of the MDHG algorithm
with the second-order polynomials. Collection "Technical cybernetics",
Issue 22, Institute of Cybernetics of the Academy of Sciences of
the Ukrainian Soviet Socialist Republic, 1971).
10. A.G. Ivakhnenko, V.D. Dimitrov and G.D. Strukova, Mnogoryadnaya
veroyatnostnaya model' dlya upravleniya proizvodstvom metilkhlor-
silanov, tam zhe.
(Multiple-stage probabilistic model for controlling the production
of methyl-chlor-silanes, ibid).
Manuscript received: March 24, 1971.