
R&D evaluation methodology based on group-AHP with uncertainty

Alberto Garinei (1,*), Emanuele Piccioni (2), Massimiliano Proietti (3), Andrea Marini (3), Stefano Speziali (3), Marcello Marconi (1), Raffaella Di Sante (4), Sara Casaccia (5), Paolo Castellini (5), Milena Martarelli (5), Nicola Paone (5), Gian Marco Revel (5), Lorenzo Scalise (5), Marco Arnesano (6), Paolo Chiariotti (7), Roberto Montanini (8), Antonino Quattrocchi (8), Sergio Silvestri (9), Giorgio Ficco (10), Emanuele Rizzuto (11), Andrea Scorza (12), Matteo Lancini (13), Gianluca Rossi (2), Roberto Marsili (2), Emanuele Zappa (7), Salvatore Sciuto (12)

(1) Department of Engineering Sciences, Guglielmo Marconi University, Rome, Italy
(2) Department of Engineering, University of Perugia, Perugia, Italy
(3) Idea-re S.r.l., Perugia, Italy
(4) Department of Industrial Engineering (DIN), University of Bologna, Forlì, Italy
(5) Università Politecnica delle Marche, Dipartimento di Ingegneria Industriale e Scienze Matematiche (DIISM), Ancona, Italy
(6) Università Telematica eCampus, Novedrate (CO), Italy
(7) Department of Mechanical Engineering, Politecnico di Milano, Milan, Italy
(8) Department of Engineering, University of Messina, Messina, Italy
(9) Research Unit of Measurements and Biomedical Instrumentation, Campus Bio-Medico University of Rome, Rome, Italy
(10) Department of Civil and Mechanical Engineering (DICEM), University of Cassino and Lazio Meridionale, Cassino (FR), Italy
(11) Department of Mechanical and Aerospace Engineering, Sapienza University of Rome, Rome, Italy
(12) Department of Engineering, University of Roma Tre, Rome, Italy
(13) Department of Mechanical and Industrial Engineering, University of Brescia, Brescia, Italy
* Corresponding author: [email protected]

Abstract

In this paper, we present an approach to evaluate Research & Development (R&D) performance based on the Analytic Hierarchy Process (AHP) method. Through a set of questionnaires submitted to a team of experts, we single out a set of indicators needed for R&D performance evaluation. The indicators, together with the corresponding criteria, form the basic hierarchical structure of the AHP method. The numerical values associated with all the indicators are then used to assign a score to a given R&D project. In order to aggregate consistently the values taken on by the different indicators, we operate on them so that they are mapped to dimensionless quantities lying in the unit interval. This is achieved by employing the empirical Cumulative Distribution Function (CDF) of each indicator. We give a thorough discussion of how to assign a score to an R&D project, along with the corresponding uncertainty due to possible inconsistencies of the decision process. A particular example of R&D performance evaluation is finally considered.

Keywords: AHP, Multi-Criteria Decision Making, R&D performance, R&D measures

1. Introduction

The Analytic Hierarchy Process (AHP) is a Multi-Criteria Decision Making (MCDM) method developed by Saaty in the 1970s (Saaty (1977)). It provides a systematic approach to quantifying the relative weights of decision criteria.

Its strength relies on the fact that it allows one to decompose a decision problem into a hierarchy of sub-problems, each of which can be analyzed independently in a similar manner. It is used in a wide variety of decision situations, in fields like industry, healthcare and so on.

In this paper, we propose a method to evaluate Research and Development (R&D) performance, based on group-AHP, through the introduction of a "score" assigned to each R&D project in a given set.

R&D represents the set of innovative activities undertaken by companies and/or governments to develop new and more efficient services or products, as well as to improve existing ones. It has become somewhat crucial to have a systematic method to evaluate the performance of a given project or research activity (Lazzarotti et al. (2011)). See, among others, also (Kerssens-van Drongelen & Bilderbeek (1999)), (Moncada-Paternò-Castello et al. (2010)), (Tidd et al. (2000)), (Griffin (1997)), (Bremser & Barsky (2004)), (Jefferson et al. (2006)), (Kim & Oh (2002)), (Kaplan et al. (1996)), (Chiesa et al. (2009)) and references therein for the importance of R&D performance assessment. Quantitative methods coupled with qualitative assessments are used in decision support systems, for example by project funding commissions.

However, there are currently no standards for measuring the performance of an R&D project. The method developed in this paper stems from a critical approach to the measurement problem concerning complex systems (such as Research and Development). With the help of group multi-criteria methodologies, we tried to faithfully represent the evaluations of R&D projects through the involvement of stakeholders. As a matter of fact, the latter represent diverse interests and belong to different domains of knowledge. We used three questionnaires addressed to stakeholders at different stages of the process, with the ideal goal of developing a shared decision support tool that is easy to use and whose operation can be directly explained. In view of adopting the logic of the metrological method, we defined a model capturing the subtle features of R&D performance evaluation and keeping track of measurement uncertainties.

In order to introduce the standard AHP decision structure, we need to define precisely what our criteria and sub-criteria will be. Criteria (or perspectives, in our parlance) are selected following the existing literature and, more in detail, have been identified to be: Internal Business perspective, Innovation and Learning perspective, Financial perspective, and Network and Alliances perspective.

We then single out a set of sub-criteria (indicators) through a set of questionnaires submitted to a team of experts selected from academia or private research hubs in Italy. The indicators, along with the corresponding criteria, will form in our analysis the basic hierarchical structure of the AHP method.

In order to have a sensible way to aggregate the values the different indicators take on, we operate on them in such a way that they share the same scale, namely they are all dimensionless quantities varying over the same range, which for convenience we choose to be 0 to 1. This is attained by employing as a transformation map for each indicator the corresponding empirical Cumulative Distribution Function (CDF). In this way, all the resulting variables are approximately uniformly distributed over the unit interval.

It is well known that decision processes in complex systems carry along judgmental inconsistencies. Aware of the fact that some inconsistencies are difficult to get rid of, we propose a rigorous method to quantify the uncertainty affecting the "score" of a given R&D project. In order to better show how our method works, we give an example of application in the last section of this paper. The method has been employed to evaluate R&D projects whose data are stored in the DPR&DI (Digital Platform for R&D and Innovation Projects).

This paper is organized as follows. In Section 2 we discuss in detail the basics of the AHP method as developed originally. In Section 3 we propose a method to choose the criteria and sub-criteria to evaluate R&D performance through a set of questionnaires. We then give a detailed and precise account of how to evaluate the R&D performance of a given project, and finally we discuss the consistency of the proposed method. We give an example of R&D performance evaluation in Section 4 and our conclusions in Section 5.

2. Theoretical Background: the AHP method

In this section, we discuss the basics of the AHP (Analytic Hierarchy Process) method as developed originally by Saaty in the 1970s. More details can be found, for example, in the book (Saaty (2010)) or in the review (Ishizaka & Labib (2011)).

2.1 Decision problems

We face many decision problems in our daily lives. They can be as simple as deciding what jeans we want to buy, or more involved, like what person to hire for a post-doc position. Whatever decision problem we are facing, a systematic way to deal with it can be useful, and this is where AHP comes to play a role.

In AHP, each decision problem can be broken down into three components, each with the same basic structure:

• The goal of the problem, namely the objective that drives the decision problem.

• The alternatives, namely the different options that are being considered in the decision problem.

• The criteria, namely the factors that are used to evaluate the alternatives with respect to the goal.

Moreover, if the problem requires it, we can associate sub-criteria to each criterion, adding extra layers of complexity. We will see an example of this in Section 4.

The three levels (or more, if we consider sub-criteria) define a hierarchy for the problem, and each level can be dealt with in a similar fashion to the others. This is essentially the basic structure of the AHP method in decision problems. The rest of this section is devoted to spelling out the details of how a decision is eventually made.

2.2 Weighting the problem

A crucial ingredient in any decision problem is the mapping of notions, rankings, etc. to numerical values. Basic examples of such mappings are scales of measurement, like the Celsius degree for temperature or the dollar for money. In these cases we have what are called standard scales, where standard units are employed to determine the weight of an object.

However, it often happens that the same number (say 100°) means different things to different people, according to the situation, or different numbers are as good (or as bad) for a given purpose (e.g. when trying to find the right temperature for a fridge, 100° is as bad as −100°). Moreover, it might be the case that we need to analyze processes for which there is no standard scale. Thus, we need to find a way to deal with these situations consistently.

It turns out that what really matters is pairwise comparisons between different options. In this way we can create a relative ratio scale and, in fact, here is the crux of the AHP method, as we will see in a moment.

In the case we are dealing with a standard scale, we can assign to n objects n weights w_1, ..., w_n. Then, we can create a matrix[1] A \in R^{n \times n} of pairwise comparisons in the following way:

    A = \begin{pmatrix} w_1/w_1 & w_1/w_2 & \cdots & w_1/w_n \\ w_2/w_1 & w_2/w_2 & \cdots & w_2/w_n \\ \vdots & \vdots & \ddots & \vdots \\ w_n/w_1 & w_n/w_2 & \cdots & w_n/w_n \end{pmatrix} .    (2.1)

The matrix A is an example of a reciprocal matrix, i.e. a matrix where each entry satisfies a_{ij} = 1/a_{ji}. This is indeed what we would expect when there is an underlying standard scale. For example, if we are to determine which among two apples is the reddest and, according to a given scale, apple a is twice as red as apple b, it necessarily follows that apple b is one-half as red as apple a.

Note the following interesting fact, which will be relevant for us later. If we define the vector w = (w_1, ..., w_n)^T, it is easily seen that

    A \cdot w = n\, w ,    (2.2)

where the dot denotes the matrix product, i.e. w is an eigenvector of A with eigenvalue n. In fact, it is rather easy to convince ourselves that the matrix A in eqn. (2.1) has rank 1, and a theorem in linear algebra tells us that it must have only one non-zero eigenvalue. On the other hand, the trace of a matrix gives the sum of its eigenvalues, which in our case turns out to be 1 + ··· + 1 = n. It is therefore coherent to conclude that a consistent matrix like A above has only one non-zero eigenvalue, n. In this case n is also called the principal eigenvalue, i.e. the largest of the eigenvalues of a square matrix.

[1: In this paper we deal mainly with finite dimensional real vector spaces. In particular, if V and W are vector spaces of dimensions n and m respectively, a choice of bases v = {v_1, ..., v_n} and w = {w_1, ..., w_m} determines isomorphisms of V and W with R^n and R^m, respectively. Any linear operator from V to W has a matrix presentation A \in R^{n \times m} with respect to the given bases. In this respect, the eigenvalue eqn. (2.2) is a linear transformation from a space to itself.]
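To make the consistent-scale case concrete, here is a minimal numerical check of eqns. (2.1)–(2.2) (our illustration, not part of the original paper; the weights are hypothetical):

```python
import numpy as np

# Hypothetical weights for n = 4 objects measured on a standard scale.
w = np.array([0.50, 0.25, 0.15, 0.10])
n = len(w)

# Pairwise comparison matrix of eqn. (2.1): A[i, j] = w_i / w_j.
A = np.outer(w, 1.0 / w)

print(np.allclose(A, 1.0 / A.T))    # True: A is reciprocal
print(np.linalg.matrix_rank(A))     # 1: a single non-zero eigenvalue
print(np.allclose(A @ w, n * w))    # True: A w = n w, eqn. (2.2)
```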

As we said before, sometimes we have to deal with decision processes where a standard scale does not exist, and thus we are not given a priori a weight vector w. What is really meaningful in this case is the matrix of pairwise comparisons between alternatives, similar to that in eqn. (2.1):

    A = (a_{ij}) = \begin{pmatrix} 1 & a_{12} & \cdots & a_{1n} \\ a_{21} & 1 & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & 1 \end{pmatrix} .    (2.3)

Here a_{ij} tells us how the i-th object compares to the j-th object according to a given criterion/goal. Notice that also in this case we should impose a_{ij} = 1/a_{ji}, i.e. we should have a reciprocal matrix, but now each entry is not given by a ratio of two quantities.

In order to make the pairwise-comparison coefficients a_{ij} as explicit as possible, Saaty's 1–9 scale is often used (see Figure 1). The scale should be read in the following way: if an object i is as important as the object j, then we should set a_{ij} = 1. If, instead, object i is more important than object j, then a_{ij} should be set to 3, 5, 7 or 9, following the scheme in Figure 1. The intermediate even values (2, 4, 6, 8) can also be used and allow for finer assessments.

[Figure 1: Saaty's 1–9 scale of importance. 1: equal importance; 3: moderate importance; 5: essential or strong importance; 7: very strong importance; 9: extreme importance; the even values 2, 4, 6, 8 denote intermediate levels of importance.]

What if we considered an eigenvalue equation also for the matrix A defined in (2.3)? And what would be the meaning of the weights (priorities) w_i in this case? Let us begin by answering the first question.

The Perron–Frobenius theorem tells us that there exists one principal eigenvalue, λ_max, and that it is unique. We then find an equation of the form

    \begin{pmatrix} 1 & a_{12} & \cdots & a_{1n} \\ a_{21} & 1 & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & 1 \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} = \lambda_{max} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} .    (2.4)

It is a theorem (Saaty (1990)) that for a reciprocal n × n matrix with all entries greater than zero, the principal eigenvalue λ_max is always greater than or equal to n, λ_max ≥ n. In particular, λ_max = n if and only if A is a consistent matrix.

What is meant by a consistent matrix? If we reckon that alternative i is a_{ij} times better than alternative j, and the latter is a_{jk} times better than alternative k, we should have, for consistency, a_{ik} = a_{ij} a_{jk}. This is known as multiplicative consistency. It is easily seen that multiplicative consistency implies reciprocity, but the converse is not true.

It is often the case that multiplicative consistency is not respected, introducing some form of inconsistency in the evaluation process. One major drawback, for example, is that the fundamental scale ranges from 1/9 to 9, and a product of the form a_{ij} a_{jk} might very well be outside the scale, making it impossible to respect multiplicative consistency.[2] In the next subsection, we will see how to manage possible inconsistencies.

[2: There are different approaches to deal with the problem of the scale range. One approach could be to change the linear scale given above to a more convoluted one. For example, in (Donegan et al. (1992)) an asymptotic scale is employed so that we never get out of a prefixed scale range. However, in the literature, the linear scale of Saaty seems to be the most widely used.]

In order to have a (nearly) consistent matrix of pairwise comparisons A, λ_max should not differ much from the dimension of A, n. In particular, finding the eigenvector w = (w_1, ..., w_n)^T amounts to finding the weights (or priorities) of the n objects (alternatives), and we are assured that, if the matrix A is sufficiently consistent, a_{ij} ≈ w_i/w_j. Note that multiplying both sides of eqn. (2.4) by an arbitrary constant is harmless, and therefore the vector w can be conveniently normalized as we please. We will have more to say on this below.
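Before turning to the computation of the weights, the following sketch (ours, with invented judgments) encodes a small set of comparisons on Saaty's scale and checks Saaty's bound λ_max ≥ n numerically:

```python
import numpy as np

# Hypothetical judgments on Saaty's 1-9 scale for three alternatives:
# a vs b -> 3 (moderate), a vs c -> 5 (strong), b vs c -> 3 (moderate).
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 3.0],
              [1/5, 1/3, 1.0]])
n = A.shape[0]

# Principal eigenvalue of eqn. (2.4). Here a_12 * a_23 = 9 != a_13 = 5,
# so multiplicative consistency fails and lambda_max exceeds n.
lam_max = np.linalg.eigvals(A).real.max()
print(lam_max)    # ~3.039 > 3: a slightly inconsistent matrix
```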

2.3 How to compute weights

We now find ourselves in the position where we should determine the priority vector w of eqn. (2.4), once a pairwise comparison matrix is given. The easiest way to do so is to solve eqn. (2.4) using standard methods in linear algebra. However, general procedures are not always exempt from inconsistencies (in AHP). For example, for inconsistent matrices with dimension greater than 3, there is a right–left asymmetry, i.e. a right-eigenvector is not a left-eigenvector.

In order to avoid this issue, a common alternative to compute the priority vector w makes use of the logarithmic least squares (LLS) method (De Jong (1984)), (Crawford & Williams (1985)). The relation between the pairwise comparison matrix A and the relative priority vector w can be expressed as

    a_{ij} = \frac{w_i}{w_j}\, \varepsilon_{ij} ,  i, j = 1, ..., n ,    (2.5)

where the ε_{ij} are positive random perturbations. It is commonly accepted that for nearly consistent matrices the factors ε_{ij} are log-normally distributed.[3]

[3: Indeed, the authors of (Shrestha & Rahman (1991)) found that the error factors ε_{ij} best describe the inconsistency in the decision process when they are log-normally distributed,

    \log \varepsilon_{ij} \sim N(0, \sigma_{ij}^2) ,    (2.6)

where N(µ, σ²) is the normal (Gaussian) distribution with mean µ and variance σ². In particular, note that the median of the error factor ε_{ij} is 1, and its spread can be varied by choosing σ_{ij}² according to the degree of expertise.]

Thus, to determine the weights w_i one can take the logarithm of (2.5) and then apply the least-squares principle, namely minimize the sum of squares of log ε_{ij},

    E(w) = \sum_{i,j=1}^{n} \left( \log a_{ij} - \log w_i + \log w_j \right)^2 .    (2.7)

An easy computation reveals that E(w) is minimized when

    w_i = \left( \prod_{j=1}^{n} a_{ij} \right)^{1/n} ,  i = 1, ..., n .    (2.8)

This is also called the geometric mean, and from now on we will adopt this method to compute weights. Note that for consistent matrices, w_i as in (2.8) is an eigenvector with eigenvalue n. The weights w_i in eqn. (2.8) are defined up to a multiplicative constant (see eqn. (2.7)); we have normalized them so that \prod_{j=1}^{n} w_j = 1.
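A minimal sketch of the geometric-mean rule of eqn. (2.8) (our illustration, reusing the hypothetical matrix above):

```python
import numpy as np

def priority_vector(A: np.ndarray) -> np.ndarray:
    """Row geometric mean of a pairwise comparison matrix, eqn. (2.8),
    renormalized so that the product of the weights is 1."""
    w = np.prod(A, axis=1) ** (1.0 / A.shape[0])
    return w / np.prod(w) ** (1.0 / len(w))

A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 3.0],
              [1/5, 1/3, 1.0]])
w = priority_vector(A)
print(w, np.isclose(np.prod(w), 1.0))   # weights with prod(w) = 1
```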

2.4 Aggregation

The final step is to aggregate local priorities across all criteria in order to determine the global priority of each alternative. This step is necessary to determine which alternative will be the preferred one.

In the original formulation of AHP, this is done in the following way. If we denote by l_{ij} the local priority (weight) of the alternative i with respect to the criterion j, and by w_j the weight of the criterion j, the global priority p_i for the alternative i is defined to be

    p_i = \sum_j w_j\, l_{ij} .    (2.9)

Criterion weights and local priorities can be normalized so that they sum up to 1. In this way, we find \sum_i p_i = 1. The alternative getting the highest priority (modulo inconsistencies, to be discussed later) will be the favorite one in the decision process.

Let us now move on to discussing (some of) the possible inconsistencies of the AHP method.

2.5 Consistency of the AHP method

As we remarked before, the AHP method is based on the idea that there is always some underlying scale in a decision problem. This is encoded in the fact that when we have calculated our weight matrix – which by definition is a consistent ratio matrix built out of the weight ratios – this one should not be too far off the original pairwise comparison matrix.

In order to determine how far off we are, we need a way to quantify the inconsistency of our decision matrices. To this purpose, it is useful to recall a couple of facts (Saaty (1990)). Saaty noticed that for a reciprocal n × n matrix A with all entries bigger than zero, the principal eigenvalue is always greater than or equal to n. This is easily proved with some simple linear algebra. Moreover, it turns out that A is a fully consistent matrix if and only if the principal eigenvalue is strictly equal to n.

Given these facts, it is possible to define a set of indices to measure the consistency of our decision matrices. In particular, we can define the Consistency Index (CI) as

    CI = \frac{\lambda_{max} - n}{n - 1} .    (2.10)

Note that CI ≥ 0, as a consequence of what we said above. Also, the more CI differs from zero, the more inconsistent we have been in the decision process.

We can also define the Random Index RI of size n as the average CI calculated from a large number of randomly filled matrices. For a discussion on how these matrices are created, see (Alonso & Lamata (2006)).

Finally, we define the Consistency Ratio CR as the ratio CI(A)/RI(A) for a reciprocal n × n matrix, where RI(A) is the random index for matrices of size n. Usually, if the CR is less than 10% the matrix is considered to have an acceptable consistency. Nonetheless, this consistency index is sometimes criticized, as it allows contradictory judgments. See the review (Ishizaka & Labib (2011)) for a discussion about this.

In the literature, several other methods to measure consistency have been proposed; see (Ishizaka & Labib (2011)) for an account of the existing methods. For example, the authors of (Alonso & Lamata (2006)) have computed a regression of the random indices and proposed the following criterion,

    \lambda_{max} < 1.17699\, n - 0.43513 ,    (2.11)

where n is the size of the pairwise comparison matrix, while (Crawford & Williams (1985)) propose to use the Geometric Consistency Index (GCI),

    GCI = \frac{2}{(n-1)(n-2)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left[ \log \frac{a_{ij}}{w_i/w_j} \right]^2 .    (2.12)

In the coming sections, we will make extensive use of the GCI for the computation of the consistency of decision processes, as we believe it is more apt to capture the propagation of inconsistencies.
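The consistency indices above are straightforward to compute. Here is a sketch (ours), where the Random Index values are the commonly tabulated figures; these values are an assumption, since the paper does not list them:

```python
import numpy as np

# Commonly quoted Saaty random indices for n = 3..8 (assumed values;
# the paper itself does not tabulate RI).
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41}

def consistency_indices(A: np.ndarray, w: np.ndarray):
    """CI and CR of eqn. (2.10) and GCI of eqn. (2.12) for a pairwise
    comparison matrix A with priority vector w (requires n >= 3)."""
    n = A.shape[0]
    lam_max = np.linalg.eigvals(A).real.max()
    ci = (lam_max - n) / (n - 1)
    cr = ci / RANDOM_INDEX[n]
    gci = sum(np.log(A[i, j] / (w[i] / w[j])) ** 2
              for i in range(n - 1) for j in range(i + 1, n))
    gci *= 2.0 / ((n - 1) * (n - 2))
    return ci, cr, gci

A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 3.0],
              [1/5, 1/3, 1.0]])
w = np.prod(A, axis=1) ** (1.0 / 3)
print(consistency_indices(A, w))    # CR ~ 0.03 < 0.10: acceptable
```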

3. Methodology

In this section, we propose a methodology to evaluate R&D performance. In particular, we discuss in detail how criteria and sub-criteria are to be chosen in our proposed method.

3.1 Criteria and sub-criteria in R&D performance evaluation

3.1.1 Perspectives to measure R&D performance

Determining R&D performance usually relies on the identification of indicators (or metrics) relative to some criteria (perspectives). Giving the same importance to all indicators and/or criteria can lead to an oversimplification of the R&D measuring process and this, in turn, may lead to a misinterpretation of the actual performance of an R&D project (Salimi & Rezaei (2018)).

Thus, it is crucial to correctly identify criteria and sub-criteria and subsequently determine their relative importance. The latter step can be carried out by asking a team of experts to make pairwise comparisons between alternatives for both perspectives (criteria) and indicators (sub-criteria).

Following the literature, for example (Kaplan et al. (1996)), (Bremser & Barsky (2004)), (Lazzarotti et al. (2011)), (Salimi & Rezaei (2018)), we lay out the four perspectives which are relevant for measuring R&D performance:

• Internal Business perspective (IB)

• Innovation and Learning perspective (I&L)

• Financial perspective (F)

• Network and Alliances perspective (N&A)

Let us spell out what each perspective is about. The Internal Business perspective refers to internal resources, such as technological capabilities or human resources, that directly influence the performance of a project. The Innovation and Learning perspective refers to the development of new skills as a result of project activities. The Financial perspective, instead, aims at capturing the financial aspects of a project, with a focus on its financial sustainability. Finally, the Network and Alliances perspective refers to the interaction with different partners, such as external companies involved in project activities and in the realization of the results.

The authors of (Salimi & Rezaei (2018)) consider also the "Customer perspective", which refers to the extent to which R&D satisfies the needs of customers. In the following sections, we will be interested mainly in projects which do not involve customers. Thus, we will stick with the four criteria identified above.

The four perspectives presented here will be the four criteria of our decision process. Indicators, i.e. sub-criteria, will be associated with each of the criteria in a way that we now describe.

3.1.2 Selection of Indicators

Let us briefly outline the three steps we propose in order to determine the indicators for each criterion. These will be labeled Step 0, 1 and 2 and can be summarized as follows:

• Step 0: Selection of relevant raw data, i.e. the building blocks for the final indicators, through a questionnaire given to a team of experts.

• Step 1: Identification of the right indicators from the data selected at Step 0, through a second questionnaire.

• Step 2: Pairwise comparisons between perspectives (criteria) and indicators (sub-criteria) according to the AHP method described in the previous section, with some modifications that we describe later.

More in detail, in Step 0 we prepare a list of parameters (raw data) that will be used to identify the indicators for the decision process. The list, an example of which is given in Section 4, is submitted to a team of experts who are asked to identify the parameters that are usually available in the projects they are involved in. This step is necessary to understand which parameters, among the proposed ones, are better suited to capture a project's performance.

In Step 1, we ask the same team of experts to build, out of the raw data selected at Step 0, the indicators for the different perspectives. In particular, each of the participants is asked to form a number of normalized indicators for each perspective. For example, jumping ahead to the example of R&D performance evaluation given in Section 4, if we think that the number of findings in a given project (each presented in a publication or at a conference) in the shortest time is a relevant indicator for Innovation and Learning, then we might propose as indicator: # of findings / total time of the project.

If, for any reason, the experts think that some quantities do not need to be normalized and can stand on their own, they are allowed to choose no denominator. Finally, a set of indicators for each perspective is formed according to the consensus they received from the experts.

In Step 2, the team of experts is eventually asked to form pairwise comparison matrices, between all criteria and between all sub-criteria. Nevertheless, there is an important caveat. Differently from the original AHP method, we require no strict reciprocity: a_{ij} need not be exactly equal to 1/a_{ji}, and small (and sporadic) deviations are allowed. The reason for introducing such an inconsistency is that we would like to develop a method capable of capturing and bypassing the possible inconsistencies that often influence decision processes in R&D performance evaluation.

3.2 AHP for evaluating R&D performance

As should be by now clear, in our method the criteria for R&D performance evaluation are represented by the four perspectives mentioned in the last section, while the indicators – relative to each criterion – are the sub-criteria. The different projects in an evaluation session make up the alternatives. In brief, the alternative which scores the biggest global priority will correspond to the most impactful project for R&D, as measured by the chosen criteria.

3.2.1 Pairwise comparisons of perspectives and indicators

Let us define the pairwise comparison matrix among criteria, C \in R^{4 \times 4}, in the following manner:

    C = \begin{pmatrix} c_{11} & c_{12} & c_{13} & c_{14} \\ c_{21} & c_{22} & c_{23} & c_{24} \\ c_{31} & c_{32} & c_{33} & c_{34} \\ c_{41} & c_{42} & c_{43} & c_{44} \end{pmatrix} .    (3.1)

Of course, c_{ii} = 1 for i = 1, ..., 4. The priority vector v* of C can be easily computed as the row-wise geometric mean of C, see eqn. (2.8),

    v^* = \begin{pmatrix} (c_{11} c_{12} c_{13} c_{14})^{1/4} \\ (c_{21} c_{22} c_{23} c_{24})^{1/4} \\ (c_{31} c_{32} c_{33} c_{34})^{1/4} \\ (c_{41} c_{42} c_{43} c_{44})^{1/4} \end{pmatrix} .    (3.2)

It turns out to be useful for our purposes to normalize it in such a way that the sum of its components is 1,

    v = \frac{v^*}{\sum_{i=1}^{4} v_i^*} .    (3.3)

In the same fashion, we can define the pairwise comparison matrix among sub-criteria, A^{(c)} \in R^{m_c \times m_c},

    A^{(c)} = \left( a_{ij}^{(c)} \right) = \begin{pmatrix} a_{11}^{(c)} & \cdots & a_{1 m_c}^{(c)} \\ \vdots & \ddots & \vdots \\ a_{m_c 1}^{(c)} & \cdots & a_{m_c m_c}^{(c)} \end{pmatrix} ,    (3.4)

where c is an index that labels the different criteria (in our case there are 4 of them). We can define, just as in the case of the criteria, the priority vector w^{(c)} for each A^{(c)},

    w^{(c)*} = \begin{pmatrix} (a_{11} a_{12} \cdots a_{1 m_c})^{1/m_c} \\ (a_{21} a_{22} \cdots a_{2 m_c})^{1/m_c} \\ \vdots \\ (a_{m_c 1} a_{m_c 2} \cdots a_{m_c m_c})^{1/m_c} \end{pmatrix} ,    (3.5)

and normalize it so that

    w^{(c)} = \frac{w^{(c)*}}{\sum_{i=1}^{m_c} w_i^{(c)*}} .    (3.6)

It turns out to be useful to repack the vectors w^{(c)} into a matrix W \in R^{4 \times N_{ind}}, with N_{ind} = \sum_c m_c the total number of indicators, in the following fashion:

    W = \begin{pmatrix} w^{(1)\,T} & 0 & 0 & 0 \\ 0 & w^{(2)\,T} & 0 & 0 \\ 0 & 0 & w^{(3)\,T} & 0 \\ 0 & 0 & 0 & w^{(4)\,T} \end{pmatrix} .    (3.7)

We can now compute the global weight of the i-th indicator as

    P_i = (v^T W)_i = \sum_{j=1}^{4} v_j W_{ji} ,  i = 1, ..., N_{ind} .    (3.8)

Note that \sum_{i=1}^{N_{ind}} P_i = 1 in our normalization. When there is more than one expert, the global weight vectors of the individual experts have to be combined so as to obtain a unique global weight P^{(group)}. We will do this again by considering the geometric mean over the experts, i.e. we employ the AIP (Aggregation of Individual Priorities) method rather than the AIJ (Aggregation of Individual Judgments), see (Dong et al. (2010)),

    P_i^{(group)} = \frac{ \left( \prod_{k=1}^{N_{exp}} P_i^{(k)} \right)^{1/N_{exp}} }{ \sum_{j=1}^{N_{ind}} \left( \prod_{k=1}^{N_{exp}} P_j^{(k)} \right)^{1/N_{exp}} } ,    (3.9)

where k runs over the experts, N_exp is their number, and P_i^{(k)} is the global weight vector of the k-th expert.
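The computation of the global weights of eqns. (3.7)–(3.9) can be sketched as follows (our illustration; all numbers are invented for the example, not taken from the paper):

```python
import numpy as np

def global_weights(v: np.ndarray, ws: list) -> np.ndarray:
    """Global indicator weights P_i = (v^T W)_i of eqn. (3.8), with W the
    block matrix of eqn. (3.7) built from the per-criterion priority
    vectors w^(c) (eqn. (3.6)) and v the criteria priorities (eqn. (3.3))."""
    n_ind = sum(len(w) for w in ws)
    W = np.zeros((len(ws), n_ind))
    col = 0
    for c, w in enumerate(ws):
        W[c, col:col + len(w)] = w
        col += len(w)
    return v @ W

def aggregate_aip(P: np.ndarray) -> np.ndarray:
    """AIP aggregation of eqn. (3.9): normalized geometric mean over the
    experts' global weight vectors (one row per expert)."""
    g = np.exp(np.log(P).mean(axis=0))
    return g / g.sum()

# Invented example: 4 criteria with m_c = (2, 3, 2, 2) indicators each.
v1 = np.array([0.40, 0.30, 0.20, 0.10])
ws = [np.array([0.6, 0.4]), np.array([0.5, 0.3, 0.2]),
      np.array([0.7, 0.3]), np.array([0.5, 0.5])]
P1 = global_weights(v1, ws)                  # expert 1
P2 = global_weights(np.full(4, 0.25), ws)    # expert 2
print(aggregate_aip(np.vstack([P1, P2])))    # sums to 1
```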

3.2.2 Evaluating R&D performance

Finally, we need to find a way to determine the priority (or score) of each of the alternatives, i.e. the different projects in our case.

Each of the indicators in a given project can be measured, in general, by means of a standard scale. For instance, "time of a project" (see Section 4) can be easily extrapolated once we know the dates of beginning and end of that given project. So it seems natural, in order to compute the score of each project, to multiply the indicator global priorities by the corresponding R&D measurement and, in fact, here lies the central point of our method.

Once we have determined the global weight of each indicator, we should multiply it by its "performance" parameter. For instance, going back to the example of # of findings / total time of the project mentioned in the previous section, the higher this number is, in a given project, the better the project itself will perform in the final evaluation. This will ensure that the project, among those taken into consideration, with the most performing indicators will be the most valuable for R&D.

However, the alert reader has surely noticed that this can lead to nonsense, as R&D measurements are often dimensionful quantities and it makes no sense to sum them up. Thus, what we propose is to "map" each R&D measurement to a dimensionless parameter lying in the range 0 to 1 using the empirical Cumulative Distribution Function (CDF).

We remind the reader that the CDF of a real-valued random variable X is the function given by

    F_X(x) = P(X \le x) ,    (3.10)

where P(X ≤ x) is the probability that the random variable X takes on a value less than or equal to x. Among its properties, we have that the CDF is a non-decreasing function of its argument and right-continuous. In particular, if X is a continuous random variable,

    \lim_{x \to -\infty} F_X(x) = 0 ,  \lim_{x \to \infty} F_X(x) = 1 .    (3.11)

In integral form the CDF can also be expressed as

    F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt ,    (3.12)

where f_X(x) can be interpreted as a probability density function for the variable X. It is quite trivial to prove that for a continuous random variable X, the random variable Y = F_X(X) has a standard uniform distribution.[4] Indeed,

    F_Y(y) = P(Y \le y) = P(F_X(X) \le y) = P(X \le F_X^{-1}(y)) = F_X(F_X^{-1}(y)) = y .    (3.13)

In practice, we map each R&D measurement variable X_i using the corresponding empirical CDF, in place of the true unknown CDF, so as to obtain variables having an approximately uniform distribution in the range 0 to 1.

[4: If X is a discrete random variable, then its CDF is given by F_X(x) = \sum_{x_i \le x} P(X = x_i), where P(X = x_i) is the probability for X to attain the value x_i. Clearly, in this case the map Y = F_X(X) does not yield a variable with a standard uniform distribution: the resulting variable is still discrete and P(Y = y) = P(X = F_X^{-1}(y)). However, if X can take sufficiently many values, it can be approximately seen as a continuous variable and the aforementioned result approximately holds.]

Thus, the final R&D performance can be computed by means of the following formula,[5]

    S_{R\&D} = \sum_{i=1}^{N_{ind}} P_i^{(group)} F_{X_i}(x_i) .    (3.14)

Note that F_{X_i}(x_i) ≤ 1 for any i = 1, ..., N_ind. Therefore, S_{R&D} ≤ \sum_{i=1}^{N_{ind}} P_i^{(group)} = 1. Thus we conclude that the R&D performance of each project is always normalized to lie in the range 0 to 1:

    0 \le S_{R\&D} \le 1 .    (3.15)

[5: We have assumed throughout that the larger an indicator performance is, the more it will contribute to the R&D performance S_{R&D}. It might very well be that exactly the opposite happens for a given indicator: the smaller an indicator is, the better it is in terms of performance. In that case, it is enough to replace F_{X_i}(x_i) by 1 − F_{X_i}(x_i).]
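In code, the empirical-CDF mapping and the score of eqn. (3.14) amount to a few lines. The following is a sketch with invented numbers (ours, not data from the paper):

```python
import numpy as np

def empirical_cdf(samples, x) -> float:
    """Empirical CDF: the fraction of observed values <= x, used in
    place of the unknown true CDF F_X of eqn. (3.10)."""
    samples = np.asarray(samples)
    return np.count_nonzero(samples <= x) / samples.size

def rd_score(P_group, x_project, data) -> float:
    """S_R&D of eqn. (3.14): data[i] holds the values of indicator i
    over all projects, x_project[i] its value for the scored project."""
    return sum(p * empirical_cdf(d, x)
               for p, d, x in zip(P_group, data, x_project))

# Invented example: 3 indicators observed over 5 projects.
data = [np.array([1, 2, 3, 4, 5]),
        np.array([0.1, 0.4, 0.2, 0.9, 0.5]),
        np.array([10, 30, 20, 50, 40])]
P_group = np.array([0.5, 0.3, 0.2])
print(rd_score(P_group, [4, 0.4, 30], data))    # 0.70, inside [0, 1]
```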

3.3 Consistency of the method

In AHP we are asked to make comparisons between each pair of alternatives. Even though in ideal situations there would not be any inconsistencies, in real situations our decisions are subject to judgmental errors and conflict with each other to some extent. In the following we will stick with the assumption that the error factors are log-normally distributed with zero mean of the logarithm. Let us then proceed to estimate the variance in a generic R&D performance evaluation.

3.3.1 Uncertainty in R&D performance

As just remarked, it is commonly accepted that inconsistencies are log-normally distributed. For example, (Shrestha & Rahman (1991)) found that for a pairwise comparison matrix of dimension n the variance of the error, σ², is well approximated by the formula (2.12), which we report here for clarity,

    \sigma^2 = \frac{2}{(n-1)(n-2)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left[ \log \frac{a_{ij}}{w_i/w_j} \right]^2 ,    (3.16)

where a_{ij} is the pairwise comparison matrix and w_i are the components of the corresponding priority vector.

In our case, at the level of the four criteria (the four perspectives mentioned in the previous section) we would find an error of the form

    \sigma^2 = \frac{1}{3} \sum_{i=1}^{3} \sum_{j=i+1}^{4} \left[ \log \frac{c_{ij}}{v_i/v_j} \right]^2 ,    (3.17)

while for each of the sub-criteria we find

    \sigma_{(c)}^2 = \frac{2}{(m_c-1)(m_c-2)} \sum_{i=1}^{m_c-1} \sum_{j=i+1}^{m_c} \left[ \log \frac{a_{ij}^{(c)}}{w_i^{(c)}/w_j^{(c)}} \right]^2 ,    (3.18)

where, again, c is an index labeling each of the criteria and m_c is the number of sub-criteria for the criterion c.

In a similar fashion, in (Eskandari & Rabelo (2007)) it is argued that the variances associated with each local weight are given by

    \sigma_{v_i}^2 = \frac{15}{16} \left( \sum_{j=1}^{4} v_j^2 - v_i^2 \right) \sigma^2\, v_i^2    (3.19)

for the case of the four criteria, while for the sub-criteria it takes the following form,

    \sigma_{w_i^{(c)}}^2 = \frac{m_c^2 - 1}{m_c^2} \left( \sum_{j=1}^{m_c} \left(w_j^{(c)}\right)^2 - \left(w_i^{(c)}\right)^2 \right) \sigma_{(c)}^2 \left(w_i^{(c)}\right)^2 .    (3.20)

Note that we are assuming no correlation among different criteria or sub-criteria. In this way we can also repack the errors of eqn. (3.20) in the following 4 × N_ind matrix, in the same fashion as eqn. (3.7):

    \sigma_W^2 = \begin{pmatrix} \sigma_{w^{(1)}}^{2\,T} & 0 & 0 & 0 \\ 0 & \sigma_{w^{(2)}}^{2\,T} & 0 & 0 \\ 0 & 0 & \sigma_{w^{(3)}}^{2\,T} & 0 \\ 0 & 0 & 0 & \sigma_{w^{(4)}}^{2\,T} \end{pmatrix} .    (3.21)

Given that we are interested in estimating the final error affecting S_{R&D} for each of the projects, it is necessary to see how the uncertainties propagate. In particular, the variance of the global weight of an indicator (for each of the experts) is found to be

    \sigma_{P_i}^2 = \sum_{j=1}^{4} \left[ \sigma_{v_j}^2 W_{ji}^2 + v_j^2 \left(\sigma_W^2\right)_{ji} \right] ,  i = 1, ..., N_{ind} .    (3.22)

Note that, in order to derive eqn. (3.22), we assumed that the uncertainties affecting the criteria and the sub-criteria are independent of each other. Finally, in order to estimate the error affecting the global weight of an indicator for the whole group of experts we use the general formula (see for instance (Bevington et al. (1993)))

    \sigma_{P_i^{(group)}}^2 = \sum_{l=1}^{N_{exp}} \sum_{j=1}^{N_{ind}} \left( \frac{\partial P_i^{(group)}}{\partial P_j^{(l)}} \right)^2 \sigma_{P_j^{(l)}}^2 ,    (3.23)

where the derivatives are easily computed from eqn. (3.9) to be

    \frac{\partial P_i^{(group)}}{\partial P_j^{(l)}} = \frac{P_i^{(group)} \left( \delta_{ij} - P_j^{(group)} \right)}{N_{exp}\, P_j^{(l)}} .    (3.24)

Here δ_{ij} is the Kronecker delta: δ_{ij} = 1 if i = j and 0 otherwise. The uncertainty on the final outcome S_{R&D} is then easily evaluated to be (the x's are assumed to have no associated statistical error)

    \sigma_{S_{R\&D}}^2 = \sum_{i=1}^{N_{ind}} \sigma_{P_i^{(group)}}^2 F_{X_i}(x_i)^2 .    (3.25)
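A sketch of the propagation of eqns. (3.23)–(3.25) (our illustration; the inputs are invented):

```python
import numpy as np

def group_weight_variance(P_exp: np.ndarray, var_exp: np.ndarray):
    """Propagate the per-expert variances sigma^2_{P_j^(l)} through the
    AIP aggregation using eqns. (3.23)-(3.24)."""
    n_exp, n_ind = P_exp.shape
    g = np.exp(np.log(P_exp).mean(axis=0))
    P_group = g / g.sum()
    var = np.zeros(n_ind)
    for l in range(n_exp):
        for j in range(n_ind):
            delta = np.zeros(n_ind)
            delta[j] = 1.0
            # eqn. (3.24): dP_i/dP_j^(l) = P_i (delta_ij - P_j) / (N_exp P_j^(l))
            dP = P_group * (delta - P_group[j]) / (n_exp * P_exp[l, j])
            var += dP ** 2 * var_exp[l, j]
    return P_group, var

def score_variance(var_P_group, F_values) -> float:
    """Variance of the final score, eqn. (3.25); the CDF values
    F_{X_i}(x_i) carry no statistical error of their own."""
    return float(np.sum(var_P_group * np.asarray(F_values) ** 2))

# Invented example: 2 experts, 3 indicators.
P_exp = np.array([[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]])
var_exp = np.full_like(P_exp, 1e-4)
P_group, var_P = group_weight_variance(P_exp, var_exp)
print(score_variance(var_P, [0.8, 0.6, 0.6]) ** 0.5)   # sigma on S_R&D
```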

4. Application of the method and Results

In this section, we apply our methodology to the R&D performance of 34 projects stored in the DPR&DI (Digital Platform for R&D and Innovation Projects).

The DPR&DI is a PaaS (Platform as a Service) for the management of R&D and industrial innovation projects. It allows one to monitor in real time the progress of any project, to store information and to share data. It can also be used to create connections between the various parties involved in the innovation process, creating a shared space for collaboration that connects researchers, innovators, institutions and funding agencies.[6]

We discuss in detail the various steps to find a project performance by applying the general procedure explained in the previous sections.

[6: It has been developed by Idea-re S.r.l. under the grant delivered by the Region "POR FESR 2014-2020. Asse I Azione 1.3.1. Sostegno alla creazione e al consolidamento di start-up innovative ad alta intensità di applicazione di conoscenza e alle iniziative di spin-off della ricerca" (support for the creation and consolidation of knowledge-intensive innovative start-ups and research spin-offs).]

4.1 Step 0

First of all, we lay out the raw data (Step 0) that we reckoned were necessary to build meaningful indicators to evaluate the R&D performance of the projects in the DPR&DI.

Let us start off by giving all the quantities that we believe are relevant to characterize the magnitude of a project:

Project

– Duration of the project
– Number of calls for tenders
– Number of partners involved in the project
– Number of project activities
– Number of people involved in the project
– Number of people with an education appropriate for the given topic
– Time spent on the project
– Equipment usage time

Second, we believe the impact on R&D is driven also by the amount of findings for a given project. Thus, we proposed to consider also the following quantities:

Findings

– Number of findings (papers, books, conferences, exhibitions, others)
– Number of papers for a given project
– Number of books for a given project
– Number of conferences attended to present a given result
– Number of exhibitions attended to present a given result
– Number of patents for a given project

Moreover, it is crucial to have indicators measuring the total costs of a given project, especially in order to quantify the sustainability of the project itself. Thus, we introduce raw data also for detailing financial reporting and financial support:

Financial Reporting

– Total cost of the project
– Total cost of the project team
– Total cost of equipment
– Total cost of external suppliers
– Total cost of consultants

Financial Support

– Grant eligible expenses
– Tax credit eligible expenses

At this point, a team of experts was asked to give a ranking of the raw data just given in order to form a coherent set of indicators. In particular, this led us to Step 1, where raw data are combined to form the indicators, as explained in Section 3.1.

4.2 Step 1

As already anticipated, a statistical analysis made over the experts' opinions has led to a set of indicators that can be used to evaluate R&D performance. These are reported in Table 1. As we can see, there are 5 indicators for the Internal Business perspective, 6 for Innovation and Learning, 5 for the Financial perspective and 4 for the Alliances and Networks perspective. We have thus created a layer of 20 indicators (sub-criteria), each associated with a given perspective (criterion). This, along with the 34 projects considered in this study, makes up the basic AHP structure in the R&D performance evaluation.

4.3 Step 2

We are now ready, as for Step 2, to compute the R&D performance for the 34 selected projects using the formulas spelled out in Section 3.2.

In particular, the distribution of the R&D performance scores is depicted in Fig. 2a. We can see that the distribution is quite uniform, and all scores lie (approximately) in the range 0.6 to 0.8 (remember that S_R&D is normalized to be in the range 0 to 1).

As for the consistency of our results, we can employ formula (3.25). Scores with errors are shown in Fig. 2b.

Table 1: Indicators selected by the team of experts consulted to evaluate R&D performance.

Internal Business perspective:
– Number of findings / Cost of the project
– Number of people in the project / Project duration
– Grant eligible expenses
– Time spent on the project / Number of people involved
– Time spent on the project / Number of activities

Innovation and Learning perspective:
– Number of papers / Number of people in the project
– Number of books / Number of people in the project
– Number of patents / Total cost of the project
– Number of findings / Duration of the project
– Number of papers / Total cost of the project
– Number of findings / Time spent on the project

Financial perspective:
– Total cost of the team / Total cost of the project
– Total cost of the suppliers / Total cost of the project
– Total cost of equipment / Total cost of the project
– Grant eligible expenses / Total cost of the project
– Time spent on the project / Total cost of the project

Alliances and Networks perspective:
– Number of partners
– Number of partners / Time spent on the project
– Number of project activities / Total cost of suppliers
– Number of patents / Number of suppliers

[Figure 2: Score distribution for the 34 projects stored in the DPR&DI. (a) Histogram of the score distribution: the y-axis gives the relative probability of finding a given score (x-axis), with scores between 0.60 and 0.80. (b) Scores with error bars, plotted against the project id.]

As we can see, the σ² on any given project is quite significant, making it hard to identify precisely which project performs best in this particular analysis. This is essentially due, as we would expect, to the degree of inconsistency allowed when forming the pairwise comparisons. What could be nice to do is to compute the probability of inversion of two given projects in the final ranking. We leave issues like this for future studies.
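As a hint of what such a computation could look like, here is a minimal Monte Carlo sketch (entirely ours, not a procedure defined in the paper), under the assumption of independent Gaussian errors on the two scores:

```python
import numpy as np

def inversion_probability(s_a, sig_a, s_b, sig_b, n_draws=100_000, seed=0):
    """Estimate P(project b outranks project a) given scores and their
    uncertainties, assuming independent Gaussian errors (an assumption)."""
    rng = np.random.default_rng(seed)
    draws_a = rng.normal(s_a, sig_a, n_draws)
    draws_b = rng.normal(s_b, sig_b, n_draws)
    return float(np.mean(draws_b > draws_a))

# Two hypothetical projects with overlapping error bars.
print(inversion_probability(0.72, 0.03, 0.70, 0.04))    # ~0.34
```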

5. Discussion and Conclusions

In this paper we considered a new approach to determine R&D performance based on the group-AHP method. As explained thoroughly in the main text, the AHP method is a powerful method that allows one to quantify the relative weights of criteria in a decision problem. In particular, any decision process is suitably decomposed into a hierarchy of sub-problems that are usually rather easy to deal with.

In this paper the decision process of the AHP method corresponds, roughly speaking, to determining which among a list of R&D projects has the best performance according to a number of criteria (perspectives) and sub-criteria (indicators) selected by a team of experts.

The need for a systematic and quantitative analysis of the performance of R&D projects relies on the fact that, nowadays, R&D is one of the most significant determinants of the productivity and growth of companies, organizations, governments, etc. Thus, it has become somewhat crucial to have at our disposal an intuitive, easy, efficient, yet systematic and analytical method to quantify R&D performance.

More in detail, we started off in Section 2 by describing the basics of the AHP method as originally developed by Saaty, outlining all the important steps to follow in a decision process in order to determine the best among a set of alternatives. In Section 3 we laid out the general procedure of our proposed method in order to define the basic AHP structure for R&D performance evaluation. As we have seen in the main text, this is essentially based on a set of questionnaires handed to a team of experts who are asked, through a number of steps, to define a consistent hierarchical structure of the AHP-based method. Then we gave more mathematical details on how a quantitative evaluation of R&D performances and the relative inconsistencies can be carried out. Finally, in Section 4 we presented an example of our method for the case of a number of projects stored in the DPR&DI platform.

We believe that our results might have important implications for those companies, organizations and public administrations interested in determining R&D performance. First of all, we provided a method for a firm to make comparisons between its R&D projects. In this way managers are facilitated in understanding which project is more deficient and in which area (perspective), or even in formulating more effective strategies to improve the R&D performance of low-scoring projects according to their own objectives. Second, our method offers a way of comparing a company's global R&D performance to the performance of other firms.

To sharpen our work further, it could be interesting to study and quantify the compatibility of the different experts involved in the decision process (i.e. how far off they are with respect to one another); see for instance (Aguarón et al. (2019)). Another interesting direction might be that of aggregating the data from different experts (eqn. (3.9)) using a weighted geometric mean. For example, we could set up a computation where the more consistent an expert has been in writing down pairwise comparison matrices, the more weight she/he will have in the computation of priorities. Moreover, it would be interesting to find a way of including more perspectives than those considered in this paper (Internal Business, Innovation and Learning, Financial, and Network and Alliances). In this way, we may hope to build a more general method suitable to many more organizations. We hope to tackle all these problems in the near future.

Acknowledgments

It is a great pleasure to thank the Italian Association of University Professors of Mechanical and Thermal Measurement for its support during the realization of the present paper.

References

Aguarón, J., Escobar, M. T., Moreno-Jiménez, J. M. & Turón, A. (2019), 'AHP-group decision making based on consistency', Mathematics 7(3), 242.

Alonso, J. A. & Lamata, M. T. (2006), 'Consistency in the analytic hierarchy process: a new approach', International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 14(04), 445–459.

Bevington, P. R., Robinson, D. K., Blair, J. M., Mallinckrodt, A. J. & McKay, S. (1993), 'Data reduction and error analysis for the physical sciences', Computers in Physics 7(4), 415–416.

Bremser, W. G. & Barsky, N. P. (2004), 'Utilizing the balanced scorecard for R&D performance measurement', R&D Management 34(3), 229–238.

Chiesa, V., Frattini, F., Lazzarotti, V. & Manzini, R. (2009), 'Performance measurement in R&D: exploring the interplay between measurement objectives, dimensions of performance and contextual factors', R&D Management 39(5), 487–519.

Crawford, G. & Williams, C. (1985), 'A note on the analysis of subjective judgment matrices', Journal of Mathematical Psychology 29(4), 387–405.

De Jong, P. (1984), 'A statistical approach to Saaty's scaling method for priorities', Journal of Mathematical Psychology 28(4), 467–478.

Donegan, H., Dodd, F. J. & McMaster, T. (1992), 'A new approach to AHP decision-making', Journal of the Royal Statistical Society: Series D (The Statistician) 41(3), 295–302.

Dong, Y., Zhang, G., Hong, W.-C. & Xu, Y. (2010), 'Consensus models for AHP group decision making under row geometric mean prioritization method', Decision Support Systems 49(3), 281–289.

Eskandari, H. & Rabelo, L. (2007), 'Handling uncertainty in the analytic hierarchy process: A stochastic approach', International Journal of Information Technology & Decision Making 6(01), 177–189.

Griffin, A. (1997), 'PDMA research on new product development practices: Updating trends and benchmarking best practices', Journal of Product Innovation Management 14(6), 429–458.

Ishizaka, A. & Labib, A. (2011), 'Review of the main developments in the analytic hierarchy process', Expert Systems with Applications 38(11), 14336–14345.

Jefferson, G. H., Huamao, B., Xiaojing, G. & Xiaoyun, Y. (2006), 'R&D performance in Chinese industry', Economics of Innovation and New Technology 15(4-5), 345–366.

Kaplan, R. S., Norton, D. P. et al. (1996), 'Using the balanced scorecard as a strategic management system'.

Kerssens-van Drongelen, I. C. & Bilderbeek, J. (1999), 'R&D performance measurement: more than choosing a set of metrics', R&D Management 29(1), 35–46.

Kim, B. & Oh, H. (2002), 'Economic compensation compositions preferred by R&D personnel of different R&D types and intrinsic values', R&D Management 32(1), 47–59.

Lazzarotti, V., Manzini, R. & Mari, L. (2011), 'A model for R&D performance measurement', International Journal of Production Economics 134(1), 212–223.

Moncada-Paternò-Castello, P., Ciupagea, C., Smith, K., Tübke, A. & Tubbs, M. (2010), 'Does Europe perform too little corporate R&D? A comparison of EU and non-EU corporate R&D performance', Research Policy 39(4), 523–536.

Saaty, T. L. (1977), 'A scaling method for priorities in hierarchical structures', Journal of Mathematical Psychology 15(3), 234–281.

Saaty, T. L. (1990), 'How to make a decision: the analytic hierarchy process', European Journal of Operational Research 48(1), 9–26.

Saaty, T. L. (2010), Mathematical Principles of Decision Making (Principia Mathematica Decernendi), RWS Publications.

Salimi, N. & Rezaei, J. (2018), 'Evaluating firms' R&D performance using best worst method', Evaluation and Program Planning 66, 147–155.

Shrestha, G. & Rahman, S. (1991), 'A statistical representation of imprecision in expert judgments', International Journal of Approximate Reasoning 5(1), 1–25.

Tidd, J., Bessant, J. & Pavitt, K. (2000), Managing Innovation.
