Genetic Libraries: and Knowledge Assets*

Gordon C. Rausser, University of California, Berkeley
Arthur A. Small, III, Columbia University

DRAFT October, 2001

Abstract

The article examines the incentives facing owners of genetic resource collections to invest in collecting data about their materials. Characterization converts an unordered collection of materials into a cataloged library. Researchers filter these data through models that suggest how well observed characteristics correlate with potential success in R&D projects. The catalog enables the manufacture of variance in the researcher's distribution of beliefs regarding the utility of different leads. It allows her to focus her search on promising subsets of the collection, increasing the efficiency of research. Because "cataloged" leads are more likely to land in the upper tail of this distribution, they are more likely to be tested and, thus, are more likely to provide value. They are also more likely to be avoided when they are not in fact promising. A formal framework allows the returns on complementary investments in conservation and data management to be quantified.

*Gordon C. Rausser is Robert Gordon Sproul Distinguished Professor of Agricultural and Resource Economics, University of California at Berkeley; and Member, Giannini Foundation of Agricultural Economics. Arthur A. Small, III, is Assistant Professor in the School of International and Public Affairs at Columbia University. For thoughtful comments and suggestions we thank, without attributing either endorsement or blame: Christopher Costello, Geoff Heal, Steve Polasky, David Simpson, and workshop participants at the National Center for Ecological Analysis and Synthesis, Santa Barbara, California. Correspondence to: Arthur A. Small, Columbia University SIPA, 420 West 118th Street, Mail Code 3323, New York, NY 10027, USA; tel: 1-212-854-9016; fax: 1-212-854-5765; email: [email protected]; web: www.columbia.edu/~aas69.

Within the economics of , one class of questions concerns the role of genetic resources as direct inputs to processes of economic production. This role raises issues about the valuation of genetic resources, the institutions by which they are managed, and the incentives for conservation. This discussion includes the role of genetic resources as inputs to R&D processes leading to new drugs, new crops, and other innovations. Economists and others have investigated the value of particular genetic materials in particular applications (Smale et al., AJAE), the value of bioprospecting as a source of conservation finance [7], and the design of intellectual property rights and other institutions governing genetic resources [1].

The economics of bioprospecting begins from an understanding of two technical characteristics of the market for research leads. The first is uncertainty: until tests are undertaken, it is unknown whether any given lead will yield a valuable discovery. The second is redundancy: roughly speaking, any given discovery can be made only once. Two copies of a potential discovery are in this sense redundant. In expectation, each lead imposes a negative externality on all other leads.

Because of uncertainty and redundancy, the role of prior information is critical. If all leads are associated with identical success probabilities, then the marginal lead in a large collection has very little value [9]. However, if there is variance in the investigator's prior beliefs about the hit rates of different leads, and if she can focus her sampling on the upper tail of this distribution, then unusually promising leads can command significant information rents [8]. These rents are largely a function of the role of information in reducing search costs.

There are at least two rejoinders to the Rausser-Small analysis [8]. In their work (as in almost all formal theory about bioprospecting) information is supplied exogenously and freely. In this story, genetic resources capture part of the knowledge spillovers created by public-sector science. This analysis raises the question: how much knowledge should the public sector provide? When should the creation of information be left in private hands?

The conclusions developed in these studies were based on a maintained hypothesis that information is specified exogenously. However, since information can increase the returns to holding genetic resource assets, the owners of these resources have incentives to improve the quality of available information concerning their collections. Rather than attracting the interest of R&D firms by lowering their prices, resource owners can instead attempt to manufacture promising beliefs on the part of these buyers concerning the utility of the seller's collection.

Managers of genetic resources have attempted to add value to their collections by providing data on their materials. The United States Department of Agriculture, for example, has developed an Internet-accessible database¹ of its collection of rare and unusual crop varieties. Breeders from around the world can use the system to identify varieties that express characteristics of potential utility, such as drought tolerance or resistance to a particular pest. Breeders can then request samples from these varieties, to incorporate into new crops that will, with effort and luck, also express the desired characteristics. Databases are also being developed for biodiversity collections used for medical bioprospecting. Costa Rica's National Institute for Biodiversity (INBio) has developed a bar-coding based system to inventory the wide range of plants and other organisms it has sampled as part of its bioprospecting contract with Merck, a large pharmaceutical firm.

¹ Genetic Resources Information Network, www.ars-grin.gov.

These investments in data collection and management have been based on an intuitive understanding of their useful role in the technology development process. Genetic resource managers acknowledge that they do not currently have rigorous methods for estimating the likely returns on these investments. It is unclear, therefore, whether there is over- or under-investment in database development at the margin. The lack of a framework for measuring returns on investment hinders the development of the bioprospecting industry. This factor was cited specifically as the reason Costa Rica recently abandoned plans for a $90 million all-taxa survey of a conservation reserve on the country's Pacific Coast [5].

This article attempts to identify how such a calculus could be developed. The task requires that we examine in detail how research firms formulate their beliefs about the potential utility of leads. Suppose that potential research projects appear with some frequency, and that each time a project appears, R&D firms survey the set of available research leads for promising possibilities. The space of available research approaches could be enormous; indeed, this situation virtually defines a genuine research problem. As noted, research almost never proceeds via brute-force investigation of all possible options. Typically, research is a viable endeavor only when investigators are able somehow to organize a daunting space of options into a manageable number of categories, according to meaningful conceptual distinctions. In other words, the investigator must have or construct a useful model of the system she studies, if experiments upon that system are to yield fruitful results. Models serve as filters for the available data, allowing researchers to form beliefs about the efficacy of alternative leads.

This discussion suggests how the value-adding role of database development can be quantified. To motivate the basic issues, consider the problem facing a researcher who seeks some highly specific information in a large but poorly organized library. Her first step, obviously, will be to check the catalog, in the hope of gaining guidance as to where, in the great mass of material, she might find what she is looking for. Suppose, however, that there were no catalog, and that the books were unmarked and arranged completely at random throughout the library. In this case, the patron would have no choice but to assign uniform priors to the books. Her only options would then be to commence a brute-force sequential search of the entire collection, or to give up. When a catalog is available, on the other hand, the patron can focus on a smaller subset of promising books, while entirely ignoring others.
In other words, the availability of relevant data on the collection (including "generic" data that may not have been gathered with the patron's particular project in mind) strengthens the patron's ability to generate informative prior beliefs over the likelihood that any given book contains the information she desires. As more data become available, priors become sharper, and the books become more finely differentiated. Improved organization does not affect the "average utility" that a patron would expect from a book drawn from the shelf at random. Rather, the creation of a catalog can be seen as enabling the manufacture of variance in the patron's distribution of beliefs regarding the utility of different books.

Pursuing this theme, this article examines the incentives facing owners of research leads to invest in preliminary characterization of their materials. "Preliminary characterization" refers to inventorying, collecting data on general characteristics of their collections, and organizing these data into a publicly accessible database. In effect, it involves the conversion of an unordered collection of materials into a cataloged library. Researchers filter these data through models that suggest how well observed characteristics correlate with potential success in a specific innovation project. The availability of the catalog enables a mean-preserving spread in the patron's distribution of beliefs. A catalog is valuable to the patron under the reasonable assumption that she is able to focus her search on a promising subset of materials selected from the inventory. Each patron conducts a sequential search, beginning with the most promising candidates, those books in the "upper tail" of her distribution of beliefs. Because "cataloged" books are more likely to land in the upper tail of this distribution, they are more likely to be tested, and thus are more likely to provide value. "Cataloged" books are thus more valuable than uncataloged books, even if they are not on average more likely to contain any particular information desired by a given patron. Furthermore, the value of the catalog increases with the number of library patrons.

The article examines several questions. How does the availability of a catalog affect the returns to applied research? What incentives are there for the owners of genetic resources to undertake the creation of a catalog? How does the option to build a catalog of biological data affect the imputed returns to holding genetic resource assets? These questions are investigated with the aid of a simple formal model of the innovation discovery process in which firms' beliefs are partly endogenous. Analysis of the model shows that the construction of a catalog has significant effects on the imputed value of the collection. Characterized leads are in expectation more valuable than uncharacterized leads. The value is a function of the degree to which information allows search costs to be avoided, and allows projects to be undertaken that might otherwise not have been. Indeed, characterized leads have a positive value in expectation even if materials are abundant. In this situation, the marginal value of a lead becomes entirely a function of the complementary knowledge assets. These knowledge assets thereby generate an unambiguous increase in the returns to holding leads.

Various questions connected with evaluation of genetic resources have been addressed in economics and elsewhere. Small [10, Chapter 5] presents a simpler version of the work presented here. It likewise builds on earlier work in the theory of search.

The work is closest in spirit to that of Evenson and Kislev [3], who model applied research as a process of repeated sampling from a known distribution, the parameters of which can be changed through basic research. It is particularly related to a version of their model [2, pp. 150-154] in which the variance of the sampling distribution can be increased, at some cost, according to a known function. The distinction between this work and ours turns on a delicate but essential difference between information (or beliefs) about a sample population, and the properties of the population itself. Evenson and Kislev examine the value created by increasing the variance of a population from which samples are drawn. (A library that contains only cookbooks is for most people less useful than a library with wide-ranging subject coverage.) The present work considers investments in pre-sorting a population so that, once a project is identified, sampling can focus on a promising sub-population. (For all patrons, a cataloged library is more useful than an unorganized collection of books.) Intuitively, we would expect the two types of investments to be complementary. (The best libraries are large, diverse, and orderly.)

Koo and Wright [6] focus on the timing of the evaluation investment: specifically, on the conditions under which it is preferable to wait until after an infestation of an agricultural pest before conducting a search of a genebank for desired genetic material. As used by Koo and Wright, the term "evaluation" is project-specific. It corresponds to the search in a collection for a specific useful trait; a successful search creates the option to develop a new crop variety which expresses that trait. In this paper, the terms "evaluation" and "characterization" refer to a prior stage of preparatory activity: collecting general data about accessions, possibly before any specific development application is identified. Gollin, Smale and Skovmand [4] consider the marginal value of accessions.

1 A Model of the Genebank Manager's Problem

A simple model of the production process for genebank services is presented. A genebank is represented as a library of $\bar N$ objects, called "accessions" or "leads." Of these, some number $N \le \bar N$ have been described, or characterized, according to some qualitative and quantitative characteristics. Characterization involves the collection of data that associates the accession with an element in a space $X$ of characteristics. Data are generic in the sense that they appear to be useful generally in helping investigators to find accessions of interest, but they are not chosen for their value in any particular application.

The distribution of types represents the state of scientific knowledge, and the ability of applied researchers to use available data to formulate beliefs about the promise of alternative lines of research. It can be imagined that each lead has been measured in several dimensions that correlate with its potential as a viable source of discoveries. Through measurement, each lead is associated with a corresponding point in some space $X$ of characteristics. The state of scientific knowledge is represented by a function $P : X \to [0, 1]$ that associates each point in characteristics space with some hit rate. Thus if lead $n$ has characteristics $x_n \in X$, $P(x_n) = p_n$ denotes the probability that a test of lead $n$ yields a discovery, for the researcher's defined project.

Given this initial indication of value, the researcher can elect to examine accessions individually to ascertain their suitability in her application. The investigator tests accessions sequentially, at a cost $c$ per test, where $c$ is a positive constant. Let $B$ denote the social welfare generated by the discovery, net of any additional development costs. It is assumed that a discovery need be made only once; multiple hits are redundant. The sole behavioral assumption is that researchers select the order in which they test accessions so as to maximize the expected payoff to the project.

Given our assumptions about the incentives facing researchers, their behavior is characterized by the result, noted earlier, that optimal search involves checking the most promising leads first [11]. In light of this result, the researcher begins her search by testing only the promising accessions. If this pool is exhausted without a discovery, the researcher moves on to examine the pool of uncharacterized accessions.
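The ordering rule can be illustrated with a minimal Python sketch. It is not part of the original analysis: the function name `expected_search_value` and the numerical values chosen for `B`, `c` and the hit rates are illustrative assumptions. The sketch tests leads in descending order of hit rate, stops at the first discovery, and skips any lead whose expected payoff $pB - c$ is negative.

```python
def expected_search_value(hit_rates, B, c):
    """Expected net payoff of a sequential search over independent leads.

    Leads are tested in descending order of hit rate, at cost c per test.
    The search stops at the first discovery (worth B), or when the next
    lead is not worth testing in expectation (p * B - c < 0).
    """
    value = 0.0
    prob_no_discovery_yet = 1.0
    for p in sorted(hit_rates, reverse=True):
        if p * B - c < 0:
            break  # every remaining lead is at least as unpromising
        value += prob_no_discovery_yet * (p * B - c)
        prob_no_discovery_yet *= 1.0 - p
    return value


# Hypothetical numbers: 100 leads, discovery payoff B, cost c per test.
B, c = 1000.0, 1.0
cataloged = [0.01] * 10 + [0.0005] * 90      # beliefs differ across leads
uncataloged = [sum(cataloged) / 100] * 100   # same mean, uniform priors

value_cataloged = expected_search_value(cataloged, B, c)
value_uncataloged = expected_search_value(uncataloged, B, c)
```

Under these assumed numbers the two collections have the same average hit rate, yet the differentiated beliefs yield the higher expected search value, because testing concentrates on the ten promising leads and the ninety weak ones are never tested.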

1.1 A Simple Example

Many qualitative insights can be gained by examining the very simple case in which there are only two categories of leads, "promising" and "marginal." Those that are "promising" have a probability $p_1$ of containing the desired trait. "Marginal" (or "unpromising") accessions have a probability $p_0 \le p_1$. Thus the $k$th patron's model is represented by a function $P_k : X \to \{p_0, p_1\}$ that gives the probability that an examination of a lead with a given set of characteristics will contain the trait the firm seeks. Each patron finds that a fraction $\theta$ of cataloged accessions are promising in her application. Thus

$$ P(x_n) = \begin{cases} p_1 & \text{with probability } \theta \\ p_0 & \text{with probability } 1-\theta \end{cases} \tag{1} $$

The hit rate $p$ for an accession that has not been cataloged is, therefore, the expected value of the hit rates for those that have: $p = \theta p_1 + (1-\theta)p_0$.

A number $N$ of accessions have been characterized. Of these, the researcher finds that a fraction $\theta$ are promising. Let $N_1 = \theta N$ denote the number of "promising" accessions with hit rate $p_1$, $N_0 = (1-\theta)N$ the number of "unpromising" accessions with hit rate $p_0$, and $M = \bar N - N$ the number of accessions that have not been characterized, with hit rate $p$. We also adopt the (credible) simplifying assumption that unpromising accessions are not worth testing, in expectation: $p_0 B - c < 0$. Since $p_0 < c/B$, the stopping rule implies that unpromising accessions are never tested.

Let $\hat V$ denote the continuation value of the project conditioned on the event that all promising accessions have been tested unsuccessfully:

$$ \hat V = \sum_{i=1}^{\bar N - N} q^{\,i-1}(pB - c) = \left(1 - q^{\bar N - N}\right)(B - c/p) \tag{2} $$

where $q = 1 - p$ is the probability that a test of an uncharacterized accession will result in failure. Equation (2) presumes that an uncharacterized lead is worth testing, $pB - c \ge 0$. Similarly, writing $q_1 = 1 - p_1$,

$$ V = \left(1 - q_1^{N_1}\right)(B - c/p_1) + q_1^{N_1}\hat V \tag{3} $$

$$ \phantom{V} = \left(1 - q_1^{\theta N}\right)(B - c/p_1) + q_1^{\theta N}\left(1 - q^{\bar N - N}\right)(B - c/p). $$

Viewing this formula as a function of $N$, the number of characterized leads, and rearranging terms yields

$$ V(N, \bar N) = \left[1 - F(N, \bar N)\right] B - \left[\frac{1 - q_1^{\theta N}}{p_1} + q_1^{\theta N}\,\frac{1 - q^{\bar N - N}}{p}\right] c. \tag{4} $$

Here, $F(N, \bar N) = q_1^{\theta N} q^{\bar N - N}$ denotes the probability that the project fails. Hence a success requires clearing two hurdles. First, a lead must be judged sufficiently promising to be worth testing. Second, the lead must then yield a successful trial. The value of characterization depends on how it affects outcomes at both stages, and on what would have happened absent characterization.
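As a numerical companion to equations (2) through (4), the following Python sketch evaluates the project value both ways and lets the two forms be compared. The function names and the parameter values are illustrative assumptions, chosen only so that $p_0 B < c < pB$; they do not come from the paper.

```python
def continuation_value(n_unchar, p, B, c):
    """Equation (2): value of searching n_unchar uncharacterized leads."""
    q = 1.0 - p
    return (1.0 - q ** n_unchar) * (B - c / p)


def project_value(N, N_bar, theta, p1, p0, B, c):
    """Equation (3): promising leads first, then the uncharacterized pool."""
    p = theta * p1 + (1.0 - theta) * p0   # hit rate of an uncharacterized lead
    q1 = 1.0 - p1
    n_promising = theta * N
    v_hat = continuation_value(N_bar - N, p, B, c)
    return (1.0 - q1 ** n_promising) * (B - c / p1) + q1 ** n_promising * v_hat


def project_value_eq4(N, N_bar, theta, p1, p0, B, c):
    """Equation (4): the same value, split into benefit and search cost."""
    p = theta * p1 + (1.0 - theta) * p0
    q, q1 = 1.0 - p, 1.0 - p1
    fail = q1 ** (theta * N) * q ** (N_bar - N)          # F(N, N_bar)
    expected_tests = ((1.0 - q1 ** (theta * N)) / p1
                      + q1 ** (theta * N) * (1.0 - q ** (N_bar - N)) / p)
    return (1.0 - fail) * B - expected_tests * c


# Hypothetical parameters satisfying p0 * B < c < p * B.
params = dict(theta=0.1, p1=0.01, p0=0.0005, B=1000.0, c=1.0)
values = {N: project_value(N, 1000, **params) for N in (0, 250, 500, 1000)}
```

With these assumed parameters, the entries of `values` rise as more of the fixed stock of 1000 leads is characterized, which is the qualitative point of the example.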

1.2 The Collection Manager's Investment Problem

We suppose that the manager does not know in advance the applications that R&D firms will have in mind when they access the bank. However, the manager is familiar with the general form of the R&D firm's problem, including the distribution of probabilities over the different types. Her problem is to allocate a budget amongst competing activities: collection, conservation and storage, characterization, the development of information systems, basic research, and dissemination of data and materials.

1.3 The Demand for Data

The R&D firm's demand for characterization is the benefit of a marginal increase in $N$, holding constant the total resource stock $\bar N$. Note that $\frac{\partial}{\partial N} q^{\theta N} = q^{\theta N}\,\theta \ln q$. Furthermore, since $p$ is assumed to be small, we can use the approximation $\ln q = \ln(1 - p) \cong -p$. Rearranging terms, we have

$$ \left.\frac{\partial V}{\partial N}\right|_{\bar N\ \text{constant}} \cong q_1^{\theta N}\left[\theta\left(1 - q^{\bar N - N}\right)\gamma c + (1-\theta)\, q^{\bar N - N}(c - p_0 B)\right]. \tag{5} $$
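A quick way to see how tight the small-$p$ approximation in (5) is: compare it with a central finite difference of the exact value in (4), treating $N$ as continuous. The sketch below does this in Python. The parameter values and function names are illustrative assumptions, and $\gamma = (p_1 - p)/p$ is the ratio defined in the interpretation of the formula.

```python
def project_value(N, N_bar, theta, p1, p0, B, c):
    """Equation (4), with N treated as a continuous variable."""
    p = theta * p1 + (1.0 - theta) * p0
    q, q1 = 1.0 - p, 1.0 - p1
    a = q1 ** (theta * N)      # probability that all promising leads fail
    b = q ** (N_bar - N)       # probability that all uncharacterized leads fail
    return (1.0 - a * b) * B - ((1.0 - a) / p1 + a * (1.0 - b) / p) * c


def marginal_characterization_value(N, N_bar, theta, p1, p0, B, c):
    """Equation (5): approximate dV/dN holding N_bar constant."""
    p = theta * p1 + (1.0 - theta) * p0
    q, q1 = 1.0 - p, 1.0 - p1
    gamma = (p1 - p) / p
    a = q1 ** (theta * N)
    b = q ** (N_bar - N)
    return a * (theta * (1.0 - b) * gamma * c
                + (1.0 - theta) * b * (c - p0 * B))


# Hypothetical parameters satisfying p0 * B < c < p * B.
params = dict(theta=0.1, p1=0.01, p0=0.0005, B=1000.0, c=1.0)
N, N_bar, step = 200.0, 1000.0, 1e-3
finite_difference = (project_value(N + step, N_bar, **params)
                     - project_value(N - step, N_bar, **params)) / (2 * step)
approximation = marginal_characterization_value(N, N_bar, **params)
relative_gap = abs(finite_difference - approximation) / abs(finite_difference)
```

Because the only approximation used is $\ln(1 - p) \cong -p$, the gap between the two numbers should be of the same order as the hit rates themselves.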

The formula is interpreted as follows. With probability $q_1^{\theta N}$, all the other characterized accessions that merit testing are tested unsuccessfully. Conditioned on this event, the R&D firm makes a preliminary evaluation of the marginal characterized lead. With probability $\theta$, this lead is judged to be promising. By focusing attention on this promising lead instead of an average lead, the R&D firm reduces the expected number of trials needed to generate a discovery, thereby avoiding some search costs. The amount of savings depends on the size and fertility of the "average pool," as well as on the degree $\gamma = (p_1 - p)/p$ to which promising leads are superior to average leads.

A benefit is also realized, however, if the lead is judged to be unpromising. In this case, which occurs with probability $1 - \theta$, the researcher avoids the cost of testing a low-probability "dud."

Note that as the total resource stock grows, i.e. as $\bar N \to \infty$, the second term goes to zero (along with all impact of $B$). However, the first term actually gets larger: as the haystack grows, information on the whereabouts of the needle becomes increasingly important.

As economic objects, data have wonderful properties: they are completely durable and reusable infinitely many times at essentially zero cost. Once an accession has been characterized, the data about it can be used by each researcher who appears with a project. Suppose that researchers appear with projects according to a Poisson process with arrival rate $\lambda$ per year. Then in expectation there will be $\lambda$ searches per year. If future costs and benefits are discounted at the rate of interest $r$, then the total net present benefit of maintaining the collection is given by

$$ \sum_{t=0}^{\infty} \lambda (1 + r)^{-t}\, V = \frac{1 + r}{r}\,\lambda V. \tag{6} $$

This value is gross of any recurring costs associated with collection maintenance. The breeding industry's demand for characterization is the net present value of marginal characterization benefits:

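$$ \sum_{t=0}^{\infty} \lambda (1 + r)^{-t} \left.\frac{\partial V}{\partial N}\right|_{\bar N} \cong \frac{1 + r}{r}\,\lambda\, q_1^{\theta N}\left[\theta\left(1 - q^{\bar N - N}\right)\gamma c + (1-\theta)\, q^{\bar N - N}(c - p_0 B)\right]. \tag{7} $$

To conserve accessions requires resources. Given advances in biotechnology, and the promise that researchers can soon use "any gene from any species" in their crop development efforts, does it make sense to spend resources on conservation? In other words, what is the marginal benefit to a research enterprise of conserving an accession? The answer depends on whether the lead has been characterized. The marginal value of a characterized lead, holding the number of uncharacterized leads $M = \bar N - N$ constant, is

$$ v(N, \bar N) = \left.\frac{\partial V}{\partial N}\right|_{M\ \text{const}} = \theta q_1^{\theta N}\left[q^{\bar N - N}(\gamma + 1)(pB - c) + \gamma c\right]. \tag{8} $$

The marginal value of an uncharacterized lead is

$$ \hat v(N, \bar N) = \left.\frac{\partial V}{\partial \bar N}\right|_{N\ \text{const}} = q_1^{\theta N} q^{\bar N - N}(pB - c). \tag{9} $$

Discounted over the stream of arriving projects as in (6) and (7), these marginal values are linear in $\lambda$. This reflects a feature common to the economics of innovation: since data are not consumed through use, the larger the production base over which the innovation can be spread, the more valuable the innovation becomes.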
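The marginal values and the discount multiplier can be checked numerically. The Python sketch below rests on two labeled assumptions: that the marginal value of a characterized lead is taken with the number of uncharacterized leads $M = \bar N - N$ held fixed, and that the parameter values (including the arrival rate and interest rate) are purely illustrative. Function names are hypothetical.

```python
def project_value(n_char, n_unchar, theta, p1, p0, B, c):
    """Equation (4), written in terms of N and M = N_bar - N."""
    p = theta * p1 + (1.0 - theta) * p0
    q, q1 = 1.0 - p, 1.0 - p1
    a = q1 ** (theta * n_char)
    b = q ** n_unchar
    return (1.0 - a * b) * B - ((1.0 - a) / p1 + a * (1.0 - b) / p) * c


def marginal_values(n_char, n_unchar, theta, p1, p0, B, c):
    """Equations (8) and (9) under the small-hit-rate approximation."""
    p = theta * p1 + (1.0 - theta) * p0
    gamma = (p1 - p) / p
    a = (1.0 - p1) ** (theta * n_char)
    b = (1.0 - p) ** n_unchar
    v_char = theta * a * (b * (gamma + 1.0) * (p * B - c) + gamma * c)
    v_unchar = a * b * (p * B - c)
    return v_char, v_unchar


def present_value(value_per_search, arrival_rate, r):
    """Equations (6)-(7): sum over t >= 0 of lambda * (1 + r)**-t * value."""
    return (1.0 + r) / r * arrival_rate * value_per_search


# Hypothetical parameters satisfying p0 * B < c < p * B.
params = dict(theta=0.1, p1=0.01, p0=0.0005, B=1000.0, c=1.0)
n_char, n_unchar, step = 200.0, 800.0, 1e-3
v_char, v_unchar = marginal_values(n_char, n_unchar, **params)

# Finite differences: one more characterized lead (M fixed) and one more
# uncharacterized lead (N fixed).
fd_char = (project_value(n_char + step, n_unchar, **params)
           - project_value(n_char - step, n_unchar, **params)) / (2 * step)
fd_unchar = (project_value(n_char, n_unchar + step, **params)
             - project_value(n_char, n_unchar - step, **params)) / (2 * step)

npv_char = present_value(v_char, arrival_rate=5.0, r=0.05)
```

Under these assumed numbers the characterized lead carries the larger marginal value, and doubling the arrival rate doubles the present value, consistent with the linearity in $\lambda$.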

2 Discussion and Extensions

The foregoing discussion has been kept intentionally simple, in order to allow clear exposition of the relevant insights. However, the approach presented here, and the conclusions it generates, do not depend on these assumptions specifically.

In particular, the approach generalizes readily to cases in which the hit rates of characterized leads represent independent draws from a general distribution. Suppose that the hit rate of a characterized lead is a random variable $\tilde p$ with probability density function $\phi(s)$, where $0 < s < 1$, so that $p = E[\tilde p] = \int_0^1 s\,\phi(s)\,ds$. If average leads are insufficiently promising to merit testing (if $pB - c < 0$), the value to an R&D firm of access to the collection can be shown to take the form

$$ V(N) = \left[1 - e^{-h(c/B)N}\right] B - \left[\int_{c/B}^{1} e^{-h(s)N}\, N\phi(s)\,ds\right] c \tag{10} $$

where $h(s) = \int_s^1 \sigma\phi(\sigma)\,d\sigma = \Pr[\tilde p > s]\cdot E[\tilde p \mid \tilde p > s]$. Here, $e^{-h(s)N}$ is the probability that a lead with hit rate $s$ is ever tested, i.e. that all tests of higher-quality leads end in failure. Thus, $1 - e^{-h(c/B)N}$ is the probability that the search ends successfully, with a discovery, before the set of leads is economically exhausted. Likewise, $\int_{c/B}^{1} e^{-h(s)N}\, N\phi(s)\,ds$ is the expected number of tests that will be implemented per project. This is the integral, over the quality axis, of the probability that leads of a given type $s$ will be tested ($e^{-h(s)N}$), times the expected density of leads having that type ($N\phi(s)$).

As the gene bank manager decides whether to sink additional resources into characterization, she must consider how the investment will affect both the probability of discovery and the number of tests. The benefit of the marginal accession is given by

$$ V'(N) = h(c/B)\, e^{-h(c/B)N} B + c\int_{c/B}^{1} e^{-h(s)N}\left(h(s)N - 1\right)\phi(s)\,ds \tag{11} $$

Increasing the number of characterized accessions has two effects on the collection's value to the R&D firm. First, it increases the probability that a discovery will be made eventually. This is exactly the probability that the characterized lead will contain the discovery uniquely. Second, the lead reduces search costs in expectation.
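Equations (10) and (11) can be evaluated numerically once a density is chosen. The Python sketch below assumes, purely for illustration, that hit rates are uniform on $(0, u)$ with $u$ small, so that $h(s) = (u^2 - s^2)/(2u)$ in closed form. The midpoint-rule integrator, the function names and all parameter values are assumptions of the sketch, not part of the paper.

```python
import math


def make_uniform_model(upper):
    """Density phi and tail integral h for hit rates uniform on (0, upper)."""
    def phi(s):
        return 1.0 / upper if 0.0 < s < upper else 0.0

    def h(s):
        # h(s) = integral from s to 1 of sigma * phi(sigma) d sigma
        return (upper ** 2 - s ** 2) / (2.0 * upper) if s < upper else 0.0

    return phi, h


def midpoint_integral(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (i + 0.5) * width) for i in range(steps))


def collection_value(N, phi, h, B, c, upper):
    """Equation (10); phi vanishes above `upper`, so integration stops there."""
    threshold = c / B
    prob_discovery = 1.0 - math.exp(-h(threshold) * N)
    expected_tests = midpoint_integral(
        lambda s: math.exp(-h(s) * N) * N * phi(s), threshold, upper)
    return prob_discovery * B - expected_tests * c


def marginal_value(N, phi, h, B, c, upper):
    """Equation (11)."""
    threshold = c / B
    discovery_term = h(threshold) * math.exp(-h(threshold) * N) * B
    cost_term = midpoint_integral(
        lambda s: math.exp(-h(s) * N) * (h(s) * N - 1.0) * phi(s),
        threshold, upper)
    return discovery_term + c * cost_term


# Hypothetical numbers: mean hit rate 0.001, so p * B - c = 1.0 - 1.5 < 0.
upper, B, c, N = 0.002, 1000.0, 1.5, 2000.0
phi, h = make_uniform_model(upper)
value = collection_value(N, phi, h, B, c, upper)
slope = marginal_value(N, phi, h, B, c, upper)
finite_difference = (collection_value(N + 0.5, phi, h, B, c, upper)
                     - collection_value(N - 0.5, phi, h, B, c, upper))
```

In this assumed example the average lead is not worth testing, yet the collection has positive value and a positive marginal value, because only leads with hit rates above $c/B$ are ever tested.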

3 Summary and Conclusions

This article examined the incentives facing the owners of genetic resources to invest in the characterization of the materials under their control, and the effect of such investments on conservation incentives. A collection's catalog can be viewed as a tool for the "manufacture of variance" in buyers' beliefs about the utility of different members of the seller's collection. Because buyers can focus their research efforts on leads that display unusual promise, the catalog improves the productivity of research effort, reducing search costs in expectation. In some cases, the catalog makes feasible projects that would otherwise never be undertaken. Cataloged leads are, therefore, more valuable in expectation than are those in unordered collections. Indeed, cataloged leads retain substantial value even when unordered leads are available freely in virtually unlimited supply.

The analysis highlights a complementarity between genetic resources and knowledge resources. Models, data and biological materials form a mutually complementary package of inputs in the creation of new technologies. The quality of each input enhances the value of the others, and thus complementary interactions occur. The existence of a well-developed capacity in basic sciences creates the expectation that models will be available for emerging product demands. A well-developed capacity in basic sciences thus creates the conditions under which agents have an incentive for data collection and resource conservation. Resource conservation is, as well, substantially profitable only in the presence of appropriate data and models. Likewise, data are not useful unless the underlying resources are conserved. Resource owners who invest in characterization and inventory therefore signal their commitment to a conservation strategy. Evidently, the market for research services can exhibit multiple equilibria. One can imagine a virtuous circle developing, with returns to basic research, data collection, and resource conservation each increasing through time.

This initial exploration on the theme of knowledge creation has abstracted from an examination of decentralized market equilibria. When access to models, data, and resources is dispersed among many different agents, these entities must form contracts to unlock the value of the complementarities between these factors. The design of contractual arrangements between suppliers of these services, and the resulting structure of the research industry, constitute significant questions for future research.

References

[1] J. H. Barton. Introduction: Intellectual property rights. In P. S. Baenziger, R. A. Kleese, and R. F. Barnes, editors, Intellectual Property Rights: Protection of Plant Materials, CSSA Special Publication No. 21, pages 13-19. Crop Science Society of America, 1993.
[2] R. E. Evenson and Y. Kislev. Agricultural Research and Productivity. Yale University Press, New Haven and London, 1975.
[3] R. E. Evenson and Y. Kislev. A stochastic model of applied research. Journal of Political Economy, 84(2):265-281, 1976.
[4] D. Gollin, M. Smale, and B. Skovmand. Searching an ex situ collection of wheat genetic resources. American Journal of Agricultural Economics, 82(4, November):812-827, 2000.
[5] J. Kaiser. Unique all-taxa survey in Costa Rica "self-destructs". Science, 276:893, 1997.
[6] B. Koo and B. D. Wright. The optimal timing of evaluation of genebank accessions and the effects of biotechnology. American Journal of Agricultural Economics, 82(4, November):797-811, 2000.
[7] S. Polasky and A. Solow. On the value of a collection of species. Journal of Environmental Economics and Management, 29:298-303, 1995.
[8] G. C. Rausser and A. A. Small. Valuing research leads: Bioprospecting and the conservation of genetic resources. Journal of Political Economy, 108(1):173-206, 2000.
[9] D. R. Simpson, R. A. Sedjo, and J. W. Reid. Valuing biodiversity for use in pharmaceutical research. Journal of Political Economy, 104(1):163-185, 1996.
[10] A. A. Small. The Market for Genetic Resources: The Role of Research and Development in the Valuation and Conservation of Biological Intellectual Capital. Ph.D. dissertation, University of California, 1998.
[11] M. L. Weitzman. Optimal search for the best alternative. Econometrica, 47(3):641-654, 1979.
