MANAGEMENT SCIENCE, Vol. 54, No. 5, May 2008, pp. 956–968. DOI 10.1287/mnsc.1070.0784. ISSN 0025-1909, EISSN 1526-5501. © 2008 INFORMS

Sequential Testing of Product Designs: Implications for Learning

Sanjiv Erat, Rady School of Management, University of California at San Diego, La Jolla, California 92093, [email protected]
Stylianos Kavadias, College of Management, Georgia Institute of Technology, Atlanta, Georgia 30308, [email protected]

Past research in new product development (NPD) has conceptualized prototyping as a "design-build-test-analyze" cycle to emphasize the importance of the analysis of test results in guiding the decisions made during the experimentation process. New product designs are often complex and incorporate numerous components, and this makes the ex ante assessment of their performance difficult. Still, design teams often learn from test outcomes during iterative test cycles, enabling them to infer valuable information about the performances of (as yet) untested designs. We conceptualize the extent of useful learning from the analysis of a test outcome as depending on two key structural characteristics of the design space, namely whether the set of designs are "close" to each other (i.e., the designs are similar with respect to their attributes) and whether the design attributes exhibit nontrivial interactions (i.e., the performance function is complex). This study explicitly considers the design space structure and the resulting correlations among design performances, and examines their implications for learning. We derive the optimal dynamic testing policy, and we analyze its qualitative properties. Our results suggest optimal continuation only when the previous test outcomes lie between two thresholds. Outcomes below the lower threshold indicate an overall low-performing design space and, consequently, continued testing is suboptimal. Test outcomes above the upper threshold, on the other hand, merit termination because they signal to the design team that the likelihood of obtaining a design with a still higher performance (given the experimentation cost) is low. We find that accounting for the design space structure splits the experimentation process into two phases: the initial exploration phase, in which the design team focuses on obtaining information about the design space, and the subsequent exploitation phase, in which the design team, given their understanding of the design space, focuses on obtaining a "good enough" configuration. Our analysis also provides useful contingency-based guidelines for managerial action as information gets revealed through the testing cycle. Finally, we extend the optimal policy to account for design spaces that contain distinct design subclasses.

Key words: sequential testing; design space; complexity; contingency analysis
History: Accepted by Christoph Loch, R&D and product development; received April 29, 2005. This paper was with the authors 1 year and 5 months for 2 revisions. Published online in Articles in Advance March 1, 2008.

1. Introduction

Before launching a new product, firms often test multiple product designs to gather performance and market data, and to decrease the technical and market uncertainty. Various studies emphasize the importance of testing and experimentation for successful new product development (NPD) and, at the same time, recognize that testing may consume substantial resources (Allen 1977, Clark and Fujimoto 1989, Cusumano and Selby 1995, Thomke 2003). Thus, these studies highlight a key trade-off involved in experimentation: performance benefits versus costs of additional experiments.

Over the years, a number of studies have examined this fundamental trade-off and its associated drivers. Early on, Simon (1969) observed the need to undertake multiple trials (experiments) to obtain a high-performing outcome (design). DeGroot (1968) modeled this process as the repeated sampling of a random variable representing the design performance. In a seminal study, Weitzman (1979) recognizes that the different design performances are not necessarily "realizations" of the same random variable. Instead, he views experimentation as a search over a finite set of statistically independent alternatives (designs) with distinct and known prior distributions of performances. His simple and elegant result is summarized as follows: continue testing as long as the best performance obtained so far is below a "reservation performance."

Clark and Fujimoto (1989) present evidence from practice that design configuration tests are not simply arbitrary draws from independent performance distributions, but rather are iterative "design-build-test" tasks. Their study forms the basis for Thomke's (1998, 2003) conceptualization of the experimentation process as an iterative "design-build-test-analyze" cycle. Thomke's (1998, 2003) research is the first to emphasize the aspect of learning during experimentation, and to highlight that the analysis phase enables a design team to understand the design space better and revise the test schedules accordingly.

Building on Thomke's (1998) empirical findings, Loch et al. (2001) analytically examine which mode of experimentation (parallel versus sequential) minimizes the overall experimentation costs. In their model, the design team gradually learns through imperfect sequential experiments, and identifies which design is the "best" in fulfilling an exogenously specified functionality.

In this study, we recognize that the extent of transferable learning (i.e., the relevance of the outcome from testing one design configuration in understanding and predicting the performance of another configuration) is directly related to the design space structure (e.g., the similarities among different design configurations). Thomke et al. (1999) offer a tangible example of how such transferable learning occurs in practice: the design team, conducting simulation-based side-impact crash tests of BMW cars, observed during the course of testing that weakening a particular part of the body structure (specifically, a pillar) led to superior crash performance of the vehicle. This insight allowed them to infer the same relationship in other related (similar) designs, and to improve the overall safety of BMW's product offerings.

This paper seeks to answer the following research questions: (i) How does the design space structure influence the optimal number of experiments, and (ii) how does the design space structure drive the design team's learning? We motivate our formal treatment of the problem with the following numerical example.

1.1. Motivating Example
Consider a product that can be realized through two design configurations, A and B, and let a subset of design attributes be identical across the two configurations. The design team assesses, a priori, that the performance of configuration A (B) is distributed normally around a mean μ_A (μ_B) with a standard deviation of σ_A (σ_B). The similarities in the design attributes between the two configurations lead the design team to expect that, if A exhibits high (low) performance, then B has a higher likelihood of a high (low) performance. Thus, the scalar performances of configurations A and B (π_A and π_B, respectively) are jointly multivariate normal and exhibit positive correlation ρ_AB = ρ. We assume that testing reveals the true performance of design configurations (i.e., perfect testing), and suppose that the cost of testing design configuration A (B) is C_A (C_B). Consider the following example of parameters:

μ_A = 0.1, μ_B = 0, C_A = C_B = 0.1, σ_A = σ_B = 1, ρ = 0.8.

1.2. Base Line Case with No Learning
First, consider the case where the performance dependencies and, thus, the possibility of learning are disregarded, i.e., ρ = 0. Furthermore, suppose that the best performance found so far is x (x = 0 if no test has been conducted). Given x, continuing on to test design A at a cost C_A results in the (expected) performance E[max{x, π_A}]. Hence, the marginal value from testing A is E[max{x, π_A}] − x − C_A. Weitzman (1979) shows that this marginal value is positive and further experimentation is beneficial as long as the current best solution (i.e., x) is lower than a threshold (i.e., the reservation price of A).¹ Furthermore, Weitzman's (1979) reservation price (denoted by WRP) of A is obtained by equating the above marginal value to zero, i.e., solving for x in the equation E[max{x, π_A}] − x − C_A = 0. In our example, the WRP for A is 1.01 and for B is 0.91. Thus, if the performance dependencies between the two configurations are disregarded, the team optimally tests A, and then continues on to test B if and only if the outcome from testing A is less than 0.91.

1.3. Testing Cycle with Learning
Now consider the setting where the design team accounts for performance dependencies (i.e., correlation ρ = 0.8). Suppose that design A is tested² and results in performance a. The conditional distribution of B (i.e., conditional on the realization a) is normal with mean μ_B + ρ(a − μ_A) and standard deviation σ_B √(1 − ρ²) (Johnson and Wichern 2002). Because no subsequent learning is possible (i.e., B is the only other design), Weitzman's (1979) result applies from this point on. Hence, the team continues to test B if and only if the WRP of B (conditional on the realization a) is greater than the best performance found so far, i.e., 0.8a + 0.28 > max{a, 0}. In summary, when ρ = 0.8, the team optimally tests A and then continues to B if and only if a ∈ (−0.35, 1.4).

¹ Weitzman's (1979) results presented here have been adapted to our setting. His reservation price result is valid under more general settings, including different costs and arbitrary distributions, as long as the alternatives are statistically independent.
² It is suboptimal not to test any designs because, by testing A (B) and stopping immediately, the experimenter makes a strictly positive payoff, E[max{0, π_A}] − C_A (E[max{0, π_B}] − C_B). Furthermore, because μ_A > μ_B, it is strictly better to test A first.
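The reservation prices above are easy to reproduce numerically. The following Python sketch is ours, not part of the original paper; it assumes numpy and scipy are available, and the function names are our own. It computes the WRP of a normal alternative and checks the continuation interval of §1.3:

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import brentq

    def wrp(mu, sigma, c):
        # Weitzman reservation price: the x solving E[max(x, X)] - x - c = 0,
        # i.e., E[(X - x)^+] = c, for X ~ N(mu, sigma^2).
        def excess(x):
            d = (x - mu) / sigma
            return sigma * norm.pdf(d) - (x - mu) * norm.sf(d) - c
        return brentq(excess, mu - 10 * sigma, mu + 10 * sigma)

    mu_A, mu_B, sigma, c, rho = 0.1, 0.0, 1.0, 0.1, 0.8
    print(round(wrp(mu_A, sigma, c), 2), round(wrp(mu_B, sigma, c), 2))  # 1.01, 0.91

    def continue_to_B(a):
        # Section 1.3: after observing A = a, B ~ N(mu_B + rho*(a - mu_A), sigma*sqrt(1 - rho^2));
        # continue iff B's conditional reservation price beats the best-so-far max(a, 0).
        mu_cond = mu_B + rho * (a - mu_A)
        sd_cond = sigma * np.sqrt(1 - rho**2)
        return wrp(mu_cond, sd_cond, c) > max(a, 0.0)

    print(continue_to_B(-0.4), continue_to_B(0.0), continue_to_B(1.5))  # False, True, False

The three test points bracket the continuation interval (−0.35, 1.4) reported in the text.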

Figure 1. Differences Between the Base Line Case Without Learning and the Testing Cycle With Learning

[Figure: a number line of the first test outcome a. Base line case with no learning: CONTINUE for a < 0.91, TERMINATE otherwise. Optimal policy with learning: TERMINATE for a ≤ −0.35 (bad news: design B is probably inferior), CONTINUE for −0.35 < a < 1.4 (good news: design B probably has high performance), TERMINATE for a ≥ 1.4.]
A graphical comparison between the two cases is illustrated in Figure 1. The figure points toward two key benefits of learning: (i) cost savings due to early termination, when the acquired information and the associated analysis suggest that the remaining configurations would exhibit inferior performances (i.e., the a ≤ −0.35 region), and (ii) gains by additional experimentation when the information indicates untapped potential from the untested designs (i.e., the 0.91 ≤ a < 1.4 region).

In this paper, we model the experimentation process as an iterative search over a set of feasible design configurations. We develop a stochastic dynamic program, and we characterize the optimal testing policy. Our goal is to describe the effects of design space structure on learning and, by extension, on the optimal test schedules. Furthermore, we also aim to develop and prescribe intuitive rules of thumb to aid managerial decision making during experimentation.

Our contribution is threefold. First, our conceptualization allows us to demonstrate how the design space structure (i.e., the similarities among the design configurations and the complexity of the performance function) critically determines the optimal amount of experimentation. Furthermore, we offer a more realistic approach to understanding learning in NPD experimentation, beyond Weitzman's (1979) assumption of independent designs/alternatives and the highly specific assumption of Loch et al. (2001) that testing a design alternative signals whether or not the design is the "best." However, to maintain analytical tractability, we rely on specific assumptions about some facets of the testing process (e.g., we assume that design performances are normally distributed and that testing costs are equal within a class of designs). Yet, these assumptions do not fundamentally relate to the learning aspect of experimentation, and we consider them a fair price to pay for extending the analysis to interrelated design configurations. In addition, in §4, we offer a set of results that illustrate the mathematical intractability in obtaining closed-form solutions once we move beyond our assumptions.

Second, we identify and characterize the qualitative effects on the testing process due to the explicit consideration of the design space structure. Design teams may pursue additional tests not only to identify configurations with higher performance, but also to explore and to gain greater understanding of the design space. We term this pattern the "exploration phase." Such aggressive exploration appears in the initial stages of testing and is stronger when the interactions between design attributes are weaker and/or the similarities between designs are greater. Once the design team gains sufficient understanding of the design space, the likelihood of terminating the testing cycle increases. We term this the "exploitation phase." We also find that the length of the exploration phase depends on the extent of similarities between the designs: more similarities lead to a shorter exploration phase. Thus, the experimentation process is split into two phases: an exploration phase where the "design-build-test-analyze" iterations are primarily conducted based on what you may learn, and an exploitation phase where the "design-build-test-analyze" cycles are driven by what you have learned.

Third, we outline rules of thumb for guiding managerial decisions contingent on the initial test outcomes. When worse-than-expected performance outcomes are observed, the design team should transition sooner to an exploitation regime if the interactions between the design attributes are deemed weak. Thus, the length of the exploration regime depends critically on how the first few test results compare to the a priori performance expectations, and on the extent of interactions between the design attributes.

The model formulation is presented in §2. Subsection 3.1 presents the solution to the base case together with several important properties of the optimal policy. Subsections 3.2 and 3.3 generalize the base case model to settings where the parameter estimates are uncertain, and where the designs belong to different subclasses, respectively. Section 4 concludes with managerial insights, a discussion of the limitations of the current search model, and some paths for future research.

2. A Formal Model of Learning in Experimentation

Consider a design team who develops and tests different configurations before launching a new product. The performance of the product design depends on M distinct design attributes. Hence, each design configuration i (i = 1, 2, …, N) is a point in the M-dimensional design space and is represented as a unique combination of the M attributes, namely x_ij (j = 1, 2, …, M). For example, within the premises of the Thomke et al. (1999) example, the thickness of the pillar for a design configuration i is one design attribute value x_i1. The team selects the values for the attributes x_ij and determines the performance P_i = P(x_i1, …, x_iM) of configuration i; P(·) is the performance function that maps the design space to a real number representing the design performance. As in most new product designs, the performance function P(·) cannot be ex ante specified, due to incomplete knowledge or inherently uncertain and complex relationships between the attributes (Ethiraj and Levinthal 2004, Loch and Terwiesch 2007). The Thomke et al. (1999) crash test example offers a vivid illustration of how complexity naturally emerges from unforeseen interactions between the pillar and the rest of the design attributes.

Despite the presence of such complexity, the design team can often estimate, at a high level, the extent of potential interactions between the design attributes.³ Furthermore, because a specific choice of attribute values constitutes a design configuration, limited changes in a subset of the design attributes result in similarities among designs. As the degree of attribute commonality increases, some design configurations appear more similar than others.

We model these two aspects of the design space—potential interactions between attributes and the similarity between configurations—through a performance correlation between the designs. Specifically, the performance levels of two designs are likely to be more related if there is only a small difference between the attribute values of the two designs (a smaller "distance" between the designs) and/or if the extent of interactions between the design attributes is low (a simpler performance function). On the other hand, widely different design attributes across configurations (a larger "distance" between designs) and/or a design space where the design attributes exhibit a greater degree of interactions (a more complex performance function) limit the performance dependence among different configurations. To summarize, two designs are more performance correlated if they are less distant and if the performance function is less complex.

The design team may, at least roughly, estimate the correlation between any two designs because it understands the distance between the two designs and the complexity of the performance function. Furthermore, we assume that these correlations are non-negative. That is, a high (low) test outcome for a design configuration signals a higher likelihood for the performance of the other designs to be high (low). This assumption is natural in the context of performance functions (fitness landscapes).⁴ For instance, Stadler (1992) demonstrates how families of performance functions (landscapes), including those generated from many combinatorial optimization problems, can be characterized by their correlations and exhibit positive correlations. Furthermore, our assumption of positive correlations conforms to NK landscapes and AR(1) processes, both of which are widely used to represent performance landscapes in the complexity literature (Kauffman 1993).

Let Π = (π₁, π₂, …, π_N) be the vector of the performances of the N potential design configurations. Π is random because of the design team's limited knowledge; we assume that the joint distribution of the scalar performances is multivariate normal. Our distributional assumption is realistic for a wide range of performance landscapes. For instance, in fitness landscapes such as the widely used NK landscapes, the performance at each point is approximately normally distributed (Kauffman 1993, p. 53). The normality assumption also results in a tractable model and allows the characterization of the optimal policy properties (our discussion in §4 illustrates that index policies may fail to exist for arbitrary distributions).

Let the design team's a priori expectations of the design performances be μ̄ = (μ¹, μ², …, μᴺ). The a priori performance uncertainty is summarized in a performance covariance matrix Σ_N. We assume that the a priori variances of the performances of all the designs are identical (= σ²). Our assumption reflects relatively homogeneous design spaces with configurations that differ with respect to a small, limited subset of design attributes. We offer a limited relaxation of this assumption in §3.3. However, as we demonstrate in §4, the equal variance assumption is not merely convenient, but is also necessary for mathematical tractability.

³ This is similar to assessing the interaction parameter K in an NK landscape model (Kauffman 1993).
⁴ Intuitively, if the distance between two designs is low and the performance function is simple, we would expect the performances to be positively correlated. Alternatively, if the distance between designs is high and the performance function is complex, we would expect the performances to be unrelated (i.e., correlation = 0). Note that in either case the correlations are always non-negative.
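As a concrete illustration of this prior (our sketch, not the paper's; numpy assumed, and the specific correlation values are chosen purely for illustration), one can assemble Σ_N from non-negative pairwise correlations, with higher values for designs that are close and whose attributes interact weakly, and sample joint performance scenarios:

    import numpy as np

    mu_bar = np.array([0.5, 0.4, 0.0])      # a priori expectations (illustrative)
    sigma = 1.0                              # common a priori standard deviation
    R = np.array([[1.0, 0.8, 0.3],           # pairwise performance correlations:
                  [0.8, 1.0, 0.3],           # designs 1 and 2 are "close" with a simple
                  [0.3, 0.3, 1.0]])          # performance function; design 3 is more distant
    Sigma_N = sigma**2 * R                   # covariance matrix of the multivariate normal prior

    rng = np.random.default_rng(0)
    scenarios = rng.multivariate_normal(mu_bar, Sigma_N, size=4)
    print(scenarios.round(2))                # joint draws: designs 1 and 2 tend to move together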

In summary, our conceptualization assumes that the design team has a multivariate normal prior on the design performances. Thus, the design team knows the following parameters of the multivariate normal distribution of the design performances: (i) the performance expectations, (ii) the performance uncertainty (represented through the common standard deviation), and (iii) the performance correlations between the design configurations. Note that this does not imply knowledge of the exact performance levels, but only represents the a priori knowledge of design commonalities (based on, for example, shared components).

Suppose that the team undertakes sequential testing, and that the design configurations 1 through k have been tested. We assume that the tests are perfect, i.e., the measurement errors, if any, are negligible, and the true performance of a design is revealed when the design is tested. Then, the conditional joint distribution (conditional on the k test outcomes observed so far) of the remaining N − k designs is again multivariate normal (Rencher 2002). Furthermore, the conditional distribution of each of the remaining N − k designs is univariate normal with the following conditional mean and standard deviation (Rencher 2002):

μ̂^i_{k+1} = μ^i + (r̄^i_k)ᵀ Q_k⁻¹ (P_k − μ̄_k),  i = k + 1, …, N,   (1)

σ̂^i_{k+1} = σ √(1 − (r̄^i_k)ᵀ Q_k⁻¹ r̄^i_k / σ²),  i = k + 1, …, N,   (2)

where μ̂^i_{k+1} and σ̂^i_{k+1} are the conditional mean and the standard deviation, respectively, for design configuration i; r̄^i_k is the k × 1 column vector of a priori covariances of design i with all of the k tested designs; Q_k is the k × k sub-matrix of the covariance matrix Σ_N that contains the a priori covariances of all the tested designs; μ̄_k is the k × 1 column vector of a priori expectations of the k tested prototypes (= (μ¹, μ², …, μᵏ)); and P_k is the k × 1 column vector of performances of the k tested prototypes (= (p₁, p₂, …, p_k)).

After the kth test, let U be the set of all untested configurations and T the set of all the tested ones. If the cost of testing design i is c_i, and p = max_{j∈T} p_j is the maximum performance observed until now, the team faces the following dynamic decision problem:

V_k(p, μ̂_k, U) = max{ p ; max_{j∈U} ( E[V_{k+1}(max{p, p_j}, μ̂_{k+1}, U − {j})] − c_j ) },   (3)

V_N(p, ·, ∅) = p.   (4)

Equations (3) and (4) are the Bellman optimality equations corresponding to the decision process. The state is described by the maximum performance realized thus far, the set of untested design configurations, and the vector of the updated mean performance values of the remaining tests. The upper branch of (3) describes the payoff if the team terminates the experimentation; it is equal to p, the highest performance observed. Upon continuation, the expected payoff results from the choice of one of the untested configurations. Note that the final payoffs are given by p = max_{j∈T} p_j. Thus, we assume that a design is available for market launch only if it has been tested. The investment involved in setting up a manufacturing process and a distribution channel makes any launch approval unlikely, unless a profitable business case, including testing data, is presented.

3. Effect of Learning on Optimal Testing

In this section, we derive the optimal dynamic search policy and we demonstrate its key structural properties. We proceed in three stages. In §3.1, we assume that the performance dependencies between any pair of design configurations are identical, that is, the correlations among the design performances are identical. The assumption represents settings where the design configurations lie within a specific scientific domain, and they share a substantial number of attributes (i.e., each design configuration is of the form (x₁, x₂, …, x_M), and the first r attributes x₁, x₂, …, x_r have the same value across all configurations). The design team knows the performance correlations among the designs, but not the test outcomes. As stated in the introduction, this reflects knowledge about the distance between design configurations and the complexity of the performance function. In §3.2, we relax this assumption and allow for uncertain correlations. In §3.3, we relax the assumption of identical correlations between the design configurations.

3.1. Base Case: Identical Correlations

For now, we assume that any pair of design configurations has the same correlation factor. The overall performance uncertainty of the design configurations is summarized by the covariance matrix Σ_N (defined in §2). It also summarizes the dependencies across the performance random variables; that is, the (i, j)th element a_ij is the covariance between the performance random variables of design configurations i and j. Then,

a_ij = σ² if i = j;  a_ij = θσ² otherwise,  0 < θ < 1,

where θ is the correlation factor.
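To make the updating in Equations (1) and (2) concrete, the following Python sketch (ours, for illustration; parameter values are assumptions) conditions the equicorrelated prior of §3.1 on the first k test outcomes and checks the result against the closed form reported in Proposition 1 below:

    import numpy as np

    def condition_on_tests(mu, sigma, theta, p_observed):
        # Equations (1)-(2): conditional means/sds of the untested designs, given
        # that designs 1..k were tested (perfectly) and yielded p_observed.
        mu = np.asarray(mu, dtype=float)
        p = np.asarray(p_observed, dtype=float)
        N, k = mu.size, p.size
        cov = sigma**2 * ((1 - theta) * np.eye(N) + theta * np.ones((N, N)))
        Q = cov[:k, :k]                      # covariances among tested designs
        r = cov[k:, :k]                      # covariances of untested vs. tested designs
        w = np.linalg.solve(Q, p - mu[:k])   # Q_k^{-1} (P_k - mu_k)
        mu_cond = mu[k:] + r @ w
        var_cond = sigma**2 - np.einsum('ij,ij->i', r, np.linalg.solve(Q, r.T).T)
        return mu_cond, np.sqrt(var_cond)

    mu, sigma, theta = [0.5, 0.2, 0.0, -0.1], 1.0, 0.4
    p_obs = [0.9, -0.3]                      # outcomes of the first k = 2 tests
    mu_c, sd_c = condition_on_tests(mu, sigma, theta, p_obs)

    # Proposition 1's closed form for the same quantities:
    k = len(p_obs)
    lam = theta / (k * theta + 1 - theta)
    print(mu_c, np.asarray(mu[2:]) + lam * (np.asarray(p_obs) - np.asarray(mu[:2])).sum())
    print(sd_c, sigma * np.sqrt(1 - k * theta**2 / (k * theta + 1 - theta)))

Both prints show matching values: every untested design receives the same mean correction, and the residual standard deviation is common across untested designs.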

The experimentation cost is composed of two elements (Thomke 2003): (i) the direct cost of specialized resources required to conduct the experiments, and (ii) the indirect opportunity cost resulting from product introduction delays. We assume that the direct testing costs are constant and independent of the tested configuration (= c). This is realistic when the cost is driven by the experimentation process rather than the experimentation subject. For example, in some athletic gear products, wind tunnel tests are conducted to refine a product's aerodynamic properties (The Globe and Mail 2004). These costs are substantial and are usually independent of the tested design configuration.⁵ Furthermore, our constant cost assumption coincides with similar assumptions made in the extant literature (Loch et al. 2001). In §4, we offer a formal discussion of the limitations imposed by the constant cost assumption.

The second component of the experimentation cost, the opportunity cost from delayed product introductions, can be significant. Clark (1989, p. 1260) notes that "each day of delay in market introduction costs an automobile firm over $1 million in lost profits." We account for such losses as follows: a firm that tests k designs before terminating the test cycle incurs a delay loss L(k). Using the total effective cost of the kth test, c_k = c + L(k) − L(k − 1), allows us to derive the optimal policy for general loss function structures (see the online appendix, provided in the e-companion).⁶ This explicit loss function has two advantages over a discount factor: (i) we posit that factors such as competition and the short product life cycle affect the opportunity loss significantly more than the time value of money captured through a discount factor (Kavadias and Loch 2003); and (ii) using identical discounting for both the costs and the performance levels is misleading in an NPD context, as the risks associated with the two may be very different (for instance, technical risks versus market commercialization/acceptance risks).

Suppose that testing is sequential, and configurations 1 to k have been tested. Due to the performance dependencies, the outcomes of the tested configurations also convey information about the untested configurations. Proposition 1 outlines this dynamic information updating.

Proposition 1. Assume that k configurations have been tested. The updated (marginal) performance distribution of an untested configuration i is normal with mean and standard deviation
(i) μ̂^i_{k+1} = μ̂^i_k + θ(p_k − μ̂^k_k)/(kθ + 1 − θ) = μ^i + [θ/(kθ + 1 − θ)] Σ_{j=1}^{k} (p_j − μ^j),
(ii) σ̂^i_{k+1} = σ √(1 − kθ²/(kθ + 1 − θ)).

All the proofs are given in the online appendix. Given the learning and the evolution of knowledge outlined in Proposition 1, Theorem 1 presents the optimal testing policy for the base case.

Theorem 1. Index the design configurations in descending order of their a priori expected performances, i.e., i > j ⇒ μ^i ≤ μ^j. Define

Φ^i_{k+1} = max_{j≤k} p_j − ( μ^i + [θ/(kθ + 1 − θ)] Σ_{j=1}^{k} (p_j − μ^j) ).

Then, there exist constant thresholds Δ_{k+1} such that, after the completion of the kth configuration test, it is optimal to stop if and only if Φ^i_{k+1} ≥ Δ_{k+1} for all i > k; otherwise, test the design configuration with the largest expected performance.

An interesting insight emerges from the following interpretation of the optimal policy. The term −Φ^i_{k+1} (= μ̂^i_{k+1} − max_{j≤k} p_j) expresses a normalized value that accounts for the remaining potential from testing, i.e., the difference between the expected performance of the next configuration and the best performance achieved so far. The term −Δ_{k+1} is a constant threshold. Thus, our policy has an intuitive economic interpretation: the design team terminates experimentation when the remaining potential drops below a threshold value (i.e., −Φ^i_{k+1} ≤ −Δ_{k+1}). If testing continues, the configuration with the largest remaining potential (or, alternatively, the design with the largest mean) is tested next.

Our optimal policy exhibits certain structural similarities with the Weitzman (1979) policy, adapted for our specific distributional assumptions to enable a meaningful comparison. Δ_{k+1} represents a normalized value that makes management indifferent between proceeding or stopping, a quantity very similar to the classic definition of a reservation price. Reinterpreting Φ^i_{k+1} (= p − μ̂^i_{k+1}) as the normalized value obtained so far (normalized by accounting for the mean) allows us to also see the parallel between our policy and Weitzman's (1979) policy: testing stops when the normalized value Φ^i_{k+1} obtained from testing exceeds a reservation price.

Our choice rule, i.e., which configuration to test next, is the same as Weitzman's rule. We can verify the claim as follows: consider two search opportunities X ∼ N(μ_X, σ) and Y ∼ N(μ_Y, σ), and testing costs c_X = c and c_Y = c. The difference between the Weitzman reservation prices of X and Y is equal to the difference between their means, i.e., WRP_X − WRP_Y = μ_X − μ_Y.⁷ Hence, the priority ranking induced by the reservation prices is identical to the one induced by the means, and Weitzman's (1979) adapted policy translates into testing the design with the highest mean.

However, our stopping rule, i.e., when experimentation should be terminated, exhibits key differences from Weitzman's rule. We rewrite our policy statement in Corollary 1 to enable easier comparison to Weitzman's (1979) stopping conditions.

Corollary 1. Index the design configurations in descending order of the a priori performance expectations, i.e., i > j ⇒ μ^i ≤ μ^j. Define

ζ_{k+1} = μ^{k+1} − λ_k μ^k + λ_k Σ_{i=1}^{k−1} (p_i − μ^i) + Δ_{k+1},

and λ_k = θ/(kθ + 1 − θ). After the kth test, optimally continue if and only if p_k ∈ ( (max_{i≤k−1} p_i − ζ_{k+1})/λ_k , ζ_{k+1}/(1 − λ_k) ). Otherwise, terminate testing.

Corollary 1 shows that accounting for performance dependencies changes the stopping rule. Continuation is optimal only when the current test outcome is within an interval. This result stands in contrast to Weitzman (1979), where continuation occurs only when the test outcome is below a threshold. Thus, the structure illustrated in Figure 1 generalizes. The key difference between the two policies is driven by the ability to learn during the experimentation process. Low test outcomes signal low future performances and, hence, decrease the attractiveness of additional experimentation. High design performances, on the other hand, signal a high remaining potential, making additional experimentation worthwhile.

Our results add to the discussion initiated by Loch et al. (2001) concerning the impact of learning on experimentation strategies. Our stopping conditions emerge endogenously, and they involve balancing the performance benefits and learning against experimentation costs. The results also demonstrate that the stopping conditions depend on the design space structure and the realized performance levels, as opposed to an exogenously prespecified confidence level.

In Proposition 2, we characterize the properties of the stopping thresholds Δ_k. They qualitatively describe the optimal test termination decisions of the design team.

Proposition 2. Define WRP_k = {x : E[max{x, X_k}] − x = c}, where X_k ∼ N(0, σ²(1 − kθ²/(kθ + 1 − θ))). WRP_k expresses the Weitzman stopping threshold for search opportunity X_k. Then,
(A) Δ_k is a real constant, and is decreasing in k.
(B) Δ_k is increasing in σ.
(C) Δ_k is decreasing in θ.
(D) Δ_k ≥ WRP_k.

Claim (A) characterizes the normalized stopping thresholds Δ_k as decreasing in the number of tests conducted. This occurs for two reasons: (i) fewer design configurations remain to be tested and, hence, the learning possibilities are fewer, and (ii) the residual uncertainty is lower.

Claims (B) and (C) demonstrate the effect of the residual uncertainty on the optimal stopping values. Higher residual uncertainty may arise from fewer commonalities between design configurations (lower θ), greater complexity of the performance function (lower θ), or greater a priori uncertainty concerning the configuration performances (higher σ). Irrespective of the source, higher residual uncertainty makes testing more attractive by increasing the normalized reservation price. The result resembles the value-enhancing nature of uncertainty (variance) in a traditional option setting: because the design team is assured of retaining the highest performing design configuration upon stopping, the downside loss of conducting one more test is bounded by the testing cost. At the same time, the upside benefit from experimentation depends on the amount by which the performance can vary upward, which increases with higher residual uncertainty (variance).

Claim (D) demonstrates that the stopping thresholds in the optimal policy may be larger compared to Weitzman's (1979) optimal stopping thresholds WRP_k, which do not consider design correlations and learning. This result illustrates the relevance of learning in experimentation and highlights one of our key contributions. As opposed to the myopic stopping proposed in past studies (Weitzman 1979, Loch et al. 2001), we find that testing may continue to provide information about the design space.

Our results stand in contrast to Adam (2001), who examines the search problem where the experimenter has priors about the distributions of the search alternatives and updates these priors as the search proceeds. However, he assumes that "learning exhibits enough monotonicity to prevent the searcher from optimally going through a 'payoff valley' to reach a 'payoff-mountain' later on" (Adam 2001, p. 264). Under this assumption, an extension of Weitzman's (1979) policy, where stopping is still myopic, remains optimal. In contrast, we do not constrain the form of learning, and we find that optimal stopping becomes nonmyopic.

⁷ Let WRP_X = R_x and WRP_Y = R_y. Then, by definition, 0 = E[max{X, R_x}] − R_x − c and 0 = E[max{Y, R_y}] − R_y − c. Because X and Y differ only in their means, the distribution of X is identical to the distribution of Y − μ_Y + μ_X. Hence, 0 = E[max{X, R_x}] − R_x − c = E[max{Y − μ_Y + μ_X, R_x}] − R_x − c = E[max{Y, R_x + μ_Y − μ_X}] − (R_x + μ_Y − μ_X) − c. That is, the reservation price of Y equals R_x + μ_Y − μ_X, i.e., R_y = R_x + μ_Y − μ_X ⇒ R_x − R_y = μ_X − μ_Y.
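Proposition 2's benchmark WRP_k is easy to tabulate with the wrp() helper from the §1 sketch: the residual standard deviation shrinks as σ√(1 − kθ²/(kθ + 1 − θ)), and the Weitzman threshold falls with it. The parameter values below are ours, chosen only for illustration:

    import numpy as np
    # wrp() as defined in the Section 1 sketch
    sigma, theta, c = 1.0, 0.4, 0.1
    for k in range(1, 6):
        sd_k = sigma * np.sqrt(1 - k * theta**2 / (k * theta + 1 - theta))
        print(k, round(sd_k, 3), round(wrp(0.0, sd_k, c), 3))   # residual sd and WRP_k fall with k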

To summarize, experimentation offers two benefits: (i) the direct benefit of identifying configurations with higher performance (a key goal identified in the past NPD literature), and (ii) the indirect benefit from learning about the untested design configurations. Because of the second benefit, the design team may keep experimenting for the purpose of learning about the design space even when the remaining untested alternatives by themselves do not justify experimentation.

We illustrate the result with a modified version of the introductory numerical example. Consider the parameters:

μ_A = μ_B = −0.91, C_A = C_B = 0.1, σ_A = σ_B = 1, ρ = 0.1.

The expected one-step benefit from testing A (= E[max{0, π_A}] − C_A) is 0. Thus, the myopic stopping rule (Weitzman 1979, Adam 2001) dictates immediate termination without testing any of the configurations. However, we can learn about the performance of configuration B from analyzing the test result of configuration A (say, ρ = 0.1). Hence, testing A and obtaining a test result a prompts the design team to optimally test B if and only if the reservation price of B, 0.89 + (−0.91 + 0.1(a − (−0.91))), is greater than the maximum obtained so far, max{a, 0}.⁸ Hence, testing B is optimal if and only if a ∈ (−0.71, 0.079). Because there is a nonzero probability of testing B, this implies that the optimal policy involves testing A and then testing B if and only if a ∈ (−0.71, 0.079).

Our result exhibits properties of "strong learning" (Adam 2001), that is, experimentation for the purpose of learning. Figure 2 illustrates this phenomenon. It plots the probability of continuation as a function of the number of tests conducted, for different levels of design performance dependencies.⁹

Figure 2. Learning Effects Arising from Performance Dependencies
[Figure: probability of continuation versus number of tests (1–6) for θ = 0, 0.1, and 0.4. The curves with higher θ start above the θ = 0 benchmark (exploration phase) and then drop below it (exploitation phase).]

We observe that the dependencies across configurations (a higher correlation θ) lower the likelihood of early termination, but increase the probability of later termination. Define the case with no dependencies (i.e., θ = 0, or no learning) as the "benchmark curve." Then, we may call the phase with continuation probability greater than the benchmark setting the exploration phase, because in this phase tests are conducted for the purpose of learning about the design space. Similarly, we call the phase with lower continuation probability the exploitation phase, where continuation, if pursued, is for the explicit purpose of obtaining a better performance. The precise transition from "exploration" to "exploitation" would be difficult to measure in practice, as the parameters are never precisely known. Still, an important qualitative insight for managers emerges: early on, conduct more "design-build-test" iterations than what the remaining alternatives seem to justify, in order to learn. Later on, conduct "design-build-test" iterations only to exploit the remaining promising alternatives.

Our insight offers an explanation of earlier observations by Ward et al. (1995) and Sobek et al. (1999) about Toyota's product development process, where a large number of different design options are explored upfront before converging to a final solution. Toyota follows this approach "to explore broad regions of the design space simultaneously" (Ward et al. 1995, p. 49) and to "refine [their] map of the design space before making decisions" (Ward et al. 1995, p. 56). Although the mechanism of experimentation we examine in this paper is different (sequential as opposed to simultaneous testing), the underlying driver of the exploration phase is similar, namely, the potential to learn about the design space.

The following additional insights can be drawn from Figure 2: the likelihood of continuation during the initial tests and the duration of the exploration phase decrease in the complexity of the performance function and the distance between designs. This occurs for the following reason: the test outcomes hold greater information in design spaces where the complexity of the performance function is lower and/or where the distance between designs is lower. However, intensive testing continues for a shorter period of time, as learning accumulates after fewer tests.

⁸ The argument is identical to that in §1.
⁹ Figure 2 was generated for the specific problem instance with N = 8, σ² = 80, c = 30, μ^i = 140 ∀ i = 1, …, 8, and θ ∈ {0, 0.1, 0.4}. For each parameter setting of θ, 10,000 simulation experiments were conducted, and the probability of continuation after the kth test was calculated as the fraction of experiments where the optimal policy proceeds beyond the kth test.
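The interval (−0.71, 0.079) can be recovered directly from the piecewise-linear continuation condition above. The one-line check below is ours; it uses the rounded reservation-price constant 0.89 reported in the text:

    mu, rho, R0 = -0.91, 0.1, 0.89        # R0: reservation-price offset of B after testing A
    zeta = R0 + mu * (1 - rho)            # continue iff R0 + mu + rho*(a - mu) > max(a, 0)
    print(round(-zeta / rho, 2), round(zeta / (1 - rho), 3))   # -0.71 and 0.079

Note that the two bounds have the same structure as the continuation interval of Corollary 1, with λ_1 = ρ playing the role of the mean-correction weight.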

To further build intuition about the drivers of the total number of tests, we perform comparative statics with respect to the design space characteristics. It is important, from a practitioner perspective, to outline such contingency-based guidelines (Loch et al. 2006). For ease of exposition, assume that the performance expectations of all the design configurations are identical (= μ). Let η(μ, σ) be the total number of tests conducted under the premises of the optimal policy, when the expected configuration performance is μ and the uncertainty is σ².

Proposition 3.
(i) The number of tests increases in the initial expectations of design performance, i.e., ∂η(μ, σ)/∂μ ≥ 0.
(ii) The number of tests increases in the initial design performance uncertainty, i.e., ∂η(μ, σ)/∂σ ≥ 0.

Proposition 3 states formally that more experiments should be conducted on a design space where configurations have higher a priori performance expectations and higher performance uncertainty. These results confirm intuition, but they emerge for different reasons. High initial expectations lead to the design team foreseeing a higher potential value (as expressed by the −Φ^i_k term), and thus result in additional testing. In contrast, higher initial performance uncertainty leads to an increase in the stopping threshold (Δ_k), and leads to increased experimentation so as to reduce the (residual) uncertainty.

Next, we examine the impact of performance correlations (i.e., similarities between design configurations and the complexity of the performance function) on the number of experiments. The effect is not unidirectional. It depends on the performance realizations of the initially tested designs. Therefore, we perform a contingency analysis: we assume that k configurations have been tested and have yielded performances p₁, …, p_k. The following definitions describe different contingency scenarios.

Definition 1. Suppose k design configurations have been tested, yielding the performances p₁, …, p_k. Then, we define the scenario (test outcomes) as
• worse than expected if the average configuration performance realized is lower than the average expected one, i.e., Σ_{i=1}^{k} p_i/k < Σ_{i=1}^{k} μ^i/k;
• better than expected if the average configuration performance realized is higher than the average expected one, i.e., Σ_{i=1}^{k} p_i/k > Σ_{i=1}^{k} μ^i/k.

Given these two contingencies, Proposition 4 describes the optimal actions:

Proposition 4. Assume k configurations have been tested.
(A) If the test outcomes are worse than expected, the design team should transition to the exploitation phase when the design performances are strongly correlated (high θ).
(B) If the test outcomes are better than expected, there is no unidirectional result.
(C) If the test outcomes are better than expected with a sufficiently high difference, the design team should continue the exploration phase when the design performances are strongly correlated (high θ).

Proposition 4 highlights the subtle moderating role of the performance correlations (due to the design space structure) on optimal contingency actions. Suppose that the test outcomes are, on average, poorer than initially expected. The information that these test results convey about the rest of the configurations depends on the underlying structure of the design space. When the performance correlations are greater, reflecting similar design configurations and a simple performance function, the initial test outcomes are more informative about the performances of the remaining configurations. Hence, the design team extrapolates the initial poor performances to the rest of the design space and, consequently, transitions to the exploitation mode (i.e., increased likelihood of termination). On the other hand, if the performance correlations are weak, the initial poor test outcomes are not necessarily indicative of the remaining design performances. Consequently, the team may continue exploration even when worse-than-expected test outcomes are encountered at the early stages.

Claims (B) and (C) follow a similar (but more complex) line of reasoning. If the test results are better than expected, the design team uses the initial test outcomes as a signal that the remaining designs have higher than the (initially) expected performances. This signal is stronger when the design performances are more correlated, and, thus, results in continued exploration. At the same time, greater dependencies also result in faster learning (rapid uncertainty reduction) and an earlier transition to the exploitation phase. Which of these opposing effects is stronger depends on the specific parameter values. When the realized performance levels are much better than expected, the effect of the "high performance potential" dominates, leading to continued exploration.

Table 1 summarizes the potential managerial contingency actions. After testing a few designs, the design team assesses the actual performance realizations and their difference from the initial expectations. If the test outcomes are below the a priori expectations, then the design team assesses that a worse-than-expected scenario has emerged. In the event that the performance function is simple and/or the distance between design configurations is lesser, such initial bad news implies that there is not much more to expect from the design space at hand; thus, the team should most probably attempt to reduce the remaining experimentation cost by transitioning to the exploitation phase. If, on the other hand, the performance function is complex and/or the distance between design configurations is greater, then targeting reduction of the remaining experimentation costs is probably misplaced, as the initial bad news is not indicative of the remainder of the design space. Similarly, if test outcomes emerge significantly better than expected, then the design team should plan on further exploring the design space due to the high remaining potential. As before, this continued exploration is more likely when the initial good news is more indicative of the design space (i.e., when the performance function is simple and/or when the distance between designs is lesser).

Table 1. Impact of Design-Space Structure on the Contingency Plan

                                        Much better than expected       Worse than expected
                                        (Very good news)                (Bad news)
High correlation θ                      Continue exploration            Transition to exploitation
(Simple performance function,          (increased continuation         (lower continuation
less distance between designs)          probability)                    probability)
Low correlation θ                       Transition to exploitation      Continue exploration
(Complex performance function,         (lower continuation             (increased continuation
greater distance between designs)       probability)                    probability)

The importance of Table 1 stems from the fact that it outlines guidelines for decision making, and it demonstrates the nontrivial impact of the design space structure on the optimal test schedule. In that light, the contingency actions outlined may also be viewed as guidelines to achieve more reliable project monitoring and control (Loch et al. 2006).

3.2. Uncertain Performance Dependencies
Thus far, we have implicitly assumed that the design team can at least estimate the performance correlations between the design configurations. However, when the designs involve highly innovative technologies, the performance correlations arising from the configuration similarities may be unknown. In such cases, we assume that the design team specifies a likelihood for the performance correlations. In this section, we analyze the impact of uncertainty in performance correlations on optimal experimentation.¹⁰

The design team holds the belief that the performance correlations are weak (low θ_L) with probability p or strong (high θ_H) with probability 1 − p. The sequential experiments yield an additional signal S_k, in addition to the design performance result. We assume that the signal S_k received after the kth test reveals the true correlation with probability ν, and that it is independent from the past performances p₁, …, p_k. In other words, understanding the true performance correlations requires additional information, such as exploring relevant scientific theories and formulating abstract explanatory models of the design space. These signals are far more general than the performance outcome information; therefore, we assume that the dependence between S_k and the past performances is negligible. Theorem 2 presents the optimal policy.

Theorem 2. If the uncertainty regarding the performance correlations has been resolved, then use Theorem 1. Otherwise, index the design configurations in descending order of their a priori expected performance, i.e., i > j ⇒ μ^i ≤ μ^j. Define p^{k+1}_max = max_{j≤k} p_j as the highest realized configuration performance. There exist path-dependent thresholds Ω^i_{k+1}(Σ_{j=1}^{k} p_j) (i = 1, …, N) such that, after the kth test, the design team should stop if and only if p^{k+1}_max ≥ Ω^i_{k+1} for all i > k. Else, they should proceed with the design configuration with the highest updated expected performance.

The optimal policy shows marked similarity in structure to Theorem 1. However, the stopping thresholds are no longer path independent: they depend on the sum of the realized configuration performances. Mirroring Corollary 1, the following corollary gives an alternative formulation of the optimal policy.

Corollary 2. There exist thresholds ω_l and ω_h such that there is an optimal policy where, after the kth test, termination is optimal if p_k ≤ ω_l or p_k ≥ ω_h.

Corollary 2 suggests a surprising robustness of our initial results. Termination of the experimentation process occurs either because (i) the test outcome is high enough that the design team does not expect to receive a much higher value from additional experimentation (p_k ≥ ω_h), or (ii) the test outcome is so low that the design team expects the remaining designs to be low performing as well (p_k ≤ ω_l).

3.3. General Case: Multiple Design Classes
In this section, we extend our base case model to more general design space topographies. We consider that there exist distinct subclasses, with stronger configuration dependencies within classes and weaker dependencies across classes. The new setup represents design problems that can be solved through substantially different approaches (e.g., recent efforts to substitute bar codes with radio-frequency identification tags imprinted on different materials like paper, polymers, etc.; see PRC 2004).

Consider M distinct classes of design configurations. Each class j contains N_j different configurations. A priori, the design configurations of class j have unknown performances with identical¹¹ expectations (μ_j) and performance uncertainties (σ_j²).

¹⁰ We thank the referee team for suggesting this extension.
¹¹ We restrict the performance expectations to be identical within a class to reduce the notational burden. Extensions to a more general setting are straightforward.

In addition, the attributes that define a class (for instance, the underlying technology) lead to significantly correlated performances within a class (θ = θ_j within a class), and negligible dependencies across classes (θ = 0 across classes).

Suppose the design team has sequentially tested k design configurations. Theorem 3 provides the dynamic optimal policy for the new setting.

Theorem 3. Define the surplus performance observed thus far in class j as the normalization Φ^{k+1}_j = max_{l≤k} p_l − μ̂^{k+1}_j. Then, there exist constants Δ^i_j (j = 1, …, M; i = 1, …, N_j) such that, after the kth test, the design team should stop if and only if Φ^{k+1}_j ≥ Δ^{k_j}_j for all design classes j (with k_j tests conducted in class j so far); otherwise, continue with the class that has the highest Δ^{k_j}_j − Φ^{k+1}_j.

Theorem 3 demonstrates that the stopping decision ("how many more experiments should we undertake") is relatively robust and structurally similar to Theorem 1. Our policy permits an interesting analogy to Weitzman (1979), if we view the design subclasses as Weitzman's (1979) boxes. Under this interpretation, the optimal decision is equivalent to terminating the test cycle only if the best performance obtained so far is greater than the (updated) reservation prices, and continuing with the box that has the highest (updated) reservation price. However, note that this analogy can only be taken so far because, in our multiple class setting, repeated sampling of a box (design class) is feasible, unlike in the one-trial-per-box case analyzed by Weitzman.

Also, similar to Corollary 1, it can be verified that the design team should stop testing the designs within a given subclass when an observed outcome from the subclass is either too high or too low (i.e., continuation only when the outcome is within an interval). Furthermore, the overall test cycle is terminated when any of the observed test outcomes is very high, because this would indicate that the likelihood of obtaining a still higher performance from any of the design classes is low, given the experimentation cost.
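A sketch of Theorem 3's class-selection step (ours; as before, the thresholds are assumed to be precomputed from the dynamic program, and the dictionary keys are our own naming):

    def multiclass_step(best_so_far, classes):
        # classes: list of dicts with keys 'mu_hat' (updated mean of the class's best
        # untested design) and 'delta' (threshold Delta_j^{k_j}, assumed given).
        # Returns the index of the class to test next, or None to stop.
        surpluses = [best_so_far - cl['mu_hat'] for cl in classes]      # Phi_j^{k+1}
        if all(phi >= cl['delta'] for phi, cl in zip(surpluses, classes)):
            return None                                                 # stop: every class is exhausted
        slack = [cl['delta'] - phi for phi, cl in zip(surpluses, classes)]
        return max(range(len(classes)), key=slack.__getitem__)          # class with largest remaining potential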
4. Discussion

This study examines the learning aspect of iterative testing cycles during the experimentation stage of an NPD process. We conceptualize each design configuration as a vector of attributes. These attributes jointly affect the final performance of each configuration in complex ways, due to possible coupling effects that are not fully known to the design team. Designs also may exhibit similarities with respect to their attributes, thus making some designs closer to others in the design space. Past experience and domain knowledge may allow the design team to approximate such a design space structure through a surrogate metric that accounts for both the complexity of the performance function and the distance between the designs: the correlation between design performances. We develop a normative model, and we derive the optimal dynamic testing policy for design configurations that have different and uncertain performance levels.

Test continuation is optimal when the previous test outcomes range between two thresholds. Test outcomes below the lower threshold indicate an overall low-performing design space, where subsequent tests are unprofitable. Test outcomes above the upper threshold, on the other hand, indicate that the design team cannot expect a much higher value (than the highest value already found) by undertaking additional experiments. The upper threshold result is consistent with the optimal policy of Weitzman (1979). Furthermore, we also demonstrate that the design team may optimally pursue experimentation even after a "good" design is found, which, in the absence of performance correlations, would justify stopping. The potential to learn about the design space adds value to the testing process above and beyond the immediate performance benefits.

These results help us understand the role of the design space structure on learning and on optimal experimentation: fewer experiments, with cost savings, when the information suggests that the untested design configurations may be performing low; or additional experimentation seeking higher performance outcomes, when the information indicates untapped potential.

Furthermore, we demonstrate that the consideration of learning splits the overall experimentation process into two distinct phases. An exploration phase occurs at the beginning of the testing process and is characterized by a strong tendency to continue testing even when outcomes are good. During this phase, the "design-build-test-analyze" iterations allow the design team to learn about the remaining design options. We also find that greater correlations among the design performances enable faster learning and thus reduce the length of the exploration phase. Once the design team gains sufficient understanding about the design space, experimentation transitions into an exploitation phase. At that point, the likelihood of terminating testing and choosing the best design (found so far) increases. During this phase, the design team, based on what they have learned, conducts the "design-build-test-analyze" iterations only to test the remaining promising alternatives.

Finally, we analyze how the optimal continuation decisions change with the initial test outcomes. Our findings are summarized in Table 1, and they provide a useful guideline for contingency planning. When the performance correlations of designs are high, poor test outcomes at the initial stages imply that there is not much more to expect from the remaining design space. Thus, the team should stop or transition to the exploitation phase to minimize the experimentation cost. If, on the other hand, the design correlations are low, then extrapolating from initial poor results is premature, and exploration may continue.

Our policy exhibits a robust structure along two extensions: (i) design team uncertainty about the performance dependencies (i.e., unknown correlation), and (ii) nonhomogeneous performance dependencies (i.e., a design space with distinct subclasses).
4.1. Limitations and Future Research
In our analytical model, we employed a set of assumptions in order to develop a qualitative understanding of the impact of the design space structure on learning and on the optimal amount of experimentation. However, three of these assumptions challenge the mathematical generality of our model and thus merit further discussion: [A1] normal distribution of design performances; [A2] equal testing cost within a class; and [A3] equal variance (uncertainty) in design performances within a class.

Theorem 4 offers an impossibility result demonstrating that the three key assumptions listed above are not merely made for convenience but are, in fact, necessary to achieve mathematical tractability.

Theorem 4. Index policies are suboptimal if any one of the three assumptions A1, A2, or A3 is relaxed to account for arbitrary structures.

Theorem 4 demonstrates that the three key assumptions do indeed place our model at the boundary of mathematical tractability (defined as the existence of an index policy). Despite this inherent limit on generalizability, the model does offer an increased understanding of the key implications of the design space structure for optimal experimentation. We consider our balance between realism and mathematical tractability to be a fair price to pay for developing this understanding. Loch et al. (2001) highlight the effect of problem parameters, such as test fidelity and delay costs, on the total experimentation cost. Dahan and Mendelson (2001) examine the trade-off between performance benefits and experimentation cost when experiments are run in parallel (thus eliminating any potential for learning). Our approach thus complements these past efforts by explicitly accounting for the design space structure and its effects on learning and on the amount of experimentation.

Finally, the limitations of the current conceptualization of optimal experimentation and some emerging industry practices suggest a need for re-evaluating the search-theoretic model as a whole. Firms are increasingly seeking innovation through collaborative efforts; such settings present challenges in understanding how firms manage distributed (or delegated) experimentation (Terwiesch and Loch 2004). At the same time, it is also imperative to recognize that experimentation and search are performed by individuals and, hence, are subject to behavioral traits (see Zwick et al. 2003 for a discussion of how certain behavioral elements affect sequential choices). Such extensions would require further field observation of how actual testing teams arrive at decisions and judgments during experimentation cycles.

5. Electronic Companion
An electronic companion to this paper is available as part of the online version that can be found at http://mansci.journal.informs.org/.

Acknowledgments
The authors thank Christoph Loch, Beril Toktay, the associate editor, and two anonymous reviewers for their valuable suggestions and comments that have greatly helped improve this paper.
References
Adam, K. 2001. Learning while searching for the best alternative. J. Econom. Theory 101 252–280.
Allen, T. J. 1977. Managing the Flow of Technology. MIT Press, Cambridge, MA.
Clark, K. B. 1989. Project scope and project performance: The effects of parts and supplier strategy in product development. Management Sci. 35(10) 1247–1263.
Clark, K. B., T. Fujimoto. 1989. Lead-time in automobile development: Explaining the Japanese advantage. J. Tech. Engrg. Management 6 25–58.
Cusumano, M., R. Selby. 1995. Microsoft Secrets. Free Press, New York.
Dahan, E., H. Mendelson. 2001. An extreme-value model of concept testing. Management Sci. 47(1) 102–116.
DeGroot, M. H. 1968. Some problems of optimal stopping. J. Roy. Statist. Soc. B 30 108–122.
Ethiraj, S. K., D. Levinthal. 2004. Modularity and innovation in complex systems. Management Sci. 50(2) 159–173.
Globe and Mail, The. 2004. Nike suits will make Canucks so swift. (June 24) S3.
Johnson, R. A., D. W. Wichern. 2002. Applied Multivariate Statistical Analysis, 5th ed. Prentice Hall, Upper Saddle River, NJ.
Kauffman, S. A. 1993. The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, New York.
Kavadias, S., C. Loch. 2003. Dynamic prioritization of projects at a scarce resource. Production Oper. Management 12 433–444.
Loch, C., C. Terwiesch. 2007. Coordination and information exchange. C. H. Loch, S. Kavadias, eds. Handbook of New Product Development Management, Chap. 11. Elsevier Butterworth-Heinemann, Oxford, UK.
Loch, C., A. DeMeyer, M. Pich. 2006. Managing the Unknown: A New Approach to Managing High Uncertainty and Risk in Projects. John Wiley & Sons, London.
Loch, C. H., C. Terwiesch, S. Thomke. 2001. Parallel and sequential testing of design alternatives. Management Sci. 47(5) 663–678.
PRC. 2004. Microsystems packaging research center: Leading the SOP and Nano paradigms in partnership with global industry. Georgia Tech Packaging Research Center, Atlanta.
Rencher, A. 2002. Methods of Multivariate Analysis. Wiley Series in Probability and Statistics, New York.
San Jose Mercury News. 2004. Technology has role in cyclist's training. (June 28).
Simon, H. A. 1969. The Sciences of the Artificial, 1st ed. MIT Press, Cambridge, MA.
Sobek, D., A. Ward, J. Liker. 1999. Toyota's principles of set-based concurrent engineering. Sloan Management Rev. 40 67–83.
Stadler, P. F. 1992. Correlation in landscapes of combinatorial optimization problems. Eur. Physical Lett. 20 479–482.
Terwiesch, C., C. H. Loch. 2004. Collaborative prototyping and pricing of custom-designed products. Management Sci. 50(2) 145–158.
Thomke, S. H. 1998. Managing experimentation in the design of new products. Management Sci. 44(6) 743–762.
Thomke, S. H. 2003. Experimentation Matters: Unlocking the Potential of New Technologies for Innovation. Harvard Business School Press, Boston, MA.
Thomke, S., M. Holzner, T. Gholami. 1999. The crash in the machine. Sci. Amer. 280(3) 92–97.
Ward, A., J. Liker, J. Cristiano, D. Sobek. 1995. The second Toyota paradox: How delaying decisions can make better cars faster. Sloan Management Rev. 36 43–61.
Weitzman, M. L. 1979. Optimal search for the best alternative. Econometrica 47 641–654.
Zwick, R., A. Rapoport, A. K. C. Lo, A. V. MuthuKrishnan. 2003. Consumer sequential search: Not enough or too much? Marketing Sci. 22(4) 503–519.