Quick viewing(Text Mode)

Toward a Causal Interpretation from Observational Data: a New Bayesian Networks Method for Structural Models with Latent Variabl

Toward a Causal Interpretation from Observational Data: a New Bayesian Networks Method for Structural Models with Latent Variabl

Published online ahead of print May 12, 2009

Information Systems Research informs ® Articles in Advance, pp. 1–27 y doi 10.1287/isre.1080.0224 issn 1047-7047 eissn 1526-5536 © 2009 INFORMS .org. ma ms file or The

. Research Note ibers Toward a Causal Interpretation from Observational missions@inf subscr per Data: A New Bayesian Networks Method for to Structural Models with Latent Variables policy institutional

this Zhiqiang (Eric) Zheng to School of Management, University of Texas at Dallas, Richardson, Texas 75083, le [email protected] Paul A. Pavlou ailab v Fox School of Business and Management, Temple University, Philadelphia, Pennsylvania 19122, regarding a [email protected]

made ecause a fundamental attribute of a good theory is , the Information Systems (IS) literature has

is strived to infer causality from empirical data, typically seeking causal interpretations from longitudinal,

questions B experimental, and panel data that include time precedence. However, such data are not always obtainable and y observational (cross-sectional, nonexperimental) data are often the only data available. To infer causality from an which observational data that are common in empirical IS research, this study develops a new data analysis method that integrates the Bayesian networks (BN) and structural equation modeling (SEM) literatures. send Similar to SEM techniques (e.g., LISREL and PLS), the proposed Bayesian Networks for Latent Variables ersion, (BN-LV) method tests both the measurement model and the structural model. The method operates in two v stages: First, it inductively identifies the most likely LVs from measurement items without prespecifying a

Please measurement model. Second, it compares all the possible structural models among the identified LVs in an . ance exploratory (automated) fashion and it discovers the most likely causal structure. By exploring the causal struc-

site tural model that is not restricted to linear relationships, BN-LV contributes to the empirical IS literature by Adv overcoming three SEM limitations (Lee et al. 1997)—lack of causality , restrictive model structure, and in lack of nonlinearities. Moreover, BN-LV extends the BN literature by (1) overcoming the problem of latent vari- able identification using observed (raw) measurement items as the only inputs, and (2) enabling the use of author’s ticles ordinal and discrete (Likert-type) data, which are commonly used in empirical IS studies. Ar

the The BN-LV method is first illustrated and tested with actual empirical data to demonstrate how it can help reconcile competing hypotheses in terms of the direction of causality in a structural model. Second, we conduct this a comprehensive simulation study to demonstrate the effectiveness of BN-LV compared to existing techniques to in the SEM and BN literatures. The advantages of BN-LV in terms of measurement model construction and

including structural model discovery are discussed. ight , yr Key words: causality; Bayesian networks; structural equation modeling; observational data; Bayesian graphs History: Vallabh Sambamurthy, Senior Editor and Associate Editor. This paper was received on October 25, cop ebsite 2006, and was with the authors 20 months for 2 revisions. Published online in Articles in Advance. w holds other y 1. Introduction the requisite attention. Similar to most other disci- an Because a fundamental attribute of a good theory plines (e.g., Mitchell and James 2001, Shugan 2007), the on IS discipline tends to avoid issues of causality because INFORMS is causality (Bagozzi 1980), causality inference (X causes Y ) is deemed invaluable in the social and of the difficulty in inferring causal relationships from

posted behavioral sciences in general and information sys- data, and because causality is only inferred from pure yright:

be tems (IS) research in particular. However, despite the theory. This is partly because of the fact that causality enhanced sophistication of IS studies in terms of the- inference requires strict conditions. Though there is no Cop not ory and empirical testing, causality has not received consensus on the necessary and sufficient conditions

1 Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables 2 Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS y .org. ma for inferring causality, Popper’s (1959) three condi- (3) help spawn future research in refining SEM-based ms file

or tions for inferring causality are generally accepted: methods that render causal interpretations.

The (1) X precedes Y ; (2) X and Y are related; and (3) no According to Lee et al. (1997), SEM methods have . factors explain the X → Y relationship. three key limitations: lack of causality inference, To satisfy these strict conditions, researchers need to restrictive model structure, and lack of nonlinearities. ibers use longitudinal, experimental, or panel data with First, though SEM was originally designed to model missions@inf time precedence between variables X and Y to account causal relationships, causality has gradually faded subscr per for confounds and reverse causality (Allison 2005).1 away from SEM studies (Pearl 2000). In fact, a review to However, it is often impossible to obtain such data of the literature suggests that SEM studies do not in IS research (Mithas et al. 2006, p. 223) and obser- attempt to infer causality, and most SEM (Cartwright

policy vational (cross-sectional, nonexperimental) data are 1995) and IS researchers (Gefen et al. 2000) believe institutional that SEM models cannot infer causality.3 The inability

this often the only data available. Therefore, our objective to is to develop a method to help infer causality using for has forced IS researchers to refrain le observational data that are commonly used in empiri- from even discussing issues of causality in IS studies.

ailab cal IS research. Second, most SEM studies specify one model struc- v regarding a Following the literature that maintains that “near” ture and use data to confirm or disconfirm this spe- 4 (versus “absolute”) causality inference is possible from cific structure by operating in a confirmatory mode.

made observational data (e.g., Granger 1986, Holland 1986), This prevents the automated exploration of alterna- is tive or equivalent models. Chin (1998) warns that

questions we develop a new data analysis method built on the

y Bayesian networks (BN) and structural equation mod- overlooking equivalent models is common in SEM

an studies, and Breckler (1990) showed that only 1 of 72 which eling (SEM) literature that offers a causal interpreta- tion to relationships among latent variables (LVs) in published SEM studies even suggests the possibility send structural equation models.2 Our proposed method of alternative models. Third, SEM only encodes linear ersion, relationships among constructs, essentially ignoring v (termed BN-LV—Bayesian networks for latent vari- the possibility of nonlinear relationships.5

Please ables) encodes the relationships among LVs in a graph- . ance ical model as conditional probabilities, it accounts To address these three SEM limitations, we devel-

site oped the BN-LV method, which has three key

Adv for potential confounds, and it discovers the most properties: First, it encodes the relationships among in likely causal structure from observational data. The proposed BN-LV method seeks to (1) sensitize IS constructs as conditional probabilities that, according to

author’s Druzdzel and Simon (1993), can offer a causal interpre- ticles researchers about the importance of causality and

Ar tation (as opposed to SEM, which uses correlation that the present the possibility to infer causal relationships from data, (2) offer a method to IS researchers to help this 3 SEM techniques are primarily based on linear equations (e.g., PLS) to infer causality among constructs from observational or covariance structures (e.g., LISREL). Additional conditions such data while overcoming key SEM limitations, and including as isolation of competing hypotheses (Cook and Campbell 1979) or ight , temporal ordering (Bollen 1989) are deemed necessary. yr 1 Even with longitudinal data that have time precedence, it is not 4 To the best of our knowledge, no SEM techniques allow researchers cop ebsite readily known which variable precedes which. It is often impos- to automate the process of examining alternative models. Manual w sible to know when a person formed certain perceptions (e.g., examination of alternative models becomes virtually impossible for

holds perceived usefulness and perceived ease of use), even if these vari- complex models with multiple constructs. other ables are measured in different periods. Thus, even data with time 5 There have been attempts to incorporate and nonlinear y precedence may not correspond to the actual timing of a person’s effects in SEM (e.g., Kenny and Judd 1984). However, as acknowl- an perceptions. edged by Kenny and Judd (1984, p. 209), their method is prelim- on 2 From a theoretical point of view, because the proposed BN-LV inary because it only deals with a single nonlinearity (quadratic INFORMS method uses observational, cross-sectional data, it only addresses function). Existing approaches need to prespecify the exact form of two of Popper’s (1959) three conditions for inferring causality, the nonlinear relationship. A general approach with unknown non- posted excluding the condition that X must precede Y . Therefore, it is not linearities still remains open in the SEM literature. Similarly, there yright: be a necessary and sufficient condition for inferring absolute causal- have been attempts in PLS to model interaction effects (e.g., Chin ity, but it is a method for inferring “near” causality (Granger 1986, et al. 2003) but a general method for dealing with nonlinearities Cop not Holland 1986). among LVs is still not incorporated in to the PLS method. Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS 3 y .org. ma does not imply causality). Second, BN-LV can auto- need to be aware of theoretical rationale justifying the ms file

or matically discover the most likely structural model X → Y or Y → X causal orders.” However, solely rely-

The from observational data without imposing a prespeci- ing on theories to reconcile the directionality of causal- . fied structure, thus exploring alternative SEM models. ity may not be sufficient because there can be equally Third, BN-LV does not rely on any functional form plausible theories, such as the direction of causality ibers (e.g., linear) to capture the relationships among con- between trust and ease of use. BN-LV is particularly missions@inf structs, thus allowing potentially nonlinear relation- useful in these circumstances by providing a data- subscr per ships to freely emerge in the structural model. driven method to reconcile competing hypotheses. to Similar to existing SEM techniques (e.g., LISREL This also has implications for new theory develop- and PLS), the proposed BN-LV method operates in ment (where there is no theory basis), which is com- policy two stages: measurement model construction and mon in IS research because of the rapid change of IT. institutional 6

this structural model discovery. First, BN-LV inductively To evaluate BN-LV relative to existing data analy- to identifies the LVs given the measurement items in sis techniques for testing the measurement and the le an exploratory mode. This is achieved by our pro- structural model, we conducted a large-scale simula- ailab posed LVI (Latent Variable Identification) algorithm, tion study by varying four data dimensions: sample v regarding a which is based on testing the conditional indepen- size, noise, linearity, and normality. First, we com- dence axiom (Kline 1998, Heinen 1996). This axiom pared LVI with the exploratory factor analysis (EFA) made asserts that the measurement items of the same LV (SAS Proc Factor) and the confirmatory factor anal- is

questions are supposed to be caused by the LV and thus ysis (PLS) for measurement model testing. Second, y should be independent of each other (conditional on we compared BN-LV with LISREL and PLS in terms an which the LV). Second, after the LVs are identified, BN-LV of structural model testing. Third, we compared our exploratory discovers the most likely causal structure proposed OL scoring function with two existing BN send among the LVs. In particular, we develop the OL scoring functions—a Bayesian-Dirichlet-based func- ersion, v (ordered logit) scoring function to select competing tion (Heckerman 1996) and a Gaussian-based function

Please structures specifically for ordinal and discrete (Likert- (Glymour et al. 1987). The results show that BN-LV . ance type) data, which are common in IS research. Besides, overall outperforms all the other techniques under site

Adv BN-LV can also be used in a confirmatory mode by three of the four simulated conditions (size, linearity, in examining the fitness of a potential causal structure. normality) except when the data are noisy, because Overall, the inputs to the BN-LV method are the raw SEM methods (LISREL and PLS) tend to work well author’s ticles measurement items, and the final output is the most with noisy data (Fornell and Larcker 1981). Ar the likely causal BN graph that links the identified LVs. This study contributes to the IS literature by propos- We describe how BN-LV works with actual empiri- this ing a new data analysis method for inferring causal to cal data to demonstrate how BN-LV can help reconcile relationships from observational, cross-sectional, competing hypotheses in terms of the directionality of including Likert-type data that are prevalent in IS research. ight , causality when integrating trust with the technology yr Our BN-LV method has several advantages over acceptance model (TAM) (Gefen et al. 2003, Pavlou alternative SEM methods: First, it tests the measure- cop ebsite 2003), specifically the relationship between two con- w ment model by identifying the appropriate LVs from structs: trust and ease of use. Carte and Russell (2003) raw measurement items, operating in an exploratory holds argue that it is a common error in IS research not to other mode without imposing a determined measurement y examine the reverse causality between two variables X model structure (as opposed to SEM). Our novel use an and Y (Carte and Russell 2003, p. 487): “Investigators of the axiom enables causal on

INFORMS interpretation between the LV and its associated 6 The proposed BN-LV method is essentially a data analysis tech- measurement items, thereby being the only method nique that only examines the measurement model (validation of the

posted that is consistent with the theory of measurement. measurement properties of LVs) and the structural model (testing yright: be the causal relationships among LV). It does not address issues of Second, BN-LV infers causal (as opposed to correla- theory and measurement development, data collection, and theory tion) links between the identified LVs by testing all Cop not implications that a complete research project includes. plausible structural models in an automated fashion Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables 4 Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS y .org. ma and discovering the most likely one. This exploratory the basis of the modern scientific concept that spe- ms file

or nature offers a major advantage over SEM techniques cific stimuli produce standard results under certain

The that require manual specification of plausible models, conditions. Descartes (1637) argued that causality can . especially for complex models where such manual be understood and that cause is identical to sub- work becomes virtually impossible. This property stance. Kant (1781) posited cause as a basic category ibers also becomes valuable where there is little or no of understanding, arguing that causality is a world of missions@inf prior theory to guide the structure specification or “things in themselves.” subscr per when researchers want to let the data “speak out.” Aristotle (350 b.c.), Descartes (1637), and Kant (1781) to Also, BN-LV can still differentiate among prespecified posit that causality can be comprehended, but other candidate structures, allowing IS researchers to test philosophers disagree. In response to Aristotle’s “four

policy competing theories or question existing ones in a causes,” Spinoza (1662) believed that all final causes institutional are nothing but human fictions. Plato, in his famous this confirmatory mode. Finally, BN-LV offers a causal to interpretation among the LVs in the structural model “Allegory of the Cave,”8 questioned that humans can le by representing conditional probabilities. BN-LV understand causality. Hume (1738) holds the same

ailab relaxes the assumption of linear structures imposed by opinion, concluding that causality is not real but a v regarding a SEM methods, and BN-LV clearly outperforms SEM fiction of the mind. To account for the origin of this techniques (LISREL and PLS) when the structural fiction, Hume (1738) used the doctrine of associa- made model is tested with nonlinear simulated data. tion. He argues that we only learn by experience the is frequent conjunction of objects, without ever being questions The paper is organized as follows. Section 2 reviews

y the philosophical origins of causality, discusses the able to comprehend anything like the true causal an which challenges of inferring causality from data, and connection between them (Hume 1738, p. 46). Simi- reviews existing approaches for inferring causality larly, Pearson (1897), a founder of modern , send (propensity scores, SEM, and BN). Section 3 presents denied that causality was anything beyond frequency ersion, v the method development for the two components of of association. Summarizing the philosophical origins on causality, Please BN-LV—the LVI algorithm that identifies LVs from . ance raw measurement items (measurement model) and there is no consensus whether causality is real, simple

site association among phenomena, artifact, or the mind,

Adv the OL scoring function that helps build a Bayesian in network to identify causal relationships among LVs or even fiction (Shugan 2007). Despite these doubts (structural model). Section 4 describes the steps of the that causality is real or not, throughout history there author’s ticles proposed BN-LV method and evaluates the proposed have been many attempts to operationalize and infer

Ar causality from data, as discussed below. the method through an extensive empirical study with this actual data and a large-scale with simu- 2.2. Operationalizing Causality from Data to lated data. Section 5 discusses the study’s contribu- Hume (1738) laid the foundations for the modern

including tions and the advantages and limitations of BN-LV.

ight view of causality. Hume’s (1738) definition of X , yr causes Y stresses three conditions that can be verified

cop 2. Literature Review through observation: (1) precedence: X precedes Y in ebsite w 2.1. Philosophical Origins of Causality time; (2) contiguity: X and Y are contiguous in space

holds constant conjunction X Y The notion of causality entails a relationship of a and time; and (3) : and always other b.c. cooccur (or not cooccur). y cause to its effect. As early as 350 , Aristotle pro- an posed four distinct causes: the material, the formal, 8 7 Prisoners are in an underground cave with a fire behind them, on the efficient, and the final. Aristotle’s “four causes” is bound so they can see only the shadows on the wall cast by pup- INFORMS pets manipulated behind them. They think that this is all there is 7 The material cause is what something is made of. The formal cause to see; if released from their bonds and forced to turn around to posted is the form, type, or pattern according to which something is made. the fire and the puppets, they become bewildered and are happier yright: be The efficient cause is the immediate power acting to produce the left in their original state. They would think the things they see work. The final cause is the end or motive for the sake of which the on the wall (the shadows) were real; they would know nothing of Cop not work is produced (e.g., the owner’s pleasure). the real causes of the shadows (Plato 360 b.c.). Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS 5 y .org. ma Contemporary research has attempted to opera- changed through time. The two types are very differ- ms file

or tionalize causality through data-driven probabilities. ent in nature and probably require different definitions and methods of analysis.

The Suppes’ (1970) well-known operational causality def- . inition states that event X causes event Y if the prob- The well-known Granger causality, for example, ability of Y is higher given X than without X, i.e.,

ibers addresses causality for time-series data. Temporal PY X>PY ∼X. This definition is consistent causality often uses experimental methods that permit missions@inf with Hume’s (1738) constant conjunction criterion, yet (Mithas et al. 2006). However, because subscr per it makes Hume’s criterion probabilistic. A problem such data are often difficult or even impossible to to arises because there is often a third (confounding) fac- obtain, we focus on methods for inferring causal- tor. A common example is that atmospheric current Z ity from observational, cross-sectional data that are policy causes both lightning X and thunder Y . This satis- common in empirical information systems research. institutional ∼ this fies PY X > PY X; however, lightning does not Causality inference from such data is well-accepted to cause thunder. Suppes (1970) solves this problem by in the statistics (Holland 1986, Rubin and Waterman le necessitating that X and Y have no common cause, 2006), econometrics (Granger 1986), computer science ailab thus avoiding a statistical confounding. This condi- (Druzdzel and Simon 1993), and IS literatures (Lee v regarding a tion is also stressed in Popper (1959) who adds a et al. 1997). We review three main methods—SEM, condition that no third variable Z accounts for the propensity scores, and BN. made X–Y association. Cartwright (1995) argues that avoid- 2.3.1. Structural Equation Modeling (SEM). SEM is questions ing confounds requires that the relevant probabilities is one of the most common data analysis methods y be assessed relative to background contexts where all in IS research. Gefen et al. (2000) report that 45% an which other causal factors are held fixed. Probability may of empirical papers in Information Systems Research thus infer causality if the data from which the prob- send and 25% papers in Management Information Systems abilities are computed are obtained with appropriate Quarterly use SEM techniques. Its popularity stems ersion, v care to avoid confounds. from its advantage over , path anal-

Please The view that can allow ysis, factor analysis, panel models, and simultaneous . ance for causality inference has long been proposed in equation models. Though SEM was originally devel- site Adv the causality literature (Glymour et al. 1987, Pearl oped to model causal relationships, SEM methods in and Verma 1991, Spirtes et al. 2000). Druzdzel and are no longer believed to infer causality (see Foot- Simon (1993) explain that conditional probabilities note 3). To overcome this limitation, IS researchers author’s ticles make it possible to represent asymmetries among have attempted to extend SEM methods to allow for Ar the variables and, thereby, causality. There is strong evi- causality inference. Lee et al. (1997) proposed an eight- this dence that human beings are not indifferent to causal step framework that attempts to represent and dis- to relationships and often give causal interpretation to cover causal relationships from SEM data. Their idea

including conditional probabilities (Shugan 2007). In particu- is to integrate confirmatory analysis in SEM with ight , 9 yr lar, many studies seek to operationalize causality dis- exploratory analysis using TETRAD. However, Lee et al. (1997) did not elaborate on how causal rela- cop covery from data with conditional probabilities (e.g., ebsite

w Heckerman 1996, Pearl 2000). tionships can be discovered from data, nor did they demonstrate how TETRAD can be integrated with holds

other 2.3. Methods for Inferring Causality from SEM methods. y Observational Data

an 2.3.2. Propensity Scores. The propensity score Inferring causality can take place with temporal or

on method was originally proposed by Rosenbaum and

INFORMS cross-sectional data, with each approach having a Rubin (1983) to help assess causal effects of inter- different focus. According to Granger (1986, p. 967): ventions. For example, Rubin and Waterman’s (2006) posted In cross-sectional causation one is asking why a partic- yright: be ular unit is placed in a certain part of the distribution 9 TETRAD (Glymour et al. 1987) is a data-driven, graph-based for the variable of interest. In temporal causality one method developed in the machine-learning field that aims to dis- Cop not is asking why parameters of that distribution have cover causal relationships from data. Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables 6 Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS y .org. ma intervention—a pharmaceutical salesman’s visit to a in G. Denote as the parents of node X in G.As

ms i i file

or doctor—was shown to have a causal effect on a doc- in Figure 1, node C’s parents are A and B. Given the

The tor’s drug prescription. Their approach first summa- structure in G, the joint probability distribution for V . rizes all covariates into a single propensity score by is given by: regressing (often through a logistic regression) the ibers m treatment (salesman’s visit) on a set of covariates. The = px pxi i (1) missions@inf propensity score is thus the probability of a doctor i=1 subscr per being visited as a function of all covariate variables.

to From the chain rule of probability, we have: Mithas et al. (2006) used this approach to show the causal effect of customer relationship management m = policy applications on one-to-one marketing effectiveness. px pxi x1xi−1 (2) i=1 institutional

this They prescribed a set of assumptions that researchers to must make to infer causality with the aid of propen- A graph G represents causal relationships when le sity scores at the individual, firm, and economy lev- there is an edge from A to B, if and only if A is a ailab els. However, existing propensity scores methods only direct cause of B in G (Spirtes et al. 2002). For instance, v regarding a deal with one cause (treatment) and one effect. They when Figure 1 is a causal graph, an edge A → C is are not applicable to the causal graph discovery prob- interpreted as A is directly causing C or that C is made lem we aim to address in this paper, where the graph causally dependent on A. Druzdzel and Simon (1993) is

questions (or structural model) is composed of a network of introduced the basic conditions by which BN could y multiple causes and effects. reflect a causal structure: A BN is causal if and only if an which (a) each node of G and all of its predecessors describe 2.3.3. Bayesian Networks (BN). BNs are graphi- variables involved in a separate mechanism in the send cal structural models that encode probabilistic rela- graph, and (b) each node without predecessors repre- ersion, tionships among variables (Heckerman 1996). The BN v sents an exogenous variable. More formally, a causal literature has made major advances in inferring causal

Please Bayesian network is defined in terms of d-separation

. relationships from observational data (Binder et al. ance and the causal Markov assumption.

site 1997, Pearl 1998, Spirtes et al. 2002). We follow Heck-

Adv d-separation. A set of variables Z is said to d-separate erman’s (1996) and Friedman et al. (2000) notation to in variable X from variable Y , if an only if Z blocks every represent a generic graph (Figure 1). A graph GVE path from X to Y (Pearl 2000). Graphically, d-separation

author’s is referred to as a DAG (), when ticles typically exhibits itself in two cases: (1) X → Z → Y

Ar the edges E linking node V are directed and acyclic. the and (2) X ← Z → Y . The intuition behind this is: X and Directed means E has an asymmetric edge over V ,

this Y become independent of each other if they are con- and acyclic means that the directed edges do not form to ditioned on variable Z. X causes Y through Z in case circles.

including (1) and X and Y have a common cause Z in case (2). ight , Associated with each edge is a conditional proba- →  ← yr There is also a third case X Z Y , denoting that bility. A BN is a DAG that encodes a set of conditional  X and Y have a common effect Z . This case is oppo- cop

ebsite probabilities and conditional independence assertions

w site to d-separation: If two variables are independent, about variables V (Heckerman 1996). Lack of possi-  they will become dependent once conditioned on Z . holds ble arcs in G encodes conditional independencies. Let other = A set Z that d-separates X and Y should therefore not

y V X X , where m is the number of variables  1 m belong to Z . The notion of d-separation is especially

an and X is both the variable and its matching node i useful in constructing a BN because it controls possible on

INFORMS confounds in the form of Z. Figure 1 A Generic Bayesian Network with Five Nodes Causal Markov Assumption. This is the central

posted assumption that defines a causal BN. According to A E yright:

be this assumption, each node is independent of its non- C descendants in the graph, conditional on its parents

Cop not B D in the graph. Simply put, given a node’s immediate Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS 7 y .org. ma cause, we can disregard the causes of its ancestors. Heckerman (1997) consider a simple case where there ms file

or The parents of a node form the smallest set of vari- only exists a single hidden LV with a known struc-

The ables for which the relevant conditional independence ture. What remains unknown and needs to be deter- . holds. This assumption greatly reduces the complex- mined is only the value of this LV. This simplifies the ity of Equation (2) and the joint probability of Fig- hidden LV problem to a special type of missing data ibers ure 1 simplifies to: PABCDE = PA× PB× problem where all values of the LV are missing. Impu- missions@inf PC A B × PE C × PD C. By accepting tation methods, such as the expectation maximization subscr per the causal Markov assumption, we can then infer algorithm, can be used to impute the missing values. to some causal relationships from observational data Similarly, Binder et al. (1997) assume that the complete (Friedman et al. 2000). network structure that includes the location of the LVs policy is known, and the goal is to learn the BN parameters in institutional

this 3. Method Development the presence of LVs. However, when a certain structure to is involved, the difficulty arises from having an unlim- le 3.1. Rationale and Overview of the Proposed Method ited number of hidden LVs and an unlimited number ailab

v of network structures to contain them (Cooper 1995).

regarding The main interest of SEM studies is the structural a 11 model, i.e., the relationships among LVs (or theoreti- Determining network structures is thus NP-hard and cal constructs). LVs are assumed to be unobservable heuristic methods may be necessary. For instance, made phenomena that are not directly measurable. What is Elidan et al. (2000) propose a “natural” approach by is questions measurement items identifying the “structure signature” of hidden LVs,

y observable, however, are the of each which uses a heuristic to identify “semicliques” when

an LV, the raw inputs to an SEM model. SEM studies which thus also address the measurement model—the relation- each of the variables connects to at least half of the others. If such a semiclique is detected, a hidden vari- send ships among the measurement items and their LVs— able is introduced as the parent node to replace the ersion, that test how well the LVs were actually measured. v First, given a set of measurement items, how can semiclique. Silva et al. (2006) questioned this ad-hoc Please approach and proposed a method for determining the . we identify the overarching LVs? This question does ance not often arise in the SEM literature because the com- location of LVs based on a TETRAD difference that, site Adv mon SEM methods (e.g., confirmatory factor analysis loosely speaking, captures the intercorrelations among in (CFA) in LISREL and PLS) mostly work in a confir- four variables (Spirtes et al. 2000, p. 264). However, the matory mode by prespecifying which measurement approach of Silva et al. (2006) only focuses on lin- author’s ticles items load on which LVs (Gefen et al. 2000). Lee et al. ear continuous LVs for which the correlation-based Ar the (1997) criticize this confirmatory mode, pointing out TETRAD difference is applicable. this BN as a potential alternative for exploratory analysis. In sum, though the BN literature has made some to However, building a BN in the presence of hidden progress in developing methods for detecting “hid- including

ight LVs is a nontrivial problem that has long been rec- den” LVs from data, to the best of our knowledge , yr ognized as one of the crucial, yet unsolved problems constructing a BN with LVs from measurement items

cop in the BN literature (Cooper 1995, Friedman 1997). has still not been achieved. The proposed LVI algo- ebsite

w There are two fundamental issues to be addressed: rithm is thus developed (§3.2) to fill in this gap. It

holds (1) detecting the structure (or location) of LVs, and uses the axiom of conditional independence as the

other (2) calculating the values of the identified LVs. building block that provides a causal interpretation to y The BN literature only addresses the second issue the measurement model (§3.2.1); the value of an LV an without dealing with the structure problem.10 Elidan is determined by nonlinear programming that max- on

INFORMS and Friedman (2001) show that even learning the imizes conditional independence (§3.2.2). The actual dimensionality—the number of possible values—of

posted LVs is hard. Cooper (1995) and Chickering and 11 = For n nodes, the number of possible BN structures is fn yright: + − be n − i 1 i in i − = i=1 1 Cn2 fn i. For n 2, the possible structures is 3; 10 This is referred to as the structure problem of LVs because deter- for n = 3, it is 25; and for n = 5, it is 29,000 (Cooper and Herskovits Cop not mining the location of LVs in a BN also specifies the BN structure. 1992). Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables 8 Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS y .org. ma algorithm is presented in §3.2.3, which takes the raw and (2) it generates an equivalent class of graphs ms file

or measurement items as inputs and outputs the identi- among the LV, scores each candidate graph using the

The fied LVs. OL scoring function, and searches for the structure . Second, after the LVs are identified, how can we with the highest fitness score (described in §3.3). discover the most probable causal structure among ibers the LVs? Two generic issues need to be addressed: 3.2. Stage 1. Identifying Latent Variables from missions@inf (1) How can we determine that one causal structure Measurement Items subscr per is better than the other? (2) How can we search for The theory of measurement assumes that the LVs to the best structure among all possible graphs, a prob- “cause” the direct measurement items (or indica- lem known to be NP-hard? We first adopt the pop- tors).12 In theory, given an LV, the measurement items policy ular PC algorithm (named after Peter and Clark in are independent from each other (Kline 1998). This institutional

this Spirtes et al. 2000) to generate a good initial Bayesian is formally referred to as the axiom or assumption to network to reduce the number of searchers needed of conditional (or local) independence. Conditional inde- le (§3.3.1). We then refine this initial graph using a scor- pendence is the basis of the theory of measurement ailab ing approach (§3.3.2) to compare potential candidate and “the defining attribute of any latent structure v regarding a structures. The two state-of-the-art scoring functions analysis” (Heinen 1996, p. 4). However, existing SEM in the BN literature are the Bayesian-Dirichlet met- or BN models do not directly test this axiom. SEM made ric (Cooper and Herskovits 1992, Heckerman et al. methods mainly use correlation- or covariance-based is

questions 1996) and the Gaussian metric (Glymour et al. 1987). factor analysis methods to categorize measurement y However, neither scoring function is applicable to the items under LVs to test the measurement model.13 an which Likert-type data commonly used in IS, which ren- Herein, we propose a new algorithm that takes the der discrete and ordinal data. Specifically, the Bayesian- raw measurement items as the only inputs and thus send Dirichlet metric assumes a multinomial distribution, outputs the most likely LVs. This is accomplished by ersion, v with parameters distributed as Dirchlet. Its multino- identifying the most likely measurement model for

Please mial assumption, which is applicable to general dis- these measurement items by directly testing the axiom . ance crete data, ignores the ordinal nature of Likert-type of conditional independence. site

Adv data. The Gaussian metric treats data as continuous 3.2.1. Testing the Axiom of Conditional Indepen- in with a Gaussian distribution, ignoring the discrete dence. The conditional independence axiom (Heinen nature of Likert-type data.

author’s 1996, p. 4; Spirtes et al. 2000, p. 253; Bollen 1989) ticles To fill in this gap, we develop a new scoring func- Ar asserts that Rx x y—the conditional correlation bet- the tion, the proposed OL metric (§3.3.2.1). However, the i j ween any two measurement items xi and xj given the this OL metric only computes the overall fitness of a can- latent variable y—should approach zero for any pair to didate structure and it does not infer if a given struc- ∈ = of xi and xj where i, j 1m and i j. Rxixj yis

including ture is significantly better than the other. In view of ight , computed as follows: yr this, we develop a Chi-square test (§3.3.2.2). Integrat- ing the OL metric and the Chi-square test, we can then R − R R cop xixj xiy xj y ebsite =  Rx x y (3) w determine if a certain structure is significantly better i j 1 − R2 1 − R2 than the other. Finally, to intelligently search for the xiy xj y holds

other best causal structure, we adopt the greedy equivalence y search (GES) searching strategy (which is proven to 12 This property of “reflective” LV is shown in SEM models as an an be optimal by Chickering 2002) for BN construction arrow starting from the LV and pointing to the measurement items. on 13 Only one approach (Sarkar and Sriram 2001) uses conditional INFORMS (§3.3.2.3). Overall, the proposed BN-LV method can both independence to discover composite attributes, which can be viewed as a group of attributes caused by LVs. However, it assumes that

posted identify the measurement model and also test the the dependent variable (in the case of Sarkar and Sriram 2001, bank yright: be structural model. The BN-LV method operates in two failure) of the BN is known, and that given the dependent variable, major stages: (1) it first identifies the “hidden” LVs the composite attributes are independent. This is a strict condition Cop not given a set of measurement items (described in §3.2), that SEM studies do not meet. Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS 9 y .org. ma We choose the common t-test independence. Without loss of generality, suppose ms file

or r an LV y has m indicators x1x2xm. Let 1, 2, t =  m = The − 2 − m i=1 i 1be the corresponding weights. We . 1 r /n 2 = m then have y i=1 ixi, using the weighted average for correlation coefficient r of a sample size n to deter- approach.15 ibers We formulate the problem of assigning latent scores mine if Rxixj y is significantly different from zero. If missions@inf t>196 when n is sufficiently large, we assume that as an optimization problem of finding the optimal

subscr ∗ per 14 = the correlation is nonzero (p<005). This test is weight vector 12m, such that the to called into question for LVs with only two measure- maximum of all the m × m− 1/2 pairs of conditional 16 correlation R in absolute value is minimized. ment items (say, x1 and x2. For example, Rx1x2 y is xixj y policy always one when y is a linear combination of x1 and Formally, the optimization problem is formulated institutional below: this x2, similar to computing the factor scores in a prin-   to   cipal components factor analysis. This implies that ∗  

le = arg min max Rx x y 1≤ij≤m i =j i j this test cannot empirically verify the axiom with 1m ailab only two items per LV. Torgerson (1958) observed the v ≤ ≤ ∈ regarding s.t. 0 1, i 1n a same phenomenon that he referred to as “measure- i m ment by Fiat,” because it often leads to rejection of = (4) i 1 made the measurement model. It is no accident that many i=1 is questions researchers (e.g., Kline 1998) recommend using more m y = than two measurement items per LV in SEM stud- y ixi an which ies whereas LISREL recommends at least four mea- i=1 surement items per LV for the measurement model to Once the optimal weight ∗ is determined (given send ≤ ≤ 17 converge. 0 i 1, the latent score of y is fixed. The pro- ersion, v 3.2.2. Determining the Latent Scores. A remain- posed minmax approach ensures that the conditional Please independence axiom is met even in the worst case sce-

. ing caveat above is that the axiom of conditional inde- ance ∗ nario (i.e., the max of R ). However, if no such pendence cannot be empirically tested because the LV xixj y site Adv (i.e., y in Equation (3)), at least in principle, is not is found to satisfy the axiom of conditional indepen- in directly measurable and, thus, it cannot be empiri- dence, then the research design or the measurement items may be highly problematic.

author’s cally fixed. To test the conditional independence of a ticles measurement model, an estimate of the value of the 3.2.3. The Proposed Latent Variable Identifica- Ar the LV, referred to as latent scoring, must first be assessed. tion (LVI) Algorithm. The LVI algorithm seeks to dis- this A common latent scoring method is raw sumscore, cover the smallest possible set of LVs (to ensure a to which uses the simple sum of the measurement items. parsimonious model with as few LVs) for the mea- including ight

, A variant of the raw sumscore method is the weighted surement items to be partitioned into disjoint sets yr average. For example, a usual method for estimating while assuring that the axiom of conditional indepen-

cop dence is satisfied within each disjoint set. The notation

ebsite the values of LVs is principal components factor anal-

w ysis where the factor loadings are used as weights for 15 holds computing the latent scores. However, the use of the If the true latent score is required to be an integer (e.g., Likert- other raw sumscore or a weighted average lacks theoretical type scales), one can add an integer constraint to the optimization y model (Equation (5)) or round the latent score to an integer value.

an justification (Skrondal and Rable-Hesketh 2004). 16 One less strict alternative is to minimize the average of those

on We propose an optimal weighting method to com- m × m − 1/2 pairs of conditional correlation. INFORMS pute the latent scores to directly maximize conditional 17 ≤ ≤ Our proposed formulation sets 0 i 1, thereby preventing negative correlations among the measurement items. This follows posted 14 Empirically, this test may be too restrictive for a large n. For the reflective view of measurement where all items are expected to yright: be instance, a small r of 0.08 turns out to be significant when n = 500, be positively correlated with each other. For example, LISREL auto- leading to rejection of the measurement model. Future studies may matically converts a negative correlation into a positive one when Cop not find a more appropriate rule-of-thumb test for R . testing the measurement model. xixj y Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables 10 Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS y .org. ma Table 1 Notation for the LVI Algorithm relationships among the LVs. This corresponds to the ms file

or Notation Definition structural model testing part of SEM. The common

The approach to learning a BN from data is by speci-

. X A set of m measurement items, X = X X X ) 1 2 m fying a scoring function (typically variations of the x A measurement item of X i ) of each candidate network struc- ibers k-item set An item set having k-measurement items

Lk Set of valid k-item sets that satisfy the axiom of local ture and then selecting the BN with the highest score missions@inf independence (Friedman et al. 2000). Because examining the possible

subscr C Set of candidate k-item sets (with k items that may not per k satisfy the axiom of conditional independence) network structure is NP-hard, the search algorithms to C-1 The necessary condition that two measurement items of (for the optimal structure) in the BN literature are the same LV should be moderately correlated almost exclusively variations of greedy algorithms. To

policy C-2 The axiom of conditional independence is detailed in §4.1 reduce the number of searches, Spirtes et al. (2002) institutional

this proposed the generic PC algorithm to generate an ini- to used in the LVI algorithm is described in Table 1. LVI tial starting point and then used a greedy search algo- le has two steps (Table 2). rithm based on the scoring function to reduce search

ailab The flowchart of the LVI algorithm is shown in complexity. We follow this common practice and dis- v regarding a Figure 2 and its algorithmic steps are outlined in cover the most likely BN in two steps: (1) generate an Appendix A. The inputs to the LVI algorithm are the initial class of equivalent BN using PC2 (our proposed

made measurement items and the outputs are the disjoint variation of the PC algorithm), and (2) select the is item sets, each of which represents an LV. Each L questions 1 most likely causal BN using a new scoring function

y contains a measurement item. L2 is generated using designed specifically for ordinal and discrete (Likert- an which only the necessary condition C-1. This is because type) data that are commonly found in IS research.

the axiom of conditional independence of L2 is not

send 3.3.1. Generating Equivalent Classes of Bayesian directly testable (§3.2.2). Therefore, the LVI algorithm Networks from Data. Given a set of data, is it pos- ersion, v works best for LVs that are measured with more than sible to create a unique causal Bayesian network? The

Please two measurement items, as is strongly recommended consensus is that one cannot distinguish between . ance by SEM researchers (e.g., Kline 1998, Bollen 1989). BN that specify the same conditional independence site Then, LVI generates L + from L by examining candi- Adv k 1 k from data alone. It is possible that two or more

in date item sets based on C-1 and C-2 (see also Table 1). BN structures represent the exact same constraints of Step 2 prunes all valid item sets by eliminat- conditional independence (every joint probability dis- author’s ticles ing all subsets and overlapping measurement items. tribution generated by one BN structure can also be Ar

the To ensure the smallest number of LVs, the LVI algo- generated by the other). In this case, the BN structures rithm begins from the largest item set (the one with this are said to be likelihood equivalent. the most measurement items) among all L . It then to k When learning an equivalent class of structures eliminates the overlapping items from the item set including from data, we can conclude that the true BN is pos- ight , that is affected the least, after removing any overlap- yr sibly any one of the networks in this class (Friedman ping items. Finally, the LVI algorithm outputs the dis-

cop et al. 2000). An equivalence class of network struc- ebsite joint item sets, each of which represents an underlying

w tures can be uniquely represented by a partially LV, and the value of each LV is computed according directed graph, where a directed edge X → Y sug- holds to formulation (4).18 other gests that all members of the equivalence class contain y the arc X → Y . Otherwise, an undirected X–Y edge

an 3.3. Stage 2. Constructing a Causal Bayesian Network for Structural Models denotes that some members of the class contain arc on → → INFORMS After the LVs are identified and their values are com- X Y while others contain arc Y X. Learning the puted, the next step is to build a BN to test the causal causal relationships among LVs can be regarded as

posted the process of “directing” a graph. yright: be 18 Additional information is contained in an online appendix to this The BN literature (e.g., Glymour et al. 1987, paper that is available on the Information Systems Research website Heckerman et al. 1995) has developed methods to Cop not (http://isr.pubs.informs.org/ecompanion.html). generate equivalent structures that have the same Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS 11 y .org. ma Table 2 Detailed Steps of the LVI Algorithm ms file

or Step 1: Identify all sets of measurement items (item sets) that satisfy the axiom of conditional independence

The LVI uses a maximum spanning approach. It starts with a randomly selected measurement item and it incrementally adds items to the item set. It stops when . no item can be added to the item set without violating the axiom. Denote Lk the item set with k measurement items that meet the conditional independence + axiom. The core step of the algorithm is to span from Lk to Lk+1 the item set containing k 1 measurement items that still meet the axiom. This is done by ibers + adding an item not already in Lk into Lk , and then testing the axiom for the new item set with these k 1 items using the method in §3.2.1. This step can incur high computational cost because it involves the optimization procedure to determine the latent score. We impose one condition to limit the possible missions@inf

combinations of Lk+1 to reduce the computation cost: The correlation between any two items in Lk+1, say, xi and xj , should be at least moderate (Kline subscr per 1998, p. 190). The user may specify a threshold to determine “moderate” correlation. We use a low correlation of r = 05 as the generic threshold. Then,

to we eliminate the candidate item sets with k + 1 items that do not meet the conditions in the first place. Step 2: Prune the generated item sets from Stage 1 into disjoint (discriminant) sets

policy LVI first identifies the supersets of item sets and then deletes all subsets. For example, suppose two item sets A = x1x2x3 and B = x1x2x3x4 are

institutional generated by Step 1. Clearly B is a superset of A. In this case, we need to delete subset A to ensure convergent validity. We then deal with item sets that have

this = = to overlapping items. For example, suppose two item sets A x1x2x3 and B x1x4x5x6 have an overlapping item x1 (i.e., x1 loads on both A and B). SEM methods would consider x1 a problematic item because it violates discriminant validity. Skrondal and Rable-Hesketh (2004, p. 8) suggest le either accepting that an item may belong to two or more LVs or discarding the problematic item (Goldstein 1994). Our algorithm detects such problematic items and, by default, we assume the user decides to keep the items. The LVI algorithm then determines the LV that the item is more likely to belong to. ailab v

regarding This is done by testing the impact of deleting the measurement item from the two LVs on the value maxRx x y among the residual measurement items. a i j The measurement item is then assigned to the LV that is affected the most. made underlying undirected graph. As reviewed earlier, a triplet of variables XYZ, where Z is any set is questions of variables besides X and Y .IfX and Y are not

y theories of causal BN are based on the d-separation = an and the causal Markov assumptions. Pearl and Verma d-separated given any Z(Rxy z 0), then there is an which (1991) established theorems to operationalize the con- edge between X and Y . We thus need to test the con- = send dition R 0 for all possible combinations of Z. struction of BN using d-separation. Let Rxy z be the xy z ersion, partial correlation of variable X and Y given Z,from This is again NP-hard (Spirtes et al. 1998). Spirtes v et al. (2002) proposed the PC algorithm that tests the Please

. d-separation condition for any possible combinations ance Figure 2 Flowchart of the Steps of the LVI Algorithm of X, Y , and Z to determine if there is a link between X site Adv and Y . Though the PC algorithm is found to not be as Input X = (X1, X2, …, Xm) in accurate as the scoring approach in general (Silva et al. 2006, p. 211), it is more efficient and it can thus gen- author’s Initiate L = {x }, i  (1, …, m)

ticles 1 i erate BN structures that serve as good starting points Ar the for other scoring-based algorithms.

this  ≠ Generate L2 = {xi, xj} from L1 that satisfy C-1 and i, j (1, …, m), i j Our approach also uses the PC algorithm to dis- to cover an initial causal structure as the input to the

including k = k +1, 2 ≤ k ≤ m –1 scoring-based algorithm in §3.3.2. We slightly modi- ight , yr fied the PC algorithm to make it consistent with SEM Generate candidate set C by adding x (x ∉L ) to L cop k +1 i i k k techniques and we termed our version of the algo- ebsite

w rithm PC2 (Appendix B). Algorithm PC2 refines the PC algorithm in two dimensions: (1) the PC algorithm holds Add all Ck+1 to Lk+1 other Next x uses Fisher’s Z test, which requires all variables to be y i Ck +1 meet C-1 normally distributed, but the PC2 algorithm relaxes an No and C-2? Yes this assumption by using the aforementioned t-test on

INFORMS for correlation coefficients to determine the signifi- When k = m–1 cance of Rxy z (Equation (2)); (2) the PC2 algorithm

posted Prune Lk; delete subsets and handle overlapping items incorporates Verma and Pearl’s (1992) five rules for yright:

be directing graphs. The output of the proposed PC2 algorithm is a par- Cop not Output all Lk tially causal BN Bs because some edges may remain Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables 12 Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS y .org. ma undirected. This is a direct result of the limited capac- Suppose there are n records. Let Xx ···x be a set

ms 1 m file

or ity of Verma and Pearl’s (1992) five rules for direct- of m discrete variables. Each variable xi has ri possible

The ing links. Our empirical studies suggest that these values and a set of parents i. Let ijk denote the kth . five rules are too specific. For example, Rule 3 states parent for record j of variable xi. Suppose there are qi that if X → Y , Y → Z, and X–Z, then direct the link such parents. Then, ibers between X and Z as X → Z. This rule only covers a q missions@inf n m i few cases we might encounter in BN construction. The PB D= PB OL + (6) subscr s s ij ijk ijk per PC2 algorithm is thus ineffective in orienting those j=1 i=1 k=1 to links that are not covered by these five rules. How- ever, we found PC2 to be adequate in identifying Once PBs and the data D are known, we have

policy nonedges (i.e., nodes that should not be connected), full knowledge about the domain for the purpose of institutional learning network structures. Cooper and Herskovits

this and it is a fine starting point for causal BN discovery. to (1992) assume that PBs is constant, that is, all net- le 3.3.2. Refining Bayesian Network Using the work structures are equally likely without further Ordered Logit Scoring Function. The problem of

ailab knowledge. This is a necessary assumption if we do v learning a Bayesian network can also be perceived as regarding a not allow the users to specify their own priors about a process of finding a class of B that best matches s the network structure. Therefore, we can omit the data D. To compare two BNs—BS1 and BS2—we need made component PBs when computing Equation (6). to test the ratio PBs1 D/PBs2 D given data D.By is questions computing such ratios for pairs of BN structures, we 3.3.2.2. Goodness-of-Fit Test Based on Ordered y Logit. A goodness-of-fit test is needed to compare

an can rank order a set of competing structures. Follow- which ing Bayes’ rules, we have competing structures. SEM techniques have various

send tests for the overall fitness of a structural model PB D PB D/PD PB D s1 = s1 = s1 (Gefen et al. 2000).19 Here, we develop a 2 test ersion, v PBs2 D PBs2 D/PD PBs2D based on the OL scoring function. Similar to SEM, we

Please assume that the null model is the measurement model . Therefore, it is possible to score the likelihood of a ance ··· Bayesian network B given D by computing the joint that has no paths among its LVs. Let Xx1 xm be a

site s

Adv = probability PBsD. set of m latent variables. In the null model, PBs, D in px ×px ···×px . Assume each variable x has r 3.3.2.1. The Ordered Logit (OL) Scoring Func- 1 2 m i i possible values x1 ···xri . Suppose there are n records.

author’s i i ticles tion. As pointed out in §3.1, none of the existing scor- Define nij to be the number of cases in the data D Ar

the ing functions is intended specifically for SEM data, j = in which variable xi has the value xi . Then, pxi especially for Likert-type data that are commonly r this i = n /n used in IS research. We develop a new scoring func- j 1 ij and the overall likelihood for the null to tion termed ordered logit (OL) specifically for ordinal model is: including ight , m ri and discrete data (Likert-type data). For a particular nij yr = = node x given a set of q parents , its conditional prob- L0 pBsDi (7) = = n

cop i 1 j 1 ebsite ability can be estimated by the following OL function: w q For a particular BN, we know from Equation (6) holds m n qi = + that L = = = OL + = . other Px/ OL ii (5) 1 i 1 j 1 ij k 1 ijk ijk

y i=1 an where represents the set of parameters of the inter- 19 SEM techniques use different statistical tests for overall goodness- on of-fit. For example, LISREL has 15 and AMOS has 25 different INFORMS cept and coefficient i by running an ordered logistic regression (Borooah 2002) with x as the depen- tests, the choice of which is debatable. However, we do not aim to develop a comprehensive list of tests here. We simply propose

posted dent variable and the q parents as the independent one overall goodness-of-fit test, assuming the null model is nested yright: be variables. The proposed OL function is derived in within the model of interest. Extensions to general criteria such as Appendix C. The joint probability of a Bs and data D AIC or BIC are straightforward, following the derived likelihood Cop not is computed as follows. function. Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS 13 y .org. ma Given the two likelihoods, an appropriate goodness- the OL function) adds dependencies by considering ms file 2

or − − of-fit test is the test 2lnL1 L0, with degree of all possible single-edge additions. Once the greedy

The freedom as n minus the total number of local param- algorithm stops at a local maximum, a second-phase . n − eters estimated. We know there are i=1 ri 1 inter- greedy algorithm considers all possible single-edge n cepts ( and = q coefficients () to be estimated, deletions. The algorithm terminates after no signif- ibers i 1 i where ri is the possible number of values of xi and qi icant improvement can be further achieved in the missions@inf is the total number of parents of xi. second phase and the final graph is outputted. This subscr per Therefore, the degrees of freedom are given by: represents the method’s final output. to m m = − − − df n ri 1 qi (8) 4. Evaluating the Bayesian Networks policy i=1 i=1 for Latent Variables (BN-LV) institutional

this 2 to Using the test, we can determine if a particular Method le graph structure is significantly better than any com- The BN-LV method (Figure 3) integrates both the peting graphs (see Footnote 18). measurement model (via the LVI algorithm) and the ailab v

regarding structural model (via the OL scoring function). BN-LV a 3.3.2.3. Searching for the Best Structure. After takes the raw measurement items as inputs (Step 1, running the PC2 algorithm, we already have an ini- Figure 3) and it identifies the LVs that govern these made tial graph B . Our only task is to then refine the ini- s items using the LVI algorithm (Step 2, Figure 3). The is

questions tial BN graph and orient the undirected edges (e.g., value of the LVs is simultaneously computed by LVI y X–Y )inBs. According to Pearl and Verma (1991),

an  through formulation (4) (Step 3, Figure 3). Then, an which two graphs G and G are structurally different and initial graph is generated using the PC2 algorithm on distinguishable if and only if they have a different

send the identified LVs (Steps 4 and 5, Figure 3). The graph underlying undirected graph and at least one differ-

ersion, is refined based on the OL scoring function (Step 6, v ent V -structure (i.e., converging directed edges into Figure 3), and it then outputs the most likely causal

Please the same node, such as X → Z ← Y. From this theo-

.  BN graph (Step 7, Figure 3). ance rem, if G and G have the same undirected structure, The computational complexity of BN-LV is tested site Adv the only edges that must be directed are those that in Appendix D. Section 4.1 offers an illustrative exam- in participate in V -structures (also referred to as collid- ple to describe the step-by-step process of the BN-LV ers by Spirtes et al. 2000). Suppose we need to select author’s ticles → between two competing structures Bs1X Y with Figure 3 Flowchart of the Steps of the BN-LV Algorithm Ar

the → Bs2 (Y X to orient the direction for X–Y . We first 1. Input data X = (X1, X2, …, Xm) this need to investigate if the direction reversal yields dif-

to ferent V -structures. If it does, we must check if the

including likelihoods of the two structures are significantly dif- 2. Identify LV using algorithm LVI ight ,

yr ferent according to the Chi-square test, and we choose the one with the highest fitness score. cop ebsite 3. Compute latent scores for each LV

w However, a local change in one part of the net- work can affect the evaluation of a change in another holds

other part of the network, making the search for the opti- 4. Initiate a graph G among LVs (all connected and undirected) y mal structure NP-hard (Chickering 2002). Chickering an (2002) developed a greedy equivalence search (GES)

on 5. Apply algorithm PC2 on G to get Bs

INFORMS algorithm that is asymptotically optimal, and it is now considered as the best search algo-

posted rithm to date (Silva et al. 2006). This searching strat- 6. Refine Bs using OL function with GES searching strategy yright:

be egy is herein adopted. The main objective of GES is to reduce the search space. GES has two phases: First,

Cop not 7. Output the most likely causal graph it greedily (according to a scoring criterion such as Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables 14 Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS y .org. ma Table 3 The Correlation Matrix for the Raw Measurement Items ms file

or SAT1 SAT2 Trust1 Trust2 Trust3 EOU1 EOU2 EOU3 USEF1 USEF2 USEF3 INT1 INT2 The

. SAT1 10 SAT2 0867 10 ibers Trust1 0725 0694 10

missions@inf Trust2 0695 0615 0819 10 subscr

per Trust3 0722 0676 0912 0869 10

to EOU1 0583 0532 0603 0658 0622 10 EOU2 0455 0377 0476 0551 0510 0720 10 EOU3 0558 0477 0569 0616 0561 0865 0864 10 policy

institutional USEF1 0613 0550 0604 0636 0675 0667 0718 0730 10 this to USEF2 0527 0425 0464 0476 0512 0541 0577 0620 0695 10 le USEF3 0545 0520 0665 0646 0649 0585 0472 0564 0602 0582 10 INT1 0310 0319 0431 0373 0413 0393 0312 0296 0392 0387 0497 10 ailab v

regarding INT2 0336 0355 0447 0397 0385 0381 0285 0306 0393 0361 0530 0773 10 a

made method using actual empirical data. Because of data constructs that are common across the two structural

is 20

questions complexity and space constraints, we can only pro- models—intentions (INT), usefulness (USEF), ease of

y vide details at a summary level for Stage 2 of the use (EOU), trust (Trust), and satisfaction (SAT).22 an which algorithm (Steps 4–7, Figure 3). Still, for a smaller 4.1.1. Identifying Latent Variables from Mea- artificial data set, we show the detailed calculations

send surement Items. In Stage 1 (the first three steps in behind Stage 2 of the BN-LV method as a tutorial (see ersion, Figure 3), BN-LV identifies the LVs from the measure- v Footnote 18). ment items. The raw data are the 13 measurement Please

. items associated with the 5 LVs (Pavlou 2003). Each ance 4.1. Illustrating the BN-LV Method Using item is measured on a seven-point Likert-type scale. site Empirical Data to Test Competing Adv The correlation matrix of the 13 measurement items in Causal Models is shown in Table 3 with a significance level at 0.162 We describe BN-LV by illustrating how it can help (p<005 level, n = 151). author’s ticles reconcile competing structural models using actual Following the flowchart of LVI (Figure 2), L is

Ar 1

the empirical data. The step-by-step illustration demon- first initiated and consists of 13 item sets {S1}, {S2}, strates how BN-LV can test competing hypotheses this {T1} and so on (abbreviation of each item is used).

to in terms of the direction of causality between trust L is then generated based on the weak constraint and ease of use when integrating the classic TAM 2 including ight

, that the pairwise correlation should be moderate model with the construct of trust (Figure 5).21 Specifi- yr (i.e., >0.50), following condition C-1. This is done by cally, Pavlou (2003) argued and showed trust to influ- cop

ebsite checking with the correlation matrix (Table 3). Fifty- ence ease of use while Gefen et al. (2003) argued w three such item sets meet the condition, including {S1, and showed the opposite direction of causality. To holds S2}, {S1, T1} and so on. L needs to satisfy both make the comparison meaningful, we selected five 3 other conditions (C-1 and C-2) for any three-item combina- y

an tion. This yields five item sets in Table 4 together with 20 Theoretically, there are in total 29,000 possible structures that

on the weight vector and the MinMax(R)—the objective need to be evaluated for 5 constructs. INFORMS function in formulation (4). LVI then determines that 21 According to TAM (Davis 1989), perceived usefulness (USEF) is the extent to which a user thinks that using a system will enhance posted her job performance (Davis 1989, p. 320). Perceived ease of use 22 Because the two studies (Gefen et al. 2003, Pavlou 2003) included yright: be (EOU) is the extent to which a user thinks that using the system different control variables, though Pavlou (2003) also included per- will be effortless (Davis 1989, p. 321). These two constructs predict ceived in the structural model, the comparison was made for Cop not a user’s intention (INT) to use the system in the workplace. only these five constructs. Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS 15 y .org. ma Table 4 The Weight Vector and MinMaxR Values Pavlou (2003) proposed that Trust → EOU. Both stud- ms file

or L3 Weights MinMax(R) ies provide compelling theoretical justifications. In

The such cases, Carte and Russell’s (2003, p. 487) solution . {T1, T2, T3} 015 012 073 0.133 that seeks theoretical justification may be of little help. {E1, E2, E3} 009 016 075 0.106

ibers {U1, U2, U3} 084 007 009 0.112 In contrast, BN-LV offers a “let data speak” approach {T2, T3, U1} 004 084 012 0.118 to reconcile competing structural models. We exam- missions@inf {E1, E3, U1} 004 087 009 0.103 ine the interrelationships among the five LVs—Trust, subscr per EOU, USEF, INT, SAT—for the two competing mod- to 23 there is no L4 that meets C-2 (the best candidate is els (Figure 5). The data support the case of Trust → = {T1, T2, T3, U1} with MinMax(R) 0.194). In sum, LVI EOU. The likelihoods are −908.8 for Pavlou (2003) policy completes generating item sets, resulting in 5 item sets and −952.3 for Gefen et al. (2003). The Trust → EOU institutional in L and 53 item sets in L . this 3 2 direction of causality thus improves the likelihood by to Next, LVI prunes all those item sets as follows: First,

le 43.5. The degrees of freedom for the Chi-square test it identifies the supersets of these 53 item sets in L2 (Equation (8)) is 6 between the 2 graphs (Figure 5)

ailab and drops those subsets correspondingly (51 of them). 24

v and the critical value is 14.5 (p<005). The improve- regarding a Only two item sets in L2 remain: {S1, S2} and {I1, I2}. ment of Pavlou’s (2003) model over the model of We then identify the overlapping items in L . Notice 3 Gefen et al (2003) is significant and distinguishable.

made that item U1 loads on three item sets (last three rows) Though we cannot obviously draw definite conclu- is (Table 4), which suggests that it is a potentially prob- questions sions from one data set, this example illustrates how y lematic item. LVI recommends keeping U1 with the BN-LV can empirically reconcile between competing an

which set {U1, U2, U3} because it affects this item set the structural models in terms of the direction of causality most (dropping it would lead to a weak L2 of {U2, U3} 25

send in certain relationships. with a correlation of 0.582). The other two sets {T2, ersion,

v T3, U1} and {E1, E3, U1} are then dropped. Therefore, 4.2. Simulation to Evaluate LVI eventually outputs the final disjoint item sets {S1, Please the BN-LV Method Relative to . ance S2}, {I1, I2}, {T1, T2, T3}, {E1, E2, E3}, and {U1, U2, U3}. Competing Techniques

site The corresponding latent scores are computed using Adv We further systematically evaluate the BN-LV method

in the optimal weights in Table 4 for L and equal weight 3 using a simulation experiment. The data generat- of 0.5 for L . 2 ing process (DGP) is first described, followed by author’s ticles 4.1.2. Constructing the Most Likely Graph Ar the Among the Identified LVs. Stage 2 in BN-LV (corre- 23 SAT (satisfaction) is a control variable antecedent of Trust in

this sponding to Steps 4–7 in Figure 3) discovers the most Pavlou (2003). Our simulation study excluded this control variable to likely graph among the five LVs. It first initiates a for simplicity, but it is necessary to have SAT here in the empirical

including fully connected and undirected graph that connects data to ensure that the two graphs are distinguishable. In graph (b), ight , Figure 5, the node Trust is a collider case because SAT → Trust ←

yr any two LVs (Step 4, Figure 3). Step 5 in Figure 3 applies algorithm PC2 and generates the graph (left EOU but it is not a collider case in graph (a), Figure 5. cop ebsite panel of Figure 4). Note that PC2 fails to identify any 24 The only difference between the two graphs in Figure 6 is “Trust” w causal directions for these links. Step 6 of Figure 3 and “EOU.” In the left graph, both “Trust” and “EOU” have one holds − + = applies the proposed OL scoring function with the parent. Each node needs 7 1 1 7 parameters (see Equa- other tion (8)) and, in total, 14 parameters are needed for them. In the y GES search strategy to identify the directionality of right graph, “Trust” has two parents and “EOU” has none. There- an these links. The result shown (right panel of Figure 4) fore, it needs 7 − 1 + 2 = 8 parameters. The left graph needs six

on closely corresponds to Pavlou’s (2003) structural more parameters. INFORMS model for these five LVs. 25 We acknowledge that the two studies proposed different struc- tural models with multiple additional constructs whereas our anal-

posted 4.1.3. Using BN-LV to Reconcile Competing ysis only includes these five constructs. Therefore, we do not make yright: be Hypotheses on Causal Links. Gefen et al. (2003) pro- any claims about the validity of the original findings because the posed a different structural model in which the direc- results may have been influenced by the other constructs that are Cop not tion between EOU and Trust is EOU → Trust while not included in this example. Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables 16 Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS y .org. ma Figure 4 Graphs Generated by the Proposed BN-LV Method ms file

or Graph after applying the PC2 algorithm (Step 5, Figure 3) Graph after applying the OL scoring function (Step 7, Figure 3) The . INT INT ibers USEF EOU USEF EOU missions@inf subscr per SAT Trust SAT Trust to

policy the evaluation of the BN-LV method’s two core First, we generated the exogenous construct Trust institutional components—the LVI algorithm for identifying the with a N4 3. We attempted to this to LVs, and the OL scoring function for discovering the be consistent with Pavlou (2003) who used a 7-point le optimal graph structure (most likely causal BN). Likert-type scale with mean = 4. The variance was chosen to be 3, such that 95% of times the simulated ailab 4.2.1. The Data Generating Process (DGP) for the v regarding a Simulation Experiments. The blueprint graph struc- values fall within the range 06 74. The endoge- ture for our simulated data comes from Pavlou (2003, nous LVs were simulated using the path coefficients made p. 90). The structural model (Figure 6) depicts the (Figure 6). For instance, the only parent of EOU is is Trust with a path coefficient 0.64. We thus generated

questions constructs that are hypothesized to affect consumer

y = intentions to transact online. We only selected the the value of EOU from the linear equation EOU

an + × + which study’s principal constructs—Trust, Risk, EOU, USEF, 144 064 Trust e, where the intercept 1.44 was and INT—while the control variables were omitted chosen for the mean of EOU to be 4. The noise term

send 2 for simplicity. In particular, Trust is deemed exoge- e follows N0e . Noise e was varied by three ersion, v nous while EOU, USEF, Risk, and INT are deemed levels: low, medium, and high, which were instanti-

Please endogenous (Figure 6). ated as 0.3, 0.6, and 0.9, respectively. The same pro- . ance We simulated the data according to the above theo- cedure was used for the other principal constructs in

site Figure 6.

Adv retical structure, following the DGP specified in Silva

in et al. (2006) and Spirtes et al. (2000, p. 114). The DGP We simulated four indicators per LV (LVI algorithm was composed of three steps: requires three indicators per LV while LISREL recom- author’s ticles Step 1. The exogenous variables were first indepen- mends four). Each indicator was simulated as the LV Ar

the dently generated following a normal distribution. plus an error term i. Likewise, i followed a nor- 2 Step 2. Values of the endogenous variables were mal distribution N0i while i has 3 levels: 0.3, 0.6, this and 0.9. We then converted the indicator values into to then generated as a linear function of their parents with a normally distributed error term . 7-point Likert scales—1 for the simulated value below

including e ight , Step 3. Values of the indicators were generated 1.49, 2 for the value within the range 1.5–2.49, and so yr directly from each of their corresponding latent on. We argue that this is consistent with the subjects’ cop ebsite variables, adjusted by a normally distributed noise actual responses to survey questionnaire items, where w term i. each Likert anchor reflects a range of values. holds other

y Figure 5 Competing SEM Models for Integrating TAM with Trust

an (a) Pavlou (2003), log l = – 908.8 (b) Gefen et al. (2003), log l = –952.3 on INFORMS INT INT

posted USEF EOU USEF EOU yright: be

Cop not SAT Trust SAT Trust Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS 17 y .org. ma Figure 6 The Blueprint Graph Structure for the Simulated Data use multiple runs to average out the randomness that ms file (Adapted from Pavlou 2003) or arises from the normally distributed error terms, we

The Perceived risk used 5 runs for each of the 15 combinations, resulting

. –0.63 –0.11 in a total of 75 data sets. The total number of data sets Trust 0.35 Intentions 0.41 0.33 to transact was limited by the manual data analysis procedures ibers Usefulness 0.64 in LISREL and PLS. Therefore, we only examined one

missions@inf 0.49 0.12 level of noise (medium noise) for the nonnormality subscr per Ease of use and nonlinearity cases and we used only five runs for to each combination. 4.2.2. Manipulation of Experimental Dimensions. 4.2.3. Measurement Model Comparison. We com- policy In this experiment, we simulated the data across pared the LVI algorithm with the two commonly used institutional

this four dimensions: sample size, noise, normality, and methods for measurement model testing: EFA and to

le linearity. CFA. For EFA, we used the principal components Sample Size. We simulated three sample sizes—50, factor analysis (Proc Factor in SAS 9.1) using the ailab v 250, and 1,000—to represent small, medium, and large Eigenvalue >1 criterion. For CFA, we used the CFA regarding a sample sizes, respectively. In SEM, as a rule of thumb, procedure in PLS Graph 3.0. Note that this compari- sample sizes below 100 are considered small, between son is more generous to the CFA method because it made 200–300 are considered moderate, and >500 are con- takes more inputs (the number of factors) than the is

questions sidered large (Gefen et al. 2000). LISREL is sensitive to exploratory LVI and EFA. y small sample sizes while PLS can handle small sam- We ran each method on the 75 simulated data sets. an which ple sizes with bootstrapping (Chin 1998). Each data set consisted of 20 indicators (4 measure- Noise. Because noise affects data quality and the

send ment items for each of the 5 constructs). Dayton and structural models built with noisy data, we varied the Macready (1988) proposed using omission and intru- ersion, v noise levels of both e and i from low (0.3), medium sion error rates to evaluate the results of measure-

Please (0.6), to high (0.9). ment model testing (factor analysis). The omission . ance Normality. SEM assumes data normality (Gefen et al. error rate is the percentage of manifest items that are site Adv 2000). To examine how violation of this assumption not included in any LV. Intrusion is the error rate in affects model performance, we simulated the exoge- that manifest items that are misassociated with cer- nous construct Trust from a uniform distribution tain LVs. Spirtes et al. (2000) and Silva et al. (2006) use author’s ticles between 0.5 and 7.5. Note that a nonnormal Trust also similar metrics, termed omission rate and commission Ar the renders the other constructs nonnormal because the rate, respectively (the percentage of LVs not specified this other constructs are endogenous. in the true measurement model). These metrics result to Linearity. SEM assumes linear relationships among in four evaluation criteria:

including LVs while the BN literature does not make this 1. Latent Omission LO. The error rate associated ight , yr restrictive assumption. We simulated the endogenous with omitted LVs. It is computed as the number of variables to follow an exponential function of their cop true LVs that are not identified by the method under ebsite

w parents plus a normal error term. We used a cumula- investigation, divided by the total number of true LVs tive exponential distribution with parameter set to (five in our simulated study). holds

other be equal to the path coefficients. For example, EOU 2. Latent Commission LC. The error rate associated y = + × − − × + was EOU 05 7 1 exp 064 Trust e with with misidentified LVs. It is computed as the number an the scale parameters (0.5 and 7) chosen to ensure a of LVs that are identified by the method (however, on

INFORMS mean of 4. not the true LVs), divided by the total number of true These 4 dimensions yielded a total of 15 combina- LVs.

posted tions. There were five scenarios: three noise levels for 3. Indicator Omission IO. The error rate associated yright:

be the normal and linear data, one level of nonnormal- with missing indicators (items). It is computed as the ity, and one level of nonlinearity. Each scenario had number of items that are in the true measurement Cop not 3 sizes: 50, 250, and 1,000. Because it is customary to model but do not appear in the measurement model Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables 18 Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS y .org. ma Table 5 Measurement Model Comparison Results ms file

or Simulated dimensions LVI EFA (SAS Proc Factor) CFA (PLS) The

. Size Noise Normality Linearity LO LC IO IC LO LC IO IC LC IC

1,000 Low Yes Yes 000006040604068 059 ibers 1,000 Medium Yes Yes 0360036 0 06040604036 04

missions@inf 1,000 High Yes Yes 0640064 0 036 016 036 016 0202

subscr 1,000 Medium No Yes 0480048 0 06028 06028 068 054 per 1,000 Medium Yes No 040040028 004 028 005 028 014 to 250 Low Yes Yes 0 0120006 0604060408082 250 Medium Yes Yes 026 008 026 004 052 032 052 032 036 012

policy 250 High Yes Yes 0440044 0 032 022 032 02008 004 institutional 250 Medium No Yes 016 024 016 011 052 056 052 056 06031 this to 250 Medium Yes No 024 016 024 006 028 008 028 008 008 003 le 50 Low Yes Yes 012 028 012 022 056 024 056 028 076 085 50 Medium Yes Yes 012 012 012 006 036 024 036 02044 021 ailab 50 High Yes Yes 016 028 016 018 028 02028 017 032 018 v regarding a 50 Medium No Yes 008 034 008 013 048 02048 017 088 038 50 Medium Yes No 024 004 024 002 032 02032 014 036 013

made Average 025 011 025 006 045 0263 045 025 046 033 is questions y

an generated by the method under investigation, divided does not load onto any LV, IO is detected. Finally, if which by the total number of items in the true measurement an LV in the true measurement model is not identified

send model (20 in our simulated study). by the method, we detect an LO.

ersion, 4. Indicator Commission IC. The error rate associ- Table 5 presents the summary results for LVI, EFA v ated with misclustered indicators (items). It is computed (SAS Proc Factor), and CFA (PLS) for an average over Please

. as the total number of items generated by the method five runs. Column 1 indicates the sample size. Col- ance under investigation that are misclustered under their umn 2 indicates the noise level (low, medium, or site Adv nonhypothesized LVs. high). Column 3 indicates whether the data is gener- in These four criteria can be readily computed by the ated from a normal distribution and Column 4 indi- LVI algorithm that directly outputs the LVs and the cates whether there is a nonlinearity, as described author’s ticles associated items. However, the LO and IO criteria are above. Table 5 shows that on average the LVI algo- Ar the not applicable to the PLS CFA because the true num- rithm outperforms the EFA and CFA methods. As the this ber of LVs is already prespecified. Also, because the sample size increases, the LVI error shifts from com- to SAS Proc Factor and the PLS CFA output LVs with the mission to omission. PLS CFA is the least sensitive to

including loadings of each item associated with the LVs, there is sample size. Noise has a negative effect on the perfor- ight , yr no consensus as to what determines a good LV given mance of the LVI algorithm. Nonnormality affects the

cop its loadings. A common guideline for ensuring dis- CFA the most (e.g., the LC rate goes from 0.36 to 0.68 ebsite

w criminant validity is that the loading of an item on its for n = 1000). However, nonlinearity does not appear hypothesized LV to be reasonably high (e.g., >0.70) to have a clear impact on the three methods. holds

other while the item loadings on the other LVs should be We ran two repeated measure ANOVA analyses: y substantially smaller (e.g., <0.40) (Gefen et al. 2000). One between the LVI and the EFA (Table 6) and one an A conservative rule of thumb is for the difference between the LVI and the CFA (Table 7) to examine the on

INFORMS between the hypothesized and nonhypothesized indi- role of the four simulated dimensions (sample size, cators to be at least 0.2. If this rule is violated and an noise, normality, and linearity).

posted item loads on more than one LV, we detect the indica- Table 6 shows that the LVI is significantly supe- yright:

be tor as IC. If any indicators from different LVs load into rior to the EFA in terms of LO, LC, and IC a single LV, we detect LC because the method does (p-value < 005) and marginally significant in terms of Cop not not discriminate among these items. If an indicator IO (p-value = 0051) (within subjects). The between Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS 19 y .org. ma Table 6 ANOVA Results Between LVI and EFA (SAS Proc Factor) on the Dirichlet assumption of the data (Heckerman ms file

or Comparison Source Measure F -value Significance (p-value) 1996) and the Gaussian approach that assumes data

The normality (Glymour et al. 1987). The graph (structural . Within LVI vs. EFA LO 4188 0045 model) generated by the five methods is compared LC 5415 0023 against the prespecified graph (Pavlou 2003) on which ibers IO 3947 0051 IC 6075 0016 the data was simulated. Following Spirtes et al. (2000), missions@inf Between Sample size LO 27067 0000 we use three comparison criteria: subscr per LC 5178 0008 1. Path Omission PO. The error rate associated to IO 28189 0000 with omitted paths (links). It is computed as the num- IC 2243 0 114 ber of paths that are in the true structural model policy Noise LO 3066 0053 (graph) but were not identified by the method under

institutional LC 10858 0000

this IO 3122 0050 investigation, divided by the total number of paths to IC 5417 0007 (eight in our simulated study) in the underlying struc- le Nonnormality LO 0282 0597 tural model (Pavlou 2003). ailab LC 2516 0 117 2. Path Commission PC. The error rate associated v regarding a IO 0287 0 594 with misidentified paths. It is computed as the number IC 1356 0 248 of paths that are identified by the method but do not Nonlinearity LO 3785 0056 made appear in the true model, divided by the total number LC 11362 0001 is

questions IO 4588 0036 of paths in the true model. y IC 11503 0001 3. Path Misdirection PM. The error rate associated an which with misdirected paths. It is computed as the number of misoriented directions of causality as opposed to send subjects comparison shows the LVI to be generally the true structural model, divided by the number of ersion, superior to the EFA in terms of sample size, noise, and v nonlinearity but not in terms of nonnormality. true directions (eight in our study). Please We further separated the comparison into confir-

. Table 7 shows the comparison between LVI and ance CFA. The within-subjects comparison shows that the matory (PLS and LISREL) (Table 8) and exploratory site Adv LVI algorithm significantly outperforms CFA in terms results (BN) (Table 9). in of both LC and IC. The between-subjects comparison Table 8 Structural Model Comparison Between BN-LV with PLS and

author’s also shows that the LVI outperforms the CFA on vir- ticles tually all accounts except under sample size for LC. LISREL (Confirmatory Mode) Ar the Simulated dimensions BN-LV PLS LISREL 4.2.4. Structural Model Comparison. This section this Size Noise Normality Linearity PO PC PO PC PO PC to compares BN-LV with four methods: PLS, LISREL, and two BN methods—the BD-metric approach based including 1,000 Low Yes Yes 0.05 0.00 0.00 0.15 0.08 0.00 ight , 1,000 Medium Yes Yes 0.05 0.00 0.00 0.10 0.05 0.00 yr Table 7 ANOVA Results Between LVI and CFA (PLS) 1,000 High Yes Yes 0.00 0.00 0.08 0.05 0.13 0.00 cop

ebsite 1,000 Medium No Yes 0.05 0.00 0.00 0.05 0.08 0.02

w Comparison Source Measure F -value Significance 1,000 Medium Yes No 0.00 0.15 0.03 0.20 0.08 0.13 250 Low Yes Yes 0.08 0.03 0.10 0.08 0.15 0.00 holds Within LVI vs. CFA LC 41789 0000 250 Medium Yes Yes 0.03 0.00 0.08 0.00 0.20 0.00 other IC 57274 0000 y 250 High Yes Yes 0.00 0.03 0.23 0.05 0.25 0.08

an Between Sample size LC 0611 0 546 250 Medium No Yes 0.08 0.00 0.10 0.03 0.18 0.00 IC 3885 0025 250 Medium Yes No 0.03 0.00 0.20 0.13 0.18 0.08 on

INFORMS Noise LC 27521 0000 50 Low Yes Yes 0.15 0.00 0.33 0.00 0.38 0.00 IC 48402 0000 50 Medium Yes Yes 0.13 0.00 0.30 0.03 0.35 0.00 50 High Yes Yes 0.15 0.00 0.33 0.05 0.43 0.00

posted Nonnormality LC 14505 0000 50 Medium No Yes 0.10 0.00 0.30 0.00 0.43 0.00

yright: IC 9009 0004 be 50 Medium Yes No 0.18 0.00 0.30 0.05 0.53 0.00 Nonlinearity LC 14505 0000

Cop not IC 4294 0042 Averages 0.07 0.01 0.16 0.06 0.23 0.02 Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables 20 Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS y .org. ma Table 9 Structural Model Comparison Between BN-LV with Dirichlet Table 10 ANOVA Results Between BN-LV with Dirichlet and Gaussian ms file and Gaussian Methods (Exploratory Mode) Methods (Exploratory Mode) or

The Common BN BN-LV BN BN BN (Dirichlet) BN (Gaussian)

. Simulated dimensions error rates (OL) (Dirichlet) (Gaussian) Source F -value Significance Source F -value Significance Size Noise Normality Linearity PO PC PM PM PM ibers 1,000 Low Yes Yes 005 0 028 055 038 Within Dirichlet 110454 0.000 Gaussian 53391 0000

missions@inf 1,000 Medium Yes Yes 025 0 023 043 028 Between Size 30853 0.000 Size 20847 0000 1,000 High Yes Yes 018 0 020 050 050

subscr Noise 5577 0.006 Noise 1149 0323 per 1,000 Medium No Yes 020013 038 045 Normality 0799 0.374 Normal 1077 0303

to 1,000 Medium Yes No 003 015 018 033 053 250 Low Yes Yes 02003 023 045 048 Linearity 21586 0.000 Linear 0129 0720 250 Medium Yes Yes 030020 040 035 250 High Yes Yes 025 003 020 033 030 policy 250 Medium No Yes 028 0 020 038 035 4.2.5. Discussion of Simulation Results. Based institutional 250 Medium Yes No 030018 013 030 this

to 50 Low Yes Yes 053 0 015 018 018 on the simulated results, BN-LV has certain advan- 50 Medium Yes Yes 040015 033 020 le tages over PLS and LISREL under the following 50 High Yes Yes 041 0 008 013 015 50 Medium No Yes 030020 020 023 conditions. ailab 50 Medium Yes No 030008 010 007 v Sample Size. The BN-LV method performs consis- regarding a Averages 018 032 031 tently better than PLS and LISREL as sample sizes increase (PO errors decrease and PC errors stay made BN-LV operates in both a confirmatory and an low). LISREL substantially improves with larger sam- is 26 questions exploratory mode while PLS and LISREL operate ple sizes while PLS shows little improvement from y solely in a confirmatory mode. Therefore, alternative medium to large sample sizes. BN-LV is clearly pre- an which BN approaches (BD-metric and Gaussian) that operate ferred for small sample sizes. Taken together, BN-LV in an exploratory mode must be used for a complete

send is generally superior to PLS and LISREL across the comparison with BN-LV. For the confirmatory mode,

ersion, spectrum of sample sizes 50 250 1000. v a link is considered missing if a hypothesized path is Noise. In terms of noise, PLS and LISREL are shown Please not significant (PO error). A PC error occurs if non- to generally perform well for high noise levels, con- . ance hypothesized link is significant. The PM error is irrel- sistent with Fornell and Larcker (1981) who find site Adv evant because the true direction of the causal links that SEM model fitness based on structure consis- in is already prespecified. Table 8 shows that on aver- tency may improve as both the model and the theory age BN-LV outperforms both PLS and LISREL with decline. BN-LV turns out to be more sensitive to high author’s ticles respect to the PO and PC error rates. data noise, especially for the measurement model. Ar the Table 9 compares the three BN methods. To make Linearity. When the true relationship among LVs is the results consistent, we only apply the three scoring this not linear, BN-LV is shown to be superior to PLS and

to functions to orient directions after an initial graph is generated by algorithm PC2. Therefore, the only rel- including

ight Table 11 ANOVA Results Between BN-LV with PLS and LISREL , evant criterion is the PM error rate. Table 9 shows yr (Confirmatory Mode) that on average our OL function outperforms both the cop PLS LISREL ebsite Dirichlet and the Gaussian functions. w Tables 10 and 11 report the ANOVA results with Error Error Source rate F -value Significance Source rate F -value Significance holds repeated measures. The results demonstrate that other Within PLS PO 13595 0.000 LISREL PO 53852 0.000 y BN-LV statistically outperforms the two competing PC 8686 0.004 PC 0881 0.351 an SEM methods in both the confirmatory and in the Between Size PO 78860 0.000 Size PO 81377 0.000

on exploratory mode. PC 8151 0.001 PC 2730 0.072 INFORMS Noise PO 1289 0.282 Noise PO 0569 0.569 26 BN-LV can also be used for confirmatory analysis but the error PC 0631 0.535 PC 0610 0.546 posted rates can be different from the exploratory case. The PO error is Normality PO 0158 0.692 Normality PO 0506 0.479 yright: PC 0199 0.657 PC 0145 0.865 be detected by using d-separation conditions (PC2 algorithm) on the prespecified graph if a link is d-separated by other prespecified Linearity PO 1421 0.237 Linearity PO 2560 0.114 Cop not PC 12755 0.001 PC 13791 0.000 links. Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS 21 y .org. ma LISREL. This is expected because BN-LV explicitly (a) allowing the identification of multi-item LVs with ms file

or allows nonlinearities to emerge in the relationships the proposed LVI algorithm and by (b) allowing dis-

The among principal constructs. The main impact of non- crete and ordinal data in BN with the proposed OL . linearity on PLS and LISREL is the PC error in scoring function. Third, it contributes to the SEM lit- the structural model. For example, for sample size erature by addressing three key limitations of exist- ibers n = 1000, the average PC error for PLS doubles from ing SEM methods (Lee et al. 1997)—identifying causal missions@inf 0.10 to 0.20 (other dimensions held constant). links in the structural model, identifying the measure- subscr per Normality. In terms of nonnormality, BN-LV is not ment and structural models in an exploratory manner, to affected when the data violate the normality assump- and allowing nonlinearities. tion in the measurement model or in the structural

policy model. In contrast, the CFA turns out to be very sen- 5.1. Implications for Empirical IS Research institutional The first contribution is to develop a comprehen- this sitive to data normality. The effect of nonnormality on to LISREL appears to interact with sample size. For large sive (measurement model construction and structural le sample sizes, nonnormality has a negative impact on model discovery) data analysis method for inferring

ailab LISREL (PO error increases from 0.05 to 0.08); for causal relationships among constructs, using observa- v regarding a medium sample size, PO error decreases from 0.2 to tional, cross-sectional data that are discrete and ordi- 0.18. However, nonnormality does not considerably nal. In fact, the majority of empirical studies in the IS

made affect PLS for the structural model (Chin 1998). There- literature use this type of data in Likert-type scales. is

questions fore, BN-LV is clearly superior to LISREL but per- In terms of the measurement model, the proposed

y forms comparably to PLS. LVI algorithm has several advantages over competing an which The results for the structural model show that both methods. First, in contrast to common factor analysis PLS and LISREL make high omission errors in general techniques that rely on “rule-of-thumb” heuristics send while PLS commits the highest PC error rate. There- and approximate solutions, LVI offers an exact solu- ersion, v fore, BN-LV outperforms PLS and LISREL both in the tion to the measurement model by categorizing all

Please PC and the PO error rate. measurement items into LVs. Second, in contrast to . ance The simulation study also highlighted the difficulty CFA methods that impose a certain structure on site

Adv in manually testing the measurement and the struc- the data, LVI operates in an exploratory mode, thus in tural model for 75 data sets in PLS and LIREL (each allowing the data to “speak out” and be categorized data set required about 30 minutes to calculate the under the most likely LVs. Because BN-LV does not author’s ticles measurement and structural model). In contrast, the require IS researchers to prespecify which measure- Ar the automated nature of BN-LV greatly facilitated model ment items should belong to each LV, it allows them estimation (about few seconds per data set). Thus, this to explore how new measurement items could be

to when there is a need for automating the process of classified into new LVs. By identifying problematic exploring multiple causal structures, particularly for including items (those that cannot be categorized under LVs), it ight , complex models that prohibit a manual specification yr allows IS researchers to reevaluate potentially prob- of all possible structural models, BN-LV is clearly lematic such items. Most important, the LVI algorithm cop ebsite superior to PLS and LISREL. w directly tests the fundamental axiom of conditional independence, thus allowing a causal interpretation to holds

other 5. Discussion the relationship between the LVs and their identified y This study contributes to and has implications for measurement items, consistent with the principles of an the following literatures: First, it contributes to the the psychometric theory of measurement. on

INFORMS empirical IS literature and the social and behav- In terms of the structural model, BN-LV tests ioral sciences in general by proposing a new data the d-separation conditions and uses the proposed

posted analysis method for inductively identifying LVs from OL scoring function to generate the most likely yright:

be raw measurement items and inferring the most causal Bayesian network. This allows the inference likely causal structural model among the identified of causality in structural models without imposing a Cop not LVs. Second, it contributes to the BN literature by prespecified structure. By operating in an exploratory Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables 22 Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS y .org. ma mode, BN-LV automatically examines all plausible distributional assumptions. More important, the LVI ms file

or structural models and selects the most likely one. algorithm uses the axiom of conditional independence

The This provides a major advantage over competing SEM as its building block. To our knowledge, this renders . methods that require manual specification of plau- the proposed LVI algorithm as the only approach con- sible models, especially for complex models where sistent with the theory of measurement: it offers a ibers such manual work becomes virtually impossible. This causal interpretation to the measurement model by missions@inf advantage becomes valuable where there is little or specifying directional links from the measurement subscr per no prior theory or when IS researchers want to rely items to the LVs. to purely on data. This has implications for new theory Second, BN-LV extends the BN literature to allow development (where there is no existing theory basis), the use of ordinal and discrete data. The proposed policy which is particularly common in IS research because OL scoring function overcomes this long-held limita- institutional tion. Moreover, our simulation results show that OL this of the rapid evolution of IT and the introduction of to new IT systems. outperforms the two state-of-the-art BN approaches— le Finally, as a “conditional probability” method, BN- the Bayesian Dirichlet and the Gaussian metrics—for ailab LV fundamentally differs from existing data analysis ordinal and discrete data. v regarding a tools that rely on the correlation or covariance matrix. 5.3. Implications for Structural Equation Because conditional probabilities can help infer causal- Modeling (SEM) Research made ity (e.g., Druzdzel and Simon 1993, Shugan 1997), the As reviewed earlier, SEM methods no longer claim to is

questions BN-LV method provides IS researchers with another

y infer causality, though they were originally designed tool to infer causality from observational data.

an to model causal relationships. In fact, Pearl (2000) which 5.2. Implications for the Bayesian notably observed: send Networks Literature I believe that the causal content of SEM has been ersion, allowed to gradually escape the consciousness of SEM v Despite the touted potential of Bayesian networks practitioners mainly for the following two reasons:

Please to facilitate research in the IS literature (Lee et al. (1) SEM practitioners have sought to gain respectabil- . ance 1997), existing BN methods have two key limitations ity for SEM by keeping causal assumptions implicit,

site since statisticians, the arbiters of respectability, abhor

Adv that preclude their application to empirical SEM stud- such assumptions because they are not directly testable in ies: (1) They cannot readily handle LVs measured with multiple measurement items,27 and (2) they are and; (2) The algebraic, graph-less language that has dominated SEM research lacks the notational facil- author’s ticles not suitable for discrete and ordinal data such as ity needed for making causal assumptions, as dis- Ar the those obtained from Likert-type scales (which are tinct from statistical assumptions, explicit. By failing to both prevalent in IS research). The proposed BN-LV equip causal relations with distinct mathematical nota- this tion, the founding fathers in fact committed the causal to method overcomes these limitations. First, our proposed LVI algorithm provides a gen- foundation of SEM to oblivion. Their disciples today including

ight are seeking foundational answers elsewhere (p. 209). , eral method that allows the identification of “hid- yr den” LVs from raw measurement items through an In contrast, BN-LV takes advantage of condi- cop ebsite optimal weighting method that maximizes condi- tional probabilities to encode directional relationships w tional independence. Also, the LVI algorithm does among LVs, thus permitting a causal interpretation in holds not impose a certain prespecified structure on SEM models. This helps overcome the limitation of other

y the measurement model, nor does it make any SEM methods to infer causality.

an Most SEM studies specify one model structure

on and use data to confirm this specific structure, 27 An exception is the work of Spirtes et al. (2000, p. 264) in their INFORMS 28 multiple indicator model building (MIMBuild) algorithm, which thus operating in a confirmatory mode. Diligent starts with a certain mix of LV and measurement items in a linear posted system. The MIMBuild algorithm identifies impure measurement 28 To the best of our knowledge, no SEM techniques allow re- yright: be items for certain LV. In contrast, the BN-LV method starts with only searchers to automate the process of examining alternative models. raw measurement items and it does not assume any of the relation- Manual examination of alternative models becomes virtually impos- Cop not ships among the LV to be linear. sible for complex models with multiple LV and measurement items. Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS 23 y .org. ma researchers are supposed to explore plausible alter- small sample size. However, SEM approaches are less ms file

or native models and the lack of an automated method sensitive to high data noise.

The for exploring alternative models potentially overlooks Therefore, BN-LV can be used under these conditions . equivalent models (Chin 1998). BN-LV helps over- to complement existing SEM methods. come this limitation by generating an equivalent class ibers of graphs among LVs based on d-separation tests, 5.4. Limitations and Suggestions for missions@inf scoring each candidate graph using the proposed OL Future Research subscr per scoring function, and selecting the one with the high- The paper also has a number of limitations, which cre- to est score. In doing so, BN-LV allows IS researchers to ate some interesting opportunities for future research. operate in an exploratory mode and allows the data to First, because this paper focuses on observational policy inductively identify the most likely structural model. (cross-sectional, nonexperimental data), we address institutional only two of Popper’s (1959) conditions for inferring this SEM encodes relationships among LVs as linear to equations, thereby ignoring potential nonlinear rela- causality (correlation between X and Y and account- le tionships. In contrast, BN-LV uses conditional prob- ing for potential confounds), excluding the condi- ailab abilities that do not assume any functional (linear) tion that X must temporally precede Y . Following v regarding a form, thus allowing nonlinear relationships to emerge Granger (1986) who distinguishes between temporal among LVs in the structural model. Accounting for and cross-sectional causality, our method does not made nonlinearities is an important strength of the BN-LV claim to create the necessary and sufficient conditions is

questions method. Our simulation study corroborates this by for inferring “absolute” causality, but it is posited as y showing the superiority of BN-LV for data with non- a method of inferring “near” causality from obser- an which linearities. vational, nonexperimental data. However, if temporal In sum, BN-LV has certain advantages over exist- ordering among variables is already known from the send ing SEM data analysis methods under the following data, the BN-LV method can sort all variables in a ersion, v conditions: temporal order and add a constraint to only allow the

Please 1. In the early stages of research when a hypothe- preceding variables to cause the subsequent variables . ance sis has not yet been developed, particularly when a when constructing the BN. This approach will per- site

Adv researcher prefers to let data “speak by themselves” mit longitudinal or experimental data to be used in in as opposed to testing a prespecified measurement or the BN-LV method, and the temporal constraint will structural model. also greatly reduce the complexity of the BN construc- author’s ticles 2. When theory and the literature provide little tion. The challenge for future research, however, is to Ar the guidance on the causal structure of a structural model, identify the temporal ordering among variables from

this particularly when the researcher needs to inductively longitudinal data. to explore several potential structural models to identify Second, the BN-LV method aims to discover the

including the most appropriate one. most likely causal structure in a probabilistic (not ight ,

yr 3. When there is need to automate the process for deterministic) fashion. By no means does the most exploring potential causal structures, in particular for likely causal structure discovered by the BN-LV nec- cop ebsite

w complex models with numerous permutations among essarily capture the definite causal model. As noted in LVs that prohibit the researcher from manually spec- §2, there is still disagreement among many philoso- holds

other ifying all potential models. phers and researchers about the possibility to infer y 4. When the researcher needs to have a stronger causality from data deterministically or probabilisti- an causal interpretation. The conditional probability- cally. In response to some philosophers who argue on

INFORMS based BN-LV method is theoretically closer to the that causality can only be inferred from controlled notion of causality than existing correlation- or experiments that account for all potential confounds,

posted covariance-based methods. BN-LV examines the relationship between X and Y yright:

be 5. When the data violate normality assumptions while capturing possible confounds Z by evaluating and when the true relationships among the LVs are d-separation conditions. Though rigorous researchers Cop not nonlinear. BN-LV is also found to be more robust to are supposed to account for all potential confounds, Zheng and Pavlou: New Bayesian Networks Method for Structural Models with Latent Variables 24 Information Systems Research, Articles in Advance, pp. 1–27, © 2009 INFORMS y .org. ma future research could develop a formal method to applicable to the complex nexus of causal relation- ms file

or test whether the existing confounds Z in the data are ships in the structural model addressed here. There → The “adequate” to assure that the X Y relationship is are two key challenges to extend the propensity scores . truly and significantly causal. approach to our problem: (1) determining the propen-

ibers Third, BN-LV only deals with LV identification sity scores in the presence of multiple causes, and given observed measurement items. However, it (2) identifying the right cloning for a given individual missions@inf does not address the more general missing variable in the presence of multiple values of a given variable. subscr per problem—how to build a model when potentially rel- Both issues need to be investigated by future research. to evant LVs are unobserved and thus may not have Finally, BN techniques cannot distinguish between been necessarily captured by the data (raw measure- structures that entail the same likelihood, especially policy ment items). When a relevant variable is missing from when the two structures have the same V -structures institutional

this the set of Z variables, it may cause inconsistency (see our discussion in §3.3 and Spirtes et al. 2000, to

le in the d-separation condition of BN-LV. An intuitive p. 60). In these cases, theoretical arguments may solution is to search for the missing (unobserved) be necessary to specify the best structure. However, ailab v variables. Hutchinson et al. (2000, p. 325) call this a the BN-LV method is a data analysis method that regarding a “needle in a haystack” problem because seeking all only examines the measurement and structural model missing variables is unending and it is very likely that and does not address issues of theory development, made the key sources of unobserved effects may never be measurement development, data collection, or the- is questions found. Indeed, a major challenge for causality infer- ory implications. Similar to the study of Lee et al. y (1997) study, future research could explore how the an ence is to account for all possible confounding vari- which ables (Mitchell and James 2001, Allison 2005). Recent proposed BN-LV method can be integrated into a

send advances in econometrics and marketing have looked comprehensive method for theory building, empirical ersion, into this problem, primarily via latent class modeling validation, and theory implications. v and mixture models. It is assumed that responses are Please . ance not from a single population (group). However, what Concluding Remarks

Concluding Remarks
Causality is a fundamental characteristic of a good theory, but the difficulty of inferring causality has forced researchers to either infer causality from pure theory (Carte and Russell 2003) or from longitudinal (Granger 1986), experimental (Cook and Campbell 1979), or panel (Allison 2005) data. This paper is an attempt to revive the pursuit of causality in structural models from observational data, in the IS literature in particular and the social sciences in general, and to encourage IS researchers to bring causality considerations back into IS studies. The proposed BN-LV method aims to provide a tool for IS researchers to better understand how causal relationships can be inferred in structural models from observational data. We hope the proposed data analysis method serves as a modest starting point for enhancing methods for inferring causality and building causal theories in the IS literature. Given the enhanced sophistication of IS research in terms of theory and methods, causality can become an important consideration in the IS literature.

Appendix A. The LVI Algorithm
The algorithmic steps of the LVI algorithm are outlined in Table A1.

Table A1 Steps of the LVI Algorithm
Input: X = {x_1, x_2, ..., x_m}
Output: disjoint item sets, each of which represents an LV
1. L_1 ← all x_i, i ∈ {1, ..., m}
2. L_2 ← all {x_i, x_j} that meet C-1, i, j ∈ {1, ..., m}, i ≠ j /* see §3.2.3 */
3. for (k = 2; L_k ≠ ∅ and k ≤ m − 1; k++) {
4.   generate L_{k+1}:
     (a) C_{k+1} ← adding x_i, x_i ∉ L_k, to L_k /* adding x_i one at a time */
     (b) Eliminate the C_{k+1} that do not meet C-1
     (c) Eliminate the C_{k+1} that do not meet C-2
5. }
6. Prune L_k for all k ∈ {1, ..., m} /* see §3.2.3 */
   (a) Identify supersets and delete all subsets
   (b) Detect overlapping measurement items and determine which item set keeps the items /* start from the largest item sets and then go down the list to ensure the minimum number of LVs */
7. Output all L_k
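For readers who prefer running code, here is a minimal Python sketch of the level-wise search in Table A1. It illustrates only the control flow: `meets_C1` and `meets_C2` are placeholders for the paper's conditions C-1 and C-2 (§3.2.3), which we do not reimplement here, and the overlap resolution of step 6(b) is omitted.

```python
from itertools import combinations

def meets_C1(items, data):
    """Placeholder for condition C-1 (§3.2.3), the pairwise screen."""
    raise NotImplementedError

def meets_C2(items, data):
    """Placeholder for condition C-2 (§3.2.3), the optimization-based
    check of Equation (4)."""
    raise NotImplementedError

def lvi(measurements, data):
    """Level-wise grouping of measurement items into candidate LVs,
    mirroring the control flow of Table A1."""
    m = len(measurements)
    levels = {
        1: {frozenset([x]) for x in measurements},
        2: {frozenset(p) for p in combinations(measurements, 2)
            if meets_C1(p, data)},
    }
    k = 2
    while levels[k] and k <= m - 1:
        candidates = set()
        for itemset in levels[k]:
            for x in measurements:
                if x not in itemset:            # add one item at a time
                    c = itemset | {x}
                    if meets_C1(c, data) and meets_C2(c, data):
                        candidates.add(frozenset(c))
        levels[k + 1] = candidates
        k += 1
    # Step 6(a): keep maximal item sets only (delete all subsets).
    all_sets = {s for sets in levels.values() for s in sets}
    return [s for s in all_sets if not any(s < t for t in all_sets)]
```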

Appendix B. The Proposed PC2 Algorithm
Table B1 summarizes the steps of Algorithm PC2. For a more detailed discussion of Algorithm PC, please refer to Spirtes et al. (2000). The proposed PC2 algorithm has three main steps:
• Step 1 initiates a fully connected, undirected graph.
• Step 2 computes R_{xy·z} for all possible X, Y, and Z. If R_{xy·z} = 0, then delete the edge between X and Y.
• Step 3 orients the graph using five rules for directing an undirected graph (Verma and Pearl 1992).

Table B1 The Proposed PC2 Algorithm
Step 1. Start with the complete (all nodes are connected), undirected graph G
Step 2. Generate the reduced undirected graph G′:
(1) Test whether R_{xy·z} = 0 for each edge
(2) Delete the edges that are d-separated by Z (where R_{xy·z} = 0)
Step 3. Direct G′ using the following five rules:
Rule 1. For each triple of vertices X, Y, Z such that the pair X, Y and the pair Y, Z are each adjacent in C but the pair X, Z are not adjacent in C, orient X–Y–Z as X → Y ← Z if and only if Y does not d-separate X and Z and the orientation does not introduce a directed cycle. If this orientation reverses a previous one, then make the edge bidirected
Rule 2. If X → Y, Y–Z, and X and Z are not adjacent, then direct Y → Z
Rule 3. If X → Y, Y → Z, and X–Z, then direct X → Z
Rule 4. If X–Y, Y–Z, Y–W, X → W, and Z → W, then direct Y → W
Rule 5. If X–Y, Y–Z, X–Z, Z–W, and W → X, then direct X → Y and Z → Y
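A compact Python sketch of this skeleton-then-orient flow is given below. It is a simplified PC-style illustration (our code, not the authors' implementation): `indep` stands for whatever conditional-independence oracle is used (e.g., the R_{xy·z} test of Step 2), the conditioning sets are enumerated by brute force, and only Rule 1 (collider orientation) and Rule 2 are shown.

```python
from itertools import combinations

def pc2_sketch(nodes, indep):
    """Simplified PC-style search.

    nodes : list of variable names
    indep : callable indep(x, y, z_set) -> True if x and y are judged
            conditionally independent given z_set (abstract oracle).
    Returns (undirected skeleton edges, directed edges).
    """
    # Step 1: complete undirected graph, plus a record of separating sets.
    edges = {frozenset(p) for p in combinations(nodes, 2)}
    sepset = {}

    # Step 2: remove each edge that some conditioning set screens off.
    for x, y in [tuple(e) for e in edges.copy()]:
        others = [v for v in nodes if v not in (x, y)]
        removed = False
        for size in range(len(others) + 1):
            for z in combinations(others, size):
                if indep(x, y, set(z)):
                    edges.discard(frozenset((x, y)))
                    sepset[frozenset((x, y))] = set(z)
                    removed = True
                    break
            if removed:
                break

    # Step 3, Rule 1: orient X - Y - Z as X -> Y <- Z when X, Z are
    # nonadjacent and Y is not in their separating set.
    directed = set()
    for x, z in combinations(nodes, 2):
        if frozenset((x, z)) in edges:
            continue
        for y in nodes:
            if (frozenset((x, y)) in edges and frozenset((y, z)) in edges
                    and y not in sepset.get(frozenset((x, z)), {y})):
                directed.update({(x, y), (z, y)})

    # Step 3, Rule 2 (one pass): if X -> Y, Y - Z undirected, and X, Z
    # nonadjacent, direct Y -> Z.
    for x, y in list(directed):
        for z in nodes:
            if (z not in (x, y) and frozenset((y, z)) in edges
                    and frozenset((x, z)) not in edges
                    and (y, z) not in directed and (z, y) not in directed):
                directed.add((y, z))

    return edges, directed
```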

Appendix C. Deriving the OL Metric
We assume that a subject's response (e.g., to a measurement item) is a choice among r ordered values, rendering r possible ordered values 1, ..., r of a variable (node) x. For this type of choice behavior, the OL model is appropriate (Borooah 2002). Let p_i = P(x = i), i ∈ {1, ..., r}, be the conditional probability of x = i given its parents π. We have the following OL functions:

logit(p_1) = log(p_1 / (1 − p_1)) = α_1 + Σ_{i=1}^{q} β_i π_i
logit(p_1 + p_2) = log((p_1 + p_2) / (1 − p_1 − p_2)) = α_2 + Σ_{i=1}^{q} β_i π_i
...
logit(p_1 + p_2 + ... + p_{r−1}) = log((p_1 + p_2 + ... + p_{r−1}) / (1 − p_1 − p_2 − ... − p_{r−1})) = α_{r−1} + Σ_{i=1}^{q} β_i π_i
p_1 + p_2 + ... + p_r = 1

An ordered logistic regression estimates (r − 1) intercepts α_1, ..., α_{r−1} and q coefficients (the βs). Note that each possible value i has a different regression equation with a different α_i but with the same βs. Once the parameters are estimated, it is possible to derive the individual conditional probabilities p_i, i ∈ {1, ..., r}, from the set of equations above. The OL function is defined as the solution for the p_i of the above equations.
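This system is easy to invert numerically: the cumulative logits yield cumulative probabilities, and successive differences recover each p_i. The sketch below is our illustration (the function name and parameter values are made up); it follows the sign convention α_i + βπ used in the equations above.

```python
import numpy as np

def ol_probabilities(alphas, betas, parents):
    """Recover p_1, ..., p_r from an ordered-logit fit.

    alphas  : (r-1,) intercepts, one per cumulative equation (increasing)
    betas   : (q,) coefficients shared across equations
    parents : (q,) observed values of the parent variables
    """
    eta = np.asarray(alphas) + np.dot(betas, parents)  # cumulative logits
    cum = 1.0 / (1.0 + np.exp(-eta))                   # P(x <= i)
    cum = np.concatenate(([0.0], cum, [1.0]))
    return np.diff(cum)                                # p_i = P(x<=i) - P(x<=i-1)

# Hypothetical 5-point Likert item with two parent variables:
p = ol_probabilities(alphas=[-2.0, -0.5, 0.8, 2.2],
                     betas=[0.6, -0.3],
                     parents=[1.0, 2.0])
print(p, p.sum())  # five probabilities summing to 1
```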

Appendix D. The Time Complexity of the BN-LV Method
The proposed BN-LV algorithm can be decomposed into three parts: the LVI algorithm, PC2, and the OL scoring function. We discuss the computational complexity of each of these three components below.
1. The LVI Algorithm. Assume there are n measurements. Item set L_k has at most n/k item sets, and there are at most n − k candidate items that need to be evaluated. At each step, LVI needs to check two conditions, C-1 and C-2. The more expensive one is condition C-2, whose complexity depends on the optimization procedure (Equation (4)). Let us denote this complexity as o(C2). The overall complexity is then given by:

Σ_{k=1}^{n} (n/k) × (n − k) × o(C2) = (n² Σ_{k=1}^{n} 1/k − n(n + 1)/2) × o(C2)

Σ_{k=1}^{n} 1/k is the harmonic series, which diverges very slowly; when n = 1000, its value is 7.48. Therefore, we can treat it as a constant. Hence, the complexity is n² o(C2), the number of tests that must be examined.
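The harmonic-sum claim is easy to verify with a one-liner (the 7.48 figure matches):

```python
# H_n = sum_{k=1}^{n} 1/k grows like ln(n) + 0.5772, so it is ~7.49 at n = 1000.
print(sum(1.0 / k for k in range(1, 1001)))  # 7.485470...
```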

2. The PC2 Algorithm. Spirtes et al. (2000, p. 86) demonstrate that the PC2 complexity is given by n²(n − 1)^{k−1}/(k − 1)!, where k is the maximum number of edges a node can have.
3. The OL Scoring Function. The number of edges that need to be directed in the worst case is nk, suggesting that the scoring function needs to run the ordered logistic regression nk times.

To investigate how the time complexity of the BN-LV method varies as a function of the number of measurement items (denoted by k), the number of constructs (denoted by m), and the number of data points (denoted by n), Table D1 presents the results of six scenarios. The three rows represent three data sets used in prior research with different levels of complexity. The second column represents the original data set, and the third column represents the original data set bootstrapped to artificially generate data sets 10 times larger. The results (on a computer with 2 GB RAM and a 2 GHz CPU) show that BN-LV is more sensitive to the number of measurement items than to the number of data points. In sum, BN-LV can easily run on typical data sets encountered in most empirical IS studies.

Table D1 Illustration of the Computational Cost of the BN-LV Method

Model                                        Original sample       10 × original data points
TAM model (Pavlou 2003),                     <1 second             <1 second
  k = 8, m = 3, n = 151                      (151 data points)     (1,510 data points)
TAM-trust model (Pavlou 2003),               <1 second             3.5 seconds
  k = 13, m = 5, n = 151                     (151 data points)     (1,510 data points)
Extended TPB model                           ≈3 minutes            ≈6 hours
  (Pavlou and Fygenson 2006),
  k = 24, m = 14, n = 266                    (266 data points)     (2,660 data points)

References
Allison, P. D. 2005. Causal inference with panel data. Amer. Sociol. Association Annual Meeting, Philadelphia. http://www.allacademic.com/meta/p23194_index.html.
Aristotle. 350 B.C. Physics, Book II. Translated by R. P. Hardie and R. K. Gaye in 1994. Massachusetts Institute of Technology, Cambridge. http://classics.mit.edu/Aristotle/physics.2.ii.html.
Bagozzi, R. P. 1980. Causal Models in Marketing. Wiley, New York.
Binder, J., D. Koller, S. Russell, K. Kanazawa. 1997. Adaptive probabilistic networks with hidden variables. Machine Learn. 29 213–244.
Bollen, K. A. 1989. Structural Equations with Latent Variables. John Wiley and Sons, New York.
Borooah, V. 2002. Logit and Probit: Ordered and Multinomial Models. Sage Publications, Thousand Oaks, CA.
Breckler, S. J. 1990. Application of covariance structure modeling in psychology: Cause for concern? Psych. Bull. 107(2) 260–273.
Carte, T., C. Russell. 2003. In pursuit of moderation: Nine common errors and their solutions. MIS Quart. 27(3) 479–501.
Cartwright, N. 1995. Probabilities and experiments. J. Econometrics 67(1) 47–59.
Chickering, D. 2002. Optimal structure identification with greedy search. J. Machine Learn. Res. 3(3) 507–554.
Chickering, D., D. Heckerman. 1997. Efficient approximation for the marginal likelihood of Bayesian networks with hidden variables. Machine Learn. 29 181–212.
Chin, W. W. 1998. Issues and opinion on structural equation modeling. MIS Quart. 22(1) 7–16.
Cook, T. D., D. T. Campbell. 1979. Quasi-Experimentation: Design and Analysis for Field Settings. Rand McNally, Chicago.
Cooper, G. 1995. A Bayesian method for learning belief networks that contain hidden variables. J. Intelligent Inform. Systems 4 71–88.
Cooper, G., E. Herskovits. 1992. A Bayesian method for the induction of probabilistic networks from data. Machine Learn. 9 309–347.
Davis, F. D. 1989. Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quart. 13(3) 319–340.
Dayton, M., G. Macready. 1988. Concomitant-variable latent-class models. J. Amer. Statist. Assoc. 83(401) 173–178.
Descartes, R. 1637. Discourse on method. Translated by J. Cottingham, R. Stoothoff, D. Murdoch, and A. Kenny (1991). The Philosophical Writings of Descartes. Cambridge University Press, Cambridge, UK.
Druzdzel, M., H. Simon. 1993. Causality in Bayesian belief networks. Proc. 9th Annual Conf. Uncertainty in Artificial Intelligence (UAI), Washington, DC, 3–11.
Elidan, G., N. Friedman. 2001. Learning the dimensionality of hidden variables. Proc. 17th Annual Conf. Uncertainty in Artificial Intelligence (UAI), Seattle, WA, 144–151.
Elidan, G., N. Lotner, N. Friedman, D. Koller. 2000. Discovering hidden variables: A structure-based approach. Adv. Neural Inform. Processing Systems 13(1) 30–37.
Fornell, C., D. Larcker. 1981. Evaluating structural equation models with unobservable variables and measurement error. J. Marketing Res. 18(1) 39–50.
Friedman, N. 1997. Learning belief networks in the presence of missing values and hidden variables. Proc. Internat. Conf. Machine Learning (ICML), Nashville, TN, 125–133.
Friedman, N., M. Linial, I. Nachman, D. Pe'er. 2000. Using Bayesian networks to analyze expression data. J. Computational Biol. 7(3/4) 601–620.
Gefen, D., E. Karahanna, D. W. Straub. 2003. Trust and TAM in online shopping: An integrated model. MIS Quart. 27(1) 51–90.
Gefen, D., D. W. Straub, M.-C. Boudreau. 2000. Structural equation modeling and regression: Guidelines for research practice. Comm. Association Inform. Systems 4(7) 1–70.
Glymour, C., R. Scheines, P. Spirtes, K. Kelly. 1987. Discovering Causal Structure: Artificial Intelligence, Philosophy of Science, and Statistical Modeling. Academic Press, San Diego.
Goldstein, E. 1994. Psychology. Brooks/Cole Publishing Company, Belmont, MA.
Granger, C. 1986. Statistics and causal inference: Comment. J. Amer. Statist. Association 81(396) 967–968.
Heckerman, D. 1996. A tutorial on learning Bayesian networks. Technical Report MSR-TR-95-06, Microsoft Research, Redmond, WA. http://citeseer.ist.psu.edu/heckerman95tutorial.html.
Heckerman, D., D. Geiger, D. Chickering. 1995. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learn. 20 197–243.
Heinen, T. 1996. Latent Class and Discrete Latent Trait Models. Sage Publications, New York.
Holland, P. 1986. Statistics and causal inference. J. Amer. Statist. Association 81(396) 945–960.
Hume, D. 1738. A Treatise of Human Nature. Reprint, Clarendon Press, Oxford, UK (1996).
Hutchinson, W., W. Kamakura, J. Lynch. 2000. Unobserved heterogeneity as an alternative explanation for reversal effects in behavioral research. J. Consumer Res. 27 323–344.
Kant, I. 1781. The Critique of Pure Reason. Translated by J. M. D. Meiklejohn, 1999. Cambridge University Press. http://eserver.org/philosophy/kant/critique-of-pure-reason.txt.
Kenny, D., C. Judd. 1984. Estimating the nonlinear and interactive effects of latent variables. Psych. Bull. 96(1) 201–210.
Kline, R. B. 1998. Principles and Practice of Structural Equation Modeling. Guilford Press, New York.
Lee, B., A. Barua, A. B. Whinston. 1997. Discovery and representation of causal relationships in MIS research: A methodological framework. MIS Quart. 21(1) 109–136.
Mitchell, T. R., L. R. James. 2001. Building better theory: Time and the specification of when things happen. Acad. Management Rev. 26(4) 530–547.
Mithas, S., M. Krishnan. 2008. From association to causation via a potential outcomes approach. Inform. Systems Res. ePub ahead of print December 18. http://isr.journal.informs.org/cgi/content/abstract/isre.1080.0184v1.
Mithas, S., D. Almirall, M. Krishnan. 2006. Do CRM systems cause one-to-one marketing effectiveness? Statist. Sci. 21(2) 223–233.
Pavlou, P. A. 2003. Consumer acceptance of electronic commerce: Integrating trust and risk with the technology acceptance model. Internat. J. Electronic Commerce 7(3) 69–103.
Pavlou, P. A., M. Fygenson. 2006. Understanding and predicting electronic commerce adoption: An extension of the theory of planned behavior. MIS Quart. 30(1) 115–143.
Pearl, J. 1998. Graphs, causality, and structural equation models. Sociol. Methods Res. 27(2) 226–284.
Pearl, J. 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge, UK.
Pearl, J., T. Verma. 1991. A theory of inferred causation. Proc. Principles of Knowledge Representation and Reasoning 2(1) 441–452.
Pearson, K. 1897. Mathematical contributions to the theory of evolution. Proc. Roy. Soc. London 60 489–503.
Plato. 360 B.C. The Republic. Translated by Benjamin Jowett in 1893. Wikisource, New York. http://en.wikisource.org/wiki/The_Republic.
Popper, K. 1959. The Logic of Scientific Discovery. Basic Books, New York.
Rubin, D., R. Waterman. 2006. Estimating the causal effects of marketing interventions using propensity score methodology. Statist. Sci. 21(2) 206–222.
Sarkar, S., R. Sriram. 2001. Bayesian models for early warning of bank failures. Management Sci. 47(10) 1457–1475.
Shugan, S. M. 2007. Causality, unintended consequences and deducing shared causes. Marketing Sci. 26(6) 731–741.
Silva, R., R. Scheines, C. Glymour, P. Spirtes. 2006. Learning the structure of linear latent variable models. J. Machine Learn. Res. 7 191–246.
Skrondal, A., S. Rabe-Hesketh. 2004. Generalized Latent Variable Modeling. Chapman & Hall, London.
Spinoza, B. 1662. On the improvement of the understanding. Translated by R. H. M. Elwes. In The Chief Works of Benedict de Spinoza. 1883. G. Bell & Sons, London.
Spirtes, P., C. Glymour, R. Scheines. 2000. Causation, Prediction, and Search. MIT Press, Cambridge, MA.
Spirtes, P., C. Glymour, R. Scheines. 2002. Data mining tasks and methods: Probabilistic and causal networks: Mining for probabilistic networks. Handbook of Data Mining and Knowledge Discovery. Oxford University Press, New York.
Spirtes, P., T. Richardson, C. Meek, R. Scheines, C. Glymour. 1998. Using path diagrams as a structural equation modeling tool. Sociol. Methods Res. 27(2) 182–225.
Suppes, P. 1970. A Probabilistic Theory of Causality. North Holland Publishing Company, Amsterdam.
Torgerson, W. S. 1958. Theory and Methods of Scaling. Wiley, New York.
Verma, T., J. Pearl. 1992. An algorithm for deciding if a set of observed independencies has a causal explanation. Proc. 8th Annual Conf. Uncertainty in Artificial Intelligence (UAI), Stanford University, Palo Alto, CA, 323–330.