
Quantitative measurement of parliamentary accountability using text as data: the Canadian House of Commons, 1945-2015

by

Tanya Whyte

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Graduate Department of Political Science University of

© Copyright 2019 by Tanya Whyte

Abstract

Quantitative measurement of parliamentary accountability using text as data: the Canadian House of Commons, 1945-2015

Tanya Whyte Doctor of Philosophy Graduate Department of Political Science 2019

How accountable is ’s Westminster-style ? Are minority parliaments more accountable than majorities, as contemporary critics assert? This dissertation develops a quantitative measurement approach to investigate parliamentary accountability using the text of speeches in Hansard, the historical record of proceedings in the Canadian House of Commons, from 1945-2015. The analysis makes a theoretical and methodological contribution to the comparative literature on legislative debate, as well as an empirical contribution to the Canadian literature on Parliament.

I propose a trade-off model in which parties balance communication about goals of office-seeking (accountability) or policy-seeking (ideology) in their speeches. Assuming a constant context of speech, I argue that lexical similarity between government and opposition speeches is a valid measure of parliamentary accountability, while semantic similarity is an appropriate measure of ideological polarization.

I develop a computational approach for measuring lexical and semantic similarity using word vectors and the doc2vec algorithm for word embeddings. To validate my measurement approach, I perform a qualitative case study of the 38th and 39th Parliaments, two successive minority governments with alternating governing parties. I find that similarity scores are positively related with the substantive quality of opposition questions and government responses.

In the quantitative analysis phase, I study Question Periods from 1975-2010 and daily debates from 1945-2015 using the lexical similarity measurement. I find that minority parliaments are more accountable than majority governments since the 30th Parliament, but find no significant relationship between government seat percentage and parliamentary accountability. I show that Parliament becomes more accountable as a government’s popularity decreases. However, the data more strongly support a non-linear model. A structural break analysis yields one significant break at 33%: below this critical value, polling popularity and parliamentary accountability are positively related, and above, are negatively related. Finally, I confirm that the correlation between measures of lexical and semantic similarity varies in strength and direction across parliamentary sessions, suggesting two distinct generative processes are indeed at work.

Acknowledgements

Thanks to my supervisors, Grace Skogstad and Chris Cochrane, and the third member of my committee, Arthur Spirling; the internal and external members of my exam committee, Ludovic Rheault and Justin Grimmer; academic colleagues including but not limited to Kaspar Beelen, Steven Bernstein, Hanil Chang, Adrienne Davidson, Rod Haddow, Maxime Héroux-Legault, Graeme Hirst, Vincent Hopkins, Peter Loewen, John McAndrews, Denver McNeney, Nona Naderi, Federico Nanni, Ed Schatz, Lior Sheffer, Graham White, and Linda White; department graduate administrators Carolynn Branton and Louis Tentsos; mentors and teachers during prior studies at the University of and Old Scona Academic High School; the , including SSHRC for its financial support, and the staffs of the Library of Parliament and Canadiana at Library and Archives Canada; and my colleagues at Receptiviti, especially my boss, Sharan Karanth. Finally, my thanks and love to my parents, Annie and Tony Whyte, my sister, Andrea Whyte, and my husband, Kevin Chan.

Contents

1 Introduction
    1.1 Canadian Literature on Parliament
    1.2 The Parliamentary Decline Thesis

2 Literature Review: Responsible Government and Parliamentary Debate
    2.1 Responsible Government and Accountability
        2.1.1 Individual Ministerial Responsibility
        2.1.2 Collective Ministerial Responsibility
        2.1.3 Role of the Opposition
    2.2 Comparative Theories of Parliamentary Debate
        2.2.1 Proksch and Slapin (2014)
        2.2.2 Bäck and Debus (2016)

3 Research Design
    3.1 Textual Debate Models: Peterson and Spirling (2018)
    3.2 Accountability, Ideology, and the Lexical Gap
        3.2.1 Theoretical Framework
        3.2.2 Empirical Model and Measurement
    3.3 Validation and Prediction
        3.3.1 Part 1: Qualitative Assessment of Opposition-Minister Exchanges in the 38th and 39th Parliaments
        3.3.2 Part 2: Quantitative Study of Daily Debates (1945-2015) and Oral (1975-2010)

4 Methodology
    4.1 Content Analysis
    4.2 Dictionary Methods
    4.3 Lexicographic and Scaling Methods
    4.4 Supervised and Unsupervised Machine Learning
    4.5 Latent Semantic Analysis
        4.5.1 Probabilistic Linguistics
        4.5.2 Limitations of LSA
    4.6 Measuring Lexical Similarity: Cosine Similarity and the Term-Document Matrix
    4.7 Measuring Semantic Similarity: Probabilistic Vector Representations of Words and Paragraphs
    4.8 Study Dataset

5 Qualitative Validation: Opposition-Minister Exchanges in the 38th and 39th Parliaments
    5.1 38th Parliament: Liberal Minority
        5.1.1 Historical Overview
        5.1.2 Topics of Debate
        5.1.3 Natural Resources
        5.1.4 Health
        5.1.5 Citizenship and Immigration
        5.1.6
        5.1.7 Sponsorship Program
    5.2 39th Parliament: Conservative Minority
        5.2.1 Historical Overview
        5.2.2 Topics of Debate
        5.2.3 Child Care
        5.2.4 The Environment
        5.2.5 Ethics
        5.2.6 Airbus
    5.3 Summary: Comparison of 38th and 39th Parliaments

6 Quantitative Analysis: Question Period, 1975-2010 and Daily Debate, 1945-2015

7 Conclusion
    7.1 Ideas for Further Research

Appendices

A Qualitative Results: Additional Analysis
    A.1 38th Parliament quantitative tests
    A.2 39th Parliament quantitative tests
    A.3 Model: lexical similarity vs. word count differential

B Quantitative Results: Additional Analysis
    B.1 Polling data breakpoints analysis
    B.2 Replication of polling analysis with all opposition speeches
    B.3 Comparison of alternative fits (loess, lm, spline, polynomial) for polling data model
    B.4 Parliament-level daily debate model using 1975- data subset
    B.5 Daily debate model including random effects using 1975- data subset
    B.6 Model of lexical and semantic similarity, Question Period data
    B.7 Interaction of government poll popularity and previous popularity
    B.8 Effect size calculations
    B.9 Simulated distribution of similarity scores

C Technical Information
    C.1 Preprocessing
    C.2 Computation and Analysis

D Model of Parliamentary Speech Content

E Lipad Digitization of Canadian Hansard Debates

List of Tables

5.1 Topics of Opposition-Minister Exchanges in Question Period, 38th Parliament
5.2 Natural Resources (38th Parliament) High Similarity Speech Pair Examples
5.3 Natural Resources (38th Parliament) Low Similarity Speech Pair Examples
5.4 Health (38th Parliament) High Similarity Speech Pair Examples
5.5 Health (38th Parliament) Low Similarity Speech Pair Examples
5.6 Citizenship and Immigration (38th Parliament) High Similarity Speech Pair Examples
5.7 Citizenship and Immigration (38th Parliament) Low Similarity Speech Pair Examples
5.8 David Dingwall (38th Parliament) High Similarity Speech Pair Examples
5.9 David Dingwall (38th Parliament) Low Similarity Speech Pair Examples
5.10 Sponsorship Program (38th Parliament) High Similarity Speech Pair Examples
5.11 Sponsorship Program (38th Parliament) Low Similarity Speech Pair Examples
5.12 Topics of Opposition-Minister Exchanges in Question Period, 39th Parliament
5.13 Child Care (39th Parliament) High Similarity Speech Pair Examples
5.14 Child Care (39th Parliament) Low Similarity Speech Pair Examples
5.15 The Environment (39th Parliament) High Similarity Speech Pair Examples
5.16 The Environment (39th Parliament) Low Similarity Speech Pair Examples
5.17 Ethics (39th Parliament) High Similarity Speech Pair Examples
5.18 Ethics (39th Parliament) Low Similarity Speech Pair Examples
5.19 Airbus (39th Parliament) High Similarity Speech Pair Examples
5.20 Airbus (39th Parliament) Low Similarity Speech Pair Examples

6.1 Models 1, 2, and 3 investigate the effect of majority status, governing party, government poll popularity, and seat percentage controlled by the government on lexical similarity scores. All models use the Question Period dataset. Model 1 studies individual Question Period observations and includes random effects for quarter, session, and parliament, while Models 2 and 3 are calculated on mean quarterly similarity scores and include random effects for session and parliament.

6.2 Models 4 and 5 investigate the effect of majority status, governing party, and seat percentage controlled by the government on lexical similarity scores. Both models are calculated on individual daily observations from the daily debate dataset and include random effects for quarter, session, and parliament.

6.3 Models 6 and 7 investigate the relationship between semantic similarity and lexical similarity, employing the daily debate dataset. Model 6 uses averaged observations at the parliamentary session level, while Model 7 studies individual daily-level observations and includes random effects for quarter, parliament, and session. Note the significant negative relationship between lexical and semantic similarity across both models.

List of Figures

4.1 Frequency vs. word rank distributions from zipfR Dickens data, linear (left) and logarithmic (right) scales

6.1 Histogram of Question Period (1975-2010) lexical similarity scores (n = 4105, µ = 0.586, σ = 0.065).

6.2 Histogram of daily debate (1945-2015) lexical similarity scores (n = 9102, µ = 0.718, σ = 0.088).

6.3 Plot of Question Period lexical similarity scores averaged on a parliamentary session basis, calculated between government speeches and only official opposition speeches. Red data points represent minority governments and blue majority governments.

6.4 Plot of Question Period lexical similarity scores averaged on a parliamentary session basis, calculated between government speeches and speeches from all opposition parties. Red data points represent minority governments and blue majority governments.

6.5 Plot of daily debate lexical similarity scores averaged on a parliamentary session basis, calculated between government speeches and official opposition speeches only. Red data points represent minority governments and blue majority governments.

6.6 This plot compares the percentage of seats held by the government with daily debate lexical similarity scores averaged on a parliamentary session basis, calculated between government speeches and only official opposition speeches. Red data points represent minority governments and blue majority governments.

6.7 Plot of quarterly government poll popularity versus quarterly means of Question Period similarity scores. Points are coloured to indicate whether the government was a majority or minority. The curved line is a loess model with corresponding 95% CI. The vertical lines represent the best-fitting model for a break point from a structural change model and corresponding 95% CI.

6.8 Plot of quarterly government poll popularity versus quarterly means of Question Period similarity scores, selecting only those observations from majority parliaments. The curved line is a loess model with corresponding 95% CI. The vertical lines represent the best-fitting model for a break point from a structural change model and corresponding 95% CI.

6.9 This plot compares two methods of calculating similarity scores, representing lexical and semantic similarity measures. I employ the daily debate dataset to calculate scores between government and official opposition speeches and average the scores at the parliamentary session level. Red data points represent minority governments and blue majority governments.

6.10 Plot of daily debate semantic similarity scores averaged on a parliamentary session basis, calculated between government speeches and official opposition speeches only. Red data points represent minority governments and blue majority governments.

List of Appendices

Appendix A: Qualitative Results: Additional Analysis

Appendix B: Quantitative Results: Additional Analysis

Appendix C: Technical Information

Appendix D: Model of Parliamentary Speech Content

Appendix E: Lipad Digitization of Canadian Hansard Debates

Chapter 1

Introduction

In Canada, legitimate political authority is centred in a Westminster-style parliament. In the comparative political science literature, the Westminster model is widely characterized by a centralized and powerful executive drawn from a weak (Lijphart, 2012). While elected representatives nominally control the actions of the executive through the parliamentary confidence convention, in practice power rests with the executive due to party discipline on confidence votes (Olson, 1994, 77). The fusion of executive and legislative powers binds together power and responsibility with adversarial clarity, in a system also known as “responsible government”. Parliament establishes a government that has the power to get things done, while the opposition pursues accountability for that power both by demanding justifications and by presenting itself as an alternative government (Franks, 1987, 264–265). From another perspective, political accountability is a dialogic process in a Westminster-style parliament. The parliamentary battle between a government’s defence of its executive prerogative and an opposition’s critique of government maintains democratic legitimacy of the exercise of power in a more organic fashion than provided by, for example, the institutional checks-and-balances of an American-style congressional system.

On March 10, 2017, Liberal released a discussion paper outlining proposals to reform the Standing Orders of the Canadian House of Commons (Wherry, 2017b). Some of these ideas, including the creation of a Prime Minister’s question time similar to that of the , derived from the Liberal government’s prior election promises; others, such as electronic voting in Parliament, have been floated for decades.
Chagger argued this slate of reforms would make Parliament “more accountable, predictable, efficient, and transparent” (Wherry, 2017b). However, opposition MPs protested that the proposed changes, especially those to Question Period and to committee rules on interventions, would significantly harm the opposition’s ability to hold governments to account. In response to the government’s rapid timeline for committee review of the changes, opposition MPs availed themselves of the very filibuster opportunities in committee the government sought to curtail and attempted to prevent the proposals from coming to a vote (Curry, 2017; Wherry, 2017a). The government, for its part, continued to insist that had granted the government a mandate to implement its election promises regarding “open and transparent government,” including changes to parliamentary procedure (Curry, 2017).

This episode is characteristic of previous attempts to reform the procedures of the Canadian Parliament. More generally, it provides an example of the aforementioned tension between power and accountability characteristic of all Westminster-style parliaments. A government, especially one with a majority of the seats in the House of Commons, possesses both executive power to implement its agenda and the representational authority of elected office to justify the exercise of that power (Skogstad & Whyte, 2015, 84). A majority government claims to speak decisively for the people in its exercise of power and authority, and so justifies limitations on the time and scope of debate in the House of Commons. On the other hand, oppositions argue such debate is critical to holding the government to account for that exercise of power in between elections, however limited the practical effect might be, and that government-endorsed rule changes threaten this accountability. Indeed, part of the opposition argument in this recent case is that the House of Commons’ role in holding government to account has already been hollowed out to near-worthlessness across decades of increased concentration of power in the executive.

Despite the apparent capacity of the to deliver accountable government, critics believe it is no longer doing so. P. H. Russell (2008, 122) argues the Canadian Parliament is most likely to satisfy contemporary demands for democratic accountability under conditions. Minority governments must cooperate with opposition parties to pass legislation, survive confidence votes, and work productively in legislative committees. In Russell’s view, minority governments provide the best balance between the efficiency of fusion of powers and the virtues of deliberative parliamentary democracy since Canada lacks an accountable and effective Senate to act as a legitimate check on a majority-led House of Commons (P. H. Russell, 2008, 112, 173). In contrast, Stewart (1977, 22) argues such an interpretation is “dangerous” because it obscures the accountability role performed by responsible government, cohesive parties, adversarial debate, and the confidence convention.
Minority parliaments shift too much decisive power to small and unrepresentative third parties. Governments are unable to act effectively if defeated over and over on important policy questions, and are able to avoid accountability for their actions by shifting blame to third parties (Stewart, 1977, 25). Stewart’s conclusion is that minority governments are neither good nor bad but preferable to “constitutional collapse” (Stewart, 1977, 29).

This dissertation will empirically investigate whether governments are more accountable to Parliament under majority or minority conditions, utilizing a computational analysis of textual speech data from the Canadian Hansard over the years 1945-2015. I argue that parties in the Canadian House of Commons operate under strong institutional constraints that allow them to largely dictate the content of speeches their MPs make in service of collective political goals. These parties face a trade-off in speechmaking strategies between goals of ideology and accountability, and their best move is shaped by their relative institutional power as government or opposition. First, parties are policy-seeking: they are interested in implementing political changes informed by their collective political ideology, or “set of beliefs about the proper orientation of society and how it should be achieved” (Erikson & Tedin, 2003, 64). A government in a Westminster system possesses the executive power to implement a policy program for which it received a mandate from the people during an election, and uses parliamentary speech to advertise and claim credit for this program. Opposition members, in contrast, have limited time to speak in Parliament, little institutional power to represent their partisan policy interests, and negligible practical influence over policy outcomes.
Second, parties are office-seeking: electoral benefits and penalties accrue from how responsibly a government implements its mandate, and how well an opposition holds the government to account for this conduct. Governments feel the pressure of parliamentary accountability in terms of maintaining agenda control, avoiding embarrassment and scandal, and justifying decisions against opposition critique. For their part, oppositions stand to gain popular support by successfully holding a government to account in parliament for poor stewardship. This pursuit of accountability is especially important to their electoral prospects considering they hold little to no power to effect policy change, especially under majority conditions.

How can the relative emphasis governments and oppositions place on accountability in their parliamentary speechmaking be measured? In this dissertation, I assert that relative linguistic similarity across speeches made by governments and oppositions is a quantitative measure of accountability in the House of Commons as a dialogic process. That is, the more focused the opposition on holding governments to account (as opposed to appealing to or representing an ideological position) and the more accountable the government to their questioning (as opposed to ignoring opposition questions or changing the subject), the greater the lexical similarity in their speech content.¹

This dissertation will make the following contributions to the literature. First, I make a theoretical contribution to the comparative literature on the empirical study of parliament by improving upon existing unidimensional scaling models of legislative debate. Second, I make a methodological contribution by proposing a computational textual measure of parliamentary accountability based upon relative lexical similarity.
Third, I make an empirical contribution to the Canadian literature on Parliament through both the Lipad dataset of historical Hansard debates and the use of this dataset to answer a persistent empirical question in Canadian political science regarding minority governments.

The dissertation proceeds as follows. In the remainder of this chapter, I review the Canadian literature on the study of Parliament in order to situate this study and justify the selection of Canada as a single case. In Chapter 2, I explore in more detail two literatures of major relevance to the dissertation. First, an understanding of accountability in a Westminster parliament is inseparable from the concept of responsible government. In order to situate and justify the assertions above regarding accountability and representation, I begin with a background overview of responsible government in the Canadian context. Second, in order to move from the theory of responsible government to an empirical model of parliamentary accountability, I build on existing empirical approaches from the comparative literature on legislative debate. In the second part of Chapter 2, I review these approaches and develop a critique based on their failures in the Canadian case.

In Chapter 3, I bring together this critique with my discussion of responsible government, employing an empirical definition of accountability developed by Bovens (2005, 2010) to propose a model and measurement strategy for studying accountability in parliamentary debate text. Working within this theoretical framework, I develop five testable propositions surrounding Russell’s claim that minority parliaments are more accountable than majority parliaments. I investigate whether minorities are indeed more accountable than majorities, and whether the percentage of seats a government holds affects its level of parliamentary accountability.
Next, I examine the relationship between government polling popularity and accountability, with the expectation that popularity is negatively related to accountability and that, above a certain popularity threshold, majority and minority governments alike face no additional incentive to change their accountability behaviour. Finally, I study the relationship between accountability and ideology in political speech, anticipating a negative relationship between the expression of these two goals in debate.

In Chapter 4, I investigate potential text analysis methodologies for measuring accountability and elaborate upon the linguistic similarity approach I select as well as details of my dataset construction and analysis procedure. In Chapter 5, I perform a qualitative validation of my measurement approach through a case study of Question Period speeches from the 38th and 39th Parliaments. In Chapter 6, I return to my five testable propositions and seek to answer these questions using the quantitative analysis procedures outlined in Chapter 4. I find that minority parliaments are indeed more accountable than majorities, but have insufficient evidence to make a finer-grained claim about the effect of the percentage of seats a government holds. I discover a non-linear relationship between polling popularity and parliamentary accountability; accountability peaks in the mid-30% range representing a turning point between a positive and negative relationship, suggesting that parliaments are more accountable the more uncertain the outcome of a potential election. Finally, I find that accountability decreases as ideological polarization increases, a result potentially reflective of historical trends in both features of parliamentary discourse in Canada since 1945. These results, especially the latter, suggest many fruitful areas of further comparative research, which I discuss in the concluding Chapter 7.

¹ Lexical, or surface, similarity is the extent to which the vocabularies of a set of texts overlap with each other. For further discussion regarding the measurement of textual similarity, see Chapter 4.
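
As an informal illustration of lexical similarity as defined in this chapter (the extent to which the vocabularies of a set of texts overlap), the following minimal Python sketch computes the cosine similarity between the raw term-count vectors of two short texts. It is not the measurement pipeline used in this dissertation, which is described in Chapter 4; the tokenizer, the function names, and the three example sentences are assumptions invented for illustration and are not drawn from Hansard.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens (a deliberately crude, assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the raw term-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only terms present in both texts contribute to the dot product.
    shared_terms = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(count * count for count in counts_a.values()))
    norm_b = math.sqrt(sum(count * count for count in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares no vocabulary with anything
    return dot / (norm_a * norm_b)


# Hypothetical exchange, invented for illustration only (not drawn from Hansard).
question = "Will the minister explain the delay in health funding?"
on_topic = "The minister can explain that health funding was delayed by the audit."
off_topic = "Our government is proud of its record on trade and jobs."

# A response that engages the question reuses more of its vocabulary.
print(cosine_similarity(question, on_topic) > cosine_similarity(question, off_topic))
```

In this toy case the on-topic reply shares several terms with the question while the off-topic reply shares none, so the comparison holds. Raw counts are used only for simplicity; any weighting or preprocessing would be a further design choice.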

1.1 Canadian Literature on Parliament

There is general consensus in the literature that the dominance of the political executive and its “court” at the expense of accountability to Parliament is particularly and problematically strong in Canada, especially in majority government situations (P. H. Russell, 2008; Savoie, 1999, 2008, 2015; White, 2005, 2012). Some scholars of the Canadian “democratic deficit,” concerned with declining voter turnout and public trust in traditional political institutions, have therefore focused on alleviating a parallel democratic deficit in the House of Commons between the executive and the average elected representative as one approach to increasing its relevance. A recent and accessible work in this field has emerged from Samara Canada, a non-partisan organization dedicated to civic engagement and democracy. In Tragedy in the Commons (2014), Samara co-founders Loat and MacMillan conduct interviews with over 80 former MPs in an effort to diagnose “a democracy that has lost its way, its purpose, and the support of the public it is meant to serve” (Loat & MacMillan, 2014, 275). In their introduction, Loat and MacMillan make extensive reference to C.E.S. Franks’ The (1987), namely to how Franks’ critical observations in the late 1980s read as if written today. Franks observed dysfunctions in the accountability role of the House of Commons rooted in both institutional structures—for example, an insufficient number of MPs to permit party discipline to loosen—as well as political circumstances (Franks, 1987, 262). Franks, and Loat and MacMillan, form part of an extensive list of Canadian scholars and politicians who have reviewed similar issues in déjà-vu fashion since the 1970s.

Up until the late 1950s–early 1960s, the rules of the Canadian parliament were largely uncontroversial.
Save for critical episodes such as the 1913 Naval Bill (which resulted in the introduction of a closure rule to the House of Commons), the combination of a relatively narrow scope for government and the amateurism of political representation meant that “no excessive strain ha[d] ever been put on the Canadian House” with regard to the sufficiency of parliamentary procedure (Dawson, 1962, 252). However, transformational changes beginning with World War II yielded pressures for change: citizen comfort with, and demands for, government to play an expanded role in society; the inadequacy of existing procedures for debating spending and taxation in committees of the whole House; increasing salaries for MPs and the advent of air transportation, allowing them to be MPs full time; the emergence of national tensions creating issues for regional accommodation; and successions of minority parliaments on a never-before-seen scale. Together, these pressures led to cross-partisan consensus that changes to parliamentary procedure were necessary (Dawson, 1962, 211; Stewart, 1977, 214, 221, 223, 276). Significant revisions to the Standing Orders of Parliament were made over 1955–1969, the majority of which served to shorten unnecessary debates and increase the efficiency of proceedings, for example by restricting budgetary debates and by significantly increasing the role of committees in the legislative process (Stewart, 1977).

This slate of reforms, however necessary, marked the beginning of a parliamentary literature on responsible government and the necessity of reform that has not abated since. Writing in response to these Standing Order changes, Stewart (1977) argues that additional reforms including stronger time allocation measures and an expanded role for committees were necessary to improve democratic functioning of the House. Nielson and MacPherson (1978) collect observations from political scientists and politicians on the need for Canadian parliamentary reform, highlighting an assertion by opposition leader Robert Stanfield that “we no longer have parliamentary responsible government in Canada” due to a combination of executive dominance and opposition under-staffing (Stanfield, 1978, 42). Thomas (1979) notes an “apparently sharp decline” in public trust in government since the 1960s, and proposes in response a slate of procedural changes to committees, ministerial responsibility, and time allocation to increase the House of Commons’ oversight powers (Thomas, 1979, 57). Mallory (1979) argued the 1969 reforms to committees worsened problems of information overload and lack of resources. These difficulties undermined the ability of MPs to perform oversight functions in committees, while the transfer of these functions away from the House of Commons opened up new accountability gaps. In the new millennium, Docherty (2005) observed the same problems in his evaluation of Parliament for the Canadian Democratic Audit series; he highlighted a lack of capacity and support resources for individual MPs to make a difference, together with excessive party discipline, as critical issues. 
By the late 2000s, a new generation of MPs, most notably , had taken up the mantle of parliamentary reform along similar lines to those proposed by academics, Speakers and MPs since the late 1970s (Blaikie, Boyer, Boudria, Dalphond-Guiral, & Nystrom, 2006; Chong, 2008; Chong, Jennings, Laframboise, Davies, & Lukiwiski, 2010; Garner, 1998; Lincoln, Merrifield, Bergeron, Nystrom, & Casey, 2001; P. Manning, 1994; Segal, 2004; Strahl, 2001). In short, the 2017 struggle over parliamentary reform was one of a long line of such contests, rooted in uncertainty as to the adequacy of Parliament as an institution for accountability in light of the complexity of modern governance.

In addition to re-imagining the balance between executive power and legislative accountability, reform proposals dating back to the 1970s have generally agreed some additional rules are necessary to promote civility and lessen adversarialism in Parliament, especially during Question Period. The highly partisan and often juvenile environment of Parliament, it is argued, contributes significantly to declining public respect for Canadian political institutions, resulting in the voter disengagement and declining voter turnout characteristic of a democratic deficit (Ryan, 2009, 18–19). This is not simply a modern phenomenon: an impetus for parliamentary reform in 1968 was that all parties were “embarrassed” by the Commons’ public “reputation as a pandemonium” (Stewart, 1977, 219). Unlike in Britain, where Question Time is rarely attended by the Prime Minister and oral questions must be tabled at least three days in advance, in Canada by convention the Prime Minister and his or her ministers must be present to face a barrage of interrogation as the Opposition sees fit. The ferocity of what Loat and MacMillan term “Kindergarten on the Rideau” (Loat & MacMillan, 2014, 115) is often a shock to the first-time spectator in the House Gallery. Nevertheless, some former MPs—especially opposition members—defend the advantages of the Canadian format as an accountability mechanism: “Just imagine, it’s the only democracy in the world where the executive comes to a session not knowing what questions they will be asked. They are being asked in public, so it is quite a vital and important instrument in our democracy” (Loat & MacMillan, 2014, 126). To critics, daily Question Period can represent too much of a good thing, as mandatory accessibility to government members could encourage easy partisan attacks at the expense of deeper policy questions. This is because opposition members are likewise time-pressured by the format of Question Period but have access to even fewer resources than ministers for researching and preparing their material (Dobell & Reid, 1992). From this perspective, it is difficult to assess whether the 2017 proposal to implement a Prime Minister’s question block would improve the efficiency of debate time in the House of Commons, reduce opportunities to hold the government as a whole to account, or put a further centralizing spotlight on the Prime Minister and away from the rest of Parliament.

This diversity of competing opinions on reform of Question Period exemplifies the complexity of institutional tensions at work in Parliament. As Mallory points out, “[t]he House of Commons as an institution would be more effective if there were some general agreement about what its function is.... it has played a variety of roles—some of which are largely obsolete—and has been forced to assume new roles however unsuited it is as an institution to perform them (Mallory, 1979, 26).” To what extent can parliamentary reform address this confusion? Despite his identification of areas for improvement, Franks is persistently cautious about unintended, potentially negative consequences of reform in a system as path dependent as the Westminster model (Franks, 1987, 265). This hesitation is characteristic of the so-called “responsible government” tradition in the Canadian literature on Parliament, identified by its emphasis upon the foundational Westminster conceptualization of parliamentary accountability (Malloy, 2002). Such literature focuses on description and defence of existing institutional structures. From this perspective, Westminster institutions have served Canada well over the years, and tinkering unnecessarily with written rules could upset its successful balance founded upon unwritten convention (J. Smith, 1999, 410). Contemporary demands for such rule changes, it is argued, are typically rooted in ideas and values of democratic representation imported from the American congressional context that are incompatible with the foundation of the Canadian system (Atkinson & Thomas, 1993, 433–434; J. Smith, 1999, 412). 
Institutional practices such as party discipline appear undemocratic when viewed in isolation—and can indeed be so at their most egregious—but must be assessed in context as pieces of institutional machinery that “deliv[er] on the democratic principle in a particular way” different from a congressional system where, for example, delegate representation is strongly valued (J. Smith, 1999, 417).

Governance in a Westminster parliamentary system, this perspective holds, is a continuous and sometimes unpleasant process of “muddling through” changing circumstances. Best practices that emerge and gain acceptance through this process become convention, allowing the system to adapt to fundamental change over time (notwithstanding changes to the Constitution Act, 1982 including the Charter of Rights and Freedoms, the implications of which preoccupy another significant chunk of the literature). Parliament provides an arena for working out legitimate solutions to fundamental political grievances via verbal conflict among parties, and its flexibility in this regard has been critical to managing Canadian regional and linguistic cleavages (D. E. Smith, 2007, 89). Procedural rule changes meant to facilitate its adaptation to modern circumstances have had a decidedly mixed track record and tend to create new problems as well as shift old ones around (Mallory, 1979; J. Smith, 1999; Sutherland, 1991). Instead of making institutional rule changes, Canadians concerned about a democratic deficit should first focus on improving civic education of how Canadian democratic institutions work. This would include a greater understanding of the role an adversarial opposition plays in accountability and a stronger appreciation of the advantages of parliamentary government, as well as critique of its downsides (Franks, 1987, 257; D. E. Smith, 2007, 127, 142). Canadians who remain fundamentally dissatisfied with their parliamentary democracy should not blame the institution itself, but the media for rewarding negative and frivolous politics with disproportionate news coverage (Franks, 1987, 267; Loat & MacMillan, 2014, 131; P. H. Russell, 2008, 168; D. E. Smith, 2007, 133; Stevens, 1978, 230).
A counter-thread in the Canadian study of parliament that also first emerged in the late 1970s has argued that this “traditional” responsible government perspective on Parliament is flawed. To these reformist critics, it suffers from a colonial reluctance to examine alternative democratic arrangements or reforms from a comparative or critical perspective, and to theorize more abstractly about the Canadian case. Given these tendencies, the argument continues, its dominance has impoverished Canadian scholarship on Parliament more generally (Atkinson & Thomas, 1993; Sproule-Jones, 1984). Malloy divides the flaws of the traditional approach into two categories: normative and theoretical. First, the traditional literature emphasizes accountability as the primary value of democracy, when alternatives such as representation (in the congressional sense of government by consensus), deliberation, or inclusiveness are equally valid candidates (Malloy, 2002, 6). This results in difficulties separating the argument that the Canadian Westminster system is relatively good at what it does from the premise that accountability in the responsible government sense is a normatively superior approach to democratic governance. Such logic can be particularly troublesome for scholarship in the way it forecloses critique: “muddling through” taken to its farthest conclusion implies that any perceived dysfunction is self-correcting over the long run, a teleological and elitist justification of the way things are at present (Mackintosh, 1978, 308). The “traditional” perspective that minority governments are constitutionally dangerous because they allow superficial minority party interests to disrupt the business of government sounds anathema to contemporary ears (Stewart, 1977, 21–22). As Malloy points out, democratic norms have simply shifted since the 1950s–1960s as Canadian society has become increasingly diverse and Canadian voters more accustomed to minority governments.
The appropriate scholarly response to such change should not be to shut down these trends as anathema to responsible government. Indeed, scholars within the traditional school more broadly imagined have tackled this normative issue in their work (J. Smith, 1999, 417). P.H. Russell’s reframing of minority governments as a positive embodiment of the deliberative values of Westminster parliament is an example that addresses Malloy’s concerns about the responsible government perspective. As Russell puts it, responsible government can also be understood from the contemporary theoretical perspective of deliberative democracy (P. H. Russell, 2008, 174), expressing political discussions so that all citizens can appreciate in the deliberation their voice being heard. Malloy’s argument also fails to appreciate how advocates of parliamentary reform in Canada have tended to be politicians, who have injected strong normative perspectives of their own into the academic debate. Progressive Conservative leader Robert Stanfield, for example, felt the dysfunctions of an overworked House would best be fixed by restricting the scope of what modern governments do (Stanfield, 1978, 48). Reform leader ’s vision of a less-disciplined House of Commons and an elected Senate elevated normative principles of electoral democracy over parliamentary ones (J. Smith, 1999, 55). Liberal Prime Minister ’s Charter of Rights and Freedoms likewise exemplifies a reform that upholds a valuation of constitutional or judicial democratic values over parliamentary democracy (J. Smith, 1999, 39). Relatedly, it does not follow that placing a high value on the accountability mechanism of responsible government is inherently elitist or an endorsement of the status quo of executive dominance. To assume so represents a simple reading of the literature that belies the complexity of parliamentary accountability as a process (Jackson & Atkinson, 1980).
John Wilson’s (1988) defence of the values of parliamentary accountability highlights this point:

What the nuisance does is challenge the quaint notion—so often held by prime ministers and presidents, but also by university professors and teachers of all kinds—the notion that such people have a corner on knowledge, and simple folk are expected to shut up and listen to them. By challenging this view the nuisance forces those in authority to hesitate just long enough to accept the possibility that they are wrong.... By doing this, by being a nuisance who gets in the way of an easy passage; by constantly badgering those in authority; by forcing them to answer for their stewardship—not at every general election, but every day—we will be teaching them that they must accept the necessity of their accountability. We will be showing them, as Eugene Forsey so nicely put it some years ago, that “it is our Parliament, not theirs. They are our servants, not our masters.”... [A]n understanding of the principle of the necessity of opposition—which lies at the heart of the successful practice of parliamentary government—shows that in society generally we should encourage every nuisance we can find... (Wilson, 1988, 26–31).

Both parliamentary accountability and electoral or popular accountability operate via adversarial legislative processes; the distinction is one of process and potential outcomes. Parliament represents an ongoing process of executive accountability, “not at every general election, but every day,” in a popularly-elected House of Commons to which the executive is fundamentally responsible. That is, at any time, a serving government can be defeated on a confidence motion, and their subsequent resignation is “an obligation, rather than an option” (Sutherland, 1991, 95). In contrast, in a congressional system that prioritizes electoral accountability, legislative members directly represent constituents through their votes and legislative participation. They can oppose or obstruct the executive via their institutional role as check-and-balance, but the executive is not responsible to the legislative branch and cannot be held to account except directly by the people at election time (a fact made salient in 2019 by the many opponents, both Democratic and Republican, of President Donald Trump). In short, responsible government prescribes a strong role for parliamentary opposition as an anti-executive counterbalance that should not be overlooked.

The responsible government literature also contextualizes the evolution of parliamentary accountability in historical perspective. Sir John A. Macdonald’s conceptualization of responsible government in newly-confederated Canada endorsed strong central authority, and a strong dash of patronage. Since the 1870s, a “one-way street” transfer of power from the legislature to the executive has consistently taken place through government-sponsored reforms (Ward, 1958, 55–56).
In short, the responsible government literature views parliamentary reform with scepticism precisely because there has never been a “golden age” of parliamentary independence in Canada and previous reforms have had the unintended or intended consequence of compounding executive power (Mallory, 1979, 26). As executive dominance has adversely impacted responsible government, scholars such as Franks, Mallory, and Smith have been critical of its overreach in their work. Finally, the opposition of traditional Canadian parliamentary scholars to reform has been overstated, so long as that reform genuinely strengthens the democratic function of responsible government (Stewart, 1977, 266). Most of these scholars are consistently comparative in their analyses of the Canadian case with the British system. Nearly all point out that Westminster has already faced similar problems and implemented a number of reforms Canada could readily emulate including procedures for closure, time allotment, and a strengthened role for committees that are not at all “congressional” in inspiration (Dawson, 1962, 254–255; Stewart, 1977, 241, 283). Indeed, Stewart admonishes Canadian politicians for failing to respond to precedents set by British procedural reforms (Stewart, 1977, 241, 283).

Malloy’s second critique of the responsible government literature concerns its theoretical foundations and their implications for empirical work. An unfortunate structural fact of the Westminster model is that the day-to-day work of democratic representation typically occurs behind closed doors, rather than in public speeches or votes as in a congressional system. Secrecy amongst cabinet and colleagues is necessary for parliamentary effectiveness.
Governments are strongest when they take backbench opinions seriously and can attain private consensus (or, at least, consent) of caucus members on policy directions; caucus unity underlies a strong public front of party discipline in Parliament that allows the government to get things done. However, such secret negotiation leaves little empirical evidence behind for scholars interested in evaluating the quality of representation. For example, the observation that government backbenchers have little to no influence in the House of Commons is one basis for reform proposals aimed at loosening party discipline and improving opportunities for private members’ legislation. On the other hand, backbenchers may have ample opportunity to have their voices heard in caucus meetings; unfortunately, there is no way of empirically ascertaining whether this is the case. Indeed, we might hypothesize that the happier backbenchers are with their private input in caucus, the less likely they would be to publicly raise dissenting opinions, either in Parliament or to a researcher. In general, theoretical propositions that are difficult or impossible to test empirically are common in the responsible government literature (Malloy, 2002, 6). Relatedly, this literature rarely engages with or contributes to comparative theories of parliamentary government (Atkinson & Thomas, 1993, 446).

A response to this critique is that this difficulty is not confined to scholars within the traditional perspective, nor does it reflect lazy scholarship. Arguably, the reason why Canadian scholarship on Parliament has tended to engage less with comparative theory is that empirical measures that are employed in the comparative literature (such as roll-call voting data) are not appropriate or valid measures in the Canadian context.
Second, many rigorous and excellent studies of the Canadian Parliament nevertheless rely heavily on private interviews with politicians and officials or historical case studies out of necessity; an example is Savoie’s work on executive power and accountability (Savoie, 1999, 2008, 2015). Nevertheless, political interview data has severe empirical limitations. In Tragedy in the Commons, Loat and MacMillan provide a contemporary and colourful example of the difficulty of reliance on interview data. They harshly critique the tendency of former MPs to deflect blame and responsibility away from themselves toward parties, leaders, or other MPs in their retrospective accounts of Parliament:

It was startling to hear how often MPs accepted their own helplessness.... Avoiding responsibility for the problems that plague life on the Hill was a constant in our interviews. MPs can blame the political parties. They can blame the media and they can blame the culture of . But at its root, any parliamentary problem exists because the Members of Parliament allow it to exist (Loat & MacMillan, 2014, 228, 232).

The persistence of personal and partisan bias in recollections of former MPs is responsible for much of the cynicism and frustration expressed by Loat and MacMillan in their conclusion. Additionally, the difficulties of bias and simple fallibility of human memory inherent to interview data are compounded in the Canadian case by a comparative dearth of institutional memory. Turnover among Canadian MPs has historically been and remains substantially higher compared with other Westminster countries, especially notable in contrast with Britain (Atkinson & Docherty, 1992; Franks, 1987; Kerby & Blidook, 2011). Despite these structural difficulties, Canadian scholars have made a persistent effort to overcome challenges of empirical measurement by undertaking the collection and distribution of new datasets on voting, legislation, and debates and making use of more complex methodological techniques to explore the role of MPs and parliamentary parties in Canada over the last two decades. Kam’s comparative work on partisan and cabinet dynamics in Parliament is an outstanding example, combining creative empirical work with engagement with comparative theory-building (Kam, 2000, 2001, 2006, 2009; Kam & Indridason, 2005). Other examples include work by Blidook, Penner, and Soroka on Question Period and constituency representation (Penner, Blidook, & Soroka, 2006; S. Soroka, Penner, & Blidook, 2009); Blidook, Byrne, and Kerby on parliamentary careers, media influence, and the constant campaign (Blidook, 2012; Blidook & Byrne, 2013; Blidook & Kerby, 2011; Kerby & Blidook, 2011); Soroka, Wlezien and others on agenda setting dynamics and media and campaigns (S. Soroka, 2002a, 2002b; S. Soroka, Cutler, Stolle, & Fournier, 2011; S. Soroka & Wlezien, 2010); Loewen and colleagues’ experimental work on policy communication, policy proposal power and electoral outcomes (Loewen, Koop, Settle, & Fowler, 2014; Loewen & Rubenson, 2011); and Godbout and Høyland’s work on partisan voting patterns (Godbout & Høyland, 2011b, 2013). To sum up, as data have become more available and methodological techniques increased in sophistication, Canadian scholars of Parliament have risen to the challenge. There is no evidence that traditional parliamentary scholarship in Canada has driven off “the best students” interested in theoretical and methodological sophistication, as Atkinson and Thomas warned (Atkinson & Thomas, 1993, 43).
Instead, the specificities of the Canadian case, as highlighted by the traditional approach, have provided starting points for a rapidly expanding array of sophisticated empirical studies. It is within this strand of Canadian quantitative empirical work on Parliament that this dissertation is situated and to which it contributes.

A final critique of the Canadian literature on Parliament in general, related to a reluctance to engage with comparative theory, is its absorption with self-assessment and solving perceived practical problems, exemplified by the aforementioned literature on parliamentary reform (Atkinson & Thomas, 1993; Malloy, 2002; D. E. Smith, 2007; Sproule-Jones, 1984). Further, the locus of this scholarship has shifted from political science departments to journalistic, popular, and practitioner-oriented works that identify a decline in accountability and accessibility of the House of Commons (D. E. Smith, 2007, 12). As mentioned at the beginning of this introductory chapter, Tragedy in the Commons is a characteristic example of this trend. Its critical assessment is journalistic and Canadian-focused, with its comparative perspective limited to a search for tools from other countries that could be imported to address perceived deficits in the Canadian House of Commons. The blind spot of this critique is the possibility that literatures on other Westminster parliamentary democracies are also prone to scholarly insularity, a focus on declining performance and democratic deficits, demands for institutional reforms to decrease party discipline, and difficulties with empirical measurement.

1.2 The Parliamentary Decline Thesis

The Canadian debate as to when and how accountability in parliamentary government began to decline has much older roots in Britain, which has been struggling with the question of accountability of the executive to Parliament for at least as long as has existed. Low lamented how the House of Commons was no longer controlling, but controlled by, the Executive in his critique of the British Parliament written in 1904; Churchill declared the House of Commons “dead” and “marching docilely to execution blindfold” in 1923; and Lord Hewart bemoaned “the new despotism” of Parliament in 1929 (Flinders & Kelso, 2011, 255). In short, what current scholars have termed the “parliamentary decline” thesis has been tenacious in the British literature on Parliament for over a century, most interestingly as the precursor to the emergence of public policy and policy network studies in Britain (Hollis, 1949; Polsby, 1975; Richardson & Jordan, 1979; Sampson, 1962; Shanks, 1961). Even as C.E.S. Franks upheld the larger and less-disciplined British House of Commons as a possible model for Canadian emulation in the 1970s and 1980s, the British literature was proclaiming the decline of its parliament from an era when party discipline was weaker and MPs demanded greater accountability of the executive (Franks, 1987, 254). Even so, Robert Stanfield’s observations regarding the dysfunctional operation of Canadian parliamentary politics make judicious citation of contemporary British analysis of the dysfunction of Westminster (Stanfield, 1978, 42).

Looking back on this literature, Russell and Cowley (2016) argue the parliamentary decline thesis has spread, typically through the typologies of comparative politics, to other Westminster countries and has had a dampening effect on their domestic parliamentary scholarship. Scholarship on the Indian Parliament’s Lok Sabha (Lower House) provides an interesting example. In recent quantitative work, Indian scholars find significant evidence of substantially improved representation of previously marginalized groups such as women, Scheduled Castes, Scheduled Tribes, and people in agricultural occupations among elected MPs from 1950 to present (Ayyangar & Jacob, 2014; Jayal, 2006; Shankar & Rodrigues, 2010).
The evidence on these members’ debate participation is mixed: members of historically marginalized groups continue to speak up less frequently than average in the Lok Sabha, a trend which has changed little over the past 30 years. Evidence from Question Hour shows that opposition MPs have been surprisingly effective in holding the executive to account and representing local constituent interests through use of questioning, although national parties continue to dominate subnational parties (Ayyangar & Jacob, 2014, 2015). Notwithstanding these empirical trends, the dominant perspective in the Indian literature, as Ayyangar and Jacob put it, is “bordering on despair... with words such as ‘decline’, ‘corrosion’, ‘atrophy’, ‘endangerment’, and ‘death’ being used to describe its functioning” (Ayyangar & Jacob, 2014, 3). The usual explanatory suspects familiar to Canadian scholars are all present in the Indian literature: a lack of resources for opposition members to prepare questions (National Social Watch, 2009), the weakness of parliamentary committees, an institutional role bias toward stalling and disruption on the part of the opposition (Wallack, 2008), excessive party discipline, party system fragmentation, excessive turnover in MP careers, and, finally, an excessive adversarialism that turns the public away from politics (Kapur & Mehta, 2006). In summary, the disjuncture between a “general impression of parliamentary decline” in the literature (Ayyangar & Jacob, 2014, 3) and empirical studies that present a more nuanced picture of institutional strengths is not only identifiable in Canada, but in Britain, India, Ireland, Scotland, and Australia as well (Bach, 2008; Benton & Russell, 2012; Elgie & Stapleton, 2006; M. Russell & Cowley, 2016; M. Russell, Gover, & Wollter, 2016; Shephard & Cairney, 2005; Thompson, 2012).
The parliamentary decline argument is not without some explanatory merit, as mixed quantitative evidence in the Indian case, for example, makes clear. However, many of the scholarly criticisms of traditional parliamentary studies in Canada, as discussed above, are also directly applicable to the parliamentary decline literature that succeeded it, both in Canada and in other Westminster countries. First, a lack of empirical verification has perpetuated the unexamined repetition of a dire picture of parliamentary governance. For example, Russell and Cowley (2016) present perhaps the most thorough analysis of the influence of the British House of Commons on policy ever conducted, utilizing a mixed-methods approach combining over 500 interviews with parliamentarians and officials; four human-coded datasets of recorded votes, bill and amendment texts; and committee reports spanning fifteen years. They conclude that in Britain, “the conventional ‘parliamentary decline’ thesis... is inconsistent with the empirical evidence. The data that we have presented make clear that the British Parliament has significant influence, at all stages of the policy process” (M. Russell & Cowley, 2016, 132). In their conclusion, they cite an argument made by British scholars Flinders and Kelso (2011) that “the dominant public media and academic perception of an eviscerated and sidelined parliament provides a misleading caricature of a more complex institution. Moreover the constant promotion and reinforcement of this caricature by scholars arguably perpetuates and fuels public disengagement and disillusionment with politics” (Flinders & Kelso, 2011, 249). Second, Flinders and Kelso propose the parliamentary decline thesis misrepresents a normative argument (about what type of accountability role parliament ought to be performing given shifts in societal values) as an empirical performance evaluation of parliamentary institutions (Flinders & Kelso, 2011, 260).
The UK Parliament is, in their view, performing excellently at delivering responsible government; whether or not society values a different standard for democratic accountability is a different issue. This parallels Malloy’s critique of the responsible government literature and its normative content, except in reverse. In sum, the literature on Westminster parliaments broadly has swung from strongly supporting the system and maintaining strong skepticism of reform to unfairly condemning the system and overemphasizing its lack of capacity for change. I argue the empirical reality is somewhere in the middle—and is shaped by the particular institutional and political characteristics of Westminster-style parliaments across different countries.

One worrisome implication of the unchallenged parliamentary decline narrative is that it is in part constitutive of the perception of a democratic deficit in Westminster model countries. For example, the quantitative literature on the Canadian democratic deficit, best exemplified by Neil Nevitte’s The Decline of Deference (1996), investigates empirical trends in measures like political efficacy and institutional trust from the perspectives of citizens. Revisiting the “decline of deference” argument in 2012, Nevitte and White find that Canadians’ confidence in parliament, the courts, and the civil service has shown no significant trend across 1981-2007, according to World Values Survey results (Nevitte & White, 2012, 55). These measures of institutional confidence have, on the other hand, dropped in the over the same time frame. As of 2006, Canadians’ evaluations of democratic performance were on par with countries such as Australia and Germany and slightly lower than those of Finland and Sweden, compared to substantially lower assessments in the United States and Britain (Nevitte & White, 2012, 61). In both Canada and the United States, those citizens who are dissatisfied with democracy engage politically at similar rates to those who are satisfied, sometimes participating even more frequently depending upon the type of action (Nevitte & White, 2012, 69). Yet, despite the emergence of significant empirical variation in Canadian data, especially on political values, Nevitte and White backtrack and uphold the standard decline argument in their conclusion: “the growing unhappiness with political institutions is clearly associated with a deeper dissatisfaction with democratic performance across all countries (Nevitte & White, 2012, 73).”

The literature on the democratic deficit itself observes that Canadians’ “mental image” of knowledge about Canadian democracy is important to their attitudes (Nevitte & White, 2012, 64), but there is surprisingly little self-reflection on how expert opinion shapes that mental image. This omission is disquieting given evidence of a value shift toward the political legitimacy of expert authority away from state-centred authority (Skogstad, 2003, 960), together with a broadening definition of “experts” to include media-savvy watchdog groups, think tanks, political bloggers, and Twitter commentators (D. E. Smith, 2007, 11). Average Canadians looking for authoritative information about their political system could not be blamed for developing an overwhelming sense of despair in looking at the titles of both academic and non-academic scholarship. For example, the polemical full title Tragedy in the Commons: Former Members of Parliament Speak Out About Canada’s Failing Democracy of Loat and MacMillan’s work does a disservice to the nuances of their empirical results and recommendations. As Thomas noted decades earlier, “if all the criticisms were added up, one might easily conclude that the Commons ought to be abolished” (Thomas, 1979, 57). At the end of their review article on the Canadian parliamentary literature, Atkinson and Thomas foresee a duality to progress in the field:

Those who press ahead with the systematic study of legislative behaviour or legislative development in the hopes of building a body of generalizations will want to identify the limit of applicability of those generalizations, as well as their relevance to the problems of parliamentary government in Canada. Those who seek to study the Canadian Parliament within its own tradition, with emphasis on the special circumstances of Canada, will be obliged to explicate this tradition and explain how it differs from those of other systems (Atkinson & Thomas, 1993, 447).

I argue that such a duality, summarized by the contrast between traditional and parliamentary decline approaches to the Canadian Parliament, is a problematic oversimplification. It is precisely the special circumstances of the Canadian Parliament, including insights gained from within its own tradition of study, that can inform and inspire Canadian scholars’ contribution to generalizable comparative theory. Likewise, critical claims in Canadian parliamentary scholarship, as well as supportive claims of the responsible government literature, should be re-examined in light of new insights from comparative theories of parliamentary government and from recent methodological developments permitting new levels of empirical depth in the study of parliaments. Flinders and Kelso argue that “[s]cholars have a public duty to correct rather than propagate the myths that surround their chosen subject matter,” namely the parliamentary decline thesis; they encourage scholars of other Westminster-style parliaments to participate in the same endeavour (Flinders & Kelso, 2011, 265). It is within this context that I justify my decision to focus on the Canadian case in this dissertation. Throughout this chapter I have referenced a resurgence in the empirical study of parliaments in both national and comparative literatures, especially over the preceding decade. As I will describe in more detail in the next chapter, European parliamentary scholars have developed formalized, comparative theories of parliamentary debate rooted in new techniques for measurement and analysis of parliamentary speeches. In this dissertation, I will build on two such frameworks to develop a model of parliamentary accountability in debate texts, with a view to testing whether minority parliaments are more accountable than majority parliaments. However, the institutional specificities of responsible government in the Canadian system are not fully accounted for in these existing models. 
Thus, I begin the next chapter by exploring the definition and characteristics of responsible government and parliamentary accountability in Canada. Then, I examine theories of parliamentary debate developed by Proksch and Slapin (2014) and Bäck and Debus (2016) and highlight how the Canadian case illuminates oversights in these existing models.

Chapter 2

Literature Review: Responsible Government and Parliamentary Debate

In this chapter, I will argue that the institutional practice of responsible government in Canada is important to an understanding of how parliamentary accountability motivates debate, and can inform models of the collective action problem posed by legislative speechmaking in parliaments. In the first section of this chapter, I outline a definition of responsible government and its constituent concepts, and examine the literature on how it varies under majority and minority government conditions. In the second part of the chapter, I examine theories of parliamentary debate that have emerged out of the comparative literature, focusing on a formal theory developed by Proksch and Slapin and the more flexible collective action model proposed by Bäck and Debus. I subsequently argue that both theoretical approaches fail to explain parliamentary speechmaking in the Canadian case due to their inadequate conceptualization of collective accountability, a key aspect of functioning responsible government and of the institutional purpose of debate in the House of Commons.

2.1 Responsible Government and Accountability

Responsible government is the foundation of parliamentary legitimacy in Canada. Most simply defined, it is the concept that a government is responsible to the people, rather than the monarch. In Britain, it originated in the shift of control of executive power from to Parliament. Ministers who formerly were directly appointed and empowered by the monarch instead attained power via a broader process of election that granted some parliamentary majority executive power (Savoie, 2008, 31). The concept is fundamental to the Westminster parliamentary model in general, but misunderstandings often characterize its use. As is common in the study of Westminster systems, this “slipperiness” is in part due to its status as convention, or a set of accepted rules and practices that institutional actors should feel obliged to respect (Marshall & Moodie, 1959, 41). This conceptual slipperiness is magnified in Canada, which obtained and adapted British-style responsible government gradually via the shift of control by British officials to elected assemblies in Upper and Lower Canada, and finally to Confederation when Canada was united, in the words of the British North America Act, “with a Constitution similar in Principle to that of the United Kingdom” (Constitution Act, 1867). This assumption of tradition meant the bulk of the BNA Act itself could be devoted to the parameters of federalism, the major formal constitutional difference between the British and Canadian systems. However, the unwritten practices of the Constitution of the United Kingdom also underwent organic adaptation to the Canadian context. As Smith points out, neither the sovereign nor the aristocratic class, the “dignified” institutions that balanced the Westminster House of Commons in Bagehot’s constitutional view, inspired the same sort of popular devotion in Canada (D. E. Smith, 2007, 8). Instead, he argues the House of Commons itself became the central institution of elite deference in Canada. In any case, in Canada, perhaps even more so than in Britain, responsible government is central to democratic legitimacy.

The basic premise of parliamentary responsible government is that ministers of government (the executive) are responsible to the electorate via their responsibility to the House of Commons (the legislature) (Marshall & Moodie, 1959, 41). As Smith summarizes, “it is because of its election that the House becomes the repository of popular sovereignty.... Legal sovereignty rests in Parliament and popular sovereignty in the people. The vital connection between the two is the (D. E. Smith, 2007, 4).” However, what is the government responsible for, in practical terms? In House of Commons Procedure and Practice, Bosc and O’Brien lay out a useful definition of how this convention is conceptualized in contemporary practice:

In a general sense, responsible government means that a government must be responsive to its citizens, that it must operate responsibly (that is, be well organized in developing and implementing policy) and that its Ministers must be accountable or responsible to Parliament. Whereas the first two meanings may be regarded as the ends of responsible government, the latter meaning—the accountability of ministers—may be regarded as the device for achieving it (Bosc & O’Brien, 2009, 32).

This definition expresses three senses in which we can understand the practice of responsible government. First, responsiveness is the representation of public preferences and the public interest in government policy. Second, effectiveness captures the quality and propriety of the government’s approach to responsiveness. Together, responsiveness and effectiveness can be thought of as the objects or topics of accountability. I will argue later in this dissertation that the measurement of attention to these objects of accountability in parliamentary debate can be used to empirically characterize trends in the practice of accountability. That practice forms the third concept in the above definition: parliamentary accountability, or the convention that ministers are responsible or accountable to Parliament, is the active process by which responsible government is achieved through public debate and questioning in the House of Commons.

Accountability in this parliamentary sense can be further divided into individual and collective ministerial responsibility. In general, individual ministers are thought to be responsible for matters of effectiveness, such as the proper and efficient implementation of a government program within their departmental mandate. Theoretically, a minister is responsible to the Crown and to the House for all activities of their subordinate departments (Bosc & O’Brien, 2009, 32). A corollary is that civil servants within those departments are responsible to their ministers, and are assured of the protection of their anonymity in advice-giving due to the fact that the responsibility rests with the minister (Savoie, 2008, 32). Collective ministerial responsibility, on the other hand, is generally a matter of responsiveness, involving the defence of a government’s policy direction or platform. Of course, there is considerable overlap between these distinctions.
For example, as will be described below, collective ministerial responsibility rests on cabinet solidarity, so the line between an individual minister and a government being accountable for some matter of effectiveness is rarely clear. And, individual ministers can certainly take the fall for unpopular government policies for which they were in some part responsible. A 2017 example is former Minister of Democratic Institutions , who shouldered the blame for the Liberal government’s failed promise of electoral reform and was shuffled to a different Cabinet position (Wherry, 2017c).

2.1.1 Individual Ministerial Responsibility

The definition and limits of individual ministerial responsibility have been well-explored in the literature since the 1970s, primarily due to its increasing importance as well as challenges in the modern era of governance. Individual departments, and therefore the ministers accountable for them, control increasingly large bureaucracies tasked with administering and allocating benefits of the modern . However, deregulation, privatization, and public service reforms consistent with “new public management” practices have further blurred the traditional line of accountability, running from front-line civil servants through their departmental managers to deputy ministers and finally political ministers ultimately responsible to Parliament. Despite this complexity, the conventional meaning of individual ministerial responsibility is well-understood in the literature. Kernaghan (1979) identifies two components of the conventional definition. The first is that a minister is accountable to Parliament for the individual conduct of all his or her departmental subordinates; in the case of serious administrative errors that minister is obliged to resign. As many scholars have pointed out, this aspect of the doctrine is a persistent myth that has not existed in practice in either Britain or Canada, based on empirical evidence (Finer, 1956; Kernaghan, 1979; Savoie, 2008; Sutherland, 1991). Realistically, no minister can be aware of or devote attention to the thousands of actions taken by their subordinates in a complex modern government department. Administrative matters are considered separate from political matters. The second, political component of ministerial responsibility is that ministers must “explain and defend” the policy decisions taken by their department in Parliament (Kernaghan, 1979, 386). This convention rests in part on the conventions of political neutrality and civil service anonymity.
Civil servants, hired on a merit rather than partisan basis, are expected to remain politically neutral in the discharge of their duties, and in turn are protected with anonymity under the aegis of their minister, allowing them to perform their duties and deliver advice without fear of political reprisal (Kernaghan, 1979, 386; Savoie, 2008, 55). According to many observers, this part of the convention has weakened since the 1980s. Public servants are increasingly required to testify before parliamentary committees as witnesses; new accountability officers reporting directly to parliament on the activities of departments have been established; and ministers and central agencies are more likely to exert political influence over departmental operations. In general, both ministers and the civil service could be accused of “cheating” the doctrine of individual ministerial responsibility in Britain and in Canada (Hood & Lodge, 2006, 49; Savoie, 2008, 59).

2.1.2 Collective Ministerial Responsibility

The second sense of parliamentary accountability derived from the conventions of responsible government, and the one this dissertation is primarily concerned with, is collective ministerial responsibility. Marshall and Moodie provide a definitive exploration of the concept in their work: “The substance of the Government’s collective responsibility could be defined as its duty to submit its policy to and defend its policy before the House of Commons, and to resign if defeated on an issue of confidence (Marshall & Moodie, 1959, 62).” The first part of this definition is captured by the idea of cabinet solidarity: government ministers must publicly stand together as a coherent administration, sharing the same position and speaking “with one voice” to Parliament (Sutherland, 1991, 95). Privately, cabinet members may disagree or negotiate with the prime minister on policy positions; publicly, they will be expected to “take responsibility for, and defend, all Cabinet decisions (Bosc & O’Brien, 2009, 33).” Marshall and Moodie refer to the metaphorical “shield” that cabinet solidarity represents (Marshall & Moodie, 1959, 68). It can be deployed to protect individual ministers from attack in the House of Commons, such as from the fallout of some policy-related matter of individual ministerial responsibility. However, it can also be invoked by the prime minister to enforce internal compliance among ministers on a divisive issue (Sutherland, 1991, 95). Any conflict which remains utterly unresolvable in private must be publicly expressed through the resignation of the minister from cabinet, representing a significant cost to the minister doing so (Bosc & O’Brien, 2009, 33). As Sutherland (ibid.) points out, cabinet solidarity is “an entitlement or right that the prime minister possesses in relation to cabinet colleagues;” however the confidence convention, the second aspect of the definition above, is an institutional requirement.
If a government is defeated on a vote of confidence, “it is expected to resign or seek the dissolution of Parliament in order for a general election to be held (Bosc & O’Brien, 2009, 43).” Indeed, cabinet solidarity flows from this expectation or requirement. Governments benefit from cabinet solidarity precisely because they must act coherently and collectively in order to avoid defeat on substantial issues in the House of Commons and thus be forced to resign.

The principle of cabinet solidarity is enforced by the Privy Council oath of secrecy. However, the existence of parliamentary parties as the supporting apparatus of cabinet means that solidarity (and secrecy) flows downwards to the caucus level. Again, backbenchers may be free to speak in private caucus meetings about their policy disagreements with cabinet decisions; publicly, however, dissent is limited by the influence of generally high levels of party cohesion in Westminster systems. Such cohesion can be maintained via formal (and costly) measures of party discipline, or informal means of socialization and incentives (Kam, 2009, 15). Again, in a Westminster parliamentary system, private secrecy and public loyalty are fundamental to the success of a government—especially a minority government, which faces an ever-present threat of defeat on a confidence measure and needs all the supporting votes it can muster (Savoie, 2008, 39). Even in majority government situations where the worst-case of defeat is unlikely to happen, party cohesion remains critical to government (and opposition) strength; unchecked dissent can lead to embarrassing defeats on more minor government bills or destabilize the control of party leadership (Kam, 2009, 9–11). As discussed in the first chapter, critics of party discipline in Canada argue that it stifles the representation of minority views, minimizes the role of backbench MPs, and ensures that major policy decisions are made behind closed doors. However, such a position overlooks the fact that it is party discipline that enables the practice of collective accountability, and thus of responsible government, in the first place. If governments were unable to act cohesively they could not form stable governments, nor could they be held meaningfully to account for directions taken as a government. 
Indeed, Sutherland notes that parties and collective responsibility are essentially synonymous: “both are constituted of shared preferences and shared views about how preferences can be acceptably realized (Sutherland, 1991, 96).” Parties provide a shorthand to the electorate to make these platforms visible and these groups of legislators accountable.

2.1.3 Role of the Opposition

In short, collective ministerial responsibility is what makes accountability work in the House of Commons; government by individual MPs without the structure of cabinet solidarity or disciplined parties would devolve into chaos. What role remains for the House of Commons on the other side of the equation? Savoie summarizes the task of the House in this position as “not to govern but to act as a public forum, to be the country’s leading deliberative body, to focus opinion, to criticize government, and to hold ministers accountable (Savoie, 2008, 32).” Although the executive is nominally responsible to the legislature as a whole, given the bonds of government party cohesion these roles fall to the opposition party, or parties, in the House of Commons. The Official Opposition is otherwise called “Her Majesty’s Loyal Opposition” for a reason: in service to the more general principle of responsible government, the opposition is tasked with the responsibility of countering the government on its policies and on its conduct (Bosc & O’Brien, 2009, 37). Again, notwithstanding views about the deleterious effects of adversarial politics on public discourse, the clear separation between government, government supporters, and opposition is a virtue of the Westminster model; it leaves no confusion to the voter as to who is responsible. As C.E.S. Franks summarizes,

Parliamentary government is a great simplifier, especially in creating a stark distinction between those who hold power and those who do not. In so doing it resolves one of the most difficult things to achieve in a political system: to ensure accountability by binding responsibility to power.... Neither can be evaded by a cabinet. This virtue is underappreciated, for it is typical of human societies that those who exercise power over others seek to deny or hide it (Franks, 1987, 265).

A cabinet may not be able to evade responsibility, but governments nevertheless seek to maximize their legislative effectiveness and minimize their exposure to opposition scrutiny. In general “Parliamentary procedure must balance the government’s power to manage the business of the House, against the opposition’s responsibility to hold the government accountable (Bosc & O’Brien, 2009, 43).” From an opposition perspective, “holding to account” has two components. First, there is the right to demand information and explanation from both individual ministers and from the government. Due to cabinet solidarity, no minister can be forced to resign for refusing to provide information if the opposition were to demand it (Sutherland, 1991, 96). Nevertheless, the House of Commons retains a right to information, or answerability, which makes possible the role of the legislature in holding government to account for effectiveness in its discharge of its duties (Marshall & Moodie, 1959, 69). As mentioned earlier, contests over the appropriate interpretation of individual ministerial responsibility are in practice typically questions of the answerability requirement (Kernaghan, 1979, 385). The political force of this convention is such that despite there being no institutional threat to a minister who refuses to answer questions, nor a requirement to explain this refusal, ministers nevertheless almost always respond due to the threat of criticism from the opposition, media, and public opinion (Kernaghan, 1979, 389). Oppositions can obtain some political leverage among the electorate by demanding resignation of a minister for serious personal or departmental misconduct, although, as described above, this aspect of the convention is ineffective (Kernaghan, 1979, 388). The second opposition role in accountability parallels the conceptual shield of collective ministerial responsibility and derives from the confidence convention. 
In a case where the government loses the confidence of the House (an outcome which, in practice, hinges on the opposition’s decision to defeat the government), the House is empowered to apply sanction and demand the resignation of the government (Marshall & Moodie, 1959, 69). Following such a defeat, the prime minister conventionally proceeds to request dissolution of Parliament from the governor general so that fresh elections can be held. This request is generally granted as a matter of course. Nevertheless, in this situation the governor general retains discretion, or reserve powers, that flow from and are critical to the operation of responsible government (Forsey, 1964, 7). It is always the responsibility of the governor general to appoint the prime minister, which in turn requires the governor general to select the leader who best commands majority support in the existing House of Commons. Following an election, this selection is straightforward: the governor general simply appoints the leader of the party that won the largest number of seats in the House of Commons. Following a vote of non-confidence, however, the governor general faces a unique decision predicated upon the “factual configuration of the House and any alliances, formal or informal, among the various parties (Slattery, 2009, 88).”

If an opposition leader can form a government that reasonably can sustain the confidence of that House, the governor general may very well grant that leader the opportunity to form a government instead of proceeding to dissolve parliament. A similar situation took place during the King-Byng affair of 1926, leading to the appointment of opposition leader as Prime Minister (though Prime Minister Mackenzie King avoided the unpleasantness of a non-confidence vote by pre-emptively requesting dissolution). Likewise, the prorogation crisis of 2008 involved the potential for an opposition coalition to form government after a confidence defeat. In 2017, the Legislative Assembly of witnessed a transition in power between a Liberal and NDP minority parliament after a vote of non-confidence took place immediately following an indecisive election. In short, under the convention of responsible government the opposition plays a non-trivial role as a “government in waiting.” This role can be literal, as in the examples above, or figurative: even under majority conditions, when a government is secure against defeat, an opposition can demonstrate through its performance in the House of Commons (both in holding the government to account and in representing views on policy goals that may or may not coincide with those of the government) a competence and capacity to govern that presents the electorate with a credible alternative whenever elections next take place. As Eddie Goldenberg, Jean Chrétien’s longtime policy adviser and chief of staff, puts it: “Keeping a government on its toes, and making it prove its case, is only half the job of an Opposition party; the other half is getting ready to govern (Goldenberg, 2006, 34).”

The dynamics of the parliamentary accountability relationship between government and opposition vary substantively according to the relative number of government members. Most simply, whether a government has a clear majority of seats in the House of Commons should matter a great deal to how responsible government operates. A minority government must forestall the persistent threat of a non-confidence motion, and logically faces an incentive to be more responsive to the House of Commons at large, namely opposition parties, in order to survive (Forsey, 1964, 10–11). Because of this pressure, it is arguably under minority government conditions that the House of Commons best fulfills its constitutional role as a representational and discursive institution, in contrast to the executive-centered, prime ministerial government that characterizes majority parliaments (P. H. Russell, 2008, 161). As mentioned in the Introduction, the Canadian scholarly view of minority governments in Canada represents a historical shift. Whereas up to the 1960s minority governments were perceived as inherently bad or aberrational to Westminster-style parliament, a greater appreciation developed of the responsiveness and accountability minority parliaments can engender as they became more common (Forsey, 1964; Geller-Schwartz, 1979; P. H. Russell, 2008). The assumption that minority governments are automatically more responsive has also been challenged in the literature, however, with historical case study evidence that minority governments vary in their responsiveness according to their strategic decisions in parliament (Falcone, 1974; Geller-Schwartz, 1979).

The threat, and corresponding incentive, of a confidence motion is especially imposing in Canada. In Britain, the defeat of a supply bill has typically been considered an automatic vote of no confidence. However, in Canada, a convention of considering nearly all but the most insignificant motions as confidence motions had arisen by the 1940s (Geller-Schwartz, 1979, 76; Heard, 2007, 396). The increasing frequency of minority parliaments in the 1960s-70s forced reconsideration of the issue, yielding wildly varying scholarly opinions over the next decades as to what properly constitutes a motion of confidence. The clearest acceptable examples are those in which a government or opposition motion is purposefully worded so that its defeat will imply a loss of confidence, and as in Britain major budget bills or the Speech from the Throne are considered no confidence vote opportunities (Heard, 2007, 397–398). Nevertheless, some constitutional scholars, including Peter Hogg, maintain a version of the traditional stronger view that defeat on any “important” vote reflects a loss of confidence (Hogg, 2014, 9-22). The relative strength of a majority or minority government, the ideological and strategic compatibility of opposition third parties, the level and character of party discipline within those parties, and the relationships between front and backbench MPs in both government and opposition all variously affect the strategic calculations parliamentary parties make (Gervais, 2012, 10).
A significant slice of the recent Canadian empirical literature on parliament has focused on elucidating these influences under majority and minority conditions. Bourgault (2011) finds that Canadian minority governments tend to assert more centralized control over deputy ministers and make policy on a shorter-term, opinion-sensitive basis than do majority governments (Bourgault, 2011, 522–523). Pickup and Hobolt (2015) study Canadian minority governments from 1958 to 2009 and find quantitative empirical evidence that minority governments balance responsiveness and effectiveness differently than majority governments. They document how minority governments face a dynamic trade-off based upon their polling numbers: governments are more policy responsive to the average voter, and less effective at passing legislation, the lower their current popularity. They conclude that at a level of 40% polling popularity, a minority can essentially govern like a majority government (Pickup & Hobolt, 2015, 528–529). Godbout and Høyland (2011a) study inter-party voting coalitions in Canadian minority governments from the 38th to 40th Parliaments, finding that parties assess the decision to align their votes with government along two dimensions: support for government policies (as opposed to a traditional left-right ideological dimension) and orientation toward (Godbout & Høyland, 2011a, 479). Finally, Conley (2011) compares the legislative production of majority and minority governments and finds weak evidence of an overarching trend. Minority governments tend to produce fewer laws than majorities, but majority governments vary significantly on this dimension; contextual factors, Conley concludes, are the critical determining factor in legislative output across both types (Conley, 2011, 433). These empirical studies have investigated the claim in the literature that minority governments are more responsive to Parliament and to changes in public opinion than majority governments.
However, little attention has been devoted to the question of whether minority governments are more or less accountable in Parliament, how the practice of accountability varies across the two situations, and how this relationship is shaped by contextual and strategic factors. Minority governments are empirically less effective in terms of their legislative production, but are those minority governments also held more strongly to account for that program in the House of Commons? This dissertation seeks to answer this empirical question.

Comparing accountability across a statistically reasonable number of majority and minority parliaments suggests a quantitative empirical approach. In his work on procedure in the Canadian House of Commons, John B. Stewart, a scholar and former parliamentarian, notes that parliamentary accountability is difficult to assess quantitatively. Potential metrics such as the number of dead bills or amendments to bills in committee are problematic. A large number of amendments to a bill, for example, could reflect a substantive opposition contribution, or a strategic decision by government to appease special interests or draw attention away from other more problematic aspects the bill contains (Stewart, 1977, 19–20). “In short,” Stewart concludes, “the public record provides no reliable evidence. What would be helpful instead is the detailed testimony of those who have participated in the preparation of the business of the House. In Canada such testimony is hard to come by (Stewart, 1977, 20).” The contemporary availability of digitized textual data for the complete House of Commons Debates suggests an alternative: transforming the textual public record into quantitative data for measurement. To develop such an empirical strategy, I begin the second half of this chapter by examining existing work in the comparative literature on legislative debate, with a focus on quantitative, rational choice models.
As I will explore, existing models cannot fully explain outcomes in the Canadian case. Instead, capturing the dynamics of parliamentary accountability under institutional conditions of strong party discipline necessitates two developments: a model of parliamentary debate compatible with responsible government, and a definition of parliamentary accountability itself for measurement purposes.

2.2 Comparative Theories of Parliamentary Debate

My suggestion that the practice of parliamentary accountability can be quantitatively measured in parliamentary speeches draws on an existing literature on the empirical study of legislative debate. In a recent review of this comparative literature, Bächtiger (2014) outlines three approaches to the study of legislative debate: deliberative, discursive, and strategic (Bächtiger, 2014, 2). The deliberative approach is founded upon normative theories of deliberative democracy, most prominently those of Habermas (1996; 2005), and is centrally concerned with the concept of quality deliberation. From this perspective, open and respectful debate among reasonable participants should encourage the careful consideration of political alternatives in a decision situation. Ideally, such quality deliberation should result in the formation of consensus surrounding a mutually acceptable, and legitimate, solution. Fundamentally, this literature grapples with the question of whether are actually deliberative venues. Is it possible to persuade others, especially political opponents, of a policy position during a legislative debate (Fishkin & Luskin, 2005)? If not, is persuasion strictly necessary for quality deliberation, or is mutual communication sufficient to produce quality outcomes (Esterling, 2011; Mucciaroni & Quirk, 2006)? Finally, how can the quality of legislative debate as a deliberative process, and any potential beneficial impact on outcomes, be measured from an empirical perspective (Bächtiger, 2014; Steenbergen, Bächtiger, Spörndli, & Steiner, 2003)? Much of the research directed at these questions examines the United States Congress, an institution characterized by enough individual legislator autonomy to suggest that persuasion might be a politically useful strategy. Mucciaroni and Quirk (2006) develop one such deliberative model of legislative debate in the US Congress.
They argue that debate strategies revolve around a trade-off between creating an immediate, powerful impression (which may or may not be accurate) and establishing a lasting, durable narrative that stands up to criticism, or, in their words, “force versus responsibility” (Mucciaroni & Quirk, 2006, 23). Legislators are motivated by individual goals, one of which is to persuade other legislators to support their policy position. However, this objective competes with others such as convincing or seeking support from an external, rather than internal, audience; seeking credit for taking a position, as opposed to convincing others; or developing and enhancing a reputation for competence or expertise (Mucciaroni & Quirk, 2006, 25–27). Empirically, Mucciaroni and Quirk’s model assumes that a legislator’s main goal in legislative debate is to convince their external constituency (voters) of their argument in a given policy debate (Mucciaroni & Quirk, 2006, 34). To investigate the impact of the strategic trade-off between forceful and responsible arguments, they perform a content analysis that categorizes arguments and content in Congressional speeches from 1995 to 2000 within three policy case studies. Following the general argument of the deliberative approach, they propose a “very good” debate is highly informative and makes few misleading claims, while a “very bad” debate is highly misleading and provides little information, then classify speeches on this continuum (Mucciaroni & Quirk, 2006, 47, 51). The final results are fairly disappointing from a deliberative perspective, with 39% of speeches classifiable as poor or very poor. Esterling (2011) updates their work with a more sophisticated structural equation model of Congressional hearings on Medicare. His textual measure is the falsifiable argument, namely one that asserts validity claims rather than relying on anecdotal or untestable claims (Esterling, 2011, 140).
He finds support for the hypothesis that the use of falsifiable arguments increases under conditions of moderate partisan disagreement compared to extreme disagreement (Esterling, 2011, 183, 187).

This work draws attention to some important features of legislative debate, such as the distinction between internal and external audiences and the idea of trade-offs in debate strategies of persuasion—especially that arguments intended to be persuasive need not involve factual and verifiable claims on a policy issue. These models, as their authors are explicitly aware, are strongly institutionally specified. Conditions of partisan polarization matter in the US congressional context to deliberative quality, for example, even as the institutional structure incentivizes the individual rather than collective goals of legislators. Likewise, individual actors in the US congressional system possess individual resources such as access to interest groups and individual freedoms such as a lack of strong party discipline that change their strategic calculations in both voting and debate behaviours. Comparative work to explicate these institutional and cultural factors, including the development of a general Discourse Quality Index (DQI) (Lord & Tamvaki, 2013; Steenbergen et al., 2003), has drawn attention to institutional factors such as voting rules, bodies such as second chambers and committees, structures, and opposition party status as influential upon deliberative quality indicators (Bächtiger & Hangartner, 2010; Caluwaerts, 2012).

Second, the discourse perspective on legislative debate investigates legislatures as social institutions characterized by norms, rules, and procedures shared among members.
This scholarship is primarily qualitative and employs interpretive or close-reading methodologies to explore the “recurring linguistic patterns and rhetorical strategies used by MPs that help to reveal their ideological commitments, hidden agendas, and argumentation tactics” (Ilie, 2015, 2). Case studies of particular individuals, institutional practices, or contentious policy debates are frequently employed. Like deliberative scholars, discursive scholars emphasize how legislators combine strategic positioning with theatrical or audience-oriented showmanship; unlike the deliberative approach, however, the adversarial or antagonistic features of debate are considered both important and inherent to the institutional performance (Ilie, 2015, 7).

Indeed, discourse scholars are particularly interested in analyzing “the norm deviations, rule violations, and verbal disruptions that can most clearly reveal various peculiarities” of debating institutions (Ilie, 2015, 8). For example, Ilie’s comparative work on politeness, insults, unparliamentary language, and interruptions in parliaments in Sweden and the United Kingdom is perhaps the most thorough example of scholarship from this perspective (Ilie, 2001, 2004, 2010, 2015). Her research has encouraged further cross-cultural comparative work on “face-threatening” debate practices in Austria and Italy (Bevitori, 2004; Zima, Brône, & Feyaerts, 2010). While the majority of this scholarship is inherently qualitative in orientation, some scholars have employed computer assisted textual methods to study thematic patterns; an example is Schonhardt-Bailey’s (2008) case study of debates on partial-birth abortion. Labbé and Monière’s linguistically-oriented work on speeches by Canadian prime ministers and Quebec political leaders represents a Canadian example of computer aided discursive analysis (Labbé & Monière, 2003, 2010; Monière, Labbé, & Labbé, 2008).

Finally, the rational choice perspective on legislative debate, alternatively the strategic or partisan rhetoric approach, originated as an adaptation of existing literatures on voting behaviour and agenda setting to the sphere of debate. This approach shares some common theoretical ground with scholars such as Mucciaroni and Quirk in the deliberative realm, namely a conception of strategic motivations for debate. To a vote-seeking rational politician, legislative speeches represent opportunities to advertise one’s candidacy, communicate policy positions, and take credit for popular decisions (Mayhew, 1974). Scholars within the rational choice framework seek to model how these individual motivations combine with institutional and partisan or collective factors to shape both a legislator’s decision to speak and their choice of what to speak about (Bäck & Debus, 2016; Proksch & Slapin, 2014). The dominant measurement approach to this problem is to adapt the methodology of “scaling” roll call voting data on a left-right axis in order to estimate the policy positions of legislators, substituting textual data from speeches for vote counts (Lowe, Benoit, Mikhaylov, & Laver, 2011; Slapin & Proksch, 2008). More generally, some form of strategic approach is shared by much of the existing Canadian empirical literature on parliamentary debate. Representative examples include Blidook’s (2010) investigation of Private Members’ Business, Blidook and Kerby’s (2011) study of constituency-related questions in the House of Commons, Soroka et al.’s (2009) examination of dyadic representation during Question Period debates, and Penner et al.’s (2006) study of the representation of policy issue priorities in Question Period.

The strategic perspective on legislative debate will be the approach I start from in this dissertation, as its theoretical concepts are the most applicable to a longitudinal, quantitative analysis across many majority and minority parliaments. The discursive approach is best suited to qualitative case studies, and would simply be unwieldy in this study. I also rule out the deliberative approach for three reasons. First, although the deliberative literature investigates relevant independent variables such as conditions of partisan disagreement, its inherent dependent variable—namely the “aspirational” goal of high quality deliberative debate—remains problematic. Despite attempts to operationalize and generalize measures of discursive quality in the literature described above, many of the process-oriented characteristics of quality deliberation inherent to deliberative theory (for example, mutual respect among participants, or the educative function of debate) remain contested and difficult to measure. Given the strong foundation of the approach in normative theory, this literature has had difficulty assessing whether negative empirical findings are merely cases of measurement problems or reflect deeper issues (Bächtiger, 2014, 19). More generally, the deliberative empirical literature on parliaments has been criticized for conceptually stretching the idea of deliberation, and failing to clearly distinguish between deliberative and strategic motivations for legislative debate (Steiner, 2008).

Second, the measurement of debate quality remains methodologically reliant on manual coding, often at the individual sentence level, even when well-defined criteria such as falsifiable arguments are employed. While natural language processing methods for argument parsing are emerging in the computer science literature, they are not as yet developed enough to reliably automate this type of measurement for large datasets (see Chapter 4).
Finally, the deliberative approach is simply not very helpful theoretically for understanding Westminster parliamentary systems. Given institutional characteristics such as a strong adversarial dynamic and strong party discipline, deliberative models simply predict a low quality of debate will occur. As this outcome is not very interesting from a normative perspective, it remains undertheorized in this literature. Indeed, empirical work from the deliberative perspective has found little difference in debate quality between high discipline parliamentary systems and low discipline presidential systems (Bächtiger, 2014, 14). In short, debate quality is not an appropriate concept for the research questions I seek to answer. In the following sections of this chapter, I examine more closely two theoretical models from the rational choice literature that are good candidates for understanding and measuring accountability in parliamentary debate.

2.2.1 Proksch and Slapin (2014)

The most developed rational choice model of parliamentary debate in the comparative literature is the formal model proposed by Proksch and Slapin (2014) in their book The Politics of Parliamentary Debate. While they acknowledge debate can serve purposes of both persuasion and position-taking, they assume the motivation for individual representatives to speak is the latter: parliamentary speeches “allow MPs to stake out a position and communicate it to their parties and to voters” (Proksch & Slapin, 2014, 17). However, party leaders also face varied institutional incentives to control members’ speeches in the interests of presenting a united front to the electorate. Leaders also control access to floor time for speeches, and thus agenda-setting power over their party’s debate agenda (Proksch & Slapin, 2014, 27). Thus, speechmaking is a collective action problem that can be modelled as a strategic game between leaders and rank-and-file members. Parties benefit as an electoral team when members make speeches that toe the cohesive party line; on the other hand, individual members often stand to gain by defecting from the party line in their speeches in order to cultivate a personal vote.

Proksch and Slapin’s main argument is as follows. Where electoral rules incentivize the development of strong party brands, party leaders face a stronger incentive to control the speeches of individual MPs. Where electoral rules provide less of an incentive to create a strong party brand, and encourage the pursuit of an individual vote by MPs, party leaders will face a weaker incentive to control speeches (Proksch & Slapin, 2014, 28). Their formal model of this process takes place as a delegation game. Given the opportunity to deliver a speech about a certain issue, a party leader can either choose to speak themselves or delegate the speech to one of their MPs, perhaps because that person has particularly relevant expertise or because the party wants to give them some exposure.
Party leaders and members both possess their own ideal positions on the issue, and a delegated speech delivered by an MP is likely to deviate from the party leader’s preferred position to reflect their own to some degree. The designated MP faces a trade-off between representing their own policy position (or policy-seeking), or toeing the party line (office-seeking) during their speech, and will settle at some combination of the two that maximizes their own electoral utility. The party leader likewise maximizes his or her utility by either making the speech themselves (which entails costs of time and research, and forgoes the potential benefits of public exposure for the MP), or by delegating the speech (and thus risking some amount of defection, or policy loss, in its content) (Proksch & Slapin, 2014, 36–38). Proksch and Slapin test their empirical model using the Wordfish scaling methodology (see Chapter 4) to measure the policy positions of MPs and party leaders in parliamentary speech texts scaled to a single left-right ideological dimension, focusing primarily on a comparison between Germany and the United Kingdom.

Proksch and Slapin’s model is an excellent adaptation of a formal modelling approach to party cohesion and roll call voting to the sphere of parliamentary debate. However, it cannot adequately account for the Canadian case. In their application of the model to Britain, Proksch and Slapin confirm that, in line with their theoretical expectations, a single member plurality electoral system encourages members to cultivate a personal vote and party leaders to exert weaker control over MP speeches. This is evidently not the case in Canada. Notwithstanding generally high levels of party discipline, free divisions sometimes occur in the Canadian House of Commons and MPs do vote against their parties from time to time (Kam, 2009, 8).
However, partisan dissent in legislative speech is essentially unheard of in Canada, at least since the mid-20th century (Docherty, 1997, 170). In the contemporary House of Commons, government members deliver talking points and even verbatim speeches dictated by the Prime Minister’s Office, a practice entrenched by Conservative governments from 2006 to 2015 (Harris, 2014). In short, the Canadian example undermines both their simplification that the electoral system is the significant source of institutional variation across cases and a more fundamental assumption of their delegation model: that speech dissent reflects a lesser level of latent intra-party disagreement than does voting dissent (Proksch & Slapin, 2014, 26). Their model could be extended to account for party discipline in Canada; indeed, Proksch and Slapin outline an analogous possibility as a hypothetical case (Proksch & Slapin, 2014, 38). That is, if we assume that MPs will always toe the party line in any speech regardless of their personal policy position (in other words, we set the individual MP’s relative weight on office-seeking to 1), then party leaders will always delegate speeches. They can completely avoid the cost of speaking, maximize exposure of individual MPs, and incur no policy loss all at the same time. However, it is clearly not the case that the Prime Minister and the Leader of the Official Opposition never or only rarely speak in the Canadian House of Commons. Their empirical rate of participation is clearly affected by institutional considerations, the most obvious of which is the conventional requirement that the Prime Minister respond to questions during Question Period; in other words, the practice of parliamentary accountability. Likewise, this understanding of party discipline is historically static.
Research on British and Canadian political development has demonstrated how disciplined parties emerged and adapted over time to changing institutional circumstances while electoral systems changed relatively little (Eggers & Spirling, 2014; Godbout & Høyland, 2011b, 2013). It would be too complex to include all the specific factors that have shaped this historical variation, such as differences in political culture, parliamentary procedure, or regional cleavages. Foremost among these comparative factors, however, would be the balance of power in the legislative assembly—namely how many parties exist and their relative levels of (majority or minority) institutional power.

Over and above its conceptualization of party discipline, I argue Proksch and Slapin’s model also falters in its primary assumption that speeches are solely an exercise in policy positioning. Such a simplification makes sense within the context of a spatial theory of voting: a vote can signal only support or opposition for a proposal, and the aggregation of votes thus reflects an average pattern of orientations towards political questions. However, legislative speech is a far more complicated method of political communication than a yes or no vote. In theory, speeches could reflect policy positions, bargaining tactics, reflexive opposition, institutional practices, personal values, or informational communication, among many possibilities. Furthermore, while votes serve a limited and highly institutionalized function within a legislature, speeches can be and very often are directed at diverse audiences outside the chamber itself. The simplifying assumption that policy positioning is the significant observable dimension in political speechmaking is additionally problematic when applied to the Westminster parliamentary context.
From this assumption flows Proksch and Slapin’s proposition that the only institutional source of variation is difference in electoral system, since the role of an electoral system is to translate voter preferences into representation of those preferences. Again, this makes sense from the context of theoretical origins in the spatial theory of voting. Voters vote for the party or representative closest to their spatial preferences, and then their representative votes on their behalf for or against policy positions. Within such a system, the main source of variation is how voters’ preferences are transformed into legislative votes. However, in the Westminster parliamentary model, representatives are elected not only to represent policy preferences but to make a government (or exert executive power) and to make that government accountable for its decisions. This observation is well-recognized in the discursive literature on parliamentary debate, which emphasizes the tension between the theatrical features of parliamentary discourse—through which MPs seek to reveal their beliefs and shape others’ beliefs—and the competitive or adversarial features—which proclaim winners and losers, or “destabilize and reestablish the power balance” (Ilie, 2015, 7). In the Canadian literature, empirical efforts to classify Canadian parties on a left-right ideological dimension based upon their parliamentary speeches have met with little success, primarily because the institutional dynamic of government and opposition washes out ideological differences (Hirst, Riabinin, & Graham, 2010). More fundamentally, there is evidence that the spatial model of voting upon which speech scaling methodologies such as Wordfish are based is also inappropriately specified for parliamentary contexts.
It is well-recognized that pure parametric scaling of roll call votes is unrealistic in parliamentary systems: party discipline and whipped votes mean that votes are unevenly distributed and non-independent. As Spirling and McLean (2006b) demonstrate, methodological attempts to compensate for these issues, including Poole’s Optimal Classification (OC) procedure, still fail to account for the strategic voting engendered by the dominance of government-opposition dynamics in Westminster parliaments. Empirical studies employing OC methodology in parliamentary contexts have confirmed this critique. In the Canadian case, Godbout and Høyland (2011b) fit a two-dimensional OC model. Unsurprisingly, the dominant dimension reflects government and opposition dynamics; however, they find that regional positioning, typically orientation towards Quebec, is a more accurate label for the “ideological” dimension recovered in an OC analysis of Canadian roll call voting data. In other work, they argue that such a dimension also underlies the formation of voting coalitions in recent Canadian minority parliaments (Godbout & Høyland, 2011a). More generally, Hix and Noury (2016) perform a comparative analysis of roll call voting data across 16 legislatures, applying the newer IDEAL scaling method. They conclude that government-opposition dynamics are “the main drivers of voting behavior in most institutional contexts,” with only coalition governments and parliamentary minority governments showing evidence of policy position voting behaviour (Hix & Noury, 2016, 250).

In sum, Proksch and Slapin’s simplifying assumption that legislative speeches reflect policy positions, and that similarity or difference between two speeches is a measure of ideological agreement or disagreement, is problematic at multiple theoretical levels.
In spatial models of roll call voting, institutional variation affects the “ideological” dimensions recovered: for example, the impact of collective responsibility for executive decision-making and the existence of a confidence convention (Carey, 2007). We should expect these problems to worsen when voting models are extended to speech data. The literature strongly suggests the most significant dimension recoverable in a textual analysis of speeches in a Westminster-style parliament should be government-opposition dynamics begotten by responsible government, not ideological or policy differences. Indeed, Proksch and Slapin’s main comparative case provides a good illustration of this distinction. In contrast to Britain, German federal ministers are accountable only to the Chancellor and there is little practice of collective accountability (Germany: Basic Law for the Federal Republic of Germany, 1949, §65); this difference should significantly impact parties’ incentives to control the debate agenda above and beyond electoral system differences. Unsurprisingly, empirical evidence in the literature for a strong link between ideological positioning in speeches and ideological positioning in roll call voting focuses on the German Bundestag (Baumann, Debus, & Müller, 2015).
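The corner case of the delegation game discussed earlier in this section, in which an MP's weight on office-seeking is set to 1, can be illustrated with a toy sketch. The functional forms below (a linear mix of positions, an absolute-distance policy loss, and an additive exposure benefit) are assumptions introduced purely for illustration; they are not Proksch and Slapin's formal specification, and all function and parameter names are hypothetical.

```python
def speech_position(leader_pos: float, mp_pos: float, office_weight: float) -> float:
    """Position voiced in a delegated speech.

    Assumed form: a weighted mix of the party line and the MP's own ideal
    point, where office_weight = 1 means the MP toes the party line fully.
    """
    return office_weight * leader_pos + (1.0 - office_weight) * mp_pos


def leader_delegates(leader_pos: float, mp_pos: float, office_weight: float,
                     speaking_cost: float, exposure_benefit: float) -> bool:
    """Return True if delegating beats speaking under the toy utilities."""
    policy_loss = abs(speech_position(leader_pos, mp_pos, office_weight) - leader_pos)
    utility_delegate = exposure_benefit - policy_loss  # exposure gain minus drift
    utility_speak = -speaking_cost                     # leader bears the cost
    return utility_delegate > utility_speak


# Perfect discipline: no policy loss, so any positive speaking cost tips
# the leader toward delegation.
print(leader_delegates(0.0, 1.0, office_weight=1.0,
                       speaking_cost=0.1, exposure_benefit=0.0))
# Weak discipline: the MP drifts far from the party line, so the leader speaks.
print(leader_delegates(0.0, 1.0, office_weight=0.2,
                       speaking_cost=0.1, exposure_benefit=0.0))
```

Under these assumptions a fully disciplined party would never see its leader speak, which is the prediction that sits uneasily with observed practice in the House of Commons.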

2.2.2 Bäck and Debus (2016)

Recent scholarship by Bäck and Debus (2016) has highlighted the limitations of the formal speech model proposed by Proksch and Slapin, arguing it overlooks important variables at the individual, partisan, and institutional levels. To begin, both pairs of scholars agree on the starting point of modelling legislative speeches as rational, strategic decisions wherein individual MPs seek to balance competing goals of policy, office, and vote-seeking (Müller & Strøm, 1999). However, Bäck and Debus argue that individual characteristics such as gender, personality, or adherence to parliamentary norms also significantly shape an MP’s decision to speak (Bäck & Debus, 2016, 26). Second, both pairs agree that speechmaking represents a collective action problem for parties, which likewise face a trade-off between policy, office, and vote-seeking goals in deciding who gets to speak. On this point, however, Bäck and Debus note that parties face an additional fourth goal not present at the individual level: internal cohesion. In the service of party cohesion, for example, party leaders may effectively assert some level of control over speech content despite delegating the task of speaking to an MP (Bäck & Debus, 2016, 20). Parties may also differ in their relative weights placed upon office-seeking or policy-seeking goals, or focus disproportionately on policy issues that are particularly salient to their base or founding principles (Bäck & Debus, 2016, 29). Finally, at the institutional level, Bäck and Debus note that MPs who hold a “mega-seat” position such as a ministry, or MPs who are part of a governing party, are likely to face differing incentives in speechmaking. That is, government ministers are more likely to toe the party line, while government backbenchers are less likely, in comparison with MPs who do not belong to the governing party (Bäck & Debus, 2016, 40). 
They also expect the influence of electoral system on speeches to vary according to finer-grained features of those systems, such as whether an MP in a proportional representation system was selected via party list, or how close the electoral race was for the election of a given MP (Bäck & Debus, 2016, 44).

To bring together these additional variables in a rational choice approach, Bäck and Debus develop a collective action model based upon the calculus of voting approach that originated with Downs (1957) and Riker and Ordeshook (1968). In such a model, the utility of participation (U) in an election can be represented as a function of the probability P that one’s vote will be the decisive vote in the election, B the benefit the voter receives from realizing their desired outcome, and C, the costs of voting. The paradox of participation, namely that P is extremely small making voting irrational given its cost, was addressed by Riker and Ordeshook by the addition of D, a term representing the psychological “selective benefits” obtained by an individual who votes:

$$U = P \cdot B - C + D \tag{2.1}$$

Upon this foundation, Bäck and Debus construct a “calculus of speechmaking” to model an MP’s decision to speak or not to speak (Bäck & Debus, 2016, 28):

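$$U_{\text{speak}} = P \cdot B_{\text{policy}} + P \cdot B_{\text{votes}} + P \cdot B_{\text{office}} - C + S_{\text{norms}} + S_{\text{express}} + S_{\text{platform}} + S_{\text{career}} \tag{2.2}$$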

In Equation 2.2 above, the potential benefit B accruing to an MP for delivering a decisive speech has been divided into three components, based on goals of policy-seeking, vote-seeking, and office-seeking. These terms represent the collective benefits accruing to the party as a whole. For simplicity, Bäck and Debus leave out corresponding subscripts for P and C terms, although as they point out these may also vary across types of benefits. Likewise, the individual or selective incentives that the MP in question may receive for delivering a speech (the D term of Equation 2.1) are divided according to different categories of individual benefit, such as psychological satisfaction from the opportunity to speak ($S_{\text{express}}$) or from supporting their parliamentary team ($S_{\text{norms}}$).

Bäck and Debus ask two main research questions: who makes parliamentary speeches, and what do they speak about? The first question—in other words, the decision to speak or not to speak—is the subject of the model described above. Empirically, they employ a measure of floor participation or the frequency of speeches to test hypotheses generated by this model. To investigate the second question, they propose a simple “choice calculus” in which an individual MP balances the relative benefit of deviating or not deviating from the party line to maximize their own policy, vote, and office goals (Bäck & Debus, 2016, 37). They do not specify a mathematical model, but outline some general hypotheses about the direction of effects on party line deviation of some relevant independent variables at the individual MP level, such as seat type, based on results from the roll call voting literature (Bäck & Debus, 2016, 39). Their dependent variable here is measured as deviation from the party line in speeches given by MPs using the Wordscores methodology of party position scaling (see Chapter 4) (Bäck & Debus, 2016, 45).
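As a worked illustration of Equations 2.1 and 2.2, the short sketch below evaluates both utilities for made-up inputs. All numerical values are hypothetical and chosen only to show how the terms combine; following the simplification noted above, a single P and C are shared across the three benefit types.

```python
from dataclasses import dataclass


def voting_utility(p: float, b: float, c: float, d: float = 0.0) -> float:
    """Equation 2.1: U = P * B - C + D."""
    return p * b - c + d


@dataclass
class SpeechTerms:
    """Terms of Equation 2.2 (illustrative values only, not estimates)."""
    p: float                 # probability that the speech is decisive
    b_policy: float          # collective policy benefit
    b_votes: float           # collective vote benefit
    b_office: float          # collective office benefit
    c: float                 # cost of speaking
    s_norms: float = 0.0     # selective benefit: supporting the team
    s_express: float = 0.0   # selective benefit: opportunity to speak
    s_platform: float = 0.0
    s_career: float = 0.0


def speech_utility(t: SpeechTerms) -> float:
    """Equation 2.2, with one shared P applied to all three benefit terms."""
    collective = t.p * (t.b_policy + t.b_votes + t.b_office)
    selective = t.s_norms + t.s_express + t.s_platform + t.s_career
    return collective - t.c + selective


# Paradox of participation: with a tiny P, utility is negative unless a
# selective benefit D is added.
print(voting_utility(p=1e-6, b=1000.0, c=1.0))         # negative
print(voting_utility(p=1e-6, b=1000.0, c=1.0, d=2.0))  # positive

terms = SpeechTerms(p=0.01, b_policy=10.0, b_votes=20.0, b_office=30.0,
                    c=1.0, s_norms=0.2, s_express=0.3,
                    s_platform=0.1, s_career=0.5)
print(speech_utility(terms))
```

With these inputs the collective terms contribute little because P is small, so the sign of the speaking utility is driven mainly by the selective S terms, mirroring the role D plays in the voting equation.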
Despite their theoretical innovations over Proksch and Slapin, Bäck and Debus still fall short of specifying a model that can account for speechmaking behaviour in the Canadian parliament. First is the persistent issue of whether a utility maximization model of voting such as the calculus of voting is adequately generalizable to the decision to speak or not speak. Similar to the adaptation of the spatial voting model to legislative speeches, the voting calculus model imports some assumptions that are difficult to sustain even in their original context, becoming even more problematic in the case of speech. The “duty” term, which balances the utility of voting equation in the case of an individual citizen’s decision to vote, has no real analogue in the highly institutionalized context of parliamentary speech. Furthermore, in the Canadian Parliament, an individual MP has little individual control over the decision to speak. Overwhelmingly, an MP is constrained by institutional rules (for example, the Standing Orders governing Question Period and Adjournment Debates) as well as party discipline (including the convention of drawing up party speaker lists for Question Period). Since Bäck and Debus simply adopt Proksch and Slapin’s assumptions regarding institutional variation and floor participation rules (Bäck & Debus, 2016, 56–57), their decision model does not overcome the same issues that make Proksch and Slapin’s model a poor fit in the Canadian case. However, they do point out that further research is needed to address these simplifying assumptions regarding variation in speechmaking norms across systems (Bäck & Debus, 2016, 59).

Second, despite modelling the decision to speak as a combination of partisan and individual influences, Bäck and Debus subsequently understand the content of a speech as a choice between toeing the party line or defection represented on a single ideological dimension. In their review of the literature on voting and party unity that informs their collective action model, they point out the multidimensional basis of party unity, which has components of cohesion, discipline, and agenda control (Bäck & Debus, 2016, 8). A cohesive party shares a political outlook and major goals, reflecting shared preferences; a disciplined party offers both incentives to loyal MPs, such as career progression, and punishments for disobedience; and a party with strong agenda control actively avoids divisive votes or circumstances that could have negative implications for party unity (Carey, 2007). Based on Bäck and Debus’ own reading of this literature, a variety of collective as well as individual incentives ought to affect speech content in their model, as is the case regarding the decision to speak. However, such factors are not included in their unidimensional model of speech content. Furthermore, they adopt the Wordscores methodology, using party manifestos as reference texts for scaling MP and party speeches to positions, to measure along this dimension (see Chapter 4). Again, their fundamental measurement assumption remains that speeches reflect policy positioning.
In sum, while Bäck and Debus present an innovative collective action model of the decision to deliver a parliamentary speech, their understanding of the content of those speeches does not extend this basis. Their work raises useful ideas about the multidimensionality of legislative speech, namely that parties employ speeches to advance collective goals of agenda control, accountability, or internal discipline as well as shared policy preferences. In their collective action decision model, the concept of “office-seeking” as opposed to “policy-seeking” motivations for speech is an abstraction that captures a similar distinction. However, Bäck and Debus do not take the additional step of translating these ideas into their model of parliamentary speech content. In the next chapter, I will build on their work to propose such a model specifying a trade-off between the goals of policy positioning and parliamentary accountability in speech content that is compatible with the institutional dynamics of responsible government in Westminster systems.

Chapter 3

Research Design

In the previous chapter, I outlined how the central conventions of responsible government structure parliamentary debate in Canada. Collective accountability, I have argued, is a dialogic process. Responsible government is legitimized via the attack and defence that occurs in the House of Commons between oppositions and governments. Party discipline enables the government to exercise executive power and defend its program; it also allows oppositions to hold governments to account and potentially defeat a government if confidence is lost. These institutional dynamics have frustrated empirical attempts to study ideological positioning in Westminster parliamentary systems, for example, through the construction of spatial models using roll call voting data. Strategic voting behaviour and party discipline obscure the relationship between an individual MP’s personal ideological position, their party’s position, and the vote they finally cast on the House of Commons floor. As discussed in Chapter 2, the extension of scaling models to the empirical study of parliamentary debate as textual data faces similar difficulties. The assumption that measuring variation in parliamentary text as data necessarily captures variation in ideological position is unreliable in Westminster parliaments. The Canadian case is particularly challenging for such models due to remarkably strong norms of party discipline.

In this chapter, I propose an alternative model of parliamentary speech as a trade-off between collective goals of ideology and accountability. To operationalize parliamentary accountability, I adapt an empirical definition of accountability proposed by Bovens. Extrapolating from this definition, I propose that measuring lexical similarity across government and opposition speeches within a common debate context is a reasonable approach to the quantitative textual measurement of accountability. Working with this model, I propose a qualitative validation strategy for the accountability measure and a quantitative verification against polling data. Finally, I make some testable predictions appropriate to answer my research question: are minority parliaments more accountable than majority parliaments?

3.1 Textual Debate Models: Peterson and Spirling (2018)

The recognition that parliamentary debate is dialogic suggests that modelling debates as text in relative terms is a realistic approach. Instead of scaling individual speeches as data points to a single dimension parameterized by partisan extremes, as do Proksch and Slapin, or with reference to external measures of partisan ideological positions, as do Bäck and Debus, such an approach would compare speeches within some shared context and investigate variables that shape relative textual similarity or distance across contexts. Peterson and Spirling (2018) propose one such method in their study of partisan polarization in British parliamentary debates from 1935 to 2013. Within each parliamentary session in their dataset, a supervised classification model is trained on a subset of speeches to recognize party labels (either Labour or Conservative). The more accurate the classifier is when applied to a test set of debates from that same session (that is, the easier it is for the trained algorithm to correctly predict the party to which an MP belongs based solely upon the content of their speeches), they argue, the more relatively ideologically polarized the two parties (Peterson & Spirling, 2018, 121). The theoretical model of parliamentary speech underlying their approach specifies three mutually exclusive types of words: “left”, “right”, and “noise” words. Left and right words are entirely partisan, while noise words are non-partisan. The quantity of interest is the difference between the frequencies of left and right terms parties use, which is what the classification model employs to assign speeches to partisan categories within a given session. In practical terms, the less likely a Labour MP is to use “right” words, and the less likely a Conservative MP is to use “left” words in a particular session, the more polarized the parties, and the easier it is for the algorithm to identify clear examples of left and right words upon which to base this distinction. Because this classification is performed at the within-session level, the measure is not frustrated by historical shifts in the political meaning of words over time. Validating these relative accuracy measurement results across parliamentary sessions against qualitative reports and polarization measurements based on party manifesto data, Peterson and Spirling find general support for the viability of their approach.
However, this model still relies on an assumption that left or right ideological position is the significant source of textual variation across parties. Related is the premise that observing a lower classification accuracy implies lower ideological polarization and not some systematic difference within the noise words in their textual model. As Hirst et al. (2010) find in Canadian House of Commons debates, government or opposition status (from their disciplinary perspective, argumentation dynamics of “attack and defence”) is a significant and troublesome confound for recovering ideology from textual data via classification algorithms. Peterson and Spirling are not inattentive to this concern in their article. First, they emphasize they are not interested in attaining a high ideological classification accuracy within each session per se (as is more commonly the task in the computer science literature on textual classification), but in comparing relative levels of prediction accuracy (Peterson & Spirling, 2018, 123). Second, they explain that relative classification accuracy corresponds to relative change in ideological polarization only if we assume that noise word relative frequencies are constant. There are two parts to this assumption: first, that noise words do not disproportionately outweigh left and right words in the data, causing problematically high variance; and second, that noise words are used at similar rates by both parties across time. Peterson and Spirling address the first concern by conducting a simulation of ideological classification at different levels of noise. They confirm that increasing noise does falsely increase ideological similarity between parties, but this effect only has an appreciable impact at high levels of noise (> 60%) (Peterson & Spirling, 2018, 124). Second, it is possible that the relative frequency of noise words in the data over time could vary non-randomly, for example due to changes to parliamentary procedure. 
They investigate this issue by testing the correlation between the variance in predictions of individual members’ ideological positions and party positions. They find that periods of partisan ideological consensus are not associated with higher prediction variance in members’ positions, arguably indicating the classification measure is actually recovering ideological polarization as opposed to a noise phenomenon (Peterson & Spirling, 2018, 126). However, Peterson and Spirling’s strategy for testing the assumption of randomness in noise words is problematic in the Canadian context. In practical terms, the extreme level of party discipline within Canadian parliamentary votes means there are inadequate data for making statistically valid comparisons across speeches and roll call voting data. The efficacy of their measure is therefore in part specific to parliamentary contexts characterized by enough observable individual-level ideological variation to yield individual-level scaled positions. The translation of the method across contexts also requires maintenance of the assumption that the average level of noise is below the problematic level of 60% obtained through simulation, which, again, relies on the ability to compute a meaningful individual-level ideological variance baseline. In the Canadian context, and more generally as party discipline increases, we would expect this verification method to perform more poorly as the relative amount of noise increases. Rather than attempt the more difficult task of measuring ideology in parliamentary debate texts, my general approach will be to model and measure institutional adversarialism as an interesting puzzle in its own right.
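As a rough illustration of the within-session logic discussed in this section, the sketch below trains a classifier on party-labelled speeches from a single session and reports held-out accuracy as a relative indicator of polarization. Everything in it is an assumption made for illustration: the hand-rolled naive Bayes classifier, the function names, and the invented toy speeches are hypothetical and do not reproduce Peterson and Spirling's implementation or data.

```python
import math
from collections import Counter


def train(labelled_speeches):
    """Count words per party. labelled_speeches is a list of (party, text)."""
    counts = {}
    for party, text in labelled_speeches:
        counts.setdefault(party, Counter()).update(text.lower().split())
    vocabulary = set()
    for party_counts in counts.values():
        vocabulary.update(party_counts)
    return counts, vocabulary


def predict(counts, vocabulary, text):
    """Return the party with the highest add-one-smoothed log-likelihood."""
    best_party, best_score = None, -math.inf
    for party in sorted(counts):  # fixed order, so ties resolve deterministically
        total = sum(counts[party].values())
        score = sum(
            math.log((counts[party][word] + 1) / (total + len(vocabulary)))
            for word in text.lower().split()
        )
        if score > best_score:
            best_party, best_score = party, score
    return best_party


def session_accuracy(train_set, test_set):
    """Share of held-out speeches whose party label is predicted correctly."""
    counts, vocabulary = train(train_set)
    correct = sum(predict(counts, vocabulary, text) == party
                  for party, text in test_set)
    return correct / len(test_set)


# Invented toy sessions (hypothetical text, not Hansard data).
polarized_train = [("Labour", "workers unions welfare"),
                   ("Conservative", "markets enterprise taxes")]
polarized_test = [("Labour", "welfare workers"),
                  ("Conservative", "enterprise markets")]
consensus_train = [("Labour", "budget health budget"),
                   ("Conservative", "budget health budget")]
consensus_test = [("Labour", "budget health"),
                  ("Conservative", "health budget")]

polarized_accuracy = session_accuracy(polarized_train, polarized_test)
consensus_accuracy = session_accuracy(consensus_train, consensus_test)
```

In the toy session where the parties use disjoint vocabularies the classifier labels every held-out speech correctly, whereas in the session where both parties use identical words it can do no better than chance. The sketch also shows why the accuracy score alone cannot say whether shared vocabulary reflects ideological agreement or two sides addressing the same subject.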

3.2 Accountability, Ideology, and the Lexical Gap

The argument I make in this chapter is as follows. Fluctuations in noise that confound the ideological classification of parliamentary speech are not entirely random; when examined at the appropriate level of analysis, embedded in this noise is a measurable signal of the institutional performance of Parliament as an accountability mechanism. I assert that similarity in word choice (more concisely, lexical similarity) across parties in a parliamentary debate is representative of parties discussing the same topic rather than agreeing on a topic. That is, lexical similarity is more appropriately used as a quantitative measure of parliamentary accountability, rather than ideological polarization. Consider how there are two possible types of incorrect classification under the conceptualization of noise words presented by Peterson and Spirling. First is the case of non-ideological words that are mistakenly modelled as left or right words. As Hirst and colleagues (2010) note, there is a substantial set of non-ideological words that nevertheless are strongly predictive of government or opposition status. These words can originate from formal, obligatory procedure—government members are much more likely to introduce legislation, and therefore to use words associated with this process, for example—as well as less formal norms of Westminster parliamentary practice. Peterson and Spirling argue we can reasonably assume this type of noise occurs uniformly across parliaments since parliamentary procedure is generally very consistent across time. Since they construct individual models per parliament there is no clear method of testing this assumption, but their model can safely ignore it in practice. If such words are not correlated with government or party status, or more broadly are random, the effects should not matter if we are comparing relatively across parliaments. Second, there is the case of left or right words that are mistakenly modelled as non-ideological words. 
Related is the assumption that using the same words reflects relative ideological agreement given they are not partisan negative or positive words. However, political opponents frequently use the same words to disagree strongly in relation to the same issue. Consider the simple case of negation: one party avows it supports tax increases, and the other argues it is against raising taxes (and supports lowering taxes). A bag of words classifier will not recognize the distance between these political opinions, which differ more in their semantics than in their lexical content. A more complex model incorporating part-of-speech tagging and a rules-based negation framework could begin to capture the distinction, but at a much lower accuracy than human coding could potentially provide.

As Peterson and Spirling point out, there are particular conditions under which their ideology classification model is most appropriate. Their measure theoretically performs best when parties “use different vocabularies when discussing the same issue” (Peterson & Spirling, 2018, 127). Yet, there is a tension between this measurement approach and its theoretical underpinnings: “claims about polarization make the most sense when parties (or people) have different perspectives on the same topics, that is when they are not simply raising (possibly orthogonal)¹ subjects of interest which have implicitly different word frequencies” (Peterson & Spirling, 2018, 127). Thus, the assumption that parties are discussing the same topic is critical to the interpretation of vocabulary difference (or lexical similarity) as ideological polarization and not as orthogonal time-wasting. Furthermore, there is the case Peterson and Spirling leave outside the scope of their article: what about when parties use the same vocabularies to discuss the same issues?
Considering a government’s capacity for agenda control in a Westminster parliamentary system, especially under majority conditions, and the limited potential opposition opportunities to speak, it is unlikely that oppositions spend much of their time pursuing orthogonal issues. Instead, they focus primarily on holding the government to account on key issue areas or policies. The extent to which oppositions do pursue their own “subjects of interest” independent of the government agenda, or to which governments do so irrespective of (or in defiance of) opposition questioning on their agenda, is the extent to which Parliament is failing to operate in an accountable fashion. As Stewart observes, the opposite of accountability is not ideological contestation but “superficial, repetitious speechmaking” that raises orthogonal issues and wastes the House’s time (Stewart, 1977, 21). As a practical example, imagine a Question Period exchange between a minister and an opposition member. If a minister is addressing the opposition question in an accountable manner, they are likely to reference and reuse key words and data within the question in forming their response. If they dodge the question or answer very generally, they are less likely to reuse the exact language of the original question posed. They may completely ignore the opposition question and repeat an ideological talking point that does not overlap at all—in language or in meaning—with the member’s original intent. It is also worth emphasizing that accountable debate is a two-way street. The opposition member must, for their part, ask a question that the government reasonably could expect to be accountable for, rather than asking a heckling or leading question. In the case where a government minister is unable to answer a poor question, we would expect to observe the same lack of overlap in word choice. 
This conforms with Stewart’s reflection that an undisciplined opposition is less capable of debate preparation and targeted questioning, and is therefore less effective at holding government to account (Stewart, 1977, 19). In sum, such a dialogic measure of accountability as linguistic similarity is not a measurement of government performance, but of the performance of the House of Commons as a unit, as an institution of political accountability. Importantly, the significance of overlapping words in question and response is reduced to random noise if removed from their dialogic context, wherein it is meaningful that members from different parties are using the same word. Peterson and Spirling’s classification method is performed on bag of words aggregates for parties or MPs per parliamentary session (Peterson & Spirling, 2018, 122). At such a broad level of aggregation, it is impossible to recover the textual similarity dynamics exemplified above. However, aggregating speeches within narrower contexts, such as at the Question Period or daily debate level, may represent a solution to the trade-off between computational complexity and keeping a shared context. I return to these more practical considerations later in this chapter. In the following section, I elaborate on the theoretical mechanisms underlying this empirical measurement strategy.

¹ In this context, the term orthogonal refers to statistical independence; that is, a topic on a completely unrelated tangent to the overall discussion.

3.2.1 Theoretical Framework

In any legislature, speeches made by members are often directed at policy seeking, or more generally representation of policy positions, as is emphasized by Proksch and Slapin. As I have argued, however, in the case of a Westminster-style parliament, responsible government entails that executive accountability constitutes a second political aim accomplished via parliamentary speech. Bäck and Debus propose an idea that can be extended to capture this concept. To summarize, their model proposes that the decision to speak in a debate is influenced by three types of gains: collective political goals, individual political goals, and individual selective incentives. I argue this type of model can also be used to understand the content of parliamentary speeches, not just the decision to speak. Within the domain of collective political goals, Bäck and Debus recognize three types of motivation familiar from the comparative literature on competitive political parties: policy seeking, vote seeking, or office seeking. As literature on legislative debate from the deliberative perspective has found, even in ideal cases vote seeking (that is, attempting to convince other legislative members to vote for your proposal via debate) is difficult and costly, and as a result exceedingly rare. Thus, an opposition party faces a practical trade-off between expressing its own position on a given debate issue and holding the government to account for its actions on that issue. Likewise, a government in control of the debate agenda can choose to focus on its policy priorities, or (be forced to) respond to opposition questioning. Of course, speeches may in fact contribute toward both goals at once, in particular during a majority parliament in a two-party system, when adversarialism is strongest and accountability and ideological difference are in institutional alignment.
A party’s decision to emphasize one goal or the other is shaped by institutional and political variables affecting the incentive structure they face. For example, a strong majority government with strong popular support is likely to emphasize policy goals, while an unpopular minority government may face strong accountability pressure from an emboldened opposition. Alternatively, an ideological third party may instead lend its support to an unpopular government to attain policy change consistent with its beliefs. The purpose of the decision is not necessarily the attainment of desired change (indeed, for an opposition this may be near impossible): it is the act of communication to an audience both inside and outside Parliament in service of a political outcome. The assumption that the goal of parliamentary debate is effective political communication underlies a second key assumption of this model: the relative importance of each goal to the party is equivalent to the average proportion of words allocated toward each goal in its parliamentary speech. Parties do not speak in the House of Commons; individual MPs do. In the behavioural models described by Proksch and Slapin and Bäck and Debus, MPs make decisions about whether to speak and what to speak about based upon weighing their individual political goals and incentives versus those of their party. For their part, party leaders (in a practical scenario, House Leaders or other delegated party members) exert some level of control over individual MPs’ speeches through disciplinary incentives and disincentives. In the Canadian case, as I argued in Chapter 2, we can reasonably assume parties possess complete control over MP speeches. Thus, setting aside individual goals and incentives,² speeches in the House of Commons can serve two types of collective political communication: policy seeking or office seeking.
Such a distinction can be reframed, especially in the Westminster parliamentary context, as a tension between ideology and accountability. For a more detailed discussion of the behavioural model underlying these assumptions, see Appendix C. The external validity of this trade-off model has empirical support via evidence that a trade-off in debate strategies between policy representation and the pursuit of accountability evolved over time as party systems matured in Westminster parliaments. Eggers and Spirling investigate the puzzle of how, in the late nineteenth century, opposition in the British House of Commons consented to substantial expansion of government agenda control power exemplified by Balfour’s timetable reforms of 1902 (Eggers & Spirling, 2014). They argue this transformation had its roots in a shift in electorate priorities from the local benefits promised by individual MPs to assessment of the collective fitness of a party to form government. Thus, parliamentary opposition came to benefit more strongly from “the opportunity to demonstrate, in inquisitorial clashes with the government, that it deserved the opportunity to lead,” (Eggers & Spirling, 2014, 874), rather than from representing its own positions and raising its own matters in debate. From their point of view, the benefit of securing guaranteed opportunities to hold government to account in the House of Commons outweighed the loss of agenda control over House business such timetable reforms represented (Eggers & Spirling, 2014, 876). Eggers and Spirling support this theory by developing an empirical measure of government responsiveness, and show that ministers responded more frequently to opposition questioning beginning in the 1890s (Eggers & Spirling, 2014, 876). In the Canadian context, Godbout and Høyland (2013) also investigate the emergence of unified parties in Parliament. 
Examining roll call voting over 1867–1908, they investigate three influences upon the solidification of party unity: partisan sorting, electoral rule changes, and, parallel to Eggers and Spirling’s study, increasing government agenda control (Godbout & Høyland, 2013, 774). In Canada, such government control was weaker during these early years than in the British case, as opposition members retained greater parliamentary freedom to propose their own business and amendments to government bills than in Britain (Godbout & Høyland, 2013, 781). Godbout and Høyland find that as government agenda control increased in Canada over this period, voting unity also increased; however, this trend was uneven from parliament to parliament, reflecting a divergence from the British case. They argue that religious division within parties was the root cause of this variation, and as partisan sorting lessened internal divisions over religious issues, the relationship between government agenda control and voting unity evened out as well. They summarize that it may be “easier for the executive to control the agenda when governing party members have homogenous preferences” (Godbout & Høyland, 2013, 793). Indeed, once the Liberals attained a consistent level of internal religious unity by the 10th and 11th Canadian Parliaments (in this case, the sorting of Catholic MPs into the Liberal Party, which was near-complete by the 1900s), they implemented rule modifications to strengthen government agenda control and increase time for government business in the House, as had occurred in Britain (Godbout & Høyland, 2013, 792–793). In a follow-up investigation of party unity from 1867-2011, Godbout and Høyland confirm that, by World War I, Canadian governments possessed the agenda control tools such as closure they needed to limit the influence of potentially divisive private member motions, and this control was directly reflected in voting unity measures (Godbout & Høyland, 2015, 558–559). To summarize, evidence from political development research suggests that a trade-off between ideology and accountability has evolved over time in Westminster parliaments, and agenda control by government (such as that enjoyed by a majority government) was a significant component of this transformation.

² Speeches directed at such goals can be understood as the true orthogonal noise theorized by Peterson and Spirling (Peterson & Spirling, 2018, 127).

3.2.2 Empirical Model and Measurement

Assuming this trade-off exists in parliamentary speeches, and that textual similarity reflects parliamentary accountability, the next step is to precisely define and conceptualize this similarity measure. Linguistic similarity can be measured along three basic dimensions. As already noted, the simplest form, whether or not parties use the same dictionary words in their speeches, can be referred to as surface or lexical similarity. The higher-order concept of meaning, whether or not parties are discussing the same concept or construct, is known as semantic similarity. Peterson and Spirling’s work outlines how ideological polarization should be consistent with low semantic similarity across parties. Finally, language can vary in its sentiment, in simplest terms from positive to negative tone. Menini and Tonelli (2016) investigate algorithmic options for computing these three types of textual similarity in the context of the automatic detection of political agreement and disagreement. Not surprisingly, they find an aggregate measure combining methods from each of the three similarity categories performs best for this agreement classification task. More generally, research in computer science on the performance of linguistic similarity measures in different contexts is challenged with the goal of bridging the “lexical gap”. The lexical gap is defined as the space between what words say and what words mean, which can confound a computer attempting to “understand” how two texts resemble each other. In other words, machine measures of textual overlap have trouble capturing semantic similarity in texts that humans could easily distinguish based on their background knowledge or contextual cues: for example, texts that are about the same topic or are in strong ideological agreement, but which nevertheless employ completely distinct vocabularies. Aggregating multiple similarity measures enables us to begin to “bridge the lexical gap” between word counts and meanings.
However, as I explain in the next chapter, computerized techniques to measure semantic and sentiment similarity suffer from limitations, especially when attempting to translate across linguistic contexts. Furthermore, the extent of the lexical gap is not only a function of algorithmic sophistication, but also of context. Political speech is an example where ideological orientation, institutional structures, and norms like party discipline could have uneven effects on different linguistic similarity features. In short, I focus primarily on a lexical similarity measure in this dissertation for reasons of both internal and external validity.

Returning to the construct of accountability, Bovens’ work on accountability and bureaucracy provides an empirical definition of accountability that is helpful for conceptualizing it as a dialogic process for the purposes of measurement (Bovens, 2005, 2010). Bovens defines accountability as a social relation between an actor and an accountability forum, in which the actor feels obliged to explain and justify his or her behaviour to the accountee (Bovens, 2005, 184). He argues this relation has three observable stages. First, the actor is required to inform the accountability forum through provision of data, explanation of actions taken, and justification of choices. Second, the forum has the opportunity to question the actor, and the quality of information and rationales they have provided. Third, the forum passes judgement on the actor, with the possibility of applying sanctions if necessary (Bovens, 2005, 184–186). Following this definition, if we were to observe an idealized House of Commons debate in which the opposition was effectively holding the government to account, we would expect the dynamics of that debate would follow the empirical pattern Bovens describes of information, questioning, and potentially judgement.
Furthermore, we could expect the actor (or government) and the forum (effectively, the opposition) to overlap significantly in the lexical content of their speeches, given they will be discussing the same data, policies, and questions. However, if the subject of an idealized parliamentary debate was a policy contest, we would expect the speeches of government and opposition MPs to be less similar, both lexically and semantically. Employing some combination of shared and distinctive language, parties will establish distinct meaningful positions on some understood policy issue. For example, an opposition member may employ a different linguistic framing or emphasize different components of a policy issue when holding a government to account, in ways that originate from their ideological position. This exemplifies the polarized case Peterson and Spirling identify as the optimal scenario for an ideological classification accuracy score. In this case, it is especially difficult to separate ideological differences from disputes over government accountability, both from a practical standpoint and a measurement perspective. Relatedly, the relationship between lexical and semantic similarity is likely to be issue-dependent and shaped by political and historical factors beyond the institutional level. Finally, in the third, “orthogonal” case, MPs from different parties not only use different words (representing little lexical overlap) but also talk about completely different topics (representing little semantic overlap). To summarize, in the accountability case, we would expect to see high lexical similarity and high semantic similarity. In the policy contest case, we would expect to see lower similarity scores and a complex relationship between their rates of change. Finally, in the orthogonal case, we would expect both low lexical and semantic similarity.
It is important to bear in mind that, by methodological definition, lexical and semantic similarity are not independent variables and some component of this relationship will be endogenous. Any textual analysis algorithm, or human reader, must infer semantic similarity from the same words used to calculate lexical similarity. This measurement uncertainty is one reason why I am unable to make predictions about the expected level of lexical similarity in the policy contest case. Given these caveats, the simplest analytic approach is to begin with the first case, characterized by high lexical similarity and high semantic similarity. It is the easiest and most transparent to measure (fundamentally, we need to compare word counts across speeches) and has theoretical merit to its validity as an empirical measure of parliamentary accountability per Bovens’ model. As I have noted previously, holding context constant is necessary for lexical similarity to be a valid measure of accountability. Aggregating speeches by members of Parliament per session (as do Peterson and Spirling) recasts such similarity as statistical noise. However, my approach is to aggregate speeches within narrower time frames to perform similarity calculations, then examine trends in these means across broader time frames. Such a decision is a compromise between validity and computational intensity. Ideally, we would compare similarities across pairs of speeches between individual MPs on the same topic of debate. Indeed, this is the approach I will take in Part 1 of Chapter 5 as a qualitative validation step. However, calculating pairwise similarity between all such speech pairs in all debates would be unreasonable computationally. As a compromise, we can reasonably assume that parties will be talking about the same set of topics on a given day of debate, even though there will be some error introduced through aggregation of both sides’ text. 
From a theoretical perspective, this level of analysis also represents a good compromise. Measuring similarity at the individual member level can obscure accountability dynamics precisely because “we should regard the House as the field and occasion for government by political parties” (Stewart, 1977, 21). Responsible government, as discussed in Chapter 2, is characterized by collective accountability, which requires some measurement level that captures collective partisan positions rather than individual exchanges. As I will emphasize during the qualitative evaluation step of my analysis, it is not appropriate to calculate individual-level similarity scores per minister and use these to argue some ministers are more accountable than others. The debate level provides context for a relative accountability contest between government and opposition, while maintaining a level of aggregation that renders the analysis computationally tractable. Finally, my approach retains one of the fundamental traits of Peterson and Spirling’s measurement strategy. That is, it represents a relative measure across time rather than one scaled to an absolute comparable value, such as a left-right position. Rather than being concerned with the “actual” positions represented by claims made in the text, or with the ideological content of the claims themselves, we are concerned with how two particular categories of texts resemble each other on average, albeit at different levels of analysis and for different theoretical reasons.
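The daily aggregation described in this section can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: it pools each side's speeches per sitting day into word counts and compares the two with cosine similarity. The choice of cosine similarity, the tokenizer, the function names, and the invented speeches are all hypothetical and are not necessarily the measure developed later in this dissertation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors stored as Counters."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def daily_lexical_similarity(speeches):
    """speeches: iterable of (day, side, text), side in {'government', 'opposition'}.

    Pools all text per side per day, then scores government against opposition.
    """
    pooled = defaultdict(lambda: {"government": Counter(), "opposition": Counter()})
    for day, side, text in speeches:
        pooled[day][side].update(tokenize(text))
    return {day: cosine_similarity(sides["government"], sides["opposition"])
            for day, sides in pooled.items()}


# Invented exchanges for illustration only (not Hansard text).
toy_speeches = [
    ("day1", "opposition", "Will the minister release the bridge repair budget?"),
    ("day1", "government", "The bridge repair budget will be released this week."),
    ("day2", "opposition", "Why were hospital wait times ignored?"),
    ("day2", "government", "Our economy is strong and growing."),
]
scores = daily_lexical_similarity(toy_speeches)

# The negation problem: opposed positions still share three of four words.
negation_score = cosine_similarity(Counter(tokenize("we support raising taxes")),
                                   Counter(tokenize("we oppose raising taxes")))
```

On the toy data, the day on which the minister reuses the language of the question scores above the day on which the response shares no words with it, which scores zero. The final comparison illustrates the lexical gap: two statements with opposite meanings remain lexically close, which is why this kind of score is read here as shared subject matter rather than agreement.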

3.3 Validation and Prediction

3.3.1 Part 1: Qualitative Assessment of Opposition-Minister Exchanges in the 38th and 39th Parliaments

Assuming that lexical similarity is a potential measure of parliamentary accountability in debate texts, how can its efficacy be verified? It would be difficult to establish a “ground truth” of comparison against which the new quantitative measurement could be validated; as Stewart points out, a qualitative assessment of parliamentary accountability is difficult even for an experienced parliamentarian. Nevertheless, as is common to many other attempts to develop computer models, human coding is our best alternative to develop a standard for comparison. In the first part of my analysis (Chapter 5), I attempt a human validation of the lexical similarity measurement of accountability. As outlined above, my measurement approach is based on Bovens’ empirical model of accountability as a three-step relation between the accountable party, or the government, and the accountability forum, or the opposition. At the level of analysis of the individual MP, the first two steps should generally be observable in passages of debate between opposition members and government ministers on a consistent topic. This should be especially apparent during Question Period when opposition members have the procedural power to ask questions. The first section in my analysis in Chapter 5 takes exactly this form. I perform a case study of two recent minority parliaments: the 38th Parliament, under a Liberal government, and the 39th, under a . I identify all pairs of Question Period speeches in which an Official Opposition member speaks followed by a minister, and for which the subtopic as recorded in Hansard is identical for both speeches.³ Matching subtopics reduces the likelihood of selecting unrelated pairs of speeches, and also maintains the constant underlying context important to the validity of my accountability measure. Next, I calculate scaled lexical similarity values for each dialogue pair, categorize each according to its topic of debate, and calculate mean scores for each topic.
I select four debate topics per government for closer analysis, choosing relatively high and low similarity examples from varied policy domains. Finally, I examine the text of speech pairs ranked as high and low similarity examples within these debate topics to assess the validity of the accountability measure in a qualitative, face-validity sense.

3In this section, and in my subsequent quantitative analyses, I focus on the Official Opposition since the majority of speech opportunities for opposition members is reserved for the Official Opposition under parliamentary rules. In pre-analysis testing, I found that the inclusion of third party opposition speeches had a negligible impact on the direction and significance of results found.
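The pairing and topic-averaging steps described for Part 1 can be sketched in code. The following is a minimal illustration under stated assumptions, not the procedure used in the thesis: the `Speech` record, its field names, the role labels, and the `score` argument (any function returning a similarity value for two texts) are all hypothetical, and speeches are assumed to arrive in chronological order.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Speech:
    role: str      # hypothetical labels: "opposition" or "minister"
    subtopic: str  # subtopic heading as recorded in Hansard
    topic: str     # broader debate topic assigned by the researcher
    text: str


def pair_exchanges(speeches):
    """Return (question, answer) pairs: an opposition speech immediately
    followed by a minister's speech under the same subtopic."""
    pairs = []
    for prev, curr in zip(speeches, speeches[1:]):
        if (prev.role == "opposition" and curr.role == "minister"
                and prev.subtopic == curr.subtopic):
            pairs.append((prev, curr))
    return pairs


def mean_score_by_topic(pairs, score):
    """Average a similarity score over pairs, grouped by debate topic."""
    by_topic = defaultdict(list)
    for question, answer in pairs:
        by_topic[question.topic].append(score(question.text, answer.text))
    return {topic: mean(values) for topic, values in by_topic.items()}
```

Requiring an identical subtopic on both sides of the pair is what keeps unrelated adjacent speeches out of the comparison.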

3.3.2 Part 2: Quantitative Study of Daily Debates (1945-2015) and Oral Question Period (1975-2010)

To broaden this analytical approach chronologically, I shift to a quantitative approach. First, I investigate my primary research question of whether minority governments are more accountable than majority governments. According to P.H. Russell, one of the reasons minority governments are more able to meet public demands for political accountability is that some level of deliberation becomes required in the House of Commons. The more opposition members in the House, the more governments must solicit opposition opinions and genuinely adapt to their concerns in order to secure votes, both to pass routine legislation and to retain the confidence of the House. Minority parliaments including third parties further incorporate a diversity of perspective amongst which governments must negotiate. A government may possess a reliable governing partner in a third party willing to support its pursuit of a particular agenda; the “classic legislative alliance” between the Liberals and the NDP from 1972-1974 was one such example (P. H. Russell, 2008, 33). In other cases, a minority government may be so tenuous as to be threatened with imminent defeat from its very inception, with the official opposition having a substantial number of seats and a clear path to potential victory in the next election. A representative example is the Conservative minority of 1979, which was characterized by minimal coordination with Social Credit and the persistent pursuit of non-confidence by the Liberal opposition. In either case, a minority government must work hard to maintain support within the House of Commons by anticipating the concerns of opposition parties in its policy proposals and preparing thoroughly for opposition questioning on its conduct. In a majority situation, by contrast, a government has the executive power and democratic legitimacy to implement its policy platform. 
It possesses the votes needed to safeguard against losing votes of confidence (given norms of party discipline) as well as the agenda control to limit troublesome opposition questioning. Under these conditions, governments will feel less pressured to explain and justify their decisions, include concessions to opposition concerns in their bills, and act responsibly in general. For its part, a weak opposition may lack the resources and discipline to effectively hold the government to account as a caucus, leading to orthogonal questioning or partisan attacks in lieu of a targeted strategy to hold the government accountable. My quantitative study builds upon the approach I validate in Part 1 of studying exchanges between government and opposition during Question Period. Helpfully, Question Period has had a standard length and relatively consistent scheduling on the daily timetable since reforms in 1975. Each Question Period since represents a consistent unit of analysis characterized by similar levels of government and opposition participation.4 For each Question Period, I concatenate all government speeches and all official opposition speeches to form two aggregate “documents” for comparison. Due to normalization, calculating a similarity score between these two aggregates is equivalent to calculating the mean similarity score between each government and opposition speech.5
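As a rough sketch of the aggregation step, the example below concatenates each side's speeches into one document and compares the two with cosine similarity over raw term counts. The bag-of-words representation, the tokenizer, and the sample sentences are assumptions made for illustration; the vectorization and scaling actually used are described in the next chapter and may differ.

```python
import math
import re
from collections import Counter


def term_counts(speeches):
    """Concatenate speeches into one aggregate document and count its terms."""
    tokens = re.findall(r"[a-z']+", " ".join(speeches).lower())
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented sample speeches for one Question Period.
government = ["The budget supports economic growth.", "We will balance the budget."]
opposition = ["When will the minister balance the budget?"]
score = cosine_similarity(term_counts(government), term_counts(opposition))
```

Dividing by the vector norms removes the raw effect of document length on the dot product, which is one reason a normalized score is convenient when the two aggregates differ in size.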

4This is important for measurement validity: given that word choices are not normally distributed, the longer a text, the more it will resemble, on average, some other text written in the same language. For more discussion of this issue, see Section 4.5.1 and Appendix A.9.

5Unlike in the qualitative analysis phase, I do not study the lexical similarity between government and official opposition speech pairs on matching debate topics within Question Period. One major limiting factor in doing so is computational time. Another is validity, given the accuracy of my approach to automatically labelling speeches as paired exchanges is uncertain (as I discuss in more detail in the next chapter). Treating each speech or speech pair as an observation would require consideration of a large range of potential individual-level confounds including order of discussion, discussion topic, and position of member speaking (for example, government member or opposition critic), many of which would be difficult to model. However, applying the more complex approach on a larger scale would ideally produce more valid results.

To study trends in Question Period similarity scores, I use a hierarchical mixed effects model including random effects for sessions within parliaments. Attention to the parliament, and parliamentary session, level of analysis is important for validity reasons. In Parliament, “the session is the basic unit of time for procedural purposes,” as each new session means a fresh beginning for government business (Stewart, 1977, 48). As discussed previously, a shared and consistent political and linguistic context is necessary for my lexical similarity measurement to have construct validity. At minimum, it is within the parliamentary session level that ideological differences as well as the institutional dynamics of collective accountability are consistent and meaningful. In the interests of performing a conservative test of my theory, I therefore treat Question Periods as samples representative of parliamentary session-level estimates of accountability. For an even more conservative test, I also study results at the parliamentary level, since (excepting by-elections and floor crossings) the formation of a new parliament is the only occasion when the balance of power between government and opposition shifts. The inclusion of random effects at the parliament and session level in my model (in addition to at the quarterly level, where appropriate) allows me to compensate for historical variation across time. Considering the limited number of minority governments in parliamentary history and their short average duration, it would be ideal to include as broad a time span as possible in the study for reasons of statistical significance. Unfortunately, procedural changes to the House of Commons over history make construction of an analysis dataset a more difficult problem.
Prior to the procedural reforms in the 1970s that included introduction of a scheduled Question Period, similar discussions were conducted during other phases of the timetable, primarily Committee of the Whole and debates on motions. As discussed in Chapter 1, the ability of the opposition to use these opportunities to obstruct proceedings was an impetus for procedural changes made in the late 1960s and 1970s, including to time allocation and the role of committees. Given this procedural variation, I perform two separate statistical analyses. First, I investigate the effect of majority status on accountability using the Question Period (1975-2010) dataset. Second, I broaden my level of analysis to the daily debate level in order to permit inclusion of as much historical debate data as possible in the dataset, extending the period of study back to 1945. We can assume a limited number of topics will be discussed per day and MPs present that day will share the same discussion context, and that parliamentary procedure affording the opposition opportunities to hold the government to account will not shift significantly from day to day.

P3: As the government’s polling popularity decreases, accountability increases (lexical similarity increases).

P4: As government polling popularity increases above the threshold needed to form a majority, accountability remains stable (lexical similarity remains stable) and does not vary across majority and minority governments.

Collective accountability in a Westminster parliament is ultimately maintained through the threat of dissolution and elections, especially under minority conditions. This threat of sanction is also a component of Bovens’ empirical model of accountability, representing the final stage of the process he describes. One reasonable measure of the overall threat of sanction for a government is polling data.

When voters are asked who they would vote for if an election were to take place tomorrow, they are performing an assessment of the accountability of the current government as close to an actual election as is reasonably possible. Generally, the lower the polling numbers of the government, and the higher those of the opposition, the more pressured a government will be to be accountable in order to communicate to voters its legitimate entitlement to reelection. The discretionary power of a Canadian government to dissolve Parliament and call an election when conditions and polling numbers best serve its potential success also increases its incentive to communicate accountability when it is most interested in conveying its competence to govern to voters.6 The quality of the opposition is also related to government popularity, and it would be difficult to trace the causal process entwining the two. To an extent, this is the same endogeneity problem facing a measurement of parliamentary accountability, as the performance of both government and opposition contribute to an accountable House of Commons. However, there are some patterns we might expect based upon the incentives faced under particular conditions by government or opposition. If minority parliaments are more accountable due to the need to cooperate for survival with opposition members, as Russell proposes, then we can expect a government will behave more accountably the likelier the odds of electoral defeat. Another reason to expect this pattern is rooted in the observation that an opposition is also effectively a government in waiting. Stewart argues that the responsible performance of a government in power and resultant voter evaluations of its fitness to lead, not its policy platform, are the keys to understanding when voters decide to electorally sanction an incumbent government (Stewart, 1977, 29). 
In sum, we would expect to see more accountability in Question Period as polling numbers for the government decrease. More specifically, I study the effect of government polling popularity on the dependent variable of lexical similarity scores using the Question Period dataset. To match the availability of polling data, I study this relationship at the quarterly level of analysis. I also include as independent variables a lagged polling value and governing party as a control. The relative willingness of a minority government to “behave” like a majority government should also be shaped by the government’s current electoral prospects. Pickup and Hobolt have documented how minority government status impacts democratic responsiveness through an empirical study of policy spending and mean voter preferences in Canada. Specifically, they find that minority governments that possess enough support in the polls to suggest that they would form a majority government if an election were held (in Canada, about 40%) are about as responsive to mean voter preferences as are majority governments. Minority governments with a polling popularity lower than this approximate threshold are more responsive than are majority governments (Pickup & Hobolt, 2015, 528). If governments are more or less responsive in relation to polling numbers, then it follows that Parliament’s relative attention to accountability should also vary according to the government’s popularity. We should expect to see similar non-linear results; as polling numbers increase, minority governments will behave less and less in an accountable manner, up until the 40% threshold where parliamentary accountability becomes relatively consistent across majorities and minorities. By extension, we should expect to see lexical similarity scores follow this pattern.
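One way to write down the kind of model this discussion implies is given below. It is an illustrative specification only, not the model estimated in the thesis, and every symbol is an assumption of this sketch: $y_{qsp}$ is the mean lexical similarity score in quarter $q$ of session $s$ in parliament $p$, $\mathrm{Poll}_q$ is the government's polling share, $\tau$ is the approximate majority threshold (about 40% in the discussion above), $\mathrm{Party}_p$ is the governing-party control, and $u_p$ and $v_{sp}$ are random effects for parliaments and for sessions within parliaments.

```latex
y_{qsp} = \beta_0
        + \beta_1 \min(\mathrm{Poll}_{q}, \tau)
        + \beta_2 \, \mathrm{Poll}_{q-1}
        + \beta_3 \, \mathrm{Party}_{p}
        + u_{p} + v_{sp} + \varepsilon_{qsp}

u_{p} \sim \mathcal{N}(0, \sigma_u^2), \quad
v_{sp} \sim \mathcal{N}(0, \sigma_v^2), \quad
\varepsilon_{qsp} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

Under this sketch, P3 corresponds to $\beta_1 < 0$, and the $\min(\cdot, \tau)$ term is one simple way to encode the plateau expected under P4: once polling passes $\tau$, further increases no longer change the expected similarity score.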

6Amendments to the Canada Elections Act in 2007 introduced a fixed federal election date on the third Monday of October, every four years. However, this legislation does not bind the Governor General’s power to dissolve Parliament nor the Prime Minister’s request for such a dissolution. Prime Ministers have continued to use their discretion in calling early elections in spite of the fixed date; indeed, the first election to take place under the new rules was called one year early (Dodek, 2009).

P5: Accountability (as lexical similarity) and ideology (as semantic dissimilarity) are negatively related.

A final, more exploratory proposition relates to the distinction drawn above between similarity across words used by different parties and similarity in how parties discuss political topics: the difference between lexical similarity and semantic similarity. From a theoretical perspective, I have argued that parties face a trade-off in their speech texts between communicating goals of accountability and ideology. I have also claimed that accountability can be measured via lexical similarity between opposition and government speeches. An opposition must refer directly to policies, decisions, agendas, and rationales provided by the government in order to demand accountability for them. This situation should imply some level of semantic similarity, since government and opposition will share context and some amount of agreement on the broader political contest underlying the debate even if they have divergent opinions on the issue at hand. Essentially, the more ideologically polarized a government and opposition, the lower the semantic similarity between their speeches we should observe. Despite the theoretical simplicity of a trade-off between accountability and ideology, under real conditions it is difficult to disentangle these two constructs for measurement, especially in an adversarial Westminster parliamentary context. In other words, ideological polarization is a confounding variable for measuring parliamentary accountability. As discussed earlier in this chapter, lexical and semantic similarity are also correlated at the textual level. However, based on my assertion that accountability and ideology are theoretically separable in debate texts, I expect to observe evidence of systematic differences between trends in lexical and semantic similarity. In the final phase of my research, I attempt to better understand this relationship with a view to eventually controlling for ideology as a confound. 
Definitions matter a great deal to how I conceptualize both accountability and ideology for measurement in text. To restate, my focus in this dissertation has been specifically on measuring parliamentary accountability as associated with the practice of responsible government, characterized by collective accountability of the executive to the House of Commons. The features of this type of political accountability, for example, its dialogic nature and strong institutional foundation, are important to the validity of my lexical similarity measurement approach. Other forms of political accountability are important to the quality of democratic governance but are beyond the scope of the narrower measurement concept I define here. My definition of ideology emphasizes its general nature as a set of (relatively) coherent, shared beliefs about what is good for society. Such a worldview structures the organization of a political speaker’s ideas and his or her expression of these ideas in meaningful language. Based on this definition, I argue that ideological differences yield systematically different patterns of word choice between opposed groups. These patterns originate in underlying differences in worldview that dictate how members of these groups understand and organize language. Importantly, I consider this a distinct phenomenon from disagreement over or opposition to a particular government policy, which I consider as falling under the scope of parliamentary accountability. Put simply, accountability is discussion of what a government does, whereas ideology shapes the framing of that discussion. As a practical example, consider a hypothetical House of Commons debate in which the government asserts that “free trade is good for economic growth,” while the opposition argues that “free trade is bad for economic growth.” On its own, a simple negation is not a sufficient indicator of ideological disagreement in text.
Both participants probably share an understanding of what free trade is and support economic growth as a desired end for society. What is clear from this statement is that the opposition disagrees on the appropriateness of the government’s plan to achieve it. The opposition has drawn on the same language used by the government to assert its own position on that plan, which, befitting an adversarial accountability mechanism, is a contrary one. Next, consider if the opposition were to instead rebut the government by claiming “economic growth is harmful to the environment.” On a surface level, the phrase “free trade” is no longer included in the sentence while “environment” is, resulting in a lower lexical similarity score when compared with the government’s statement. Despite this decreased similarity, the opposition is still holding the government to account for its policy decision; when government and opposition discuss the same policy, parliamentary accountability is taking place even if both sides employ different framings. From a measurement point of view, the persistence of the common words “economic growth” is an indicator that this is the case. I argue the shift in word use between the first opposition example and the second also implies an increase in ideological distance from the government. To verify this claim, we would need to determine whether the differences in word choice (the selection of words like “environment” rather than “free trade” in a discussion of economic growth) are generated by underlying differences in ideology independent of the opposition’s institutional obligation to oppose the government. That is, the structure of beliefs and assumptions held by opposition members (or, at least, the party officials who write their talking points) yields a distinctive understanding of the meanings of the words and phrases they employ.
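The three statements in this example can be compared directly with a toy lexical overlap score. The sketch below uses Jaccard similarity over word sets purely for illustration; that choice is an assumption of this example and is not the scaled lexical similarity measure developed in the thesis.

```python
def jaccard(a, b):
    """Share of distinct words that two sentences have in common."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)


government = "free trade is good for economic growth"
negation = "free trade is bad for economic growth"
reframing = "economic growth is harmful to the environment"

# Six of eight distinct words are shared: 6 / 8 = 0.75
negation_score = jaccard(government, negation)

# Only "economic", "growth", and "is" are shared: 3 / 11, about 0.27
reframing_score = jaccard(government, reframing)
```

The direct negation scores much higher than the environmental rebuttal even though both hold the government to account for the same policy, which is the gap between lexical overlap and framing that the passage describes.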
A lexical similarity measure is not helpful for understanding this distinction as each word counts the same toward the measure of overlap regardless of its “meaning” in context. Using a large enough volume of textual data, however, we could observe on average distinctive distributions of word choices and positions in sentences concerning economic policy. Capturing this requires a semantic similarity measure incorporating some concept of context within sentences. The distinctiveness between these two similarity measurements would reflect operation of two causal mechanisms of political speech, one associated with parliamentary accountability and the other ideology. As ideal types, the former is narrow and institutionally-prescribed; the latter is general and speaks to audiences inside and outside the chamber. Practically, however, isolating, tracing, and testing such causal relationships would be exceedingly difficult.7 In my study, I expect to observe a negative relationship between lexical similarity and semantic dissimilarity, as predicted by the trade-off model. However, I also expect that the strength and direction of this relationship should vary as politically-important conditions change, signifying preliminary evidence that two underlying causal mechanisms exist. To visualize how this would occur, imagine we had a reliable way of measuring how accountable and ideologically polarized a parliament was, and could plot a given parliament according to its level of accountability on one axis and level of polarization on the other. Within each parliamentary case, we measured lexical and semantic similarity between government and opposition speeches, keeping track of two features. The first is, straightforwardly, the two similarity scores themselves, representing the linguistic similarity between government and opposition on two scales. The second is the correlation across all cases between the two similarity scores.
7In rare cases, institutional change does provide the opportunity to test hypotheses about behaviour in legislatures through natural experiments; see, for example, Loewen et al. (2014).

Imagine a set of observed parliaments on this scale representing one ideal case: high accountability and low polarization. Per my measurement model, we would expect there to be no linguistic clues available to tell government and opposition apart: the opposition and government both earnestly debate the same agenda and hold the same positions on that agenda. In these cases, both similarity scores across the parties would be high. Observed across multiple cases, there should also be little difference in the trend of the semantic and lexical similarity measures. Next consider parliaments in the high polarization, low accountability quadrant. Both government and opposition would lack any shared objects of debate or shared ideological meanings, and would exist in completely separate linguistic worlds, a Parliament characterized purely by orthogonal speech. Obviously, similarity scores of both types across parties would be low. In other words, it would be easy to classify government and opposition members into their respective groups based on the language of their individual speeches. In terms of the correlation of the two measures, however, we would expect the same as the first case: both should be strongly, positively correlated. Put another way, the effect of adding a marginal common word across government and opposition speeches would increase both the lexical and semantic similarity measures. We would not expect to see either of these artificial extremes in a real Parliament; real cases would exist somewhere in the middle. Institutional rules and norms dictate some balance between emphasizing accountability and ideology, given the incentives for pursuing one collective political goal over another that are created by variables like seat count, polling popularity, internal discipline, time to the next election, and so on. These variables, I propose, should shape the relative strength, and potentially the direction, of the marginal effect of adding or subtracting a common word differently for each measure. A straightforward example of such an effect is that of ideological salience of the relevant issue area.
It is easier to detect an ideological distinction between “homosexual unions,” “gay marriage,” and “LGBTQ rights” simply because language choice is a more ideologically salient act in this issue area than in, for example, taxation policy. The transformational shift in public attitudes toward gay marriage over the 2000s was much broader than the simple substitution of words across parties. In my model, the way these terms were used in context and linked to other ideological constructs would be reflected in a more drastic increase in semantic similarity compared to lexical similarity, and an uneven correlation between the two over the period of transition.8 Critically important to this understanding of the significance of semantic similarity is the underlying textual model used to measure it. In order to capture the distinctions I have described, such a measure must incorporate information about how and in what context words are used and be trained on a large historical dataset incorporating text samples from a variety of parties and ideological positions, not just those of the government and the official opposition at a given point in time. For this final phase of my study, I employ the daily debate dataset and include speeches from all parties in the model training phase, in order to include as much ideological and historical variation in the model as possible.

In this chapter, I have suggested the use of a linguistic similarity measure of parliamentary accountability in debate texts. How does this similarity approach compare with other methodological options for measuring parliamentary accountability? What computational methods are available for computing linguistic and semantic similarity on large textual datasets, and do they meet the requirements I have outlined in the discussion above? In the next chapter, I investigate these methodological choices in greater detail and justify my selection of approach. I also outline the technical details of my data collection, processing, and analysis steps.

8At a given point in time, the relationship could even be negative: consider two parties that support ideological positions that, viewed in historical context, are nearly identical but which nevertheless persistently argue over the appropriate terminology between “civil unions” or “marriage”.

Chapter 4

Methodology

The challenge of translating the theoretical model outlined in Chapter 3 into a textual measure is significant. An appropriate measure must balance validity with computational tractability across a large dataset. It must capture underlying trends across time even though salient policy issues, party positions, and institutions have shifted considerably in Canada over the period since the mid-20th century. In this chapter, I provide an overview of the strengths and weaknesses of standard methodologies used in social and computer science to study textual data, each of which could be used for measuring accountability in political texts. From the social science literature, I examine content analysis, dictionary approaches, and lexicographic and scaling methods. Transitioning to computer science, I provide a general introduction to supervised and unsupervised machine learning methods. Next, I focus on the methodology I will employ in this dissertation to calculate and analyze lexical similarity scores, which I argue are an appropriate construct for measuring parliamentary accountability. In the final part of this chapter, I explain the creation and preparation of the dataset used for my analysis.

4.1 Content Analysis

Political tracts, speeches, essays, pamphlets, and manifestos have provided source material for political scholars since ancient times. During the 20th century, however, technological advances in mass communication spurred efforts to systematize the analysis of political text. At first, scholars engaged in largely qualitative study of mass market newspapers to assess journalistic quality, although some innovators began employing metrics such as newspaper column inches and basic ideological codes (Lasswell, 1949, 4, 5) in what became known as quantitative newspaper analysis (Krippendorff, 1980, 14). Beginning with World War II, the focus of scholarship in content analysis shifted to the study of new and pervasive forms of propaganda. Seminal work by Woodward (1934) and Lasswell (1941) argued that adopting a quantitative approach would help reduce researcher bias, introduce standards for measurement and error, and in general lend scientific rigour to the study of propaganda and political texts. Pioneering research employing both quantitative content analysis and experimental methods would inform, for example, theories of priming, framing effects, and agenda setting in political science (Kinder & Iyengar, 1987; Klapper, 1960; Lasswell, 1927; Lippmann, 1946; McCombs & Reynolds, 2002; McCombs & Shaw, 1972).

Content analysis is a framework for “making replicable and valid inferences from data to their context” (Krippendorff, 1980, 21). It represents an effective methodological and epistemological compromise between the qualitative interpretation of text through close reading, and the quantitative practices of measurement, error, and validity. Krippendorff’s (1980) definitive work on content analysis is an excellent guide to its best practices. A content analysis research design begins with the unitization and, if necessary, sampling of some form of linguistic data (Krippendorff, 1980, 53). Next, the researcher develops an explicit recording, or coding, scheme for summarizing the data. This process can be iterative as it typically involves the researcher becoming familiar with the data, designing both a coding scheme and detailed instructions for its application, and then testing the scheme on a data sample together with the human coders who will perform the work, as both a calibration and training procedure (Krippendorff, 1980, 73). The coding manual must be as complete as possible: it must act as sole reference for the coders during the actual procedure, and also acts as a pre-registration of the study that lends validity to its final results. After the training process, coders then proceed through the data to be studied in complete independence of each other (Krippendorff, 1980, 74). Once coding is complete, a variety of statistical analyses of the frequency and relative proportion of codes can be performed. Reliability of the results is quantified using an intercoder reliability measure, which summarizes the agreement across independent coders on the same data (Krippendorff, 1980, 133); Krippendorff proposes a rather rigorous definition commonly known as Krippendorff’s alpha and is critical of weaker criteria of reliability employed in the literature (Krippendorff, 1980, 132, 138).
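To make the reliability statistic concrete, the sketch below computes Krippendorff's alpha for nominal codes as one minus the ratio of observed to expected disagreement, built from a coincidence matrix. The function name and input layout are assumptions of this example, and it covers only the nominal case; it is not drawn from the thesis.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal codes.

    `units` is a list of lists; each inner list holds the codes that
    different coders assigned to one unit. Units with fewer than two
    codes carry no agreement information and are skipped.
    """
    coincidences = Counter()
    for codes in units:
        m = len(codes)
        if m < 2:
            continue
        # Each ordered pair of codes within a unit adds 1 / (m - 1).
        for c, k in permutations(codes, 2):
            coincidences[(c, k)] += 1 / (m - 1)

    totals = Counter()
    for (c, _), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    if n < 2:
        raise ValueError("need at least one unit coded by two or more coders")

    observed = sum(w for (c, k), w in coincidences.items() if c != k) / n
    expected = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    ) / (n * (n - 1))
    if expected == 0:
        return float("nan")  # undefined when only one category appears
    return 1 - observed / expected


# Two coders, ten units, binary codes (invented data).
coder_a = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
coder_b = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
alpha = krippendorff_alpha_nominal([list(pair) for pair in zip(coder_a, coder_b)])
```

In this invented example the coders agree on six of ten units, yet alpha is only about 0.095 because most of that agreement is what chance alone would produce given how often each code is used.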
A well-designed content analysis allows for valid, reliable, and replicable inference to be made from a wide range of non-numerical data, from written articles to interviews to television programs. Smaller studies can provide supporting evidence for historical or process tracing analyses (Skogstad & Whyte, 2015). Large-scale content analysis projects such as the Comparative Manifestos Project (Dalton & Wattenberg, 2002; Volkens, Lehmann, Matthieß, Merz, & Regel, 2016) and Comparative Agendas Project (Comparative Agendas Project, 2015) have contributed to significant progress in political science. Dozens of studies have been published leveraging the large cross-national datasets these projects produced, and their coding schemes act as useful standards for replication and extension. The most obvious downside to content analysis is its labour-intensive nature, which naturally limits its applicability to large textual datasets and comparative research. This is amplified by the high standards required of well-designed content analysis research: at least two, but preferably more, human coders, not including the researcher(s) themselves, are required for meaningful reliability. The “worst practice in content analysis” (Krippendorff, 1980, 74) is also unfortunately a very common one in the literature: an author who designs and implements a coding scheme on their own. Rigorous content analysis is unattainable for researchers with limited financial resources. Even under ideal conditions, human coders are fallible: the task is tedious, good coding schemes are complex and may be difficult to recall, and personal biases may affect the interpretation of codes from person to person. Inter-coder reliability measures such as Krippendorff’s alpha attempt to measure this error, and a high standard of α ≥ 0.8 is recommended for results to be acceptable, although this is difficult enough to obtain that tentative conclusions are generally permissible at values 0.8 > α ≥ 0.667 (Krippendorff, 1980, 147).
At the same time, high inter-coder reliability can be a sign of a poorly-designed coding scheme, such as one so generic as to possess no external validity. Despite its potential problems, content analysis performed by human coders sets a gold standard for research using political texts as data. However, the daunting task of performing content analysis on new political datasets that span decades of debate, dozens of countries, or thousands of bloggers simply necessitates a computerized approach. Computer analysis will likely never substitute for a close and careful qualitative analysis of text, at least not in our lifetime. For example, humans can easily interpret figures of speech, humour, and metaphor that are characteristic of political expression but which remain nearly impossible for computers to detect. However, as Grimmer and Stewart put it, “rather than replace humans, computers amplify human abilities” (Grimmer & Stewart, 2013, 4). In many cases, methodologies for computerized textual analysis build upon the logic of existing methods and present exciting opportunities for replication. Supervised learning, for example, essentially proceeds from a pre-specified coding scheme, the design of which necessitates deep familiarity with the data and often requires human coding of a sample set. Other methods have a natural affinity with older statistical models familiar to political scientists (for example, scaling techniques draw upon spatial voting models). Finally, it is standard in the literature to validate new methodological tools against human coding and expert surveys in order to assess validity (Grimmer & King, 2011; Quinn, Monroe, Colaresi, Crespin, & Radev, 2010). Content analysis would be a sensitive method of measuring accountability in speech.
A human coder could detect textual nuance indicating whether a government MP was, for example, speaking vaguely, dismissing an allegation with sarcasm, or merely repeating themselves instead of responding to an accountability question from the opposition. In 2018, the produced an investigative report on Question Period that performed a content analysis of parliamentary accountability, using human coding to detect and quantify correct and incorrect factual statements, vague answers, and dodged questions (Campion-Smith et al., 2018). On a broader scale, Soroka has produced a dataset of coded Question Period exchanges including coding for government accountability, for example relating to the work of the Auditor General or to individual ministerial accountability (Penner et al., 2006; S. N. Soroka, 2005). However, it would be very difficult to develop and apply a coding strategy that could capture accountability across multiple parliaments. In order to separate out collective accountability from partisan speech, a human coder would have to be deeply familiar with different parties’ historical policy positions. This is notwithstanding the investment of human labour it would take to manually code enough debates across decades of parliamentary history in order to produce usable comparative results. While less sensitive to nuance than human coding, a computerized approach is necessary to study phenomena that are only meaningful at the multi-parliament level—for instance, whether there is a difference in accountability across majorities and minorities. However, from the content analysis literature I borrow the practice of validating my quantitative measurement approach with a qualitative close reading. In Chapter 5, I perform a case study of controversial topics across two subsequent recent Parliaments to ensure that my lexical similarity measure is correlated with a subjective assessment of accountability in exchanges between opposition members and government ministers.
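As a minimal sketch of the intercoder reliability statistic discussed earlier in this section, the following computes Krippendorff’s alpha for nominal codes from a coincidence matrix. It assumes purely categorical codes and made-up illustrative data; the function name and the input layout (one list of codes per unit) are hypothetical choices for this example, not part of any content analysis package.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal codes.

    `units` holds one list per coded unit, containing the codes assigned
    by each coder who rated that unit. Units with fewer than two codes
    cannot be paired and are skipped.
    """
    coincidences = Counter()  # o_ck: weighted counts of ordered code pairs
    for codes in units:
        m = len(codes)
        if m < 2:
            continue
        for c, k in permutations(codes, 2):
            coincidences[(c, k)] += 1.0 / (m - 1)

    totals = Counter()  # n_c: marginal total of each code
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())

    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k]
                   for c in totals for k in totals if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when only one code is used")
    return 1.0 - (n - 1) * observed / expected


# Illustrative data: two coders, ten units, binary codes (e.g. 1 = "dodged question").
coder_a = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
coder_b = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
alpha = krippendorff_alpha_nominal([list(pair) for pair in zip(coder_a, coder_b)])
print(round(alpha, 3))
```

In this toy example the coders agree on six of ten units, yet alpha is close to zero because most of that agreement would be expected by chance given how often each code is used. This is why raw percentage agreement is a weaker reliability criterion.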

4.2 Dictionary Methods

The potential benefits of computerization for text analysis were recognized very early on in the digital age. The General Inquirer (Stone, Dunphy, & Smith, 1966) and DICTION (Hart, 1984) were two early computer programs designed to analyze a collection of text documents, or a corpus, by performing word frequency counts. These dictionary methods scan through texts and count the occurrence of a user-specified set of key words, each with an associated category or score. The word counts are used to code passages of text or provide a relative measure of some construct for each passage, for example positive or negative tone (Grimmer & Stewart, 2013, 8). A wide variety of off-the-shelf dictionaries, ranging from the very general to highly domain-specific, have been applied to research questions in political science. One major source of generalized dictionary methodologies has been the discipline of psychology. Linguistic Inquiry and Word Count (LIWC) originated in the early 1990s as a dictionary for measuring thinking styles and positive and negative tone in open-ended writing samples (Pennebaker & Beall, 1986; Pennebaker & Francis, 1996; Pennebaker, Mayne, & Francis, 1997). Since then, it has been extended to 80 linguistic features showing correlations with psychological phenomena across dozens of experimental studies (Tausczik & Pennebaker, 2010, 39–42); in political science, it has been applied to analyze U.S. presidential and vice presidential candidates’ speaking styles (Slatcher, Chung, Pennebaker, & Stone, 2007), speeches by Rudy Giuliani (Pennebaker & Lay, 2002) and Alan Greenspan (Abe, 2011), and politicized tweets (Tumasjan, Sprenger, Sandner, & Welpe, 2010).
In computer science and industry, expansive dictionaries for measuring sentiment have emerged for sentiment analysis or opinion mining applications, for example for deriving numerical ratings from product reviews and consumer feedback (Esuli & Sebastiani, 2007; Ohana & Tierney, 2009; Taboada, Brooke, Tofiloski, Voll, & Stede, 2011). On the other end of the scale are domain-specific dictionaries designed for classification within particular disciplines and contexts. DICTION, for example, originated as an analytical tool for American presidential speeches, and defines constructs such as “optimism” with reference to literature from political theory and sociology (Hart, 2001). Lexicoder, and the related Lexicoder Sentiment Dictionary, are designed for coding and sentiment analysis of political speeches and media articles; notably, Lexicoder leverages the topic coding scheme from the Comparative Agendas Project (Albaugh, Sevenans, & Soroka, 2013; S. Soroka, Stecula, & Wlezien, 2015; Young & Soroka, 2012). In general, dictionary methods are easy to build on top of existing content analysis studies by extrapolating from human-coded text samples (Grimmer & Stewart, 2013, 8). Dictionary approaches are easy to apply and relatively intuitive to interpret; however, they can be methodologically problematic. From an epistemological perspective, the construct measured by counting word frequencies within categories is questionable given that the ratio of words to clauses varies across texts in seemingly random fashion (Ball, 1994, 299). In general, it is difficult to assess the reliability of dictionary count methodologies against gold standards, which limits their research usefulness to reporting descriptive results (Grimmer & Stewart, 2013, 9). Dictionary methods are also highly context-dependent and can produce meaningless results when applied across subject domains.
For example, Loughran and McDonald (2011) test a standard sentiment dictionary on company earnings reports and find that 73.8% of words rated negative by the dictionary (such as “cost,” “crude [oil],” or “liability”) are tonally neutral in a financial context. The construct measured by an aggregated tone score is also unclear and has no empirical equivalent in human scale; is “happy” a more positive word than “evil” is a negative one, and if so, by how much (Young & Soroka, 2012, 209)? Even domain-specific dictionaries intended simply for categorization have limitations. A political science dictionary developed for American campaign speeches such as DICTION has limited validity when applied to, for example, British parliamentary debates. Context-specific applications of dictionary methods also suffer from a lack of validity across time. Relevant to my dissertation, dictionaries are unsuitable for application to texts older than the original source material of the dictionary. Lexicoder, for example, provides a dictionary of Canadian political terms for coding of text into Comparative Agendas categories; it contains names of specific policies such as “HEALTH AND SOCIAL TRANSFER,” issues such as “F-35,” and actors such as “TD CANADA TRUST” (Albaugh et al., 2013) that are valid measures of their representative policy codes only within a relatively narrow span of history. The more specific a dictionary, the more careful updating it will require in order to handle unforeseen words and potentially categories, which raises issues of “backwards-compatibility” of measurement constructs and the research findings that support them (Slapin & Proksch, 2008, 707). From the standpoint of measuring accountability, a dictionary that could capture this dialogic and contextual relationship between government and opposition and do so with validity across time would be extremely difficult to compile.
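To make the mechanics of a dictionary method concrete, the following is a minimal sketch of keyword counting with an associated score per word. The six-word tone dictionary, the tokenizer, and the function names are hypothetical illustrations only; they are not drawn from LIWC, DICTION, or Lexicoder.

```python
import re
from collections import Counter

# Hypothetical toy dictionary: each key word carries a tone score.
TONE_DICTIONARY = {
    "good": 1, "excellent": 1, "support": 1,
    "bad": -1, "failure": -1, "scandal": -1,
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def dictionary_tone(text, dictionary=TONE_DICTIONARY):
    """Count dictionary hits and return a length-normalized net tone score."""
    tokens = tokenize(text)
    hits = Counter(t for t in tokens if t in dictionary)
    positive = sum(n for w, n in hits.items() if dictionary[w] > 0)
    negative = sum(n for w, n in hits.items() if dictionary[w] < 0)
    net = (positive - negative) / len(tokens) if tokens else 0.0
    return {"positive": positive, "negative": negative, "net_tone": net}


result = dictionary_tone("The minister's failure is a scandal, a bad failure.")
print(result)
```

The sketch also shows why such scores are context-dependent: any word absent from the dictionary contributes nothing, and any word present is scored identically regardless of the passage in which it occurs.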

4.3 Lexicographic and Scaling Methods

A diverse set of text as data methods that improve upon dictionary methods by incorporating linguistic data and statistical modelling have emerged out of political science in the last decade. One such tactic is to utilize techniques from the discipline of linguistics such as corpus linguistics and lexicographic analysis. Savoy (2010) compares a corpus of speeches by presidential candidates John McCain and to a standard linguistics reference corpus, the Brown corpus, to test for significant over- and under-use of words. Within Canadian political science, lexicographic methods have been applied to Speeches from the Throne (Monière & Labbé, 2014), the electoral speeches of (Labbé & Monière, 2010), and fifty years of political leaders’ speeches in Canada, Quebec, and France (Labbé & Monière, 2003; Monière et al., 2008). This work finds that institutional constraints and, occasionally, personal style, are the primary influences on political vocabulary rather than partisan identification (Monière & Labbé, 2014, 260; Labbé & Monière, 2003, 159). In a massive corpus investigation spanning 130 years of the United States Congressional Record, Jensen et al. (2012) investigate the relationship between polarized discourse in Congress and in society by comparing partisan language in Congress with contemporary language in the Google Books corpus. In general, the major motivation of text analysis methodologies in political science has been the use of political speeches to measure ideological orientation on what is generally understood to be a left–right scale (Grimmer & Stewart, 2013, 25). As mentioned earlier within the context of the empirical study of legislative debate, scaling methods take inspiration from the use of roll call vote data and expert opinion surveys to construct spatial models of voting based on issue preferences (Clinton, Jackman, & Rivers, 2004; Poole & Rosenthal, 1985). 
The Wordscores method innovates upon dictionary methods by using “reference” texts for a given ideological position, for example speeches made by legislators identified via expert survey or voting patterns as strongly partisan, to construct application-specific dictionaries of ideological words. These dictionaries are then applied to “virgin” texts to measure their relative position vis-a-vis the reference texts, and a simple measure of uncertainty is calculated (Laver, Benoit, & Garry, 2003). For example, Weinberg (2010) uses Wordscores to analyze state governors’ “state of the State” speeches to assess variation in the left-right orientation of Democrats and Republicans across American states. In their study of parliamentary debate, Bäck and Debus also employ the Wordscores methodology to capture deviation from the party line in individual speeches (2016). However, the rudimentary method Wordscores implements to transform reference text scores can lead to inconsistent and potentially inaccurate results, even if we accept the validity assumption that the “reference” texts selected represent an appropriate baseline for measurement (Benoit & Laver, 2008; L. W. Martin & Vanberg, 2008).

An alternative approach to scaling party positions from text, Wordfish, emerged out of Proksch and Slapin’s aforementioned work on European parliamentary debate (Proksch & Slapin, 2009; Slapin & Proksch, 2008). As outlined in the previous chapter, roll call voting is often unsuited for measuring the ideological position of representatives in parliamentary systems with strong party discipline (Spirling & McLean, 2006a). Human-coded references for party positions, such as those of the Comparative Manifestos Project, are a potentially controversial data source due to measurement and external validity concerns (Lo, Proksch, & Slapin, 2016; Lowe et al., 2011). Instead, Wordfish defines a one-dimensional scale for party position scaling from text based solely on the texts themselves, by modelling word choice using a simple Poisson distribution and utilizing a naïve Bayes classification algorithm (Slapin & Proksch, 2008, 595). Monroe, Colaresi, and Quinn (2008) also address flaws in the Wordscores approach by evaluating a slate of dimensionality reduction, machine learning, and statistical modelling techniques for studying the relationship between party and word choice. One disadvantage of Wordfish is that it is unclear what exactly the dimension it recovers is measuring. Wordscores, at least, begins with the premise that a researcher supervises the construction of a left-right scale by using expert evaluation to set each end; for example, Bäck and Debus use data from the Party Manifestos project and party leader speeches to calibrate their scale. However, since Wordfish scales to a low-dimensional space based on the most disparate texts discovered in the dataset, the measurements it recovers must be fully elaborated post-hoc. It is often assumed (as in Proksch and Slapin’s work, for example) that scaling to a single dimension recovers traditional economic left-right positions.
However, research on Euroskepticism and European party positions on the left-right scale has uncovered a relationship between these two dimensions that varies across European countries, suggesting a single scaling dimension is an oversimplification (Hooghe, Marks, & Wilson, 2002; Nanni et al., 2018; Proksch & Slapin, 2010). Overall, scaling methodologies such as Wordfish and Wordscores are not a good candidate for measuring accountability in a Westminster parliamentary system. Their assumption of a latent ideological scaling dimension is problematic and has measurement validity issues when applied to this context. Again, given the dialogic relation of accountability between government and opposition, the Wordscores methodology is inappropriate given its reliance on specifying reference texts for scaling; it would be difficult to identify speeches that sufficiently exemplify accountability out of context and across time. On this note, scaling methodologies also rely on an assumption of the stability of the reference dimension over time that would be problematic for a historical analysis across many parliaments, paralleling a previously-discussed flaw with dictionary approaches.
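The dependence of Wordscores on reference texts can be seen in a minimal sketch of its core scoring step. This follows the basic logic described above (word scores as position-weighted averages over reference texts, then a frequency-weighted mean for the unscored text) and omits the uncertainty measure and the rescaling transformation. The toy texts, positions, and function names are hypothetical.

```python
from collections import Counter


def relative_frequencies(tokens, vocabulary=None):
    """Relative word frequencies, optionally restricted to a vocabulary."""
    counts = Counter(t for t in tokens if vocabulary is None or t in vocabulary)
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()} if total else {}


def word_scores(reference_texts, reference_positions):
    """Score each word as the expected reference position given that word."""
    freqs = {r: relative_frequencies(toks) for r, toks in reference_texts.items()}
    vocabulary = set().union(*[set(f) for f in freqs.values()])
    scores = {}
    for w in vocabulary:
        total = sum(f.get(w, 0.0) for f in freqs.values())
        scores[w] = sum(freqs[r].get(w, 0.0) / total * reference_positions[r]
                        for r in freqs)
    return scores


def score_virgin_text(tokens, scores):
    """Frequency-weighted mean of word scores; unscored words are ignored."""
    freqs = relative_frequencies(tokens, vocabulary=scores)
    return sum(f * scores[w] for w, f in freqs.items())


# Hypothetical reference texts anchored at -1 (left) and +1 (right).
references = {
    "left": "tax public public health".split(),
    "right": "tax market market freedom".split(),
}
positions = {"left": -1.0, "right": 1.0}
scores = word_scores(references, positions)
virgin_score = score_virgin_text("public market market unknown".split(), scores)
print(scores["tax"], virgin_score)
```

Every number produced here is defined relative to the two reference texts: a word that never appears in them (“unknown”) is silently dropped, which illustrates why the choice of reference texts carries so much weight.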

4.4 Supervised and Unsupervised Machine Learning

Textual scaling approaches can also be understood from an alternative disciplinary perspective as part of the family of machine learning methods in computer science. Machine learning is a general name for automated approaches to generalization from data. Machine learning models fall into two general types: supervised and unsupervised. In general, supervised models are appropriate for tasks where we already possess some labelled data sample, or training set, and wish to learn a general rule linking inputs to their labels. In the categorical case, this problem is known as classification; for the continuous case, it is known as regression (Murphy, 2012, 2). In many cases, supervised learning can be understood as a different disciplinary language for describing familiar techniques of linear, multiple, and logistic regression from a probabilistic perspective (Hastie, Tibshirani, & Friedman, 2009, 9). Unsupervised methods are primarily used for data exploration, with a goal of finding interesting patterns within some unlabelled set of data automatically. The choice between supervised and unsupervised machine learning methods is generally a matter of which approach best fits the research question; they should not be considered competing, but rather complementary, techniques (Grimmer & Stewart, 2013, 15). One simple example of supervised text classification is a filter for email spam. First, a sample set of emails is coded by hand as “spam” or “not spam”. Each email is converted to an unordered list of words called a bag of words representation and transformed into a numerical vector to make calculations simpler. Then, a classification algorithm is given this training set as input. As it processes documents in the training set, the algorithm gains evidence that words such as “cheap”, “casino”, and “viagra” are highly likely to occur in spam emails and unlikely to appear in normal emails.
With enough training data, the algorithm will be able to flag new incoming emails as either spam or not spam based upon words in the new emails and the associated probabilities. In the case of automated content analysis, a training set comprising some random sample of the overall corpus is human coded according to a predefined scheme. The size of the training set must be carefully selected in order to minimize classification error and maximize validity. If the training set is too small, the model will underfit and lack enough information to classify the main data set; if it is too large, the model will overfit and map too closely onto spurious noise in the training data (Murphy, 2012, 22). After the training set is ready, the learning algorithm is then trained using some percentage (typically 80%) of the training set (Murphy, 2012, 23). If the training set is large enough, the “held out” remainder is utilized as a validation set to assess performance of the trained model. If the training set is too small to set aside a validation set without risking underfitting the model, then a technique called cross validation can be used to assess performance across different permutations of validation sets from within the training data (Murphy, 2012, 23–24). Finally, the trained model can be applied to the uncoded main dataset and the results interpreted. A number of political science studies have applied supervised learning techniques similar to the above example to successfully code political texts; an early and prominent example is King and Lowe’s automated detection and coding of international conflict events from news releases (King & Lowe, 2003). A variety of classification algorithms are available to researchers, and the choice is generally dictated by the features of the problem to be studied. 
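The spam filter example and the held-out validation step described above can be sketched as follows. This is a minimal illustration using a multinomial naïve Bayes classifier with add-one smoothing, written from scratch; the ten toy emails, the 80/20 split, and the function names are assumptions made for this example only.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Tabulate class counts and per-class word counts from labelled documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(documents, labels):
        word_counts[label].update(tokens)
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocabulary


def predict(model, tokens):
    """Return the label with the highest log posterior under the model."""
    class_counts, word_counts, vocabulary = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n / n_docs)  # log prior
        for w in tokens:
            if w in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1)
                                  / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-coded emails; the last two are held out for validation.
coded = [
    ("cheap casino offer", "spam"), ("cheap viagra now", "spam"),
    ("casino bonus cheap", "spam"), ("win cheap viagra", "spam"),
    ("committee meeting agenda", "ham"), ("question period agenda today", "ham"),
    ("meeting notes attached", "ham"), ("budget debate today", "ham"),
    ("cheap casino bonus", "spam"), ("committee agenda today", "ham"),
]
train, held_out = coded[:8], coded[8:]
model = train_naive_bayes([text.split() for text, _ in train],
                          [label for _, label in train])
accuracy = sum(predict(model, text.split()) == label
               for text, label in held_out) / len(held_out)
print(accuracy)
```

With so few documents the validation accuracy is not informative in itself; the point is only the workflow of coding, training on one portion, and checking performance on the portion that was set aside.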
Naïve Bayes classifiers, employed by Monroe, Colaresi, and Quinn (2008), are very simple probabilistic models that rely upon a strong assumption of data independence; in practice, they perform surprisingly well especially when the size of the training set is limited (Murphy, 2012, 84). The support vector machine (SVM), a much more complex non-probabilistic algorithm, is particularly popular for text analysis in the literature. When applied to classification, the SVM searches for an optimal separation between classes of data points in high or n-dimensional space (Hastie et al., 2009, 417). Unlike naïve Bayes algorithms, SVM requires a large training set due to its high dimensionality; however, this limitation is minimized for textual data where one may have millions of words as data points (Joachims, 1998, 3). Diermeier, Godbout, Yu, and Kaufman (2012) use an SVM to extend scaling methods and investigate the content of highly partisan language in a study of the 101st to 108th US Senate proceedings. Klebanov, Diermeier, and Beigman (2008) employ another supervised learning method called decision trees to investigate the lexical cohesion of speeches by Margaret Thatcher as an indicator of cohesion in belief systems (Converse, 1964). As outlined in Chapter 3, Peterson and Spirling (2018) test multiple supervised learning algorithms to measure classification accuracy of partisan speeches over time in the British House of Commons.

Unsupervised machine learning methods are typically used for data exploration. No training set is necessary: an algorithm processes a raw dataset and attempts to look for “interesting patterns” (Murphy, 2012, 2), such as separate clusters of data points, each grouped closely around its own mean (as in k-means clustering). Unsupervised methods save the difficulty and effort of constructing a human-labeled training set for each problem; on the other hand, the external validity of unsupervised results has fewer guarantees than in the supervised case, and requires stronger justification by the researcher. One task well-suited to unsupervised learning is dimensionality reduction, useful for simplifying very large data sets according to latent common features. Wordfish, for example, can be considered a simple unsupervised approach for scaling party positions from text to a low number of dimensions (Grimmer & Stewart, 2013, 26). Topic modelling is a name for a class of unsupervised machine learning algorithms for dimensionality reduction that model text documents as a joint statistical distribution across words and topics. As with many of the text-as-data methods described above, the underlying statistical techniques can be complex but are not unfamiliar to quantitative political scientists. In the next section of this chapter, I take the example of latent semantic analysis (LSA) to introduce and explain the fundamentals of a vector-based model of textual data from a quantitative political science perspective.

4.5 Latent Semantic Analysis

Imagine you are a teaching assistant in a Canadian politics course. Your students have been assigned to write a paragraph on the Canadian electoral system and you are marking their responses. As you read dozens of examples, you will see the same concepts selected for discussion over and over: “single member plurality,” for example, might be a phrase employed in nearly every single assignment. On the other hand, only a handful of students might choose to discuss voter identification requirements or procedures for handling spoiled ballots. Hypothetically, as the number of writing samples you receive approaches infinity, you could imagine the distribution of topics as representing the universe of what students know and can describe about the Canadian electoral system. This “semantic space” is in fact what guides your marking process: for an answer to be adequate, you expect to see a certain set of the top substantive points covered by the student’s answer. Latent semantic analysis is a dimensionality reduction method for summarizing the semantic features of a corpus of documents. LSA can be thought of as a generalization of factor analysis to textual data (Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990, 395). Factor analysis, a standard methodology in quantitative political science, proceeds as follows: a matrix of potentially interrelated variables is reduced, or decomposed, in order to isolate the smaller set of variables that are most responsible for the structure observed in the data. Essentially, LSA adds an additional layer onto machine learning models of text by mapping words (known as “tokens” in the language of computational linguistics) onto a numerical representation, then using this mapping to represent documents as high-dimensional vectors of token counts.
The resultant term-document matrix (TDM) represents each document in the corpus to be studied as a column, and each word that appears anywhere in the vocabulary of the corpus as a row; each cell thus counts the frequency of a given vocabulary word in a given document (Landauer, Foltz, & Laham, 1998, 8). The next step, as in some approaches to factor analysis, is to perform singular value decomposition (SVD) on the term-document matrix. This represents it as the product of three component matrices: the orthogonal matrix of row (or term) vectors, the orthogonal matrix of column (or document) vectors, and a diagonal matrix of scaling values. The latter contains the coefficients of the relationships between the rows and columns, and can be reduced by deleting less important coefficients to yield a more parsimonious model (Landauer et al., 1998, 8). In sum, “LSA represents the meaning of a word as a kind of average of the meaning of all the passages in which it appears, and the meaning of a passage as a kind of average of all the words it contains” (Landauer et al., 1998, 6). The correlation matrix of term-document relationships characterizes the corpus’ latent semantic space of common meanings; each reduced-dimensionality term or document vector can thus be understood as a linear combination of some “amount” of each semantic meaning present in the corpus (Deerwester et al., 1990, 393, 395). These term or document vectors can also be used to compute useful measures of similarity; as I will describe later in this chapter, this is the starting point for the methodology I apply in this dissertation (Landauer et al., 1998, 16). LSA is also known as “latent semantic indexing,” or LSI, because of its usefulness in searching, indexing, and document retrieval applications (Deerwester et al., 1990). However, like dictionary methods, its application has also been strongly influenced by the discipline of psychology.
LSA has been employed in a range of psycholinguistic modelling applications including childhood language acquisition, student essay writing, learning and comprehension of texts, and characteristic word use patterns of neuropsychological conditions (Laham, 1997; Landauer & Dumais, 1996; Landauer et al., 1998; Miller, 2003; Rehder et al., 1998). Relevant to political science, LSA has shown very strong results as a predictive model for the priming effects of individual words on semantic meaning (Landauer & Dumais, 1997). In short, while LSA implements only a basic model of the semantic layer of text, it demonstrates how useful semantic information can be mathematically inferred from words without parsing their grammar. However, LSA suffers from limitations due to its incomplete mathematical and theoretical foundations; the explanation as to why necessitates a short detour into probabilistic linguistics.
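The term-document matrix and its truncated SVD can be sketched in a few lines. The example below assumes the NumPy library is available and uses four made-up “student answers” echoing the teaching assistant example; the choice of k = 2 retained dimensions and the variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical mini-corpus: two answers on the electoral formula, two on ballots.
documents = [
    "single member plurality electoral system",
    "plurality electoral system ridings",
    "spoiled ballots voter identification",
    "voter identification ballots procedures",
]

# Term-document matrix: one row per vocabulary word, one column per document.
vocabulary = sorted({w for d in documents for w in d.split()})
tdm = np.array([[d.split().count(w) for d in documents] for w in vocabulary],
               dtype=float)

# Singular value decomposition: tdm = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(tdm, full_matrices=False)

# Keep only the k largest singular values to obtain reduced document vectors.
k = 2
doc_vectors = (np.diag(s[:k]) @ Vt[:k, :]).T  # one k-dimensional row per document


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


same_topic = cosine(doc_vectors[0], doc_vectors[1])
different_topic = cosine(doc_vectors[0], doc_vectors[2])
print(round(same_topic, 3), round(different_topic, 3))
```

In the reduced space the two electoral-formula answers become nearly indistinguishable even though they share only three of their words, while answers on different topics remain orthogonal. This is the sense in which the retained dimensions summarize latent common meanings rather than exact wording.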

4.5.1 Probabilistic Linguistics

In general, observations of frequency effects in the cognitive processes of language, including language acquisition and recall, support the assertion that language is probabilistic and can be usefully studied from a probability perspective (Bod, Hay, & Jannedy, 2003). Early seminal work on word frequency distributions by Zipf (1935, 1949) investigated the relationship between word frequency and frequency rank. Frequency rank is simply a descending rank of all words in order of their total frequencies: that is, if “the” and “and” are the first and second most common words in a sample text, they are assigned rank 1 and 2 respectively. Zipf observed that empirical word frequencies obey a simple form of power law: the frequency of any given word is inversely proportional to its rank z. Intuitively, a very small number of words are used extremely frequently in natural language, while the vast majority of words are used infrequently. More formally, let f_z(z, N) equal the frequency of a word of rank z in a sample of size N (notation from Baayen, 2001):

\[ f_z(z, N) = \frac{C}{z^{a}} \tag{4.1} \]

In the simplest form of the law, as described above, a = 1. Taking the log of this equation will yield a linear relationship, where a determines the slope and C is a normalizing constant:

\[ \log f_z(z, N) = \log C - a \log z \tag{4.2} \]

As a visual example, the two plots in Figure 4.1 show f_z(z, N) and log f_z(z, N) for a sample dataset of Charles Dickens novels using the R package zipfR (Evert & Baroni, 2007).

Figure 4.1: Frequency vs. word rank distributions from zipfR Dickens data, linear (left) and logarithmic (right) scales. [Plots not reproduced; axes are frequency f_z(z, N) against rank z (left) and log frequency against log rank (right).]

This power law observation has empirical applications to a surprisingly wide range of human behaviour, from social network friendships to income inequality to population distribution. In many such situations, a few members are extremely popular but popularity decays steeply with rank, resulting in a skewed distribution with a long tail of unpopular members. Zipf’s law (and the related zeta distribution of probabilities) has significant limitations and is only a starting point for statistically modelling natural language. However, the key observation is that word frequency distributions are not normal, word choice is not purely random, and sample size matters to the shape of word frequency distributions, making it inappropriate to use standard statistical techniques which assume a Gaussian distribution (Baayen, 2001, 32, 34).
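The linear relationship in equation 4.2 suggests a quick check of Zipf’s law on any list of word frequencies: regress log frequency on log rank and read off the slope. The sketch below does this by ordinary least squares on synthetic frequencies constructed to follow the law exactly with a = 1; it is a rough diagnostic under that assumption rather than a principled estimator, and the function name is hypothetical.

```python
import math


def fit_zipf(frequencies):
    """Least-squares fit of log f = log C - a log z; returns (a, C).

    `frequencies` are word counts; they are sorted into descending
    order so that position in the list corresponds to frequency rank z.
    """
    ranked = sorted(frequencies, reverse=True)
    xs = [math.log(z) for z in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic counts following f = 60 / z for ranks 1 to 6.
a, C = fit_zipf([60, 30, 20, 15, 12, 10])
print(round(a, 3), round(C, 3))
```

On real text the fitted exponent will deviate from 1 and will depend on sample size, which is one of the limitations of the simple law noted above.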

4.5.2 Limitations of LSA

Based upon the prior discussion, LSA suffers from two drawbacks as a methodology for understanding semantic patterns in documents. The first is practical: if human language generally follows Zipf’s law, then the majority of words employed in documents will tend to be overwhelmingly similar, such as “and” and “the,” potentially obscuring relevant information. This problem is exemplified by LSA’s applications in indexing. If you were creating an index to this dissertation, relatively frequent words such as “it” and “and” would be useless to someone attempting to navigate the text. On the other hand, including only the most unique terms found in the dissertation would also make for a poor index. Words that occur only once or twice in the dissertation are very unlikely to be thematically important to a reader. LSA applications address this issue practically through text preprocessing measures and weighting functions, as do many other machine learning analyses of text. Preprocessing both improves the qualitative performance of these algorithms in applications like indexing and prediction, and also has the benefit of reducing computational time. The best practices of preprocessing are quite standard in the literature (Lucas et al., 2015, 4–5). The most widely used is stemming, a simpler relative of lemmatization: the removal of word suffixes to yield a collection of more general root words. The Porter stemming algorithm is standard for English language applications (Porter, 1980). At this stage, simple conversions such as removal of numbers and punctuation and a transformation to lower case are typically performed. After these preprocessing steps, nonprobabilistic methods such as LSA must employ some weighting or transformation procedure to cull very frequent and very infrequent words; this is essentially an initial dimensionality reduction step.
One approach is to simply remove the most and least frequently occurring terms across the whole corpus according to some researcher-specified higher and lower bound. Another is to remove a standard list of frequent stopwords such as “and” and “or” (Grimmer & Stewart, 2013, 7). In many LSA applications a log occurrence transformation is used that standardizes words according to their entropy, an information gain measure defined as the “sum of the probability of each label times the log probability of that same label.” (Landauer & Dumais, 1997, 17). The most popular such transformation in the literature is the tf-idf, or term frequency-inverse document frequency, transformation, a metric inspired by Zipf’s law (Sparck Jones, 1972). For each document in the corpus, the frequency of each unique term that appears within it is tabulated. For each term, the inverse document frequency is calculated as the log of the total number of documents divided by the number of documents that contain that term at least once. Multiplying these two numbers together yields a tf-idf value that expresses the “specificity” of a term-document pair: that is, a high tf-idf value means that term appears highly frequently in that document, but appears very infrequently in other documents in the corpus (Blei, Ng, & Jordan, 2003, 994). A cutoff value for tf-idf is thus useful for removing terms that appear frequently across many documents, or that appear very infrequently overall. For all these weighting methods, close familiarity with the data, precedent from the literature, and judgement are critical to achieving a compromise between validity and precision. The second issue is a fundamental flaw specific to LSA. In the early 1990s, LSA was made possible by advances in computational speed and algorithmic design that permitted efficient computation of SVD for very large matrices (Deerwester et al., 1990, 394–395). 
The use of SVD in LSA enforces an implicit assumption that both dimensions are drawn from a constant multivariate normal distribution, since they can be represented as a combination of mean (via the row and column matrices) and variance (via the covariance matrix) alone (Hofmann, 1999, 291). However, as discussed in the previous section, an assumption of normality is empirically unsupported for describing word choice. One possibility is to perform more complex preprocessing and weighting steps to approximately normalize the term-document matrix. However, given advances in computational speed since the 1990s, it would be more valid and more accurate to model word choice and document content according to some set of probability distributions that can flexibly represent topical dimensions of human language use. The second, more general advantage of a probabilistic approach is that it allows us to quantify our uncertainty about topics. LSA does not allow us to, for example, test hypotheses or make scientific statements about topical clusters, as they are themselves correlation coefficients. Taking this step generally necessitates a Bayesian statistical approach, exemplified by latent Dirichlet allocation (LDA).
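To make the tf-idf transformation described earlier in this section concrete, the following is a minimal sketch in Python. It assumes raw term counts for the term frequency and the natural logarithm for the inverse document frequency; the function name `tf_idf` and the three-sentence toy corpus are hypothetical and are not drawn from the study dataset.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: tf-idf weight} dict per tokenized document.

    Assumption: tf is the raw count of a term in a document and
    idf is log(N / number of documents containing the term).
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each term appear at least once?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        term_freq = Counter(tokens)  # raw count of each term within this document
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return weights


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the minister answered the question".split(),
    "the member asked the minister".split(),
    "the budget reduced the deficit".split(),
]
scores = tf_idf(corpus)
```

In this toy corpus a word that appears in every document, such as “the,” receives a weight of zero, while a word confined to a single document receives the largest weight, which is the “specificity” behaviour described above.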

4.6 Measuring Lexical Similarity: Cosine Similarity and the Term-Document Matrix

I do not use the LSA methodology in this dissertation. However, I do make use of the basic methodological building blocks upon which it is based. As discussed in the previous chapter, I make a distinction between lexical and semantic similarity. For the former, I rely on cosine similarities based on the TDM. For semantic similarity, I employ a method that improves upon the limitations of LSA: document embeddings.

To calculate lexical similarity, I compare document vectors across government and opposition speeches in a manner consistent with my empirical understanding of parliamentary accountability. To begin, I use a simple hashing vectorizer (a more memory-efficient approach to count vectorization) to construct a TDM and standardize these vectors using L2 normalization.1 Prior to vectorization, I perform some of the standard text preprocessing methods including stemming, lowercasing, and removal of numbers and punctuation. However, I neither remove stopwords nor transform the term-document matrix using an algorithm such as tf-idf to reduce the influence of frequently-used words on the results. As discussed above, very common words constitute noise in a topic modelling or classification model. However, there are important reasons for their inclusion in an empirical measurement of accountability. First, the psychology literature has documented how linguistic style matching between conversation participants, especially in use patterns of common words like pronouns and articles, is correlated with social dynamics of coordination and engagement (Niederhoffer & Pennebaker, 2002). Empirical studies have observed this relationship in a variety of contexts relevant to modelling parliamentary debate, including political speechmaking, conflict management, and deception scenarios (Hancock, Curry, Goorha, & Woodworth, 2007; Romero, Swaab, Uzzi, & Galinsky, 2015; Taylor & Thomas, 2008). The retention of stop words in documents thus substantially increases the amount of raw data available in our linguistic model (recall Zipf’s law from earlier in this chapter) in an empirically useful manner. Second, a procedure like tf-idf applied to a large, historical debate dataset would also overweight unique content words, such as those related to rare political events.
While I choose not to remove stop words from my dataset, I remove their opposite, namely extremely infrequent words. Practically, such words almost always represent OCR errors in a digitized historical dataset such as the one I employ in this study. I use a lower bound of 50 occurrences across the corpus as my removal cutoff.2 Denny and Spirling point out that, ideally, preprocessing choices (for example, lowercasing and stopword removal) should be informed by theory rather than literature conventions, especially when those conventions are from different methodological and disciplinary domains bearing large unexamined assumptions (Denny & Spirling, 2018, 3). To summarize, I employ a vectorization procedure intended to better capture consistent patterns of lexical similarity based on my measurement model of accountability, rather than, for instance, amplify lexical distinctiveness for purposes of classification.3 Calculating a similarity matrix across more than two million documents, each containing tens of thousands of vocabulary dimensions, is computationally demanding. To reduce the size of this practical problem in a way that makes theoretical sense, I concatenate individual speeches to match my level of analysis prior to vectorization. In the Question Period dataset, I aggregate all government speeches, and all official opposition speeches, from a given day’s Question Period debate into two “documents” for comparison. In the larger daily debate dataset, I perform a similar aggregation of all government and

all official opposition speeches in a given day.4 In this case, each day of debate (less those days with fewer than 10 speeches, to again account for OCR errors) is associated with one score that represents the mean similarity between each government speech and each opposition speech that took place that day. To perform the similarity score calculation, I employ a simple cosine similarity method. Cosine similarity is equal to the dot product of two vectors over the product of their magnitudes, as in Equation 4.3, or in other words the dot product of two unit-length document vectors.

1Normalization essentially allows for the valid comparison of document vectors independent of their original word counts (in other words, their vector lengths or magnitudes). L2 normalization rescales vectors onto the Euclidean unit sphere: each vector is divided by its L2 norm, which in practical terms is calculated as the square root of the sum of the squared vector values. L2 normalization at this stage also simplifies the later computation of cosine similarity, given that the latter reduces to the dot product between two unit-length vectors.
2This value was selected through a combination of manual review of a list of unique tokens and their frequencies, and simulations of the effect of different removal cutoffs on dataset size.
3See (Monroe et al., 2008) for further discussion of these methodological issues, and an alternative probabilistic approach to regularization.

\[
\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} \tag{4.3}
\]

The more the directions of two document vectors diverge in unit space, the lower the similarity between the two documents. The length of the documents does not affect these similarity values, so my approach of concatenating speeches (for example, on a per-party-per-debate basis) requires no adjustment or weighting to compensate for the reality that governments have more time to speak in Parliament. Values for cosine similarity across document vectors are never negative, since the vectors represent counts of tokens in documents and there is no such thing as a negative word count. Thus, the values I report for cosine similarities range from 0 to 1.

To analyze the impact of independent variables of interest on lexical similarity scores, I use hierarchical mixed effects models, a type of generalized linear model (Gelman & Hill, 2006). The independent variables I include in these models are majority/minority status (as a dummy variable), percentage of seats in the House of Commons held by the government, and percentage government poll popularity (including a t − 1 lagged value), depending on the particular proposition I test (see Chapter 3). To account for unobserved variation in political and historical circumstances during the period under study, I also include random effects terms that reflect the hierarchical structure of the data: daily observations are nested within months and years, but also within parliaments and sessions. Based on the level of analysis (either by quarter or by parliamentary session), I include a random effect for sessions within parliaments, or parliaments alone.5 For more technical details including software packages used in this analysis, see Appendix C.
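The vectorization and similarity steps described in this section can be sketched as follows. This is an illustrative simplification rather than the implementation used in the analysis (for which see Appendix C): it assumes an MD5-based bucket assignment as a stand-in for a production hashing vectorizer, and the function names, the bucket count of 1024, and the two toy “documents” are all hypothetical.

```python
import hashlib
import math


def hashed_counts(tokens, n_buckets=1024):
    """Count tokens into a fixed-length vector using the hashing trick.

    Assumption: MD5 is used only to obtain a deterministic bucket index;
    distinct tokens may occasionally collide in the same bucket.
    """
    vector = [0.0] * n_buckets
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vector[int(digest, 16) % n_buckets] += 1.0
    return vector


def l2_normalize(vector):
    """Divide a vector by its L2 norm (square root of the sum of squares)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


def cosine_similarity(a, b):
    """Cosine similarity as the dot product of two unit-length vectors."""
    unit_a, unit_b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(unit_a, unit_b))


# Hypothetical concatenated "documents", for illustration only.
government = "the minister will table the report in the house".split()
opposition = "will the minister table the report today".split()

score = cosine_similarity(hashed_counts(government), hashed_counts(opposition))

# Repeating one document changes its length but not its direction,
# so the similarity score is unaffected.
doubled = cosine_similarity(hashed_counts(government * 2), hashed_counts(opposition))
```

Because count vectors contain no negative entries, `score` falls between 0 and 1, and the comparison with `doubled` illustrates why concatenated speeches of unequal length need no further weighting.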

4.7 Measuring Semantic Similarity: Probabilistic Vector Representations of Words and Paragraphs

The methodological and measurement approach I have outlined above is the foundation of the majority of my quantitative analysis, and is appropriate for investigating the first four of the five questions I ask in Chapter 3. These empirical propositions concern the impact of independent variables like majority status and government popularity on parliamentary accountability, which I conceptualize for measurement as normalized lexical similarity between government and opposition speeches. However, my theoretical approach also proposes that accountability in parliamentary speeches is negatively related with ideological argument. That is, in deciding what to speak about in Parliament, MPs must balance the goal of being accountable or holding the government to account with the alternative aim of representing and defending distinctive policy positions. I proposed that an alternative approach to measuring linguistic similarity, one that captured semantic similarity, would be more appropriate for measuring ideology in parliamentary speech. Such a measurement task is difficult: there are multiple layers of endogeneity at work. In an adversarial parliament, accountability and ideological distinctiveness often go hand in hand, especially in two-party systems. Methodologically, linguistic and semantic similarity measures should not differ substantially from each other in most applications, hence the popularity in the literature of using classification methods for studying ideological polarization.

4For comparison and verification, I also analyzed lexical similarity across all opposition and government, across parties in the opposition, and within government using both the Question Period and daily debate datasets. However, I focus primarily on the results obtained from the government/official opposition in my analysis as it is the most relevant level of comparison for studying parliamentary accountability. Additionally, as I will discuss in greater detail in the next chapter, including third opposition parties in the analysis had only a minor impact on results.
5More specifically, polling data is available on a quarterly basis while seat percentage and majority status vary primarily at the parliamentary level. The large number of daily observations in the dataset, however, threatens to artificially inflate the statistical significance of any result. To perform a conservative test, I average similarity scores at the quarter, session, and parliament level where appropriate prior to fitting the model.
As discussed in the previous chapter, my approach is based on the premise that MPs (or, at least, their partisan speechwriters) hold consistent ideological views that structure their understandings of word meanings. Measured across many speeches, a party’s selection of particular words and their combination into phrases will reflect ideologically-rooted differences in meaning. The starting point is similar to Peterson and Spirling’s model of “left” words and “right” words. However, I expand the construct to include contextual collocations such that there are left word structures and right word structures, or ways of using words together with other specific words in sentences, on average. I argue that if two parties use a given word at the same rate, but speak about that word in a significantly different manner, we have observed an indication of ideological disagreement. The measurement approach I employ to investigate this model of ideology in language is drawn from the natural language processing literature and involves modelling words as a function of their surrounding partners, known as word embeddings. To recap, in the count-based approach I take above, tokens are mapped onto a numerical representation across the entire corpus, which is then used to represent individual documents as vectors of word counts. The similarity of two documents, or two words, can then be calculated based on the distance between their respective vector representations. Fundamentally, LSA is likewise a simple count-based approach: word vectors are defined with respect to a frequency count of their occurrence in documents. However, newer computational approaches exist for modelling word and document vectors probabilistically in a multidimensional space dependent upon their frequent close neighbours within some threshold of proximity.
A popular algorithmic approach developed by engineers at Google is word2vec, which uses an unsupervised neural network learning model or “deep learning” approach and is computationally much faster than LDA topic modelling at summarization of very large textual datasets (Mikolov, Sutskever, Chen, Corrado, & Dean, 2013). A competing approach to word embeddings for machine learning applications, GloVe, essentially combines the computational benefits of the count-based approach with some of the mathematical logic of the word2vec approach (Pennington, Socher, & Manning, 2014). The practical difference between the two methods is essentially a trade-off between a longer computational time (word2vec) and a higher memory footprint (GloVe). The computerized text analysis methods discussed so far in this chapter have relied on the assumption that text documents can be represented as an unordered “bag of words,” and that syntax has no impact on the meaning of a document (Jurafsky & Martin, 2000, 647). Individual tokens, referred to as unigrams, are typically the textual unit of analysis (Jurafsky & Martin, 2000, 195). Some researchers attempt to increase analytical leverage by studying small groups of words (bigrams, trigrams, or more generally n-grams of text) in order to include a basic level of syntactic information. However, such approaches introduce exponentially larger numbers of variables the longer the word group, representing a severe computational cost. There is general consensus in the computational social science literature that using n-grams provides only a minimal benefit compared to its efficiency cost, and that unigrams are a suitable unit in most applications when a bag-of-words approach is employed (Hopkins & King, 2010; C. D. Manning, Raghavan, & Schütze, 2008); this is the approach I follow in my lexical similarity analysis.
An embedding-based approach to word vectorization is an alternative method of incorporating syntactical information in the model of a word. In the skip-gram embedding model implemented by word2vec, each word W and its surrounding context C drawn from a dataset D are represented by vectors v_W and v_C. The context is modelled as a linear bag-of-words characterized by a size parameter k. For example, if k = 3, the context of the word W is of length 6 and consists of the three words preceding W and the three words following W. The algorithm learns the set of parameters for v_W and v_C that maximizes the probability they originated from the training dataset D (Levy & Goldberg, 2014). The word of interest W is thus “embedded” in a higher dimensional space of other words related by their proximities and likelihoods in the sample sentences upon which the model is trained. This approach to semantic recovery yields an interesting linear mathematical property of “additive compositionality” of word vectors. Essentially, such embeddings allow us to represent simple linguistic analogies or relations as vector addition problems with surprising accuracy. An example of this type of relation can be seen in Equation 4.4, which captures the meaningful concept of a Canadian provincial capital.
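The context window described above can be illustrated with a short sketch. It shows only how a word W is paired with its context C for a given k; it does not train any embedding, and the function name `skipgram_pairs` and the example sentence are hypothetical.

```python
def skipgram_pairs(tokens, k=3):
    """Pair each word with its context: up to k tokens before and k after.

    Words near the start or end of the sequence have shorter contexts,
    because fewer neighbours are available on one side.
    """
    pairs = []
    for i, word in enumerate(tokens):
        left = tokens[max(0, i - k):i]    # up to k preceding words
        right = tokens[i + 1:i + 1 + k]   # up to k following words
        pairs.append((word, left + right))
    return pairs


# Hypothetical example sentence, for illustration only.
sentence = "the honourable member asked the minister a direct question".split()
pairs = skipgram_pairs(sentence, k=3)
```

With k = 3, the word “asked” in this sentence is paired with a context of length 6: “the,” “honourable,” and “member” before it, and “the,” “minister,” and “a” after it.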

v(“toronto”) + v(“alberta”) − v(“”) ≈ v(“”) (4.4)

The mathematical representation of linguistic relations exemplified by Equation 4.4 is less intuitive than the word count vectors employed in the LSA approach. It is difficult to conceptualize what each extremely high dimensional word vector value “means”, and impossible to summarize exactly how it was derived. From a scientific perspective, deep learning models such as word2vec can be criticized for their “black box” generative processes; the process by which the final output is achieved can be impossible to trace in reverse for purposes of inference (Lipton, 2016). Embeddings are nevertheless extremely useful for prediction and classification tasks, such as generating search engine results or product recommendations, where obtaining the most accurate result (for example, “edmonton”) is more important than understanding its generative process. Users can also reduce the computational intensity of these prediction tasks by making use of pre-trained embedding models, developed using very large corpora such as Wikipedia or Google Books (Bojanowski, Grave, Joulin, & Mikolov, 2016). Finally, embeddings can also be aggregated at arbitrary levels, as well as combined with other vectorized features in models incorporating information about documents and their metadata. doc2vec, for example, is an algorithm that leverages an adaptation of the word2vec algorithm to perform a supervised learning task at the document level. In simple terms, doc2vec trains intermediate word2vec representations for words within tagged documents to generate a predicted document-level vector.6 Calculations of similarity or distance can then be performed across documents and document tags, or more complex clustering models can be trained using the vectors as inputs (Lau & Baldwin, 2016; Le & Mikolov, 2014). In the last stage of my analysis, I use doc2vec instead of a hashing vectorizer to generate document vectors, and perform the same cosine similarity calculations as before. 
6More specifically, I use the PV-DBOW algorithm, which generates paragraph or document vectors via an intermediate skip-gram vector representation of words (Le & Mikolov, 2014). For more technical details, please refer to Appendix C.

Initial research suggests that word embedding methodologies are an effective way of measuring ideological positions in political texts, presenting a promising alternative to scaling methods such as Wordscores and Wordfish (Iyyer, Enns, Boyd-Graber, & Resnik, 2014). Menini and Tonelli (2016) argue that both sentiment (speaking positively or negatively about a policy issue) and semantic content (in their model, the proposed solution to the policy issue) can express ideological agreement or disagreement. They identify two reasons why previous approaches to supervised classification of political positions have performed unrealistically. First, the relative difference between statements within a policy domain, rather than absolute positions on an abstract scale, is a more pertinent way of understanding political differences in points of view. Second, statements about policies must be examined at a higher level of analysis than individual words, as contextual information is critical to the task (Menini & Tonelli, 2016). These observations lead them to an approach to dataset construction very similar to the one I employ in my study: they select sets of text that share an appropriate context for comparison, then classify paired passages of these texts that represent exchanges or arguments. They study three categories of features and assess their performance on an ideological classification task: sentiment (methods of measuring positive or negative affect), semantic (including word embeddings and cosine similarities), and surface (including lexical overlap and use of negation words) (Menini & Tonelli, 2016, 5). They find that word embeddings are the best-performing feature in classifying ideological agreement and disagreement, although a combination of features was the most effective overall.
These findings support my epistemological claims about the importance of context and the distinction between lexical and semantic textual similarity in measuring ideology in political speeches, and confirm that word embeddings are a strong option for the measurement of ideological polarization. Menini, Nanni, Ponzetto, and Tonelli (2017) also expand on this research by testing a variety of topic modelling methods. Their goal is to increase the quality of data input to their ideological classification method by selecting sets of debate texts that share an underlying context and are therefore relevant for ideological comparison. Using data from the American political context, they find that agreement on a “political system” topic, concerning previous administrations and responsibilities, is much stronger than disagreement over other topics such as external affairs (Menini et al., 2017, 2942). This is additional evidence that debate surrounding the “political system” or institutional domain is structurally distinct from ideological disagreement even in non-parliamentary systems, and that semantic context is key to separating accountability from ideological disagreement for purposes of measurement. In my application, I train a doc2vec model on each individual speech, tagged by party status and date, in the daily debate dataset.7 Since this model represents a given word in terms of its window of contextual words, I cannot concatenate speeches into a continuous document prior to the training phase, since doing so would introduce false contextual relationships across the boundaries between distinct speeches. However, once the speech-level model is trained, I can concatenate government and opposition speeches on a Question Period or daily basis as before and use the model to generate inferred vectors for each new “document”; this is essentially identical to summing each independent speech vector for the government and for the opposition for that day.
Finally, I perform the same cosine similarity calculation to compare these pairs of vectors. Additional technical details are available in Appendix C. The scores that result, which I refer to as semantic similarity scores, can be compared in relative terms with their lexical similarity equivalents across time. For a small sample of speeches, these scores should generally be strongly positively correlated, as they are distinct but inherently related calculations applied to identical texts as data. For larger samples, as discussed in Chapter 3, I expect to observe a much weaker relationship.8 Any such discrepancy between these measures would be of great analytical interest as it would confirm that there are systematic differences between how parties use words and how they place these words in context, as discussed in Chapter 3.

7Note that in the training phase, I include speeches from all opposition parties, not just the Official Opposition, in order to incorporate as broad a variety of training data as possible.
8Technically, the effect of textual concatenation on the lexical similarity dataset will tend to emphasize similarity

4.8 Study Dataset

The data set I employ in this dissertation originated with the Dilipad digitization project, an international collaboration of research teams at the University of Amsterdam, Institute for Historical Research at the University of London, and the University of Toronto. At the latter, I participated in an interdisciplinary team of computer scientists, digital historians, and political scientists producing a complete, machine-readable digitization of the Canadian Hansard in English from 1900-1993. Each speech in the dataset is linked with associated metadata regarding the speaker’s personal data, political affiliation, and career. I have also extended the original Dilipad dataset from 1993 to the present with the addition of proceedings scraped from the API provided by openparliament.ca. This extended dataset can be accessed at www.lipad.ca (Beelen et al., 2017). For more details about the digitization process, please refer to Appendix E. For my analysis in this dissertation, I also wrote software to scrape additional data from online sources and incorporate it into a new revision of the Lipad database. I gathered additional data on elections and parliamentary sessions from the Library of Parliament’s ParlINFO database, allowing me to link additional information to speeches and party files such as parliamentary status, seat count, and electoral results (Library of Parliament, 2016). I performed additional OCR and structural corrections to increase the precision of identification of Question Period debates from the Lipad originals. I also gathered polling data from ODESI and CORA, two databases of Canadian opinion research results, for all dates available and for each party represented in the dataset, and linked it to party files in the Lipad database (Canadian Gallup Poll, 2007; Environics, 2010).
More specifically, I compiled all available results from Gallup and Environics polls, beginning in 1953, for the preferred party questions or the closest question asked to the following: “If an election were to be held tomorrow, which party would you vote for?”. Because data were collected at irregular intervals—sometimes monthly, sometimes quarterly—I collapse these polling data to a quarterly basis. Additional details about dataset filtering are available in Appendix C.
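One simple way to collapse irregularly spaced polls to a quarterly basis is sketched below. It assumes that polls falling in the same calendar quarter are averaged, which is an illustrative choice and not necessarily the procedure used for the study dataset (see Appendix C); the function name `to_quarterly`, the party label, and all poll values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def to_quarterly(polls):
    """Average poll results by (year, quarter, party).

    Each poll is a (date, party, vote share in percent) tuple.
    Assumption: a simple unweighted mean within each calendar quarter.
    """
    buckets = defaultdict(list)
    for poll_date, party, share in polls:
        quarter = (poll_date.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        buckets[(poll_date.year, quarter, party)].append(share)
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical poll results, for illustration only.
polls = [
    (date(1988, 1, 15), "Party A", 40.0),
    (date(1988, 2, 15), "Party A", 42.0),
    (date(1988, 3, 15), "Party A", 44.0),
    (date(1988, 5, 1), "Party A", 38.0),
]
quarterly = to_quarterly(polls)
```

In this toy example, the three monthly polls in the first quarter are reduced to a single quarterly value, while the lone second-quarter poll is carried through unchanged.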

across the most common words in the parliamentary dataset, resulting in scores closer to 1. In relative terms, the semantic similarity measure will emphasize difference because each individual word compared is represented by an additional layer of distinguishing information about its likely neighbours. Thus, although both measures will theoretically range (as do all cosine similarities) between -1 and 1, values for lexical similarity will likely exceed those of semantic similarities I observe. Given all speeches I study are in English and share a very consistent form and context, all scores are also likely to be greater than 0.

Chapter 5

Qualitative Validation: Opposition-Minister Exchanges in the 38th and 39th Parliaments

In Chapter 3, I proposed a linguistic similarity measure of parliamentary accountability. Underlying this measurement approach is a two-part theory. First, I conceptualize parliamentary speeches in terms of their content, reflecting a trade-off under which parties can choose either to emphasize accountability or ideology in their political communication depending on their collective political goals and the incentives they face. Second, I modelled the accountability component of this equation as a three-step process of information, questioning, and judgement, following Bovens’ empirical model of parliamentary accountability. To test this model and the propositions advanced in the previous chapter, I pursue a multi-staged approach, proceeding from the micro level of individual exchanges between MPs in government and the Official Opposition to the macro level of comparison across parliaments. First, to assess the external validity of both my theoretical approach and my measurement strategy, I begin with a qualitative study of how lexical similarity scores reflect substantive patterns in debate speeches. I perform a case study of two recent minority parliaments, employing close readings of speech exchanges on a variety of debate topics to explore the relationship between my quantitative accountability measure and the qualitative level of accountability in parliamentary speeches. Second, I shift to a fully quantitative approach and analyze lexical similarity scores within Question Periods from 1978 to 2010, and on a daily level from 1945 to 2015.

In this chapter, I examine the texts of debates from two recent minority parliaments: the 38th Parliament (Liberal) and 39th Parliament (Conservative). My goal is to qualitatively assess whether accountability in parliamentary speeches appears to be correlated with the similarity scores they yield. I selected these particular parliaments for this case study for two reasons.
The first is relevance for performing a validity assessment of the measure. Both were modern minority parliaments, occurring one after the other, with an alternation of power and Official Opposition status taking place between the Liberals and Conservatives. I select two minority parliaments because of my theoretical expectations that parliamentary accountability will be stronger under these conditions, and thus more qualitatively visible in textual examples. The transition in government also permits me to investigate whether party has an effect on this relationship. The second is that accountability was a central political issue within both cases. The sponsorship scandal that contributed to the defeat of ’s Liberals informed a Conservative electoral platform and later approach to governance that focused on the accountability of elected officials.

Developing a “gold standard” of what parliamentary accountability looks like to a human reader of Hansard is a difficult task. I take the following methodological approach to the qualitative verification of my accountability measure. Following the logic of Bovens’ model, the steps of parliamentary accountability should be observable within passages of exchange on a given topic, with individual ministers providing information and justification and opposition members questioning these rationales. The closest real-world situation to this theoretical model in the Canadian House of Commons is Question Period, the daily 45-minute period during which opposition party members are able to question the government. I expect that when opposition members ask reasonable, answerable questions, and in turn when ministers address their key points with appropriate information and justification, a higher lexical similarity score will result between the question and answer speeches. When an opposition member asks an ideological (or orthogonal) question or a minister dodges a legitimate matter of parliamentary accountability, or both, I expect a lower similarity score will result. Using debate subtopic information from Hansard, I select and classify all such example exchanges between opposition members and ministers from Question Period in the two parliaments under study. My approach in selecting particular exchanges for closer reading is to group all Question Period exchanges by topic, focus on those topics with more than 50 examples, and calculate the mean similarity score for each topic.
Then, I select high, medium, and low scoring topics for closer analysis, guided by a historical analysis of each parliament to identify and contextualize controversial policies and particular accountability scandals. Within each topic selected for closer analysis, I analyze the two highest and lowest-scored exchanges within each sample.1 It is important to note that given my approach attempts to measure collective accountability, it would not be methodologically valid to make claims about which ministers were the “most” or “least” accountable in a government, especially since the demands on a minister in Question Period vary drastically based upon their portfolio. This is why I choose to focus on debate topics, rather than ministers, as my grouping variable in this analysis.

5.1 38th Parliament: Liberal Minority

5.1.1 Historical Overview

The 38th Parliament lasted only one session, from October 4, 2004 to November 29, 2005. It was a turbulent minority government characterized by political scandal and internal political strife within the government and the Liberal party more generally. The Liberals had been in power since 1993, when Jean Chrétien, teamed with future Finance Minister Paul Martin, offered the famous Red Book campaign platform to voters—most remembered for its proposal to eliminate the GST instituted by the preceding Progressive Conservative government. The platform was well-received and the Liberals rewarded with an impressive majority government, nearly eliminating the Conservatives from the House of Commons altogether. The first year of the new government was marked by increasing criticism from economists, lenders, and international market-makers regarding Canada’s fiscal deficit and growing national debt.

1 In some cases, extreme low similarity examples arose due to a speech that was wrongly classified as a question response. I did not select these examples for closer study.

Going above and beyond the Red Book promises of fiscal responsibility, Martin’s 1995 Federal Budget successfully implemented severe cuts to government spending and transfer payments; the budget would become recognized as a historic accomplishment for the Liberal Party. On the national unity front, another significant victory was the successful No campaign in the 1995 referendum on Quebec sovereignty.

Over the life of the government, however, relations between Chrétien and Martin became increasingly strained. Martin’s desire to pursue additional devolution of fiscal responsibility to the provinces, part of an aggressive deficit management strategy, clashed with Chrétien’s strong belief in . In 2000, Martin’s faction of partisan supporters became increasingly insubordinate over Chrétien’s decision to seek a third leadership term instead of making way for Martin. Years in power had allowed for the accretion of public scandals, including Chrétien’s direct personal involvement in the “Shawinigate” real estate controversy, and public perception grew that the Chrétien Liberals were tired and corrupt.

The Chrétien era closed with a rancorous and drawn-out transfer of power. Martin was dismissed as Finance Minister in June 2002, affording him the flexibility to consolidate partisan support behind the scenes, while Chrétien announced his retirement in August 2002 but proceeded to extend the process over the next eighteen months (Clarkson, 2006, 240). Martin finally assumed leadership of the Liberal Party in late 2003 (LeDuc, McKenzie, Pammett, & Turcotte, 2010, 484). In advance of fresh elections, he set about replacing the cabinet and removing Chrétien loyalists from key positions in order to signal a fresh governing approach.
Taking note of growing voter malaise in Canada following a decade of Liberal rule, Martin’s campaign approach centred around addressing the Canadian “democratic deficit.” Observers initially predicted a relatively safe path to majority re-election for the Martin Liberal team (Clarkson, 2006, 241). However, this trajectory was launched off-course in February 2004 as the “sponsorship scandal” burst into public consciousness. Auditor General Sheila Fraser released a report detailing the improper awarding of contracts and misuse of public funds that took place within a federal communications and sponsorship program in Quebec administered by the Department of Public Works. The Liberal Party, including members of Chrétien’s inner circle, was directly implicated. Advertising firms and consultants with Liberal ties had been awarded payoffs or sham contracts since the inception of the program in 1996; some of this money made its way back to Liberal party staffers in kickbacks, and to other government organizations including the RCMP and (CBC News Online, 2006).

Martin’s response to the scandal attempted to turn the situation to his advantage by raising the profile of the sponsorship scandal as an accountability issue, driving a wedge between the old “politics as usual” Chrétien Liberals and his rejuvenated approach to government. He appointed an independent commission, headed by Justice John Gomery, to investigate the sponsorship program and dismissed involved officials including , the former minister of Public Works, from their positions (Clarkson, 2006, 244). However, the spectacle of Chrétien’s longtime finance minister disavowing both knowledge of and responsibility for a prolonged misuse of public funds was a public relations failure.
It worsened the persistent internal divisions within the Liberal party and confused voters who had otherwise been inclined to support Martin on the basis of the positive accomplishments of the Liberals under Chrétien, especially the fiscal leadership shown by Martin as Minister of Finance (Clarkson, 2006, 244, 246).

Martin remained committed to his decision to hold early elections to seek a national mandate for his leadership, but his anti-“adscam” campaign strategy yielded a disappointing result for the Liberals in the June 2004 election. Their opponent was the newly-formed Conservative Party of Canada, a merger of the right-wing (formerly the ) and the Progressive Conservatives. The Conservatives organized an effective campaign emphasizing the sponsorship scandal as an embodiment of pernicious Liberal corruption, but were held back late in the race by messaging missteps of their own. The final result of the 2004 election yielded a minority government for the Liberals. The 38th Parliament began with 135 Liberal government members, with the Conservatives as Official Opposition holding 99 seats.

Throughout its short life, Martin’s minority government was overshadowed by ongoing testimony and new findings arising from the Gomery Commission. Significant policy announcements, including increased transfers to the provinces for health spending and a national child care program, were unable to dislodge the sponsorship scandal from the headlines (LeDuc et al., 2010, 499). Furthermore, the Martin Liberals were an awkward fit for the type of governance that had characterized previous successful Liberal minorities. A clear potential existed for a governing partnership with the NDP, especially considering Martin’s proposals for additional social spending and symbolic gestures including opposition to the US missile defence shield program (Jeffrey, 2010, 526; LeDuc et al., 2010, 499).
Such collaborations on the left had underpinned the stability and productivity of previous Liberal minority governments under Prime Ministers Pearson and Trudeau. However, senior Martin team officials deliberately chose a defensive and uncooperative strategy for dealing with opposition parties (Jeffrey, 2010, 526). The decision was informed by their perception of constant negative media pressure, not only related to the ongoing Gomery inquiry but targeting Martin personally as an indecisive and weak leader, a portrayal he had been unable to shake since the campaign (Jeffrey, 2010, 527). The general result was a Parliament characterized by “unusually deep partisan hostility” and a PMO (Prime Minister’s Office) with paranoid and micromanaging tendencies (Jeffrey, 2010, 547).

Martin’s first cabinet had been crafted in the interim between his leadership succession and his electoral victory, and had emphasized the removal of Chrétien loyalists. However, the cabinet of the newly-elected minority government needed to be filled out given that six of the previous sitting ministers had lost their seats. Free cabinet positions were allocated to Martin supporters from the West, rural areas, and Quebec to compensate for the poor electoral representation of the government in these regions (Jeffrey, 2010, 527–8). A non-traditional tactic employed in the selection of Cabinet ministers, and one that reinforced the Liberals’ tactical decision not to cooperate with opposition parties, was to elevate high-profile floor-crossers to Cabinet positions. These ministers included the former NDP of BC, , and the Conservative leadership candidates and, later, (Jeffrey, 2010, 529).

The combined effects of internal division between Chrétien and Martin supporters, a defensive PMO, and enthusiastic promotion of floor-crossers had deleterious effects on caucus morale. As emphasized in Chapter 2, a cohesive and disciplined caucus is critical to governing effectively in a Westminster system.
Lacking this foundation, the Martin government suffered from repeated public leaks reflecting internal caucus dissent even on the part of loyal supporters (Jeffrey, 2010, 357).

Relations with the public service were also strained. One strategy to reinforce public opinion of the Prime Minister was to emphasize his prior successes as Finance Minister in the 1990s and tie this history to the government’s current fiscal competence. The desire to present a healthy budget surplus and the necessity of finding money to finance the new social spending promised by the Liberals necessitated drastic cost reductions elsewhere. These included sudden cuts to the public service and a subsequent refusal to negotiate with public sector unions. The consequences were labour actions against the government and an eventual, acrimonious public settlement (Jeffrey, 2010, 538). The remaining public service was also alienated by the combination of indecisive central direction interspersed with sudden and drastic PMO interventions in the bureaucracy in response to Gomery commission revelations or the perception of an accountability threat.

The Martin government also suffered from disunion on public policy. One example was the September 2004 Health Care Accord, which was criticized harshly not only in Parliament by the Opposition but also by prominent Liberals. The accord, negotiated at a first ministers’ conference, was a work of asymmetrical federalism that offered separate terms to Quebec and other interested provinces (Jeffrey, 2010, 541).2 Former Liberal ministers including and Sheila Copps raised the alarm over what they saw as serious flaws in the accord’s spending commitments and the patchwork of unclear terms applied across different provinces (Jeffrey, 2010, 541).
More generally, opposition to Martin’s philosophy of asymmetrical federalism (which the Health Care Accord embodied and sought to institutionalize) deepened as a fracture within the Liberal party. Interestingly, this was not the case in Parliament. The opposition parties all agreed in principle on some level of federalist decentralization in their platforms (Jeffrey, 2010, 544).

The survival of the Martin minority government was challenged multiple times, the first as early as the Speech from the Throne in October 2004. Citing the government’s refusal to consult with opposition parties, Stephen Harper publicly announced his party’s decision not to support the government on the Speech. NDP leader Jack Layton complained to the media about the government’s refusal to negotiate on its platform and the lack of availability of cabinet ministers (Jeffrey, 2010, 547). The text of the Speech from the Throne reflected this lack of opposition consultation and mirrored verbatim many of the election promises on which the Martin government had campaigned (Jeffrey, 2010, 547). Rather than bring down the government so soon after an election by voting against the speech, the opposition parties responded with a carefully-formatted joint motion that challenged specifics of the agenda. The proposal incorporated the Bloc Québécois’ demand to alleviate the “fiscal imbalance” between the federal government and the provinces and to reduce interference with provincial jurisdiction. When the government opted to accept these critical amendments rather than defend the direction of its Throne Speech, the Liberal Party was deeply split over the decision. The reversal reinforced a growing public perception of the government’s weakness and desire to retain power at all costs (Jeffrey, 2010, 549).
2 Asymmetrical federalism is an approach to the division of powers between the federal government and the provinces that first rose to prominence in the late 1980s and was most strongly associated with the Meech Lake Accord and the subsequent Spicer Commission. Under this model, one or more provinces possess powers within a given area of jurisdiction that vary from those of other provinces, generally established via individually-negotiated federal-provincial agreements. Quebec has frequently advocated for the approach and possesses de-facto asymmetrical powers in policy areas such as immigration (Bakvis & Skogstad, 2012).

Behind a justification of accountability, the Martin government continued its preoccupation with removing Chrétien loyalists from appointed positions or pursuing them for apparent misconduct or spending irregularities. Notable on the list of targets were André Ouellet, David Dingwall and . Given Martin’s promise of a more transparent and accountable approach to public appointments, the operation to fire officials and replace them with Martin supporters was controversial (Jeffrey, 2010, 551).

In addition, new scandals emerged within the Martin cabinet. In November 2004, Minister of Citizenship and Immigration Judy Sgro was accused of securing a temporary work permit for a campaign volunteer. The volunteer had entered Canada as part of a special work visa program for exotic dancers. Public awareness that the Liberal government had quietly been supporting an exploitative exotic dancer work program since the late 1990s combined with anger over Sgro’s personal impropriety. She took individual ministerial responsibility for “strippergate” and resigned in January 2005 (CBC News, 2005b; Jiménez, 2004).

As the government wore on, more long-term Liberals took to speaking out in public against Martin’s leadership style, management of internal party politics, lack of clear policy objectives, and decisions taken that distanced the current party from the traditional Liberal brand (Jeffrey, 2010, 559–560). The government’s response to criticism from both the opposition and party insiders was to press onward as if it had a majority in the House of Commons. The first budget, tabled in February 2005, took a hard line against compromise with opposition parties. The Official Opposition initially announced it would support the budget, but the fortunes of the Liberals were again derailed in rapid fashion by the ongoing sponsorship scandal. Fresh reporting from the Gomery inquiry over the next months depressed Liberal polling numbers and boosted the Conservatives’ hopes, leading Stephen Harper to float the idea of opposing the budget at the end of March 2005 (Jeffrey, 2010, 561). The Liberals had no choice but to cooperate with the NDP on changes to the budget, including $4.6 billion in additional spending.
The announcement of these programs was made by NDP leader Jack Layton, rather than the Prime Minister, at the end of April, further damaging the government’s reputation among dissatisfied Liberals (Jeffrey, 2010, 562).

Prime Minister Martin took the desperate measure of appealing over the heads of Parliament directly to the people in a televised address. He proposed a fresh election be held once the Gomery process was complete and its final report released in January 2006, allowing the Canadian people to pass final judgement on his government’s handling of the sponsorship scandal (P. Martin, 2005). In the wake of this move, the budget vote became a political sideshow that exemplified how, according to critics, the Martin government would do anything to stay in power. Belinda Stronach, the former Conservative leadership candidate, crossed the floor to the Liberals following backroom negotiations and was immediately appointed to a cabinet position. With a new extra member in the House, the result on second reading ended in a tie that had to be broken by the Speaker (Jeffrey, 2010, 565). The government cited the budget turmoil as an example of how the minority situation had made governing impossible, and hoped that, once the Gomery Report cleared the current government of wrongdoing, Canadians would consider supporting the Liberals to form a majority.

The Gomery Commission’s preliminary report was released at the beginning of November 2005. It assigned partial blame for the scandal to the Liberal party despite Martin’s efforts to distance himself from the Chrétien staffers involved and to implement corrective measures. Opposition parties seized the chance for a definitive non-confidence vote (Jeffrey, 2010, 578). At this juncture, the interests of all opposition parties lined up. The Bloc saw an opportunity to rebound from historic low levels of support for Quebec separation; the Conservatives, likewise, saw the promise of victory in their polling numbers.
The NDP initially anticipated an opportunity to extract further policy concessions from the Liberals, but Layton’s demands were rebuffed in private negotiations and the NDP committed to the opposition non-confidence strategy by late November (Jeffrey, 2010, 580).

The opposition parties cooperated to pass a non-binding motion on the 21st of November, calling for a confidence vote to be held in January in anticipation of a mid-February election. The timing would allow for the release of the final Gomery report and permit the Canadian public to pass judgement on the Liberals. The government ignored the motion and further provoked the opposition by introducing a wide range of unexpected spending measures. On November 28th, Parliament simply passed a non-confidence motion. The Conservatives and the NDP blamed the Liberals for inflicting their own defeat by maintaining a culture of corruption and entitlement to office, and for their unwillingness to negotiate with opposition parties on policy issues. Instead of facing the public next spring, the parties geared up for a holiday campaign leading up to an election on January 23, 2006.

5.1.2 Topics of Debate

To select speeches for review, I identify discrete exchanges between a minister and an Official Opposition member that take place within Question Period and that share the same topic of debate as listed in Hansard. I then calculate similarity scores for each exchange and report within-topic means and standard deviations. Table 5.1 contains information on all such debate topics with 50 or more exchanges ordered by descending mean similarity score.

Unsurprisingly, the debate topic comprising a disproportionately high frequency of exchanges in the 38th Parliament was Sponsorship Program, referring to the sponsorship scandal and the Gomery inquiry. Additional speech topics, including Government Contracts and David Dingwall, are closely related to the sponsorship scandal. The second-most discussed topic is Citizenship and Immigration; as will be investigated in more detail below, speeches within this topic involve alleged malfeasance by Minister of Citizenship and Immigration Judy Sgro and by her successor in the portfolio, .

A qualitative review of content examples within each topical group suggests that higher similarity topics involve policy areas (for example, Natural Resources, Foreign Affairs, and Health) and particular pieces of legislation, while lower similarity topics involve subjects of political scandal. An obvious exception is Sponsorship Program, although its outlier status in terms of number of exchanges may explain its presence in the middle of the table close to the overall mean. Overall, scandals and accountability debates appear to take up the majority of the Question Period agenda for the 38th Parliament.

I perform statistical tests to determine whether mean similarity scores are significantly different across topics. The similarity scores within topics are normally distributed but heteroscedastic, so I perform Welch’s ANOVA with appropriate post-hoc tests (see Appendix A.1 for further details).
Overall, I find a significant difference in topic means at p < 0.001, with a small effect size (Cohen’s f² = 0.0307) (Cohen, 1988). To visualize the significant differences across topics, I provide a letter representation of significant pairwise differences in means across different topics in the final column (“Groups”) of Table 5.1. Each topic that shares a given letter is a member of a grouping within which the similarity scores of members are not significantly different. For example, we can reject the null hypothesis of no difference in means between the Natural Resources speech topic and those speech topics in groups a, b, and c, but fail to reject the null hypothesis of no difference for speech topics in group d.
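For readers unfamiliar with Welch’s ANOVA, the test statistic can be computed from per-topic summary statistics alone. The sketch below uses the standard textbook formula and the rounded means, standard deviations, and counts reported in Table 5.1; it is not the analysis code behind the thesis (see Appendix A.1), the name `welch_anova` is hypothetical, and it does not reproduce the post-hoc tests or the effect size calculation.

```python
def welch_anova(groups):
    """Welch's F statistic from per-group (mean, sd, n) summaries.

    Returns (F, df1, df2). Unlike the classical one-way ANOVA, each group
    is weighted by n / sd^2, so unequal variances are tolerated.
    """
    k = len(groups)
    weights = [n / sd ** 2 for _, sd, n in groups]
    total_w = sum(weights)
    grand_mean = sum(w * m for w, (m, _, _) in zip(weights, groups)) / total_w
    between = sum(
        w * (m - grand_mean) ** 2 for w, (m, _, _) in zip(weights, groups)
    ) / (k - 1)
    lam = sum(
        (1 - w / total_w) ** 2 / (n - 1)
        for w, (_, _, n) in zip(weights, groups)
    )
    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df2


# (mean, sd, n) per topic, as rounded in Table 5.1
TABLE_5_1 = [
    (0.236, 0.117, 68), (0.227, 0.092, 67), (0.225, 0.113, 83),
    (0.207, 0.101, 170), (0.203, 0.118, 545), (0.195, 0.091, 53),
    (0.187, 0.086, 64), (0.186, 0.094, 54), (0.182, 0.085, 104),
    (0.174, 0.090, 142), (0.173, 0.091, 66), (0.167, 0.093, 216),
    (0.164, 0.102, 60),
]

f_stat, df1, df2 = welch_anova(TABLE_5_1)
print(f"Welch F({df1}, {df2:.1f}) = {f_stat:.2f}")
```

A p-value could then be taken from the upper tail of the F distribution, for example with `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available. Because the table values are rounded, the result only approximates a statistic computed from the raw similarity scores.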

How far do speeches from this sampling of topics reflect the measurement proposition that lexical similarity is associated with higher parliamentary accountability? In the following analysis, I select two high similarity (Natural Resources and Health) and two low similarity (Citizenship and Immigration and David Dingwall) topics for qualitative review. In particular, I select Health for review, rather than Foreign Affairs, due to the key role health care played as a topic of negotiation between the NDP and the Liberals in guaranteeing the survival of the government. Rather than review the lowest similarity topic, Government Contracts, I also include in my qualitative sample the Sponsorship Program debates given both the importance of the sponsorship scandal to the trajectory of the 38th Parliament and the disproportionate number of speech exchanges on this topic.

The group comparison results in Table 5.1 suggest one interesting contrast in particular: the significant difference between the low similarity topic Citizenship and Immigration and the higher similarity topic Sponsorship Program, the top two topics in the parliament in terms of frequency of exchanges. Given this difference, we should expect to see a clear contrast in the qualitative level of accountability across both speech topics.

Table 5.1: Topics of Opposition-Minister Exchanges in Question Period, 38th Parliament

Subtopic                       Mean Similarity Score   Standard Deviation   n     Groups
Natural Resources              0.236                   0.117                68    d
Foreign Affairs                0.227                   0.092                67    d
Health                         0.225                   0.113                83    cd
National Defence               0.207                   0.101                170   bcd
Sponsorship Program            0.203                   0.118                545   bcd
Child Care                     0.195                   0.091                53    abcd
The Environment                0.187                   0.086                64    abcd
Aboriginal Affairs             0.186                   0.094                54    abcd
Agriculture                    0.182                   0.085                104   abcd
Justice                        0.174                   0.090                142   ab
David Dingwall                 0.173                   0.091                66    abc
Citizenship and Immigration    0.167                   0.093                216   a
Government Contracts           0.164                   0.102                60    abc
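The “Groups” column of Table 5.1 can be read mechanically: following the description given with the table, two topics that share at least one letter are not significantly different, so a significant pairwise difference corresponds to letter sets with nothing in common. The short sketch below encodes that reading; the dictionary simply transcribes the table, and the function name `significantly_different` is hypothetical.

```python
# Letter groupings transcribed from the "Groups" column of Table 5.1
GROUPS = {
    "Natural Resources": "d",
    "Foreign Affairs": "d",
    "Health": "cd",
    "National Defence": "bcd",
    "Sponsorship Program": "bcd",
    "Child Care": "abcd",
    "The Environment": "abcd",
    "Aboriginal Affairs": "abcd",
    "Agriculture": "abcd",
    "Justice": "ab",
    "David Dingwall": "abc",
    "Citizenship and Immigration": "a",
    "Government Contracts": "abc",
}


def significantly_different(topic_a, topic_b, groups=GROUPS):
    """True when the two topics share no group letter."""
    return not (set(groups[topic_a]) & set(groups[topic_b]))


# The contrast highlighted in the text: group "a" versus group "bcd"
print(significantly_different("Citizenship and Immigration",
                              "Sponsorship Program"))
```

Under this reading, Citizenship and Immigration (a) and Sponsorship Program (bcd) share no letter, which matches the significant difference noted above, whereas Natural Resources (d) and Health (cd) share the letter d and so cannot be distinguished.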

5.1.3 Natural Resources

Natural Resources (see Tables 5.2 and 5.3) speeches follow a general pattern of opposition inquiry into specific policy initiatives with regional relevance. One prominent subject of debate was the Atlantic Accord, which changed calculations for oil and gas revenues and equalization payments for the Atlantic provinces. The Opposition demanded the Prime Minister take a consistent position on the issue as well as address concerns of fairness from other oil-producing provinces. Other recurring debates concerned the Devils Lake water diversion project, a North Dakota initiative that would potentially affect ’s waterways, and the cleanup of abandoned uranium mines in Saskatchewan.

Upon qualitative review, higher textual similarity scores consistently reflect commonalities in language use between government and opposition speeches primarily due to explicit references to projects, locations, provinces, and legislation in a manner consistent with parliamentary accountability. In the first example in Table 5.2, opposition member Loyola Hearn references the aforementioned election promise that Newfoundland and Labrador would receive 100% of all oil and gas revenues regardless of the associated project. The government response reaffirms the details of the promise, employing the same language as in the question, then supplies additional details on current negotiations between the federal government and the province. The second example in the table relates to the uranium mines cleanup in Saskatchewan. In this case, the opposition member does not appear to disagree with the target or substance of government policy on this issue; however, they are interested in holding the minister and the government more generally to account for not placing a high enough priority on the cleanup. The minister’s response is to restate the government’s position, refute the accusation that priorities are misplaced, and affirm that action has been taken. Both examples are consistent with the theoretical expectations of a pattern of accountability.

Table 5.2: Natural Resources (38th Parliament) High Similarity Speech Pair Examples

Date: 26/10/04
Opposition Member: Loyola Hearn
Question Text: Mr. Speaker, the only person in Newfoundland and Labrador who believes this is a good deal is that very minister, who in a letter to his constituents said that the Prime Minister promised to finalize the deal to give Newfoundland and Labrador 100% of its revenues without affecting equalization. Every Liberal candidate in the election reiterated that promise. Why has the Prime Minister not lived up to his commitment, and what part of 100% does the Prime Minister not understand?
Minister: R. John Efford
Answer Text: Mr. Speaker, Newfoundland and Labrador is receiving 100% of the revenues. Right now, the issue is equalization on top of the revenues. They keep referring to the revenues. The deal that was committed by the Prime Minister and the premier during the election is a deal that was negotiated. Last Thursday evening, when they concluded negotiations between the finance minister of Newfoundland and the finance minister of Canada, the premier called me and said that it was going for sign-off and to make sure that sign-off was done. I made sure the sign-off was done, not–
Similarity: 0.53085171

Date: 4/2/05
Opposition Member:
Question Text: Mr. Speaker, the Minister of Natural Resources stated that he had never promised a quick answer on the question of federal funding for the cleanup of uranium mines in northern Saskatchewan but media reports prove otherwise. The minister also said that he would tour those mines last February. He did not. The minister further stated that he would make this issue a top priority with his government. He has not. My question for the minister is simply, why did you mislead the House and why–
Minister: R. John Efford
Answer Text: Mr. Speaker, before I answer the member’s question, I would suggest that the next time he should get someone competent to write his questions for him. I have had meetings with the minister in Saskatchewan concerning the cleanup of uranium mines. As the minister it is my priority and it is a priority with the government. I suggest that the hon. member just take his time and wait and it will be done.
Similarity: 0.45331954

Table 5.3: Natural Resources (38th Parliament) Low Similarity Speech Pair Examples

Date: 27/10/04
Opposition Member: Stephen Harper
Question Text: Mr. Speaker, the Prime Minister has flagrantly broken his 100% promise to Nova Scotia and Newfoundland and Labrador. Even his own federal Liberal MPs do not believe that he is sticking to his word. The Liberal member for Humber-St. Barbe-Baie Verte said that he heard the resource royalty promise, and now he is concerned that conditions and strings are being attached. The Liberal MP for Random-Burin-St. George’s has said that he is considering his future over the broken promise. If the Prime Minister’s own Liberal members are not sure they can trust him on this, why should Newfoundland and Labrador? Why should ? Why should anyone?
Minister: Paul Martin
Answer Text: Mr. Speaker, I had an opportunity to speak to Premier Williams this morning. The suggestion I have made is that our officials should meet. I am very clear in terms of the understanding that we had arrived at, and I am certainly prepared and in fact most desirous of fulfilling that understanding. To the extent that there are differences of opinion, I believe our officials should meet. We will see what will come from that.
Similarity: 0.05474489

Date: 3/11/04
Opposition Member: Peter MacKay
Question Text: Mr. Speaker, clearly that is not the commitment the Prime Minister gave Premier Williams and Premier Hamm. The Premier of Newfoundland and Labrador has said that he will take it on the road if the Prime Minister does not fulfill that commitment. The premier is prepared to tell ordinary Canadians first-hand what will happen to them if they take the Prime Minister at his word. Could the Prime Minister explain why it would be necessary for a premier in this country to embark on a cross-country campaign just to get the Prime Minister to keep his word?
Minister:
Answer Text: Mr. Speaker, I am very confident that we will ultimately arrive at a satisfactory conclusion for the provinces of Newfoundland and Nova Scotia. I would point out to the hon. gentleman that what we are dealing with, at least in part, are the offshore accords that were signed by a previous Conservative government. Both of them were specifically limited in terms of time and in terms of dollar values. We are trying to improve on the previously flawed Conservative record.
Similarity: 0.04619213

In contrast, the low similarity exchanges in Table 5.3 exemplify two features characteristic of low parliamentary accountability: government responses that dodge a question or answer in extremely broad terms, and partisan or ideological attacks from either or both of government and opposition. In the first example, opposition leader Stephen Harper references the “100% promise” for oil and gas revenues but focuses on stoking dissent among Liberal MPs in relation to the Prime Minister’s position.
Prime Minister Martin’s answer is to avoid the partisan accusations, as well as the promise, altogether; he speaks in very vague terms about past and potential future negotiations between government officials. In the second example, critic Peter MacKay accuses the Prime Minister of having no desire to keep his commitments or word on the offshore oil and gas issue. MacKay’s question itself is reasonably a matter of accountability, albeit with a partisan tinge, but the answer provided in return does not address it as such. The Deputy Prime Minister, not the Prime Minister, responds to the inquiry with a generic affirmation that a solution will be reached. The majority of his speech instead consists of a partisan rebuttal referring to the past record of the Conservative party in government on the issue.
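The intuition that shared vocabulary drives the score can be illustrated with a deliberately simple stand-in. The sketch below computes cosine similarity over raw word counts for fragments quoted in Tables 5.2 and 5.3. It is not the word-vector measure used in this thesis and will not reproduce the tabled scores; the tokenizer, the function names, and the choice of fragments are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; '%' is kept so that '100%' stays one token."""
    return re.findall(r"[a-z0-9%]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors of two texts."""
    counts_a, counts_b = Counter(tokens(text_a)), Counter(tokens(text_b))
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Fragments quoted from the exchanges in Tables 5.2 and 5.3
question = ("Why has the Prime Minister not lived up to his commitment, and "
            "what part of 100% does the Prime Minister not understand?")
direct_answer = ("Mr. Speaker, Newfoundland and Labrador is receiving 100% "
                 "of the revenues.")
vague_answer = ("I believe our officials should meet. We will see what will "
                "come from that.")

high = cosine_similarity(question, direct_answer)
low = cosine_similarity(question, vague_answer)
print(high > low)
```

Even this crude measure ranks the answer that repeats the terms of the promise above the answer that speaks only of future meetings, which is the pattern the qualitative reading describes.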

5.1.4 Health

As discussed in the historical overview of the 38th Parliament, health care was a policy area within which government and opposition shared common ground and a potential for compromise. Despite differing approaches to health care policy, both the Liberal government and Conservative opposition consistently reference similar ideas in their speeches on health care: support for the public health care system, criticism of private health care and a two-tiered approach, support of the Canada Health Act principles, and a commitment to lower wait times in hospitals to save Canadian lives. This consistent lexical similarity throughout Health speeches is reflected by a high mean similarity score for this topic. Passages of debate from the Health topic exemplify how textual similarity reflects a common frame of reference across government and opposition consistent with parliamentary accountability as opposed to strong ideological contention. The first example of a high similarity exchange in Table 5.4 is a straightforward case in which an opposition member demands more clarity on a specific government position or decision taken, and the government responds with an explanation. The issue in question is the government’s stance on a United Nations resolution on human cloning given an apparently contradictory position it enshrined in Canadian legislation. The minister responsible reaffirms the government’s position on human cloning and explains the rationale underlying its decision not to ratify the UN Declaration on Human Cloning in March 2005, namely that while the government fully supported the intention of making human cloning illegal it could not support the ambiguous language of the declaration. The government’s response is consistent with the statement given by the Canadian representative to the General Assembly at the time (General Assembly of the United Nations, 2005). 
A similar pattern is visible in the second exchange, an inquiry regarding the Canadian Strategy for Cancer Control. The Strategy was a policy vision that was endorsed by the Canadian Cancer Society and supported by the House of Commons but which did not receive the full allocation of funding demanded by the opposition. In this case, the Conservative questioner confirms the government’s support for a Conservative motion on the Strategy and requests accountability from government on when its funding will be delivered. The spending commitment is reaffirmed by the minister’s response, albeit while drawing attention to other aspects of the government’s more general policy approach.

Table 5.4: Health (38th Parliament) High Similarity Speech Pair Examples

Date: 24/2/05
Opposition Member: Rob Merrifield
Question Text: Mr. Speaker, the government abstained two years ago on the same sort of vote, which was unacceptable at that time, but the vote last week against human cloning shows an agenda. The government took a position at the United Nations that was contrary to its own legislation that bans all cloning. If the government is against human cloning, why did it vote against the ban last week?
Minister: [not given]
Answer Text: Mr. Speaker, the government’s position on human cloning is clear. All forms of human cloning for whatever purpose and using whatever techniques are banned in Canada under the Assisted Human Reproduction Act. That is the reality. While important elements of the United Nations declaration are consistent with Canada’s domestic legislation on the prohibition of cloning, the government was unable to support it due to some imprecise drafting. The language presented raised difficulties.
Similarity: 0.49099

Date: 8/6/05
Opposition Member: Steven Fletcher
Question Text: Mr. Speaker, yesterday the government agreed in the House to fully fund the Canadian strategy for cancer control, which it knows is a commitment to provide $260 million over five years. By supporting yesterday’s Conservative motion, the government has agreed to specifically allocate these moneys to the national cancer strategy. When will the $260 million for the Canadian strategy for cancer control be delivered?
Minister: Ujjal Dosanjh
Answer Text: Mr. Speaker, yesterday’s motion was about cancer control, mental health and heart disease. It was essentially about the major chronic diseases. I said yesterday in the House that we had $300 million over the next five years for an integrated chronic disease strategy, and that is what we will do.
Similarity: 0.437042

Table 5.5: Health (38th Parliament) Low Similarity Speech Pair Examples

Date: 16/6/05
Opposition Member: Colin Carrie
Question Text: Mr. Speaker, yesterday we learned that 120,000 Quebeckers are on surgical waiting lists and 43,000 of them have waited longer than is medically acceptable. What concrete measures is the government going to provide to resolve the waiting list problem highlighted by the Supreme Court?
Minister: Ujjal Dosanjh
Answer Text: Mr. Speaker, as the Prime Minister has said, we recognized this issue over eight months ago and provided $41 billion. I understand each of the provinces is engaged in reducing wait times. Whether it is in Saskatchewan, in Quebec, in B.C., in Alberta or in Ontario, all the provinces are worried about this, which is why the first ministers of the country got together to deal with this issue last September.
Similarity: 0.060634

Date: 18/10/04
Opposition Member: Steven Fletcher
Question Text: Mr. Speaker, I understand why the NDP traded him away. The minister refuses to give Canadians an honest answer. Why is the government blatantly discriminating against the pre-1986 and post-1990 victims? Why will the minister not stand up in the House right now and tell Canadians that all victims of hepatitis C from tainted blood deserve compensation? Canadians know. Give an honest answer and do the right thing.
Minister: Ujjal Dosanjh
Answer Text: Mr. Speaker, there is a trust fund established by the courts. Contributors to that trust fund are the federal and provincial governments. We need to speak to those partners. We need to speak to the lawyers of the plaintiffs. We need to then approach the courts to take a look at whether or not there is an actuarial surplus. We are in the process of doing exactly that.
Similarity: 0.04239

The low similarity Health examples in Table 5.5 are of particular interest for this qualitative validity assessment. One potential source of lexical similarity that would confound its use as an accountability measure is if particular terms (notably, references to the names of provinces) are used across questions and answers by virtue of some third variable rather than a shared frame of reference characteristic of accountability. This would undermine the assumption that words in any speech are drawn from the same population. In the case of Natural Resources, for example, the higher lexical similarity between questions and responses observed could be misleading: natural resources are a matter of provincial jurisdiction, so references to provinces may always be frequent and consistent across speeches. Health care is also primarily a provincial responsibility, so it provides an opportunity for confirmation of this measurement validity concern. The first low similarity example, however, shows that government MPs are perfectly willing to ignore or recast the provincial dimension of a discussion if questioned in a partisan or ideological manner, lowering the lexical similarity of the exchange as expected. The Conservative question refers specifically to hospital wait times for Quebecers, in reference to the Supreme Court’s 2005 decision in the Chaoulli v Quebec case. The decision found that restrictions on private health insurance violated fundamental rights under the Quebec Charter of Human Rights and Freedoms, given long wait times for necessary care in the public health care system. The strategy of directly appealing to Quebec regional interests is deliberate, and would pay dividends for the Conservative Party in the subsequent election, as the extra votes in Quebec gave them the regional seats needed to win a minority government. 
The Minister of Health defuses this loaded question by affirming a commitment to lowering wait times for all provinces, and also by avoiding the question of the Chaoulli decision altogether. In this example, the discussion of a province-specific issue does not appear to guarantee higher lexical similarity values, and the relationship between similarity score and qualitative accountability persists.
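
The confound described above can be probed with a simple check. The following is a minimal sketch and not the measurement pipeline used in this thesis: it assumes a plain bag-of-words cosine similarity as a stand-in for the lexical similarity score, and the PROVINCE_TERMS list and helper names are hypothetical choices made only for illustration. It compares the similarity of the 16/6/05 exchange in Table 5.5 with and without province terms, and reports which province terms the two speeches actually share.

```python
import math
import re
from collections import Counter

# Hypothetical list of province-related terms, chosen only for this illustration.
PROVINCE_TERMS = {"quebec", "quebeckers", "saskatchewan", "alberta", "ontario"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bags of words built from raw term counts."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty speech shares nothing with any other speech
    return dot / (norm_a * norm_b)


def drop_terms(tokens, excluded):
    """Remove every token that appears in the excluded set."""
    return [w for w in tokens if w not in excluded]


# Text of the 16/6/05 exchange in Table 5.5.
question = (
    "Mr. Speaker, yesterday we learned that 120,000 Quebeckers are on surgical "
    "waiting lists and 43,000 of them have waited longer than is medically "
    "acceptable. What concrete measures is the government going to provide to "
    "resolve the waiting list problem highlighted by the Supreme Court?"
)
answer = (
    "Mr. Speaker, as the Prime Minister has said, we recognized this issue over "
    "eight months ago and provided $41 billion. I understand each of the "
    "provinces is engaged in reducing wait times. Whether it is in Saskatchewan, "
    "in Quebec, in B.C., in Alberta or in Ontario, all the provinces are worried "
    "about this, which is why the first ministers of the country got together "
    "to deal with this issue last September."
)

q_tokens, a_tokens = tokenize(question), tokenize(answer)
overall = cosine_similarity(q_tokens, a_tokens)
without_provinces = cosine_similarity(
    drop_terms(q_tokens, PROVINCE_TERMS), drop_terms(a_tokens, PROVINCE_TERMS)
)
shared_provinces = set(q_tokens) & set(a_tokens) & PROVINCE_TERMS

print(f"all terms: {overall:.3f}")
print(f"province terms removed: {without_provinces:.3f}")
print(f"province terms shared by both speeches: {sorted(shared_provinces)}")
```

In this exchange the question names “Quebeckers” while the answer lists provinces by name, so no province term is shared verbatim. Under this simplified measure, a topic-level check would repeat the same comparison across every exchange in the topic.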

The second low similarity example is notable because it shows an opposition dissatisfied with the government’s accountability on an issue eventually succeeding in extracting policy concessions. The topic of discussion is the tainted blood scandal, which saw thousands of Canadians who received blood transfusions exposed to HIV/AIDS and Hepatitis C due to cost cutting and a lack of testing procedures at the Canadian Red Cross. Disputes over the extent and amount of government compensation to the victims of this tragedy had been ongoing following the inquest of the Krever Commission in 1993 (CBC News Online, 2007). In October 2004, the government was in the process of opening negotiations with stakeholders to compensate victims who were not covered under the previous compensation package dating to 1998. At that time, the decision to exclude particular groups of victims from the settlement resulted in a dramatic confidence vote that left many Liberals upset with the direction the party had taken (Comeau, 2005). Conservative MP Steven Fletcher’s question invokes this partisan history, opening with a reference to Ujjal Dosanjh’s past as an NDP premier, and pressures the government on this uncomfortable issue. The minister’s response shows a lack of accountability insofar as he diffuses responsibility to the courts and other stakeholders, and does not address the central point of the question regarding compensation for excluded victims. Significantly, the pressure on the Liberal government to be accountable for this issue evidenced in this exchange ended up working: compensation negotiations would reopen a month after the question was posed in the House, and later in April 2005 the House would pass a Conservative motion to include 5000 more victims in the compensation package (CBC News Online, 2007).
The trajectory of this case, of which this speech exchange was one representative component, exemplifies Bovens’ accountability model of questioning, information provision (or lack thereof), and sanction.

5.1.5 Citizenship and Immigration

During the early stages of the 38th Parliament, Minister of Citizenship and Immigration Judy Sgro was subjected to substantial pressure to take personal responsibility for scandal in her department, eventually resulting in her resignation. The Citizenship and Immigration example speech exchanges show how an accountability scandal can be interrogated in the House of Commons in contrasting ways and how this qualitative distinction is captured by the similarity score measurement. In the first high similarity example in Table 5.6, both Opposition Leader Stephen Harper and Deputy Prime Minister Anne McLellan discuss the review of Sgro’s actions by the Ethics Commissioner. Neither appears to dispute the legitimacy of the Ethics Commissioner or the validity of the arm’s-length inquiry process. Harper is simply pressuring the Minister for her immediate resignation, while McLellan maintains that the Minister should wait until the Ethics Commissioner has reported before taking this decision.³ The second sample exchange follows a similar pattern. As in the first, the Conservative demand is for the Minister to resign immediately; there is likewise a common frame of reference, Treasury Board guidelines on government expenses, that both sides agree is legitimate. Both government and opposition agree on the rules, namely that campaign expenses are not billable to the government, but disagree on the meaning of specifics of the regulations. As predicted by Bovens’ model, the government responds to the opposition question with an account of why the decision taken was legal and gives further background on the specifics of the procedural guidelines that were followed.

The low similarity examples in Table 5.7 again demonstrate how low scores typically reflect partisan, unsubstantial questions on the part of opposition and combative or general rebuttals from government respondents. The first example requires some additional contextualization to be meaningful: it concerns Joe Volpe, the minister who took over the portfolio of Citizenship and Immigration after Judy Sgro’s resignation. A political controversy emerged when Volpe privately received information that two Conservative MPs had accepted money in exchange for immigration visa assistance, in one of the cases involving a family member accused of domestic assault. Volpe passed the information on to the RCMP for investigation, but the episode was leaked to CTV News, allegedly deliberately by Liberal insiders (Canadian Press, 2005). In this case, there is a fundamental disagreement about the personal integrity of the minister rooted in a partisan conflict. The government response makes extensive reference to the Sgro allegations (which, by this time, had been dismissed) and rebuts the supposed attempt to unfairly undermine the new minister’s reputation. The second low similarity example is also addressed to Joe Volpe, in a trivial partisan snipe about meal expenses; Volpe responds in general terms regarding typical expensing procedure as there is little of substance in the question to address. Of particular note in this example are the hyperbolic, mocking language of the question, the distinctiveness of the words employed (including “cheese”, “whopping” and of course “pizza”), and the repetitiveness of word use within the question. The contrast between the language of question and answer is an excellent qualitative demonstration of factors that result in a low similarity score.
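
The role of distinctive and repeated vocabulary in the second exchange can be made concrete with a small sketch. It is illustrative only: it assumes simple alphabetic tokenization, the STOPWORDS list and function names are hypothetical, and it is not the procedure used to produce the scores in Table 5.7. It counts the content words that occur in the question but never in the answer, which are the terms that contribute nothing to lexical overlap.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list used only for this illustration.
STOPWORDS = {"a", "and", "for", "his", "like", "of", "on", "the", "up"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def question_only_terms(question, answer):
    """Count non-stopword question tokens that never occur in the answer."""
    answer_vocab = set(tokenize(answer))
    return Counter(
        w for w in tokenize(question)
        if w not in answer_vocab and w not in STOPWORDS
    )


# Excerpts from the 5/10/05 exchange in Table 5.7.
question = (
    "Mr. Speaker, the pizza expenses keep piling up like a mountain of cheese "
    "for the immigration minister. On August 20, the minister visited his "
    "favourite pizza joint once again and spent a whopping $207 for pizza for "
    "himself and two guests."
)
answer = (
    "Mr. Speaker, as I have indicated already in other answers, I conduct "
    "meetings with stakeholders and other constituents over the course of "
    "hours that are beyond the normal working hours in the House."
)

distinct = question_only_terms(question, answer)
for word, count in distinct.most_common(5):
    print(word, count)
```

In this excerpt “pizza” alone occurs three times in the question and never in the answer, which illustrates how repeated, distinctive wording leaves little shared vocabulary for a lexical similarity score to register.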

³ The caution was prudent: despite Sgro’s eventual resignation in January 2005, she was eventually cleared of all wrongdoing in the two immigration scandals in which she was involved.

Table 5.6: Citizenship and Immigration (38th Parliament) High Similarity Speech Pair Examples

Date: 22/11/04
Opposition Member: Stephen Harper
Question Text: Mr. Speaker, if the minister were concerned about the ethics, she would have asked the Ethics Commissioner before it appeared in the newspaper and was raised in the House of Commons. The minister has been asked to explain why her staff was doing immigration business in her office. She can offer no answer to the House. If she has to wait for an answer, why does she not step aside until the Ethics Commissioner can answer the question for us?
Minister: Anne McLellan
Answer Text: Mr. Speaker, the minister did the right thing in referring this matter proactively to the Ethics Commissioner. The Ethics Commissioner is an independent officer of Parliament. The minister has also indicated that upon receipt of the report from the Ethics Commissioner, she will make that report public. What the minister is doing is both accountable and transparent.
Similarity: 0.4709505

Date: 22/11/04
Opposition Member: [not given]
Question Text: Mr. Speaker, perhaps the minister can tell us exactly which Treasury Board guidelines say government staffers can bill taxpayers for their campaign expenses. When one of her staff takes leave during the campaign but then continues to direct the ministry, that is wrong. When her director of parliamentary affairs submits claims for $5,900 for working on her campaign, that is wrong. In light of her ethical lapses, will the minister step aside?
Minister: Reg Alcock
Answer Text: Mr. Speaker, it is important to note that Treasury Board guidelines do not allow people to be paid expenses for working on campaigns. However, under Treasury Board guidelines each minister is recognized as having to take on responsibilities to serve the government while an election is on and each minister is allowed to take one staff person to handle their ministerial responsibilities during a campaign. It is very straightforward and the minister has complied.
Similarity: 0.44212976

Table 5.7: Citizenship and Immigration (38th Parliament) Low Similarity Speech Pair Examples

Date: 17/5/05
Opposition Member: [not given]
Question Text: Mr. Speaker, the minister’s own office confirmed that the leak came from the Liberals. It was under this minister’s watch that highly confidential information, which he has sworn to protect both as a minister and as a privy councillor, was leaked to the media. Whose file will be publicized next? Will the minister responsible for this betrayal of the public trust be removed?
Minister: Anne McLellan
Answer Text: Mr. Speaker, as I said yesterday, I would have hoped that the hon. member and all of us in this House would have learned something from the situation that the hon. member for York West was put through. In fact, again the hon. member is asserting certain things as facts and making sweeping allegations in relation to what may or may not have happened. I would hope that in this House we would be able to ask respectful questions, receive respectful answers and stop this attempt to destroy people’s reputations without foundation.
Similarity: 0.04188539

Date: 5/10/05
Opposition Member: Rahim Jaffer
Question Text: Mr. Speaker, the pizza expenses keep piling up like a mountain of cheese for the immigration minister. On August 20, the minister visited his favourite pizza joint once again and spent a whopping $207 for pizza for himself and two guests. That is $70 per person. I do not know about other members, but I am sure Canadians are getting indigestion just thinking about all that pizza. The minister could not explain how he spent $138 for two, but could he now explain how he spent $207 on pizza for three?
Minister: Joe Volpe
Answer Text: Mr. Speaker, as I have indicated already in other answers, I conduct meetings with stakeholders and other constituents over the course of hours that are beyond the normal working hours in the House. When I invite those people for the benefit of their consultation, I do so in a responsible fashion and I pick up the costs of those meetings. We put it on proactive disclosure in the House and we do it to demonstrate that we do this.
Similarity: 0.03967598

Table 5.8: David Dingwall (38th Parliament) High Similarity Speech Pair Examples

Date: 20/10/05
Opposition Member: Stephen Harper
Question Text: Mr. Speaker, I do not understand why the Prime Minister just sits there. David Dingwall is knocking on his door. He holds Canadians’ chequebook in his hand. He says David Dingwall quit voluntarily. In fact, he begged David Dingwall to stay and not quit. Why does he not just say no and say he will not give him any more taxpayers’ money?
Minister: John McCallum
Answer Text: Mr. Speaker, David Dingwall is not knocking on anybody’s door. David Dingwall is doing what is legally appropriate in our system. That is to say, any matter regarding legal obligations is handled by government lawyers in the Privy Council Office who are under instructions from the Prime Minister to pay the legal minimum.
Similarity: 0.47809144

Date: 3/10/05
Opposition Member: Stephen Harper
Question Text: Mr. Speaker, I am going to dispute the law and the facts here. This government, we will recall, in fact did not give a severance to Alfonso Gagliano, but neither did he keep his mouth shut. That is really the issue here. The government is negotiating a half million dollar payoff for Mr. Dingwall after he left his job voluntarily. Will the government simply admit that the real reason for this severance package is that it is hush money for David Dingwall?
Minister: John McCallum
Answer Text: Mr. Speaker, it seems the Leader of the Opposition did not hear my answer in French, so I will repeat it in English. This is a matter of law. It is not a matter of political discretion. The government will pay to Mr. Dingwall only what it is legally required to pay and not a penny more. Moreover, if the independent investigator finds that any of his expenses were inappropriate, the government will retrieve those expenses, dollar for dollar.
Similarity: 0.36514837

Table 5.9: David Dingwall (38th Parliament) Low Similarity Speech Pair Examples

Date: 29/9/05
Opposition Member: [not given]
Question Text: Mr. Speaker, the minister is saying a lot, and nothing, at the same time. Yesterday the Prime Minister tried to defend, as his colleagues are doing today, the indefensible. The Prime Minister had a choice yesterday: Liberal crony or Canadian taxpayer. He chose Liberal crony. He chose wrong. The Prime Minister makes these bold pronouncements about improving governance, but they are nothing more than bogus. Now he is planning to add insult to injury by giving David Spendwell a severance package. Why is the Prime Minister giving more money to Dingwall when he should be getting it back?
Minister: Reg Alcock
Answer Text: Mr. Speaker, I do not think the member opposite should play fast and loose with the truth. Would he please identify a single law that has been broken, a single rule that has been broken? How does he defend the fact that he seems to think it is inappropriate for the head of a $400 million corporation, which generates $182 million offshore, to travel to do that business? This whole thing is nothing more than a character assassination on somebody who has done an excellent piece of work.
Similarity: 0.04586706

Date: 31/10/05
Opposition Member: Brian Pallister
Question Text: Mr. Speaker, the government says that David Dingwall is clean, which just shows us what passes for clean with the government. Last week’s dingwash audit showed that David Dingwall charged taxpayers for personal flights, personal courier service–and the Treasury Board minister should listen to this as he seems to pretend he is interested in accountability–personal gum and even a personal massage. Just two weeks ago the Department of Fisheries and Oceans fired employees for using tax dollars for personal use. Is paying David Dingwall severance a Liberal double standard?
Minister: John McCallum
Answer Text: Mr. Speaker, last week I said that the hon. member devalued the currency of all members in this House by his accusations without merit and without facts. Today I will quote from the Saskatoon StarPhoenix which wrote that the Leader of the Opposition “needs to acknowledge it when his party jumped the gun by attacking the reputation of a man before the facts were in, and then acknowledge the mistake when the facts became known.” Saskatoon is near where the hon. member lives. The message is directed at him.
Similarity: 0.0421076

5.1.6 David Dingwall

A substantial number of sponsorship scandal-related questions in the 38th Parliament dataset centred on a particular public servant: David Dingwall, head of the Royal Canadian Mint since February 2003. Dingwall, a former Liberal cabinet minister, was implicated in the scandal and in subsequent allegations of improper and excessive expense claims for himself and his aides. Throughout these episodes, Dingwall forcefully defended his reputation; as he legitimately pointed out, he had guided the Mint to unprecedented profitability during his tenure as President and CEO. He argued the expenses claimed were necessary salary and business-related items that in some cases had been incorrectly reported. Responding to consistent pressure, Dingwall finally resigned in September 2005 but insisted upon receiving his severance package following resignation, famously asserting, “I’m ethically entitled to the entitlements which I believe are owing me” (CBC News, 2005a). Unsurprisingly, Dingwall’s Liberal lineage, proximity to multiple spending scandals and unwillingness to back down spurred extensive partisan attacks in the House of Commons, resulting in an overall low similarity score for the debate topic that bears his name.

The high similarity examples in Table 5.8 show government members directly engaging with opposition accusations by countering them with rationales and providing additional information and commitment to address opposition concerns. Both exchanges centre on a specific, common frame of reference: the legitimacy of Dingwall’s severance package under the law. In the first example, Liberal MP John McCallum defends Dingwall, explaining that his actions to claim severance pay are appropriate under the law and endorsed by government lawyers in the (non-partisan) Privy Council Office. The second example contests the legitimacy of the motivation or reasoning behind the severance decision, citing the broader context of the sponsorship scandal; McCallum argues the severance package is a matter of contract legality, not political patronage. The government also commits to obey the findings of the independent audit of Dingwall’s expenses and to obtain payment for any improper expenses from Dingwall. This second exchange is a notable example of a government taking the substance of an opposition question seriously, despite its partisan undertones, and providing both rationales for its decision and commitments to future action.

The examples in Table 5.9 show how the same line of questioning can be excessively partisan, accusatory, and loaded, resulting in lower similarity scores across exchanges. As in the Citizenship and Immigration examples above, low similarity scores are characterized by opposition questions containing unique vocabularies, mocking and partisan insults, and repetition. In the first example, Conservative MP Brian Pallister repeats references like “Liberal crony”, cites “bogus” pronouncements, and castigates “David Spendwell”. Unsurprisingly, the government response does not repeat these insults. However, it does display a similar level of internal generality and lexical repetition, and includes partisan rebuttals in turn, including an accusation of “character assassination”. Still, the government minister makes some attempt to supply a rationale for the decisions taken by Dingwall.

The second example is similar to the first; indeed, the questioner is the same Conservative MP. Pallister uses the term “dingwash” to refer to the ongoing independent investigation of Dingwall’s expenses, makes repetitive references to minor details such as expenses for “personal gum” and a “personal massage”, and closes with a partisan reference to a “Liberal double standard”. The Liberal response from John McCallum again critiques the question for being inappropriate and general. However, his response throws partisan accusations right back at both the Conservative opposition and Pallister himself. In a remarkable example of the qualitative dynamics represented by a low similarity score, McCallum skirts close to violating parliamentary procedure by using a newspaper quotation to retaliate directly against an opposition MP. In sum, the David Dingwall speech examples are an excellent illustration of how adversarial accountability can be constructive or destructive, and how similarity scores effectively capture these dynamics in a quantitative measurement.

5.1.7 Sponsorship Program

Overwhelmingly, the central topic of discussion in Question Period during the 38th Parliament was the sponsorship scandal. Perhaps because the topic is so expansive, the highest and lowest similarity samples from the Sponsorship Program debate topic are excellent examples of high and low accountability in text. In general, when similarity scores are high, opposition MPs ask more specific and less partisan questions, while government MPs clarify and reaffirm positions, provide more information, or make promises of future action in their responses. When lexical similarity in an exchange is low, opposition questions are leading or loaded, contain partisan or ideological jabs, and are repetitively phrased; likewise, government responses are more partisan, general, and confrontational.

In the first high similarity example in Table 5.10, Conservative Opposition MP James Moore poses a specific and well-researched question involving a statement by the Prime Minister about dinner with a particular sponsorship scandal figure. The Deputy Prime Minister responds with a reaffirmation of the Prime Minister’s testimony and a commitment to the government’s position on the matter, including a clear denial of the relationship alluded to by the opposition; there is no attempt to avoid the question. In the second example, government and opposition are likewise on the same page in terms of their commitment to obtaining the truth and respecting the Gomery inquiry process. Peter MacKay presses the government to establish a clear position on whether it will challenge the final Gomery report, referencing Chrétien’s divisive lawsuit. Chrétien’s involvement made this question politically difficult for the government, but Scott Brison’s response does not deflect or dodge the issue. Brison was at the time Minister for Public Works, the department at the centre of the sponsorship spending controversy.
He explains in detail the government’s view that the report should not be delayed regardless of Chrétien’s actions. Again, he provides further explanation of the government position, clarifies the confusion regarding Chrétien’s involvement, and makes a future commitment to accountability for the Gomery process, addressing all key points in the opposition question.

The low similarity examples in Table 5.11 are likewise typical. Both questions are directed at Minister Scott Brison (notably, a former Progressive Conservative who crossed the floor following the CPC merger). In the first example, Opposition MP Diane Ablonczy employs an arsenal of accusatory and negative partisan language including “ugly”, “theft”, “broken promises”, “money laundering”, “kickbacks”, “hidden agenda”, and “betrayal”; the question is unfocused and leaves no room for an accountable or acceptable government answer. In response, Brison retaliates in kind, accusing Ablonczy of inappropriate behaviour and of making unfounded allegations. The second example likewise exemplifies the opposition tactic of posing accusatory and loaded questions fuelled by partisanship. Again, the government can do little but retaliate, in this case targeting the Conservative party and MP James Moore personally for a lack of respect for the rule of law and the Canadian justice system. Characteristic of a low similarity exchange, Brison’s answer is a partisan volley that does not address the question posed, but the question itself was too loaded to be an honest attempt at holding the government to account.

Table 5.10: Sponsorship Program (38th Parliament) High Similarity Speech Pair Examples

Date: 19/4/05
Opposition Member: James Moore
Question Text: Mr. Speaker, on April 14, the Prime Minister denied having dined with Claude Boulay, but Alain Renaud has stated that the Prime Minister did dine with him. Now we learn that, when the PM was finance minister, Mr. Boulay received over $67 million in sponsorships. Why is the Prime Minister denying his relationship with Mr. Boulay? Is it because this would directly implicate him in the sponsorship program?
Minister: Anne McLellan
Answer Text: Mr. Speaker, the Prime Minister is not hiding anything here. He has been absolutely clear about his relationship with Mr. Boulay. The Prime Minister testified before Gomery. The member should read that testimony. The Prime Minister said in this House that he did not have with Mr. Boulay, or anyone else as far as that goes, in relation to directing any contract to anyone. How much clearer can the Prime Minister be?
Similarity: 0.691905

Date: 14/6/05
Opposition Member: Peter MacKay
Question Text: Mr. Speaker, I do not believe that is the letter. Justice Gomery says the government’s recent position of leaving the door open for a potential challenge to his final report puts a cloud over him and places him in an extremely difficult position. Justice Gomery is now going to court to get a definitive dismissal of Mr. Chrétien’s challenge. Protecting Justice Gomery’s integrity and the guarantee of a final report are extremely important to Canadians. Will the government support Justice Gomery in his fight to protect the integrity and the timeliness of the report, or is the Prime Minister again just setting up an election escape hatch?
Minister: Scott Brison
Answer Text: Mr. Speaker, the government’s view was expressed by counsel. That view was that we did not want to see any delay in Justice Gomery’s report and that in fact, if there were going to be any action by Mr. Chrétien that would allege bias, it would be preferable to have that action after Justice Gomery provided his report to Canadians. Justice Gomery wants to deal with this issue in the short term. We respect Justice Gomery’s position and support him in that position because we continue to support the work of Justice Gomery. We believe he is doing important work and positive work on behalf of Canadians.
Similarity: 0.605444

Table 5.11: Sponsorship Program (38th Parliament) Low Similarity Speech Pair Examples

Date: 21/3/05
Opposition Member: Diane Ablonczy
Question Text: Mr. Speaker, the ugly truth about Liberal Party theft from Canadian taxpayers is finally coming out. A year ago the Prime Minister promised that voters would have these facts before an election. He broke that promise and hid the organized money laundering that kicked back millions of public dollars into Liberal hands, including his own closest supporters. The Prime Minister told voters his competition had a hidden agenda. It turns out that the Liberals were the ones really hiding something. How can the Prime Minister explain this betrayal?
Minister: Scott Brison
Answer Text: Mr. Speaker, it is a good thing the hon. member opposite is using the immunity of the House to protect herself in making those kinds of outrageous statements. If she, as a lawyer, were to make those kinds of statements in a courtroom without evidence, based solely on testimony before an inquiry on a daily basis, she would probably be disbarred. She should be ashamed of herself, by dragging reputations through the mud here on the floor of the House of Commons without any firm evidence upon which to make those allegations.
Similarity: 0.034816

Date: 26/10/04
Opposition Member: James Moore
Question Text: Once again, Mr. Speaker, the minister is making his own mistake by persistently failing to answer the simplest of questions. Last week the public works minister stood in the House and specifically referred to specific constituencies and projects that were under the sponsorship program, but when it comes to the Prime Minister he has selective amnesia. How convenient but he cannot have it both ways. Again, when did the Prime Minister know that his office was making phone calls to secure taxpayer money for his own personal fundraiser? When did he know?
Minister: Scott Brison
Answer Text: Mr. Speaker, I think there is something more deep-seated in the Conservative Party’s contempt for Justice Gomery’s work. I think it reflects a greater contempt for the independence of the Canadian judiciary, in fact reflected by their justice critic’s description of the situation when he said that there was a lot of distrust in general toward the judiciary right now and that it was leading a lot of people to be very fearful of giving powers to the judiciary. Further, their own leader said that he agreed that serious flaws existed in the Charter of Rights and Freedoms and that there was no meaningful review or accountability mechanisms for justices. They should let Justice Gomery work and have some respect for the independence of the Canadian judiciary.
Similarity: 0.025863

To summarize, I began with the premise that accountability in Question Period debate can be assessed qualitatively as the extent to which oppositions ask reasonable questions that are not solely partisan attacks, and governments respond with rationales, additional information, and commitments rather than dodging the question or throwing partisan attacks in return. I expected that a quantitative measurement of lexical similarity across exchanges would be a way of quantifying and measuring such accountability, following a measurement definition derived from Bovens’ model of questioning, information, and potential judgement. Within a close reading of 38th Parliament debates, these propositions appear to have empirical support.
Politically charged, partisan, and controversial topics, such as accusations against the Ministers of Citizenship and Immigration Judy Sgro and Joe Volpe, generally showed up as lower similarity topics than did substantive areas such as Natural Resources and Health. Across speech exchanges within topics, high similarity scores reflected substantive debates over procedures and policies wherein government and opposition members shared a common frame of reference. Low similarity scores were associated with leading or loaded questions, partisan sniping, repetition and generality, and evasion. The Sponsorship Program speeches were perhaps the clearest demonstration of the qualitative differences between debates with high and low similarity scores, which reflect a higher and lower quality of parliamentary accountability in debate. In the next section of this chapter, I confirm these observations within a second case study.
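
The intuition behind these observations can be illustrated with a deliberately simplified sketch. It is not the measurement used in this dissertation, which relies on word vectors and doc2vec; it assumes instead a plain bag-of-words cosine similarity, and the function names and the three example sentences are hypothetical. Under that assumption, a reply that reuses the vocabulary of the question scores higher than a reply that changes the subject.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_similarity(question, answer):
    """Cosine similarity between raw word-count vectors of two speeches.

    Assumption: a simple bag-of-words stand-in, not the thesis pipeline.
    """
    q_counts = Counter(tokenize(question))
    a_counts = Counter(tokenize(answer))
    shared = q_counts.keys() & a_counts.keys()
    dot = sum(q_counts[w] * a_counts[w] for w in shared)
    q_norm = math.sqrt(sum(c * c for c in q_counts.values()))
    a_norm = math.sqrt(sum(c * c for c in a_counts.values()))
    if q_norm == 0 or a_norm == 0:
        return 0.0  # an empty speech shares nothing with any other speech
    return dot / (q_norm * a_norm)


# Hypothetical sentences written for illustration, not drawn from Hansard.
question = "When will the government create new child care spaces?"
on_topic = "The government will create new child care spaces through this budget."
evasive = "The member opposite should be ashamed of her party."

print(lexical_similarity(question, on_topic))
print(lexical_similarity(question, evasive))
```

In this toy case the on-topic reply shares most of its vocabulary with the question, while the evasive reply shares only a function word, so the first score is markedly higher. Raw counts also reward shared stop words, which is one reason a production measure would need more careful preprocessing than this sketch.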

5.2 39th Parliament: Conservative Minority

5.2.1 Historical Overview

The 39th Parliament makes an excellent comparative case given its time proximity, alternation of power, and contextual continuity with the 38th Parliament. The Conservative government, led by Prime Minister Stephen Harper, was one of the longest-lived minority governments in Canadian history. In contrast to the declining fortunes of the Liberals over the early 2000s, the Conservative Party of Canada (CPC) emerged united from a relatively quick merger process in 2003. The new party, comprised of the former Canadian Alliance and Progressive Conservative parties, was positioned to end the vote-splitting on the right that had contributed to successive Liberal victories over the 1990s (Doern, 2007, 6). The CPC maintained a steady base of support following the 2004 election, which saw their seat total improve from the previous showing of the Canadian Alliance in 2000 but not yet attain the level of support in Ontario or Quebec that would allow them a shot at governing (LeDuc et al., 2010, 492). The 2004 result demonstrated how the party merger and growing public acceptance of leader Stephen Harper had begun to pay off for the Conservatives; the 2006 election offered them the chance to institutionalize this shift. Following their defeat on a confidence motion in late 2005, the Martin Liberals pieced together a winter electoral campaign. The party remained disunited and understaffed, with little attempt to welcome Chrétien supporters back into the organization. Chrétien himself struck a further blow to Liberal fortunes with his December release of a legal challenge to the Gomery report, forcing Martin into the spotlight to again defend his government’s position on the sponsorship scandal (LeDuc et al., 2010, 589). The Liberals presented a directionless policy platform with little innovation beyond the government’s existing initiatives, and soon ran out of fresh material to attract voters as the 8-week campaign dragged on (Jeffrey, 2010, 586–587).
In contrast, the CPC strategically used the pre-Christmas campaign period to make simple and attractive policy promises, including lowering the GST immediately by 1% (Flanagan, 2009, 235). They rolled out successive social policy proposals with a conservative flavour, such as wait-time guarantees for the public health care system and child care tax credits in lieu of the Liberals’ public day care program (LeDuc et al., 2010, 501, 590). The Conservatives also attempted to legitimize their claim on government early by anticipating public concerns on controversial values issues such as gay marriage, promising no action would be taken without a free vote in Parliament. A tragic shooting over the holidays in Toronto served to shift public sentiment towards the Conservatives’ tough-on-crime platform proposals including mandatory minimum sentencing (Jeffrey, 2010, 593). Furthermore, the Conservatives capitalized on the ill will towards the Liberals in Quebec that had continued to fester as a result of the sponsorship scandal. The CPC platform addressed provincial rights issues relating to culture of particular relevance to Quebec (LeDuc et al., 2010, 503). The final nail in the coffin for the Liberals’ record on accountability was an announcement by the RCMP of an investigation into insider trading allegations involving Finance Minister Ralph Goodale relating to the government’s legislation on regulating income trusts (Flanagan, 2009, 247–248). Liberal poll numbers began to drop as the campaign proceeded into January, instigating a tactical switch to negative advertising. The strategy of frightening voters away from the Conservatives and casting Stephen Harper as a right-wing extremist with a “hidden agenda” had worked for the Liberals in 2004. However, these allegations failed to resonate with Canadians as strongly as they had in the past decade (Flanagan, 2009, 251–252).
On election day, January 23, the Conservatives won a narrow minority government. Their major success had been in securing new support in Quebec, gaining votes from both the Liberals and the Bloc Québécois; they also improved their balance of power versus the Liberals in Ontario. The shift yielded a Conservative total of 124 seats (with 36% of the popular vote) to the Liberals’ 103 seats (LeDuc et al., 2010, 504). The internal breakdown of the Liberal Party guaranteed the minority government safety for the immediate future. Former Prime Minister Martin resigned immediately after the Liberals’ electoral defeat, and a vote was scheduled for December 2006. A lengthy process would be required to recover from infighting over the Martin government’s decisions, lingering resentment from determined Chrétien supporters, exhaustion of financial reserves following changes in party financial rules, and a lack of policy direction that had been evident in the 2006 campaign platform (Jeffrey, 2010, 606). Over and above the disarray among the Official Opposition, the Conservative government’s stability was bolstered by general public fatigue after two elections in two years. The Bloc also signalled its willingness to support the Conservatives in Parliament indefinitely given their pledge to respect provincial jurisdiction (Gervais, 2012, 166–167). Given these favourable circumstances, the government moved rapidly to make good on five key campaign promises: an Accountability Act, a child care policy, a patient wait times guarantee, a cut to the GST, and a package of tough-on-crime legislation. The Speech from the Throne was short and focused mechanically on these five promises, but the opposition had little energy remaining to demand input or elaboration (P. H. Russell, 2008, 45). From its inception, the 38th Parliament had operated under conditions of ongoing scandal and a precarious balance of power that lent bravery to the opposition. 
In contrast, the 39th Parliament was characterized by what Russell terms “classic minority government rule, when government proposals have to be effectively defended in Parliament and policy-making must be opened up to include views different from those of the minority who voted for the government” (P. H. Russell, 2008, 47). The Conservatives’ first budget, presented in May 2006 by Finance Minister Jim Flaherty, exemplified this style of governance. It maintained a strong focus on the five key platform promises emphasized in the Throne Speech, but also included concessions to the Liberals and NDP on surplus spending and child care, and to the Bloc in an increase of transfer payments to the provinces to begin addressing the fiscal imbalance. This budgetary balancing act allowed all parties to save face: the Liberals and NDP obtained some of their policy demands, and also felt safe to vote against the budget given the Bloc’s committed support guaranteed the government would not be defeated at an inopportune time (Gervais, 2012, 172; P. H. Russell, 2008, 48). Another case study in classic minority governance was one of the first bills to be introduced by the new government to make good on its campaign promises: the Accountability Act, first introduced in April 2006. It directly addressed the public mood of discontent following the Gomery Commission with new rules on conflicts of interest, lobbying, election financing, and oversight of the public service including an enhanced role for the Auditor General. Eight months of elaboration in the House and in committee contributed to the final version of the bill, including adoption of amendments from both sides of the floor. The negotiation process was at times acrimonious, but all parties had campaigned on some promise of increased accountability measures and generally agreed on appropriate policy responses to the Gomery Report’s recommendations (Gervais, 2012, 176–177; P. H. Russell, 2008, 48).
Such cooperation was not on offer on another one of the government’s policy priorities: crime. The Conservatives introduced a substantial number of new justice bills in the first session and had intense difficulties convincing opposition members to support them. By the time Parliament was prorogued, the majority of these bills had not yet been addressed on the legislative calendar and had to be re-introduced in the second session; others that did pass were held up in the Liberal-dominated Senate (Gervais, 2012, 177). On multiple occasions, Prime Minister Harper made clear his displeasure at being forced to govern under minority conditions and to negotiate with the opposition. A case in point was his unwillingness, until he had a majority government, to appoint a leader acceptable to the opposition for the new appointments committee created by the Accountability Act; it foreshadowed his later decision to dissolve Parliament to seek a majority mandate (P. H. Russell, 2008, 48). This decisive if sometimes controversial leadership style paid off in successfully navigating political crises, including securing a deal with the United States to ameliorate the softwood lumber dispute, extending Canada’s commitment to the mission in Afghanistan, and handling changes to the taxation of income trusts (Doern, 2007, 5). At the same time, however, Harper’s personal style and desire to implement key promises as fast as possible fostered a climate of micromanagement by the PMO. In Parliament, the minority government was also criticized for its aggression. The Liberals elected their new leader, Stéphane Dion, in December 2006. Dion’s victory was largely unanticipated, given his lower profile than other candidates in the race (including Michael Ignatieff and former Ontario NDP Premier Bob Rae), weak caucus support, and troublesome unpopularity in Quebec. However, he successfully distinguished himself with a strong policy focus on the environmental portfolio and benefitted from factional vote-splitting.
The Conservatives immediately implemented a strategy to undermine Dion’s momentum by highlighting his weaknesses in negative advertising (LeDuc et al., 2010, 508). This advertising led to accusations that the Conservatives were engaging in an “American-style” constant campaign (Doern, 2007, 6). The government also took an adversarial approach to committees, which were necessarily dominated by opposition membership, in their appointment process for committee chairs and in explicit instructions given to Conservative committee chairs (Gervais, 2012, 184). In general, standing committees of the 39th Parliament were characterized by exceptional partisanship and episodes of confrontation (Gervais, 2012, 204). March 2007 marked the introduction of the Finance Minister’s second budget. It bore significant similarities to the first: it focused on five distinct policy priorities, including additional transfers to the provinces to relieve the fiscal imbalance and reduce health care wait times, and tax reductions including credits for child care and low-emission vehicles. Again, it also afforded the opposition parties a strategic opportunity. The Bloc again committed its support based on the budget’s focus on provincial needs, allowing the Liberals and NDP to vote against a budget that nevertheless increased public spending to levels at or higher than those under the Martin Liberals (Doern, 2007, 6; Gervais, 2012, 173). The second budget marked a transition point for the Conservative government at which the five platform proposals from the 2006 election campaign had essentially been achieved. The last major piece, fixed election date legislation included as part of the Conservatives’ accountability strategy, fell into place in May 2007. The government’s next priorities were to address the incipient threat posed by a Liberal opposition led by Stéphane Dion on policy terms.
Given Dion’s successful focus on environmental policy during the leadership race, issues such as carbon taxes and Canada’s commitment to the Kyoto Accord became central points of contention. The Harper Government’s position on the Kyoto Accord had changed little from the Conservatives’ time in opposition. The strategy was to let Canada’s treaty commitments on the reduction of greenhouse gases expire, instead focusing on legislation to reduce other air pollutants within a broader policy frame of air pollution as a health care and economic issue (Gervais, 2012, 179). Dion’s entrance as Opposition Leader forced the government to negotiate this strategy in the House of Commons. In February 2007, a Liberal Private Members’ Bill demanding the government formulate a plan within 60 days to uphold Canada’s Kyoto commitments passed with support of all three opposition parties (P. H. Russell, 2008, 52). The bill had no binding authority over the government nor could it have any influence on public spending; nevertheless, the government announced its intention to produce such a plan. Public opinion was shifting towards support of the Kyoto Accord commitments, and the government’s Clean Air Act addressing particulate air pollution had stalled on the timetable. The solution the government offered in April 2007 was modified regulatory, not legislative, controls on greenhouse gas emissions; this action allowed the Conservatives to placate the opposition without being forced to seek their approval in the House of Commons (Gervais, 2012, 181). As the Private Members’ Bill passed through the Senate and became law, and the first session of the 39th Parliament came to a close in Summer 2007, the Prime Minister affirmed the government would take no further action on the Kyoto file (P. H. Russell, 2008, 56). The second session of the 39th Parliament, beginning in September 2007, presented the Conservative minority government with a political dilemma. 
Its five-point agenda from the 2006 election had been enacted successfully under minority conditions. In order to justify seeking a majority mandate from the Canadian electorate, the government aimed to introduce a more ambitious legislative program that would allow them to make the case that majority power was necessary to implement their full vision (P. H. Russell, 2008, 57). An example of the government’s new legislative tactic was an omnibus crime bill that repackaged the tough-on-crime legislation that had fallen off the agenda at the end of the first session. The omnibus bill did contain revisions to individual pieces of legislation previously negotiated with the opposition; however, during the second session, there would be no further compromise. The government showed it was willing to tempt defeat on the passage of the omnibus crime bill, and other subsequent omnibus bills, by treating them as confidence motions (P. H. Russell, 2008, 57). It approached the Speech from the Throne in a similar fashion. Prime Minister Harper gave an ultimatum to the opposition parties: if they were to vote in support of the Speech, he would expect them to support the entirety of the government’s legislative agenda going forward (Gervais, 2012, 186). Despite this tough talk, the government did make it politically viable for the Liberals to escape triggering an election. The fiscal agenda was not overly antagonistic and contained popular, if minimalist, spending increases. On files such as Environment, the proposals made were so general that the could find no tangible reason to oppose the February 2008 Budget, granting the government a few much-needed votes. The Liberals decided not to vote against the budget but to abstain, narrowly permitting the government to survive as the Bloc and NDP both voted against it (Gervais, 2012, 191; P. H. Russell, 2008, 59).
This split between the Liberals and the NDP was again evident in March 2008, when the NDP advanced a motion of no confidence concerning the government’s inadequate action to address climate change, including a lack of progress on the stalled Clean Air and Climate Change Act in the second session. Only 11 Liberal MPs, including leader Stéphane Dion, showed up for the vote and joined the NDP and Bloc in voting against the government. The NDP gambit succeeded in further typecasting Dion as a weak leader unwilling to take a stand on the policy file for which he ostensibly had the strongest claim of legitimacy (Gervais, 2012, 198). As the Liberals’ unwillingness to trigger an election persisted, the Conservatives took it upon themselves to dissolve Parliament in order to seek the majority they desired, their recent fixed election date legislation notwithstanding. The Prime Minister called upon the Governor General in September 2008 and set an election for October 14, 2008, about one year before the anticipated election date. He argued that Parliament had become dysfunctional and he had found it impossible to negotiate a successful compromise with opposition leaders on the upcoming legislative agenda; however, the government had largely been successful so far on implementing its program (LeDuc et al., 2010, 508). The looming threat of economic crisis and the Liberals’ weakness in the polls were more significant contributing factors to the decision’s timing.

5.2.2 Topics of Debate

Table 5.12 shows the topics discussed in at least 50 question-answer exchanges during Question Period in the 39th Parliament, ordered by descending similarity score. I perform an identical statistical analysis to that performed for the 38th Parliament (see Appendix A.2 for details). Again, I find a significant overall difference across topical mean similarity scores at p < 0.001 with a small effect size (Cohen’s f² = 0.053) (Cohen, 1988). The right-most column of Table 5.12 shows a letter representation of groups of topics among which there is insufficient evidence of difference in similarity scores, and across which there is a statistically significant difference.
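
For readers unfamiliar with the effect size reported above, the following is a minimal sketch of how Cohen’s f² can be obtained from grouped scores. It assumes a one-way layout with topic as the only factor, in which f² = η² / (1 − η²) and η² is the share of total variance lying between topic means; the exact procedure used here is the one described in Appendix A.2, which may differ. The function name and the toy scores are hypothetical and are not data from this study.

```python
from statistics import fmean


def cohens_f2(groups):
    """Return (eta_squared, f_squared) for a mapping of topic -> scores.

    Assumption: a one-way layout, so eta squared is SS_between / SS_total
    and Cohen's f squared is eta squared / (1 - eta squared).
    """
    all_scores = [s for scores in groups.values() for s in scores]
    grand_mean = fmean(all_scores)
    # Variation of topic means around the grand mean, weighted by topic size.
    ss_between = sum(
        len(scores) * (fmean(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    # Total variation of individual scores around the grand mean.
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    if ss_total == 0:
        return 0.0, 0.0  # all scores identical: no variance to explain
    eta_sq = ss_between / ss_total
    if eta_sq >= 1.0:
        return eta_sq, float("inf")  # no within-topic variance at all
    return eta_sq, eta_sq / (1.0 - eta_sq)


# Made-up similarity scores for two hypothetical topics.
toy_scores = {
    "Topic A": [0.30, 0.25, 0.28],
    "Topic B": [0.17, 0.20, 0.15],
}
eta_sq, f2 = cohens_f2(toy_scores)
print(eta_sq, f2)
```

By Cohen’s (1988) conventional benchmarks, f² values near 0.02 are small and values near 0.15 are medium, which is why the reported 0.053 is described as a small effect. The toy data above are far more separated than the real topics and are only meant to show the arithmetic.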

A preliminary scan of speeches in low and high similarity topics reveals an immediate parallel with the 38th Parliament: low similarity topics are typically related to political scandal. National Defence primarily concerns two political controversies surrounding Minister Gordon O’Connor, the first being his past as a defence industry lobbyist and the second involving reports of abuse of Afghan detainees. As in the 38th Parliament, a major political scandal involving the actions of a previous Conservative government and Prime Minister occupied a substantial proportion of debate yet yielded an overall similarity score around the mean: the Airbus payoffs scandal implicating former Prime Minister Brian Mulroney and businessman Karlheinz Schreiber. Ethics speeches overlap with the Airbus scandal, but also involve controversy surrounding Conservative staffers’ actions toward late MP Chuck Cadman in the months prior to his death. A notable exception to this pattern of similarity with the 38th Parliament is the Citizenship and Immigration topic; it does not concern any particular scandal but consists of an ideological contest around substantial policy changes regarding undocumented foreign workers, temporary worker visas, and a backlog in immigration case processing. A closer look reveals this debate was strongly partisan, involving accusations by the new Conservative government of prior Liberal mismanagement, and on the Liberal opposition side of a Conservative attempt to betray Canadian values of diversity and inclusion.

Table 5.12: Topics of Opposition-Minister Exchanges in Question Period, 39th Parliament

Subtopic                      Mean Similarity Score   Standard Deviation   n     Groups
Child Care                    0.288                   0.137                56    e
Business of the House         0.242                   0.101                57    a e
                              0.236                   0.111                66    ab e
Health                        0.212                   0.128                66    abcde
Foreign Affairs               0.211                   0.101                204   abc
Aboriginal Affairs            0.199                   0.091                152   abcd
Afghanistan                   0.195                   0.096                443   abcd
Airbus                        0.192                   0.089                134   abcd
The Budget                    0.187                   0.102                76    abcd
The Economy                   0.185                   0.094                59    abcd
                              0.184                   0.095                101   abcd
Justice                       0.181                   0.093                88    bcd
The Environment               0.178                   0.087                428   d
Government Appointments       0.176                   0.095                68    bcd
National Defence              0.175                   0.097                110   cd
Ethics                        0.173                   0.099                61    bcd
Citizenship and Immigration   0.169                   0.083                67    cd

Among high similarity topics, Child Care exemplifies a policy area that both government and opposition emphasized in their electoral platforms, sharing a common frame of reference regarding its societal importance but differing on appropriate policy solutions. Interestingly, the Business of the House topic represents a procedural anomaly. These exchanges consist almost entirely of questions asked by Ralph Goodale, acting in his role as Opposition House Leader, to the Conservative House Leader, seeking clarity on the government’s timetable and in particular opportunities for debate. Although these speeches are less interesting from a substantive perspective, they are a useful confirmation of the relationship between accountability and high similarity scores. The questions are intended to solicit information and explanation on the government’s impending legislative plans, and are answered in similarly straightforward fashion, yielding a high mean similarity score.

As with the 38th Parliament, in the following sections I examine in more detail the highest and lowest similarity speech examples drawn from a selection of topics: Child Care, Citizenship and Immigration, Ethics, and Airbus.

Table 5.13: Child Care (39th Parliament) High Similarity Speech Pair Examples

Date: 25/4/06
Opposition Member: Michael Savage
Question Text: Mr. Speaker, that is not a vision, that is a fiction. Choice in child care exists only when child care spaces exist. A choice was made in Canada to provide real child care and better training and wages for child care workers. Over 60% of Canadians voted for parties on January 23 that supported real child care. When will the government get serious about helping Canadian children and families?
Minister:
Answer Text: Mr. Speaker, we are serious about creating these spaces and about child care overall. We will be investing $1.25 billion in the creation of these child care spaces. We will be putting forth legislation for parents to receive $1,200 a year to help with their choice in child care. If the Liberal Party is serious about helping parents with child care, then I suggest it support our choice in child care allowance.
Similarity: 0.71084337

Date: 10/4/06
Opposition Member: Bill Graham
Question Text: Mr. Speaker, the child care network in Quebec is a model for Canada and the entire world. In Quebec there is reason to be proud. They have a program that most Canadians need. Yet, the Prime Minister will not budge. He claims that these tax benefits will be equivalent to a national child care program. Will he now promise to respect the agreements that the Liberal government reached with the provinces on child care?
Minister: Stephen Harper
Answer Text: Mr. Speaker, the leader of the opposition is suggesting that some provinces, especially Quebec, are capable of managing their own child care system. We respect that. We intend to provide an allowance to every family for child care. That way families will have a choice and have a program that can create new child care spaces. That is what this government will do.
Similarity: 0.58474991

Table 5.14: Child Care (39th Parliament) Low Similarity Speech Pair Examples

Date: 25/4/06
Opposition Member:
Question Text: Mr. Speaker, one would think that in 13 years as a government in waiting the Conservatives could have come up with something more substantive than vague promises and numbers pulled out of the air. They have no plan and have never had a plan. The minister has admitted that herself. Why is she now trying to cobble together a plan when the provinces and the families have said that they like our Liberal plan? Is this just spite?
Minister: Diane Finley
Answer Text: Mr. Speaker, with the help of the opposition parties, Canadian parents across the country will receive in the budget, should the opposition members support it, $1,200 a year for each child to help with the choice in child care that meets their needs, whether it is day care, babysitters, grannies, moms or dads staying at home. Parents will have that option and then we will work on the creation of 125,000 new spaces that meet the needs of real working families.
Similarity: 0.06241878

Date: 19/2/07
Opposition Member: Bonnie Brown
Question Text: Not surprisingly, Mr. Speaker, the government continues to boast about its measly $100 per month cheque, but parents are now receiving the notice of taxes due on this money. Single parents will have to pay the highest rate. The former minister spent $750 on a limousine to deliver the first cheque, but now it is tax time and the government has come collecting. Will the current minister be spending hundreds of dollars on limousines to collect the tax from the family that received the initial cheque?
Minister: Monte Solberg
Answer Text: Mr. Speaker, the member should really direct that question to her own leader. It is her leader who said he would take back 100% of the universal child care benefit, something that today goes to 1.4 million families on behalf of 1.9 million children, for $10 billion over the next five years. That is something the leader of the Liberal Party said he would take away from Canadian families. Shame on him.
Similarity: 0.05938557

5.2.3 Child Care

Child care was a policy issue on which the opposition Liberals and governing Conservatives shared common ground: both had promised greater access to child care for Canadian families as a major platform plank. However, the parties disagreed substantially on how child care should best be delivered, with the Conservatives favouring a tax credit approach and the Liberals a government provision model. Informing the background of this debate was the successful but controversial Quebec subsidized universal daycare model. The Conservatives emphasized their plan would not infringe upon provincial rights to implement care services as they wished, but they would not commit to endorsing a Quebec-style model on a national basis as the NDP and Liberals suggested. Overall, the child care topic is a straightforward example of a policy issue area of debate within which a shared frame of reference for accountability exists.

The high similarity examples in Table 5.13 exemplify this type of policy debate. In the first exchange, the opposition position is that improved child care would best be achieved through direct government spending to create more spaces. The Conservative response is to acknowledge the importance of the shared goal of creating spaces, but to contest the method of accomplishing this goal. Accountability in this exchange consists of the government addressing the substance of the opposition question, acknowledging a common frame of reference, and detailing what it will do to resolve the situation. The second example follows a similar pattern to the first, this time with particular reference to the Quebec model. This question is more directly one of accountability, referencing a demand to respect provincial agreements on child care made by the previous Liberal government. This series of 10 bilateral agreements negotiated with the provinces, in conjunction with $5 billion in federal spending commitments, was achieved at the very end of the Martin era and delivered the nation-wide child care program that he had promised during the 2003 campaign. Upon their electoral victory, the Conservatives announced their decision to let these provincial agreements lapse (Delacourt, 2010). Given no Westminster government is bound by the actions of its predecessors, especially considering this series of provincial deals was not solidified in legislation by the Liberals, the question is a political one. The Prime Minister does not address the question as posed, but dodges and reinterprets its intent in order to restate his government’s approach to child care spending. This example suggests the similarity measure of accountability might not perform as accurately at high similarity levels when the topic of discussion is a policy contest.

However, the low similarity Child Care examples in Table 5.14 still show a clear contrast to the high similarity cases. The opposition criticism of the Conservative tax credit approach is much more aggressively partisan than in the high similarity exchanges; the second question, for example, cites wasteful spending on a photo-op related to the policy announcement. Two possible strategies for a government response to these tactics are on display, both of which reflect a disconnect between government and opposition frames of reference characteristic of low accountability. In the first example, Diane Finley essentially ignores the opposition and uses her response as an opportunity to repeat talking points about the government’s policy. In the second example, Monte Solberg takes an opposite approach and directly retaliates with attacks on the Liberal Party and Dion in particular. The avoidance of questioning and ideological bickering in these examples lead to lower lexical overlap in a manner consistent with its use as a measure of accountability.

Table 5.15: The Environment (39th Parliament) High Similarity Speech Pair Examples

Date | Opposition Member | Question Text | Minister | Answer Text | Similarity

31/5/06 | Raymonde Folco | Mr. Speaker, Quebec had to establish its own plan to reach the Kyoto targets because the Conservatives rejected the made in Canada plan that was already in place. The Prime Minister confirmed that Quebec cannot count on financial support from the Conservative government. The Quebec Minister of the Environment was clear: if Quebec does not meet its Kyoto targets, the Conservatives will be to blame. Why is this government abandoning the provinces instead of assuming those responsibilities that, clearly, should be taken care of by the federal government? | Rona Ambrose | Mr. Speaker, the only party in this House that failed Quebec’s opportunity to reach its Kyoto targets is the Liberal Party of Canada. It never put in place any national plan with the provinces. In fact, it agreed to targets without even consulting with the provinces years ago on how to put that implementation plan in place. Our government is working with the provinces, including Quebec. When I was in Quebec, Minister Béchard told me that the highest cause of greenhouse gases in Quebec was transportation, which is why this government has invested in public transportation. | 0.58129343

31/10/06 | John Godfrey | Mr. Speaker, environmental groups agree that the minority Conservative government’s environmental plan is a disaster. Today the NDP abandoned Kyoto as well. This dead air act rips the heart out of existing environmental protection legislation, leaving Canada with a fragmented, uncoordinated and piecemeal Canadian Environmental Protection Act. No amount of tinkering with this disaster will salvage it. It is simply wrong-headed. When will the government withdraw this fraud of a bill and bring forward a genuine plan on global warming? | Rona Ambrose | Mr. Speaker, that is rich coming from the party that has no plan on global warming. The member opposite knows full well that the clean air act is made up of amendments to the Canadian Environmental Protection Act to strengthen it so that we can regulate every industry sector across this country, both for greenhouse gases and for air pollution. I would encourage the member to work with us, to strengthen the Canadian Environmental Protection Act and the other acts that we are looking to strengthen, and to support the clean air act. | 0.42224772

Table 5.16: The Environment (39th Parliament) Low Similarity Speech Pair Examples

Date | Opposition Member | Question Text | Minister | Answer Text | Similarity

19/6/08 | David McGuinty | Mr. Speaker, less than a month ago in London, England, the Prime Minister admitted that he would “effectively establish a price on carbon of $65 a tonne”. He argued that $65 carbon was economic. Can the Prime Minister now tell us why he says one thing outside of Canada and something else completely different here at home? While he is at it, can the Prime Minister name a single economist or environmentalist who says his plan will do what he claims? And by the way, where the hell is the Minister of the Environment? | | Mr. Speaker, it is sad to see the Liberals once again lowering the tone in this place, but I understand the member’s anger. I understand it because I think he is more than a little frustrated that he cannot persuade one person, his brother, the Premier of Ontario, who has come out squarely defending Ontario taxpayers against the Liberal tax trick. Premiers across the country understand that the Liberal so-called green shift will end up shafting taxpayers with higher energy prices, higher food prices, and higher prices on just about everything. They are not going to buy this bogus tax shift. | 0.02791

28/11/07 | Stéphane Dion | Mr. Speaker, what the Prime Minister is saying is that he will pretend to do something when in fact he will do nothing meaningful on climate change. His excuse is that some other countries are not doing enough. Instead of pushing the world in the right direction to do more, he will drag everyone to do less and less and less, down and down and down. Is the Prime Minister sending his minister to Bali to sabotage Bali as the Prime Minister sabotaged the Commonwealth? | Stephen Harper | Mr. Speaker, that question is coming from a leader who raised greenhouse gas emissions 35% when he was in office. Our position is to lower greenhouse gas emissions, not to raise them. We have been absolutely clear. In order to reduce greenhouse gas emissions globally, we must have mandatory emissions targets for all major emitters. That is the position of the Government of Canada. Shamefully, it is not the position of the Liberal Party. That is the wrong position. It is the wrong position for Canada and it is the wrong position for the globe. We are going to fight for a strong international agreement. | 0.02672612

5.2.4 The Environment

The Environment was also a contested policy issue area between the government and the opposition Liberals during the 39th Parliament, with a significant third-party role played by the NDP. This topic yields an interesting comparison with Child Care, as despite its policy focus it had a lower similarity score overall likely due to a higher degree of partisanship. As discussed earlier, the Liberals took a strong stand on Dion’s “green shift” policy platform, including a carbon tax scheme. The Conservatives decided to let Canada’s Kyoto commitments lapse and initially focused on reducing other air pollutants before conceding to introduce greenhouse gas emissions targets. Finally, the NDP clashed with both other major parties in its persistent call for tougher environmental regulations.

The high similarity examples in Table 5.15 are both questions posed by Liberal MPs and answered by Minister of Environment Rona Ambrose. As observed in the Child Care topic, there is persistent partisan conflict underlying high similarity examples in this policy contest, including references to Quebec-specific concerns. In response, the government again recasts these partisan accusations into a castigation of prior Liberal failures, in this case a lack of consultation with the provinces and of respect for provincial jurisdiction. However, Ambrose also provides additional information on the government’s legislative plans relevant to Quebec’s interests. The second example is an even harsher partisan contest, with Liberal MP John Godfrey using terms like “fraud”, “wrong-headed”, “disaster”, and “dead air act” in his question about the Conservatives’ proposed changes to the Canadian Environmental Protection Act. A closer examination of the text reveals that despite the inflammatory language, the two participants do agree on the scope and legitimacy of the debate but disagree on appropriate policy solutions and their specifics. For example, the Liberal MP considers the changed Canadian Environmental Protection Act as “fragmented” and “piecemeal”; the Conservative position is that the package is made up of multiple amendments. This shared frame of reference is what gives this exchange a higher similarity score despite evident ideological differences.

The low similarity examples are in Table 5.16 and differ noticeably from the high similarity exchanges; both are partisan, ideological contests, but the low similarity examples are less policy-focused and more argumentative and petty. The first consists of an exchange of personal attacks. The Liberal MP speaking, David McGuinty, is the brother of sitting Liberal Premier of Ontario Dalton McGuinty. His accusations are directly targeted against the Prime Minister and the Minister of Environment. The government response is primarily a partisan attack against McGuinty and his brother, and is characteristically repetitive in its use of language. The second exchange between the government and opposition leaders is likewise characteristic: both Dion and Harper are partisan, negative, and repetitive in their use of language, and Harper specifically refers to prior Liberal governments in his response.

To sum up, within a topic of debate representing a highly-contested policy area, high similarity passages can be strongly ideological while still surrounding a shared and substantive frame of reference. This is not surprising: adversarialism is not opposed to accountability in a Westminster system, but is indeed critical to collective accountability. When low similarity examples are juxtaposed, they are qualitatively different in terms of their language use, aggression, and lack of substantive debate, from one or both sides.

Table 5.17: Ethics (39th Parliament) High Similarity Speech Pair Examples

Date | Opposition Member | Question Text | Minister | Answer Text | Similarity

5/4/06 | Ujjal Dosanjh | Mr. Speaker, the Prime Minister claims the accountability act as the top legislative priority of his government, but his defence minister’s list of former clients reads like a who’s who of the defence industry. Defence procurement represents nearly half of all government procurement. Why did the Prime Minister give that portfolio to a former defence lobbyist? | Stephen Harper | Mr. Speaker, once again, the Minister of National Defence is not only in compliance with all the conflict of interest rules of the previous government but with much more stringent conflict of interest rules that we are introducing. I will only say that it is about time we had a Minister of National Defence who had some background and some knowledge in national defence. | 0.46328625

5/4/06 | Ujjal Dosanjh | Mr. Speaker, the Prime Minister’s conflict of interest code states that ministers shall avoid even the appearance of being under an obligation to anyone who might profit from special consideration. The Minister of National Defence was a registered lobbyist until February 2004, representing at least 28 defence firms. Why did the Prime Minister appoint that minister in violation of his own code? | Stephen Harper | Mr. Speaker, I can simply assure the House and the hon. member that the Minister of National Defence has complied with all aspects of the conflict of interest code and will be an outstanding Minister of National Defence. He brings tremendous knowledge to an area of government that needs a lot of rebuilding after 13 years of that party in office. | 0.44342203

Table 5.18: Ethics (39th Parliament) Low Similarity Speech Pair Examples

Date | Opposition Member | Question Text | Minister | Answer Text | Similarity

13/3/08 | Stéphane Dion | Mr. Speaker, the Prime Minister will not get off so easily. There was a tape and we were able to hear him. The question he was asked in the tape was about a $1 million insurance policy. He answered by speaking about “financial considerations” for Mr. Cadman, “financial insecurity,” “financial losses,” and “financial issues.” Once again, the question is as follows: what “financial insecurity” was the Prime Minister talking about when he replied to a question about a $1 million insurance policy? | Stephen Harper | Mr. Speaker, as I just said and have been saying for the past two weeks, these allegations of criminal wrongdoing are utterly false. I am availing myself of what any Canadian would do when he has been treated in a completely unacceptable and illegal manner, which is what the Liberal Party has done here. I have every right, as does my family, to defend our reputation. The Liberal Party will, as I said, come to regret engaging in this illegal and untruthful behaviour. | 0.05114958

28/2/08 | Stéphane Dion | Mr. Speaker, the Prime Minister knew that his emissaries, his agents, people acting under his orders, tried to buy off a dying man. He knew it. It was immoral, unethical and illegal. It was against section 119 of the Criminal Code. Why did the Prime Minister authorize such immoral and illegal wheelings and dealings? Why did he let that happen? | Stephen Harper | Mr. Speaker, these allegations are false and irresponsible. I can understand that the leader of the Liberal Party is not going to accept my word for it. I can understand that he is not going to accept what Conservative officials said. I can understand that he is not going to accept the fact there is no evidence, but on national television Chuck Cadman said he discussed nomination meetings with Conservative officials. He said, “That was the only offer on anything that I had from anybody.” We have a deceased colleague who was highly respected here, so the question is, why will the leader of the Liberal Party not accept his word? | 0.03178209

5.2.5 Ethics

The Conservative government also experienced its share of political scandals during the 39th Parliament despite its promised focus on accountability issues. The high similarity examples in Table 5.17 both concern Minister of National Defence Gordon O’Connor and the legitimacy of his appointment at the opening of the 39th Parliament. O’Connor seemed a natural fit for the position, having been Harper’s defence critic in Opposition with an experienced background in the Canadian military. However, following retirement from his military position as a Brigadier General, O’Connor had worked for eight years as a defence industry lobbyist, serving a succession of major firms including General Dynamics, Raytheon, and Airbus Military (Pugliese, 2006). Furthermore, O’Connor had made a rapid transition from lobbyist to elected official, having de-registered as a lobbyist just four months prior to his successful election in June 2004. The new Accountability Act restricted former government officials from becoming lobbyists for five years, but the Prime Minister argued there was no issue with O’Connor’s reverse transition from lobbying to public affairs (Blanchfield, 2006). Underlying the high similarity examples is a specific and shared linguistic focus on the Minister of National Defence and the conflict of interest code. Consistent with accountability behaviour during a scandal, the government offers rationales supporting its decision, arguing O’Connor is acting in compliance with conflict of interest rules. It provides additional information to justify these rationales, in both cases emphasizing the knowledge and experience that O’Connor brings to the position. Finally, it promises future action for which it can be held accountable, namely more stringent regulations.

The low similarity examples in Table 5.18 are both related to the Chuck Cadman controversy.
Cadman was a former Canadian Alliance MP who had sat as an independent in the House since the 2004 election, controlling a valuable swing vote under minority parliament conditions. In May 2005, Cadman had recently suspended chemotherapy following a serious cancer diagnosis when he flew to Ottawa to vote in support of the Liberal budget. His vote forced a tie and allowed the government to stay in power, and Cadman died just weeks after the vote took place. In early 2008, a biography was published including allegations by his widow that Cadman was offered a $1 million life insurance policy by Conservative party operatives in exchange for his vote against the Liberal government on the budget (McGregor, 2008). Evidence emerged that these representatives were high-ranking and extremely influential Conservatives, and a tape indicated that Prime Minister Harper was directly aware of the offer. A Conservative news conference suggested in June 2008 that the recording of Harper had been doctored, while forensic experts retained on either side debated its legitimacy (Naumetz, 2008). The Liberals asked the RCMP to investigate a charge of bribery of an elected official in February 2008, but the investigation had uncovered nothing of substance by May (Greenaway, 2008). Nevertheless, the Liberals made much political noise over the Cadman affair and the audio tape, to the extent that the Prime Minister filed a libel lawsuit against the Liberal Party (Hanes, 2008). The chilling effect of the libel suit eventually pushed the Cadman controversy off the House of Commons radar.

The examples in Table 5.18 show characteristics we have come to expect in the low similarity case. In the first, Opposition Leader Dion is asking a relatively specific question regarding the Prime Minister’s accountability for the Cadman tape; however, the question is loaded and particularly repetitive.
In response, Harper does not address the specific repeated quotes and phrases, nor the more general question of what took place during the recording. Instead, he retaliates by claiming innocence of wrongdoing and chastising the Liberal Party for unfairly and illegitimately pursuing a partisan issue. The second question is similar to the first: Dion makes inflammatory accusations of both criminal and moral transgression by the Prime Minister and asks a loaded and repetitive question. However, in the response, the Prime Minister attempts to address the situation in an accountable manner. He cites additional information the government has provided and statements from Cadman himself absolving the Prime Minister and his officials of any illegal actions. But, as Harper points out, no amount of information or justification would be likely to satisfy the Opposition Leader at this time. This speech exchange exemplifies how parliamentary accountability requires earnest participation from both sides of the House; if an Opposition asks misleading and unanswerable questions, a government is unable to be accountable for them. The similarity score measure appears to adequately capture different mechanisms leading to low parliamentary accountability.

5.2.6 Airbus

The Airbus scandal was not as devastating for the Conservatives as was the sponsorship scandal for the Liberals, but it nevertheless makes a reasonable comparative case. Like Sponsorship Program, Airbus is near the mean in terms of topical similarity scores; more substantively, it also implicated a prior government and former Prime Minister. The affair dated back to the late 1980s, when Air Canada, still a Crown corporation at the time, made a purchase of Airbus aircraft. In the mid-1990s, allegations emerged that Progressive Conservative Prime Minister Brian Mulroney had put pressure on Air Canada to make the deal, and that Canadian politicians had received financial kickbacks as part of the arrangement. German-Canadian businessman Karlheinz Schreiber was named as the alleged intermediary. Over the next ten years, the RCMP investigated the Airbus contract and related business deals. Schreiber continued to fight the Justice Department, the RCMP, and Canadian courts of various levels to avoid extradition to Germany for charges of tax evasion, bribery, and fraud (Canadian Press, 2007). In 2004, new allegations surfaced that Mulroney received cash payments of $300,000 from Schreiber starting in 1993, and in 2007 that the former Prime Minister had not paid taxes on these receipts. Schreiber retaliated by suing Mulroney for return of this money plus interest, arguing Mulroney had not provided the lobbying services he had paid for. Schreiber further revealed in a 2007 affidavit that Mulroney received the first of these payments before he had stepped down as Prime Minister. Like Martin, Prime Minister Harper attempted to pre-empt any fallout from the scandal by calling for a third-party inquiry into the allegations, involving the RCMP, and affirming the government would have no contact or interference with the parties involved (Aubry, 2007).
This became of urgent importance in light of news reporting on the personal relationships among Harper, key ministers like Peter MacKay and , and both Mulroney and Schreiber (Hanes, Aubry, & White, 2007).

Table 5.19: Airbus (39th Parliament) High Similarity Speech Pair Examples

Date | Opposition Member | Question Text | Minister | Answer Text | Similarity

7/4/08 | Brian Murphy | Mr. Speaker, this government has said that it is absurd to ask whether there was any contact between ministers or government representatives and Mr. Mulroney that may have been organized or facilitated by Mr. Mulroney. However, Mr. Mulroney did meet in private with the former Minister of Industry last April. Will that meeting be part of the public inquiry’s mandate, yes or no? | | Mr. Speaker, again, the opposition is trying to make up stories or scandals out of thin air. When I was Minister of Industry, I never met with Mr. Mulroney about anything. I did however have social contact with him during his book launching in . | 0.49878375

9/4/08 | | Mr. Speaker, as parliamentarians, we are not addressing any questions to Professor Johnston, but rather to the Prime Minister. He is responsible for launching an inquiry and he promised a public inquiry. The fundamental nature of a public inquiry means that testimony is given publicly, Canadians can watch and listen to it, and everyone who should appear does appear, including Fred Doucet and everyone close to Mr. Mulroney. We understand why the Prime Minister said that Mr. Mulroney was appreciated as a mentor and advisor, and that he found it awkward to get to the bottom of all this concerning Mr. Mulroney and those close to him. We... | | Mr. Speaker, we have said that we will act on the recommendations made by Professor Johnston on the public inquiry and that is exactly what we intend to do. | 0.43023721

Table 5.20: Airbus (39th Parliament) Low Similarity Speech Pair Examples

Date | Opposition Member | Question Text | Minister | Answer Text | Similarity

16/11/07 | Tina Keeper | Mr. Speaker, last Friday the Prime Minister belatedly warned his ministers against having any dealings with Brian Mulroney, but does that prohibition extend to Mulroney’s close associates such as lobbyist Fred Doucet? Fred Doucet arranged the meeting at Harrington Lake and worked for one of Mr. Schreiber’s companies. He is now actively lobbying the government on 11 files, mostly in the defence department. Can the minister, who is also in conflict, tell the House whether he and his staff have ceased contact with Mulroney operative Fred Doucet? | Stockwell Day | Mr. Speaker, if the member opposite has information that is somehow going to be pertinent to this full public inquiry, then she should absolutely bring that forward to Professor Johnston. The other thing I find very interesting is that members opposite seem to have great details, which I appreciate because all Canadians want to get to the bottom of these questions. They have great details, but when it comes to details of where the $40 million in stolen money went, which have come out through the Gomery inquiry, with Liberal friends they seem to have no information at all. I wish they would be as aggressive in helping us pursue that $40 million as we are in wanting to pursue this $300,000. | 0.04286152

28/11/07 | Robert Thibault | Mr. Speaker, let us now consider subsection 40(3) of the Extradition Act that gives the minister the complete authority to make the extradition subject to “any conditions that the Minister considers appropriate”. Why does the minister not consider it appropriate to make the surrender of Karlheinz Schreiber specifically conditional upon his remaining physically present in Canada for as long as it takes to testify under oath at a public inquiry and parliamentary committee? Or will it be necessary for Parliament to again trump the minister to keep Schreiber from being silenced by that Mulroney infested government? | Rob Nicholson | Mr. Speaker, the Liberal Party has demonstrated over the last couple of days why it is inappropriate and unfruitful to discuss matters like this and negotiate matters of law on the floor of the House of Commons. I indicated that we will follow all the rules, all the laws. We have indicated our cooperation, but it has become obvious that the Liberals will never be happy. | 0.04233338

The most obvious feature of the high similarity examples in Table 5.19 is that the government responses are very short in length. Under these circumstances, a few repeated key terms can disproportionately raise the similarity score between a short speech and its lengthier partner. My calculation method normalizes speech vectors to control for the effects of word count disparities, and as a further measure speeches below 25 words in length were eliminated from the analysis dataset. Nevertheless, the second high similarity example in the table just escapes this cutoff, demonstrating an edge case for the similarity measure that can raise questions of measurement validity. By the same token, the short length of the government responses makes it difficult to qualitatively assess the level of accountability demonstrated in each exchange. Both questions demand clarification on specifics relating to the public inquiry into the Mulroney-Schreiber controversy headed by Professor Johnston. Both answers address the question asked by providing information or committing to future actions. The government’s short-and-direct strategy differs from the approach taken by the Liberals during the sponsorship scandal, which was to provide more detailed rationales and promises, but the substance of the government’s reply is similar in both cases. It is notable that the similarity scores for those high similarity examples in the Sponsorship Program topic were much higher, in the 0.6 range. Thus, the fact that the length disparity between government and opposition speeches yields some uncertainty in the measure is captured in the quantitatively lower score that results.

The low similarity passages in Table 5.20 are more consistent with familiar patterns observed across low similarity examples. In the first example, Stockwell Day deflects Tina Keeper’s pointed question about a particular lobbyist close to Mulroney and Schreiber by referring her to the independent inquiry. The large majority of the government response is instead devoted to rehashing the Gomery inquiry and the sponsorship scandal. The second exchange follows a similar pattern to the first: a focused and detailed, if somewhat loaded, question goes completely unaddressed by the government response, which is accusatory and partisan.
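
The two length controls described in this section (normalizing speech vectors and dropping speeches under 25 words) can be illustrated with a minimal Python sketch. It assumes cosine similarity over raw term counts and a simple regular-expression tokenizer; both are illustrative assumptions rather than the exact pipeline behind the scores reported in the tables, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

MIN_WORDS = 25  # minimum speech length stated in the text


def tokenize(text):
    """Lowercase and keep runs of letters and apostrophes (illustrative choice)."""
    return re.findall(r"[a-z']+", text.lower())


def unit_vector(tokens):
    """Term-count vector scaled to unit Euclidean length."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}


def pair_similarity(question, answer, min_words=MIN_WORDS):
    """Cosine similarity of a question-answer pair, or None if either is too short."""
    q_tokens, a_tokens = tokenize(question), tokenize(answer)
    if len(q_tokens) < min_words or len(a_tokens) < min_words:
        return None  # pair excluded from the analysis dataset
    q_vec, a_vec = unit_vector(q_tokens), unit_vector(a_tokens)
    # The dot product of two unit vectors is their cosine similarity.
    return sum(w * a_vec.get(term, 0.0) for term, w in q_vec.items())
```

Because each vector is scaled to unit length, a speech and a longer speech with the same word proportions score as identical, while a response that only just clears the cutoff can still be pulled upward by a handful of shared terms, which is the edge case noted above.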

5.3 Summary: Comparison of 38th and 39th Parliaments

In the preceding analysis, I attempted to understand accountability dynamics in Question Period speeches from a qualitative perspective in order to validate my lexical similarity measure of parliamentary accountability. Drawing on Bovens’ empirical model of accountability, I expected a low similarity exchange to result when the government fails to provide relevant information and rationales to the opposition, the opposition fails to perform its duty by interrogating those rationales, or both. Under these conditions, I expected speeches to be more about partisan and ideological signalling than about substantive policy or accountability discussion. In the passages examined above, a general pattern emerged of textual characteristics associated with low similarity scores including partisan attacks, loaded questions, negative words, accusations, evasion, general and vague phrasing, and repetition, on the part of one or both of government and opposition. From the “character assassination” of David Dingwall to the “completely unacceptable and illegal” claims made about the Prime Minister’s involvement in the Chuck Cadman case, there was an observable association between debate topics relating to matters of political scandal, partisan antagonism, and low mean similarity scores. Loaded or unanswerable questions were a common opposition tactic, such as asking of Prime Minister Martin “When did he know?” of misappropriated spending by his office. For the government’s part, low similarity speeches typically failed to address any part of the question asked and allocated more time to partisan tangents, for example when the Conservative government deflected a question about carbon pricing with personal attacks against the questioner’s brother, Liberal Premier of Ontario Dalton McGuinty. Low similarity exchanges were less likely to acknowledge a common frame of reference, such as a particular policy, or shared basis of legitimacy.
They also frequently confirmed Stewart’s observation that repetitive, vacuous speechmaking, rather than adversarialism, is most characteristic of a lack of accountability in parliament, with blatant repetition of words and phrases a consistent feature. Finally, in low similarity cases governments were less likely to announce any commitments implying future accountability.

My qualitative review of low similarity exchanges also revealed a potential measurement validity issue. The patterns discussed above also appeared to be correlated with a disparity between the word count in question and response. Examples include the second passage in Table 5.18 from 39th Parliament speeches about Ethics, or the first passage in Table 5.3 from the 38th Parliament involving Natural Resources. To check this possibility, I perform a simple test for correlation between similarity scores and the absolute values of the difference between word counts in the speeches studied in this chapter. My results (available in Appendix A.3) find the relationship significant (p = 0.03) but with a very low effect size (R² < 0.001). More theoretically, if a detailed opposition question is followed by a short government response, or vice versa, can we assume that the exchange reflects poor accountability? I have argued that parliamentary accountability is a two-way street: it is equally the opposition’s duty to ask reasonable, non-leading questions as it is the government’s duty to answer these questions as fully as possible. However, this potential measurement issue is worth addressing in my quantitative analysis. In the next chapter, instead of examining individual question-response pairs, I aggregate government and opposition speeches from each daily Question Period and normalize the resulting combined vectors in order to minimize the impact of word count on the accountability measure.
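
The word-count check described above can be sketched as follows. This is an illustrative outline only: it assumes a Pearson correlation between the absolute word-count gap and the similarity score, uses hypothetical function names, and does not reproduce the Appendix A.3 analysis or its p-value (which would additionally require a t-distribution).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def length_gap_check(pairs):
    """pairs: (question_word_count, answer_word_count, similarity) tuples.

    Returns (r, r_squared) for |word-count difference| against similarity.
    """
    gaps = [abs(q_len - a_len) for q_len, a_len, _ in pairs]
    sims = [sim for _, _, sim in pairs]
    r = pearson_r(gaps, sims)
    return r, r * r
```

In a simple regression of similarity on the word-count gap, the squared correlation equals R², so a value below 0.001 would mean the gap accounts for under a tenth of one percent of the variance in similarity scores even where the relationship is statistically detectable.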
High similarity cases typically shared underlying context surrounding a particular policy, regulation, or event such as a meeting or negotiation, and both government and opposition would make reference to the same terms as a result. For example, in the 38th Parliament’s Health topic, the government held an apparently contradictory position on human cloning between its stand taken in the Assisted Human Reproduction Act and Canada’s opposition to a UN Resolution on Human Cloning. This was the subject of an opposition inquiry for clarification, which the government provided by both restating its position for the record and explaining the deeper reasoning behind its lack of support for the UN initiative. Partisan attacks were less likely to occur in high similarity exchanges, although they were by no means absent, especially regarding politically controversial policy areas. For example, in the 39th Parliament The Environment took on an increased partisan relevance, leading to a higher degree of partisan differentiation even across high similarity speeches and a lower similarity score in relative and absolute terms than observed in less-controversial cases, such as Child Care in the 38th Parliament. In general, high similarity exchanges on policy area topics appeared more partisan during the 39th Parliament than the 38th. While this sample is small and non-random, it suggests the possibility of an interaction effect between party, majority status, and accountability, a phenomenon I identified in the previous chapter and will investigate in more detail in the quantitative analysis in the next chapter. Finally, high similarity responses contained more supporting details such as spending amounts and names of legislation, and were more likely to contain commitments to action the government is intending to take, indicating that the government is making a statement about a willingness to be accountable.
For example, the David Dingwall high similarity examples contain commitments from the Liberal government to abide by the specific recommendations of Privy Council Office lawyers on compensation and auditing; the corresponding low similarity examples contest the validity of the questions posed and refuse to address them.

In this section, I explored how my lexical similarity measure of accountability correlates with qualitative evidence of accountability in Question Period debates in order to test the validity of the measure. Consistent patterns emerged across multiple debate topics and both parliaments studied within the textual dynamics of high and low similarity examples; overall, the quantitative measure reasonably captures the extent to which both government and opposition are earnestly participating in parliamentary accountability as a process. The next step in generalizing this analysis is to examine opposition-minister exchanges at a higher level of aggregation, the daily Question Period level, and to analyze statistically the relationship between this aggregate measure and other variables of interest.

Chapter 6

Quantitative Analysis: Question Period, 1975-2010 and Daily Debate, 1945-2015

In this chapter, I return to the research questions advanced in Chapter 3. Are minority governments more accountable than majority governments, and does the percentage of seats a government controls matter? How is parliamentary accountability related to the popularity of a government? Finally, how are accountability and ideology related in parliamentary speech? Using a quantitative approach, I test the five theoretical propositions from Chapter 3 to answer these questions. To restate, I rely on two datasets in this chapter. The first was constructed by concatenating the government and official opposition speeches within a given day’s Question Period, then calculating similarity scores across the two composite texts. I calculate such Question Period-level scores for the years 1975-2010, selecting this scope for two reasons: Question Period attained a consistent length and place on the daily timetable only as of 1975, and polling data at a minimum quarterly resolution are available only up until 2010. In the analysis, I employ a mixed model including hierarchical random effects for sessions within parliaments, and quarter where relevant, to study the dependent variable of similarity score.1 The second dataset broadens this approach to dataset construction to the daily level and to a timespan from 1945-2015. Instead of concatenating speeches that took place only within Question Period, I combine all government and all official opposition (or all opposition) speeches, calculate similarity scores, and perform statistical analysis in a similar fashion. Finally, the last stage of the analysis compares two separate methods of calculating textual similarity scores that have different theoretical implications for detecting accountability and ideology in political speech, using the Question Period dataset.2 All the statistical models in this chapter assume lexical similarity as the dependent variable. 
1 To aid the reader in interpreting model results, I report simulated p-values for all fixed effects generated using lmerTest (Kuznetsova, Brockhoff, & Christensen, 2017). I also calculate effect sizes for each fixed effect using the R² method (Edwards, Muller, Wolfinger, Qaqish, & Schabenberger, 2008; Nakagawa & Schielzeth, 2013) using r2glmm (Jaeger, 2017). For these effect size calculations, please refer to Appendix B.8.

2 For more detailed information on methodology, dataset construction, and analytical techniques, please refer to Chapter 4.

Applying the lexical similarity method to both study datasets yields 4105 Question Period-level observations from 1975-2010, and 9102 daily-level observations from 1945-2015. The Question Period scores are approximately normally distributed (µ = 0.586, σ = 0.065) and are visualized in Figure 6.1. The daily scores are also approximately normal (µ = 0.718, σ = 0.088) but with a noticeable skew to the left, as shown in Figure 6.2. Such a pattern indicates an excess of low similarity observations. A manual examination of speeches sampled from this end of the distribution reveals that this phenomenon is largely attributable to OCR and processing errors generating false or incomplete, non-representative days of debate. As I will discuss later in this chapter, these issues foreshadow how the daily debate dataset proves less useful for a lexical similarity analysis of accountability, because the daily level is too contextually general to capture the phenomenon of accountability specified by my empirical model. For more discussion of these issues, and a comparison with simulated distributions, please refer to Appendix B.9.
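As a rough illustration of how a cluster of spuriously low scores produces a left skew, the sketch below computes the moment-based skewness of a small set of hypothetical similarity scores, with and without a few very low values standing in for corrupted days of debate. The values and the function name are illustrative assumptions, not data from this dissertation.

```python
import math


def sample_skewness(values):
    """Moment-based skewness: third central moment over the cubed std. dev."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / math.pow(m2, 1.5)


# Hypothetical daily similarity scores, symmetric around 0.72.
clean_days = [0.66, 0.69, 0.70, 0.72, 0.72, 0.74, 0.75, 0.78]
# The same scores plus a few very low values, standing in for days
# truncated or garbled by OCR and processing errors.
with_bad_days = clean_days + [0.35, 0.40, 0.45]

skew_clean = sample_skewness(clean_days)
skew_noisy = sample_skewness(with_bad_days)
```

In this toy example the symmetric scores have skewness of essentially zero, while adding the low values drives the statistic clearly negative, which is the signature of a left-skewed distribution.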

[Figure 6.1 plot: Distribution of Question Period Similarity Scores. x-axis: Lexical Similarity (0.0 to 1.0); y-axis: Frequency.]

Figure 6.1: Histogram of Question Period (1975-2010) lexical similarity scores (n = 4105, µ = 0.586, σ = 0.065)

P1: Majority parliaments are less accountable (lower lexical similarity) than minority parliaments.

I begin my quantitative analysis by addressing the central research question of this dissertation: are minority governments more accountable than majority governments? In this section, I investigate this question in two stages. First, I perform an analysis using the Question Period dataset; these data cover a narrower historical range of Parliaments, but represent a closer conceptual fit with the empirical model of accountability underlying my measure and with the qualitative validation performed in the previous chapter. Second, I extend my study to the daily debate level, which permits the incorporation of more historical Parliamentary data at a cost of contextual precision.

Table 6.1 shows a quantitative assessment of the relative accountability of majority and minority governments, measured as lexical similarity between government and official opposition in Question Period.

[Figure 6.2 plot: Distribution of Daily Debate Similarity Scores. x-axis: Lexical Similarity (0.0 to 1.0); y-axis: Frequency.]

Figure 6.2: Histogram of daily debate (1945-2015) lexical similarity scores (n = 9102, µ = 0.718, σ = 0.088)

| Predictors | Model 1 Estimates | Model 1 CI | Model 1 p | Model 2 Estimates | Model 2 CI | Model 2 p | Model 3 Estimates | Model 3 CI | Model 3 p |
|---|---|---|---|---|---|---|---|---|---|
| (Intercept) | 0.634 | 0.613–0.655 | <0.001 | 0.665 | 0.619–0.711 | <0.001 | 0.593 | 0.285–0.901 | 0.013 |
| majorityMajority | -0.06 | -0.090– -0.029 | 0.012 | -0.065 | -0.100– -0.031 | 0.015 | 0.008 | -0.319–0.335 | 0.969 |
| gov_partyLiberal | -0.043 | -0.093–0.006 | 0.167 | -0.042 | -0.096–0.013 | 0.225 | | | |
| majorityMajority×gov_partyLiberal | 0.048 | -0.008–0.103 | 0.182 | 0.047 | -0.015–0.110 | 0.233 | | | |
| poll_govt | | | | -0.145 | -0.283– -0.008 | 0.084 | | | |
| poll_prev_govt | | | | -0.001 | -0.002–0.000 | 0.234 | | | |
| poll_govt×poll_prev_govt | | | | 0.005 | 0.001–0.008 | 0.035 | | | |
| govtseatpct | | | | | | | 0.071 | -0.622–0.765 | 0.87 |
| majorityMajority×govtseatpct | | | | | | | -0.115 | -0.833–0.604 | 0.8 |

| Random Effects | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| σ² | 0.00 | 0.00 | 0.00 |
| τ00 qtr | 0.00 | | |
| τ00 sessnum_factor:parlnum_factor | 0.00 | 0.00 | 0.00 |
| τ00 parlnum_factor | 0.00 | 0.00 | 0.00 |
| ICC qtr | 0.05 | | |
| ICC sessnum_factor:parlnum_factor | 0.05 | 0.31 | 0.29 |
| ICC parlnum_factor | 0.06 | 0.33 | 0.36 |
| Observations | 4105 | 113 | 116 |
| Marginal R² / Conditional R² | 0.076 / 0.227 | 0.285 / 0.745 | 0.223 / 0.727 |

Table 6.1: Models 1, 2, and 3 investigate the effect of majority status, governing party, government poll popularity, and seat percentage controlled by the government on lexical similarity scores. All models use the Question Period dataset. Model 1 studies individual Question Period observations and includes random effects for quarter, session, and parliament, while Models 2 and 3 are calculated on mean quarterly similarity scores and include random effects for session and parliament.

Model 1 investigates the effect of the independent variables of majority status and governing party (including their interaction) on the dependent variable of lexical similarity. As in previous models, I include random effects terms to account for political and historical variation across sessions and parliaments. Model 1 shows a significant negative effect of majority status on lexical similarity scores (p ≈ 0.01); in other words, minority parliaments are more accountable than majority parliaments, controlling for governing party and random political variation. To visually summarize these data and the findings as a historical trend, I aggregate scores at the parliamentary session level and plot them in Figure 6.3.
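One way to write Model 1 out explicitly is given below. It is a sketch inferred from the predictor names and random effect groupings reported in Table 6.1, not a specification quoted from the fitted model; the symbols and index notation are assumptions introduced here for illustration.

```latex
y_{i} = \beta_0
      + \beta_1\,\mathrm{Majority}_{i}
      + \beta_2\,\mathrm{Liberal}_{i}
      + \beta_3\,(\mathrm{Majority}_{i} \times \mathrm{Liberal}_{i})
      + u_{p[i]} + v_{s[i]} + w_{q[i]} + \varepsilon_{i},
\qquad
u_{p} \sim \mathcal{N}(0, \tau^2_{\mathrm{parl}}),\;
v_{s} \sim \mathcal{N}(0, \tau^2_{\mathrm{sess}}),\;
w_{q} \sim \mathcal{N}(0, \tau^2_{\mathrm{qtr}}),\;
\varepsilon_{i} \sim \mathcal{N}(0, \sigma^2)
```

Here y_i is the lexical similarity score for Question Period i, while p[i], s[i], and q[i] denote the parliament, the session nested within that parliament, and the quarter in which it took place. In this notation the finding reported above corresponds to a negative estimate of β1.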

[Figure 6.3 plot: Mean Question Period Similarity (Government and Official Opposition) Per Session. x-axis: Parliament-Session (30-1 to 40-3); y-axis: Lexical Similarity (0.50 to 0.75); legend: Majority Status (Minority, Majority).]

Figure 6.3: Plot of Question Period lexical similarity scores averaged on a parliamentary session basis, calculated between government speeches and only official opposition speeches. Red data points represent minority governments and blue majority governments.

Before accepting this conclusion, it is prudent to eliminate some alternative explanations. One possibility is that the significance of the result is inflated by an artificially low p-value, given that more than 4000 observations are included at the individual Question Period level. In Model 2 in Table 6.1, I perform a more conservative test: I aggregate observations at the quarterly level and, as will be discussed in more detail in the next section, I include additional terms representing government polling popularity. The significant effect of majority government status (p ≈ 0.01) persists even after moving to a higher level of analysis and controlling for polling popularity. The size of this effect is also substantial (simulated R² = 0.270).3

3 For effect size calculations for this fixed effect, and for other individual predictors in models in this chapter, please refer to Appendix B.8.

A second possibility is that this finding results from computing similarity scores across governments and official oppositions, ignoring any additional variation introduced by third parties. As P. H. Russell notes, one of the reasons we might expect minority parliaments to be more accountable is that, in the Canadian case, this implies multiple parties working together without the formation of coalitions; to some extent, a diversity of perspectives must repeatedly be accommodated in order for a government to survive (P. H. Russell, 2008, 16). On the other hand, opportunities on the timetable for third parties to speak in the House of Commons are very limited. To study this question in more detail, I rerun Model 2 on an alternative Question Period dataset consisting of lexical similarity scores calculated between government speeches and those of all opposition parties, not just the official opposition. The results of this analysis do not differ substantially, except that all scores shift slightly upward, as can be seen by comparing Figure 6.4 with Figure 6.3. The correlation coefficient between the two sets of scores is very high, at 0.826. Including the additional opposition speeches also has a negligible effect on the covariate relationships explored in Model 2 (see Appendix B.2).

It might appear odd that lexical similarity scores between government speeches and all opposition speeches are generally higher than those between government and official opposition speeches, given the increased diversity of speeches represented in the measurement. However, these results are consistent with the theoretical logic underlying the proposition that minority parliaments are more accountable, and that lexical similarity scores measure parliamentary accountability rather than ideological polarization. Third parties may be more likely to consider supporting a government rather than mechanically opposing it as the official opposition typically does, which would be reflected in higher similarity values as the government and the third party align to a greater extent in their communication strategies. This logic derives from the historical circumstances of the Canadian party system, which, for most of the period studied here, has had as its consistent third party the left-leaning CCF/NDP, typically disposed to support the governing Liberals. A second explanation, and one more relevant to understanding how minority governments yield higher similarity scores in general, is procedural. Minority governments must negotiate to maintain agenda control and accomplish their legislative program. In order to do so, they must keep third parties apprised of the government’s direction and intentions. Recall that in the previous chapter, one particularly high similarity topic discovered during the 39th Parliament consisted of questions related to the government’s plans for the legislative timetable. Such day-to-day negotiation of debate priorities is characteristic of parliamentary accountability as I conceptualize and measure it here.

An analysis comparing Canadian minority and majority governments should ideally encompass as broad a time range as possible in order to increase reliability. Minority governments are far outnumbered by majorities in Canadian political history, with most of them clustered in the last two decades, underscoring the need to include as many early examples of minorities as possible. Therefore, I also examine lexical similarity scores across government and official opposition speeches aggregated at the daily debate level. Moving from Question Period to the daily debate level of analysis, however, introduces considerably more procedural noise into the data.
To minimize these issues, I begin the analysis with the first Parliament following World War II (the 20th Parliament, beginning September 1945), after which the Canadian party system remained relatively stable until 1993. The typography, style, and organization of debates recorded in the Canadian Hansard shifted substantially on several occasions prior to the 1970s, making OCR and structural recovery of these data more inconsistent the older the volume. A mid-century start date therefore represents a compromise balancing quality of the raw data with historical scope.

Increasing the historical scope of the dataset by moving to the daily debate level of analysis yields about twice as many observations of parliamentary sessions as in the Question Period dataset. In Model 4 in Table 6.2, I replicate the previous analysis of the effect of majority status on lexical similarity, using the 9102 daily-level observations and again incorporating random effects for quarter, session, and parliament. In Figure 6.5, I visualize the trend of lexical similarity scores since the 20th Parliament at the parliamentary session level. In contrast to my previous analysis using the Question Period dataset, I find no significant effect of majority status on lexical similarity scores. Visually, Figure 6.5 suggests that the phenomenon of greater accountability during minority parliaments is a recent one. Earlier minority parliaments, such as the 25th and 26th Parliaments, appear indistinguishable from neighbouring majorities in historical context.

[Figure 6.4 plot: Mean Question Period Similarity (Government and All Opposition) Per Session. x-axis: Parliament-Session (30-1 to 40-3); y-axis: Lexical Similarity (0.50 to 0.75); legend: Majority Status (Minority, Majority).]

Figure 6.4: Plot of Question Period lexical similarity scores averaged on a parliamentary session basis, calculated between government speeches and speeches from all opposition parties. Red data points represent minority governments and blue majority governments.

[Figure 6.5 plot: Mean Debate Similarity (Government and Official Opposition) Per Session. x-axis: Parliament-Session (20-1 to 41-2); y-axis: Lexical Similarity (0.60 to 0.85); legend: Majority Status (Minority, Majority).]

Figure 6.5: Plot of daily debate lexical similarity scores averaged on a parliamentary session basis, calculated between government speeches and official opposition speeches only. Red data points represent minority governments and blue majority governments.

One potential explanation for the discrepancy between these results is fundamental structural change. With the exception of the anomalous and short-lived Progressive Conservative minority of the 31st Parliament, the minority governments studied in the previous Question Period analysis were characteristic of the contemporary Canadian party system, roughly beginning with the 38th and 39th Parliaments (those studied in the previous chapter). Perhaps the post-1970 political era is structurally different, such that minority parliaments are only more accountable within the contemporary party system and institutional climate. To test this proposal, I analyze the daily debate results filtered to the same time frame as the Question Period analysis. Still, I find no significant difference between majority and minority parliaments at either the daily or the parliament level of analysis (see Appendix B.4 and B.5). An additional confirmation is the negligible ICC for the random effect groupings shown in the bottom half of the table for Model 4 in Table 6.2. That is, if parliamentary sessions differed substantially from one another due to structural variation across time, we would expect a substantial component of model variance to be contributed by these random effects. However, this is not the case in Model 4. In sum, structural change does not explain the discrepancy between the Question Period and daily debate findings regarding accountability in minority parliaments.

An alternative explanation is that the additional noise in the daily debate data, both contextual and in the raw textual source, is too problematic to yield viable analytical results. Visually, there appears to be some level of correlation between the Question Period results in Figure 6.3 and the daily debate scores after the 30th Parliament in Figure 6.5. To study this possibility, I again filter the daily debate dataset to the same historical timespan covered by the Question Period dataset and compare lexical similarity scores calculated for the same days across each dataset. The correlation across these two sets of results is 0.651 when aggregated at the session level, but only 0.205 at the daily level. In other words, there is a substantial increase in random noise moving from the Question Period to the daily debate level for any given day of debate. For the lexical similarity analysis, which assumes a consistent textual context for scores to meaningfully measure parliamentary accountability, the validity of the daily debate analysis is questionable.

To conclude, based on findings from analysis of the Canadian Question Period dataset, I find support for the proposition P1 that minority governments are more accountable than majority governments. However, these findings are applicable only to the period from the 30th Parliament onward, during which a consistent Question Period appeared on the daily House of Commons timetable. I cannot draw any broader conclusions about minority parliaments historically, owing to the problematic level of noise in the more general daily debate dataset necessary for studying debates prior to the institutionalization of Question Period.
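The gap between the session-level and daily-level correlations is what one would expect if both measures share a session-level signal but carry independent day-to-day noise. The simulation below sketches that logic; the number of sessions, days per session, noise levels, and variable names are all assumptions chosen for illustration and are unrelated to the actual datasets.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
N_SESSIONS, DAYS_PER_SESSION = 40, 30

qp_daily, debate_daily = [], []
qp_session, debate_session = [], []
for _ in range(N_SESSIONS):
    # Signal shared by both measures within a session (hypothetical scale).
    session_level = 0.60 + rng.gauss(0, 0.03)
    # Independent day-level noise in each measure.
    qp = [session_level + rng.gauss(0, 0.05) for _ in range(DAYS_PER_SESSION)]
    debate = [session_level + rng.gauss(0, 0.05) for _ in range(DAYS_PER_SESSION)]
    qp_daily.extend(qp)
    debate_daily.extend(debate)
    qp_session.append(sum(qp) / len(qp))
    debate_session.append(sum(debate) / len(debate))

daily_r = pearson(qp_daily, debate_daily)
session_r = pearson(qp_session, debate_session)
```

Averaging within sessions cancels much of the independent daily noise, so `session_r` comes out well above `daily_r` in this setup, mirroring the direction (though not the magnitudes) of the pattern reported above.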

[Figure 6.6 plot: Mean Debate Similarity (Government and Official Opposition) Versus Government Seats. x-axis: Government Seat Percentage (0.4 to 0.8); y-axis: Lexical Similarity (0.60 to 0.85); legend: Majority Status (Minority, Majority).]

Figure 6.6: This plot compares the percentage of seats held by the government with daily debate lexical similarity scores averaged on a parliamentary session basis, calculated between government speeches and only official opposition speeches. Red data points represent minority governments and blue majority governments.

P2: Accountability (lexical similarity) increases as the seat percentage of the government decreases.

The next proposition I investigate is a finer-grained version of the minority parliament problem. Assume minority governments are more accountable than majorities, and that Parliament’s motivation to behave in an accountable manner derives from the perceived stability of the sitting government. Then, the greater the percentage of seats controlled by a government in the House of Commons, the less accountable should be the Parliament, and vice versa (P2). As with the previous analysis, I consider this problem at two different levels of analysis using the Question Period and daily debate datasets. The results from the former are available in Model 3, Table 6.1, and from the latter in Model 5, Table 6.2. Finally, to summarize this analysis visually, results from the daily debate analysis are shown in Figure 6.6.

As is evident in Figure 6.6, and as is confirmed by both the statistical models referenced above, there is no significant evidence of a relationship between government seat percentage and lexical similarity scores. The most reasonable explanation for this finding is one consistent with the norms of collective accountability in the Canadian political context. Party discipline is so strong and reliable a force in the Canadian House of Commons that a government is unlikely to face different incentives given a marginal difference in seat numbers. As discussed in Chapter 2, the threat of defection of a government MP on any given vote is a negligible consideration for a Canadian governing party, a phenomenon that can reasonably be extended to “defection” in the content of speeches. However, this negative result in the Canadian context raises an interesting possibility for further comparative analysis, to study how the strength of party discipline across different Westminster parliaments shapes the relationship between number of seats and accountability in debate.
This is even more the case considering political variation across parliamentary sessions (captured in the ICC in the random effects terms) appears to wash out all effects in Model 3. In other words, the relationship between seat percentage and accountability is too complex a puzzle for a case study of a single country, at least using the Canadian example.

P3: As the government’s polling popularity decreases, accountability increases (lexical similarity increases).

An election is the ultimate evaluation, and potentially sanction, of a government for its accountability in office, representing the third and final stage of parliamentary accountability in Bovens’ empirical model. The next two theoretical predictions I study in this chapter are based on the proposal that polling popularity is an approximation of how well a government would perform if an election were held at the time of the poll.4. My theoretical expectation is of a negative relationship between parliamentary accountability and government polling popularity. This is because, as discussed in Chapter 3, the more pressured a government feels for its survival the more likely it should be to behave in an accountable manner in the House of Commons (P3). Likewise, the lower a government polls, the more motivated the opposition should be to demonstrate its effectiveness in Parliament as an alternative to the . Such a relationship should not only be proportional to the threat of electoral loss, but also to the threat of a vote of non-confidence; this same theoretical logic underlies the proposition that minority governments ought to be more accountable than majority governments.

P4: As government polling popularity increases above the threshold needed to form a majority, accountability remains stable (lexical similarity remains stable) and does not vary across majority and minority governments.

A related proposition (P4) draws on prior research on the relationship between polling popularity and the policy responsiveness of Canadian majority and minority governments. Pickup and Hobolt (2015) observe that governments behave differently when their polling support is either below or above the level at which they can be expected to win a majority in the next election. In the Canadian electoral system, this turning point value is generally accepted to be about 40%. I expect to find a corresponding turning point in the relationship between polling popularity and similarity scores.5 Model 2 in Table 6.1 tests a linear model of the relationship between polls and similarity, while Figure 6.7 summarizes the relationship visually.

4 Indeed, this is directly implied by the standard wording of the polling question asked in the polls from which I draw my data: “If an election were to be held tomorrow, for which party would you vote?”

5 This 40% “rule of thumb” value, as well as the strength of the relationship between polls and parliamentary accountability, is likely to vary according to the effective number of parties within a given parliament (Laakso & Taagepera, 1979). To explore this possibility, I repeated the analysis in this section including the effective number of parties and its interaction (Marcelino, 2016) and found negligible results. As discussed in the Conclusion, a comparative analysis incorporating greater variation in party systems would be a necessary next step in investigating this negative result. Thanks to John McAndrews for his feedback on this question.

Model 2 in Table 6.1 shows that government poll numbers are indeed significantly related to parliamentary accountability, although the effect size is much smaller than that of majority status. Government polls are negatively related with lexical similarity at the p = 0.10 level, with a simulated R² ≈ 0.02. This weakly corroborates my expectation that as governments gain public popularity, they feel less of a need to be accountable in Parliament given their relative electoral safety. The interaction between government polls and a lagged variable (that is, the government polling data from the previous quarter) is significant at the p = 0.05 level, and shows twice the effect size (simulated R² ≈ 0.04). This interaction effect implies that for a government that polled highly one quarter ago, the marginal effect of higher polling numbers on lexical similarity in the current quarter increases. In other words, when governments are more certain their poll numbers are high and rising, Parliament is even less likely to be accountable. Likewise, when a government’s popularity was low in the last quarter, low polls in the subsequent quarter have a greater marginal positive effect on lexical similarity. These results are consistent with the theoretical expectation that accountability is positively related to the perceived threat of a governing party’s loss of control of Parliament. For a visualization of this interaction, please refer to Appendix B.7.

Finally, note in Model 2 the ICC (intraclass correlation coefficient) of the random effects of parliament and session, reported in the lower half of the table. ICC is a succinct method of summarizing the contribution to variance of the random effect terms in a mixed model, and is defined as the between-cluster variance divided by the total variance. When the ICC is 0, the between-cluster variance is zero: observations within a group are no more alike than observations drawn from different groups. In other words, there is no impact of this grouping of observations on the model to account for, and a simpler linear model could be used instead (Lüdecke, 2018). In Model 2, about one-third of the variance summarized by the model is attributable to differences between parliaments (adding in an additional hierarchical grouping, sessions within parliaments, has a negligible impact on this ratio). To sum up, these random effects show that political variation across Parliaments is important to bear in mind when interpreting trends in accountability; however, even accounting for these political random effects, the effect of government poll popularity persists.

In Figure 6.7, quarterly mean Question Period lexical similarity scores are plotted against government polling popularity (as a percentage). This plot provides visual evidence that a non-linear model could be more appropriate for understanding this relationship than the linear model just discussed. Fitting a loess regression to these data, as seen in Figure 6.7, yields a trend with a bell curve shape, peaking at about 30 to 35% popularity.6 A loess curve is appropriate for visualizing and interpreting a smoothed non-linear relationship, but not for fitting a generalizable non-linear model. To assess the goodness of fit of a non-linear approach, I compare a simple linear model against a smoothing spline model with 4 degrees of freedom (selected through cross-validation) as well as a polynomial regression model with degree 4 (see Appendix B.3). Both the polynomial and spline models yield a significantly stronger fit than the linear model, with adjusted R² of about 0.09 and 0.14 respectively. To summarize, the basic expectation of P4, that the effect of government polling popularity on accountability is non-linear, can be confirmed.

| Predictors | Model 4 Estimates | Model 4 CI | Model 4 p | Model 5 Estimates | Model 5 CI | Model 5 p |
|---|---|---|---|---|---|---|
| (Intercept) | 0.741 | 0.718–0.764 | <0.001 | 0.862 | 0.592–1.131 | <0.001 |
| majorityMajority | 0.001 | -0.031–0.032 | 0.966 | -0.129 | -0.418–0.160 | 0.472 |
| gov_partyLiberal | -0.024 | -0.057–0.009 | 0.254 | | | |
| majorityMajority×gov_partyLiberal | -0.017 | -0.059–0.024 | 0.507 | | | |
| govtseatpct | | | | -0.294 | -0.893–0.305 | 0.43 |
| majorityMajority×govtseatpct | | | | 0.263 | -0.360–0.886 | 0.496 |

| Random Effects | Model 4 | Model 5 |
|---|---|---|
| σ² | 0.01 | 0.01 |
| τ00 qtr | 0.00 | 0.00 |
| τ00 sessnum_factor:parlnum_factor | 0.00 | 0.00 |
| τ00 parlnum_factor | 0.00 | 0.00 |
| ICC qtr | 0.03 | 0.03 |
| ICC sessnum_factor:parlnum_factor | 0.01 | 0.01 |
| ICC parlnum_factor | 0.09 | 0.13 |
| Observations | 9102 | 9102 |
| Marginal R² / Conditional R² | 0.042 / 0.172 | 0.008 / 0.179 |

Table 6.2: Models 4 and 5 investigate the effect of majority status, governing party, and seat percentage controlled by the government on lexical similarity scores. Both models are calculated on individual daily observations from the daily debate dataset and include random effects for quarter, session, and parliament.
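The ICC discussed above, the between-cluster variance divided by the total variance, can be computed directly from a mixed model's variance components. The sketch below assumes one ICC per grouping factor, each taken over the sum of all group variances plus the residual variance; the function name and the variance values are hypothetical and are not taken from Tables 6.1 or 6.2.

```python
def intraclass_correlations(group_variances, residual_variance):
    """ICC per grouping factor: its variance divided by the total variance."""
    total = sum(group_variances.values()) + residual_variance
    return {name: var / total for name, var in group_variances.items()}


# Hypothetical variance components for a two-level grouping structure.
components = {"parliament": 0.0012, "session_within_parliament": 0.0001}
icc = intraclass_correlations(components, residual_variance=0.0023)

# A grouping factor with zero between-cluster variance has an ICC of 0,
# the case in which a simpler model without that grouping would suffice.
no_grouping = intraclass_correlations({"parliament": 0.0}, residual_variance=0.0023)
```

With these illustrative numbers the parliament-level ICC is one-third, meaning a third of the modelled variance lies between parliaments rather than within them.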
The next step is to assess whether the majority government polling threshold (in Canada, a party polling at about 40% in nationwide polls can expect to form a majority government if an election were held) represents a turning point in this non-linear relationship. To determine whether a significant break point in lexical similarity scores occurs at approximately 40% popularity, I perform a structural change analysis (Zeileis, Kleiber, Krämer, & Hornik, 2003). This approach is most associated with econometrics, specifically in application to time series for which the assumption of structural stability

6Loess is a non-linear modelling procedure that leverages the power of least squares regression. At each data point, a weighted least squares fit is calculated using a subset of data (controlled via a smoothing parameter) drawn from its close neighbours. Chapter 6. Quantitative Analysis 118

[Figure: Mean Question Period Similarity (Government/Official Opposition) Versus Government Polls. x-axis: Government Poll Popularity (0.10 to 0.50); y-axis: Lexical Similarity (0.50 to 0.65); legend: Majority Status (Minority, Majority).]

Figure 6.7: Plot of quarterly government poll popularity versus quarterly means of Question Period similarity scores. Points are coloured to indicate whether the government was a majority or minority. The curved line is a loess model with corresponding 95% CI. The vertical lines represent the best-fitting model for a break point from a structural change model and corresponding 95% CI.

produces substantial modelling error. Assume covariates of interest have structurally different effects at different thresholds of the independent variable. In this case, it may be appropriate to fit a series of linear models to capture discontinuous but stable linear relationships separated by structural break points. Given some pre-specified number n of break points, the model optimizes their position by minimizing the residual sum of squares of the linear fits within each segment. To select the optimal value for n, I compare successive fits and choose the one that minimizes the BIC.7 The best-fitting structural break model yields one break point at 32.8% government popularity, with a 95% confidence interval (CI) from 27% to 37.1%. This break point and its corresponding CI are shown as solid and dotted vertical lines respectively in Figure 6.7. For complete results from this analysis, see Appendix B.1. To summarize, while a critical poll threshold exists above and below which the accountability behaviour of governments fundamentally changes, it is at a slightly lower percentage value than anticipated in P2. The overall pattern of results makes sense in light of theoretical expectations. First, a government with very low popularity has little incentive to behave in an accountable manner given it is likely to lose regardless of its efforts, and can obtain greater political benefits from accomplishing ideological goals and communicating these successes while it remains in office. It will also have trouble maintaining the internal discipline necessary for effective collective accountability given its members face few incentives to

7 BIC, or Bayesian information criterion, is a metric for relative model quality that balances goodness of fit against number of parameters. A lower value is preferable (Burnham & Anderson, 2004).

respect party control over their behaviour in Parliament. Second, governments with high levels of public popularity also have less incentive to be accountable as they do not face significant pressure for their survival, and derive relatively greater political benefits from communicating their policy successes. Finally, between these poles, Parliament sits under uncertain conditions wherein both the current governing party and the largest opposition party have a reasonable chance of forming the next government, one that is likely to be a minority. In this case, both government and opposition face incentives to appear accountable to the public and the costs of an ideologically-motivated misstep are high. These patterns can be seen in the data in Figure 6.7. From a linear perspective, government popularity and lexical similarity are positively related until a threshold level of certainty about Parliament’s stability is attained, and then a negative relationship occurs. Through the non-linear lens, there is a relatively constant baseline of lexical similarity scores at lower and higher government popularity values, but a transition band exists between these two cases within which similarity is more responsive to marginal changes in polling popularity. The uncertainty surrounding the optimal political strategy in this interstitial case is a potential explanation of why the observed critical value is lower than the 40% “rule of thumb” of Canadian electoral politics.
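The break point search described earlier in this section can be sketched in simplified form. The Python below is a brute-force illustration for a single break point, not the structural change routine cited above. The function names, the minimum segment size, the parameter counts passed to the BIC, and the synthetic data are all assumptions made for the example.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, rss)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


def best_single_break(xs, ys, min_seg=5):
    """Try every admissible split of the x-sorted data and keep the one
    whose two separate linear fits give the smallest total RSS."""
    best = None
    for cut in range(min_seg, len(xs) - min_seg + 1):
        rss = fit_line(xs[:cut], ys[:cut])[2] + fit_line(xs[cut:], ys[cut:])[2]
        if best is None or rss < best[1]:
            best = (cut, rss)
    return best


def bic(rss, n, n_params):
    """Gaussian BIC up to an additive constant: n*ln(RSS/n) + k*ln(n)."""
    return n * math.log(rss / n) + n_params * math.log(n)


# Synthetic data (not thesis data): similarity rises until polls reach 33%,
# then falls, with a small deterministic wobble standing in for noise.
percents = list(range(10, 51))
polls = [p / 100 for p in percents]
similarity = [
    (0.50 + 0.005 * (p - 10) if p <= 33 else 0.615 - 0.005 * (p - 33))
    + 0.002 * (-1) ** p
    for p in percents
]

cut, rss_break = best_single_break(polls, similarity)
break_point = polls[cut - 1]  # last observation in the lower segment
n = len(polls)
rss_linear = fit_line(polls, similarity)[2]
bic_linear = bic(rss_linear, n, 3)  # intercept, slope, error variance
bic_break = bic(rss_break, n, 6)    # two lines, error variance, break point
```

In this toy setting the one-break model has the lower BIC and the estimated break falls close to the 33% turning point built into the data. Extending the search to several break points, and attaching confidence intervals to them, requires the more efficient procedures implemented in dedicated structural change software.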

[Figure: Mean Question Period Similarity (Government/Official Opposition) Versus Government Polls, Majority Governments Only. x-axis: Government Poll Popularity (0.10 to 0.50); y-axis: Lexical Similarity (0.50 to 0.65).]

Figure 6.8: Plot of quarterly government poll popularity versus quarterly means of Question Period similarity scores, selecting only those observations from majority parliaments. The curved line is a loess model with corresponding 95% CI. The vertical lines represent the best-fitting model for a break point from a structural change model and corresponding 95% CI.

It is important to note that the line of causality is unclear in this relationship and there is endogeneity involved. In other words, this analysis does not shed light on whether low government polling numbers cause lower accountability or vice versa, and in reality both probably affect each other in complex ways.

Furthermore, some component of this relationship is likely attributable to a minority parliament effect. This is visually apparent in Figure 6.7, in which the red dots signifying minority status are clustered around the break point. Referring back to Model 2 in Table 6.1, discussed earlier in this section, the majority status effect is also evident, and significant, in the linear model of this relationship. If minority parliaments are more accountable than majorities, as explored by P1, and minority governments by default are more likely to poll in a lower range than majority governments, then the analysis in this section may in part reflect the influence of minority status as a confound. To test for this possibility, I repeat both the structural breakpoints and non-linear modelling approaches on a filtered dataset containing only majority parliament observations. A visualization of these results is available in Figure 6.8 and additional model results are available in the second part of Appendix B.3. To summarize these results, a similar pattern exists and is significant among majority parliaments alone, with one break point separating a positive relationship between government polls and similarity at lower poll levels, and a negative relationship at high poll levels. However, model fit is much weaker for this dataset. Additionally, as can be observed in Figure 6.8, the threshold polling value for behaviour change is higher among majority governments alone, at just over 35%. This interesting difference leads back to the question posed in the second half of P2: above the “majority threshold” of government popularity, are majority and minority governments equally accountable? Unfortunately, I do not have enough data to assess this question. Confirming observations from earlier in this section, a simple linear model reveals there is no significant relationship between government popularity and lexical similarity above 40%.
However, there are not enough historical examples of popular Canadian minority governments to assess whether majorities and minorities differ in this area with statistical validity. As will be revisited in the conclusion, this is one example where extension to a comparative analysis would shed further light on an interesting phenomenon observed in the Canadian case. To summarize, in this analysis section I found support for P3: governments are more accountable (measured via lexical similarity) as their poll numbers decrease, an effect that is magnified by the interaction of a lagged poll variable. This effect is non-linear, as anticipated, and a significant break point exists at which a positive relationship between government poll popularity and lexical similarity flips direction to a negative one. This critical threshold was slightly lower than anticipated, however, at 33%. Finally, there are insufficient data to support the hypothesis (P4) that minority and majority governments behave similarly at high levels of polling popularity.

P5: Accountability (as lexical similarity) and ideology (as semantic dissimilarity) are negatively related.

The final prediction I study in this chapter seeks to establish evidence of a proposed theoretical trade-off between accountability and ideology in parliamentary speeches. My approach to this problem is to compare two contrasting methods of calculating textual similarity scores: the lexical similarity method employed thus far, and a word embedding-based approach more appropriately used to capture semantic similarity (see Chapter 4). Recall that in Chapter 3, I categorized parliamentary speeches into three categories. First, and most central to this dissertation, a debate exchange could concern parliamentary accountability. In this case, I expected government and opposition speeches to have high lexical similarity. I also specified that collective accountability demands a shared context between government and opposition participants.

In other words, for a lexical similarity measure to validly capture accountability, high lexical similarity must co-occur with high semantic similarity: opposition and government should not only use the same words, but possess some mutual understanding of their meaning, if one is to be holding the other to account. As the qualitative evaluation in Chapter 5 demonstrated, this was frequently the case in high lexical similarity speech examples consistent with government accountability. The second case of parliamentary speech represents an ideological contest. My qualitative analysis revealed how a lexical similarity measure for accountability can be confounded when a policy contest underlies a speech exchange. Recall that within controversial policy topics, such as Environment in the 39th Parliament, speech exchanges with a high lexical similarity score could nevertheless contain a surprising amount of spurious partisanship. Such a measurement inconsistency occurs as a lexical gap exists between parties when they hold different ideological positions on a shared topic of discussion. This is the reason why I argued in Chapter 3 that the lexical similarity measurement of accountability is most valid assuming the first case of parliamentary speech, while a classification approach to measuring ideology is more valid under the second. The final case of parliamentary speech is unrelated or orthogonal debate. I observed repeated examples of this phenomenon in the low similarity exchanges selected for qualitative review. Orthogonal exchanges were characterized by low lexical similarity, but also low semantic similarity. Opposition members would ask loaded or unanswerable questions, government members would completely dodge what was asked, and both sides engaged in repetitious partisan insults. This case draws attention back to an important feature of responsible government: parliamentary accountability is a two-way street.
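The distinction between the two kinds of similarity discussed above can be made concrete with a toy sketch. The Python below is not the doc2vec procedure used in this dissertation. It assumes cosine similarity over word counts as a stand-in for the lexical measure and cosine similarity over averaged word vectors as a stand-in for the semantic measure. The two-dimensional vectors, the example tokens, and all function names are invented for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def lexical_similarity(tokens_a, tokens_b):
    """Cosine similarity between raw word-count vectors."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    vocab = sorted(set(ca) | set(cb))
    return cosine([ca[w] for w in vocab], [cb[w] for w in vocab])


def semantic_similarity(tokens_a, tokens_b, vectors):
    """Cosine similarity between the mean vectors of each speech.

    Words without a vector are skipped (a simplification).
    """
    dims = len(next(iter(vectors.values())))

    def mean_vector(tokens):
        known = [vectors[t] for t in tokens if t in vectors]
        if not known:
            return [0.0] * dims
        return [sum(v[d] for v in known) / len(known) for d in range(dims)]

    return cosine(mean_vector(tokens_a), mean_vector(tokens_b))


# Hypothetical vectors: first axis ~ climate policy, second ~ partisan attack.
TOY_VECTORS = {
    "carbon": [0.9, 0.1], "tax": [0.8, 0.2],
    "pollution": [0.9, 0.2], "pricing": [0.7, 0.2],
    "opposition": [0.2, 0.8], "scandal": [0.1, 0.9], "shame": [0.1, 0.9],
}

question = ["minister", "cut", "carbon", "tax"]
on_topic = ["carbon", "tax", "stays"]           # answers in the same words
paraphrase = ["pollution", "pricing", "stays"]  # same subject, other words
dodge = ["opposition", "scandal", "shame"]      # orthogonal reply

lex_on_topic = lexical_similarity(question, on_topic)
lex_paraphrase = lexical_similarity(question, paraphrase)
sem_paraphrase = semantic_similarity(question, paraphrase, TOY_VECTORS)
sem_dodge = semantic_similarity(question, dodge, TOY_VECTORS)
```

In this invented example, the on-topic reply shares vocabulary with the question, the paraphrased reply shares none yet remains close in the toy vector space, and the dodge is distant on both measures. The sketch only shows that the two measures are computed from different representations and can therefore disagree; the case of shared words carrying different meanings across parties requires context-sensitive embeddings of the kind described in Chapter 4.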
Governments face incentives to be more or less accountable to the House of Commons, but the quality of parliamentary accountability is also dependent on the ability or willingness of an opposition to demand accountability from the government. The final analytical proposition studied (P5) asserts a measurable empirical distinction between the first and second case. To the extent to which MPs do not make orthogonal speeches (that is, if we assume all parliamentarians are behaving on-task), my theoretical model implies they and their parties face a trade-off between emphasizing accountability or partaking in ideological debate in their speechmaking. The analysis in this section is a preliminary attempt to find quantitative evidence of this trade-off. One of the many measurement challenges involved is the endogeneity of the textual measures corresponding to these two concepts. Algorithms to measure lexical and semantic similarity use the same textual data and relatively similar math to arrive at a final value. Nevertheless, I expect to observe that lexical and semantic similarity measures are correlated with each other, indicative of the trade-off between accountability and ideology, but that the strength and potentially the direction of this relationship are moderated by other politically significant variables. In Figure 6.9, I calculate lexical and semantic similarity scores using the daily debate dataset, aggregate them at the parliamentary session level, and plot the relationship between them.8 To quantify the trend

8 Given I have observed problematic error resulting from increased textual noise using the daily debate dataset for the lexical similarity analysis above, why use it for this analysis? The training of a word embedding model performs better the greater the volume of textual data (essentially, the majority of the model training period amounts to developing a model of the English language). In other words, the additional institutional “noise” within a parliamentary context becomes an advantage to the embedding model, as it imparts additional low-level contextual data about what parliamentary speech looks like to the vector finally assigned to each word. To validate my decision, I replicated the analysis in this section using the much smaller Question Period dataset and found a strong positive correlation between lexical and semantic similarity, in contrast to the results observed using the much larger daily debate dataset for training the model. This is consistent with expectations that holding context constant is necessary for performing a valid measurement of accountability using the lexical similarity measure, and that additional ideological and institutional noise renders the measure less effective. On the other hand, that noise is useful data for the embedding given we are precisely concerned with how parties use the

[Figure: Lexical versus Semantic Similarity (Government and Official Opposition) Per Session. x-axis: Semantic Similarity (0.35 to 0.60); y-axis: Lexical Similarity (0.68 to 0.76); legend: Majority Status (Minority, Majority).]

Figure 6.9: This plot compares two methods of calculating similarity scores, representing lexical and semantic similarity measures. I employ the daily debate dataset to calculate scores between government and official opposition speeches and average the scores at the parliamentary session level. Red data points represent minority governments and blue majority governments.

                                     Model 6                            Model 7
Predictors                           Estimates  CI             p        Estimates  CI            p
(Intercept)                           0.86      0.83–0.90      <0.001    0.4       0.34–0.46     <0.001
semantic_similarity                  -0.25     -0.32 – -0.18   <0.001    0.7       0.68–0.72     <0.001
majorityMajority                     -0.00     -0.02–0.01       0.696   -0.01     -0.08–0.07      0.839
gov_partyLiberal                     -0.02     -0.05 – -0.01    0.014   -0.03     -0.12–0.06      0.572
majorityMajority×gov_partyLiberal    -0.01     -0.04–0.01       0.328    0.01     -0.08–0.11      0.804

Random Effects
σ2                                                                       0.00
τ00 qtr                                                                  0.00
τ00 sessnum_factor:parlnum_factor                                        0.00
τ00 parlnum_factor                                                       0.01
ICC qtr                                                                  0.01
ICC sessnum_factor:parlnum_factor                                        0.01
ICC parlnum_factor*                                                      0.59
Observations                          62                                 9102
Marginal R2 / Conditional R2          0.676 / 0.653                      0.304 / 0.727

Table 6.3: Models 6 and 7 investigate the relationship between semantic similarity and lexical similarity, employing the daily debate dataset. Model 6 uses averaged observations at the parliamentary session level, while Model 7 studies individual daily-level observations and includes random effects for quarter, parliament, and session. Note the significant negative relationship between lexical and semantic similarity in Model 6 and the significant positive relationship in Model 7.

observed in this plot, I fit a corresponding linear model at the parliamentary session level in Model 6 of Table 6.3 including a control for majority government status, which, as discovered earlier in this chapter, has a significant impact on similarity scores. This model finds support for a significant negative correlation between lexical and semantic similarity when observed at the parliamentary session level. To confirm these results, I replicate the model using daily observations including random effects in Model 7 of Table 6.3. At this level of analysis, I find the opposite pattern: the significant relationship between semantic similarity and lexical similarity in this model is positive. However, note the high ICC of the random variable parliamentary session; the majority of the variance observed in the daily-level linear model is attributable to variance across parliamentary sessions. In other words, political circumstances that differ across sessions have a critical impact on the relationship between lexical and semantic similarity. These results are consistent with my theoretical model discussed in Chapter 3. First, they support my emphasis on an appropriate level of analysis within which a shared institutional and political context makes textual similarity theoretically meaningful.
Second, they indicate the expected interaction between the strength and direction of the relationship between lexical and semantic similarity, and other politically important variables (most of which, including party system and majority status, vary at the parliament level). If these results do capture a trade-off between accountability and ideology within parliamentary sessions, have these dynamics varied over parliamentary history? Figure 6.10 shows semantic similarity per parliamentary session, and can be viewed in comparison with Figure 6.5 earlier in this chapter showing the corresponding timespan employing the lexical similarity measure. Comparing these two plots shows that historical change could be an alternative explanation for the negative relationship observed. Over time, governments and official oppositions have been converging in terms of their lexical choices (theoretically, becoming more accountable) but diverging in terms of the underlying meanings of these words (becoming more ideologically polarized) since about the 29th Parliament in 1970. On the other hand, a direct comparison between the two historical patterns is potentially problematic due to the different datasets used within each analysis. This observation, and more generally the above findings relating semantic and lexical similarity at different levels of analysis, suggest a multitude of new research questions that necessitate a comparative approach; I will explore many of these in further detail in the Conclusion. Such directions for future research are based on the groundwork of the previous two empirical chapters using Canada as a single country test case. To summarize, I first performed a qualitative assessment and validation of the lexical similarity measure of parliamentary accountability. Second, I used this measure to perform quantitative tests of five theoretical predictions yielded by the theoretical model of accountability underlying the measure.
I found that government polling popularity has a negative effect on parliamentary accountability (p = 0.08), and the interaction of polling popularity and a lagged poll measure has a positive effect (p = 0.03), although in both cases the effect size is relatively small. I found these polling effects are better modelled as a non-linear relationship. Most simply, a turning point exists in accountability behaviour of parliaments around the 30-35% popularity level; lower than this point, polls have a positive relationship with lexical similarity, and higher than this point a negative relationship. I found that minority parliaments are significantly more accountable than majorities (p = 0.01), but restricted this conclusion to parliaments since the 30th Parliament based on the unavailability of reliable Question Period data prior to this date and the unsuitability of daily debates as an alternative data source. Finally, I investigated the theorized relationship between lexical and semantic similarity and found evidence of a trade-off between accountability and ideology using these constructs.

8 (continued) same words in different contexts due to underlying differences of meaning.

Debate Semantic Similarity (Government and Official Opposition) Per Session

[Figure axes: x-axis: Parliament-Session (20-1 to 41-2); y-axis: Semantic Similarity (0.30 to 0.70); legend: Majority Status (Minority, Majority).]

Figure 6.10: Plot of daily debate semantic similarity scores averaged on a parliamentary session basis, calculated between government speeches and official opposition speeches only. Red data points represent minority governments and blue majority governments.

Chapter 7

Conclusion

Skepticism about liberal democracy is on the rise in the 21st century, touching not only young and emerging democracies such as Hungary and the Philippines but established and wealthy jurisdictions including the United States. Westminster parliamentary institutions are not immune to the challenge, as the saga of Brexit (and Parliament’s struggle for involvement in the process) has highlighted. In 2018, some saw signs of a Canadian populist uprising in the campaign appeals and governing strategies of Doug Ford’s Progressive Conservative government in Ontario, and in former Conservative minister Maxime Bernier’s launch of a new national party, the People’s Party of Canada. Nevertheless, Parliament remains a relevant force in Canadian politics. The NDP’s leadership strategy leading up to the October 2019 federal election provides an illustrative example. Upon his leadership victory in October 2017, Jagmeet Singh promised to avoid seeking a seat in Parliament until the next federal election, arguing it was more important to contest a riding where he had personal roots. His leadership strategy emphasized party events and social media photo-ops over establishing a presence in Question Period, a deliberate contrast to his predecessor as leader, Thomas Mulcair, who was well-respected for his debating skills but failed to attain personal popularity among the public. However, the decision to avoid Parliament proved reversible. One year after his leadership victory, opinion polls revealed Singh trailed significantly in leadership approval among Canadians, with nearly half asserting they lacked sufficient information to judge his potential (Abedi, 2018). As EKOS (a polling firm) President Frank Graves put it, “Most voters would think that someone who is presenting him or herself as a potential PM should at least have the authority of being a sitting member of Parliament.
Being an MP not only adds legitimacy and authority, it also provides important opportunities for the leader to show his leadership and debate skills in the House” (as quoted in Solomon, 2017). By fall 2018, Singh had changed his perspective on Parliament and stood for by-election in a vacated NDP seat in Burnaby South. Despite the rapid expansion of social media into the political sphere and its disproportionate influence on political news coverage, Canadian voters still appear to value Parliament’s role as a proving ground for opposition leaders. In light of these tensions, advocates for the defence and preservation of democratic governance must honestly evaluate the health of their institutions using valid empirical evidence. From a Canadian perspective, it is an opportune time to re-examine the empirical foundations and normative assumptions underlying the parliamentary decline thesis, which has dominated scholarship in Westminster parliament countries over recent decades. Such an analysis may draw our attention to the accountability functions of

responsible government that are typically taken for granted but remain critical to functioning Canadian democracy: Parliament’s role in preventing governments from “clandestine use of power” (Stewart, 1977, 29), in assessing day-to-day the confidence of the people in the fitness of a government and its personnel to lead and to govern, and as a proving-ground for the opposition. This dissertation performed a quantitative empirical assessment of the current state and historical pattern of accountability in the Canadian House of Commons. It addressed the question of whether, as P.H. Russell has proposed, minority parliaments are more accountable than majority parliaments. Relatedly, it investigated the impact on this relationship of independent variables including government polling popularity, party, and the percentage of seats held by the government, as well as the association between measurements of accountability and ideology in parliamentary text. Drawing on work by Bäck and Debus to develop a theoretical model of parliamentary debate, I proposed a trade-off model of accountability and ideology in parliamentary speeches. Then, I adapted Bovens’ model for measuring accountability to develop a quantitative measure of accountability in debate. Assuming a constant debate context, I argued that lexical similarity between government and opposition speeches is a valid measure of parliamentary accountability based on my theoretical perspective. Based on my theory, I generated the following five predictions for quantitative analysis:

P1: Majority parliaments are less accountable than minority parliaments.

P2: Accountability increases as the seat percentage of the government decreases.

P3: As the government’s polling popularity decreases, accountability increases.

P4: As government polling popularity increases above the threshold needed to form a majority, accountability remains stable and does not vary across majority and minority governments.

P5: Accountability (as lexical similarity) and ideology (as semantic dissimilarity) are negatively related.

To validate my theoretical assumptions and measurement approach, I began my investigation with a qualitative case study of the 38th and 39th Parliaments, two successive minority governments with two different governing parties (Liberal and Conservative) and Prime Ministers (Paul Martin and Stephen Harper). I performed a close reading of paired interactions between opposition members and government ministers during Question Period across a variety of debate subtopics, from the Liberals’ notorious Sponsorship Scandal to the Conservatives’ controversial position on Canada’s Kyoto Accord commitments. I selected speech pairs for study based upon their lexical similarity scores (very high and very low) in order to investigate whether there were systematic qualitative differences in speech styles between the two associated with parliamentary accountability. I found that low lexical similarity scores were associated with loaded questions, partisan attacks, and general and repetitious phrasing in question and answer exchanges between opposition and government. High lexical similarity scores reflected passages where government and opposition spoke about the details of a particular piece of legislation or policy area. I concluded that lexical similarity is associated with the qualitative extent to which government and opposition were engaged in parliamentary accountability. The qualitative analysis also raised questions about the relationships between partisanship, ideological polarization, and accountability, and how they vary across policy areas. On topics that were particularly ideologically salient at the time (for example, Environment within the 39th Parliament), partisan sparring regularly occurred together with the dynamics of accountability in speech examples with high lexical similarity. This was less the case within weakly contested policy areas such as Health Care in the 38th Parliament.
This pattern provided qualitative confirmation of my assertion that lexical similarity is not an appropriate measure of ideological disagreement, and that the extent to which this is true varies across issue areas, calling into question the simple textual classification approach to measuring ideological polarization in parliamentary debate. When adversarialism is institutionally prescribed as an accountability mechanism, it is difficult even for a human reader to ascertain when “partisan” questioning of government policy is a case of the opposition keeping the government in line or of substantial ideological differences. The fact that these dynamics varied substantially across issue areas suggests that there are indeed multiple mechanisms informing parliamentary word choice, and supported my measurement strategy of distinguishing between lexical and semantic similarity as appropriate indicators of different phenomena. In the subsequent quantitative phase of the research, I studied Question Periods from 1975-2010 and daily debates from 1945-2015 using the lexical similarity measurement approach to study factors affecting parliamentary accountability. First, I found that minority parliaments are more accountable than majority parliaments in the period since the 30th Parliament. This finding is based on the dataset of Question Period speeches, as issues with noise and contextual inconsistency that emerged within the larger dataset of daily debate speeches prevented me from drawing conclusions about minority parliaments over a broader historical scale. Second, I found no significant relationship between government seat percentage and parliamentary accountability. Third, I found that Parliament is more accountable as a government’s polling popularity decreases. This effect becomes stronger in interaction with a lagged poll variable; that is, Parliament is more likely to behave in an accountable manner if government polls have decreased over two successive quarters.
The effect of government polls on my accountability measure was better understood using a structural break model, which yielded a threshold at about 33%. Below this critical value, polling popularity and parliamentary accountability are positively related, and above this value, are negatively related. Put simply, my results deepen our understanding of a general rule-of-thumb in Canadian federal elections: 40% of the vote is sufficient to guarantee a party forms a majority government. I discover that debate between parties over which group is most fit to govern is fiercest slightly below this value in the low 30% range. Fourth, I found insufficient evidence to support the proposition that minority and majority governments behave similarly at high levels of polling popularity. Finally, I extended my analysis of parliamentary debate to include a semantic similarity measurement, intended to capture ideological similarity or polarization. Based on my supporting theory, I expected to observe the existence of a trade-off between accountability and ideology in political communication strategies during parliamentary debate. In the fifth theoretical prediction studied, I confirmed these two measures behave differently. I found evidence of a negative relationship between lexical and semantic similarity at the parliamentary session level of analysis, but a positive relationship at the daily level of analysis, suggesting that change in political conditions affects the balance between different goals of political communication. In historical terms, I found preliminary evidence that governments and official oppositions have been converging in terms of their lexical choices (becoming more accountable) but diverging in terms of their semantic meanings (becoming more ideologically polarized) since about the 29th Parliament (1970).

What do these results tell us about the health of Canada’s parliament? Although ideological polarization seems to be increasing, parliamentary accountability is not in drastic decline. Due to institutional reforms over the past half-century, Canadian governments have more institutional power than ever to assert agenda control. Nevertheless, oppositions continue to perform an accountability role during Question Period, especially under minority parliament conditions. Increased central control of speechmaking may also mean that the superfluous, personal speeches that Stewart bemoaned as characteristic of unaccountable, poor-performing Parliaments in the mid-20th century occur less often. Finally, Parliament appears to be responsive to signals from the public, via political polls, in shaping their communication strategies around ideological and accountability goals. My results provide some empirical support for Franks’ assertion that, “[c]ontrary to common rhetoric, the Canadian parliament does a reasonably good job in the function of holding the government accountable and making it behave” (Franks, 1987, 265). However, my analysis cannot provide insight into the question of how, as explored in Donald Savoie’s work, increased centralization of politics and party discipline have affected the meaning of accountability in Parliament; how shifts in the Canadian party system, which have yielded increased likelihoods of minority parliaments in recent decades, affect accountability; or how the percentage of seats controlled by a governing party can potentially prove decisive for dynamics of accountability under different institutional conditions. To investigate these questions, the measurement of accountability and its trends in Canada must be translated to other parliaments, Westminster or otherwise, for comparative study.

7.1 Ideas for Further Research

In the introduction to this dissertation, I emphasized the value of close consideration of a single country case as an analytical proof of concept. This approach allowed me to ground the validity of my results in specific expectations, both quantitative and qualitative, yielded by features of Canadian political institutions. One example is how the effect of geography on Canada’s first-past-the-post electoral system dictated the particular importance of a turning point in polling popularity and its effects on the incentives faced by parliaments to pursue collective accountability. Another is the impact of extremely strong party discipline on theoretical expectations about MP defection and minority government stability. During the qualitative validation phase, the selection of particular policy areas and discussion cases also required a close familiarity with Canadian political history to validate individual examples of accountability in text. Focusing solely on the Canadian case has allowed me to establish and test the foundations of a generalizable theory of parliamentary accountability in debate texts, and to develop an associated measurement approach. My findings, both positive and negative, yield an informed starting point for future comparative research.

The obvious next step is to extend the accountability measure and an adapted theoretical framework to debates in other Westminster parliaments. However, attention must be paid to theoretical expectations surrounding the number of major parties and their levels of party discipline. For example, in Westminster-style parliaments, where the official opposition has a significant institutional role, it is difficult to separate institutionally adversarial debate from ideological opposition for reasons of multicollinearity. In a two-party system under these conditions, it may be impossible to distinguish the two practices by way of linguistic similarity alone.
Focusing on the Canadian case simplified these issues on two fronts. First, the additional linguistic data of third-party speeches enabled my analysis of the trade-off between accountability and ideology in the last section of Chapter 6. Second, as discussed in Chapter 3, studying the Canadian case allowed me to reasonably assume that parties have complete control over the content of the speeches made by their MPs. A comparative approach would address these problems more generally and would permit a deeper theoretical understanding of accountability in parliamentary speech. For example, changes in party discipline over time are small and difficult to measure in the Canadian case. However, they are of great theoretical interest in a generalization of my accountability model. Some variable amount of party discipline is likely to mediate between party position and individual speech content, as is a common feature of both Proksch and Slapin’s and Bäck and Debus’ debate models. Calculating similarity measures within parties, or between back and front bench members of parties, is a simple step toward quantifying party discipline in parliamentary speeches that could then be incorporated as an additional term in the accountability model. Modelling party discipline could also involve unpacking individual-level political goals and ideological positions, and investigating the distance of individual MPs from the party line on these measures. This would further permit the inclusion of individual MP-level characteristics, such as gender, local district size, or electoral margin, in a study of parliamentary accountability. A related application of this party discipline measure could be as a control variable in ideological classification applications. According to Proksch and Slapin, speech is a strategic game of position-taking both across and within political parties; therefore, debate texts will not reflect “true” party positions appropriate for scaling measures. However, the distance between the ideological position of an individual speech and the “true” party position is influenced by party discipline and level of agenda control, which could be accounted for if quantified.
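The within-party comparison proposed above (similarity between front and back bench speeches as a rough proxy for party discipline) can be sketched as follows. This is a minimal illustration only: it assumes plain cosine similarity over word counts as a stand-in for the lexical similarity measure developed in this dissertation, and the function names and toy speeches are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lower-case the text and count alphabetic tokens (a crude tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def bench_similarity(front_bench_speeches, back_bench_speeches):
    """Pool each group's speeches and compare their vocabularies.

    Assumption: a higher score is read as tighter message discipline
    within the party; this interpretation is illustrative, not tested.
    """
    front = term_counts(" ".join(front_bench_speeches))
    back = term_counts(" ".join(back_bench_speeches))
    return cosine_similarity(front, back)


# Hypothetical toy speeches, not drawn from Hansard.
front = ["The government will table the budget on Tuesday."]
back = ["The budget the government will table ignores my riding."]
score = bench_similarity(front, back)
```

A score computed this way for each party and session could then enter the accountability model as the additional discipline term described above; any real application would need the same preprocessing as the main similarity measure.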
A final possibility offered by a focus on the individual level is an assessment of the particular role of ministers in collective accountability. As an exploratory analysis in the context of my qualitative case study of the 38th and 39th Parliaments, I found no significant differences across individual ministers at the within-parliament level, primarily due to high variance. Such a comparison is also problematic for validity reasons, given that ministers’ speeches will vary in unknown ways based on the characteristics of their portfolio and political events that are difficult to account for. However, it is possible to compare the performance of individual ministers, or of the same ministry, across multiple parliaments. The accountability of Prime Ministers, for example, could be studied in relation to their personal electoral success in their riding and to the electoral success of their party. Individual-level data about floor crossings, resignations, and by-elections might also inform a higher-resolution study of the relationship between government seat percentage and accountability.

A second area of further research that will support comparative use of my accountability measure is additional validation. The qualitative analysis I performed in Chapter 5 was a first step toward establishing validity, but a systematic content analysis is required to establish a reliable gold standard for the measure. This next step would include developing a formal codebook for identifying parliamentary accountability, training a group of coders, sampling a broader historical range of speech samples, and measuring intercoder reliability. Following these steps, a comparison of human coding results with predictions yielded by my accountability measure will provide a better quantitative picture of its reliability.

Another historical stumbling block encountered in this analysis was the lack of a consistent Oral Question Period prior to procedural reform in the 1970s.
The Canadian question period has evolved since 1913 from a short opportunity to debate the Orders of the Day to a long and unwieldy session in between Routine Proceedings and Orders of the Day that varied substantially in duration until reforms in 1964. Opposition members also availed themselves of a variety of debate opportunities on the timetable, including adjournment debates, debates on motions, and “emergency debates” permitted by Standing Order 26, that were curtailed by 1975 (Stewart, 1977, 52–55). Prior to the modernization of the committee system in 1968, the line-by-line review of second reading also took place in Committee of the Whole. As my results in the previous chapter showed, daily debates are too contextually varied to act as a substitute for Question Period debates prior to 1975, for many of the procedural reasons described above. A potential remedy would be to catalogue contextually relevant Hansard topics from different procedural eras and combine them into a constructed set of “oral questioning” debate. This would require careful attention to successive changes to the Standing Orders since the 1910s in order to identify representative sets of Hansard topics, and additional manual processing of the Lipad dataset to identify and clean up references to these topics. However, the inclusion of these data would significantly extend the historical scope of the study of majority and minority parliaments undertaken in this dissertation.

A common theme across scholarly work on the Canadian Parliament since the 1970s is the role of the media, especially television coverage, in shaping parliamentary behaviour. Franks, Savoie, Russell, Stewart, and Smith, among others, all emphasize that negative media representations of politics, including MP behaviour in the House of Commons, can shape public opinion, increase ideological polarization, and foment political cynicism. Franks goes so far as to assert that “problems of the relationship between the media and the House of Commons are the most important single factor contributing to weakness of parliamentary discussion and parliament as a centre of national politics” (Franks, 1987, 267). Media coverage should also play a major role in the relationship between government polling popularity and the incentive structure surrounding parliamentary speech decisions. Proksch and Slapin, for example, have found empirical evidence of the media’s role in transmitting both policy ideas and dissenting perspectives evidenced in legislative debate (Proksch & Slapin, 2014, 124). Studying the relationship between polls, electoral success, the textual content of media coverage of a government, and accountability in parliamentary speeches from a comparative perspective would help illuminate the causal mechanisms underlying my theoretical perspective. A comparative analysis involving polling data could be used to investigate how majority and minority governments differ in their accountability behaviours, since there were insufficient data to perform this analysis in the Canadian case.

One of the many critiques of media coverage of the Canadian House of Commons is the suggestion that Question Period is ineffective as an accountability mechanism because politicians, especially the Prime Minister, focus on “news management” and public performance at the expense of serious policy discussion. Savoie argues that the media spotlight on the Prime Minister gives him or her the ability to appeal directly to the average voter on a personal level, contributing to the centralization of political power (Savoie, 2008, 159–160). These critiques imply that political speech, including parliamentary debate, has become simpler and less formal in the television age as its audience has shifted from inside to outside the House, a pattern that has only intensified in the era of social media.

One possibility overlooked in this dissertation is that a decrease over time in the formality and complexity of parliamentary language resulted in systematically lower lexical similarity scores in earlier parliaments. The concept of “integrative complexity” has been employed in the political psychology literature since the 1980s to explore the relationship between psychological traits, political reasoning, and expression in political speeches (Ballard, 1983; Suedfeld, Bluck, Ballard, & Baker-Brown, 1990; Tetlock, 1981). From a measurement perspective, a decrease in integrative complexity would make linguistic overlap more likely (as simpler speech implies a smaller set of frequently used words), and thus increase lexical similarity across speeches, independent of the subject of parliamentary debates. To investigate this possibility, syntactic complexity or textual readability measures could be applied across the historical dataset in order to study the relationship between integrative complexity, accountability, and ideology (Feng, Jansche, Huenerfauth, & Elhadad, 2010).
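As a rough illustration of the kind of surface measure that could be tracked across the historical dataset, the sketch below computes three simple indicators for a speech. These are crude proxies chosen for brevity; they are not integrative complexity scores and not the readability features used by Feng et al., and the function name is hypothetical.

```python
import re


def complexity_profile(text):
    """Return crude surface-complexity indicators for one speech.

    mean_sentence_length: words per sentence (longer suggests more complex syntax)
    type_token_ratio: distinct words / total words (lower suggests a smaller
        working vocabulary; note that it falls with text length, so speeches
        should be compared at similar lengths)
    mean_word_length: average characters per word
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not sentences or not words:
        return {
            "mean_sentence_length": 0.0,
            "type_token_ratio": 0.0,
            "mean_word_length": 0.0,
        }
    return {
        "mean_sentence_length": len(words) / len(sentences),
        "type_token_ratio": len(set(words)) / len(words),
        "mean_word_length": sum(len(w) for w in words) / len(words),
    }


# Hypothetical example: a repetitive, slogan-like intervention.
profile = complexity_profile("Jobs, jobs, jobs. We want jobs.")
```

If indicators of this kind trend downward over successive parliaments while lexical similarity trends upward, that would be consistent with the confound described above and would argue for adding a complexity control to the accountability model.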
The final, and most complex, area for further research I have identified is the measurement of ideological polarization. While not the focus of this dissertation, my results in the final section of Chapter 6 suggest that a combination measure incorporating semantic and lexical similarity could capture ideological polarization while partially controlling for institutional adversarialism. First, a textual measurement of polarization is likely to vary systematically across topics of debate; I detected significant evidence of this phenomenon in my qualitative validation chapter. Either a supervised approach (using debate subtopics captured from Hansard and available in the Lipad dataset) or an unsupervised approach (using topic modelling) to including topical information in my ideology measure could improve its performance. Second, my application of word2vec/doc2vec only scratched the surface in terms of tuning model hyperparameters and testing model optimization to the specificities of the ideological measurement task. These refinements would improve the contextual validity of my semantic similarity measure and allow me to elaborate upon my preliminary findings about the theoretical significance of semantic similarity in parliamentary debate.

Appendices

Appendix A

Qualitative Results: Additional Analysis

A.1 38th Parliament quantitative tests

### Oneway Anova for y=simil and x=subtopic (groups: Aboriginal Affairs, Agriculture, Child Care, Citizenship and Immigration, David Dingwall, Foreign Affairs, Government Contracts, Health, Justice, National Defence, Natural Resources, Sponsorship Program, The Environment)

Omega squared: 95% CI = [.01; .05], point estimate = .03
Eta Squared: 95% CI = [.02; .04], point estimate = .04

                                    SS    Df    MS     F      p
Between groups (error + effect)   0.67    12  0.06  5.19  <.001
Within groups (error only)       18.09  1679  0.01

### Levene’s test for homogeneity of variance:

F[12, 1679] = 4.92, p < .001.

### Post hoc test: games-howell

 diff  ci.lo  ci.hi     t      df      p  comparison
 0.00  -0.06   0.05  0.29   98.27  1.000  Agriculture-Aboriginal Affairs
 0.01  -0.05   0.07  0.48  104.99  1.000  Child Care-Aboriginal Affairs
-0.02  -0.07   0.03  1.36   80.92   .977  Citizenship and Immigration-Aboriginal Affairs
-0.01  -0.07   0.04  0.78  111.89  1.000  David Dingwall-Aboriginal Affairs
 0.04  -0.02   0.10  2.43  112.48   .433  Foreign Affairs-Aboriginal Affairs
-0.02  -0.08   0.04  1.18  111.92   .993  Government Contracts-Aboriginal Affairs
 0.04  -0.02   0.10  2.17  127.21   .611  Health-Aboriginal Affairs
-0.01  -0.06   0.04  0.80   92.37  1.000  Justice-Aboriginal Affairs
 0.02  -0.03   0.07  1.41   95.31   .970  National Defence-Aboriginal Affairs
 0.05  -0.02   0.11  2.60  119.99   .325  Natural Resources-Aboriginal Affairs
 0.02  -0.03   0.06  1.27   70.73   .987  Sponsorship Program-Aboriginal Affairs
 0.00  -0.06   0.06  0.08  108.81  1.000  The Environment-Aboriginal Affairs
 0.01  -0.04   0.06  0.86   98.25  1.000  Child Care-Agriculture
-0.02  -0.05   0.02  1.44  221.20   .966  Citizenship and Immigration-Agriculture
-0.01  -0.06   0.04  0.64  131.16  1.000  David Dingwall-Agriculture
 0.05   0.00   0.09  3.27  132.76   .067  Foreign Affairs-Agriculture
-0.02  -0.07   0.04  1.12  106.03   .996  Government Contracts-Agriculture
 0.04  -0.01   0.09  2.88  148.05   .178  Health-Agriculture
-0.01  -0.05   0.03  0.67  229.34  1.000  Justice-Agriculture
 0.03  -0.01   0.06  2.24  246.76   .563  National Defence-Agriculture
 0.05   0.00   0.11  3.27  111.91   .068  Natural Resources-Agriculture
 0.02  -0.01   0.05  2.24  188.50   .564  Sponsorship Program-Agriculture
 0.01  -0.04   0.05  0.41  131.84  1.000  The Environment-Agriculture
-0.03  -0.08   0.02  2.00   80.60   .732  Citizenship and Immigration-Child Care
-0.02  -0.08   0.03  1.30  111.43   .984  David Dingwall-Child Care
 0.03  -0.02   0.09  1.94  112.05   .765  Foreign Affairs-Child Care
-0.03  -0.09   0.03  1.67  110.97   .903  Government Contracts-Child Care
 0.03  -0.03   0.09  1.71  126.89   .887  Health-Child Care
-0.02  -0.07   0.03  1.40   92.31   .972  Justice-Child Care
 0.01  -0.04   0.06  0.85   95.34  1.000  National Defence-Child Care
 0.04  -0.02   0.11  2.17  119.00   .618  Natural Resources-Child Care
 0.01  -0.04   0.06  0.66   70.16  1.000  Sponsorship Program-Child Care
-0.01  -0.06   0.05  0.44  108.38  1.000  The Environment-Child Care
 0.01  -0.04   0.05  0.47  109.62  1.000  David Dingwall-Citizenship and Immigration
 0.06   0.02   0.10  4.72  111.18  <.001  Foreign Affairs-Citizenship and Immigration
 0.00  -0.05   0.05  0.16   88.25  1.000  Government Contracts-Citizenship and Immigration
 0.06   0.01   0.11  4.17  126.60   .004  Health-Citizenship and Immigration
 0.01  -0.03   0.04  0.77  308.20  1.000  Justice-Citizenship and Immigration
 0.04   0.01   0.07  4.05  347.29   .004  National Defence-Citizenship and Immigration
 0.07   0.02   0.12  4.43   94.83   .002  Natural Resources-Citizenship and Immigration
 0.04   0.01   0.06  4.55  497.65  <.001  Sponsorship Program-Citizenship and Immigration
 0.02  -0.02   0.06  1.66  110.11   .906  The Environment-Citizenship and Immigration
 0.05   0.00   0.11  3.45  130.99   .041  Foreign Affairs-David Dingwall
-0.01  -0.07   0.05  0.48  118.97  1.000  Government Contracts-David Dingwall
 0.05   0.00   0.11  3.11  146.98   .101  Health-David Dingwall
 0.00  -0.04   0.05  0.11  125.71  1.000  Justice-David Dingwall
 0.03  -0.01   0.08  2.53  130.87   .365  National Defence-David Dingwall
 0.06   0.00   0.12  3.48  125.86   .037  Natural Resources-David Dingwall
 0.03  -0.01   0.07  2.50   93.72   .384  Sponsorship Program-David Dingwall
 0.01  -0.04   0.07  0.94  127.92   .999  The Environment-David Dingwall
-0.06  -0.12   0.00  3.65  119.60   .022  Government Contracts-Foreign Affairs
 0.00  -0.06   0.05  0.15  148.00  1.000  Health-Foreign Affairs
-0.05  -0.10  -0.01  3.93  127.38   .009  Justice-Foreign Affairs
-0.02  -0.07   0.03  1.48  132.61   .959  National Defence-Foreign Affairs
 0.01  -0.05   0.07  0.46  126.44  1.000  Natural Resources-Foreign Affairs
-0.02  -0.07   0.02  1.94   95.10   .769  Sponsorship Program-Foreign Affairs
-0.04  -0.09   0.01  2.57  128.96   .338  The Environment-Foreign Affairs
 0.06   0.00   0.12  3.34  134.51   .054  Health-Government Contracts
 0.01  -0.04   0.06  0.65  100.07  1.000  Justice-Government Contracts
 0.04  -0.01   0.09  2.81  103.14   .215  National Defence-Government Contracts
 0.07   0.01   0.14  3.68  125.96   .020  Natural Resources-Government Contracts
 0.04  -0.01   0.09  2.78   77.63   .232  Sponsorship Program-Government Contracts
 0.02  -0.03   0.08  1.35  115.94   .979  The Environment-Government Contracts
-0.05  -0.10   0.00  3.48  142.50   .036  Justice-Health
-0.02  -0.07   0.03  1.20  147.51   .992  National Defence-Health
 0.01  -0.05   0.07  0.58  141.09  1.000  Natural Resources-Health
-0.02  -0.07   0.02  1.59  110.88   .930  Sponsorship Program-Health
-0.04  -0.09   0.02  2.28  144.98   .536  The Environment-Health
 0.03   0.00   0.07  3.04  308.74   .116  National Defence-Justice
 0.06   0.01   0.12  3.81  106.07   .014  Natural Resources-Justice
 0.03   0.00   0.06  3.22  280.72   .072  Sponsorship Program-Justice
 0.01  -0.03   0.06  1.00  126.75   .999  The Environment-Justice
 0.03  -0.03   0.08  1.76  108.91   .865  Natural Resources-National Defence
 0.00  -0.03   0.03  0.40  324.54  1.000  Sponsorship Program-National Defence
-0.02  -0.06   0.02  1.50  132.30   .955  The Environment-National Defence
-0.03  -0.08   0.02  2.13   84.77   .642  Sponsorship Program-Natural Resources
-0.05  -0.11   0.01  2.71  122.77   .261  The Environment-Natural Resources
-0.02  -0.06   0.02  1.36   93.35   .978  The Environment-Sponsorship Program

Agriculture                   "a"
Child Care                    "b"
Citizenship and Immigration   "cdefg"
David Dingwall                "hi"
Foreign Affairs               "chjk"
Government Contracts          "jl"
Health                        "dm"
Justice                       "kmn"
National Defence              "e"
Natural Resources             "filn"
Sponsorship Program           "g"
The Environment               "o"
Aboriginal Affairs            "p"

### Welch correction for nonhomogeneous variances:

F[12, 396.03] = 5.42, p < .001.

### Brown-Forsythe correction for nonhomogeneous variances:

F[12, 1005.67] = 5.81, p < .001.

### Cohen’s f^2:

> e = 0.67/(18.09+0.67)
> e / (1-e)
[1] 0.03703704

A.2 39th Parliament quantitative tests

### Oneway Anova for y=simil and x=subtopic (groups: Aboriginal Affairs, Afghanistan, Airbus, Business of the House, Canadian Wheat Board, Child Care, Citizenship and Immigration, Elections Canada, Ethics, Foreign Affairs, Government Appointments, Health, Justice, National Defence, The Budget, The Economy, The Environment)

Omega squared: 95% CI = [.03; .06], point estimate = .04
Eta Squared: 95% CI = [.03; .06], point estimate = .05

                                    SS    Df    MS     F      p
Between groups (error + effect)    1.1    16  0.07  7.37  <.001
Within groups (error only)       20.72  2219  0.01

### Levene’s test for homogeneity of variance:

F[16, 2219] = 3.07, p < .001.

### Post hoc test: games-howell

 diff  ci.lo  ci.hi     t      df      p  comparison
 0.00  -0.03   0.03  0.53  273.12  1.000  Afghanistan-Aboriginal Affairs
-0.01  -0.04   0.03  0.71  281.07  1.000  Airbus-Aboriginal Affairs
 0.04  -0.01   0.10  2.78   92.56   .324  Business of the House-Aboriginal Affairs
 0.04  -0.02   0.09  2.36  104.67   .608  Canadian Wheat Board-Aboriginal Affairs
 0.09   0.02   0.16  4.50   73.76   .003  Child Care-Aboriginal Affairs
-0.03  -0.07   0.01  2.40  138.08   .584  Citizenship and Immigration-Aboriginal Affairs
-0.02  -0.06   0.03  1.27  208.69   .998  Elections Canada-Aboriginal Affairs
-0.03  -0.08   0.03  1.82  103.39   .921  Ethics-Aboriginal Affairs
 0.01  -0.02   0.05  1.10  340.72  1.000  Foreign Affairs-Aboriginal Affairs
-0.02  -0.07   0.02  1.72  124.86   .949  Government Appointments-Aboriginal Affairs
 0.01  -0.05   0.07  0.70   94.74  1.000  Health-Aboriginal Affairs
-0.02  -0.06   0.02  1.53  179.38   .983  Justice-Aboriginal Affairs
-0.02  -0.07   0.02  2.08  226.51   .803  National Defence-Aboriginal Affairs
-0.01  -0.06   0.04  0.89  135.73  1.000  The Budget-Aboriginal Affairs
-0.01  -0.06   0.04  0.99  103.22  1.000  The Economy-Aboriginal Affairs
-0.02  -0.05   0.01  2.52  253.98   .493  The Environment-Aboriginal Affairs
 0.00  -0.03   0.03  0.34  233.54  1.000  Airbus-Afghanistan
 0.05   0.00   0.10  3.33   69.65   .099  Business of the House-Afghanistan
 0.04  -0.01   0.09  2.87   79.97   .275  Canadian Wheat Board-Afghanistan
 0.09   0.03   0.16  4.95   61.97   .001  Child Care-Afghanistan
-0.03  -0.06   0.01  2.29   94.69   .658  Citizenship and Immigration-Afghanistan
-0.01  -0.05   0.03  1.02  150.03  1.000  Elections Canada-Afghanistan
-0.02  -0.07   0.03  1.64   76.36   .965  Ethics-Afghanistan
 0.02  -0.01   0.05  1.88  376.88   .901  Foreign Affairs-Afghanistan
-0.02  -0.06   0.03  1.53   89.39   .981  Government Appointments-Afghanistan
 0.02  -0.04   0.08  1.02   76.15  1.000  Health-Afghanistan
-0.01  -0.05   0.02  1.31  126.56   .996  Justice-Afghanistan
-0.02  -0.06   0.02  1.94  165.79   .872  National Defence-Afghanistan
-0.01  -0.05   0.04  0.62   98.76  1.000  The Budget-Afghanistan
-0.01  -0.06   0.04  0.73   75.05  1.000  The Economy-Afghanistan
-0.02  -0.04   0.00  2.72  865.41   .344  The Environment-Afghanistan
 0.05   0.00   0.10  3.24   95.04   .116  Business of the House-Airbus
 0.04  -0.01   0.10  2.82  107.26   .293  Canadian Wheat Board-Airbus
 0.10   0.03   0.17  4.85   75.20   .001  Child Care-Airbus
-0.02  -0.07   0.02  1.77  140.83   .937  Citizenship and Immigration-Airbus
-0.01  -0.05   0.03  0.63  208.08  1.000  Elections Canada-Airbus
-0.02  -0.07   0.03  1.29  106.13   .997  Ethics-Airbus
 0.02  -0.02   0.06  1.80  307.96   .929  Foreign Affairs-Airbus
-0.02  -0.06   0.03  1.15  127.85   .999  Government Appointments-Airbus
 0.02  -0.04   0.08  1.13   96.85   .999  Health-Airbus
-0.01  -0.06   0.03  0.90  180.86  1.000  Justice-Airbus
-0.02  -0.06   0.03  1.42  224.23   .992  National Defence-Airbus
 0.00  -0.05   0.04  0.34  138.71  1.000  The Budget-Airbus
-0.01  -0.06   0.04  0.45  106.01  1.000  The Economy-Airbus
-0.01  -0.04   0.02  1.58  217.59   .978  The Environment-Airbus
-0.01  -0.07   0.06  0.29  120.72  1.000  Canadian Wheat Board-Business of the House
 0.05  -0.03   0.13  2.05  100.98   .816  Child Care-Business of the House
-0.07  -0.13  -0.01  4.32  108.59   .004  Citizenship and Immigration-Business of the House
-0.06  -0.12   0.00  3.52  110.54   .052  Elections Canada-Business of the House
-0.07  -0.13   0.00  3.75  115.10   .026  Ethics-Business of the House
-0.03  -0.08   0.02  2.06   89.76   .807  Foreign Affairs-Business of the House
-0.07  -0.13   0.00  3.74  116.28   .027  Government Appointments-Business of the House
-0.03  -0.10   0.04  1.46  119.96   .989  Health-Business of the House
-0.06  -0.12   0.00  3.69  112.59   .032  Justice-Business of the House
-0.07  -0.12  -0.01  4.13  109.66   .008  National Defence-Business of the House
-0.05  -0.12   0.01  3.08  121.86   .168  The Budget-Business of the House
-0.06  -0.12   0.01  3.12  112.72   .153  The Economy-Business of the House
-0.06  -0.11  -0.01  4.56   67.51   .002  The Environment-Business of the House
 0.05  -0.03   0.13  2.27  105.81   .673  Child Care-Canadian Wheat Board
-0.07  -0.13  -0.01  3.92  120.15   .015  Citizenship and Immigration-Canadian Wheat Board
-0.05  -0.11   0.01  3.13  123.24   .149  Elections Canada-Canadian Wheat Board
-0.06  -0.13   0.00  3.40  124.80   .073  Ethics-Canadian Wheat Board
-0.03  -0.08   0.03  1.66  101.74   .962  Foreign Affairs-Canadian Wheat Board
-0.06  -0.12   0.00  3.38  127.35   .078  Government Appointments-Canadian Wheat Board
-0.02  -0.10   0.05  1.18  127.46   .999  Health-Canadian Wheat Board
-0.06  -0.12   0.00  3.29  125.03   .097  Justice-Canadian Wheat Board
-0.06  -0.12   0.00  3.72  122.47   .029  National Defence-Canadian Wheat Board
-0.05  -0.11   0.01  2.72  133.30   .351  The Budget-Canadian Wheat Board
-0.05  -0.12   0.01  2.77  122.58   .322  The Economy-Canadian Wheat Board
-0.06  -0.11  -0.01  4.06   77.62   .011  The Environment-Canadian Wheat Board
-0.12  -0.19  -0.04  5.68   87.08  <.001  Citizenship and Immigration-Child Care
-0.10  -0.18  -0.03  5.05   84.82  <.001  Elections Canada-Child Care
-0.12  -0.19  -0.04  5.19   99.29  <.001  Ethics-Child Care
-0.08  -0.15  -0.01  3.95   72.13   .017  Foreign Affairs-Child Care
-0.11  -0.19  -0.04  5.20   94.70  <.001  Government Appointments-Child Care
-0.08  -0.16   0.01  3.17  113.96   .136  Health-Child Care
-0.11  -0.18  -0.03  5.17   87.12  <.001  Justice-Child Care
-0.11  -0.19  -0.04  5.53   83.87  <.001  National Defence-Child Care
-0.10  -0.18  -0.02  4.65   97.56   .001  The Budget-Child Care
-0.10  -0.18  -0.02  4.68   96.65   .001  The Economy-Child Care
-0.11  -0.18  -0.04  5.87   60.90  <.001  The Environment-Child Care
 0.01  -0.03   0.06  1.07  153.76  1.000  Elections Canada-Citizenship and Immigration
 0.00  -0.05   0.06  0.21  117.70  1.000  Ethics-Citizenship and Immigration
 0.04   0.00   0.08  3.34  135.15   .084  Foreign Affairs-Citizenship and Immigration
 0.01  -0.05   0.06  0.43  131.24  1.000  Government Appointments-Citizenship and Immigration
 0.04  -0.02   0.11  2.25  111.00   .688  Health-Citizenship and Immigration
 0.01  -0.04   0.06  0.79  149.04  1.000  Justice-Citizenship and Immigration
 0.01  -0.04   0.05  0.40  156.06  1.000  National Defence-Citizenship and Immigration
 0.02  -0.04   0.07  1.14  140.02   .999  The Budget-Citizenship and Immigration
 0.02  -0.04   0.07  1.01  116.79  1.000  The Economy-Citizenship and Immigration
 0.01  -0.03   0.05  0.79   90.09  1.000  The Environment-Citizenship and Immigration
-0.01  -0.07   0.04  0.72  122.62  1.000  Ethics-Elections Canada
 0.03  -0.01   0.07  2.25  210.64   .693  Foreign Affairs-Elections Canada
-0.01  -0.06   0.04  0.56  144.13  1.000  Government Appointments-Elections Canada
 0.03  -0.04   0.09  1.49  110.51   .986  Health-Elections Canada
 0.00  -0.05   0.04  0.27  184.48  1.000  Justice-Elections Canada
-0.01  -0.06   0.04  0.71  208.15  1.000  National Defence-Elections Canada
 0.00  -0.05   0.06  0.19  154.64  1.000  The Budget-Elections Canada
 0.00  -0.05   0.06  0.08  122.67  1.000  The Economy-Elections Canada
-0.01  -0.04   0.03  0.60  142.04  1.000  The Environment-Elections Canada
 0.04  -0.01   0.09  2.61  100.30   .429  Foreign Affairs-Ethics
 0.00  -0.06   0.06  0.18  124.12  1.000  Government Appointments-Ethics
 0.04  -0.03   0.11  1.92  121.11   .882  Health-Ethics
 0.01  -0.05   0.06  0.48  123.91  1.000  Justice-Ethics
 0.00  -0.05   0.06  0.13  122.06  1.000  National Defence-Ethics
 0.01  -0.05   0.08  0.82  130.53  1.000  The Budget-Ethics
 0.01  -0.05   0.07  0.71  117.96  1.000  The Economy-Ethics
 0.01  -0.04   0.05  0.39   73.78  1.000  The Environment-Ethics
-0.03  -0.08   0.01  2.58  121.54   .451  Government Appointments-Foreign Affairs
 0.00  -0.06   0.06  0.06   92.34  1.000  Health-Foreign Affairs
-0.03  -0.07   0.01  2.48  178.26   .523  Justice-Foreign Affairs
-0.04  -0.08   0.00  3.08  230.95   .158  National Defence-Foreign Affairs
-0.02  -0.07   0.02  1.72  132.42   .949  The Budget-Foreign Affairs
-0.03  -0.08   0.02  1.79  100.07   .927  The Economy-Foreign Affairs
-0.03  -0.06   0.00  3.98  350.75   .009  The Environment-Foreign Affairs
 0.04  -0.03   0.10  1.83  119.41   .917  Health-Government Appointments
 0.00  -0.05   0.06  0.31  142.86  1.000  Justice-Government Appointments
 0.00  -0.05   0.05  0.08  144.73  1.000  National Defence-Government Appointments
 0.01  -0.05   0.07  0.68  141.86  1.000  The Budget-Government Appointments
 0.01  -0.05   0.07  0.56  122.79  1.000  The Economy-Government Appointments
 0.00  -0.04   0.05  0.17   85.84  1.000  The Environment-Government Appointments
-0.03  -0.10   0.03  1.67  112.99   .961  Justice-Health
-0.04  -0.10   0.03  2.01  109.46   .837  National Defence-Health
-0.02  -0.09   0.04  1.25  123.97   .998  The Budget-Health
-0.03  -0.10   0.04  1.32  118.45   .996  The Economy-Health
-0.03  -0.09   0.02  2.06   74.41   .810  The Environment-Health
-0.01  -0.05   0.04  0.42  189.76  1.000  National Defence-Justice
 0.01  -0.05   0.06  0.42  152.78  1.000  The Budget-Justice
 0.00  -0.05   0.06  0.31  123.67  1.000  The Economy-Justice
 0.00  -0.04   0.04  0.24  120.24  1.000  The Environment-Justice
 0.01  -0.04   0.06  0.82  155.49  1.000  The Budget-National Defence
 0.01  -0.04   0.06  0.69  122.23  1.000  The Economy-National Defence
 0.00  -0.03   0.04  0.32  156.74  1.000  The Environment-National Defence
 0.00  -0.06   0.06  0.10  129.41  1.000  The Economy-The Budget
-0.01  -0.05   0.04  0.72   95.00  1.000  The Environment-The Budget
-0.01  -0.05   0.04  0.57   72.35  1.000  The Environment-The Economy

Afghanistan                   "a"
Airbus                        "b"
Business of the House         "cdefgh"
Canadian Wheat Board          "ijk"
Child Care                    "ablmnopqrstu"
Citizenship and Immigration   "cil"
Elections Canada              "m"
Ethics                        "dn"
Foreign Affairs               "o"
Government Appointments       "ep"
Health                        "v"
Justice                       "fq"
National Defence              "gjr"
The Budget                    "s"
The Economy                   "t"
The Environment               "hko"
Aboriginal Affairs            "u"

### Welch correction for nonhomogeneous variances:

F[16, 520.43] = 5.4, p < .001.

### Brown-Forsythe correction for nonhomogeneous variances:

F[16, 1130.44] = 6.74, p < .001.

### Cohen’s f^2:

> e = 1.1/(20.72+1.1)
> e / (1-e)
[1] 0.0530888
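The Cohen’s f^2 values reported in A.1 and A.2 follow directly from the sums of squares in the two ANOVA tables. The short Python sketch below reproduces the arithmetic of the R console lines above; the function names are illustrative and not part of the original analysis.

```python
def eta_squared(ss_between, ss_within):
    """Share of total variance attributed to the grouping factor."""
    return ss_between / (ss_between + ss_within)


def cohens_f2(ss_between, ss_within):
    """Cohen's f^2 = eta^2 / (1 - eta^2).

    Algebraically this reduces to ss_between / ss_within.
    """
    eta2 = eta_squared(ss_between, ss_within)
    return eta2 / (1.0 - eta2)


# Sums of squares taken from the ANOVA tables in A.1 and A.2.
f2_38th = cohens_f2(0.67, 18.09)
f2_39th = cohens_f2(1.1, 20.72)
```

Because the sums of squares are printed to two decimal places, the eta squared recomputed from them can differ slightly from the point estimates reported at the top of each section.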

A.3 Model: lexical similarity vs. word count differential

Call:
lm(formula = simil ~ abs(wc_diff), data = data_wc)

Residuals:
     Min       1Q   Median       3Q      Max
-0.19753 -0.07539 -0.01437  0.05873  0.51859

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.920e-01  1.537e-03 124.964   <2e-16 ***
abs(wc_diff) 1.369e-05  6.330e-06   2.163   0.0306 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.09985 on 7301 degrees of freedom
Multiple R-squared: 0.0006402, Adjusted R-squared: 0.0005033
F-statistic: 4.677 on 1 and 7301 DF, p-value: 0.0306

Appendix B

Quantitative Results: Additional Analysis

B.1 Polling data breakpoints analysis

> b2 <- breakpoints(simil_gov_offopp_hash ~ poll_govt, data=data2a, h=4)
> confint(b2)

Confidence intervals for breakpoints of optimal 2-segment partition:

Call:
confint.breakpointsfull(object = b2)

Breakpoints at observation number:
  2.5 % breakpoints 97.5 %
1    31          38     43

Corresponding to breakdates:
      2.5 % breakpoints    97.5 %
1 0.2672414   0.3275862 0.3706897
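The breakdates in this output are consistent with each observation number divided by a sample of 116 observations (for example, 38/116 = 0.3275862). The sample size is not stated in the output and is inferred here from the printed values, so it is marked as an assumption in the sketch below; the function name is illustrative.

```python
# Assumption: 116 observations, inferred from the printed breakdates
# (31/116, 38/116 and 43/116 reproduce the three values above).
N_OBS = 116


def breakdate(obs_index, n_obs=N_OBS):
    """Express a breakpoint's observation number as a fraction of the sample."""
    return obs_index / n_obs


# Lower bound, point estimate, and upper bound of the single breakpoint.
interval = [round(breakdate(i), 7) for i in (31, 38, 43)]
```

Under that assumption, the same conversion reproduces the longer decimals printed by summary(b2), such as 6/116 and 30/116.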

> summary(b2)

Optimal (m+1)-segment partition:

Call:
breakpoints.formula(formula = simil_gov_offopp_hash ~ poll_govt, h = 4, data = data2a)

Breakpoints at observation number:
m = 1   38
m = 2   45 73
m = 3   30 67 72
m = 4   33 38 67 72
m = 5   33 38 67 72 78
m = 6   33 38 45 67 72 78
m = 7   33 38 45 67 72 79 88
m = 8   33 38 45 67 72 79 88 104
m = 9   6 33 38 45 67 72 79 88 104
m = 10  6 33 38 45 67 71 75 79 88 104
m = 11  6 33 38 45 55 59 67 72 79 88 104
m = 12  6 33 38 45 55 59 67 71 75 79 88 104
m = 13  6 33 38 45 55 59 67 71 75 79 88 100 106
m = 14  6 33 38 45 55 59 67 71 75 79 88 94 100 106
m = 15  13 18 29 33 38 45 55 59 67 71 75 79 88 100 106
m = 16  13 18 29 33 38 45 55 59 67 71 75 79 88 94 100 106
m = 17  13 18 29 33 38 45 55 59 67 71 75 79 88 94 100 106 110
m = 18  13 18 29 33 38 45 55 59 67 71 75 79 83 88 94 100 106 110
m = 19  4 13 18 29 33 38 45 55 59 67 71 75 79 83 88 94 100 106 110
m = 20  4 13 18 29 33 38 45 51 55 59 67 71 75 79 83 88 94 100 106 110
m = 21  4 13 18 29 33 38 45 51 55 59 67 71 75 79 83 88 93 97 102 106 110
m = 22  4 13 18 22 29 33 38 45 51 55 59 67 71 75 79 83 88 93 97 102 106 110
m = 23  4 13 18 22 29 33 38 45 51 55 59 63 67 71 75 79 83 88 93 97 102 106 110
m = 24  4 8 13 18 22 29 33 38 45 51 55 59 63 67 71 75 79 83 88 93 97 102 106 110
m = 25  4 8 12 16 20 24 29 33 38 45 51 55 59 63 67 71 75 79 83 88 93 97 102 106 110
m = 26  4 8 12 16 20 24 29 33 38 43 47 51 55 59 63 67 71 75 79 83 88 93 97 102 106 110
m = 27  4 8 12 16 20 24 29 33 38 43 47 51 55 59 63 67 71 75 79 83 88 92 96 100 104 108 112

Corresponding to breakdates:

m = 1
m = 2
m = 3   0.258620689655172
m = 4
m = 5
m = 6
m = 7
m = 8
m = 9   0.0517241379310345
m = 10  0.0517241379310345
m = 11  0.0517241379310345
m = 12  0.0517241379310345
m = 13  0.0517241379310345
m = 14  0.0517241379310345
m = 15  0.112068965517241 0.155172413793103 0.25
m = 16  0.112068965517241 0.155172413793103 0.25
m = 17  0.112068965517241 0.155172413793103 0.25
m = 18  0.112068965517241 0.155172413793103 0.25
m = 19  0.0344827586206897 0.112068965517241 0.155172413793103 0.25
m = 20  0.0344827586206897 0.112068965517241 0.155172413793103 0.25
m = 21  0.0344827586206897 0.112068965517241 0.155172413793103 0.25
m = 22  0.0344827586206897 0.112068965517241 0.155172413793103 0.189655172413793 0.25
m = 23  0.0344827586206897 0.112068965517241 0.155172413793103 0.189655172413793 0.25
m = 24  0.0344827586206897 0.0689655172413793 0.112068965517241 0.155172413793103 0.189655172413793 0.25
m = 25  0.0344827586206897 0.0689655172413793 0.103448275862069 0.137931034482759 0.172413793103448 0.206896551724138 0.25
m = 26  0.0344827586206897 0.0689655172413793 0.103448275862069 0.137931034482759 0.172413793103448 0.206896551724138 0.25
m = 27  0.0344827586206897 0.0689655172413793 0.103448275862069 0.137931034482759 0.172413793103448 0.206896551724138 0.25

m = 1   0.327586206896552
m = 2   0.387931034482759
m = 3
m = 4   0.28448275862069 0.327586206896552
m = 5   0.28448275862069 0.327586206896552
m = 6   0.28448275862069 0.327586206896552 0.387931034482759
m = 7   0.28448275862069 0.327586206896552 0.387931034482759
m = 8   0.28448275862069 0.327586206896552 0.387931034482759
m = 9   0.28448275862069 0.327586206896552 0.387931034482759
m = 10  0.28448275862069 0.327586206896552 0.387931034482759
m = 11  0.28448275862069 0.327586206896552 0.387931034482759 0.474137931034483 0.508620689655172
m = 12  0.28448275862069 0.327586206896552 0.387931034482759 0.474137931034483 0.508620689655172
m = 13  0.28448275862069 0.327586206896552 0.387931034482759 0.474137931034483 0.508620689655172
m = 14  0.28448275862069 0.327586206896552 0.387931034482759 0.474137931034483 0.508620689655172
m = 15  0.28448275862069 0.327586206896552 0.387931034482759 0.474137931034483 0.508620689655172
m = 16  0.28448275862069 0.327586206896552 0.387931034482759 0.474137931034483 0.508620689655172
m = 17  0.28448275862069 0.327586206896552 0.387931034482759 0.474137931034483 0.508620689655172
m = 18  0.28448275862069 0.327586206896552 0.387931034482759 0.474137931034483 0.508620689655172
m = 19  0.28448275862069 0.327586206896552 0.387931034482759 0.474137931034483 0.508620689655172
m = 20  0.28448275862069 0.327586206896552 0.387931034482759 0.439655172413793 0.474137931034483 0.508620689655172
m = 21  0.28448275862069 0.327586206896552 0.387931034482759 0.439655172413793 0.474137931034483 0.508620689655172
m = 22  0.28448275862069 0.327586206896552 0.387931034482759 0.439655172413793 0.474137931034483 0.508620689655172
m = 23  0.28448275862069 0.327586206896552 0.387931034482759 0.439655172413793 0.474137931034483 0.508620689655172
m = 24  0.28448275862069 0.327586206896552 0.387931034482759 0.439655172413793 0.474137931034483 0.508620689655172
m = 25  0.28448275862069 0.327586206896552 0.387931034482759 0.439655172413793 0.474137931034483 0.508620689655172
m = 26  0.28448275862069 0.327586206896552 0.370689655172414 0.405172413793103 0.439655172413793 0.474137931034483 0.508620689655172
m = 27  0.28448275862069 0.327586206896552 0.370689655172414 0.405172413793103 0.439655172413793 0.474137931034483 0.508620689655172

m = 1
m = 2   0.629310344827586
m = 3   0.577586206896552 0.620689655172414
m = 4   0.577586206896552 0.620689655172414
m = 5   0.577586206896552 0.620689655172414 0.672413793103448
m = 6   0.577586206896552 0.620689655172414 0.672413793103448
m = 7   0.577586206896552 0.620689655172414 0.681034482758621 0.758620689655172
m = 8   0.577586206896552 0.620689655172414 0.681034482758621 0.758620689655172
m = 9   0.577586206896552 0.620689655172414 0.681034482758621 0.758620689655172
m = 10  0.577586206896552 0.612068965517241 0.646551724137931 0.681034482758621 0.758620689655172
m = 11  0.577586206896552 0.620689655172414 0.681034482758621 0.758620689655172
m = 12  0.577586206896552 0.612068965517241 0.646551724137931 0.681034482758621 0.758620689655172
m = 13  0.577586206896552 0.612068965517241 0.646551724137931 0.681034482758621 0.758620689655172
m = 14  0.577586206896552 0.612068965517241 0.646551724137931 0.681034482758621 0.758620689655172
m = 15  0.577586206896552 0.612068965517241 0.646551724137931 0.681034482758621

0.758620689655172 m = 16 0.577586206896552 0.612068965517241 0.646551724137931 0.681034482758621 0.758620689655172 m = 17 0.577586206896552 0.612068965517241 0.646551724137931 0.681034482758621 0.758620689655172 m = 18 0.577586206896552 0.612068965517241 0.646551724137931 0.681034482758621 0.71551724137931 0.758620689655172 m = 19 0.577586206896552 0.612068965517241 0.646551724137931 0.681034482758621 0.71551724137931 0.758620689655172 m = 20 0.577586206896552 0.612068965517241 0.646551724137931 0.681034482758621 0.71551724137931 0.758620689655172 m = 21 0.577586206896552 0.612068965517241 0.646551724137931 0.681034482758621 0.71551724137931 0.758620689655172 m = 22 0.577586206896552 0.612068965517241 0.646551724137931 0.681034482758621 0.71551724137931 0.758620689655172 m = 23 0.543103448275862 0.577586206896552 0.612068965517241 0.646551724137931 0.681034482758621 0.71551724137931 0.758620689655172 m = 24 0.543103448275862 0.577586206896552 0.612068965517241 0.646551724137931 0.681034482758621 0.71551724137931 0.758620689655172 m = 25 0.543103448275862 0.577586206896552 0.612068965517241 0.646551724137931 0.681034482758621 0.71551724137931 0.758620689655172 m = 26 0.543103448275862 0.577586206896552 0.612068965517241 0.646551724137931 0.681034482758621 0.71551724137931 0.758620689655172 m = 27 0.543103448275862 0.577586206896552 0.612068965517241 0.646551724137931 0.681034482758621 0.71551724137931 0.758620689655172 m = 1 m = 2 m = 3 m = 4 m = 5 m = 6 m = 7 m = 8 0.896551724137931 m = 9 0.896551724137931 m = 10 0.896551724137931 m = 11 0.896551724137931 m = 12 0.896551724137931 m=13 0.8620689655172410.913793103448276 m = 14 0.810344827586207 0.862068965517241 0.913793103448276 m=15 0.8620689655172410.913793103448276 m = 16 0.810344827586207 0.862068965517241 0.913793103448276 m = 17 0.810344827586207 0.862068965517241 0.913793103448276 0.948275862068966 m = 18 0.810344827586207 0.862068965517241 0.913793103448276 0.948275862068966 m = 19 0.810344827586207 
0.862068965517241 0.913793103448276 0.948275862068966 m = 20 0.810344827586207 0.862068965517241 0.913793103448276 0.948275862068966 m = 21 0.801724137931034 0.836206896551724 0.879310344827586 0.913793103448276 0.948275862068966 m = 22 0.801724137931034 0.836206896551724 0.879310344827586 0.913793103448276 0.948275862068966 m = 23 0.801724137931034 0.836206896551724 0.879310344827586 0.913793103448276 0.948275862068966 m = 24 0.801724137931034 0.836206896551724 0.879310344827586 0.913793103448276 0.948275862068966 m = 25 0.801724137931034 0.836206896551724 0.879310344827586 0.913793103448276 0.948275862068966 m = 26 0.801724137931034 0.836206896551724 0.879310344827586 0.913793103448276 0.948275862068966 m = 27 0.793103448275862 0.827586206896552 0.862068965517241 0.896551724137931 0.931034482758621 0.96551724137931

Fit:

m            0          1          2          3          4          5          6          7          8          9         10         11
RSS    0.10933    0.09387    0.08654    0.07716    0.07052    0.06447    0.06100    0.05849    0.05646    0.05460    0.05314    0.05155
BIC -464.71599 -468.14197 -463.30868 -462.35824 -458.54088 -454.67347 -446.83472 -437.44279 -427.28886 -416.91891 -405.78515 -395.04616

m           12         13         14         15         16         17         18         19         20         21         22         23
RSS    0.05010    0.04887    0.04798    0.04686    0.04597    0.04514    0.04435    0.04361    0.04286    0.04221    0.04158    0.04110
BIC -384.09953 -372.72684 -360.59775 -349.07593 -337.03913 -324.88726 -312.66986 -300.38579 -288.11521 -275.64357 -263.10872 -250.20752

m           24         25         26         27
RSS    0.04092    0.04073    0.04148    0.04259
BIC -236.46836 -222.72797 -206.35338 -189.02785
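The Fit output above lists the residual sum of squares (RSS) and BIC for each candidate number of breaks m. As a minimal sketch, assuming the common convention of preferring the m with the lowest BIC, the reported values can be compared as follows. The list names are illustrative and are not taken from the original analysis code.

```python
# RSS and BIC for m = 0..27, copied from the Fit output above.
rss = [
    0.10933, 0.09387, 0.08654, 0.07716, 0.07052, 0.06447, 0.06100,
    0.05849, 0.05646, 0.05460, 0.05314, 0.05155, 0.05010, 0.04887,
    0.04798, 0.04686, 0.04597, 0.04514, 0.04435, 0.04361, 0.04286,
    0.04221, 0.04158, 0.04110, 0.04092, 0.04073, 0.04148, 0.04259,
]
bic = [
    -464.71599, -468.14197, -463.30868, -462.35824, -458.54088,
    -454.67347, -446.83472, -437.44279, -427.28886, -416.91891,
    -405.78515, -395.04616, -384.09953, -372.72684, -360.59775,
    -349.07593, -337.03913, -324.88726, -312.66986, -300.38579,
    -288.11521, -275.64357, -263.10872, -250.20752, -236.46836,
    -222.72797, -206.35338, -189.02785,
]

# The list index equals the number of breaks m.
best_m_bic = min(range(len(bic)), key=bic.__getitem__)
best_m_rss = min(range(len(rss)), key=rss.__getitem__)

print("m minimizing BIC:", best_m_bic)
print("m minimizing RSS:", best_m_rss)
```

The gap between the two answers illustrates why a penalized criterion is used: RSS alone keeps rewarding additional breaks, while BIC penalizes them.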

B.2 Replication of polling analysis with all opposition speeches

> summary(fit2a)
Linear mixed model fit by REML. t-tests use Satterthwaite’s method [’lmerModLmerTest’]
Formula: simil_base_hash ~ majority * gov_party + poll_govt * poll_prev_govt + (1 | parlnum_factor/sessnum_factor)
   Data: data2a

REML criterion at convergence: -493.4

Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.6373 -0.5415  0.0585  0.5090  2.7038

Random effects:
 Groups                        Name        Variance  Std.Dev.
 sessnum_factor:parlnum_factor (Intercept) 0.0001409 0.01187
 parlnum_factor                (Intercept) 0.0006684 0.02585
 Residual                                  0.0003336 0.01827
Number of obs: 113, groups: sessnum_factor:parlnum_factor, 25; parlnum_factor, 11

Fixed effects:
                                    Estimate Std. Error         df t value Pr(>|t|)
(Intercept)                        7.061e-01  2.927e-02  4.967e+01  24.126   <2e-16 ***
majorityMajority                  -6.867e-02  2.632e-02  7.797e+00  -2.609   0.0319 *
gov_partyLiberal                  -3.536e-02  3.790e-02  1.303e+01  -0.933   0.3678
poll_govt                         -9.245e-02  8.187e-02  1.002e+02  -1.129   0.2615
poll_prev_govt                    -9.107e-04  8.432e-04  9.421e+01  -1.080   0.2829
majorityMajority:gov_partyLiberal  4.857e-02  4.472e-02  1.099e+01   1.086   0.3007
poll_govt:poll_prev_govt           3.510e-03  2.104e-03  9.611e+01   1.668   0.0986 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) mjrtyM gv_prL pll_gv pll_p_ mjM:_L
majrtyMjrty -0.527
gv_prtyLbrl -0.269  0.295
poll_govt   -0.648  0.131  0.009
pll_prv_gvt -0.667  0.147 -0.001  0.376
mjrtyMjr:_L  0.307 -0.590 -0.849 -0.076 -0.063
pll_gvt:p__  0.730 -0.146 -0.002 -0.793 -0.818  0.056

B.3 Comparison of alternative fits (loess, lm, spline, polynomial) for polling data model

All Observations

> summary(loess_fit)
Call:
loess(formula = simil_gov_offopp_hash ~ poll_govt, data = data2a)

Number of Observations: 116
Equivalent Number of Parameters: 4.77
Residual Standard Error: 0.02886
Trace of smoother matrix: 5.22 (exact)

Control settings:
  span     : 0.75
  degree   : 2
  family   : gaussian
  surface  : interpolate   cell = 0.2
  normalize: TRUE
 parametric: FALSE
drop.square: FALSE

[1] "Pseudo R2"
> print(r_sq_loess <- cor(data2a$simil_gov_offopp_hash, hat)^2)
[1] 0.1683514
>
> lm_fit <- lm(simil_gov_offopp_hash ~ poll_govt, data=data2a)
> summary(lm_fit)

Call:
lm(formula = simil_gov_offopp_hash ~ poll_govt, data = data2a)

Residuals:
      Min        1Q    Median        3Q       Max
-0.069002 -0.020526 -0.000267  0.018406  0.083645

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.577891   0.009214  62.720   <2e-16 ***
poll_govt   0.015580   0.027614   0.564    0.574
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03097 on 114 degrees of freedom
Multiple R-squared: 0.002785, Adjusted R-squared: -0.005963
F-statistic: 0.3183 on 1 and 114 DF, p-value: 0.5737

> poly_fit <- lm(simil_gov_offopp_hash~poly(poll_govt, 4), data=data2a)
> summary(poly_fit)

Call:
lm(formula = simil_gov_offopp_hash ~ poly(poll_govt, 4), data = data2a)

Residuals:
      Min        1Q    Median        3Q       Max
-0.074025 -0.017404  0.003499  0.019679  0.077140

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)          0.582830   0.002725 213.896  < 2e-16 ***
poly(poll_govt, 4)1  0.017472   0.029347   0.595  0.55281
poly(poll_govt, 4)2 -0.027263   0.029347  -0.929  0.35491
poly(poll_govt, 4)3 -0.066465   0.029347  -2.265  0.02547 *
poly(poll_govt, 4)4  0.092563   0.029347   3.154  0.00207 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02935 on 111 degrees of freedom
Multiple R-squared: 0.128, Adjusted R-squared: 0.09658
F-statistic: 4.074 on 4 and 111 DF, p-value: 0.004047

> spline_cv<-smooth.spline(data2a$simil_gov_offopp_hash,data2a$poll_govt,cv = TRUE)
> print(spline_cv)
Call:
smooth.spline(x = data2a$simil_gov_offopp_hash, y = data2a$poll_govt, cv = TRUE)

Smoothing Parameter spar= 1.066141 lambda= 0.01672798 (15 iterations)
Equivalent Degrees of Freedom (Df): 4.015938
Penalized Criterion (RSS): 1.22596
PRESS(l.o.o. CV): 0.01173807
>
> bs_fit <- lm(simil_gov_offopp_hash~bs(poll_govt, 4), data=data2a)
> summary(bs_fit)

Call:
lm(formula = simil_gov_offopp_hash ~ bs(poll_govt, 4), data = data2a)

Residuals:
      Min        1Q    Median        3Q       Max
-0.073229 -0.018165  0.001415  0.018220  0.076440

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        0.60652    0.01276  47.541  < 2e-16 ***
bs(poll_govt, 4)1 -0.10963    0.02914  -3.763  0.00027 ***
bs(poll_govt, 4)2  0.06197    0.02126   2.915  0.00430 **
bs(poll_govt, 4)3 -0.06124    0.02187  -2.800  0.00603 **
bs(poll_govt, 4)4 -0.01827    0.01773  -1.030  0.30515
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02907 on 111 degrees of freedom
Multiple R-squared: 0.1446, Adjusted R-squared: 0.1138
F-statistic: 4.693 on 4 and 111 DF, p-value: 0.001548

>
> anova(lm_fit, bs_fit, poly_fit)
Analysis of Variance Table

Model 1: simil_gov_offopp_hash ~ poll_govt
Model 2: simil_gov_offopp_hash ~ bs(poll_govt, 4)
Model 3: simil_gov_offopp_hash ~ poly(poll_govt, 4)
  Res.Df      RSS Df  Sum of Sq      F    Pr(>F)
1    114 0.109329
2    111 0.093776  3  0.0155524 6.1363 0.0006715 ***
3    111 0.095600  0 -0.0018237
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
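The F statistic in the second row of the table above can be recovered by hand from the residual degrees of freedom and RSS of Models 1 and 2. The following is a minimal sketch of the standard nested-model F-test arithmetic; the function name `nested_f` is illustrative and does not come from the original analysis, and the result differs from the reported value only through rounding of the printed RSS figures.

```python
def nested_f(rss_restricted, df_restricted, rss_full, df_full):
    """F statistic comparing a restricted model with a fuller nested model.

    The numerator is the drop in RSS per extra parameter; the denominator
    is the residual variance estimate of the fuller model.
    """
    extra_params = df_restricted - df_full
    numerator = (rss_restricted - rss_full) / extra_params
    denominator = rss_full / df_full
    return numerator / denominator


# Values copied from the anova table above (Model 1: linear, Model 2: B-spline).
f_linear_vs_spline = nested_f(0.109329, 114, 0.093776, 111)
print(round(f_linear_vs_spline, 2))
```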

Majority Government Observations Only

> summary(loess_fit_maj)
Call:
loess(formula = simil_gov_offopp_hash ~ poll_govt, data = maj_data2a)

Number of Observations: 101
Equivalent Number of Parameters: 4.6
Residual Standard Error: 0.0257
Trace of smoother matrix: 5.02 (exact)

Control settings:
  span     : 0.75
  degree   : 2
  family   : gaussian
  surface  : interpolate   cell = 0.2
  normalize: TRUE
 parametric: FALSE
drop.square: FALSE
> print("Pseudo R2")
[1] "Pseudo R2"
> print(r_sq_loess <- cor(maj_data2a$simil_gov_offopp_hash, hat_maj)^2)
[1] 0.1109325
>
> lm_fit_maj <- lm(simil_gov_offopp_hash ~ poll_govt, data=maj_data2a)
> summary(lm_fit_maj)

Call:
lm(formula = simil_gov_offopp_hash ~ poll_govt, data = maj_data2a)

Residuals:
      Min        1Q    Median        3Q       Max
-0.062417 -0.019447  0.001478  0.019142  0.089795

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.569976   0.008029  70.993   <2e-16 ***
poll_govt   0.020401   0.023841   0.856    0.394
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02666 on 99 degrees of freedom
Multiple R-squared: 0.007342, Adjusted R-squared: -0.002685
F-statistic: 0.7323 on 1 and 99 DF, p-value: 0.3942

>
> maj_spline_cv<-smooth.spline(maj_data2a$simil_gov_offopp_hash,maj_data2a$poll_govt,cv = TRUE)
> print(maj_spline_cv)
Call:
smooth.spline(x = maj_data2a$simil_gov_offopp_hash, y = maj_data2a$poll_govt, cv = TRUE)

Smoothing Parameter spar= 1.160892 lambda= 0.03503012 (16 iterations)
Equivalent Degrees of Freedom (Df): 3.232593
Penalized Criterion (RSS): 1.217711
PRESS(l.o.o. CV): 0.0124175
>
> bs_fit_maj <- lm(simil_gov_offopp_hash~bs(poll_govt, 4), data=maj_data2a)
> summary(bs_fit_maj)

Call:
lm(formula = simil_gov_offopp_hash ~ bs(poll_govt, 4), data = maj_data2a)

Residuals:
      Min        1Q    Median        3Q       Max
-0.061148 -0.016333  0.003029  0.017439  0.084709

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)        0.600935   0.011236  53.484  < 2e-16 ***
bs(poll_govt, 4)1 -0.078332   0.026729  -2.931  0.00423 **
bs(poll_govt, 4)2  0.009966   0.021057   0.473  0.63708
bs(poll_govt, 4)3 -0.031277   0.019956  -1.567  0.12033
bs(poll_govt, 4)4 -0.019752   0.015803  -1.250  0.21436
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02571 on 96 degrees of freedom
Multiple R-squared: 0.1052, Adjusted R-squared: 0.06788
F-statistic: 2.82 on 4 and 96 DF, p-value: 0.0292

> poly_fit_maj <- lm(simil_gov_offopp_hash~poly(poll_govt, 4), data=maj_data2a)
> summary(poly_fit_maj)

Call:
lm(formula = simil_gov_offopp_hash ~ poly(poll_govt, 4), data = maj_data2a)

Residuals:
      Min        1Q    Median        3Q       Max
-0.059828 -0.015725  0.002428  0.016774  0.085619

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)          0.576460   0.002566 224.642   <2e-16 ***
poly(poll_govt, 4)1  0.022817   0.025789   0.885   0.3785
poly(poll_govt, 4)2  0.031816   0.025789   1.234   0.2203
poly(poll_govt, 4)3 -0.066433   0.025789  -2.576   0.0115 *
poly(poll_govt, 4)4  0.033364   0.025789   1.294   0.1989
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02579 on 96 degrees of freedom
Multiple R-squared: 0.09956, Adjusted R-squared: 0.06204
F-statistic: 2.654 on 4 and 96 DF, p-value: 0.03765

> anova(lm_fit_maj, bs_fit_maj, poly_fit_maj)
Analysis of Variance Table

Model 1: simil_gov_offopp_hash ~ poll_govt
Model 2: simil_gov_offopp_hash ~ bs(poll_govt, 4)
Model 3: simil_gov_offopp_hash ~ poly(poll_govt, 4)
  Res.Df      RSS Df  Sum of Sq      F  Pr(>F)
1     99 0.070387
2     96 0.063451  3  0.0069361 3.4981 0.01848 *
3     96 0.063848  0 -0.0003974
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

B.4 Parliament-level daily debate model using 1975- data subset

> summary(fit4_plot)

Call:
lm(formula = mean_simil_gov_offopp_hash ~ mean_d2v_m3_gov_offopp + majority * gov_party, data = data2)

Residuals:
      Min        1Q    Median        3Q       Max
-0.023906 -0.005060  0.001883  0.007194  0.015641

Coefficients:
                                  Estimate Std. Error t value Pr(>|t|)
(Intercept)                        0.79563    0.05323  14.947 1.44e-06 ***
mean_d2v_m3_gov_offopp            -0.16312    0.33903  -0.481    0.645
majorityMajority                  -0.01791    0.01207  -1.484    0.181
gov_partyLiberal                  -0.01302    0.01973  -0.660    0.530
majorityMajority:gov_partyLiberal -0.02492    0.02233  -1.116    0.301
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01408 on 7 degrees of freedom
Multiple R-squared: 0.8273, Adjusted R-squared: 0.7286
F-statistic: 8.383 on 4 and 7 DF, p-value: 0.00834

B.5 Daily debate model including random effects using 1975- data subset

> summary(fit_1975)
Linear mixed model fit by REML. t-tests use Satterthwaite’s method [’lmerModLmerTest’]
Formula: simil_gov_offopp_hash ~ d2v_m3_gov_offopp + majority * gov_party + (1 | parlnum_factor/sessnum_factor) + (1 | qtr)
   Data: data2

REML criterion at convergence: -11920.2

Scaled residuals:
    Min      1Q  Median      3Q     Max
-5.5354 -0.5591  0.1074  0.6835  7.7006

Random effects:
 Groups                        Name        Variance  Std.Dev.
 qtr                           (Intercept) 0.0001493 0.01222
 sessnum_factor:parlnum_factor (Intercept) 0.0001658 0.01288
 parlnum_factor                (Intercept) 0.0001293 0.01137
 Residual                                  0.0056190 0.07496
Number of obs: 5156, groups: qtr, 150; sessnum_factor:parlnum_factor, 27; parlnum_factor, 12

Fixed effects:
                                  Estimate Std. Error         df t value Pr(>|t|)
(Intercept)                        0.88779    0.01190   18.77444  74.600   <2e-16 ***
d2v_m3_gov_offopp                 -0.78510    0.04273 4713.56860 -18.376   <2e-16 ***
majorityMajority                  -0.01859    0.01325    8.48617  -1.403    0.196
gov_partyLiberal                  -0.02905    0.02166   13.74295  -1.341    0.202
majorityMajority:gov_partyLiberal -0.01237    0.02424   12.04513  -0.510    0.619
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) d2_3__ mjrtyM gv_prL
d2v_m3_gv_f -0.542
majrtyMjrty -0.646  0.021
gv_prtyLbrl -0.419  0.057  0.350
mjrtyMjr:_L  0.375 -0.053 -0.545 -0.893

B.6 Model of lexical and semantic similarity, Question Period data

Question Period-level

Linear mixed model fit by REML. t-tests use Satterthwaite’s method [’lmerModLmerTest’]
Formula: simil_gov_offopp_hash ~ d2v_m4_gov_offopp + majority * gov_party + (1 | parlnum_factor/sessnum_factor) + (1 | qtr)
   Data: data2

REML criterion at convergence: -11493.8

Scaled residuals:
    Min      1Q  Median      3Q     Max
-4.2594 -0.6193 -0.0240  0.5871  4.9776

Random effects:
 Groups                        Name        Variance  Std.Dev.
 qtr                           (Intercept) 0.0001898 0.01378
 sessnum_factor:parlnum_factor (Intercept) 0.0001896 0.01377
 parlnum_factor                (Intercept) 0.0002411 0.01553
 Residual                                  0.0033990 0.05830
Number of obs: 4105, groups: qtr, 116; sessnum_factor:parlnum_factor, 25; parlnum_factor, 11

Fixed effects:
                                  Estimate Std. Error         df t value Pr(>|t|)
(Intercept)                        0.59671    0.01350   13.60869  44.185 4.29e-16 ***
d2v_m4_gov_offopp                  0.08564    0.01325 4025.45444   6.464 1.15e-10 ***
majorityMajority                  -0.06215    0.01776    7.93267  -3.500  0.00819 **
gov_partyLiberal                  -0.04233    0.02914   19.07517  -1.453  0.16255
majorityMajority:gov_partyLiberal  0.04915    0.03281   15.11179   1.498  0.15476
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) d2_4__ mjrtyM gv_prL
d2v_m4_gv_f -0.424
majrtyMjrty -0.615 -0.022
gv_prtyLbrl -0.383  0.006  0.289
mjrtyMjr:_L  0.335  0.007 -0.536 -0.888

Session-level

Call:
lm(formula = mean_simil_gov_offopp_hash ~ mean_d2v_m4_gov_offopp + majority * gov_party, data = data3)

Residuals:
     Min       1Q   Median       3Q      Max
-0.04306 -0.01139 -0.00149  0.01157  0.05271

Coefficients:
                                  Estimate Std. Error t value Pr(>|t|)
(Intercept)                        0.53884    0.04859  11.089 5.42e-10 ***
mean_d2v_m4_gov_offopp             0.21254    0.11233   1.892 0.073047 .
majorityMajority                  -0.06398    0.01425  -4.489 0.000224 ***
gov_partyLiberal                  -0.03765    0.02443  -1.541 0.138983
majorityMajority:gov_partyLiberal  0.05129    0.02728   1.880 0.074740 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02261 on 20 degrees of freedom
Multiple R-squared: 0.5727, Adjusted R-squared: 0.4872
F-statistic: 6.7 on 4 and 20 DF, p-value: 0.001366

B.7 Interaction of government poll popularity and previous popularity

[Figure: Marginal Effect of Previous Poll Popularity and Government Poll Popularity. x-axis: Government Poll Popularity; y-axis: Lexical Similarity; legend: Previous Government Poll Popularity (8.3, 51.8).]

B.8 Effect size calculations

> model1 <- lmer(simil_gov_offopp_hash ~ majority*gov_party + (1 | parlnum_factor/sessnum_factor) + (1 | qtr), data = qp_data)
> r2beta(model1)
                             Effect   Rsq upper.CL lower.CL
1                             Model 0.085    0.102    0.070
2                  majorityMajority 0.076    0.092    0.062
4 majorityMajority:gov_partyLiberal 0.006    0.011    0.002
3                  gov_partyLiberal 0.005    0.010    0.002

> model2 <- lmer(simil_gov_offopp_hash ~ majority*gov_party + poll_govt*poll_prev_govt + (1 | parlnum_factor/sessnum_factor), data = parlsess_qp_data)
> r2beta(model2)
                             Effect   Rsq upper.CL lower.CL
1                             Model 0.337    0.483    0.232
2                  majorityMajority 0.270    0.404    0.147
7          poll_govt:poll_prev_govt 0.040    0.136    0.001
6 majorityMajority:gov_partyLiberal 0.023    0.107    0.000
4                         poll_govt 0.022    0.103    0.000
3                  gov_partyLiberal 0.020    0.100    0.000
5                    poll_prev_govt 0.012    0.083    0.000

> model3 <- lmer(simil_gov_offopp_hash ~ majority*govtseatpct + (1 | parlnum_factor/sessnum_factor), data = parlsess_qp_data)
> r2beta(model3)
                        Effect   Rsq upper.CL lower.CL
1                        Model 0.271    0.411     0.16
4 majorityMajority:govtseatpct 0.002    0.052     0.00
3                  govtseatpct 0.001    0.047     0.00
2             majorityMajority 0.000    0.043     0.00

> model4 <- lmer(simil_gov_offopp_hash ~ majority*gov_party + (1 | parlnum_factor/sessnum_factor) + (1 | qtr), data = daily_data)
> r2beta(model4)
                             Effect   Rsq upper.CL lower.CL
1                             Model 0.046    0.055    0.039
3                  gov_partyLiberal 0.004    0.007    0.002
4 majorityMajority:gov_partyLiberal 0.002    0.004    0.000
2                  majorityMajority 0.000    0.001    0.000

> model5 <- lmer(simil_gov_offopp_hash ~ majority*govtseatpct + (1 | parlnum_factor/sessnum_factor) + (1 | qtr), data = daily_data)
> r2beta(model5)
                        Effect   Rsq upper.CL lower.CL
1                        Model 0.009    0.014    0.006
3                  govtseatpct 0.003    0.006    0.001
2             majorityMajority 0.003    0.005    0.001
4 majorityMajority:govtseatpct 0.003    0.005    0.001

> model6 <- lm(mean_simil_gov_offopp_hash ~ mean_d2v_m3_gov_offopp + (majority*gov_party)^2, data = parl_daily_data)
> r2beta(model6)
                                         Effect   Rsq upper.CL lower.CL
1                                         Model 0.777    0.902    0.624
2                  data2$mean_d2v_m3_gov_offopp 0.680    0.852    0.450
4                        data2$gov_partyLiberal 0.213    0.565    0.005
3                        data2$majorityMajority 0.068    0.407    0.000
5 data2$majorityMajority:data2$gov_partyLiberal 0.034    0.349    0.000

> model7 <- lmer(simil_gov_offopp_hash ~ mean_d2v_m4_gov_offopp + majority*gov_party + (1 | parlnum_factor/sessnum_factor) + (1 | qtr), data = daily_data)
> r2beta(model7)
                             Effect   Rsq upper.CL lower.CL
1                             Model 0.505    0.517    0.492
2            mean_d2v_m4_gov_offopp 0.503    0.516    0.491
4                  gov_partyLiberal 0.010    0.015    0.007
5 majorityMajority:gov_partyLiberal 0.002    0.004    0.001
3                  majorityMajority 0.001    0.003    0.000

B.9 Simulated distribution of similarity scores

To generate a simulated “baseline” for comparison with observed similarity scores, I begin by randomly drawing a speech count for government speeches and for official opposition speeches from Gaussian distributions whose parameters match the observed distribution of Question Period lengths for each type of speech in the real dataset. For governments, this corresponds to µ = 44.239, σ = 10.224; for oppositions, µ = 23.900, σ = 9.376. Then, I randomly draw a sample of each size with replacement from the dataset and, following my analysis methodology, concatenate each sample into a composite text. Finally, I calculate the lexical similarity score between these two random composites, which yields one simulated “Question Period”.

The figure below shows the resulting distribution of similarity scores for 10000 such randomly generated Question Periods.
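The simulation procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original analysis code: the toy speech list, the helper names, the CRC32-based hashing (a standard-library stand-in for a hashing vectorizer), the truncation of drawn counts at one speech, and the use of a single pooled list for both samples are all assumptions made for the example.

```python
import math
import random
import zlib

N_FEATURES = 2 ** 12  # hypothetical size of the hashed feature space


def hashed_vector(text, n_features=N_FEATURES):
    """Hash tokens into a fixed-length count vector and L2-normalize it."""
    vec = [0.0] * n_features
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % n_features] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def cosine_similarity(a, b):
    """Cosine similarity of two vectors that are already L2-normalized."""
    return sum(x * y for x, y in zip(a, b))


def draw_count(rng, mu, sigma):
    """Draw a speech count from a Gaussian; keep at least one speech (assumption)."""
    return max(1, round(rng.gauss(mu, sigma)))


def simulate_question_period(speeches, rng,
                             gov=(44.239, 10.224), opp=(23.900, 9.376)):
    """Build two random composite texts and return their lexical similarity."""
    n_gov = draw_count(rng, *gov)
    n_opp = draw_count(rng, *opp)
    # Sample with replacement from the pooled speeches, then concatenate.
    gov_text = " ".join(rng.choices(speeches, k=n_gov))
    opp_text = " ".join(rng.choices(speeches, k=n_opp))
    return cosine_similarity(hashed_vector(gov_text), hashed_vector(opp_text))


# Toy stand-in for the speech dataset (hypothetical text).
toy_speeches = [
    "mr speaker the minister has failed to answer the question",
    "the government is investing in health care for all canadians",
    "will the prime minister explain the cost of this program",
    "our budget supports families and creates jobs across the country",
    "the hon member knows that the committee is studying the matter",
]

rng = random.Random(0)
scores = [simulate_question_period(toy_speeches, rng) for _ in range(200)]
print(min(scores), max(scores))
```

Passing different `gov` and `opp` parameter pairs allows the same sketch to be rerun under alternative assumptions about Question Period length.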

[Figure: histogram, “10000 Randomly-Generated ‘Question Periods’, Variable Length”. x-axis: Lexical Similarity; y-axis: Frequency.]

Comparing this simulation with the actual observed distribution of Question Period similarity scores, the simulated mean is higher and the simulated distribution has a noticeable skew to the left. Given that the randomly sampled texts are all in English and all share a very specific linguistic context, the higher mean is not surprising. The longer the random text aggregates we normalize and compare, the more the basic features of English will dominate each vector, and the likelier it is that both passages will have similar word distributions. This is why it is important for my analysis that Question Periods are of a consistent length and format: the variance of similarity scores is affected by the word count of the texts compared, since words used in natural language are not normally distributed. The fact that the observed data show a lower mean and a more dispersed distribution than the random simulation implies that there are systematic differences in word choice between observed government and opposition passages, resulting in lower similarity scores than those obtained through random sampling. The simulated distribution appears to resemble more closely the distribution of scores at the daily debate level, which adds support to my interpretation that poor results at this level of analysis were due to excessive data noise.

A final possibility to eliminate is that Question Period lengths differ between majority and minority parliaments (for example, because fewer questions are allocated to the official opposition to accommodate additional third parties), thus producing a spurious difference between similarity scores under majority and minority conditions. To investigate, I perform two additional simulations, drawing the counts of government and official opposition speeches from separate distributions matching those observed in the dataset under majority and minority conditions. For a majority, the observed parameters are µ = 44.434, σ = 10.950 for governments and µ = 24.223, σ = 9.713 for official oppositions. For a minority, they are µ = 42.440, σ = 6.420 and µ = 21.681, σ = 7.276 respectively. The resultant simulated score distributions are shown below. As can be observed visually, all three are extremely similar.

[Figure: histogram, “10000 Randomly-Generated Majority Government ‘Question Periods’, Variable Length”. x-axis: Lexical Similarity; y-axis: Frequency.]

[Figure: histogram, “10000 Randomly-Generated Minority Government ‘Question Periods’, Variable Length”. x-axis: Lexical Similarity; y-axis: Frequency.]

Appendix C

Technical Information

C.1 Preprocessing

In additional preprocessing of the Lipad dataset, I removed the following: topic and stage-direction events (i.e., procedural text), unattributed speeches, speeches made by non-MPs including senators and appointed ministers, exceptionally long speeches (typically improperly parsed volume indices), speeches made by parties minor enough not to be included in opinion polls, and speeches of MPs who were elected in by-elections and subsequently failed to retain their seats in the next election. This filtering, joins of additional electoral and party data, and transformation of features to dummy variables (for example, minister status, opposition status, and government status) were performed at the database level either through arbitrary queries or using the Django ORM (Django Software Foundation, 2016). A final query, including the appropriate date and context (Question Period) filters, was used to generate a table that was exported to a Python object.

C.2 Computation and Analysis

Running the full analysis (4.3 GB of compressed data) might take weeks on a home computer. For this reason, I leveraged AWS spot instances generated using Louis Aslett’s R AMIs (Aslett, 2017). Encapsulating the data analysis in a deployable package including setup scripts allowed me to take advantage of powerful AWS computing instances tuned for my precise needs (high RAM, low CPU), to customize the installation environment (for example, to sync automatically with Dropbox to write out intermediate model backups to cloud storage), and to deploy a new machine in less than 15 minutes in case of a software crash. I used pandas (McKinney, 2010) for creating and manipulating the data matrix, sklearn (Pedregosa et al., 2011) for hashing vectorization, normalization, and cosine similarity calculations, and gensim (Řehůřek & Sojka, 2010) to perform doc2vec modelling. doc2vec model hyperparameters were set as follows: size = 500, window = 4, min_count = 20, workers = 4, iter = 50. I use the default PV-DBOW algorithm rather than PV-DM for performance reasons; a pilot comparison of both approaches revealed negligible differences in the distribution of results using the Question Period dataset. Output from the Python model was exported to csv for offline analysis and visualization in R (R Core Team, 2018). I used tidyverse (Wickham, 2017) and zoo (Zeileis & Grothendieck, 2005) for data manipulation; lme4 (Bates, Mächler, Bolker, & Walker, 2015), lmerTest (Kuznetsova et al., 2017), splines, stats, and strucchange (Zeileis, 2006; Zeileis et al., 2003; Zeileis, Leisch, Hornik, & Kleiber, 2002) for analysis; and ggplot2 (Wickham, 2016), tikzDevice (Sharpsteen & Bracken, 2018), sjPlot (Lüdecke, 2018), and broom (Robinson & Hayes, 2018) for data visualization and export.

Specifically for the qualitative analysis (Chapter 6), I removed from the 38th and 39th Parliament data export those speeches belonging to subtopics with fewer than 50 valid speech pairs. This was because the distribution of subtopic counts is extremely skewed to the right with a very long tail. That is, there were dozens of subtopics on which only one or two exchanges took place, and a small number of subtopics on which there were more than 100 exchanges. A cutoff of 50 was reasonable across both parliaments in order to capture a subset of the most significant debate topics within a parliament while still retaining a range of topic sizes (i.e., not solely focusing on the outlier topics with hundreds of speeches, which may be less representative of “normal” parliamentary dynamics). I then filtered the dataset for contiguous opposition-government speech pairs within shared subtopics and performed additional statistical tests using the packages sqlalchemy (SQLAlchemy, 2018), pandas (McKinney, 2010) and sklearn (Pedregosa et al., 2011). For subsequent analysis in R, I used tidyverse (Wickham, 2017) for data manipulation, and onewaytests (Dag, Dolgun, & Konar, 2018), userfriendlyscience (Peters, 2018), and multcompView (Graves, Piepho, & Selzer, 2015) for analysis.

Appendix D

Model of Parliamentary Speech Content

Recall that Bäck and Debus propose the following utility equation for an individual MP’s decision to speak or not speak (Bäck & Debus, 2016, 28):

\[ U_{speak} = P \cdot B_{policy} + P \cdot B_{votes} + P \cdot B_{office} - C + S \tag{D.1} \]

The B terms are collective incentives, or those benefits that accrue to everyone within the MP’s party upon delivering a decisive speech (an outcome with probability P). Helpfully, Bäck and Debus separate these benefits into three types, reflecting the assertion that speeches yield not only benefits in terms of policy representation, but can also serve political goals of vote seeking and office seeking. The S term represents selective benefits of speaking, which accrue only to the MP in question based upon their individual characteristics. In their second model, which explains speech content in terms of deviation from the party line, Bäck and Debus suggest reframing an MP’s decision to deviate based upon their individual differences from collective positions on policy, vote, and office-seeking goals. Although they mention the existence of a connection between this second model and the first, they do not pursue this suggestion in much detail, instead developing a simple “choice calculus” that yields general predictions about the decision of an MP to deviate or not from the party line (Bäck & Debus, 2016, 30). In Equation D.2, I combine these general predictions with the logic of their speech decision model. First, we can divide S into truly selective benefits, I, that occur intrinsically to any speech (such as personal satisfaction derived from voicing one’s opinion), and selective political benefits defined in relation to collective goals, S (such as the benefit of representing one’s own true policy opinion, which differs from one’s party’s position). For clarity, we could also separate out the costs $C_B$, $C_S$, and $C_I$ of each of the types of speech, which Bäck and Debus explicitly collapse in their notation.

\[ U_{speak} = P (B_{policy} + B_{votes} + B_{office}) - C_B + P (S_{policy} + S_{votes} + S_{office}) - C_S + I - C_I \tag{D.2} \]

The probability $P$, that the given speech is decisive, has also been collapsed in the notation by Bäck and Debus for the purposes of simplicity (Bäck & Debus, 2016, 28). However, it makes sense that the likelihood of a speech being decisive for some type of collective political benefit will vary across types of benefits. For example, we might expect the probability of an office-seeking speech being decisive to increase the closer it is delivered to an election; likewise, a policy-related speech might be more likely to be decisive the farther away it is from an election. At the same time, there is some shared component to the probability of decisiveness, based on institutional circumstances such as government status. Thus, I factor each probability $P$ into two components: the general probability that a speech will be decisive, $P_{dB}$, and the specific probability that a speech associated with a particular political goal will be decisive for that goal, for example $P_{Bp}$. Having separated selective benefits into truly selective benefits $I$ and selective benefits defined in relation to collective benefits $S$, we can expect there are probabilities associated with the latter, whereas truly selective benefits accrue regardless of decisiveness. For example, at the individual level, an MP who represents a low-population rural district may have a larger probability $P_{Sv}$ of cultivating a personal vote through a dissenting speech. However, we can assume the personal psychological benefit they receive from standing up and being recognized in the House, for example, is constant and relatively homogeneous across MPs. Likewise, there is a general probability that any speech will be decisive on individual benefits, $P_{dS}$. These changes are summarized in Equation D.3. At this point I have not made any substantive changes to Bäck and Debus’ model, but simply expanded the notation used in their book based on their theoretical perspective in anticipation of my additions.

\[ U_{speak} = P_{dB} (P_{Bp} B_p + P_{Bv} B_v + P_{Bo} B_o) - C_B + P_{dS} (P_{Sp} S_p + P_{Sv} S_v + P_{So} S_o) - C_S + I - C_I \tag{D.3} \]
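As a rough illustration of how the terms of Equation D.3 combine, the following Python sketch evaluates the utility of a single hypothetical speech. The class and function names, and all numeric values, are assumptions introduced here for illustration only; they do not come from Bäck and Debus or from the analysis in this thesis.

```python
from dataclasses import dataclass


@dataclass
class GoalTerms:
    """Goal-specific probabilities and benefits for policy, votes, and office."""
    p_policy: float  # probability the speech is decisive for the policy goal
    b_policy: float  # benefit attached to the policy goal
    p_votes: float
    b_votes: float
    p_office: float
    b_office: float

    def expected(self) -> float:
        # Sum of probability-weighted benefits across the three goals.
        return (self.p_policy * self.b_policy
                + self.p_votes * self.b_votes
                + self.p_office * self.b_office)


def speech_utility(p_dB, collective, c_B, p_dS, selective, c_S, intrinsic, c_I):
    """Evaluate Equation D.3 for one MP and one potential speech."""
    return (p_dB * collective.expected() - c_B
            + p_dS * selective.expected() - c_S
            + intrinsic - c_I)


# Hypothetical values, chosen only to make the arithmetic easy to follow.
collective = GoalTerms(0.5, 2.0, 0.25, 4.0, 0.5, 2.0)  # expected benefit 3.0
selective = GoalTerms(0.5, 1.0, 0.5, 1.0, 0.0, 0.0)    # expected benefit 1.0
u_speak = speech_utility(p_dB=0.5, collective=collective, c_B=0.5,
                         p_dS=0.5, selective=selective, c_S=0.25,
                         intrinsic=0.25, c_I=0.0)
print(u_speak)  # 1.5
```

With these made-up inputs the utility is positive, which corresponds to the case considered next, in which the speech is assumed to have been delivered.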

To adapt this model to my research question, I begin with the assumption that a given speech under study was made, thus $U_{speak} \geq 0$. In this case, I expect an MP sought to maximize their expected utility gain from the speech they delivered by structuring its content according to the priorities that were dictated by their speech utility function. In other words, in the model above, there are two primary reasons an MP might speak: an MP could choose to speak about their party’s priorities, or their own priorities in relation to those partisan priorities. In practice, the content of speeches will be textually constructed to reflect some mix of the two, plus some individualistic variation. Likewise, within each set of priorities are three possible political goals, a combination of which may be potentially served by a decisive speech: policy-seeking, vote-seeking, or office-seeking goals. In all, then, there are six possible political goals of speech that contribute to the decision to speak. There also remains a seventh constant individual effect $I$, which serves a similar purpose to the “duty” term in the calculus of voting model. Even when the probability of delivering a decisive speech for either personal or collective political goals is very low, MPs will still benefit some amount from simply delivering a speech in their own words. Assuming that $U_{speak} \geq 0$, then, I argue that the relative contribution of these seven terms to $U_{speak}$ is proportional to the likelihood of speech content proportions relating to each of these seven topics. In my adaptation of the Bäck and Debus model, I assume the speaker maximizes their utility by “spending” some percentage of words in their speech to further particular collective or individual goals. Assume that the proportional allocations of words, or spending, within a speech sum to 1. This represents a simplification since the length of a speech could certainly impact its likelihood of decisiveness, among other factors.
However, this should not be a problem for two reasons. First, we are interested in the relative variation in the proportion of different speech goals over time, not their “true” values; second, Equation D.3 already includes a variable capturing variation in the probability of speech decisiveness that could reasonably encompass the effect of variation in speech length.

Choosing to spend words on collective or individual goals represents a trade-off. Following Proksch and Slapin, the cost of speaking about goals of one sort or the other varies with both the weight the individual MP places on representing their own political perspective and the weight the party or party leader places on cohesion (Proksch & Slapin, 2014, 39). In short, I assume the relative costs of toeing the party line or discussing individual goals within a speech (the $C_B$ and $C_S$ terms above) are a function of partisan control of speeches: the higher the party discipline, the lower the collective costs of speech for an individual member, and the higher the individual cost for a member who decides to defect. However, in this model I differ from Proksch and Slapin’s approach to modelling party discipline in two ways. First, as a simplification I collapse the consideration of absolute difference between partisan and individual positions. I only model the existence of two such separate positions, the relative proportion of which varies according to party control of speech. This is because my research question concerns goals of accountability and representation at the collective level; I am not interested in the relative distance between collective and individual positions, only in controlling for the effect of individual positions so that I can be more certain of capturing the collective relationship. This simplification also increases measurement validity, since to control for individual positions within such a content model I need only consider linguistic, not ideological, overlap.
That is, my measure of party discipline specifically aims to capture party control over the linguistic content of speeches, not the stronger claim of partisan ideological cohesion which would be necessary to incorporate ideological distance between parties and individual MPs. I therefore avoid the pitfalls of scaling words to ideological positions inherent to methodologies such as Wordfish, such as the difficulties of comparison over time. Second, with reference to the influence of time, I separate out the influence of party discipline (or more specifically, party control over speech content) into two components: short run and long run. This addresses the theoretical oversight in Proksch and Slapin’s model regarding party discipline and its resulting erroneous prediction in the Canadian case. Party cohesion is not of the same long-run strength across country cases, for reasons including but not limited to differences in electoral systems. However, following Kam (2009), party cohesion is neither automatic nor monolithic, and may fluctuate in strength over the short run due to political considerations. As Godbout and Høyland find, for example, the number of private members’ bills on the agenda within a given parliament has a significant impact on voting cohesion (Godbout & Høyland, 2015, 560).

Due to the focus of my research question on the collective level, I simplify Equation D.3 by considering the three collective goals $B$ and collapsing $S$ back into one term. In other words, let $T_p$ be proportional to $P_{dB}(P_{Bp}B_p)$, $T_v$ to $P_{dB}(P_{Bv}B_v)$, $T_o$ to $P_{dB}(P_{Bo}B_o)$, and $S$ to $P_{dS}(P_{Sp}S_p + P_{Sv}S_v + P_{So}S_o)$. Let $w \in (0, 1)$ represent the short-run strength of party discipline. Thus, let $(1 - w)C_T$ and $wC_S$ represent the costs of each of the two types of goal-oriented speech (collective and selective) respectively. Each reflects a combination of the current discipline situation $w$ and the long-run institutional costs of speech delivery (including those related to long-run party discipline) $C_T$ and $C_S$. In other words, as party discipline increases in the short run, the cost of speaking in support of collective goals decreases in a simple linear fashion. As short-run party discipline increases, the cost of toeing the party line in a speech decreases and the cost of defecting increases.¹ All other things being equal, as $w$ increases, a speaker will “buy” a larger proportion of speech of type $T$ and less of type $S$ in their utility-maximizing allocation. Note that in the Canadian case, characterized by strong institutional constraints and norms of party discipline in speechmaking, we would expect the constant $C_T$ to be very low compared to the constant $C_S$, meaning that even at a low $w$, allocations to $S$ will remain low. Then the predicted composition of the content of that speech is given in Equation D.4:

\[ (1 - w) C_T (T_p + T_v + T_o) + w C_S S + C_I I = 1 \tag{D.4} \]

To sum up generally, this model proposes the text of parliamentary speeches can be divided into three component elements: collective political goals, individual political goals, and individual selective incentives. The first two of these are the primary constituents, and are proportionately related by their costs due to the influence of party discipline in both the short and long run. Within the domain of collective political goals, speech content can be divided according to three types of aims: policy seeking $T_p$, vote seeking $T_v$, or office seeking $T_o$. Each of these relative proportions varies according to both the shared likelihood of a decisive speech, and the specific likelihood that, given a decisive speech, that speech will concern that particular collective political aim.

¹ Unlike Proksch and Slapin, I do not derive and prove a formal model for these relationships, instead following Bäck and Debus’ simpler approach. A related decision is the strong simplification of a linear party discipline relationship, since in reality we would expect party discipline to suffer some type of non-linear diminishing return. However, as I have emphasized, I am less interested in studying the effects of party discipline than reasonably controlling for its effect.

Appendix E

Lipad Digitization of Canadian Hansard Debates

In the following Appendix, I draw extensively on the article “Digitization of the Canadian Parliamentary Debates” describing the data process undertaken by the Toronto Dilipad team (Beelen et al., 2017). The Canadian House of Commons Debates, more commonly known as Hansard, is one of the most comprehensive and complete data sources on Canadian political history available. However, accessing and making use of this data has traditionally posed a daunting challenge for researchers. Hansard contains over 650 million words and spans 148 bound volumes, so it would be physically impossible to read in its entirety (Beelen et al., 2017, 4). In 2013 the Library of Parliament produced digital scans of Hansard from 1867 to 1999, and entrusted them to Canadiana, a non-profit devoted to preserving Canadian heritage documents, to host online as PDF image files. Although images are ideal for conservation, they fail to solve many of the practical issues associated with the physical Hansard, for example limited searchability.

Our first task was to convert these image files into machine-readable text, which could then be indexed, structured, and marked up in order to make a truly searchable and browsable dataset. First, we employed OCR software and developed error correction methods to convert images to usable plain text. Next, we designed pattern matching algorithms, or parsers, to identify structural clues in the plain text. These clues, such as typographic features and formatting changes, allowed us to reconstruct the original textual features of the proceedings that were lost in a plain text representation. For example, we separated individual days of debate, identified distinct speeches, detected topic headings and associated speeches with those topics, and identified speaker names. We translated this information into a hierarchical data structure that replicates the flow and structure of the original debate.
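The parsing step described above can be sketched in a few lines of Python. This is only an illustrative toy, not the Dilipad parsers themselves: the two regular expressions, the assumed line layout (all-caps topic headings, and speaker lines that start with a title and end the speaker name with a colon), and the sample text are all assumptions made for this example.

```python
import re

# Assumed convention: a topic heading is a line written entirely in capitals.
TOPIC_RE = re.compile(r"^[A-Z][A-Z '\-]+$")
# Assumed convention: a speech opens with a titled name followed by a colon.
SPEAKER_RE = re.compile(
    r"^(?P<speaker>(?:Mr|Mrs|Ms|Miss|Hon|Right Hon)\.[^:]{1,80}):\s*(?P<text>.*)$"
)


def parse_day(plain_text):
    """Group speeches under topic headings for one day of debate."""
    topics = []
    current_topic = None
    current_speech = None
    for raw_line in plain_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        speaker_match = SPEAKER_RE.match(line)
        if TOPIC_RE.match(line):
            # A new heading starts a new topic and closes the open speech.
            current_topic = {"title": line, "speeches": []}
            topics.append(current_topic)
            current_speech = None
        elif speaker_match:
            if current_topic is None:
                # Speech before any heading: file it under an untitled topic.
                current_topic = {"title": None, "speeches": []}
                topics.append(current_topic)
            current_speech = {
                "speaker": speaker_match.group("speaker").strip(),
                "text": speaker_match.group("text"),
            }
            current_topic["speeches"].append(current_speech)
        elif current_speech is not None:
            # Any other line continues the speech currently being read.
            current_speech["text"] += " " + line
    return topics


SAMPLE = """\
AGRICULTURE
Mr. Smith (Example Riding): Mr. Speaker, I have a question
about grain prices.
Hon. Jane Doe (Minister of Agriculture): I thank the member.
TRANSPORT
Mr. Smith (Example Riding): What about rail service?
"""

parsed = parse_day(SAMPLE)
for topic in parsed:
    print(topic["title"], len(topic["speeches"]))
```

The nested list of topics and speeches returned by the hypothetical `parse_day` function plays the role of the hierarchical structure; real Hansard volumes would need far more robust rules, particularly where OCR errors corrupt headings and names.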
The format we used to capture this structure is an XML (eXtensible Markup Language) standard for the markup of legislative debates called Political Mashup (PM), with our own addition of features intended to capture particulars of the Canadian source material not included in the PM standard (Beelen et al., 2017, 5). After reproducing the textual content of the debates as well as their structure in a hierarchical data format, we began to enrich the speech data with additional metadata. The first step in this process was to error-correct, identify, and link raw speaker names parsed in our text processing steps with an authoritative list of Canadian MPs. This presented a technical challenge: speakers’ names are only printed in full during their first intervention of the day, MPs could have multiple spellings of their name or share their name with other MPs, and OCR errors make reliable matching of names more challenging, especially in earlier volumes of Hansard with inconsistent printing or physical degradation. We developed a disambiguation and matching algorithm to handle these issues and link speeches with MP files, with a “speaker-link-confidence” measure to capture our uncertainty. In the end, we were able to successfully associate 98% of speeches with an MP (Beelen et al., 2017, 10). Once linked, additional data related to MP careers and partisan information was incorporated in XML member files.

Another important aspect of the Dilipad project was data distribution and outreach, considering the wide range of potential academic, public, and government users of this historical resource. The first stage of our commitment to open data availability was the upload of the complete XML corpus online. Next, my development of a dataset website at lipad.ca aimed to increase accessibility and relevance through two major additions to the existing Dilipad dataset. First, in the interests of extending the dataset and its relevance, I incorporated Hansard data from years missing from the original PDF scans, namely from 1994 until present. The original reason for this date cutoff was simply that Hansard began to be posted online at the House of Commons website beginning in 1994, meaning there was no need for our group to digitize scans for this period. Instead, I leveraged the openparliament.ca parse of these online Hansard proceedings to bring the Dilipad dataset up to date, and to continuously update the dataset with daily proceedings utilizing the openparliament.ca API (application programming interface).
openparliament.ca was the first project to parse Canadian House of Commons proceedings into a database format and provide an accessible web interface; it represents an invaluable contribution by its author, Michael Mulley, without which the extended Lipad dataset would not have been possible (Mulley, 2017). In order to make this addition possible, I first translated the Dilipad XML debates and member files into a SQL database schema, using PostgreSQL as the database back-end. Unlike an XML structure, which employs textual markup to store structure and metadata information in a hierarchical fashion, an SQL database organizes data as a set of tables with structural relationships and constraints, called a schema. Essentially, an SQL database can be envisioned as a set of Excel spreadsheets. Each column or field holds a predefined type of information (such as text or integer), and each row holds one record. Each spreadsheet or table can be linked together via structural relationships; for example, an ID column in one table can be used to link records with corresponding IDs in another table. For example, in the Lipad database, the primary table contains one row for each individual speech, including its text and basic information in columns as well as IDs that link each speech with additional data (such as MP files) stored in other tables. The advantage of moving from an XML data structure to a database schema is that it allows for vastly faster searching and querying of the entire dataset. It also makes extension of the dataset easier, for example by including new data tables that can be associated with existing records. I leverage this extensibility to incorporate electoral and polling data into the Lipad dataset for my analysis. Second, and most importantly for outreach, an SQL database makes it possible to serve the data via a website front-end, so that users not familiar with XML or SQL can easily make use of the data. 
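To make the relational layout concrete, here is a minimal sketch of a two-table schema in which each speech row carries an ID pointing at a member row. It uses Python’s built-in sqlite3 module as a stand-in for PostgreSQL, and the table names, column names, and sample rows are hypothetical; they are not the actual Lipad schema.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    party TEXT
);
CREATE TABLE speech (
    id          INTEGER PRIMARY KEY,
    speech_date TEXT NOT NULL,
    member_id   INTEGER REFERENCES member(id),  -- link to the member table
    speech_text TEXT NOT NULL
);
""")

# Hypothetical records: two members and three speeches.
conn.executemany(
    "INSERT INTO member (id, name, party) VALUES (?, ?, ?)",
    [(1, "Jane Doe", "Party A"), (2, "John Smith", "Party B")],
)
conn.executemany(
    "INSERT INTO speech (speech_date, member_id, speech_text) VALUES (?, ?, ?)",
    [
        ("2005-05-10", 1, "Mr. Speaker, I rise to ask about the budget."),
        ("2005-05-10", 2, "Mr. Speaker, the budget is balanced."),
        ("2005-05-11", 1, "Mr. Speaker, my question was not answered."),
    ],
)

# Join the two tables on the ID column to count speeches per party.
speeches_per_party = conn.execute("""
    SELECT m.party, COUNT(*)
    FROM speech AS s
    JOIN member AS m ON m.id = s.member_id
    GROUP BY m.party
    ORDER BY m.party
""").fetchall()
print(speeches_per_party)  # [('Party A', 2), ('Party B', 1)]
```

Adding a further table, for instance one holding polling data keyed by date, would follow the same pattern and would not require changing the existing speech rows.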
I developed the lipad.ca website using Django, a Python-based web development framework, and the search engine backend Solr to provide advanced search and filtering of the Lipad dataset. Using the web interface, users can browse Hansard via date, flip back and forth through proceedings, search speech content and speech metadata, and view search results in-context. The complete Lipad dataset, updated on a monthly basis, is also available for download in PostgreSQL and CSV formats on the Data page of the Lipad website. Source code for the database conversion and website is also open and available (Whyte, 2016).

References

Abe, J. A. A. (2011). Changes in Alan Greenspan’s language use across the economic cycle: A text analysis of his testimonies and speeches. Journal of Language and Social Psychology, 30 (2), 212–223.
Abedi, M. (2018, October 16). Jagmeet Singh trails in latest poll–what could that mean for the 2019 election? Global News. Retrieved from https://globalnews.ca/news/4556399/federal-election-2019-trudeau-singh/ (Online; accessed 6-December-2018.)
Albaugh, Q., Sevenans, J., & Soroka, S. (2013). Lexicoder Topic Dictionary. Retrieved from http://www.lexicoder.com (Online; accessed 13-February-2017.)
Aslett, L. (2017). RStudio Server Amazon Machine Image (AMI). Retrieved from http://www.louisaslett.com/RStudio_AMI/ (Online; accessed 21-June-2017.)
Atkinson, M. M., & Docherty, D. C. (1992). Moving right along: The roots of amateurism in the Canadian House of Commons. Canadian Journal of Political Science, 25 (2), 295–318.
Atkinson, M. M., & Thomas, P. G. (1993). Studying the Canadian Parliament. Legislative Studies Quarterly, 18 (3), 423–451.
Aubry, J. (2007, November 10). Harper announces review of Mulroney-Schreiber allegations. CanWest News.
Ayyangar, S., & Jacob, S. (2014). Studying the Indian Legislature: What does Question Hour reveal? Studies in Indian Politics, 2 (1), 1–19.
Ayyangar, S., & Jacob, S. (2015). Question Hour activity and party behaviour in India. The Journal of Legislative Studies, 21 (2), 232–249.
Baayen, R. H. (2001). Word Frequency Distributions. Dordrecht: Springer Science & Business Media.
Bach, S. (2008). Senate amendments and legislative outcomes in Australia, 1996–2007. Australian Journal of Political Science, 43 (3), 395–423.
Bächtiger, A. (2014). Debate and deliberation in legislatures. In S. Martin, T. Saalfeld, K. W. Strøm, & A. Bächtiger (Eds.), The Oxford Handbook of Legislative Studies. Oxford: Oxford University Press.
Bächtiger, A., & Hangartner, D. (2010). When deliberative theory meets empirical political science: Theoretical and methodological challenges in political deliberation. Political Studies, 58 (4), 609–629.
Bäck, H., & Debus, M. (2016). Political Parties, Parliaments and Legislative Speechmaking. New York: Springer.
Bakvis, H., & Skogstad, G. (2012). Conclusion: Taking stock of Canadian federalism. In H. Bakvis & G. Skogstad (Eds.), Canadian Federalism: Performance, Effectiveness, and Legitimacy (3rd ed., pp. 340–357). Don Mills, ON: Oxford University Press.


Ball, C. N. (1994). Automated text analysis: Cautionary tales. Literary and Linguistic Computing, 9 (4), 295–302.
Ballard, E. J. (1983). Canadian prime ministers: Complexity in political crises. Canadian Psychology, 24 (2), 125.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67 (1), 1–48. doi: 10.18637/jss.v067.i01
Baumann, M., Debus, M., & Müller, J. (2015). Personal characteristics of MPs and legislative behavior in moral policymaking. Legislative Studies Quarterly, 40 (2), 179–210.
Beelen, K., Thijm, T. A., Cochrane, C., Halvemaan, K., Hirst, G., Kimmins, M., . . . Whyte, T. (2017). Digitization of the Canadian parliamentary debates. Canadian Journal of Political Science, 50 (3), 849–864.
Benoit, K., & Laver, M. (2008). Compared to what? A comment on “A robust transformation procedure for interpreting political text” by Martin and Vanberg. Political Analysis, 16 (1), 101–111.
Benton, M., & Russell, M. (2012). Assessing the impact of parliamentary oversight committees: The select committees in the British House of Commons. Parliamentary Affairs, 66 (4), 772–797.
Bevitori, C. (2004). Negotiating conflict: Interruptions in British and Italian parliamentary debates. In P. Bayley (Ed.), Cross-cultural Perspectives on Parliamentary Discourse (pp. 87–109). Amsterdam: John Benjamins.
Blaikie, B., Boyer, P., Boudria, D., Dalphond-Guiral, M., & Nystrom, L. (2006). The wisdom of the elders: a round table on reform of the House of Commons. Canadian Parliamentary Review, 29 (3), 33–39.
Blanchfield, M. (2006, February 8). Defence minister takes flak: O’Connor under fire for lobbyist past, links to defence industry. Saskatoon Star-Phoenix, C12.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3 (Jan), 993–1022.
Blidook, K. (2010). Exploring the role of “legislators” in Canada: Do Members of Parliament influence policy? The Journal of Legislative Studies, 16 (1), 32–56.
Blidook, K. (2012). Constituency Influence in Parliament: Countering the Centre. : UBC Press.
Blidook, K., & Byrne, M. (2013). Constant campaigning and partisan discourse in the House of Commons. In R. Koop & A. Bittner (Eds.), Parties, Elections, and the Future of Canadian Politics (pp. 46–66). Vancouver: UBC Press.
Blidook, K., & Kerby, M. (2011). Constituency influence on “constituency members”: The adaptability of roles to electoral realities in the Canadian case. The Journal of Legislative Studies, 17 (3), 327–339.
Bod, R., Hay, J., & Jannedy, S. (2003). Introduction. In R. Bod, J. Hay, & S. Jannedy (Eds.), Probabilistic Linguistics (pp. 1–10). Cambridge, MA: MIT Press.
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2016). Enriching word vectors with subword information. arXiv. Retrieved from https://arxiv.org/abs/1607.04606 (arXiv preprint arXiv:1607.04606)
Bosc, M., & O’Brien, A. (2009). House of Commons Procedure and Practice (2nd ed.). Ottawa: Editions Yvon Blais.
Bourgault, J. (2011). Minority government and senior government officials: the case of the Canadian federal government. Commonwealth & Comparative Politics, 49 (4), 510–527.
Bovens, M. (2005). The concept of public accountability. In E. Ferlie, L. E. Lynn, & C. Pollitt (Eds.), The Oxford Handbook of Public Management (pp. 182–208). New York: Oxford University Press.
Bovens, M. (2010). Two concepts of accountability: Accountability as a virtue and as a mechanism. West European Politics, 33 (5), 946–967.
Burnham, K. P., & Anderson, D. R. (2004). Multimodel inference: understanding AIC and BIC in model selection. Sociological Methods & Research, 33 (2), 261–304.
Caluwaerts, D. (2012). Confrontation and Communication: Deliberative Democracy in Divided Belgium. Bern: Peter Lang.
Campion-Smith, B., Kennedy, B., Oved, M. C., Ballingall, A., Boutilier, A., & MacCharles, T. (2018, June 20). Parliament in check. The Toronto Star. Retrieved from http://projects.thestar.com/question-period/index.html (Online; accessed 21-June-2018.)
Canadian Gallup Poll. (2007). Canadian Gallup Polls, 1951–2000. Retrieved from http://odesi.ca (Aggregate data. Online; accessed 13-February-2017.)
Canadian Press. (2005, May 13). Volpe asks RCMP and ethics commissioner to investigate MPs: CTV. Canadian Press NewsWire.
Canadian Press. (2007, November 15). Chronology of events surrounding Brian Mulroney, Karlheinz Schreiber and Airbus. Canadian Press.
Carey, J. M. (2007). Competing principals, political institutions, and party unity in legislative voting. American Journal of Political Science, 51 (1), 92–107.
CBC News. (2005a, October 19). Dingwall says expenses “falsely reported”. CBC News. Retrieved from https://www.cbc.ca/news/canada/dingwall-says-expenses-falsely-reported-1.538024 (Online; accessed 20-November-2018.)
CBC News. (2005b, January 14). Immigration minister resigns. CBC News. Retrieved from http://www.cbc.ca/news/canada/toronto/immigration-minister-resigns-1.520784 (Online; accessed 20-November-2018.)
CBC News Online. (2006, October 26). Federal sponsorship scandal. CBC News Online. Retrieved from http://www.cbc.ca/news2/background/groupaction/ (Online; accessed 20-November-2018.)
CBC News Online. (2007, October 1). Canada’s tainted blood scandal: a timeline. CBC News Online. Retrieved from https://www.cbc.ca/news2/background/taintedblood/bloodscandal_timeline.html (Online; accessed 20-September-2018.)
Chong, M. (2008). Rethinking Question Period and debate in the House of Commons. Canadian Parliamentary Review, 31 (3), 5–7.
Chong, M., Jennings, M., Laframboise, M., Davies, L., & Lukiwski, T. (2010). What to do about Question Period: A roundtable. Canadian Parliamentary Review, 33 (3), 2–8.
Clarkson, S. (2006). The Big Red Machine: How the Liberal Party Dominates Canadian Politics. Vancouver: UBC Press.
Clinton, J., Jackman, S., & Rivers, D. (2004). The statistical analysis of roll call data. American Political Science Review, 98 (02), 355–370.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. Routledge.
Comeau, P. (2005). Hepatitis C compensation may be extended. CMAJ: Canadian Medical Association Journal, 172 (1), 25.
Comparative Agendas Project. (2015). Comparative Agendas Project: Comparing Policies Worldwide. Retrieved from http://www.comparativeagendas.net/ (Online; accessed 13-February-2017.)
Conley, R. S. (2011). Legislative activity in the Canadian House of Commons: Does majority or minority government matter? American Review of Canadian Studies, 41 (4), 422–437.
Constitution Act, 1867. (1867). (30 & 31 Victoria, c. 3 (U.K.)).
Converse, P. E. (1964). The nature of belief systems in mass publics. In D. Apter (Ed.), Ideology and Discontent (pp. 206–261). London: Free Press of Glencoe.
Curry, B. (2017, April 5). Trudeau fields every question in Question Period as parliamentary reform battle rages. . Retrieved from http://www.theglobeandmail.com/news/politics/trudeau-fields-every-question-as-parliamentary-reform-battle-rages/article34611215/ (Online; accessed 20-November-2018.)
Dag, O., Dolgun, A., & Konar, N. (2018). onewaytests: An R Package for One-Way Tests in Independent Groups Designs. The R Journal, 10 (1), 175–199. Retrieved from https://journal.r-project.org/archive/2018/RJ-2018-022/index.html
Dalton, R. J., & Wattenberg, M. P. (2002). Parties Without Partisans: Political Change in Advanced Industrial Democracies. Oxford: Oxford University Press.
Dawson, W. F. (1962). Procedure in the Canadian House of Commons. Toronto: University of Toronto Press.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41 (6), 391.
Delacourt, S. (2010, February 3). Paul Martin laments loss of child-care program he built. The Toronto Star. Retrieved from https://www.thestar.com/news/canada/2010/02/03/paul_martin_laments_loss_of_childcare_program_he_built.html (Online; accessed 20-November-2018.)
Denny, M. J., & Spirling, A. (2018). Text preprocessing for unsupervised learning: why it matters, when it misleads, and what to do about it. Political Analysis, 26 (2), 168–189.
Diermeier, D., Godbout, J.-F., Yu, B., & Kaufmann, S. (2012). Language and ideology in Congress. British Journal of Political Science, 42 (01), 31–55.
Django Software Foundation. (2016). Django. Retrieved from https://djangoproject.com (Online; accessed 1-December-2018.)
Dobell, P., & Reid, J. (1992). A larger role for the House of Commons Part I: Question Period. Parliamentary Government, 40, 5–10.
Docherty, D. C. (1997). Mr. Smith goes to Ottawa: Life in the House of Commons. Vancouver: UBC Press.
Docherty, D. C. (2005). Legislatures. Vancouver: UBC Press.
Dodek, A. (2009). Fixing our fixed election date legislation. Canadian Parliamentary Review, 32 (1), 18–20.
Doern, G. B. (2007). The Harper Conservatives in power: Emissions impossible. In G. B. Doern (Ed.), How Ottawa Spends, 2007-2008: The Harper Conservatives–Climate of Change (pp. 3–24). Montreal: McGill-Queen’s Press.
Downs, A. (1957). An economic theory of political action in a democracy. Journal of Political Economy, 65 (2), 135–150.
Edwards, L. J., Muller, K. E., Wolfinger, R. D., Qaqish, B. F., & Schabenberger, O. (2008). An r2 statistic for fixed effects in the linear mixed model. Statistics in medicine, 27 (29), 6137–6157.

Eggers, A. C., & Spirling, A. (2014). Ministerial responsiveness in Westminster systems: Institutional choices and House of Commons debate, 1832–1915. American Journal of Political Science, 58 (4), 873–887. Elgie, R., & Stapleton, J. (2006). Testing the decline of parliament thesis: Ireland, 1923–2002. Political Studies, 54 (3), 465–485. Environics. (2010). Canadian Public Opinion Trends: Approval, Voting Intentions. Retrieved from http://www.queensu.ca/cora/_trends/Ap_Voting.htm (Online; accessed 13-February-2017.) Erikson, R. S., & Tedin, K. L. (2003). American public opinion: Its origins, content and impact. New York: Routledge. Esterling, K. M. (2011). “Deliberative Disagreement” in US health policy committee hearings. Legislative Studies Quarterly, 36 (2), 169–198. Esuli, A., & Sebastiani, F. (2007). SentiWordNet: a high-coverage lexical resource for opinion mining. Evaluation, 17 , 1–26. Evert, S., & Baroni, M. (2007). zipfR: Word frequency distributions in R. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions (pp. 29–32). Falcone, D. (1974). Minority government in Canada: Jeux parliamentaires. The Round Table, 64 (255), 259–276. Feng, L., Jansche, M., Huenerfauth, M., & Elhadad, N. (2010). A comparison of features for auto- matic readability assessment. Proceedings of the 23rd International Conference on Computational Linguistics: Posters, 276–284. Finer, S. E. (1956). The individual responsibility of ministers. Public Administration, 34 (4), 377–377. Fishkin, J. S., & Luskin, R. C. (2005). Experimenting with a democratic ideal: Deliberative polling and public opinion. Acta Politica, 40 (3), 284–298. Flanagan, T. (2009). Harper’s Team: Behind the Scenes in the Conservative Rise to Power. Montreal: McGill-Queen’s Press. Flinders, M., & Kelso, A. (2011). Mind the gap: Political analysis, public expectations and the par- liamentary decline thesis. 
The British Journal of Politics and International Relations, 13 (2), 249–268. Forsey, E. (1964). The problem of “minority” government in Canada. Canadian Journal of Economics and Political Science, 30 (1), 1–11. Franks, C. E. S. (1987). The Parliament of Canada. Toronto: University of Toronto Press. Garner, C. (1998). Reforming the House of Commons: Lessons from the past and abroad. Canadian Parliamentary Review, 21 (4), 28–32. Geller-Schwartz, L. (1979). Minority government reconsidered. Journal of Canadian Studies, 14 (2), 67–79. Gelman, A., & Hill, J. (2006). Data Analysis using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press. General Assembly of the United Nations. (2005, March 5). General Assembly Adopts United Nations Declaration on Human Cloning by Vote of 84-34-37. Retrieved from https://www.un.org/press/en/2005/ga10333.doc.htm (Online; accessed 13-September-2018.) Germany: Basic Law for the Federal Republic of Germany. (1949, May 23). Germany. Gervais, M. (2012). Challenges of Minority Governments in Canada. Ottawa: Invenire Books. Godbout, J.-F., & Høyland, B. (2011a). Coalition voting and minority governments in Canada. Commonwealth & Comparative Politics, 49 (4), 457–485. Godbout, J.-F., & Høyland, B. (2011b). Legislative voting in the Canadian Parliament. Canadian Journal of Political Science, 44 (2), 367–388. Godbout, J.-F., & Høyland, B. (2013). The emergence of parties in the Canadian House of Commons (1867–1908). Canadian Journal of Political Science, 46 (4), 773–797. Godbout, J.-F., & Høyland, B. (2015). Unity in diversity? The development of political parties in the Parliament of Canada, 1867–2011. British Journal of Political Science, 47, 545–569. Goldenberg, E. (2006). The Way it Works: Inside Ottawa. Toronto: McClelland and Stewart. Graves, S., Piepho, H.-P., & Selzer, L. (2015). multcompview: Visualizations of paired comparisons [Computer software manual]. Retrieved from https://CRAN.R-project.org/package=multcompView (R package version 0.1-7) Greenaway, N. (2008, May 17). No charges over Cadman, but Grits vow to push on. Winnipeg Free Press. Grimmer, J., & King, G. (2011). General purpose computer-assisted clustering and conceptualization. Proceedings of the National Academy of Sciences, 108 (7), 2643–2650. Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21 (3), 267–297. Habermas, J. (1996). Between Facts and Norms: Contributions to a Discourse Theory of Law and Democracy. Cambridge, MA: MIT Press. Habermas, J. (2005). Concluding comments on empirical approaches to deliberative politics. Acta Politica, 40 (3), 384. Hancock, J. T., Curry, L. E., Goorha, S., & Woodworth, M. (2007). On lying and being lied to: A linguistic analysis of deception in computer-mediated communication. Discourse Processes, 45 (1), 1–23. Hanes, A. (2008, March 4). PM knew nothing, Cadman’s wife says; “Telling The Truth”. . Hanes, A., Aubry, J., & White, M. (2007, November 13). Mulroney demands inquiry to clear name in Airbus Affair. CanWest News. Harris, M. (2014). 
Party of One: Stephen Harper and Canada’s Radical Makeover. Toronto: Viking. Hart, R. P. (1984). Verbal Style and the Presidency: A Computer-based Analysis. Orlando, FL: Academic Press. Hart, R. P. (2001). Redeveloping Diction: theoretical considerations. In M. D. West (Ed.), Theory, Method, and Practice in Computer Content Analysis (pp. 43–60). Westport, CT: Greenwood Publishing Group. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York, NY: Springer-Verlag. Heard, A. (2007). Just what is a vote of confidence? The curious case of May 10, 2005. Canadian Journal of Political Science, 40 (2), 395–416. Hirst, G., Riabinin, Y., & Graham, J. (2010). Party status as a confound in the automatic classification of political speech by ideology. In Proceedings, 10th International Conference on Statistical Analysis of Textual Data/10es Journées internationales d’Analyse statistique des Données Textuelles (JADT 2010), Rome (pp. 731–742). Hix, S., & Noury, A. (2016). Government-opposition or left-right? The institutional determinants of voting in legislatures. Political Science Research and Methods, 4 (2), 249–273.

Hofmann, T. (1999). Probabilistic latent semantic analysis. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (pp. 289–296). Hogg, P. W. (2014). Constitutional Law of Canada: 2014 Student Edition. Toronto: Thomson Carswell. Hollis, C. (1949). Can Parliament Survive? London: Hollis & Carter. Hood, C., & Lodge, M. (2006). The Politics of Public Service Bargains: Reward, Competency, Loyalty and Blame. Oxford: Oxford University Press. Hooghe, L., Marks, G., & Wilson, C. J. (2002). Does left/right structure party positions on European integration? Comparative Political Studies, 35 (8), 965–989. Hopkins, D. J., & King, G. (2010). A method of automated nonparametric content analysis for social science. American Journal of Political Science, 54 (1), 229–247. Ilie, C. (2001). Unparliamentary language: Insults as cognitive forms of ideological confrontation. In R. Dirven, R. Frank, & C. Ilie (Eds.), Language and Ideology Vol. 2: Descriptive Cognitive Practices (pp. 235–264). Amsterdam: John Benjamins. Ilie, C. (2004). Insulting as (un) parliamentary practice in the British and Swedish parliaments. In P. Bayley (Ed.), (pp. 45–86). Amsterdam: John Benjamins Publishing. Ilie, C. (2010). Strategic uses of parliamentary forms of address: The case of the UK Parliament and the Swedish Riksdag. Journal of Pragmatics, 42 (4), 885–911. Ilie, C. (2015). Parliamentary discourse. In K. Tracy, C. Ilie, & T. Sandel (Eds.), The International Encyclopedia of Language and Social Interaction. Hoboken, NJ: Wiley–Blackwell. Iyyer, M., Enns, P., Boyd-Graber, J., & Resnik, P. (2014). Political ideology detection using recursive neural networks. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 1113–1122). Jackson, R. J., & Atkinson, M. M. (1980). The Canadian Legislative System: Politicians and Policymaking. Toronto: Macmillan of Canada. Jaeger, B. (2017). 
r2glmm: Computes r squared for mixed (multilevel) models [Computer software manual]. Retrieved from https://CRAN.R-project.org/package=r2glmm (R package version 0.1.2) Jayal, N. G. (2006). Engendering local democracy: The impact of quotas for women in India’s panchayats. Democratisation, 13 (1), 15–35. Jeffrey, B. (2010). Divided Loyalties: The , 1984–2008. Toronto: University of Toronto Press. Jensen, J., Kaplan, E., Naidu, S., Wilse-Samson, L., Gergen, D., Zuckerman, M., & Spirling, A. (2012). Political polarization and the dynamics of political language: Evidence from 130 years of partisan speech. Brookings Papers on Economic Activity, 43 (2 (Fall)), 1–81. Jiménez, M. (2004, December 4). Sgro defends department against “strippergate”. The Globe and Mail. Retrieved from https://www.theglobeandmail.com/news/national/sgro-defends-department-against-strippergate/article20437252/ (Online; accessed 20-November-2018.) Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In C. Nedellec & C. Rouveirol (Eds.), 10th European Conference on Machine Learning (pp. 137–142). Berlin: Springer. Jurafsky, D., & Martin, J. H. (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech. New York, NY: Pearson Education.

Kam, C. (2000). Not just parliamentary “Cowboys and Indians”: ministerial responsibility and bureaucratic drift. Governance, 13 (3), 365–392. Kam, C. (2001). Do ideological preferences explain parliamentary behaviour? Evidence from Great Britain and Canada. Journal of Legislative Studies, 7 (4), 89–126. Kam, C. (2006). Demotion and dissent in the Canadian Liberal Party. British Journal of Political Science, 36 (3), 561–574. Kam, C. (2009). Party Discipline and Parliamentary Politics. Cambridge: Cambridge University Press. Kam, C., & Indridason, I. (2005). The timing of cabinet reshuffles in five Westminster parliamentary systems. Legislative Studies Quarterly, 30 (3), 327–363. Kapur, D., & Mehta, P. B. (2006). The Indian Parliament as an Institution of Accountability. Geneva: United Nations Research Institute for Social Development. Kerby, M., & Blidook, K. (2011). It’s not you, it’s me: Determinants of voluntary legislative turnover in Canada. Legislative Studies Quarterly, 36 (4), 621–643. Kernaghan, K. (1979, Summer). Power, Parliament and public servants in Canada: ministerial responsibility reexamined. Canadian Public Policy, 3, 383–396. Kinder, D. R., & Iyengar, S. (1987). News that Matters: Television and American Opinion. Chicago, IL: University of Chicago Press. King, G., & Lowe, W. (2003). An automated information extraction tool for international conflict data with performance as good as human coders: A rare events evaluation design. International Organization, 57 (03), 617–642. Klapper, J. T. (1960). The Effects of Mass Communication. Glencoe, IL: Free Press. Klebanov, B. B., Diermeier, D., & Beigman, E. (2008). Lexical cohesion analysis of political speech. Political Analysis, 16 (4), 447–463. Krippendorff, K. (1980). Content Analysis: An Introduction to Its Methodology. Newbury Park, CA: Sage Publications. Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. 
Journal of Statistical Software, 82 (13), 1–26. doi: 10.18637/jss.v082.i13 Laakso, M., & Taagepera, R. (1979). “Effective” number of parties: A measure with application to West Europe. Comparative Political Studies, 12 (1), 3–27. Labbé, D., & Monière, D. (2003). Le discours gouvernemental. Canada, Québec, France (1945-2000). Paris: Honoré Champion. Labbé, D., & Monière, D. (2010). Quelle est la spécificité des discours électoraux? Le cas de Stephen Harper. Canadian Journal of Political Science, 43 (01), 69–86. Laham, D. (1997). Latent semantic analysis approaches to categorization. In M. G. Shafto & P. Langley (Eds.), Proceedings of the 19th Annual Conference of the Cognitive Science Society (p. 979). Hillsdale: Lawrence Erlbaum Associates. Landauer, T. K., & Dumais, S. T. (1996). How come you know so much? From practical problems to new memory theory. In D. Hermann, C. McEvoy, C. Hertzog, P. Hertel, & M. Johnson (Eds.), Basic and Applied Memory: Memory in Context (pp. 105–126). Mahwah, NJ: Lawrence Erlbaum. Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104 (2), 211. Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis.

Discourse Processes, 25 (2-3), 259–284. Lasswell, H. D. (1927). The theory of political propaganda. The American Political Science Review, 21 (3), 627–631. Lasswell, H. D. (1941). The world attention survey: an exploration of the possibilities of studying attention being given to the United States by newspapers abroad. Public Opinion Quarterly, 5 (3), 456–462. Lasswell, H. D. (1949). Why be quantitative? In H. D. Lasswell & N. Leites (Eds.), Language of Politics: Studies in Quantitative Semantics (pp. 40–52). Cambridge, MA: MIT Press. Lau, J. H., & Baldwin, T. (2016). An empirical evaluation of doc2vec with practical insights into document embedding generation. arXiv. Retrieved from http://arxiv.org/abs/1607.05368 (arXiv preprint arXiv:1607.05368) Laver, M., Benoit, K., & Garry, J. (2003). Extracting policy positions from political texts using words as data. American Political Science Review, 97 (2), 311–331. Le, Q., & Mikolov, T. (2014). Distributed representations of sentences and documents. In E. P. Xing & T. Jebara (Eds.), Proceedings of the 31st International Conference on Machine Learning (ICML-14) (pp. 1188–1196). LeDuc, L., McKenzie, J. I., Pammett, J. H., & Turcotte, A. (2010). Dynasties and Interludes: Past and Present in Canadian Electoral Politics. Toronto: Dundurn. Levy, O., & Goldberg, Y. (2014). Dependency-based word embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (Vol. 2, pp. 302–308). Library of Parliament. (2016). PARLINFO. Retrieved from https://lop.parl.ca/ParlInfo/ (Online; accessed 13-February-2017.) Lijphart, A. (2012). Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven, CT: Yale University Press. Lincoln, C., Merrifield, R., Bergeron, S., Nystrom, L., & Casey, B. (2001). Members look at parliamentary reform. Canadian Parliamentary Review, 24 (2), 11–17. Lippmann, W. (1946). Public Opinion. 
New York, NY: Harcourt, Brace and Co. Lipton, Z. C. (2016). The mythos of model interpretability. In B. Kim, D. M. Malioutov, & K. R. Varshney (Eds.), 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016). Retrieved from https://arxiv.org/abs/1606.03490 (arXiv preprint arXiv:1606.03490) Lo, J., Proksch, S.-O., & Slapin, J. B. (2016). Ideological clarity in multiparty competition: A new measure and test using election manifestos. British Journal of Political Science, 46 (3), 591–610. Loat, A., & MacMillan, M. (2014). Tragedy in the commons: Former Members of Parliament speak out about Canada’s failing democracy. Toronto: Random House Canada. Loewen, P. J., Koop, R., Settle, J., & Fowler, J. H. (2014). A natural experiment in proposal power and electoral success. American Journal of Political Science, 58 (1), 189–196. Loewen, P. J., & Rubenson, D. (2011). For want of a nail: Negative persuasion in a party leadership race. Party Politics, 17 (1), 45–65. Lord, C., & Tamvaki, D. (2013). The politics of justification? Applying the “Discourse Quality Index” to the study of the European Parliament. European Political Science Review, 5 (1), 27–54. Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance, 66 (1), 35–65.

Lowe, W., Benoit, K., Mikhaylov, S., & Laver, M. (2011). Scaling policy preferences from coded political texts. Legislative Studies Quarterly, 36 (1), 123–155. Lucas, C., Nielsen, R., Roberts, M., Stewart, B., Storer, A., & Tingley, D. (2015). Computer assisted text analysis for comparative politics. Political Analysis, 23 (2), 254–277. Lüdecke, D. (2018). sjplot: Data visualization for statistics in social science [Computer software manual]. Retrieved from https://CRAN.R-project.org/package=sjPlot (R package version 2.6.0.9000) doi: 10.5281/zenodo.1308157 Mackintosh, J. P. (1978). The future of representative parliamentary democracy. In W. A. W. Neilson & J. C. MacPherson (Eds.), The Legislative Process in Canada: The Need for Reform (pp. 303–324). Toronto: Institute for Research on Public Policy. Mallory, J. (1979). Parliament: Every reform creates a new problem. Journal of Canadian Studies, 14 (2), 26–34. Malloy, J. (2002). The “responsible government” approach and its effect on Canadian legislative studies. Parliamentary Perspectives, 5 (November). Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge: Cambridge University Press. Manning, P. (1994). Obstacles and opportunities for Parliamentary reform. Canadian Parliamentary Review, 17 (2), 2–5. Marcelino, D. (2016). SciencesPo: A tool set for analyzing political behavior data [Computer software manual]. Retrieved from http://CRAN.R-project.org/package=SciencesPo (R package version 1.4.1) Marshall, G., & Moodie, G. C. (1959). Some Problems of the Constitution. London: Hutchison. Martin, L. W., & Vanberg, G. (2008). A robust transformation procedure for interpreting political text. Political Analysis, 16 (1), 93–100. Martin, P. (2005, April 21). Text of Prime Minister Paul Martin’s speech. CBC News. Retrieved from http://www.cbc.ca/news2/background/groupaction/address_martin.html (Online; accessed 20-November-2018.) Mayhew, D. (1974). 
Congress: The Electoral Connection. New Haven, CT: Yale University Press. McCombs, M. E., & Reynolds, A. (2002). News influence on our pictures of the world. In J. Bryant & D. Zillmann (Eds.), Media Effects: Advances in Theory and Research (pp. 1–18). Mahwah, NJ: Lawrence Erlbaum. McCombs, M. E., & Shaw, D. L. (1972). The agenda-setting function of mass media. Public Opinion Quarterly, 36 (2), 176–187. McGregor, G. (2008, February 29). Insurance benefits to face scrutiny; Terminally-ill MP faced repercussions with policy coverage on 2005 swing vote. The , A5. McKinney, W. (2010). Data structures for statistical computing in Python. In S. van der Walt & J. Millman (Eds.), Proceedings of the 9th Python in Science Conference (pp. 51–56). Menini, S., Nanni, F., Ponzetto, S. P., & Tonelli, S. (2017). Topic-based agreement and disagreement in US electoral manifestos. In M. Palmer, R. Hwa, & S. Riedel (Eds.), Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 2938–2944). Menini, S., & Tonelli, S. (2016). Agreement and disagreement: Comparison of points of view in the political domain. In Y. Matsumoto & R. Prasad (Eds.), COLING 2016, the 26th International Conference on Computational Linguistics (pp. 2461–2470).

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 26 (pp. 3111–3119). Miller, T. (2003). Essay assessment with latent semantic analysis. Journal of Educational Computing Research, 29 (4), 495–512. Monière, D., Labbé, C., & Labbé, D. (2008). Les styles discursifs des premiers ministres québécois de à Jean Charest. Canadian Journal of Political Science, 41 (1), 43–69. Monière, D., & Labbé, D. (2014). Un siècle et demi de discours gouvernemental au Canada: Contribution de la lexicométrie à l’histoire politique. In E. Née (Ed.), 12th International Conference on Textual Data Statistical Analysis (pp. 485–494). Monroe, B. L., Colaresi, M. P., & Quinn, K. M. (2008). Fightin’ words: Lexical feature selection and evaluation for identifying the content of political conflict. Political Analysis, 16 (4), 372–403. Mucciaroni, G., & Quirk, P. J. (2006). Deliberative Choices: Debating Public Policy in Congress. Chicago: University of Chicago Press. Müller, W. C., & Strøm, K. (1999). Policy, Office, or Votes?: How Political Parties in Western Europe Make Hard Decisions. Cambridge: Cambridge University Press. Mulley, M. (2017). openparliament.ca: Keep Tabs on Parliament. Retrieved from http://www.openparliament.ca (Online; accessed 13-February-2017.) Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. Cambridge, MA: MIT Press. Nakagawa, S., & Schielzeth, H. (2013). A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4 (2), 133–142. Nanni, F., Glavaš, G., Ponzetto, S. P., Tonelli, S., Conti, N., Aker, A., . . . Yordanova, N. (2018). Findings from the hackathon on understanding Euroscepticism through the lens of textual data. 
In Proceedings of the LREC 2018 Workshop ParlaCLARIN (pp. 1–8). Miyazaki, Japan: LREC. Retrieved from http://ub-madoc.bib.uni-mannheim.de/44172/ National Social Watch. (2009). Citizens Report on Governance and Development 2008. New Delhi: National Social Watch. Naumetz, T. (2008, July 14). Expert contradicts Cadman tape claims. The Globe and Mail. Neilson, W. A. W., & MacPherson, J. C. (1978). Executive summary–an overview of the papers. In W. A. W. Neilson & J. C. MacPherson (Eds.), The Legislative Process in Canada: The Need for Reform (pp. 1–38). Toronto: Institute for Research on Public Policy. Nevitte, N. (1996). The Decline of Deference: Canadian Value Change in Cross-National Perspective. Peterborough, ON: Broadview Press. Nevitte, N., & White, S. (2012). Citizen expectations and democratic performance: The sources and consequences of democratic deficits from the bottom up. In P. T. Lenard & R. Simeon (Eds.), Imperfect Democracies: The Democratic Deficit in Canada and the United States (pp. 51–76). Vancouver: UBC Press. Niederhoffer, K. G., & Pennebaker, J. W. (2002). Linguistic style matching in social interaction. Journal of Language and Social Psychology, 21 (4), 337–360. Ohana, B., & Tierney, B. (2009, October). Sentiment classification of reviews using SentiWordNet. In 9th. IT & T Conference (p. 13). Dublin, Ireland: Dublin Institute of Technology. Olson, D. M. (1994). Democratic Legislative Institutions: A Comparative View. London: Routledge.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., . . . Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830. Pennebaker, J. W., & Beall, S. K. (1986). Confronting a traumatic event: toward an understanding of inhibition and disease. Journal of Abnormal Psychology, 95 (3), 274. Pennebaker, J. W., & Francis, M. E. (1996). Cognitive, emotional, and language processes in disclosure. Cognition & Emotion, 10 (6), 601–626. Pennebaker, J. W., & Lay, T. C. (2002). Language use and personality during crises: Analyses of mayor Rudolph Giuliani’s press conferences. Journal of Research in Personality, 36 (3), 271–282. Pennebaker, J. W., Mayne, T. J., & Francis, M. E. (1997). Linguistic predictors of adaptive bereavement. Journal of Personality and Social Psychology, 72 (4), 863. Penner, E., Blidook, K., & Soroka, S. (2006). Legislative priorities and public opinion: representation of partisan agendas in the Canadian House of Commons. Journal of European Public Policy, 13 (7), 1006–1020. Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Vol. 14, pp. 1532–1543). ACL. Peters, G.-J. Y. (2018). userfriendlyscience: Quantitative analysis made accessible [Computer software manual]. Retrieved from https://userfriendlyscience.com (R package version 0.7.1) doi: 10.17605/osf.io/txequ Peterson, A., & Spirling, A. (2018). Classification accuracy as a substantive quantity of interest: Measuring polarization in Westminster systems. Political Analysis, 26 (1), 120–128. Pickup, M., & Hobolt, S. B. (2015). The conditionality of the trade-off between government responsiveness and effectiveness: The impact of minority status and polls in the Canadian House of Commons. Electoral Studies, 40, 517–530. Polsby, N. W. (1975). Legislatures. In W. Riker, F. 
Greenstein, & N. Polsby (Eds.), Handbook of Political Science. Reading, MA: Addison-Wesley. Poole, K. T., & Rosenthal, H. (1985). A spatial model for legislative roll call analysis. American Journal of Political Science, 29 (2), 357–384. Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14 (3), 130–137. Proksch, S.-O., & Slapin, J. B. (2009). How to avoid pitfalls in statistical analysis of political texts: The case of Germany. German Politics, 18 (3), 323–344. Proksch, S.-O., & Slapin, J. B. (2010). Position taking in European Parliament speeches. British Journal of Political Science, 40 (3), 587–611. Proksch, S.-O., & Slapin, J. B. (2014). The Politics of Parliamentary Debate. Cambridge: Cambridge University Press. Pugliese, D. (2006, January 26). O’Connor among candidates for defence minister. The Ottawa Citizen, A3. Quinn, K. M., Monroe, B. L., Colaresi, M., Crespin, M. H., & Radev, D. R. (2010). How to analyze political attention with minimal assumptions and costs. American Journal of Political Science, 54 (1), 209–228. R Core Team. (2018). R: A Language and Environment for Statistical Computing [Computer software manual]. Vienna, Austria. Retrieved from https://www.R-project.org/

Rehder, B., Schreiner, M., Wolfe, M. B., Laham, D., Landauer, T. K., & Kintsch, W. (1998). Using latent semantic analysis to assess knowledge: Some technical considerations. Discourse Processes, 25 (2-3), 337–354. Řehůřek, R., & Sojka, P. (2010, May 22). Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks (pp. 45–50). Valletta, Malta: ELRA. Richardson, J. J., & Jordan, A. G. (1979). Governing Under Pressure. Oxford: Martin Robertson. Riker, W. H., & Ordeshook, P. C. (1968). A theory of the calculus of voting. American Political Science Review, 62 (1), 25–42. Robinson, D., & Hayes, A. (2018). broom: Convert statistical analysis objects into tidy tibbles [Computer software manual]. Retrieved from https://CRAN.R-project.org/package=broom (R package version 0.5.0) Romero, D. M., Swaab, R. I., Uzzi, B., & Galinsky, A. D. (2015). Mimicry is presidential: Linguistic style matching in presidential debates and improved polling numbers. Personality and Social Psychology Bulletin, 41 (10), 1311–1319. Russell, M., & Cowley, P. (2016). The policy power of the Westminster parliament: The “Parliamentary State” and the empirical evidence. Governance, 29 (1), 121–137. Russell, M., Gover, D., & Wollter, K. (2016). Does the executive dominate the Westminster legislative process?: Six reasons for doubt. Parliamentary Affairs, 69 (2), 286–308. Russell, P. H. (2008). Two Cheers for Minority Government: The Evolution of Canadian Parliamentary Democracy. Toronto: Emond Montgomery Publications. Ryan, F. H. (2009). Can Question Period be reformed? Canadian Parliamentary Review, 32 (3), 18–22. Sampson, A. (1962). Anatomy of Britain. London: Hodder and Stoughton. Savoie, D. J. (1999). Governing from the Centre: The Concentration of Power in Canadian Politics. Toronto: University of Toronto Press. Savoie, D. J. (2008). Court Government and the Collapse of Accountability in Canada and the United Kingdom. 
Toronto: University of Toronto Press. Savoie, D. J. (2015). What is Government Good At?: A Canadian Answer. Montreal: McGill-Queen’s Press. Savoy, J. (2010). Lexical analysis of US political speeches. Journal of Quantitative Linguistics, 17 (2), 123–141. Schonhardt-Bailey, C. (2008). The congressional debate on partial-birth abortion: Constitutional gravitas and moral passion. British Journal of Political Science, 38 (3). Segal, H. (2004). Failing legitimacy: The challenge for parliamentarians. Canadian Parliamentary Review, 27 (1), 30–32. Shankar, B. L., & Rodrigues, V. (2010). The Indian Parliament: A Democracy at Work. New Delhi: Oxford University Press. Shanks, M. (1961). The Stagnant Society: A Warning. Harmondsworth, UK: Penguin. Sharpsteen, C., & Bracken, C. (2018). tikzdevice: R graphics output in latex format [Computer software manual]. Retrieved from https://CRAN.R-project.org/package=tikzDevice (R package version 0.12) Shephard, M., & Cairney, P. (2005). The impact of the Scottish Parliament in amending executive legislation. Political Studies, 53 (2), 303–319.

Skogstad, G. (2003). Who governs? Who should govern?: Political authority and legitimacy in Canada in the twenty-first century. Canadian Journal of Political Science, 36 (05), 955–973. Skogstad, G., & Whyte, T. (2015). Authority contests, power and policy paradigm change: Explaining developments in grain marketing policy in prairie Canada. Canadian Journal of Political Science, 48 (1), 79–100. Slapin, J. B., & Proksch, S.-O. (2008). A scaling model for estimating time-series party positions from texts. American Journal of Political Science, 52 (3), 705–722. Slatcher, R. B., Chung, C. K., Pennebaker, J. W., & Stone, L. D. (2007). Winning words: Individual differences in linguistic style among US presidential and vice presidential candidates. Journal of Research in Personality, 41 (1), 63–75. Slattery, B. (2009). Why the Governor General matters. In P. H. Russell & L. Sossin (Eds.), Parliamentary Democracy in Crisis (pp. 79–90). Toronto: University of Toronto Press. Smith, D. E. (2007). The People’s House of Commons: Theories of Democracy in Contention. Toronto: University of Toronto Press. Smith, J. (1999). Democracy and the Canadian House of Commons at the millennium. Canadian Public Administration, 42 (4), 398–421. Solomon, E. (2017, November 20). Jagmeet Singh and the shunning of Parliament. Maclean’s. Retrieved from https://www.macleans.ca/politics/ottawa/jagmeet-singh-and-the-shunning-of-parliament/ (Online; accessed 6-December-2018.) Soroka, S. (2002a). Agenda-Setting Dynamics in Canada. Vancouver: UBC Press. Soroka, S. (2002b). Issue attributes and agenda-setting by media, the public, and policymakers in Canada. International Journal of Public Opinion Research, 14 (3), 264–285. Soroka, S., Cutler, F., Stolle, D., & Fournier, P. (2011). Capturing change (and stability) in the 2011 campaign. Policy Options, 32, 70–77. Soroka, S., Penner, E., & Blidook, K. (2009). Constituency influence in Parliament. Canadian Journal of Political Science, 42 (3), 563–591. 
Soroka, S., Stecula, D. A., & Wlezien, C. (2015). It’s (change in) the (future) economy, stupid: economic indicators, the media, and public opinion. American Journal of Political Science, 59 (2), 457–474. Soroka, S., & Wlezien, C. (2010). Degrees of Democracy: Politics, Public Opinion, and Policy. Cambridge: Cambridge University Press. Soroka, S. N. (2005). Oral Questions, Canadian House of Commons, 1983-2004. Sparck Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28 (1), 11–21. Spirling, A., & McLean, I. (2006a). The rights and wrongs of roll calls. Government and Opposition, 41 (4), 581–588. Spirling, A., & McLean, I. (2006b). UK OC OK? Interpreting optimal classification scores for the UK House of Commons. Political Analysis, 15 (1), 85–96. Sproule-Jones, M. (1984). The enduring colony? Political institutions and political science in Canada. Publius, 14 (1), 93–108. SQLAlchemy. (2018). SQLAlchemy: Python SQL toolkit and Object Relational Mapper. Retrieved from https://www.sqlalchemy.org (Online; accessed 1-December-2018.) Stanfield, R. L. (1978). The present state of the legislative process in Canada: myths and realities. In W. A. W. Neilson & J. C. MacPherson (Eds.), The Legislative Process in Canada: The Need for

Reform (pp. 39–50). Toronto: Institute for Research on Public Policy. Steenbergen, M. R., Bächtiger, A., Spörndli, M., & Steiner, J. (2003). Measuring political deliberation: A discourse quality index. Comparative European Politics, 1 (1), 21–48. Steiner, J. (2008). Concept stretching: The case of deliberation. European Political Science, 7 (2), 186. Stevens, G. (1978). The influence and responsibilities of the media in the legislative process. In W. A. W. Neilson & J. C. MacPherson (Eds.), The Legislative Process in Canada: The Need for Reform (pp. 227–235). Toronto: Institute for Research on Public Policy. Stewart, J. B. (1977). The Canadian House of Commons: Procedure and Reform. Montreal: McGill-Queen’s University Press. Stone, P. J., Dunphy, D. C., & Smith, M. S. (1966). The General Inquirer: A Computer Approach to Content Analysis. Cambridge, MA: MIT Press. Strahl, C. (2001). Toward a more responsive Parliament. Canadian Parliamentary Review, 24 (1), 2–4. Suedfeld, P., Bluck, S., Ballard, E. J., & Baker-Brown, G. (1990). Canadian federal elections: Motive profiles and integrative complexity in political speeches and popular media. Canadian Journal of Behavioural Science, 22 (1), 26. Sutherland, S. L. (1991). Responsible government and ministerial responsibility: Every reform is its own problem. Canadian Journal of Political Science, 24 (1), 91–120. Taboada, M., Brooke, J., Tofiloski, M., Voll, K., & Stede, M. (2011). Lexicon-based methods for sentiment analysis. Computational Linguistics, 37 (2), 267–307. Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29 (1), 24–54. Taylor, P. J., & Thomas, S. (2008). Linguistic style matching and negotiation outcome. Negotiation and Conflict Management Research, 1 (3), 263–281. Tetlock, P. E. (1981). Personality and isolationism: Content analysis of senatorial speeches. 
Journal of Personality and Social Psychology, 41 (4), 737–743. Thomas, P. G. (1979). Theories of Parliament and parliamentary reform. Journal of Canadian Studies, 14 (2), 57–66. Thompson, L. (2012). More of the same or a period of change? The impact of bill committees in the twenty-first century House of Commons. Parliamentary Affairs, 66 (3), 459–479. Tumasjan, A., Sprenger, T. O., Sandner, P. G., & Welpe, I. M. (2010). Predicting elections with Twitter: What 140 characters reveal about political sentiment. ICWSM, 10 (1), 178–185. Volkens, A., Lehmann, P., Matthieß, T., Merz, N., & Regel, S. (2016). The Manifesto Data Collection. Manifesto Project (MRG/CMP/MARPOR). Version 2016b. Berlin: Wissenschaftszentrum Berlin für Sozialforschung. Wallack, J. S. (2008). India’s parliament as a representative institution. India Review, 7 (2), 91–114. Ward, N. (1958). Confederation and responsible government. Canadian Journal of Economics and Political Science, 24 (1), 44–56. Weinberg, M. (2010). Measuring governors’ political orientations using words as data. State Politics & Policy Quarterly, 10 (1), 96–109. Wherry, A. (2017a, March 24). Liberals’ latest attempt at parliamentary reform remains a tale of woe, for now. CBC News. Retrieved from http://www.cbc.ca/news/politics/wherry-parliament-reform-chagger-1.4037813 Wherry, A. (2017b, March 10). Liberals propose changes to how House of Commons works.

CBC News. Retrieved from http://www.cbc.ca/news/politics/liberals-parliament-reform -discussion-paper-1.4019904 (Online; accessed 20-November-2018.) Wherry, A. (2017c, January 12). Maryam Monsef escapes the Liberal adventure in electoral reform. CBC News. Retrieved from http://www.cbc.ca/news/politics/wherry-monsef-shuffle-electoral -reform-1.3930891 (Online; accessed 20-November-2018.) White, G. (2005). Cabinets and First Ministers. Vancouver: UBC Press. White, G. (2012). The “centre” of the democratic deficit: Power and influence in Canadian political executives. In P. T. Lenard & R. Simeon (Eds.), Imperfect Democracies: The Democratic Deficit in Canada and the United States (pp. 226–247). Vancouver: UBC Press. Whyte, T. (2016). lipad: Public source of lipad.ca, a Django website for viewing and searching Canadian Hansard records from 1901-. https://github.com/twhyte/lipad. GitHub. Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. Retrieved from http://ggplot2.org Wickham, H. (2017). tidyverse: Easily install and load the “tidyverse” [Computer software manual]. Retrieved from https://CRAN.R-project.org/package=tidyverse (R package version 1.2.1) Wilson, J. (1988). In defence of parliamentary opposition. Canadian Parliamentary Review, 11 (2), 26–31. Woodward, J. L. (1934). Quantitative newspaper analysis as a technique of opinion research. Social Forces, 12 , 526–537. Young, L., & Soroka, S. (2012). Affective news: The automated coding of sentiment in political texts. Political Communication, 29 (2), 205–231. Zeileis, A. (2006). Implementing a class of structural change tests: An econometric computing approach. Computational Statistics & Data Analysis, 50 , 2987–3008. Zeileis, A., & Grothendieck, G. (2005). zoo: S3 infrastructure for regular and irregular time series. Journal of Statistical Software, 14 (6), 1–27. doi: 10.18637/jss.v014.i06 Zeileis, A., Kleiber, C., Krämer, W., & Hornik, K. (2003). 
Testing and dating of structural changes in practice. Computational Statistics & Data Analysis, 44 , 109–123. Zeileis, A., Leisch, F., Hornik, K., & Kleiber, C. (2002). strucchange: An R package for testing for structural change in linear regression models. Journal of Statistical Software, 7 (2), 1–38. Retrieved from http://www.jstatsoft.org/v07/i02/ Zima, E., Brône, G., & Feyaerts, K. (2010). Patterns of interaction in Austrian parliamentary debates. In C. Ilie (Ed.), European Parliaments Under Scrutiny: Discourse Strategies and Interaction Practices (pp. 135–164). Amsterdam: John Benjamins. Zipf, G. K. (1935). The Psycho-Biology of Language. Boston, MA: Houghton, Mifflin. Zipf, G. K. (1949). Human Behavior and the Principle of Least Effort. Reading, MA: Addison-Wesley.