AUTNES Automatic Content Analysis of the Media Coverage 2013

Documentation Version 1.0.0

Martin Haselmayer, Carina Jacobi, Jakob-Moritz Eberl, Ramona Vonbun

Katharina Kleinen-von Königslöw, Klaus Schönbach, Hajo G. Boomgaarden

Vienna, May 2016

AUTNES Automatic Content Analysis of the Media Coverage 2013 - Documentation

Martin Haselmayer, Carina Jacobi, Jakob-Moritz Eberl, Ramona Vonbun, Katharina Kleinen-von Königslöw, Klaus Schönbach, Hajo G. Boomgaarden

(Edition 1.0.0, 2016)

[email protected] http://www.autnes.at



1. Introduction
   Acknowledgement of the data
   Conditions of use
   Restrictions
   Confidentiality
   Deposit Requirement

2. Study description
   Funding
   Data file name
   Keywords

3. Study design
   Media sample
   Article selection
   Comparison of the automatic and manual datasets
   AmCAT
   Article sets
   Data cleaning and preparation
   Data cleaning

4. Data set
   Coding procedure
   Metadata
   Actors (parties, top candidates and members of government)
   Parties
   Top candidates
   Members of Government
   Political knowledge
   Style
   Issues (vi1-vi14; vi99)
   Validation procedure

References



The Documentation Report of the AUTNES Automatic Content Analysis accompanies and documents the data of the AUTNES Automatic Content Analysis of the 2013 Austrian National Election Coverage. The election was held on 29 September 2013. The body elected was the National Parliament (Nationalrat). This document is divided into four parts. The first part contains a description of the AUTNES Media Side study and, more specifically, its automatic content analysis part. The second part addresses the conditions of use of the AUTNES automatic content analysis data. The third part contains a description of the study design: the media sample, data cleaning strategies, and how to find the full articles in our sample online, in AmCAT (amcat.nl) article sets. A separate section addresses how this sample compares to the AUTNES manual content analysis dataset. The final part of this documentation addresses the semi-automatic analysis of these articles: the method of creating search strings, and a description of each (search string-based) variable in the AUTNES automatic content analysis dataset, including the search strings and their precision and recall values.


Acknowledgement of the data

Data users are kindly asked to acknowledge the data and the accompanying release document. Please refer to the GESIS data catalogue (www.gesis.org) for a recommendation on how to cite these data and the documentation.

1. Conditions of use

Restrictions The data are available for non-profit use without restrictions.

Confidentiality AUTNES, the Principal Investigators and the funding institution bear no responsibility for the use of the data, or for interpretations or inferences based on their use, nor do they accept liability for indirect, consequential or incidental damages or losses arising from use of the data collection.

Deposit Requirement In order to facilitate exchanges within the scientific community and to provide funding agencies with essential information about the use of archival resources, users of the AUTNES data are requested to notify the AUTNES team of all forms of publications referring to AUTNES data.


2. Study description

The data set was created through an automatic analysis, based on search strings, of all media reports referring to either a political party or a political candidate running in the Austrian national election between August 18th and September 29th, 2013 (election day).

The data collection, cleaning and analysis were conducted between August 18th, 2013 and January 31st, 2015 by Martin Haselmayer and Carina Jacobi, using the online content analysis platform AmCAT (amcat.nl). Principal investigators were Klaus Schönbach, Hajo G. Boomgaarden, and Katharina Kleinen-von Königslöw.

Funding The study was carried out under the auspices of the Austrian National Election Study (AUTNES), a National Research Network (NFN) sponsored by the Austrian Science Fund (FWF) (S10903-G11).

Data file name

ZA5865_ger_v1-0-0.dta ZA5865_ger_v1-0-0.sav

Stata-Format, 43021 Cases, 76 Variables.

Keywords Election coverage, online media, print media, television news, political issues, political parties, political actors, emotionalization.


3. Study design

3.1. Media sample

The table below gives an overview of the media outlets included in the dataset.

Table 1. Media overview

Private TV: ATV Aktuell 19:20; Pro7 AustriaNews; Sat1

Public service TV: ZIB 20; ZIB 24; ZIB Magazin; Zeit im Bild 1; ZiB 2

Public service radio: Ö1 Mittagsjournal 12:00

Magazines: Die ganze Woche; Falter; Format; News; profil

Elite newspapers: Die Presse am Sonntag; Salzburger Nachrichten; WirtschaftsBlatt

Popular newspapers: Heute; Kronen Zeitung; Österreich

Mid-market newspaper: Kurier

Regional dailies: Kärntner Tageszeitung; Kleine Zeitung; Neue Vorarlberger Tageszeitung; Neues Volksblatt; Oberösterreichische Nachrichten; Salzburger Volkszeitung; Tiroler Tageszeitung; TT-Kompakt; Vorarlberger Nachrichten; Vorarlberger Tageszeitung; Wiener Zeitung

Regional weeklies: BVZ; derGrazer; NÖ Nachrichten; NÖN Landeszeitung; Oberländer Rundschau; Salzburger Woche; WOCHE Bildpost; WOCHE Hartberg; WOCHE Kärnten; WOCHE Murtaler Zeitung; WOCHE Obersteiermark; WOCHE Südweststeiermark; WOCHE Weizer Zeitung

Online newspapers: derstandard.at; diepresse.com; heute.at; kleinezeitung.at; krone.at; kurier.at; nachrichten.at; news.at; noen.at; oe24.at; orf.at; salzburg.com; tt.com; .at; vol.at

Online-only news site: gmx.at

3.2. Article selection

To collect our data, articles and transcripts for print newspapers and public television newscasts were downloaded from the database of the Austria Presse Agentur (APA, http://www.apa-defacto.at/Site/index.de.html). For this, we used the following search string, which contains the names of all Austrian political parties participating in the National Election of 2013, the names of the chancellor, ministers and secretaries of state, and the term Nationalrat:

(nationalrat* ODER spö ODER övp ODER vp ODER "die grünen" ODER glawischnig ODER fpö ODER bzö ODER "bündnis zukunft österreich" ODER piratenpartei ODER "die piraten" ODER "den piraten" ODER stronach: ODER faymann ODER spindelegger ODER (hundstorfer: UND (** ODER bundesvertreter*)) ODER "" ODER "hundstorfer rudolf" ODER fekter: ODER "heinisch hosek" ODER "heinisch hoseks" ODER (stöger: UND *minister*) ODER "alois stöger" ODER "stöger alois" ODER "mikl leitner" ODER "mikl leitners" ODER (karl: UND (justizministerin ODER ministerin*)) ODER "beatrix karl" ODER "karl beatrix" ODER (berlakovich: UND *minister*) ODER "" ODER "berlakovich niklaus" ODER "niki berlakovich" ODER "berlakovich niki" ODER (klug: UND *minister*) ODER "" ODER "klug gerald" ODER (schmied: UND *minister*) ODER "" ODER "schmied claudia" ODER (bures UND *minister*) ODER "" ODER "bures doris" ODER (mitterlehner: UND *minister*) ODER "" ODER "mitterlehner reinhold" ODER (t:chterle: UND *minister*) ODER "karlheinz t:chterle" ODER "t:chterle karlheinz" ODER "" ODER (fischer UND (Nationalratspräsident ODER staatsoberhaupt ODER bundespräsident*))) ODER neos ODER kpö ODER "kommunistische partei österreich*" ODER "österreich* kommunist*" ODER "slp" ODER "sozialitisch* linkspartei" ODER "josef bucher" ODER "sepp* bucher" ODER "bucher sepp*" "bucher joseph" ODER "strache*" ODER "matthias strolz" ODER "strolz matthias" ODER "mirko messner" ODER "messner mirko" ODER "rot-grün:" ODER "schwarz-grün:" ODER "die grüne" ODER "der grüne" ODER "der grünen" ODER ökopartei ODER grünpolitiker* ODER "grün politiker*" ODER "den grünen" ODER "tiroler grüne*" ODER "wiener grüne*" ODER "grüne* tirol" ODER "grüne* wien" ODER "grüne* burgenland" ODER "burgenländ* grüne*" ODER "grune* kärnten" ODER "kärntner grüne*" ODER "grüne* salzburg" ODER "salzburger grüne*" ODER "grüne* oberösterreich" ODER "oberösterreichische* grüne*" ODER "grüne* steiermark" ODER "steiermarkische* grüne*" ODER "grüne* vorarlberg" ODER "vorarlberger grüne*" 
ODER "grüne* niederösterreich*" ODER "niederösterreichische* grüne*"

Data were downloaded from the APA databank on a daily basis and uploaded to AmCAT article sets per week, separately for television and print newspapers.


For commercial television, every program included was checked for relevant items (at least one political party, candidate or the Nationalrat was mentioned) by one of our coders. The relevant programs were transcribed by student assistants, and uploaded in one AmCAT articleset per program.

Articles from online newspapers were scraped using custom scrapers programmed by a student assistant. The scraper output was compared to the websites daily, and any missing data were added later. The scrapers collected a website’s content the morning after it was published, with the exception of orf.at, which was scraped every hour because its content changes rapidly. The scraped articles were added to AmCAT. In this process, a deduplicator automatically discarded all articles that were featured on the website for more than one round of scraping, based on medium, date/time, headline/byline and article text. The byline comparison included the line carrying the regional edition information for certain newspapers, so similar articles appearing in different regional editions were kept in the database. When scraping was finished for the entire campaign period, we selected the relevant articles using the following search string:

Politik_Allgemein# nationalrat* spo ovp vp "die grunen" glawischnig fpo bzo piratenpartei "die piraten" "den piraten" stronach* faymann spindelegger spindi "hundstorfer (ministe* bundesvertrete*)"~10 "rudolf hundstorfer"~2 fekter* fekter "heinisch hose*"~2 "(stoger stoger?) ministe*"~10 "alois (stoger stoger?)"~2 "mikl leitner" "(karl karl?) (justizministerin ministeri*)"~10 "beatrix karl"~2 berlakovich berlakovich? "(klug? klug) ministe*"~10 "gerald klug"~2 "(schmied? schmied) minister*"~10 "claudia schmied"~2 "bures minister*"~5 "doris bures"~2 "(mitterlehner? mitterlehner) ministe*"~10 "reinhold mitterlehner"~2 "mitterlehner? reinhold?"~2 "(tochterle tochterle?) ministe*"~10 "karlheinz tochterle"~2 "tochterle? karlheinz?"~2 "heinz fischer"~2 "heinz? fischer?"~2 "(fischer fischer?) (nationalratsprasident nationalratsprasident? staatsoberhaupt staatsoberhaupt? bundesprasident* bundesprasident)"~10 "staatssekretar kurz"~2 "staatssekretar? kurz?"~2 "staatssekretar? ostermayer?"~2 "staatssekretar ostermayer"~2 "staatssekretar? lopatka?"~2 "staatssekretar lopatka"~2 "staatssekretar schieder"~2 "staatssekretar? schieder?"~2 neos kpo "kommunistische partei osterreic*" "osterreic* kommunis*" "josef bucher"~2 "sepp* bucher" "josef? bucher?"~2 strache strache? "matthias strolz" "strolz? matthias?" "mirko messner" "messner? mirko?" "rot grun" "rot grun?" "schwarz grun?" "schwarz grun" "die grune" "der grune?" okopartei grunpolitike* "grun politike*" "den grunen" mannerpartei austrittspartei euaus christenpartei cpo "christlich* parte* osterreichs" "sozialistisc* link* partei" "sozialistisch* linkspartei" slp "parte* der wandel" "list* der wandel" "grupp* der wandel"
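The deduplication step described before the search string can be illustrated with a minimal Python sketch. It is not the AmCAT deduplicator: the field names (medium, date, headline, byline, text) and the example records are assumptions chosen to mirror the attributes named in the text. Because the byline is part of the comparison key, an article that differs only in its regional edition line is kept.

```python
from typing import Iterable, List

# Hypothetical field names; the documentation only states which attributes were compared.
KEY_FIELDS = ("medium", "date", "headline", "byline", "text")


def deduplicate(articles: Iterable[dict]) -> List[dict]:
    """Keep the first occurrence of each article, comparing all KEY_FIELDS."""
    seen = set()
    unique = []
    for article in articles:
        key = tuple(article.get(field, "") for field in KEY_FIELDS)
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique


# Illustrative records: the second is an exact repeat from a later scraping round,
# the third differs only in its (regional edition) byline and is therefore kept.
scraped = [
    {"medium": "orf.at", "date": "2013-09-01 08:00", "headline": "Wahlkampf", "byline": "", "text": "A"},
    {"medium": "orf.at", "date": "2013-09-01 08:00", "headline": "Wahlkampf", "byline": "", "text": "A"},
    {"medium": "orf.at", "date": "2013-09-01 08:00", "headline": "Wahlkampf", "byline": "Ausgabe Tirol", "text": "A"},
]
unique_articles = deduplicate(scraped)
print(len(unique_articles))
```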


Due to differences in search syntax and technical possibilities, the search string used for the APA database and the one used for AmCAT are not completely identical. We regularly tested the prediction quality of our search strings and relied on standard measures used in binary classification tasks (e.g. Davis and Goadrich 2006; Fawcett 2006). For both data sets, recall proved to be near 100%. Issues with precision were later solved with data cleaning.
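For readers unfamiliar with these measures, the following sketch shows the standard definitions: precision is the share of retrieved articles that are relevant, and recall is the share of relevant articles that were retrieved. The article IDs are made up for illustration, and the function is a generic helper rather than the procedure used by the authors.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for a set of retrieved IDs against a manually coded reference set."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical article IDs: the search string finds five articles,
# four of which a human coder also judged relevant.
hits = {1, 2, 3, 4, 5}
manually_relevant = {1, 2, 3, 4}
precision, recall = precision_recall(hits, manually_relevant)
print(precision, recall)
```

In this toy case every relevant article is found (full recall), while one hit is a false positive that would have to be removed during data cleaning.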

All news stories from August 1st, 2013 to September 29th, 2013 were included. However, for Election Day on September 29th, we chose not to include any evening television programs, as these discussed the election outcome rather than the campaign. After data collection and data cleaning were finished, we added all relevant articles together into the two datasets for automatic content analysis mentioned earlier in this documentation.

Table 2. Data overview

Name of outlet N Percent Cumulative

ZiB 249 0.58 0.58

ZiB2 160 0.37 0.95

ZiB20 69 0.16 1.11

ZiB24 106 0.25 1.36

ZiB Magazin 9 0.02 1.38

ATV Aktuell 70 0.16 1.54

Sat.1 Austria News 127 0.3 1.84

Pro7 Austria News 88 0.2 2.04

Ö1 Mittagsjournal 281 0.65 2.69

Der Standard 1,280 2.98 5.67

Die Presse + Die Presse am Sonntag 1,526 3.55 9.22

Salzburger Nachrichten 1,099 2.55 11.77

Wirtschaftsblatt 245 0.57 12.34

Kronen Zeitung 2,748 6.39 18.73

Österreich 3,156 7.34 26.06

Heute 947 2.2 28.27

Kurier 2,063 4.8 33.06

Kleine Zeitung 2,592 6.02 39.09

OÖ Nachrichten 1,347 3.13 42.22

Tiroler Tageszeitung 1,115 2.59 44.81

VN Vorarlberger Nachrichten 706 1.64 46.45

Wiener Zeitung 707 1.64 48.09

Kärntner Tageszeitung 335 0.78 48.87

Neue Vorarlberger Tageszeitung 510 1.19 50.06

Neues Volksblatt 774 1.8 51.86

TT Kompakt 225 0.52 52.38

SV Salzburger Volkszeitung 273 0.63 53.01

News 135 0.31 53.33

Format 159 0.37 53.7

Falter 221 0.51 54.21

Profil 174 0.4 54.62

Die ganze WOCHE 59 0.14 54.75

Woche Graz 103 0.24 54.99

Woche Obersteiermark 117 0.27 55.26

Woche HBZ 72 0.17 55.43

Woche Weiz 63 0.15 55.58

Woche Südweststeiermark 150 0.35 55.93

Woche Murtaler Zeitung 110 0.26 56.18

Woche Bildpost 97 0.23 56.41

Woche Kärnten 268 0.62 57.03

Salzburger Woche 184 0.43 57.46

Rundschau - Oberländer Wochenzeitung 71 0.17 57.62

NÖN Niederösterreichische Nachrichten 4,242 9.86 67.48

derGrazer 36 0.08 67.57

BVZ - Burgenländische Volkszeitung 793 1.84 69.41

orf.at 1,184 2.75 72.16

derstandard.at 1,934 4.5 76.66

diepresse.com 1,507 3.5 80.16

salzburg.com 1,777 4.13 84.29

krone.at 362 0.84 85.13

oe24.at 531 1.23 86.37

heute.at 567 1.32 87.69

kurier.at 835 1.94 89.63

kleinezeitung.at 2,024 4.7 94.33

nachrichten.at 430 1 95.33

tt.com 197 0.46 95.79

vol.at 361 0.84 96.63

vienna.at 364 0.85 97.47

news.at 466 1.08 98.56

noen.at 476 1.11 99.66

gmx.at 145 0.34 100

Total 43,021 100

3.3. Comparison of the automatic and manual datasets

The selection of articles included in the article sets for automatic content analysis and that in the dataset for manual content analysis differ on a number of characteristics, listed below.

More media Manual content analysis was only conducted on a limited number of daily national newspapers (Kronen Zeitung, Heute, Der Standard, Die Presse, Kleine Zeitung, Kurier, Österreich, and Salzburger Nachrichten) and television news programs (Zeit im Bild 1, ZiB 2, ZiB 20, ZIB 24, ZIB Magazin, ATV Aktuell, Sat1-PULS 4 News, Pro7-PULS4 News). All other media, including regional newspapers, radio, news websites and weekly news magazines, are only part of the automatic content analysis. All media in the manual content analysis are also part of the automatic content analysis, but the relevance criteria employed there are wider (see below). In other words, daily national newspapers and television programs will have a slightly larger number of articles/items in the automatic content analysis than in the manual content analysis.

Wider criteria for relevance For the automatic content analysis, all articles on Austrian politics on whatever level (as long as they included one of the political actors listed in the search string used for article selection) were considered relevant. From these, coders selected for the manual content analysis all articles which referred to one or more political actors from a predetermined list of relevant actors (all actors relevant to the national election, see the manual content analysis documentation).

Although using the exact same selection criteria for both analyses might be preferable for reasons of consistency, the time and effort required by such a (manual) selection process for the larger media sample in the automatic analysis, especially for regional media, made us opt for the slightly less precise version.

Longer time period The manual content analysis covers a period of six weeks, from August 19th, 2013 to Election Day on September 29th, 2013. For the automatic content analysis, we have articles available from August 1st.

3.4. AmCAT

AmCAT is an open-source platform for automatic and manual content analysis freely available (after registration) on amcat.nl. The project ID for the AUTNES media side study is 50. All AmCAT users can access this project, but for full access that includes full texts of all media articles and items, one should become a member of the project.

For this, please email one of the authors of this documentation.

All articles within AmCAT are stored in one or more article sets. These article sets form the basis of an automatic content analysis within AmCAT or AmCAT’s R integration (see http://github.com/amcat/amcat-r). Automatic content analysis is performed in AmCAT’s Query section (http://amcat.nl/navigator/projects/50/query/). Here, one can use the keyword search string(s) window to enter one or more search strings, and search within one or more selected article sets, media or time periods. AmCAT uses the Apache Lucene search engine library [1], which features, among other things, Boolean and proximity operators for its search; ‘Search syntax help’ provides an overview of the exact commands available.

Aside from a simple list of hits, AmCAT has features for various types of output, such as graphs or tables of hits over time or per medium.

3.5. Article sets

Article set no. 10727 Alles 19.08.-29.09. contains all articles for the automatic content analysis media selection from August 19th, 2013 to September 29th, 2013, the research period of six weeks also used in the manual content analysis.

Article set no. 12728 Alles 01.08.-29.09. contains all articles for the automatic content analysis media selection collected for this project, that is, for the period from August 1st, 2013 to September 29th, 2013.

Automatic content analysis is performed in the query section of AmCAT. Here, the selection of media and/or the time period can be restricted further if so desired.

Article set no. 12743 Mutationen 01.08.-29.09. contains all articles in ‘Mutationen’, that is, regional/local editions of a national/regional newspaper that were also featured in another edition of the same newspaper, from August 1st, 2013 to September 29th, 2013. These articles were filtered out of the article sets for automatic content analysis, which contain only one version of each article.

N.B. Only these article sets are regularly checked and kept up to date. Other article sets within the AUTNES project might not be complete and/or precise!


[1] http://lucene.apache.org/core/

4. Data cleaning and preparation

4.1. Data cleaning

After data collection was finished, the article sets in AmCAT were checked for irrelevant articles, namely articles on the German elections, real duplicates and duplicates due to regional varieties of newspapers. Additionally, we cleaned up individual articles by splitting them and removing irrelevant text.

German elections Mainly because of the name of one party (Grüne – the Green party) which is identical for the German and the Austrian party, combined with the fact that the German elections took place one week before those in Austria, our original dataset included a lot of articles about German elections. We searched for these in our data using the following search string:

(deutschland* merkel* cdu fdp csu "die union" spd "bundnis 90") NOT (spo ovp bzo fpo stronach faymann mitterlehner hundstorfer fekter bures spindelegger "heinz fischer" wien wiener leitner berlakovich lopatka glawischnig osterreich* vp nationalrat* "heinisch hosek" neos)

After checking all articles found by this search string, those mentioning the German elections without any reference to Austrian politics or the Austrian elections were removed from our data entirely.
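The logic of this filter (any German term present, no Austrian term present) can be sketched in plain Python. This is an illustrative re-implementation, not the Lucene query that was actually run in AmCAT: the helper names are hypothetical, a trailing * is treated as a prefix match, quoted phrases as consecutive tokens, and diacritics are stripped to mirror the accent-free spelling of the search strings. As in the procedure above, the result would only be a list of candidates for manual checking.

```python
import re
import unicodedata

GERMAN_TERMS = ["deutschland*", "merkel*", "cdu", "fdp", "csu", "die union", "spd", "bundnis 90"]
AUSTRIAN_TERMS = [
    "spo", "ovp", "bzo", "fpo", "stronach", "faymann", "mitterlehner", "hundstorfer",
    "fekter", "bures", "spindelegger", "heinz fischer", "wien", "wiener", "leitner",
    "berlakovich", "lopatka", "glawischnig", "osterreich*", "vp", "nationalrat*",
    "heinisch hosek", "neos",
]


def fold(text: str) -> str:
    """Lower-case the text and strip diacritics (e.g. 'Österreich' becomes 'osterreich')."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches_any(text: str, terms: list) -> bool:
    """True if at least one term (word, prefix* or multi-word phrase) occurs in the text."""
    tokens = re.findall(r"\w+", fold(text))
    joined = " " + " ".join(tokens) + " "
    for term in terms:
        if term.endswith("*"):
            if any(token.startswith(term[:-1]) for token in tokens):
                return True
        elif " " + term + " " in joined:
            return True
    return False


def is_german_only_candidate(text: str) -> bool:
    """Flag texts that mention German politics but none of the Austrian terms."""
    return matches_any(text, GERMAN_TERMS) and not matches_any(text, AUSTRIAN_TERMS)


print(is_german_only_candidate("Merkel gewinnt die Bundestagswahl in Deutschland"))
print(is_german_only_candidate("Merkel trifft Faymann in Wien"))
```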

Real duplicates Real duplicates (articles identical in date and medium and nearly identical in headline and/or text) were found by downloading a list of all articles showing their ID, medium, date, headline and first paragraph, which was manually checked for duplicates. Duplicates found in this way were discarded.
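The check described here was done manually. As an aid to such a check, candidate pairs could be pre-selected automatically; the sketch below uses Python's standard difflib module for this. The field names, the example records and the similarity threshold of 0.9 are assumptions for illustration only and are not taken from the project.

```python
from difflib import SequenceMatcher
from itertools import combinations


def duplicate_candidates(articles: list, threshold: float = 0.9) -> list:
    """Return ID pairs of articles with identical medium and date and a near-identical headline."""
    pairs = []
    for first, second in combinations(articles, 2):
        if first["medium"] != second["medium"] or first["date"] != second["date"]:
            continue
        similarity = SequenceMatcher(
            None, first["headline"].lower(), second["headline"].lower()
        ).ratio()
        if similarity >= threshold:
            pairs.append((first["id"], second["id"]))
    return pairs


# Hypothetical records: articles 1 and 2 differ only by a trailing full stop.
sample = [
    {"id": 1, "medium": "Kurier", "date": "2013-09-01", "headline": "Faymann gegen Spindelegger"},
    {"id": 2, "medium": "Kurier", "date": "2013-09-01", "headline": "Faymann gegen Spindelegger."},
    {"id": 3, "medium": "Kurier", "date": "2013-09-01", "headline": "Wetter am Sonntag"},
    {"id": 4, "medium": "Heute", "date": "2013-09-01", "headline": "Faymann gegen Spindelegger"},
]
candidates = duplicate_candidates(sample)
print(candidates)
```

Pairs found this way would still need the manual inspection described above before anything is discarded.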

Duplicates due to regional editions We tested for duplicates due to regional editions by downloading a list with all articles in media that have these editions and comparing articles that were similar in headline and length. If the articles proved to be highly similar, only one of them (selected randomly) was kept in the original article set, while the other(s) were moved to a separate ‘Mutationen’ article set. For the manually coded data, coders were also instructed to look for duplicates, and if discovered, these articles were again moved to the ‘Mutationen’ article set.

Cleaning within articles The APA database frequently considered multiple short news reports or letters to the editor as a single article. Our coders were told to list these articles, and we split them into separate articles, removing the original article.

Irrelevant text within articles (e.g. APA database copyright information) was removed using an automatic script.

Articles featuring local politicians or ex-politicians that were not coded in the manual analysis were still considered relevant for the automatic content analysis, so these were not excluded from any of the article sets.

5. Data set

5.1. Coding procedure

We used search strings to measure all variables. Lucene syntax enables complex searches and allows combining many different sub-blocks and operators. The general procedure for creating search strings was similar for all of these tasks. It combines inductive and deductive steps, including multiple tests and adaptations. A search string usually starts with basic information provided by the item of interest (names of politicians, issue categories). A second step is to collect additional keywords, synonyms and thesauri related to a person (e.g. function, party), an issue (e.g. dictionaries) or a variable (e.g. related theoretical work).

First versions of a search string were subsequently run against the data set to test precision and recall. Search terms were then added or removed, word distances were adapted and different combinations of sub-blocks and operators were employed. In some cases, we excluded keywords that may change the meaning of a search term.
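To make the notion of word distance concrete, the sketch below checks whether two terms occur close to each other, in the spirit of a proximity expression such as "bures minister*"~5. It is a deliberately simplified, order-insensitive window check written for illustration; it does not reproduce Lucene's exact slop calculation, and the function name and example sentences are assumptions.

```python
import re


def within_distance(text: str, first: str, second: str, max_gap: int) -> bool:
    """True if a token starting with `first` and a token starting with `second`
    occur with at most `max_gap` other tokens between them, in either order."""
    tokens = re.findall(r"\w+", text.lower())
    first_positions = [i for i, token in enumerate(tokens) if token.startswith(first)]
    second_positions = [i for i, token in enumerate(tokens) if token.startswith(second)]
    return any(
        abs(i - j) - 1 <= max_gap
        for i in first_positions
        for j in second_positions
        if i != j
    )


near = "Ministerin Bures sprach über den Ausbau der Bahn"
far = "Bures besuchte am Montag gemeinsam mit vielen Gästen das neue Werk, sagte der Minister"
print(within_distance(near, "bures", "minister", 5))
print(within_distance(far, "bures", "minister", 5))
```

Tightening or widening `max_gap` corresponds to the adaptation of word distances mentioned above: a smaller window tends to raise precision at the cost of recall.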

The stepwise elaboration of the search strings also involved the gathering of additional keywords by coders, who manually coded media articles according to the same coding scheme (issues, persons, some variables). Coders were asked to collect phrases and terms in order to compare these with existing search strings. This further helped to improve both precision and recall of our search strings. Also, at least three different researchers cross-checked the search terms to catch misspellings and control for logical errors. Naturally, some categories are harder to detect with search terms than others.

Table 3 gives an overview of all variables in the dataset. We included several types of variables: metadata, political actors (parties, top candidates, ministers and secretaries of state), political knowledge, style, and issues. All are explained below.

Table 3. Variables overview

Variable type Variable name Variable label

Metadata v0 Article_ID

v1 Headline

v2 Byline

v3_l1 Medium_Ebene_1

v3_l2 Medium_Ebene_2

v3_l3 Medium_Ebene_3

v4a Date

v4 Medium

v5 Seite

v6_l2 Ressort_Ebene_2

v7 Wortzahl

v0_ac amcat uniqueID

Parties vp1 SPÖ

vp2 ÖVP

vp3 FPÖ

vp4 Grüne

vp5 BZÖ

vp6 Team Stronach

vp7 Neos

vp8 KPÖ

vp9 Piraten

vp10 CPÖ

vp11 SLP

vp12 EUAUS

vp13 Männerpartei

vp14 Der Wandel

Top candidates vk1 Faymann

vk2 Spindelegger

vk3 Strache

vk4 Glawischnig

vk5 Bucher

vk6 Stronach

vk7 Strolz

vk8 Messner

vk9 Wieser

vk10 Gehring

vk11 Grusch

vk12 Marschall

vk13 Hausbichler

vk14 Mulla

Members of Government vk15 Hundstorfer

vk16 Fekter

vk17 Heinisch

vk18 Stöger

vk19 Mikl

vk20 Karl

vk21 Berlakovich

vk22 Klug

vk23 Schmied

vk24 Bures

vk25 Mitterlehner

vk26 Töchterle

vk27 Kurz

vk28 Lopatka

vk29 Schieder

vk30 Ostermayer

Political knowledge v8 Voting Age 16

V9 4% Hürde Einzug Nationalrat

Style v10 Conflict

v11 Emotionalization

v12 Language of games, sports and war

v13 Personalization

Issues vi1 Wirtschaft

vi2 Wohlfahrtsstaat

vi3 Budget

vi4 Bildung und Kultur

vi5 Sicherheit

vi6 Bundesheer

vi7 Außenpolitik

vi8 Europa

vi9 Infrastruktur

vi10 Gesellschaft

vi11 Umweltschutz

vi12 Institutionenreform

vi13 Immigration


vi99 Kein Policy Issue

5.2. Metadata

Metadata were derived from the source texts (digital newspaper articles, scraped news sites, or transcripts) automatically by AmCAT, where necessary manually corrected or



Article_ID (v0) is the ID number by which the article can be recognized and searched within AmCAT (amcat.nl).

Headline (v1) is the news story’s headline, stored as string variable.

Byline (v2) is the text below the headline in the news story, for example a sub-headline, a location or an author, depending on the layout of the source text.

Medium_Ebene_1 (v3_l1) is the media outlet each news story appeared in, categorized by media type. The table below gives an overview.

Table 4. Overview of Medium_Ebene_1

Value Label Translation N Percent Cumulative

1000 TV TV 878 2.04 2.04

2000 Radio Radio 281 0.65 2.69

3000 Tagespresse Dailies 21,648 50.32 53.01

4000 Wochenpresse Weeklies 7,054 16.4 69.41

5000 Websites Websites 13,160 30.59 100

Total 43,021 100

Medium_Ebene_2 (v3_l2) is the media outlet each news story appeared in, categorized by media genre. The table below gives an overview.

Table 5. Overview of Medium_Ebene_2.

Value Label Translation N Percent Cumulative

1100 TV (ORF) Public Service TV 593 1.38 1.38

1200 TV (Privat) Commercial TV 285 0.66 2.04

2100 Radio (ORF) Public Service Radio 281 0.65 2.69

3100 Qualitätszeitungen Elite newspapers 4,150 9.65 12.34

3200 Boulevardzeitungen Popular newspapers 6,851 15.92 28.27

3300 Midrange-Zeitungen Mid-market newspaper 2,063 4.8 33.06

3400 Bundesländerzeitungen Regional dailies 8,584 19.95 53.01

4100 Magazine Magazines 748 1.74 54.75

4200 Regionalpresse (wöchentlich) Regional weeklies 6,306 14.66 69.41

5100 Websites (Medien) Online newspapers 13,015 30.25 99.66

5200 Websites (allgem.) Online-only news sites 145 0.34 100

Total 43,021 100

Medium_Ebene_3 (v3_l3) is the media outlet each news story appeared in, corrected manually for inaccuracies.

Table 6. Overview of Medium_Ebene_3.

Value Label

1101 ZiB
1102 ZiB2
1103 ZiB20
1104 ZiB24
1105 ZiB Magazin
1201 ATV Aktuell
1202 Guten Abend Österreich
1203 Sat.1 Austria News
1204 Pro7 Austria News
2101 Ö1 Mittagsjournal
3101 Der Standard
3102 Die Presse + Die Presse am Sonntag
3103 Salzburger Nachrichten
3104 Wirtschaftsblatt
3201 Kronen Zeitung
3202 Österreich
3203 Heute
3301 Kurier
3401 Kleine Zeitung
3402 OÖ Nachrichten
3403 Tiroler Tageszeitung
3404 VN Vorarlberger Nachrichten
3405 Wiener Zeitung
3406 Kärntner Tageszeitung
3407 Neue Vorarlberger Tageszeitung
3408 Neues Volksblatt
3409 TT Kompakt
3410 SV Salzburger Volkszeitung
4101 News
4102 Format
4103 Falter
4104 Profil
4105 Die ganze WOCHE
4211 Woche Graz
4212 Woche Obersteiermark
4213 Woche HBZ
4214 Woche Weiz
4215 Woche Südweststeiermark
4216 Woche Murtaler Zeitung
4217 Woche Bildpost
4218 Woche Kärnten
4219 Salzburger Woche
4220 Rundschau - Oberländer Wochenzeitung
4230 NÖN Niederösterreichische Nachrichten
4240 derGrazer
4250 BVZ - Burgenländische Volkszeitung
5101 orf.at
5102 derstandard.at
5103 diepresse.com
5104 salzburg.com
5105 krone.at
5106 oe24.at
5107 heute.at
5108 kurier.at
5109 kleinezeitung.at
5110 nachrichten.at
5111 tt.com
5112 vol.at
5113 vienna.at
5114 news.at
5115 noen.at
5201 news.google.at
5202 gmx.at

Note: for the frequencies per outlet, see table 2.
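Judging from the value codes in Tables 4 to 6, the three medium levels appear to be nested: the thousands digit of a Medium_Ebene_3 code gives the Medium_Ebene_1 category and the hundreds digit gives the Medium_Ebene_2 category. This nesting is an inference from the tables rather than something stated in the documentation, and the helper name below is hypothetical. Under that assumption, the two coarser levels can be derived from the outlet code as follows.

```python
def medium_levels(code_level3: int) -> tuple:
    """Derive the (Medium_Ebene_1, Medium_Ebene_2) codes from a Medium_Ebene_3 code,
    assuming the codes are nested by thousands and hundreds."""
    level1 = code_level3 // 1000 * 1000
    level2 = code_level3 // 100 * 100
    return level1, level2


# 3101 (Der Standard) -> 3000 (Tagespresse), 3100 (Qualitätszeitungen)
print(medium_levels(3101))
# 4230 (NÖN) -> 4000 (Wochenpresse), 4200 (Regionalpresse, wöchentlich)
print(medium_levels(4230))
```

Since v3_l1 and v3_l2 are already part of the dataset, such a derivation is mainly useful as a consistency check.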

Date (v4a) is the date on which the news story was published or broadcast, saved as string variable.

Medium (v4) is the name of the outlet that published or broadcast the story. This variable is not corrected for inaccuracies, so we suggest using Medium_Ebene_3 instead.

Seite (v5) is the number of the page on which a news story appeared.

Ressort_Ebene_2 (v6_l2) is the news category in which the news story appeared (the categories used differ across media, therefore an overview would take too much space and is omitted).

Wortzahl (v7) is the length of the news story in words. The headline and byline are not taken into account.

AmCAT uniqueID (v0_ac) is the unique ID of the article across all AmCAT servers.

6. Actors (parties, top candidates and members of government)


Actor variables measure whether an actor is present in an article (coded as 1) or not (coded as 0).

6.1. Parties (vp1-14)

For measuring the saliency of the political parties that took part in the Austrian National Election of 2013, we used the following search strings:

spo# (spo OR sp OR ("partei sozialdemo*"~5) OR sozialdemokrat* )

ovp# ovp OR vp OR volkspartei OR "partei spindelegger"~5 "die schwarzen" "den schwarzen"

fpo# fpo OR freiheitlich* OR fp OR ("partei (freiheitlich* OR Strache*)"~5)

grune# ("grune partei" "die grunen" "die grune" "der grunen" " der grune" "die grune partei" "grune alternative" "okopartei" grunpolitiker* "grun politiker*" "den grunen" "tiroler grune*" "wiener grune*" "grune* tirol" "grune* wien" "grune* burgenland" "burgenland* grune*" "grune* kartnen" "karntner grune*" "grune* salzburg" "salzburger grune*" "grune* oberosterreich" "oberosterreichische* grune*" "grune* steiermark" "steiermarkische* grune*" "grune* vorarlberg" "vorarlberger grune*" "grune* niederosterreich*" "niederosterreichische* grune*") OR ("grune* (spitzenkandidat* vorstandssprecher* landtagsabgeordnet* fpo* freiheitl* blau* spo sp sozialdemokrat* sozialist* rot* ovp vp volkspartei schwarz* neos lif lf kpo kommunist* chef* vorsitz* glawischnig sprecher* bundesvorstand* bundessprecher* landesvorstand* parlament* regierung* opposition* partei* bundnis* fraktion klub* vize* burgermeister stadtrat* landesrat* bereichssprecher* referent* landespart* ortsgrupp*)"~5) OR "(grune)"

bzo# bzo OR "bundnis zukunft osterreich" "die orangen" "den orangen" "der orangen"

teamstronach# stronach*

neos# neos OR "das neue osterreich" OR "wahlplattform lif"~5

kpo# kpo OR "kommunistisch* partei osterreich"~3 kommunisten

piraten# (piratenpartei piraten) NOT deutschland*

cpo# ("christliche partei" cpo "die christen") NOT (norwegen syrien "hass gegen die christen" "syri* christliche partei"~5 "norweg* christliche partei"~5)

slp# "sozialistisch* link* partei" OR "slp"

euaustrittspartei# "eu austrittspartei" OR ("partei eu austritt"~1)

mannerpartei# "mannerpartei"

wandel#("der wandel" AND (kleinpartei* partei* wahl* nationalratswahl* mulla))

Values: 1 = present; 0 = absent.
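The 0/1 coding can be illustrated with a small Python sketch for three of the simplest party search strings (teamstronach#, the single-word part of neos#, and mannerpartei#). It is a toy re-implementation under stated assumptions, not the AmCAT/Lucene procedure used to build the dataset: only single-word terms are handled, a trailing * is treated as a prefix match, and diacritics are stripped so that text such as "Männerpartei" matches the accent-free search term. The variable names follow Table 3.

```python
import re
import unicodedata

# Subset of the party search strings, reduced to their single-word terms.
PARTY_TERMS = {
    "vp6": ["stronach*"],      # teamstronach#
    "vp7": ["neos"],           # neos# (single-word part only)
    "vp13": ["mannerpartei"],  # mannerpartei#
}


def fold(text: str) -> str:
    """Lower-case the text and strip diacritics."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def code_parties(text: str) -> dict:
    """Return a dict of party variables coded 1 (present) or 0 (absent)."""
    tokens = re.findall(r"\w+", fold(text))
    coded = {}
    for variable, terms in PARTY_TERMS.items():
        present = any(
            token.startswith(term[:-1]) if term.endswith("*") else token == term
            for term in terms
            for token in tokens
        )
        coded[variable] = int(present)
    return coded


result = code_parties("Die Männerpartei und Frank Stronachs Team im Wahlkampf")
print(result)
```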

Precision and recall values for these search strings were as follows:

Table 7. Precision and recall for the search strings for parties.

Party Precision Recall

SPÖ 1 0.92
ÖVP 0.96 0.94
FPÖ 1 0.96
Grüne 0.88 0.96
BZÖ 1 1
Team Stronach 1 1
Neos 1 0.86
KPÖ 1 1
Piraten 0.98 0.88
CPÖ 0.98 0.92
SLP 0.98 0.82
EU-AUS 1 0.9
Männerpartei 0.98 1
Der Wandel 0.98 1


6.2. Top candidates (vk1-14)

For measuring the saliency of top candidates for each party, we used the following search strings:

Faymann# faymann*

Spindelegger# spindelegger* spindi

Strache#strache*

Glawischnig#(glawischnig* )NOT "glawischnig dieter"~3

Bucher# "bucher* ( sepp* OR josef OR bzo OR chef OR bundnischef OR obmann OR bundnisobmann OR parteichef OR faymann OR spindelegger OR strache OR glawischnig OR stronach)"~5

Stronach# stronach* NOT ("stronach (partei* gruppe* team* mitarbeit* klub* parlaments* landes* bezirks* bundes* landtags* burgermeister* spitzenkand* abgeordnet* klubobmann* klubchef* liste )"~1 "stronach (bundniss* koalition*)"~3)

strolz# "strolz (matthias OR neo* obmann* spitzenmann* parteichef* boss* partei* spitzenkandidat* vorsitz* chef* bundessprecher*)"~10

messner#( "messner* (mirko OR chef* OR spitzenkand* OR obmann* OR vorsitz* OR kpo kommunist* bundessprecher* spitzenmann* parteichef*)"~5)

wieser#("wieser* (mario pirat* obmann* spitzenmann* parteichef* boss* partei* gruppe* team* spitzenkandidat* vorsitz* chef*)"~5)

gehring# "gehring* (rudolf OR christen OR cpo OR obmann* spitzenmann* parteichef* boss* partei* spitzenkandidat* vorsitz* chef* bundessprecher*)"~5 OR ("gehring christlich* partei"~5)

grusch# "grusch* (sonja OR slp OR linkspartei obfrau* spitzenfrauparteichef* boss* partei* spitzenkandidat* vorsitz* chef* bundessprecher*)"~5

marschall# "marschall* (robert OR austrittspartei chef* OR spitzenkand* OR obmann* OR vorsitz* OR kpo kommunist* bundessprecher* spitzenmann* parteichef*)"~5

hausbichler# "hausbichler* (hannes OR mannerpartei* obmann* spitzenmann* parteichef* boss* partei* spitzenkandidat* vorsitz* bundessprecher*)"~5 ("hausbichler partei mann*"~5)

mulla# "mulla (wandel OR fayad)"~5

Values: 1 = present; 0 = absent.

Precision and recall values for these search strings were as follows:

Table 8. Precision and recall for the search strings for top candidates.

Top candidate Precision Recall Top candidate Precision Recall

Faymann 1 0.93 Messner 1 1

Spindelegger 1 0.94 Wieser 0.98 1

Strache 1 1 Gehring 1 1

Glawischnig 1 1 Grusch 1 1

Bucher 1 1 Marschall 1 1

Stronach 0.98 0.64 Hausbichler 1 1

Strolz 1 1 Mulla 1 1

6.3. Members of Government (vk15-30)

For measuring the saliency of the members of Government (Ministers and Secretaries of State) we used the following search strings:

hundstorfer# hundstorfer*

fekter# fekter*

heinischh# ("heinisch (gabriele OR frauenminister* OR sp OR spo OR beamtenminister* OR minister* hossek hosek)"~5)

stoger# ("stoger* (alois OR gesundheitsminister* OR sp-* OR spo* OR bundesminister* minister* regierung*)"~5 NOT "peter stoger")

Mikl Leitner# "mikl* (johanna OR innenministerin* OR bundesministerin* OR vp OR ovp volkspartei vp-*leitner)"~5

karl# "karl (beatrix OR justizminister* OR minister* OR bundesminister*)"~5

berlakovich# "nikolaus b" OR ("Berlakovich (nikolaus OR umweltminister* OR agrarminister* OR landwirtschaftsminister* OR minister* OR niki OR vp-* OR vp OR volkspartei OR ovp* biene* pestizid* agrar* umwel* landwirtschaft* bauer*)"~5)

klug# "klug (gerald OR verteidigungsminister* OR sportminister* OR minister* OR sp OR spo)"~5

schmied# "schmied (claudia OR bildungsminister* OR kulturminister* OR minister* OR unterrichtsminister* OR sp-* OR spo* gesamtschul*)"~5

bures# bures NOT "bures radim"~5

mitterlehner# ("mitterlehner (reinhold OR wirtschaftsminister* OR minister* OR familienminister* OR vp OR vp-* OR ovp* volkspartei energieminister*)"~5)

Tochterle#("tochterle karl heinz"~5) OR "tochterle (wissenschaftsminister* OR bundesminister* minister* OR forschungsminister* OR ovp* OR vp OR vp-* volkspartei karlheinz)"~5

kurz# "kurz (sebastian OR intergrationsstaatsekretar* OR staatssekretar* ovp* jvp* volkspartei* vp vp-*)"~5

lopatka# "lopatka (reinhold OR staatssekretar* OR vp OR vp-*OR volkspartei OR ovp)"~5

schieder# "schieder (andreas OR staatssekretar* OR finanzstaatssekretar* OR sp OR sp-*OR spo* sozialdemokrat*)"~5 "schieders (andreas OR staatssekretar* OR finanzstaatssekretar* OR sp OR sp-*OR spo* sozialdemokrat*)"~5

ostermayer# "ostermayer (josef OR staatssekretar* OR medienstaatssekretar* OR sp OR sp-* OR sozialdemokrat* OR spo)"~5 "ostermayers (josef OR staatssekretar* OR medienstaatssekretar* OR sp OR sp-* OR sozialdemokrat* OR



Values: 1 = present; 0 = absent.

Precision and recall values for these search strings were as follows:

Table 9. Precision and recall for the search strings for members of Government.

Minister Precision Recall Secretary of State Precision Recall

Hundstorfer 1 1 Kurz 0.96 1

Fekter 1 1 Lopatka 1 1

Heinisch Hosek 1 1 Schieder 1 0.99

Stöger 1 1 Ostermayer 1 0.91

Mikl Leitner 1 0.78

Karl 1 0.75

Berlakovich 0.96 1

Klug 1 0.83

Schmied 1 1

Bures 1 1

Mitterlehner 1 1

Töchterle 1 0.84

6.4. Political knowledge

Political knowledge variables code whether three political knowledge facts are present (coded as 1) or absent (coded as 0) in an article. We measured the following political knowledge facts:

Voting Age (v8)

This political knowledge fact concerns Austria’s minimum voting age (sixteen years old). Its search string is as follows:

wahlalter16#("wahlalter* 16"~10 "wahlberechtig* 16"~10 "wahlalter* gesenk*"~10 "wahlalter* senk*"~10 "erstwahl* 16"~10 "erstwahl* gesenk*"~10 "erstwahl* senk*"~10 "wahlen 16"~5 "wer spatestens am Wahltag 16 Jahre alt ist" "ab 16 jahren ist man dabei" "wahl* 16 lebensjahr"~5) NOT (wahlalterna* "16 prozen*" senkungsidee*)

Values: 1 = present; 0 = absent.

Precision: 0.83 Recall: 0.89

Threshold of votes for entering parliament (v9)

This political knowledge fact concerns the threshold for Austrian political parties to enter Parliament (four percent). Its search string is as follows:

Vierprozenthurde#( vierprozenthurde* sperrklausel* ("(hurde* mindest* klausel* sperr* einzug* einzieh* einzuzieh* eingezog* erreich* schaff* uberspring*) (parlament* nationalrat* nr ) (vier 4) prozent*"~5) )

Values: 1 = present; 0 = absent.

Precision: 0.8, Recall: 0.8

Appointment of prime minister (vx)

This political knowledge fact concerns the procedure by which the Prime Minister is appointed. Because this procedure was not mentioned in the news, we excluded the variable from our dataset. Its search string is as follows:

("(ern?nn* beruf* bestimm*) (bundeskanzler* kanzler* regierungschef* regierungsspitze* regierung) (staatsoberhaupt* prasident* bundesprasident* fischer*)"~5)

No mention throughout the entire election period.

6.5. Style

Style variables are about the style in which an article is written, or an item is presented. The first two variables, conflict and language of games, sports

Conflict (v10)

konflikt#((konflikt* unvereinbarkeit* kritisier* kritik* intervention* intervenier* weiterkampf* widerstand* "wehr* gegen"~5 "vorwurf gegen" protest* offensiv* angriff* frontalangriff* widerspruch* "greif* an"~10 "griff* an"~10 kampf* attack* angegriffen* streit* gestritten*)) NOT "(offensiv* angriff* kampf* attack* streit* gestritten*) (syrien* syrisch* agypt* djihad* iran* sudan* sudsudan* nordkorea* un uno saudi* staatengemeinschaft* sicherheitsraat* terroris* kaida qaida pakistan sudan* jemen* libyen lybisch* somali* eritre* kongo* elfenbeinkust* ivor* ivoire hamas hisbollah miliz gotteskrieger taliban*)"~25

Values: 1 = present; 0 = absent.

Precision: 0.86 Recall: 0.86

Emotionalization (v11)

Emotionalization is operationalized as a continuous variable that represents the proportion of emotion words in a news story.²

We used the sentiment lexicon Sentilex (Wolf et al. 2008) to determine which words are emotion words. After slightly altering the lexicon to fit the syntax used in AmCAT, we created a search string from the words in the lexicon. This search string is not published here for reasons of space.

Minimum: 0 Maximum: 1 Mean: 0.433

Precision: 0.90 Recall: 0.89
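The proportion described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually run in AmCAT: the three wildcard patterns are a hypothetical stand-in for the unpublished lexicon-based search string, the function name is invented, and the choice to count hits in headline and body while using only the body as article length is one reading of the footnote on values above 1.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical stand-in for the (unpublished) lexicon-based search string.
EMOTION_PATTERNS = ["angst*", "freud*", "wut*"]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def emotionalization(headline: str, body: str,
                     patterns: list[str] = EMOTION_PATTERNS) -> float:
    """Emotion-word hits in headline and body, divided by body length only.

    Because the headline is excluded from the denominator (assumption based
    on the footnote), the result can exceed 1 for very short articles.
    """
    body_tokens = tokenize(body)
    all_tokens = tokenize(headline) + body_tokens
    hits = sum(
        1 for tok in all_tokens
        if any(fnmatchcase(tok, p) for p in patterns)
    )
    if not body_tokens:
        return 0.0  # no body text: treat as no measurable emotionalization
    return hits / len(body_tokens)
```

With a four-word body containing one emotion word and an empty headline, the function returns 0.25; an article whose headline carries more emotion words than its body has words would yield a value above 1.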

Language of games, sports and war (v12)

Games# ((verspiel* zweikampf* dreikampf* angriff* angegriff* angreif* vergeltung* terror* nahkampf* wortgefecht* scharmutzel* gefecht* krieg kriege kriegerisch* arsenal* giftschrank* aufmarsch* gegenschlag* uberraschungsangriff* uberraschungsschlag* armada attentat* bombig* bombadier* bombardier* explosiv* propaganda* neutralisier* konter kontern gekontert kontert* konterattack* konterangriff* feldzug* sprengkraft sprengstoffsprengt* schlachtfeld* fahnd* offensiv* defensiv* hinterhalt* schlacht hauptquartier massaker* massakrier* kampf* gekampft* kanone* schiess* geschoss* zuschlagen zugeschlagen* zuschlug zuschlagt anschlag* kommando* geheimwaffe* belager* kampf* krise* regierungskrise* koalitionskrise* attacke* wahlschlacht* wahlkampf* himmelfahrtskommando* kreuzzug* blitzkrieg* vebunde* verteidigung verteidigt* schutzengr?ben* geschutz mobilisier* erober* vormarsch* bekampf* keule verbotskeule richtungsk?mpf* putsch* widerstand* feind* mitstreiter* rebellisch* aufstand aufruhr* intervention* revolte* palastrevolte* gerangel krawall* scharmutzel* konfrontation* konfrontier* frontalangriff* kampagne* kampagnisier* aufbegehr* einschiess* eingeschoss* entscharf* stichel* heckenschutze* abschiess* abgeschoss* querschiess* querschuss* querschutze* front fronten koalitionsfront* regierungsfront* oppositionsfront* grabenkampf* gelandegewinne vorstoss* ruckendeckung* deckung* storfeuer* durchbruch* abruck* ruckzugsgefecht* verluste* ruckzug vordringen* fuhrung* positionspapier* bollwerk* manover* taktik* taktisch* wahltaktisch* strategie* strategisch* wahlstrategi* kapitulation* kapitulier* flucht fluchtet* gefluchtet* stormanover* giftpfeil* blockad* blockier* mobilisier* rekrutier* flankier* ruckendeck* mitstreiter* bollwerk* schlachtschiff* machtkampf* kampfkandidat* kampfabstimm* uberflugel* vorbeizieh* vorbeigezog* tauchstation* abtauch* abgetaucht* favorit* aussenseiter* outsider* geheimfavorit* rochade* personalrochade* rochier* todesspirale* foul* finale* angezahlt* anzahl* angeschlagen* triumph* pyhrrus* steilpass* steilvorlage* matchwinner* eigentor* powerplay arena wahlkampfarena duell* kanzlerduell* gladiator* auferstehung* opfergang* aufstachel* aufgestachel* fair* unfair* gewinn* sieger* verlierer* schlappe uberrunde* bruchlandung* angeschlagen* angezahlt* startschuss* startrampe* startlocher*) ("(sturm ansturm) auf") ("war room") ("aus volle* rohr*") ("(ins im) (abseits aus)") ("rote karte") ("(auflage vorlage) verwandel*") ("(fest sicher) im sattel") ("auf der strecke (bleib* blieb geblieb*)") ("kopf an kopf") ("ins ziel (kommen gekommen* kam kommt)") ("ins spiel (bringen brachte* gebracht* bringt)") ("stellung (einnehmen eingenommen einnahm beziehen bezog* halten gehalten hielt)") NOT (syrien* syrisch* agypt* djihad* iran* sudan* sudsudan* nordkorea* un uno staatengemeinschaft* sicherheitsraat* terroris* kaida qaida pakistan* sudan* jemen* libyen lybisch* somali* eritre* kongo* elfenbeinkust* ivor* ivoire hamas hisbollah miliz gotteskrieger taliban*))

The search string was built from examples in the codebooks for manual coding of this variable (see Kleinen-von Königslöw et al. 2015) and by consulting thesauri and dictionaries.

Values: 1 = present; 0 = absent.

Precision: 0.94, Recall: 0.79

² Because the headline of the article is not counted for the article length, there were five cases where the degree of emotionalization exceeded 1 (100%). We kept these values in the dataset without recoding.

6.6. Personalization (top candidate focus) (v13)

Personalization is operationalized as a continuous variable that represents the proportion of news stories mentioning a top candidate out of all stories mentioning a party name (including the top candidate, if his or her party is mentioned), for the seven largest parties participating in the 2013 election. We measured this using the search strings for these parties and candidates, added up the results for all parties and all candidates, and divided the number of stories mentioning a top candidate by the number of stories mentioning a party. The resulting number between 0 and 1 represents the degree of personalization.

Minimum: 0 Maximum: 1 Mean: 0.113

Precision: 0.98 Recall: 0.95
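One possible reading of this calculation is sketched below in Python. It is an interpretation, not the authors' documented procedure: the function name, the dictionary layout, and the variable names are hypothetical, and the sketch assumes that a candidate mention counts only in stories where his or her party is also mentioned, which keeps the result between 0 and 1.

```python
from typing import Optional


def personalization(articles: list[dict],
                    party_to_candidate: dict[str, str]) -> Optional[float]:
    """Candidate mentions divided by party mentions, summed over all parties.

    Each article is a dict of 0/1 codes keyed by variable name (hypothetical
    layout). A candidate is counted only where the party is coded present.
    """
    party_mentions = 0
    candidate_mentions = 0
    for article in articles:
        for party, candidate in party_to_candidate.items():
            if article.get(party, 0) == 1:
                party_mentions += 1
                if article.get(candidate, 0) == 1:
                    candidate_mentions += 1
    if party_mentions == 0:
        return None  # undefined when no party is mentioned at all
    return candidate_mentions / party_mentions
```

For example, three party mentions of which one also names the party's top candidate would give a personalization score of one third.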

6.7. Issues (vi1-vi14; vi99)

The search strings for issues reflect the issue categories from the manual coding of this variable. Coder instructions (see Kleinen-von Königslöw et al. 2015) were used to create these search strings. We further relied on collections of issue-specific keywords from our coders. Further broadening was achieved by consulting thesauri and dictionaries and by extensive test applications.

Please note that the search strings do not reflect the same logic as the manual content analyses. A hit only indicates that an issue occurs in an article. For technical reasons, some of the issue search strings only work when called through AmCAT's R interface. We therefore generally recommend using R for AmCAT issue queries. While we deliver search strings for most substantive issues on AUTNES level 3, we only validated them on level 1 (the most general level). We therefore advise users to validate issues on more fine-grained levels according to their specific use.

Values: 1 = present; 0 = absent.

Precision: 0.82 Recall: 0.80


7. Validation procedure

In order to validate our search strings, we calculated their precision and recall, either on the level of individual search strings (for those search strings that measure one of our variables directly) or on an aggregated level (for those search strings that measure whether an issue is mentioned in an article or not). For measuring precision, we took a random sample of 50 articles from the search results of each search string and coded these manually for the presence or absence of our variables. If the variable is present in all articles, precision is 100%. For measuring recall, we drew a random sample of 100 articles from our total population of articles and coded these manually for the presence or absence of our variables. Then, we compared this manual coding to the results of a semi-automatic analysis using search strings on the same articles.
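The two measures can be sketched in Python as follows. This is a minimal illustration assuming the standard definitions of precision and recall; the function names and the list-of-codes input format are hypothetical and do not describe the tooling actually used.

```python
from typing import Optional


def precision(manual_codes_of_hits: list[int]) -> float:
    """Share of sampled search hits that manual coding confirms.

    Input: manual 0/1 codes for a sample drawn from the search results,
    e.g. the 50 articles per search string described above.
    """
    return sum(manual_codes_of_hits) / len(manual_codes_of_hits)


def recall(manual: list[int], automatic: list[int]) -> Optional[float]:
    """Share of manually coded positives that the search string also found.

    Input: manual and automatic 0/1 codes for the same articles, e.g. the
    random sample of 100 articles from the total population.
    """
    positives = sum(manual)
    if positives == 0:
        return None  # undefined when the concept never occurs in the sample
    found = sum(1 for m, a in zip(manual, automatic) if m == 1 and a == 1)
    return found / positives
```

Under these definitions, 48 confirmed hits in a sample of 50 correspond to a precision of 0.96, and a search string that finds three of four manually coded occurrences has a recall of 0.75.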

For the political knowledge variables, which occurred only very rarely, we used strategic samples for the validation process, that is, samples that were more likely to contain the concept than a random selection of articles. For each variable, we first created a sample that was likely to contain the concept in around half of the articles (e.g. for Voting Age, we used the search term "wahl* 16"~10 OR wahlalter OR erstwahl), and drew a random sample of 100 articles from the result.


8. References

Davis, J; Goadrich, M (2006). "The relationship between Precision-Recall and ROC curves." Pp. 233-40 in 23rd International Conference on Machine Learning, June 25-29. Pittsburgh, Pennsylvania.

Fawcett, T (2006). An Introduction to ROC analysis. Pattern Recognition Letters 27: 861-74.

Kleinen-von Königslöw, K; Vonbun, R; Eberl, J-M; Haselmayer, M; Jacobi, C; Schönbach, K; Boomgaarden, H (2015). AUTNES Manual Content Analysis of the Media Coverage 2013 - Documentation. Wien: Universität Wien.

Van Atteveldt, W (2008). Semantic Network Analysis: Techniques for Extracting, Representing and Querying Media Content. Charleston: BookSurge.

Wolf, M; Horn, A B; Mehl, M R; Haug, S; Pennebaker, J W; Kordy, H (2008). "Computergestützte quantitative Textanalyse: Äquivalenz und Robustheit der deutschen Version des Linguistic Inquiry and Word Count." Diagnostica 54(2): 85-98. doi: 10.1026/0012-1924.54.2.85.

