ÖSTERREICHISCHE NATIONALE WAHLSTUDIE / AUSTRIAN NATIONAL ELECTION STUDY
Institut für Publizistik- und Kommunikationswissenschaft
AUTNES Automatic Content Analysis of the Media Coverage 2013
Documentation Version 1.0.0
Martin Haselmayer, Carina Jacobi, Jakob-Moritz Eberl, Ramona Vonbun
Katharina Kleinen-von Königslöw, Klaus Schönbach, Hajo G. Boomgaarden
Vienna, May 2016
http://www.autnes.at
Contents
1. Introduction
   Acknowledgement of the data
   Conditions of use
   Restrictions
   Confidentiality
   Deposit Requirement
2. Study description
   Funding
   Data file name
   Keywords
3. Study design
   Media sample
   Article selection
   Comparison of the automatic and manual datasets
   AmCAT
   Article sets
   Data cleaning and preparation
   Data cleaning
4. Data set
   Coding procedure
   Metadata
   Actors (parties, top candidates and members of government)
   Parties
   Top candidates
   Members of Government
   Political knowledge
   Style
   Issues (vi1-vi14; vi99)
   Validation procedure
References
Introduction
This documentation report accompanies the data of the AUTNES Automatic Content Analysis of the 2013 Austrian National Election Coverage. The election was held on 29 September 2013. The body elected was the National Parliament (Nationalrat). This document is divided into four parts. The first part contains a description of the AUTNES Media Side study and, more specifically, its automatic content analysis part. The second part addresses the conditions of use of the AUTNES automatic content analysis data. The third part contains a description of the study design: the media sample, data cleaning strategies, and how to find the full articles in our sample online in AmCAT (amcat.nl) article sets. A separate section addresses how this sample compares to the AUTNES manual content analysis dataset. The final part addresses the semi-automatic analysis of these articles: the method of creating search strings and a description of each (search-string-based) variable in the AUTNES automatic content analysis dataset, including the search strings and their precision and recall values.
Acknowledgement of the data
Data users are kindly asked to acknowledge the data and the accompanying release document. Please refer to the GESIS data catalogue (www.gesis.org) for a recommendation on how to cite these data and the documentation.
1. Conditions of use
Restrictions
The data are available for non-profit use without restrictions.
Confidentiality
AUTNES, the Principal Investigators and the funding institution bear no responsibility for the use of the data, or for interpretations or inferences based on their use; nor do they accept liability for indirect, consequential or incidental damages or losses arising from use of the data collection.
Deposit Requirement
In order to facilitate exchanges within the scientific community and to provide funding agencies with essential information about the use of archival resources, users of the AUTNES data are requested to notify the AUTNES team of all forms of publications referring to AUTNES data.
2. Study description
The data set was created through an automatic analysis, based on search strings, of all media reports referring to either a political party or a political candidate running in the Austrian national election between August 18th and September 29th, 2013 (= election day).
The data collection, cleaning and analysis were conducted between August 18th, 2013 and January 31st, 2015 by Martin Haselmayer and Carina Jacobi, using the online content analysis platform AmCAT (amcat.nl). Principal investigators were Klaus Schönbach, Hajo G. Boomgaarden, and Katharina Kleinen-von Königslöw.
Funding
The study was carried out under the auspices of the Austrian National Election Study (AUTNES), a National Research Network (NFN) sponsored by the Austrian Science Fund (FWF) (S10903-G11).
Data file name
ZA5865_ger_v1-0-0.dta ZA5865_ger_v1-0-0.sav
Stata-Format, 43021 Cases, 76 Variables.
Keywords
Election coverage, online media, print media, television news, political issues, political parties, political actors, emotionalization.
3. Study design
3.1. Media sample
The table below gives an overview of the media outlets included in the dataset.
Table 1. Media overview
Private TV: ATV Aktuell 19:20, Pro7 AustriaNews, Sat1
Public service TV: ZIB 20, ZIB 24, ZIB Magazin, Zeit im Bild 1, ZiB 2
Public service radio: Ö1 Mittagsjournal 12:00
Magazines: Die ganze Woche, Falter, Format, News, profil
Elite newspapers: Der Standard, Die Presse, Die Presse am Sonntag, Salzburger Nachrichten, WirtschaftsBlatt
Popular newspapers: Heute, Kronen Zeitung, Österreich
Mid-market newspaper: Kurier
Regional dailies: Kärntner Tageszeitung, Kleine Zeitung, Neue Vorarlberger Tageszeitung, Neues Volksblatt, Oberösterreichische Nachrichten, Salzburger Volkszeitung, Tiroler Tageszeitung, TT-Kompakt, Vorarlberger Nachrichten, Vorarlberger Tageszeitung, Wiener Zeitung
Regional weeklies: BVZ, derGrazer, NÖ Nachrichten, NÖN Landeszeitung, Oberländer Rundschau, Salzburger Woche, WOCHE Bildpost, WOCHE Graz, WOCHE Hartberg, WOCHE Kärnten, WOCHE Murtaler Zeitung, WOCHE Obersteiermark, WOCHE Südweststeiermark, WOCHE Weizer Zeitung
Online newspapers: derstandard.at, diepresse.com, heute.at, kleinezeitung.at, krone.at, kurier.at, nachrichten.at, news.at, noen.at, oe24.at, orf.at, salzburg.com, tt.com, vienna.at, vol.at
Online-only news site: gmx.at
3.2. Article selection
To collect our data, articles/transcripts for print newspapers and public television newscasts were downloaded from the database of the Austria Presse Agentur (APA, http://www.apa-defacto.at/Site/index.de.html). For this, we used the following search string, which contains the names of all Austrian political parties participating in the National Election of 2013, the names of the chancellor, ministers and secretaries of state, and the Nationalrat:
(nationalrat* ODER spö ODER övp ODER vp ODER "die grünen" ODER glawischnig ODER fpö ODER bzö ODER "bündnis zukunft österreich" ODER piratenpartei ODER "die piraten" ODER "den piraten" ODER stronach: ODER faymann ODER spindelegger ODER (hundstorfer: UND (*minister* ODER bundesvertreter*)) ODER "rudolf hundstorfer" ODER "hundstorfer rudolf" ODER fekter: ODER "heinisch hosek" ODER "heinisch hoseks" ODER (stöger: UND *minister*) ODER "alois stöger" ODER "stöger alois" ODER "mikl leitner" ODER "mikl leitners" ODER (karl: UND (justizministerin ODER ministerin*)) ODER "beatrix karl" ODER "karl beatrix" ODER (berlakovich: UND *minister*) ODER "nikolaus berlakovich" ODER "berlakovich niklaus" ODER "niki berlakovich" ODER "berlakovich niki" ODER (klug: UND *minister*) ODER "gerald klug" ODER "klug gerald" ODER (schmied: UND *minister*) ODER "claudia schmied" ODER "schmied claudia" ODER (bures UND *minister*) ODER "doris bures" ODER "bures doris" ODER (mitterlehner: UND *minister*) ODER "reinhold mitterlehner" ODER "mitterlehner reinhold" ODER (t:chterle: UND *minister*) ODER "karlheinz t:chterle" ODER "t:chterle karlheinz" ODER "heinz fischer" ODER (fischer UND (Nationalratspräsident ODER staatsoberhaupt ODER bundespräsident*))) ODER neos ODER kpö ODER "kommunistische partei österreich*" ODER "österreich* kommunist*" ODER "slp" ODER "sozialitisch* linkspartei" ODER "josef bucher" ODER "sepp* bucher" ODER "bucher sepp*" "bucher joseph" ODER "strache*" ODER "matthias strolz" ODER "strolz matthias" ODER "mirko messner" ODER "messner mirko" ODER "rot-grün:" ODER "schwarz-grün:" ODER "die grüne" ODER "der grüne" ODER "der grünen" ODER ökopartei ODER grünpolitiker* ODER "grün politiker*" ODER "den grünen" ODER "tiroler grüne*" ODER "wiener grüne*" ODER "grüne* tirol" ODER "grüne* wien" ODER "grüne* burgenland" ODER "burgenländ* grüne*" ODER "grune* kärnten" ODER "kärntner grüne*" ODER "grüne* salzburg" ODER "salzburger grüne*" ODER "grüne* oberösterreich" ODER 
"oberösterreichische* grüne*" ODER "grüne* steiermark" ODER "steiermarkische* grüne*" ODER "grüne* vorarlberg" ODER "vorarlberger grüne*" ODER "grüne* niederösterreich*" ODER "niederösterreichische* grüne*"
Data were downloaded from the APA databank on a daily basis and uploaded to AmCAT in weekly article sets, separately for television and print newspapers.
For commercial television, one of our coders checked every included program for relevant items, that is, items mentioning at least one political party, a candidate or the Nationalrat. The relevant programs were transcribed by student assistants and uploaded to AmCAT as one article set per program.
Articles from online newspapers were scraped using custom scrapers programmed by a student assistant. The scrapers' output was compared to the websites daily, and any missing data were added later. Each website’s content was scraped the morning after it was published, with the exception of orf.at, which was scraped every hour because its content changes rapidly. The scraped articles were added to AmCAT. In this process, a deduplicator automatically discarded repeated copies of articles that were featured on a website for more than one round of scraping, based on medium, date/time, headline/byline and article text. The byline comparison included the line indicating the regional edition for certain newspapers, so similar articles appearing in different regional editions were kept in the database. When scraping was finished for the entire campaign period, we selected the relevant articles using the following search string:
Politik_Allgemein# nationalrat* spo ovp vp "die grunen" glawischnig fpo bzo piratenpartei "die piraten" "den piraten" stronach* faymann spindelegger spindi "hundstorfer (ministe* bundesvertrete*)"~10 "rudolf hundstorfer"~2 fekter* fekter "heinisch hose*"~2 "(stoger stoger?) ministe*"~10 "alois (stoger stoger?)"~2 "mikl leitner" "(karl karl?) (justizministerin ministeri*)"~10 "beatrix karl"~2 berlakovich berlakovich? "(klug? klug) ministe*"~10 "gerald klug"~2 "(schmied? schmied) minister*"~10 "claudia schmied"~2 "bures minister*"~5 "doris bures"~2 "(mitterlehner? mitterlehner) ministe*"~10 "reinhold mitterlehner"~2 "mitterlehner? reinhold?"~2
"(tochterle tochterle?) ministe*"~10 "karlheinz tochterle"~2 "tochterle? karlheinz?"~2 "heinz fischer"~2 "heinz? fischer?"~2 "(fischer fischer?) (nationalratsprasident nationalratsprasident? staatsoberhaupt staatsoberhaupt? bundesprasident* bundesprasident)"~10 "staatssekretar kurz"~2 "staatssekretar? kurz?"~2 "staatssekretar? ostermayer?"~2 "staatssekretar ostermayer"~2 "staatssekretar? lopatka?"~2 "staatssekretar lopatka"~2 "staatssekretar schieder"~2 "staatssekretar? schieder?"~2 neos kpo
"kommunistische partei osterreic*" "osterreic* kommunis*" "josef bucher"~2 "sepp* bucher" "josef? bucher?"~2 strache strache? "matthias strolz" "strolz? matthias?" "mirko messner" "messner? mirko?" "rot grun" "rot grun?" "schwarz grun?" "schwarz grun" "die grune" "der grune?" okopartei grunpolitike* "grun politike*" "den grunen" mannerpartei austrittspartei euaus christenpartei cpo "christlich* parte* osterreichs" "sozialistisc* link* partei" "sozialistisch*
linkspartei" slp "parte* der wandel" "list* der wandel" "grupp* der wandel"
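The deduplication step applied while adding scraped articles to AmCAT can be illustrated with a short sketch. This is not the AUTNES scraper code: the `Article` fields and the function names are hypothetical, and exact matching on a whitespace- and case-normalized key is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    # Hypothetical record layout; the field names are assumptions.
    medium: str
    published: str  # date/time string as delivered by the scraper
    headline: str
    byline: str     # may carry the regional-edition line
    text: str


def dedup_key(article):
    """Build the comparison key from medium, date/time, headline/byline and text."""
    def norm(value):
        # Collapse whitespace and ignore case (an assumption, not documented behaviour).
        return " ".join(value.split()).lower()

    return (
        article.medium,
        article.published,
        norm(article.headline),
        norm(article.byline),
        norm(article.text),
    )


def deduplicate(articles):
    """Keep the first occurrence of each key; copies from later scraping rounds are dropped."""
    seen = set()
    kept = []
    for article in articles:
        key = dedup_key(article)
        if key not in seen:
            seen.add(key)
            kept.append(article)
    return kept
```

Because the byline is part of the key, two otherwise identical articles from different regional editions produce different keys and are both kept, which mirrors the behaviour described in the text.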
Due to differences in search syntax and technical possibilities, the search string used for the APA database and the one used for AmCAT are not completely identical. We regularly tested the prediction quality of our search strings and relied on standard measures used in binary classification tasks (e.g. Davis and Goadrich 2006; Fawcett 2006). For both data sets, recall proved to be near 100%. Issues with precision were later solved with data cleaning.
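For readers unfamiliar with these measures, the following minimal sketch shows how precision and recall are computed for a search string when a manually judged set of relevant articles is available. The function name and the use of article IDs are assumptions for illustration; the sketch does not reproduce the AUTNES validation scripts.

```python
def precision_recall(retrieved_ids, relevant_ids):
    """Return (precision, recall) for one search string.

    retrieved_ids: IDs of articles returned by the search string.
    relevant_ids:  IDs of articles judged relevant by human coders.
    """
    retrieved = set(retrieved_ids)
    relevant = set(relevant_ids)
    true_positives = len(retrieved & relevant)
    # Precision: share of retrieved articles that are relevant.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant articles that were retrieved.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall
```

A recall near 1 with lower precision, as reported here, means the search string misses almost no relevant article but also returns some irrelevant ones, which is why the remaining problems could be handled by cleaning the data afterwards.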
All news stories from August 1st, 2013 to September 29th, 2013 were included. However, for Election Day on September 29th, we chose not to include any evening television programs, as these discussed the election outcome rather than the campaign. After data collection and data cleaning were finished, we added all relevant articles together into the two datasets for automatic content analysis mentioned earlier in this documentation.
Table 2. Data overview
Name of outlet N Percent Cumulative
ZiB 249 0.58 0.58
ZiB2 160 0.37 0.95
ZiB20 69 0.16 1.11
ZiB24 106 0.25 1.36
ZiB Magazin 9 0.02 1.38
ATV Aktuell 70 0.16 1.54
Sat.1 Austria News 127 0.3 1.84
Pro7 Austria News 88 0.2 2.04
Ö1 Mittagsjournal 281 0.65 2.69
Der Standard 1,280 2.98 5.67
Die Presse + Die Presse am Sonntag 1,526 3.55 9.22
Salzburger Nachrichten 1,099 2.55 11.77
Wirtschaftsblatt 245 0.57 12.34
Kronen Zeitung 2,748 6.39 18.73
Österreich 3,156 7.34 26.06
Heute 947 2.2 28.27
Kurier 2,063 4.8 33.06
Kleine Zeitung 2,592 6.02 39.09
OÖ Nachrichten 1,347 3.13 42.22
Tiroler Tageszeitung 1,115 2.59 44.81
VN Vorarlberger Nachrichten 706 1.64 46.45
Wiener Zeitung 707 1.64 48.09
Kärntner Tageszeitung 335 0.78 48.87
Neue Vorarlberger Tageszeitung 510 1.19 50.06
Neues Volksblatt 774 1.8 51.86
TT Kompakt 225 0.52 52.38
SV Salzburger Volkszeitung 273 0.63 53.01
News 135 0.31 53.33
Format 159 0.37 53.7
Falter 221 0.51 54.21
Profil 174 0.4 54.62
Die ganze WOCHE 59 0.14 54.75
Woche Graz 103 0.24 54.99
Woche Obersteiermark 117 0.27 55.26
Woche HBZ 72 0.17 55.43
Woche Weiz 63 0.15 55.58
Woche Südweststeiermark 150 0.35 55.93
Woche Murtaler Zeitung 110 0.26 56.18
Woche Bildpost 97 0.23 56.41
Woche Kärnten 268 0.62 57.03
Salzburger Woche 184 0.43 57.46
Rundschau - Oberländer Wochenzeitung 71 0.17 57.62
NÖN Niderösterreichische Nachrichten 4,242 9.86 67.48
derGrazer 36 0.08 67.57
BVZ - Burgenländische Volkszeitung 793 1.84 69.41
orf.at 1,184 2.75 72.16
derstandard.at 1,934 4.5 76.66
diepresse.com 1,507 3.5 80.16
salzburg.com 1,777 4.13 84.29
krone.at 362 0.84 85.13
oe24.at 531 1.23 86.37
heute.at 567 1.32 87.69
kurier.at 835 1.94 89.63
kleinezeitung.at 2,024 4.7 94.33
nachrichten.at 430 1 95.33
tt.com 197 0.46 95.79
vol.at 361 0.84 96.63
vienna.at 364 0.85 97.47
news.at 466 1.08 98.56
noen.at 476 1.11 99.66
gmx.at 145 0.34 100
Total 43,021 100
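The Percent and Cumulative columns of Table 2 follow directly from the article counts. The sketch below shows the arithmetic; the function name and the optional `total` parameter are assumptions for illustration, and the cumulative share is computed from the running count rather than by adding up rounded percentages.

```python
def share_table(counts, total=None):
    """Return rows of (name, n, percent, cumulative percent), rounded to two decimals.

    counts: list of (outlet name, number of articles) in table order.
    total:  overall number of articles; defaults to the sum of the given counts.
    """
    if total is None:
        total = sum(n for _, n in counts)
    rows = []
    running = 0
    for name, n in counts:
        running += n
        percent = round(100 * n / total, 2)
        cumulative = round(100 * running / total, 2)
        rows.append((name, n, percent, cumulative))
    return rows


# First two rows of Table 2, using the documented total of 43,021 articles.
rows = share_table([("ZiB", 249), ("ZiB2", 160)], total=43021)
```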
3.3. Comparison of the automatic and manual datasets
The selection of articles included in the article sets for automatic content analysis and that in the dataset for manual content analysis differ on a number of characteristics, listed below.
More media
Manual content analysis was only conducted on a limited number of daily national newspapers (Kronen Zeitung, Heute, Der Standard, Die Presse, Kleine Zeitung, Kurier, Österreich, and Salzburger Nachrichten) and television news programs (Zeit im Bild 1, ZiB 2, ZiB 20, ZIB 24, ZIB Magazin, ATV Aktuell, Sat1-PULS 4 News, Pro7-PULS 4 News). All other media, including regional newspapers, radio, news websites and weekly news magazines, are only part of the automatic content analysis. All media in the manual content analysis are also part of the automatic content analysis, but the relevance criteria employed there are wider (see below). In other words, daily national newspapers and television programs will have a slightly larger number of articles/items in the automatic content analysis than in the manual content analysis.
Wider criteria for relevance
For the automatic content analysis, all articles on Austrian politics at any level were considered relevant, as long as they included one of the political actors listed in the search string used for article selection. From these, coders selected for the manual content analysis all articles that referred to one or more political actors from a predetermined list of relevant actors (all actors relevant to the national election; see the manual content analysis documentation). Although using the exact same selection criteria for both analyses might be preferable for reasons of consistency, the time and effort required by such a (manual) selection process for the larger media sample in the automatic analysis, especially for regional media, made us opt for the slightly less precise version.
Longer time period
The manual content analysis covers a period of six weeks, from August 19th, 2013 to Election Day on September 29th, 2013. For the automatic content analysis, we have articles available from August 1st.
3.4. AmCAT
AmCAT is an open-source platform for automatic and manual content analysis, freely available (after registration) on amcat.nl. The project ID for the AUTNES media side study is 50. All AmCAT users can access this project, but full access, which includes the full texts of all media articles and items, requires membership of the project. For this, please email one of the authors of this documentation.
All articles within AmCAT are stored in one or more article sets. These article sets form the basis of an automatic content analysis within AmCAT or AmCAT’s R integration (see http://github.com/amcat/amcat-r). Automatic content analysis is performed in AmCAT’s Query section (http://amcat.nl/navigator/projects/50/query/). Here, one can use the keyword search string(s) window to enter one or more search strings, and search within one or more selected article sets, media or time periods. AmCAT uses the Apache Lucene search engine library¹, which features, among other things, Boolean and proximity operators for its search; ‘Search syntax help’ provides an overview of the exact commands available.
Aside from a simple list of hits, AmCAT has features for various types of output, such as graphs or tables of hits over time or per medium.
3.5. Article sets
Article set no. 10727 Alles 19.08.-29.09. contains all articles for the automatic content analysis media selection from August 19th, 2013 to September 29th, 2013, the research period of six weeks also used in the manual content analysis.
Article set no. 12728 Alles 01.08. -29.09. contains all articles for the automatic content analysis media selection collected for this project, that is, for the period from August 1st, 2013 to September 29th, 2013.
Automatic content analysis is performed in the query section of AmCAT. Here, the selection of media and/or the time period can be restricted further if so desired.
Article set no. 12743 Mutationen 01.08.-29.09 contains all ‘Mutationen’, that is, articles in regional/local editions of a national/regional newspaper that were also featured in another edition of the same newspaper, from August 1st, 2013 to September 29th, 2013. These articles were filtered out of the article sets for automatic content analysis, which contain only one version of each article.
N.B. Only these article sets are regularly checked and kept up to date. Other article sets within the AUTNES project might not be complete and/or precise!
¹ http://lucene.apache.org/core/
4. Data cleaning and preparation
4.1. Data cleaning
After data collection was finished, the article sets in AmCAT were checked for irrelevant articles, namely articles on the German elections, real duplicates and duplicates due to regional editions of newspapers. Additionally, we cleaned up individual articles by splitting them and removing irrelevant text.
German elections
Our original dataset included many articles about the German elections, mainly because the name of one party (Grüne, the Green party) is identical in Germany and Austria and because the German elections took place one week before those in Austria. We searched for these articles in our data using the following search string:
(deutschland* merkel* cdu fdp csu "die union" spd "bundnis 90") NOT (spo ovp bzo fpo stronach faymann mitterlehner hundstorfer fekter bures spindelegger "heinz fischer" wien wiener leitner berlakovich lopatka glawischnig osterreich* vp nationalrat* "heinisch hosek" neos)
After checking all articles found by this search string, those mentioning the German elections without any reference to Austrian politics or the Austrian elections were removed from our data entirely.
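The logic of this filter (any German term present, no Austrian term present) can be approximated outside AmCAT as follows. This is a simplified sketch, not the Lucene implementation: whole-word matching, prefix handling for a trailing `*`, and folding of umlauts to their base letters (so that `spo` matches "SPÖ") are assumptions based on the spelling of the search string, and the function names are hypothetical. As in the procedure described above, matches are only candidates for manual checking.

```python
import re
import unicodedata

GERMAN_TERMS = ["deutschland*", "merkel*", "cdu", "fdp", "csu", "die union",
                "spd", "bundnis 90"]
AUSTRIAN_TERMS = ["spo", "ovp", "bzo", "fpo", "stronach", "faymann",
                  "mitterlehner", "hundstorfer", "fekter", "bures",
                  "spindelegger", "heinz fischer", "wien", "wiener", "leitner",
                  "berlakovich", "lopatka", "glawischnig", "osterreich*", "vp",
                  "nationalrat*", "heinisch hosek", "neos"]


def fold(text):
    """Lower-case the text and strip diacritics (ö -> o, ü -> u, ä -> a)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def term_to_regex(term):
    """Translate a term or phrase into a whole-word regular expression."""
    words = []
    for word in term.split():
        if word.endswith("*"):
            words.append(re.escape(word[:-1]) + r"\w*")  # prefix wildcard
        else:
            words.append(re.escape(word))
    return re.compile(r"\b" + r"\W+".join(words) + r"\b")


def matches_any(text, terms):
    folded = fold(text)
    return any(term_to_regex(term).search(folded) for term in terms)


def is_german_only_candidate(text):
    """True if the text matches a German term and no Austrian term."""
    return matches_any(text, GERMAN_TERMS) and not matches_any(text, AUSTRIAN_TERMS)
```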
Real duplicates
Real duplicates (articles identical in date and medium and nearly identical in headline and/or text) were found by downloading a list of all articles showing their ID, medium, date, headline and first paragraph, which was manually checked for duplicates. Duplicates found in this way were discarded.
Duplicates due to regional editions
We tested for duplicates due to regional editions by downloading a list of all articles in media that have such editions and comparing articles that were similar in headline and length. If the articles proved to be highly similar, only one of them (selected randomly) was kept in the original article set, while the other(s) were moved to a separate ‘Mutationen’ article set. For the manually coded data, coders were also instructed to look for duplicates; any duplicates they discovered were likewise moved to the ‘Mutationen’ article set.
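Flagging candidate pairs by headline and length, as described above, could be sketched like this. The thresholds, the dictionary keys and the use of `difflib.SequenceMatcher` are assumptions for illustration; the text does not specify how similarity was operationalized, and flagged pairs were still compared by hand.

```python
from difflib import SequenceMatcher


def is_candidate_pair(a, b, min_headline_sim=0.9, max_length_diff=0.1):
    """Flag two articles as a possible regional-edition duplicate.

    a, b: dicts with the hypothetical keys "headline" (str) and "length" (int, words).
    min_headline_sim: minimum headline similarity ratio (assumed threshold).
    max_length_diff:  maximum relative difference in length (assumed threshold).
    """
    headline_sim = SequenceMatcher(
        None, a["headline"].lower(), b["headline"].lower()
    ).ratio()
    longer = max(a["length"], b["length"])
    length_diff = abs(a["length"] - b["length"]) / longer if longer else 0.0
    return headline_sim >= min_headline_sim and length_diff <= max_length_diff
```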
Cleaning within articles
The APA database frequently treated multiple short news reports or letters to the editor as a single article. Our coders were told to list these articles, and we split them into separate articles, removing the original article.
Irrelevant text within articles (e.g. APA database copyright information) was removed using an automatic script.
Articles featuring local politicians or ex-politicians that were not coded in the manual analysis were still considered relevant for the automatic content analysis, so these were not excluded from any of the article sets.
5. Data set
5.1. Coding procedure
We used search strings to measure all variables. Lucene syntax enables complex searches and allows combining many different sub-blocks and operators. The general procedure for creating search strings was similar for all of these tasks. It combines inductive and deductive steps, including multiple tests and adaptations. A search string usually starts with basic information provided by the item of interest (names of politicians, issue categories). A second step is to collect additional keywords, synonyms and thesauri related to a person (e.g. function, party), an issue (e.g. dictionaries) or a variable (e.g. related theoretical work).
First versions of the search strings were subsequently run against the data set to test precision and recall. Search terms were then added or removed, word distances were adapted and different combinations of sub-blocks and operators were employed. In some cases, we excluded keywords that may change the meaning of a search term.
The stepwise elaboration of the search strings also involved the gathering of additional keywords by coders, who manually coded media articles according to the same coding scheme (issues, persons, some variables). Coders were asked to collect phrases and terms in order to compare these with existing search strings. This further helped to improve both precision and recall of our search strings. Also, at least three different researchers cross-checked the search terms to catch misspellings and check for logical errors. Naturally, some categories are harder to detect with search terms than others.
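To illustrate what adapting a word distance means, the sketch below checks whether two terms occur within a given number of tokens of each other, as in a query such as "klug minister*"~5. It is a deliberate simplification and an assumption on our part: Lucene's proximity ("slop") semantics count position moves and treat word order differently, so results can deviate from AmCAT. The function names are hypothetical.

```python
import re


def tokenize(text):
    """Split text into lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


def token_matches(token, pattern):
    """Exact match, or prefix match if the pattern ends with '*'."""
    if pattern.endswith("*"):
        return token.startswith(pattern[:-1])
    return token == pattern


def within_distance(text, first, second, max_gap):
    """True if tokens matching `first` and `second` have at most `max_gap` tokens between them."""
    tokens = tokenize(text)
    first_positions = [i for i, tok in enumerate(tokens) if token_matches(tok, first)]
    second_positions = [i for i, tok in enumerate(tokens) if token_matches(tok, second)]
    return any(
        i != j and abs(i - j) - 1 <= max_gap
        for i in first_positions
        for j in second_positions
    )
```

Widening `max_gap` raises recall at the cost of precision; for an ambiguous surname such as Klug, requiring a nearby title keeps ordinary uses of the word "klug" (clever) from being counted.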
Table 3 gives an overview of all variables in the dataset. We included several types of variables: metadata, political actors (parties, top candidates, ministers and secretaries of state), political knowledge, style, and issues. All are explained below.
Table 3. Variables overview
Variable type Variable name Variable label
Metadata v0 Article_ID
v1 Headline
v2 Byline
v3_l1 Medium_Ebene_1
v3_l2 Medium_Ebene_2
v3_l3 Medium_Ebene_3
v4a Date
v4 Medium
v5 Seite
v6_l2 Ressort_Ebene_2
v7 Wortzahl
v0_ac amcat uniqueID
Parties vp1 SPÖ
vp2 ÖVP
vp3 FPÖ
vp4 Grüne
vp5 BZÖ
vp6 Team Stronach
vp7 Neos
vp8 KPÖ
vp9 Piraten
vp10 CPÖ
vp11 SLP
vp12 EUAUS
vp13 Männerpartei
vp14 Der Wandel
Top candidates vk1 Faymann
vk2 Spindelegger
vk3 Strache
vk4 Glawischnig
vk5 Bucher
vk6 Stronach
vk7 Strolz
vk8 Messner
vk9 Wieser
vk10 Gehring
vk11 Grusch
vk12 Marschall
vk13 Hausbichler
vk14 Mulla
Members of Government vk15 Hundstorfer
vk16 Fekter
vk17 Heinisch
vk18 Stöger
vk19 Mikl
vk20 Karl
vk21 Berlakovich
vk22 Klug
vk23 Schmied
vk24 Bures
vk25 Mitterlehner
vk26 Töchterle
vk27 Kurz
vk28 Lopatka
vk29 Schieder
vk30 Ostermayer
Political knowledge v8 Voting Age 16
v9 4% Hürde Einzug Nationalrat
Style v10 Conflict
v11 Emotionalization
v12 Language of games, sports and war
v13 Personalization
Issues vi1 Wirtschaft
vi2 Wohlfahrtsstaat
vi3 Budget
vi4 Bildung und Kultur
vi5 Sicherheit
vi6 Bundesheer
vi7 Außenpolitik
vi8 Europa
vi9 Infrastruktur
vi10 Gesellschaft
vi11 Umweltschutz
vi12 Institutionenreform
vi13 Immigration
vi14 UNDEFINED
vi99 Kein Policy Issue
5.2. Metadata
Metadata were derived automatically by AmCAT from the source texts (digital newspaper articles, scraped news sites, or transcripts) and, where necessary, manually corrected or recoded.
Article_ID (v0) is the ID number by which the article can be recognized and searched within AmCAT (amcat.nl).
Headline (v1) is the news story’s headline, stored as a string variable.
Byline (v2) is the text below the headline in the news story, for example a sub-headline, a location or an author, depending on the layout of the source text.
Medium_Ebene_1 (v3_l1) is the media outlet each news story appeared in, categorized by media type. The table below gives an overview.
Table 4. Overview of Medium_Ebene_1
Value Label Translation N Percent Cumulative
1000 TV TV 878 2.04 2.04
2000 Radio Radio 281 0.65 2.69
3000 Tagespresse Dailies 21,648 50.32 53.01
4000 Wochenpresse Weeklies 7,054 16.4 69.41
5000 Websites Websites 13,160 30.59 100
Total 43,021 100
Medium_Ebene_2 (v3_l2) is the media outlet each news story appeared in, categorized by media genre. The table below gives an overview.
Table 5. Overview of Medium_Ebene_2.
Value Label Translation N Percent Cumulative
1100 TV (ORF) Public Service TV 593 1.38 1.38
1200 TV (Privat) Commercial TV 285 0.66 2.04
2100 Radio (ORF) Public Service Radio 281 0.65 2.69
3100 Qualitätszeitungen Elite newspapers 4,150 9.65 12.34
3200 Boulevardzeitungen Popular newspapers 6,851 15.92 28.27
3300 Midrange-Zeitungen Mid-market newspaper 2,063 4.8 33.06
3400 Bundesländerzeitungen Regional dailies 8,584 19.95 53.01
4100 Magazine Magazines 748 1.74 54.75
4200 Regionalpresse (wöchentlich) Regional weeklies 6,306 14.66 69.41
5100 Websites (Medien) Online newspapers 13,015 30.25 99.66
5200 Websites (allgem.) Online-only news sites 145 0.34 100
Total 43,021 100
Medium_Ebene_3 (v3_l3) is the media outlet each news story appeared in, corrected manually for inaccuracies.
Table 6. Overview of Medium_Ebene_3.
Value Label
1101 ZiB
1102 ZiB2
1103 ZiB20
1104 ZiB24
1105 ZiB Magazin
1201 ATV Aktuell
1202 Guten Abend Österreich
1203 Sat.1 Austria News
1204 Pro7 Austria News
2101 Ö1 Mittagsjournal
3101 Der Standard
3102 Die Presse + Die Presse am Sonntag
3103 Salzburger Nachrichten
3104 Wirtschaftsblatt
3201 Kronen Zeitung
3202 Österreich
3203 Heute
3301 Kurier
3401 Kleine Zeitung
3402 OÖ Nachrichten
3403 Tiroler Tageszeitung
3404 VN Vorarlberger Nachrichten
3405 Wiener Zeitung
3406 Kärntner Tageszeitung
3407 Neue Vorarlberger Tageszeitung
3408 Neues Volksblatt
3409 TT Kompakt
3410 SV Salzburger Volkszeitung
4101 News
4102 Format
4103 Falter
4104 Profil
4105 Die ganze WOCHE
4211 Woche Graz
4212 Woche Obersteiermark
4213 Woche HBZ
4214 Woche Weiz
4215 Woche Südweststeiermark
4216 Woche Murtaler Zeitung
4217 Woche Bildpost
4218 Woche Kärnten
4219 Salzburger Woche
4220 Rundschau - Oberländer Wochenzeitung
4230 NÖN Niderösterreichische Nachrichten
4240 derGrazer
4250 BVZ - Burgenländische Volkszeitung
5101 orf.at
5102 derstandard.at
5103 diepresse.com
5104 salzburg.com
5105 krone.at
5106 oe24.at
5107 heute.at
5108 kurier.at
5109 kleinezeitung.at
5110 nachrichten.at
5111 tt.com
5112 vol.at
5113 vienna.at
5114 news.at
5115 noen.at
5201 news.google.at
5202 gmx.at
Note: for the frequencies per outlet, see table 2.
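Judging from the value lists in Tables 4 to 6, the three medium levels appear to be nested: the first digit of a Medium_Ebene_3 code gives the media type and the first two digits give the genre. The sketch below relies on that observed pattern, which is an assumption and not a documented rule; the dataset already provides v3_l1 and v3_l2 directly, so the function (a hypothetical helper) is mainly useful as a consistency check.

```python
def medium_levels(level3_code):
    """Derive (Medium_Ebene_1, Medium_Ebene_2) values from a Medium_Ebene_3 code.

    Assumes the nesting visible in the tables, e.g. 4230 -> 4200 -> 4000.
    """
    level1 = level3_code // 1000 * 1000  # media type, e.g. 4000 = Wochenpresse
    level2 = level3_code // 100 * 100    # media genre, e.g. 4200 = Regionalpresse
    return level1, level2
```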
Date (v4a) is the date on which the news story was published or broadcast, saved as a string variable.
Medium (v4) is the name of the outlet that published or broadcast the story. This variable is not corrected for inaccuracies, so we suggest using Medium_Ebene_3 instead.
Seite (v5) is the number of the page on which a news story appeared.
Ressort_Ebene_2 (v6_l2) is the news category in which the news story appeared. The categories used differ across media, so an overview would take too much space and is omitted.
Wortzahl (v7) is the length of the news story in words. The headline and byline are not taken into account.
AmCAT uniqueID (v0_ac) is the unique ID of the article across all AmCAT servers.
6. Actors (parties, top candidates and members of government)
Actor variables measure whether an actor is present in an article (coded as 1) or not (coded as 0).
6.1. Parties (vp1-14)
For measuring the saliency of the political parties that took part in the Austrian National Election of 2013, we used the following search strings:
spo# (spo OR sp OR ("partei sozialdemo*"~5) OR sozialdemokrat* )
ovp# ovp OR vp OR volkspartei OR "partei spindelegger"~5 "die schwarzen" "den schwarzen"
fpo# fpo OR freiheitlich* OR fp OR ("partei (freiheitlich* OR Strache*)"~5)
grune# ("grune partei" "die grunen" "die grune" "der grunen" " der grune" "die grune partei" "grune alternative" "okopartei" grunpolitiker* "grun politiker*" "den grunen" "tiroler grune*" "wiener grune*" "grune* tirol" "grune* wien" "grune* burgenland" "burgenland* grune*" "grune* kartnen" "karntner grune*" "grune* salzburg" "salzburger grune*" "grune* oberosterreich" "oberosterreichische* grune*" "grune* steiermark" "steiermarkische* grune*" "grune* vorarlberg" "vorarlberger grune*" "grune* niederosterreich*" "niederosterreichische* grune*") OR ("grune* (spitzenkandidat* vorstandssprecher* landtagsabgeordnet* fpo* freiheitl* blau* spo sp sozialdemokrat* sozialist* rot* ovp vp volkspartei schwarz* neos lif lf kpo kommunist* chef* vorsitz* glawischnig sprecher* bundesvorstand* bundessprecher* landesvorstand* parlament* regierung* opposition* partei* bundnis* fraktion klub* vize* burgermeister stadtrat* landesrat* bereichssprecher* referent* landespart* ortsgrupp*)"~5) OR "(grune)"
bzo# bzo OR "bundnis zukunft osterreich" "die orangen" "den orangen" "der orangen"
teamstronach# stronach*
neos# neos OR "das neue osterreich" OR "wahlplattform lif"~5
kpo# kpo OR "kommunistisch* partei osterreich"~3 kommunisten
piraten# (piratenpartei piraten) NOT deutschland*
cpo# ("christliche partei" cpo "die christen") NOT (norwegen syrien "hass gegen die christen" "syri* christliche partei"~5 "norweg* christliche partei"~5)
slp# "sozialistisch* link* partei" OR "slp"
euaustrittspartei# "eu austrittspartei" OR ("partei eu austritt"~1)
mannerpartei# "mannerpartei"
wandel#("der wandel" AND (kleinpartei* partei* wahl* nationalratswahl* mulla))
Values: 1 = present; 0 = absent.
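The resulting 0/1 coding can be illustrated for the simplest kind of query, a list of single-word patterns such as teamstronach# stronach*. The sketch is an assumption-laden stand-in for AmCAT: it folds diacritics, matches whole tokens with shell-style wildcards via `fnmatch`, and ignores Boolean and proximity operators entirely. The function names and the dictionary layout are hypothetical.

```python
import re
import unicodedata
from fnmatch import fnmatchcase


def fold(text):
    """Lower-case the text and strip diacritics so that 'SPÖ' becomes 'spo'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def code_variables(text, queries):
    """Return {label: 1 or 0} for one article.

    queries: {label: [single-word patterns]}, where '*' matches any run of characters.
    """
    tokens = re.findall(r"\w+", fold(text))
    return {
        label: int(any(fnmatchcase(tok, pat) for tok in tokens for pat in patterns))
        for label, patterns in queries.items()
    }


# Two simplified queries taken from the list above (phrase and proximity parts omitted).
QUERIES = {"teamstronach": ["stronach*"], "spo": ["spo", "sp", "sozialdemokrat*"]}
coded = code_variables("Die SPÖ kritisiert Frank Stronachs Team", QUERIES)
```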
Precision and recall values for these search strings were as follows:
Table 7. Precision and recall for the search strings for parties.
Party Precision Recall
SPÖ 1 0.92
ÖVP 0.96 0.94
FPÖ 1 0.96
Grüne 0.88 0.96
BZÖ 1 1
Team Stronach 1 1
Neos 1 0.86
KPÖ 1 1
Piraten 0.98 0.88
CPÖ 0.98 0.92
SLP 0.98 0.82
EU-AUS 1 0.9
Männerpartei 0.98 1
Der Wandel 0.98 1
6.2. Top candidates (vk1-14)
For measuring the saliency of top candidates for each party, we used the following search strings:
Faymann# faymann*
Spindelegger# spindelegger* spindi
Strache#strache*
Glawischnig#(glawischnig* )NOT "glawischnig dieter"~3
Bucher# "bucher* ( sepp* OR josef OR bzo OR chef OR bundnischef OR obmann OR bundnisobmann OR parteichef OR faymann OR spindelegger OR strache OR glawischnig OR stronach)"~5
Stronach# stronach* NOT ("stronach (partei* gruppe* team* mitarbeit* klub* parlaments* landes* bezirks* bundes* landtags* burgermeister* spitzenkand* abgeordnet* klubobmann* klubchef* liste )"~1 "stronach (bundniss* koalition*)"~3)
strolz# "strolz (matthias OR neo* obmann* spitzenmann* parteichef* boss* partei* spitzenkandidat* vorsitz* chef* bundessprecher*)"~10
messner#( "messner* (mirko OR chef* OR spitzenkand* OR obmann* OR vorsitz* OR kpo kommunist* bundessprecher* spitzenmann* parteichef*)"~5)
wieser#("wieser* (mario pirat* obmann* spitzenmann* parteichef* boss* partei* gruppe* team* spitzenkandidat* vorsitz* chef*)"~5)
gehring# "gehring* (rudolf OR christen OR cpo OR obmann* spitzenmann* parteichef* boss* partei* spitzenkandidat* vorsitz* chef* bundessprecher*)"~5 OR ("gehring christlich* partei"~5)
grusch# "grusch* (sonja OR slp OR linkspartei obfrau* spitzenfrauparteichef* boss* partei* spitzenkandidat* vorsitz* chef* bundessprecher*)"~5
marschall# "marschall* (robert OR austrittspartei chef* OR spitzenkand* OR obmann* OR vorsitz* OR kpo kommunist* bundessprecher* spitzenmann* parteichef*)"~5
hausbichler# "hausbichler* (hannes OR mannerpartei* obmann* spitzenmann* parteichef* boss* partei* spitzenkandidat* vorsitz* bundessprecher*)"~5 ("hausbichler partei mann*"~5)
mulla# "mulla (wandel OR fayad)"~5
Values: 1 = present; 0 = absent.
Precision and recall values for these search strings were as follows:
Table 8. Precision and recall for the search strings for top candidates.
Top candidate   Precision  Recall    Top candidate   Precision  Recall
Faymann         1          0.93      Messner         1          1
Spindelegger    1          0.94      Wieser          0.98       1
Strache         1          1         Gehring         1          1
Glawischnig     1          1         Grusch          1          1
Bucher          1          1         Marschall       1          1
Stronach        0.98       0.64      Hausbichler     1          1
Strolz          1          1         Mulla           1          1
6.3. Members of Government (vk15-30)
For measuring the saliency of the members of Government (Ministers and Secretaries of State), we used the following search strings:
hundstorfer# hundstorfer*
fekter# fekter*
heinischh# ("heinisch (gabriele OR frauenminister* OR sp OR spo OR beamtenminister* OR minister* hossek hosek)"~5)
stoger# ("stoger* (alois OR gesundheitsminister* OR sp-* OR spo* OR bundesminister* minister* regierung*)"~5 NOT "peter stoger")
Mikl Leitner# "mikl* (johanna OR innenministerin* OR bundesministerin* OR vp OR ovp volkspartei vp-*leitner)"~5
karl# "karl (beatrix OR justizminister* OR minister* OR bundesminister*)"~5
berlakovich# "nikolaus b" OR ("Berlakovich (nikolaus OR umweltminister* OR agrarminister* OR landwirtschaftsminister* OR minister* OR niki OR vp-* OR vp OR volkspartei OR ovp* biene* pestizid* agrar* umwel* landwirtschaft* bauer*)"~5)
klug# "klug (gerald OR verteidigungsminister* OR sportminister* OR minister* OR sp OR spo)"~5
schmied# "schmied (claudia OR bildungsminister* OR kulturminister* OR minister* OR unterrichtsminister* OR sp-* OR spo* gesamtschul*)"~5
bures# bures NOT "bures radim"~5
mitterlehner# ("mitterlehner (reinhold OR wirtschaftsminister* OR minister* OR familienminister* OR vp OR vp-* OR ovp* volkspartei energieminister*)"~5)
Tochterle#("tochterle karl heinz"~5) OR "tochterle (wissenschaftsminister* OR bundesminister* minister* OR forschungsminister* OR ovp* OR vp OR vp-* volkspartei karlheinz)"~5
kurz# "kurz (sebastian OR intergrationsstaatsekretar* OR staatssekretar* ovp* jvp* volkspartei* vp vp-*)"~5
lopatka# "lopatka (reinhold OR staatssekretar* OR vp OR vp-*OR volkspartei OR ovp)"~5
schieder# "schieder (andreas OR staatssekretar* OR finanzstaatssekretar* OR sp OR sp-*OR spo* sozialdemokrat*)"~5 "schieders (andreas OR staatssekretar* OR finanzstaatssekretar* OR sp OR sp-*OR spo* sozialdemokrat*)"~5
ostermayer# "ostermayer (josef OR staatssekretar* OR medienstaatssekretar* OR sp OR sp-* OR sozialdemokrat* OR spo)"~5 "ostermayers (josef OR staatssekretar* OR medienstaatssekretar* OR sp OR sp-* OR sozialdemokrat* OR spo)"~5
Values: 1 = present; 0 = absent.
The precision and recall values of these search strings are as follows:
Table 9. Precision and recall for the search strings for members of Government.
Minister         Precision  Recall    Secretary of State  Precision  Recall
Hundstorfer      1          1         Kurz                0.96       1
Fekter           1          1         Lopatka             1          1
Heinisch Hosek   1          1         Schieder            1          0.99
Stöger           1          1         Ostermayer          1          0.91
Mikl Leitner     1          0.78
Karl             1          0.75
Berlakovich      0.96       1
Klug             1          0.83
Schmied          1          1
Bures            1          1
Mitterlehner     1          1
Töchterle        1          0.84
6.4. Political knowledge
Political knowledge variables code whether three political knowledge facts are present
(coded as 1) or absent (coded as 0) in an article. We measured the following political knowledge facts:
Voting Age (v8)
This political knowledge fact concerns Austria’s minimum voting age (sixteen years old). Its search string is as follows:
wahlalter16#("wahlalter* 16"~10 "wahlberechtig* 16"~10 "wahlalter* gesenk*"~10 "wahlalter* senk*"~10 "erstwahl* 16"~10 "erstwahl* gesenk*"~10 "erstwahl* senk*"~10 "wahlen 16"~5 "wer spatestens am Wahltag 16 Jahre alt ist" "ab 16 jahren ist man dabei" "wahl* 16 lebensjahr"~5) NOT (wahlalterna* "16 prozen*" senkungsidee*)
Values: 1 = present; 0 = absent.
Precision: 0.83 Recall: 0.89
Threshold of votes for entering parliament (v9)
This political knowledge fact concerns the threshold for Austrian political parties to enter Parliament (four percent). Its search string is as follows:
Vierprozenthurde#( vierprozenthurde* sperrklausel* ("(hurde* mindest* klausel* sperr* einzug* einzieh* einzuzieh* eingezog* erreich* schaff* uberspring*) (parlament* nationalrat* nr ) (vier 4) prozent*"~5) )
Values: 1 = present; 0 = absent.
Precision: 0.8, Recall: 0.8
Appointment of prime minister (vx)
This political knowledge fact concerns the procedure by which the Prime Minister is appointed. Because this procedure was not mentioned in the news, we excluded the variable from our dataset. Its search string is as follows:
("(ern?nn* beruf* bestimm*) (bundeskanzler* kanzler* regierungschef* regierungsspitze* regierung) (staatsoberhaupt* prasident* bundesprasident* fischer*)"~5)
No mention throughout the entire election period.
6.5. Style
Style variables are about the style in which an article is written, or an item is presented. The first two variables, conflict and language of games, sports
Conflict (v10)
konflikt#((konflikt* unvereinbarkeit* kritisier* kritik* intervention* intervenier* weiterkampf* widerstand* "wehr* gegen"~5 "vorwurf gegen" protest* offensiv* angriff* frontalangriff* widerspruch* "greif* an"~10 "griff* an"~10 kampf* attack* angegriffen* streit* gestritten*)) NOT "(offensiv* angriff* kampf* attack* streit* gestritten*) (syrien* syrisch* agypt* djihad* iran* sudan* sudsudan* nordkorea* un uno saudi* staatengemeinschaft* sicherheitsraat* terroris* kaida qaida pakistan sudan* jemen* libyen lybisch* somali* eritre* kongo* elfenbeinkust* ivor* ivoire hamas hisbollah miliz gotteskrieger taliban*)"~25
Values: 1 = present; 0 = absent.
Precision: 0.86 Recall: 0.86
Emotionalization (v11)
Emotionalization is operationalized as a continuous variable that represents the proportion of emotion words in a news story (see footnote 2).
We used the sentiment lexicon Sentilex (Wolf et al. 2008) to determine which words are emotion words. After slightly altering the lexicon to fit the syntax used in AmCAT, we created a search string from the words in the lexicon. This search string is not published here for reasons of space.
Minimum: 0 Maximum: 1 Mean: 0.433
Precision: 0.90 Recall: 0.89
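As a rough illustration of this measure, the sketch below computes the share of emotion words in a story. The mini-lexicon is invented for the example and is not the lexicon used for the dataset. The sketch also assumes that hits are counted in headline and body while article length counts the body only, which is one way to read the documentation's note that a few values exceed 1.

```python
import re

# Hypothetical mini-lexicon for illustration only; not the lexicon used for the dataset.
EMOTION_WORDS = {"angst", "wut", "freude", "skandal", "hoffnung"}


def tokens(text):
    """Lower-case word tokens of a text."""
    return re.findall(r"\w+", text.lower())


def emotionalization(headline, body, lexicon=EMOTION_WORDS):
    """Emotion-word hits in headline and body, divided by body length (assumed rule)."""
    hits = sum(tok in lexicon for tok in tokens(headline) + tokens(body))
    length = len(tokens(body))
    return hits / length if length else 0.0
```

Under this assumed counting rule, a very short body with an emotive headline can yield a value above 1, for example `emotionalization("Skandal Wut Angst", "Freude Hoffnung")`.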
Footnote 2: Because the headline of the article is not counted toward the article length, there were five cases where the degree of emotionalization exceeded 1 (100%). We left these values in the dataset without recoding.
Language of games, sports and war (v12)
Games# ((verspiel* zweikampf* dreikampf* angriff* angegriff* angreif* vergeltung* terror* nahkampf* wortgefecht* scharmutzel* gefecht* krieg kriege kriegerisch* arsenal* giftschrank* aufmarsch* gegenschlag* uberraschungsangriff* uberraschungsschlag* armada attentat* bombig* bombadier* bombardier* explosiv* propaganda* neutralisier* konter kontern gekontert kontert* konterattack* konterangriff* feldzug* sprengkraft sprengstoffsprengt* schlachtfeld* fahnd* offensiv* defensiv* hinterhalt* schlacht hauptquartier massaker* massakrier* kampf* gekampft* kanone* schiess* geschoss* zuschlagen zugeschlagen* zuschlug zuschlagt anschlag* kommando* geheimwaffe* belager* kampf* krise* regierungskrise* koalitionskrise* attacke* wahlschlacht* wahlkampf* himmelfahrtskommando* kreuzzug* blitzkrieg* vebunde* verteidigung verteidigt* schutzengr?ben* geschutz mobilisier* erober* vormarsch* bekampf* keule verbotskeule richtungsk?mpf* putsch* widerstand* feind* mitstreiter* rebellisch* aufstand aufruhr* intervention* revolte* palastrevolte* gerangel krawall* scharmutzel* konfrontation* konfrontier* frontalangriff* kampagne* kampagnisier* aufbegehr* einschiess* eingeschoss* entscharf* stichel* heckenschutze* abschiess* abgeschoss* querschiess* querschuss* querschutze* front fronten koalitionsfront* regierungsfront* oppositionsfront* grabenkampf* gelandegewinne vorstoss* ruckendeckung* deckung* storfeuer* durchbruch* abruck* ruckzugsgefecht* verluste* ruckzug vordringen* fuhrung* positionspapier* bollwerk* manover* taktik* taktisch* wahltaktisch* strategie* strategisch* wahlstrategi* kapitulation* kapitulier* flucht fluchtet* gefluchtet* stormanover* giftpfeil* blockad* blockier* mobilisier* rekrutier* flankier* ruckendeck* mitstreiter* bollwerk* schlachtschiff* machtkampf* kampfkandidat* kampfabstimm* uberflugel* vorbeizieh* vorbeigezog* tauchstation* abtauch* abgetaucht* favorit* aussenseiter* outsider* geheimfavorit* rochade* personalrochade* rochier* todesspirale* foul* finale* angezahlt* anzahl* angeschlagen* triumph* pyhrrus* steilpass* steilvorlage* matchwinner* eigentor* powerplay arena wahlkampfarena duell* kanzlerduell* gladiator* auferstehung* opfergang* aufstachel* aufgestachel* fair* unfair* gewinn* sieger* verlierer* schlappe uberrunde* bruchlandung* angeschlagen* angezahlt* startschuss* startrampe* startlocher*) ("(sturm ansturm) auf") ("war room") ("aus volle* rohr*") ("(ins im) (abseits aus)") ("rote karte") ("(auflage vorlage) verwandel*") ("(fest sicher) im sattel") ("auf der strecke (bleib* blieb geblieb*)") ("kopf an kopf") ("ins ziel (kommen gekommen* kam kommt)") ("ins spiel (bringen brachte* gebracht* bringt)") ("stellung (einnehmen eingenommen einnahm beziehen bezog* halten gehalten hielt)") NOT (syrien* syrisch* agypt* djihad* iran* sudan* sudsudan* nordkorea* un uno staatengemeinschaft* sicherheitsraat* terroris* kaida qaida pakistan* sudan* jemen* libyen lybisch* somali* eritre* kongo* elfenbeinkust* ivor* ivoire hamas hisbollah miliz gotteskrieger taliban*))
The search string was built from examples in the codebooks for manual coding of this variable (see Kleinen-von Königslöw et al. 2015) and by consulting thesauri and dictionaries.
Values: 1 = present; 0 = absent.
Precision: 0.94, Recall: 0.79
6.6. Personalization (top candidate focus) (v13)
Personalization is operationalized as a continuous variable that represents the proportion of news stories mentioning a top candidate out of all stories mentioning a party name (including the top candidate, if his or her party is mentioned), for the seven largest parties participating in the 2013 election. We measured this using the search strings for these parties and candidates, added up the results for all parties and all candidates, and divided the number of stories mentioning a top candidate by the number of stories mentioning a party. The resulting number between 0 and 1 represents the degree of personalization.
Minimum: 0 Maximum: 1 Mean: 0.113
Precision: 0.98 Recall: 0.95
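The ratio described above can be sketched in a few lines of Python. This is one possible reading of the description, not the authors' script: it assumes each story is a dictionary of 0/1 codes, counts a candidate hit only when the candidate's party is also coded as present (which keeps the ratio between 0 and 1), and uses an illustrative subset of party and candidate keys.

```python
# Illustrative subset of party -> top candidate pairs; keys are hypothetical variable names.
PAIRS = {"spo": "faymann", "ovp": "spindelegger", "fpo": "strache"}


def personalization(stories, pairs=PAIRS):
    """Share of party mentions that come with a mention of that party's top candidate."""
    party_hits = 0
    candidate_hits = 0
    for story in stories:
        for party, candidate in pairs.items():
            if story.get(party, 0) == 1:
                party_hits += 1
                # Assumption: a candidate only counts where the party is present too.
                candidate_hits += story.get(candidate, 0)
    return candidate_hits / party_hits if party_hits else 0.0
```

For example, three party mentions of which one is accompanied by the party's top candidate give a value of one third.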
6.7. Issues (vi1-vi14; vi99)
The search strings for issues reflect the issue categories from the manual coding of this variable. Coder instructions (see Kleinen-von Königslöw et al. 2015) were used to create these search strings. We further relied on collections of issue-specific keywords from our coders. We broadened the strings further by consulting thesauri and dictionaries and through extensive test applications.
Please note that the search strings do not follow the same logic as the manual content analysis: a hit only indicates that an issue occurs in an article. For technical reasons, some of the issue search strings only work when called through AmCAT’s R interface. We therefore generally recommend using R for AmCAT issue queries. While we provide search strings for most substantive issues on AUTNES level 3, we validated them only on level 1 (the most general level). We therefore advise validating issues on more fine-grained levels according to the intended use.
Values: 1 = present; 0 = absent.
Precision: 0.82 Recall: 0.80
7. Validation procedure
In order to validate our search strings, we calculated their precision and recall, either on the level of individual search strings (for those search strings that measure one of our variables directly) or on an aggregated level (for those search strings that measure whether an issue is mentioned in an article or not). For measuring precision, we took a random sample of 50 articles from the search results of each search string and coded these manually for the presence or absence of our variables. If the variable is present in all articles, precision is 100%. For measuring recall, we drew a random sample of 100 articles from our total population of articles and coded these manually for the presence or absence of our variables. We then compared these manual codes with the results of the semi-automatic analysis using search strings on the same articles.
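The two measures can be written out as a short Python sketch. It assumes manual and automatic codes are lists of 0/1 values in the same article order; the function names are hypothetical and the snippet is not the script used for the published figures.

```python
def precision(manual_codes_for_hits):
    """Share of sampled search hits in which manual coding found the variable present."""
    return sum(manual_codes_for_hits) / len(manual_codes_for_hits)


def recall(manual, automatic):
    """Share of manually positive articles that the search string also coded as present."""
    found = [auto for man, auto in zip(manual, automatic) if man == 1]
    # Recall is undefined when the sample contains no manually positive article.
    return sum(found) / len(found) if found else None
```

For instance, if 48 of the 50 sampled hits are confirmed by manual coding, `precision([1] * 48 + [0] * 2)` gives 0.96.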
For the political knowledge variables, which occurred only very rarely, we used strategic samples for the validation process, that is, samples that were more likely to contain the concept than a random selection of articles. For each variable, we first created a sample that was likely to contain the concept in around half of the articles (e.g. for Voting Age, we used the search term “wahl* 16”~10 OR wahlalter OR erstwahl), and drew a random sample of 100 articles from the result.
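A strategic sample of this kind could be drawn as sketched below. The matcher is a simplified stand-in for the broad Voting Age search term, assuming the proximity operator allows at most ten other words between the two terms; the function names and the fixed random seed are hypothetical choices for the example.

```python
import random
import re


def broad_match(text):
    """Simplified stand-in for the broad term: wahl* near 16, or wahlalter, or erstwahl."""
    toks = re.findall(r"\w+", text.lower())
    if "wahlalter" in toks or "erstwahl" in toks:
        return True
    wahl = [i for i, tok in enumerate(toks) if tok.startswith("wahl")]
    sixteen = [i for i, tok in enumerate(toks) if tok == "16"]
    # Assumed reading of ~10: at most ten other words between the two terms.
    return any(abs(i - j) <= 11 for i in wahl for j in sixteen)


def strategic_sample(articles, size=100, seed=2013):
    """Keep articles matching the broad term, then draw a random sample from them."""
    candidates = [article for article in articles if broad_match(article)]
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return rng.sample(candidates, min(size, len(candidates)))
```

The sampled articles would then be coded manually and compared with the search-string results in the same way as for the other variables.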
8. References
Davis, J; Goadrich, M (2006). "The relationship between Precision-Recall and ROC curves." Pp. 233-40 in 23rd International Conference on Machine Learning, June 25-29. Pittsburgh, Pennsylvania.
Fawcett, T (2006). An Introduction to ROC analysis. Pattern Recognition Letters 27: 861-74.
Kleinen-von Königslöw, K; Vonbun, R; Eberl, J-M; Haselmayer, M; Jacobi, C; Schönbach, K; Boomgaarden, H (2015). AUTNES Manual Content Analysis of the Media Coverage 2013 - Documentation. Wien: Universität Wien.
Van Atteveldt, W (2008). Semantic Network Analysis: Techniques for Extracting, Representing and Querying Media Content. Charleston: BookSurge.
Wolf, M; Horn, A B; Mehl, M R; Haug, S; Pennebaker, J W; Kordy, H (2008). "Computergestützte quantitative Textanalyse: Äquivalenz und Robustheit der deutschen Version des Linguistic Inquiry and Word Count." Diagnostica 54(2): 85-98. doi: 10.1026/0012-1924.54.2.85.