Internet language in user-generated comments

Linguistic analysis of data from four commenting groups

Internetspråk i användarkommentarer

Lingvistisk analys av material från fyra kommenterande grupper

Jenny Dahlström

Faculty of Arts and Education
English
English III: Degree project in linguistics
15 hp
Supervisor: Pia Sundqvist
Examiner: Solveig Granath
Autumn 2012

Title: Internet language in user-generated comments: Linguistic analysis of data from four commenting groups
Titel på svenska: Internetspråk i användarkommentarer: Lingvistisk analys av material från fyra läsargrupper
Author: Jenny Dahlström
Pages: 33

Abstract

The present study examines typical features of internet language found in user-generated comments collected from commenting groups at online magazines aimed at four different readerships: (1) adult women (Working Mother and Mothering), (2) adult men (Esquire), (3) young women (Seventeen) and (4) young men (Gameinformer). Approximately 5,000 words from each commenting group were collected, creating a 21,087-word corpus which was analyzed with regard to typographic features (emoticons and nonstandard typography of and and of the personal pronouns you and I) and orthographic features (abbreviations and acronyms) as well as syntactic and stylistic features resembling spoken language (contracted forms, ellipsis of subject and/or verb and commenting tone). The results show that adult men wrote the longest comments, followed by adult women, young men and young women in descending order. Furthermore, as for the typical features regarding typography and orthography, it was found that among the four commenting groups, adult men and adult women used them very sparsely, young men used them occasionally and young women used them most frequently. The analysis of tone showed that adult men mostly used an aggressive or neutral tone, while adult women, young women and young men mostly used a friendly or neutral tone. Young women used an aggressive tone more often than adult women and young men. Moreover, regarding the syntactic and stylistic features, the results revealed that young men were the most frequent users of ellipsis of subject and/or verb, followed by adult women, young women and adult men. Contracted forms were used extensively in the potential places of contraction, regardless of commenting group.
Since young men used the ellipsis of subject and/or verb most frequently of all commenting groups and also used the contracted forms in all potential places of contractions, the conclusion is that the young men used a style that is closer to spoken English than the three other commenting groups.

Keywords: asynchronous CMC, internet language, netspeak, chatspeak, user-generated content, user-generated comments, reader responses, gender

Sammanfattning på svenska Den här studien undersöker språkdrag som är typiska för språk på internet. Det material som har undersöks har hämtats från användarkommentarer i nättidningar som är riktade till fyra olika läsargrupper: (1) kvinnor (Working Mother, Mothering), (2) män (Esquire), (3) unga kvinnor (Seventeen) och (4) unga män (Gameinformer). Cirka 5 000 ord hämtades från kommentarsfälten för varje tidning, vilket resulterade i en korpus som omfattade 21 087 ord totalt. Korpusen analyserades med hänsyn till typografiska språkdrag (smileys, ickestandardiserad stavning av personliga pronomen I och you samt and) och ortografiska språkdrag (förkortningar, akronymer) samt syntaktiska och stilistiska språkdrag som påminner om talspråk (sammandragningar, ellips av subjekt och/eller predikatsverb, tonläge). Resultaten visade att män skrev de längsta kommentarerna, följda av kvinnor, unga män och unga kvinnor i fallande ordning. Vad gäller typiska typografiska och ortografiska språkdrag visar resultatet att de återfanns mycket sparsamt i kvinnornas och männens data, att de återfanns då och då i de unga männens data och att de unga kvinnorna var de som använde dessa språkdrag mest frekvent. Analys av tonläge i användarkommentarerna visade att män oftast använde en aggressiv eller neutral ton, medan kvinnor, unga kvinnor och unga män oftast använde en vänskaplig eller neutral ton. Unga kvinnor använde en aggressiv ton oftare än kvinnor och unga män. Utöver detta visade resultatet att ellips av subjekt och/eller predikatsverb var mest frekvent i de unga männens användarkommentarer, följt av kvinnornas, de unga kvinnornas och männens. Sammandragna former användes näst intill undantagslöst i hela korpusen. 
Eftersom pojkarna uppvisade mest frekvent användning av ellips av subjekt och/eller predikatsverb samt använde sammandragna former i full utsträckning, kan slutsatsen dras att de unga männens syntax är mer påverkad av engelskt talspråk än syntaxen hos de tre andra kommenterande grupperna.

Nyckelord: internetspråk, användarkommentarer

Contents

1. Introduction

1.1 Presentation of aims

2. Theoretical background

2.1 Internet language

2.2 Typographic and orthographic features

2.3 E-grammar and level of formality

2.4 Internet language and gender

2.5 Internet language among young men and young women

3. Methods

3.1 Material

3.3 Methods of analysis

3.4 Ethical considerations

3.5 Methodological considerations

4. Analysis and results

4.1 Length and number of comments

4.2 Typographic features: emoticons and typographic respellings

4.3 Orthographic features: abbreviations

4.4 Syntactic features and level of formality

4.5 Commenting tone

4.6 Further remarks on the results

5. Discussion: Implications for the future of the

6. Conclusion and future research

References

Appendix A: Emoticons found in the corpus

Appendix B: Table of additional contractions

1. Introduction

There have been various attempts to define the language used on the internet. Squires (2010:457) explains that internet language is the variety of language commonly used for communication on the internet and in other types of electronic communication such as mobile phone text messages. The terms netspeak, chatspeak, computer-mediated communication (CMC) and electronically mediated language are also employed to describe language used on the internet (Baron, 2009:43, Squires, 2010:457-459). Abbreviations, blends, respelling of words and emoticons are examples of features that would be considered typical of this variety of language. There are also grammatical structures that are typical of internet language. For example, it is common to leave out the subject of a sentence and to use punctuation in a more informal, and often exaggerated, way. Though the features mentioned above are often considered typical of the language variety, scholars in this field stress that the language used on the internet varies with its user, its purpose and the genre in which the communication occurs (e.g., Baron, 2004:398, Hård af Segerstad, 2002:14, 16-21, Squires, 2010:463). In this paper the term internet language will be used. It is a term introduced by Lauren Squires, who uses it to imply that it is a language variety that has emerged from and is used on the internet (Squires, 2010:485). It should be stressed that the term does not imply that the features of internet language are used only on the internet; they can be found elsewhere as well.

Research about internet language often examines to what extent the language variety consists of features from spoken language. Internet language is predominantly written, yet it consists of features of both spoken and written language, according to Squires (2010:462). Where the language variety is informal, synchronous and ephemeral, it contains features typical of speech, and where it is editable, text-based and asynchronous, it contains features typical of writing. Research has shown that asynchronous internet language, such as blogging, holds a position closer to writing, whereas synchronous internet language, such as communication in chatrooms, is closer to speech (Herring, 2007:8).

What impact does internet language currently have on changes in the English language? What impact will it have on English in the future? Though the answers to these questions must be mere speculation, they are often reflected in public discourse. Squires (2010:467, 475) explains that discussions in the mass media about this topic are often heated. Many people are afraid of language change and take a prescriptive view, stating that Standard English is superior to the more informal language used on the internet. The same opinions were found in a study of articles from the print media in the early 2000s, where Thurlow (2006:667) examined headlines and found that they warned about the effects of internet language on language change. But will internet language make Standard English change at an alarming rate, as the articles predicted? Will it dumb down the English language? Will users of English not still be able to understand in which contexts informal or formal language is suitable? It is important to find answers to these questions, both to avoid prejudices against the language variety and to contribute to the public discussion with arguments from scholarly research.

1.1 Presentation of aims

The aim of this study is to examine the frequency of features typical of internet language and relate the findings to different age and gender groups. The research questions are:

- To what extent are features considered typical of internet language used in user-generated comments in online magazines?
- What linguistic differences can be seen in the language used in the user-generated comments from the following commenting groups: (1) adult men, (2) adult women, (3) young men and (4) young women?
- What does scholarly discourse imply about the impact of internet language on the future English language? What impact of internet language on English can be seen in this study?

In order to answer these questions, the extent to which some typical features of internet language occur is investigated using data from user-generated comments. The user-generated comments are appended to articles, blog posts and columns in online magazines geared towards four different groups of readers: adult men, adult women, young men and young women. The material on which the study is based consists of a corpus of approximately 5,000 words of user-generated comments from each commenting group. The corpus is analyzed with regard to linguistic differences between the commenting groups as regards the frequency of nonstandard typography, abbreviations, acronyms, contractions, ellipsis of subject and/or verb, commenting tone and level of formality in language use.

The choice to study abbreviations and acronyms is based on Squires’ (2010:467) observation that these are the nonstandard written features most commonly presented by the mass media as typical of internet language in the first decade of the 21st century. Ellipsis of subject and/or verb is studied based on Herring’s (2011:5) description of the syntax of internet language as fragmented and Crystal’s (2006:467) claim that the nonstandard usage of verbs is a feature of the grammar of internet language. The study of ellipsis of subject and/or verb together with contractions provides results at the syntactic level. In order to provide results at the stylistic level, each user-generated comment is categorized according to the tone of its language and content. The method used to categorize comments according to tone is based on Herring and Kapidzic’s (2011) previous study on tone in instant messaging (IM). Herring (2007:19) has earlier described tone as the manner in which the communication is carried out.

The texts in the corpora were run separately in the free version of the software program Linguistic Inquiry and Word Count (LIWC), which can be found online.1 The program automatically calculates the frequencies of different categories of words, such as self-references (I, me, my), social words, positive and negative emotions, cognitive words, articles and words with more than six letters (big words). These linguistic dimensions indicate the level of formality of the texts when they are compared with the reference percentages that LIWC provides for formal and personal texts. This method has previously been used in an analysis of teen chatroom language by Herring and Kapidzic (2011), and in my study it provides quantitative results at the word level.
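The kind of word-level count described above can be illustrated with a minimal sketch. It is not LIWC itself: the two word lists are hypothetical and deliberately tiny, the tokenizer is a simplifying assumption, and only three of the categories mentioned (self-references, articles and big words) are covered.

```python
import re

# Hypothetical mini word lists; the real LIWC dictionaries are far larger
# and are not reproduced here.
SELF_REFERENCES = {"i", "me", "my", "mine", "myself"}
ARTICLES = {"a", "an", "the"}


def tokenize(text):
    """Lowercase the text and return word tokens (internal apostrophes kept)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def category_percentages(text):
    """Return the share of tokens, in percent, that fall into each category."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        return {"self_references": 0.0, "articles": 0.0, "big_words": 0.0}
    counts = {
        "self_references": sum(t in SELF_REFERENCES for t in tokens),
        "articles": sum(t in ARTICLES for t in tokens),
        # "Big words" are words with more than six letters.
        "big_words": sum(len(t.replace("'", "")) > 6 for t in tokens),
    }
    return {name: 100.0 * n / total for name, n in counts.items()}


sample = "I think the article is wonderful and my sister agrees"
print(category_percentages(sample))
```

Percentages of this kind only become meaningful when they are compared with reference values for formal and personal texts, as described above.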

2. Theoretical background

As discussed above, various attempts have been made to define internet language. Crystal (2006:20) characterizes it as a language with features unique to the internet and explains that it has originated from a medium that is electronic, global and interactive. Squires (2010:463), among others, does not agree with this definition, since many of the features Crystal describes as part of internet language are not new or unique: they have been used before, for example in personal notes and telegraph messages. She explains that although there are features that have emerged through the use of the internet as a medium, they are neither used in the same way by all users nor used in all situations of communication on the internet (Squires, 2010:463). The use of internet language varies with purpose and genre – as all language use does. The divergence between Crystal’s and Squires’ views of internet language indicates that research about language use and communication on the internet has not yet established its boundaries. Scholars are still breaking ground in investigating the impact internet language has and will have on the English language.

This section will present internet language and give a historical as well as a current picture of the variety. The discrepancy in opinions between scholars of this particular research area and the print media is discussed, as are differences in language use between women and men. Research about grammatical features considered typical of internet language is also presented below.

1 LIWC, http://liwc.net/tryonline.php

2.1 Internet language

When trying to decide how internet language should be defined, several ideological issues need to be considered. Squires (2010:458) points to the following ideologies, which she finds central when enregistering internet language:

- Linguistic correctness (internet language compared to Standard English)
- Distinction between what happens in “real” life (IRL) and what happens on the internet (virtual reality, VR)
- Technology-driven language change (technological determinism)
- Social acceptability
- English language protectionism

During the 1990s, when research began in this field, the language used in CMC was considered to be a result of the medium used to produce it, i.e., a result of the computer. It was implied that the language variety and the words used in that variety were a direct result of technology (Squires, 2010:461). The term for this view is technological determinism, and it gives the place where communication occurs, i.e., the computer, priority over the actual language used to communicate. Today linguists who study language on the internet take other ideological issues into consideration, such as the common opinion that Standard English is superior to internet language and that the use of internet language is not considered politically correct, as it is assumed to differ greatly from Standard English. These are sociocultural and historical considerations that need to be a part of the discussion about this particular variety of language, according to Squires (2010:460). In addition, there is another sociocultural question to consider: the speech community of internet language is to a great extent heterogeneous, since its members differ with regard to, for instance, nationality, social characteristics, age and gender. This means that internet language will vary with its users and the genre and situation in which communication takes place (Hård af Segerstad, 2002:14, 16-21).

In the early days internet language was pictured more as a new lexical register than as a new language variety. The mass media printed glossaries of netspeak that were translated into Standard English so that new users could understand the jargon (Squires, 2010:465). Even today technological determinism is highly present in public discourse and often reflected in the mass media. This has in some ways determined people’s views on internet language and established it as a more fixed language variety than it actually is, according to Squires (2010:461, 464). Thus, ideological issues, as well as the picture produced by the mass media, have affected public discourse on the subject over the last twenty years. This has made people anxious about language change, and attempts are made to control the trends of language change through prescriptivism, according to Thurlow (2006:668), who also states that many blame technology for this decline in standards. Thurlow (2006:679, 686) explains further that online language is caricatured in the mass media through a selected set of features. It is thereby presented as a threat to literacy, and this makes people in general look at the variety with an exaggerated fear of drastic changes in language.

To find out whether the picture of internet language that Thurlow found in the early 2000s is still present in public discourse online, I searched the social network site Facebook for online groups about netspeak and chatspeak. Many groups sending the same kind of messages were found. The groups convey that internet language is a language variety considered inferior and a threat to future English. The groups have names like “Chatspeak is an insult to the English language”2 and “Your chatspeak pisses me off; learn to type real words”.3 My search for the opposite stand – groups that argue for the use of internet language – resulted in nothing on the same social network site; there were no groups to be found on Facebook that defended the use of electronically mediated language.

Another example that shows evidence of the picture presented by Thurlow is the online collaborative Urban Dictionary and its definition of the word chatspeak. Users of chatspeak are there said to be looked at in a disparaging way by people who have the decency to type out full sentences and know how to spell correctly. The following definitions of chatspeak were found in the Urban Dictionary:

(1) Also known as webspeak, chatspeak is basically an illiterate way of typing, and a way to massacre a language. Shortening words (such as you to u), insisting on ignoring captials [sic], making words numbers, (such as 2 or 4) and not using endmarks are all parts of chatspeak. For most people it annoys them shitless, but certain people insist.

(2) This is a form of speech in which one shortens words and replaces the letter "s" with the letter "z" in an effort to save time and look cool. Chatspeakers also rarely use capitalization or correct punctuation. Chatspeakers are generally looked down on by people who can actually spell and who have enough self-respect to type out a real sentence. Chatspeak can never be considered 'literate'.4

2 https://www.facebook.com/pages/Chatspeak-is-an-Insult-to-the-English-Language/105385522831983?fref=ts. Accessed: 10 October, 2012.
3 https://www.facebook.com/pages/Language-On-The-Internet/206373759390544?ref=ts&fref=ts#!/pages/Your-CHATSPEAK-pisses-me-off-LEARN-TO-TYPE-REAL-WORDS/10150099618180646?fref=ts. Accessed: 24 October, 2012.
4 http://www.urbandictionary.com/define.php?term=chatspeak Accessed: 13 November, 2012.

In my opinion, these examples show that there is a public reaction to electronically mediated language and the reaction sends out signals that the English language is degenerating on account of internet language.

While public discourse about CMC is often concerned with people’s fear that the English language will go to ruin, linguistic scholars like Baron, Crystal and Herring are more sceptical of this assumption. They conclude that purists can relax since the digital of English is doing more good than harm (Boyd in Squires, 2010:467). In fact, scholars studying internet language take quite the opposite stand from public discourse and point to facts that show that electronically mediated language might not be as different from Standard English as was previously thought. Many findings from scholarly research reject the assumption that internet language is a unique and uniform language variety, since – as has been discussed previously – it varies with purpose and genre like all use of language (e.g., Hård af Segerstad, 2002:14, 16-21, Squires, 2010:463). With this in mind, my conclusion is that the fear of language decay most likely arises because the media, as well as public discussions on the internet, spread a myth about internet language – a myth built on an exaggerated, or misinterpreted, scope of language change. The myth implies that internet language is very different from Standard English and that its impact makes language change very rapidly. This mediated image has given people in general a distorted view of internet language, which differs from that of the scholar.

2.2 Typographic and orthographic features

Research on internet language has provided evidence for a language variety with distinctive written features, according to Squires (2010:457). Most common are acronyms like BRB (be right back) and LOL (laughing out loud), abbreviations such as coz for because and u for you, and respellings of words such as gal or grrlz, both of which refer to the word girl/s. The following changes in spelling are presented by Crystal (2006:86-98) as typical of internet language: orthographic reduction of letters as in thx for thanks, rebus replacements of letter combinations such as gr8 and b4 (great and before), capitalization and punctuation that vary from standard use, as in rAndoM!!!!!!!!!!!!!!!!!!!! (my own example), and letter replacements such as s spelt z. Acronyms and abbreviations are examples of the loosened orthographic norms widely considered to be a defining characteristic of internet language, though Herring (2011:3) stresses that they are not to be interpreted as misspellings but as a means to emphasize the playfulness and creativity in chat language.


Squires (2010:467) explains that abbreviations and acronyms were presented as iconic characteristics of internet language in the mass media in the mid-2000s. The mass media’s assumption that these features are the most common is based on the idea that internet communication needs to save time and space in order to be efficient. As discussed above, findings by scholars like Baron (2004:416), Squires (2010:482) and Tagliamonte and Denis (2008:12) contradict the presentation of internet language in the mass media. These scholars have analyzed corpora from IM and found that “characteristic IM forms” are not used as often as the media has led people to believe. In fact, they are not used often at all: Tagliamonte and Denis’ (2008:12) study shows a proportion of 2.4% of the total corpus they analyzed. This means that approximately 650 items out of nearly 27,000 words were considered typical of IM language.
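The relation between the percentage and the approximate counts quoted above can be checked with a few lines of arithmetic. The sketch below uses only the rounded figures given in this section (2.4%, about 650 items, nearly 27,000 words); the exact counts from the original study are not reproduced here, and the function name is merely illustrative.

```python
def proportion_percent(items, total_words):
    """Return items as a percentage of total_words."""
    return 100.0 * items / total_words


# Rounded figures quoted in the text, not the exact counts of the study.
approx_items = 650
approx_words = 27_000

# Share of characteristic forms in the corpus, rounded to one decimal.
print(round(proportion_percent(approx_items, approx_words), 1))

# Going the other way: 2.4% of 27,000 words, rounded to a whole number.
print(round(0.024 * approx_words))
```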

A study of text messages and IM carried out by Baron (2008:154) also showed the same low frequency of language features considered typical of internet language. The study was carried out on the language of female college students. Eight acronyms were found in the text messages and four in the IM conversations: a total of eight examples of lol, two of omg, two of ttyl (talk to you later) and one wtf (what the fuck). The two different corpora studied by Baron were made up of a total of 2,619 words (Baron, 2008:151-154). In the same study, no examples of abbreviations typical of internet language were found in the IM corpus, after the lexical shortenings ya (you), prob. (probably) and em (them) were excluded on account of not being typical of online language. In the text messages 47 abbreviations were found, among others u (you), r (are) and k (OK) (Baron, 2008:154).

In internet language there is an overlap between nonstandard typography and nonstandard orthography (Herring, 2011:2). This means that replacing letters with symbols representing the sound, e.g., gr8 for great and u for you, can be classified as both nontraditional spelling and as a typographic characteristic of internet language. Early studies of online language emphasized the creativity in language use as the foremost drive for these phenomena, but later research suggests that only a small number of nontraditional spellings have been standardized and they occur most frequently in mainstream online contexts (Kapidzic in Herring, 2011:3). The examples mentioned are u for you, msg for message and wanna for want to.

Both Squires (2010:482, 484) and Tagliamonte and Denis (2008:14) note that the nonstandard form of the first-person pronoun I, written as a lowercase letter, is more common in IM than its standard equivalent. Individual differences in the use of apostrophes in contractions and the possessive are other findings made by Squires, who notes that some individuals use the apostrophe all the time while others never use it. She finds it notable that these orthographic features are rarely discussed as a part of internet language in the mass media, even though they are more common in IM than acronyms and abbreviations (Squires, 2010:482).

Another typographic characteristic of internet language is the use of non-alphabetic keyboard symbols in emoticons: sequences of keyboard characters that represent facial expressions (Herring, 2011:2). Western-style emoticons are viewed at a 90-degree angle (:D for a laughing face), while Asian-style emoticons are viewed straight on (O_o for a confused face). Crystal (2006:41-42) claims that smileys are one of the most distinctive features of internet language, but at the same time he explains that studies have shown that they are not very common. Herring (2011:2) also argues that studies have shown that emoticons occur less often than popularly believed in English internet language. Both Crystal and Herring explain that the most frequently used emoticons are variants of a happy face, e.g., smileys like :) and :))), and a winking face, e.g., winkies such as ;) and ;-) (Crystal, 2006:41, Herring, 2011:2). In the study of text messages and IM done by Baron (2008:151-154), emoticons were very infrequent in the messages sent by females. In the corpus of 1,473 words from text messages only two emoticons occurred, and in the corpus of IM conversations, five emoticons were found in the 1,146 words (0.7%). Out of the eight examples, seven were smileys (Baron, 2008:154).
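As an illustration of how Western-style emoticons of the kind exemplified above could be located in a text, the following sketch uses a simple regular expression. The pattern is an assumption made for this example only: it covers happy, sad, laughing, tongue-out and winking faces with an optional nose, it ignores Asian-style emoticons such as O_o, and it misses emoticons typed directly after a word without a space.

```python
import re

# Illustrative pattern only: eyes (: or ;), an optional nose (-),
# then one or more mouth characters. Not an exhaustive emoticon inventory.
WESTERN_EMOTICON = re.compile(r"(?<!\w)[:;]-?[)(DPp]+(?!\w)")


def find_emoticons(text):
    """Return all Western-style emoticons matched by the pattern, in order."""
    return WESTERN_EMOTICON.findall(text)


comment = "Great post :) thanks ;-) so funny :D :)))"
print(find_emoticons(comment))
```

The lookarounds are there to avoid false hits in strings such as URLs or clock times, where a colon is directly preceded by a letter or digit.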

2.3 E-grammar and level of formality

Herring (2011:1) introduces the term e-grammar to represent the set of features that characterize the grammar of electronic language. E-grammar does not imply that there is a single grammar for all varieties of electronically mediated language, since there is evidence that points to e-grammar as “varying systemically across languages, contexts, users, and technological modes” (Herring, 2011:1). Herring (2011:6) refers to previous research on CMC in which it has been shown that internet language can be distinguished from traditional genres of speech and writing. Typically it falls between the two categories when scholars measure the frequency of grammatical function words such as pronouns, determiners, modal auxiliaries and negations in corpora. Herring (2011:6) explains that asynchronous modes of internet language, e.g., email, are often closer to formal writing, while synchronous chat is often closer to casual speech.

The syntax of internet English is sometimes described as fragmented compared to standard syntax (Herring, 2011:5). Herring points out that parts of speech such as articles and subject pronouns can be elided in order to save keystrokes in informal styles. Messages that consist of sentence fragments where the subject and/or finite predicate are left out (ellipsis of subject and/or verb) are common in genres like chat, IM, texting and microblogging. Herring (2011:5) states that ellipsis of subject and/or finite verb may be a way to try to type speech-like utterances or may be done for the sake of brevity. Other research shows that internet language can be syntactically casual, with freely omitted subjects, modals or articles (Maynor in Baron, 2001:193).

Another part of e-grammar, which traditionally has also been considered a marker of the level of formality in a text, is the use of contractions. Baron (2008:154) analyzed the use of contracted forms in text messages and IM conversations. In IM, 68% of all potential contractions were contracted, and in texting 85%. The apostrophe in contractions was used far more frequently in IM than in text messages: 32% of the contractions in texting had the apostrophe, compared with 94% of the contractions in IM. Baron (2008:160) suggests that there might be different reasons for leaving out the apostrophe in contractions in the two different modes of electronically mediated language. For instance, in texting several extra steps are needed to insert the apostrophe, whereas inserting the apostrophe when typing on a computer keyboard demands almost no effort (Baron, 2008:154). She reasons that the cases where the apostrophe actually was inserted in text messages can reflect the author’s writing habits from the computer keyboard (Baron, 2008:160).
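The notion of a contraction rate, i.e., the share of potential contractions that are actually contracted, can be sketched as follows. This is a simplified illustration and not Baron's procedure: the list of form pairs is hypothetical and very short, contractions written without an apostrophe are not counted, and ambiguous forms (it's for it has) are not resolved.

```python
import re

# Hypothetical, deliberately tiny list of (full form, contracted form) pairs.
PAIRS = [
    ("do not", "don't"),
    ("it is", "it's"),
    ("i am", "i'm"),
    ("they are", "they're"),
]


def count_form(form, text):
    """Count whole-word occurrences of a form in already lowercased text."""
    return len(re.findall(r"\b" + re.escape(form) + r"\b", text))


def contraction_rate(text):
    """Return the percentage of potential contractions that are contracted,
    or None if the text contains no potential contraction from PAIRS."""
    lowered = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    contracted = sum(count_form(short, lowered) for _, short in PAIRS)
    full = sum(count_form(long_form, lowered) for long_form, _ in PAIRS)
    potential = contracted + full
    if potential == 0:
        return None
    return 100.0 * contracted / potential


sample = "I'm sure it's fine, but I do not think they are right. Don't worry."
print(contraction_rate(sample))
```

In the sample, three of the five potential contractions are contracted, which corresponds to a rate of 60%.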

2.4 Internet language and gender

When internet language was first discussed by sociolinguists in the 1990s, it was believed that gender roles would be more equalized, as the form of communication was more anonymous than face-to-face communication (Baron, 2004:405). But soon the idea that the anonymity of the internet should make communication more equal had to give way to “the realization that online dynamics often replicated offline gender distinctions” (Baron, 2004:405). If this is linked to Walther and Jang’s (2012:4) claim that user-generated content can be in a reactive or interactive interrelationship with the article it comments on, I assume that gender differences in the style of internet language are evident in the same way as gender differences are in speech style.

Previous research has shown some gender differences in the use of internet language and online communication. For example, Herring (2003:207, see also Baron, 2004:405-406) analyzed one-to-many asynchronous CMC and found that men tended to post longer messages and to be the ones who opened and closed conversations in gender-mixed groups. Other findings by Herring (2011:40) show that men express their opinions strongly, use harsh language and hold an adversarial orientation towards their interlocutors. Women, on the other hand, tend to post relatively short messages where they express support of others and manifest their alignment towards their interlocutors. Herring’s (2003:210) earlier study shows that representations of smiles and laughter (e.g., emoticons, lol, hahaha, hehe etc.) are typed three times as often by women as by men in chatrooms, while the gender ratio is reversed for aggressive and insulting behaviour. The fact that women write shorter messages contrasts with traditional research on writing, where findings show that women write longer texts than men (Baron, 2004:418). To conclude, previous studies show that men and women tend to use different discourse styles in asynchronous CMC (Herring, 2003:210). Women tend to use a style which is supportive and aligned, while men more often use a style which is oppositional and adversarial.

While Herring notes differences in male and female language in asynchronous CMC, Atai and Chahkandi (2012:887) conclude that gender-specific stylistic features were used by both sexes, i.e., men used features considered typical of female language and the other way around. The analysis was based on a corpus from two different professional listservs used by both men and women. Their study also showed that the topic of conversation contributed to the choice of language features and that topics with real world consequences attracted women more, while men tended to be attracted to topics containing abstract theorizing. If the topic was about real world consequences, men wrote longer messages than women on the same subject (Atai and Chahkandi, 2012:884). This can be linked to Dare’s (2011:185) opinion that a holistic examination of the purposes and motivations for communication is an important focus when identifying gender differences in online communication. Other scholars like Walther and Jang (2012:4) also state that the qualities of linguistic, stylistic and semantic components of a message are of interest in research.

When studying a motherhood blog in Brazil, Braga (2011:215) took such a holistic view. She found that women participating in guestbook activities of the blog used this activity to recover the social practice of “woman’s talk” – a practice the interlocutors were missing in their private lives. Generally this female form of communication is viewed from a male perspective and thereby considered as useless and futile, explains Braga. She elaborates on blog environments and language:

In blog environments, topics are generally addressed in a backstage language, closer to spoken than written language, even if their comments are all written. In other words, blog communication is a written form of spoken language. (Braga, 2011:218)

While Herring (2007:19, see below for further explanation) uses the term tone to categorize conversations, Braga uses attitude (2011:218). The most regular pattern of attitude in comments tagged to the motherhood blog that Braga has analyzed is kindness (Braga, 2011:218).

2.5 Internet language among young men and young women

When researching synchronous CMC in IM, Baron (2004:414-415) found indications that conversations between female college students were about one third longer than conversations between male college students. In other words, females took more turns in conversations. Baron’s corpus is based on IM conversations between college students and consists of about 12,000 words. Her study showed no patterns based on gender in the use of acronyms and abbreviations, but showed a difference in the use of contractions and emoticons:
- males used contracted forms at a rate of 77% while females used them at a rate of 57%
- twice as many emoticons were found in the females’ messages, and it is noteworthy that all the emoticons used by males came from one single person’s conversations with females

In IM, female college students took longer turns than men in the conversations and their conversations were longer as well (Baron, 2004:418). This could be said to contrast with the findings of Atai and Chahkandi (2012) and Herring (2003) in their studies of adult language, where they found that men tended to write longer messages than women. Baron discusses the fact that there are no indications that same-gender conversations in speech show patterns that women’s oral discussions are longer than men’s, but women tend to write longer sentences in essay writing. Her conclusion is that her corpus shows evidence of a female writing style, but not of a female speech style (Baron, 2004:418). In addition to women’s lesser use of contractions, this fact indicates that women look upon IM as a written medium (Baron, 2004:418).

Another gender difference found in studies on internet language is the use of different stylistic tones in which messages are written. Herring (2007:19) refers to tone as the manner in which a message is performed. The message can be emphasized with the use of emoticons that take on different pragmatic meanings depending on the tone (Herring, 2007:21). According to Leurs and Ponzanesi (2011:205), the interlocutors can emphasize, hide and add nuances to their identities through the use of subject matter, voice, tone and emoticons when they communicate online. In this way it is possible for adolescents to monitor the responses and interactions of others, which contributes to the development of the identity of the adolescent self (Leurs and Ponzanesi, 2011:206). Guiller and Durndell (2006, quoted in Herring and Kapidzic, 2011:42) found significant gender differences in the use of stylistic variables when researching students’ language in asynchronous discussion groups. Their results showed that young men were more likely to use authoritative language and to respond negatively in interactions, while young women showed support, agreed explicitly and made more personal and emotional contributions.

In support of the findings of Guiller and Durndell, Herring and Kapidzic (2011:42) cite research about tone in profiles on MySpace, done by Thelwall, Wilkinson and Uppal (2010). Their research showed that female messages had a positive tone to a greater extent than the male messages did. Herring and Kapidzic’s (2011:48) own study of tone in IM showed that young women used a friendly tone much more often than young men, who used an aggressive or flirtatious tone more often than young women did. It also showed that young men and young women used the neutral tone to the same extent.

3. Methods

In this study a corpus of 21,087 words was collected in order to investigate typical features of internet language. The data was collected from comments on articles in online magazines. The term user-generated content is used by Walther and Jang (2012:2) and is a definition of one of the message types used in participatory websites - also known as Web 2.0 or social web sites. User-generated content includes readers’ responses to both proprietor content and other user-generated content (Walther and Jang, 2012:4). The systems in participatory websites present both

central messages posted by a web page proprietor, and user-generated content that other readers contribute. These systems can both facilitate and complicate social influence because they provide information from a variety of sources simultaneously who possess different attributes and connote different relationships to readers. (Walther, as seen in Walther and Jang, 2012:2)

One example of such user-generated content is talk-back features that are tagged to online news stories (Walther and Jang, 2012:2). In my view this is comparable to comments posted by readers in online magazines and therefore the term user-generated comments is used in this study.

3.1 Material

The corpus was collected from online magazines geared towards adult men, adult women, young men and young women respectively. In order to find suitable magazines, I began by searching Top 10-lists of magazines for men, women and teenagers, since my study focuses on both gender and age. Different online magazines were visited and I made my choices based on the explicitly targeted readership of each magazine as well as on the ease of accessing article comments. For each commenting group, i.e., adult men, adult women, young men and young women, a second magazine option was selected in case there would not be enough user-generated comments to reach a total of 5,000 words from the first magazine.

The online magazines used for collecting the corpus have explicit targeted readerships, expressed either on the publisher’s homepage or on the homepage of the actual magazine. The two magazines used to collect the adult women’s data are Working Mother (http://www.workingmother.com/) and Mothering (http://www.mothering.com/), both of which are geared towards women. Seventeen (http://www.seventeen.com/) is geared towards teenage girls and young women and was used for collecting the data for young women. The adult men’s data was collected from Esquire (http://www.esquire.com/), which is geared towards men. A magazine geared towards people who play computer and TV games and are interested in technical equipment, Gameinformer (http://www.gameinformer.com/), was used for the young men’s data. The choice of this particular magazine is based on the assumption that the targeted readership of Gameinformer is different from the targeted readerships of Esquire and Seventeen in terms of age and gender. Gameinformer has a section that welcomes new users of the online magazine. It is called “Welcome to the brotherhood”, which indicates that the target group is young men and men. In the user guidelines of the site, users are recommended to act like adults even if they are not and to use language without curses.5 A brief search of ten different user-profile pages showed that eight users were in high school, college or university, indicating that the users are mostly in their late teens or early twenties.

Both Esquire and Seventeen are published by the Hearst Corporation. Seventeen is a teen fashion and beauty magazine whose aim is to report on issues that young women face every day.6 A brief search of ten of the user-profiles showed that six out of ten users go to high school and the remaining four users did not state their age on their profile page. Esquire is a general life-style magazine “for sophisticated men of contemporary America”. It aims to reach men “who are intellectually curious and socially aware”.7 A search of ten of the user-profiles showed that nine out of ten were 50 years of age or older and all of them stated having a university education. The community guidelines on both Esquire and Seventeen say that obscene language or abusive behavior is not accepted in the communities. They encourage respectful and civil behavior to others and do not allow commercial solicitation or advertising. When the data was collected for adult men, only comments posted from user-profiles stating to be male were used and when data for young women was collected only female user-profiles were used.

5 http://www.gameinformer.com/forums/f/32/t/3729.aspx [Accessed 28 October, 2012].
6 http://www.hearst.com/magazines/seventeen.php [Accessed 28 October, 2012].

Working Mother is geared towards mothers who work and reports on issues that are of interest to women who are both mothers and professionals. The guidelines ask their users to be careful with spelling mistakes.8 Mothering is a magazine for parents interested in “natural family living” and the user guidelines do not encourage debate; instead they want to provide support to parents on their website.9 Mothering’s forum guidelines do not allow advertising in posts and they provide the users with a list of abbreviations and emoticons that can be used in the community.10 When I collected the data for adult women, all comments that were posted from male user-profiles were excluded.

User-generated comments posted on the sites in the week containing the 15th of each of the months July, August, September, October and November 2012 were collected from the magazines to reach a total of 5,000 words for each commenting group. For Seventeen, the collection period needed to be expanded, as not enough comments were posted during these 35 days. An attempt was first made to find comments on the second-choice magazine geared towards young women, but there were not enough comments to be found there either. Among the magazines examined in this category, Seventeen is the one that has the most comments on its articles and therefore the choice was made to expand the time of data collection rather than using the second-choice magazine. A total of approximately 100 days were used to collect the data from Seventeen.
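The sampling windows described above can be sketched in a few lines of Python. This is an illustration only: the text does not state on which weekday a collection week began, so the Monday-to-Sunday convention and the helper name `week_containing` are assumptions made for the example.

```python
from datetime import date, timedelta


def week_containing(day):
    """Return (monday, sunday) of the week that contains `day`.

    Assumption: weeks run Monday to Sunday; the text does not say so.
    """
    monday = day - timedelta(days=day.weekday())  # weekday(): Monday == 0
    return monday, monday + timedelta(days=6)


# The 15th of July through November 2012, as named in the text.
windows = [week_containing(date(2012, month, 15)) for month in range(7, 12)]

# Each window is inclusive at both ends, hence the "+ 1".
total_days = sum((end - start).days + 1 for start, end in windows)

for start, end in windows:
    print(start.isoformat(), "to", end.isoformat())
print("Total collection days:", total_days)
```

Under this convention, five one-week windows add up to the 35 days mentioned in the text.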

The user-generated comments from magazines geared towards adult women had to be collected from both the first and the second choice of magazine. Many user-generated comments on the articles in Working Mother and Mothering were posted with a commercial purpose: members of the community post a comment and at the same time get paid for product placement. These comments often did not comment on the actual article, but instead contained more general content that could apply to many different articles. That means that I encountered problems in selecting which comments should be part of the corpus. I made the choice to exclude comments with commercial content, and only collect comments that could be read as a reader’s response to the article. Therefore two different magazines needed to be searched to reach 5,000 words. Hence two different methods were used when the user-generated comments from the first choice of magazine were not enough:
- the period was expanded when collecting comments from magazines geared towards teenage girls and young women and
- two different magazines were used when collecting comments from magazines geared towards adult women.

7 http://www.hearst.com/magazines/esquire.php [Accessed 28 October, 2012].
8 http://www.workingmother.com/other/working-mother-magazine-writers-guidelines [Accessed 28 October, 2012].
9 http://www.mothering.com/community/a/about-us [Accessed 28 October, 2012].
10 http://www.mothering.com/community/a/pleased-to-meet-you-forum-guidelines [Accessed 28 October, 2012].

On Gameinformer, Working Mother and Mothering the readers are allowed to have their own user-written blogs, while Seventeen and Esquire only have proprietor-written blogs in different categories. User-written blogs means that the members of the community are allowed to create their own blogs on the magazine’s site, while proprietor-written articles and blog posts are written by people employed by the magazine. The user-generated comments were collected from user-written blog posts as well as proprietor-written articles and blog posts. The structures of the magazines’ archives differed greatly and as a result, the time it took to collect the material differed between the magazines. For example, collecting data from Gameinformer only took about six hours, compared to the twenty hours needed for collecting data from Esquire and Working Mother/Mothering, respectively. Seventeen required about 12 hours.

The major part of the comments from the magazines geared towards adult women were found in the fashion and food sections, while Seventeen’s most commonly commented articles were on music and celebrities. The political blog in Esquire was commented on to a great extent and in Gameinformer there were comments on almost all articles; both on proprietor-written articles and reviews as well as on many of the user-written blog posts. It is not possible to say anything about which category was mostly commented on in Gameinformer. While comments from Gameinformer and Esquire were mostly placed on the same day as the articles, commenting threads from Seventeen at times had comments that differed by one year in date.

3.3 Methods of analysis

The analysis of the corpus was done separately for each commenting group. First the numbers of words and comments were counted and then the data was searched manually for abbreviations, acronyms, nonstandard typography and orthography plus contractions and ellipsis of subject and/or verb. Nonstandard typography and orthography were then counted with the advanced search function in Word. The frequency of ellipsis of subject and/or verb was manually counted in the data for each commenting group and in the analysis exclamations were not regarded as examples of ellipsis of subject and/or verb. In this study a word was defined as letters or symbols divided by blank spaces, which means that emoticons and articles are counted as words. Each comment consists of at least one utterance but in most cases more than one, based on a definition of utterance as consisting of one or several words that provide referential or pragmatic meaning (Brock 1996, quoted in Sundqvist, 2009:104). Utterances mostly coincided with punctuation. In cases where punctuation was missing, I added punctuation in order to divide the comment into utterances.
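The word definition above (letters or symbols divided by blank spaces) corresponds to simple whitespace tokenization. The following Python sketch illustrates how such counts could be reproduced; it is not the procedure used in the study, where counting was done in Word, and the function names `count_words` and `summarize` are hypothetical. The two sample comments are examples (4) and (5) from the corpus.

```python
def count_words(comment):
    """Count words as whitespace-delimited tokens.

    Emoticons such as ":)" are separate tokens and therefore count as words,
    in line with the definition used in the text.
    """
    return len(comment.split())


def summarize(comments):
    """Return basic length statistics for a list of comments."""
    lengths = [count_words(c) for c in comments]
    return {
        "comments": len(lengths),
        "words": sum(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "shortest": min(lengths),
        "longest": max(lengths),
    }


sample = [
    "This is such a great quote, brightened my day :) I love her!",
    "Glad you made it through Sandy with such a great attitude!",
]
stats = summarize(sample)
print(stats)
```

Note that punctuation attached to a word (e.g., "quote,") stays part of that token, so it does not inflate the count.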

After the corpus had been checked for typical features of internet language, the free online version of Linguistic Inquiry and Word Count (described in section 1.1) was used to examine the corpus for self-references, social words, emotions, cognitive words, big words and articles. The results from this analysis were expected to give an indication about the level of formality in the data. The data for each of the four commenting groups was run separately in LIWC.

Next, each user-generated comment was categorized as one out of three different modes of tone – aggressive, friendly or neutral – in order to investigate the discourse style of the four commenting groups. This categorization is based on Herring’s (2007:18) presentation of a classification scheme for online language. Her scheme is developed from the Speaking model by Hymes (Hymes in Herring 2007:6) and regards the modality of speech called key. In Herring’s adaptation of the model this modality is called tone and a message can be separated into different manners of tone: serious/playful, formal/casual, contentious/friendly, cooperative/sarcastic etc. (Herring, 2007:18). In other words, “‘tone’ refers to the manner or spirit in which discursive acts are performed” (Herring, 2007:19). In my study Herring and Kapidzic’s basic categorization from 2010 is used. Their study categorized the messages from teen chatrooms in three different tones: aggressive, friendly or neutral (Herring and Kapidzic, 2011:45). Later they added three more categories to their study, but in my analysis of the corpus of user-generated comments only the first three categories were used and each comment was categorized under only one of the tones. The categorization was done manually and impressionistically, with the following criteria to guide me:
- comments that carried irony or sarcasm were categorized as aggressive in those cases where a subject of insult was addressed in the comment
- comments with plaintive, reactive and/or adversarial content were categorized as aggressive
- if irony was used in a humorous way, the comment was categorized as friendly or neutral (e.g., friendly bantering)
- comments on content in articles, with or without expressed personal opinions, were categorized as neutral (i.e., a comment need not agree with the article’s content to be categorized as neutral)
- cheering comments expressing love, gratitude and/or encouragement were categorized as friendly
- if the comment contained more than one tone, it was categorized by the most dominant tone

Most of the comments contained only one tone, but in cases where the commenting language was friendly or neutral but the content insulting, adversarial, reactive or plaintive, the comment was categorized by the most dominant tone, as in the following example:

(3) Ahhhhh hellllooooo that's what girls do we're girls!!!!!

Civil language is used in this comment, but the content is adversarial; the Ahhhhh hellllooooo and the repeated exclamation marks imply the questioning of someone and the wish to tell someone off. The speech-like exclamation therefore indicates an adversarial stance and the comment was consequently categorized as aggressive. To clarify further how the categorization was made, a few examples (4-9) from the corpus are presented below.

(4) This is such a great quote, brightened my day :) I love her! (friendly)
(5) Glad you made it through Sandy with such a great attitude! (friendly)
(6) You know X, I like you and all but goddamit it seems like every time I read a post and have something snarky, mildly humorous to say about it - there you are. Damn it, I don't mind that you're better at it than I am - it's just this 'cut off at the pass' stuff that sticks in my craw. (aggressive)
(7) I am very upset right now. I HATE YOU I HATE YOU I HATE YOU I HATE YOU I HATE YOU. (aggressive)
(8) This looks a bit kitchy, just another way to juice your bucks (neutral)
(9) I really thik it's a good ideea to have more womens at US government. They must have the same rights as men's do [sic] (neutral)

3.4 Ethical considerations

Based on the collected corpus of user-generated comments, a language analysis was made to look for linguistic differences in relation to age and gender. The comments were collected from the different magazines without user names or references to the article commented on, and then copied into a computer file. On the internet it is impossible to know exactly who is behind the aliases or the user accounts on social network sites. Hence the conclusions from this study have to be founded on the assumption that most commenters are of a certain age and a certain gender, based on the explicit aim of the magazines and the brief search that was made on user-profiles of each commenting group (see section 3.1). Consequently, my assumptions are that the commenting group of Esquire represents adult men, while the commenting group of Gameinformer consists of young men. Likewise I assume that the commenting group of Mothering and Working Mother consists of adult women, while young women make up the commenting group of Seventeen.

3.5 Methodological considerations

The method chosen for this study is based on previous research on internet language. Hård af Segerstad (2002:120) concluded that the use of language in email varies with age and gender. Herring (2003, 2011) and Baron (2004) also found gender differences in the language of IM, as did Tagliamonte and Denis (2008). Tagliamonte and Denis (2008:24) concluded that stylized IM forms are abandoned by adolescents at a young age although the unique style of the medium is kept. Their studies of IM are quantitative and do not contain any information about user-generated content or comments, but their findings nevertheless motivate the focus of the present study: age and gender. The studies on online commenting that I was able to retrieve were made from a technological, sociolinguistic and/or social psychological perspective (e.g., Braga, 2011 and Atai and Chahkandi, 2012). This is in line with Thurlow’s (2006:668-669) view that CMC research has tended to focus on ways of communicating rather than on linguistic practice, presumably on account of technological determinism. Walther and Jang (2012:4) believe that linguistic, stylistic and semantic qualities are of interest when studying online language and behaviour. In contrast to early research about internet language, when the variety was considered a result of the medium, today’s research has a broader focus and takes traditional linguistic areas such as genre, stylistics and semantics into account. This study has its focus at the typographic, orthographic, syntactic and stylistic levels, and is thereby focused on linguistic practice.

Baron (2004:398) argues that each type of CMC has its own usage conditions depending first on the number of interlocutors and second on whether the communication is instantaneous or not. The usage conditions in their turn can affect the language used in communication when it comes to formality, tone, the number of words, correctness and informativeness. Hård af Segerstad (2002:252) explains that communicators in asynchronous modes have time to plan and revise their writings, while synchronicity demands more rapid typing and with that comes less revision of the written text. She explains further that the relationship between the participants as well as the activity where communication takes place are conditions that need to be regarded:

Language use is adapted according to level of synchronicity, the particular conditions for production and perception in each means of expression, as well as according to the communicative situation and context (Hård af Segerstad, 2002:253)

The findings of my corpus analysis will be linked and compared to previous quantitative findings of research on IM corpora as well as previous findings about asynchronous internet language. User-generated commenting and IM are two different online genres that differ in terms of synchronicity and the number of interlocutors. User-generated comments are not usually received by a known interlocutor, as is the case in IM conversations, but address members of a community that the commenter belongs to. The other members might or might not be known to the commenter and quite often user-generated comments can also be read by people who are not members of the community. Hence the conclusion can be drawn that the user-generated comments that were collected in this study belong to a public genre within internet language. The comments do not require an instant answer from someone and an online comment thread can be visited and read by many different members of the community on different occasions. That is another contrasting feature compared to IM. The assumption can be made that these conditions contribute to greater formality in language, as the comments address people unknown to the commenter and the commenters have time to revise and express themselves since no one is waiting for a quick reply. One-to-one communication between people who know each other must be assumed to encourage the use of more informal language than one-to-many communication between people who are unacquainted with one another.

4. Analysis and results

The results of the analysis of the corpus of user-generated comments from the four different commenting groups are presented below. In 4.1, a basic analysis of comment length and number of comments is presented and compared to findings by scholars cited above. In 4.2 and 4.3 the features generally considered typical of internet language (emoticons, nonstandard spelling, abbreviations and acronyms) are analyzed. In 4.4 the LIWC analysis of the level of formality in the comments is presented, together with a survey of ellipsis and contractions. Finally, section 4.5 presents the results from the analysis of commenting tone.

Results from the gender and age analysis of typical features in the collected data are thereby presented from a typographic, orthographic, syntactic and stylistic view respectively. The nonstandard spellings analyzed in my study plus the omission of the apostrophe in contractions are features that overlap in language level. Nonstandard spellings of the pronouns I and you plus the conjunction and are in the present study regarded as typographic features, but could also have been counted as orthographic features since they are respellings (see further discussion in Herring 2011:2 and section 2.2 above). The omission of the apostrophe in contractions can belong to both typography and syntax. As the use of contractions traditionally has been considered a marker of informality in a text, the feature will be covered in section 4.4, Syntactic features and level of formality, and not in section 4.2, Typographic features: emoticons and typographic respellings.

4.1 Length and number of comments

Tables 1 and 2 present the results from a basic analysis of the user-generated comments. Table 1 shows the analysis from a gender perspective and Table 2 shows the results from the perspective of the four commenting groups.

Table 1. Survey of the corpus from a gender perspective.

Commenting group                            Male      Female
Total no. of words                          10,512    10,575
Total no. of comments                       239       402
Mean length of comments (no. of words)      44        26
Average number of utterances per comment    3.28      2.29

Table 2. Survey of the corpora from the perspective of the four commenting groups.

                                            Adult men   Adult women   Young men   Young women   Total
Number of words                             5,295       5,343         5,217       5,232         21,087
Number of comments                          98          132           141         270           641
Mean length of comments (no. of words)      54          41            37          19            33
Shortest comment                            6           2             4           1             1
Longest comment                             175         167           197         224           224
Average number of utterances per comment    3.32        2.92          3.25        1.98          2.66

As mentioned, traditional writing research has shown that women write longer texts than men, but this contrasts with findings from asynchronous internet language in which women have been found to write shorter messages than men (Baron, 2004:418, Herring, 2003:207). This contrast is apparent in the analysis of the data in the present study where male comments are nearly twice as long as the female comments in mean length (see Table 1). If a comparison is made between the commenting groups of young women and adult men, the contrast is even more pronounced and Table 2 shows that young women’s comments are approximately one third of adult men’s in mean length. Thus my findings are in line with earlier research done by Herring (2003) and Atai and Chahkandi (2012) on asynchronous internet language.

In the synchronous, one-to-one and private mode of internet language IM, Baron found that female college students write longer messages than male college students (Baron 2004:418). Though the data in this study was collected from a different mode of internet language – a mode very different from the mode analyzed by Baron as it is asynchronous, one-to-many and public – it is interesting to note the difference in results. In the data analyzed here, the mean length of comments by young men is twice as long as the mean length in the young women’s data (see Table 2); a result which differs radically from Baron’s findings. As the genres are so different, the comparison cannot be made too much of, but it emphasizes the fact that internet language is not a single universal language variety used similarly across the internet. As argued by Hård af Segerstad (2002:14, 16-21) and Squires (2010:463) among others, it differs in features and style depending on user, genre and situation. Another result to notice is the fact that the young women’s comments contain both the longest and the shortest comment of the whole corpus.
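The length comparisons made in this section can be retraced from the mean comment lengths reported in Tables 1 and 2. The short Python sketch below is only an illustration; the dictionary keys and the helper `ratio` are names chosen for the example, and the values are the rounded means from the tables.

```python
# Mean comment lengths in words, as reported in Tables 1 and 2.
mean_length = {
    "male": 44,
    "female": 26,
    "adult men": 54,
    "adult women": 41,
    "young men": 37,
    "young women": 19,
}


def ratio(group_a, group_b):
    """Mean comment length of group_a divided by that of group_b."""
    return mean_length[group_a] / mean_length[group_b]


male_to_female = ratio("male", "female")
young_women_to_adult_men = ratio("young women", "adult men")
young_men_to_young_women = ratio("young men", "young women")

print(f"male / female:            {male_to_female:.2f}")
print(f"young women / adult men:  {young_women_to_adult_men:.2f}")
print(f"young men / young women:  {young_men_to_young_women:.2f}")
```

Because the inputs are rounded means, the ratios are approximate.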

4.2 Typographic features: emoticons and typographic respellings

Use of emoticons is a typographic feature of internet language, and as pointed out in the theoretical background, it is generally presented in the mass media as a very frequent feature of internet language. In my data for the four commenting groups, the differences in emoticon usage range from none in the adult men’s data to 90 examples in the young women’s data. In percentages, 1.7% of the words in the young women’s comments are emoticons. In Baron’s (2004:414-415) study of IM, female college students used emoticons twice as often as male college students, and Herring’s (2003:210) research also showed that women use representations of smiles and laughter more often than men. Their findings are consistent with the results of this study, where nine emoticons, all representations of a happy face, were found in the data for the adult women and none were found in the adult men’s data. Of the 90 emoticons in the data for young women, 44 are different representations of smileys, 32 are graphic representations of a heart and five are representations of winkies (see Appendix A for a full list). Altogether, the results show that the vast majority of the 90 emoticons represent positive feelings. In the young men’s comments, too, the representations of positive feelings are in the majority: of the total of 16 emoticons, six are smileys and four are winkies.
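The share of emoticons among all words can be worked out per commenting group from the emoticon counts in Table 3 and the word totals in Table 2. The Python sketch below is a minimal illustration, assuming (as in the word definition in section 3.3) that each emoticon counts as one word; the variable names are chosen for the example.

```python
# Word totals per commenting group (Table 2).
words = {
    "adult men": 5295,
    "adult women": 5343,
    "young men": 5217,
    "young women": 5232,
}

# Emoticon counts per commenting group (Table 3).
emoticons = {
    "adult men": 0,
    "adult women": 9,
    "young men": 16,
    "young women": 90,
}

# Emoticons as a percentage of all words in each group's data.
share = {group: 100 * emoticons[group] / words[group] for group in words}

for group, pct in share.items():
    print(f"{group}: {pct:.1f}%")
```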

Previous research by Baron (2008:151-154) and Herring (2011:2) has shown that emoticons are very infrequent in IM and mobile phone text messages, even though they are often viewed as a common feature in public discourse about internet language. The results from this study are in accord with previous findings: a total of 215 emoticons in the whole corpus makes a ratio for emoticons of 1%. One percent must be considered a very low ratio, considering the space the feature is given in public discourse (cf. Thurlow 2006:679, 686).

Table 3. Frequency of typographic features in the comments.

| Feature | Adult men | Adult women | Young men | Young women |
|---|---|---|---|---|
| Emoticons | 0 | 9 | 16 | 90 |
| I | 102 (100%) | 231 (97.9%) | 305 (95.9%) | 249 (92.9%) |
| i | 0 | 5 (2.1%) | 13 (4.1%) | 19 (7.1%) |
| and | 121 (98.4%) | 134 (96.4%) | 122 (96.8%) | 98 (96.1%) |
| & | 2 (1.6%) | 5 (3.6%) | 4 (3.2%) | 4 (3.9%) |
| you | 39 (100%) | 55 (100%) | 64 (98.5%) | 124 (95.4%) |
| u | 0 | 0 | 1 (1.5%) | 6 (4.6%) |

Respellings and nonstandard typography overlap, according to Herring (2011:2). In Table 3 the nonstandard spellings of I, and and you (i, &, u) are compared with the number of standard spellings of the same words. And is occasionally represented with the keyboard symbol & in the data, but overall the standard form is used more than 96% of the time. The use of u for you is even rarer and does not occur at all in the comments by adult men or adult women. Though the nonstandard spelling u for you is most frequent in the commenting group of young women, it is rare overall, and even in that group the standard form is used in 95% of the cases.

In IM the use of lowercase letter i for personal pronoun I has been noted to be more common than the standard form (Squires 2010:482, 484; Tagliamonte & Denis 2008:14). As discussed above, it can be awkward to compare results from two very different genres within internet language, unless it is to accentuate differences between the modes of communication. As can be seen in Table 3, the use of lowercase i is clearly not more common than its standard equivalent in the data of the present study. However, it is still the nonstandard typographic feature most commonly used in the corpus of user-generated comments. Though the ratio is very low compared to the ratio reported for IM, it is the young women who show the highest frequency in the use of nonstandard spelling for the personal pronoun I in my study. The results of the present study suggest that the young women are the most frequent users of nonstandard typography, as their ratios are the highest of the four commenting groups for all examined features.
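The percentages in Table 3 follow from dividing each nonstandard count by the sum of both variants. The following is a minimal sketch of that arithmetic, with the counts copied from Table 3; the names `COUNTS` and `nonstandard_share` are illustrative assumptions and not part of the procedure used in the study.

```python
# Counts copied from Table 3: (standard, nonstandard) per commenting group.
COUNTS = {
    "I/i": {"adult men": (102, 0), "adult women": (231, 5),
            "young men": (305, 13), "young women": (249, 19)},
    "and/&": {"adult men": (121, 2), "adult women": (134, 5),
              "young men": (122, 4), "young women": (98, 4)},
    "you/u": {"adult men": (39, 0), "adult women": (55, 0),
              "young men": (64, 1), "young women": (124, 6)},
}


def nonstandard_share(standard: int, nonstandard: int) -> float:
    """Percentage of all occurrences that use the nonstandard variant."""
    return round(100 * nonstandard / (standard + nonstandard), 1)


shares = {
    feature: {group: nonstandard_share(*pair) for group, pair in groups.items()}
    for feature, groups in COUNTS.items()
}

# Report, for each feature, which group has the highest nonstandard share.
for feature, by_group in shares.items():
    top = max(by_group, key=by_group.get)
    print(f"{feature}: {by_group} -> highest: {top}")
```

Keeping the raw counts next to the computed shares makes it easy to see how small the absolute numbers behind each percentage are.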

4.3 Orthographic features: abbreviations

When discussing the orthographic features there is a need to clarify the different kinds of abbreviations that will be discussed. The acronym is one kind of abbreviation presented in this study, and here the term acronym also includes initialisms. This means that all representations where the initial letters in the words of a phrase are combined will be called acronyms, even though they could be divided into (1) acronyms (initial letters put together that can be pronounced, e.g., lol) or (2) initialisms (initial letters put together that cannot be pronounced, e.g., omg). This is done in line with a study of IM by Baron (2004), in which she did not separate the two. Only acronyms that seem to be distinctive for language on the internet were counted, leaving out acronyms that are also part of common offline writing, e.g., UN, US, i.e., etc. and the like. This is also in line with Baron’s (2004) study of IM. The term abbreviation is used to represent all short forms of words found in the corpus, e.g., clippings (e.g., fave), lexical shortenings (e.g., ‘cuz), and orthographic reduction of letters (e.g., fk).

Abbreviations and acronyms are characterizing features of internet language, since they save time and space when there is a need to be brief, according to Squires (2010:467). In my corpus, both abbreviations and acronyms are very infrequent. As in Baron’s (2008:154) study of IM, only a few abbreviations were found in the data. All but one were instances of abbreviations used in informal written and spoken English, and thereby they cannot be considered typical of internet language. The one abbreviation found that can be considered typical of internet language is the clipping props (proper recognition or proper respect), which was found two times in the comments by the young men. Three abbreviated representations of expletives (bugfk, fk, f’ing) were found in the data for adult men, all of which contained orthographic reduction of letters. No expletives were found in the comments by the three other commenting groups. Kinda, a contracted speech-like form of kind of, occurred three times in the adult men’s data and five times in the young men’s data. Another word normally found in speech, the lexical shortening ‘cuz (because), was found one time in the young women’s data. The clipping fave was found in both young women’s and adult women’s data (three and one time/s respectively) and another clipping, kiddi (kidding), was found in the adult men’s comments. In the adult men’s data a total of 11 abbreviations were found, though neocon (neoconservative), commies (communists) and kiddi cannot be considered typical of language on the internet. The young men’s data showed a total of six abbreviations, a total of three were found in the young women’s data and one in the adult women’s data. All in all 21 abbreviated words, 0.1%, were found in the total corpus made up of 21,087 words, not counting acronyms. Consequently the results from this study corroborate Baron’s findings from IM conversations. The fact that expletives only occur in the comments by the adult men also corroborates earlier findings about language use on the internet (Herring 2011:40).

A special form of abbreviation is the acronym. Acronyms occurred in the data from user-generated comments with approximately the same frequency as abbreviations. All in all 27 acronyms were found in the total corpus: 20 in the young women’s data, six in the young men’s data and one in the adult men’s data. The adult women’s comments contained no acronyms. Lol, including equivalents such as looolll, lolz and olz, was found nine times in the young women’s comments and five times in the young men’s comments and is consequently the most common acronym in the collected corpus. Omg occurred seven times, all of which were in the young women’s comments. Btw (by the way), idk (I don’t know), imho (in my humble opinion), ba (bad a**) and pll (pretty little liar) were found once each. These results are in line with Baron’s (2008:154) study on IM and text messages, where acronyms were also sparse.
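Counting lol together with its stretched or suffixed variants can be sketched with a regular expression. The pattern below is an assumption made for illustration, since the study does not state how the variants were identified, and the sample comment is invented rather than taken from the corpus.

```python
import re

# Hypothetical pattern: "lol" with any of its letters repeated, an optional
# trailing "z", plus the variant "olz" mentioned in the text.
LOL_PATTERN = re.compile(r"\b(?:l+o+l+z?|olz)\b", re.IGNORECASE)


def count_lol(comment: str) -> int:
    """Number of lol-type acronyms in one comment."""
    return len(LOL_PATTERN.findall(comment))


sample = "looolll that lollipop joke was great lolz, LOL"  # invented example
print(count_lol(sample))
```

The word boundaries keep ordinary words that merely begin with the same letters, such as lollipop, out of the count.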

4.4 Syntactic features and level of formality

As discussed above, both ellipsis and contractions can be regarded as orthographic reductions as well as markers of informality in a text. Not all possible contractions have been analyzed in this study, only the ones listed in Table 4. Those are the negated auxiliary verbs that appeared more than ten times each in the corpus (a table of additional contractions can be found in Appendix B). When analyzing contracted forms, the use of the apostrophe in contractions was also included, as it can be read as a signal of new patterns in the use of the apostrophe.

In Baron’s (2004:414-415) study of IM, a difference in the use of contractions was noticed between genders: men used contracted forms more often than women. In the study on IM and text messages, she found that contracted forms were used 68% of the time in IM and 85% in text messages (Baron 2008:154). A difference in the use of the apostrophe in IM and text messages was also found: in IM 94% of the contractions contained the apostrophe while only 32% of the contracted forms in text messages contained the apostrophe. In the present study, contracted forms are used in the great majority of the potential places of negative contractions (91%), as shown in Table 4.

Table 4. Frequency of negated contractions and their uncontracted forms plus frequency of ellipsis of subject and/or verb in user-generated comments (the number in brackets shows the instances in which the apostrophe has been left out).

| Form | Adult men | Adult women | Young men | Young women | Total |
|---|---|---|---|---|---|
| cannot | 0 | 1 | 0 | 0 | 1 |
| can’t/cant | 1 | 7 (1) | 7 | 10 (2) | 25 (3) |
| do not | 1 | 0 | 1 | 0 | 2 |
| don’t/dont | 11 | 5 | 11 (1) | 27 (1) | 54 (2) |
| is not | 4 | 1 | 0 | 0 | 5 |
| isn’t/isnt | 2 | 1 | 2 | 2 | 7 |
| will not | 1 | 0 | 0 | 0 | 1 |
| won’t/wont | 2 | 0 | 3 | 5 (1) | 10 (1) |
| did not | 2 | 0 | 0 | 0 | 2 |
| didn’t/didnt | 2 | 1 | 13 | 5 | 21 |
| Ellipsis* | 3 | 19 | 37 | 28 | 87 |
| Utterances** | 325 | 385 | 458 | 535 | 1703 |

*ellipsis of subject and/or verb
**total number of utterances

Only 11 out of 128 possible contractions are uncontracted forms, of which eight were written by the adult men. That means that the adult men’s comments contain the highest usage of uncontracted negated auxiliaries in the present study. In six of the total of 117 contractions, the apostrophe has been left out. Phrased differently, 95% of the contractions contained the apostrophe. Added up, the results show that contracted forms are used more often in this study of asynchronous internet language than in previous studies done on synchronous internet language. The use of the apostrophe in contractions in my study is comparable to Baron’s (2008:154) findings based on IM.
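The totals behind these percentages can be re-derived from the Total column of Table 4. The sketch below is only a check of the arithmetic; the tuple layout and variable names are assumptions introduced here for illustration.

```python
# Totals copied from Table 4: (uncontracted, contracted, apostrophe omitted).
TABLE_4 = {
    "cannot / can't": (1, 25, 3),
    "do not / don't": (2, 54, 2),
    "is not / isn't": (5, 7, 0),
    "will not / won't": (1, 10, 1),
    "did not / didn't": (2, 21, 0),
}

uncontracted = sum(row[0] for row in TABLE_4.values())
contracted = sum(row[1] for row in TABLE_4.values())
no_apostrophe = sum(row[2] for row in TABLE_4.values())

# Share of potential places that are contracted, and share of
# contractions that keep the apostrophe.
possible = uncontracted + contracted
contraction_rate = 100 * contracted / possible
apostrophe_rate = 100 * (contracted - no_apostrophe) / contracted

print(f"{contracted} of {possible} contracted ({contraction_rate:.0f}%)")
print(f"apostrophe kept in {apostrophe_rate:.0f}% of contractions")
```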

Fragmented syntax is common in genres like chat, IM, texting and microblogging and can be a way to try to write speech-like utterances (Herring 2011:5). The analysis of the data from user-generated comments shows some differences in frequency between the commenting groups. Ellipsis of subject and/or verb is most common in the comments written by the young men, where 8% of the utterances contain ellipsis of subject and/or verb. Ratios for ellipsis for the three other commenting groups, calculated from the counts in Table 4, are 5% (adult women), 5% (young women) and 1% (adult men). Taking into account that comments consist of at least one but sometimes up to 15 utterances, ellipsis cannot be considered a very common feature in this study.
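The ellipsis ratios are obtained by dividing the number of utterances with ellipsis by the total number of utterances for each group. A minimal sketch, with the counts copied from the last two rows of Table 4 and with names chosen here purely for illustration:

```python
# Counts copied from Table 4: (ellipsis of subject and/or verb, utterances).
ELLIPSIS = {
    "adult men": (3, 325),
    "adult women": (19, 385),
    "young men": (37, 458),
    "young women": (28, 535),
}

ratios = {group: 100 * n / total for group, (n, total) in ELLIPSIS.items()}

# List the groups from the highest to the lowest ellipsis ratio.
for group, pct in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{group}: {pct:.1f}%")
```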

Table 5. Analysis of linguistic features at word level (percentages).

| LIWC dimension | Self-references | Social words | Positive emotions | Negative emotions | Cognitive words | Articles | Big words |
|---|---|---|---|---|---|---|---|
| Adult men | 3.04 | 7.34 | 2.42 | 2.11 | 5.89 | 7.89 | 19.36 |
| Adult women | 7.00 | 9.41 | 4.72 | 1.05 | 6.14 | 5.99 | 15.65 |
| Young men | 6.77 | 6.59 | 5.11 | 1.39 | 6.90 | 6.09 | 14.67 |
| Young women | 6.09 | 9.82 | 5.33 | 1.81 | 7.45 | 4.45 | 11.34 |
| Formal* | 4.2 | 8.0 | 2.6 | 1.6 | 5.4 | 7.2 | 19.6 |
| Personal* | 11.4 | 9.5 | 2.7 | 2.6 | 7.8 | 5.0 | 13.1 |

*percentages provided by LIWC for reference to a formal and a personal writing style (henceforth referred to as the LIWC references)

As already mentioned, asynchronous modes of internet language are often closer to formal writing than synchronous modes, and when measuring the frequency of grammatical words such as pronouns and determiners, internet language typically falls between the two extremes of speech and writing (Herring 2011:6). LIWC provides references for all word categories in formal and personal texts (see Table 5). The LIWC analysis shows that the adult men’s use of self-references and positive emotions is much lower than that of the three other commenting groups, though the adult men use big words and negative emotions to a higher degree. By and large the analysis of data for the adult men shows similarities to the LIWC references for formal texts. Adult women and young women have the highest figures for social words and, like the young men, high figures for positive emotions, both of which are indicators of personal texts. Both the young men and the adult women fall between the LIWC references for self-references, cognitive words, articles and big words. Data for the young women shows infrequent use of big words, probably owing to the age of the members of the commenting group. As shown in section 3.1, most of the user profiles stated that the commenters were in high school. The LIWC analysis of the comments by the young women shows closeness to the LIWC references for a personal text.

Seemingly the adult men’s comments include linguistic features at word level that are close to a formal writing style, while the young men and the adult women fall between a formal and personal writing style, though high as regards positive emotions. The analysis also shows that the adult women use social words to a high extent, thus indicating a personal style. The young women make frequent use of social words and words which display positive emotions, but less so when it comes to the use of articles and big words. Thus indications of a personal writing style are found in the young women’s comments.
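One rough way to summarize Table 5 is to tally, for each commenting group, the dimensions on which its percentage lies nearer the formal than the personal LIWC reference. This tally is an illustrative heuristic introduced here, not a procedure used by LIWC or in the study, and it treats all dimensions as equally important regardless of scale.

```python
DIMENSIONS = ["self-references", "social words", "positive emotions",
              "negative emotions", "cognitive words", "articles", "big words"]

# Percentages copied from Table 5.
GROUPS = {
    "adult men":   [3.04, 7.34, 2.42, 2.11, 5.89, 7.89, 19.36],
    "adult women": [7.00, 9.41, 4.72, 1.05, 6.14, 5.99, 15.65],
    "young men":   [6.77, 6.59, 5.11, 1.39, 6.90, 6.09, 14.67],
    "young women": [6.09, 9.82, 5.33, 1.81, 7.45, 4.45, 11.34],
}
FORMAL = [4.2, 8.0, 2.6, 1.6, 5.4, 7.2, 19.6]
PERSONAL = [11.4, 9.5, 2.7, 2.6, 7.8, 5.0, 13.1]


def closer_to_formal(values):
    """Dimensions on which a group lies nearer the formal reference."""
    return [dim for dim, v, f, p in zip(DIMENSIONS, values, FORMAL, PERSONAL)
            if abs(v - f) < abs(v - p)]


# Number of dimensions (out of seven) nearer the formal reference.
formal_counts = {g: len(closer_to_formal(v)) for g, v in GROUPS.items()}
print(formal_counts)
```

Calling `closer_to_formal(GROUPS["adult men"])` lists the individual dimensions, which shows where a group departs from its overall tendency.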

4.5 Commenting tone

Below the ratio of use for each commenting group is presented in pie charts showing comments categorized as aggressive, friendly or neutral. The results from the analysis of tone in comments are at the stylistic level and to make it easier to compare the results, the pie charts are presented next to one another, see Figures 1-4.

The results show that the neutral tone is used by the adult women, the young men and the young women in slightly more than half of their comments, while nearly half of the adult men’s comments were classified as neutral. This means that the most frequent tone, regardless of commenting group, is the neutral tone. Moreover, the results show great differences at the stylistic level when it comes to the use of the friendly and the aggressive tone. The friendliness shown by the adult women and young men and to some extent by the young women in my study is in line with Braga’s (2011:218) finding that the most common attitude in the blog environment she analyzed was kindness. Almost half of the comments from the young men were written in a friendly tone, which means that the young men use a friendly tone more often than the young women do. This result is not in line with earlier findings: the study by Thelwall et al (2010, as seen in Herring 2011:42) showed that young women used a friendly tone to a greater extent than young men did, which contrasts with the results of the present study.

Figure 1. Tone in user-generated comments by adult men.

Figure 2. Tone in user-generated comments by young men.

Figure 3. Tone in user-generated comments by adult women.

Figure 4. Tone in user-generated comments by young women.

Most striking in the analysis of tone is perhaps the extent to which the adult men in my data use an aggressive tone in communication, compared to the three other commenting groups. Also worth noting is the result showing that the young women use an aggressive tone five times as often as the young men and the adult women. From the gender perspective it is interesting that the aggressive tone is used only marginally in the data for the young men while the data for the adult men shows a decidedly higher frequency. Likewise it is interesting to note that the young women use an aggressive tone more often than the adult women do.

The finding from my study that the adult men use an aggressive tone in 42% of the user-generated comments is in line with Herring’s (2011:40) findings that men express their opinions strongly and hold an adversarial orientation to their interlocutors. The adult men in the present study show a much higher use of aggressive and insulting behaviour compared to males in Herring’s study from the early 00s, who used it three times as often as females. In this study of user-generated comments, the adult men used aggressive and insulting behaviour in 42% of the comments, while the adult women used it in only 2% of their comments. The fact that the young men used an aggressive tone in only 2% of the comments contradicts previous findings. Guiller and Durndell (2006, cited in Herring 2011:42) found that male students were more likely to use authoritative language and respond in a negative way in interactions in asynchronous discussion groups. When my data is analyzed by gender only, disregarding age, the results from the analysis of tone come closer to the findings of Herring’s (2011) study: aggressive tone in female data accounts for 11% and male data accounts for 43% in user-generated comments.

Findings by Atai and Chahkandi (2012:887) saying that gender-typical stylistic features are used by both sexes can be linked to the results from both the young men’s and the young women’s comments. In the young women’s data at least one in ten comments holds an aggressive tone, which is considered typical of a male style of online language. In contrast to that, the data for the young men shows a rather large proportion of comments that hold a friendly tone, generally considered a typical stylistic feature of women.

In this study the adult women and the young men can be said overall to display either a friendly or a neutral stance, while the adult men more often are either aggressive or neutral and only occasionally friendly. The young women are friendly or neutral in most cases, but occasionally they use an aggressive tone.

4.6 Further remarks on the results

The reasons for the differences in the writing styles of the four commenting groups are hard to determine, but educational background and age are likely to be important factors when a person chooses his or her level of formality as regards writing style. In this study the people with the highest education – middle-aged men and mature women – use a more formal written style than the two younger commenting groups. The influence of educational background and age is also evident in that the data for the young men shows a more formal style in typography and orthography than the data for the young women does. The commenting group of young women is the youngest of the examined groups – most of the commenters are of high school age. In the young men’s commenting group most of the commenters go to college or university and that makes them approximately five years older than the young women. A five-year age difference at this time in life is quite a large span. This might have affected the results of the study of user-generated comments and it would have been preferable if the age of the young women and the young men had been more alike. On the other hand, it is possible to draw on the age difference and conclude that the typical features of internet language are used more by young women in early and mid-teenage years and still evident, but not so emphasized, among young men in their late teens and early twenties. Therefore, it can be assumed that the use of typical features of internet language changes with increasing age in adolescence, which is in line with the conclusion made by Tagliamonte and Denis (2008:24) saying that stylized IM forms are abandoned by adolescents at an early age.

An interesting, and perhaps the most surprising, result in this study was the degree to which adult men used an aggressive tone when commenting on articles in the online magazine. Previous research has shown that men hold an adversarial stance to their interlocutors, so this study is in line with that, but the frequency surprised me. In recent research, for instance in Atai and Chahkandi (2012) and Braga (2011), the topic of discussion has been taken into account when analyzing user-generated content. In this study, additional information about topic might have revealed something about the great differences in use of the aggressive tone in the different commenting groups. Several of the scholars cited in this paper argue that internet language varies with purpose and situation. Therefore the holistic view that Dare (2011) argues for would have given a more explicit framework to this study. That might have made the study more informative concerning the user-generated content, as a qualitative study of the content could have been performed as well as the quantitative one.

5. Discussion: Implications for the future of the English language

As discussed above, scholars are not very worried about the decay of the English language on account of internet language, while the public reaction to internet language is often represented by fear of drastic language changes. Though it is impossible to say exactly how the English language will evolve, Baron has made some suppositions about possible future changes. Baron (2009:44) argues that among the changes that will be brought into the future, spelling or vocabulary will not be the most important. Instead she believes that the changes in attitude towards language structures will be more significant. Two shifts are listed that, according to Baron (2009:44-45), are likely to affect the future English language:

- the whatever-attitude towards language rules and correctness
- the enhanced control of linguistic interactions

The whatever-attitude towards language rules and correctness is a change that will result in people’s declining concern for what is prescriptively considered good English. People will simply not worry too much about traditional language rules and grammar, as long as the communication can be understood. Instead focus will be more on tolerance and personal expression. The other shift – control of linguistic interactions – means that people in general “increasingly come to see language not as an opportunity for interpersonal dialogue but as a system we can maneuver for individual gain” (Baron, 2009:45). In other words, people have the opportunity to manipulate their interaction with others by using communication acts they can control. How, then, is this manipulation set to work? According to Baron (2009:45) there are a number of actions concerning the interaction with others, where the message sent can be controlled:

- the choice between calling or texting someone
- the signals that are sent when designing social network pages with staged photos and when choosing what information the contacts will be able to access in social network profiles
- the possibility to choose not to answer a phone call since, with caller-id, the caller is known beforehand
- the possibility to pretend to be talking on the phone when meeting an unwanted conversation partner on the street

In the present study the style used by the young women is closest to the writing style commonly presented in the media and at social network sites as the style of language used on the internet – a style ridiculed and mocked by people who consider Standard English superior (see section 2.1 for quotes from Urban Dictionary). But will young women change their online communication style as they grow up and get more educated as Tagliamonte and Denis (2008) suggest? And will their written style be transferred to the next generation and become a type of standard online communication style that young teenagers use? When Baron explains the “whatever-attitude”, she states that correctness, which I suppose can be analogous to Standard English, will not be as important in the future as it is today. Her supposition says that people will be more tolerant towards language varieties and personal expressions in the future and if this is so, parts of the internet language style used by the young women in this study might have a bright future despite the prescriptivists’ views.

When concluding this discussion about future language, it is exciting to note that Baron’s suppositions contrast radically with the wishes expressed in public discourse. While the discussions within public discourse express that standard spelling and formally correct sentence structures are features that ought to be preserved, Baron believes that tolerance towards nonstandard language features and a relaxed view of language correctness are what the future will bring. The gap between those two views on future language is rather large and mirrors two absolutely opposite viewpoints. And what does that imply? Has the media given people with controversial ideas, and as a result prejudices about language change, more speaking space, which has resulted in a somewhat lopsided debate? Or have scholars researching online language not succeeded in conveying their knowledge that Standard English is actually not changing at a high speed? Whether this is a question of scholars uninterested in making their results known to the public or a lack of interest from the mass media in reporting the scholarly results – as they are not making any headlines – will be left unsaid here, but needs to be addressed in another discussion. As I see it, there is a question regarding responsibility here that needs to be approached so the debate can become more equal.

6. Conclusion and future research

One aim of this study was to examine the extent to which typical features of internet language could be found in user-generated comments collected from four different commenting groups. The adult men’s comments contain only a few features of internet language, adult women’s and young men’s comments show a rather low, but still evident, usage of typical features and the comments by the young women show the highest frequency of features typical of internet language. Another aim of the study was to examine to what degree internet language consists of spoken features. Contractions, one of the examined features, are common in spoken language, and ellipsis of subject and/or verb can be an attempt at writing speech-like utterances, according to Herring (2011:5). Throughout the commenting groups contracted forms were used in the vast majority of the potential places of contractions. Ellipsis of subject and/or verb was most common in the young men’s comments, but very rare in the data for the adult men. This implies that the young men use a style that is closer to spoken English than the three other commenting groups.

A matrix describing where the four commenting groups are placed along the dimensions informal-formal and nonstandard-standard is shown in Figure 5. The young women are placed in the first quadrant as their data contains many nonstandard features and indicates an interpersonal writing style. Thereby they seem to represent the writing style closest to the picture of internet language mediated by the mass media. Their data shows the highest rates of use of nonstandard typography and orthography, such as the use of emoticons and acronyms, the use of lowercase letter i for I and the omission of apostrophes in contractions – all features that are generally considered typical of internet language. In addition to this, the analysis of the young women’s data in LIWC shows the highest use of social words and positive emotions and values that overall represent a personal writing style. The fact that the young women are high in the use of typical typographic and orthographic features and at the same time show percentages at word level close to the LIWC references for a personal text indicates informality in the language of the comments. Other signs of an informal writing style are the use of the friendly as well as the aggressive tone, showing that the content of the comments is emotional.

Figure 5. Matrix describing language use on the internet for four commenting groups.

In contrast to the young women’s writing style, the adult men’s writing style is placed in the fourth quadrant of the matrix. The adult men’s comments show many formal and standard writing features. In the LIWC analysis they contain the highest rates of big words, negative emotions and articles and in addition to that the comments have the lowest rate of fragmented sentences, as well as the highest usage of uncontracted negated auxiliaries. Thus, the overall analysis of the adult men’s comments indicates a writing style close to formal writing, something which contradicts the mediated picture of internet language from the mass media. Though the adult men’s style is close to standard and indicates a formal writing style, there are two things that contradict this picture: first, the results show that the adult men use contracted forms more often than uncontracted forms and second, it is only in the adult men’s comments that expletives are found.

The adult women’s and young men’s commenting groups are placed in between the young women and the adult men in the matrix. In comparison to young women and adult men, the comments by adult women show some formal features and other features considered typical of internet language. The formal features are the relatively frequent usage of big words and the frequent usage of standard typography, and the infrequent, almost nonexistent, presence of abbreviations and acronyms. The use of emoticons and the extensive use of a friendly tone are features that are indications of an interpersonal writing style. All in all, adult women seem to use a personal and friendly writing style characterized by standard spelling and syntax.

Similarly, the data for the young men indicates a friendly and interpersonal writing style. The young men’s comments contain the highest ratio of fragmented sentences, which must be considered a marker of an informal writing style. The rather infrequent use of big words and the frequent use of positive emotions, as well as the extensive use of the friendly tone are indicators of an interpersonal writing style. Another finding is that the young men fall between the LIWC references for formal and personal writing styles in self-references, cognitive words and articles. Typography and orthography are close to standard in the young men’s comments and this is attested by the relatively low usage of emoticons, abbreviations and acronyms and the standard typography in pronouns. Fragmented sentences and the use of contractions are other signs of informal written language and the data for the young men shows the highest ratio as regards the use of ellipsis of subject and/or verb and nearly full use of negated contractions (see Table 4). Therefore it is possible to conclude that the young men’s syntax is more heavily influenced by spoken English than that of the other three commenting groups.

As for future research, I suggest a study that analyzes user-generated content quantitatively to find out more about the differences in commenting tone. In such a study knowledge of discourse content is of interest, as well as the purpose of the user-generated content.

References

Ames, Melissa, & Himsel Burcon, Sarah (eds.). 2011. Women and language: Essays on gendered communication across media. Jefferson, NC: McFarland & Co.

Atai, Mahmood Reza, & Chahkandi, Fatemeh. 2012. Democracy in computer-mediated communication: Gender, communicative style, and amount of participation in professional listservs. Computers in Human Behavior 28(3): 881-888.

Baron, Naomi S. 2004. See you online: Gender issues in college student use of instant messaging. Journal of Language and Social Psychology 23(4): 397-423.

Baron, Naomi S. 2008. Always on: Language in an online and mobile world. New York, NY: Oxford University Press.

Baron, Naomi S. 2009. Are digital media changing language? Educational leadership, 66(6): 42-46.

Braga, Adriana. 2011. Gender blogging: Femininity and communication practices on the internet. In M. Ames & S. Himsel Burcon (eds.), 215-228.

Crystal, David. 2006. Language and the internet (2nd edition). Cambridge: Cambridge University Press.

Dare, Julie. 2011. Women, kin-keeping, and the inscription of gender in mediated communication environments. In M. Ames & S. Himsel Burcon (eds.), 185-198.

Herring, Susan C. 2003. Gender and power in on-line communication. In J. Holmes & M. Meyerhoff (eds.), 202-228.

Herring, Susan C. 2007. A faceted classification scheme for computer-mediated discourse. Language@internet 4(1). Available at http://www.languageatinternet.org/articles/2007/761/Faceted_Classification_Scheme_for_CMD.pdf [Accessed December 7, 2012].

Herring, Susan C. 2011. Grammar and electronic communication. Preprint version retrieved from: http://ella.slis.indiana.edu/~herring/e-grammar.2011.pdf. [Accessed November 10, 2012].

Herring, Susan C., & Kapidzic, Sanja. 2011. Gender, communication, and self-presentation in teen chatrooms revisited: Have patterns changed? Journal of Computer-Mediated Communication, 17(1): 39-59.

Holmes, Janet, & Meyerhoff, Miriam (eds.). 2003. The handbook of language and gender. Oxford: Blackwell Publishing.

Hård af Segerstad, Ylva. 2002. Use and adaptation of written language to the conditions of computer-mediated communication. Göteborg: Göteborgs universitet.

Leurs, Koen, & Ponzanesi, Sandra. 2011. Gendering the construction of instant messaging. In M. Ames & S. Himsel Burcon (eds.), 199-214.

Squires, Lauren. 2010. Enregistering internet language. Language in Society, 39(4): 457-492.

Sundqvist, Pia. 2009. Extramural English matters: Out-of-school English and its impact on Swedish ninth graders’ oral proficiency and vocabulary. Dissertation. Karlstad: Karlstad University Studies 2009:55.

Tagliamonte, Sali A., & Denis, Derek. 2008. Linguistic ruin? LOL! Instant messaging and teen language. American Speech 83(1): 3-34.

Thurlow, Crispin. 2006. From statistical panic to moral panic: The metadiscursive construction and popular exaggeration of new media language in the print media. Journal of Computer-Mediated Communication 11(3): 667-701.

Walther, Joseph B. & Jang, Jeong-woo. 2012. Communication processes in participatory websites. Journal of Computer-Mediated Communication 18(1): 2-15.

Appendix A: Emoticons found in the corpus

Emoticon found in the adult women’s data:

Smileys: :)

Emoticons found in the young men’s data:

Smileys: :) :-) (: ^_^

Winkies: ;P ;) ;-) (; ;D

Other: :/ :( D: -_-

Emoticons found in the young women’s data:

Smileys: :) :-) (: (((: ^_^

Winkies: ;P ;) ;-) (;

Other: :/ D: -_- *_* <33333 ♥

Appendix B: Table of additional contractions

Table 6. Frequency of contracted forms in user-generated comments (the number in brackets shows the instances in which the apostrophe has been left out).

| Form | Adult men | Adult women | Young men | Young women | Total |
|---|---|---|---|---|---|
| I am | 4 | 15 | 3 | 15 | 37 |
| I'm/im | 8 | 22 | 34 (6)* | 25 (2) | 89 (8) |
| I will | 4 | 4 | 4 | 3 | 15 |
| I'll/Ill | 2 | 2 | 10 | 1 | 15 (0) |
| I would | 4 | 3 | 7 | 11 | 25 |
| I'd/Id | 3 | 1 | 4 | 3 (1) | 11 (1) |
| you are | 2 | 9 | 1 | 6 | 18 |
| you're/your | 2 | 7 | 10 (2) | 7 (4) | 26 (6) |
| it is | 4 | 8 | 3 | 3 | 18 |
| it's/its | 18 | 19 | 37 (6) | 38 (11) | 112 (17) |

*four of the instances where the apostrophe had been left out were written by the same person

Note that the words your and its are used as contracted forms for you are and it is. The instances where the words were used as possessive pronouns are not included in these numbers.
