Internet-Based Research in the Social Science of Religion

William Sims Bainbridge Co-director of Human-Centered Computing at the National Science Foundation (NSF)

ARDA GUIDING PAPER Internet-Based Research in the Social Science of Religion

For a decade, social scientists have been aware that much religion-oriented communication takes place on Internet (Hadden and Cowan 2000). During that time, the amount of activity online has increased greatly, and the forms of Internet usage have diversified seemingly without end. It is also true that scientists have discovered new ways to extract data from websites or other Internet-based systems, even when they are not explicitly religious, that can benefit researchers interested in religion. No longer is the task merely studying the innovative ways people can use Internet for religious purposes. It is now also possible to use Internet-derived data to develop and test general theories of religious behavior that apply offline as well as online.

This paper will describe Internet-based research methods that are cutting-edge, meet reasonable tests of validity and reliability, and are sufficiently practical that students can use them for graduate papers and dissertations at the same time that their professors are preparing professional publications based on them. The emphasis will be on quantitative methods, but some qualitative methods will also be mentioned, in part to place the quantitative techniques in a wider methodological context, as well as to identify directions in which innovations might be developed. At the outset, we can identify seven general principles:

1. Internet-based research can employ traditional techniques of social-science research, and can adapt those methods in fresh ways.

2. Entirely new valid methodological approaches can also be developed, sometimes with only the most tenuous or metaphoric relations to earlier methods.

3. To maximize both innovativeness and efficiency, collaborations between social scientists and computer scientists are often necessary.

4. Even when working collaboratively with computer scientists, a social scientist needs to develop a significant expertise managing Internet data, including even some programming knowledge, but this is actually not difficult to achieve.

5. Working with existing data collected from Internet, or with new data collected by an innovative online system, will require the social scientist to pay more attention to issues of data management than is common in more traditional contexts.

6. The best results will come from studies that carefully but aggressively address methodological and theoretical issues together, realizing that the most important challenges and opportunities require deep thinking about both, and that insights from one can inform the other.

7. Internet-related technologies and their social applications are in constant flux, so researchers should be looking for new possibilities, and the examples offered here are meant to inspire rather than constrain scientific creativity.

Collaborations between social scientists and computer or information scientists will require both sides to gain appreciation of the other's point of view. Social scientists in particular will need to realize that many of the very best computer scientists conceptualize science very differently, particularly without the same kind of dedication to theory and zeal in comparing competing theoretical positions that social scientists love. One example will suffice, an excellent recent computer science article about religion and information technology, "Re-Placing Faith: Reconsidering the Secular-Religious Use Divide in the United States and Kenya" by Susan P. Wyche, Paul M. Aoki, and Rebecca E. Grinter.

COPYRIGHT ASSOCIATION OF RELIGION DATA ARCHIVES

Before we even consider the topic, it is important to note that this is a conference paper, given at CHI 2008 in Florence, Italy. Conferences play an almost totally different role in computer science from the role they play in social science, and CHI is the most prestigious and influential scientific gathering on the relationships between human beings and information technology. It is the annual conference of SIGCHI, the special interest group on human factors in computing of the Association for Computing Machinery. Giving a paper at CHI is like getting one published in Social Forces for a sociologist, but the publication is immediate, rather than waiting a year or two as with social science paper journals. A social scientist who wants to collaborate with computer scientists will need to adapt to the rough and rapid, but still seriously reviewed, publication system in computer science.

Another characteristic of this article that requires some adjustment on the part of social scientists is that it seems to have a very practical focus, rather than being motivated by the desire to test abstract theory. Noting the continuing and perhaps increasing significance of religion, and the possibility that secular populations make greater use of information technology, the researchers have carried out a series of studies to understand how information technologies could be better designed to serve the distinctive needs of highly religious people, indeed to serve some of their religious needs (Wyche et al. 2006, 2009a, 2009b). For example, in this study the researchers discovered that religious people often want to remember points that were made in an especially inspirational Sunday church sermon, and so they developed a note taking system using mobile phone technology to help them accomplish this in a versatile, convenient, and cost effective manner.

A third characteristic of the study is that the investment in varied aspects of the methodology has a very different balance from what we would expect to see in a professional social scientific study. The research team collected data in Atlanta, Georgia, and Nairobi, Kenya, at great effort, but did so through somewhat unstructured interviews and ethnographic observation with small numbers of individuals. This is standard in the field of human-computer interaction research. The goal is to understand in depth what can be learned from people who act as key native informants and who invest much of their own effort in the study, but without any concern over what fraction of the general population these people represent. Their function is to inspire innovation among the computer scientists, who design new technology through a sort of collaboration with their research subjects.

In the case of this fine study, the result is a contribution not only to knowledge, but even more importantly to the existing store of design ideas from which technologists may draw, and a contribution to the people of faith who will use future information technology designed to serve religious purposes. For computer scientists, theory tends to mean one of two things. First of all, it refers to mathematical theory typically concerning methods of calculating algorithms. The criterion of good theory by this definition is that it guides calculations that are both swift and accurate. Second, theory in the human-centered computing area really refers to design principles to guide the creation of new technologies to serve specified human needs. In this case, the computer scientists draw intelligently upon some social science of religion concepts, and they accomplish good ethnography of Kenyan religious and community culture, but in the service of future technologies to benefit religious people, rather than to frame abstract theories about religion.

One more feature of this study deserves mention as background for the present paper, namely that it studies information technology broader than the term "Internet" would cover. The people in Atlanta used Internet, but those in Nairobi used cell phones and text messaging using those phones. Technically, Internet refers to a data communication network that uses the TCP/IP protocol, but much of what you can access through Internet is not really native to it and may originally use other technologies. The World Wide Web is a subset of the billions of files reachable over Internet, those formatted with the Hypertext Markup Language (HTML), and within the Web there are many files belonging to the Deep Web that cannot be accessed by search engines because they are behind password protection or other barriers. Just as the Web is a subset of Internet, Internet is a subset of The Net, which comprises all forms of electronic communication. Already barriers are breaking down between traditional electronic media, and the distinctions between radio and podcasts, television and YouTube, telephone and Skype are historical anachronisms. Thus, while this paper will emphasize data that can indeed be accessed over the current Internet, the reader should be alert to the fact that realities and definitions are changing rapidly, and all modes of electronic communication are currently converging.

Here we shall emphasize the usual social-scientific concerns with theory and methods, more than technological results, but remain mindful of the somewhat different priorities of the computer scientists who provide us with the needed technologies. We shall consider different kinds of Internet-based research under six rough headings, arranged from work that is most similar to traditional quantitative social-scientific methodologies, to work that is least similar but still connects directly to the kinds of theories that social scientists have addressed for many years. We begin with online questionnaires, which draw upon a century of survey research traditions, then turn to recommender systems, which are very new but similar in many respects to questionnaires. Geographic data analysis also has a century-old tradition in the social sciences, but new sources of georeferenced data can be found online today. Although everybody is familiar with search engines like Google and Alta Vista, they can be used in many ways to collect data suited to several kinds of analysis, and more advanced natural language processing methods naturally build on familiar features of search engines. New areas where old theories can be applied include cultures inside virtual worlds.

1. Online Questionnaires

Computers have been used to administer questionnaires for many years, but mass administration online directly to respondents waited until the World Wide Web gained popularity in the mid-1990s. Perhaps the most important traditional application before then was in computer-assisted telephone interviewing. As pioneered by the U.S. Census in 1790, and perhaps rather earlier around the start of the Common Era when Caesar Augustus sent agents to count the population of the Roman Empire so it could be taxed, interviewers had long asked standardized questions verbally, writing down the responses themselves rather than requiring the respondent to do it. I have seen rough estimates that perhaps ten percent of the adults in the Roman Empire could read and write, so most could not have filled out a paper questionnaire, but we should be mindful of the fact that some people in modern societies cannot do so either, and each technology excludes at least some potential respondents.

Using a computer to do telephone interviewing has several advantages, some of which transfer to online questionnaires. The interviewer reads the questions from the screen, and enters the response with a single key press or mouse click, or in some cases typing in the word or phrase the respondent speaks. The computer automatically moves to the next question, saving the effort of manually turning a page, and it can jump to contingent questions that might confuse the interviewer and would often confuse respondents if the questionnaire were on paper. A common example is questions about religious affiliation. Are you Catholic, Protestant, Jewish, Other, or None? People who selected "Protestant" are then often asked to define exactly which Protestant denomination they belong to, something one would not bother asking a Roman Catholic.
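The contingent-question routing just described can be sketched in a few lines of Python. This is only a minimal illustration, not the logic of any actual interviewing package; the item names (affiliation, denomination), the follow-up wording, and the functions next_item and administer are hypothetical.

```python
# Hypothetical two-item questionnaire with one contingent (skip-logic) question.
QUESTIONS = {
    "affiliation": "Are you Catholic, Protestant, Jewish, Other, or None?",
    "denomination": "Which Protestant denomination do you belong to?",
}


def next_item(current, answers):
    """Return the key of the next item to ask, or None when finished."""
    if current is None:
        return "affiliation"
    if current == "affiliation" and answers.get("affiliation") == "Protestant":
        # Only Protestants are routed to the follow-up question.
        return "denomination"
    return None


def administer(scripted_answers):
    """Walk through the questionnaire, recording which items were asked."""
    answers, asked = {}, []
    item = next_item(None, answers)
    while item is not None:
        asked.append(item)
        answers[item] = scripted_answers[item]
        item = next_item(item, answers)
    return asked
```

Because the routing lives in one function, neither the interviewer nor the respondent ever sees a question that does not apply.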


One of the greatest advantages of computer-assisted interviewing is that it skips the often laborious process of entering responses from a paper questionnaire into a computer for analysis. It should be recalled that while computer-administration may be relatively new, computer analysis is quite old, arguably dating from Hollerith's work on the 1900 census and even earlier (Bainbridge 2004c).

As Donald Dillman (2002) has noted, Internet-based questionnaire surveys are now one of the important research options, and their disadvantages are somewhat reduced by the increasing difficulty of getting good samples for telephone surveys. The chief issue for Internet-based questionnaires is that their data will not be representative of the population as a whole, both because many people — especially in some subgroups — will not have Internet access, and because many people will refuse to answer a questionnaire online when invited to do so.

I would argue, however, that conceptualizing online questionnaires in terms of traditional survey research is too limiting. As I understand the term, survey is not synonymous with questionnaire. Rather it refers to an attempt to collect new data that are representative of the population of interest. Conceivably, a survey could be done without asking any questions, for example visiting a random sample of rural homes to visually determine what fraction of them had indoor flush toilets. For at least two reasons, sociologists and political scientists had gotten in the habit of assuming that every proper questionnaire needed to be administered to a random sample.

The first reason was descriptive. If the goal is to describe a population, then a census is methodologically the best approach, but cost concerns often rule that out. A simple random sample, if it is large enough, should accurately represent the population. Furthermore, if nonresponse bias is also random, then it is possible to use statistical techniques to estimate the sampling errors. Unfortunately, nonresponse biases are not random, and increasing fractions of the population refuse to be surveyed, or simply cannot easily be located.
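To make the idea of estimating sampling error concrete, here is a small sketch of the standard textbook calculation for a proportion from a simple random sample, in which the standard error is the square root of p(1 - p)/n. The function name and the example figures are hypothetical, and the calculation is only meaningful under the assumption stressed above, that nonresponse is effectively random.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p from a simple random sample of size n.

    z = 1.96 corresponds to the conventional 95 percent confidence level.
    """
    standard_error = math.sqrt(p * (1.0 - p) / n)
    return z * standard_error


# Hypothetical example: 40% of 1,500 respondents report weekly attendance.
moe = margin_of_error(0.40, 1500)
print(f"40% plus or minus {100 * moe:.1f} percentage points")
```

The formula does not apply unchanged to cluster samples, which is one reason for caution in applying such tests uncritically.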

Face-to-face administration tends to get the highest response rate, but is exceedingly costly. Thus, a national questionnaire like the General Social Survey will use a cluster sampling technique, to minimize interviewer travel costs, and tends to advise against uncritical application of tests of statistical significance which assume simple random samples. In his textbooks in research methodology, Earl Babbie (2004) has been advising students that tests of statistical significance are not really appropriate in sociological research, a controversial point but one that clearly highlights the issue.


Descriptive accuracy primarily serves journalistic, political, and policy purposes, rather than scientific ones concerned with discovering and testing general theories. Polling research earned the social sciences much prestige in the wider world, by offering insights and advice that were credibly based on rigorous, scientific methodology. Journalists want to be able to say what is happening to "people" in their society, or to the society in general. Politicians want to know what the electorate thinks about the issues of the day, so they require a random sample of voters – or of those mythical beasts, the "likely voters." Policy makers similarly need to know what is happening to "the American family" or "the average citizen."

When the General Social Survey was launched back in 1972, it was an expression of the Social Indicators Movement that hoped to use the GSS to monitor conditions in the United States so that policy makers could adjust government regulations and programs for maximum benefit. Using questionnaire surveys as social indicators to guide government policy assumes a lot about the way governments and particular political parties actually function, and for most of the years since the birth of the GSS, sociological surveys were simply not a significant part of US government decision-making.

The second reason why representative samples are preferable is related to the fact that social statistics tend to assume simple random samples, but goes a bit deeper than that. Hopefully simple random samples minimize the possibility that the correlation between two variables is the spurious result of other variables, or that the lack of a correlation results from a real relationship that is masked by some unmeasured suppressor variable. This is a debatable point, but in practice I suggest that many social scientists take this idea for granted without even noticing it. Consider a random sample of the United States. Typically, as in the case of the GSS, the sample leaves out "institutionalized" populations, children, Americans living abroad (or in the armed forces), and perhaps the underclass and undocumented immigrants. But even if you could get a true random sample of Americans, you would not have a random sample of human beings. Americans are five percent of the world, and the current world is perhaps five percent of all the humans who have ever lived. Thus there is a huge selection bias, and crucially for this point, that bias may correlate with variables of interest.

Rather than relying upon a random sample to limit spuriousness and suppression, which it may not really do very well anyway, a better choice is replication. There are really two functionally related ways to accomplish this. External replication means giving a questionnaire to members of very different groups, to see if the results carry over from one to another. Internal replication is accomplished when the use of subsamples or statistical techniques controlling for additional variables accomplishes the same thing within a single dataset. Under favorable conditions, both of these can be accomplished with online questionnaires, if effort is invested to get a very large and diverse set of respondents, affording many opportunities for internal replication, and if one is prepared to replicate key findings by some other method. Perhaps the most famous pre-Internet example of external questionnaire replication is the Glock and Stark (1966) study that initially surveyed Northern California church members in a very limited geographic area, then subsequently replicated key findings with a national sample.

While there were good reasons for giving high priority to sampling with pre-Internet questionnaires, this inescapably gave lower priorities to other values, notably item quality and topic coverage. In the 1950s and 1960s, much more effort was invested in item-creation than today, especially in development of multi-item and often multi-factor measurement scales. An expensive national survey often cannot afford to include many items on a single topic, and the ones that are included need to be intelligible to everybody. Thus, they are written to a "lowest common denominator" standard, rather than reflecting the complexity and nuance of theoretical debates in the social sciences of religion.

If the aim is to study a small subgroup of the population, such as atheists, then one will need either a huge sample, or a carefully targeted one, each of which might be achieved over Internet (Bainbridge 2005). Cost considerations and the fact that the average person has no opinion on many of the topics of interest to social science also militate against research on a wide range of topics that are relevant only to subgroups within the population. This is especially worrisome when the research concerns social and cultural change, because many new phenomena will be unknown to the majority of respondents in a random sample of the general population.

Online questionnaires can address these issues in a number of ways, beginning with where the items come from in the first place.

At a first approximation, the material for questionnaire items can come from two very different sources: (1) existing theory expressed in the publications of social scientists, or (2) the experiences, beliefs, and behavior of the non-scientists we wish to study. In general, I do not favor survey researchers writing items out of their own imaginations, as they sit in their academic armchairs, but I advocate going through a serious process of discovery beyond the boundaries of their own personal experience. My favorite classic example of items derived from existing theory is the Mach Scale developed out of the works of Italian political theorist Niccolò Machiavelli (Christie and Geis 1970). A number of statements were derived directly from Machiavelli's publications, then augmented with a few others that expressed ideas that were in his works but not stated so simply. A large collection of these items were administered to college students, in a lengthy iterative process, and then statistical techniques were used to develop a high-reliability 20-item scale, containing a couple of subscales. This Mach Scale was then used in a wide variety of studies with different populations, which had the effect of determining its generalizability beyond the original student population.
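One widely used reliability statistic in scale construction is Cronbach's alpha. The text above does not say which statistics were used for the Mach Scale, so the following is offered only as a generic sketch of how scale reliability can be computed; the function name and the response data are made up for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                      # number of items in the scale
    items = list(zip(*responses))              # transpose: one tuple per item
    item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - item_variances / total_variance)


# Made-up responses from five people to three 5-point items.
data = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 3, 3],
    [4, 5, 4],
    [5, 4, 5],
]
print(round(cronbach_alpha(data), 2))
```

In an iterative process, one would typically drop the items that lower alpha most, recompute, and repeat until a short but reliable scale remains.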

Classical scale-construction work like this in personality and social psychology inspired me to launch an Internet-based project in 1997, called the Question Factory. I posted a number of online questionnaires consisting of open-ended items, asking people to express their views on some topic. One asked, "Imagine the future and try to predict how the world will change over the next century. Think about everyday life as well as major changes in society, culture, and technology." After successful preliminary work with The Question Factory, this item was included in the pioneering Web-based questionnaire, Survey2000, organized by sociologist James Witte and sponsored by the National Geographic Society (Witte et al. 2000). Approximately 20,000 respondents gave thoughtful written responses to this item, from which I was able to cull 2,000 distinct predictions, 100 of them about religion (Bainbridge 2003, 2004b, 2004d).

A much more recent example, not directly about religion although it easily could have been, is part of a doctoral dissertation about World of Warcraft (WoW) by a British student named Jane Barnett (Barnett et al. in press). The focus was how people conceptualized anger and the behaviors that made them angry, in this online virtual world. Barnett began, using online forums and email rather than an online open-ended questionnaire, by eliciting examples of in-WoW scenarios that had made 33 thoughtful respondents angry, and she edited and combined these to produce a battery of 93 provisional items. Hundreds of other respondents rated them in terms of how angry these behaviors would make them feel, and an iterative process employing factor analysis and scale reliability measures reduced them to a 28-item scale with four subscales. One finding that might be relevant to the social science of religion is that people become angry at other people's negative behavior, regardless of whether that behavior was intended to harm. This reminds us that the moral codes promulgated by religions may not directly relate to the cognitive and emotional processes that determine people's senses of anger or appreciation.


Once one has questionnaire items, one needs respondents. One of the factors that made Survey2000 a success was the fact it was sponsored by the National Geographic Society, and the NGS publicized the questionnaire on its website and in its main magazine. About 50,000 people completed the questionnaire, most in the United States and Canada, but with at least 100 respondents from each of 33 other nations.

A year later, the NGS helped publicize Survey2001, which actually consisted of separate questionnaires for adults and children, and the adult questionnaire was administered online in four languages.

Readers of National Geographic magazine have diverse interests, but they are probably far more aware of environmental and global issues than the average person. Thus many of the topic areas were salient for most respondents, even though they were not a random sample. Many items were organized in topical modules, and each respondent was given one at random. After completing it, the respondent was given the choice of doing another one, also selected by the computer at random. Again, this process trades the representativeness of the sample against salience of the items for the respondent, but analysis of the data showed great diversity of opinion among respondents to any module. Given the very large number of overall respondents, each module obtained many responses, and the article on the New Age I published in Journal for the Scientific Study of Religion (Bainbridge 2004) was based on fully 3,909 English-speaking respondents to the module I included in Survey2001.
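The random allocation of modules can be sketched as follows. This is a hypothetical reconstruction, not the Survey2001 software: the module names are invented, and the rule that a respondent is never given the same module twice is an assumption of the sketch rather than something stated above.

```python
import random

# Hypothetical module names; the actual Survey2001 modules are not listed here.
MODULES = ["environment", "new_age", "technology", "community"]


def next_module(completed, rng=random):
    """Pick a module at random from those the respondent has not yet done."""
    remaining = [m for m in MODULES if m not in completed]
    return rng.choice(remaining) if remaining else None


def simulate_respondent(wants_another, rng=random):
    """Return the modules one respondent completes.

    wants_another(count) models the respondent's choice to continue
    after having completed `count` modules.
    """
    completed = []
    module = next_module(completed, rng)
    while module is not None:
        completed.append(module)
        if not wants_another(len(completed)):
            break
        module = next_module(completed, rng)
    return completed
```

Since every module is drawn by the computer rather than chosen by the respondent, the first module each person answers is independent of that person's interests, even though continuing to a second module is self-selected.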

Teenage respondents to the youth questionnaire in Survey2001 were recruited in two very different ways. First, many were recruited off the National Geographic website. Second, others filled out the questionnaire as a school assignment connected with Geography Awareness Week. Teachers were recruited so that two classes did the questionnaire in each U.S. state and province of Canada. The fact that these two methods obtained very different kinds of respondents permitted internal replication, and in one study I compared gender correlations with 1,191 respondents in each group (Bainbridge 2002).

Inviting respondents is not the same thing as motivating them, and motivational factors will vary depending on the nature of the population and the topic of the research. A study by Dmitri Williams and his collaborators (Huh and Williams in press) is a marvelous example of how motivation and salience can combine with opportunities to collect additional data online to supplement a questionnaire. His study is part of a massive effort focused on the virtual world (or online multiplayer role-playing game) EverQuest II. The Sony company, which created EverQuest II, provided access to the raw data on its computer servers, documenting millions of social and economic interactions between the avatars of the users. A random sample of players was then sent an invitation to complete an online questionnaire, and offered a highly valuable virtual object as payment, achieving a very high response rate. The questionnaire included a well-developed battery of items about motivations for being in EverQuest II, as well as objective questions about the respondent such as his or her gender. It was then possible to connect the questionnaire responses to the characteristics and behavior of the avatars, for example comparing the gender of the person and his or her avatar, and comparing the degree of aggressiveness across both the real and virtual genders.

Another study that shows how online methodological innovations can achieve scientific gains was done in Japan and published in American Journal of Political Science (Horiuchi et al. 2007). This study combined a questionnaire with a randomized assignment experiment, and employed analytical innovations as well. One of the issues in the 2004 election to the upper house of the Japanese legislature was pension reform. Three questionnaires were used at different stages in the process: respondent screening, pre-election attitudes, and post-election attitudes. The sample was randomly assigned to one of three groups: (1) those asked to visit the website of one of the two main political parties, (2) those asked to visit the websites of both parties, and (3) those not asked to visit any website and not given the pre-election questionnaire. Of course, the main comparisons concerned responses to the post-election questionnaire. Random assignment to the treatment groups and the control group is of course a traditional method used by experimentalists to get around biases introduced by non-random samples of respondents. This study underscores the tremendous possibilities for methodological innovation, building on traditional methods, which Internet offers.
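Random assignment of online respondents to experimental conditions takes only a few lines of code. The sketch below is a generic illustration and not the procedure of the Japanese study; the group labels, the equal group sizes, the number of respondents, and the seed are all assumptions made for the example.

```python
import random
from collections import Counter

# Labels for the two treatment groups and the control group (hypothetical names).
GROUPS = ["one_party_site", "both_party_sites", "control"]


def assign_groups(respondent_ids, seed=2004):
    """Shuffle respondents, then deal them into the three groups in turn."""
    rng = random.Random(seed)          # fixed seed so the assignment is reproducible
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    return {rid: GROUPS[i % len(GROUPS)] for i, rid in enumerate(shuffled)}


assignment = assign_groups(range(1, 901))   # 900 hypothetical respondents
print(Counter(assignment.values()))
```

Recording the seed alongside the data lets another researcher regenerate exactly the same assignment when checking the analysis.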

2. Recommender Systems

A vast amount of information about modern culture lies latent in the databases of commercial websites in what are usually called recommender systems (Resnick and Varian 1997; Basu et al. 1998) but also sometimes referred to as collaborative filtering systems (Goldberg et al. 1992; Canny 2002). With the growth of online merchandising, websites have invested heavily in recommender systems of many kinds that advertise to a user products the merchant thinks that particular individual might want to buy. A vast scientific literature now exists concerning recommender systems, but essentially all of it is oriented toward making predictions of customer preferences, rather than exploring how these systems could be used as social science research tools (Herlocker et al. 2004). The most obvious way to use recommender systems to do social science research on religion is to examine what religious movies or books cluster together, based on preference correlations across large numbers of cases, employing statistical techniques almost identical to the ones we have been using for decades with questionnaire data.
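A preference correlation between two items can be computed just as one would correlate two questionnaire items, restricting the calculation to people who rated both. The following is a minimal sketch with made-up ratings; the function names and the rater IDs are hypothetical, and a real analysis would need to handle items with no shared raters or with zero variance.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def item_correlation(ratings_a, ratings_b):
    """Correlate two items using only the raters who rated both.

    Each argument maps a rater ID to that rater's score for one item.
    """
    shared = sorted(set(ratings_a) & set(ratings_b))
    return pearson_r([ratings_a[r] for r in shared],
                     [ratings_b[r] for r in shared])


# Made-up ratings of two films on a five-step scale.
film_a = {101: 5, 102: 4, 103: 2, 104: 1, 105: 3}
film_b = {101: 4, 102: 5, 103: 1, 104: 2, 106: 5}
print(round(item_correlation(film_a, film_b), 2))
```

A full matrix of such correlations across many films is the raw material for the clustering and factor-analytic techniques long used with questionnaire data.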

In some cases, such as the one for the Netflix movie rental company, the system actually uses a simple questionnaire. People who rented movies are invited to rate them on the website, using a five-step scale. Then the system uses statistical methods to predict which other movies the individual might want to rent, based both on that individual's expressed preferences, and the preferences of other people whose preference patterns are similar. The Internet Movie DataBase is not a rental company, but it also encourages people to rate movies, using a ten-point preference scale. We will use some data from these two sources to illustrate typical research procedures, admittedly on a much smaller scale than a real research project would use.

The Internet Movie DataBase has a category called "based on the Bible," including 10 theatrical-release films that were rated on a scale from 1 to 10 by at least 1,000 persons.ⁱ Of these, seven are also in the NetFlix database, and listed here in Figure 1. The IMDB data are available for anyone to see on its website, whereas the NetFlix figures come from analysis of the raw data, which were distributed to anyone who wished to register as a contestant in the first NetFlix contest, designed to see if anyone could create a better algorithm for predicting people's preferences. The contest data consisted of 17,770 separate text files representing an equal number of movies, and some effort was required to get these data in shape for analyzing.

Figure 1: Seven Bible-Related Movies in Two Recommender Systems

                                        IMDB      IMDB    NetFlix   NetFlix
Film                                    Raters    Mean    Raters    Mean
The Ten Commandments (1956)             18,481    7.9     20,910    3.9
The Last Temptation of Christ (1988)    18,628    7.5     12,739    3.4
The Prince of Egypt (1998)              21,568    6.8     16,664    3.7
Jonah: A VeggieTales Movie (2002)        1,585    6.4      7,775    3.6
The Greatest Story Ever Told (1965)      2,976    6.3      3,180    3.6
The Bible: In the Beginning… (1966)      1,179    5.7        955    3.3
Left Behind (2000)                       3,816    4.6      4,646    3.3

A quick look at the work preparing the NetFlix data can illustrate the need for data management skills on the part of researchers. Each of the text files contained a long series of short lines, each one representing the response by one person. Here are the first five lines of the file for The Ten Commandments:

577397,3,2005-07-05
1527030,1,2005-07-07
2480084,5,2005-07-13
891353,3,2005-07-14
1718816,4,2005-07-15

The first number is an ID code representing the respondent; this is crucial, because it allows the researcher to combine the data for different films rated by the same person. The total number of respondents in the dataset is 400,000, but the ID numbers go considerably higher, one of the little details of which the researcher needs to be aware when preparing to assemble the dataset. The second, one-digit number, between the two commas, is the actual preference rating for that respondent and film, a number from 1 (did not like) to 5 (liked very much). The last part of each line is the date on which the person rated the film. The file for The Ten Commandments has fully 20,910 such lines of data.
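
As an illustration of the file layout just described, the following minimal Python sketch parses lines of the form ID, rating, date. The names Rating and parse_rating_line are hypothetical, introduced only for this example, and the sketch assumes every line has exactly three comma-separated fields.

```python
from datetime import date
from typing import NamedTuple


class Rating(NamedTuple):
    respondent_id: int   # ID code shared across all film files
    stars: int           # preference rating, 1 (did not like) to 5 (liked very much)
    rated_on: date       # date on which the rating was entered


def parse_rating_line(line: str) -> Rating:
    """Parse one 'ID,rating,YYYY-MM-DD' line into a Rating."""
    respondent, stars, day = line.strip().split(",")
    rating = Rating(int(respondent), int(stars), date.fromisoformat(day))
    if not 1 <= rating.stars <= 5:
        raise ValueError(f"rating outside 1-5: {line!r}")
    return rating


# The five sample lines quoted above.
sample = """577397,3,2005-07-05
1527030,1,2005-07-07
2480084,5,2005-07-13
891353,3,2005-07-14
1718816,4,2005-07-15"""

ratings = [parse_rating_line(line) for line in sample.splitlines()]
mean_stars = sum(r.stars for r in ratings) / len(ratings)
```

Keeping the respondent ID as an integer key is what later allows ratings of different films by the same person to be matched.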

Simply put, there are two ways to combine the necessary data files: (1) do it manually, using whatever standard tools one is already familiar with, or (2) write a computer program specially designed for the particular project. I use both methods, and generally find that I need to do a little manual work before I really understand what features need to be coded into a program that will do the "heavy lifting" for me.

For example, using an ordinary word processor and spreadsheet, I manually combined the data for the first three very popular films: The Ten Commandments, The Last Temptation of Christ, and The Prince of Egypt.

The first two films are live-action epics depicting portions of the Old Testament and New Testament, respectively. The Prince of Egypt is a cartoon remake of The Ten Commandments, even adopting the same debatable assumption that the pharaoh dealt with was Ramses the Great. The two movies about Moses treat the subject reverently, whereas The Last Temptation of Christ was a very controversial film, based on a controversial novel by Nikos Kazantzakis, as its Wikipedia page explains: "Like the novel, the film depicts the life of Jesus Christ, and its central thesis is that Jesus, while free from sin, was still subject to every form of temptation that humans face, including fear, doubt, depression, reluctance and lust. This results in the book and film depicting Christ being tempted by imagining himself engaged in sexual activities, a notion that has caused outrage from some Christians."ii

Thus these three films nicely illustrate ways in which works of popular culture may differ along various dimensions. The word processor was used to replace the commas with tabs, so that the data would automatically go into the correct columns when loaded in the spreadsheet. Then a good deal of manipulation — the equivalent of programming by putting if-then statements into spreadsheet cells and doing several sortings — was required to get the data in shape for analysis both in the spreadsheet itself and after transfer to the SPSS statistical analysis software. For larger numbers of films, one would want to invest the effort to write a program that could combine hundreds of files automatically.
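
A program of the kind mentioned here might look like the following sketch, which merges per-film rating text into one row per respondent. It is not the procedure actually used for this paper; the function names, film labels, and ID numbers are hypothetical, and the text for each film is held in memory rather than read from the contest files.

```python
import csv
import io


def merge_ratings(films: dict[str, str]) -> dict[int, dict[str, int]]:
    """Map respondent ID -> {film: rating} from per-film 'ID,rating,date' text."""
    merged: dict[int, dict[str, int]] = {}
    for film, text in films.items():
        for row in csv.reader(io.StringIO(text)):
            if not row:          # skip blank lines
                continue
            respondent, stars, _rated_on = row
            merged.setdefault(int(respondent), {})[film] = int(stars)
    return merged


def to_rows(merged: dict[int, dict[str, int]], films: list[str]) -> list[list]:
    """One row per respondent; None marks a film that person did not rate."""
    return [[rid] + [rated.get(film) for film in films]
            for rid, rated in sorted(merged.items())]


# Hypothetical miniature data, two films and four respondents.
films = {
    "Ten Commandments": "101,4,2005-01-01\n102,5,2005-01-02\n103,2,2005-01-03\n",
    "Prince of Egypt": "101,3,2005-02-01\n104,4,2005-02-02\n",
}
merged = merge_ratings(films)
rows = to_rows(merged, list(films))
```

Each row can then be written out as a tab-separated line for a spreadsheet or statistical package, with the None entries treated as missing data.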

Of the total 42,572 respondents, 35,617 rated only one of these three movies, 6,169 rated two, and 786 rated all three. This suggests researchers will need to deal with challenges of missing data, but that whenever Internet provides very large numbers of cases for statistical analysis, a sufficient number will connect any two variables. For the 4,240 people who rated both movies about Moses, the films correlated significantly (r = 0.33). Just 2,634 people rated both Ten Commandments and Last Temptation of Christ, and the correlation was only 0.02. A total of 1,653 rated Last Temptation of Christ and Prince of Egypt, with a preference correlation of only 0.05. A recent publication, using a slightly different subset of the NetFlix data, found a solid positive correlation (0.31) between Ten Commandments and the reverent 2004 film, The Passion of the Christ (Bainbridge 2007b).
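
The pairwise approach described above, correlating two films using only the people who rated both, can be sketched as follows. The ratings are invented toy values, not NetFlix data, and pairwise_correlation is a hypothetical helper name.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two lists of equal length with at least 2 cases")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def pairwise_correlation(film_a: dict[int, int], film_b: dict[int, int]):
    """Return (number of people who rated both films, their correlation)."""
    both = sorted(film_a.keys() & film_b.keys())
    return len(both), pearson([film_a[i] for i in both],
                              [film_b[i] for i in both])


# Toy ratings keyed by respondent ID; persons 5 and 6 rated only one film.
film_a = {1: 5, 2: 4, 3: 2, 4: 1, 5: 3}
film_b = {1: 4, 2: 5, 3: 1, 4: 2, 6: 5}
n_both, r = pairwise_correlation(film_a, film_b)
```

Here four people rated both films and r comes to 0.8. Because each pair of films has its own set of common raters, every coefficient in a correlation matrix built this way rests on a different number of cases.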

The fact that many people rated both Moses films, but fewer rated either of them with the controversial film about Jesus, suggests that there is a second way to code preference data, not in terms of which scale rating was given, but whether a film was rated at all. I recoded the ratings so that 1 represented any rating and 0 represented no rating. This analysis produced three negative correlations, suggesting that the three films had significantly different audiences. The two Moses films had a moderate negative correlation (-0.23), and the two live action films had a somewhat larger one (-0.37). But there was a huge negative correlation between Prince of Egypt and Last Temptation of Christ (-0.60), probably because the former is a cartoon feature that families may have watched with their children, whereas the latter is decidedly an adult film.

This recoding eliminated the very concept of missing data, so the correlations were based on fully 42,572 cases. Although these correlations were calculated in a reasonable manner, quite suitable for comparison purposes, it should be pointed out that the calculation did not include any of the roughly 357,000 people in the dataset who did not rate any of the three films, something one might need to consider doing for different research purposes.
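
The rated-or-not recoding can be illustrated with a short sketch. For two 0/1 variables the Pearson correlation reduces to the phi coefficient, which can be computed directly from counts. The sets of respondent IDs below are invented, the function names are hypothetical, and no attempt is made to reproduce the coefficients reported above.

```python
from math import sqrt


def phi_from_counts(n_total: int, n_a: int, n_b: int, n_both: int) -> float:
    """Pearson correlation between two 0/1 variables, computed from counts."""
    denominator = sqrt(n_a * (n_total - n_a) * n_b * (n_total - n_b))
    if denominator == 0:
        raise ValueError("phi is undefined when a variable is constant")
    return (n_total * n_both - n_a * n_b) / denominator


def rated_phi(raters_a: set[int], raters_b: set[int], population: set[int]) -> float:
    """Code each person in the population 1 if they rated a film, else 0."""
    a, b = raters_a & population, raters_b & population
    return phi_from_counts(len(population), len(a), len(b), len(a & b))


# Toy example: ten people in the dataset, nine of whom rated at least one film.
raters_a = {1, 2, 3, 4, 5}
raters_b = {4, 5, 6, 7, 8, 9}
everyone = set(range(1, 11))

phi_everyone = rated_phi(raters_a, raters_b, everyone)
phi_raters_only = rated_phi(raters_a, raters_b, raters_a | raters_b)
```

In this toy example the coefficient is about -0.41 when all ten people are counted and about -0.63 when the one person who rated neither film is left out, which shows why the choice of which non-raters to include matters for the size of such correlations.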

Researchers who want to make use of recommender systems to chart cultural trends should realize that people's preferences for cultural products like movies are only partly determined by their ostensible topics. Also important for films are the featured actor, the year the film was made, and what might be called the mood, style, or emotional tone of the picture. An excellent example is what results when the 1959 movie Ben-Hur is entered into MovieLens, a motion picture recommender system created for research purposes by GroupLens Research at the University of Minnesota.iii The ten most similar movies, as reflected in correlations between people's preferences, are:

Ben-Hur: A Tale of the Christ (1925)
Spartacus (1960)
Ten Commandments, The (1956)
Great Escape, The (1963)
Patton (1970)
Bridge on the River Kwai, The (1957)
Seven Days in May (1964)
Longest Day, The (1962)
Fail-Safe (1964)
Magnificent Seven, The (1960)

The first of these is the silent film based on the same novel as the 1959 movie. Like Ben-Hur, Spartacus depicted the Roman Empire and was released just the year after it; however, the ideological content of Spartacus was not Judeo-Christian but class politics. Ten Commandments, like Ben-Hur, was oriented toward the Bible and starred the same actor, Charlton Heston. The other films date from roughly the same period as the target film, concern human conflict, and tend either to have noble main characters or at least to raise issues about nobility of character. One could say these are all serious action pictures with strong plots, either set in historical settings, or in the case of the Cold War related movies, Seven Days in May and Fail-Safe, historical from today's perspective. All have famous main actors. Thus, the religious dimension of Ben-Hur is only one of the factors that makes it correlate with other films in people's expressed preferences.

Movies are a convenient example, but many kinds of products are covered by recommender systems, and others include items with religious significance. The online bookseller, Amazon.com, bases its recommender system on actual book-buying behavior, rather than preferences expressed on a questionnaire scale. Amazon.com's internal data would be excellent for research purposes, but what is available online is not very detailed and is useful chiefly for examples. On July 21, 2009, Amazon.com categorized 1,865 items in a general Religion and Spirituality category, with these three heading the best seller ranking:

The Family: The Secret Fundamentalism at the Heart of American Power by Jeff Sharlet

The Secret by Rhonda Byrne

The Biology of Belief: Unleashing the Power of Consciousness, Matter, & Miracles by Bruce H. Lipton

According to its Amazon.com Web page, customers who bought The Family also bought Crazy for God: How I Grew Up as One of the Elect, Helped Found the Religious Right, and Lived to Take All (or Almost All) of It Back by Frank Schaeffer and four secular books that were critical of contemporary American culture. Apparently, one popular current theme is conspiracy theories of American politics, some of which involve religion.

Customers who bought The Secret also bought three related products by the same author, plus Law of Attraction: The Science of Attracting More of What You Want and Less of What You Don't by Michael J. Losier and You Can Heal Your Life by Louise Hay, which carries the motto, "What we think about ourselves becomes the truth for us..." Customers who bought The Biology of Belief also bought two self-control inspirational books by Dr. Wayne W. Dyer, Excuses Begone! and No Excuses!, and two mind control books by Lynne McTaggart, The Intention Experiment: Using Your Thoughts to Change Your Life and the World and The Field Updated Ed: The Quest for the Secret Force of the Universe. They also bought The Divine Matrix: Bridging Time, Space, Miracles, and Belief by Gregg Braden. These examples remind one of The Power of Positive Thinking by Dr. Norman Vincent Peale, and customers who bought that classic book also bought classic self-help books by Dale Carnegie. Thus, a second popular category of "Religion and Spirituality" books covers self-control books that vary in the extent to which they employ religious rather than psychological or pseudoscientific metaphors.

Amazon.com does carry many conventionally religious books, but these examples show how a recommender system can be used to explore ongoing developments in the surrounding culture that relate to religion without necessarily corresponding with traditional definitions.

3. Geographic Data Analysis

This approach applies traditional quantitative methods of social ecology to new kinds of data already available on the Web but little exploited so far. Social scientists have long compared geographically-based religion-related variables to develop and test theories. Perhaps the most familiar classic work is Emile Durkheim's 1897 book Suicide, which compared rates of self-murder between Protestant and Catholic areas of Europe. Less familiar, but at least available in English, was Henry Morselli's 1882 book on the same topic, which was the source of many of Durkheim's numbers but less ambitious theoretically. However, the real classic in this tradition is almost totally unknown, Adolph Heinrich Gotthilf Wagner's 1864 book, Die Gesetzmässigkeit in den Scheinbar Willkürlichen Menschlichen Handlungen vom Standpunkte der Statistik, which has never been translated. In my view Wagner's book is by far the most admirable of the three, not merely for being earlier, but precisely because it is more cautious than Durkheim in asserting theoretical explanations and does not, like Durkheim, leave out statistics that inconveniently contradict the theory.

Given the century and a half tradition of geographic statistics on religion, what Internet chiefly contributes is access to a large number of new measures, or more convenient access to data that have been available before. In the early 1980s, I counted classified telephone book listings for astrologers and new religious movements in both the United States and Canada (Stark and Bainbridge 1985). While some effort is required to assign them to the correct geographic units, the chief challenge thirty years ago was finding the phone books in the first place. I located many in my university library, others in a city's public library, and in a few cases I hired a student to call information operators in small cities and ask them politely to check their own local phonebook.

For a study of the 22 metropolitan statistical areas in Canada, I actually obtained my own personal collection of all the paper phonebooks.

Online telephone directories greatly simplify this work, although they do not remove all the hand labor. First, one must compare online telephone directories to identify the most complete one. Typically, one must then work manually state by state in the US, entering the desired search term or scanning all the listings for churches, because it is hard to write a computer Web crawler program to do this automatically. For a recent tabulation of astrologers by state, I found that the most accurate method was to paste each page of astrology listings into a word processing document, then edit it with a combination of manual labor and search-and-replace commands, before porting the text into a spreadsheet (Bainbridge 2007a: 117, 254). Then more work was required to format the data, often simply because different listings had different numbers of lines of data, and to find duplicate listings that needed to be removed. Some sense of the magnitude of this work is reflected in the final total of unique listings, which was 3,859, and three work days were required to prepare the data manually for computer analysis.

Often, a religious denomination or movement lists its centers, clergy, or even members on a website that may be used in the same manner to generate geographic rates. Figure 2 shows five measures I developed from such websites.

Figure 2: New Religion Indicators per 100,000

Geographic Regions                TM         3HO        Yoga Serve   Yoga Alliance
of the US             Websites    Centers    Teachers   Teachers     Teachers
New England           2.12        0.22       0.48       6.29         8.17
Middle Atlantic       1.55        0.03       0.22       2.06         6.04
East North Central    1.19        0.04       0.08       0.82         3.20
West North Central    1.30        0.07       0.11       0.74         2.18
South Atlantic        4.01        0.05       0.19       1.14         4.44
East South Central    0.37        0.02       0.03       0.47         1.18
West South Central    0.88        0.03       0.19       0.52         2.11
Mountain              2.98        0.07       0.69       1.33         6.17
Pacific               9.60        0.10       0.45       0.87         4.13
USA                   3.26        0.06       0.25       1.30         4.10

In 1998, the Church of Scientology launched 15,693 personal Web pages in 11 languages for members in 45 nations. Of the total, 8,762 or 55.8 percent were residents of the United States, and they were tabulated by the nine divisions of the nation in Figure 2. The remaining columns tabulate data for four Asian-oriented religious or spiritual movements, beginning with rates based on 178 Transcendental Meditation centers in the United States in 2006. In the same year, the website of the International Kundalini Yoga Teachers Association, the successor to the Healthy-Happy-Holy Organization (3HO) of Yogi Bhajan, listed 747 3HO yoga teachers. A website called Yogaserve listed 3,847 teachers of yoga in the US who have chosen to register, and the website of the Yoga Alliance listed fully 12,166 teachers.

Such data are very useful to test or develop theories about the socio-cultural environments that are hospitable for new religious movements (Stark and Bainbridge 1985). In general, western areas of the United States have high rates of geographic migration, low rates of membership in conventional religious organizations, and probably as a consequence have high rates of new religious movements. However, in Figure 2 as in earlier data, New England has somewhat high rates, despite having church-member rates comparable to other eastern regions. Among the theories that could be tested about why this is the case are three: (1) new religious movements are attracted by the high density of elite educational institutions, (2) for historical reasons New England is weak in religious sects which would provide an alternative to mainstream denominations, or (3) something about the socially conscious (e.g. liberal) culture of New England. Like some earlier data, the table also suggests that the South Atlantic region may be increasingly open to some kinds of spiritual movements, possibly in retirement communities in Florida, or secular communities in Florida and around Atlanta and the District of Columbia. Of course, data on any one group may reflect its own unique regional history, and the geographic location of its headquarters, so the availability of data about numerous groups over Internet is a great benefit for researchers.

For the kinds of things counted in the above table, it makes perfect sense to use the total populations of the geographic area to produce rates. In some cases, one might want to use some subset of the population, such as adults or elderly people, as the divisor. In other cases, one might need to use a completely different kind of variable for the divisor in a rate. For example one might divide the number of churches belonging to one denomination by the number of churches belonging to all denominations. The first column of the table is based on websites belonging to the Church of Scientology, but established for individual members, so population is a good divisor. For rates with other kinds of websites in the numerator, however, one might need websites in the denominator as well.
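
The arithmetic behind such rates is simple, but the choice of divisor is the substantive decision. The sketch below shows the two forms discussed above, a rate per 100,000 population and a share of a total. The regional counts and populations are hypothetical placeholders, not the figures behind Figure 2; only the 8,762 of 15,693 Web pages comes from the text.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Cases per 100,000 residents of a geographic unit."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


def percent_share(count: int, total: int) -> float:
    """Cases as a percentage of a total, e.g. one denomination's churches
    divided by the churches of all denominations."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * count / total


# Hypothetical regions: (number of listed centers, total population).
regions = {"Region A": (50, 2_000_000), "Region B": (12, 4_800_000)}
rates = {name: rate_per_100k(count, pop) for name, (count, pop) in regions.items()}

# Share of the personal Web pages belonging to US residents, from the text.
us_share = round(percent_share(8_762, 15_693), 1)
```

With these placeholder inputs Region A has a rate of 2.5 per 100,000 and Region B a rate of 0.25, while the US share works out to the 55.8 percent reported above.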

For example, one could compare all the Web pages hosted by the governments of U.S. states, to see what fraction of them in each state contained a religion-related word like "church." At one time, one could get decent geographically-based counts from searching websites in each of the fifty US state domains, because originally the .us domain was limited to governments. Thus, one could enter "church site:ma.us" into Google to get all the government Web pages registered in the .us domain that had the word "church" on them. More recently, the .us domain was opened up, so that citizens and non-governmental organizations can use these domains, and the implications for social science are not yet clear.

When basing rates on ratios of websites, one should be alert to the possibility that relationships will be non-linear, because for example very small-population states may need pages covering a wide range of topics, almost as wide as for large-population states. The basic lesson is that one must become familiar with one's data, and think carefully about what social process produced the cases, in order to know what the statistics actually measure.

Some researchers may want to invest in developing cooperative relationships with corporations that have access to geographically-based data through their online business. For example, Google offers businesses a complex service called Google Analytics, which can produce maps and tables of the numbers of people accessing a given Web page from different geographic locations.iv In many cases, a company's website provides geographical data but in an inconvenient form, and thus working with the company to obtain the data directly could be much more efficient. For example, I just entered the word "Christ" into the eBay website and discovered 9,397 items for sale whose descriptions contained the word "Christ." For each, I could manually look at the advertisement to see geographically where the item was, but doing so for all of them would be exceedingly tedious.

4. Search Engines

Among the most heavily used online services — and one of the most useful for social scientists in often unexpected ways — are search engines like Google. Although some details of each search engine are kept secret by the company offering it, they are based on principles from the cognitive and social sciences, as well as on computer science. Thus, social scientists of religion would do well to learn as much as they can about their research potential, and this section of the current essay can only scratch the surface. A good starting point for readers who want to learn more is the classic book Finding Out About by Richard Belew (2000).

When the World Wide Web was launched in the early 1990s, creators of Web pages were encouraged to put keywords describing the page in a hidden area of the HTML code that could be searched but would not be visible to the average user. Unfortunately, people very quickly gamed the system, putting popular but irrelevant terms in the code. In addition, as the Web grew (now with over a trillion pages), it became impossible to search it in real time. Commercial search engines index the Web by sending crawler programs out across it looking for new pages. They categorize Web pages in terms of the words in the part of the code visible to users, but for many searches the number of pages containing the search term is enormous. I just this moment searched for "God," and Google gave me 469,000,000 Web pages in which to look.

One response, exemplified by the Alta Vista search engine, was to allow the user to do the Boolean searches preferred by librarians. Currently, Alta Vista allows the user to fill in any of four different text fields: all of these words, this exact phrase, any of these words, and none of these words. When I just now searched for "God," Alta Vista estimated it could find 1,450,000,000 pages containing this word. When I told it to search for "God" but only on pages that did not contain "Jesus" or "Christ," the estimated number of hits declined to 1,160,000,000. Clearly, this is still too large a number of pages for me to visit in this life. Therefore, modern search engines need to augment the traditional search for keywords with some method for prioritizing the pages.

As it happens, the first hit Alta Vista gave me in this more restrictive search was a Wikipedia Web page listing names of God in Judaism, clearly a very appropriate page given my search terms. Google's solution to the prioritization problem was PageRank, an algorithm based on links between Web pages, measuring what fraction of other relevant pages link to the page in question, thus a measure of its popularity for people interested in the topic of the search (Brin and Page 1998).

Most users of search engines seem unaware of the special ways in which they can be used, both the different ways in which searches can be framed, and the potential uses of the results of a search. An example of how both kinds of awareness can be useful to the researcher is the possibility of exploiting the ability of several search engines to limit searches to specified Internet domains. Searching Google for "God site:edu" gives you 4,350,000 pages that contain the word "God" which are in the ".edu" domain reserved primarily for U.S. educational institutions. Searching Google for "God site:gov" gives you the 826,000 US government pages that refer to God. "God site:nih.gov" gives you the 8,470 pages mentioning God on the immense website of the National Institutes of Health. Given that different Internet domains represent different provinces of culture and society, comparing across domains can be useful for social scientists.

When I did the research for Figure 3 in 2006 (Bainbridge 2007a: 153, 257), Google estimated that 173,000,000 pages contained the word "God." Of these, 11,900,000 were in the .edu domain, and 82,200,000 were in the .com domain. The ratio of these two numbers (.edu/.com) is 0.145 or 14.5 percent. This is a measure of how educational versus how commercial the concept is, but only if compared with the ratios for other terms. Similarly, the ratio of .gov to .net pages, 9.4 percent, is a measure of how governmentally official the concept is.

Note that the word "church" has higher ratios, reflecting the fact that churches are important educational and civic institutions, as well as religious ones. In contrast, words relating to agnosticism and atheism are relatively rare in official institutions of modern society, despite all the debates about secularization.
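
The two ratios are easy to recompute from the page counts in Figure 3. The following sketch does so for three of the six words, using the counts in thousands as published; the dictionary layout and the function name domain_ratio are assumptions made for the illustration.

```python
# Estimated page counts in thousands, taken from Figure 3.
page_counts = {
    "Bible":  {"edu": 4_460,  "com": 39_400, "gov": 199,   "net": 3_210},
    "church": {"edu": 23_100, "com": 65_600, "gov": 2_320, "net": 6_420},
    "God":    {"edu": 11_900, "com": 82_200, "gov": 792,   "net": 8_460},
}


def domain_ratio(counts: dict[str, int], numerator: str, denominator: str) -> float:
    """Pages in one domain as a percentage of pages in another domain."""
    return 100 * counts[numerator] / counts[denominator]


ratios = {
    word: (round(domain_ratio(c, "edu", "com"), 1),
           round(domain_ratio(c, "gov", "net"), 1))
    for word, c in page_counts.items()
}
```

For "God" this gives 14.5 and 9.4 percent, and for "church" 35.2 and 36.1 percent, matching the table. Because the counts are search engine estimates rounded to thousands, ratios for words with very small .gov counts are much less stable.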

Another useful search trick is to seek the Web pages that link to a particular other Web page. For example, http://www.thearda.com/ is the home page of The Association of Religion Data Archives, a prominent online digital library. Searching Google for "link:thearda.com" returns 728 hits, including a list of religion-related websites on the website of Paul Brians of Washington State University.v

Figure 3: Google Estimated Frequency of Words on Web Pages

            Pages Containing the Word (thousands)                      Ratios
Words       All Domains    .edu      .com      .gov     .net      .edu/.com   .gov/.net
agnostic    5,040          140       2,640     14       230       5.3%        6.3%
atheism     4,970          109       2,750     1        431       4.0%        0.1%
atheist     7,660          118       4,740     1        393       2.5%        0.2%
Bible       68,400         4,460     39,400    199      3,210     11.3%       6.2%
church      160,000        23,100    65,600    2,320    6,420     35.2%       36.1%
God         173,000        11,900    82,200    792      8,460     14.5%       9.4%

Paul's page offers a good example of how sophisticated users most efficiently use Web pages. A naive user would laboriously click on each link and look at the page it leads to. A researcher on religion websites should probably do something quite different, opening the source code from the browser (View/Source in Internet Explorer and View/Page Source in FireFox). This immediately lets the user see the page description text which Google displays, and Paul's rather responsible list of hidden keywords, although "cool sites" is debatable:

Later in the HTML code the user would see the actual links, the first eight of which are:

Academic Info on Religion

Esoterica: The Journal of Esoteric Studies

Internet Sacred Text Archive

Adherents.com Religion statistics

Review of Biblical Literature

3E Encyclopedia Information about various bodies of mystical and religious thought.

Religion

American Religion Data Archive

Note that each of these gives the link itself plus words that appear on Paul's page where the user would click to go to the site. A sophisticated user would copy the whole section of the source code into a word processor, search and replace every "<" or ">" or quotation mark with a tab, then dump the result into a spreadsheet, which if set correctly will immediately activate the links just as if they were on Paul's Web page. But now the user can save the information, access it conveniently later on, and add other information about the sites if desired.
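
The same result can be reached without the word processor. As a rough sketch, Python's standard html.parser module can pull each link and its clickable text out of a page's source and write them as tab-separated lines ready for a spreadsheet. The HTML fragment and example.org addresses below are invented placeholders, not the source of the page discussed above.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (URL, link text) pairs from the <a href=...> tags of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href = None           # URL of the link currently being read
        self._text: list[str] = []  # pieces of its clickable text

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # tidy whitespace
            self.links.append((self._href, text))
            self._href = None


# Invented fragment standing in for the saved source code of a links page.
source = """<ul>
<li><a href="http://www.example.org/a">First  Site</a> a short note</li>
<li><a href="http://www.example.org/b">Second Site</a></li>
</ul>"""

collector = LinkCollector()
collector.feed(source)
collector.close()
tsv = "\n".join(f"{url}\t{text}" for url, text in collector.links)
```

The string tsv can be saved to a file and opened in a spreadsheet, with the address in the first column and the link text in the second.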

Search engines not only allow one to map the relationships between websites (or the topics they represent) in a kind of conceptual space; they can also chart changes over time. For example, Google offers a trend analysis, or rather a pair of analyses, one based on Google searches by individuals, and one based on how frequently a given topic has appeared in Google news stories.vi When I entered the word "Scientology" into Google Trends, I got the graph shown in Figure 4.

Figure 4: The Result of Entering "Scientology" into Google Trends

Note that Google adds flags to prominent news stories that might explain some of the peaks in the search popularity graphs. In this case, there were six, and they link to stories as follows:

A. Stars turn out for Cruise's Scientology wedding BreakingNews.ie – Nov 18 2006 http://www.breakingnews.ie/2006/11/18/story285683.html

B. Clearwater, Fla.: Scientology stronghold Herald – Sep 23 2007 http://www.bostonherald.com/entertainment/arts_culture/view.bg?articleid=1033551

C. Germany to ban Scientology TransWorldNews (press release) – Dec 7 2007 http://www.transworldnews.com/NewsStory.aspx?id=30014&cat=11

D. Cruise Scientology Video Surfaces Online Local6.com – Jan 17 2008 http://www.clickorlando.com/entertainment/15074244/detail.html

E. Scientology helped Cruise overcome dyslexia Frontline – Jan 5 2009 http://www.hinduonnet.com/thehindu/holnus/009200901051860.htm

F. French court tries Church of Scientology WOOD-TV – May 25 2009 http://www.woodtv.com/dpp/news/international/intl_ap_french_court_tries_church_of_scientology_20090525_2448265

In 2005, when the search data peak, Google does not call out a news article, but one can get a sense of what was happening by entering "Scientology 2005" into the main Google search portal. The highest ranked site that came up when I did this on June 12, 2009, was an English-language version of a German government report critical of Scientology,vii which begins with a link to a news article at the English-language site of the German news magazine, Der Spiegel, "Germany Prepares to Ban Scientology."viii It appears that public interest in Scientology is aroused whenever the mass media raise controversies about it, or when a Scientologist celebrity like Tom Cruise gets unusual publicity.

If one were doing a research project over a significant period of time, it would be possible to access and save websites periodically, and then analyze changes. This might work especially well when a short series of events or a heated online argument made people update the sites frequently. For a longer-term historical perspective, one may turn to the Wayback Machine of the Internet Archive.ix One enters a website URL, such as www.scientology.org, and the Wayback Machine offers historical versions of it. Wayback archived the Scientology website three times in 1996, beginning November 14, and 29 times in 2008. The peak year was 2005, in which it did so 355 times. Interestingly, the Wayback Machine does not include historical pages from www.xenu.net, the most prominent anti-Scientology website, and only this generic message appears: "Siteowners might have also requested that their sites be excluded from the Wayback Machine. When this has occurred, you will see a 'blocked site error' message." For a researcher studying conflict around new religious movements, this constitutes data as much as it does missing data.

In earlier research (Bainbridge 2007b), I compared a pro-Scientology website with an anti-Scientology website, plus three sites about The Family (Children of God), two of them opposed to that group, by analyzing links to pairs of websites. Entering "link:www.scientology.org link:www.xenu.net" into the now-defunct MSN search engine produced all the websites that linked to both Scientology's official site, and to the most prominent anti-Scientology site. That does not seem to work for Bing, MSN's successor, and Google returns the sites that link to either of the pair, when what we need is only those that link to both. However, Alta Vista still permits this kind of search. Completing similar double link searches for all possible pairs of a set of religion-related sites would provide data to map their degrees of similarity, without relying on the words contained on them, even though most uses of search engines are based on keyword searches. Already, researchers have begun directly examining the links on religion-oriented Web pages as a way of charting the topography of faith (Scheitle 2005).
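
Once the pages linking to each site have been collected, by whatever search engine still permits it, the double link counts for all pairs can be tabulated in a few lines. The sketch below is one possible way to do this, not the method of the earlier study. It assumes the inbound links are already available as sets of page addresses, uses invented site and page names, and adds the Jaccard coefficient (pages linking to both divided by pages linking to either) as one simple similarity measure among several that could be chosen.

```python
from itertools import combinations


def colink_table(inlinks: dict[str, set[str]]) -> dict[tuple[str, str], tuple[int, float]]:
    """For every pair of sites, count the pages that link to both and
    compute the Jaccard similarity of their sets of inbound links."""
    table = {}
    for site_a, site_b in combinations(sorted(inlinks), 2):
        both = inlinks[site_a] & inlinks[site_b]
        either = inlinks[site_a] | inlinks[site_b]
        similarity = len(both) / len(either) if either else 0.0
        table[(site_a, site_b)] = (len(both), similarity)
    return table


# Invented data: the pages found to link to each of three sites.
inlinks = {
    "site_a": {"page1", "page2", "page3"},
    "site_b": {"page2", "page3", "page4"},
    "site_c": {"page5"},
}
table = colink_table(inlinks)
```

In this toy example site_a and site_b share two of four linking pages, a similarity of 0.5, while site_c shares none with either. The resulting matrix of pairwise similarities could then be fed into clustering or scaling software to map the sites.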


5. Natural Language Processing

For more than four decades (Stone et al. 1966), social scientists have used computers to analyze written texts, but the recent explosive development of new approaches and online sources of written materials has greatly expanded the opportunities for this kind of work. In computer science today, natural language processing (NLP) refers to a major research area and to numerous software tools for collecting, analyzing, and transforming written text, recordings of spoken language, and even automatic analysis of human gestures based on computer vision techniques (Martin 2004). The best-developed and most useful current methods of value to social scientists of religion involve traditional written text.

All search engines make some use of the text on websites, but they generally just look for the words entered by the user, and augment this information with non-textual information such as the number of incoming links to each Web page from other Web pages having similar textual content. One example that goes much further is Clusty (clusty.com), a meta-search-engine that sends the user's query to several independent databases, combines the results, then clusters them in terms of the words that distinguish them from each other.

For example, entering the word "Bible" into Clusty returned 230 websites from seven sources, with somewhat overlapping results:

Ask - Top 82 results retrieved out of 18,740,000 in 0.091 seconds.

Gigablast - No results retrieved in 1.039 seconds.

Live - Top 82 results retrieved out of 83,900,000 in 0.55 seconds.

NY Times - No results retrieved in 0.321 seconds.

Open Directory - Top 82 results retrieved out of 8,510 in 0.325 seconds.

Sponsored Listings - Top 4 results retrieved out of 4 in 0.317 seconds.

Yahoo! News - Top 10 results retrieved out of 25 in 0.772 seconds.

The system clustered these sites into ten major categories: Bible Study (33 sites), King James (32), Bible Search (16), New Testament (15), Audio (16), Ministry, Church (14), Free Bible (14), Netbible (10), Pictures (10), and BibleGateway.com (7). While there is nothing surprising in this list, it does suggest how people use the Web to explore the Bible. We see that for English-speaking users, the King James version still stands out from all the rest. The audio sites let the user listen to recordings of people reading chapters, whereas the picture category includes sites that offer images depicting Bible stories, and Free Bible identifies sites that offer free Bibles, free software to read Bibles on the user's PDA, or in other ways combine the words "free" and "Bible." Indeed, statistical analysis of the co-presence of pairs of words on websites is one of the main tools used for clustering them (Cilibrasi and Vitanyi 2007). Bible Search sites take the process one step further, by facilitating searching the Bible for desired quotations or topics, and the last category, BibleGateway.com, is simply the most prominent of these sites, which is especially good for comparing passages across translations and languages.

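The co-presence idea can be made concrete with the normalized distance measure proposed by Cilibrasi and Vitanyi, which needs only three page counts and an estimate of the total number of indexed pages. The sketch below is illustrative: the counts are hypothetical, and nothing here should be read as a description of how Clusty itself computes its clusters.

```python
import math


def normalized_distance(f_x, f_y, f_xy, n_pages):
    """Normalized distance between two terms from page counts.

    f_x, f_y: pages containing each term; f_xy: pages containing both;
    n_pages: total pages indexed. Smaller values mean closer terms.
    """
    if f_xy == 0:
        return math.inf  # the terms never appear together
    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(n_pages) - min(log_x, log_y))


# Hypothetical counts: two terms that always co-occur, and two that rarely do.
always_together = normalized_distance(1000, 1000, 1000, 10**6)
rarely_together = normalized_distance(100, 100, 10, 10**4)
print(always_together, rarely_together)
```

Terms that always appear on the same pages have a distance of zero, and the distance grows as the overlap shrinks.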
Clusty, and other systems like it as they develop in the near future, can be used to explore society's orientations toward a very wide range of religious topics, both technical and esoteric, as well as commonplace. For example, entering "Christology" classifies 169 websites by keywords thus: Doctrine (21 sites), God (19), Bible (15), Definition (11), History (11), Trinity (11), Bibliography, University (7), Pictures (7), Course, Catholic (6), and Incarnation (6). Entering the name of a Canadian new religious movement called "Raelians" clusters 188 sites: Cloning (47 sites), (23), UFO (15), Claude Vorilhon (12), Aliens (14), Blog (13), Intelligent Design (5), Love (7), Media (8), and UFOland, Raelians Target Las Vegas (5). In fact, the Raelians were founded by Claude Vorilhon, believe that aliens have brought the truth in UFOs, stress love and seek to clone human beings, all topics identified by Clusty automatically. The last category refers to five sites that report the group's latest activity, and one of them says: "The Raelian Movement is announcing plans to build a UFOland in Las Vegas where visitors can attend a Happiness Academy and see a full-size replica of a UFO."x

Serious research of this kind would want to use large corpora of data, which might or might not be available over Internet, with a professional and well-defined software system for clustering texts on the basis of the words in them. In fact, many different NLP text analysis systems have been made available over the past decade, incorporating a range of algorithms, so the choices are rather daunting for an unassisted social scientist (e.g. Landauer et al. 1998). In addition, some of the best systems lack conventional user interfaces and require customization before being used on a particular project, so collaboration with a specialist in NLP may be necessary.

Years ago, developers of computer technology to handle language were optimistic they could duplicate the nuances of human communication by building grammatical rules and narrative structures into their programs. An example concerning religion was the remarkable 1980 paper, "A Formal Grammar of Expressiveness for Sacred Legends" (Dreizin 1980), which asserted: "The best way for a researcher to present his knowledge of folklore is to demonstrate the ability to construct at least rough approximations of folk stories." This means that a researcher who really understands religious parables and stories should be able to write a computer program to generate realistic ones automatically. NLP researchers have backed off from this level of hubris in recent years, yet it remains an interesting goal for the future. Perhaps unfortunately, the success of purely statistical approaches to analysis of text, often using the "bag of words" approach that totally ignores grammar and narrative structure, has tended to overshadow more sophisticated approaches.
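A small sketch may make clear what it means for the "bag of words" approach to ignore grammar. Each text is reduced to word counts, and two texts are compared by the cosine of the angle between their count vectors. The sentences are invented for illustration, and cosine similarity is assumed here as a common choice rather than the method of any particular system mentioned above.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Reduce a text to word counts, discarding order and punctuation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(bag_a[word] * bag_b[word] for word in bag_a.keys() & bag_b.keys())
    norm_a = math.sqrt(sum(count * count for count in bag_a.values()))
    norm_b = math.sqrt(sum(count * count for count in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two invented sentences with opposite meanings but identical word counts.
first = bag_of_words("The priest blessed the pilgrim.")
second = bag_of_words("The pilgrim blessed the priest.")
unrelated = bag_of_words("Storms rule distant weather")

print(cosine_similarity(first, second))
print(cosine_similarity(first, unrelated))
```

The first two sentences say different things, yet their bags are identical, which is exactly the loss of narrative structure noted in the text.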

Two more recent papers use religious examples to illustrate the possibility for research at an intermediary level of linguistic sophistication, seeing analogies or correspondences in the clustering of words generated by different religions. Tony Veale (2003: 137) pointed out the value of thinking — and computing — in terms of analogies:

Whereas a conventional thesaurus is indexed on a single probe word, analogical queries require both a source and a target term, to permit a mapping between two domains to be constructed. Thus, instead of a simple query "church" or "bible," one can pose much more specific queries like "Muslim church" (mosque), "Hindu bible" (the Vedas), "Celtic Ares" (Morrigan) or "Jewish German" (Yiddish). Semantic precision thus takes on a very different complexion when analogy is involved: though "mosque" and "synagogue" are not even near-synonyms, one can say that each forms a perfect correspondence with the other in the analogy of a "Muslim synagogue." Thus, one should differentiate between semantic precision (the basis of synonymy), and analogical precision (the basis of analogy and metaphor).

Veale's empirical analyses identified analogies across the deities in ancient Greek, Roman, Hindu, Norse, and Celtic religions, which of course may reflect their common Indo-European cultural roots. A comparable study by Marx et al. (2002), clustering in terms of the co-presence of common keywords, identified themes addressed by both Buddhism and Christianity, in such areas as scripture and theology, sin and suffering, characteristics of the religion's founder, philosophical concepts, and customs and rituals. A second analysis, again based just on statistical analysis of word usage, compared Buddhism with Islam.

6. Virtual Worlds

A very large number of new kinds of communication over Internet express the personalities or public personas of millions of people, which often include religious elements. Early research is looking at social networking sites such as Facebook and MySpace, and the recent Twitter fad is concentrating research efforts on the broader category of text messaging. Perhaps the richest of these online social environments, although not yet the most popular, are virtual worlds (Bainbridge 2007c). While definitions have not yet stabilized, these are generally taken to be online computerized environments, visually similar to the real world, in which each individual is represented by an avatar, and avatars can interact in complex and somewhat creative ways. Note that "avatar" is itself originally a religious term.

Roughly speaking, there are two kinds of virtual worlds, those that came from a tradition of computer games and those that did not, although leading examples of both kinds are so complex that the word "game" no longer really applies. Second Life is the best-known non-game virtual world, and World of Warcraft is the best-known one marketed as a game. Some, such as Entropia Universe, fall between the two categories. All of the leading ones provide wide scope for social interaction, allow users to create (or at least assemble) their own virtual social groups and objects, and have some educational potential.

Much of the best research to date has been qualitative, often from a humanities or "games studies" perspective, but quantitative studies have begun, such as the EverQuest II research described above. I begin here with qualitative descriptions, partly to set the stage for quantitative methods, but also because I see a major role for qualitative research in this area. As a new cultural form, virtual worlds need to be studied as innovations in their own right, each with its own distinctive characteristics. In addition, they often depict innovative, exotic, ancient, or fantasy religions, whose theology and symbolism demand intensive qualitative analysis.

Second Life is a tool-rich online environment in which users can create their own objects, including full-scale architecture, and then experience and manipulate them through their avatars. For example, Vassar College has created a large island in Second Life, duplicating part of its actual campus but including a full-scale replica of the Sistine Chapel complete with copies of Michelangelo's artwork across the arched ceiling. The implications for the social science of religion are suggested in the agreement users' avatars are required to accept upon entry: "Visiting the Sistine Chapel creates a deeply moving experience for many people for a variety of reasons, including religious, artistic and educational. To preserve this same experience for those visiting the Sistine Chapel in Second Life, we expect all visitors to conduct themselves here as they would in real life: with respect for the environment as well as for those visiting the environment."

Figure 5 shows a much larger Second Life replica of a religious architectural site, the Basilica of Saint Francis of Assisi. An avatar can walk through this huge assembly of buildings, sit on a pew in the chapel to pray, and wander through authentic hallways and stairs. At a vast Islamic site, Al Andalus Alhambra, one may take an actual 23-minute magic carpet ride over a small city, dominated by a huge mosque. A somewhat more modest replica of is hemmed in by a virtual business district, where one may purchase a variety of virtual goods, as is also the case for a memorial for the World Trade Center where the twin towers are as translucent as the proverbial ghosts.

Figure 5: The Basilica of Saint Francis of Assisi in Second Life


It is my impression that many examples of religious architecture in Second Life were educational or commercial design projects, perhaps with some cultural intent but not intended or used for real religious services. A vast number of small groups meet regularly in Second Life, however, and some of them are religious or spiritual. The Anglican Cathedral of Second Life on Epiphany Island holds regular worship services, and regular meditations are held on Osho Island, which belongs to what used to be called the Rajneesh Movement. Clearly, interviews or participant observation are appropriate ethnographic methods for studying these online religious or spiritual groups. As Figure 5 demonstrates, photography is an appropriate method for documenting virtual architecture, and it can also be used to document group activities, as the next figure shows.

Figure 6 records a remarkable moment in May 2008, on a virtual mountaintop in World of Warcraft, where participants in a major scientific conference are sharing what might truthfully be called an ecstatic religious experience. I organized the conference, in collaboration with the magazine Science, to show that a gamelike virtual world could be an environment for legitimate scientific and scholarly communication. About 200 people attended, with as many as 120 attendees at each of three plenary sessions. Their avatars were together in virtual space but their physical bodies were strewn from Australia through North America and Europe to Russia. One result was a book of essays by many authors about a variety of social dimensions of virtual worlds (Bainbridge in press). The mountaintop, near Crossroads in the Barrens on the Kalimdor continent, holds a virtual memorial to Michel Koiter, a young artist who worked on World of Warcraft, but who died in 2004 just months before it was released.xi The angel standing at the peak of the hill is the same form as the ones that resurrect temporarily "dead" avatars at graveyards in this virtual world (Klastrup 2006). The conference participants marched up the hill, prayed or meditated briefly, then danced in joyous celebration of Koiter's creativity.

Figure 6: A Quasi-Religious Celebration in World of Warcraft


Figure 7 was taken in the Temple of Mitra in the Aquilonian capital, Tarantia, in the gamelike world, Age of Conan. The woman in the center is my character, Atlantea, who is just in the process of casting a protective spell over herself. Above her head is a representation of her spiritual essence, the naked upper body of a woman with the head of a snake. She is a Tempest of Set; that is, a priestess of the serpent god Set who rules the weather and creates storms. The man in the foreground is a priest of Mitra, the sun god to whom this temple is devoted. He is not the avatar of another player, although avatars of users who are Priests of Mitra do often visit this temple to receive or complete quest assignments. Rather, he is a non-player character (NPC), operated by simple artificial intelligence (AI) programming. Even at the low levels of intelligence of these AIs, "interviewing" them can be a valuable method for learning about the culture. Three other human figures in the scene are statues. The largest of these represents the former king of Aquilonia, Numedides, who was deposed and slaughtered by the Cimmerian barbarian warrior, Conan, and now serves as a saint for the followers of Mitra.


Figure 7: The Temple of Mitra in Age of Conan

This is a good point to dispel the myth that online virtual worlds are games for teenagers. While many players of the games are young, the median age seems to be around 30, and we would expect that to increase as the technology works its way through the lifespan. Second Life attempts to keep teenagers in their own Teen Second Life world, and the main world includes an extensive "red light district" where all kinds of virtual erotic experiences take place, and some users even function as prostitutes. Age of Conan includes prostitutes, as well, although they all appear to be NPCs, and many of the quests concern marital infidelity, although they do not directly depict it. One must register with a credit card to enter Age of Conan, and swear that one is an adult. Whereas public virtual sexual intercourse is the most "adult" thing seen in Second Life, the adult content in Age of Conan tends to consist of severed heads, human entrails, and gore-splattered landscapes.

The Temple of Mitra scene also stresses the importance of the culture behind the gamelike virtual worlds, often called the lore. I know that the statue depicts Numedides only because the very first Conan story, published by Robert E. Howard in 1932, records Conan's displeasure about the fact that the priests had sainted his predecessor.xii Other prominent virtual worlds, such as The Matrix Online and Lord of the Rings Online, are similarly based on existing cultural properties, but many, notably World of Warcraft, are entirely new. The emerging Warcraft culture has been the focus of many recent novels, but there are also extensive online digital libraries devoted to it. Most prominent are WoWWiki, which currently boasts fully 76,554 articles about the Warcraft Universe,xiii and Wowhead, which among other things offers detailed descriptions of fully 8,098 quests that can be undertaken in World of Warcraft.xiv The most popular virtual worlds have wikis, user forums with tens of thousands of posts, and myriads of websites established by the thousands of guilds, clans, corporations or other user groups that have been created around them. Research on virtual worlds, therefore, can take advantage of a range of Internet resources that are actually outside these worlds but oriented toward them.

The religious culture inside the theme-oriented or game-like worlds varies considerably. The Matrix Online depicted a virtual city of the future that was frozen in the year 1999, so some of the neighborhoods incidentally possessed churches, but they were not prominent in the action. In contrast, the vast Amarr Empire in EVE Online is a theocracy that uses religion to dominate other peoples and to justify harsh treatment of slaves because pain supposedly promotes their spiritual development. World of Warcraft depicts a wide range of religions, both positively and negatively, and many of the users' characters are priests, druids, or shamans. At the Cathedral of the Light in Stormwind city, human characters can learn about a religion that lacks a god but has an ethic promoting tenacity, respect and compassion. At the Temple of the Moon in Darnassus city, they may learn about the loving moon goddess Elune and the need to protect nature from technology. Most of the many religions of NPCs, to which users cannot belong, are depicted negatively. Among the most extensive and interesting of these is the Scarlet Crusade, which is devoted to destroying the Undead who linger between life and death, in the belief that both life and death are good, but their mixture is an abomination.

The three main competing religions in Age of Conan illustrate the ways in which modern fantasies of ancient religions continue to fascinate people who are influenced by but not devout adherents of Christianity. Computer games often depict religion, but it is almost never the conventional kind (typically it is Asian, ancient, or fantasy), and players tend to be less religious than the average (Bainbridge and Bainbridge 2007). Mitra was an actual Indo-Iranian deity, although the architecture in Aquilonia follows Greek and Roman styles. Set was an actual ancient Egyptian deity, and the Stygian nation (across the river Styx) that worships Set has many of the qualities of ancient Egypt. The third chief god, Crom, belongs to the Cimmerian (i.e. Celtic) barbarians, of whom Conan himself was the most prominent example. Age of Conan presents Mitra as a god of ethics and hope who did not demand absolute loyalty. In contrast, Set was an exclusive god who offered his followers unusual powers in return for strict loyalty. Crom was an aloof creator god in a brutal society, who wanted nothing to do with humans after they had been born and was contemptuous of any coward who would stoop so low as to pray for help. People who spend much time in Age of Conan will come to take these concepts for granted, and it is hard to say, at this early stage in research and in the development of online cultures, what the consequences will be for their general attitude toward religion.

For most purposes, a key research method in virtual worlds will be participant observation, which immediately raises issues of the researcher's self-presentation. In Second Life, I use two avatars: (1) Bainbridge Thespian who makes things and participates in conferences but does not do ethnography, and (2) Interviewer Wilber who does ethnography and whose public information clearly announces that he is doing research. Clear ethical guidelines for ethnography in virtual worlds do not yet exist, but many researchers believe that the real-world anonymity of users provides significant protection to them in most current virtual worlds. When the aim is to document the culture built into a theme-oriented virtual world like Age of Conan, it is often necessary to create multiple characters, each with its own distinctive characteristics to permit studying one dimension of the world. In this case, I created three characters: the Tempest of Set depicted in Figure 7, a Priest of Mitra, and a Bear Shaman for the barbarian Cimmerian culture. Complete ethnographic documentation of any of these worlds is a major endeavor. For example, my book about World of Warcraft is based on running twenty-two characters a total of 2,300 hours (Bainbridge in press).

To this point in my ethnographic work inside more than a dozen virtual worlds, I have taken about 40,000 "screenshot" pictures. These pictures are automatically saved in a particular location on the computer's hard disk for each virtual world, automatically have the date and time attached, and can easily be annotated, sorted into subfolders, and edited as desired. Indeed, the first step in doing such work is figuring out how to take screenshots in the particular world, which can be as simple as learning which key to press in most worlds or as complicated as running separate software simultaneously as is required for Entropia Universe. Most of the time, the edges of the screen contain much information displayed by the user interface, and most of my screenshots include it; I removed the interface only for special pictures such as the three included here. Whether for data collection or to produce publishable pictures, much effort is often required to take good screenshots, because one often needs to go through a complex series of actions over minutes or even hours to get in the right position, and occasionally orchestrate events so that the desired scene will play out.

Screenshots are the only way in some worlds to record the text chat (conventionally in the lower left corner of the screen) through which users primarily communicate. In the case of Second Life, one may run word processing software simultaneously, and conveniently paste interesting text from the chat directly into a text document. In World of Warcraft, entering "/chatlog" into the text chat automatically saves the session as a text file. Although text chat remains the main medium of communication, all major virtual worlds today include voice communications, although some users prefer separate voice software, notably TeamSpeak, Ventrilo, and Skype. Naturally, any speech that can be heard through headphones can be recorded and transcribed later in the conventional manner. Second Life includes a module to make video, and some participants in the 2008 World of Warcraft conference used separate software to take sound videos of the events.

To conclude this paper, I will illustrate the possibilities for quantitative research in these worlds, using World of Warcraft. Many of the new online systems have the ability to collect data, and the most advanced virtual worlds have both in-built search engines and the option to extend the functionality of the software by writing mod (modification) or add-on software. World of Warcraft allows users to write programs in a popular scripting language, Lua, so long as the programs do not confer an unfair competitive advantage on computationally sophisticated players. An extensive international modding community has grown up, consisting of amateurs who write programs that run in conjunction with World of Warcraft, and who share and improve their code (Kow and Nardi in press). Some of these programs are very useful for researchers.

Researchers interested in a particular online communication system should explore its capabilities, looking for opportunities to collect data in unexpected ways. For example, World of Warcraft incorporates a number of tools to help players find others to team up with. I just logged in as Tarkas, my Orc warrior, and imagined he was about to go on a quest that required a healer who could protect him as he attacks enemies, and perhaps even resurrect him if he is "killed." The best healers and resurrectors are priests. He entered "/who priest" into the text chat system, and immediately a list appeared on the screen of 16 priests who were online at the moment, using the same Internet server as Tarkas and belonging to the same faction within the game. The output listed their names, their experience levels, their races, and their locations within the virtual world, as well as the guild affiliations that are their most significant social group memberships. Game researchers centered at the Palo Alto Research Center wrote an add-on program using this /who feature to take automatic censuses of tens of thousands of characters online repeatedly over a period of months (Ducheneaut et al. 2006, 2007). They were especially interested in the changing status over time of both individual characters and guilds, and social interactions were central to their research.
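To suggest how repeated censuses might be compared once they have been collected, here is a minimal sketch. The record layout (a name mapped to a level and a guild) and all of the values are hypothetical; the actual add-on and data format used by the Palo Alto researchers are not described in this paper.

```python
from collections import Counter


def level_gains(earlier, later):
    """Levels gained by each character seen in both censuses."""
    return {
        name: later[name]["level"] - earlier[name]["level"]
        for name in later
        if name in earlier
    }


def guild_sizes(census):
    """Number of observed characters in each guild, ignoring the unguilded."""
    return Counter(rec["guild"] for rec in census.values() if rec["guild"])


# Hypothetical censuses taken one week apart on the same server.
week_1 = {
    "ExampleA": {"level": 20, "guild": "Guild One"},
    "ExampleB": {"level": 35, "guild": None},
    "ExampleC": {"level": 60, "guild": "Guild One"},
}
week_2 = {
    "ExampleA": {"level": 24, "guild": "Guild One"},
    "ExampleB": {"level": 35, "guild": "Guild Two"},
    "ExampleD": {"level": 5, "guild": "Guild Two"},
}

print(level_gains(week_1, week_2))
print(guild_sizes(week_1), guild_sizes(week_2))
```

Characters who appear in only one census are left out of the gain calculation, which is one simple way to handle people who were offline during a given sweep.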

Much data about virtual worlds can be obtained outside them, for example in the extensive discussion forums in which players report their experiences and share advice. In addition to WoWWiki and Wowhead, a digital library called the Armory displays several pages of information for each of the millions of characters who have reached level 10 (out of 80) in the experience ladder all characters must ascend. For some of my research, I used auxiliary software called CensusPlus to draw samples of thousands of characters, all those that were online during a particular day selected for sampling in the particular realms of World of Warcraft in which I had characters. I then manually looked up subsamples of these characters in the Armory, saving their pages as XML files then writing a computer program to parse those thousands of files and format them for a spreadsheet, from which they were ported into a statistical analysis program. Here I will offer a simpler example.

Two of my characters belonged to one of the largest user guilds in all of World of Warcraft, the Alea Iacta Est guild that was created in conjunction with a popular weekly podcast devoted to this virtual world, The Instance. When I accessed the Armory, it offered extensive data about fully 4,632 AIE members. It would be possible but very difficult to write a crawler program that would automatically download all their data, difficult because of the complexity of decisions about what data to enter where on the page to get back the desired information. However, I discovered that the main page for the guild had some limited information about all the members hidden in the XML source code. For example, here are the lines for my two characters, Catullus the level 80 Blood Elf priest and Annihila the level 70 Undead death knight:

One can quickly infer that classId="5" refers to a priest, and classId="6" to a death knight. Gender 0 is male, and 1 is female. Race 10 is Blood Elf and race 5 is Undead. It was a simple matter to copy the 4,632 lines of code into a word processor and make it search for every quotation mark and replace it with ^t, which inserts a tab. This made a total of 74,112 replacements, or 16 per line, but they took just a few seconds. Saved as a plain text file, this mass of data could be opened directly into a spreadsheet, where a few useless columns could quickly be deleted, making it a dataset ready for statistical analysis. As Figure 8 shows, female characters are much more likely to be priests than male characters are, 14.0 percent versus 7.6 percent, a finding replicated again and again in World of Warcraft datasets.
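The same conversion can be scripted instead of done in a word processor. The sketch below is only illustrative: the sample lines are invented, and apart from classId (quoted above) the attribute names name, genderId, raceId, and level are assumptions about how such a page might label its fields. The numeric codes for priest and for gender follow the text.

```python
import csv
import io
import re

PRIEST_CLASS_ID = "5"                        # class code given in the text
GENDER_NAMES = {"0": "Male", "1": "Female"}  # gender codes given in the text
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')   # matches attribute="value" pairs


def parse_line(line):
    """Turn one line of attribute="value" pairs into a dictionary."""
    return dict(ATTRIBUTE.findall(line))


def to_tsv(rows, fields):
    """Write the chosen fields as tab-separated text for a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    for row in rows:
        writer.writerow([row.get(field, "") for field in fields])
    return buffer.getvalue()


def percent_priests_by_gender(rows):
    """Percentage of characters of each gender whose class is priest."""
    totals, priests = Counter_like(), Counter_like()
    for row in rows:
        gender = GENDER_NAMES.get(row.get("genderId"), "Unknown")
        totals[gender] = totals.get(gender, 0) + 1
        if row.get("classId") == PRIEST_CLASS_ID:
            priests[gender] = priests.get(gender, 0) + 1
    return {g: 100.0 * priests.get(g, 0) / n for g, n in totals.items()}


def Counter_like():
    """Plain dict used as a simple tally; kept explicit for readability."""
    return {}


# Invented sample lines in a hypothetical format, not real guild data.
SAMPLE = """\
<character name="ExampleA" classId="5" genderId="0" raceId="10" level="80"/>
<character name="ExampleB" classId="6" genderId="0" raceId="5" level="70"/>
<character name="ExampleC" classId="5" genderId="1" raceId="10" level="45"/>
<character name="ExampleD" classId="5" genderId="1" raceId="5" level="12"/>
"""

rows = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]
print(to_tsv(rows, ["name", "classId", "genderId", "raceId", "level"]))
print(percent_priests_by_gender(rows))
```

Working from named attributes rather than tab positions also removes the step of deleting useless spreadsheet columns by hand.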

Note that female characters in AIE actually have earned more achievement points and slightly more experience levels on average than male characters, so these virtual women are certainly not half-hearted wimps. Ranks in guilds range from the guildmaster who is rank 1 down to new members who have ranks of 6 or more in AIE, and female characters are slightly more likely than males to be guild officers. If this were the report of a research study, rather than a methodological paper, we would immediately analyze the female nurturant role in the wider culture, the statistically greater interest of females in religion despite their often lower status in church hierarchies, and consider how those real-world factors may be reflected in the greater likelihood of female World of Warcraft characters to be priests. But for present purposes it is enough to point out that at relatively little effort we were able to assemble a dataset suitable for statistical analysis in the light of theories relevant to the social science of religion.

Figure 8: Gender Comparison of 4,632 Members of Alea Iacta Est

                            Male      Female
Percent Priests             7.6%      14.0%
Percent Death Knights       13.7%     10.2%
Percent Warriors            9.4%      2.8%
Mean Achievement Points     533.5     591.4
Mean Experience Level       48.7      49.7
Percent Guild Rank <6       2.6%      4.0%
Cases                       3321      1311
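A reader who wishes to check whether a difference such as the one in the first row of Figure 8 is larger than chance would suggest could use a standard two-proportion z-test. The sketch below assumes that test is appropriate and works from the rounded percentages in the figure, so its result is approximate; it is an illustration, not an analysis reported in this paper.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / standard_error


# Percent priests and case counts for male and female characters (Figure 8).
z = two_proportion_z(0.076, 3321, 0.140, 1311)
print(round(z, 2))
```

By the usual convention, an absolute z value above about 1.96 indicates a difference significant at the .05 level in a two-tailed test.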

Conclusion

Several of the methods described above allow researchers to do conventional social science in a new setting. We also see the potential for transforming areas of social science in profound ways. For example, using recommender systems and search engines to cluster religious phenomena and map them conceptually is a form of twenty-first century cultural anthropology. Internet offers direct access to much of modern culture. While one can perform ethnography in these cultures, as I have done in a dozen virtual worlds, one may also do quantitative studies of the dynamic structure of cultures and subcultures. Empirical studies will have implications for theory. Given the importance of the concept of culture in studies of religion, and the ability to examine how social interaction intertwines with cultural evolution, these research methods could be exceedingly valuable in the future of social science of religion.

References

Babbie, Earl R. 2004. The Practice of Social Research. Belmont, California: Thomson/Wadsworth.

Bainbridge, William Sims. 2000. "Religious Ethnography on the World Wide Web." Pp. 55-80 in Religion on the Internet, edited by Jeffrey K. Hadden and Douglas E. Cowan. New York: Elsevier.

Bainbridge, William Sims. 2002. "Validity of Web-Based Surveys." Pp. 51-66 in Computing in the Social Sciences and Humanities, edited by Orville V. Burton. Urbana: University of Illinois Press.

Bainbridge, William Sims. 2003. "Massive Questionnaires for Personality Capture." Social Science Computer Review 21 (3): 267-280.

Bainbridge, William Sims. 2004a. "After the New Age." Journal for the Scientific Study of Religion 43: 381-394.

Bainbridge, William Sims. 2004b. "The Future of the Internet: Cultural and Individual Conceptions." Pp. 307-324 in Society Online: The Internet in Context, edited by Philip N. Howard and Steve Jones. Thousand Oaks, California: Sage.

Bainbridge, William Sims. 2004c. "Hollerith Card." Pp. 326-328 in Berkshire Encyclopedia of Human-Computer Interaction, edited by William Sims Bainbridge. Great Barrington, Massachusetts: Berkshire.

Bainbridge, William Sims. 2004d. "Religion and Science." Futures 36: 1009-1023.

Bainbridge, William Sims. 2005. "Atheism." Interdisciplinary Journal of Research on Religion, http://www.bepress.com/ijrr/vol1/iss1/art2/.

Bainbridge, William Sims. 2007a. Across the Secular Abyss. Lanham, Maryland: Lexington.

Bainbridge, William Sims. 2007b. "Expanding the Use of the Internet in Religious Research." Review of Religious Research 49(1): 7-20.

Bainbridge, William Sims. 2007c. "The Scientific Research Potential of Virtual Worlds." Science 317 (27 July): 472-476.

Bainbridge, William Sims, and Wilma Alice Bainbridge. 2007b. "Electronic Game Research Methodologies: Studying Religious Implications." Review of Religious Research 49(1): 35-53.

Barnett, Jane, Mark Coulson, and Nigel Foreman. In press. "Examining Player Anger in World of Warcraft." In Online Worlds, edited by William Sims Bainbridge. Artington, Guildford, United Kingdom: Springer.

Basu, Chumki, Haym Hirsh, and William Cohen. 1998. "Recommendation as Classification: Using Social and Content-Based Information in Recommendation." Proceedings of the Fifteenth National Conference on Artificial Intelligence, Madison, Wisconsin.

Belew, Richard K. 2000. Finding Out About: A Cognitive Perspective on Search Engine Technology and the WWW. New York: Cambridge University Press.

Brin, Sergey, and Lawrence Page. 1998. "The Anatomy of a Large-Scale Hypertextual Web Search Engine." Computer Networks 30(1-7): 107-117.

Canny, John. 2002. "Collaborative Filtering with Privacy via Factor Analysis." Pp. 238-245 in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM.

Christie, Richard, and Florence L. Geis. 1970. Studies in Machiavellianism. New York: Academic Press.

Cilibrasi, Rudi L., and Paul M. B. Vitanyi. 2007. "The Google Similarity Distance." IEEE Transactions on Knowledge and Data Engineering 19: 370-383.

Dillman, Don A. 2002. "Presidential Address: Navigating the Rapids of Change: Some Observations on Survey Methodology in the Early Twenty-First Century." The Public Opinion Quarterly 66: 473-494.

Dreizin, F., A. Shenhar, and H. Bar-Itzhak. 1980. "A Formal Grammar of Expressiveness for Sacred Legends." Pp. 159-166 in Proceedings of the 8th Conference on Computational Linguistics. Morristown, New Jersey: Association for Computational Linguistics.

Ducheneaut, Nicolas, Nick Yee, Eric Nickell, and Robert J. Moore. 2006. "Building an MMO with Mass Appeal: A Look at Gameplay in World of Warcraft." Games and Culture 1: 281-317.

Ducheneaut, Nicolas, Nick Yee, Eric Nickell, and Robert J. Moore. 2007. "The Life and Death of Online Gaming Communities: A Look at Guilds in World of Warcraft." Pp. 839-848 in Proceedings of CHI 2007. New York: ACM.

Durkheim, Emile. 1897. Suicide. New York: Free Press [1951].

Glock, Charles Y., and Rodney Stark. 1966. Christian Beliefs and Anti-Semitism. New York: Harper and Row.

Goldberg, David, David Nichols, Brian M. Oki, and Douglas Terry. 1992. "Using Collaborative Filtering to Weave an Information Tapestry." Communications of the ACM 35: 61-70.

Hadden, Jeffrey K., and Douglas E. Cowan (eds.). 2000. Religion on the Internet. New York: Elsevier.

Herlocker, Jonathan L., Joseph A. Konstan, Loren G. Terveen, and John T. Riedl. 2004. "Evaluating Collaborative Filtering Recommender Systems." ACM Transactions on Information Systems 22: 5-53.

Horiuchi, Yusaku, Kosuke Imai, and Naoko Taniguchi. 2007. "Designing and Analyzing Randomized Experiments: Application to a Japanese Election Survey Experiment." American Journal of Political Science 51: 669-687.

Huh, Searle, and Dmitri Williams. In press. "Dude Looks Like a Lady: Gender Swapping in an Online Game." In Online Worlds, edited by William Sims Bainbridge. Artington, Guildford, United Kingdom: Springer.

Klastrup, Lisbeth. 2006. "Death Matters: Understanding Gameworld Experiences." In Proceedings of the International Conference on Advances in Computer Entertainment Technology (ACE) 2006. New York: ACM.

Kow, Yong Ming, and Bonnie Nardi. In press. "Culture and Creativity: World of Warcraft Modding in China and the U.S." In Online Worlds, edited by William Sims Bainbridge. Artington, Guildford, United Kingdom: Springer.

Landauer, Thomas K., Peter W. Foltz, and Darrell Laham. 1998. "An Introduction to Latent Semantic Analysis." Discourse Processes 25: 259-284.

Martin, James. 2004. "Natural Language Processing." Pp. 495-501 in Berkshire Encyclopedia of Human-Computer Interaction, edited by William Sims Bainbridge. Great Barrington, Massachusetts: Berkshire.

Marx, Zvika, Ido Dagan, Joachim Buhmann, and Eli Shamir. 2002. "Coupled Clustering: A Method for Detecting Structural Correspondence." Journal of Machine Learning Research 3: 747-780.

Morselli, Henry. 1882. Suicide: An Essay on Comparative Moral Statistics. New York: Appleton.

Resnick, Paul, and Hal R. Varian. 1997. "Recommender Systems." Communications of the ACM 40: 56-58.

Scheitle, Christopher P. 2005. "The Social and Symbolic Boundaries of Congregations: An Analysis of Website Links." Interdisciplinary Journal of Research on Religion 1: www.religjournal.com.

Stark, Rodney, and William Sims Bainbridge. 1985. The Future of Religion. Berkeley: University of California Press.

Stone, Philip J., Dexter C. Dunphy, Marshall S. Smith, and Daniel M. Ogilvie. 1966. The General Inquirer: A Computer Approach to Content Analysis. Cambridge, Massachusetts: MIT Press.

Veale, Tony. 2003. "The Analogical Thesaurus." Pp. 137-142 in Proceedings of the 15th Innovative Applications of Artificial Intelligence Conference (IAAI 2003), http://afflatus.ucd.ie/Papers/iaai2003.pdf.

Wagner, Adolph Heinrich Gotthilf. 1864. Die Gesetzmässigkeit in den Scheinbar Willkürlichen Menschlichen Handlungen vom Standpunkte der Statistik. Hamburg, Germany: Boyes und Geisler.

Weber, L. M., A. Loumakis, and J. Bergman. 2003. "Who Participates and Why?" Social Science Computer Review 21(1): 26-42.

Witte, J. C., L. M. Amoroso, and P. E. N. Howard. 2000. "Method and Representation in Internet-Based Survey Tools: Mobility, Community, and Cultural Identity in Survey2000." Social Science Computer Review 18(2): 179-195.

Wyche, Susan P., Gillian R. Hayes, Lonnie D. Harvel, and Rebecca E. Grinter. 2006. "Technology in Spiritual Formation: An Exploratory Study of Computer Mediated Religious Communications." Pp. 199-208 in Proceedings of the 2006 20th Anniversary Conference on Computer Supported Cooperative Work. New York: ACM.

Wyche, Susan P., Paul M. Aoki, and Rebecca E. Grinter. 2008. "Re-Placing Faith: Reconsidering the Secular-Religious Use Divide in the United States and Kenya." Pp. 11-20 in Proceedings of CHI 2008. New York: ACM.

Wyche, Susan P., Kelly E. Caine, Benjamin K. Davison, Shwetak N. Patel, Michael Arteaga, and Rebecca E. Grinter. 2009a. "Sacred Imagery in Techno-Spiritual Design." Pp. 55-58 in Proceedings of the 27th International Conference on Human Factors in Computing Systems. New York: ACM.

Wyche, Susan P., and Rebecca E. Grinter. 2009b. "Extraordinary Computing: Religion as a Lens for Reconsidering the Home." Pp. 749-758 in Proceedings of the 27th International Conference on Human Factors in Computing Systems. New York: ACM.

i http://www.imdb.com/keyword/based-on-the-bible/
ii http://en.wikipedia.org/wiki/The_Last_Temptation_of_Christ_(film)
iii www.movielens.org
iv http://www.google.com/analytics/
v http://www.wsu.edu:8080/~brians/serious/religion.html
vi http://www.google.com/trends
vii http://www.scribd.com/doc/13367046/German-Report-on-Scientology-2005
viii http://www.spiegel.de/international/germany/0,1518,522052,00.html
ix http://www.archive.org/web/web.php
x http://www.lasvegasnow.com/global/Story.asp?s=10478797
xi http://www.sonsofthestorm.com/memorial_twincruiser.html
xii Robert E. Howard, "The Phoenix on the Sword," http://gutenberg.net.au/ebooks06/0600811.txt
xiii http://www.wowwiki.com/WoWWiki:About
xiv http://www.wowhead.com/?quests
