Workshop on Questionnaires for Linguistic Description and Typology

Workshop on questionnaires for linguistic description and typology/ Workshop sur les outils d'élicitation pour la linguistique descriptive et la typologie November 9th and 10th, 2017/9 et 10 novembre 2017 Room 628 (6th floor), Olympe de Gouges building University of Paris Diderot, Paris

Program/Programme

Thursday, November 9th /Jeudi 9 novembre

9:30-10 Welcome & introduction Aimée Lahaussois Aimée Lahaussois HTL/CNRS 10-11 Keynote talk Birgit Hellwig Linguistic diversity, empirical data University of Cologne and theory 11-11:30 break/pause 11:30-12 Adapting experimental visual Natalia Cacéres stimuli protocols for wider use University of Oregon 12-12:30 An online questionnaire for Jerzy Gaszewski collecting valency data University of Wroclaw 12:30-14 lunch/déjeuner 14-14:30 Video elicitation of negative Olga Lovick directives in Alaskan Dene First Nations University of Canada languages: reflections on & methodology Siri G. Tuttle Alaska Native Language Center 14:30-15 Storyboards vs. Picture-aided Golsa Nouri-Hosseini Translation: A case study xon the University of Gothenburg typology of comparison & Elizabeth Coppock University of Gothenburg 15-15:30 Automatic construction of a lexical Daria Ryzhova typological questionnaire NRU Higher School of Economics, Moscow & Denis Paperno Loria/CNRS 15:30-16 break/pause 16-16:30 SES and word comprehension: a Scaff Camila touchscreen study LSCP/ENS, EHESS, CNRS, PSL, Univ. Paris Diderot Friday, November 10th/Vendredi 10 novembre

10-10:30 A mixed type of questionnaire for Ksenia Shagal describing participial systems: University of Helsinki designing, testing, polishing 10:30-11 Approaches to questionnaires from Jozina Vander Klok a cross-dialectal perspective: University of Oslo Towards ‘best practices’ & Thomas J. Conners University of Maryland 11-11:30 break/pause 11:30-12 A visual stimulus for eliciting Path of Marine Vuillermet motion: design, use and diffusion DDL/CNRS & Anetta Kopecka DDL/CNRS – Université Lyon 2 12-13 Keynote talk Lila San Roque The Social Cognition Parallax Radboud University & MPI for Interview Corpus Psycholinguistics 13-14 lunch/déjeuner 14-15:30 Discussion

Abstracts

Linguistic diversity, empirical data and theory Birgit Hellwig, University of Cologne

Linguistic fieldwork and questionnaires/stimuli have always gone hand in hand, with already our earliest fieldmethods guides providing sometimes very detailed checklists of questions to be asked in the field. Since then, there have been increasingly sophisticated methodological discussions about the pros and cons of various types of questionnaires/stimuli, their design and their implementation, drawing on the vast experience of fieldworkers around the world. By way of introduction, I first present an overview of the types of questionnaires/stimuli in use. Secondly, I intend to take a step back and reflect on their purposes: why is it that we use them? why do we go through the considerable efforts of designing good questionnaires/stimuli? The answer to this question brings us to linguistic and psycholinguistic theory. Our theories tend to be based on empirical data from a biased sample of well-described languages - not doing justice to the enormous linguistic diversity in the world. As Evans & Levinson (2009: 447) put it, a major challenge of our discipline is to harness this linguistic diversity and "to show how the child's mind can learn and the adult's mind can use, with approximately equal ease, any one of this vast range of alternative systems." It is argued that this challenge should be - and is being - taken up within language documentation and description, and this talk explores the role of questionnaires/stimuli in this enterprise. Adapting experimental visual stimuli protocols for wider use Natalia Cáceres, University of Oregon

Stimuli based studies are presented as being easy to implement with a brief period of familiarization given that, in principle, no specialized knowledge is sought (Majid 2012:56, Hellwig 2006:331). In this paper, I discuss the use of a particular subtype of video stimuli for which fieldworkers have reported that it is precisely the medium and the unfamiliar contents of video stimuli that represent an impediment for their successful use in geographically remote communities. Two categories of visual stimuli-based protocols can be identified: those that are to be used as a discovery tool for the grammatical strategies a language uses and those that are supposed to elicit an effect of a stimulus on language production (as opposed to a simple description). An example of the latter is the Fish Film protocol which tests the hypothesis that main clause grammar in many languages is sensitive to ATTENTION DETECTION, a cognitive process established via 30+ years of cognitive psychology experiments (reviewed in Wright & Ward 2008). This protocol was first tested in English. With almost 100% accuracy, English speakers coded a cued element in the video stimuli as grammatical subject (Tomlin 1995, 1997 and others) with an active sentence when an agent is cued (The red fish ate the blue fish) and a passive sentence when it is the patient that is cued (The red fish was eaten by the blue fish). Similar significant results were reported for 15 other languages with very different grammatical structures in main clauses (e.g. Japanese [Hayashi et al. 2002], Malagasy [Rasolofo 2006], and Burmese [Soe 1999]). These results however have not been replicated with populations of languages traditionally spoken in remote locations, which are not written or taught in school through books. Fieldworkers informally report that this is in part because the stimulus is not culturally appropriate in that there are too many novel elements in it (animated drawings, watching events on a screen) and also because the cue is not adapted to efficiently orienting the speakers’ attention (the cue in the shape of a black arrow tends to be identified as an additional participant in the event). When we find that this kind of protocol does not work out of the context for which it was originally designed, does it mean that we are faced with people whose cognition is different or that we do not know how to get at their cognition using the tools we use with students from western universities in industrialized towns? For the first category of video-stimuli, reports of failed experiences have led to efforts in rendering the contents as culturally neutral as possible to allow data collection in the largest array of languages independently of cultural differences (e.g. the Trajectoire video, Ishibashi et al. 2006). This indeed facilitates identification of the depicted scenes. However, having designed and conducted two different pilot protocols similar to the Fish Film, I have observed that providing cultural familiarity is not sufficient in helping speakers unfamiliar with producing language for non-communicative means to produce a response on demand that could be equated to spontaneous speech influenced by the stimuli. I argue that in order to conduct protocols of the second category, we must modify them so as to neutralize variables that depend on the prior experience of experimental subjects (in absentia communication, being able to accurately identify an event in an animated video, produce a timed response to a stimulus, not to construct sequential narratives or behavioral explanations). Such an effort would help field linguists in obtaining comparable results in the field to those that can be obtained in a lab, ultimately providing the field of linguistics with a more representative sample of languages to feed into theories of human behavior.

An online questionnaire for collecting valency data Jerzy Gaszewski, University of Wroclaw, Poland

The paper presents a questionnaire used for collecting data on valency in a set of languages (in my study the languages of Central Europe). The output of the informants are sentences instantiating valency patterns of individual verbs in the analysed languages. I make use of two modes of data collection that cancel out each other’s methodological weaknesses. Version A provides the informant with an example sentence in the contact language (I have English in this function). The informant translates the sentence into their native language, and matches the parts of the original and the translation as in the example (filled with Hungarian data): original sentence: I thanked those people for their help. translated sentence: Megkööszööntem azöknak az embereknek a segíítseíguöket. element corresponding to ‘thanked’: megköszöntem basic form of this element: köszön element corresponding to ‘those people’: azoknak az embereknek H_dative basic form of this element: ember element corresponding to ‘for their help’: a segíítseíguöket H_accusative basic form of this element: segítség notes: ‘meg-‘ as a prefix also denotes ‘perfect’ tense (black – stimuli for the informant, red – provided data, green – grammatical tagging) The sentences are simple and relate to real-life situations so that informants can imagine the translations being used. A single verbal meaning (‘thank’ above) is represented by at least 2 sentences differing in the lexical content of the argument phrases and tense. The divisions of the sentence are then tagged for grammatical markers (adpositions and cases). Version B provides the informant with a verb in their language (drawing on the same set of verbal meanings). The informant creates example sentences with the verb on their own and translates them into English. These sentences then undergo division according to valency structure and tagging for grammatical markers. The questionnaire is dynamic in several ways. When relevant, informants are able (and encouraged to) provide multiple translations for a single stimulus sentence or verb, which allows variation in valency marking to surface. Furthermore, the exact content of the stimuli can vary. This is in fact necessary in version B, which targets each language separately. Lastly, what I present is a ready questionnaire for a particular project, but the questionnaire’s software allows for easy creation of new versions tailored to other studies of valency. These features are possible because the questionnaire operates as an online application accessed by the informants, which accounts for its limitations too. For example, access to the Internet and literacy in native language are rather trivial requirements in the investigated area, this is not so in all language communities. On the theoretical level, the paper discusses the details of selection of data for the analysis. Valency offers a fine illustration of this general problem. Since any language has hundreds or thousands of valency carriers in its lexicon, making principled choices is absolutely essential. In picking the verbs I combined the objective factor of corpus frequency of verbs and the presence of oblique valency markers (the focus of the study). These criteria were applied independently to a subset of the languages involved in the study and the resulting lists of verbal meanings (which overlapped to a considerable extent) were combined.

Video elicitation of negative directives in Alaskan Dene languages: reflections on methodology Olga Lovick, First Nations University of Canada Siri G. Tuttle, Alaska Native Language Center

In the investigation of positive and negative directives in the Alaskan Dene languages Koyukon, Lower Tanana, and Upper Tanana (Tuttle & Lovick 2014, Lovick & Tuttle 2015), one of the most striking results is that, while positive directives are relatively common, negative ones re extremely rare, yet exhibit a great variety of forms. The form of negative directives seems to depend on several factors, particularly on whether the prohibited act violates social norms. In order to determine whether this is actually the case, and to better understand the variety of forms, more data was required. Initially, we attempted elicitation of phrases such as “don’t chop wood” (which would not violate social norms) and “don’t grab a man around the wrist” (which is considered taboo), but it appeared that speakers defaulted to a simple form which further discussion revealed to be rather impolite. Instead, we created video clips of university students performing activities that are considered taboo and activities that are merely foolish or (mildly) dangerous. We showed these clips to several elders per language and asked them to pretend that the student was their grandchild: How would they advise them? We kept this instruction simple and vague so that elders were able to respond in a manner that they deem appropriate to the situation. In this paper, we want to critically evaluate our approach. In favor of using this methodology is that we were able to collect a variety of both direct and indirect negative directives. Our consultants very much enjoyed this work, responding freely and offering much commentary. Additionally, unlike in textual analysis, we were able to recognize indirect negative directives for what they are. We also know in each instance whether the prohibited action is considered taboo or not. The collected data is thus very rich. There were however also several problems with our methodology. The students in the videos were all in their early 20s and our consultants were of the opinion that people of that age should know better than to behave so inappropriately. (Recruitment of children to act in these videos was not possible for cultural reasons.) But most importantly, just the fact that we tried to elicit negative directives results in “unnatural” data -- to tell someone not to do something is itself an attack of an individual’s positive face (Brown & Levinson 1987; see also Lovick 2016 for more discussion of this issue) and speakers avoid it, preferring non-linguistic cues. Interestingly, the fact that the students were non-native worked in our favor, since white people breaking a taboo is less upsetting than native people doing so. In spite of these caveats, we found this methodology helpful, not to mention fun and engaging. While recording of natural everyday interactions certainly would be preferable, using video stimuli for the elicitation of negative commands yields very rich data in field situations like ours, where the native language is no longer used on a daily basis. Tailoring our videos to the cultural groups we work with lets us ensure that we capture relevant cultural distinctions. Yet even though the “taboo” activities shown in the videos are specific to Northern Dene groups, we believe that the videos could easily be adapted to other cultural groups.

Storyboards vs. Picture-aided Translation: A case study on the typology of comparison Golsa Nouri-Hosseini, University of Gothenburg Elizabeth Coppock, University of Gothenburg

This work (a) presents a novel questionnaire for eliciting comparatives and superlatives of quality and quantity, (b) suggests guidelines for creating visual elicitation stimuli, and (c) reports on a study comparing two visual elicitation methods, storyboards and picture-aided translation, showing that picture-aided translation might work better than storyboards for some purposes. Storyboards are a series of pictures which tell a story, and the participants are invited to tell the story in their native language, based on the pictures. In picture-aided translation, each picture is accompanied by a written sentence, and participants are asked to give translations based on both the picture and the text. Storyboards are advocated by Matthewson (2015), in contrast to direct elicitation (when the context is verbally provided to the participants), since they elicit more natural, spontaneous utterances, minimize the influence of the contact language, and obviate the need for verbal context description, which minimizes the risk of misunderstanding of the context. However, storyboards pose heavy cognitive burdens on the participants’ memory and this can result in discomfort for the participants and failure to elicit the target constructions. Therefore, we conducted a systematic comparison of storyboards and picture-aided translation, to see whether the presence of text makes data elicitation better or worse. In our study, we included two different stories: the ‘What Matters’ story, which we developed, and the ‘Bake-off’ story from Totem Field Storyboards. The ‘What Matters’ story was developed through pilot studies on Swahili, Kagulu, Mixtec, Swedish, Persian, and Arabic. Problematic stimuli were modified after each pilot test. The image shows a sample of the changes.

To compare picture-aided translation with storyboards, we conducted a study on eight Persian speakers; each consultant participated in four tasks, and each data elicitation session took about one hour. We then scored the results along several dimensions, including ‘faithfulness’, which is a measure of success in eliciting the target construction; a sentence was scored as 1 when the target construction was elicited and 0 otherwise. Our results show that picture-aided translation increases faithfulness: on average (per participant), the percentage of sentences faithfully translated increased 20% using picture-aided translation for the ‘What Matters’ story, and 10% for the ‘Bake-off’ story. We received more faithful translations for the ‘Bake-off’ story than the ‘What Matters’ story, possibly due to length of story and sentences, suggesting that storyboards should be kept short and simple. Our results also suggest that participants may perform better if they first see the pictures, and then the text. We did not directly measure naturalness in our study, but there is some evidence that Persian speakers were able to resist the influence of the English text. Feedback received after each data elicitation session indicated that participants generally felt more comfortable when text was present. In addition, participants reported that both picture-aided translation and storyboard tasks felt equally fun. We conclude that picture-aided translation may be suitable for some purposes in semantic fieldwork, such as: a) when the researcher is interested in eliciting one particular kind of structure, and (b) when the target structure (e.g. superlatives) could be replaced in a particular context by other, 'competing' constructions, like a comparative or an intensifier (e.g. 'the tree was very tall'). Furthermore, Picture-aided translations seem better equipped to keep participants focused in their use of a target structure. Finally, we present some practical implications and propose tips for the fieldworkers who intend to use translation elicitation materials in their fieldwork.

Automatic construction of a lexical typological questionnaire Daria Ryzhova, NRU Higher School of Economics, Moscow Denis Paperno, Loria (CNRS)

We propose creating questionnaires for lexical typology (in the spirit of Rakhilina and Reznikova 2016) in an automatized fashion. We evaluate our system on questionnaire creation for 'smooth', 'sharp', 'thick', and 'straight' (object features often but not always expressed by adjectives), and perform a quantitative and qualitative analysis of the results. For lexical typology, the role of questionnaires is even greater than for grammatical typology: there is usually little or no data on lexical distribution in reference grammars, the information given in dictionaries is limited, and existing text corpora are often too small for lexicon analysis. Usually, a researcher starts from a thorough corpus study of languages with a long written tradition, extracts patterns of usage of the target items, and takes them as a basis of a questionnaire for the analysis of low resourced languages. A natural development of this methodology is research on parallel corpora. In this case, the list of all occurrences of the target items can serve as a questionnaire already filled with the data from various languages (Dahl 2007, W.lchli, Cysouw 2012). Despite the advantages of this method, the size of existing parallel corpora prevents them from being a basis for typological research of most lexical items, not to say about lexical interference effects inevitably arising in translated text. We suggest another way to simplify typological work by an automatic algorithm aimed at detecting typologically interesting groups (clusters) of word usages. Our algorithm relies on data from the monolingual, well-balanced Russian National Corpus (www.ruscorpora.ru) and results in a list of context clusters to be translated into other languages (e.g. straight pole/avenue/path vs. direct descendant/predecessor/heritage, see Table 1). We experiment with adjectives expressing qualitative features, taking into account the minimal diagnostic context, namely nouns that can be modified with the target adjective (Rakhilina, Reznikova 2016). Limiting the number of items (context noun + adjective) in a cluster to 3 (last step of the algorithm), we form a reasonably sized questionnaire where, ideally, usages within a cluster should exhibit typologically uniform behavior while usages in different clusters can be lexicalized differently in different languages. Our algorithm consists of the following steps: 1) extracting a list of frequent phrases, or bigrams, of the form “adjective + noun”; 2) computing a co-occurrence-based vector representation for every noun phrase; 3) clustering the vector space; 4) eliminating all clusters containing less than three elements, and extracting three core elements from the bigger ones. This algorithm allows revealing semantic oppositions that indeed are typologically relevant. For example, many languages distinguish lexically ‘sharp edges (e.g. knives)’ and ‘sharp points (e.g. arrows)’, having two distinct adjectives with the meaning ‘sharp’: one for the first sense, another for the second one (compare tranchant vs. pointu in French). There is no such distinction in Russian; still, Russsian noun phrases illustrating these context types fall into two different clusters (ostryj nož ‘sharp knife’, ostryj nožik ‘sharp little knife’, ostroje lezvije ‘sharp blade’ vs. ostraja strela ‘sharp arrow’, ostroje kop’ë ‘sharp spear’, ostryj kamen’ ‘sharp stone’). A more strict quantitative evaluation of the resulting questionnaires confirms the viability of our method. We tested our methodology on four datasets that have already received a typological description: adjectives from the semantic domains ‘sharp’, ‘straight’, ‘smooth’ and ‘thick’ (see Rakhilina, Reznikova forthcoming). We compared the automatically constructed questionnaires to manual typological annotations of the same contexts and computed recall (the proportion of typologically relevant context types presented in the final version of the questionnaire) and precision (in our case, the homogeneity of the resulting groups of examples) for every dataset. The recall metric fluctuates between 0.73 and 1, while precision achieves the values from 0.68 to 0.88, for different datasets.

Table 1. Fragment of an automatically constructed questionnaire for the field ‘straight’

Table 2. Fragment of an automatically constructed questionnaire for the field ‘sharp’

SES and word comprehension: a touchscreen study Scaff, Camila, LSCP (ENS/EHESS/CNRS/PSL/Paris Diderot) & Florencia Alam, Alejandrina Cristia, Celia Renata Rosemberg

There are different ways in research to assess the language skills of a child during her early development. There are two main lines when it comes to study word comprehension. On the one hand, vocabulary assessment questionnaires administered to the child’s parents or primary caregiver, the most popular being the MacArthur-Bates Communicative Developmental Inventory, or CDI (Fenson et al., 1994). This tool is a standardized measure, available in multiple languages, simple to administer, and includes all lexical categories. However, there are several drawbacks, involving artefacts of the measure (e.g., the fact that the lists are not comprehensive but merely representative, and designed to measure individual variation through a sampling of the vocabulary), and that parents may be inaccurate in their reports. Therefore, it is preferable to measure word comprehension more directly, thus avoiding a potential link through caregivers’ intuitions. On the other hand, other strands of research have attempted to develop more direct measures of word knowledge from the child him/herself. A range of behavioral responses have been collected from the children, with some studies using gaze shifting between two visual referents and others using some overt decision sign, such as pointing or touching the referent. Among the main drawbacks of these studies involves their small sample sizes and their limited access to more varied populations. A recent study compared all three techniques (looking, pointing, touching) with the same materials, and concluded that touch-screen vocabulary tests may be best technique (Frank et al., 2016). Popularity for touchscreen devices and their quotidianity have created interest in the potential use of these devices as scientific tools. Specially when it comes to reach and assess performances of usually neglected communities such as children from low socioeconomic households. In order to focus on the impact of socioeconomic differences within cultures we developed a study built for a touchscreen device assessing word comprehension from two different countries, namely Argentina and France. A force paired choice task, inspired by the Computerized Comprehension Task (Friend& Keplinger, 2003), containing 41 pairs of words with differents levels of difficulty and containing three different lexical categories (nouns, adjectives and verbs) illustrated by pictures was administered at the daycares of the two countries.The lexical items were chosen controlling for familiarity in Argentina to avoid bias between both SES groups. The test was then translated and adapted in French. The tablet records two measures for word comprehension: accuracy and reaction time. We assess French 2- to 3-year-olds (NΩ117) in three daycares in the south of Paris, a very cosmopolite area. Parents of participating children completed a questionnaire with sociodemographic information about the child and her previous exposition to touchscreen devices. An analysis of variance showed no significant differences on children's performances in neither of the two measures as a function of habituation to a touchscreen device. Results showed significant differences between the two groups regarding accuracy but not on word processing measures. In Argentina and France, socioeconomic status came as a significant variable only for accuracy. With performances of children from lower socioeconomic status having lower scores than their counterparts. Age was a significant variable explaining variance for both measures for both countries. Ultimately and in accordance to recent work (Frank et al., 2016), we acknowledge the benefits of using a touchscreen device for assessing word comprehension in young children. This technology allows access to remote populations, test more in more ecological conditions and recruitment of more participants given the handiness of the portable device.

A mixed type of questionnaire for describing participial systems: designing, testing, polishing Ksenia Shagal, University of Helsinki

This paper presents a questionnaire for describing individual participles or the entire participial paradigm in a given language. From a cross-linguistic perspective, participles can be defined as non-finite verb forms employed for adnominal modification, e.g. the form written in the book [written by my supervisor]. The list of linguistic parameters that I am taking into account in the questionnaire is based on my doctoral dissertation on the typology of participles (Shagal 2017) and involves both morphological and syntactic properties of participial forms. Typologically oriented questionnaires for describing particular language phenomena typically belong to one of the two major types. Questionnaires of the first type are intended for linguists working on individual languages, and they consist of questions about the grammar of the language, answering which presupposes certain linguistic analysis (cf. Malchukov & Haspelmath & Comrie 2010, Miestamo & Tamm & Wagner- Nagy 2015). Questionnaires of the second type are based on stimuli sentences that can in principle be translated by any native speaker (cf. Dahl 1985). In my project, the idea is to combine these two types and create a theoretically oriented questionnaire for a field linguist, which, on the other hand, will be supplemented by a set of actual sentences to be used during field work. In the case of participles, this is especially important, since the linguist often has to deal with complicated structures that are hard to elicit. In addition, the data collected using this type of questionnaire is more verifiable and comparable. The recommended stimuli sentences come from my own extensive fieldwork on participles and relative clauses in individual languages (Kalmyk, Nanai, Uilta, and Nivkh) since 2006. The questionnaire is going to be tested in August 2017 on a field trip to the Mari El district, Russia. The target language of the trip is Hill Mari, a Uralic language with a rich and well-developed participial system, where participial forms can perform a variety of functions. During the field trip, I will be using the first version of the questionnaire to describe the participial paradigm of the language with which I have no previous fieldwork experience. Together with undergraduate students participating in the trip, I will try to develop a questionnaire that would be as user-friendly as possible, which will ensure that it can later be used by researchers and enthusiasts with different linguistic background. In my talk, I am going to report on the process of designing, testing and polishing the questionnaire, and discuss the potential problems related to balancing the two approaches, as well as the benefits of the mixed approach.

Approaches to questionnaires from a cross-dialectal perspective: Towards ‘best practices’ Jozina Vander Klok, University of Oslo Thomas J. Conners, CASL, University of Maryland

In this paper, we evaluate the conditions and challenges of implementing questionnaires crossdialectally based on two case studies in Javanese (Austronesian). From these experiences, we put forward initial best practices for implementing questionnaires cross-dialectally. Questionnaires are defined in this paper as a list of questions that a number of participants provide an answer to; the exact implementation is left open (i.e., rating task; fill-in-the-blank; (semi-)forced choice). Spoken primarily on Java, Indonesia by over 90 million people, Javanese highly varies across dialects across all areas of the grammar (e.g. Suwadji 1981; Hatley 1984; Hoogervorst 2010) but remains underdocumented and understudied. The first case study concerns a questionnaire on modality (Vander Klok 2013, 2014), which was conducted on two East Javanese varieties; one spoken in the village of Paciran and the other spoken in the city of Malang. This was implemented as a rating task and a semi-forced choice task in Paciran, and as elicitation in Malang. Since the questionnaire investigated the semantics of modality, we included contexts for felicity judgments, with 41 items (33 target and 8 fillers). The second case study concerns a questionnaire on yes-no question strategies (Vander Klok, Ahsanah & Sayekti 2017), conducted on one Central Javanese variety as spoken in the city of Semarang, and three villages on the north shore in East Java: Montong, Weru, and Blimbing. This questionnaire was implemented as a rating task across all four varieties with 70 or more items (no contexts; no fillers). Questionnaires were identified as an advantageous field method tool for Javanese since it is fairly easy to run numerous participants and therefore have a larger sample size. One of the main issues we found using questionnaires cross-dialectally was in maximizing direct comparison while still allowing for (lexical/phonological/morphosyntactic) variability across dialects. For instance, the yes-no question questionnaire had 88 items for Semarang (but 70 for the other dialects) since only Semarang uses the particle ndak as a strategy. Another example is that some items were rated low for independent reasons in some dialects, such as a proper name sounding strange. Another major issue concerned the prevalence of Standard Javanese in written form; Javanese is not codified, but the variety spoken in the courtly centers of Yogyakarta and Surakarta/Solo (‘Standard Javanese’) is what is taught in schools. Our goal was to study specific varieties, and therefore presented the target items in the Javanese variety under discussion with representative phonological differences, but some items were rated low by some participants due to ‘divergent’ spellings. We also were faced with how to identify participants of a dialect or variety, especially for the city setting. We took a very broad approach, and targeted speakers who grew up and still lived in the same place (regardless of their parents background, but this information was still recorded as metadata). Based on the above challenges, we propose initial best practices for implementing questionnaires cross-dialectally. While most of these points are pertinent for implementing questionnaires in general as well, we include these issues as they were particularly important based on our case studies. - Emphasis should be on the developmental stages of the questionnaire, specifically in developing the items as well as a mandatory pilot test as an elicitation task. Approaches to questionnaires from a cross-dialectal perspective: Towards ‘best practices’ - Ideally the same person should conduct the questionnaire across different varieties for consistency, but also have a contact person of each variety. If this is not possible, open and direct contact between those conducting the questionnaire is imperative for timely feedback. - Instructions should emphasize the interest of the variety under discussion. - A window for feedback from the participants should be available in the questionnaire, such as final comments or space to offer alternative target items. - In running the questionnaire, detailed meta-data should be collected from all participants including place/neighbourhood raised and language use information.

A visual stimulus for eliciting Path of motion: design, use and diffusion Marine Vuillermet, DDL (CNRS) & Anetta Kopecka, DDL (CNRS/Université Lyon 2)

Our presentation aims at contributing to a guide of best practices on the design and practical use of visual stimuli in geographically different linguistic field sites. For this purpose, we present a case study based on our experience in creating a visual stimulus for eliciting the expression of Path of motion in typologically and genetically varied languages. Our visual stimulus, DVD Trajectoire (Ishibashi, Kopecka & Vuillermet 2006), has been designed within the framework of a research program named “Trajectoire” supported by the Fedération de Typologie et Universaux Linguistiques (CNRS, France). The main goal was to develop a methodological tool for eliciting cross-linguistically comparable data in order to 1) explore the morphosyntactic tools used by the speakers of typologically diverse languages to express Path of motion, and 2) investigate the asymmetry in the expression of Source (initial part) and Goal (final part) of motion (see e.g. Ikegami 1987; Bourdin 1997; Kopecka & Ishibashi 2011; Kopecka & Narasimhan 2012). Inspired by research methods developed at the Max Planck Institute for Psycholinguistics (e.g. Cut & Break (Bohnemeyer, Bowerman & Brown 2001), Put & Take (Bowerman et al. 2001), Reciprocal Constructions (Evans et al. 2001)), the DVD Trajectoire stimulus comprises 76 videos clips including 2 training clips, 55 target clips and 19 fillers, and it includes 3 distinct versions so as to minimize possible routine effects. The 55 target clips were designed varying several parameters, namely the Figure (e.g. individual vs. group of people), the Ground (e.g. objects vs. locations), the Path (initial, median, and final), the Deixis (towards vs. away from the deictic center), and, less systematically, the Manner-of-motion (e.g. walking vs. running). As the stimulus was designed to be used in geographically and culturally very diverse areas, we made sure that it was “ecological”. Thus, the scenes were filmed in outdoor natural environment (e.g. lake, sea, forest, meadow, cave, etc.), and this seems to have guaranteed its good accessibility to “non-WEIRD” populations (Henrich, Heine & Norenzayan 2010; Majid & Levinson 2010). So far, the stimulus has been used in a great diversity of languages, both understudied and well-known languages. However, the diffusion of the stimulus has not been well controlled up to now. Despite the fact that we have elaborated a questionnaire to have feedback from researchers regarding the use of the Trajectoire material and to share experiences, keeping track of the users and field sites has not been an easy task. We are thus currently considering TULQUEST online as the most adequate venue for diffusing the visual material and the various documents accompanying it. The Social Cognition Parallax Interview Corpus Lila San Roque, Radboud University & MPI for Psycholinguistics Danielle Barth, Australian National University (ANU) Nicholas Evans, Australian National University (ANU)

Tasks that involve constructing or re-telling narratives have a venerable and long- standing presence within the typology of non-linguistic stimuli for language research. Within the last few decades, stimuli-based elicitation that involve dialogue and negotiation have also demonstrated the invaluable role they can play in understanding a language, for example, in regard to the complex semantics of ‘situated’ domains such as spatial reference and deixis. What about tasks that combine these elements? The Social Cognition Parallax Interview Corpus (SCOPIC) is built from a task involving a set of pictures to be collaboratively arranged as a story by two or more speakers. The project aims to develop data that are structurally and narratalogically complex, rich in affect, and support interaction, participant involvement, and spontaneous expression, while maintaining some degree of comparability across diverse languages. We describe the background to SCOPIC and the main aims of the project, including some practical and theoretical successes and pitfalls that we have encountered in our collaborative involvement with more than 30 languages. We focus on reported speech (and related phenomena) as an example of a domain for comparison in our data. The precise structures that can participate in reported speech and thought may be very different across languages, requiring a coding scheme that can capture diverse formal expression without losing sight of functionally coherent categories (e.g., direct versus indirect speech reports). In regard to the nature of the task itself, we found that while specific elements of the stimuli (the use of speech and thought bubbles in pictures) targeted the domain of reported speech and thought, these were not the only predictors for when speech reports occurred in the data. Overall, we see a strong tendency to represent the speech and thought or emotion of characters in the central story, so that our preliminary results suggest the possible prominence of this narrative and affective strategy (itself rich in socially cognitive implications), but in different forms, across diverse languages and cultures. Collaboratively informed development of parallax, stimuli-based corpus data can support qualitative and quantitative insights into lexical and grammatical structuration and practices of language use, from both language- specific and cross-linguistic perspectives.