Neuroinformatics.Pdf

Total Page:16

File Type:pdf, Size:1020Kb

Neuroinformatics.Pdf 1 23 Your article is protected by copyright and all rights are held exclusively by Springer Science +Business Media New York. This e-offprint is for personal use only and shall not be self- archived in electronic repositories. If you wish to self-archive your article, please use the accepted manuscript version for posting on your own website. You may further deposit the accepted manuscript version in any repository, provided it is only made publicly available 12 months after official publication or later and provided acknowledgement is given to the original source of publication and a link is inserted to the published article on Springer's website. The link must be accompanied by the following text: "The final publication is available at link.springer.com”. 1 23 Author's personal copy Neuroinform DOI 10.1007/s12021-013-9210-5 ORIGINAL ARTICLE Action and Language Mechanisms in the Brain: Data, Models and Neuroinformatics Michael A. Arbib & James J. Bonaiuto & Ina Bornkessel-Schlesewsky & David Kemmerer & Brian MacWhinney & Finn Årup Nielsen & Erhan Oztop # Springer Science+Business Media New York 2013 Abstract We assess the challenges of studying action and Databasing empirical data . Federation of databases . language mechanisms in the brain, both singly and in relation Collaboratory workspaces to each other to provide a novel perspective on neuroinformatics, integrating the development of databases for encoding – separately or together – neurocomputational models and Bridging the Gap Between Models and Experiments empirical data that serve systems and cognitive neuroscience. The present article offers a perspective on the neuroinformatics Keywords Linking models and experiments . Models, challenges of linking neuroscience data with models of the neurocomputational . Action and the brain . Language and the neural and other interactions that yield those data. We will brain . Mirrorsystems . Multi-level data . Multi-level models . focus on cognitive and systems neuroscience. The perspective is particularly informed by consideration of the brain M. A. Arbib (*) mechanisms underlying action and language and emerged Computer Science and Neuroscience Graduate Program, University from discussions held as part of the Workshop on “Action, of Southern California, Los Angeles, CA, USA Language and Neuroinformatics” held in July of 2011 (see the e-mail: [email protected] section “Acknowledgement, Discussion Groups and Workshop J. J. Bonaiuto Participants” for further details). The Appendix provides a Division of Biology, California Institute of Technology, Pasadena, Tabulation of Present and Future Resources relevant to the CA, USA issues discussed in this article. e-mail: [email protected] I. Bornkessel-Schlesewsky What is a Model? Neurolinguistics, University of Marburg, Marburg, Germany e-mail: [email protected] For one researcher, a model may be a technique for marshaling D. Kemmerer diverse data into a coherent format, whereas for others a model Speech, Language, & Hearing Sciences and Psychological Sciences, (conceptual or computational) provides an account of how Purdue University, West Lafayette, IN, USA interactions of entities within the brain mediate between inputs, e-mail: [email protected] internal states and outputs. We refer to the former as “data B. MacWhinney models” and the latter as “processing models.” Clearly, the Psychology, Computational Linguistics, and Modern Languages, result of data modeling is crucial to any specification of what it Carnegie Mellon University, Pittsburgh, PA, USA is that a processing model has to explain. If the word “model” e-mail: [email protected] is used in what follows without a qualifying adjective or F. Å. Nielsen context, it will be in the sense of a “processing model”. Technical University of Denmark, Copenhagen, Denmark There is a third type of model – not a model of data, not a e-mail: [email protected] model of the brain, but a model in the brain. This idea goes back, at least, to Craik (1943). It relates both to the general E. Oztop Ozyegin University, Istanbul, Turkey notion of perceptual schemas and motor schemas (Arbib e-mail: [email protected] 1981), the control theory concepts of feedback and Author's personal copy Neuroinform feedforward, and the notion of forward and inverse models of this general framework. BODB requires that SEDs associated a system (Wolpert and Kawato 1998). Thus, in modeling with a model be divided (at least) into those which are used to neural mechanisms for action and action recognition, design the model, and those used to test the model. For processing models of how the brain employs forward and computational models, the latter SEDs are compared with inverse models may play a crucial role (Oztop et al. 2013; summaries of simulation results (SSRs) obtained using the Oztop et al. 2005). model. If there are summary data, then somewhere there are Fostering Modeler-Experimentalist Collaboration unsummarized data. Often such data are only available, if at all, in Supplementary Material for a published paper, or on a Even among experimentalists who have rich interactions with laboratory Website. This raises two challenges: The modelers, few make explicit what challenges – whether at the development of further Websites for the integration of level of explaining specific data or in search of a conceptual unsummarized data and the linkage of summarized data framework – they want modelers to address, and few will (perhaps in another Website) to the data they summarize. For adjust their agenda to test novel predictions of models. Thus a example, the huge volumes of data associated with each challenge of particular interest here is to chart how individual fMRI scan in one comparison for a specific neuroinformatics could provide means to deepen these experimental-control condition comparison may be interactions. Among the relevant issues are understanding summarized into brain imaging tables which might aggregate how to summarize data sets into a form for which modeling multiple scans for that comparison into a single table, losing is appropriate, and appreciating the value of models which do detail but, hopefully, gaining conceptual clarity in the process. not fit data but do provide fresh insights – this in addition to BrainMap (Laird et al. 2005), brainmap.org, provides the the more obvious appreciation of those which do so – while classic repository for such brain imaging data, and Laird avoiding something like the “epicycles” used to adjust the et al. (2009) discuss the potential analyses that are possible orbits of Ptolemaic astronomy, i.e., without introducing ad hoc using the BrainMap database and coordinate-based ALE mechanisms whose only raison d’etre is to explain a very (activation likelihood estimation) meta-analyses, along with limited data set. some examples of how these tools can be applied to create a Arbib and colleagues (Arbib et al. 2014b)argueforindexing probabilistic atlas and an ontological system describing models not only with respect to brain structures (e.g., a model of function-structure correspondences. Nielsen (2013) focuses circuits in basal ganglia and prefrontal cortex) but also with on neuroinformatics tools of the BrainMap-based Brede Wiki, respect to brain operating principles (BOPs) which provide http://neuro.imm.dtu.dk/wiki/, which support federation and general mechanisms (such as reinforcement learning, winner- interaction of brain imaging data. (See the section on take-all, feedforward-feedback coupling, etc.) which may be “Federation” below for more on the general issues concerning employed in analyzing the roles of very different brain regions federation.). in diverse behaviors. Moreover, they argue that each model A range of brain models has been documented in the should be associated with summaries of empirical data (SEDs) ModelDB database, http://senselab.med.yale.edu/modeldb/. defined at the granularity of the model. There are at least two Model DB and BODB seem good targets for federation. The problems here. (a) Even if experimentalists make clear the exact former provides documented code for each model, with methodology used to extract data and process it – as in the instructions on how to run it to get published results; the framework offered by Lohrey et al. (2009)tointegrateanobject latter links each model to the summaries of experimental model, research methods (workflows), the capture of data used to design and test it, and provides tools to retrieve experimental data sets and the provenance of those data sets and compare models which address related sets of data. While for fMRI – the problem remains of integrating data gathered BODB tends to focus on systems and cognitive neuroscience with different protocols into a meaningful challenge for models “up a level” from many of the models in ModelDB, modeling. (b) If model A explains view A of data set D, while Bohland et al. (2013) have focused on the challenges of model B explains view B of D; and yet the models are different, linking data at multiple levels from genes and gene expression how may one build on them to more fully address aspects of D that guide brain developmentandprovidethemolecular revealed in the combination of the 2 views? For example, one signatures of populations of neurons to the architecture of model might be successful if it can explain the averaged functional brain subsystems. How can modelers begin to build responses of a brain region to a key set of stimuli rather than systems that span the layers between information
Recommended publications
  • Talk Bank: a Multimodal Database of Communicative Interaction
    Talk Bank: A Multimodal Database of Communicative Interaction 1. Overview The ongoing growth in computer power and connectivity has led to dramatic changes in the methodology of science and engineering. By stimulating fundamental theoretical discoveries in the analysis of semistructured data, we can to extend these methodological advances to the social and behavioral sciences. Specifically, we propose the construction of a major new tool for the social sciences, called TalkBank. The goal of TalkBank is the creation of a distributed, web- based data archiving system for transcribed video and audio data on communicative interactions. We will develop an XML-based annotation framework called Codon to serve as the formal specification for data in TalkBank. Tools will be created for the entry of new and existing data into the Codon format; transcriptions will be linked to speech and video; and there will be extensive support for collaborative commentary from competing perspectives. The TalkBank project will establish a framework that will facilitate the development of a distributed system of allied databases based on a common set of computational tools. Instead of attempting to impose a single uniform standard for coding and annotation, we will promote annotational pluralism within the framework of the abstraction layer provided by Codon. This representation will use labeled acyclic digraphs to support translation between the various annotation systems required for specific sub-disciplines. There will be no attempt to promote any single annotation scheme over others. Instead, by promoting comparison and translation between schemes, we will allow individual users to select the custom annotation scheme most appropriate for their purposes.
    [Show full text]
  • Child Language
    ABSTRACTS 14TH INTERNATIONAL CONGRESS FOR THE STUDY OF CHILD LANGUAGE IN LYON, IASCL FRANCE 2017 WELCOME JULY, 17TH21ST 2017 SPECIAL THANKS TO - 2 - SUMMARY Plenary Day 1 4 Day 2 5 Day 3 53 Day 4 101 Day 5 146 WELCOME! Symposia Day 2 6 Day 3 54 Day 4 102 Day 5 147 Poster Day 2 189 Day 3 239 Day 4 295 - 3 - TH DAY MONDAY, 17 1 18:00-19:00, GRAND AMPHI PLENARY TALK Bottom-up and top-down information in infants’ early language acquisition Sharon Peperkamp Laboratoire de Sciences Cognitives et Psycholinguistique, Paris, France Decades of research have shown that before they pronounce their first words, infants acquire much of the sound structure of their native language, while also developing word segmentation skills and starting to build a lexicon. The rapidity of this acquisition is intriguing, and the underlying learning mechanisms are still largely unknown. Drawing on both experimental and modeling work, I will review recent research in this domain and illustrate specifically how both bottom-up and top-down cues contribute to infants’ acquisition of phonetic cat- egories and phonological rules. - 4 - TH DAY TUESDAY, 18 2 9:00-10:00, GRAND AMPHI PLENARY TALK What do the hands tell us about lan- guage development? Insights from de- velopment of speech, gesture and sign across languages Asli Ozyurek Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands Most research and theory on language development focus on children’s spoken utterances. However language development starting with the first words of children is multimodal. Speaking children produce gestures ac- companying and complementing their spoken utterances in meaningful ways through pointing or iconic ges- tures.
    [Show full text]
  • Multimedia Corpora (Media Encoding and Annotation) (Thomas Schmidt, Kjell Elenius, Paul Trilsbeek)
    Multimedia Corpora (Media encoding and annotation) (Thomas Schmidt, Kjell Elenius, Paul Trilsbeek) Draft submitted to CLARIN WG 5.7. as input to CLARIN deliverable D5.C­3 “Interoperability and Standards” [http://www.clarin.eu/system/files/clarin­deliverable­D5C3_v1_5­finaldraft.pdf] Table of Contents 1 General distinctions / terminology................................................................................................................................... 1 1.1 Different types of multimedia corpora: spoken language vs. speech vs. phonetic vs. multimodal corpora vs. sign language corpora......................................................................................................................................................... 1 1.2 Media encoding vs. Media annotation................................................................................................................... 3 1.3 Data models/file formats vs. Transcription systems/conventions.......................................................................... 3 1.4 Transcription vs. Annotation / Coding vs. Metadata ............................................................................................. 3 2 Media encoding ............................................................................................................................................................... 5 2.1 Audio encoding ..................................................................................................................................................... 5 2.2
    [Show full text]
  • Conference Abstracts
    EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION Held under the Patronage of Ms Neelie Kroes, Vice-President of the European Commission, Digital Agenda Commissioner MAY 23-24-25, 2012 ISTANBUL LÜTFI KIRDAR CONVENTION & EXHIBITION CENTRE ISTANBUL, TURKEY CONFERENCE ABSTRACTS Editors: Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis. Assistant Editors: Hélène Mazo, Sara Goggi, Olivier Hamon © ELRA – European Language Resources Association. All rights reserved. LREC 2012, EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION Title: LREC 2012 Conference Abstracts Distributed by: ELRA – European Language Resources Association 55-57, rue Brillat Savarin 75013 Paris France Tel.: +33 1 43 13 33 33 Fax: +33 1 43 13 33 30 www.elra.info and www.elda.org Email: [email protected] and [email protected] Copyright by the European Language Resources Association ISBN 978-2-9517408-7-7 EAN 9782951740877 All rights reserved. No part of this book may be reproduced in any form without the prior permission of the European Language Resources Association ii Introduction of the Conference Chair Nicoletta Calzolari I wish first to express to Ms Neelie Kroes, Vice-President of the European Commission, Digital agenda Commissioner, the gratitude of the Program Committee and of all LREC participants for her Distinguished Patronage of LREC 2012. Even if every time I feel we have reached the top, this 8th LREC is continuing the tradition of breaking previous records: this edition we received 1013 submissions and have accepted 697 papers, after reviewing by the impressive number of 715 colleagues.
    [Show full text]
  • Gold Standard Annotations for Preposition and Verb Sense With
    Gold Standard Annotations for Preposition and Verb Sense with Semantic Role Labels in Adult-Child Interactions Lori Moon Christos Christodoulopoulos Cynthia Fisher University of Illinois at Amazon Research University of Illinois at Urbana-Champaign [email protected] Urbana-Champaign [email protected] [email protected] Sandra Franco Dan Roth Intelligent Medical Objects University of Pennsylvania Northbrook, IL USA [email protected] [email protected] Abstract This paper describes the augmentation of an existing corpus of child-directed speech. The re- sulting corpus is a gold-standard labeled corpus for supervised learning of semantic role labels in adult-child dialogues. Semantic role labeling (SRL) models assign semantic roles to sentence constituents, thus indicating who has done what to whom (and in what way). The current corpus is derived from the Adam files in the Brown corpus (Brown, 1973) of the CHILDES corpora, and augments the partial annotation described in Connor et al. (2010). It provides labels for both semantic arguments of verbs and semantic arguments of prepositions. The semantic role labels and senses of verbs follow Propbank guidelines (Kingsbury and Palmer, 2002; Gildea and Palmer, 2002; Palmer et al., 2005) and those for prepositions follow Srikumar and Roth (2011). The corpus was annotated by two annotators. Inter-annotator agreement is given sepa- rately for prepositions and verbs, and for adult speech and child speech. Overall, across child and adult samples, including verbs and prepositions, the κ score for sense is 72.6, for the number of semantic-role-bearing arguments, the κ score is 77.4, for identical semantic role labels on a given argument, the κ score is 91.1, for the span of semantic role labels, and the κ for agreement is 93.9.
    [Show full text]
  • (Or, the Raising of Baby Mondegreen) Dissertation
    PRESERVING SUBSEGMENTAL VARIATION IN MODELING WORD SEGMENTATION (OR, THE RAISING OF BABY MONDEGREEN) DISSERTATION Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University By Christopher Anton Rytting, B.A. ***** The Ohio State University 2007 Dissertation Committee: Approved by Dr. Christopher H. Brew, Co-Advisor Dr. Eric Fosler-Lussier, Co-Advisor Co-Advisor Dr. Mary Beckman Dr. Brian Joseph Co-Advisor Graduate Program in Linguistics ABSTRACT Many computational models have been developed to show how infants break apart utterances into words prior to building a vocabulary—the “word segmenta- tion task.” Most models assume that infants, upon hearing an utterance, represent this input as a string of segments. One type of model uses statistical cues calcu- lated from the distribution of segments within the child-directed speech to locate those points most likely to contain word boundaries. However, these models have been tested in relatively few languages, with little attention paid to how different phonological structures may affect the relative effectiveness of particular statistical heuristics. This dissertation addresses this is- sue by comparing the performance of two classes of distribution-based statistical cues on a corpus of Modern Greek, a language with a phonotactic structure signif- icantly different from that of English, and shows how these differences change the relative effectiveness of these cues. Another fundamental issue critically examined in this dissertation is the practice of representing input as a string of segments. Such a representation im- plicitly assumes complete certainty as to the phonemic identity of each segment.
    [Show full text]
  • Waxholm Space Atlas of the Sprague Dawley Rat Brain
    NeuroImage 97 (2014) 374–386 Contents lists available at ScienceDirect NeuroImage journal homepage: www.elsevier.com/locate/ynimg Waxholm Space atlas of the Sprague Dawley rat brain Eszter A. Papp a,TrygveB.Leergaarda, Evan Calabrese b, G. Allan Johnson b, Jan G. Bjaalie a,⁎ a Department of Anatomy, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway b Center for In Vivo Microscopy, Department of Radiology, Duke University Medical Center, Durham, NC, USA article info abstract Article history: Three-dimensional digital brain atlases represent an important new generation of neuroinformatics tools for Accepted 1 April 2014 understanding complex brain anatomy, assigning location to experimental data, and planning of experiments. Available online 12 April 2014 We have acquired a microscopic resolution isotropic MRI and DTI atlasing template for the Sprague Dawley rat brain with 39 μm isotropic voxels for the MRI volume and 78 μm isotropic voxels for the DTI. Building on this Keywords: template, we have delineated 76 major anatomical structures in the brain. Delineation criteria are provided for Digital brain atlas Waxholm Space each structure. We have applied a spatial reference system based on internal brain landmarks according to the Sprague Dawley Waxholm Space standard, previously developed for the mouse brain, and furthermore connected this spatial Rat brain template reference system to the widely used stereotaxic coordinate system by identifying cranial sutures and related Segmentation stereotaxic landmarks in the template using contrast given by the active staining technique applied to the tissue. Magnetic resonance imaging With the release of the present atlasing template and anatomical delineations, we provide a new tool for spatial Diffusion tensor imaging orientationanalysis of neuroanatomical location, and planning and guidance of experimental procedures in the Neuroinformatics rat brain.
    [Show full text]
  • Metapragmatics of Playful Speech Practices in Persian
    To Be with Salt, To Speak with Taste: Metapragmatics of Playful Speech Practices in Persian Author Arab, Reza Published 2021-02-03 Thesis Type Thesis (PhD Doctorate) School School of Hum, Lang & Soc Sc DOI https://doi.org/10.25904/1912/4079 Copyright Statement The author owns the copyright in this thesis, unless stated otherwise. Downloaded from http://hdl.handle.net/10072/402259 Griffith Research Online https://research-repository.griffith.edu.au To Be with Salt, To Speak with Taste: Metapragmatics of Playful Speech Practices in Persian Reza Arab BA, MA School of Humanities, Languages and Social Science Griffith University Thesis submitted in fulfilment of the requirements of the Degree of Doctor of Philosophy September 2020 Abstract This investigation is centred around three metapragmatic labels designating valued speech practices in the domain of ‘playful language’ in Persian. These three metapragmatic labels, used by speakers themselves, describe success and failure in use of playful language and construe a person as pleasant to be with. They are hāzerjavāb (lit. ready.response), bāmaze (lit. with.taste), and bānamak (lit. with.salt). Each is surrounded and supported by a cluster of (related) word meanings, which are instrumental in their cultural conceptualisations. The analytical framework is set within the research area known as ethnopragmatics, which is an offspring of Natural Semantics Metalanguage (NSM). With the use of explications and scripts articulated in cross-translatable semantic primes, the metapragmatic labels and the related clusters are examined in meticulous detail. This study demonstrates how ethnopragmatics, its insights on epistemologies backed by corpus pragmatics, can contribute to the metapragmatic studies by enabling a robust analysis using a systematic metalanguage.
    [Show full text]
  • The Neuron Phenotype Ontology: a FAIR Approach to Proposing and Classifying Neuronal Types
    bioRxiv preprint doi: https://doi.org/10.1101/2020.09.01.278879; this version posted September 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. The Neuron Phenotype Ontology: A FAIR Approach to Proposing and Classifying Neuronal Types Thomas H. Gillespie 1 (https://orcid.org/0000-0002-7509-4801) Shreejoy Tripathy2,3 (https://orcid.org/0000-0002-1007-9061) Mohameth François Sy (https://orcid.org/0000-0002-4603-9838) Maryann E. Martone 1, (https://orcid.org/0000-0002-8406-3871) Sean L. Hill2,3,4 (https://orcid.org/0000-0001-8055-860X) 1Department of Neuroscience, University of California, San Diego 2Department of Psychiatry, University of Toronto 3Krembil Center for Neuroinformatics, Centre for Addiction and Mental Health: Toronto 4Blue Brain Project, École polytechnique fédérale de Lausanne (EPFL), Campus Biotech, 1202 Geneva, Switzerland Corresponding Author: Sean Hill, Ph.D., [email protected] Blue Brain Project Campus Biotech, EPFL chemin des Mines, 9 Geneva, Switzerland CH-1202 Acknowledgements: This work was supported by NIH Brain Initiative award 5U24MH114827-04, a Canadian Institute for Health Research post-doctoral fellowship and National Institutes of Health grant MH111099, the Krembil Foundation, and by funding to the Blue Brain Project, a research center of the École polytechnique fédérale de Lausanne (EPFL), from the Swiss government’s ETH Board of the Swiss Federal Institutes of Technology . Meetings supporting this work were facilitated by the International Neuroinformatics Coordinating Facility (INCF) through the Neuroinformatics for cell types special interest group.
    [Show full text]
  • Foundational Model of Structural Connectivity in the Nervous System
    Foundational model of structural connectivity in the INAUGURAL ARTICLE nervous system with a schema for wiring diagrams, connectome, and basic plan architecture Larry W. Swanson1 and Mihail Bota Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089 This contribution is part of the special series of Inaugural Articles by members of the National Academy of Sciences elected in 2010. Contributed by Larry W. Swanson, October 8, 2010 (sent for review September 13, 2010) The nervous system is a biological computer integrating the body’s ular, cellular, systems, and behavioral organization levels. A reflex and voluntary environmental interactions (behavior) with Human Connectome Project goal might be framed as providing a relatively constant internal state (homeostasis)—promoting sur- the detailed structural data needed to create a foundational vival of the individual and species. The wiring diagram of the nervous system structural model analogous to the DNA double- nervous system’s structural connectivity provides an obligatory helix structural model. foundational model for understanding functional localization at Neuroinformatics offers powerful new tools to store, share, molecular, cellular, systems, and behavioral organization levels. mine, analyze, and model data about neural connectivity in- This paper provides a high-level, downwardly extendible, concep- cluding the human brain—by far the most complex system tual framework—like a compass and map—for describing and known. Databases and inference engines for automatic reasoning exploring in neuroinformatics systems (such as our Brain Architec- in neuroinformatics workbenches require an integrated concep- ture Knowledge Management System) the structural architecture tual framework: (i)adefined universe of discourse (concept of the nervous system’s basic wiring diagram.
    [Show full text]
  • Unit 3: Available Corpora and Software
    Corpus building and investigation for the Humanities: An on-line information pack about corpus investigation techniques for the Humanities Unit 3: Available corpora and software Irina Dahlmann, University of Nottingham 3.1 Commonly-used reference corpora and how to find them This section provides an overview of commonly-used and readily available corpora. It is also intended as a summary only and is far from exhaustive, but should prove useful as a starting point to see what kinds of corpora are available. The Corpora here are divided into the following categories: • Corpora of General English • Monitor corpora • Corpora of Spoken English • Corpora of Academic English • Corpora of Professional English • Corpora of Learner English (First and Second Language Acquisition) • Historical (Diachronic) Corpora of English • Corpora in other languages • Parallel Corpora/Multilingual Corpora Each entry contains the name of the corpus and a hyperlink where further information is available. All the information was accurate at the time of writing but the information is subject to change and further web searches may be required. Corpora of General English The American National Corpus http://www.americannationalcorpus.org/ Size: The first release contains 11.5 million words. The final release will contain 100 million words. Content: Written and Spoken American English. Access/Cost: The second release is available from the Linguistic Data Consortium (http://projects.ldc.upenn.edu/ANC/) for $75. The British National Corpus http://www.natcorp.ox.ac.uk/ Size: 100 million words. Content: Written (90%) and Spoken (10%) British English. Access/Cost: The BNC World Edition is available as both a CD-ROM or via online subscription.
    [Show full text]
  • English Corpus Linguistics: an Introduction - Charles F
    Cambridge University Press 0521808790 - English Corpus Linguistics: An Introduction - Charles F. Meyer Index More information Index Aarts, Bas, 4, 102 Biber, Douglas, et al. (1999) 14 adequacy, 2–3, 10–11 Birmingham Corpus, 15, 142 age, 49–50 Blachman, Edward, 76–7 Altenberg, Bengt, 26–7 BNC see British National Corpus AMALGAM Tagging Project, 86–7, 89 Brill, Eric, 86 American National Corpus, 24, 84, 142 Brill Tagger, 86–8 American Publishing House for the Blind British National Corpus (BNC), 143 Corpus, 17, 142 annotation, 84 analyzing a corpus, 100 composition, 18, 31t, 34, 36, 38, 40–1, 49 determining suitability, 103–7, 107t copyright, 139–40 exploring a corpus, 123–4 planning, 30–2, 33, 43, 51, 138 extracting information: defining parameters, record keeping, 66 107–9; coding and recording, 109–14, research using, 15, 36 112t; locating relevant constructions, speech samples, 59 114–19, 116f, 118f tagging, 87 framing research question, 101–3 time-frame, 45 future prospects, 140–1 British National Corpus (BNC) Sampler, see also pseudo-titles (corpus analysis 139–40, 143 case study); statistical analysis Brown Corpus, xii, 1, 143 anaphors, 97 genre variation, 18 annotation, 98–9 length, 32 future prospects, 140 research using, 6, 9–10, 12, 42, 98, 103 grammatical markup, 81 sampling methodology, 44 parsing, 91–6, 98, 140 tagging, 87, 90 part-of-speech markup, 81 time-frame, 45 structural markup, 68–9, 81–6 see also FROWN (Freiburg–Brown) tagging, 86–91, 97–8, 111, 117–18, 140 Corpus types, 81 Burges, Jen, 52 appositions, 42, 98 Burnard,
    [Show full text]