Grammatical category and the neural processing of phrases
Amelia Burroughs1, Nina Kazanina2,3 & Conor Houghton1*
The interlocking roles of lexical, syntactic and semantic processing in language comprehension have been the subject of longstanding debate. Recently, the cortical response to a frequency-tagged linguistic stimulus has been shown to track the rate of phrase and sentence, as well as syllable, presentation. This could be interpreted as evidence for the hierarchical processing of speech, or as a response to the repetition of grammatical category. To examine the extent to which hierarchical structure plays a role in language processing, we recorded EEG from human participants as they listened to isochronous streams of monosyllabic words. Comparing responses to sequences in which grammatical category strictly alternates and is chosen so that two-word phrases can be grammatically constructed (cold food loud room) with sequences in which no phrases can be formed (rough give ill tell) showed that cortical entrainment at the two-word phrase rate was only present in the grammatical condition. Thus, grammatical category repetition alone does not yield entrainment at a level higher than the word. On the other hand, cortical entrainment was reduced in the mixed-phrase condition, which contained two-word phrases but no grammatical category repetition (that word send less); this is not what would be expected if the measured entrainment reflected purely abstract hierarchical syntactic units. Our results support a model in which word-level grammatical category information is required to build larger units.

The ability of the human brain to rapidly generate meaning from an incoming stream of words is an impressive feat.
The role played by hierarchical syntactic structure during this processing is the subject of an ongoing debate: on the two extremes, some argue that full hierarchical analysis is central to sentence comprehension1–3, while others claim that hierarchical representations are non-essential4–7. According to the hierarchical account of language, comprehension is underpinned by the brain's ability to abstract over a number of linguistic levels, such as grammatical categories and phrases, and to combine them hierarchically according to a set of grammatical principles. In this view, language users parse an incoming sequence of words into a nested tree-like structure that details taxonomy-like relationships between syntactic constituents and enables sentence comprehension (Fig. 1). In support of this view, it has been demonstrated, using MEG8 and EEG9, that cortical activity can entrain to the rate of syllable, phrase and sentence presentation. In the EEG experiments participants were played continuous streams of four-word sentences, where each word was 320 ms long and consisted of a single syllable. As in Fig. 1, each sentence was composed of a noun phrase and a verb phrase, each containing two words. Thus these stimuli have a specific frequency at three levels of linguistic structure: syllables at 3.125 Hz, phrases at 1.5625 Hz and sentences at 0.78125 Hz. The neural responses were analysed using time-frequency decomposition and measures of inter-trial phase coherence (ITPC). Cortical activity was found to be phase-locked to the rate of presentation of syllables, phrases and sentences, even though only the syllable frequency was present in the auditory signal itself; the other two frequencies depend on the meaning of the words and the structure of the sentences. However, it has also been suggested that the brain could rely on simpler, potentially more generic, strategies underpinned by statistical processing of linguistic representations.
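As a quick check, the three tagged frequencies follow directly from the 320 ms word duration; the arithmetic can be sketched as:

```python
# Frequency tagging in the stimuli of refs. 8,9: each monosyllabic word
# lasts 320 ms, a phrase is two words and a sentence is four words.
word_ms = 320

syllable_hz = 1000 / word_ms        # 3.125 Hz
phrase_hz = 1000 / (2 * word_ms)    # 1.5625 Hz
sentence_hz = 1000 / (4 * word_ms)  # 0.78125 Hz

print(syllable_hz, phrase_hz, sentence_hz)  # 3.125 1.5625 0.78125
```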
In line with this, recent work7 has shown that a model based solely on distributional word semantics is sufficient to predict the response observed in8,9. In this model the distributional word semantics are represented by skipgram word2vec vectors10,11. In skipgram, word2vec vectors are calculated by training a simple linear neural network with one hidden layer; the input and output layers both correspond to words, and the network is trained on the task of predicting, from a given input word, the unordered list of words that occur in proximity to it in text. The components of the word2vec vector for a given word are the weights feeding forwards from the word to the hidden layer.

Figure 1. Syntactic tree for an example sentence ("new plans gave hope") from8,9. The sentence is composed of a noun phrase and a verb phrase, each consisting of two words presented at a rate of 3.125 Hz. The sentence is described using a hierarchical tree that splits the sentence first into a noun phrase and a verb phrase and then into words. This is a particularly simple tree; more complex trees may have more branches. This simple structure is convenient for frequency tagging, and the frequencies corresponding to the three levels of the tree are marked: sentence at 0.78125 Hz, phrase at 1.5625 Hz and word at 3.125 Hz.

1Department of Computer Science, University of Bristol, Bristol, UK. 2School of Psychological Science, University of Bristol, Bristol, UK. 3International Laboratory of Social Neurobiology, Institute for Cognitive Neuroscience, National Research University Higher School of Economics, Moscow, Russian Federation. *email: [email protected]

Scientific Reports (2021) 11:2446 | https://doi.org/10.1038/s41598-021-81901-5
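The skipgram training task described above (predicting the unordered context of an input word) can be illustrated with a small sketch that extracts (input word, context word) training pairs from a toy corpus; the corpus and window size here are illustrative assumptions, not the settings used in refs. 10,11.

```python
# Sketch of skipgram training-pair extraction: each word is paired with
# every word that falls inside a fixed window around it, disregarding
# order. The window size and toy corpus are illustrative assumptions.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

toy = "new plans gave hope".split()
print(skipgram_pairs(toy, window=1))
```

The network is then trained so that the input word predicts each of its paired context words; the learned input-to-hidden weights become the word's vector.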
Words that are likely to occur in a similar context have similar representations in the hidden layer and hence are associated with similar word2vec vectors. To a striking degree these high-dimensional vectors have specific directions that serve, at least locally, to represent specific concepts, so that, for example, the same direction that leads from "big" to "biggest" leads from "small" to "smallest"12,13. In7, fictive EEG signals representing experimental trials were constructed from the word2vec vectors for each stimulus. In the EEG experiment each word was presented for 320 ms, so in the fictive data 320 copies of the vector for each word were lined up side by side, forming the columns of a matrix in which each column represents 1 ms of the stimulus. The rows of this matrix were then treated as EEG traces, and this fictive signal was analysed in the same way as the real EEG signal, with measures of the evoked response averaged over rows, much as we average over individual electrodes. This simulated EEG signal demonstrated the same entrainment to words, phrases and sentences as the real signal. Since the high-dimensional word2vec vectors represent single words only and do not explicitly encode information about word sequences, this demonstrated that semantic relationships that can be deduced from a text corpus are sufficient to explain the ITPC peaks seen in the real experiment, without any need to invoke the hierarchical structure of the sentence. The current study aimed to elucidate the importance of hierarchical structure during language processing using EEG. We recorded neural activity from 20 participants while they listened to streams of two-word sequences from four different conditions:

AN (adjective–noun): repetition of adjective–noun sequences;
AV (adjective–verb): repetition of adjective–verb sequences;
MP (mixed phrase): repetition of grammatical two-word phrases with varying grammatical categories;
RR (random): random word order; no phrases possible.
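The fictive-EEG construction of ref. 7 described above can be sketched as follows; random vectors stand in for the real word2vec embeddings, and the dimensions are illustrative assumptions.

```python
import numpy as np

# Each word's embedding is repeated once per millisecond of its 320 ms
# presentation, so each column of the resulting matrix represents 1 ms of
# the stimulus and each row is treated as one fictive EEG trace.
# Random vectors stand in for real word2vec embeddings (an assumption).
rng = np.random.default_rng(0)
dim, n_words, word_ms = 300, 4, 320           # e.g. "new plans gave hope"
vectors = rng.standard_normal((n_words, dim))  # one row per word

# repeat each word's vector 320 times along the time axis, then transpose
# so rows index embedding dimensions and columns index milliseconds
fictive = np.repeat(vectors, word_ms, axis=0).T
print(fictive.shape)  # (300, 1280)
```

Each of the 300 rows is then analysed like an electrode trace, and evoked measures are averaged over rows.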
In the AN and AV conditions, grammatical categories occurred at a regular rate, so adjectives, nouns or verbs were repeated every other word. However, the stream could only be parsed into grammatical phrases in the AN condition, for example:

AN: cold food loud room tall girl (A N A N A N)

where each adjective–noun pair forms a grammatical phrase. In the AV condition, no such grammatical phrases could be formed, as in the example:

AV: rough give ill tell thin chew (A V A V A V)

In the MP condition, grammatical two-word phrases could be formed, but grammatical categories occurred with no regularity, for example:

MP: that word send less too loud (Det N V P Adv V)

and, finally, in the RR condition there are neither grammatical phrases nor any regularity:

RR: from solve good him ask an (Pre V A P V Det)

Figure 2. ITPC of the simulated EEG calculated from the word2vec representation for the four conditions. The model yielded peaks in ITPC at the rate of syllable presentation (3.125 Hz) in each of the four conditions (AN, AV, MP, random). Vertical red lines mark the frequencies at which syllables and two-word sequences are presented, the blue lines show responses for 20 simulated participants, and the black line is the grand average over these participants.

As in9, the principal measure of response used here is the inter-trial phase coherence (ITPC); this quantifies the clustering of phases across trials at a given frequency. According to the hierarchical account we would expect a peak in ITPC at the phrase rate in both the AN and MP conditions, but no peak in ITPC in the AV or RR conditions.
A sequential account of language processing that relies primarily on word-level statistics would instead predict a peak in ITPC at the phrasal rate in both the AN and AV conditions: our stimulus was designed so that the fictive EEG simulated from word2vec gives similar phrase-rate peaks in the AN and AV conditions.
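The ITPC measure itself can be sketched as follows; the sampling rate, trial length and toy phase-locked signal are illustrative assumptions, not the parameters of the recorded EEG.

```python
import numpy as np

# Minimal ITPC sketch: at each frequency, take the phase of the Fourier
# coefficient on every trial and measure how tightly the phases cluster
# across trials (1 = perfectly phase-locked, near 0 = random phases).
def itpc(trials, fs):
    """trials: array (n_trials, n_samples); returns (freqs, ITPC values)."""
    spectra = np.fft.rfft(trials, axis=1)
    phasors = spectra / np.abs(spectra)       # unit-magnitude phase vectors
    coherence = np.abs(phasors.mean(axis=0))  # resultant vector length
    freqs = np.fft.rfftfreq(trials.shape[1], d=1 / fs)
    return freqs, coherence

# Toy example: 20 trials of a phase-locked 1.5625 Hz sinusoid in noise;
# the sinusoid's phase repeats exactly across trials, so ITPC is high
# at the phrase rate and low elsewhere.
fs, dur, n_trials = 1000, 2.56, 20
t = np.arange(int(fs * dur)) / fs
rng = np.random.default_rng(1)
trials = np.sin(2 * np.pi * 1.5625 * t) + rng.standard_normal((n_trials, t.size))
freqs, coherence = itpc(trials, fs)
```

Because ITPC depends only on phase consistency, it detects entrainment at the phrase and sentence rates even when those frequencies carry little power in the raw signal.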