Character-Word LSTM Language Models

Lyan Verwimp, Joris Pelemans, Hugo Van hamme, Patrick Wambacq
ESAT – PSI, KU Leuven
Kasteelpark Arenberg 10, 3001 Heverlee, Belgium
[email protected]

Abstract

We present a Character-Word Long Short-Term Memory Language Model which both reduces the perplexity with respect to a baseline word-level language model and reduces the number of parameters of the model. Character information can reveal structural (dis)similarities between words and can even be used when a word is out-of-vocabulary, thus improving the modeling of infrequent and unknown words. By concatenating word and character embeddings, we achieve up to 2.77% relative improvement on English compared to a baseline model with a similar amount of parameters and 4.57% on Dutch. Moreover, we also outperform baseline word-level models with a larger number of parameters.

1 Introduction

Language models (LMs) play a crucial role in many speech and language processing tasks, among others speech recognition, machine translation and optical character recognition. The current state of the art are recurrent neural network (RNN) based LMs (Mikolov et al., 2010), and more specifically long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997) LMs (Sundermeyer et al., 2012) and their variants (e.g. gated recurrent units (GRU) (Cho et al., 2014)). LSTMs and GRUs are usually very similar in performance, with GRU models often even outperforming LSTM models despite the fact that they have less parameters to train. However, Jozefowicz et al. (2015) recently showed that for the task of language modeling LSTMs work better than GRUs, therefore we focus on LSTM-based LMs.

In this work, we address some of the drawbacks of NN based LMs (and many other types of LMs). A first drawback is the fact that the parameters for infrequent words are typically less accurate because the network requires a lot of training examples to optimize the parameters. The second and most important drawback addressed is the fact that the model does not make use of the internal structure of the words, given that they are encoded as one-hot vectors. For example, ‘felicity’ (great happiness) is a relatively infrequent word (its frequency is much lower compared to the frequency of ‘happiness’ according to Google Ngram Viewer (Michel et al., 2011)) and will probably be an out-of-vocabulary (OOV) word in many applications, but since there are many nouns also ending on ‘ity’ (ability, complexity, creativity, ...), knowledge of the surface form of the word will help in determining that ‘felicity’ is a noun. Hence, subword information can play an important role in improving the representations for infrequent words and even OOV words.

In our character-word (CW) LSTM LM, we concatenate character and word embeddings and feed the resulting character-word embedding to the LSTM. Hence, we provide the LSTM with information about the structure of the word. By concatenating the embeddings, the individual characters (as opposed to e.g. a bag-of-characters approach) are preserved and the order of the characters is implicitly modeled. Moreover, since we keep the total embedding size constant, the ‘word’ embedding shrinks in size and is partly replaced by character embeddings (with a much smaller vocabulary and hence a much smaller embedding matrix), which decreases the number of parameters of the model.
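The concatenation itself is simple to realize in code. The following minimal sketch (PyTorch, with toy sizes and inputs chosen purely for illustration, not the authors' implementation or settings) builds a character-word embedding for a batch of words and feeds it to an LSTM; reusing one character embedding matrix for every character position is an assumption of this sketch.

import torch
import torch.nn as nn

# All sizes and inputs below are illustrative assumptions, not the paper's settings.
word_vocab, char_vocab = 10_000, 50
d_word, d_char, n_chars = 175, 5, 5          # total input size stays 175 + 5*5 = 200
hidden_size = 200

word_emb = nn.Embedding(word_vocab, d_word)
char_emb = nn.Embedding(char_vocab, d_char)  # one matrix reused for all character positions
lstm = nn.LSTM(d_word + n_chars * d_char, hidden_size, batch_first=True)

# A batch with one sentence of three words: each word has a word id and n_chars character ids.
word_ids = torch.tensor([[4, 27, 913]])                      # shape (batch, time)
char_ids = torch.randint(0, char_vocab, (1, 3, n_chars))     # shape (batch, time, n_chars)

w = word_emb(word_ids)                                       # (1, 3, d_word)
c = char_emb(char_ids).view(1, 3, n_chars * d_char)          # characters kept in order
cw = torch.cat([w, c], dim=-1)                               # the character-word embedding
output, _ = lstm(cw)                                         # fed to the LSTM as usual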
We investigate the influence of the number of characters added, the size of the character embeddings, weight sharing for the characters and the size of the (hidden layer of the) model. Given that common or similar character sequences do not always occur at the beginning of words (e.g. ‘overfitting’ – ‘underfitting’), we also examine adding the characters in forward order, backward order or both orders.

We test our CW LMs on both English and Dutch. Since Dutch has a richer morphology than English due to among others its productive compounding (see e.g. (Réveil, 2012)), we expect that it should benefit more from a LM augmented with formal/morphological information.

The contributions of this paper are the following:

1. We present a method to combine word and subword information in an LSTM LM: concatenating word and character embeddings. As far as we know, this method has not been investigated before.

2. By decreasing the size of the word-level embedding (and hence the huge word embedding matrix), we effectively reduce the number of parameters in the model (see section 3.3; a rough illustration with assumed sizes follows this list).

3. We find that the CW model both outperforms word-level LMs with the same number of hidden units (and hence a larger number of parameters) and word-level LMs with the same number of parameters. These findings are confirmed for English and Dutch, for a small model size and a large model size. The size of the character embeddings should be proportional to the total size of the embedding (the concatenation of characters should not exceed the size of the word-level embedding), and using characters in the backward order improves the perplexity even more (see sections 3.1, 4.3 and 4.4).

4. The LM improves the modeling of OOV words by exploiting their surface form (see section 4.7).
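As a back-of-the-envelope illustration of the reduction mentioned in contribution 2 (all numbers below are assumptions chosen for the sketch, not configurations evaluated in the paper): keeping the total embedding size fixed while moving dimensions from the word embedding to the character embeddings shrinks the large word embedding matrix and only adds a matrix over a character vocabulary of a few dozen symbols.

# Parameter count of the input embeddings only; all sizes are illustrative assumptions.
V_word, V_char = 50_000, 50     # word and character vocabulary sizes
d_total = 200                   # total embedding size, kept constant
n_chars, d_char = 5, 10         # characters per word and size of each character embedding

baseline = V_word * d_total                          # word-only embedding matrix: 10,000,000
cw_word = V_word * (d_total - n_chars * d_char)      # shrunken word matrix: 7,500,000
cw_char = V_char * d_char                            # shared character matrix: 500
print(baseline, cw_word + cw_char)                   # 10000000 vs 7500500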
The remainder of this paper is structured as follows: first, we discuss related work (section 2); then the CW LSTM LM is described (section 3) and tested (section 4). Finally, we give an overview of the results and an outlook to future work (section 5).

2 Related work

Other work that investigates the use of character information in RNN LMs either completely replaces the word-level representation by a character-level one or combines word and character information. Much research has also been done on modeling other types of subword information (e.g. morphemes, syllables), but in this discussion, we limit ourselves to characters as subword information.

Research on replacing the word embeddings entirely has been done for neural machine translation (NMT) by Ling et al. (2015) and Costa-jussà and Fonollosa (2016), who replace word-level embeddings with character-level embeddings. Chung et al. (2016) use a subword-level encoder and a character-level decoder for NMT. In dependency parsing, Ballesteros et al. (2015) achieve improvements by generating character-level embeddings with a bidirectional LSTM. Xie et al. (2016) work on natural language correction and also use an encoder-decoder, but operate for both the encoder and the decoder on the character level.

Character-level word representations can also be generated with convolutional neural networks (CNNs), as Zhang et al. (2015) and Kim et al. (2016) have proven for text classification and language modeling respectively. Kim et al. (2016) achieve state-of-the-art results in language modeling for several languages by combining a character-level CNN with highway (Srivastava et al., 2015) and LSTM layers. However, the major improvement is achieved by adding the highway layers: for a small model size, the purely character-level model without highway layers does not perform better than the word-level model (perplexity of 100.3 compared to 97.6), even though the character model has two hidden layers of 300 LSTM units each and is compared to a word model of two hidden layers of only 200 units (in order to keep the number of parameters similar). For a model of larger size, the character-level LM improves the word baseline (84.6 compared to 85.4), but the largest improvement is achieved by adding two highway layers (78.9). Finally, Jozefowicz et al. (2016) also describe character embeddings generated by a CNN, but they test on the 1B Word Benchmark, a data set of an entirely different scale than the one we use.

Other authors combine the word and character information (as we do in this paper) rather than doing away completely with word inputs. Chen et al. (2015) and Kang et al. (2011) work on models combining words and Chinese characters to learn embeddings. Note however that Chinese characters more closely match subwords or words than phonemes. Bojanowski et al. (2015) operate on the character level but use knowledge about the context words in two variants of character-level RNN LMs. Dos Santos and Zadrozny (2014) join word and character representations in a deep neural network for part-of-speech tagging. Finally, Miyamoto and Cho (2016) describe a LM that is related to our model, although their character-level embedding is generated by a bidirectional LSTM and we do not use a gate to determine how much of the word and how much of the character embedding is used. However, they only compare to a simple baseline model of 2 LSTM layers of each 200 hidden units without dropout, resulting in a higher baseline perplexity (as mentioned in section 4.3, our CW model also achieves larger improvements than reported in this paper with respect to that baseline).

We can conclude that in various NLP tasks, characters have recently been introduced in several different manners. However, the models investigated in related work are either not tested on a competitive baseline (Miyamoto and Cho, 2016) or do not perform better than our models (Kim et al., 2016). In this paper, we introduce a new and straightforward manner to incorporate characters in a LM that (as far as we know) has not been investigated before.

Figure 1: Concatenating word and character embeddings (the word embedding w_t of e.g. ‘cat’ is concatenated with the character embeddings c_t^1, c_t^2 and c_t^3 and fed to the LSTM together with the previous hidden state h_t-1).
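To make the forward, backward and combined character orders from the introduction concrete, the sketch below builds the character symbols for a word under one possible reading of those orderings; the number of characters per word, the padding symbol and the treatment of short words are assumptions of this illustration, and the paper's own construction is the one defined in its section 3.

# Illustrative construction of per-word character inputs; n, the padding symbol
# and the handling of short words are assumptions, not the paper's exact recipe.
def word_characters(word, n=5, order="forward", pad="^"):
    if order == "both":
        # forward and backward views concatenated (2n symbols in total)
        return (word_characters(word, n, "forward", pad)
                + word_characters(word, n, "backward", pad))
    source = word if order == "forward" else word[::-1]
    return list(source[:n]) + [pad] * max(0, n - len(source))

print(word_characters("overfitting", order="forward"))   # ['o', 'v', 'e', 'r', 'f']
print(word_characters("overfitting", order="backward"))  # ['g', 'n', 'i', 't', 't']
print(word_characters("cat"))                            # ['c', 'a', 't', '^', '^']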