An Empirical Study of Smoothing Techniques for Language Modeling


Stanley F. Chen and Joshua Goodman
Harvard University, Aiken Computation Laboratory
33 Oxford St., Cambridge, MA 02138
sfc@eecs.harvard.edu, goodman@eecs.harvard.edu

Abstract

We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods.

1 Introduction

Smoothing is a technique essential in the construction of n-gram language models, a staple in speech recognition (Bahl, Jelinek, and Mercer, 1983) as well as many other domains (Church, 1988; Brown et al., 1990; Kernighan, Church, and Gale, 1990). A language model is a probability distribution over strings P(s) that attempts to reflect the frequency with which each string s occurs as a sentence in natural text. Language models are used in speech recognition to resolve acoustically ambiguous utterances. For example, if we have that P(it takes two) >> P(it takes too), then we know ceteris paribus to prefer the former transcription over the latter.

While smoothing is a central issue in language modeling, the literature lacks a definitive comparison between the many existing techniques. Previous studies (Nadas, 1984; Katz, 1987; Church and Gale, 1991; MacKay and Peto, 1995) only compare a small number of methods (typically two) on a single corpus and using a single training data size. As a result, it is currently difficult for a researcher to intelligently choose between smoothing schemes.

In this work, we carry out an extensive empirical comparison of the most widely used smoothing techniques, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We carry out experiments over many training data sizes on varied corpora using both bigram and trigram models. We demonstrate that the relative performance of techniques depends greatly on training data size and n-gram order. For example, for bigram models produced from large training sets Church-Gale smoothing has superior performance, while Katz smoothing performs best on bigram models produced from smaller data. For the methods with parameters that can be tuned to improve performance, we perform an automated search for optimal values and show that sub-optimal parameter selection can significantly decrease performance. To our knowledge, this is the first smoothing work that systematically investigates any of these issues.

In addition, we introduce two novel smoothing techniques: the first belonging to the class of smoothing models described by Jelinek and Mercer, the second a very simple linear interpolation method. While being relatively simple to implement, we show that these methods yield good performance in bigram models and superior performance in trigram models.

We take the performance of a method m to be its cross-entropy on test data

    (1 / N_T) \sum_{i=1}^{l_T} - \log P_m(t_i)

where P_m(t_i) denotes the language model produced with method m and where the test data T is composed of sentences (t_1, ..., t_{l_T}) and contains a total of N_T words. The entropy is inversely related to the average probability a model assigns to sentences in the test data, and it is generally assumed that lower entropy correlates with better performance in applications.

1.1 Smoothing n-gram Models

In n-gram language modeling, the probability of a string P(s) is expressed as the product of the probabilities of the words that compose the string, with each word probability conditional on the identity of the last n - 1 words, i.e., if s = w_1 ... w_l we have

    P(s) = \prod_{i=1}^{l} P(w_i | w_1^{i-1}) \approx \prod_{i=1}^{l} P(w_i | w_{i-n+1}^{i-1})    (1)

where w_i^j denotes the words w_i ... w_j. Typically, n is taken to be two or three, corresponding to a bigram or trigram model, respectively.[1]

[1] To make the term P(w_i | w_{i-n+1}^{i-1}) meaningful for i < n, one can pad the beginning of the string with a distinguished token. In this work, we assume there are n - 1 such distinguished tokens preceding each sentence.

Consider the case n = 2. To estimate the probabilities P(w_i | w_{i-1}) in equation (1), one can acquire a large corpus of text, which we refer to as training data, and take

    P_ML(w_i | w_{i-1}) = P(w_{i-1} w_i) / P(w_{i-1}) = [c(w_{i-1} w_i) / N_s] / [c(w_{i-1}) / N_s] = c(w_{i-1} w_i) / c(w_{i-1})

where c(α) denotes the number of times the string α occurs in the text and N_s denotes the total number of words. This is called the maximum likelihood (ML) estimate for P(w_i | w_{i-1}).

While intuitive, the maximum likelihood estimate is a poor one when the amount of training data is small compared to the size of the model being built, as is generally the case in language modeling. For example, consider the situation where a pair of words, or bigram, say burnish the, doesn't occur in the training data. Then, we have P_ML(the | burnish) = 0, which is clearly inaccurate as this probability should be larger than zero. A zero bigram probability can lead to errors in speech recognition, as it disallows the bigram regardless of how informative the acoustic signal is. The term smoothing describes techniques for adjusting the maximum likelihood estimate to hopefully produce more accurate probabilities.

As an example, one simple smoothing technique is to pretend each bigram occurs once more than it actually did (Lidstone, 1920; Johnson, 1932; Jeffreys, 1948), yielding

    P(w_i | w_{i-1}) = [c(w_{i-1} w_i) + 1] / [c(w_{i-1}) + |V|]

where V is the vocabulary, the set of all words being considered. This has the desirable quality of preventing zero bigram probabilities. However, this scheme has the flaw of assigning the same probability to, say, burnish the and burnish thou (assuming neither occurred in the training data), even though intuitively the former seems more likely because the word the is much more common than thou.

To address this, another smoothing technique is to interpolate the bigram model with a unigram model P_ML(w_i) = c(w_i) / N_s, a model that reflects how often each word occurs in the training data. For example, we can take

    P_interp(w_i | w_{i-1}) = λ P_ML(w_i | w_{i-1}) + (1 - λ) P_ML(w_i)

getting the behavior that bigrams involving common words are assigned higher probabilities (Jelinek and Mercer, 1980).

2 Previous Work

The simplest type of smoothing used in practice is additive smoothing (Lidstone, 1920; Johnson, 1932; Jeffreys, 1948), where we take

    P_add(w_i | w_{i-n+1}^{i-1}) = [c(w_{i-n+1}^{i}) + δ] / [c(w_{i-n+1}^{i-1}) + δ|V|]    (2)

and where Lidstone and Jeffreys advocate δ = 1. Gale and Church (1990; 1994) have argued that this method generally performs poorly.

The Good-Turing estimate (Good, 1953) is central to many smoothing techniques. It is not used directly for n-gram smoothing because, like additive smoothing, it does not perform the interpolation of lower- and higher-order models essential for good performance. Good-Turing states that an n-gram that occurs r times should be treated as if it had occurred r* times, where

    r* = (r + 1) n_{r+1} / n_r

and where n_r is the number of n-grams that occur exactly r times in the training data.

Katz smoothing (1987) extends the intuitions of Good-Turing by adding the interpolation of higher-order models with lower-order models. It is perhaps the most widely used smoothing technique in speech recognition.

Church and Gale (1991) describe a smoothing method that combines the Good-Turing estimate with bucketing, the technique of partitioning a set of n-grams into disjoint groups, where each group is characterized independently through a set of parameters. Like Katz, models are defined recursively in terms of lower-order models. Each n-gram is assigned to one of several buckets based on its frequency predicted from lower-order models. Each bucket is treated as a separate distribution and Good-Turing estimation is performed within each, giving corrected counts that are normalized to yield probabilities.

[Figure 1: λ values for old and new bucketing schemes for Jelinek-Mercer smoothing; each point represents a single bucket. Left panel: old bucketing (x-axis: number of counts in distribution); right panel: new bucketing (x-axis: average non-zero count in distribution minus one).]

The other smoothing technique besides Katz smoothing widely used in speech recognition is due to Jelinek and Mercer (1980). They present a class of smoothing models that involve linear interpolation, e.g., Brown et al. (1992) take

    P_interp(w_i | w_{i-n+1}^{i-1}) = λ_{w_{i-n+1}^{i-1}} P_ML(w_i | w_{i-n+1}^{i-1}) + (1 - λ_{w_{i-n+1}^{i-1}}) P_interp(w_i | w_{i-n+2}^{i-1})    (3)

3.1 Method average-count

This scheme is an instance of Jelinek-Mercer smoothing. Referring to equation (3), recall that Bahl et al. suggest bucketing the λ_{w_{i-n+1}^{i-1}} according to c(w_{i-n+1}^{i-1}). We have found that partitioning the λ_{w_{i-n+1}^{i-1}} according to the average number of counts per non-zero element, c(w_{i-n+1}^{i-1}) / |{w_i : c(w_{i-n+1}^{i}) > 0}|, yields better results.
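As a concrete illustration of two of the ideas above (interpolating a bigram model with a unigram model, and scoring a model by the cross-entropy of test data), here is a minimal Python sketch. It is not the authors' implementation: the toy corpus, the fixed interpolation weight λ = 0.7, and the base-2 logarithm are assumptions made for the example, and summing per-word log probabilities is equivalent to summing per-sentence log probabilities in the formula above.

```python
import math
from collections import Counter

# Toy training and test sentences; the paper's experiments use corpora such as
# Brown and Wall Street Journal text, so treat everything here as illustrative.
train = [["it", "takes", "two"], ["it", "takes", "time"], ["time", "takes", "effort"]]
test = [["it", "takes", "time"]]

BOS = "<s>"  # distinguished token padding each sentence start (cf. footnote 1)

unigram, bigram, context = Counter(), Counter(), Counter()
for sent in train:
    unigram.update(sent)                          # c(w_i)
    padded = [BOS] + sent
    bigram.update(zip(padded[:-1], padded[1:]))   # c(w_{i-1} w_i)
for (prev, _), c in bigram.items():
    context[prev] += c                            # c(w_{i-1}) as a bigram context
N = sum(unigram.values())                         # total training words, N_s

def p_interp(w, prev, lam=0.7):
    """Interpolated estimate: lam * P_ML(w | prev) + (1 - lam) * P_ML(w)."""
    p_bi = bigram[(prev, w)] / context[prev] if context[prev] else 0.0
    return lam * p_bi + (1 - lam) * unigram[w] / N

def cross_entropy(sentences):
    """(1 / N_T) * sum of -log2 P_m(word | history) over all test words."""
    log_sum, n_words = 0.0, 0
    for sent in sentences:
        padded = [BOS] + sent
        for prev, w in zip(padded[:-1], padded[1:]):
            log_sum += -math.log2(p_interp(w, prev))
            n_words += 1
    return log_sum / n_words

print(cross_entropy(test))  # lower is better; truly unseen words would need extra handling
```

In this sketch λ is a fixed constant; the techniques compared in the paper instead train such parameters (e.g., bucketed λ values) on held-out data.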
Recommended publications
  • Moving Average Filters
CHAPTER 15: Moving Average Filters

The moving average is the most common filter in DSP, mainly because it is the easiest digital filter to understand and use. In spite of its simplicity, the moving average filter is optimal for a common task: reducing random noise while retaining a sharp step response. This makes it the premier filter for time domain encoded signals. However, the moving average is the worst filter for frequency domain encoded signals, with little ability to separate one band of frequencies from another. Relatives of the moving average filter include the Gaussian, Blackman, and multiple-pass moving average. These have slightly better performance in the frequency domain, at the expense of increased computation time.

Implementation by Convolution. As the name implies, the moving average filter operates by averaging a number of points from the input signal to produce each point in the output signal. In equation form, this is written:

    y[i] = (1/M) \sum_{j=0}^{M-1} x[i + j]    (Equation 15-1)

where x[ ] is the input signal, y[ ] is the output signal, and M is the number of points used in the moving average. This equation only uses points on one side of the output sample being calculated. For example, in a 5 point moving average filter, point 80 in the output signal is given by:

    y[80] = (x[80] + x[81] + x[82] + x[83] + x[84]) / 5

As an alternative, the group of points from the input signal can be chosen symmetrically around the output point:

    y[80] = (x[78] + x[79] + x[80] + x[81] + x[82]) / 5

This corresponds to changing the summation in Eq. 15-1.
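A short sketch of Equation 15-1 as code (an illustration, not the book's program); numpy and the example signal are assumptions.

```python
import numpy as np

def moving_average(x, M):
    """One-sided moving average: y[i] = (1/M) * sum_{j=0}^{M-1} x[i+j] (Eq. 15-1)."""
    x = np.asarray(x, dtype=float)
    # The uniform kernel is symmetric, so convolution equals a sliding mean;
    # "valid" mode keeps only the len(x) - M + 1 fully overlapping windows.
    return np.convolve(x, np.ones(M) / M, mode="valid")

signal = np.array([1, 8, 3, 7, 8, 2, 9, 4], dtype=float)
print(moving_average(signal, 5))  # first output is (x[0] + ... + x[4]) / 5

def centered_moving_average(x, M):
    """Symmetric variant: each output averages M points centred on that sample.
    Edge outputs are computed as if the signal were zero-padded."""
    assert M % 2 == 1, "use an odd window so the average can be centred"
    return np.convolve(np.asarray(x, dtype=float), np.ones(M) / M, mode="same")
```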
  • Spatial Domain Low-Pass Filters
Low Pass Filtering

Why use Low Pass filtering?
• Remove random noise
• Remove periodic noise
• Reveal a background pattern

Effects on images
• Remove banding effects on images
• Smooth out Img-Img mis-registration
• Blurring of image

Types of Low Pass Filters
• Moving average filter
• Median filter
• Adaptive filter

Moving Ave Filter Example
• A single (very short) scan line of an image: {1,8,3,7,8}
• Moving average using an interval of 3 (must be odd)
• First number: (1+8+3)/3 = 4
• Second number: (8+3+7)/3 = 6
• Third number: (3+7+8)/3 = 6
• First and last value set to 0

[Figures: Two Dimensional Moving Average; Moving Average of Scan Line]

2D Moving Average Filter
• Spatial domain filter
• Places the average in the center
• Edges are usually set to 0 to maintain size

Moving Average Filter Effects
• Reduces overall variability of the image
• Lowers contrast
• Noise components reduced
• Blurs the overall appearance of the image

[Figure: Moving Average images]

Median Filter
• The median filter utilizes the median instead of the mean; the median is the middle positional value.

Median Example
• Another very short scan line: data set {2,8,4,6,27}, interval of 5
• Ranked: {2,4,6,8,27}
• Median is 6, the central value; 4 -> 6

Median Filter
• Usually better for filtering
• Less sensitive to errors or extremes
• Median is always a value of the set
• Preserves edges
• But requires more computation

[Figure: Moving Ave vs. Median Filtering]

Adaptive Filters
• Based on mean and variance
• Good at speckle suppression
• Sigma filter best known: computes mean and std dev for a window; values outside ±2 std dev excluded; if too few values (<k), uses the value to the left; later versions use weighting

Adaptive Filters: improvements to Sigma filtering
• Chi-square testing
• Weighting
• Local order histogram statistics
• Edge preserving smoothing
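A quick sketch reproducing the two worked scan-line examples above (a 3-point moving average of {1,8,3,7,8} and a 5-point median of {2,8,4,6,27}); plain Python, with the edge-handling convention from the slides assumed.

```python
import statistics

def window_filter(values, size, reducer):
    """Apply `reducer` (mean or median) over a sliding odd-sized window;
    edge positions without a full window are set to 0, as in the slides."""
    assert size % 2 == 1
    half = size // 2
    out = [0] * len(values)
    for i in range(half, len(values) - half):
        out[i] = reducer(values[i - half:i + half + 1])
    return out

print(window_filter([1, 8, 3, 7, 8], 3, statistics.mean))    # interior values 4, 6, 6
print(window_filter([2, 8, 4, 6, 27], 5, statistics.median)) # centre value becomes 6
```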
  • Sequence Models
Sequence Models
Noah Smith, Computer Science & Engineering, University of Washington, [email protected]

Lecture Outline
1. Markov models
2. Hidden Markov models
3. Viterbi algorithm
4. Other inference algorithms for HMMs
5. Learning algorithms for HMMs

Shameless Self-Promotion
• Linguistic Structure Prediction (2011)
• Links material in this lecture to many related ideas, including some in other LxMLS lectures.
• Available in electronic and print form.

MARKOV MODELS

One View of Text
• Sequence of symbols (bytes, letters, characters, morphemes, words, ...)
  – Let Σ denote the set of symbols.
• Lots of possible sequences. (Σ* is infinitely large.)
• Probability distributions over Σ*?

Pop Quiz
• Am I wearing a generative or discriminative hat right now?
• Generative models tell a mythical story to explain the data.
• Discriminative models focus on tasks (like sorting examples).

Trivial Distributions over Σ*
• Give probability 0 to sequences with length greater than B; uniform over the rest.
• Use data: with N examples, give probability N^{-1} to each observed sequence, 0 to the rest.
• What if we want every sequence to get some probability?
  – Need a probabilistic model family and algorithms for constructing the model from data.

A History-Based Model

    p(start, w_1, w_2, ..., w_n, stop) = \prod_{i=1}^{n+1} γ(w_i | w_1, w_2, ..., w_{i-1})

• Generate each word from left to right, conditioned on what came before it.
• Die / Dice: one die, two dice.
• Example: the sentence "start I want a flight to Lisbon ." is generated word by word, with one die (conditional distribution) per history; the history grows from "start" to "start I", "start I want", and so on, up to "start I want a flight to Lisbon".
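A minimal sketch of the history-based factorization above. For tractability it truncates the history to the previous word only (a first-order Markov assumption); the toy corpus and relative-frequency estimation of γ are assumptions made for the example, not part of the lecture notes.

```python
from collections import Counter, defaultdict

START, STOP = "<start>", "<stop>"

corpus = [
    ["I", "want", "a", "flight", "to", "Lisbon", "."],
    ["I", "want", "a", "ticket", "to", "Lisbon", "."],
]

# Estimate gamma(w_i | history), with the history truncated to the previous word,
# by relative frequency over the toy corpus.
transitions = defaultdict(Counter)
for sent in corpus:
    seq = [START] + sent + [STOP]
    for prev, w in zip(seq[:-1], seq[1:]):
        transitions[prev][w] += 1

def gamma(w, prev):
    total = sum(transitions[prev].values())
    return transitions[prev][w] / total if total else 0.0

def sequence_probability(words):
    """p(start, w_1..w_n, stop) = prod_{i=1}^{n+1} gamma(w_i | previous word)."""
    p = 1.0
    seq = [START] + words + [STOP]
    for prev, w in zip(seq[:-1], seq[1:]):
        p *= gamma(w, prev)
    return p

print(sequence_probability(["I", "want", "a", "flight", "to", "Lisbon", "."]))  # 0.5 here
```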
  • Review of Smoothing Methods for Enhancement of Noisy Data from Heavy-Duty LHD Mining Machines
E3S Web of Conferences 29, 00011 (2018), https://doi.org/10.1051/e3sconf/20182900011, XVIIth Conference of PhD Students and Young Scientists

Review of smoothing methods for enhancement of noisy data from heavy-duty LHD mining machines

Jacek Wodecki1, Anna Michalak2, and Paweł Stefaniak2
1Machinery Systems Division, Wroclaw University of Science and Technology, Wroclaw, Poland
2KGHM Cuprum R&D Ltd., Wroclaw, Poland

Abstract. Appropriate analysis of data measured on heavy-duty mining machines is essential for process monitoring, management and optimization. Some particular classes of machines, for example LHD (load-haul-dump) machines, hauling trucks, drilling/bolting machines etc., are characterized by cyclicity of operations. In those cases, identification of cycles and their segments, or in other words simply data segmentation, is key to evaluating their performance, which may be very useful from the management point of view, for example leading to introducing optimization to the process. However, in many cases such raw signals are contaminated with various artifacts, and in general are expected to be very noisy, which makes the segmentation task very difficult or even impossible. To deal with that problem, there is a need for efficient smoothing methods that will allow retaining informative trends in the signals while disregarding noise and other undesired non-deterministic components. In this paper the authors present a review of various approaches to diagnostic data smoothing. The described methods can be used in a fast and efficient way, effectively cleaning the signals while preserving the informative deterministic behaviour that is crucial to precise segmentation and other approaches to industrial data analysis.
  • Introduction to Machine Learning Lecture 3
Introduction to Machine Learning, Lecture 3
Mehryar Mohri, Courant Institute and Google Research, [email protected]

Bayesian Learning

Bayes' Formula/Rule. Terminology:

    Pr[Y | X] = Pr[X | Y] Pr[Y] / Pr[X]

where Pr[X | Y] is the likelihood, Pr[Y] the prior probability, Pr[Y | X] the posterior, and Pr[X] the evidence.

Loss Function. Definition: a function L : Y × Y → R_+ indicating the penalty for an incorrect prediction; L(y', y) is the loss for predicting y' instead of y. Examples:
• Zero-one loss, the standard loss function in classification: L(y', y) = 1_{y' ≠ y} for y', y ∈ Y.
• Non-symmetric losses, e.g., for spam classification: L(ham, spam) ≤ L(spam, ham).
• Squared loss, the standard loss function in regression: L(y', y) = (y' − y)^2.

Classification Problem. Input space X: e.g., a set of documents, with a feature vector Φ(x) ∈ R^N associated to each x ∈ X (notation: feature vector x ∈ R^N); for example, the vector of word counts in a document. Output or target space Y: a set of classes, e.g., sport, business, art. Problem: given x, predict the correct class y ∈ Y associated to x.

Bayesian Prediction. Definition: the expected conditional loss of predicting y' ∈ Y is

    L[y' | x] = \sum_{y ∈ Y} L(y', y) Pr[y | x].

Bayesian decision: predict the class minimizing the expected conditional loss, that is

    y* = argmin_{y'} L[y' | x] = argmin_{y'} \sum_{y ∈ Y} L(y', y) Pr[y | x].

For zero-one loss this gives y* = argmax_y Pr[y | x], the Maximum a Posteriori (MAP) principle.

[Binary Classification - Illustration: curves of Pr[y_1 | x] and Pr[y_2 | x] as functions of x.]

Maximum a Posteriori (MAP). Definition: the MAP principle consists of predicting according to the rule y = argmax_y Pr[y | x].
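A small sketch of the expected-conditional-loss and MAP rules above; the example posterior values and the asymmetric spam loss numbers are made up for illustration only.

```python
# Posterior probabilities Pr[y | x] for one example x, over classes Y = {ham, spam}.
posterior = {"ham": 0.7, "spam": 0.3}

# Asymmetric loss L(y_hat, y): wrongly flagging ham as spam is penalised more heavily.
loss = {
    ("ham", "ham"): 0.0, ("ham", "spam"): 1.0,
    ("spam", "ham"): 5.0, ("spam", "spam"): 0.0,
}

def expected_conditional_loss(y_hat):
    """L[y_hat | x] = sum over y of L(y_hat, y) * Pr[y | x]."""
    return sum(loss[(y_hat, y)] * p for y, p in posterior.items())

# Bayesian decision: the class minimising the expected conditional loss.
bayes_decision = min(posterior, key=expected_conditional_loss)

# Under zero-one loss this reduces to the MAP rule: argmax_y Pr[y | x].
map_decision = max(posterior, key=posterior.get)

print(bayes_decision, map_decision)  # with these numbers, both print 'ham'
```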
  • Additive Smoothing for Relevance-Based Language Modelling of Recommender Systems
Additive Smoothing for Relevance-Based Language Modelling of Recommender Systems

Daniel Valcarce, Javier Parapar, Álvaro Barreiro
Information Retrieval Lab, Department of Computer Science, University of A Coruña, Spain
{daniel.valcarce, javierparapar, barreiro}@udc.es

ABSTRACT. The use of Relevance-Based Language Models for top-N recommendation has become a promising line of research. Previous works have used collection-based smoothing methods for this task. However, a recent analysis of RM1 (an estimation of Relevance-Based Language Models) in document retrieval showed that this type of smoothing method demotes the IDF effect in pseudo-relevance feedback. In this paper, we claim that the IDF effect from retrieval is closely related to the concept of novelty in recommendation. We perform an axiomatic analysis of the IDF effect on RM2, concluding that this kind of smoothing method also demotes the IDF effect in recommendation. By axiomatic analysis, we find that a collection-agnostic method, Additive smoothing, [excerpt truncated]

... the users' past behaviour. Continuous developments in this field have been made to meet these high expectations. We can distinguish multiple approaches to recommendation [24]. They are often classified in three main categories: content-based, collaborative filtering and hybrid techniques. Content-based approaches generate recommendations based on the item and user descriptions: they suggest items similar to those liked by the target user [9]. In contrast, collaborative filtering methods rely on the interactions (typically ratings) between users and items in the system [21]. Finally, there exist hybrid algorithms that combine both collaborative filtering and content-based approaches. Traditionally, Information Retrieval (IR) has focused on delivering the information that users demand [1].
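The sketch below is only a generic illustration of the additive smoothing idea named in the abstract, applied to a toy user-profile item distribution; it is not the paper's RM2 estimator, and the counts, catalogue, and δ value are invented for the example.

```python
from collections import Counter

# Toy user profile: counts of interactions (e.g., ratings) with items.
user_counts = Counter({"item_a": 4, "item_b": 1})
catalogue = ["item_a", "item_b", "item_c", "item_d"]  # |I| = 4 items in total

def p_ml(item):
    """Maximum-likelihood estimate: zero for items the user never interacted with."""
    return user_counts[item] / sum(user_counts.values())

def p_additive(item, delta=0.5):
    """Additive (Lidstone) smoothing: (c(u,i) + delta) / (sum_j c(u,j) + delta * |I|).
    Unlike collection-based smoothing, it ignores global item popularity."""
    total = sum(user_counts.values())
    return (user_counts[item] + delta) / (total + delta * len(catalogue))

for item in catalogue:
    print(item, round(p_ml(item), 3), round(p_additive(item), 3))
```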
  • Data Mining Algorithms
Data Management and Exploration, Prof. Dr. Thomas Seidl (© for the original version: Jiawei Han and Micheline Kamber, http://www.cs.sfu.ca/~han/dmbook)
Data Mining Algorithms: Lecture Course with Tutorials, Wintersemester 2003/04

Chapter 4: Data Preprocessing
• Why preprocess the data?
• Data cleaning
• Data integration and transformation
• Data reduction
• Discretization and concept hierarchy generation
• Summary

Why Data Preprocessing? Data in the real world is dirty:
• incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
• noisy: containing errors or outliers
• inconsistent: containing discrepancies in codes or names
No quality data, no quality mining results! Quality decisions must be based on quality data; a data warehouse needs consistent integration of quality data.

Multi-Dimensional Measure of Data Quality. A well-accepted multidimensional view:
• Accuracy (range of tolerance)
• Completeness (fraction of missing values)
• Consistency (plausibility, presence of contradictions)
• Timeliness (data is available in time; data is up-to-date)
• Believability (user's trust in the data; reliability)
• Value added (data brings some benefit)
• Interpretability (there is some explanation for the data)
• Accessibility (data is actually available)
Broad categories: intrinsic, contextual, representational, and accessibility.

Major Tasks in Data Preprocessing
  • Smoothing Parameter and Model Selection for General Smooth Models
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, Theory and Methods

Smoothing Parameter and Model Selection for General Smooth Models

Simon N. Wood (School of Mathematics, University of Bristol, Bristol, UK), Natalya Pya (School of Science and Technology, Nazarbayev University, Astana, Kazakhstan, and KIMEP University, Almaty, Kazakhstan), and Benjamin Säfken (Chairs of Statistics and Econometrics, Georg-August-Universität Göttingen, Germany)

ABSTRACT. This article discusses a general framework for smoothing parameter estimation for models with regular likelihoods constructed in terms of unknown smooth functions of covariates. Gaussian random effects and parametric terms may also be present. By construction the method is numerically stable and convergent, and enables smoothing parameter uncertainty to be quantified. The latter enables us to fix a well known problem with AIC for such models, thereby improving the range of model selection tools available. The smooth functions are represented by reduced rank spline like smoothers, with associated quadratic penalties measuring function smoothness. Model estimation is by penalized likelihood maximization, where the smoothing parameters controlling the extent of penalization are estimated by Laplace approximate marginal likelihood. The methods cover, for example, generalized additive models for nonexponential family responses (e.g., beta, ordered categorical, scaled t distribution, negative binomial and Tweedie distributions), generalized additive models for location scale and shape (e.g., two stage zero inflation models, and Gaussian location-scale models), Cox proportional hazards models and multivariate additive models.

KEYWORDS: Additive model; AIC; Distributional regression; GAM; Location scale and shape model; Ordered categorical regression; Penalized regression spline; REML; Smooth Cox model; Smoothing parameter uncertainty; Statistical
  • Introduction to Naivebayes Package
Introduction to naivebayes package
Michal Majka, March 8, 2020

1 Introduction
The naivebayes package provides an efficient implementation of the popular Naïve Bayes classifier. It was developed and is now maintained based on three principles: it should be efficient, user friendly and written in Base R. The last implies no dependencies; however, it neither denies nor interferes with the first, as many functions from the Base R distribution use highly efficient routines programmed in lower level languages, such as C or FORTRAN. In fact, the naivebayes package utilizes only such functions for resource-intensive calculations. This vignette should make the implementation of the general naive_bayes() function more transparent and give an overview over its functionalities.

2 Installation
Just like many other R packages, the naivebayes can be installed from the CRAN repository by simply typing into the console the following line:

    install.packages("naivebayes")

An alternative way of obtaining the package is first downloading the package source from https://CRAN.R-project.org/package=naivebayes, specifying the location of the file and running in the console:

    # path_to_tar.gz file <- " "
    install.packages(path_to_tar.gz, repos = NULL, type = "source")

The full source code can be viewed either on the official GitHub CRAN repository: https://github.com/cran/naivebayes or on the development repository: https://github.com/majkamichal/naivebayes. After successful installation, the package can be used with:

    library(naivebayes)

3 Main functions
The general function
  • Image Smoothening and Sharpening Using Frequency Domain Filtering Technique
International Journal of Emerging Technologies in Engineering Research (IJETER), Volume 5, Issue 4, April (2017), www.ijeter.everscience.org

Image Smoothening and Sharpening using Frequency Domain Filtering Technique

Swati Dewangan, M.Tech. Scholar, Computer Networks, Bhilai Institute of Technology, Durg, India.
Anup Kumar Sharma, M.Tech. Scholar, Computer Networks, Bhilai Institute of Technology, Durg, India.

Abstract – Images are used in various fields to help monitoring processes such as images in fingerprint evaluation, satellite monitoring, medical diagnostics, underwater areas, etc. Image processing techniques are adopted as an optimized method to help the processing tasks efficiently. The development of image processing software helps the image editing process effectively. Image enhancement algorithms offer a wide variety of approaches for modifying original captured images to achieve visually acceptable images. In this paper, we apply frequency domain filters to generate an enhanced image. Simulation outputs result in noise reduction, contrast enhancement, smoothening and sharpening of the enhanced image.

Index Terms – Digital Image Processing, Fourier Transforms, High-pass Filters, Low-pass Filters, Image Enhancement.

The content of this paper is organized as follows: Section I gives an introduction to the topic and presents fundamental background. Section II describes the types of image enhancement techniques. Section III defines the operations applied for image filtering. Section IV shows results and discussions. Section V concludes the proposed approach and its outcome.

1.1 Digital Image Processing
Digital image processing is a part of signal processing which uses computer algorithms to perform image processing on digital images. It has numerous applications in different studies and researches of science and technology. The fundamental steps in digital image processing are image acquisition, image
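A minimal sketch of frequency-domain low-pass smoothing on a synthetic image, using numpy only. The Gaussian transfer function, cutoff value, and test image are assumptions for illustration; this is not the paper's exact pipeline.

```python
import numpy as np

# Synthetic noisy image: a bright square plus random noise.
rng = np.random.default_rng(0)
image = np.zeros((128, 128))
image[40:90, 40:90] = 1.0
image += 0.3 * rng.standard_normal(image.shape)

# Frequency-domain filtering: transform, multiply by H(u, v), transform back.
F = np.fft.fftshift(np.fft.fft2(image))

rows, cols = image.shape
u = np.arange(rows) - rows // 2
v = np.arange(cols) - cols // 2
U, V = np.meshgrid(u, v, indexing="ij")
D2 = U**2 + V**2                   # squared distance from the zero-frequency centre
cutoff = 20.0                      # assumed cutoff "radius" in frequency samples
H = np.exp(-D2 / (2 * cutoff**2))  # Gaussian low-pass: attenuates high frequencies

smoothed = np.real(np.fft.ifft2(np.fft.ifftshift(H * F)))
# Using the complement (1 - H) instead would emphasise edges, i.e. sharpening.
print(image.std(), smoothed.std())  # the smoothed image should show less variability
```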
  • Optimising the Smoothness and Accuracy of Moving Average for Stock Price Data
Technological and Economic Development of Economy, ISSN 2029-4913 / eISSN 2029-4921, 2018, Volume 24, Issue 3: 984–1003, https://doi.org/10.3846/20294913.2016.1216906

OPTIMISING THE SMOOTHNESS AND ACCURACY OF MOVING AVERAGE FOR STOCK PRICE DATA

Aistis Raudys1*, Židrina Pabarškaitė2
1Faculty of Mathematics and Informatics, Vilnius University, Naugarduko g. 24, LT-03225 Vilnius, Lithuania
2Faculty of Electrical and Electronics Engineering, Kaunas University of Technology, Studentų g. 48, LT-51367 Kaunas, Lithuania

Received 19 May 2015; accepted 05 July 2016

Abstract. Smoothing time series allows removing noise. Moving averages are used in finance to smooth stock price series and forecast trend direction. We propose an optimised custom moving average that is the most suitable for stock time series smoothing. Suitability criteria are defined by smoothness and accuracy. Previous research focused only on one of the two criteria in isolation. We define this as a multi-criteria Pareto optimisation problem and compare the proposed method to the five most popular moving average methods on synthetic and real world stock data. The comparison was performed using unseen data. The new method outperforms other methods in 99.5% of cases on synthetic and in 91% on real world data. The method allows better time series smoothing with the same level of accuracy as traditional methods, or better accuracy with the same smoothness. Weights optimised on one stock are very similar to weights optimised for any other stock and can be used interchangeably. Traders can use the new method to detect trends earlier and increase the profitability of their strategies. The concept is also applicable to sensors, weather forecasting, and traffic prediction where both the smoothness and accuracy of the filtered signal are important.
  • Adding Improvements to Multinomial Naive Bayes for Increasing the Accuracy of Aggressive Tweets Classification
ISSN (Online) 2278-1021 / ISSN (Print) 2319-5940, IJARCCE, International Journal of Advanced Research in Computer and Communication Engineering, Vol. 8, Issue 11, November 2019

Adding Improvements to Multinomial Naive Bayes for Increasing the Accuracy of Aggressive Tweets Classification

Divisha Bisht1, Sanjay Joshi2
1Student, Department of Information Technology, College of Technology, GBPUA&T, Pantnagar, India
2Assistant Professor, Department of Information Technology, College of Technology, GBPUA&T, Pantnagar, India

Abstract: Naïve Bayes is a popular supervised learning method widely used for text classification and sentiment analysis. There has been a rise of aggressive troll comments on social networking sites, which leads to online harassment and causes distressful online experiences. This paper uses a Naïve Bayes classifier with Bag of Words on the 'Tweets dataset for Detection of Cyber-Trolls' (dataset taken from Kaggle) and aims to improve the baseline model by adding cumulative changes and studying their impact on the performance of the model.

Keywords: Naïve Bayes, Classification, Improvements, Accuracy

I. INTRODUCTION
The prevalence, ease of access and anonymity in social media communities, supported by the all-dominating web of the internet, has led to a rise of online negative behaviour in the form of trolling. A regular influx of a large number of troll comments can be observed in most Social Networking Sites (SNS) like Twitter, Facebook, Reddit. The Collins English Dictionary defines a troll as "someone who posts unkind or offensive messages
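A short sketch of the kind of baseline described above: a bag-of-words multinomial Naïve Bayes classifier. scikit-learn and the toy texts are assumptions; the study itself works with the Kaggle cyber-troll tweet dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy examples standing in for the tweet dataset (label 1 = aggressive/troll, 0 = not).
texts = [
    "you are a pathetic loser",
    "nobody likes you, get lost",
    "have a great day everyone",
    "thanks for the helpful answer",
]
labels = [1, 1, 0, 0]

# Bag of Words + multinomial Naive Bayes; alpha is the additive (Laplace) smoothing
# term, one of the knobs an improved baseline might tune.
model = make_pipeline(CountVectorizer(lowercase=True), MultinomialNB(alpha=1.0))
model.fit(texts, labels)

print(model.predict(["you are such a loser", "thanks, that was helpful"]))  # likely [1 0]
```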