View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by St Andrews Research Repository Computation of Infix Probabilities for Probabilistic Context-Free Grammars Mark-Jan Nederhof Giorgio Satta School of Computer Science Dept. of Information Engineering University of St Andrews University of Padua United Kingdom Italy
[email protected] [email protected] Abstract or part-of-speech, when a prefix of the input has al- ready been processed, as discussed by Jelinek and The notion of infix probability has been intro- Lafferty (1991). Such distributions are useful for duced in the literature as a generalization of speech recognition, where the result of the acous- the notion of prefix (or initial substring) prob- tic processor is represented as a lattice, and local ability, motivated by applications in speech recognition and word error correction. For the choices must be made for a next transition. In ad- case where a probabilistic context-free gram- dition, distributions for the next word are also useful mar is used as language model, methods for for applications of word error correction, when one the computation of infix probabilities have is processing ‘noisy’ text and the parser recognizes been presented in the literature, based on vari- an error that must be recovered by operations of in- ous simplifying assumptions. Here we present sertion, replacement or deletion. a solution that applies to the problem in its full generality. Motivated by the above applications, the problem of the computation of infix probabilities for PCFGs has been introduced in the literature as a generaliza- 1 Introduction tion of the prefix probability problem.