Spoken Language Understanding II
NPFL099 - Statistical dialogue systems
Spoken language understanding II
Filip Jurčíček
Institute of Formal and Applied Linguistics
Charles University in Prague, Czech Republic
Home page: http://ufal.mff.cuni.cz/~jurcicek
Version: 13/03/2013
NPFL099 2013LS 1/41

Outline
● What is SLU?
● Parsers
● Semantic tuple classifiers
● Hidden Vector State
● PCCG

Spoken language understanding
● Definition
● SLU converts recognised speech into meaning
● We are looking for a mapping of
● I am looking for a Chinese restaurant
into
● inform(venue=restaurant)&inform(food=Chinese)

Meaning representation
● It is in the form of dialogue acts
● Each dialogue act is composed of:
● a dialogue act type:
– inform, request, confirm, select, affirm, deny, hello, bye, repeat, help, request_alternatives, etc.
● semantic information:
– venue=restaurant
– food=Chinese

Semantic tuple classifiers
● Developed for trees
● Based on a composition of simple classifiers for
● non-terminal and terminal nodes
● No word alignment is considered
● classifiers are conditioned on the complete sentence
F. Mairesse et al., "Spoken language understanding from unaligned data using discriminative classification models", in ICASSP '09: Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, 2009, pp. 4749–4752.

Trees and DAs
inform(venue="restaurant", food="Chinese", near="railway station")
CLASSIC notation!
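The dialogue-act notation above is easy to manipulate programmatically. A minimal sketch of a reader for strings such as inform(venue=restaurant)&inform(food=Chinese) (the helper name parse_da is ours; quoted values containing commas are not handled, and a valueless slot as in request(food) maps to an empty string):

```python
import re

def parse_da(da_string):
    """Parse a dialogue-act string such as
    'inform(venue=restaurant)&inform(food=Chinese)'
    into a list of (act_type, {slot: value}) tuples."""
    acts = []
    for item in da_string.split("&"):
        m = re.match(r'(\w+)\((.*)\)$', item)
        act_type, body = m.group(1), m.group(2)
        slots = {}
        for pair in filter(None, body.split(",")):
            slot, _, value = pair.partition("=")
            slots[slot.strip()] = value.strip().strip('"')
        acts.append((act_type, slots))
    return acts

print(parse_da('inform(venue=restaurant)&inform(food=Chinese)'))
```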
High precision mode
● dialogue act type is the root
● C(DAT|W)
● slot name is next
● C(NAME|DAT,W)
● slot value is next
● C(VALUE|NAME,W)
● High precision because
● slot names are conditioned on the DAT
inform(venue="restaurant", food="Chinese", near="railway station")

High recall mode
● dialogue act type is the root
● C(DAT|W)
● slot name and value are next
● C(NAME,VALUE|W)
● High recall because
● slots are conditioned only on W
● an error in the classification of the DAT does not propagate
inform(venue="restaurant", food="Chinese", near="railway station")

Choice of classifiers
● Arbitrary classifiers can be used
● Classification tasks
● 1 out of N – DAT
● 1 out of 2 – presence of a slot/value pair
● The original paper used SVMs (support vector machines)
● a kernel-based technique using training examples as data points
● a simple scalar product on features works well

Features
● Training classifiers in high recall mode
● C(DAT|W)
● C(NAME,VALUE|W)
● To condition on W, extract features
● bag-of-words trick
– N-grams
– N-grams from dependency trees, etc.
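In high recall mode, the decomposition is one multi-class classifier C(DAT|W) plus one binary classifier C(NAME,VALUE|W) per slot-value pair, all conditioned on features of the whole sentence. A minimal pure-Python sketch, with a perceptron standing in for the SVMs of the original paper and a toy training set invented for illustration:

```python
from collections import defaultdict

def ngrams(words):
    """Bag-of-n-grams features (unigrams + bigrams) for a sentence."""
    feats = set(words)
    feats.update(zip(words, words[1:]))
    return feats

class Perceptron:
    """Binary perceptron; stands in for the SVMs of Mairesse et al."""
    def __init__(self):
        self.w = defaultdict(float)
    def score(self, feats):
        return sum(self.w[f] for f in feats)
    def train(self, data, epochs=10):
        for _ in range(epochs):
            for feats, y in data:                 # y is +1 or -1
                if y * self.score(feats) <= 0:    # misclassified
                    for f in feats:
                        self.w[f] += y

# Toy training data: (sentence, DAT, set of slot-value pairs).
DATA = [
    ("i am looking for a chinese restaurant", "inform",
     {("food", "chinese"), ("venue", "restaurant")}),
    ("i want an italian restaurant", "inform",
     {("food", "italian"), ("venue", "restaurant")}),
    ("goodbye", "bye", set()),
]

all_dats = {d for _, d, _ in DATA}
all_svs = {sv for _, _, svs in DATA for sv in svs}

# High-recall mode: C(DAT|W) as one-vs-rest, one C(NAME,VALUE|W) per pair.
dat_clf, sv_clf = {}, {}
for dat in all_dats:
    dat_clf[dat] = Perceptron()
    dat_clf[dat].train([(ngrams(s.split()), 1 if d == dat else -1)
                        for s, d, _ in DATA])
for sv in all_svs:
    sv_clf[sv] = Perceptron()
    sv_clf[sv].train([(ngrams(s.split()), 1 if sv in svs else -1)
                      for s, _, svs in DATA])

def parse(sentence):
    """Classify the DAT, then test every slot-value pair independently."""
    feats = ngrams(sentence.split())
    dat = max(all_dats, key=lambda d: dat_clf[d].score(feats))
    slots = {sv for sv in all_svs if sv_clf[sv].score(feats) > 0}
    return dat, slots

print(parse("i want a chinese restaurant"))
```

Because each slot-value classifier is conditioned only on W, a DAT misclassification cannot remove correct slots, which is exactly the high-recall property described above.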
● Tuning
● features can be optimised for each classifier separately

Alternatives to SVM
● Naive Bayes
● 1 out of N
● Decision trees
● 1 out of N
● CART
● C4.5
● Decision forests
● combination of many trees
● increased robustness

Alternatives to SVM
● Logistic regression
● 1 out of 2
p(C=1|W) ≈ exp( Σ_{i=1}^{N} θ_i Φ_i(W) )
● Kernelised logistic regression
● 1 out of 2
p(C=1|W) ≈ exp( Σ_{i=1}^{N} θ_i k(W_i, W) )
k(W_i, W) = Φ(W_i) · Φ(W)   (dot-product kernel)

Summary of the classifiers
Name of classifier     Type         Output
Phoenix                handcrafted  deterministic
SVM                    data driven  deterministic with error margins
Naïve Bayes            data driven  probabilistic
Decision tree          data driven  deterministic; can be used for regression
Decision forests       data driven  probabilistic
Logistic regression    data driven  probabilistic
Kernelised regression  data driven  probabilistic

Other alternatives
● Weka toolkit
● KSTAR, IBK, JRIP
● mlpy.sourceforge.net
● Linear Discriminant Analysis (LDA), Basic Perceptron, Elastic Net, Logistic Regression, (Kernel) Support Vector Machines (SVM), Diagonal Linear Discriminant Analysis (DLDA), Golub Classifier, Parzen-based, (Kernel) Fisher Discriminant Classifier, k-Nearest-Neighbor, Iterative RELIEF, Classification Tree, Maximum Likelihood Classifier
● http://scikit-learn.sourceforge.net
● Lasso, Elastic Net, Least Angle Regression, LARS Lasso, Orthogonal Matching Pursuit (OMP), Bayesian Regression, Logistic Regression, Support Vector Machines, Stochastic Gradient Descent, Nearest Neighbors, Gaussian Processes, Partial Least Squares, Naive Bayes, Decision Trees, Ensemble methods, Feature selection

PCFG extension
● Probabilistic version of a CFG
● R → INFORM S
● R → REQUEST S
● S → S S
● S → A
● S → AV
● INFORM → looking
● REQUEST → can I get
● A → food
● AV → food = FV
● FV → Chinese
inform(venue="restaurant", food="Chinese", near="railway station")
I am looking for a Chinese restaurant near the
railway station.
● FV → Italian

PCFG extension
● We need to know the probabilities of:
● P(R → INFORM S | W)
● P(S → AV | W)
● P(AV → food = FV | W)
● P(FV → Italian | W)
● …
p(C=c|W) ≈ exp( Σ_{i=1}^{N} θ_i Φ_{c,i}(W) )
or any other probabilistic classifier
● W can be the whole input
● or only the part of the input spanned by the rule

PCFG approach is more general
● It can theoretically handle:
● I would like Chinese restaurants which are not cheap or expensive, or expensive hotels which are not near the city centre but near the south of the town.
● inform( venue = restaurant && food = Chinese && (price_range != cheap && price_range != expensive) || venue = hotel && price_range = expensive && (near != centre && near = south) )
● However, we do not need this at the moment

Hidden Vector State parser
● The PCFG structure is too complex
● arbitrary depth of a tree
● Let's limit the depth of the semantic tree
[Figure: limited-depth semantic tree with the concepts DEPARTURE, TO, STATION, TIME over the words "jede nějaký spěšný vlak do Prahy kolem čtvrté odpoledne" ('is there an express train to Prague around four in the afternoon')]
Y. He and S. Young (2005). "Semantic Processing using the Hidden Vector State Model." Computer Speech and Language 19(1): 85-106.
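The limited-depth tree can be represented by labelling each word with a vector state: the whole concept stack after popping n ∈ {0..3} concepts and pushing one new concept. A minimal sketch of the stack mechanics; the per-word operations, the DUMMY concept, and the alignment to the Czech example are invented for illustration:

```python
def hvs_states(words, ops, root="S"):
    """Apply HVS stack operations: for each word, pop n concepts
    (n in 0..3) and push one new concept; return the vector state
    (the whole stack) that labels each word."""
    stack = [root]
    states = []
    for word, (pop_n, concept) in zip(words, ops):
        assert 0 <= pop_n <= 3 and pop_n < len(stack)  # never pop the root
        del stack[len(stack) - pop_n:]                 # pop
        stack.append(concept)                          # push
        states.append((word, tuple(stack)))
    return states

# "is there an express train to Prague around four in the afternoon"
words = "jede nějaký spěšný vlak do Prahy kolem čtvrté odpoledne".split()
ops = [
    (0, "DEPARTURE"),  # jede
    (0, "DUMMY"),      # nějaký
    (1, "DUMMY"),      # spěšný
    (1, "DUMMY"),      # vlak
    (1, "TO"),         # do
    (0, "STATION"),    # Prahy
    (2, "TIME"),       # kolem
    (1, "TIME"),       # čtvrté
    (1, "TIME"),       # odpoledne
]

for word, state in hvs_states(words, ops):
    print(word, "/".join(state))
```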
Pushdown automaton approximation
● Push a new concept for each input word
● Pop 0, 1, 2, or 3 concepts from the current stack

HVS training
● Given an utterance
● and an unaligned state sequence
● Train a probabilistic model of
● observations – P(w|c1)
● push – P(c1|c2,c3,c4)
● pop – P(n|c1,c2,c3,c4)
● EM algorithm

HVS summary
● Although well received by the community
● it is not very good
● The learned model is biased towards unimportant common words
● In English: a, the, an
● The reason:
● generative structure
● the articles appear in almost all utterances
● P(w = 'a' | c) is the most reliably estimated probability
● So the automatically inferred alignment
● aligns some concepts with articles, etc.

Processing multiple hypotheses
● ASR provides an N-best list
● 0.33 – I am looking for a bar
● 0.26 – I am looking for the bar
● 0.11 – I am looking for a car
● 0.09 – I am looking for the car
● ...
● How do we get:
● 0.59 – inform(task=find, venue=bar)
● 0.20 – null()
● ...

Processing multiple hypotheses
● Semantic parser: P(d|w)
● Automatic speech recognition: P(w|a)
● We want to get: P(d|a)
● where
● d – dialogue act
● w – word sequence
● a – audio signal

Processing multiple hypotheses
● ASR provides multiple word sequence hypotheses
● we have to sum over them
P(d|a) = Σ_w P(d|w) P(w|a)
● Algorithm
● Compute the semantic interpretation for every word sequence
● Weight it by the probability of the word sequence
● Merge identical dialogue acts and sum their probabilities
Alternative
● ASR provides P(w|a)
● map directly from the probability distribution to dialogue acts
P(d|a) = P(d | P(w|a))
P(d|a) ≈ exp( θ_d^T · Φ(P(w|a)) )
● P(w|a) can be compactly represented in the form of a confusion network

Confusion network
● N-best list
● 0.33 – I am looking for a bar
● 0.26 – I am looking for the bar
● 0.11 – I am looking for a car
● 0.09 – I am looking for the car
● Confusion network (one column of alternatives per word position)
I – 0.9 | my – 0.07 | hi – 0.02 | ε – 0.01
am – 0.9 | ε – 0.1
looking – 1.0
for – 1.0
a – 0.6 | the – 0.4
bar – 0.5 | car – 0.4 | ε – 0.1

Features from an utterance
● 1-best utterance
● I am looking for a bar
● E.g. bigrams
● (I,am) = 1
● (am,looking) = 1
● (looking,for) = 1
● (for,a) = 1
● (a,bar) = 1

Features from a CN
● CN (one column of alternatives per word position)
I – 0.9 | my – 0.07 | hi – 0.02 | ε – 0.01
am – 0.9 | ε – 0.1
looking – 1.0
for – 1.0
a – 0.6 | the – 0.4
bar – 0.5 | car – 0.4 | ε – 0.1
● E.g. bigram features Φ_d(P(w|a))
● (I,am) = 0.81
● (my,am) = 0.063
● (looking,for) = 1
● (a,bar) = 0.3
● (a,car) = 0.24
● (a,eps) = 0.06
● Normalise for the length of the N-grams

Thank you!
Filip Jurčíček
Institute of Formal and Applied Linguistics
Charles University in Prague, Czech Republic
Home page: http://ufal.mff.cuni.cz/~jurcicek

SLU exercise
● Build an SLU component using a technique of your choice
● E.g.
● Phoenix parser
● TBL
● SVM
● Decision trees
– Random (Decision) forests
● Conditional Random Fields
● Template-based matching
– clustering
● CFG based

Dialogue acts in data
● The format is slightly different
● Each slot-value pair has a DAT
● perfect thank you goodbye
● bye()&thankyou()
● i'd like an english restaurant that plays folk music in the north part of town
● inform(area="north")&inform(food="english")&inform(music="folk")&inform(type="restaurant")
● and what type of restaurant is the grand
● inform(name="the grand")&request(food)

Provided data
● All data
● do not distribute
● use only for NPFL099
● Original CUED data
● in SDS/applications/TownInfo/cued_data
● Processed data in the new format
● run ./cued-sem2ufal-sem.py in SDS/applications/TownInfo
● new data in SDS/applications/TownInfo/data
● I provide the data already in the new format

Provided data
● Data: train, dev, test
● Data: asr, transcribed
● Files:
● auto_database.py
● database.py
● ..
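As a starting point for the exercise, a trivial template-based matcher producing the data's DA format could look like this. All patterns and their coverage are invented for illustration; a serious solution should use one of the data-driven techniques from the lecture:

```python
import re

# Invented keyword patterns -> dialogue-act items, in the data format
# shown above, e.g. inform(area="north")&inform(food="english").
RULES = [
    (r"\bgoodbye\b", 'bye()'),
    (r"\bthank you\b", 'thankyou()'),
    (r"\b(english|chinese|italian)\b",
     lambda m: 'inform(food="%s")' % m.group(1)),
    (r"\b(north|south|east|west)\b",
     lambda m: 'inform(area="%s")' % m.group(1)),
    (r"\brestaurant\b", 'inform(type="restaurant")'),
]

def parse(utterance):
    """Fire every matching template and join the items alphabetically,
    as in the provided data."""
    items = []
    for pattern, act in RULES:
        m = re.search(pattern, utterance.lower())
        if m:
            items.append(act(m) if callable(act) else act)
    return "&".join(sorted(items))

print(parse("perfect thank you goodbye"))
# bye()&thankyou()
```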