An Unsupervised Topic Segmentation Model Incorporating Word Order

Shoaib Jameel and Wai Lam

The Chinese University of Hong Kong

One-line summary of the work: We show how maintaining document structure, such as paragraphs and sentences, together with word order, improves the performance of a topic segmentation model.

Outline

Motivation
Related Work
  - Probabilistic Unigram Topic Model (LDA)
  - Probabilistic N-gram Topic Models
  - Topic Segmentation Models
Overview of Our Model
  - Our N-gram Topic Segmentation model (NTSeg)
Experiments with NTSeg
  - Word-Topic and Segment-Topic Correlation Graph
  - Topic Segmentation Experiment
  - Document Classification Experiment
  - Document Likelihood Experiment
Conclusions and Future Directions

Motivation

Many works in the topic modeling literature assume exchangeability among the words. As a result, we see many ambiguous words in topics. For example, consider a few topics obtained from the NIPS collection using the Latent Dirichlet Allocation (LDA) model:

Example

Topic 1        Topic 2    Topic 3        Topic 4    Topic 5
architecture   order      connectionist  potential  prior
recurrent      first      role           membrane   bayesian
network        second     binding        current    data
module         analysis   structures     synaptic   evidence
modules        small      distributed    dendritic  experts

The problem with the LDA model: the words in its topics are not insightful.

Latent Dirichlet Allocation Model (LDA) (Blei et al., JMLR-2003)

Generative Process

1. Draw θ(d) from Dirichlet(α), where θ(d) is the topic distribution for document d
2. Draw φ(z) from Dirichlet(β), where φ(z) is the word distribution for topic z
3. For every word i in document d:
   1. Draw a topic z_i(d) from Multinomial(θ(d))
   2. Draw a word w_i(d) from Multinomial(φ(z_i(d)))

[Graphical model in plate notation: α → θ → z → w ← φ ← β, with N words per document, M documents, and Z topics.]
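A minimal sketch of this generative process in Python; the dimensions and hyperparameter values are illustrative, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

V, Z, N = 1000, 20, 150      # vocabulary size, topics, words in one document (illustrative)
alpha, beta = 0.1, 0.01      # symmetric Dirichlet hyperparameters (illustrative)

phi = rng.dirichlet(beta * np.ones(V), size=Z)   # step 2: word distribution for each topic
theta = rng.dirichlet(alpha * np.ones(Z))        # step 1: topic distribution for one document

doc = []
for _ in range(N):                               # step 3: generate each word
    z = rng.choice(Z, p=theta)                   # 3.1: draw a topic
    w = rng.choice(V, p=phi[z])                  # 3.2: draw a word from that topic
    doc.append(w)
```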

Relaxing the Bag-of-Words Assumption in a Topic Model

Can the bag-of-words assumption be relaxed in a topic model? Relaxing it is natural: word order is how documents are actually written by humans, and how they are read.

[Figure: on the left, a bag-of-words view of a short text, with the words ("dog", "cat", "noticed", "understand", "things", ...) scrambled; on the right, the ordered text: "Here things are a lot more organized. I can understand that it was actually the cat which noticed a dog."]

Figure: This is how the bag-of-words representation looks (left): complete chaos. The one on the right makes more sense to us.

Relaxing the Bag-of-Words Assumption: Bigram Topic Model (BTM) (Wallach, ICML-2006)

Some properties of the model:
- Each word is generated by both its topic and the previous word
- Inspired by the hierarchical Dirichlet language model
- Better empirical results than the LDA model

A limitation of the model:
- Every word is always conditioned on the previous word, so the model always generates bigrams

[Graphical model of BTM in plate notation: topics z_i(d) drawn from θ (prior α); each word w_i(d) depends on its topic and on w_{i-1}(d), through word distributions δ (prior σ) indexed by topic and previous word (Z×V); M documents.]

Relaxing the Bag-of-Words Assumption: LDA-Collocation Model (LDACOL) (Griffiths et al., Psy. Rev-2007)

Some properties of the model:
- Each word is generated by the topic, the previous word, and a binary bigram status variable
- Each word has a topic assignment and a collocation assignment
- Can form longer-order phrases
- Can generate both unigram and bigram words

A limitation of the model:
- Only the first word in a bigram has a topic assignment

[Graphical model of LDACOL in plate notation: topics z_i(d) from θ (prior α); bigram status variables x_i(d) drawn from ψ (prior γ), conditioned on the previous word (V of them); words drawn from φ (prior β, Z topics) when x = 0, or from the previous-word distributions δ (prior σ, V of them) when x = 1; M documents.]

Relaxing the Bag-of-Words Assumption: Topical N-Gram Model (TNG) (Wang et al., ICDM-2007)

Some properties of the model:
- Extends the LDACOL model
- Each word has a topic assignment and a collocation assignment
- Can form longer-order phrases
- Can generate both unigram and bigram words

A limitation of the model:
- Words in a bigram may have different topic assignments

[Graphical model of TNG in plate notation: as in LDACOL, but the bigram status distributions ψ (prior γ) and the bigram word distributions δ (prior σ) are indexed by both topic and previous word (Z×V).]

What Topic N-gram Models Do: An Illustration

Para. 1 (Abstract): "We give necessary and sufficient conditions for uniqueness of the support vector solution for the problems of pattern recognition and regression estimation, for a general class of cost functions. We show that if the solution is not unique, all support vectors are necessarily at bound, and we give some simple examples of non-unique solutions. We note that uniqueness of the primal (dual) solution does not necessarily imply uniqueness of the dual (primal) solution. We show how to compute the threshold b when the solution is unique, but when all support vectors are bound, in which case the usual method for determining b does not work..."

Para. 2 (Acknowledgements and References): "C. Burges wishes to thank W. Keasler, V. Lawrence and C. Nohl of Lucent Technologies for their support. Reference: [1] R. Fletcher, Practical Methods of Optimization. John Wiley and Sons, Inc., 2nd edition, 1987."

Topic 1          Topic 2
support vector   acknowledgements
cost functions   reference

N-gram topic models consider the document as a whole and find topical n-grams in it.

Bag of Words in Topic Segmentation

Topic segmentation models:
- Maintain the document structure, such as paragraphs or sentences
- Assume that words within a paragraph or a sentence are exchangeable
- Introduce the notion of segment-topics (or super-topics) and word-topics

[Figure: paragraph n and paragraph n+1 of document d, with the words inside each paragraph scrambled ("follow", "this", "is", "will", "next", "paragraph", ...): word order within a segment is ignored.]

A Topic Segmentation Model (LDSEG) (Shafiei et al., Canadian AI-2008)

Model properties:
- Performs topic segmentation
- Can work at the paragraph and sentence level
- A binary variable c indicates segment-wise topic change-points
- Segments come from a predefined number of super-topics
- The super-topics comprise a mixture of word-topics

[Graphical model in plate notation: per document, super-topic indicators y with change variables c (priors τ, ρ, Ω, π); per segment, topic proportions θ (prior α) generating word-topics z and words w via φ (prior β); N words per segment, S segments, M documents, Z word-topics.]

A Topic Segmentation Model (LDSEG) (Shafiei et al., Canadian AI-2008)

Model properties:
- The word-topic region of the model is similar to the LDA model
- Segments exhibit multiple topics
- Words are generated from a predefined number of word-topics

[Graphical model as on the previous slide.]

Topic Segmentation Illustration Using the LDSEG Model

Para. 1 (Abstract, assigned Super-Topic 7): "We give necessary and sufficient conditions for uniqueness of the support vector solution ... in which case the usual method for determining b does not work..."

Para. 2 (Acknowledgements and References, assigned Super-Topic 4): "C. Burges wishes to thank W. Keasler, V. Lawrence and C. Nohl of Lucent Technologies for their support. ..."

Word-Topic 1   Word-Topic 2       Super-Topic 1   Super-Topic 2
support        acknowledgements   Word-Topic 1    Word-Topic 5
vector         reference          Word-Topic 9    Word-Topic 8
cost                              Word-Topic 12   Word-Topic 1

- Performs topic segmentation
- Unigram words are assigned to the word-topics
- Segments are assigned to the document-topics, or super-topics

Our Research Contributions

- We propose a model called NTSeg
- Our proposed model maintains the document structure, such as paragraphs and sentences
- It maintains the order of the words in the document
- It detects and coordinates two topic granularity levels:
  - Segment-Topics
  - Word-Topics
- We derive the posterior inference scheme
- We conducted extensive text mining experiments:
  - Shown improvement over state-of-the-art models

Our Proposed Model (NTSeg)

[Graphical model of NTSeg in plate notation: per document d, a chain of segment-topics y_s(d) with change-point indicators c_s(d) (priors Ω, ρ and document-level π(d), τ(d), with K segment-topics); per segment s, topic proportions θ_s(d) (prior α) generate word-topics z_{s,n}(d); bigram status variables x_{s,n}(d) link consecutive words w_{s,n}(d); word-level priors β (over φ, Z word-topic distributions), γ (over ψ, the bigram-status distributions, Z×V), and σ (over δ, the bigram word distributions, Z×V); M documents.]


Few Properties of NTSeg

- Segments are assigned to the segment-topics
- We assume a Markov property on the segment-topics y_s(d)
- c_s(d) denotes the segment-topic change-points (see the sketch below)
- Segments can be taken as paragraphs or sentences
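A small sketch of the segment-topic chain these properties imply. The change probability and dimensions are illustrative, and the convention (c = 1 means "no change", matching equation (2) later) is our reading of the model:

```python
import numpy as np

rng = np.random.default_rng(1)
K, num_segments = 10, 8            # segment-topics and segments (illustrative)
pi = rng.dirichlet(np.ones(K))     # segment-topic proportions for one document

y, c = [], []
for s in range(num_segments):
    c_s = 0 if s == 0 else rng.binomial(1, 0.5)   # change-point indicator (0.5 illustrative)
    if c_s == 1 and s > 0:
        y.append(y[s - 1])                        # no topic change: inherit previous segment-topic
    else:
        y.append(rng.choice(K, p=pi))             # change-point: draw a fresh segment-topic
    c.append(c_s)
```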


Some Properties of NTSeg

- Does not break the order of the words
- Can form unigrams, bigrams, and higher-order phrases (using the x variable)
- All words in a phrase share the same word-topic

[Graphical-model fragment: topic proportions θ generate word-topics z_{i-1}, z_i, z_{i+1}, z_{i+2}; bigram status variables x_i, x_{i+1}, x_{i+2} sit between consecutive words w_{i-1}, w_i, w_{i+1}, w_{i+2}.]

Example: for "Irish cricket team", each word is assigned word-topic 7, and the status variables between consecutive words are set to x = 1, chaining the three words into a single phrase with one shared topic.

Figure: This is how we form longer phrases.
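A sketch of how the bigram-status variables chain words into topical phrases, as in the figure above. The data layout is assumed: parallel lists of words, word-topics z, and indicators x, where x[i] = 1 glues word i to word i-1:

```python
def extract_phrases(words, z, x):
    """Group consecutive words with x = 1 into one phrase; all its words share one word-topic."""
    phrases, current = [], [0]
    for i in range(1, len(words)):
        if x[i] == 1:            # word i continues the phrase started earlier
            current.append(i)
        else:                    # phrase boundary: emit the phrase and start a new one
            phrases.append((" ".join(words[j] for j in current), z[current[0]]))
            current = [i]
    phrases.append((" ".join(words[j] for j in current), z[current[0]]))
    return phrases

# The slide's example: every word carries word-topic 7 and both x flags are 1.
print(extract_phrases(["Irish", "cricket", "team"], z=[7, 7, 7], x=[0, 1, 1]))
# -> [('Irish cricket team', 7)]
```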

Technical Challenges for NTSeg

- Sharing of the same word-topic among words in a phrase
- Coordination of segment-topics and word-topics
- Derivation of the posterior inference scheme:
  - Gibbs sampling update equations

Posterior Inference: Gibbs Sampling

Sampling word-topic assignments:

$$
P(z_{si}^{(d)}, x_{si}^{(d)} \mid \mathbf{w}, \mathbf{z}_{\neg si}^{(d)}, \mathbf{x}_{\neg si}^{(d)}, \mathbf{y}, \mathbf{c}, \alpha, \beta, \gamma, \delta, \rho, \Omega) \propto
\left(\alpha_{y_s^{(d)} z_{si}^{(d)}} + h^{(d)}_{s z_{si}^{(d)}} - 1\right)
\left(\gamma_{x_{si}^{(d)} z_{s,i-1}^{(d)} w_{s,i-1}^{(d)}} + p^{(d)}_{z_{s,i-1}^{(d)} w_{s,i-1}^{(d)} x_{si}^{(d)}} - 1\right)
\times
\begin{cases}
\dfrac{\beta_{w_{si}^{(d)}} + n_{z_{si}^{(d)} w_{si}^{(d)}} - 1}{\sum_{v=1}^{V} \left(\beta_v + n_{z_{si}^{(d)} v}\right) - 1} & \text{if } x_{si}^{(d)} = 0 \\[2ex]
\dfrac{\delta_{w_{si}^{(d)}} + m_{w_{s,i-1}^{(d)} z_{si}^{(d)} w_{si}^{(d)}} - 1}{\sum_{v=1}^{V} \left(\delta_v + m_{w_{s,i-1}^{(d)} z_{si}^{(d)} v}\right) - 1} & \text{if } x_{si}^{(d)} = 1 \text{ and } z_{si}^{(d)} = z_{s,i-1}^{(d)}
\end{cases}
\quad (1)
$$
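A sketch of how equation (1) becomes an unnormalized sampling weight inside a collapsed Gibbs sweep. The count arrays (`n_zw`, `m_pzw`, `h_sz`, `p_cnt`) and symmetric hyperparameters are our assumed bookkeeping, not the authors' code:

```python
def word_topic_weight(z, x, w, w_prev, y_s, V,
                      n_zw, m_pzw, h_sz, p_cnt,
                      alpha, beta, gamma, delta):
    """Unnormalized weight for assigning (z, x) to one word position, per equation (1).
    Counts (numpy arrays) include the current assignment, hence the -1 terms."""
    weight = (alpha + h_sz[y_s, z] - 1) * (gamma + p_cnt[x] - 1)
    if x == 0:      # unigram status: the word is drawn from topic z's unigram distribution
        weight *= (beta + n_zw[z, w] - 1) / (V * beta + n_zw[z].sum() - 1)
    else:           # bigram status: the word is drawn given the previous word and shared topic z
        weight *= (delta + m_pzw[w_prev, z, w] - 1) / (V * delta + m_pzw[w_prev, z].sum() - 1)
    return weight
```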

Posterior Inference: Gibbs Sampling

Sampling segment-topic assignments:

$$
P(y_s^{(d)}, c_s^{(d)} \mid \mathbf{z}, \mathbf{y}_{\neg s}^{(d)}, \mathbf{c}_{\neg s}^{(d)}, \mathbf{w}, \mathbf{x}, \alpha, \beta, \gamma, \delta, \rho, \Omega) \propto
\begin{cases}
\left(\rho_{y_s^{(d)}} + b^{(d)}_{y_s^{(d)}} - 1\right)
\left(\alpha_{y_s^{(d)} z_{si}^{(d)}} + h^{(d)}_{s z_{si}^{(d)}} - 1\right)
\dfrac{\kappa^{(d)}_{c_s,0} + \Omega_0}{\sum_{x=0}^{1} \kappa^{(d)}_{c_s,x} + \Omega_0 + \Omega_1} & \text{if } c_s^{(d)} = 0 \\[2ex]
\left(\alpha_{y_s^{(d)} z_{si}^{(d)}} + h^{(d)}_{s z_{si}^{(d)}} - 1\right)
\dfrac{\kappa^{(d)}_{c_s,1} + \Omega_1}{\sum_{x=0}^{1} \kappa^{(d)}_{c_s,x} + \Omega_0 + \Omega_1} & \text{if } c_s^{(d)} = 1,\ s > 1,\ y_s^{(d)} = y_{s-1}^{(d)}
\end{cases}
\quad (2)
$$
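The segment-level counterpart as a sketch under the same assumed bookkeeping; `word_topic_factor(y)` stands in for the product of α-terms over the segment's words in equation (2):

```python
def segment_topic_weight(y, c, y_prev, s, rho, omega0, omega1,
                         b_y, kappa, word_topic_factor):
    """Unnormalized weight for assigning (y, c) to segment s, per equation (2).
    b_y and kappa are numpy count arrays; conventions are our reading of the model."""
    change_term = (kappa[c] + (omega0 if c == 0 else omega1)) / (kappa.sum() + omega0 + omega1)
    if c == 0:                        # change-point: a fresh segment-topic y is drawn
        return (rho + b_y[y] - 1) * word_topic_factor(y) * change_term
    if s > 1 and y == y_prev:         # no change: the segment inherits the previous segment-topic
        return word_topic_factor(y) * change_term
    return 0.0                        # any other (y, c) combination is inconsistent
```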

NTSeg Word-Topic and Segment-Topic Illustration

Para. 1 (Abstract, assigned Segment-Topic 7): "We give necessary and sufficient conditions for uniqueness of the support vector solution ... in which case the usual method for determining b does not work..."

Para. 2 (Acknowledgements and References, assigned Segment-Topic 4): "C. Burges wishes to thank W. Keasler, V. Lawrence and C. Nohl of Lucent Technologies for their support. ..."

Word-Topic 1     Word-Topic 2       Segment-Topic 1   Segment-Topic 2
support vector   acknowledgements   Word-Topic 1      Word-Topic 5
cost functions   reference          Word-Topic 9      Word-Topic 8
                                    Word-Topic 12     Word-Topic 1

- Performs document segmentation based on topic
- N-gram words are assigned to the word-topics
- Segments are assigned to the segment-topics

Word-Topic and Segment-Topic Correlation Graph

- We used a large dataset, OHSUMED:
  - OHSUMED consists of 348,566 medical abstracts
- The idea is to show the discovery of n-gram topic words via the correlation graph

Word-Topic and Segment-Topic Correlation Graph: Result of NTSeg

[Figure: correlation graph excerpt, showing word-topics with n-gram words such as "hiv", "human immunodeficiency virus", "aids", "infected", "infection", "vaccine", "antibody response", "immunization", "protective"; "fetal", "umbilical artery", "fetal heart"; "health", "children", "family physicians", "primary care", "family practice"; "sexual abuse", "young children", "women", "pregnant woman"; "infants", "gestational age", "preterm infants", "confidence interval", "risk factors".]

Topic Correlation Graph: Correlation Graph from GD-LDA (Caballero et al., CIKM-2012)

[Figure: correlation graph from GD-LDA, with unigram-only topic words such as "computer", "internet", "technology", "information", "system"; "patient", "disease", "medical", "drug"; "people", "work", "economy", "bank", "investor", "price", "percent"; "clinton", "president", "military", "nato", "nuclear", "politic", "chechnya".]

Topic Segmentation Experiment

- The ordering of c_s(d) gives the topic change-points in the document
- We used two benchmark datasets, Books and Lectures:
  - Books dataset: a medical textbook with 227 chapters of about 140 sentences each
  - Lectures dataset: undergraduate lecture recordings of physics and AI classes; each lecture is about 90 minutes, 700 sentences, and 8,500 words
- A segment here is a sentence
- Comparative method: the TopicTiling algorithm (Riedl and Biemann, ACL-2012)
- We used two commonly used evaluation metrics (see the sketch below):
  - Pk: the probability that two sentences drawn randomly from a document are incorrectly identified as belonging to the same segment
  - WinDiff: moves a sliding window across the text and counts the number of times the hypothesized and reference segment boundaries within the window differ
- Both metrics estimate error, so lower is better
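A minimal sketch of the WinDiff computation over binary boundary sequences (1 = a segment boundary after that sentence); Pk can be computed with a similar sliding window. The window-size heuristic is a common choice, not necessarily the paper's:

```python
def windiff(reference, hypothesis, k=None):
    """WindowDiff: slide a window of size k and count windows where the number of
    reference boundaries differs from the number of hypothesized boundaries."""
    n = len(reference)
    if k is None:                 # common choice: half the mean reference segment length
        k = max(2, n // (2 * max(1, sum(reference))))
    errors = sum(
        sum(reference[i:i + k]) != sum(hypothesis[i:i + k])
        for i in range(n - k)
    )
    return errors / (n - k)

# Toy example: reference boundaries after sentences 3 and 7; the hypothesis misses one.
ref = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
hyp = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(windiff(ref, hyp))          # -> 0.25
```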

Topic Segmentation Results: Books Dataset

[Figure: Pk (top row) and WinDiff (bottom row) on the Books dataset for 10, 30, and 50 segment-topics (K), with word-topics (Z) varied from 20 to 100; Pk stays roughly within 0.310-0.330 and WinDiff within 0.330-0.350.]

Topic Segmentation Results: Lectures Dataset

[Figure: Pk (top row) and WinDiff (bottom row) on the Lectures dataset for 10, 30, and 50 segment-topics (K), with word-topics (Z) varied from 20 to 100; Pk stays roughly within 0.350-0.380 and WinDiff within 0.430-0.461.]

Document Classification Experiment: Dataset

- We generate four datasets from the 20 Newsgroups data
- The datasets are:
  - Computer
  - Politics
  - Sports
  - Science
- Each dataset comprises an equal number of documents from several classes. For example, the Computer dataset consists of the following classes:
  - Graphics
  - Hardware
  - X Windows
  - Mac
  - Microsoft Windows

Document Classification Experiment: Experimental Setup

- Split each dataset into a training and a test set, maintaining the class distribution:
  - We used 75% for training and 25% for testing in our experiments
- For each class, we train a topic model on the training set
- During classification, we compute the likelihood of each test document under each class's topic model
- A test document is assigned to the class whose topic model gives it the maximum likelihood (see the sketch below)
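A sketch of this decision rule; `train_topic_model` and `log_likelihood` are hypothetical stand-ins for per-class NTSeg training and held-out scoring:

```python
def train_classifiers(train_docs_by_class):
    # One topic model per class, fit on that class's 75% training split.
    return {label: train_topic_model(docs)
            for label, docs in train_docs_by_class.items()}

def classify(doc, models):
    # Assign the test document to the class whose model explains it best.
    return max(models, key=lambda label: log_likelihood(models[label], doc))
```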

Evaluation metrics:
- Standard precision, recall, and F-measure for each class
- Macro-averaging scheme

Document Classification Experiment: Comparative Methods

- LDSEG: Latent Dirichlet segmentation method (word-topics and super-topics) (Shafiei et al., Canadian AI-2008)
- PAM: Pachinko Allocation Model (super-topics and word-topics) (Li and McCallum, ICML-2006)
- LDACOL: LDA Collocation Model (n-gram topic model) (Griffiths et al., Psy. Rev-2007)
- TNG: Topical N-gram Model (n-gram topic model) (Wang et al., ICDM-2007)
- PDLDA: phrase-discovering topic model based on the Pitman-Yor process (Lindsey et al., EMNLP-2012)

Document Classification Experiment: Results

Computer dataset:
Method   Precision  Recall  F-Measure
LDSEG    0.580      0.420   0.487
PAM      0.550      0.450   0.495
LDACOL   0.400      0.300   0.343
TNG      0.490      0.420   0.452
PDLDA    0.580      0.500   0.537
NTSeg    0.640      0.520   0.574

Science dataset:
Method   Precision  Recall  F-Measure
LDSEG    0.440      0.400   0.419
PAM      0.500      0.330   0.398
LDACOL   0.420      0.370   0.393
TNG      0.560      0.470   0.511
PDLDA    0.580      0.510   0.543
NTSeg    0.620      0.560   0.588

Politics dataset:
Method   Precision  Recall  F-Measure
LDSEG    0.390      0.320   0.352
PAM      0.540      0.490   0.514
LDACOL   0.550      0.410   0.470
TNG      0.550      0.450   0.495
PDLDA    0.590      0.410   0.484
NTSeg    0.620      0.570   0.594

Sports dataset:
Method   Precision  Recall  F-Measure
LDSEG    0.330      0.320   0.325
PAM      0.368      0.360   0.363
LDACOL   0.200      0.180   0.189
TNG      0.340      0.290   0.313
PDLDA    0.380      0.210   0.271
NTSeg    0.420      0.380   0.399

Document Modeling Experiment

NIPS dataset:

[Figure: held-out document log-likelihood (×10^6, roughly -8.6 to -8.8) on the NIPS dataset for 10 and 50 segment-topics (K), with word-topics (Z) varied from 50 to 200, comparing NTSeg, LDSEG, PAM, LDACOL, TNG, and PDLDA.]

Document Modeling Experiment: Results

OHSUMED dataset (348,566 medical abstracts):

[Figure: held-out document log-likelihood (×10^7, roughly -3.15 to -3.30) on the OHSUMED dataset for 50 and 150 segment-topics (K), with word-topics (Z) varied from 200 to 500, comparing NTSeg, LDSEG, PAM, LDACOL, TNG, and PDLDA.]

Concluding Remarks

We have presented a topic segmentation model that:
- Maintains the document structure, such as paragraphs and sentences
- Keeps the order of the words intact

We have applied our model to a multitude of text mining tasks:
- We obtained good improvements over state-of-the-art models

Future direction: a nonparametric topic segmentation model. We wish to automatically find the number of latent topics that describes the collection, instead of manually supplying and varying it.

Acknowledgments

Special thanks to SIGIR (the organization) for the Student Travel Award, and to the local organizers for waiving my student registration fee (and for helping me keep busy for about 8 hours during the conference). The experience has been rewarding.

References

Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. JMLR, 3, 993-1022.
Wallach, H. M. (2006). Topic modeling: beyond bag-of-words. In Proc. of ICML (pp. 977-984).
Griffiths, T. L., Steyvers, M., and Tenenbaum, J. B. (2007). Topics in semantic representation. Psychological Review, 114(2), 211.
Wang, X., McCallum, A., and Wei, X. (2007). Topical n-grams: phrase and topic discovery, with an application to information retrieval. In Proc. of ICDM (pp. 697-702).
Caballero, K. L., Barajas, J., and Akella, R. (2012). The generalized Dirichlet distribution in enhanced topic detection. In Proc. of CIKM (pp. 773-782).
Riedl, M., and Biemann, C. (2012). TopicTiling: a text segmentation algorithm based on LDA. In Proc. of ACL (pp. 37-42).
Lindsey, R. V., Headden, W. P., and Stipicevic, M. J. (2012). A phrase-discovering topic model using hierarchical Pitman-Yor processes. In Proc. of EMNLP (pp. 214-222).
Li, W., and McCallum, A. (2006). Pachinko allocation: DAG-structured mixture models of topic correlations. In Proc. of ICML (pp. 577-584).
