One of the first attempts to find topics from data is Latent Semantic Analysis (LSA): find the best low-rank approximation of a document-term matrix, computed via SVD.
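A minimal sketch of LSA on a toy document-term count matrix: truncate the SVD to rank $k$ to get the best rank-$k$ approximation (the counts here are made up for illustration).

```python
import numpy as np

# Hypothetical 4-document, 5-term count matrix (rows = documents).
X = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 0, 0, 0],
    [0, 0, 1, 2, 1],
    [0, 0, 2, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2  # number of latent "topics"
X_k = U[:, :k] * s[:k] @ Vt[:k, :]  # best rank-k approximation (Eckart-Young)

# Each document's coordinates in the k-dimensional latent space.
doc_embeddings = U[:, :k] * s[:k]
```

Documents that share vocabulary (the first two rows, and the last two) end up close together in the latent space even when they don't share every term.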
Latent because we use probabilistic inference to infer the missing pieces of the generative story.
Dirichlet because Dirichlet priors encode sparsity. Allocation because the Dirichlet distribution is the prior for each document’s allocation over topics.
Story
Inference (random variables)
Topic Assignments
Document Allocation
$$ \theta_{d,i}\approx\frac{N_{d,i}+\alpha_i}{\sum_k\left(N_{d,k}+\alpha_k\right)} $$
Topics
$$ \phi_{i,v}\approx\frac{V_{i,v}+\beta_v}{\sum_w\left(V_{i,w}+\beta_w\right)} $$
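The two estimates above can be computed directly from the count matrices produced by a Gibbs sweep. A sketch with made-up counts and symmetric hyperparameters (the names `N`, `V`, `alpha`, `beta` mirror the formulas):

```python
import numpy as np

alpha, beta = 0.1, 0.01
n_docs, n_topics, n_vocab = 3, 2, 4

rng = np.random.default_rng(0)
N = rng.integers(0, 5, size=(n_docs, n_topics))   # N[d, i]: words in doc d assigned to topic i
V = rng.integers(0, 5, size=(n_topics, n_vocab))  # V[i, v]: times word v assigned to topic i

# Posterior-mean estimates: smoothed counts, normalized per row.
theta = (N + alpha) / (N + alpha).sum(axis=1, keepdims=True)  # document allocation over topics
phi = (V + beta) / (V + beta).sum(axis=1, keepdims=True)      # topic distribution over words
```

Each row of `theta` and `phi` is a proper probability distribution; the Dirichlet parameters act as pseudo-counts that keep zero-count entries from becoming exact zeros.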
Assign word to a particular topic
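A hedged sketch of one collapsed Gibbs resampling step for a single word, assuming symmetric hyperparameters and the count matrices from above: remove the word's current assignment from the counts, compute the conditional $p(z=i)\propto(N_{d,i}+\alpha)\,\frac{V_{i,v}+\beta}{\sum_w V_{i,w}+\lvert\mathcal{V}\rvert\beta}$, sample a new topic, and add the word back. All counts are illustrative.

```python
import numpy as np

def resample_word(d, v, z_old, N, V, alpha, beta, rng):
    """Resample the topic of one occurrence of word v in document d."""
    n_vocab = V.shape[1]
    # Remove this word's current assignment from the counts.
    N[d, z_old] -= 1
    V[z_old, v] -= 1
    # Unnormalized conditional probability of each topic.
    p = (N[d] + alpha) * (V[:, v] + beta) / (V.sum(axis=1) + n_vocab * beta)
    z_new = rng.choice(len(p), p=p / p.sum())
    # Add the word back under its new topic.
    N[d, z_new] += 1
    V[z_new, v] += 1
    return z_new

rng = np.random.default_rng(1)
N = np.array([[3, 2]])            # one document, two topics
V = np.array([[2, 1, 2],
              [1, 1, 1]])         # two topics, three vocabulary words
z_new = resample_word(d=0, v=0, z_old=0, N=N, V=V, alpha=0.1, beta=0.01, rng=rng)
```

Sweeping this step over every word position in the corpus, then reading `theta` and `phi` off the counts, is the core of collapsed Gibbs sampling for LDA.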