The Mixed Membership Stochastic Blockmodel (MMSB) [Airoldi et al., 2008] is a generative model for links in a network. Simply put, given an adjacency matrix $Y$, MMSB assumes $Y$ is governed by the memberships/topics of each node. Since the original paper only introduced Variational Inference, this post derives a Gibbs Sampling method for the purpose of parameter estimation in MMSB (actually, the MMSB presented in this post is slightly different from the original, as I put a prior on the interaction matrix $B$).
The generative process of MMSB is given below:
For each node $i$:
Draw a membership distribution vector $\pi_i \sim \mathrm{Dirichlet}(\vec{\alpha})$
For each entry $B(g,h)$ in $B$:
Draw $B(g,h) \sim \mathrm{Beta}(\lambda_1, \lambda_2)$
For each pair of nodes $(i, j)$:
Draw a membership $z_{i\to j} \sim \mathrm{Multinomial}(\pi_i)$
Draw a membership $z_{i\leftarrow j} \sim \mathrm{Multinomial}(\pi_j)$
Draw the link $Y_{ij} \sim \mathrm{Bernoulli}(B(z_{i\to j}, z_{i\leftarrow j}))$
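As a concrete illustration, the generative process above can be sketched in NumPy; the network size, number of topics, and hyperparameter values below are arbitrary choices for the example:

```python
import numpy as np

# Sketch of the MMSB generative process (notation follows the post;
# n, K and the hyperparameter values are illustrative choices).
rng = np.random.default_rng(0)
n, K = 20, 3              # number of nodes, number of memberships/topics
alpha = np.full(K, 0.5)   # Dirichlet hyperparameter vector
lam1, lam2 = 1.0, 1.0     # Beta hyperparameters for B

pi = rng.dirichlet(alpha, size=n)      # pi_i ~ Dirichlet(alpha)
B = rng.beta(lam1, lam2, size=(K, K))  # B(g,h) ~ Beta(lam1, lam2)

Y = np.zeros((n, n), dtype=int)        # adjacency matrix
z_out = np.zeros((n, n), dtype=int)    # z_{i->j}
z_in = np.zeros((n, n), dtype=int)     # z_{i<-j}
for i in range(n):
    for j in range(n):
        if i == j:
            continue
        g = rng.choice(K, p=pi[i])     # z_{i->j} ~ Multinomial(pi_i)
        h = rng.choice(K, p=pi[j])     # z_{i<-j} ~ Multinomial(pi_j)
        z_out[i, j], z_in[i, j] = g, h
        Y[i, j] = rng.binomial(1, B[g, h])  # Y_ij ~ Bernoulli(B(g,h))
```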
Here is a simple explanation of the notation:
$\vec{\alpha}$ is a $K$ (number of topics) dimensional hyperparameter for the membership/topic distribution.
$\lambda_1, \lambda_2$ are two scalar hyperparameters for $B$.
$\pi_i$ is the membership/topic distribution for node $i$.
$B$ is the $K \times K$ interaction probability matrix of memberships.
$z_{i\to j}$ is the membership of node $i$ when it interacts with node $j$ (the link can be treated as directed as well as undirected). Note that $z_{ij}$ is abbreviated for the pair $(z_{i\to j}, z_{i\leftarrow j})$.
$Y$ is the adjacency matrix; each entry $Y_{ij}$ is either one or zero.
Later on, I am also going to use the following notations:
$N_{i,k}$ is the count of node $i$ being assigned to membership $k$ when it interacts with the rest of the nodes in the network. For a directed network, it can be divided into two sets: $N_{i\to k}$ and $N_{i\leftarrow k}$ ($N_{i,k} = N_{i\to k} + N_{i\leftarrow k}$); the former is the assignment of node $i$ when it links to other nodes, the latter is the assignment of node $i$ when other nodes link to it.
$N^{+}_{g,h}$ is the count of linked node pairs with membership assignments $g$ and $h$ (sum up $N^{+}_{g,h}$ and $N^{+}_{h,g}$ if direction is not considered).
$N^{-}_{g,h}$ is the count of un-linked node pairs with membership assignments $g$ and $h$ (sum up $N^{-}_{g,h}$ and $N^{-}_{h,g}$ if direction is not considered).
To perform Gibbs Sampling, we need the joint probability of both data and latent variables, $p(Y, Z \mid \vec{\alpha}, \lambda_1, \lambda_2)$. Integrating out $\pi$ and $B$, here it is:

$$p(Y, Z \mid \vec{\alpha}, \lambda_1, \lambda_2) = \prod_{g,h} \frac{\mathrm{B}(\lambda_1 + N^{+}_{g,h},\ \lambda_2 + N^{-}_{g,h})}{\mathrm{B}(\lambda_1, \lambda_2)} \cdot \prod_{i} \frac{\Delta(\vec{N}_i + \vec{\alpha})}{\Delta(\vec{\alpha})}$$

where $\mathrm{B}(\cdot,\cdot)$ is the Beta function, $\Delta(\vec{x}) = \prod_k \Gamma(x_k) / \Gamma(\sum_k x_k)$ is the Dirichlet normalizer, and $\vec{N}_i = (N_{i,1}, \dots, N_{i,K})$.
Now that we have obtained the joint probability of data and latent variables, we can derive the conditional probability of a latent variable. For $z_{i\to j}$ (the update for $z_{i\leftarrow j}$ is symmetric), with $h = z_{i\leftarrow j}$ held fixed:

$$p(z_{i\to j} = g \mid Z_{\neg(i\to j)}, Y) \propto (N^{\neg}_{i,g} + \alpha_g)\, \frac{(\lambda_1 + N^{+\neg}_{g,h})^{Y_{ij}}\, (\lambda_2 + N^{-\neg}_{g,h})^{1 - Y_{ij}}}{\lambda_1 + \lambda_2 + N^{+\neg}_{g,h} + N^{-\neg}_{g,h}}$$

where the superscript $\neg$ means the counts exclude the pair $(i, j)$ currently being resampled.
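A sketch of this collapsed update in Python; the function and array names are my own, and the count arrays are assumed to already exclude the pair being resampled:

```python
import numpy as np

# Sketch of the collapsed Gibbs update for z_{i->j} (symbols as in the post;
# N_node, Npos, Nneg are assumed to EXCLUDE the current pair (i, j)).
def sample_z_out(rng, i, j, y_ij, h, N_node, Npos, Nneg, alpha, lam1, lam2):
    """Sample a new value g for z_{i->j}, given h = z_{i<-j}.

    N_node[i, k] : N_{i,k} with the current pair removed
    Npos[g, h]   : N^+_{g,h} (linked pairs), current pair removed
    Nneg[g, h]   : N^-_{g,h} (non-linked pairs), current pair removed
    """
    K = len(alpha)
    p = np.empty(K)
    for g in range(K):
        # Likelihood factor: Beta-Bernoulli predictive for the edge Y_ij
        edge = (lam1 + Npos[g, h]) if y_ij == 1 else (lam2 + Nneg[g, h])
        denom = lam1 + lam2 + Npos[g, h] + Nneg[g, h]
        # Prior factor: Dirichlet-multinomial predictive for the membership
        p[g] = (N_node[i, g] + alpha[g]) * edge / denom
    p /= p.sum()
    return rng.choice(K, p=p)
```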
Given samples of $Z$ generated from Gibbs Sampling, the parameters $\pi$ and $B$ can be estimated in the following ways:
For $\pi_i$, we have

$$p(\pi_i \mid Z, \vec{\alpha}) = \mathrm{Dir}(\pi_i;\ \vec{N}_i + \vec{\alpha}) \qquad \text{(Eq. 4)}$$

Eq. 4 is a Dirichlet p.d.f.; using the mean property of a Dirichlet variable, we get the expectation (mean) of $\pi_{i,k}$:

$$\hat{\pi}_{i,k} = \frac{N_{i,k} + \alpha_k}{\sum_{k'} (N_{i,k'} + \alpha_{k'})}$$
For $B(g,h)$, we have

$$p(B(g,h) \mid Z, Y, \lambda_1, \lambda_2) = \mathrm{Beta}(\lambda_1 + N^{+}_{g,h},\ \lambda_2 + N^{-}_{g,h}) \qquad \text{(Eq. 5)}$$

Eq. 5 is a Beta p.d.f.; using the mean property of a Beta variable, we get the expectation (mean) of $B(g,h)$:

$$\hat{B}(g,h) = \frac{\lambda_1 + N^{+}_{g,h}}{\lambda_1 + \lambda_2 + N^{+}_{g,h} + N^{-}_{g,h}}$$
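In code, the two point estimates are one-liners over the count arrays (the variable names are illustrative):

```python
import numpy as np

# Posterior-mean estimates of pi and B from the Gibbs counts (Eqs. 4-5).
def estimate_params(N_node, Npos, Nneg, alpha, lam1, lam2):
    # pi_hat[i, k] = (N_{i,k} + alpha_k) / sum_k' (N_{i,k'} + alpha_k')
    pi_hat = (N_node + alpha) / (N_node + alpha).sum(axis=1, keepdims=True)
    # B_hat[g, h] = (lam1 + N^+_{g,h}) / (lam1 + lam2 + N^+_{g,h} + N^-_{g,h})
    B_hat = (lam1 + Npos) / (lam1 + lam2 + Npos + Nneg)
    return pi_hat, B_hat
```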
One may want to use multiple de-correlated samples of $Z$ to calculate the above parameters.
Likelihood and perplexity
Until now, I didn't mention the formula for calculating the likelihood of the data, $p(Y \mid \vec{\alpha}, \lambda_1, \lambda_2)$, which can be important at times for testing convergence. In [?], the authors mention an approximation technique for computing $p(Y)$: it is the harmonic mean of a set of values of $p(Y \mid Z^{(s)})$ from $S$ samples, where $Z^{(s)}$ is sampled from the posterior $p(Z \mid Y)$. So that is:

$$p(Y) \approx \left( \frac{1}{S} \sum_{s=1}^{S} \frac{1}{p(Y \mid Z^{(s)})} \right)^{-1}$$
Using the previous derivation, we can easily compute:

$$p(Y \mid Z) = \frac{1}{C} \prod_{g,h} \mathrm{B}(\lambda_1 + N^{+}_{g,h},\ \lambda_2 + N^{-}_{g,h})$$

where $C$ is a constant:

$$C = \mathrm{B}(\lambda_1, \lambda_2)^{K^2}$$
Now, writing $\ell_s = \log p(Y \mid Z^{(s)})$, we can compute $\log p(Y)$ in the following way:

$$\log p(Y) \approx \log S + t - \log \sum_{s=1}^{S} \exp(t - \ell_s)$$

where $t$ is introduced to make the log-sum-exp computation numerically stable; usually we can set $t = \min_s \ell_s$.
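A small helper implementing this stabilized harmonic-mean estimate, assuming the per-sample log-likelihoods $\ell_s$ are already computed:

```python
import numpy as np

# Harmonic-mean estimate of log p(Y) from per-sample log-likelihoods
# ell[s] = log p(Y | Z^(s)), stabilized with the shift t = min_s ell[s].
def harmonic_mean_loglik(ell):
    ell = np.asarray(ell, dtype=float)
    S = len(ell)
    t = ell.min()
    # log p(Y) ~= log S + t - log sum_s exp(t - ell[s])
    return np.log(S) + t - np.log(np.exp(t - ell).sum())
```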
The likelihood of held-out data is sometimes also a useful quantity: it reflects the generalization/predictive power of the model, and it is usually measured by perplexity, which is essentially the inverse of the geometric mean per-link likelihood. It is defined by:

$$\mathrm{perplexity}(Y^{\mathrm{held}}) = \exp\left( - \frac{\log p(Y^{\mathrm{held}})}{|Y^{\mathrm{held}}|} \right)$$
This term can be evaluated by first holding out a subset of links and non-links, then computing:

$$p(Y^{\mathrm{held}}) = \prod_{(i,j) \in \mathrm{held}} p(Y_{ij} \mid \hat{\pi}, \hat{B}), \qquad p(Y_{ij} = 1 \mid \hat{\pi}, \hat{B}) = \sum_{g,h} \hat{\pi}_{i,g}\, \hat{B}(g,h)\, \hat{\pi}_{j,h}$$

where $\hat{\pi}$ and $\hat{B}$ are estimated from the training data.
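A sketch of this held-out evaluation, using the marginal link probability $\hat{\pi}_i^{\top} \hat{B} \hat{\pi}_j$ under the point estimates (the function and variable names are illustrative):

```python
import numpy as np

# Held-out perplexity sketch: per-pair link probability under the point
# estimates, then exp of the negative mean log-likelihood over the pairs.
def heldout_perplexity(pairs, Y, pi_hat, B_hat):
    loglik = 0.0
    for (i, j) in pairs:
        p1 = pi_hat[i] @ B_hat @ pi_hat[j]  # p(Y_ij = 1 | pi_hat, B_hat)
        loglik += np.log(p1) if Y[i, j] == 1 else np.log(1.0 - p1)
    return np.exp(-loglik / len(pairs))
```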
The Mixed Membership Stochastic Blockmodel is not scalable, as it requires examining $O(n^2)$ node pairs (this is a lot even for a medium-size network with only thousands of nodes!), in sharp contrast to the sparse networks that arise more often in real datasets. But in practice, instead of using all the non-existing edges, we can sample a small portion of them; this can effectively reduce the computational complexity, especially for sparse networks, where we expect the sampling ratio of non-existing edges can be even smaller. However, the relationship between the loss of performance and the sampling ratio of non-existing links still remains to be examined.
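One simple way to implement this subsampling idea is sketched below; the sampling ratio is a tunable assumption, and for an unbiased objective the kept non-links would also need reweighting:

```python
import numpy as np

# Keep every observed link, but only a fraction `ratio` of the
# non-links (zero entries), to cut the O(n^2) pair enumeration.
def sample_pairs(Y, ratio, rng):
    n = Y.shape[0]
    pairs = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if Y[i, j] == 1 or rng.random() < ratio:
                pairs.append((i, j))
    return pairs
```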
Also due to the scalability issue, one may consider MMSB without the Dirichlet prior on each node's membership distribution, which can then be fit by EM rather than Gibbs Sampling. One can also derive Gibbs Sampling for a combined LDA and MMSB model [?] in ways similar to the Gibbs samplers for LDA (see here) and MMSB individually (basically, the key is to derive the joint probability of the text and the links with all latent assignments, then read off the conditionals).