Jacob Jensen, Florent Krzakala, Cristopher Moore, Cosma Shalizi, Xiaoran Yan
Paper #: 13-05-016
A central problem in analyzing networks is partitioning them into modules or communities, clusters with a statistically homogeneous pattern of links to each other or to the rest of the network. One of the best tools for this is the stochastic block model, which in its basic form imposes a Poisson degree distribution on all nodes within a community or block. In contrast, degree-corrected block models allow for heterogeneity of degree within blocks. Since these two model classes often lead to very different partitions of nodes into communities, we need an automatic way of deciding which model is more appropriate to a given graph. We present a principled and scalable algorithm for this model selection problem, and apply it to both synthetic and real-world networks. Specifically, we use belief propagation to efficiently approximate the log-likelihood of each class of models, summed over all community partitions, in the form of the Bethe free energy. We then derive asymptotic results on the mean and variance of the log-likelihood ratio we would observe if the null hypothesis were true, i.e. if the network were generated according to the non-degree-corrected block model. We find that for sparse networks, significant corrections to the classic asymptotic likelihood-ratio theory (underlying chi^2 hypothesis testing or the AIC) must be taken into account. We test our procedure against two real-world networks and find excellent agreement with our theory.