Cosma Shalizi

External Professor

Most of my work involves stochastic aspects of nonlinear dynamical systems, unsupervised machine learning, or some combination of the two; almost all of it uses information theory, which I find to be an invaluable tool for proving probabilistic results.

My original training is in the statistical physics of complex systems — high-dimensional systems where the variables are strongly interdependent, but cannot be effectively resolved into a single low-dimensional subspace. I was (and am) particularly fond of the method of symbolic dynamics, and of cellular automata, which are spatial stochastic processes modeling pattern formation, fluid flow, magnetism and distributed computation, among other things. Much of my earlier work involves complexity measures, like thermodynamic depth, and, even more, the Grassberger-Crutchfield-Young "statistical complexity", the minimal amount of information about the past of a system required for optimal prediction of its future. This notion is intimately related to that of a minimal predictively-sufficient statistic, and in turn to the existence and uniqueness of a predictively optimal Markovian representation for every stochastic process, whether the original process is Markovian or not. (See here for details.) The same ideas also work on spatially extended systems, including those where space is an irregular graph or network, only then the predictive representation is a Markov random field.
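
As a rough illustration of the definition above, statistical complexity can be written out for a discrete-time process. The notation here ($X^-$ for the past, $X^+$ for the future, $\epsilon$ for the map from pasts to predictive states, $C$ for the complexity) is assumed for this sketch rather than taken from the text, and the entropy formula is stated for the case of countably many states.

```latex
% Two pasts are equivalent when they predict the same distribution over futures.
x^- \sim y^- \iff P\left(X^+ \mid X^- = x^-\right) = P\left(X^+ \mid X^- = y^-\right)

% A predictive state is an equivalence class of pasts.
\epsilon(x^-) = \left\{\, y^- : y^- \sim x^- \,\right\}, \qquad S = \epsilon(X^-)

% Statistical complexity: the entropy of the predictive state, in bits.
C = H[S] = -\sum_{s} P(S = s) \log_2 P(S = s)
```

Because $S$ is a function of the past that keeps all of the past's predictive information about the future, $C$ is the amount of information about the past retained by the minimal predictively-sufficient statistic.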

Over the last several years, I've moved away from the mathematics of optimal prediction, towards devising algorithms to identify such predictors from finite data, and applying those algorithms to concrete problems. On the algorithmic side, Kristina Klinkner and I devised an algorithm, CSSR, which exploits the formal properties of the optimal predictive states to efficiently reconstruct them from discrete sequence data, and used large deviations arguments to show asymptotic convergence. (This is related to, but strictly more powerful than, variable-length Markov chains or context trees.) With Rob Haslinger, we also developed a (nameless) reconstruction algorithm for spatio-temporal random fields. We've used it to give a quantitative test for self-organization, and to automatically filter stochastic fields to identify their coherent structures (with Jean-Baptiste Rouquier and Cristopher Moore).
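
The idea of reconstructing predictive states from discrete sequence data can be illustrated with a toy sketch. This is not CSSR: CSSR uses statistical hypothesis tests, grows the history length incrementally, and enforces deterministic state-to-state transitions, none of which is done here. The sketch only groups fixed-length histories whose empirical next-symbol distributions agree within a tolerance. All function names, the tolerance, and the test process (binary sequences with no two consecutive 1s) are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def golden_mean(n, seed=0):
    """Toy binary process: a 1 is always followed by 0; after a 0, emit 0 or 1 with equal odds."""
    rng = random.Random(seed)
    seq, prev = [], 0
    for _ in range(n):
        sym = 0 if prev == 1 else rng.choice([0, 1])
        seq.append(sym)
        prev = sym
    return seq


def next_symbol_distributions(seq, history_len):
    """Empirical P(next symbol | previous `history_len` symbols) for each observed history."""
    counts = defaultdict(Counter)
    for i in range(history_len, len(seq)):
        history = tuple(seq[i - history_len:i])
        counts[history][seq[i]] += 1
    alphabet = sorted(set(seq))
    dists = {}
    for history, c in counts.items():
        total = sum(c.values())
        dists[history] = tuple(c[a] / total for a in alphabet)
    return dists


def group_histories(dists, tol=0.1):
    """Greedily merge histories whose predictive distributions differ by less than `tol`.

    A crude stand-in for a proper hypothesis test: compares each history to the
    first member of each existing group using the maximum absolute difference.
    """
    groups = []  # list of (representative distribution, member histories)
    for history in sorted(dists):
        d = dists[history]
        for rep, members in groups:
            if max(abs(p - q) for p, q in zip(rep, d)) < tol:
                members.append(history)
                break
        else:
            groups.append((d, [history]))
    return [members for _, members in groups]


seq = golden_mean(20000)
states = group_histories(next_symbol_distributions(seq, history_len=2), tol=0.1)
for members in states:
    print(members)
```

For this process the histories ending in 0 should fall into one group and the history ending in 1 into another, since only the last symbol matters for prediction. On real data, a fixed tolerance and a fixed history length are poor substitutes for the tests and adaptive history lengths that a real reconstruction algorithm would use.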

Several groups are using CSSR on empirical data — I know of work in anomaly detection, crystallography, geophysics, natural language processing, communications and ecology. I am working with a number of people on applications in neuroscience, for instance to the computational structure of spike trains. An ongoing set of projects, with Klinkner and Marcelo Camperi, uses the reconstructed states to build a noise-tolerant measure of coordinated activity and information sharing called "informational coherence". Informational coherence, in turn, defines functional modules of neurons with coordinated behavior, cutting across the usual anatomical modules. In addition, I'm involved in more conventional statistical modeling of neural signals, such as using multi-channel EEG data to identify sleep anomalies (with Matthew Berryman; here is a preparatory paper), and analytic approximations to traditional nonlinear state-estimation (with Shinsuke Koyama, Lucia Castellanos and Rob Kass).

Separately from all this, I work on analyzing learning procedures and asymptotic inference as noise-driven dynamical systems; for instance, the convergence of nonparametric Bayesian updating with mis-specified models and dependent data. I also remain interested in the role of information theory and statistical inference in the foundations of statistical mechanics, where I think some of the conventional views have things completely backwards. Two other legacies from my time in statistical physics are an interest in improving inference for various heavy-tailed distributions, and in complex networks, especially how network structure influences patterns of collective behavior, thereby confounding causal inferences (code for that paper).

I am writing a book on the statistical analysis of complex systems models.