Power-law Distributions in Empirical Data

This page is a companion for the review article on power-law distributions in empirical data, written by Aaron Clauset (me), Cosma R. Shalizi and M.E.J. Newman. The intention is that this page will host implementations of the methods we describe in the article. For now, these are simply the versions we wrote (in Matlab and R), but our hope is to eventually host versions in a variety of languages. In general, we want to make the methods as accessible to the community as possible.

Journal Reference
A. Clauset, C.R. Shalizi, and M.E.J. Newman, "Power-law distributions in empirical data" E-print (2007). arXiv:0706.1062

Random number generators
This function generates continuous values randomly distributed according to one of the five distributions discussed in the article (power law, exponential, log-normal, stretched exponential, and power law with cutoff). Usage information is included in the file; type 'help randht' at the Matlab prompt for more information.
randht.m (Matlab)

Fitting a power-law distribution
This function implements both the discrete and continuous maximum likelihood estimators for fitting the power-law distribution to data, along with the goodness-of-fit based approach to estimating the lower cutoff for the scaling region. Usage information is included in the file; type 'help plfit' at the Matlab prompt for more information.
plfit.m (Matlab)
plfit.r (R, by Laurent Dubroca)

Visualizing the fitted distribution
After several requests, I've written this function, which plots (on log-log axes) the empirical distribution along with the fitted power-law distribution. Usage information is included in the file; type 'help plplot' at the Matlab prompt for more information. (This function is now also included in the full package download below.)
plplot.m (Matlab)

Estimating uncertainty in the fitted parameters
This function implements the nonparametric approach for estimating the uncertainty in the estimated parameters for the power-law fit found by the plfit function. It too implements both continuous and discrete versions. Usage information is included in the file; type 'help plvar' at the Matlab prompt for more information.
plvar.m (Matlab)

Calculating p-value for fitted power-law model
This function implements the Kolmogorov-Smirnov test (which computes a p-value for the estimated power-law fit to the data) for the power-law model. As above, it too implements both continuous and discrete versions of the test. Usage information is included in the file; type 'help plpva' at the Matlab prompt for more information.
plpva.m (Matlab)

Reimann Zeta function
The discrete estimator needs to calculate the Reimann Zeta function for normalization. Matlab includes this function in the Symbolic Math Toolbox, but there are free versions available if you don't have this toolbox. For instance, Paul Godfrey's special functions library (via Matlab Central File Exchange) gives one, which we mirror here (note, you need both these files; tip to Will Tracy).
deta.m (Matlab)
zeta.m (Matlab)

Calculating likelihood-ratio test results
The functions necessary to compute the log likelihood ratio tests is implemented in the statistical programming language R. Documentation of these functions is given in a separate file, and the R functions themselves are in a downloadable tgz file (note: this is not a proper R package, yet).
Documentation
R code

Download all files
To make getting the most up-to-date versions of the files even easier, they're available as a downloadable tgz file here
Download all files

A note about bugs and alternative implementations
The code provided here is provided as-is, with no warranty, etc. But, if you do experience problems using the code, please let us know via email. Further, we are happy to host implementations of any of these functions in other programming languages, in the interest of facilitating their more widespread use. Note: if you use our code in an academic publication, it would be courteous of you to thank me and Cosma in your acknowledgements.

Updates
25 April 2008: changed randht, plpva and plvar to only initialize the pseudo-random number generator on the first time they are called.
5 March 2008: in the integer routines, plfit, plvar and plpva now automatically switch to a slower but more memory efficient estimation routine when the vectorized default routine fails (e.g., Out of Memory error when max(x) is extremely large).
29 February 2008: posted Laurent Dubroca's R implementation of plfit.
17 February 2008: posted the plplot.m function for plotting the fitted power-law distributions against the empirical data.
30 January 2008: corrected typo in plpva when using hidden 'sample' option, and reordered the commands for 'limit' and 'sample' throughout (thanks to Klaas Dellschaft for suggestions).
28 September 2007: corrected typo in argument parsing for randht.m, significant efficiency improvements to xmin estimation routine in plfit.m, plpva.m and plvar.m (thanks to Jim Bagrow for suggestions).
7 September 2007: corrected interim reporting in plpva.m; changed plfit.m, plvar.m and plpva.m to reshape input vector to column format, and to prevent using continuous approximation in small-sample regime for discrete data.
25 July 2007: corrected a typo in plvar.m, typo in pareto.R, typo in log-likelihood for discrete cut-off powerlaw and fixed small bug in a plotting routine.
29 June 2007: corrected a typo in plpva.m, typo in pareto.R and updated compilation instructions in discpowerexp.R.