Power-law Distributions in Binned Empirical Data
This page is a companion for the paper on
power-law distributions in binned empirical data, written by
Yogesh Virkar and
Aaron Clauset (me). It presents
a version of the power-law tools from here that works with binned data.
This page hosts our implementations of the methods we describe in the article,
including several by developers other than us. Our goal is for the methods to
be widely accessible to the community.
NOTE: we cannot provide technical support for code not written by us, and
we are busy with other projects now and so may not provide support for our own
code.
Journal Reference
Y. Virkar and A. Clauset, Power-law distributions in binned empirical data.
Annals of Applied Statistics 8(1), 89 - 119 (2014). (arXiv:1208.3524).
Fitting a binned power-law distribution
This function fits a power-law model to binned data using the maximum likelihood estimator discussed
in the paper. It uses a goodness-of-fit based method to estimate the lower bound of the scaling
region. Additional information can be obtained by typing 'help bplfit' at the Matlab command window.
bplfit.m (Matlab, by Yogesh Virkar)
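To make the estimator concrete, here is a minimal Python sketch of maximum likelihood fitting for a binned power law. It is not a port of bplfit.m. It assumes the lower bound is fixed at the first bin edge (bplfit.m estimates it), that the last bin is open-ended (last edge math.inf), and that the scaling exponent lies between 1 and 10. The names bin_probs, log_likelihood and fit_alpha are hypothetical. Each bin's probability is the difference of the power-law tail function (b / bmin)^(1 - alpha) at its two edges.

```python
import math

def bin_probs(alpha, edges):
    """Probability of each bin under a power law with lower bound edges[0].

    Assumes the last edge is math.inf so that the probabilities sum to one.
    """
    bmin = edges[0]
    ccdf = [0.0 if math.isinf(b) else (b / bmin) ** (1.0 - alpha) for b in edges]
    return [ccdf[i] - ccdf[i + 1] for i in range(len(edges) - 1)]

def log_likelihood(alpha, counts, edges):
    """Multinomial log-likelihood of the bin counts (constant term dropped)."""
    total = 0.0
    for h, p in zip(counts, bin_probs(alpha, edges)):
        if h > 0:
            if p <= 0.0:
                return -math.inf  # numerical underflow in a populated bin
            total += h * math.log(p)
    return total

def fit_alpha(counts, edges, lo=1.0001, hi=10.0, tol=1e-7):
    """Golden-section search for the alpha that maximizes the log-likelihood."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if log_likelihood(c, counts, edges) >= log_likelihood(d, counts, edges):
            b = d
        else:
            a = c
    return (a + b) / 2.0

# Illustrative counts (not from the paper), roughly what alpha = 2.5 would give.
edges = [1, 2, 4, 8, 16, math.inf]
counts = [646, 229, 81, 29, 15]
alpha_hat = fit_alpha(counts, edges)
print(round(alpha_hat, 3))
```

Golden-section search is enough here because the binned log-likelihood is concave in alpha. Any one-dimensional optimizer could be substituted.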
Visualizing the fitted model
This function plots (on log-log axes) the binned empirical data along with the
fitted power-law model. Additional information can be obtained by typing 'help
bplplot' at the Matlab command window.
bplplot.m (Matlab, by Yogesh Virkar)
Estimating uncertainty in the fitted parameters
This function estimates the uncertainty in the estimated parameters of the
power-law model using a nonparametric bootstrap approach. Additional information
can be obtained by typing 'help bplvar' at the Matlab command window.
bplvar.m (Matlab, by Yogesh Virkar)
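As a rough illustration of the bootstrap idea, the Python sketch below resamples observations from the empirical bin frequencies, refits the exponent each time, and reports the spread of the estimates. It is an assumption-laden sketch, not bplvar.m. The lower bound is held fixed at the first bin edge, the last bin is open-ended, and the helper names (log_likelihood, fit_alpha, bootstrap_alpha) are hypothetical.

```python
import math
import random
import statistics

def log_likelihood(alpha, counts, edges):
    """Binned power-law log-likelihood with lower bound edges[0]."""
    bmin = edges[0]
    ccdf = [0.0 if math.isinf(b) else (b / bmin) ** (1.0 - alpha) for b in edges]
    total = 0.0
    for i, h in enumerate(counts):
        p = ccdf[i] - ccdf[i + 1]
        if h > 0:
            if p <= 0.0:
                return -math.inf
            total += h * math.log(p)
    return total

def fit_alpha(counts, edges, lo=1.0001, hi=10.0, tol=1e-6):
    """Golden-section search for the maximum likelihood exponent."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if log_likelihood(c, counts, edges) >= log_likelihood(d, counts, edges):
            b = d
        else:
            a = c
    return (a + b) / 2.0

def bootstrap_alpha(counts, edges, reps=200, seed=0):
    """Refit alpha on data sets resampled from the empirical bin frequencies."""
    rng = random.Random(seed)
    n = sum(counts)
    estimates = []
    for _ in range(reps):
        resampled = [0] * len(counts)
        # Draw n observations with replacement; each lands in bin i
        # with probability counts[i] / n.
        for i in rng.choices(range(len(counts)), weights=counts, k=n):
            resampled[i] += 1
        estimates.append(fit_alpha(resampled, edges))
    return estimates

# Illustrative counts (not from the paper).
edges = [1, 2, 4, 8, 16, math.inf]
counts = [646, 229, 81, 29, 15]
estimates = bootstrap_alpha(counts, edges)
alpha_se = statistics.stdev(estimates)
print(round(statistics.mean(estimates), 2), round(alpha_se, 3))
```

The standard deviation of the bootstrap estimates serves as the uncertainty in the exponent. A fuller treatment would re-estimate the lower bound inside each replicate as well.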
Calculating a p-value for the fitted power-law model
Using the Kolmogorov-Smirnov statistic as a distance measure between the data
and the fitted model, and a semi-parametric bootstrap to resample the data, this
function calculates a p-value for the plausibility of the fitted power-law model. Additional
information can be obtained by typing 'help bplpva' at the Matlab command
window.
bplpva.m (Matlab, by Yogesh Virkar)
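The procedure can be sketched in Python under a simplifying assumption: the lower bound is fixed at the first bin edge, so every observation is in the scaling region and the semi-parametric bootstrap reduces to drawing synthetic data from the fitted model alone. This is a sketch of the general idea, not bplpva.m, and all function names are hypothetical. The p-value is the fraction of synthetic data sets whose KS distance to their own refitted model is at least as large as the observed one.

```python
import math
import random

def bin_probs(alpha, edges):
    """Bin probabilities for a power law with lower bound edges[0]."""
    bmin = edges[0]
    ccdf = [0.0 if math.isinf(b) else (b / bmin) ** (1.0 - alpha) for b in edges]
    return [ccdf[i] - ccdf[i + 1] for i in range(len(edges) - 1)]

def log_likelihood(alpha, counts, edges):
    total = 0.0
    for h, p in zip(counts, bin_probs(alpha, edges)):
        if h > 0:
            if p <= 0.0:
                return -math.inf
            total += h * math.log(p)
    return total

def fit_alpha(counts, edges, lo=1.0001, hi=10.0, tol=1e-6):
    """Golden-section search for the maximum likelihood exponent."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if log_likelihood(c, counts, edges) >= log_likelihood(d, counts, edges):
            b = d
        else:
            a = c
    return (a + b) / 2.0

def ks_distance(counts, probs):
    """Largest gap between empirical and model CDFs at the bin boundaries."""
    n = sum(counts)
    emp = mod = worst = 0.0
    for h, p in zip(counts, probs):
        emp += h / n
        mod += p
        worst = max(worst, abs(emp - mod))
    return worst

def p_value(counts, edges, reps=200, seed=0):
    rng = random.Random(seed)
    n = sum(counts)
    model = bin_probs(fit_alpha(counts, edges), edges)
    d_obs = ks_distance(counts, model)
    exceed = 0
    for _ in range(reps):
        synth = [0] * len(counts)
        for i in rng.choices(range(len(counts)), weights=model, k=n):
            synth[i] += 1
        # Refit each synthetic data set before measuring its distance.
        d_syn = ks_distance(synth, bin_probs(fit_alpha(synth, edges), edges))
        if d_syn >= d_obs:
            exceed += 1
    return exceed / reps

# Illustrative counts (not from the paper).
edges = [1, 2, 4, 8, 16, math.inf]
p_plausible = p_value([646, 229, 81, 29, 15], edges)
p_implausible = p_value([100, 300, 400, 150, 50], edges)
print(p_plausible, p_implausible)
```

The first data set is close to a power law and should receive a large p-value. The second peaks in the middle bins and should be rejected.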
Comparing to alternative distributions
Using the maximum likelihood estimator for each alternative distribution
(exponential, stretched exponential, lognormal, and power law with exponential
cutoff), these functions fit the corresponding distribution to binned data.
Additional information about usage can be obtained by typing
'help name_of_function' at the Matlab command window.
bexpnfit.m (Matlab, by Yogesh Virkar)
bstexpfit.m (Matlab, by Yogesh Virkar)
blgnormfit.m (Matlab, by Yogesh Virkar)
bplcutfit.m (Matlab, by Yogesh Virkar)
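The same binned maximum likelihood idea carries over to the alternatives. As one example, the Python sketch below fits an exponential distribution to bin counts. It is not bexpnfit.m. It assumes the distribution is conditioned on values at or above the first bin edge, that the last bin is open-ended, and that the rate lies in a search interval suited to the data's units. The names exp_bin_probs and fit_rate are hypothetical.

```python
import math

def exp_bin_probs(lam, edges):
    """Bin probabilities for an exponential with rate lam, given x >= edges[0]."""
    bmin = edges[0]
    # Tail function exp(-lam * (x - bmin)); it is 0.0 at an infinite edge.
    ccdf = [math.exp(-lam * (b - bmin)) for b in edges]
    return [ccdf[i] - ccdf[i + 1] for i in range(len(edges) - 1)]

def exp_log_likelihood(lam, counts, edges):
    total = 0.0
    for h, p in zip(counts, exp_bin_probs(lam, edges)):
        if h > 0:
            if p <= 0.0:
                return -math.inf  # numerical underflow in a populated bin
            total += h * math.log(p)
    return total

def fit_rate(counts, edges, lo=1e-6, hi=10.0, tol=1e-7):
    """Golden-section search for the rate that maximizes the log-likelihood.

    The default interval is an assumption; widen or narrow it to match
    the scale of the bin edges.
    """
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if exp_log_likelihood(c, counts, edges) >= exp_log_likelihood(d, counts, edges):
            b = d
        else:
            a = c
    return (a + b) / 2.0

# Illustrative counts (not from the paper): expected values for rate 0.3.
edges = [1, 2, 4, 8, 16, math.inf]
counts = [1000.0 * p for p in exp_bin_probs(0.3, edges)]
rate_hat = fit_rate(counts, edges)
print(round(rate_hat, 3))
```

Distributions with two parameters, such as the lognormal or the stretched exponential, need a two-dimensional optimizer in place of the one-dimensional search used here.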
Calculating likelihood ratio
This function implements the log-likelihood ratio test explained in the paper
to compare different fitted models. Additional information about usage
can be obtained by typing 'help blrtest'.
blrtest.m (Matlab, by Yogesh Virkar)
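A minimal Python sketch of a normalized log-likelihood ratio test on binned data follows. It assumes the Vuong-style test for non-nested models, where the sign of the ratio R indicates the favored model and the p-value indicates whether that sign is statistically meaningful. Whether blrtest.m matches this in every detail is not confirmed here. The function name and the model parameters in the example are hypothetical.

```python
import math

def likelihood_ratio_test(counts, probs_a, probs_b):
    """Return (R, p) comparing model A against model B on binned data.

    R > 0 favors model A. Each observation in bin i contributes
    log(probs_a[i]) - log(probs_b[i]) to the ratio.
    """
    terms = [(h, math.log(pa) - math.log(pb))
             for h, pa, pb in zip(counts, probs_a, probs_b) if h > 0]
    n = sum(h for h, _ in terms)
    ratio = sum(h * d for h, d in terms)
    mean = ratio / n
    # Variance of the per-observation log-likelihood differences.
    var = sum(h * (d - mean) ** 2 for h, d in terms) / n
    # Assumes the two models differ in at least one populated bin (var > 0).
    p = math.erfc(abs(ratio) / math.sqrt(2.0 * n * var))
    return ratio, p

# Illustrative example (not from the paper): power law with alpha = 2.5
# against an exponential with rate 0.3, both with lower bound 1.
edges = [1, 2, 4, 8, 16, math.inf]
counts = [646, 229, 81, 29, 15]
power_law = [(0.0 if math.isinf(hi) else 0.0) + lo ** -1.5
             - (0.0 if math.isinf(hi) else hi ** -1.5)
             for lo, hi in zip(edges[:-1], edges[1:])]
exponential = [math.exp(-0.3 * (lo - 1)) - math.exp(-0.3 * (hi - 1))
               for lo, hi in zip(edges[:-1], edges[1:])]
R, p = likelihood_ratio_test(counts, power_law, exponential)
print(R > 0, p < 0.05)
```

In practice the two probability vectors would come from models fitted to the same data, not from fixed parameters as in this example.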
Calculating probability density functions
This function calculates the probability density function for a specified type
of model. Finding PDFs can be useful for model comparison (see 'blrtest.m').
Additional information about usage can be obtained by typing 'help getPDF'.
getPDF.m (Matlab, by Yogesh Virkar)
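For binned data, the natural counterpart of a density is the probability each model assigns to each bin. The Python sketch below computes these for three model types from their tail functions. It is not a port of getPDF.m, and the function name, model labels and parameter order are hypothetical. Each model is conditioned on values at or above the first bin edge, and the last edge is assumed to be math.inf.

```python
import math

def _powerlaw_tail(x, bmin, alpha):
    return 0.0 if math.isinf(x) else (x / bmin) ** (1.0 - alpha)

def _exponential_tail(x, bmin, lam):
    return math.exp(-lam * (x - bmin))

def _lognormal_tail(x, bmin, mu, sigma):
    def upper(v):
        # Upper tail of a lognormal; erfc(inf) is 0.0, so an infinite edge is fine.
        return 0.5 * math.erfc((math.log(v) - mu) / (sigma * math.sqrt(2.0)))
    return upper(x) / upper(bmin)

TAILS = {
    "powerlaw": _powerlaw_tail,
    "exponential": _exponential_tail,
    "lognormal": _lognormal_tail,
}

def binned_pmf(model, edges, *params):
    """Probability of each bin: the tail function differenced across its edges."""
    tail = TAILS[model]
    bmin = edges[0]
    values = [tail(b, bmin, *params) for b in edges]
    return [values[i] - values[i + 1] for i in range(len(edges) - 1)]

edges = [1, 2, 4, 8, 16, math.inf]
pl = binned_pmf("powerlaw", edges, 2.5)
ex = binned_pmf("exponential", edges, 0.3)
ln = binned_pmf("lognormal", edges, 0.5, 1.0)
for name, probs in (("powerlaw", pl), ("exponential", ex), ("lognormal", ln)):
    print(name, [round(p, 4) for p in probs])
```

Vectors like these can be passed directly to a likelihood ratio comparison, since they are defined over the same bins.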
Download all files
All the functions implemented above are available as a single downloadable zip
file here.
Full Matlab package (by Yogesh Virkar)
A note about bugs and alternative implementations
The code here is provided as-is, with no warranty and no guarantee of
technical support or maintenance. If you experience
problems while using the code, please let us know via email. We are also happy
to host (or link to)
implementations of any of these functions in other programming languages,
in the interest of facilitating their more widespread use. That being said,
all such code also comes with no warranties, etc. If you do have questions
about any of the implementations, please contact the respective function's
author.
Finally, if you use our code in an academic publication, it would be courteous
of you to thank Yogesh in your acknowledgements for providing you with
implementations of the methods.
Data sets used
All data sets used in the paper are either previously published or are available online.
- Estimated number of personnel in a terrorist organization, binned by
powers of ten, except that the first two bins are merged.
Link to the data (mirror). V. Asal and R. K. Rethemeyer. "The Nature of the Beast: Organizational structures and the lethality of terrorist attacks." Journal of Politics 70(2):437-449 (2008).
- Diameter of branches in the plant species Cryptomeria, binned
in 30mm intervals.
Download the data. K. Shinozaki, K. Yoda, K. Hozumi, and T. Kira, "A quantitative analysis of plant form - The Pipe Model Theory II: Further evidence of the theory and its application in forest ecology." Japanese Journal of Ecology 14(2):133-139 (1964).
- Volume of ice in an iceberg calving event.
Contact authors for data. A. Chapuis and T. Tetzlaff, "The variability of tidewater-glacier calving: origin of event-size and interval distributions." E-print, arXiv:1205.1640 (2012).
- Length of a patient's hospital stay within a year.
Contact authors for data. Heritage Provider Network. Heritage Health Prize Data Files, HHP_release3 (2012).
- Wind speed (mph) of a tornado in the United States from 2007 to 2011, binned into categories according to the Enhanced Fujita (EF) scale, a roughly logarithmic binning scheme.
Link to the data. Storm Prediction Center, Severe Weather Database Files (1950-2011) (2011).
- Maximum wind speed (knots) of tropical storms and hurricanes in the
United States between 1949 and 2010.
Link to the data. B. Jarvinen, C. Neumann, and M.A.S. Davis, NHC Data Archive. National Hurricane Center (2012).
- The human population of U.S. cities in the 2000 U.S. Census.
Download the data.
- The sizes in acres of wildfires occurring on US federal land between 1986 and 1996.
Download the data. M. E. J. Newman, "Power laws, Pareto distributions and Zipf's law." Contemporary Physics 46, 323 (2005).
- The intensities of earthquakes occurring in California between 1910 and 1992, measured as the maximum amplitude of motion during the quake.
Download the data. (Magnitudes on the Gutenberg-Richter scale.) M. E. J. Newman, "Power laws, Pareto distributions and Zipf's law." Contemporary Physics 46, 323 (2005).
- Area (sq. km) of glaciers in Scandinavia.
Link to the data. World Glacier Monitoring Service and National Snow and Ice Data Center. World Glacier Inventory (2012).
- Number of cases per 100,000 of various rare diseases.
Link to the data. Orphanet Report Series, Rare Diseases collection. Prevalence of rare diseases: Bibliographic data (2011).
- Number of genes associated with a disease.
Link to the data (Table 1). K. Goh, M. Cusick, D. Valle, B. Childs, M. Vidal, and A. L. Barabasi, "The human disease network." Proc. Nat. Acad. Sci. USA 104(21), 8685-8690 (2007).
Updates
8 June 2015: corrected a small bug in the final calculation of the likelihood of the fit in bplfit; this bug did not affect any other aspect of the calculation. Thanks to Babak Fotouhi for finding it.
5 September 2012: data information posted.
3 September 2012: v1.0 of code posted.
16 July 2012: initial page created.