Power-law Distributions in Empirical Data

This page is a companion for the review article on power-law distributions in empirical data, written by Aaron Clauset (me), Cosma R. Shalizi and M.E.J. Newman. Here we provide information about and pointers to the 24 data sets we used in our paper. Some of these data sets are ours, but many are not. For those in the latter category, the best we can do is to link to their original sources so that you can either get the latest version of the data, or contact the proper copyright holder for access. When appropriate, we provide bibliographic information for the original source; if there is none, the proper reference is probably our paper.

Journal Reference
A. Clauset, C.R. Shalizi, and M.E.J. Newman, "Power-law distributions in empirical data." SIAM Review 51(4), 661-703 (2009). (arXiv:0706.1062)

Empirical Data Sets

  1. The frequency of occurrence of unique words in the novel Moby Dick by Herman Melville.
    Download the data.

    M. E. J. Newman, "Power laws, Pareto distributions and Zipf's law." Contemporary Physics 46, 323 (2005).

  2. The degrees (i.e., numbers of distinct interaction partners) of proteins in the partially known protein-interaction network of the yeast Saccharomyces cerevisiae.
    Link to the data.

    T. Ito et al., "Toward a protein-protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins." Proceedings of the National Academy of Sciences (USA) 97, 1143 (2000).

  3. The degrees of metabolites in the metabolic network of the bacterium Escherichia coli.
    Contact authors for data.

    M. Huss and P. Holme, "Currency and commodity metabolites: their identification and relation to the modularity of metabolic networks." IET Systems Biology 1, 280 (2007).

  4. The degrees of nodes in the partially known network representation of the Internet at the level of autonomous systems for May 2006.
    Contact authors for data.

    P. Holme, J. Karlin and S. Forrest, "Radial Structure of the Internet." Proceedings of the Royal Society A 463, 1231 (2007).

  5. The number of calls received by customers of AT&T's long distance telephone service in the United States during a single day.
    Contact authors for data.

    J. Abello, A. Buchsbaum and J. Westbrook, in Proceedings of the 6th European Symposium on Algorithms (Springer, Berlin) (1998).
    W. Aiello, F. Chung and L. Lu, in Proceedings of the 32nd Annual ACM Symposium on Theory of Computing (Association for Computing Machinery, New York), 171-180 (2000).

  6. The intensity of wars from 1816-1980 measured as the number of battle deaths per 10,000 of the combined populations of the warring nations.
    Link to the data.

    M. Small and J. D. Singer, Resort to Arms: International and Civil Wars, 1816-1980 (Sage Publications, Beverly Hills) (1982).
    D. C. Roberts and D. L. Turcotte, "Fractality and Self-Organized Criticality of Wars." Fractals 6, 351 (1998).

  7. The severity of terrorist attacks worldwide from February 1968 to June 2006, measured as the number of deaths directly resulting from each attack.
    Download the data.

    A. Clauset, M. Young and K. S. Gleditsch, "On the Frequency of Severe Terrorist Events." Journal of Conflict Resolution 51, 58 (2007).

  8. The number of bytes of data received as the result of individual web (HTTP) requests from computer users at a large research laboratory during a 24-hour period in June 1996.
    Contact authors for data.

    W. Willinger and V. Paxson, "Where Mathematics meets the Internet." Notices of the American Mathematical Society 45, 961 (1998).

  9. The number of species per genus of "recent" mammals.
    Link to the data.

    F. A. Smith et al., "Body mass of late Quaternary mammals." Ecology 84, 3403 (2003).

  10. The numbers of sightings of birds of different species in the North American Breeding Bird Survey for 2003.
    Link to the data. (This links to the 2008 data, but there's a link to a folder that contains the 2003 data.)

  11. The numbers of customers affected in electrical blackouts in the United States between 1984 and 2002.
    Download the data.

    M. E. J. Newman, "Power laws, Pareto distributions and Zipf's law." Contemporary Physics 46, 323 (2005).

  12. The numbers of copies of bestselling books sold in the United States during the period 1895 to 1965.

    A. P. Hackett, 70 Years of Best Sellers, 1895-1965 (R. R. Bowker Company, New York) (1967).

  13. The human populations of US cities in the 2000 US Census.
    Download the data.

  14. The sizes of email address books of computer users at a large university.
    Contact authors for data.

    M. E. J. Newman, S. Forrest and J. Balthrop, "Email networks and the spread of computer viruses." Physical Review E 66, 035101 (2002).

  15. The sizes in acres of wildfires occurring on US federal land between 1986 and 1996.
    Download the data.

    M. E. J. Newman, "Power laws, Pareto distributions and Zipf's law." Contemporary Physics 46, 323 (2005).

  16. The peak gamma-ray intensity of solar flares between 1980 and 1989.
    Download the data.

    M. E. J. Newman, "Power laws, Pareto distributions and Zipf's law." Contemporary Physics 46, 323 (2005).

  17. The intensities of earthquakes occurring in California between 1910 and 1992, measured as the maximum amplitude of motion during the quake.
    Download the data. (Magnitudes on the Gutenberg-Richter scale.)

    M. E. J. Newman, "Power laws, Pareto distributions and Zipf's law." Contemporary Physics 46, 323 (2005).

  18. The numbers of adherents of religious denominations, bodies, and sects, as compiled and published on the web site adherents.com.
    Link to the data.

  19. The frequencies of occurrence of US family names in the 1990 US Census.
    Download the data.

  20. The aggregate net worth in US dollars of the richest individuals in the United States in October 2003.
    Link to the data. (Note: years other than 2003 are also available here.)

    M. E. J. Newman, "Power laws, Pareto distributions and Zipf's law." Contemporary Physics 46, 323 (2005).

  21. The number of citations received between publication and June 1997 by scientific papers published in 1981 and listed in the Science Citation Index.
    Link to the data.

    S. Redner, "How Popular is Your Paper? An Empirical Study of the Citation Distribution." European Physical Journal B 4, 131 (1998).

  22. The number of academic papers authored or coauthored by mathematicians listed in the American Mathematical Society's MathSciNet database. (Data compiled by Jerry Grossman.)
    Contact author for data.

  23. The number of "hits" received by web sites from customers of the America Online Internet service in a single day.
    Contact authors for data.

    L. A. Adamic and B. A. Huberman, "The nature of markets in the World Wide Web." Quarterly Journal of Electronic Commerce 1, 5-12 (2000).

  24. The number of links to web sites found in a 1997 web crawl of about 200 million web pages, represented as a simple histogram.
    Download the data.

    A. Broder et al., "Graph structure in the Web." Computer Networks 33, 309 (2000).
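
Most of the data sets above are described as collections of individual values, while the web-links data (item 24) is described as a simple histogram. As a minimal sketch, assuming the histogram is available as (value, count) pairs with each value appearing once, the empirical complementary cumulative distribution P(X >= x) can be computed directly from the counts without expanding them into individual observations. The actual file layouts are not specified on this page, and the function name `ccdf_from_histogram` and the toy numbers below are hypothetical illustrations, not part of the paper's code or data.

```python
from typing import Iterable, List, Tuple


def ccdf_from_histogram(hist: Iterable[Tuple[int, int]]) -> List[Tuple[int, float]]:
    """Return (x, P(X >= x)) pairs from (value, count) histogram bins.

    Assumes each distinct value appears in at most one bin.
    """
    bins = sorted(hist)  # ascending by value
    total = sum(count for _, count in bins)
    if total <= 0:
        raise ValueError("histogram must contain at least one observation")

    result = []
    remaining = total  # number of observations with value >= current x
    for value, count in bins:
        result.append((value, remaining / total))
        remaining -= count
    return result


# Toy histogram (made-up numbers, not taken from any data set listed here).
toy_hist = [(1, 6), (2, 3), (4, 1)]
ccdf = ccdf_from_histogram(toy_hist)
```

Plotting the resulting pairs on doubly logarithmic axes is a common first look at heavy-tailed data, though a roughly straight line on such a plot is not by itself evidence of a power law.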