Network Analysis and Modeling
CSCI 5352, Fall 2013
Time: Tuesday and Thursday, 12:30pm - 1:45pm
Room: ECCS 1B12
Instructor: Aaron Clauset
Office: ECOT 743
Office hours: Tuesday, 2:00-3:30pm
Email: look it up
Syllabus
Description
Course work and grading
Schedule and lecture notes
Problem sets
Supplemental readings
Description
Network science is a thriving and increasingly important cross-disciplinary
domain that focuses on the representation, analysis and modeling of complex
social, biological and technological systems as networks or graphs. Modern
data sets often include some kind of network. Nodes can have locations,
directions, memory, demographic characteristics, content, and preferences.
Edges can have lengths, directions, capacities, costs, durations, and types.
Moreover, these variables and the network structure itself can vary, with edges and
nodes appearing, disappearing and changing their characteristics over time.
Capturing, modeling and understanding networks and rich data requires
understanding both the mathematics of networks and the computational tools for
identifying and explaining the patterns they contain.
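As a rough illustration of the kind of data described above, the following minimal Python sketch stores a small directed network with attributes on nodes and edges, plus edges that appear and disappear over time. The class name Network, the attribute names (location, age, kind, cost, start, end) and the example values are all hypothetical choices made for this sketch; they are not taken from the course materials, and a real analysis would more likely use a library such as NetworkX.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """A toy attributed, time-stamped directed network (illustrative only)."""
    nodes: dict = field(default_factory=dict)  # node id -> attribute dict
    edges: dict = field(default_factory=dict)  # (u, v) -> attribute dict

    def add_node(self, u, **attrs):
        # Create the node if needed, then merge in any new attributes.
        self.nodes.setdefault(u, {}).update(attrs)

    def add_edge(self, u, v, **attrs):
        # Directed edge u -> v; endpoints are created if missing.
        self.add_node(u)
        self.add_node(v)
        self.edges[(u, v)] = dict(attrs)

    def snapshot(self, t):
        """Edges present at time t, based on the assumed 'start'/'end' attributes.

        An edge with no 'start' is treated as always having existed, and an
        edge with no 'end' as never disappearing.
        """
        return [
            e for e, a in self.edges.items()
            if a.get("start", float("-inf")) <= t < a.get("end", float("inf"))
        ]


g = Network()
g.add_node("a", location=(40.0, -105.3), age=31)
g.add_node("b", location=(39.7, -105.0), age=27)
g.add_edge("a", "b", kind="email", cost=1.0, start=0, end=5)
g.add_edge("b", "c", kind="phone", cost=2.5, start=3)  # node "c" has no attributes

edges_at_4 = g.snapshot(4)  # both edges are active at t = 4
edges_at_6 = g.snapshot(6)  # the a -> b edge has ended by t = 6
print(edges_at_4)
print(edges_at_6)
```

Keeping attributes in plain dictionaries makes it easy to attach whatever variables a data set happens to provide, at the cost of no validation of attribute names or types.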
This graduate-level course will examine modern techniques for analyzing and
modeling the structure and dynamics of complex networks. The focus will be on
statistical algorithms and methods, and both lectures and assignments will
emphasize model interpretability and understanding the processes that
generate real data. Applications will be drawn from computational biology and
computational social science. No biological or social science training is
required. (Note: this is not a scientific computing course, but there will be
plenty of computing for science.)
Prerequisites
Recommended: CSCI 3104 (undergraduate algorithms) and APPM 3570 (applied
probability), or equivalent preparation.
Note: An adequate mathematical and programming background is mandatory. The
concepts and techniques covered in this course depend heavily on basic
statistics (distributions, Monte Carlo techniques), scientific programming,
and calculus (integration and differentiation). Students without sufficient
preparation will struggle to keep up with the lectures and assignments.
Students without proper preparation may audit the course.
Text
Required (available at the CU Bookstore):
1. Networks: An Introduction by M.E.J. Newman
2. Pattern Recognition and Machine Learning by C.M. Bishop.
Optional:
1. All of Statistics by L. Wasserman
2. Numerical Recipes
3. Networks, Crowds and Markets by D. Easley and J. Kleinberg
4. Error and the Growth of Experimental Knowledge by D.G. Mayo.
Course work and grading
Attendance at the lectures is required.
Most of the class will be standard graduate-style lectures by the instructor.
These will be supplemented by guest lectures on special or advanced topics,
and class discussions of selected papers drawn from the networks literature.
Problem sets will develop and extend topics presented in class, and will
introduce additional topics not covered in class. Performance on the problem
sets will be the major component of evaluation. There are no written
examinations in the course, and thus students are expected to spend serious
quality time on the problem sets. Additional details are given in the syllabus.
Problem sets: There will be 6 problem sets, each requiring
approximately 15 hours of outside work to complete. Programming components of
the problem sets may be completed in any reasonable imperative language. I
recommend something like Matlab or Python (with appropriate libraries), which
have data analysis and visualization capabilities built in. Familiarity with
Mathematica will be useful for the algebra and calculus components of some of
the mathematical problems.
Solutions must be submitted in PDF format (e.g., typeset using LaTeX), should
include all necessary details for me to follow the logic, and must be submitted
via email by 11:59pm the day they are due. No late assignments will be
accepted. Non-PDF formatted files will not be accepted.
Collaboration on the problem sets between enrolled students is encouraged.
However, students may not copy (in any way) from their collaborators and they
must respect University academic policies on academic honesty at all times.
To be clear: you may discuss the problems verbally and even work together on
the solution, but you must write up your solutions separately. Student
collaborators must be listed on the submission. Copying from any source, in
any way, including the Web but especially from another student (past or
present), is strictly forbidden and I will not hesitate to fail students who
display ignorance of or disregard for norms of academic honesty. If you are
unsure about whether something is permitted under these rules, ask me well
before the deadline.
Independent project: The purpose of the independent project is to
formulate and explore a research question of the student's devising related
to network analysis and modeling. Students must work independently, but are
encouraged to solicit feedback from other students as the project develops.
The deliverables are a short (10 minute) in-class presentation of the project
results, and a 10-page writeup. See the syllabus for more details.
Grading: For section 002, grades will be assigned based on attendance (20%),
problem sets (50%), and the independent project (30%). For section 740, grades
will be assigned based on problem sets (63%) and the independent project (37%).
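The weighting above can be worked through with a short Python sketch. The weights are the ones stated in the grading policy; the function name course_score, the use of fractional scores in [0, 1], and the example scores are assumptions made for illustration. How a weighted score maps to a letter grade is not stated here, so the sketch stops at the weighted score.

```python
# Weights as stated in the grading policy, keyed by section number.
WEIGHTS = {
    "002": {"attendance": 0.20, "problem_sets": 0.50, "project": 0.30},
    "740": {"problem_sets": 0.63, "project": 0.37},
}


def course_score(section, scores):
    """Weighted course score; `scores` maps component name to a fraction in [0, 1].

    Components that carry no weight in the given section are ignored.
    """
    weights = WEIGHTS[section]
    return sum(w * scores[name] for name, w in weights.items())


# Hypothetical student: full attendance, 90% on problem sets, 80% on the project.
example = {"attendance": 1.0, "problem_sets": 0.9, "project": 0.8}

score_002 = course_score("002", example)  # 0.20*1.0 + 0.50*0.9 + 0.30*0.8
score_740 = course_score("740", example)  # 0.63*0.9 + 0.37*0.8
print(round(score_002, 3), round(score_740, 3))
```

Note that the section 740 weights are roughly the section 002 problem set and project weights rescaled to sum to 100% once attendance is dropped (50/80 and 30/80, rounded).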
Tentative Schedule
Week 1 : Introduction and overview (Lectures 1 and 2)
Week 2 : Network basics and centrality (Lecture 3 and Lecture 4)
Week 3 : Assortativity, transitivity, and reciprocity (Lecture 5)
Week 4 : Degree distributions (Lecture 6 and Lecture 7)
Week 5 : Small worlds and navigable networks (Lecture 8 and Lecture 9)
Week 6 : Random graphs (Lecture 10)
Week 7 : Configuration model (Lecture 11 and Lecture 12)
Week 8 : Empirical network case studies
Week 9 : Preferential attachment (Lecture 13 and Lecture 14)
Week 10 : Community structure and modularity (Lecture 15)
Week 11 : Stochastic block models (Lecture 16 and Lecture 17)
Week 12 : MCMC and hierarchical block models (Lecture 18 and Lecture 19)
Week 13 : Temporal networks (Lecture 20)
Week 14 : Fall break
Weeks 15-16 : Project presentations
Problem Sets
Problem set 1 (assigned Aug 27; due Sept 9) [data file]
Problem set 2 (assigned Sept 10; due Sept 25) [data file]
Problem set 3 (assigned Sept 26; due Oct 7)
Problem set 4 (assigned Oct 8; due Oct 21) [data file (zip)]
Problem set 5 (assigned Oct 22; due Nov 4) [data file]
Problem set 6 (assigned Nov 5; due Nov 18) [data file]
Supplemental Readings
Week 1:
- M.E.J. Newman, "The structure and function of complex networks." SIAM Review 45, 167-256 (2003).
- L. Breiman, "Statistical Modeling: The Two Cultures." Statistical Science 16, 199-231 (2001).
Week 4:
- M.E.J. Newman, "Power laws, Pareto distributions and Zipf's law." Contemporary Physics 46(5), 323-351 (2005).
- M. Mitzenmacher, "A Brief History of Generative Models for Power Law and Lognormal Distributions." Internet Mathematics 1(2), 226-251 (2004).
- A. Clauset, C.R. Shalizi and M.E.J. Newman, "Power-law distributions in empirical data." SIAM Review 51(4), 661-703 (2009).
Week 5:
- S. Milgram, "The Small-World Problem." Psychology Today 1(1), 61-67 (1967).
- D. Watts and S. Strogatz, "Collective dynamics of 'small-world' networks." Nature 393, 440-442 (1998).
- J. Kleinberg, "The Small-World Phenomenon, an Algorithmic Perspective." Proc. 32nd ACM Symposium on Theory of Computing (2000).
- D. Liben-Nowell et al., "Geographic routing in social networks." PNAS 102(33), 11623-11628 (2005).
Week 7:
- M.E.J. Newman, S.H. Strogatz and D.J. Watts, "Random graphs with arbitrary degree distributions and their applications." Physical Review E 64, 026118 (2001).
- D.S. Callaway, J.E. Hopcroft, J.M. Kleinberg, M.E.J. Newman and S.H. Strogatz, "Are randomly grown graphs really random?" Physical Review E 64, 041902 (2001).
Week 9:
- M.E.J. Newman, "The first-mover advantage in scientific publication." Europhysics Letters 86, 68001 (2009).
- S. Redner, "Citation statistics from 110 years of Physical Review." Physics Today 58, 49-54 (2005).
Week 10:
- A. Clauset, M.E.J. Newman and C. Moore, "Finding community structure in very large networks." Physical Review E 70, 066111 (2004).
- B.H. Good, Y.-A. de Montjoye and A. Clauset, "Performance of modularity maximization in practical contexts." Physical Review E 81, 046106 (2010).
Week 12:
- C.J. Geyer, "Practical Markov Chain Monte Carlo." Statistical Science 7(4), 473-483 (1992).
- A. Clauset, C. Moore and M.E.J. Newman, "Hierarchical structure and the prediction of missing links in networks." Nature 453, 98-101 (2008).
Resources
LaTeX (general) and TeXShop (Mac)
Matlab license for CU staff (includes student employees)
Mathematica license for CU students
NumPy/SciPy libraries for Python (similar to Matlab)
NetworkX Python package for network analysis.
graph-tool, network analysis and visualization software (Python, C++)
GNU Octave (similar to Matlab)
Wolfram Alpha (Web interface for simple integration and differentiation)
Machine Learning, Statistical Inference and Induction Notebook (by Cosma Shalizi)
Power Law distributions, etc. Notebook (by Cosma Shalizi)
Statistics Done Wrong, The woefully complete guide (by Alex Reinhart)
Some Advice on Process for Research Projects
Cytoscape, network visualization software
yEd Graph Editor, network visualization software
Graphviz, network visualization software
Gephi, network visualization software