Network Analysis and Modeling
CSCI 5352, Fall 2013

Time: Tuesday and Thursday, 12:30pm - 1:45pm
Room: ECCS 1B12

Instructor: Aaron Clauset
Office: ECOT 743
Office hours: Tuesday, 2:00-3:30pm
Email: look it up
Syllabus

Description
Course work and grading
Schedule and lecture notes
Problem sets
Supplemental readings

Description
Network science is a thriving and increasingly important cross-disciplinary domain that focuses on the representation, analysis and modeling of complex social, biological and technological systems as networks or graphs. Modern data sets often include some kind of network. Nodes can have locations, directions, memory, demographic characteristics, content, and preferences. Edges can have lengths, directions, capacities, costs, durations, and types. Moreover, these variables and the network structure itself can vary, with edges and nodes appearing, disappearing and changing their characteristics over time. Capturing, modeling and understanding networks and rich data requires understanding both the mathematics of networks and the computational tools for identifying and explaining the patterns they contain.
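As a minimal sketch of the kind of data described above (not taken from the course materials), a network with attributes on its nodes and edges, and with edges that appear and disappear over time, could be represented in plain Python as follows. The class name Network, the attribute names (age, location, kind, cost, start, end), and all values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """A small directed network with attribute dictionaries on nodes and edges."""

    nodes: dict = field(default_factory=dict)  # node id -> attribute dict
    edges: dict = field(default_factory=dict)  # (source, target) -> attribute dict

    def add_node(self, node, **attrs):
        self.nodes.setdefault(node, {}).update(attrs)

    def add_edge(self, source, target, **attrs):
        # Create any missing endpoint so every edge joins known nodes.
        self.add_node(source)
        self.add_node(target)
        self.edges.setdefault((source, target), {}).update(attrs)

    def out_degree(self, node):
        return sum(1 for (source, _target) in self.edges if source == node)

    def snapshot(self, t):
        """Return a copy that keeps only the edges active at time t.

        An edge is active when start <= t < end; a missing bound is
        treated as open-ended.
        """
        active = {
            pair: dict(attrs)
            for pair, attrs in self.edges.items()
            if attrs.get("start", float("-inf")) <= t < attrs.get("end", float("inf"))
        }
        nodes = {node: dict(attrs) for node, attrs in self.nodes.items()}
        return Network(nodes=nodes, edges=active)


# Hypothetical example data: three people and three time-stamped contacts.
g = Network()
g.add_node("alice", age=34, location="Boulder")
g.add_node("bob", age=29, location="Denver")
g.add_edge("alice", "bob", kind="email", cost=1.0, start=0, end=5)
g.add_edge("bob", "alice", kind="email", cost=1.0, start=3, end=8)
g.add_edge("alice", "carol", kind="phone", cost=2.5, start=6, end=9)

print(g.out_degree("alice"))        # number of edges leaving alice over all time
print(sorted(g.snapshot(4).edges))  # edges active at time t = 4
```

Keying edges by (source, target) keeps the sketch short but allows only one edge per ordered pair; a multigraph, or a library such as NetworkX, would be a more natural choice for real data.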

This graduate-level course will examine modern techniques for analyzing and modeling the structure and dynamics of complex networks. The focus will be on statistical algorithms and methods, and both lectures and assignments will emphasize model interpretability and understanding the processes that generate real data. Applications will be drawn from computational biology and computational social science. No biological or social science training is required. (Note: this is not a scientific computing course, but there will be plenty of computing for science.)

Prerequisites
Recommended: CSCI 3104 (undergraduate algorithms) and APPM 3570 (applied probability), or equivalent preparation.

Note: An adequate mathematical and programming background is mandatory. The concepts and techniques covered in this course depend heavily on basic statistics (distributions, Monte Carlo techniques), scientific programming, and calculus (integration and differentiation). Students without sufficient preparation will struggle to keep up with the lectures and assignments. Students without proper preparation may audit the course.

Text
Required (available at the CU Bookstore):
1. Networks: An Introduction by M.E.J. Newman
2. Pattern Recognition and Machine Learning by C.M. Bishop.

Optional:
1. All of Statistics by L. Wasserman
2. Numerical Recipes
3. Networks, Crowds and Markets by D. Easley and J. Kleinberg
4. Error and the Growth of Experimental Knowledge by D.G. Mayo.

Course work and grading
Attendance at the lectures is required.

Most of the class will be standard graduate-style lectures by the instructor. These will be supplemented by guest lectures on special or advanced topics, and class discussions of selected papers drawn from the networks literature. Problem sets will develop and extend topics presented in class, and will introduce additional topics not covered in class. Performance on the problem sets will be the major component of evaluation. There are no written examinations in the course, and thus students are expected to spend serious quality time on the problem sets. Additional details are given in the syllabus.

Problem sets: There will be 6 problem sets, each requiring approximately 15 hours of outside work to complete. Programming components of the problem sets may be completed in any reasonable imperative language. I recommend something like Matlab or Python (with appropriate libraries), which have data analysis and visualization capabilities built in. Familiarity with Mathematica will be useful for the algebra and calculus components of some of the mathematical problems.

Solutions must be submitted in PDF format (e.g., typeset using LaTeX), should include all necessary details for me to follow the logic, and must be submitted via email by 11:59pm on the day they are due. No late assignments will be accepted. Files that are not in PDF format will not be accepted. Collaboration on the problem sets between enrolled students is encouraged. However, students may not copy (in any way) from their collaborators, and they must respect University policies on academic honesty at all times. To be clear: you may discuss the problems verbally and even work together on the solution, but you must write up your solutions separately. Student collaborators must be listed on the submission. Copying from any source, in any way, including the Web but especially from another student (past or present), is strictly forbidden, and I will not hesitate to fail students who display ignorance of, or disregard for, the norms of academic honesty. If you are unsure about whether something is permitted under these rules, ask me well before the deadline.

Independent project: The purpose of the independent project is to formulate and explore a research question of the student's devising related to network analysis and modeling. Students must work independently, but are encouraged to solicit feedback from other students as the project develops. The deliverables are a short (10 minute) in-class presentation of the project results, and a 10-page writeup. See the syllabus for more details.

Grading: Grades will be assigned based on (section 002) attendance (20%), problem sets (50%), and independent project (30%), or (section 740) problem sets (63%) and independent project (37%).
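The weights above amount to a weighted average of the component scores. The sketch below shows that arithmetic only; it assumes each component is scored as a fraction between 0 and 1, which this page does not specify, and the function name, dictionary keys, and example scores are all hypothetical. How a weighted score maps to a letter grade is not stated here and is not modeled.

```python
# Component weights per section, as listed above.
WEIGHTS = {
    "002": {"attendance": 0.20, "problem_sets": 0.50, "project": 0.30},
    "740": {"problem_sets": 0.63, "project": 0.37},
}


def weighted_score(section, scores):
    """Return the weighted average of the component scores for a section.

    `scores` maps component name -> fraction in [0, 1] (an assumption).
    Components that carry no weight in the section are ignored.
    """
    weights = WEIGHTS[section]
    return sum(weight * scores[name] for name, weight in weights.items())


# Hypothetical student: full attendance, 80% on problem sets, 90% on the project.
example = {"attendance": 1.0, "problem_sets": 0.8, "project": 0.9}
print(round(weighted_score("002", example), 3))
print(round(weighted_score("740", example), 3))
```

For the example scores, section 002 gives 0.20(1.0) + 0.50(0.8) + 0.30(0.9) and section 740 gives 0.63(0.8) + 0.37(0.9).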

Tentative Schedule
Week 1 : Introduction and overview (Lecture 1 and Lecture 2)
Week 2 : Network basics and centrality (Lecture 3 and Lecture 4)
Week 3 : Assortativity, transitivity, and reciprocity (Lecture 5)
Week 4 : Degree distributions (Lecture 6 and Lecture 7)
Week 5 : Small worlds and navigable networks (Lecture 8 and Lecture 9)
Week 6 : Random graphs (Lecture 10)
Week 7 : Configuration model (Lecture 11 and Lecture 12)
Week 8 : Empirical network case studies
Week 9 : Preferential attachment (Lecture 13 and Lecture 14)
Week 10 : Community structure and modularity (Lecture 15)
Week 11 : Stochastic block models (Lecture 16 and Lecture 17)
Week 12 : MCMC and hierarchical block models (Lecture 18 and Lecture 19)
Week 13 : Temporal networks (Lecture 20)
Week 14 : Fall break
Weeks 15-16 : Project presentations

Problem Sets
Problem set 1 (assigned Aug 27; due Sept 9) [data file]

Problem set 2 (assigned Sept 10; due Sept 25) [data file]

Problem set 3 (assigned Sept 26; due Oct 7)

Problem set 4 (assigned Oct 8; due Oct 21) [data file (zip)]

Problem set 5 (assigned Oct 22; due Nov 4) [data file]

Problem set 6 (assigned Nov 5; due Nov 18) [data file]

Supplemental Readings

Week 1:

Week 4:

Week 5:

Week 7:

Week 9:

Week 10:

Week 12:

Resources
LaTeX (general) and TeXShop (Mac)
Matlab license for CU staff (includes student employees)
Mathematica license for CU students
NumPy/SciPy libraries for Python (similar to Matlab)
NetworkX Python package for network analysis
graph-tool, network analysis and visualization software (Python, C++)
GNU Octave (similar to Matlab)
Wolfram Alpha (Web interface for simple integration and differentiation)
Machine Learning, Statistical Inference and Induction Notebook (by Cosma Shalizi)
Power Law distributions, etc. Notebook (by Cosma Shalizi)
Statistics Done Wrong, The woefully complete guide (by Alex Reinhart)
Some Advice on Process for [Research Projects]
Cytoscape, network visualization software
yEd Graph Editor, network visualization software
Graphviz, network visualization software
Gephi, network visualization software