This directory contains the programs to find the number of transmitted variants in acute HCV infections as described in the supplement to the paper [ref]. Two different methods of clustering are implemented and are provided in separate folders. You can download these as zip files: `AvThreshClust` and `MaxThreshClust`.

Both of these clustering tools are based on a model that takes into account HCV's life cycle, that HCV replication occurs via a replication complex, and that there can be many replication complexes continuously producing viruses from a long-lived infected cell. As a result, the model predicts that sequences with a small number of shared mutations can arise in a subject at detectable frequencies prior to the onset of immune selection. Furthermore, the model shows that these clusters have to satisfy two separate criteria: (a) the total amount of mutation that could have accumulated is limited by the mutation rate of the virus and the generation time, and (b) the number of mutations shared by distinct sequences is related by coalescent theory to the growth and stabilization of viral load in these acute infections. Starting at the tips of the phylogenetic tree, these codes identify the largest clusters that are consistent with these two criteria.

The two codes presented here differ in how conservative they are. `AvThreshClust` limits only the average amount of mutation observed in a cluster, allowing a few highly divergent viruses to be included within it. The average is a fairly robust measure of the amount of evolution, so the number of clusters identified by this method is likely to be a lower bound on the number of transmitted variants. `MaxThreshClust` is far more aggressive in that it applies a cutoff on the most divergent sequence allowed in a cluster. In acute-to-acute transmissions sampled to a moderate depth, this provides a better estimate of the likely number of transmitted variants.
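
The difference between the two criteria can be illustrated with a minimal Python sketch. This is not the code shipped in either folder: the function name `should_split`, its arguments, and the example numbers are hypothetical. It only assumes that each tip of a candidate cluster has a mutation count measured from the cluster root, and that a cluster is split when the relevant quantity equals or exceeds the threshold.

```python
def should_split(root_to_tip_mutations, max_mutations, method):
    """Return True if a candidate cluster violates the divergence threshold.

    root_to_tip_mutations: mutations from the cluster root to each tip
                           (hypothetical input, for illustration only).
    max_mutations:         threshold on the amount of mutation in a cluster.
    method:                "av" uses the average over tips,
                           "max" uses the most divergent tip.
    """
    if method == "av":
        divergence = sum(root_to_tip_mutations) / len(root_to_tip_mutations)
    elif method == "max":
        divergence = max(root_to_tip_mutations)
    else:
        raise ValueError("method must be 'av' or 'max'")
    # Split when the divergence equals or exceeds the threshold.
    return divergence >= max_mutations


# One highly divergent tip among otherwise similar sequences (made-up data).
tips = [1, 1, 2, 1, 6]
print(should_split(tips, 5, "av"))   # average is 2.2, so the cluster is kept
print(should_split(tips, 5, "max"))  # most divergent tip is 6, so it is split
```

With these made-up numbers the average criterion keeps the cluster together while the maximum criterion splits it, which is why the average method tends to report fewer clusters.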

To cluster by the average mutation method, enter in a terminal:

```
unzip AvThreshClust.zip
cd AvThreshClust
```

To cluster by the maximum mutation method, enter in a terminal:

```
unzip MaxThreshClust.zip
cd MaxThreshClust
```

The instructions to compile and run are the same in both cases:

- To compile: `make`
- To run: `./newicktree`

Inputs:

- The program will ask for a name for the output file and the number of individual time points you wish to combine.
- Then it will ask for the first tree file. For each individual time point file, it will ask you to enter:
  - Whether the tree is in binary form with zero-length branches or has had them replaced with polytomies. Enter 0 for a multifurcating tree with polytomies and 1 for a binary tree.
  - The number of sequences in this tree. This should be an integer.
  - The average number of nucleotides per sequence in the tree.
  - The maximum number of shared mutations allowed, rounded to the nearest integer. The program splits a cluster if the number of shared mutations exceeds this threshold, but not if it equals it. We used 2 for a sequence of ~2500 bp and 4 for a sequence of ~5000 bp.
  - The maximum number of mutations allowed in a cluster, rounded to the nearest integer. The program splits a cluster if the distance from the root of the cluster to any tip (for `MaxThreshClust`) or the average such distance (for `AvThreshClust`) equals or exceeds this threshold. We used the number of days since the last negative sample divided by 8 for a sequence of ~2500 bp and divided by 4 for a sequence of ~5000 bp.

- Then for the combined time point file enter:
  - The file name.
  - 1 for binary, 0 for multifurcating (see above).
  - The number of sequences in this tree. This should be an integer.
  - The average number of nucleotides per sequence in the tree. This does not need to match that of the individual time points.
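
The two threshold values used above for the individual time point trees can be worked out ahead of time. The following Python sketch reproduces the values quoted for ~2500 bp and ~5000 bp sequences. The function name `suggested_thresholds` is hypothetical, and scaling linearly to other sequence lengths is an assumption made here for illustration, not something the program or the paper prescribes.

```python
def round_half_up(x):
    """Round a non-negative number to the nearest integer, halves rounding up."""
    return int(x + 0.5)


def suggested_thresholds(days_since_last_negative, sequence_length_bp):
    """Return (max_shared_mutations, max_mutations_in_cluster).

    Anchored on the values quoted for ~2500 bp sequences:
      shared mutations:     2
      mutations in cluster: days since last negative sample / 8
    ASSUMPTION: both values scale linearly with sequence length, which
    matches the quoted ~5000 bp values (4 and days / 4) but is otherwise
    an extrapolation.
    """
    scale = sequence_length_bp / 2500
    max_shared = round_half_up(2 * scale)
    max_in_cluster = round_half_up(days_since_last_negative / 8 * scale)
    return max_shared, max_in_cluster


print(suggested_thresholds(40, 2500))  # 2 shared, 40 / 8 = 5 in cluster
print(suggested_thresholds(40, 5000))  # 4 shared, 40 / 4 = 10 in cluster
```

Both results are integers, as the program expects for these two prompts.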