Sample Usage


>>> import thoth.thoth as thoth
>>> import numpy as np
>>> a_prob = thoth.prob_from_array([2,3,12,5,7])

Entropy


>>> thoth.entropy(a_prob)
2.063651255829042

“Naive” (plug-in) entropy
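For reference, the plug-in estimate can be reproduced with NumPy alone. This is a minimal sketch, assuming the plug-in estimator is the Shannon entropy of the normalized counts, measured in bits; the helper name `plugin_entropy_bits` is hypothetical and not part of THOTH.

```python
import numpy as np

def plugin_entropy_bits(counts):
    """Plug-in Shannon entropy, in bits, computed directly from raw counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()   # normalize counts to empirical probabilities
    p = p[p > 0]                # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log2(p)))

h = plugin_entropy_bits([2, 3, 12, 5, 7])
print(h)
```

Under that assumption the result should agree with the thoth.entropy value shown above (about 2.0637).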

>>> thoth.entropy_nsb(a_prob)
2.112851131072393

NSB estimate of entropy (note that for large cardinalities, you should consider the nsb-entropy code).

>>> thoth.entropy_ww(a_prob)
2.05377018210134

Wolpert and Wolf estimate of entropy.

>>> results = thoth.calc_entropy([2,3,12,5,7], 10000)
>>> results
array([ 2.1694808 , 2.00301591, 2.33896117, 1.89535524, 2.57598497])

The bootstrap returns five values: the estimate, then the lower and upper ends of the +/- 1 sigma range, then the lower and upper ends of the +/- 2 sigma range. 10000 is the number of Monte Carlo samples.
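A small sketch of how the returned array might be unpacked. The layout (estimate, -1 sigma, +1 sigma, -2 sigma, +2 sigma) is an assumption read off the numbers above, and the values are copied from that output so the snippet runs without THOTH installed.

```python
import numpy as np

# Values copied from the calc_entropy output above
results = np.array([2.1694808, 2.00301591, 2.33896117, 1.89535524, 2.57598497])

# Assumed layout: [estimate, -1 sigma, +1 sigma, -2 sigma, +2 sigma]
estimate, lo1, hi1, lo2, hi2 = results

print(f"estimate      = {estimate:.3f}")
print(f"1-sigma range = [{lo1:.3f}, {hi1:.3f}]")
print(f"2-sigma range = [{lo2:.3f}, {hi2:.3f}]")
```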

Jensen-Shannon Divergence


>>> results = thoth.calc_jsd([1,2,3,4,5],[5,4,3,2,1], 0.5, 10000)
>>> results
array([ 0.05104706, -0.09282951, 0.19567123, -0.2672938 , 0.28200758])

Here alpha was set to 0.5; as before, the bootstrap returns the estimate, the +/- 1 sigma range, and the +/- 2 sigma range. thoth.jsd can provide the naive JSD (when given a prob object). Note that the JSD cannot actually be negative; the negative lower bounds in this example show the limits of zeroth-order bias correction.
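To illustrate why a true JSD is never negative, here is a plug-in (uncorrected) calculation in NumPy. It is only a sketch: it assumes the usual alpha-weighted definition, JSD = H(alpha*P + (1-alpha)*Q) - alpha*H(P) - (1-alpha)*H(Q), in bits, which has not been checked against what thoth.jsd computes. The function names are hypothetical.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def plugin_jsd_bits(counts_p, counts_q, alpha=0.5):
    """Plug-in alpha-weighted Jensen-Shannon divergence (assumed definition)."""
    p = np.asarray(counts_p, dtype=float)
    q = np.asarray(counts_q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    mix = alpha * p + (1.0 - alpha) * q
    return entropy_bits(mix) - alpha * entropy_bits(p) - (1.0 - alpha) * entropy_bits(q)

jsd = plugin_jsd_bits([1, 2, 3, 4, 5], [5, 4, 3, 2, 1], 0.5)
print(jsd)  # non-negative by construction, since entropy is concave
```

The plug-in value is non-negative but biased; the bias-corrected bootstrap estimate above is lower, and its error bars can dip below zero.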

Mutual Information


>>> arrays = np.array([[12,2,3], [3,16,2], [3,2,40]])
>>> results = thoth.calc_mi(arrays, 10000)
>>> results
array([ 0.58479914, 0.45267706, 0.71551216, 0.30732072, 0.83644323])

Mutual information is symmetric, so you don’t need to worry about transposing arrays!
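The symmetry can be checked directly with a plug-in calculation in NumPy. This is a sketch under the assumption that the naive MI is the standard sum of p(x,y) log2[p(x,y) / (p(x) p(y))] over the normalized count table; `plugin_mi_bits` is a hypothetical helper, not a THOTH function.

```python
import numpy as np

def plugin_mi_bits(joint_counts):
    """Plug-in mutual information, in bits, from a table of joint counts."""
    joint = np.asarray(joint_counts, dtype=float)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # row marginal, shape (n, 1)
    py = pxy.sum(axis=0, keepdims=True)   # column marginal, shape (1, m)
    mask = pxy > 0                        # skip empty cells (0 * log 0 = 0)
    ratio = pxy[mask] / (px @ py)[mask]
    return float(np.sum(pxy[mask] * np.log2(ratio)))

counts = np.array([[12, 2, 3], [3, 16, 2], [3, 2, 40]])
mi = plugin_mi_bits(counts)
mi_t = plugin_mi_bits(counts.T)
print(mi, mi_t)  # the two values agree
```

Under that assumption, the plug-in value should also agree with the thoth.mi output quoted in this section (about 0.624).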

>>> m_prob=thoth.jprob_from_array([[12,2,3], [3,16,2], [3,2,40]])
>>> thoth.mi_nsb(m_prob)
0.5578284421839139
>>> thoth.mi(m_prob)
0.6240974016800506
>>> thoth.mi_ww(m_prob, 1.0/9.0)
0.6401887867416165

To use the NSB, WW and naive estimators, you will need to make “jprobs” (THOTH’s internal representation). For the WW estimate of MI, you get to choose the beta parameter (the second argument to thoth.mi_ww); here we have chosen the IUV (see paper).

Additional Questions & Responses


Dear Simon,

I'm a student at USC, and I'm now using your THOTH estimator.
I'm now estimating the mutual information between two binary variables. Say observed samples of (X,Y) are:
(1,0), (1,0), (1,0), (0,1), (0,1), (0,1), (0,0)

How should I use your bootstrap mutual information estimator?
Is it written as:
array = np.array([[1,0],[1,0],[1,0],[0,1],[0,1],[0,1],[0,0]]);
results = thoth.calc_mi(array, 10000)
or written as:
array = np.array([[1,1,1,0,0,0,0],[0,0,0,1,1,1,0]]);
results = thoth.calc_mi(array, 10000)

Another question: when using your estimator to calculate the mutual information between two variables, is it possible to get negative results?

Best Regards,
Shuyang Gao

>>>

Dear Shuyang --

Glad to hear from you, and glad to hear that you're giving THOTH a workout.

The mutual information estimator takes the joint probability, not the list of samples.

So your example would be passed to THOTH as:

[[1, 3], [3, 0]]

(one observation of X=0, Y=0; three of X=1, Y=0; three of X=0, Y=1; none of X=1, Y=1).

This turns out to be a much more efficient way of representing the data, of course.

To answer your second question: zeroth order bias correction can produce negative results. I recommend looking at the Entropy paper for more on this question, including Figure 1. Higher-order corrections can ameliorate this problem, at the cost of affecting the coarse-graining properties.

Best wishes,

Simon
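The conversion described in this reply, from a list of (X, Y) samples to a table of joint counts, can be sketched in NumPy as follows. The variable names are illustrative; rows are indexed by X and columns by Y.

```python
import numpy as np

# The seven observed (X, Y) samples from the question
samples = np.array([(1, 0), (1, 0), (1, 0), (0, 1), (0, 1), (0, 1), (0, 0)])
x, y = samples[:, 0], samples[:, 1]

# Count how often each (X, Y) pair occurs; entry [i, j] is the count of X=i, Y=j
joint_counts = np.zeros((2, 2), dtype=int)
np.add.at(joint_counts, (x, y), 1)

print(joint_counts)
```

This yields the table [[1, 3], [3, 0]], which is what would then be passed to thoth.calc_mi in place of the raw samples.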

****

Hi Simon,

I've used ANACONDA for all my Python packages, and for some reason this installation does not seem to be working out.

1. I have installed GSL
2. I have altered the folders in "setup_cfg.py" to the GSL installation

It still gives me an error when I run "python setup_cfg.py install":

[...]
gcc -fno-strict-aliasing -I/Users/rionbr/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/rionbr/anaconda/lib/python2.7/site-packages/numpy/core/include -I/usr/local/include -I/Users/rionbr/anaconda/include/python2.7 -c src/nsb.c -o build/temp.macosx-10.5-x86_64-2.7/src/nsb.o
gcc -bundle -undefined dynamic_lookup -L/Users/rionbr/anaconda/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.5-x86_64-2.7/src/bstrap_bc_wrap.o build/temp.macosx-10.5-x86_64-2.7/src/characterization.o build/temp.macosx-10.5-x86_64-2.7/src/prob.o build/temp.macosx-10.5-x86_64-2.7/src/prob_util.o build/temp.macosx-10.5-x86_64-2.7/src/nsb.o -L/usr/local/bin -L/usr/local/bin -lgsl -lgslcblas -o build/lib.macosx-10.5-x86_64-2.7/thoth/_bstrap_bc.so -fpic
ld: library not found for -lgsl
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: command 'gcc' failed with exit status 1


Any idea what it might be?

Thanks!
=]

<<< [some excitement here, not shown]

IT WORKED

Using gsl-1.6 (instead of 1.9) now seems to work. I changed the GSL folder to:

156-56-95-122:~ rionbr$ gsl-config --libs
-L/usr/local/lib -lgsl -lgslcblas
156-56-95-122:~ rionbr$ gsl-config --cflags
-I/usr/local/include

But with no -L or -I in front.
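As a convenience, the bare folder names can be extracted from the gsl-config output with a few lines of Python. This is only a sketch: how setup_cfg.py names its settings is not shown here, so the dictionary keys below are hypothetical, and the flag strings are copied from the output above.

```python
import shlex

def split_gsl_flags(flags):
    """Split gsl-config output into bare directories and library names."""
    parsed = {"library_dirs": [], "include_dirs": [], "libraries": []}
    for token in shlex.split(flags):
        if token.startswith("-L"):
            parsed["library_dirs"].append(token[2:])   # drop the leading -L
        elif token.startswith("-I"):
            parsed["include_dirs"].append(token[2:])   # drop the leading -I
        elif token.startswith("-l"):
            parsed["libraries"].append(token[2:])      # drop the leading -l
    return parsed

libs_info = split_gsl_flags("-L/usr/local/lib -lgsl -lgslcblas")
cflags_info = split_gsl_flags("-I/usr/local/include")
print(libs_info["library_dirs"], cflags_info["include_dirs"])
```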

[Simon note: I find THOTH works just fine with gsl-1.15 (where 1.15 > 1.6 and 1.9) and installation via MacPorts.]