HOWTO
We have prepared some simple Fortran 90 programs that implement the maximum
likelihood data clustering method, as well as a few other standard methods. If
you do not have a Fortran 90 compiler, follow this link.
In what follows, we assume you have an ensemble of N data sets, each
of length D.
The steps to follow are:
-
You need to download the source code (e.g. grpsan.f90) and the corresponding
parameter file (e.g. grpsan.par).
-
The parameter file must be modified according to the characteristics of
your data and to what you are trying to do. For example, in this file you
specify how many data sets you have (N), the range of "beta" (the inverse
fictitious temperature) over which the simulation runs, and a conventional
three-letter prefix for the input/output files.
-
Check that the distribution of your data is not too different from a
Gaussian. You may find it useful to take the logarithm of the data, or to
use Kendall's tau instead of the covariance matrix (see the first Phys. Rev. E
paper...).
-
To prepare the covariance matrix (Pearson's coefficients) of your data,
you FIRST have to normalize each data set so that it has ZERO mean and
UNIT variance. In other words, subtract from each set its average, then
divide it by the square root of its variance (its standard deviation). The
covariance matrix of the normalized data is then exactly the matrix of
Pearson's coefficients.
-
Make sure the covariance matrix is written in the upper triangular form
expected by the program!
-
Compile the code (you know how to do it, right?). If this step fails, let
us know!
-
Run the code. You can keep track of what is going on by looking at the
files xxx.ent.yyy and xxx.now.yyy (xxx being the prefix set in the parameter
file). They contain information on the current ground state, energy,
temperature, etc.
-
Relax: simulated annealing can take some time. To give you an idea, it
took us about 12 hours to find the ground state for N=2500 data sets. At
the beginning, however, you will probably want to choose a faster annealing
schedule, just to explore the "energy landscape" generated by your
data.
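As a rough illustration of the data-preparation steps above, here is a minimal Python sketch (not part of the distributed Fortran programs) that normalizes each of the N data sets to zero mean and unit variance, builds the N x N matrix of Pearson coefficients, offers Kendall's tau (the simple tau-a variant, with no correction for ties) as an alternative, and formats the upper triangle one row per line. All function names are hypothetical, and the output layout (line i holding entries j >= i, diagonal included) is an assumption: check it against the routine that reads the matrix in grpsan.f90 before relying on it.

```python
import math


def normalize(series):
    """Return one data set shifted to zero mean and scaled to unit variance."""
    d = len(series)
    mean = sum(series) / d
    # Population variance (divide by D), so each normalized set has sum(z^2) = D.
    # A constant data set (zero variance) would cause a division by zero here.
    var = sum((x - mean) ** 2 for x in series) / d
    sd = math.sqrt(var)
    return [(x - mean) / sd for x in series]


def pearson_matrix(data):
    """N x N matrix C[i][j] = (1/D) * sum_k z_i[k] * z_j[k] of the normalized sets."""
    z = [normalize(series) for series in data]
    n, d = len(z), len(z[0])
    return [[sum(a * b for a, b in zip(z[i], z[j])) / d for j in range(n)]
            for i in range(n)]


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / (number of pairs).

    Tied pairs contribute 0; no tie correction is applied.
    """
    n = len(x)
    s = 0
    for k in range(n):
        for l in range(k + 1, n):
            p = (x[k] - x[l]) * (y[k] - y[l])
            s += (p > 0) - (p < 0)  # +1 concordant, -1 discordant, 0 tie
    return 2.0 * s / (n * (n - 1))


def kendall_matrix(data):
    """N x N matrix of pairwise Kendall's tau, an alternative to Pearson."""
    n = len(data)
    return [[kendall_tau(data[i], data[j]) for j in range(n)] for i in range(n)]


def upper_triangle_lines(c, fmt="{:.8f}"):
    """ASSUMED layout: line i holds c[i][i], c[i][i+1], ..., c[i][N-1]."""
    n = len(c)
    return [" ".join(fmt.format(c[i][j]) for j in range(i, n)) for i in range(n)]


if __name__ == "__main__":
    # Toy ensemble: N = 3 data sets of length D = 4.
    data = [[1.0, 2.0, 3.0, 4.0],
            [2.0, 4.0, 6.0, 8.0],
            [4.0, 3.0, 2.0, 1.0]]
    # Optional log transform (only valid for strictly positive data):
    # data = [[math.log(x) for x in series] for series in data]
    for line in upper_triangle_lines(pearson_matrix(data)):
        print(line)
```

Redirect the printed lines to whatever input file name the parameter file expects; using kendall_matrix(data) in place of pearson_matrix(data) gives the Kendall's tau variant with the same layout.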
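To see why a faster annealing schedule saves time, here is a minimal Python sketch of a generic geometric schedule, in which the inverse temperature beta is multiplied by a constant factor at each stage between the two ends of the beta range. This is an assumption for illustration only: the function name and the geometric form are ours, and the schedule actually implemented in grpsan.f90 (and configured in grpsan.par) may differ. A larger factor gives fewer stages, hence a quicker but rougher exploration of the energy landscape.

```python
def geometric_beta_schedule(beta_min, beta_max, factor):
    """Inverse temperatures beta_min, beta_min*factor, ..., ending at beta_max.

    Generic geometric cooling schedule; an assumption, not necessarily the
    schedule used by grpsan.f90.
    """
    if not (0 < beta_min < beta_max) or factor <= 1:
        raise ValueError("need 0 < beta_min < beta_max and factor > 1")
    betas = []
    beta = beta_min
    while beta < beta_max:
        betas.append(beta)
        beta *= factor  # cool down: larger beta means lower temperature
    betas.append(beta_max)  # always finish exactly at the coldest stage
    return betas


if __name__ == "__main__":
    # Hypothetical numbers: same beta range, slow versus fast cooling.
    slow = geometric_beta_schedule(0.1, 100.0, 1.01)
    fast = geometric_beta_schedule(0.1, 100.0, 1.2)
    print(len(slow), "stages (slow) vs", len(fast), "stages (fast)")
```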
Now that you have (hopefully) read these instructions, you can download the programs.