Slide 14 of 19
Notes:
The goal of clustering methods is to uncover the structure present in the data set. However, the data is affected by measurement noise. This was made clear by studies of the S&P500 correlation matrix of Pearson’s coefficients (Laloux et al. 1998, Plerou et al. 1998), which show that the distribution of eigenvalues is considerably affected by noise dressing. With an explicit model for the correlations, whose parameters can be fitted, we can separate the noisy component from the structure. In brief (see ref. [1] for more details), we compute the parameters of the model at several values of b. This gives us the model's theoretical correlations. To compare these with the real correlations, we generate synthetic data sets of the same length D as the real data and compare the correlation matrix of the synthetic data with the corresponding coefficients for the real data set. In other words, we redress the theoretical correlations with noise in order to compare them with the actual data. Finally, we find the value of b that best reproduces the observed noise-dressed correlation matrix. The results are shown in the figures.
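The noise-dressing comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure of ref. [1]: the fitted theoretical correlation matrix for each value of b is assumed to be already available (passed in as the hypothetical `candidates` dictionary), the synthetic data are assumed Gaussian, and the Frobenius norm is an assumed choice of distance between correlation matrices. All function names are made up for this example.

```python
import numpy as np


def noise_dressed_correlation(c_model, length, rng):
    """Draw `length` Gaussian observations whose true correlation matrix is
    `c_model`, then return their sample Pearson correlation matrix.
    (Gaussian sampling is an assumption of this sketch.)"""
    n = c_model.shape[0]
    data = rng.multivariate_normal(np.zeros(n), c_model, size=length)
    return np.corrcoef(data, rowvar=False)


def mean_dressed_distance(c_model, c_observed, length, n_samples, rng):
    """Average Frobenius distance between the observed correlation matrix
    and `n_samples` noise-dressed versions of the model matrix."""
    distances = [
        np.linalg.norm(noise_dressed_correlation(c_model, length, rng) - c_observed)
        for _ in range(n_samples)
    ]
    return float(np.mean(distances))


def best_b(candidates, c_observed, length, n_samples=20, seed=0):
    """`candidates` maps each value of b to the theoretical correlation
    matrix fitted at that b (the fitting step itself is not shown here).
    Returns the b with the smallest mean distance, plus all scores."""
    rng = np.random.default_rng(seed)
    scores = {
        b: mean_dressed_distance(c_model, c_observed, length, n_samples, rng)
        for b, c_model in candidates.items()
    }
    return min(scores, key=scores.get), scores
```

Here `length` plays the role of D: the synthetic series are given the same length as the real data, so that the model correlations carry a comparable amount of sampling noise before they are compared with the observed ones.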