Physics and Chemistry of Biological Systems

Unsupervised Learning and non-Parametric methods
6.33 CFU

The aim of this course is to introduce the essential tools of unsupervised learning and dimensionality reduction. These tools are increasingly used to preprocess large data sets in biophysics, molecular dynamics, and beyond. We will present the most relevant dimensionality reduction algorithms for linear data manifolds, curved manifolds, and manifolds with arbitrarily complex topologies. We will then introduce a selection of approaches for estimating the probability density and the intrinsic dimension of the data manifold, followed by unsupervised classification and clustering. Finally, we will introduce unsupervised learning approaches suitable for analysing time-ordered data, such as those generated in a molecular dynamics trajectory. We will briefly touch upon the mathematical and algorithmic foundations of the methods, highlighting their strengths and limitations. The self-directed solution of data analysis exercises is an essential part of the course.