This repository contains an introductory exposition of principal component analysis (PCA). The rendered version is available at https://fmicompbio.github.io/PCA_intro2020/pca-intro.html.
PCA is arguably one of the most widely used tools in a data scientist's toolbox, with efficient and easy-to-use implementations available in most data analysis frameworks. What is often less emphasized is the elegant theory underlying PCA, and how a better understanding of this theory can help us get more out of our PCA results and recognize situations where PCA may not be the optimal tool. In this session, we will first go through some of this theory and explain the sense in which the principal components provide the 'optimal' low-dimensional representation of a data set. Next, we will show how to run PCA in R and interpret the various outputs.