Note
This tutorial was generated from a Jupyter notebook that can be accessed here.
Global and local PCA#
In this tutorial, we demonstrate how global and local PCA can be performed on
synthetic data sets using the reduction
module.
We import the necessary modules:
from PCAfold import preprocess
from PCAfold import reduction
from PCAfold import PCA, LPCA
import matplotlib.pyplot as plt
from matplotlib import gridspec
from matplotlib.colors import ListedColormap
import numpy as np
and we set some initial parameters:
n_points = 1000
save_filename = None
global_color = '#454545'
k1_color = '#0e7da7'
k2_color = '#ceca70'
color_map = ListedColormap([k1_color, k2_color])
Generate a synthetic data set for global PCA#
We generate a synthetic data set on which the global PCA will be performed. This data set is composed of a single cloud of points.
mean_global = [0,1]
covariance_global = [[3.4, 1.1], [1.1, 2.1]]
x_noise, y_noise = np.random.multivariate_normal(mean_global, covariance_global, n_points).T
y_global = np.linspace(0,4,n_points)
x_global = -(y_global**2) + 7*y_global + 4
y_global = y_global + y_noise
x_global = x_global + x_noise
Dataset_global = np.hstack((x_global[:,np.newaxis], y_global[:,np.newaxis]))
This data set can be seen below:
Global PCA#
We perform global PCA to obtain global principal components, global eigenvectors and global eigenvalues:
pca = PCA(Dataset_global, scaling='none', n_components=2)
principal_components_global = pca.transform(Dataset_global, nocenter=False)
eigenvectors_global = pca.A
eigenvalues_global = pca.L
We also retrieve the centered and scaled data set:
Dataset_global_pp = pca.X_cs
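With 'none' scaling, global PCA amounts to centering the data and eigendecomposing its covariance matrix. The following NumPy-only sketch (independent of PCAfold, using data generated analogously to the tutorial's global data set) illustrates what quantities such as pca.A, pca.L, and the principal components correspond to:

```python
import numpy as np

# Generate data analogous to the tutorial's global data set.
np.random.seed(0)
n_points = 1000
y0 = np.linspace(0, 4, n_points)
x = -(y0**2) + 7*y0 + 4 + np.random.normal(0, 1, n_points)
y = y0 + np.random.normal(0, 1, n_points)
X = np.column_stack((x, y))

# Center the data; with 'none' scaling there is no division
# by standard deviations.
X_c = X - np.mean(X, axis=0)

# Eigendecomposition of the covariance matrix gives the eigenvectors
# (columns of A) and eigenvalues (L), analogous to pca.A and pca.L.
covariance = np.cov(X_c, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(covariance)

# Sort in descending order of eigenvalue, as is conventional in PCA.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Principal components are the centered data projected onto the eigenvectors.
principal_components = X_c @ eigenvectors
```

The principal components obtained this way are uncorrelated, and their variances are the eigenvalues.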
Generate a synthetic data set for local PCA#
Similarly, we generate another synthetic data set that is composed of two distinct clouds of points.
mean_local_1 = [0,1]
mean_local_2 = [6,4]
covariance_local_1 = [[2, 0.5], [0.5, 0.5]]
covariance_local_2 = [[3, 0.3], [0.3, 0.5]]
x_noise_1, y_noise_1 = np.random.multivariate_normal(mean_local_1, covariance_local_1, n_points).T
x_noise_2, y_noise_2 = np.random.multivariate_normal(mean_local_2, covariance_local_2, n_points).T
x_local = np.concatenate([x_noise_1, x_noise_2])
y_local = np.concatenate([y_noise_1, y_noise_2])
Dataset_local = np.hstack((x_local[:,np.newaxis], y_local[:,np.newaxis]))
This data set can be seen below:
Cluster the data set for local PCA#
We perform clustering of this data set based on pre-defined bins using the available
preprocess.predefined_variable_bins
function.
We obtain cluster classifications and centroids for each cluster:
(idx, borders) = preprocess.predefined_variable_bins(Dataset_local[:,0], [2.5], verbose=False)
centroids = preprocess.get_centroids(Dataset_local, idx)
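Conceptually, partitioning on pre-defined variable bins assigns each observation to a cluster by thresholding the selected variable at the supplied split value(s), and each centroid is the mean of the observations in that cluster. A NumPy-only sketch of that idea (not PCAfold's implementation) for a single split at 2.5:

```python
import numpy as np

# Example two-column data set; the first column stands in
# for Dataset_local[:,0].
np.random.seed(0)
variable = np.concatenate([np.random.normal(0, 1, 100),
                           np.random.normal(6, 1, 100)])
X = np.column_stack((variable, np.random.normal(0, 1, 200)))

# Observations with variable < 2.5 go to cluster 0, the rest to cluster 1,
# analogous to the classifications in idx.
idx = (variable >= 2.5).astype(int)

# The centroid of each cluster is the mean of its observations,
# analogous to preprocess.get_centroids(Dataset_local, idx).
centroids = np.array([X[idx == k].mean(axis=0) for k in np.unique(idx)])
```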
The result of this clustering can be seen below:
In local PCA, PCA is applied in each cluster separately.
Local PCA#
We perform local PCA to obtain local principal components, local eigenvectors and local eigenvalues:
lpca = LPCA(Dataset_local, idx, scaling='none')
principal_components_local = lpca.principal_components
eigenvectors_local = lpca.A
eigenvalues_local = lpca.L
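Since PCA is applied in each cluster separately, local PCA effectively carries out one eigendecomposition per cluster, on that cluster's locally centered observations. A NumPy-only sketch of the per-cluster computation (the per-cluster lists mirror, but do not reproduce, the LPCA attributes above):

```python
import numpy as np

# Example two-cluster data set and cluster classifications.
np.random.seed(0)
X = np.vstack([np.random.multivariate_normal([0, 1], [[2, 0.5], [0.5, 0.5]], 100),
               np.random.multivariate_normal([6, 4], [[3, 0.3], [0.3, 0.5]], 100)])
idx = np.array([0]*100 + [1]*100)

eigenvectors_per_cluster = []
eigenvalues_per_cluster = []
pcs_per_cluster = []

# PCA is applied to each cluster's observations separately,
# using that cluster's own mean for centering.
for k in np.unique(idx):
    X_k = X[idx == k]
    X_k_centered = X_k - X_k.mean(axis=0)
    L_k, A_k = np.linalg.eigh(np.cov(X_k_centered, rowvar=False))
    order = np.argsort(L_k)[::-1]   # descending eigenvalues
    eigenvalues_per_cluster.append(L_k[order])
    eigenvectors_per_cluster.append(A_k[:, order])
    pcs_per_cluster.append(X_k_centered @ A_k[:, order])
```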
Plotting global versus local PCA#
Finally, for demonstration purposes, we plot the identified global and local eigenvectors on top of both synthetic data sets. The visual result of performing PCA globally and locally can be seen below:
Note that in local PCA, a separate set of eigenvectors is found in each cluster; the same holds for the principal components and eigenvalues.
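A self-contained matplotlib sketch of this kind of plot, drawing eigenvectors as arrows from the data mean and scaling them by the square root of their eigenvalues (the data and colors here only mirror the tutorial's setup), might look like:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')   # non-interactive backend for script use
import matplotlib.pyplot as plt

# Example data set and the eigenvectors of its centered-data covariance.
np.random.seed(0)
X = np.random.multivariate_normal([0, 1], [[3.4, 1.1], [1.1, 2.1]], 500)
X_c = X - X.mean(axis=0)
L, A = np.linalg.eigh(np.cov(X_c, rowvar=False))

fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(X[:, 0], X[:, 1], c='#454545', s=5, alpha=0.5)

# Draw each eigenvector as an arrow anchored at the data mean,
# scaled by the square root of its eigenvalue.
origin = X.mean(axis=0)
for i in range(2):
    ax.quiver(*origin, *(A[:, i] * np.sqrt(L[i])),
              angles='xy', scale_units='xy', scale=0.5, color='#0e7da7')
ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
fig.savefig('global-pca-eigenvectors.png', dpi=150)
```

For the local PCA variant, the same arrow-drawing loop would be repeated per cluster, anchored at each cluster's centroid.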