Note
This tutorial was generated from a Jupyter notebook that can be accessed here.
Plotting PCA results
In this tutorial, we present plotting functionalities from the reduction
module that aid in visualizing PCA results.
We import the necessary modules:
from PCAfold import PCA
from PCAfold import reduction
import numpy as np
and we set some initial parameters:
title = None
save_filename = None
As an example, we will use a data set representing combustion of syngas (CO/H2 mixture) in air generated from the steady laminar flamelet model. This data set has 11 variables and 50,000 observations. The data set was generated using Spitfire software [HAS+22] and a chemical mechanism by Hawkes et al. [HSSC07]. To load the data set from the tutorials directory:
X = np.genfromtxt('data-state-space.csv', delimiter=',')
X_names = ['$T$', '$H_2$', '$O_2$', '$O$', '$OH$', '$H_2O$', '$H$', '$HO_2$', '$CO$', '$CO_2$', '$HCO$']
We generate four PCA objects corresponding to four scaling criteria:
pca_X_Auto = PCA(X, scaling='auto', n_components=3)
pca_X_Range = PCA(X, scaling='range', n_components=3)
pca_X_Vast = PCA(X, scaling='vast', n_components=3)
pca_X_Pareto = PCA(X, scaling='pareto', n_components=3)
and we will plot PCA results from the generated objects.
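To make the four scaling criteria more concrete, the sketch below applies them by hand with NumPy. It is only an illustration: the data is synthetic (not the syngas data set), the names `X_demo`, `scalings`, and `scaled` are hypothetical, and the scaling factors follow their common literature definitions, which are assumed here rather than taken from PCAfold's implementation.

```python
import numpy as np

# Synthetic stand-in for the data set: 1000 observations of 3 variables
# with very different magnitudes (hypothetical data).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[1500.0, 0.2, 0.01],
                    scale=[300.0, 0.05, 0.002],
                    size=(1000, 3))

mean = X_demo.mean(axis=0)
std = X_demo.std(axis=0)

# Scaling factors as commonly defined in the literature (an assumption here).
scalings = {
    'auto': std,                                        # standard deviation
    'range': X_demo.max(axis=0) - X_demo.min(axis=0),   # maximum minus minimum
    'vast': std**2 / mean,                              # variance over mean
    'pareto': np.sqrt(std),                             # square root of the standard deviation
}

# Center each variable by its mean, then divide by the scaling factor.
scaled = {name: (X_demo - mean) / d for name, d in scalings.items()}

# Print the per-variable standard deviation after each scaling.
for name, X_s in scaled.items():
    print(name, X_s.std(axis=0))
```

Because each criterion weights the variables differently, the resulting eigenvectors and eigenvalues generally differ between the four PCA objects, which is what the comparison plots in this tutorial visualize.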
Eigenvectors
Weights of a single eigenvector can be plotted using the reduction.plot_eigenvectors
function. Note that multiple eigenvectors can be passed as an input, in which case this function
generates as many plots as there are eigenvectors supplied.
Below is an example of plotting just the first eigenvector:
plt = reduction.plot_eigenvectors(pca_X_Auto.A[:,0], variable_names=X_names)
To plot all eigenvectors resulting from a single PCA class object:
plts = reduction.plot_eigenvectors(pca_X_Auto.A, variable_names=X_names)
Two weight normalizations are available:
No normalization. To use this variant, set plot_absolute=False. An example is shown below:
plt = reduction.plot_eigenvectors(pca_X_Auto.A[:,0], eigenvectors_indices=[], variable_names=X_names, plot_absolute=False, save_filename=save_filename)
Absolute values. To use this variant, set plot_absolute=True. An example is shown below:
plt = reduction.plot_eigenvectors(pca_X_Auto.A[:,0], eigenvectors_indices=[], variable_names=X_names, plot_absolute=True, save_filename=save_filename)
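The two variants above differ only in whether the sign of each weight is kept. The following sketch, using NumPy alone on hypothetical random data, shows one way such weights can be obtained and what the two views correspond to numerically. It assumes auto scaling and an eigendecomposition of the covariance matrix; it is not PCAfold's internal code.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical correlated data: 500 observations of 4 variables.
X_demo = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))

# Auto-scale the data, then eigendecompose its covariance matrix.
X_cs = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)
C = np.cov(X_cs, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)

# eigh returns eigenvalues in ascending order; reorder so that the first
# column corresponds to the largest eigenvalue.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

first = eigvecs[:, 0]       # signed weights (the plot_absolute=False view)
first_abs = np.abs(first)   # magnitudes only (the plot_absolute=True view)
print(first)
print(first_abs)
```

The signed view shows which variables act in opposite directions along an eigenvector, while the absolute view makes it easier to rank variables by the size of their contribution.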
Eigenvectors comparison
Eigenvectors resulting from, for instance, different PCA class objects can
be compared on a single plot using the reduction.plot_eigenvectors_comparison
function.
Two weight normalizations are available:
No normalization. To use this variant, set plot_absolute=False. An example is shown below:
plt = reduction.plot_eigenvectors_comparison((pca_X_Auto.A[:,0], pca_X_Range.A[:,0], pca_X_Vast.A[:,0], pca_X_Pareto.A[:,0]), legend_labels=['Auto', 'Range', 'Vast', 'Pareto'], variable_names=X_names, plot_absolute=False, color_map='coolwarm', save_filename=save_filename)
Absolute values. To use this variant, set plot_absolute=True. An example is shown below:
plt = reduction.plot_eigenvectors_comparison((pca_X_Auto.A[:,0], pca_X_Range.A[:,0], pca_X_Vast.A[:,0], pca_X_Pareto.A[:,0]), legend_labels=['Auto', 'Range', 'Vast', 'Pareto'], variable_names=X_names, plot_absolute=True, color_map='coolwarm', save_filename=save_filename)
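One reason the absolute-value variant can be helpful when comparing eigenvectors is that an eigenvector is only defined up to its sign: if v is an eigenvector, so is -v. Two PCA objects may therefore return visually mirrored weights that describe the same direction. The small NumPy check below illustrates this on a hypothetical symmetric matrix standing in for a covariance matrix.

```python
import numpy as np

# Small symmetric matrix standing in for a covariance matrix (hypothetical values).
C = np.array([[2.0, 0.8, 0.3],
              [0.8, 1.5, 0.4],
              [0.3, 0.4, 1.0]])

eigvals, eigvecs = np.linalg.eigh(C)
v = eigvecs[:, -1]     # eigenvector of the largest eigenvalue
lam = eigvals[-1]

# Both v and -v satisfy C v = lambda v ...
same_eigenpair = np.allclose(C @ v, lam * v) and np.allclose(C @ (-v), lam * (-v))
# ... the signed weights differ ...
signs_differ = not np.allclose(v, -v)
# ... but their absolute values coincide.
abs_match = np.allclose(np.abs(v), np.abs(-v))

print(same_eigenpair, signs_differ, abs_match)
```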
Eigenvalue distribution
The eigenvalue distribution can be plotted using the reduction.plot_eigenvalue_distribution
function.
Two eigenvalue normalizations are available:
No normalization. To use this variant, set normalized=False. An example is shown below:
plt = reduction.plot_eigenvalue_distribution(pca_X_Auto.L, normalized=False, save_filename=save_filename)
Normalized to 1. To use this variant, set normalized=True. An example is shown below:
plt = reduction.plot_eigenvalue_distribution(pca_X_Auto.L, normalized=True, save_filename=save_filename)
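As a rough numerical illustration of the normalized variant, the sketch below rescales a hypothetical set of eigenvalues so that the largest one equals 1. Reading "normalized to 1" as division by the largest eigenvalue is an assumption made for this example and should be checked against the function's documentation.

```python
import numpy as np

# Hypothetical eigenvalues, sorted in descending order.
eigenvalues = np.array([5.0, 2.5, 1.0, 0.5])

# Assumed meaning of normalized=True: divide by the largest eigenvalue,
# so that the first value becomes exactly 1.
normalized = eigenvalues / eigenvalues.max()
print(normalized)
```

Normalizing this way keeps the relative decay of the eigenvalues intact, which makes distributions from differently scaled data sets easier to compare on one axis.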
Eigenvalue distribution comparison
Eigenvalues resulting from, for instance, different PCA class objects can
be compared on a single plot using the reduction.plot_eigenvalue_distribution_comparison
function.
Two eigenvalue normalizations are available:
No normalization. To use this variant, set normalized=False. An example is shown below:
plt = reduction.plot_eigenvalue_distribution_comparison((pca_X_Auto.L, pca_X_Range.L, pca_X_Vast.L, pca_X_Pareto.L), legend_labels=['Auto', 'Range', 'Vast', 'Pareto'], normalized=False, color_map='coolwarm', save_filename=save_filename)
Normalized to 1. To use this variant, set normalized=True. An example is shown below:
plt = reduction.plot_eigenvalue_distribution_comparison((pca_X_Auto.L, pca_X_Range.L, pca_X_Vast.L, pca_X_Pareto.L), legend_labels=['Auto', 'Range', 'Vast', 'Pareto'], normalized=True, color_map='coolwarm', save_filename=save_filename)
Cumulative variance
Cumulative variance computed from eigenvalues can be plotted using the
reduction.plot_cumulative_variance function. Example of a plot:
plt = reduction.plot_cumulative_variance(pca_X_Auto.L, n_components=0, save_filename=save_filename)
The number of eigenvalues shown can also be truncated by setting the
n_components input parameter accordingly. Example of a plot with n_components=5:
plt = reduction.plot_cumulative_variance(pca_X_Auto.L, n_components=5, save_filename=save_filename)
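The quantity shown in these plots can also be computed directly. The sketch below uses hypothetical eigenvalues and the usual definition of cumulative variance (running sum of eigenvalues divided by their total); the variable names `n_keep` and `n_needed` are introduced only for this illustration.

```python
import numpy as np

# Hypothetical eigenvalues, sorted in descending order.
eigenvalues = np.array([4.0, 3.0, 2.0, 1.0])

# Fraction of the total variance captured by the first k components.
cumulative = np.cumsum(eigenvalues) / np.sum(eigenvalues)
print(cumulative)

# Truncating to the first few components, analogous to limiting the plot.
n_keep = 2
print(cumulative[:n_keep])

# Smallest number of components that explains at least 80% of the variance.
n_needed = int(np.searchsorted(cumulative, 0.8) + 1)
print(n_needed)
```

A threshold such as the 80% used here is an arbitrary choice for the example; in practice it depends on how much reconstruction error is acceptable.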
Two-dimensional manifold
A two-dimensional manifold resulting from the PCA transformation can be
plotted using the reduction.plot_2d_manifold function. We first calculate
the principal components by transforming the original data set to the new basis:
principal_components = pca_X_Vast.transform(X)
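Conceptually, the transformation centers and scales the data and then projects it onto the leading eigenvectors. The NumPy sketch below reproduces these steps on hypothetical random data. It uses auto scaling purely for illustration (the tutorial's object uses VAST scaling) and is not PCAfold's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical correlated data: 200 observations of 5 variables.
X_demo = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# Center and scale (auto scaling assumed for this illustration).
center = X_demo.mean(axis=0)
scale = X_demo.std(axis=0)
X_cs = (X_demo - center) / scale

# Eigenvectors of the covariance matrix, sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(np.cov(X_cs, rowvar=False))
A = eigvecs[:, np.argsort(eigvals)[::-1]]

# Project onto the first q eigenvectors to obtain the principal components.
q = 3
Z = X_cs @ A[:, :q]
print(Z.shape)
```

The columns of `Z` play the role of `principal_components[:,0]`, `principal_components[:,1]`, and so on in the plotting calls of this tutorial.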
By setting the color=X[:,0] parameter, the manifold can additionally be
colored by the first variable in the data set (in this case, the temperature). Note that you can select the colormap through the color_map
parameter. Example of using color_map='inferno' and coloring by the first variable in the data set:
plt = reduction.plot_2d_manifold(principal_components[:,0], principal_components[:,1], color=X[:,0], x_label='$Z_1$', y_label='$Z_2$', colorbar_label='$T$ [K]', color_map='inferno', figure_size=(10,4), save_filename=save_filename)
Example of an uncolored plot:
plt = reduction.plot_2d_manifold(principal_components[:,0], principal_components[:,1], x_label='$Z_1$', y_label='$Z_2$', figure_size=(10,4), save_filename=save_filename)
Example of using color_map='Blues' and coloring by the first variable in the data set:
plt = reduction.plot_2d_manifold(principal_components[:,0], principal_components[:,1], color=X[:,0], x_label='$Z_1$', y_label='$Z_2$', colorbar_label='$T$ [K]', color_map='Blues', figure_size=(10,4), save_filename=save_filename)
Three-dimensional manifold
Similarly, a three-dimensional manifold can be visualized:
plt = reduction.plot_3d_manifold(principal_components[:,0], principal_components[:,1], principal_components[:,2], elev=30, azim=-20, color=X[:,0], x_label='$Z_1$', y_label='$Z_2$', z_label='$Z_3$', colorbar_label='$T$ [K]', color_map='inferno', figure_size=(15,8), save_filename=save_filename)
Parity plot
Parity plots of reconstructed variables can be visualized using the
reduction.plot_parity function. We approximate the data set using the previously
obtained three principal components:
X_rec = pca_X_Vast.reconstruct(principal_components)
and we generate a parity plot which visualizes the reconstruction of the first variable:
plt = reduction.plot_parity(X[:,0], X_rec[:,0], color=X[:,0], x_label='Observed $T$', y_label='Reconstructed $T$', colorbar_label='$T$ [K]', color_map='inferno', figure_size=(7,7), save_filename=None)
As with the plot_2d_manifold function, you can select the colormap through the color_map parameter.
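A parity plot gives a visual impression of reconstruction quality; a single number such as the coefficient of determination can complement it. The sketch below reconstructs hypothetical random data from a truncated basis and computes this coefficient for the first variable. The helper functions `reconstruct` and `r2` are defined here for illustration only and are not part of PCAfold; auto scaling is again an assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical correlated data: 300 observations of 5 variables.
X_demo = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))

center = X_demo.mean(axis=0)
scale = X_demo.std(axis=0)          # auto scaling assumed for illustration
X_cs = (X_demo - center) / scale

eigvals, eigvecs = np.linalg.eigh(np.cov(X_cs, rowvar=False))
A = eigvecs[:, np.argsort(eigvals)[::-1]]   # sorted by decreasing eigenvalue


def reconstruct(q):
    """Project onto the first q eigenvectors and map back to original units."""
    Z = X_cs @ A[:, :q]
    return (Z @ A[:, :q].T) * scale + center


def r2(observed, predicted):
    """Coefficient of determination for a single variable."""
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


X_rec_3 = reconstruct(3)   # truncated basis: approximate reconstruction
X_rec_5 = reconstruct(5)   # full basis: reconstruction is exact up to round-off

print(r2(X_demo[:, 0], X_rec_3[:, 0]))
print(r2(X_demo[:, 0], X_rec_5[:, 0]))
```

Points lying on the diagonal of a parity plot correspond to a coefficient of determination of 1, and the value decreases as the points scatter away from the diagonal.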