ML Algorithms
Unsupervised Learning

Principal Component Analysis (PCA)

A dimensionality reduction technique that finds the directions of maximum variance in your data.

The Goal

PCA transforms correlated features into a smaller set of uncorrelated components that capture as much of the original variance as possible. Useful for visualization, noise reduction, and speeding up downstream models.

How It Works

  • Standardize the data (zero mean, unit variance)
  • Compute the covariance matrix
  • Find its eigenvectors and eigenvalues — the eigenvectors are the principal components, and each eigenvalue gives the variance captured along its component
  • Project data onto the top K components
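The four steps above can be sketched from scratch with NumPy. This is a minimal illustration (the function name `pca_manual` is mine, not a standard API), assuming a dense 2-D array with non-constant columns:

```python
import numpy as np

def pca_manual(X, k):
    # Step 1: standardize to zero mean, unit variance per feature
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: covariance matrix of the standardized features
    cov = np.cov(Xs, rowvar=False)
    # Step 3: eigen-decomposition; eigh is appropriate for symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort components by descending eigenvalue (variance captured)
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:k]]
    # Step 4: project the data onto the top k components
    return Xs @ components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
print(pca_manual(X, 2).shape)  # (100, 2)
```

In practice a library implementation (such as sklearn's, below) is preferred; it uses SVD rather than forming the covariance matrix explicitly, which is numerically more stable.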

Python Implementation

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris

# Load the 4-feature iris dataset and standardize it
X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Reduce to two components (e.g. for visualization)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Fraction of the total variance each component captures
print("Explained variance:", pca.explained_variance_ratio_)
print("Total:", sum(pca.explained_variance_ratio_))

Choosing the Number of Components

Plot the cumulative explained variance and choose the smallest K that preserves, say, 95% of the variance.
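scikit-learn supports this selection rule directly: passing a float in (0, 1) as n_components tells PCA to keep the smallest number of components whose cumulative explained variance reaches that threshold. A short sketch on the same iris data:

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# A float threshold asks PCA to pick K automatically:
# keep the fewest components covering >= 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print("K chosen:", pca.n_components_)
print("Variance kept:", pca.explained_variance_ratio_.sum())
```

This avoids hand-reading the cumulative-variance plot, though plotting it is still useful for spotting an "elbow" where additional components stop paying off.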

Important caveat
PCA components are linear combinations of the original features, so they lose direct interpretability, and PCA can only capture linear structure. For non-linear structure, consider t-SNE or UMAP.
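Some interpretability can be recovered by inspecting the loadings: in sklearn, each row of components_ holds the weights that combine the original features into one component. A brief sketch on iris:

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris

data = load_iris()
X_scaled = StandardScaler().fit_transform(data.data)

pca = PCA(n_components=2).fit(X_scaled)

# components_ has shape (n_components, n_features); each row is
# the set of loadings defining one principal component
for i, row in enumerate(pca.components_):
    weights = ", ".join(f"{name}: {w:+.2f}"
                        for name, w in zip(data.feature_names, row))
    print(f"PC{i + 1} -> {weights}")
```

Features with large-magnitude loadings dominate a component, which gives a rough sense of what each component "measures".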