Supervised Learning
K-Nearest Neighbors (KNN)
A simple, intuitive algorithm: predict by looking at the K closest training examples.
How It Works
KNN is a lazy learner — it doesn't build a model during training. Instead, at prediction time it:
- Computes the distance from the query point to every training point
- Selects the K nearest neighbors
- For classification: returns the majority class
- For regression: returns the average value
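A minimal from-scratch sketch of this procedure in NumPy (function and variable names are my own, for illustration only):
import numpy as np
from collections import Counter
def knn_predict(X_train, y_train, x_query, k=5):
    # Distance from the query point to every training point (Euclidean)
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the K nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Classification: majority vote among the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]
For regression, the last line would instead return y_train[nearest].mean().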
Distance Metrics
Euclidean: d(p, q) = √Σ(pᵢ − qᵢ)²
Manhattan: d(p, q) = Σ|pᵢ − qᵢ|
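As a quick numeric check (values chosen arbitrarily):
import numpy as np
p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 0.0, 3.0])
print(np.sqrt(np.sum((p - q) ** 2)))  # Euclidean: sqrt(9 + 4 + 0) ≈ 3.61
print(np.sum(np.abs(p - q)))          # Manhattan: 3 + 2 + 0 = 5.0
In scikit-learn, the metric is selected with the metric parameter of KNeighborsClassifier (the default is Minkowski with p=2, i.e. Euclidean).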
Choosing K
- Small K → noisy, overfits
- Large K → smooth, may underfit
- Often chosen via cross-validation (see the sketch after this list); an odd K avoids ties in binary classification
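A sketch of that cross-validation search using scikit-learn's GridSearchCV (the candidate range 1–29 is an arbitrary choice):
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
X, y = load_iris(return_X_y=True)
# Evaluate odd K from 1 to 29 with 5-fold cross-validation and keep the best
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": list(range(1, 30, 2))}, cv=5)
search.fit(X, y)
print("Best K:", search.best_params_["n_neighbors"])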
Python Implementation
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load the iris dataset and hold out 20% of it for testing
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit a 5-nearest-neighbor classifier and score it on the held-out set
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_tr, y_tr)
print("Accuracy:", knn.score(X_te, y_te))
Important: Scale your features!
Because KNN relies on distance, features with larger scales dominate. Always normalize or standardize your data before applying KNN.
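One convenient way to do this is a scikit-learn pipeline that standardizes the features before the distance computation; a sketch reusing X_tr, X_te, y_tr, y_te from the split above:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
# Standardize each feature to zero mean and unit variance, then apply KNN
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled_knn.fit(X_tr, y_tr)
print("Accuracy (scaled):", scaled_knn.score(X_te, y_te))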
Pros and Cons
Pros: Simple, no training phase, works for multi-class, naturally non-linear.
Cons: Slow at prediction time, suffers from the curse of dimensionality, sensitive to irrelevant features.