Supervised Learning
K-Nearest Neighbors (KNN)
A simple, intuitive algorithm: predict by looking at the K closest training examples.
How It Works
KNN is a lazy learner — it doesn't build a model during training. Instead, at prediction time it:
- Computes the distance from the query point to every training point
- Selects the K nearest neighbors
- For classification: returns the majority class
- For regression: returns the average value
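A minimal from-scratch sketch of this procedure in NumPy (function and variable names are my own, for illustration only):
import numpy as np
from collections import Counter
def knn_predict(X_train, y_train, x_query, k=5):
    # Distance from the query point to every training point (Euclidean)
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the K nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Classification: majority vote among the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]
For regression, the last line would instead return y_train[nearest].mean().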
Distance Metrics
Euclidean: d(p, q) = √Σ(pᵢ − qᵢ)²
Manhattan: d(p, q) = Σ|pᵢ − qᵢ|
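As a quick numeric check (values chosen arbitrarily):
import numpy as np
p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 0.0, 3.0])
print(np.sqrt(np.sum((p - q) ** 2)))  # Euclidean: sqrt(9 + 4 + 0) ≈ 3.61
print(np.sum(np.abs(p - q)))          # Manhattan: 3 + 2 + 0 = 5.0
In scikit-learn, the metric is selected with the metric parameter of KNeighborsClassifier (the default is Minkowski with p=2, i.e. Euclidean).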
Choosing K
- Small K → noisy, overfits
- Large K → smooth, may underfit
- Often chosen via cross-validation (see the sketch after this list); an odd K avoids ties in binary classification
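A sketch of that cross-validation search using scikit-learn's GridSearchCV (the candidate range 1–29 is an arbitrary choice):
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
X, y = load_iris(return_X_y=True)
# Evaluate odd K from 1 to 29 with 5-fold cross-validation and keep the best
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": list(range(1, 30, 2))}, cv=5)
search.fit(X, y)
print("Best K:", search.best_params_["n_neighbors"])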
Python Implementation
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load the iris dataset and hold out 20% of it for testing
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit a 5-nearest-neighbor classifier and score it on the held-out set
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_tr, y_tr)
print("Accuracy:", knn.score(X_te, y_te))
Important: Scale your features!
Because KNN relies on distance, features with larger scales dominate. Always normalize or standardize your data before applying KNN.
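One convenient way to do this is a scikit-learn pipeline that standardizes the features before the distance computation; a sketch reusing X_tr, X_te, y_tr, y_te from the split above:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
# Standardize each feature to zero mean and unit variance, then apply KNN
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled_knn.fit(X_tr, y_tr)
print("Accuracy (scaled):", scaled_knn.score(X_te, y_te))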
Pros and Cons
Pros: Simple, no training phase, works for multi-class, naturally non-linear.
Cons: Slow at prediction time, suffers from the curse of dimensionality, sensitive to irrelevant features.