ML Algorithms
Supervised Learning

K-Nearest Neighbors (KNN)

A simple, intuitive algorithm: predict by looking at the K closest training examples.

How It Works

KNN is a lazy learner: it builds no model during training. Instead, at prediction time it does the following (a from-scratch sketch appears after the list):

  • Computes the distance from the query point to every training point
  • Selects the K nearest neighbors
  • For classification: returns the majority class
  • For regression: returns the average value
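To make this concrete, here is a minimal from-scratch sketch of the classification case. The function knn_predict and the toy arrays are made up for illustration, not part of any library:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    # Euclidean distance from the query point to every training point
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # Classification: return the majority class among the k neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two well-separated clusters
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.5, 5.0]), k=3))  # -> 1

Note that np.argsort sorts all distances; production libraries speed this up with structures such as KD-trees or ball trees.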

Distance Metrics

Euclidean: d(p, q) = √( Σᵢ (pᵢ − qᵢ)² )
Manhattan: d(p, q) = Σᵢ |pᵢ − qᵢ|
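Both metrics are one line in NumPy; the vectors here are arbitrary examples:

import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 0.0, 3.0])

# Euclidean: square root of the summed squared differences
print(np.sqrt(np.sum((p - q) ** 2)))   # √(9 + 4 + 0) ≈ 3.606
# Manhattan: sum of absolute differences
print(np.sum(np.abs(p - q)))           # 3 + 2 + 0 = 5.0

In scikit-learn, KNeighborsClassifier takes a metric parameter; the default is Minkowski with p=2 (Euclidean), and setting p=1 gives Manhattan distance.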

Choosing K

  • Small K → noisy decision boundary; the model overfits
  • Large K → smooth decision boundary; the model may underfit
  • K is usually chosen by cross-validation (see the sketch below); an odd K avoids ties in binary classification
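A minimal sketch of that search using scikit-learn's cross_val_score; the grid of candidate values is arbitrary:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validation accuracy for each candidate K
for k in [1, 3, 5, 7, 9, 11]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"K={k}: mean accuracy {scores.mean():.3f}")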

Python Implementation

from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the iris dataset and hold out 20% for testing
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# fit() just stores the training data; distances are computed at predict time
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_tr, y_tr)
print("Accuracy:", knn.score(X_te, y_te))

Important: Scale your features!

Because KNN relies on distance, features with larger scales dominate. Always normalize or standardize your data before applying KNN.
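One way to apply that advice, continuing the example above (this reuses X_tr, X_te, y_tr, y_te from the previous snippet): a scikit-learn Pipeline runs StandardScaler before KNN, so the scaler is fit on the training split only.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Standardize features (zero mean, unit variance), then run KNN on the scaled values
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_tr, y_tr)
print("Accuracy (scaled):", model.score(X_te, y_te))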

Pros and Cons

Pros: Simple and intuitive, no explicit training phase (fit just stores the data), handles multi-class problems out of the box, learns non-linear decision boundaries naturally.

Cons: Slow at prediction time (it must compute distances to every training point), suffers from the curse of dimensionality, sensitive to irrelevant features and to feature scale.