Deep Learning
Neural Networks
The foundation of deep learning — networks of simple units that learn complex patterns.
The Neuron
A single neuron computes a weighted sum of inputs and passes it through an activation function:
a = φ(w·x + b)
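A minimal sketch of this computation in PyTorch (the numbers are arbitrary, chosen only for illustration):
import torch

x = torch.tensor([1.0, 2.0])    # inputs
w = torch.tensor([0.5, -0.25])  # weights
b = torch.tensor(0.1)           # bias
z = w @ x + b                   # weighted sum: 0.5*1 - 0.25*2 + 0.1 = 0.1
a = torch.relu(z)               # activation φ = ReLU, so a = 0.1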
Common Activation Functions
- ReLU: max(0, z) — fast, default for hidden layers
- Sigmoid: 1 / (1 + e⁻ᶻ) — for binary output
- Tanh: (eᶻ − e⁻ᶻ) / (eᶻ + e⁻ᶻ) — zero-centered
- Softmax: for multi-class output probabilities
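A quick sketch of all four in PyTorch (the input values are arbitrary):
import torch

z = torch.tensor([-2.0, 0.0, 3.0])
print(torch.relu(z))            # tensor([0., 0., 3.])
print(torch.sigmoid(z))         # ≈ tensor([0.1192, 0.5000, 0.9526])
print(torch.tanh(z))            # ≈ tensor([-0.9640, 0.0000, 0.9951])
print(torch.softmax(z, dim=0))  # probabilities that sum to 1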
Architecture
A feedforward network stacks layers: input → hidden → output. Each layer's outputs feed the next. With enough hidden units, even a single hidden layer can approximate any continuous function on a compact domain to arbitrary accuracy (the universal approximation theorem).
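One way to express such a stack in PyTorch (the layer sizes here are arbitrary); the Net class in the PyTorch section below builds the same stack as an explicit module:
import torch.nn as nn

# input (784) -> hidden (128) -> output (10), each layer feeding the next
mlp = nn.Sequential(
    nn.Linear(784, 128),  # input -> hidden
    nn.ReLU(),            # nonlinearity between layers
    nn.Linear(128, 10),   # hidden -> output
)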
Backpropagation
Training combines gradient descent with backpropagation: compute the loss at the output, propagate gradients backward layer by layer via the chain rule, then use those gradients to update every weight.
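As a toy sketch of the chain rule at work (made-up numbers, using PyTorch's autograd):
import torch

w = torch.tensor(2.0, requires_grad=True)  # a single weight
x = torch.tensor(3.0)                      # a fixed input
loss = (w * x - 1.0) ** 2                  # loss = (wx - 1)^2 = 25 here
loss.backward()                            # chain rule: dloss/dw = 2(wx - 1)·x
print(w.grad)                              # tensor(30.)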
Python (PyTorch)
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)  # input: 784 pixels -> 128 hidden units
        self.fc2 = nn.Linear(128, 10)   # hidden: 128 units -> 10 classes

    def forward(self, x):
        x = torch.relu(self.fc1(x))  # hidden layer with ReLU activation
        return self.fc2(x)           # raw logits; CrossEntropyLoss applies softmax

model = Net()
loss_fn = nn.CrossEntropyLoss()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
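Putting it together, one training step might look like this (the random batch is a stand-in for real data, e.g. flattened 28×28 MNIST images):
x = torch.randn(32, 784)         # stand-in batch: 32 flattened 28x28 images
y = torch.randint(0, 10, (32,))  # stand-in labels for 10 classes
optim.zero_grad()                # clear gradients from the previous step
loss = loss_fn(model(x), y)      # forward pass + loss at the output
loss.backward()                  # backpropagation fills every .grad
optim.step()                     # Adam updates every weight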
Beyond Feedforward
- CNNs — for images (convolutional layers; sketched below)
- RNNs / LSTMs — for sequences
- Transformers — for language and beyond
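For example, a single convolutional layer in PyTorch (the shapes here are just illustrative):
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
img = torch.randn(1, 1, 28, 28)  # (batch, channels, height, width)
out = conv(img)                  # -> (1, 16, 28, 28): 16 learned feature maps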
The Deep Learning Era
Neural networks unlocked breakthroughs in vision, speech, and language. Modern LLMs and image generators are all built on these foundations — scaled to billions of parameters.