Deep Learning
Neural Networks
The foundation of deep learning — networks of simple units that learn complex patterns.
The Neuron
A single neuron computes a weighted sum of inputs and passes it through an activation function:
a = φ(w·x + b)
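A minimal sketch of this computation in PyTorch (the numbers are arbitrary, chosen only for illustration):
import torch

x = torch.tensor([1.0, 2.0])    # inputs
w = torch.tensor([0.5, -0.25])  # weights
b = torch.tensor(0.1)           # bias
z = w @ x + b                   # weighted sum: 0.5*1 - 0.25*2 + 0.1 = 0.1
a = torch.relu(z)               # activation φ = ReLU, so a = 0.1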
Common Activation Functions
- ReLU: max(0, z) — fast, default for hidden layers
- Sigmoid: 1 / (1 + e⁻ᶻ) — for binary output
- Tanh: (eᶻ − e⁻ᶻ) / (eᶻ + e⁻ᶻ) — zero-centered
- Softmax: for multi-class output probabilities
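A quick sketch of all four in PyTorch (the input values are arbitrary):
import torch

z = torch.tensor([-2.0, 0.0, 3.0])
print(torch.relu(z))            # tensor([0., 0., 3.])
print(torch.sigmoid(z))         # ≈ tensor([0.1192, 0.5000, 0.9526])
print(torch.tanh(z))            # ≈ tensor([-0.9640, 0.0000, 0.9951])
print(torch.softmax(z, dim=0))  # probabilities that sum to 1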
Architecture
A feedforward network stacks layers: input → hidden → output. Each layer's outputs feed the next. With enough hidden units, even a single hidden layer can approximate any continuous function on a compact domain to arbitrary accuracy (the universal approximation theorem).
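One way to express such a stack in PyTorch (the layer sizes here are arbitrary); the Net class in the PyTorch section below builds the same stack as an explicit module:
import torch.nn as nn

# input (784) -> hidden (128) -> output (10), each layer feeding the next
mlp = nn.Sequential(
    nn.Linear(784, 128),  # input -> hidden
    nn.ReLU(),            # nonlinearity between layers
    nn.Linear(128, 10),   # hidden -> output
)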
Backpropagation
Training combines gradient descent with backpropagation: compute the loss at the output, propagate gradients backward layer by layer via the chain rule, then use those gradients to update every weight.
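As a toy sketch of the chain rule at work (made-up numbers, using PyTorch's autograd):
import torch

w = torch.tensor(2.0, requires_grad=True)  # a single weight
x = torch.tensor(3.0)                      # a fixed input
loss = (w * x - 1.0) ** 2                  # loss = (wx - 1)^2 = 25 here
loss.backward()                            # chain rule: dloss/dw = 2(wx - 1)·x
print(w.grad)                              # tensor(30.)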
Python (PyTorch)
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)  # input: 784 pixels -> 128 hidden units
        self.fc2 = nn.Linear(128, 10)   # hidden: 128 units -> 10 classes

    def forward(self, x):
        x = torch.relu(self.fc1(x))  # hidden layer with ReLU activation
        return self.fc2(x)           # raw logits; CrossEntropyLoss applies softmax

model = Net()
loss_fn = nn.CrossEntropyLoss()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
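Putting it together, one training step might look like this (the random batch is a stand-in for real data, e.g. flattened 28×28 MNIST images):
x = torch.randn(32, 784)         # stand-in batch: 32 flattened 28x28 images
y = torch.randint(0, 10, (32,))  # stand-in labels for 10 classes
optim.zero_grad()                # clear gradients from the previous step
loss = loss_fn(model(x), y)      # forward pass + loss at the output
loss.backward()                  # backpropagation fills every .grad
optim.step()                     # Adam updates every weight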
Beyond Feedforward
- CNNs — for images (convolutional layers; sketched below)
- RNNs / LSTMs — for sequences
- Transformers — for language and beyond
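For example, a single convolutional layer in PyTorch (the shapes here are just illustrative):
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
img = torch.randn(1, 1, 28, 28)  # (batch, channels, height, width)
out = conv(img)                  # -> (1, 16, 28, 28): 16 learned feature maps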
The Deep Learning Era
Neural networks unlocked breakthroughs in vision, speech, and language. Modern LLMs and image generators are all built on these foundations — scaled to billions of parameters.