Daily Notes: 2025-11-28
daily
ML Notes
Adaline: another single-layer neural network
- Adaline: Adaptive Linear Neuron. Also known as the Widrow-Hoff rule; published by Bernard Widrow and Ted Hoff a few years after Rosenblatt's perceptron algorithm, as an improvement on it.
- The Adaline algorithm is important because it introduces the idea of defining and minimizing a continuous loss function.
- This lays the groundwork for other ML classification algorithms such as logistic regression, support vector machines, multilayer neural networks, etc.
Comparison with Perceptron
- Perceptron learning tweaks the weights when the predicted class (the net input compared against the threshold) disagrees with the label. Each update uses eta * (target - predict(x)), and the errors are counted per epoch.
- Consider the following loop from Perceptron.fit():
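```python
for _ in range(self.n_iter):
    errors = 0
    for xi, target in zip(X, y):
        # Update is nonzero only when the thresholded prediction disagrees with the label
        update = self.eta * (target - self.predict(xi))
        self.w_ += update * xi
        self.b_ += update
        errors += int(update != 0.0)
    self.errors_.append(errors)
return self
```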
Adaline uses gradient descent and differs from the Perceptron in two important ways:
- Linear activation: No more thresholding the net input. Instead, Adaline keeps the raw value net_input = w·x + b and compares that to the true continuous target.
- Cost function and update: Adaline minimizes the sum of squared errors (SSE) between net_input and target (a short derivation is sketched after this list).
  - Gradient of SSE w.r.t. the weights: (target - net_input) * x
  - Weight update for weights & bias:
    w += eta * (target - net_input) * x
    b += eta * (target - net_input)
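My own derivation sketch of where those updates come from, assuming the per-sample loss is written with a convenience factor of 1/2 so the 2 from the power rule cancels (the book may define the constant differently):

```latex
% Per-sample loss, where net_input = w^T x + b
L(\mathbf{w}, b) = \tfrac{1}{2}\left(t - \mathbf{w}^{\top}\mathbf{x} - b\right)^{2}

% Gradients via the chain rule
\frac{\partial L}{\partial \mathbf{w}} = -\left(t - \mathbf{w}^{\top}\mathbf{x} - b\right)\mathbf{x},
\qquad
\frac{\partial L}{\partial b} = -\left(t - \mathbf{w}^{\top}\mathbf{x} - b\right)

% Stepping against the gradient with learning rate \eta
\mathbf{w} \leftarrow \mathbf{w} + \eta\,(t - \text{net\_input})\,\mathbf{x},
\qquad
b \leftarrow b + \eta\,(t - \text{net\_input})
```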
Note: Why is the actual difference important?
Using the actual (continuous) difference matters because the loss is differentiable and the updates change smoothly. This allows you to apply batch or stochastic gradient descent.
- Pseudocode for the per-sample gradient descent loop (see the sketch after this list):
  - Compute net_input for each sample.
  - Calculate the error, which is target - net_input.
  - Determine the direction and magnitude of the update by multiplying that error by the input vector x and the learning rate eta.
  - Aggregate the SSE per epoch to monitor whether it is converging.
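A minimal runnable sketch of that loop in NumPy, mirroring the Perceptron class above; the class name AdalineSketch, the sse_ attribute, the seeding, and the 0/1 labels in predict() are my own placeholders, not anything from the book:

```python
import numpy as np

class AdalineSketch:
    """Per-sample gradient descent on the SSE loss (illustrative sketch only)."""

    def __init__(self, eta=0.01, n_iter=15, seed=1):
        self.eta = eta          # learning rate
        self.n_iter = n_iter    # number of passes (epochs) over the training set
        self.seed = seed

    def net_input(self, x):
        # Linear activation: the raw value w·x + b, no thresholding
        return np.dot(x, self.w_) + self.b_

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        rng = np.random.default_rng(self.seed)
        self.w_ = rng.normal(loc=0.0, scale=0.01, size=X.shape[1])
        self.b_ = 0.0
        self.sse_ = []                                   # SSE per epoch, to monitor convergence
        for _ in range(self.n_iter):
            sse = 0.0
            for xi, target in zip(X, y):
                error = target - self.net_input(xi)      # continuous error, not a class disagreement
                self.w_ += self.eta * error * xi         # step along the per-sample gradient
                self.b_ += self.eta * error
                sse += error ** 2
            self.sse_.append(sse)
        return self

    def predict(self, X):
        # Threshold only at prediction time to turn net_input into class labels
        return np.where(np.dot(X, self.w_) + self.b_ >= 0.0, 1, 0)
```

With a small eta (around 0.01 on standardized features), sse_ should shrink epoch over epoch; if eta is too large, the steps overshoot and the SSE grows instead of converging.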
Note: Why is this better than the Perceptron?
Adaline relies on a continuous error surface, which allows it to converge when a Perceptron might oscillate (for example, when the classes are not perfectly linearly separable).
Personal Notes
- This is the first time I’ve seen the mathematical intuition emerge as extremely important.
- It’s hard to get back into ML studying. I’m trying to keep the big picture in mind of the “why” behind the learning.
- It’s extremely motivating to listen to how Gabriel Petersson thinks about empowering yourself to learn using AI - getting down to the bottom of things and truly understanding vs. vibecoding.
Questions I still have
- Why is Adaline a single layer neural network? I’m assuming it’s because there’s one “decision” that the algorithm makes before it corrects itself per epoch. This is consistent with what I’d expect given the videos I’ve seen of MNNs in action. Would love to understand whether this is correct.
Tomorrow’s plan
- I need to study up on Gradient Descent to truly understand Adaline. I will watch 3Blue1Brown’s video as well.