import numpy as np

class AdalineGD:
"""ADAptive LInear NEuron classifier.
Parameters
--------------
eta: float
Learning rate (between 0.0 and 1.0)
n_iter: int
Passes over the training dataset
random_state: int
Random number generator seed for random weight initialization
Attributes
---------------
w_: 1D-Array
Weights after fitting
b_: Scalar
Bias after fitting
losses_: list
Mean Squared Error loss function values in each epoch
"""
def __init__(self, eta=0.01, n_iter=50, random_state=1):
self.eta = eta
self.n_iter = n_iter
self.random_state = random_state
def net_input(self, X):
"""Calculate net input
Think of this as the linear combination of features and weights + bias
This helps get the neuron's "raw score"
"""
return np.dot(X, self.w_) + self.b_
def activation(self, X):
"""Compute linear activation
This is just the identity."""
return X
def predict(self, X):
"""Return class label after unit step
Binary class label of 1 when the activation >= 0.5, else 0
"""
return np.where(self.activation(self.net_input(X)) >= 0.5, 1, 0)
def fit(self, X, y):
""" Fit training data.
Parameters
-----------
X: array-like, shape = [m_examples, n_features]
Training vectors
m_examples: # of examples
n_features: # of features
y: array-like, shape = [m_examples]
Target values
Returns
-----------
self: object
"""
# Initialize the RNG, weights w_ with small random values, bias b_ to zero, losses_ empty
rgen = np.random.RandomState(self.random_state)
# This starts weights as tiny random numbers
self.w_ = rgen.normal(loc=0.0, scale=0.01, size=X.shape[1])
# This starts the bias at zero (np.float_ was removed in NumPy 2.0, so use np.float64)
self.b_ = np.float64(0.)
# This prepares a list to record training error at each pass
self.losses_ = []
for i in range(self.n_iter): # for each epoch
# Given whatever is w_ and b_, we're computing the raw score for each example
net_input = self.net_input(X)
# Activation is the identity (no "squashing" yet)
output = self.activation(net_input)
# How far off are we from the desired targets?
# Positive if too low, negative if too high
errors = (y - output)
"""Weight update.
This step is important!
We are calculating the gradient based on the whole training set,
not just evaluating each individual training example (as in the perceptron).
This makes the learning "smoother" - less aberrations because of individual examples.
This is called "Batch Gradient Descent."
Think of the model as making guesses with two kinds of knobs:
w_: one knob per input feature (like volume sliders for each input).
b_: one extra knob that shifts everything up or down (like a master volume).
Weight update (the "averaged nudge"):
Recall that error = (y - (X * w + b))
The factor of 2.0 comes from the Chain Rule:
the loss is error^2, so its derivative is 2 * error * (derivative of error)
Derivative of (X * w + b) w.r.t. w is X.
Multiply X.T * error to aggregate per feature across all samples
X has rows of training examples, columns of features
errors is rows of how wrong we were per example
X.T is the transpose of X, so that each feature lines up with the
errors across examples
X.T.dot(errors) is the dot product that combines every feature with
its errors.
X is (n_samples, n_features); errors is (n_samples,).
Flipping X gives X.T as (n_features, n_samples).
Dotting (n_features, n_samples) with (n_samples,) yields (n_features,):
a separate summed value for each feature.
Recall that the loss is the mean squared error
Therefore, we need to divide by N (# of examples) so the
summed gradient becomes an average
/ X.shape[0] means "take the average over all examples" so we don't
overreact to any single case.
Bias update: nudge the bias by the avg error
"""
self.w_ += self.eta * 2.0 * X.T.dot(errors) / X.shape[0]
self.b_ += self.eta * 2.0 * errors.mean()
# Compute Mean Squared Error
loss = (errors**2).mean()
# Track loss history so we can see learning progress
self.losses_.append(loss)
return self

Daily Notes: 2025-12-10
daily
ML Notes
Adaline implementation
Personal Notes
- I think I understand the implementation line by line. This tweet by GabrielPeterss4 helped. It’s worth reviewing again and again.
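- One thing that helped the array shapes click: a standalone check of the `X.T.dot(errors)` step from `fit`. The numbers below are made up purely for illustration.

```python
import numpy as np

# Standalone shape check for the gradient step in fit():
# X is (n_samples, n_features), errors is (n_samples,),
# so X.T.dot(errors) is (n_features,) - one summed value per feature.
X = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])              # 3 examples, 2 features
errors = np.array([0.5, -1.0, 0.25])  # one error per example

grad_sum = X.T.dot(errors)            # shape (2,): per-feature sums
grad_avg = grad_sum / X.shape[0]      # divide by N -> the "averaged nudge"

print(grad_sum.shape)  # (2,)
print(grad_avg)
```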
Questions I still have
- Need to be able to implement this from scratch. I wonder if I’m missing the forest for the trees here, but I do think it’s important to really understand Gradient Descent forwards and backwards.
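- To that end, here's my attempt at the same batch gradient descent loop from scratch, outside the class. The AND-style toy data, eta, and epoch count are my own choices, not from the book:

```python
import numpy as np

# From-scratch batch gradient descent for Adaline on a toy AND-like dataset.
rng = np.random.RandomState(1)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 0., 0., 1.])

w = rng.normal(loc=0.0, scale=0.01, size=X.shape[1])  # tiny random weights
b = 0.0
eta = 0.1

losses = []
for _ in range(500):
    output = X.dot(w) + b                          # identity activation
    errors = y - output                            # per-example error
    w += eta * 2.0 * X.T.dot(errors) / X.shape[0]  # averaged nudge per feature
    b += eta * 2.0 * errors.mean()                 # nudge bias by the avg error
    losses.append((errors ** 2).mean())            # track MSE per epoch

pred = np.where(X.dot(w) + b >= 0.5, 1, 0)
print(losses[0], losses[-1], pred)  # loss should shrink; pred should match y
```

  If this agrees with `AdalineGD(...).fit(X, y).predict(X)` on the same data, I've reproduced the loop correctly.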