Daily Notes: 2025-11-16

daily
Published

November 16, 2025

ML Notes

Implementing the perceptron as a simple ML classification algorithm without scikit-learn

  • McCulloch-Pitts (MCP) neuron model: models the biological neuron as a simple logic gate, with multiple input signals arriving at the dendrites and a single binary output.
  • Perceptron learning rule: Frank Rosenblatt built on the MCP neuron model by proposing an algorithm that learns a weight vector \(w\), which is multiplied with the input features \(x\) to decide whether the neuron fires or not, i.e., a binary output.
  • This is useful for binary classification: does a new data point belong to one class or the other?

Formally, given a defined threshold \(\theta\), we define a decision function \(f(z)\) over the net input \(z\):

\(z = w_1x_1 + w_2x_2 + ... + w_nx_n = w^Tx\)

Note

\(w\) and \(x\) are both \((n \times 1)\) column vectors, which is why we take the transpose \(w^T\) to get their dot product: \((1 \times n) * (n \times 1) = 1 \times 1\), i.e., the scalar \(z\).

\(f(z) = \begin{cases} 1, & z \ge \theta \\ 0, & z < \theta \end{cases}\)
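
As a quick sanity check (a minimal sketch with made-up numbers, not from the textbook), the net input and threshold decision in NumPy:

import numpy as np

theta = 0.5                     # decision threshold
w = np.array([0.5, -0.2, 0.1])  # weight vector
x = np.array([1.0, 2.0, 3.0])   # feature vector

z = np.dot(w, x)                # w^T x; for 1-D NumPy arrays the transpose is implicit
y_hat = 1 if z >= theta else 0  # the step function f(z)
print(z, y_hat)                 # 0.4 0 (fires only if z >= theta)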

If we introduce a bias unit \(b = -\theta\) for ease of implementation, then:

instead of \(z = w^Tx\), we now have \(z = w_1x_1 + w_2x_2 + ... + w_nx_n + b = w^Tx + b\)

\(y = f(z) = \begin{cases} 1, & z \ge 0 \\ 0, & z < 0 \end{cases}\)
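
For example (made-up numbers for illustration): with \(w = (1, 2)\), \(b = -4\), and \(x = (3, 1)\), we get \(z = 1 \cdot 3 + 2 \cdot 1 - 4 = 1 \ge 0\), so \(y = 1\); with \(x = (1, 1)\), \(z = 1 + 2 - 4 = -1 < 0\), so \(y = 0\).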

The perceptron learning rule can be summarized as follows:

  1. Initialize the weights and bias unit to 0 or small random numbers
  2. For each training example \(x^{(i)}\), compute the output value \(\hat{y}^{(i)}\) which is the predicted class label of the \(i\)th training example, predicted by the threshold function \(f(z)\)
  3. Compare the predicted class label of the \(i\)th training example \(\hat{y}^{(i)}\) to the true class label of the \(i\)th training example \(y^{(i)}\)
  4. Update the weights \(w\) and bias unit \(b\) simultaneously

Formally,

\(\forall w_j \in w, w_j := w_j + \Delta w_j\)

\(\Delta w_j = \eta(y^{(i)} - \hat{y}^{(i)})\, x_j^{(i)}\)

\(b := b + \Delta b\)

\(\Delta b = \eta(y^{(i)} - \hat{y}^{(i)})\)
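
To make one update step concrete (illustrative numbers, not from the text): suppose \(\eta = 0.1\), the true label is \(y^{(i)} = 1\), the prediction is \(\hat{y}^{(i)} = 0\), and \(x^{(i)} = (2, 0.5)\). Then

\(\Delta w_1 = 0.1(1 - 0)(2) = 0.2, \quad \Delta w_2 = 0.1(1 - 0)(0.5) = 0.05, \quad \Delta b = 0.1(1 - 0) = 0.1\)

Both weights and the bias move in the direction that increases \(z\) for this misclassified example.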

Note

\(:=\) denotes an update (assignment): the variable on the left is replaced by the value of the expression on the right

\(\eta\) is the Greek letter “eta” and is often used for the learning rate in ML, typically defined as a constant between 0 and 1

Some observations:

  • Each weight \(w_j\) corresponds to a feature \(x_j\). The bias unit \(b\) does not.
  • Each weight update \(\Delta w_j\) is proportional to the value of \(x^{(i)}_j\). The bias unit update is not.
    • Compare \(x^{(i)}_j = 10\) to \(x^{(i)}_j = 1\) in a case where the example is incorrectly classified as class \(0\) when the true class label is \(1\), assuming \(\eta = 1\):
      • \(\Delta w_j = (1 - 0) * 10 = 10\)
      • \(\Delta w_j = (1 - 0) * 1 = 1\)
    • The first case moves the weight ten times as far, so examples with larger feature values shift the decision boundary more aggressively
  • The bias unit \(b\) is part of the linear combination (the score that the perceptron computes), not the activation (the step function). So, \(y = f(z)\) is correct, not \(y = f(z) + b\). The bias shifts the decision boundary.
  • You can see from the formal definition that the bias unit and weights remain unchanged when the perceptron predicts the class label correctly. The perceptron only updates when it makes a mistake in classification.
  • If the data is linearly separable, the perceptron is guaranteed to find a separating hyperplane within a finite number of updates. If the data is not linearly separable, it will keep updating forever, so you need to set a maximum number of epochs in that case (see the XOR sketch after the implementation below).

Note

A step function (in the context of perceptrons) is an activation function that outputs only two possible values. It decides yes/no based on whether the input crosses a threshold.

An activation function is applied to a neuron to determine whether it should “fire” or stay inactive.

An epoch is a pass over the training dataset.

Implementation in Python

  • If you define the perceptron interface as a Python class, you can initialize new Perceptron objects that can learn from data using a fit method and make predictions using a predict method.
Note

An underscore _ is appended to attributes that are not created upon initialization of the object, e.g., self.w_

In Python’s OOP framework, a class is the blueprint, an object is an instance of the class, __init__ is the initializer method. An instance method is a function defined inside a class that operates on a specific instance (object) of that class.

Every instance method must take self as the first parameter because you need to explicitly state which object you are applying it to, i.e., self.something means apply this something to this object so it persists. Persistence is important because after the method finishes, the object will still “remember” it (you’re attaching it to the object itself and can run print(object.something) on it after the method finishes).

Standard practice for any model class: set hyperparameters as instance attributes once on creation under __init__, then reuse them whenever you train or retrain the model.
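
A tiny sketch (toy class, not from the textbook) showing how attributes attached to self persist on the object after a method returns:

class Counter:
    def __init__(self, start=0):
        # hyperparameter-style setting: stored once at creation and reused later
        self.start = start

    def bump(self):
        # instance method: self refers to the specific object it is called on
        self.count_ = self.start + 1  # trailing underscore: created outside __init__
        return self

c = Counter(start=5)
c.bump()
print(c.count_)  # 6 -- the attribute still exists after bump() has finished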

import numpy as np 

class Perceptron:
    """Perceptron classifier.

    Parameters

    eta: float, learning rate between 0.0 and 1.0
    n_iter: int, epochs
    random_state: int, seed for the random number generator (RNG) used for random weight initialization

    Attributes

    w_: 1d-array, weights after fitting
    b_: scalar, bias unit after fitting
    errors_: list, number of misclassifications aka updates in each epoch
    """

    def __init__(self, eta=0.01, n_iter=20, random_state=1):
        self.eta = eta
        self.n_iter = n_iter
        self.random_state = random_state

    def fit(self, X, y):
        """Fit training data.

        Parameters 

        X: array, features
        y: array, target values

        Returns

        self: object
        """
        rgen = np.random.RandomState(self.random_state) # random number generator
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=X.shape[1]) # initializes weight vector
        # rgen.normal samples from a Gaussian (normal) distribution with mean loc=0.0 (no bias towards + or -) and standard deviation scale=0.01 (weights begin near 0)
        # size == the number of columns in X
        self.b_ = np.float64(0.) # initializes bias unit to 0.0 (np.float_ was removed in NumPy 2.0, so use np.float64)
        self.errors_ = [] # logs how many samples were misclassified at each epoch

        for _ in range(self.n_iter): # repeats the training pass n_iter times
            errors = 0 # resets counter at start of each epoch
            for xi, target in zip(X, y): # zip pairs each feature vector xi from X with the corresponding target label in y
                update = self.eta * (target - self.predict(xi)) # update will be 0 if there was no error
                self.w_ += update * xi
                self.b_ += update
                errors += int(update != 0.0) # converts Boolean (update != 0.0) to 1 if True and 0 if False
            self.errors_.append(errors)
        return self
    
    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.w_) + self.b_ # np.dot(a, b) is the dot product of a & b
    
    def predict(self, X):
        """Return class label"""
        return np.where(self.net_input(X) >= 0.0, 1, 0) # np.where(... >= 0.0, 1, 0) returns 1 when the net input >= 0.0, 0 otherwise

def test_perceptron_learns_and_gate():
    # 1) Define the AND dataset
    X = np.array([
        [0, 0],
        [0, 1],
        [1, 0],
        [1, 1],
    ])
    y = np.array([0, 0, 0, 1])

    # 2) Create the model
    clf = Perceptron(eta=0.1, n_iter=20, random_state=1)

    # 3) Train on the dataset
    clf.fit(X, y)

    # 4) Check predictions
    preds = clf.predict(X)

    # 5) Assert predictions match the true labels
    assert np.array_equal(preds, y)

if __name__ == "__main__":
    test_perceptron_learns_and_gate()
    print("✅ Perceptron implementation passed.")
✅ Perceptron implementation passed.
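
As a quick illustration of the convergence caveat noted above (a sketch reusing the Perceptron class defined here; XOR is the classic non-linearly-separable dataset), errors_ never reaches zero no matter how many epochs you allow:

X_xor = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1],
])
y_xor = np.array([0, 1, 1, 0])  # XOR labels: not linearly separable

clf_xor = Perceptron(eta=0.1, n_iter=50, random_state=1)
clf_xor.fit(X_xor, y_xor)

# errors_ logs the number of updates per epoch; for XOR it never hits 0,
# which is why capping n_iter (the maximum number of epochs) matters
print(clf_xor.errors_[-5:])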

Personal Notes

  • I first read about the MCP neuron model from Why Machines Learn by Anil Ananthaswamy and found his exposition to be helpful in understanding this chapter.
  • I typeset all of these formulas in LaTeX. I'm slowly getting the hang of it.
  • I learned how to use pytest from my command line today. Note to self: pytest uses a discovery pattern of filenames starting with test_ (a minimal example is sketched below). I'm still a novice at writing tests and have been outsourcing this to AI. I understand that this is an important skill (even though LLMs are very good at it), so I should practice it…
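
A minimal sketch of that pytest convention (the file and module names here are hypothetical; run pytest from the directory containing the file):

# test_perceptron.py -- pytest collects files named test_*.py and functions named test_*
import numpy as np
from perceptron import Perceptron  # hypothetical module; adjust to wherever the class lives

def test_predict_returns_binary_labels():
    X = np.array([[0, 0], [1, 1]])
    y = np.array([0, 1])
    clf = Perceptron(eta=0.1, n_iter=10, random_state=1)
    clf.fit(X, y)
    assert set(np.unique(clf.predict(X))) <= {0, 1}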

Questions I still have

  • N/A

Tomorrow’s plan

  • Continue with Chapter 2 of Raschka’s ML textbook.