Implementing the perceptron as a simple ML classification algorithm without scikit-learn
McCulloch-Pitts (MCP) neuron model: The biological neuron modeled as a simple logic gate, with multiple input signals arriving at the dendrites and a binary output.
Perceptron learning rule: Frank Rosenblatt built upon this MCP neuron model by proposing an algorithm that would learn a weight vector \(w\) that would be multiplied with the input features \(x\) to make a decision about whether the neuron fires or not: i.e., a binary output.
This is useful for binary classification: given a new data point, does it belong to one class or the other?
Formally, define a decision function \(f(z)\) of the net input \(z\), given a defined threshold \(\theta\):
\(z = w_1x_1 + w_2x_2 + ... + w_nx_n = w^Tx\)
Note
\(w\) and \(x\) are both column vectors, which is why we take the transpose \(w^T\) to get the dot product of the two \((n \times 1)\) column vectors: \((1 \times n) \cdot (n \times 1) = (1 \times 1)\)
\(f(z) = \begin{cases} 1, & z \ge \theta \\ 0, & z < \theta \end{cases}\)
If we introduce a bias unit \(b = -\theta\) for ease of implementation, then:
\(z = w_1x_1 + w_2x_2 + ... + w_nx_n + b = w^Tx + b\)
\(y = f(z) = \begin{cases} 1, & z \ge 0 \\ 0, & z < 0 \end{cases}\)
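As a rough sketch of the two formulations above, the snippet below computes the net input as a dot product of column vectors, then checks that thresholding at \(\theta\) gives the same label as adding a bias \(b = -\theta\) and thresholding at 0. The weight, input, and threshold values are made up for illustration.

```python
import numpy as np

# Illustrative values only; they are not taken from any dataset.
w = np.array([[0.5], [-1.0], [2.0]])  # (3 x 1) column vector of weights
x = np.array([[1.0], [0.5], [0.5]])   # (3 x 1) column vector of features
theta = 0.25                          # assumed threshold

# (1 x 3) @ (3 x 1) -> (1 x 1): the transpose makes the shapes line up.
z_matrix = w.T @ x
z = z_matrix.item()                   # pull the single scalar out

# Formulation 1: compare the net input with the threshold.
label_threshold = 1 if z >= theta else 0

# Formulation 2: fold the threshold into a bias unit and compare with 0.
b = -theta
label_bias = 1 if z + b >= 0 else 0

print(z_matrix.shape, z, label_threshold, label_bias)
```

Both labels agree because \(z \ge \theta\) and \(z - \theta \ge 0\) are the same condition; the bias form simply moves \(\theta\) to the left-hand side.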
The perceptron learning rule can be summarized as follows:
Initialize the weights and bias unit to 0 (or to small random numbers)
For each training example \(x^{(i)}\), compute the output value \(\hat{y}^{(i)}\), which is the predicted class label of the \(i\)th training example, predicted by the threshold function \(f(z)\)
Compare the predicted class label of the \(i\)th training example \(\hat{y}^{(i)}\) to the true class label of the \(i\)th training example \(y^{(i)}\)
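Update the weights \(w\) and bias unit \(b\) simultaneously: \(w_j := w_j + \Delta w_j\) and \(b := b + \Delta b\), where \(\Delta w_j = \eta (y^{(i)} - \hat{y}^{(i)}) x^{(i)}_j\) and \(\Delta b = \eta (y^{(i)} - \hat{y}^{(i)})\)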
\(\eta\) is the Greek letter “eta” and is often used for the learning rate in ML, typically defined as a constant between 0 & 1
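The steps above can be sketched as a single update on one training example. This is a minimal illustration, assuming the standard perceptron update in which each weight changes by \(\eta (y - \hat{y}) x_j\) and the bias by \(\eta (y - \hat{y})\). The helper names `predict_one` and `update_step` are hypothetical.

```python
import numpy as np


def predict_one(w, b, x):
    """Threshold function: 1 if the net input is >= 0, else 0."""
    return 1 if np.dot(w, x) + b >= 0.0 else 0


def update_step(w, b, x, y, eta):
    """Return updated (w, b) after seeing one example (x, y)."""
    y_hat = predict_one(w, b, x)
    delta = eta * (y - y_hat)          # 0 when the prediction is correct
    return w + delta * x, b + delta    # weights and bias updated together


# Start from zeros; with z = 0 the model predicts 1, but the true label is 0.
w0, b0 = np.zeros(2), 0.0
x_i, y_i = np.array([1.0, 1.0]), 0

w1, b1 = update_step(w0, b0, x_i, y_i, eta=1.0)   # a mistake, so it updates
w2, b2 = update_step(w1, b1, x_i, y_i, eta=1.0)   # now correct, so no change

print(w1, b1)
print(w2, b2)
```

The second call leaves the parameters untouched because the example is already classified correctly, which is why the perceptron only learns from its mistakes.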
Some observations:
Each weight \(w_j\) corresponds to a feature \(x_j\). The bias unit \(b\) does not.
Each weight update \(\Delta w_j\) is proportional to the value of \(x^{(i)}_j\). The bias unit update is not.
Compare \(x^{(i)}_j = 10\) to \(x^{(i)}_j = 1\) when the example is incorrectly classified as class \(0\) while its true class label is \(1\). Assume \(\eta = 1\)
\(\Delta w_j = (1 - 0) * 10 = 10\)
\(\Delta w_j = (1 - 0) * 1 = 1\)
The first update is \(10\) times larger than the second, so it pushes the decision boundary much further
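The same arithmetic in a few lines of Python, as a quick check of the comparison above. It uses the values from the text: true label 1, predicted label 0, and \(\eta = 1\). The variable names are only illustrative.

```python
eta = 1.0
y_true, y_pred = 1, 0   # misclassified as class 0, true class is 1

# Weight update for the same mistake with two different feature values.
deltas = {}
for x_j in (10.0, 1.0):
    deltas[x_j] = eta * (y_true - y_pred) * x_j

# The bias update ignores the feature value entirely.
delta_b = eta * (y_true - y_pred)

print(deltas, delta_b)
```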
The bias unit \(b\) is part of the linear combination (the score that the perceptron computes), not the activation (the step function). So, \(y = f(z)\) is correct, not \(y = f(z) + b\). The bias shifts the decision boundary.
You can see from the formal definition that the bias unit and weights remain unchanged when the perceptron predicts the class label correctly. The perceptron only updates when it makes a mistake in classification.
If the data is linearly separable, the perceptron is guaranteed to find a separating hyperplane within a finite number of updates. If the data is not linearly separable, it will keep updating forever, so you need to set a maximum number of epochs in this situation.
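To make the convergence point concrete, here is a small sketch that counts updates per epoch on two toy datasets: AND (linearly separable) and XOR (not linearly separable). The `train_errors` helper is hypothetical and uses zero initialization with \(\eta = 1\) so the arithmetic stays exact; it is not the class defined later in this post.

```python
import numpy as np


def train_errors(X, y, eta=1.0, n_epochs=20):
    """Run the perceptron rule and return the number of updates per epoch."""
    w = np.zeros(X.shape[1])
    b = 0.0
    errors_per_epoch = []
    for _ in range(n_epochs):           # hard cap on the number of epochs
        errors = 0
        for xi, target in zip(X, y):
            y_hat = 1 if np.dot(w, xi) + b >= 0.0 else 0
            update = eta * (target - y_hat)
            w += update * xi
            b += update
            errors += int(update != 0.0)
        errors_per_epoch.append(errors)
    return errors_per_epoch


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])   # linearly separable
y_xor = np.array([0, 1, 1, 0])   # not linearly separable

and_errors = train_errors(X, y_and)
xor_errors = train_errors(X, y_xor)

print("AND:", and_errors)
print("XOR:", xor_errors)
```

The AND run settles at zero updates per epoch, while the XOR run makes at least one update in every epoch and only stops because of the epoch cap.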
Note
A step function (in the context of perceptrons) is an activation function that outputs only two possible values. It decides yes/no based on whether the input crosses a threshold.
An activation function is applied to a neuron to determine whether it should “fire” or stay inactive.
An epoch is a pass over the training dataset.
Implementation in Python
If you define the perceptron interface as a Python class, you can initialize new Perceptron objects that can learn from data using a fit method and make predictions using a predict method.
Note
An underscore _ is appended to the names of attributes that are not created when the object is initialized but are set later (e.g., by fit), e.g., self.w_
In Python’s OOP framework, a class is the blueprint, an object is an instance of the class, __init__ is the initializer method. An instance method is a function defined inside a class that operates on a specific instance (object) of that class.
Every instance method must take self as the first parameter because you need to explicitly state which object you are applying it to, i.e., self.something means apply this something to this object so it persists. Persistence is important because after the method finishes, the object will still “remember” it (you’re attaching it to the object itself and can run print(object.something) on it after the method finishes).
Standard practice for any model class: set hyperparameters as instance attributes once on creation under __init__, then reuse them whenever you train or retrain the model.
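A toy illustration of the conventions in this note, using a hypothetical `MeanModel` class that is not part of the perceptron code. The hyperparameter is stored in `__init__`, while the learned attribute gets a trailing underscore because it only exists after `fit` has run.

```python
class MeanModel:
    """Hypothetical toy model that 'learns' the mean of its training values."""

    def __init__(self, round_to=2):
        # Hyperparameter: set once on creation, reused on every fit call.
        self.round_to = round_to

    def fit(self, values):
        # Learned attribute: created here, hence the trailing underscore.
        self.mean_ = round(sum(values) / len(values), self.round_to)
        return self  # returning self allows chaining, e.g. MeanModel().fit(...)


model = MeanModel(round_to=1)
has_mean_before_fit = hasattr(model, "mean_")   # attribute does not exist yet

model.fit([1, 2, 4])
first_mean = model.mean_                        # persists after fit returns

model.fit([10, 20])                             # retrain with the same hyperparameter
second_mean = model.mean_

print(has_mean_before_fit, first_mean, second_mean)
```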
```python
import numpy as np


class Perceptron:
    """Perceptron classifier.

    Parameters
    eta: float, learning rate between 0.0 and 1.0
    n_iter: int, epochs
    random_state: int, random number generator (RNG) for random weight initialization

    Attributes
    w_: 1d-array, weights after fitting
    b_: scalar, bias unit after fitting
    errors_: list, number of misclassifications aka updates in each epoch
    """

    def __init__(self, eta=0.01, n_iter=20, random_state=1):
        self.eta = eta
        self.n_iter = n_iter
        self.random_state = random_state

    def fit(self, X, y):
        """Fit training data.

        Parameters
        X: array, features
        y: array, target values

        Returns
        self: object
        """
        rgen = np.random.RandomState(self.random_state)  # random number generator
        # Initializes the weight vector.
        # rgen.normal samples from a Gaussian (== normal distribution) with
        # mean (loc = 0.0, no bias towards + or -) and
        # std (scale = 0.01, weights begin near 0).
        # size == the number of columns in X
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=X.shape[1])
        # Initializes bias value to 0.0 (np.float_ was removed in NumPy 2.0)
        self.b_ = np.float64(0.)
        self.errors_ = []  # logs how many samples were misclassified at each epoch

        for _ in range(self.n_iter):  # repeats the training pass n_iter times
            errors = 0  # resets counter at start of each epoch
            # zip pairs each feature vector xi from X with the corresponding target label in y
            for xi, target in zip(X, y):
                # update will be 0 if there was no error
                update = self.eta * (target - self.predict(xi))
                self.w_ += update * xi
                self.b_ += update
                # converts Boolean (update != 0.0) to 1 if True and 0 if False
                errors += int(update != 0.0)
            self.errors_.append(errors)
        return self

    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.w_) + self.b_  # np.dot(a, b) is the dot product of a & b

    def predict(self, X):
        """Return class label"""
        # np.where(... >= 0.0, 1, 0) returns 1 when the net input >= 0.0, 0 otherwise
        return np.where(self.net_input(X) >= 0.0, 1, 0)


def test_perceptron_learns_and_gate():
    # 1) Define the AND dataset
    X = np.array([
        [0, 0],
        [0, 1],
        [1, 0],
        [1, 1],
    ])
    y = np.array([0, 0, 0, 1])

    # 2) Create the model
    clf = Perceptron(eta=0.1, n_iter=20, random_state=1)

    # 3) Train on the dataset
    clf.fit(X, y)

    # 4) Check predictions
    preds = clf.predict(X)

    # 5) Assert predictions match the true labels
    assert np.array_equal(preds, y)


if __name__ == "__main__":
    test_perceptron_learns_and_gate()
    print("✅ Perceptron implementation passed.")
```
✅ Perceptron implementation passed.
Personal Notes
I first read about the MCP neuron model from Why Machines Learn by Anil Ananthaswamy and found his exposition to be helpful in understanding this chapter.
I wrote all of these formulas in LaTeX. I’m slowly getting the hang of it.
I learned how to use pytest from my command line today. Note to self: pytest discovers test files whose names start with test_ (or end with _test.py). I’m still a novice at writing tests and have been outsourcing this to AI. I understand that this is an important skill (even though LLMs are very good at this), so I should practice this…