Daily Notes: 2025-11-29
ML Notes
Gradient Descent, or how machines learn
- When measuring how “costly” a wrong prediction is / how “accurate” the prediction for a training example is, you can use the Sum of Squared Errors (SSE).
- It is said to be appropriate when modeling regression with Gaussian noise.
- It is not appropriate for classification or distributions with outliers (perhaps because it penalizes large errors heavily?).
Formally, the Sum of Squared Errors is defined as:
\(SSE = \sum\limits_{i} (y_i - \hat{y}_i)^2\)
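A minimal sketch of computing SSE with NumPy; the target and prediction values here are made-up placeholders just to illustrate the formula.

```python
import numpy as np

# Hypothetical example values: true targets y and model predictions y_hat.
y = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# SSE = sum over i of (y_i - y_hat_i)^2
sse = np.sum((y - y_hat) ** 2)
print(sse)  # 0.25 + 0.25 + 0.0 + 1.0 = 1.5
```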
- Recall that Weights \(w\) represent how strongly each input dimension influences the neuron, and the Bias \(b\) shifts the activation threshold and acts as an offset.
- 3Blue1Brown says: Neurons are connected to the neurons in the previous layer. The weights are the strength of those connections, and for ReLU-like activations, the bias affects when the neuron is active/inactive.
- A network “learns” by adjusting the parameters \(W\) and \(b\) to minimize a cost function / loss function.
- Gradient Descent is one way the network can minimize this cost function: it repeatedly steps in the direction of the negative gradient, converging towards a local minimum even when you can't just set the derivative to 0 and solve for the minimum directly.
\(W\) and \(w\) mean different things in neural networks.
\(W\): A weight matrix for an entire layer. If a layer has \(n_{in}\) inputs and \(m_{out}\) neurons, then \(W \in R^{m_{out} \times n_{in}}\). Each row is the weight vector of one neuron, and \(z = Wx + b\) is the equivalent of doing many individual dot products at once.
\(w\): A weight vector for a single neuron.
\(w = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}\)
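A small NumPy sketch of the \(z = Wx + b\) layer computation for \(n_{in} = 3\) inputs and \(m_{out} = 2\) neurons; the weight, bias, and input values are arbitrary placeholders.

```python
import numpy as np

n_in, m_out = 3, 2

# W: one row per neuron, one column per input dimension (shape m_out x n_in).
W = np.array([[0.2, -0.5, 0.1],
              [0.7,  0.3, -0.2]])
b = np.array([0.1, -0.3])      # one bias per neuron
x = np.array([1.0, 2.0, 3.0])  # a single input vector

# Equivalent to stacking each neuron's individual dot product w·x + b.
z = W @ x + b
print(z.shape)  # (2,) -- one pre-activation value per neuron
```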
Formally, Gradient Descent is the algorithm used to minimize a cost function \(J(\theta)\) by iteratively updating parameters in the direction that reduces the loss:
\(\theta := \theta - \eta \nabla_{\theta} J(\theta)\)
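A minimal gradient-descent sketch on a 1-D linear-regression SSE loss; the data, learning rate \(\eta\), and iteration count are made-up for illustration.

```python
import numpy as np

# Made-up 1-D data: roughly y = 2x + 1 with a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

w, b = 0.0, 0.0   # parameters theta = (w, b)
eta = 0.01        # learning rate

for _ in range(2000):
    y_hat = w * x + b
    # Gradients of SSE = sum (y_i - y_hat_i)^2 with respect to w and b.
    grad_w = -2.0 * np.sum((y - y_hat) * x)
    grad_b = -2.0 * np.sum(y - y_hat)
    # theta := theta - eta * grad J(theta)
    w -= eta * grad_w
    b -= eta * grad_b

print(w, b)  # should land near 2 and 1
```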
Personal Notes
Questions I still have
- To answer a question from yesterday: it does seem like MNNs are a version of the SNN where there is a \(Wx + b\) weight & bias calculation per layer (followed by an activation), and then you do it again and again.
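A tiny sketch of that "do it again and again" idea, with a ReLU between layers (without a nonlinearity, stacked \(Wx + b\) layers would collapse into one linear map); the layer sizes and random values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    # One layer: z = Wx + b, then a ReLU activation.
    return np.maximum(0.0, W @ x + b)

# Two made-up layers: 4 inputs -> 3 hidden neurons -> 2 outputs.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)

x = rng.normal(size=4)
h = layer(x, W1, b1)   # first Wx + b (plus activation)
out = W2 @ h + b2      # do it again for the next layer
print(out.shape)       # (2,)
```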