
Machine Learning: The Perceptron

Oct. 19, 2023

MACHINE LEARNING

A Biological Analogy

A perceptron is the simplest neural network we can create. Deep neural networks can have thousands, or even millions, of perceptrons tied together. For now, let's just focus on a single perceptron.

Interestingly, the perceptron has an uncanny resemblance to a biological neuron - it receives some input, operates on it, and produces an output:

A comparison of a biological neuron next to a perceptron.

But Actually, What is It?

Realizing the relationship between a perceptron and a biological neuron doesn't do us much good from a mathematical standpoint. Let's begin by updating our perceptron with some critical variables, w and b.

A perceptron with its input, weight, bias, and output.

Every input to the perceptron has an associated weight, denoted by w. The perceptron itself is associated with a bias, b. Let's apply these variables to something you learned in math class all those years ago:

y = wx + b

That's the equation for a line - look at you go! However, we're still missing something important. If you've read about neural networks before, you've probably heard of something called the activation function. Let's denote this as f:

A complete perceptron with the addition of its activation function.

We can take our previous line equation and put it through the activation function, yielding:

y = f(wx + b)
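
To make this concrete, here's what a single-input perceptron looks like in code. This is just a sketch: the function and parameter names are mine, and f stands in for whatever activation we choose.

```python
def perceptron(x, w, b, f):
    """Single-input perceptron: weight the input, add the bias, apply the activation f."""
    return f(w * x + b)

# With the identity activation f(z) = z, this is just the line equation y = wx + b:
print(perceptron(x=3.0, w=2.0, b=1.0, f=lambda z: z))  # 7.0
```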

But... what's the point? Let's say we're training a model to recognize whether a fruit is an apple or watermelon based on its weight. If y = wx + b = 36.27, does the perceptron think it's an apple or watermelon?

What if we were able to squish 36.27 into a value between 0 and 1 and treat it as a probability? We could then choose apple if the value is closer to 0 and watermelon if the value is closer to 1. This makes the decision process much easier - let's further this example by introducing a popular activation function used in classification problems.

The Sigmoid Function

\sigma(z) = \frac{1}{1+e^{-z}}

The purpose of the sigmoid function is to take an input, z, and squash it to a decimal value between 0 and 1. Here's its graph:

The graph of the sigmoid function.
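
In code, the sigmoid function is a one-liner. Here's a quick sketch to see the squashing in action (the values in the comments are approximate):

```python
import math

def sigmoid(z):
    """Squash any real number z into a value between 0 and 1."""
    return 1 / (1 + math.exp(-z))

print(sigmoid(36.27))  # ~1.0   (large positive inputs land near 1)
print(sigmoid(0))      # 0.5    (zero lands exactly in the middle)
print(sigmoid(-4))     # ~0.018 (negative inputs land near 0)
```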

Back to our previous apple vs. watermelon example: After applying the perceptron's weight and bias to the fruit's mass of 73 g, we end up with:

y = \sigma(w \cdot 73 + b)

Let's say the result comes out to be 0.13. We still don't really know what this means in terms of classifying something as an apple or watermelon.

If we establish what's known as a decision boundary, we will be able to make a definite decision. Let's let our decision boundary be 0.5. Then, if y <= 0.5, we classify it as an apple. Otherwise, if y > 0.5, we classify it as a watermelon. If y = 0.13, the perceptron believes it to be an apple. On the other hand, if y = 0.62, then the perceptron believes it to be a watermelon. It's important to note that, at this point, all of these outputs are just guesses.
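
Putting the pieces together, a hypothetical classifier might look like the sketch below. It reuses the sigmoid function from the previous snippet, and the weight and bias values are hand-picked purely for illustration, not learned:

```python
def classify(mass_grams, w, b):
    """Classify a fruit as an apple or a watermelon using a 0.5 decision boundary."""
    y = sigmoid(w * mass_grams + b)
    return "apple" if y <= 0.5 else "watermelon"

# Illustrative (hand-picked) parameters:
print(classify(73, w=0.01, b=-3))    # "apple"      (y comes out small)
print(classify(9000, w=0.01, b=-3))  # "watermelon" (y comes out close to 1)
```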

Learning

Of course, in the beginning, we won't get great results. The perceptron could start by deciding a fruit weighing 9000 g is an apple and one weighing 76 g is a watermelon.

The perceptron "learns" by tuning the weight, w, and bias, b, through a process called gradient descent. Gradient descent aims to minimize the error, or loss function. In simpler terms, we're minimizing how "wrong" the perceptron's guesses are.

Instead of presenting math, I'll introduce a simple analogy. Imagine your guitar is out of tune and you are attempting to tune the low E-string. Each image below is a step in the process of tuning this string:

Four steps in a guitar tuning process as an analogy to gradient descent.

  1. In the first image, the tuner tells us the string is too flat.
  2. To fix this, we try tightening the string to make the pitch higher. We're getting closer, but the tuner says it's still too flat.
  3. We try to take a bigger step and accidentally tighten the string too much. Indicated by the yellow region, the note is now too sharp.
  4. Let's try loosening the string to make the pitch lower. Indicated by the blue region, the note is just right!
This is an oversimplified picture of how gradient descent works: instead of turning a tuning peg, we're adjusting the perceptron's weight and bias to improve the correctness of its guesses.
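
For the curious, here's roughly what that tuning loop looks like for our single perceptron. This is only a sketch: the toy dataset, learning rate, and number of passes are made up, and the update rule shown is the gradient of the binary cross-entropy loss with a sigmoid activation (masses are in kilograms to keep the toy training loop well behaved):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Toy dataset: fruit mass in kilograms, label 0 = apple, 1 = watermelon.
data = [(0.10, 0), (0.15, 0), (4.0, 1), (6.0, 1)]

w, b = 0.0, 0.0      # start with untuned parameters
learning_rate = 0.1  # how large each "tuning" step is

for _ in range(10_000):
    for x, target in data:
        y = sigmoid(w * x + b)          # forward pass: the current guess
        error = y - target              # how "wrong" the guess is
        w -= learning_rate * error * x  # nudge the weight against the gradient
        b -= learning_rate * error      # nudge the bias against the gradient

print(sigmoid(w * 0.15 + b))  # should now be close to 0: the perceptron says "apple"
print(sigmoid(w * 5.0 + b))   # should now be close to 1: the perceptron says "watermelon"
```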

A Big Problem

All this time, we've been operating under the assumption there is only one feature in x. This is fine for our apple vs. watermelon example; however, most problems have more than one feature.

For example, let's say we're trying to recommend apartments to renters based on square footage and number of bedrooms. This means we now have two features - you guessed it - square footage and number of bedrooms.

Handling Multiple Features

Let's call square footage x_0 and number of bedrooms x_1. Instead of a single scalar, our input x is now a vector:

x = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}

Remember, each input is associated with its own weight. Let's update our perceptron with w_0 and w_1:

A perceptron with multiple features and corresponding weights.

And now for our formulas:

\begin{gathered} x = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} \\ w = \begin{bmatrix} w_0 \\ w_1 \end{bmatrix} \\ y = f(w^T x + b) \end{gathered}

Now, we take the dot product of the weight vector and the input vector (written as w^T x) since we're dealing with vectors instead of scalars. After simplifying, we get:

f(w_0 x_0 + w_1 x_1 + b)

With the dot product expanded, we can see that each input is multiplied by its corresponding weight.
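
Here's a quick sketch of the multi-feature case using NumPy. The feature values, weights, and bias below are invented purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def perceptron(x, w, b):
    """Multi-feature perceptron: dot product of weights and inputs, plus the bias."""
    return sigmoid(np.dot(w, x) + b)

x = np.array([850.0, 2.0])  # x0 = square footage, x1 = number of bedrooms
w = np.array([0.001, 0.3])  # one weight per feature (illustrative values)
b = -1.0                    # illustrative bias

# np.dot(w, x) expands to w0*x0 + w1*x1, matching the formula above.
print(perceptron(x, w, b))
```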

Wrapping Up

In this post, I've given a brief introduction to deep neural networks by starting with their simplest element: the perceptron. I've introduced weights, biases, and the activation function. We know the perceptron is capable of learning by tuning its parameters through a process called gradient descent. The foundations of machine learning rely heavily on basic linear algebra and calculus - maybe not as complicated as you initially thought!

I'm currently writing a follow-up post that builds on the ideas presented here - please check back, as I'll link to it from this post once it's published. Please feel free to reach out to me with questions, comments, edits, etc. Thank you for reading!