Punishing The Machine

Today is a fun one, and by fun one, I mean totally mind-numbing. There are some really neat concepts that I will be discussing; however, they do get quite technical. So, gather what you can from the context and enjoy the blog!

This blog will go over the exciting process that actually makes the machine learn. Like I mentioned before, machines learn by fine-tuning their neural network's weights and biases to better detect certain symbols. The cool thing is, the weights and biases can adjust themselves using a special function that reduces "cost". More on that in a bit, but for now, let's recap.

Neural networks are composed of layers of neurons, each holding an activation value between 0 and 1 – think of it as a grey scale, or a measure of activeness. These layers are connected – neuron by neuron – by mathematical functions whose outputs are shaped by weights and biases. These weights and biases are what allow a neural network to pick out certain features of an input image. Each layer's activations form a pattern that feeds into the next layer, until the final digit is revealed at the output.
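To make the recap concrete, here is a minimal Python sketch of how one layer feeds the next. The weights, biases, and activations below are made-up toy values, and the squashing function (a sigmoid) is just one common choice for keeping activations between 0 and 1:

```python
import math

def sigmoid(x):
    """Squash any real number into the 0-to-1 'activeness' range."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(activations, weights, biases):
    """Compute the next layer's activations from the previous layer's.

    Each output neuron takes a weighted sum of every neuron in the
    previous layer, adds its own bias, then squashes the result.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * a for w, a in zip(neuron_weights, activations)) + bias
        outputs.append(sigmoid(weighted_sum))
    return outputs

# Hypothetical example: two input neurons feeding three output neurons.
prev_layer = [0.5, 0.9]
weights = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.8]]  # one row per output neuron
biases = [0.0, -0.5, 0.1]
next_layer = layer_forward(prev_layer, weights, biases)
```

Stacking several of these layers, with the right weights and biases, is what turns raw pixels into a recognized digit.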

Now, how does a computer know whether it has successfully identified an image? The first few times an image is fed in, the answer will be far from correct. The reason is that the weights and biases start out completely random. To score the result, each output neuron's activation is compared against what it should have been: the differences are squared and added up (this is difficult to explain in words, so check the image below for reference).

The columns on the right represent the network's initial output layer compared to the expected output layer, where the correct digit – 3 – has an activation of 1. Each pair of outputs is fed into a difference function that squares the gap between them, and the squared gaps are summed. A larger total represents a larger "cost", which means the outputs are incorrect and need to be adjusted; a smaller total indicates a better match. This is how a computer punishes itself.
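This squared-difference cost takes only a few lines of Python. The output values below are invented for illustration, mimicking the figure's setup where digit 3 is the correct answer:

```python
def cost(output, expected):
    """Sum of squared differences between actual and expected activations."""
    return sum((o - e) ** 2 for o, e in zip(output, expected))

# Ten output neurons, one per digit 0-9; the network should light up neuron 3.
expected = [0.0] * 10
expected[3] = 1.0

# A random, untrained network produces a messy guess (hypothetical values)...
bad_output = [0.4, 0.1, 0.8, 0.2, 0.9, 0.3, 0.1, 0.6, 0.2, 0.5]
# ...while a well-trained one lights up mostly neuron 3.
good_output = [0.0, 0.0, 0.1, 0.95, 0.05, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The messy guess produces a much larger cost than the near-correct one, and a perfect answer costs exactly zero – which is exactly the "punishment" signal the machine learns from.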

Now, it’s time for the machine to learn from its mistakes. To understand how this is done, imagine a cost function: it takes all of the network's weights as input, then spits out the cost value accordingly. To minimize the cost, a minimum of this function needs to be found, which requires some calculus – the slope of the cost function tells you which direction to nudge each weight so the cost goes down.

Once the minimum cost has been located, the associated weights are used to tune the neural network. By repeating this process with a series of input images, such as the ones below, a computer can quite literally learn.
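Putting it all together, the whole learning loop is just "guess, measure the cost, nudge the weights" over and over. The toy below shows that loop in miniature with a single weight learning to triple its input – a deliberately simplified stand-in for a full network, with made-up training examples:

```python
import random

random.seed(0)
weight = random.uniform(-1, 1)  # start with a random weight, like a fresh network

# Hypothetical training data: (input, expected output) pairs where output = 3 * input.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

for _ in range(200):                 # many passes over the training images
    for x, target in examples:
        guess = weight * x           # the network's current answer
        error = guess - target       # how far off it is
        gradient = 2 * error * x     # slope of (guess - target)^2 w.r.t. the weight
        weight -= 0.01 * gradient    # small nudge downhill

# weight has converged toward 3.0 purely from examples.
```

No one ever told the program the answer was "multiply by 3" – it was punished by the cost until the weight settled there, which is the essence of machine learning.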

This is a very general picture of how computers learn. There are, of course, hundreds of more advanced methods that improve both the speed and the accuracy of image processing. But I hope you enjoyed these introductory blogs. More AI content is on the way!