Back Propagation in Neural Networks


The Feed Forward mechanism on its own isn't of much use: unless you back propagate the errors to update the weights and refine your Neural Network, the network never learns anything. So in this blog post, let us understand what back propagation is and how we could represent it mathematically! Back propagation generally means that you distribute the output errors back to the hidden layers in proportion to how much each weight contributed to the error in the subsequent layer. So the larger the weight, the more of the output error is carried back to the hidden layer.

As always, here is a sheet of paper where I have worked out the Math (Matrix representation). I have vectorised the back propagation mechanism!

Now you might wonder how I arrived at my Milestone matrix notation. If you expand W^T_hidden_output, you will find that this weights matrix is just the Transpose of the original weights Matrix used during the forward propagation phase. This is a big step in understanding how we distribute the errors back into the preceding hidden layers of the Neural Network.

The hidden_output in my sheet above, for the simple 2 layer, 2 node network, means that my hidden layer is also the input layer. The reason I decided to call it hidden_output is that a typical Neural Network will have not just an input and an output layer, but also several hidden layers. So don't be confused by hidden_output; I could just as well have called it input_output! But nevertheless, you get the message, don't you?
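To make the transpose step concrete, here is a minimal sketch in plain Python. The weight and error values are made up for illustration, and it assumes that w_hidden_output[j][i] holds the weight from hidden node i to output node j, which may not match the layout on the sheet. It also uses the raw weights without normalizing them, so each hidden node receives a share proportional to its weights rather than an exact fraction of the error.

```python
def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*matrix)]


def matvec(matrix, vector):
    """Multiply a matrix by a column vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


# Assumed layout: row j holds the weights feeding output node j,
# so w_hidden_output[j][i] links hidden node i to output node j.
w_hidden_output = [[2.0, 1.0],
                   [3.0, 4.0]]

# Illustrative output errors (target minus actual), one per output node.
errors_output = [0.5, 0.25]

# errors_hidden = W^T_hidden_output . errors_output
errors_hidden = matvec(transpose(w_hidden_output), errors_output)
print(errors_hidden)
```

Hidden node 0 is connected through the weights 2.0 and 3.0, so it collects 2.0 * 0.5 + 3.0 * 0.25 of the output error, which is exactly what multiplying by the transposed matrix computes.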

Understanding Feed Forward Neural Network Architectures


I have been reading through the architectures of Neural Networks and wanted to grasp the idea behind calculating the weights in a Neural Network. The image below shows a simple 2 node, 2 layer Neural Network.

For simplicity's sake, I have just used a 2 node, 2 layer network, but the same holds true for a Neural Network of any size. It's just that the size of the weights Matrix grows with the number of inputs! The concepts outlined in the image hold good!
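The matrix form of the forward pass for a 2 node, 2 layer network can be sketched in a few lines of plain Python. The weights and inputs below are made-up values, and the sigmoid activation is an assumption on my part; swap in whichever activation your own network uses.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def feed_forward(weights, inputs):
    """Compute sigmoid(W . inputs), one output per row of W."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)))
            for row in weights]


# Assumed layout: weights[j][i] links input node i to output node j.
weights = [[0.9, 0.3],
           [0.2, 0.8]]
inputs = [1.0, 0.5]

outputs = feed_forward(weights, inputs)
print(outputs)
```

For a bigger network only the shape of the weights matrix changes: one row per node in the next layer and one column per input. The function itself stays the same.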

Math behind simple Linear Regression


I have been wondering how the math behind Linear Regression works. Most of the ML books that you encounter focus on giving you a linear equation and plugging it into a Python library to solve for the slope and bias, which you then use to predict new values. It is very rare that they show you how to find the m and b values. So here, on a piece of paper, I decided to try that out, and it worked out very well! If you want to learn it, try to understand what partial derivatives are!

In the above solution, I have just solved for m, which is the slope term in a Linear Regression. You can apply the same technique to solve for b! What you effectively do is differentiate with respect to one term while treating the others as constants. In simple terms, this is called a partial derivative. A derivative measures how something changes, while a partial derivative measures how something changes with respect to one variable while treating everything else in this world as a constant! It's that simple! More on Partial Derivatives
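For reference, here is a sketch of how that derivation usually goes. It assumes the cost being minimized is the sum of squared errors over n data points, with \bar{x} and \bar{y} as the means of the inputs and targets; the notation on the sheet may differ.

```latex
E(m, b) = \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2

\frac{\partial E}{\partial m} = -2 \sum_{i=1}^{n} x_i \bigl( y_i - m x_i - b \bigr) = 0
\qquad
\frac{\partial E}{\partial b} = -2 \sum_{i=1}^{n} \bigl( y_i - m x_i - b \bigr) = 0

b = \bar{y} - m \bar{x}
\qquad
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

The second equation gives b in terms of m, and substituting that into the first one leaves a single equation in m.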

What I have shown you here is a Simple Linear Regression, but the technique applies equally well to a multivariate Linear Regression! Math is fucking fun - Once you understand it!
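If you want to check the m and b values without a library, a minimal sketch in plain Python could look like this. It assumes the usual least-squares result you get by setting both partial derivatives to zero, and the data points are made up so that they lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Return slope m and bias b of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator and denominator of the slope formula.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The bias follows from the partial derivative with respect to b.
    b = mean_y - m * mean_x
    return m, b


# Illustrative data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

m, b = fit_line(xs, ys)
print(m, b)
```

Because the points sit exactly on a line, the fit recovers a slope of 2 and a bias of 1. With noisy data you would get the line that minimizes the squared error instead.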

My Trek Remedy clocks 1000 km this season


So as of today, I have clocked a recorded total of 1000 km on my Trek Remedy 8. The real figure is obviously higher than that, as on several rides I hadn't turned on my Sigma odometer. But for the record, the mileage is 1000 km for this year so far.

I have explored many of the single trails around my area! It was fun riding this bike! I hope to do even more miles next season!