Introduction to Machine Learning: Logistic Regression — Part 1
Written by Charaka Abeywickrama, Data Science and Engineering Associate & Dr. Rajitha Navarathna, Principal Data Scientist at OCTAVE
Introduction
Logistic regression is a supervised machine learning model used for classification; more specifically, it is primarily used for binary classification. Simply put, this model can be used to classify input data into one of two classes. For example, given a student's revision time as input, the model may classify whether the student is likely to PASS or FAIL. Another example, from banking, is the use of customer transaction data to classify transactions as FRAUD or NOT FRAUD.
Please note that despite being called logistic regression, it is actually a classification algorithm. So let's now dive in and understand how it all works.
Going over the math
Logistic regression works by squeezing the output of a linear function to between 0 and 1. If you aren't sure about linear regression, you can read my previous blog on it. With linear regression we get a continuous output; with the logistic function, however, the output is restricted to between 0 and 1. This is done using the sigmoid function, shown below. We can think of the result of the logistic function as the probability, or likelihood, that the given input X belongs to class 1.
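$$h_\theta(x) = \sigma(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

Here $z = \theta^T x$ is the output of the linear function, and the sigmoid $\sigma(z) = \frac{1}{1 + e^{-z}}$ maps any real number to a value between 0 and 1.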
Decision boundary
The decision boundary is simply a threshold. For binary classification, if the output of the logistic function is greater than or equal to the decision boundary, the input belongs to class 1; if the output is less than the decision boundary, it belongs to class 0. A value of 0.5 is the usual default, but we can tune the boundary much like a hyperparameter.
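As a minimal sketch of this rule in Python (the names `predict`, `theta`, and `X` are illustrative, not from any particular library):

```python
import numpy as np

def predict(theta, X, threshold=0.5):
    """Classify each row of X by thresholding the sigmoid output."""
    probabilities = 1.0 / (1.0 + np.exp(-X @ theta))  # sigmoid of the linear function
    return (probabilities >= threshold).astype(int)   # class 1 if >= boundary, else 0
```

Raising the threshold above 0.5 makes the model more conservative about predicting class 1, which can be useful when false positives are costly, as in fraud detection.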
Cost function
If you're not familiar with the cost function, you may think of it as the method we use to calculate the error of our model. In linear regression we used the mean absolute error (MAE); however, we do not use this approach for logistic regression, as it may lead to a suboptimal solution. With the nonlinear sigmoid inside, that cost function becomes non-convex, so gradient descent may settle into a local minimum rather than the global one. Instead, we use the function below to calculate the error.
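$$\text{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$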
Okay, if looking at this didn't make sense to you, let me try to explain the cost function further. Let's take an example: for y = 1, if the model output hθ(x) is 1, then the cost is 0, since -log(1) = 0. On the other hand, for y = 1, if the model output hθ(x) is 0, then the cost grows very large (in the limit, -log(0) tends to infinity). The reverse is true for y = 0. Therefore, we may write the above function in a single expression as:
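$$\text{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$$

Since y is always either 0 or 1, exactly one of the two terms is active for any given sample.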
This cost function is for a single sample in the dataset (one row), so to calculate the average cost over the whole dataset we simply sum the cost over all rows and divide by the number of rows, m:
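$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

where $(x^{(i)}, y^{(i)})$ is the i-th row of the dataset.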
Updating the weights
Using the cost function, we may update the weights (the θ values) of our model with the formula below. As a reminder, α is the learning rate: a constant that tells us how big a step to take on each iteration. The derivative of the cost function with respect to θ gives us the gradient of the curve. Our objective is to find the θ for which the cost/error is at a minimum.
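$$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j} = \theta_j - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

The right-hand expression is the gradient of the average cost J(θ) with respect to the weight θⱼ.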
Picture the cost plotted against θ as a bowl-shaped curve. At a point far from the minimum (point 1), the gradient is positive and steep (high), so the formula reduces the θ value by a large amount on the next iteration.
Closer to the minimum (point 2), the gradient is still positive but smaller, so θ is reduced by a smaller amount. Eventually the model converges close to the minimum cost, where the gradient is approximately 0.
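To make the whole loop concrete, here is a minimal gradient-descent sketch in Python with NumPy. The function and variable names (`train_logistic_regression`, `alpha`, `iterations`, and the toy data) are illustrative assumptions, not from any particular library:

```python
import numpy as np

def sigmoid(z):
    """Squeeze the linear output into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, alpha=0.1, iterations=1000):
    """Fit weights theta by batch gradient descent on the average log-loss cost.

    X is an (m, n) feature matrix; y is a length-m vector of 0/1 labels.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        h = sigmoid(X @ theta)          # predicted probabilities, h_theta(x)
        gradient = (X.T @ (h - y)) / m  # dJ/dtheta for the log-loss cost
        theta -= alpha * gradient       # step downhill by the learning rate
    return theta

# Toy usage: one feature (e.g. revision hours) plus an intercept column.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X = np.column_stack([np.ones_like(hours), hours])  # prepend a bias term
y = np.array([0, 0, 0, 1, 1, 1])                   # 0 = FAIL, 1 = PASS
theta = train_logistic_regression(X, y)
print(sigmoid(X @ theta) >= 0.5)                   # predicted classes at the 0.5 boundary
```

Each iteration computes the gradient over all rows at once (batch gradient descent); the same update rule can also be applied one sample at a time.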
Conclusion
In summary, we learned how logistic regression works and where it can be used. In the next part, I will go over how to code a logistic regression model. I hope this article was helpful and easy to understand.