Logistic Regression Gradient Descent

Emmanuel Kwakye Nyantakyi
Oct 10, 2021

Differentiating the cost function in logistic regression

Figure 1: Algorithm for gradient descent
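
The figure itself is an image; in standard notation (assuming alpha is the learning rate and J(theta) is the cost function), the update rule it shows is presumably:

```latex
% General gradient descent update rule (sketch of what Figure 1 shows,
% assuming \alpha is the learning rate and J(\theta) the cost function)
\text{repeat until convergence:} \quad
\theta_j := \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta)
\qquad \text{(updating all } \theta_j \text{ simultaneously)}
```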

The above figure shows the general equation for gradient descent. To implement this algorithm, one needs a value for the learning rate and an expression for the partial derivative of the cost function with respect to theta. For logistic regression, the gradient descent algorithm is defined as:

Figure 2: Algorithm for gradient descent in logistic regression
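
Written out (a sketch of what Figure 2 shows, assuming m training examples, with x^(i) the i-th example and x_j^(i) its j-th feature):

```latex
% Gradient descent update for logistic regression (sketch of Figure 2)
\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m}
\left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}
```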

After a careful look, one will notice that the algorithm in Figure 2 is very similar to the algorithm for gradient descent in linear regression. Note, however, that the hypothesis is different for linear and logistic regression. In this post, we will derive the partial derivative of the cost function with respect to theta for logistic regression, starting from the definition of the cost function.
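
To make that similarity concrete, here is a minimal NumPy sketch of a single gradient descent step (not from the original post; the names X, y, theta and alpha are illustrative, with X an m-by-n design matrix). Only the hypothesis line differs between linear and logistic regression:

```python
import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function, used as the hypothesis in logistic regression
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(theta, X, y, alpha, logistic=True):
    # One gradient descent update for linear or logistic regression.
    # X: (m, n) design matrix, y: (m,) label vector, theta: (n,) parameters,
    # alpha: learning rate. The update rule is identical; only the hypothesis changes.
    m = X.shape[0]
    z = X @ theta
    h = sigmoid(z) if logistic else z        # hypothesis h_theta(x)
    gradient = (X.T @ (h - y)) / m           # (1/m) * sum_i (h - y) * x_j
    return theta - alpha * gradient
```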

The cost function in logistic regression can be defined by:

Figure 3: Cost function for logistic regression
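
Written out (a sketch of what Figure 3 shows, assuming the standard cross-entropy cost):

```latex
% Logistic regression cost function (sketch of Figure 3)
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
\left[ y^{(i)} \log h_\theta\!\left(x^{(i)}\right)
     + \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta\!\left(x^{(i)}\right)\right) \right]
```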

where m is the number of examples, theta_j is a single parameter, y is an m-dimensional vector of labels, X is the matrix of input data, and h is the hypothesis. The hypothesis can be defined as:

Figure 4: Definition of hypothesis for logistic regression
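
Written out (a sketch of what Figure 4 shows: the sigmoid applied to the linear combination theta-transpose x):

```latex
% Sigmoid hypothesis for logistic regression (sketch of Figure 4)
h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}
```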

To differentiate the cost function, it is important to note that the log in the equation is the natural logarithm, not the base-10 logarithm.

Figure 5: Simplifying the cost function
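
One way the simplification presumably proceeds: substitute the sigmoid into the two log terms, apply the rules of the natural logarithm, and collect terms.

```latex
% Substituting h_\theta(x) = 1 / (1 + e^{-\theta^T x}) into the log terms:
\log h_\theta\!\left(x^{(i)}\right)
  = -\log\!\left(1 + e^{-\theta^{T} x^{(i)}}\right), \qquad
\log\!\left(1 - h_\theta\!\left(x^{(i)}\right)\right)
  = -\theta^{T} x^{(i)} - \log\!\left(1 + e^{-\theta^{T} x^{(i)}}\right)

% Collecting terms gives the simplified cost:
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
\left[ y^{(i)}\, \theta^{T} x^{(i)}
     - \log\!\left(1 + e^{\theta^{T} x^{(i)}}\right) \right]
```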

Now we’re going to perform the partial differentiation with respect to theta_j, which can be any single parameter in our parameter vector. Partial differentiation is very similar to ordinary differentiation; the only difference is that all other variables are treated as constants.

Figure 6: Performing partial differentiation on cost function with respect to theta
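
Differentiating the simplified cost term by term, using the facts that the partial derivative of theta-transpose x with respect to theta_j is x_j and that the derivative of log(1 + e^z) is e^z / (1 + e^z) = h_theta(x), gives (a sketch of what Figure 6 shows):

```latex
% Partial derivative of the cost with respect to \theta_j (sketch of Figure 6)
\frac{\partial}{\partial \theta_j} J(\theta)
= -\frac{1}{m} \sum_{i=1}^{m}
  \left[ y^{(i)} x_j^{(i)}
       - \frac{e^{\theta^{T} x^{(i)}}}{1 + e^{\theta^{T} x^{(i)}}}\, x_j^{(i)} \right]
= \frac{1}{m} \sum_{i=1}^{m}
  \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}
```

Plugging this expression back into the general update rule recovers exactly the algorithm in Figure 2.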

Thanks for getting to the end of this post. Leave a clap if you enjoyed this :)
