Logistic Regression: Python implementation from scratch without using sklearn
Table of contents:
- Generate data
- Split data into train (75%) and test (25%)
- Standardize the data
- Initialize the weight_vector and intercept
- Compute Sigmoid
- Compute Log Loss
- Calculate Gradient w.r.t. ‘w’
- Calculate Gradient w.r.t. ‘b’
- Train the custom model
- Compare custom model with sklearn SGDClassifier model
- End Notes
In my previous article, I explained the concepts behind Logistic Regression; please go through it if you want the theory. In this article, I cover a Python implementation of Logistic Regression with L2 regularization, trained with SGD (Stochastic Gradient Descent) and written without the sklearn library, and compare the result with sklearn’s SGDClassifier.
Let’s get started with the Python implementation. The steps are:
1. Generate data: First, we use sklearn.datasets.make_classification to generate a classification dataset with n_classes classes (2 in our case):
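A minimal sketch of this step. The sample count, feature counts, and `random_state` below are illustrative assumptions, not values taken from the article:

```python
from sklearn.datasets import make_classification

# Assumed settings: 5,000 samples, 15 features (10 informative, 5 redundant).
# These numbers are placeholders chosen for illustration only.
X, y = make_classification(
    n_samples=5000,
    n_features=15,
    n_informative=10,
    n_redundant=5,
    n_classes=2,       # binary classification
    random_state=15,   # fixed seed so the data is reproducible
)

print(X.shape, y.shape)
```

`X` is the feature matrix and `y` holds the class labels 0 and 1.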
2. Split data into train (75%) and test (25%): using sklearn.model_selection.train_test_split
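The split might look like the sketch below. The dataset settings and the seed are assumptions made so that the example runs on its own:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Small illustrative dataset (assumed settings, see the previous step).
X, y = make_classification(
    n_samples=1000, n_features=15, n_informative=10,
    n_redundant=5, n_classes=2, random_state=15,
)

# test_size=0.25 keeps 75% of the rows for training and 25% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=15
)

print(X_train.shape, X_test.shape)
```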
3. Standardize the data: using sklearn.preprocessing.StandardScaler. StandardScaler standardizes features by removing the mean and scaling to unit variance. The standard score of a sample x is z = (x - u) / s, where u is the mean of the training samples and s is their standard deviation.
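A small sketch on toy arrays (the numbers are made up). The scaler is fitted on the training data only, and the same u and s are then reused for the test data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data, for illustration only.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # learns u and s from the training set
X_test_std = scaler.transform(X_test)        # reuses the training u and s

print(X_train_std.mean(axis=0), X_train_std.std(axis=0))
```

Fitting on the training set alone keeps information about the test set out of the preprocessing step.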
4. Initialize the weight_vector and intercept term to zeros:
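One possible helper for this step; the function name `initialize_weights` is an assumption made for illustration:

```python
import numpy as np

def initialize_weights(n_features):
    """Return a zero weight vector (one entry per feature) and a zero intercept."""
    w = np.zeros(n_features)
    b = 0.0
    return w, b

w, b = initialize_weights(15)
print(w, b)
```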
5. Compute Sigmoid: Sigmoid(z) = 1 / (1 + exp(-z)):
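A direct NumPy translation of the formula, as a sketch:

```python
import numpy as np

def sigmoid(z):
    """Map a real number (or array) to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))
print(sigmoid(np.array([-2.0, 2.0])))
```

Because `np.exp` works element-wise, the same function accepts a single score or a whole array of scores.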
6. Compute Log Loss using the formula below:
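The standard binary log loss is the negative mean of y·log(p) + (1 - y)·log(1 - p) over all samples, where p is the predicted probability of class 1. The sketch below assumes the natural logarithm and adds a small clipping constant `eps` (an assumption, not from the article) to avoid log(0):

```python
import numpy as np

def compute_log_loss(y_true, y_pred, eps=1e-15):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from exactly 0 and 1 so the logs stay finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(compute_log_loss([1, 0], [0.9, 0.1]))
```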
7. Calculate Gradient w.r.t. ‘w’: We now examine the gradient arising from a single data point:
Two identities of the logistic function are useful here:
We can use the above identities to get an intuitive form for the gradient:
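The identities themselves are not shown here. The two most commonly used in this derivation, and assumed in what follows, are the symmetry of the sigmoid and the closed form of its derivative:

```latex
\sigma(-z) = 1 - \sigma(z),
\qquad
\frac{d\sigma(z)}{dz} = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)
```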
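A sketch of the derivation for one data point (x, y), using the derivative identity σ'(z) = σ(z)(1 - σ(z)). It assumes the per-point loss is the negative log-likelihood and that the L2 penalty has the form (λ/2)‖w‖²; the article's exact scaling of λ may differ:

```latex
\begin{aligned}
\ell(w, b) &= -\bigl[\, y \log \sigma(z) + (1 - y) \log\bigl(1 - \sigma(z)\bigr) \bigr],
\qquad z = w^\top x + b \\
\frac{\partial \ell}{\partial w}
  &= -\Bigl[\, y\,\bigl(1 - \sigma(z)\bigr) - (1 - y)\,\sigma(z) \Bigr]\, x
   = \bigl(\sigma(z) - y\bigr)\, x \\
\frac{\partial}{\partial w}\Bigl[\ell + \tfrac{\lambda}{2}\lVert w \rVert^2\Bigr]
  &= \bigl(\sigma(z) - y\bigr)\, x + \lambda w
\end{aligned}
```

The intuitive reading is that the gradient is the prediction error σ(z) - y multiplied by the input x, plus a term that pulls the weights toward zero.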
8. Now compute gradient with respect to ‘b’:
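Both gradients can be sketched in code as follows. This assumes the loss is minimized (so the error is σ(z) - y) and that the L2 term contributes λ·w to the weight gradient; the function names and the scaling of λ are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_dw(x, y, w, b, lam):
    """Gradient of the regularized per-point loss with respect to w."""
    error = sigmoid(np.dot(w, x) + b) - y
    return error * x + lam * w

def gradient_db(x, y, w, b):
    """Gradient of the per-point loss with respect to b (b is not regularized)."""
    return sigmoid(np.dot(w, x) + b) - y

x = np.array([1.0, 2.0])
w = np.zeros(2)
print(gradient_dw(x, 1.0, w, 0.0, lam=0.1), gradient_db(x, 1.0, w, 0.0))
```

With zero weights the prediction is 0.5, so for a positive example the error is -0.5 and a descent step moves w in the direction of x.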
9. Train the custom model: using the steps below:
Lambda (λ) is the regularization parameter. Please check the link for more detail on the λ parameter.
We train the model with the parameters below:
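A minimal sketch of one way to write the training loop. It visits the training points in random order once per epoch, updates w and b from one point at a time, and records the train and test log loss after each epoch. The hyperparameter values (`epochs`, `lam`, `eta0`), the toy data, and the function name `train` are all assumptions made for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_log_loss(y_true, y_pred, eps=1e-15):
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def train(X_train, y_train, X_test, y_test, epochs=20, lam=1e-4, eta0=1e-2, seed=0):
    """SGD for L2-regularized logistic regression (assumed hyperparameters)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X_train.shape
    w, b = np.zeros(n_features), 0.0
    train_loss, test_loss = [], []
    for _ in range(epochs):
        for i in rng.permutation(n_samples):      # one point at a time
            x_i, y_i = X_train[i], y_train[i]
            error = sigmoid(np.dot(w, x_i) + b) - y_i
            w -= eta0 * (error * x_i + lam * w)   # step along -dL/dw
            b -= eta0 * error                     # step along -dL/db
        train_loss.append(compute_log_loss(y_train, sigmoid(X_train @ w + b)))
        test_loss.append(compute_log_loss(y_test, sigmoid(X_test @ w + b)))
    return w, b, train_loss, test_loss

# Toy, linearly separable data so the sketch runs on its own.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
X_train, X_test, y_train, y_test = X[:300], X[300:], y[:300], y[300:]

w, b, train_loss, test_loss = train(X_train, y_train, X_test, y_test)
print("w:", w, "b:", b)
print("train log loss:", train_loss[-1], "test log loss:", test_loss[-1])
```

Tracking both losses per epoch makes it easy to plot them and check that the test loss follows the train loss downward.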
Let’s print the weight vector w, the intercept b, and the train and test log loss values:
10. Compare custom model with sklearn SGDClassifier model: Now let’s check whether these values match the ones produced by sklearn’s SGDClassifier:
We can see that the weight and intercept values are more or less the same.
11. End Notes:
For full code please refer to my GitHub link.