
Logistic Regression Explained

 2 years ago
source link: https://medium.com/sanrusha-consultancy/logistic-regression-explained-3523cbb22a8e

Logistic Regression is one of the most powerful supervised classification techniques, and my favorite. It’s easy to use; however, if you have forgotten your high school math, you might find it difficult to understand how Logistic Regression works.

I will take you back to that math book before taking a deeper dive into Logistic Regression.

Let’s first understand the formula behind linear regression. The formula for linear regression is

$$y = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$$

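In this equation, the right side contains the independent variables X1 through Xn, which are continuous numbers, multiplied by the coefficients b1 through bn and added to the intercept b0. As a result, the dependent variable y is also a continuous number.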
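As a minimal sketch (in Python, assuming the coefficients and inputs are plain lists of floats), the formula can be computed like this. The function name `linear_predictor` and the numbers are hypothetical, chosen only to illustrate the arithmetic:

```python
def linear_predictor(b, x):
    """Compute y = b0 + b1*X1 + ... + bn*Xn.

    b: coefficients [b0, b1, ..., bn], where b0 is the intercept
    x: inputs [X1, ..., Xn]
    """
    if len(b) != len(x) + 1:
        raise ValueError("need exactly one more coefficient than inputs")
    # Start from the intercept b0, then add each b_i * X_i.
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))


# Hypothetical coefficients and inputs, for illustration only.
y = linear_predictor([0.5, 2.0, -1.0], [3.0, 4.0])
print(y)  # 0.5 + 2.0*3.0 - 1.0*4.0 = 2.5
```

Nothing in this calculation keeps y inside any particular range, which is exactly why it cannot be used directly as a Yes/No answer.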

And here comes the fun part.

Even though its name contains "regression", Logistic Regression is a classification technique. It is mostly used when the outcome (target variable) is a binary value such as Yes or No, True or False, Right or Wrong, etc.

So the equation should produce a binary outcome, not a continuous number. How do we convert a continuous number into a binary one?

Let’s pause here and review some high-school-level mathematical functions.

The first thing to understand is the log function.

It’s hard to forget the power function, so let’s review that first.

$$10^2 = 100$$

That was easy: 10 to the power of 2 is 100. The log is the reverse of this.

If I write log 100 with base 10, I am asking: what power of 10 gives 100? The answer is 2.

$$\log_{10} 100 = 2$$
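The same two facts can be checked with Python’s standard `math` module, where `math.log10` is the base-10 log. This is only an illustration of the arithmetic:

```python
import math

# Power: 10 raised to 2 gives 100.
power = 10 ** 2
print(power)          # 100

# Base-10 log asks the reverse question: what power of 10 gives 100?
exponent = math.log10(100)
print(exponent)       # 2.0

# The log undoes the power: 10 raised to that exponent is 100 again.
print(10 ** exponent == 100)  # True
```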

A log with base 10 is called the common log. A log with base e (≈ 2.718) is called the natural log, often written ln. In this article, as in most statistics and programming contexts, a log written without a base means the natural log.

So log 100 means: what power of e (≈ 2.718) gives 100? It can be written as

$$\ln 100 = x \quad\Longleftrightarrow\quad e^x = 100$$

The value is approximately 4.6 (4.605 to three decimal places).
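In Python, `math.log` called with a single argument is the natural log (base e), and `math.exp` raises e to a power, so the value above can be checked directly:

```python
import math

print(math.e)             # e, about 2.718

# Natural log: what power of e gives 100?
ln_100 = math.log(100)
print(round(ln_100, 3))   # 4.605

# Raising e to that power gives back 100, up to floating-point rounding.
print(math.isclose(math.exp(ln_100), 100))  # True
```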

Logit

The logit is the log of the odds, P/(1-P), where P is a probability. It uses the natural log, so the base is e (≈ 2.718).
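A small Python sketch of the logit, assuming P is given as a float strictly between 0 and 1 (the odds, and therefore the log, are undefined at exactly 0 or 1). The function name `logit` is just an illustrative choice:

```python
import math

def logit(p):
    """Natural log of the odds p / (1 - p)."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    odds = p / (1 - p)
    return math.log(odds)

print(logit(0.5))            # 0.0   (odds of 1, and ln 1 = 0)
print(round(logit(0.8), 3))  # 1.386 (odds of 4)
print(round(logit(0.2), 3))  # -1.386 (odds of 1/4)
```

Probabilities above 0.5 give a positive logit, probabilities below 0.5 give a negative one, so the logit can take any real value, just like the y of linear regression.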

The logistic regression formula is

$$\operatorname{logit}(P) = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$$

In other words, the y of linear regression is replaced with log(P/(1-P)), as shown below

$$\ln\left(\frac{P}{1-P}\right) = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$$

Because this is a natural log, the base is e. Written in exponential form, the equation becomes

$$\frac{P}{1-P} = e^{b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n}$$
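A quick numeric check of the exponential form, as a sketch: `z` below is a hypothetical stand-in for the value of b0 + b1X1 + ... + bnXn for one example, and exponentiating it gives the odds:

```python
import math

# Hypothetical value of b0 + b1*X1 + ... + bn*Xn for one example.
z = 1.5

# Exponential form: the odds P / (1 - P) equal e raised to the power z.
odds = math.exp(z)
print(round(odds, 3))  # 4.482

# Taking the natural log of the odds returns z, i.e. the logit equation again.
print(math.isclose(math.log(odds), z))  # True
```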

Solving this for P:

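$$P = (1-P)\,e^{b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n}$$
$$P\left(1 + e^{b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n}\right) = e^{b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n}$$
$$P = \frac{e^{b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n}}{1 + e^{b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n}}$$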
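The same algebra as a small Python sketch, assuming a hypothetical value `z` for b0 + b1X1 + ... + bnXn; the helper name `probability_from_z` is illustrative. It computes P = e^z / (1 + e^z) and confirms that P / (1 - P) comes back to e^z:

```python
import math

def probability_from_z(z):
    """P = e^z / (1 + e^z), the result of solving P / (1 - P) = e^z for P."""
    odds = math.exp(z)
    return odds / (1 + odds)

# Hypothetical z; check that the odds of the resulting P are e^z again.
z = 1.5
p = probability_from_z(z)
print(round(p, 4))                             # 0.8176
print(math.isclose(p / (1 - p), math.exp(z)))  # True
```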

Coming back to the linear regression equation

$$y = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$$

This equation returns y as a continuous number. Substituting this y into the equation for P above, and dividing the numerator and denominator by e^y, gives the probability as

$$P = \frac{1}{1 + e^{-y}}$$

And this is where the fun lies. No matter what value of y you put into this equation, P always falls between 0 and 1: it approaches 0 as y becomes very negative and approaches 1 as y becomes very positive. That is exactly right, because a probability must always lie between 0 and 1.
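A Python sketch of P = 1 / (1 + e^(-y)). The name `sigmoid` is the common name for this curve, and the two-branch form is an implementation choice (not from the original article) that avoids an `OverflowError` from `math.exp` when y is a large negative number such as -1000:

```python
import math

def sigmoid(y):
    """P = 1 / (1 + e^(-y)), written to avoid overflow for large |y|."""
    if y >= 0:
        # Here e^(-y) is at most 1, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-y))
    # For negative y, use the equivalent form e^y / (1 + e^y),
    # so math.exp never receives a huge positive argument.
    ey = math.exp(y)
    return ey / (1.0 + ey)

for y in (-1000, -10, -1, 0, 1, 10, 1000):
    print(f"y = {y:6}  P = {sigmoid(y):.6f}")
```

Mathematically P never reaches exactly 0 or 1, but in floating-point arithmetic very large values of |y| round to 0.0 or 1.0.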

Try it.

I tried it with the following values of y.

[Table: the sample values of y and the resulting values of P]

Plotting P against these values of y gives the graph below.

[Figure: P plotted against the sample values of y]
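The original table of y values is not reproduced here. As a stand-in, this sketch uses hypothetical values of y from -6 to 6 and prints each resulting P with a crude text bar, which is enough to see P climb from near 0 to near 1:

```python
import math

def sigmoid(y):
    """P = 1 / (1 + e^(-y)); fine for the small y values used here."""
    return 1.0 / (1.0 + math.exp(-y))

# Hypothetical sample values of y (not the author's original table).
ys = list(range(-6, 7))
table = [(y, sigmoid(y)) for y in ys]

for y, p in table:
    # Text "plot": the bar length is proportional to P.
    print(f"{y:3d}  {p:.4f}  " + "#" * round(p * 40))
```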

Now it becomes easy to turn the result into a binary outcome. Let’s say P is the probability of the outcome Yes: any value of y that gives a probability of 0.5 or more results in the outcome Yes, and all other values result in the outcome No.
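A minimal sketch of this thresholding step. It treats P as the probability of the outcome Yes, which is the usual convention; swapping the two labels gives the opposite mapping. The function name `classify` and the 0.5 default are illustrative:

```python
import math

def sigmoid(y):
    """P = 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

def classify(y, threshold=0.5):
    """Map y to "Yes" or "No", treating P as the probability of Yes."""
    p = sigmoid(y)
    return "Yes" if p >= threshold else "No"

for y in (-2.0, -0.1, 0.0, 0.1, 2.0):
    print(y, classify(y))
```

Because P equals exactly 0.5 when y = 0 and grows as y grows, a 0.5 threshold gives the same answer as simply checking whether y >= 0.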

Here you go!

You now understand the heart of Logistic Regression.
