#36 Machine Learning & Data Science Challenge 36

VIF (Variance Inflation Factor), Weight of Evidence & Information Value: why and when to use them?

Variance Inflation Factor:

  • It provides an index that measures how much the variance (the square of the estimate's standard deviation) of an estimated regression coefficient is increased because of collinearity.

  • VIF_j = 1 / (1 - R_j²), where R_j² is the coefficient of determination (R²) of the regression of the j-th predictor X_j on all the other independent variables.

  • This auxiliary regression does not involve the dependent variable Y; it only measures how well X_j is explained by the other predictors.

  • As a common rule of thumb, VIF > 5 signals a multicollinearity problem; some practitioners use a more lenient cutoff of 10.
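The definition above can be sketched directly with NumPy: regress each predictor on the others, take R², and apply 1 / (1 - R²). This is a minimal illustration, not a reference implementation; the function name `vif` and the synthetic columns `x1`, `x2`, `x3` are assumptions made up for the example.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (shape: n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]                                  # predictor j plays the role of the target
        others = np.delete(X, j, axis=1)             # all remaining predictors
        A = np.column_stack([np.ones(n), others])    # add an intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot                   # R^2 of X_j on the other predictors
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic data (hypothetical): x3 is nearly a copy of x1, x2 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + 0.1 * rng.normal(size=500)
vifs = vif(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 2))
```

In this setup the VIFs of `x1` and `x3` should come out far above 5, while the VIF of `x2` should stay close to 1, since it is unrelated to the other two columns.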

Understanding VIF:

  • If the variance inflation factor of a predictor variable is 5, the variance of that predictor's coefficient is 5 times as large as it would be if the predictor were uncorrelated with the other predictor variables.

  • Equivalently, the standard error of that coefficient is about 2.24 times (√5 ≈ 2.236) as large as it would be if the predictor were uncorrelated with the other predictor variables.
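The step from variance to standard error follows from the standard OLS expression for the coefficient variance, sketched below. The symbols σ² (error variance), n (sample size), and s_j² (sample variance of X_j) are introduced here for illustration and do not appear in the text above.

```latex
\operatorname{Var}(\hat\beta_j)
  = \frac{\sigma^2}{(n-1)\,s_j^2}\cdot\frac{1}{1-R_j^2}
  = \frac{\sigma^2}{(n-1)\,s_j^2}\cdot\mathrm{VIF}_j
\quad\Longrightarrow\quad
\operatorname{SE}(\hat\beta_j) \propto \sqrt{\mathrm{VIF}_j},
\qquad
\mathrm{VIF}_j = 5 \;\Rightarrow\; \sqrt{5} \approx 2.236
```

With everything else held fixed, the variance scales with VIF_j and the standard error with its square root.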

Weight of evidence (WOE) and information value (IV) are simple, yet powerful techniques to perform variable transformation and selection.

The formulas for WOE and IV are:
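One commonly used convention, for a variable split into bins i and a binary target with events and non-events, is shown below. Here "% events in bin i" means the events in bin i divided by the total number of events, and likewise for non-events. Some references flip the ratio inside the logarithm, which changes the sign of WOE but leaves IV unchanged.

```latex
\mathrm{WOE}_i = \ln\!\left(\frac{\%\,\text{non-events}_i}{\%\,\text{events}_i}\right),
\qquad
\mathrm{IV} = \sum_{i}\left(\%\,\text{non-events}_i - \%\,\text{events}_i\right)\,\mathrm{WOE}_i
```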

Here is a simple table that shows how to calculate these values.
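As an illustration, the sketch below builds such a table from made-up bin counts. The counts, the bin labels, and the function name `woe_iv_table` are hypothetical and not taken from any real dataset. It assumes the convention WOE = ln(% non-events / % events) and that every bin contains at least one event and one non-event (otherwise the logarithm is undefined).

```python
import math

def woe_iv_table(bins):
    """bins: list of (label, n_events, n_non_events).

    Returns (rows, total_iv), where each row is
    (label, pct_events, pct_non_events, woe, iv_contribution).
    """
    total_events = sum(e for _, e, _ in bins)
    total_non_events = sum(n for _, _, n in bins)
    rows = []
    total_iv = 0.0
    for label, events, non_events in bins:
        pct_events = events / total_events              # share of all events in this bin
        pct_non_events = non_events / total_non_events  # share of all non-events in this bin
        woe = math.log(pct_non_events / pct_events)
        iv = (pct_non_events - pct_events) * woe        # this bin's contribution to IV
        rows.append((label, pct_events, pct_non_events, woe, iv))
        total_iv += iv
    return rows, total_iv

# Hypothetical toy counts: (bin label, events, non-events)
toy_bins = [("low", 10, 90), ("mid", 20, 80), ("high", 30, 50)]

rows, total_iv = woe_iv_table(toy_bins)
print("  bin  %evt   %non   WOE     IV")
for label, pct_e, pct_n, woe, iv in rows:
    print(f"{label:>5}  {pct_e:.3f}  {pct_n:.3f}  {woe:+.3f}  {iv:.3f}")
print(f"IV = {total_iv:.3f}")
```

Each bin's IV contribution is non-negative, because the difference of the two shares and the logarithm of their ratio always have the same sign. A bin where events and non-events are equally represented contributes zero.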

The IV can be used to rank and screen variables quickly: a predictor with a higher IV separates events from non-events more strongly.
