Statistical questions are better asked on stats.stackexchange. However, I just went through this for statsmodels, e.g. https://github.com/statsmodels/statsmodels/issues/2376
First, there is no multicollinearity problem in your model and data. The p-values are low and the confidence intervals are pretty narrow, so the parameters in the model should be good estimates. A vif of 8 is not large.
A large vif for the constant indicates that the (slope) explanatory variables also have a large constant component. An example would be a variable that has a large mean but only a small variance. An example of perfect collinearity with the constant, and hence rank deficiency of the design matrix, is the dummy variable trap: if we do not remove one of the levels of a categorical variable in the dummy encoding, then the dummies sum to 1 and therefore replicate the constant.
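As a quick illustration of the dummy variable trap, here is a minimal sketch with made-up data (the variable names and seed are arbitrary): keeping all levels of a categorical variable next to a constant makes the design matrix rank deficient.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
cat = rng.choice(["a", "b", "c"], size=50)

# keep all three levels instead of dropping one: the dummy variable trap
dummies = pd.get_dummies(cat, drop_first=False)
exog = sm.add_constant(dummies.to_numpy(dtype=float))

print(exog.shape[1])                # 4 columns
print(np.linalg.matrix_rank(exog))  # but rank 3: the dummies sum to the constant
```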
The purpose of including the constant in the vif computation is to discover this kind of problem in the design matrix exog provided by the user. It would not show up if we computed the vif on demeaned or standardized explanatory variables.
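For example, here is a hedged sketch with made-up data using statsmodels' `variance_inflation_factor`: a regressor with a large mean and a small variance makes the vif of the constant explode, while vifs computed on the demeaned variables look completely harmless.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 200
x1 = 100 + 0.5 * rng.standard_normal(n)  # large mean, small variance
x2 = rng.standard_normal(n)
exog = sm.add_constant(np.column_stack([x1, x2]))

# vif on the user-provided design matrix, constant in column 0:
# the constant's vif is huge, the slope vifs are close to 1
print([variance_inflation_factor(exog, i) for i in range(exog.shape[1])])

# vif on demeaned variables (no constant): the problem does not show up
exog_dm = np.column_stack([x1 - x1.mean(), x2 - x2.mean()])
print([variance_inflation_factor(exog_dm, i) for i in range(exog_dm.shape[1])])
```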
There has been a long-standing debate in statistics and econometrics about whether multicollinearity measures should include a constant or work only with demeaned explanatory variables.
I am currently preparing an extension to statsmodels that gives users the option to compute both versions, with and without constant.
In some cases a reparameterization, that is, demeaning and scaling, can improve numerical precision and prediction. So we want measures that check the actual design matrix provided by the user, but also a standardized version of the data, to see whether demeaning and scaling might improve numerical precision.
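As a rough sketch of that second check (again with made-up data), one can compare the condition number of the raw design matrix with that of a standardized version:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1 = 100 + 0.5 * rng.standard_normal(n)  # large mean, small variance
x2 = rng.standard_normal(n)
exog = np.column_stack([np.ones(n), x1, x2])

# condition number of the raw design matrix: large, because the x1 column
# is almost a multiple of the constant column
print(np.linalg.cond(exog))

# after demeaning and scaling the slope columns, the condition number is
# close to 1, so the reparameterized problem is numerically much better
xs = (exog[:, 1:] - exog[:, 1:].mean(axis=0)) / exog[:, 1:].std(axis=0)
exog_std = np.column_stack([np.ones(n), xs])
print(np.linalg.cond(exog_std))
```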