I'm posting an answer to correct a couple of points that seem to have gotten confused. There really is no single predict function as such. That is what the help page means when it says predict is a "generic function". Sometimes generic functions do have a fun.default method, but predict has no default method, so dispatch is on the basis of the class of the first argument. There will be separate help pages for each method, and the help page for predict lists several. Package authors need to write their own predict methods for new classes.
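To see the dispatch for yourself, something like the following sketch should work. It assumes a fresh R session with only the standard packages attached (other packages may register additional predict methods), and the variable names are just mine for illustration.

```r
# predict itself is only a thin generic that hands off to a class-specific method
print(predict)

# List the methods currently registered for the generic
print(methods("predict"))

# The method for "glm" objects exists in the stats package ...
glm_method <- getS3method("predict", "glm")

# ... but there is no predict.default to fall back on;
# optional = TRUE returns NULL instead of raising an error when no method is found
has_default <- !is.null(getS3method("predict", "default", optional = TRUE))
print(has_default)
```

So calling predict on an object whose class has no registered method fails rather than falling through to some generic behaviour.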
Logistic regression predates the machine learning paradigm, so expecting it to "predict classes" is somewhat unrealistic. Even the fact that you can get a "response" prediction is a gift over what the software would have provided 30 years ago when some of us were taking our regression classes. One needs to understand that probabilities are generally not 0 or 1 but rather something in between. If the user wants to set a threshold and determine how many cases exceed it, then that is an analyst decision, and the analyst needs to make any transformations to categories they deem worthwhile.
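If you do want classes, the thresholding step is yours to write. Here is a minimal sketch on simulated data; the Survived column, the simulated probabilities, and the 0.5 cutoff are all my assumptions for illustration, not anything taken from the question.

```r
set.seed(1)

# Simulated stand-in for the training data (column names are assumptions)
train <- data.frame(
  Sex = factor(sample(c("female", "male"), 200, replace = TRUE))
)
train$Survived <- rbinom(200, 1, ifelse(train$Sex == "female", 0.75, 0.2))

fit <- glm(Survived ~ Sex, data = train, family = binomial)

# type = "response" gives fitted probabilities, not classes
p <- predict(fit, type = "response")

# The cutoff is an analyst decision; 0.5 is only a placeholder
threshold <- 0.5
pred_class <- as.integer(p > threshold)

# Cross-tabulate the thresholded predictions against the observed outcome
print(table(predicted = pred_class, observed = train$Survived))
```

A different cutoff may be more sensible depending on the costs of the two kinds of error.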
Executing predict(fit, train$Sex) would be expected to give a result as long as the number of values in the training set, so I'm guessing that you perhaps meant to try predict(fit, test$Sex) and were disappointed. If that's the case, then it should have been predict(fit, list(Sex=test$Sex)). R needs the newdata argument to be a value that can be coerced to a data frame, so a named list of values is the minimum requirement for supplying predictors.
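A small sketch of the named-list form, again on simulated data (the Survived column and the data sizes are assumptions on my part). It also checks that passing the test data frame itself gives the same predictions.

```r
set.seed(2)

# Simulated train and test sets; Sex has the same factor levels in both
lv <- c("female", "male")
train <- data.frame(Sex = factor(sample(lv, 200, replace = TRUE), levels = lv))
train$Survived <- rbinom(200, 1, ifelse(train$Sex == "female", 0.75, 0.2))
test <- data.frame(Sex = factor(sample(lv, 50, replace = TRUE), levels = lv))

fit <- glm(Survived ~ Sex, data = train, family = binomial)

# A named list whose names match the model's predictor names
p_list <- predict(fit, newdata = list(Sex = test$Sex), type = "response")

# Passing the data frame directly is the more usual spelling
p_df <- predict(fit, newdata = test, type = "response")

print(length(p_list))
print(head(p_list))
```

Either way the result has one prediction per row of the test set, which is what distinguishes it from getting back the training-set fits.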
If predict.glm gets a malformed value for its second argument, newdata, it falls back on the original data and uses the linear predictors that are retained in the model object.
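The retained values are easy to inspect. This sketch only demonstrates the case where newdata is omitted altogether; exactly what happens with a malformed newdata may differ between R versions, so I am not asserting anything about that here. The data are simulated and the Survived column is an assumption.

```r
set.seed(3)

train <- data.frame(
  Sex = factor(sample(c("female", "male"), 200, replace = TRUE))
)
train$Survived <- rbinom(200, 1, ifelse(train$Sex == "female", 0.75, 0.2))

fit <- glm(Survived ~ Sex, data = train, family = binomial)

# With no newdata, the default type = "link" returns the stored linear predictors
lp <- predict(fit)
same_link <- isTRUE(all.equal(lp, fit$linear.predictors))

# With type = "response" it returns the stored fitted probabilities
pr <- predict(fit, type = "response")
same_resp <- isTRUE(all.equal(pr, fitted(fit)))

print(c(link = same_link, response = same_resp))
```

So a result with as many elements as the training set is a sign that your new data never reached the model.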