I have a dataset with lots of features (mostly categorical features(Yes/No)) and lots of missing values.
One of the techniques for dimensionality reduction is to generate a large and carefully constructed set of trees against a target attribute and then use each attribute’s usage statistics to find the most informative subset of features. That is basically we can generate a large set of very shallow trees, with each tree being trained on a small fraction of the total number of attributes. If an attribute is often selected as best split, it is most likely an informative feature to retain.
I am also using an imputer to fill the missing values.
My doubt is what should be the order to the above two. Which of the above two (dimensionality reduction and imputation) to do first and why?