(previously posted here, to the wrong sub, with not enough info, which was closed, I edited, the edits seem to have been deleted, & the post consigned to purgatory, so apologies for re-posting, I don't know whether the previous post can/should be resurrected)
In R, I've run some Boosted Regression Trees, aka Generalized Boosting Models, using dismo
which uses gbm
. Reproducible example to get people to where I am currently:
library(dismo); data(Anguilla_train)
angaus.tc5.lr01 <- gbm.step(data=Anguilla_train, gbm.x = 3:13, gbm.y = 2, family = "bernoulli", tree.complexity = 5, learning.rate = 0.01, bag.fraction = 0.5)
(From here). This leaves you with gbm model object "angaus.tc5.lr01". I'd like to generate dendrograms of the splits (folds?), i.e. plot the trees, as per De'ath 2007 (see pic, left pane). BUT: De'ath's plot is of a single regression tree, not a boosted regression tree which is the average of potentially thousands of trees each run with a different set of data randomly drawn from the dataset.
User ckluss kindly suggested rpart, however that needs the model to be generated by rpart
so doesn't work for BRTs/GBMs produced by gbm.step
. The same is true of prp
from rpart.plot
.
pretty.gbm.tree
in gbm
extracts a matrix of info for any one tree selected (try pretty.gbm.tree(angaus.tc5.lr01, i.tree=1)
for the first) so I'm wondering if this might be a plausible route to success? E.g. by writing some script which creates an averaged tree matrix using all of the available trees, then converting this into a tree-like object, possibly using some of the methods here.
People have asked varyingly similar questions seemingly with no success elsewhere on the net. BRT models are regularly described as being 'black boxes' so maybe the general opinion is that one shouldn't need/be able/bother to probe into them and display their inner processes.
If anyone knows enough about BRTs / gbm
and has any ideas, they'd be gratefully received.
Thanks.