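I am analyzing a Pokémon dataset. I want to build a random forest to predict whether a Pokémon is legendary.

Right now I have a training dataset of 118 observations and 44 columns: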

    variables:
 $ type1_bug     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type1_dark    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type1_dragon  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type1_electric: int  0 1 0 0 0 1 0 0 0 0 ...
 $ type1_fairy   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type1_fighting: int  0 0 0 0 0 0 0 0 0 0 ...
 $ type1_fire    : int  0 0 1 0 0 0 1 0 0 1 ...
 $ type1_flying  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type1_ghost   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type1_grass   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type1_ground  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type1_ice     : int  1 0 0 0 0 0 0 0 0 0 ...
 $ type1_normal  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type1_poison  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type1_psychic : int  0 0 0 1 1 0 0 0 1 0 ...
 $ type1_rock    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type1_steel   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type1_water   : int  0 0 0 0 0 0 0 1 0 0 ...
 $ type2_        : int  0 0 0 1 1 1 1 1 0 0 ...
 $ type2_bug     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type2_dark    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type2_dragon  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type2_electric: int  0 0 0 0 0 0 0 0 0 0 ...
 $ type2_fairy   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type2_fighting: int  0 0 0 0 0 0 0 0 0 0 ...
 $ type2_fire    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type2_flying  : int  1 1 1 0 0 0 0 0 1 1 ...
 $ type2_ghost   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type2_grass   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type2_ground  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type2_ice     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type2_normal  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type2_poison  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type2_psychic : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type2_rock    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type2_steel   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ type2_water   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ hp            : int  90 90 90 106 100 90 115 100 106 106 ...
 $ attack        : int  85 90 100 150 100 85 115 75 90 130 ...
 $ defense       : int  100 85 90 70 100 75 85 115 130 90 ...
 $ sp_attack     : int  95 125 125 194 100 115 90 90 90 110 ...
 $ sp_defense    : int  125 90 85 120 100 100 75 115 154 154 ...
 $ speed         : int  85 100 90 140 100 115 100 85 110 90 ...
 $ is_legendary  : int  1 1 1 1 1 1 1 1 1 1 ...

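As you can see, there are dummy variables, plus the target class is_legendary.

The problem is that the data is imbalanced: there are far fewer observations for legendary Pokémon than for non-legendary ones. So I want to balance the dataset by creating synthetic data. I was told to use the SMOTE function, but I get an error. Here is the full code: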

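# Packages (assumed from the functions used below; not shown in the original post)
library(dplyr)    # %>%, select(), sample_n()
library(dummies)  # dummy.data.frame()
library(DMwR)     # SMOTE() (assumed source of this function)

# Creating a dataset for legendary pokemon and non legendary pokemon

pokemonllegendari <- df_net[df_net$is_legendary == 1,]
pokemoncomu <- df_net[df_net$is_legendary == 0,]

# Selecting attributes

pokemonllegendari <- pokemonllegendari %>% select(type1,type2,hp,attack,defense,sp_attack,sp_defense,speed,is_legendary)
pokemoncomu <- pokemoncomu %>% select(type1,type2,hp,attack,defense,sp_attack,sp_defense,speed,is_legendary)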

#Balancing dataset
pokemoncomusample <- sample_n(pokemoncomu,100)

# Concatenating dataset

rawdata <- rbind(pokemonllegendari,pokemoncomusample)

# Dummy variables
rawdata <- dummy.data.frame(rawdata,sep="_")

# Creating training and test datasets

dt <- sort(sample(nrow(rawdata),nrow(rawdata)*.7))

train <- rawdata[dt,]
test <- rawdata[-dt,]

# Increasing number of legendary pokemons using SMOTE

smoted_data <- SMOTE(is_legendary~., train, perc.over=100)

The error is:

Error in T[i, ] : subscript out of bounds
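
One likely cause, offered as a sketch rather than a confirmed diagnosis: assuming `SMOTE()` here is the one from the DMwR package, it identifies the minority class through `levels()` on the target column. An integer target such as `is_legendary` has no levels, so no minority rows are found and the row indexing fails. The example below uses a hypothetical toy data frame (`toy`, with made-up values) in place of `train`, and converts the target to a factor before the call.

```r
# Hypothetical toy data standing in for `train` (values are made up).
set.seed(1)
toy <- data.frame(
  hp           = c(rnorm(10, mean = 100, sd = 5), rnorm(40, mean = 60, sd = 10)),
  attack       = c(rnorm(10, mean = 110, sd = 5), rnorm(40, mean = 65, sd = 10)),
  is_legendary = c(rep(1L, 10), rep(0L, 40))
)

# An integer target has no levels, so no minority class can be identified.
is.null(levels(toy$is_legendary))

# Convert the target to a factor before calling SMOTE.
toy$is_legendary <- factor(toy$is_legendary)
levels(toy$is_legendary)

# Assumption: SMOTE() is DMwR::SMOTE. The call is skipped if DMwR is not installed.
if (requireNamespace("DMwR", quietly = TRUE)) {
  smoted_toy <- DMwR::SMOTE(is_legendary ~ ., toy, perc.over = 100)
  print(table(smoted_toy$is_legendary))
}
```

In the code from the question, the equivalent step would be `train$is_legendary <- factor(train$is_legendary)` placed just before the `SMOTE()` call.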