I have a data table named tmp.df.lhs.denorm; its first 2 rows are shown below:
> dput(tmp.df.lhs.denorm[1:2])
structure(list(rules = c("{} => {Dental anesthetic products-Injectables cartridges|2288210-Septocaine Cart 4% w/EPI}",
"{Dental small equipment-Water distiller parts & acc|5528005-EzeeKleen 2.5HD UV Lamp1,Dental small equipment-Water distiller parts & acc|5528005-EzeeKleen 2.5HD UV Lamp2} => {Dental small equipment-Water distiller parts & acc|5528004-EzeeKleen 2.5HD RO Membra}"
), support = c(0.501710236989983, 0.000610798924993892), confidence = c(0.501710236989983,
1), lift = c(1, 1637.2), rule.id = 1:2, lhs_1 = c(NA, "Dental small equipment-Water distiller parts & acc|5528005-EzeeKleen 2.5HD UV Lamp1"
), lhs_2 = c(NA, "Dental small equipment-Water distiller parts & acc|5528005-EzeeKleen 2.5HD UV Lamp2"
)), .Names = c("rules", "support", "confidence", "lift", "rule.id",
"lhs_1", "lhs_2"), class = c("data.table", "data.frame"), row.names = c(NA,
-2L), .internal.selfref = <pointer: 0x0000000007120788>)
Note that the columns lhs_1 and lhs_2 are the result of a string split on the rules column.
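My problem is that, for different data, the rules column may contain a different number of comma-separated items. For example, I could end up with 3 columns (lhs_1, lhs_2 and lhs_3) or more, depending on how many commas appear in the rules column. The solution I have in mind is to fix the number of lhs_* columns (a parameter in my code, say 6), so that this particular example data table tmp.df.lhs.denorm would be merged with 4 extra empty columns named lhs_3, lhs_4, lhs_5 and lhs_6. Any help is appreciated.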
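
To make the desired result concrete, here is a minimal sketch of the padding step. It assumes the data.table package is loaded and that "empty" means columns filled with NA_character_. The helper name pad_lhs_cols, its arguments n_lhs and prefix, and the small toy table are all hypothetical and only stand in for tmp.df.lhs.denorm.

```r
library(data.table)

# Hypothetical helper (not from any package): add NA-filled lhs_* columns
# until the table has lhs_1 .. lhs_<n_lhs>. Existing columns are left alone.
pad_lhs_cols <- function(dt, n_lhs = 6L, prefix = "lhs_") {
  wanted <- paste0(prefix, seq_len(n_lhs))   # lhs_1 ... lhs_6
  to_add <- setdiff(wanted, names(dt))       # columns that do not exist yet
  out <- copy(dt)                            # avoid modifying the input by reference
  if (length(to_add) > 0L) {
    out[, (to_add) := NA_character_]         # one NA character column per missing name
  }
  out
}

# Toy stand-in for tmp.df.lhs.denorm (assumed shape, shortened values)
dt <- data.table(
  rules   = c("{} => {A}", "{B,C} => {D}"),
  rule.id = 1:2,
  lhs_1   = c(NA, "B"),
  lhs_2   = c(NA, "C")
)

padded <- pad_lhs_cols(dt, n_lhs = 6L)
names(padded)
```

In this sketch the new columns are appended at the end in the order lhs_3 to lhs_6. A table that already has more than n_lhs lhs_* columns is returned with all of them, since nothing is dropped.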