我有以下数据集
> head(data)
X UserID NPS V3 V4 V5 Event V7 Element ElementValue
1 1 254727216 10 0 19 10 nps.agent.14b.no other attempt was made 10/4/2014 23:59 cea.element_name nps.agent.14b.no other attempt was made
2 2 298379949 0 0 28 11 nps.agent.14b.no other attempt was made 9/30/2014 23:59 cea.element_name nps.agent.14b.no other attempt was made
3 3 254710917 0 0 20 12 nps.agent.14b.no other attempt was made 9/15/2014 23:59 cea.element_name nps.agent.14b.no other attempt was made
4 4 238919392 7 0 17 9 nps.agent.14b.no other attempt was made 9/17/2014 23:59 cea.element_name nps.agent.14b.no other attempt was made
5 5 144693025 10 0 18 10 nps.agent.14b.no other attempt was made 9/17/2014 23:59 cea.element_name nps.agent.14b.no other attempt was made
6 6 249978568 5 0 21 12 nps.agent.14b.no other attempt was made 9/18/2014 23:59 cea.element_name nps.agent.14b.no other attempt was made
当我将数据集拆分为:
data_splitted <- split(data,data$UserID)
这里的问题是当我用整个数据集而不是这个样本尝试这个时,大小的巨大增加超过了我的内存
> format(object.size(data),units="Mb")
[1] "0.2 Mb"
> format(object.size(data_splitted),units="Mb")
[1] "45.7 Mb"
任何有关为什么会发生这种情况以及是否有任何解决方法的见解将不胜感激。