I have a dataset of Weibo posts that looks like this:
uid mid annotations bmiddle_pic created_at favorited geo in_reply_to_screen_name in_reply_to_status_id in_reply_to_user_id original_pic reTweetId reUserId source thumbnail_pic truncated dateTime year month date
2025135630 3431909076450860 Fri Apr 06 20:12:27 +0800 2012 FALSE None NA NA NA 3.42867E+15 1292317643 <a href=http://localhost/web/cellphone.php#android rel=nofollow>Android???</a> FALSE 6/4/12 20:12 2012 4 6
1707427294 3439478742005300 Fri Apr 27 17:31:36 +0800 2012 FALSE None NA NA NA 3.43689E+15 1717022775 <a href=http://localhost/proc/productintro.php rel=nofollow>???????</a> FALSE 27/4/12 17:31 2012 4 27
1707427294 3449202430032250 Thu May 24 13:30:06 +0800 2012 FALSE None NA NA NA 3.44822E+15 1717022775 <a href=http://localhost/proc/productintro.php rel=nofollow>???????</a> FALSE 24/5/12 13:30 2012 5 24
1444865141 3432162475292600 Sat Apr 07 12:59:23 +0800 2012 FALSE None NA NA NA 3.43215E+15 1406200033 <a href=http://localhost/web/cellphone.php#iphone rel=nofollow>iPhone???</a> FALSE 7/4/12 12:59 2012 4 7
1444865141 3451309551846890 Wed May 30 09:03:02 +0800 2012 FALSE None NA NA NA 3.45109E+15 1406200033 <a href=http://localhostrel=nofollow>????</a> FALSE 30/5/12 9:03 2012 5 30
1422308692 3449219618915960 [{'name': u'\u827a\u672f\u4e0e\u751f\u6d3b\u7684\u5e73\u884c\u5bf9\u8bdd', 'title': u'\u827a\u672f\u4e0e\u751f\u6d3b\u7684\u5e73\u884c\u5bf9...', 'url': u'http://localhost/ft/201205215587', 'detailid': u'201205215587', 'appid': 47, 'id': u''}] Thu May 24 14:38:21 +0800 2012 FALSE None NA NA NA 3.44922E+15 1438620052 <a href=http://localhost rel=nofollow>???</a> FALSE 24/5/12 14:38 2012 5 24
I converted it to a data.table, but I cannot set the key:
DT <- data.table(df)
keycols = c("reUserId", "year","month")
setkeyv(DT, keycols)
It fails with:
Error in setkeyv(eihun.im60k, keycols) :
Column 17 is length 9 which differs from length of column 1 (143). Invalid data.table. Check NEWS link at top of ?data.table for latest bug fixes. If not already reported and fixed, please report to datatable-help.
I then ran the check that the author, Matt Dowle, suggested in "`data.table` error: 'reorder received irregular lengthed list' in setkey":
sapply(DT, length)
It returns:
uid mid annotations bmiddle_pic
143 143 143 143
created_at favorited geo in_reply_to_screen_name
143 143 143 143
in_reply_to_status_id in_reply_to_user_id original_pic reTweetId
143 143 143 143
reUserId source thumbnail_pic truncated
143 143 143 143
dateTime year month date
143 143 143 143
So if every column has length 143, why do I still get this error saying that column 17 has length 9? Thanks in advance!
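One thing worth ruling out: the error message names eihun.im60k, while the length check was run on DT, so the two may not be the same object. Below is a minimal base R sketch of a length check that can be pointed at whichever object is actually passed to setkeyv. The helper name find_ragged_columns and the toy list are hypothetical, made up for illustration, and are not part of data.table.

```r
# Hypothetical helper: return the columns whose length differs from the
# length of the first column. unclass() drops the table class so that the
# underlying list is inspected directly.
find_ragged_columns <- function(x) {
  lens <- vapply(unclass(x), length, integer(1))
  lens[lens != lens[[1]]]
}

# Toy stand-in for a malformed table: a plain list with one short column.
toy <- list(uid = 1:5, year = rep(2012L, 5), month = c(4L, 5L))
find_ragged_columns(toy)

# A well-formed data.frame yields an empty result.
ok <- data.frame(uid = 1:3, year = rep(2012L, 3))
find_ragged_columns(ok)
```

Running find_ragged_columns(eihun.im60k) in the same session as the failing setkeyv call would show whether that object, rather than DT, holds the short column.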
P.S.
dput(head(df))
It returns:
structure(list(uid.mid.annotations.bmiddle_pic.created_at.favorited.geo.in_reply_to_screen_name.in_reply_to_status_id.in_reply_to_user_id.original_pic.reTweetId.reUserId.source.thumbnail_pic.truncated.dateTime.year.month = structure(c(2L,
3L, 4L, 5L, 6L, 1L), .Label = c("105411 1422308692 3449219618915963 Thu May 24 14:38:21 +0800 2012 False None NA NA NA 3449215332521999 1438620052 <a href=http://localhost rel=nofollow>\345\276\256\350\256\277\350\260\210</a> False 2012-05-24 14:38:21 2012 5",
"22527 2025135630 3431909076450865 Fri Apr 06 20:12:27 +0800 2012 False None NA NA NA 3428667298503554 1292317643 <a href=http://localhost/web/cellphone.php#android rel=nofollow>Android\345\256\242\346\210\267\347\253\257</a> False 2012-04-06 20:12:27 2012 4",
"90933 1707427294 3439478742005300 Fri Apr 27 17:31:36 +0800 2012 False None NA NA NA 3436888868360479 1717022775 <a href=http://localhost/proc/productintro.php rel=nofollow>\346\226\260\346\265\252\345\276\256\345\215\232\344\274\201\344\270\232\347\211\210</a> False 2012-04-27 17:31:36 2012 4",
"91994 1707427294 3449202430032258 Thu May 24 13:30:06 +0800 2012 False None NA NA NA 3448224780547857 1717022775 <a href=http://localhost/proc/productintro.php rel=nofollow>\346\226\260\346\265\252\345\276\256\345\215\232\344\274\201\344\270\232\347\211\210</a> False 2012-05-24 13:30:06 2012 5",
"93408 1444865141 3432162475292602 Sat Apr 07 12:59:23 +0800 2012 False None NA NA NA 3432146591339391 1406200033 <a href=http://localhost/web/cellphone.php#iphone rel=nofollow>iPhone\345\256\242\346\210\267\347\253\257</a> False 2012-04-07 12:59:23 2012 4",
"93772 1444865141 3451309551846895 Wed May 30 09:03:02 +0800 2012 False None NA NA NA 3451094757864706 1406200033 <a href=http://localhost rel=nofollow>\346\226\260\346\265\252\345\276\256\345\215\232</a> False 2012-05-30 09:03:02 2012 5"
), class = "factor")), .Names = "uid.mid.annotations.bmiddle_pic.created_at.favorited.geo.in_reply_to_screen_name.in_reply_to_status_id.in_reply_to_user_id.original_pic.reTweetId.reUserId.source.thumbnail_pic.truncated.dateTime.year.month", row.names = c(NA,
6L), class = "data.frame")
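The dput output above shows a single factor column whose name is all the headers joined by dots, which is what reading a file with the wrong separator produces. Here is a minimal sketch of how to check for that, assuming the file is tab-delimited. That is an assumption, since the post does not say how df was read. The inline text and the reduced set of columns are made up for illustration.

```r
# Toy tab-delimited input with a few of the columns from the post.
raw <- paste(
  "uid\tmid\treUserId\tyear\tmonth",
  "2025135630\t3431909076450865\t1292317643\t2012\t4",
  "1707427294\t3439478742005300\t1717022775\t2012\t4",
  sep = "\n"
)

# Wrong separator: each line is kept as one field, so there is one column.
wrong <- read.table(text = raw, header = TRUE, sep = ",")
ncol(wrong)

# Matching separator: one column per field. Reading everything as character
# keeps the long ids exact instead of turning them into values like 3.43E+15.
right <- read.table(text = raw, header = TRUE, sep = "\t",
                    colClasses = "character")
ncol(right)
names(right)
```

If ncol(df) is 1 for the real data, the key columns reUserId, year and month do not exist as separate columns yet, and the file needs to be re-read with the right separator before building the data.table.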