
I have a dataset about Weibo posts that looks like this:

uid mid annotations bmiddle_pic created_at  favorited   geo in_reply_to_screen_name in_reply_to_status_id   in_reply_to_user_id original_pic    reTweetId   reUserId    source  thumbnail_pic   truncated   dateTime    year    month   date
2025135630  3431909076450860            Fri Apr 06 20:12:27 +0800 2012  FALSE   None    NA  NA  NA      3.42867E+15 1292317643  <a href=http://localhost/web/cellphone.php#android rel=nofollow>Android???</a>      FALSE   6/4/12 20:12    2012    4   6
1707427294  3439478742005300            Fri Apr 27 17:31:36 +0800 2012  FALSE   None    NA  NA  NA      3.43689E+15 1717022775  <a href=http://localhost/proc/productintro.php rel=nofollow>???????</a>     FALSE   27/4/12 17:31   2012    4   27
1707427294  3449202430032250            Thu May 24 13:30:06 +0800 2012  FALSE   None    NA  NA  NA      3.44822E+15 1717022775  <a href=http://localhost/proc/productintro.php rel=nofollow>???????</a>     FALSE   24/5/12 13:30   2012    5   24
1444865141  3432162475292600            Sat Apr 07 12:59:23 +0800 2012  FALSE   None    NA  NA  NA      3.43215E+15 1406200033  <a href=http://localhost/web/cellphone.php#iphone rel=nofollow>iPhone???</a>        FALSE   7/4/12 12:59    2012    4   7
1444865141  3451309551846890            Wed May 30 09:03:02 +0800 2012  FALSE   None    NA  NA  NA      3.45109E+15 1406200033  <a href=http://localhostrel=nofollow>????</a>       FALSE   30/5/12 9:03    2012    5   30
1422308692  3449219618915960    [{'name': u'\u827a\u672f\u4e0e\u751f\u6d3b\u7684\u5e73\u884c\u5bf9\u8bdd', 'title': u'\u827a\u672f\u4e0e\u751f\u6d3b\u7684\u5e73\u884c\u5bf9...', 'url': u'http://localhost/ft/201205215587', 'detailid': u'201205215587', 'appid': 47, 'id': u''}]     Thu May 24 14:38:21 +0800 2012  FALSE   None    NA  NA  NA      3.44922E+15 1438620052  <a href=http://localhost rel=nofollow>???</a>       FALSE   24/5/12 14:38   2012    5   24

I converted it to a data.table, but I cannot set the key:

DT <- data.table(df)
keycols = c("reUserId", "year","month")
setkeyv(DT, keycols)

It says:

Error in setkeyv(eihun.im60k, keycols) : 
Column 17 is length 9 which differs from length of column 1 (143). Invalid data.table. Check NEWS link at top of ?data.table for latest bug fixes. If not already reported and fixed, please report to datatable-help.

When I run the check that the author, Matt Dowle, suggested in `data.table` error: "reorder received irregular lengthed list" in setkey:

sapply(DT, length)

It returns:

                uid                     mid             annotations             bmiddle_pic 
                143                     143                     143                     143 
         created_at               favorited                     geo in_reply_to_screen_name 
                143                     143                     143                     143 
in_reply_to_status_id     in_reply_to_user_id            original_pic               reTweetId 
                143                     143                     143                     143 
           reUserId                  source           thumbnail_pic               truncated 
                143                     143                     143                     143 
           dateTime                    year                   month                    date 
                143                     143                     143                     143 

So if every column has length 143, why do I still get this error saying that column 17 has length 9? Thanks in advance!

P.S.

dput(head(df))

It returns:

    structure(list(uid.mid.annotations.bmiddle_pic.created_at.favorited.geo.in_reply_to_screen_name.in_reply_to_status_id.in_reply_to_user_id.original_pic.reTweetId.reUserId.source.thumbnail_pic.truncated.dateTime.year.month = structure(c(2L, 
3L, 4L, 5L, 6L, 1L), .Label = c("105411 1422308692 3449219618915963   Thu May 24 14:38:21 +0800 2012 False None NA NA NA  3449215332521999 1438620052 <a href=http://localhost rel=nofollow>\345\276\256\350\256\277\350\260\210</a>  False 2012-05-24 14:38:21 2012 5", 
"22527 2025135630 3431909076450865   Fri Apr 06 20:12:27 +0800 2012 False None NA NA NA  3428667298503554 1292317643 <a href=http://localhost/web/cellphone.php#android rel=nofollow>Android\345\256\242\346\210\267\347\253\257</a>  False 2012-04-06 20:12:27 2012 4", 
"90933 1707427294 3439478742005300   Fri Apr 27 17:31:36 +0800 2012 False None NA NA NA  3436888868360479 1717022775 <a href=http://localhost/proc/productintro.php rel=nofollow>\346\226\260\346\265\252\345\276\256\345\215\232\344\274\201\344\270\232\347\211\210</a>  False 2012-04-27 17:31:36 2012 4", 
"91994 1707427294 3449202430032258   Thu May 24 13:30:06 +0800 2012 False None NA NA NA  3448224780547857 1717022775 <a href=http://localhost/proc/productintro.php rel=nofollow>\346\226\260\346\265\252\345\276\256\345\215\232\344\274\201\344\270\232\347\211\210</a>  False 2012-05-24 13:30:06 2012 5", 
"93408 1444865141 3432162475292602   Sat Apr 07 12:59:23 +0800 2012 False None NA NA NA  3432146591339391 1406200033 <a href=http://localhost/web/cellphone.php#iphone rel=nofollow>iPhone\345\256\242\346\210\267\347\253\257</a>  False 2012-04-07 12:59:23 2012 4", 
"93772 1444865141 3451309551846895   Wed May 30 09:03:02 +0800 2012 False None NA NA NA  3451094757864706 1406200033 <a href=http://localhost rel=nofollow>\346\226\260\346\265\252\345\276\256\345\215\232</a>  False 2012-05-30 09:03:02 2012 5"
), class = "factor")), .Names = "uid.mid.annotations.bmiddle_pic.created_at.favorited.geo.in_reply_to_screen_name.in_reply_to_status_id.in_reply_to_user_id.original_pic.reTweetId.reUserId.source.thumbnail_pic.truncated.dateTime.year.month", row.names = c(NA, 
6L), class = "data.frame")

1 Answer


The "data.table" package cannot hold POSIXlt variables as part of a valid table. Convert them before calling data.table(mydf).
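As a rough sketch of that conversion: column 17 in the header above is dateTime, so the example below assumes that column was created with strptime() (which returns POSIXlt) and converts it with as.POSIXct() before building the table. The small data frame, the UTC time zone, and the variable name is_lt are made up for illustration and are not the asker's real data.

```r
# Hypothetical stand-in for the real data: ordinary columns plus one POSIXlt column.
df <- data.frame(reUserId = c(1292317643, 1717022775, 1717022775),
                 year     = c(2012L, 2012L, 2012L),
                 month    = c(4L, 4L, 5L))
# strptime() returns POSIXlt; assigning it with $<- keeps that class.
df$dateTime <- strptime(c("2012-04-06 20:12:27",
                          "2012-04-27 17:31:36",
                          "2012-05-24 13:30:06"),
                        format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Find POSIXlt columns by class, since length() alone does not reveal them.
is_lt <- vapply(df, inherits, logical(1), what = "POSIXlt")
print(is_lt)

# Convert each POSIXlt column to POSIXct, which is stored as a plain numeric vector.
for (nm in names(df)[is_lt]) {
  df[[nm]] <- as.POSIXct(df[[nm]])
}

# Only attempt the keyed table if data.table is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  DT <- data.table::data.table(df)
  data.table::setkeyv(DT, c("reUserId", "year", "month"))
}
```

This would also explain why sapply(DT, length) shows 143 everywhere: length() on a POSIXlt value reports the number of timestamps, while the object underneath is a list of date-time components (nine in the asker's session, judging by the error message).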

Answered 2013-10-14T10:01:26.637