I am extracting data from a public page with the Rfacebook package. My code is as follows:

fb_oauth <- fbOAuth(app_id="###", app_secret="###", extended_permissions = FALSE,
                    legacy_permissions = FALSE) #id/secret is hidden


telekom <- getPage(page="TelekomMK", token=fb_oauth)

t_head <- head(telekom, n = 1)

This is the data I get back:

    dput(t_head)
structure(list(from_id = "86689960994", from_name = "Telekom MK", 
    message = "<U+0421><U+043C><U+0430><U+0440><U+0442><U+0444><U+043E><U+043D> <U+0437><U+0430> <U+0441><U+0430><U+043C><U+043E> 1 <U+0434><U+0435><U+043D><U+0430><U+0440>! <U+041E><U+0434><U+0431><U+0435><U+0440><U+0435><U+0442><U+0435> <U+0433><U+043E> Huawei P Smart <U+0432><U+043E> Magenta 1L <U+0438> <U+0434><U+043E><U+0431><U+0438><U+0458><U+0442><U+0435> <U+0434><U+0432><U+043E><U+0458><U+043D><U+043E> <U+043F><U+043E><U+0432><U+0435><U+045C><U+0435> <U+043C><U+043E><U+0431><U+0438><U+043B><U+0435><U+043D> <U+0438><U+043D><U+0442><U+0435><U+0440><U+043D><U+0435><U+0442>. <ed><U+00A0><U+00BD><ed><U+00B3><U+00B2>", 
    created_time = "2018-03-09T12:00:00+0000", type = "photo", 
    link = "https://www.facebook.com/TelekomMK/photos/a.90789160994.98029.86689960994/10156117256280995/?type=3", 
    id = "86689960994_10156117256945995", story = NA_character_, 
    likes_count = 6, comments_count = 0, shares_count = 0), .Names = c("from_id", 
"from_name", "message", "created_time", "type", "link", "id", 
"story", "likes_count", "comments_count", "shares_count"), row.names = 1L, class = "data.frame")

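What I don't understand is why the text, which is written in Cyrillic, comes back as these unreadable characters. Is there a way to fix this?

Many thanks.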

1 Answer

I just ran

TelekomMK_posts <- getPage("TelekomMK", token = fboauth, 
                          n=10000, since = '2009/01/01', 
                          until = '2018/03/15')

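and the results came back fine. Are you only interested in that specific post from March 9?

I extract Cyrillic text from Facebook regularly, so feel free to contact me if you have any questions. Try setting the encoding to UTF-8.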
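As a rough illustration of the UTF-8 suggestion, here is a minimal sketch in base R. On Windows the `<U+0421>` forms are often just how the console prints characters it cannot show in the native locale, so it is worth checking `Encoding()` on the column first. If the escapes really are stored as literal text, a small helper can convert them back. `unescape_unicode` is a hypothetical name introduced here and is not part of Rfacebook; the sample string is the first word of the message in the question.

```r
# Hypothetical helper (not part of Rfacebook): replace literal "<U+XXXX>"
# escapes with the characters they stand for.
unescape_unicode <- function(x) {
  vapply(x, function(s) {
    if (is.na(s)) return(s)
    m <- gregexpr("<U\\+[0-9A-Fa-f]{4,6}>", s)
    codes <- regmatches(s, m)[[1]]
    if (length(codes) == 0) return(s)  # nothing to convert
    chars <- vapply(codes, function(code) {
      # Drop the leading "<U+" and trailing ">", then parse the hex code point.
      intToUtf8(strtoi(substr(code, 4, nchar(code) - 1), base = 16L))
    }, character(1), USE.NAMES = FALSE)
    regmatches(s, m) <- list(chars)
    s
  }, character(1), USE.NAMES = FALSE)
}

# Sample taken from the start of the message in the question.
sample_msg <- "<U+0421><U+043C><U+0430><U+0440><U+0442><U+0444><U+043E><U+043D> Huawei"
fixed <- unescape_unicode(sample_msg)

# Make sure the result is marked as UTF-8 and inspect the declared encoding.
fixed <- enc2utf8(fixed)
print(Encoding(fixed))
```

Under the same assumption, the whole column could be converted with `telekom$message <- unescape_unicode(telekom$message)`. The trailing `<ed><U+00A0><U+00BD>...` sequence in the question looks like a mangled emoji rather than Cyrillic, and this sketch does not try to repair it.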

Answered 2018-03-27T16:39:00.643