r - Trying to parse bencode / torrent files in R

Question

I would like to parse torrent files automatically from R. I tried the R-bencode package:

library('bencode')
test_torrent <- readLines('/home/user/Downloads/some_file.torrent', encoding = "UTF-8")
decoded_torrent <- bencode::bdecode(test_torrent)

but I get an error:

Error in bencode::bdecode(test_torrent) : 
  input string terminated unexpectedly

Also, if I try to parse only part of this file, bdecode('\xe7\xc9\xe0\b\xfbD-\xd8\xd6(\xe2\004>\x9c\xda\005Zar\x8c\xdfV\x88\022t\xe4գi]\xcf'), I get

Error in bdecode("\xe7\xc9\xe0\b\xfbD-\xd8\xd6(\xe2\004>\x9c\xda\005Zar\x8c\xdfV\x88\022t\xe4գi]\xcf") : 
  Wrong encoding '�'. Allowed values are i, l, d or a digit.

Is there perhaps another way to do this in R? Or can I embed code from another language in an Rscript? Thanks in advance!

score 0 · Accepted Answer

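There seem to be a couple of problems here.

First, your code should not treat the torrent file as UTF-8 encoded text. The content described by a torrent is split into pieces of equal size (except the last one ;) ), and the torrent file contains the concatenation of the SHA1 hash of each piece. Those SHA1 hashes are raw bytes and are unlikely to form a valid UTF-8 string.

So you should not read the file into memory with readLines, because that is meant for text files. Instead, you should use a connection: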

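# Open the torrent file as a binary connection instead of reading it as text
test_torrent <- file("/home/user/Downloads/some_file.torrent")
open(test_torrent, "rb")
decoded_torrent <- bencode::bdecode(test_torrent)
close(test_torrent)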
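
To see why text-mode reading goes wrong, here is a minimal base R sketch that does not use the bencode package at all. It writes a tiny made-up bencoded dictionary to a temporary file (a stand-in for a real torrent; the payload and variable names are my own assumptions), reads it back as raw bytes with readBin, and checks the hash-like bytes with validUTF8.

```r
# Write a tiny bencoded dictionary holding four non-UTF-8 bytes to a temp file.
# This is a made-up stand-in for a real .torrent file.
path <- tempfile(fileext = ".torrent")
payload <- c(charToRaw("d6:pieces4:"),
             as.raw(c(0xe7, 0xc9, 0xe0, 0x08)),
             charToRaw("e"))
writeBin(payload, path)

# Read the whole file as raw bytes; no text encoding is assumed.
bytes <- readBin(path, what = "raw", n = file.size(path))

# A torrent file is a bencoded dictionary, so its first byte should be 'd'.
is_dict <- identical(rawToChar(bytes[1]), "d")

# The four hash-like bytes (positions 12 to 15) are not valid UTF-8.
hash_is_utf8 <- validUTF8(rawToChar(bytes[12:15]))
```

Here is_dict comes out TRUE and hash_is_utf8 comes out FALSE, which is the situation readLines with encoding = "UTF-8" cannot handle cleanly.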

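Second, the library itself seems to suffer from a similar problem. Because it uses readChar, it also assumes it is dealing with text. This may be due to changes in recent R versions, given that the library is more than 6 years old. I was able to apply a quick hack and get it working by passing useBytes=TRUE to readChar.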

https://github.com/UkuLoskit/R-bencode/commit/b97091638ee6839befc5d188d47c02567499ce96
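
As a rough illustration of what that flag changes (this is not the code from the commit, only a sketch with a made-up two-byte file), readChar with useBytes = TRUE counts nchars in bytes, so a byte that is not valid text on its own can still be read one at a time from a binary connection:

```r
# Made-up two-byte file: 'd' followed by a byte that is not valid UTF-8 alone.
path <- tempfile()
writeBin(as.raw(c(0x64, 0xe7)), path)

con <- file(path, "rb")
# With useBytes = TRUE, nchars is interpreted as a number of bytes.
first  <- readChar(con, 1L, useBytes = TRUE)
second <- readChar(con, 1L, useBytes = TRUE)
close(con)

# Each read consumed exactly one byte.
first_len  <- nchar(first, type = "bytes")
second_len <- nchar(second, type = "bytes")
```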

You can install my version as follows:

install.packages("devtools")
library(devtools)
devtools::install_github("UkuLoskit/R-bencode")

Caveat lector! I am not an R programmer :).

score 0 · Accepted Answer

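It could be that the torrent file is corrupted in some way.

A bencoded value must start with the character i (for an integer), l (for a list), d (for a dictionary), or a digit (the length prefix of a string).

The example string ('\xe7\xc9...') does not start with any of these characters, so it cannot be decoded.

For more information about the bencode format, see this.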
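
The rule above can be sketched as a small decoder in base R that works on a raw vector. This is only an illustration under my own assumptions, not the R-bencode implementation: bdecode_raw and find_next are hypothetical names, byte strings are returned as raw vectors because they may hold binary data, dictionary keys are assumed to be plain text, and validation of malformed input is minimal.

```r
# Decode one bencoded value from the raw vector `bytes`, starting at `pos`.
# Returns list(value = decoded value, pos = index just past that value).
bdecode_raw <- function(bytes, pos = 1L) {
  eof <- "input terminated unexpectedly"
  if (pos > length(bytes)) stop(eof)
  tag <- bytes[pos]

  # Index of the next occurrence of the one-character `byte` at or after `from`.
  find_next <- function(byte, from) {
    if (from > length(bytes)) stop(eof)
    hit <- which(bytes[from:length(bytes)] == charToRaw(byte))
    if (length(hit) == 0L) stop(eof)
    from + hit[1] - 1L
  }

  if (tag == charToRaw("i")) {
    # Integer: i<digits>e (numeric, since sizes may exceed 32-bit integers).
    end <- find_next("e", pos + 1L)
    if (end == pos + 1L) stop("empty integer")
    digits <- rawToChar(bytes[(pos + 1L):(end - 1L)])
    return(list(value = as.numeric(digits), pos = end + 1L))
  }

  if (tag == charToRaw("l")) {
    # List: l<values>e
    items <- list()
    pos <- pos + 1L
    repeat {
      if (pos > length(bytes)) stop(eof)
      if (bytes[pos] == charToRaw("e")) break
      item <- bdecode_raw(bytes, pos)
      items <- c(items, list(item$value))
      pos <- item$pos
    }
    return(list(value = items, pos = pos + 1L))
  }

  if (tag == charToRaw("d")) {
    # Dictionary: d<key><value>...e, where every key is a byte string.
    keys <- character(0)
    values <- list()
    pos <- pos + 1L
    repeat {
      if (pos > length(bytes)) stop(eof)
      if (bytes[pos] == charToRaw("e")) break
      key <- bdecode_raw(bytes, pos)
      val <- bdecode_raw(bytes, key$pos)
      keys <- c(keys, rawToChar(key$value))
      values <- c(values, list(val$value))
      pos <- val$pos
    }
    names(values) <- keys
    return(list(value = values, pos = pos + 1L))
  }

  if (as.integer(tag) %in% 48:57) {
    # Byte string: <length>:<bytes>; 48:57 are the ASCII codes of '0'..'9'.
    colon <- find_next(":", pos)
    len <- as.integer(rawToChar(bytes[pos:(colon - 1L)]))
    end <- colon + len
    if (end > length(bytes)) stop(eof)
    value <- if (len > 0L) bytes[(colon + 1L):end] else raw(0)
    return(list(value = value, pos = end + 1L))
  }

  stop("wrong type tag; allowed values are i, l, d or a digit")
}

# Usage: a made-up dictionary with an integer and a four-byte binary string.
input <- c(charToRaw("d3:fooi42e6:pieces4:"),
           as.raw(c(0xe7, 0xc9, 0xe0, 0x08)),
           charToRaw("e"))
decoded <- bdecode_raw(input)$value
```

With this sketch, decoded$foo is 42 and decoded$pieces is the four raw bytes, while calling bdecode_raw on bytes that begin with 0xe7 stops with the "wrong type tag" error, mirroring the message in the question.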
