r - Unable to read a temp file in R: file encoding error

Question

Programming in R

I have two sets of data (securityj and securityc). I want to find the cosine similarity between them.

I used this code, with the lsa library:

library(lsa)  # provides textmatrix() and cosine()

# Database bigrams: write each vector as one "document" in a fresh temp directory
databasfile = tempfile()
dir.create(databasfile)
write(databasej, file = paste(databasfile, "D1", sep = "/"))
write(databasec, file = paste(databasfile, "D2", sep = "/"))
myMatrix = textmatrix(databasfile)  # term-by-document matrix, one column per file

databaseRes <- lsa::cosine(myMatrix[, 1], myMatrix[, 2])

# Security bigrams: same procedure (this is where the error occurs)
securityfile = tempfile()
dir.create(securityfile)

write(securityj, file = paste(securityfile, "D1", sep = "/"))
write(securityc, file = paste(securityfile, "D2", sep = "/"))
securityMatrix = textmatrix(securityfile)

securityRes <- lsa::cosine(securityMatrix[, 1], securityMatrix[, 2])
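For reference, lsa::cosine(x, y) on two numeric vectors computes cos θ = (x · y) / (‖x‖ ‖y‖). The sketch below applies the same formula in memory to two small hypothetical bigram vectors, treating each distinct bigram as one term. `cosine_terms` is a hypothetical helper, not part of lsa; since textmatrix() builds its terms from the individual words in each file, the value will generally differ from what the lsa pipeline above returns.

```r
# Hypothetical helper: cosine similarity between two character vectors,
# treating each distinct entry (e.g. a bigram) as one term.
cosine_terms <- function(a, b) {
  terms <- union(a, b)                                # shared vocabulary
  va <- as.numeric(table(factor(a, levels = terms)))  # term counts for a
  vb <- as.numeric(table(factor(b, levels = terms)))  # term counts for b
  sum(va * vb) / (sqrt(sum(va^2)) * sqrt(sum(vb^2)))
}

sim <- cosine_terms(c("risk assessment", "perl python"),
                    c("risk assessment", "web application"))
sim  # 0.5: one shared term out of two in each vector
```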

When running textmatrix(securityfile), I get this error:

Error in FUN(X[[i]], ...) : [lsa] - could not open file C:\Users\AAA\AppData\Local\Temp\RtmpIDmcl7\file1898438fde2/D1 due to encoding problems of the file.

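It works perfectly for the database files, but I get the error for the security files, even though both sets of data come from the same original file. Note that I create the files and then read them immediately. I tried changing the encoding of the original file and making sure it is UTF-8, but nothing changed.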

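textmatrix is a function in the lsa library. My data are two lists of bigrams taken from cleaned job advertisements; both pairs, (databasej, databasec) and (securityj, securityc), come from the same text file. It works for the first pair but fails for the second. As for the separator sep="/", that is what the function's documentation uses.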

Sample input from securityj:

 [333] "risk assessment"               "beginning darkmatter"         
 [335] "best practices"                "create dream"                 
 [337] "darkmatter agile"              "darkmatter bring"             
 [339] "darkmatter impossible"         "darkmatter place"             
 [341] "drive lead"                    "education drive"              
 [343] "experience education"          "forensic analysis"            
 [345] "freedom create"                "knowledge network"            
 [347] "lead missing"                  "missing freedom"              
 [349] "offers personal"               "perl python"                  
 [351] "related security"              "security risks"               
 [353] "standard operating"            "windows linux"                
 [355] "security controls"             "systems security"             
 [357] "advice guidance"               "application penetration"      
 [359] "certified information"         "forensics malware"            
 [361] "guidance areas"                "networks applications"        
 [363] "new era"                       "practice advice"              
 [365] "provisioning best"             "security certified"           
 [367] "web application"               "government oil"               
 [369] "kill chain"                    "network based"                
 [371] "risk assessments"              "technical experience"         
 [373] "audit compliance"              "business units"
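Since the error blames the file's encoding, one way to narrow it down is to look for entries in securityj or securityc that are not plain ASCII (for example accented letters or typographic quotes picked up from the job ads). A minimal sketch, assuming the vectors are meant to be UTF-8; `find_non_ascii` is a hypothetical helper and the sample vector is made up:

```r
# Hypothetical helper: return the positions of entries that are not plain ASCII.
# iconv() returns NA for any element it cannot convert, which covers both
# non-ASCII characters and byte sequences that are not valid UTF-8.
find_non_ascii <- function(x) {
  which(is.na(iconv(x, from = "UTF-8", to = "ASCII")))
}

sample_bigrams <- c("risk assessment", "caf\u00e9 culture", "perl python")
bad <- find_non_ascii(sample_bigrams)
bad                  # 2
sample_bigrams[bad]  # the offending entry
```

Running `find_non_ascii(securityj)` and `find_non_ascii(databasej)` side by side would show whether only the security data contains such entries.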

score 0 · Accepted Answer

Without a reproducible example, including the source code of what looks like a user-defined function, textmatrix, this problem is hard to assess.

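The only thing that strikes me is that the way you create the files is very odd. You are creating a valid but random directory, and then it looks like you're trying to put two files in that directory using the wrong separator (your file separator is a backslash, and you're adding the files to the directory with a forward slash). Depending on what textmatrix is (what it does with the character vector argument passed to it) and on the structure of databasej and databasec, it may be able to make sense of the files in the database case but not in the security case. But that's a guess without a reproducible example. You could try the platform-independent file separator in the built-in variable .Platform$file.sep, or, if you're only running this locally, match it to your file separator, i.e. \ instead of /. If that works, hooray. If not, try writing a reproducible example and you'll probably get better help.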
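For building paths like these, base R's file.path() joins its arguments with the platform separator. Worth knowing: .Platform$file.sep is "/" on Windows as well, and R on Windows accepts forward slashes, so a path mixing \ and / is not invalid by itself. A minimal sketch of the directory setup using file.path(); the two sample vectors are hypothetical stand-ins for securityj and securityc:

```r
# Build the per-document paths with file.path(), which joins its arguments
# using the platform's separator (.Platform$file.sep).
securityfile <- tempfile()
dir.create(securityfile)

d1 <- file.path(securityfile, "D1")
d2 <- file.path(securityfile, "D2")

# Hypothetical sample data standing in for securityj / securityc
writeLines(c("risk assessment", "forensic analysis"), d1)
writeLines(c("risk assessment", "web application"), d2)

list.files(securityfile)  # "D1" "D2"
```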

score 0 · Accepted Answer


I changed the file encoding to ANSI, and it worked.

answered 2018-03-09T18:41:30.210
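If you'd rather do the equivalent of "save as ANSI" in code instead of re-saving the source file, one option is to re-encode the strings before writing them. A minimal sketch, assuming the vectors are UTF-8 and that lsa reads the files in the session's native encoding (on a typical Western-locale Windows setup that is Windows-1252, which latin1 approximates). `write_latin1` is a hypothetical helper that would replace the `write()` calls above:

```r
# Hypothetical helper: write a character vector to `path` in Latin-1 instead of
# UTF-8, dropping any characters that Latin-1 cannot represent.
write_latin1 <- function(x, path) {
  y <- iconv(x, from = "UTF-8", to = "latin1", sub = "")
  # useBytes = TRUE writes the converted bytes as-is, without re-encoding
  writeLines(y, path, useBytes = TRUE)
}

path <- tempfile()
write_latin1(c("caf\u00e9 culture", "risk assessment"), path)
bytes <- readBin(path, what = "raw", n = file.size(path))
as.raw(0xe9) %in% bytes  # TRUE: "é" stored as a single Latin-1 byte
```

Dropping unrepresentable characters (`sub = ""`) is a design choice for bigram data, where a stray typographic quote carries no meaning; use a visible placeholder instead if you need to spot them.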
