Using R programming
I have two sets of data (securityj and securityc) and I want to find the cosine similarity between them.
I used this code, which relies on the lsa library:
library(lsa)  # provides textmatrix() and cosine()
databasfile = tempfile()
dir.create(databasfile)
write( databasej, file=paste(databasfile, "D1", sep="/"))
write( databasec, file=paste(databasfile, "D2", sep="/"))
myMatrix = textmatrix(databasfile)
databaseRes <- lsa::cosine(myMatrix[,1], myMatrix[,2])
securityfile = tempfile()
dir.create(securityfile)
write( securityj, file=paste(securityfile, "D1", sep="/"))
write( securityc, file=paste(securityfile, "D2", sep="/"))
securityMatrix = textmatrix(securityfile)
securityRes <- lsa::cosine(securityMatrix[,1], securityMatrix[,2])
When I run it, I get this error (at textmatrix(securityfile)):
Error in FUN(X[[i]], ...) : [lsa] - could not open file C:\Users\AAA\AppData\Local\Temp\RtmpIDmcl7\file1898438fde2/D1 due to encoding problems of the file.
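It works perfectly for the database files, but I get the error for the security files, even though both data sets come from the same original file. Note that I create each file and then read it back immediately. I tried changing the encoding of the original file and made sure it is UTF-8, but nothing changed.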
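Since the error message points at encoding, one way to narrow it down is to check the vector itself rather than the original file. Below is a minimal sketch in base R; `sample_bigrams` is a hypothetical stand-in for securityj, with one deliberately invalid element, and the Latin-1 source encoding is only an assumption for illustration.

```r
# Hypothetical stand-in for securityj. The second element contains the raw
# byte 0xE9 ("é" in Latin-1), which is not well-formed UTF-8 on its own.
sample_bigrams <- c("risk assessment", "caf\xe9 security", "best practices")

# validUTF8() returns FALSE for elements whose bytes are not valid UTF-8.
bad <- !validUTF8(sample_bigrams)
which(bad)  # positions of the suspicious elements

# Re-encode, ASSUMING the data was originally Latin-1 (an illustration only;
# the real source encoding has to be determined from the data).
cleaned <- iconv(sample_bigrams, from = "latin1", to = "UTF-8")
all(validUTF8(cleaned))
```

If `which(bad)` is empty for the real securityj, the strings themselves are probably not the cause and the problem lies elsewhere.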
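textmatrix is a function in the lsa library. My data are two lists of bigrams extracted from cleaned job advertisements. Both pairs, (databasej, databasec) and (securityj, securityc), come from the same text file; the code works for the first pair but fails with the error for the second. As for the separator sep="/", it is the one required by the function's documentation.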
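For reference, here is a minimal sketch of writing one document with an explicit UTF-8 connection and reading it back, to confirm the file is readable before calling textmatrix. The names `bigrams`, `doc_dir` and `doc_path` are hypothetical; `file.path()` builds the same path as `paste(..., sep="/")`.

```r
# Hypothetical stand-in for one of the bigram vectors.
bigrams <- c("risk assessment", "best practices")

# Create a temporary directory that will hold the documents.
doc_dir <- tempfile()
dir.create(doc_dir)
doc_path <- file.path(doc_dir, "D1")

# Write through a connection that declares its encoding explicitly.
con <- file(doc_path, open = "w", encoding = "UTF-8")
writeLines(enc2utf8(bigrams), con)
close(con)

# Read the file back as whitespace-separated tokens to check it is readable.
read_back <- scan(doc_path, what = "character", quiet = TRUE, encoding = "UTF-8")
read_back
```

If reading the file back this way already fails or returns garbled tokens, the problem is in how the file is written rather than in textmatrix.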
Sample input from securityj:
[333] "risk assessment" "beginning darkmatter"
[335] "best practices" "create dream"
[337] "darkmatter agile" "darkmatter bring"
[339] "darkmatter impossible" "darkmatter place"
[341] "drive lead" "education drive"
[343] "experience education" "forensic analysis"
[345] "freedom create" "knowledge network"
[347] "lead missing" "missing freedom"
[349] "offers personal" "perl python"
[351] "related security" "security risks"
[353] "standard operating" "windows linux"
[355] "security controls" "systems security"
[357] "advice guidance" "application penetration"
[359] "certified information" "forensics malware"
[361] "guidance areas" "networks applications"
[363] "new era" "practice advice"
[365] "provisioning best" "security certified"
[367] "web application" "government oil"
[369] "kill chain" "network based"
[371] "risk assessments" "technical experience"
[373] "audit compliance" "business units"
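For bigram vectors like the sample above, the cosine similarity can also be sketched directly in base R, without temporary files. This is not the lsa approach: it counts each whole bigram as one term, whereas textmatrix works on the words in the files, so the two results need not agree. `bigrams_j` and `bigrams_c` are hypothetical stand-ins for securityj and securityc.

```r
# Hypothetical stand-ins for securityj and securityc.
bigrams_j <- c("risk assessment", "best practices",
               "security controls", "risk assessment")
bigrams_c <- c("best practices", "security controls", "web application")

# Shared vocabulary: every distinct bigram seen in either vector.
vocab <- union(bigrams_j, bigrams_c)

# Count how often each vocabulary entry occurs in each vector.
counts_j <- as.numeric(table(factor(bigrams_j, levels = vocab)))
counts_c <- as.numeric(table(factor(bigrams_c, levels = vocab)))

# Cosine similarity: dot product divided by the product of the vector norms.
cosine_sim <- sum(counts_j * counts_c) /
  (sqrt(sum(counts_j^2)) * sqrt(sum(counts_c^2)))
cosine_sim  # 2 / sqrt(18), about 0.471
```

Here the count vectors are (2, 1, 1, 0) and (0, 1, 1, 1), so the dot product is 2 and the norms are sqrt(6) and sqrt(3).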