I'm working through the Machine Learning for Hackers tutorial (https://github.com/johnmyleswhite/ML_for_Hackers), using Sublime Text as my editor. To run my code, I use SublimeREPL R.
I'm using this code, taken straight from the book:
setwd("/path/to/folder")
# Load the text mining package
library(tm)
library(ggplot2)
# Loading all necessary paths
spam.path <- "data/spam/"
spam2.path <- "data/spam_2/"
easyham.path <- "data/easy_ham/"
easyham.path2 <- "data/easy_ham_2/"
hardham.path <- "data/hard_ham/"
hardham2.path <- "data/hard_ham_2/"
# Get the content of each email
get.msg <- function(path) {
  con <- file(path, open = "rt", encoding = "latin1")
  text <- readLines(con)
  # The message body starts after the first empty line (the end of the headers)
  msg <- text[seq(which(text == "")[1] + 1, length(text), 1)]
  close(con)
  return(paste(msg, collapse = "\n"))
}
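get.msg treats everything after the first empty line as the message body, because the lines before it are the email headers. The toy vectors below are made up for illustration. They sketch that split, and they show how an NA can reach seq(): if the lines read from a file contain no empty line at all, `which(text == "")[1]` is NA.

```r
# Made-up email lines: headers, an empty separator line, then the body
toy.lines <- c("From: someone@example.com", "Subject: hello", "",
               "Body line 1", "Body line 2")

# Index of the first empty line, i.e. the header/body separator
sep.idx <- which(toy.lines == "")[1]
body <- paste(toy.lines[seq(sep.idx + 1, length(toy.lines), 1)], collapse = "\n")
cat(body, "\n")

# If reading stopped inside the headers, there is no empty line,
# so which(...) is empty and indexing it with [1] gives NA
truncated <- c("From: someone@example.com", "Subject: hello")
print(which(truncated == "")[1])

# seq() then refuses the NA start; the exact message depends on the R version
res <- tryCatch(seq(which(truncated == "")[1] + 1, length(truncated), 1),
                error = function(e) conditionMessage(e))
print(res)
```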
# Create a vector where each element is an email
spam.docs <- dir(spam.path)
spam.docs <- spam.docs[which(spam.docs != "cmds")]
all.spam <- sapply(spam.docs, function(p) get.msg(paste(spam.path, p, sep = "")))
# Print the first few spam messages
head(all.spam)
This code runs fine in RStudio (the data is available here: https://github.com/johnmyleswhite/ML_for_Hackers/tree/master/03-Classification), but when I run it in Sublime, I get the following error:
> all.spam <- sapply(spam.docs,
+ function(p) get.msg(file.path(spam.path, p)))
Error in seq.default(which(text == "")[1] + 1, length(text), 1) :
'from' cannot be NA, NaN or infinite
In addition: Warning messages:
1: In readLines(con) :
invalid input found on input connection 'data/spam/00006.5ab5620d3d7c6c0db76234556a16f6c1'
2: In readLines(con) :
invalid input found on input connection 'data/spam/00009.027bf6e0b0c4ab34db3ce0ea4bf2edab'
3: In readLines(con) :
invalid input found on input connection 'data/spam/00031.a78bb452b3a7376202b5e62a81530449'
4: In readLines(con) :
incomplete final line found on 'data/spam/00031.a78bb452b3a7376202b5e62a81530449'
5: In readLines(con) :
invalid input found on input connection 'data/spam/00035.7ce3307b56dd90453027a6630179282e'
6: In readLines(con) :
incomplete final line found on 'data/spam/00035.7ce3307b56dd90453027a6630179282e'
>
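The "invalid input found on input connection" warnings typically mean the connection could not re-encode some bytes from latin1 into the session's native encoding, and readLines appears to stop there. In that case the header/body separator is never read, which would explain the NA reaching seq(). The sketch below is unverified and assumes that SublimeREPL starts R under a different locale than RStudio (comparing `Sys.getlocale("LC_CTYPE")` in both is a quick way to check). It skips re-encoding on the connection, only marks the strings as latin1, and returns an empty body instead of failing when no separator is found. `get.msg.safe` is a hypothetical name, not part of the book's code.

```r
# Hypothetical variant of get.msg (not from the book).
get.msg.safe <- function(path) {
  # readLines' encoding argument only marks strings as latin1; it does not
  # convert them, so no re-encoding step can fail on unusual bytes
  text <- readLines(path, encoding = "latin1", warn = FALSE)
  # Position of the first empty line, which separates headers from body
  sep.idx <- which(text == "")[1]
  # No separator, or the separator is the last line: there is no body
  if (is.na(sep.idx) || sep.idx >= length(text)) {
    return("")
  }
  paste(text[(sep.idx + 1):length(text)], collapse = "\n")
}

# Possible usage with the variables from the script above:
# all.spam <- sapply(spam.docs,
#                    function(p) get.msg.safe(paste(spam.path, p, sep = "")))
```

Returning "" keeps sapply() going when a file is malformed, at the cost of silently producing empty documents, so it may be worth counting how many come back empty.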
I get the same result when I use the code from John Myles White's repo.
How can I fix this?
Thanks