r - R/SublimeREPL R - Code does not work in Sublime but works in RStudio

Question

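I'm working through the Machine Learning for Hackers tutorial (https://github.com/johnmyleswhite/ML_for_Hackers), and I use Sublime Text as my text editor. To run my code, I use SublimeREPL R.

I'm using this code, taken directly from the book: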

setwd("/path/to/folder")
# Load the text mining package
library(tm)
library(ggplot2)

# Loading all necessary paths
spam.path <- "data/spam/"
spam2.path <- "data/spam_2/"
easyham.path <- "data/easy_ham/"
easyham.path2 <- "data/easy_ham_2/"
hardham.path <- "data/hard_ham/"
hardham2.path <- "data/hard_ham_2/"

# Get the content of each email
get.msg <- function(path) {
    con     <- file(path, open = "rt", encoding = "latin1")
    text    <- readLines(con)
    msg     <- text[seq(which(text == "")[1] + 1, length(text),1)]
    close(con)

    return(paste(msg, collapse = "\n"))
}

# Create a vector where each element is an email
spam.docs   <- dir(spam.path)
spam.docs   <- spam.docs[which(spam.docs != "cmds")]
all.spam    <- sapply(spam.docs, function(p) get.msg(paste(spam.path, p, sep = "")))

# Log the spam
head(all.spam)

This code runs fine in RStudio (the data is available here: https://github.com/johnmyleswhite/ML_for_Hackers/tree/master/03-Classification), but when I run it in Sublime I get the following error message:

> all.spam <- sapply(spam.docs,
+                    function(p) get.msg(file.path(spam.path, p)))
Error in seq.default(which(text == "")[1] + 1, length(text), 1) : 
  'from' cannot be NA, NaN or infinite
In addition: Warning messages:
1: In readLines(con) :
  invalid input found on input connection 'data/spam/00006.5ab5620d3d7c6c0db76234556a16f6c1'
2: In readLines(con) :
  invalid input found on input connection 'data/spam/00009.027bf6e0b0c4ab34db3ce0ea4bf2edab'
3: In readLines(con) :
  invalid input found on input connection 'data/spam/00031.a78bb452b3a7376202b5e62a81530449'
4: In readLines(con) :
  incomplete final line found on 'data/spam/00031.a78bb452b3a7376202b5e62a81530449'
5: In readLines(con) :
  invalid input found on input connection 'data/spam/00035.7ce3307b56dd90453027a6630179282e'
6: In readLines(con) :
  incomplete final line found on 'data/spam/00035.7ce3307b56dd90453027a6630179282e'
>

I get the same result when I take the code from John Myles White's repo.

How can I fix this?

Thanks

score 0 · Accepted Answer

I think the problem is the use of encoding = "latin1". You can remove that argument; I tested it in my environment and it runs fine.
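A minimal sketch of that change, assuming the rest of the question's get.msg stays as it is: the only difference is that file() is opened without the encoding argument, so readLines() does not try to re-encode the bytes. The temporary file in the usage lines is a made-up example, not part of the dataset.

```r
# Same as the question's get.msg, but without encoding = "latin1"
get.msg <- function(path) {
    con  <- file(path, open = "rt")
    text <- readLines(con)
    close(con)
    # The message body starts after the first empty line
    msg  <- text[seq(which(text == "")[1] + 1, length(text), 1)]
    paste(msg, collapse = "\n")
}

# Usage on a small made-up email written to a temporary file
tmp <- tempfile()
writeLines(c("Subject: hello", "From: a@example.com", "",
             "line one", "line two"), tmp)
body <- get.msg(tmp)
print(body)
```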

spam.docs <- paste(spam.path,spam.docs,sep="")

all.spam <- sapply(spam.docs,get.msg)
Warning message:
In readLines(con) :
  incomplete final line found on 'XXXXXXXXXXXXXXXXXX/ML_for_Hackers-master/03-Classification/data/spam/00136.faa39d8e816c70f23b4bb8758d8a74f0'

There are still some warnings, but it produces the results just fine.

Thanks.
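As for the original error, it comes from the indexing step rather than from readLines() itself: if the lines that were read contain no empty line (which seems to be what happens when reading stops at the invalid input), which(text == "")[1] is NA and seq() refuses an NA start. Below is a small sketch of that, plus a guard. The helper name get.body is hypothetical and not from the book.

```r
# Lines with no empty separator, as can happen when reading stops early
text <- c("Subject: hello", "From: a@example.com")

first.blank <- which(text == "")[1]   # NA: no empty line was found
res <- tryCatch(seq(first.blank + 1, length(text), 1),
                error = function(e) "seq failed")

# Hypothetical helper: return the body, or "" if there is no body to extract
get.body <- function(text) {
    first.blank <- which(text == "")[1]
    if (is.na(first.blank) || first.blank == length(text)) return("")
    paste(text[(first.blank + 1):length(text)], collapse = "\n")
}

print(get.body(text))
print(get.body(c("Subject: hi", "", "line one", "line two")))
```

With a guard like this, a file that cannot be split into header and body yields an empty string instead of stopping the whole sapply() call.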
