r - Reading a file that contains UTF-8 characters

Question

I have a CSV file containing Chinese characters, saved as UTF-8.

电视项目价格 5000

The first row is the header and the second row is the data. In other words, it is a one-by-two vector.

I read the file as follows:

amatrix<-read.table("test.csv",encoding="UTF-8",sep=",",header=T,row.names=NULL,stringsAsFactors=FALSE)

However, the header in the output includes an unknown marker, namely XUFEFF

score 1 · Accepted Answer

That is the byte order mark sometimes found in Unicode text files. I'm guessing you're on Windows, since that's the only popular OS where files can end up with them.
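To confirm that a byte order mark really is what is in the file, one option is to look at the first three bytes directly; a UTF-8 BOM is the byte sequence EF BB BF. This is only a minimal sketch: `has_utf8_bom` is a hypothetical helper name, and the demo file and its `item,price` contents are made up for illustration.

```r
# Hypothetical helper: TRUE if the file begins with the UTF-8 BOM bytes EF BB BF.
has_utf8_bom <- function(path) {
  first_bytes <- readBin(path, what = "raw", n = 3L)
  identical(first_bytes, as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Self-contained demo: write a small temporary CSV that starts with a BOM.
demo_path <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("item,price\nTV,5000\n")), demo_path)

has_utf8_bom(demo_path)
```

On the real file you would call `has_utf8_bom("test.csv")` instead of using the temporary file.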

What you can do is read the file using readLines and remove the first two characters of the first line.

txt <- readLines("test.csv", encoding="UTF-8")
txt[1] <- substr(txt[1], 3, nchar(txt[1]))
amatrix <- read.csv(text=txt, ...)
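As a variation on the above, here is a sketch of two ways to drop the mark without counting characters by hand. It assumes R 3.0.0 or later, where the connection encoding name "UTF-8-BOM" is accepted for reading. The temporary file and its ASCII `item,price` contents are stand-ins for the real test.csv, not the asker's data.

```r
# Build a small UTF-8 CSV with a leading BOM so the sketch is self-contained.
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)          # the BOM bytes
writeBin(charToRaw("item,price\nTV,5000\n"), con)   # made-up sample rows
close(con)

# Option 1: let the connection discard the BOM while decoding the file.
with_bom_encoding <- read.csv(path, fileEncoding = "UTF-8-BOM",
                              stringsAsFactors = FALSE)

# Option 2: read the lines, then strip a leading U+FEFF only if one is present.
txt <- readLines(path, encoding = "UTF-8")
txt[1] <- sub("^\ufeff", "", txt[1])
stripped_manually <- read.csv(text = txt, stringsAsFactors = FALSE)

names(with_bom_encoding)
names(stripped_manually)
```

Option 2 is a no-op when the line has no BOM, so it should be safe to run on files that were saved without one.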
