
I'm trying to open a UTF-8 encoded .csv file that contains (traditional) Chinese characters in R. For some reason, R sometimes displays the data as Chinese characters and sometimes as Unicode codes.

For instance:

data <- read.csv("mydata.csv", encoding="UTF-8")

data

displays Unicode codes, while:

data <- read.csv("mydata.csv", encoding="UTF-8")

data[,1]

will actually display Chinese characters.

If I convert it to a matrix, it also displays Chinese characters, but if I look at the data with View(data) or fix(data), it shows Unicode codes again.

I've asked people who use a Mac for advice (I'm on a PC running Windows 7): some of them got Chinese characters throughout, others didn't. I tried saving the original data as a table instead and reading it into R that way, with the same result. I tried running the script in RStudio, Revolution R, and RGui. I tried changing the locale (e.g. to Chinese), but either R didn't let me change it or the result was gibberish instead of Unicode codes.

My current locale is:

"LC_COLLATE=French_Switzerland.1252;LC_CTYPE=French_Switzerland.1252;LC_MONETARY=French_Switzerland.1252;LC_NUMERIC=C;LC_TIME=French_Switzerland.1252"
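To see which encoding R treats as "native" in a session (which is what the locale string above determines), one can query it from R itself. This is a minimal diagnostic sketch using base R's Sys.getlocale(), l10n_info() and Encoding(); the variable names are arbitrary.

```r
# Query the character-type locale; on the asker's machine this would report
# French_Switzerland.1252, i.e. a single-byte Windows code page.
ctype <- Sys.getlocale("LC_CTYPE")
cat("LC_CTYPE:", ctype, "\n")

# l10n_info() reports whether the session uses a multibyte and/or UTF-8 locale
# (on Windows it also reports the active code page).
info <- l10n_info()
cat("Multibyte locale:", info[["MBCS"]], "\n")
cat("UTF-8 locale:", info[["UTF-8"]], "\n")

# A string written with \u escapes holds Chinese characters regardless of locale.
# Encoding() shows how R has marked it; whether print() shows the characters or
# <U+....> codes can depend on whether the native encoding can represent them.
s <- "\u4e2d\u83ef\u6c11\u65cf"
print(Encoding(s))
print(s)
```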

Any help getting R to display Chinese characters consistently would be greatly appreciated.


2 Answers


It's not a bug; it has more to do with character vs. factor columns when the data.frame is built.

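You can start with data <- read.csv("mydata.csv", encoding="UTF-8", stringsAsFactors=FALSE) so that your Chinese characters are of type character from the beginning; then, when you print them, you should see what you expect.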
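A self-contained sketch of that suggestion, using only base R: it writes a tiny UTF-8 CSV to a temporary file (the column names `name` and `value` are made up for the example) and reads it back with stringsAsFactors = FALSE. Since R 4.0.0 the default for stringsAsFactors is already FALSE, so the argument mainly matters on older versions like the ones in this thread.

```r
# Write the CSV as raw UTF-8 bytes so the example does not depend on the locale.
lines <- c("name,value", "\u4e2d\u83ef\u6c11\u65cf,1")
txt <- paste0(paste(lines, collapse = "\n"), "\n")
path <- tempfile(fileext = ".csv")
writeBin(charToRaw(enc2utf8(txt)), path)

# encoding = "UTF-8" marks the strings read in as UTF-8 (it does not re-encode);
# stringsAsFactors = FALSE keeps the column as character instead of factor.
data <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)

str(data)
print(data$name)
```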

@nograpes: Likewise, with x=c('中華民族'); x; y <- data.frame(x, stringsAsFactors=FALSE), everything should work fine.
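The character-vs-factor point can be checked directly. This sketch (an illustration, not the answerer's code) builds the same one-element data frame both ways and, for a column that has already become a factor, converts it back with as.character().

```r
x <- c("\u4e2d\u83ef\u6c11\u65cf")

# Same data, two column types
y_chr <- data.frame(x, stringsAsFactors = FALSE)
y_fac <- data.frame(x, stringsAsFactors = TRUE)

class(y_chr$x)  # "character"
class(y_fac$x)  # "factor"

# A factor stores integer codes plus a levels attribute; as.character()
# turns it back into an ordinary character column.
y_fac$x <- as.character(y_fac$x)
class(y_fac$x)  # "character"
```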

answered 2012-10-24T19:25:15.523

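In my case, UTF-8 encoding does not work in my R, but the GB* encodings do. UTF-8 works on Ubuntu. First, you need to find out the default encoding of your operating system and encode the file to match it. Excel cannot encode the file correctly as UTF-8, even though it claims to save it as UTF-8.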
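Along the lines of this answer, a file saved in a GB encoding can be read by passing its encoding to read.csv() through fileEncoding, which re-encodes the text as it is read. The sketch below assumes GB18030 (a superset of GBK and GB2312) and that the platform's iconv supports that name; it generates its own test file so it is self-contained, and the column names are made up.

```r
# Create a small GB18030-encoded CSV (iconv with toRaw = TRUE returns raw bytes).
lines <- c("name,value", "\u4e2d\u83ef\u6c11\u65cf,1")
txt <- paste0(paste(lines, collapse = "\n"), "\n")
gb_bytes <- iconv(txt, from = "UTF-8", to = "GB18030", toRaw = TRUE)[[1]]
stopifnot(!is.null(gb_bytes))  # NULL means the conversion failed
path <- tempfile(fileext = ".csv")
writeBin(gb_bytes, path)

# fileEncoding tells R how the file is encoded; the text is converted
# to the session's native encoding while it is read.
data <- read.csv(path, fileEncoding = "GB18030", stringsAsFactors = FALSE)
print(data$name)
```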

(1) Download the “Open Sheet” software.

(2) Open the file correctly: scroll through the encoding options until the Chinese characters display properly in the preview window.

(3) Save it as UTF-8 (if you want UTF-8). UTF-8 does not solve every problem; you still need to know your system's default encoding.
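The same re-encoding step can also be done from R instead of a spreadsheet program. This is a sketch, not the answerer's method: convert_to_utf8() is a hypothetical helper that reads the file's raw bytes, converts them with base R's iconv(), and writes UTF-8 bytes back out. The source encoding must still be known; GB18030 is assumed here.

```r
# Hypothetical helper: re-encode a text file to UTF-8 at the byte level,
# so the result does not depend on the R session's locale.
convert_to_utf8 <- function(in_path, out_path, from = "GB18030") {
  raw_in <- readBin(in_path, what = "raw", n = file.info(in_path)$size)
  # iconv() accepts a list of raw vectors and, with toRaw = TRUE, returns one.
  raw_out <- iconv(list(raw_in), from = from, to = "UTF-8", toRaw = TRUE)[[1]]
  if (is.null(raw_out)) stop("input is not valid ", from)
  writeBin(raw_out, out_path)
  invisible(out_path)
}

# Usage: build a GB18030 file, convert it, then read it as UTF-8.
txt <- paste0("name,value\n", "\u4e2d\u83ef\u6c11\u65cf,1\n")
gb_path <- tempfile(fileext = ".csv")
writeBin(iconv(txt, from = "UTF-8", to = "GB18030", toRaw = TRUE)[[1]], gb_path)

utf8_path <- convert_to_utf8(gb_path, tempfile(fileext = ".csv"))
data <- read.csv(utf8_path, encoding = "UTF-8", stringsAsFactors = FALSE)
print(data$name)
```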

answered 2016-07-26T23:51:00.583