I have a Unicode CSV file:

LabelName,Label1,Label2,SpeciesLabel,Group,Subgroup,Species
التسمية 1,Group 1,Subgroup 1,Species 1,1,1,1
التسمية 2,Group 1,Subgroup 1,Species 1,1,1,1
التسمية 3,Group 1,Subgroup 1,Species 1,1,1,1

I want to read it into R, and I used this command:

Data = read.csv("Data.csv", encoding="UTF-8", fileEncoding = "UTF-8")

But I got this error:

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  empty beginning of file
In addition: Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  invalid input found on input connection 'Data.csv'
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 1 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'Data.csv'

How can I read a Unicode CSV file (containing Arabic letters) in R?

Thanks!

1 Answer

You can read the file with readLines, passing warn = FALSE, and then hand the result to read.csv through its text argument:

arabic <- readLines("arabic.csv", warn = FALSE, encoding = "UTF-8")
Data = read.csv(text = arabic)
str(Data)

Output:

'data.frame':   3 obs. of  7 variables:
 $ X.U.FEFF.LabelName: Factor w/ 3 levels "التسمية 1","التسمية 2",..: 1 2 3
 $ Label1            : Factor w/ 1 level "Group 1": 1 1 1
 $ Label2            : Factor w/ 1 level "Subgroup 1": 1 1 1
 $ SpeciesLabel      : Factor w/ 1 level "Species 1": 1 1 1
 $ Group             : int  1 1 1
 $ Subgroup          : int  1 1 1
 $ Species           : int  1 1 1
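Note that the first column comes back as X.U.FEFF.LabelName, which suggests the file starts with a byte order mark (U+FEFF) that readLines keeps. A minimal sketch of one way to drop it before parsing is below. It assumes the file is UTF-8 with a BOM; the temporary file and the names path, body, lines and first_name are made up for illustration, and the Arabic label is written with \u escapes only so the snippet does not depend on the script's own encoding.

```r
# Build a small UTF-8 CSV that starts with a byte order mark (EF BB BF),
# standing in for the asker's file. The path is a temporary, hypothetical one.
path <- tempfile(fileext = ".csv")
body <- "LabelName,Label1,Group\n\u0627\u0644\u062a\u0633\u0645\u064a\u0629 1,Group 1,1\n"
con <- file(path, open = "wb")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw(body)), con)
close(con)

# Read the raw lines as in the answer, then drop a leading U+FEFF if present.
lines <- readLines(path, warn = FALSE, encoding = "UTF-8")
lines[1] <- sub("^\ufeff", "", lines[1])

# Parse the cleaned text; the first column name no longer carries the BOM.
Data <- read.csv(text = lines, stringsAsFactors = FALSE)
first_name <- names(Data)[1]
```

Alternatively, R's connections accept "UTF-8-BOM" as an encoding name for reading, so read.csv("Data.csv", fileEncoding = "UTF-8-BOM") may be enough on its own, though whether the Arabic text survives that route can depend on the session's locale.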
answered 2018-09-29T19:02:21.107