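I want to read data from a text file into an R data frame. The data is pipe-delimited (|) and the values are also wrapped in quotes. I have tried several combinations of read.table arguments, but it imports everything into a single field instead of splitting it. The data looks like this:

"CompetitorDataID"|"CompetitorID"|"ItemID"|"UserID"|"CountryID"|"SegmentID"|"TaskID"|"Price"|"Comment"|"CreateDate"|"GeneralCustomer"|"TenderResult"
"29"|"5"|"187630"|"1375"|"5"|"398"|"4085"|"5.000000"|"test"|"2013-01-10 02:58:23.230000000"|"False"|"1"
"30"|"5"|"1341"|"1294"|"5"|"398"|"4088"|"6.000000"|"test"|"2013-01-10 03:15:26.687000000"|"False"|"1"
"31"|"5"|"1007"|"1375"|"5"|"398"|"4105"|"5.000000"|""|"2013-01-10 05:50:51.150000000"|"False"|"1"

Although this text imports correctly when pasted into R, reading the original text file does not work. I get the following messages: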

Warning messages:
1: In read.table("competitorDataCopy.txt", header = TRUE, sep = "|") :
  line 1 appears to contain embedded nulls
2: In read.table("competitorDataCopy.txt", header = TRUE, sep = "|") :
  line 2 appears to contain embedded nulls
3: In read.table("competitorDataCopy.txt", header = TRUE, sep = "|") :
  line 3 appears to contain embedded nulls
4: In read.table("competitorDataCopy.txt", header = TRUE, sep = "|") :
  line 4 appears to contain embedded nulls
5: In read.table("competitorDataCopy.txt", header = TRUE, sep = "|") :
  line 5 appears to contain embedded nulls
6: In read.table("competitorDataCopy.txt", header = TRUE, sep = "|") :
  line 1 appears to contain embedded nulls
7: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  embedded nul(s) found in input

3 Answers

You can easily import a pipe-delimited .txt file like this:

file_in <- read.table("C:/example.txt", sep = "|")

This works for any character-delimited text file; just change sep to match the delimiter.
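
As a minimal sketch of that point, the snippet below parses the same records once with a pipe and once with a semicolon as the delimiter. It uses the text argument instead of a file so that it runs on its own; the sample values and variable names are made up for illustration.

```r
# Made-up sample data: the same records with two different delimiters.
pipe_text <- 'id|name|price
1|apple|5.0
2|pear|6.5'
semi_text <- 'id;name;price
1;apple;5.0
2;pear;6.5'

# Only the sep argument differs between the two calls.
from_pipe <- read.table(text = pipe_text, sep = "|", header = TRUE)
from_semi <- read.table(text = semi_text, sep = ";", header = TRUE)

print(from_pipe)
# Compare the two results; they should hold the same columns and values.
print(identical(from_pipe, from_semi))
```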

Answered 2018-06-14T13:01:22.593

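I solved this by opening the file in Notepad and changing the encoding from Unicode to ANSI. I am not sure why this makes a difference, but the file now imports cleanly.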
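
A likely explanation is that Notepad's "Unicode" option saves the file as UTF-16LE, which stores each ASCII character as two bytes, one of them zero; that would match the "embedded nulls" warnings. Assuming the file really is UTF-16LE, a sketch that avoids re-saving it is to pass fileEncoding to read.table. The temporary file and its contents below are made up so that the example is self-contained.

```r
# Build a small UTF-16LE file with a byte order mark to stand in for the
# original file (hypothetical contents, not the asker's data).
sample_text <- '"ID"|"Comment"\n"29"|"test"\n"31"|""\n'
body_bytes <- iconv(sample_text, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
tmp <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xff, 0xfe)), body_bytes), tmp)

# Diagnostic: a UTF-16LE byte order mark shows up as the bytes ff fe.
head_bytes <- readBin(tmp, what = "raw", n = 2)
print(head_bytes)

# Tell read.table how the file is encoded instead of converting the file.
df <- read.table(tmp, header = TRUE, sep = "|", fileEncoding = "UTF-16LE",
                 stringsAsFactors = FALSE)
print(df)
unlink(tmp)
```

If the first two bytes are something else, the file probably uses a different encoding and fileEncoding would need to be adjusted accordingly.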

Answered 2014-11-11T12:14:30.090

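Setting sep="|" seems to work for me. The default quote argument of read.table is quote="\"'", so it automatically strips the quotes from the start and end of each value.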

read.table(text = '"CompetitorDataID"|"CompetitorID"|"ItemID"|"UserID"|"CountryID"|"SegmentID"|"TaskID"|"Price"|"Comment"|"CreateDate"|"GeneralCustomer"|"TenderResult"
"29"|"5"|"187630"|"1375"|"5"|"398"|"4085"|"5.000000"|"test"|"2013-01-10 02:58:23.230000000"|"False"|"1"
"30"|"5"|"1341"|"1294"|"5"|"398"|"4088"|"6.000000"|"test"|"2013-01-10 03:15:26.687000000"|"False"|"1"
"31"|"5"|"1007"|"1375"|"5"|"398"|"4105"|"5.000000"|""|"2013-01-10 05:50:51.150000000"|"False"|"1"',
sep = "|", header = TRUE)
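
To illustrate what the quote argument does, the sketch below reads the same two made-up lines once with the default quoting and once with quoting disabled (quote = ""). The sample text and variable names are assumptions for illustration only.

```r
# Made-up sample: one header line and one record, values wrapped in quotes.
sample_text <- '"ItemID"|"Comment"
"187630"|"test"'

# Default quoting: the surrounding quotes are removed, so ItemID can be
# converted to a number.
stripped <- read.table(text = sample_text, sep = "|", header = TRUE,
                       stringsAsFactors = FALSE)

# Quoting disabled: the quote characters stay part of every value and of
# the column names (check.names = FALSE keeps the names unmodified).
kept <- read.table(text = sample_text, sep = "|", header = TRUE, quote = "",
                   check.names = FALSE, stringsAsFactors = FALSE)

print(stripped$Comment)
print(kept[[2]])
```

With quoting disabled, every column is read as character because the quote marks prevent numeric conversion, which is rarely what is wanted for data like this.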
Answered 2014-11-10T21:36:07.367