r - R skips lines from /dev/stdin

Question

I have a file containing a list of numbers (make it yourself: for x in $(seq 10000); do echo $x; done > file).

$> R -q -e "x <- read.csv('file', header=F); summary(x);"

> x <- read.csv('file', header=F); summary(x);
       V1       
 Min.   :    1  
 1st Qu.: 2501  
 Median : 5000  
 Mean   : 5000  
 3rd Qu.: 7500  
 Max.   :10000

Now, one might expect that cat-ing the file and reading from /dev/stdin would give the same output, but it does not:

$> cat file | R -q -e "x <- read.csv('/dev/stdin', header=F); summary(x);"
> x <- read.csv('/dev/stdin', header=F); summary(x);
       V1       
 Min.   :    1  
 1st Qu.: 3281  
 Median : 5520  
 Mean   : 5520  
 3rd Qu.: 7760  
 Max.   :10000

Using table(x) shows that a bunch of lines were skipped:

    1  1042  1043  1044  1045  1046  1047  1048  1049  1050  1051  1052  1053 
    1     1     1     1     1     1     1     1     1     1     1     1     1 
 1054  1055  1056  1057  1058  1059  1060  1061  1062  1063  1064  1065  1066 
    1     1     1     1     1     1     1     1     1     1     1     1     1
 ...

It looks like R is doing something funny with stdin, because this Python one-liner correctly prints every line in the file:

cat file | python -c 'with open("/dev/stdin") as f: print(f.read())'

This question seems related, but it is more about skipped lines in a malformed CSV file, whereas my input is just a list of numbers.

score 3 · Accepted Answer

head --bytes=4K file | tail -n 3

produces this:

1039
1040
104

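This suggests that R creates a 4KB input buffer on /dev/stdin and fills it during initialization. When your R code then reads /dev/stdin, it starts at that point in the file, right after the first 4096 bytes.

Indeed, if you replace the line 1041 in the file with 1043, you get a "3" instead of a "1" in table(x):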

3  1042  1043  1044  1045  1046  1047  1048  1049  1050  1051  1052  1053 
1     1     1     1     1     1     1     1     1     1     1     1     1 
...

The first 1 in table(x) is actually the last digit of 1041. The first 4KB of the file have been eaten.
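
The byte arithmetic behind this can be checked without R. The following is a minimal sketch, assuming the buffer is exactly 4096 bytes (the same amount that head --bytes=4K prints); the names BUFFER_SIZE, consumed and remaining are only illustrative and say nothing about R's internals.

```python
# Rebuild the contents of `file` in memory: the numbers 1..10000, one per line.
data = "".join(f"{n}\n" for n in range(1, 10001)).encode("ascii")

BUFFER_SIZE = 4096  # assumption: matches `head --bytes=4K`

consumed = data[:BUFFER_SIZE]   # bytes the assumed buffer would swallow
remaining = data[BUFFER_SIZE:]  # bytes a later read of /dev/stdin would see

# The partial line at the end of the buffer, and the first lines after it.
tail_of_buffer = consumed.decode().split("\n")[-1]
first_lines_left = remaining.decode().split("\n")[:3]

# Summary statistics of what is left, for comparison with summary(x).
values = [int(s) for s in remaining.decode().split()]
mean_left = sum(values) / len(values)

print(tail_of_buffer)      # 104
print(first_lines_left)    # ['1', '1042', '1043']
print(len(values), min(values), max(values), round(mean_left, 1))
```

The buffer ends in the middle of the line 1041, so the data left over starts with a lone "1" followed by 1042, 1043, and so on. The mean of the leftover values comes out at roughly 5520.4, which is consistent with the Mean of 5520 that summary(x) reported when reading from /dev/stdin.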
