bash - tee >(cat -n) < tmpfile prints all of tmpfile before the duplicate

Question

tmpfile contains the following:

a
b
c
d

The command in the question title produces this output:

[me@localhost somedir]$ tee >(cat -n) < tmpfile    
a
b
c
d
[me@localhost somedir]$      1  a
     2  b
     3  c
     4  d

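Since tee and cat are connected through a named pipe, I expected cat to finish sending its output to the terminal before tee printed the next line. Something like this: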

[me@localhost somedir]$ tee >(cat -n) < tmpfile    
a
1  a
b
2  b
c
3  c
d
4  d
[me@localhost somedir]$

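Can someone explain what is going on here? I considered the possibility of a race condition that tee simply wins, but this also happens with files several KB in size. I feel like there is more to it than that.

Thanks.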

score 2 · Accepted Answer

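If you want the other side to win, you can easily arrange that (assuming we are using the same tee implementation, since the specific ordering is implementation-defined rather than standardized):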

# note that this uses automatic FD allocation support added in bash 4.1
( exec {orig_stdout}>&1; { tee >(cat >&$orig_stdout) | cat -n; } <<<$'a\nb\nc' )
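The {orig_stdout} form needs bash 4.1 or later. For older bash, a minimal sketch of the same command saves the original stdout with an ordinary redirection on a command group instead. File descriptor 3 is an arbitrary choice assumed to be free; it is not something the answer specifies.

```shell
#!/usr/bin/env bash
# Same idea as the {orig_stdout} version, for bash older than 4.1.
# fd 3 is an arbitrary descriptor assumed to be unused; it holds the
# original stdout so the process substitution can write straight to it.
# tee's own stdout goes through the pipe into cat -n.
{ tee >(cat >&3) | cat -n; } 3>&1 <<<$'a\nb\nc'
```

Only the way the saved descriptor is obtained differs; which copy reaches the terminal first still depends on the order in which tee performs its writes.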

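In short: tee (as implemented by GNU coreutils 8.2.2) writes each chunk (not each line; the POSIX specification for tee explicitly forbids line-oriented output buffering) first to its standard output, and then to each of its arguments in turn, from left to right.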

You can see this in the implementation:

/* Move all the names 'up' one in the argv array to make room for
   the entry for standard output.  This writes into argv[argc].  */
for (i = nfiles; i >= 1; i--)
  files[i] = files[i - 1];

...and then builds a descriptors array that maps 1:1 onto the entries in files, writing to each one in turn:

/* Write to all NFILES + 1 descriptors.
   Standard output is the first one.  */
for (i = 0; i <= nfiles; i++)
  if (descriptors[i]
      && fwrite (buffer, bytes_read, 1, descriptors[i]) != 1)
    {
      /* ... error handling omitted from this excerpt ... */
    }
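The same order can be observed from the shell without reading the source. A minimal sketch, assuming GNU coreutils tee and cat (other implementations may order their writes differently): it recreates the question's tmpfile and captures everything both processes write, in the order the bytes arrive.

```shell
#!/usr/bin/env bash
# Assumes GNU coreutils tee and cat. Recreate the question's tmpfile.
printf 'a\nb\nc\nd\n' > tmpfile

# $(...) keeps reading until every writer has closed the pipe: tee itself
# and cat -n inside the process substitution, which inherited the same stdout.
out=$(tee >(cat -n) < tmpfile)
printf '%s\n' "$out"
```

A four-line file fits in a single read, so tee's plain copy (a through d) lands in the pipe before cat -n receives anything, and the numbered copy follows it, just as the question observed.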

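To explain why this produces consistent behavior rather than a race: the POSIX specification for tee requires that it not buffer its output. Ordering therefore has to be preserved between the writes to each descriptor (although, of course, ordering can be lost after that point if anything in any of the pipelines does its own buffering).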
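One way to see the no-buffering requirement in action is to hand tee a single line through a pipe that stays open and check whether the line comes out before tee ever sees end of input. A program that fully buffered its piped stdout would hold the line back. This is a sketch assuming bash 4.0 or later (for coproc); the names TEE, in_fd and out_fd and the 5-second timeout are illustrative choices.

```shell
#!/usr/bin/env bash
# Run tee as a coprocess so its stdin and stdout are pipes we control.
coproc TEE { tee /dev/null; }
tee_pid=$TEE_PID
in_fd=${TEE[1]}    # write end connected to tee's stdin
out_fd=${TEE[0]}   # read end connected to tee's stdout

printf 'a\n' >&"$in_fd"
# Wait at most 5 seconds (an arbitrary limit) for tee to pass the line on,
# while its input is still open.
if IFS= read -r -t 5 line <&"$out_fd"; then
  echo "tee forwarded '$line' while its input was still open"
else
  echo "nothing arrived within 5 seconds"
fi

# Close tee's stdin so it sees EOF and exits, then reap it.
eval "exec $in_fd>&-"
wait "$tee_pid"
```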

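Now, this is not to say that tee copies the complete input to each destination before moving on to the next. Rather, tee works in chunks of BUFSIZ bytes, where BUFSIZ is an OS-specific constant guaranteed to be no less than 256 bytes and frequently around 8K on modern (non-embedded) Linux. A file of a few KB that is smaller than BUFSIZ therefore goes out as a single chunk, which is why you saw no interleaving. With a much larger input you will see interleaving, as you expected... but, for the reasons above, in a consistent order.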
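The interleaving with larger inputs can be checked by measuring where cat -n's first output lands in the combined stream. A sketch under stated assumptions: GNU coreutils tee, cat and seq on Linux with the default 64 KiB pipe buffer; bigfile is a hypothetical scratch file name. Because tee cannot run more than one pipe buffer ahead of cat, cat -n has to start writing long before tee finishes its own copy.

```shell
#!/usr/bin/env bash
# Assumes GNU coreutils on Linux with the default 64 KiB pipe buffer.
# bigfile is a hypothetical scratch file of about 1.3 MB, far larger than
# one tee chunk (BUFSIZ) plus the pipe buffer.
seq 1 200000 > bigfile
input_size=$(wc -c < bigfile)

# $(...) collects tee's copy and cat -n's copy, in the order they were written.
out=$(tee >(cat -n) < bigfile)

# cat -n puts a tab after every line number. If tee had written its entire
# copy first, the first tab would sit at offset input_size + 6.
before_first_tab=${out%%$'\t'*}
first_tab_offset=${#before_first_tab}
echo "input: $input_size bytes; first cat -n tab at offset $first_tab_offset"
```

Under those assumptions the reported offset should come out well below the input size, whereas the question's four-line tmpfile would give exactly its size plus 6. Since both processes write arbitrary chunks rather than whole lines, individual lines in the combined output may also be split.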
