
Executive Summary

Is it standard behavior that shells skip over NUL bytes when doing command substitution?

For example, executing

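printf '\0abc' | { read value; echo "$value"; }

will yield abc. The NUL byte is skipped, even though a hexdump of the printf output clearly shows it being written. (The braces matter in bash, where every element of a pipeline runs in a subshell; with a bare printf '\0abc' | read value && echo $value, value would still be unset when echo runs in the parent shell. Also, read returns a non-zero status here because the input has no trailing newline, so ; is used instead of &&.)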
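A quick way to confirm this in bash is to count bytes before and after read. This is a sketch that assumes the usual od and wc utilities; the braces keep read and the byte count in the same subshell.

```shell
#!/usr/bin/env bash
# printf really writes the NUL: od -c shows it as \0, and wc -c counts 4 bytes.
printf '\0abc' | od -c
raw_bytes=$(printf '\0abc' | wc -c)
raw_bytes=$((raw_bytes))   # strip any padding some wc implementations add

# read sees the same 4 bytes but stores only "abc".
# read returns non-zero here (no trailing newline), so ';' is used, not '&&'.
read_bytes=$(printf '\0abc' | { read -r value; printf '%s' "$value" | wc -c; })
read_bytes=$((read_bytes))

echo "printf wrote $raw_bytes bytes; read kept $read_bytes"
```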

My first thought was "word splitting". However, when using an actual command substitution

value=$(printf '\0abc')

the result is the same, and an assignment does not perform word splitting.
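Setting IFS to the empty string rules out word splitting entirely, yet bash still loses the NUL in a command substitution. A minimal sketch; the { ...; } 2>/dev/null group only hides any warning that newer bash releases may print to stderr.

```shell
#!/usr/bin/env bash
IFS=                                        # disable field splitting altogether
{ value=$(printf '\0abc'); } 2>/dev/null    # hide bash's "ignored null byte" warning, if any

subst_bytes=$(printf '%s' "$value" | wc -c)
subst_bytes=$((subst_bytes))                # normalise wc padding to a plain integer
echo "command substitution kept $subst_bytes bytes: '$value'"
```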

Long Story

While searching for the proper answer to this question, I realized that at least three of the shell implementations I am reasonably familiar with (ash, zsh, and bash) ignore a NUL character when reading the value of a command substitution into a variable.

The exact point in the pipeline where this happens seems to differ, but the result is consistently that the NUL byte gets dropped as if it were never there in the first place.

I have checked the source of some of these implementations, and this does seem to be normal behavior.
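To compare implementations empirically, a loop like the following can help. It is a sketch: it assumes each listed shell accepts -c, silently skips any that are not installed, and prints how many of the 4 bytes survive value=$(printf '\0abc') in each one. No particular result is claimed for shells other than bash.

```shell
#!/usr/bin/env bash
# For every shell found on PATH, run v=$(printf '\0abc') and count how
# many bytes printf '%s' writes back out (the original output was 4 bytes).
declare -A survived=()
for sh in sh dash ash bash ksh mksh zsh; do
  command -v "$sh" >/dev/null 2>&1 || continue     # skip shells that are not installed
  n=$("$sh" -c 'v=$(printf "\0abc"); printf "%s" "$v"' 2>/dev/null | wc -c)
  survived[$sh]=$((n))                             # normalise wc padding to a plain integer
  printf '%-5s kept %d of 4 bytes\n' "$sh" "${survived[$sh]}"
done
```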

ash skips over '\0' on input, but it is not clear from the code whether this is pure coincidence or intended behavior:

if (lastc != '\0') {
    [...]
}

The bash source code contains an explicit, albeit disabled (#if 0), warning saying that it skipped a NUL byte during command substitution:

#if 0
      internal_warning ("read_comsub: ignored null byte in input");
#endif
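For what it's worth, recent bash releases (4.4 and later, as far as I know) do emit a warning of this kind at runtime. A minimal sketch that captures it; the version threshold in the check is an assumption about when the warning was enabled.

```shell
#!/usr/bin/env bash
# Run the command substitution inside another one, redirecting the shell's
# stderr into the captured output so any warning ends up in $warning.
warning=$( { value=$(printf '\0abc'); } 2>&1 )
echo "bash $BASH_VERSION said: ${warning:-<nothing>}"
```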

I'm not so sure about zsh's behaviour. It recognizes '\0' as a meta character (as defined by the internal imeta() function), prepends a special Meta marker character, and toggles bit 5 (value 32) of the input character, essentially metafying it, which also turns '\0' into a space ' '.

if (imeta(c)) {
    *ptr++ = Meta;
    c ^= 32;
    cnt++;
}

This seems to get stripped later, because there is no evidence that value in the above printf command contains a meta character. Take this with a large helping of salt, since I'm not too familiar with zsh's internals. Also note the side-effect-free statements.
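The c ^= 32 step in the zsh excerpt is plain XOR with 0x20, which is why a NUL (0) comes out as the code of a space (32), and why applying the same XOR again restores the original byte. The arithmetic can be checked from bash; this says nothing about where zsh later undoes the escaping.

```shell
#!/usr/bin/env bash
nul=0
escaped=$(( nul ^ 32 ))        # byte stored after the Meta marker for an input NUL
restored=$(( escaped ^ 32 ))   # XOR is its own inverse, so the same step undoes it
space=$(printf '%d' "' ")      # a leading quote makes printf print the character's code

printf '0 ^ 32 = %d, space = %d, %d ^ 32 = %d\n' "$escaped" "$space" "$escaped" "$restored"
```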

Note that zsh also allows you to include NUL (meta-escaped) in IFS, making it possible, e.g., to word-split the output of find -print0 without xargs -0. Thus printf '\0abc' | read value and value=$(printf '\0abc') should yield different results depending on the value of IFS (read does field splitting).


Answer


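All extant POSIX shells use C strings (NUL-terminated) rather than Pascal strings (which carry their length as separate metadata and can therefore contain NULs). Consequently, they cannot hold a NUL inside a string's contents. This is particularly true of the Bourne shell and ksh, both of which were major influences on the POSIX sh standard.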
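The C-string limitation is easy to observe in bash, for instance with ANSI-C quoting: the string is silently cut at the embedded NUL. Only bash is assumed here; other shells may behave differently.

```shell
#!/usr/bin/env bash
s=$'ab\0cd'    # ANSI-C quoting: \0 denotes a NUL byte, which ends the C string
echo "length: ${#s}, value: '$s'"
```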

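The specification explicitly leaves this behavior unspecified; without knowing the specific shell and release being targeted, I wouldn't expect any particular behavior, whether that is terminating the returned stream at the first NUL or discarding NULs altogether. Quoting: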

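The shell shall expand the command substitution by executing command in a subshell environment (see Shell Execution Environment) and replacing the command substitution (the text of command plus the enclosing "$()" or backquotes) with the standard output of the command, removing sequences of one or more <newline> characters at the end of the substitution. Embedded <newline> characters before the end of the output shall not be removed; however, they may be treated as field delimiters and eliminated during field splitting, depending on the value of IFS and quoting that is in effect. If the output contains any null bytes, the behavior is unspecified.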


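This isn't to say that you can't read and generate NUL-containing streams in widely available shells! See the following, which uses process substitution (written for bash, but it should work in ksh or zsh with minor changes, if any):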

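# read NUL-delimited records into an array variable, and any trailing unterminated content into a scalar variable "suffix"
array=( )
while IFS= read -r -d '' line; do
  array+=( "$line" )
done < <(find . -print0)   # replace with any process that generates a NUL-delimited stream
suffix=$line # content after last NUL, if any

# emit recorded content (the guard avoids printing a lone NUL when the array is empty)
(( ${#array[@]} )) && printf '%s\0' "${array[@]}"; printf '%s' "$suffix"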
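One way to check a loop like this is a round trip: feed it a known NUL-delimited stream and compare what it re-emits with the original bytes. In this sketch, produce is a hypothetical stand-in for whatever process generates the NUL-delimited stream, and cmp compares the two byte streams.

```shell
#!/usr/bin/env bash
# Hypothetical producer: three NUL-terminated records (one empty) plus an unterminated tail.
produce() { printf 'first\0with space\0\0last-no-nul'; }

array=( )
while IFS= read -r -d '' line; do
  array+=( "$line" )
done < <(produce)
suffix=$line   # text after the final NUL; read fails at EOF but still fills $line

# Re-emit the records; the guard keeps an empty array from producing a lone NUL.
emit() {
  (( ${#array[@]} )) && printf '%s\0' "${array[@]}"
  printf '%s' "$suffix"
}

if cmp -s <(produce) <(emit); then roundtrip=ok; else roundtrip=mismatch; fi
echo "records: ${#array[@]}, suffix: '$suffix', round trip: $roundtrip"
```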

answered 2015-09-22T16:50:42.333