Executive Summary
Is it standard behavior that shells skip over NUL bytes when doing command substitution?
For example, executing
printf '\0abc' | { read value && echo "$value"; }
will yield abc. The NUL byte is skipped, even though a hexdump of the printf output clearly shows that it is being written.
My first thought was "word splitting". However, with a real command substitution
value=$(printf '\0abc')
the results are the same, and a plain assignment (=) does not perform word splitting.
Long Story
While searching for the proper answer to this question, I realized that at least three of the shell implementations I am reasonably familiar with (ash, zsh, and bash) ignore a NUL character when reading the output of a command substitution into a variable.
The exact point in the pipeline where this happens seems to differ, but the result is consistently that the NUL byte gets dropped as if it were never there in the first place.
I have checked the source code of some of these implementations, and this does seem to be normal behavior.
ash will skip over '\0' on input, but it is not clear from the code whether this is pure coincidence or intended behavior:
if (lastc != '\0') {
[...]
}
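The effect of a guard like ash's if (lastc != '\0') can be illustrated with a small C copy loop that discards NUL bytes. This is a sketch, not ash's actual input code; copy_skipping_nul is a hypothetical helper introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from ash): copy `len` bytes from `src` to `dst`,
 * dropping every NUL byte, as a guard like `if (lastc != '\0')` would.
 * `dst` must have room for at least `len` bytes.
 * Returns the number of bytes actually written. */
size_t copy_skipping_nul(char *dst, const char *src, size_t len)
{
    size_t out = 0;

    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < len; i++) {
        char c = src[i];
        if (c != '\0')          /* NUL bytes are silently skipped */
            dst[out++] = c;
    }
    return out;
}
```

Given the four bytes \0abc, the loop writes abc and returns 3, so the NUL disappears without any trace.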
The bash source code contains an explicit, albeit #if 0'd out, warning telling us that it skipped a NUL byte during command substitution (at least in the version of the source I looked at):
#if 0
internal_warning ("read_comsub: ignored null byte in input");
#endif
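A reader that drops NUL bytes while collecting a command's output, and keeps enough information to warn about them, might look roughly like the C sketch below. read_comsub_sketch is a hypothetical name, and reading from a stdio stream is a simplification; bash's real read_comsub() differs in detail, so treat this only as an illustration of the idea.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch (not bash's code): read all of `in`, drop NUL bytes,
 * and return the remaining bytes as a NUL-terminated string (caller frees).
 * The number of discarded NULs is stored in *nuls_dropped so that a caller
 * could decide whether to print a warning.
 * Returns NULL if memory allocation fails. */
char *read_comsub_sketch(FILE *in, size_t *nuls_dropped)
{
    size_t cap = 64, len = 0, dropped = 0;
    char *buf = malloc(cap);
    int c;

    assert(in != NULL && nuls_dropped != NULL);
    if (buf == NULL)
        return NULL;
    while ((c = getc(in)) != EOF) {
        if (c == '\0') {            /* skip the NUL byte, but count it */
            dropped++;
            continue;
        }
        if (len + 1 >= cap) {       /* grow, keeping room for the terminator */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';
    *nuls_dropped = dropped;
    return buf;
}
```

Fed the four bytes \0abc, it returns "abc" and reports one dropped NUL. A nonzero count is exactly the condition under which a warning like the disabled one above would fire.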
I'm not so sure about zsh's behaviour. It recognizes '\0' as a meta character (as defined by the internal imeta() function), prepends a special Meta surrogate character, and flips bit 5 of the input character (c ^= 32), i.e. it metafies it, which also turns '\0' into a space ' ':
if (imeta(c)) {
*ptr++ = Meta;
c ^= 32;
cnt++;
}
This seems to get stripped later, because there is no evidence that value from the printf example above contains a meta character. Take this with a large helping of salt, since I'm not too familiar with zsh's internals. Also note the side-effect-free statements.
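To see what the Meta/XOR step does to the data, here is a simplified C sketch of metafying and unmetafying. It is not zsh's code: the marker value 0x83 for Meta and the reduced is_meta() test (only NUL and Meta itself count as meta) are assumptions made for illustration; zsh's real imeta() table also covers its internal token bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value of zsh's Meta marker byte; illustrative only. */
#define META ((unsigned char)0x83)

/* Simplified stand-in for zsh's imeta(): here only NUL and META itself
 * are treated as meta. The real zsh table covers more bytes. */
static int is_meta(unsigned char c)
{
    return c == '\0' || c == META;
}

/* Metafy `len` bytes: each meta byte becomes META followed by (byte ^ 32),
 * mirroring the `*ptr++ = Meta; c ^= 32;` snippet above.
 * `dst` must hold up to 2 * len bytes. Returns the number of bytes written. */
size_t metafy_sketch(unsigned char *dst, const unsigned char *src, size_t len)
{
    size_t out = 0;

    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (is_meta(c)) {
            dst[out++] = META;
            c ^= 32;                /* '\0' becomes ' ' (0x20) */
        }
        dst[out++] = c;
    }
    return out;
}

/* Reverse the transformation: META followed by x decodes to x ^ 32. */
size_t unmetafy_sketch(unsigned char *dst, const unsigned char *src, size_t len)
{
    size_t out = 0;

    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < len; i++) {
        if (src[i] == META && i + 1 < len)
            dst[out++] = (unsigned char)(src[++i] ^ 32);
        else
            dst[out++] = src[i];
    }
    return out;
}
```

Metafying the four bytes \0abc yields Meta, ' ', 'a', 'b', 'c', and unmetafying that restores the original NUL. Because the transformation is reversible, this step on its own does not discard the NUL byte.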
Note that zsh also allows you to include NUL (meta-escaped) in IFS, making it possible to e.g. word-split find -print0 output without xargs -0. Thus printf '\0abc' | read value and value=$(printf '\0abc') should yield different results depending on the value of IFS (read does field splitting).
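Splitting on NUL, which an IFS containing NUL makes possible for find -print0 output, can be sketched in C as follows. The helper split_on_nul is hypothetical; it treats every NUL as the end of a record, matching how find -print0 terminates each path, and makes no attempt to reproduce zsh's full IFS rules.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: split `buf` (length `len`) into NUL-terminated
 * records, as produced by `find -print0`. Pointers to the first `max`
 * records are stored in `fields`; each points into `buf` and is already
 * NUL-terminated. Bytes after the last NUL are ignored.
 * Returns the total number of records, which may exceed `max`. */
size_t split_on_nul(const char *buf, size_t len, const char **fields, size_t max)
{
    size_t count = 0, start = 0;

    assert(buf != NULL && (fields != NULL || max == 0));
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\0') {       /* end of one record */
            if (count < max)
                fields[count] = buf + start;
            count++;
            start = i + 1;
        }
    }
    return count;
}
```

For the input "a.txt\0dir/b c.txt\0" it finds two records, and the space inside the second path survives, which is the point of using NUL as the separator.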