bash - Reading a file byte by byte in Bash

Question

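I need to read the first byte of a file I specify, then the second byte, the third, and so on. How can I do this in Bash? P.S. I need the hexadecimal value of each byte.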

score 35 · Accepted Answer

Complete rewrite: September 2019!

Shorter and simpler than the previous versions! (Faster too, but not by much.)

Yes, bash can read and write binary:

Syntax:

LANG=C IFS= read -r -d '' -n 1 foo

will populate $foo with one binary byte. Unfortunately, since bash strings cannot hold null bytes ($'\0'), the input has to be read one byte at a time.
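As a rough sketch of what that read call does when it is looped over a whole stream: a NUL byte shows up as a successful read that stores nothing, while end of input shows up as a failed read. The function name count_bytes is made up for this illustration, and LC_ALL=C is used instead of LANG=C on the assumption that LC_ALL might already be set in the caller's environment.

```bash
#!/usr/bin/env bash
# count_bytes (hypothetical name): count all bytes and NUL bytes on stdin.
count_bytes() {
    local LC_ALL=C byte total=0 nuls=0
    # -d '' makes NUL the delimiter, -n 1 stops after one byte,
    # IFS= keeps whitespace bytes, -r keeps backslashes.
    while IFS= read -r -d '' -n 1 byte; do
        total=$((total + 1))
        # read succeeded but stored nothing: the byte was NUL
        [ -z "$byte" ] && nuls=$((nuls + 1))
    done
    echo "$total bytes, $nuls NUL"
}

printf 'A\0B\n' | count_bytes    # 4 bytes, 1 NUL
```

The loop ends only when read fails, which is how end of input is told apart from a NUL byte.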

But to get the numeric value of the byte just read, I had missed this in man bash (have a look at the 2016 post at the bottom of this answer):

 printf [-v var] format [arguments]
 ...
     Arguments to non-string format specifiers are treated as C constants,
     except that ..., and if  the leading character is a  single or double
     quote, the value is the ASCII value of the following character.
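A quick illustration of the rule quoted above, using only the printf builtin. The variable name code is arbitrary.

```bash
#!/usr/bin/env bash
# A leading quote makes printf use the character's code as the number.
printf '%d\n' "'A"          # 65
printf '%02x\n' "'A"        # 41
printf -v code '%d' "'z"    # -v stores the result instead of printing it
echo "$code"                # 122
```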

So:

read8() {
    local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    read -r -d '' -n 1 _r8_car
    printf -v "$_r8_var" %d "'$_r8_car"   # quoted so bytes like * or ? are not glob-expanded
}

will populate the submitted variable name (default: $OUTBIN) with the decimal ASCII value of the first byte from STDIN.

read16() {
    local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8 _r16_lb &&
    read8 _r16_hb
    printf -v $_r16_var %d $(( _r16_hb<<8 | _r16_lb ))
}

will populate the submitted variable name (default: $OUTBIN) with the decimal value of the first 16-bit word from STDIN...

Of course, to switch endianness, you have to swap the order of the two reads:

    read8 _r16_hb &&
    read8 _r16_lb
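To make the byte-order point concrete, here is a small self-contained sketch that feeds the same two bytes to a little-endian and a big-endian reader. read8 and read16 restate the functions above (with quoting added and LC_ALL=C in place of LANG=C, both my assumptions); read16be is a hypothetical name for the swapped variant.

```bash
#!/usr/bin/env bash
read8() {
    local _r8_var=${1:-OUTBIN} _r8_car LC_ALL=C IFS=
    read -r -d '' -n 1 _r8_car
    printf -v "$_r8_var" %d "'$_r8_car"
}
read16() {      # little-endian: low byte comes first
    local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8 _r16_lb && read8 _r16_hb
    printf -v "$_r16_var" %d $(( _r16_hb << 8 | _r16_lb ))
}
read16be() {    # big-endian (hypothetical name): only the read order differs
    local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8 _r16_hb && read8 _r16_lb
    printf -v "$_r16_var" %d $(( _r16_hb << 8 | _r16_lb ))
}

# Two input bytes: 0x12 then 0x34 (written as octal escapes).
read16   le < <(printf '\022\064')
read16be be < <(printf '\022\064')
echo "$le $be"    # 13330 4660  (0x3412 and 0x1234)
```

Process substitution is used rather than a pipe so that the variables are set in the current shell and not in a pipeline subshell.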

And so on:

# Usage:
#       read[8|16|32|64] [varname] < binaryStdInput

read8() {  local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    read -r -d '' -n 1 _r8_car
    printf -v "$_r8_var" %d "'$_r8_car" ;}
read16() { local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8  _r16_lb && read8  _r16_hb
    printf -v $_r16_var %d $(( _r16_hb<<8 | _r16_lb )) ;}
read32() { local _r32_var=${1:-OUTBIN} _r32_lw _r32_hw
    read16 _r32_lw && read16 _r32_hw
    printf -v $_r32_var %d $(( _r32_hw<<16| _r32_lw )) ;}
read64() { local _r64_var=${1:-OUTBIN} _r64_ll _r64_hl
    read32 _r64_ll && read32 _r64_hl
    printf -v $_r64_var %d $(( _r64_hl<<32| _r64_ll )) ;}

So you can source this, and then, if your /dev/sda is GPT-partitioned:

read totsize < <(blockdev --getsz /dev/sda)
read64 gptbackup < <(dd if=/dev/sda bs=8 skip=68 count=1 2>/dev/null)
echo $((totsize-gptbackup))
1

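The answer could be 1 (the first GPT header is at sector 1 and one sector is 512 bytes; the GPT backup location is stored at byte 32 of that header. With bs=8, 512 bytes -> 64 blocks and 32 bytes -> 4 blocks, so byte offset 544 -> 68 blocks to skip... See GUID Partition Table on Wikipedia).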
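The skip=68 figure can be re-derived with shell arithmetic. The variable names below are only for illustration; the numbers are the ones given in the explanation above.

```bash
#!/usr/bin/env bash
sector_size=512     # bytes per sector
header_sector=1     # the first GPT header sits in sector 1
field_offset=32     # backup-location field, in bytes from the header start
bs=8                # dd block size: one 64-bit value per block

byte_offset=$(( header_sector * sector_size + field_offset ))
skip=$(( byte_offset / bs ))
echo "$byte_offset $skip"    # 544 68
```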

A quick little write function...

write () { 
    local i=$((${2:-64}/8)) o= v r
    r=$((i-1))
    for ((;i--;)) {
        printf -vv '\%03o' $(( ($1>>8*(0${3+-1}?i:r-i))&255 ))
        o+=$v
    }
    printf "$o"
}

This function defaults to 64 bits, little endian.

Usage: write <integer> [bits:64|32|16|8] [switchto big endian]

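With two parameters, the second one must be one of 8, 16, 32 or 64, the bit length of the generated output.
With any dummy third parameter (even an empty string), the function switches to big endian.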
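To see the two byte orders side by side, here is a more verbose sketch of the same idea. write_int is a hypothetical helper, not the write function above: it takes an explicit le or be third argument instead of a dummy one, and od is used only to display the bytes produced.

```bash
#!/usr/bin/env bash
# write_int VALUE [BITS] [le|be]   (hypothetical helper for illustration)
write_int() {
    local value=$1 bits=${2:-64} order=${3:-le}
    local n=$((bits / 8)) i pos esc out=
    for ((i = 0; i < n; i++)); do
        # little endian emits the lowest byte first, big endian the highest
        if [ "$order" = be ]; then pos=$((n - 1 - i)); else pos=$i; fi
        printf -v esc '\\%03o' $(( (value >> (8 * pos)) & 255 ))
        out+=$esc
    done
    printf "$out"    # the octal escapes become raw bytes here
}

write_int 4660 16    | od -An -tx1    # bytes 34 12
write_int 4660 16 be | od -An -tx1    # bytes 12 34
```

Building the whole escape string first and printing it once keeps NUL bytes out of shell variables, which could not hold them.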

.

read64 foo < <(write -12345);echo $foo
-12345

...

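First post, 2015...

Upgraded to add a bash-specific version (with bashisms)

With recent versions of the printf builtin, you can do a lot without forking ($(...)), which makes your script faster.

First, let's see (using seq and sed) how to parse hd output: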

echo ;sed <(seq -f %02g 0 $(( COLUMNS-1 )) ) -ne '
    /0$/{s/^\(.*\)0$/\o0337\o033[A\1\o03380/;H;};
    /[1-9]$/{s/^.*\(.\)/\1/;H};
    ${x;s/\n//g;p}';hd < <(echo Hello good world!)
0         1         2         3         4         5         6         7
012345678901234567890123456789012345678901234567890123456789012345678901234567
00000000  48 65 6c 6c 6f 20 67 6f  6f 64 20 77 6f 72 6c 64  |Hello good world|
00000010  21 0a                                             |!.|
00000012

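The hexadecimal part begins at column 10 and its last value begins at column 56 (48 characters in all); the values are spaced 3 characters apart, with one extra space at column 34.

So parsing this could be done by: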

while read line ;do
    for x in ${line:10:48};do
        printf -v x \\%o 0x$x
        printf $x
      done
  done < <( ls -l --color | hd )
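The column arithmetic can be checked without hd installed by applying the same slice to a literal line copied from the sample output above. This is only a sketch: the variable names are arbitrary, and the decoded bytes are collected in a string, which works here because the sample contains no NUL byte.

```bash
#!/usr/bin/env bash
line='00000000  48 65 6c 6c 6f 20 67 6f  6f 64 20 77 6f 72 6c 64  |Hello good world|'

decoded=
for x in ${line:10:48}; do           # 48 characters starting at column 10
    printf -v oct '\\%o' "0x$x"      # hex pair -> octal escape, e.g. \110
    printf -v ch "$oct"              # octal escape -> the byte itself
    decoded+=$ch
done
echo "$decoded"                      # Hello good world
```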

Old original post

Edit 2: for hexadecimal, you can use hd

echo Hello world | hd
00000000  48 65 6c 6c 6f 20 77 6f  72 6c 64 0a              |Hello world.|

or od

echo Hello world | od -t x1 -t c
0000000  48  65  6c  6c  6f  20  77  6f  72  6c  64  0a
          H   e   l   l   o       w   o   r   l   d  \n

In short:

while IFS= read -r -n1 car;do [ "$car" ] && echo -n "$car" || echo ; done

Try it:

while IFS= read -rn1 c;do [ "$c" ]&&echo -n "$c"||echo;done < <(ls -l --color)

Explanation:

while IFS= read -rn1 car  # empty the Input Field Separator so every char is read
    do [ "$car" ] &&      # Test if there is ``something''?
        echo -n "$car" || # then echo them
        echo              # Else, there is an end-of-line, so print one
  done

Edit: the question was edited: hexadecimal values are needed!?

od -An -t x1 | while read line;do for char in $line;do echo $char;done ;done

Demo:

od -An -t x1 < <(ls -l --color ) |        # Translate binary to 1 byte hex 
    while read line;do                    # Read line of HEX pairs
        for char in $line;do              # For each pair
            printf "\x$char"              # Print translate HEX to binary
      done
  done
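One caveat with an od pipeline like this: by default od replaces repeated identical output lines with a single *, which the inner loop would then treat as a value. Adding -v makes od print every line. A minimal sketch, with hex_bytes as a made-up function name:

```bash
#!/usr/bin/env bash
# hex_bytes (hypothetical name): print one hex pair per input byte.
hex_bytes() {
    local line byte
    od -An -v -t x1 | while read -r line; do
        for byte in $line; do
            echo "$byte"
        done
    done
}

printf 'Hi\n' | hex_bytes                   # 48, 69 and 0a on separate lines
head -c 40 /dev/zero | hex_bytes | wc -l    # 40 lines, one per byte
```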

Demo 2: we have both hex and binary

od -An -t x1 < <(ls -l --color ) |        # Translate binary to 1 byte hex 
    while read line;do                    # Read line of HEX pairs
        for char in $line;do              # For each pair
            bin="$(printf "\x$char")"     # translate HEX to binary
            dec=$(printf "%d" 0x$char)    # translate to decimal
            [ $dec -lt 32  ] ||           # if character not printable
            ( [ $dec -gt 128 ] &&         # change bin to a single dot.
              [ $dec -lt 160 ] ) && bin="."
            str="$str$bin" 
            echo -n $char \               # Print HEX value and a space
            ((i++))                       # count printed values
            if [ $i -gt 15 ] ;then
                i=0
                echo "  -  $str"
                str=""
              fi
      done
  done

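New post, September 2016:

This could be useful in very specific cases (I have used it to copy GPT partitions manually between two disks, at a low level, without /usr mounted...)

Yes, bash can read binary!

... but only one byte at a time... (because `char(0)' cannot be read correctly, the only way to read it properly is to take end-of-file into account: if no character was read and end of file has not been reached, then the character read is a char(0)).

This is more a proof of concept than a really useful tool: here is a pure bash version of hd (hexdump).

It uses recent bashisms and requires bash v4.3 or higher.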

#!/bin/bash

printf -v ascii \\%o {32..126}
printf -v ascii "$ascii"

printf -v cntrl %-20sE abtnvfr

values=()
todisplay=
address=0
printf -v fmt8 %8s
fmt8=${fmt8// / %02x}

while LANG=C IFS= read -r -d '' -n 1 char ;do
    if [ "$char" ] ;then
        printf -v char "%q" "$char"
        ((${#char}==1)) && todisplay+=$char || todisplay+=.
        case ${#char} in
         1|2 ) char=${ascii%$char*};values+=($((${#char}+32)));;
           7 ) char=${char#*\'\\};values+=($((8#${char%\'})));;
           5 ) char=${char#*\'\\};char=${cntrl%${char%\'}*};
                values+=($((${#char}+7)));;
           * ) echo >&2 ERROR: $char;;
        esac
      else
        values+=(0)
      fi

    if [ ${#values[@]} -gt 15 ] ;then
        printf "%08x $fmt8 $fmt8  |%s|\n" $address ${values[@]} "$todisplay"
        ((address+=16))
        values=() todisplay=
      fi
  done

if [ "$values" ] ;then
        ((${#values[@]}>8))&&fmt="$fmt8 ${fmt8:0:(${#values[@]}%8)*5}"||
            fmt="${fmt8:0:${#values[@]}*5}"
        printf "%08x $fmt%$((
                50-${#values[@]}*3-(${#values[@]}>8?1:0)
            ))s |%s|\n" $address ${values[@]} ''""'' "$todisplay"
fi
printf "%08x (%d chars read.)\n" $((address+${#values[@]})){,}

You can try or use this, but don't try to compare performance!

time hd < <(seq 1 10000|gzip)|wc
   1415   25480  111711
real    0m0.020s
user    0m0.008s
sys     0m0.000s

time ./hex.sh < <(seq 1 10000|gzip)|wc
   1415   25452  111669
real    0m2.636s
user    0m2.496s
sys     0m0.048s

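Same job: about 20 ms for hd versus roughly 2600 ms for my bash script.

... but if you only want to read 4 bytes from a file header, or even a sector address on a hard drive, this can do the job...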
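For that small-header case, a sketch along these lines might be enough. magic_hex is a hypothetical name, the temporary file stands in for a real file, and LC_ALL=C is my assumption for forcing byte-wise reads.

```bash
#!/usr/bin/env bash
# magic_hex FILE [COUNT]: print the first COUNT bytes (default 4) as hex.
magic_hex() {
    local file=$1 count=${2:-4} LC_ALL=C byte hex out= i
    for ((i = 0; i < count; i++)); do
        IFS= read -r -d '' -n 1 byte || break    # stop at end of file
        printf -v hex '%02x' "'$byte"            # an empty $byte (NUL) gives 00
        out+=$hex
    done < "$file"
    echo "$out"
}

tmp=$(mktemp)
printf '\177ELF\002\001' > "$tmp"    # sample header bytes for the demo
magic_hex "$tmp"                     # 7f454c46
magic_hex "$tmp" 2                   # 7f45
rm -f "$tmp"
```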

score 10 · Accepted Answer

Have you tried xxd? It gives a hex dump directly, as you want.

For your case, the command would be:

xxd -c 1 /path/to/input_file | while read offset hex char; do
  #Do something with $hex
done

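Note: derive the character from the hex value rather than taking it from the line that read returns. This is required because read does not capture whitespace properly.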
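The sketch below illustrates that note without needing xxd installed: the input lines are typed by hand in the shape I would expect xxd -c 1 to print (an assumption, since the exact offset width can vary between versions). For the space byte the third field comes back empty, so the value is taken from the hex column instead. hex_to_dec is a made-up name.

```bash
#!/usr/bin/env bash
# hex_to_dec (hypothetical name): print each hex column with its decimal value.
hex_to_dec() {
    local offset hex char
    while read -r offset hex char; do
        # $char is unreliable (empty for a space byte); $hex always has the value
        printf '%s=%d\n' "$hex" "0x$hex"
    done
}

hex_to_dec <<'EOF'
00000000: 48  H
00000001: 20
00000002: 0a  .
EOF
# prints 48=72, 20=32 and 0a=10 on separate lines
```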

score 4 · Accepted Answer

Using read, a single character can be read at a time, as follows:

read -n 1 c
echo $c
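One thing to watch with a bare read -n 1: with the default IFS, a whitespace character is stripped and the variable comes back empty. A small comparison, where naive and careful are made-up function names:

```bash
#!/usr/bin/env bash
naive()   { while read -n 1 ch;         do printf '[%s]' "$ch"; done; echo; }
careful() { while IFS= read -r -n 1 ch; do printf '[%s]' "$ch"; done; echo; }

printf 'a b' | naive      # [a][][b]    the space is lost
printf 'a b' | careful    # [a][ ][b]   the space is kept
```

Setting IFS to the empty string for the read call disables the trimming, and -r stops backslashes from being treated as escapes.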

[Answer]

Try this:

#!/bin/bash
# data file
INPUT=/path/to/input.txt

# while loop
while IFS= read -r -n1 char
do
        # display one character at a time
    echo  "$char"
done < "$INPUT"

Taken from this link

Second method: using awk, loop character by character

awk '{for(i=1;i<=length;i++) print substr($0, i, 1)}' /home/cscape/Desktop/table2.sql

Third way:

$ fold -1 /home/cscape/Desktop/table.sql  | awk '{print $0}'

Edit: to print each character as a HEX number:

Suppose I have a file named file:

$ cat file
123A3445F

I have written an awk script (named x.awk) that reads file character by character and prints each one in HEX:

$ cat x.awk
#!/bin/awk -f

BEGIN    { _ord_init() }

function _ord_init(    low, high, i, t)
{
    low = sprintf("%c", 7) # BEL is ascii 7
    if (low == "\a") {    # regular ascii
        low = 0
        high = 127
    } else if (sprintf("%c", 128 + 7) == "\a") {
        # ascii, mark parity
        low = 128
        high = 255
    } else {        # ebcdic(!)
        low = 0
        high = 255
    }

    for (i = low; i <= high; i++) {
        t = sprintf("%c", i)
        _ord_[t] = i
    }
}
function ord(str,    c)
{
    # only first character is of interest
    c = substr(str, 1, 1)
    return _ord_[c]
}

function chr(c)
{
    # force c to be numeric by adding 0
    return sprintf("%c", c + 0)
}

{ x=$0; printf("%s , %x\n",$0, ord(x) )}

To write this script I used the awk documentation.
Now you can use this awk script for your work as follows:

$ fold -1 /home/cscape/Desktop/file  | awk -f x.awk
1 , 31
2 , 32
3 , 33
A , 41
3 , 33
4 , 34
4 , 34
5 , 35
F , 46

Note: the value of A is 41 in hexadecimal. To print in decimal instead, change %x to %d in the last line of the script x.awk.

Give it a try!!

score 1 · Accepted Answer

Another solution, using head, tail and printf:

for a in $( seq $( cat file.txt | wc -c ) ) ; do cat file.txt | head -c$a | tail -c1 | xargs -0 -I{} printf '%s %0X\n' {} "'{}" ; done

More readable:

#!/bin/bash

function usage() {
    echo "Need file with size > 0"
    exit 1
}

test -s "$1" || usage

for a in $( seq $( cat $1 | wc -c ) )
do
    cat $1 | head -c$a | tail -c1 | \
    xargs -0 -I{} printf '%c %#02x\n' {} "'{}"
done

score 0 · Accepted Answer

Use read with the -n option.

while read -n 1 ch; do
  echo $ch
done < moemoe.txt

answered 2012-12-15T05:42:38.487

score 0 · Accepted Answer

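Although I would rather have expanded Perleone's own post (since it was his basic concept!), my edit was rejected after all, and I was advised to post it as a separate answer. Fair enough, so that is what I am doing.

My improvements on Perleone's original script, in a nutshell:

- seq is total overkill here. A simple while loop, with a used as an (equally simple) counter variable, does the job just fine (and faster, too).
- The max value, $(cat $1 | wc -c), must be assigned to a variable; otherwise it would be recalculated on every iteration and make this alternative script run even slower than the one it is derived from.
- There is no need to waste a function on a simple usage info line. However, you do need to know about the (mandatory) curly braces around the two commands: without the { }, the exit 1 command would be executed in either case, and the script interpreter would never reach the loop. (Last note: ( ) also groups the commands, but not in the same way! Parentheses spawn a subshell, so an exit 1 inside them only leaves that subshell and the script carries on, whereas curly braces execute the commands in the current shell.)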
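The difference between the two groupings can be seen in a small experiment. Both functions below are hypothetical stand-ins for the start of the script; their bodies run in a subshell only so that exit does not end the demonstration itself.

```bash
#!/usr/bin/env bash
check_braces() (
    test -s "$1" || { echo "need file"; exit 1; }
    echo "loop runs"
)
check_parens() (
    test -s "$1" || ( echo "need file"; exit 1 )
    echo "loop runs"    # still reached: exit only left the inner subshell
)

check_braces /nonexistent/file    # prints: need file
check_parens /nonexistent/file    # prints: need file, then: loop runs
```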

#!/bin/bash

test -s "$1" || { echo "Need a file with size greater than 0!"; exit 1; }

a=0
max=$(cat $1 | wc -c)
while [[ $((++a)) -le $max ]]; do   # -le so that the last byte is included too
  cat $1 | head -c$a | tail -c1 | \
  xargs -0 -I{} printf '%c %#02x\n' {} "'{}"
done

score 0 · Accepted Answer

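I have a suggestion to make, but I would like feedback from everyone, and a personal opinion from the user syntaxerror.

I don't know much about bash, but I think it might be better to store the output of "cat $1" in a variable. The problem is that the echo command also adds a bit of overhead, right?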

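# braces, not parentheses: exit 1 inside ( ) would only leave the subshell
test -s "$1" || { echo "Need a file with size greater than 0!"; exit 1; }
a=0
# note: $(...) drops NUL bytes and trailing newlines, so this suits text files only
rfile=$(cat "$1")
max=$(echo "$rfile" | wc -c)
while [[ $((++a)) -lt $max ]]; do
  echo "$rfile" | head -c$a | tail -c1 | \
  xargs -0 -I{} printf '%c %#02x\n' {} "'{}"
done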

In my opinion it should perform better, but I haven't tested it.