linux - Delete specific lines from a file without using sed or awk

Question

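I need to remove a specific line, identified by its line number, from a file using a bash script.

I get the line number from the grep command with the -n option.

For a variety of reasons I cannot use sed, not the least of which is that it is not installed on all the systems this script needs to run on, and installing it is not an option.

awk is out of the question because, in testing on different machines with different UNIX/Linux operating systems (RHEL, SunOS, Solaris, Ubuntu, etc.), it gives (sometimes wildly) different results on each. So, no awk.

The file in question is just a flat text file with one record per line, so nothing fancy needs to be done except removing the line by number.

If at all possible, I need to avoid doing something like extracting the contents of the file, leaving out the line I want gone, and then overwriting the original file.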

score 7 · Accepted Answer

Since you have grep, the obvious thing to do is:

$ grep -v "line to remove" file.txt > /tmp/tmp
$ mv /tmp/tmp file.txt
$
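If a temporary file is acceptable after all, the two commands above can be wrapped a little more defensively. The following is only a sketch under assumptions: `remove_matching` is a hypothetical helper name, and it relies on `mktemp`, which is widely available but not part of POSIX.

```shell
# remove_matching PATTERN FILE   (hypothetical helper, not from the answer)
# Rewrites FILE without the lines that match PATTERN.
remove_matching() {
    pattern=$1
    file=$2
    # create the temp file next to the target so mv stays on one file system
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    grep -v -- "$pattern" "$file" > "$tmp"
    status=$?
    # grep exits 1 when it selects no lines; only a status above 1 is an error
    if [ "$status" -gt 1 ]; then
        rm -f "$tmp"
        return "$status"
    fi
    mv "$tmp" "$file"
}
```

Usage would look like `remove_matching "line to remove" file.txt`. Note that `mktemp` creates the file with restrictive permissions, so the replaced file may end up with a different mode than the original.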

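But it sounds like you don't want to use any temporary files. I assume the input file is large and this is an embedded system where memory and storage are in short supply. Ideally you need a solution that edits the file in place. I think this may be possible with dd, but I haven't figured it out yet :(

Update: I figured out how to edit the file in place with dd. grep, head and cut are also required. If these are unavailable, they can probably be worked around for the most part: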

#!/bin/bash

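# get the line number to remove
rline=$(grep -n "$1" "$2" | head -n1 | cut -d: -f1)
# nothing matched: leave the file untouched
[ -n "$rline" ] || exit 1
# number of bytes before the line to be removed
hbytes=$(head -n$((rline-1)) "$2" | wc -c)
# number of bytes to remove (first match only, to agree with rline)
rbytes=$(grep "$1" "$2" | head -n1 | wc -c)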
# original file size
fsize=$(cat "$2" | wc -c)
# dd will start reading the file after the line to be removed
ddskip=$((hbytes + rbytes))
# dd will start writing at the beginning of the line to be removed
ddseek=$hbytes
# dd will move this many bytes
ddcount=$((fsize - hbytes - rbytes))
# the expected new file size
newsize=$((fsize - rbytes))
# move the bytes with dd.  strace confirms the file is edited in place
dd bs=1 if="$2" skip=$ddskip seek=$ddseek conv=notrunc count=$ddcount of="$2"
# truncate the remainder bytes of the end of the file
dd bs=1 if="$2" skip=$newsize seek=$newsize count=0 of="$2"

Run it like this:

$ cat > file.txt
line 1
line two
line 3
$ ./grepremove "tw" file.txt
7+0 records in
7+0 records out
0+0 records in
0+0 records out
$ cat file.txt
line 1
line 3
$
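The 7 in the dd output is the number of bytes that follow the removed line. As a rough sketch, the same arithmetic can be reproduced without touching dd at all. The variable names mirror the script; the temporary file created with `mktemp` is an assumption made for illustration.

```shell
# recreate the three-line sample in a scratch file
f=$(mktemp)
printf 'line 1\nline two\nline 3\n' > "$f"

rline=$(grep -n "tw" "$f" | head -n1 | cut -d: -f1)   # line number of the match
hbytes=$(head -n$((rline-1)) "$f" | wc -c)            # bytes before that line
rbytes=$(grep "tw" "$f" | head -n1 | wc -c)           # bytes in the line itself
fsize=$(wc -c < "$f")                                 # original file size

ddcount=$((fsize - hbytes - rbytes))   # bytes dd has to move forward
newsize=$((fsize - rbytes))            # file size after truncation
echo "move $ddcount bytes, truncate to $newsize bytes"
rm -f "$f"
```

For this sample, "line 1" and "line 3" take 7 bytes each including the newline and "line two" takes 9, so 7 bytes are moved and the file shrinks from 23 to 14 bytes.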

Suffice it to say that dd is a very dangerous tool. You can easily overwrite files or entire disks by accident. Be very careful!

score 4 · Accepted Answer

Try ed. The here-document-based example below deletes line 2 from test.txt:

ed -s test.txt <<!
2d
w
!

answered 2013-10-02T02:08:40.883

score 2 · Accepted Answer

You can do it without grep, using POSIX shell built-ins, which should be available on any *nix.

# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r LINE || [ -n "$LINE" ]; do
  case "$LINE" in
    *thing_you_are_grepping_for*) continue ;;
    *) printf '%s\n' "$LINE" ;;
  esac
done <infile >outfile
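Since the question asks about removing a line by number rather than by content, the same loop can be adapted with a counter. This is a minimal sketch, not part of the original answer; `delete_line_n` is a hypothetical helper name, and `printf` is assumed to be built in, as it is in most shells.

```shell
# delete_line_n N: copy stdin to stdout, skipping line number N
delete_line_n() {
    n=$1
    i=0
    # the "|| [ -n ... ]" part keeps a final line that lacks a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        i=$((i + 1))
        [ "$i" -eq "$n" ] && continue
        printf '%s\n' "$line"
    done
}
```

It would be called as `delete_line_n 2 <infile >outfile`. Like the loop above, it still writes to a second file.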

score 2 · Accepted Answer

If n is the line you want to omit:

{
  head -n $(( n-1 )) file
  tail -n +$(( n+1 )) file
} > newfile
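Combined with the `grep -n` lookup mentioned in the question, this might look like the sketch below. `remove_first_match` is a hypothetical helper name, and the `tail -n +N` form is assumed to be supported on the target systems.

```shell
# remove_first_match PATTERN FILE NEWFILE   (hypothetical helper)
# Copies FILE to NEWFILE without the first line that matches PATTERN.
remove_first_match() {
    # line number of the first match; empty when nothing matches
    n=$(grep -n -- "$1" "$2" | head -n 1 | cut -d: -f1)
    if [ -z "$n" ]; then
        cat "$2" > "$3"      # nothing to remove: plain copy
        return 0
    fi
    {
        # head -n 0 is rejected by some implementations, so skip it when n is 1
        if [ "$n" -gt 1 ]; then head -n "$(( n - 1 ))" "$2"; fi
        tail -n "+$(( n + 1 ))" "$2"
    } > "$3"
}
```

For example, `remove_first_match "tw" file newfile` writes every line of file except the first one containing "tw".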

answered 2013-10-02T02:32:18.343

score 2 · Accepted Answer

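Given that dd is deemed too dangerous for this in-place line removal, we need some other method that gives fairly fine-grained control over the file system calls. My initial urge was to write something in C, but although that is possible, I think it is a bit of overkill. Instead it is worth looking at common scripting (not shell-scripting) languages, as these typically have fairly low-level file APIs that map to the file system calls in a fairly straightforward manner. I'm guessing this can be done with python, perl, Tcl or one of many other scripting languages that might be available. I'm most familiar with Tcl, so here we go: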

#!/bin/sh
# \
exec tclsh "$0" "$@"

package require Tclx

set removeline [lindex $argv 0]
set filename [lindex $argv 1]

set infile [open $filename RDONLY]
for {set lineNumber 1} {$lineNumber < $removeline} {incr lineNumber} {
    if {[eof $infile]} {
        close $infile
        puts "EOF at line $lineNumber"
        exit
    }
    gets $infile line
}
set bytecount [tell $infile]
gets $infile rmline

set outfile [open $filename RDWR]
seek $outfile $bytecount start

while {[gets $infile line] >= 0} {
    puts $outfile $line
}

ftruncate -fileid $outfile [tell $outfile]
close $infile
close $outfile

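Note that on my particular box I have Tcl 8.4, so I had to load the Tclx package in order to use the ftruncate command. In Tcl 8.5 there is chan truncate, which could be used instead.

You pass the line number you want to remove and the filename to this script.

In short, the script does the following:

open the file for reading
read the first n-1 lines
get the offset of the start of the next line (line n)
read line n
open the file for writing using a new FD
move the file position of the write FD to the offset of the start of line n
continue reading the remaining lines from the read FD and writing them to the write FD until the whole read FD has been read
truncate the write FD

The file is edited exactly in place. No temporary files are used.

I'm pretty sure this can be rewritten in python or perl or ... if necessary.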

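Update

OK, so in-place line removal can be done in almost-pure bash, using a technique similar to the Tcl script above. The big caveat is that you need to have the truncate command available. I do have it on my Ubuntu 12.04 VM, but not on my older Redhat-based box. Here is the script: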

#!/bin/bash

n=$1
filename=$2
# make ${#line} count bytes rather than multibyte characters
LC_ALL=C
exec 3<> "$filename"
exec 4<> "$filename"
linecount=1
bytecount=0
while IFS="" read -r line <&3 ; do
    if [[ $linecount == $n ]]; then
        echo "omitting line $linecount: $line"
    else
        echo "$line" >&4
        ((bytecount += ${#line} + 1))
    fi
    ((linecount++))
done
exec 3>&-
exec 4>&-

truncate -s "$bytecount" "$filename"
#### or if you can tolerate dd, just to do the truncate:
# dd of="$filename" bs=1 seek=$bytecount count=0
#### or if you have python
# python -c "open(\"$filename\", \"ab\").truncate($bytecount)"

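I would love to hear of a more generic (bash-only?) way to do the partial truncate at the end and complete this answer. Of course the truncate can be done with dd as well, but I think that was already ruled out by my earlier answer.

For the record, this site lists how to do an in-place file truncation in many different languages, in case any of them can be used in your environment.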

score 1 · Accepted Answer

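If you can point out under which circumstances and on which platform(s) the most obvious Awk script fails for you, perhaps we can devise a workaround.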

awk "NR!=$N" infile >outfile

Of course, obtaining $N with grep just to feed it to Awk is rather backwards. This will delete the line containing the first occurrence of foo:

awk '/foo/ { if (!p++) next } 1' infile >outfile

score -1 · Accepted Answer

Based on Digital Trauma's answer, I found an improvement that needs only grep and echo, but no tempfile:

echo $(grep -v PATTERN file.txt) > file.txt

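Depending on the kind of lines your file contains and whether your pattern needs more complex syntax, you can enclose the grep command in double quotes. Without the quotes, the shell's word splitting joins all remaining lines into a single line: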

echo "$(grep -v PATTERN file.txt)" > file.txt

(useful when removing entries from a crontab)
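The difference between the two forms is easy to miss, so here is a small sketch that compares them on a scratch file. The file contents and the `unquoted`/`quoted` variable names are made up for illustration, and `mktemp` is assumed to be available.

```shell
f=$(mktemp)
printf 'keep 1\nPATTERN here\nkeep 2\n' > "$f"

# unquoted: word splitting turns every newline into a single space
unquoted=$(echo $(grep -v PATTERN "$f"))
# quoted: line breaks inside the substitution survive
quoted=$(echo "$(grep -v PATTERN "$f")")

printf 'unquoted: [%s]\n' "$unquoted"
printf 'quoted:   [%s]\n' "$quoted"
rm -f "$f"
```

Only the quoted form keeps one record per line. Either way the shell holds the whole filtered file in memory, and the target file is truncated before echo writes to it, so an interruption at that moment can leave it empty.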
