python - How to print the second line and the last three lines from multiple text files in AWK or Python?

Question

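Using awk, I'm struggling to print the second line and the last three lines from multiple text files. I'd also like to direct the output to a text file.

Any help or suggestions would be greatly appreciated.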

score 3 · Accepted Answer

The advantage of this approach is that the whole file is never held in memory.

awk 'NR == 2 {print}; {line1 = line2; line2 = line3; line3 = $0} END {print line1; print line2; print line3}' files*

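Edit:

The following uses some code from the gawk manual that is portable to other versions of AWK. It processes each file separately, whereas the one-liner above treats all the files as a single stream. Note that gawk version 4 provides BEGINFILE and ENDFILE rules.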

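#!/usr/bin/awk -f

# Called at the start of each file: reset the rolling three-line buffer.
function beginfile(file) {
    line1 = line2 = line3 = ""
}

# Called at the end of each file: print the last three lines seen.
function endfile(file) {
    print line1; print line2; print line3
}

# FILENAME changes when awk moves on to the next input file.
FILENAME != _oldfilename {
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}

END { endfile(FILENAME) }

# FNR is the line number within the current file.
FNR == 2 {
    print
}

# Shift the buffer so it always holds the most recent three lines.
{
    line1 = line2; line2 = line3; line3 = $0
}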

Save it as a file, perhaps calling it "fileparts". Then do:

chmod u+x fileparts

Then you can run it like this:

./fileparts file1 file2 anotherfile somemorefiles*.txt

It prints the second line and the last three lines of each file as a single stream of output.

Alternatively, you could modify it to write to separate files, or use a shell loop to write to separate files:

for file in file1 file2 anotherfile somemorefiles*.txt
do
    ./fileparts "$file" > "$file.out"
done

You can name the output files whatever you like. They will be text files.

score 1 · Accepted Answer

To avoid reading the whole file into memory at once, use a deque with a maxlen of 3 as a rolling buffer that captures the last 3 lines:

from collections import deque
def get2ndAndLast3LinesFrom(filename):
    with open(filename) as infile:
        # advance past first line
        next(infile)
        # capture second line
        second = next(infile)
        # iterate over the rest of the file a line at a time, saving the final 3
        last3 = deque(maxlen=3)
        last3.extend(infile)        
        return second, list(last3)
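
One caveat with the function above: next() raises StopIteration on a file with fewer than two lines, and because the first two lines have already been consumed, last3 can never include them. Below is a minimal sketch of a single-pass variant that tolerates short files. The name second_and_last3 is hypothetical and not part of the original answer.

```python
from collections import deque


def second_and_last3(path):
    """Return (second line or None, list of up to three final lines)."""
    second = None
    last3 = deque(maxlen=3)  # rolling buffer: older lines fall off the left
    with open(path) as infile:
        for lineno, line in enumerate(infile, start=1):
            if lineno == 2:
                second = line
            last3.append(line)
    return second, list(last3)
```

For a file with at least five lines this returns the same result as the function above. For shorter files the two parts may overlap.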

You can generalize this approach into a function that accepts any iterable:

def lastN(n, seq):
    buf = deque(maxlen=n)
    buf.extend(seq)
    return list(buf)

You can then use partial to create "last-n" functions of different lengths:

from functools import partial
last3 = partial(lastN, 3)

print(last3(range(100000000)))  # on Python 2, use xrange to avoid building the list
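
As a quick illustration of how lastN behaves on different kinds of iterables, here is a small self-contained sketch. The io.StringIO sample stands in for an open file and is purely illustrative.

```python
import io
from collections import deque
from functools import partial


def lastN(n, seq):
    buf = deque(maxlen=n)  # keeps only the n most recent items
    buf.extend(seq)
    return list(buf)


last3 = partial(lastN, 3)

sample = io.StringIO("a\nb\nc\nd\ne\n")  # iterates line by line, like a file
print(last3(sample))     # ['c\n', 'd\n', 'e\n']
print(last3(range(10)))  # [7, 8, 9]
print(last3("xy"))       # ['x', 'y'] (fewer than three items available)
```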

score 1 · Accepted Answer

If you're not fond of the Python or AWK implementations, you can do something very simple with your shell and the standard head/tail utilities.

for file in "$@"; do
    head -n2 "$file" | tail -n1
    tail -n3 "$file"
done

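You could also wrap this in a function or put it in a script and then, if you really need to, call it from Python using subprocess.check_output() (AWK would need its own mechanism, such as system()). In that case, though, it is probably easier to use a native approach than to spawn an external process.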
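
If you do want to drive head and tail from Python, a rough sketch might look like the following. It assumes POSIX head and tail are available on the PATH, and the function name is hypothetical.

```python
import subprocess


def second_and_last3_via_shell(path):
    """Return the second line plus the last three lines of path as one string."""
    # Equivalent of: head -n2 "$file" | tail -n1
    first_two = subprocess.check_output(["head", "-n", "2", path])
    second = subprocess.check_output(["tail", "-n", "1"], input=first_two)
    # Equivalent of: tail -n3 "$file"
    last3 = subprocess.check_output(["tail", "-n", "3", path])
    return (second + last3).decode()
```

This spawns three processes per file, which is the overhead that a native approach avoids.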

score 0 · Accepted Answer

This will work, but it loads the whole file into memory, which may not be ideal if your files are very large.

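# filename is assumed to hold the path of the input file
with open(filename) as f:
    text = f.readlines()

print(text[1], end="")  # second line (index 1, since lists are zero-based)

for line in text[-3:]:  # last three lines, in file order
    print(line, end="")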

Some good alternatives are also discussed here.

score 0 · Accepted Answer

I don't know awk, but if you're using Python, I think you'll want something like this:

# Read all lines of the input file (this loads the whole file into memory)
with open('test1.txt') as inf:
    lines = inf.readlines()

# Write the second line and the last three lines as plain text
with open('Spreadsheet.ods', 'w') as outf:
    outf.write(lines[1])
    outf.writelines(lines[-3:])
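
The snippet above handles a single file. To cover multiple files, as the question asks, the same slicing idea could be wrapped in a loop. This is only a sketch: collect_lines and its parameters are names invented here, and it still loads each file fully into memory.

```python
def collect_lines(in_paths, out_path):
    """Write the second line and last three lines of each input file to out_path."""
    with open(out_path, "w") as outf:
        for path in in_paths:
            with open(path) as inf:
                lines = inf.readlines()
            if len(lines) >= 2:  # skip the second line if the file is too short
                outf.write(lines[1])
            outf.writelines(lines[-3:])  # slicing tolerates files under three lines
```

It could be called with, for example, the result of glob.glob("*.txt") as in_paths.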
