
I am working with large files, and my question here is two-fold.

  1. Bash - For testing purposes, I would like to iterate over every file in a given directory, take the head of each file (say, head -n 10000), and be left with a cut-down version of each. Whether the results go in the same directory or another doesn't matter much, though the same directory would be preferred.

  2. Python3 - How can I do this programmatically? I imagine I need to use the os module?


3 Answers

5

Try this:

for i in *; do
    cp "$i" "$i.tail"
    sed -i '10001,$d' "$i.tail"
done

Or, more simply:

for i in *; do
    sed '10001,$d' "$i" > "$i.tail"
done

Or:

for i in *; do
    head -n 10000 "$i" > "$i.tail"
done

For Python, if you want to use shell code, see http://docs.python.org/2/library/subprocess.html.
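A minimal sketch of that subprocess route, assuming an external head binary is available on the PATH. The function names head_with_subprocess and head_all are hypothetical, and the .tail suffix simply mirrors the loops above:

```python
import subprocess
from pathlib import Path


def head_with_subprocess(src, dst, lines=10000):
    """Write the first `lines` lines of src to dst using the external head binary."""
    with open(dst, "wb") as out:
        # An argument list (no shell=True) avoids quoting problems with odd file names;
        # "--" stops head from treating a name that starts with "-" as an option.
        subprocess.run(["head", "-n", str(lines), "--", str(src)], stdout=out, check=True)


def head_all(directory=".", lines=10000, suffix=".tail"):
    """Create a cut-down copy of every regular file in directory."""
    # sorted() materialises the listing first, so files created here are not revisited.
    for path in sorted(Path(directory).iterdir()):
        # Skip directories and copies left over from an earlier run.
        if path.is_file() and path.suffix != suffix:
            head_with_subprocess(path, path.with_name(path.name + suffix), lines)
```

Calling head_all(".") would leave a name.tail file next to each original. Unlike the shell loops above, this version skips existing .tail files, so running it twice does not produce name.tail.tail.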

answered 2013-08-06T21:39:24.897
5

Bash:

The most straightforward way:

#!/usr/bin/env bash
DEST=/tmp
for i in *
do
   # Quote both paths so file names containing spaces survive.
   head -n 1000 "${i}" > "${DEST}/${i}"
done

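If you have a large number of files, you can run multiple jobs by generating a list of the files, splitting it up, and running the loop against each sub-list.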
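The list-splitting idea could look roughly like this, sketched in Python rather than Bash. The names head_file, head_chunk and head_parallel and the default of 4 jobs are assumptions for illustration, and threads are used on the assumption that the work is I/O-bound:

```python
import os
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def head_file(src, dst, lines=1000):
    """Copy at most `lines` lines from src to dst."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # islice stops after `lines` lines, or earlier at end of file.
        fout.writelines(islice(fin, lines))


def head_chunk(paths, dest, lines=1000):
    """Process one sub-list of files; return how many were handled."""
    for path in paths:
        head_file(path, os.path.join(dest, os.path.basename(path)), lines)
    return len(paths)


def head_parallel(src_dir, dest, lines=1000, jobs=4):
    """Split the file list into `jobs` sub-lists and process them concurrently."""
    files = [os.path.join(src_dir, name) for name in sorted(os.listdir(src_dir))
             if os.path.isfile(os.path.join(src_dir, name))]
    # Interleaved slicing gives each job a roughly equal share of the list.
    chunks = [files[i::jobs] for i in range(jobs)]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return sum(pool.map(lambda chunk: head_chunk(chunk, dest, lines), chunks))
```

head_parallel(".", "/tmp") would then return the number of files written. Whether this is faster than a single loop depends on the disk, so it is worth timing on real data first.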

Python:

Assuming the goal is to avoid spawning a shell session to execute an external binary such as "head", this is how I would go about it.

#!/usr/bin/env python3
import os
from itertools import islice

destination = "/tmp/"

for name in os.listdir('.'):
  if os.path.isfile(name):
    # Binary mode copies bytes unchanged and avoids decoding errors;
    # the with-block closes both files even if an error occurs.
    with open(name, "rb") as src, open(os.path.join(destination, name), "wb") as dst:
      # islice stops after 1000 lines, or earlier at end of file.
      dst.writelines(islice(src, 1000))
answered 2013-08-06T22:06:02.153
-1

To abbreviate all the files in the current directory this way, you can use:

for f in *; do [[ $f != *.small ]] && head -n 10000 "$f" > "$f".small; done

The resulting files will have the suffix .small.

To do this from Python:

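import subprocess
# [[ ... ]] is a bash construct; os.system runs /bin/sh, which may not be bash, so invoke bash explicitly.
subprocess.run(['bash', '-c', 'for f in *; do [[ $f != *.small ]] && head -n 10000 "$f" > "$f".small; done'])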
answered 2013-08-06T21:40:15.960