linux - Splitting a file into unequal chunks in Linux

Question

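I want to split a large file (about 17M lines of strings) into multiple files, with a different number of lines in each chunk. Is it possible to pass an array like this to the "split -l" command: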

[
 1=>1000000,
 2=>1000537,
 ...
]

so that each chunk receives that many lines?

score 11 · Accepted Answer

Use a compound command:

{
  head -n 10000 > output1
  head -n   200 > output2
  head -n  1234 > output3
  cat > remainder
} < yourbigfile

This also works with a loop:

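{
  i=1
  for n in 10000 200 1234
  do
      # Each head consumes the next $n lines from the shared input.
      head -n "$n" > "output$i"
      i=$((i + 1))
  done
  # Whatever is left after the last chunk goes here.
  cat > remainder
} < yourbigfile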

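This does not work on OS X, where head reads ahead and discards the extra input, so the next command no longer starts at the right line.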
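Where head cannot be relied on to leave the input positioned right after the last line it printed, a single awk pass is one possible alternative. The following is a minimal sketch, not part of the original answer: the function name split_by_counts is hypothetical, the output names output1, output2, ... and remainder simply mirror the answer above, and every count is assumed to be a positive integer.

```shell
# split_by_counts FILE COUNT...   (hypothetical helper name)
# Writes the first COUNT lines to output1, the next COUNT to output2, and so
# on; any lines left over go to "remainder".
# Assumption: every COUNT is a positive integer.
split_by_counts() {
    file=$1
    shift
    awk -v sizes="$*" '
        BEGIN { n = split(sizes, size, " "); chunk = 1; left = size[1] }
        {
            if (chunk <= n) {
                out = "output" chunk
                print > out
                # Chunk is full: close it and move on to the next count.
                if (--left == 0) { close(out); chunk++; left = size[chunk] }
            } else {
                print > "remainder"
            }
        }
    ' "$file"
}

# Example: split_by_counts yourbigfile 10000 200 1234
```

Because awk reads the file exactly once and tracks the counts itself, this does not depend on how head handles the shared file offset.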

score 1 · Accepted Answer

The split command has no such feature, so you will have to use a different tool or write your own.

Answered 2013-02-05T22:18:33.910
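One candidate for the "different tool" mentioned above might be csplit, which splits a file at given line numbers rather than after a fixed count. The sketch below is an illustration only: the function name split_with_csplit and the chunk prefix are made up for this example. It converts per-chunk line counts into the cumulative line numbers csplit expects, and it assumes the file has more lines than the sum of the counts, since csplit reports an error for a line number beyond the end of the file.

```shell
# split_with_csplit FILE COUNT...   (hypothetical helper name)
# Produces chunk00, chunk01, ... with COUNT lines each; the last file holds
# whatever lines remain.
# Assumption: FILE has more lines than the sum of all COUNTs.
split_with_csplit() {
    file=$1
    shift
    line=1
    boundaries=""
    for n in "$@"; do
        # csplit starts a new piece AT the given line number, so each
        # boundary is one past the last line of the previous chunk.
        line=$((line + n))
        boundaries="$boundaries $line"
    done
    # $boundaries is left unquoted on purpose so it splits into arguments.
    csplit -s -f chunk "$file" $boundaries
}

# Example: split_with_csplit yourbigfile 10000 200 1234
```

The -s option suppresses the byte counts csplit otherwise prints, and -f sets the output file prefix.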

score 1 · Accepted Answer

You can use sed, with another script generating the sed commands for you.

# split_gen.py
use strict;
my @limits = ( 100, 250, 340,999);
my $filename = "joker";

my $start = 1;
foreach my $end (@limits) {
    print qq{sed -n '$start,${end}p;${end}q' $filename > $filename.$start-$end\n};
    $start = $end + 1;
}

Running perl split_gen.py then gives:

sed -n '1,100p;100q' joker > joker.1-100
sed -n '101,250p;250q' joker > joker.101-250
sed -n '251,340p;340q' joker > joker.251-340
sed -n '341,999p;999q' joker > joker.341-999

If you are happy with the commands, you can then run

perl split_gen.py | sh

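Then enjoy the wait, because this can be slow on a large file: each generated sed command reads the file again from the first line.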
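If the repeated passes over the file become a problem, one option is to generate a single sed script that uses the w command, so the file is read only once. This is a sketch rather than the answer's method: gen_sed_script and split.sed are hypothetical names, the limits are assumed to be ascending end line numbers as in @limits above, and at least one limit is assumed to be given.

```shell
# gen_sed_script FILE END...   (hypothetical helper name)
# Prints a sed script with one "START,ENDw OUTFILE" line per range, using the
# same FILE.START-END naming as the generated commands above.
# Assumption: at least one END is given and the ENDs are ascending.
gen_sed_script() {
    filename=$1
    shift
    start=1
    for end in "$@"; do
        printf '%s,%sw %s.%s-%s\n' "$start" "$end" "$filename" "$start" "$end"
        start=$((end + 1))
    done
    # Stop reading once the last requested line has been written.
    printf '%sq\n' "$end"
}

# Example:
#   gen_sed_script joker 100 250 340 999 > split.sed
#   sed -n -f split.sed joker
```

As with the Perl generator, the script can be inspected before it is run. Note that POSIX only requires sed to support ten w output files, so a very long list of limits may not be portable.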
