linux - How to split one text file into multiple *.txt files?

Question

I have a text file, file.txt (12 MB), containing:

something1
something2
something3
something4
(...)

Is there a way to split file.txt into 12 *.txt files, say file2.txt, file3.txt, file4.txt, and so on?

score 79 · Accepted Answer

You can use split, one of the core utilities on Linux:

split -b 1M -d  file.txt file

Note that both M and MB are accepted, but they mean different sizes: MB is 1000 * 1000 bytes, M is 1024^2 bytes.
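To see the difference between the two suffixes, here is a small sketch that splits a throwaway file of zero bytes both ways and prints the size of the first piece. The helper name first_piece_size and the temporary directory are illustrative only, and the sketch assumes GNU split and GNU head.

```shell
#!/usr/bin/env bash
# Print the size in bytes of the first piece produced by `split -b SIZE`.
# first_piece_size is a hypothetical helper, not part of split.
first_piece_size() {
    local size=$1 dir
    dir=$(mktemp -d)
    # 3,000,000 zero bytes is enough to fill at least one piece either way.
    head -c 3000000 /dev/zero > "$dir/sample.bin"
    split -b "$size" -d "$dir/sample.bin" "$dir/piece"
    wc -c < "$dir/piece00"
    rm -rf "$dir"
}

echo "1M  -> $(first_piece_size 1M) bytes"    # binary unit: 1024 * 1024
echo "1MB -> $(first_piece_size 1MB) bytes"   # decimal unit: 1000 * 1000
```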

If you want to split by lines, you can use the -l parameter.

UPDATE

lines=$(( ($(wc -l < file.txt) + 11) / 12 )) ; split -l "$lines" -d file.txt file
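The arithmetic behind that one-liner can be sketched as a small function. It rounds the lines-per-file value up, so that a line count not divisible by the number of parts still yields at most that many pieces. The function name split_into_parts and the sample file are made up for illustration.

```shell
#!/usr/bin/env bash
# Split FILE into at most PARTS pieces by line count (hypothetical helper).
split_into_parts() {
    local file=$1 parts=$2 prefix=$3 total per_file
    total=$(wc -l < "$file")
    # Ceiling division: rounding down could leave an extra, short piece.
    per_file=$(( (total + parts - 1) / parts ))
    split -l "$per_file" -d "$file" "$prefix"
}

# Usage on a generated 100-line sample in a scratch directory.
scratch=$(mktemp -d)
seq 1 100 > "$scratch/sample.txt"
split_into_parts "$scratch/sample.txt" 12 "$scratch/part"
ls "$scratch"
```

Note that rounding up can also give fewer than 12 pieces for very short inputs, and an empty input would make the line count zero, which split rejects.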

Another solution, as suggested by Kirill, is to do the following:

split -n l/12 file.txt

Note that it is l (lowercase L), not the digit one. split -n accepts several forms, such as N, k/N, l/k/N, r/N, r/k/N.
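As a rough illustration of those forms, the following sketch runs three of them on a 24-line sample. It assumes GNU split (the -n option is missing from older and non-GNU versions), and the file names are placeholders.

```shell
#!/usr/bin/env bash
work=$(mktemp -d)
seq 1 24 > "$work/sample.txt"

# l/N: N files of roughly equal byte size, never breaking a line.
split -n l/4 "$work/sample.txt" "$work/by_line_"

# l/K/N: write only the K-th of N line-based chunks to standard output.
second_chunk=$(split -n l/2/4 "$work/sample.txt")

# r/N: like l/N, but lines are dealt out round-robin.
split -n r/4 "$work/sample.txt" "$work/round_robin_"

wc -l "$work"/by_line_* "$work"/round_robin_*
printf 'second chunk:\n%s\n' "$second_chunk"
```

The l/N pieces are balanced by bytes, so their line counts can differ when line lengths vary; r/N balances line counts instead but interleaves the lines.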

score 77 · Accepted Answer

$ split -l 100 input_file output_file

where -l is the number of lines in each file. This will create:

output_fileaa
output_fileab
output_fileac
output_filead
....
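For a concrete picture of the default naming, here is a minimal sketch with a 250-line sample. input_file and output_file are just the placeholder names from the command above, created in a temporary directory.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
seq 1 250 > "$demo/input_file"

# 250 lines at 100 lines per piece gives three pieces: 100, 100 and 50 lines.
split -l 100 "$demo/input_file" "$demo/output_file"

for piece in "$demo"/output_file*; do
    printf '%s: %s lines\n' "${piece##*/}" "$(wc -l < "$piece")"
done
```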

score 29 · Accepted Answer

CS Pei's answer won't produce .txt files as the OP wants. Use:

split -b 1M -d file.txt file --additional-suffix=.txt
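Combining the suffix option with a fixed number of pieces might look like the sketch below. It assumes a GNU split recent enough to support both -n and --additional-suffix, and uses a generated sample in a temporary directory.

```shell
#!/usr/bin/env bash
out=$(mktemp -d)
seq 1 1200 > "$out/file.txt"

# -n l/12 asks for 12 pieces without breaking lines; -d numbers them 00..11;
# --additional-suffix appends .txt to every piece name.
split -n l/12 -d --additional-suffix=.txt "$out/file.txt" "$out/file"

ls "$out"
```

The pieces are named file00.txt through file11.txt rather than file2.txt, file3.txt and so on; split always pads the numeric suffix to at least two characters by default.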

score 2 · Accepted Answer

Using Bash:

readarray -t lines < file.txt
count=${#lines[@]}

for i in "${!lines[@]}"; do
    # i is 0-based, so i * 12 / count runs from 0 to 11
    index=$(( i * 12 / count + 1 ))
    printf '%s\n' "${lines[i]}" >> "file${index}.txt"
done

Using AWK:

awk '{
    a[NR] = $0
}
END {
    for (i = 1; i in a; ++i) {
        x = (i * 12 - 1) / NR + 1
        sub(/\..*$/, "", x)
        print a[i] > "file" x ".txt"
    }
}' file.txt

Unlike split, this ensures the line counts are as even as possible.
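One way to check that claim is to wrap the idea in a function and count the lines in each output file. The sketch below is a two-pass variation (it reads the input twice instead of holding it in memory), not the exact code above; even_split and its parameters are hypothetical names.

```shell
#!/usr/bin/env bash
# Deal the lines of FILE into PARTS files named PREFIX1.txt, PREFIX2.txt, ...
even_split() {
    local file=$1 parts=$2 prefix=$3
    awk -v parts="$parts" -v prefix="$prefix" '
        NR == FNR { total = FNR; next }      # first pass: count the lines
        {
            # second pass: map 0-based line number onto 1..parts
            idx = int((FNR - 1) * parts / total) + 1
            out = prefix idx ".txt"
            if (out != prev) { if (prev != "") close(prev); prev = out }
            print > out
        }' "$file" "$file"
}

# Usage: 100 lines into 12 files, then show the per-file line counts.
work=$(mktemp -d)
seq 1 100 > "$work/file.txt"
even_split "$work/file.txt" 12 "$work/file"
wc -l "$work"/file[0-9]*.txt
```

With 100 lines and 12 parts the counts should differ by at most one line between files.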

score 1 · Accepted Answer

Regardless of what the previous answers say, on my Ubuntu 16.04 (Xenial Xerus) I had to do this:

split -b 10M -d  system.log system_split.log

Note the space between -b and the value.

score 0 · Accepted Answer

On my Linux system (Red Hat Enterprise 6.9), the split command does not have the -n or --additional-suffix command-line options.

Instead, I used this:

split -d -l NUM_LINES really_big_file.txt split_files.txt.

where -d adds a numeric suffix to the end of "split_files.txt." and -l specifies the number of lines per file.

For example, suppose I have a really big file like this:

$ ls -laF
total 1391952
drwxr-xr-x 2 user.name group         40 Sep 14 15:43 ./
drwxr-xr-x 3 user.name group       4096 Sep 14 15:39 ../
-rw-r--r-- 1 user.name group 1425352817 Sep 14 14:01 really_big_file.txt

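The file has 100,000 lines, and I want to split it into files of at most 30,000 lines each. This command runs the split and appends an integer to the end of the output file pattern "split_files.txt.".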

$ split -d -l 30000 really_big_file.txt split_files.txt.

The resulting files are split correctly, with at most 30,000 lines per file.

$ ls -laF
total 2783904
drwxr-xr-x 2 user.name group        156 Sep 14 15:43 ./
drwxr-xr-x 3 user.name group       4096 Sep 14 15:39 ../
-rw-r--r-- 1 user.name group 1425352817 Sep 14 14:01 really_big_file.txt
-rw-r--r-- 1 user.name group  428604626 Sep 14 15:43 split_files.txt.00
-rw-r--r-- 1 user.name group  427152423 Sep 14 15:43 split_files.txt.01
-rw-r--r-- 1 user.name group  427141443 Sep 14 15:43 split_files.txt.02
-rw-r--r-- 1 user.name group  142454325 Sep 14 15:43 split_files.txt.03


$ wc -l *.txt*
    100000 really_big_file.txt
     30000 split_files.txt.00
     30000 split_files.txt.01
     30000 split_files.txt.02
     10000 split_files.txt.03
    200000 total
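If the pieces still need to end in .txt on a system without --additional-suffix, one possible workaround is to rename them after splitting. This is only a sketch: add_extension is an illustrative name, the prefix "split_files." differs slightly from the one used above, and the glob will also pick up any unrelated file that starts with the same prefix.

```shell
#!/usr/bin/env bash
# Append an extension to every file whose name starts with PREFIX.
add_extension() {
    local prefix=$1 ext=$2 piece
    for piece in "$prefix"*; do
        [ -e "$piece" ] || continue      # the glob matched nothing
        mv -- "$piece" "$piece$ext"
    done
}

# Usage on a small generated sample (30 lines per piece instead of 30,000).
work=$(mktemp -d)
seq 1 100 > "$work/really_big_file.txt"
split -d -l 30 "$work/really_big_file.txt" "$work/split_files."
add_extension "$work/split_files." .txt
ls "$work"
```

Running the rename twice would append the extension twice, so it is best done once, right after the split.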

score 0 · Accepted Answer

I agree with @CS Pei, however this didn't work for me:

split -b=1M -d file.txt file

...because the = after -b threw it off. Instead, I just removed it, left no space between -b and the value, and used a lowercase "m":

split -b1m -d file.txt file

And to append ".txt", we use what @schoon said:

split -b1m -d file.txt file --additional-suffix=.txt

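I had a 188.5MB txt file and used this command [but with -b5m for 5.2MB files], and it produced 35 split files, all of them txt files of 5.2MB except the last, which was 5.0MB. Now, since I want my lines to stay intact, I wanted to split the main file every 1 million lines, but the split command would not even let me use -100000, let alone -1000000, so splitting by large line counts did not work for me.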
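For line-based splitting at that scale, the documented -l form is worth trying before giving up. A minimal sketch, assuming GNU split with --additional-suffix support and using generated sample data in a temporary directory:

```shell
#!/usr/bin/env bash
big=$(mktemp -d)
seq 1 2500000 > "$big/file.txt"     # 2.5 million short lines as sample input

# -l takes the line count as an ordinary argument; lines are never cut in two.
split -l 1000000 -d --additional-suffix=.txt "$big/file.txt" "$big/file"

wc -l "$big"/file0*.txt
```

Unlike -b, which cuts at a byte offset and can end a piece in the middle of a line, -l always ends each piece on a line boundary.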

score 0 · Accepted Answer

If every part should have the same number of lines, for example 22, here is my solution:

split --numeric-suffixes=2 --additional-suffix=.txt -l 22 file.txt file

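You get file02.txt with the first 22 lines, file03.txt with the next 22 lines, and so on (split pads the numeric suffix to two digits by default).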
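To check which names this produces on a given system, a quick sketch on a 66-line sample can help. It assumes GNU split with --numeric-suffixes=FROM and --additional-suffix support; the sample file lives in a temporary directory.

```shell
#!/usr/bin/env bash
work=$(mktemp -d)
seq 1 66 > "$work/file.txt"

# Numbering starts at 2; the suffix keeps its default width of two characters.
split --numeric-suffixes=2 --additional-suffix=.txt -l 22 "$work/file.txt" "$work/file"

ls "$work"
```

Sixty-six lines at 22 per piece gives three pieces, numbered 02, 03 and 04.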

Thanks to @hamruta-takawale, @dror-s and @stackoverflowuser2010

score 0 · Accepted Answer

Try something like this:

awk -v c=1 'NR > 1 && NR % 1000000 == 1 { close(c ".txt"); ++c } { print > (c ".txt") }' Datafile.txt

for filename in [0-9]*.txt; do mv "$filename" "Prefix_$filename"; done
