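How can I merge two text files in a .bat file? Or at least, how can I read the next line of a file and test for end of file in a .bat file?

Is it possible to merge two text files with a .bat script? The idea is not to append or concatenate, but to perform a merge driven by the content of each line. A simplified example is producing one sorted file from two sorted files, as in the pseudocode below (pseudocode, because I can't seem to find a way to read the next line and test for end of file inside a for loop):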

:TOP
  set /p Line1=Read_Line(file1)
  set /p Line2=Read_Line(file2)
:TEST
  IF EOF(file1) GOTO FINISH2
  IF EOF(file2) GOTO FINISH1
  IF %Line1% < %Line2% (
      echo %Line1% - not in 2 >> File3
      set /p Line1=Read_Line(file1)
      GOTO TEST
  ) ELSE IF %Line1% > %Line2% (
      echo %Line2% - not in 1 >> File3
      set /p Line2=Read_Line(file2)
      GOTO TEST
  ) ELSE echo %Line1% in both >> File3
  GOTO TOP
:FINISH1
  echo %Line1% - not in 2 >> File3
  set /p Line1=Read_Line(file1)
  IF NOT EOF(file1) GOTO FINISH1
  GOTO EOF
:FINISH2
  echo %Line2% - not in 1 >> File3
  set /p Line2=Read_Line(file2)
  IF NOT EOF(file2) GOTO FINISH2

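I tried using a for loop, but branching inside the loop seems to stop the loop. I tried various approaches (including a parallel .bat) to find a way to advance the read position within a file using set /p with < redirection, but couldn't find the right one.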

2 Answers

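Batch really is a terrible "language" for text processing. Almost any other tool you can find would be better than batch (easier to develop and faster to run). I'm providing a batch solution because I enjoy the challenge, but I would always recommend some other language or tool over batch for text processing. That said...

This assumes both source files are already sorted.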

@echo off
setlocal enableDelayedExpansion

::define the files
set "in1=file1.txt"
set "in2=file2.txt"
set "out=file3.txt"

::define some simple macros
set "eof1=^!ln1^! gtr ^!cnt1^!"
set "eof2=^!ln2^! gtr ^!cnt2^!"
set "read1=if ^!ln1^! leq ^!cnt1^! set "txt1=" & <&3 set /p "txt1=" & set /a ln1+=1"
set "read2=if ^!ln2^! leq ^!cnt2^! set "txt2=" & <&4 set /p "txt2=" & set /a ln2+=1"
set "write1=echo(^!txt1^! - not in 2"
set "write2=echo(^!txt2^! - not in 1"
set "writeBoth=echo(^!txt1^! - in both"

::count the number of lines in each file
for /f %%N in ('find /v /c "" ^<"%in1%"') do set "cnt1=%%N"
for /f %%N in ('find /v /c "" ^<"%in2%"') do set "cnt2=%%N"

::setup redirection in outer block and merge the files in a loop
::The max number of iterations assumes there is no overlap (cnt1+cnt2)
::Break out of the loop as soon as both files have reached EOF.
set /a ln1=0, ln2=0, cnt=cnt1+cnt2
4<"%in2%" 3<"%in1%" (
  %read1%
  %read2%
  for /l %%N in (1 1 %cnt%) do (
    if %eof1% (
        if %eof2% goto :break
        %write2%
        %read2%
    ) else if %eof2% (
        %write1%
        %read1%
    ) else if .!txt1! lss .!txt2! (
        %write1%
        %read1%
    ) else if .!txt2! lss .!txt1! (
        %write2%
        %read2%
    ) else (
        %writeBoth%
        %read1%
        %read2%
    )
  )
) >"%out%"
:break
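
For comparison, here is a minimal sketch of the same two-way merge in Python rather than batch. It is not part of the original answer; the function name `merge_report` is made up for illustration, and it assumes both inputs are already sorted under Python's default string ordering (by code point), which may differ from how batch `lss` orders strings.

```python
from typing import Iterable, Iterator, Optional


def merge_report(lines1: Iterable[str], lines2: Iterable[str]) -> Iterator[str]:
    """Walk two sorted line sequences in step and annotate each line."""
    it1, it2 = iter(lines1), iter(lines2)
    # None stands in for the EOF test used by the batch version.
    cur1: Optional[str] = next(it1, None)
    cur2: Optional[str] = next(it2, None)
    while cur1 is not None or cur2 is not None:
        if cur2 is None or (cur1 is not None and cur1 < cur2):
            # Only the first input has this line (or the second is exhausted).
            yield f"{cur1} - not in 2"
            cur1 = next(it1, None)
        elif cur1 is None or cur2 < cur1:
            # Only the second input has this line (or the first is exhausted).
            yield f"{cur2} - not in 1"
            cur2 = next(it2, None)
        else:
            # Same line in both inputs: report once and advance both.
            yield f"{cur1} - in both"
            cur1, cur2 = next(it1, None), next(it2, None)
```

To run it on files, pass two sequences of lines with the terminators stripped, for example `[line.rstrip("\r\n") for line in open("file1.txt")]`, and write each yielded string to the output file.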

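Reading a file with SET /P has the following limitations:

  • Lines in both files must end with <carriage return><line feed> (Windows style). It will not work with lines terminated by <line feed> alone (Unix style).
  • Each line is limited to 1021 bytes (characters), not counting the line terminator.
  • Trailing control characters are stripped from each line.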
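
The limitations above can be checked before running the batch script. The following is a rough Python sketch, not part of the original answer: `set_p_problems` and `MAX_SET_P_BYTES` are hypothetical names, and treating every byte below 0x20 as a "control character" is an assumption about what SET /P strips.

```python
MAX_SET_P_BYTES = 1021  # per-line limit quoted above, excluding the terminator


def set_p_problems(data: bytes) -> list:
    """Return human-readable warnings for lines SET /P may mishandle."""
    problems = []
    parts = data.split(b"\n")
    for number, part in enumerate(parts, start=1):
        terminated = number < len(parts)  # every part but the last was followed by LF
        if not terminated and not part:
            break  # the data ended with a line terminator; there is no extra line
        body = part
        if terminated:
            if part.endswith(b"\r"):
                body = part[:-1]  # drop the CR of a CRLF pair
            else:
                problems.append(f"line {number}: LF without CR")
        if len(body) > MAX_SET_P_BYTES:
            problems.append(f"line {number}: longer than {MAX_SET_P_BYTES} bytes")
        if body and body[-1] < 0x20:
            # Assumption: any byte below 0x20 counts as a control character.
            problems.append(f"line {number}: trailing control character")
    return problems
```

Calling `set_p_problems(open("file1.txt", "rb").read())` and getting an empty list suggests the file fits the constraints listed above; it does not prove the batch script will handle every line.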

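EDIT

If you just want to create a sorted, merged document without duplicates, then I believe the following is an optimized version of sean's approach. It is not as elegant as his, but I believe it is much faster. By setting the EOL option to <line feed>, it also allows each line to start with any character. Note that this solution removes all blank lines from the output (as sean's does). Extra code could be added to preserve one blank line.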

@echo off
setlocal disableDelayedExpansion
set lf=^


::above 2 blank lines required
copy /b file1.txt+file2.txt file3.txt >nul
set "old="
(
  for /f eol^=^%lf%%lf%^ delims^= %%A in ('sort file3.txt') do (
    set "new=.%%A"
    setlocal enableDelayedExpansion
    if "!old!" neq "!new!" echo(!new:~1!
    for /f "delims=" %%B in ("!new!") do (
      endlocal
      set "old=%%B"
    )
  )
)>file4.txt
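
The sort-then-compare-with-previous idea in this script can be sketched in Python as well. This is an illustration only, not the author's code: `merged_unique_sorted` is a hypothetical name, and Python's `sorted` orders by code point, so the output order may differ from what the Windows `sort` command produces.

```python
from typing import Iterable, List, Optional


def merged_unique_sorted(lines1: Iterable[str], lines2: Iterable[str]) -> List[str]:
    """Concatenate, sort, then keep a line only if it differs from the previous one."""
    result: List[str] = []
    previous: Optional[str] = None
    for line in sorted(list(lines1) + list(lines2)):
        if not line:
            continue  # blank lines are dropped, as in the batch version
        if line != previous:
            result.append(line)
        previous = line
    return result
```

Because duplicates are adjacent after sorting, one comparison against the previous line is enough and no lookup structure is needed.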

answered 2012-07-02T19:16:06.370

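Two steps (no sorting is needed, because the find in step 2 checks the new file and only writes a line when it is not already there):

  1. Merge the files:
    copy file1.txt+file2.txt file3.txt

  2. Remove duplicate lines (/i ignores case; omit it if Fred and FRED are to be treated as different):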

    @echo off
    for /f "tokens=* delims=" %%a in (file3.txt) do (
      find /i "%%a" file4.txt >nul
      if errorlevel 1 echo %%a>>file4.txt
    )

The resulting file is file4.txt.

answered 2012-07-02T18:08:08.107
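
The second step can be sketched in Python as an order-preserving duplicate filter. This is a rough equivalent under stated assumptions, not a translation: `drop_duplicates` is a hypothetical name, and it compares whole lines, whereas `find` matches substrings, so the batch loop can also drop a line that merely appears inside an earlier, longer line.

```python
from typing import Iterable, List


def drop_duplicates(lines: Iterable[str], ignore_case: bool = True) -> List[str]:
    """Keep the first occurrence of each line, preserving input order."""
    seen = set()
    kept: List[str] = []
    for line in lines:
        # ignore_case plays the role of the /i switch in the batch version.
        key = line.casefold() if ignore_case else line
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept
```

With `ignore_case=True`, `["Fred", "b", "FRED", "b"]` is reduced to `["Fred", "b"]`; with `ignore_case=False`, `FRED` is kept as a separate line.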