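This should be relatively easy if you download a tool such as sed for Windows (or grep for Windows). The GNU project offers free versions of sed and grep for Windows.
It should also be relatively easy using the regular expression support in VBScript, JScript, or PowerShell.
But I thought I would try a native batch solution. FINDSTR supports primitive regular expressions, but it cannot extract the matched text, so the batch solution is fairly complicated.
The solution below can find multiple marks on a single line. It can also count the occurrences of each distinct mark. SET search and replace is case-insensitive, so I was forced to make the entire solution case-insensitive.
The solution can only handle lines of roughly 8191 bytes or less.
Performance should be good even for very large files, as long as the number of lines that contain marks is relatively small.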
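To illustrate the FINDSTR limitation, here is a minimal sketch. The file name sample.txt is a hypothetical scratch file, not part of the solution. FINDSTR reports the whole line that contains a match, so the mark itself still has to be cut out by other means.

```batch
@echo off
rem sample.txt is a hypothetical scratch file used only for this demo.
>sample.txt echo The quick \mark{10} brown fox

rem FINDSTR prints every matching line in full; it has no option
rem to print only the text that matched the regular expression.
findstr /ri \mark{[0-9][0-9]*} sample.txt

del sample.txt
```

Running this should echo the entire sample line, not just the mark.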
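The case-insensitivity comes from the variable substitution syntax itself. A small sketch, assuming delayed expansion is enabled (the variable name rest is only for illustration):

```batch
@echo off
setlocal enableDelayedExpansion
set "ln=The quick \MARK{22} brown fox"

rem !ln:*\mark{=! removes everything up to and including the first "\mark{".
rem The search is case-insensitive, so the upper-case \MARK{ is matched too.
set "rest=!ln:*\mark{=!"

echo !rest!
```

This prints `22} brown fox`, which is why the solution cannot tell \mark from \MARK.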
@echo off
setlocal disableDelayedExpansion
set "file=test.txt"
set LF=^


::The two blank lines above are critical to create linefeed - do not remove.
::Clear any existing \mark variables
for /f "delims==" %%A in ('2^>nul set \mark{') do set "%%A="
::Find all lines that contain at least one valid mark and call a routine
::to parse out all marks
for /f eol^=^%LF%%LF%delims^= %%A in (
  'findstr /ri \mark{[0-9][0-9]*} "%file%"'
) do (
  set "ln=%%A"
  call :parseMarks
)
::Create file containing found marks only
>marks.txt (
  for /f "delims==" %%A in ('set \mark{') do echo %%A
)
::Create file containing found marks with counts
>markCounts.txt set \mark{
::Print the results
echo Here is a list of found marks
echo -----------------------------
type marks.txt
echo(
echo Here is a list of found marks with the counts
echo ---------------------------------------------
type markCounts.txt
exit /b
:parseMarks
setlocal enableDelayedExpansion
set "ln2=!ln:*\mark{=!"
if !ln2! neq !ln! (
  for /f "tokens=1* delims=}" %%B in ("x!ln2!x") do (
    endlocal
    echo(%%B|findstr /xr x[0-9][0-9]* >nul && (
      for /f "delims=x" %%D in ("%%B") do set /a \mark{%%D}+=1
    )
    set "ln=%%C"
  )
  if defined ln goto :parseMarks
)
exit /b
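The counting works by using the marks themselves as variable names. The following is a simplified sketch of that idea in isolation, with hard-coded mark numbers standing in for the parsed values:

```batch
@echo off
setlocal
rem Each counter is an environment variable named after its mark.
rem SET /A treats an undefined variable as 0, so the first += yields 1.
set /a \mark{10}+=1
set /a \mark{22}+=1
set /a \mark{22}+=1

rem SET followed by a prefix lists every variable whose name starts with it,
rem in name=value form.
set \mark{

rem Splitting each listed line at "=" keeps only the variable name.
for /f "delims==" %%A in ('set \mark{') do echo %%A
```

The first listing is what ends up in markCounts.txt, and the second is what ends up in marks.txt.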
Here is the test.txt file I used. It contains many problematic test cases that make a batch solution difficult.
The \mark{} quick brown fox jump \mark{1} over the lazy dog
The quick \mark{10} brown fox jump over the \mark{99a} lazy dog
The quick \mark{}99} brown fox jump over the lazy \mark{100} dog! \MARK{22}!
The quick brown fox jump over the lazy dog \mark{1000} \mark{99
;The \mark{1} quick brown fox jump over the lazy dog
The \mark{!!99} quick brown fox jump over the lazy dog \mark{100}
\mark{22}The quick brown fox \mark{30} jump over the lazy dog
exclude \mark{100a}
exclude \mark{}
include \MARK{22}
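One of those problem cases is the line that starts with a semicolon. With default options, FOR /F treats ";" as its end-of-line character and skips such lines, which is why the solution uses the unusual eol option built from the LF variable. A minimal sketch of the default behavior, using a hypothetical scratch file sample.txt:

```batch
@echo off
rem sample.txt is a hypothetical scratch file used only for this demo.
>sample.txt echo ;The \mark{1} quick brown fox
>>sample.txt echo The \mark{2} quick brown fox

rem With default options the line starting with ";" is skipped,
rem so only the second line is echoed.
for /f "delims=" %%A in (sample.txt) do echo %%A

del sample.txt
```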
Here are my results:
Here is a list of found marks
-----------------------------
\mark{1000}
\mark{100}
\mark{10}
\mark{1}
\mark{22}
\mark{30}

Here is a list of found marks with the counts
---------------------------------------------
\mark{1000}=1
\mark{100}=2
\mark{10}=1
\mark{1}=2
\mark{22}=3
\mark{30}=1