
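I'm trying to use AutoIt to go through a text file and output selected lines to a CSV. The problem I keep running into is that it takes forever. The current method checks one line at a time and gets through only 5-10 lines per second, but I'm looking for something faster within the AutoIt framework.

Code: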

#include <File.au3>

; Ask for the log file until an existing file is chosen
Local $xnConfirm = False
Local $xnConfirmMsg = 0
Local $xnFile, $xnTargetFile
While Not $xnConfirm
    $xnFile = FileOpenDialog("File to Examine...", @UserProfileDir, "All (*.*)")
    If @error Then Exit ; dialog cancelled
    If FileExists($xnFile) Then
        $xnConfirm = True
    Else
        $xnConfirmMsg = MsgBox(1, "File Not Found...", $xnFile & " does not exist." & @CRLF & "Please select another file.")
        If $xnConfirmMsg = 2 Then Exit ; 2 = Cancel
    EndIf
WEnd

; Ask where to save the CSV and confirm before overwriting an existing file
$xnConfirm = False
$xnConfirmMsg = 0
While Not $xnConfirm
    $xnTargetFile = FileSaveDialog("Location to Save to...", @UserProfileDir, "All (*.*)", 0, "output.csv")
    If @error Then Exit ; dialog cancelled
    ConsoleWrite("Outputting to " & $xnTargetFile & @CRLF)
    If FileExists($xnTargetFile) Then
        $xnConfirmMsg = MsgBox(4, "Overwrite?", "Are you sure you want to overwrite " & @CRLF & $xnTargetFile)
        If $xnConfirmMsg = 6 Then $xnConfirm = True ; 6 = Yes
    Else
        $xnConfirm = True
    EndIf
WEnd

ProgressOn("Line count", "Verifying the number of lines in " & $xnFile)
Local $xnFileLine = _FileCountLines($xnFile)
ConsoleWrite("Loading " & $xnFile & " with " & $xnFileLine & " total lines." & @CRLF)
ProgressOff()

Local $hfl = FileOpen($xnFile, 0)         ; 0 = read
Local $hOut = FileOpen($xnTargetFile, 2)  ; 2 = write, erasing any previous contents
If $hfl = -1 Or $hOut = -1 Then
    MsgBox(16, "Error", "Could not open the input or output file.")
    Exit
EndIf

ProgressOn("Creating CSV", "Extracting matching data.", "", 0, 0, 16)
FileWriteLine($hOut, "Timestamp,Message,Category,Priority,EventId,Severity,Title,Machine,App Domain,ProcessID,Process Name,Thread Name,Win32 ThreadId")

Local $xnCurrentLine, $xnTargetLine = ""
For $i = 1 To $xnFileLine
    $xnCurrentLine = FileReadLine($hfl, $i) ; read line number $i
    Select
        Case StringInStr($xnCurrentLine, "Timestamp:")
            $xnTargetLine = StringTrimLeft($xnCurrentLine, 11) & ","  ; drop "Timestamp: "
        Case StringInStr($xnCurrentLine, "Message:")
            $xnTargetLine &= StringTrimLeft($xnCurrentLine, 9) & ","  ; drop "Message: "
        Case StringInStr($xnCurrentLine, "Category:")
            $xnTargetLine &= StringTrimLeft($xnCurrentLine, 10) & "," ; drop "Category: "
        Case StringInStr($xnCurrentLine, "Win32 ThreadId:")
            $xnTargetLine &= StringTrimLeft($xnCurrentLine, 15)       ; drop "Win32 ThreadId:"
            FileWriteLine($hOut, $xnTargetLine) ; FileWriteLine adds the line break itself
        Case Else
            ConsoleWrite("Nothing on line " & $i & @CRLF)
    EndSelect
    ProgressSet(Round($i / $xnFileLine * 100, 1), $i & " of " & $xnFileLine & " lines examined." & @CR & "Thank you for your patience.")
Next

ProgressOff()
FileClose($hfl)
FileClose($hOut)

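For context, I'm reading a log file that is similar to a trace log. I want to output the events to a CSV so I can look for trends. The entries in the log file look like this: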

Timestamp: 9/26/2013 3:33:23 AM

Message: Log Event Received

Category: Transaction

Win32 ThreadId:2872

I know this is formatted as code, but I wanted it to be easier to read.
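One likely reason for the slowness: the loop in the question's code calls `FileReadLine($hfl, $i)` with an explicit line number. The AutoIt help file for `FileReadLine` notes that reading with an incrementing line number forces AutoIt to reread the file from the beginning until it reaches the requested line, so the total work grows roughly with the square of the line count. Calling `FileReadLine` with only the handle reads the next line instead. The function below is a minimal sketch of that sequential approach, not a drop-in replacement: `_ConvertLogSequential` is a hypothetical helper name, it assumes each field name appears at the very start of its line, and it writes only the four fields shown in the sample.

```autoit
; Sketch: convert the log to CSV by reading it sequentially through one handle.
; _ConvertLogSequential is a hypothetical helper name, not part of AutoIt.
; Assumes each field name starts its line. Returns the number of CSV rows
; written, or -1 if a file could not be opened.
Func _ConvertLogSequential($sInFile, $sOutFile)
    Local $hIn = FileOpen($sInFile, 0)   ; 0 = read mode
    If $hIn = -1 Then Return -1
    Local $hOut = FileOpen($sOutFile, 2) ; 2 = write mode, erasing previous contents
    If $hOut = -1 Then
        FileClose($hIn)
        Return -1
    EndIf

    FileWriteLine($hOut, "Timestamp,Message,Category,Win32 ThreadId")
    Local $sLine, $sRow = "", $iRows = 0
    While 1
        $sLine = FileReadLine($hIn) ; no line number: continue after the previous read
        If @error Then ExitLoop     ; -1 = end of file (any other error also stops)
        Select
            Case StringLeft($sLine, 10) = "Timestamp:"
                $sRow = StringStripWS(StringTrimLeft($sLine, 10), 3) & ","
            Case StringLeft($sLine, 8) = "Message:"
                $sRow &= StringStripWS(StringTrimLeft($sLine, 8), 3) & ","
            Case StringLeft($sLine, 9) = "Category:"
                $sRow &= StringStripWS(StringTrimLeft($sLine, 9), 3) & ","
            Case StringLeft($sLine, 15) = "Win32 ThreadId:"
                $sRow &= StringStripWS(StringTrimLeft($sLine, 15), 3)
                FileWriteLine($hOut, $sRow) ; FileWriteLine appends the line break
                $iRows += 1
        EndSelect
    WEnd

    FileClose($hIn)
    FileClose($hOut)
    Return $iRows
EndFunc
```

Usage would be something like `_ConvertLogSequential($xnFile, $xnTargetFile)`. Updating a progress bar or writing to the console on every line also costs time, so if those are kept it may help to do them only every few hundred lines.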


3 Answers


I'm not sure it will actually be faster, but you could use a regular expression. Could you tell me more about the rules here:

    Case StringInStr($xnCurrentLine, "Timestamp:")
        $xnTargetLine = StringMid($xnCurrentLine, 12, StringLen($xnCurrentLine) - 12 + 1) & ","
    Case StringInStr($xnCurrentLine, "Message:")
        $xnTargetLine = $xnTargetLine & StringMid($xnCurrentLine, 10, StringLen($xnCurrentLine) - 10 + 1) & ","
    Case StringInStr($xnCurrentLine, "Category:")
        $xnTargetLine = $xnTargetLine & StringMid($xnCurrentLine, 11, StringLen($xnCurrentLine) - 11 + 1) & ","
    Case StringInStr($xnCurrentLine, "Win32 ThreadId:")
        $xnTargetLine = $xnTargetLine & StringMid($xnCurrentLine, 16, StringLen($xnCurrentLine) - 16 + 1) & @CRLF
        FileWriteLine($xnTargetFile, $xnTargetLine)
    Case Else
        ConsoleWrite("Nothing on line " & $i & @CRLF)

If you can give me 2 or 3 sample lines, I can try to write a regexp function for you, which I think will be much faster.

Edit:

I made a sample script. If the input file looks like this:

Timestamp: 9/26/2013 3:33:23 AM
Message: Log Event Received
Category: Transaction
Win32 ThreadId:2872

then this script works fine:

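#include <Array.au3>

; Read the whole input file into memory in one call
Local $file = FileOpen("InputFile.txt", 0)
Local $sText = FileRead($file)
FileClose($file)

; Flag 3 returns an array with every match of the capture group (the text after each field name)
Local $aSnippets = StringRegExp($sText, "(?:Timestamp:|Message:|Category:|Win32 ThreadId:)(?: )?(.+)", 3)
_ArrayDisplay($aSnippets)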

The result is an array containing:

[0] = 9/26/2013 3:33:23 AM
[1] = Log Event Received
[2] = Transaction
[3] = 2872
etc.

If you want to combine each group of 4 values into one line, try a For loop (I can write one for you if you like).
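Such a For loop could look roughly like the sketch below. It assumes every record contributes exactly four captured values in the order Timestamp, Message, Category, Win32 ThreadId; if a field is missing from any record, the groups of four shift and the columns no longer line up. `_SnippetsToCsv` is a hypothetical helper name.

```autoit
; Sketch: join the flat array returned by StringRegExp(..., 3) into CSV rows,
; four values per row. _SnippetsToCsv is a hypothetical helper name.
Func _SnippetsToCsv($aSnippets)
    Local $sCsv = "Timestamp,Message,Category,Win32 ThreadId" & @CRLF
    ; Step through the array four elements at a time; an incomplete trailing group is ignored.
    ; StringStripWS(..., 3) trims leading/trailing whitespace, including a stray @CR.
    For $i = 0 To UBound($aSnippets) - 4 Step 4
        $sCsv &= StringStripWS($aSnippets[$i], 3) & "," & _
                StringStripWS($aSnippets[$i + 1], 3) & "," & _
                StringStripWS($aSnippets[$i + 2], 3) & "," & _
                StringStripWS($aSnippets[$i + 3], 3) & @CRLF
    Next
    Return $sCsv
EndFunc
```

With `$aSnippets` from the script above, `FileWrite("output.csv", _SnippetsToCsv($aSnippets))` would write the whole result in one call ("output.csv" is a placeholder path). Values containing commas would still need quoting for a strict CSV reader.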

For 100 lines, it took 0.490570878768441 milliseconds to store every value in an array.
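A figure like this can be checked with AutoIt's built-in `TimerInit` and `TimerDiff`. The sketch below (hypothetical name `_TimeRegExpExtract`) times only the `StringRegExp` call on text that is already in memory, so the time spent reading the file is not included; results will vary with the machine and the input.

```autoit
; Sketch: time the StringRegExp extraction with TimerInit/TimerDiff.
; _TimeRegExpExtract is a hypothetical helper name.
; Returns the elapsed time in milliseconds; $iCount receives the number of captured values.
Func _TimeRegExpExtract($sText, ByRef $iCount)
    Local $hTimer = TimerInit()
    Local $aSnippets = StringRegExp($sText, "(?:Timestamp:|Message:|Category:|Win32 ThreadId:)(?: )?(.+)", 3)
    Local $fElapsed = TimerDiff($hTimer)
    $iCount = UBound($aSnippets) ; 0 when there was no match (no array returned)
    Return $fElapsed
EndFunc
```

For example, after `Local $iCount = 0`, `ConsoleWrite(_TimeRegExpExtract(FileRead("InputFile.txt"), $iCount) & " ms for " & $iCount & " values" & @CRLF)` prints the measurement.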

answered 2013-09-26T20:46:24.000

(I wanted to add a comment asking for a sample of the data being read in, but I don't have enough reputation yet...)

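Depending on the size of the input file, I'd suggest using _FileReadToArray() to read the whole file into an array in one go, then looping through the array in memory (instead of keeping the file open for the whole process). I also wouldn't write to the output file for every record; I'd append to a string and save that string once at the end.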

Something like:

#include <File.au3>

; $xnFile and $xnTargetFile come from the question's script
Local $outputFileData = ""
Local $inputFileData, $tmpLine

; _FileReadToArray fills the array passed by reference; element [0] holds the line count
_FileReadToArray($xnFile, $inputFileData)

For $Counter = 1 To $inputFileData[0]

    $tmpLine = $inputFileData[$Counter]

    Select
        Case StringInStr($tmpLine, "Timestamp:")
            $outputFileData &= StringTrimLeft($tmpLine, 11) & ","     ; drop "Timestamp: "

        Case StringInStr($tmpLine, "Message:")
            $outputFileData &= StringTrimLeft($tmpLine, 9) & ","      ; drop "Message: "

        Case StringInStr($tmpLine, "Category:")
            $outputFileData &= StringTrimLeft($tmpLine, 10) & ","     ; drop "Category: "

        Case StringInStr($tmpLine, "Win32 ThreadId:")
            $outputFileData &= StringTrimLeft($tmpLine, 15) & @CRLF   ; end of one record

        Case Else
            ConsoleWrite("Nothing on line " & $Counter & @CRLF)
    EndSelect

Next

; $outputFileData already ends with @CRLF, so FileWrite avoids an extra blank line
FileWrite($xnTargetFile, $outputFileData)

(Note that I haven't included any error checking, and I haven't tested this either :)

answered 2013-09-26T20:56:22.630

One more possible idea.

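You could copy the input file under a new name and then delete everything in it that isn't useful data. That would be very easy with a regular expression, and might even be faster.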
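A minimal sketch of that idea, assuming the useful lines are exactly the ones that start with `Timestamp:`, `Message:`, `Category:` or `Win32 ThreadId:`. Rather than copying and renaming the file first, it works on the text in memory: `StringRegExpReplace` drops every other line (including blank ones) and the caller writes what is left to a new file. `_KeepUsefulLines` is a hypothetical helper name. The result still has one field per line, so it would need a separate step to be joined into CSV rows.

```autoit
; Sketch: keep only the lines that begin with one of the four field names.
; _KeepUsefulLines is a hypothetical helper name.
Func _KeepUsefulLines($sText)
    ; (?m) makes ^ match at every line start; the negative lookahead selects lines that
    ; do NOT start with a wanted field name, and .*\R? removes such a line plus its line break.
    Return StringRegExpReplace($sText, "(?m)^(?!Timestamp:|Message:|Category:|Win32 ThreadId:).*\R?", "")
EndFunc
```

For example, `FileWrite("filtered.txt", _KeepUsefulLines(FileRead("InputFile.txt")))` would produce the trimmed copy (both file names are placeholders).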

If you show me a sample of the input file and what the output file should look like, I can give it a try :)

answered 2013-09-26T22:04:17.647