I have several files (3 to 5) that I need to compare:
File 1.txt has 1 million strings.
File 2.txt has 10 million strings.
File 3.txt has 5 million strings.
All of these files are compared against keys.txt (10 thousand strings). If a line from the currently open file matches one of the lines in keys.txt, that line is written to output.txt.

This is what I have now:

function Thread.checkKeys(sLine: string): boolean;
var
  SR: TStreamReader;
  line: string;
begin
  Result := false;
  SR := TStreamReader.Create(sKeyFile); // sKeyFile - Path to file keys.txt
  try
    while (not(SR.EndOfStream)) and (not(Result))do
      begin
        line := SR.ReadLine;
        if LowerCase(line) = LowerCase(sLine) then
          begin
            saveStr(sLine);
            inc(iMatch);
            Result := true;
          end;
      end;
  finally
    SR.Free;
  end;
end;

procedure Thread.saveStr(sToSave: string);
var
  fOut: TStreamWriter;
begin
  fOut := TStreamWriter.Create('output.txt', true, TEncoding.UTF8);
  try
    fOut.WriteLine(sToSave);
  finally
    fOut.Free;
  end;
end;

procedure Thread.updateFiles;
begin
  fmMain.flDone.Caption := IntToStr(iFile);
  fmMain.flMatch.Caption := IntToStr(iMatch);
end;

And this is the loop that calls it:

    fInput := TStreamReader.Create(tsFiles[iCurFile]);
    while not(fInput.EndOfStream) do
      begin
        sInput := fInput.ReadLine;
        checkKeys(sInput);
      end;
    fInput.Free;
    iFile := iCurFile + 1;
    Synchronize(updateFiles);

Comparing these 3 files against keys.txt takes about 4 hours. How can I reduce the comparison time?

2 Answers

An easy solution is to use an associative container to store your keys. This can provide efficient lookup.

In Delphi you can use TDictionary<TKey,TValue> from Generics.Collections. This container hashes its keys, so a lookup takes O(1) time on average regardless of how many keys it holds.

Declare the container like this:

Keys: TDictionary<string, Boolean>; 
// doesn't matter what type you use for the value, we pick Boolean since we
// have to pick something

Create and populate it like this:

Keys := TDictionary<string, Boolean>.Create;
SR := TStreamReader.Create(sKeyFile);
try
  while not SR.EndOfStream do
    Keys.Add(LowerCase(SR.ReadLine), True); 
    // exception raised if duplicate key found
finally
  SR.Free;
end;

Then your checking function becomes:

function Thread.checkKeys(const sLine: string): boolean;
begin
  Result := Keys.ContainsKey(LowerCase(sLine));
  if Result then 
  begin
    saveStr(sLine);
    inc(iMatch);
  end;
end;
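
To show how the pieces could fit together, here is a minimal console sketch, not the asker's actual thread code. The helper names LoadKeys and FilterFile and the input file name '1.txt' are assumptions made for illustration. It loads the keys once into the dictionary and also keeps output.txt open for the whole run instead of reopening it for every match.

```pascal
program KeyFilterSketch;

{$APPTYPE CONSOLE}

uses
  // Scoped unit names (XE2 and later); older versions use SysUtils, Classes, Generics.Collections
  System.SysUtils, System.Classes, System.Generics.Collections;

// Hypothetical helper: reads every line of AKeyFile into a dictionary, lower-cased.
function LoadKeys(const AKeyFile: string): TDictionary<string, Boolean>;
var
  SR: TStreamReader;
begin
  Result := TDictionary<string, Boolean>.Create;
  try
    SR := TStreamReader.Create(AKeyFile);
    try
      while not SR.EndOfStream do
        // AddOrSetValue tolerates duplicate keys, whereas Add would raise an exception
        Result.AddOrSetValue(LowerCase(SR.ReadLine), True);
    finally
      SR.Free;
    end;
  except
    Result.Free;
    raise;
  end;
end;

// Hypothetical helper: writes every line of AInputFile that is a key to AOut
// and returns the number of matches.
function FilterFile(const AInputFile: string;
  AKeys: TDictionary<string, Boolean>; AOut: TStreamWriter): Integer;
var
  SR: TStreamReader;
  Line: string;
begin
  Result := 0;
  SR := TStreamReader.Create(AInputFile);
  try
    while not SR.EndOfStream do
    begin
      Line := SR.ReadLine;
      // LowerCase only folds ASCII letters, the same as in the original code
      if AKeys.ContainsKey(LowerCase(Line)) then
      begin
        AOut.WriteLine(Line);
        Inc(Result);
      end;
    end;
  finally
    SR.Free;
  end;
end;

var
  Keys: TDictionary<string, Boolean>;
  OutFile: TStreamWriter;
begin
  Keys := LoadKeys('keys.txt');
  try
    // Open the output once, in append mode, for the whole run
    OutFile := TStreamWriter.Create('output.txt', True, TEncoding.UTF8);
    try
      Writeln(FilterFile('1.txt', Keys, OutFile), ' matches');
    finally
      OutFile.Free;
    end;
  finally
    Keys.Free;
  end;
end.
```

With this shape, each input line costs one hash lookup rather than a full pass over keys.txt, so the running time should be dominated by reading the input files.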
Answered 2013-10-16T08:56:37.157

First of all, you should load keys.txt into memory once, for example into a TStringList, instead of reading the keys from the file for every line. Second, in a loop that runs this many times you should avoid procedure/function calls and do all the checks inline.

Something like this:

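   Keys := TStringList.Create;
   fInput := nil;
   fOut := nil;
   try
     // Load the keys once, before the loop
     Keys.LoadFromFile('keys.txt');

     fInput := TStreamReader.Create(tsFiles[iCurFile]);
     // Open the output file once instead of once per match
     fOut := TStreamWriter.Create('output.txt', true, TEncoding.UTF8);
     while not fInput.EndOfStream do
     begin
       sInput := fInput.ReadLine;
       // TStringList compares case-insensitively by default
       if Keys.IndexOf(sInput) >= 0 then
       begin
         fOut.WriteLine(sInput);
         Inc(iMatch);
       end;
     end;
     iFile := iCurFile + 1;
     Synchronize(updateFiles);
   finally
     // Free is safe to call on nil references
     fOut.Free;
     fInput.Free;
     Keys.Free;
   end;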
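
One possible refinement of the TStringList approach: IndexOf on an unsorted list scans it linearly, so every input line can cost up to 10 thousand comparisons. Setting Sorted to True makes IndexOf use a binary search instead. The sketch below uses three made-up sample keys in place of keys.txt, purely for illustration.

```pascal
program SortedKeysSketch;

{$APPTYPE CONSOLE}

uses
  // Scoped unit names (XE2 and later); older versions use SysUtils, Classes
  System.SysUtils, System.Classes;

var
  Keys: TStringList;
begin
  Keys := TStringList.Create;
  try
    // Sample keys standing in for Keys.LoadFromFile('keys.txt')
    Keys.Add('delta');
    Keys.Add('Alpha');
    Keys.Add('charlie');

    // Sorting the list once lets IndexOf use a binary search
    Keys.Sorted := True;

    // CaseSensitive is False by default, so 'ALPHA' is found as 'Alpha'
    Writeln(Keys.IndexOf('ALPHA') >= 0);
    // 'bravo' is not in the list, so IndexOf returns -1
    Writeln(Keys.IndexOf('bravo') >= 0);
  finally
    Keys.Free;
  end;
end.
```

A hashed container would still be expected to do fewer comparisons per lookup, but a sorted list keeps the code close to the version above.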
Answered 2013-10-16T08:46:56.037