delphi - How can I get a PChar to go past hex 00 to the end of the file in Delphi?

Question

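I am parsing very large files (Unicode - Delphi 2009), and I have a very efficient routine that does this with PChar variables, as outlined in the Stack Overflow question: What is the fastest way to parse a line in Delphi?

Everything worked fine until I ran into a file with some embedded hex:00 characters in it. This character signals the end of a PChar string, so my parsing stops at that point.

However, when you load the file like this: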

FileStream := TFileStream.Create(Filename, fmOpenRead or fmShareDenyWrite);
Size := FileStream.Size;

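you find that the file is much larger than that. If you open the file with Notepad, it loads to the end of the file instead of stopping at the first hex:00 as the PChar does.

How can I read to the end of the file while still parsing with PChar, without slowing down my reading and parsing?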

score 5 · Accepted Answer

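The accepted code in your other question breaks when it reaches a #0 character. To fix that, you only need to save the length of the input and check against it. The updated code looks like this: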

type
  TLexer = class
  private
    FData: string;
    FTokenStart: PChar;
    FCurrPos: PChar;
    FEndPos: PChar;                                         // << New
    function GetCurrentToken: string;
  public
    constructor Create(const AData: string);
    function GetNextToken: Boolean;
    property CurrentToken: string read GetCurrentToken;
  end;

{ TLexer }

constructor TLexer.Create(const AData: string);
begin
  FData := AData;
  FCurrPos := PChar(FData);
  FEndPos := FCurrPos + Length(AData);                      // << New
end;

function TLexer.GetCurrentToken: string;
begin
  SetString(Result, FTokenStart, FCurrPos - FTokenStart);
end;

function TLexer.GetNextToken: Boolean;
var
  cp: PChar;
begin
  cp := FCurrPos; // copy to local to permit register allocation

  // skip whitespace
  while (cp <> FEndPos) and (cp^ <= #32) do                 // << Changed
    Inc(cp);

  // terminate at end of input
  Result := cp <> FEndPos;                                  // << Changed

  if Result then
  begin
    FTokenStart := cp;
    Inc(cp);
    while (cp <> FEndPos) and (cp^ > #32) do                // << Changed
      Inc(cp);
  end;

  FCurrPos := cp;
end;
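As a quick check of the end-pointer test, here is a minimal sketch. CountTokens is a hypothetical helper, not part of the lexer above; it applies the same `cp <> EndPos` comparison to count tokens in a string that contains an embedded #0. The `{$IFDEF FPC}` line is only there in case the sketch is compiled with Free Pascal.

```pascal
program EndPointerDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper (not from the answer): counts whitespace-separated
// tokens using an end pointer instead of the #0 terminator.
function CountTokens(const S: string): Integer;
var
  cp, EndPos: PChar;
begin
  Result := 0;
  cp := PChar(S);
  EndPos := cp + Length(S); // one past the last real character
  while cp <> EndPos do
  begin
    // skip whitespace and control characters; #0 falls in this range too
    while (cp <> EndPos) and (cp^ <= #32) do
      Inc(cp);
    if cp = EndPos then
      Break;
    // found the start of a token; scan to its end
    Inc(Result);
    while (cp <> EndPos) and (cp^ > #32) do
      Inc(cp);
  end;
end;

begin
  // The literal contains an embedded #0 between 'two' and 'three'.
  Writeln(CountTokens('one two'#0'three four')); // prints 4
end.
```

Note that with the `cp^ <= #32` test, an embedded #0 is simply treated like whitespace, so it separates tokens rather than ending the input.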

score 2 · Accepted Answer

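If you reach a #0 character but you haven't consumed all the characters in the file yet, then keep going. How you keep going depends on how you decided to stop in the first place.

The question you cited has the following code: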

while (cp^ > #0) and (cp^ <= #32) do
  Inc(cp);

// using null terminator for end of file
Result := cp^ <> #0;

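That obviously stops at a null character. If you don't want a null character to signal the end of the file, then don't stop at null characters. Stop when you have consumed all the characters instead. You need to know how many characters to expect, and keep track of how many you have seen.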

nChars := Length(FData);
nCharsSeen := 0;
while (nCharsSeen < nChars) and (cp^ <= #32) do begin
  Inc(cp);
  Inc(nCharsSeen);
end;

// using character count for end of file
Result := nCharsSeen < nChars;

The cited answer was parsing a string, so I used Length to find out the number of characters. If you're parsing a file, use something like TFileStream.Size instead.
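One way to connect the two is to size the string from the stream, so that Length reflects the whole file. The following is a minimal sketch, not a definitive loader: SaveRaw and LoadRaw are hypothetical helpers, and it assumes the file holds raw characters in the same layout as Delphi's Char (no BOM handling, no encoding conversion) and is small enough to fit in memory (under 2 GB).

```pascal
program LoadWholeFileDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: writes the raw characters of Data to FileName.
procedure SaveRaw(const FileName, Data: string);
var
  FS: TFileStream;
begin
  FS := TFileStream.Create(FileName, fmCreate);
  try
    if Data <> '' then
      FS.WriteBuffer(Pointer(Data)^, Length(Data) * SizeOf(Char));
  finally
    FS.Free;
  end;
end;

// Hypothetical helper: reads the whole file into a string whose length
// is derived from the stream size, so embedded #0 characters are kept.
function LoadRaw(const FileName: string): string;
var
  FS: TFileStream;
  CharCount: Integer;
begin
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    // Assumption: the file is below 2 GB, so the count fits in an Integer.
    CharCount := Integer(FS.Size div SizeOf(Char));
    SetLength(Result, CharCount);
    if CharCount > 0 then
      FS.ReadBuffer(Pointer(Result)^, CharCount * SizeOf(Char));
  finally
    FS.Free;
  end;
end;

var
  S: string;
begin
  SaveRaw('nul_demo.tmp', 'ab'#0'cd');
  S := LoadRaw('nul_demo.tmp');
  Writeln(Length(S));        // 5: the embedded #0 is counted
  Writeln(StrLen(PChar(S))); // 2: a null-terminated scan stops early
  DeleteFile('nul_demo.tmp');
end.
```

With the data in a string like this, Length(S) gives the character count to compare against, and PChar(S) still gives a pointer for fast scanning.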

score 1 · Accepted Answer

I took the code from your previously accepted answer and modified it slightly by adding two variables:

FPosInt: NativeUInt;
FSize: NativeUInt;

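FSize is initialized in the constructor with the length of the string (a string variable stores its length, whereas a PChar does not). FPosInt is the index of the current character in the file. The additional code in the constructor: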

FSize := Length(FData);
FPosInt := 0;

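The relevant part of the GetNextToken function then no longer stops at the first zero byte, but continues until it has reached the last character of the string: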

// skip whitespace; this test could be converted to an unsigned int
// subtraction and compare for only a single branch
while (cp^ <= #32) and (FPosInt < FSize) do
  begin
  Inc(cp);
  Inc(FPosInt);
  end;

// end of file is reached if the position counter has reached the filesize
Result := FPosInt < FSize;

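I swapped the two conditions in the while test because they are evaluated from left to right, and the first one will evaluate to false more often. Reading cp^ before the bounds check is safe here because a Delphi string always has a terminating #0 after its last character.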

An alternative approach does not count characters, but saves the starting position of the pointer instead. In the constructor:

FSize := Length(FData);
FStartPos := NativeUInt(FCurrPos);

And in GetNextToken:

// skip whitespace; this test could be converted to an unsigned int
// subtraction and compare for only a single branch
while (cp^ <= #32) and ((NativeUInt(cp) - FStartPos) < FSize) do
  Inc(cp);

// end of file is reached if the position counter has reached the filesize
Result := (NativeUInt(cp) - FStartPos) < FSize;
