python - 在内存中移动字节而不替换现有字节并且没有大缓冲区

Question

如何通过保留现有数据和使用与 CPU 寄存器大小匹配的缓冲区将内存中的一些字节从一个位置移动到另一个位置？

更精确的公式：

我正在用 FreePascal 编写一些代码（为了好玩）。目前我需要将一些字节移动到其他地方的功能。内置函数 system.Move 做的很粗鲁——它不关心在移动和覆盖时保留目标地址中的数据。当然，我可以使用缓冲区保存数据，然后使用移动功能从缓冲区恢复数据。但是当移动大量数据时，它需要一个很大的缓冲区。我想避免它并使用与 CPU 寄存器大小匹配的缓冲区。

假设我们总是从较低位置移动到较高位置（Pos1 < Pos2），我需要的示例。从位置 2 移动 3 个字节到位置 7：

在此处输入图像描述

我可以使用字节大小的缓冲区来做到这一点（→ 表示写入，↔ 表示交换值）：

    7 → Buffer
    2 → 7
    Buffer ↔ 4
    Buffer ↔ 9
    Buffer ↔ 6
    Buffer ↔ 3
    Buffer ↔ 8
    Buffer ↔ 5
    Buffer ↔ 2

更大的例子：将 3 个字节从位置 3 移动到位置 15

在此处输入图像描述

算法现在看起来像这样：

    15 → Buffer
    3 → 15
    Buffer ↔ 12
    Buffer ↔ 9
    Buffer ↔ 6
    Buffer ↔ 3

    16 → Buffer
    4 → 16
    Buffer ↔ 13
    Buffer ↔ 10
    Buffer ↔ 7
    Buffer ↔ 4

    17 → Buffer
    5 → 17
    Buffer ↔ 14
    Buffer ↔ 11
    Buffer ↔ 8
    Buffer ↔ 5

在前面的示例中，有一大步 - 我们使用一个操作序列移动所有内容，但这里有 3 个大步骤。

以我不明白的方式-如此大的步数似乎等于（Pos2-Pos1）和长度的GCD（最大公约数）。

我编写了一些 python 代码，似乎为给定的移动请求提供了正确的操作顺序

# -*- coding: utf-8 -*-
def func1(Pos1, Pos2, Length):
    Delta = Pos2 - Pos1;
    j = Pos2;
    p1 = Pos1;
    p2 = Pos2;
    Step = 0;
    SubStep = 0;
    while (Step < Delta + Length):
        Step = Step + 1;
        SubStep = SubStep + 1;
        print(" %d → Buffer"%j);
        print(" %d → %d"%(p1,j));
        while 1:
            Step = Step + 1;
            if (j + Delta < Pos2 + Length):
                j = j + Delta;
            else:
                j = j - Length;
            print(" Buffer ↔ %d"%(j));
            if (j == p1):
                p1 = p1 + 1;
                p2 = p2 + 1;
                j = p2;
                break;
    return SubStep;

假设这是正确的，则存在一个巨大的问题-该算法处理速度很慢的字节操作，并且由于我有 amd64-我想让每个操作使用 8 个字节。

我要怎么做？

score 0 · Accepted Answer

我自己的问题中描述的问题似乎可以通过两种简单的方式解决（查看有问题的图片）：

1）你有橙色和绿色块，你需要交换它们 - 将橙色块放在缓冲区中（因为它更小），移动绿色块，然后他们从缓冲区中获取橙色块。

权衡：简单、明显、快速的小块

问题：如果橙色和绿色块的大小相同 - 您将需要最大的缓冲区，并且可能会占用大量内存

2）有问题描述的使用方式。您将能够使用不大于两个块大小（橙色和绿色）的 GCD 的缓冲区。因此，最好使用两个缓冲区：一个具有 CPU 寄存器大小（我的 amd64 为 8 个字节）和一个字节大小的缓冲区。因此，首先您使用（GCD div 8）（对于 64 位系统）步骤使用寄存器大小的缓冲区移动（或移位），然后完成字节大小的缓冲区。

权衡：不需要大缓冲

问题：如果两个块大小的 GCD 小于寄存器大小 - 我们会非常慢（因为在这种情况下只有字节大小的移位操作可用）。一点都不明显。

我做了两个函数来测试这两个解决方案

// License: LGPL
//──────────────────────────────────────────────────────────────────────────────
procedure MemoryMove(const Pos1, Pos2: Pointer; const Len: PtrUInt);
  var
    // Memory addresses
    MemPos1:     PtrUInt;  // bottom block address
    MemPos2:     PtrUInt;  // top block address
    CurPos:      PtrUInt;  // current position
    CurPos1:     PtrUInt;  // current position 1
    CurPos2:     PtrUInt;  // current position 2
    CurPosLimit: PtrUInt;
    // Memory sizes
    MemSize1:    PtrUInt;  // size of bottom memory block
    MemSize2:    PtrUInt;  // size of top memory block
    // Counters
    BufSizeReq:  PtrUInt;  // buffer size required
    // Counters for buffers
    BufCount:    PtrUInt;
    // Mini buffers
    BufReg:      PtrUInt;  // buffer match pointer (registers) size
    BufByte:     Int8U;    // byte sized buffer
    // Temporary variables
    TempReg:     PtrUInt;
    TempByte:    Int8U;
    // Variables for iterations
    Count: IntU;
  begin
    // This code is not obvious, see manual or other documentation
    // http://stackoverflow.com/questions/18539341/move-bytes-in-memory-without-replacing-existing-bytes-and-without-big-buffer

    // Check parameters
    // for being out of memory bounds
    if (PtrUInt(Pos1) > High(PtrUInt) - Len) OR (PtrUInt(Pos2) > High(PtrUInt) - Len) then
    begin
      raise Exception.Create(rsMemOutOfBounds);
    end;
    // Does it make sense to do anything?
    if (Pos1 = Pos2) OR (Len = 0) then
    begin
      Exit;
    end;

    // Move operation interpreted as cycle shifting from low to high address

    // Convert parameters of Move operation to cycle shifting
    if Pos1 < Pos2 then
    begin
      MemPos1  := PtrUInt(Pos1);
      MemPos2  := PtrUInt(Pos2);
      MemSize1 := Len;
      MemSize2 := PtrUInt(Pos2) - PtrUInt(Pos1);
    end
    else
    begin
      MemPos1  := PtrUInt(Pos2);
      MemPos2  := PtrUInt(Pos2) + Len;
      MemSize1 := PtrUInt(Pos1) - PtrUInt(Pos2);
      MemSize2 := Len;
    end;

    // Buffer can't be bigger than GCD of two block sizes
    BufSizeReq := GCD(MemSize1, MemSize2);

    // Prepare some variables
    CurPos1  := MemPos1;
    CurPos2  := MemPos2;
    BufCount := 0;

    // Shift with different buffer sizes
    // Start with biggest, end with smallest

    // Using pointer-size buffer

    // Calculate amount of mini buffers needed to fit required buffer size
    if SizeOf(Pointer) >= SizeOf(BufReg) then
    begin
      BufCount   := BufSizeReq div SizeOf(BufReg);
      BufSizeReq := BufSizeReq - BufCount * SizeOf(BufReg);
    end;
    // Shift data with current buffer size and count
    Count := 1;
    while Count <= BufCount do
    begin
      // First step
      BufReg := PtrUInt(Pointer(CurPos2)^);
      PtrUInt(Pointer(CurPos2)^) := PtrUInt(Pointer(CurPos1)^);
      // Next steps done in loop
      CurPos      := CurPos2;
      CurPosLimit := CurPos2 - MemSize2 + MemSize1;
      repeat
        // Calculate where current element from buffer should be placed
        if (CurPos < CurPosLimit) then
        begin
          CurPos := CurPos + MemSize2;
        end
        else
        begin
          CurPos := CurPos - MemSize1;
        end;
        // Exchange value from buffer with current memory position
        TempReg := BufReg;
        BufReg  := PtrUInt(Pointer(CurPos)^);
        PtrUInt(Pointer(CurPos)^) := TempReg;
      until CurPos = CurPos1; // very bad condition - one mistake and loop will be infinite
      CurPos1 := CurPos1 + SizeOf(BufReg);
      CurPos2 := CurPos2 + SizeOf(BufReg);
      Count   := Count + 1;
    end;

    // NOTE: below code is copy-paste+edit of code above
    // It's very bad for readability, but good for speed

    // Using one-byte-size buffer
    // This **IS** necessarily

    // Calculate amount of mini buffers needed to fit required buffer size
    BufCount   := BufSizeReq;
    // Shift data with current buffer size and count
    Count := 1;
    while Count <= BufCount do
    begin
      // First step
      BufByte := Int8U(Pointer(CurPos2)^);
      Int8U(Pointer(CurPos2)^) := Int8U(Pointer(CurPos1)^);
      // Next steps done in loop
      CurPos      := CurPos2;
      CurPosLimit := CurPos2 - MemSize2 + MemSize1;
      repeat
        // Calculate where current element from buffer should be placed
        if (CurPos < CurPosLimit) then
        begin
          CurPos := CurPos + MemSize2;
        end
        else
        begin
          CurPos := CurPos - MemSize1;
        end;
        // Exchange value from buffer with current memory position
        TempByte := BufByte;
        BufByte  := Int8U(Pointer(CurPos)^);
        Int8U(Pointer(CurPos)^) := TempByte;
      until CurPos = CurPos1; // very bad condition - one mistake and loop will be infinite
      CurPos1 := CurPos1 + 1;
      CurPos2 := CurPos2 + 1;
      Count   := Count + 1;
    end;
  end;
//──────────────────────────────────────────────────────────────────────────────
procedure MemoryMoveBuf(const Pos1, Pos2: Pointer; const Len: PtrUInt);
  var
    Buf: Int8UAD; // buffer
    MemPos11, MemPos12, MemPos21, MemPos22: PtrUInt;
    MemLen1, MemLen2: PtrUInt;
  begin
    // Check parameters
    // for being out of memory bounds
    if (PtrUInt(Pos1) > High(PtrUInt) - Len) OR (PtrUInt(Pos2) > High(PtrUInt) - Len) then
    begin
      raise Exception.Create(rsMemOutOfBounds);
    end;
    // Does it make sense to do anything?
    if (Pos1 = Pos2) OR (Len = 0) then
    begin
      Exit;
    end;
    // There are two memory blocks
    // First is given in parameters
    MemPos11 := PtrUInt(Pos1);
    MemPos12 := PtrUInt(Pos2);
    MemLen1  := Len;
    // The second one must be calculated
    if Pos2 > Pos1 then
    begin
      MemPos21 := PtrUInt(Pos1) + Len;
      MemPos22 := PtrUInt(Pos1);
      MemLen2  := PtrUInt(Pos2) - PtrUInt(Pos1);
    end
    else
    begin
      MemPos21 := PtrUInt(Pos2);
      MemPos22 := PtrUInt(Pos2) + Len;
      MemLen2  := PtrUInt(Pos1) - PtrUInt(Pos2);
    end;
    // Moving
    try
      // Use smaller buffer
      if MemLen1 <= MemLen2 then
      begin
        SetLength(Buf, MemLen1);
        system.Move(Pointer(MemPos11)^, Pointer(Buf)^, MemLen1);
        system.Move(Pointer(MemPos21)^, Pointer(MemPos22)^, MemLen2);
        system.Move(Pointer(Buf)^, Pointer(MemPos12)^, MemLen1);
      end
      else
      begin
        SetLength(Buf, MemLen2);
        system.Move(Pointer(MemPos21)^, Pointer(Buf)^, MemLen2);
        system.Move(Pointer(MemPos11)^, Pointer(MemPos12)^, MemLen1);
        system.Move(Pointer(Buf)^, Pointer(MemPos22)^, MemLen2);
      end;
    finally
      SetLength(Buf, 0);
    end;
  end;
//──────────────────────────────────────────────────────────────────────────────

对于测试，我创建了缓冲区，将下半部分移动到上半部分（交换缓冲区数据的一半） - 这是解决方案 1 的最坏情况。

我用相同的缓冲区大小重复了每一次移动测试以获得平均结果。

所以我从 512 字节缓冲区开始，将 256 字节从位置 0 移动到位置 256，重复此移动 1048576 次，然后使缓冲区大 2 倍，并减少重复 2 次，...，以 536870912 字节缓冲区结束，只有 1 次重复（总共 21 次大迭代）

这给出了良好的 GCD（GCD > 寄存器大小）。为了模拟解决方案 2 的最坏情况，当 (GCD = 1) 时，我只是将移动长度减少 1，因此在第一次迭代中，它从 pos 移动了 255 个字节。0到位置。256.

我很失望——我最喜欢的解决方案 2 的自写函数很慢，即使是 (GCD > 8)，当 (GCD = 1) 时——它非常慢。

图 1 此图显示了结果。OX - 缓冲区大小，OY - 时间（更少 = 更好）。实心黑色圆圈和粗线 - 解决方案 1。底部白色圆圈和细线是具有良好 GCD（GCD > 8，最佳情况）的解决方案 2，顶线 - 坏 GCD（GCD = 1，最坏情况）。灰色填充 - 解决方案 2 的中间值。

说明：解决方案1使用专业程序员在汇编程序中编写的“移动”功能，很难通过性能击败它。

几周后，我发现编译器没有给出最佳的汇编代码。通过一些额外的优化（-O3 -Or），结果会变得更好图 2

结论：我已经测试了这两个函数来给出相同的结果，但是当我做那个图形时没有，所以它可能都是错误的。

当您要移动的内存块之一很小时，解决方案 1 似乎非常好。但是它有一些瓶颈并且不是很线性（因此很难预测执行时间）。

当您受限于分配更多内存时，解决方案 2 似乎适用于移动大量内存。这是一个非常线性的并且可以通过执行时间来预测。但是可能非常慢。

python - 在内存中移动字节而不替换现有字节并且没有大缓冲区

1 回答 1

Related

Reference