performance - Packed large bit vector with efficient XOR and bit count in Haskell

Question

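I am looking for a data type that is efficient in both space and time, can hold a 384-bit vector, and supports efficient XOR and "bit count" (the number of bits set to 1) operations.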

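Below, please find my demo program. The operations I need are all in the SOQuestionOps type class, and I have implemented it for Natural and for Data.Vector.Unboxed.Bit. The latter in particular looks perfect, because it has a zipWords operation that should let me do the "bit count" and XOR word by word instead of bit by bit. It also claims to store the bits packed (8 bits per byte).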

{-# LANGUAGE FlexibleInstances #-}
import Data.Bits
import Data.List (foldl')
import Numeric.Natural
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed.Bit as BV

class SOQuestionOps a where
    soqoXOR :: a -> a -> a
    soqoBitCount :: a -> Int
    soqoFromList :: [Bool] -> a

alternating :: Int -> [Bool]
alternating n =
    let c = n `mod` 2 == 0
     in if n == 0
           then []
           else c : alternating (n-1)

instance SOQuestionOps Natural where
    soqoXOR = xor
    soqoBitCount = popCount
    soqoFromList v =
        let oneIdxs = map snd $ filter fst (zip v [0..])
         in foldl' (\acc n -> acc `setBit` n) 0 oneIdxs

instance SOQuestionOps (BV.Vector BV.Bit) where
    soqoXOR = BV.zipWords xor
    soqoBitCount = BV.countBits
    soqoFromList v = BV.fromList (map BV.fromBool v)

main =
    let initialVec :: BV.Vector BV.Bit
        initialVec = soqoFromList $ alternating 384
        lotsOfVecs = V.replicate 10000000 (soqoFromList $ take 384 $ repeat True)
        xorFolded = V.foldl' soqoXOR initialVec lotsOfVecs
        sumBitCounts = V.foldl' (\n v -> n + soqoBitCount v) 0 lotsOfVecs
     in putStrLn $ "folded bit count: " ++ show (soqoBitCount xorFolded) ++ ", sum: " ++ show sumBitCounts

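So let's calculate the best-case numbers: lotsOfVecs shouldn't need to allocate much, because it is just the same vector, initialVec, 10,000,000 times. The foldl' obviously creates one of these vectors per fold step, so it should create 10,000,000 bit vectors. The bit counting should create nothing but 10,000,000 Ints. So in the best case my program should use very little (and constant) memory, and the total allocations should be roughly 10,000,000 * sizeof(bit vector) + 10,000,000 * sizeof(int) = 520,000,000 bytes.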
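The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch of the reasoning: the names are made up for illustration, and the 48-byte vector and 4-byte Int sizes are assumptions inferred from the 520,000,000 figure, not measured values.

```haskell
-- Back-of-the-envelope estimate of the best-case total allocation.
-- Assumption: a packed 384-bit vector takes 384 / 8 = 48 bytes and each
-- Int result is counted as 4 bytes, which is what 520,000,000 implies.

bitsPerVector :: Int
bitsPerVector = 384

bytesPerVector :: Int
bytesPerVector = bitsPerVector `div` 8

bytesPerInt :: Int
bytesPerInt = 4

numVectors :: Int
numVectors = 10000000

-- One fresh bit vector per fold step plus one Int per bit count.
expectedAllocation :: Int
expectedAllocation = numVectors * bytesPerVector + numVectors * bytesPerInt
```

Evaluating expectedAllocation in GHCi gives 520000000. On a 64-bit GHC an Int is 8 bytes and every heap object also carries a header, so the real lower bound is somewhat higher than this idealised number.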

OK, let's run the program for Natural:

Let's make initialVec :: Natural and compile with

ghc --make -rtsopts -O3 MemStuff.hs

Results (this is with GHC 7.10.1):

$ ./MemStuff +RTS -sstderr
folded bit count: 192, sum: 3840000000
1,280,306,112 bytes allocated in the heap
201,720 bytes copied during GC
80,106,856 bytes maximum residency (2 sample(s))
662,168 bytes maximum slop
78 MB total memory in use (0 MB lost due to fragmentation)

Tot time (elapsed)  Avg pause  Max pause
Gen  0      2321 colls,     0 par    0.056s   0.059s     0.0000s    0.0530s
Gen  1         2 colls,     0 par    0.065s   0.069s     0.0346s    0.0674s

INIT    time    0.000s  (  0.000s elapsed)
MUT     time    0.579s  (  0.608s elapsed)
GC      time    0.122s  (  0.128s elapsed)
EXIT    time    0.000s  (  0.002s elapsed)
Total   time    0.702s  (  0.738s elapsed)

%GC     time      17.3%  (17.3% elapsed)

Alloc rate    2,209,576,763 bytes per MUT second

Productivity  82.7% of total user, 78.7% of total elapsed


real    0m0.754s
user    0m0.704s
sys 0m0.037s

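That is 1,280,306,112 bytes allocated in the heap, which is within range (2x) of the expected number. By the way, on GHC 7.8 this allocated 353,480,272,096 bytes and ran for absolute ages, because popCount on Natural is not very efficient on GHC 7.8.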
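The two numbers printed by the Natural run can be checked independently of the benchmark. The sketch below uses only base; allOnes, alternatingBits, foldedBitCount and sumBitCounts are illustrative names, and the bit patterns are assumed to match what alternating 384 and take 384 (repeat True) produce in the demo program.

```haskell
import Data.Bits (popCount, setBit, xor)
import Data.List (foldl')
import Numeric.Natural (Natural)

-- 384-bit value with every bit set.
allOnes :: Natural
allOnes = foldl' setBit 0 [0 .. 383]

-- 384-bit value with every other bit set (192 bits in total).
alternatingBits :: Natural
alternatingBits = foldl' setBit 0 [0, 2 .. 382]

-- XOR-fold n copies of allOnes into the alternating start value
-- and count the bits of the result.
foldedBitCount :: Int -> Int
foldedBitCount n = popCount (foldl' xor alternatingBits (replicate n allOnes))

-- Sum of the bit counts of n copies of allOnes.
sumBitCounts :: Int -> Int
sumBitCounts n = n * popCount allOnes
```

XORing with all ones flips every bit, so the folded value is either the alternating pattern or its complement, and both have 192 bits set. The sum is 384 per vector, which for 10,000,000 vectors is 3,840,000,000 and matches the output of the Natural run.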

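EDIT: I changed the code slightly. In the original version, every other vector in the fold was 0, which gave much better allocation figures for the Natural version. I changed it so that the vector alternates between different representations (with many bits set), and now we see 2x the expected allocations. That is another downside of Natural (and Integer): the allocation rate depends on the values.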

But maybe we can do better, so let's try the densely packed Data.Vector.Unboxed.Bit:

That means initialVec :: BV.Vector BV.Bit, then recompiling and rerunning with the same options.

$ time ./MemStuff +RTS -sstderr
folded bit count: 192, sum: 1920000000
75,120,306,536 bytes allocated in the heap
54,914,640 bytes copied during GC
80,107,368 bytes maximum residency (2 sample(s))
664,128 bytes maximum slop
78 MB total memory in use (0 MB lost due to fragmentation)

Tot time (elapsed)  Avg pause  Max pause
Gen  0     145985 colls,     0 par    0.543s   0.627s     0.0000s    0.0577s
Gen  1         2 colls,     0 par    0.065s   0.070s     0.0351s    0.0686s

INIT    time    0.000s  (  0.000s elapsed)
MUT     time   27.679s  ( 28.228s elapsed)
GC      time    0.608s  (  0.698s elapsed)
EXIT    time    0.000s  (  0.002s elapsed)
Total   time   28.288s  ( 28.928s elapsed)

%GC     time       2.1%  (2.4% elapsed)

Alloc rate    2,714,015,097 bytes per MUT second

Productivity  97.8% of total user, 95.7% of total elapsed


real    0m28.944s
user    0m28.290s
sys 0m0.456s

That is very slow and allocates roughly 100 times as much :(.

OK, then let's recompile and profile both runs (ghc --make -rtsopts -O3 -prof -auto-all -caf-all -fforce-recomp MemStuff.hs):

Natural version:

COST CENTRE         MODULE  %time %alloc
main.xorFolded      Main     51.7   76.0
main.sumBitCounts.\ Main     25.4   16.0
main.sumBitCounts   Main     12.1    0.0
main.lotsOfVecs     Main     10.4    8.0

Data.Vector.Unboxed.Bit version:

COST CENTRE         MODULE  %time %alloc
soqoXOR             Main     96.7   99.3
main.sumBitCounts.\ Main      1.9    0.2

Is Natural really the best option for a fixed-size bit vector? And what about GHC 6.8? Is there anything better that can implement my SOQuestionOps type class?

score 1 · Accepted Answer

Have a look at the Data.LargeWord module in the Crypto package:

http://hackage.haskell.org/package/Crypto-4.2.5.1/docs/Data-LargeWord.html

It provides Bits instances for large words of various sizes, e.g. 96 to 256 bits.
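For comparison, a fixed-width type can also be written by hand using only base. The sketch below is not the Data.LargeWord API and not something the answer proposes; Bits384, xor384, bitCount384 and fromList384 are hypothetical names chosen for illustration. It stores the 384 bits as six strict Word64 fields, so XOR and bit count each touch six machine words.

```haskell
import Data.Bits (popCount, setBit, xor)
import Data.List (foldl')
import Data.Word (Word64)

-- Hypothetical fixed-size 384-bit vector: six strict, unpacked 64-bit words.
data Bits384 = Bits384
    {-# UNPACK #-} !Word64
    {-# UNPACK #-} !Word64
    {-# UNPACK #-} !Word64
    {-# UNPACK #-} !Word64
    {-# UNPACK #-} !Word64
    {-# UNPACK #-} !Word64
    deriving (Eq, Show)

-- Word-wise XOR: six 64-bit XORs, no per-bit work.
xor384 :: Bits384 -> Bits384 -> Bits384
xor384 (Bits384 a0 a1 a2 a3 a4 a5) (Bits384 b0 b1 b2 b3 b4 b5) =
    Bits384 (a0 `xor` b0) (a1 `xor` b1) (a2 `xor` b2)
            (a3 `xor` b3) (a4 `xor` b4) (a5 `xor` b5)

-- Number of set bits, computed with one popCount per word.
bitCount384 :: Bits384 -> Int
bitCount384 (Bits384 w0 w1 w2 w3 w4 w5) =
    popCount w0 + popCount w1 + popCount w2
        + popCount w3 + popCount w4 + popCount w5

-- Build from a list of Bool; list element i becomes bit i.
-- Elements beyond index 383 are ignored.
fromList384 :: [Bool] -> Bits384
fromList384 bs = Bits384 (word 0) (word 1) (word 2) (word 3) (word 4) (word 5)
  where
    oneIdxs = [i | (True, i) <- zip bs [0 .. 383 :: Int]]
    -- Collect the set bits that fall into the k-th 64-bit word.
    word k = foldl' setBit 0 [i - 64 * k | i <- oneIdxs, i `div` 64 == k]
```

These three functions line up with soqoXOR, soqoBitCount and soqoFromList, so an instance of SOQuestionOps for such a type would be a direct mapping. Whether it actually beats Natural or the Crypto package's large words in the benchmark above would have to be measured.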
