optimization - Creating a large Set - need to reduce time spent in GC

Question

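This program creates a very large set in order to find hash function collisions. Is there a way to reduce the time spent in GC? +RTS -s reports that 40+% of the time is spent in GC.

Example usage: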

./program 0 1000000 +RTS -s
./program 145168473 10200000 +RTS -s

Is there a better algorithm or data structure I could be using?

{-# LANGUAGE OverloadedStrings #-}

import System.Environment
import Control.Monad
import Crypto.Hash.SHA256

import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Char
import Data.Int
import Data.Bits
import Data.Binary
import Data.Set as Set
import Data.List
import Numeric

str2int :: (Integral a) => B.ByteString -> a
str2int bs = B.foldl (\a w -> (a * 256)+(fromIntegral $ ord w)) 0 bs

t50 :: Int64 -> Int64
t50 i = let h = hash $ B.concat $ BL.toChunks $ encode i
        in
          (str2int $ B.drop 25 h) .&. 0x3ffffffffffff

sha256 :: Int64 -> B.ByteString
sha256 i = hash $ B.concat $ BL.toChunks $ encode i

-- firstCollision :: Ord b => (a -> b) -> [a] -> Maybe a
firstCollision f xs = go f Set.empty xs
  where
    -- go :: Ord b => (a -> b) -> Set b -> [a] -> Maybe a
    go _ _ []     = Nothing
    go f s (x:xs) = let y = f x
                    in
                      if y `Set.member` s
                        then Just x
                        else go f (Set.insert y s) xs

showHex2 i
  | i < 16    = "0" ++ (showHex i "")
  | otherwise = showHex i ""

prettyPrint :: B.ByteString -> String
prettyPrint = concat . (Data.List.map showHex2) . (Data.List.map ord) . B.unpack


showhash inp =
  let  h = sha256 inp
       x = B.concat $ BL.toChunks $ encode inp
   in do putStrLn $ "  - input: " ++ (prettyPrint x) ++ " -- " ++ (show inp)
         putStrLn $ "  -  hash: " ++ (prettyPrint h)

main = do
         args <- getArgs
         let a = (read $ args !! 0) :: Int64
             b = (read $ args !! 1) :: Int64
             c = firstCollision t [a..(a+b)]
             t = t50
         case c of
           Nothing -> putStrLn "No collision found"
           Just x  -> do let h = t x
                         putStrLn $ "Found collision at " ++ (show x)
                         showhash x
                         let first = find (\x -> (t x) == h) [a..(a+b)]
                          in case first of
                               Nothing -> putStrLn "oops -- failed to find hash"
                               Just x0 -> do putStrLn $ "first instance at " ++ (show x0)
                                             showhash x0

score 4 · Accepted Answer

As you can see, the GC statistics report low productivity:

  44,184,375,988 bytes allocated in the heap
   1,244,120,552 bytes copied during GC
      39,315,612 bytes maximum residency (42 sample(s))
         545,688 bytes maximum slop
             109 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     81400 colls,     0 par    2.47s    2.40s     0.0000s    0.0003s
  Gen  1        42 colls,     0 par    1.06s    1.08s     0.0258s    0.1203s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    4.58s  (  4.63s elapsed)
  GC      time    3.53s  (  3.48s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    8.11s  (  8.11s elapsed)

  %GC     time      43.5%  (42.9% elapsed)

  Alloc rate    9,651,194,755 bytes per MUT second

  Productivity  56.5% of total user, 56.4% of total elapsed
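The two percentages in this report follow directly from the timing lines: productivity is MUT time over total time, and %GC is GC time over total time. A minimal sketch of that arithmetic, using the user-time figures from the output above (the helper name `percentOf` is made up for illustration):

```haskell
-- Share of the total, as a percentage rounded to one decimal place.
-- `percentOf` is a hypothetical helper, not part of any library.
percentOf :: Double -> Double -> Double
percentOf part total =
  fromIntegral (round (1000 * part / total) :: Integer) / 10

main :: IO ()
main = do
  -- User-time figures copied from the +RTS -s report above.
  let mut   = 4.58
      gc    = 3.53
      total = 8.11
  putStrLn ("Productivity: " ++ show (percentOf mut total) ++ "%")  -- 56.5%
  putStrLn ("%GC time:     " ++ show (percentOf gc total) ++ "%")   -- 43.5%
```

Since INIT and EXIT are both 0.00s here, the two figures add up to 100%.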

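The most obvious first step is to increase the GC's default allocation area, to try to eliminate the need for resizing. One trick, for example, is to increase the -A area (you can use tools like GC tune to find the right setting for your program).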

  $ ./A ... +RTS -s -A200M

  Total   time    7.89s  (  7.87s elapsed)

  %GC     time      26.1%  (26.5% elapsed)

  Alloc rate    7,581,233,460 bytes per MUT second

  Productivity  73.9% of total user, 74.1% of total elapsed

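So we shaved about a quarter of a second off the running time, but raised productivity to about 74%. Now we should look at the heap profile:

[Figure: heap profile]

This shows linear growth of the set and its Int values. That is what your algorithm specifies, though, so as long as you retain all the results, I don't see much you can do.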

score 2 · Accepted Answer

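One thing you do quite a lot is build up ByteStrings using the binary package (by the way, you can use cereal if you want to avoid going to and from lazy chunks). If you dig into the internals of the Builder monad they use, you will find that it has a default initial size of about 32k. For your purposes, given that you only need 8 bytes, this may put more pressure on the garbage collector than necessary.

Since you are really only using binary for the encoding, you can do it yourself with something like: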

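-- Note: unfoldrN yields Word8 only in Data.ByteString, so this assumes B is
-- Data.ByteString here, not Data.ByteString.Char8 as in the question.
-- Encodes x as 8 big-endian bytes, the same layout binary's encode produces.
encodeInt64 :: Int64 -> B.ByteString
encodeInt64 x =
  let
    -- i is the current shift amount: 56, 48, ..., 0
    go :: Int -> Maybe (Word8, Int)
    go i
      | i < 0     = Nothing
      | otherwise =
        let
          w :: Word8
          w = fromIntegral (x `shiftR` i)
        in Just (w, i-8)
  in fst $ B.unfoldrN 8 go 56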

I would venture that you could do even better, perhaps by poking the bytes directly into a buffer.
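One way the "poke bytes directly into a buffer" idea might look is sketched below. It is an assumption about what was meant, not code from the answer: it uses `unsafeCreate` from `Data.ByteString.Internal` and `pokeByteOff` from `Foreign.Storable`, and the name `encodeInt64Poke` is made up for illustration. It has not been benchmarked here.

```haskell
import qualified Data.ByteString as BS
import Data.ByteString.Internal (unsafeCreate)
import Data.Bits (shiftR)
import Data.Int (Int64)
import Data.Word (Word8)
import Foreign.Storable (pokeByteOff)

-- Hypothetical variant of encodeInt64: allocate exactly 8 bytes and
-- write each byte in place, most significant byte first (big-endian).
encodeInt64Poke :: Int64 -> BS.ByteString
encodeInt64Poke x = unsafeCreate 8 $ \p ->
  mapM_
    (\k -> pokeByteOff p k (fromIntegral (x `shiftR` (56 - 8 * k)) :: Word8))
    [0 .. 7]

main :: IO ()
main = print (BS.unpack (encodeInt64Poke 258))  -- [0,0,0,0,0,0,1,2]
```

The buffer is filled completely before the ByteString escapes, which is the condition under which `unsafeCreate` is safe to use.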

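The above is one thing. Another point, unrelated to GC, is that you are using the standard Data.Set implementation; you may find better performance with Data.HashSet from unordered-containers.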
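A minimal sketch of what the swap to Data.HashSet could look like for the question's firstCollision, assuming the unordered-containers and hashable packages are installed. The name `firstCollisionHS` is made up for illustration, and this is not necessarily the exact code that was timed in this answer.

```haskell
import qualified Data.HashSet as HS
import Data.Hashable (Hashable)

-- Same search as the question's firstCollision, but the seen values are
-- kept in a HashSet, so the key type needs Eq and Hashable instead of Ord.
firstCollisionHS :: (Eq b, Hashable b) => (a -> b) -> [a] -> Maybe a
firstCollisionHS f = go HS.empty
  where
    go _ [] = Nothing
    go s (x:xs)
      | y `HS.member` s = Just x
      | otherwise       = go (HS.insert y s) xs
      where
        y = f x

main :: IO ()
main = print (firstCollisionHS (`mod` 10) [1 .. 100 :: Int])  -- Just 11
```

In the question's program this would be called as `firstCollisionHS t50 [a..(a+b)]`; Int64 already has a Hashable instance.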

A final point, which Don also mentioned, is that you can request a larger allocation area with -A200M (or thereabouts).

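With all of the above modifications (your own encoder, Data.HashSet, and -A200M), the running time of your code on my machine went from 7.397 s to 3.474 s, with %GC time of 52.9% and 21.2% respectively.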

So you are not doing anything wrong in the Big-O sense of your approach, but you can bring the constants down a bit!

score 1 · Accepted Answer

I don't know. However, here is some profiler output, in case someone can construct a real answer from it:

Here is the heap profile (from running with +RTS -hT):

[Figure: heap profile]

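I think the growth comes from firstCollision, because of its calls to Set.insert. However, the absolute amount of memory allocated there is so small that I'm not sure it is the real culprit; see below.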

Here is the profiler output (compiled with -prof -fprof-auto, run with +RTS -p):

COST CENTRE         MODULE  %time %alloc

firstCollision.go   Main     49.4    2.2
t50.h               Main     39.5   97.5
str2int             Main      5.4    0.0
firstCollision.go.y Main      3.4    0.0
t50                 Main      1.1    0.0

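Basically all of the memory allocation comes from the serialization/hashing pipeline in h, the local equivalent of sha256, where there seems to be a lot of intermediate data structure construction going on.

Can someone with more experience pinpoint the problem more precisely?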
