haskell - 从“Real World Haskell”中实现 splitWith 的正确方法是什么？

Question

我一直在通过现实世界的haskell工作并尝试做练习。我设法从第 4.5 章练习 2 中实现了 splitWith 的工作版本。我觉得这不是一个非常haskell 的做事方式。必须使用累加器实现新功能似乎非常迂回。有没有更惯用的方法来做到这一点，比如折叠？我查看了 foldl 的文档，但我不知道怎么做。

splitWith :: (a -> Bool) -> [a] -> [[a]]
splitWith _ [] = []
splitWith f a  = splitWithAcc f a []
  where 
    splitWithAcc :: (a -> Bool) -> [a] -> [[a]] -> [[a]]
    splitWithAcc f xs acc
      | null xs     = acc
      | f $ head xs = splitWithAcc f (dropWhile f xs) (acc ++ [takeWhile f xs])
      | otherwise   = splitWithAcc f (tail xs) acc

澄清

这是练习的文本：

编写一个函数 splitWith ，它的作用类似于单词，但接受一个谓词和一个任何类型的列表，然后在谓词返回 False 的每个元素上拆分其输入列表：

score 6 · Accepted Answer

递归是你的朋友，但我会做一些不同的事情。首先，当我分裂时，我会让我的条件为真，而不是让它为假。其次，我会使用Data.List被调用的一个方便的函数break

> :t break
break :: (a -> Bool) -> [a] -> ([a], [a])
> break (== ' ') "This is a test"
("This", " is a test")

我会用它来定义我的函数

splitWith' :: (a -> Bool) -> [a] -> [[a]]
splitWith' cond [] = []
splitWith' cond xs = first : splitWith' cond (safeTail rest)
    where
        (first, rest) = break cond xs
        -- Need this function to handle an empty list
        safeTail [] = []
        safeTail (_:ys) = ys

或者，如果你想写得尽可能混乱

splitWith'' :: (a -> Bool) -> [a] -> [[a]]
splitWith'' _ [] = []
splitWith'' cond xs = uncurry (:) $ fmap (splitWith'' cond . safeTail) $ break cond xs
    where
        safeTail [] = []
        safeTail (_:ys) = ys

这是有效的，因为fmap超过 2 元组将该函数应用于元组的第二个元素。然后它 uncurries:并将其应用于第一个和休息。

更新

如果您希望它在谓词为假时拆分，您可以使用span而不是break，或者将其定义为

splitWithWeird cond xs = splitWith' (not . cond) xs

虽然第二个显然会产生稍小的开销（除非编译器可以优化它）

更新 2

如果你想处理重复的字符，如果它适合你的需要，有一个简单、快速的修复：

> filter (not . null) $ splitWithWeird (/= ' ') "This  is   a    test"
["This","is","a","test"]

通过如此简单的修复，我们可能会想将其构建到算法本身中：

splitWithWeird :: (a -> Bool) -> [a] -> [[a]]
splitWithWeird cond [] = []
splitWithWeird cond xs = filter (not . null) $ first : splitWithWeird cond (safeTail rest)
    where
        (first, rest) = span cond xs
        safeTail [] = []
        safeTail (_:ys) = ys

但这将是一个坏主意。由于这是一个递归函数，因此您将filter (not . null)在每个级别添加对的调用，因此在函数中的每个拆分位置。所有这些都必须在返回之前扫描整个列表，因此必须执行额外的检查。最好将它定义为一个单独的函数，以便filter (not . null)只调用一次：

splitWithWeird' :: (a -> Bool) -> [a] -> [[a]]
splitWithWeird' cond xs = filter (not . null) $ splitWithWeird cond xs

或者，如果您希望将其内置到算法中：

splitWithWeird :: (a -> Bool) -> [a] -> [[a]]
splitWithWeird cond xs = filter (not . null) $ splitWithHelper cond xs
    where
        safeTail [] = []
        safeTail (_:ys) = ys
        splitWithHelper cond [] = []
        splitWithHelper cond xs =
            let (first, rest) = span cond xs
            in first : splitWithHelper cond (safeTail rest)

这实际上只是在内部做与定义两个函数相同的事情。请注意，我必须在let ... in ...此处使用附加语句（我不喜欢嵌套 wheres），因为(first, rest) = span cond xs属于splitWithHelper，而不是splitWithWeird。如果将其留在 where 子句中，算法将不起作用。

更新 3

不想在这里只留下一个不理想的解决方案，我已经继续编写了一个算法，用于在子序列上而不是在条件或元素上进行拆分。它确实使用了firstfrom 函数Control.Arrow，但只是为了使代码更加紧凑。

import Control.Arrow (first)

isPrefixOf :: Eq a => [a] -> [a] -> Bool
isPrefixOf [] _ = True
isPrefixOf _ [] = False
isPrefixOf (x:xs) (y:ys) = x == y && isPrefixOf xs ys

splitSubseq :: Eq a => [a] -> [a] -> [[a]]
splitSubseq sub [] = []
splitSubseq sub xs = initial : splitSubseq sub rest
    where
        lsub = length sub
        splitter [] = ([], [])
        splitter yss@(y:ys)
            | isPrefixOf sub yss = ([], drop lsub yss)
            | otherwise = first (y :) $ splitter ys
        (initial, rest) = splitter xs

我并不是说这是一个有效的解决方案，但它应该很容易遵循。首先，我定义了一个函数isPrefixOf，如果第二个列表以第一个列表开头，则返回 True。

我想保持相同的递归模式（first : recursive rest），所以我写splitter来代替spanor break，这就是进来的地方。如果子序列是列表的前缀，isPrefixOf则返回([], restAfterSubsequence)列表，然后从下一个元素开始重复此操作。我在first这里的使用只是为了让我可以递归而简洁地编写这个函数。它只适用(y :)于返回值的第一个元素splitter。从返回的元组的第二个元素splitter只是尚未消耗的输入的其余部分。

如果您有兴趣，这里是该算法的性能统计数据（使用--make -O2, i5 quad 编译）：

main = print $ sum $ take (10 ^ 7) $ map length $ splitSubseq " " $ cycle "Testing "

70000000
   6,840,052,808 bytes allocated in the heap
       2,032,868 bytes copied during GC
          42,900 bytes maximum residency (2 sample(s))
          22,636 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     13114 colls,     0 par    0.06s    0.07s     0.0000s    0.0001s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0002s    0.0004s

  TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    3.68s  (  3.74s elapsed)
  GC      time    0.06s  (  0.07s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    3.74s  (  3.81s elapsed)

然后去嵌入总和和长度：

main = print $ sum $ take (10 ^ 7) $ map length $ repeat "Testing"

70000000
     240,052,572 bytes allocated in the heap
          12,812 bytes copied during GC
          42,900 bytes maximum residency (2 sample(s))
          22,636 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       458 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.09s  (  0.09s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.11s  (  0.09s elapsed)

所以我们可以看到，这只花费了大约 0.1 秒的时间，留给我们大约 3.64 秒的时间让这个算法分割一个包含"Testing "重复 1000 万次的字符串，所有这些都只使用了少量的内存。-threaded唯一的缺点是，当使用更多内核编译和运行时，该算法实际上会显着减慢。

score 2 · Accepted Answer

这是我在做练习时想到的。我所知道的所有 Haskell 都来自这本书，所以我的解决方案不应该包含到目前为止书中没有提到的任何结构：

splitWith pred (x:xs)
    | pred x    = let (first, rest) = span pred (x:xs)
                  in  first : (splitWith pred rest)
    | otherwise = splitWith pred xs
splitWith pred []    = []

score 2 · Accepted Answer

想象一下foldr从右边构建它的结果：

splitWith f xs = case foldr g [[]] xs of {([]:r)-> r; r->r}
  where
    g x r@ ~(s:t) | f x = (x:s):t     -- keep `x` if `f x`
                  | null s = r        -- current word already empty
                  | otherwise = []:r  -- split

惰性模式允许无限列表作为输入。测试：

Prelude> splitWith (/= ' ') "  This is a    test  "
["This","is","a","test"]
Prelude> splitWith (/= ' ') ""
[]
Prelude> take 8 $ splitWith (/= ' ') (cycle "12   12 ")
["12","12","12","12","12","12","12","12"]

score 1 · Accepted Answer

splitWith' :: (a -> Bool) -> [a] -> [[a]]
splitWith' p xs = foldr with [[]] xs
  where
    with a acc@(as:rest) | p a       = (a:as):rest
                         | otherwise = []:acc

score 1 · Accepted Answer

import Data.List (groupBy)
splitWith :: (a -> Bool) -> [a] -> [[a]]
splitWith p = filter (all p) . groupBy ((==) `on` p)

实际上any可以用来代替all，因为它更便宜并且 groupBy保证了[a]for whichp持有的元素是聚集的（所以如果any持有，那么all也持有）；总而言之，p . head可以用来代替all p太。

haskell - 从“Real World Haskell”中实现 splitWith 的正确方法是什么？

5 回答 5

Related

Reference