27

Given a list of tuples like this:

dic = [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]

How can I group the items of dic to produce a list grp, where

grp  = [(1,["aa","bb","cc"]), (2, ["aa"]), (3, ["ff","gg"])]

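I'm actually new to Haskell... and I seem to be falling in love with it. Using group or groupBy from Data.List only groups similar adjacent items in the list. I wrote an inefficient function for this, but it runs out of memory because I need to process a very large list of encoded strings. I hope you can help me find a more efficient way.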

5 Answers

66

Whenever possible, reuse library code.

import Data.Map (fromListWith)
sortAndGroup assocs = fromListWith (++) [(k, [v]) | (k, v) <- assocs]

Try it out in ghci:

*Main> sortAndGroup [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
fromList [(1,["bb","cc","aa"]),(2,["aa"]),(3,["gg","ff"])]
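
The result above is a Map whose value lists are not sorted, while the question asks for a plain list with sorted values. A minimal sketch of one way to bridge that gap, assuming the values have an Ord instance; groupSorted is a hypothetical name, not part of the answer:

```haskell
import Data.List (sort)
import qualified Data.Map.Strict as Map

-- Hypothetical helper: group values by key, sort each group,
-- and return an association list in ascending key order.
groupSorted :: (Ord k, Ord v) => [(k, v)] -> [(k, [v])]
groupSorted assocs =
  Map.toList (fmap sort (Map.fromListWith (++) [(k, [v]) | (k, v) <- assocs]))
```

Map.toList returns the keys in ascending order, so for the dic from the question this should give exactly the grp shown there.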

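Edit: In the comments, some people wondered whether (++) or flip (++) is the right choice. The documentation doesn't say which way things get associated. You can find out by experiment, or you can sidestep the whole issue by using difference lists: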

-- Uses fromListWith imported from Data.Map above.
sortAndGroup assocs = ($[]) <$> fromListWith (.) [(k, (v:)) | (k, v) <- assocs]
-- OR, assuming `import qualified Data.Map as M`:
sortAndGroup = fmap ($[]) . M.fromListWith (.) . map (fmap (:))

These alternatives are about the same length as the original, but they are a bit less readable to me.
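
The ghci output earlier shows each group in reverse input order, because fromListWith calls its combining function with the new value first and the accumulated value second. A small sketch of how the difference-list version could keep input order instead, by composing in the opposite direction; groupInOrder is a hypothetical name:

```haskell
import qualified Data.Map as M

-- Hypothetical helper: each value becomes a "prepend" function (v :).
-- fromListWith calls the combiner as `f new old`, so `flip (.)` yields
-- `old . new`, which puts earlier values in front of later ones.
groupInOrder :: Ord k => [(k, v)] -> M.Map k [v]
groupInOrder assocs =
  fmap ($ []) (M.fromListWith (flip (.)) [(k, (v :)) | (k, v) <- assocs])
```

Each combining step is a single function composition, so the cost does not depend on how long a group has already grown.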

answered 2012-09-13T03:23:20.267
19

Here is my solution:

import Data.Function (on)
import Data.List (sortBy, groupBy)
import Data.Ord (comparing)

myGroup :: Ord a => [(a, b)] -> [(a, [b])]
myGroup = map (\l -> (fst . head $ l, map snd l)) . groupBy ((==) `on` fst)
          . sortBy (comparing fst)

This first sorts the list by key with sortBy. The sort is stable, so entries with equal keys keep their input order:

[(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
=> [(1,"aa"),(1,"cc"),(1,"bb"),(2,"aa"),(3,"ff"),(3,"gg")]

Then it groups the list elements by their key with groupBy:

[(1,"aa"),(1,"cc"),(1,"bb"),(2,"aa"),(3,"ff"),(3,"gg")]
=> [[(1,"aa"),(1,"cc"),(1,"bb")],[(2,"aa")],[(3,"ff"),(3,"gg")]]

Then it converts each group into a tuple with map:

[[(1,"aa"),(1,"cc"),(1,"bb")],[(2,"aa")],[(3,"ff"),(3,"gg")]]
=> [(1,["aa","cc","bb"]),(2,["aa"]),(3,["ff","gg"])]

Testing:

> myGroup dic
[(1,["aa","cc","bb"]),(2,["aa"]),(3,["ff","gg"])]
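
If the values in each group should come out sorted as well, as in the question's grp, one option is to sort whole pairs rather than keys only. A minimal sketch under that assumption; myGroupSorted is a hypothetical name, and groupBy from Data.List.NonEmpty is swapped in so that no partial head is needed:

```haskell
import Data.Function (on)
import Data.List (sort)
import qualified Data.List.NonEmpty as NE

-- Hypothetical variant of myGroup: plain `sort` orders pairs by key and
-- then by value, so every group ends up with sorted values.
myGroupSorted :: (Ord a, Ord b) => [(a, b)] -> [(a, [b])]
myGroupSorted = map collect . NE.groupBy ((==) `on` fst) . sort
  where
    -- Each group is non-empty, so taking its first key is total.
    collect g = (fst (NE.head g), map snd (NE.toList g))
```

The price is the extra Ord b constraint, which myGroup itself does not need.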
answered 2012-09-13T02:08:45.813
6

You can also use the TransformListComp extension, for example:

Prelude> :set -XTransformListComp 
Prelude> import GHC.Exts (groupWith, the)
Prelude GHC.Exts> let dic = [ (1, "aa"), (1, "bb"), (1, "cc") , (2, "aa"), (3, "ff"), (3, "gg")]
Prelude GHC.Exts> [(the key, value) | (key, value) <- dic, then group by key using groupWith]
[(1,["aa","bb","cc"]),(2,["aa"]),(3,["ff","gg"])]
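
For comparison, roughly the same grouping can be written without the extension by calling groupWith directly. This is only a sketch; groupPlain is a hypothetical name:

```haskell
import GHC.Exts (groupWith)

-- Hypothetical helper: groupWith sorts the pairs by key and splits them
-- into runs of equal keys; each run is then turned into (key, values).
groupPlain :: Ord k => [(k, v)] -> [(k, [v])]
groupPlain pairs =
  [(k, v : map snd rest) | ((k, v) : rest) <- groupWith fst pairs]
```

The pattern ((k, v) : rest) plays the role of the: it takes the key from the first pair of each group.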
answered 2012-09-13T02:25:51.830
4
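  1. If the list is not sorted on the first element, I don't think you can do better than O(n log n).

    • One simple way is to sort it and then use anything from the second part of this answer.

    • You can use a Map k [a] from Data.Map, taking the first element of each tuple as the key and appending to the values as you go.

    • You can write your own complex function, which after all your efforts will still take O(n log n).

  2. If the list is sorted on the first element, as is the case in your example, then the task is trivial with something like groupBy, as given in @Mikhail's answer, or with foldr. There are many other ways too.

An example using foldr: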

  grp :: Eq a => [(a,b)] -> [(a,[b])]
  grp = foldr f []
     where 
       f (z,s) [] = [(z,[s])] 
       f (z,s) a@((x,y):xs)  | x == z = (x,s:y):xs 
                             | otherwise = (z,[s]):a
answered 2012-09-13T02:16:02.970
0
{-# LANGUAGE TransformListComp #-}

import GHC.Exts (groupWith, sortWith, the)

process :: [(Integer, String)] -> [(Integer, [String])]
process list =
  [ (the a, b)
  | let info = [(x, y) | (x, y) <- list, then sortWith by y]  -- sort the pairs by value
  , (a, b) <- info
  , then group by a using groupWith  -- group by key; a and b become lists here
  ]
answered 2016-04-12T14:27:00.083