haskell - Using persistent from within a conduit

Question

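First, a simplified version of what I want to accomplish: I have several large files (30 GB in total) from which I want to prune duplicate entries. To do this, I set up a database of data hashes, open the files one by one, hash each item, and record the item in the database and in the output file, provided its hash is not already in the database.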
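The deduplication step itself does not depend on conduit or persistent. As a minimal pure sketch, assuming the items arrive as a list and using an in-memory `Data.Set` of keys in place of the hash database (`dedupBy` and `keyOf` are hypothetical names, not part of either library), the idea is:

```haskell
import qualified Data.Set as Set

-- Keep the first occurrence of every item, judged by a key function.
-- Hypothetical helper: in the question's program the key would be a hash
-- of the item and the Set would be the database table of seen hashes.
dedupBy :: Ord k => (a -> k) -> [a] -> [a]
dedupBy keyOf = go Set.empty
  where
    go _ [] = []
    go seen (x : xs)
      | k `Set.member` seen = go seen xs           -- seen before: drop it
      | otherwise = x : go (Set.insert k seen) xs  -- new: emit and record it
      where
        k = keyOf x
```

For example, `dedupBy id "abracadabra"` evaluates to `"abrcd"`. Because the recursion is lazy, output is produced as input is consumed, which is the streaming behaviour the conduit version is meant to keep for large files.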

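I know how to do this with iteratees and enumerators, and I wanted to try conduits. I also know how to use conduits, but now I want to use conduits together with persistent. I'm running into type problems, possibly involving ResourceT.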

Here is some pseudocode to illustrate the problem:

withSqlConn "foo.db" $ runSqlConn $ runResourceT $ 
     sourceFile "in" $= parseBytes $= dbAction $= serialize $$ sinkFile "out"

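The problem is the dbAction function. Naturally, I want to access the database there. Since what it does is basically just a filter, my first idea was to write it like this: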

dbAction = CL.mapMaybeM p
     where p :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) => DataType -> m (Maybe DataType)
           p _ = do
             lift $ putStrLn "foo" -- fine
             insert $ undefined -- type error!
             return undefined

The specific error I get is:

Could not deduce (m ~ b0 m0)
from the context (MonadIO m, MonadBaseControl IO (SqlPersist m))
  bound by the type signature for
             p :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) =>
                           DataType -> m (Maybe DataType)
  at tools/clean-wac.hs:(33,1)-(34,34)
  `m' is a rigid type variable bound by
      the type signature for
        p :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) =>
                      DataType -> m (Maybe (DataType))
      at tools/clean-wac.hs:33:1
Expected type: m (Key b0 val0)
  Actual type: b0 m0 (Key b0 val0)

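Note that this may be caused by a wrong assumption I made when writing the type signature. If I comment out the type signature and remove the lift statement, the error message becomes: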

No instance for (PersistStore ResourceT (SqlPersist IO))
  arising from a use of `p'
Possible fix:
  add an instance declaration for
  (PersistStore ResourceT (SqlPersist IO))
In the first argument of `CL.mapMaybeM', namely `p'

So does this mean that PersistStore cannot be used through ResourceT at all?
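One way to read this error: in `runSqlConn $ runResourceT $ ...`, `runResourceT` is unwrapped first, so the pipeline runs in roughly `ResourceT (SqlPersist IO)`, with `ResourceT` as the outer layer; those are the two types named in the message. A database action then has to get past that outer layer. The sketch below reproduces the layering using only `transformers` and `containers`, assuming `ReaderT` as a stand-in for `ResourceT` and `StateT (Set String) IO` as a stand-in for `SqlPersist IO` (both stand-ins, and the names `Db`, `App`, `recordKey`, `step`, `runApp`, are illustrative only):

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
import Control.Monad.Trans.State.Strict (StateT, get, modify, runStateT)
import qualified Data.Set as Set

-- Stand-in for SqlPersist IO: a "database" holding the keys seen so far.
type Db = StateT (Set.Set String) IO

-- Stand-in for a persistent action such as insert: it runs only in Db.
recordKey :: String -> Db Bool
recordKey k = do
  seen <- get
  if k `Set.member` seen
    then return False
    else modify (Set.insert k) >> return True

-- Stand-in for ResourceT: an extra layer stacked on top of Db.
type App = ReaderT String Db

-- ask is an App action; recordKey is a Db action and needs one lift.
step :: String -> App Bool
step k = do
  prefix <- ask
  lift (recordKey (prefix ++ k))

-- Unwrap in the same order as runSqlConn $ runResourceT $ ...:
-- the outer ReaderT layer first, then the inner Db layer.
runApp :: App a -> IO (a, Set.Set String)
runApp app = runStateT (runReaderT app "h:") Set.empty
```

Dropping the `lift` in `step` makes GHC try to use a `Db Bool` where an `App Bool` is expected and reject it, the same kind of layer mismatch as in the messages above.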

I also can't write my own conduit without using CL.mapMaybeM:

dbAction = filterP
filterP :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) => Conduit DataType m DataType
filterP = loop
    where loop = awaitE >>= either return go
          go s = do lift $ insert $ undefined -- again, type error
                    loop

This leads to another type error that I don't fully understand.

Could not deduce (m ~ b0 m0)
from the context (MonadIO m, MonadBaseControl IO (SqlPersist m))
  bound by the type signature for
             filterP :: (MonadIO m,
                                 MonadBaseControl IO (SqlPersist m)) =>
                                Conduit DataType m DataType
  `m' is a rigid type variable bound by
      the type signature for
        filterP :: (MonadIO m,
                            MonadBaseControl IO (SqlPersist m)) =>
                           Conduit DataType m DataType
Expected type: Conduit DataType m DataType
  Actual type: Pipe
                 DataType DataType DataType () (b0 m0) ()
In the expression: loop
In an equation for `filterP'

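So my question is: is it possible to use persistent inside a conduit the way I intend? If so, how? I know that, since I can use liftIO inside a conduit, I could just use something like HDBC, but I specifically want to use persistent, both to understand how it works and because I like that it is agnostic of the db backend.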

score 7 · Accepted Answer

The code below compiles fine for me. Is it possible that the frameworks have moved on in the meantime and it all works now?

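Note, however, that I had to make the following changes, either because the world has moved on a bit or because I don't have all of your code. I used conduit 1.0.9.3 and persistent 1.3.0 with GHC 7.6.3.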

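Omitted parseBytes and serialize because I don't have your definitions, and defined DataType = ByteString instead.
Introduced a Proxy parameter and an explicit type signature on the undefined value to avoid type-family injectivity problems. These probably won't be needed in your real code, because there val will be concrete or determined externally.
Used await rather than awaitE, simply using () in place of the Left case, since awaitE has been retired.
Passed a dummy Connection-creation function to withSqlConn; perhaps I should have used some Sqlite-specific function?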
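The Proxy change works around a general type-family issue: a family such as `PersistEntityBackend` is not injective, so a signature in which `val` appears only inside constraints is ambiguous and GHC cannot choose `val` at a call site. A minimal sketch with a hypothetical class `Entity` and associated family `Backend` (loosely modelled on `PersistEntity`/`PersistEntityBackend`, not persistent's actual definitions) shows how a `Proxy val` argument pins the type down:

```haskell
{-# LANGUAGE TypeFamilies, ScopedTypeVariables #-}
import Data.Proxy (Proxy (..))

-- Hypothetical class with an associated type family, loosely modelled on
-- PersistEntity / PersistEntityBackend. Type families are not injective.
class Entity val where
  type Backend val
  tableName :: Proxy val -> String

data SqlTag = SqlTag      -- hypothetical backend tag

data HashRow = HashRow    -- hypothetical entity types
data BlobRow = BlobRow

instance Entity HashRow where
  type Backend HashRow = SqlTag
  tableName _ = "hash"

instance Entity BlobRow where
  type Backend BlobRow = SqlTag
  tableName _ = "blob"

-- Without the Proxy argument the type
--   (Entity val, Backend val ~ SqlTag) => String
-- is rejected as ambiguous: both entities share the same Backend, so the
-- equality constraint cannot determine val. The Proxy fixes val, and
-- ScopedTypeVariables lets the body refer to it, much like undefined :: val.
describeInsert :: forall val. (Entity val, Backend val ~ SqlTag) => Proxy val -> String
describeInsert _ = "insert into " ++ tableName (Proxy :: Proxy val)
```

For example, `describeInsert (Proxy :: Proxy HashRow)` gives `"insert into hash"`. In the answer's `filterP`, `val` similarly appears only in `PersistEntityBackend val ~ PersistMonadBackend m` and in `undefined :: val`, which is why the `Proxy val` argument is there.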

Here is the code:

{-# LANGUAGE FlexibleContexts, NoMonomorphismRestriction,
             TypeFamilies, ScopedTypeVariables #-}

module So133331988 where

import Control.Monad.Trans
import Database.Persist.Sql
import Data.ByteString
import Data.Conduit
import Data.Conduit.Binary
import Data.Proxy

test proxy =
    withSqlConn (return (undefined "foo.db")) $ runSqlConn $ runResourceT $ 
         sourceFile "in" $= dbAction proxy $$ sinkFile "out"

dbAction = filterP

type DataType = ByteString

filterP
    :: forall m val
     . ( MonadIO m, MonadBaseControl IO (SqlPersist m)
       , PersistStore m, PersistEntity val
       , PersistEntityBackend val ~ PersistMonadBackend m)
    => Proxy val
    -> Conduit DataType m DataType
filterP Proxy = loop
    where loop = await >>= maybe (return ()) go
          go s = do lift $ insert (undefined :: val)
                    loop
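In `filterP`, the conduit's base monad `m` is constrained by `PersistStore m`, the class that provides `insert` in persistent 1.3, so `lift $ insert ...` only needs that constraint on `m`. The same structure can be sketched without conduit or persistent, assuming a toy pull-based pipe built on `StateT` (with a hypothetical `awaitT` that returns `Maybe i` like conduit 1.0's `await`), a hypothetical class `MonadHashStore` in the role of `PersistStore`, and an in-memory `Set` in place of the database:

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, execStateT, get, put, runStateT, state)
import qualified Data.Set as Set

-- Hypothetical store class in the role of PersistStore m.
class Monad m => MonadHashStore m where
  -- Record a key; True if it had not been seen before.
  recordNew :: String -> m Bool

-- In-memory instance standing in for a real database backend.
instance MonadHashStore (StateT (Set.Set String) IO) where
  recordNew k = do
    seen <- get
    if k `Set.member` seen
      then return False
      else put (Set.insert k seen) >> return True

-- Toy pull-based pipe: remaining inputs plus outputs so far (newest first).
type Pipe i o m = StateT ([i], [o]) m

-- Like conduit 1.0's await: Nothing once the input is exhausted.
awaitT :: Monad m => Pipe i o m (Maybe i)
awaitT = state $ \(ins, outs) -> case ins of
  [] -> (Nothing, ([], outs))
  x : xs -> (Just x, (xs, outs))

yieldT :: Monad m => o -> Pipe i o m ()
yieldT o = state $ \(ins, outs) -> ((), (ins, o : outs))

-- Same loop shape as filterP: the only requirement on m is the store class.
filterNew :: MonadHashStore m => (i -> String) -> Pipe i i m ()
filterNew keyOf = loop
  where
    loop = awaitT >>= maybe (return ()) go
    go x = do
      isNew <- lift (recordNew (keyOf x))  -- one lift reaches the store monad
      if isNew then yieldT x else return ()
      loop

-- Run a pipe over a list of inputs and return its outputs in order.
runPipe :: Monad m => Pipe i o m () -> [i] -> m [o]
runPipe p ins = do
  (_, outs) <- execStateT p (ins, [])
  return (reverse outs)

-- Run the filter against an empty in-memory store.
dedupInMemory :: [String] -> IO ([String], Set.Set String)
dedupInMemory xs = runStateT (runPipe (filterNew id) xs) Set.empty
```

Here `dedupInMemory ["x", "y", "x", "z", "y"]` returns `(["x", "y", "z"], fromList ["x", "y", "z"])`. In this sketch, replacing the in-memory instance with one backed by a real database would leave `filterNew` unchanged, which is the backend-agnostic property the question is after.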
