我实际上会倒着做这个问题:我将从一元解决方案开始,然后从它倒退到手动解决方案。如果您手动找到正确的解决方案,这将产生相同的代码。
一元解析器的典型类型签名具有以下形式:
type Parser a = String -> Maybe (a, String)
请注意与您的表单的细微差别,您String
在Maybe
. 两者都是有效的,但我更喜欢这种形式,因为如果解析失败,我认为剩余部分String
无效。
这种类型实际上是 的一个特例StateT
,定义为:
newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }
请注意,如果我们选择:
s = String
m = Maybe
...我们返回Parser
类型:
type Parser a = StateT String Maybe a
-- or: type Parser = StateT String Maybe
这很酷的是我们只需要手动定义一个解析器,它是检索单个字符的解析器:
anyChar :: Parser Char
anyChar = StateT $ \str -> case str of
[] -> Nothing
c:cs -> Just (c, cs)
请注意,如果我们删除StateT
包装器,则类型为anyChar
:
anyChar :: String -> Maybe (Char, String)
当我们把它包装进去时,StateT
它变成:
anyChar :: StateT String Maybe Char
...这只是Parser Char
。
一旦我们有了原始解析器,我们就可以使用StateT
'sMonad
接口定义所有其他解析器。例如,让我们定义一个匹配单个字符的解析器:
import Control.Monad
char :: Char -> Parser ()
char c' = do
c <- anyChar
guard (c == c')
那很简单! guard
我们的 monad需要一个MonadPlus
实例,但我们已经有了一个。原因在于以下两个MonadPlus
情况:
instance (MonadPlus m) => MonadPlus (StateT s m) where ...
instance MonadPlus Maybe where ...
这两个实例的组合意味着StateT s Maybe
自动实现MonadPlus
,所以我们可以使用guard
它,它会神奇地做“正确的事情”。
有了这两个解析器,您的最终解析器就变得非常容易编写:
data Wff = Or Char Char deriving (Show)
parseWff :: Parser Wff
parseWff = do
char '['
a <- anyChar
char '|'
b <- anyChar
char ']'
return (Or a b)
这更清楚,更容易理解发生了什么。它也有效:
>>> runStateT parseWff "[A|B]"
Just (Or 'A' 'B',"")
向后工作
这给我们带来了您最初的问题:您如何手写相同的行为?我们将从Monad
和MonadPlus
实例向后工作,以推断它们在幕后为我们做了什么。
要做到这一点,我们必须推导出它的基本 monad 是什么时的 theMonad
和MonadPlus
实例。让我们从实例开始:StateT
Maybe
Monad
StateT
instance (Monad m) => Monad (StateT s m) where
return r = StateT (\s -> return (r, s))
m >>= f = StateT $ \s0 -> do
(a, s1) <- runStateT m s0
runStateT (f a) s1
请注意,它是在基本 monad 的一般术语中定义的。这意味着我们还需要Monad
实例Maybe
来派生上述代码的作用:
instance Monad Maybe where
return = Just
m >>= f = case m of
Nothing -> Nothing
Just a -> f a
如果我们将 monad 实例替换为Maybe
monadStateT
实例,我们会得到:
instance Monad (StateT s Maybe) where
return r = StateT (\s -> Just (r, s))
m >>= f = StateT $ \s0 -> case runStateT m s0 of
Nothing -> Nothing
Just (a, s1) -> runStateT (f a) s1
我们可以做同样的事情来派生 的Monad
实例StateT s Maybe
。我们只需要MonadPlus
举例说明StateT
和 `Maybe:
instance (MonadPlus m) => MonadPlus (StateT s m) where
mzero = StateT (\_ -> mzero)
mplus (StateT f) (StateT g) = StateT (\s -> mplus (f s) (g s))
instance MonadPlus Maybe where
mzero = Nothing
mplus m1 m2 = case m1 of
Just a -> Just a
Nothing -> case m2 of
Just b -> Just b
Nothing -> Nothing
...并将它们合二为一:
instance MonadPlus (StateT s Maybe) where
mzero = StateT (\_ -> Nothing)
mplus (StateT f) (StateT g) = StateT $ \s -> case f s of
Just a -> Just a
Nothing -> case g s of
Just b -> Just b
Nothing -> Nothing
现在我们可以推导出解析器在幕后所做的事情。让我们从char
解析器开始:
char c' = do
c <- anyChar
guard (c == c')
这对:
char c' = anyChar >>= \c -> guard (c == c')
我们刚刚导出了(>>=)
for 的作用StateT s Maybe
,所以让我们将其替换为:
char c' = StateT $ \str0 -> case runStateT anyChar str0 of
Nothing -> Nothing
Just (a, str1) -> runStateT ((\c -> guard (c == c')) a) str1
我们已经知道 的定义anyChar
,所以让我们将其替换为:
char c' = StateT $ \str0 -> case runStateT (StateT $ \str -> case str of
[] -> Nothing
c:cs -> Just (c, cs) ) str0 of
Nothing -> Nothing
Just (a, str1) -> runStateT ((\c -> guard (c == c')) a) str1
我们也知道这runStateT
是 的倒数StateT
,所以:
char c' = StateT $ \str0 -> case (\str -> case str of
[] -> Nothing
c:cs -> Just (c, cs) ) str0 of
Nothing -> Nothing
Just (a, str1) -> runStateT ((\c -> guard (c == c')) a) str1
然后我们可以将 lambda 应用于str0
:
char c' = StateT $ \str0 -> case (case str0 of
[] -> Nothing
c:cs -> Just (c, cs) ) of
Nothing -> Nothing
Just (a, str1) -> runStateT ((\c -> guard (c == c')) a) str1
现在我们将外部 case 语句分布在内部 case 语句上:
char c' = StateT $ \str0 -> case str0 of
[] -> case Nothing of
Nothing -> Nothing
Just (a, str1) -> runStateT ((\c -> guard (c == c')) a) str1
c:cs -> case Just (c, cs) of
Nothing -> Nothing
Just (a, str1) -> runStateT ((\c -> guard (c == c')) a) str1
...并评估案例陈述:
char c' = StateT $ \str0 -> case str0 of
[] -> Nothing
c:cs -> runStateT ((\c -> guard (c == c')) c) cs
然后我们可以将 lambda 应用于c
:
char c' = StateT $ \str0 -> case str0 of
[] -> Nothing
c:cs -> runStateT (guard (c == c')) cs
为了进一步简化,我们需要知道是什么guard
。这是它的源代码:
guard pred = if pred then return () else mzero
我们已经知道什么return
和mzero
forStateT s Maybe
是什么,所以让我们将它们替换为:
guard pred = if pred then StateT (\s -> Just ((), s)) else StateT (\_ -> Nothing)
现在我们可以将它内联到我们的函数中:
char c' = StateT $ \str0 -> case str0 of
[] -> Nothing
c:cs -> runStateT (if (c == c')
then StateT (\s -> Just ((), s))
else StateT (\_ -> Nothing) ) cs
如果我们分发runStateT
它,我们会得到:
char c' = StateT $ \str0 -> case str0 of
[] -> Nothing
c:cs -> (if (c == c')
then (\s -> Just ((), s))
else (\_ -> Nothing) ) cs
同样,我们可以将两个分支应用于cs
:
char c' = StateT $ \str0 -> case str0 of
[] -> Nothing
c:cs -> if (c == c') then Just ((), cs) else Nothing
Monad
如果我们根本不使用orMonadPlus
实例,这就是我们手动编写的等效代码。
最终解析器
我现在将对最后一个函数重复此过程,但将推导留给您作为练习:
parseWff = do
char '['
a <- anyChar
char '|'
b <- anyChar
char ']'
return (Or a b)
parseWff = StateT $ \str0 -> case str0 of
[] -> Nothing
c1:str1 -> if (c1 == '[')
then case str1 of
[] -> Nothing
c2:str2 -> case str2 of
[] -> Nothing
c3:str3 -> if (c3 == '|')
then case str3 of
[] -> Nothing
c4:str4 -> case str4 of
[] -> Nothing
c5:str5 -> if (c5 == ']')
then Just (Or c2 c4, str5)
else Nothing
else Nothing
else Nothing
...但我们可以进一步简化为:
parseWff = StateT $ \str0 -> case str0 of
'[':c2:'|':c4:']':str5 -> Just (Or c2 c4, str5)
_ -> Nothing
请注意,与您编写的函数不同,这不使用任何部分函数,如tail
或不完整的模式匹配。此外,您编写的代码无法编译,但即使编译了,它仍然会给出错误的行为。