haskell - Parsec: line continuation trouble

Question

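I'm having a hard time figuring this out.

If a string is followed by one or more newline characters that are not followed by one or more spaces, it is the end of a line, and I return the line. If a string is followed by one or more newline characters and then one or more spaces, it is a line continuation, and I keep going until I hit newline characters with no spaces after them. Then I return the whole thing.

This has completely locked up my brain. Please help.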

Update

In case my explanation above is confusing, here is an example:

From: John Doe <j.doe@gmail.com>
To: dude@cooldomain.biz
Content-Type: multipart/alternative;
  boundary=047d7b2e4e3cdc627304eb094bfe

Given the text above, I should be able to parse 3 lines for further processing, like so:

["From: John Doe <j.doe@gmail.com>", "To: dude@cooldomain.biz", "Content-Type: multipart/alternative; boundary=047d7b2e4e3cdc627304eb094bfe"]

score 1 · Accepted Answer

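I suggest splitting your parser into multiple passes, so that the code that parses expressions is not cluttered with whitespace handling. For example: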

lex :: String -> [Token]

Handles whitespace and splits the input into tokens.

parse :: Parsec [Token] () Expr

Converts the token stream into an expression tree.
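To make the two-pass idea concrete, here is a minimal sketch. The Token and Expr types, the tiny "numbers and plus signs" grammar, and the names lexer, tok, num, plus, expr and parseExpr are all hypothetical and exist only for illustration; the first pass is called lexer rather than lex to avoid clashing with Prelude's lex.

```haskell
import Data.Char (isDigit, isSpace)
import Text.Parsec

-- Hypothetical token and expression types, for illustration only.
data Token = TNum Int | TPlus deriving (Eq, Show)
data Expr  = Num Int | Add Expr Expr deriving (Eq, Show)

-- Pass 1: drop whitespace and split the input into tokens.
lexer :: String -> [Token]
lexer [] = []
lexer (c:cs)
  | isSpace c = lexer cs
  | c == '+'  = TPlus : lexer cs
  | isDigit c = let (ds, rest) = span isDigit (c:cs)
                in TNum (read ds) : lexer rest
  | otherwise = error ("unexpected character: " ++ [c])

-- Accept one token if the given function maps it to a result.
tok :: (Token -> Maybe a) -> Parsec [Token] () a
tok = tokenPrim show (\pos _ _ -> incSourceColumn pos 1)

num :: Parsec [Token] () Expr
num = tok (\t -> case t of { TNum n -> Just (Num n); _ -> Nothing })

plus :: Parsec [Token] () ()
plus = tok (\t -> if t == TPlus then Just () else Nothing)

-- Pass 2: the grammar works on tokens and never mentions whitespace.
expr :: Parsec [Token] () Expr
expr = chainl1 num (Add <$ plus) <* eof

parseExpr :: String -> Either ParseError Expr
parseExpr = parse expr "<input>" . lexer
```

With these definitions, parseExpr "1 + 2 + 3" yields Right (Add (Add (Num 1) (Num 2)) (Num 3)), however the input is spaced.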

Here is a very simple way to join continuation lines:

import Data.Char (isSpace)

-- | Append each line that starts with whitespace to the preceding line.
-- The leading whitespace of the continuation is kept as-is.
joinContinuedLines :: [String] -> [String]
joinContinuedLines [] = []
joinContinuedLines (x0:xs0) = go x0 xs0
  where
    -- joinedLine accumulates the current logical line.
    go joinedLine (x : xs)
      | startsWithSpace x = go (joinedLine ++ x) xs
      | otherwise         = joinedLine : go x xs
    go joinedLine [] = [joinedLine]

    startsWithSpace (x:_) = isSpace x
    startsWithSpace ""    = False
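Applied to the headers from the question, joinContinuedLines keeps the continuation's indentation, so the third entry would contain two spaces after the semicolon. If the single-space form shown in the question is wanted, a small variant could look like the sketch below. The names joinFolded and example are hypothetical, and replacing the indentation with exactly one space is an assumption about the desired output.

```haskell
import Data.Char (isSpace)

-- Variant of joinContinuedLines: the leading whitespace of a continuation
-- line is replaced by a single space before it is appended.
joinFolded :: [String] -> [String]
joinFolded [] = []
joinFolded (x0:xs0) = go x0 xs0
  where
    go joined (x:xs)
      | startsWithSpace x = go (joined ++ " " ++ dropWhile isSpace x) xs
      | otherwise         = joined : go x xs
    go joined [] = [joined]

    startsWithSpace (c:_) = isSpace c
    startsWithSpace []    = False

-- The headers from the question, as a single string.
example :: String
example = unlines
  [ "From: John Doe <j.doe@gmail.com>"
  , "To: dude@cooldomain.biz"
  , "Content-Type: multipart/alternative;"
  , "  boundary=047d7b2e4e3cdc627304eb094bfe"
  ]
```

Here joinFolded (lines example) gives the three strings listed in the question, with the boundary parameter folded into the Content-Type line.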

score 1 · Accepted Answer

Something like this, perhaps (a rough sketch, assuming you want to preserve all the whitespace):

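import Text.Parsec
import Text.Parsec.String (Parser)

continuedLine :: Parser String
continuedLine = go "" where
    go s = do
        s'      <- many (noneOf "\n")   -- text up to the next newline
        empties <- many (char '\n')     -- the newline(s) that follow
        let soFar = s ++ s' ++ empties
        -- a space after the newline(s) means the line continues
        (char ' ' >> go (soFar ++ " ")) <|> return soFar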

Apply your favorite transformation to eliminate the deeply nested left-associated ++s.
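One such transformation, sketched here as an option rather than the answer's own choice, is to accumulate a difference list: the accumulator is a function String -> String, appending is function composition, and the string is only materialized once at the end. The behavior is meant to match the parser above.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Same parser as above, but the accumulator is a function (a difference
-- list), so each append is cheap and the result is built once at the end.
continuedLine :: Parser String
continuedLine = go id
  where
    go :: (String -> String) -> Parser String
    go acc = do
      s'      <- many (noneOf "\n")
      empties <- many (char '\n')
      let soFar = acc . (s' ++) . (empties ++)
      -- a space after the newline(s) means the line continues
      (char ' ' >> go (soFar . (" " ++))) <|> return (soFar "")
```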

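Edit: Hm, it just occurred to me that I may have overlooked a subtlety. If it turns out not to be a continuation, do you want to leave the newlines "unparsed", so to speak? If so, you can use try to do something like this: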

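import Text.Parsec
import Text.Parsec.String (Parser)

continuedLine :: Parser String
continuedLine = go "" where
    -- newline(s) followed by a space: the line continues
    continuationHerald = do
        empties <- many (char '\n')
        _ <- char ' '
        return (empties ++ " ")

    go s = do
        s'   <- many (noneOf "\n")
        -- try rewinds past the newlines if no space follows them
        cont <- try (Just <$> continuationHerald) <|> return Nothing
        case cont of
            Nothing -> return (s ++ s')
            Just empties -> go (s ++ s' ++ empties)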

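Note that we go to some lengths to keep the recursive call to go outside of the try. This is an efficiency concern: with the call inside, the parser could not discard the alternative return Nothing branch until the whole recursion had finished, and the beginning of the string being parsed could not be garbage collected in the meantime.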
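As a usage sketch, the try-based parser can be run over a whole header block like the one in the question. Everything besides continuedLine is hypothetical here: logicalLines, normalize and parseHeaders are illustrative names, and collapsing whitespace with unwords . words is an assumption about the desired output (it also squeezes any other runs of spaces).

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The try-based parser from the answer, repeated so this block stands alone.
continuedLine :: Parser String
continuedLine = go ""
  where
    continuationHerald = do
      empties <- many (char '\n')
      _ <- char ' '
      return (empties ++ " ")

    go s = do
      s'   <- many (noneOf "\n")
      cont <- try (Just <$> continuationHerald) <|> return Nothing
      case cont of
        Nothing      -> return (s ++ s')
        Just empties -> go (s ++ s' ++ empties)

-- Hypothetical driver: parse logical lines until end of input.
-- lookAhead anyChar stops the loop at end of input, so many never runs
-- continuedLine on an empty remainder.
logicalLines :: Parser [String]
logicalLines =
  many (lookAhead anyChar *> continuedLine <* skipMany newline) <* eof

-- Hypothetical clean-up: collapse every run of whitespace to one space.
normalize :: String -> String
normalize = unwords . words

parseHeaders :: String -> Either ParseError [String]
parseHeaders = fmap (map normalize) . parse logicalLines "<headers>"
```

Because continuedLine now leaves the terminating newlines unconsumed, the driver is the one that skips them between logical lines. Without normalize, the third result would still contain the newline and the indentation of the continuation.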
