haskell - Reimplementing getContents using getChar

Question

On my journey towards grasping lazy IO in Haskell I tried the following:

main = do
  chars <- getContents
  consume chars

consume :: [Char] -> IO ()
consume [] = return ()
consume ('x':_) = consume []
consume (c : rest) = do
  putChar c
  consume rest

which just echoes all characters typed on stdin until I hit 'x'.

So, I naively thought it should be possible to reimplement getContents using getChar doing something along the following lines:

myGetContents :: IO [Char]
myGetContents = do
  c <- getChar
  -- And now?
  return (c: ???)

Turns out it's not so simple since the ??? would require a function of type IO [Char] -> [Char] which would - I think - break the whole idea of the IO monad.

Checking the implementation of getContents (or rather hGetContents) reveals a whole sausage factory of dirty IO stuff. Is my assumption correct that myGetContents cannot be implemented without using dirty, i.e. monad-breaking, code?

score 6 · Accepted Answer

You need a new primitive, unsafeInterleaveIO :: IO a -> IO a, which delays the execution of its argument action until the result of that action is evaluated. Then

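import System.IO.Unsafe (unsafeInterleaveIO)

myGetContents :: IO [Char]
myGetContents = do
  c <- getChar
  -- Defer the recursive read until the tail of the list is demanded.
  rest <- unsafeInterleaveIO myGetContents
  return (c : rest)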
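
To see the deferral in isolation, here is a minimal sketch that swaps getChar for an action that bumps a counter. The names demoDeferred and counter are made up for this illustration; the library calls used are newIORef, modifyIORef, readIORef, evaluate, and unsafeInterleaveIO, all from base.

```haskell
import Control.Exception (evaluate)
import Data.IORef (newIORef, modifyIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical demo: returns the counter value before and after the
-- deferred result is forced.
demoDeferred :: IO (Int, Int)
demoDeferred = do
  counter <- newIORef (0 :: Int)
  -- The wrapped action is not run here; only a thunk for its result comes back.
  v <- unsafeInterleaveIO (modifyIORef counter (+ 1) >> return 'a')
  before <- readIORef counter
  _ <- evaluate v            -- forcing v runs the deferred action now
  after <- readIORef counter
  return (before, after)
```

Running demoDeferred in GHCi should give (0,1): the counter is untouched until v is forced, which is the same mechanism that lets the tail of the list above be read only on demand.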

score 1 · Accepted Answer

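You should really avoid using anything from System.IO.Unsafe if at all possible. Those functions tend to break referential transparency, and they are not commonly used in Haskell unless absolutely necessary.

If you change your type signature a little, I suspect you can get a more idiomatic approach to your problem.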

consume :: Char -> Bool
consume 'x' = False
consume _   = True

main :: IO ()
main = loop
  where
    loop = do
      c <- getChar
      if consume c
      then do
        putChar c
        loop
      else return ()
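
One caveat with the loop above: getChar raises an end-of-file error if the input ends before an 'x' arrives. A minimal sketch of a guard using isEOF from System.IO follows; keepGoing and echoLoop are names chosen for this illustration and are not part of the answer.

```haskell
import System.IO (isEOF)

-- Pure predicate, same role as consume in the answer: stop at 'x'.
keepGoing :: Char -> Bool
keepGoing = (/= 'x')

-- Echo characters until 'x' or end of input, whichever comes first.
echoLoop :: IO ()
echoLoop = do
  eof <- isEOF
  if eof
    then return ()
    else do
      c <- getChar
      if keepGoing c
        then putChar c >> echoLoop
        else return ()
```

Defining main = echoLoop gives a program that behaves like the one above on input containing an 'x', and simply stops when stdin is closed early.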

score 0 · Accepted Answer

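You can do this without any tricks.

If your goal is simply to read all of stdin into a String, you don't need any of the unsafe* functions.

IO is a Monad, and a Monad is an Applicative Functor. A Functor is defined by the function fmap, whose signature is: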

fmap :: Functor f => (a -> b) -> f a -> f b

satisfying these two laws:

fmap id = id
fmap (f . g) = fmap f . fmap g

In effect, fmap applies a function to a wrapped value.

Given a specific character 'c', what is the type of fmap ('c':)? We can write the two types down and then unify them:
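
As a quick illustration of "applying a function to a wrapped value", here is a small sketch with two familiar functors. The names overMaybe and overList are only for this example.

```haskell
-- fmap over two different functors; the function (+ 1) never changes,
-- only the wrapper it is lifted into.
overMaybe :: Maybe Int
overMaybe = fmap (+ 1) (Just 2)      -- Just 3

overList :: [Int]
overList = fmap (+ 1) [1, 2, 3]      -- [2,3,4]

-- The identity law on a concrete value: fmap id leaves it unchanged.
identityHolds :: Bool
identityHolds = fmap id (Just 'c') == Just 'c'
```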

fmap        :: Functor f => (a      -> b     ) -> f a      -> f b
     ('c':) ::               [Char] -> [Char]
fmap ('c':) :: Functor f =>                       f [Char] -> f [Char]
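
Picking f = IO makes this concrete. A minimal sketch (prependC is a hypothetical name used only here):

```haskell
-- fmap ('c':) with the functor fixed to IO by the type signature.
prependC :: IO [Char] -> IO [Char]
prependC = fmap ('c' :)
```

For instance, prependC (return "at") is an action whose result is "cat"; no function of type IO [Char] -> [Char] is ever needed.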

Recall that IO is a functor. If we want to define myGetContents :: IO [Char], it seems reasonable to use this:

myGetContents :: IO [Char]
myGetContents = do
  x <- getChar
  fmap (x:) myGetContents

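This is close, but not quite equivalent to getContents, because this version will try to read past the end of the file and throw an error instead of returning a string. Just looking at it should make that clear: there is no way to return a concrete list, only an infinite chain of conses. Knowing that the concrete case is "" at EOF (and using the infix syntax <$> for fmap) brings us to: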

import System.IO
myGetContents :: IO [Char]
myGetContents = do
  reachedEOF <- isEOF
  if reachedEOF
  then return []
  else do
    x <- getChar
    (x:) <$> myGetContents
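
To experiment with this recursion without typing at a terminal, the end-of-input check can be factored out into a "source" action. This is only a sketch under that assumption, not part of the answer: collectWith, stdinSource, and stringSource are hypothetical names, and the in-memory source is just an IORef holding the remaining characters.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO (isEOF)

-- Hypothetical generalisation: the source yields Nothing at end of input.
collectWith :: IO (Maybe Char) -> IO [Char]
collectWith next = do
  m <- next
  case m of
    Nothing -> return []                       -- the concrete "" case
    Just x  -> (x :) <$> collectWith next      -- same shape as above

-- stdin as a source: the same isEOF check as in the answer.
stdinSource :: IO (Maybe Char)
stdinSource = do
  eof <- isEOF
  if eof then return Nothing else Just <$> getChar

-- An in-memory source for experiments: pops characters off a stored string.
stringSource :: IORef String -> IO (Maybe Char)
stringSource ref = do
  s <- readIORef ref
  case s of
    []       -> return Nothing
    (c : cs) -> writeIORef ref cs >> return (Just c)
```

With these definitions, collectWith stdinSource plays the role of myGetContents, and newIORef "hello" >>= collectWith . stringSource returns "hello".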

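The Applicative class affords a (slight) simplification.

Recall that IO is an Applicative Functor, not just any old Functor. There are "Applicative laws" associated with this typeclass, much like the "Functor laws", but we will look specifically at <*>: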

(<*>) :: Applicative f => f (a -> b) -> f a -> f b

This is almost identical to fmap (a.k.a. <$>), except that the function to apply is also wrapped. We can then avoid the bind in the else clause by using Applicative style:

import System.IO
myGetContents :: IO String
myGetContents = do
  reachedEOF <- isEOF
  if reachedEOF
  then return []
  else (:) <$> getChar <*> myGetContents
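
The shape (:) <$> a <*> b may be easier to read on small values first. A sketch, with names invented for the example and pure values standing in for getChar and myGetContents:

```haskell
-- (:) <$> a <*> b combines the results of a and b with (:).
consMaybe :: Maybe String
consMaybe = (:) <$> Just 'a' <*> Just "bc"     -- Just "abc"

-- If either side has no result, neither does the whole expression.
consNothing :: Maybe String
consNothing = (:) <$> Just 'a' <*> Nothing     -- Nothing

-- The same shape in IO: run the first action, then the second,
-- then cons the two results.
consIO :: IO String
consIO = (:) <$> return 'a' <*> return "bc"
```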

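If the input may be infinite, one modification is required.

Remember when I said you don't need the unsafe* functions if you just want to read all of stdin into a String? Well, if you only want some of the input, you do. If your input might be infinitely long, you definitely do. The final program differs by one import and one word: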

import System.IO
import System.IO.Unsafe
myGetContents :: IO [Char]
myGetContents = do
  reachedEOF <- isEOF
  if reachedEOF
  then return []
  else (:) <$> getChar <*> unsafeInterleaveIO myGetContents

The defining function of lazy IO is unsafeInterleaveIO (from System.IO.Unsafe). It delays the execution of an IO action until its result is needed.
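
To make the "infinite input" case testable without a terminal, here is a sketch that replaces stdin with an endless in-memory source. Everything named here (endless, lazyCollect, firstN) is hypothetical and exists only for the illustration; the EOF check is dropped because this source never ends.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, newIORef, readIORef, modifyIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical endless source: counts its own calls and yields
-- 'a', 'b', 'c', 'a', 'b', 'c', ...
endless :: IORef Int -> IO Char
endless calls = do
  n <- readIORef calls
  modifyIORef calls (+ 1)
  return ("abc" !! (n `mod` 3))

-- Same shape as the final myGetContents, minus the EOF check.
lazyCollect :: IO Char -> IO [Char]
lazyCollect next = (:) <$> next <*> unsafeInterleaveIO (lazyCollect next)

-- Take the first n characters and report how many reads were performed.
firstN :: Int -> IO (String, Int)
firstN n = do
  calls <- newIORef 0
  cs <- lazyCollect (endless calls)
  let prefix = take n cs
  _ <- evaluate (length prefix)    -- force the prefix before counting reads
  used <- readIORef calls
  return (prefix, used)
```

Here firstN 3 should return ("abc",3): only as many reads happen as the consumer demands. Without unsafeInterleaveIO the recursive call would run eagerly, so lazyCollect would never return on this source.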
