12

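I'm trying to write a simple cat program in Haskell. I want to take multiple filenames as arguments and write each file sequentially to STDOUT, but my program only prints one file and exits.

What do I need to do to make my code print every file, not just the first one passed in?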

import Control.Monad as Monad
import System.Exit
import System.IO as IO
import System.Environment as Env

main :: IO ()
main = do
    -- Get the command line arguments
    args <- Env.getArgs

    -- If we have arguments, read them as files and output them
    if (length args > 0) then catFileArray args

    -- Otherwise, output stdin to stdout
    else catHandle stdin

catFileArray :: [FilePath] -> IO ()
catFileArray files = do
    putStrLn $ "==> Number of files: " ++ (show $ length files)
    -- run `catFile` for each file passed in
    Monad.forM_ files catFile

catFile :: FilePath -> IO ()
catFile f = do
    putStrLn ("==> " ++ f)
    handle <- openFile f ReadMode
    catHandle handle

catHandle :: Handle -> IO ()
catHandle h = Monad.forever $ do
    eof <- IO.hIsEOF h
    if eof then do
        hClose h
        exitWith ExitSuccess
    else
        hGetLine h >>= putStrLn

I'm running the code like this:

runghc cat.hs file1 file2

4 Answers

20

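Your problem is that exitWith terminates the whole program. So you can't really use forever to loop through the file, because obviously you don't want to run the function "forever", only until the end of the file. You can rewrite catHandle like this: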

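catHandle :: Handle -> IO ()
catHandle h = do
    eof <- IO.hIsEOF h
    if eof
        -- At EOF, close the handle and return normally to the caller
        then hClose h
        -- Otherwise copy one line and recurse (note the `do` for two statements)
        else do
            hGetLine h >>= putStrLn
            catHandle h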

That is, if we haven't reached EOF yet, we copy one line and recurse to read the next.
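
For reference, here is a minimal sketch of how the whole program from the question might look with that change applied. It is one possible arrangement, not the only one: I've assumed withFile for opening and closing each file (so catHandle no longer closes the handle itself), and kept the stdin fallback from the question.

```haskell
import Control.Monad (forM_)
import System.Environment (getArgs)
import System.IO

-- Copy a handle to stdout line by line, returning normally at EOF.
catHandle :: Handle -> IO ()
catHandle h = do
    eof <- hIsEOF h
    if eof
        then return ()
        else do
            hGetLine h >>= putStrLn
            catHandle h

-- Open one file, copy it, and let withFile close it afterwards.
catFile :: FilePath -> IO ()
catFile f = withFile f ReadMode catHandle

main :: IO ()
main = do
    args <- getArgs
    if null args
        then catHandle stdin        -- no arguments: copy stdin
        else forM_ args catFile     -- otherwise: every file, in order
```

Run it the same way as in the question, e.g. runghc cat.hs file1 file2. Because hGetLine strips the newline and putStrLn adds one back, a file whose last line has no trailing newline will gain one in the output.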

However, this whole approach is overly complicated. You can write cat simply as:

import Control.Monad (forM_)
import System.Environment (getArgs)

main = do
    files <- getArgs
    forM_ files $ \filename -> do
        contents <- readFile filename
        putStr contents

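Thanks to lazy I/O, the whole file contents are not actually loaded into memory; they are streamed to stdout.

If you are comfortable with the operators from Control.Monad, the whole program can be shortened to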

main = getArgs >>= mapM_ (readFile >=> putStr)
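
In case the >=> operator is unfamiliar: it composes two monadic functions left to right. A small sketch of what the one-liner expands to, where catOne and catOne' are just names I made up for illustration:

```haskell
import Control.Monad ((>=>))
import System.Environment (getArgs)

-- Point-free: read the file, then pass its contents on to putStr.
catOne :: FilePath -> IO ()
catOne = readFile >=> putStr

-- The same action with the argument and (>>=) written out.
catOne' :: FilePath -> IO ()
catOne' name = readFile name >>= putStr

main :: IO ()
main = getArgs >>= mapM_ catOne
```

Both definitions should behave identically; which one to use is a matter of taste.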
answered 2012-07-13T17:21:46.697
17

If you install the very helpful conduit package, you can do it this way:

module Main where

import Control.Monad
import Data.Conduit
import Data.Conduit.Binary
import System.Environment
import System.IO

main :: IO ()
main = do files <- getArgs
          forM_ files $ \filename -> do
            runResourceT $ sourceFile filename $$ sinkHandle stdout

This looks similar to shang's suggested simple solution, but using conduits and ByteString instead of lazy I/O and String. Both of those are good things to learn to avoid: lazy I/O frees resources at unpredictable times; String has a lot of memory overhead.

Note that ByteString is intended to represent binary data, not text. In this case we're just treating the files as uninterpreted sequences of bytes, so ByteString is fine to use. If, on the other hand, we were processing the file as text (counting characters, parsing, etc.), we'd want to use Data.Text.
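
If you'd rather not pull in an extra package, the same "bytes, not String, and no lazy I/O" idea can be sketched with strict ByteString from the bytestring package alone. This is a swapped-in technique (a hand-written chunk loop), not conduit; copyChunks is a hypothetical helper name and the 64 KiB chunk size is an arbitrary choice.

```haskell
import qualified Data.ByteString as BS
import Control.Monad (forM_, unless)
import System.Environment (getArgs)
import System.IO

-- Copy a handle to stdout in fixed-size chunks of raw bytes.
-- hGetSome returns an empty ByteString once EOF is reached.
copyChunks :: Handle -> IO ()
copyChunks h = do
    chunk <- BS.hGetSome h 65536   -- chunk size is an arbitrary choice
    unless (BS.null chunk) $ do
        BS.putStr chunk
        copyChunks h

main :: IO ()
main = do
    files <- getArgs
    forM_ files $ \name ->
        -- withBinaryFile closes the handle as soon as copyChunks returns
        withBinaryFile name ReadMode copyChunks
```

Each file is closed at a predictable point, and only one chunk is held in memory at a time.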

EDIT: You can also write it like this:

main :: IO ()
main = getArgs >>= catFiles

type Filename = String

catFiles :: [Filename] -> IO ()
catFiles files = runResourceT $ mapM_ sourceFile files $$ sinkHandle stdout

In the original, sourceFile filename creates a Source that reads from the named file; and we use forM_ on the outside to loop over each argument and run the ResourceT computation over each filename.

However in Conduit you can use monadic >> to concatenate sources; source1 >> source2 is a source that produces the elements of source1 until it's done, then produces those of source2. So in this second example, mapM_ sourceFile files is equivalent to sourceFile file0 >> ... >> sourceFile filen—a Source that concatenates all of the sources.

EDIT 2: And following Dan Burton's suggestion in the comment to this answer:

module Main where

import Control.Monad
import Control.Monad.IO.Class
import Data.ByteString
import Data.Conduit
import Data.Conduit.Binary
import System.Environment
import System.IO

main :: IO ()
main = runResourceT $ sourceArgs $= readFileConduit $$ sinkHandle stdout

-- | A Source that generates the result of getArgs.
sourceArgs :: MonadIO m => Source m String
sourceArgs = do args <- liftIO getArgs
                forM_ args yield

type Filename = String          

-- | A Conduit that takes filenames as input and produces the concatenated 
-- file contents as output.
readFileConduit :: MonadResource m => Conduit Filename m ByteString
readFileConduit = awaitForever sourceFile

In English, sourceArgs $= readFileConduit is a source that produces the contents of the files named by the command line arguments.

answered 2012-07-13T19:23:57.760
5

My first idea was this:

import System.Environment
import System.IO
import Control.Monad
main = getArgs >>= mapM_ (\name -> readFile name >>= putStr)

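It doesn't really fail in a unix-y way, and it doesn't handle stdin or multibyte input, but it is "more Haskell", so I just wanted to share it. Hope it helps.

On the other hand, I think it should handle large files easily without filling up memory, because putStr can already consume and discard the string while the file is still being read.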
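
For what it's worth, here is a rough sketch of how the two gaps mentioned above (unix-y failure and stdin) might be filled in. The error message format and the Bool-returning catFile are my own assumptions, not what the real cat prints.

```haskell
import Control.Exception (IOException, try)
import Control.Monad (unless)
import System.Environment (getArgs)
import System.Exit (exitFailure)
import System.IO

-- Print one file; on failure, report to stderr and return False.
catFile :: FilePath -> IO Bool
catFile name = do
    result <- try (readFile name >>= putStr) :: IO (Either IOException ())
    case result of
        Right () -> return True
        Left err -> do
            hPutStrLn stderr ("cat: " ++ show err)
            return False

main :: IO ()
main = do
    args <- getArgs
    if null args
        then getContents >>= putStr          -- no arguments: copy stdin
        else do
            oks <- mapM catFile args
            unless (and oks) exitFailure     -- nonzero status if any file failed
```

A missing file is reported on stderr, the remaining files are still printed, and the process ends with a failure status.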

answered 2012-07-13T17:29:50.927
5

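catHandle, which is called indirectly from catFileArray, calls exitWith when it reaches the end of the first file. This terminates the program, so no further files are read.

Instead, you should simply return normally from catHandle when the end of the file is reached. That probably means you should not read inside forever.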
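
As background on why nothing after the first file runs: exitWith does not return to its caller, it throws the ExitCode as an exception, and that is also the only way out of a forever loop. A small sketch that makes this visible by catching the exception (probe is just an illustrative name):

```haskell
import Control.Exception (try)
import System.Exit

-- Run exitWith under try, so the thrown ExitCode is captured
-- instead of ending the program.
probe :: IO (Either ExitCode ())
probe = try (exitWith ExitSuccess)

main :: IO ()
main = do
    r <- probe
    case r of
        Left code -> putStrLn ("caught: " ++ show code)
        Right ()  -> putStrLn "exitWith returned normally"
    putStrLn "still running"
```

In the program from the question nobody catches that exception, so it propagates out of forM_ and the process exits after the first file.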

answered 2012-07-13T17:16:55.460