9

I have been reading a little bit about functional programming on the web lately and I think I got a basic idea about the concepts behind it.

I'm curious how everyday programming problems which involve some kind of state are solved in a pure functional programing language.

For example: how would the word count program from the book 'The C programming Language' be implemented in a pure functional language?

Any contributions are welcome as long as the solution is in a pure functional style.

Here's the word count C code from the book:

#include <stdio.h>

#define IN  1 /* inside a word */
#define OUT 0 /* outside a word */

/* count lines, words, and characters in input */
main()
{
  int c, nl, nw, nc, state;

  state = OUT;
  nl = nw = nc = 0;
  while ((c = getchar()) != EOF) {
    ++nc;
    if (c == '\n')
      ++nl;
    if (c == ' ' || c == '\n' || c = '\t')
      state = OUT;
    else if (state == OUT) {
      state = IN;
      ++nw;
    }
  }

  printf("%d %d %d\n", nl, nw, nc);
}
4

11 回答 11

9

基本上,在功能样式中,您需要将获取数据流的 IO 操作与基于当前字符和当前状态的一些状态转换的纯操作分开。

Tikhon 的 Haskell 解决方案非常简洁,但对输入数据执行了三遍,并将导致整个输入包含在内存中,直到计算出结果。您可以增量处理数据,我在下面使用 Text 包而不是其他高级 Haskell 工具来处理数据(这可能会以非 Haskell 人员的可理解性为代价来清理它)。

首先我们有我们的序言:

{-# LANGUAGE BangPatterns #-}

import Data.Text.Lazy as T
import Data.Text.Lazy.IO as TIO

然后我们定义我们的数据结构来保存进程的状态(字符数、单词数和行数以及状态 IN/OUT):

data Counts = Cnt { nc, nl, nw :: !Int
                  , state :: State  }
        deriving (Eq, Ord, Show)

data State = IN | OUT
        deriving (Eq, Ord, Show)

现在我定义一个“零”状态只是为了方便使用。我通常会制作一些辅助函数或使用像 lense 这样的包来Counts简单地增加结构中的每个字段,但是没有这个答案:

zeros :: Counts
zeros = Cnt 0 0 0 OUT

现在我将您的一系列 if/else 语句转换为纯状态机:

op :: Counts -> Char -> Counts
op c '\n' = c { nc = nc c + 1, nw = nw c + 1, nl = nl c + 1, state=OUT }
op c ch | ch == ' ' || ch == '\t' = c { nc = nc c + 1, state=OUT }
        | state c == OUT = c { nc = nc c + 1, nw = nw c + 1, state = IN }
        | otherwise  = c { nc = nc c + 1 }

最后,该main函数只获取输入流并将我们的操作折叠到字符上:

main = do
        contents <- TIO.getContents
        print $ T.foldl' op zeros contents

编辑:你提到不理解语法。这是一个更简单的版本,我将解释:

import Data.Text.Lazy as T
import Data.Text.Lazy.IO as TIO

op (nc, nw, nl, st) ch
  | ch == '\n'              = (nc + 1, nw + 1 , nl + 1, True)
  | ch == ' ' || ch == '\t' = (nc + 1, nw     , nl    , True)
  | st                      = (nc + 1, nw + 1 , nl    , False)
  | otherwise               = (nc + 1, nw     , nl    , st)

main = do
        contents <- TIO.getContents
        print $ T.foldl' op (0,0,0,True) contents
  • 这些import语句使我们可以访问我们使用的getContentsfoldl'函数。

  • op函数使用了一堆守卫 - 部件之类的| ch = '\n'- 基本上就像 C if/elseif/else 系列。

  • 元组( ... , ... , ... , ... )包含我们所有的状态。Haskell 变量是不可变的,因此我们通过在先前变量的值上加一个(或不加)来创建新的元组。

于 2012-04-07T06:35:26.837 回答
6

一种简单的方法是读取输入,然后使用一些简单的函数来获取行/单词/字符数。像这样的东西会起作用:

count :: String -> (Int, Int, Int)
count str = (length $ lines str, length $ words str, length str)

main :: IO ()
main = fmap count getContents >>= print

这不完全相同,但很接近。

这真的很简单。给定一个字符串,我们可以用标准函数把它变成一个行列表,用标准lines函数把它变成一个单词列表words。由于Stringis just [Char]length因此返回字符数。这就是我们获得三个计数的方式。(供参考,length $ lines str与 相同length (lines str)。)

重要的想法是IO——读取输入并将其打印出来——如何与实际逻辑分离。

此外,我们不是逐个字符地跟踪某些状态,而是通过对输入应用简单的函数来获得实际数字。这些函数都只是标准库函数的组合。

于 2012-04-07T01:37:26.780 回答
5

在你的循环中有四个状态变量,nc、nw、nl 和 state,加上下一个字符 c。循环通过循环记住上次循环的nc、nw、nl和状态,并且c通过循环改变每次迭代。想象一下,您将这些变量从循环中取出并将它们放入一个向量中:[state, nc, nw, nl]。然后将循环构造更改为一个接受两个参数的函数,第一个是向量 [state, nc, nw, nl],第二个是 c,并返回一个新向量,其中包含 nc、nw、nl 的更新值和状态。在 C-ish 伪代码中:

f([state, nc, nw, nl], c) {
    ++nc;
    if (c == '\n')
      ++nl;
    if (c == ' ' || c == '\n' || c = '\t')
      state = OUT;
    else if (state == OUT) {
      state = IN;
      ++nw;
    }
    return [state, nc, nw, nl];
}

现在您可以使用向量 [OUT, 0, 0, 0] 和字符串中的第一个字符(例如“hello, world”)调用该函数,它将返回一个新向量 [IN, 1, 0, 0 ]。用这个新向量和第二个字符'e'再次调用 f,它返回 [IN, 2, 0, 0]。对字符串中的其余字符重复此操作,最后一次调用将返回 [IN, 12, 2, 0],与 C 代码打印的值相同。基本思想是将状态变量从循环中取出,将循环的内容变成一个函数,并将状态变量的向量和下一个输入作为参数传递给该函数,并返回一个新的状态向量结果。有一个叫做 reduce 的函数可以做到这一点。

以下是您在 Clojure 中的操作方式(格式化以强调返回的向量):

(defn f [[state nc nw nl] c]
  (let [nl (if (= c \n)(inc nl) nl)]
    (cond
     (or (= c \space)(= c \n)(= c \t)) [:out  (inc nc) nw       nl]
     (= state :out)                    [:in   (inc nc) (inc nw) nl]
     true                              [state (inc nc) nw       nl]
)))

(defn wc [s] (reduce f [:out 0 0 0] s))

(wc "hello, world")

返回(并在 repl 中打印)[:in 12 2 0]

于 2012-04-07T04:42:48.987 回答
5

这是我在 Scheme 中对纯函数式、严格的、单通道、尾递归解决方案的拍摄:

(define (word-count input-port)
  (let loop ((c (read-char input-port))
             (nl 0)
             (nw 0)
             (nc 0)
             (state 'out))
    (cond ((eof-object? c)
           (printf "nl: ~s, nw: ~s, nc: ~s\n" nl nw nc))
          ((char=? c #\newline)
           (loop (read-char input-port) (add1 nl) nw (add1 nc) 'out))
          ((char-whitespace? c)
           (loop (read-char input-port) nl nw (add1 nc) 'out))
          ((eq? state 'out)
           (loop (read-char input-port) nl (add1 nw) (add1 nc) 'in))
          (else
           (loop (read-char input-port) nl nw (add1 nc) state)))))

word-count接收一个input port作为参数;请注意,没有创建其他数据结构(结构、元组、向量等),而是将所有状态保存在参数中。例如,用于计算包含以下内容的文件中的单词:

hello, world

像这样调用过程:

(call-with-input-file "/path/to/file" word-count)
> nl: 0, nw: 2, nc: 12
于 2012-04-08T05:23:03.677 回答
4

提到了 Common Lisp,但它不是一种纯函数式编程语言,它的标准不支持 TCO。单独的实现可以。

尾递归版本,如果编译器支持它:

(defun word-count (&optional (stream *standard-input*))
  (labels ((word-count-aux (in-p chars words lines)
             (case (read-char stream nil :eof)
               (:eof (values chars words lines))
               (#\newline (word-count-aux nil (1+ chars) words (1+ lines)))
               ((#\space #\tab)   (word-count-aux nil (1+ chars) words lines))
               (otherwise (word-count-aux t
                                          (1+ chars)
                                          (if in-p words (1+ words))
                                          lines)))))
    (word-count-aux nil 0 0 0)))

但由于 TCO 不在标准中,便携式版本看起来更像这样:

(defun word-count (&optional (stream *standard-input*)
                   &aux (in-p nil) (chars 0) (words 0) (lines 0) char)
  (loop while (setf char (read-char stream nil nil)) do
        (case char
          (#\newline         (setf in-p nil) (incf lines))
          ((#\space #\tab)   (setf in-p nil))
          (otherwise (unless in-p (incf words)) (setf in-p t)))
        (incf chars))
  (values chars words lines))

以上不再是Functional

我们可以用高阶替换循环stream-map

(defun stream-map (function stream)
  (loop for char = (read-char stream nil nil)
        while char do (funcall function char)))

(defun word-count (&optional (stream *standard-input*)
                   &aux (in-p nil) (chars 0) (words 0) (lines 0) char)
  (stream-map (lambda (char)
                (incf chars)
                (when (eql char #\newline)
                  (incf lines))
                (if (member char '(#\space #\newline #\tab))
                    (setf in-p nil)
                  (unless in-p
                    (incf words)
                    (setf in-p t))))
              stream)
  (values chars words lines))

状态由闭包修改。为了摆脱它,我们可以实现一个stream-reduce.

(defun stream-reduce (function stream &key initial-value)
  (let ((value initial-value))
    (loop for char = (read-char stream nil nil)
          while char
          do (setf value (funcall function value char)))
  value))

(defun word-count (&optional (stream *standard-input*))
  (rest (stream-reduce
          (lambda (state char)
            (destructuring-bind (in-p chars words lines) state
               (case char
                  (#\newline         (list nil (1+ chars) words (1+ lines)))
                  ((#\space #\tab)   (list nil (1+ chars) words lines))
                  (otherwise         (list t
                                           (1+ chars)
                                           (if in-p words (1+ words))
                                           lines)))))
          stream
          :initial-value (list nil 0 0 0))))
于 2012-04-08T08:13:53.240 回答
4

这是该程序的 Scheme 版本,来自我的博客,它实现了整个 Unix 字数统计程序,包括参数和文件处理。关键函数是 wc,它是纯函数式的。它将所有局部变量移动到局部函数的参数中(通过 named-let 定义),这是将命令式循环转换为函数式样式的标准习惯用法。手册页和代码如下所示:

NAME

    wc -- word count

SYNOPSIS

    wc [ -lwc ] [ name ... ]

DESCRIPTION

    Wc counts lines, words and characters in the named files,
    or in the standard input if no name appears. A word is a
    maximal string of characters delimited by spaces, tabs or
    newlines.

    If the optional argument is present, just the specified
    counts (lines, words, or characters) are selected by the
    letters l, w or c.

#! /usr/bin/scheme --script

(define l-flag #t)
(define w-flag #t)
(define c-flag #t)

(define (update-flags fs)
  (if (not (member #\l fs)) (set! l-flag #f))
  (if (not (member #\w fs)) (set! w-flag #f))
  (if (not (member #\c fs)) (set! c-flag #f)))

(define (put-dec n width)
  (let* ((n-str (number->string n)))
    (display (make-string (- width (string-length n-str)) #\space))
    (display n-str)))

(define (wc)
  (let loop ((inword #f) (c (read-char)) (ls 0) (ws 0) (cs 0))
    (cond ((eof-object? c) (values ls ws cs))
          ((char=? c #\newline)
            (loop #f (read-char) (add1 ls) ws (add1 cs)))
          ((not (member c '(#\space #\newline #\tab)))
            (if inword
                (loop #t (read-char) ls ws (add1 cs))
                (loop #t (read-char) ls (add1 ws) (add1 cs))))
          (else (loop #f (read-char) ls ws (add1 cs))))))

(define (main args)
  (when (and (pair? args) (char=? (string-ref (car args) 0) #\-))
        (update-flags (cdr (string->list (car args))))
        (set! args (cdr args)))
  (if (null? args)
      (let-values (((ls ws cs) (wc)))
        (when l-flag (display ls) (display " "))
        (when w-flag (display ws) (display " "))
        (when c-flag (display cs) (display " "))
        (newline))
      (let loop ((args args) (l-tot 0) (w-tot 0) (c-tot 0))
        (if (null? args)
            (begin (when l-flag (put-dec l-tot 12))
                   (when w-flag (put-dec w-tot 12))
                   (when c-flag (put-dec c-tot 12)))
            (with-input-from-file (car args)
              (lambda ()
                (let-values (((ls ws cs) (wc)))
                  (when l-flag (put-dec ls 12))
                  (when w-flag (put-dec ws 12))
                  (when c-flag (put-dec cs 12))
                  (display " ") (display (car args)) (newline)
                  (loop (cdr args) (+ l-tot ls) (+ w-tot ws) (+ c-tot cs)))))))))     

(main (cdr (command-line)))
于 2012-04-08T14:35:25.593 回答
2

在 Haskell 中使用严格的 IO 而不是惰性的。只做单词,但您可以在此基础上轻松实现字符和线条。需要textconduit包:

module Main
where

import Control.Applicative
import Data.Conduit
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.List as CL
import qualified Data.Conduit.Text as CT
import qualified Data.Text as T
import System.Environment

main :: IO ()
main = do args <- getArgs
          print <$> (runResourceT $
            CB.sourceFile (args !! 0)
                $$  CB.lines
                =$= CT.decode CT.utf8
                =$= CL.map T.words
                =$  CL.fold (\acc words -> acc + length words) 0)
于 2012-04-08T15:17:51.797 回答
2

这是一个基于此处发布的 Clojure 示例的解决方案,但在 CL 中使用递归。

(defstruct (state (:constructor make-state (state chars words lines)))
  state chars words lines)


(defun wc (state stream)
  (symbol-macrolet ((s (state-state state))
                    (c (state-chars state))
                    (w (state-words state))
                    (l (state-lines state)))

    (case (read-char stream nil :eof)
      (:eof state)

      (#\Newline (wc (make-state :out (1+ c) w (1+ l)) stream))
      (#\Space   (wc (make-state :out (1+ c) w     l)  stream))

      (t (if (eq s :out)
             (wc (make-state :in (1+ c) (1+ w) l) stream)
             (wc (make-state :in (1+ c)     w  l) stream))))))


(with-input-from-string (stream "Hello Functional Programming World")
  (wc (make-state :out 0 0 0) stream))

;;; => #S(STATE :STATE :IN :CHARS 34 :WORDS 4 :LINES 0)

于 2012-04-07T13:15:53.263 回答
2

我相信你可以写得更优雅一些,同时仍然只对输入进行一次迭代,但你需要让 GHC 做更多的工作,当然使用-O2.

我还没有编译这段代码,更不用说它的速度与 Thomas DuBuisson 的答案,但这应该表明基本方向。

{-# LANGUAGE BangPatterns #-}
import Data.List

wordcount = snd . foldl' go (False,0) 
  where  go (!b,!n) !c =  if  elem c [' ','\t','\n']  then  (False,n)
              else  (True, n + if b then 0 else 1)

linecount = foldl' go 0
  where  go !n !c = n + if c == '\n' then 1 else 0

main = interact $ show . go
  where  go x = (linecount x, wordcount x, foldl' (\!n _ ->n+1) 0 x)

如果我正确理解流融合,那么 GHC 应该 inlinewordcountlinecountinto main,将三个foldl'命令合并为一个,希望如此,然后开始重新安排 if 检查。我希望它是内联的elemfoldl'当然也是。

如果没有,您当然可以强制内联并可能创建一个简单的融合规则,但也许默认值就足够了。或者,也许一些简单的改变会产生预期的效果。

顺便说一句,我写这篇文章foldl' (\n _ ->n+1) 0 x只是因为我听说过不好的故事length,但也许length效果很好,另一个值得分析的变化。

于 2012-04-07T23:07:20.660 回答
1

这是一个 Haskell 实现,我试图与原始 C 程序所遵循的方法保持接近。迭代通常变成折叠操作,包含状态的变量最终作为传递给操作的第一个参数fold

-- Count characters, words, and lines in an input string.
wordCount::String->(Int, Int, Int)
wordCount str = (c,w,l)
  where (inWord,c,w,l) = foldl op (False,0,0,1) str
          where op (inWord,c,w,l) next | next == '\n' = (False,c+1,w,l+1)
                                       | next == '\t' || next == ' ' = (False,c+1,w,l)
                                       | inWord == False = (True,c+1,w+1,l)
                                       | otherwise = (True,c+1,w,l)

main = interact $ show . wordCount
于 2012-04-16T19:21:20.497 回答
1

这是 Typed Racket 中使用matchfor循环宏的版本:

(: word-count : Input-Port -> Void)
(define (word-count in)
  (define-values (nl nw nc st)
    (for/fold: ([nl : Integer 0] [nw : Integer 0] [nc : Integer 0] 
                [state : (U 'in 'out) 'out])
      ([c (in-input-port-chars in)])
      (match* (c state)
        [(#\newline _) (values (add1 nl) nw (add1 nc) 'out)]
        [((? char-whitespace?) _)
         (values (add1 nl) nw (add1 nc) 'out)]
        [(_ 'out) (values nl (add1 nw) (add1 nc) 'in)]
        [(_ _) (values nl nw (add1 nc) state)])))
  (printf "nl: ~s, nw: ~s, nc: ~s\n" nl nw nc))
于 2012-04-10T19:43:12.327 回答