file-io - Scheme help - File statistics

Question

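So I have to finish a project in Scheme and I'm pretty stuck. Basically, the program opens a file and outputs statistics about it. Right now I can count the number of characters, but I also need to count the number of lines and words. I'm only trying to tackle that case for now, but eventually I will also have to accept two files: the first is a text file, such as a book, and the second is a list of words, and I have to count how many times those words appear in the first file. Obviously I'll have to work with lists, but I would appreciate some help on where to go from here. This is the code I have so far (and it works):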

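;; Reads srcf one character at a time; at end of file, prints
;; "lines words chars". Only the character count is updated so far.
(define filestats
  (lambda (srcf wordcount linecount charcount)
    (if (eof-object? (peek-char srcf))
        (begin
          (close-port srcf)
          (display linecount)
          (display " ")
          (display wordcount)
          (display " ")
          (display charcount)
          (newline))
        (begin
          (read-char srcf)
          ;; Pass the word and line counts through unchanged rather than
          ;; resetting them to 0, so they survive once they are counted.
          (filestats srcf wordcount linecount (+ charcount 1))))))

;; Opens the file named src and starts all three counters at 0.
(define filestatistics
  (lambda (src)
    (let ((file (open-input-file src)))
      (filestats file 0 0 0))))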

score 0 · Accepted Answer

The word-count algorithm in Scheme has been explained before on Stack Overflow, for example here (scroll up to the top of that page to see the equivalent program in C):

(define (word-count input-port)
  (let loop ((c (read-char input-port))
             (nl 0)
             (nw 0)
             (nc 0)
             (state 'out))
    (cond ((eof-object? c)
           (printf "nl: ~s, nw: ~s, nc: ~s\n" nl nw nc))
          ((char=? c #\newline)
           (loop (read-char input-port) (add1 nl) nw (add1 nc) 'out))
          ((char-whitespace? c)
           (loop (read-char input-port) nl nw (add1 nc) 'out))
          ((eq? state 'out)
           (loop (read-char input-port) nl (add1 nw) (add1 nc) 'in))
          (else
           (loop (read-char input-port) nl nw (add1 nc) state)))))

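The procedure receives an input port as its parameter, so it can be applied to a file. Note that to count words and lines you need to test whether the current character is a newline or another whitespace character. You also need an extra flag (called state in the code) to keep track of where a word starts and ends: it is 'out between words and 'in inside a word, and the word count goes up only on the transition from 'out to 'in.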
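To make the role of the flag easier to check, here is a minimal sketch of the same loop that returns the three counts as a list instead of printing them. The name count-stats is hypothetical (it is not part of the answer above), and the sketch uses (+ n 1) rather than add1 so that it only relies on standard procedures; it should behave the same in Racket and in most R7RS implementations.

```scheme
;; count-stats is a hypothetical variant of word-count: same state
;; machine, but it returns (lines words chars) instead of printing.
(define (count-stats input-port)
  (let loop ((c (read-char input-port))
             (nl 0)
             (nw 0)
             (nc 0)
             (state 'out))
    (cond ((eof-object? c)
           (list nl nw nc))
          ;; A newline ends the current line and any word in progress.
          ((char=? c #\newline)
           (loop (read-char input-port) (+ nl 1) nw (+ nc 1) 'out))
          ;; Any other whitespace only ends the word in progress.
          ((char-whitespace? c)
           (loop (read-char input-port) nl nw (+ nc 1) 'out))
          ;; First non-whitespace character after 'out: a new word starts.
          ((eq? state 'out)
           (loop (read-char input-port) nl (+ nw 1) (+ nc 1) 'in))
          ;; Still inside the same word: only the character count grows.
          (else
           (loop (read-char input-port) nl nw (+ nc 1) state)))))

;; Trying it on a string port avoids needing a file on disk.
(count-stats (open-input-string "hello world\nfoo\n")) ; => (2 3 16)
```

For a real file, something like (call-with-input-file "book.txt" count-stats) should work, where "book.txt" is only a placeholder file name.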

score 0 · Accepted Answer

How about "tokenizing" the file into a list of lines, where a line is a list of words and a word is a list of characters?

(define (tokenize file)
  (with-input-from-file file
    (lambda ()
      (let reading ((lines '()) (words '()) (chars '()))
        (let ((char (read-char)))
          (if (eof-object? char)
              (reverse lines)
              (case char
                ((#\newline) (reading (cons (reverse (cons (reverse chars) words)) lines) '() '()))
                ((#\space)   (reading lines (cons (reverse chars) words) '()))
                (else        (reading lines words (cons char chars))))))))))

Once you have this, the rest is easy.

> (tokenize "foo.data")
(((#\a #\b #\c) (#\d #\e #\f))
 ((#\1 #\2 #\3) (#\x #\y #\z)))
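As a rough sketch of what "the rest" might look like, the code below works on data in the shape tokenize returns: the line count is the length of the outer list, the word count is the length of the flattened word list, and occurrences of the words from a word list can be counted by comparing strings. The names sample, all-words, occurrences and count-all are hypothetical helpers introduced only for this illustration, and the linear scan per word is an assumption made for simplicity, not the answer's method.

```scheme
;; Sample data in the shape tokenize returns (copied from the transcript).
(define sample
  '(((#\a #\b #\c) (#\d #\e #\f))
    ((#\1 #\2 #\3) (#\x #\y #\z))))

;; Flatten the lines into one list of word strings, skipping the
;; empty words that consecutive spaces produce.
(define (all-words lines)
  (let outer ((ls lines) (acc '()))
    (if (null? ls)
        (reverse acc)
        (let inner ((ws (car ls)) (acc acc))
          (cond ((null? ws) (outer (cdr ls) acc))
                ((null? (car ws)) (inner (cdr ws) acc))
                (else (inner (cdr ws)
                             (cons (list->string (car ws)) acc))))))))

;; Count how often the string target occurs in a list of word strings.
(define (occurrences target words)
  (let loop ((ws words) (n 0))
    (cond ((null? ws) n)
          ((string=? (car ws) target) (loop (cdr ws) (+ n 1)))
          (else (loop (cdr ws) n)))))

;; Pair every word from the word list with its number of occurrences.
(define (count-all targets words)
  (map (lambda (target) (cons target (occurrences target words)))
       targets))

(length sample)                                ; => 2 lines
(length (all-words sample))                    ; => 4 words
(count-all '("abc" "nope") (all-words sample)) ; => (("abc" . 1) ("nope" . 0))
```

For a whole book, scanning the word list once per target gets slow; a hash table keyed by word would be the usual next step, but its interface differs between Scheme implementations.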
