parsing - Parsing a tab-delimited string

Question

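I'm having some trouble figuring out how to split a tab-delimited string into chunks of data. For example, suppose I'm reading a text file that looks like this: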

a1     b1     c1     d1     e1
a2     b2     c2     d2     e2

I read the first line of my file and get the string

"a1     b1     c1     d1     e1"

I want to split this into 5 variables a, b, c, d and e, or create a list (a b c d e). Any thoughts?

Thanks.

score 2 · Accepted Answer

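Try concatenating parentheses onto the front and back of your input string, then using read-from-string (I assume you're using Common Lisp, since you tagged your question clisp).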

(defparameter *str* "a1   b1      c1      d1      e2")
;; The reader returns a list of symbols, so this prints (A1 B1 C1 D1 E2).
(print (read-from-string (concatenate 'string "(" *str* ")")))
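
One caveat with this approach: the reader hands back symbols rather than strings, and it will evaluate `#.` forms in the input unless `*read-eval*` is disabled. A minimal sketch of a more defensive variant follows; the name `read-fields` is made up for illustration and is not part of the answer above.

```lisp
;; READ-FIELDS is a hypothetical helper name used only for this sketch.
(defun read-fields (line)
  "Read the whitespace-separated tokens of LINE as Lisp objects."
  (with-standard-io-syntax
    ;; WITH-STANDARD-IO-SYNTAX binds *READ-EVAL* to T, so turn it off explicitly.
    (let ((*read-eval* nil))
      (read-from-string (concatenate 'string "(" line ")")))))

;; The tokens come back as symbols; convert them if strings are needed.
(mapcar #'string-downcase
        (read-fields (format nil "a1~Cb1~Cc1" #\Tab #\Tab)))
;; => ("a1" "b1" "c1")
```

This still only suits trusted input whose fields happen to be valid Lisp tokens; a field containing a parenthesis or a quote will confuse the reader.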

score 2 · Accepted Answer

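There is another way to go about it (perhaps a bit more robust). You could also easily modify it so that you can `setf` a character in the string once the callback has been called, but I didn't do that because it seems you don't need that ability. Also, in that latter case, I'd rather use a macro.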

(defun mapc-words (function vector
                  &aux (whites '(#\Space #\Tab #\Newline #\Rubout)))
  "Iterates over string `vector' and calls the `function'
with the non-white characters collected so far.
The white characters are, by default: #\Space, #\Tab
#\Newline and #\Rubout.
`mapc-words' will short-circuit when `function' returns false."
  (do ((i 0 (1+ i))
       (start 0)
       (len 0))
      ((= i (1+ (length vector))))
    (if (or (= i (length vector)) (find (aref vector i) whites))
        (if (> len 0)
            (if (not (funcall function (subseq vector start i)))
                (return-from mapc-words)
                (setf len 0 start (1+ i)))
            (incf start))
        (incf len)))
  vector)

(mapc-words
 #'(lambda (word)
     (not
      (format t "word collected: ~s~&" word)))
 "a1     b1     c1     d1     e1
a2     b2     c2     d2     e2")

;; word collected: "a1"
;; word collected: "b1"
;; word collected: "c1"
;; word collected: "d1"
;; word collected: "e1"
;; word collected: "a2"
;; word collected: "b2"
;; word collected: "c2"
;; word collected: "d2"
;; word collected: "e2"
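
If all that is wanted is a list of the words, as the question asks, the same scan can be written without a callback. The following is only a sketch under the same assumption about which characters count as whitespace; the name `split-on-whitespace` is illustrative and does not come from the answer.

```lisp
;; SPLIT-ON-WHITESPACE is a hypothetical name used only for this sketch.
(defun split-on-whitespace (string
                            &optional (whites '(#\Space #\Tab #\Newline #\Rubout)))
  "Return a list of the maximal runs of non-white characters in STRING."
  (flet ((whitep (char) (member char whites)))
    (loop with end = 0
          ;; Skip any whitespace to find where the next word begins.
          for start = (position-if-not #'whitep string :start end)
          while start
          ;; The word ends at the next white character or at the end of STRING.
          do (setf end (or (position-if #'whitep string :start start)
                           (length string)))
          collect (subseq string start end))))

;; Binding the five fields of one line to five variables:
(destructuring-bind (a b c d e)
    (split-on-whitespace "a1     b1     c1     d1     e1")
  (list a b c d e))
;; => ("a1" "b1" "c1" "d1" "e1")
```

`destructuring-bind` signals an error if a line does not contain exactly five fields, which may or may not be the behaviour you want.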

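If you want to modify the string as you read it, you could use the following example macro, but I'm not entirely happy with it, so maybe someone will come up with a better variant.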

(defmacro with-words-in-string
    ((word start end
           &aux (whites '(#\Space #\Tab #\Newline #\Rubout)))
     s
     &body body)
  `(do ((,end 0 (1+ ,end))
        (,start 0)
        (,word)
        (len 0))
       ((= ,end (1+ (length ,s))))
     (if (or (= ,end (length ,s)) (find (aref ,s ,end) ',whites))
         (if (> len 0)
             (progn
               (setf ,word (subseq ,s ,start ,end))
               ,@body
               (setf len 0 ,start (1+ ,end)))
             (incf ,start))
         (incf len))))

(with-words-in-string (word start end)
    "a1     b1     c1     d1     e1
a2     b2     c2     d2     e2"
  (format t "word: ~s, start: ~s, end: ~s~&" word start end))

score 0 · Accepted Answer

Assuming the separators are tabs (not spaces), this will create a list:

(defun tokenize-tabbed-line (line)
  (loop 
     for start = 0 then (+ space 1)
     for space = (position #\Tab line :start start)
     for token = (subseq line start space)
     collect token until (not space)))

Which results in the following:

CL-USER> (tokenize-tabbed-line "a1  b1  c1  d1  e1")
("a1" "b1" "c1" "d1" "e1")
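
To apply the same idea to every line of the input, one option is to read lines with `read-line` and split each one at the tab characters. The sketch below is self-contained, so it repeats the splitting loop under the hypothetical name `split-on-tab`; `read-tabbed-lines` is likewise an illustrative name. It reads from a string stream so that it runs without a file.

```lisp
;; SPLIT-ON-TAB and READ-TABBED-LINES are hypothetical names for this sketch.
(defun split-on-tab (line)
  "Split LINE at every tab character, keeping empty fields."
  (loop for start = 0 then (1+ tab)
        for tab = (position #\Tab line :start start)
        collect (subseq line start tab)
        while tab))

(defun read-tabbed-lines (stream)
  "Return one list of fields for each line read from STREAM."
  (loop for line = (read-line stream nil)   ; NIL at end of file
        while line
        collect (split-on-tab line)))

(with-input-from-string
    (in (format nil "a1~Cb1~Cc1~%a2~Cb2~Cc2" #\Tab #\Tab #\Tab #\Tab))
  (read-tabbed-lines in))
;; => (("a1" "b1" "c1") ("a2" "b2" "c2"))
```

For a real file, the string stream would be replaced by `with-open-file` on whatever path holds the data. Note that two tabs in a row yield an empty string as a field, unlike the whitespace-skipping approaches above.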
