I am trying to detect file encodings with LispWorks.

LispWorks should be capable of this; see External Formats and File Streams.

[Note: details based on comments by @rainer-joswig and @svante]

system:*file-encoding-detection-algorithm* is set to its default value,

(setf system:*file-encoding-detection-algorithm*
      '(find-filename-pattern-encoding-match
       find-encoding-option
       detect-utf32-bom
       detect-unicode-bom
       detect-utf8-bom
       specific-valid-file-encoding
       locale-file-encoding))

and,

;; Specify the correct characters
(lw:set-default-character-element-type 'cl:character)

Here are some files that can be used to verify the behavior:

UNICODE and LATIN-1 are detected correctly:

;; UNICODE
;; http://www.humancomp.org/unichtm/tongtwst.htm
(with-open-file (ss "/tmp/tongtwst.htm")
  (stream-external-format ss))
;; => (:UNICODE :LITTLE-ENDIAN T :EOL-STYLE :CRLF)

;; LATIN-1
(with-open-file (ss "/tmp/windows-1252-2000.ucm")
  (stream-external-format ss))
;; => (:LATIN-1 :EOL-STYLE :LF)

Detecting UTF-8 does not work out of the box:

;; UTF-8 encoding
;; http://www.humancomp.org/unichtm/tongtwst8.htm
(with-open-file (ss "/tmp/tongtws8.htm")
  (stream-external-format ss))
;; => (:LATIN-1 :EOL-STYLE :CRLF)

Adding UTF-8 to *specific-valid-file-encodings* makes it work:

(pushnew :utf-8 system:*specific-valid-file-encodings*)
;; system:*specific-valid-file-encodings*
;; => (:UTF-8)

;; http://www.humancomp.org/unichtm/tongtwst8.htm
(with-open-file (ss "/tmp/tongtws8.htm")
  (stream-external-format ss))
;; => (:UTF-8 :EOL-STYLE :CRLF)

But now the same file that was detected as LATIN-1 above is detected as UTF-8:

(with-open-file (ss "/tmp/windows-1252-2000.ucm")
  (stream-external-format ss))
;; => (:UTF-8 :EOL-STYLE :LF)
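
Whether this counts as a misdetection depends on the file's content: any file that contains only 7-bit ASCII bytes is also valid UTF-8, so a detector that checks for validity cannot tell the two apart. Below is a minimal sketch for checking this in portable Common Lisp. The function names are hypothetical helpers, not LispWorks API, and I have not confirmed what the .ucm file actually contains.

```lisp
;; Hypothetical helpers (not part of LispWorks) to test whether data is pure ASCII.

(defun ascii-only-p (bytes)
  "Return true if every octet in the sequence BYTES is below 128."
  (every (lambda (b) (< b 128)) bytes))

(defun file-ascii-only-p (path)
  "Read the whole file at PATH as octets and test it with ASCII-ONLY-P."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (let ((buffer (make-array (file-length in)
                              :element-type '(unsigned-byte 8))))
      (read-sequence buffer in)
      (ascii-only-p buffer))))
```

If (file-ascii-only-p "/tmp/windows-1252-2000.ucm") returns true, then reporting that file as UTF-8 is consistent with its bytes, even though LATIN-1 would be equally valid.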

Pushing LATIN-1 onto *specific-valid-file-encodings* as well:

(pushnew :latin-1 system:*specific-valid-file-encodings*)
;; system:*specific-valid-file-encodings*
;; => (:LATIN-1 :UTF-8)

;; This one works again
(with-open-file (ss "/tmp/windows-1252-2000.ucm")
  (stream-external-format ss))
;; => (:LATIN-1 :EOL-STYLE :LF)

;; But this one, which was properly detected as `UTF-8`,
;; is now detected as `LATIN-1`, *which is wrong.*
(with-open-file (ss "/tmp/tongtws8.htm")
  (stream-external-format ss))
;; => (:LATIN-1 :EOL-STYLE :CRLF)
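
One thing that may matter here is the order of the list: pushnew prepends, so :latin-1 ends up before :utf-8, and every byte sequence is valid Latin-1. The sketch below models this in portable Common Lisp. It assumes that the detector tries the entries in order and takes the first encoding for which the data is valid; that is my assumption, not confirmed LispWorks behavior. All function and variable names are hypothetical, and the UTF-8 check is simplified (it checks lead and continuation bytes only, not overlong forms or surrogates).

```lisp
;; A model of "first valid encoding wins"; not LispWorks code.

(defun valid-utf-8-p (bytes)
  "Return true if the octet vector BYTES is structurally valid UTF-8."
  (let ((i 0)
        (n (length bytes)))
    (loop while (< i n)
          do (let* ((b (aref bytes i))
                    ;; Number of continuation bytes expected after lead byte B.
                    (extra (cond ((< b #x80) 0)
                                 ((<= #xC2 b #xDF) 1)
                                 ((<= #xE0 b #xEF) 2)
                                 ((<= #xF0 b #xF4) 3)
                                 (t (return-from valid-utf-8-p nil)))))
               ;; A sequence cut off at the end of the data is invalid.
               (when (>= (+ i extra) n)
                 (return-from valid-utf-8-p nil))
               ;; Every continuation byte must look like 10xxxxxx.
               (loop for k from 1 to extra
                     unless (= (logand (aref bytes (+ i k)) #xC0) #x80)
                       do (return-from valid-utf-8-p nil))
               (incf i (1+ extra))))
    t))

(defun valid-latin-1-p (bytes)
  "Every octet maps to a Latin-1 character, so any data is valid."
  (declare (ignore bytes))
  t)

(defparameter *validators*
  (list (cons :utf-8 #'valid-utf-8-p)
        (cons :latin-1 #'valid-latin-1-p)))

(defun first-valid-encoding (bytes candidates)
  "Return the first name in CANDIDATES whose validator accepts BYTES."
  (find-if (lambda (name)
             (let ((validator (cdr (assoc name *validators*))))
               (and validator (funcall validator bytes))))
           candidates))

(defparameter *utf-8-sample* (vector #x63 #x61 #x66 #xC3 #xA9))  ; "café" as UTF-8
(defparameter *latin-1-sample* (vector #x63 #x61 #x66 #xE9))     ; "café" as Latin-1

(first-valid-encoding *utf-8-sample* '(:latin-1 :utf-8))    ; => :LATIN-1
(first-valid-encoding *utf-8-sample* '(:utf-8 :latin-1))    ; => :UTF-8
(first-valid-encoding *latin-1-sample* '(:utf-8 :latin-1))  ; => :LATIN-1
```

If the real detector behaves like this model, setting the list explicitly to (:utf-8 :latin-1) instead of using pushnew might give the intended results, but I have not verified that.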

What am I doing wrong?

How can I detect file encodings correctly with LispWorks?
