emacs - Converting Unicode (UTF-8) code points to bytes

Question

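I've been searching the C sources, but I can't find this function, and I'd really rather not write one myself, since it absolutely must be in there somewhere.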

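To elaborate: a Unicode code point is written as U+######## - that part is easy to get. What I need is the form in which the character is written to a file (for example). The code point is converted to bytes: 7 bits of the rightmost byte are written to the first byte, then 6 bits of the next are written to the next byte, and so on. Emacs certainly knows how to do this, but I can't find a way to get the bytes of a UTF-8 encoded string out of it as a sequence of bytes (8 bits each).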
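For reference, UTF-8 places the high-order bits of the code point in the first byte and fills each following byte with 6 bits behind a `10` prefix. Below is a minimal hand-written sketch of that layout. The name `my-utf8-bytes` is hypothetical and not part of Emacs, and the sketch assumes a valid code point (it does not reject surrogates or values above U+10FFFF).

```elisp
;; Hypothetical helper, not part of Emacs: encode one code point by hand.
(defun my-utf8-bytes (cp)
  "Return the UTF-8 bytes of code point CP as a list of integers."
  (cond
   ((< cp #x80)                          ; 0xxxxxxx
    (list cp))
   ((< cp #x800)                         ; 110xxxxx 10xxxxxx
    (list (logior #xC0 (ash cp -6))
          (logior #x80 (logand cp #x3F))))
   ((< cp #x10000)                       ; 1110xxxx 10xxxxxx 10xxxxxx
    (list (logior #xE0 (ash cp -12))
          (logior #x80 (logand (ash cp -6) #x3F))
          (logior #x80 (logand cp #x3F))))
   (t                                    ; 11110xxx plus three 10xxxxxx bytes
    (list (logior #xF0 (ash cp -18))
          (logior #x80 (logand (ash cp -12) #x3F))
          (logior #x80 (logand (ash cp -6) #x3F))
          (logior #x80 (logand cp #x3F))))))

(my-utf8-bytes #x125)   ; U+0125 (ĥ) gives (196 165)
```

In practice there is little reason to hand-roll this, since Emacs already ships an encoder. The sketch only makes the bit layout visible.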

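Functions such as get-byte or multibyte-char-to-unibyte only work for characters that can be represented in no more than 8 bits. I need the same thing as get-byte, but for multibyte characters, so that I get back either a vector of integers 0..255 or a single long integer 0..2^32 instead of one integer 0..255.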

Edit

In case anyone needs this later:

(defun haxe-string-to-x-string (s)
  "Return S with every character written as \\xNN byte escapes."
  (with-output-to-string
    (dotimes (i (length s))
      (let ((ch (aref s i)))
        (if (> 0 (multibyte-char-to-unibyte ch))
            ;; No single-byte form: emit each byte of the UTF-8 encoding.
            (let ((bytes (encode-coding-string (char-to-string ch) 'utf-8)))
              (dotimes (j (length bytes))
                (princ (format "\\x%02x" (aref bytes j)))))
          ;; Fits in one byte: emit the character code directly.
          (princ (format "\\x%02x" ch)))))))

score 5 · Accepted Answer

encode-coding-string is probably what you're looking for:

*** Welcome to IELM ***  Type (describe-mode) for help.
ELISP> (encode-coding-string "eĥoŝanĝo ĉiuĵaŭde" 'utf-8)
"e\304\245o\305\235an\304\235o \304\211iu\304\265a\305\255de"

It returns a string, but you can access the individual bytes with aref:

ELISP> (aref (encode-coding-string "eĥoŝanĝo ĉiuĵaŭde" 'utf-8) 1)
196
ELISP> (format "%o" 196)
"304"

Or, if you don't mind using cl functions, concatenate is your friend:

ELISP> (concatenate 'list (encode-coding-string "eĥoŝanĝo ĉiuĵaŭde" 'utf-8))
(101 196 165 111 197 157 97 110 196 157 111 32 196 137 105 117 196 181 97 197 173 100 101)
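The same list can be had without cl: `append` with a final `nil` and `vconcat` are built in, and both work on the unibyte string that encode-coding-string returns. The sketch below also folds the bytes into one integer, the other form the question mentions. The names `my-utf8-byte-list` and `my-utf8-packed` are hypothetical, and the packed form assumes the Emacs build's integers are wide enough for a four-byte sequence.

```elisp
;; Hypothetical helpers, named here only for illustration.
(defun my-utf8-byte-list (str)
  "Return the UTF-8 bytes of STR as a list of integers."
  (append (encode-coding-string str 'utf-8) nil))

(defun my-utf8-packed (str)
  "Return the UTF-8 bytes of STR folded into one integer, first byte highest."
  (let ((packed 0))
    (dolist (byte (my-utf8-byte-list str) packed)
      (setq packed (logior (ash packed 8) byte)))))

(my-utf8-byte-list (string #x125 ?o))                      ; (196 165 111)
(vconcat (encode-coding-string (string #x125 ?o) 'utf-8))  ; [196 165 111]
(format "%X" (my-utf8-packed (string #x125)))              ; "C4A5"
```

In recent Emacs versions the cl package is deprecated in favor of cl-lib, where the function is spelled cl-concatenate.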
