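Given a "corrupted" file with mixed encodings (e.g. utf-8 and latin-1), how do I configure Emacs to "project" all of its symbols onto a single encoding (e.g. utf-8) when saving the file?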
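I wrote the following function to automate part of the cleanup, but I suspect I could find somewhere the information mapping the symbol "é" in one encoding to "é" in utf-8, in order to improve this function (or perhaps someone has already written such a function).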
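(defun jyby/cleanToUTF ()
  "Cleaning to UTF: delete known stray characters from the whole buffer."
  (interactive)
  (save-excursion
    ;; Restart from the top for each pattern so the whole buffer is
    ;; cleaned, not only the text after point.
    (dolist (pattern '("अ" "आ" "ॆ"))
      (goto-char (point-min))
      (while (re-search-forward pattern nil t)
        (replace-match "")))))
;; Bind the cleanup command to F11.
(global-unset-key [f11])
(global-set-key [f11] 'jyby/cleanToUTF)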
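The mapping asked about above (a mis-decoded sequence such as "Ã©" back to "é") does not necessarily need a lookup table. A minimal sketch, assuming the damage is exactly "UTF-8 bytes that were decoded as Latin-1": re-encode the text as Latin-1 to recover the original bytes, then decode those bytes as UTF-8. The names `my/fix-double-encoded` and `my/fix-double-encoded-region` are hypothetical helpers, not part of Emacs; `encode-coding-string` and `decode-coding-string` are standard functions.

```elisp
;; Hypothetical helper (not part of Emacs): undo ONE round of
;; "UTF-8 bytes decoded as Latin-1" for a string.
;; Assumption: every character of TEXT exists in Latin-1.
(defun my/fix-double-encoded (text)
  "Return TEXT re-encoded as Latin-1 bytes and decoded again as UTF-8."
  (decode-coding-string (encode-coding-string text 'latin-1) 'utf-8))

;; Hypothetical command: apply the repair to the selected region only.
(defun my/fix-double-encoded-region (start end)
  "Replace the text between START and END with its repaired form."
  (interactive "r")
  (let ((fixed (my/fix-double-encoded (buffer-substring start end))))
    (save-excursion
      (delete-region start end)
      (goto-char start)
      (insert fixed))))

;; Usage: (my/fix-double-encoded "Ã©")  ;; => "é"
```

This should only be applied to the damaged parts of a buffer: an "é" that is already correct encodes to a single Latin-1 byte that is not valid UTF-8 on its own, so it would come back as a raw byte. If a file has been through several rounds of damage, the function may need to be applied once per round.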
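I have many files "corrupted" by mixed encodings (from copy-pasting out of a browser with a misconfigured font), which produce the error below. I sometimes clean them by hand, searching for each problematic symbol and replacing it with "" or the appropriate character, or, more quickly, by specifying "utf-8-unix" as the coding system (which brings up the same message the next time I edit and save the file). This has become a real problem: in any such corrupted file, every accented character gets replaced by a sequence that doubles in size at each save, which ends up doubling the size of the file. I am using GNU Emacs 24.2.1.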
These default coding systems were tried to encode text
in the buffer `test_accents.org':
(utf-8-unix (30 . 4194182) (33 . 4194182) (34 . 4194182) (37
. 4194182) (40 . 4194181) (41 . 4194182) (42 . 4194182) (45
. 4194182) (48 . 4194182) (49 . 4194182) (52 . 4194182))
However, each of them encountered characters it couldn't encode:
utf-8-unix cannot encode these: ...
Click on a character (or switch to this window by `C-x o'
and select the characters by RET) to jump to the place it appears,
where `C-u C-x =' will give information about it.
Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
to remove or modify the problematic characters,
or specify any other coding system (and risk losing
the problematic characters).
raw-text emacs-mule no-conversion
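The pairs in the message above appear to be (POSITION . CHARACTER-CODE). Emacs represents a byte it could not decode as a character in the range #x3FFF80..#x3FFFFF, that is #x3FFF00 plus the byte value, so 4194182 would stand for the raw byte #x86 and 4194181 for #x85. A minimal sketch for listing such raw bytes in a buffer follows; `my/raw-byte-value` and `my/list-raw-bytes` are hypothetical names, not part of Emacs.

```elisp
;; Hypothetical helper (not part of Emacs): map a character code from
;; the message to the raw byte it stands for.
(defun my/raw-byte-value (char-code)
  "Return the raw byte CHAR-CODE stands for, or nil for a normal character."
  (when (and (>= char-code #x3FFF80) (<= char-code #x3FFFFF))
    (- char-code #x3FFF00)))

;; Hypothetical helper: scan the current buffer for raw bytes.
(defun my/list-raw-bytes ()
  "Return a list of (POSITION . BYTE) for each raw byte in the buffer."
  (let (found)
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        (let ((byte (my/raw-byte-value (char-after))))
          (when byte
            (push (cons (point) byte) found)))
        (forward-char 1)))
    (nreverse found)))

;; Usage: (my/raw-byte-value 4194182)  ;; => 134, i.e. #x86
```

Knowing which byte values are involved may help in guessing the encoding the stray bytes came from before deciding how to replace them.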