unicode - Splitting UTF-16 Unicode text by a delimiter in AppleScript?

Question

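I have a list of text entries encoded in MacRoman and separated by line breaks. For some reason a second list could not be saved in MacRoman, so I had to use Unicode UTF-16 to get the German "ö", "ä", and so on. ListA is populated as expected, but ListB is no longer split: I end up with a single string that I don't know how to break into lines. Can anyone help?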

set ListA to (read file myFile1 using delimiter linefeed) as list    
display dialog "" & item 1 of ListA    
--> "Name A" 

set ListB to (read file myFile2 using delimiter linefeed as Unicode text) as list    
display dialog "" & item 1 of ListB    
--> "Name A    
Name B    
Name C    
Name D"

score 1 · Accepted Answer

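Many different characters can separate the lines in a text file; it isn't always a linefeed. The easiest way to deal with them is to use AppleScript's "paragraphs" instead of passing a delimiter when you read the file. Paragraphs is very good at working out which line-break character is used and handling it. It doesn't always work, but it's worth trying before you dig deeper into the problem. So try reading your file like this: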

set ListB to paragraphs of (read file myFile2 as Unicode text)

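If that doesn't work, you'll have to find out what the line-break character actually is. In these cases I open the file, select the return character with the mouse, and copy it. Then I go back to AppleScript Editor and paste it into the command below, in place of the letter "a". It will give you the character's id.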

id of "a"
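If copying the invisible character by hand is awkward, the same `id of` lookup can be run over the whole file. The following is a minimal sketch, not part of the original answer: the handler name `lineBreakIDs` is hypothetical, and it only checks for LF (10), CR (13), LINE SEPARATOR (8232) and PARAGRAPH SEPARATOR (8233).

```applescript
-- Hypothetical helper: returns the line-break code points found in theText.
-- Only LF (10), CR (13), U+2028 (8232) and U+2029 (8233) are checked.
on lineBreakIDs(theText)
	-- "id of" a text returns a list of code points, or a single integer for one character
	set theIDs to id of theText
	if class of theIDs is not list then set theIDs to {theIDs}
	set found to {}
	repeat with anID in theIDs
		set n to contents of anID
		if n is in {10, 13, 8232, 8233} and n is not in found then set end of found to n
	end repeat
	return found
end lineBreakIDs

-- Usage, with myFile2 as in the question:
-- set theText to read file myFile2 as Unicode text
-- lineBreakIDs(theText) -- a result of {13} means classic Mac CR line endings
```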

Then you can read the file with that character as the delimiter, substituting the id number you got from `id of` for 97:

set ListB to read file myFile2 using delimiter (character id 97) as Unicode text
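Once you know the id, another option (an alternative to the `using delimiter` parameter, not something the answer above uses) is to read the whole file as Unicode text and split it with AppleScript's text item delimiters. A minimal sketch; the handler name `splitText` is hypothetical:

```applescript
-- Hypothetical helper: splits theText on theDelimiter using AppleScript's
-- text item delimiters, then restores the previous delimiters.
on splitText(theText, theDelimiter)
	set oldTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to theDelimiter
	set theItems to text items of theText
	set AppleScript's text item delimiters to oldTIDs
	return theItems
end splitText

-- Usage, with myFile2 as in the question and 97 replaced by the id you found:
-- set theText to read file myFile2 as Unicode text
-- set ListB to splitText(theText, character id 97)
```

Restoring the old delimiters matters because text item delimiters are global to the script and also affect later list-to-text coercions.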

score 0 · Accepted Answer

Are you sure the file uses LF line endings? This works for me:

set f to POSIX file "/tmp/1"
set b to open for access f with write permission
set eof b to 0
write "あ" & linefeed & "い" to b as Unicode text -- UTF-16
close access b
read f using delimiter linefeed as Unicode text

Have you tried saving the file as UTF-8? You can read it by replacing Unicode text with «class utf8».
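A sketch of the UTF-8 variant of the script above, assuming a scratch file at the hypothetical path /tmp/listB-utf8.txt. It writes two lines with German umlauts as UTF-8 and reads them back with the same `using delimiter linefeed` pattern; the `try` block is an addition so the file is closed even if writing fails.

```applescript
set f to POSIX file "/tmp/listB-utf8.txt" -- hypothetical scratch file
set b to open for access f with write permission
try
	set eof b to 0
	write "Name ö" & linefeed & "Name ä" to b as «class utf8» -- UTF-8, no BOM
on error errMsg
	close access b
	error errMsg
end try
close access b
-- Read back, splitting on LF and decoding as UTF-8
set ListB to read f using delimiter linefeed as «class utf8»
```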
