
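I have a big problem that I can't find any help for online:

I moved a page of a website from OSX to Linux (both systems run in de_DE.UTF-8) and ran into a problem I had never seen before: some files can no longer be found, although files with (visibly) the same name clearly exist on the hard drive. All of these files have German umlauts in their names.

I took a sample image, copied the original request URI from the web page, and requested it directly: same error. After retyping the filename, it worked. And yes, I did not mistype it!

That surprised me, so I looked at the Apache log, where I found these entries: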

192.168.56.10 - - [27/Aug/2012:20:03:21 +0200] "GET /images/Sch%C3%B6ne-Lau-150x150.jpg HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.1"
192.168.56.10 - - [27/Aug/2012:20:03:57 +0200] "GET /images/Scho%CC%88ne-Lau-150x150.jpg HTTP/1.1" 404 4205 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.1"

That was something to investigate... This is what I found in the UTF-8 chart at http://www.utf8-chartable.de/:

ö   c3 b6   LATIN SMALL LETTER O WITH DIAERESIS
¨   cc 88   COMBINING DIAERESIS
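One way to double-check what the chart suggests is to percent-decode both request paths and name the non-ASCII code points they contain. This is only a minimal Python sketch: the `describe` helper is a made-up name, and the two paths are copied from the log entries above.

```python
import unicodedata
from urllib.parse import unquote

# The two request paths copied from the Apache log entries.
ok_path = "/images/Sch%C3%B6ne-Lau-150x150.jpg"        # answered with 304
missing_path = "/images/Scho%CC%88ne-Lau-150x150.jpg"  # answered with 404


def describe(path):
    """Return (code point, Unicode name) pairs for the non-ASCII characters."""
    decoded = unquote(path)  # percent-decoding, interpreted as UTF-8 by default
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in decoded
        if ord(ch) > 127
    ]


print(describe(ok_path))       # [('U+00F6', 'LATIN SMALL LETTER O WITH DIAERESIS')]
print(describe(missing_path))  # [('U+0308', 'COMBINING DIAERESIS')]
```

The first path carries a single precomposed character, while the second carries a plain `o` followed by a combining mark, which matches the two chart rows.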

I think you have already heard of dead keys: http://en.wikipedia.org/wiki/Dead_key. If not, read the article. It's quite interesting ;)

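Does this mean that OSX separates all diacritics from their letters? Does it really mean that OSX stores the character ö as o followed by ¨ instead of using the real, precomposed character?

If so, do you know of a good script I could use to rename these files? This won't be the only page I move from OSX to Linux...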


3 Answers

answered 2012-08-27T18:51:47.577

Thanks to Jon Hanna for much of the background information here! It was important for getting to the full answer: a way to convert from one normalisation form to the other.

Because my changes are in the filesystem (from file uploads) and the filenames are referenced in the database, I now have to update my database dump. The files were already renamed during the move (maybe by the FTP client ...)

Command line tools to convert charsets on Linux are:

  • iconv - converting the content of a stream (maybe a file)
  • convmv - converting the filenames in a directory

The charset utf-8-mac (described in http://loopkid.net/articles/2011/03/19/groking-hfs-character-encoding), which I could use in iconv, seems to exist only on OSX systems, so I have to move my SQL dump to my Mac, convert it, and move it back. Another option would be to rename the files back to NFD using convmv, but I think that would hinder more than help in the future.
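If moving the dump to a Mac is inconvenient, one possible workaround is to normalise it with Python's `unicodedata` module instead. This is a rough sketch under several assumptions: the dump is plain UTF-8 text, recomposing every decomposed sequence in it (not only the filenames) is acceptable, and `normalize_dump` and the file names in the usage comment are hypothetical. Because HFS+ uses its own variant of NFD, the result is not guaranteed to be byte-identical to what iconv with utf-8-mac would produce.

```python
import unicodedata


def normalize_dump(src_path, dst_path, form="NFC"):
    """Copy a UTF-8 text file, normalising each line to the given Unicode form.

    newline="" keeps the original line endings untouched.
    """
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(unicodedata.normalize(form, line))


# Hypothetical usage (file names are placeholders):
# normalize_dump("dump-from-osx.sql", "dump-nfc.sql")
```

It is worth diffing the two files before importing the result, since a dump that contains binary data or a different encoding would not survive this treatment.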

The tool convmv has built-in (OS-independent) options for enforcing NFC- or NFD-compatible filenames: http://www.j3e.de/linux/convmv/man/
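For cases where convmv is not available, the same idea can be approximated in a few lines of Python. This is a sketch and not a replacement for convmv: `plan_nfc_renames` and `apply_renames` are made-up names, it assumes the filenames are valid UTF-8, and it targets NFC only. Printing the plan first gives a dry run before anything is touched.

```python
import os
import unicodedata


def plan_nfc_renames(root):
    """Return (old_path, new_path) pairs for names under root that are not NFC."""
    plan = []
    # Walk bottom-up so entries inside a directory come before the directory
    # itself; their recorded paths are then still valid when they are renamed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            nfc_name = unicodedata.normalize("NFC", name)
            if nfc_name != name:
                plan.append((os.path.join(dirpath, name),
                             os.path.join(dirpath, nfc_name)))
    return plan


def apply_renames(plan):
    """Perform the planned renames, skipping any whose target already exists."""
    for old_path, new_path in plan:
        if os.path.exists(new_path):
            print(f"skipping {old_path!r}: target already exists")
            continue
        os.rename(old_path, new_path)


# Hypothetical usage (the directory is a placeholder):
# plan = plan_nfc_renames("/var/www/images")
# for old_path, new_path in plan:
#     print(old_path, "->", new_path)
# apply_renames(plan)
```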

PHP itself (the language my system, WordPress, is based on) supports a compatibility layer here; see "In PHP, how do I deal with the difference in encoded filenames on HFS+ vs. elsewhere?" Once I have fixed this issue for myself, I will write some tests and may also file a bug report with WordPress and the other systems I work with ;)

answered 2012-08-27T22:16:49.030

Linux distros treat filenames as binary strings, meaning no encoding is assumed - though the graphical shell (Gnome, KDE, etc) might make some assumptions based on environment variables, locale, etc.

OS X, on the other hand, requires or enforces (I forget which) its own variant of UTF-8, with Unicode normalization that decomposes diacritics into combining characters.

On Linux, when people do use Unicode in filenames, they tend to prefer UTF-8 with precomposed characters for diacritics.
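To illustrate why a byte-wise filesystem treats the two spellings as different names, here is a small Python sketch. The `find_name` helper is hypothetical; it only shows how a lookup could be made insensitive to the normalization form by comparing NFC versions of the names.

```python
import unicodedata

precomposed = "Sch\u00f6ne"  # ö as a single code point (NFC)
decomposed = "Scho\u0308ne"  # o followed by COMBINING DIAERESIS (NFD)

print(precomposed.encode("utf-8"))  # b'Sch\xc3\xb6ne'
print(decomposed.encode("utf-8"))   # b'Scho\xcc\x88ne'
print(precomposed == decomposed)    # False: different code points and bytes
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True


def find_name(wanted, candidates):
    """Return the first candidate equal to wanted after NFC normalization."""
    wanted_nfc = unicodedata.normalize("NFC", wanted)
    for candidate in candidates:
        if unicodedata.normalize("NFC", candidate) == wanted_nfc:
            return candidate
    return None
```

With the names returned by `os.listdir` as candidates, `find_name` would locate a file whichever form the request used.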

answered 2012-08-27T18:54:28.307