file - How do I copy a file with special characters in its name using Tcl's exec?

Question

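I'm trying to upload files with special characters in their names to our platform via the exec command, but the characters always get interpreted and the upload fails.

For example, if I try to upload a file named mémo.txt, I get the following error:

/bin/cp: cannot create regular file `/path/to/dir/m\351mo.txt': No such file or directory

UTF-8 is correctly configured on the system, and the command works fine if I run it in a shell.

This is the Tcl code:

exec /bin/cp $tmp_filename $dest_path

How can I make it work?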

score 2 · Accepted Answer

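The core of the problem is which encoding is used to communicate with the operating system. For exec and for file names, that encoding is whatever the encoding system command returns. (Tcl makes a pretty good guess at the correct value when the Tcl library starts up, but very occasionally gets it wrong.) On my computer, that command returns utf-8, which says (correctly!) that strings passed to, and received from, the OS are UTF-8.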
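As a quick check, a sketch like the following prints what Tcl settled on, together with the locale variables that commonly influence that guess on Unix. Treating LC_ALL, LC_CTYPE and LANG as the relevant variables is an assumption about your platform. A process started by a daemon or web server often sees different values than an interactive shell does.

```tcl
# Report the encoding Tcl will use for file names and exec arguments.
set sysenc [encoding system]
puts "Tcl system encoding: $sysenc"

# Assumption: these are the locale variables that matter on this platform.
foreach var {LC_ALL LC_CTYPE LANG} {
    if {[info exists ::env($var)]} {
        puts "$var=$::env($var)"
    } else {
        puts "$var is not set"
    }
}
```

If this prints iso8859-1 while the shell reports a UTF-8 locale, the two environments disagree, and that mismatch would be the first thing to look into.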

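You should be able to use the file copy command instead of doing exec /bin/cp. That helps here because it has fewer layers of trickiness: it avoids going through an external program, which can impose problems of its own. We'll assume that is what is being done: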

set tmp_filename "foobar.txt";  # <<< fill in the right value, of course
set dest_path "/path/to/dir/mémo.txt"
file copy $tmp_filename $dest_path

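If that fails, we need to work out why. The most likely problems relate to the encoding, and they can go wrong in multiple ways that interact horribly. Alas, the details matter. In particular, the encoding of a path depends on the actual filesystem (it is formally a parameter set when the filesystem is created), and on Unix it can vary between parts of a path when you have one mount inside another.

If the worst comes to the worst, you can put Tcl into ISO 8859-1 mode and then do all the encoding yourself (ISO 8859-1 is the "just use the bytes I tell you" encoding); encoding convertto is also useful in this case. Be aware that this can generate file names that cause trouble for other programs, but it will at least let you get at the file.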

encoding system iso8859-1
file copy $tmp_filename [encoding convertto utf-8 $dest_path]

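In this case, care might be needed to convert the different parts of the path correctly: you are taking full responsibility for what is going on.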
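A minimal sketch of what converting the parts separately could look like, assuming encoding system iso8859-1 is already in effect. The helper name encode_path_parts and the per-component encoding list are hypothetical; which encoding each mount really uses is something you have to find out yourself.

```tcl
# encode_path_parts is a hypothetical helper, not part of Tcl.
# It converts each component of path with the encoding at the same
# position in encodings, falling back to utf-8 when none is given.
proc encode_path_parts {path encodings} {
    set out {}
    set i 0
    foreach part [file split $path] {
        set enc [lindex $encodings $i]
        if {$enc eq ""} { set enc utf-8 }
        lappend out [encoding convertto $enc $part]
        incr i
    }
    return [file join {*}$out]
}
```

Usage would then be something like file copy $tmp_filename [encode_path_parts $dest_path {utf-8 utf-8 utf-8 utf-8 utf-8}], with the list adjusted wherever a directory lives on a filesystem that uses a different encoding.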

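If you're on Windows, please just let Tcl handle the details. Tcl uses the wide (Unicode) Windows API directly, so you can pretend that none of these problems exist. (There are other problems instead.)

On macOS, please leave encoding system alone, as it is correct. Macs have a very opinionated approach to encodings.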

score 1 · Accepted Answer

I already tried the file copy command but it says error copying "/tmp/file7k5kqg" to "/path/to/dir/mémo.txt": no such file or directory

My reading of your problem is that, for some reason, your Tcl is set to iso8859-1 ([encoding system]), while the executing environment (shell) is set to utf-8. This explains why Donal's suggestion works for you:

encoding system iso8859-1
file copy $tmp_filename [encoding convertto utf-8 $dest_path]

This safely passes a utf-8 encoded byte array down to any syscall: é (\u00e9) goes out as \xc3\xa9. Watch:

% binary encode hex [encoding convertto utf-8 é] 
c3a9
% encoding system iso8859-1; exec xxd << [encoding convertto utf-8 é] 
00000000: c3a9                                     ..

This is equivalent to [encoding system] also being set to utf-8 (as to be expected in an otherwise utf-8 environment):

% encoding system
utf-8
% exec xxd << é
00000000: c3a9                                     ..

What you are experiencing (without any intervention) seems to be a re-coding of the Tcl internal encoding to iso8859-1 on the way out from Tcl (because of [encoding system], as Donal describes), and a follow-up (and faulty) re-coding of this iso8859-1 value into the utf-8 environment.

Watch the difference (\xe9 vs. \xc3\xa9):

% encoding system iso8859-1
% encoding system
iso8859-1
%  exec xxd << é
00000000: e9
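The same byte difference can be inspected without switching the system encoding at all, which is handy for diagnosing a running program. This is only a sketch; hex_bytes is a hypothetical helper name, and binary encode hex needs Tcl 8.6 or later.

```tcl
# hex_bytes is a hypothetical helper: the bytes a string turns into
# when written out in the given encoding, shown as hex.
proc hex_bytes {enc str} {
    return [binary encode hex [encoding convertto $enc $str]]
}

# \u00e9 is é, written as an escape so the script file's own encoding
# does not affect the result.
set name "m\u00e9mo.txt"
foreach enc {iso8859-1 utf-8} {
    puts [format "%-10s %s" $enc [hex_bytes $enc $name]]
}
```

The iso8859-1 line contains the single byte e9 for é, while the utf-8 line contains c3a9 in the same position.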

The problem, it seems, is that the lone \xe9 byte then has to be interpreted in your otherwise utf-8 environment, like so:

$ locale
LANG="de_AT.UTF-8"
...
$ echo -ne '\xe9'
?
$ touch `echo -ne 'm\xe9mo.txt'`
touch: m?mo.txt: Illegal byte sequence
$ touch mémo.txt
$ ls mémo.txt 
mémo.txt
$ cp `echo -ne 'm\xe9mo.txt'` b.txt
cp: m?mo.txt: No such file or directory

But:

$ cp `echo -ne 'm\xc3\xa9mo.txt'` b.txt
$ ls b.txt
b.txt

Your options:

(1) You need to find out why Tcl picks up iso8859-1, to begin with. How did you obtain your installation? Self-compiled? What are the details (version)?

(2) You may proceed as Donal suggests or, alternatively, switch the system encoding to utf-8 explicitly:

encoding system utf-8
file copy $tmp_filename $dest_path
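If you would rather not change the setting for the rest of the program, one possible variation is to switch it only for the duration of the copy. This is a sketch under the assumption that Tcl 8.6 or later is available (for try ... finally); copy_with_utf8 is a hypothetical name, not something Tcl provides.

```tcl
# copy_with_utf8 is a hypothetical wrapper, not part of Tcl.
# It forces utf-8 for the copy and restores the previous system
# encoding afterwards, even if the copy raises an error.
proc copy_with_utf8 {src dst} {
    set saved [encoding system]
    encoding system utf-8
    try {
        file copy -- $src $dst
    } finally {
        encoding system $saved
    }
}
```

Note that encoding system is a process-wide setting, so other file or exec operations running during the copy (for example in another thread) would see utf-8 as well.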
