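I'm trying to upload files whose names contain special characters to our platform via the exec command, but those characters always get misinterpreted and the command fails.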
For example, if I try to upload the file mémo.txt, I get the following error:
/bin/cp: cannot create regular file `/path/to/dir/m\351mo.txt': No such file or directory
UTF-8 is correctly configured on the system, and if I run the command in the shell, it works fine.
This is the Tcl code:
exec /bin/cp $tmp_filename $dest_path
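How can I get this to work?
The core of the problem is which encoding is used to communicate with the operating system. For exec and for filenames, that encoding is whatever the encoding system command returns (Tcl makes a good guess at the right value when the Tcl library starts up, but it occasionally gets it wrong). On my computer, that command returns utf-8, which says (correctly!) that the strings passed to (and received from) the OS are UTF-8.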
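To see what this means for the filename in the question, here is a minimal sketch (assuming Tcl 8.6 or later, which provides binary encode hex) that prints the current system encoding and the bytes mémo.txt turns into under utf-8 and under iso8859-1:

```tcl
# Which encoding does Tcl use for filenames and exec arguments?
puts "encoding system: [encoding system]"

# Show the bytes "mémo.txt" becomes under two candidate encodings.
# \u00e9 is é, written as an escape so this script's own encoding does not matter.
foreach enc {utf-8 iso8859-1} {
    set bytes [encoding convertto $enc "m\u00e9mo.txt"]
    puts [format "%-10s %s" $enc [binary encode hex $bytes]]
}
```

The utf-8 line should show 6dc3a96d6f2e747874 and the iso8859-1 line 6de96d6f2e747874. The single byte e9 in the second one is the octal \351 that appears in the cp error message.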
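You should be able to use the file copy command instead of doing exec /bin/cp, and that will help here because there are fewer layers of trickery involved (it avoids going through an external program that may impose its own problems). We'll assume that's what is being done: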
set tmp_filename "foobar.txt"; # <<< fill in the right value, of course
set dest_path "/path/to/dir/mémo.txt"
file copy $tmp_filename $dest_path
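If that fails, we need to find out why. The most likely problems relate to encoding, and things can go wrong in a number of horribly interacting ways. Alas, the details matter. In particular, the encoding of a path depends on the actual filesystem (it is formally a parameter set when the filesystem is created), and on Unix it can differ between parts of a path when you have one mount inside another.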
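One way to start finding out why is to capture the failure together with the encoding information. The sketch below uses a hypothetical helper, copy_with_diagnostics (not a Tcl built-in), and assumes Tcl 8.6 or later for try and binary encode hex:

```tcl
# Hypothetical helper: attempt the copy and, if it fails, report the system
# encoding and the exact bytes Tcl would hand to the OS for the destination
# path, so that an encoding mismatch becomes visible.
proc copy_with_diagnostics {src dst} {
    try {
        file copy $src $dst
    } on error {msg} {
        puts stderr "copy failed: $msg"
        puts stderr "encoding system: [encoding system]"
        set bytes [encoding convertto [encoding system] $dst]
        puts stderr "destination bytes (hex): [binary encode hex $bytes]"
        return 0
    }
    return 1
}
```

Called as copy_with_diagnostics $tmp_filename $dest_path, it returns 1 on success and 0 on failure. If the reported bytes contain a lone e9 where you expected c3a9, Tcl is encoding the path as iso8859-1 rather than utf-8.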
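If worst comes to worst, you can put Tcl into ISO 8859-1 mode and then do all the encoding yourself (because ISO 8859-1 is the "just use the bytes I tell you" encoding); encoding convertto is also useful in this case. Be aware that this can produce filenames that cause trouble for other programs, but it will at least let you work around the problem.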
encoding system iso8859-1
file copy $tmp_filename [encoding convertto utf-8 $dest_path]
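In that case, you may need to take care to convert the different parts of the path correctly: you take full responsibility for what is going on.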
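Converting different parts of the path separately could look like the following minimal sketch. encode_path is a hypothetical helper, and the per-component encodings are assumptions you would have to supply yourself from knowledge of your mounts:

```tcl
# Hypothetical helper: for use after "encoding system iso8859-1", when you
# must produce the raw bytes of a path yourself. Takes a flat list of
# component/encoding pairs, converts each component with its own encoding,
# and joins the results with file join.
proc encode_path {pairs} {
    set parts {}
    foreach {component enc} $pairs {
        lappend parts [encoding convertto $enc $component]
    }
    return [file join {*}$parts]
}

# Example usage (assumes every component lives on a UTF-8 filesystem):
#   encoding system iso8859-1
#   set raw [encode_path [list / utf-8 path utf-8 to utf-8 dir utf-8 m\u00e9mo.txt utf-8]]
#   file copy $tmp_filename $raw
```

With system encoding iso8859-1, each character of the returned string is passed through as a single byte, so the OS receives exactly the bytes that encode_path produced.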
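If you're on Windows, let Tcl handle the details. Tcl uses the wide (Unicode) Windows API directly, so you can pretend these problems don't exist. (There are other problems instead.)
On macOS, leave encoding system alone, because it is correct. Macs take a very opinionated approach to encodings.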
I already tried the file copy command but it says error copying "/tmp/file7k5kqg" to "/path/to/dir/mémo.txt": no such file or directory
My reading of your problem is that, for some reason, your Tcl is set to iso8859-1 ([encoding system]), while the executing environment (shell) is set to utf-8. This explains why Donal's suggestion works for you:
encoding system iso8859-1
file copy $tmp_filename [encoding convertto utf-8 $dest_path]
This will safely pass a utf-8-encoded byte array down to any syscall: é (code point \u00e9) goes out as the two bytes \xc3\xa9. Watch:
% binary encode hex [encoding convertto utf-8 é]
c3a9
% encoding system iso8859-1; exec xxd << [encoding convertto utf-8 é]
00000000: c3a9 ..
This is equivalent to [encoding system] also being set to utf-8 (as would be expected in an otherwise utf-8 environment):
% encoding system
utf-8
% exec xxd << é
00000000: c3a9 ..
What you are experiencing (without any intervention) seems to be a re-encoding of Tcl's internal strings to iso8859-1 on the way out of Tcl (because of [encoding system], as Donal describes), followed by a faulty interpretation of that iso8859-1 value in the utf-8 environment.
Watch the difference (\xe9 vs. \xc3\xa9):
% encoding system iso8859-1
% encoding system
iso8859-1
% exec xxd << é
00000000: e9
The problem, then, seems to be that this lone \xe9 byte gets interpreted in your otherwise utf-8 environment, where it is not a valid byte sequence:
$ locale
LANG="de_AT.UTF-8"
...
$ echo -ne '\xe9'
?
$ touch `echo -ne 'm\xe9mo.txt'`
touch: m?mo.txt: Illegal byte sequence
$ touch mémo.txt
$ ls mémo.txt
mémo.txt
$ cp `echo -ne 'm\xe9mo.txt'` b.txt
cp: m?mo.txt: No such file or directory
But:
$ cp `echo -ne 'm\xc3\xa9mo.txt'` b.txt
$ ls b.txt
b.txt
Your options:
(1) You need to find out why Tcl picks up iso8859-1 in the first place. How did you obtain your installation? Did you compile it yourself? What are the details (version)?
(2) You may proceed as Donal suggests or, alternatively, set encoding system utf-8 explicitly:
encoding system utf-8
file copy $tmp_filename $dest_path
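If the same script also has to run elsewhere, option (2) can be combined with Donal's platform advice. This is a minimal sketch with a hypothetical helper name; it assumes that on the affected Unix systems the filesystem really stores UTF-8 names, which holds for the setup in the question but is not guaranteed everywhere:

```tcl
# Hypothetical helper: leave Windows and macOS alone (Tcl already gets them
# right), and on other Unix systems switch to utf-8 only if Tcl guessed
# something else at startup. Returns the encoding now in effect.
proc ensure_utf8_system_encoding {} {
    global tcl_platform
    if {$tcl_platform(platform) eq "unix"
            && $tcl_platform(os) ne "Darwin"
            && [encoding system] ne "utf-8"} {
        encoding system utf-8
    }
    return [encoding system]
}
```

Call ensure_utf8_system_encoding once at startup, before the file copy, rather than around each copy, since encoding system is process-wide state.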