python - Decoding utf-8 in tarfile

Translated from: https://stackoverflow.com/questions/35651630 2016-02-26T12:22:07.383

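I have tar files whose file names contain multi-byte (Japanese) characters, and I am using libarchive to extract them. The file names inside the tar file are encoded in UTF-8. Whenever I extract the files, the multi-byte characters are missing from the resulting names.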

I wrote a Python script that produces the result I want:

#!/usr/bin/python27
# Python 2 script: tar member names are byte strings here.

import tarfile

def transform(data):
    # Decode the raw UTF-8 bytes of a member name into a unicode string.
    return data.decode('utf8')

tar = tarfile.open('abc.tar')
for m in tar.getmembers():
    print m.name
    # Replace the byte-string name with its decoded unicode form.
    m.name = transform(m.name)

tar.extractall()
tar.close()
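
The script above is written for Python 2, where member names are byte strings. For comparison, here is a minimal Python 3 sketch of the same idea: `tarfile.open` accepts an `encoding` argument that is used to decode member names, so no manual `decode` step is needed. The helper names `extract_utf8_tar` and `make_sample_tar` and the sample file name are hypothetical, and the sample archive is written in GNU format on the assumption that the real archive stores UTF-8 names directly in its headers.

```python
import io
import os
import tarfile
import tempfile


def extract_utf8_tar(tar_path, dest_dir):
    """Extract tar_path into dest_dir, decoding member names as UTF-8."""
    with tarfile.open(tar_path, encoding="utf-8") as tar:
        names = tar.getnames()
        tar.extractall(dest_dir)
    return names


def make_sample_tar(tar_path, member_name, payload=b"hello"):
    """Build a one-member GNU-format tar (hypothetical helper for the demo)."""
    with tarfile.open(tar_path, "w", format=tarfile.GNU_FORMAT,
                      encoding="utf-8") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "abc.tar")
        make_sample_tar(sample, "日本語.txt")
        out_dir = os.path.join(tmp, "out")
        os.makedirs(out_dir)
        # Prints the decoded member names of the sample archive.
        print(extract_utf8_tar(sample, out_dir))
```

If an archive was written with a different name encoding, the same `encoding` argument would need to match it; this sketch only covers the UTF-8 case described in the question.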

However, I want to achieve the same thing in C++. Here is an excerpt of the C++ code:

while (entry = tar_file->nextEntry()) {
    fs::path filepath = path / entry->getFileName();  // the UTF-8 characters are lost here
    // So I tried the following: first query the required buffer size
    int wchars_num = MultiByteToWideChar(CP_ACP, 0, filepath.string().c_str(), -1, NULL, 0);
    wchar_t* wstr = new wchar_t[wchars_num];

    // I also tried UTF-8 in place of CP_ACP
    MultiByteToWideChar(CP_ACP, 0, filepath.string().c_str(), -1, wstr, wchars_num);
    // But this did not help
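
As an illustration of the kind of loss described here, and not a diagnosis of the C++ excerpt: `CP_ACP` selects the system ANSI code page, so bytes that are really UTF-8 get reinterpreted under a different encoding. The Python sketch below reproduces that effect. It assumes cp1252 as the ANSI code page (the code page on the asker's machine is not stated) and uses a hypothetical file name and helper `decode_name`.

```python
def decode_name(raw_bytes, codepage):
    """Decode raw name bytes with the given code page, replacing undecodable bytes."""
    return raw_bytes.decode(codepage, errors="replace")


original = "日本語.txt"          # hypothetical member name
raw = original.encode("utf-8")  # bytes as stored in the tar header

as_utf8 = decode_name(raw, "utf-8")    # decoded with the matching encoding
as_ansi = decode_name(raw, "cp1252")   # assumed ANSI code page, mismatched

print(as_utf8 == original)  # True
print(as_ansi == original)  # False
```

Once the bytes have been decoded with the wrong code page, a later conversion step works on already-damaged text, which is one possible reason a second conversion would not help.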
