This question is a modified redux of this previous question:

how to decode ubyte[] to a specified encoding?

I'm looking for an idiomatic way to convert the ubyte[] array returned from a std.zip.ArchiveMember.expandedData attribute into a string or other range-able collection of strings... either the whole contents akin to calling File.open("file"), or something iterable in similar fashion to File.open("file").byLine().

So far everything I've found from the standard documentation that deals with character arrays or strings does not appreciate a ubyte[] argument, and the examples around D's zip file handling are very rudimentary, dealing only with getting raw data out of zip archives and their member files... with no obvious file/stream/io interface capable of being easily layered between the raw bytestream and text-oriented file/string manipulation.

I think I can find something in std.utf or std.uni to decode individual code points, and while/for-loop my way through the bytestream, but surely there might be a better way?

Code sample:

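import std.file : read;
import std.zip : ZipArchive;

// just humor me, this is what I've been given.
// ZipArchive takes the archive's bytes, not a file name, so read the file first.
auto zipFile = new ZipArchive(read("dataSet.csv.zip"));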
foreach(memberFile; zipFile.directory)
{
    zipFile.expand(memberFile);
    ubyte[] uByteArray = memberFile.expandedData;

    // ok, now what?
    // is there a relatively simplistic way to get this
    // decoded/translated byteStream into a string
    // or collection of strings(for example, one string per line
    // of the compressed file) ?

    string completeCsvContents = uByteArray.PQR();
    string[] csvRows = uByteArray.XYZ();
}

Is there anything that I could easily fill in for PQR or XYZ?

Or, if it's a matter of making an API call in the style of

string csvData = std.ABC.PQR(uByteArray)

What would ABC/PQR be?

2 Answers

Maybe just do

auto stuff = cast(char[]) memberFile.expandedData; 

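When you use the resulting char[] stuff, it will be decoded automatically anyway, for example by functions that call the range primitives when stuff is passed to them as an input range.

That is because neither char[] nor string is actually stored decoded; both hold UTF-8 code units. Only dchar[] or dstring holds decoded code points.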
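
A minimal sketch of the cast approach, assuming the member holds UTF-8 text. The helper name asText and the sample bytes are hypothetical stand-ins for memberFile.expandedData; lineSplitter from std.string is used here as one way to get line-by-line iteration similar to byLine.

```d
import std.range.primitives : ElementType;
import std.stdio : writeln;
import std.string : lineSplitter;

// Hypothetical helper: view the raw bytes as UTF-8 code units.
// This is only a reinterpreting cast: no copy and no validation.
char[] asText(ubyte[] raw)
{
    return cast(char[]) raw;
}

// Range primitives auto-decode narrow strings, so the element type
// seen by range-based algorithms is dchar, not char.
static assert(is(ElementType!(char[]) == dchar));

void main()
{
    // Stand-in for memberFile.expandedData.
    ubyte[] raw = cast(ubyte[]) "id,name\n1,alpha\n2,beta\n".dup;

    auto text = asText(raw);

    // Lazily iterate over the text one line at a time.
    foreach (line; text.lineSplitter)
        writeln(line);
}
```

The slices produced by lineSplitter point into the original buffer, so copy a line (for example with .idup) if it must outlive the expanded data.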

answered 2015-12-19T10:44:25.703

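If you know the data is UTF-8 encoded, you can use std.string.assumeUTF to convert it to a string/char array. As Nested Type mentioned, all this does is a cast, but it is more self-documenting.

If you need to make sure that the resulting string is actually valid UTF-8 (several operations have undefined behavior on invalid strings), you can use std.utf.validate. assumeUTF also does this in debug builds.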
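
A minimal sketch combining the two calls, assuming UTF-8 input. The helper name decodeChecked and the sample bytes are hypothetical; only assumeUTF, validate, UTFException, and splitLines come from Phobos.

```d
import std.stdio : writeln;
import std.string : assumeUTF, splitLines;
import std.utf : UTFException, validate;

// Hypothetical helper: reinterpret the bytes as char[] and check
// that they really are well-formed UTF-8.
char[] decodeChecked(ubyte[] raw)
{
    char[] text = raw.assumeUTF; // same memory, now typed as char[]
    validate(text);              // throws UTFException on invalid input
    return text;
}

void main()
{
    // Stand-in for memberFile.expandedData.
    ubyte[] raw = cast(ubyte[]) "id,name\n1,alpha\n".dup;

    try
    {
        // .idup makes an immutable copy, giving a proper string.
        string completeCsvContents = decodeChecked(raw).idup;

        // One array element per line of the member file.
        string[] csvRows = completeCsvContents.splitLines;
        writeln(csvRows.length, " rows");
    }
    catch (UTFException e)
    {
        writeln("member is not valid UTF-8: ", e.msg);
    }
}
```

If the archive may contain text in another encoding, validation will fail here, and the bytes would need transcoding first (for example via std.encoding) rather than a cast.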

answered 2015-12-19T17:11:24.213