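Is there a way to create a StringBuilder from a byte[]?

I want to reduce memory usage by working with a StringBuilder, but what I start with is a byte[]. So I have to create a String from the byte[] and then create the StringBuilder from that String. I don't think this solution is optimal.

Thanks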

2 Answers

Basically, your best option seems to be to use CharsetDecoder directly.

Here is how:

byte[] srcBytes = getYourSrcBytes();

//Whatever charset your bytes are encoded in
Charset charset = Charset.forName("UTF-8");
CharsetDecoder decoder = charset.newDecoder();

//ByteBuffer.wrap simply wraps the byte array; it does not allocate new memory for it
ByteBuffer srcBuffer = ByteBuffer.wrap(srcBytes);
//Now we decode srcBuffer into a new CharBuffer (yes, new memory is allocated here; that can't be avoided)
CharBuffer resBuffer = decoder.decode(srcBuffer);

//CharBuffer implements the CharSequence interface, which StringBuilder fully supports in its methods
StringBuilder yourStringBuilder = new StringBuilder(resBuffer);
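
For anyone who wants to try the snippet above as-is, here is a minimal self-contained sketch. The class name `DecoderDemo`, the helper `toStringBuilder`, and the sample input are made up for illustration, and `StandardCharsets.UTF_8` (Java 7+) stands in for `Charset.forName("UTF-8")`.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecoderDemo {
    // Hypothetical helper: decodes bytes into a StringBuilder without an intermediate String.
    static StringBuilder toStringBuilder(byte[] src, Charset charset)
            throws CharacterCodingException {
        // A fresh decoder reports malformed input by default, so decode() can throw.
        CharBuffer decoded = charset.newDecoder().decode(ByteBuffer.wrap(src));
        // StringBuilder(CharSequence) copies the remaining chars out of the buffer.
        return new StringBuilder(decoded);
    }

    public static void main(String[] args) throws CharacterCodingException {
        // Sample input (assumption): a short string with two non-ASCII characters.
        byte[] src = "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = toStringBuilder(src, StandardCharsets.UTF_8);
        System.out.println(sb.length() + " chars decoded from " + src.length + " bytes");
    }
}
```

Note that this still holds two copies of the characters for a moment (the CharBuffer and the StringBuilder's internal array), so the saving over the String route is one copy at most.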

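Addendum:

After some testing, it seems that the plain new String(bytes) is much faster, and there does not appear to be a simple way to beat it. Here is the test I ran: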

import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.text.ParseException;

public class ConsoleMain {
    public static void main(String[] args) throws IOException, ParseException {
        StringBuilder sb1 = new StringBuilder("abcdefghijklmnopqrstuvwxyz");
        for (int i=0;i<19;i++) {
            sb1.append(sb1);
        }
        System.out.println("Size of buffer: "+sb1.length());
        byte[] src = sb1.toString().getBytes("UTF-8");
        StringBuilder res;

        long startTime = System.currentTimeMillis();
        res = testStringConvert(src);
        System.out.println("Conversion using String time (msec): "+(System.currentTimeMillis()-startTime));
        if (!res.toString().equals(sb1.toString())) {
            System.err.println("Conversion error");
        }

        startTime = System.currentTimeMillis();
        res = testCBConvert(src);
        System.out.println("Conversion using CharBuffer time (msec): "+(System.currentTimeMillis()-startTime));
        if (!res.toString().equals(sb1.toString())) {
            System.err.println("Conversion error");
        }
    }

    private static StringBuilder testStringConvert(byte[] src) throws UnsupportedEncodingException {
        String s = new String(src, "UTF-8");
        StringBuilder b = new StringBuilder(s);
        return b;
    }

    private static StringBuilder testCBConvert(byte[] src) throws CharacterCodingException {
        Charset charset = Charset.forName("UTF-8");
        CharsetDecoder decoder = charset.newDecoder();
        ByteBuffer srcBuffer = ByteBuffer.wrap(src);
        CharBuffer resBuffer = decoder.decode(srcBuffer);
        StringBuilder b = new StringBuilder(resBuffer);
        return b;
    }
}

Results:

Size of buffer: 13631488
Conversion using String time (msec): 91
Conversion using CharBuffer time (msec): 252

And a modified version (with lower memory consumption) on IDEONE: Here

Answered 2012-06-20T08:11:46.733

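If a short statement is what you want, there is no way around the intermediate String step. The String constructor combines conversion and object construction for convenience in a very common case, but StringBuilder has no such convenience constructor.

If performance is what you are interested in, you can avoid the intermediate String object by using the following: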

new StringBuilder(Charset.forName(charsetName).decode(ByteBuffer.wrap(inBytes)))
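
One behavior of this one-liner is worth knowing: unlike `CharsetDecoder.decode(ByteBuffer)` on a fresh decoder, `Charset.decode` does not throw on malformed input; it substitutes the charset's replacement string instead (U+FFFD for UTF-8). A small sketch, with a hypothetical class `OneLinerDemo` and made-up sample bytes:

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class OneLinerDemo {
    // Hypothetical wrapper around the one-liner from the answer.
    static StringBuilder decode(byte[] inBytes, Charset charset) {
        return new StringBuilder(charset.decode(ByteBuffer.wrap(inBytes)));
    }

    public static void main(String[] args) {
        byte[] good = "abc".getBytes(StandardCharsets.UTF_8);
        // 0xFF can never appear in well-formed UTF-8 (sample data, an assumption).
        byte[] bad = { 'a', (byte) 0xFF, 'b' };

        System.out.println(decode(good, StandardCharsets.UTF_8));
        // The invalid byte is replaced rather than reported.
        StringBuilder replaced = decode(bad, StandardCharsets.UTF_8);
        System.out.println(replaced.indexOf("\uFFFD") >= 0);
    }
}
```

If silent replacement is not acceptable for your data, the explicit decoder shown in the other answer, which throws `CharacterCodingException`, may be the safer choice.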

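If you want to be able to fine-tune performance, you can control the decoding process yourself. For example, you may want to avoid using too much memory by taking averageCharsPerByte as an estimate of how much memory is needed. If that estimate turns out to be too small, you can use the resulting StringBuilder to accumulate all the parts instead of resizing the buffer.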

CharsetDecoder cd = Charset.forName(charsetName).newDecoder();
cd.onMalformedInput(CodingErrorAction.REPLACE);
cd.onUnmappableCharacter(CodingErrorAction.REPLACE);
// Math.ceil returns a double, so an explicit cast to int is required
int lengthEstimate = (int) Math.ceil(cd.averageCharsPerByte()*inBytes.length) + 1;
ByteBuffer inBuf = ByteBuffer.wrap(inBytes);
CharBuffer outBuf = CharBuffer.allocate(lengthEstimate);
StringBuilder out = new StringBuilder(lengthEstimate);
CoderResult cr;
while (true) {
    cr = cd.decode(inBuf, outBuf, true);
    // flip so that append copies the chars just written, not the unused tail
    outBuf.flip();
    out.append(outBuf);
    outBuf.clear();
    if (cr.isUnderflow()) break;
    if (!cr.isOverflow()) cr.throwException();
}
cr = cd.flush(outBuf);
if (!cr.isUnderflow()) cr.throwException();
outBuf.flip();
out.append(outBuf);
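
To watch the overflow branch actually run, the same loop can be wrapped in a method and given a deliberately small output buffer. This is only a sketch under assumptions: `ChunkedDecodeDemo`, `decode`, `drain`, and the `chunkChars` parameter are illustrative names, and the buffer is flipped before each append so that only the chars the decoder wrote are copied.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class ChunkedDecodeDemo {
    // Decodes inBytes through a fixed-size CharBuffer, accumulating into a StringBuilder.
    static StringBuilder decode(byte[] inBytes, Charset charset, int chunkChars)
            throws CharacterCodingException {
        if (chunkChars < 2) {
            // A surrogate pair needs room for two chars, otherwise the loop could not progress.
            throw new IllegalArgumentException("chunkChars must be at least 2");
        }
        CharsetDecoder cd = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer inBuf = ByteBuffer.wrap(inBytes);
        CharBuffer outBuf = CharBuffer.allocate(chunkChars);
        StringBuilder out = new StringBuilder();
        while (true) {
            CoderResult cr = cd.decode(inBuf, outBuf, true);
            drain(outBuf, out);
            if (cr.isUnderflow()) break;               // all input consumed
            if (!cr.isOverflow()) cr.throwException(); // anything else is an error
        }
        while (true) {
            CoderResult cr = cd.flush(outBuf);
            drain(outBuf, out);
            if (cr.isUnderflow()) break;
            if (!cr.isOverflow()) cr.throwException();
        }
        return out;
    }

    // Copies what the decoder wrote into the builder, then resets the buffer for reuse.
    private static void drain(CharBuffer buf, StringBuilder out) {
        buf.flip();
        out.append(buf);
        buf.clear();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // Sample text (assumption) with a 2-byte character and a surrogate pair.
        String text = "hello, w\u00f6rld \uD83D\uDE00";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = decode(bytes, StandardCharsets.UTF_8, 4);
        System.out.println(sb.toString().equals(text));
    }
}
```

With a 4-char buffer the decoder reports overflow several times for this input, and each pass drains the buffer into the StringBuilder before continuing.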

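However, I doubt the code above is worth the effort in most applications. If an application cares that much about performance, it probably shouldn't be dealing with StringBuilder at all, and should handle everything at the buffer level instead.

Answered 2012-06-20T08:56:24.567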