I have a lot of URLs to process. I store about 20,000,000 of them in a HashSet, and this causes memory problems.
I tried to create a compressed string class:
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.*;

class CompressedString2 implements Serializable {
    private int originalSize;   // length of the UTF-8 encoding before compression
    private byte[] cstring;     // deflated bytes

    public CompressedString2() {
        compress("");
    }

    public CompressedString2(String string) {
        compress(string);
    }

    public void compress(String str) {
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        originalSize = bytes.length;
        ByteArrayOutputStream deflatedBytes = new ByteArrayOutputStream();
        // The default Deflater uses DEFAULT_COMPRESSION; close() finishes the stream and releases it.
        try (DeflaterOutputStream dos = new DeflaterOutputStream(deflatedBytes)) {
            dos.write(bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        cstring = deflatedBytes.toByteArray();
    }

    public String decompress() throws IOException {
        byte[] inflatedBytes = new byte[originalSize];
        try (DataInputStream in = new DataInputStream(
                new InflaterInputStream(new ByteArrayInputStream(cstring)))) {
            // A single read() may return fewer bytes than requested; readFully keeps reading.
            in.readFully(inflatedBytes);
        }
        return new String(inflatedBytes, StandardCharsets.UTF_8);
    }

    // Without equals/hashCode a HashSet treats every instance as distinct.
    @Override
    public boolean equals(Object o) {
        return o instanceof CompressedString2 && Arrays.equals(cstring, ((CompressedString2) o).cstring);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(cstring);
    }
}
But when I store them with something like this:
HashSet<String> urlStr = new HashSet<String>();
HashSet<CompressedString2> urlComp = new HashSet<CompressedString2>();
String filePath = args[0];
int num = 0;
// Read one URL per line and store it both raw and compressed.
try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
    String line = br.readLine();
    while (line != null) {
        num++;
        urlStr.add(line);
        urlComp.add(new CompressedString2(line));
        line = br.readLine();
    }
} catch (Exception e) {
    System.out.println("error:");
    e.printStackTrace();
}
// Serialize both sets so that the resulting file sizes can be compared.
try (ObjectOutputStream oos1 = new ObjectOutputStream(new FileOutputStream("testDeflator_rawurls.obj"))) {
    oos1.writeObject(urlStr);
}
try (ObjectOutputStream oos4 = new ObjectOutputStream(new FileOutputStream("testDeflator_compressed2.obj"))) {
    oos4.writeObject(urlComp);
}
the "compressed" URLs turn out to be bigger than the raw ones...
Does anyone know how to compress URLs successfully?
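The size difference can be checked on a single URL with a minimal sketch like the one below. It is only an illustration: the class name `DeflateOverheadDemo`, its helper methods, and the sample URL are hypothetical and not part of my real code or data. It uses the same `DeflaterOutputStream` approach as above. The zlib format that stream produces adds a 2-byte header and a 4-byte Adler-32 checksum to every output, so a short string with little internal repetition tends to come out longer than its input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class DeflateOverheadDemo {

    // Deflate a byte array with the default compression level (zlib format).
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out)) {
            dos.write(raw);
        }
        return out.toByteArray();
    }

    // Inflate back to exactly originalSize bytes.
    static byte[] inflate(byte[] packed, int originalSize) throws IOException {
        byte[] result = new byte[originalSize];
        try (DataInputStream in = new DataInputStream(
                new InflaterInputStream(new ByteArrayInputStream(packed)))) {
            in.readFully(result);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample URL, not taken from the real data set.
        String url = "http://www.example.com/index.html";
        byte[] raw = url.getBytes(StandardCharsets.UTF_8);
        byte[] packed = deflate(raw);
        System.out.println("raw bytes:      " + raw.length);
        System.out.println("deflated bytes: " + packed.length);
    }
}
```

Each `CompressedString2` instance also carries its own object header, an `int` field, and a separate `byte[]` object, so the per-entry overhead in the set comes on top of whatever the deflated bytes take.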