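I am digesting the contents of some zip files to generate an MD5. The MD5 is computed from the file contents rather than from timestamps, so I can assert that two files have the same content even if they were produced at different times. To do that, I wrote the following Java method: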
    public String digest( ZipInputStream entry ) throws IOException{
        byte[] digest = null;
        MessageDigest md5 = null;
        String mdEnc = "";
        try {
            md5 = MessageDigest.getInstance( "MD5" );
            ZipEntry current;
            if( entry != null ) {
                while(( current = entry.getNextEntry() ) != null ) {
                    if( current.isDirectory() ) {
                        digest = this.encodeUTF8( current.getName() );
                        md5.update( digest );
                    }
                    else{
                        int size = ( int )current.getSize();
                        if(size > 0){
                            digest = new byte[ size ];
                            entry.read( digest, 0, size );
                            md5.update( digest );
                        }
                    }
                }
                digest = md5.digest();
                mdEnc = new BigInteger( 1, md5.digest() ).toString( 16 );
                entry.close();
            }
        }
        catch ( NoSuchAlgorithmException e ) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return mdEnc;
    }

    public byte[] encodeUTF8( String name ) {
        final Charset UTF8_CHARSET = Charset.forName( "UTF-8" );
        return name.getBytes( UTF8_CHARSET );
    }
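
For comparison, here is a minimal sketch of the same content-based digest written defensively. It is not a confirmed fix for the exception, only an illustration under three assumptions: `getSize()` can return -1 when the size is not yet known, a single `read` call may return fewer bytes than requested, and `MessageDigest.digest()` resets the digest, so calling it twice in a row hashes empty input the second time. The class name `ZipContentDigest` and the helper `makeZip` are hypothetical and exist only for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipContentDigest {

    // Hashes directory names and file bytes, as in the original method;
    // entry timestamps never reach the digest.
    static String digest(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry current;
            while ((current = zip.getNextEntry()) != null) {
                if (current.isDirectory()) {
                    md5.update(current.getName().getBytes(StandardCharsets.UTF_8));
                } else {
                    // Read until end of entry instead of trusting getSize().
                    int n;
                    while ((n = zip.read(buffer)) != -1) {
                        md5.update(buffer, 0, n);
                    }
                }
            }
        }
        // digest() is called exactly once; a second call would hash empty input.
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Hypothetical helper: builds a one-entry zip in memory with a chosen timestamp.
    static byte[] makeZip(String name, String content, long time) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            ZipEntry e = new ZipEntry(name);
            e.setTime(time);
            zip.putNextEntry(e);
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String a = digest(new ByteArrayInputStream(makeZip("a.txt", "hello", 1000000000000L)));
        String b = digest(new ByteArrayInputStream(makeZip("a.txt", "hello", 1500000000000L)));
        System.out.println("same content, different timestamps, equal digests: " + a.equals(b));
    }
}
```

The hex string is built byte by byte so that leading zeros are kept, which `BigInteger.toString( 16 )` would drop.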
The method worked well until I fed it zip files with Chinese, Korean, and Japanese encodings (not just the usual UTF-8):
Processing :mrl_l10n.zip
MD5 A: d41d8cd98f00b204e9800998ecf8427e
MD5 B: d41d8cd98f00b204e9800998ecf8427e
They Match
Processing :fcm.zip
MD5 A: d41d8cd98f00b204e9800998ecf8427e
MD5 B: d41d8cd98f00b204e9800998ecf8427e
They Match
Processing :1_mrm_root.zip
Exception in thread "main" java.lang.IllegalArgumentException
at java.util.zip.ZipInputStream.getUTF8String(Unknown Source)
at java.util.zip.ZipInputStream.getFileName(Unknown Source)
at java.util.zip.ZipInputStream.readLOC(Unknown Source)
at java.util.zip.ZipInputStream.getNextEntry(Unknown Source)
at Tczip.digest(Tczip.java:98)
at Tczip.execute(Tczip.java:33)
Does anyone know how I can work around this?
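
One possible direction, sketched here under assumptions rather than verified against these archives: the stack trace fails while decoding an entry name as UTF-8, and Java 7 and later provide a `ZipInputStream(InputStream, Charset)` constructor that decodes names with a caller-chosen charset. The sketch below assumes Java 7+ is available and uses ISO-8859-1, which maps every byte to a character and therefore cannot fail to decode, although the resulting names may not be human-readable. The class `ZipCharsetWorkaround` and its helpers are hypothetical names for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipCharsetWorkaround {

    // Reads all entry names, decoding them with the given charset instead of UTF-8.
    static List<String> listNames(InputStream in, Charset nameCharset) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in, nameCharset)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Hypothetical helper: writes a one-entry zip whose name is encoded with nameCharset.
    static byte[] makeZip(String entryName, Charset nameCharset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out, nameCharset)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(new byte[] {1, 2, 3});
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // "caf\u00e9.txt" in ISO-8859-1 contains the byte 0xE9, which is not valid UTF-8 here.
        byte[] zipBytes = makeZip("caf\u00e9.txt", StandardCharsets.ISO_8859_1);
        List<String> names = listNames(new ByteArrayInputStream(zipBytes), StandardCharsets.ISO_8859_1);
        System.out.println(names);
    }
}
```

A stream built with this constructor could be passed to `digest` in place of the default one. If the real encoding of an archive is known, a matching charset from `Charset.forName` (availability depends on the JRE) would give readable names; for hashing, any charset that decodes the same bytes the same way every time should be sufficient.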