对大量压缩文本文件(1000 多个文件,大小在 100MB 和 1.5GB 之间)使用TextIO.Read转换,我们有时会收到以下错误:
java.util.zip.ZipException: too many length or distance symbols at
java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164) at
java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117) at
java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at
java.io.BufferedInputStream.read1(BufferedInputStream.java:286) at
java.io.BufferedInputStream.read(BufferedInputStream.java:345) at
java.io.FilterInputStream.read(FilterInputStream.java:133) at
java.io.PushbackInputStream.read(PushbackInputStream.java:186) at
com.google.cloud.dataflow.sdk.runners.worker.TextReader$ScanState.readBytes(TextReader.java:261) at
com.google.cloud.dataflow.sdk.runners.worker.TextReader$TextFileIterator.readElement(TextReader.java:189) at
com.google.cloud.dataflow.sdk.runners.worker.FileBasedReader$FileBasedIterator.computeNextElement(FileBasedReader.java:265) at
com.google.cloud.dataflow.sdk.runners.worker.FileBasedReader$FileBasedIterator.hasNext(FileBasedReader.java:165) at
com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:169) at
com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:118) at
com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:66) at
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:204) at
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:151) at
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:118) at
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:139) at
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:124) at
java.util.concurrent.FutureTask.run(FutureTask.java:266) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at
java.lang.Thread.run(Thread.java:745)
在线搜索相同的ZipException,仅导致此回复:
当热部署程序在应用程序完全复制到部署目录之前尝试部署应用程序时,通常会发生 Zip 文件错误。如果复制文件需要几秒钟,这很常见。解决方法是将文件复制到与应用服务器相同的磁盘分区上的临时目录,然后将文件移动到部署目录。
有没有其他人遇到过类似的例外?或者无论如何我们可以解决这个问题?