
The problem is that the jar files in the /tmp directory (of my client) are not cleaned up after the job is done. I have the following (simplified) code:

public void run() throws IOException {
    PigServer pigServer = null;
    try {
        StringBuilder sb = new StringBuilder();
        // ... some methods that add to the sb ...

        pigServer = new PigServer(ExecType.MAPREDUCE);
        pigServer.setBatchOn();
        pigServer.registerQuery(sb.toString());                     

        // execute and discard the batch
        pigServer.executeBatch();
        pigServer.discardBatch();
    } finally {
        if (pigServer != null) {
            pigServer.shutdown();
        }
    }
}

To my understanding, pigServer.shutdown() should remove all of my temporary files in /tmp. After the job is done, however, my /tmp directory is full of Job9196419177728780689.jar files and an empty pig8776538161976852388tmp subdirectory.

While debugging, I see that many jobs on the (remote) Hadoop cluster are being deleted, plus one attempt to delete /tmp/temp2071202241 (local), which does not appear to be an existing directory on my local system.

The files do get deleted after I shut down the VM, but that is obviously not something I want to do after every job. Am I missing something?

Edit: I am not the only one with this problem; the issue is filed as https://issues.apache.org/jira/browse/PIG-3338

Edit 2: Possible solution (not by me): http://www.lopakalogic.com/articles/hadoop-articles/pig-keeps-temp-files/
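Until PIG-3338 is fixed, one workaround is to sweep the leftovers yourself after shutdown(). This is a best-effort sketch, not part of the Pig API; the "Job*.jar" pattern simply matches the filenames observed above and may differ between Pig versions:

```java
import java.io.File;

public class PigTempCleanup {

    // Best-effort removal of leftover Pig job jars from a temp directory.
    // Returns the number of files actually deleted.
    public static int cleanupPigJars(File tmpDir) {
        int removed = 0;
        File[] candidates = tmpDir.listFiles();
        if (candidates == null) {
            return 0; // not a directory, or not readable
        }
        for (File f : candidates) {
            String name = f.getName();
            if (f.isFile() && name.startsWith("Job") && name.endsWith(".jar")) {
                if (f.delete()) {
                    removed++;
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        System.out.println("Removed " + cleanupPigJars(tmp) + " leftover job jars");
    }
}
```

Calling cleanupPigJars(...) in the finally block, right after pigServer.shutdown(), would keep /tmp from filling up between VM restarts.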


1 Answer


Your analysis is correct. Pig creates the temporary file with File.createTempFile and registers it with File.deleteOnExit, which only deletes it when the VM shuts down. See the code here.
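A minimal sketch of that behavior, using only standard java.io (the "Job"/".jar" naming mirrors the files in the question, not Pig's actual internals):

```java
import java.io.File;
import java.io.IOException;

public class DeleteOnExitDemo {

    // Mirrors what Pig does internally: the file is only *scheduled*
    // for deletion; nothing is removed while the JVM keeps running.
    public static File makeTempJar() throws IOException {
        File jar = File.createTempFile("Job", ".jar");
        jar.deleteOnExit();
        return jar;
    }

    public static void main(String[] args) throws IOException {
        File jar = makeTempJar();
        // Still present here: deleteOnExit fires only during JVM shutdown,
        // which is exactly why the jars pile up in a long-running client.
        System.out.println(jar.getName() + " exists: " + jar.exists());
    }
}
```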

What about starting a separate VM for each Pig script?
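That suggestion can be sketched with ProcessBuilder: spawn one child JVM per Pig run, so deleteOnExit fires as soon as that run's JVM terminates. Here com.example.PigJobMain is a placeholder for whatever main class wraps the run() method from the question:

```java
import java.io.File;
import java.io.IOException;

public class PigScriptLauncher {

    // Builds the command for a fresh JVM that reuses this process's classpath.
    public static ProcessBuilder freshJvmFor(String mainClass) {
        String javaBin = new File(new File(System.getProperty("java.home"), "bin"),
                                  "java").getPath();
        return new ProcessBuilder(javaBin,
                                  "-cp", System.getProperty("java.class.path"),
                                  mainClass);
    }

    // Runs the given main class in its own JVM and waits for it to finish;
    // its temp jars are cleaned up when that child JVM exits.
    public static int runInFreshJvm(String mainClass)
            throws IOException, InterruptedException {
        ProcessBuilder pb = freshJvmFor(mainClass);
        pb.inheritIO();
        return pb.start().waitFor();
    }

    public static void main(String[] args) {
        System.out.println(freshJvmFor("com.example.PigJobMain").command());
    }
}
```

The trade-off is JVM startup cost per script, which may be acceptable for batch jobs but not for frequent small runs.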

answered 2013-07-01T09:24:52.110