The problem is that the jar files in the /tmp directory (of my client) are not cleaned up after the job is done. I have the following (simplified) code:
public void run() throws IOException {
    PigServer pigServer = null;
    try {
        StringBuilder sb = new StringBuilder();
        // ... some methods that add to the sb ...
        pigServer = new PigServer(ExecType.MAPREDUCE);
        pigServer.setBatchOn();
        pigServer.registerQuery(sb.toString());
        // execute and discard the batch
        pigServer.executeBatch();
        pigServer.discardBatch();
    } finally {
        if (pigServer != null) {
            pigServer.shutdown();
        }
    }
}
To my understanding, pigServer.shutdown() should remove all of my temporary files from /tmp. After the job is done, however, my /tmp directory is full of files like Job9196419177728780689.jar, plus an empty pig8776538161976852388tmp subdirectory.
When debugging, I can see that a lot of jobs on the (remote) Hadoop cluster are being cleaned up, plus one attempt to delete the local /tmp/temp2071202241, which does not seem to be an existing directory on my system.
The files do get deleted once I shut down the JVM, but restarting it after every job is obviously not what I want. Am I missing something?
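Until I find the real cause, I am considering cleaning up manually after shutdown(). A minimal sketch of that workaround (the Job*.jar and pig*tmp name patterns are just what I observe in my /tmp; the helper itself is mine, not part of the Pig API):

import java.io.File;

// Workaround sketch: remove leftover Pig artifacts from java.io.tmpdir
// after pigServer.shutdown(). The name patterns match what I see in /tmp.
private static void cleanupPigTempFiles() {
    File tmpDir = new File(System.getProperty("java.io.tmpdir"));
    File[] leftovers = tmpDir.listFiles();
    if (leftovers == null) {
        return; // tmpdir missing or not readable
    }
    for (File f : leftovers) {
        String name = f.getName();
        if (name.matches("Job\\d+\\.jar") || name.matches("pig\\d+tmp")) {
            f.delete(); // ignores failures; the pig*tmp dir is empty, so delete() works
        }
    }
}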
Edit: I am not the only one with this problem; the issue is filed as https://issues.apache.org/jira/browse/PIG-3338
Edit 2: Possible solution (not by me): http://www.lopakalogic.com/articles/hadoop-articles/pig-keeps-temp-files/
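In that direction, a sketch of what I might try: pass a dedicated temp directory to Pig via the pig.temp.dir property, using the PigServer(ExecType, Properties) constructor. Note that /tmp/pig-scratch is just a placeholder path, and I have not verified that this property also covers the local Job*.jar files rather than only where intermediate results are stored:

import java.util.Properties;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

// Sketch: point Pig's temporary storage at a dedicated directory so that any
// leftovers are at least isolated in one place. pig.temp.dir controls where
// Pig stores intermediate results; whether it moves the local jars is untested.
Properties props = new Properties();
props.setProperty("pig.temp.dir", "/tmp/pig-scratch"); // placeholder path
PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);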