I need to find the hosts (i.e. the actual machines) on which the different tasks (mappers and reducers) of a Hadoop job are running. The job is long-running, and I need to retrieve the hosts where its tasks are currently executing. I need this information in an external program, not inside the job itself.
I know that I can use

hadoop job -list-attempt-ids job_201307251119_0004 map running

to get the running task attempts, but this does not show me the hosts.
I also know that I can use the JobClient to retrieve the host of a finished task, but in my case the tasks are still running.
The only solution that came to mind is to scrape the JobTracker's HTTP interface: the taskdetails HTML page contains the hosts inside the URLs that point to the task log files. That does not seem like the right way to go, though. What are the alternatives?
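For reference, the scraping workaround I have in mind would look roughly like this. It is only a minimal sketch: the HTML snippet, the port 50060, and the tasklog URL format are assumptions based on what my JobTracker's taskdetails page happens to emit, not a stable API, which is exactly why I would prefer a proper alternative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TaskHostScraper {

    // Tasklog links on the taskdetails page look (on my cluster) like
    // http://<host>:50060/tasklog?attemptid=...  -- port and format assumed.
    private static final Pattern TASKLOG_URL =
            Pattern.compile("http://([^:/]+):\\d+/tasklog\\?attemptid=([^\"&]+)");

    /** Extracts (attemptId, host) pairs from the fetched taskdetails HTML. */
    public static List<String[]> extractHosts(String html) {
        List<String[]> result = new ArrayList<>();
        Matcher m = TASKLOG_URL.matcher(html);
        while (m.find()) {
            // group(2) is the attempt id, group(1) the tasktracker host
            result.add(new String[] { m.group(2), m.group(1) });
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical snippet of a JobTracker taskdetails page.
        String html = "<a href=\"http://worker-node-07:50060/tasklog"
                + "?attemptid=attempt_201307251119_0004_m_000003_0\">All</a>";
        for (String[] pair : extractHosts(html)) {
            System.out.println(pair[0] + " runs on " + pair[1]);
        }
    }
}
```

This works on the sample snippet, but it obviously breaks as soon as the page layout changes, which is my whole objection to the approach.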