The fsck tool provides an easy way to find out which blocks are in any particular file. For example:
% hadoop fsck <path> -files -blocks -locations -racks
Reference: Hadoop Command Line Guide.
Edit:
An input split is a chunk of the input that is processed by a single map. Each map processes a single split. Each split is divided into records, and the map processes each record, a key-value pair, in turn. Splits and records are logical, whereas HDFS blocks are physical.
An InputSplit has a length in bytes and a set of storage locations, which are just hostname strings. A split doesn’t contain the input data; it is just a reference to the data.
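To make the logical/physical distinction concrete, here is a minimal sketch in plain Java with no Hadoop dependency. SplitSketch and overlappingBlocks are hypothetical names used only for illustration, not Hadoop APIs, and the 128 MB block size is an assumption. The point is that a split is only a byte range plus hostnames, so nothing forces it to line up with block boundaries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a split: a byte range plus hostnames, no data.
final class SplitSketch {
    final long start;      // byte offset of the split within the file
    final long length;     // split length in bytes
    final String[] hosts;  // hostnames where the referenced data is stored

    SplitSketch(long start, long length, String... hosts) {
        this.start = start;
        this.length = length;
        this.hosts = hosts;
    }

    // Indexes of the fixed-size blocks that this byte range overlaps.
    List<Long> overlappingBlocks(long blockSize) {
        List<Long> blocks = new ArrayList<>();
        if (length <= 0) {
            return blocks; // an empty range touches no block
        }
        long first = start / blockSize;
        long last = (start + length - 1) / blockSize;
        for (long b = first; b <= last; b++) {
            blocks.add(b);
        }
        return blocks;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assumed block size
        SplitSketch aligned = new SplitSketch(0, blockSize, "host1", "host2");
        SplitSketch straddling = new SplitSketch(blockSize - 10, 100, "host2");
        System.out.println(aligned.overlappingBlocks(blockSize));    // prints [0]
        System.out.println(straddling.overlappingBlocks(blockSize)); // prints [0, 1]
    }
}
```

The second split is only 100 bytes long but still touches two blocks, because it is defined by offsets rather than by block layout.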
You can get the InputSplit instance inside the map method:
InputSplit inputSplit = context.getInputSplit(); // the split this map task is processing
String[] splitLocations = inputSplit.getLocations(); // hostnames where the split's data is stored