I'm trying to read from multiple gzipped (.gz) files in HDFS, but I only want the files whose names begin with yesterday's date. My files look like this:

/notmy-data/openSourceDatasets/Temperatures/2013-06-10T133006.gz
/notmy-data/openSourceDatasets/Temperatures/2013-06-11T153006.gz
/notmy-data/openSourceDatasets/Temperatures/2013-06-11T173006.gz
/notmy-data/openSourceDatasets/Temperatures/2013-06-11T193006.gz

This is what I have so far:

DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
Calendar cal = Calendar.getInstance();
cal.add(Calendar.DATE, -1);
String yesterdate = dateFormat.format(cal.getTime());
Path tpath = new Path("/notmy-data/openSourceDatasets/Temperatures/" + yesterdate + "*");
FileStatus[] status = fileSystem.listStatus(tpath);
System.out.println("[" + new Date() + "] Starting temperature ingest...");

for (int i = 0; i < status.length; i++) {
    BufferedReader reader = new BufferedReader(new InputStreamReader(new GZIPInputStream(fileSystem.open(status[i].getPath()))));
    String line;
    while (null != (line = reader.readLine())) {
        System.out.println(line);
    }
}

I've tried this both with and without the asterisk. I always get a java.io.FileNotFoundException. What am I doing wrong?

2 Answers

This may not be the best way to do it, but it works:

Path tpath = new Path("/notmy-data/openSourceDatasets/Temperatures/"); // changed: list the whole directory
FileStatus[] status = fileSystem.listStatus(tpath);
System.out.println("[" + new Date() + "] Starting temperature ingest...");

for (int i = 0; i < status.length; i++) {
    // added: skip files whose name does not start with yesterday's date
    String[] fileNameBits = status[i].getPath().toString().split("/");
    String fileDate = fileNameBits[fileNameBits.length - 1].split("T")[0];
    if (!fileDate.equals(yesterdate)) {
        continue;
    }
    // end of added code
    BufferedReader reader = new BufferedReader(new InputStreamReader(new GZIPInputStream(fileSystem.open(status[i].getPath()))));
    String line;
    while (null != (line = reader.readLine())) {
        System.out.println(line);
    }
}
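
The filtering step in the loop above can be pulled out into a small helper so it can be checked without a cluster. The following is only a sketch under a few assumptions: the class and method names (YesterdayFileFilter, baseName, fromYesterday) are hypothetical, it uses java.time instead of Calendar, and it works on plain path strings rather than Hadoop FileStatus objects.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original answer.
final class YesterdayFileFilter {

    /** Returns the last path component, e.g. "2013-06-11T153006.gz". */
    static String baseName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Keeps only the paths whose file name starts with the day before 'today'. */
    static List<String> fromYesterday(List<String> paths, LocalDate today) {
        // LocalDate.toString() yields yyyy-MM-dd, the same shape as the file name prefix.
        String prefix = today.minusDays(1).toString();
        List<String> matches = new ArrayList<>();
        for (String path : paths) {
            if (baseName(path).startsWith(prefix)) {
                matches.add(path);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "/notmy-data/openSourceDatasets/Temperatures/2013-06-10T133006.gz",
                "/notmy-data/openSourceDatasets/Temperatures/2013-06-11T153006.gz",
                "/notmy-data/openSourceDatasets/Temperatures/2013-06-11T173006.gz",
                "/notmy-data/openSourceDatasets/Temperatures/2013-06-11T193006.gz");
        // A fixed "today" keeps the demo reproducible; real code would pass LocalDate.now().
        for (String path : fromYesterday(paths, LocalDate.of(2013, 6, 12))) {
            System.out.println(path);
        }
    }
}
```

Comparing with startsWith avoids splitting on "/" and "T", and passing the current date in as a parameter makes the helper easy to test. As a separate note, Hadoop's FileSystem API also has globStatus(Path), which expands wildcard patterns, while listStatus takes the path literally; that may be why the wildcard in the question produces a FileNotFoundException, though it is not tested here.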
answered 2013-06-12T15:37:49.710

Use Files.newDirectoryStream():

// Does globbing for you!
try (DirectoryStream<Path> dirstream =
         Files.newDirectoryStream(Paths.get("yourBaseDir"), yesterdate + "*")) {
    for (final Path path : dirstream) {
        // do stuff with "path"
    }
}

The real answer will have to wait until you say what FileStatus is, though...

Also, opening a new BufferedReader on a Path object is much easier than what you are doing: use Files.newBufferedReader().
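
Putting the two suggestions together for gzipped files might look like the sketch below. Several things here are assumptions: the class and method names (LocalGzipReader, listByPrefix, readGzipLines) are hypothetical, the files are assumed to be UTF-8 text, and java.nio.file with its default provider only sees the local filesystem, so this does not apply to HDFS paths as written. Because Files.newBufferedReader() reads the bytes as they are, the sketch wraps Files.newInputStream() in a GZIPInputStream to decompress first.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for a LOCAL directory; names are illustrative only.
final class LocalGzipReader {

    /** Lists the entries of 'dir' whose names start with 'prefix', sorted by path. */
    static List<Path> listByPrefix(Path dir, String prefix) throws IOException {
        List<Path> matches = new ArrayList<>();
        // The second argument is a glob, so prefix + "*" matches any name with that prefix.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, prefix + "*")) {
            for (Path entry : stream) {
                matches.add(entry);
            }
        }
        Collections.sort(matches); // directory iteration order is not guaranteed
        return matches;
    }

    /** Decompresses a gzip file and returns its lines (UTF-8 is an assumption). */
    static List<String> readGzipLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LocalGzipReader <directory> <date-prefix>
        for (Path file : listByPrefix(Paths.get(args[0]), args[1])) {
            for (String line : readGzipLines(file)) {
                System.out.println(line);
            }
        }
    }
}
```

The try-with-resources blocks close both the directory stream and the reader even if reading fails partway through.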

answered 2013-06-12T13:59:45.103