
I am having a problem moving files between two HDFS folders in a Spark application. We are using Spark 2.1 with Scala. I imported the org.apache.hadoop.fs package and used its 'rename' method as a workaround, since I couldn't find a method in that package for moving files between HDFS folders. The code is below.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

def move_files(fileName: String, fromLocation: String, toLocation: String, spark: SparkSession): Unit = {
    val conf = spark.sparkContext.hadoopConfiguration
    val fs = FileSystem.get(conf)
    
    val file_source = new Path(fromLocation + "/" + fileName)
    println(file_source)   
    val file_target = new Path(toLocation + fileName)
    println(file_target)  
 
    try {
        fs.rename(file_source, file_target)
    } catch {
        case e: Exception => println(e); println("Exception moving files between folders")
    }
}

The move_files method is called from another method that contains other application logic; I need to remove certain files from the source directory before proceeding with that logic.

def main(): Unit = {
    /*
    logic
    */
    move_files("abc.xml", "/location/dev/file_folder_source", "/location/dev/file_folder_target", spark)
    /*
    logic
    */
}

The move_files step executes without any errors, but the file is not moved from the source folder to the target folder. Execution continues with the rest of the logic, which then fails because the bad files are still present in the source folder. Please suggest another way to move files between folders in HDFS, or point out the mistake in the code above.


1 Answer


The fs.rename(file_source, file_target) API returns a boolean: true means the file was moved successfully, and false means the file was not moved.

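move_files appears to run successfully because the API it uses does not fail when it cannot move the file. It simply returns false and execution continues. You need to check that return value explicitly in your code.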
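As a minimal sketch of that explicit check, the boolean result can be turned into an exception so that a failed move stops the program instead of being ignored. The helper name renameOrFail is hypothetical and not part of the Hadoop API; it only wraps the fs.rename call from the question.

```scala
import java.io.IOException
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper (not a Hadoop API): fails loudly when fs.rename
// reports that nothing was moved.
def renameOrFail(fs: FileSystem, source: Path, target: Path): Unit = {
  // rename returns false instead of throwing in several failure cases
  val moved = fs.rename(source, target)
  if (!moved) {
    throw new IOException(s"Could not move $source to $target (rename returned false)")
  }
}
```

Calling renameOrFail(fs, file_source, file_target) in place of the bare fs.rename call inside move_files would let the existing catch block actually report the failure.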

With the fs.rename API, you need to create the target directory first and then pass only the target directory path, as shown below:

val file_target = new Path(toLocation)
fs.mkdirs(file_target)
fs.rename(file_source, file_target)

Note that in the line val file_target = new Path(toLocation), the path contains only the directory, not the file name.
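Putting the steps together, a move helper might look like the sketch below. It is an illustration under assumptions rather than a definitive implementation: moveFileToDir is a hypothetical name, and the sketch is a variant that spells out the full destination path (directory plus file name) instead of passing only the directory. Both forms should work on HDFS once the directory exists; the explicit form also avoids joining paths by string concatenation.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: moves fileName from fromLocation into toLocation.
// Returns true only if the target directory is available and the rename succeeded.
def moveFileToDir(fs: FileSystem, fileName: String, fromLocation: String, toLocation: String): Boolean = {
  // Path(parent, child) inserts the separator, so no manual "/" is needed
  val source = new Path(fromLocation, fileName)
  val targetDir = new Path(toLocation)
  val target = new Path(targetDir, fileName)
  // mkdirs creates missing parent directories and succeeds if the directory already exists
  fs.mkdirs(targetDir) && fs.rename(source, target)
}
```

The caller is still responsible for acting on the result, for example by stopping before the remaining logic runs when moveFileToDir returns false.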

Answered 2018-06-21T01:53:24.453