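I'm fairly new to Scalding, and I'm trying to write a Scalding program that takes two datasets as input: 1) book_id_title: ('id, 'title): contains the mapping between book IDs and book titles; both are strings. 2) book_sim: ('id1, 'id2, 'sim): contains the similarity between pairs of books, identified by their IDs.
The goal of the program is to replace each (id1, id2) pair in book_sim with the corresponding titles by looking them up in the book_id_title table. However, I am unable to retrieve the titles. I would appreciate any help with the getTitle() function below.
My Scalding code is as follows: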
// read in the mapping between book id and title from a csv file
val book_id_title =
  Csv(book_file, fields=book_format)
    .read
    .project('id,'title)

// read in the similarity data from a csv file and map the ids to the titles
// by calling getTitle function
val result =
  book_sim
    .map(('id1, 'id2)->('title1, 'title2)) {
      pair:(String,String)=> (getTitle(pair._1), getTitle(pair._2))
    }
    .write(out)

// function that searches for the id and retrieves the title
def getTitle(search_id: String) = {
  val btitle =
    book_id_title
      .filter('id){id:String => id == search_id} // extract row matching the id
      .project('title) // get the title
}
Thanks.
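
One likely reason getTitle() cannot return a title: `filter` and `project` on a pipe build another pipe (a description of a distributed computation), not a String, and because the body of getTitle() ends with a `val` definition, the function as written returns Unit. A lookup against another dataset is usually expressed as a join rather than as a function called inside `map`. Below is a minimal, untested sketch of that approach using `joinWithSmaller`, `rename`, and `project` from the Fields API. The job name `BookTitleJoin`, the argument names `books`, `sim`, and `output`, the intermediate field names `lookupId1` and `lookupId2`, and the exact column layout of both CSV files are assumptions for illustration, not part of the original code.

```scala
import com.twitter.scalding._

// Hypothetical job: class name and argument names are assumptions.
class BookTitleJoin(args: Args) extends Job(args) {

  // Lookup table: book id -> title (assumed to have exactly these two columns).
  val bookIdTitle =
    Csv(args("books"), fields = ('id, 'title))
      .read
      .project('id, 'title)

  // Similarity data (assumed to have exactly these three columns).
  val bookSim =
    Csv(args("sim"), fields = ('id1, 'id2, 'sim)).read

  bookSim
    // First join: attach the title for id1. The lookup fields are renamed
    // beforehand so the joined pipe has no duplicate field names.
    .joinWithSmaller('id1 -> 'lookupId1,
      bookIdTitle.rename(('id, 'title) -> ('lookupId1, 'title1)))
    // Second join: attach the title for id2 in the same way.
    .joinWithSmaller('id2 -> 'lookupId2,
      bookIdTitle.rename(('id, 'title) -> ('lookupId2, 'title2)))
    // Keep only the titles and the similarity score.
    .project('title1, 'title2, 'sim)
    .write(Csv(args("output")))
}
```

`joinWithSmaller` performs an inner join by default, so pairs whose ID is missing from the lookup table are dropped; passing `joiner = new cascading.pipe.joiner.LeftJoin` would keep them with a null title. The sketch also assumes the title table is the smaller side of each join. If it is small enough to fit in memory, `joinWithTiny` may be a better fit.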