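I'm trying to open a file that I uploaded to a DBFS location. Opening it fails with an error, even though the file shows up when I run ls, and reading it into an RDD works without any problem. Can someone explain this DBFS behavior? I went through the documentation and tried several variations. This is the documentation I followed.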
#ls
dbutils.fs.ls("/tmp/sample.txt")
Out[82]: [FileInfo(path='dbfs:/tmp/sample.txt', name='sample.txt', size=46044136)]
#creating RDD from the txt file
data_file = "/tmp/sample.txt"
raw_data = sc.textFile(data_file)
raw_data.take(1)
Out[99]: ["Oct 12 2009 \tNice trendy hotel location not too bad...........\t"]
#open the txt file
with open("/tmp/sample.txt", 'r') as f:
    for i, line in enumerate(f):
        if i % 10000 == 0:
            print("read {0} reviews".format(i))
        print(gensim.utils.simple_preprocess(line))
FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/tmp/sample.txt'
#as per documentation
with open("/dbfs/tmp/sample.txt", 'r') as f:
    for i, line in enumerate(f):
        if i % 10000 == 0:
            print("read {0} reviews".format(i))
        print(gensim.utils.simple_preprocess(line))
FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/tmp/sample.txt'
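As far as I understand, dbutils.fs.ls and sc.textFile resolve paths against DBFS, while Python's built-in open() only sees the driver's local filesystem. A minimal sketch of a check I could run to see which of these paths the local filesystem actually exposes is below. The helper name local_path_status is my own (hypothetical), and the paths are just the ones from the snippets above.

```python
import os

def local_path_status(paths):
    """Map each path to True/False depending on whether it exists
    on the local (driver) filesystem, i.e. what open() can see."""
    return {p: os.path.exists(p) for p in paths}

# Paths taken from the attempts above, plus the /dbfs mount point itself.
candidates = ["/tmp/sample.txt", "/dbfs/tmp/sample.txt", "/dbfs"]
for path, exists in local_path_status(candidates).items():
    print("{0}: {1}".format(path, "exists locally" if exists else "not found locally"))
```

If /dbfs itself is reported as missing, that would suggest the local mount is not available on this cluster, rather than a problem with the file.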
I've been scratching my head over this. Any help would be greatly appreciated.
PS: In case it helps, I'm using the Community Edition of Databricks.