This is the file I submitted as a PySpark job in Dataproc through the UI
# Load file data from Google Cloud Storage into the Dataproc cluster, creating an RDD
# Because Spark transforms are 'lazy', we do a 'count()' action to make sure
# we successfully loaded the main data file
allFlt = sc.textFile("gs://mybucket/mydatafile")
allFlt.count()
# Remove the header from the file so we can work with the data only
header = allFlt.take(1)[0]
dataOnly = allFlt.filter(lambda line: line != header)
It starts, and then errors out with:
allFlt = sc.textFile("gs://thomtect/flightinfo")
NameError: name 'sc' is not defined
Why is this happening? Shouldn't Dataproc have already set up a Spark context? What do I need to add to my code so that it is accepted as a Spark job?
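For reference, a minimal sketch of what I understand a standalone script usually does (explicitly creating its own context instead of relying on a predefined sc; the app name below is just an illustrative placeholder):

from pyspark import SparkContext

# In a script submitted as a job (rather than the interactive pyspark shell),
# 'sc' is not predefined, so the context is created explicitly
sc = SparkContext(appName="flight-analysis")

Is something along these lines what's missing, or does Dataproc inject the context some other way?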