This is the function that reads a text file in Spark (PySpark):
def textFile(self, name, minPartitions=None, use_unicode=True):
    """
    Read a text file from HDFS, a local file system (available on all
    nodes), or any Hadoop-supported file system URI, and return it as an
    RDD of Strings.

    If use_unicode is False, the strings will be kept as `str` (encoding
    as `utf-8`), which is faster and smaller than unicode. (Added in
    Spark 1.2)

    >>> path = os.path.join(tempdir, "sample-text.txt")
    >>> with open(path, "w") as testFile:
    ...    _ = testFile.write("Hello world!")
    >>> textFile = sc.textFile(path)
    >>> textFile.collect()
    [u'Hello world!']
    """
    # Fall back to min(defaultParallelism, 2) when minPartitions is not given.
    minPartitions = minPartitions or min(self.defaultParallelism, 2)
    # Delegate to the JVM-side SparkContext and decode each line as UTF-8.
    return RDD(self._jsc.textFile(name, minPartitions), self,
               UTF8Deserializer(use_unicode))
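
The default for minPartitions is easy to misread, so here is a minimal sketch of just that one expression in plain Python, without Spark. The helper name `resolve_min_partitions` and its parameters are hypothetical and only stand in for the `minPartitions or min(self.defaultParallelism, 2)` line above; this is not part of the PySpark API.

```python
def resolve_min_partitions(min_partitions, default_parallelism):
    """Hypothetical helper mirroring
    `minPartitions or min(self.defaultParallelism, 2)`.

    Because the expression uses `or`, any falsy value (None or 0)
    triggers the fallback, and the fallback is capped at 2.
    """
    return min_partitions or min(default_parallelism, 2)
```

Under this sketch, leaving minPartitions unset on a context whose defaultParallelism is 8 yields 2, not 8, while an explicit value such as 10 is passed through unchanged. The result is only a lower bound requested from the underlying Hadoop input format, so the RDD may end up with more partitions than this number.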