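Is there a way to read a multi-line CSV file using the ReadFromText transform in Python? I have a file that contains one line (a single CSV record with a line break inside a quoted field), and I am trying to get Apache Beam to read the input as one line, but I can't get it to work.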
import apache_beam

def print_each_line(line):
    print(line)

path = './input/testfile.csv'
# Here are the contents of testfile.csv
# foo,bar,"blah blah
# more blah blah",baz
p = apache_beam.Pipeline()
(p
 | 'ReadFromFile' >> apache_beam.io.ReadFromText(path)
 | 'PrintEachLine' >> apache_beam.Map(print_each_line)
)
# Run the pipeline so the transforms actually execute
p.run().wait_until_finish()
# Here is the output:
# foo,bar,"blah blah
# more blah blah",baz
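The code above parses the input as two lines, even though the standard for multi-line CSV files is to enclose multi-line elements in double quotes.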
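For comparison, here is a minimal sketch of the behavior I am expecting, using only Python's standard `csv` module rather than Beam. The helper name `parse_csv_text` and the inline `data` string are made up for illustration; the string mirrors the contents of testfile.csv. When the parser sees the whole text, it treats the quoted line break as part of one field, so the file yields a single record.

```python
import csv
import io

# Same contents as testfile.csv above (hypothetical inline copy).
data = 'foo,bar,"blah blah\nmore blah blah",baz\n'

def parse_csv_text(text):
    """Parse CSV text into a list of rows, keeping quoted newlines inside fields."""
    # newline='' leaves line endings untouched so the csv module can
    # decide which ones fall inside a quoted field.
    return list(csv.reader(io.StringIO(text, newline='')))

rows = parse_csv_text(data)
for row in rows:
    print(row)
```

This works only because the parser receives the whole text at once. ReadFromText hands each physical line downstream separately, so the same idea in a pipeline would presumably require reading the file contents as a whole before parsing; that wiring is not shown here.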