I am trying to convert my DataFrame to a JSON string. I am using PySpark.

This is the code I am using.

from pyspark.sql.functions import col, lit

def produceTrainData(self, csvData):
    # Add constant columns to every row of the input DataFrame.
    trainData = csvData.withColumn("therapyClass", lit("REMODULIN"))\
                .withColumn("patientAge", lit(52))\
                .withColumn("patientSex", lit("M"))\
                .withColumn("serviceType", lit("PHARMACY"))\
                .withColumn("npiId", lit("27"))\
                .withColumn("requestID", lit(419568891))\
                .withColumn("requestDateTime", lit("20171909 21:30:55"))

    selectData = trainData.select("payorId", "patientId","therapyType","therapyClass","ndcNumber","procedureCode","patientAge","patientSex",
                        "placeOfService", "serviceDuration","daysOrUnits","charges", "serviceDate", "serviceType","serviceBranchId",
                        "npiId", "diagnosisCode", "authNbr","requestID", "requestDateTime")

    authNbrFilter = col("authNbr") != "-"        
    filterData = selectData.where(authNbrFilter)#.limit(20)
    print(filterData)

    filterData.show(20,False)

    jsons = filterData.toJSON        

    print(jsons)

There are two errors:

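  1. When I print the jsons variable (print(jsons)), it does not return an RDD as expected. Instead it returns: bound method DataFrame.toJSON of DataFrame

I would like to know the cause of this.

  2. When I try to collect the jsons variable, I get the following error: AttributeError: 'function' object has no attribute 'collect'.

Do you know what the cause of this error is?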
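
Both symptoms look consistent with `toJSON` being referenced without parentheses, so `jsons` holds the method itself rather than its result. The sketch below illustrates that distinction in plain Python. `FakeFrame` is a hypothetical stand-in written for this example, not part of PySpark, and it returns a list where the real `DataFrame.toJSON()` returns an RDD of strings.

```python
import json


class FakeFrame:
    """Hypothetical stand-in for a DataFrame (assumption, not PySpark)."""

    def __init__(self, rows):
        self.rows = rows

    def toJSON(self):
        # Serialize each row to a JSON string.
        return [json.dumps(row) for row in self.rows]


frame = FakeFrame([{"authNbr": "A1"}])

not_called = frame.toJSON   # no parentheses: a bound method object
called = frame.toJSON()     # parentheses: the method runs and returns data

print(not_called)           # prints something like <bound method FakeFrame.toJSON of ...>
print(called)

# The bound method has no data-access attributes such as `collect`.
try:
    not_called.collect
    raised = False
except AttributeError:
    raised = True
```

If the same thing is happening in the code above, writing `filterData.toJSON()` and then calling `.collect()` on the result would be the change to try.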
