I am trying to convert my DataFrame to JSON strings. I am using PySpark.
This is the code I am using.
from pyspark.sql.functions import col, lit

def produceTrainData(self, csvData):  # Array[String] = {
    # Add constant columns to every row.
    trainData = csvData.withColumn("therapyClass", lit("REMODULIN"))\
        .withColumn("patientAge", lit(52))\
        .withColumn("patientSex", lit("M"))\
        .withColumn("serviceType", lit("PHARMACY"))\
        .withColumn("npiId", lit("27"))\
        .withColumn("requestID", lit(419568891))\
        .withColumn("requestDateTime", lit("20171909 21:30:55"))
    selectData = trainData.select("payorId", "patientId", "therapyType", "therapyClass", "ndcNumber", "procedureCode", "patientAge", "patientSex",
        "placeOfService", "serviceDuration", "daysOrUnits", "charges", "serviceDate", "serviceType", "serviceBranchId",
        "npiId", "diagnosisCode", "authNbr", "requestID", "requestDateTime")
    # Keep only rows whose authNbr is not "-".
    authNbrFilter = col("authNbr") != "-"
    filterData = selectData.where(authNbrFilter)#.limit(20)
    print(filterData)
    filterData.show(20, False)
    jsons = filterData.toJSON
    print(jsons)
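There are two problems:
- When I print the jsons variable (print(jsons)), I do not get an RDD as expected. Instead it prints: bound method DataFrame.toJSON of DataFrame
I would like to understand why this happens.
- When I try to collect the jsons variable, I get the following error: AttributeError: 'function' object has no attribute 'collect'.
Do you know what causes this error?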
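
One possible explanation for both symptoms is that toJSON is referenced without parentheses, so jsons holds the method itself and not the value it returns. The sketch below reproduces that behaviour in plain Python without Spark. FakeDataFrame and its sample row are hypothetical stand-ins invented for illustration, not part of PySpark.

```python
import json


class FakeDataFrame:
    """Hypothetical stand-in for a Spark DataFrame (assumption, not PySpark)."""

    def __init__(self, rows):
        self.rows = rows

    def toJSON(self):
        # Mimics the idea of DataFrame.toJSON: one JSON string per row.
        return [json.dumps(row) for row in self.rows]


df = FakeDataFrame([{"authNbr": "A1", "patientAge": 52}])

not_called = df.toJSON   # no parentheses: the bound method object itself
called = df.toJSON()     # parentheses: the value the method returns

print(not_called)                      # a "<bound method ...>" representation
print(hasattr(not_called, "collect"))  # False: a method object has no collect
print(called)                          # ['{"authNbr": "A1", "patientAge": 52}']
```

If the same thing is happening in the code above, writing jsons = filterData.toJSON() should give an RDD of JSON strings, and jsons.collect() should then return them as a Python list.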