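I'm new to Spark and have a question about filtering a DataFrame based on data type validation. I want to filter out and drop the rows whose content does not match the expected data type.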

I tried filtering with df.filter(row => row.schema("id").dataType == "IntegerType"), but it did not work.

I also tried the following code:

    import java.util.ArrayList
    import org.apache.spark.sql.types._

    val fileSchema = StructType(Array(StructField("english", IntegerType, true), StructField("maths", IntegerType, true)))

    // Collect the columns whose type needs to be validated
    var columnsToValidate1: Set[StructField] = Set[StructField]()
    fileSchema.iterator.foreach(structField => {
      // Skip string columns, since they cannot cause parsing issues
      if (structField.dataType != StringType)
        columnsToValidate1 = columnsToValidate1 + structField
    })

    // Build one cast expression per column, as text
    val varList = new ArrayList[String]

    columnsToValidate1.foreach(x => {
      val columnName = x.name
      val columnDataType = x.dataType.toString()

      val stringToBeAdd = "df(\"".concat(columnName).concat("\").cast(").concat(columnDataType).concat(")")

      varList.add(stringToBeAdd)
    })

    // Strip the surrounding [ ] from the list's string form
    val arrayVal = varList.toString()

    val query = arrayVal.substring(1, arrayVal.length() - 1)

    // This is the call that does not work
    dataframe.select(query)

However, dataframe.select does not work with the generated query string, whereas the following works fine:

val newDF = df.select(df("maths").cast(IntegerType), df("english").cast(IntegerType))
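
The difference between the two calls is likely that select(query) receives a single String, which Spark treats as a column name, not as Scala code to evaluate. As a minimal sketch of one possible direction (not a confirmed solution), the Column objects can be built directly from the schema. The sample rows and the object name below are hypothetical, and the row filter assumes that a failed cast returns null, which is the behavior when ANSI mode is off:

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object CastBySchemaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cast-by-schema-sketch")
      // Assumption: with ANSI mode off, an unparseable cast yields null instead of an error
      .config("spark.sql.ansi.enabled", "false")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: every column arrives as a string
    val df = Seq(("90", "85"), ("abc", "70")).toDF("english", "maths")

    val fileSchema = StructType(Array(
      StructField("english", IntegerType, nullable = true),
      StructField("maths", IntegerType, nullable = true)))

    // Only non-string columns need validation
    val columnsToValidate = fileSchema.fields.filter(_.dataType != StringType)

    // Build Column objects instead of a text query, then expand them as varargs
    val castColumns: Array[Column] =
      columnsToValidate.map(f => col(f.name).cast(f.dataType).as(f.name))
    val newDF = df.select(castColumns: _*)
    newDF.printSchema()

    // A non-null value whose cast is null failed to parse, so keep only rows
    // where every validated column is either null or casts successfully
    val allValid: Column = columnsToValidate
      .map(f => col(f.name).isNull || col(f.name).cast(f.dataType).isNotNull)
      .reduce(_ && _)
    df.filter(allValid).show()

    spark.stop()
  }
}
```

Note that a cast-based check like this only catches values that cannot be parsed into the target type. It would not flag a numeric-looking value in a string column, such as the 23.45 in the company column of the example.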

For example, given this input DataFrame:

+----+----+---------+---------+
|name|  id|    email|  company|
+----+----+---------+---------+
|  n1|   1|n1@c1.com|      xyz|
|  n2|   2|n2@c1.com|    23.45|
|  n3| mnq|n3@c1.com|      abc|
+----+----+---------+---------+

The output should be:

+----+----+---------+---------+
|name|  id|    email|  company|
+----+----+---------+---------+
|  n1|   1|n1@c1.com|      xyz|
+----+----+---------+---------+

Thanks in advance.
