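I'm new to Spark and have a question about filtering a DataFrame based on data type validation. I want to filter out and drop the rows whose content does not match the expected data type.
I tried this filter: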
df.filter(row => row.schema("id").dataType == "IntegerType")
but it did not work.
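A possible reason the filter has no effect: `row.schema("id").dataType` is the declared type of the column, which is the same for every row, and comparing a `DataType` object with the string "IntegerType" using `==` is never true. Below is a minimal sketch of checking the values themselves by casting and keeping the rows where the cast succeeds. It assumes `id` is read as a string column and that ANSI mode is off, so a failed cast yields null. The object name `IdCastCheck`, the helper `keepValidIds`, the sample rows, and the session settings are illustrative, not from the original code.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

object IdCastCheck {
  // Keep rows whose id value parses as an integer.
  // Assumption: non-ANSI casts, where an unparseable string becomes null.
  def keepValidIds(df: DataFrame): DataFrame =
    df.filter(col("id").cast(IntegerType).isNotNull)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")                         // local session, for illustration only
      .appName("id-cast-check")
      .config("spark.sql.ansi.enabled", "false")  // make a failed cast yield null
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample: id arrives as a string, e.g. from a CSV read without a schema.
    val df = Seq(("n1", "1"), ("n2", "2"), ("n3", "mnq")).toDF("name", "id")

    // The declared type belongs to the column, not to individual rows.
    println(df.schema("id").dataType)

    keepValidIds(df).show()
    spark.stop()
  }
}
```

With this sample, rows n1 and n2 are kept and n3 is dropped. With ANSI mode enabled, an invalid cast raises an error instead of returning null, so the check would need something like SQL `try_cast` where it is available.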
I also tried the following code:
import java.util.ArrayList
import org.apache.spark.sql.types._

val fileSchema = StructType(Array(
  StructField("english", IntegerType, true),
  StructField("maths", IntegerType, true)))
var columnsToValidate1: Set[StructField] = Set[StructField]()
fileSchema.iterator.foreach(structField => {
  // skip columns with string data type, since they cannot cause parsing issues
  if (structField.dataType != StringType)
    columnsToValidate1 = columnsToValidate1.+(structField)
})
val varList = new ArrayList[String]
columnsToValidate1.foreach(x => {
  val columnName = x.name
  val columnDataType = x.dataType.toString()
  // builds text such as df("english").cast(IntegerType), as a plain String
  val stringToBeAdd = "df(\"".concat(columnName).concat("\").cast(").concat(columnDataType).concat(")")
  varList.add(stringToBeAdd)
})
val arrayVal = varList.toString()
// strip the surrounding [ and ] from the list's string form
val query = arrayVal.substring(1, arrayVal.length() - 1)
dataframe.select(query)
However, dataframe.select does not work with the generated query, although this works fine:
val newDF = df.select(df("maths").cast(IntegerType), df("english").cast(IntegerType))
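A likely explanation for the difference: `select(String)` treats its argument as a column name, so generated text such as `df("english").cast(IntegerType), df("maths").cast(IntegerType)` is looked up as one column rather than evaluated as Scala code. A minimal sketch of building the same casts as `Column` objects from the schema and passing them with `: _*` follows. The object name `SchemaCasts`, the helper `castColumns`, and the sample rows are hypothetical, and the session is configured with ANSI mode off so that an invalid value becomes null.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object SchemaCasts {
  // One cast Column per non-string field of the schema, keeping the field's name.
  def castColumns(schema: StructType): Seq[Column] =
    schema.fields.toSeq
      .filter(_.dataType != StringType)
      .map(f => col(f.name).cast(f.dataType).as(f.name))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")                         // local session, for illustration only
      .appName("schema-casts")
      .config("spark.sql.ansi.enabled", "false")  // assumption: failed casts yield null
      .getOrCreate()
    import spark.implicits._

    val fileSchema = StructType(Array(
      StructField("english", IntegerType, true),
      StructField("maths", IntegerType, true)))

    // Hypothetical input where both columns arrive as strings.
    val df = Seq(("90", "85"), ("abc", "70")).toDF("english", "maths")

    // Expand the Seq[Column] into select's varargs parameter.
    val newDF = df.select(castColumns(fileSchema): _*)
    newDF.printSchema()
    newDF.show()
    spark.stop()
  }
}
```

Building `Column` values keeps everything inside the DataFrame API, so no source text has to be generated or parsed.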
For example, given this input DataFrame:
+----+----+---------+---------+
|name|  id|    email|  company|
+----+----+---------+---------+
|  n1|   1|n1@c1.com|      xyz|
|  n2|   2|n2@c1.com|    23.45|
|  n3| mnq|n3@c1.com|      abc|
+----+----+---------+---------+
the output should be:
+----+----+---------+---------+
|name|  id|    email|  company|
+----+----+---------+---------+
|  n1|   1|n1@c1.com|      xyz|
+----+----+---------+---------+
Thanks in advance.
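To make the expected result concrete, here is a minimal sketch of a filter that reproduces it. The validation rule is an assumption inferred from the example: `id` must parse as an integer, and `company` must not look like a number (which is why the 23.45 row is dropped). The object name `ExpectedTypeFilter`, the helper `keepValidRows`, and the session settings are hypothetical, and ANSI mode is assumed to be off so that a failed cast yields null.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, IntegerType}

object ExpectedTypeFilter {
  // Assumed rule, inferred from the example tables:
  //   id      must parse as an integer
  //   company must NOT parse as a number (a null company also passes this check)
  def keepValidRows(df: DataFrame): DataFrame =
    df.filter(
      col("id").cast(IntegerType).isNotNull &&
      col("company").cast(DoubleType).isNull)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")                         // local session, for illustration only
      .appName("expected-type-filter")
      .config("spark.sql.ansi.enabled", "false")  // assumption: failed casts yield null
      .getOrCreate()
    import spark.implicits._

    // The rows from the example, with every column held as a string.
    val df = Seq(
      ("n1", "1", "n1@c1.com", "xyz"),
      ("n2", "2", "n2@c1.com", "23.45"),
      ("n3", "mnq", "n3@c1.com", "abc")
    ).toDF("name", "id", "email", "company")

    keepValidRows(df).show()
    spark.stop()
  }
}
```

Under these assumptions only the n1 row remains: n2 is dropped because its company value parses as a number, and n3 because its id does not parse as an integer.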