The Spark Thrift Server seems to load the complete dataset into memory before transferring it over JDBC, and on the JDBC client I get this error:
SQL Error: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 48 tasks (XX GB) is bigger than spark.driver.maxResultSize (XX GB)
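As I understand it, the failure happens because the Thrift Server's driver collects the whole result set before serving it over JDBC, and that collect is capped by spark.driver.maxResultSize. A minimal sketch of the pattern as I understand it (the table name and the size value are placeholders, not my real config; in the Thrift Server itself this property would come from spark-defaults.conf or --conf at startup, not from code):

```java
import org.apache.spark.sql.SparkSession;

public class MaxResultSizeDemo {
    public static void main(String[] args) {
        // Illustrative only: this is the same property the error message
        // refers to. Raising it merely postpones the failure for a
        // full-table scan, because the entire serialized result still
        // has to fit in driver memory.
        SparkSession spark = SparkSession.builder()
                .appName("max-result-size-demo")
                .config("spark.driver.maxResultSize", "4g") // placeholder value
                .getOrCreate();

        // collect() pulls every row to the driver; this is the pattern
        // that triggers the "Total size of serialized results" error.
        spark.sql("select * from table").collect(); // "table" is a placeholder

        spark.close();
    }
}
```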
The query is a plain `select * from table`. Is it possible to enable something like a streaming mode for the Thrift Server? The main goal is to give Pentaho ETL access to the Hadoop cluster over a SparkSQL JDBC connection, roughly as in the client sketch below. But if the Thrift Server has to load the complete dataset into memory before transferring it, this approach won't work.
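For reference, the client-side access I'm trying to set up looks roughly like this (a minimal sketch: the host, port, database, and credentials are placeholders for my cluster, and I'm assuming the standard Hive JDBC driver, since the Spark Thrift Server speaks the HiveServer2 protocol):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ThriftServerClient {
    public static void main(String[] args) throws Exception {
        // The Spark Thrift Server is HiveServer2-compatible, so the
        // standard Hive JDBC driver is used here.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Host, port, and database are placeholders for my cluster.
        String url = "jdbc:hive2://thrift-host:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "user", "");
             Statement stmt = conn.createStatement()) {
            // Even though the client iterates row by row, the job fails
            // on the server side before the first row arrives, because
            // the driver tries to collect all partitions into memory.
            try (ResultSet rs = stmt.executeQuery("select * from table")) {
                while (rs.next()) {
                    // Pentaho would consume rows here; printing is a stand-in.
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```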