
The Spark Thrift Server tries to load the full dataset into memory before transferring it over JDBC, and on the JDBC client I get this error:

SQL Error: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 48 tasks (XX GB) is bigger than spark.driver.maxResultSize (XX GB)

The query is a plain SELECT * FROM the table. Is it possible to enable something like a streaming mode for the Thrift Server? The main goal is to give Pentaho ETL access to the Hadoop cluster over a JDBC connection using SparkSQL, but that approach cannot work if the Thrift Server has to load the full dataset into memory before transferring it.
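For context, the client side is a plain HiveServer2-style JDBC connection. A minimal Java sketch of the intended access (host, user and table name are placeholders, and the hive-jdbc driver is assumed to be on the classpath):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ThriftJdbcSketch {
        public static void main(String[] args) throws Exception {
            // The Spark Thrift Server speaks the HiveServer2 protocol; 10000 is its default port.
            // Host, user and table name below are placeholders.
            String url = "jdbc:hive2://thrift-server-host:10000/default";
            try (Connection conn = DriverManager.getConnection(url, "etl_user", "");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT * FROM my_table")) {
                while (rs.next()) {
                    // Rows are consumed here; the maxResultSize failure happens earlier,
                    // on the server side, while results are collected to the driver.
                    System.out.println(rs.getString(1));
                }
            }
        }
    }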


2 Answers


Solution: spark.sql.thriftServer.incrementalCollect=true
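This is a server-side setting: with incremental collect enabled, the Thrift Server pulls results back partition by partition instead of collecting the whole result set to the driver at once, so the driver no longer has to hold the full dataset. A sketch of passing it when starting the server (run from the Spark installation directory; other options omitted):

    sbin/start-thriftserver.sh \
      --conf spark.sql.thriftServer.incrementalCollect=true

The same property can also be put in conf/spark-defaults.conf so it applies on every start.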

answered 2018-11-03T07:59:55.537

In my case, what helped was increasing the Spark driver memory and the maximum result size: spark.driver.memory=xG and spark.driver.maxResultSize=xG. See https://spark.apache.org/docs/latest/configuration.html
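A sketch of where these settings go, assuming the Thrift Server is started from the Spark installation directory (the 8g values are only example sizes; they need to be chosen for the expected result set):

    # in conf/spark-defaults.conf
    spark.driver.memory        8g
    spark.driver.maxResultSize 8g

    # or on the start command
    sbin/start-thriftserver.sh \
      --conf spark.driver.memory=8g \
      --conf spark.driver.maxResultSize=8g

Note that spark.driver.maxResultSize=0 removes the limit entirely, but then a very large SELECT * can run the driver out of memory instead of failing with this error.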

answered 2018-11-03T12:12:20.957