python - How do I specify the maximum number of vcores to allocate to a query in Hive?

Question

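I am running multiple queries on Hive. I have a Hadoop cluster with 6 nodes, and the cluster has 21 vcores in total.

I need to allocate only 2 cores to one Python process, so that the remaining available cores can be used by another, main process.

Code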

from pyhive import hive

# Connection details (placeholder values)
hive_host_name = "subdomain.domain.com"
hive_port = 20000
hive_user = "user"
hive_password = "password"  # defined here but not passed to the connection below
hive_database = "database"

# Open a session with an empty configuration map and run the query
conn = hive.Connection(host=hive_host_name, port=hive_port, username=hive_user, database=hive_database, configuration={})
cursor = conn.cursor()
cursor.execute('select count(distinct field) from somedata')

score 3 · Accepted Answer

Try passing the following setting in the configuration map:

yarn.nodemanager.resource.cpu-vcores=2

The default value of this setting is 8.

Description: Number of CPU cores that can be allocated for containers.

Your updated code would look like this:

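from pyhive import hive

# Connection details (placeholder values)
hive_host_name = "subdomain.domain.com"
hive_port = 20000
hive_user = "user"
hive_password = "password"  # defined here but not passed to the connection below
hive_database = "database"

# Settings sent to Hive when the session is opened.
# The value is written as a string because session settings are passed as text.
configuration = {
    "yarn.nodemanager.resource.cpu-vcores": "2"
}

conn = hive.Connection(
    host=hive_host_name,
    port=hive_port,
    username=hive_user,
    database=hive_database,
    configuration=configuration
)
cursor = conn.cursor()
cursor.execute('select count(distinct field) from somedata')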
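Hive session settings are exchanged as text, so one way to avoid type surprises when building the configuration map is to render every key and value as a string before handing it to hive.Connection. The following is a minimal sketch of that idea, not part of PyHive: the helper name to_hive_configuration is hypothetical, and it assumes the map is forwarded to the server as string pairs. It does not check whether the cluster actually honors the setting for a given query.

```python
def to_hive_configuration(settings):
    """Return a copy of `settings` with every key and value rendered as a string.

    Hypothetical helper, not part of PyHive. Booleans are written in
    lowercase ("true"/"false"), the spelling Hive settings normally use.
    """
    rendered = {}
    for key, value in settings.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        rendered[str(key)] = str(value)
    return rendered


# Build the map from the setting suggested in the answer.
configuration = to_hive_configuration({"yarn.nodemanager.resource.cpu-vcores": 2})
print(configuration)  # {'yarn.nodemanager.resource.cpu-vcores': '2'}
```

The resulting dictionary can be passed as the configuration argument of hive.Connection in the same way as in the code above.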
