Suppose I have several machines, each running both a Spark worker and a Cassandra node. Is it possible to require each Spark worker to query only its local Cassandra node (on the same machine), so that no network operation is involved when I call joinWithCassandraTable after repartitionByCassandraReplica with spark-cassandra-connector, and each Spark worker fetches data from its local storage?
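To make the question concrete, here is roughly the flow I mean (a minimal sketch; the keyspace, table, column, and class names are placeholders, not my real schema):

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical table: my_keyspace.events(user_id text PRIMARY KEY, payload text)
case class UserKey(user_id: String)

object LocalJoinSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("local-join-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder contact point
    val sc = new SparkContext(conf)

    val keys = sc.parallelize(Seq(UserKey("a"), UserKey("b")))

    // Group keys so each partition ends up on a Spark worker that is a replica
    // for those keys, then join against the Cassandra table.
    val joined = keys
      .repartitionByCassandraReplica("my_keyspace", "events")
      .joinWithCassandraTable("my_keyspace", "events")

    joined.collect().foreach(println)
    sc.stop()
  }
}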
Inside the Spark-Cassandra connector, LocalNodeFirstLoadBalancingPolicy handles this. It prefers the local node first, and then nodes in the same DC. Specifically, the local node is determined via java.net.NetworkInterface, by finding a host whose address matches one of the machine's own local addresses, like this:
// Excerpt from LocalNodeFirstLoadBalancingPolicy:
// the set of addresses bound to this machine's network interfaces
private val localAddresses =
  NetworkInterface.getNetworkInterfaces.flatMap(_.getInetAddresses).toSet

/** Returns true if given host is local host */
def isLocalHost(host: Host): Boolean = {
  val hostAddress = host.getAddress
  hostAddress.isLoopbackAddress || localAddresses.contains(hostAddress)
}
This logic is used when building the query plan, which returns the list of candidate hosts for a query. Regardless of the plan type (token-aware or not), the first host in the list is always the local host, if one exists. In practice, this means that after repartitionByCassandraReplica, each Spark worker's queries are directed to its co-located Cassandra node whenever that node is a replica for the requested data.
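To illustrate the ordering idea, here is a simplified sketch (not the connector's actual query-plan code; the Host case class and function name are stand-ins):

import java.net.InetAddress

// Simplified stand-in for a Cassandra host known to the driver.
case class Host(address: InetAddress, dc: String)

// Conceptually what the policy does: local host first,
// then the remaining hosts in the same data center.
def orderHosts(localDc: String,
               isLocal: Host => Boolean,
               hosts: Seq[Host]): Seq[Host] = {
  val (local, remote) = hosts.partition(isLocal)
  local ++ remote.filter(_.dc == localDc)
}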
Answered on 2015-11-03.