“apache-spark-standalone”的相关标签问题

0 投票

0 回答

182 浏览

scala - Spark 2.0 独立模式 collect() 错误

我正在研究 Spark2.0(scala) 和 Play 框架。我正在使用 Intellij IDEA 运行这个独立模式的应用程序。我的应用程序与本地大师一起工作得很好

但是当我切换到独立集群模式时它不起作用

我在集群模式 spark-shell 上尝试了所有代码，一切都很好。我试图找出我的代码有什么问题，发现每个 rdd.map.collect 操作都失败了

这是我的申请失败的地方

简单计算.scala

回归.scala

我收到此错误消息

和这个

请帮我解决这个问题。

火花：2.0.0

标量：2.11.8

播放：2.5.X

2016-10-15T06:18:53.177

0 投票

0 回答

442 浏览

apache-spark - 具有多个主控的独立模式下的火花应用程序

我在 spark 上创建了 1 个应用程序。此应用程序用作 http 服务器，并与 cassandra db 结合使用。在我的情况下，应用程序通过 spark-submit 命令一直在一个主服务器上运行，3 个从属服务器在 3 个不同的集群上。

我想使用 zookeeper 来获得高可用性。现在我可以在每个集群中启动一个主服务器，如果服务器死机，另一个会牵手成为主服务器。这是完美的。但是，我的问题是，我的应用程序会发生什么。当服务器死机时，它会停止我的应用程序。服务器死机时是否可以使我的应用程序保持活动状态？我错过了配置中的某些内容吗？

你可以帮帮我吗？

非常感谢您未来的回答:)

apache-spark high-availability cassandra-2.1 apache-zookeeper apache-spark-standalone

2016-10-24T14:17:22.350

0 投票

0 回答

895 浏览

apache-spark - spark 独立的动态资源分配

我对动态资源分配有疑问/问题。我正在使用带有独立集群管理器的 spark 1.6.2。

我有一个有 2 个核心的工人。我在所有节点上的 spark-defaults.conf 文件中设置了以下参数：

我运行一个包含许多任务的示例应用程序。我在驱动程序上打开端口 4040，我可以验证上述配置是否存在。

我的问题是，无论我做什么，我的应用程序都只能获得 1 个核心，即使另一个核心可用。

这是正常的还是我的配置有问题？

我想要得到的行为是这样的：我有很多用户使用同一个 spark 集群。我希望每个应用程序都将获得固定数量的核心，除非集群的其余部分处于待处理状态。在这种情况下，我希望正在运行的应用程序将获得内核总数，直到新应用程序到达......

我必须为此去mesos吗？

apache-spark apache-spark-standalone

2016-10-27T08:11:19.017

0 投票

1 回答

822 浏览

scala - FAIR 是否可用于 Spark Standalone 集群模式？

我有 2 个节点集群和spark 独立集群管理器。我使用 Scala 多线程触发了多个作业sc。我发现由于FIFO的性质，我的作业被一个接一个地安排，所以我尝试使用FAIR调度

Job1 和 Job2 将从启动器类触发。即使在设置了这些属性之后，我的工作也是在FIFO中处理的。FAIR是否可用于 Spark Standalone 集群模式？是否有更详细描述它的页面？我似乎在 Job Scheduling 中找不到太多关于 FAIR 和 Standalone 的信息。我正在关注这个SOF 问题。我在这里遗漏了什么吗？

scala apache-spark hadoop-yarn apache-spark-standalone

2016-10-27T15:14:37.363

0 投票

2 回答

1180 浏览

logging - Spark 日志记录未返回发送给驱动程序，消息仅存在于工作人员中

我刚刚开始使用的 Spark 集群出现了非常奇怪的行为。

日志记录的正常行为是当一个人运行时spark-submit会看到如下日志消息：

INFO 2016-11-04 13:14:10,671 org.apache.spark.executor.Executor: Finished task 227.0 in stage 4.0 (TID 3168). 1992 bytes result sent to driver

这些通常会很快填满控制台，尤其是当应用程序使用大量分区时。

但是在运行 spark-submit 后，我没有看到任何来自 Spark 的常见日志消息。也许大约 5 行。相反，所有正常的日志消息都在 Spark UI 的驱动程序标准输出中。

所以问题是什么设置和哪里可能告诉 Spark 不要将这些日志条目返回给驱动程序？

这是相当令人沮丧的，因为当日志消息被拆分到多个位置时，很难调试应用程序。通常我只是在运行 spark-submit 后观看日志涌入我的屏幕，然后我就可以感觉到它在做什么。现在我无法获得那种感觉，因为我必须在事件发生后查看日志。

logging apache-spark apache-spark-standalone

2016-11-04T13:50:47.413

0 投票

1 回答

832 浏览

apache-spark - spark rest api /api/v1 给出了不允许的方法

我已经部署了一个 spark 独立集群，但是当我尝试访问其余 api 以获取一些应用程序信息时。我尝试访问的网址是http://ip:4040/api/v1。

其余 api 文档的链接 - > http://spark.apache.org/docs/latest/monitoring.html#rest-api

它说

不允许的方法

我认为这是一些配置问题或其他东西。

提前感谢您的帮助。

apache-spark apache-spark-standalone

2016-11-09T21:25:42.813

0 投票

1 回答

1316 浏览

apache-spark - Forcing driver to run on specific slave in spark standalone cluster running with "--deploy-mode cluster"

I am running a small spark cluster, with two EC2 instances (m4.xlarge).

So far I have been running the spark master on one node, and a single spark slave (4 cores, 16g memory) on the other, then deploying my spark (streaming) app in client deploy-mode on the master. Summary of settings is:

--executor-memory 16g

--executor-cores 4

--driver-memory 8g

--driver-cores 2

--deploy-mode client

This results in a single executor on my single slave running with 4 cores and 16Gb memory. The driver runs "outside" of the cluster on the master-node (i.e. it is not allocated its resources by the master).

Ideally I'd like to use cluster deploy-mode so that I can take advantage of the supervise option. I have started a second slave on the master node giving it 2 cores and 8g memory (smaller allocated resources so as to leave space for the master daemon).

When I run my spark job in cluster deploy-mode (using the same settings as above but with --deploy-mode cluster). Around 50% of the time I get the desired deployment which is that the driver runs through the slave running on the master node (which has the right resources of 2 cores & 8Gb) which leaves the original slave node free to allocate an executor of 4 cores & 16Gb. However the other 50% of the time the master runs the driver on the non-master slave node, which means I get an driver on that node with 2 cores & 8Gb memory, which then leaves no node with sufficient resources to start an executor (which requires 4 cores & 16Gb).

Is there any way to force the spark master to use a specific worker / slave for my driver? Given spark knows that there are two slave nodes, one with 2 cores and the other with 4 cores, and that my driver needs 2 cores, and my executor needs 4 cores it would ideally work out the right optimal placement, but this doesn't seem to be the case.

Any ideas / suggestions gratefully received!

Thanks!

apache-spark apache-spark-standalone

2016-11-10T11:39:03.787

0 投票

0 回答

621 浏览

azure - 远程 Spark 独立执行程序错误

我在远程服务器（Microsoft azure）上运行一个 spark（2.0.1）独立集群。我能够将我的 spark 应用程序连接到这个集群，但是任务在没有任何执行的情况下被卡住（带有以下警告 WARN org.apache.spark.scheduler.TaskSchedulerImpl - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources：）

我试过的：

我已确保我的应用程序的内存、cpu 要求不超过服务器配置。
已将这些变量提供给我的spark-env.sh：SPARK_PUBLIC_DNS ,SPARK_DRIVER_HOST, SPARK_LOCAL_IP, SPARK_MASTER_HOST
可以在浏览器上看到master/worker/application webui。
在远程服务器上打开所有端口（对于我的 IP 和 vpn）。
已禁用ufw。

据我所知，我的工人无法向主人转达。执行程序在 120 秒后超时，使用以下标准错误：

我正在使用我的 vm 的私有 IPSPARK_DRIVER_HOST, SPARK_LOCAL_IP, SPARK_MASTER_HOST和公共 IP 作为SPARK_PUBLIC_DNS并连接到主服务器。master 和 worker 在同一个虚拟机上运行。这个确切的设置正在一个 ec2 实例上运行。任何帮助，将不胜感激。

更新：我可以在机器内正常运行 spark-shell。问题似乎与此类似执行程序无法与驱动程序交互，尽管我在 vm 上打开了端口。有没有办法将驱动程序绑定到我的实例/笔记本电脑的公共 IP？

azure apache-spark amazon-ec2 apache-spark-standalone

2016-11-19T18:41:42.230

0 投票

0 回答

998 浏览

apache-spark - localhost: 8080 for master- 站点不可访问

这些是我在 spark-env.sh 中完成的以下配置：

slaves.sh：

之后，运行命令：./sbin/start-master.sh

输出：

启动 org.apache.spark.deploy.master.Master，登录到 /opt/Spark/spark-2.0.2-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master- 1-s.out。

返回命令提示符。输出没有 spark-standalone 文档中提到的 url，并且输入localhost:8080显示站点不可访问。当我再次尝试运行master时，它表明它已经是一个正在运行的进程。我已经为 java、python 和 spark 设置了所有路径。我做错了什么或者还有什么要做的，以便火花主 UI 启动？或者即使所有其他网页正常加载，我的代理也会有问题吗？

注意：我使用的是 Linux 操作系统和 spark-2.0.2-bin-hadoop2.7

apache-spark apache-spark-standalone

2016-11-28T08:02:03.090

0 投票

4 回答

7345 浏览

apache-spark - 当我对 python 脚本使用 spark-submit 时，Spark master 不会在 UI 中显示正在运行的应用程序

The image shows 8081 UI.当我启动 scala shell 或 pyspark shell 时，master 会显示正在运行的应用程序。但是当我spark-submit用来运行 python 脚本时，master 没有显示任何正在运行的应用程序。这是我使用的命令：spark-submit --master spark://localhost:7077 sample_map.py. 网络用户界面位于：4040。我想知道我是否以正确的方式提交脚本，或者 spark-submit 是否从未真正显示正在运行的应用程序。

localhost:8080或<master_ip>:8080不为我打开但<master_ip>:8081打开。它显示了执行者信息。

这些是我在 spark-env.sh 中的配置：

我正在使用CentOS，python 2.7并且spark-2.0.2-bin-hadoop2.7.

apache-spark apache-spark-standalone

2016-12-01T09:16:12.990

问题标签 [apache-spark-standalone]

Reference