Questions tagged [sparkr]

762 questions

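0 votes

3 answers

1916 views

apache-spark - Streaming in SparkR?

I have been using Spark with Scala for a while. I am now looking into PySpark and SparkR. I don't see streaming mentioned for PySpark or SparkR. Does anyone know whether Spark Streaming is possible when using Python and R?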

2014-09-29T16:43:32.850

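0 votes

9 answers

42178 views

r - How do I read Parquet in R and convert it to an R DataFrame?

I'd like to process Apache Parquet files (in my case, generated in Spark) in the R programming language.

Is an R reader available? Or is work being done on one?

If not, what would be the most convenient way to get there? Note: there are Java and C++ bindings: https://github.com/apache/parquet-mr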

r apache-spark parquet sparkr

2015-05-22T17:05:23.813

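0 votes

2 answers

655 views

r - Error installing the SparkR package using install_github

I am trying to use the SparkR package in R. I have all the dependent packages, such as devtools, Rtools.exe, etc.

When I try the following command:

I get the following error:

To fix this, I set a valid http_proxy and https_proxy, but it does not work and throws the error above. I am new to R/RStudio.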

r apache-spark sparkr

2015-06-02T15:16:12.170

0 votes

1 answer

1933 views

r - SparkR collect method crashes with OutOfMemory on Java heap space

With SparkR, I'm trying, as a PoC, to collect an RDD that I created from text files containing around 4M lines.

My Spark cluster runs in Google Cloud, was deployed with bdutil, and consists of 1 master and 2 workers with 15 GB of RAM and 4 cores each. My HDFS repository is based on Google Storage with gcs-connector 1.4.0. SparkR is installed on each machine, and basic tests work on small files.

Here is the script I use:

The first time I run this, it seems to work fine: all the tasks run successfully and Spark's UI says the job completed, but I never get the R prompt back:

Then, after a CTRL-C to get the R prompt back, I try to run the collect method again; here is the result:

I understand the exception message, but I don't understand why I am getting it the second time. Also, why does collect never return after the job completes in Spark?

I Googled every piece of information I have, but I had no luck finding a solution. Any help or hint would be greatly appreciated!

Thanks

r apache-spark google-hadoop sparkr

2015-06-04T13:45:42.250

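0 votes

1 answer

554 views

sparkr - Unable to start the sparkR shell in spark-1.4.0

I downloaded Spark-1.4.0 today and tried to start the sparkR shell in both Linux and Windows environments; the sparkR command in the bin directory does not work. If anyone has successfully started the sparkR shell, please let me know.

Thanks, Sanjay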

sparkr

2015-06-12T13:28:05.170

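0 votes

3 answers

1067 views

r - Installing the SparkR that ships with Spark 1.4

The latest version of Spark (1.4) now ships with SparkR. Does anyone know how to install the SparkR implementation on Windows? The sparkR.R script is currently located in C:/spark-1.4.0/R/pkgs/R/

This seems to be a step in the right direction, but the instructions do not work on Windows, since there is no relevant sparkR directory.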

r apache-spark sparkr

2015-06-16T13:08:53.903

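0 votes

4 answers

3434 views

rstudio - Loading com.databricks.spark.csv through RStudio

I have installed Spark-1.4.0. I have also installed its R package, SparkR, and I can use it both from the Spark shell and from RStudio; however, there is one difference I cannot resolve.

When launching the SparkR shell

I can read a .csv file as follows

Unfortunately, when I start SparkR through RStudio (with my SPARK_HOME set correctly), I get the following error message:

I know I should somehow load com.databricks:spark-csv_2.10:1.0.3, but I have no idea how to do this. Could someone help me?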

rstudio sparkr

2015-06-16T14:21:06.790

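0 votes

1 answer

1367 views

r - SparkR and packages

How do I call a Spark package for use in data operations with R?

For example, I am trying to access my test.csv in HDFS as shown below

but I get the following error:

I tried loading the csv package with the option below

but I get the following error while loading sqlContext

Any help would be highly appreciated.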

r apache-spark sparkr

2015-06-20T08:40:23.717

0 votes

1 answer

1833 views

apache-spark - How to do map and reduce in SparkR

How do I do map and reduce operations using SparkR? All I can find is stuff about SQL queries. Is there a way to do map and reduce using SQL?

apache-spark sparkr

2015-06-23T20:22:50.997

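0 votes

1 answer

1386 views

sparkr - Unable to call the sparkRSQL.init function

I am new to Spark and was trying to run the examples mentioned on the SparkR page. After much effort, I was able to install sparkR on my machine and run the basic wordcount example. However, when I try to run:

library(SparkR) # works fine - loads the package
sc <- sparkR.init() # works fine
sqlContext <- sparkRSQL.init(sc) # fails

It says there is no package called 'sparkRSQL'. According to the documentation, sparkRSQL.init is a function in the sparkR package. Please let me know if I am missing anything here.

Thanks in advance.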

sparkr

2015-06-25T11:11:08.063
