Questions tagged [azure-hdinsight]


932 questions

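0 votes

2 answers

151 views

hadoop - Getting error: JAVA_HOME is not set

I have installed the HDInsight Emulator and then tried to run a hadoop command to create a directory.
Command: hadoop fs -mkdir input/files
I get the error "JAVA_HOME is not set".
I have already tried the solutions from the following questions:
Hadoop: «ERROR : JAVA_HOME is not set»
Working with Hadoop: localhost: Error: JAVA_HOME is not set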
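
A quick way to rule out the obvious cause is to check what the process that launches hadoop actually sees. The following is a minimal sketch, not taken from the question: the helper name check_java_home is hypothetical, and it only verifies that the variable is set and that a java launcher exists under its bin folder.

```python
import os
from pathlib import Path


def check_java_home(env=None):
    """Return (ok, message) describing whether JAVA_HOME looks usable."""
    env = os.environ if env is None else env
    # Strip stray whitespace and surrounding quotes, a common copy/paste slip.
    java_home = env.get("JAVA_HOME", "").strip().strip('"')
    if not java_home:
        return False, "JAVA_HOME is not set"
    bin_dir = Path(java_home) / "bin"
    # The launcher is java.exe on Windows and java elsewhere.
    if not any((bin_dir / name).is_file() for name in ("java", "java.exe")):
        return False, f"no java launcher under {bin_dir}"
    return True, f"JAVA_HOME points to {java_home}"


if __name__ == "__main__":
    ok, message = check_java_home()
    print(message)
```

Running it from the same command prompt that runs the hadoop command shows whether the variable reaches that shell at all; a value set in one console window is not visible in windows that were already open.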

2013-12-24T13:09:22.453

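0 votes

1 answer

765 views

c# - Connecting to the HDInsight Emulator

I am trying to connect using C#.

This is the class that successfully submits Hive queries to my remote HDInsight cluster. What do I need to change here to connect to the local emulator?

c# .net azure azure-hdinsight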

2014-01-02T12:49:39.013

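0 votes

1 answer

5651 views

powershell - Double quotes in Hadoop Hive query

I am able to use double quotes in the following query ->

But I am not able to use quotes in the following PowerShell commands -

It gives me the following error -

Please give me some insight into why the behavior is inconsistent. Both implementations create a job, so why does one accept double quotes while the other does not?

powershell azure hive azure-hdinsight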

2014-01-03T06:05:27.733

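0 votes

1 answer

5876 views

hadoop - Insert where not exists in Hive

I need the Hive syntax for the equivalent of this ANSI SQL:

The goal is that tablea contains no duplicates: only new ids from tableb are inserted.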
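
The SQL from the question is not preserved here, so the following is only a sketch of one common pattern: a left outer join filtered on NULL (an anti-join), which is often used where NOT EXISTS subqueries are unavailable. The table names tablea and tableb come from the question; the single id column is an assumption. SQLite stands in for Hive purely so the example can run locally; in Hive the statement would typically start with INSERT INTO TABLE tablea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tablea (id INTEGER);
    CREATE TABLE tableb (id INTEGER);
    INSERT INTO tablea VALUES (1), (2);
    INSERT INTO tableb VALUES (2), (3), (4);
    """
)

# Anti-join: keep only tableb rows that have no matching id in tablea.
conn.execute(
    """
    INSERT INTO tablea
    SELECT b.id
    FROM tableb b
    LEFT OUTER JOIN tablea a ON a.id = b.id
    WHERE a.id IS NULL
    """
)

ids = [row[0] for row in conn.execute("SELECT id FROM tablea ORDER BY id")]
print(ids)  # [1, 2, 3, 4]
```

The id 2 exists in both tables and is not inserted a second time, while 3 and 4 are added.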

hadoop hive azure-hdinsight

2014-01-06T14:07:21.877

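0 votes

1 answer

309 views

azure - Azure HDInsight argument is quoted incorrectly

I am trying to use data from an Azure SQL Database with Hadoop (HDInsight).

To fetch the data and run the job, I run the following code in a C# console program:

Error message:

Some notes:

It works without the --query parameter, i.e. if I just select the whole table
The command works if it is executed in PowerShell
With no spaces in the query (i.e. --query \"SELECT\" ) there is no error, but obviously that is not very useful
Single quotes (--query 'SELECT ... $Conditions' ) work, but the job produces no output
Using @ together with double quotes does not work
The problem seems similar to "Double quotes in Hadoop Hive query", but the answer there (specifying a job name) did not help

So the question is: why do spaces in the query cause this error?

Thanks in advance for your help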

azure hadoop azure-sql-database azure-hdinsight

2014-01-09T09:29:17.963

0 votes

1 answer

153 views

hive - HDInsight scalability when using Azure Storage

Hi, I'm playing around with HDInsight. I'm putting log files into Azure storage and then using Hive external tables to map onto them. I believe Microsoft recommends Azure storage over HDFS so that you can delete and recreate clusters without losing data. How does its scalability compare with HDFS? My understanding is that HDFS spreads data over multiple nodes to allow parallel processing; how does Azure storage compare?

hive azure-storage azure-hdinsight

2014-01-13T22:17:49.470

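0 votes

1 answer

614 views

java - HDInsight Hadoop Java program fails to run - libraries not found

After installing Hadoop HDInsight on Windows 7, I cannot compile Java programs that use Hadoop-specific classes. Compilation fails with:

Initially I tried

javac c:/z/WordCount.java

Then I also tried

i.e., supplying a classpath.

Well, I am not sure where exactly javac is pointing

Here is what is in my hadoop folder:

Please advise.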
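
The classpath the asker tried is not preserved, so the following is only a sketch of how such a command could be assembled: collect every .jar in one or more folders and hand them to javac through -cp. The helper names and any folder passed in are hypothetical; which folders actually hold the Hadoop jars depends on the installation.

```python
import os
from pathlib import Path


def build_classpath(jar_dirs, sep=os.pathsep):
    """Join every .jar found directly inside the given directories."""
    jars = []
    for directory in jar_dirs:
        jars.extend(sorted(str(p) for p in Path(directory).glob("*.jar")))
    return sep.join(jars)


def javac_command(source_file, jar_dirs):
    """Build the argument list for compiling one source file."""
    return ["javac", "-cp", build_classpath(jar_dirs), source_file]
```

For example, javac_command("c:/z/WordCount.java", [hadoop_dir]) returns a list that can be passed to subprocess.run, where hadoop_dir is whichever folder contains the Hadoop core jar. os.pathsep resolves to ";" on Windows, which is the separator javac expects there.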

java hadoop azure-hdinsight hortonworks-data-platform

2014-01-15T22:59:35.047

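0 votes

1 answer

881 views

.net - Running HDInsight jobs programmatically - .jar file on the cluster node rather than in Blob storage

I followed this tutorial to submit a MapReduce job to HDInsight from a .NET console application.

It works fine, but I am wondering about this line:

"wasb:///example/jars/hadoop-examples.jar" refers to a jar in my Azure storage account, which was placed there automatically when I attached the account to a new HDInsight cluster.

Going beyond the examples (I want to use Mahout)... can I reference a jar that I have added to the cluster nodes? I installed Mahout into the apps/dist directory over RDP. I can run Mahout jobs from there just fine, but I cannot put the two steps together.

It feels like I should not have to add jar files to blob storage in order to use them.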
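
To make the wasb path in the question easier to read, here is a small sketch that splits such a URI into its parts. It assumes the general form wasb[s]://container@account.blob.core.windows.net/path, where an empty authority (as in wasb:///...) refers to the cluster's default container. The function name parse_wasb and the container and account names in the usage note are hypothetical.

```python
from urllib.parse import urlsplit


def parse_wasb(uri):
    """Split a wasb:// or wasbs:// URI into container, account host, and path."""
    parts = urlsplit(uri)
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a wasb URI: {uri}")
    return {
        # None means the cluster's default container / account is implied.
        "container": parts.username,
        "account_host": parts.hostname,
        "path": parts.path,
    }
```

Calling parse_wasb("wasb:///example/jars/hadoop-examples.jar") yields no container or account, only the path /example/jars/hadoop-examples.jar, whereas parse_wasb("wasb://mycontainer@myaccount.blob.core.windows.net/example/jars/hadoop-examples.jar") names both explicitly. Either way the location is in blob storage, not on a node's local disk.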

.net azure hadoop azure-hdinsight

2014-02-11T17:24:49.880

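0 votes

1 answer

628 views

azure - HDInsight word count MapReduce program stuck at mapper 100% and reducer 0%

I am new to Hadoop and I have run into a problem very similar to the one posted here. The only difference is that the OP runs Hadoop on Linux, while I run it on Windows.

I have installed the Hadoop Azure HDInsight Emulator on my local machine. When I run a simple word count program, the map phase completes to 100%, but the reduce phase stays stuck at 0%.

I tried debugging it as Chris suggested (for that question) and found a problem with the hostname on which the reducer job runs (this was exactly the OP's problem).

The reduce task does not run on localhost; instead it runs on some host 192.168.17.213 that does not get resolved, and the reducer cannot proceed from there.

These are the error logs

The OP solved the problem by changing the /etc/hosts file to point to localhost.

But that seems to be a Linux configuration. How do I set my hostname to localhost in the Hadoop Azure HDInsight Emulator?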
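
Windows has a hosts file too, at %SystemRoot%\System32\drivers\etc\hosts, and it uses the same line format as /etc/hosts. Whether editing it fixes the emulator is not established by the question; the sketch below only shows how to inspect what a hosts file currently maps. The function name parse_hosts is hypothetical.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), ip)  # first entry for a name wins
    return mapping
```

Reading the Windows file with Path(os.environ["SystemRoot"], "System32", "drivers", "etc", "hosts").read_text() and passing the result to parse_hosts shows whether the machine's own hostname is mapped at all, which is a reasonable first thing to check when a task is assigned to an address such as 192.168.17.213 that does not resolve.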

azure hadoop mapreduce azure-hdinsight

2014-02-12T08:06:13.427

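0 votes

1 answer

573 views

powershell - HDInsight PowerShell job submission cannot define custom libjars for a streaming C# job

I am running a C# job on a Hadoop cluster hosted by the Microsoft Azure HDInsight service. So far I have had to use the hadoop command line directly on my HDInsight server in order to use my custom Java input format:

call bin\hadoop jar lib\hadoop-streaming.jar -D "mapred.max.split.size=33554432" -libjars "../mycustom-hadoop-streaming.jar" -inputformat "mycustom.hadoop.CombinedInputFormat" ... (I cut the rest of the command)

Now I am trying to submit the job from the PowerShell command line (submitting remotely from another Azure machine):

$jobDefinition = New-AzureHDInsightStreamingMapReduceJobDefinition -Defines @{ "mapred.max.split.size"="33554432"; "mapred.input.format.class"="mycustom.hadoop.CombinedInputFormat" } ... (I cut the rest of the command)

But how can -libjars be defined from the PowerShell command line? Microsoft does not seem to have considered this capability: http://msdn.microsoft.com/en-us/library/windowsazure/dn527638.aspx

Has anyone tried to do this, or found a workaround for defining libjars with an HDInsight streaming job submission?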

powershell hadoop mapreduce hadoop-streaming azure-hdinsight

2014-02-12T14:51:09.623
