
I am trying to learn SparkCLR by processing a text file and running Spark SQL queries on it, using a sample like this:

[Sample]
internal static void MyDataFrameSample()
{
    var schemaTagValues = new StructType(new List<StructField>
                                {
                                    new StructField("tagname", new StringType()), 
                                    new StructField("time", new LongType()),
                                    new StructField("value", new DoubleType()),
                                    new StructField("confidence", new IntegerType()),
                                    new StructField("mode", new IntegerType())
                                });

    var rddTagValues1 = SparkCLRSamples.SparkContext.TextFile(SparkCLRSamples.Configuration.GetInputDataPath(myDataFile))
        .Map(r => r.Split('\t')
            .Select(s => (object)s).ToArray());
    var dataFrameTagValues = GetSqlContext().CreateDataFrame(rddTagValues1, schemaTagValues);
    dataFrameTagValues.RegisterTempTable("tagvalues");
    //var qualityFilteredDataFrame = GetSqlContext().Sql("SELECT tagname, value, time FROM tagvalues where confidence > 85");
    var qualityFilteredDataFrame = GetSqlContext().Sql("SELECT * FROM tagvalues");
    var data = qualityFilteredDataFrame.Collect();

    var filteredCount = qualityFilteredDataFrame.Count();
    Console.WriteLine("Filter By = 'confidence', RowsCount={0}", filteredCount);
}

But it keeps giving me an error that says:

    [2016-01-13 08:56:28,593] [8] [ERROR] [Microsoft.Spark.CSharp.Interop.Ipc.JvmBridge] - JVM method execution failed: Static method collectAndServe failed for class org.apache.spark.api.python.PythonRDD when called with 1 parameters ([Index=1, Type=JvmObjectReference, Value=19], )
    [2016-01-13 08:56:28,593] [8] [ERROR] [Microsoft.Spark.CSharp.Interop.Ipc.JvmBridge] - 
    *******************************************************************************************************************************
       at Microsoft.Spark.CSharp.Interop.Ipc.JvmBridge.CallJavaMethod(Boolean isStatic, Object classNameOrJvmObjectReference, String methodName, Object[] parameters) in d:\SparkCLR\csharp\Adapter\Microsoft.Spark.CSharp\Interop\Ipc\JvmBridge.cs:line 91
    *******************************************************************************************************************************

My text file looks like this:

10PC1008.AA 130908762000000000            7.059829  100 0
10PC1008.AA 130908762050000000            7.060376  100 0
10PC1008.AA 130908762100000000            7.059613  100 0
10PC1008.BB 130908762150000000            7.059134  100 0
10PC1008.BB 130908762200000000            7.060124  100 0

Is there something wrong with the way I am using it?

Edit 1

I have set the following in my Samples project properties:

(screenshot: Samples project properties)

My user environment variables are as follows (not sure whether this matters):

(screenshot: user environment variables)

Also, in the SparkCLR worker log I can see that it fails to load the assembly:

    [2016-01-14 08:37:01,865] [1] [ERROR] [Microsoft.Spark.CSharp.Worker] - System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.IO.FileNotFoundException: Could not load file or assembly 'SparkCLRSamples, Version=1.5.2.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. The system cannot find the file specified.
       at System.Reflection.RuntimeAssembly._nLoad(AssemblyName fileName, String codeBase, Evidence assemblySecurity, RuntimeAssembly locationHint, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
       at System.Reflection.RuntimeAssembly.InternalLoadAssemblyName(AssemblyName assemblyRef, Evidence assemblySecurity, RuntimeAssembly reqAssembly, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
       at System.Reflection.RuntimeAssembly.InternalLoad(String assemblyString, Evidence assemblySecurity, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean forIntrospection)
       at System.Reflection.RuntimeAssembly.InternalLoad(String assemblyString, Evidence assemblySecurity, StackCrawlMark& stackMark, Boolean forIntrospection)
       at System.Reflection.Assembly.Load(String assemblyString)
       at System.Runtime.Serialization.FormatterServices.LoadAssemblyFromString(String assemblyName)
       at System.Reflection.MemberInfoSerializationHolder..ctor(SerializationInfo info, StreamingContext context)
       --- End of inner exception stack trace ---
       at System.RuntimeMethodHandle.SerializationInvoke(IRuntimeMethodInfo method, Object target, SerializationInfo info, StreamingContext& context)
       at System.Runtime.Serialization.ObjectManager.CompleteISerializableObject(Object obj, SerializationInfo info, StreamingContext context)
       at System.Runtime.Serialization.ObjectManager.FixupSpecialObject(ObjectHolder holder)
       at System.Runtime.Serialization.ObjectManager.DoFixups()
       at System.Runtime.Serialization.Formatters.Binary.ObjectReader.Deserialize(HeaderHandler handler, __BinaryParser serParser, Boolean fCheck, Boolean isCrossAppDomain, IMethodCallMessage methodCallMessage)
       at System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Deserialize(Stream serializationStream, HeaderHandler handler, Boolean fCheck, Boolean isCrossAppDomain, IMethodCallMessage methodCallMessage)
       at System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Deserialize(Stream serializationStream)
       at Microsoft.Spark.CSharp.Worker.Main(String[] args) in d:\SparkCLR\csharp\Worker\Microsoft.Spark.CSharp\Worker.cs:line 149

3 Answers


Have you specified the sample data location and copied the source text file to that location? If not, you can refer to

https://github.com/Microsoft/SparkCLR/blob/master/csharp/Samples/Microsoft.Spark.CSharp/samplesusage.md

and set the sample data location using the [--data | sparkclr.sampledata.loc] parameter.
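
For example, when running the samples from Visual Studio (as in Edit 1), the data location can be passed as a command-line argument to the samples driver. This is only a sketch based on the flag named above; the paths are placeholders for wherever your input file actually lives:

    REM Sketch only: point the samples driver at the folder containing the input text file.
    REM The --data flag is the one described in samplesusage.md; replace the path with your own.
    SparkCLRSamples.exe --data C:\SparkCLR\sampledata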

Answered 2016-01-13T07:27:34.287

Try explicitly setting the [--temp | spark.local.dir] option (see samplesusage.md for more information on the supported parameters). The SparkCLR worker executable is downloaded into this directory at execution time. If you use the default temp directory, the worker executable may get quarantined by your antivirus software, which mistakes it for something malicious downloaded by your browser. Overriding the default with something like c:\temp\SparkCLRTemp helps avoid that problem.
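
As a rough sketch (the --temp and --data flags are the ones named in the answers here; the paths are placeholders), the samples driver could be launched with the temp directory overridden like this:

    REM Sketch only: override the worker temp directory so antivirus does not quarantine CSharpWorker.exe.
    REM Flag names come from samplesusage.md; adjust the paths for your machine.
    SparkCLRSamples.exe --temp C:\temp\SparkCLRTemp --data C:\SparkCLR\sampledata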

If setting the temp directory does not help, please share the entire list of command-line arguments you use when launching your SparkCLR driver code.

Answered 2016-01-13T19:09:16.060

This is how you change the port number; hope it helps.

Add the following to App.config.

For completeness, you must also add the tag that specifies the CSharpWorker path:

<appSettings>
  <add key="CSharpBackendPortNumber" value="num"/>
  <add key="CSharpWorkerPath" value="C:\MobiusRelease\samples\CSharpWorker.exe"/>
</appSettings>

Note that to make this work in debug mode, you should first run this command from the command line, from the (Mobius home) directory

%SPARKCLR_HOME%\scripts

sparkclr-submit.cmd debug

This will give you a message like the following, containing a port number:

[CSharpRunner.main] Port number used by CSharpBackend is 5567
* [CSharpRunner.main] Backend running debug mode. Press enter to exit *
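
To tie the two steps together: assuming the backend reported port 5567 as in the message above, the App.config entries would then look like this (the CSharpWorkerPath value is just the example path from earlier; use whatever path is correct on your machine):

    <appSettings>
      <!-- Sketch: reuse the port number printed by sparkclr-submit.cmd debug -->
      <add key="CSharpBackendPortNumber" value="5567"/>
      <!-- Example path only; point this at your actual CSharpWorker.exe -->
      <add key="CSharpWorkerPath" value="C:\MobiusRelease\samples\CSharpWorker.exe"/>
    </appSettings>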

Answered 2016-09-01T18:51:03.370