
I am trying to set up the Spark History Server locally. I am on Windows, using PyCharm for PySpark programming, and I can view the Spark web UI at localhost:4040. What I have done so far:

  1. spark-defaults.conf (I added the last three lines):

    #
    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements.  See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # The ASF licenses this file to You under the Apache License, Version 2.0
    # (the "License"); you may not use this file except in compliance with
    # the License.  You may obtain a copy of the License at
    #
    #    http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #
    
    # Default system properties included when running spark-submit.
    # This is useful for setting default environmental settings.
    
    # Example:
    # spark.master                     spark://master:7077
    # spark.eventLog.enabled           true
    # spark.eventLog.dir               hdfs://namenode:8021/directory
    # spark.serializer                 org.apache.spark.serializer.KryoSerializer
    # spark.driver.memory              5g
    # spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
    spark.jars.packages                com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.1
    spark.eventLog.enabled      true
    spark.history.fs.logDirectory   file:///D:///tmp///spark-events
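
The history server only displays what gets written to disk: applications write event logs to `spark.eventLog.dir`, while the history server reads from `spark.history.fs.logDirectory`, so the two normally point at the same location. A minimal sketch of that pairing, reusing the D:/tmp/spark-events path from the question (assumed to exist):

```
spark.eventLog.enabled           true
spark.eventLog.dir               file:///D:/tmp/spark-events
spark.history.fs.logDirectory    file:///D:/tmp/spark-events
```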
    
  2. Running the history server:

     C:\Users\hp\spark>bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer
     Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
     20/08/09 08:58:04 INFO HistoryServer: Started daemon with process name: 13476@DESKTOP-B9KRC6O
     20/08/09 08:58:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
     20/08/09 08:58:23 INFO SecurityManager: Changing view acls to: hp
     20/08/09 08:58:23 INFO SecurityManager: Changing modify acls to: hp
     20/08/09 08:58:23 INFO SecurityManager: Changing view acls groups to:
     20/08/09 08:58:23 INFO SecurityManager: Changing modify acls groups to:
     20/08/09 08:58:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hp); groups with view permissions: Set(); users  with modify permissions: Set(hp); groups with modify permissions: Set()
     20/08/09 08:58:24 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions
     20/08/09 08:58:26 INFO Utils: Successfully started service on port 18080.
     20/08/09 08:58:26 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and started at http://DESKTOP-B9KRC6O:18080
    
  3. After running a PySpark program successfully, I cannot see the job details in the Spark History Server web UI, even though the server is up. As shown below:

(screenshot: localhost:18080 web interface)

  1. References I have already used:

    Windows: Apache Spark History Server configuration

    How to run Spark History Server on Windows

  2. The code I used is as follows:

     from pyspark import SparkContext, SparkConf
     from pyspark.sql import SparkSession

     conf = SparkConf().setAppName("madhu").setMaster("local")
     sc = SparkContext(conf=conf)
     spark = SparkSession(sc)  # wrap the existing SparkContext in a session


     def readtable(dbname, table):
         hostname = "localhost"
         jdbcPort = 3306
         username = "root"
         password = "madhu"
         jdbc_url = "jdbc:mysql://{0}:{1}/{2}?user={3}&password={4}".format(
             hostname, jdbcPort, dbname, username, password)
         dataframe = spark.read.format('jdbc').options(
             driver='com.mysql.jdbc.Driver', url=jdbc_url, dbtable=table).load()
         return dataframe

     t1 = readtable("db", "table1")
     t2 = readtable("db2", "table2")

     t2.show()  # show() prints the rows itself and returns None
     spark.stop()
    

Please help me figure out how to achieve this. I will provide any data needed.

  1. I have also tried the directory path as:

     spark.eventLog.enabled      true
     spark.history.fs.logDirectory   file:///D:/tmp/spark-events
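
On the question of the two `file:///` spellings tried above: Python's pathlib can produce the canonical file URI for a Windows path, which is a quick way to sanity-check the value given to `spark.history.fs.logDirectory` (the D:\tmp\spark-events path is taken from the question; `PureWindowsPath` keeps the snippet runnable on any OS):

```python
from pathlib import PureWindowsPath

# as_uri() yields the canonical file:// form for an absolute
# Windows path, independent of the OS this snippet runs on.
uri = PureWindowsPath(r"D:\tmp\spark-events").as_uri()
print(uri)  # file:///D:/tmp/spark-events
```

So the single-slash form matches the canonical URI, while the triple-slash form in the first attempt does not.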
    

1 Answer


You must provide the correct master URL in your application and run the application with spark-submit.

You can find it in the Spark UI at localhost:4040; in the example screenshot, the master URL is spark://XXXX:7077.

Your application should be:

conf = SparkConf().setAppName("madhu").setMaster("spark://XXXX:7077")
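
To go with this: a sketch of a spark-submit invocation for Windows cmd that also enables event logging at submit time (your_script.py and the D:/tmp/spark-events path are placeholders, and the master URL must be replaced with the one your own Spark UI shows):

```
spark-submit ^
  --master spark://XXXX:7077 ^
  --conf spark.eventLog.enabled=true ^
  --conf spark.eventLog.dir=file:///D:/tmp/spark-events ^
  your_script.py
```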
answered 2020-11-24T22:58:32.957