I need to output the Spark application name (spark.app.name) in each line of the driver log (along with other attributes such as the message and date). So far I have failed to find the correct log4j configuration or any other hints. How can this be done?

I would appreciate any help.

Using Spark standalone mode.

1 Answer

One approach that seems to work involves the following two steps:

  1. Create a custom log4j.properties file and change the layout:

    ...
    # this is just an example layout config
    # remember the rest of the configuration
    # note: MM is the month; lowercase mm would print the minutes in the date
    log4j.appender.stdout.layout.ConversionPattern=${appName}--%d{yyyy-MM-dd HH:mm:ss,SSS} [%-5p] [%c] - %m%n
    

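    The file must either be at the root of the classpath (for example under src/main/resources, as with most build tools), or you can edit <spark-home>/conf/log4j.properties on the servers in the cluster.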

  2. Then set a system property with the referenced key before bootstrapping your Spark context:

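    // set this before any logger is initialized: log4j resolves ${appName} when it reads its configuration
    System.setProperty("appName", "application-name");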
    SparkSession spark = SparkSession.builder().appName("application-name")
    ...
    

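In my quick test, the above produced something like this in every line (tested in local mode; the `53` in the date field is the minutes value, because this run used `mm` rather than `MM` in the date part of the pattern):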

application-name--2020-53-06 16:53:35,741 [INFO ] [org.apache.spark.SparkContext] - Running Spark version 2.4.4
application-name--2020-53-06 16:53:36,032 [WARN ] [org.apache.hadoop.util.NativeCodeLoader] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
application-name--2020-53-06 16:53:36,316 [INFO ] [org.apache.spark.SparkContext] - Submitted application: JavaWordCount
application-name--2020-53-06 16:53:36,413 [INFO ] [org.apache.spark.SecurityManager] - Changing view acls to: ernest
application-name--2020-53-06 16:53:36,414 [INFO ] [org.apache.spark.SecurityManager] - Changing modify acls to: ernest
application-name--2020-53-06 16:53:36,415 [INFO ] [org.apache.spark.SecurityManager] - Changing view acls groups to: 
application-name--2020-53-06 16:53:36,415 [INFO ] [org.apache.spark.SecurityManager] - Changing modify acls groups to: 
application-name--2020-53-06 16:53:36,416 [INFO ] [org.apache.spark.SecurityManager] - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(ernest); groups with view permissions: Set(); users  with modify permissions: Set(ernest); groups with modify permissions: Set()
application-name--2020-53-06 16:53:36,904 [INFO ] [org.apache.spark.util.Utils] - Successfully started service 'sparkDriver' on port 33343.
application-name--2020-53-06 16:53:36,934 [INFO ] [org.apache.spark.SparkEnv] - Registering MapOutputTracker
...
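
The odd `2020-53-06` date in the output above can be reproduced outside Spark. log4j 1.x documents `%d{...}` as taking the same pattern syntax as `java.text.SimpleDateFormat`, where `mm` means minutes and `MM` means month. The following is only a minimal sketch of that difference; the class name `DatePatternDemo` and the sample timestamp are made up for illustration and are not part of the original test.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class DatePatternDemo {
    // Formats a date with the given SimpleDateFormat pattern (JVM default time zone).
    static String format(String pattern, Date date) {
        return new SimpleDateFormat(pattern, Locale.ROOT).format(date);
    }

    // Illustrative timestamp: 2020-02-06 16:53:35 in the JVM default time zone.
    static Date sample() {
        return new GregorianCalendar(2020, Calendar.FEBRUARY, 6, 16, 53, 35).getTime();
    }

    public static void main(String[] args) {
        Date d = sample();
        // Lowercase mm repeats the minutes where the month should be.
        System.out.println(format("yyyy-mm-dd HH:mm:ss", d)); // 2020-53-06 16:53:35
        // Uppercase MM prints the month.
        System.out.println(format("yyyy-MM-dd HH:mm:ss", d)); // 2020-02-06 16:53:35
    }
}
```

Using `yyyy-MM-dd` in the ConversionPattern should therefore give the expected calendar date.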

Instead of setting the variable manually in code, you may prefer to invoke spark-submit with something like:

--conf 'spark.driver.extraJavaOptions=-DappName=application-name'
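
If the name is passed with `-DappName=...` as above, the driver code can read the same system property so that the log prefix and the Spark application name stay in sync. This is only a sketch under that assumption: `AppNameResolver` and its `resolve` method are hypothetical helpers, not part of Spark or log4j, and the `SparkSession` call is left as a comment so the snippet runs without Spark on the classpath.

```java
public class AppNameResolver {
    // Hypothetical helper: returns the appName system property if it was set
    // (for example via -DappName=...), otherwise sets it to the fallback so
    // that ${appName} in log4j.properties still resolves to something.
    static String resolve(String fallback) {
        String name = System.getProperty("appName");
        if (name == null || name.isEmpty()) {
            System.setProperty("appName", fallback);
            return fallback;
        }
        return name;
    }

    public static void main(String[] args) {
        // Call this before anything initializes logging.
        String appName = resolve("application-name");
        // SparkSession spark = SparkSession.builder().appName(appName).getOrCreate();
        System.out.println(appName);
    }
}
```

The fallback only helps if it is applied before log4j reads its configuration, which is why the call sits at the very top of `main`.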

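For a more permanent change, you may want to edit <spark-home>/conf/log4j.properties with the layout change (copying the template if the file does not exist) and invoke spark-submit, spark-shell, etc. with the system property.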

Answered 2020-02-06T14:57:12.130