I'm using xsbt-proguard-plugin, which is an SBT plugin for working with Proguard.
I'm trying to come up with a Proguard configuration for a Hive Deserializer I have written, which has the following dependencies:
// project/Dependencies.scala
val hadoop = "org.apache.hadoop" % "hadoop-core" % V.hadoop
val hive = "org.apache.hive" % "hive-common" % V.hive
val serde = "org.apache.hive" % "hive-serde" % V.hive
val httpClient = "org.apache.httpcomponents" % "httpclient" % V.http
val logging = "commons-logging" % "commons-logging" % V.logging
val specs2 = "org.specs2" %% "specs2" % V.specs2 % "test"
Plus one unmanaged dependency:
// lib/UserAgentUtils-1.6.jar
Because most of these are either needed only for local unit testing or are available within the Hadoop/Hive environment anyway, I want my minified jarfile to include only:
- The Java classes SnowPlowEventDeserializer.class and SnowPlowEventStruct.class
- org.apache.httpcomponents.httpclient
- commons-logging
- lib/UserAgentUtils-1.6.jar
But I'm really struggling to get the syntax right. Should I start from a whitelist of the classes I want to keep, or explicitly filter out the Hadoop/Hive/Serde/Specs2 libraries? I'm aware of this SO question, but it doesn't seem to apply here.
If I start with the whitelist approach:
// Should be equivalent to sbt> package
import ProguardPlugin._

lazy val proguard = proguardSettings ++ Seq(
  proguardLibraryJars := Nil,
  proguardOptions := Seq(
    "-keepattributes *Annotation*,EnclosingMethod",
    "-dontskipnonpubliclibraryclassmembers",
    "-dontoptimize",
    "-dontshrink",
    "-keep class com.snowplowanalytics.snowplow.hadoop.hive.SnowPlowEventDeserializer",
    "-keep class com.snowplowanalytics.snowplow.hadoop.hive.SnowPlowEventStruct"
  )
)
Then I get a Hadoop processing error, so clearly Proguard is still trying to bundle Hadoop:
proguard: java.lang.IllegalArgumentException: Can't find common super class of [[Lorg/apache/hadoop/fs/FileStatus;] and [[Lorg/apache/hadoop/fs/s3/Block;]
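My reading of this error is that Proguard still needs the Hadoop classes on its library class path in order to resolve their types, even though they shouldn't end up in the output jar. Here is a minimal sketch of what I suspect the fix looks like, assuming proguardLibraryJars accepts a Seq[File] (as the proguardLibraryJars := Nil above suggests); the Ivy cache paths are placeholders, not my real paths:

lazy val proguard = proguardSettings ++ Seq(
  // Hand the provided jars to Proguard as -libraryjars so it can
  // resolve org.apache.hadoop.* types without bundling them.
  // Placeholder paths: in practice these would come from the managed
  // classpath rather than being hard-coded.
  proguardLibraryJars := Seq(
    file("/home/dev/.ivy2/cache/org.apache.hadoop/hadoop-core/jars/hadoop-core.jar"),
    file("/home/dev/.ivy2/cache/org.apache.hive/hive-serde/jars/hive-serde.jar")
  ),
  proguardOptions := Seq(
    "-keepattributes *Annotation*,EnclosingMethod",
    "-dontoptimize",
    "-dontshrink",
    "-keep class com.snowplowanalytics.snowplow.hadoop.hive.SnowPlowEventDeserializer",
    "-keep class com.snowplowanalytics.snowplow.hadoop.hive.SnowPlowEventStruct"
  )
)

Is this roughly the right direction, or does the plugin expect something else here?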
Meanwhile, if I try to use Proguard's filtering syntax to build up a blacklist of the libraries I don't want to include:
import ProguardPlugin._

lazy val proguard = proguardSettings ++ Seq(
  proguardLibraryJars := Nil,
  proguardOptions := Seq(
    "-keepattributes *Annotation*,EnclosingMethod",
    "-dontskipnonpubliclibraryclassmembers",
    "-dontoptimize",
    "-dontshrink",
    "-injars !*hadoop*.jar"
  )
)
Then this doesn't work either:
proguard: java.io.IOException: Can't read [/home/dev/snowplow-log-deserializers/!*hadoop*.jar] (No such file or directory)
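As far as I can tell from the Proguard manual, a filter doesn't stand alone: it goes in parentheses after a concrete class path entry, so a bare !*hadoop*.jar gets read as a literal file name, which would explain the IOException above. Something like the following is what I've been trying to express (the jar path is a placeholder for one of the resolved dependency jars):

proguardOptions := Seq(
  "-keepattributes *Annotation*,EnclosingMethod",
  "-dontoptimize",
  "-dontshrink",
  // The filter attaches to a specific jar in parentheses: Proguard
  // reads this jar but drops everything under org/apache/hadoop/.
  "-injars /home/dev/lib/some-dependency.jar(!org/apache/hadoop/**)"
)

But I don't see how to apply a filter like this across the injars that the plugin builds up automatically from the classpath.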
Any help greatly appreciated!