
I have to read a JSON response from a REST API using Spark Scala. I have written the code (using both scala.io.Source.fromInputStream and Scalaj HTTP), but the job does not work on HDFS: every time it gives me a timeout exception, even though I have increased the (connect/read) timeouts to their maximum values.

In my IntelliJ (local) it works fine. In the HDFS logs I cannot find anything other than the timeout exception, but it can be seen that the client is still using the default timeout value of 100 milliseconds (not the maximum value I set in my code).

Below are the logs:

21/08/19 11:17:54 INFO jdk.JdkHttpClient: connect timeout: Period{time=100, timeUnit=MILLISECONDS}, read timeout: Period{time=100, timeUnit=MILLISECONDS}, shutdown timeout: Period{time=10, timeUnit=MILLISECONDS}
21/08/19 11:17:54 INFO jdk.JdkHttpClient: connect timeout: Period{time=100, timeUnit=MILLISECONDS}, read timeout: Period{time=100, timeUnit=MILLISECONDS}, shutdown timeout: Period{time=10, timeUnit=MILLISECONDS}
21/08/19 11:17:54 INFO btrace.SparkSensorUtils: Sending init confJson
21/08/19 11:20:12 ERROR mainClasses.TestSap: Connection timed out (Connection timed out)
java.net.ConnectException: Connection timed out (Connection timed out)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
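
To double-check which timeout values actually apply at runtime, a small sketch like the one below could be logged from the driver. The helper name is mine; `getConnectTimeout` and `getReadTimeout` are the standard `java.net.URLConnection` getters:

```scala
import java.net.URL

// Hypothetical helper: print the timeouts the connection actually carries,
// to compare against the 100 ms reported in the log above.
def logEffectiveTimeouts(url: String): Unit = {
  val connection = new URL(url).openConnection
  connection.setConnectTimeout(999999999)
  connection.setReadTimeout(999999999)
  println(s"connect timeout: ${connection.getConnectTimeout} ms")
  println(s"read timeout: ${connection.getReadTimeout} ms")
}
```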
    

Below is the code I am using:

Using Scalaj HTTP:

```scala
import scalaj.http.{Http, HttpOptions}

System.setProperty("sun.net.http.allowRestrictedHeaders", "true")

val spark = Context.getSparkSession()
import spark.implicits._

// Attempts to raise the various Spark/Hive timeouts
spark.conf.set("spark.network.timeout", "3000s")
spark.conf.set("spark.executor.heartbeatInterval", "1500s")
spark.conf.set("hive.spark.client.server.connect.timeout", "100000ms")
spark.conf.set("hive.spark.client.connect.timeout", "100000ms")

// Call the REST API with basic auth and maxed-out timeouts
// (Url is the REST endpoint, defined elsewhere)
val result = Http(Url)
  .auth("xxxx", "yyyy")
  .option(HttpOptions.connTimeout(999999999))
  .option(HttpOptions.readTimeout(999999999))
  .asString
```
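One caveat I am aware of: settings such as `spark.network.timeout` and `spark.executor.heartbeatInterval` are normally read when the SparkSession is created, so setting them through `spark.conf.set` afterwards may have no effect. A minimal sketch (the app name is a placeholder) of passing them at build time instead:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: pass the timeout settings at session build time
// (or equivalently via spark-submit --conf), rather than after startup.
val spark = SparkSession.builder()
  .appName("RestApiReader") // placeholder name
  .config("spark.network.timeout", "3000s")
  .config("spark.executor.heartbeatInterval", "1500s")
  .getOrCreate()
```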
Using scala.io.Source:

```scala
import java.net.URL
import org.apache.spark.sql.DataFrame

@throws(classOf[java.io.IOException])
@throws(classOf[java.net.SocketTimeoutException])
def GetUrlContentJson(url: String): DataFrame = {
  val userpass = "xxxx" + ":" + "yyyy"
  val basicAuth = "Basic " +
    javax.xml.bind.DatatypeConverter.printBase64Binary(userpass.getBytes())

  // Open the connection with basic auth and maxed-out timeouts
  val connection = new URL(url).openConnection
  connection.setRequestProperty("Authorization", basicAuth)
  connection.setConnectTimeout(999999999)
  connection.setReadTimeout(999999999)
  connection.setUseCaches(false)

  // Read the full response body, then close the stream
  val inputStream = connection.getInputStream
  val result =
    try scala.io.Source.fromInputStream(inputStream).mkString
    finally inputStream.close()

  // Parse the JSON string into a DataFrame
  val spark = Context.getSparkSession()
  import spark.implicits._
  spark.read.json(Seq(result).toDS)
}
```
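For reference, a minimal call site (the URL is a placeholder):

```scala
val df = GetUrlContentJson("https://example.com/api/data") // placeholder URL
df.show()
```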
For both cases, using the same URL, I am able to get a response when running in IntelliJ, whereas running the same on HDFS (as a Spark Scala jar) gives me a timeout exception.
It would be really helpful if anyone could help me resolve this issue.
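
Since the same jar only fails on the cluster, a minimal probe of this kind (`apiUrl` is a placeholder, `spark` is the session from above) could confirm whether the worker nodes can reach the endpoint at all:

```scala
import java.net.{InetAddress, URL}

// Sketch: try to open a connection to the endpoint from each executor.
// A job that works locally but times out on the cluster often points to
// firewall or proxy rules between the worker nodes and the API.
val apiUrl = "https://example.com/api" // placeholder
val results = spark.sparkContext.parallelize(1 to 4, 4).map { _ =>
  val host = InetAddress.getLocalHost.getHostName
  val conn = new URL(apiUrl).openConnection()
  conn.setConnectTimeout(5000)
  try { conn.connect(); s"$host: reachable" }
  catch { case e: Exception => s"$host: ${e.getMessage}" }
}.collect()
results.foreach(println)
```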