http - Cloud Foundry Groovy (Java) application cannot read an external URL, gives 403

Question

https://groups.google.com/group/caelyf/feed/rss_v2_0_topics.xml correctly returns an XML stream in a browser window.

Reading the same URL with Groovy inside a Cloud Foundry application fails with an HTTP 403 (Forbidden) error, e.g.:

def url = "https://groups.google.com/group/caelyf/feed/rss_v2_0_topics.xml:443".toURL()
def tx = url.getText('UTF-8')

The Cloud Foundry forums hint that external URLs can only be read over https with port 443.

Any ideas?

score 1 · Accepted Answer

Not sure why you're sticking :443 on the end of the URL?
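To illustrate that point, here is a minimal sketch (the variable names are my own, not from the question) of how java.net.URL parses the two forms. A port belongs directly after the host name. Appended after the file name, it simply becomes part of the path, so the request asks for a resource literally named rss_v2_0_topics.xml:443. Nothing here touches the network.

```groovy
// Port written after the file name, as in the question
def suffixed = "https://groups.google.com/group/caelyf/feed/rss_v2_0_topics.xml:443".toURL()

// Port written after the host, where URL syntax expects it
def explicit = "https://groups.google.com:443/group/caelyf/feed/rss_v2_0_topics.xml".toURL()

println suffixed.port         // -1, no port was parsed
println suffixed.path         // /group/caelyf/feed/rss_v2_0_topics.xml:443
println suffixed.defaultPort  // 443, what https uses when no port is given
println explicit.port         // 443
```

Since https already defaults to port 443, leaving the port out entirely should be enough.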

403 means Forbidden. My guess is that Google won't let you scrape the Groups site from Java.

Can you try setting the User-Agent to that of a browser?

def tx = url.openConnection().with {
  setRequestProperty("User-Agent", "Firefox/2.0.0.4")
  inputStream.with {
    def ret = getText( 'UTF-8' )
    close()
    ret
  }
}

or similar...

I don't think this is a Cloud Foundry issue. Have you tried running the above from your own machine to confirm that?
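One way to run that comparison is to look at the raw status code with and without the header. This is only a sketch: prepare and statusFor are hypothetical helper names of mine, and what the server actually returns depends on Google's side.

```groovy
// Hypothetical helper: builds the connection and optionally sets a User-Agent.
// openConnection() does not contact the server yet.
HttpURLConnection prepare(String address, String userAgent = null) {
    HttpURLConnection conn = (HttpURLConnection) address.toURL().openConnection()
    if (userAgent) {
        conn.setRequestProperty("User-Agent", userAgent)
    }
    conn
}

// Hypothetical helper: performs the request and returns the HTTP status code.
int statusFor(String address, String userAgent = null) {
    HttpURLConnection conn = prepare(address, userAgent)
    try {
        return conn.responseCode   // this is where the request is actually sent
    } finally {
        conn.disconnect()
    }
}
```

Calling statusFor(feed) and statusFor(feed, "Firefox/2.0.0.4") with feed set to the URL from the question, once on your machine and once inside the deployed app, would show whether the 403 follows the User-Agent or the environment.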

Edit:

Just tried it, and it works (on my machine at least). This shows how to load the XML into a parser and print the titles from the feed:

URL url = "https://groups.google.com/group/caelyf/feed/rss_v2_0_topics.xml".toURL()

def tx = new XmlSlurper().with { x ->
  url.openConnection().with {
    // Pretend to be an old Firefox version
    setRequestProperty("User-Agent", "Firefox/2.0.0.4")
    // Get a reader
    inputStream.withReader( 'UTF-8' ) {
      // and parse it with the XmlSlurper
      parse( it )
    }
  }
}

// Print all the titles
tx.channel.item.title.each { println it }
