4

I'm working on a website-scraping application in Scala. The site I'm scraping is heavily session-oriented, so I have to hit the site once to get a session ID before I can do anything else.

I get the connection for retrieving the session ID like this:

url.openConnection().asInstanceOf[HttpURLConnection]

It works fine. The .connected field of the returned HttpURLConnection is false, and it flips to true when I call .connect() on it. No problem.

The first hint of trouble occurs when I finish with the connection and call .disconnect() on it. The .connected field stays true. Hm.

So now I've got my session ID, and I go to get the page that has the form I want on it. I call

url.openConnection().asInstanceOf[HttpURLConnection]

again, just like last time--same code, in fact--except this time the HttpURLConnection it gives me has the .connected field set to true! I thought at first that somehow it was giving me the same object it gave me before, but no, the memory ID is different.

So of course now when I call .setRequestProperty() on the connection, it blows up with an IllegalStateException: Already connected.

Am I misunderstanding how to use HttpURLConnection?

Notes: Scala 2.9.2, Java 6.0. Also, the URL objects on which I call .openConnection() are different objects, not the same.

Thanks...

3 Answers

1

It's called connection pooling, and it exists to support HTTP keep-alive. It's a good thing. You want it. If you really don't, call the disconnect() method.
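For illustration, here is a minimal sketch of two other ways to opt out of connection reuse, assuming the standard Sun/Oracle HTTP handler. The names KeepAliveOptOut, disableGlobally and openWithoutKeepAlive are made up for this example. http.keepAlive is a documented JDK networking property, but it appears to be read only once, so it should be set before the first HTTP request. The Connection: close header is the per-request alternative, and like every request property it has to be set before the connection is opened.

```scala
import java.net.{HttpURLConnection, URL}

// Hypothetical helper object; not part of any library.
object KeepAliveOptOut {

  // Global opt-out: ask the JDK's HTTP handler not to keep connections alive.
  // Assumption: this runs before any HTTP request is made in this JVM.
  def disableGlobally(): Unit = {
    System.setProperty("http.keepAlive", "false")
  }

  // Per-request opt-out: ask the server to close the connection after the
  // response. openConnection() does not touch the network yet, so setting
  // a request property here is still allowed.
  def openWithoutKeepAlive(url: URL): HttpURLConnection = {
    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestProperty("Connection", "close")
    conn
  }
}
```

Either approach trades the speed of a reused socket for a fresh connection on every request.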

answered 2012-09-27T00:26:03.007
1

In my experience, the URL class is not well suited to session-based work (especially cookie-based sessions).

If you need that, I'd suggest using something like Apache HTTPClient.

IMHO

answered 2012-09-27T00:29:14.833
0

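It sounds like HttpURLConnection is keeping the connection alive for you behind the scenes.

Check out this post for some tips on forcing it to close the connection instead of being overly helpful.

In your case, though, it sounds like you might actually want keep-alive, since it can speed up your calls to the site by avoiding unnecessary connection handshakes.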
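If you do want the reuse, the usual pattern can be sketched roughly as follows: read the response body to the end and close the stream, rather than calling disconnect(), so the handler has a chance to return the socket to its pool. The names BodyReader, readBody and fetch are invented for this example, and UTF-8 is an assumed encoding. Whether a connection is actually reused depends on the server and the JVM.

```scala
import java.io.InputStream
import java.net.{HttpURLConnection, URL}
import scala.io.Source

// Hypothetical helper object; not part of any library.
object BodyReader {

  // Drain the stream completely, then close it. Closing the stream (instead
  // of calling disconnect()) leaves the underlying socket eligible for reuse.
  // Assumption: the body is UTF-8 text.
  def readBody(in: InputStream): String = {
    try {
      Source.fromInputStream(in, "UTF-8").mkString
    } finally {
      in.close()
    }
  }

  // Open a connection, read the whole response, and release the stream.
  // Any request properties would have to be set before getInputStream,
  // which is the call that actually connects.
  def fetch(url: URL): String = {
    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    readBody(conn.getInputStream)
  }
}
```

Calling BodyReader.fetch once for the session page and again for the form page would then let the handler decide whether the second request can ride on the first socket.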

answered 2012-09-27T00:20:20.863