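I am trying to parse information from a website. However, it only works when the content is not very long. As the HTML gets larger, the loaded content is incomplete. The total length of the retrieved String is around 40000, and the character count is different on every retrieval (for example, 31345 the first time and 31358 the next), so I cannot retrieve the whole page.
Because of this, I thought the problem might be related to the internet connection or to a buffer. But I am using a BufferedReader, and as far as I know HttpURLConnection works like a stream, so there should not be any problem. I have checked almost every page related to URLConnection, but nobody talks about this.
Is there anything wrong with my code? I have been working on this issue for a few days, so any suggestion would be very helpful. Thanks in advance.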
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URL;
import android.os.Build;

public String getHtmlFromUrl(String url, int startReadingLine) {
    String xml = "";
    try {
        //URL url1 = new URL(url);
        URL url1 = new URL("http://support.google.com/analytics/bin/answer.py?hl=zh-Hant&answer=1009602");
        HttpURLConnection urlConn = (HttpURLConnection) url1
                .openConnection();
        urlConn.setRequestProperty("User-Agent",
                "Mozilla/5.0 (Windows NT 6.1;zh-tw; MSIE 6.0)");
        // Disable keep-alive on Android versions before Froyo
        if (Integer.parseInt(Build.VERSION.SDK) < Build.VERSION_CODES.FROYO) {
            System.setProperty("http.keepAlive", "false");
        }
        urlConn.setReadTimeout(10000 /* milliseconds */);
        urlConn.setConnectTimeout(15000 /* milliseconds */);
        // setDoOutput(true) is meant for requests that send a body; a plain GET does not need it
        urlConn.setDoOutput(true);
        urlConn.setDoInput(true);
        urlConn.setRequestMethod("GET");
        urlConn.setUseCaches(false);
        // No charset is given here, so the platform default is used to decode the bytes
        InputStreamReader in = new InputStreamReader(
                urlConn.getInputStream());
        BufferedReader buffer = new BufferedReader(in, 100000);
        StringBuilder builder = new StringBuilder();
        String aux = "";
        // readLine() returns each line without its line terminator
        while ((aux = buffer.readLine()) != null)
            builder.append(aux);
        xml = builder.toString();
        in.close();
        urlConn.disconnect();
    } catch (SocketTimeoutException e) {
        return "time out";
    } catch (IOException e) {
        e.printStackTrace();
    }
    // return XML
    return xml;
}
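For comparison, here is a minimal sketch of the read loop on its own, written so it can be run without a network connection. It reads fixed-size character chunks until end-of-stream instead of calling `readLine()`, so line terminators are kept and the character count can be compared directly with the source. The class name `ReadAllSketch`, the helper `readAll`, and the choice of UTF-8 are assumptions made for illustration; the real page may declare a different charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical helper class, not part of the original code.
class ReadAllSketch {

    // Reads characters until end-of-stream (read() returns -1).
    // Assumption: the bytes are UTF-8 encoded.
    public static String readAll(InputStream in) throws IOException {
        StringBuilder builder = new StringBuilder();
        Reader reader = new InputStreamReader(in, "UTF-8");
        try {
            char[] chunk = new char[4096];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                // Append only the n characters that were actually read
                builder.append(chunk, 0, n);
            }
        } finally {
            reader.close();
        }
        return builder.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for urlConn.getInputStream(): an in-memory page
        byte[] page = "<html>\n納米比亞\n</html>\n".getBytes("UTF-8");
        String text = readAll(new ByteArrayInputStream(page));
        System.out.println(text.length() + " chars read from " + page.length + " bytes");
    }
}
```

In the method above, the same loop could be fed `urlConn.getInputStream()`. If the count still changes between runs with this loop, the difference is more likely to come from the response itself than from the reading code.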
Here is a sample of the xml string returned by getHtmlFromUrl (count is 40710):
(I did not add the "..." at the end of the xml myself.)
<!DOCTYPE html><html lang="zh-Hant"class="streamlined streamlined-3"><head><script type="text/javascript">serverResponseTimeDelta=window.external&&window.external.pageT?window.external.pageT:-1;pageStartTime=new Date().getTime...
...
..."納米比亞", "NR": "諾魯", "NP": "尼泊爾", "NL": "荷蘭", "AN": "荷屬安地列斯", "KN": "尼維斯", "NC": "新喀里多尼亞", "NI": "尼加拉瓜", "NE": "尼日", "NG": "奈及利亞", "NU": "紐埃", "KR": "北韓", "NO": "挪威", "NZ": "紐西蘭", "OM": "阿曼", "PW": "帛琉", "PK": "巴基斯坦", "PS": "巴勒斯坦", "PA": "巴拿馬", "PG": "巴布亞新幾內亞", "PY": "巴拉圭", "PE": "秘魯", "PH"...
Another one (count is 41106):
<!DOCTYPE html><html lang="zh-Hant"class="streamlined streamlined-3"><head><script type="text/javascript">serverResponseTimeDelta=window.external&&window.external.pageT?window.externa...
...
...屬安地列斯", "KN": "尼維斯", "NC": "新喀里多尼亞", "NI": "尼加拉瓜", "NE": "尼日", "NG": "奈及利亞", "NU": "紐埃", "KR": "北韓", "NO": "挪威", "NZ": "紐西蘭", "OM": "阿曼", "PW": "帛琉", "PK": "巴基斯坦", "PS": "巴勒斯坦", "PA": "巴拿馬", "PG": "巴布亞新幾內亞", "PY": "巴拉圭", "PE": "秘魯", "PH"...
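Edit: So far I think it has to do with the way it interacts with the internet, since the count is different for every result, or it may be some strange bug on my device. The root cause has not been found yet. The strangest thing is that the result ends with "...". It seems to know that the result is not complete...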
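One thing I have not ruled out is that the "..." is added by whatever displays the string (a debugger's variable view or a log viewer) and is not part of the data at all. That is only a guess on my part. A rough sketch of a check that does not depend on how one long string is displayed: print the length, test the ending in code, and print the text in small pieces. The class name `TruncationCheckSketch`, the helpers `tail` and `chunks`, and the sample string are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helpers, not part of the original code.
class TruncationCheckSketch {

    // Returns the last n characters of s, or all of s if it is shorter.
    public static String tail(String s, int n) {
        return s.substring(Math.max(0, s.length() - n));
    }

    // Splits s into pieces of at most size characters so each can be printed separately.
    public static List<String> chunks(String s, int size) {
        List<String> parts = new ArrayList<String>();
        for (int start = 0; start < s.length(); start += size) {
            parts.add(s.substring(start, Math.min(s.length(), start + size)));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Stand-in for the xml string returned by getHtmlFromUrl
        StringBuilder sb = new StringBuilder("<html>");
        for (int i = 0; i < 5000; i++) {
            sb.append("0123456789");
        }
        sb.append("</html>");
        String xml = sb.toString();

        System.out.println("length = " + xml.length());
        System.out.println("ends with </html>: " + xml.trim().endsWith("</html>"));
        System.out.println("tail = " + tail(xml, 40));
        for (String part : chunks(xml, 1000)) {
            System.out.println(part);
        }
    }
}
```

If the in-code check reports that the string ends with `</html>` while the displayed value still ends with "...", the truncation would be happening in the display and not in the download.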