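This is code I wrote in Groovy to get the page title from a URL. For some websites, however, I get "Moved Permanently" as the title, which I think is caused by a 301 redirect. How can I avoid this and have HttpURLConnection follow the redirect to the correct URL so that I get the right page title?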
For example, for this page I get "Moved Permanently" instead of the correct page title: http://www.nytimes.com/2011/08/14/arts/music/jay-z-and-kanye-wests-watch-the-throne.html
// Open the connection and read the response body
def con = (HttpURLConnection) new URL(url).openConnection()
con.connect()
def inputStream = con.inputStream
// Parse the HTML with HtmlCleaner and look up the <title> element
HtmlCleaner cleaner = new HtmlCleaner()
TagNode node = cleaner.clean(inputStream)
TagNode titleNode = node.findElementByName("title", true)
// Unescape HTML entities, then strip surrounding whitespace and newlines
def title = titleNode.getText().toString()
title = StringEscapeUtils.unescapeHtml(title).trim()
title = title.replace("\n", "")
return title
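
One possible cause, offered as a guess rather than a diagnosis: HttpURLConnection follows redirects by default, but it does not follow a redirect that switches protocol (for example from http to https), so the 301 response body is what gets parsed. A minimal sketch of following the Location header by hand is below. The helper names `resolveRedirect` and `openFollowingRedirects` and the `maxHops` limit are hypothetical, introduced only for illustration.

```groovy
// Hypothetical helper: resolve a Location header value, which may be
// relative, against the URL that produced the redirect.
URL resolveRedirect(URL current, String location) {
    new URL(current, location)
}

// Hypothetical helper: follow redirects manually, up to maxHops of them,
// and return the first connection whose response is not a redirect.
HttpURLConnection openFollowingRedirects(String address, int maxHops = 5) {
    URL current = new URL(address)
    for (int hop = 0; hop <= maxHops; hop++) {
        HttpURLConnection con = (HttpURLConnection) current.openConnection()
        // Turn off the built-in handling so every hop is handled here
        con.instanceFollowRedirects = false
        int code = con.responseCode
        if (!(code in [301, 302, 303, 307, 308])) {
            return con
        }
        String location = con.getHeaderField('Location')
        con.disconnect()
        if (!location) {
            throw new IOException("Redirect ($code) without a Location header from $current")
        }
        current = resolveRedirect(current, location)
    }
    throw new IOException("More than $maxHops redirects starting from $address")
}
```

With this in place, the first two lines of the snippet above could become `def con = openFollowingRedirects(url)`, and the rest would stay the same. The sketch does not send cookies; if a site keeps redirecting until it sees a cookie, the loop will end with the "More than ... redirects" exception instead of a title.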