java - Substring of a link string

Question

I just want to get the actual link out of the following link string:

String link = "<a href=\"http://www.facebook.com/wwwausedu\" target=\"_blank\" class=\"btnFacebook\">Link to Facebook</a>";

The result should be just www.facebook.com/wwwausedu

I tried the following, but it doesn't work:

TEMP = link.substring(link.indexOf("http://")+1, tmp.lastIndexOf("\""));

score 5 · Accepted Answer

You don't want the last index of ", but rather the first index of " after http://:

TEMP = link.substring(link.indexOf("http://")+7, link.indexOf("\"", link.indexOf("http://")));

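String.indexOf(String str, int fromIndex) returns the index of the first occurrence of str at or after the specified index. Also, as @mellamokb the Wise pointed out, you need to add 7 to the index rather than 1, because you want to exclude http:// (7 characters) from the result.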
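The one-liner above assumes both http:// and a closing quote are present; if either is missing, indexOf returns -1 and substring throws. Here is a minimal sketch of the same idea with those cases checked. The class and method names (HrefSubstring, stripScheme) are made up for illustration and are not from the answer.

```java
final class HrefSubstring {
    private static final String SCHEME = "http://";

    // Returns the text between "http://" and the next double quote,
    // or null if either of them cannot be found.
    static String stripScheme(String link) {
        int start = link.indexOf(SCHEME);
        if (start < 0) {
            return null; // no "http://" in the input
        }
        start += SCHEME.length(); // skip the 7 characters of "http://"
        int end = link.indexOf('"', start); // first quote after the scheme
        if (end < 0) {
            return null; // unterminated attribute value
        }
        return link.substring(start, end);
    }

    public static void main(String[] args) {
        String link = "<a href=\"http://www.facebook.com/wwwausedu\" "
                + "target=\"_blank\" class=\"btnFacebook\">Link to Facebook</a>";
        System.out.println(stripScheme(link));
    }
}
```

Searching for the quote starting from the adjusted index also avoids calling indexOf("http://") twice.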

score 3 · Accepted Answer

Why not use a tool designed specifically for parsing HTML, such as jsoup?

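import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String link = "<a href=\"http://www.facebook.com/wwwausedu\" "
        + "target=\"_blank\" class=\"btnFacebook\">Link to Facebook</a>";

// Parse the fragment and read the href attribute of the first <a> element
Document doc = Jsoup.parse(link);
String address = doc.select("a").attr("href");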

This returns http://www.facebook.com/wwwausedu, but we only want the part without the protocol, so now let's use URL:

URL url = new URL(address);
System.out.println(url.getHost() + url.getPath());

Output:

www.facebook.com/wwwausedu
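The host-plus-path step can also be tried on its own, without jsoup, once the href value is in hand. A rough sketch follows, using java.net.URI rather than the URL constructor (which is deprecated in recent JDKs); that swap, and the names HostAndPath and of, are my own choices for illustration and not part of the answer.

```java
import java.net.URI;

final class HostAndPath {
    // Returns host + path of an absolute URL, dropping the scheme.
    static String of(String address) {
        // URI.create throws IllegalArgumentException on malformed input
        URI uri = URI.create(address);
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in: " + address);
        }
        String path = uri.getPath();
        return host + (path == null ? "" : path);
    }

    public static void main(String[] args) {
        System.out.println(of("http://www.facebook.com/wwwausedu"));
    }
}
```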

score 2 · Accepted Answer

Try using a regular expression:

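    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Lazily capture everything between href=" and the next double quote
    Pattern p = Pattern.compile("href=\"(.*?)\"");
    Matcher m = p.matcher(link);
    String url = null;
    if (m.find()) {
        url = m.group(1); // this will give you the URL
    }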

Edit: to strip http:// as well, use the regex "href=\"http://(.*?)\"" instead.
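Put together, the edited pattern might be used like this. It is a sketch only: the class name HrefRegex and the method extract are invented for the example, and the pattern matches http:// links only, exactly as written in the edit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HrefRegex {
    // Group 1 is the href value with the leading "http://" left out.
    private static final Pattern HREF_WITHOUT_SCHEME =
            Pattern.compile("href=\"http://(.*?)\"");

    // Returns the captured part of the first match, or null if there is none.
    static String extract(String link) {
        Matcher m = HREF_WITHOUT_SCHEME.matcher(link);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String link = "<a href=\"http://www.facebook.com/wwwausedu\" "
                + "target=\"_blank\" class=\"btnFacebook\">Link to Facebook</a>";
        System.out.println(extract(link));
    }
}
```

Compiling the pattern once as a constant avoids recompiling it on every call.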
