I'm new to Java, and I want to extract all the URLs from the text below:

WEBSITE1 https://localhost:8080/admin/index.php?page=home
WEBSITE2 https://192.168.0.3:8084/index.php
WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home
WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum

The result I want is:

https://localhost:8080
https://192.168.0.3:8084
https://192.168.0.5
https://192.168.0.1:8080

I'd also like to store the results in a linked list or an array. Can someone show me how? Thank you.

5 Answers

Here is how you can do it. I've done one for you; you do the rest :)

import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;

try {
    ArrayList<String> urls = new ArrayList<String>();
    URL aURL = new URL("https://localhost:8080/admin/index.php?page=home");
    // Join protocol, host and port with their separators, e.g. https://localhost:8080
    String base = aURL.getProtocol() + "://" + aURL.getHost() + ":" + aURL.getPort();
    System.out.println("base URL = " + base);
    urls.add(base);
} catch (MalformedURLException e) {
    // Thrown when the string is not a well-formed URL
    e.printStackTrace();
}
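
To cover all four lines instead of one, the same idea can go in a loop. What follows is only a sketch under a few assumptions: the names `BaseUrlCollector` and `collectBaseUrls` are made up for illustration, every line is assumed to look like `LABEL URL`, and `java.net.URI` is swapped in for `URL` because its accessors do the same job here and the `URL` constructors are deprecated in recent JDKs.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class BaseUrlCollector {

    // Hypothetical helper: returns scheme://host[:port] for every "LABEL URL" line.
    static List<String> collectBaseUrls(String[] lines) {
        List<String> bases = new ArrayList<>();
        for (String line : lines) {
            // Assumption: the URL is the last whitespace-separated token on the line.
            String[] tokens = line.trim().split("\\s+");
            URI uri = URI.create(tokens[tokens.length - 1]);
            String base = uri.getScheme() + "://" + uri.getHost();
            // getPort() returns -1 when the URL carries no explicit port.
            if (uri.getPort() != -1) {
                base += ":" + uri.getPort();
            }
            bases.add(base);
        }
        return bases;
    }

    public static void main(String[] args) {
        String[] lines = {
            "WEBSITE1 https://localhost:8080/admin/index.php?page=home",
            "WEBSITE2 https://192.168.0.3:8084/index.php",
            "WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home",
            "WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum"
        };
        for (String base : collectBaseUrls(lines)) {
            System.out.println(base);
        }
    }
}
```

`URI.create` throws an `IllegalArgumentException` on malformed input, so real data with blank or broken lines would need a check before the call.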
Answered 2013-06-20T13:30:37.433

Assuming line holds a single line of the text (probably inside a loop):

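//get the index of "https://" in the string (assumes the line contains it)
int indexOfHTTPS = line.indexOf("https://");
//get the index of the first "/" after the "https://"
int indexOfFirstSlashAfterHTTPS = line.indexOf("/", indexOfHTTPS + "https://".length());
//if nothing follows the host, take everything up to the end of the line
if (indexOfFirstSlashAfterHTTPS == -1) {
    indexOfFirstSlashAfterHTTPS = line.length();
}

//take the substring from "https" up to, but not including, the first "/"
String url = line.substring(indexOfHTTPS, indexOfFirstSlashAfterHTTPS);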

Later, add this URL to an ArrayList<String>:

ArrayList<String> urlList= new ArrayList<String>();
urlList.add(url);
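
Put together, the loop this answer hints at might look roughly like the sketch below. The class and method names (`IndexOfExtractor`, `extract`) are invented for illustration, the whole text is assumed to arrive as one string with one entry per line, and a `LinkedList` is used only because the question mentions a linked list.

```java
import java.util.LinkedList;
import java.util.List;

public class IndexOfExtractor {

    private static final String PREFIX = "https://";

    // Hypothetical helper: collects "https://host[:port]" from every line that contains one.
    static List<String> extract(String text) {
        List<String> urlList = new LinkedList<>();
        for (String line : text.split("\\r?\\n")) {
            int start = line.indexOf(PREFIX);
            if (start == -1) {
                continue; // no https URL on this line
            }
            // First "/" after the prefix marks the end of host[:port].
            int end = line.indexOf('/', start + PREFIX.length());
            if (end == -1) {
                end = line.length(); // assumption: nothing but the URL follows
            }
            urlList.add(line.substring(start, end));
        }
        return urlList;
    }

    public static void main(String[] args) {
        String text = "WEBSITE1 https://localhost:8080/admin/index.php?page=home\n"
                + "WEBSITE2 https://192.168.0.3:8084/index.php";
        System.out.println(extract(text));
    }
}
```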
Answered 2013-06-20T13:28:39.260

You can do this with the URL class.

    public static void main(String[] args) throws MalformedURLException {
        String string = "https://192.168.0.5:9090/controller/index.php?page=home";
        URL url = new URL(string);
        // Rebuild only the protocol, host and port; the path and query are dropped
        String result = url.getProtocol() + "://" + url.getHost() + ":" + url.getPort();
        System.out.println(result);
    }

Output: https://192.168.0.5:9090
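
One caveat: `getPort()` returns -1 when the URL has no explicit port, so concatenating it blindly would give something like `https://192.168.0.5:-1`. A possible way around that, sketched here with `java.net.URI` and an invented helper name (`baseOf`), is to use the authority component, which already contains the port only when one was written:

```java
import java.net.URI;

public class AuthorityDemo {

    // Hypothetical helper: scheme plus authority (host, and the port if one is present).
    static String baseOf(String address) {
        URI uri = URI.create(address);
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    public static void main(String[] args) {
        // With an explicit port
        System.out.println(baseOf("https://192.168.0.5:9090/controller/index.php?page=home"));
        // Without a port: no ":-1" suffix appears
        System.out.println(baseOf("https://192.168.0.5/index.php"));
    }
}
```

Note that the authority also includes any user-info part (`user@host`), which may or may not be wanted.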
Answered 2013-06-20T13:30:05.503

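You could try finding the index of the protocol substring ("http[s]") in the string, or use a simple Pattern (only to match the "website[0-9]" header, not the URL itself).

Here is a solution using Pattern: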

String[] lines = {
    "WEBSITE1 https://localhost:8080/admin/index.php?page=home",
    "WEBSITE2 https://192.168.0.3:8084/index.php",
    "WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home",
    "WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum"
};
ArrayList<URI> uris = new ArrayList<URI>();
// Match the "WEBSITEn" header and capture everything after the whitespace
Pattern pattern = Pattern.compile("^website\\d+\\s+?(.+)", Pattern.CASE_INSENSITIVE);
for (String line : lines) {
    Matcher matcher = pattern.matcher(line);
    if (matcher.find()) {
        try {
            uris.add(new URI(matcher.group(1)));
        }
        catch (URISyntaxException use) {
            use.printStackTrace();
        }
    }
}
System.out.println(uris);

Output:

[https://localhost:8080/admin/index.php?page=home, https://192.168.0.3:8084/index.php, https://192.168.0.5:9090/controller/index.php?page=home, https://192.168.0.1:8080/home/index.php?page=forum]
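
The list above still holds the full URLs, while the question asks only for the scheme, host and port. One way to trim each `URI`, sketched with an invented helper name (`trim`), is to rebuild it through the seven-argument `URI` constructor and pass `null` for the parts to drop:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

public class UriTrimmer {

    // Hypothetical helper: keeps scheme, host and port; drops user info, path, query and fragment.
    static URI trim(URI full) {
        try {
            return new URI(full.getScheme(), null, full.getHost(), full.getPort(), null, null, null);
        } catch (URISyntaxException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        List<URI> uris = new ArrayList<>();
        uris.add(URI.create("https://localhost:8080/admin/index.php?page=home"));
        uris.add(URI.create("https://192.168.0.5:9090/controller/index.php?page=home"));

        List<URI> trimmed = new ArrayList<>();
        for (URI uri : uris) {
            trimmed.add(trim(uri));
        }
        System.out.println(trimmed);
    }
}
```

A port of -1 (none given) is simply left out of the rebuilt URI, so no special case is needed for it.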
Answered 2013-06-20T13:30:59.317

Use a simple regular expression to find whatever starts with https?:// and capture everything up to the first /:

Matcher m = Pattern.compile("(https?://[^/]+)").matcher(//
        "WEBSITE1 https://localhost:8080/admin/index.php?page=home\r\n" + //
        "WEBSITE2 https://192.168.0.3:8084/index.php\r\n" + //
        "WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home\r\n" + //
        "WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum");
List<String> urls = new ArrayList<String>();
while (m.find()) {
    urls.add(m.group(1));
}
System.out.println(urls);

Now, if what you actually want is only the WEBSITE. part, just replace the regex "(https?://[^/]+)" with "(.*?)\\s+https?". The rest of the code stays the same.
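
If both pieces are wanted at once, the two regexes can be combined into one with two capture groups. This is only a sketch: the class name `LabelledUrls`, the method `parse`, and the choice of a `LinkedHashMap` (to keep input order) are assumptions, not part of the answer above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabelledUrls {

    // Group 1: the label at the start of a line; group 2: scheme://host[:port].
    private static final Pattern LINE =
            Pattern.compile("^(\\S+)\\s+(https?://[^/\\s]+)", Pattern.MULTILINE);

    // Hypothetical helper: maps each label to the base URL that follows it.
    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "WEBSITE1 https://localhost:8080/admin/index.php?page=home\r\n"
                + "WEBSITE2 https://192.168.0.3:8084/index.php";
        System.out.println(parse(text));
    }
}
```

`Pattern.MULTILINE` lets `^` match at the start of every line, so the whole text can be scanned in one pass without splitting it first.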

Answered 2013-06-20T13:31:23.760