
I'm having a problem splitting a string.

Here is my code:

    Document doc = null;
    String name = "MasterEjzz";

    try {
        doc = Jsoup.connect("http://oc.tc/forums").userAgent("Mozilla").get();
    } catch (IOException e) {
        //e.printStackTrace();
        System.out.print("Seems like something went wrong! Are you connected to the internet?");
    }

    Elements content = doc.getElementsByClass("topic");
    Elements post = content.select("div");
    for (Element a : content.select("div")) {
        Elements href = a.select("a");
        for (Element link : href) {
            String links = link.attr("abs:href");
            String[] b = links.split("https");
            System.out.print(links);
            //System.out.print(b.toString());
        }
    }

I want to split the string links on the word https, but when I do, b[0] returns nothing and b[1] throws an OutOfBounds exception. This is what the string links returns:

https://oc.tc/forums/topics/523b038faf7fb046f700255dhttps://oc.tc/tjandralalahttps://oc.tc/forums/posts/523b2131af7fb0557a002882https://oc.tc/forums/posts/523b2131af7fb0557a002882https://oc.tc/forums/topics/51d1cb3cba6087dd20003a35https://oc.tc/ENSIONMANhttps://oc.tc/forums/posts/523b2117af7fb030690027f7https://oc.tc/forums/posts/523b2117af7fb030690027f7https://oc.tc/forums/topics/519c1971a87858d604004c3ahttps://oc.tc/MadCreeper77https://oc.tc/forums/posts/523b20bdaf7fb010920026a2https://oc.tc/forums/posts/523b20bdaf7fb010920026a2https://oc.tc/forums/topics/51ff8a7daf7fb0053a001fe0https://oc.tc/MadCreeper77https://oc.tc/forums/posts/523b1f8aaf7fb001bf002756https://oc.tc/forums/posts/523b1f8aaf7fb001bf002756https://oc.tc/forums/topics/5237d369af7fb0b81600038dhttps://oc.tc/zacharycraft777https://oc.tc/forums/posts/523b1f72af7fb033ab00259fhttps://oc.tc/forums/posts/523b1f72af7fb033ab00259fhttps://oc.tc/forums/topics/523a5de9af7fb062c7001cf1https://oc.tc/lonelyhornethttps://oc.tc/forums/posts/523b1de7af7fb074e0002416https://oc.tc/forums/posts/523b1de7af7fb074e0002416https://oc.tc/forums/topics/5238ff9aaf7fb001bf0000cahttps://oc.tc/lonelyhornethttps://oc.tc/forums/posts/523b1d2baf7fb02dbc0025echttps://oc.tc/forums/posts/523b1d2baf7fb02dbc0025echttps://oc.tc/forums/topics/5235f53baf7fb04c5100170ehttps://oc.tc/Kevinthedude2000https://oc.tc/forums/posts/523b1b69af7fb01783002714https://oc.tc/forums/posts/523b1b69af7fb01783002714https://oc.tc/forums/topics/522bcb94af7fb05fdc000ec8https://oc.tc/skippy369https://oc.tc/forums/posts/523b19cfaf7fb06378002384https://oc.tc/forums/posts/523b19cfaf7fb06378002384https://oc.tc/forums/topics/523aebe7af7fb0dafa0024d7https://oc.tc/MrAmazing1337https://oc.tc/forums/posts/523b1867af7fb01dde0028e6https://oc.tc/forums/posts/523b1867af7fb01dde0028e6https://oc.tc/forums/topics/523b0f8caf7fb0a8240022e2https://oc.tc/Eulenspielerhttps://oc.tc/forums/posts/523b185daf7fb0dafa002822https://oc.tc/forums/posts/523b185daf7fb0dafa002822https://oc.tc/forum
s/topics/5239058daf7fb06708000191https://oc.tc/ENSIONMANhttps://oc.tc/forums/posts/523b1787af7fb01092002585https://oc.tc/forums/posts/523b1787af7fb01092002585https://oc.tc/forums/topics/52388f49af7fb0413f000e7dhttps://oc.tc/zacharycraft777https://oc.tc/forums/posts/523b1701af7fb02bf300283chttps://oc.tc/forums/posts/523b1701af7fb02bf300283chttps://oc.tc/forums/topics/5237b7d7af7fb0440f0001b9https://oc.tc/ENSIONMANhttps://oc.tc/forums/posts/523b14a8af7fb0ccc5002285https://oc.tc/forums/posts/523b14a8af7fb0ccc5002285https://oc.tc/forums/topics/5237f69daf7fb040dc0006d1https://oc.tc/ENSIONMANhttps://oc.tc/forums/posts/523b141eaf7fb0c73b002647https://oc.tc/forums/posts/523b141eaf7fb0c73b002647https://oc.tc/forums/topics/51bd5e6eba6087d4e60020efhttps://oc.tc/iLiftinghttps://oc.tc/forums/posts/523b1413af7fb0ccc5002270https://oc.tc/forums/posts/523b1413af7fb0ccc5002270https://oc.tc/forums/topics/51de6b74af7fb0a091004b40https://oc.tc/Haxasauroushttps://oc.tc/forums/posts/523b138eaf7fb0fbf9002313https://oc.tc/forums/posts/523b138eaf7fb0fbf9002313https://oc.tc/forums/topics/5196b74ca87858886a003a43https://oc.tc/Shadowbladzhttps://oc.tc/forums/posts/523b1201af7fb056ab002297https://oc.tc/forums/posts/523b1201af7fb056ab002297https://oc.tc/forums/topics/523b0f4daf7fb01dde002803https://oc.tc/1234notty1234https://oc.tc/forums/posts/523b0f4daf7fb01dde002802https://oc.tc/forums/posts/523b0f4daf7fb01dde002802https://oc.tc/forums/topics/52281f11af7fb0e4ed00423dhttps://oc.tc/Eldnickhttps://oc.tc/forums/posts/523b0f1caf7fb046f7002681https://oc.tc/forums/posts/523b0f1caf7fb046f7002681

4 Answers


Use the StringUtils.splitByWholeSeparatorPreserveAllTokens method (StringUtils comes from Apache Commons Lang):

String https = "https://oc.tc/Haxasauroushttps://oc.tc/Haxasauroushttps://oc.tc/Haxasauroushttps://oc.tc/Haxasauroushttps://oc.tc/Haxasauroushttps://oc.tc/Haxasauroushttps://oc.tc/Haxasauroushttps://oc.tc/Haxasauroushttps://oc.tc/Haxasauroushttps://oc.tc/Haxasauroushttps://oc.tc/Haxasaurous";

// StringUtils is org.apache.commons.lang3.StringUtils (Apache Commons Lang).
String[] strArr = StringUtils.splitByWholeSeparatorPreserveAllTokens(https, "https");
for (String str : strArr) {
    System.out.println(str);
}
answered 2013-09-19T16:21:45.640

Perhaps you are not checking whether the call to .attr() returns null or "":

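    String links = link.attr("abs:href");
    if (links != null && !links.isEmpty()) {
        String[] b = links.split("https");

        for (String path : b) {
            // Skip the empty piece produced when the string starts with "https".
            if (!path.isEmpty()) {
                System.out.println(path); // print the piece, not the Element
            }
        }
    }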

When you split "abcdabceabcf" on "abc", you get the array ["", "d", "e", "f"].
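A minimal sketch of that behaviour, using only String.split. The class name SplitDemo and the helper parts are illustrative names, not part of the original post. It also covers the empty-string input that the check above guards against:

```java
import java.util.Arrays;

public class SplitDemo {
    // Thin wrapper around String.split so the result can be inspected elsewhere.
    // Note that the separator is interpreted as a regular expression.
    static String[] parts(String input, String separator) {
        return input.split(separator);
    }

    public static void main(String[] args) {
        // A separator at the very start leaves an empty string in slot 0.
        System.out.println(Arrays.toString(parts("abcdabceabcf", "abc")));
        // Splitting the empty string gives a one-element array,
        // so asking for index 1 would be out of bounds.
        System.out.println(parts("", "https").length);
    }
}
```

If links is ever empty, b[0] is therefore the empty string and b[1] does not exist, which would match the symptoms described in the question.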

answered 2013-09-19T16:22:20.207

I used the following code to test your problem:

import java.util.Arrays;

public class Test {
    public static void main(String[] args) {
        String str = "https://oc.tc/forums/topics/523b038faf7fb046f700255dhttps://oc.tc/tjandralala";
        String[] split = str.split("https");
        System.out.println(Arrays.toString(split));
    }
}

I get the following output: [, ://oc.tc/forums/topics/523b038faf7fb046f700255d, ://oc.tc/tjandralala]

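The first element is not nothing. It is the empty string (λ). Basically, Java treats your string as if it looked like λhttps://oc.tc/forums/topics/523b038faf7fb046f700255d. So the first piece of the split is the empty string, followed by ://...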

If you need the https to be included, try using StringUtils.
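As an alternative to StringUtils, one way to keep the https prefix on every piece with plain String.split is a zero-width lookahead. This is only a sketch: the class and method names are made up for illustration, it assumes Java 8 or later (where a zero-width match at the start of the input produces no leading empty piece), and it would also split inside a URL that happens to contain the text "https" further along.

```java
import java.util.ArrayList;
import java.util.List;

public class KeepSeparatorDemo {
    // Splits before every occurrence of "https" so each piece keeps its prefix.
    static List<String> splitKeepingHttps(String joined) {
        List<String> urls = new ArrayList<>();
        // (?=https) is a zero-width lookahead: it matches a position, not characters,
        // so nothing is removed from the input.
        for (String piece : joined.split("(?=https)")) {
            if (!piece.isEmpty()) { // defensively skip any empty piece
                urls.add(piece);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String joined = "https://oc.tc/forums/topics/523b038faf7fb046f700255dhttps://oc.tc/tjandralala";
        for (String url : splitKeepingHttps(joined)) {
            System.out.println(url);
        }
    }
}
```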

answered 2013-09-19T16:23:54.817

When run against the host address shown in the question, JSoup returns the following for links:

https://oc.tc/forums/topics/523b25afaf7fb08f4b002479

That is, each links value holds a single URL, so splitting it on "https" yields at most one non-empty element.

Solution: check the array's length before trying to access a specific element, or iterate over the returned elements.
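A minimal sketch of the length check, in plain Java with no JSoup dependency. The class SafeSplitDemo, the method pieceOrDefault, and the "(none)" fallback are hypothetical names chosen for illustration:

```java
public class SafeSplitDemo {
    // Returns the piece at the requested index, or the fallback
    // when the split produced too few elements.
    static String pieceOrDefault(String input, String separator, int index, String fallback) {
        String[] pieces = input.split(separator);
        return (index >= 0 && index < pieces.length) ? pieces[index] : fallback;
    }

    public static void main(String[] args) {
        String single = "https://oc.tc/forums/topics/523b25afaf7fb08f4b002479";
        // A normal URL: the piece after the leading "https" is available.
        System.out.println(pieceOrDefault(single, "https", 1, "(none)"));
        // An empty href value: there is no index 1, so the fallback is used
        // instead of throwing ArrayIndexOutOfBoundsException.
        System.out.println(pieceOrDefault("", "https", 1, "(none)"));
    }
}
```

Guarding the index this way keeps the loop over the anchors running even when one of them has no usable href.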

answered 2013-09-19T16:32:10.830