
While parsing HTML, every time I encounter a '>' character I need to check whether it is followed by a number. The number can be 1, 2, or 3 digits long.

The code looks fine to me, but I always get a StringIndexOutOfBoundsException.

Code:

while (matches < 19) 
    {
        more  = dataInHtml.indexOf(">",index);
        nextOne = dataInHtml.charAt(more + 1);
        nextTwo = dataInHtml.charAt(more + 2);
        nextThree = dataInHtml.charAt(more + 3);

        if (Character.isDigit(nextOne))  digitOne = true;
        if (Character.isDigit(nextTwo))  digitTwo = true;       
        if (Character.isDigit(nextThree))  digitThree = true;

        if (digitThree)
        {
            data[matches] = dataInHtml.substring(more + 1, 3);
            matches++;
            digitThree = false;
            digitTwo = false;
            digitOne = false;
            index = more + 3;
            itWasADigit = true;
        }

        if (digitTwo)
        {
            data[matches] = dataInHtml.substring(more + 1, 2);
            matches++;
            digitTwo = false;
            digitOne = false;
            index = more + 2;
            itWasADigit = true;
        }           

        if (digitOne)
        {
            data[matches] = dataInHtml.substring(more + 1, 1);
            matches++;
            digitOne = false;
            index = more + 1;
            itWasADigit = true;
        }           

        if (!(itWasADigit))    
        {
            index = more + 1;
            itWasADigit = false;
        }
    }

1 Answer


Consider what the code does if you pass it the string "string >12":

more = dataInHtml.indexOf(">", index);
nextOne = dataInHtml.charAt(more + 1);   // gets the '1'
nextTwo = dataInHtml.charAt(more + 2);   // gets the '2'
nextThree = dataInHtml.charAt(more + 3); // more + 3 is past the last index of the string, so charAt throws

That is why you see the StringIndexOutOfBoundsException.

Use something like this:

if(dataInHtml.length() > more+3) 

to check that the string is long enough before you try to access that character.
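
As a minimal sketch of that length check applied to the whole loop, something along these lines may work. The class name `DigitRunExtractor` and the method `extract` are hypothetical, and a `List` stands in for the `data` array from the question. The sketch also guards against `indexOf` returning -1 when no further '>' exists, and it passes an end index to `substring`, since the second argument of `substring` is an end index rather than a length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original post.
class DigitRunExtractor {

    // Collects up to maxMatches runs of 1 to 3 digits that directly follow a '>'.
    static List<String> extract(String dataInHtml, int maxMatches) {
        List<String> data = new ArrayList<>();
        int index = 0;
        while (data.size() < maxMatches) {
            int more = dataInHtml.indexOf('>', index);
            if (more == -1) {
                break; // no further '>' in the input
            }
            int start = more + 1;
            int end = start;
            // Check the length before every charAt call, and stop after 3 digits.
            while (end < dataInHtml.length()
                    && end - start < 3
                    && Character.isDigit(dataInHtml.charAt(end))) {
                end++;
            }
            if (end > start) {
                // substring takes (beginIndex, endIndex), not (beginIndex, length).
                data.add(dataInHtml.substring(start, end));
            }
            // end is always at least more + 1, so the search keeps moving forward.
            index = end;
        }
        return data;
    }

    public static void main(String[] args) {
        System.out.println(extract("<td>7</td><td>42</td><td>123</td>", 19));
    }
}
```

With the input "string >12" from above, the inner loop stops at the end of the string instead of reading past it.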

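If you are trying to read numbers out of an HTML document, this may not be the ideal approach. If possible, consider parsing the document with a proper parser.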

http://jsoup.org/ looks promising.

answered 2013-02-08T15:46:54.513