java - How to extract text from a BufferedReader using substring

Question

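I'm trying to extract the text between two tags using substring and a BufferedReader, but I get an IndexOutOfBoundsException. The if statement is there because I'm parsing 5 web pages and want to read the text from each of them. Here is my code: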

    public static List<WebPage> readRawTextFile(Context ctx, int resId) {
    InputStream inputStream = ctx.getResources().openRawResource(
            R.raw.pages);

    InputStreamReader inputreader = new InputStreamReader(inputStream);
    BufferedReader buffreader = new BufferedReader(inputreader);
    String line;
    StringBuilder text = new StringBuilder();
    String txt1 = text.toString();
    try {
        int count = 0;
        while ((line = buffreader.readLine()) != null) {

            if (line.length() == 0) {
                int sURL = line.indexOf("<!--");
                int eURL = line.indexOf("-->");
                String newSub = txt1.substring(txt1.indexOf(sURL) + 1,
                        txt1.indexOf("\""));
                System.out.println(newSub);
            }

score 3 · Accepted Answer

Look at this code:

if (line.length() == 0) {
    int sURL = line.indexOf("<!--");
    int eURL = line.indexOf("-->");
    String newSub = txt1.substring(txt1.indexOf(sURL) + 1,
            txt1.indexOf("\""));
    ...
}

You only enter that block if the line is empty, so sURL and eURL will definitely both be -1.

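Then you call txt1.indexOf(-1), which is odd to start with (why are you calling indexOf and passing in an index?) - I strongly suspect that both indexOf calls here will return -1, so you'll have: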

String newSub = txt1.substring(0, -1);

...which will fail. It's also unclear why you're using txt1.substring here rather than line.substring.
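To make the failure concrete, here is a minimal sketch that repeats the question's calls on an empty line. The class and method names (IndexOfDemo, describe) are hypothetical and exist only for this illustration; txt1 is assumed to be empty because it is taken from a StringBuilder before anything was appended.

```java
class IndexOfDemo {
    // Repeats the question's index arithmetic on an empty line and reports the outcome.
    static String describe() {
        String line = "";                    // the only kind of line that passes line.length() == 0
        String txt1 = "";                    // assumed empty: text.toString() ran before any append
        int sURL = line.indexOf("<!--");     // -1, the marker cannot be in an empty line
        int start = txt1.indexOf(sURL) + 1;  // indexOf(int) searches for a character code: -1 + 1 == 0
        int end = txt1.indexOf("\"");        // -1, no quote in an empty string
        try {
            txt1.substring(start, end);
            return "no exception";
        } catch (StringIndexOutOfBoundsException e) {
            return "StringIndexOutOfBoundsException for substring(" + start + ", " + end + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```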

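Basically, I think there's a lot wrong with your code. You should look at each line very carefully and change it until it really makes sense. Then add unit tests...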
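One possible shape for a version that does make sense is sketched below. It is not the asker's final code: it assumes both markers sit on the same line, it reads from any Reader instead of an Android raw resource, and the names CommentExtractor, extractBetween and extractAll are hypothetical. The key changes are working on line rather than txt1, checking each index before calling substring, and skipping the full length of the opening marker.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CommentExtractor {
    static final String START = "<!--";
    static final String END = "-->";

    // Returns the trimmed text between START and END on one line,
    // or null if either marker is missing.
    static String extractBetween(String line) {
        int start = line.indexOf(START);
        if (start < 0) {
            return null;
        }
        int contentStart = start + START.length(); // skip the whole marker, not just one char
        int end = line.indexOf(END, contentStart);
        if (end < 0) {
            return null;
        }
        return line.substring(contentStart, end).trim();
    }

    // Reads the source line by line and collects every extracted text.
    static List<String> extractAll(Reader source) {
        List<String> found = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String text = extractBetween(line);
                if (text != null) {
                    found.add(text);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "<html>\n<!-- http://example.com/page1 -->\n\n<p>no marker</p>\n<!--second-->\n";
        System.out.println(extractAll(new StringReader(sample)));
    }
}
```

Because extractBetween takes a plain String, it can be unit tested without a reader or an Android Context, which is the kind of test the answer recommends adding.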

score 0 · Accepted Answer

Since sURL is already an index, obtained from

int sURL = txt1.indexOf("<!--");

calling txt1.indexOf(sURL) again in the line

String newSub = txt1.substring(txt1.indexOf(sURL) + 1, txt1.indexOf("\""));

makes little sense. You probably meant:

String newSub = txt1.substring(sURL + 1, txt1.indexOf("\""));

That only leaves the mystery of why you then use txt1.indexOf("\"").
