java - Trying to extract a substring from a BufferedReader, reading between certain tags

Question

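I'm using a BufferedReader to read 5 web pages, each separated by a blank line, and I want to use substring to extract each page's URL, HTML, source and date. I need some guidance on how to use substring correctly to do this. Cheers.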

public static List<WebPage> readRawTextFile(Context ctx, int resId) {   

    InputStream inputStream = ctx.getResources().openRawResource(
            R.raw.pages);

    InputStreamReader inputreader = new InputStreamReader(inputStream);
    BufferedReader buffreader = new BufferedReader(inputreader);
    String line;
    StringBuilder text = new StringBuilder();

    try {
        while ((line = buffreader.readLine()) != null) {


            if (line.length() == 0) {
                // ignore for now
                // will be used when a blank line is encountered
            }

            if (line.length() != 0) {
                // here I want the substring to pull out the correct strings
                int sURL = line.indexOf("<!--");
                int eURL = line.indexOf("-->");
                line.substring(sURL, eURL);
                // Problem is here
            }
        }
    } catch (IOException e) {
        return null;

    }
    return null;
}
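One way the loop could be completed, as a minimal sketch under loud assumptions: each page is a run of non-blank lines ended by a blank line, and every field sits on its own line as <!--Key:value--> (a format guessed from the sample string in the answers, not confirmed by the question). To stay runnable outside Android it reads from a java.io.Reader instead of ctx.getResources().openRawResource(...), and it stores each page as a Map because the fields of the question's WebPage class are not shown. PageReader and readPages are hypothetical names. The key fix is that substring() returns a new String and never changes line, so its result has to be assigned.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PageReader {
    private static final String OPEN = "<!--";
    private static final String CLOSE = "-->";

    // ASSUMPTION (hypothetical format): pages are separated by blank lines and
    // each field is a line like "<!--Key:value-->"; each page becomes one map.
    public static List<Map<String, String>> readPages(Reader source) {
        List<Map<String, String>> pages = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    // A blank line ends the current page.
                    if (!current.isEmpty()) {
                        pages.add(current);
                        current = new LinkedHashMap<>();
                    }
                    continue;
                }
                int start = line.indexOf(OPEN);
                int end = line.indexOf(CLOSE, start + OPEN.length());
                if (start < 0 || end < 0) {
                    continue; // not a tagged line; skip it
                }
                // Keep the result: substring() does not modify 'line'.
                String inner = line.substring(start + OPEN.length(), end);
                int colon = inner.indexOf(':');
                if (colon > 0) {
                    current.put(inner.substring(0, colon), inner.substring(colon + 1));
                }
            }
        } catch (IOException e) {
            // Surface the failure instead of silently returning null.
            throw new UncheckedIOException(e);
        }
        if (!current.isEmpty()) {
            pages.add(current); // the last page may not be followed by a blank line
        }
        return pages;
    }

    public static void main(String[] args) {
        String text = "<!--Address:google.co.uk.html-->\n<!--Date:2012-03-01-->\n\n"
                + "<!--Address:bbc.co.uk.html-->\n";
        System.out.println(readPages(new StringReader(text)));
    }
}
```

With the Android resource you would pass new InputStreamReader(ctx.getResources().openRawResource(R.raw.pages)) and copy each map into a WebPage.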

score 1 · Accepted Answer

I think what you want is something like this:

public class Test {
   public static void main(String args[]) {
    String text = "<!--Address:google.co.uk.html-->";
    String converted1 = text.replaceAll("\\<!--", "");
    String converted2 = converted1.replaceAll("\\-->", "");
    System.out.println(converted2);
   }

}

Output: Address:google.co.uk.html
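Since the markers are fixed text rather than patterns, String.replace(CharSequence, CharSequence), which replaces literally and does not interpret regex syntax, does the same job without any escaping. A minimal variant of the answer above; StripMarkers and removeMarkers are hypothetical names.

```java
public class StripMarkers {
    // Removes the literal "<!--" and "-->" markers; replace() treats its arguments as plain text.
    public static String removeMarkers(String text) {
        return text.replace("<!--", "").replace("-->", "");
    }

    public static void main(String[] args) {
        System.out.println(removeMarkers("<!--Address:google.co.uk.html-->"));
    }
}
```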

score 0 · Accepted Answer

In the catch block, don't just return null; call e.printStackTrace(); instead. It will help you find out whether something went wrong.
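A small sketch of that advice, using try-with-resources so the reader is always closed; SafeRead and readLines are hypothetical names. Returning the lines read so far (never null) is a design choice of this sketch, not something the answer specifies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SafeRead {
    // Reads all lines; on failure prints the stack trace instead of silently returning null.
    public static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            e.printStackTrace(); // shows what went wrong and where
        }
        return lines; // whatever was read before any failure; never null
    }

    public static void main(String[] args) {
        System.out.println(readLines(new StringReader("<!--Address:google.co.uk.html-->\n<!--Date:2012-03-01-->")));
    }
}
```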

        String str1 = "<!--Address:google.co.uk.html-->";
        // Approach 1
        int st = str1.indexOf("<!--"); // index of the '<' that starts "<!--"
        int en = str1.indexOf("-->");  // index of the '-' that starts "-->"
        str1 = str1.substring(st + 4, en); // +4 skips the four characters of "<!--"
        System.out.println(str1);

        // Approach 2
        String str2 = "<!--Address:google.co.uk.html-->";
        str2 = str2.replaceAll("[<>!-]", ""); // removes every <, >, ! and - in the string
        System.out.println(str2);
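Approach 1 assumes both markers are present. In the question's code, indexOf returns -1 when a marker is missing, and substring then throws StringIndexOutOfBoundsException; passing sURL directly would also keep the <!-- prefix in the result. A minimal sketch of a guarded helper, using open.length() instead of the hard-coded 4; the name Between.between is hypothetical.

```java
import java.util.Optional;

public class Between {
    // Returns the text between the first 'open' and the next 'close' after it,
    // or Optional.empty() if either marker is missing.
    public static Optional<String> between(String text, String open, String close) {
        int start = text.indexOf(open);
        if (start < 0) {
            return Optional.empty(); // substring(-1, ...) would throw
        }
        start += open.length(); // skip the marker itself
        int end = text.indexOf(close, start);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(text.substring(start, end));
    }

    public static void main(String[] args) {
        System.out.println(between("<!--Address:google.co.uk.html-->", "<!--", "-->").orElse("(no match)"));
        System.out.println(between("plain text", "<!--", "-->").orElse("(no match)"));
    }
}
```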

Note: replaceAll takes a regular expression and replaces every match of it anywhere in the string, not just the markers.
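To illustrate that caveat: with the character class [<>!-], hyphens that belong to the value are deleted too, which matters for something like a date. A capture group with java.util.regex.Pattern extracts only the text between the markers. This is a swapped-in technique that neither answer uses, and CommentExtractor is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentExtractor {
    // Reluctant (.*?) stops at the first "-->", so several comments on one line are matched separately.
    private static final Pattern COMMENT = Pattern.compile("<!--(.*?)-->");

    public static List<String> extractAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = COMMENT.matcher(text);
        while (m.find()) {
            found.add(m.group(1)); // group 1 is the text between the markers
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "<!--Date:2012-03-01-->";
        // The character-class approach also deletes the hyphens inside the date.
        System.out.println(line.replaceAll("[<>!-]", ""));
        System.out.println(extractAll(line));
    }
}
```

The reluctant .*? matters when one line holds more than one comment; a greedy .* would run on to the last -->.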
