java - Stripping data from a text file

Question

I'll start by posting the data from the text file. This is just 4 lines of it; the actual file is a few hundred lines long.

Friday, September 9, 2011
-STV 101--------05:00 - 23:59 SSB 4185 Report printed on September 8, 2011 at 2:37

0-AH 104--------07:00 - 23:00 AH GYM Report printed on September 8, 2011 at 2:37

-BG 105--------07:00 - 23:00 SH GREAT HALL Report printed on September 8, 2011 at 2:37

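What I want to do with this text file is ignore the first line, which has the date on it, then ignore the "-" on the next line but read in "STV 101", "5:00" and "23:59" and save them to variables, then ignore all the other characters on that line, and so on for every line after that.

This is how I currently read the lines in full. I call this function once the user has put the path into the scheduleTxt JTextField. It reads and prints every line fine.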

public void readFile() throws IOException
{
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader br = new BufferedReader(new FileReader(scheduleTxt.getText())))
    {
        String strLine;

        // Read the file line by line and echo each line
        while ((strLine = br.readLine()) != null)
        {
            System.out.println(strLine);
        }
    }
    catch (IOException e) { // Report any I/O failure
        System.err.println("Error: " + e.getMessage());
    }
}

Update: it turns out I also need to strip "Friday" from the top line and put it in a variable. Thanks! Beef.

score 3 · Accepted Answer

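I haven't tested it thoroughly, but this regular expression captures the information you need in groups 2, 5 and 7 (assuming that in the "0-AH 104----" example you are only interested in "AH 104"): ^(\S)*-(([^-])*)(-)+((\S)+)\s-\s((\S)+)\s(.)*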

    // Needs java.util.regex.Pattern and java.util.regex.Matcher.
    String regex = "^(\\S)*-(([^-])*)(-)+((\\S)+)\\s-\\s((\\S)+)\\s(.)*";
    Pattern pattern = Pattern.compile(regex);
    while ((strLine = br.readLine()) != null) {
        Matcher matcher = pattern.matcher(strLine);
        // Lines that do not fit the pattern (such as the date line) are skipped.
        if (matcher.find()) {
            String s1 = matcher.group(2); // e.g. "STV 101"
            String s2 = matcher.group(5); // start time
            String s3 = matcher.group(7); // end time
            System.out.println(s1 + " " + s2 + " " + s3);
        }
    }

The expression can be adjusted with non-capturing groups so that it captures only the information you want.
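As a rough illustration of that adjustment, the sketch below turns every group that is not needed into a non-capturing `(?:...)` group, so the three wanted values land in groups 1, 2 and 3. The class name `ScheduleLineParser` and the method `parse` are made up for this example, and the pattern has only been checked against the sample lines in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScheduleLineParser {
    // Same structure as the original expression; only the three wanted
    // pieces remain capturing groups, everything else is (?:...).
    private static final Pattern LINE = Pattern.compile(
            "^(?:\\S)*-((?:[^-])*)(?:-)+((?:\\S)+)\\s-\\s((?:\\S)+)\\s(?:.)*");

    /** Returns {code, start, end}, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null; // e.g. the date line at the top of the file
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("-STV 101--------05:00 - 23:59 SSB 4185");
        System.out.println(parts[0] + " " + parts[1] + " " + parts[2]);
    }
}
```

Since each of these non-capturing groups wraps a single token, the parentheses could also be dropped entirely (for example `\S*` instead of `(?:\S)*`); they are kept here only to mirror the original expression.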

Explanation of the regular expression elements:

1. ^(\S)*- matches a run of non-whitespace characters ending in -. Note: this will not match if there is whitespace before the first -. ^(.)*- might look like an alternative, but a greedy .* runs past the first - and group 2 can then come out empty; a reluctant ^(.)*?- avoids that.
2. (([^-])*) matches a run of any characters other than -. This is captured in group 2.
3. (-)+ matches one or more -.
4. ((\S)+) matches one or more non-whitespace characters. This is captured in group 5.
5. \s-\s matches a whitespace character, then -, then another whitespace character.
6. ((\S)+) is the same as 4. This is captured in group 7.
7. \s(.)* matches a whitespace character followed by anything, which is skipped.
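One way to check this breakdown is to print every group for a single sample line. The sketch below does that with the original expression; `GroupDump` and `groups` are names invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDump {
    // The expression from the answer, unchanged.
    static final Pattern ORIGINAL = Pattern.compile(
            "^(\\S)*-(([^-])*)(-)+((\\S)+)\\s-\\s((\\S)+)\\s(.)*");

    /** Returns groups 0..groupCount() for the line, or null if it does not match. */
    static String[] groups(String line) {
        Matcher m = ORIGINAL.matcher(line);
        if (!m.find()) {
            return null;
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i < out.length; i++) {
            out[i] = m.group(i); // may be null if the group took no part in the match
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("0-AH 104--------07:00 - 23:00 AH GYM");
        for (int i = 1; i < g.length; i++) {
            System.out.println(i + ": " + g[i]);
        }
    }
}
```

The expression has nine groups in total. The inner ones, such as `(\S)` inside `((\S)+)`, are repeated and therefore keep only their last repetition, which is why only the outer groups 2, 5 and 7 are useful here.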

More information on regular expressions can be found in this tutorial. There are also several useful cheat sheets around. When designing or debugging an expression, a regexp testing tool can prove quite useful, too.
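The update in the question also asks for the day name from the top line, which the expression above does not cover. A minimal sketch, assuming the top line begins with an English weekday name such as "Friday" (the exact layout of that line is an assumption, and `HeaderDay` and `dayOf` are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeaderDay {
    // Assumption: the header line starts with a full English weekday name.
    private static final Pattern DAY = Pattern.compile(
            "^\\s*(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\\b");

    /** Returns the weekday at the start of the line, or null if there is none. */
    static String dayOf(String headerLine) {
        Matcher m = DAY.matcher(headerLine);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(dayOf("Friday, September 9, 2011"));
    }
}
```

In the reading loop, `dayOf` could be tried on each line that the schedule pattern rejects; a non-null result marks a date line.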
