
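I have a Java program that goes through a file line by line and tries to match each line against one of four regular expressions. Depending on which expression matched, the program performs a particular action. Here is what I have: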

private void processFile(ArrayList<String> lines) {
    ArrayList<Component> Components = new ArrayList<>();
    Pattern pattern = Pattern.compile(
            "Object name\\.{7}: (.++)|"
            + "\\{CAT=([^\\}]++)\\}|"
            + "\\{CODE=([^\\}]++)\\}|"
            + "\\{DESC=([^\\}]++)\\}");

    Matcher matcher;
    // Go through each line and see whether it matches any of the regexes
    // defined above
    Component currentComponent = null;

    for (String line : lines) {
        matcher = pattern.matcher(line);

        if (matcher.find()) {
            // We found a tag. Find out which one
            String match = matcher.group();

            if (match.startsWith("Obj")) {
                // We've got the object name
                if (currentComponent != null) {
                    Components.add(currentComponent);
                }
                currentComponent = new Component();
                currentComponent.setName(matcher.group(1));
            } else if (currentComponent != null) {
                if (match.startsWith("{CAT")) {
                    currentComponent.setCategory(matcher.group(2));
                } else if (match.startsWith("{CODE")) {
                    currentComponent.setOrderCode(matcher.group(3));
                } else if (match.startsWith("{DESC")) {
                    currentComponent.setDescription(matcher.group(4));
                }
            }
        }
    }

    if (currentComponent != null) {
        Components.add(currentComponent);
    }
}

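As you can see, I have combined the four regular expressions into one and apply the whole regex to each line. If a match is found, I check the beginning of the matched string to determine which expression matched, and then extract the data from the corresponding group. In case anyone is interested in running the code, here is some sample data: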

Object name.......: PMF3800SN
Last modified.....: Wednesday 9 November 2011 11:55:04 AM
File offset (hex).: 00140598 (Hex).
Checksum (hex)....: C1C0 (Hex).
Size (bytes)......: 1,736
Properties........: {*DEVICE}
                    {PREFIX=Q}
                    {*PROPDEFS}
                    {PACKAGE="PCB Package",PACKAGE,1,SOT-323 MOSFET}
                    {*INDEX}
                    {CAT=Transistors}
                    {SUBCAT=MOSFET}
                    {MFR=NXP}
                    {DESC=N-channel TrenchMOS standard level FET with ESD protection}
                    {CODE=1894711}
                    {*COMPONENT}

                    {PACKAGE=SOT-323 MOSFET}
                    *PINOUT SOT-323 MOSFET
                    {ELEMENTS=1}
                    {PIN "D" = D}
                    {PIN "G" = G}
                    {PIN "S" = S}

Although my code works, I don't like the fact that I repeat parts of the pattern strings later on in the startsWith calls.

I would be interested to see how others would write this.

Amir


3 Answers


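group(n) returns null for a group that did not take part in the match. So you can wrap each subexpression in its own group and check those groups for null after matching: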

Pattern pattern = Pattern.compile(
         "(Object name\\.{7}: (.++))|"
         + "(\\{CAT=([^\\}]++)\\})|"
         + "(\\{CODE=([^\\}]++)\\})|"
         + "(\\{DESC=([^\\}]++)\\})");
...
// The outer groups are 1, 3, 5, 7; the captured values move to groups 2, 4, 6, 8.
if (matcher.group(1) != null) { // Object ...
    ...
} ...

Actually, if your subexpressions contain no |s, you can even do this with your existing groups.
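
To illustrate the null check on the question's unmodified pattern, here is a minimal sketch. The class and method names (GroupNullDemo, matchedGroup) are made up for this example and are not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNullDemo {
    // Same pattern as in the question: one capturing group per alternative.
    static final Pattern PATTERN = Pattern.compile(
            "Object name\\.{7}: (.++)|"
            + "\\{CAT=([^\\}]++)\\}|"
            + "\\{CODE=([^\\}]++)\\}|"
            + "\\{DESC=([^\\}]++)\\}");

    // Returns the number of the group that took part in the match,
    // or 0 if the line contains none of the four tags.
    static int matchedGroup(String line) {
        Matcher m = PATTERN.matcher(line);
        if (!m.find()) {
            return 0;
        }
        for (int g = 1; g <= m.groupCount(); g++) {
            // group(g) is null when alternative g was not the one that matched.
            if (m.group(g) != null) {
                return g;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(matchedGroup("Object name.......: PMF3800SN")); // prints 1
        System.out.println(matchedGroup("    {CODE=1894711}"));            // prints 3
        System.out.println(matchedGroup("    {MFR=NXP}"));                 // prints 0
    }
}
```

Because each alternative here contains exactly one group that always captures at least one character, a non-null group identifies the alternative without any startsWith call.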

answered 2012-04-12T14:46:42.277

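As @axtavt pointed out, you can find out directly whether a group participated in the match. You don't even have to change the regex; you already have one capturing group for each alternative. I like to use the start(n) method for the test because it looks neater to me, but checking group(n) for null (as @axtavt did) gives the same result. Here's an example: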

private static void processFile(ArrayList<String> lines) {

    Pattern p = Pattern.compile(
            "Object name\\.{7}: (.++)|"
            + "\\{CAT=([^\\}]++)\\}|"
            + "\\{CODE=([^\\}]++)\\}|"
            + "\\{DESC=([^\\}]++)\\}");

    // Create the Matcher now and reassign it to each line as we go.
    Matcher m = p.matcher("");

    for (String line : lines) {
        if (m.reset(line).find()) {
            // If group #n participated in the match, start(n) will be non-negative.
            if (m.start(1) != -1) {
                System.out.printf("%ncreating new component...%n");
                System.out.printf("  name: %s%n", m.group(1));
            } else if (m.start(2) != -1) {
                System.out.printf("  category: %s%n", m.group(2));
            } else if (m.start(3) != -1) {
                System.out.printf("  order code: %s%n", m.group(3));
            } else if (m.start(4) != -1) {
                System.out.printf("  description: %s%n", m.group(4));
            }
        }
    }
}

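However, I'm not sure I agree with your reasoning about repeating parts of the strings in the code. If the data format changes, or you change which fields you extract, it seems to me that the regex and the group numbers in the code could get out of sync more easily when you update them. In other words, your current code isn't redundant, it's self-documenting. :D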

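EDIT: In a comment you mentioned the possibility of processing the whole file at once instead of line by line. That is actually the simpler approach: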

private static void processFile(String contents) {

    Pattern p = Pattern.compile(
            "Object name\\.{7}: (.++)|"
            + "\\{CAT=([^\\}]++)\\}|"
            + "\\{CODE=([^\\}]++)\\}|"
            + "\\{DESC=([^\\}]++)\\}");

    Matcher m = p.matcher(contents);

    while (m.find()) {
        if (m.start(1) != -1) {
            System.out.printf("%ncreating new component...%n");
            System.out.printf("  name: %s%n", m.group(1));
        } else if (m.start(2) != -1) {
            System.out.printf("  category: %s%n", m.group(2));
        } else if (m.start(3) != -1) {
            System.out.printf("  order code: %s%n", m.group(3));
        } else if (m.start(4) != -1) {
            System.out.printf("  description: %s%n", m.group(4));
        }
    }
}
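
For completeness, here is a rough sketch of how the whole-file loop could build the component list instead of printing. The Component class below is a bare stand-in I made up (the real one from the question is not shown), and the names WholeFileParser and parse are hypothetical as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeFileParser {
    // Hypothetical stand-in for the question's Component class.
    static class Component {
        String name, category, orderCode, description;
    }

    static final Pattern P = Pattern.compile(
            "Object name\\.{7}: (.++)|"
            + "\\{CAT=([^\\}]++)\\}|"
            + "\\{CODE=([^\\}]++)\\}|"
            + "\\{DESC=([^\\}]++)\\}");

    static List<Component> parse(String contents) {
        List<Component> components = new ArrayList<>();
        Component current = null;
        Matcher m = P.matcher(contents);
        while (m.find()) {
            if (m.start(1) != -1) {
                // An "Object name" line starts a new component.
                current = new Component();
                components.add(current);
                current.name = m.group(1);
            } else if (current == null) {
                // A tag before the first "Object name" line: nothing to attach it to.
                continue;
            } else if (m.start(2) != -1) {
                current.category = m.group(2);
            } else if (m.start(3) != -1) {
                current.orderCode = m.group(3);
            } else if (m.start(4) != -1) {
                current.description = m.group(4);
            }
        }
        return components;
    }

    public static void main(String[] args) {
        String sample = "Object name.......: PMF3800SN\n"
                + "Properties........: {*DEVICE}\n"
                + "                    {CAT=Transistors}\n"
                + "                    {CODE=1894711}\n";
        List<Component> result = parse(sample);
        System.out.println(result.size() + " component(s), first: " + result.get(0).name);
    }
}
```

Since . does not match line terminators by default, (.++) still stops at the end of the "Object name" line even though the whole file is one string. On Java 11 or later the contents could be loaded with Files.readString.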
answered 2012-04-12T17:03:37.400

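I would define a meta-object that pairs a Pattern with a Runnable. Iterate over the lines, then over the meta-objects. If a pattern matches, execute its Runnable. Something like: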

class Meta {
  Pattern pattern;
  Runnable runnable;
  Matcher matcher;

  Meta(Pattern p, Runnable r) {
    pattern = p;
    runnable = r;
  }
}

Meta[] metas = new Meta[] { new Meta(Pattern.compile(...), new Runnable() { ... }), new Meta(...), ... };


for (String line : lines) {
  for (Meta meta : metas) {
    final Matcher matcher = meta.pattern.matcher(line);
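    // find() rather than matches(): assuming the patterns are the fragments from
    // the question, the indented tag lines would never match as a whole line.
    if (matcher.find()) {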
      meta.matcher = matcher;
      meta.runnable.run();
    }
  }
}

Here is what the meta-object for the "Object" line would look like:

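Meta m = new Meta(Pattern.compile("Object name\\.{7}: (.++)"), new Runnable() {
  @Override
  public void run() {
    // We've got the object name.
    // currentComponent and Components must be fields of the enclosing class
    // for an anonymous class to reassign or reach them.
    if (currentComponent != null) {
      Components.add(currentComponent);
    }
    currentComponent = new Component();
    // "matcher" stands for this Meta's matcher field; the Runnable still needs
    // some way to reach it, which this snippet leaves open.
    currentComponent.setName(matcher.group(1));
  }
});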
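
One possible way to make this idea compile and run, sketched under my own assumptions: pass the Matcher to the handler directly by using Consumer<Matcher> (Java 8+) in place of Runnable, and keep the current component in a field. The names TagDispatcher and Rule, and the bare Component class, are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagDispatcher {
    // Hypothetical stand-in for the question's Component class.
    static class Component {
        String name, category, orderCode, description;
    }

    // Pairs a pattern with the action to run when it is found in a line.
    static class Rule {
        final Pattern pattern;
        final Consumer<Matcher> action;

        Rule(String regex, Consumer<Matcher> action) {
            this.pattern = Pattern.compile(regex);
            this.action = action;
        }
    }

    final List<Component> components = new ArrayList<>();
    private final List<Rule> rules = new ArrayList<>();
    private Component current; // a field, so the lambdas below may reassign it

    TagDispatcher() {
        rules.add(new Rule("Object name\\.{7}: (.++)", m -> {
            current = new Component();
            components.add(current);
            current.name = m.group(1);
        }));
        rules.add(new Rule("\\{CAT=([^\\}]++)\\}", m -> {
            if (current != null) current.category = m.group(1);
        }));
        rules.add(new Rule("\\{CODE=([^\\}]++)\\}", m -> {
            if (current != null) current.orderCode = m.group(1);
        }));
        rules.add(new Rule("\\{DESC=([^\\}]++)\\}", m -> {
            if (current != null) current.description = m.group(1);
        }));
    }

    void process(List<String> lines) {
        for (String line : lines) {
            for (Rule rule : rules) {
                Matcher m = rule.pattern.matcher(line);
                if (m.find()) {
                    rule.action.accept(m);
                    break; // at most one rule is applied per line
                }
            }
        }
    }
}
```

With this layout every pattern sits next to the code that consumes it and each capture is simply group(1), at the cost of running up to four separate matches per line.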
answered 2012-04-12T14:48:15.260