
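I need to extract different kinds of values from text, such as simple values, exponent values, and so on. I have written a separate regular expression for each kind. I would like to use all of these regular expressions together, as a list, to identify the different values present in the text.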

(-)?[0-9]+(\\.[0-9]+)?   // simple numbers
[0-9]+(\\.[0-9]+)? *(-|--|to|(up to)|upto) *-?[0-9]+(\\.[0-9]+)? //simple range
(([0-9]+ *0{3},?)|([0-9]+,[0-9]{3})) //thousands

How can I code this in Java so that multiple patterns are recognized? I am using the Java regex Matcher, and this is where I am stuck:

private Pattern Value = Pattern.compile("");
Matcher matcher = Value.matcher(docText);

2 Answers


You can put all your patterns in an ArrayList<Pattern>, read each line, and then check it against your list of patterns. Something like:

patternList.add(Pattern.compile("\\w+"));
...
int countPatternMatches(String newLine, List<Pattern> patternList){
 int amt = 0;
 for(Pattern pattern : patternList) {
   Matcher matcher = pattern.matcher(newLine);
   if (matcher.find()) {
     amt++;
   }
 }
 return amt;
}

Then, if the number of matches equals the size of the list, the line matched all of your patterns.
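If the goal is to extract the values rather than count them, the same loop can collect every match per pattern. The following is only a minimal sketch: the class name, the labels ("simple", "range", "thousands") and the `extract` method are made up for illustration, while the regexes are the three from the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternListExtractor {

    // Hypothetical labels; the regexes are copied from the question.
    static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("simple", Pattern.compile("(-)?[0-9]+(\\.[0-9]+)?"));
        PATTERNS.put("range", Pattern.compile(
                "[0-9]+(\\.[0-9]+)? *(-|--|to|(up to)|upto) *-?[0-9]+(\\.[0-9]+)?"));
        PATTERNS.put("thousands", Pattern.compile(
                "(([0-9]+ *0{3},?)|([0-9]+,[0-9]{3}))"));
    }

    // Runs every pattern over the text and collects all of its matches.
    static Map<String, List<String>> extract(String text) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            List<String> hits = new ArrayList<>();
            Matcher matcher = entry.getValue().matcher(text);
            while (matcher.find()) {
                hits.add(matcher.group());
            }
            result.put(entry.getKey(), hits);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("5 to 10 and 3,000"));
    }
}
```

Each pattern is applied independently, so overlaps are not resolved: the "simple" pattern also reports the individual numbers that sit inside a range or a thousands value.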

answered 2013-05-29T08:58:03.837

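You can join the patterns with the OR ("|") character and parentheses, then work out which kind of value matched from the corresponding groups.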

Be aware, though, that your Pattern may become very hard to read.
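As a rough illustration of the idea, the sketch below joins three alternatives with "|" and uses named groups (available since Java 7) to tell which one matched. The class name, group names and the `classify` method are assumptions for this example, and the patterns are simplified versions of the ones in the question, not exact copies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationExtractor {

    // Simplified number pattern (assumption): optional minus, digits, optional decimals.
    private static final String NUMBER = "-?[0-9]+(?:\\.[0-9]+)?";

    // The range alternative comes first so that "5 to 10" is not
    // consumed as a lone "5" by the simple-number alternative.
    private static final Pattern VALUE = Pattern.compile(
            "(?<range>" + NUMBER + " *(?:--|-|up ?to|to) *" + NUMBER + ")"
            + "|(?<thousands>[0-9]+,[0-9]{3})"
            + "|(?<simple>" + NUMBER + ")");

    // Returns each match prefixed with the name of the alternative that produced it.
    static List<String> classify(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = VALUE.matcher(text);
        while (m.find()) {
            if (m.group("range") != null) {
                out.add("range:" + m.group("range"));
            } else if (m.group("thousands") != null) {
                out.add("thousands:" + m.group("thousands"));
            } else {
                out.add("simple:" + m.group("simple"));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(classify("5 to 10 and 3,000 or -2.5"));
    }
}
```

The order of the alternatives matters here: the regex engine takes the first alternative that matches at a given position, so the most specific pattern should be listed first.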

Edit (2)

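Below is a different approach that mixes pattern concatenation and group references (there are no back-references in the patterns, to avoid complications).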

package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    /*
     * Pattern for a single number: an optional -/+ sign,
     * followed by an integer of arbitrary length,
     * followed by an optional "," separator and another run of digits,
     * followed by an optional "." separator and decimal digits.
     */
    private static final String SINGLE_NUMBER_PATTERN = "([\\+-]?\\d+(,\\d+)?(\\.\\d+)?)";
    /*
     * Pattern for range indicator. Note that "up to" is not intended here as a prefix.   
     */
    private static final String RANGE_INDICATOR_PATTERN = "(\\s?(-\\s|--|to|(up\\s?to))\\s?)";
    /*
     * Pattern combining one number, a range indicator, and another number.
     * It starts with a negative look-behind so that the match does not start
     * when preceded by a range indicator, and ends with a negative look-ahead
     * so that the match does not end when followed by a range indicator.
     */
    private static final String RANGE_PATTERN = "(?<!"
            + RANGE_INDICATOR_PATTERN + ")" + SINGLE_NUMBER_PATTERN
            + RANGE_INDICATOR_PATTERN + SINGLE_NUMBER_PATTERN + "(?!"
            + RANGE_INDICATOR_PATTERN + ")";
    /*
     * Pattern defining a single number neither preceded nor followed by a range indicator.
     */
    private static final String ISOLATED_NUMBER_PATTERN = "(?<!"
            + RANGE_INDICATOR_PATTERN + ")" + SINGLE_NUMBER_PATTERN
            + "(?!" + RANGE_INDICATOR_PATTERN + ")";
    /*
     * Ultimate pattern combining single, isolated numbers and ranges. 
     */
    private static final String WHOLE_PATTERN = 
            "(" + ISOLATED_NUMBER_PATTERN + ")|(" + RANGE_PATTERN + ")";
    public static void main(String[] args) {
        // single numbers in various formats
        String singleNumbers = "0 -1000 30,000 3,000.00";
        // various ranges with various numerical formats
        String ranges = "0 to 1 1000- 2,000 30,000--40,000.00 4.00 up to 5 10,000 to -5";
        // mixed
        String mixed = "0 0 to 1 -1000 1000- 2,000 30,000 30,000--40,000.00 3,000.00 4.00 up to 5 10,000 to -5";
        // testing single numbers
        Pattern singleNumber = Pattern.compile(SINGLE_NUMBER_PATTERN);
        Matcher singleNumberMatcher = singleNumber.matcher(singleNumbers);
        while (singleNumberMatcher.find()) {
            System.out.println("SINGLE NUMBER: " + singleNumberMatcher.group());
        }
        // testing ranges
        Pattern rangesPattern = Pattern.compile(RANGE_PATTERN);
        Matcher rangesMatcher = rangesPattern.matcher(ranges);
        while (rangesMatcher.find()) {
            System.out.println("WHOLE RANGE: " + rangesMatcher.group());
            // note how tough it is to correctly guess group numbers
            // if you use Java 7 you can actually name your groups
            System.out.println("\tfirst number in range: "
                    + rangesMatcher.group(4));
            System.out.println("\tsecond number in range: "
                    + rangesMatcher.group(10));
        }
        // testing mixed examples
        Pattern mixedPattern = Pattern.compile(WHOLE_PATTERN);
        Matcher mixedMatcher = mixedPattern.matcher(mixed);
        while (mixedMatcher.find()) {
            System.out.println("WHOLE MATCH: " + mixedMatcher.group());
            if (mixedMatcher.group(1) != null) {
                System.out.println("\tsingle number: " + mixedMatcher.group(1));
            }
            else if (mixedMatcher.group(11) != null) {
                System.out.println("\trange: " + mixedMatcher.group(11));
                // note how CRAZY it is to correctly guess group numbers now!
                // if you use Java 7 you can actually name your groups
                System.out.println("\t\tfirst number in range: "
                        + mixedMatcher.group(15));
                System.out.println("\t\tsecond number in range: "
                        + mixedMatcher.group(21));
            }
        }
    }
}

Output

SINGLE NUMBER: 0
SINGLE NUMBER: -1000
SINGLE NUMBER: 30,000
SINGLE NUMBER: 3,000.00
WHOLE RANGE: 0 to 1
    first number in range: 0
    second number in range: 1
WHOLE RANGE: 1000- 2,000
    first number in range: 1000
    second number in range: 2,000
WHOLE RANGE: 30,000--40,000.00
    first number in range: 30,000
    second number in range: 40,000.00
WHOLE RANGE: 4.00 up to 5
    first number in range: 4.00
    second number in range: 5
WHOLE RANGE: 10,000 to -5
    first number in range: 10,000
    second number in range: -5
WHOLE MATCH: 0
    single number: 0
WHOLE MATCH: 0 to 1
    range: 0 to 1
        first number in range: 0
        second number in range: 1
WHOLE MATCH: -1000
    single number: -1000
WHOLE MATCH: 100
    single number: 100
WHOLE MATCH: 0- 2,000
    range: 0- 2,000
        first number in range: 0
        second number in range: 2,000
WHOLE MATCH: 30,000
    single number: 30,000
WHOLE MATCH: 30,00
    single number: 30,00
WHOLE MATCH: 0--40,000.00
    range: 0--40,000.00
        first number in range: 0
        second number in range: 40,000.00
WHOLE MATCH: 3,000.00
    single number: 3,000.00
WHOLE MATCH: 4.0
    single number: 4.0
WHOLE MATCH: 0 up to 5
    range: 0 up to 5
        first number in range: 0
        second number in range: 5
WHOLE MATCH: 10,00
    single number: 10,00
WHOLE MATCH: 0 to -5
    range: 0 to -5
        first number in range: 0
        second number in range: -5

Note

This solution is not a holy grail. Use it with care: in the long run it can become very bloated and hard to read!
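The comments in the code above mention that Java 7 lets you name your groups instead of guessing their numbers. Here is a minimal sketch of that, limited to ranges. The class name, the group names `first` and `second`, and the `ranges` method are hypothetical. The sub-patterns follow SINGLE_NUMBER_PATTERN and RANGE_INDICATOR_PATTERN, but with the inner groups made non-capturing and without the look-behind and look-ahead, so it is not a drop-in replacement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedRangeGroups {

    // Same shape as SINGLE_NUMBER_PATTERN, with non-capturing inner groups.
    private static final String NUMBER = "[+-]?\\d+(?:,\\d+)?(?:\\.\\d+)?";
    // Same shape as RANGE_INDICATOR_PATTERN, with non-capturing inner groups.
    private static final String INDICATOR = "\\s?(?:-\\s|--|to|up\\s?to)\\s?";

    // Only two capturing groups remain, and both are addressed by name.
    private static final Pattern RANGE = Pattern.compile(
            "(?<first>" + NUMBER + ")" + INDICATOR + "(?<second>" + NUMBER + ")");

    // Returns each range found in the text as "first .. second".
    static List<String> ranges(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = RANGE.matcher(text);
        while (m.find()) {
            out.add(m.group("first") + " .. " + m.group("second"));
        }
        return out;
    }

    public static void main(String[] args) {
        String ranges = "0 to 1 1000- 2,000 30,000--40,000.00 4.00 up to 5 10,000 to -5";
        for (String range : ranges(ranges)) {
            System.out.println(range);
        }
    }
}
```

Because the groups are read by name, adding or removing parentheses elsewhere in the pattern no longer shifts the indexes you read from.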

answered 2013-05-29T08:59:35.317