
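Apologies for my poor understanding of the regex world. I'm trying to split a text using a regular expression. This is what I'm doing right now. Please consider the following string: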


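import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "Name:\"John Adam\"  languge:\"english\"  Date:\" August 2011\"";
Pattern pattern = Pattern.compile(".*?\\:\\\".*?\\\"\\s*");
Matcher matcher = pattern.matcher(input);
List<String> keyValues = new LinkedList<>();
// Collect every key:"value" chunk the pattern finds
while (matcher.find()) {
    System.out.println(matcher.group());
    keyValues.add(matcher.group());
}
System.out.println(keyValues);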

I get the correct output, which is what I was looking for:


Name:"John Adam"  
languge:"english"  
Date:" August 2011"

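Now I'm trying to make it a bit more generic. For example, suppose I add another pattern to the input string. I have added a new value, Audience:(user), in a different format, i.e. the " delimiters are replaced by ( and ):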


String input = "Name:\"John Adam\"  languge:\"english\"  Date:\" August 2011\"  Audience:(user)";

What would be a generic pattern for this? Sorry if this sounds too lame.

Thanks


3 Answers


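Step 1: Remove most of the backslashes. You don't need to escape quotes or colons in a regex (they are just ordinary characters); the only backslash a quote still needs is the single one required by the Java string literal.

Try this pattern: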

".*?:[^\\w ].*?[^\\w ]\\s*"

It treats any character that is neither a word character nor a space as a delimiter, works for your test cases, and also works for name:'foo'
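To make the behaviour concrete, here is a minimal sketch that runs this pattern over the extended input from the question. The class name `DelimiterAgnosticSplit` and the `split` helper are made up for illustration, and trimming each match is my own addition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class DelimiterAgnosticSplit {
    // key, colon, one non-word/non-space delimiter, lazily matched value,
    // one non-word/non-space delimiter, optional trailing whitespace.
    private static final Pattern PAIR = Pattern.compile(".*?:[^\\w ].*?[^\\w ]\\s*");

    static List<String> split(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // trim() drops the whitespace that separates one pair from the next
            pairs.add(m.group().trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        String input = "Name:\"John Adam\"  languge:\"english\"  Date:\" August 2011\"  Audience:(user)";
        split(input).forEach(System.out::println);
    }
}
```

One caveat worth keeping in mind: the value is matched lazily up to the first non-word, non-space character, so a value that itself contains punctuation (for example `Date:"Aug. 2011"`) would be cut short at the `.`.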

Answered 2012-06-01T19:44:17.900

You can always use the OR operator |:

Pattern pattern = Pattern.compile("(.*?\\:\\\".*?\\\"\\s*)|(.*?\\:\\(.*?\\)\\s*)");
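As a rough illustration, the sketch below applies that alternation to the extended input and reports which branch matched, using the fact that only one of the two capture groups is non-null per match. The class name `AlternationSplit` and the `describe` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class AlternationSplit {
    // Group 1 matches key:"value", group 2 matches key:(value).
    private static final Pattern PAIR =
            Pattern.compile("(.*?\\:\\\".*?\\\"\\s*)|(.*?\\:\\(.*?\\)\\s*)");

    static List<String> describe(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // Exactly one of the two groups participates in each match
            String kind = (m.group(1) != null) ? "quoted" : "parenthesized";
            out.add(kind + " -> " + m.group().trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "Name:\"John Adam\"  languge:\"english\"  Date:\" August 2011\"  Audience:(user)";
        describe(input).forEach(System.out::println);
    }
}
```

Note that the order of the pairs matters here: the leading `.*?` of the first branch can run across a parenthesized pair to reach a later `:"`, so an input such as `Audience:(user) Name:"John"` comes back as a single "quoted" match.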
Answered 2012-06-01T19:16:09.890

First of all, I should point out that regular expressions are NOT a magic bullet. By that I mean that while they can be incredibly flexible and useful in some cases, they don't solve all problems of text matching (for instance, parsing XML-like markup).

However, in the example you gave, you could use the | syntax to specify an alternate pattern to match. An example might be:

Pattern pattern = Pattern.compile(".*?\\:(\\\".*?\\\"|\\(.*?\\))\\s*");

The section in parentheses, (\\\".*?\\\"|\\(.*?\\)), can be read as: match either \\\".*?\\\" or \\(.*?\\). Remember what the backslashes mean: they are escape characters, doubled here because the Java string literal needs its own level of escaping.

Note, though, that this approach, while flexible, requires you to add each specific case explicitly, so it's not truly generic in the absolute sense.
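If the goal is to get at the keys and values rather than the raw chunks, the same grouped alternation can carry capture groups. The sketch below is a variation of my own, not the pattern above: it assumes keys consist of word characters only, and the class name `KeyValueExtract` and the `parse` method are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; a variation on the alternation idea.
class KeyValueExtract {
    // Group 1: key (assumed to be word characters only).
    // Group 2: value between double quotes. Group 3: value between parentheses.
    private static final Pattern PAIR =
            Pattern.compile("(\\w+):(?:\"(.*?)\"|\\((.*?)\\))");

    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps input order
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // Whichever branch matched supplies the value; the other group is null
            String value = (m.group(2) != null) ? m.group(2) : m.group(3);
            result.put(m.group(1), value);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "Name:\"John Adam\"  languge:\"english\"  Date:\" August 2011\"  Audience:(user)";
        // Values are kept verbatim, so the Date value retains its leading space
        parse(input).forEach((k, v) -> System.out.println(k + " = [" + v + "]"));
    }
}
```

Supporting a further delimiter pair still means adding another branch and another group, which is exactly the limitation described above.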

NOTE

To better illustrate what I meant by not being able to craft a truly generic solution, here's a more generic pattern that you could use:

Pattern pattern = Pattern.compile(".*?\\:[\\\"(]{1,2}.*?[\\\")]{1,2}\\s*");

The pattern above uses character classes and is more generic, but while it will match your examples, it will also match mismatched hybrids like blah:"stuff) or blah:(stuff", or worse, blah:((stuff""
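A small sketch can make the trade-off visible by testing a few strings against both the character-class pattern and the earlier alternation pattern. The class name `CharClassPitfall` and the `loose`/`strict` helpers are hypothetical, and whole-string matching via `matches()` is my choice for the comparison.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class CharClassPitfall {
    // Character-class pattern: any of " or ( may open, any of " or ) may close.
    static final Pattern LOOSE =
            Pattern.compile(".*?\\:[\\\"(]{1,2}.*?[\\\")]{1,2}\\s*");
    // Alternation pattern: the opener and closer have to pair up.
    static final Pattern STRICT =
            Pattern.compile(".*?\\:(\\\".*?\\\"|\\(.*?\\))\\s*");

    static boolean loose(String s) {
        return LOOSE.matcher(s).matches();
    }

    static boolean strict(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "Audience:(user)", "blah:\"stuff)", "blah:((stuff\"\"" };
        for (String s : samples) {
            System.out.println(s + "  loose=" + loose(s) + "  strict=" + strict(s));
        }
    }
}
```

With these samples, the well-formed `Audience:(user)` is accepted by both patterns, while the two malformed strings are accepted only by the character-class version.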

Answered 2012-06-01T19:16:19.220