
Need help with RegEx. Using C#.

A group of words in parentheses (round, square, or curly) should be treated as one word. The part outside the parentheses should be split on white space ' '.

A) Test Case –

Input - Andrew. (The Great Musician) John Smith-Lt.Gen3rd

Result (Array of string) –

1. Andrew.
2. The Great Musician
3. John
4. Smith-Lt.Gen3rd

B) Test Case –

Input - Andrew. John

Result (Array of string) –

1. Andrew.
2. John

C) Test Case –

Input - Andrew {The Great} Pirate

Result (Array of string) –

1. Andrew
2. The Great
3. Pirate

The input is the name of a person or any other entity. The current system is very old and written in Access; it does this by scanning character by character. I am replacing it with C#.

I thought of doing it in two steps: first a parentheses-based split and then a word split.

I want to reject these cases as bad input:

  1. Only an opening or only a closing parenthesis is present

  2. Nested parentheses

Overall, I want to split only well-formed input (if there is an opening parenthesis, there must be a matching closing one).


2 Answers


Here is a regex that will give the correct results for your examples:

\s(?=.*?(?:\(|\{|\[).*?(?:\]|\}|\)).*?)|(?<=(?:\(|\[|\{).*?(?:\}|\]|\)).*?)\s

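This regex has two parts, separated by a | (OR) statement:

  1. \s(?=.*?(?:\(|\{|\[).*?(?:\]|\}|\)).*?) - matches white space that comes before a set of (), [], or {}
  2. (?<=(?:\(|\[|\{).*?(?:\}|\]|\)).*?)\s - matches white space that comes after a set of (), [], or {}

Here is a breakdown of each part:

Part 1 (\s(?=.*?(?:\(|\{|\[).*?(?:\]|\}|\)).*?)):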

1. \s             - Matches white space
2. (?=            - Begins a lookahead assertion (what it contains must exist after the \s)
3. .*?            - Matches any character any number of times. The `?` makes it ungreedy, so it grabs the fewest characters it needs
4. (?:\(|\{|\[)   - A non-capturing group matching `(`, `{`, or `[`
5. .*?            - Same as #3
6. (?:\]|\}|\))   - The reverse of #4: matches `]`, `}`, or `)`
7. .*?            - Same as #3
8. )              - Closes the lookahead. #3 through #7 are inside the lookahead.

Part 2 is the same, but instead of a lookahead ((?=)) it uses a lookbehind ((?<=)).
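For illustration, here is a minimal sketch of how the pattern above might be used with `Regex.Split`. The class name `SplitSketch` and the helper `SplitName` are hypothetical, and the `Trim` step is an assumption added here: splitting on white space leaves the bracket characters attached to the grouped token, so they are stripped afterwards.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitSketch
{
    // The two-part pattern from this answer, unchanged.
    private static readonly Regex Splitter = new Regex(
        @"\s(?=.*?(?:\(|\{|\[).*?(?:\]|\}|\)).*?)|(?<=(?:\(|\[|\{).*?(?:\}|\]|\)).*?)\s");

    // Hypothetical helper: split on the matched white space, then strip the
    // bracket characters that remain attached to a grouped token.
    public static string[] SplitName(string input)
    {
        return Splitter.Split(input)
            .Select(piece => piece.Trim('(', ')', '[', ']', '{', '}'))
            .Where(piece => piece.Length > 0) // drop empty pieces from repeated spaces
            .ToArray();
    }

    public static void Main()
    {
        // Test case A from the question.
        foreach (string part in SplitName("Andrew. (The Great Musician) John Smith-Lt.Gen3rd"))
        {
            Console.WriteLine(part);
        }
    }
}
```

Because both alternatives require a bracket pair somewhere on the line, input with no brackets at all, such as test case B, comes back from this pattern as a single element; a plain white-space split would be needed for that case.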

After the author edited the question:

For a regex that searches for lines containing only complete parentheses, you can use this:

.*\(.*(?=.*?\).*?)|(?<=.*?\(.*?).*\).*

You can replace the ( and ) in it with { and } or [ and ] to check for complete curly braces and square brackets as well.
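As an alternative to a regex for this check, the well-formedness rule from the question (every opening bracket has its matching close, and no nesting) can be sketched as a single pass over the characters. This is a different technique from the pattern above, not a rewrite of it; `BracketCheck` and `IsWellFormed` are hypothetical names.

```csharp
using System;

public static class BracketCheck
{
    // Hypothetical helper: true when every opening bracket is closed by its
    // matching partner and no bracket opens inside another group.
    public static bool IsWellFormed(string input)
    {
        const string openers = "([{";
        const string closers = ")]}";
        char expectedClose = '\0'; // '\0' means "not inside a group"

        foreach (char c in input)
        {
            int openIndex = openers.IndexOf(c);
            if (openIndex >= 0)
            {
                if (expectedClose != '\0') return false; // nested group
                expectedClose = closers[openIndex];
            }
            else if (closers.IndexOf(c) >= 0)
            {
                if (c != expectedClose) return false; // stray or mismatched close
                expectedClose = '\0';
            }
        }

        return expectedClose == '\0'; // false if a group was left open
    }

    public static void Main()
    {
        Console.WriteLine(IsWellFormed("Andrew {The Great} Pirate")); // True
        Console.WriteLine(IsWellFormed("Andrew (The Great Pirate"));  // False
        Console.WriteLine(IsWellFormed("Andrew (The [Great]) Pirate")); // False
    }
}
```

Running a check like this before splitting keeps the rejection of bad input separate from the tokenizing pattern.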

answered 2013-03-11T22:14:06.150

How about this:

Regex regexObj = new Regex(
    @"(?<=\()       # Assert that the previous character is a (
    [^(){}[\]]+     # Match one or more non-paren/brace/bracket characters
    (?=\))          # Assert that the next character is a )
    |               # or
    (?<=\{)[^(){}[\]]+(?=\}) # Match {...}
    |               # or 
    (?<=\[)[^(){}[\]]+(?=\]) # Match [...]
    |               # or
    [^(){}[\]\s]+   # Match anything except whitespace or parens/braces/brackets", 
    RegexOptions.IgnorePatternWhitespace);

This assumes that there are no nested parentheses/braces/brackets.
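A minimal sketch of how the matches might be collected into a string array. The pattern is the one above written on a single line without comments; the class name `TokenizeSketch` and the helper `Tokenize` are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenizeSketch
{
    // Same alternatives as the commented pattern above, on one line.
    private static readonly Regex Tokens = new Regex(
        @"(?<=\()[^(){}[\]]+(?=\))|(?<=\{)[^(){}[\]]+(?=\})|(?<=\[)[^(){}[\]]+(?=\])|[^(){}[\]\s]+");

    // Hypothetical helper: each match is one token, so no split step is needed.
    public static string[] Tokenize(string input)
    {
        return Tokens.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }

    public static void Main()
    {
        // Test case A from the question.
        foreach (string token in Tokenize("Andrew. (The Great Musician) John Smith-Lt.Gen3rd"))
        {
            Console.WriteLine(token);
        }
    }
}
```

Since the brackets sit in lookarounds rather than in the match itself, the grouped tokens come back without their surrounding brackets.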

answered 2013-03-11T21:37:30.853