
I'm trying to match a number that is written as words, as digits, or as a Roman numeral. Here are some samples:

CHAPTER 1
CHAPTER 2
CHAPTER THREE
CHAPTER IV
CHAPTER TWENTY TWO

I'm terrible at regular expressions; this is what I've got so far:

(CHAPTER (([0-9]+)|(/* words - see below */)|( /* roman - see below */)))

// words
(TWENTY|THIRTY|etc)?( |-)?(ONE|TWO|THREE|FOUR|FIVE|etc)?

// roman
(I|II|III|IV|V|etc)+

This catches CHAPTER 1, CHAPTER 2 and CHAPTER THREE, but it tries to match IV as a word (I guess because it matches FIVE?). TWENTY TWO doesn't match at all.

Can anyone help? Here is the full regex:

(CHAPTER (
([0-9]+)|
((TWENTY|THIRTY)?( |-)?(ONE|TWO|THREE|FOUR|FIVE)?)|
((I|II|III|IV|V)+)
))
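Running the pattern on the samples shows why IV ends up in the words branch. The sketch below is written for Python's re, which is an assumption because the question does not name a regex engine. It uses the single-line form of the full pattern. Groups 2, 3 and 7 are the digits, words and Roman branches.

```python
import re

# The question's full pattern on one line, transcribed to Python's re
# (assumption: the asker's engine treats these constructs the same way).
QUESTION_PATTERN = re.compile(
    r"CHAPTER (([0-9]+)|((TWENTY|THIRTY)?( |-)?(ONE|TWO|THREE|FOUR|FIVE)?)|((I|II|III|IV|V)+))"
)
DIGITS, WORDS, ROMAN = 2, 3, 7  # group numbers of the three branches

unanchored = QUESTION_PATTERN.match("CHAPTER IV")
# Every piece of the words branch is optional, so it succeeds with an empty
# match and the Roman branch is never tried.
print(repr(unanchored.group(0)), repr(unanchored.group(WORDS)), unanchored.group(ROMAN))
# 'CHAPTER ' '' None

anchored = QUESTION_PATTERN.fullmatch("CHAPTER IV")
# Requiring the match to reach the end of the string makes the empty words
# match fail, so the engine backtracks into the Roman branch.
print(repr(anchored.group(0)), repr(anchored.group(ROMAN)))
# 'CHAPTER IV' 'IV'
```

In other engines, a trailing $ anchor (or a word boundary) has the same effect as fullmatch here.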

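Notes:

The point of this is to convert these textual representations into actual integers. I already have a method for each case, so I really do need to tell the cases apart.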


3 Answers


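Since you already have parsers, they will hopefully fail gracefully when given something that superficially looks like valid Roman or text input but isn't. If so, you can simply call all of them and see which one succeeds.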
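Here is a minimal sketch of the "try every parser" approach in Python. The names parse_arabic, parse_roman, parse_words and parse_chapter_number are hypothetical stand-ins for the asker's own conversion methods. The sketch assumes only one contract: each parser returns an int or raises ValueError when the text is not in its format. To reject malformed numerals such as IIII, the Roman parser borrows the strict pattern from a later answer.

```python
import re

# Hypothetical stand-ins for the asker's own parsers. Assumed contract:
# each returns an int or raises ValueError.

def parse_arabic(text: str) -> int:
    if not text.isdigit():
        raise ValueError(text)
    return int(text)


ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
# Strict Roman-numeral shape (the pattern from a later answer, minus the \b anchors).
STRICT_ROMAN = re.compile(r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")


def parse_roman(text: str) -> int:
    if not text or STRICT_ROMAN.fullmatch(text) is None:
        raise ValueError(text)
    total = 0
    for i, ch in enumerate(text):
        value = ROMAN_VALUES[ch]
        # Subtractive notation: a symbol smaller than its right neighbour is subtracted.
        if i + 1 < len(text) and value < ROMAN_VALUES[text[i + 1]]:
            total -= value
        else:
            total += value
    return total


SMALL = {w: n for n, w in enumerate(
    "ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE TEN ELEVEN TWELVE THIRTEEN "
    "FOURTEEN FIFTEEN SIXTEEN SEVENTEEN EIGHTEEN NINETEEN".split(), start=1)}
TENS = {w: 10 * n for n, w in enumerate(
    "TWENTY THIRTY FORTY FIFTY SIXTY SEVENTY EIGHTY NINETY".split(), start=2)}


def parse_words(text: str) -> int:
    words = text.replace("-", " ").split()
    if len(words) == 1 and words[0] in SMALL:
        return SMALL[words[0]]
    if 1 <= len(words) <= 2 and words[0] in TENS:
        total = TENS[words[0]]
        if len(words) == 2:
            unit = SMALL.get(words[1])
            if unit is None or unit > 9:  # only ONE..NINE may follow a tens word
                raise ValueError(text)
            total += unit
        return total
    raise ValueError(text)


def parse_chapter_number(text: str) -> int:
    """Try every parser in turn and return the first result that succeeds."""
    for parser in (parse_arabic, parse_roman, parse_words):
        try:
            return parser(text)
        except ValueError:
            continue
    raise ValueError(f"unrecognised chapter number: {text!r}")


for heading in ["1", "2", "THREE", "IV", "TWENTY TWO"]:
    print(heading, "->", parse_chapter_number(heading))
```

The order of the parsers only matters if two of them could accept the same text. With the strict Roman check, none of these three can.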

If you'd rather not just call them all, this regex should determine which parser to pass each input to:

using System;
using System.Text.RegularExpressions;

var re = new Regex(
    @"CHAPTER (?:(?<arabic>\d+)|(?<roman>[IVXLCDM]+)|(?<text>[A-Z ]+))");

Called, for example, like this:

var input = @"CHAPTER 1
CHAPTER 2
CHAPTER THREE
CHAPTER IV
CHAPTER TWENTY TWO";

foreach (Match match in re.Matches(input))
{
    if (match.Groups["arabic"].Success)
    {
        Console.WriteLine("Pass {0} to Arabic parser", match.Groups["arabic"].Value);
    }
    else if (match.Groups["roman"].Success)
    {
        Console.WriteLine("Pass {0} to Roman parser", match.Groups["roman"].Value);
    }
    else if (match.Groups["text"].Success)
    {
        Console.WriteLine("Pass {0} to Text parser", match.Groups["text"].Value);
    }
}

The output is:

Pass 1 to Arabic parser
Pass 2 to Arabic parser
Pass THREE to Text parser
Pass IV to Roman parser
Pass TWENTY TWO to Text parser
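For comparison, here is a sketch of the same dispatch in Python. It is a port, not the answerer's code. Every alternative holds exactly one named group, so Match.lastgroup reports which branch matched, and that replaces the if/else chain.

```python
import re

# Same pattern as the C# above; Python's re spells named groups (?P<name>...).
DISPATCH = re.compile(r"CHAPTER (?:(?P<arabic>\d+)|(?P<roman>[IVXLCDM]+)|(?P<text>[A-Z ]+))")

SAMPLE = """CHAPTER 1
CHAPTER 2
CHAPTER THREE
CHAPTER IV
CHAPTER TWENTY TWO"""

# Each alternative contains exactly one named group, so lastgroup names
# the branch that matched.
routes = [(m.lastgroup, m.group(m.lastgroup)) for m in DISPATCH.finditer(SAMPLE)]
for kind, value in routes:
    print(f"Pass {value} to {kind} parser")
```

As in the C# version, the roman character class accepts any run of those letters (IIII, VV and so on), so the Roman parser still has to reject invalid numerals.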
answered 2014-09-20T07:56:55.010

The regex for Roman numerals is: \bM{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b
The regex for digits is: \d+
The regex for words is: [A-Z ]+
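You can check that the Roman pattern accepts only well-formed numerals by running it against a few valid and invalid strings. The Python sketch below uses fullmatch in place of the two \b anchors, so each candidate is tested as a whole. The sample lists are illustrative.

```python
import re

# The answer's Roman-numeral pattern; fullmatch replaces the \b anchors.
ROMAN = re.compile(r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")


def is_roman(s: str) -> bool:
    # Every part of the pattern is optional, so the empty string would also
    # match; reject it explicitly.
    return bool(s) and ROMAN.fullmatch(s) is not None


valid = ["I", "IV", "IX", "XIV", "XL", "XCIX", "MCMXCIV", "MMMMCMXCIX"]
invalid = ["", "IIII", "IC", "VX", "XM", "MMMMM", "VV"]
print([s for s in valid if not is_roman(s)])   # []
print([s for s in invalid if is_roman(s)])     # []
```

M{0,4} caps the range at 4999 (MMMMCMXCIX).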

Putting it all together:

CHAPTER (?:(?<digits>\d+)|(?<roman>\bM{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b)|(?<literal>[A-Z ]+))
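One catch with the combined pattern: every part of the Roman branch is optional, so \b...\b can succeed on an empty string at the word boundary right after "CHAPTER ". The roman alternative is tried before literal, so a heading such as CHAPTER THREE ends the match with an empty roman group and never reaches literal. The Python sketch below demonstrates this. The same constructs should behave identically in .NET, and only the named-group syntax differs. The sketch also shows one possible guard, which is not part of the answer.

```python
import re

ROMAN = r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"

# The combined pattern as posted, with .NET's (?<name>...) written as (?P<name>...).
AS_POSTED = re.compile(
    r"CHAPTER (?:(?P<digits>\d+)|(?P<roman>\b" + ROMAN + r"\b)|(?P<literal>[A-Z ]+))"
)

# A possible guard (not part of the answer): (?=[MDCLXVI]) requires a Roman
# letter at the start, and (?!\w) forbids stopping in the middle of a word,
# so the roman group can no longer match an empty string.
GUARDED = re.compile(
    r"CHAPTER (?:(?P<digits>\d+)|(?P<roman>(?=[MDCLXVI])" + ROMAN + r"(?!\w))|(?P<literal>[A-Z ]+))"
)


def branch(pattern, heading):
    """Return (group name, captured text) for the named group that matched."""
    m = pattern.match(heading)
    for name in ("literal", "roman", "digits"):
        if m.group(name) is not None:
            return name, m.group(name)
    return None


for heading in ["CHAPTER 1", "CHAPTER THREE", "CHAPTER IV", "CHAPTER TWENTY TWO"]:
    print(heading, branch(AS_POSTED, heading), branch(GUARDED, heading))
```

With the guard, an invalid numeral such as IIII is no longer matched as Roman. Instead it falls through to the literal branch, where the text parser can reject it.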
answered 2014-09-20T09:45:34.067
CHAPTER (?:\d+|(?:XVIII|XVII|XIII|VIII|XIV|XVI|XII|III|VII|XV|VI|IV|XI|IX|XX|II|X|V|I)|(?:(?P<d>TWENTY|THIRTY|FORTY|FIFTY|SIXTY|SEVENTY|EIGHTY|NINETY)?(?(d)(?: (?:ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE))?|(?:ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN|ELEVEN|TWELVE|THIRTEEN|FOURTEEN|FIFTEEN|SIXTEEN|SEVENTEEN|EIGHTEEN|NINETEEN))))

(?P<d>...) is the Python/PCRE spelling of a named group; in .NET write (?<d>...).

Breakdown and explanation (written in free-spacing/verbose mode, where whitespace is ignored, # starts a comment, and the literal space is escaped as "\ "):

CHAPTER\                    # match "CHAPTER " literally
    (?:                     # then either:
        \d+                 # 1: digits
        |
        (?:                 # or 2: Roman numerals I to XVIII, plus XX (note: make sure to order them by length!)
            XVIII|XVII|XIII|VIII|XIV|XVI|XII|III|VII|XV|VI|IV|XI|IX|XX|II|X|V|I
        )
        |                   # or 3: words
        (?:
            (?P<d>          # first, one of the literals "TWENTY", "THIRTY", etc...
                TWENTY|THIRTY|FORTY|FIFTY|SIXTY|SEVENTY|EIGHTY|NINETY
            )?              # ...if possible
            (?(d)           # then, if the previous group matched...
                (?:\        # ...a space...
                    (?:     # ...and one of "ONE" to "NINE"
                        ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE
                    )
                )?          # ...if possible.
                |
                (?:         # otherwise, one of "ONE" to "NINETEEN"
                    ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN|ELEVEN|TWELVE|THIRTEEN|FOURTEEN|FIFTEEN|SIXTEEN|SEVENTEEN|EIGHTEEN|NINETEEN
                )
            )
        )
    )
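Two things in this pattern need checking. First, nothing forces the match to end at a word boundary, so an alternative that is a prefix of a longer word wins. For example, FOURTEEN is read as FOUR, because unlike the Roman list, the word list is not ordered by length. Second, XIX is missing from the Roman list, so CHAPTER XIX is read as XI. The Python sketch below compares the pattern with a variant that appends \b. The (?P<d>...) and (?(d)...) syntax runs as written in Python's re, and the \b variant is a suggestion, not part of the answer.

```python
import re

# Building blocks of the pattern above (EIGHTEEN spelled correctly).
ROMAN = "XVIII|XVII|XIII|VIII|XIV|XVI|XII|III|VII|XV|VI|IV|XI|IX|XX|II|X|V|I"
TENS = "TWENTY|THIRTY|FORTY|FIFTY|SIXTY|SEVENTY|EIGHTY|NINETY"
UNITS = "ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE"
TEENS = "TEN|ELEVEN|TWELVE|THIRTEEN|FOURTEEN|FIFTEEN|SIXTEEN|SEVENTEEN|EIGHTEEN|NINETEEN"

# The answer's pattern, unchanged apart from being assembled from the parts above.
ANSWER = re.compile(
    rf"CHAPTER (?:\d+|(?:{ROMAN})|(?:(?P<d>{TENS})?(?(d)(?: (?:{UNITS}))?|(?:{UNITS}|{TEENS}))))"
)
# Suggested variant: a trailing \b makes the engine backtrack until the
# chosen alternative ends at a word boundary.
BOUNDED = re.compile(ANSWER.pattern + r"\b")


def matched(pattern, heading):
    m = pattern.match(heading)
    return m.group(0) if m else None


for heading in ["CHAPTER FOURTEEN", "CHAPTER XIX", "CHAPTER TWENTY TWO"]:
    print(repr(matched(ANSWER, heading)), repr(matched(BOUNDED, heading)))
```

With \b, CHAPTER XIX no longer turns into XI; it simply fails to match. Adding XIX to the Roman list, among the three-letter alternatives, would cover it.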


answered 2014-09-20T07:38:34.910