python - Convert a string containing Roman numerals to its integer equivalent

Question

I have the following string:

str = "MMX Lions Television Inc"

I need to convert it to:

conv_str = "2010 Lions Television Inc"

I have the following function that converts a Roman numeral to its integer equivalent:

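# Value/symbol pairs, largest first. tuple() keeps the table reusable on
# Python 3, where zip() returns a one-shot iterator.
numeral_map = tuple(zip(
    (1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1),
    ('M', 'CM', 'D', 'CD', 'C', 'XC', 'L', 'XL', 'X', 'IX', 'V', 'IV', 'I')
))

def roman_to_int(n):
    n = n.upper()  # n is expected to be a string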

    i = result = 0
    for integer, numeral in numeral_map:
        while n[i:i + len(numeral)] == numeral:
            result += integer
            i += len(numeral)
    return result

How would I use re.sub here to get the correct string?

(Note: I tried using the regex described here: How do you match only valid roman numbers with a regular expression? but it did not work.)

score 7 · Accepted Answer

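When looking for a common function or library, be sure to check the Python Package Index.

Here is the list of modules related to the keyword 'roman'.

For example, 'romanclass' has a class that implements the conversion. Quoting the documentation: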

So a programmer can say:

>>> import romanclass as roman

>>> two = roman.Roman(2)

>>> five = roman.Roman('V')

>>> print (two+five)

and the computer will print:

VII

score 2 · Accepted Answer

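re.sub() can accept a function as the replacement. The function receives a single argument, the Match object, and should return the replacement string. You already have a function that converts a Roman numeral string to an int, so this is not difficult.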
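As a minimal illustration of a function replacement, independent of Roman numerals, the sketch below doubles every integer in a string. The names `double_repl` and `text` are made up for this example and it assumes Python 3.

```python
import re

def double_repl(match):
    # match is a Match object; group(0) is the full matched text
    return str(int(match.group(0)) * 2)

text = "3 apples and 12 pears"
doubled = re.sub(r'\d+', double_repl, text)
print(doubled)  # 6 apples and 24 pears
```

The function is called once per match, and whatever string it returns is spliced into the result in place of that match.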

In your case, you need a function like this:

def roman_to_int_repl(match):
    return str(roman_to_int(match.group(0)))

Now you can modify the regex from the question you linked so that it finds matches within a larger string:

import re

s = "MMX Lions Television Inc"
regex = re.compile(r'\b(?=[MDCLXVI]+\b)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b')
print(regex.sub(roman_to_int_repl, s))
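For anyone on Python 3, the pieces can be put together roughly as follows. This is a sketch rather than the original poster's exact code: it assumes the input is already a string (so the `unicode()` call is dropped) and stores the numeral table as a tuple named `NUMERAL_MAP`, a name chosen here, so that it can be iterated on every call.

```python
import re

# Value/symbol pairs, largest first (same table as in the question)
NUMERAL_MAP = (
    (1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'), (100, 'C'),
    (90, 'XC'), (50, 'L'), (40, 'XL'), (10, 'X'), (9, 'IX'),
    (5, 'V'), (4, 'IV'), (1, 'I'),
)

def roman_to_int(n):
    n = n.upper()
    i = result = 0
    for integer, numeral in NUMERAL_MAP:
        # Consume this symbol as many times as it appears at position i
        while n[i:i + len(numeral)] == numeral:
            result += integer
            i += len(numeral)
    return result

def roman_to_int_repl(match):
    # Called by re.sub for each match; must return a string
    return str(roman_to_int(match.group(0)))

regex = re.compile(
    r'\b(?=[MDCLXVI]+\b)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b'
)

conv_str = regex.sub(roman_to_int_repl, "MMX Lions Television Inc")
print(conv_str)  # 2010 Lions Television Inc
```

"Lions" and "Inc" are left alone because the lookahead requires the whole word to consist of the uppercase letters M, D, C, L, X, V and I.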

Here is a version of the regex that will not replace "LLC" in a string:

regex = re.compile(r'\b(?!LLC)(?=[MDCLXVI]+\b)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b')
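To see what the extra `(?!LLC)` lookahead changes, the sketch below runs both patterns with a stand-in replacement function that only brackets whatever was matched. The names `mark`, `plain` and `no_llc` are invented for this comparison. Without the lookahead, the pattern finds a zero-length match at the start of "LLC" (every group can match nothing and `\b` still holds there), which a converting replacement function would turn into "0".

```python
import re

def mark(match):
    # Stand-in for the real converter: show exactly what was matched
    return '<' + match.group(0) + '>'

body = r'M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b'
plain = re.compile(r'\b(?=[MDCLXVI]+\b)' + body)
no_llc = re.compile(r'\b(?!LLC)(?=[MDCLXVI]+\b)' + body)

s = "MMX Lions LLC"
with_plain = plain.sub(mark, s)
with_no_llc = no_llc.sub(mark, s)
print(with_plain)   # <MMX> Lions <>LLC
print(with_no_llc)  # <MMX> Lions LLC
```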

You can also use the original regex together with a modified replacement function:

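def roman_to_int_repl(match):
    numeral = match.group(0)
    exclude = set(["LLC"])   # add any other strings you don't want to replace
    # For words such as "LLC" that are not valid numerals, the original regex
    # produces an empty match at the start of the word, so pass those through too.
    if not numeral or numeral in exclude:
        return numeral
    return str(roman_to_int(numeral))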
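A different way to get the same effect, not taken from the answer above, is to match every word made up of Roman-numeral letters and decide inside the replacement function whether to convert it. The sketch below assumes Python 3; `CANDIDATE`, `STRICT`, `EXCLUDE` and `convert_word` are hypothetical names, and "MIX" is just an example of a valid numeral (1009) that one might want to keep as a word.

```python
import re

NUMERAL_MAP = (
    (1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'), (100, 'C'),
    (90, 'XC'), (50, 'L'), (40, 'XL'), (10, 'X'), (9, 'IX'),
    (5, 'V'), (4, 'IV'), (1, 'I'),
)
CANDIDATE = re.compile(r'\b[MDCLXVI]+\b')  # any word of numeral letters
STRICT = re.compile(r'M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})')
EXCLUDE = {"MIX"}  # valid numerals that should stay as words

def roman_to_int(n):
    i = result = 0
    for integer, numeral in NUMERAL_MAP:
        while n[i:i + len(numeral)] == numeral:
            result += integer
            i += len(numeral)
    return result

def convert_word(match):
    word = match.group(0)
    # Keep excluded words and anything that is not a well-formed numeral
    if word in EXCLUDE or not STRICT.fullmatch(word):
        return word
    return str(roman_to_int(word))

result = CANDIDATE.sub(convert_word, "MMX MIX Studios LLC")
print(result)  # 2010 MIX Studios LLC
```

Because the candidate pattern always matches a whole word, the replacement function sees "LLC" itself and can reject it by validation, with no need to list it in the exclusion set.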
