python - Comma-separating large numbers with a regex in the South Asian numbering system

Question

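I am trying to find a regular expression that comma-separates a large number according to the South Asian numbering system.

A few examples:

1,000,000 (Arabic) is 10,00,000 (Indian/Hindu/South Asian)
1,000,000,000 (Arabic) is 100,00,00,000 (Indian/H/SA).

The comma pattern repeats every 7 digits. For example, 1,00,00,000,00,00,000.

From Friedl's book Mastering Regular Expressions, I have the following regex for the Arabic numbering system: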

r'(?<=\d)(?=(\d{3})+(?!\d))'
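For reference, a minimal sketch of how that pattern is applied with re.sub(). The names WESTERN_GROUPING and group_western are made up for illustration and do not come from the book.

```python
import re

# A position qualifies when it is preceded by a digit and followed by
# a multiple of three digits with no further digit after them.
WESTERN_GROUPING = re.compile(r'(?<=\d)(?=(\d{3})+(?!\d))')

def group_western(digits: str) -> str:
    """Insert a comma before every group of three digits, counting from the right."""
    return WESTERN_GROUPING.sub(',', digits)

print(group_western('1000000'))     # 1,000,000
print(group_western('1000000000'))  # 1,000,000,000
```

Both lookarounds are zero-width, so re.sub() only inserts commas and never consumes a digit.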

For the Indian numbering system, I came up with the following expression, but it doesn't work for numbers with more than 8 digits:

r'(?<=\d)(?=(((\d{2}){0,2}\d{3})(?=\b)))'

With the pattern above, I get 100000000,00,00,000.
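A short reproduction of that result, assuming a 16-digit input (the name ATTEMPT is only a label for this sketch). The lookahead accepts exactly 3, 5 or 7 digits before the word boundary, so no position further to the left can ever match.

```python
import re

# The attempted pattern: 0, 2 or 4 digits, then 3 digits, then a word boundary.
ATTEMPT = re.compile(r'(?<=\d)(?=(((\d{2}){0,2}\d{3})(?=\b)))')

sixteen_digits = '1' + '0' * 15
print(ATTEMPT.sub(',', sixteen_digits))  # 100000000,00,00,000

# Up to 8 digits the grouping is still correct.
print(ATTEMPT.sub(',', '10000000'))      # 1,00,00,000
```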

I am using the Python re module (re.sub()). Any ideas?

score 7 · Accepted Answer

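I know Tim has already answered the question you asked, but assuming you start with a number rather than a string, have you considered whether you need a regex at all? If the machine you're using supports an Indian locale, you can just use the locale module: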

>>> import locale
>>> locale.setlocale(locale.LC_NUMERIC, "en_IN")
'en_IN'
>>> locale.format_string("%d", 10000000, grouping=True)
'1,00,00,000'

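That interpreter session was copied from an Ubuntu system, but be aware that Windows systems may not support a suitable locale (mine, at least, doesn't). So while this is in some ways a "cleaner" solution, it may or may not be usable depending on your environment.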
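Since the locale may be missing, one way to guard against that is sketched below. The helper name format_with_locale is hypothetical. It returns None when the requested locale is not installed, and restores the previous LC_NUMERIC setting afterwards.

```python
import locale

def format_with_locale(number, locale_name="en_IN"):
    """Return number grouped per locale_name, or None if that locale is unavailable."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
    except locale.Error:
        return None  # locale not installed on this system
    try:
        return locale.format_string("%d", number, grouping=True)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)  # restore the previous setting

result = format_with_locale(10000000)
print(result if result is not None else "en_IN locale not available")
```

It is worth checking the output for numbers longer than nine digits, since a locale's grouping rule need not match the 7-digit repeat described in the question.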

score 6 · Accepted Answer

Try this:

(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))

For example:

>>> import re
>>> inp = ["1" + "0"*i for i in range(20)]
>>> [re.sub(r"(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))", ",", i) 
     for i in inp]
['1', '10', '100', '1,000', '10,000', '1,00,000', '10,00,000', '1,00,00,000', 
 '10,00,00,000', '100,00,00,000', '1,000,00,00,000', '10,000,00,00,000', 
 '1,00,000,00,00,000', '10,00,000,00,00,000', '1,00,00,000,00,00,000', 
 '10,00,00,000,00,00,000', '100,00,00,000,00,00,000', 
 '1,000,00,00,000,00,00,000', '10,000,00,00,000,00,00,000',
 '1,00,000,00,00,000,00,00,000']

The same regex with comments:

result = re.sub(
    r"""(?x)       # Enable verbose mode (comments)
    (?<=\d)        # Assert that we're not at the start of the number.
    (?=            # Assert that it's possible to match:
     (\d{2}){0,2}  # 0, 2 or 4 digits,
     \d{3}         # followed by 3 digits,
     (\d{7})*      # followed by 0, 7, 14, 21 ... digits,
     (?!\d)        # and no more digits after that.
    )              # End of lookahead assertion.""", 
    ",", subject)
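If the input is an integer rather than a string, the pattern could be wrapped in a small helper along these lines. This is only a sketch: the names SOUTH_ASIAN_GROUPING and group_south_asian are invented here, and it assumes integer input (a value with a long fractional part would need separate handling).

```python
import re

# Precompiled version of the pattern above.
SOUTH_ASIAN_GROUPING = re.compile(r"(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))")

def group_south_asian(number: int) -> str:
    """Format an integer with the 3,2,2 grouping that repeats every seven digits."""
    return SOUTH_ASIAN_GROUPING.sub(",", str(number))

print(group_south_asian(10**6))     # 10,00,000
print(group_south_asian(10**9))     # 100,00,00,000
print(group_south_asian(-10**14))   # -1,00,00,000,00,00,000
```

The lookbehind (?<=\d) is what keeps a comma from appearing directly after a minus sign.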
