python - Advanced parsing from string to int in Python

Question

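I need to get some data from wikipedia.org. I have the string a = '4 200 000+ articles' and I need to get the int b = 4200000. I already extracted this string with BS4 and tried to parse it simply with int(a), but obviously that doesn't work. Can you help me?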
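To make the failure concrete, here is a minimal sketch (assuming Python 3) of what int(a) does with that string. The name error_message is hypothetical and only used for illustration:

```python
a = '4 200 000+ articles'

try:
    b = int(a)
except ValueError as exc:
    # int() rejects the inner spaces, the '+' and the trailing word
    error_message = str(exc)
    print(error_message)  # invalid literal for int() with base 10: '4 200 000+ articles'

# Only a bare run of digits (optionally surrounded by whitespace) parses directly
print(int('4200000'))  # 4200000
```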

score 1 · Accepted Answer

You need a regular expression to pull the number out of text like this:

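import re

# A digit followed by zero or more digits or spaces (raw string avoids '\d' escape warnings)
int_numbers = re.compile(r'\d[\d ]*')

def extract_integer(text):
    """Return the first space-separated integer found in text, or None."""
    value_match = int_numbers.search(text)
    if value_match:
        try:
            return int(value_match.group().replace(' ', ''))
        except ValueError:
            # failed to create an int, ignore
            pass
    return None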

The pattern matches one digit followed by zero or more digits or spaces.
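To see exactly what that pattern captures before the spaces are stripped, here is a small sketch (assuming Python 3; samples and raw_matches are hypothetical names for illustration) that prints the raw match text for a few inputs:

```python
import re

int_numbers = re.compile(r'\d[\d ]*')

samples = ['4 200 000+ articles', '4 300 123 times 42', '1 2 3', 'no digits here']

# Collect the raw text of every match, spaces included
raw_matches = {text: [m.group() for m in int_numbers.finditer(text)] for text in samples}

for text, groups in raw_matches.items():
    print(repr(text), '->', groups)
```

Note that '4 300 123 times 42' yields '4 300 123 ' with a trailing space, which is why the answer calls .replace(' ', '') before int(). The '1 2 3' case also shows a limitation: numbers separated by single spaces are merged into one match.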

Demo:

>>> a = '4 200 000+ articles'
>>> extract_integer(a)
4200000

If you need all of the numbers in the input text, use .finditer() and a generator:

def extract_integers(text):
    for value_match in int_numbers.finditer(text):
        try:
            yield int(value_match.group().replace(' ', ''))
        except ValueError:
            # failed to create an int, ignore
            pass

Demo:

>>> for i in extract_integers('4 300 123 times 42'):
...     print(i)
...
4300123
42
>>> list(extract_integers('4 300 123 times 42'))
[4300123, 42]
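If a list is all you need, a compact variant is a single list comprehension over .finditer(). This is a sketch assuming Python 3; extract_integers_list is a hypothetical name, not part of the answer above. Because each match contains only \d characters and spaces, int() should not raise here, so the try/except is dropped:

```python
import re

int_numbers = re.compile(r'\d[\d ]*')

def extract_integers_list(text):
    # hypothetical list-returning variant of extract_integers()
    # each match holds only digits and spaces, so stripping spaces leaves a valid int literal
    return [int(m.group().replace(' ', '')) for m in int_numbers.finditer(text)]

print(extract_integers_list('4 300 123 times 42'))  # [4300123, 42]
```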

score 0 · Accepted Answer

If you only want to strip out everything except the digits, you can use the following:

>>> x = "500000+"
>>> import string
>>> all=string.maketrans('','')
>>> nodigs=all.translate(all, string.digits)
>>> x.translate(all, nodigs)

This removes every character from the string except the digits 0-9.
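The same "keep only 0-9" idea can be written as a generator expression. This is a sketch assuming Python 3; keep_digits is a hypothetical helper name. It checks membership in string.digits rather than using str.isdigit(), because isdigit() also accepts characters such as '²' that int() rejects:

```python
import string

def keep_digits(text):
    # hypothetical helper: drop every character that is not one of 0-9
    return ''.join(ch for ch in text if ch in string.digits)

print(keep_digits('500000+'))                    # 500000
print(int(keep_digits('4 200 000+ articles')))   # 4200000
print(keep_digits('4 300 123 times 42'))         # 430012342
```

The last line shows the trade-off of this approach: when the text contains more than one number, all of their digits are concatenated into one.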

score 0 · Accepted Answer

>>> import re
>>> a = re.findall(r'[\d ]+', '4 200 000+ articles')
>>> print(a)
['4 200 000', ' ']
>>> [x.replace(' ','') for x in a if x.strip()]
['4200000']
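Wrapped into a reusable function that returns ints, the findall approach might look like this. It is a sketch assuming Python 3; extract_all_findall is a hypothetical name, and it uses the same r'[\d ]+' pattern as the session above:

```python
import re

def extract_all_findall(text):
    # hypothetical helper built on the same r'[\d ]+' pattern
    chunks = re.findall(r'[\d ]+', text)
    # skip whitespace-only chunks, then remove spaces and convert to int
    return [int(c.replace(' ', '')) for c in chunks if c.strip()]

print(extract_all_findall('4 200 000+ articles'))  # [4200000]
print(extract_all_findall('4 300 123 times 42'))   # [4300123, 42]
```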
