python - Python: extracting a number from the results

Question

I am writing a Python script to automatically pull ratings from IMDb, but I cannot extract the number from the results.

from pattern.web import URL
from pattern.web import plaintext
from pattern.web import decode_utf8
import re

def scrape_imdb(film):
    url = URL (film)
    s=url.download()
    decode_utf8(url.download(s))
    regels=re.compile(('"ratingValue">[0-9].[0-9]'))
    rating= regels.findall(s)
    rating2= rating[0:1]
    rating3= rating2.findall("[0-9"])

    regels2=re.compile ("<title>.*</title>")
    titel=regels2.findall(s)
    print titel, rating2

But this gives me an error. Does anyone know what I am doing wrong?

score 3 · Accepted Answer

As you wrote in a comment on another answer:

I still get: AttributeError: 'list' object has no attribute 'findall'

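So that appears to be your problem. re.findall (like the findall method of a compiled pattern) returns a list of matches, so rating is a list. When you write rating2 = rating[0:1], you assign a sublist to rating2, so rating2 is itself a list, albeit one with a single element. Lists have no findall method, hence the failure.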
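The difference between slicing and indexing can be checked in isolation. This is a minimal sketch in Python 3 syntax (the question uses Python 2); the input string and the names matches, sliced and first are made up for illustration and do not come from the question.

```python
import re

# Hypothetical input text; any string containing a digit.digit pattern works.
matches = re.findall(r"[0-9]\.[0-9]", "rated 8.3 by users")

sliced = matches[0:1]  # slicing a list yields another list
first = matches[0]     # indexing yields the element itself, a string

print(type(sliced).__name__, sliced)
print(type(first).__name__, first)

# A list has no findall attribute, which is what triggers the AttributeError.
print(hasattr(sliced, "findall"))
```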

What you probably want is to run another regular expression on the first result in rating:

rating = regels.findall(s)
rating2 = rating[0] # only get the first element; a string
rating3 = re.findall("[0-9]", rating2)
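As an alternative to running a second regular expression, a capture group can pull the number out in a single pass. The following is only a sketch in Python 3 syntax: sample_html is a made-up fragment standing in for the downloaded page, and extract_rating and extract_title are hypothetical helper names, not part of pattern.web or the original script. It also escapes the dot so that it matches a literal decimal point rather than any character.

```python
import re

# Hypothetical HTML fragment standing in for the downloaded page.
sample_html = (
    '<title>Some Film (2010) - IMDb</title>'
    '<span itemprop="ratingValue">8.3</span>'
)

def extract_rating(html):
    """Return the rating as a float, or None if no rating is found."""
    # The parentheses capture only the digits, not the surrounding markup.
    match = re.search(r'"ratingValue">([0-9]\.[0-9])', html)
    if match is None:
        return None
    return float(match.group(1))

def extract_title(html):
    """Return the text between <title> tags, or None if absent."""
    match = re.search(r"<title>(.*?)</title>", html)
    return match.group(1) if match else None

print(extract_title(sample_html), extract_rating(sample_html))
```

Checking for None before calling group avoids a second AttributeError on pages where the pattern does not match.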

score 0 · Accepted Answer

I believe you have a typo here:

rating3= rating2.findall("[0-9"])

It should be:

rating3= rating2.findall("[0-9]")
