python - What is wrong with my code? I am using Python to try data scraping

Question

I am trying to print the first 30 picks in the NBA draft. I am using the page http://nbadraft.net/2012mock_draft to get the information. When I run it, it says:

invalid syntax: python1.py, line 8, pos 28
File "/Users/seanyeh/Downloads/python1.py", line 8, in ?
  patFinderLink = re.compile(‘&lt;link rel.*href=”(.*)” />’)

Here is my code:

import urllib2
from BeautifulSoup import BeautifulSoup
# or if you're using BeautifulSoup4:
# from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://nbadraft.net/2012mock_draft').read())

patFinderLink = re.compile(‘&lt;link rel.*href=”(.*)” />’)

findPatLink = re.findall(patFinderLink,webpage)

listIterator = []
listIterator[:] = range(1,30)

for i in listIterator:
    print findPatLink[i]

score 3 · Accepted Answer

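You have some funny characters on this line (perhaps from cutting and pasting?). The quotes are typographic "curly" quotes rather than plain ASCII ' and ", which is what triggers the syntax error: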

 ‘&lt;link rel.*href=”(.*)” />’)

Also, I believe you are missing

 import re

in your code. I also get an error that webpage is undefined.
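Putting those three fixes together, a minimal sketch could look like the following. It is written for Python 3, and the webpage string is a made-up sample standing in for the downloaded page source, so the href values are assumptions rather than what the real page contains. The pattern is also made non-greedy here, which is a change from the original.

```python
import re

# Hypothetical sample standing in for the downloaded page source.
webpage = '''<head>
<link rel="stylesheet" href="/css/main.css" />
<link rel="shortcut icon" href="/favicon.ico" />
</head>'''

# Plain ASCII quotes; non-greedy groups so each match stops at the
# first closing quote instead of running to the end of the line.
pat_finder_link = re.compile(r'<link rel.*?href="(.*?)" />')
find_pat_link = pat_finder_link.findall(webpage)

# Slicing avoids an IndexError when there are fewer than 30 matches.
for link in find_pat_link[:30]:
    print(link)
```

Note that range(1,30) in the question yields 29 indices and skips index 0, so a slice such as [:30] is closer to "the first 30".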

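Since you are using BeautifulSoup, why not use it to extract the elements you are interested in? The whole idea of BeautifulSoup is to avoid "manual" parsing with string manipulation or regular expressions.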
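As a rough illustration of that approach, the sketch below pulls the href of every link tag with BeautifulSoup instead of a regex. It assumes Python 3 with the bs4 package installed, and the html string is a hypothetical sample, since the real page's markup is not shown in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's structure is not shown in the question.
html = '''<html><head>
<link rel="stylesheet" href="/css/main.css" />
<link rel="shortcut icon" href="/favicon.ico" />
</head><body></body></html>'''

soup = BeautifulSoup(html, "html.parser")

# find_all returns every <link> tag; href=True keeps only tags
# that actually carry an href attribute.
hrefs = [tag["href"] for tag in soup.find_all("link", href=True)]

for href in hrefs[:30]:
    print(href)
```

For the live page you would pass the downloaded source instead of the sample string, for example the result of urllib2.urlopen(...).read() as in the question (urllib.request.urlopen in Python 3). To get the draft picks themselves, you would search for whatever tags hold them on that page rather than for link tags.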
