python - Why isn't my Python script returning the page source correctly?

Question

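I just wrote a script that is meant to go through the alphabet and find all unclaimed four-letter Twitter names (really just for practice, as I'm new to Python). I have written several earlier scripts that use 'urllib2' to fetch a website's HTML from a URL, but this time it doesn't seem to work. Here is my script: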

import urllib2

src=''
url=''
print "finding four-letter @usernames on twitter..."
d_one=''
d_two=''
d_three=''
d_four=''
n_one=0
n_two=0
n_three=0
n_four=0
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

while (n_one > 26):
    while(n_two > 26):
        while (n_three > 26):
            while (n_four > 26):
                d_one=letters[n_one]
                d_two=letters[n_two]
                d_three=letters[n_three]
                d_four=letters[n_four]
                url = "twitter.com/" + d_one + d_two + d_three + d_four

                src=urllib2.urlopen(url)
                src=src.read()
                if (src.find('Sorry, that page doesn’t exist!') >= 0):
                    print "nope"
                    n_four+=1
                else:
                    print url
                    n_four+=1
            n_three+=1
            n_four=0
        n_two+=1
        n_three=0
        n_four=0
    n_one+=1    
    n_two=0
    n_three=0
    n_four=0

Running this code returned the following error:

SyntaxError: Non-ASCII character '\xe2' in file name.py on line 29, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

After visiting that link and doing some additional searching, I added the following line to the top of the file:

# coding: utf-8

Now, although it no longer returns an error, nothing seems to happen. I added the line

print src

which should have printed the HTML of each URL, but when I ran it nothing happened. Any advice would be greatly appreciated.

score 5 · Accepted Answer

You can get rid of the excessive nesting by using itertools.product:

from itertools import product
for d_one, d_two, d_three, d_four in product(letters, repeat=4):
    ...

Instead of defining a list of letters, you can use string.ascii_lowercase.
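Putting the two suggestions together, a minimal sketch of the name generation might look like the following. The helper name `candidate_names` and its parameters are made up for illustration; only `itertools.product` and `string.ascii_lowercase` come from the answer.

```python
from itertools import product
import string


def candidate_names(length=4, alphabet=string.ascii_lowercase):
    """Yield every name of the given length over the alphabet, in order.

    `candidate_names` is a hypothetical helper, not part of any library.
    """
    for combo in product(alphabet, repeat=length):
        # product yields tuples such as ('a', 'a', 'a', 'b'); join them into 'aaab'
        yield "".join(combo)


names = candidate_names()
first_name = next(names)   # the generator is lazy, so nothing is built up front
second_name = next(names)
```

Because the generator is lazy, the 26**4 combinations are produced one at a time instead of being held in memory.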

You should tell urlopen which protocol you are using (http):

url = "http://twitter.com/" + d_one + d_two + d_three + d_four

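Also, when you do request a page that doesn't exist, urlopen raises an HTTPError for the 404, so you should check for that rather than looking at the page text.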
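A rough sketch of the 404 check, assuming the site really does answer missing pages with a 404 status as described above. It uses the Python 3 module names; in the question's Python 2 the same `urlopen` and `HTTPError` live in `urllib2`. The function name `page_exists` and its `opener` parameter are hypothetical, added only so the logic can be tried without touching the network.

```python
from urllib.request import urlopen   # Python 2: urllib2.urlopen
from urllib.error import HTTPError   # Python 2: urllib2.HTTPError


def page_exists(url, opener=urlopen):
    """Return True if the URL opens, False if the server answers 404.

    `opener` is a hypothetical hook so a fake can stand in for urlopen.
    """
    try:
        response = opener(url)
    except HTTPError as err:
        if err.code == 404:
            return False
        raise  # other HTTP errors (for example 429 or 500) do not mean "missing"
    response.close()
    return True
```

Used inside the product loop, this would be called as `page_exists("http://twitter.com/" + name)`, printing the URL only when the result is False. Hammering a site with hundreds of thousands of requests will likely trigger rate limiting, which is one reason non-404 errors are re-raised instead of being treated as "unclaimed".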

score 1 · Accepted Answer

Well, you initialize n_one=0 and then loop with while (n_one > 26). The first time Python reaches it, it evaluates while (0 > 26), which is obviously false, so it skips the whole loop.

As gnibbler's answer tells you, there are cleaner ways to do the looping anyway.
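To see the difference, here is a small sketch comparing the question's condition with the presumably intended one. The counter names `skipped` and `iterations` are illustrative; the bound of 26 matches the length of the letters list, whose valid indices are 0 through 25.

```python
# The question's condition: 0 > 26 is False, so the body never runs.
skipped = 0
n_one = 0
while n_one > 26:
    skipped += 1
    n_one += 1

# The presumably intended condition: run while the index is still valid.
iterations = 0
n_one = 0
while n_one < 26:
    iterations += 1
    n_one += 1
```

After this runs, `skipped` is still 0 while `iterations` is 26, which is why the original script exits silently without fetching anything.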
