python - How to add a variable to URL parameters in urllib

Question

I'm trying to access this URL:

http://ichart.finance.yahoo.com/table.csv?s=GOOG&a=05&b=20&c=2013&d=05&e=28&f=2013&g=d&ignore=.csv

But instead of always using GOOG, I want to use whatever was entered into the variable tickers_list, as shown below.

When I do this, it works:

import urllib.request

URL = urllib.request.urlopen("http://ichart.finance.yahoo.com/table.csv?s=GOOG&a=05&b=20&c=2013&d=05&e=28&f=2013&g=d&ignore=.csv")
html = URL.read()
print (html)

But if I do this:

filename = input("Please enter file name to extract data from: ")
with open(filename) as f:
    data = f.readlines()    # Read the data from the file

tickers_list = []
for line in data:
    tickers_list.append(line)   # Separate tickers into individual elements in list

print (tickers_list[0]) # Check if printing correct ticker
url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % str(tickers_list[0])
print (url) # Check if printing correct URL

URL = urllib.request.urlopen(url)
html = URL.read()
print (html)

it gives me this error:

urllib.error.URLError: <urlopen error no host given>

Am I not doing the string formatting correctly?

score 2 · Accepted Answer

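The data you read from the file includes the newline character at the end of each line (.readlines() does not remove it). You should remove it yourself; str.strip() removes all leading and trailing whitespace, including the newline: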

import urllib.request

filename = input("Please enter file name to extract data from: ")
with open(filename) as f:
    tickers_list = f.readlines()    # .readlines() returns a list *already*

print(tickers_list[0].strip())
url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % tickers_list[0].strip()
print(url)

response = urllib.request.urlopen(url)
html = response.read()
print(html)
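To see the newline that .readlines() leaves behind without needing a real file or network access, here is a small sketch that uses io.StringIO as a stand-in for the ticker file (the file contents are made up for illustration):

```python
import io

# io.StringIO stands in for open(filename); the tickers are illustrative only
f = io.StringIO("GOOG\nAAPL\n")
tickers_list = f.readlines()

print(repr(tickers_list[0]))          # 'GOOG\n'  (newline still attached)
print(repr(tickers_list[0].strip()))  # 'GOOG'    (strip() removes it)
```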

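You don't need to call str() on the tickers_list[0] element, because reading from a file already produces a list of strings. Also, the %s formatting placeholder converts its value to a string if it isn't one already.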
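A quick check of that claim: formatting with and without an explicit str() call gives the same result, and non-string values are converted by %s as well. The query fragment below is a shortened, illustrative version of the question's URL.

```python
ticker = "GOOG"

with_str = "s=%s&g=d" % str(ticker)
without_str = "s=%s&g=d" % ticker
print(with_str == without_str)  # True

# %s also converts non-string values using str()
print("a=%s" % 5)  # a=5
```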

With the newline left in (shown as \n in the repr() output below), you get exactly the error you saw:

>>> url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % 'GOOG\n'
>>> print(repr(url))
'http://ichart.finance.yahoo.com/table.csv?s=GOOG\n&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv'
>>> urllib.request.urlopen(url)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/Libraries/buildout.python/parts/opt/lib/python3.3/urllib/request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/mj/Development/Libraries/buildout.python/parts/opt/lib/python3.3/urllib/request.py", line 467, in open
    req = meth(req)
  File "/Users/mj/Development/Libraries/buildout.python/parts/opt/lib/python3.3/urllib/request.py", line 1172, in do_request_
    raise URLError('no host given')
urllib.error.URLError: <urlopen error no host given>
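An alternative that the answer above does not use: build the query string with urllib.parse.urlencode instead of %-formatting. This does not strip the newline for you; it percent-encodes it as %0A, so the URL stays structurally valid but the ticker is still wrong, and you should still call .strip(). The helper name build_url is hypothetical, the parameter values are copied from the question's URL, and no network request is made.

```python
from urllib.parse import urlencode

BASE = "http://ichart.finance.yahoo.com/table.csv"

def build_url(ticker):
    """Hypothetical helper: build the question's URL for one ticker."""
    params = {
        "s": ticker,
        "a": "00", "b": "1", "c": "2011",
        "d": "05", "e": "28", "f": "2013",
        "g": "d", "ignore": ".csv",
    }
    # urlencode percent-encodes special characters; dicts keep insertion
    # order (Python 3.7+), so the parameters appear in the order given
    return BASE + "?" + urlencode(params)

print(build_url("GOOG"))
print(build_url("GOOG\n"))  # the newline becomes %0A instead of breaking the URL
```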

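If you only intend to process one line of the input file, read that line with f.readline() and save yourself the trouble of indexing into a list. You still need to strip the newline.

If you want to process all lines, just loop over the input file directly; this yields each line in turn, again with the newline still attached: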
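A minimal sketch of the single-line approach, again using io.StringIO in place of the real file (the contents are illustrative):

```python
import io

# io.StringIO stands in for the real ticker file
f = io.StringIO("GOOG\nAAPL\n")
ticker = f.readline().strip()  # read only the first line, then drop the newline

# Adjacent string literals are joined before % is applied
url = ("http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011"
       "&d=05&e=28&f=2013&g=d&ignore=.csv" % ticker)
print(url)
```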

with open(filename) as f:
    for ticker_name in f:
        ticker_name = ticker_name.strip()
        url = "http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011&d=05&e=28&f=2013&g=d&ignore=.csv" % ticker_name

        # etc.
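The loop above can be wrapped into a small generator. The helper name urls_for is hypothetical, and skipping blank lines (for example a trailing empty line in the file) is an extra safeguard not present in the original answer; io.StringIO again stands in for the file.

```python
import io

TEMPLATE = ("http://ichart.finance.yahoo.com/table.csv?s=%s&a=00&b=1&c=2011"
            "&d=05&e=28&f=2013&g=d&ignore=.csv")

def urls_for(lines):
    """Hypothetical helper: yield one URL per non-blank ticker line."""
    for ticker_name in lines:
        ticker_name = ticker_name.strip()
        if not ticker_name:  # skip blank lines
            continue
        yield TEMPLATE % ticker_name

# io.StringIO stands in for open(filename)
urls = list(urls_for(io.StringIO("GOOG\nAAPL\n\n")))
for url in urls:
    print(url)
```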

score 2 · Accepted Answer

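For manipulating URLs in Python, I suggest one of two libraries: furl or URLObject. Both give you a very nice interface for manipulating URLs easily.

An example from the furl documentation: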

>>> from furl import furl
>>> f = furl('http://www.google.com/?one=1&two=2')
>>> f.args['three'] = '3'
>>> del f.args['one']
>>> f.url
'http://www.google.com/?two=2&three=3'
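If you would rather not install a third-party package, roughly the same manipulation can be done with the standard library's urllib.parse. This is a sketch of an alternative, not part of furl's API: it round-trips the query through a list of key/value pairs.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

parts = urlsplit('http://www.google.com/?one=1&two=2')
args = parse_qsl(parts.query)                   # [('one', '1'), ('two', '2')]
args = [(k, v) for k, v in args if k != 'one']  # like: del f.args['one']
args.append(('three', '3'))                     # like: f.args['three'] = '3'
url = urlunsplit(parts._replace(query=urlencode(args)))
print(url)  # http://www.google.com/?two=2&three=3
```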
