python - URL parsing only works explicitly

Question

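I'm reading URLs from a .csv file and trying to parse them. Why do I only get correct values for scheme and netloc when I pass the link explicitly to urlparse(...) (see variable o2), and not when I pass newsource to urlparse?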

for line in file:
    source = str(line.split(",")[2])
    print("ORIGINAL URL: \n" + source)
    newsource = source.replace('"',"")
    print("REMOVING QUOTES: \n" + newsource)
    newsource.strip
    print("STRIPPING SPACES: \n" + newsource + "\n")
    o = urlparse(newsource)
    print("RESULT PARSING: " + str(o) + "\n")
    o2 = urlparse("http://nl.aldi.be/aldi_vlees_609.html")
    print("RESULT MANUAL PARSING: " + str(o2) + "\n")

Output:

score 1 · Accepted Answer

I can see from the failed parse that you have a leading space character, which causes the same problem you are seeing:

>>> urlparse.urlparse(' http://nl.aldi.be/aldi_vlees_609.html')
ParseResult(scheme='', netloc='', path=' http://nl.aldi.be/aldi_vlees_609.html', params='', query='', fragment='')

This line does nothing, because it references the strip method without calling it:

newsource.strip

You probably want:

newsource = newsource.strip()
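
To put the fix in context, here is a minimal sketch of the cleaning step followed by the parse. It assumes Python 3, where urlparse is imported from urllib.parse (the session above uses the Python 2 urlparse module). The helper name clean_url and the sample field are made up for illustration and are not from the question.

```python
from urllib.parse import urlparse  # Python 3 location; Python 2 uses the urlparse module


def clean_url(raw):
    """Hypothetical helper: drop double quotes, then trim surrounding whitespace."""
    # String methods return new strings, so the result must be returned
    # (or assigned back); calling them never modifies the original value.
    return raw.replace('"', "").strip()


# Made-up CSV field resembling the question's data: leading space, quotes, newline.
raw = ' "http://nl.aldi.be/aldi_vlees_609.html"\n'

parsed = urlparse(clean_url(raw))
print(parsed.scheme, parsed.netloc, parsed.path)
```

With the whitespace removed before parsing, scheme and netloc are filled in the same way as for the hard-coded URL in o2.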
