3

Let's dive into this, shall we?

Ok, I need to write a script (I don't care what language, prefer something like Python or Javascript, but whatever works I will take time to learn). The script will access multiple URL's, extract text from each site and store it into a folder on my PC. (From there I am manipulating the data with Python, which I know how to do.)

EDIT: Currently I am using python's NLTK module. Here is a simple version of my code:

url  = "<URL HERE>"
html = urlopen(url).read()
raw = nltk.clean_html(html)
print(raw)

This code works fine for both http and https, but not for instances where authentication is required.

Is there a Python module which deals with secure authentication?

Thanks in advance for help! And to the mods who will view this as a bad question, please just give me ways to make it better. I need ideas..from people, not Google.

4

1 回答 1

1

Mechanize ( 2 ) 是一种选择,其他只是使用 urllib2

于 2013-08-08T19:51:50.083 回答