Let's dive into this, shall we?
Ok, I need to write a script (I don't care what language, prefer something like Python or Javascript, but whatever works I will take time to learn). The script will access multiple URL's, extract text from each site and store it into a folder on my PC. (From there I am manipulating the data with Python, which I know how to do.)
EDIT:
Currently I am using python's NLTK
module. Here is a simple version of my code:
url = "<URL HERE>"
html = urlopen(url).read()
raw = nltk.clean_html(html)
print(raw)
This code works fine for both http and https, but not for instances where authentication is required.
Is there a Python module which deals with secure authentication?
Thanks in advance for help! And to the mods who will view this as a bad question, please just give me ways to make it better. I need ideas..from people, not Google.