I've been developing a Python script to download a CSV file from a web server. My usual approach is to right-click the web page, choose "Inspect Element" (in Chrome), switch to the Network tab, and then click the link to see what the traffic looks like. I was expecting to see something like "https://domain.com/file_i_need.csv", but what I got instead was the URL of a Perl script. Since I'm not familiar with exactly how this works, I just copied the curl command (right-click the relevant network request and choose "Copy as cURL"). So I initially just passed the curl command to `os.system()`. Once I got that working, I tried to modify the script to use pycurl. Now I'd like to switch to the requests library (mostly for elegance/neatness). I've seen this question answered, but I'm wondering if there's a different way of doing it, since the backend is slightly different than expected. I see that `urllib.urlretrieve()` is recommended as an alternative, but I'm guessing that won't work here.
Question: How can I download a file from a web server when the URL that generates the file points to a Perl script?
e.g. https://domain.com/file_maker.pl?param1=12345
curl command: `curl "https://release.domain.com/release_cr_new.pl?releaseid=26851&v=2&m=a&dump_csv=1" -H "Accept-Encoding: gzip,deflate,sdch" -H "Host: release.domain.com" -H "Accept-Language: en-US,en;q=0.8" -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" -H "Referer: https://release.domain.com/release_cr_new.html?releaseid=26851&v=2&m=a" -H "Cookie: releasegroup=Development; XR77=3q3pzeMQc1gf-jDlpNtkgr4WvZYqxVZSYzeQHfGAwMTAeZQ6D3g2e6w; __utma=147924903.423899313.1373397746.1378841205.1380290587.15; __utmc=147924903; __utmz=147924903.1380290587.15.14.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); pubcookie_s_release.domain.com=Hm17WT1VJbPpBLOQ+NhtyBbZlfO9qntsoGP0P8BEVeh4d0ay+THE3EkNLc6PV5rJ40Ui7uj/+c6f2tzZYWOJ/j+dyoP5l+J//rL875K9ERxio1FZeiUVRQgeabetZ+V1AWlrkjURmAw2SU1hEz/f2pCt0sHe06C14vWA95PFu1Smp6viWOL8QnaPHFWhGU3uQQH5Wxex0CziHbrYXHuKwnxwWejvVtTM8e8aIHkM2WuB3IIDhGMVtd0r292owvcv6Rvcl7tYSoQaQYfSpPZreXo4tNO9gh9ZIGqao8LaCfG5Fw8+Ow5wQKf2ryVuPc8Ah4MTIzC1UeZxBtxSTyZk5E1in7LCV9E+d/5G84U+ECcdn166gJg1iMG68II81YJO9fYs91gGtA5iUa6h3RpFo+ysBkqbHjCpetOUxfHh47sdr4nUoIWEb0LfKVTYfvmW6BNGx4m90PqE8aQlknv7zxqAQrujqe7h5zSpmaD5UjrfRwp7lYD+6e88vgQzLgWlcAA=; _session_id=eb0095f849a509c3cf65b43680b3002a; default_column_2=bugid%2Cloginname%2Ccomponent%2Cversionvalue%2Cbugdate%2Cshortdescription%2Cpriority%2Cstatus%2Cqacontact%2Csqa_status%2Cis_dep" -H "Connection: keep-alive"`
Sorry for the big block of text.
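
For reference, this is roughly what I imagine the requests version of the curl command above would look like. It's a minimal, untested sketch, assuming the Perl script only needs the query parameters, a couple of headers, and the session cookies. The function names `build_request` and `download_csv` and the `release.csv` filename are placeholders of mine, the shortened User-Agent is a guess at what the server will accept, and the cookie values would have to be copied from a logged-in browser session.

```python
import requests

# URL and query parameters taken from the curl command above.
URL = "https://release.domain.com/release_cr_new.pl"
PARAMS = {"releaseid": "26851", "v": "2", "m": "a", "dump_csv": "1"}

# Assumption: only a subset of the browser headers is actually required.
HEADERS = {
    "Referer": "https://release.domain.com/release_cr_new.html?releaseid=26851&v=2&m=a",
    "User-Agent": "Mozilla/5.0",
}


def build_request(cookies):
    """Prepare the GET request without sending it (handy for inspection)."""
    req = requests.Request(
        "GET", URL, params=PARAMS, headers=HEADERS, cookies=cookies
    )
    return req.prepare()


def download_csv(cookies, dest="release.csv"):
    """Send the prepared request and stream the response body to `dest`."""
    with requests.Session() as session:
        resp = session.send(build_request(cookies), stream=True, timeout=30)
        resp.raise_for_status()  # fail loudly on 4xx/5xx responses
        with open(dest, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=8192):
                fh.write(chunk)
    return dest
```

I'd call it as `download_csv({"_session_id": "...", "pubcookie_s_release.domain.com": "..."})`, with the real cookie values filled in. Since the server only sees an HTTP GET, it presumably shouldn't matter that a Perl script generates the file, as long as the cookies are still valid.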