I tried the Python code from Rasha Ashraf's article "Scraping EDGAR with Python". Yesterday I got help from the great developers here, with special thanks to Jack Fleeting. The questions related to this problem are:
Text Scraping (from EDGAR 10K Amazon) code not working
word count from web text document result in 0
Here is the second Python program from the same article. It is still not working, I suppose because the article's Python version differs from mine.
The first error I hit is "TypeError: a bytes-like object is required, not 'str'". I searched Stack Overflow and tried one fix after another, but each time one error message went away, another appeared. After several improvised code changes, "print(element4)" printed "None", which is not the result the author intended.
My attempts to correct the original code did not work, so I am posting the original code and the first error message here. Once the initial error is solved, I will work through the second, the third, and so on.
I usually work with numeric and categorical variables in CSV files with Python, so this web scraping program (especially the part that collects URLs) is beyond my ability for now. Please help me get a value for "element4" other than "None", so that I can obtain the proper path of Amazon's 10-K filing for 2013.
import time
import csv
import sys
from urllib.request import urlopen

CIK = '1018724'
Year = '2013'
FILE = '10-K'

# Get the Master Index File for the given Year
url = 'https://www.sec.gov/Archives/edgar/full-index/%s/QTR1/master.idx' % (Year)
response = urlopen(url)

string_match1 = 'edgar/data/'
element2 = None
element3 = None
element4 = None

# Go through each line of the master index file and find given CIK # and File (10-K)
# and extract the text file path
for line in response:
    if CIK in line and FILE in line:
        for element in line.split(' '):
            if string_match1 in element:
                element2 = element.split('|')
                for element3 in element2:
                    if string_match1 in element3:
                        element4 = element3

print(element4)

### The path of the 10-K filing
url3 = 'https://www.sec.gov/Archives/' + element4
--- Error Message ---
TypeError                                 Traceback (most recent call last)
<ipython-input-25-8b7ded22bf96> in <module>
     25
     26 for line in response:
---> 27     if CIK in line and FILE in line:
     28         for element in line.split(' '):
     29             if string_match1 in element:

TypeError: a bytes-like object is required, not 'str'
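
For reference, here is a minimal sketch of what I understand the cause to be. In Python 3, iterating over the object returned by `urlopen` yields `bytes`, while `CIK` and `FILE` are `str`, so the `in` test raises the TypeError above. Decoding each line first should avoid it. The helper name `find_filing_path`, the `latin-1` encoding, and the sample index line (including its date and accession number) are my own assumptions for illustration and are not from the article. The sketch also compares whole pipe-separated fields instead of the article's substring test, which is a deliberate change so that a form type such as "10-K/A" does not match "10-K".

```python
def find_filing_path(lines, cik, form, marker="edgar/data/"):
    """Return the filing path from the last index line matching cik and form.

    `lines` is any iterable of bytes or str lines, for example the response
    object returned by urllib.request.urlopen().
    """
    path = None
    for raw in lines:
        # urlopen() yields bytes in Python 3; decode before comparing with str.
        # latin-1 is an assumption: it maps every byte, so decoding never fails.
        line = raw.decode("latin-1") if isinstance(raw, bytes) else raw
        fields = line.strip().split("|")
        # Data rows are assumed to look like:
        # CIK|Company Name|Form Type|Date Filed|Filename
        if len(fields) != 5:
            continue
        if fields[0] == cik and fields[2] == form and marker in fields[4]:
            path = fields[4]
    return path


# Made-up sample lines (placeholder date and accession number), used here
# instead of a network request so the sketch runs offline.
sample = [
    b"CIK|Company Name|Form Type|Date Filed|Filename\n",
    b"1018724|AMAZON COM INC|10-K|2013-01-01|edgar/data/1018724/0000000000-13-000000.txt\n",
]

print(find_filing_path(sample, "1018724", "10-K"))
```

With the real index, the call would presumably be `find_filing_path(urlopen(url), CIK, FILE)`. The function returns `None` when no line matches, so that case should be checked before building `url3`.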