python-3.x - Scraping EDGAR with Python codes (Program 2) not working

Question

I tried the Python code from Rasha Ashraf's article "Scraping EDGAR with Python". Yesterday I got help from the great developers here; special thanks to Jack Fleeting. The questions related to this problem are:

Text Scraping (from EDGAR 10K Amazon) code not working

word count from web text document result in 0

Here is the second Python program from the same article, and it is still not working, I suppose because of the Python version difference.

The first error I hit is "TypeError: a bytes-like object is required, not 'str'". I searched StackOverflow and tried one fix after another, but each time one error message went away, another appeared. After several improvised code changes, "print(element4)" showed "None", which is not the result the author intended.

My own attempts to correct the original code did not work, so I am posting the original code and the first error message here. Once you help me solve this initial error, I will keep working on the second, the third, and so on.

I usually work with numeric and categorical variables in CSV files with Python, so this web-scraping program (especially collecting and handling URLs) is beyond my ability for now. Please help me get a result for "element4" other than "None", so that I can obtain the proper path of Amazon's 10-K filing for 2013.

import time
import csv
import sys

CIK = '1018724'
Year= '2013'
FILE= '10-K'

# Get the Master Index File for the given Year
url='https://www.sec.gov/Archives/edgar/full-index/%s/QTR1/master.idx'%(Year)

from urllib.request import urlopen
response= urlopen(url)

string_match1= 'edgar/data/'
element2 = None
element3 = None
element4 = None

# Go through each line of the master index file and find given CIK # and File (10-K)
# and extract the text file path
for line in response:
    if CIK in line and FILE in line:
        for element in line.split(' '):
            if string_match1 in element:
                element2 = element.split('|')
                for element3 in element2:
                    if string_match1 in element3:
                        element4 = element3

print(element4)

### The path of the 10-K filing
url3 = 'https://www.sec.gov/Archives/'+element4

--- Error Message ---

TypeError                                 Traceback (most recent call last)
<ipython-input-25-8b7ded22bf96> in <module>
     25
     26 for line in response:
---> 27     if CIK in line and FILE in line:
     28         for element in line.split(' '):
     29             if string_match1 in element:

TypeError: a bytes-like object is required, not 'str'

score 0 · Accepted Answer

I believe this is what you are looking for:

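import requests
import csv

CIK = '1018724'
Year= '2013'
FILE= '10-K'
url='https://www.sec.gov/Archives/edgar/full-index/%s/QTR1/master.idx'%(Year)

# req.text is already a decoded str, so no bytes/str mismatch occurs
req = requests.get(url)
# master.idx rows are pipe-delimited; csv.reader turns each row into a list of fields
targets = csv.reader(req.text.splitlines(), delimiter='|')
for line in targets:
    # 'in' on a list matches whole fields, so CIK and FILE must each equal a field exactly
    if CIK in line and FILE in line:
        # the last field of a matching row is the filing's path
        print("https://www.sec.gov/Archives/"+line[-1])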

Output:

https://www.sec.gov/Archives/edgar/data/1018724/0001193125-13-028520.txt
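
As for the TypeError in the question: in Python 3, iterating over the object returned by urlopen() yields bytes, so a test like `CIK in line` mixes str and bytes. If you would rather keep the urllib approach, here is a minimal sketch of the same lookup that decodes each line first. The function name `find_filing_paths`, the latin-1 decoding, and the assumption that data rows have five pipe-separated fields (CIK, company name, form type, date filed, filename) are mine, not from the article, and the sample rows are hand-written for illustration rather than copied from the real index.

```python
def find_filing_paths(lines, cik, form_type):
    """Return filing paths for one CIK and form type from master.idx lines.

    Hypothetical helper: accepts an iterable of bytes or str lines.
    """
    paths = []
    for raw in lines:
        # urlopen() yields bytes in Python 3; decode before comparing with str.
        # latin-1 is an assumption; it never raises on arbitrary bytes.
        line = raw.decode("latin-1") if isinstance(raw, bytes) else raw
        fields = line.strip().split("|")
        # Assumed row layout: CIK|Company Name|Form Type|Date Filed|Filename
        if len(fields) != 5:
            continue  # skip preamble, separator, and blank lines
        # Compare whole fields so that other CIKs or form types such as
        # 10-K/A do not match by accident.
        if fields[0] == cik and fields[2] == form_type:
            paths.append(fields[4])
    return paths


# Hand-written sample rows for illustration only (the second and third
# rows are made up; they are not real filings).
sample = [
    b"CIK|Company Name|Form Type|Date Filed|Filename\n",
    b"--------------------------------------------\n",
    b"1018724|AMAZON COM INC|10-K|2013-01-30|edgar/data/1018724/0001193125-13-028520.txt\n",
    b"1018724|AMAZON COM INC|8-K|2013-01-29|edgar/data/1018724/0000000000-13-000000.txt\n",
    b"11018724|EXAMPLE CO|10-K/A|2013-02-01|edgar/data/11018724/0000000000-13-000001.txt\n",
]

for path in find_filing_paths(sample, "1018724", "10-K"):
    print("https://www.sec.gov/Archives/" + path)
```

With a live download you would pass the response itself, e.g. `find_filing_paths(urlopen(url), CIK, FILE)`. Note that sec.gov may reject requests that do not send a descriptive User-Agent header, so you may need to set one on either the urllib or the requests call.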
