问题
我正在关注本教程https://www.youtube.com/watch?v=eTz3VZmNPSE&list=PLxEus0qxF0wciRWRHIRck51EJRiQyiwZT&index=16
当代码返回我的这个错误时。
目标
我需要抓取一个看起来像这样的pdf(我想附上pdf,但我不知道如何):
170001WO01
English (US) into Arabic (DZ)
Trans./Edit/Proof. 22.117,00 Words 1,350 29.857,95
TM - Fuzzy Match 2.941,00 Words 0,500 1.470,50
TM - Exact Match 353,00 Words 0,100 35,30
方法
我正在按照前面提到的 pdfplumber 教程进行操作。
import re
import pdfplumber
import PyPDF2
import pandas as pd
from collections import namedtuple
ap = open('test.pdf', 'rb')
我将我想要作为最终产品的数据框列命名。
Serv = namedtuple('Serv', 'case_number language num_trans num_fuzzy num_exact')
问题
与有 2 个的教程示例相比,我有 5 个不同的行。
case_li = re.compile(r'(\d{6}\w{2}\d{2})')
language_li = re.compile(r'(nglish \(US\) into )(.*)')
trans_li = re.compile(r'(Trans./Edit/Proof. )(\d{2}\.\d{3})')
fuzzy_li = re.compile(r'(TM - Fuzzy Match )(\d{1}\.\d{3})')
exact_li = re.compile(r'(M - Exact Match )(\d{3})')
问题
当我在代码中引入第三行时,出现了一个我不知道的错误。我已按照 2e0byo 的建议修改了代码,但仍然出现错误。
这是新代码:
line_items = []
with pdfplumber.open(ap) as pdf:
page = pdf.pages
for page in pdf.pages:
text = page.extract_text()
for line in text.split('\n'):
line = case_li.search(line)
if line:
case_number = line
line = language_li.search(line)
if line:
language = line.group(2)
line = trans_li.search(line)
if line:
num_trans = line.group(2)
line = fuzzy_li.search(line)
if line:
num_fuzzy = line.group(2)
line = exact_li.search(line)
if line:
num_exact = line.group(2)
line_items.append(Serv(case_number, language, num_trans, num_fuzzy, num_exact))```
---------------------------------------------------------------------------
这是新的错误:
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_13992/1572426536.py in <module>
10 case_number = line
11
---> 12 line = language_li.search(line)
13 if line:
14 language = line.group(2)
TypeError: expected string or bytes-like object
TypeError: expected string or bytes-like object
# GOAL
It would be to append the lines to line_items and eventually
df = pd.DataFrame(line_items)