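You can extract text from a PDF line by line, with whitespace preserved, using Aspose.PDF Cloud SDK for Python. Currently, it processes files from cloud storage: Amazon S3, DropBox, Google Drive Storage, Google Cloud Storage, Windows Azure Storage, FTP Storage, and Aspose's default cloud storage.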
Here is the sample code:
import os
import asposepdfcloud
from asposepdfcloud.apis.pdf_api import PdfApi
# Get Client Id and Client Secret from https://cloud.aspose.com
pdf_api_client = asposepdfcloud.api_client.ApiClient(
    app_key='xxxxxxxxxxxxxxxxxx',
    app_sid='xxxx-xxxx-xxxx-xxxx-xxxxxxxxxx')
pdf_api = PdfApi(pdf_api_client)
temp_folder = "Temp"
# Upload the PDF file to storage
data_file = "C:/Temp/02_pages.pdf"
remote_name = "02_pages.pdf"
pdf_api.upload_file(temp_folder + '/' + remote_name, data_file)
# Rectangle to extract from: lower-left (llx, lly) and upper-right (urx, ury) corners
llx = 0
lly = 0
urx = 0
ury = 0
response = pdf_api.get_text(remote_name, llx, lly, urx, ury, folder=temp_folder)
for i in response.text_occurrences.list:
    print(i.text)
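
If you would rather collect the lines than print them one at a time, a small helper along these lines might do. This is only a sketch: `collect_text_lines` and its `skip_blank` flag are hypothetical names of mine, not part of the SDK, and `fake_response` is a stand-in that merely mimics the `response.text_occurrences.list[i].text` shape used in the sample above so the snippet runs without credentials. The guard for a missing list is a defensive assumption, not documented SDK behavior.

```python
from types import SimpleNamespace


def collect_text_lines(response, skip_blank=True):
    """Return the text of each occurrence in a get_text-style response.

    Hypothetical helper: it only assumes the attribute shape
    response.text_occurrences.list[i].text seen in the sample above.
    """
    # Treat a missing or empty list as "no text" (defensive assumption).
    occurrences = getattr(response.text_occurrences, "list", None) or []
    lines = []
    for occurrence in occurrences:
        text = occurrence.text or ""
        # Optionally drop occurrences that contain only whitespace.
        if skip_blank and not text.strip():
            continue
        lines.append(text)
    return lines


# Stand-in for a real response, for illustration only.
fake_response = SimpleNamespace(
    text_occurrences=SimpleNamespace(
        list=[
            SimpleNamespace(text="First line"),
            SimpleNamespace(text="   "),
            SimpleNamespace(text="Second  line"),
        ]
    )
)

print("\n".join(collect_text_lines(fake_response)))
```

With the real SDK you would pass the `response` returned by `pdf_api.get_text(...)` instead of `fake_response`. Internal spacing within each line is left untouched, so the whitespace from the PDF is kept.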
P.S. I am a developer evangelist at Aspose.