c# - iTextSharp PDF to text file problem in C# on an Amazon EC2 instance

Question

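I have a very strange problem, and I am not sure what is happening on the Amazon EC2 instance.

I load a PDF, extract its data, and return the output as a string using the iTextSharp component (version 5.4.1). This works fine on my local machine.

But when I deploy to an Amazon EC2 instance (Windows Server 2008 R2), it does not work and throws errors. I captured the following errors in the log file: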

java.io.IOException: Error: End-of-File, expected line
iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found

My code is below. Any help or suggestions would be greatly appreciated.

public static string parseUsingPDFBox(string PDFFilePath)
{
    PdfReader reader = new PdfReader(PDFFilePath);
    StringWriter output = new StringWriter();

    for (int i = 1; i <= reader.NumberOfPages; i++)
        output.WriteLine(PdfTextExtractor.GetTextFromPage(reader, i, new SimpleTextExtractionStrategy()));

    reader.Dispose();
    return output.ToString();
}

I have full administrator rights on the EC2 instance and am using .NET Framework 4.0.

score 0 · Accepted Answer

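Please debug as follows:

Create a FileInputStream (in C#, a FileStream) for PDFFilePath.

Read the bytes of this stream into an array.

Check the first five bytes.

On your local system, they will be '%', 'P', 'D', 'F', '-'.

On your Amazon EC2 instance, they will not be.

Check all the bytes to see what is wrong.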
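
The debugging steps above could be sketched in C# roughly as follows. This is only a minimal illustration using System.IO, not part of iTextSharp; the class name PdfHeaderCheck and the methods ReadLeadingBytes and StartsWithPdfHeader are hypothetical names chosen for this example.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for the debugging steps; not an iTextSharp API.
public static class PdfHeaderCheck
{
    // Returns the first `count` bytes of the file (fewer if the file is shorter).
    public static byte[] ReadLeadingBytes(string path, int count)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] buffer = new byte[count];
            int total = 0;
            // Stream.Read may return fewer bytes than requested, so loop.
            while (total < count)
            {
                int read = stream.Read(buffer, total, count - total);
                if (read == 0) break; // end of file
                total += read;
            }
            Array.Resize(ref buffer, total);
            return buffer;
        }
    }

    // True when the bytes begin with the ASCII signature "%PDF-".
    public static bool StartsWithPdfHeader(byte[] leading)
    {
        byte[] expected = Encoding.ASCII.GetBytes("%PDF-");
        if (leading.Length < expected.Length) return false;
        for (int i = 0; i < expected.Length; i++)
        {
            if (leading[i] != expected[i]) return false;
        }
        return true;
    }
}
```

Calling PdfHeaderCheck.ReadLeadingBytes(PDFFilePath, 5) on both machines and logging BitConverter.ToString(...) of the result gives a hex dump to compare. A valid file should begin with 25-50-44-46-2D. If the EC2 output differs, or the array comes back empty, the file on that machine is likely not the PDF you expect (for example truncated, empty, or altered during deployment).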
