c# - Extracting tables from a PDF document

Question

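I want to programmatically extract tables from a PDF document using C# for a university project. I am familiar with iTextSharp.

Is there a way to extract tables with iTextSharp?
Is there any other free library I can use for this purpose?
Can I convert the PDF to XML/HTML and extract the <table> tags? If so, is there a free library for PDF-to-HTML conversion?

Or

Please suggest a suitable solution.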

score 0 · Accepted Answer

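Could you try something like this and extend it to do what you need? I converted this example from VB.Net to its C# equivalent. It returns the plain text of every page; it does not detect tables by itself.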

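public static string GetTextFromPDF(string PdfFileName)
{
    // Open the PDF; the reader is closed in the finally block below.
    iTextSharp.text.pdf.PdfReader pdfReader = new iTextSharp.text.pdf.PdfReader(PdfFileName);
    System.Text.StringBuilder sOut = new System.Text.StringBuilder();
    try
    {
        // iTextSharp page numbers are 1-based.
        for (int i = 1; i <= pdfReader.NumberOfPages; i++)
        {
            // Use a fresh strategy per page so text from earlier pages is not carried over.
            iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
            sOut.Append(iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(pdfReader, i, its));
        }
    }
    finally
    {
        pdfReader.Close();
    }
    return sOut.ToString();
}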
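
One possible way to extend the function above toward tables is to split the returned text into rows and cells. The sketch below is only an illustration: `TableTextParser` and `SplitIntoRows` are hypothetical names, not part of iTextSharp, and it assumes that each table row comes out as one line of text and that no cell contains whitespace. Real PDFs often break these assumptions, so treat it as a starting point rather than a general solution.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of iTextSharp.
public static class TableTextParser
{
    // Splits extracted page text into rows (one per non-empty line)
    // and cells (whitespace-separated tokens within a line).
    public static List<string[]> SplitIntoRows(string extractedText)
    {
        var rows = new List<string[]>();
        string[] lines = extractedText.Split(
            new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            // Assumption: cells never contain whitespace, so any run of
            // whitespace marks a column boundary. A null separator makes
            // String.Split use whitespace characters as delimiters.
            string[] cells = line.Split(
                (char[])null, StringSplitOptions.RemoveEmptyEntries);

            // Skip lines that contain only whitespace.
            if (cells.Length > 0)
            {
                rows.Add(cells);
            }
        }
        return rows;
    }
}
```

It could be called as `TableTextParser.SplitIntoRows(GetTextFromPDF(path))`, where `path` is the PDF file name. If cells can contain spaces, the text positions of the chunks would be needed instead of plain text, which this sketch does not attempt.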
