
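I am trying to build an application that reads .pdf documents and extracts all of the useful text from them. My goal is to use it as a plagiarism checker for a school project.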

I am using GroupDocs with C# in Visual Studio. In the code below you can see that I use an open file dialog to get the path of the file. "MessageBox.Show(filePath)" lets me confirm that path while debugging.

However, as soon as I try to initialize the object with Parser parser = new Parser(filePath):

using (Parser parser = new Parser(filePath)) // VS specifically points the error to this line
{
    using (TextReader reader = parser.GetText())
    {
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}

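I get an exception called SystemInitializationException with an inner exception called FileNotFoundException. In the snippet above I also tried replacing the filePath variable with a hard-coded path, such as C://School Files//Why Should People Go to Space.pdf, with the same result.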

(Screenshot: the error message produced by Visual Studio)
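The hard-coded path above is written with doubled forward slashes. A minimal sketch, assuming a Windows machine and using only standard .NET (System.IO), of the usual ways to write that path as a C# string literal, plus a check that the file actually exists before it is handed to the parser. The class PathCheck and its members are hypothetical names made up for illustration; the folder and file name are the ones from the question.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only.
public static class PathCheck
{
    // Regular string literal: every backslash must be escaped.
    public const string Escaped = "C:\\School Files\\Why Should People Go to Space.pdf";

    // Verbatim string literal (@"..."): backslashes are taken literally.
    public const string Verbatim = @"C:\School Files\Why Should People Go to Space.pdf";

    // Builds the path from its parts; on Windows this yields the same string as above.
    public static string Combined() =>
        Path.Combine(@"C:\", "School Files", "Why Should People Go to Space.pdf");

    // Describes whether a path can be handed to a parser at all.
    public static string Describe(string path)
    {
        if (string.IsNullOrWhiteSpace(path))
            return "No file selected";
        return File.Exists(path) ? "Found: " + path : "Missing: " + path;
    }

    public static void Main()
    {
        Console.WriteLine(Escaped == Verbatim);      // both literals denote the same string
        Console.WriteLine(Describe(Verbatim));       // Found/Missing depends on the machine
        Console.WriteLine(Describe(string.Empty));   // what an empty filePath would produce
    }
}
```

Calling PathCheck.Describe(filePath) right before new Parser(filePath) would show whether the PDF path itself is the problem. If it reports "Found" and the exception still occurs, the file the runtime cannot find is probably not the PDF.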

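I suspect that the file name shown in the picture is not the actual file name, but I am not sure how to change it. I have tried to provide all the information needed; please let me know if more is required. The full code is below.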

using System;
using System.IO;
using System.Net;
using System.Web;
using System.Windows.Forms;
using GroupDocs.Comparison;
using GroupDocs.Parser;
using GroupDocs.Search;
using GroupDocs.Viewer;

namespace project
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            // Download a sample web page and show its raw HTML
            using (var client = new WebClient())
            {
                client.Headers.Add("User-Agent", "C# console program");

                string url = "https://peerj.com/articles/cs-132/";
                // content contains the ENTIRE HTML source of the page
                string content = client.DownloadString(url);
                richTextBox1.Text = content;
            }
        }

        private void button1_Click(object sender, EventArgs e)
        {
            string filePath;

            // Retrieve the file path with a dialog window
            using (OpenFileDialog openFileDialog = new OpenFileDialog())
            {
                openFileDialog.InitialDirectory = "c:\\";
                openFileDialog.Filter = "DOCX files (*.docx)|*.docx|PDF files (*.pdf)|*.pdf";
                openFileDialog.FilterIndex = 2; // PDF filter selected by default
                openFileDialog.RestoreDirectory = true;

                if (openFileDialog.ShowDialog() != DialogResult.OK)
                {
                    // Dialog was cancelled: there is no path to parse
                    return;
                }

                // Full path of the selected file
                filePath = openFileDialog.FileName;
            }
            MessageBox.Show(filePath);

            // Extract the text from the selected file
            using (Parser parser = new Parser(filePath))
            {
                using (TextReader reader = parser.GetText())
                {
                    // reader is null when text extraction is not supported for the format
                    Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
                }
            }
        }
    }
}
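An inner FileNotFoundException does not necessarily refer to the PDF, so one way to check the suspicion about the file name is to print the whole exception chain. A hedged sketch using only standard .NET types: ExceptionReport.Describe is a hypothetical helper, and the chain built in Main is simulated with made-up names (System.TypeInitializationException stands in for the outer exception) rather than produced by GroupDocs.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class ExceptionReport
{
    // Writes one line per exception, walking down through InnerException.
    // For a FileNotFoundException it also prints the FileName property,
    // i.e. the file the runtime actually failed to find.
    public static string Describe(Exception ex)
    {
        var sb = new StringBuilder();
        for (var current = ex; current != null; current = current.InnerException)
        {
            sb.Append(current.GetType().Name).Append(": ").Append(current.Message);
            if (current is FileNotFoundException fnf && fnf.FileName != null)
                sb.Append(" [FileName=").Append(fnf.FileName).Append(']');
            sb.AppendLine();
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Simulated chain shaped like the one in the question (names are made up).
        var inner = new FileNotFoundException("Could not load file or assembly.", "SomeDependency.dll");
        var outer = new TypeInitializationException("SomeNamespace.SomeType", inner);
        Console.Write(Describe(outer));
    }
}
```

In the question's button1_Click, this would mean wrapping the using (Parser parser = new Parser(filePath)) block in try/catch (Exception ex) and calling MessageBox.Show(ExceptionReport.Describe(ex)). If the reported FileName is an assembly (a .dll) rather than the PDF, a missing or mismatched package dependency is a more likely cause than the file path.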