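I am trying to build an application that reads a .pdf document and extracts all the useful text from it. My goal is to use it as a plagiarism checker for a school project.
I am using GroupDocs with C# in Visual Studio. In the code below, you can see that I use an open file dialog to get the appropriate file path. The call MessageBox.Show(filePath) lets me confirm the path while debugging.
However, as soon as I try to initialize the object with Parser parser = new Parser(filePath):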
using (Parser parser = new Parser(filePath)) //VS specifically points the error to this line
{
    using (TextReader reader = parser.GetText())
    {
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
I get an exception named SystemInitializationException, with an inner exception named FileNotFoundException. In the snippet above, I also tried replacing the filePath variable with a literal path, for example C://School Files//Why Should People Go to Space.pdf
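To find out which file the inner FileNotFoundException is actually complaining about, I could walk the InnerException chain and print its FileName property. This is only a sketch: the helper name FindFileNotFound is hypothetical, and the exception built in Main is a made-up stand-in for whatever new Parser(filePath) really throws.

```csharp
using System;
using System.IO;

static class ExceptionInspector
{
    // Hypothetical helper: walks the InnerException chain and returns the
    // first FileNotFoundException it finds, or null if there is none.
    public static FileNotFoundException FindFileNotFound(Exception ex)
    {
        for (Exception current = ex; current != null; current = current.InnerException)
        {
            if (current is FileNotFoundException notFound)
            {
                return notFound;
            }
        }
        return null;
    }

    static void Main()
    {
        // Stand-in for the exception thrown in my app; the wrapper type and
        // the file name below are invented purely for this demonstration.
        var simulated = new InvalidOperationException(
            "outer",
            new FileNotFoundException("Could not load file.", "Some.Missing.File"));

        FileNotFoundException inner = FindFileNotFound(simulated);
        Console.WriteLine(inner == null
            ? "No FileNotFoundException in the chain"
            : "Missing file: " + inner.FileName);
    }
}
```

In the real handler, the same helper could be called from a catch (Exception ex) block wrapped around the using (Parser ...) statement, so the reported name can be compared with the path I selected.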
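I have one clue: the file name shown in the picture is not the actual file name. However, I am not sure how to actually change this. I have tried to provide all the information needed, but please let me know if more is required. The entire code is posted below.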
using System;
using System.Windows.Forms;
using GroupDocs.Viewer;
using GroupDocs.Search;
using GroupDocs.Parser;
using GroupDocs.Comparison;
using System.Net;
using System.IO;
using System.Web;

namespace project
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            var client = new WebClient();
            client.Headers.Add("User-Agent", "C# console program");
            string url = "https://peerj.com/articles/cs-132/";
            string content = client.DownloadString(url);
            //content contains ENTIRE html code for the webpage
            richTextBox1.Text = content;
        }

        private void button1_Click(object sender, EventArgs e)
        {
            var filePath = string.Empty;
            //To retrieve the file path with a dialog window
            using (OpenFileDialog openFileDialog = new OpenFileDialog())
            {
                openFileDialog.InitialDirectory = "c:\\";
                openFileDialog.Filter = "DOCX files (*.docx)|*.docx|PDF files (*.pdf)|*.pdf";
                openFileDialog.FilterIndex = 2;
                openFileDialog.RestoreDirectory = true;
                if (openFileDialog.ShowDialog() == DialogResult.OK)
                {
                    //Get the path of specified file
                    filePath = openFileDialog.FileName;
                }
            }
            MessageBox.Show(filePath);
            //Operate on the filePath string to get text from file
            using (Parser parser = new Parser(filePath))
            {
                using (TextReader reader = parser.GetText())
                {
                    Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
                }
            }
        }
    }
}
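One thing the click handler does not do is check the path before handing it to Parser: if the dialog is cancelled, filePath stays empty. To rule out the path itself as the cause, I could add a check along these lines. It is a minimal sketch, and the helper name DescribeProblem is hypothetical; it only uses System.IO and does not touch GroupDocs.

```csharp
using System;
using System.IO;

static class PathCheck
{
    // Hypothetical helper: returns null when the path looks usable,
    // otherwise a short description of what is wrong with it.
    public static string DescribeProblem(string filePath)
    {
        if (string.IsNullOrWhiteSpace(filePath))
        {
            return "No file was selected.";
        }
        if (!File.Exists(filePath))
        {
            return "File does not exist: " + filePath;
        }
        return null;
    }

    static void Main()
    {
        // Create a real (empty) temp file so the "usable" case can be shown.
        string existing = Path.GetTempFileName();
        try
        {
            Console.WriteLine(DescribeProblem("") ?? "OK");
            Console.WriteLine(DescribeProblem(existing) ?? "OK");
        }
        finally
        {
            File.Delete(existing);
        }
    }
}
```

In button1_Click this could run right after the dialog closes, showing the returned message and returning early so that the Parser is only constructed for a path that exists.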