c# - How to search a Cyrillic PDF with PDF Clown

Question

I am trying to use PDF Clown to programmatically search for a string in a Russian PDF file, as follows:

var FilePath = @"‪C:\Users\Yvoloshin\source\repos\SearchPdf\Газета «Красная Звезда» №001 от 01 января 1942 года.pdf";
org.pdfclown.files.File file = new org.pdfclown.files.File(FilePath);

// Define the text pattern to look for
var pattern = new Regex("К новым", RegexOptions.IgnoreCase);

// Instantiate the extractor
TextExtractor textExtractor = new TextExtractor(true, true);

foreach (var page in file.Document.Pages)
{
    // Extract the page text
    var textStrings = textExtractor.Extract(page);

    // Find the text pattern matches
    var matches = pattern.Matches(TextExtractor.ToString(textStrings));
    Console.WriteLine(matches);
    Console.ReadLine();
}

When I run this, I get the following error:

Unhandled Exception: System.NotSupportedException: The given path's format is not supported.
   at System.Security.Permissions.FileIOPermission.EmulateFileIOPermissionChecks(String fullPath)
   at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access)
   at org.pdfclown.files.File..ctor(String path)

Is this a problem with PDF Clown not being set up for Cyrillic fonts, or is the problem somewhere else? I am using Visual Studio 2017 and .NET 4.8.
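
A note on narrowing this down: the stack trace shows the exception being thrown inside System.IO.FileStream, called from the org.pdfclown.files.File constructor, so it happens while the path is being validated and before any PDF content or font is read. One possible cause of this message on .NET Framework is an invisible Unicode formatting character (for example U+202A) pasted into the path string along with the visible text. The sketch below is only a diagnostic under that assumption, not a confirmed fix; the class name `PathDiagnostics`, both helper methods, and the sample path are hypothetical and are not part of PDF Clown or of the code in the question.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper for inspecting a path string; not part of PDF Clown.
static class PathDiagnostics
{
    // True for Unicode format (Cf) and control (Cc) characters,
    // which are normally invisible in an editor.
    static bool IsInvisible(char c)
    {
        UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);
        return category == UnicodeCategory.Format
            || category == UnicodeCategory.Control;
    }

    // Lists the index and code point of every invisible character in the path.
    public static string DescribeInvisibleChars(string path)
    {
        var report = new StringBuilder();
        for (int i = 0; i < path.Length; i++)
        {
            if (IsInvisible(path[i]))
            {
                report.AppendLine($"index {i}: U+{(int)path[i]:X4}");
            }
        }
        return report.ToString();
    }

    // Returns a copy of the path with those characters removed.
    public static string StripInvisibleChars(string path)
    {
        var cleaned = new StringBuilder(path.Length);
        foreach (char c in path)
        {
            if (!IsInvisible(c))
            {
                cleaned.Append(c);
            }
        }
        return cleaned.ToString();
    }

    static void Main()
    {
        // Assumed sample: U+202A sits before the drive letter.
        string suspect = "\u202AC:\\Temp\\sample.pdf";
        Console.Write(DescribeInvisibleChars(suspect));
        Console.WriteLine(StripInvisibleChars(suspect));
    }
}
```

If `DescribeInvisibleChars(FilePath)` reports anything for the path in the question, retyping the path by hand instead of pasting it would be the simplest thing to try; if it reports nothing, the cause lies elsewhere.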
