I am trying to use PDF Clown to programmatically search for a string in a Russian-language PDF file, like this:
var FilePath = @"C:\Users\Yvoloshin\source\repos\SearchPdf\Газета «Красная Звезда» №001 от 01 января 1942 года.pdf";
org.pdfclown.files.File file = new org.pdfclown.files.File(FilePath);
// Define the text pattern to look for
var pattern = new Regex("К новым", RegexOptions.IgnoreCase);
// Instantiate the extractor
TextExtractor textExtractor = new TextExtractor(true, true);
foreach (var page in file.Document.Pages)
{
    // Extract the page text
    var textStrings = textExtractor.Extract(page);
    // Find the text pattern matches
    var matches = pattern.Matches(TextExtractor.ToString(textStrings));
    // Print each match; writing the MatchCollection itself only prints its type name
    foreach (Match match in matches)
    {
        Console.WriteLine(match.Value);
    }
    Console.ReadLine();
}
When I run this, I get the following error:
Unhandled Exception: System.NotSupportedException: The given path's format is not supported.
   at System.Security.Permissions.FileIOPermission.EmulateFileIOPermissionChecks(String fullPath)
   at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access)
   at org.pdfclown.files.File..ctor(String path)
Is this a problem with PDF Clown not being set up for Cyrillic fonts, or is the problem somewhere else? I am using Visual Studio 2017 and .NET Framework 4.8.
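
The stack trace shows the exception being thrown from the System.IO.FileStream constructor, so it happens before PDF Clown parses anything in the file. To narrow this down, below is a minimal sketch that leaves PDF Clown out entirely. It lists any control or invisible formatting characters in the path string and then opens the file with the same FileStream(path, mode, access) overload that appears in the trace. The names PathCheck and FindHiddenChars are hypothetical helpers of my own, and the idea that something invisible is hiding in the path string is only an assumption I want to rule out.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class PathCheck
{
    // Hypothetical helper: reports the index and code point of every control
    // or Unicode "format" (invisible) character found in the path string.
    public static List<string> FindHiddenChars(string path)
    {
        var found = new List<string>();
        for (int i = 0; i < path.Length; i++)
        {
            char c = path[i];
            if (char.IsControl(c) || char.GetUnicodeCategory(c) == UnicodeCategory.Format)
            {
                found.Add($"index {i}: U+{(int)c:X4}");
            }
        }
        return found;
    }

    public static void Main()
    {
        var filePath = @"C:\Users\Yvoloshin\source\repos\SearchPdf\Газета «Красная Звезда» №001 от 01 января 1942 года.pdf";

        // Visible characters such as «, » and № are not reported here
        foreach (var hit in FindHiddenChars(filePath))
        {
            Console.WriteLine(hit);
        }

        try
        {
            // Same FileStream overload as in the stack trace, without PDF Clown
            using (var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
            {
                Console.WriteLine($"Opened OK, {stream.Length} bytes");
            }
        }
        catch (Exception ex)
        {
            // Print the exception type so it can be compared with the original error
            Console.WriteLine($"{ex.GetType().Name}: {ex.Message}");
        }
    }
}
```

If this fails with the same NotSupportedException, the problem would seem to be in the path string itself rather than in PDF Clown or its font handling.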