c# - How to avoid adding double quotes to the metadata keywords in a PDF file using iTextSharp in C#?

Question

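Using the iTextSharp library, I am able to insert metadata into a PDF file using various schemas.

For my purposes, the keywords in the Keywords metadata are separated by commas and enclosed in double quotes. Once the script I have written runs, the keywords end up enclosed in triple quotes.

Any ideas on how to avoid this, or any advice on using XMP?

Example of required metadata: "keyword1","keyword2","keyword3"

Example of current metadata: """keyword1"",""keyword2"",""keyword3"""

Code: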

string _keywords = meta_line.Split(',')[1] + ","
                             + meta_line.Split(',')[2] + ","
                             + meta_line.Split(',')[3] + ","
                             + meta_line.Split(',')[4] + ","
                             + meta_line.Split(',')[5] + ","
                             + meta_line.Split(',')[6] + ","
                             + meta_line.Split(',')[7];
            _keywords = _keywords.Replace('~', ',');

            Console.WriteLine(metaFile);

            foreach (string inputFile in Directory.GetFiles(source, "*.pdf", SearchOption.TopDirectoryOnly))
            {
                if (Path.GetFileName(metaFile) == Path.GetFileName(inputFile))
                {
                    string outputFile = source + @"\output\" + Path.GetFileName(inputFile);
                    PdfReader reader = new PdfReader(inputFile);

                    using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None))
                    {

                        PdfStamper stamper = new PdfStamper(reader, fs);
                        Dictionary<String, String> info = reader.Info;
                        stamper.MoreInfo = info;

                        PdfWriter writer = stamper.Writer;

                        byte[] buffer = new byte[65536];

                        System.IO.MemoryStream ms = new System.IO.MemoryStream(buffer, true);
                        try
                        {
                            iTextSharp.text.xml.xmp.XmpSchema dc = new iTextSharp.text.xml.xmp.DublinCoreSchema();

                            dc.SetProperty(iTextSharp.text.xml.xmp.DublinCoreSchema.TITLE, new iTextSharp.text.xml.xmp.LangAlt(_title));

                            iTextSharp.text.xml.xmp.XmpArray subject = new iTextSharp.text.xml.xmp.XmpArray(iTextSharp.text.xml.xmp.XmpArray.ORDERED);
                            subject.Add(_subject);
                            dc.SetProperty(iTextSharp.text.xml.xmp.DublinCoreSchema.SUBJECT, subject);

                            iTextSharp.text.xml.xmp.XmpArray author = new iTextSharp.text.xml.xmp.XmpArray(iTextSharp.text.xml.xmp.XmpArray.ORDERED);
                            author.Add(_author);
                            dc.SetProperty(iTextSharp.text.xml.xmp.DublinCoreSchema.CREATOR, author);

                            PdfSchemaAdvanced pdf = new PdfSchemaAdvanced();

                            pdf.AddKeywords(_keywords);


                            iTextSharp.text.xml.xmp.XmpWriter xmp = new iTextSharp.text.xml.xmp.XmpWriter(ms);
                            xmp.AddRdfDescription(dc);
                            xmp.AddRdfDescription(pdf);
                            xmp.Close();

                            int bufsize = buffer.Length;
                            int bufcount = 0;
                            foreach (byte b in buffer)
                            {
                                if (b == 0) break;
                                bufcount++;
                            }
                            System.IO.MemoryStream ms2 = new System.IO.MemoryStream(buffer, 0, bufcount);
                            buffer = ms2.ToArray();

                            foreach (char buff in buffer)
                            {
                                Console.Write(buff);
                            }
                            writer.XmpMetadata = buffer;
                        }
                        catch (Exception ex)
                        {
                            throw; // rethrow without resetting the stack trace
                        }
                        finally
                        {
                            ms.Close();
                            ms.Dispose();
                        }

                        stamper.Close();
                     // writer.Close();

                    }

                    reader.Close();
                }
            }

The approach below does not add any metadata at all, and I am not sure why (point 3 in the comments):

iTextSharp.text.xml.xmp.XmpArray keywords = new iTextSharp.text.xml.xmp.XmpArray(iTextSharp.text.xml.xmp.XmpArray.ORDERED);
keywords.Add("keyword1");
keywords.Add("keyword2");
keywords.Add("keyword3");

pdf.SetProperty(iTextSharp.text.xml.xmp.PdfSchema.KEYWORDS, keywords);

score 2 · Accepted Answer

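I don't currently have the newest iTextSharp version. I have itextsharp 5.1.1.0. It does not contain the PdfSchemaAdvanced class, but it does have PdfSchema and its base class XmpSchema. I bet that PdfSchemaAdvanced in your library also derives from XmpSchema.

The only thing PdfSchema.AddKeywords does is: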

base["pdf:Keywords"] = keywords;

and the XmpSchema.[].set indexer in turn does:

base[key] = XmpSchema.Escape(value);

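So it is pretty clear that the value is being "escaped" to ensure that special characters do not interfere with the storage format.

Now, the Escape function, from what I can see, performs a simple character-by-character scan and makes these replacements: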

" -> &quot;
& -> &amp;
' -> &apos;
< -> &lt;
> -> &gt;

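And that's all. It looks like typical HTML-entity processing, at least in my version of the library. So it would not duplicate the quotes; it only changes their encoding.

Then, AddRdfDescription seems to simply iterate over the stored keys and wrap them in tags, with no further processing. So it would emit something like: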

Escaped"Contents&OfThis"Key

as:

<pdf:Keywords>Escaped&quot;Contents&amp;OfThis&quot;Key</pdf:Keywords>
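
To make the escape-then-wrap behavior described above concrete, here is a minimal sketch. It is not iTextSharp's source: the class `XmpEscapeSketch` and its methods `Escape` and `WrapInTag` are hypothetical stand-ins that only mirror the five replacements and the tag wrapping listed in this answer.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the behavior described in the answer;
// this is NOT the actual iTextSharp implementation.
public static class XmpEscapeSketch
{
    // Scans the input one character at a time and replaces
    // the five special characters with XML entities.
    public static string Escape(string content)
    {
        var sb = new StringBuilder(content.Length);
        foreach (char c in content)
        {
            switch (c)
            {
                case '"': sb.Append("&quot;"); break;
                case '&': sb.Append("&amp;"); break;
                case '\'': sb.Append("&apos;"); break;
                case '<': sb.Append("&lt;"); break;
                case '>': sb.Append("&gt;"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }

    // Wraps an already-escaped value in an element named after the key,
    // with no further processing of the value.
    public static string WrapInTag(string key, string escapedValue)
    {
        return "<" + key + ">" + escapedValue + "</" + key + ">";
    }
}
```

Under these assumptions, `XmpEscapeSketch.WrapInTag("pdf:Keywords", XmpEscapeSketch.Escape("\"keyword1\",\"keyword2\""))` produces `<pdf:Keywords>&quot;keyword1&quot;,&quot;keyword2&quot;</pdf:Keywords>`. Each quote is re-encoded once, and none is doubled.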

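Aside from the AddKeywords method, you should also see an AddProperty method. It acts much like AddKeywords, except that it takes the key as a parameter and does not Escape() its input value.

So, if you are perfectly sure that your _keywords value is formatted properly, you could try: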

AddProperty("pdf:Keywords", _keywords)

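But I would discourage you from doing that. At least in my version of itextsharp, the library seems to handle the keywords properly and format them safely as RDF.

You could also try using the PdfSchema class, which is the one I just checked, instead of the Advanced one. I bet it is still present in the library.

Overall, though, I think the problem lies elsewhere.

Double or triple check the contents of the _keywords variable, and then check the binary contents of the generated PDF. Open it in a hex editor or a simple plain-text editor such as Notepad and look for the <pdf:Keywords> tag. Check what it actually contains. It may well be that everything is fine and that your PDF metadata reader is the one adding those quotes.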
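
The manual check can also be scripted. The sketch below is only an illustration: `KeywordsInspector` and its methods are hypothetical names, and it assumes the XMP packet is stored uncompressed in the file and that the keywords are written in element form (`<pdf:Keywords>...</pdf:Keywords>`) rather than as an attribute. If either assumption does not hold for your file, it simply returns null.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for inspecting the raw keywords stored in a PDF.
public static class KeywordsInspector
{
    // Returns the raw text between <pdf:Keywords> and </pdf:Keywords>,
    // or null if the element is not found in the bytes.
    public static string ExtractRawKeywords(byte[] pdfBytes)
    {
        // ISO-8859-1 maps every byte to exactly one char, so binary
        // sections of the PDF cannot break the decoding.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);

        const string open = "<pdf:Keywords>";
        const string close = "</pdf:Keywords>";

        int start = text.IndexOf(open, StringComparison.Ordinal);
        if (start < 0) return null;
        start += open.Length;

        int end = text.IndexOf(close, start, StringComparison.Ordinal);
        if (end < 0) return null;

        return text.Substring(start, end - start);
    }

    // Convenience wrapper that reads the file from disk first.
    public static string ExtractRawKeywordsFromFile(string path)
    {
        return ExtractRawKeywords(File.ReadAllBytes(path));
    }
}
```

Calling `KeywordsInspector.ExtractRawKeywordsFromFile(outputFile)` and printing the result shows what was actually written. If the output contains one `&quot;` per quote, the file itself is fine and the extra quotes are coming from whatever tool displays the metadata.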
