c++ - Extracting images from a PDF page with PoDoFo

Question

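I need to extract all images from a PDF file using PoDoFo. Extracting every image from the file works well; I used the image extractor example for that. It takes all the objects in the document and iterates over them. What I need instead is to go through the pages and check the image objects on each page. Does anyone know how to do that?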

score 1 · Accepted Answer

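Piggybacking on podofoimgextract, you can iterate over each page, get the page's resources object, and check it for an XObject or Image entry. From there it is almost exactly the same code as the image extraction utility uses.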

for (int pageN = 0; pageN < document.GetPageCount(); pageN++) {
  PdfPage* page = document.GetPage(pageN);
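  // A page may have no resources at all, so check before dereferencing.
  PdfObject* resources = page->GetResources();
  if (!resources || !resources->IsDictionary())
    continue;
  // Bind by reference to avoid copying the whole dictionary.
  PdfDictionary& resource = resources->GetDictionary();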

  for (auto& k : resource.GetKeys()) {
    if (k.first.GetName() == "XObject" || k.first.GetName() == "Image") {
      if (k.second->IsDictionary()) {
        for (auto& r : k.second->GetDictionary().GetKeys()) {
          // The XObject dictionary usually holds indirect references as its
          // values, so check for a reference.
          if (r.second->IsReference()) {
            // Get the object that is being referenced.
            auto target =
              document.GetObjects().GetObject(r.second->GetReference());
            if (target && target->IsDictionary()) {
              auto& targetDict = target->GetDictionary();
              auto kf = targetDict.GetKey(PdfName::KeyFilter);
              if (!kf)
                continue;
              if (kf->IsArray() && kf->GetArray().GetSize() == 1 &&
                  kf->GetArray()[0].IsName() &&
                  kf->GetArray()[0].GetName().GetName() == "DCTDecode") {
                kf = &kf->GetArray()[0];
              }
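              if (kf->IsName() && kf->GetName().GetName() == "DCTDecode") {
                // The only filter is DCTDecode, so treat the stream as a JPEG.
                ExtractImage(target, true);
              } else {
                // Any other filter: let ExtractImage handle it as a non-JPEG.
                ExtractImage(target, false);
              }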
            }
          }
        }
      }
    }
  }
}
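
Note that the loop above passes every referenced XObject to ExtractImage, including ones that are not images (a form XObject, for example, has the Subtype Form rather than Image). The decision can be sketched in isolation as below. This is only an illustration under stated assumptions: `XObjectInfo`, `ImageKind` and `classify` are hypothetical names that are not part of PoDoFo, and the struct stands in for values you would read from the object's dictionary (its Subtype entry, and its Filter entry normalized to a list of names).

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the dictionary entries that matter here.
// This is NOT a PoDoFo type; in real code these values would come from
// the XObject's dictionary (/Subtype and /Filter).
struct XObjectInfo {
    std::string subtype;               // e.g. "Image" or "Form"
    std::vector<std::string> filters;  // /Filter as a list of filter names
};

enum class ImageKind { NotAnImage, Jpeg, Other };

// Decide how an XObject should be handled:
// - anything whose subtype is not "Image" is skipped,
// - an image whose only filter is DCTDecode is treated as a JPEG,
// - every other image (no filter, another filter, or a filter chain)
//   is treated as a non-JPEG.
ImageKind classify(const XObjectInfo& info) {
    if (info.subtype != "Image") {
        return ImageKind::NotAnImage;
    }
    if (info.filters.size() == 1 && info.filters[0] == "DCTDecode") {
        return ImageKind::Jpeg;
    }
    return ImageKind::Other;
}
```

In the page loop this would correspond to skipping the object when the result is `ImageKind::NotAnImage`, and otherwise calling ExtractImage with `true` for `ImageKind::Jpeg` and `false` for `ImageKind::Other`.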
