13

I have the following sample XML:

<data>
  <products>
    <product>
      <section>Red Section</section>
      <images>
        <image>img.jpg</image>
        <image>img2.jpg</image>
      </images>
    </product>
    <product>
      <section>Blue Section</section>
      <images>
        <image>img.jpg</image>
        <image>img3.jpg</image>
      </images>
    </product>
    <product>
      <section>Green Section</section>
      <images>
        <image>img.jpg</image>
        <image>img2.jpg</image>
      </images>
    </product>
  </products>
</data>

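I know how to parse it in Clojure:

(require '[clojure.xml :as xml])
;; the path must be a double-quoted string; single quotes do not delimit strings in Clojure
(def x (xml/parse "location/of/that/xml"))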

This returns a nested map describing the XML:

{:tag :data,
 :attrs nil,
 :content [
     {:tag :products,
      :attrs nil,
      :content [
          {:tag :product,
           :attrs nil,
           :content [] ..

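This structure can of course be traversed with standard Clojure functions, but that can get quite verbose, especially compared to querying it with XPath. Is there a helper for traversing and searching such a structure? For example, how can I:

  • get a list of all <product> elements
  • get only the products whose <images> tag contains an <image> with the text "img2.jpg"
  • get the products whose section is "Red Section"

Thanks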


5 Answers

10

Here is a solution for your second use case, using zippers from data.zip:

(ns core
  (:use clojure.data.zip.xml)
  (:require [clojure.zip :as zip]
            [clojure.xml :as xml]))

(def data (zip/xml-zip (xml/parse PATH)))
(def products (xml-> data :products :product))

(for [product products :let [image (xml-> product :images :image)]
                       :when (some (text= "img2.jpg") image)]
  {:section (xml1-> product :section text)
   :images (map text image)})
=> ({:section "Red Section", :images ("img.jpg" "img2.jpg")}
    {:section "Green Section", :images ("img.jpg" "img2.jpg")})
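
For readers wondering what the `xml->` navigation amounts to, roughly the same filter can be written with nothing but `clojure.zip`. This is only a sketch and not how data.zip is implemented; `child-locs`, `loc-text`, and the inline `sample-xml` string are names made up for the illustration, and the XML is parsed from memory so the snippet runs without a file.

```clojure
(ns example.plain-zip
  (:require [clojure.xml :as xml]
            [clojure.zip :as zip])
  (:import (java.io ByteArrayInputStream)))

;; Hypothetical inline copy of the question's document.
(def sample-xml
  "<data><products>
     <product><section>Red Section</section>
       <images><image>img.jpg</image><image>img2.jpg</image></images></product>
     <product><section>Blue Section</section>
       <images><image>img.jpg</image><image>img3.jpg</image></images></product>
     <product><section>Green Section</section>
       <images><image>img.jpg</image><image>img2.jpg</image></images></product>
   </products></data>")

(def root-loc
  (zip/xml-zip (xml/parse (ByteArrayInputStream. (.getBytes sample-xml "UTF-8")))))

(defn child-locs
  "Zipper locations of the direct children of loc that carry the given tag."
  [loc tag]
  (->> (iterate zip/right (zip/down loc))   ; walk the siblings left to right
       (take-while some?)                   ; zip/right yields nil past the last one
       (filter #(= tag (:tag (zip/node %))))))

(defn loc-text
  "Concatenation of the string children of the node at loc."
  [loc]
  (apply str (filter string? (:content (zip/node loc)))))

(def product-locs
  (mapcat #(child-locs % :product) (child-locs root-loc :products)))

(def img2-sections
  (for [p product-locs
        :let [images (mapcat #(child-locs % :image) (child-locs p :images))]
        :when (some #(= "img2.jpg" (loc-text %)) images)]
    (loc-text (first (child-locs p :section)))))
```

With the sample document, `img2-sections` evaluates to the section names of the first and third products. The data.zip version stays much shorter because its query functions do the sibling walking and tag matching for you.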
answered 2012-07-18T14:59:39.223
5

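Here is an alternative version using data.zip that covers all three use cases. I discovered that xml-> and xml1-> have pretty powerful navigation built in, with sub-queries written as vectors.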

;; [org.clojure/data.zip "0.1.1"]

(ns example.core
  (:require
   [clojure.zip :as zip]
   [clojure.xml :as xml]
   [clojure.data.zip.xml :refer [text xml-> xml1->]]))

(def data (zip/xml-zip (xml/parse "/tmp/products.xml")))

(let [all-products (xml-> data :products :product)
      red-section (xml1-> data :products :product [:section "Red Section"])
      img2 (xml-> data :products :product [:images [:image "img2.jpg"]])]
  {:all-products (map (fn [product] (xml1-> product :section text)) all-products)
   :red-section (xml1-> red-section :section text)
   :img2 (map (fn [product] (xml1-> product :section text)) img2)})

=> {:all-products ("Red Section" "Blue Section" "Green Section"),
    :red-section "Red Section",
    :img2 ("Red Section" "Green Section")}
answered 2014-02-13T13:04:20.453
3

You can use a library like clj-xpath.
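
To give a feel for what the XPath route looks like, here is a minimal sketch. Note that it does not use clj-xpath itself: it calls the JDK's built-in `javax.xml.xpath` through Java interop, so it needs no extra dependency. The helper names `parse-doc` and `xpath-texts` and the inline `sample-xml` string are assumptions made for this example.

```clojure
(ns example.xpath
  (:import (java.io ByteArrayInputStream)
           (javax.xml.parsers DocumentBuilderFactory)
           (javax.xml.xpath XPathConstants XPathFactory)
           (org.w3c.dom NodeList)))

;; Hypothetical inline copy of the question's document.
(def sample-xml
  "<data><products>
     <product><section>Red Section</section>
       <images><image>img.jpg</image><image>img2.jpg</image></images></product>
     <product><section>Blue Section</section>
       <images><image>img.jpg</image><image>img3.jpg</image></images></product>
     <product><section>Green Section</section>
       <images><image>img.jpg</image><image>img2.jpg</image></images></product>
   </products></data>")

(defn parse-doc
  "Parses an XML string into an org.w3c.dom.Document."
  [^String s]
  (.parse (.newDocumentBuilder (DocumentBuilderFactory/newInstance))
          (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn xpath-texts
  "Evaluates expr against doc and returns the text content of every matching node."
  [doc ^String expr]
  (let [xp (.newXPath (XPathFactory/newInstance))
        ^NodeList nodes (.evaluate xp expr doc XPathConstants/NODESET)]
    (mapv #(.getTextContent (.item nodes %)) (range (.getLength nodes)))))

(def doc (parse-doc sample-xml))

;; 1. all products, shown here by their section name
(def all-sections (xpath-texts doc "//product/section"))

;; 2. products that contain an <image> with the text img2.jpg
(def img2-sections (xpath-texts doc "//product[images/image='img2.jpg']/section"))

;; 3. images of the product whose section is "Red Section"
(def red-images (xpath-texts doc "//product[section='Red Section']/images/image"))
```

Each of the three questions becomes a single XPath expression; a wrapper library such as clj-xpath would mainly save the interop boilerplate in `parse-doc` and `xpath-texts`.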

answered 2012-07-18T09:17:23.510
1

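In many cases the thread-first macro, together with Clojure's map and vector semantics, is an adequate syntax for accessing XML. There are many cases where you want something more XML-specific (such as an XPath library), but in plenty of others the existing language is nearly as concise and does not add any dependencies.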
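
As a rough illustration of how far the core language goes, the three queries from the question can be answered with two small helpers over the map that `clojure.xml/parse` returns. This is a sketch under assumptions: `children`, `text`, `section-name`, `image-names`, and the inline `sample-xml` string are invented for the example, and the document is parsed from memory rather than from a file.

```clojure
(ns example.plain
  (:require [clojure.xml :as xml])
  (:import (java.io ByteArrayInputStream)))

;; Hypothetical inline copy of the question's document.
(def sample-xml
  "<data><products>
     <product><section>Red Section</section>
       <images><image>img.jpg</image><image>img2.jpg</image></images></product>
     <product><section>Blue Section</section>
       <images><image>img.jpg</image><image>img3.jpg</image></images></product>
     <product><section>Green Section</section>
       <images><image>img.jpg</image><image>img2.jpg</image></images></product>
   </products></data>")

(def root
  (xml/parse (ByteArrayInputStream. (.getBytes sample-xml "UTF-8"))))

(defn children
  "Direct child elements of node that carry the given tag."
  [node tag]
  (filter #(= tag (:tag %)) (:content node)))

(defn text
  "Concatenation of the string children of node."
  [node]
  (apply str (filter string? (:content node))))

(defn section-name [product]
  (-> product (children :section) first text))

(defn image-names [product]
  (->> (children product :images)
       (mapcat #(children % :image))
       (map text)))

;; 1. all products
(def products
  (-> root (children :products) first (children :product)))

;; 2. products with an image named img2.jpg
(def with-img2
  (filter #(some #{"img2.jpg"} (image-names %)) products))

;; 3. products in the red section
(def red
  (filter #(= "Red Section" (section-name %)) products))
```

Here `(map section-name with-img2)` gives the names of the first and third products, and `(map image-names red)` gives the two image names of the red one. The helpers are short, but anything more elaborate (ancestors, attributes, deep searches) is where a zipper or XPath library starts to pay off.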

(pprint (-> (xml/parse "/tmp/xml") 
        :content first :content second :content first :content first))
"Blue Section"  
answered 2012-07-18T18:34:22.650
1

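The Tupelo library's tupelo.forest can easily solve problems like this using a tree data structure. See this question for more information. The API docs can be found here.

Here we load your XML data and convert it first to Enlive format and then to the representation used by tupelo.forest. Libraries and data definition: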

(ns tst.tupelo.forest-examples
  (:use tupelo.forest tupelo.test )
  (:require
    [clojure.data.xml :as dx]
    [clojure.java.io :as io]
    [clojure.set :as cs]
    [net.cgrand.enlive-html :as en-html]
    [schema.core :as s]
    [tupelo.core :as t]
    [tupelo.string :as ts]))
(t/refer-tupelo)

(def xml-str-prod "<data>
                    <products>
                      <product>
                        <section>Red Section</section>
                        <images>
                          <image>img.jpg</image>
                          <image>img2.jpg</image>
                        </images>
                      </product>
                      <product>
                        <section>Blue Section</section>
                        <images>
                          <image>img.jpg</image>
                          <image>img3.jpg</image>
                        </images>
                      </product>
                      <product>
                        <section>Green Section</section>
                        <images>
                          <image>img.jpg</image>
                          <image>img2.jpg</image>
                        </images>
                      </product>
                    </products>
                  </data> " )

and the initialization code:

(dotest
  (with-forest (new-forest)
    (let [enlive-tree          (->> xml-str-prod
                                 java.io.StringReader.
                                 en-html/html-resource
                                 first)
          root-hid             (add-tree-enlive enlive-tree)
          tree-1               (hid->hiccup root-hid)

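The hid suffix stands for "Hex ID", a unique hexadecimal value that acts like a pointer to a node or leaf in the tree. At this stage we have only loaded the data into the forest data structure, creating tree-1, which looks like: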

[:data
 [:tupelo.forest/raw "\n                    "]
 [:products
  [:tupelo.forest/raw "\n                      "]
  [:product
   [:tupelo.forest/raw "\n                        "]
   [:section "Red Section"]
   [:tupelo.forest/raw "\n                        "]
   [:images
    [:tupelo.forest/raw "\n                          "]
    [:image "img.jpg"]
    [:tupelo.forest/raw "\n                          "]
    [:image "img2.jpg"]
    [:tupelo.forest/raw "\n                        "]]
   [:tupelo.forest/raw "\n                      "]]
  [:tupelo.forest/raw "\n                      "]
  [:product
   [:tupelo.forest/raw "\n                        "]
   [:section "Blue Section"]
   [:tupelo.forest/raw "\n                        "]
   [:images
    [:tupelo.forest/raw "\n                          "]
    [:image "img.jpg"]
    [:tupelo.forest/raw "\n                          "]
    [:image "img3.jpg"]
    [:tupelo.forest/raw "\n                        "]]
   [:tupelo.forest/raw "\n                      "]]
  [:tupelo.forest/raw "\n                      "]
  [:product
   [:tupelo.forest/raw "\n                        "]
   [:section "Green Section"]
   [:tupelo.forest/raw "\n                        "]
   [:images
    [:tupelo.forest/raw "\n                          "]
    [:image "img.jpg"]
    [:tupelo.forest/raw "\n                          "]
    [:image "img2.jpg"]
    [:tupelo.forest/raw "\n                        "]]
   [:tupelo.forest/raw "\n                      "]]
  [:tupelo.forest/raw "\n                    "]]
 [:tupelo.forest/raw "\n                   "]]

Next, we remove any blank strings with this code:

blank-leaf-hid?      (fn [hid] (and (leaf-hid? hid) ; ensure it is a leaf node
                                 (let [value (hid->value hid)]
                                      (and (string? value)
                                        (or (zero? (count value)) ; empty string
                                          (ts/whitespace? value)))))) ; all whitespace string

blank-leaf-hids      (keep-if blank-leaf-hid? (all-hids))
>>                   (apply remove-hid blank-leaf-hids)
tree-2               (hid->hiccup root-hid)

which produces a much nicer result tree (in Hiccup format):

[:data
 [:products
  [:product
   [:section "Red Section"]
   [:images [:image "img.jpg"] [:image "img2.jpg"]]]
  [:product
   [:section "Blue Section"]
   [:images [:image "img.jpg"] [:image "img3.jpg"]]]
  [:product
   [:section "Green Section"]
   [:images [:image "img.jpg"] [:image "img2.jpg"]]]]]

The following code then computes the answers to the three questions above:

product-hids         (find-hids root-hid [:** :product])
product-trees-hiccup (mapv hid->hiccup product-hids)

img2-paths           (find-paths-leaf root-hid [:data :products :product :images :image] "img2.jpg")
img2-prod-paths      (mapv #(drop-last 2 %) img2-paths)
img2-prod-hids       (mapv last img2-prod-paths)
img2-trees-hiccup    (mapv hid->hiccup img2-prod-hids)

red-sect-paths       (find-paths-leaf root-hid [:data :products :product :section] "Red Section")
red-prod-paths       (mapv #(drop-last 1 %) red-sect-paths)
red-prod-hids        (mapv last red-prod-paths)
red-trees-hiccup     (mapv hid->hiccup red-prod-hids)]

with these results:

 (is= product-trees-hiccup
   [[:product
     [:section "Red Section"]
     [:images
      [:image "img.jpg"]
      [:image "img2.jpg"]]]
    [:product
     [:section "Blue Section"]
     [:images
      [:image "img.jpg"]
      [:image "img3.jpg"]]]
    [:product
     [:section "Green Section"]
     [:images
      [:image "img.jpg"]
      [:image "img2.jpg"]]]] )

(is= img2-trees-hiccup
  [[:product
    [:section "Red Section"]
    [:images
     [:image "img.jpg"]
     [:image "img2.jpg"]]]
   [:product
    [:section "Green Section"]
    [:images
     [:image "img.jpg"]
     [:image "img2.jpg"]]]])

(is= red-trees-hiccup
  [[:product
    [:section "Red Section"]
    [:images
     [:image "img.jpg"]
     [:image "img2.jpg"]]]]))))

The complete example can be found in the forest-examples unit test.

answered 2017-06-08T02:50:32.870