I'm new to Clojure and have been using Enlive to transform the text nodes of HTML documents. My end goal is to convert the structure back into HTML, tags and all.

I'm currently able to take the structmap returned by enlive-html/html-resource and turn it back into HTML using

(apply str (html/emit* nodes))

where nodes is the structmap.

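I can also transform the :content text nodes of the structmap as needed. However, after transforming them, I end up with a lazy seq of MapEntries. I want to turn this back into a structmap so that I can call emit* on it. This is a bit tricky because the lazy seqs and structmaps are nested.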
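To illustrate where the lazy seq of MapEntries comes from: iterating over a map with seq, map or for walks its MapEntry objects, and each MapEntry is also a [key value] vector. into {} turns such a seq back into a map, but only one level deep. The sketch below uses a small hand-written node in the shape of Enlive's output; node, entries and parent are illustrative names, not Enlive API.

```clojure
;; A map viewed as a sequence is a seq of MapEntry objects,
;; and each MapEntry is also a [key value] vector.
(def node {:tag :p, :attrs nil, :content ["some text"]})

(def entries (seq node))

(vector? (first entries))  ; true
(= node (into {} entries)) ; true: into {} rebuilds the map

;; into {} only rebuilds the outermost level. A child node that is
;; still a seq of entries stays a seq, which is why nesting is tricky.
(def parent
  (list [:tag :div]
        [:attrs nil]
        [:content (list (seq node))]))

(map? (into {} parent))                    ; true
(map? (first (:content (into {} parent)))) ; false
```

So a fix has to apply into {} recursively to every nested node, which is what both answers below do.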

tl;dr:

How do I convert:

([:tag :html]
 [:attrs nil]
 [:content
  ("\n"
   ([:tag :head]
    [:attrs nil]
    [:content
     ("\n  "
      ([:tag :title] [:attrs nil] [:content ("Page Title")])
      "  \n")])
   "\n"
   ([:tag :body]
    [:attrs nil]
    [:content
     ("\n  "
      ([:tag :div]
       [:attrs {:id "wrap"}]
       [:content
        ("\n    "
         ([:tag :h1] [:attrs nil] [:content ("header")])
         "\n    "
         ([:tag :p] [:attrs nil] [:content ("some paragrah text")])
         "\n  ")])
      "\n")])
   "\n\n")])

into:

{:tag :html,
 :attrs nil,
 :content
 ("\n"
  {:tag :head,
   :attrs nil,
   :content
   ("\n  " {:tag :title, :attrs nil, :content ("Page Title")} "  \n")}
  "\n"
  {:tag :body,
   :attrs nil,
   :content
   ("\n  "
    {:tag :div,
     :attrs {:id "wrap"},
     :content
     ("\n    "
      {:tag :h1, :attrs nil, :content ("header")}
      "\n    "
      {:tag :p, :attrs nil, :content ("some paragrah text")}
      "\n  ")}
    "\n")}
  "\n\n")}

Update

kotarak's answer pointed me toward update-in, which lets me modify the map without converting it into a sequence, so my question is moot.

;; Forward declaration at top level, since the two functions call each other
(declare update-content)

(defn modify-or-go-deeper
  "If item is a map, updates its content; if it's a string, modifies it;
  anything else is returned unchanged."
  [item]
  (cond
    (map? item)    (update-content item)
    (string? item) (modify-text item) ; modify-text: my own text transform (not shown)
    :else          item))

(defn update-content
  "Calls modify-or-go-deeper on each element of the :content sequence"
  [coll]
  (update-in coll [:content] (partial map modify-or-go-deeper)))

I was previously using a for comprehension over the map, but update-in is the way to go.
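A minimal comparison of the two approaches on a single node, as a sketch: shout is a hypothetical stand-in for modify-text (here just clojure.string/upper-case), and node, via-for and via-update-in are illustrative names.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for your own modify-text function.
(defn shout [s] (str/upper-case s))

(def node {:tag :h1, :attrs nil, :content ["header"]})

;; A for comprehension over a map walks its [k v] entries, so the
;; result is a lazy seq of vectors, not a map.
(def via-for
  (for [[k v] node]
    (if (= k :content) [k (map shout v)] [k v])))

(map? via-for) ; false

;; update-in returns a map with only the :content value replaced.
(def via-update-in
  (update-in node [:content] (partial map shout)))

(map? via-update-in)     ; true
(:content via-update-in) ; ("HEADER")
```

Both carry the same data (into {} on via-for gives back via-update-in), but only update-in keeps the node a map at every step, so nothing needs converting back afterwards.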


2 Answers

Just put everything back into a map and walk the content recursively.

(defn into-xml
  [coll]
  (let [tag (into {} coll)]
    (update-in tag [:content] (partial map into-xml))))

Note that the content is only transformed as you access it.
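A small sketch of what this laziness means, using the string-safe version of into-xml from the edit below (restated so the snippet runs on its own); node is an illustrative name. realized? reports whether a lazy seq has been forced yet.

```clojure
;; The string-safe into-xml from this answer, restated so the snippet runs alone.
(defn into-xml
  [coll]
  (if-not (string? coll)
    (let [tag (into {} coll)]
      (update-in tag [:content] (partial map into-xml)))
    coll))

(def node
  (into-xml '([:tag :p] [:attrs nil] [:content ("some text")])))

;; (partial map into-xml) returns a LazySeq, so nothing under :content
;; has been converted yet.
(realized? (:content node)) ; false

;; Forcing the seq converts this level only; any child maps get their
;; own, still-unrealized :content seqs.
(doall (:content node))
(realized? (:content node)) ; true
```

The same laziness explains why the first version only fails later: into {} is not called on a string (and does not throw) until the :content seq holding that string is realized, for example when the REPL prints the result.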

Edit: Whoops, I missed the string parts. Here is a working version:

(defn into-xml
  [coll]
  (if-not (string? coll)
    (let [tag (into {} coll)]
      (update-in tag [:content] (partial map into-xml)))
    coll))
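For example, applied to the head part of the question's input. into-xml is restated so the snippet runs on its own, and head is an illustrative name:

```clojure
;; into-xml as in the working version above, restated so the snippet runs alone.
(defn into-xml
  [coll]
  (if-not (string? coll)
    (let [tag (into {} coll)]
      (update-in tag [:content] (partial map into-xml)))
    coll))

;; The <head> part of the question's input.
(def head
  '([:tag :head]
    [:attrs nil]
    [:content ("\n  " ([:tag :title] [:attrs nil] [:content ("Page Title")]) "  \n")]))

(into-xml head)
;; => {:tag :head, :attrs nil,
;;     :content ("\n  " {:tag :title, :attrs nil, :content ("Page Title")} "  \n")}
```

The full input from the question can be passed the same way. The result has the nested-map shape shown in the question, which the question reports emit* accepts.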
answered 2012-06-14T06:01:43.460
Try this:

(def mp '([:tag :html] [:attrs nil] [:content
    (""
    ([:tag :head] [:attrs nil] [:content
        ("\n\t\t"
        ([:tag :title] [:attrs nil] [:content ("page title")])
        "\n\t\t")])
        "\n\t"
        ([:tag :body] [:attrs nil] [:content
            ("\n\t\t"
            ([:tag :div] [:attrs {:id "wrapper"}] [:content
            ("\n\t\t  "
            ([:tag :h1] [:attrs nil] [:content
                ("\n  \t\t\tpage title"
                ([:tag :br] [:attrs nil] [:content ()])
                "\n  \t\t\tand more title\n  \t\t")])
                "\n  \t\t"
                ([:tag :p] [:attrs nil] [:content
                    ("\n  \t\tSome paragraph text"
                    ([:tag :img] [:attrs {:src "images/image.png", :id "image"}] [:content nil])
                    "\n  \t\t")])
            "\n\t\t")]
            "\n\t     \n\t\t"))]
        "\n\n"))]))

(require 'clojure.walk)

(clojure.walk/postwalk (fn [x]
                         (if (and (list? x) (vector? (first x)))
                           (into {} x)
                           x))
                       mp)

It throws an error, but if you change the input to

([:tag :html]
 [:attrs nil]
 [:content
  (""
   ([:tag :head]
    [:attrs nil]
    [:content
     ("\n\t\t"
      ([:tag :title] [:attrs nil] [:content ("page title")])
      "\n\t\t")])
   "\n\t"
   ([:tag :body]
    [:attrs nil]
    [:content
     ("\n\t\t"
      ([:tag :div]
       [:attrs {:id "wrapper"}]
       [:content
        ("\n\t\t  "
         ([:tag :h1]
          [:attrs nil]
          [:content
           ("\n  \t\t\tpage title"
            ([:tag :br] [:attrs nil] [:content ()])
            "\n  \t\t\tand more title\n  \t\t")])
         "\n  \t\t"
         ([:tag :p]
          [:attrs nil]
          [:content
           ("\n  \t\tSome paragraph text"
            ([:tag :img]
             [:attrs {:src "images/image.png", :id "image"}]
             [:content nil])
            "\n  \t\t")])
         "\n\t\t")]
       ))]))])

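then it works fine. The difference is that the edited input no longer has strings like "\n\t\t" in the same list as the key-value pairs. into {} can only consume [key value] pairs, so a stray string in such a list makes the conversion fail. Hope this helps.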
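A minimal sketch of that failure in isolation. try-into-map is a hypothetical helper that returns :error instead of letting the exception escape (on the JVM it is a ClassCastException):

```clojure
;; A node list made only of [key value] vectors converts cleanly.
(into {} '([:tag :br] [:attrs nil] [:content ()]))
;; => {:tag :br, :attrs nil, :content ()}

(defn try-into-map
  "Hypothetical helper: returns the map, or :error if into {} throws."
  [xs]
  (try
    (into {} xs)
    (catch Exception _ :error)))

;; A whitespace string inside the node list is not a [key value] pair.
;; into {} tries to read the string as a seq of entries and throws.
(try-into-map '([:tag :div] [:attrs nil] "\n\t\t")) ; :error
```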

Edit: The following works for me:

(def mp '([:tag :html]
 [:attrs nil]
 [:content
  (""
   ([:tag :head]
    [:attrs nil]
    [:content
     ("\n\t\t"
      ([:tag :title] [:attrs nil] [:content ("page title")])
      "\n\t\t")])
   "\n\t"
   ([:tag :body]
    [:attrs nil]
    [:content
     ("\n\t\t"
      ([:tag :div]
       [:attrs {:id "wrapper"}]
       [:content
        ("\n\t\t  "
         ([:tag :h1]
          [:attrs nil]
          [:content
           ("\n  \t\t\tpage title"
            ([:tag :br] [:attrs nil] [:content ()])
            "\n  \t\t\tand more title\n  \t\t")])
         "\n  \t\t"
         ([:tag :p]
          [:attrs nil]
          [:content
           ("\n  \t\tSome paragraph text"
            ([:tag :img]
             [:attrs {:src "images/image.png", :id "image"}]
             [:content nil])
            "\n  \t\t")])
         "\n\t\t")]
       ))]))]))

(require 'clojure.walk)

(clojure.walk/postwalk (fn [x]
                         (if (and (list? x) (vector? (first x)))
                           (into {} x)
                           x))
                       mp)

Try copying and pasting it into a REPL. You should get the following:

{:tag :html,
 :attrs nil,
 :content
 (""
  {:tag :head,
   :attrs nil,
   :content
   ("\n\t\t"
    {:tag :title, :attrs nil, :content ("page title")}
    "\n\t\t")}
  "\n\t"
  {:tag :body,
   :attrs nil,
   :content
   ("\n\t\t"
    {:tag :div,
     :attrs {:id "wrapper"},
     :content
     ("\n\t\t  "
      {:tag :h1,
       :attrs nil,
       :content
       ("\n  \t\t\tpage title"
        {:tag :br, :attrs nil, :content ()}
        "\n  \t\t\tand more title\n  \t\t")}
      "\n  \t\t"
      {:tag :p,
       :attrs nil,
       :content
       ("\n  \t\tSome paragraph text"
        {:tag :img,
         :attrs {:src "images/image.png", :id "image"},
         :content nil}
        "\n  \t\t")}
      "\n\t\t")})})}
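One caveat, offered as a hedged aside: the list? test matches the quoted literal above, which is built from PersistentLists. The question's transformed data is a lazy seq, and list? returns false for a lazy seq. The sketch below is the same postwalk idea with seq? in its place; entries->maps and lazy-node are hypothetical names.

```clojure
(require '[clojure.walk :as walk])

;; Same idea as the postwalk above, but seq? also matches lazy seqs
;; (and still matches lists, since every list is a seq).
(defn entries->maps [tree]
  (walk/postwalk
    (fn [x]
      (if (and (seq? x) (vector? (first x)))
        (into {} x)
        x))
    tree))

;; The same node shape built with map is a LazySeq, not a list.
(def lazy-node
  (map identity [[:tag :p] [:attrs nil] [:content (map identity ["text"])]]))

(list? lazy-node) ; false, so a list? test would leave it unconverted
(entries->maps lazy-node)
;; => {:tag :p, :attrs nil, :content ("text")}
```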
answered 2012-06-13T23:29:25.233