I followed the enlive tutorial, and I must say I'm impressed by how well Enlive parses the web. I have now moved on to scrape3.clj, available here: https://github.com/swannodette/enlive-tutorial/blob/master/src/tutorial/scrape3.clj

Swannodette did a great job designing this example, but I feel we could make it a bit more DRY.

My question: how would you rewrite this extract function to make it more DRY?

(defn extract [node]
  (let [headline (first (html/select [node] *headline-selector*))
        byline   (first (html/select [node] *byline-selector*))
        summary  (first (html/select [node] *summary-selector*))
        result   (map html/text [headline byline summary])]
    (zipmap [:headline :byline :summary] (map #(re-gsub #"\n" "" %) result)))) 

If you have ideas about other parts of the program, feel free to share them!

EDIT: I played around and came up with:

    (defn extract [node]
      (let [s [*headline-selector* *byline-selector* *summary-selector*] 
            selected (map #(html/text (first (html/select [node] %))) s)
            cleaned  (map #(re-gsub #"\n" "" %) selected)]
        (zipmap [:headline :byline :summary] cleaned)))

2 Answers

The repeated (first (html/select [node] ...)) call can be hoisted into a local function:

(defn extract [node]
  (let [selector (fn [sel] (first (html/select [node] sel)))
        headline (selector *headline-selector*)
        byline   (selector *byline-selector*)
        summary  (selector *summary-selector*)
        result   (map html/text [headline byline summary])]
    (zipmap [:headline :byline :summary] (map #(re-gsub #"\n" "" %) result))))

The intermediate names can then be removed, although they help make the intent of the code clear, so this is a matter of personal taste:

(defn extract [node]
  (let [selector (fn [sel] (first (html/select [node] sel)))
        result   (map html/text 
                   (map selector [*headline-selector* 
                                  *byline-selector* 
                                  *summary-selector*]))]
    (zipmap [:headline :byline :summary] (map #(re-gsub #"\n" "" %) result)))) 
answered 2013-02-25T23:32:37.223

To make the function's result "more obvious", I would use a map literal, like this:

(defn extract [node]
  (let [sel #(html/text (first (html/select [node] %)))
        rem #(re-gsub #"\n" "" %)
        get-text #(-> % sel rem)]
    {:headline (get-text *headline-selector*)
     :byline (get-text *byline-selector*)
     :summary (get-text *summary-selector*)}))
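
Taking the map idea one step further, the field names and selectors could live together in a single map, so that adding a field means adding one entry. This is only a sketch under some assumptions: the `field-selectors` name and the three selectors in it are hypothetical placeholders (the tutorial defines its own selectors for the page it scrapes), and `clojure.string/replace` stands in for `re-gsub`, which comes from the older clojure.contrib library.

```clojure
(ns example.extract
  (:require [clojure.string :as str]
            [net.cgrand.enlive-html :as html]))

;; Hypothetical selectors, for illustration only; substitute the
;; *headline-selector*, *byline-selector* and *summary-selector*
;; vars from the tutorial in real code.
(def field-selectors
  {:headline [:h2]
   :byline   [:h6]
   :summary  [:p]})

(defn extract
  "Returns a map from each field name to the text of the first node
  matching that field's selector, with newlines removed."
  [node]
  (into {}
        (for [[field selector] field-selectors]
          [field (-> (html/select [node] selector)
                     first
                     html/text
                     (str/replace "\n" ""))])))
```

The trade-off is the same one mentioned in the other answer: the map literal shows the shape of the result at a glance, while this version keeps each field name next to its selector in one place.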
answered 2013-02-26T04:49:20.813