python - Converting an imperative algorithm to a functional style

Question

I wrote a simple program to compute the average test coverage of certain packages in a Java project. The raw data, in one huge HTML file, looks like this:

<body>  
package pkg1 <line_coverage>11/111,<branch_coverage>44/444<end>  
package pkg2 <line_coverage>22/222,<branch_coverage>55/555<end>  
package pkg3 <line_coverage>33/333,<branch_coverage>66/666<end>  
...   
</body>

For example, given the specified packages "pkg1" and "pkg3", the average line coverage is:

(11+33)/(111+333)

And the average branch coverage is:

(44+66)/(444+666)

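I wrote the following program to get the result, and it works well. But how can this calculation be implemented in a functional way? Something like "(x,y) for x in ... for b in ... if ...". I know a little Erlang, Haskell and Clojure, so solutions in those languages are welcome too. Thanks a lot!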

from __future__ import division
import re
datafile = ('abc', 'd>11/23d>34/89d', 'e>25/65e>13/25e', 'f>36/92f>19/76')
core_pkgs = ('d', 'f')
covered_lines, total_lines, covered_branches, total_branches = 0, 0, 0, 0
for line in datafile:
    for pkg in core_pkgs:
        ptn = re.compile('.*'+pkg+'.*'+'>(\d+)/(\d+).*>(\d+)/(\d+).*')
        match = ptn.match(line)
        if match is not None:
            cvln, tlln, cvbh, tlbh = match.groups()
            covered_lines += int(cvln)
            total_lines += int(tlln)
            covered_branches += int(cvbh)
            total_branches += int(tlbh)
print 'Line coverage:', '{:.2%}'.format(covered_lines / total_lines)
print 'Branch coverage:', '{:.2%}'.format(covered_branches/total_branches)

score 3 · Accepted Answer

Below you can find my Haskell solution. I'll try to explain the important points I went through as I wrote it.

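First off, you'll find that I created a data structure for the coverage data. It is generally a good idea to create data structures to represent whatever data you want to handle. This is partly because it is easier to design your code when you can think in terms of the thing you are designing, which is closely related to functional programming philosophy, and partly because it can eliminate some bugs where you think you are doing one thing but are actually doing something else.
Related to the previous point: the first thing I do is convert the string-represented data into my own data structure. When you do functional programming, you often do things in "sweeps". You don't have a single function that converts the data to your format, filters out the unwanted data and summarises the result. You have three different functions, one for each of those tasks, and you run them one at a time!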

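This is because functions are very composable: if you have three different ones, you can stick them together to form a single one if you want to. If you start with a single one, it is very hard to take it apart into three different ones.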
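
As a rough illustration of that composability in Python (the language of the question), the sketch below glues three single-purpose functions into one. The names `compose`, `parse`, `keep_core` and `total` are made up for this example, and the rows are simplified `name:covered/total` strings rather than the real report format.

```python
from functools import reduce

def compose(*funcs):
    """Chain one-argument functions left to right (hypothetical helper)."""
    return lambda value: reduce(lambda acc, f: f(acc), funcs, value)

def parse(rows):
    # "d:11/23" -> ("d", 11, 23)
    split = (row.replace(":", "/").split("/") for row in rows)
    return [(name, int(covered), int(total)) for name, covered, total in split]

def keep_core(records):
    # Keep only the packages of interest (assumed here to be "d" and "f").
    return [r for r in records if r[0] in ("d", "f")]

def total(records):
    # Sum the covered and total columns separately.
    return (sum(r[1] for r in records), sum(r[2] for r in records))

# Three independent passes glued into a single function.
line_coverage = compose(parse, keep_core, total)
result = line_coverage(["d:11/23", "e:25/65", "f:36/92"])
print(result)  # (47, 115)
```

Each of the three functions can still be tested or reused on its own, which is the point being made above.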

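The actual workings of the conversion function are quite uninteresting unless you are specifically doing Haskell. All it does is try to match each string against a regex and, if that succeeds, add the coverage data to the resulting list.
Again, some crazy composition is about to happen. I don't create a function that loops over a list of coverages and sums them up. I create a function that sums two coverages, because I know I can use it together with the specialised fold loop (which is sort of like a for loop on steroids) to sum all the coverages in a list. There is no need for me to reinvent the wheel and write a loop myself.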

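Furthermore, my sumCoverage function works with a lot of specialised loops, so I don't have to write a ton of functions; I just plug my single function into a ton of pre-made library functions!
In the main function you'll see what I mean by programming in "sweeps" or "passes" over the data. First I convert it to the internal format, then I filter out the unwanted data, then I summarise the remaining data. These are completely independent computations. That's functional programming.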

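You'll also notice that I use two specialised loops there, filter and fold. This means I don't have to write any loops myself; I just plug a function into those standard library loops and let them take it from there.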

import Data.Maybe (catMaybes)
import Data.List (foldl')
import Text.Printf (printf)
import Text.Regex (matchRegex, mkRegex)

corePkgs = ["d", "f"]

stats = [
  "d>11/23d>34/89d",
  "e>25/65e>13/25e",
  "f>36/92f>19/76"
  ]

format = mkRegex ".*(\\w+).*>([0-9]+)/([0-9]+).*>([0-9]+)/([0-9]+).*"


-- It might be a good idea to define a datatype for coverage data.
-- A bit of coverage data is defined as the name of the package it
-- came from, the lines covered, the total amount of lines, the
-- branches covered and the total amount of branches.
data Coverage = Coverage String Int Int Int Int


-- Then we need a way to convert the string data into a list of
-- coverage data. We do this by regex. We try to match on each
-- string in the list, and then we choose to keep only the successful
-- matches. Returned is a list of coverage data that was represented
-- by the strings.
convert :: [String] -> [Coverage]
convert = catMaybes . map match
  where match line = do
          [name, cl, tl, cb, tb] <- matchRegex format line
          return $ Coverage name (read cl) (read tl) (read cb) (read tb)


-- We need a way to summarise two coverage data bits. This can of course also
-- be used to summarise entire lists of coverage data, by folding over it.
sumCoverage (Coverage nameA clA tlA cbA tbA) (Coverage nameB clB tlB cbB tbB) =
  Coverage (nameA ++ nameB ++ ",") (clA + clB) (tlA + tlB) (cbA + cbB) (tbA + tbB)


main = do
      -- First we need to convert the strings to coverage data
  let coverageData = convert stats
      -- Then we want to filter out only the relevant data
      relevantData = filter (\(Coverage name _ _ _ _) -> name `elem` corePkgs) coverageData
      -- Then we need to summarise it, but we are only interested in the numbers
      Coverage _ cl tl cb tb = foldl' sumCoverage (Coverage "" 0 0 0 0) relevantData

  -- So we can finally print them!
  printf "Line coverage: %.2f\n" (fromIntegral cl / fromIntegral tl :: Double)
  printf "Branch coverage: %.2f\n" (fromIntegral cb / fromIntegral tb :: Double)
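
For readers more at home in Python, the same three passes (convert, filter, fold) might be sketched roughly as follows. This is an approximate port, not the author's code: it assumes Python 3, uses `functools.reduce` in place of `foldl'`, prints percentages the way the question's program does, and its regex assumes the package name is the run of word characters before the first `>`.

```python
import re
from collections import namedtuple
from functools import reduce

Coverage = namedtuple("Coverage", "name cl tl cb tb")

# Assumption: the package name is the run of word characters before the first '>'.
FORMAT = re.compile(r"(\w+)>(\d+)/(\d+).*>(\d+)/(\d+)")

def convert(lines):
    """Pass 1: turn matching strings into Coverage records, dropping the rest."""
    matches = (FORMAT.match(line) for line in lines)
    return [Coverage(m.group(1), *map(int, m.groups()[1:])) for m in matches if m]

def sum_coverage(a, b):
    """Combine two records; reduce extends this to a whole list."""
    return Coverage(a.name + b.name + ",", a.cl + b.cl, a.tl + b.tl,
                    a.cb + b.cb, a.tb + b.tb)

core_pkgs = ("d", "f")
stats = ["d>11/23d>34/89d", "e>25/65e>13/25e", "f>36/92f>19/76"]

coverage_data = convert(stats)                                     # pass 1: convert
relevant = filter(lambda c: c.name in core_pkgs, coverage_data)    # pass 2: filter
summed = reduce(sum_coverage, relevant, Coverage("", 0, 0, 0, 0))  # pass 3: fold

print("Line coverage: {:.2%}".format(summed.cl / summed.tl))    # Line coverage: 40.87%
print("Branch coverage: {:.2%}".format(summed.cb / summed.tb))  # Branch coverage: 32.12%
```

As in the Haskell version, no loop is written by hand: `filter` and `reduce` do the iterating, and `sum_coverage` only knows how to combine two records.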

score 1 · Accepted Answer

Here are some quick-hack, untested ideas applied to your code:

import numpy as np
import re

datafile = ('abc', 'd>11/23d>34/89d', 'e>25/65e>13/25e', 'f>36/92f>19/76')
core_pkgs = ('d', 'f')
covered_lines, total_lines, covered_branches, total_branches = 0, 0, 0, 0

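for pkg in core_pkgs:
    ptn = re.compile('.*'+pkg+'.*'+r'>(\d+)/(\d+).*>(\d+)/(\d+).*')
    # map() takes the function first, then the iterable
    matches = map(ptn.match, datafile)
    # Keep only the lines that matched (test each match, not the whole collection)
    statsList = [[int(g) for g in match.groups()] for match in matches if match]
    # statsList is a list of [cvln, tlln, cvbh, tlbh]
    stats = np.array(statsList)
    # Sum down the columns (axis=0) to get one total per statistic.
    # Note: this overwrites the totals on every pass; accumulating across packages is not done here.
    covered_lines, total_lines, covered_branches, total_branches = stats.sum(axis=0)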

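Well, as you can see, I didn't bother to finish the rest of the loop, but I think the point is made by now. There is certainly more than one way to do this. I chose to show off map() (which some will say makes this less efficient, and it probably does), as well as NumPy to get the (admittedly light) math done.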
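
One way the unfinished loop might be completed is sketched here without NumPy, so that it runs on the standard library alone. A generator expression in the `for ... for ... if ...` shape the question asks about collects one row per matching line, and `zip(*rows)` transposes the rows so each column can be summed. This assumes Python 3; `pattern`, `rows` and `totals` are names chosen for this example and are not part of the original answer.

```python
import re

datafile = ('abc', 'd>11/23d>34/89d', 'e>25/65e>13/25e', 'f>36/92f>19/76')
core_pkgs = ('d', 'f')

def pattern(pkg):
    # Same pattern as in the question, with the package name escaped.
    return re.compile('.*' + re.escape(pkg) + r'.*>(\d+)/(\d+).*>(\d+)/(\d+).*')

# Try every (package, line) pair; non-matching pairs yield None.
matches = (pattern(pkg).match(line) for pkg in core_pkgs for line in datafile)

# One row of (cvln, tlln, cvbh, tlbh) per successful match.
rows = [tuple(map(int, m.groups())) for m in matches if m]

# zip(*rows) turns rows into columns, so each column sums to one total.
totals = [sum(column) for column in zip(*rows)]
covered_lines, total_lines, covered_branches, total_branches = totals

print('Line coverage: {:.2%}'.format(covered_lines / total_lines))
print('Branch coverage: {:.2%}'.format(covered_branches / total_branches))
```

The column sum plays the role that `stats.sum(axis=0)` plays in the NumPy version, and because all packages are handled in one pass there is nothing left to accumulate.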

score 0 · Accepted Answer

Here is the corresponding Clojure solution:

(defn extract-data
  "extract 4 integer from a string line according to a package name"
  [pkg line]
  (map read-string
       (rest (first
              (re-seq
               (re-pattern
                (str pkg ".*>(\\d+)/(\\d+).*>(\\d+)/(\\d+)"))
               line)))))

(defn scan-lines-by-pkg
  "scan all string lines and extract all data as integer sequences
    according to package names"
  [pkgs lines]
  (filter seq (for [pkg pkgs
                    line lines]
                (extract-data pkg line))))

(defn sum-data
  "add all data in valid lines together"
  [pkgs lines]
  (apply map + (scan-lines-by-pkg pkgs lines)))

(defn get-percent
  [covered all]
  (str (format "%.2f" (float (/ (* covered 100) all))) "%"))

(defn get-cov
  [pkgs lines]
  {:line-cov (apply get-percent (take 2 (sum-data pkgs lines)))
    :branch-cov (apply get-percent (drop 2 (sum-data pkgs lines)))})

(get-cov ["d" "f"] ["abc" "d>11/23d>34/89d" "e>25/65e>13/25e" "f>36/92f>19/76"])
