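I have a small script that reads an Apache log file, parses it, and derives some interesting (not really) statistics from it. So far it has two simple options: the total number of bytes sent across all requests in the log file, and a top 10 of the most common IP addresses.
The first "mode" is just a simple sum of all the parsed byte counts. The second is a fold over a map (Data.Map) that uses insertWith' (+) with a value of 1
to count the occurrences of each IP address.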
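To make the second mode concrete, here is a minimal sketch of the counting idea on its own, separate from the log parsing. It is not the script itself: `countOccurrences`, `topN`, and `sampleIPs` are hypothetical names made up for illustration, and it assumes a recent `containers`, where the value-strict `insertWith` from `Data.Map.Strict` stands in for the older `insertWith'`.

```haskell
module Main where

import qualified Data.Map.Strict as M
import Data.List (foldl', sortBy)
import Data.Ord (comparing, Down (..))

-- | Count how often each key occurs. foldl' forces the map at every step,
-- and Data.Map.Strict.insertWith forces each updated counter.
-- (Hypothetical helper, not part of the original script.)
countOccurrences :: Ord k => [k] -> M.Map k Int
countOccurrences = foldl' step M.empty
  where
    step acc k = M.insertWith (+) k 1 acc

-- | The n most frequent keys, most frequent first.
-- (Hypothetical helper, not part of the original script.)
topN :: Int -> M.Map k Int -> [(k, Int)]
topN n = take n . sortBy (comparing (Down . snd)) . M.toList

main :: IO ()
main = mapM_ print . topN 2 . countOccurrences $ sampleIPs
  where
    -- Made-up sample data standing in for parsed IP fields.
    sampleIPs = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1", "10.0.0.2"]
```

With plain `String` keys like these, the map should stay as small as the number of distinct keys; the sketch only shows the shape of the fold and says nothing about where the memory goes in the real program.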
The first one runs as I would expect: most of the time is spent parsing, and it runs in constant space.
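  42,359,709,344 bytes allocated in the heap
      72,405,840 bytes copied during GC
         113,712 bytes maximum residency (1553 sample(s))
         145,872 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0: 76311 collections,     0 parallel,  0.89s,  0.99s elapsed
  Generation 1:  1553 collections,     0 parallel,  0.21s,  0.22s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   21.76s  ( 24.82s elapsed)
  GC    time    1.10s  (  1.20s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   22.87s  ( 26.02s elapsed)

  %GC time       4.8%  (4.6% elapsed)

  Alloc rate    1,946,258,962 bytes per MUT second

  Productivity  95.2% of total user, 83.6% of total elapsed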
The second one, however, does not!
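  49,398,834,152 bytes allocated in the heap
     580,579,208 bytes copied during GC
     718,385,088 bytes maximum residency (15 sample(s))
     134,532,128 bytes maximum slop
            1393 MB total memory in use (172 MB lost due to fragmentation)

  Generation 0: 91275 collections,     0 parallel, 252.65s, 254.46s elapsed
  Generation 1:    15 collections,     0 parallel,  0.12s,  0.12s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   41.11s  ( 48.87s elapsed)
  GC    time  252.77s  (254.58s elapsed)
  EXIT  time    0.00s  (  0.01s elapsed)
  Total time  293.88s  (303.45s elapsed)

  %GC time      86.0%  (83.9% elapsed)

  Alloc rate    1,201,635,385 bytes per MUT second

  Productivity  14.0% of total user, 13.5% of total elapsed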
Here is the code.
{-# LANGUAGE OverloadedStrings #-}
module Main where
import qualified Data.Attoparsec.Lazy as AL
import Data.Attoparsec.Char8 hiding (space, take)
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L
import Control.Monad (liftM)
import System.Environment (getArgs)
import Prelude hiding (takeWhile)
import qualified Data.Map as M
import Data.List (foldl', sortBy)
import Text.Printf (printf)
import Data.Maybe (fromMaybe)
type Command = String
data LogLine = LogLine {
    getIP     :: S.ByteString,
    getIdent  :: S.ByteString,
    getUser   :: S.ByteString,
    getDate   :: S.ByteString,
    getReq    :: S.ByteString,
    getStatus :: S.ByteString,
    getBytes  :: S.ByteString,
    getPath   :: S.ByteString,
    getUA     :: S.ByteString
  } deriving (Ord, Show, Eq)
quote, lbrack, rbrack, space :: Parser Char
quote = satisfy (== '\"')
lbrack = satisfy (== '[')
rbrack = satisfy (== ']')
space = satisfy (== ' ')
quotedVal :: Parser S.ByteString
quotedVal = do
    quote
    res <- takeTill (== '\"')
    quote
    return res

bracketedVal :: Parser S.ByteString
bracketedVal = do
    lbrack
    res <- takeTill (== ']')
    rbrack
    return res
val :: Parser S.ByteString
val = takeTill (== ' ')
line :: Parser LogLine
line = do
    ip <- val
    space
    identity <- val
    space
    user <- val
    space
    date <- bracketedVal
    space
    req <- quotedVal
    space
    status <- val
    space
    bytes <- val
    -- Optional trailing pair of quoted fields (stored as path and ua); empty if absent.
    (path,ua) <- option ("","") combined
    return $ LogLine ip identity user date req status bytes path ua

combined :: Parser (S.ByteString,S.ByteString)
combined = do
    space
    path <- quotedVal
    space
    ua <- quotedVal
    return (path,ua)
countBytes :: [L.ByteString] -> Int
countBytes = foldl' count 0
  where
    count acc l = case AL.maybeResult $ AL.parse line l of
        Just x  -> (acc +) . maybe 0 fst . S.readInt . getBytes $ x
        Nothing -> acc

countIPs :: [L.ByteString] -> M.Map S.ByteString Int
countIPs = foldl' count M.empty
  where
    count acc l = case AL.maybeResult $ AL.parse line l of
        Just x  -> M.insertWith' (+) (getIP x) 1 acc
        Nothing -> acc
---------------------------------------------------------------------------------
main :: IO ()
main = do
    [cmd,path] <- getArgs
    dispatch cmd path

pretty :: Show a => Int -> (a, Int) -> String
pretty i (bs, n) = printf "%d: %s, %d" i (show bs) n

dispatch :: Command -> FilePath -> IO ()
dispatch cmd path = action path
  where
    action = fromMaybe err (lookup cmd actions)
    -- Ignore the path argument so printf is not handed an extra parameter.
    err _  = printf "Error: %s is not a valid command." cmd

actions :: [(Command, FilePath -> IO ())]
actions = [("bytes", countTotalBytes)
          ,("ips",   topListIP)]

countTotalBytes :: FilePath -> IO ()
countTotalBytes path = print . countBytes . L.lines =<< L.readFile path

topListIP :: FilePath -> IO ()
topListIP path = do
    f <- liftM L.lines $ L.readFile path
    let mostPopular (_,a) (_,b) = compare b a  -- descending by count
        m = countIPs f
    mapM_ putStrLn . zipWith pretty [1..] . take 10 . sortBy mostPopular . M.toList $ m
Edit:
Adding +RTS -A16M brought the GC time down to 20%. Memory usage is, of course, unchanged.