performance - 找到所有后续元素之间的最大差异

Question

我正在尝试解决我在 codeingame.com 上找到的训练练习

问题如下：您有一个数字列表，并希望找到之间的差异的最小值v_small - v_big，条件是和v_big > v_small在列表中。另外这道题的最大时间是1秒，最大内存使用量是512MB。v_big v_small

对于小型列表，一个简单的算法就足够了：

---------------------------------- try1.hs -------------------------------------
import Control.Applicative ((<$>))

main :: IO ()
main = do _ <- getLine
          v <- g . f . map read . take 1000 . words <$> getLine --or equivalently
--        v <- g . h . map read . take 1000 . words <$> getLine 
          print v

f :: [Int] -> [Int]
f [] =   []
f xx@(x:xs) = (minimum $ map (\y -> y-x) xx) : (f xs)

g :: [Int] -> Int
g [] = 0
g xs = minimum xs

h :: [Int] -> [Int]
h [] = []
h (x:xs) = (foldr (\y' y -> min (y'-x) y) 0 xs): (h xs)

但我认为这两个功能f并h生成n*(n+1)/2许多元素，其中n列表的长度。最后一个列表需要很长时间，有 99999 个元素。

下一次尝试是找到局部最大值和最小值，并仅将最大值与最小值进行比较 - 这应该会将算法的成本降低到 #maxima*#minima

---------------------------------- try2.hs -------------------------------------
import Control.Applicative ((<$>))
-- import Control.Arrow ((&&&))

data Extremum = Max Int | Min Int deriving (Show)


main :: IO ()
main = do _ <- getLine
          e <- getExtremes
          print e

getExtremes :: IO Int
getExtremes = minimum . concat . myMap f . headextr .
                                         map read . take 1000 .words <$> getLine

myMap :: (a -> [a] -> [b]) -> [a] -> [[b]]
myMap _ [] = []
myMap g xx@(x:xs) = (g x) xx : myMap g xs

f :: Extremum -> [Extremum] -> [Int]
f (Max y) (Max _:xs) = f (Max y) xs
f (Max y) (Min x:xs) = (min 0 (x-y)): f (Max y) xs
f _ _ = []

headextr :: [Int] -> [Extremum]
headextr xx@(x:y:_) | x > y = Max x : extremes xx
                    | x < y = Min x : extremes xx
headextr xx = extremes xx


extremes :: [Int] -> [Extremum]
extremes [] = []
extremes [x] = [Max x, Min x]
extremes [x,y]      | x > y          =       Min y:[]
                    | otherwise      =       Max y:[]
extremes (x:y:z:xs) | x > y && y < z = Min y:extremes (y:z:xs)
                    | x < y && y > z = Max y:extremes (y:z:xs)
                    | otherwise      =       extremes (y:z:xs)

但仍然没有达到所需的 1 秒时间。我减少了take 1000用于分析的输入，但由于我不是专业程序员，我无法使用它，我发现的唯一信息 - 这很明显 - 在第一个版本f/h中是昂贵的功能，而在第二个版本中版本f也是罪魁祸首。

此练习的输入文件可在http://www.codingame.com/ide/fileservlet?id=372552140039找到- （如果此链接不起作用，可以在 www.codingame.com -> training -> 找到证券交易所损失 -> Test_5_input.txt/Test_5_output.txt)

那么如何加速这个算法，或者还有其他更快的算法吗？

score 9 · Accepted Answer

您的前两个解决方案很慢，因为对于列表中的每个元素，它们都会进行访问所有连续元素的计算。这是 O(n^2)。我还没有完全理解你的第二个解决方案，但它似乎是这样的：只过滤掉局部极值（局部极值意味着它大于它的两个邻居或小于它的两个邻居），然后运行 O(n^ 2）算法上的极值列表。不幸的是，在最坏的情况下，每个元素都可能是极值，因此总体上也是 O(n^2)。（事实上，在一个随机列表中，我们期望大多数元素都是局部极值，所以这不仅仅是对事物的悲观。）

让我们看看我们是否可以发明一个 O(n) 算法来代替。

我们将从稍微改写的 O(n^2) 算法开始。这个算法的想法是这样的：首先，不确定地选择列表中的一个位置来充当v_big。然后，不确定地选择列表中稍后的位置作为v_small。在所有这些非确定性选择中取最大值。在代码中：

f_spec xs = maximum $ do
    later@(v_big:_) <- tails xs
    v_small:_       <- tails later
    return (v_big - v_small)

现在，我们需要两个单独的见解来将其转化为 O(n) 算法。第一个是我们只需要拆分一次：一旦我们选择v_small了，我们就知道正确的选择方法v_big是选择列表中它之前的最大元素。我们可以这样实现该算法：

f_slow xs = maximum $ do
    earlier@(v_small:_) <- tails (reverse xs)
    let v_big = maximum earlier
    return (v_big - v_small)

这是“几乎”O(n)：它只做出一个不确定的选择；但是一旦我们做出选择后进行必要的计算仍然是 O(n)，导致总运行时间为 O(n^2)。第二个见解是，我们可以记住在我们的非确定性选择之后所需的计算，因此这个计算是 O(1)。我们可以像这样非常有效地构建所有最大值的列表：

maximums xs = scanl1 max xs

像maximum，这个函数需要 O(n) 时间；与不同maximum的是，这个返回的是所有前缀的最大值，xs而不仅仅是整个列表的最大值。所以现在，当我们进行非确定性选择时，我们可以同时选择两者：v_smallv_big

f_fast xs = maximum $ do
    (v_big, v_small) <- zip (maximums xs) xs
    return (v_big - v_small)

从那里你只需要一点美化就可以获得看起来非常漂亮并且仍然在 O(n) 时间内运行的东西：

f xs = maximum $ zipWith (-) (maximums xs) xs

score 1 · Accepted Answer

这是一个使用的解决方案Monoid，BiggestDrop它跟踪数字的最大下降。它会记住第三条信息，即一系列数字中的最小值。这将允许我们将数据集拆分为多个片段，处理这些片段，然后将这些片段组合起来以获得答案。下面的示例代码没有利用这一点；它只是在数据集上折叠Monoid'smappend一次。

可能有更好的方法来编写Monoid更快的 s 。

我尝试使用“管道”库，因为它似乎适合这个问题，但我认为它没有为解决方案添加任何内容。

{-# LANGUAGE ScopedTypeVariables #-}
module Main (
    main
) where


import System.IO

import Data.Maybe
import Data.Monoid
import Data.Char
import Control.Applicative
import Control.Monad

import Pipes
import qualified Pipes.Prelude as P

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C

-- Min monoid
newtype Min a = Min {
    getMin :: Maybe a
} deriving (Show, Eq)

instance (Ord a) => Monoid (Min a) where
    mempty = Min Nothing
    mappend x y = Min $ ((liftM2 min) (getMin x) (getMin y)) <|>  (getMin x) <|> (getMin y)

toMin = Min . Just

-- Max monoid
newtype Max a = Max {
    getMax :: Maybe a
} deriving (Show, Eq)

instance (Ord a) => Monoid (Max a) where
    mempty = Max Nothing
    mappend x y = Max $  ((liftM2 max) (getMax x) (getMax y)) <|>  (getMax x) <|> (getMax y)

toMax = Max . Just       

-- Extrema monoid
type Extrema a = (Min a, Max a)

getMinimum = getMin . fst
getMaximum = getMax . snd

toExtrema x = (toMin x, toMax x)

-- Biggest drop monoid

data BiggestDrop a = BiggestDrop {
    extrema :: Extrema a,
    biggestDrop :: Max a
} deriving Show

instance (Num a, Ord a) => Monoid (BiggestDrop a) where
    mempty = BiggestDrop {
        extrema = mempty,
        biggestDrop = mempty
    }
    mappend before after = BiggestDrop {
        extrema = mappend (extrema before) (extrema after),
        biggestDrop = mconcat [
            biggestDrop before,
            biggestDrop after,
            Max $ (liftM2 (-)) (getMaximum $ extrema before) (getMinimum $ extrema after)
        ]
    }

toBiggestDrop x = BiggestDrop {
        extrema = toExtrema x,
        biggestDrop = mempty
    }

-- Read data from stdin and fold BiggestDrop's mappend across it

main = do
  (answer :: BiggestDrop Int) <- P.fold mappend mempty id (words >-> (P.map (toBiggestDrop . read)))
  print answer
  print . fromJust . getMax $ biggestDrop answer
  where
    words = stdinWords >-> (P.map C.unpack) >-> (P.filter ((/=) []))


-- Produce words from stdin

stdinWords' :: (MonadIO m) => Int -> Producer B.ByteString m ()
stdinWords' chunkSize = goMore B.empty
    where
        goMore remainder = do
            eof <- liftIO isEOF
            case eof of
                True ->
                    unless (B.null remainder) $ yield remainder                    
                _ -> do
                    chunk <- liftIO $ B.hGet stdin chunkSize
                    let (first:others) = C.splitWith isSpace chunk
                    goParts ((B.append remainder first):others)
        goParts parts = do
            case parts of
                [] ->
                    goMore B.empty
                [x] ->
                    goMore x
                (x:xs) -> do
                    unless (B.null x) $ yield x
                    goParts xs

stdinWords = stdinWords' 512

我使用'pipes'库将上面的代码放在一起，希望了解'pipes-bytestring'。我不得不放弃并写了一个制作人来读取文件中的单词。从文件中读取的块大小只是一个猜测。

performance - 找到所有后续元素之间的最大差异

2 回答 2

Related

Reference