
The parser given at https://www.fpcomplete.com/school/starting-with-haskell/libraries-and-frameworks/text-manipulation/attoparsec appears to work, but it has a problem.

The code (repeated here) is:

{-# LANGUAGE OverloadedStrings #-}

-- This attoparsec module is intended for parsing text that is
-- represented using an 8-bit character set, e.g. ASCII or ISO-8859-15.
import Data.Attoparsec.Char8
import Data.Word

-- | Type for IP's.
data IP = IP Word8 Word8 Word8 Word8 deriving Show

parseIP :: Parser IP
parseIP = do
  d1 <- decimal
  char '.'
  d2 <- decimal
  char '.'
  d3 <- decimal
  char '.'
  d4 <- decimal
  return $ IP d1 d2 d3 d4

main :: IO ()
main = print $ parseOnly parseIP "131.45.68.123"

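If the parser is given an invalid IP address such as "1000.1000.1000.1000", it does not fail; it returns a garbage result because of the coerced numeric conversion.

Is there a simple way to fix this? One approach is to use a larger Word type such as Word32 and check that the number is less than 256. However, if the input is pathological (for example, it overflows Word32), even that can return garbage. Converting to Integer seems like an option because it is unbounded, but then again, adversarial input could make the program run out of memory.

So what would a (hopefully elegant) parser that avoids these problems look like?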
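To make the "garbage" concrete, here is a minimal sketch of the wraparound, assuming that `decimal` at type Word8 accumulates digits in Word8 arithmetic, so the result is the number modulo 256. The name `wrapToWord8` is hypothetical and only illustrates the narrowing; it uses nothing beyond `base`.

```haskell
import Data.Word (Word8)

-- Hypothetical helper: narrowing an Integer to Word8 keeps only the
-- low 8 bits, i.e. the value modulo 256.
wrapToWord8 :: Integer -> Word8
wrapToWord8 = fromIntegral
```

In GHCi, `wrapToWord8 131` stays 131, while `wrapToWord8 1000` gives 232 (1000 - 3 * 256), so each "1000" octet would come back as 232 instead of an error.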

2 Answers

My understanding of your question is that you not only want to fail when the input number is too large, but also you don't want the parser to consume more input than is needed.

We can define a function to parse integers up to a maximum, failing otherwise:

import Data.Attoparsec.ByteString.Char8
import Data.Word
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Control.Applicative
import Data.List (foldl')
import Control.Monad 

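decimalMax :: Integral a => Integer -> Parser a
decimalMax dMax = do
  let numDigs = ceiling $ log (fromIntegral (dMax+1)) / log 10 :: Int
      getVal = foldl' (\s d -> s*10 + fromIntegral (d-48)) 0 . B.unpack
  -- Consume at most numDigs digits, stopping at the first non-digit.
  raw <- scan 0 (\n c ->
          if n >= numDigs || not (isDigit c) then Nothing else Just (n+1))
  -- Without this check an empty match would silently parse as 0.
  when (B.null raw) $ fail "decimalMax: expected at least one digit"
  let val = getVal raw
  if val <= dMax
    then return $ fromIntegral val
    else fail $ "decimalMax: parsed decimal exceeded " ++ show dMax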

This function computes the number of digits in the maximum number, then simply consumes at most that many digits. Your parser for IP addresses remains almost the same:

parseIP :: Parser IP
parseIP = IP <$> dd <*> dd <*> dd <*> dig where 
  dig = decimalMax 255
  dd = dig <* char '.' 

main :: IO ()
main = do
  print $ parseOnly parseIP "131.45.68.123"
  print $ parseOnly parseIP "1000.1000.1000.1000"
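
As a side note on the digit count that decimalMax derives from a logarithm: the same number can be obtained with integer arithmetic only, which avoids relying on floating-point rounding near powers of ten. This is a small sketch, not part of the answer's code; `digitCount` is a hypothetical name and it assumes a non-negative bound.

```haskell
-- Hypothetical helper: number of decimal digits in a non-negative bound.
-- (For a negative argument the minus sign would be counted as well.)
digitCount :: Integer -> Int
digitCount = length . show
```

For example, `digitCount 255` and `digitCount 999` are both 3, and `digitCount 1000` is 4.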
answered 2015-12-15T15:00:35.820

For simple, non-pathological input you can indeed parse an Integer, which is arbitrary-precision and never overflows, and then convert it to Word8:

byte :: Parser Word8
byte = do
    n <- (decimal :: Parser Integer)
    -- n is an Integer, so it must be converted explicitly to Word8.
    if n < 256 then return (fromInteger n)
               else fail $ "Byte Overflow: " ++ show n ++ " is greater than 255."

Now the modified program,

parseIP = do
    d1 <- byte
    char '.'
    d2 <- byte
    char '.'
    d3 <- byte
    char '.'
    d4 <- byte
    return $ IP d1 d2 d3 d4

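should produce the required output.

If you want to handle someone trying to DoS you by writing "1291293919818283309400919..." as a very long number, then I foresee a bit more work: you need to bound the length, scanning at most three digits so that the parser fails right away at the first char '.' instead of reading the whole number.

The following seems to compile and work, with import qualified Data.ByteString as BS at the top level: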

-- State function for scan: accept up to three digits, then stop.
scan0to3digits :: Int -> Char -> Maybe Int
scan0to3digits n c
  | n < 3 && isDigit c  = Just (n + 1)
  | otherwise           = Nothing

byte :: Parser Word8
byte = do
    raw <- scan 0 scan0to3digits
    -- Each byte is an ASCII digit; subtract 48 ('0') to get its value.
    let p = BS.foldl' (\acc w8 -> 10 * acc + fromIntegral w8 - 48) 0 raw
    if BS.length raw == 0
      then fail "Expected one or more digits..."
      else if p > 255
        then fail $ "Byte Overflow: " ++ show p ++ " is greater than 255."
        else return (fromInteger p)
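
To see the bounded-scan idea without attoparsec, here is a pure model over String that uses only `base`. It is a sketch under that simplification, not the parser above; `takeByte` is a hypothetical name. It reads at most three digits, so the work per octet is constant no matter how long the digit run is.

```haskell
import Data.Char (digitToInt, isDigit)
import Data.Word (Word8)

-- Hypothetical pure model of the bounded scan: look at no more than
-- three characters, take the leading digits among them, and return
-- the parsed byte together with the unconsumed input.
takeByte :: String -> Either String (Word8, String)
takeByte input
  | null digits = Left "expected one or more digits"
  | value > 255 = Left ("byte overflow: " ++ show value)
  | otherwise   = Right (fromIntegral value, remaining)
  where
    (digits, rest) = span isDigit (take 3 input)
    remaining      = rest ++ drop 3 input
    -- At most three digits, so value is at most 999 and fits in Int.
    value          = foldl (\acc d -> 10 * acc + digitToInt d) 0 digits
```

Here `takeByte "131.45"` yields `Right (131,".45")`, and `takeByte "999."` is rejected as an overflow. `takeByte "1000.1"` yields `Right (100,"0.1")`, so a following match on '.' fails on the leftover '0', which is how the overlong octet ends up rejected.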
answered 2015-12-15T15:18:09.813