haskell - Selective text obfuscation in Haskell

Question

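I want to obfuscate text-file reports without obscuring certain keywords, such as report titles, column headings, and so on. I built such a program using newLisp, and I'm now trying to implement the same functionality in Haskell from scratch. Here is the code I have so far; it compiles and runs successfully for the simple obfuscation case.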

module Main where

import Data.Char (isAlpha, isNumber, isUpper, toUpper)
import System.Environment (getArgs)
import System.Random (getStdGen, randomR, StdGen)

helpMessage = [ "Usage: cat filename(s) | obfuscate [-x filename] > filename",
  "",
  "Obfuscates text files. This obliterates the text--there is no recovery. This",
  "is not encryption. It's simple, if slow, obfuscation.",
  "",
  "To include a list of words not to obfuscate, use the -x option. List one word",
  "per line in the file.",
  "" ]

data CLOpts = CLOpts { help           :: Bool
                     , exceptionFileP :: Bool
                     , exceptionFile  :: String }

main = do
  args <- getArgs
  if length args > 0
  then do let opts = parseCL args CLOpts { help=False, exceptionFileP=False, exceptionFile="" }
          if help opts
          then do putStrLn $ unlines helpMessage
          else do if exceptionFileP opts
                  then do exceptions <- readFile $ exceptionFile opts
                          obf complexObfuscation $ lines exceptions
                  else do obf simpleObfuscation []
  else do obf simpleObfuscation []
  where obf f xs = do
          g <- getStdGen
          c <- getContents
          putStrLn $ f xs g c

parseCL :: [String] -> CLOpts -> CLOpts
parseCL []          opts = opts
parseCL ("-x":f:xs) opts = parseCL xs opts { exceptionFileP=True, exceptionFile=f }
parseCL      (_:xs) opts = parseCL xs opts { help=True }

simpleObfuscation xs = obfuscate

complexObfuscation exceptions g c = undefined

obfuscate :: StdGen -> String -> String
obfuscate g = obfuscate' g []
  where
    obfuscate' _ a [] = reverse a
    obfuscate' g a text@(c:cs)
      | isAlpha  c = obf obfuscateAlpha g a text
      | isNumber c = obf obfuscateDigit g a text
      | otherwise  = obf id             g a text
    obf f g a (c:cs) = let (x,g') = f (c,g) in obfuscate' g' (x:a) cs

obfuscateAlpha, obfuscateDigit :: (Char, StdGen) -> (Char, StdGen)
obfuscateAlpha (c,g) = obfuscateChar g range
  where range
          | isUpper c = ('A','Z')
          | otherwise = ('a','z')

obfuscateDigit (c,g) = obfuscateChar g ('0','9')

obfuscateChar :: StdGen -> (Char, Char) -> (Char, StdGen)
obfuscateChar = flip randomR

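What I can't figure out is how to obfuscate all of the text except the words passed in as exceptions. My newLisp implementation relies on its built-in regular expression handling. I haven't had much luck with regular expressions in Haskell; possibly the libraries I tried were outdated, or something else was wrong.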

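I tried splitting the text into lines and words and building the kind of tokenized structure that J has a name for. That approach quickly became unwieldy. I tried using a parser, but I think that would get messy too.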

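Does anyone have a suggestion for a simple, direct way to identify the exception words in the text and keep them from being sent to the obfuscation function? Haskell is such a brilliant language that I must be missing something right under my nose.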

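I tried Google, but it seems that wanting to supply a list of exception words that should not be obfuscated is unusual. Apart from that, the obfuscation itself is simple.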

Update

Following the idea in the answer I marked as accepted, I wrote my own words function:

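-- Split a line into tokens: runs of alphanumerics become words, and every
-- other character becomes its own one-character token.
-- Needs isAlphaNum added to the Data.Char import list.
words' :: String -> [String]
words' text = f text [] []
  where -- wa: current word (reversed); ta: finished tokens (reversed)
        f [] wa ta = reverse $ flush wa ta
        f (c:cs) wa ta
          | isAlphaNum c = f cs (c:wa) ta
          | otherwise    = f cs [] $ [c] : flush wa ta
        -- Push the pending word, restored to reading order, if there is one.
        flush wa ta = if null wa then ta else reverse wa : ta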

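Using break on its own didn't work. I think mutual recursion between break and span would work, but before getting around to trying that I went with the code above.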
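For comparison, the span-based idea can be sketched without mutual recursion. This is only an illustration, and the name tokens is hypothetical rather than part of the original program: span peels off a run of alphanumerics, and any other character becomes a single-character token.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical alternative to words': split a string into runs of
-- alphanumerics and single non-alphanumeric characters.
tokens :: String -> [String]
tokens [] = []
tokens s@(c:cs)
  | isAlphaNum c = let (w, rest) = span isAlphaNum s  -- take the whole word
                   in w : tokens rest
  | otherwise    = [c] : tokens cs                    -- one separator per token
```

Because no character is dropped, concat (tokens s) gives back s, so spacing and punctuation in a report line survive the split.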

Then I implemented complexObfuscation as follows:

complexObfuscation exceptions g = unlines . map obfuscateLine . lines
  where obfuscateLine = concatMap obfuscateWord . words'
        obfuscateWord word =
          if word `elem` exceptions
          then word
          else obfuscate g word

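This does what I was after. Unfortunately, I didn't anticipate that passing the same generator to every call of obfuscate would produce the same characters each time, so every word starts with the same character. Ha. A problem for another day.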
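One possible way around the repeated-generator problem is to thread the generator from one token to the next instead of handing the same g to every call. The sketch below uses mapAccumL from Data.List for that. The names obfuscateTokens and step are assumptions made for illustration: step stands in for any per-character function that returns the advanced generator first and the new character second, which is the reverse of the tuple order randomR uses.

```haskell
import Data.List (mapAccumL)

-- Hypothetical sketch: obfuscate a list of tokens while carrying the
-- generator state forward, so no two tokens start from the same state.
-- With System.Random, a step could look like (an assumption, not tested here):
--   step g c = let (c', g') = randomR ('a', 'z') g in (g', c')
obfuscateTokens
  :: (g -> Char -> (g, Char))  -- per-character step: (new generator, new char)
  -> [String]                  -- exception words, left untouched
  -> g                         -- initial generator
  -> [String]                  -- tokens to process
  -> (g, [String])             -- final generator and processed tokens
obfuscateTokens step exceptions = mapAccumL go
  where
    go g tok
      | tok `elem` exceptions = (g, tok)              -- keep the word, keep g
      | otherwise             = mapAccumL step g tok  -- advance g per character
```

The returned generator can be passed on to the next line of input, so the state keeps advancing across the whole report.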

score 1 · Accepted Answer

Read the exception file and build a Data.Set.Set from it.

After splitting the input file into lines, split each line further into words.

Then obfuscate each word separately. If a word is an elem of the Set you built earlier, leave it as it is. Otherwise, apply your obfuscate function to each character.
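A minimal sketch of these steps, under a few assumptions: the name selective is hypothetical, the per-word obfuscator is passed in as a plain String -> String function (for example the question's obfuscate applied to a generator), and membership is checked with Set.member rather than elem, since member uses the set's ordering for a logarithmic lookup.

```haskell
import qualified Data.Set as Set

-- Hypothetical sketch: obfuscate every word of the input except those found
-- in the exception set.
selective
  :: Set.Set String      -- exception words, e.g. Set.fromList (lines fileText)
  -> (String -> String)  -- obfuscator applied to each non-exception word
  -> String              -- whole input text
  -> String
selective exceptions obfWord =
  unlines . map (unwords . map keepOrObf . words) . lines
  where
    keepOrObf w
      | w `Set.member` exceptions = w          -- leave exception words alone
      | otherwise                 = obfWord w  -- obfuscate everything else
```

Note that words and unwords collapse runs of whitespace and keep punctuation attached to words, so column alignment is not preserved and "Total:" will not match an exception entry "Total"; a tokenizer that keeps separators would avoid both.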
