regex - Case-insensitive regular expressions

Question

What is the best way to use regular expressions with options (flags) in Haskell?

I use

Text.Regex.PCRE

The documentation lists some interesting options such as compCaseless, compUTF8, ... but I don't know how to use them with (=~).

score 18 · Accepted Answer

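All of the Text.Regex.* modules make heavy use of typeclasses. These are there for extensibility and "overloading"-like behavior, but they make usage less obvious from the types alone.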

Now, you have probably started with the basic =~ matcher.

(=~) ::
  ( RegexMaker Regex CompOption ExecOption source
  , RegexContext Regex source1 target )
  => source1 -> source -> target
(=~~) ::
  ( RegexMaker Regex CompOption ExecOption source
  , RegexContext Regex source1 target, Monad m )
  => source1 -> source -> m target

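To use =~, there must be an instance of RegexMaker ... for the RHS (the pattern), and an instance of RegexContext ... for the LHS (the text being searched) and the result.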

class RegexOptions regex compOpt execOpt
      | regex -> compOpt execOpt
      , compOpt -> regex execOpt
      , execOpt -> regex compOpt
class RegexOptions regex compOpt execOpt
      => RegexMaker regex compOpt execOpt source
         | regex -> compOpt execOpt
         , compOpt -> regex execOpt
         , execOpt -> regex compOpt
  where
    makeRegex :: source -> regex
    makeRegexOpts :: compOpt -> execOpt -> source -> regex

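A valid instance of all these classes (for example, regex=Regex, compOpt=CompOption, execOpt=ExecOption, and source=String) means it is possible to compile a regex with compOpt,execOpt options from some form of source. (Also, given some regex type, there is exactly one compOpt,execOpt set that goes along with it. Lots of different source types are okay, though.)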

class Extract source
class Extract source
      => RegexLike regex source
class RegexLike regex source
      => RegexContext regex source target
  where
    match :: regex -> source -> target
    matchM :: Monad m => regex -> source -> m target

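A valid instance of all these classes (for example, regex=Regex, source=String, target=Bool) means it is possible to match a source against a regex to yield a target. (Other valid targets for this specific regex and source are Int, MatchResult String, MatchArray, etc.)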
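As a rough illustration of how the requested target type drives the result, the sketch below matches one String against one pattern at three different result types. It assumes the regex-pcre package is installed; the sample text and pattern are made up for the example.

```haskell
import Text.Regex.PCRE ((=~))

main :: IO ()
main = do
  let text = "foo boo"  -- sample input, chosen only for illustration
      pat  = "o+"       -- sample pattern
  -- target = Bool: did the pattern match at all?
  print (text =~ pat :: Bool)                      -- True
  -- target = Int: how many matches were found?
  print (text =~ pat :: Int)                       -- 2
  -- target = (String, String, String):
  -- text before the first match, the match itself, text after it
  print (text =~ pat :: (String, String, String))  -- ("f","oo"," boo")
```

Only the type annotation changes between the three calls; the instance of RegexContext selected by that annotation decides what is computed.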

Put these together and it becomes clear that =~ and =~~ are simply convenience functions

source1 =~ source
  = match (makeRegex source) source1
source1 =~~ source
  = matchM (makeRegex source) source1

and also that =~ and =~~ leave no room to pass options through to makeRegexOpts.

You could make your own

(=~+) ::
   ( RegexMaker regex compOpt execOpt source
   , RegexContext regex source1 target )
   => source1 -> (source, compOpt, execOpt) -> target
source1 =~+ (source, compOpt, execOpt)
  = match (makeRegexOpts compOpt execOpt source) source1
(=~~+) ::
   ( RegexMaker regex compOpt execOpt source
   , RegexContext regex source1 target, Monad m )
   => source1 -> (source, compOpt, execOpt) -> m target
source1 =~~+ (source, compOpt, execOpt)
  = matchM (makeRegexOpts compOpt execOpt source) source1

which could be used like

"string" =~+ ("regex", compCaseless + compUTF8, execBlank) :: Bool

or replace =~ and =~~ with versions that can accept options

import Text.Regex.PCRE hiding ((=~), (=~~))

class RegexSourceLike regex source
  where
    makeRegexWith :: source -> regex
instance RegexMaker regex compOpt execOpt source
         => RegexSourceLike regex source
  where
    makeRegexWith = makeRegex
instance RegexMaker regex compOpt execOpt source
         => RegexSourceLike regex (source, compOpt, execOpt)
  where
    makeRegexWith (source, compOpt, execOpt)
      = makeRegexOpts compOpt execOpt source

source1 =~ source
  = match (makeRegexWith source) source1
source1 =~~ source
  = matchM (makeRegexWith source) source1

Or you could just use match, makeRegexOpts, etc. directly where needed.
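A minimal sketch of that last option, assuming the regex-pcre package is installed. The helper name matchesCaseless is hypothetical and not part of the library; it only wraps makeRegexOpts and match.

```haskell
import Text.Regex.PCRE

-- Hypothetical helper (not part of regex-pcre): compile the pattern with
-- compCaseless and report whether it matches anywhere in the text.
matchesCaseless :: String -> String -> Bool
matchesCaseless pat text =
  match (makeRegexOpts compCaseless defaultExecOpt pat :: Regex) text

main :: IO ()
main = do
  -- Plain (=~) uses the default options, so the match is case-sensitive.
  print ("FOO bar" =~ "foo" :: Bool)          -- False
  -- The helper passes compCaseless explicitly.
  print (matchesCaseless "foo" "FOO bar")     -- True
```

Annotating the compiled value as Regex fixes the option types through the functional dependencies, so defaultExecOpt needs no annotation of its own.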

score 10 · Accepted Answer

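I don't know anything about Haskell, but if you are using a regex library based on PCRE, you can use mode modifiers inside the regular expression itself. To match "caseless" in a case-insensitive manner, you can use this regex in PCRE: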

(?i)caseless

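The mode modifier (?i) overrides any case-sensitivity or case-insensitivity option that was set outside the regular expression. It also works with operators that do not let you set any options.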

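Similarly, (?s) turns on "single-line mode", which makes the dot match line breaks; (?m) turns on "multi-line mode", which makes ^ and $ match at line breaks; and (?x) turns on free-spacing mode (unescaped spaces and line breaks outside character classes are insignificant). You can combine letters: (?ismx) turns on everything. A hyphen turns options off: (?-i) makes the regex case-sensitive, and (?x-i) starts a free-spacing, case-sensitive regex.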
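Applied to the question's setting, the inline modifier can be used with the ordinary (=~) operator, since the flag travels inside the pattern string. The following is a small sketch assuming the regex-pcre package; the input strings are made up for the example.

```haskell
import Text.Regex.PCRE ((=~))

main :: IO ()
main = do
  -- Without a modifier, the default options make the match case-sensitive.
  print ("CASELESS" =~ "caseless" :: Bool)      -- False
  -- (?i) at the start of the pattern turns on case-insensitive matching.
  print ("CASELESS" =~ "(?i)caseless" :: Bool)  -- True
```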

score 8 · Accepted Answer

If you want to use a compOpt other than defaultCompOpt, something like this works:

match (makeRegexOpts compCaseless defaultExecOpt  "(Foo)" :: Regex) "foo" :: Bool

The following two articles should help you:

Real World Haskell, Chapter 8. Efficient file processing, regular expressions, and file name matching

A Haskell regular expression tutorial
