f# - FParsec: How to combine parsers so that they match in arbitrary order

Question

The task is to find particular key-value pairs and parse them. The pairs can occur in any order. My partially working attempt:

open FParsec

type Parser<'a> = Parser<'a, unit>
type Status = Running | Done

type Job = 
    { Id: int
      Status: Status
      Count: int }

let ws = spaces

let jobId: Parser<int> = ws >>. skipStringCI "Job id" >>. ws >>. skipChar '=' >>. ws >>. pint32

let status: Parser<Status> = 
    ws >>. skipStringCI "Status" >>. ws >>. skipChar '=' >>. ws >>. (
        (skipStringCI "Running" >>% Running) <|> (skipStringCI "Done" >>% Done))

let count: Parser<int> = ws >>. skipStringCI "Count" >>. ws >>. skipChar '=' >>. ws >>. pint32

let parse: Parser<Job> = parse {
    do! skipCharsTillStringCI "Job id" false 1000
    let! id = jobId
    do! skipCharsTillStringCI "Status" false 1000
    let! status = status
    do! skipCharsTillStringCI "Count" false 1000
    let! count = count
    return { Id = id; Status = status; Count = count }}

[<EntryPoint>]
let main argv = 
    let sample = """
Some irrelevant text.
Job id = 33
Some other text.
Status = Done
And another text.
Count = 10
Trailing text.
"""
    printfn "%A" (run parse sample)
    0
(* 
result:
 Success: {Id = 33;
 Status = Done;
 Count = 10;} 
*)

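So, it works, but there are two problems: obvious duplication ("Job id" in the jobId function and "Job id" again in the top-level parser, and so on), and it expects "Job id", "Status" and "Count" to appear in this particular order, which contradicts the requirement.

I have a strong feeling that there is an elegant solution for this.

Thanks!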

score 4 · Accepted Answer

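The first problem (duplication) can be solved with a minor refactoring. The basic idea is to wrap each parser into a wrapper that does the skipping.
Note that this code is still far from perfect; I just tried to keep the refactoring as small as possible.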

let jobId: Parser<int> = pint32

let status: Parser<Status> = 
    (skipStringCI "Running" >>% Running) <|> (skipStringCI "Done" >>% Done)

let count: Parser<int> = pint32

let skipAndParse prefix parser =
    skipCharsTillStringCI prefix false 1000
    >>. ws >>. skipStringCI prefix >>. ws >>. skipChar '=' >>. ws >>. parser

let parse: Parser<Job> = parse {
    let! id = skipAndParse "Job id" jobId
    let! status = skipAndParse "Status"  status
    let! count = skipAndParse "Count" count
    return { Id = id; Status = status; Count = count }}

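The second problem is more complicated. If you want the data lines to appear in free order, you must consider the following cases:

not all data lines are present;
a certain data line appears twice or more.

To handle these cases, you need to generate a list of the data lines found, analyze whether everything required is there, and decide what to do with any possible duplicates.

Note that each data line parser can no longer have a "skipping" part, because it might skip over an informative line before reaching the actual parser.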

let skipAndParse2 prefix parser =
    ws >>. skipStringCI prefix >>. ws >>. skipChar '=' >>. ws >>. parser

// Here, you create a DU that will say which data line was found
type Result =
    | Id of int
    | Status of Status
    | Count of int
    | Irrelevant of string

// here's a combinator parser
let parse2 =
    // list of possible data line parsers
    // Note they are intentionally reordered
    [
    skipAndParse2 "Count" count |>> Count
    skipAndParse2 "Status"  status |>> Status
    skipAndParse2 "Job id" jobId |>> Id
    // the trailing parser skips a line that none of the
    // prior parsers has recognized.
    // The notFollowedByEof guard is needed because of how
    // restOfLine behaves at the end of input: it succeeds
    // without consuming any input, which would make `many`
    // loop forever. FParsec actually detects this and
    // raises an exception
    restOfLine true .>> notFollowedByEof |>> Irrelevant
    ]
    |> List.map attempt // each parser is optional
    |> choice // on each iteration, one of the parsers must succeed
    |> many // a loop

Running the code:

let sample = "
Some irrelevant text.\n\
Job id = 33\n\
Some other text.\n\
Status = Done\n\
And another text.\n\
Count = 10\n\
Trailing text.\n\
"

sample |> run parse2 |> printfn "%A "

will yield the following output:

Success: [Irrelevant ""; Irrelevant "Some irrelevant text."; Id 33;
Irrelevant ""; Irrelevant "Some other text."; Status Done; Irrelevant "";
Irrelevant "And another text."; Count 10; Irrelevant ""]
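
Since the whole point of parse2 is order independence, it may help to run it on input whose data lines are shuffled. The sketch below repeats the definitions from above so that it stands alone; it assumes the FParsec package is referenced, and the `shuffled` input is a made-up example, not data from the question.

```fsharp
open FParsec

type Parser<'a> = Parser<'a, unit>
type Status = Running | Done

type Result =
    | Id of int
    | Status of Status
    | Count of int
    | Irrelevant of string

let ws: Parser<unit> = spaces

let status: Parser<Status> =
    (skipStringCI "Running" >>% Running) <|> (skipStringCI "Done" >>% Done)

// Same shape as skipAndParse2 above: prefix, '=', then the value parser.
let skipAndParse2 prefix parser =
    ws >>. skipStringCI prefix >>. ws >>. skipChar '=' >>. ws >>. parser

let parse2: Parser<Result list> =
    [ skipAndParse2 "Count" pint32 |>> Count
      skipAndParse2 "Status" status |>> Status
      skipAndParse2 "Job id" pint32 |>> Id
      restOfLine true .>> notFollowedByEof |>> Irrelevant ]
    |> List.map attempt
    |> choice
    |> many

// Hypothetical input: the same keys as the sample above, in a different order.
let shuffled = "Count = 10\nnoise\nStatus = Running\nJob id = 7\nend\n"

printfn "%A" (run parse2 shuffled)
```

The resulting list should contain Count 10, Status Running and Id 7 in the order they occur in the input, interleaved with Irrelevant entries. As with the original sample, the last line before the end of input is dropped by the notFollowedByEof guard.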

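Further processing requires filtering out the Irrelevant elements, checking for duplicate or missing items, and then either forming the Job record or raising an error.

UPDATE: a simple example of further processing that hides Result and returns Job option instead: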

// naive implementation of the record maker
// return Job option
// ignores duplicate fields (uses the first one)
// returns None if any field is missing
let MakeJob arguments =
    let a' =
        arguments
        |> List.filter (function |Irrelevant _ -> false | _ -> true)

    try
        let theId     = a' |> List.pick (function |Id x -> Some x | _ -> None)
        let theStatus = a' |> List.pick (function |Status x -> Some x | _ -> None)
        let theCount  = a' |> List.pick (function |Count x -> Some x | _ -> None)
        Some { Id=theId; Status = theStatus; Count = theCount }
    with
        | :? System.Collections.Generic.KeyNotFoundException -> None

To use it, simply add the following line to the end of parse2:

|>> MakeJob
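
MakeJob keeps the first of any duplicates and collapses every failure into None. If the caller needs to know what went wrong, a stricter variant could look like the sketch below. It is only one possible design: the names `exactlyOne` and `makeJobStrict` are hypothetical, and the union is renamed to `Line` here (an assumption made for this example) so that it is not confused with the `Result<'T,'TError>` type from FSharp.Core that supplies `Ok` and `Error`.

```fsharp
type Status = Running | Done

type Job =
    { Id: int
      Status: Status
      Count: int }

// Same cases as the Result union above; renamed for this sketch only.
type Line =
    | Id of int
    | Status of Status
    | Count of int
    | Irrelevant of string

// Hypothetical helper: succeeds only if the field was found exactly once.
let exactlyOne name values =
    match values with
    | [ x ] -> Ok x
    | [] -> Error (sprintf "missing field: %s" name)
    | _ -> Error (sprintf "duplicate field: %s" name)

// Collects every occurrence of each field, then requires exactly one of each.
// Irrelevant entries are dropped implicitly by List.choose.
let makeJobStrict (lines: Line list) =
    let ids = lines |> List.choose (function Id x -> Some x | _ -> None)
    let statuses = lines |> List.choose (function Status x -> Some x | _ -> None)
    let counts = lines |> List.choose (function Count x -> Some x | _ -> None)
    match exactlyOne "Job id" ids, exactlyOne "Status" statuses, exactlyOne "Count" counts with
    | Ok i, Ok s, Ok c -> Ok { Id = i; Status = s; Count = c }
    | Error e, _, _
    | _, Error e, _
    | _, _, Error e -> Error e

printfn "%A" (makeJobStrict [ Irrelevant ""; Id 33; Status Done; Count 10 ])
printfn "%A" (makeJobStrict [ Id 1; Count 2 ])
```

The first call yields an Ok value holding the Job record, the second an Error naming the missing Status field. Only the first problem found is reported; collecting all of them would need a slightly different return type. With the union in parse2 renamed the same way, `|>> makeJobStrict` could take the place of `|>> MakeJob`.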
