c# - How to parse marked-up text in C#

Question

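I'm trying to build a simple text formatter that uses MigraDoc to do the actual typesetting. I want to specify the formatting by marking up the text. For example, the input might look like this: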

"The \i{quick} brown fox jumps over the lazy dog^{note}"

This would mean that "quick" is italic and "note" is superscript. To do the splitting, I created a dictionary in my TextFormatter:

static TextFormatter()   // a static constructor cannot take an access modifier such as internal
{
    FormatDictionary = new Dictionary<string, TextFormats>()
    {
        { @"^",  TextFormats.superscript },
        { @"_",  TextFormats.subscript },
        { @"\i", TextFormats.italic }
    };
}
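One way to keep the regex and the dictionary in sync is to build the pattern from the dictionary keys. The sketch below is a complete, compilable version of the class above; the enum declaration and the MarkupRegex field are assumptions (the question does not show them). Regex.Escape takes care of the backslash in "\i" and the caret in "^", and longer keys are placed first so that a key which is a prefix of another cannot shadow it.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Assumed enum; member names follow the dictionary in the question.
public enum TextFormats { italic, superscript, subscript }

public static class TextFormatter
{
    public static readonly Dictionary<string, TextFormats> FormatDictionary;

    // Hypothetical field: one regex generated from the dictionary keys.
    // Group 1 is the modifier, group 2 the text between the braces.
    public static readonly Regex MarkupRegex;

    static TextFormatter()
    {
        FormatDictionary = new Dictionary<string, TextFormats>
        {
            { @"^",  TextFormats.superscript },
            { @"_",  TextFormats.subscript },
            { @"\i", TextFormats.italic }
        };

        // Escape each key ("\i" becomes "\\i", "^" becomes "\^") and try longer keys first.
        string alternation = string.Join("|",
            FormatDictionary.Keys
                .OrderByDescending(k => k.Length)
                .Select(Regex.Escape));

        MarkupRegex = new Regex("(" + alternation + @")\{([^}]*)\}");
    }
}
```

With this, adding a new modifier only requires a new dictionary entry, and FormatDictionary[match.Groups[1].Value] gives the format for each match.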

Then I'd like to do the splitting with a regular expression that looks for the modifier strings and matches whatever is inside the braces.

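But since a string can contain several formats, I also need to keep track of which modifier each match belongs to. For example, I'd like to get a list of (string, TextFormats) pairs, where the string holds the enclosed text, the TextFormats value is the one corresponding to the matching special sequence, and the items are ordered by appearance. I could then iterate over that list and apply formatting based on the TextFormats value.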
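A minimal sketch of that ordered list, assuming the unformatted text between the marked-up pieces should be kept too (otherwise there is nothing to typeset for "The " or " brown fox ..."). The none member of the enum is a hypothetical addition for those plain runs, and MarkupParser/Parse are illustrative names. It walks the matches in order and uses Match.Index and Match.Length to cut out the gaps.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// "none" is an assumed extra member for unformatted runs.
public enum TextFormats { none, italic, superscript, subscript }

public static class MarkupParser
{
    static readonly Dictionary<string, TextFormats> FormatDictionary = new Dictionary<string, TextFormats>
    {
        { @"^",  TextFormats.superscript },
        { @"_",  TextFormats.subscript },
        { @"\i", TextFormats.italic }
    };

    // Group 1: modifier, group 2: enclosed text (nested braces are not supported).
    static readonly Regex Markup = new Regex(@"(\\i|_|\^)\{([^}]*)\}");

    // Splits the input into (text, format) pairs in order of appearance,
    // including the plain text between the marked-up pieces.
    public static List<(string Text, TextFormats Format)> Parse(string input)
    {
        var result = new List<(string Text, TextFormats Format)>();
        int last = 0; // end of the previous match

        foreach (Match m in Markup.Matches(input))
        {
            if (m.Index > last)
                result.Add((input.Substring(last, m.Index - last), TextFormats.none));

            result.Add((m.Groups[2].Value, FormatDictionary[m.Groups[1].Value]));
            last = m.Index + m.Length;
        }

        if (last < input.Length)
            result.Add((input.Substring(last), TextFormats.none));

        return result;
    }
}
```

For the sample input this yields four pairs: ("The ", none), ("quick", italic), (" brown fox jumps over the lazy dog", none), ("note", superscript). Each pair could then be handed to MigraDoc, for example through Paragraph.AddFormattedText; check the MigraDoc documentation for how superscript and subscript are set on the resulting text.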

Thanks for any suggestions.

score 1 · Accepted Answer

Consider the following code...

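using System.Text.RegularExpressions;

string inputMessage = @"The \i{quick} brown fox jumps over the lazy dog^{note}";
// [^}]* instead of \w* so that enclosed text containing spaces also matches
MatchCollection matches = Regex.Matches(inputMessage, @"(?<=(\\i|_|\^)\{)[^}]*(?=\})");

foreach (Match match in matches)
{
    string textformat = match.Groups[1].Value;   // the modifier: "\i", "_" or "^"
    string enclosedstring = match.Value;         // the text inside the braces
    // Look up FormatDictionary[textformat] and add (enclosedstring, format) to an ordered list;
    // a Dictionary<string, TextFormats> would throw if the same text appeared twice.
}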
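A self-contained version of the loop above, filling the ordered list the question asks for. It keeps the answer's lookaround pattern (with [^}]* in place of \w*) and relies on the fact that .NET records captures made inside a lookbehind, so Groups[1] still holds the modifier. The enum, TaggedTextExtractor and Extract are assumed names for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Assumed to match the question's enum (with the "supersript" typo corrected).
public enum TextFormats { italic, superscript, subscript }

public static class TaggedTextExtractor
{
    static readonly Dictionary<string, TextFormats> FormatDictionary = new Dictionary<string, TextFormats>
    {
        { @"^",  TextFormats.superscript },
        { @"_",  TextFormats.subscript },
        { @"\i", TextFormats.italic }
    };

    // The answer's pattern: the modifier and "{" sit in a lookbehind, "}" in a lookahead,
    // so match.Value is just the enclosed text.
    static readonly Regex Tagged = new Regex(@"(?<=(\\i|_|\^)\{)[^}]*(?=\})");

    // Returns (enclosed text, format) pairs in order of appearance; a list is used
    // because the same text may legitimately occur more than once.
    public static List<KeyValuePair<string, TextFormats>> Extract(string input)
    {
        var result = new List<KeyValuePair<string, TextFormats>>();
        foreach (Match match in Tagged.Matches(input))
        {
            TextFormats format = FormatDictionary[match.Groups[1].Value];
            result.Add(new KeyValuePair<string, TextFormats>(match.Value, format));
        }
        return result;
    }
}
```

Unlike a full parse, this only returns the marked-up pieces; the plain text in between is dropped.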

Good luck!

score 0 · Accepted Answer

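I'm not sure whether .NET supports callbacks, but if you have a string like "The \i{quick} brown fox jumps over the lazy dog^{note}" and only want to do the replacements as you find the markup, you can use a regex replace with a callback: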

 #  @"(\\i|_|\^){([^}]*)}"

 ( \\i | _ | \^ )         # (1)
 {
 ( [^}]* )                # (2)
 }

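Then, in the callback, check capture group 1 to find out which format applies and return {fmtCodeStart}\2{fmtCodeEnd}, i.e. that format's start code, the contents of group 2, and that format's end code.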
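.NET does provide this kind of callback: Regex.Replace has overloads that take a MatchEvaluator delegate, which is called once per match and whose return value replaces that match. A minimal sketch using the first pattern (braces escaped for readability); the <i>, <sup> and <sub> markers are hypothetical stand-ins for {fmtCodeStart} and {fmtCodeEnd}, and CallbackFormatter is an illustrative name.

```csharp
using System.Text.RegularExpressions;

public static class CallbackFormatter
{
    // Group 1: modifier, group 2: enclosed text.
    static readonly Regex Markup = new Regex(@"(\\i|_|\^)\{([^}]*)\}");

    public static string Convert(string input)
    {
        // The lambda is converted to a MatchEvaluator; it runs once per match.
        return Markup.Replace(input, m =>
        {
            string text = m.Groups[2].Value;
            switch (m.Groups[1].Value)
            {
                case @"\i": return "<i>" + text + "</i>";     // hypothetical italic codes
                case "^":   return "<sup>" + text + "</sup>"; // hypothetical superscript codes
                case "_":   return "<sub>" + text + "</sub>"; // hypothetical subscript codes
                default:    return m.Value;                   // not reachable with this pattern
            }
        });
    }
}
```

For example, CallbackFormatter.Convert(@"H_{2}O") returns "H<sub>2</sub>O".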

Alternatively, you can give each modifier its own capture group:

 #  @"(?:(\\i)|(_)|(\^)){([^}]*)}"

 (?:
      ( \\i )             # (1)
   |  ( _ )               # (2)
   |  ( \^ )              # (3)
 )
 {
 ( [^}]* )                # (4)
 }

Then, in the callback:

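 if (match.Groups[1].Success)
   return fmtCode1Start + match.Groups[4].Value + fmtCode1End;
 else if (match.Groups[2].Success)
   return fmtCode2Start + match.Groups[4].Value + fmtCode2End;
 else if (match.Groups[3].Success)
   return fmtCode3Start + match.Groups[4].Value + fmtCode3End;
 return match.Value;   // fmtCodeNStart/End are placeholders for the target format codes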
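A variation on the same idea swaps the numbered groups for .NET named groups, (?<name>...), so the callback does not depend on group positions. This is a sketch, not the answer's own code; the group names it, sub, sup and text, the NamedGroupFormatter class and the <i>/<sub>/<sup> output markers are all assumptions.

```csharp
using System.Text.RegularExpressions;

public static class NamedGroupFormatter
{
    // "it", "sub" and "sup" mark which modifier matched; "text" holds the enclosed text.
    static readonly Regex Markup = new Regex(@"(?:(?<it>\\i)|(?<sub>_)|(?<sup>\^))\{(?<text>[^}]*)\}");

    public static string Convert(string input)
    {
        return Markup.Replace(input, m =>
        {
            string text = m.Groups["text"].Value;
            // Exactly one modifier group takes part in each match; Success tells which.
            if (m.Groups["it"].Success)  return "<i>" + text + "</i>";
            if (m.Groups["sub"].Success) return "<sub>" + text + "</sub>";
            if (m.Groups["sup"].Success) return "<sup>" + text + "</sup>";
            return m.Value;
        });
    }
}
```

Adding a modifier then means adding one named alternative to the pattern and one branch to the callback, without renumbering the other groups.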
