c# - Get token blocks in a string

Question

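I'm stuck on my project and can't get past this problem, so I need some help solving it:

I have a string that contains some token text, and I want to pull the tokens out manually and put them into an array list of strings. The final result should be two array lists: one for the normal text and one for the token text. Below is an example string containing some tokens surrounded by the open marker "[[" and the close marker "]]".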

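The first step, where the wort is prepared by mixing the starch source with hot water, is known as [[Textarea]]. Hot water is mixed with crushed malt or malts in a mash tun. The mashing process takes around [[CheckBox]], during which the starches are converted to sugars, and then the sweet wort is drained off the grains. The grains are now washed in a process known as [[Radio]]. This washing allows the brewer to gather [[DropDownList]] the fermentable liquid from the grains as possible.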

After processing the string, I should get two array lists:

Result:

Normal Text ArrayList { "The first step, where the wort is prepared by mixing the starch source with hot water, is known as ", ". Hot water is mixed with crushed malt or malts in a mash tun. The mashing process takes around ", ", during which the starches are converted to sugars, and then the sweet wort is drained off the grains. The grains are now washed in a process known as ", ". This washing allows the brewer to gather ", " the fermentable liquid from the grains as possible." }

Token Text ArrayList { "[[Textarea]]", "[[CheckBox]]", "[[Radio]]", "[[DropDownList]]" }

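So there are two array lists: the normal text array list has 5 elements, which are the pieces of text before or after the tokens, and the token text array list has 4 elements, which are the tokens found inside the string.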

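This could be done by cutting up the string with substring operations, but for long text that is too hard and error-prone, and sometimes it doesn't give me what I want. If you can help with this, please post the code in C#, because I'm using C# for this task.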

score 1 · Accepted Answer

This seems to do the job (although note that, at present, my tokens array contains the plain tokens, rather than having them wrapped in [[ and ]]):

using System;
using System.Linq;

var inp = @"The first step, where the wort is prepared by mixing the starch source with hot water, is known as [[Textarea]]. Hot water is mixed with crushed malt or malts in a mash tun. The mashing process takes around [[CheckBox]], during which the starches are converted to sugars, and then the sweet wort is drained off the grains. The grains are now washed in a process known as [[Radio]]. This washing allows the brewer to gather [[DropDownList]] the fermentable liquid from the grains as possible.";

var step1 = inp.Split(new string[] { "[[" }, StringSplitOptions.None);
//step1 should now contain one string that's due to go into normal, followed by n strings which need to be further split
var step2 = step1.Skip(1).Select(a => a.Split(new string[] { "]]" }, StringSplitOptions.None)).ToArray();
//step2 should now contain pairs of strings - the first of which are the tokens, the second of which are normal strings.
//ToArray() materializes the pairs once, so the two queries below do not repeat the split.

var normal = step1.Take(1).Concat(step2.Select(a => a[1])).ToArray();
var tokens = step2.Select(a => a[0]).ToArray();

This also assumes there are no unbalanced [[ and ]] sequences in the input.
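If you want the tokens wrapped in [[ and ]] as in the question, and want unbalanced input to fail loudly instead of silently giving odd results, something along these lines might work. It is only a sketch built on the same two splits: the class name TokenSplitter, the method Split, and the choice of FormatException are my own assumptions for illustration, not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (name and error handling are assumptions).
public static class TokenSplitter
{
    public static (List<string> Normal, List<string> Tokens) Split(string input)
    {
        // Step 1: the first piece is normal text; every later piece starts with a token.
        string[] step1 = input.Split(new[] { "[[" }, StringSplitOptions.None);

        // A stray "]]" before the first "[[" would otherwise go unnoticed.
        if (step1[0].Contains("]]"))
            throw new FormatException("Unbalanced ]] before the first [[.");

        var normal = new List<string> { step1[0] };
        var tokens = new List<string>();

        foreach (string piece in step1.Skip(1))
        {
            // Step 2: each remaining piece should look like "token]]normal text".
            string[] pair = piece.Split(new[] { "]]" }, StringSplitOptions.None);
            if (pair.Length != 2)
                throw new FormatException("Unbalanced [[ or ]] near: " + piece);

            tokens.Add("[[" + pair[0] + "]]"); // put the markers back around the token
            normal.Add(pair[1]);
        }

        return (normal, tokens);
    }
}
```

Called as `var (normal, tokens) = TokenSplitter.Split(inp);` on the sample string, this should give 5 normal strings and 4 wrapped tokens. It returns List<string> rather than ArrayList, which is usually the more convenient choice when every element is a string.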

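The observation that went into this solution: if you first split the string around each [[ pair in the original text, then the first output string is already generated. Further, every string after the first consists of a token, the ]] pair, and a normal text. E.g. the second result in step1 is: "Textarea]]. Hot water is mixed with crushed malt or malts in a mash tun. The mashing process takes around "

So, if you split these other results around the ]] pairs, then the first result is a token and the second result is a normal string.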
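To see the two stages on something shorter than the brewing text, here is a small trace. The input "a [[X]] b [[Y]] c" and the names SplitTrace, Step1 and Step2 are made up for illustration; the splits themselves are the same calls as in the code above.

```csharp
using System;
using System.Linq;

// Hypothetical trace class, only meant to show the intermediate values.
public static class SplitTrace
{
    // First stage: split around every "[[".
    public static string[] Step1(string inp) =>
        inp.Split(new[] { "[[" }, StringSplitOptions.None);

    // Second stage: split every piece after the first around "]]".
    public static string[][] Step2(string[] step1) =>
        step1.Skip(1)
             .Select(s => s.Split(new[] { "]]" }, StringSplitOptions.None))
             .ToArray();

    public static void Main()
    {
        string[] step1 = Step1("a [[X]] b [[Y]] c");
        // step1: "a ", "X]] b ", "Y]] c"
        Console.WriteLine(string.Join("|", step1));

        foreach (string[] pair in Step2(step1))
        {
            // pairs: { "X", " b " } and { "Y", " c" }
            Console.WriteLine("token=" + pair[0] + " normal=" + pair[1]);
        }
    }
}
```

The first element of step1 is already a finished normal string, and each pair from the second stage carries one token followed by the normal text that comes after it.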
