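I have a text string (an SVG file) in which I need to find the complete branch of nested tags that begins at a given start position. The algorithm should return the smallest number of tags for which the opening and closing tags balance out: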

    <g... <g... </g... <g... </g... </g>

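The requirement is essentially bracket matching: walk forward from the start tag, add one for every opening `<g` and subtract one for every `</g>`, and stop at the first closing tag that brings the count back to zero. A minimal sketch over an already-tokenized list, written as a C# 9+ top-level program; the tokenizing step and the `FindBranchEnd` name are hypothetical, for illustration only:

```csharp
using System;

// Sketch: each opening tag adds 1 to the depth, each closing tag subtracts 1.
// Assumes tokens[0] is the opening tag of the branch we want.
// Returns the index of the token that closes that branch, or -1 if it never closes.
int FindBranchEnd(string[] tokens)
{
    int depth = 0;
    for (int i = 0; i < tokens.Length; i++)
    {
        if (tokens[i] == "<g") depth++;
        else if (tokens[i] == "</g>") depth--;

        // Back at depth 0: opening and closing tags are balanced.
        if (depth == 0) return i;
    }
    return -1; // truncated or corrupt input
}

// The schematic from the question, followed by one sibling group.
string[] schematic = { "<g", "<g", "</g>", "<g", "</g>", "</g>", "<g", "</g>" };
Console.WriteLine(FindBranchEnd(schematic)); // 5
```

The sibling group at the end is ignored because the scan stops as soon as the depth returns to zero at index 5.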
Some sample data:

    <g id="shape57-240" v:mID="57" v:groupContext="shape" transform="translate(550.246,-1356.2)">
      <title>Entscheidung.54</title>
      <desc>Modelcoding</desc>
      <v:userDefs>
        <v:ud v:nameU="msvThemeColors" v:val="VT0(36):26"/>
        <v:ud v:nameU="msvThemeEffects" v:val="VT0(16):26"/>
      </v:userDefs>
      <v:textBlock v:margins="rect(4,4,4,4)"/>
      <v:textRect cx="42.5197" cy="1583.12" width="85.04" height="42.5197"/>
      <g id="shadow57-241" v:groupContext="shadow" v:shadowOffsetX="1.44" v:shadowOffsetY="-1.44" v:shadowType="1"
          transform="matrix(1,0,0,1,1.44,1.44)" class="st9">
        <path d="M0 1583.12 L21.26 1561.86 L42.52 1561.86 L63.78 1561.86 L85.04 1583.12 L63.78 1604.38 L42.52 1604.38 L21.26
            1604.38 L0 1583.12 Z" class="st10"/>
      </g>
      <path d="M0 1583.12 L21.26 1561.86 L42.52 1561.86 L63.78 1561.86 L85.04 1583.12 L63.78 1604.38 L42.52 1604.38 L21.26
          1604.38 L0 1583.12 Z" class="st23"/>
      <text x="13.52" y="1580.12" class="st12" v:langID="1034"><v:paragraph v:horizAlign="1"/><v:tabList/>Modelcoding<v:newlineChar/></text></g>

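This is the solution I finally came up with. The problem I ran into yesterday was caused by a corrupt SVG structure, so I added a break to the while loop after 40 iterations.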
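
    // Requires: using System;
    // Returns the complete <g>...</g> branch that starts at `offset`
    // (offset must point at the '<' of an opening "<g " tag).
    string ExtractTAG(string s, int offset)
    {
        const int MaxIterations = 40;  // safety stop for corrupt SVG files
        int pos = offset + 1;          // scan position, just past the opening '<'
        int level = 1;                 // nesting depth; we start inside the opening <g>
        int n = 0;                     // number of tags processed

        while (level != 0 && n < MaxIterations)
        {
            // Ordinal search: we are matching markup, not natural-language text.
            int nextClose = s.IndexOf("</g>", pos, StringComparison.Ordinal);
            int nextOpen = s.IndexOf("<g ", pos, StringComparison.Ordinal);
            if (nextClose == -1)
            {
                // no next closing tag => corrupt file
                return "";
            }
            if (nextOpen == -1 || nextClose < nextOpen)
            {
                // a closing tag comes first: go up one level
                level--;
                pos = nextClose + 1;
            }
            else
            {
                // another opening tag comes first: go down one level
                level++;
                pos = nextOpen + 1;
            }
            n++;
        }
        if (level != 0)
        {
            // branch was not closed within the iteration limit
            return "Length could not be determined after 40 iterations - SVG might be corrupt";
        }
        // pos - 1 is the index of the '<' of the final "</g>", which is 4 characters long
        return s.Substring(offset, pos - 1 + 4 - offset);
    }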
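If the files can contain `<g>` without attributes, `<g` followed directly by a line break, or self-closing `<g .../>` groups, the `"<g "` search above will miscount: the first two are missed, so the branch ends too early, and a self-closing group is counted as an opening tag with no matching `</g>`, so the branch runs past its real end. A minimal sketch of a more tolerant scanner, written as a C# 9+ top-level program; `ExtractGroup` and `IsOpenGroup` are hypothetical names, not the code above. It is still not an XML parser: it assumes there is no `>` inside attribute values and no `<g` inside comments or CDATA sections.

```csharp
using System;

// Hypothetical, more tolerant variant of ExtractTAG (not the original author's code).
// Returns the text from `offset` through the matching "</g>", or "" if the branch
// never closes. `offset` must point at the '<' of a <g> start tag.
string ExtractGroup(string svg, int offset)
{
    if (offset < 0 || !IsOpenGroup(svg, offset)) return "";

    int depth = 0;
    int pos = offset;
    while (true)
    {
        int lt = svg.IndexOf('<', pos);
        if (lt == -1) return "";                  // reached end of input: corrupt file

        if (string.CompareOrdinal(svg, lt, "</g>", 0, 4) == 0)
        {
            depth--;
            if (depth == 0) return svg.Substring(offset, lt + 4 - offset);
            pos = lt + 4;
        }
        else if (IsOpenGroup(svg, lt))
        {
            int gt = svg.IndexOf('>', lt);        // end of the start tag
            if (gt == -1) return "";
            if (svg[gt - 1] == '/')
            {
                // Self-closing <g .../> opens and closes in one tag; depth is unchanged.
                if (depth == 0) return svg.Substring(offset, gt + 1 - offset);
            }
            else
            {
                depth++;
            }
            pos = gt + 1;
        }
        else
        {
            pos = lt + 1;                         // any other tag (<path>, <title>, ...)
        }
    }
}

// True if svg[lt..] begins a <g> start tag: "<g" followed by whitespace, '>' or '/'.
bool IsOpenGroup(string svg, int lt)
{
    if (lt + 2 >= svg.Length || svg[lt] != '<' || svg[lt + 1] != 'g') return false;
    char c = svg[lt + 2];
    return char.IsWhiteSpace(c) || c == '>' || c == '/';
}

string doc = "<svg><g id=\"a\"><g id=\"b\"/><g>\n<path d=\"M0 0\"/></g></g><g id=\"c\"></g></svg>";
int start = doc.IndexOf("<g", StringComparison.Ordinal);
Console.WriteLine(ExtractGroup(doc, start));
```

Here the call returns group `a` including the self-closing `<g id="b"/>` and the nested `<g>`, and stops before the sibling `<g id="c">`. Because every step moves `pos` forward and a missing `<` or `>` ends the scan, this version needs no iteration cap to terminate on corrupt input.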