2

I get some HTML responses from a service:

<style> .transcription, .trsc{line-height:19px; padding-left:20px; font-family:Lucida Sans Unicode; padding-right:5px;} </style><div id="shView"> <div class="cforms_result" id="cforms_result1"> <div class="ref_cform" onclick="javascript:GetFullWordCBK('1', 'wordER');"><span class="fsform_link"><a href="javascript:;" onclick="javascript:GetFullWordCBK('1', 'wordER');"><img src="/images/common/owl_ico16.gif" width="19" height="19" border="0"></a><a href="javascript:;" onclick="javascript:GetFullWordCBK('1', 'wordER');"> Спряжение </a></span><span class="ref_source">mother<wrs><span class="sforms_src"><span class="w_des">Infinitive</span><b>mother</b><br><span class="w_des">Past Indefinite</span><b>mothered</b><br><span class="w_des">Participle II</span><b>mothered</b><br><span class="w_des">Participle I</span><b>mothering</b></span></wrs></span>&nbsp;<span class="ref_info"></span>, <span class="ref_psp">Глагол</span></div> <div class="tr_pr"><span class="transcription">[ˈmʌðə]</span><span class="pronunciation"><a href="javascript:;" class="pbf_s" id="lnkGtTr1" onclick="javascript:ListenWord(this,'mother',1,'play');"><img src="/images/common/vol_on.gif" align="absmiddle" border="0" id="imgGtTr1"><span> powered by <img src="/images/common/logoforvo.gif" width="59" height="17" border="0" hspace="5" align="absmiddle" style="cursor:point; cursor:hand;" onclick="window.open('http://ru.forvo.com/');"></span></a><span class="loadFrv" id="loadFrv1"><img hspace="10" src="/images/common/al_fullWR.gif" align="absmiddle"></span><span style="width:20px; height:17px;" class="pbf_s" id="speaker_on1"><span> powered by <img src="/images/common/logoforvo.gif" width="59" height="17" border="0" hspace="5" align="absmiddle" style="cursor:point; cursor:hand;" onclick="window.open('http://ru.forvo.com/');"></span></span></span></div> <div id="translations" onclick="javascript:GetFullWordCBK('1', 'wordER');"> <ol> <li><span class="ref_result">относиться по-матерински<wrs><span 
class="sforms_src"></span></wrs></span> <span class="ref_info"></span></li> </ol> </div> </div><script> $('.sforms_src').filter(function(index) { return $(this).html().length == 0;}).remove();//getPrLink('mother ');//$('#speaker_on').unbind('click','ShowFullWRefERRE')//$('#speaker_on').click(function(){alert("не открывать окно расширеной справки");}); </script><div class="cforms_result" id="cforms_result2"> <div class="ref_cform" onclick="javascript:GetFullWordCBK('2', 'wordER');"><span class="fsform_link"><a href="javascript:;" onclick="javascript:GetFullWordCBK('2', 'wordER');"><img src="/images/common/owl_ico16.gif" width="19" height="19" border="0"></a><a href="javascript:;" onclick="javascript:GetFullWordCBK('2', 'wordER');"> Склонение </a></span><span class="ref_source">mother<wrs><span class="sforms_src"><span class="w_des">Singular</span><b>mother</b><br><span class="w_des">Plural</span><b>mothers</b></span></wrs></span>&nbsp;<span class="ref_info"></span>, <span class="ref_psp">Существительное</span></div> <div class="tr_pr"><span class="transcription">[ˈmʌðə]</span><span class="pronunciation"><a href="javascript:;" class="pbf_s" id="lnkGtTr2" onclick="javascript:ListenWord(this,'mother',2,'play');"><img src="/images/common/vol_on.gif" align="absmiddle" border="0" id="imgGtTr2"><span> powered by <img src="/images/common/logoforvo.gif" width="59" height="17" border="0" hspace="5" align="absmiddle" style="cursor:point; cursor:hand;" onclick="window.open('http://ru.forvo.com/');"></span></a><span class="loadFrv" id="loadFrv2"><img hspace="10" src="/images/common/al_fullWR.gif" align="absmiddle"></span><span style="width:20px; height:17px;" class="pbf_s" id="speaker_on2"><span> powered by <img src="/images/common/logoforvo.gif" width="59" height="17" border="0" hspace="5" align="absmiddle" style="cursor:point; cursor:hand;" onclick="window.open('http://ru.forvo.com/');"></span></span></span></div> <div id="translations" onclick="javascript:GetFullWordCBK('2', 
'wordER');"> <ol> <li><span class="ref_result">мать<wrs><span class="sforms_src"></span></wrs></span> <span class="ref_info">f</span></li> <li><span class="ref_result">родительский элемент<wrs><span class="sforms_src"></span></wrs></span> <span class="ref_info">m</span><span class="ref_dictionary"> (ИТ - базовый) </span></li> <li><span class="ref_result">родительский<wrs><span class="sforms_src"></span></wrs></span><span class="ref_comment"> (attributive) </span> <span class="ref_info"></span><span class="ref_dictionary"> (ИТ - базовый) </span></li> <li><span class="ref_result">прототип<wrs><span class="sforms_src"></span></wrs></span> <span class="ref_info">m</span><span class="ref_dictionary"> (Политехнический) </span></li> <li><span class="ref_result">начало<wrs><span class="sforms_src"></span></wrs></span> <span class="ref_info">n</span><span class="ref_dictionary"> (Политехнический) </span></li> </ol> </div> </div><script> $('.sforms_src').filter(function(index) { return $(this).html().length == 0;}).remove();//getPrLink('mother ');//$('#speaker_on').unbind('click','ShowFullWRefERRE')//$('#speaker_on').click(function(){alert("не открывать окно расширеной справки");}); </script><div class="cforms_result" id="cforms_result3"> <div class="ref_cform" onclick="javascript:GetFullWordCBK('3', 'wordER');"><span class="fsform_link"><a href="javascript:;" onclick="javascript:GetFullWordCBK('3', 'wordER');"><img src="/images/common/owl_ico16.gif" width="19" height="19" border="0"></a><a href="javascript:;" onclick="javascript:GetFullWordCBK('3', 'wordER');"> Склонение </a></span><span class="ref_source">mother<wrs><span class="sforms_src"><span class="w_des">Positive</span><b>mother</b><br></span></wrs></span>&nbsp;<span class="ref_info"></span>, <span class="ref_psp">Прилагательное</span></div> <div class="tr_pr"><span class="transcription">[ˈmʌðə]</span><span class="pronunciation"><a href="javascript:;" class="pbf_s" id="lnkGtTr3" 
onclick="javascript:ListenWord(this,'mother',3,'play');"><img src="/images/common/vol_on.gif" align="absmiddle" border="0" id="imgGtTr3"><span> powered by <img src="/images/common/logoforvo.gif" width="59" height="17" border="0" hspace="5" align="absmiddle" style="cursor:point; cursor:hand;" onclick="window.open('http://ru.forvo.com/');"></span></a><span class="loadFrv" id="loadFrv3"><img hspace="10" src="/images/common/al_fullWR.gif" align="absmiddle"></span><span style="width:20px; height:17px;" class="pbf_s" id="speaker_on3"><span> powered by <img src="/images/common/logoforvo.gif" width="59" height="17" border="0" hspace="5" align="absmiddle" style="cursor:point; cursor:hand;" onclick="window.open('http://ru.forvo.com/');"></span></span></span></div> <div id="translations" onclick="javascript:GetFullWordCBK('3', 'wordER');"> <ol> <li><span class="ref_result">родительский<wrs><span class="sforms_src"></span></wrs></span> <span class="ref_info"></span><span class="ref_dictionary"> (ИТ - базовый) </span></li> </ol> </div> </div><script> $('.sforms_src').filter(function(index) { return $(this).html().length == 0;}).remove();//getPrLink('mother ');//$('#speaker_on').unbind('click','ShowFullWRefERRE')//$('#speaker_on').click(function(){alert("не открывать окно расширеной справки");}); </script><div id="fullRLink"><a href="javascript:GetFullWordCBK('1', 'wordER');">Показать полную словарную статью</a><span id="al_fullWR"><img src="/images/common/al_fullWR.gif" align="middle" hspace="10"> Загружаем...</span></div></div>

I want to extract the text between this pattern: <span class="ref_result">TEXT<wrs>

I use this code to get all the matches:

const string pattern = "ref_result\">\\w+<";
Regex rgx = new Regex(pattern, RegexOptions.Compiled);
var text = SantinizeOutput(result.result);
MatchCollection matches = rgx.Matches(text);
if(matches.Count > 0)
{
  resultsList = new List<string>(matches.Count);
  foreach(Match match in rgx.Matches(text))
  {
    string formattedWord = match.Value;
    int leftAngleBracketIndex = formattedWord.IndexOf(">");
    var word = formattedWord.Remove(0, leftAngleBracketIndex + 1);
    word = word.TrimEnd('<');
    resultsList.Add(word);
  }
}


private string SantinizeOutput(string input)
{
  var text = input.Replace("\n", "").Replace("\r", "");
  return Regex.Replace(text, "\\s+", " ");
}

This text contains 7 matches, but I only get 5 results.

What am I doing wrong?

4

3 Answers

3

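\w means "word character"; it does not match whitespace. Note that two of the ref_result tags contain spaces (and the first also contains a hyphen, which \w does not match either):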

<span class="ref_result">относиться по-матерински<wrs>
<span class="ref_result">родительский элемент<wrs>

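Just use "ref_result\">[^<]+<wrs" to get all the non-tag content. Note that each match then ends in <wrs rather than <, so the trimming step in your code has to remove that suffix as well.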
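To make the difference concrete, here is a minimal sketch that runs both patterns over a shortened stand-in for the response. The class name `RefResultDemo`, the `Sample` string, and the capture group added around the character class are assumptions made for illustration; they are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RefResultDemo
{
    // Hypothetical, shortened stand-in for the service response:
    // three ref_result spans, two of which contain a space.
    public const string Sample =
        "<span class=\"ref_result\">относиться по-матерински<wrs></wrs></span>" +
        "<span class=\"ref_result\">мать<wrs></wrs></span>" +
        "<span class=\"ref_result\">родительский элемент<wrs></wrs></span>";

    // Returns the text captured by group 1 for every match of the pattern.
    public static List<string> Extract(string html, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```

Calling `RefResultDemo.Extract(RefResultDemo.Sample, "ref_result\">(\\w+)<")` should return only the single-word entry, while `"ref_result\">([^<]+)<wrs"` should return all three, because the negated class `[^<]` accepts spaces and hyphens.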

answered 2012-09-29T18:00:34.047
2

Try changing your \w+ to .*?

So:

const string pattern = "ref_result\">.*?<";

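.*? matches any characters (non-greedily) until it hits the first < character.

.* matches any characters (greedily) until it hits the last < character. You want the non-greedy version.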
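The greedy versus non-greedy behaviour can be checked with a small sketch like the one below. The class name `GreedyLazyDemo` and the two-entry `Sample` string are hypothetical and only serve to illustrate the point.

```csharp
using System.Text.RegularExpressions;

public static class GreedyLazyDemo
{
    // Hypothetical input with two ref_result entries on one line.
    public const string Sample =
        "ref_result\">мать<wrs></wrs> ref_result\">начало<wrs>";

    // Returns the text of the first match of the pattern, or "" if none.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }
}
```

With the lazy pattern `"ref_result\">.*?<"`, `FirstMatch` stops at the first `<` and yields `ref_result">мать<`. With the greedy pattern `"ref_result\">.*<"`, the same call runs on to the last `<` in the input and swallows both entries in a single match.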

answered 2012-09-29T18:00:32.170
0

By changing your regex, you can also remove some of the logic from your code.

const string pattern = "ref_result\">([^<]*)";
Regex rgx = new Regex(pattern, RegexOptions.Compiled);
var text = SantinizeOutput(result.result);
MatchCollection matches = rgx.Matches(text);

List<string> resultsList = new List<string>(matches.Count);
// Loop over the matches; resultsList is still empty at this point.
for(int i=0; i<matches.Count; i++) {
  resultsList.Add(matches[i].Groups[1].Value);
}

private string SantinizeOutput(string input) {
  var text = input.Replace("\n", "").Replace("\r", "");
  return Regex.Replace(text, "\\s+", " ");
}
answered 2012-09-29T18:22:35.303