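I have searched extensively for this but have not found a solution so far. I have a large query combined with proximity search, and I need to find out exactly where in each document the query matched. For example, part of the query is "hospital"~2 "readmissio"~2. Lucene retrieves the correct documents, and one of them contains the expected value "hospital re-admission", but how can I map the query term "readmissio" to "re-admission" in the document so that I can highlight it? There are workarounds such as computing the Levenshtein distance, but with a large number of retrieved documents that is impractical. I am hoping Lucene itself offers a way to report the positions of the retrieved text that matched the query. Please advise.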
// Wrap every token of the free-text query in quotes and append ~2.
string newQueryString = "";
string querystring = "To determine whether high performing hospitals with low 30 day risk standardized hospital readmissio rates have a lower proportion of readmission";
string[] tokens = NLP.Tokenize(querystring);
for (int i = 0; i < tokens.Length; i++)
{
    string token = "\"" + tokens[i] + "\"~2";
    // Add the token to the new query expression.
    newQueryString = newQueryString + " " + token;
}
Query query = MultiFieldQueryParser.Parse(
    Lucene.Net.Util.Version.LUCENE_CURRENT,
    new string[] { newQueryString },
    new string[] { "TERM" },
    selectedAnalyzer);

// Search once to learn the total hit count, then collect every hit.
TopDocs tp = indexSearcher.Search(query, 1);
int max = tp.TotalHits;
List<Lucene.Net.Documents.Document> ids = new List<Lucene.Net.Documents.Document>();
TopScoreDocCollector collector = TopScoreDocCollector.Create(max + 1, true);
indexSearcher.Search(query, collector);
ScoreDoc[] hits = collector.TopDocs().ScoreDocs;
for (int i = 0; i < hits.Length; i++)
{
    int docId = hits[i].Doc;
    Lucene.Net.Documents.Document doc = indexSearcher.Doc(docId);
    string concept = doc.GetFieldable("TERM").StringValue;
    string[] contentTerms = NLP.Tokenize(querystring);

    // The "TERM" field must be indexed with positions and offsets for this cast to work.
    ITermFreqVector tfvector = reader.GetTermFreqVector(docId, "TERM");
    TermPositionVector tpvector = (TermPositionVector)tfvector;

    for (int k = 0; k < contentTerms.Length; k++)
    {
        string[] terms = tfvector.GetTerms();
        // IndexOf only finds exact matches. How can I get readmission/readmissions here?
        int termidx = tfvector.IndexOf(contentTerms[k].Trim());
        if (termidx < 0)
        {
            // The query token is not an indexed term of this document (e.g. "readmissio").
            continue;
        }
        int[] termposx = tpvector.GetTermPositions(termidx);
        TermVectorOffsetInfo[] tvoffsetinfo = tpvector.GetOffsets(termidx);
        int offsetStart = 0;
        int offsetEnd = 0;
        List<Tokens> conceptList = new List<Tokens>();
        for (int j = 0; j < tvoffsetinfo.Length; j++)
        {
            offsetStart = tvoffsetinfo[j].StartOffset;
            offsetEnd = tvoffsetinfo[j].EndOffset;
            // Text of the matched term as it appears in the stored field value.
            string key = concept.Substring(offsetStart, offsetEnd - offsetStart);
            // ... code continues ...
        }
    }
}
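As the code shows, what I am asking is how to highlight the "readmissio" part of the query string against the "readmission" concept. Or, as another example, if my query contains "readmission", how can I highlight it when the retrieved value is "re-admission" or "admission"?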
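To make the gap concrete, here is a minimal sketch of one possible direction. It is not a built-in Lucene feature. The array returned by `GetTerms()` already lists the indexed terms of the document, so a truncated query token could be compared against those terms by prefix instead of by exact `IndexOf` lookup. The class `TermMatcher`, the method `FindCandidateTermIndexes`, and the prefix rule are all hypothetical names and assumptions made for illustration. This would not cover a case where the analyzer splits "re-admission" into separate terms.

```csharp
using System;
using System.Collections.Generic;

public static class TermMatcher
{
    // Hypothetical helper (not part of Lucene.Net): returns the index of every
    // indexed term that starts with the possibly truncated query token,
    // ignoring case. An empty token yields no matches.
    public static List<int> FindCandidateTermIndexes(string[] terms, string queryToken)
    {
        var matches = new List<int>();
        string needle = queryToken.Trim();
        if (needle.Length == 0)
        {
            return matches;
        }

        for (int i = 0; i < terms.Length; i++)
        {
            if (terms[i].StartsWith(needle, StringComparison.OrdinalIgnoreCase))
            {
                matches.Add(i);
            }
        }
        return matches;
    }
}
```

Each returned index could then be passed to `tpvector.GetOffsets(index)` in place of the single `termidx`, so that "readmissio" would pick up the offsets of both "readmission" and "readmissions".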