Here is some code to access terms in a Lucene document:
int docId = hits[i].doc;
// Term vector of the "contents" field for this hit.
TermFreqVector tfvector = reader.getTermFreqVector(docId, "contents");
// The cast only succeeds if positions/offsets were stored with the term vector.
TermPositionVector tpvector = (TermPositionVector) tfvector;
// this part works only if there is one term in the query string,
// otherwise you will have to iterate this section over the query terms.
int termidx = tfvector.indexOf(querystr); // -1 if the term is not in this document
int[] termposx = tpvector.getTermPositions(termidx);
TermVectorOffsetInfo[] tvoffsetinfo = tpvector.getOffsets(termidx);

My question is: given termposx, how do I get the terms based on the termposx array?


1 Answer


Zincup: termposx has {7, 19, 34}. What is the term at position 8 or 9? How do I access it?

TermPositionVector.getTermPositions() returns an array of positions in which the term is found.

Each term is identified by its index in the term String array, which is the index that the indexOf method returns for that term.

So {7, 19, 34} are the positions at which one and the same term (the one you looked up) appears.

Using the TermPositionVector, you can gain access to the "positions in which each of the terms is found", but not the other way around.

I am afraid you have to iterate over the terms to find the ones at positions 8 and 9. I will explore the API further and let you know if I find a solution.
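
A minimal sketch of that iteration, for illustration only. It does not call Lucene itself: it assumes you have already collected the field's terms (for example from tpvector.getTerms()) and, for each term index i, its positions (for example from tpvector.getTermPositions(i)). The class name PositionIndex and the method invert are hypothetical names of my own, not part of the Lucene API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene:
// inverts "term -> positions" into "position -> term".
final class PositionIndex {

    // terms[i] is the i-th term of the field;
    // positions[i] holds the positions of terms[i],
    // or null if no positions were stored for it.
    static Map<Integer, String> invert(String[] terms, int[][] positions) {
        Map<Integer, String> byPosition = new TreeMap<Integer, String>();
        for (int i = 0; i < terms.length; i++) {
            if (positions[i] == null) {
                continue; // no positions stored for this term
            }
            for (int pos : positions[i]) {
                // If several terms share a position, the last one seen wins in this sketch.
                byPosition.put(pos, terms[i]);
            }
        }
        return byPosition;
    }
}
```

With the map built once per document, PositionIndex.invert(terms, positions).get(8) gives the term at position 8, or null if no indexed term occupies that position (which can happen, for instance, when the analyzer dropped a stop word there).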

Answered 2013-03-07T01:59:02.663