Is there a way with Lucene 4.4 to determine exactly which terms satisfied a query? I need to highlight only terms that caused the document to be returned, not the same term elsewhere in the document. For example, given the document:
We are going to visit the White House today. I hear it is painted white.
and the phrase query "white house", I want to highlight only these terms:
We are going to visit the <b>White</b> <b>House</b> today. I hear it is painted white.
I've been using PostingsHighlighter, but it will highlight the word "white" in the second sentence as well. I don't want that because the single term "white" does not satisfy the phrase query.
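To make the desired behavior concrete, here is a minimal plain-Java sketch of the output I'm after. It does not use Lucene at all: the class name `PhraseHighlightSketch` and the method `highlightPhrase` are made up for illustration, and the tokenization (runs of word characters, compared case-insensitively) is only an assumption standing in for whatever analyzer the index really uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Lucene.
public class PhraseHighlightSketch {

    /** Wraps each word of every exact, case-insensitive phrase occurrence in <b> tags. */
    public static String highlightPhrase(String text, String phrase) {
        String[] terms = phrase.toLowerCase(Locale.ROOT).trim().split("\\s+");

        // Tokenize the text, remembering each token's character offsets.
        List<int[]> spans = new ArrayList<>();
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+").matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }

        // Mark only the tokens that are part of a complete phrase match.
        boolean[] hit = new boolean[tokens.size()];
        for (int i = 0; i + terms.length <= tokens.size(); i++) {
            boolean match = true;
            for (int j = 0; j < terms.length; j++) {
                if (!tokens.get(i + j).equals(terms[j])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                for (int j = 0; j < terms.length; j++) {
                    hit[i + j] = true;
                }
            }
        }

        // Rebuild the text, wrapping marked tokens and copying everything else verbatim.
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int i = 0; i < spans.size(); i++) {
            int[] s = spans.get(i);
            out.append(text, pos, s[0]);
            String word = text.substring(s[0], s[1]);
            out.append(hit[i] ? "<b>" + word + "</b>" : word);
            pos = s[1];
        }
        out.append(text.substring(pos));
        return out.toString();
    }

    public static void main(String[] args) {
        String doc = "We are going to visit the White House today. I hear it is painted white.";
        System.out.println(highlightPhrase(doc, "white house"));
    }
}
```

The trailing "white" is left alone because it is not followed by "house". That position-aware behavior is what I'd like to get from Lucene itself rather than re-implementing the matching outside the index.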
It looks like the only information that comes back from a search is the document IDs and scores. I don't need the scores for relevancy ranking, because I'll be working with all of the documents returned. Is there something I could do with custom scoring that would preserve the match information I need? Or is there a better approach that I'm missing?