nlp - NLP - How to Identify whether 2 texts refer to similar object

Question

I am looking for some general guidance here.

The high-level use case is that I receive product documents from which I need to extract some information and process it. Before doing that, I need to verify that the document actually refers to the correct product. For that, I need to validate the product heading/description from the document against what I know to be correct.

So I have 2 texts

Text 1 - this refers to the product information extracted from some document
Text 2 - this is the actual product heading/description available with me, which can be considered as correct.

I need to validate that both texts refer to the same product or object.

Example:

Text 1 (to be validated) - Optimus Prime Costume, Blue, with good packaging and warranty
Text 2 (correct info) - Optimus Prime Blue Costume, Medium Size

You see, I need to validate that both texts refer to the Optimus Prime Costume.

I tried the following methods:

Cosine Similarity
TF-IDF similarity
Overlapping words between strings

But the problem with them is that they depend on the entire text rather than on the primary object being referred to in the text.

I was thinking of processing it as follows:
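To make the problem concrete, here is a minimal sketch of the word-overlap method applied to the example pair. The helper names `tokens` and `jaccard` are illustrative, and Jaccard overlap is only one assumed way of scoring "overlapping words"; the point is that the extra words in Text 1 (packaging, warranty) and the attributes in Text 2 (medium, size) pull the score down even though the product is the same.

```python
import re

def tokens(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard overlap: shared tokens divided by all distinct tokens."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

text1 = "Optimus Prime Costume, Blue, with good packaging and warranty"
text2 = "Optimus Prime Blue Costume, Medium Size"

# 4 shared tokens out of 11 distinct ones, so the score is well below 0.5
score = jaccard(text1, text2)
print(round(score, 3))
```

The score is computed over every word in both strings, which is why it does not isolate the primary object.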

Remove colour, size and similar attribute info from Text 2. Text 2 is very concise and does not contain random data; it contains only the product name plus size and colour info.
Validate that the remaining elements from Text 2 are present in Text 1, or at least that a majority of them are.

I am not sure which other NLP techniques would work better than this approach, so any suggestions would be appreciated.
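The two steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `ATTRIBUTE_WORDS` is a hypothetical hand-made vocabulary of colour and size words, the `threshold` of 0.8 is an arbitrary choice, and the function names are made up for this example.

```python
import re

# Hypothetical attribute vocabulary (assumption); a real list would have to
# be built from the colours and sizes that occur in the product catalogue.
ATTRIBUTE_WORDS = {"blue", "red", "green", "black", "white", "yellow",
                   "small", "medium", "large", "size"}

def tokens(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def core_terms(reference):
    """Step 1: drop colour/size words from the trusted text (Text 2)."""
    return {t for t in tokens(reference) if t not in ATTRIBUTE_WORDS}

def same_product(candidate, reference, threshold=0.8):
    """Step 2: check that enough core terms of Text 2 occur in Text 1."""
    core = core_terms(reference)
    if not core:
        return False  # nothing left to compare against
    found = core & set(tokens(candidate))
    return len(found) / len(core) >= threshold

text1 = "Optimus Prime Costume, Blue, with good packaging and warranty"
text2 = "Optimus Prime Blue Costume, Medium Size"

print(core_terms(text2))
print(same_product(text1, text2))
```

Because the coverage is measured only against the core terms of Text 2, the extra words in Text 1 no longer affect the result. The approach stands or falls with the quality of the attribute list.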

score 0 · Accepted Answer

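Depending on your goals, this can be moderately hard or very hard.

Here are a few things you could use:

NER (named entity recognition) would definitely help.

Wikifier may also be useful: http://cogcomp.org/page/demo_view/Wikifier

There is also semantic role labeling. See more annotation tools here: http://nlp.cogcomp.org/

It is hard to tell from a single example what the exact algorithm should be, but if you have more examples it may be easier to come up with a better formalization.

An extension of this can be found in what is used in this work.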
