I would like to analyze a given document to find out whether its content belongs to a domain I am interested in or is unrelated to that domain.
For example, I have a document that contains data about Android OS, and I have a domain ontology that specifies the full knowledge about Android. I need to find out what percentage of the document's content is valid with respect to the domain ontology.
One way to approximate a solution is to use ANNIE (GATE) to extract named entities (NEs) from the document and compare them with the instances of the domain ontology. The percentage of valid content is then the share of extracted entities that match an ontology instance.
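To make the comparison step concrete, here is a minimal sketch in Python. It assumes the named entities have already been extracted (for example by ANNIE) into a plain list of strings, and that the ontology instances are available as a list of labels. The function names `normalize` and `domain_coverage`, the sample data, and the case-insensitive exact-match rule are all illustrative assumptions, not part of GATE or any other library.

```python
def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so 'Android  OS' matches 'android os'."""
    return " ".join(term.lower().split())


def domain_coverage(entities, ontology_instances):
    """Return the percentage of extracted entities that match an ontology instance.

    Both arguments are iterables of strings. An empty entity list yields 0.0.
    Matching is exact after normalization (an assumption; synonyms and
    partial matches are not handled here).
    """
    known = {normalize(label) for label in ontology_instances}
    mentions = [normalize(e) for e in entities]
    if not mentions:
        return 0.0
    matched = sum(1 for m in mentions if m in known)
    return 100.0 * matched / len(mentions)


# Hypothetical data: in practice `entities` would come from the NE extractor
# and `ontology_instances` from the instance labels of the domain ontology.
entities = ["Android OS", "Dalvik", "Linux kernel", "Paris"]
ontology_instances = ["android os", "Dalvik", "Linux Kernel", "ART"]

print(domain_coverage(entities, ontology_instances))  # 75.0
```

Three of the four sample entities match an ontology label, so the score is 75.0. Counting every mention equally is the simplest choice. Weighting by frequency or matching on synonyms would probably give a more faithful score, but that goes beyond this sketch.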
- Can you suggest a better technique that I could use?
- Are there any other open source APIs available? I tried LingPipe, but I can't use it in a commercial product.
- Are there any open source applications of this kind? I searched a lot but could not find one.