0

For example, the terms "experience", "yrs" and "ctc" should imply the subject "jobs", while "badge" and "unlocked" should be associated with "foursquare".

How do I get the subject from its terms? I want to analyse less-than-formal English such as emails and tweets. Is there a data repository and API for this? Can I query Freebase for this? I would prefer something that can be self-hosted.

4

2 Answers

1

Freebase includes WordNet, but it has little that will help with this task, at least directly. As Miguel implies with his question, if you had gold standard data you could train a classifier, or a set of classifiers, for your problem. The other option would be to pay for a commercial service to do this.
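A minimal sketch of the classifier route, assuming you have a small gold-standard set of (text, subject) pairs. The training examples and the function names `tokenize`, `train` and `predict` are hypothetical and invented for illustration; the technique is a multinomial Naive Bayes with add-one smoothing, written in plain Python so it can be self-hosted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical gold-standard data: (text, subject) pairs invented for illustration.
TRAIN = [
    ("5 yrs experience expected ctc 12 lpa", "jobs"),
    ("looking for experience in java ctc negotiable", "jobs"),
    ("just unlocked the mayor badge", "foursquare"),
    ("unlocked a new badge at the coffee shop", "foursquare"),
]


def tokenize(text):
    # Naive whitespace tokenizer; informal text usually needs more cleanup.
    return text.lower().split()


def train(examples):
    """Count terms per subject and documents per subject."""
    term_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, subject in examples:
        doc_counts[subject] += 1
        for term in tokenize(text):
            term_counts[subject][term] += 1
            vocab.add(term)
    return term_counts, doc_counts, vocab


def predict(model, text):
    """Return the subject with the highest smoothed log-probability."""
    term_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_subject, best_score = None, -math.inf
    for subject in doc_counts:
        score = math.log(doc_counts[subject] / total_docs)
        total_terms = sum(term_counts[subject].values())
        for term in tokenize(text):
            if term not in vocab:
                continue  # terms never seen in training carry no evidence
            count = term_counts[subject][term]
            score += math.log((count + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_subject, best_score = subject, score
    return best_subject


model = train(TRAIN)
print(predict(model, "experience yrs ctc"))
print(predict(model, "badge unlocked"))
```

With real data you would want far more examples per subject and a held-out set to check how well the terms actually predict the subject.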

answered 2013-06-05T13:51:54.587
0

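@TomMorris's answer is very clear, and I agree that Freebase (or something similar) can only be used indirectly, because a global taxonomy may not map directly onto your problem.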

My suggestion, and what I would do if no subject information is available:

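  1. Apply a clustering technique to your data.
  2. Try to decide (automatically or not) what each cluster means.
  3. Assume that all documents in a cluster belong to that "class".
  4. Use that information to train a classifier.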
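The four steps above can be sketched as follows. This is only an illustration under stated assumptions: the documents are invented, the clustering is a simple greedy "leader" scheme over Jaccard similarity (chosen because it is deterministic and needs no libraries, not because it is the best option), and the final classifier just scores term overlap with each cluster. All function names and the `threshold` value are hypothetical.

```python
from collections import Counter

# Hypothetical unlabeled documents, invented for illustration.
DOCS = [
    "5 yrs experience ctc 12 lpa",
    "unlocked the mayor badge",
    "experience 3 yrs ctc negotiable",
    "new badge unlocked today",
]


def tokens(text):
    return set(text.lower().split())


def jaccard(a, b):
    return len(a & b) / len(a | b)


def leader_cluster(docs, threshold=0.2):
    """Step 1: a document joins the first cluster whose leader is similar enough."""
    clusters = []  # lists of document indices; the first index is the leader
    for i, doc in enumerate(docs):
        for cluster in clusters:
            if jaccard(tokens(doc), tokens(docs[cluster[0]])) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def describe(docs, cluster, top_n=3):
    """Step 2: summarize a cluster by its most frequent terms (ties sorted by term)."""
    counts = Counter(t for i in cluster for t in tokens(docs[i]))
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return ranked[:top_n]


def train(docs, clusters):
    """Steps 3 and 4: treat each cluster id as a class and count its terms."""
    return {
        label: Counter(t for i in cluster for t in tokens(docs[i]))
        for label, cluster in enumerate(clusters)
    }


def classify(model, text):
    """Pick the class whose documents used the text's terms most often."""
    scores = {
        label: sum(counts[t] for t in tokens(text))
        for label, counts in model.items()
    }
    return max(scores, key=scores.get)


clusters = leader_cluster(DOCS)
model = train(DOCS, clusters)
for label, cluster in enumerate(clusters):
    print(label, describe(DOCS, cluster))
print(classify(model, "unlocked another badge"))
```

The term summaries from `describe` are what a person (or a heuristic) would inspect to decide what each cluster means before trusting the pseudo-labels.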

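Main issues: 1. I don't know the size of your data, but it may be a problem for the clustering and/or for manually labeling the clusters. 2. The quality will probably be much lower than with human judgment.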

I hope this at least gives you some hints.

answered 2013-06-06T08:43:40.617