survey - How to count votes for qualitative survey answers

Question

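We are building a website for a client who wants it based on a survey of people's "top 10 favourite things". Each user has to answer 10 questions, such as "What is your favourite colour?" and "Who is your favourite celebrity?", and the results are then compiled into a global Top 10 list on the home page.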

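The difficulty is letting users enter anything they like (for example, their favourite holiday destination might be "Grandma's house") while still counting votes accurately. For example, user A might say their favourite celebrity is "The Queen" and user B might say "The Queen of England"; we need to count these as two votes for the same "thing".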

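If we force users to pick each answer from a huge but predetermined list, that limits their ability to name anything at all as their "favourite thing". However, if we use a plain text input field and try to interpret the answers after submission, counting votes becomes much harder whenever the same answer is named or spelled differently.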

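Could some kind of search-phrase suggestion engine adjust their answers automatically in real time? And if the input method is a plain text field, how can we make sure spelling variations are tolerated?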
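One way to tolerate spelling variations in a plain text field is to fuzzy-match each new entry against answers already collected and offer the closest ones as suggestions. The sketch below uses Python's standard-library `difflib.get_close_matches`; the `known_answers` list, the `normalize`/`suggest` helpers and the 0.75 cutoff are illustrative assumptions, not a tested recommendation.

```python
import difflib
import re


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def suggest(user_input: str, known_answers: list[str], n: int = 3,
            cutoff: float = 0.75) -> list[str]:
    """Return up to n known answers whose normalized form is close to the input.

    The cutoff (0..1 similarity ratio) is an assumed value to tune on real data.
    """
    # Map normalized form -> original text so suggestions show the stored spelling.
    by_norm = {normalize(a): a for a in known_answers}
    matches = difflib.get_close_matches(normalize(user_input), list(by_norm),
                                        n=n, cutoff=cutoff)
    return [by_norm[m] for m in matches]
```

With `known = ["Grandma's house", "Paris", "Barcelona", "The Queen"]`, `suggest("grandmas hous", known)` returns `["Grandma's house"]` and `suggest("Parris", known)` returns `["Paris"]`. Character-level matching only catches typos, though: "The Queen of England" scores about 0.62 against "The Queen", below the cutoff, so it is not merged.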

If anyone has ideas for a possible solution for this feature, whether a piece of software, a plugin, an API or anything else, please let us know.

Thanks, and please ask if anything needs clarifying.

score 0 · Accepted Answer

As Eric J said, this is getting into cutting-edge NLP applications. These fields of study are very important for AI/automation researchers and for computer science in general, but they are still very young. There are a number of programs and algorithms you can use, and their drawbacks and benefits vary widely. RapidMiner is good; WordNet is widely used in medical applications and should be relatively easy to adapt to your own corpus; and there are more advanced methods such as latent Dirichlet allocation. Here are a few resources to start with (in addition to the Wikipedia article linked in Eric J's answer):

http://www.semanticsearchart.com/index.html

http://www.mitpressjournals.org/loi/coli

http://marimba.d.umn.edu/ (try the SenseClusters calculator)

http://wordnet.princeton.edu/

score 0 · Accepted Answer

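The best way to classify short answers is k-means clustering. You need to apply stemming. Then you need a basic dictionary to convert words into indices; you can use EverGroingDictionary.cs from semanticsearchart.com. Once a phrase has been passed through the dictionary, it is converted into a sequence of numbers, i.e. a vector. Define proximity as the number of words two answers have in common and apply k-means, which is a lightning-fast algorithm. k-means will group all the answers, and the most frequently used word in each group becomes that group's signature. Your whole program, whether in C++, C# or Java, should come in under 1,000 lines.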
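The pipeline described above (stem, map words to indices, vectorize, cluster with k-means, label each cluster by its most frequent word) can be sketched in plain Python. Several pieces are assumptions for illustration: the suffix-stripping `stem` is a crude stand-in for a real stemmer such as Porter's, `STOPWORDS` is a tiny hypothetical list, `GrowingDictionary` is a hypothetical class in the spirit of the dictionary mentioned above (not a port of it), and centroids are seeded by deterministic farthest-point selection rather than random initialization. Distance is squared Euclidean on 0/1 vectors, which equals |a| + |b| - 2 × (shared words), so answers with more words in common end up closer.

```python
import re
from collections import Counter

# Tiny, hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "of", "a", "an", "my"}


def stem(word: str) -> str:
    """Crude suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(answer: str) -> list[str]:
    """Lowercase, split into letter runs, drop stop words and 1-letter tokens, stem."""
    words = re.findall(r"[a-z]+", answer.lower())
    return [stem(w) for w in words if len(w) > 1 and w not in STOPWORDS]


class GrowingDictionary:
    """Hypothetical word -> index map that grows as new words are seen."""

    def __init__(self) -> None:
        self.index: dict[str, int] = {}

    def add(self, word: str) -> int:
        return self.index.setdefault(word, len(self.index))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    # Farthest-point seeding: start from the first vector, then repeatedly add
    # the vector farthest from its nearest existing centroid.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels


def cluster_answers(answers, k):
    """Return a list of (signature word, answers in that cluster)."""
    vocab = GrowingDictionary()
    token_lists = [tokenize(a) for a in answers]
    for tokens in token_lists:
        for t in tokens:
            vocab.add(t)
    # Binary bag-of-words: 1 where the answer contains that stemmed word.
    vectors = []
    for tokens in token_lists:
        v = [0] * len(vocab.index)
        for t in tokens:
            v[vocab.index[t]] = 1
        vectors.append(v)
    labels = kmeans(vectors, k)
    clusters = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        words = Counter(t for i in members for t in token_lists[i])
        signature = words.most_common(1)[0][0] if words else ""
        clusters.append((signature, [answers[i] for i in members]))
    return clusters
```

For the answers "The Queen", "The Queen of England", "Queen Elizabeth", "Grandma's house", "My grandma's house" and "Visiting grandma" with `k=2`, this yields a "queen" cluster and a "grandma" cluster. Note that word overlap alone would also drop "Queen Latifah" into the "queen" cluster, and `k` has to be chosen up front.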

score 0 · Accepted Answer

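If you want "The Queen" and "The Queen of England" to be counted as the same answer automatically, your job may be more complicated than a "fun little survey". If the volume is light enough, consider just tallying the results by hand. Just to give you a sense of the problem: what if someone enters "Queen of Sweden" or "Queen Latifah concert"?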
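If you do tally by hand, a little cheap normalization first can shrink the list a person has to review. The sketch below (hypothetical helpers; the "drop a leading 'the'" rule is an assumption) merges only trivial variants such as case, punctuation and a leading article, and deliberately leaves anything genuinely different, like "Queen of England" versus "Queen Latifah concert", as separate lines for a human to merge or split.

```python
import re
from collections import Counter


def normalize(answer: str) -> str:
    """Lowercase, strip punctuation, collapse spaces, drop a leading 'the'."""
    words = re.sub(r"[^\w\s]", "", answer.lower()).split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def tally(answers):
    """Count answers by normalized form, most frequent first."""
    return Counter(normalize(a) for a in answers).most_common()
```

For example, `tally(["The Queen", "the queen!", "Queen", "The Queen of England", "Queen Latifah concert"])` gives `[("queen", 3), ("queen of england", 1), ("queen latifah concert", 1)]`, leaving three distinct forms for manual review instead of five raw strings.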

If you really want to go down this road, look into natural language processing (NLP), specifically the area of classification.

For a general introduction to NLP, I recommend the relevant Wikipedia article:

http://en.wikipedia.org/wiki/Natural_language_processing

RapidMiner is an open-source NLP solution worth looking into.
