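I have a survey application, and I need to cluster the responses to detect signs of coherence or decoherence.
I am using AI4R, and my code looks like this (the sample code comes from AI4R):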
require 'ai4r'
include Ai4r::Data
include Ai4r::Clusterers

# 5 questions on a post-training survey
questions = [
  "The material covered was appropriate for someone with my level of knowledge of the subject.",
  "The material was presented in a clear and logical fashion",
  "There was sufficient time in the session to cover the material that was presented",
  "The instructor was respectful of students",
  "The instructor provided good examples"
]

# Answers to each question go from 1 (bad) to 5 (excellent).
# The answers array has one element per completed survey.
# Each completed survey is in turn an array with the answer to each question.
answers = [
  [1, 2, 3, 2, 2], # Answers of person 1
  [5, 5, 3, 2, 2], # Answers of person 2
]

data_set = DataSet.new(:data_items => answers, :data_labels => questions)

# Let's group answers in 4 groups
clusterer = Diana.new.build(data_set, 4)
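This in turn lets me create a chart like this one (the survey has questions tied to themes/axes).
The problem is that, right now, you have to choose the number of clusters to pass to AI4R yourself. How can I use Ruby to detect the number of clusters? (This question really boils down to statistics...)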
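Enter the elbow method...
I saw on Wikipedia that there is a technique called the elbow method (illustration from Wikipedia),
which compares the number of clusters against the variance they explain. This technique fits my needs well, but I have no idea how to implement it in Ruby. (I did ANOVA as an undergraduate, so I understand what the terms mean, but that is where my knowledge stops. I may also need to cross-post this on a statistics forum.)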
Is there a Ruby library for this that I have not stumbled upon yet, or how else could I solve this within the Ruby ecosystem?
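To make the elbow method described above concrete, here is a minimal plain-Ruby sketch of the quantity it plots: the fraction of variance explained for each candidate number of clusters. It is not part of AI4R. The helper names (`sum_of_squares`, `explained_variance`, `elbow`), the `min_gain` threshold, and the toy data are all hypothetical and only for illustration. Each clustering is assumed to be an array of clusters, where each cluster is an array of numeric answer vectors.

```ruby
# Sum of squared Euclidean distances from each point to the mean of `points`.
def sum_of_squares(points)
  return 0.0 if points.empty?
  dims = points.first.size
  mean = Array.new(dims) { |d| points.sum { |pt| pt[d].to_f } / points.size }
  points.sum { |pt| pt.zip(mean).sum { |x, m| (x - m)**2 } }
end

# Fraction of the total variance explained by a clustering:
# 1 - (within-cluster sum of squares / total sum of squares).
def explained_variance(clusters)
  total = sum_of_squares(clusters.flatten(1))
  return 1.0 if total.zero?
  within = clusters.sum { |cluster| sum_of_squares(cluster) }
  1.0 - within / total
end

# Hypothetical elbow rule (an assumption, one of several possible heuristics):
# return the smallest k after which one more cluster adds less than
# `min_gain` of explained variance. `ratios` maps k => explained variance.
def elbow(ratios, min_gain = 0.1)
  ks = ratios.keys.sort
  ks.each_cons(2) do |k, next_k|
    return k if ratios[next_k] - ratios[k] < min_gain
  end
  ks.last
end

# Made-up toy clusterings of six 2-D points, one clustering per k.
clusterings = {
  1 => [[[1, 1], [1, 2], [5, 5], [5, 6], [9, 1], [9, 2]]],
  2 => [[[1, 1], [1, 2], [5, 5], [5, 6]], [[9, 1], [9, 2]]],
  3 => [[[1, 1], [1, 2]], [[5, 5], [5, 6]], [[9, 1], [9, 2]]],
  4 => [[[1, 1]], [[1, 2]], [[5, 5], [5, 6]], [[9, 1], [9, 2]]]
}

ratios = clusterings.transform_values { |clusters| explained_variance(clusters) }
ratios.each { |k, ratio| puts format("k=%d explained=%.3f", k, ratio) }
puts "elbow at k=#{elbow(ratios)}"
```

With AI4R, each clustering would presumably come from running the clusterer once per candidate k and reading its clusters back as arrays of data items (for example via something like `clusterer.clusters.map(&:data_items)`; treat that call as an assumption to check against the AI4R documentation). The fixed `min_gain` threshold is a crude stand-in for eyeballing the bend in the curve, so plotting the ratios is still worthwhile.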