
The seqecmpgroup() function returns a table that, among other things, includes frequencies for each of the specified groups. However, when I run it, it generates frequencies below 1 (e.g. 0.00035). Should I interpret these frequencies as percentages showing in how many of the groups each subsequence occurs?

Below I've pasted an example output (the frequencies for each group are listed as "Freq.1", "Freq.2", etc.):

      Subsequence     Support     p.value statistic index      Freq.1
1      (FA)-(IN)-(FA) 0.004807692 0.002293660 12.155213   538 0.000000000
2 (NR)-(TR)-(EX)-(IN) 0.004807692 0.002293660 12.155213   685 0.000000000
3 (NR)-(TR)-(IN)-(IN) 0.004807692 0.002293660 12.155213   687 0.000000000
4      (IS)-(IS)-(NR) 0.019230769 0.006788125  9.985161    98 0.040322581
5      (FA)-(NR)-(QU) 0.012820513 0.009031434  9.414088   172 0.008064516
       Freq.2     Freq.3    Resid.1   Resid.2   Resid.3
1 0.000000000 0.02419355 -1.0919284 -1.100699  3.113347
2 0.000000000 0.02419355 -1.0919284 -1.100699  3.113347
3 0.000000000 0.02419355 -1.0919284 -1.100699  3.113347
4 0.007936508 0.00000000  2.3951978 -1.292885 -1.544220
5 0.003968254 0.04032258 -0.6614769 -1.241085  2.704727

Computed on 624 event sequences
  Constraint Value
  countMethod  COBJ

1 Answer


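The frequencies are actually relative frequencies. They correspond to the relative support within each group, i.e., they give the proportion of sequences in each group that contain the subsequence.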

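For example, we learn from your results that the first subsequence, (FA)-(IN)-(FA), never occurs in the first two groups and is a subsequence of 2.4% of the sequences in group 3.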
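To make the arithmetic concrete, here is a small plain-Python sketch (not TraMineR code). The group sizes and per-group counts are not given in the question; they are an assumption, inferred as one set of integers that reproduces the printed values for 624 unweighted sequences. The helper name `relative_freqs` is hypothetical.

```python
# Assumed (inferred, not stated in the question): group sizes that sum to 624
# and per-group counts of sequences containing each subsequence.
group_sizes = {1: 248, 2: 252, 3: 124}
counts = {
    "(FA)-(IN)-(FA)": {1: 0, 2: 0, 3: 3},
    "(IS)-(IS)-(NR)": {1: 10, 2: 2, 3: 0},
    "(FA)-(NR)-(QU)": {1: 2, 2: 1, 3: 5},
}


def relative_freqs(by_group, sizes):
    """Return (overall support, per-group relative frequency)."""
    total = sum(sizes.values())
    support = sum(by_group.values()) / total
    freqs = {g: by_group[g] / sizes[g] for g in sizes}
    return support, freqs


for subseq, by_group in counts.items():
    support, freqs = relative_freqs(by_group, group_sizes)
    print(subseq, round(support, 9), {g: round(f, 9) for g, f in freqs.items()})
```

Under these assumed counts, 3 of 124 sequences in group 3 gives about 0.0242, i.e. the 2.4% quoted above, and 3 of 624 overall gives the Support value of about 0.0048.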

When sequence weights are provided, the proportions account for them.
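As a rough illustration of what a weighted proportion means, the following plain-Python sketch divides the summed weights of the sequences containing the subsequence by the summed weights of all sequences in the group. This is an assumption about the general idea, not TraMineR's implementation; the function name `weighted_group_freq` and the toy data are made up.

```python
def weighted_group_freq(contains, weights, groups):
    """Weighted share of sequences containing the subsequence, per group."""
    num, den = {}, {}
    for c, w, g in zip(contains, weights, groups):
        den[g] = den.get(g, 0.0) + w                 # total weight of the group
        num[g] = num.get(g, 0.0) + (w if c else 0.0)  # weight of matching sequences
    return {g: num[g] / den[g] for g in den}


# Toy data (hypothetical): five sequences in two groups.
contains = [True, False, False, True, False]
weights = [2.0, 1.0, 1.0, 0.5, 1.5]
groups = ["a", "a", "a", "b", "b"]

print(weighted_group_freq(contains, weights, groups))
```

With all weights equal to 1, this reduces to the plain share of sequences in the group that contain the subsequence.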

Now, I do not see any negative frequency in your example output. And the value 0.00035 you mention is not below 0!

Answered 2015-01-15T10:44:23.850