
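I am trying to find association rules in my market basket analysis by applying FP-Growth. I want the association rules per date, which means finding the item associations for each day over a full year. I can build the process for a few days and get their associations, but doing it that way would mean building it separately for each of the 365 days. The dataset is as follows.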

I used the Market Basket Analysis template that ships with RapidMiner:

How can I achieve this in a few steps instead of running the process separately for every day of the year?

Thanks

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process" origin="GENERATED_TEMPLATE">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.0.002" expanded="true" height="68" name="Retrieve Clustered Data with Items" width="90" x="45" y="187">
        <parameter key="repository_entry" value="Clustered Data with Items"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="9.0.002" expanded="true" height="103" name="Filter Examples" width="90" x="313" y="187">
        <list key="filters_list">
          <parameter key="filters_entry_key" value="ReceiptDate.eq.01/05/2017"/>
        </list>
      </operator>
      <operator activated="true" class="aggregate" compatibility="6.0.006" expanded="true" height="82" name="Aggregate" origin="GENERATED_TEMPLATE" width="90" x="112" y="336">
        <list key="aggregation_attributes">
          <parameter key="Orders" value="sum"/>
        </list>
        <parameter key="group_by_attributes" value="Invoice|product 1"/>
      </operator>
      <operator activated="true" class="pivot" compatibility="9.0.002" expanded="true" height="82" name="Pivot" origin="GENERATED_TEMPLATE" width="90" x="246" y="336">
        <parameter key="group_attribute" value="Invoice"/>
        <parameter key="index_attribute" value="product 1"/>
      </operator>
      <operator activated="true" class="rename_by_replacing" compatibility="9.0.002" expanded="true" height="82" name="Rename by Replacing" origin="GENERATED_TEMPLATE" width="90" x="380" y="336">
        <parameter key="attribute" value="Invoice"/>
        <parameter key="replace_what" value="sum\(Orders\)_"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="9.0.002" expanded="true" height="103" name="Replace Missing Values" origin="GENERATED_TEMPLATE" width="90" x="112" y="442">
        <parameter key="default" value="zero"/>
        <list key="columns"/>
      </operator>
      <operator activated="true" class="numerical_to_binominal" compatibility="6.0.003" expanded="true" height="82" name="Numerical to Binominal" origin="GENERATED_TEMPLATE" width="90" x="246" y="442"/>
      <operator activated="true" class="set_role" compatibility="9.0.002" expanded="true" height="82" name="Set Role" origin="GENERATED_TEMPLATE" width="90" x="380" y="442">
        <parameter key="attribute_name" value="Invoice"/>
        <parameter key="target_role" value="id"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="concurrency:fp_growth" compatibility="9.0.002" expanded="true" height="82" name="FP-Growth" origin="GENERATED_TEMPLATE" width="90" x="648" y="289">
        <parameter key="positive_value" value="true"/>
        <parameter key="min_support" value="0.005"/>
        <parameter key="find_min_number_of_itemsets" value="false"/>
        <enumeration key="must_contain_list"/>
      </operator>
      <operator activated="true" class="create_association_rules" compatibility="9.0.002" expanded="true" height="82" name="Create Association Rules" origin="GENERATED_TEMPLATE" width="90" x="648" y="442">
        <parameter key="min_confidence" value="0.1"/>
      </operator>
      <connect from_op="Retrieve Clustered Data with Items" from_port="output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_op="Pivot" to_port="example set input"/>
      <connect from_op="Pivot" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
      <connect from_op="Rename by Replacing" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
      <connect from_op="Numerical to Binominal" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
      <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
      <connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
      <connect from_op="Create Association Rules" from_port="item sets" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="147"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="42"/>
      <description align="left" color="yellow" colored="false" height="70" resized="false" width="850" x="20" y="25">MARKET BASKET ANALYSIS&lt;br&gt;Model associations between products by determining sets of items frequently purchased together and building association rules to derive recommendations.</description>
      <description align="left" color="blue" colored="true" height="185" resized="true" width="550" x="20" y="105">Step 1:&lt;br/&gt;Load transaction data containing a transaction id, a product id and a quantifier. The data denotes how many times a certain product has been purchased as part of a transaction.</description>
      <description align="left" color="purple" colored="true" height="341" resized="true" width="549" x="20" y="300">&lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; Step 2:&lt;br&gt;Extract, transform &amp;amp; load (ETL) - Aggregate transaction data to account for multiple occurrences of the same product in a transaction. Pivot the data so that each transaction is represented by a row. Transform purchase amounts to binary &amp;quot;product purchased yes/no&amp;quot; indicators.&lt;br&gt;</description>
      <description align="left" color="green" colored="true" height="310" resized="true" width="290" x="580" y="105">Step 3:&lt;br/&gt;Using FP-Growth, determine frequent item sets. A frequent item set denotes that the items (products) in the set have been purchased together frequently, i.e. in a certain ratio of transactions. This ratio is given by the support of the item set.</description>
      <description align="left" color="green" colored="true" height="215" resized="true" width="286" x="579" y="425">&lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; Step 4:&lt;br/&gt;Create association rules which can be used for product recommendations depending on the confidences of the rules.&lt;br&gt;</description>
      <description align="left" color="yellow" colored="false" height="35" resized="true" width="849" x="20" y="655">Outputs: association rules, frequent item set&lt;br&gt;</description>
    </process>
  </operator>
</process>

Sample data


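After retrieving your data as an example set, you can use the Loop Values operator to loop over the values of the ReceiptDate attribute. The current value (in your case, the date) is stored in the loop_value macro.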

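Then put the entire process that builds the association rules inside the Loop Values subprocess, and change your Filter Examples operator to the condition class expression, with ReceiptDate==%{loop_value} as the parameter expression.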
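A sketch of the changed Filter Examples operator, in the same XML format as the process in the question. The parameter keys condition_class and parameter_expression are my assumption of how RapidMiner 9.0 serializes these settings, so compare them with the XML the GUI writes once you set the condition class. The sketch also assumes ReceiptDate is a nominal attribute; in that case the expression probably needs the substituted date in double quotes (written as &quot; inside the XML attribute), since an unquoted value such as 01/05/2017 would likely be parsed as a division.

```xml
<operator activated="true" class="filter_examples" compatibility="9.0.002" expanded="true" height="103" name="Filter Examples" width="90" x="313" y="187">
  <!-- Assumed keys: keep only the rows whose ReceiptDate equals the date of the current iteration -->
  <parameter key="parameter_expression" value="ReceiptDate==&quot;%{loop_value}&quot;"/>
  <parameter key="condition_class" value="expression"/>
  <!-- The fixed custom filter (ReceiptDate.eq.01/05/2017) from the question is no longer used -->
  <list key="filters_list"/>
</operator>
```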

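This filters your whole dataset so that only the examples for the current date are kept, and then builds your model on that subset. As a result, you get one set of association rules per date at the output of Loop Values.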
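The overall layout might look roughly like this. It is a hedged sketch. The class name concurrency:loop_values, the keys attribute and iteration_macro, and the port names input 1 and output 1 are assumptions modeled on how RapidMiner 9 serializes other concurrency operators (such as the FP-Growth operator in the question). Building the loop in the GUI and copying its XML is the safest way to get the exact form. If ReceiptDate was imported as a date rather than a nominal attribute, a conversion such as Date to Nominal before the loop may be needed.

```xml
<operator activated="true" class="concurrency:loop_values" compatibility="9.0.002" expanded="true" height="82" name="Loop Values" width="90" x="179" y="187">
  <!-- Assumed keys: iterate over the distinct values of ReceiptDate, exposing each one as %{loop_value} -->
  <parameter key="attribute" value="ReceiptDate"/>
  <parameter key="iteration_macro" value="loop_value"/>
  <process expanded="true">
    <!-- Place Filter Examples, Aggregate, Pivot, Rename by Replacing, Replace Missing Values,
         Numerical to Binominal, Set Role, FP-Growth and Create Association Rules here,
         connected to each other exactly as in the question -->
    <!-- The full example set enters the loop and is filtered down to one day -->
    <connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/>
    <!-- The rules for that day leave the subprocess; the loop collects one entry per date -->
    <connect from_op="Create Association Rules" from_port="rules" to_port="output 1"/>
  </process>
</operator>
```

In the outer process, Retrieve Clustered Data with Items would then connect to the loop's input 1 port, and the loop's output 1 port (a collection of rule sets, one per date) to result 1.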

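If you often build different models based on one (or more) parameters, it may be worth looking at the Jackhammer extension from Old World Computing: its Indexed Model operator does exactly this for you (it builds a specific model for each parameter value). It makes working with these specific models easy, because you get a single model that you can apply to your data, and the model matching the parameter value is selected and applied automatically.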

Answered 2019-01-22T15:16:05.637