I am working through the featuretools documentation to learn about entity sets, and I currently get the error KeyError: 'Variable: device not found in entity' for the following code:

import featuretools as ft
data = ft.demo.load_mock_customer()
customers_df = data["customers"]
customers_df
sessions_df = data["sessions"]
sessions_df.sample(5)
transactions_df = data["transactions"]
transactions_df.sample(10)
products_df = data["products"]
products_df
### Creating an entity set 
es = ft.EntitySet(id="transactions")
### Adding entities
es = es.entity_from_dataframe(entity_id="transactions", dataframe=transactions_df, index="transaction_id", time_index="transaction_time", variable_types={"product_id": ft.variable_types.Categorical})
es
es["transactions"].variables
es = es.entity_from_dataframe(entity_id="products", dataframe=products_df, index="product_id")
es
### Adding new relationship

new_relationship = ft.Relationship(es["products"]["product_id"],
                                   es["transactions"]["product_id"]) 
es = es.add_relationship(new_relationship)
es

### Creating entity from existing table
es = es.normalize_entity(base_entity_id="transactions",
        new_entity_id="sessions",
        index = "session_id",
        additional_variables=["device","customer_id","zip_code"])

This follows the documentation at https://docs.featuretools.com/loading_data/using_entitysets.html

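From the API documentation for es.normalize_entity, it looks like the function should create a new entity "sessions" with index "session_id" and the other 3 variables, but the error is: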

C:\Users\s_belvi\AppData\Local\Continuum\Anaconda2\lib\site-packages\featuretools\entityset\entity.pyc in _get_variable(self, variable_id)
    250                 return v
    251 
--> 252         raise KeyError("Variable: %s not found in entity" % (variable_id))
    253 
    254     @property

KeyError: 'Variable: device not found in entity'

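Do we need to create the "sessions" entity separately before calling es.normalize_entity? It looks like something in the flow is syntactically wrong, probably a small mistake.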

1 Answer

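The error occurs because device is not a column in your transactions_df. The "transactions" table referenced on that documentation page has more columns than the one demo.load_mock_customer returns in its dictionary form. You can get the remaining columns by using the return_single_table argument. Here is a complete working example of normalize_entity, only slightly modified from the code you tried: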

import featuretools as ft
data = ft.demo.load_mock_customer(return_single_table=True)

es = ft.EntitySet(id="Mock Customer")
es = es.entity_from_dataframe(entity_id="transactions", 
                              dataframe=data, 
                              index="transaction_id", 
                              time_index="transaction_time", 
                              variable_types={"product_id": ft.variable_types.Categorical})

es = es.normalize_entity(base_entity_id="transactions",
        new_entity_id="sessions",
        index = "session_id",
        additional_variables=["device","customer_id","zip_code"])

This returns an EntitySet with two entities and one relationship:

Entityset: Mock Customer
  Entities:
    transactions [Rows: 500, Columns: 8]
    sessions [Rows: 35, Columns: 5]
  Relationships:
    transactions.session_id -> sessions.session_id
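
One way to catch this kind of mismatch before calling normalize_entity is to compare the requested names against the columns of the base DataFrame. Below is a minimal sketch, assuming a hypothetical helper named missing_columns (it is not part of featuretools) and an illustrative column list rather than the actual demo data:

```python
def missing_columns(columns, wanted):
    """Return the names in `wanted` that are absent from `columns`, in order.

    Hypothetical helper for illustration; not part of featuretools.
    """
    available = set(columns)
    return [name for name in wanted if name not in available]


# Illustrative column names only; with a real DataFrame, pass df.columns.
example_columns = ["transaction_id", "session_id", "transaction_time",
                   "product_id", "amount"]
wanted = ["device", "customer_id", "zip_code"]

# Any name printed here would not be found on the base entity.
print(missing_columns(example_columns, wanted))
```

If the returned list is not empty, those names cannot be passed as additional_variables for that base entity, which is the situation the KeyError reports.
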
answered 2018-07-31T19:50:52.533