An EntitySet is defined below. I have declared the did column as Index on the transactions table (tx), yet it is registered as Id rather than Index. Why is that?
The objective is to get rid of the warning shown below.
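Under what circumstances is an Index assignment overridden as Id (primary key vs. foreign key?), and is the fact that did is registered as Id related to the warning?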
One uid can have multiple dids in the tx table.
import featuretools as ft

# hse, types, files, uid_txlup and tx are pandas DataFrames loaded elsewhere
es = ft.EntitySet(id="the_entity_set")
# hse
es = es.entity_from_dataframe(entity_id="hse",
                              dataframe=hse,
                              index="uid",
                              variable_types={"Gender": ft.variable_types.Categorical,
                                              "Income": ft.variable_types.Numeric,
                                              "dob": ft.variable_types.Datetime})
# types
es = es.entity_from_dataframe(entity_id="types",
                              dataframe=types,
                              index="type_id",
                              variable_types={"type": ft.variable_types.Categorical})
# files
es = es.entity_from_dataframe(entity_id="files",
                              dataframe=files,
                              index="file_id",
                              variable_types={"file": ft.variable_types.Categorical})
# uid_txlup: lookup table mapping each did to its uid
es = es.entity_from_dataframe(entity_id="uid_txlup",
                              dataframe=uid_txlup,
                              index="did",
                              variable_types={"uid": ft.variable_types.Categorical})
# transactions
es = es.entity_from_dataframe(entity_id="tx",
                              dataframe=tx,
                              index="did",
                              time_index="dt",
                              variable_types={"file_id": ft.variable_types.Categorical,
                                              "type_id": ft.variable_types.Categorical,
                                              "amt": ft.variable_types.Numeric})
# each relationship is (parent variable, child variable)
rels = [
    ft.Relationship(es["files"]["file_id"], es["tx"]["file_id"]),
    ft.Relationship(es["types"]["type_id"], es["tx"]["type_id"]),
    ft.Relationship(es["hse"]["uid"], es["uid_txlup"]["uid"]),
    ft.Relationship(es["uid_txlup"]["did"], es["tx"]["did"])
]
es.add_relationships(rels)
This is what the EntitySet looks like:
Entityset: the_entity_set
  Entities:
    hse [Rows: 100, Columns: 4]
    types [Rows: 8, Columns: 2]
    files [Rows: 2, Columns: 2]
    uid_txlup [Rows: 336, Columns: 2]
    tx [Rows: 336, Columns: 5]
  Relationships:
    tx.file_id -> files.file_id
    tx.type_id -> types.type_id
    uid_txlup.uid -> hse.uid
    tx.did -> uid_txlup.did
es.entities
[Entity: hse
   Variables:
     uid (dtype: index)
     Gender (dtype: categorical)
     Income (dtype: numeric)
     dob (dtype: datetime)
   Shape:
     (Rows: 100, Columns: 4), Entity: types
   Variables:
     type_id (dtype: index)
     type (dtype: categorical)
   Shape:
     (Rows: 8, Columns: 2), Entity: files
   Variables:
     file_id (dtype: index)
     file (dtype: categorical)
   Shape:
     (Rows: 2, Columns: 2), Entity: uid_txlup
   Variables:
     did (dtype: index)
     uid (dtype: categorical)
   Shape:
     (Rows: 336, Columns: 2), Entity: tx
   Variables:
     did (dtype: id) ### <<< external key ???
     dt (dtype: datetime)
     file_id (dtype: categorical)
     type_id (dtype: categorical)
     amt (dtype: numeric)
   Shape:
     (Rows: 336, Columns: 5)]
Why does did show up as Id rather than Index when I call fts?
This is the warning:
feature_matrix, feature_defs = ft.dfs(entityset=es,
                                      target_entity="hse",
                                      agg_primitives=["sum", "mode", "percent_true"],
                                      where_primitives=["count", "avg_time_between"],
                                      max_depth=2)
feature_defs
.../anaconda3/lib/python3.6/site-packages/featuretools-0.2.1-py3.6.egg/featuretools/entityset/entityset.py:432: FutureWarning: 'did' is both an index level and a column label.
Defaulting to column, but this will raise an ambiguity error in a future version
end_entity_id=child_eid)
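
The wording of this FutureWarning matches the message pandas emits when one label is used both as an index level name and as a column label. As a minimal sketch, assuming (not confirmed for featuretools 0.2.1) that the tx DataFrame ends up with did in both places, the check below detects such a clash on a toy frame and removes it by resetting the index. The helper ambiguous_labels and the toy data are hypothetical and only illustrate the pandas side of the problem.

```python
import pandas as pd


def ambiguous_labels(df):
    """Return labels used both as an index level name and as a column label."""
    index_names = [name for name in df.index.names if name is not None]
    return [name for name in index_names if name in df.columns]


# Toy stand-in for the tx table (made-up values, not the real data).
tx = pd.DataFrame({"did": [1, 2, 3], "amt": [10.0, 20.0, 5.0]})

# Keep did as a column AND make it the (named) index: this is the clash.
tx = tx.set_index("did", drop=False)
print(ambiguous_labels(tx))

# Dropping the named index leaves did only as a column.
tx_clean = tx.reset_index(drop=True)
print(ambiguous_labels(tx_clean))
```

If the first print lists did for the real tx frame, resetting the index before calling entity_from_dataframe is one thing to try. Whether this silences the warning depends on how the library sets the index internally.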