
I am trying to predict the outcome of soccer matches from earlier results. I am running Python 3.6 on Windows and using Featuretools 0.4.1.

Say I have the following dataframe representing the history of results.

Original dataframe (image)

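Using the dataframe above, I would like to create the following dataframe, which will be fed as X to a machine learning algorithm. Note that the average goals for the home and away teams need to be computed per team, regardless of the venue of the past matches. Is there a way to create such a dataframe with Featuretools?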

Resulting dataframe (image)

An Excel file that simulates the transformation can be found here.


1 Answer


This is a tricky feature, but a great use of custom primitives in Featuretools.

The first step is to load the CSV of matches into a Featuretools entity set:

import featuretools as ft
import pandas as pd

es = ft.EntitySet()
matches_df = pd.read_csv("./matches.csv")
es.entity_from_dataframe(entity_id="matches",
                         index="match_id",
                         time_index="match_date",
                         dataframe=matches_df)
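
The `matches.csv` file itself is not shown in the post. As a rough sketch, the snippet below rebuilds an equivalent CSV from the rows visible in the final feature matrix of this answer. The dates (1 to 12 January 2014) follow from the DAY/MONTH/YEAR columns there; the rule for `label` (`1` home win, `X` draw, `2` away win) is inferred from the scores and is an assumption, as are the column order and the names `ROWS` and `build_matches_csv`.

```python
import csv
import io
from datetime import date, timedelta

# (home_team, away_team, home_goals, away_goals), copied from the feature matrix in this answer
ROWS = [
    ("Arsenal", "Chelsea", 2, 0), ("Arsenal", "Chelsea", 1, 0), ("Arsenal", "Chelsea", 0, 3),
    ("Chelsea", "Arsenal", 1, 1), ("Chelsea", "Arsenal", 2, 0), ("Chelsea", "Arsenal", 2, 1),
    ("Arsenal", "Chelsea", 2, 2), ("Arsenal", "Chelsea", 0, 1), ("Arsenal", "Chelsea", 1, 3),
    ("Chelsea", "Arsenal", 3, 1), ("Chelsea", "Arsenal", 2, 2), ("Chelsea", "Arsenal", 4, 1),
]


def build_matches_csv(rows, first_day=date(2014, 1, 1)):
    """Return CSV text with one match per day, starting at first_day (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["match_id", "match_date", "home_team", "away_team",
                     "home_goals", "away_goals", "label"])
    for offset, (home, away, home_goals, away_goals) in enumerate(rows):
        # Assumed labelling rule: "1" = home win, "X" = draw, "2" = away win
        if home_goals > away_goals:
            label = "1"
        elif home_goals == away_goals:
            label = "X"
        else:
            label = "2"
        match_date = first_day + timedelta(days=offset)
        writer.writerow([offset + 1, match_date.isoformat(), home, away,
                         home_goals, away_goals, label])
    return buffer.getvalue()


csv_text = build_matches_csv(ROWS)
print(csv_text)
```

Saving `csv_text` as `matches.csv` should give a file that the `pd.read_csv` call above can load. Passing `parse_dates=["match_date"]` to `read_csv` is one way to make sure the time index is read as dates rather than strings.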

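Then we define a custom transform primitive that calculates the average number of goals scored over the last n games. It has parameters that control the number of past games and whether to calculate for the home team or the away team. Information on defining custom primitives is in our documentation here.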

from featuretools.variable_types import Numeric, Categorical
from featuretools.primitives import make_trans_primitive

def avg_goals_previous_n_games(home_team, away_team, home_goals, away_goals, which_team=None, n=1):
    # make dataframe so it's easier to work with
    df = pd.DataFrame({
        "home_team": home_team,
        "away_team": away_team,
        "home_goals": home_goals,
        "away_goals": away_goals
        }).reset_index(drop=True)  # 0..n-1 index, so df.iloc[:i] below is exactly the earlier games

    result = []
    for i, current_game in df.iterrows():
        # get the right team for this game
        team = current_game[which_team]

        # find all previous games that have been played
        prev_games =  df.iloc[:i]

        # only get games the team participated in
        participated = prev_games[(prev_games["home_team"] == team) | (prev_games["away_team"] == team)]
        if participated.shape[0] < n:
            result.append(None)
            continue

        # get last n games
        last_n = participated.tail(n)

        # goals the team scored in each of those games, as home side or as away side
        goal_as_home = (last_n["home_team"] == team) * last_n["home_goals"]
        goal_as_away = (last_n["away_team"] == team) * last_n["away_goals"]

        # calculate mean across all home and away games
        mean = (goal_as_home + goal_as_away).mean()

        result.append(mean)

    return result

# custom function so the name of the feature prints out correctly
def make_name(self):
    return "%s_goal_last_%d" % (self.kwargs['which_team'], self.kwargs['n'])


AvgGoalPreviousNGames = make_trans_primitive(function=avg_goals_previous_n_games,
                                          input_types=[Categorical, Categorical, Numeric, Numeric],
                                          return_type=Numeric,
                                          cls_attributes={"generate_name": make_name, "uses_full_entity":True})

Now we can use this primitive to define features. In this case, we will have to do it manually.

input_vars = [es["matches"]["home_team"], es["matches"]["away_team"], es["matches"]["home_goals"], es["matches"]["away_goals"]]
home_team_last1 = AvgGoalPreviousNGames(*input_vars, which_team="home_team", n=1)
home_team_last3 = AvgGoalPreviousNGames(*input_vars, which_team="home_team", n=3)
home_team_last5 = AvgGoalPreviousNGames(*input_vars, which_team="home_team", n=5)
away_team_last1 = AvgGoalPreviousNGames(*input_vars, which_team="away_team", n=1)
away_team_last3 = AvgGoalPreviousNGames(*input_vars, which_team="away_team", n=3)
away_team_last5 = AvgGoalPreviousNGames(*input_vars, which_team="away_team", n=5)

features = [home_team_last1, home_team_last3, home_team_last5,
            away_team_last1, away_team_last3, away_team_last5]

Finally, we can calculate the feature matrix:

fm = ft.calculate_feature_matrix(entityset=es, features=features)

This returns

          home_team_goal_last_1  home_team_goal_last_3  home_team_goal_last_5  away_team_goal_last_1  away_team_goal_last_3  away_team_goal_last_5
match_id                                                                                                                                          
1                           NaN                    NaN                    NaN                    NaN                    NaN                    NaN
2                           2.0                    NaN                    NaN                    0.0                    NaN                    NaN
3                           1.0                    NaN                    NaN                    0.0                    NaN                    NaN
4                           3.0               1.000000                    NaN                    0.0               1.000000                    NaN
5                           1.0               1.333333                    NaN                    1.0               0.666667                    NaN
6                           2.0               2.000000                    1.2                    0.0               0.333333                    0.8
7                           1.0               0.666667                    0.6                    2.0               1.666667                    1.6
8                           2.0               1.000000                    0.8                    2.0               2.000000                    2.0
9                           0.0               1.000000                    0.8                    1.0               1.666667                    1.6
10                          3.0               2.000000                    2.0                    1.0               1.000000                    0.8
11                          3.0               2.333333                    2.2                    1.0               0.666667                    1.0
12                          2.0               2.666667                    2.2                    2.0               1.333333                    1.2
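
As a cross-check, the same numbers can be reproduced without Featuretools. The sketch below is a plain-Python version of the rolling average, using the twelve results that appear in the feature matrix in this answer. `MATCHES` and `avg_goals_last_n` are names made up for the illustration, and unlike the primitive it keeps a running goal history per team instead of rescanning all earlier rows for every game.

```python
from collections import defaultdict

# (home_team, away_team, home_goals, away_goals) in chronological order
MATCHES = [
    ("Arsenal", "Chelsea", 2, 0), ("Arsenal", "Chelsea", 1, 0), ("Arsenal", "Chelsea", 0, 3),
    ("Chelsea", "Arsenal", 1, 1), ("Chelsea", "Arsenal", 2, 0), ("Chelsea", "Arsenal", 2, 1),
    ("Arsenal", "Chelsea", 2, 2), ("Arsenal", "Chelsea", 0, 1), ("Arsenal", "Chelsea", 1, 3),
    ("Chelsea", "Arsenal", 3, 1), ("Chelsea", "Arsenal", 2, 2), ("Chelsea", "Arsenal", 4, 1),
]


def avg_goals_last_n(matches, which_team, n):
    """Mean goals of the home or away side over its previous n games, venue ignored.

    Returns None for a game when the team has played fewer than n earlier games.
    """
    goals_so_far = defaultdict(list)  # team -> goals scored in each earlier game
    result = []
    for home, away, home_goals, away_goals in matches:
        team = home if which_team == "home_team" else away
        history = goals_so_far[team]
        if len(history) < n:
            result.append(None)
        else:
            result.append(sum(history[-n:]) / n)
        # record this game only after computing the feature, so it never sees itself
        goals_so_far[home].append(home_goals)
        goals_so_far[away].append(away_goals)
    return result


print(avg_goals_last_n(MATCHES, "home_team", 3))
print(avg_goals_last_n(MATCHES, "away_team", 5))
```

`None` plays the role of `NaN` here. The two printed lists should line up with the `home_team_goal_last_3` and `away_team_goal_last_5` columns above.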

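Lastly, we can also use these manually defined features as input to automated feature engineering using Deep Feature Synthesis, which is explained here. By passing in our manually defined features as seed_features, ft.dfs will automatically stack on top of them.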

fm, feature_defs = ft.dfs(entityset=es, 
                          target_entity="matches",
                          seed_features=features, 
                          agg_primitives=[], 
                          trans_primitives=["day", "month", "year", "weekday", "percentile"])

feature_defs

[<Feature: home_team>,
 <Feature: away_team>,
 <Feature: home_goals>,
 <Feature: away_goals>,
 <Feature: label>,
 <Feature: home_team_goal_last_1>,
 <Feature: home_team_goal_last_3>,
 <Feature: home_team_goal_last_5>,
 <Feature: away_team_goal_last_1>,
 <Feature: away_team_goal_last_3>,
 <Feature: away_team_goal_last_5>,
 <Feature: DAY(match_date)>,
 <Feature: MONTH(match_date)>,
 <Feature: YEAR(match_date)>,
 <Feature: WEEKDAY(match_date)>,
 <Feature: PERCENTILE(home_goals)>,
 <Feature: PERCENTILE(away_goals)>,
 <Feature: PERCENTILE(home_team_goal_last_1)>,
 <Feature: PERCENTILE(home_team_goal_last_3)>,
 <Feature: PERCENTILE(home_team_goal_last_5)>,
 <Feature: PERCENTILE(away_team_goal_last_1)>,
 <Feature: PERCENTILE(away_team_goal_last_3)>,
 <Feature: PERCENTILE(away_team_goal_last_5)>]

and the feature matrix is

         home_team away_team  home_goals  away_goals label  home_team_goal_last_1  home_team_goal_last_3  home_team_goal_last_5  away_team_goal_last_1  away_team_goal_last_3  away_team_goal_last_5  DAY(match_date)  MONTH(match_date)  YEAR(match_date)  WEEKDAY(match_date)  PERCENTILE(home_goals)  PERCENTILE(away_goals)  PERCENTILE(home_team_goal_last_1)  PERCENTILE(home_team_goal_last_3)  PERCENTILE(home_team_goal_last_5)  PERCENTILE(away_team_goal_last_1)  PERCENTILE(away_team_goal_last_3)  PERCENTILE(away_team_goal_last_5)
match_id                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
1          Arsenal   Chelsea           2           0     1                    NaN                    NaN                    NaN                    NaN                    NaN                    NaN                1                  1              2014                    2                0.666667                0.166667                                NaN                                NaN                                NaN                                NaN                                NaN                                NaN
2          Arsenal   Chelsea           1           0     1                    2.0                    NaN                    NaN                    0.0                    NaN                    NaN                2                  1              2014                    3                0.333333                0.166667                           0.590909                                NaN                                NaN                           0.227273                                NaN                                NaN
3          Arsenal   Chelsea           0           3     2                    1.0                    NaN                    NaN                    0.0                    NaN                    NaN                3                  1              2014                    4                0.125000                0.958333                           0.272727                                NaN                                NaN                           0.227273                                NaN                                NaN
4          Chelsea   Arsenal           1           1     X                    3.0               1.000000                    NaN                    0.0               1.000000                    NaN                4                  1              2014                    5                0.333333                0.500000                           0.909091                           0.333333                                NaN                           0.227273                           0.500000                                NaN
5          Chelsea   Arsenal           2           0     1                    1.0               1.333333                    NaN                    1.0               0.666667                    NaN                5                  1              2014                    6                0.666667                0.166667                           0.272727                           0.555556                                NaN                           0.590909                           0.277778                                NaN
6          Chelsea   Arsenal           2           1     1                    2.0               2.000000                    1.2                    0.0               0.333333                    0.8                6                  1              2014                    0                0.666667                0.500000                           0.590909                           0.722222                           0.571429                           0.227273                           0.111111                           0.214286
7          Arsenal   Chelsea           2           2     X                    1.0               0.666667                    0.6                    2.0               1.666667                    1.6                7                  1              2014                    1                0.666667                0.791667                           0.272727                           0.111111                           0.142857                           0.909091                           0.833333                           0.785714
8          Arsenal   Chelsea           0           1     2                    2.0               1.000000                    0.8                    2.0               2.000000                    2.0                8                  1              2014                    2                0.125000                0.500000                           0.590909                           0.333333                           0.357143                           0.909091                           1.000000                           1.000000
9          Arsenal   Chelsea           1           3     2                    0.0               1.000000                    0.8                    1.0               1.666667                    1.6                9                  1              2014                    3                0.333333                0.958333                           0.090909                           0.333333                           0.357143                           0.590909                           0.833333                           0.785714
10         Chelsea   Arsenal           3           1     1                    3.0               2.000000                    2.0                    1.0               1.000000                    0.8               10                  1              2014                    4                0.916667                0.500000                           0.909091                           0.722222                           0.714286                           0.590909                           0.500000                           0.214286
11         Chelsea   Arsenal           2           2     X                    3.0               2.333333                    2.2                    1.0               0.666667                    1.0               11                  1              2014                    5                0.666667                0.791667                           0.909091                           0.888889                           0.928571                           0.590909                           0.277778                           0.428571
12         Chelsea   Arsenal           4           1     1                    2.0               2.666667                    2.2                    2.0               1.333333                    1.2               12                  1              2014                    6                1.000000                0.500000                           0.590909                           1.000000                           0.928571                           0.909091                           0.666667                           0.571429
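
The `PERCENTILE(...)` columns are consistent with an average-rank percentile: each value's rank among the non-missing values (ties share the mean of their ranks) divided by the number of non-missing values, which is what pandas' `Series.rank(pct=True)` computes. That reading is inferred from the numbers above rather than taken from the Featuretools source, so treat the sketch below, with its hypothetical helper `percentile_rank`, as an illustration only.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(values):
    """Average-rank percentile of each value; None entries are skipped and stay None."""
    present = sorted(v for v in values if v is not None)
    result = []
    for v in values:
        if v is None:
            result.append(None)
            continue
        first = bisect_left(present, v)   # count of values strictly below v
        last = bisect_right(present, v)   # count of values at or below v
        # tied values occupy 1-based ranks first+1 .. last; use the mean of those ranks
        average_rank = (first + 1 + last) / 2
        result.append(average_rank / len(present))
    return result


home_goals = [2, 1, 0, 1, 2, 2, 2, 0, 1, 3, 2, 4]
print([round(p, 6) for p in percentile_rank(home_goals)])
```

Rounded to six places, the printed values match the `PERCENTILE(home_goals)` column; applying the same helper to a column that contains missing values (passed as `None`) reproduces the `NaN` handling seen in `PERCENTILE(home_team_goal_last_1)`.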

answered 2018-12-06T00:30:23.137