This is a follow-up to my previous question. I have a data frame like this:

Company_id  year  dummy_1 dummy_2 dummy_3 dummy_4 dummy_5
1           1990   1       0        1        1      1
1           1991   0       0        1        1      0
1           1992   0       0        1        1      0
1           1993   1       0        1        1      0
1           1994   0       1        1        1      0
1           1995   0       0        1        1      0
1           1996   0       0        1        1      1

I created a column of numpy arrays from the dummy columns as follows:

df = df.assign(vector = df.iloc[:, -5:].values.tolist())
df['vector'] = df['vector'].apply(np.array)
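
For anyone who wants to reproduce the setup, here is a minimal self-contained sketch. It assumes only the seven sample rows shown above (my real data has many more companies and more dummy columns), and the imports are the ones the snippet needs.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the table above (one company, seven years).
df = pd.DataFrame({
    "Company_id": [1] * 7,
    "year": list(range(1990, 1997)),
    "dummy_1": [1, 0, 0, 1, 0, 0, 0],
    "dummy_2": [0, 0, 0, 0, 1, 0, 0],
    "dummy_3": [1] * 7,
    "dummy_4": [1] * 7,
    "dummy_5": [1, 0, 0, 0, 0, 0, 1],
})

# Pack the last five columns into a single column holding one array per row.
df = df.assign(vector=df.iloc[:, -5:].values.tolist())
df["vector"] = df["vector"].apply(np.array)
```

Each cell of `vector` is then a 1-D array with one entry per dummy column.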

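I want to compare how distinctive each company's strategic practices are relative to its competitors over the past 5 years. This is the code I am using: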

df.sort_values('year', ascending=False)



# These will be our lists of differences.
diffs = []

# Loop over all unique dates
for date in df.year.unique():
    # Only take dates earlier than the current date.
    compare_df = df.loc[df.year - date <= 5 ].copy()
    # Loop over each company for this date
    for row in df.loc[df.year == date].itertuples():
        # If no data available use nans.
        if compare_df.empty:
            diffs.append(float('nan'))
        # Calculate cosine and fill in otherwise
        else:
            compare_df['distinctivness'] = spatial.distance.cosine(np.array(compare_df.vector) , np.array(row.vector))
            row_of_interest = compare_df.distinctivness.mean()
            diffs.append(row_of_interest.distinctivness.values[0])

However, I get:

    compare_df['distinctivness'] = spatial.distance.cosine(np.array(compare_df.vector) - np.array(row.vector))

ValueError: operands could not be broadcast together with shapes (29254,) (93,) 
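
My tentative guess at the cause, as a sketch on toy data: `np.array(compare_df.vector)` seems to give a 1-D object array with one entry per row, not a 2-D matrix, so it cannot be broadcast against a single row's vector. The names `toy`, `flat`, `matrix`, and `target` are made up for illustration, and using `np.stack` with `scipy.spatial.distance.cdist` is only one possible way around it that I have not checked against my full data.

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance

# Toy stand-in for compare_df: 4 rows, each holding a length-3 vector.
toy = pd.DataFrame({"vector": [[1, 0, 1], [0, 1, 1], [1, 1, 1], [0, 1, 0]]})
toy["vector"] = toy["vector"].apply(np.array)

target = np.array([1, 0, 1])  # stand-in for row.vector

# A Series of arrays converts to a 1-D object array: shape (4,), not (4, 3).
flat = np.array(toy["vector"])

try:
    flat - target  # shapes (4,) and (3,) cannot be broadcast together
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Stacking the per-row arrays gives a real 2-D matrix of shape (4, 3).
matrix = np.stack(toy["vector"].tolist())

# Cosine distance from every row of the matrix to the single target vector.
dists = distance.cdist(matrix, target[None, :], metric="cosine").ravel()
```

If this is right, `dists` holds one distance per comparison row, and its mean could be appended to `diffs` for the current row.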

How can I get rid of this problem?
