1

编辑:我现在有一个候选解决方案来解决我的问题(参见下面的玩具示例)——如果你能想到更强大的东西,请告诉我。


我刚刚发现了 python 的patsy包,用于从 R 样式公式创建设计矩阵,它看起来很棒。我的问题是:给定一个 patsy 公式,例如“(x1 + x2 + x3)**2”,是否有一种简单的方法可以创建一个包含与特定变量相关的导数的设计矩阵,例如“x1”?

这是一个玩具示例:

import numpy as np
import pandas as pd
import patsy
import sympy
import sympy.parsing.sympy_parser as sympy_parser

n_obs = 200
df = pd.DataFrame(np.random.uniform(size=(n_obs, 3)), columns=["x1", "x2", "x3"])
df.describe()

design_matrix = patsy.dmatrix("(I(7*x1) + x2 + x3)**2 + I(x1**2) + I(x1*x2*x3)", df)
design_matrix.design_info.column_names
## ['Intercept', 'I(7 * x1)', 'x2', 'x3', 'I(7 * x1):x2', 'I(7 * x1):x3', 'x2:x3', 'I(x1 ** 2)', 'I(x1 * x2 * x3)']

x1, x2, x3 = sympy.symbols("x1 x2 x3")

def diff_wrt_x1(string):
    return str(sympy.diff(sympy_parser.parse_expr(string), x1))

colnames_to_differentiate = [colname.replace(":", "*").replace("Intercept", "1").replace("I", "")
                             for colname in design_matrix.design_info.column_names]
derivatives_wrt_x1 = [diff_wrt_x1(colname) for colname in colnames_to_differentiate]

def get_column(string):
    try:
        return float(string) * np.ones((len(df), 1))  # For cases like string == "7"
    except ValueError:
        return patsy.dmatrix("0 + I(%s)" % string, df)  # For cases like string == "x2*x3"

derivative_columns = tuple(get_column(derivative_string) for derivative_string in derivatives_wrt_x1)
design_matrix_derivative = np.hstack(derivative_columns)

design_matrix_derivative[0]  # Contains [0, 7, 0, 0, 7*x2, 7*x3, 0, 2*x1, x2*x3]
design_matrix_derivative_manual = np.zeros_like(design_matrix_derivative)
design_matrix_derivative_manual[:, 1] = 7.0
design_matrix_derivative_manual[:, 4] = 7*df["x2"]
design_matrix_derivative_manual[:, 5] = 7*df["x3"]
design_matrix_derivative_manual[:, 7] = 2*df["x1"]
design_matrix_derivative_manual[:, 8] = df["x2"] * df["x3"]
np.all(np.isclose(design_matrix_derivative, design_matrix_derivative_manual))  # True!

该代码生成一个带有列的设计矩阵[1, 7*x1, x2, x3, 7*x1*x2, 7*x1*x3, x2*x3, x1^2, x1*x2*x3]

假设我想要一个新的公式来区分 design_matrix 关于 x1。所需的结果是一个与 形状相同design_matrix但其列为 的矩阵[0, 7, 0, 0, 7*x2, 7*x3, 0, 2*x1, x2*x3]。有没有一种程序化的方式来做到这一点?我试过搜索 patsy 文档以及 stackoverflow,但我什么也没看到。当然,我可以手动创建导数矩阵,但如果有一个函数可以做到这一点,那就太好了(当我将公式更改为 时,不必更新它"(x1 + x2 + x3 + x4)**2 + I(x1**3)")。

4

0 回答 0