I have a project written in Python using pyspark and, more recently, dagster. We use Sphinx to build the documentation, with napoleon to parse Google-style docstrings. We have started including pre-packaged dagster entities like the following:
# Imports implied by the snippet; check_required_columns is a project utility.
import pyspark.sql.functions as sf
from pyspark.sql import DataFrame as SparkDataFrame
from dagster import String, solid


@solid(
    config_schema={
        "join_key": String,
        "join_style": String,
        "df1_name": String,
        "df2_name": String,
    }
)
def join_two_dfs_solid(
    context, df1: SparkDataFrame, df2: SparkDataFrame
) -> SparkDataFrame:
    """
    Solid to join two DataFrames on the specified key.

    Args:
        context (dict): Dagster context dict.
        df1 (SparkDataFrame): Spark DataFrame with the same schema as df2.
        df2 (SparkDataFrame): Spark DataFrame with the same schema as df1.

    Config Parameters:
        join_key (str): Name of the column to join on. The column must exist
            in both DataFrames.
        join_style (str): Spark join style, e.g. "left", "inner", "outer";
            defaults to "inner".
        df1_name (str): Alias name for the first DataFrame.
        df2_name (str): Alias name for the second DataFrame.

    Returns:
        SparkDataFrame: The joined DataFrame.
    """
    key = context.solid_config["join_key"]
    join_style = context.solid_config.get("join_style", "inner")
    df1_name = context.solid_config["df1_name"]
    df2_name = context.solid_config["df2_name"]
    context.log.info(f"Running join of two dataframes on {key}")
    check_required_columns(df1, [key])
    check_required_columns(df2, [key])
    output = df1.alias(df1_name).join(
        df2.alias(df2_name),
        sf.col(f"{df1_name}.{key}") == sf.col(f"{df2_name}.{key}"),
        how=join_style,
    )
    return output
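For reference, a run config supplying these config parameters might look like the sketch below. The solid and config key names follow the code above; the specific values ("customer_id", "orders", "customers") are illustrative, and the exact nesting may vary by dagster version:

```python
# Hypothetical run config for join_two_dfs_solid; values are illustrative.
run_config = {
    "solids": {
        "join_two_dfs_solid": {
            "config": {
                "join_key": "customer_id",
                "join_style": "left",
                "df1_name": "orders",
                "df2_name": "customers",
            }
        }
    }
}
```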
When we build with sphinx-apidoc, I can verify that the function's docstring exists by inspecting join_two_dfs_solid.__doc__, and the _description field that dagster attaches is empty, which should mean the docstring gets used. However, when the Sphinx docs build, I get a blank .rst page for the module containing this entity. Does anyone know of any other configuration setting in Sphinx, or a change I need to make to the entity, to get this to build correctly?
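For context, the symptom can be reproduced without dagster: a decorator that returns a wrapper object (rather than a plain function) can preserve the wrapped function's __doc__, yet the result is no longer a function in the eyes of introspection tools, which is plausibly why autodoc emits nothing for it. A minimal stdlib-only sketch (SolidLike is a made-up stand-in, not a dagster class):

```python
import functools
import inspect

class SolidLike:
    """Stand-in for a decorator that wraps a function in an object."""

    def __init__(self, fn):
        # Copies __doc__, __name__, etc. from fn onto this instance.
        functools.update_wrapper(self, fn)
        self._fn = fn

    def __call__(self, *args, **kwargs):
        return self._fn(*args, **kwargs)

@SolidLike
def join_two_dfs(context, df1, df2):
    """Join two DataFrames on the configured key."""

print(join_two_dfs.__doc__)              # docstring is preserved
print(inspect.isfunction(join_two_dfs))  # False: no longer a plain function
```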