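I'm using the Terraform convenience module published by Google Cloud Platform to spin up a BigQuery dataset containing several tables and several views. This works well, except that when a view depends on a table I have to retry my apply, because nothing tells Terraform to create one before the other. I populate the module's tables and views attributes from the contents of subdirectories under assets, which gives me all the schemas and templated queries I want.

Is there a supported way to do this, short of abandoning the nice module-reuse pattern? I'm quite happy with the pattern I have.

For completeness, my code that calls the module is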

locals {
  #
  # Tables
  #

  # Description attribute for each table. If absent, no description is set (null).
  table_to_description = {
# ...
  }

  # Values that don't ever change set for this dataset.
  TABLE_CONSTANTS = {
    time_partitioning = null
    expiration_time   = null
    clustering        = []
    labels = {
      terraform_managed = "true"
    }
  }

  TABLE_SCHEMA_SUFFIX = ".json"


  table_schema_filenames = fileset(pathexpand("${path.module}/assets/schemas"), "*.json")
  // fileset() doesn't have an option to output full paths, so we need to re-expand them
  table_schema_paths = [for file_name in local.table_schema_filenames : pathexpand("${path.module}/assets/schemas/${file_name}")]

  # Build a vector of objects, one for each table
  table_inputs = [for full_path in local.table_schema_paths : {
    schema = full_path
    # TODO(jaycarlton) I do not yet see a way around doing the replacement twice, as it's not possible
    #   to refer to other values in the same object when defining it.
    table_id    = replace(basename(full_path), local.TABLE_SCHEMA_SUFFIX, "")
    description = lookup(local.table_to_description, replace(basename(full_path), local.TABLE_SCHEMA_SUFFIX, ""), null)
  }]

  # Merge calculated inputs with the ones we use every time.
  tables = [for table_input in local.table_inputs :
    merge(table_input, local.TABLE_CONSTANTS)
  ]

  #
  # Views
  #
  VIEW_CONSTANTS = {
    # Reporting Subsystem always uses Standard SQL Syntax
    use_legacy_sql = false,
    labels = {
      terraform_managed = "true"
    }
  }
  QUERY_TEMPLATE_SUFFIX = ".sql"
  # Local filenames for view templates. Returns something like ["latest_users.sql", "users_by_id.sql"]
  view_query_template_filenames = fileset("${path.module}/assets/views", "*.sql")
  # expanded to fully qualified path, e.g. ["/repos/workbench/terraform/modules/reporting/views/latest_users.sql", ...]
  //  view_query_template_paths = [for file_name in local.view_query_template_filenames : pathexpand("./reporting/views/${file_name}")]
  view_query_template_paths = [for file_name in local.view_query_template_filenames : pathexpand("${path.module}/assets/views/${file_name}")]

  # Create views for each .sql file in the views directory. There is no Terraform
  # dependency from the view to the table(s) it queries, and I don't believe the SQL is even checked
  # for accuracy prior to creation on the BQ side.
  views = [for view_query_template_path in local.view_query_template_paths :
    merge({
      view_id = replace(basename(view_query_template_path), local.QUERY_TEMPLATE_SUFFIX, ""),
      query = templatefile(view_query_template_path, {
        project = var.project_id
        dataset = var.reporting_dataset_id
      })
  }, local.VIEW_CONSTANTS)]

}

# All BigQuery assets for Reporting subsystem
module "main" {
  source     = "terraform-google-modules/bigquery/google"
  version    = "~> 4.3"
  dataset_id = var.reporting_dataset_id
  project_id = var.project_id
  location   = "US"

  # Note: friendly_name is discovered in plan and apply steps, but can't be
  # entered here. Maybe they're just not exposed by the dataset module but the resources are looking
  # for them?
  dataset_name = "Workbench ${title(var.aou_env)} Environment Reporting Data" # exposed as friendly_name in plan
  description  = "Daily output of relational tables and time series views for analysis. Views are provided for general ad-hoc analysis."

  tables = local.tables

  # Note that, when creating this module from the ground up, it's common to see an error like
  # `Error: googleapi: Error 404: Not found: Table my-project:my_dataset.my_table, notFound`. It seems
  # to be a momentary issue due to the dataset's existence not yet being observable to the table/view
  # create API. So far, it's always worked on a re-run.
  # TODO(jaycarlton) see if there's a way to put a retry on this. I'm not convinced that will work
  #   outside of a resource context (and inside a third-party module).
  views = local.views

}

The error looks like this:

Error: googleapi: Error 404: Not found: Table <MY_PROJECT>:<MY_DATASET>.user, notFound

  on .terraform/modules/workbench.reporting.main/main.tf line 76, in resource "google_bigquery_table" "view":
  76: resource "google_bigquery_table" "view" {

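BigQuery rejects these views even though Terraform accepts them, because the tables they reference have not been created yet or are not yet available. This looks like a job for depends_on in a resource block, but as far as I can tell the resources are abstracted away by the module in this case. A retrier would also solve my problem (if less elegantly), since re-running terraform apply works in every case.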
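
For reference, here is a rough sketch of the explicit ordering that depends_on would give if the table and view were declared directly rather than through the module. The resource names, the schema path, and the query are hypothetical and are not taken from the module's internals.

```hcl
# Hypothetical stand-alone resources, shown only to illustrate explicit ordering.
resource "google_bigquery_table" "user" {
  project    = var.project_id
  dataset_id = var.reporting_dataset_id
  table_id   = "user"
  schema     = file("${path.module}/assets/schemas/user.json") # assumed path
}

resource "google_bigquery_table" "latest_users" {
  project    = var.project_id
  dataset_id = var.reporting_dataset_id
  table_id   = "latest_users"

  view {
    # Assumed query; the real queries come from the .sql templates.
    query          = "SELECT * FROM `${var.project_id}.${var.reporting_dataset_id}.user`"
    use_legacy_sql = false
  }

  # Create the view only after the table it queries exists.
  depends_on = [google_bigquery_table.user]
}
```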

2 Answers

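If I understand the necessary data flow here, I think one way to make this work is to arrange for local.views to depend on one of the module's table-related output values. Because your local.views expression includes a templatefile call, you could do that by passing the table names into the template: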

  views = [for view_query_template_path in local.view_query_template_paths :
    merge({
      view_id = replace(basename(view_query_template_path), local.QUERY_TEMPLATE_SUFFIX, ""),
      query = templatefile(view_query_template_path, {
        project     = var.project_id
        dataset     = var.reporting_dataset_id
        table_names = module.main.table_names
      })
  }, local.VIEW_CONSTANTS)]

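Because the table_names output value depends on the table resources inside the module, referring to it declares an indirect dependency on the tables. The module's views argument in turn depends on local.views, and therefore depends on the tables too, now at one further level of indirection.

The above assumes that the configuration of the tables inside the module does not depend on the views in any way. If there were such a dependency, this would create a dependency cycle, but from a quick read of the module source code that does not seem to be a problem.

The key fact this answer relies on is that each input variable and each output value of a Terraform module is a separate node in the dependency graph; the module as a whole is not a single node. An input variable of a module can therefore depend on an output value of the same module, as long as the dependencies inside the module do not turn that into a cycle.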
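
A minimal sketch of that point, using a hypothetical local module (./modules/example, whose variables and output are invented for illustration): one argument refers to an output of the same module, which should be accepted as long as that output does not itself depend on that argument.

```hcl
# Hypothetical module, for illustration only. Assume ./modules/example declares:
#   variable "tables"      -> used only by the table resources
#   variable "views"       -> used only by the view resources
#   output   "table_names" -> derived only from the table resources
module "example" {
  source = "./modules/example"

  tables = ["user", "workspace"]

  # Each input and output is its own graph node, so this self-reference is valid.
  # Resulting order: tables -> table resources -> table_names -> views -> view resources.
  views = [for name in module.example.table_names : "latest_${name}"]
}
```

If table_names were instead computed from the view resources, the same reference would form a cycle and Terraform would report an error.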

answered 2020-11-12T01:48:12.603

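We built our framework with a similar approach, but to avoid this problem we decided on a policy under which a bq-dataset never hosts tables and views together, only one or the other. That way, our CI always runs and applies the code for ds/tables first, and then the code for ds/views.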
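
A rough sketch of what such a split might look like with the module from the question. The directory layout, the module labels, and the two dataset-id variables are assumptions made for illustration; they do not describe our actual framework.

```hcl
# ds/tables/main.tf (assumed layout): applied first by CI.
module "tables" {
  source     = "terraform-google-modules/bigquery/google"
  version    = "~> 4.3"
  dataset_id = var.tables_dataset_id # hypothetical variable
  project_id = var.project_id
  location   = "US"

  tables = local.tables # this dataset hosts tables only
}

# ds/views/main.tf (assumed layout): applied second by CI, as a separate run.
module "views" {
  source     = "terraform-google-modules/bigquery/google"
  version    = "~> 4.3"
  dataset_id = var.views_dataset_id # hypothetical variable
  project_id = var.project_id
  location   = "US"

  # The view queries would need to reference var.tables_dataset_id, since the
  # tables now live in a different dataset.
  views = local.views
}
```

Because the two configurations are applied in separate runs, the ordering comes from the CI pipeline and not from Terraform's dependency graph.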

answered 2021-06-24T01:13:05.863