Questions tagged [great-expectations]

62 questions

0 votes

1 answer

492 views

apache-spark - How to get Great_Expectations to work with Spark Dataframes in Apache Spark ValueError: Unrecognized spark type: string

I have an Apache Spark dataframe which has a 'string' type field. However, Great_Expectations doesn't recognize the field type. I have imported the modules that I think are necessary, but I'm not sure why Great_Expectations doesn't recognize the field.

The following code reads in the csv as a dataframe

The following shows the schema:

I think the following line of code creates a Great_Expectations dataframe from the above Spark dataframe:

I then code in the following expectation:

However, I get the following error:

Not sure why Great_Expectations cannot recognize the Spark Type?
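
A possible explanation, offered as a sketch rather than a confirmed diagnosis: assuming the legacy `SparkDFDataset` API is in use, `expect_column_values_to_be_of_type` looks up its type argument as a class name in `pyspark.sql.types` (for example `StringType`), so the lowercase name printed by `printSchema()` (`string`) is not recognized. The helper below is hypothetical (it is not part of Great Expectations or PySpark) and only translates printed type names into class names; `gdf` and `my_column` in the trailing comment are placeholders.

```python
# Hypothetical lookup table: names as printed by printSchema() or
# simpleString() on the left, pyspark.sql.types class names on the right.
SPARK_NAME_TO_CLASS = {
    "string": "StringType",
    "integer": "IntegerType",
    "int": "IntegerType",
    "long": "LongType",
    "bigint": "LongType",
    "double": "DoubleType",
    "float": "FloatType",
    "boolean": "BooleanType",
    "date": "DateType",
    "timestamp": "TimestampType",
}


def spark_type_class_name(name):
    """Return the pyspark.sql.types class name for a printed type name."""
    # Already a class name such as "StringType": pass it through unchanged.
    if name in SPARK_NAME_TO_CLASS.values():
        return name
    key = name.strip().lower()
    if key not in SPARK_NAME_TO_CLASS:
        raise KeyError(f"No class name known for Spark type {name!r}")
    return SPARK_NAME_TO_CLASS[key]


# Assumed usage with a legacy SparkDFDataset called gdf (placeholder):
# gdf.expect_column_values_to_be_of_type("my_column", spark_type_class_name("string"))
```

If this assumption holds, passing `"StringType"` instead of `"string"` should avoid the `ValueError`.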

2021-06-17T13:02:18.403

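0 votes

2 answers

958 views

apache-spark - How to Save Great Expectations results to File From Apache Spark - With Data Docs

I have successfully created a Great_Expectation result and I would like to output the results of the expectation to an html file.

There are few links highlighting how to show the results in human-readable form using the so-called "Data Docs": https://docs.greatexpectations.io/en/latest/guides/tutorials/getting_started/set_up_data_docs.html#tutorials-getting-started-set-up-data-docs

But to be quite honest, the documentation is extremely hard to follow.

My expectation simply verifies that the number of passengers in my dataset falls between 1 and 6. I would like help using "Data Docs" to output the results to a folder, or with any other way of outputting the data to a folder:

The code is run from Apache Spark.

If someone could point me in the right direction, I will be able to figure it out.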

apache-spark pyspark great-expectations

2021-06-17T16:43:25.090

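0 votes

1 answer

271 views

python - Great Expectations takes a very long time

Suppose we have a PySpark dataframe of about 17,000 rows and want to check that column 'a' is not null. How long should the following code take to run:

So far it has taken about 14 hours and it is still running as a Glue job. Is this expected?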

python performance pyspark aws-glue great-expectations

2021-06-30T12:19:39.757

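0 votes

1 answer

356 views

python-2.7 - How to save Great Expectations to Azure Data Lake or Blob Store

I am trying to save a great_expectations 'expectation_suite' to Azure ADLS Gen 2 or Blob Store with the following line of code.

However, I get the following error:

The following succeeds, but I don't know where the expectation suite is saved:

If someone can let me know how to save to ADLS Gen2, or let me know where the expectation suite is saved, that would be great.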

python-2.7 apache-spark great-expectations

2021-07-08T19:26:09.240

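0 votes

1 answer

136 views

great-expectations - Great Expectations profiling on a SparkDF takes a long time when there are many columns

I need to profile data coming from Snowflake in Databricks. The data is just a sample of 100 rows but contains 3k+ columns, and it will eventually have more rows. When I reduce the number of columns, profiling completes very quickly, but the more columns there are, the longer it takes. I tried profiling the sample, and after more than 10 hours I had to cancel the job.

Here is the code I use:

You can test with any data that has a large number of columns. Is this normal, or am I doing something wrong?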

great-expectations

2021-07-16T19:53:02.507

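0 votes

0 answers

74 views

python - great_expectations and scrapy

When I work on a project that uses both great_expectations and scrapy, I get errors that seem to come from some conflict between the two.

When I uninstall either of the two libraries, everything works, but errors appear when both are used together.

Here is my stack trace, but I can't figure out what the root cause is. Any help would be great: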

python scrapy great-expectations

2021-07-18T06:40:27.453

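0 votes

0 answers

76 views

apache-spark - Is it possible to change the Great Expectations logo displayed in Great_Expectations Data Docs

I got great help visualizing Great_Expectations Data Docs from Apache Spark using Databricks and Synapse; see "How to Save Great Expectations results to File From Apache Spark - With Data Docs".

I would like to know whether the displayed logo can be customized; see the image.

I would like to change the logo to my own. Or would that violate some kind of copyright?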

apache-spark azure-databricks great-expectations

2021-07-18T22:31:03.103

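0 votes

1 answer

141 views

apache-spark - How to find and install the Great_Expectations .JAR file in an Azure Synapse workspace

I am trying to locate the Great_Expectations .JAR file and upload it to Azure Synapse through Azure Synapse Studio, in order to update Apache Spark.

I would normally upload a requirements.txt manually through the Apache Spark Pool 'Packages', but I am having problems doing that, so I am trying to upload a .JAR file instead.

Can someone let me know where I can find the Great_Expectations .JAR file?

Alternatively, can someone let me know where I can find a Python Wheel file or workspace package for Great_Expectations?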

apache-spark azure-synapse great-expectations

2021-07-29T09:49:01.663

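0 votes

2 answers

399 views

python-3.x - How to create a Python Wheel or determine which modules/libraries are within a Python Wheel

I am trying to create a Python Wheel for Great_Expectations. The .whl provided by Great_Expectations is available at https://pypi.org/project/great-expectations/#files (great-expectations 0.13.25). Unfortunately, this .whl does not appear to contain all the libraries I need to use Great_Expectations in an Azure Synapse Apache Spark Pool.

So it looks like I either have to create my own Great_Expectations package (a Python project with all of its dependencies, for offline installation as a .whl), or at least try to determine which libraries are included in the existing great-expectations 0.13.25 package.

Therefore, can someone let me know how to create a Python Wheel (i.e. a Python package together with all of the dependencies of Great_Expectations)? Alternatively, can someone let me know how to determine which modules/dependencies are included in a package?

Thanks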
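
One way to approach the second part of the question, sketched under assumptions: a wheel is a zip archive, and the dependencies it declares (but does not bundle) are listed as `Requires-Dist` lines in its `*.dist-info/METADATA` file. The function name `wheel_requirements` is hypothetical, and the wheel path in the usage comment is a placeholder for wherever the file was downloaded.

```python
import zipfile
from email.parser import Parser


def wheel_requirements(wheel_path):
    """Return the Requires-Dist entries declared in a wheel's METADATA file."""
    with zipfile.ZipFile(wheel_path) as wheel:
        # Every wheel carries one <name>-<version>.dist-info/METADATA member.
        metadata_name = next(
            name for name in wheel.namelist()
            if name.endswith(".dist-info/METADATA")
        )
        metadata_text = wheel.read(metadata_name).decode("utf-8")
    # METADATA uses RFC 822 style headers, so the email parser can read it.
    headers = Parser().parsestr(metadata_text)
    return headers.get_all("Requires-Dist") or []


# Assumed usage (the path is a placeholder):
# for requirement in wheel_requirements("great_expectations-0.13.25-py3-none-any.whl"):
#     print(requirement)
```

Each listed requirement has to be available to the Spark pool as its own package. For gathering them offline, `pip download great-expectations==0.13.25 -d ./wheels` may help, since it fetches the package together with its dependencies into one folder, although the files it picks depend on the platform and Python version it is run on.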

python-3.x apache-spark azure-synapse great-expectations

2021-08-04T09:53:25.227

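0 votes

0 answers

79 views

python-3.x - ERROR: No matching distribution found for tqdm>=4.59.0 (from great-expectations==0.13.24)

We are trying to save the Great_Expectations workspace package great_expectations-0.13.25-py3-none-any.whl (4.8 MB) to our Azure Synapse Apache Spark Pool. However, we keep getting the following error:

Can someone let me know how to resolve this?

I realise this question looks similar to one I have already posted about installing the Great_Expectations workspace package. However, that question was more about creating the package or Python Wheel, whereas this one is about a problem saving the Python .whl that has already been created.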

python-3.x apache-spark azure-synapse great-expectations

2021-08-04T13:00:48.757
