I ran a simple Deequ check in Spark, like this:
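import com.amazon.deequ.{VerificationResult, VerificationSuite}
import com.amazon.deequ.VerificationResult.checkResultsAsDataFrame
import com.amazon.deequ.checks.{Check, CheckLevel}

// One check ("Review Check") made up of four constraints
val verificationResult: VerificationResult = VerificationSuite()
  .onData(dataset)
  .addCheck(
    Check(CheckLevel.Error, "Review Check")
      .isComplete("col1")
      .isUnique("col2")
      .hasSize(_ == count_date)
      .satisfies(
        "abs(col4 - col5) <= 0.20 * col5",
        "value(col4) lies between value(col5)-20% and value(col5)+20%"
      )
  ) // closes addCheck(...)
  .run()

// One row per constraint
val result1 = checkResultsAsDataFrame(spark, verificationResult)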
Now my result1 DataFrame looks like this:
+------------+-----------+------------+--------------------+-----------------+--------------------+
| check|check_level|check_status| constraint|constraint_status| constraint_message|
+------------+-----------+------------+--------------------+-----------------+--------------------+
|Review Check| Error| Error|CompletenessConst...| Success| |
|Review Check| Error| Error|UniquenessConstra...| Failure|Value: 7.62664794...|
|Review Check| Error| Error|SizeConstraint(Si...| Success| |
|Review Check| Error| Success|ComplianceConstra...| Success| |
+------------+-----------+------------+--------------------+-----------------+--------------------+
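I'm confused by the columns check_status and constraint_status. How do they differ? The result of each of my checks should be in the latter, right? So what does the former mean?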
I couldn't find a clear explanation in the Deequ blog either. Can someone explain?