Ensure Data Quality and Consistency in Your ML Pipelines with Great Expectations and ZenML
Integrate Great Expectations with ZenML to seamlessly incorporate data profiling, testing, and documentation into your ML workflows. This powerful combination allows you to maintain high data quality standards, improve communication, and enhance observability throughout your ML pipeline.
Features with ZenML
- Seamless integration of Great Expectations data validation within ZenML pipelines
- Automated storage and versioning of Expectation Suites and Validation Results using ZenML's Artifact Store
- Easy visualization of Great Expectations artifacts directly in the ZenML dashboard or Jupyter notebooks
- Flexible deployment options for stores to leverage existing Great Expectations configurations or let ZenML manage the setup
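To use these features, the Great Expectations data validator must be part of your active ZenML stack. The following sketch shows how that registration typically looks with the ZenML CLI; exact flag names (notably `--context_root_dir`) may vary across ZenML versions, so treat this as an outline rather than a definitive recipe:

```shell
# Install the integration into your ZenML environment.
zenml integration install great_expectations -y

# Option 1: let ZenML manage the Great Expectations setup end to end.
zenml data-validator register ge_validator --flavor=great_expectations

# Option 2: reuse an existing Great Expectations project instead
# (point the data validator at its root directory; flag name may
# differ between ZenML versions):
# zenml data-validator register ge_validator \
#     --flavor=great_expectations \
#     --context_root_dir=/path/to/great_expectations

# Register a stack that includes the data validator and activate it.
zenml stack register ge_stack -o default -a default -dv ge_validator --set
```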
Main Features
- Automated data profiling to generate validation rules (Expectations) based on dataset properties
- Comprehensive data quality checks using predefined or inferred Expectations
- Human-readable documentation of validation rules, quality checks, and results
- Support for various data formats and sources, with ZenML currently supporting pandas DataFrames
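To illustrate what automated profiling produces, here is a minimal plain-Python sketch (the function name and sample values are hypothetical, not the ZenML or Great Expectations API) of the kind of rule a profiler infers from a dataset's observed properties:

```python
def infer_between_expectation(column_name, values):
    """Sketch of automated profiling: derive a validation rule
    (an Expectation) from a column's observed value range."""
    return {
        "expectation_name": "expect_column_values_to_be_between",
        "expectation_args": {
            "column": column_name,
            "min_value": min(values),
            "max_value": max(values),
        },
    }

# Profiling a sample of a numeric column yields a concrete, reusable rule.
rule = infer_between_expectation("X_Minimum", [4, 17, 250, 1988])
print(rule["expectation_args"])  # → {'column': 'X_Minimum', 'min_value': 4, 'max_value': 1988}
```

In Great Expectations itself, this inference is performed by its profilers, and the resulting Expectation Suite is what later validation runs are checked against.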
How to use ZenML with Great Expectations
from zenml import pipeline
from zenml.integrations.great_expectations.steps.ge_validator import (
    great_expectations_validator_step,
)
# Import path for the expectation config may vary across ZenML versions.
from zenml.integrations.great_expectations.utils import (
    GreatExpectationExpectationConfig,
)

# Configure the validator step with the Expectations to check and the
# name under which the validated data asset is registered.
ge_validator_step = great_expectations_validator_step.with_options(
    parameters={
        "expectations_list": [
            GreatExpectationExpectationConfig(
                expectation_name="expect_column_values_to_be_between",
                expectation_args={
                    "column": "X_Minimum",
                    "min_value": 0,
                    "max_value": 2000,
                },
            ),
        ],
        "data_asset_name": "steel_plates_train_df",
    }
)

# `docker_settings`, `importer`, and `splitter` are assumed to be
# defined elsewhere in your project.
@pipeline(enable_cache=False, settings={"docker": docker_settings})
def validation_pipeline():
    imported_data = importer()
    train, test = splitter(imported_data)
    ge_validator_step(train)

validation_pipeline()
The code example demonstrates a simple ZenML pipeline that integrates Great Expectations for data validation. It imports the great_expectations_validator_step and configures it alongside custom importer and splitter steps. The list of Expectations is specified with the GreatExpectationExpectationConfig class, where each Expectation is defined by an expectation name and its arguments, such as the column name and value bounds. When you run the pipeline, the resulting artifacts are automatically stored and versioned in ZenML's Artifact Store. By default, the Great Expectations stores for validation results and checkpoints are also configured to use your active Artifact Store.
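To make the configured check concrete, here is a plain-Python sketch (hypothetical names and sample values, not the actual Great Expectations implementation) of what expect_column_values_to_be_between evaluates for the X_Minimum column, returning a small summary loosely modeled on a Validation Result:

```python
def expect_column_values_to_be_between(values, min_value, max_value):
    """Evaluate the between-check configured in the validator step.

    Returns a dict loosely modeled on a Great Expectations
    Validation Result: overall success plus the offending values.
    """
    unexpected = [v for v in values if not (min_value <= v <= max_value)]
    return {
        "success": not unexpected,
        "unexpected_count": len(unexpected),
        "unexpected_values": unexpected,
    }

# X_Minimum must fall within [0, 2000], as in the pipeline example.
ok = expect_column_values_to_be_between([4, 17, 1988], 0, 2000)
bad = expect_column_values_to_be_between([4, -3, 2500], 0, 2000)
print(ok["success"])              # → True
print(bad["unexpected_values"])   # → [-3, 2500]
```

In the real integration, Great Expectations evaluates the check and ZenML persists the resulting Validation Result as a versioned artifact.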
Additional Resources
ZenML Great Expectations Integration Docs
Great Expectations Documentation
ZenML Great Expectations SDK Docs