WhyLabs whylogs

Maintain data quality and detect drift with WhyLabs whylogs in ZenML pipelines

The WhyLabs whylogs integration adds data and model profiling to your ZenML pipelines. With whylogs profiles, you can monitor data quality, detect data and model drift, and trigger automated corrective actions to keep your production models reliable and performant.

Features with ZenML

  • Seamless data profiling in ZenML pipelines
    Easily generate whylogs data profiles directly within your ZenML pipeline steps for any pandas DataFrame.
  • Flexible integration options
    Use the standard WhylogsProfilerStep, custom steps with the WhylogsDataValidator, or call the whylogs library directly.
  • Automated data validation
    Implement data quality checks and corrective actions based on the generated whylogs profiles.
  • Effortless visualization of profiles
    View interactive whylogs profile visualizations directly in the ZenML dashboard or Jupyter notebooks.
  • Easy WhyLabs platform logging
    Upload profiles to WhyLabs’ cloud platform for centralized tracking, analysis and documentation of data and models.
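The "automated data validation" idea above can be sketched as a small check on a profile summary. This is a minimal illustration, not part of the ZenML or whylogs API: the function name `columns_exceeding_null_fraction` and the 5% threshold are hypothetical, and the input is assumed to have the shape of the table returned by a whylogs profile view's `to_pandas()` (one row per column, with `counts/n` and `counts/null` metrics). A hand-built table stands in for a real profile so the sketch needs only pandas.

```python
import pandas as pd


def columns_exceeding_null_fraction(
    summary: pd.DataFrame, max_null_fraction: float = 0.05
) -> list[str]:
    """Return names of columns whose share of missing values is too high.

    `summary` is assumed to look like the output of a whylogs profile view's
    `to_pandas()`: one row per column, with `counts/n` and `counts/null`.
    The function name and default threshold are illustrative only.
    """
    total = summary["counts/n"].astype(float)
    nulls = summary["counts/null"].astype(float)
    # Empty columns (n == 0) would divide by zero; treat them as 0% missing.
    null_fraction = (nulls / total.where(total > 0)).fillna(0.0)
    return sorted(null_fraction[null_fraction > max_null_fraction].index)


# Hand-built stand-in for a real profile summary (values are made up).
example_summary = pd.DataFrame(
    {"counts/n": [100, 100, 0], "counts/null": [2, 20, 0]},
    index=["age", "bmi", "empty_col"],
)
failing = columns_exceeding_null_fraction(example_summary)
```

Inside a pipeline, a downstream step could run a check like this on the profile it receives and raise an error or branch to a cleaning step when the returned list is not empty.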

Main Features

  • Statistical data profiling and summarization
  • Data quality validation
  • Data drift detection
  • Model drift and performance degradation detection
  • Support for tabular data in pandas DataFrames
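To make the drift detection item above concrete, here is a rough sketch that compares two profile summaries. It is a deliberately simple heuristic (shift of the mean, measured in reference standard deviations), not the drift algorithm used by whylogs or WhyLabs. The function names and the 0.5 threshold are hypothetical, and the inputs are assumed to carry the `distribution/mean` and `distribution/stddev` columns found in a profile view's `to_pandas()` output.

```python
import pandas as pd


def mean_shift_scores(reference: pd.DataFrame, current: pd.DataFrame) -> pd.Series:
    """Score shared columns by how far the mean moved, in reference std units."""
    shared = reference.index.intersection(current.index)
    ref_mean = reference.loc[shared, "distribution/mean"].astype(float)
    ref_std = reference.loc[shared, "distribution/stddev"].astype(float)
    cur_mean = current.loc[shared, "distribution/mean"].astype(float)
    # Constant reference columns (std == 0) have no usable scale; drop them.
    scores = (cur_mean - ref_mean).abs() / ref_std.where(ref_std > 0)
    return scores.dropna().sort_values(ascending=False)


def drifted_columns(
    reference: pd.DataFrame, current: pd.DataFrame, threshold: float = 0.5
) -> list[str]:
    """Return columns whose score exceeds the (illustrative) threshold."""
    scores = mean_shift_scores(reference, current)
    return list(scores[scores > threshold].index)


# Hand-built stand-ins for two profile summaries (values are made up).
reference_summary = pd.DataFrame(
    {"distribution/mean": [50.0, 25.0, 1.0], "distribution/stddev": [10.0, 5.0, 0.0]},
    index=["age", "bmi", "const"],
)
current_summary = pd.DataFrame(
    {"distribution/mean": [51.0, 30.0, 1.0], "distribution/stddev": [10.0, 5.0, 0.0]},
    index=["age", "bmi", "const"],
)
scores = mean_shift_scores(reference_summary, current_summary)
drifted = drifted_columns(reference_summary, current_summary)
```

A mean-shift score only catches changes in location; changes in spread or shape need a distribution-level comparison, which is what dedicated drift tooling provides.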

How to use ZenML with WhyLabs whylogs

# One-time setup, run in a shell (not in Python):
#   zenml integration install whylogs -y
#   zenml data-validator register whylogs_data_validator --flavor=whylogs
#   zenml stack register custom_stack -dv whylogs_data_validator -o default -a default --set


from typing import Annotated, Tuple
import pandas as pd
import whylogs as why
from sklearn import datasets
from whylogs.core import DatasetProfileView

from zenml.integrations.whylogs.flavors.whylogs_data_validator_flavor import (
    WhylogsDataValidatorSettings,
)
from zenml import step, pipeline


@step(
    settings={
        "data_validator.whylogs": WhylogsDataValidatorSettings(
            enable_whylabs=True, dataset_id="model-1"
        )
    }
)
def data_loader() -> Tuple[
    Annotated[pd.DataFrame, "data"],
    Annotated[DatasetProfileView, "profile"]
]:
    """Load the diabetes dataset."""
    X, y = datasets.load_diabetes(return_X_y=True, as_frame=True)

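    # Join features (X) and target (y) into one DataFrame on their shared index
    df = pd.merge(X, y, left_index=True, right_index=True)

    # Profile the DataFrame with whylogs and return a view of the profile
    profile = why.log(pandas=df).profile().view()
    return df, profile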

@pipeline(enable_cache=False)
def my_pipeline():
    data, profile = data_loader()
    # ... do something with the data

if __name__ == "__main__":
    my_pipeline()

Connect Your ML Pipelines to a World of Tools

Expand your ML pipelines with more than 50 ZenML Integrations

  • Amazon S3
  • Apache Airflow
  • Argilla
  • AutoGen
  • AWS
  • AWS Strands
  • Azure Blob Storage
  • Azure Container Registry
  • AzureML Pipelines
  • BentoML
  • Comet