Integrate Argilla with ZenML - Data Annotator Integrations

Streamline Data Annotation in ZenML Pipelines with Argilla

Enhance your machine learning workflows by integrating Argilla, an open-source data curation platform, with ZenML. This integration enables efficient data annotation within ZenML pipelines, leveraging Argilla's human-in-the-loop approach for improved data quality and model performance.

Features with ZenML

Seamless integration of Argilla's data annotation capabilities within ZenML pipelines
Support for local and deployed instances of Argilla, including Hugging Face Spaces
Access to annotated datasets and annotations through ZenML CLI and SDK
Efficient data curation and labeling for text data in ML workflows
Enhanced model performance through human feedback and expertise

Main Features

Focus on specific use cases and human-in-the-loop approaches
Support for each step in the MLOps cycle, from data labeling to model monitoring
Faster data curation using both human and machine feedback
Designed to support the development of language models of any size, from small models to LLMs, and other NLP tasks
Actively involves human experts in the tool-building process

How to use ZenML with Argilla


# register an annotator authentication secret first
# zenml secret create argilla_secrets --api_key="<your_argilla_api_key>"
# then register the annotator itself
# zenml annotator register argilla --flavor argilla --authentication_secret=argilla_secrets

from zenml.client import Client

client = Client()
annotator = client.active_stack.annotator

# list dataset names
dataset_names = annotator.get_dataset_names()

# get a specific dataset
dataset = annotator.get_dataset("dataset_name")

# get the annotations for a dataset
annotations = annotator.get_labeled_data(dataset_name="dataset_name")

# launch the annotation interface via the CLI
# zenml annotator dataset annotate <dataset_name>

The code example demonstrates how to use the ZenML Python SDK to interact with the Argilla annotator. It shows how to list dataset names, retrieve a specific dataset, and get the annotations for a dataset using the annotator object obtained from the active ZenML stack.
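As a rough sketch of how those calls might be combined, the helper below gathers the labeled data for several datasets in one pass. It relies only on the two methods shown above, get_dataset_names() and get_labeled_data(dataset_name=...). The names collect_labeled_data, AnnotatorLike, and InMemoryAnnotator are hypothetical and introduced here for illustration; the return type of get_labeled_data is treated as opaque because it is not specified above.

```python
from typing import Any, Dict, Iterable, Optional, Protocol


class AnnotatorLike(Protocol):
    """Minimal interface assumed here; it mirrors the two calls shown above."""

    def get_dataset_names(self) -> Iterable[str]: ...

    def get_labeled_data(self, dataset_name: str) -> Any: ...


def collect_labeled_data(
    annotator: AnnotatorLike,
    only: Optional[Iterable[str]] = None,
) -> Dict[str, Any]:
    """Fetch labeled data for every dataset, or just those named in `only`.

    Names in `only` that the annotator does not know about are skipped.
    """
    available = list(annotator.get_dataset_names())
    wanted = available if only is None else [n for n in only if n in available]
    return {name: annotator.get_labeled_data(dataset_name=name) for name in wanted}


class InMemoryAnnotator:
    """Hypothetical stand-in, used only to try the helper without a server."""

    def __init__(self, data: Dict[str, list]) -> None:
        self._data = data

    def get_dataset_names(self) -> list:
        return list(self._data)

    def get_labeled_data(self, dataset_name: str) -> list:
        return self._data[dataset_name]


if __name__ == "__main__":
    fake = InMemoryAnnotator({"reviews": ["pos", "neg"], "tickets": ["bug"]})
    labeled = collect_labeled_data(fake, only=["reviews", "missing"])
    print(sorted(labeled))
```

With a configured stack, the same helper could presumably be called with the annotator object from the example above (client.active_stack.annotator) in place of the in-memory stand-in.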

Optimizing RAG Pipelines by fine-tuning custom embedding models on synthetic data with ZenML

Additional Resources

Argilla GitHub Repository

ZenML Argilla integration documentation

ZenML Argilla Integration SDK Docs
