Streamline Data Annotation in ZenML Pipelines with Argilla
Enhance your machine learning workflows by integrating Argilla, an open-source data curation platform, with ZenML. This integration enables efficient data annotation within ZenML pipelines, leveraging Argilla's human-in-the-loop approach for improved data quality and model performance.
Features with ZenML
- Seamless integration of Argilla's data annotation capabilities within ZenML pipelines
- Support for local and deployed instances of Argilla, including Hugging Face Spaces
- Access to annotated datasets and annotations through ZenML CLI and SDK
- Efficient data curation and labeling for text data in ML workflows
- Enhanced model performance through human feedback and expertise
Main Features of Argilla
- Focus on specific use cases and human-in-the-loop approaches
- Support for each step in the MLOps cycle, from data labeling to model monitoring
- Faster data curation using both human and machine feedback
- Designed to enhance the development of small and large language models (LLMs) and NLP tasks
- Actively involves human experts in the tool-building process
How to use ZenML with Argilla
# register an annotator authentication secret first
# zenml secret create argilla_secrets --api_key="<your_argilla_api_key>"
# then register the annotator itself
# zenml annotator register argilla --flavor argilla --authentication_secret=argilla_secrets
from zenml.client import Client
client = Client()
annotator = client.active_stack.annotator
# list dataset names
dataset_names = annotator.get_dataset_names()
# get a specific dataset
dataset = annotator.get_dataset("dataset_name")
# get the annotations for a dataset
annotations = annotator.get_labeled_data(dataset_name="dataset_name")
# launch the annotation interface via the CLI
# zenml annotator dataset annotate <dataset_name>
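The code example uses the ZenML Python SDK to interact with the Argilla annotator. Once the authentication secret and the annotator have been registered with the commented CLI commands, it obtains the annotator from the active ZenML stack, lists the dataset names, retrieves a specific dataset, and gets the annotations for that dataset. The final comment shows the CLI command that launches the annotation interface.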
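As a minimal sketch of how these calls might be combined, the helper below gathers the labeled data for several datasets in one pass. It relies only on get_dataset_names() and get_labeled_data(dataset_name=...) from the example above. The names collect_labeled_data and StubAnnotator are hypothetical and not part of ZenML or Argilla. The stub and its record layout are invented so the snippet runs without an Argilla server, and the real return type of get_labeled_data depends on the integration.

```python
from typing import Any, Dict, Iterable, Optional


def collect_labeled_data(
    annotator: Any, dataset_names: Optional[Iterable[str]] = None
) -> Dict[str, Any]:
    """Return {dataset_name: labeled data} for the requested datasets.

    Hypothetical helper: it only calls the two annotator methods used in
    the example above. If dataset_names is None, every dataset is fetched.
    """
    available = set(annotator.get_dataset_names())
    wanted = list(dataset_names) if dataset_names is not None else sorted(available)

    # Fail early on names the annotator does not know about.
    missing = [name for name in wanted if name not in available]
    if missing:
        raise ValueError(f"Unknown dataset(s): {missing}")

    return {name: annotator.get_labeled_data(dataset_name=name) for name in wanted}


class StubAnnotator:
    """In-memory stand-in for illustration only (an assumption, not a ZenML class)."""

    def __init__(self, data: Dict[str, list]) -> None:
        self._data = data

    def get_dataset_names(self) -> list:
        return list(self._data)

    def get_labeled_data(self, dataset_name: str) -> list:
        return self._data[dataset_name]


# Invented sample records; real annotation records will look different.
stub = StubAnnotator(
    {"reviews": [{"text": "great", "label": "positive"}], "tickets": []}
)
labeled = collect_labeled_data(stub)
print({name: len(records) for name, records in labeled.items()})
```

With a registered Argilla annotator, the same helper could presumably be called as collect_labeled_data(Client().active_stack.annotator) in place of the stub.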
Optimizing RAG Pipelines by fine-tuning custom embedding models on synthetic data with ZenML
Additional Resources
Argilla GitHub Repository
ZenML Argilla integration documentation
ZenML Argilla Integration SDK Docs