Streamline Data Annotation in ZenML Pipelines with Label Studio
Integrate Label Studio, a leading open-source annotation platform, with ZenML to make data annotation part of your ML workflows. Annotators label images, audio, text, and time series data in Label Studio, and the resulting annotations are available to steps in your ZenML pipelines.
Features with ZenML
- Seamless integration of data annotation steps into ZenML pipelines
- Support for various annotation types (image, audio, text, time series)
- Automated dataset registration and syncing with Label Studio
- Easy access to annotated data for downstream pipeline steps
- Works with ZenML’s cloud artifact stores (AWS, Azure, GCP)
Main Features
- Supports a wide range of annotation types and use cases
- User-friendly web interface for efficient data labeling
- Customizable label configurations for project-specific requirements
- Collaborative annotation with multiple users and roles
- Export annotations in standard formats for further analysis
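Label Studio label configurations are XML documents that combine object tags, which display the data, with control tags, which define the labels annotators can choose. As a minimal sketch, assuming a single-choice image classification project, the code below builds and inspects such a config using only Python's standard library. The helper names build_choices_config and labels_in_config and the control-tag name label are hypothetical; the $image placeholder assumes each task stores its image URL under the data key image.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_choices_config(labels: List[str], data_key: str = "image") -> str:
    """Build a labeling config for single-choice image classification.

    Hypothetical helper. Assumes each task stores the image URL under
    task["data"][data_key]; the "$<data_key>" placeholder refers to that key.
    """
    view = ET.Element("View")
    # Object tag: shows the data item to annotators.
    ET.SubElement(view, "Image", name="image", value=f"${data_key}")
    # Control tag: the set of labels, attached to the Image tag via toName.
    choices = ET.SubElement(
        view, "Choices", name="label", toName="image", choice="single"
    )
    for label in labels:
        ET.SubElement(choices, "Choice", value=label)
    return ET.tostring(view, encoding="unicode")


def labels_in_config(config: str) -> List[str]:
    """Return every Choice value declared in a labeling config string."""
    root = ET.fromstring(config)
    return [choice.get("value") for choice in root.iter("Choice")]


if __name__ == "__main__":
    config = build_choices_config(["cat", "dog"])
    print(config)
    print(labels_in_config(config))
```

Generating the config in code keeps the label set in one place, so the same list can be reused when validating annotations later in a pipeline.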
How to use ZenML with Label Studio
# Set up the Label Studio integration
# (Install the integration first if needed: zenml integration install label_studio)
# 1. Create a secret holding your Label Studio API key:
# zenml secret create label_studio_secrets --api_key="<your_label_studio_api_key>"
# 2. Register the Label Studio annotator:
# zenml annotator register label_studio --flavor label_studio --authentication_secret="label_studio_secrets"
# 3. Add the Label Studio annotator to your active stack:
# zenml stack update -an label_studio

from typing import Any, Dict, List

from zenml import pipeline, step
from zenml.client import Client


@step
def data_loader() -> List[Dict[str, Any]]:
    """Load the labeled tasks of a dataset from the active annotator."""
    client = Client()
    annotator = client.active_stack.annotator
    # Label Studio returns labeled tasks as a list of task dictionaries.
    return annotator.get_labeled_data(dataset_name="my_dataset")


@pipeline
def my_pipeline():
    """Define the pipeline using the data loader step."""
    data = data_loader()
    # Process the labeled data here


if __name__ == "__main__":
    my_pipeline()

# Additional CLI commands for working with Label Studio:
# - List all datasets:
#   zenml annotator dataset list
# - Get statistics for a specific dataset:
#   zenml annotator dataset stats <dataset_id>
This snippet shows how to set up and use the Label Studio annotator with ZenML. The setup comments create a secret holding the Label Studio API key, register the Label Studio annotator with that secret, and add the annotator to the active stack. In the pipeline, the data_loader step fetches the labeled data for the dataset named my_dataset from the active annotator so that later steps can process it. The closing comments list CLI commands for inspecting Label Studio datasets.
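The data_loader step passes on whatever the annotator returns. Assuming that is a list of tasks in Label Studio's JSON export format (each task holds its data plus an annotations list whose result entries carry the chosen labels), a downstream step could reduce it to (data, label) pairs as in the sketch below. The function names extract_choice_labels and label_counts and the control-tag name label are hypothetical; set from_name to the name of the Choices tag in your own labeling config.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def extract_choice_labels(
    tasks: List[Dict[str, Any]], from_name: str = "label"
) -> List[Tuple[Dict[str, Any], str]]:
    """Pair each task's data with the first label from its first usable annotation.

    Hypothetical helper; assumes Label Studio's JSON task format.
    """
    pairs: List[Tuple[Dict[str, Any], str]] = []
    for task in tasks:
        for annotation in task.get("annotations", []):
            if annotation.get("was_cancelled"):
                continue  # ignore annotations the labeler cancelled
            chosen = [
                choice
                for region in annotation.get("result", [])
                if region.get("type") == "choices"
                and region.get("from_name") == from_name
                for choice in region.get("value", {}).get("choices", [])
            ]
            if chosen:
                pairs.append((task["data"], chosen[0]))
                break  # keep only one annotation per task
    return pairs


def label_counts(pairs: List[Tuple[Dict[str, Any], str]]) -> Counter:
    """Count how many examples carry each label (a quick class-balance check)."""
    return Counter(label for _, label in pairs)


if __name__ == "__main__":
    example_tasks = [
        {
            "data": {"text": "great product"},
            "annotations": [
                {
                    "result": [
                        {
                            "from_name": "label",
                            "to_name": "text",
                            "type": "choices",
                            "value": {"choices": ["Positive"]},
                        }
                    ]
                }
            ],
        }
    ]
    pairs = extract_choice_labels(example_tasks)
    print(pairs)
    print(label_counts(pairs))
```

Keeping this logic in plain functions makes it easy to unit test; inside a pipeline you could call it from a step that receives data_loader's output.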
Data Annotation and Labeling MLOps with ZenML and Label Studio
Additional Resources
- End-to-End Computer Vision Example with Label Studio
- Label Studio Integration Documentation
- Label Studio Official Documentation