Streamline Data Annotation with Prodigy and ZenML
Enhance your machine learning workflows by integrating Prodigy, a modern annotation tool, with ZenML. Together they support efficient data labeling, data inspection, and error analysis: you label and review data in Prodigy, then pull the resulting annotations directly into your ZenML pipelines, where better-labeled data can translate into better-performing models.
Features with ZenML
- Seamless Integration: Easily incorporate Prodigy as a data annotation step within your ZenML pipelines.
- Efficient Data Labeling: Leverage Prodigy's intuitive and optimized interface for fast and accurate data annotation.
- Flexible Workflow Customization: Customize annotation workflows using Prodigy's pre-built components and ZenML's extensible architecture.
- Streamlined Data Management: Effortlessly manage datasets, annotations, and metadata within the ZenML framework.
Main Features
- Intuitive and efficient web-based annotation interface
- Pre-built workflows for various annotation tasks
- Customizable scripts for data loading, saving, and annotation logic
- Extensible front-end with custom HTML and JavaScript support
- Optimized for fast and accurate data labeling
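Prodigy consumes text tasks as a stream of dictionaries, typically read from JSONL files where each line carries a "text" field, so a custom data-loading script can be an ordinary Python generator. The sketch below shows such a loader using only the standard library. The function name load_headlines and the choice to record the source line number under "meta" are illustrative assumptions, not part of Prodigy's or ZenML's API.

```python
import io
import json
from typing import Any, Dict, Iterator, TextIO


def load_headlines(lines: TextIO) -> Iterator[Dict[str, Any]]:
    """Yield Prodigy-style text tasks from a JSONL stream.

    Hypothetical loader: assumes each non-empty line is a JSON object
    with a "text" field, as in the news_headlines.jsonl example file.
    """
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing on them
        record = json.loads(line)
        text = record.get("text", "").strip()
        if not text:
            continue  # nothing to annotate on this line
        # Keep any existing metadata and note where the task came from.
        meta = dict(record.get("meta", {}))
        meta["line"] = line_no
        yield {"text": text, "meta": meta}


if __name__ == "__main__":
    # Invented sample input, for illustration only.
    sample = io.StringIO(
        '{"text": "Chipmaker unveils new AI accelerator"}\n'
        "\n"
        '{"text": "Parliament passes budget", "meta": {"source": "wire"}}\n'
    )
    for task in load_headlines(sample):
        print(task)
```

Because the loader is lazy, large files are read line by line rather than loaded into memory at once, which matches how Prodigy pulls tasks from a stream as the annotator works through them.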
How to use ZenML with Prodigy
# zenml annotator register prodigy --flavor prodigy
# optionally also pass in --custom_config_path="<PATH_TO_CUSTOM_CONFIG_FILE>"
# zenml stack register prodigy -o default -a default -an prodigy --set
# wget https://raw.githubusercontent.com/explosion/prodigy-recipes/master/example-datasets/news_headlines.jsonl
# Now annotate your data
# zenml annotator dataset annotate your_dataset --command="textcat.manual news_topics ./news_headlines.jsonl --label Technology,Politics,Economy,Entertainment"
# Access the annotations later from Python inside your pipelines
from typing import Any, Dict, List

from zenml import step
from zenml.client import Client


@step
def import_annotations() -> List[Dict[str, Any]]:
    # Use the annotator component of the currently active stack (Prodigy here)
    zenml_client = Client()
    annotations = zenml_client.active_stack.annotator.get_labeled_data(
        dataset_name="your_dataset"
    )
    # Do something with the annotations
    return annotations
This code snippet demonstrates how to import annotations from Prodigy within a ZenML step. It uses the ZenML client to access the active stack's annotator component and retrieves the labeled data for a specific dataset. The annotations can then be processed further in the pipeline.
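As one example of such further processing, the sketch below keeps only accepted records and turns each into a text plus a per-label score dictionary, similar in shape to spaCy's Doc.cats. It assumes each record follows the format of Prodigy's textcat.manual choice interface, with "text", "answer" ("accept", "reject" or "ignore") and an "accept" list of selected label IDs. The function name to_textcat_examples and the LABELS list are illustrative assumptions, so check the records your setup actually returns before relying on this.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical label set, mirroring the --label option used above.
LABELS = ["Technology", "Politics", "Economy", "Entertainment"]


def to_textcat_examples(
    annotations: List[Dict[str, Any]],
    labels: Sequence[str] = LABELS,
) -> List[Tuple[str, Dict[str, float]]]:
    """Convert accepted choice-interface records into (text, cats) pairs.

    Hypothetical helper: assumes Prodigy-style records with "text",
    "answer" and an optional "accept" list of selected label IDs.
    """
    examples = []
    for record in annotations:
        if record.get("answer") != "accept":
            continue  # drop rejected and ignored examples
        selected = set(record.get("accept", []))
        # 1.0 for every label the annotator picked, 0.0 for the rest.
        cats = {label: 1.0 if label in selected else 0.0 for label in labels}
        examples.append((record["text"], cats))
    return examples


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    sample = [
        {"text": "New phone launched", "answer": "accept", "accept": ["Technology"]},
        {"text": "Unclear headline", "answer": "ignore", "accept": []},
    ]
    for text, cats in to_textcat_examples(sample):
        print(text, cats)
```

In a pipeline, a helper like this could run in a downstream step that receives the output of import_annotations, keeping the conversion logic separate from the Prodigy-specific retrieval.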
Additional Resources
- ZenML Prodigy Integration Docs
- Prodigy Documentation
- Blog: How to annotate image data for object detection with Prodigy