Last updated: February 24, 2023
In today’s information age, we are bombarded with a constant stream of news and media from a variety of sources. Summaries, particularly of news sources, can be a powerful tool for consuming information efficiently. They distill complex or lengthy content into easily digestible chunks that can be scanned and absorbed quickly, allowing us to keep up with the news without being overwhelmed. They can also help us separate the signal from the noise, highlighting the most important details and helping us identify what’s worth further investigation.
What we built: ZenNews
This is where ZenNews comes into play. It offers a tool that uses ZenML to automate the summarization process and save users time and effort while providing them with the information they need. This can be particularly valuable for busy professionals or anyone who wants to keep up with the news but doesn’t have the time to read every article in full.
Why did we build it?
Apart from the advantages of solving a summarization task itself, this project aims to showcase some key benefits of using ZenML.
- ZenML features a simple and clean Python SDK. In this project, we leverage it to define our steps and pipelines and to access and manage the resources and artifacts that we interact with along the way. This project shows how such a design can significantly simplify the process of building robust applications.
- ZenML is an extensible framework. We realize that ML projects often require custom-tailored solutions that deviate from off-the-shelf offerings. This is why we employed base abstractions that empower users to craft their solutions without needlessly reinventing the wheel. Take a look, for instance, at the custom materializer and the custom stack component showcased in this project to see how effortlessly one can implement custom solutions.
- ZenML separates your code from your stack. In other words, it offers a distinct separation between the code and the underlying infrastructure. As you explore this example, you’ll notice how this separation can allow you to switch effortlessly between a local default stack and a remote deployment with scheduled pipelines, all with the simple flip of a flag.
- ZenML can help you to scale up. While this PoC-like example serves as evidence of ZenML’s potential to streamline workflows and hasten the development process, it merely scratches the surface of its capabilities. To delve deeper into the extensive possibilities that ZenML has to offer, we encourage you to check out our docs.
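To make the extensibility and code/stack separation points above more concrete, here is a minimal plain-Python sketch. It does not use ZenML's actual classes: `BaseAlerter`, `ConsoleAlerter`, `RecordingAlerter`, `STACKS`, and `run` are hypothetical names chosen for illustration, and the "remote" alerter only records messages instead of sending them anywhere. The idea it illustrates is that the reporting code depends on a base abstraction, so the concrete component can be swapped by name without touching that code.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class BaseAlerter(ABC):
    """Hypothetical base abstraction; not ZenML's actual alerter class."""

    @abstractmethod
    def post(self, message: str) -> bool:
        """Deliver a message and report whether delivery succeeded."""


class ConsoleAlerter(BaseAlerter):
    """Local stand-in: prints the message to stdout."""

    def post(self, message: str) -> bool:
        print(message)
        return True


class RecordingAlerter(BaseAlerter):
    """Stand-in for a remote alerter: stores messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def post(self, message: str) -> bool:
        self.sent.append(message)
        return True


# Hypothetical "stacks": each name maps to a concrete alerter implementation.
STACKS: Dict[str, Type[BaseAlerter]] = {
    "local": ConsoleAlerter,
    "remote": RecordingAlerter,
}


def report(summaries: List[str], alerter: BaseAlerter) -> bool:
    """Reporting code depends only on the base abstraction."""
    return alerter.post("\n".join(summaries))


def run(stack_name: str, summaries: List[str]) -> BaseAlerter:
    """Pick the alerter by stack name, send the report, and return the alerter."""
    alerter = STACKS[stack_name]()
    report(summaries, alerter)
    return alerter
```

Calling `run("local", [...])` prints the report, while `run("remote", [...])` routes the same report through the other implementation; the `report` function is identical in both cases.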
How does it work?
The ZenNews project is published as a PyPI package that you can install through pip:
It includes a main pipeline called zen_news_pipeline with three steps: collect, summarize, and report. In this version, each step has a single implementation. The collect step uses bbc_news_source, which collects articles from the BBC news feed. The summarize step uses the bart_large_cnn_samsum model to generate summaries. The report step creates a report and shares the results using an alerter. Additionally, the package includes a custom stack component called DiscordAlerter.
Lastly, the package also includes a CLI application named zennews, which serves as the primary interface for interacting with the pipeline and its steps.
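The three-step structure described above can be sketched in plain Python. This is an illustration under stated assumptions, not the zennews implementation: it does not use ZenML's step or pipeline decorators, `collect` returns hard-coded sample articles instead of reading the BBC feed, and `summarize` keeps the first sentence of each article as a naive stand-in for the summarization model. The names `Article` and `zen_news_sketch`, and the `send` parameter, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Article:
    title: str
    text: str


def collect(limit: int = 5) -> List[Article]:
    """Stand-in for a news source: returns fixed articles instead of fetching a feed."""
    samples = [
        Article("Storm hits coast", "A storm reached the coast on Monday. Roads were closed."),
        Article("New library opens", "The city opened a new library. It holds 50,000 books."),
    ]
    return samples[:limit]


def summarize(articles: List[Article]) -> List[str]:
    """Stand-in for a summarization model: keeps only the first sentence."""
    summaries = []
    for article in articles:
        first_sentence = article.text.split(". ")[0].rstrip(".") + "."
        summaries.append(f"{article.title}: {first_sentence}")
    return summaries


def report(summaries: List[str], send: Callable[[str], None] = print) -> str:
    """Format the summaries as a bulleted report and hand it to `send`."""
    body = "\n".join(f"- {s}" for s in summaries)
    send(body)
    return body


def zen_news_sketch(limit: int = 5, send: Callable[[str], None] = print) -> str:
    """Chain the three stages: collect -> summarize -> report."""
    return report(summarize(collect(limit)), send)
```

Because each stage only consumes the previous stage's output, any one of them can be replaced (a different source, a real model, a different delivery channel) without changing the other two.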
How do I use it?
Once you have installed the zennews package, you can immediately test it locally. By running the following command, you will retrieve the top five articles from the BBC news feed, summarize them, and display the results:
As an output, you should see:
You can also parameterize this process. In order to see the possible parameters, please use:
To get the most out of an application like zennews, we recommend scheduling the summarization pipelines instead of triggering them manually. You can do this with the --schedule option, provided your ZenML stack features an orchestrator that supports scheduling.
If you would like to see how to set up such a stack, visit the GitHub page, which contains a more detailed technical walkthrough of the implementation and explains how to reproduce it on your own system.
Where to go from here?
If you have any questions or feedback about this implementation of a news summarization tool and pipeline, let us know on Slack or join our weekly community meeting. If you want to know more about ZenML or see more examples, check out our docs, examples or our other projects.