ProPublica used LLMs to analyze a database of National Science Foundation grants flagged as "woke" by Senator Ted Cruz's office. The models helped journalists quickly surface patterns and assess why grants were flagged, with every AI-generated finding verified by a human before publication. The project shows how AI can responsibly accelerate data analysis in journalism without sacrificing accuracy or accountability.
ProPublica, a nonprofit investigative journalism organization, offers an instructive case study in integrating Large Language Models into professional journalism workflows while preserving editorial standards and ethical safeguards.
The core use case centered on a database of more than 3,400 National Science Foundation grants labeled as "woke" by Senator Ted Cruz's office. The challenge was to process this dataset efficiently enough to understand the patterns behind the flagging and to test the accuracy of the classifications, a task well suited to AI-assisted analysis provided journalistic integrity could be preserved.
**Technical Implementation and LLMOps Approach**
The implementation showcased several key LLMOps best practices:
* Careful Prompt Engineering: The team developed prompts that instructed the model to behave like an investigative journalist and to extract specific information categories (a hedged sketch follows this list):
  * `woke_description`: an analysis of why a grant might be considered "woke"
  * `why_flagged`: the specific reason the grant appears to have been flagged
  * `citation_for_flag`: direct evidence from the grant text supporting the classification
* Error Prevention: The system was explicitly designed to reduce hallucination by instructing the model to skip cases where it wasn't confident rather than guess.
* Human-in-the-Loop Verification: Every AI-generated insight was reviewed and confirmed by human journalists before publication, demonstrating a robust quality control process.
* Clear Documentation: The team maintained transparency about their AI usage, sharing their prompt engineering approach and methodology.
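The article shares the extraction field names but not the full implementation, so the following is a minimal sketch of how such a structured extraction might look. The `woke_description`, `why_flagged`, and `citation_for_flag` fields come from the case study; the client setup, model name, and prompt wording are illustrative assumptions, not ProPublica's actual code.

```python
import json
from openai import OpenAI  # assumes the openai Python client; any chat-completion API works

client = OpenAI()

PROMPT_TEMPLATE = """You are an investigative journalist reviewing an NSF grant.
Based only on the text below, return a JSON object with these fields:
- "woke_description": why this grant might be considered "woke"
- "why_flagged": the specific reason it appears to have been flagged
- "citation_for_flag": a direct quote from the grant supporting the flag
If you are not confident about a field, set it to null rather than guessing.

Grant text:
{grant_text}
"""

def analyze_grant(grant_text: str) -> dict | None:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; the article does not name the model used
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(grant_text=grant_text)}],
        response_format={"type": "json_object"},  # ask for parseable JSON
    )
    result = json.loads(response.choices[0].message.content)
    # Mirror the "skip when unsure" rule: drop records the model declined to fill in.
    if not result.get("citation_for_flag"):
        return None
    return result
```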
**Broader LLMOps Infrastructure**
ProPublica's approach to LLMOps extends beyond this single case. They have developed a broader framework for AI integration in journalism that includes:
* Self-Hosted Open-Source AI: For sensitive materials, such as in the Uvalde school shooting investigation, they used self-hosted AI tools to keep data and privacy under their own control (a sketch of this pattern follows this list).
* Multi-modal Applications: Their system can handle various data types, including text analysis and audio transcription.
* Verification Protocols: Established procedures for human verification of AI outputs, maintaining journalistic standards.
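To illustrate the self-hosted pattern, the sketch below talks to an open-source model behind a local OpenAI-compatible endpoint (as served by tools like Ollama or vLLM) and transcribes audio with the open-source Whisper model, so sensitive material never leaves newsroom hardware. The endpoint URL, model names, and function names are assumptions for illustration; the article does not specify ProPublica's stack.

```python
from openai import OpenAI
import whisper  # open-source speech-to-text (openai-whisper), runs entirely on local hardware

# Point the client at a locally hosted, OpenAI-compatible server instead of a
# cloud API, so sensitive source material stays on newsroom infrastructure.
local_client = OpenAI(
    base_url="http://localhost:11434/v1",  # illustrative: Ollama's default local endpoint
    api_key="unused-locally",              # local servers typically ignore the key
)

def summarize_sensitive_document(text: str) -> str:
    """Summarize a document with a locally hosted open-weights model."""
    response = local_client.chat.completions.create(
        model="llama3",  # illustrative open-weights model; the article names none
        messages=[
            {"role": "system", "content": "Summarize this document for a reporter. Do not invent details."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

def transcribe_locally(audio_path: str) -> str:
    """Transcribe audio on local hardware with open-source Whisper."""
    model = whisper.load_model("base")  # model size is an illustrative choice
    return model.transcribe(audio_path)["text"]
```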
**Risk Management and Ethical Considerations**
ProPublica's implementation shows careful consideration of AI risks and ethical concerns:
* Data Privacy: Using self-hosted solutions when dealing with sensitive information
* Accuracy Verification: Multiple layers of human review (a minimal sketch of one such review gate follows this list)
* Transparency: Open communication about AI usage in their reporting
* Bias Prevention: Careful prompt engineering to avoid leading the AI toward predetermined conclusions
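One way to enforce that review gate is to treat every model output as unpublishable until a named journalist signs off. The sketch below is a hypothetical version of such a workflow; the record fields and the `review` helper are assumptions, not ProPublica's internal tooling.

```python
from dataclasses import dataclass

@dataclass
class GrantFinding:
    grant_id: str
    ai_output: dict                  # raw model output, e.g. the fields extracted earlier
    verified_by: str | None = None   # journalist who confirmed the finding
    notes: str = ""

    @property
    def publishable(self) -> bool:
        # Nothing reaches publication without a named human reviewer.
        return self.verified_by is not None

def review(finding: GrantFinding, journalist: str, confirmed: bool, notes: str = "") -> GrantFinding:
    """Record a human verdict; unconfirmed findings stay unpublishable."""
    if confirmed:
        finding.verified_by = journalist
    finding.notes = notes
    return finding
```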
**Results and Impact**
The implementation proved successful in several ways:
* Efficiency: Rapid analysis of large datasets that would have been time-consuming to process manually
* Accuracy: The system helped identify patterns that might have been missed in manual review
* Scalability: The approach has been successfully applied to multiple investigations
* Maintainability: The framework allows for consistent application across different types of investigations
**Lessons Learned and Best Practices**
The case study reveals several important lessons for LLMOps implementation:
* Always maintain human oversight and verification
* Design prompts carefully to prevent hallucination and bias
* Be transparent about AI usage and methodology
* Use AI as a tool to augment human expertise, not replace it
* Implement proper security measures for sensitive data
**Technical Challenges and Solutions**
The team faced several technical challenges:
* Handling Large Datasets: Developed efficient processing pipelines (a sketch of a resumable pipeline follows this list)
* Maintaining Accuracy: Created robust verification workflows
* Privacy Concerns: Implemented self-hosted solutions where necessary
* Integration with Existing Workflows: Developed processes that complement rather than disrupt existing journalistic practices
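The article does not describe the pipeline internals, but a common pattern for pushing 3,400+ records through an LLM is to process them with checkpointing, so an interrupted run can resume without repeating completed API calls. The sketch below assumes that pattern and reuses the hypothetical `analyze_grant` helper from the earlier sketch.

```python
import json
from pathlib import Path

CHECKPOINT = Path("analyzed_grants.jsonl")

def run_pipeline(grants: list[dict]) -> None:
    """Analyze each grant once, appending results to a JSONL checkpoint."""
    done = set()
    if CHECKPOINT.exists():
        done = {json.loads(line)["grant_id"] for line in CHECKPOINT.open()}

    with CHECKPOINT.open("a") as out:
        for grant in grants:
            if grant["grant_id"] in done:
                continue  # resume support: skip records finished in a prior run
            result = analyze_grant(grant["text"])  # hypothetical helper defined earlier
            if result is None:
                continue  # model declined to answer; route to manual review instead
            result["grant_id"] = grant["grant_id"]
            out.write(json.dumps(result) + "\n")
```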
**Future Developments**
ProPublica continues to explore new applications of AI in journalism while maintaining their commitment to responsible use. They are:
* Expanding their AI capabilities while maintaining strict ethical guidelines
* Developing new verification protocols as AI technology evolves
* Sharing their learnings with other newsrooms
* Investigating potential AI risks and abuses as part of their journalistic mission
This case study represents a thoughtful, practical approach to deploying LLMs in a domain where accuracy and ethics are paramount, and it shows how AI can augment human expertise rather than replace it.