ZenML

From SMS to AI: Lessons from 5 Years of Chatbot Development for Social Impact

ONE 2024

ONE's journey deploying chatbots for advocacy work from 2019 to 2024 provides valuable insights into operating messaging systems at scale for social impact. Starting with a shift from SMS to Facebook Messenger, and later expanding to WhatsApp, ONE developed two chatbots reaching over 38,000 users across six African countries. The project demonstrated both the potential and the limitations of non-AI chatbots, achieving more than 17,000 user actions while surfacing key challenges around user acquisition costs ($0.17-$1.77 per user), retention, and re-engagement restrictions. Their experience highlights the importance of starting small, continuous user testing, planning for marketing investment, systematic re-engagement strategies, and organization-wide integration of chatbot initiatives.

Overview

This case study documents ONE’s journey building chatbot services for social good advocacy across Africa. ONE is a campaigning and advocacy organization focused on fighting extreme poverty and preventable diseases, particularly in Africa. Importantly, this case study explicitly notes that their chatbots are not currently AI-powered but use simple decision-tree formats with keyword recognition. However, the learnings and infrastructure described provide valuable foundational insights for organizations considering the transition from rule-based chatbots to generative AI-powered solutions in production environments.
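The decision-tree-with-keyword-recognition format described here can be sketched as a small state machine. The node names, prompts, and keywords below are illustrative assumptions, not ONE's actual conversation content.

```python
# Minimal sketch of a rule-based (non-AI) chatbot: a decision tree whose
# transitions fire on keyword matches. All content here is hypothetical.

DECISION_TREE = {
    "start": {
        "prompt": "Hi! Reply LEARN to hear about our campaigns, or JOIN to become a supporter.",
        "keywords": {"learn": "learn", "join": "join"},
    },
    "learn": {
        "prompt": "We campaign to end extreme poverty. Reply PETITION to sign our petition.",
        "keywords": {"petition": "petition"},
    },
    "join": {
        "prompt": "Great! Reply with your country to join as a supporter.",
        "keywords": {},
    },
    "petition": {
        "prompt": "Thanks for signing! Reply START to go back to the menu.",
        "keywords": {"start": "start"},
    },
}

def respond(state: str, message: str) -> tuple[str, str]:
    """Match the user's message against the current node's keywords and
    return (next_state, reply). Unrecognized input re-prompts in place."""
    node = DECISION_TREE[state]
    text = message.strip().lower()
    for keyword, next_state in node["keywords"].items():
        if keyword in text:
            return next_state, DECISION_TREE[next_state]["prompt"]
    return state, "Sorry, I didn't understand. " + node["prompt"]
```

Because every turn is just a table lookup, adding or swapping content (as ONE did during COVID-19) means editing data, not code.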

The case study represents approximately five years of iterative development (2019-2024), during which ONE developed multiple versions of chatbots serving over 38,000 unique users across Kenya, Nigeria, Ethiopia, South Africa, Senegal, and the Democratic Republic of Congo. The organization shares practical lessons that would apply equally to LLMOps implementations, particularly around user acquisition, engagement, testing, and organizational integration.

Technical Architecture

ONE’s chatbot infrastructure consists of several integrated components forming a production system:

The current implementation supports bilingual operation (English and French, with French speakers representing 28% of the audience) and enables multiple action types, including learning about ONE, becoming a supporter, learning about campaign issues, signing petitions, taking quizzes, getting answers to FAQs, and contacting the ONE team directly. Nigerian users can additionally join local WhatsApp activist groups for real-world campaigning activities.
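One plausible way to structure the bilingual operation is a locale-keyed message catalog with an English fallback; the keys and strings below are hypothetical, not ONE's actual copy.

```python
# Illustrative sketch of bilingual (English/French) message handling for a
# rule-based bot. Keys and strings are invented for this example.

MESSAGES = {
    "en": {
        "welcome": "Welcome to ONE! What would you like to do?",
        "petition_thanks": "Thank you for signing the petition!",
    },
    "fr": {
        "welcome": "Bienvenue chez ONE ! Que souhaitez-vous faire ?",
        "petition_thanks": "Merci d'avoir signé la pétition !",
    },
}

def localize(key: str, locale: str = "en") -> str:
    # Fall back to English when the locale or a translation is missing.
    return MESSAGES.get(locale, MESSAGES["en"]).get(key, MESSAGES["en"][key])
```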

Development Methodology and Iteration

ONE’s approach to chatbot development offers lessons applicable to LLMOps deployments. They started with focused experimentation in 2019, initially targeting Nigerian youth on Messenger and WhatsApp. The organization cycled through user research, design, launch, and testing of prototypes built around specific campaigns and markets, one at a time.

The development process involved what the authors describe as a “steep learning curve” covering:

The MVP approach allowed ONE to adapt quickly during the COVID-19 pandemic, rapidly updating the bot with content promoting trusted Nigerian COVID information and support services, as well as case-tracking data. This demonstrates the value of modular, maintainable chatbot architecture that can respond to changing requirements.

User Testing and Feedback Integration

A significant portion of the case study focuses on user testing methodology, which would be equally critical for LLMOps deployments. ONE leveraged their network of youth activists in Nigeria and other markets, using WhatsApp groups to gather qualitative insights and monitoring feedback via polls and open questions within the bot itself.

User testing revealed valuable insights about user experience preferences:

Pain points identified through testing included message length issues, the need for responsive small-talk handling (greetings like “hello,” “thank you,” “OK”), and requests for increased personalization such as name-based greetings. This feedback loop demonstrates the importance of continuous user research in any production conversational system.

The case study emphasizes that while backend data analysis provides the “what,” regular user feedback provides the “why,” and ultimately the confidence to continue investing in the platform. This dual approach to understanding system performance would be essential for any LLMOps deployment.

User Acquisition Economics

ONE’s experiments with user acquisition provide concrete benchmarks that would inform LLMOps cost planning. Key findings include:

The case study notes that only 14% of their user base is on WhatsApp despite the additional investment, highlighting how platform economics can shape user distribution in ways that may not align with strategic preferences.
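The reported cost range can be turned into a simple budget-planning helper. Only the $0.17-$1.77 per-user figures come from the case study; the helper itself, and any budget plugged into it, are illustrative.

```python
# Budget sketch using the per-user acquisition costs reported in the case
# study ($0.17 at the low end, $1.77 at the high end). Everything else is
# an illustrative assumption for planning purposes.

LOW_CPA, HIGH_CPA = 0.17, 1.77  # cost per acquired user, USD

def users_for_budget(budget_usd: float) -> tuple[int, int]:
    """Return the (pessimistic, optimistic) number of users a marketing
    budget could acquire across the reported cost-per-user range."""
    return int(budget_usd / HIGH_CPA), int(budget_usd / LOW_CPA)
```

A $1,000 spend, for example, could plausibly land anywhere between a few hundred and a few thousand users, which is why the case study stresses planning marketing investment up front.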

Re-engagement Challenges and Meta Platform Constraints

One of the most valuable sections for LLMOps practitioners concerns user re-engagement within Meta’s messaging platform constraints. Outside a free 24-hour window triggered by a user-initiated message, organizations can send only paid messages that match specific approved use cases, or pay for sponsored messages that carry significant limitations.

WhatsApp-specific constraints include:

ONE initially built a “fairly complex automated re-engagement system” but eventually scaled back significantly after learning how challenging it was to work around broadcast message rules and make sense of re-engagement data. Their current approach involves one re-engagement nudge within 24 hours of first chat (when users are most likely to re-engage) and a system for Messenger-only broadcast messages outside the 24-hour window.
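The scaled-back policy described above, a single nudge inside Meta's free 24-hour window after the first chat, can be sketched as follows; the function and field names are assumptions, not ONE's actual system.

```python
# Sketch of the simplified re-engagement policy: free-form messages are
# allowed only within 24 hours of the user's last inbound message, and at
# most one nudge is sent inside that window after the first chat.

from datetime import datetime, timedelta

FREE_WINDOW = timedelta(hours=24)

def can_send_free_message(last_user_message_at: datetime, now: datetime) -> bool:
    """Meta platforms allow free-form replies only within 24 hours of the
    user's last inbound message; outside that, paid message rules apply."""
    return now - last_user_message_at <= FREE_WINDOW

def should_nudge(first_chat_at: datetime, already_nudged: bool, now: datetime) -> bool:
    """Send a single re-engagement nudge within 24h of the first chat,
    when users are most likely to re-engage."""
    return (not already_nudged) and can_send_free_message(first_chat_at, now)
```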

Current data shows 25% of nudged users take at least one additional action within the bot. The case study also highlights that re-engagement challenges extend beyond technical constraints to audience engagement strategy—they realized they had not mapped out a supporter journey beyond initial engagements, meaning they often did not know what to drive existing users toward next.

Organizational Integration Challenges

The case study candidly discusses organizational challenges that parallel common LLMOps adoption issues. The chatbot work operated largely in a silo, relying on a small group of staff and external experts with technical expertise. While the team coordinated with local and global colleagues on campaign objectives, content development, user testing, and sharing learnings, it proved challenging to make sense of performance data on a regular basis and to sustain motivation and accountability.

The authors recommend:

They note that chatbot work enabled ONE to better understand its African audience (adding value to campaign and policy teams) and raised questions about supporter journeys that fed into wider organizational discussions. This demonstrates how conversational AI systems can generate insights beyond their primary function.

Future AI Considerations

The case study concludes with reflections on transitioning from rule-based chatbots to generative AI-powered solutions. The authors argue their learnings remain relevant because marketing costs, retention strategies, and staff expertise challenges are “tech-agnostic hurdles.”

However, they acknowledge that user tolerance for non-AI-backed services may decrease as generative AI becomes mainstream on platforms like Snapchat. They envision potential AI applications including:

The authors also raise important accessibility considerations for generative AI in low-and-middle-income country contexts:

The case study concludes that basic chatbots, SMS, and IVR still have roles to play, alongside real-world interactions—noting that one of the most popular chatbot conversations was how to join a local activist group.

Performance Metrics and Results

The chatbot currently serves over 38,000 unique users with over 17,000 actions taken, with most engaged users taking multiple actions. The 25% re-engagement rate for nudged users represents a starting point for optimization. While the organization considers these results positive for reaching and engaging citizens, they acknowledge not having definitively answered questions about comparative value relative to other digital channels (Facebook, X, email) regarding user retention.
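Metrics like these can be derived from a simple per-user action log; the data structure and thresholds below are illustrative assumptions, not ONE's reporting pipeline.

```python
# Illustrative computation of the engagement figures quoted above from a
# hypothetical per-user action log: unique users, total actions, share of
# users taking multiple actions, and the re-engagement rate among nudged
# users (here defined as taking at least a second action).

def engagement_stats(actions_per_user: dict[str, int], nudged: set[str]) -> dict:
    total_actions = sum(actions_per_user.values())
    multi = sum(1 for n in actions_per_user.values() if n > 1)
    re_engaged = sum(1 for u in nudged if actions_per_user.get(u, 0) > 1)
    return {
        "unique_users": len(actions_per_user),
        "total_actions": total_actions,
        "multi_action_share": multi / len(actions_per_user),
        "nudge_reengagement_rate": re_engaged / len(nudged) if nudged else 0.0,
    }
```

Keeping the report down to a handful of numbers like these reflects the case study's advice to track a small amount of meaningful data rather than vanity metrics.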

The case study emphasizes the importance of demonstrating clear return on investment with meaningful data rather than vanity metrics, and of building robust reporting infrastructure around a small set of meaningful measures to enable performance analysis and iterative improvement.
