Company
Google DeepMind
Title
Building Deep Research: A Production AI Research Assistant Agent
Industry
Tech
Year
2024
Summary (short)
Google DeepMind developed Deep Research, a feature that acts as an AI research assistant, using Gemini to help users learn about any topic in depth. The system takes a query, browses the web for about five minutes, and outputs a comprehensive research report that users can review and ask follow-up questions about. It relies on iterative planning, a transparent research process, and a sophisticated orchestration backend to manage long-running autonomous research tasks.
Google DeepMind's Deep Research represents a significant step forward in deploying LLMs in production as autonomous research agents. This case study examines the technical and operational challenges of building and deploying such a system, drawing on discussions with the project's PM and Tech Lead. The core value proposition of Deep Research is to help users quickly gain a deep understanding of a topic by having an AI agent conduct comprehensive web research on their behalf. What makes this particularly interesting from an LLMOps perspective is how it handles long-running, autonomous tasks that require complex planning and orchestration.

**System Architecture and Key Components**

The system is built on several key technical foundations:

* Gemini 1.5 Pro as the base model, with custom post-training
* An asynchronous processing platform for handling long-running tasks
* A sophisticated planning and execution engine
* Web browsing and search capabilities
* Real-time progress tracking and transparency features

One of the most interesting aspects from an LLMOps perspective is the orchestration of long-running tasks. The team built a custom asynchronous platform that handles job failures, retries, and state management. This was necessary because, unlike typical synchronous chat interactions, Deep Research sessions can run for several minutes, and users need to be able to close their browser or app and return later to see results.

The system takes a novel iterative approach to planning: the model creates an initial research plan that users can review and modify, and the approved plan then serves as a contract for execution. The model can parallelize certain research steps while retaining the ability to reason over previous findings to inform subsequent searches, which requires careful management of context and search strategy.
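The asynchronous execution pattern described above can be sketched roughly as follows. This is a minimal illustration, not DeepMind's actual platform; the names (`ResearchJob`, `run_with_retries`), the retry limit, and the state machine are assumptions made for the example.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Hypothetical sketch of a long-running research job with retries and
# persisted state, so a client can disconnect and poll for the result later.

class JobState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"

@dataclass
class ResearchJob:
    query: str
    state: JobState = JobState.PENDING
    attempts: int = 0
    result: Optional[str] = None

async def run_with_retries(job: ResearchJob, step, max_attempts: int = 3) -> ResearchJob:
    """Run one long-lived research step, recording state transitions so the
    job's progress survives client disconnects."""
    job.state = JobState.RUNNING
    while job.attempts < max_attempts:
        job.attempts += 1
        try:
            job.result = await step(job.query)
            job.state = JobState.DONE
            return job
        except Exception:
            await asyncio.sleep(0)  # a real system would back off and log
    job.state = JobState.FAILED
    return job
```

The key design point is that the job object, not the client connection, is the source of truth: the UI only ever reads `state` and `result`.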
**Context Management and Memory**

A particularly challenging aspect was managing the trade-off between keeping information in context and using retrieval (RAG). The team found that for recent research tasks, where users are likely to ask follow-up questions, keeping information in context works better than retrieval, because it allows more complex comparisons between current and previous findings. For older tasks, information is moved to retrieval systems.

The system maintains extensive context (up to 1-2 million tokens) during research sessions, with RAG as a fallback when context limits are exceeded. This hybrid approach allows both detailed reasoning over recent findings and efficient retrieval of older information.

**Evaluation and Quality Assurance**

The team developed a sophisticated evaluation framework based on what they call an "ontology of research patterns." Rather than focusing on vertical-specific metrics, they identified underlying research behaviors such as:

* Broad but shallow exploration (e.g., shopping queries)
* Deep dives into specific topics
* Comparative analysis between options
* Project-based research with multiple components

This framework guides their evaluation process, combining automated metrics (such as the distribution of planning steps and execution time) with human evaluation of overall research quality.
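The hybrid context/retrieval policy above can be sketched as a simple budgeting function. Everything here is an illustrative assumption: the whitespace "tokenizer," the budget, and the function names are not the production system's, and a real implementation would use the model's actual tokenizer and a real retrieval index.

```python
# Hypothetical sketch: keep the newest findings verbatim in the model's
# context window; once the token budget is exhausted, older findings fall
# back to a retrieval (RAG) index instead.

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: ~1 token per whitespace word.
    return len(text.split())

def build_model_input(findings: list[str], budget: int = 1_000_000) -> dict:
    """findings is ordered oldest-to-newest. Newest findings are kept
    in context; once the budget is hit, everything older is routed to
    retrieval, keeping the in-context window contiguous and recent."""
    in_context, overflow = [], []
    used = 0
    for doc in reversed(findings):  # walk newest first
        tokens = count_tokens(doc)
        if not overflow and used + tokens <= budget:
            in_context.append(doc)
            used += tokens
        else:
            overflow.append(doc)
    return {"context": list(reversed(in_context)),  # restore chronological order
            "rag_index": overflow}
```

Note the deliberate choice to stop filling context at the first overflow rather than cherry-picking smaller old documents: this keeps the in-context material a contiguous, recent slice, which is what makes follow-up comparisons cheap.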
**Production Challenges and Solutions**

Several key production challenges emerged:

* Managing 5+ minute execution times while maintaining user engagement
* Handling failures and retries in long-running tasks
* Balancing model reasoning with web verification
* Ensuring consistent quality across different types of research queries
* Managing compute resources efficiently

The team addressed these through several innovative approaches:

* Showing real-time progress, including which websites are being accessed
* Building a robust asynchronous execution platform
* Implementing sophisticated retry and state management
* Presenting clear research plans that users can review and modify

**User Experience and Interface Design**

The team made several interesting design decisions to handle the unique challenges of a research agent:

* Showing the research plan upfront for transparency and modification
* Displaying real-time progress of web browsing
* Creating a side-by-side interface for report reading and follow-up questions
* Making citations and source checking easy and transparent

**Future Directions and Challenges**

The team identified several areas for future development:

* Better personalization based on user expertise and context
* Multimodal input and output capabilities
* Integration with private and subscription-based content
* More sophisticated verification mechanisms
* Enhanced security for accessing authenticated content

**Technical Implementation Details**

The system uses several sophisticated technical approaches:

* Custom post-training techniques to enhance research capabilities
* Careful prompt engineering to manage iterative planning
* Sophisticated orchestration for handling long-running tasks
* Context management systems for handling large amounts of information
* Real-time progress tracking and update systems

A particularly interesting technical challenge was implementing the iterative planning system in a generalizable way, without teaching planning strategies for each domain separately. This required a careful balance in post-training: enhancing research capabilities without losing pre-trained knowledge.

From an LLMOps perspective, this case study demonstrates the challenges and solutions involved in deploying LLMs for complex, long-running tasks in production. It highlights the importance of robust orchestration systems, careful context management, and transparent user interfaces when building AI agents that operate autonomously over extended periods.
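The "plan as contract" execution model discussed throughout can be sketched as follows. This is an assumed structure for illustration only: the stage layout and the `search` stub are not DeepMind's implementation, which the source does not detail.

```python
import asyncio

# Hypothetical sketch: the approved plan is a list of stages. Steps within
# a stage are independent and run concurrently; stages run in order, so
# later searches can condition on findings from earlier ones.

async def search(step: str) -> str:
    await asyncio.sleep(0)  # stand-in for a real web search/browse call
    return f"findings for {step}"

async def execute_plan(plan: list[list[str]]) -> list[str]:
    """Execute the user-approved plan stage by stage, fanning out
    independent steps with asyncio.gather."""
    findings: list[str] = []
    for stage in plan:
        results = await asyncio.gather(*(search(s) for s in stage))
        findings.extend(results)  # earlier findings available to later stages
    return findings
```

Treating the plan as the unit of execution is also what enables the transparency features described above: the UI can render the same stage list it is executing, marking steps as they complete.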
