The Defense Innovation Unit developed a system to detect illegal, unreported, and unregulated fishing vessels using satellite-based synthetic aperture radar (SAR) imagery and machine learning. They created a large annotated dataset of SAR images, developed ML models for vessel detection, and deployed the system to over 100 countries through a platform called SeaVision. The system successfully identifies "dark vessels" that turn off their AIS transponders to hide illegal fishing activities, enabling better maritime surveillance and law enforcement.
This case study comes from a seminar presentation by Jared Dunnmon, Senior Adviser for Strategic Initiatives at the Defense Innovation Unit (DIU), who previously served as Technical Director of the AI portfolio at DIU. The presentation covers both high-level challenges of deploying AI in the public sector and a specific deep-dive into a maritime monitoring system designed to detect illegal fishing activity using machine learning applied to synthetic aperture radar (SAR) satellite imagery.
The project represents a fascinating intersection of environmental protection, international maritime law enforcement, and cutting-edge ML systems deployed at global scale. The system was ultimately deployed to SeaVision, a maritime domain awareness platform used by over 100 countries worldwide.
Illegal, unreported, and unregulated (IUU) fishing represents a massive global challenge. According to the presentation, approximately one in five fish caught globally originates from IUU fishing. This problem is exacerbated by the fact that deep-water fishing fleets frequently overfish in other countries’ exclusive economic zones (EEZs), causing significant environmental, economic, and legal issues.
The International Maritime Organization requires ships over approximately 300 gross tons to be equipped with Automatic Identification Systems (AIS) – radio frequency transponders that broadcast ship identity, location, and activity. However, vessels engaged in illegal fishing typically disable these transponders to avoid detection, becoming what are termed “dark vessels.”
Traditional monitoring approaches face a fundamental scaling challenge: the ocean is enormous, and no country has sufficient patrol vessels to monitor everywhere simultaneously. This is particularly acute for nations like Pacific island states that have massive EEZs but limited coast guard resources.
The solution leverages the revolution in commercial space-based sensing, specifically synthetic aperture radar (SAR) satellites. SAR offers critical advantages over electro-optical (camera-based) imagery for this use case: it penetrates cloud cover and works day or night, providing reliable coverage of the open ocean regardless of weather or lighting.
SAR works by having a satellite emit radar signals while moving, then receiving the reflected signals. The resulting imagery shows reflectivity rather than visible light intensity, creating distinctive visual patterns where metal objects appear as bright star-like patterns. The imagery includes different polarizations (VV and VH) that reveal different features.
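Those bright, star-like returns from metal hulls suggest why adaptive thresholding over background clutter is a natural detection baseline. A toy sketch of the idea (this is an illustration, not the system's actual detector, which works on 2-D scenes with sliding-window, CFAR-style statistics):

```python
import statistics

def detect_bright_targets(backscatter, k=2.0):
    """Flag indices whose value exceeds mean + k * stdev of the scene.

    `backscatter` is a flat list of linear-power SAR pixel values.
    Open-ocean clutter is dim and roughly uniform, so a strongly
    reflecting metal object stands far above the background statistics.
    """
    mu = statistics.mean(backscatter)
    sigma = statistics.pstdev(backscatter)
    threshold = mu + k * sigma
    return [i for i, v in enumerate(backscatter) if v > threshold]

# Seven dim ocean pixels and one bright vessel-like return.
scene = [0.01, 0.02, 0.015, 0.01, 0.9, 0.012, 0.02, 0.011]
print(detect_bright_targets(scene))  # → [4]
```

Real detectors must also cope with speckle, sea state, and near-shore clutter, which is why the ML approach outperforms fixed rules.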
However, SAR imagery presents unique ML challenges compared to standard computer vision: its speckle noise statistics differ from those of natural images, vessels occupy only a handful of pixels in scenes tens of thousands of pixels across, and backbones pretrained on natural images transfer imperfectly.
Rather than building models entirely in-house, the DIU team partnered with Global Fishing Watch (a nonprofit focused on monitoring legal fishing) and other organizations to create a large-scale public dataset and challenge. This approach reflects a common pattern in government ML work: demonstrating value on open-source analogs before accessing classified or restricted data.
The resulting dataset, released publicly as the xView3-SAR benchmark, combined large Sentinel-1 SAR scenes with vessel labels produced by correlating imagery against AIS tracks and refining them with expert annotation.
The task design was carefully structured to match operational needs rather than simple object detection: participants had to find objects in each scene, classify them as vessel or non-vessel and as fishing or non-fishing, and estimate each vessel's length.
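A combined score of this shape, where detection quality gates the value of every downstream attribute, can be sketched as follows (the equal weighting here is an illustrative assumption, not the challenge's exact formula):

```python
def combined_score(f1_detect, f1_shore, f1_vessel, f1_fishing, length_acc):
    """Detection F1 multiplies the aggregate, so the operational
    attributes (close-to-shore detection, vessel vs non-vessel,
    fishing vs non-fishing, length accuracy) only add value for
    objects that were actually found.
    """
    return f1_detect * (1 + f1_shore + f1_vessel + f1_fishing + length_acc) / 5

# A model that detects well but classifies imperfectly:
print(round(combined_score(0.8, 0.6, 0.9, 0.7, 0.75), 3))  # → 0.632
```

The multiplicative structure reflects the operational reality: a perfectly classified vessel that was never detected is worthless to a coast guard.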
The competition attracted thousands of submissions from around the world, with performance improving substantially over time.
The presentation highlights several persistent MLOps challenges specific to public sector AI deployment:
Legacy infrastructure: Most government systems were built decades ago, before modern ML practices existed. The concept of software as a “living thing” requiring constant updates is not common practice in many government IT environments.
Hardware and software access: Application needs routinely outpace access to compute resources. Even when cloud environments exist, they may lack the GPU resources needed for ML workloads. Some use cases require on-premises deployment for security reasons, fundamentally changing the engineering trade-offs.
Acquisition timelines: The government budgets three years in advance, meaning current-year budgets were written before ChatGPT existed. This creates significant lag between capability availability and deployment authorization.
Model lifecycle management: The ability to create, deploy, and manage multiple models while ensuring the workforce has the necessary skills remains a significant challenge. Bureaucratic processes can interfere with the rapid update cycles that ML systems require.
Domain shift monitoring: Unlike cloud deployments where models can be easily monitored, edge deployments (discussed below) require on-device performance monitoring, out-of-distribution detection, and autonomous decision-making about when to trust model outputs.
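The on-device monitoring pattern described above can be sketched as a simple statistic computed against a training-time baseline (the statistic and the thresholds below are illustrative assumptions; real edge deployments would use richer tests, such as per-feature histograms):

```python
import statistics

def drift_score(train_values, live_values):
    """Standardized mean shift of live data relative to training data.

    Returns how many training standard deviations the live mean has
    moved. The deployment decision pattern is the point: when the score
    crosses a threshold, stop trusting the model's outputs.
    """
    mu = statistics.mean(train_values)
    sigma = statistics.pstdev(train_values) or 1.0
    return abs(statistics.mean(live_values) - mu) / sigma

train = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11]
assert drift_score(train, [0.10, 0.11, 0.10]) < 1.0   # in distribution
assert drift_score(train, [0.30, 0.32, 0.31]) > 3.0   # clear shift: flag it
```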
One of the most interesting technical challenges discussed involves efficient inference for operational deployment. The best-performing competition models were ensembles that took approximately 15 minutes to process a single scene on a V100 GPU. While this might seem acceptable for occasional analysis, the operational reality creates different constraints:
Communication bottleneck: The limiting factor for satellite-based monitoring is not compute but downlink bandwidth. Commercial satellite constellations typically don’t record images over open ocean because there’s insufficient economic value to justify the communication costs.
On-satellite processing: The ideal solution involves running compressed models directly on the satellite, processing images to identify interesting targets, and only transmitting relevant detections to ground stations. This requires dramatic model compression while maintaining detection performance.
Power and compute constraints: Satellite hardware operates under severe power and compute restrictions, similar to embedded or “tiny ML” applications but with the added complexity of space-qualified hardware requirements.
Latency requirements: For detections to be actionable, they must reach decision-makers within hours. This requires intelligent prioritization of what data to transmit.
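Taken together, the constraints above reduce to a simple pattern: quantify what fits in the downlink budget, and send the most actionable detections first. A minimal sketch, where all sizes, scores, and the priority rule are illustrative assumptions rather than mission parameters:

```python
import heapq

SCENE_BYTES = 30_000 * 25_000 * 2   # assumed 16-bit, single-polarization scene: 1.5 GB
DETECTION_BYTES = 64                # assumed: lat/lon, score, length, timestamp

def downlink_plan(detections, budget_bytes):
    """Pick which detections to transmit, highest priority first.

    Priority favors confident, fresh detections; this rule is a
    stand-in for whatever an operational system would actually use.
    """
    heap = [(-d["score"] / (1 + d["age_hours"] / 24), d["id"]) for d in detections]
    heapq.heapify(heap)
    sent = []
    while heap and (len(sent) + 1) * DETECTION_BYTES <= budget_bytes:
        _, det_id = heapq.heappop(heap)
        sent.append(det_id)
    return sent

dets = [
    {"id": "a", "score": 0.95, "age_hours": 1},
    {"id": "b", "score": 0.60, "age_hours": 0},
    {"id": "c", "score": 0.90, "age_hours": 6},
]
print(SCENE_BYTES // DETECTION_BYTES)         # detections that fit in one scene's bytes
print(downlink_plan(dets, budget_bytes=128))  # → ['a', 'c']
```

Even with these rough numbers, transmitting detections instead of raw scenes cuts the downlink volume by several orders of magnitude, which is what makes on-satellite processing attractive.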
The culmination of the project was deployment to SeaVision, a maritime domain awareness platform used by over 100 countries. The system overlays ML-derived SAR vessel detections on AIS tracks, so vessels detected by radar but broadcasting no AIS signal stand out as likely dark vessels.
The visualization powerfully demonstrates the scale of dark fishing activity. In areas like the waters off Chennai, India, or the Galápagos Islands, the number of untracked vessels vastly exceeds those visible through AIS alone. Manual analysis of SAR imagery at this scale is, in the speaker's words, “hard, slow, unreliable, and frankly not feasible” for most countries.
The team implemented an efficient queuing and inference system to minimize costs while providing coverage for a large number of countries. Even occasional patrols informed by this data can affect fishing behavior, providing deterrence value beyond direct enforcement.
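Rough arithmetic shows why efficient inference matters at this scale. Using the roughly 15 minutes per scene on a V100 quoted earlier, with scene volume and GPU pricing as illustrative assumptions:

```python
# Back-of-envelope serving cost for the unoptimized competition ensembles.
minutes_per_scene = 15          # winning-ensemble runtime per scene on a V100 (from the talk)
scenes_per_day = 400            # assumed global coverage queue
gpu_hours_per_day = minutes_per_scene * scenes_per_day / 60
v100_price_per_hour = 3.00      # assumed on-demand cloud price, USD
daily_cost = gpu_hours_per_day * v100_price_per_hour
print(f"{gpu_hours_per_day:.0f} GPU-hours/day ≈ ${daily_cost:,.0f}/day")
```

Every factor of compression or queuing efficiency scales this cost down directly, which is why the team invested in both.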
The presentation emphasizes that the vast majority of government AI systems are human-in-the-loop by design. Department of Defense policy specifies required levels of human control for various system types. For the dark fishing detection system, the ML flags potential targets, but humans make final determinations about vessel identity and enforcement actions.
This pattern – high recall ML to reduce human workload, with human decision-making retained – is described as particularly useful across government applications, from document analysis to financial systems.
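The high-recall pattern can be made concrete: choose the confidence threshold from validation data that preserves a target recall, then route everything above it to human review. A sketch, with data and the recall target as illustrative values:

```python
import math

def threshold_for_recall(scored_labels, target_recall):
    """Lowest confidence threshold that keeps `target_recall` of true targets.

    `scored_labels` is a list of (score, is_true_target) pairs from a
    validation set. The model flags everything at or above the returned
    threshold; humans make the final call on the flagged cases.
    """
    positives = sorted((s for s, y in scored_labels if y), reverse=True)
    k = math.ceil(target_recall * len(positives))  # true targets we must keep
    return positives[k - 1]

val = [(0.95, True), (0.90, False), (0.85, True), (0.70, True),
       (0.60, False), (0.40, True), (0.10, False), (0.05, True)]
t = threshold_for_recall(val, 0.75)
flagged = [s for s, _ in val if s >= t]
print(t, len(flagged))  # → 0.4 6
```

Lowering the threshold raises recall but grows the human review queue; the right operating point depends on how many analysts are available, not on a benchmark metric.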
The work was published at NeurIPS in the Datasets and Benchmarks track, demonstrating that public sector AI work can contribute to academic research. The dataset and challenge surfaced multiple open research problems, including efficient inference on very large SAR scenes, robustness to the label noise inherent in AIS-correlated ground truth, and monitoring for domain shift across sensors and sea states.
The speaker offers several insights for successful government AI projects: demonstrate value on open-source analogs before seeking access to restricted data, design tasks around the operational decision rather than benchmark convenience, and plan for human-in-the-loop operation from the start.
The case study demonstrates that while government AI faces unique challenges around infrastructure, acquisition, and regulatory constraints, it also offers unique opportunities: problems that don’t exist in the private sector, massive distribution potential, and genuine societal impact.
Predibase, a fine-tuning and model serving platform, announced its acquisition by Rubrik, a data security and governance company, with the goal of combining Predibase's generative AI capabilities with Rubrik's secure data infrastructure. The integration aims to address the critical challenge that over 50% of AI pilots never reach production due to issues with security, model quality, latency, and cost. By combining Predibase's post-training and inference capabilities with Rubrik's data security posture management, the merged platform seeks to provide an end-to-end solution that enables enterprises to deploy generative AI applications securely and efficiently at scale.
Volkswagen Group Services partnered with AWS to build a production-scale generative AI platform for automotive marketing content generation and compliance evaluation. The problem was a slow, manual content supply chain that took weeks to months, created confidentiality risks with pre-production vehicles, and faced massive compliance bottlenecks across 10 brands and 200+ countries. The solution involved fine-tuning diffusion models on proprietary vehicle imagery (including digital twins from CAD), automated prompt enhancement using LLMs, and multi-stage image evaluation using vision-language models for both component-level accuracy and brand guideline compliance. Results included massive time savings (weeks to minutes), automated compliance checks across legal and brand requirements, and a reusable shared platform supporting multiple use cases across the organization.
Stripe, processing approximately 1.3% of global GDP, has evolved from traditional ML-based fraud detection to deploying transformer-based foundation models for payments that process every transaction in under 100ms. The company built a domain-specific foundation model treating charges as tokens and behavior sequences as context windows, ingesting tens of billions of transactions to power fraud detection and improving card-testing detection from 59% to 97% accuracy for large merchants. Stripe also launched the Agentic Commerce Protocol (ACP) jointly with OpenAI to standardize how agents discover and purchase from merchant catalogs. Internally, AI adoption has reached 8,500 employees using LLM tools daily, with 65-70% of engineers using AI coding assistants and realizing significant productivity gains, such as reducing payment method integrations from 2 months to 2 weeks.