As the Department of Defense (DOD) and its coalition military partners pursue a range of autonomous, semi-autonomous, and human-machine teaming programs across domains, it is essential that those programs invest in the infrastructure required for program success. A development pipeline that follows best practices pioneered by commercial autonomy companies will speed up the development and deployment of autonomous systems and set the foundation for a scalable, continuously improving fleet.
Effective data management is at the heart of any modern development pipeline: Machine learning (ML) models require real-world data for iterative development and testing, but only a fraction of the data collected by on-vehicle sensors will enable model improvements. Programs must prioritize investment in a foundational platform that allows them to use the data they collect across the life of the program.
In autonomous systems, log data is any data collected on the vehicle or system corresponding to the autonomous task at hand, including everything from raw sensor inputs to pedal, yoke, or wheel actuation commands.
The scale of the sensor log data collected by autonomous vehicle (AV) programs is staggering: A single hour of driving in a commercial test vehicle or flying an intelligence, surveillance, and reconnaissance (ISR) platform can, in some cases, produce more than two terabytes of data collected from various cameras and sensor payloads, diagnostic sensors, and internal signals. When operating fleets of platforms in exercises or operations every day, the need for a robust data ingestion, processing, and management program becomes evident.
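To make that scale concrete, here is a back-of-the-envelope sketch. Only the two-terabytes-per-hour figure comes from the text above, and it is an upper-end case; the fleet size, daily operating hours, and operating days are hypothetical placeholders, so swap in a program's own numbers.

```python
# Rough estimate of raw log volume for a fleet of platforms.
# Only the 2 TB/hour figure comes from the article; every other
# number used with these functions is a hypothetical assumption.

TB_PER_PLATFORM_HOUR = 2.0  # upper-end figure cited in the text


def daily_volume_tb(platforms: int, hours_per_day: float,
                    tb_per_hour: float = TB_PER_PLATFORM_HOUR) -> float:
    """Return the raw log volume, in terabytes, a fleet produces in one day."""
    return platforms * hours_per_day * tb_per_hour


def annual_volume_pb(platforms: int, hours_per_day: float,
                     operating_days: int,
                     tb_per_hour: float = TB_PER_PLATFORM_HOUR) -> float:
    """Return the yearly raw log volume in petabytes (1 PB = 1000 TB)."""
    daily = daily_volume_tb(platforms, hours_per_day, tb_per_hour)
    return daily * operating_days / 1000.0


if __name__ == "__main__":
    # Hypothetical fleet: 10 platforms, 4 hours a day, 200 operating days a year.
    print(daily_volume_tb(10, 4))        # 80.0 TB per day
    print(annual_volume_pb(10, 4, 200))  # 16.0 PB per year
```

Even this modest hypothetical fleet would generate tens of terabytes a day, which is why ingestion and storage need to be planned rather than improvised.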
The firehose of data created from development and production fleets has enormous potential to accelerate developer velocity across an entire organization. When managed effectively, sensor data logs serve as a powerful resource for quick autonomy software development and refinement in a virtual environment, reducing the need for time-consuming real-world tests that slow down iterative development processes.
AV programs need to be able to review data from specific sensors, add and refine labels, and pull the instances that matter for re-testing or stack improvement, such as disengagements, out of hours of sensor data logs. A robust data management platform will enrich motion, actuation, radar, lidar, and camera feeds to make querying within and across real-world tests more efficient. Once the logs are annotated, the platform should enable programs to rapidly identify anomalies and stack performance issues by allowing users to search the logs for specific parameters, including stack disengagements, rapid decelerations, actor behaviors, and more. The platform should also facilitate a triage workflow that enables engineers to quickly review flagged events, separate true issues from false alarms, determine their severity, and understand root causes.
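As a minimal sketch of the kind of search described above, the following scans time-stamped samples for rapid decelerations and stack disengagements. The `LogSample` schema, the field names, and the 4 m/s² threshold are all hypothetical assumptions made for illustration; real log formats and thresholds will differ by program.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class LogSample:
    """One time-stamped sample from a drive log (hypothetical schema)."""
    log_id: str
    t: float                # seconds since the start of the log
    speed_mps: float        # vehicle speed in meters per second
    autonomy_engaged: bool  # True while the autonomy stack is in control


@dataclass(frozen=True)
class Event:
    log_id: str
    t: float
    kind: str


def find_events(samples: Iterable[LogSample],
                decel_threshold: float = 4.0) -> List[Event]:
    """Flag rapid decelerations and disengagements within each log.

    A rapid deceleration is a speed drop of at least `decel_threshold`
    m/s^2 between consecutive samples (assumed threshold). A disengagement
    is a transition from engaged to not engaged.
    """
    events: List[Event] = []
    previous: Dict[str, LogSample] = {}
    # Process samples grouped by log and ordered by time.
    for s in sorted(samples, key=lambda x: (x.log_id, x.t)):
        p = previous.get(s.log_id)
        if p is not None and s.t > p.t:
            decel = (p.speed_mps - s.speed_mps) / (s.t - p.t)
            if decel >= decel_threshold:
                events.append(Event(s.log_id, s.t, "rapid_deceleration"))
            if p.autonomy_engaged and not s.autonomy_engaged:
                events.append(Event(s.log_id, s.t, "disengagement"))
        previous[s.log_id] = s
    return events
```

The resulting list of events is what a triage queue would be built from: each entry points an engineer at a log and a timestamp instead of at hours of raw data.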
Furthermore, AV programs need to be able to share sensor data logs across their organization, ideally without forcing individuals to download massive raw sensor data log files. A good data management program will, for example, enable data to be shared with labeling and ML teams as part of the ML process, or with test engineers to assess system performance and adjust test priorities accordingly. Defense autonomy programs should think of themselves as an enterprise, which means taking advantage of enterprise-level infrastructure like shared cloud services and tools optimized for sharing across diverse, specialized teams.
Clearly, autonomy programs within the DOD and its equivalents in our allied nations must prioritize the acquisition or development of an enterprise data management platform. When it comes to the development of networks of AVs, the need for an enterprise-wide data ingestion, management, and triage platform is even greater.
Despite the obvious need for such capabilities, however, data management has thus far been overlooked within DOD autonomy programs. We have seen sensor data logs stored on hard drives and stashed under desks, unsecured and inaccessible across the enterprise. We have also seen the government hemorrhage its data across corporate networks, never to be seen again.
These are symptoms of a common problem when it comes to software-defined capabilities, including autonomous systems: Too often, the DOD wants to jump directly to building the vehicle, rather than building the infrastructure that will actually enable the capabilities to reach the field at scale. On top of that, program dollars are typically assigned to specific programs, meaning that there is little incentive to share resources for an enterprise-wide capability that spans multiple similar programs.
As we have learned from our experience in the commercial market, autonomy teams—including those at the DOD—that under-invest in data infrastructure and management capabilities will face significant hurdles that extend development timelines, inflate costs, and dampen outcomes.
U.S. defense autonomy programs should learn from our closest allies. The UK Ministry of Defence (MOD) is proactively including data management as part of its Human Machine Teaming (HMT) program. By investing in a robust data management capability early in the development of a suite of teamed autonomous systems spanning domains, the UK MOD is demonstrating superior program design expertise.
Now, the UK MOD just needs to pick the right solution: One built from the ground up for autonomy programs and optimized for collaborating across large enterprises.
Strada, Applied Intuition’s log data management and exploration platform, is built for ingesting and quickly making sense of sensor data across developers, executives, and government program managers. With Strada, testing teams, data platform teams, and autonomy engineering teams can quickly analyze field issues to identify priority areas for algorithm improvement.
Strada accelerates autonomy development by providing a web application that allows development teams to work with log data without downloading the original files. This helps teams save time and removes barriers to collaboration and data sharing. Strada can automatically ingest, classify, and tag specific events in log data based on predefined rules, making it seamless to search and extract datasets for further algorithm development and validation. Strada also comes with a web-based triage dashboard, allowing both engineering and non-engineering teams to view, triage, and verify events encountered in the real world.
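To illustrate the general idea of tagging events from predefined rules, here is a minimal, generic sketch. It is not Strada's API or rule format, which this article does not describe; the `Rule` type, the segment fields, and the thresholds are hypothetical assumptions chosen only to show the pattern.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical per-segment summary computed at ingest,
# e.g. {"min_accel_mps2": -5.2, "disengagements": 1}.
Segment = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    """A named predicate applied to every ingested segment."""
    tag: str
    predicate: Callable[[Segment], bool]


# Example rule set; tags and thresholds are illustrative assumptions.
RULES: Sequence[Rule] = (
    Rule("hard_brake", lambda seg: seg.get("min_accel_mps2", 0.0) <= -4.0),
    Rule("disengagement", lambda seg: seg.get("disengagements", 0) > 0),
    Rule("close_actor",
         lambda seg: seg.get("min_actor_distance_m", float("inf")) < 2.0),
)


def tag_segment(segment: Segment, rules: Sequence[Rule] = RULES) -> List[str]:
    """Return the tags of every rule the segment satisfies, in rule order."""
    return [rule.tag for rule in rules if rule.predicate(segment)]


def build_index(segments: Dict[str, Segment],
                rules: Sequence[Rule] = RULES) -> Dict[str, List[str]]:
    """Map each tag to the sorted IDs of the segments that matched it."""
    index: Dict[str, List[str]] = {}
    for seg_id in sorted(segments):
        for tag in tag_segment(segments[seg_id], rules):
            index.setdefault(tag, []).append(seg_id)
    return index
```

With an index like this, extracting a dataset for further development becomes a lookup by tag, for example `build_index(segments)["hard_brake"]`, rather than a manual pass through the raw files.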
Interested in learning more about how Strada can streamline your data ingestion, management, and triage needs? See the workflow.