Artificial intelligence systems’ need for access to many large datasets often doesn’t align with current cybersecurity fundamentals and implementations.
Cybersecurity is commonly regarded as the biggest strategic challenge confronting the United States. Recent headlines only reinforce this view, as every day seems to bring with it the announcement of a new vulnerability, hack or breach. Beginning in 2013, the U.S. intelligence community ranked cybersecurity as the No. 1 threat facing the nation in its annual global threat assessments. Only in 2021, at the height of a global pandemic, did cybersecurity lose its top spot.
However, there is one major fault with the commonly accepted wisdom about cybersecurity: It has a blind spot.
More specifically, traditional cybersecurity measures all too frequently fail to account for data science methodologies and the vulnerabilities that are unique to artificial intelligence systems. The policies being developed and deployed to secure software systems do not account for data science activities and the AI systems they give rise to, namely the user’s or system’s need for access to many large datasets in a manner that often doesn’t align with current cybersecurity fundamentals and implementations. This means that just as emerging technologies like AI and data analytics are gaining traction -- motivating policy after policy championing their benefits -- today’s software security practices are fundamentally blind to the challenges they create. This is because the new technologies require and receive unfettered access to the underlying data, and they rely on that data being trusted and high-quality to ensure the resulting algorithms and data science products are accurate.
We cannot simultaneously have both more AI and more security -- at least not without significantly adjusting how we approach securing software and data.
The Biden administration’s recently released Executive Order on Improving the Nation’s Cybersecurity is an ambitious and thoughtful attempt at addressing this paradox. However, it contains significant gaps that mirror the ways in which data science’s impact on cybersecurity is often overlooked. Ultimately, we need to help the right hand of cybersecurity develop a better understanding of what the left hand of data science is doing.
Embracing zero trust
How can agencies maintain security in an environment plagued by threat actors? One prominent answer is to embrace a zero trust model -- a concept at the heart of the executive order -- which requires assuming breaches in nearly all scenarios.
Exactly what this means in practice is clear in the environment of traditional software and controls: implementing risk-based access controls, ensuring that least-privilege access is implemented by default and embedding resiliency requirements into network architectures to minimize single points of failure.
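To make "least-privilege access by default" concrete, here is a minimal sketch in Python. It assumes a simple in-memory policy; the `AccessPolicy` class and the user and dataset names are hypothetical illustrations, not part of the executive order or of any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class AccessPolicy:
    """Hypothetical deny-by-default policy: users may read only what they were granted."""

    # Maps a user name to the set of dataset names that user may read.
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, user: str, dataset: str) -> None:
        """Explicitly allow one user to read one dataset."""
        self.grants.setdefault(user, set()).add(dataset)

    def is_allowed(self, user: str, dataset: str) -> bool:
        """Refuse anything that was not explicitly granted."""
        return dataset in self.grants.get(user, set())


# Illustrative names only.
policy = AccessPolicy()
policy.grant("analyst", "claims_2020")
```

Under this sketch, `policy.is_allowed("analyst", "claims_2020")` is true, while a request for any dataset that was never granted is refused.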
However, the problem is that none of this accounts for data science, which requires continuous access to data. It’s rare that data scientists even know all the data required at the beginning of any one analytics project. Instead, they frequently require access to all the available data to deliver a model that sufficiently solves the problem at hand.
So how does zero trust fit into this environment, where users building AI systems actively require access to voluminous amounts of data? The simple answer is that it does not. The more complicated answer is that zero trust works for applications and production-ready AI models but not for training AI.
A new kind of supply chain
The idea that software systems suffer from a supply chain issue is also common wisdom. These systems are complex, and it can be easy to hide or obscure vulnerabilities within this complexity. This is, at least in part, why the executive order so forcefully emphasizes the importance of supply chain management for both the physical hardware and the software running on it.
However, the problem is again one of mismatch. Efforts focused on software security do not apply to data science environments, which are predicated on access to data that in turn forms the foundation for AI code. Whereas humans painstakingly program software line-by-line in traditional systems, AI is largely “programmed” by the data it is trained upon, creating new cybersecurity vulnerabilities and challenges.
What, then, can be done about these types of security issues? The answer, like so many other things in the world of AI, is to focus on the data. Knowing where the data came from, knowing how it has been accessed and by whom, and tracking that access in real time are the only long-term ways to monitor for and address these evolving vulnerabilities. To ensure that both software and AI are secure, organizations must add efforts to track data to the already complicated supply chain.
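As a rough illustration of what tracking data could look like, the Python sketch below records where a dataset came from, fingerprints its contents and logs each access. It is a sketch under stated assumptions, not a prescribed implementation: the `DatasetRecord`, `AccessEvent` and `register_dataset` names, and the sample source and user, are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class AccessEvent:
    user: str        # who accessed the data
    purpose: str     # why they accessed it
    timestamp: str   # when, as an ISO 8601 UTC string


@dataclass
class DatasetRecord:
    name: str
    source: str      # where the data came from
    sha256: str      # fingerprint of the content at registration time
    access_log: List[AccessEvent] = field(default_factory=list)

    def record_access(self, user: str, purpose: str) -> AccessEvent:
        """Append an access event at the moment the data is read."""
        now = datetime.now(timezone.utc).isoformat()
        event = AccessEvent(user, purpose, now)
        self.access_log.append(event)
        return event

    def is_unchanged(self, content: bytes) -> bool:
        """Check whether the content still matches the registered fingerprint."""
        return hashlib.sha256(content).hexdigest() == self.sha256


def register_dataset(name: str, source: str, content: bytes) -> DatasetRecord:
    """Create a provenance record for a dataset (hypothetical helper)."""
    return DatasetRecord(name, source, hashlib.sha256(content).hexdigest())


# Illustrative data and names only.
raw = b"id,label\n1,benign\n2,malicious\n"
record = register_dataset("training_v1", "vendor-feed", raw)
record.record_access("alice", "model training")
```

The fingerprint check is one simple way to notice that training data changed after it was registered; a real deployment would also need to protect the log itself from tampering.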
A new kind of scale -- and urgency
Perhaps most important, cybersecurity vulnerabilities tend to grow in proportion to a system’s underlying code base, and I believe this will matter more as AI is adopted more widely. As we move to a world in which data itself is the code, these vulnerabilities will scale with the data an AI system is trained upon, meaning threats will grow as fast as that data does rather than as fast as hand-written code. Based simply on the growing volume of data we generate as we deploy more AI, we are simultaneously creating an ever-expanding attack surface.
The good news is that this new AI-driven world will give rise to boundless opportunities for innovation. The intelligence community will know more about adversaries in as close to real time as possible. The armed forces will benefit from a new type of strategic intelligence, which will reshape battlefield boundaries and enhance their speed of response. However, this future is also likely to be afflicted with insecurities that are destined to grow at rates faster than human comprehension allows.
To take cybersecurity seriously, agencies must understand and address how AI creates and exacerbates these vulnerabilities. The same goes for strategic investments in AI. The long-term success of the nation’s cybersecurity policies will rest on how accurately they apply to the world of AI.