Can an autonomous weapons system be trusted?
Last August, an expert group established by the United Nations met in Geneva to discuss lethal autonomous weapons systems (LAWS), their defining characteristics, the differing degrees of interaction between humans and machines, and their inherent security challenges. International humanitarian law has established that human control must be maintained over weapons systems, yet the type and extent of that control have not been defined.
The United States, China, Israel, South Korea, Russia, and the United Kingdom are developing weapons systems with decreasing levels of human engagement and oversight, according to the nongovernmental organization Human Rights Watch. The rapid advancement toward autonomous weapons technologies has sparked ongoing international debate; human rights organizations have called on nations to limit their autonomous weapons development, and an open letter signed by tech and robotics company leaders, including Elon Musk of SpaceX and Tesla, and Mustafa Suleyman of DeepMind, urged the UN to ban fully autonomous weapons.
As the debate rages on, Increment spoke with David Danks, the department head and L.L. Thurstone professor of philosophy and psychology at Carnegie Mellon University, about LAWS, cybersecurity, and the ethics of trust. His publications include the book Unifying the Mind: Cognitive Representations as Graphical Models and numerous research papers on how trust and identity are affected by AI systems.
This interview has been edited and condensed for length and clarity.
Increment: A definition for lethal autonomous weapons systems, or LAWS, has not been established. What do you consider to be a working definition for LAWS?
Danks: I actually don’t like the term LAWS. Especially in the weapons debate, it’s important not to think about an entire system as being autonomous, but rather to distinguish the kinds of autonomous capabilities that these weapons systems are going to have. A weapons system that is able to do perception [which means it can perceive a target and then take simple actions in regard to that] is distinct from one that relies on input from planning and has planning capabilities.
Now, within that, how to define [these systems]: What matters is that these are systems whose functioning is not entirely known or predictable, given the knowledge of the person deploying the system. If I have a robot and I say, “Clear this room of hostile forces,” I don’t know in advance how it’s going to move through the room. I don’t know what it’s necessarily going to consider hostile. It’s [about] predictability with regard to what you intended, and when that is lost, the systems can no longer be trusted.
AI systems often make decisions that are difficult for humans to predict or understand. How can human actors be assured of a system’s security?
The features that are often mentioned as desirable in an AI—prediction, explanation, transparency, reproducibility—are of value because that information is needed to trust a system in a surprising or novel circumstance. It’s very hard to know whether a system will do the right kind of thing unless you have an understanding of what the system prioritizes. How does it value the world? What are the actions that it has available to it? If a system were truly unpredictable, I don’t think anyone in a military chain of command would want to use it. The way that modern, industrialized Western militaries are structured, if something goes wrong with an autonomous weapon, it’s the person who deployed it who is going to be responsible for the failure. So militaries actually have a pretty strong reason not to use these systems unless they really trust them.
How are autonomous weapons systems susceptible to behavioral hacking? Does this necessarily involve an adversary penetrating a system?
When we think of hacking, we typically have in mind an attack that compromises a computational system, changing something about it—its underlying data or its code—and extracting information. We can also think about two other ways of exploiting a system. The first is basically feeding it bad data. You can do this in part by exhibiting behaviors that trick the system. There are some famous examples: You can put pieces of tape on a stop sign, and many of the perceptual systems in autonomous vehicles will perceive that as a speed limit sign. You aren’t changing the self-driving car—what you’re doing is changing its environment so that it behaves in entirely inappropriate ways. This requires you to have an understanding of how the cyber system will respond to various external inputs. In the autonomous systems world, this is also called “passive hacking.”
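To make that “bad data” idea concrete, here is a minimal sketch in the spirit of the stop-sign example. It uses a toy NumPy linear classifier; the labels, weights, and epsilon value are hypothetical stand-ins chosen for illustration, not details of any real perception system or of the attacks Danks describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear "perception" model: a positive score reads as "stop sign",
# anything else as "speed limit". Weights and input are random stand-ins.
w = rng.normal(size=100)             # fixed, already-trained weights (assumed)
x = rng.normal(size=100)             # a clean, flattened input
x = x * np.sign(w @ x)               # flip the input if needed so it scores as "stop sign"

def classify(v):
    return "stop sign" if w @ v > 0 else "speed limit"

# The "bad data" attack: nudge every feature slightly in the direction that
# lowers the score, in the spirit of the fast gradient sign method. The model
# itself (w) is never touched; only its input changes.
epsilon = 0.3                        # small per-feature change, like tape on a sign
x_adv = x - epsilon * np.sign(w)     # gradient of (w @ v) with respect to v is w

print(classify(x))                   # "stop sign"
print(classify(x_adv))               # typically flips to "speed limit"
```

The design point is the one Danks makes: nothing inside the model is compromised, yet a small, structured change to what it sees is enough to flip its output, which is why defending against this looks less like traditional cybersecurity and more like understanding how the system responds to its environment.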
A different way to think about behavioral hacking is to recognize that humans play a key role. You can find ways to change the human’s behavior to get them to do things that cause real problems for the cyber infrastructure. We’re used to thinking about things like spear phishing, or regular phishing, or the long-standing idea that the weakest link in any cyber system is actually the human user. This is different, because you’re not tricking humans into giving you information; rather, you’re tricking them into doing something that damages the system.
What other security concerns are most relevant?
What I referred to as passive hacking, the first kind of behavioral hacking, is incredibly important. You’re in a setting that is highly adversarial. If there is a way to trick the weapon into misperceiving what’s happening—and there will be a way, because every system has vulnerabilities—then we can expect the adversary to try very hard to construct those situations.
You can harden the weapons system all you want, computationally. You can put in all the cybersecurity you want. Almost certainly, just by the nature of the complexity of these systems, there will be ways to trick them. The only question is, can you make it sufficiently hard to trick the system so that it’s not worth the adversary’s effort? It requires rethinking the very way we approach the design of these systems, to recognize that there’s a whole set of questions we need to be asking that are not part of the normal development pipeline.
What other questions should developers be asking?
How do you tell the user, “Here are the contexts where this system’s going to work, and here are the ones where it probably won’t”? In general, we really have to think carefully about our intentions, values, and goals, and how we can ensure that technologies with autonomous capabilities are designed, developed, and deployed in order to support our intentions, values, and goals, rather than those of the company that produces them or some other third party.
What kinds of fail-safe measures can be established to ensure security?
One thing that’s very important to have is a particular kind of off button. If you give an off button to a system that works for you, the adversary is going to try very hard to push the button themselves. What we need is the opportunity for a human to make a go/no-go decision in advance of a system starting on its tactical mission. This is sometimes called “meaningful human control,” where the human is able to meaningfully look at the system and its context. That’s the kind of on/off switch we need to have, recognizing that once it goes, you may not be able to recall it. That has all kinds of implications, because there are loitering weapons that can linger for long periods of time with no human oversight and that could easily be made more autonomous. This is the kind of case where what’s needed is that ability to say yes or no to whether it starts its operation.
Will militaries develop fully autonomous weapons within the next decade?
If what you mean by “fully autonomous” is systems that we can deploy and trust in environments that we don’t control for long periods of time, I don’t think that full autonomy is coming any time soon. The history of both AI and cognitive science gives us reason to think that we’re very good at coming up with systems that are highly reliable and trustworthy in particular contexts, but are not necessarily good at figuring out what context they’re in or recognizing changes in context.
From a national security and defense perspective, the far more interesting technologies are the rise and spread of AI to improve battle planning, tactics, and quick response. In intelligence, surveillance, and reconnaissance (ISR), you have the ability to drop a giant sensor grid, tiny sensors that can communicate with one another over short distances and offer a comprehensive view of a battle space. According to some open-source reports, these things are being deployed. They use a lot of AI on the backend, and they’re shaping the way commanders understand the conflicts they’re engaged in. I think that will have a much bigger impact on the conduct of warfare than legions of killer robots wandering around.