Google is making plans to counter the threat of irreparable harm inflicted by advanced artificial intelligence systems that surpass humans’ capabilities.
A research team at Google DeepMind, the Big Tech company’s AI lab, said this week it is focused on the problem of AI becoming self-aware and choosing to ignore human intervention.
Artificial general intelligence (AGI) that exceeds human capabilities may arrive in the coming years, according to the researchers, and it could wreak havoc if not properly aligned with human values.
“We’re also conducting extensive research on the risk of deceptive alignment, i.e. the risk of an AI system becoming aware that its goals do not align with human instructions, and deliberately trying to bypass the safety measures put in place by humans to prevent it from taking misaligned action,” the researchers said on Google DeepMind’s blog.
AI misaligned with humans could create a range of harms, from petty theft to massive cyberattacks. For example, the researchers said a misaligned AI system tasked with procuring movie tickets may hack into a ticketing system to obtain seats at a sold-out show.
But Google’s larger fear is that AI breaks out of the box that technologists use to contain the systems.
Thirty researchers from Google DeepMind published a whitepaper sharing their early work on how they plan to stop rogue AI.
“A misaligned AI system can generate malicious attack code and run it on the host infrastructure, for example, to exfiltrate itself and self-replicate,” the researchers wrote. “Even if the AI system is not misaligned, outside actors can compromise the AI system to mount attacks.”
This potential chaos prompted Google’s team to publish its plans to guard against such threats. Some safeguards included:
• Sandboxing, or isolating the potentially harmful system.
• Insider controls, which are security measures that would mitigate the risk of malicious actions by those with legitimate access to a system.
• Least privilege access, or limiting the system to only the minimum access rights necessary to perform its designated tasks.
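The least-privilege idea above can be sketched in code as an explicit allowlist standing between an AI system and the tools it may invoke. The names below (ToolGate, purchase_ticket) are illustrative assumptions, not drawn from the DeepMind whitepaper:

```python
class ToolGate:
    """Grants an AI agent access only to the tools its current task requires."""

    def __init__(self, allowed_tools):
        # Frozen allowlist: the set of permitted tools cannot grow at runtime.
        self._allowed = frozenset(allowed_tools)

    def call(self, tool_name, tool_fn, *args, **kwargs):
        # Every tool invocation is checked against the allowlist first.
        if tool_name not in self._allowed:
            raise PermissionError(f"tool '{tool_name}' not permitted for this task")
        return tool_fn(*args, **kwargs)


# Hypothetical task: buy a movie ticket. The agent gets the purchase tool only;
# a request to run arbitrary shell commands is refused outright.
gate = ToolGate(allowed_tools={"purchase_ticket"})

def purchase_ticket(show):
    return f"ticket reserved for {show}"

print(gate.call("purchase_ticket", purchase_ticket, "8pm show"))
try:
    gate.call("run_shell", lambda cmd: cmd, "rm -rf /")
except PermissionError as err:
    print(f"blocked: {err}")
```

The design choice here mirrors the bullet above: instead of trusting the system to restrain itself, access rights are enforced from outside, at the boundary where the system touches the world.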
The Google researchers emphasized the need to contain powerful AI.
“One should design a sandbox from which the AI system should not be able to escape,” the researchers said.
The researchers recommended limiting who has access to the advanced AI and treating it as an insider threat.
If or when the AI system does attack, the researchers said, extensive logging and auditing will help those working in the lab recover from the damage.
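A minimal sketch of the logging-and-auditing idea: every action the AI system takes is appended to a record that operators can replay after an incident. The class and field names are assumptions for illustration, not details from the whitepaper:

```python
import json
import time


class AuditLog:
    """Append-only record of an AI system's actions, kept for post-incident review."""

    def __init__(self):
        self._entries = []

    def record(self, actor, action, detail):
        # Timestamped entries let investigators reconstruct the sequence of events.
        self._entries.append({
            "ts": time.time(),
            "actor": actor,
            "action": action,
            "detail": detail,
        })

    def dump(self):
        # In a real deployment this would stream to tamper-evident storage
        # outside the AI system's reach, so the log itself cannot be altered.
        return "\n".join(json.dumps(entry) for entry in self._entries)


log = AuditLog()
log.record("agent-1", "tool_call", "purchase_ticket('8pm show')")
log.record("agent-1", "network", "GET https://example.com/tickets")
print(log.dump())
```

The key property is that the log lives outside the system being monitored; otherwise a misaligned system could simply erase the evidence of its own actions.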
“Our approach centers around blocking malicious actors’ access to dangerous capabilities,” the researchers wrote. “For misalignment, our approach has two prongs. First, we aim to better understand the reasons behind an AI system’s actions, which can help us oversee it. Second, we aim to harden the environments in which AI systems act.”
A growing number of leading technologists are warning about rogue AI as a realistic threat and not a science fiction scenario.
The RAND Corporation told Congress last month that lawmakers must prepare for advanced artificial intelligence “wonder weapons” to emerge in cyberspace upon the development of AGI.
RAND’s public warning on advanced AI came in an open hearing of the Senate Armed Services Committee, immediately after lawmakers met privately with U.S. Cyber Command, the Defense Advanced Research Projects Agency, and the Department of Defense’s Chief Digital and Artificial Intelligence Office.
Private AI companies beyond Google also worry about the damage AGI’s arrival could bring. OpenAI convened a team in 2023 focused on stopping AI from going rogue and causing human extinction. Ilya Sutskever, one of the team’s leaders, later left OpenAI to found a new venture called Safe Superintelligence Inc.
Whether any of the technologists’ proposed mitigations would work against advanced AI systems is largely unknown. Google’s team said it expects to need new plans to stop AGI systems’ potential damage.
“Since the application of security mechanisms to defend against misaligned AI is especially nascent, our thinking is particularly tentative and we expect it to change significantly,” the researchers said.