Google DeepMind today published an update to its Frontier Safety Framework, which outlines the ways it intends to address the potential risks of future AI models. This is an update from the original model introduced in May.
The original framework included a set of protocols to help Google DeepMind stay one step ahead of the potential severe risks of powerful frontier AI models. The company has since collaborated with experts in industry, academia, and government to deepen the developers' understanding of risks, empirical evaluations to test for them, and mitigations that can can be applied.
Developers have also implemented the Framework in safety and governance processes for evaluating Frontier models such as Gemini 2.0. The work led to publication of the updated framework.
advertisement
advertisement
Frameworks like the one from Google DeepMind have become very important to the advertising industry for many reasons. One reason is that the technology aims to protect brands and companies that use AI in their creative. It is also aimed at avoiding regulatory actions, and convincing advertisers and other types of businesses that it is safe to adopt its models.
The latest framework from Google DeepMind includes guidelines on addressing the security risks of AI models and updated procedures on how to deal with their misuse.
The set of protocols aims to address severe risks that may surface from the capabilities of foundation models. The set is intended to complement Google’s existing suite of AI responsibility and safety practices, and enable innovation and deployment with AI that is consistent with the company's AI Principles, according to the paper.
Key updates to the framework focus on security-level recommendations for Critical Capability Levels (CCLs) that help identify the strongest elements to help curb the risk of exfiltration, implementing a more consistent procedure for how to apply security mitigations, and outlining an industry approach to dealing with deceptive risk.
The DeepMind research teams outlined security mitigations to help prevent unauthorized actors from using an exfiltrating model, which "is the process of stealing AI model weights from a developer's control," according to Gemini.
. This has become important because access to the model allows diminished safeguards. The stakes involved for powerful AI could have serious implications for safety and security.
"Our initial Framework recognized the need for a tiered approach to security, allowing for the implementation of mitigations with varying strengths to be tailored to the risk," researchers wrote in a blog post. "This proportionate approach also ensures we get the balance right between mitigating risks and fostering access and innovation."
Anthropic, Microsoft and Meta did something very similar, developing techniques to prevent users from creating or accessing harmful content.
Anthropic -- the maker of Claude -- introduced its technology this week. Microsoft's Prompt Shields and Meta's Prompt Guard models were both introduced last year and were initially unsuccessful, as hackers quickly found ways to bypass the systems. They have since been fixed.