Unpatched software vulnerabilities are a chronic cybersecurity problem, leading to costly data breaches year after year. According to IBM's Cost of a Data Breach Report 2023, breaches caused by the exploitation of known, unpatched vulnerabilities cost an average of $4.17 million.
The problem: Organizations don't patch software flaws as quickly as threat actors find and exploit them. According to Verizon's 2024 Data Breach Investigations Report, malicious scanning activity begins within a median of five days after a critical vulnerability is disclosed, yet nearly half of critical vulnerabilities remain unremediated two months after patches become available.
Potential solution: Generative AI. Some cybersecurity experts believe GenAI can help close that gap by not only finding bugs but also fixing them. In internal experiments, a Google large language model (LLM) has already had modest but meaningful success, fixing 15% of the simple bugs it targeted.
Elie Bursztein, cybersecurity technology and research lead at Google DeepMind, said in a presentation at RSA Conference (RSAC) 2024 that his team is actively testing a variety of AI security use cases, from phishing prevention to incident response. But when it comes to AI for security, he said, an LLM that can help protect codebases by finding and patching vulnerabilities, ultimately reducing or even eliminating the number of flaws that need patching, sits at the top of his wish list.
Google's AI patching experiment
In a recent experiment, Bursztein's team compiled 1,000 simple vulnerabilities in the Google codebase discovered by C/C++ sanitizers.
They then asked a Gemini-based AI model, similar to Google's publicly available Gemini Pro, to generate and test patches and to identify which ones were best suited for human review. Researchers Jan Nowakowski and Jan Keller wrote in a technical report that the experimental prompts followed this general structure:
You are a senior software engineer tasked with fixing a sanitizer error.
… code …
// Error occurred here (the LOC pointed to by the stack trace)
… code …
Please fix the error.
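The workflow the report describes, generate a candidate patch, test it, and queue only promising patches for human review, can be sketched roughly as follows. This is a minimal illustration, not Google's actual tooling: the generate_patch stub stands in for a call to a Gemini-based model, and the data fields, script names and helper functions are assumptions made for the sketch.

import subprocess
from dataclasses import dataclass

@dataclass
class SanitizerBug:
    """A bug report from a C/C++ sanitizer run. Fields are illustrative."""
    source_file: str
    code_before: str   # code above the line the stack trace points to
    code_after: str    # code below that line

def build_prompt(bug: SanitizerBug) -> str:
    # Mirrors the general prompt structure quoted above: a role instruction,
    # the surrounding code, a comment marking the line the stack trace points
    # to, and a request to fix the error.
    return (
        "You are a senior software engineer tasked with fixing a sanitizer error.\n"
        f"{bug.code_before}\n"
        "// Error occurred here\n"
        f"{bug.code_after}\n"
        "Please fix the error.\n"
    )

def generate_patch(prompt: str) -> str:
    # Placeholder for a call to an LLM such as a Gemini-based model. The real
    # experiment used Google's internal tooling; this stub only keeps the
    # sketch self-contained.
    raise NotImplementedError("wire an LLM client in here")

def patch_passes_checks(patch: str) -> bool:
    # Hypothetical validation step: apply the candidate patch, rebuild with
    # sanitizers enabled and re-run the tests. Script names are placeholders.
    if subprocess.run(["git", "apply", "-"], input=patch, text=True).returncode != 0:
        return False
    build_ok = subprocess.run(["./build_with_sanitizers.sh"]).returncode == 0
    tests_ok = subprocess.run(["./run_tests.sh"]).returncode == 0
    return build_ok and tests_ok

def triage(bugs: list[SanitizerBug]) -> list[tuple[SanitizerBug, str]]:
    """Return (bug, patch) pairs that look promising enough for human review."""
    review_queue = []
    for bug in bugs:
        patch = generate_patch(build_prompt(bug))
        if patch_passes_checks(patch):
            review_queue.append((bug, patch))  # engineers still approve or reject
    return review_queue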
Engineers then reviewed the AI-generated patches, which Bursztein acknowledged was a significant and time-consuming task. Ultimately, he said, 15% of the patches were approved and added to Google's codebase.
“Instead of software engineers spending an average of two hours creating each commit, the required patches are now automatically created in seconds,” Nowakowski and Keller wrote.
And given that thousands of such bugs are discovered every year, they noted, automatically fixing even a small percentage of them could save months of engineering time and effort.
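As a rough back-of-the-envelope illustration (the annual bug count below is an assumption, not a figure from the report), a 15% automated fix rate combined with the roughly two hours per manual commit cited above quickly adds up to several engineer-months of work:

# Illustrative arithmetic only; the bug volume is an assumed figure.
bugs_per_year = 3_000        # assumption standing in for "thousands of bugs"
auto_fix_rate = 0.15         # share of patches approved in the experiment
hours_per_manual_fix = 2     # average commit time cited by Nowakowski and Keller

hours_saved = bugs_per_year * auto_fix_rate * hours_per_manual_fix
print(f"{hours_saved:.0f} engineer-hours saved per year")  # 900 hours, roughly 5-6 engineer-months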
AI-driven patching wins
In his RSAC presentation, Bursztein said the results of the AI patching experiment suggest that Google researchers are moving in the right direction. “This model demonstrates an understanding of code and coding principles, which is very impressive,” he said.
In one instance, for example, the LLM correctly identified and fixed a race condition by adding a mutex.
“It's not easy to understand the concept of a race condition,” Bursztein said, adding that the model also fixed some data leaks by eliminating the use of pointers. “So, in a way, it's like it's writing code.”
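The patch Bursztein described was to C++ code, but the shape of that kind of fix is easy to show in miniature. The sketch below is not the actual Google code; it uses Python's threading.Lock as the mutex. Two threads increment a shared counter: without the lock, the read-modify-write update can interleave and increments are lost, and the fix is simply to guard the update with the lock.

import threading

counter = 0
counter_lock = threading.Lock()  # the "mutex" the patch adds

def unsafe_increment(n: int) -> None:
    # Buggy version: `counter += 1` is a read-modify-write, so two threads
    # can read the same value and one update is lost -- the kind of data race
    # ThreadSanitizer flags in C/C++ code.
    global counter
    for _ in range(n):
        counter += 1

def safe_increment(n: int) -> None:
    # Patched version: the same update, now serialized by the lock.
    global counter
    for _ in range(n):
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # always 400000 with the lock; the unsafe version can fall short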
AI-driven patching challenges
While the results of the AI patching experiments are promising, the technology still falls far short of what Google hopes to one day achieve: reliably and autonomously fixing 90% to 95% of bugs. “We have a very long way to go,” Bursztein warned.
This experiment highlighted the following key issues:
- Complexity. The researchers found the AI appears to be better at fixing some types of bugs than others, with the best results on bugs that span fewer lines of code.
- Validation. The process of validating AI-suggested fixes, in which a human operator verifies that a patch addresses the vulnerability in question without affecting the production environment, remains complex and requires manual intervention.
- Dataset creation and model training. In one example of problematic behavior, Bursztein said, the AI removed a bug by commenting out the offending code, eliminating its functionality in the process. “Problem solved!” Bursztein quipped. “Besides being funny, this shows how difficult it is.”
Steering a model away from such behavior requires a dataset containing thousands of benchmarks that assess both whether a vulnerability is fixed and whether the program's functionality remains intact, he added. Creating those benchmarks, Bursztein predicted, will be a challenge for the entire cybersecurity community.
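Concretely, each benchmark in such a dataset has to check two things about a candidate patch: that the sanitizer error is gone and that the program still does what it did before, which is exactly what rules out "fixes" that simply comment the code away. A minimal sketch of one benchmark entry, with hypothetical field names and placeholder commands, might look like this:

from dataclasses import dataclass
import subprocess
from typing import Callable

@dataclass
class PatchBenchmark:
    """One benchmark case: a known bug plus the two checks a candidate patch
    must pass. Field names and commands are illustrative, not Google's."""
    name: str
    sanitizer_repro_cmd: list[str]   # exits non-zero while the sanitizer error still triggers
    functional_test_cmd: list[str]   # exercises the behavior the buggy code implements

    def evaluate(self, apply_patch: Callable[[], None]) -> dict[str, bool]:
        apply_patch()  # caller-supplied callable that applies the candidate patch
        vuln_fixed = subprocess.run(self.sanitizer_repro_cmd).returncode == 0
        still_works = subprocess.run(self.functional_test_cmd).returncode == 0
        # A patch that just comments out the offending code may pass the first
        # check but fail the second -- exactly the behavior the dataset must catch.
        return {"vulnerability_fixed": vuln_fixed, "functionality_intact": still_works}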
Despite these challenges, he remains optimistic that AI might one day autonomously drive bug discovery and patch management, shrinking the vulnerability window until almost no vulnerabilities remain.
“It's going to be interesting to see how we get there,” Bursztein said. “But I hope we get there, because the benefits are huge.”
Alissa Irei is a senior site editor at TechTarget Security.