Google Using Deep Learning To Protect Gmail

Google began experimenting a year ago with deep-learning-based email security in an effort to refine its ability to protect Gmail users from spam, phishing attempts, and malware. The company's efforts led to the development of a system that today helps to block more than 99.9 percent of these potential threats, the company says.

Elie Bursztein, who leads Google's Security and Anti-abuse Research Team, shared the details of this success story at the annual RSA conference in San Francisco this week. Bursztein's session, "Malicious Documents Emerging Trends: A Gmail Perspective," offered a rare look at the kinds of attachments malicious hackers send to Gmail users, and their favorite targets.

In a snapshot of the hundreds of billions of attachments scanned by the system in just a few recent weeks, Microsoft Office docs containing macros accounted for about 56% of the malicious attachments, Bursztein said. The other 44% were PDFs (2%), archived files, and HTML-based document attachments, among others.

The most common targets of these tainted emails are government organizations, Bursztein said, followed by transportation companies, utilities, and manufacturing operations. Norway topped the list of most often targeted countries, followed by the UK, Finland, and the US.

At the heart of the system is a malware scanner powered by TensorFlow, the popular open-source machine learning (ML) framework that Google developed. When Google implemented the scanner last year, it was detecting and blocking about 100 million additional spam messages every day. But the bad guys never rest; Google estimates that 63% of the malicious documents the system blocks differ from day to day. To keep up with the ever-evolving threats, Google added a new generation of document scanners that rely on deep learning to improve the system's detection capabilities.

Google launched the upgraded scanner at the end of 2019. Since then, daily detection coverage of Office documents that contain malicious scripts has increased by 10%. Bursztein, along with Google software engineer David Tao and Gmail security product manager Neil Kumaran, posted that statistic on the Google Security Blog. "Our technology is especially helpful at detecting adversarial, bursty attacks," they wrote. "In these cases, our new scanner has improved our detection rate by 150%."

"Under the hood," the new scanner uses a distinct TensorFlow deep-learning model trained with TensorFlow Extended (TFX), they said, along with a custom document analyzer for each file type. TFX provides a sequence of components that implement an ML pipeline; the document analyzers are responsible for parsing the document, identifying common attack patterns, extracting macros, de-obfuscating content, and performing feature extraction.
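To make the document analyzer's role more concrete, here is a minimal Python sketch of two of the steps named above, de-obfuscation and feature extraction, applied to macro text that has already been pulled out of a document. It is an illustration only, not Google's analyzer: the function names, the keyword list, and the three features are all assumptions chosen for the example, and a production system would feed far richer features into its model.

```python
import re

# Hypothetical keyword list for illustration; not taken from Google's scanner.
SUSPICIOUS_CALLS = ("shell", "createobject", "autoopen", "urldownloadtofile")

# Matches the seam of a split string literal, e.g. the `" & "` in "WScr" & "ipt".
CONCAT_SEAM = re.compile(r'"\s*&\s*"')


def deobfuscate(macro: str) -> str:
    """Rejoin string literals that were split with VBA's & operator."""
    return CONCAT_SEAM.sub("", macro)


def extract_features(macro: str) -> dict:
    """Turn raw macro text into a small numeric feature dictionary."""
    clean = deobfuscate(macro).lower()
    return {
        # Length of the macro after de-obfuscation.
        "length": float(len(clean)),
        # How many times any suspicious keyword appears once strings are rejoined.
        "suspicious_calls": float(sum(clean.count(k) for k in SUSPICIOUS_CALLS)),
        # How many split-string seams were present in the original text.
        "concat_ops": float(len(CONCAT_SEAM.findall(macro))),
    }


if __name__ == "__main__":
    sample = 'Sub AutoOpen()\n    CreateObject("WScr" & "ipt.Shell").Run "calc"\nEnd Sub'
    print(extract_features(sample))
```

In this sketch the split string "WScr" & "ipt.Shell" is rejoined before keywords are counted, so the hidden "Shell" reference is still picked up. A feature dictionary of this kind is the sort of input a trained classifier could consume.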

"Our new scanner runs in parallel with existing detection capabilities," they explained, "all of which contribute to the final verdict of our decision engine to block a malicious document. Combining different scanners is one of the cornerstones of our defense-in-depth approach to help protect users and ensure our detection system is resilient to adversarial attacks."
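The blog post does not say how the decision engine weighs its scanners, so the Python sketch below shows just one simple way such a combination could work: every scanner scores the document independently, and the document is blocked if any score reaches a threshold. The scanner functions, their scores, and the 0.5 threshold are hypothetical stand-ins, not details from Google.

```python
from typing import Callable, Dict, Tuple

# A scanner takes raw document bytes and returns a maliciousness score in [0, 1].
Scanner = Callable[[bytes], float]


def signature_scanner(doc: bytes) -> float:
    """Hypothetical legacy scanner: flags a known-bad byte signature."""
    return 1.0 if b"KNOWN_BAD_SIGNATURE" in doc else 0.0


def macro_scanner(doc: bytes) -> float:
    """Hypothetical stand-in for a learned model: scores auto-run macros higher."""
    return 0.8 if b"AutoOpen" in doc else 0.1


SCANNERS: Dict[str, Scanner] = {
    "signature": signature_scanner,
    "macro_model": macro_scanner,
}


def decide(
    doc: bytes,
    scanners: Dict[str, Scanner] = SCANNERS,
    threshold: float = 0.5,
) -> Tuple[bool, Dict[str, float]]:
    """Run every scanner and block if any single score reaches the threshold."""
    scores = {name: scan(doc) for name, scan in scanners.items()}
    blocked = max(scores.values()) >= threshold
    return blocked, scores


if __name__ == "__main__":
    print(decide(b"Sub AutoOpen() ... End Sub"))
    print(decide(b"Quarterly report, no macros"))
```

Taking the highest score means an attacker has to evade every scanner at once, which is the intuition behind a defense-in-depth design. A real engine might instead weight or calibrate the scores to keep false positives down.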

About the Author

John K. Waters is the editor in chief of a number of sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.