Facebook Releases PyTorch Library for ML Training with Differential Privacy

Facebook has released a new high-speed machine learning (ML) library for training PyTorch models with differential privacy. Called Opacus, the open-source library supports training with minimal code changes required on the client, has little impact on training performance, and allows the client to track the privacy budget expended at any given moment, the company says.

Differential privacy (DP) is an approach to privacy-preserving data analysis that proponents claim can protect personal data better than traditional anonymization methods. Facebook describes it as "a mathematically rigorous framework for quantifying the anonymization of sensitive data."

"With the release of Opacus, we hope to provide an easier path for researchers and engineers to adopt differential privacy in ML, as well as to accelerate DP research in the field," wrote Facebook applied research scientists Davide Testuggine and Ilya Mironov in a recent blog post.

PyTorch, one of the most popular ML libraries, was developed primarily by Facebook's AI Research lab (FAIR). Based on the Torch open-source machine learning library, and released under the Modified BSD license, it has been gaining fans and market share against its closest competitor, Google's TensorFlow, since version 1.0 was released in 2018. Microsoft has been a supporter of PyTorch from its early days.

Opacus defines a lightweight API built around the PrivacyEngine abstraction, the Facebook researchers explained. The engine attaches to a standard PyTorch optimizer and works behind the scenes, tracking the privacy budget (a parameter in DP that represents the degree of privacy offered) and modifying the model's gradients. This makes training with Opacus "as easy as adding these lines of code at the beginning of your training code," they wrote.
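The "attach to a standard optimizer and work behind the scenes" pattern can be illustrated with a deliberately simplified sketch in plain Python. Note that ToyPrivacyEngine and ToySGD are hypothetical stand-ins invented for illustration, not Opacus's real classes, and that real DP-SGD clips per-sample gradients rather than a single batch gradient as done here:

```python
import random

class ToySGD:
    """Minimal stand-in for an optimizer: step() applies gradients to params."""
    def __init__(self, params, lr=0.1):
        self.params, self.lr = params, lr

    def step(self, grads):
        for i, g in enumerate(grads):
            self.params[i] -= self.lr * g

class ToyPrivacyEngine:
    """Hypothetical sketch of an engine that attaches to an optimizer,
    clipping and noising gradients before each update."""
    def __init__(self, optimizer, noise_multiplier=1.0, max_grad_norm=1.0):
        self.noise_multiplier = noise_multiplier
        self.max_grad_norm = max_grad_norm
        self.steps = 0
        original_step = optimizer.step  # keep the optimizer's real step()

        def private_step(grads):
            # Privatize every gradient, then defer to the original optimizer.
            noisy = [self._privatize(g) for g in grads]
            self.steps += 1
            original_step(noisy)

        optimizer.step = private_step   # "attach" behind the scenes

    def _privatize(self, grad):
        # Clip the (scalar) gradient to max_grad_norm, then add Gaussian noise.
        norm = abs(grad)
        if norm > self.max_grad_norm:
            grad = grad * self.max_grad_norm / norm
        return grad + random.gauss(0.0, self.noise_multiplier * self.max_grad_norm)

params = [0.0]                       # one toy model weight
opt = ToySGD(params, lr=0.1)
engine = ToyPrivacyEngine(opt, noise_multiplier=0.0, max_grad_norm=1.0)

# Training code keeps calling the optimizer as usual; the engine's
# clipping/noising hook runs transparently.
opt.step([2.0])                      # raw gradient 2.0 is clipped to 1.0
```

With the noise multiplier set to zero for determinism, the clipped gradient of 1.0 moves the weight by exactly -0.1, and the engine has counted one privatized step.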

"The core idea behind this algorithm is that we can protect the privacy of a training dataset by intervening on the parameter gradients that the model uses to update its weights, rather than the data directly," they wrote. "By adding noise to the gradients in every iteration, we prevent the model from memorizing its training examples, while still enabling learning in aggregate. The (unbiased) noise will naturally tend to cancel out over the many batches seen during the course of training."

The list of features Facebook is highlighting in this release includes:

  • The ability to leverage Autograd hooks in PyTorch to compute batched per-sample gradients, instead of microbatching, for an order of magnitude increase in speed.
  • The ability to use a cryptographically secure pseudo-random number generator for its security-critical code.
  • The ability to prototype quickly by mixing and matching Facebook's code with PyTorch code and pure Python code.
  • The ability to track how much of the privacy budget is being spent at any given point in time, enabling early stopping and real-time monitoring.
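The last feature, tracking budget spend to enable early stopping, can be mimicked with a toy accountant. The sketch below uses naive sequential composition, where per-step epsilons simply add up; Opacus's real accountant uses much tighter Rényi-DP bounds, and BasicAccountant here is a hypothetical name, not part of any library:

```python
class BasicAccountant:
    """Toy privacy accountant: naive sequential composition only."""
    def __init__(self, epsilon_per_step):
        self.epsilon_per_step = epsilon_per_step
        self.spent = 0.0

    def step(self):
        self.spent += self.epsilon_per_step

    def get_epsilon(self):
        # How much of the privacy budget has been spent so far.
        return self.spent

budget = 1.0
acct = BasicAccountant(epsilon_per_step=0.0625)  # 1/16, exact in binary

steps_taken = 0
for _ in range(100):
    # Early stopping: refuse any step that would exceed the budget.
    if acct.get_epsilon() + acct.epsilon_per_step > budget:
        break
    acct.step()
    steps_taken += 1
```

Monitoring `get_epsilon()` in real time and halting before the budget is exceeded is the behavior being illustrated; with a per-step cost of 1/16 and a budget of 1.0, training stops after exactly 16 steps.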

Opacus also comes with tutorials, helper functions that warn about incompatible layers before the training even starts, and automatic refactoring mechanisms.

With Opacus, Facebook is targeting two types of users: ML practitioners, who will find it to be "a gentle introduction to training a model with differential privacy;" and experienced DP scientists, who will find it "easy to experiment and tinker with." And it's being released into an engaged and growing privacy-preserving machine learning (PPML) community.

"We're excited by the ecosystem that's already forming around Opacus with leaders in PPML," the researchers said.

Opacus is open source for public use and is licensed under Apache 2.0.

About the Author

John K. Waters is the editor in chief of a number of sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.