PyTorch 1.3 Experiments with Named Tensors, Quantization, Mobile
By John K. Waters
With the latest release of its open source PyTorch machine learning (ML) library, the Facebook AI research group is ranging into new territory -- specifically, seamless model deployment to mobile devices, 8-bit model quantization and tensor naming.
Lin Qiao, who leads Facebook's AI infrastructure developer platform team, unveiled PyTorch 1.3 and this group of early-release experimental features at the recent PyTorch Developer Conference in San Francisco. PyTorch is built on several core elements, she said, including a focus on both eager- and graph-based execution, providing developers with the ability to author dynamic neural networks and use control flow, native support for distributed training, hardware-accelerated inference and the principle of "simplicity over complexity."
Version 1.3 introduces PyTorch Mobile, which Qiao emphasized is not a different framework, but a fully supported feature of TorchScript, which is an intermediate representation of a PyTorch model -- essentially, a way to create serializable and optimizable models from PyTorch code.
"Running ML on edge devices is growing in importance as applications continue to demand lower latency," the Facebook group explained in a blog post. "It is also a foundational element for privacy-preserving techniques, such as federated learning." To enable more efficient on-device ML, PyTorch 1.3 also supports an end-to-end workflow from Python to deployment on iOS and Android, they added.
To support the efficient use of both server-side and on-device compute resources when developing ML applications, this release adds 8-bit model quantization capabilities using the eager-mode Python API. "Quantization" is the process of reducing the number of bits that represent a number. It refers here to techniques used to perform computation and storage at "reduced precision," the Facebook team explained, such as 8-bit integers (as opposed to 32-bit floating point). "Eager execution" is an imperative, define-by-run interface in which operations are executed immediately as they are called from Python.
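To make the "reduced precision" idea concrete, here is a minimal pure-Python sketch of affine 8-bit quantization: floats are mapped to integers in the range 0 to 255 through a scale and a zero point, then mapped back. It illustrates the arithmetic only and is not PyTorch's implementation; the function names (`choose_qparams`, `quantize`, `dequantize`) and the sample weights are assumptions made for this example.

```python
# Illustrative only: hypothetical helpers, not part of the PyTorch API.

def choose_qparams(values, qmin=0, qmax=255):
    """Pick a scale and zero point so the value range (including 0.0) fits in [qmin, qmax]."""
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 if every value is 0
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point

def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Map each float to the nearest 8-bit integer level, clamped to [qmin, qmax]."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]

def dequantize(qvalues, scale, zero_point):
    """Map 8-bit integer levels back to approximate floats."""
    return [(q - zero_point) * scale for q in qvalues]

weights = [-1.0, -0.25, 0.0, 0.5, 1.0]   # made-up sample values
scale, zero_point = choose_qparams(weights)
q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)

for w, q, r in zip(weights, q_weights, restored):
    print(f"{w:+.4f} -> {q:3d} -> {r:+.4f}")
```

Each stored value needs one byte instead of four, at the cost of a rounding error of no more than one quantization step (`scale`) per value in this sketch.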
This experimental feature includes support for post-training quantization, dynamic quantization and quantization-aware training, the group added. It leverages the FBGEMM and QNNPACK state-of-the-art quantized kernel back ends for x86 and ARM CPUs, respectively, which are integrated with PyTorch and now share a common API.
Named tensors are also appearing in this release, Qiao explained, to address concerns that PyTorch doesn't closely associate semantics with the tensors in its data sets. "It creates the problem that the code, over the course of time, can be prone to error, and it's very easy to make mistakes," she said. Allowing users to associate explicit names with tensor dimensions will make them easier to use, the bloggers said. Most operations that take dimension parameters will accept dimension names, which obviates the need to track dimensions by position. Names are also used to check automatically, at runtime, that APIs are being used correctly. And named tensors make the code much more readable and maintainable, Qiao said.
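A short sketch of what this looks like in practice, assuming a PyTorch installation (1.3 or later) in which the experimental named tensor API is available. The tensor shapes and the dimension names `N`, `C`, `H` and `W` are illustrative choices, not anything prescribed by the release.

```python
import warnings
import torch

# Named tensors are experimental and emit a UserWarning; silence it for this demo.
warnings.filterwarnings("ignore", category=UserWarning)

# A batch of 2 images with 3 channels, 4 rows and 5 columns, with a name per dimension.
images = torch.zeros(2, 3, 4, 5, names=("N", "C", "H", "W"))

# Reduce over the channel dimension by name instead of by position (dim=1).
summed = images.sum("C")
print(summed.names)  # the remaining dimension names

# Names are checked at runtime: adding tensors whose names disagree is an error.
a = torch.zeros(2, 3, names=("N", "C"))
b = torch.zeros(2, 3, names=("N", "H"))
mismatch_caught = False
try:
    a + b
except RuntimeError as err:
    mismatch_caught = True
    print("name mismatch rejected:", type(err).__name__)
```

Without names, the addition at the end would succeed silently because the shapes match, which is the kind of mistake the feature is meant to catch.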
A new tool for model interpretability called Captum and a new community-based research platform for privacy-preserving ML called CrypTen are also part of PyTorch 1.3, as are tools for multimodal AI systems and cloud-provider hardware ecosystem support.
Based on the Torch open source machine learning library and released under the Modified BSD license, PyTorch has been gaining fans and market share against its closest competitor, Google's TensorFlow, as it matures and the community grows.
"PyTorch continues to gain momentum because of its focus on meeting the needs of researchers, its streamlined workflow for production use, and most of all because of the enthusiastic support it has received from the AI community," the PyTorch team said. The number of contributors to the platform has grown by more than 50 percent over last year to nearly 1,200, they said.
Facebook, Microsoft and Uber, among others, are jumping on the PyTorch bandwagon, they said. And the PyTorch team has even collaborated with Google and Salesforce to add broad support for Cloud Tensor Processing Units (Google's custom-developed ASICs used to accelerate machine learning workloads) to provide an accelerated option for training large-scale deep neural networks. Alibaba Cloud also joins Amazon Web Services, Microsoft Azure and Google Cloud as supported cloud platforms for PyTorch users.
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at email@example.com.