Facebook and AWS Collaborate on PyTorch-Based Model Server

Facebook and Amazon Web Services (AWS) jointly announced the availability of a new model-serving framework based on the popular PyTorch machine learning library.

Available as part of the PyTorch open source project, TorchServe is a purpose-built framework for deploying PyTorch machine learning models at scale without custom code.

PyTorch, one of the most popular deep learning libraries, was developed primarily by Facebook's AI Research lab (FAIR). It continues to be a framework of choice for machine learning developers. AWS is also a favorite platform among PyTorch developers: AWS claims that more than eight out of 10 PyTorch projects are hosted on the AWS cloud.

TorchServe was developed to simplify the model deployment process, AWS technical evangelist Julien Simon wrote in a blog post.

"One way to simplify the model deployment process is to use a model server," Simon wrote, "i.e. an off-the-shelf web application specially designed to serve machine learning predictions in production. Model servers make it easy to load one or several models, automatically creating a prediction API backed by a scalable web server. They're also able to run preprocessing and postprocessing code on prediction requests. Last but not least, model servers also provide production-critical features like logging, monitoring, and security."

Facebook and AWS joined forces on this project to address a list of pain points PyTorch developers face when trying to take a PyTorch model into production, the two companies wrote in their GitHub Request for Comments (RFC). That list included:

  • Building a high-performance web serving component to host PyTorch models is difficult and requires experience and domain knowledge.
  • Adding custom preprocessing and postprocessing for a model in service currently requires significant rework on the model server itself.
  • Supporting multiple accelerators requires additional work.
  • Any customization to the model server would require significant understanding of the existing serving framework itself and would also require significant rework.
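TorchServe tackles the preprocessing/postprocessing pain point with a handler contract: a class that exposes preprocess, inference, and postprocess hooks, which the server calls in sequence for each prediction request. The following is a minimal, framework-free sketch of that contract; the method names mirror TorchServe's handler interface, but the class and the stand-in model are illustrative so the example runs without TorchServe installed:

```python
import json

class SketchHandler:
    """Illustrative sketch of TorchServe's handler flow:
    preprocess -> inference -> postprocess."""

    def __init__(self, model):
        # Stand-in for a loaded PyTorch model.
        self.model = model

    def preprocess(self, request_body):
        # Decode the raw request payload into model inputs.
        return json.loads(request_body)["inputs"]

    def inference(self, inputs):
        # Run the model on each decoded input.
        return [self.model(x) for x in inputs]

    def postprocess(self, outputs):
        # Shape model outputs into a JSON-serializable response.
        return json.dumps({"predictions": outputs})

    def handle(self, request_body):
        return self.postprocess(self.inference(self.preprocess(request_body)))

# Stand-in "model" that doubles its input.
handler = SketchHandler(lambda x: 2 * x)
print(handler.handle('{"inputs": [1, 2, 3]}'))  # {"predictions": [2, 4, 6]}
```

Because the custom logic lives in the handler rather than in the server, swapping in different pre/postprocessing no longer means reworking the model server itself.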

Available now on GitHub, TorchServe lets developers run their PyTorch models without having to write custom code. It comes with commonly used handlers, such as those for object detection and text and image classification. It also includes a prediction API. Other features include multi-model serving, model versioning for A/B testing, monitoring metrics, and RESTful endpoints for application integration.

TorchServe also works with the Amazon SageMaker machine learning platform and Amazon Elastic Kubernetes Service (Amazon EKS).

"With TorchServe, you can deploy PyTorch models in either eager or graph mode using TorchScript, serve multiple models simultaneously, version production models for A/B testing, load and unload models dynamically, and monitor detailed logs and customizable metrics," AWS said in a statement.

Among the early adopters of TorchServe are Toyota, whose research division is using the service to deploy PyTorch models to fleets of automated vehicles, and Matroid, a maker of computer vision software that's using TorchServe in its efforts around object detection.

In the blog post, Simon provides a detailed walkthrough of installing TorchServe and loading a pretrained model on Amazon Elastic Compute Cloud (EC2).

About the Author

Gladys Rama (@GladysRama3) is the editorial director of Converge360.