Amazon Web Services Adds Job Scheduling to Machine Learning Platform

Amazon Web Services Inc. integrated its Batch job scheduling service with SageMaker Training, allowing organizations to queue and prioritize machine learning workloads more efficiently, the company said Wednesday.

The integration addresses a common challenge where data science teams wait for graphics processing unit availability while infrastructure administrators struggle to maximize utilization of expensive computing resources, AWS said in a blog post.

AWS Batch automatically provisions computing resources to match job requirements and scales capacity up or down with demand. The service includes retry mechanisms for failed jobs and fair-share scheduling to distribute resources equitably among users or projects.
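To illustrate the general idea behind fair-share scheduling, the following minimal Python sketch dispatches the next job from whichever group has consumed the least so far. It is a toy model only, not AWS Batch's actual algorithm or API; the function name `fair_share_order`, the team names, and the cost figures are all hypothetical.

```python
from collections import defaultdict, deque


def fair_share_order(jobs):
    """Return job names in dispatch order.

    jobs: list of (share_id, job_name, cost) tuples. All values are
    hypothetical; this is a toy model, not how AWS Batch is implemented.
    """
    pending = defaultdict(deque)
    for share_id, name, cost in jobs:
        pending[share_id].append((name, cost))

    usage = defaultdict(float)  # accumulated cost per share
    order = []
    while any(pending.values()):
        # Pick the share with pending work and the lowest usage so far;
        # ties are broken alphabetically by share id.
        share = min(
            (s for s, q in pending.items() if q),
            key=lambda s: (usage[s], s),
        )
        name, cost = pending[share].popleft()
        usage[share] += cost
        order.append(name)
    return order


# team-a submits three jobs before team-b submits one.
jobs = [
    ("team-a", "a1", 4),
    ("team-a", "a2", 4),
    ("team-a", "a3", 4),
    ("team-b", "b1", 4),
]
order = fair_share_order(jobs)
print(order)
```

In this sketch, team-b's single job is dispatched second rather than last, even though team-a submitted three jobs first.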

"With multiple variants of Large Behavior Models to train, we needed a sophisticated job scheduling system," said Peter Richmond, director of information engineering at Toyota Research Institute, in the blog post. "AWS Batch's priority queuing, combined with SageMaker AI Training Jobs, allowed our researchers to adjust their training pipelines dynamically."

The new capability lets machine learning scientists submit training jobs to queues without manually coordinating infrastructure allocation. Organizations can assign priority levels so that critical workloads receive resources first, while maximizing utilization of expensive accelerated computing instances.

AWS Batch has previously supported job scheduling for container services, including Elastic Container Service, Elastic Kubernetes Service, and Fargate. The SageMaker integration extends those capabilities to machine learning training workflows.

The service creates job definitions that specify container images, instance types, and security roles, while job queues hold submitted work until resources become available. Service environments define maximum infrastructure capacity limits for scheduling purposes.
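The interplay between a job queue and a capacity limit can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the AWS Batch API: the `ToyQueue` class, its `max_instances` field (standing in for a service environment's capacity limit), and the job names are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyQueue:
    """Hypothetical priority queue with a fixed capacity limit."""

    max_instances: int  # stands in for a capacity limit
    in_use: int = 0
    _heap: list = field(default_factory=list)
    _seq: count = field(default_factory=count)

    def submit(self, name, priority, instances):
        # Higher priority runs first; the sequence number keeps
        # first-in, first-out order among jobs of equal priority.
        heapq.heappush(self._heap, (-priority, next(self._seq), name, instances))

    def dispatch(self):
        """Start queued jobs in priority order until the head job no longer fits."""
        started = []
        while self._heap and self.in_use + self._heap[0][3] <= self.max_instances:
            _, _, name, instances = heapq.heappop(self._heap)
            self.in_use += instances
            started.append(name)
        return started

    def release(self, instances):
        # Called when a running job finishes and frees its instances.
        self.in_use -= instances


queue = ToyQueue(max_instances=4)
queue.submit("sweep", priority=1, instances=2)
queue.submit("prod-retrain", priority=10, instances=4)
queue.submit("ablation", priority=1, instances=2)

first_wave = queue.dispatch()   # the high-priority job takes all capacity
queue.release(4)                # that job finishes
second_wave = queue.dispatch()  # the waiting jobs now fit
print(first_wave, second_wave)
```

The lower-priority jobs stay queued until capacity is released, which mirrors the described behavior of queues holding work until resources become available.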

AWS did not disclose pricing details for the integrated service. SageMaker Training jobs are billed based on instance usage time, while AWS Batch itself does not charge additional fees beyond underlying computing costs.

The feature is available immediately through the AWS Batch console and command-line interface.

About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at [email protected].
