Batch runs batch computing workloads on the AWS Cloud using Docker images. Can run serverless using Fargate OR dynamically provision on-demand or Spot EC2 instances. Jobs can be scheduled by EventBridge and orchestrated using Step Functions.
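
As a rough illustration of the EventBridge-to-Batch path, here is a minimal boto3 sketch that creates a scheduled rule whose target is a Batch job queue; the rule name, queue and job-definition ARNs, and IAM role are placeholder assumptions, not values from the notes.

```python
import boto3

events = boto3.client("events")

# Hypothetical ARNs/names -- substitute your own resources.
JOB_QUEUE_ARN = "arn:aws:batch:us-east-1:123456789012:job-queue/nightly-queue"
JOB_DEFINITION = "nightly-etl:1"   # job definition name:revision
EVENTS_ROLE_ARN = "arn:aws:iam::123456789012:role/eventbridge-batch-role"

# Rule that fires once a day.
events.put_rule(
    Name="nightly-batch-job",
    ScheduleExpression="rate(1 day)",
    State="ENABLED",
)

# Target the Batch job queue; EventBridge submits the job on each trigger.
events.put_targets(
    Rule="nightly-batch-job",
    Targets=[{
        "Id": "batch-target",
        "Arn": JOB_QUEUE_ARN,
        "RoleArn": EVENTS_ROLE_ARN,   # role must allow batch:SubmitJob
        "BatchParameters": {
            "JobDefinition": JOB_DEFINITION,
            "JobName": "nightly-etl-run",
        },
    }],
)
```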

Use Multi-node parallel jobs in Batch to run large-scale, tightly coupled, high performance computing applications and distributed GPU model training without the need to launch, configure, and manage Amazon EC2 resources directly. There is 1 main node and many child nodes; feels similar to how Hadoop works. Works best when the EC2 launch mode uses a cluster placement group; does not work with Spot instances.
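
A minimal sketch of registering a multi-node parallel job definition with boto3; the job definition name, image, node count, and resource sizes are illustrative assumptions.

```python
import boto3

batch = boto3.client("batch")

# Multi-node parallel (MNP) job definition: one main node (index 0)
# plus child nodes, all described by node ranges.
batch.register_job_definition(
    jobDefinitionName="distributed-training",   # hypothetical name
    type="multinode",
    nodeProperties={
        "numNodes": 4,        # 1 main node + 3 child nodes
        "mainNode": 0,        # node index that coordinates the job
        "nodeRangeProperties": [{
            "targetNodes": "0:",   # applies to all nodes in the job
            "container": {
                "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/train:latest",
                "vcpus": 8,
                "memory": 32768,   # MiB
                "command": ["python", "train.py"],
            },
        }],
    },
)
```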

Workloads can run in managed or unmanaged compute environments. Unmanaged compute environments require you to configure, provision, scale, and manage the instances yourself. In managed compute environments you set the min/max vCPUs and fleet type (on-demand or Spot), and Batch manages the capacity and instance types. Managed environments require the VPC to have access to the ECS and ECR services; a NAT gateway/IGW or VPC endpoints work.
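
A hedged boto3 sketch of creating a managed compute environment follows; the Spot fleet type, vCPU limits, subnets, security group, and IAM roles are placeholder assumptions.

```python
import boto3

batch = boto3.client("batch")

# Managed compute environment: Batch picks instance types and scales
# capacity between the min/max vCPU bounds.
batch.create_compute_environment(
    computeEnvironmentName="managed-spot-env",   # hypothetical name
    type="MANAGED",
    state="ENABLED",
    computeResources={
        "type": "SPOT",              # fleet type: EC2 | SPOT | FARGATE | FARGATE_SPOT
        "minvCpus": 0,
        "maxvCpus": 256,
        "instanceTypes": ["optimal"],   # let Batch choose instance sizes
        "subnets": ["subnet-aaaa1111", "subnet-bbbb2222"],
        "securityGroupIds": ["sg-0123456789abcdef0"],
        "instanceRole": "ecsInstanceRole",   # instance profile for the ECS agent
        "spotIamFleetRole": "arn:aws:iam::123456789012:role/aws-ec2-spot-fleet-role",
    },
    serviceRole="arn:aws:iam::123456789012:role/AWSBatchServiceRole",
)
```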

Managed compute environment set up:

  1. Set min/max vCPU and fleet type (on-demand or Spot)
  2. Batch launches instances, which may be of different sizes
  3. The SDK is used to add a Batch job queue (see the sketch after this list)
  4. The job queue distributes jobs to the compute environment
  5. Batch autoscales instances between the min/max vCPU bounds as jobs arrive
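
As a sketch of steps 3-4, the snippet below creates a job queue attached to the compute environment and submits a job to it with boto3; the queue name, priority, and resource names are assumptions carried over from the earlier sketches.

```python
import boto3

batch = boto3.client("batch")

# Step 3: create a job queue and attach it to the managed compute environment.
batch.create_job_queue(
    jobQueueName="default-queue",   # hypothetical name
    state="ENABLED",
    priority=1,
    computeEnvironmentOrder=[{
        "order": 1,
        "computeEnvironment": "managed-spot-env",   # environment from earlier sketch
    }],
)

# Step 4: submit a job; the queue dispatches it and Batch scales capacity as needed.
batch.submit_job(
    jobName="example-job",
    jobQueue="default-queue",
    jobDefinition="distributed-training:1",   # name:revision
)
```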

Lambda vs Batch

| Item       | Lambda     | Batch                            |
|------------|------------|----------------------------------|
| Time limit | 15 minutes | No time limit                    |
| Runtime    | Limited    | Any runtime (Docker image)       |
| Disk space | Limited    | Unlimited (EBS)                  |
| Compute    | Serverless | Fargate or EC2 (on-demand, Spot) |