Batch runs batch computing workloads on the AWS Cloud using Docker images. Can run serverless on Fargate, or dynamically provision on-demand or Spot EC2 instances. Jobs can be scheduled by EventBridge and orchestrated with Step Functions.
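A minimal sketch of submitting a job with the AWS SDK (boto3); the queue name, job definition, and command are placeholders for resources that would already exist:

```python
import boto3

batch = boto3.client("batch")

# Submit a containerized job to an existing Batch job queue.
response = batch.submit_job(
    jobName="nightly-etl",            # hypothetical job name
    jobQueue="my-job-queue",          # placeholder: existing job queue
    jobDefinition="my-job-def:1",     # placeholder: Docker-based job definition
    containerOverrides={
        "command": ["python", "etl.py", "--date", "2024-01-01"],
        "environment": [{"name": "STAGE", "value": "prod"}],
    },
)
print("Submitted job:", response["jobId"])
```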
Use multi-node parallel jobs in Batch to run large-scale, tightly coupled, high-performance computing applications and distributed GPU model training without having to launch, configure, and manage Amazon EC2 resources directly. There is one main node and many child nodes, similar to how Hadoop works. Works best when the EC2 launch mode uses a cluster placement group; does not work with Spot instances.
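A hedged sketch of registering a multi-node parallel job definition with boto3; the definition name, image URI, node count, and resources are illustrative assumptions:

```python
import boto3

batch = boto3.client("batch")

# One main node (index 0) plus child nodes, all running the same container image.
batch.register_job_definition(
    jobDefinitionName="mnp-training",   # placeholder name
    type="multinode",
    nodeProperties={
        "numNodes": 4,
        "mainNode": 0,                  # node index that acts as the main node
        "nodeRangeProperties": [
            {
                "targetNodes": "0:3",   # applies to all four nodes
                "container": {
                    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/trainer:latest",
                    "vcpus": 4,
                    "memory": 16384,
                    "command": ["python", "train.py"],
                },
            }
        ],
    },
)
```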
Workloads run in managed or unmanaged compute environments. Unmanaged compute environments require you to configure, provision, scale, and manage the instances yourself. In managed compute environments you set the min/max vCPUs and fleet type (on-demand or Spot), and Batch manages the capacity and instance types. Managed environments require that the VPC has access to the ECS and ECR services - a NAT gateway/IGW or VPC endpoints work.
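As a rough illustration, a managed compute environment could be created with boto3 along these lines; every name, subnet, security group, and role below is a placeholder:

```python
import boto3

batch = boto3.client("batch")

batch.create_compute_environment(
    computeEnvironmentName="managed-ce",            # placeholder
    type="MANAGED",                                 # Batch manages capacity and instance types
    computeResources={
        "type": "EC2",                              # on-demand fleet ("SPOT" for Spot)
        "minvCpus": 0,
        "maxvCpus": 256,
        "instanceTypes": ["optimal"],               # let Batch pick instance sizes
        "subnets": ["subnet-aaa", "subnet-bbb"],    # must reach ECS/ECR (NAT/IGW or VPC endpoints)
        "securityGroupIds": ["sg-123"],
        "instanceRole": "ecsInstanceRole",          # instance profile for the ECS agent
    },
    serviceRole="arn:aws:iam::123456789012:role/AWSBatchServiceRole",
)
```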
Managed compute environment set up:
- Set min/max vCPUs and fleet type (on-demand or Spot)
- Batch launches instances that are likely different sizes
- The SDK is used to add a Batch job queue (see the sketch after this list)
- The Batch job queue distributes jobs across the instances
- Batch autoscales the instances as jobs are queued and completed
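A sketch of the job-queue step above, assuming the managed compute environment from the earlier example already exists (names are placeholders):

```python
import boto3

batch = boto3.client("batch")

# Attach a job queue to the managed compute environment; Batch then places
# queued jobs onto whatever instances it has scaled out.
batch.create_job_queue(
    jobQueueName="managed-queue",       # placeholder
    priority=1,
    computeEnvironmentOrder=[
        {"order": 1, "computeEnvironment": "managed-ce"},  # placeholder CE name/ARN
    ],
)
```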
Lambda vs Batch
Item | Lambda | Batch
---|---|---
Time limit | 15 minutes | No time limit
Runtime | Limited runtimes | Any runtime (Docker image)
Disk space | Limited | Unlimited (EBS)
Compute | Serverless | Fargate or EC2 (on-demand, Spot)