MLOps

Compute & Storage

FSx for Lustre can speed up training jobs by serving Amazon S3 data to Amazon SageMaker at high throughput. There isn’t a direct interface from SageMaker to DynamoDB; use AWS Data Pipeline (now in maintenance mode, so expect to work from the command line) to move the data to S3.
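
A minimal sketch of wiring FSx for Lustre into a training job with the SageMaker Python SDK, assuming a Lustre file system already linked to the S3 bucket; the file system ID, directory path, image, role, and network settings are placeholders:

    from sagemaker.estimator import Estimator
    from sagemaker.inputs import FileSystemInput

    # Hypothetical FSx for Lustre file system linked to the training data in S3.
    train_fs = FileSystemInput(
        file_system_id="fs-0123456789abcdef0",
        file_system_type="FSxLustre",
        directory_path="/fsx/training-data",     # path inside the Lustre mount
        file_system_access_mode="ro",
    )

    estimator = Estimator(
        image_uri="<training-image-uri>",
        role="<execution-role-arn>",
        instance_count=1,
        instance_type="ml.p3.2xlarge",
        subnets=["subnet-abc123"],               # FSx access requires VPC networking
        security_group_ids=["sg-abc123"],
    )
    estimator.fit({"training": train_fs})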

Common data formats for SageMaker training include RecordIO, RecordIO-protobuf, JSON Lines, JPEG/PNG, CSV, and LibSVM (XGBoost).
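
For S3-hosted data, the format is declared as the channel’s content type. A short sketch with the Python SDK; the bucket and prefixes are hypothetical:

    from sagemaker.inputs import TrainingInput

    # content_type tells the algorithm how to parse each channel.
    train_input = TrainingInput("s3://my-bucket/train/", content_type="text/csv")
    validation_input = TrainingInput("s3://my-bucket/validation/", content_type="text/csv")
    # estimator.fit({"train": train_input, "validation": validation_input})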

When performing distributed training, protect data in transit by enabling inter-container traffic encryption and place instances in a VPC.
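
Both settings are estimator parameters in the Python SDK. A sketch, with image, role, subnets, and security groups as placeholders:

    from sagemaker.estimator import Estimator

    estimator = Estimator(
        image_uri="<training-image-uri>",
        role="<execution-role-arn>",
        instance_count=4,                              # distributed across 4 instances
        instance_type="ml.p3.8xlarge",
        encrypt_inter_container_traffic=True,          # encrypt traffic between training nodes
        subnets=["subnet-abc123", "subnet-def456"],    # place instances in your VPC
        security_group_ids=["sg-abc123"],
    )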

To adapt your own training container, put your training script in the /opt/ml/code/ directory of the image and set the SAGEMAKER_PROGRAM environment variable to the script’s file name.

Training Compiler integrates with AWS Deep Learning Containers to compile and optimize training jobs on GPU instances. It works with Hugging Face transformers models or your own model (BYOM). Enable it (and optional debug output) via the compiler_config parameter; PyTorch scripts must save the model with the XLA save function.
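
A sketch of enabling Training Compiler on a Hugging Face training job; the framework versions and instance type are assumptions, so check the supported-version matrix:

    from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

    estimator = HuggingFace(
        entry_point="train.py",
        role="<execution-role-arn>",
        instance_count=1,
        instance_type="ml.p3.2xlarge",               # GPU instance
        transformers_version="4.21",
        pytorch_version="1.11",
        py_version="py38",
        compiler_config=TrainingCompilerConfig(),    # TrainingCompilerConfig(debug=True) for debug logs
    )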

Inference

Lots of options for running inference code in a container. Built-in algorithms can use SageMaker containers without modification. Need custom packages? Use a prebuilt container and add dependencies via requirements.txt, or extend the container. Need a custom model? Use the SageMaker Training and SageMaker Inference toolkits to run scripts, train algorithms, and deploy models in custom containers; use the Docker ENTRYPOINT to tell SageMaker where to start. Local mode enables workstation-based testing; it requires the Python SDK.
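
A local-mode sketch (needs the SageMaker Python SDK plus Docker on the workstation); the model artifact, inference script, and framework versions are placeholders:

    from sagemaker.local import LocalSession
    from sagemaker.pytorch import PyTorchModel

    model = PyTorchModel(
        model_data="s3://my-bucket/model.tar.gz",
        role="<execution-role-arn>",
        entry_point="inference.py",
        framework_version="1.12",
        py_version="py38",
        sagemaker_session=LocalSession(),   # run the containers on the local machine
    )
    predictor = model.deploy(initial_instance_count=1, instance_type="local")
    # predictor.predict(payload); predictor.delete_endpoint() when done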

For batch scenarios, such as scoring an entire dataset at a time, SageMaker Batch Transform is the best tool to use. Batch Transform can filter attributes out of the input before inference and join the prediction results with the input records for CSV or JSON data.
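
A sketch of a Batch Transform over a CSV dataset that drops an ID column before inference and joins predictions back onto the input; it assumes `model` is an existing SageMaker Model object, and the S3 paths are placeholders:

    transformer = model.transformer(
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://my-bucket/predictions/",
        accept="text/csv",
        assemble_with="Line",
    )
    transformer.transform(
        data="s3://my-bucket/batch-input/",
        content_type="text/csv",
        split_type="Line",
        input_filter="$[1:]",        # exclude the first (ID) column from the model input
        join_source="Input",         # join predictions with the original input records
        output_filter="$[0,-1]",     # keep only the ID column and the prediction
    )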

SageMaker Neo enables machine learning models to train once and run anywhere in the cloud and at the edge. Neo automatically optimizes Gluon, Keras, MXNet, PyTorch, TensorFlow, TensorFlow-Lite, and ONNX models for inference on Android, Linux, and Windows machines based on processors from Ambarella, ARM, Intel, Nvidia, NXP, Qualcomm, Texas Instruments, and Xilinx.
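
A sketch of compiling a trained model with Neo from the Python SDK, assuming `estimator` is a trained estimator; the framework, input shape, target, and output path are assumptions about the model being compiled:

    compiled_model = estimator.compile_model(
        target_instance_family="ml_c5",              # Neo compilation target
        input_shape={"data": [1, 3, 224, 224]},      # shape of the model's input tensor
        output_path="s3://my-bucket/neo-compiled/",
        framework="mxnet",
        framework_version="1.8",
    )
    # compiled_model.deploy(initial_instance_count=1, instance_type="ml.c5.xlarge")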

SageMaker Inference Recommender automates load testing and model tuning across ML instances, provided the model is registered in the SageMaker Model Registry, uses a prebuilt inference image, and meets a few other prerequisites. It works with new models, Neo-compiled models, and existing endpoints.
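
A rough sketch of requesting recommendations, assuming `model_package` is a model already registered in the Model Registry; the payload location, framework label, and instance list are assumptions:

    model_package.right_size(
        sample_payload_url="s3://my-bucket/sample-payload.tar.gz",
        supported_content_types=["text/csv"],
        supported_instance_types=["ml.c5.xlarge", "ml.m5.xlarge", "ml.g4dn.xlarge"],
        framework="XGBOOST",
        job_duration_in_seconds=3600,
    )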

Inference pipelines are a linear sequence of two to fifteen container definitions that process inference requests as one fully managed pipeline behind a single endpoint or batch transform job. Pipelines can combine preprocessing, predictions, and post-processing tasks, with the goal of reusing preprocessing and feature-engineering code across training and inference.
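
A sketch of chaining a preprocessing container and a prediction container behind one endpoint; `sklearn_model` and `xgb_model` are assumed to be existing Model objects, and the name and role are placeholders:

    from sagemaker.pipeline import PipelineModel

    pipeline = PipelineModel(
        name="preprocess-then-predict",
        role="<execution-role-arn>",
        models=[sklearn_model, xgb_model],   # containers run in order for every request
    )
    pipeline.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")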

Deployment Safeguards support blue/green deployments (all at once, canary, linear) with automatic rollbacks. Shadow variants receive a copy of live traffic without returning responses to callers, making them similar in spirit to canary deployments but without user-facing risk.
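
A boto3 sketch of a blue/green canary update with automatic rollback on a CloudWatch alarm; the endpoint, config, and alarm names are placeholders:

    import boto3

    sm = boto3.client("sagemaker")
    sm.update_endpoint(
        EndpointName="my-endpoint",
        EndpointConfigName="my-endpoint-config-v2",
        DeploymentConfig={
            "BlueGreenUpdatePolicy": {
                "TrafficRoutingConfiguration": {
                    "Type": "CANARY",                      # ALL_AT_ONCE | CANARY | LINEAR
                    "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": 10},
                    "WaitIntervalInSeconds": 300,
                },
                "TerminationWaitInSeconds": 300,
            },
            "AutoRollbackConfiguration": {
                "Alarms": [{"AlarmName": "my-endpoint-5xx-alarm"}],
            },
        },
    )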

Inference Triage

A/B test? Test multiple models or model versions behind the same endpoint using production variants; variants are named and have traffic weights.
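
A boto3 sketch of two named production variants splitting traffic 80/20 by weight; model, config, and endpoint names are placeholders:

    import boto3

    sm = boto3.client("sagemaker")
    sm.create_endpoint_config(
        EndpointConfigName="ab-test-config",
        ProductionVariants=[
            {
                "VariantName": "model-a",
                "ModelName": "my-model-v1",
                "InstanceType": "ml.m5.xlarge",
                "InitialInstanceCount": 1,
                "InitialVariantWeight": 0.8,   # 80% of traffic
            },
            {
                "VariantName": "model-b",
                "ModelName": "my-model-v2",
                "InstanceType": "ml.m5.xlarge",
                "InitialInstanceCount": 1,
                "InitialVariantWeight": 0.2,   # 20% of traffic
            },
        ],
    )
    sm.create_endpoint(EndpointName="ab-test-endpoint", EndpointConfigName="ab-test-config")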