Demonstrate ability to implement DR for systems based on RPO and RTO

  • Recovery point objective (RPO) - the amount of data, measured in time, that the business can afford to lose; the shorter the RPO, the more expensive the solution

  • Recovery time objective (RTO) - the time it takes after a disruption to restore a business process to its service level, including the time to restore data
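A rough way to sanity-check a design against these two definitions is sketched below. It assumes that worst-case data loss equals the gap between two backups, and that recovery time is the sum of detection, provisioning, and data restore. The function names and the numbers are hypothetical, not taken from AWS documentation.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss is one full backup interval (assumption)."""
    return backup_interval <= rpo


def meets_rto(detect: timedelta, provision: timedelta,
              restore_data: timedelta, rto: timedelta) -> bool:
    """Recovery time is modeled as detection + provisioning + data restore."""
    return detect + provision + restore_data <= rto


if __name__ == "__main__":
    # Hypothetical targets: RPO of 4 hours, RTO of 4 hours.
    print(meets_rpo(timedelta(hours=24), timedelta(hours=4)))  # nightly backup
    print(meets_rpo(timedelta(hours=1), timedelta(hours=4)))   # hourly backup
    print(meets_rto(timedelta(minutes=10), timedelta(minutes=30),
                    timedelta(hours=2), timedelta(hours=4)))
```

With these numbers a nightly backup fails a 4-hour RPO, while an hourly backup passes it.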

AWS is well suited to DR because it is flexible, uses an Opex (pay-as-you-go) model, and is easy to automate.

Four scenarios: Backup and restore, pilot light, warm standby, and multi-site

Backup and restore

  • RDS - automated backups with a maximum retention of 35 days; no backup of read replicas; replication from on-premises databases to RDS

  • ElastiCache - Redis can be backed up (very similar to RDS); Memcached cannot

  • Redshift - snapshots with continuous S3 backup for nodes; snapshot cross-region copy feature; restore = new cluster complete with configuration

  • EBS - snapshots are NOT automatic; encrypted volumes = encrypted snapshots

  • S3/Glacier - S3 is a great target for backup; Glacier has a long RTO metric

  • Storage Gateway

  • Snowball & Import/Export Snowball
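Because EBS snapshots are not taken automatically, something has to trigger them on a schedule. A minimal sketch follows, assuming a boto3 EC2 client is passed in. `snapshot_volume`, the tag key/value, and the volume ID are hypothetical names chosen for illustration; `create_snapshot` is the boto3 EC2 client call.

```python
from datetime import datetime, timezone
from typing import Optional


def snapshot_volume(ec2_client, volume_id: str,
                    now: Optional[datetime] = None) -> str:
    """Create a tagged snapshot of one EBS volume and return its ID.

    ec2_client is expected to behave like boto3.client("ec2").
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%MZ")
    response = ec2_client.create_snapshot(
        VolumeId=volume_id,
        Description=f"DR backup of {volume_id} at {stamp}",
        TagSpecifications=[{
            "ResourceType": "snapshot",
            # Hypothetical tag used to find DR snapshots later.
            "Tags": [{"Key": "purpose", "Value": "dr-backup"}],
        }],
    )
    return response["SnapshotId"]


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   snapshot_volume(boto3.client("ec2"), "vol-0123456789abcdef0")
```

Running this from a scheduled job at an interval no longer than the RPO is one way to keep the backup-and-restore scenario within its target.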

Pilot light

  • Route53 - health checks

  • ASG - Auto Scaling min/max adjustment & stored launch configuration

  • EC2 - AMI as a backup of system configurations

  • RDS - replication between AZs or from on-premises
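The Auto Scaling min/max adjustment in the list above can be sketched as a single call that raises the group from its idle pilot-light size to production size. This assumes a boto3 Auto Scaling client is passed in. `activate_pilot_light` and the group name are hypothetical; `update_auto_scaling_group` is the boto3 call.

```python
def activate_pilot_light(asg_client, group_name: str,
                         min_size: int, max_size: int, desired: int) -> None:
    """Scale an otherwise idle Auto Scaling group up for failover.

    asg_client is expected to behave like boto3.client("autoscaling").
    """
    if not (0 <= min_size <= desired <= max_size):
        raise ValueError("need 0 <= min_size <= desired <= max_size")
    asg_client.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MinSize=min_size,
        MaxSize=max_size,
        DesiredCapacity=desired,
    )


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   activate_pilot_light(boto3.client("autoscaling"), "dr-web-asg", 2, 6, 2)
```

The new instances come from the stored launch configuration and AMI, which is why those need to be kept current while the environment is idle.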

Warm Standby

  • Route53 - weighted routing

  • RDS - Multi-AZ - synchronous
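For the weighted routing item, a warm standby typically receives a small share of traffic (or none) until failover. The sketch below only builds the `ChangeBatch` dictionary that Route 53's `change_resource_record_sets` call accepts; it does not contact AWS. The function name, record name, IP addresses, and the 60-second TTL are hypothetical choices.

```python
def weighted_change_batch(record_name: str, primary_ip: str, standby_ip: str,
                          primary_weight: int, standby_weight: int) -> dict:
    """Build a Route 53 ChangeBatch with two weighted A records."""
    for weight in (primary_weight, standby_weight):
        if not 0 <= weight <= 255:  # Route 53 weights range from 0 to 255
            raise ValueError("weight must be between 0 and 255")

    def record(set_id: str, ip: str, weight: int) -> dict:
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "SetIdentifier": set_id,  # distinguishes the weighted records
                "Weight": weight,
                "TTL": 60,                # short TTL so weight changes apply quickly
                "ResourceRecords": [{"Value": ip}],
            },
        }

    return {
        "Comment": "Warm standby weighted routing",
        "Changes": [
            record("primary", primary_ip, primary_weight),
            record("standby", standby_ip, standby_weight),
        ],
    }


# Usage (assumes boto3 and a real hosted zone ID):
#   import boto3
#   batch = weighted_change_batch("app.example.com.", "203.0.113.10",
#                                 "198.51.100.20", 255, 0)
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="<zone-id>", ChangeBatch=batch)
```

Failing over then amounts to submitting the same batch with the weights swapped.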


Multi-site

  • Route53 - latency-based routing with health checks

  • DynamoDB - Cross region Replication

  • RDS - cross-region read replicas - async; no Oracle or MS SQL Server; encryption, option groups, and parameter groups are challenging

  • Redshift - automated cross region snapshot copy
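The Redshift cross-region snapshot copy can be switched on with one API call. A minimal sketch follows, assuming a boto3 Redshift client for the source region is passed in. `enable_cross_region_copy`, the cluster name, and the region are hypothetical; `enable_snapshot_copy` is the boto3 call, and the 1 to 35 day bound on automated snapshot retention is stated here as an assumption.

```python
def enable_cross_region_copy(redshift_client, cluster_id: str,
                             destination_region: str,
                             retention_days: int = 7) -> None:
    """Copy automated Redshift snapshots to a second region.

    redshift_client is expected to behave like boto3.client("redshift").
    """
    if not 1 <= retention_days <= 35:  # assumed bound for automated snapshots
        raise ValueError("retention_days must be between 1 and 35")
    redshift_client.enable_snapshot_copy(
        ClusterIdentifier=cluster_id,
        DestinationRegion=destination_region,
        RetentionPeriod=retention_days,
    )


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   enable_cross_region_copy(boto3.client("redshift", region_name="us-east-1"),
#                            "analytics-cluster", "us-west-2")
```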

Determine appropriate use of multi-Availability Zones vs. multi-Region architectures

nothing really…

Demonstrate ability to implement self-healing capabilities

  • RDS - Multi-AZ

  • ASG - Auto Scaling min/max adjustment & stored launch configuration

Key Resources