Relational Database Service enables numerous RDMBS systems to run as a service. These transactional storage engines are recommended for ACID transactions (Atomicity, Consistency, Isolated, Durability). RDS instances without Multi-AZ don’t perform as well as those that are - backups, restores and all housekeeping activities are performed on the secondary instance.

Backups

Backups are stored on S3. By default, backups are enabled and have a max retention of 35 days. A backup retention period of zero days will disable automated backups for this DB Instance.

Deleting an RDS instance deletes ALL the automated backups… but not the manual ones. =»“I acknowledge that upon instance deletion, automated backups, including system snapshots and point-in-time recovery, will no longer be available.”

In a multi-AZ configuration, backups are done from the standby. Backups without multi-AZ are done on the primary and I/O is suspended for about 1 minute.

Every 5 minutes, RDS backs up the transaction log. Combined with regular backups the “Latest Restore Time” is computed and “Restore to point in Time” option allows you to pinpoint a time to restore from.

Restoration

Restores manifest themselves as a new RDS instance with a new endpoint. Restoration only works with the DB parameter and security groups are associated with the snapshot. You maybe can change storage engines - if they are related.

Snapshots

Snapshots are a full copy of RDS database that are independent of scheduled backups. In a multi-AZ configuration, snapshots are done from the standby.

Encryption

All RDS instance types support encryption at rest and every RDS instance also includes a SSL Certificate for data encryption in flight. Encryption at rest must be setup at instance creation time and uses KMS to generate Master keys then use envelope encryption to generate data keys for the RDS instance. Once enabled, EVERYTHING is encrypted including read replicas, logs, and snapshots. The keys can’t be changed after installation.

Amazon RDS creates an SSL certificate and installs the certificate on the DB instance when Amazon RDS provisions the instance.

To copy an encrypted snapshot from one region to another, you must specify the KMS key identifier of the destination region. This is because KMS encryption keys are specific to the region that they are created in.

Oracle and MS SQL Server support Transparent Data Encryption. Oracle does NOT integrate with KMS but will work with CloudHSM and the Oracle Wallet. SQL Server requires a key but that key is managed by RDS after enabling TDE.

Multi-AZ failover

A multi-AZ failover process is key in the event of an AZ failure but is NOT a scaling solution. Multi-AZ systems are setup within the same region - that would be obvious from the name.

For MySQL, Oracle and PostgreSQL replication from one zone to another causes a lot of higher write and commit latency - this is a synchronous physical replication using AWS technology - provisioned IOPS is recommended. Native replication, called Mirroring, is used for MS SQLServer.

Patching is another benefit of standy instances - patch the standby, failover, then patch the primary. Backups can be created from the standby instance and that can eliminate I/O locking and latency spikes from backups on the primary.

Failover is automatically triggers when:

  • the AZ fails

  • the underlying hardware fails

  • Loss of connectivity

  • Or you can force a fail over using a Reboot with failover using the console or with RebootDBInstance API call

  • Software patching

A slow server and corrupt data will NOT lead to a failover. Getting notification about a failover as you would expect - RDS events.

Failovers are implemented as a DNS change so the application servers have to re-establish connections with the failover instance.

RDS Read Replicas

Read Replicas are used to scale RDS by creating a READ ONLY copy of your database in a single AZ with a new DNS address. Can be done from the console OR using the CreateDBInstanceReadReplica API call. Possible use cases include:

  • Scale beyond single instance I/O capabilities

  • Serve read-only traffic during times the primary is unavailable (hard to visualize a scenarios where this is useful)

  • Business reporting against almost live data without affecting the performance of the primary instance

A transactional DB engine must be used to support read replicas - so Aurora/MySQL/MariaDB must have InnoDB engine installed. Needs MySQL version 5.6 or later. PostgreSQL requires 9.3.5 and in this configuration uses the native PostgreSQL asynchronous streaming replication engine is used.

MySQL, PostgreSQL, and MariaDB support having read replicas in a different region BUT are not multi-AZ. Using SSL for cross region replication is a great idea seeing that the data moves across the open Internet. Encryption is supported but most RR must match be encrypted if the DB cluster is encrypted.

Oracle and SQL Server do not support read replicas.

Limits:

  • max 5 read replica per primary instance

  • DB Snapshots and backups can NOT be taken from read replicas.

  • Replicas can be promoted to a primary but this breaks the replication link.

Replica lag is a key metric - keeping the read replica on a similiar, in fact, exact same configuration of instances can help keep this metric inline.

Read Replica Setup

Read replicas are created from a snapshot. Auto backup must be enabled (this is done by turning the backup retention period to greater than 0) then RDS takes a backup copy of the database then enables replication. Without multi-AZ operations, IO is suspended while RDS snapshots the database; with multi-AZ, the snapshot is taken from the slave. This can be enabled from the console OR using the CreateDBInstanceReadReplica call from the API.

Aurora

MySQL compatible, relational DB that starts with 10Gb and scales in 10Gb increments to 64 Tb and up to 32 vCPUs and 244 Gb of RAM.

Security is a good thing; must exist in a VPC, AES-256 for data in transit, use KMS keys or other keys to encrypt storage, snapshots, backup, and replicas.

It has amazing HA capabilities with 2 copies in 3 AZ so you get 6 copies of your data and it is self-healing through disk and data block data scanning and errors are fixed. You can have up to 15 Aurora read replicas and up to 5 MySQL read replicas (large affect on the primary because the transaction log is played against the replica)

There is automatic, continous incremental backups with a point-in-time restore within a second and are stored for 35 days by default.

Oracle on RDS

Version of Oracle available include Oracle EE, Oracle SE, and Oracle SE One. Support Multi-AZ & manual snapshots. Oracle does NOT support multi-region replication.

Importing uses two different tools… Oracle SQL Developer for small amounts and Oracle Data Pump for large amounts of data.

Recovery Manager (RMAN) can be used for backup and recovery scenarios including hybrid on-prem/Cloud scenarios and Oracle EC2 instances. Use RDS snapshots for point-in-time snapshots in AWS only solutions.

HA: Oracle RAC is not supported on RDS but can be implemented using EC2 placement groups and using Ntop N2n VPN software to tunnel between nodes. In addition, Oracle Data Guard can be implemented as well.

MS SQL on RDS

Supports point-in-time backups using snapshots, Multi-AZ deployments using the native MS Mirroring technology but read replicas are not supported. There is no support for multi-region or Cloud to on-prem replication in RDS based MS SQL but native tools can be used.

On-prem to Cloud migration is super ugly… create RDS empty tables, disable backups/key contraints, import flat files. In addition, FILESTREAM functions are not supported and there is no ability to restore data from file.

DB Subnet

A DB subnet defines the regions for a multi-AZ RDS setup. The high level steps to create a DB subnet are as follows:

  1. Create 2 new subnets in a VPC

  2. In RDS Dashboard create subnet group & add the subnets you just created

  3. Now launch an RDS instance and select Multi-AZ Deployment

  4. In “configure advanced settings” select subnet group you created

Presto, you have a multi-AZ DB subnet!

Reserved RDS Instances

Each reservation is associated to a specific DB Engine, instance class, deployment type, license model and region (NOT AZ). And yes, you can have a reserved read replicas.

If any of these attributes are changed then pricing switches to on-demand pricing. Switch your instance BACK to the specs of the reservation OR create a new instance with the same specs, reservation pricing will apply.

Troubleshooting

Scaling the instance can happen immediately, if the checkbox is checked, or will happen in your maintenance window.

Resizing will result in downtime if in a single AZ configuration. In a multi-az, the standby gets resized, there is a fail-over then the master is resized.

RDS Monitoring

There are some non-critical metrics that can be monitored like CPUUtilization, DatabaseConnections, DiskQueueDepth, FreeableMemory and FreeStorageSpace.

More critical metrics and their solutions include:

Metric Solution
SwapUsage Increase RAM
ReadIOPS/WroteIOPS Increase IOPS
ReadLatency/WriteLatency Increase IOPS
ReadThroughPut/WriteThroughPut Increase IOPS
FreeStorageSpace Increase disk space
ReplicaLag Increase IOPS or instance size

In addition to CloudWatch you can use the RDS service console to monitor RDS events. Event Subscriptions can be created as well.

Triage

Multi-AZ? For DR not scaling, sync replication

Read replica? Scaling not DR; async replication

Index/Query focused data? DynamoDB

Lots of BLOBS? Pointer to S3

Push button scalability? DynamoDB

DB2, Informix, Sybase? EC2

Complete control? EC2