S3

Re-buildable asset? S3-IA (because you can regenerate it!)
Infrequently access (Backup, archive or DR yet still active)? S3-IA
Greater than 4 hours retrieval, cheap (archive, digital preservation)? Glacier Cheap archive but charged for instant retrieval? Glacier instant retrieval Cheap archive but ok with little delay?Glacier Flexible expedited (1-5 minutes), standard (3-5 hours), or bulk (5-12 hours) Pure archive with 12-48 hours? S3 Deep archive (think frozen data)
Change S3 access permissions? role/group with policy (loose permission before assume role) OR bucket policy (to add access permissions using existing role/user/group)
Different access to same bucket? Access points
Protect S3 file from delete or overwrite? enable versioning
Protect S3 bucket deletion? backup bucket to another account
Only certain folks can delete and overwrite? enable MFA delete
Only certain folks can change state of versioning? enable MFA
S3 Cross-region replication? Requires versioning on source bucket; deletes and lifecycle actions not supported
Authenticated upload? Pre-signed upload URLs
Faster download of large objects? range based GET or use CloudFront
High TPS workload? prefix the key name with a hash or reverse timestamp string
Slow file upload speed? S3 Transfer acceleration or multi-part uploads
PUT optimization on a strong network? multi-part uploads of between 25-50MB to max throughput
PUT optimization on a weak network? multi-part uploads of about 10MB to prevent upload restart
bucket is public OR objects are public? Not Trusted Advisor (can’t check for public objects in the bucket) instead use EventBridge/S3 Events OR Config.
WORM data

EBS

Burst-able? gp2/3
temp storage 365+K IOPS? instance store
Shared file storage, no POSIX? EBS io2 multi-attach
Shared file storage, POSIX? EFS
NoSQL? io2/3/express (365K IOPS)
MPP/Hadoop/EMR? D2 3.5 Gib/s
Backup instance store? file backup on mounted EFS/EBS, no snap shotting possible

DataSync/Storage Gateway/Snowball

Sync from on-prem to S3, EFS, EBS FSx? DataSync (agent that includes encryption and integrity checks for NFS or SMB)
Between storage services? DataSync (EFS -> EBS on EC2 instance)
Low end DR? S3 directly using DataSync (Storage Gateway is less performant).
When Storage Gateway Cached? frequently access to small amount of data; less to maintain local so cheaper outlay,
When Storage Gateway Stored? low latency for ALL data with EBS Snapshots; snapshots can be used as a migration tool; better for traditional instance backup
When Virtual Tape Library? fast restore. 1500 tapes; stored on S3.
When Virtual Tape Archive? unlimited storage and OK with a 24 hour turn around time;
When Snowball? more than 2TB of data for a T3 connection, 5TB for a 100MB connection, and 60TB for 1000 Mbps connections.

Database options

ACID, joins, complex queries? RDS
Don’t use RDS for index and query focused, BLOB
Use DynamoDB for indexed, query focused data, automated scalability
prewritten apps, data larger than 400kb, BLOB, relational data? !DynamoDB
DB2, Informix or Sybase? EC2
Complete control of DB? EC2 for

Backup

Need to automate backup? EBS, FSx for Lustre
No need for Automated backup? Redshift, Redis, RDS, FSx
Automate EBS backup? Data Lifecycle Manager (DLM)
Automate all AWS Backup? = AWS Backup (including EBS)

EC2 attached storage

Long term data storage? EBS
data shared between instance fast? memecached or redis
persist data shared between instances? EFS
DB or INTENSE data warehouse server? IOPS/io1
General purpose = gp2
Large I/O including EMR, Kafka, log processing and data warehouse ETL? st1 (sequential data reads)
Super high disk IO? either RAID 0 or RAID 10 EBS, EBS-optimized instances, modern Linux kernel

FSx vs EFS

FSx for windows? SMB/DFS/namespaces windows-based file server supporting AD; can be used for centralized storage for Sharepoint, SQL server, Workspaces or IIS.
HPC applications? FSx Lustre (sub-millisecond latencies, 100 GPS throughput and millions of IOPS; can be loaded from S3)
NFS for Linux boxes? EFS
Linux >7000 file operations per sec? EFS Max I/O
Shared File storage for EC2 multi-threaded linux? General Purpose EFS (burstable)

FSx

No downtime FSx for Windows Single AZ to multi-AZ? Create a multi-AZ file system then use DataSync to migrate the data over.
For downtime ok FSx for Windows Single AZ to multi-AZ? Backup the single-AZ and restore multi-AZ works well.
Reduce size of file system? create smaller version and DataSync
Increase size of file system? increase must be greater than 10%, must have throughput of 16 MB/s, 6 hours after last increase

AWS - Data Storage 03 December 2022

S3

EBS

DataSync/Storage Gateway/Snowball

Database options

Backup

EC2 attached storage

FSx vs EFS

FSx