Amazon Simple Storage Service

S3 is an object based storage and is a key-value based - the key is the file name; the value is the data in the file plus meta data like the Version ID, and access control information. S3 offers Read after Write consistency for PUT (for new objects or over writes) and DELETE. There is no object locking; there is no atomic updates across keys.

The minimum size for a s3 object is 0 bytes; 100 S3 buckets per account; 5Tb object size max. AWS charges you for storage, requests, and data transfer. Bucket names can not start with a ‘.’ or ‘-‘ and cannot be formatted like an IP address.

S3 Storage Classes

  • S3 - AWS guarantees 99.99% availability and 11 x 9 for durability for the S3 Platform.
  • S3-Infrequently Accessed - Less frequently used data that needs rapid access - there is a charge for retrieval; 99.9% availability; minimum charge of 30 days and 128kb regardless of size.
  • S3-One Zone-IA - 99.99% durability & availability - like the name suggests files are only stored in a single AZ. Useful for files that can be regen’d like thumbnails. 128kb minimum size
  • S3 Glacier Instant Retrieval - an object storage capability that is way, way cheaper than S3 Standard but you get charged for instant retrieval. The minimum time in class is 90 days and 128kb minimum size.
  • S3 Glacier Flexible - this was formally known as S3 Glacier - objects appear in a bucket but you must retrieve them to a S3-IA using expedited (1-5 minutes), standard (3-5 hours), or bulk (5-12 hours). Also has a 90 day minimum and 40kb minimum size.
  • S3 Deep archive - think frozen data; retrieval is really slow - standard 12 hours and bulk which is up to 48 hours. Also has a 180 day minimum and 40kb minimum size.
  • S3 Intelligent tiering - uses AI to put the data in the right storage class; slightly modified storage classes with Archive Access and Deep Archive as options

Replication

Cross region replication requires versioning (on both ends) and an IAM role setup. Once replication is turned on, only new/modified objects are replicated; delete markers are not replicated but public permissions are replicated between buckets. Replication can be combined with lifecycle rules. S3 Replication Time Control replicates 99.99% of objects within 15 minutes, with most replicated in seconds.

Performance

Baseline, S3 is capable of 3500 PUT/COPY/POST/DELETE AND 5500 GET/HEAD requests per second per prefix. If you introduce some randomness in your key name prefixes the I/O load will be distributed across more than one partition. Using a sequential prefix, such as time-stamp or an alphabetical sequence, increases the likelihood that Amazon S3 will target a specific partition for a large number of your keys, overwhelming the I/O capacity of the partition.

Multi Part Upload

Multi-part upload allows you to upload parts of an object once broken apart. Multipart uploads are recommended for files >100Mb and required for files over 5 GB. Use a lifecycle policy to remove incomplete uploads; use CLI to list incomplete multi-part uploads.

S3 Transfer Acceleration

S3 Transfer Acceleration uses an edge location ingress point to the AWS network ASAP to forward the data to the right region faster. There is a distinct URL that syncs from the CloudFront edge locations to S3 bucket. This feature is like Global Accelerator but specifically for S3.

Byte-Range Fetch

S3 has a Byte-Range Fetch feature which parallelizes downloads… this can be used for speeding the downloads of an entire file OR partial amounts of a file.

S3 Select & Glacier Select

S3 Select and Glacier Select enables the retrieval of data using simple SQL statements and is similar to Athena except for a single file. This enables you to download just the data you need from a CSV without downloading the entire file.

Analysis

S3 Analytics Storage Class Analysis

S3 Analytics, which might be called Storage Class Analysis on the test, provides recommendations for Standard and Standard IA class objects, but NOT One-Zone IA or Glacier. The reports is updated daily but takes 24-48 hours to start working. Visualize data in QuickSight.

S3 Storage Lens

S3 Storage Lens delivers Organization-wide, as in AWS Organizations, visibility into object storage usage, activity trends, and makes actionable recommendations to optimize costs and apply data protection best practices.

Data can be aggregated across an organization, accounts, regions, and buckets into a default or customer dashboard; the data can also be exported daily to an S3 bucket in CSV or Parquet format. The default dashboard is free, can be disabled but not deleted. For more cash, addition data includes advanced metrics, CloudWatch publishing, and prefix aggregation are available going back 15 months.

S3 Inventory

Use S3 Inventory to audit and report on the replication and encryption status of your objects for business, compliance, and regulatory needs. Feels like the predecessor to S3 Storage Lens but NOT for Organizations.

Access Analyzer for S3

While Access Analyzer is part of IAM, Access Analyzer for S3 provides findings for buckets that can be accessed outside your AWS account including the bucket name, when the find occurred, how it is shared, and what access level there is.

Storage Classes & Lifecycle Rules

S3 Lifecycle Rules are XML documents that are stored as a lifecycle sub-resource attached to the bucket. PUT, GET and DELETE actions can be done with the lifecycle policies. Transitions between classes have minimum times.. sometimes.

  • S3 -> S3-IA - must be here for a minimum of 30 days (and be 128kb) before it can move to glacier
  • S3 -> Glacier - can move here directly after creation if needed
  • S3 -> Permanently Delete - yep, cool stuff this.
  • Can’t move from SA-IA -> RRS or S3
  • Move to Glacier? if 128k (directlyAfterCreation == true   daysInS3-IA > 30 days)
  • Move from Glacier? nope.

Access Control

Buckets are private by default; access control is executed using an IAM policy, a bucket policy, or an access control list (ACL). ACLs are not recommended and should be disabled; if ACLs are enabled who ever creates the object owns it which complicates administration. Access logs and CloudTrail are useful for audit and traceability. Use Access Analyzer to figure out what is shared out of an account.

IAM Policies for S3

S3 IAM policies can NOT grant anonymous access; use pre-signed URLs for this use case. But you can DENY access with IAM Policies.

Bucket Policies

While IAM policies apply to the user level, bucket policies apply at the resource level. Bucket owners can specify what other users can do the contents of the bucket. Use Bucket Policies to grant public access, force encryption, cross account access; conditions include:

  1. Public or Elastic IP (but not Private IP) for Source IP
  2. Source VPC or Source VPC Endpoint
  3. CloudFront Origin
  4. MFA

S3 Bucket ACL

Stored in XML and can be used to manage access to objects NOT owned by the bucket owner. They can not explicitly deny permissions; they grant read and write permissions to other AWS accounts. Bucket ACLs are not recommended; they can be disabled using S3 Object Ownership tool.

Cross account access

Three ways to make this happen:

  1. bucket policy and IAM - entire bucket, programmatic access only.
  2. Cross account IAM role - programmatic and console access

Pre-signed URLs

Pre-signed URL can be used to create an object, retrieve an object, or retrieve meta data about an object. Recipient of URL has same rights as principle that signed; if a process creates the URL it must have access to the resources. Can use SDK/CLI for downloads; must use SDK for uploads. 3600 seconds timeout that can be changed using the expires-in [seconds] option.

To put an object and require SSE-C encryption you use x-amz-server-side​-encryption​-customer-algorithm; to use the rest API use x-amz-server-side​-encryption​-customer-algorithm, x-amz-server-side​-encryption​-customer-key or ‘x-amz-server-side​-encryption​-customer-key-MD5`.

Block Public Access

Use S3 Block Public Access to block Public Access settings to override public policies and permissions so that you can limit public access to these resources.

S3 Access Points

To simplify access to buckets use an Access Point which gets it’s own DNS and policy; use more than one to further simplify things. Policies can restrict traffic from a specific VPC; access points are linked to specific bucket. A bucket can have more than one access point. Privately share bucket in other accounts scenario:

  1. Create S3 Access point in account sharing bucket using “Condition” S3:AccessPointNetworkOrigin: VPC (Restrict traffic to outside the VPC)
  2. Create Gateway Endpoint linking it to the S3 Access point in the shared bucket account
  3. Create Gateway Endpoint Policy enabling access

Data Security

Data in-transit security happens using SSL or TLS. To enforce HTTPS, use the aws:SecureTransport condition in the resource policy. MFA delete is an option that keeps randos from deleting your stuff. Customer can also encrypted on the client side. All Glacier data is AES-256 encrypted using AWS keys.

Data at-Rest - There are three flavors of Server Side Encryption (SSE) Server side encryption can be enabled using the REST API using x-amz-server-side-encryption header.

  • SSE-S3 - keys handled and managed by AWS; each object is encrypted by a unique key with the master key that encrypts itself.
  • SSE-KMS - KMS Managed Keys; provides an audit trail using CloudTrail, the kms:GenerateDataKey permission must be present when writing to S3 (s3:PutObject). KMS decryption has a hard limit for the number of requests possible and the number is region specific; could be 5,500, 10K or 30K per second.
  • Customer Provided Keys (SSE-C) - you provide key; you manage key; you upload key to AWS…

S3 Endpoints

S3 exposes an HTTP and HTTPS public endpoint; HTTPS is mandatory for SSE-C.

S3 Gateway Endpoints

To secure S3 further, enable an S3 Gateway endpoint for private subnets. Use the bucket policy to restrict access on AWS:SourceVpce or AWS:SourceVpc

Endpoints don’t work with NACL but can be protected by an SG, can’t be moved to another VPC, can’t be extended outside of the VPC, and can’t be tagged. This also requires DNS resolution in the VPC. Endpoints can have policies which are more like bucket policies and require a Principle to be defined.

Versioning

Versioning only starts after it is enable and files have been added - existing files end up with a NULL version initially and have an alphanumeric value after update. ALL version of the file are stored so the amount of spaced required can end up being HUGE. Versioning stores all version of an object and can be used with MFA to provide extra layer of DELETE security. Specific versions of files can be made public; later versions of a public file are NOT public. Versioning can be suspended, which keeps all file versions created so far, but versioning can not be turned off for a bucket. When a versioned file is delete a delete flag is enabled.

S3 Website Hosting

The minimum steps to make a website on S3 is to create an index document to your S3 bucket, enable static website hosting in your S3 bucket properties, select the ‘Make Public’ permission for your bucket’s objects or apply a bucket policy. The only allowed domain prefix when creating Route 53 Aliases for S3 static websites is the ‘www’ prefix but you can create Aliases that point to the www alias. S3 buckets do not have https.

CORS rules - The bucket hosting the assets needs CORS configuration - add the domain of the “static” site to fix this up.

S3 Object Lock & Glacier Vault Lock

Object lock is available for S3 and Glacier enabling you to store objects in a write once, read many times model (WORM). There are two modes: governance mode and compliance mode. Governance mode is for when you want to protect data for a period of time but want admin control to be able to over-ride the protection. Compliance mode is protected until the retention date has passed.

Object lock has a retention period, from 1 to no max # of days, and a legal hold. The legal hold must be explicitly removed to change the file.

Access Logging and Events

  • S3 Access logs - delivery takes hours and might be incomplete
  • S3 Event Notifications - seconds to minutes for delivery to SNS, SQS or Lambda; lots of notifications with versioning; it’s possible to get just one notification for simultaneous writes
  • EventBridge - enable CloudTrail object logging on S3 first