S3 is object-based storage and is key-value based - the key is the file name; the value is the data in the file plus some other data like the version ID, access control information and metadata. S3 offers read-after-write consistency for PUTs of new objects but only eventual consistency for overwrite PUTs and DELETEs.
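
As a quick illustration of the key-value model, here is a minimal boto3 sketch; the bucket name, key and metadata are hypothetical:

    import boto3

    s3 = boto3.client("s3")

    # PUT: the key is the "file name"; the value is the bytes plus metadata.
    s3.put_object(
        Bucket="my-example-bucket",
        Key="reports/2017/summary.txt",
        Body=b"hello, s3",
        Metadata={"owner": "team-a"},  # user-defined metadata stored with the object
    )

    # GET: a brand-new key is immediately readable (read-after-write for new PUTs).
    obj = s3.get_object(Bucket="my-example-bucket", Key="reports/2017/summary.txt")
    print(obj["Body"].read(), obj["Metadata"], obj.get("VersionId"))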

AWS charges you for storage, requests and data transfer.

Use multipart upload to stop and resume uploads - it is recommended for any file over 100 MB and required for files over 5 GB. A benefit of multipart upload is that you can upload a file as it is being created.

S3 Limits - the minimum size for an S3 object is 0 bytes; 100 S3 buckets per account (by default); 5 TB maximum object size.

Security

Buckets are private by default; access control is enforced using a bucket policy and/or an access control list (ACL). Access logs and CloudTrail are useful for audit and traceability.

Bucket Policies

While IAM policies apply at the user level, bucket policies apply at the resource level. Bucket owners can specify what other AWS users - including users from other accounts - can do to the contents of the bucket. This includes scenarios where others own the object in the bucket - think cross-account PUT of an object.

Bucket Policies are:

  1. Resource Based

  2. A JSON file limited to 20kb

  3. Should be used to manage all cross-account permissions for ALL S3 resources - ACLs cannot express detailed permissions.

  4. Possibly conditional - they can use ACL attributes to conditionally grant access to objects (see the sketch below)
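
To sketch points 1-4 at once, here is a hypothetical resource-based policy that conditionally allows a cross-account PUT, applied with boto3 (the account ID and bucket name are placeholders):

    import json
    import boto3

    s3 = boto3.client("s3")

    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "CrossAccountPutWithFullControlAcl",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},  # the other account
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-example-bucket/*",
            # Point 4: conditional grant keyed off an ACL attribute.
            "Condition": {"StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}},
        }],
    }

    # The JSON document must stay under the 20 KB limit noted above.
    s3.put_bucket_policy(Bucket="my-example-bucket", Policy=json.dumps(policy))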

S3 ACL

Stored as XML, ACLs can be used to manage access to objects NOT owned by the bucket owner. They cannot explicitly deny permissions; they grant read and write permissions to other AWS accounts.
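
For example, granting (never denying) read access on one object to another account via its ACL - a sketch, with a placeholder canonical user ID:

    import boto3

    s3 = boto3.client("s3")

    # ACLs grant; they cannot explicitly deny.
    s3.put_object_acl(
        Bucket="my-example-bucket",
        Key="reports/2017/summary.txt",
        GrantRead='id="79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be"',
    )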

S3 IAM Policies

S3 IAM policies can NOT grant anonymous access; use pre-signed URLs for that use case. You can, however, DENY access with IAM policies. The star (*) can be used in a policy as a wildcard, as can policy variables like ${aws:username}.
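
For the anonymous-access case, a pre-signed URL sketch (bucket and key are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    # Anyone holding this URL can GET the object until it expires -
    # no AWS credentials required on their side.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-example-bucket", "Key": "reports/2017/summary.txt"},
        ExpiresIn=3600,  # seconds
    )
    print(url)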

Elements of an Access Policy include:

  1. Principal - specific to bucket policies, NOT user policies.

  2. Effect - allow or deny

  3. Action - the API operations being allowed or denied

  4. Resource - the ARN; the prefix for a bucket policy is “arn:aws:s3:::” (all four elements appear in the sketch below)
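
Putting the four elements together in one hypothetical statement:

    statement = {
        "Principal": {"AWS": "arn:aws:iam::111122223333:user/alice"},  # bucket policies only
        "Effect": "Allow",                                             # or "Deny"
        "Action": ["s3:GetObject"],                                    # the operations covered
        "Resource": "arn:aws:s3:::my-example-bucket/*",                # note the arn:aws:s3::: prefix
    }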

Data Security

Data in-transit is secured using SSL or TLS.

Data at-Rest - there are three flavors of Server Side Encryption (SSE). Server-side encryption can be enabled through the REST API using the x-amz-server-side-encryption header (see the sketch after this list).

  • S3 Managed Keys (SSE-S3) - no audit trail. Can’t imagine this gets used much.

  • AWS Key Management Service, Managed Keys (SSE-KMS) - this provides an audit trail.

  • Customer Provided Keys (SSE-C) - you provide key; you manage key; you upload key to AWS…

  • Customer encrypted - You could encrypt on the client too.
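
A per-object sketch of SSE-S3 and SSE-KMS; boto3 surfaces the x-amz-server-side-encryption header as a parameter (bucket, keys and the KMS alias are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    # SSE-S3: S3-managed keys, no audit trail.
    s3.put_object(Bucket="my-example-bucket", Key="a.txt", Body=b"...",
                  ServerSideEncryption="AES256")

    # SSE-KMS: same header, different value, plus an optional key id - audited via KMS.
    s3.put_object(Bucket="my-example-bucket", Key="b.txt", Body=b"...",
                  ServerSideEncryption="aws:kms",
                  SSEKMSKeyId="alias/my-app-key")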

S3 Endpoints

To secure S3 further, enable an S3 endpoint for private subnets and create a bucket policy to restrict traffic. This disables the ability for private subnets to reach endpoints outside of the region. Endpoints can't be referenced in NACLs but can be in security groups; they can't be moved to another VPC, can't be extended outside of the VPC and can't be tagged. This also requires DNS resolution in the VPC. Endpoints can have policies, which look much like bucket policies and require a Principal to be defined (see the sketch below).
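
A sketch of creating such an endpoint with a restrictive endpoint policy (IDs, region and bucket name are hypothetical):

    import json
    import boto3

    ec2 = boto3.client("ec2")

    ec2.create_vpc_endpoint(
        VpcId="vpc-0123456789abcdef0",
        ServiceName="com.amazonaws.us-east-1.s3",
        RouteTableIds=["rtb-0123456789abcdef0"],  # private subnet route tables
        PolicyDocument=json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Sid": "OnlyMyBucket",
                "Principal": "*",  # endpoint policies require a Principal
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": ["arn:aws:s3:::my-example-bucket",
                             "arn:aws:s3:::my-example-bucket/*"],
            }],
        }),
    )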

S3 Storage Classes

  • S3 - AWS guarantees 99.99% availability for the S3 platform and eleven 9s (99.999999999%) of durability.

  • S3-Infrequently Accessed - less frequently used data that needs rapid access - there is a retrieval fee; 99.9% availability

  • S3-Reduced Redundancy Storage - 99.99% durability & availability - for files that can be regen’d (see the per-object sketch after this list).

  • Glacier - an object storage capability that is mostly offline and way, way cheaper than S3.
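
The storage class is chosen per object at write time; a sketch using the class names from this era (bucket and keys are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    # RRS for a file that can be regenerated.
    s3.put_object(Bucket="my-example-bucket", Key="derived/thumb.jpg",
                  Body=b"...", StorageClass="REDUCED_REDUNDANCY")

    # IA for data read rarely but needed quickly when it is.
    s3.put_object(Bucket="my-example-bucket", Key="archive/old-report.csv",
                  Body=b"...", StorageClass="STANDARD_IA")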

Glacier

It might take hours (like 3-5 hours) to retrieve an object from Glacier so it is mostly suited towards stuff you need to store but will rarely, if ever, need to access.

Files moved to Glacier via a lifecycle policy can only be managed using the S3 interface/API.

Normally, Glacier stores data in archives, which can range in size from 1 byte to 40 terabytes. The largest archive that can be uploaded in a single Upload request is 4 gigabytes - smaller than the 5 GB single-PUT limit in regular S3.

Retrieval takes 3-5 hours and places an RRS copy of the object in S3, with the original remaining in Glacier.
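
A retrieval sketch - restore_object stages the temporary copy, and head_object reports progress (names are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    # Stage a temporary copy back into S3 for 7 days; the archive stays in Glacier.
    s3.restore_object(
        Bucket="my-example-bucket",
        Key="archive/old-report.csv",
        RestoreRequest={"Days": 7},
    )

    # The "Restore" field shows whether the restore is still in progress.
    print(s3.head_object(Bucket="my-example-bucket",
                         Key="archive/old-report.csv").get("Restore"))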

Versioning

Stores all versions of an object; can be suspended but not disabled for a bucket; can be used with MFA to provide an extra layer of DELETE security. Cross-region replication requires versioning and an IAM role setup. ALL versions of the file are stored, so the amount of space required can end up being HUGE. Versioning only starts after it is enabled and files have been added - existing files end up with a NULL version.
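
A sketch of turning versioning on, plus the MFA Delete variant (bucket, device serial and code are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    # Enable versioning; it can later be Suspended, never removed.
    s3.put_bucket_versioning(
        Bucket="my-example-bucket",
        VersioningConfiguration={"Status": "Enabled"},
    )

    # MFA Delete: destructive version operations then require a device token.
    # The MFA string is "<device serial> <current code>".
    s3.put_bucket_versioning(
        Bucket="my-example-bucket",
        MFA="arn:aws:iam::111122223333:mfa/root-device 123456",
        VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
    )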

Storage Tiers & Lifecycle Policies

S3 Lifecycle Policy - these are XML documents that are stored as a lifecycle sub-resource attached to the bucket. PUT, GET and DELETE actions can be performed on lifecycle policies. Transitions between classes have minimum times... sometimes (see the sketch after this list).

  • S3 -> S3-IA - objects must be at least 128 KB, and must sit here for a minimum of 30 days before they can move on to Glacier

  • S3 -> Glacier - can move here directly after creation if needed

  • S3 -> Permanently Delete - yep, cool stuff this.

  • Can’t move from S3-IA -> RRS or S3

  • Move to Glacier? Yes - either directly after creation, or once an object has spent 30+ days in S3-IA
  • Move from Glacier? nope.
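
A lifecycle sketch encoding the transitions above (bucket, rule ID and prefix are hypothetical; boto3 takes a dict even though the stored sub-resource is XML):

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="my-example-bucket",
        LifecycleConfiguration={"Rules": [{
            "ID": "log-tiering",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # 30-day / 128 KB minimums apply
                {"Days": 60, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},  # permanent delete
        }]},
    )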

Bucket Names & Such

Bucket names cannot start with a ‘.’ or ‘-’ and cannot be formatted like an IP address.

Files must be stored in buckets, and bucket names occupy a universal namespace.

Using a sequential prefix, such as a time-stamp or an alphabetical sequence, increases the likelihood that Amazon S3 will target a specific partition for a large number of your keys, overwhelming the I/O capacity of the partition. If you introduce some randomness in your key name prefixes, the key names, and therefore the I/O load, will be distributed across more than one partition.
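
A common trick is hashing the natural key and using a few hex characters as the prefix; a small sketch:

    import hashlib

    def randomized_key(natural_key):
        """Prepend a short hash so sequential names spread across partitions."""
        prefix = hashlib.md5(natural_key.encode()).hexdigest()[:4]
        return prefix + "/" + natural_key

    # Sequential timestamps now land under different partition prefixes:
    print(randomized_key("2017-01-01-12-00-01.log"))
    print(randomized_key("2017-01-01-12-00-02.log"))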

S3 Website Hosting

The minimum steps to make a website on S3 are: add an index document to your S3 bucket, enable static website hosting in your S3 bucket properties, and either select the ‘Make Public’ permission for your bucket’s objects or apply a bucket policy. The only allowed domain prefix when creating Route 53 aliases for S3 static websites is the ‘www’ prefix, but you can create aliases that point to the www alias.
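
A sketch of the static-website-hosting step done through the API rather than the console (bucket and document names are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_website(
        Bucket="www.example.com",  # for Route 53 aliases, the name matches the hostname
        WebsiteConfiguration={
            "IndexDocument": {"Suffix": "index.html"},
            "ErrorDocument": {"Key": "error.html"},
        },
    )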

S3 static website endpoints do not support HTTPS.

CORS rules - the bucket hosting the assets needs a CORS configuration - add the domain of the “static” site to fix this up.
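
A CORS sketch allowing the static site’s domain to fetch assets (bucket and origin are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_cors(
        Bucket="my-assets-bucket",
        CORSConfiguration={"CORSRules": [{
            "AllowedOrigins": ["https://www.example.com"],  # the "static" site
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }]},
    )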

S3 Transfer Acceleration

Edge locations used by CloudFront are not just for download acceleration: S3 Transfer Acceleration uses the AWS backbone to speed up uploads. This costs more money. There is a distinct URL that syncs from the CloudFront edge locations to the S3 bucket (see the sketch below).
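
Enabling acceleration is a single bucket-level call; a sketch (bucket name is hypothetical):

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_accelerate_configuration(
        Bucket="my-example-bucket",
        AccelerateConfiguration={"Status": "Enabled"},
    )
    # Uploads then target the distinct accelerated hostname:
    #   my-example-bucket.s3-accelerate.amazonaws.com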

Multi Part Upload

The multipart upload API allows you to upload an object in parts once it has been broken apart. As a file/object is being created, the multipart upload API will allow you to upload the file to S3. Only after all parts of the object have been uploaded do you execute the CompleteMultipartUpload API call, which completes a multipart upload by assembling the previously uploaded parts.

You first initiate the multipart upload and then upload all parts using the Upload Part operation (see Upload Part). After successfully uploading all relevant parts of an upload, you call this operation to complete the upload. Upon receiving this request, Amazon S3 concatenates all the parts in ascending order by part number to create a new object. In the Complete Multipart Upload request, you must provide the parts list. You must ensure the parts list is complete; this operation concatenates only the parts you provide in the list. For each part in the list, you must provide the part number and the ETag header value returned after that part was uploaded.
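
Tying the three steps together in a boto3 sketch - read_chunks is a small local helper, and all names are hypothetical:

    import boto3

    def read_chunks(path, size=8 * 1024 * 1024):  # 8 MB; every part but the last must be >= 5 MB
        with open(path, "rb") as f:
            while True:
                chunk = f.read(size)
                if not chunk:
                    return
                yield chunk

    s3 = boto3.client("s3")
    bucket, key = "my-example-bucket", "big/archive.bin"

    # 1. Initiate.
    upload = s3.create_multipart_upload(Bucket=bucket, Key=key)

    # 2. Upload parts, recording each part number and ETag.
    parts = []
    for number, chunk in enumerate(read_chunks("archive.bin"), start=1):
        resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=number,
                              UploadId=upload["UploadId"], Body=chunk)
        parts.append({"PartNumber": number, "ETag": resp["ETag"]})

    # 3. Complete: S3 concatenates parts in ascending PartNumber order.
    s3.complete_multipart_upload(
        Bucket=bucket, Key=key, UploadId=upload["UploadId"],
        MultipartUpload={"Parts": parts},
    )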

S3 API Reference Points

Bucket API (s3api)

Seeing that S3 is a REST API, all the operations… well, they are REST.

  • DELETE - Bucket, CORS, website, etc.
  • get/put bucket-versioning - enable/suspend bucket versioning
  • get/put - bucket ACL, CORS, website, etc.
  • head-bucket & head-object - does the bucket/object exist?
  • put-bucket-notification-configuration - add a notification

Object API (s3)

This section of the API is REST-like and almost mirrors the linux command line.

  • DELETE - delete a single file using DELETE Object
  • GET - Object, Object ACL, Object Torrent
  • HEAD - get metadata (might return a 404 or 403 error)
  • POST - writes an object to S3 from a form (can also copy the file)
  • mb & rb - make bucket & remove bucket
  • website - creates a website
  • mv & rm - move & remove files
  • sync - sync locations

Error Codes from API

Common Errors - S3 handles error codes with HTTP response codes - which makes sense seeing that we are using a REST API here.

  • 403 Forbidden - AccessDenied or AccountProblem
  • 409 Conflict - BucketNotEmpty: you tried to delete a bucket that was not empty
  • 400 Bad Request - InvalidBucketName, InlineDataTooLarge, InvalidPart/InvalidPartOrder, TooManyBuckets
  • 404 Not Found - NoSuchBucket, NoSuchBucketPolicy
  • 200 OK - Yeah! All is good, continue about your day!
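
With boto3, those codes surface on botocore’s ClientError; a small sketch (bucket and key are hypothetical):

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")

    try:
        s3.get_object(Bucket="my-example-bucket", Key="missing.txt")
    except ClientError as err:
        print(err.response["ResponseMetadata"]["HTTPStatusCode"],  # e.g. 404
              err.response["Error"]["Code"])                       # e.g. NoSuchKey or AccessDenied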