S3 is object-based, key-value storage - the key is the file name; the value is the data in the file plus some other data like the Version ID, access control information and metadata. S3 originally offered read-after-write consistency only for PUTs of new objects and eventual consistency for overwrite PUTs and DELETEs; since December 2020 it offers strong read-after-write consistency for all PUT and DELETE requests.
AWS charges you for storage, requests, and data transfer.
Use multipart upload to stop and resume uploads; it is recommended for any file over 100 MB and required for files over 5 GB. A benefit of multipart upload is that you can upload a file as it is being created.
S3 Limits - the minimum size for an S3 object is 0 bytes; 100 S3 buckets per account; 5 TB object size max.
Buckets are private by default; access control is enforced using an IAM policy, a bucket policy, or an access control list (ACL). ACLs are not recommended. Access logs and CloudTrail are useful for audit and traceability.
IAM Policies for S3
S3 IAM policies can NOT grant anonymous access; use pre-signed URLs for this use case. But you can DENY access with IAM policies. The star (*) can be used in a policy as a wildcard, as can template strings (policy variables).
Elements of an Access Policy include:
Principal - who the statement applies to; specific to bucket policies NOT user policies.
Effect - allow or deny
Action - the operations being allowed or denied
Resource - the ARN; the ARN prefix for a bucket policy is “arn:aws:s3:::”
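To make the four elements concrete, here is a minimal sketch that builds a bucket policy as a Python dict and serializes it to JSON. The bucket name `example-bucket`, the `public/` prefix, and the helper name `public_read_policy` are made up for illustration; the statement simply allows anonymous reads under one prefix.

```python
import json


def public_read_policy(bucket, prefix=""):
    """Build a bucket policy that lets anyone GET objects under a prefix.

    The bucket and prefix are hypothetical placeholders.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowPublicRead",
                "Principal": "*",            # who: everyone (bucket policies only)
                "Effect": "Allow",           # allow or deny
                "Action": ["s3:GetObject"],  # which operations
                # which objects: ARN prefix + bucket + key pattern
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }


policy = public_read_policy("example-bucket", "public/")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The resulting JSON is what you would paste into the bucket's policy editor or pass to an API call that sets the bucket policy.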
While IAM policies apply at the user level, bucket policies apply at the resource level. Bucket owners can specify what other users can do with the contents of the bucket - provided they can log into the console. This includes scenarios where others own the object in the bucket - think cross account PUT of an object.
Bucket Policies are:
A JSON file limited to 20 KB
Recommended for managing all cross-account permissions for ALL S3 resources
Possibly conditional - they can use ACL attributes to conditionally grant access to objects
S3 Bucket ACL
Bucket ACLs are stored in XML and can be used to manage access to objects NOT owned by the bucket owner. They cannot explicitly deny permissions; they only grant read and write permissions to other AWS accounts. Bucket ACLs are not recommended.
Cross account access
Three ways to make this happen:
bucket policy and IAM - entire bucket, programmatic access only.
bucket ACL and IAM - entire bucket & individual objects, programmatic access only. (not recommended)
Cross account IAM role - programmatic and console access
Data in-transit security happens using SSL or TLS. MFA delete is an option that keeps randos from deleting your stuff.
Data at-Rest - There are three flavors of Server Side Encryption (SSE). Server side encryption can be enabled through the REST API using the x-amz-server-side-encryption header.
S3 Managed Keys (SSE-S3) - S3 creates and manages the keys for you.
AWS Key Management Service, Managed Keys (SSE-KMS) - this provides an audit trail.
Customer Provided Keys (SSE-C) - you provide key; you manage key; you upload key to AWS…
Customer encrypted - You could encrypt on the client too.
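As a rough sketch of what those request headers look like, the helper below returns the headers for each server-side flavor: S3-managed keys (header value `AES256`), KMS-managed keys (`aws:kms`), and customer-provided keys. The function name `sse_headers` and the mode labels are my own; the header names are the ones the S3 REST API uses. Nothing is sent over the network here.

```python
import base64
import hashlib


def sse_headers(mode, kms_key_id=None, customer_key=None):
    """Return the S3 request headers for one server-side encryption flavor."""
    if mode == "SSE-S3":
        # S3 creates and manages the key.
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:
            # Optional: name a specific KMS key instead of the default one.
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        # The key and its MD5 digest are both sent base64-encoded.
        key_b64 = base64.b64encode(customer_key).decode("ascii")
        md5_b64 = base64.b64encode(hashlib.md5(customer_key).digest()).decode("ascii")
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key": key_b64,
            "x-amz-server-side-encryption-customer-key-MD5": md5_b64,
        }
    raise ValueError(f"unknown mode: {mode}")


print(sse_headers("SSE-KMS"))
```

With SSE-C the same key headers have to accompany every later GET of the object, which is the "you manage key" part.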
To secure S3 further, enable an S3 endpoint for private subnets and create a bucket policy to restrict traffic. This disables the ability for private subnets to reach endpoints outside of the region. Endpoints can’t be used in NACLs but can be used in security groups, can’t be moved to another VPC, can’t be extended outside of the VPC, and can’t be tagged. This also requires DNS resolution in the VPC. Endpoints can have policies, which are more like bucket policies and require a Principal to be defined.
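The "bucket policy to restrict traffic" part could look something like the sketch below: a Deny statement that applies to every request that did not arrive through one particular VPC endpoint, using the `aws:SourceVpce` condition key. The bucket name, the endpoint ID `vpce-1a2b3c4d`, and the helper name are placeholders.

```python
import json


def vpce_only_policy(bucket, vpce_id):
    """Deny all S3 actions on a bucket unless the request came via one VPC endpoint.

    Both arguments are hypothetical example values.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnlessFromEndpoint",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "s3:*",
                # Cover the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


restricted = vpce_only_policy("example-bucket", "vpce-1a2b3c4d")
print(json.dumps(restricted, indent=2))
```

Be careful with a policy like this: because it is a blanket Deny, it also blocks console access to the bucket from outside the VPC.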
S3 Storage Classes
S3 - AWS guarantees 99.99% availability for the S3 Platform and eleven 9s (99.999999999%) of durability.
S3-Infrequently Accessed - Less frequently used data that needs rapid access - there is a retrieval fee; 99.9% availability
S3-One Zone (One Zone-IA) - 99.5% availability - like the name suggests, files are only stored in a single Availability Zone. Useful for files that can be regen’d like thumbnails. This is not the same thing as the older Reduced Redundancy Storage (RRS) class, which offered 99.99% durability.
S3 Intelligent-Tiering - monitors access patterns and automatically moves the data to the right access tier
Glacier - an object storage capability that is mostly offline and way, way cheaper than S3. Retrieval times are configurable from minutes to hours.
S3 Glacier Deep Archive - for data where a 12 hour retrieval time is ok
It might take minutes to hours to retrieve an object from Glacier, so it is mostly suited to stuff you need to store but will rarely, if ever, need to access.
Files moved to Glacier via a lifecycle policy can only be managed using the S3 interface/API.
Normally, Glacier stores data in archives which can range in size from 1 byte to 40 terabytes. The largest archive that can be uploaded in a single upload request is 4 gigabytes - this would be smaller than regular S3. It’s not possible to write to a vault from the console.
Versioning stores all versions of an object and can be used with MFA to provide an extra layer of DELETE security. ALL versions of the file are stored, so the amount of space required can end up being HUGE. Versioning only applies to files added after it is enabled - existing files end up with a NULL version. Specific versions of files can be made public; later versions of a public file are NOT public. Versioning can be suspended, which keeps all file versions created so far, but versioning cannot be turned off for a bucket. When a versioned file is deleted, a delete marker is added.
Storage Classes & Lifecycle Rules
S3 lifecycle Rules - These are XML documents that are stored as a lifecycle sub-resource attached to the bucket.
DELETE actions can be done with lifecycle policies. Transitions between classes have minimum times... sometimes.
S3 -> S3-IA - an object must be here for a minimum of 30 days (and be at least 128 KB) before it can move to Glacier
S3 -> Glacier - can move here directly after creation if needed
S3 -> Permanently Delete - yep, cool stuff this.
Can’t move from S3-IA -> RRS or S3
Move to Glacier? Yes, either directly after creation or after more than 30 days in S3-IA (where objects are at least 128 KB).
Move from Glacier? nope.
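Here is a rough sketch of one lifecycle rule and a tiny checker for the timing rules. The rule is written as the dict shape that boto3's `put_bucket_lifecycle_configuration` accepts (an assumption about tooling; the notes only say the rules are stored as XML). The `logs/` prefix, the rule ID, and `check_transitions` are made up. The checker also assumes the AWS-documented constraint that an object must be at least 30 days old before moving to S3-IA, which the notes above do not spell out.

```python
lifecycle = {
    "Rules": [
        {
            "ID": "archive-then-expire",        # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},      # hypothetical key prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 60, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},        # the permanent delete step
        }
    ]
}


def check_transitions(rule):
    """Return a list of problems with a rule's transition timings."""
    days = {t["StorageClass"]: t["Days"] for t in rule.get("Transitions", [])}
    ia = days.get("STANDARD_IA")
    glacier = days.get("GLACIER")
    problems = []
    if ia is not None and ia < 30:
        problems.append("STANDARD_IA transition must be at least 30 days after creation")
    if ia is not None and glacier is not None and glacier - ia < 30:
        problems.append("GLACIER transition must be at least 30 days after the STANDARD_IA one")
    return problems


print(check_transitions(lifecycle["Rules"][0]))
```

A rule with only a GLACIER transition passes the check at any number of days, matching the "directly after creation" case.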
Bucket Names & Such
Bucket names cannot start with a ‘.’ or ‘-’ and cannot be formatted like an IP address.
Files must be stored in buckets, and bucket names share a universal namespace.
Using a sequential prefix, such as a time-stamp or an alphabetical sequence, increases the likelihood that Amazon S3 will target a specific partition for a large number of your keys, overwhelming the I/O capacity of the partition. If you introduce some randomness in your key name prefixes, the I/O load will be distributed across more than one partition. You can use this approach to get 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix.
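One common way to introduce that randomness is to prepend a few characters of a hash of the key, so the prefix is scattered but still reproducible from the original name. This is only a sketch: the function name `hashed_key`, the 4-character prefix length, and the sample key names are all arbitrary choices.

```python
import hashlib


def hashed_key(key, prefix_len=4):
    """Prepend a short, deterministic hash prefix to spread keys across prefixes."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"


# Sequential, timestamp-style names end up under scattered prefixes.
sequential = [f"2019-06-01-{n:04d}.log" for n in range(100)]
spread = [hashed_key(k) for k in sequential]
print(spread[:3])
```

Because the prefix is derived from the key itself, a reader can recompute it and does not need a lookup table to find the object again.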
S3 also has a byte-range fetch which parallelizes downloads… this can be used for speeding the downloads of an entire file OR partial amounts of a file.
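A byte-range fetch is just a GET with an HTTP `Range` header, so parallelizing a download mostly comes down to slicing the object size into ranges. A minimal sketch, with the function name and chunk size chosen for illustration:

```python
def byte_ranges(size, chunk):
    """Split an object of `size` bytes into HTTP Range header values."""
    if size < 0 or chunk <= 0:
        raise ValueError("size must be >= 0 and chunk must be > 0")
    # Range ends are inclusive, hence the "- 1".
    return [
        f"bytes={start}-{min(start + chunk, size) - 1}"
        for start in range(0, size, chunk)
    ]


print(byte_ranges(10, 4))
```

Each value can be sent as the `Range` header of its own GET request, and the responses are written back at the matching offsets. Requesting a single range is how you grab just part of a file.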
KMS decryption has a hard limit for the number of requests possible and the number is region specific; could be 5,500, 10K or 30K per second.
S3 & Glacier Select
S3 Select and Glacier Select enable the retrieval of data using simple SQL statements and are similar to Athena, except that they work on a single file. This enables you to download just the data you need from a CSV without downloading the entire file.
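To give a feel for the "simple SQL" part, the sketch below builds the request parameters in the shape that boto3's `select_object_content` call takes (using boto3 is my assumption; the notes do not name a client). The bucket, key, and column names are invented, and no request is actually sent.

```python
def select_request(bucket, key, region):
    """Build S3 Select parameters that pull two columns for one region from a CSV.

    Bucket, key, and the column names (region, total) are hypothetical.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        # S3Object is the fixed table name for the object being queried.
        "Expression": (
            "SELECT s.region, s.total FROM S3Object s "
            f"WHERE s.region = '{region}'"
        ),
        # Treat the first CSV line as column names so s.region works.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


params = select_request("example-bucket", "reports/sales.csv", "EMEA")
print(params["Expression"])
```

In real code the region value should come from a trusted source, since it is pasted straight into the SQL string.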
S3 Website Hosting
The minimum steps to make a website on S3 are to upload an index document to your S3 bucket, enable static website hosting in your S3 bucket properties, and either select the ‘Make Public’ permission for your bucket’s objects or apply a bucket policy. The only allowed domain prefix when creating Route 53 Aliases for S3 static websites is the ‘www’ prefix, but you can create Aliases that point to the www alias.
S3 website endpoints do not support HTTPS
CORS rules - The bucket hosting the assets needs CORS configuration - add the domain of the “static” site to fix this up.
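A CORS configuration for the asset bucket might look like the sketch below, written as the dict shape boto3's `put_bucket_cors` accepts (again an assumption about tooling). The site origin is a placeholder, and the allowed methods and max age are just plausible defaults for read-only assets.

```python
def cors_config(site_origin):
    """Build a CORS configuration that lets one site read assets from the bucket."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": [site_origin],    # the domain of the "static" site
                "AllowedMethods": ["GET", "HEAD"],  # read-only access
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,              # how long browsers cache the preflight
            }
        ]
    }


config = cors_config("http://www.example.com")
print(config)
```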
S3 Transfer Acceleration
Edge locations used by CloudFront are not just for download acceleration. S3 Transfer Acceleration uses the AWS backbone to speed up uploads. This costs more money. Uploads go to a distinct URL, and the data travels from the CloudFront edge location to the S3 bucket.
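The distinct URL follows the pattern `bucketname.s3-accelerate.amazonaws.com`. A small sketch that builds it (the function name and sample values are mine); it rejects bucket names containing periods, since Transfer Acceleration does not support them:

```python
from urllib.parse import quote


def accelerate_url(bucket, key):
    """Build the Transfer Acceleration URL for an object."""
    if "." in bucket:
        raise ValueError("Transfer Acceleration does not support bucket names with periods")
    # quote() keeps "/" so the key's folder-like structure is preserved.
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{quote(key)}"


print(accelerate_url("example-bucket", "videos/big file.mp4"))
```

The bucket also needs Transfer Acceleration enabled before this endpoint will accept requests.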
Cross region replication
Cross region replication requires versioning (on both ends) and an IAM role setup. Once replication is turned on, only new/modified objects are replicated; delete markers are not replicated but public permissions are replicated between buckets.
S3 Object Lock
Object lock is available for S3 and Glacier, enabling you to store objects in a write once, read many (WORM) model. There are two modes: governance mode and compliance mode. Governance mode is for when you want to protect data for a period of time but want admins to be able to override the protection. In compliance mode, the object is protected until the retention date has passed and nobody can override it.
Object lock has a retention period, from 1 to no max # of days, and a legal hold. The legal hold must be explicitly removed to change the file.
Multi Part Upload
The multi-part upload API allows you to upload an object in parts once it has been broken apart. Multipart uploads are recommended for files over 100 MB and required for files over 5 GB.
As a file/object is being created, the multi-part upload API will allow you to upload the file to S3. Only after all parts of the object have been uploaded do you execute the CompleteMultipartUpload API call which completes a multi-part upload by assembling previously uploaded parts.
You first initiate the multi-part upload and then upload all parts using the Upload Parts operation (see Upload Part). After successfully uploading all relevant parts of an upload, you call CompleteMultipartUpload to complete the upload. Upon receiving this request, Amazon S3 concatenates all the parts in ascending order by part number to create a new object. In the Complete Multi-part Upload request, you must provide the parts list. You must ensure the parts list is complete, because this operation concatenates only the parts you provide in the list. For each part in the list, you must provide the part number and the ETag header value returned after that part was uploaded.
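The bookkeeping around those steps can be sketched without touching the network: split a file size into numbered parts, then build the parts list for the completion call from the ETags collected along the way. The function names are mine, and the 5 MiB minimum part size and 10,000-part cap are AWS limits that are not mentioned in the notes above. The payload shape matches what boto3's `complete_multipart_upload` takes as `MultipartUpload`.

```python
MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB   # every part except the last must be at least this big
MAX_PARTS = 10_000        # S3 caps a multipart upload at 10,000 parts


def plan_parts(total_size, part_size=100 * MIB):
    """Return (part_number, offset, length) tuples covering total_size bytes."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    parts = []
    offset = 0
    number = 1  # part numbers start at 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; use a larger part_size")
    return parts


def completion_payload(etags_by_part):
    """Build the parts list for the completion call, in ascending part order."""
    return {
        "Parts": [
            {"PartNumber": n, "ETag": etags_by_part[n]}
            for n in sorted(etags_by_part)
        ]
    }


print(plan_parts(250 * MIB))
print(completion_payload({2: '"etag-b"', 1: '"etag-a"'}))
```

Parts can be uploaded in any order or in parallel; only the final list needs to be sorted, which is why the ETags are keyed by part number here.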
S3 API Reference Points
Bucket API (s3api)
Seeing that S3 is a REST API, all the operations… well, they are REST.
|S3 Bucket API Method||Notes|
|DELETE||Bucket, CORS, website, etc.|
|get/put bucket-versioning||enable/suspend bucket-versioning|
|get/put bucket ACL||Bucket, CORS, website, etc.|
Object API (s3)
This section of the API is REST-like and almost mirrors the Linux command line.
|S3 Object API||Notes|
||single file or using a DELETE Object|
||Object, Object ACL, Object Torrent|
||get metadata (might return a 404 or 403 error)|
||writes an object to S3 from a form (can also copy the file)|
||make bucket & remove bucket|
||creates a website|
||move bucket & remove|
Return Codes from API
200 code is success!… but there are other codes that are NOT success. Common errors are handled with HTTP response codes - which makes sense seeing that we are using a REST API here.
|S3 API HTTP Responses||Notes|
||AccessDenied or AccountProblem|
||BucketNotEmpty - you tried to delete a bucket that was not empty.|
||InvalidBucketName, InlineDataTooLarge, InvalidPart/InvalidPartOrder, TooManyBuckets|