AWS Elastic Compute Cloud (EC2) is the grand-pappy of cloud computing products. In a nutshell, EC2 enables you to create any number of virtual computers with a huge variety of software preloaded on images.
There are several purchase options for EC2:
- On-Demand Instances - Most expensive and most flexible.
- Spot Instances - you set a max price and if the prices rises about your instance gets nuked; a Spot instance runs as long as its capacity is available and your bid price is higher than the Spot price… so expect that it will get terminated. If Amazon terminates then you don’t get charged for partial; if you terminate you DO get charged. Persistent instances always get recreated after termination, given the price is actionable, while one-time only run the first time the price is actionable.
- Dedicated Hosts - an entire physical server in an AWS datacenter dedicated to your use with targeted placement; enables visibility of cores, sockets and host ID and enables you to run MS, SUSE, REHL and other software bound to VMs, sockets or physical cores. Detailed reporting for the instance is provided by AWS Config. Use the
--affinity hostCLI command to make sure placement is on the dedicated host.
- Dedicated Instances - Instances that run on hardware that is dedicated to a single user.
- Free Instance - free, yet small, and mostly
On-Demand Capacity Reservation
On-Demand Capacity Reservation is an EC2 offering that lets you create and manage reserved capacity on Amazon EC2. You can create a Capacity Reservation by choosing an Availability Zone and quantity (number of instances) along with other instance specifications such as instance type and tenancy. Once created, the EC2 capacity is held for you regardless of whether you run the instances or not.
Reserved Instances have changed a lot over the last several years. You can still get a contact for RI with 1 or 3 year terms and the best deal is still the all-up-front payment method. There is a marketplace for selling them if needed. Easy to renew; simply queue up an RI purchase whenever the previous one expires.
- Scheduled RI - predictable reserved capacity for parts of a day, week or month
- Standard RI - up to 75% off on-demand pricing; it is possible to modify the AZ, network platform, or instance size not instance type or region; changes can be made to the RI based if the Normalization Factor, which is the “Footprint” of the instance, is the same or less. Often is a better approach to buy more capacity than you need and distribute it across instances. Does not apply to SUSE or RHEL.
- Convertible RI - up to 45% off on-demand pricing but can be converted to the same or more expensive instance type and change OS, tenancy & payment options but limited to a specific region AND a three year term. Convertible RI can’t sell on the Reserve MarketPlace yet.
Each instance type offers different compute, memory, and storage capabilities and are grouped in instance families based on these capabilities. Instances can also be classified as burstable or consistent. The new mnemonic is DR. MCGIFT PX
|R||RAM||Memory Intensive Apps/DB|
|C||Compute||CPU intensive apps|
|G||Graphics||Video Encoding/3D Apps|
|T||Burstable Cheapness||Web servers/Small DB|
|P||Picts (graphics)||Machine Learning, Bitcoin|
|X||Extreme memory||HANA, Spark|
Useful for combining on-demand and spot instances and can have a mix of instances types and support EC2, ECS, ASG, and Batch. Fleets are created from Launch Pools (instance type, OS, AZ) using a capacity or max cost limit. Spot instances can be selected by
Placement groups have become more complicated; there are three types now! And one critical thing has changed - you can have multiple instances types in a group and you can add an existing instance, in a stopped state, to a placement group using the CLI
Clustered Placement groups
A Clustered group is a logical group of instances within a single AZ that participate in a low latency 10Gbps network. The name for the group must be unique within your account and only certain types of instances can be launched in a placement group. Ideally only ONE type of instance per placement group. Placement groups can span subnets and VPC; in the case of spanning VPC the placement group may not have full bandwidth. Use Enhanced Networking instance types!
Partitioned Placement groups
Partitioned placement groups are designed to have can have multiple instances per rack across many racks. Think: multiple instances per rack for use cases like HDFS, HBase and Cassandra.
Spread Placement groups
Spread Placement Groups are on distinctly different hardware and can be deployed across availability zones since they spread the instances further apart. Think: one critical EC2 instance per rack. Limited to 7 instances per AZ per placement group.
Enhanced networking uses SR-IOV (which reduces CPU interrupts therefore improving overall performance). Some instances use Intel 82599 Virtual Function (10 Gbps) or Elastic Network Adapter (20 Gbps). This feature requires both an HVM AMI and the kernel module ixgbevf. Amazon Linux has ixgbevf on by default but other distros require configuration. Generally HPC requires enhanced networking require jumbo frames (up to 9000 bytes of data) instead of ethernet frames of 1500 bytes.
EBS-optimized instances enable you to get consistently high performance for your EBS volumes by eliminating contention between Amazon EBS I/O and other network traffic from your instance with options between 500 Mbps and 12,000 Mbps, depending on the instance type you use. The last several generations of instances (T3+, M4+, C4+) have this on by default. The EBS volume comes from a snapshot of an EBS volume.
Instance storage is physically attached to the host but ephemeral; Root devices are deleted by default with the instance is terminated but this can be changed. Instance storage is loaded from a template in S3 and can only be created at the time of launch. EC2 with instance storage can not be stopped.
Hibernation stores the contents of RAM to the EBS root volume. Hibernation is available for instances with less than 150 GB on Windows, AL2, and Ubuntu. When restarted in the next 60 days, the instance retains the same data volumes and instance ID. This must be enabled at instance creation (so additional storage is allocated on the root volume and the root volume can/must be encrypted) and is basically an additional Stop behavior.
Tenancy for instances is follows:
- Default - shared hardware; can’t be changed after launch
- Dedicated - single tenant hardware; can be changed to Host
- Host - Instance runs on dedicated host; can be changed to Dedicated
If the VPC is set up dedicated, then all instances are dedicated; can’t be changed at launch. In this case, the tenancy of an instance can only be change between variants of dedicated tenancy hosting.
Performance for EBS is measured in IOPS which is a complicated metric. For SSDs, IOPS are measured in 256 Kibibytes, because they are so good a small, non-sequential reads. For HDDs, IOPS are measured in 1024 Kibibyte chunks, because the have better performance with large sequential reads.
The root device on an instance can be encrypted. Previously you had to create an instance, encrypt the volume while snap shotting the volume, create an image using the snapshot THEN relaunch an instance using that image. Now all you need to do is set the encryption flag when you create the instance.
Multiple IP addresses can be assigned to an interface for use cases including multiple SSL certs (one IP required per cert), might need to move one of the addresses, multiple subnets.
SG are attached to an interface NOT an IP address. You can’t move a primary interface but can move secondary interfaces. ENI have a fixed MAC address.
HVM - Emulates a bare-metal experience for virtualization; originally very slow but improved after CPU vendors added support for virtualization instructions and soon network & I/O as well. Enhanced networking features generally require HVM. Can run for Windows.
PV - Short for ParaVirtualization is the previous version of virtualization. Uses a special boot loader called PV-GRUB and does not run Windows. In general, poor performance is the name of the game yet attractive spot pricing can be attractive.
Status checks run every 5 minutes. If an instance fails a status check it is marked as
- System Status Checks - Checks the host. Generally networking, power, hardware and system software problems that can be resolved by stopping and restarting the instance or contacting AWS. Stop and start the VM to fix this issue OR configure the CloudWatch alarm monitor
StatusCheckFailed_Systemto recover the instance.
- Instance Status Checks - Checks the instance. Flags configuration problems, out of memory, corrupted file system, or incompatible kernel. AWS does NOT help you with these sort of problems. Reboot the machine to fix these issues.
EC2 instances report CloudWatch metric every 5 minutes; selecting the “Enable CloudWatch detailed monitoring” to see the metrics in 1 minute intervals. Even if you have more granular data, CloudWatch rolls the data up into 1 minute intervals. Unlike many instance options, you can enable this feature after the instance starts. Autoscaling group metric aggregation is also reportable using detailed monitoring.
CPUUtilization, network (in/out), disk (read/write - ops and bytes) and status checks are reported to CloudWatch by default. RAM & disk space is a custom metric.
- CPUUtilization - duh
- DiskReadOps & DiskWriteOps - total operations (not bytes) to and from disk (instance store)
- DiskReadBytes & DiskWriteBytes - total bytes to and from the disk (instance store)
- NetworkIn & NetworkOut - total bytes to and from the instance over the wire
- NetworkPacketsIn & NetworkPacketsOut - number of packets in and out instead of bytes
- StatusCheckFailed_Instance & StatusCheckFailed_System - has anything failed an instance or system check?
- StatusCheckFailed - combination of both status checks… 1 = yes; 0 = no
EC2 Metric Dimensions & Custom Metrics
These metrics, except for AutoScalingGroupName, are only available with detailed monitoring therefore update every minutes vs the normal 5 minute interval. The metrics include: AutoScalingGroupName, ImageId, InstanceId, InstanceType
The big reason to use custom metrics and to push metrics to cloudwatch in general is the aggregation of data in CloudWatch instead of all over the place.. EC2 instances need a role to write information to CloudWatch.
EC2 Image Builder
Automates the creation, maintenance, validation, and testing of AMIs (& containers) typically happens as part of a CI/CD pipeline; testing happens at runtime. Has the ability to distribute the AMI across accounts and regions.
EC2 Instance Connect
- Set up SG with allow list for the Instance Connect service IPs on port 22
- Fire up instance with EC2 Instance Connect installed (Amazon Linux 2, UbuntU 20.04 have it preinstalled)
ec2-instance-connect:SendSSHPublicKeypermissions to push key to instance
- Fire up console or EC2 Instance Connect CLI
- Push SSH Public Key valid for 60 second using SendSSHPublicKey API
Placement for HPC? Clustered (all together) Placement for HDFS, Hadoop? Partitioned (separated clusters) Placement for HA? Spread (1 instance per rack) Cover peak load requirements (fleet/master & core instance groups as On-Demand)? Spot Instances using diversified allocation strategy