Caching is a complicated thing to implement well. Time-to-live, cache expiration and other issues make caching strategies difficult.
With the Lazy Loading caching strategy, the application looks at the cache first. It either finds the data there, or it goes to the source and then adds the record to the cache. If a node fails, its replacement will fill in due time, so no big deal.
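The lazy loading flow can be sketched roughly as follows. This is a minimal illustration, not ElastiCache client code: the `LazyCache` class, the `load_from_source` callback, and the `ttl_seconds` default are all hypothetical names, and a plain dictionary stands in for the cache node.

```python
import time
from typing import Callable, Dict, Tuple


class LazyCache:
    """Lazy loading: check the cache first, fall back to the source on a miss."""

    def __init__(self, load_from_source: Callable[[str], str], ttl_seconds: float = 300.0):
        # load_from_source is a hypothetical stand-in for a database query.
        self._load = load_from_source
        self._ttl = ttl_seconds
        # key -> (value, expiry timestamp); a dict stands in for the cache node.
        self._store: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> str:
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]  # cache hit: the source is not touched
        # Cache miss (or expired entry): read from the source, then populate the cache.
        value = self._load(key)
        self._store[key] = (value, time.monotonic() + self._ttl)
        return value
```

An empty replacement node behaves like a fresh `LazyCache`: every first read is a miss, and the cache refills as requests arrive.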
The Write Through caching strategy updates the cache whenever data is written to the DB, so cached data is never stale and never needs a refresh. Keeping all nodes up to date, higher latency because of multiple writes per record, and storing data that might be infrequently accessed are all problems with this strategy.
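A write-through path might look something like this sketch. The `WriteThroughStore` class is hypothetical, and two dictionaries stand in for the DB and the cache; the point is only that every write touches both.

```python
from typing import Dict, Optional


class WriteThroughStore:
    """Write-through: every write goes to the database and the cache together."""

    def __init__(self) -> None:
        # Both dicts are stand-ins: one for the DB, one for the cache node.
        self.database: Dict[str, str] = {}
        self.cache: Dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        # Two writes per record: this is the source of the extra write latency.
        self.database[key] = value
        self.cache[key] = value

    def read(self, key: str) -> Optional[str]:
        # Reads are served from the cache, which was updated at write time.
        return self.cache.get(key)
```

Note that a record written once and never read still occupies cache memory, which is the infrequently-accessed-data cost of this strategy.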
The easiest way to implement caching on AWS is ElastiCache, a managed service that provides caching services for apps. There are two engines available from ElastiCache: Redis and memcached. Reserved instances are a great choice here; spot instances are not.
memcached vs Redis
Feature | memcached | Redis |
---|---|---|
Use case complexity | low | high |
Threading | multi | single |
Scaling | horizontal | vertical |
AZs | single | multi |
replication? | nope | yep |
fail-over | nope | yep |
persist it | nope | yep |
pub-sub | nope | yep |
auto-discovery | yep | no |
memcached
Multithreaded; performs well up to 90% CPU utilization. Past that, increase the size of the node or the # of nodes.
Metric | Description | Solution |
---|---|---|
CPU utilization | good up to 90% utilization | increase size of node or # of nodes |
evictions | # records ejected from cache | larger instances or more nodes |
CurrConnections | # app to memcached connections | likely an application problem: connections are not being closed |
SwapUsage | should be 0-50MB | increase node size; increase ConnectionOverhead (decrease memory for caching data) |
Redis
Single threaded; generally scale UP with larger instances by snapshotting and increasing the instance size, OR scale out with more READ replicas, which use async replication. Automatic and manual snapshots work like RDS, with storage in S3. Taking snapshots from a read replica is a great idea because snapshotting degrades performance on the node it runs on.
Metric | Description | Solution |
---|---|---|
CPU utilization | threshold: 90 / # of CPU cores | read heavy: read replicas; write heavy: larger cache instance |
evictions | # records ejected from cache | more nodes |
CurrConnections | # app to redis connections | likely an application problem: connections are not being closed |
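The Redis CPU threshold in the table is worth a worked example: because Redis is single threaded, it can saturate one core while the node-level CPU figure still looks low. A small sketch of the arithmetic, where the function name `redis_cpu_threshold` is made up for illustration:

```python
def redis_cpu_threshold(cpu_cores: int, per_core_limit: float = 90.0) -> float:
    """Node-level CPU % to watch for, given that Redis uses a single core."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    # 90% of one core, expressed as a share of the whole node's CPU.
    return per_core_limit / cpu_cores


print(redis_cpu_threshold(1))  # 90.0
print(redis_cpu_threshold(2))  # 45.0
print(redis_cpu_threshold(4))  # 22.5
```

So on a 4-core node, a CPU utilization reading around 22.5% already means the Redis thread is close to its limit.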
Monitoring
Which ElastiCache metrics to monitor is a giant question, and the answer is largely based on which caching engine is in use.
Triage
- Simple use case, non-persistent data, horizontal scaling (shard), multi-threaded? memcached
- Complex? Redis
- Multi-AZ with failover? Redis
- HA? Redis
- Read replicas? Redis
- Backup and Restore? Redis
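The triage list above boils down to one rule: any requirement beyond a simple, non-persistent cache points to Redis. As a rough sketch, with a hypothetical `choose_engine` function and made-up flag names:

```python
def choose_engine(
    *,
    complex_use_case: bool = False,
    multi_az_failover: bool = False,
    high_availability: bool = False,
    read_replicas: bool = False,
    backup_and_restore: bool = False,
) -> str:
    """Pick a cache engine following the triage list; flag names are illustrative."""
    redis_only_needs = (
        complex_use_case,
        multi_az_failover,
        high_availability,
        read_replicas,
        backup_and_restore,
    )
    # Any single Redis-only requirement decides the question.
    if any(redis_only_needs):
        return "Redis"
    # Simple, non-persistent, horizontally sharded, multi-threaded workloads.
    return "memcached"
```

For example, `choose_engine(read_replicas=True)` returns `"Redis"`, while `choose_engine()` with no special requirements returns `"memcached"`.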