CloudSearch offers many types of search, including full text, boolean, prefix and range. In addition, you can use term boosting, faceting and highlighting, and enable autocomplete. Common file types (HTML, PDF, Microsoft Office documents) can be searched, as can DynamoDB tables.
What is a bit different from other search offerings is the data load process. Instead of the system indexing data found under a series of paths, data is uploaded to a search domain location defined by CloudSearch and then indexed.
Integration with IAM - You control access to the Amazon CloudSearch configuration service APIs and to the domain services APIs (which control the use of the domain) independently.
Scaling is automatic based on data volume and search traffic, but you can also scale out manually. Multi-AZ is available as well.
Enable access using access policy (private by default)
Set instance type, desired replica count (for anything bigger than search.m3.2xlarge) and partition count
Upload content in batches of less than 5 MB
Use aws cloudsearch define-index-field to set up index fields manually, or cs-configure-from-batches to set them up automatically.
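The first setup step (enabling access) takes an IAM-style JSON policy document. As a rough sketch of what such a policy can look like, the helper below builds one that allows search and suggest requests from a single IP range. The function name, the CIDR range and the choice of actions are illustrative assumptions, not something these notes prescribe.

```python
import json


def build_access_policy(cidr, actions=("cloudsearch:search", "cloudsearch:suggest")):
    """Return a JSON access policy allowing `actions` from one IP range.

    `cidr` and the default actions are illustrative; adjust them to the
    domain's real clients before applying the policy.
    """
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": "*"},
        "Action": list(actions),
        # Restrict the otherwise public principal to one source range.
        "Condition": {"IpAddress": {"aws:SourceIp": cidr}},
    }
    return json.dumps({"Version": "2012-10-17", "Statement": [statement]}, indent=2)


# 192.0.2.0/24 is a documentation-only range used as a placeholder.
policy = build_access_policy("192.0.2.0/24")
print(policy)
```

The resulting string is what you would paste into the domain's access policy; document uploads would need the cloudsearch:document action as well.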
The model is to upload data into CloudSearch, so if data changes it must be resubmitted to CloudSearch. A document batch is a collection of add and delete operations that represent the documents you want to add, update, or delete from your domain. Batches can be described in either JSON or XML… Make batches as large as possible to get the best update performance; the maximum size for a batch is 5 MB…
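To make the batch model concrete, here is a minimal sketch that builds add and delete operations in the JSON batch format and packs them into batches that stay within the 5 MB limit. The helper names (add_op, delete_op, pack_batches) and the sample field names are hypothetical; only the operation shape ("type", "id", "fields") comes from the CloudSearch batch format.

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # batch size limit from the notes (5 MB)


def add_op(doc_id, fields):
    """An add operation; re-adding an existing id updates that document."""
    return {"type": "add", "id": doc_id, "fields": fields}


def delete_op(doc_id):
    """A delete operation only needs the document id."""
    return {"type": "delete", "id": doc_id}


def pack_batches(operations, max_bytes=MAX_BATCH_BYTES):
    """Group operations into JSON array batches of at most max_bytes each."""
    batches, current, size = [], [], 2  # 2 bytes for the enclosing "[]"
    for op in operations:
        encoded = json.dumps(op).encode("utf-8")
        if len(encoded) + 2 > max_bytes:
            raise ValueError(f"operation for id {op['id']!r} exceeds the batch limit")
        extra = len(encoded) + (1 if current else 0)  # +1 for the comma
        if current and size + extra > max_bytes:
            # Current batch is full: close it and start a new one.
            batches.append(current)
            current, size = [], 2
            extra = len(encoded)
        current.append(encoded)
        size += extra
    if current:
        batches.append(current)
    return [b"[" + b",".join(parts) + b"]" for parts in batches]


ops = [add_op("doc1", {"title": "First"}), delete_op("doc2")]
for batch in pack_batches(ops):
    print(len(batch), "bytes:", batch.decode("utf-8"))
```

Packing greedily up to the limit follows the advice above: fewer, larger batches rather than many small ones.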
Amazon CloudSearch configuration service APIs and the domain services APIs are independently packaged.
As usual, when accessing via the CLI or an SDK, requests are signed, which saves back-and-forth authentication traffic.
Filtering is efficient and does not contribute to ranking.
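As an illustration of filtering versus ranking, the sketch below builds a search request URL where q carries the ranked query and fq carries a filter that narrows the results without affecting scores. The endpoint host and the genres field are made-up placeholders; the q, fq and size parameters and the /2013-01-01/search path are from the CloudSearch search API.

```python
from urllib.parse import urlencode


def build_search_url(endpoint, query, filter_query=None, size=10):
    """Build a search URL; `fq` filters matches without influencing ranking."""
    params = {"q": query, "size": size}
    if filter_query:
        params["fq"] = filter_query
    return f"https://{endpoint}/2013-01-01/search?{urlencode(params)}"


# Hypothetical domain endpoint and field name, for illustration only.
url = build_search_url(
    "search-movies-example.us-east-1.cloudsearch.amazonaws.com",
    "star wars",
    filter_query="genres:'Sci-Fi'",
)
print(url)
```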
There is no easy way to delete all the documents in a domain, and the domain must be re-indexed to scale down.
Clusters start up on search.m3.small instances and scale up to handle increased load, speed up requests, accommodate more data and improve fault tolerance. Multi-AZ increases fault tolerance but doubles the cost.
Use manual scaling for data loads and query spikes, and keep in mind that update-scaling-parameters sets the baseline.
ElasticSearch vs CloudSearch
| HA | Single AZ | Multi AZ |
507 errors occur if batches are submitted at too high a rate or are too large; use the CLI for batches bigger than 5 MB.
507 errors can also indicate a general service overload condition; scale out manually.
409 errors generally indicate service resource limits; contact AWS.
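Besides scaling out, a common client-side response to 507 errors (not something these notes prescribe) is to slow down and retry the batch with exponential backoff. The sketch below assumes a hypothetical send_batch callable that uploads one batch and returns the HTTP status code; how the upload itself is done is left out.

```python
import random
import time

RETRYABLE_STATUS = {507}  # overload / upload rate too high, per the notes


def upload_with_backoff(send_batch, batch, max_attempts=5, base_delay=1.0,
                        sleep=time.sleep):
    """Call send_batch(batch) until it stops returning a retryable status.

    send_batch is a hypothetical callable returning an HTTP status code.
    Returns the last status seen, which may still be 507 after max_attempts.
    """
    status = None
    for attempt in range(max_attempts):
        status = send_batch(batch)
        if status not in RETRYABLE_STATUS:
            return status
        if attempt < max_attempts - 1:
            # Double the wait each time and add jitter to spread out retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    return status


# Simulated uploader: overloaded twice, then succeeds.
responses = iter([507, 507, 200])
final = upload_with_backoff(lambda b: next(responses), b"[]", sleep=lambda s: None)
print(final)
```

Passing sleep as a parameter keeps the function easy to exercise without real waiting.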
Reduce the number of hits by querying only after 2 characters have been typed in the UI, and use a stopwords list.
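A minimal sketch of that last tip, done on the client side: hold back the request until at least 2 characters remain, and drop stopwords before sending. The notes do not say where the stopword list should live (CloudSearch analysis schemes can also hold one on the server side), so the function name, the threshold constant and the tiny stopword set here are illustrative assumptions.

```python
MIN_QUERY_CHARS = 2
STOPWORDS = frozenset({"a", "an", "and", "of", "the", "to"})  # illustrative list


def prepare_query(raw_text, min_chars=MIN_QUERY_CHARS, stopwords=STOPWORDS):
    """Return a cleaned query string, or None if no request should be sent."""
    terms = [t for t in raw_text.lower().split() if t not in stopwords]
    cleaned = " ".join(terms)
    # Too little left to search on: skip the round trip entirely.
    return cleaned if len(cleaned) >= min_chars else None


for typed in ["t", "the", "st", "The Art of War"]:
    print(repr(typed), "->", repr(prepare_query(typed)))
```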