Glue
AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. Glue has three main parts: a serverless ETL function, the Glue Data Catalog, and the Glue Data Crawler, which builds the Data Catalog.
Glue Data Catalog
The Glue Data Catalog stores metadata: databases, tables, and connection information for most AWS data services. Athena, Redshift Spectrum, and EMR can all use the catalog.
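To make the catalog contents concrete, here is a minimal sketch that walks the tables of one catalog database and collects their column names and types. It assumes a boto3 Glue client is passed in; the function name and the database name "sales_db" are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, List


def list_catalog_columns(glue_client: Any, database: str) -> Dict[str, List[str]]:
    """Map each table in a Data Catalog database to 'name:type' column strings."""
    columns_by_table: Dict[str, List[str]] = {}
    # get_tables is paginated, so use the client's paginator to see every table.
    paginator = glue_client.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        for table in page.get("TableList", []):
            # Column definitions live under StorageDescriptor; it may be absent.
            columns = table.get("StorageDescriptor", {}).get("Columns", [])
            columns_by_table[table["Name"]] = [
                f"{col['Name']}:{col.get('Type', '')}" for col in columns
            ]
    return columns_by_table


# Usage (assumes AWS credentials and a region are configured; "sales_db" is hypothetical):
#   import boto3
#   print(list_catalog_columns(boto3.client("glue"), "sales_db"))
```

Passing the client in as an argument keeps the function easy to exercise with a stub, without touching a real AWS account.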
Glue Data Crawler
The Glue Data Crawler discovers and defines the schema across many different data sources (including S3, JDBC, DynamoDB, and DocumentDB/MongoDB) and builds the Glue Data Catalog. It has a built-in scheduler, and EventBridge can also be used as the scheduler.
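A crawler definition can be sketched as the request passed to boto3's `create_crawler`. The helper below only builds that request for S3 targets with an optional cron schedule; the helper name, crawler name, role ARN, bucket path, and database name are all hypothetical placeholders.

```python
from typing import Any, Dict, List, Optional


def build_crawler_request(
    name: str,
    role_arn: str,
    database: str,
    s3_paths: List[str],
    schedule: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the Glue create_crawler call (S3 targets only)."""
    request: Dict[str, Any] = {
        "Name": name,
        "Role": role_arn,          # IAM role the crawler assumes
        "DatabaseName": database,  # catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": path} for path in s3_paths]},
    }
    if schedule is not None:
        # Built-in scheduler, e.g. "cron(0 2 * * ? *)" for 02:00 UTC daily.
        request["Schedule"] = schedule
    return request


# Usage (assumes AWS credentials; every value below is a made-up placeholder):
#   import boto3
#   request = build_crawler_request(
#       "sales-crawler",
#       "arn:aws:iam::123456789012:role/GlueCrawlerRole",
#       "sales_db",
#       ["s3://example-bucket/sales/"],
#       schedule="cron(0 2 * * ? *)",
#   )
#   boto3.client("glue").create_crawler(**request)
```

Other source types would go under the same `Targets` key (for example `JdbcTargets` or `DynamoDBTargets`); they are left out here to keep the sketch short.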
ETL Function
The ETL function includes jobs, triggers, and workflows. Glue can convert data into Parquet or ORC format.
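How a job and a trigger fit together can be sketched as the two requests passed to boto3's `create_job` and `create_trigger`. This is an illustration under assumptions, not a recommended configuration: the helper names, job name, script location, Glue version, and worker settings are hypothetical choices.

```python
from typing import Any, Dict


def build_job_request(name: str, role_arn: str, script_location: str) -> Dict[str, Any]:
    """Build keyword arguments for the Glue create_job call (Spark ETL job)."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",                  # Spark ETL job type
            "ScriptLocation": script_location,  # S3 path of the ETL script
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",   # assumed version for this sketch
        "WorkerType": "G.1X",   # assumed worker size
        "NumberOfWorkers": 2,
    }


def build_scheduled_trigger_request(name: str, job_name: str, schedule: str) -> Dict[str, Any]:
    """Build keyword arguments for create_trigger that starts one job on a cron schedule."""
    return {
        "Name": name,
        "Type": "SCHEDULED",
        "Schedule": schedule,
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


# Usage (assumes AWS credentials; names and paths are made-up placeholders):
#   import boto3
#   glue = boto3.client("glue")
#   glue.create_job(**build_job_request(
#       "to-parquet", "arn:aws:iam::123456789012:role/GlueJobRole",
#       "s3://example-bucket/scripts/to_parquet.py"))
#   glue.create_trigger(**build_scheduled_trigger_request(
#       "nightly", "to-parquet", "cron(0 3 * * ? *)"))
```

The conversion to Parquet or ORC itself would live in the script at `ScriptLocation`, which runs inside the Glue job and is not shown here.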
DataBrew
AWS Glue DataBrew is a visual data preparation tool that enables users to clean and normalize data without writing any code.
Triage
Custom Python code? SageMaker Data Wrangler
ETL with no code? DataBrew