Glue

Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. There are three main parts to Glue: a serverless ETL function, the Glue Data Catalog, and the Glue Data Crawler, which builds the Data Catalog.

Glue Data Catalog

The Glue Data Catalog holds databases, tables, and connection information for most AWS data services. The catalog can be used by Athena, Redshift Spectrum, and EMR.
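
As a rough illustration of reading the catalog programmatically, the sketch below lists the tables and column names in one catalog database. It assumes a boto3 Glue client (for example `boto3.client("glue")`) is passed in; the helper name `list_catalog_tables` and the database name in the usage comment are hypothetical.

```python
def list_catalog_tables(glue_client, database_name):
    """Return {table_name: [column names]} for one Data Catalog database.

    glue_client is assumed to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    tables = {}
    # get_tables is paginated, so walk every page of results.
    paginator = glue_client.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database_name):
        for table in page["TableList"]:
            # Some tables may have no StorageDescriptor or no columns yet.
            columns = table.get("StorageDescriptor", {}).get("Columns", [])
            tables[table["Name"]] = [col["Name"] for col in columns]
    return tables


# Hypothetical usage (needs AWS credentials and an existing database):
#   import boto3
#   print(list_catalog_tables(boto3.client("glue"), "sales_db"))
```

Taking the client as a parameter keeps the helper easy to exercise with a stub and leaves region and credential handling to the caller.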

Glue Data Crawler

Discovers and defines the schema across many different data sources (including S3, JDBC, DynamoDB, and DocumentDB/MongoDB) and builds the Glue Data Catalog. Crawlers can run on the built-in scheduler, or EventBridge can be used as the scheduler instead.
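
A minimal sketch of defining a crawler over an S3 path with an optional schedule might look like the following. It only builds the request parameters; the resulting dict is meant to be passed to the boto3 Glue client's `create_crawler` call. The helper name and all example values (crawler name, role ARN, bucket path) are hypothetical.

```python
def build_crawler_request(name, role_arn, database_name, s3_path, cron=None):
    """Build keyword arguments for a Glue create_crawler call.

    cron is the six-field expression without the cron(...) wrapper,
    e.g. "0 2 * * ? *" for 02:00 UTC every day.
    """
    request = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database_name,  # catalog database the crawler writes to
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if cron is not None:
        # Glue schedules are written as cron(<expression>).
        request["Schedule"] = f"cron({cron})"
    return request


# Hypothetical usage (needs AWS credentials and an existing IAM role):
#   import boto3
#   boto3.client("glue").create_crawler(**build_crawler_request(
#       "sales-crawler", "arn:aws:iam::123456789012:role/GlueCrawlerRole",
#       "sales_db", "s3://example-bucket/sales/", cron="0 2 * * ? *"))
```

Leaving `cron` unset produces an on-demand crawler, which can then be started by some other scheduler.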

ETL Function

The ETL function includes jobs, triggers, and workflows. Glue can convert data into Parquet or ORC format.
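
To make the relationship between jobs and triggers concrete, here is a sketch that builds the parameters for a Spark ETL job and for a scheduled trigger that starts it. The dicts are intended for the boto3 Glue client's `create_job` and `create_trigger` calls. The helper names, the Glue version, and the `--output_format` job argument are assumptions made for illustration; the job script itself would have to read that argument and write Parquet or ORC accordingly.

```python
def build_job_request(job_name, role_arn, script_location, output_format="parquet"):
    """Build keyword arguments for a Glue create_job call (Spark ETL job)."""
    if output_format not in ("parquet", "orc"):
        raise ValueError("output_format must be 'parquet' or 'orc'")
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",                  # Spark ETL job type
            "ScriptLocation": script_location,  # S3 path to the job script
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",  # assumed version; adjust as needed
        # Hypothetical custom argument that the job script would read.
        "DefaultArguments": {"--output_format": output_format},
    }


def build_scheduled_trigger_request(trigger_name, job_name, cron):
    """Build keyword arguments for a create_trigger call that runs one job."""
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        "Schedule": f"cron({cron})",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


# Hypothetical usage (needs AWS credentials):
#   import boto3
#   glue = boto3.client("glue")
#   glue.create_job(**build_job_request(
#       "to-parquet", "arn:aws:iam::123456789012:role/GlueJobRole",
#       "s3://example-bucket/scripts/to_parquet.py"))
#   glue.create_trigger(**build_scheduled_trigger_request(
#       "nightly", "to-parquet", "0 3 * * ? *"))
```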

DataBrew

AWS Glue DataBrew is a visual data preparation tool that enables users to clean and normalize data without writing any code.

Triage

Custom Python code? SageMaker Data Wrangler