AWS Glue

AWS Glue is better suited to ETL and data integration workloads. Jobs run either as Python shell jobs or on Apache Spark, and a Glue job can be triggered from EventBridge.
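
As a rough sketch of the EventBridge trigger, one common pattern (assumed here, not the only option) routes an S3 "Object Created" event to a small Lambda function that starts the job with boto3's start_job_run. The job name and the argument names --source_bucket and --source_key are hypothetical placeholders.

```python
from typing import Any, Dict

# Hypothetical job name, used only for illustration.
GLUE_JOB_NAME = "example-etl-job"


def build_job_run_request(event: Dict[str, Any]) -> Dict[str, Any]:
    """Translate an S3 "Object Created" EventBridge event into start_job_run kwargs."""
    detail = event["detail"]
    bucket = detail["bucket"]["name"]
    key = detail["object"]["key"]
    return {
        "JobName": GLUE_JOB_NAME,
        # Glue job arguments are passed as "--name": "value" string pairs.
        # These two argument names are assumptions made for this sketch.
        "Arguments": {
            "--source_bucket": bucket,
            "--source_key": key,
        },
    }


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, str]:
    """Lambda entry point: start the Glue job for the object named in the event."""
    # Imported lazily so the helper above can be exercised without AWS access.
    import boto3

    glue = boto3.client("glue")
    response = glue.start_job_run(**build_job_run_request(event))
    return {"JobRunId": response["JobRunId"]}
```

Keeping the event parsing in its own function makes it easy to check locally; only lambda_handler needs AWS credentials and an existing job.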

Features

  • Glue Crawler to catalog data
  • Glue ETL jobs (Spark-based)
  • Bookmarking for tracking processed files
  • Glue Data Quality for validation
  • CloudWatch for monitoring
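
To illustrate the idea behind bookmarking from the list above, here is a simplified sketch in plain Python. It is not Glue's implementation; it only assumes that state is kept as a "last processed" timestamp and that each run picks up objects modified after it. The function and type names are hypothetical.

```python
from datetime import datetime
from typing import List, Optional, Tuple

S3Object = Tuple[str, datetime]  # (key, last-modified time)


def select_unprocessed(
    objects: List[S3Object], bookmark: Optional[datetime]
) -> Tuple[List[str], Optional[datetime]]:
    """Return keys modified after the bookmark, plus the advanced bookmark."""
    # With no bookmark yet (first run), every object counts as new.
    new = [(k, t) for k, t in objects if bookmark is None or t > bookmark]
    if not new:
        # Nothing new: keep the bookmark where it was.
        return [], bookmark
    # Advance the bookmark to the newest object seen in this run.
    new_bookmark = max(t for _, t in new)
    keys = [k for k, _ in sorted(new, key=lambda kt: kt[1])]
    return keys, new_bookmark
```

A second run over the same listing returns no keys, which is the behaviour that lets a scheduled job skip files it has already handled.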

Pros

  • Built-in bookmarking for state tracking
  • Handles large files efficiently
  • Auto-scaling with DPUs
  • Rich ETL capabilities
  • Supports streaming with Glue Streaming

Cons

  • Higher minimum cost (10-minute billing minimum on Glue 0.9 and 1.0; 1-minute minimum on Glue 2.0 and later)
  • Spark overhead for small files
  • More complex setup
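
The effect of a billing minimum on short jobs can be worked through with a little arithmetic. The sketch below assumes a rate of $0.44 per DPU-hour and a 2-DPU job, both illustrative figures to be checked against current pricing; the minimum is a parameter because it varies by Glue version and job type.

```python
def glue_job_cost(
    dpus: float,
    runtime_seconds: float,
    minimum_seconds: float,
    price_per_dpu_hour: float = 0.44,  # assumed rate, not taken from the source
) -> float:
    """Estimate the charge for one job run, applying the billing minimum."""
    # A run shorter than the minimum is billed as if it lasted the minimum.
    billed_seconds = max(runtime_seconds, minimum_seconds)
    return dpus * (billed_seconds / 3600.0) * price_per_dpu_hour


# A 20-second job on 2 DPUs under a 10-minute and a 1-minute minimum.
cost_10_min = glue_job_cost(dpus=2, runtime_seconds=20, minimum_seconds=600)
cost_1_min = glue_job_cost(dpus=2, runtime_seconds=20, minimum_seconds=60)
print(f"10-minute minimum: ${cost_10_min:.4f}")
print(f"1-minute minimum:  ${cost_1_min:.4f}")
```

Under these assumptions the short job is billed for ten times as long with the 10-minute minimum, which is why many small, frequent runs are where the minimum matters most.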