Recently, I worked on building an automated end-to-end ELT pipeline using Snowflake and Amazon Web Services S3 to process event-level website analytics data from the Amplitude Export API. Throughout the project, several Snowflake-native concepts played a key role in making the pipeline scalable, automated, and incremental. From Snowpipe handling continuous ingestion from S3, to Streams enabling efficient incremental processing, Stored Procedures orchestrating transformation logic, and Tasks automating scheduled execution, each component solved a specific challenge within the pipeline. Below is an overview of the main concepts used, what they are, and how they helped streamline the overall data flow architecture.
Snowpipe
What it is
Snowpipe is Snowflake’s continuous data ingestion service used to automatically load new files from cloud storage into Snowflake.
Main purpose
Automate incremental file ingestion.
Common use cases
- Loading JSON exports from APIs
- Ingesting clickstream/event data
- Continuous loading from Amazon Web Services S3
- Near real-time ingestion pipelines
Why use it
- Automatically detects new files
- Only loads new files
- Reduces manual ingestion work
- Event-driven architecture using S3 notifications
Streams
What it is
A Stream tracks changes made to a table, such as newly inserted rows.
Main purpose
Enable incremental processing.
Common use cases
- Detecting new records after ingestion
- Incremental ETL/ELT pipelines
- Processing only changed data instead of full tables
- Change Data Capture (CDC)-style workflows
Why use it
- Avoids full table scans
- Reduces compute usage
- Makes pipelines scalable
- Allows downstream transformations to process only new data
Stored Procedures
What it is
A Stored Procedure is a reusable block of SQL logic that executes multiple SQL statements sequentially.
Main purpose
Orchestrate transformation workflows.
Common use cases
- Multi-step ELT pipelines
- Data transformations
- Loading dimension and fact tables
- Business rule execution
- Data quality workflows
Why use it
- Centralises pipeline logic
- Reusable and maintainable
- Executes multiple dependent steps in order
- Easier automation and scheduling
Tasks
What it is
A Task is Snowflake’s scheduling and automation feature used to run SQL or Stored Procedures automatically.
Main purpose
Automate execution timing.
Common use cases
- Daily ETL refreshes
- Scheduled data transformations
- Triggering procedures when streams contain new data
- Pipeline orchestration
Why use it
- Removes manual execution
- Supports CRON scheduling
- Can trigger based on stream activity
- Enables fully automated pipelines
