Ready-to-go Template for Snowpipe Implementation on AWS

Introduction:

Snowpipe enables loading data from files as soon as they’re available in a stage. This means you can load data from files in micro-batches, making it available to users within minutes, rather than manually executing COPY statements on a schedule to load larger batches. 

This blog focuses on ingesting data from AWS cloud storage (S3) into a Snowflake table through Snowpipe.

Prerequisites: 

The first step is setting up an account on each platform if you don't have one already.

Here are the links for signing up for trial accounts:

  • Snowflake trial: https://signup.snowflake.com/
  • AWS free tier: https://aws.amazon.com/free/

Steps to follow:  

  1. Create an S3 bucket in the AWS account and upload files.
  2. Create an IAM role to access the bucket in the AWS account.
  3. Create policies in AWS.
  4. Create a storage integration and exchange ARNs between the two accounts.
  5. Create an external stage and Snowpipe in the Snowflake account.
  6. Configure event notification in the AWS account.

Create an S3 bucket in AWS and upload files:

  • Once the account setup is done in AWS, log in to the AWS console and search for S3 in the search bar.
  • Select the Create bucket option.
  • Specify the bucket name and AWS Region for your bucket.
  • Scroll down, uncheck the “Block all public access” option, and check “I acknowledge that the current settings might result in this bucket and the objects within becoming public.” (The storage integration created later authenticates through an IAM role, so you can keep public access blocked if your security policy requires it.)
  • Select Create bucket.
  • Once the bucket is created successfully, open it.
  • Create a folder inside the bucket using the Create folder option.
  • Enter the folder name in the Folder name box and click Create folder.
  • Once the folder is created successfully, open it.
  • Click the Upload option to upload a file.
  • Click Add files, choose the file from your local system, verify the selected file, and click Upload.

Create a role in AWS:

  • In the search bar, go to IAM.
  • In the left navigation pane, click Roles.
  • Click the Create role option.
  • Check the AWS account option, then enable the Require external ID box, enter “0000” as a placeholder external ID (you will replace it later with the value Snowflake returns), and click Next.
  • Search for the AmazonS3FullAccess permissions policy, select it, and click Next.
  • Enter the role name in the Role name box and click Create role.

Create Policies in AWS: 

  • In the search bar, go to IAM.
  • In the left navigation pane, click Policies, then Create policy.
  • Click the JSON tab.
  • Paste the below code in the policy editor and click Create policy.
  • Make sure to replace <bucket> and <prefix> with your actual bucket name and folder path prefix.
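
The policy below is a representative sketch based on the minimum S3 permissions Snowflake documents for reading staged files; <bucket> and <prefix> are placeholders for your bucket name and folder path.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::<bucket>",
            "Condition": {
                "StringLike": {
                    "s3:prefix": ["<prefix>/*"]
                }
            }
        }
    ]
}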

Create a storage integration in Snowflake and exchange ARNs between the two accounts:

  • Copy the role ARN from AWS and specify it in the storage integration property STORAGE_AWS_ROLE_ARN, and mention the bucket path in the STORAGE_ALLOWED_LOCATIONS property.

CREATE OR REPLACE STORAGE INTEGRATION STORAGE_INTEGRATION_DEMO
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::*:role/snowflake-mar1'
  STORAGE_ALLOWED_LOCATIONS = ('s3://snowflake-mar1/Employee/');


  • Run the below command in Snowflake to retrieve the STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID values from the storage integration for the AWS IAM role.

DESCRIBE INTEGRATION STORAGE_INTEGRATION_DEMO; 

Follow the instructions below to configure IAM access permissions for Snowflake in your AWS console so that Snowflake can access data in the S3 bucket:

  • Log in to the AWS console, select the IAM service, and then select the role you created.
  • Select the Trust relationships tab and click Edit trust policy.
  • Modify the policy document with the DESC STORAGE INTEGRATION output values you recorded for STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID, as sketched below.
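
For reference, the updated trust policy should look roughly like the following sketch; the two placeholders stand for the values returned by DESCRIBE INTEGRATION:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "<STORAGE_AWS_IAM_USER_ARN>"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "<STORAGE_AWS_EXTERNAL_ID>"
                }
            }
        }
    ]
}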

Create an external stage and file format in the Snowflake account:

  • Create the database and schemas in Snowflake:

CREATE OR REPLACE DATABASE SNOWFLAKE_DB_AWS;
CREATE OR REPLACE SCHEMA SNOWFLAKE_DB_AWS.STAGE_SCHEMA;
CREATE OR REPLACE SCHEMA SNOWFLAKE_DB_AWS.RAW_SCHEMA;  -- holds the target table and pipe created later

  • To create the external stage, specify the S3 bucket URL and the storage integration name; attaching a file format is optional. Create the file format first:

CREATE OR REPLACE FILE FORMAT SNOWFLAKE_DB_AWS.PUBLIC.SNOWFLAKE_FILE_FORMAT
  TYPE = CSV
  SKIP_HEADER = 1
  TRIM_SPACE = TRUE
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  REPLACE_INVALID_CHARACTERS = TRUE
  NULL_IF = ('NULL')
  DATE_FORMAT = 'DD-MM-YYYY'
  TIME_FORMAT = 'HH24:MI'
  TIMESTAMP_FORMAT = 'MM-DD-YYYY HH24:MI';

Creating STAGE in Snowflake:

  • Mention the STORAGE_INTEGRATION and FILE_FORMAT details in the respective stage properties.

CREATE OR REPLACE STAGE SNOWFLAKE_DB_AWS.STAGE_SCHEMA.STAGE_TABLE
  URL = 's3://snowflake-mar1/Employee/'
  STORAGE_INTEGRATION = STORAGE_INTEGRATION_DEMO
  FILE_FORMAT = (FORMAT_NAME = 'SNOWFLAKE_DB_AWS.PUBLIC.SNOWFLAKE_FILE_FORMAT');
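
To confirm that the integration and stage are wired up correctly before building the pipe, you can list the staged files; if this returns your uploaded file, the AWS side is configured properly:

LIST @SNOWFLAKE_DB_AWS.STAGE_SCHEMA.STAGE_TABLE;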

Create PIPE in snowflake account:

To load the data from the stage into a table, create the pipe through the steps below.

  • Create the target table to receive the data from the Snowflake stage (a sample definition is sketched below).
  • Then use the below script to create the pipe and mention the PATTERN.
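
The column layout of RAW_TABLE depends on your file; the definition below is only an illustrative sketch assuming a simple three-column employee CSV:

CREATE OR REPLACE TABLE SNOWFLAKE_DB_AWS.RAW_SCHEMA.RAW_TABLE (
  EMPLOYEE_ID   NUMBER,     -- assumed column; match your CSV layout
  EMPLOYEE_NAME VARCHAR,    -- assumed column
  HIRE_DATE     DATE        -- assumed column
);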

CREATE OR REPLACE PIPE SNOWFLAKE_DB_AWS.RAW_SCHEMA.RAW_PIPE
  AUTO_INGEST = TRUE
AS
COPY INTO SNOWFLAKE_DB_AWS.RAW_SCHEMA.RAW_TABLE
FROM @SNOWFLAKE_DB_AWS.STAGE_SCHEMA.STAGE_TABLE
PATTERN = '.*Employees.*[.]csv';

Configure event notification in the AWS account:

  • Event notification is used to notify the pipe to load new or updated files from the S3 bucket.
  • By running the DESCRIBE PIPE command, you will get the notification channel ARN (see the snippet after this list).
  • Navigate to the Properties tab of your upload bucket in the AWS console.
  • Scroll down to the Event notifications section and click Create event notification.
  • In the General configuration section, specify the event name and, optionally, a prefix/suffix filter.
  • Under Event types, check the All object create events box.
  • Under Destination, select the SQS queue option, enter the SQS queue ARN from the Snowflake DESCRIBE PIPE output, and click Save changes.
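
The commands below retrieve the notification channel ARN and then verify the pipe; SYSTEM$PIPE_STATUS should report an executionState of RUNNING once everything is connected, and newly uploaded files matching the pattern should land in RAW_TABLE within minutes:

DESCRIBE PIPE SNOWFLAKE_DB_AWS.RAW_SCHEMA.RAW_PIPE;  -- copy the notification_channel value
SELECT SYSTEM$PIPE_STATUS('SNOWFLAKE_DB_AWS.RAW_SCHEMA.RAW_PIPE');
ALTER PIPE SNOWFLAKE_DB_AWS.RAW_SCHEMA.RAW_PIPE REFRESH;  -- optionally queue files staged before the pipe was created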

Conclusion:

This blog showed how to automatically ingest data from an S3 bucket into a Snowflake destination table using a Snowpipe storage integration and Amazon S3 event notifications.

Cittabase is a Select partner of Snowflake. Please feel free to contact us regarding your Snowflake solution needs. Our Snowflake solutions encompass a suite of services for your data integration and migration needs. We are committed to providing personalized assistance and support customized to your requirements.


