- January 8, 2025
- Posted by: Samuel R
- Categories:
Introduction
This documentation provides a comprehensive guide to the essential processes involved in managing data and services with Snowflake, Git, and SFTP. It covers a series of key tasks ranging from creating roles and warehouses in Snowflake to managing data transfer via SFTP. The following sections outline the steps required for:
- Snowflake Commands: Learn how to create essential Snowflake objects such as roles, warehouses, stages, and image repositories.
- Git Commands: Understand the Git commands needed for version control and code management.
- Creating a Service for an SFTP Server: A step-by-step process to set up an SFTP server service for secure file transfer.
- Getting SFTP Host Name and Port: Instructions on how to retrieve the necessary connection details for your SFTP server.
- Connecting with SFTP Using WinSCP: Detailed guidance on configuring and using WinSCP to connect to your SFTP server.
- Loading Data from SFTP to Snowflake: A tutorial on transferring data from an SFTP server into Snowflake tables efficiently.
By following this guide, users will be able to streamline their workflow, from setting up Snowflake environments and version control to securely transferring and loading data.
Prerequisites
- Docker
- Snowflake – cli
- Git
- Git Bash – to run.sh command in window
- Auth Token for ngrok – Sign up with ngrok console you find the Auth Token generated for your username (subscribe to developer licence)
Architecture
Snowflake Commands:
- Execute the below snowflake command to create Roles, Warehouses, Stage and Image Repositories:
Command | Purpose |
USE ROLE ACCOUNTADMIN; | Sets the current role to ACCOUNTADMIN to perform tasks that require elevated privileges. |
CREATE ROLE IF NOT EXISTS sftp_role; | Creates the sftp_role if it doesn’t already exist, to grant permissions needed for SFTP-related operations. |
CREATE DATABASE IF NOT EXISTS sftp_db; | Creates the sftp_db database if it doesn’t already exist, which will store the schema and objects for SFTP operations. |
CREATE SCHEMA IF NOT EXISTS sftp_schema; | Creates the sftp_schema inside the sftp_db database if it doesn’t already exist, to organize data objects related to SFTP. |
CREATE OR REPLACE NETWORK RULE ngrok_egress_access MODE = EGRESS TYPE = HOST_PORT VALUE_LIST = (‘connect.ngrok-agent.com:443’, ‘crl.ngrok-agent.com:80’); | Creates a network rule ngrok_egress_access allowing outbound access to ngrok endpoints for secure communication in hybrid cloud environments. |
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION ngrok_egress_access_integration ALLOWED_NETWORK_RULES = (ngrok_egress_access) ENABLED = true; | Creates or replaces an external access integration named ngrok_egress_access_integration that links the network rule with Snowflake for ngrok egress access. |
GRANT OWNERSHIP ON DATABASE sftp_db TO ROLE sftp_role COPY CURRENT GRANTS; | Grants ownership of the sftp_db database to sftp_role, copying any existing grants to the role. |
GRANT OWNERSHIP ON SCHEMA sftp_db. sftp_schema TO ROLE sftp_role COPY CURRENT GRANTS; | Grants ownership of the sftp_schema schema to sftp_role, including any existing privileges. |
GRANT USAGE ON INTEGRATION ngrok_egress_access_integration TO ROLE sftp_role; | Grants the sftp_role permission to use the ngrok_egress_access_integration, enabling the role to utilize the network integration for egress access. |
CREATE OR REPLACE WAREHOUSE sftp_warehouse WITH WAREHOUSE_SIZE=’X-SMALL’; | Creates or replaces a virtual warehouse sftp_warehouse with a size of X-SMALL, for resource allocation when executing queries and tasks. |
GRANT USAGE ON WAREHOUSE sftp_warehouse TO ROLE sftp_role; | Grants the sftp_role the ability to use the sftp_warehouse for querying and processing data. |
CREATE COMPUTE POOL sftp_compute_pool MIN_NODES = 1 MAX_NODES = 1 INSTANCE_FAMILY = CPU_X64_XS; | Creates a compute pool sftp_compute_pool with a fixed node count and instance family, for processing tasks such as SFTP file transfers. |
GRANT USAGE, MONITOR ON COMPUTE POOL sftp_compute_pool TO ROLE sftp_role; | Grants the sftp_role the USAGE and MONITOR privileges on the sftp_compute_pool, allowing the role to use and monitor the compute pool. |
GRANT ROLE SFTP_ROLE TO ROLE ACCOUNTADMIN; | Grants the SFTP_ROLE to ACCOUNTADMIN, allowing users with ACCOUNTADMIN to inherit the permissions of sftp_role. |
USE ROLE sftp_role; | Sets the current role to sftp_role, allowing operations to be performed using the permissions of the sftp_role. |
USE DATABASE sftp_db; | Sets the active context to the sftp_db database, so subsequent operations will be performed within it. |
USE SCHEMA sftp_schema; | Sets the active context to the sftp_schema schema, so subsequent operations will be performed within it. |
USE WAREHOUSE sftp_warehouse; | Sets the current warehouse to sftp_warehouse for running queries and tasks that require compute resources. |
CREATE IMAGE REPOSITORY IF NOT EXISTS sftp_repository; | Creates an image repository named sftp_repository if it doesn’t exist, for storing and managing image data related to SFTP operations. |
CREATE OR REPLACE STAGE sftp_stage ENCRYPTION = (type = ‘SNOWFLAKE_SSE’); | Creates or replaces a named stage sftp_stage, enabling server-side encryption (SNOWFLAKE_SSE) for securely loading/unloading data. |
SHOW IMAGE REPOSITORIES; | Displays a list of all image repositories in the account, verifying that the sftp_repository exists or was created successfully. |
2. GIT Commands
Step 1: Open the docker desktop and start the docker Engine – Status should change to running
Step 2: Next open the visual studio code and clone the below git repository
git clone- https://github.com/sfc-gh-bculberson/sftp_server.git
Step 3: Login to the Snowflake account using image repository URL (sftp repository URL which we created in previous step and do not add http:// in the Repository URL)
snow spcs repository_URL login
Step4: Navigate ngrok folder and open build.sh file and replace the ngrok Auth token and run the command
sh build.sh
Step5: Navigate sftp folder and open build.sh file and replace the registry url and run command
sh build.sh
Step6: After the we can see the sftp_host_ras_key is generated, this key file used to login sftp server instead of password
Step7: Next run the docker push command to upload to image repository
docker push repository_URL/sftp_db/sftp_schema/sftp_repository/sftp:multi2
docker -run repository_URL/sftp_db/sftp_schema/sftp_repository/ngrok:multi2
3. Creating Service for SFTP server
Step 1: Run the below command for set the Role, Database and Schema
USE ROLE SFTP_ROLE; | Switches the current role to SFTP_ROLE to ensure that all subsequent commands are executed with the permissions granted to the SFTP_ROLE. |
USE DATABASE SFTP_DB; | Sets the active context to the SFTP_DB database, ensuring that all operations are executed within this database. |
USE SCHEMA SFTP_SCHEMA; | Sets the active context to the SFTP_SCHEMA schema within the SFTP_DB database, ensuring operations are executed within this schema. |
Step2: Run the below script to create the service
Script | Purpose |
CREATE SERVICE sftp_service IN COMPUTE POOL sftp_compute_pool FROM SPECIFICATION $$ spec: containers: – name: sftp image: /sftp_db/sftp_schema/sftp_repository/sftp:multi args: – foo::1001 volumeMounts: – name: stage mountPath: /home/foo/stage – name: ngrok image: /sftp_db/sftp_schema/sftp_repository/ngrok:multi volumes: – name: stage source: “@SFTP_STAGE” $$ EXTERNAL_ACCESS_INTEGRATIONS = (ngrok_egress_access_integration) MIN_INSTANCES=1 MAX_INSTANCES=1; | Creates an SFTP Service: This script defines a service (sftp_service) in Snowflake that runs in a specified compute pool (sftp_compute_pool). The service consists of two containers: one for handling SFTP-related operations (sftp) and another for external access via ngrok (ngrok). The containers are defined with their respective images and runtime arguments. A volume (SFTP_STAGE) is mounted to the sftp container for file storage. The service is configured to allow only one instance of the service to run at a time, and it integrates with ngrok_egress_access_integration for external network access. |
Step3: Use the below statement to call the service
CALL SYSTEM$GET_SERVICE_STATUS(‘sftp_service’);
CALL SYSTEM$GET_SERVICE_LOGS(‘sftp_service’, ‘0’, ‘sftp’, 100);
CALL SYSTEM$GET_SERVICE_LOGS(‘sftp_service’, ‘0’, ‘ngrok’, 100);
ls @SFTP_STAGE;
4. Getting SFTP Host Name and Port
Step1: First Run the SFTP log command
CALL SYSTEM$GET_SERVICE_LOGS(‘sftp_service’, ‘0’, ‘sftp’, 100);
Step2: We will see that sftp server started and listening at localhost which means sftp server has started successfully.
Step3: Second run the ngrok command
CALL SYSTEM$GET_SERVICE_LOGS(‘sftp_service’, ‘0’, ‘ngrok’, 100);
Step4: We get the log files for ngrok , in the we will see the URL which has host name and port number
5. Steps to Connect with SFTP using WINSCP
Step1: Install Winscp to connect with SFTP.
Step2: Enter the Host Name and Port Number that we got in the previous step
Step3: User Name : foo
Step4: Next Click Advanced, Under Authentication tab, Provide the sftp_host_ras_key (after converting this into putty private key – .ppk) and click ok.
Step5: Next Click Login
Step6: Click Accept to add private key
Step7: Logged into the SFTP Server.
6. Loading data from SFTP to snowflake table
- Once the file is place in sftp server the file will be mirrored in sftp_stage that we created in first step.
Next using task scheduler, it can be loaded from stage to snowflake table. Snowflake task scheduler will take care for the incremental files.
- Create the table to load the data
Script | Purpose |
CREATE OR REPLACE TABLE SFTP_DB.SFTP_SCHEMA.COUNTRY_INFO ( COUNTRY_CODE VARCHAR(16777216), SHORT_NAME VARCHAR(16777216), LONG_NAME VARCHAR(16777216), CURRENCY_UNIT VARCHAR(16777216), NATIONAL_ACCOUNTS_BASE_YEAR VARCHAR(16777216), LENDING_CATEGORY VARCHAR(16777216), SYSTEM_OF_NATIONAL_ACCOUNTS VARCHAR(16777216), BALANCE_OF_PAYMENTS_MANUAL_IN_USE VARCHAR(16777216), GOVERNMENT_ACCOUNTING_CONCEPT VARCHAR(16777216), SYSTEM_OF_TRADE VARCHAR(16777216), FILE_NAME VARCHAR(16777216), CREATED_ON TIMESTAMP_TZ(9) DEFAULT CURRENT_TIMESTAMP()); | Creates or replaces a table named COUNTRY_INFO in the SFTP_DB.SFTP_SCHEMA schema. This table is designed to store various country-related data such as country code, short and long names, currency unit, national accounts base year, lending category, and other related information, with a timestamp (CREATED_ON) to record when each entry was created. The default timestamp is the current time at the moment of insertion. |
- Create the task for scheduling the load
Script | Purpose |
CREATE OR REPLACE TASK SFTP_TASK WAREHOUSE = SFTP_WAREHOUSE SCHEDULE = ‘5 MINUTE’ AS COPY INTO COUNTRY_INFO FROM @SFTP_STAGE FILE_FORMAT = csv_format; | Creates or replaces a scheduled Snowflake task that copies data from an SFTP stage to the COUNTRY_INFO table every 5 minutes, using a CSV file format. |
Conclusion
In conclusion, this documentation provides a structured approach to managing key processes within Snowflake, Git, and SFTP environments. By following the steps outlined for creating roles, warehouses, stages, and image repositories in Snowflake, as well as utilizing Git for version control, users can ensure a secure, organized, and efficient data management workflow. Additionally, the guide covers the configuration and use of SFTP servers for seamless data transfer and integration with Snowflake, ensuring smooth and reliable data loading from external sources.
By mastering these techniques, teams can optimize their data pipelines, improve collaboration through version control, and leverage Snowflake’s powerful cloud data platform for improved data processing and analysis. This holistic approach simplifies complex workflows and empowers users to efficiently manage and manipulate data within a secure, scalable, and automated ecosystem.