Introduction

This documentation provides a step-by-step guide for setting up and configuring the components needed to run Apache Airflow on Snowpark Container Services (SPCS) in Snowflake. It covers the following tasks necessary for deploying and managing the service:

  • Creating Role and Warehouse
  • Creating Compute Pools
  • Creating Image Repository and Stages
  • Creating Secrets
  • Building Docker Images
  • Generating YAML Files
  • Creating a Service
  • Service Restart
  • Airflow URL
  • Installing Additional Packages

By following these steps, you’ll have a fully configured and operational service ready for use, with all necessary components set up for development, deployment, and management.

Prerequisites

  • Docker
  • Visual Studio
  • Git
  • Python (install and run it as administrator)

Architecture

1. Creating Role and Warehouse

  • Define roles and set up your data warehouse to organize resources and manage access.
Command | Purpose
USE ROLE ACCOUNTADMIN; | Switch to the ACCOUNTADMIN role to have sufficient privileges to create roles, databases, etc.
CREATE ROLE airflow_admin_rl; | Create a new role named airflow_admin_rl.
CREATE DATABASE IF NOT EXISTS airflow_db; | Create a new database named airflow_db if it does not already exist.
GRANT OWNERSHIP ON DATABASE airflow_db TO ROLE airflow_admin_rl COPY CURRENT GRANTS; | Grant ownership of airflow_db to the airflow_admin_rl role, along with all current grants.
CREATE OR REPLACE WAREHOUSE airflow_wh WITH WAREHOUSE_SIZE='X-SMALL'; | Create or replace a warehouse named airflow_wh with size X-SMALL.
GRANT USAGE ON WAREHOUSE airflow_wh TO ROLE airflow_admin_rl; | Grant usage privileges on the airflow_wh warehouse to the airflow_admin_rl role.
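
For convenience, here are the same commands collected into a single script that can be pasted into a Snowflake worksheet. The commented-out GRANT ROLE line is an optional extra step (not part of the table above) with a placeholder user name:

USE ROLE ACCOUNTADMIN;
CREATE ROLE airflow_admin_rl;
CREATE DATABASE IF NOT EXISTS airflow_db;
GRANT OWNERSHIP ON DATABASE airflow_db TO ROLE airflow_admin_rl COPY CURRENT GRANTS;
CREATE OR REPLACE WAREHOUSE airflow_wh WITH WAREHOUSE_SIZE='X-SMALL';
GRANT USAGE ON WAREHOUSE airflow_wh TO ROLE airflow_admin_rl;
-- Optional (placeholder user name): allow your own user to assume the new role.
-- GRANT ROLE airflow_admin_rl TO USER <your_user>;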

2. Creating Compute Pools

Set up the compute pools that provide compute resources for the Postgres/Redis, Airflow server, and Airflow worker containers.

Command | Purpose
USE ROLE ACCOUNTADMIN; | Switch to the ACCOUNTADMIN role to ensure sufficient privileges to create compute pools and assign grants.
CREATE COMPUTE POOL postgres_redis MIN_NODES = 1 MAX_NODES = 1 INSTANCE_FAMILY = CPU_X64_XS; | Create a compute pool named postgres_redis with 1 minimum and 1 maximum node, using the CPU_X64_XS instance family.
CREATE COMPUTE POOL airflow_server MIN_NODES = 1 MAX_NODES = 1 INSTANCE_FAMILY = CPU_X64_S; | Create a compute pool named airflow_server with 1 minimum and 1 maximum node, using the CPU_X64_S instance family.
CREATE COMPUTE POOL airflow_workers MIN_NODES = 1 MAX_NODES = 2 INSTANCE_FAMILY = CPU_X64_S; | Create a compute pool named airflow_workers with 1 minimum node and 2 maximum nodes (for auto-scaling), using CPU_X64_S.
GRANT USAGE, MONITOR ON COMPUTE POOL postgres_redis TO ROLE airflow_admin_rl; | Grant USAGE and MONITOR privileges on the postgres_redis compute pool to the airflow_admin_rl role.
GRANT USAGE, MONITOR ON COMPUTE POOL airflow_server TO ROLE airflow_admin_rl; | Grant USAGE and MONITOR privileges on the airflow_server compute pool to the airflow_admin_rl role.
GRANT USAGE, MONITOR ON COMPUTE POOL airflow_workers TO ROLE airflow_admin_rl; | Grant USAGE and MONITOR privileges on the airflow_workers compute pool to the airflow_admin_rl role.
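
To verify that the pools were created and to check their state, you can use the standard Snowflake commands below (not part of the original steps, just a quick sanity check):

SHOW COMPUTE POOLS;
DESCRIBE COMPUTE POOL airflow_server;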

3. Creating Image Repository and Stages

Establish an image repository and configure the stages used for service specifications, logs, and DAGs.

Command | Purpose
USE ROLE airflow_admin_rl; | Switch to the airflow_admin_rl role to ensure the correct privileges for creating schemas, stages, and repositories.
USE DATABASE airflow_db; | Switch to the airflow_db database to work within it for subsequent commands.
USE WAREHOUSE airflow_wh; | Switch to the airflow_wh warehouse to perform operations that require compute resources.
CREATE SCHEMA IF NOT EXISTS airflow_schema; | Create the schema airflow_schema if it doesn't already exist in the airflow_db database.
USE SCHEMA airflow_db.airflow_schema; | Switch to the airflow_schema schema within the airflow_db database to perform operations on objects in this schema.
CREATE IMAGE REPOSITORY IF NOT EXISTS airflow_repository; | Create an image repository named airflow_repository if it doesn't already exist in the current schema.
CREATE STAGE IF NOT EXISTS service_spec DIRECTORY = ( ENABLE = true ); | Create a stage named service_spec with directory support enabled if it doesn't already exist. This stage is used for data loading.
CREATE STAGE IF NOT EXISTS airflow_logs ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE'); | Create a stage named airflow_logs with Snowflake's server-side encryption (SNOWFLAKE_SSE) if it doesn't already exist.
CREATE STAGE IF NOT EXISTS airflow_dags ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE'); | Create a stage named airflow_dags with Snowflake's server-side encryption (SNOWFLAKE_SSE) if it doesn't already exist.
SHOW IMAGE REPOSITORIES IN ACCOUNT; | Show all image repositories in the Snowflake account to verify and list available image repositories.
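
A quick way to confirm the objects exist, and to capture the repository URL you will need for the Docker commands in section 5, is to run the following; the repository_url column of SHOW IMAGE REPOSITORIES contains the registry host and path used in those commands:

USE ROLE airflow_admin_rl;
SHOW STAGES IN SCHEMA airflow_db.airflow_schema;
SHOW IMAGE REPOSITORIES IN SCHEMA airflow_db.airflow_schema;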

4. Creating Secrets

Securely store and manage sensitive information such as API keys and passwords.

Command | Purpose
USE ROLE airflow_admin_rl; | Switch to the airflow_admin_rl role to ensure the correct privileges for creating secrets and network rules.
USE SCHEMA airflow_db.airflow_schema; | Switch to the airflow_schema schema within the airflow_db database to perform operations on objects in this schema.
CREATE OR REPLACE SECRET airflow_fernet_key TYPE = password username = 'airflow_fernet_key' password = 'Jd82T9oHtxOPIYfio7g0ThaW_i8It-e0GABfDzBQoCo='; | Create or replace a secret named airflow_fernet_key of type password to securely store a Fernet encryption key for Airflow.
CREATE SECRET airflow_postgres_pwd TYPE = password username = 'postgres' password = 'admin123'; | Create a secret named airflow_postgres_pwd to securely store the password for the postgres user.
CREATE SECRET airflow_redis_pwd TYPE = password username = 'airflow' password = 'admin123'; | Create a secret named airflow_redis_pwd to securely store the password for the airflow user for Redis.
USE ROLE airflow_admin_rl; | Switch to the airflow_admin_rl role again to ensure proper privileges to create and manage network rules.
USE SCHEMA airflow_db.airflow_schema; | Switch to the airflow_schema schema again within airflow_db to create the network rule.
CREATE OR REPLACE NETWORK RULE airflow_spcs_egress_rule TYPE = 'HOST_PORT' MODE = 'EGRESS' VALUE_LIST = ('0.0.0.0:443','0.0.0.0'); | Create or replace a network rule named airflow_spcs_egress_rule to allow outbound egress traffic (port 443 for HTTPS).
GRANT USAGE ON NETWORK RULE airflow_spcs_egress_rule TO ROLE accountadmin; | Grant USAGE privileges on the airflow_spcs_egress_rule network rule to the accountadmin role.
USE ROLE accountadmin; | Switch to the accountadmin role to have sufficient privileges to create external access integrations.
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION airflow_spcs_egress_access_integration ALLOWED_NETWORK_RULES = (airflow_db.airflow_schema.airflow_spcs_egress_rule) ENABLED = true; | Create or replace an external access integration named airflow_spcs_egress_access_integration with the specified allowed egress rule.
GRANT USAGE ON INTEGRATION airflow_spcs_egress_access_integration TO ROLE airflow_admin_rl; | Grant USAGE privileges on the airflow_spcs_egress_access_integration to the airflow_admin_rl role.
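
The same commands as one script. The password values shown here are the sample values from the table; replace them with your own secrets before running this against a real account:

USE ROLE airflow_admin_rl;
USE SCHEMA airflow_db.airflow_schema;
-- Sample values only; substitute your own Fernet key and passwords.
CREATE OR REPLACE SECRET airflow_fernet_key TYPE = password username = 'airflow_fernet_key' password = 'Jd82T9oHtxOPIYfio7g0ThaW_i8It-e0GABfDzBQoCo=';
CREATE SECRET airflow_postgres_pwd TYPE = password username = 'postgres' password = 'admin123';
CREATE SECRET airflow_redis_pwd TYPE = password username = 'airflow' password = 'admin123';
CREATE OR REPLACE NETWORK RULE airflow_spcs_egress_rule TYPE = 'HOST_PORT' MODE = 'EGRESS' VALUE_LIST = ('0.0.0.0:443','0.0.0.0');
GRANT USAGE ON NETWORK RULE airflow_spcs_egress_rule TO ROLE accountadmin;
USE ROLE accountadmin;
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION airflow_spcs_egress_access_integration ALLOWED_NETWORK_RULES = (airflow_db.airflow_schema.airflow_spcs_egress_rule) ENABLED = true;
GRANT USAGE ON INTEGRATION airflow_spcs_egress_access_integration TO ROLE airflow_admin_rl;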

5. Building Docker Images

Build Docker images from the provided Dockerfiles to containerize the application and ensure consistency across environments.

Step 1: Open Visual Studio and run the following command to clone the Git repository:

git clone https://github.com/Snowflake-Labs/spcs-templates.git

Step 2: Change the directory to spcs-templates\airflow-spcs and run the following docker build commands to build the Docker images from the Dockerfiles:

docker build --rm --platform linux/amd64 -t xxxxxxx-xxxxxxx.registry.snowflakecomputing.com/airflow_db/airflow_schema/airflow_repository/airflow:2.7.3 -f ./airflow/airflow/Dockerfile .

docker build --rm --platform linux/amd64 -t xxxxxxx-xxxxxxx.registry.snowflakecomputing.com/airflow_db/airflow_schema/airflow_repository/postgres:14.10 -f ./airflow/postgres/Dockerfile .

docker build --rm --platform linux/amd64 -t xxxxxxx-xxxxxxx.registry.snowflakecomputing.com/airflow_db/airflow_schema/airflow_repository/redis:7.0 -f ./airflow/redis/Dockerfile .

Step 3: To authenticate to the Snowflake image registry, run the following command:

docker login xxxxxxx-xxxxxxx.registry.snowflakecomputing.com -u DIXXX (it will prompt for the password)

Step 4: Next, to push Docker images to a Docker registry, run the following commands:

docker push xxxxxxx-xxxxxxx.registry.snowflakecomputing.com/airflow_db/airflow_schema/airflow_repository/airflow:2.7.3

docker push xxxxxxx-xxxxxxx.registry.snowflakecomputing.com/airflow_db/airflow_schema/airflow_repository/postgres:14.10

docker push xxxxxxx-xxxxxxx.registry.snowflakecomputing.com/airflow_db/airflow_schema/airflow_repository/redis:7.0
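
To confirm from Snowflake that the pushed images are now in the repository, you can list the repository contents. SYSTEM$REGISTRY_LIST_IMAGES is a standard Snowflake function for this; it is not part of the original steps and takes the repository path as shown:

USE ROLE airflow_admin_rl;
SELECT SYSTEM$REGISTRY_LIST_IMAGES('/airflow_db/airflow_schema/airflow_repository');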

6. Generating YAML Files

Generate the YAML specification files that define application settings and deployment parameters.

Step 1: Change the directory to spcs-templates\airflow-spcs\airflow\airflow\template and run the following command:

python render_template.py

The above command will generate the following files:

  • airflow/airflow_server.yaml
  • airflow/airflow_worker.yaml
  • postgres/postgres.yaml
  • redis/redis.yaml

Step 2: Upload these YAML files to the service_spec stage that we created earlier (through the Snowflake UI, or with the PUT command shown below).
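
If you prefer the command line over the UI, the files can also be uploaded with PUT from SnowSQL (PUT does not run in Snowsight worksheets). The local paths below are placeholders; adjust them to wherever render_template.py wrote the files, and keep AUTO_COMPRESS=FALSE so the YAML is stored uncompressed:

USE ROLE airflow_admin_rl;
USE SCHEMA airflow_db.airflow_schema;
PUT file://<local_path>/airflow_server.yaml @service_spec AUTO_COMPRESS=FALSE OVERWRITE=TRUE;
PUT file://<local_path>/airflow_worker.yaml @service_spec AUTO_COMPRESS=FALSE OVERWRITE=TRUE;
PUT file://<local_path>/postgres.yaml @service_spec AUTO_COMPRESS=FALSE OVERWRITE=TRUE;
PUT file://<local_path>/redis.yaml @service_spec AUTO_COMPRESS=FALSE OVERWRITE=TRUE;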

7. Creating a Service

Deploy your application as a service for scalable and efficient execution.

Command | Purpose
USE ROLE airflow_admin_rl; | Switch to the airflow_admin_rl role to ensure the correct privileges for creating services and accessing their status/logs.
USE SCHEMA airflow_db.airflow_schema; | Switch to the airflow_schema schema within the airflow_db database to perform operations on objects in this schema.
CREATE SERVICE postgres_service IN COMPUTE POOL postgres_redis FROM @service_spec SPECIFICATION_FILE='postgres.yaml' MIN_INSTANCES=1 MAX_INSTANCES=1; | Create a service named postgres_service in the postgres_redis compute pool, based on the postgres.yaml specification file, with 1 minimum and 1 maximum instance.
SELECT SYSTEM$GET_SERVICE_STATUS('postgres_service'); | Check the status of the postgres_service to ensure it is running properly.
CALL SYSTEM$GET_SERVICE_LOGS('postgres_service', '0', 'postgres'); | Retrieve the container logs for the postgres_service (instance 0) for debugging or monitoring.
CREATE SERVICE redis_service IN COMPUTE POOL postgres_redis FROM @service_spec SPECIFICATION_FILE='redis.yaml' MIN_INSTANCES=1 MAX_INSTANCES=1; | Create a service named redis_service in the postgres_redis compute pool, based on the redis.yaml specification file, with 1 minimum and 1 maximum instance.
SELECT SYSTEM$GET_SERVICE_STATUS('redis_service'); | Check the status of the redis_service to ensure it is running properly.
CALL SYSTEM$GET_SERVICE_LOGS('redis_service', '0', 'redis'); | Retrieve the container logs for the redis_service (instance 0) for debugging or monitoring.
CREATE SERVICE airflow_service IN COMPUTE POOL airflow_server FROM @service_spec SPECIFICATION_FILE='airflow_server.yaml' MIN_INSTANCES=1 MAX_INSTANCES=1 EXTERNAL_ACCESS_INTEGRATIONS = (airflow_spcs_egress_access_integration); | Create a service named airflow_service in the airflow_server compute pool, based on the airflow_server.yaml specification file, with 1 minimum and 1 maximum instance and external access integration enabled.
SELECT SYSTEM$GET_SERVICE_STATUS('airflow_service'); | Check the status of the airflow_service to ensure it is running properly.
CALL SYSTEM$GET_SERVICE_LOGS('airflow_service', '0', 'webserver'); | Retrieve the container logs for the airflow_service (instance 0) webserver container for debugging or monitoring.
CALL SYSTEM$GET_SERVICE_LOGS('airflow_service', '0', 'scheduler'); | Retrieve the container logs for the airflow_service (instance 0) scheduler container for debugging or monitoring.
CREATE SERVICE airflow_worker IN COMPUTE POOL airflow_workers FROM @service_spec SPECIFICATION_FILE='airflow_worker.yaml' MIN_INSTANCES=1 MAX_INSTANCES=2 EXTERNAL_ACCESS_INTEGRATIONS = (airflow_spcs_egress_access_integration); | Create a service named airflow_worker in the airflow_workers compute pool, based on the airflow_worker.yaml specification file, with 1 minimum and 2 maximum instances and external access integration enabled.
SELECT SYSTEM$GET_SERVICE_STATUS('airflow_worker'); | Check the status of the airflow_worker service to ensure it is running properly.
CALL SYSTEM$GET_SERVICE_LOGS('airflow_worker', '0', 'worker'); | Retrieve the container logs for the airflow_worker service (instance 0) for debugging or monitoring.
CALL SYSTEM$GET_SERVICE_LOGS('airflow_worker', '1', 'worker'); | Retrieve the container logs for the airflow_worker service (instance 1) for debugging or monitoring.
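
The four CREATE SERVICE statements from the table, collected into one script. It is a good idea to check SYSTEM$GET_SERVICE_STATUS after each CREATE, as in the table above, and to let the Postgres and Redis services come up before creating the Airflow services:

USE ROLE airflow_admin_rl;
USE SCHEMA airflow_db.airflow_schema;
CREATE SERVICE postgres_service IN COMPUTE POOL postgres_redis FROM @service_spec SPECIFICATION_FILE='postgres.yaml' MIN_INSTANCES=1 MAX_INSTANCES=1;
CREATE SERVICE redis_service IN COMPUTE POOL postgres_redis FROM @service_spec SPECIFICATION_FILE='redis.yaml' MIN_INSTANCES=1 MAX_INSTANCES=1;
CREATE SERVICE airflow_service IN COMPUTE POOL airflow_server FROM @service_spec SPECIFICATION_FILE='airflow_server.yaml' MIN_INSTANCES=1 MAX_INSTANCES=1 EXTERNAL_ACCESS_INTEGRATIONS = (airflow_spcs_egress_access_integration);
CREATE SERVICE airflow_worker IN COMPUTE POOL airflow_workers FROM @service_spec SPECIFICATION_FILE='airflow_worker.yaml' MIN_INSTANCES=1 MAX_INSTANCES=2 EXTERNAL_ACCESS_INTEGRATIONS = (airflow_spcs_egress_access_integration);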

8. Service Restart

  • This section explains how to restart a service with the latest Docker image or updated service specification after making changes.
  • If you have changed the Docker image or the service specification files, run the commands below to restart the services with the latest changes.
Command | Purpose
USE ROLE airflow_admin_rl; | Switch to the airflow_admin_rl role to ensure proper privileges for modifying services.
USE SCHEMA airflow_db.airflow_schema; | Switch to the airflow_schema schema within the airflow_db database to perform operations on the services within this schema.
ALTER SERVICE airflow_worker FROM @service_spec SPECIFICATION_FILE='airflow_worker.yaml'; | Alter the airflow_worker service by updating its configuration based on the airflow_worker.yaml specification file.
ALTER SERVICE airflow_worker SET EXTERNAL_ACCESS_INTEGRATIONS = (airflow_spcs_egress_access_integration); | Update the airflow_worker service to use the airflow_spcs_egress_access_integration for external access.
ALTER SERVICE airflow_service FROM @service_spec SPECIFICATION_FILE='airflow_server.yaml'; | Alter the airflow_service by updating its configuration based on the airflow_server.yaml specification file.
ALTER SERVICE airflow_service SET EXTERNAL_ACCESS_INTEGRATIONS = (airflow_spcs_egress_access_integration); | Update the airflow_service to use the airflow_spcs_egress_access_integration for external access.
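
If a service does not pick up a change after the specification is altered, one option (not part of the original steps, but standard Snowflake commands) is to suspend and resume it, which stops and restarts its containers:

ALTER SERVICE airflow_service SUSPEND;
ALTER SERVICE airflow_service RESUME;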

9. Airflow URL

  • Understand how to access the Airflow UI to monitor and manage workflows.
  • To retrieve and display the endpoints associated with the service named airflow_service, execute the following command:

SHOW ENDPOINTS IN SERVICE airflow_service;

  • Open the ingress URL returned by this command in a browser. Log in first with your Snowflake account, then log in to Airflow (username: airflow, password: airflow).

10. Installing Additional Packages

Instructions for installing any required packages to extend the functionality of your environment.

Step 1: Go to the spcs-templates\airflow-spcs\airflow\airflow folder, open the Dockerfile, and add a RUN pip install <package_name> line for each package you need.

Example

RUN pip install pyarrow==15.0.2

RUN pip install XlsxWriter==3.2.0

RUN pip install cryptography==38.0.4

RUN pip install pyxlsb==1.0.10

Step 2: Run the docker build and push commands again with a new version tag, e.g. airflow:2.7.3_1:

docker build --rm --platform linux/amd64 -t xxxxxxx-xxxxxxx.registry.snowflakecomputing.com/airflow_db/airflow_schema/airflow_repository/airflow:2.7.3_1 -f ./airflow/airflow/Dockerfile .

docker push xxxxxxx-xxxxxxx.registry.snowflakecomputing.com/airflow_db/airflow_schema/airflow_repository/airflow:2.7.3_1

Step 3: Change the image name in airflow_server.yaml and airflow_worker.yaml to xxxxxxx-xxxxxxx.registry.snowflakecomputing.com/airflow_db/airflow_schema/airflow_repository/airflow:2.7.3_1 and re-upload the files to the service_spec stage.

Step 4: Run the commands below in Snowflake to point the services at the updated image:
Command | Purpose
USE ROLE airflow_admin_rl; | Switch to the airflow_admin_rl role to ensure that you have sufficient privileges to perform the operations on the Airflow services.
USE SCHEMA airflow_db.airflow_schema; | Switch to the airflow_schema schema in the airflow_db database where the services and configurations are located.
ALTER SERVICE airflow_worker FROM @service_spec SPECIFICATION_FILE='airflow_worker.yaml'; | Modify the airflow_worker service to update its configuration using the airflow_worker.yaml specification file, which defines its setup.
ALTER SERVICE airflow_worker SET EXTERNAL_ACCESS_INTEGRATIONS = (airflow_spcs_egress_access_integration); | Set the external access integration for the airflow_worker service to airflow_spcs_egress_access_integration, allowing the service to interact with external systems through defined rules.
ALTER SERVICE airflow_service FROM @service_spec SPECIFICATION_FILE='airflow_server.yaml'; | Modify the airflow_service to update its configuration using the airflow_server.yaml specification file, defining how the service is set up.
ALTER SERVICE airflow_service SET EXTERNAL_ACCESS_INTEGRATIONS = (airflow_spcs_egress_access_integration); | Set the external access integration for the airflow_service to airflow_spcs_egress_access_integration, enabling external network communications through specified rules.
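
Once the services restart with the new image, you can re-check their status and logs with the same commands used in section 7, for example:

SELECT SYSTEM$GET_SERVICE_STATUS('airflow_service');
CALL SYSTEM$GET_SERVICE_LOGS('airflow_service', '0', 'webserver');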

Conclusion

The steps outlined in this guide provide a comprehensive approach to setting up, managing, and deploying applications in modern cloud-native environments. By focusing on key aspects such as resource management, security, and containerization, you can ensure that your infrastructure remains scalable, secure, and efficient.

From defining roles and managing access with a data warehouse to creating scalable compute pools and organizing deployment pipelines, each stage in the process contributes to a streamlined workflow and reliable application performance. Building Docker containers, managing sensitive data with secrets, and leveraging YAML files for configuration allow for consistency and easy scaling across environments.

Deploying your application as a service, ensuring easy restarts with the latest updates, and utilizing tools like Airflow for workflow orchestration are essential steps to ensure the smooth running of your applications at scale. Additionally, installing the right packages can extend the functionality of your environment, helping you stay adaptable to evolving needs.

By mastering these key areas, you will not only improve the security and efficiency of your deployments but also ensure the continuous delivery and operational stability of your applications. This approach enables you to focus more on innovation while relying on a robust infrastructure to support your business objectives.


